Finds through relationships are slow

This is for a payroll type database (e.g. tracking salary over time). This is under FMS18, and the payroll table has about a half million records (about 4000 per month)

The setup:

  • There's a single Payroll table with one record per person per month
  • it has a self relation Payroll Self by Last Month which relates to itself by ID and a stored, calculated Date field which is equal to one month prior.

The Find is fairly simple: find everyone on salary this month who was not on salary last month:

  • Enter find mode
  • set field Date = ThisMonth
  • set field Salary > 0
  • set field Payroll Self by Last Month::Salary = 0
  • Perform find

It works fine, but is SLOWW, taking about 3 minutes.

By contrast, I intentionally denormliazed the database by adding an auto-enter calc field

  • SalaryLastMonth = Payroll Self by Last Month::Salary

When I do the new find:

  • Enter find mode
  • set field Date = ThisMonth
  • set field Salary > 0
  • set field SalaryLastMonth = 0
  • Perform find

It's almost instant.

Why so slow? I've checked the usual subjects:

  • the ID field is numeric and indexed
  • the Date and DateLastMonth fields are stored and indexed
  • the Payroll Self by Last Month relationship is not sorted

Would you post a minimally sufficient (non private data) example to demonstrate your issue?

I'll second the request for an example. Related searches are not by definition and automatically slow so there is something in the setup that. makes it so.

In my experience, related searches that also depend on local search results are slow. I asked the same question, why so slow, a few times. I am by no means an authority on the subject, but have come up with a hypothesis.

In the case of your search, FileMaker will first find all records that satisfy the local criteria. It will then perform a search on the related criteria for each record it found in the first search. If the first search returned 4000 records, that translates to 4000 additional searches. The former search can be viewed as one AND search followed by a bunch of OR searches. The latter could be viewed as a single AND search.

Denormalizing the database as you have done allows FileMaker to perform a single AND search.

As I said, this is based on a hypothesis. It fits my observations to date. I also noticed that performance degrades as the number of records on either side of the relationship increases.

Unsure if this helps, but I hope it is useful food for thought.

1 Like

Here's a test file (about 15MB zipped)

username/password = admin/admin

When hosted on FMServer:

  • the Slow find (through a relationship) takes about 15 seconds each time it's run.
  • the Fast find (using only fields in a single table) takes about 2 seconds

When run locally in FMPro:

  • the Slow find (through a relationship) takes about 11 seconds on the first run, then under 2 seconds on subsequent runs.
  • the Fast find (using only fields in a single table) takes about 1 second each time it's run

So this test demonstrates a 7x to 10x speedup. It's not quite as dramatic as what I'm seeing in my real system, where the speedup is 30x or higher.

One notable difference is that in my real system, the table is very wide (having 160 fields) whereas in this test file, the table only has about 4 fields.

I wonder if what's happening is that in one case, FM Server is sending all the record data to the client, whereas in the other, the Find is happening server-side, and only the records which are found are sent?

I played with your file and found much the same thing as you were seeing. I fiddled around with a few alternative methods until I found one that performs reasonably well but it is not as fast as the search on a local field. It runs in 3 - 4 seconds.

Here's a link to the modified file. It's a temporary file transfer service so it will disappear on 21st Dec.

1 Like

A couple of notes.

  • If you are searching on a number field, why are you using the "==0" find, as opposed to "=0"? May or may not save you some ms on a big find, and depending on the data you are searching. You can sometimes force FileMaker to not use the index with a find like this.
  • As you get into hosted data, that data movement is going to be expounded significantly. As you see in your own tests. GTRR as Malcolm tested can work quickly sometimes, but on a hosted file, especially as the width and record count grows, you will see it become very painful. GTRR has definitely gotten slower as versions progressed.
  • Look at the file I posted below. Locally, it's not going to be as fast fast as a local search, but it will be consistent. Hosted, you should start to see a more significant difference between the approaches. Especially as the data set grows, and the wider the table is, the better it will perform compared to the other methods.

The idea behind my file is to get the list of IDs, drop them in a global field, and then GTRR ( going to related records, but only for the current record, NOT the found set ). This method, I have found, scales much better, and more linearly based on the found set. It also very often, works well over a WAN.

On my setup, this is what I was getting hosted:

  • Find local (fast) 1700-1800 ms
  • Find Related (slow) 14,000-15,000 ms
  • Find Related (gtrr traditional ) - never completed, measured in minutes.
  • Find Related (gtrr global) - 2700-3000 ms ( locally I was getting between 950-1150 ms )

The Find Related (gtrr global) approach will likely sit around the same speed ( 2-3 secs ) as long as you aren't trying to GTRR with 1 million rows in the global. The leaner that first list is, the faster this technique can be.

( Link expires on 12/21/2020 ) https://www.dropbox.com/t/TzqjmTIdQBAG3QmL

Interesting results @jormond. On my machine the local find performs in ~1200ms. After pump-priming ( second and subsequent runs after opening the file ) your global GTRR is ~1500ms and my GTRR is ~1400ms. The initial run for my method is ~3300ms but a file in use may already have the indices in memory so I think it's reasonable to compare second runs.

I know exactly what you mean about the performance hit that using GTRR across all records can have. It really is a calculated risk and I'd want to be sure that the environment and data sets were appropriate. That is the thinking behind searching for last months zero salary first. I'm expecting it to be the smallest initial dataset.

I was surprised to find that my GTRR was consistently faster than your global GTRR. I know that the time difference is only a blip but it was consistent. What that says to me is that there is something under the hood which is optimised for using GTRR in the way that I did and that there is a little bit of extra overhead required for using a large multi-key list to form the relationship.

Interesting - my understanding was that "==" meant "exactly equal to" and was typically a faster search than "=" which means "containing", so I would have expected "==" to be faster, if anything. Maybe I'm wrong?

With a numeric field, do "=" and "==" even do anything different? Worth a test.

It is over WAN, and not local, where you will see a difference in the performance of the 2 GTRR methods. In this case, it can be as vast as 3 secs vs more than 5 mins.

If I have to GTRR, other than really small data sets, I always use the global method. This only became the more standard approach for me in the last 2 or 3 versions.

1 Like

The answer to this question, as I've posed it to the engineers in the past, is ... well, complicated. It all depends on a lot of factors. If FileMaker can use the index, it will not make much of a difference. If not, it will be dependent on the find itself, and potentially ignore the index and do an unindexed find...that may also include sending a ton of data to the client to figure it out.

Often, I test the difference between just "0" ( no quotes ), "=0", and "==0" to make sure I get the same results, and compare speed. The answer isn't always the same...though I've gotten decent on knowing which one to expect to work the best.

AND... some of the performance improvements in 19.1 are specifically geared towards making this technique faster.

2 Likes

Nice to know, thanks for the insight Wim.

Can you say more? The 19.1.2 release notes note that Startup Restoration was removed, but I was under the impression that one of the benefits of Startup Restoration was some sort of multi-threaded Find ability.

Edit to add: there's a different note which talks more about the specific changes for performacne: ClarisPKB
It's got a surprising amount of detail:

Performance improvements

Use of a sharing lock for low-level access to the database structure

To replace the use of PLL as described above, FileMaker Server 19.1.2 now uses a sharing lock to manage low-level access to the database structure. Before PLL was introduced, the database engine managed access using a mutex (mutually exclusive object), which allowed only one thread at a time to read or write the database structure. However, a sharing lock allows concurrent read access (as long as system resources allow) but still requires an exclusive lock for write access. Because most database structure accesses are read operations, the use of a sharing lock improves performance for concurrent users of the same database.

Parallel indexed find operations

Because a sharing lock is used to allow concurrent read access, version 19.1.2 can now perform find operations on indexed fields in parallel. Previously, only one indexed find operation could be performed at a time.

Sorting records on the host

In previous versions, FileMaker Server always transferred records to FileMaker clients (FileMaker Pro and Claris FileMaker Go®) to perform a sort. However, starting with version 19.1.2 of the FileMaker clients and FileMaker Server, FileMaker Server can perform the sort, if it is not busy; otherwise, the client will perform the sort.

Memory-based key comparison during indexing

A text field containing multiple values (each value separated by a carriage return) that's used as a match field in a relationship is called a multikey field . The FileMaker database engine can resolve relationships that use a multikey field as the match field. In versions earlier than 19.1.2, the database engine parses multikey fields and stores the individual indexable keys in a temporary file. To improve how multikey text fields are indexed, version 19.1.2 now stores the keys in memory and performs key comparisons in memory, which is more efficient.

Memory-based key comparison during indexing is supported in FileMaker Server and FileMaker Pro but not in FileMaker Go.

3 Likes

I just tested my demo with "==0" and "=0", locally, and the speed is identical, which is reassuring.

You found it. The relevant part is the paragraph on " Memory-based key comparison during indexing"

That article is part of the "Engineering Blog" and it is worth to keep your eye on:

1 Like

The 19.1 replacement for "startup restoration" is the new Sharing Lock and it works on the same area of FM: queries. And those are not just users or scripts doing actual finds, but anything in FM that has to run a query. Including GTRR, which has to produce a query to come up with the related records. Even showing things in a portal is a query.

3 Likes

It goes much faster if you forget about the "Enter Find Mode" and "Set Field"-statements and only use the Perform Find-statement with inclusive request. I'm getting the same times as with the local find

2 Likes

Has anyone else seen this performance improvement as @LucThomaere mentioned? (as someone who always uses the 3 script step approach)

1 Like

Not as a general rule, no.
So it would be good to flesh it out with a reproducible test.

We had a few anecdotal instances where - on very busy layouts - we saw this. But we solved it by streamlining the layout (getting rid of unnecessary triggers and other layout-loading / mode-switching delays).

2 Likes