Best Way To Implement Huge SQL Query In FileMaker

Looking for suggestions on how to implement a huge GROUP BY SQL query in FileMaker.

The table has about 400,000 rows and in my experience anything over about 10,000 rows will, for all intents and purposes, hang FileMaker with a GROUP BY using ExecuteSQL().

Below is the GROUP BY query (with generic fields for presentation publicly here) that executes in MariaDB in 1.62 seconds on 400,404 rows.

SELECT
    field_1,
    field_2,
    SUM( field_1 ) AS Sum_field_1
FROM
    table1, table2
WHERE
    table1.field_1 = table2.field_2
    AND table1.field_1 NOT IN (
        SELECT
            T.field
        FROM
            Table T
        WHERE
            field = 'some_value'
            AND CURRENT_DATE BETWEEN date_1 AND date_2 )
GROUP BY 1, 2
ORDER BY field DESC
LIMIT 25

Question: How would this be implemented in FileMaker for reasonable performance (that is, not ExecuteSQL)?

Thanks

Just as an aside: I happened to be driving by what appeared to be the MariaDB building about an hour ago, and almost stopped to take a photo to send to you @OliverBarrett, as I know you make frequent and great use of it. It was really late, though, and I just wanted to get home, so I figured perhaps another time when I'm in that area.

2 Likes

Thanks Steve.

I'm a fan of whatever works best for the client. I like MariaDB/MySQL since they're both free and wickedly fast and have COMPLETE Type IV JDBC drivers (unlike FMP whose JDBC driver lacks several key components like Connection Pooling, Transactions, and JNDI to name a few).

However, FileMaker is the coolest UI DB client product still on the market. Thus, if I can implement this query in FileMaker somehow with acceptable performance, I'll do it for my current prototype (read: sales) effort.

Otherwise, this data will probably end up in SQL Server using a REST service to extract it. The client is currently on the edge of just going ahead with MS CRM (a Microsoft shop). I'm trying to show that FileMaker would be a strong contender.

Thanks again.

O

2 Likes

Can you say more about the data? For example, are the GROUP BY fields something regular or predictable (calendar months, zip codes, cities, ...), or are their contents totally arbitrary?

Other questions/ideas:

  • you say 400k rows in "the table" but your query shows 3 tables (table1, table2, and T). How many rows are in each?
  • it looks like you are filtering on Date - what is the typical found set? E.g. do you typically sum over all 400k records? Or does the date range limit you to a much smaller working set?
  • do you have a maximum allowed time for this to run? I have some reports I run in FileMaker where I'm selecting about 50K records (from a table holding more than 3M records) and where records are grouped by about 20 categories, and one of these reports runs in about 60 seconds.

Good questions!
Yes, only one table has 400,000 rows; it's kind of a transaction table.
Yes, the query considers all of the 400,000 records for the date eval.

It sounds like you're getting reasonable performance. I'll keep trying to convert the SQL and see how it does.

Thanks!

I just ran some tests:

  • Table1 (containing a set of 25 rows -- think "zip codes" or similar), related to Table2 (containing over 8 million rows, each row dated). The relationship has 3 predicates, including less-than and greater-than for a date range; the report sums the found records and computes percentages for another category. This report runs in about 10 minutes (on a local gigabit Ethernet LAN). The "found set" here is about 400k records (out of 8 million).

  • Same test, but with a simpler relationship that uses an "=" match on a specific date; otherwise the setup is the same. The found set here is about 30k records. This report runs in about 1 minute.

So my experience is that 400k records is not really that much, and with careful relationship design and intentional denormalization (liberal use of stored, auto-entered, triggered, indexed fields) you should be able to get performance that's good enough.

Did you try to "cache" this with a separate query, then append the result so that it matches the SQL syntax? (It appears to me it only needs to be determined once.)

2 Likes

I tried to separate the queries and also did not create any relationships. The inner query completes very fast, but the main query (without the sub-query) on the large table still simply appears to hang FileMaker. 30 minutes later....

I also tried to run the main query in the Data Viewer. It got to a point, then was "sorting". However, at some point in that sort, it too hung and I had to FORCE-QUIT FileMaker.

Then trying to restart FileMaker after the necessary Force-Quit, I got the lovely 805 (File Corrupt) message. Running "Recover" didn't (recover). File apparently damaged beyond repair.

This 805 corruption happened twice in the several hours I spent trying to get FileMaker to produce results. Of course, to continue, I had to create a brand-new FMP12 file, import the 400K records and then 35K for the other table, fix the field types the FMP import gets wrong, etc. That's about half an hour just to rebuild the FMP12 test file each time it's corrupted beyond repair after the necessary force-quit (the "Cancel" button was unresponsive).

This FileMaker SQL performance isn't going to fly with my client. I also don't have days to hunt for some "trick" that might still take a few minutes to run (when SQL Server will do it in a couple of seconds). The client knows what his enterprise DB can do.

Therefore, he'll (sadly) just end up using MS CRM as it's super fast (SQL Server back end). Sure, not as pretty as FileMaker (or as fun to use, etc.), but functional.

This SQL performance in FileMaker has lost me plenty of business. AFAIK, nothing has changed since ExecuteSQL() was introduced years and years ago. I keep trying ....

Thanks for your reply. :slight_smile:

P.S. My experience in MySQL and MariaDB was quite different. I took the client's existing query, made two changes and ran it (5 minutes to make query changes for MariaDB/MySQL). Query runs every time in 3 seconds. Done.

2 Likes

I am curious… how many results typically in the inner query?

1 Like

About 20.

Hi @OliverBarrett

I have pretty strong doubts that ExecuteSQL in FMP will ever handle this type of query scenario in a performant manner. I refrained from posting my doubts earlier because it seemed like you were already maybe seeking out a non-ExecuteSQL way to approach this, and also because I wanted to leave room for thoughts from others before immediately chiming in with my doubts.

If it is a must to use the ExecuteSQL function for this, then my guess is that it's going to be a no-go, though of course, I'd be happy to be proven incorrect.

If you are interested in non-ExecuteSQL approaches, we could look at how to break down the problem into parts so that you can see what you think about a more "traditional" FileMaker approach. Such an approach would mean temporarily leaving behind any mindset about how one performs these tasks in an SQL database. Instead it would require developing a small amount of FileMaker bread-and-butter chops, e.g.,

  • Writing a script with a Perform Find step to create a target found set
  • Adding a summary field to the table definition (it will be responsible for performing the SUM)
  • Exporting data with sorting and group-by features
  • Additional steps, as necessary, to massage the data into the best format (a rough script-step sketch follows below).
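
Sketched as script steps, that might look roughly like the following. This is only a skeleton, and every layout and field name in it is an invented placeholder; the summary field mentioned in the comments would be defined in the table as something like "Total of field_1".

# build the target found set
Go to Layout [ "Transactions (list)" ]
Perform Find [ Restore ]                          # the criteria that stand in for the SQL WHERE clause
# sort by the two GROUP BY fields so sub-summary parts break correctly
Sort Records [ Restore ; With dialog: Off ]
# view the results on a layout that has sub-summary parts and the summary field
Go to Layout [ "Transactions (report)" ]
# optionally, Export Records with its "Group by" option checked writes the
# summarized rows to a file instead of viewing them on the layout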

Now, whether the above will be performant enough to fit the bill is still an open question, but, IMO, it's the best place to start to see if FMP can even be in the running for doing an on-the-fly query/report of this nature.

A couple of caveats:

  • Someone else may still have some approach that I am overlooking, or about which I am ignorant, and so it's still good to take the above with a grain of salt.

  • For some years now, there's been an increasing practice amongst FMP developers to pre-calculate and store (cache) summarized data so that the overhead of performing that summarization does not have to happen at reporting time. If such an approach is possible, it's definitely going to be your most performant and scalable option. I can't tell whether this would be an option for the scenario described in the original post here.

As always, HTH.

2 Likes

On a train, so I thought I would give this a try. Unsure how slow or fast this will be. I am just creating the FileMaker steps that give the same result as the SQL query.

Relationship graph:

  • Occurrence T_1_1 points to Table1 in the relationship graph;
  • Occurrence T_1_2 points to Table1 in the relationship graph;
  • Occurrence T_2 points to Table2 in the relationship graph;
  • Occurrence T_T points to T in the relationship graph;
  • Occurrence1 relates to Occurrence2 when Occurrence1::field_1 = Occurrence2::field_2;
  • Layout L_1_1 shows data from occurrence T_1_1, no summary field;
  • Layout L_1_2_1 shows data from occurrence T_1_2, no summary field;
  • Layout L_1_2_2 shows data from occurrence T_1_2 and has a summary field;
  • Layout L_T shows data from occurrence T_T.
  1. In layout L_T, find records where T_T::field = "some_value" and the current date falls within date_1...date_2. Store the list of T::field values in the variable $listOfTvalues. I think this could be done with ExecuteSQL without much of a performance penalty (see the sketch at the end of this post).

  2. In layout L_1_1, find records with the following requests:

    • Find request: field T_2::field2 contains anything (">0" works);
    • Omit requests (one per value in $listOfTvalues): table1::field_1 = value in $listOfTvalues;
  3. Go to related records:

    • Get related records from T_1_1;
    • Show records using layout: L_1_2_1;
    • Show only related records: match all records in the current found set;
    • This removes the relationship between Table1 and Table2, preventing FileMaker from loading related Table2 data in subsequent steps.
  4. Sort records for grouping

    • You won't be able to sort by T_2::field2, but I don't think you need to. Table1::field1 must be equal to Table2::field2 to be in the found set per the relationship and find criteria.
  5. Go to layout L_1_2_2

    • This will cause the summary field to calculate its values.

I think this gives you what you are looking for. Try it out and let us know how good or bad this is.
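
If it helps, step 1 could be as small as a single Set Variable step. This is only a sketch: it points ExecuteSQL at the T_T occurrence and assumes CURRENT_DATE is accepted by FileMaker's SQL dialect (I believe it is, but verify).

Set Variable [ $listOfTvalues ; Value:
    ExecuteSQL (
        "SELECT field FROM T_T
         WHERE field = ? AND CURRENT_DATE BETWEEN date_1 AND date_2" ;
        "" ; ¶ ; "some_value" ) ]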

2 Likes

ExecuteSQL in FileMaker is a bit slow, as it is internally translated from the SQL query language to FileMaker's internal query language. Some things work well, but others are horribly slow. JOINs especially can be slow. Slow enough that we often do the joins ourselves after we get the results.

Sometimes it can be quicker to load data into an in-memory SQLite database with MBS Plugin and run the query there.

For your query, I would try to reduce it: e.g. do the inner SELECT first, take the result, and include it in the SQL for the outer query.
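
For example, something along these lines. This is only a sketch using the generic table and field names from the first post; the escaped quotes around "Table" are there only because that placeholder name happens to be an SQL reserved word, and I replaced the positional GROUP BY and dropped the LIMIT clause since I would not rely on FileMaker's SQL dialect accepting them. Whether the outer GROUP BY is then fast enough is of course still the open question.

Let ( [
    // run the small inner query once; using "', '" as the row separator lets
    // the result drop straight into a NOT IN ( ... ) list
    innerList = "'" &
        ExecuteSQL (
            "SELECT field FROM \"Table\"
             WHERE field = ? AND CURRENT_DATE BETWEEN date_1 AND date_2" ;
            "" ; "', '" ; "some_value" )
        & "'" ;

    // splice that list into the outer query in place of the sub-select
    outerSQL =
        "SELECT field_1, field_2, SUM ( field_1 )
         FROM table1, table2
         WHERE table1.field_1 = table2.field_2
           AND table1.field_1 NOT IN ( " & innerList & " )
         GROUP BY field_1, field_2"
] ;
    ExecuteSQL ( outerSQL ; Char ( 9 ) ; ¶ )
)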

And as said, try to run two requests, one against table1 and one against table2 and then do the join yourself if possible.

Another thing is to skip SQL altogether and do the same query with find operations in FileMaker to get better join performance.

2 Likes

That sounds good.

I'm thinking two finds: one for the outer SQL and one for the inner SQL. Then, use list operations to exclude the inner result from the outer.

I believe that would be faster. Will experiment with it.
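
Something along these lines, maybe, for the exclusion step. Purely a sketch: it assumes the two finds have already produced return-delimited lists in $outerList and $innerList, and that FileMaker 18 or later is available for the While function.

While ( [
    keep = "" ;
    i = 0 ;
    n = ValueCount ( $outerList )
] ;
    i < n ;
[
    i = i + 1 ;
    v = GetValue ( $outerList ; i ) ;
    // keep a value only if it does not appear in the inner list
    keep = If ( IsEmpty ( FilterValues ( $innerList ; v ) ) ; keep & v & ¶ ; keep )
] ;
    keep
)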

What do you think of that?

For now, I must get my demo done so I'll have to just do a query against MariaDB via a REST service.

Appreciate the ideas!

I woke up thinking something like this. See my reply to Christian below and see if it makes sense.

Thanks!

You're amazing as always.

I replied to Christian below since it was the last message I read, but I think skipping ExecuteSQL and using Finds and possibly list operations will do the trick quickly.

Will experiment and report back.

Thanks Steve!

P.S. Where is that picture of the MariaDB building? Would love to see that. :slight_smile:

1 Like

@OliverBarrett are you able to create a simple test file we could play with?

If the inner query only returns about 20 records, then replace it with a generic predicate built from those 20 cached values. Also: is the ExecuteSQL running in a script with Full Access granted? If so, and no record-access restriction is relevant, see what you gain. (If access security is in place, you need to replicate the restrictions in your WHERE clause, which of course slows this down further...)

great article BTW FileMaker ExecuteSQL() - the Good, the Bad, and the Ugly

3 Likes

Hey @OliverBarrett

Glad to read that you are considering attacking this without ExecuteSQL. Though, I will mention that for the smaller inner query, if that is performing quickly then I see no need to move that piece away from using ExecuteSQL. On a related note, I agree with @FileKraft that it looks like that inner query only needs to be performed once.

As far as an approach with Finds and so forth:

I'd suggest trying to get as much accomplished with the Find requests as possible before starting to process those results with some kind of list comparison/weeding routine. I say this because FMP is likely to be more efficient at executing the find requests than it will be at executing code that compares and massages lists.

In other words, rather than doing Finds and then implementing something additional to weed out unwanted values, make sure you first try to leverage Find requests that use FMP's "Omit Request" feature. This could potentially allow you to get the exact set of records that you need without ever having to do any additional processing to get at only the data you want to work with.

Earlier in this thread @bdbd posted a comprehensive strategy/recipe for this type of approach. You can see the Omit requests are leveraged in item #2 of that recipe.
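
As a very rough illustration of what such a scripted find might look like, using the placeholder layout and occurrence names from that recipe, with $listOfTvalues holding the cached result of the small inner query:

Go to Layout [ "L_1_1" ]
Enter Find Mode [ Pause: Off ]
# first request: records that have at least one related Table2 row
Set Field [ T_2::field_2 ; "*" ]
# one omit request per value returned by the inner query
Set Variable [ $i ; Value: 0 ]
Loop
    Exit Loop If [ Let ( $i = $i + 1 ; $i > ValueCount ( $listOfTvalues ) ) ]
    New Record/Request
    Omit Record
    Set Field [ T_1_1::field_1 ; "==" & GetValue ( $listOfTvalues ; $i ) ]
End Loop
Perform Find []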

One question:

I noticed that the original SQL query limited the number of rows returned. Is this because there is something like pagination involved where the user only sees N records of data at a time?

3 Likes

I was driving out of Redwood City, California when I saw a very nice looking sign lit up with the name MariaDB and an image/shape of an animal.

1 Like