Found set being in a sorted state causes performance degradation in Loop sequences of scripts

Torsten · January 12, 2020, 1:45am

When looping through a set of records, FMS takes more and more time to load the next record 'Go To Record (Next)'. From an initial 1-2 ms it raises to 125 ms on FMS. This slows down the routine considerably.

A demo is added. If you have a server available, test on server.
User: Admin, PW: Admin

Populate the source table, launch script 'Generate Source Records'
Test, launch routine 'Launch_Create_TargetRecords'
Open layouts 'Target' and 'TimeLog' and observe what's happening

I checked the top call stats log which shows 0 wating and i/o time almost all the time. Every 50th record a 15 ms waiting time occurs.

When working on larger data sets this slowing down becomes a real hassle. Any ideas how speed could be kept up while looping through a large set of records?
LoopPerformance.fmp12 (216 KB)

Malcolm · January 12, 2020, 2:21am

Update your scripts. Include a go to first record after the sort in cerate_targetRecords.

After populating the source table, we are left on the last record. The Create_targetRecords script jumps into a loop that uses "Exit after Last". When run, the script runs through the loop once and exits

Torsten · January 12, 2020, 2:28am

Done. Thanks Malcolm!

Malcolm · January 12, 2020, 2:36am

I stopped the process at 7690 records. The first two records had read/write times that went off the chart, so I set them to zero. The slow down is progressive.

Torsten · January 12, 2020, 2:38am

That is what I observed, except for the off-chart values. But why is it slowing down? I have no clue.

Malcolm · January 12, 2020, 3:06am

I wondered if it was anything to do with layout objects and data so I created blank layouts for the scripts. That didn't make any difference. Here's the screen shot. Again, I had to remove significant outliers in the data to obtain a sensible chart. I'll post the chart with the sanitised data first. Then the chart drawn from the outliers. The jump to around the 70 occurs at approx 6100 records.

anon45965781 · January 12, 2020, 11:15am

I never use FMP when I have to loop - if I can avoid it. FMP is way too slow.

I offload that looping to JDBC called from FMP. Looping though a JDBC ResultSet (similar to a FoundSet) is extremely fast.

With JDBC itself I get about 3X better performance than with FMP.

When looping is involved, JDBC performance goes WAY up beyond 3X.

Torsten · January 12, 2020, 2:28pm

No doubt that FMS is not the fastest horse on the track. Its force lies in versatility. However, a degradation of performance as observed in loops should not happen. Performance would be acceptable if FM could keep the initial speed.

What I need to find out is what we have at hand here:

A - my script is misconfigured
B - it is a bug
C - it is a feature and intended behaviour

I have spend hours on the issue without result. Hopefully someone in the community can see through it.
@Malcolm already added evidence to the investigation.

Cecile · January 12, 2020, 7:06pm

Is your script for reading or for writing?

Torsten · January 12, 2020, 7:28pm

It reads from a source table and writes to a target table.

Cecile · January 12, 2020, 8:13pm

The reading isn’t an issue however for the writing fm must update the index at each pass.
David just gave me a crash course in how things get written and although each system handles this differently, and he doesn’t know fm specific approach to writings (what gets put in cache when/how often series of blocks get written) but he said that it is normal that looping becomes less and less efficient. The way I understood the explanation is that to insert a block somewhere first the tail block must be sent farther away the info written then the tail block rewritten right after. When looping, the system might hold several pieces before writing everything. I am sure if he reads what I just wrote he will kill me but I am assuming that you know more and can sort out these bits i said to get some usable info.

He also mentioned that systems akin to SQL do seem faster because all is done in live memory, which comes with the risk to loosing data when the computing crashes

Torsten · January 12, 2020, 8:57pm

Indexing seems to have an impact. Indexing in the target file is set to automatic index management.

There is no programmatic control over 'update index' and turning off the indexing permanently is no option.

Working with large data sets > 200'000 records gets brutally slow in loops. A task that would take 2-3 hours at the initial speed actually takes 12-13 hours in my production system. Such large data sets (i. e. xml) are common (example here).

If updating indexes slows down FMS in loops to the extend we observe, FM lacks the means to control indexing programmatically in order to cope with situations when constant updating makes no sense.

This adds to @anon45965781's hopes for FM v19. I'm sure that programmatical index control has been posted to FMI as an ' idea' a while ago but can't find it anymore over there.

The attached graph shows that there is a pattern in the slowdown.

Loop over 6247 records.

Cecile · January 12, 2020, 9:03pm

It gets faster to export and import the data if i understood correctly although large imports also slow down past the hundreds of thousands.

Torsten · January 12, 2020, 9:05pm

Please elaborate.

Cecile · January 12, 2020, 9:10pm

You fetch the data, so read only, output it in a file then import it. I suppose you could output in a JSON structure to facilitate import?

steve_ssh · January 12, 2020, 9:17pm

@Torsten @Cecile

To whatever extent you feel comfortable and able:

I am wondering if you could elaborate a bit more on how you are seeing the task of indexing specifically affecting the Go To Record step ?

The sample file that @Torsten posted appears to take careful measures to isolate the metric for Go To Record from the metrics for any data-writing activity in the script.

Thus, to look at a dramatically escalating graph of times for Go To Record, and then assert that this is due to io write times and indexes feels like it needs some further explanation to connect the dots. I not saying the assertion is incorrect, but rather, without describing how indexing affects Go To Record, it does not follow the sort of logical trajectory that I expect to see.

@Torsten

Have you tried getting metrics from a script that does no data writes (other than the logging) ?
And, if not, would you consider doing so?
And, would it not make sense to also turn off indexing for any written fields in such a test?

The above might provide some nicer isolation to be able to make more solid/convincing assertions about what overhead can be attributed to indexing and io, and what overhead is mysteriously spiking with Go To Record.

Additional note:

This reminds me of an overhead issue that existed in v14 on MacOS, where the New Window script step would take longer and longer each time it was invoked in a client session. In a solution that make abundant use of this step, clients would see dramatic slowdowns.

Lastly:

I offer my apologies for making a bunch of suggestions about how to tighten up the testing without pitching in to do some of the work myself. I'm keeping an eye on this thread because it is of interest to me, but (at least today) I don't have the time to rework and run the additional tests I mentioned.

Kind regards to all and thanks to all who are discussing this good thread.

-steve

steve_ssh · January 12, 2020, 9:22pm

Hi @Torsten

Are you able to share any of the details about what the script is doing with the XML?

I ask only to see if, collectively, we might be able to identify more efficient ways to do the same.

This is not to say that you have not raised a valid concern about overhead. It certainly looks to be the case that you have.

It is more targeted at looking for ways to optimize, in case it could help. Most scripts that I have written to do large batch processes have needed some extra care and strategy to squeeze the best performance out of them.

Kind regards,

-steve

Cecile · January 12, 2020, 9:26pm

Excellent observations steve. I am following the thread on my cell while installing HVAC system so far from computer atm. I often ask David about stuff discussed here as I learn from it and since he knows almost everything (an awful lot anyway) i try to tap in his brain to bring back info. However I am not the best parrot and some advanced stuff i might not grasp properly.

jormond · January 12, 2020, 9:35pm

The current script has more to it than just the looping. If you hold the timing of the script ( the measurement parts ) in a variable, using 'Insert Calculated Result' and don't log them until the end, there isn't much variance. So it appears the delay is happening because of the record creation, instead of the 'Go To Record'.

Torsten · January 12, 2020, 9:36pm

Sure, please find the script as text file here. Field are auto-indexed with the exception of 'ID' which is fully indexed.CreateHeading.txt (5.9 KB)

Addendum:
The xml is imported into FM and lines are stored in the XML_Line table. From here, other data sets are produced. The script we are looking at creates a data set with all headings and additional information like where to find a section in a html file (bookmark).

Topic		Replies	Views
R.I.P - GTRR Heads-Up! scripting , slowing	95	4254	March 7, 2021
FMPA memory uptake and file size ballooning due to missing explicite type conversion in script Heads-Up! scripting , type-conversion	69	1553	October 31, 2019
SORT ISSUE Questions scripting	4	292	March 25, 2021
JSON too slow in loop (or loops too slow in general) community	62	1918	June 19, 2020
Record Creation Time for PO_LINES Table Questions	6	332	February 7, 2021

Found set being in a sorted state causes performance degradation in Loop sequences of scripts

Related topics