Performance core principles: 9. Data imports

  1. DATA IMPORTS

How do you ensure that data imports run as fast as possible?

Importing large datasets can be glacially slow or lightning fast depending on how you do it, and most experienced developers keep their techniques to themselves, or mutter about them under their breath with other experts.

Using Isolated Table Occurrences
An offhand remark from Paul deHalle at a FileMaker kitchen event many years ago alerted me to the basic technique of importing, and indeed exporting, data from a layout that has no relationships, so that the context is an isolated Table Occurrence. This simple technique prevents FileMaker from re-evaluating the known universe for every single record as it is imported, which makes an enormous difference. It does, of course, require the developer to have ensured that all the data required for the preceding export actually resides in the table to be exported, and that none of the fields are unstored calculations referencing a field in another table occurrence. This is a good thing, as it encourages good practice.
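
A minimal sketch of that script pattern, assuming a spare table occurrence Contacts_ISO with no relationships attached and a blank layout of the same name based on it (all names are illustrative):

    # Import into a context that has no relationships to evaluate
    Set Variable [ $path ; Value: Get ( DocumentsPath ) & "contacts.tab" ]
    Go to Layout [ "Contacts_ISO" (Contacts_ISO) ]
    Import Records [ With dialog: Off ; "$path" ; Add ; UTF-8 ]
    Go to Layout [ original layout ]

The exact Import Records options will look a little different between FileMaker versions; the point is simply that the import runs while the isolated layout is the active context.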

Using tab-delimited or comma-delimited data files
An operation that may take minutes or longer using the fmp12 file format as the data source can be completed in seconds or less if the data is plain text.
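
As a sketch, an fmp12-to-fmp12 transfer can often be replaced by an export to a temporary text file followed by an import of that file (the path and options here are illustrative):

    # Source file: export the found set as tab-delimited text
    Set Variable [ $path ; Value: Get ( TemporaryPath ) & "transfer.tab" ]
    Export Records [ With dialog: Off ; "$path" ; Unicode (UTF-8) ]

    # Target file: import the plain text, ideally on an isolated layout
    Import Records [ With dialog: Off ; "$path" ; Add ; UTF-8 ]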

Using repeating fields
It appears that there may be a data import technique using repeating fields, but I am not familiar with it.

[Can anyone elucidate this?]

I do know that a narrow table with a single field set to have thousands of repeats remains very fast in the manner that I use it. When you need to add additional fields to a table, just increasing the number of repeats of your data field instead can work very well, provided you have a method of keeping track of which repeat is used for which data.

A repeating field is perhaps a good example of a field that should not be indexed. Indexing up to the 32,000 maximum repetitions of a single field would doubtless be heavy work. Using many repeats as places to store the values of CSS properties, for example, works very well because this is a form of data storage that never requires a Perform Find.
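
For illustration, assuming a single-record Prefs table (hypothetical) with an unindexed text field Data defined with several thousand repetitions, and a convention that repeat 12 holds a particular CSS colour value:

    # Write a value into a specific repeat
    Set Field [ Prefs::Data[12] ; "#3366CC" ]

    # Read it back wherever it is needed
    GetRepetition ( Prefs::Data ; 12 )

The values are always read directly by repeat number, so the field never needs an index and never needs a Perform Find.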

[Other techniques?]

Cheers, Nick


It is worth remembering that when setting up scripts for data import, if the data source has Encryption at Rest enabled (EAR, for GDPR compliance) and is a local (non-hosted) file, then every time the field mapping is referenced within a script step, first the EAR password and then the normal FileMaker account name and password have to be entered.

During data imports/transfers involving many tables, this imposes a significant productivity hit.

All of our hosted systems have EAR set, but we've not carried out any back-to-back performance comparisons, as we have no option to run without EAR if we are to remain GDPR compliant.


The technique simply gathers all the field data into a repeating field. Then, when importing, you only import that repeating field. The benefits are really only realized when transferring over the WAN, and they apply ONLY to the transfer of the data and the import itself.

Once you have imported the data into the repeating field on the other end, you then have to unpack it, and that process is slower than a normal import. So it really depends on what problem you need to solve. If an import takes 5 hours because it's moving data across the WAN, it may be worth it. If it's a ton of data, unpacking it on the target may take some time.
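
A rough sketch of the pack/unpack idea, assuming a spare repeating text field Packed added to the table being transferred, with one repeat per field (all names are illustrative):

    # Pack on the source side: one field per repeat
    Set Field [ Contacts::Packed[1] ; Contacts::NameFirst ]
    Set Field [ Contacts::Packed[2] ; Contacts::NameLast ]
    Set Field [ Contacts::Packed[3] ; Contacts::Email ]

    # Import only Contacts::Packed across the WAN, then unpack on the target,
    # typically in a loop over the imported found set
    Set Field [ Contacts::NameFirst ; GetRepetition ( Contacts::Packed ; 1 ) ]
    Set Field [ Contacts::NameLast ; GetRepetition ( Contacts::Packed ; 2 ) ]
    Set Field [ Contacts::Email ; GetRepetition ( Contacts::Packed ; 3 ) ]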
