Importing Text with HTML Formatting

One of my vendors provides a monthly product text file (UTF-8) that contains HTML formatting strings. When I import the text, FileMaker does not convert these strings to standard ASCII text. I looked online, and the best option may be to use an online converter to modify the data before importing. Are there any better options?

Can you read the file and separate the HTML parts?

Then maybe use our Text.HTMLtoStyledText function.

There are three custom functions on Brian Dunning's site that you may be able to incorporate, depending on exactly what your source data look like. Search there for "HtmlToText" (as one word) to find them.

FileMaker handles UTF-8 text encoding natively, but does not strip HTML tags and such. If you share a small snippet example of your text strings we can probably help guide you more precisely.

When I do imports that need adjusting or validation of content, I import into an intermediary table and do the 'massaging' there, then complete the import to the final destination. (On simple stuff I'll likely just use an auto-enter calculation on the destination field(s).)

If you're on macOS you can consider using AppleScript to strip the HTML stuff (textutil does a decent job of removing tags and such). Having FileMaker call a Python script (using beautifulsoup4) is another option providing more control.

You mention that this occurs ~once per month. It would be pretty easy to place the file into a "working" folder and run a script (e.g. Python) from the shell to output plain text to import into FileMaker. That's likely the approach I'd take. If you'd like input on this approach, please share a tiny example of your source HTML data and let us know what OS, etc. you're working with.
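To give a rough idea of what such a script might look like, here is a minimal sketch. It uses Python's standard-library html.parser rather than beautifulsoup4, just to avoid an extra install. The function names (html_to_text, convert_file) and the file paths are placeholders of mine, and since no sample data has been posted, it assumes the tags can simply be dropped and that each line of the vendor file is one record.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Keeps text nodes and discards every tag."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def html_to_text(fragment: str) -> str:
    """Return the fragment with tags removed and entities decoded."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flushes any text still buffered after the last tag
    return parser.text()


def convert_file(src_path: str, dst_path: str) -> None:
    """Clean a UTF-8 file line by line so the record layout is preserved."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(html_to_text(line))


if __name__ == "__main__":
    # Hypothetical file names; adjust to your working folder.
    # convert_file("working/vendor.txt", "working/vendor_clean.txt")
    print(html_to_text("<p>Bold <b>new</b> &amp; improved</p>"))
```

Note that decoded entities come out as Unicode characters (a non-breaking space stays a non-breaking space), so a further pass would be needed if strictly ASCII output is required. A literal "<" in the product text could also be misread as the start of a tag, so it is worth spot-checking the output against a real sample.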


I discovered there are only six (or so) formatting issues: mostly quotes, non-breaking spaces, and em dashes. The easiest solution will be to loop through the records and use the Substitute function to replace these with ASCII text. Thanks to everyone for their suggestions.
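For anyone who would rather do the same clean-up before the import, the substitution idea can be sketched in a few lines of Python. The exact strings in the vendor file were not posted, so the entries in the table below are guesses at typical HTML entities for quotes, non-breaking spaces, and em dashes. REPLACEMENTS and to_ascii are made-up names for illustration.

```python
# Assumed search/replace pairs; swap in the strings actually found in the file.
REPLACEMENTS = {
    "&quot;": '"',
    "&ldquo;": '"',
    "&rdquo;": '"',
    "&rsquo;": "'",
    "&nbsp;": " ",
    "&mdash;": "--",
}


def to_ascii(text: str) -> str:
    """Apply each search/replace pair in turn, like a chain of substitutions."""
    for needle, plain in REPLACEMENTS.items():
        text = text.replace(needle, plain)
    return text


if __name__ == "__main__":
    print(to_ascii("5&quot;&nbsp;pipe &mdash; steel"))
```

Inside FileMaker the equivalent is a single Substitute call with several [search; replace] pairs, which avoids nesting one call per string.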
