Convert Windows 1252 -> UTF-8

Is there a way in FileMaker to convert existing Windows 1252-formatted text to UTF-8?

Thanks in advance.

You mean reading a text file in that encoding?

I mean converting Windows 1252 (in this case) -> UTF-8 to eliminate problem characters like smart quotes and the like.

I need to export this file for other programs to use.

Thx

The help pages for the TextEncode and TextDecode functions indicate that both Windows 1252 and UTF-8 are supported. That makes prospects seem promising (though I must admit that I am not certain that I accurately understand the task as described above).
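For anyone doing the conversion outside FileMaker, the decode-then-encode idea looks roughly like this. This is a minimal Python sketch, not FileMaker syntax; the function name and the sample bytes are made up for illustration.

```python
def cp1252_bytes_to_utf8(raw: bytes) -> bytes:
    """Reinterpret Windows-1252 bytes as text, then serialize that text as UTF-8."""
    text = raw.decode("cp1252")   # bytes -> Unicode text
    return text.encode("utf-8")   # Unicode text -> UTF-8 bytes


# Sample Windows-1252 bytes: 0x93/0x94 are curly double quotes,
# 0x96 is an en dash, 0x92 is a curly apostrophe.
sample = b"\x93Hello\x94 \x96 it\x92s"
converted = cp1252_bytes_to_utf8(sample)
print(converted.decode("utf-8"))
```

Note that the smart quotes survive this conversion: they are simply stored as multi-byte UTF-8 sequences instead of single Windows-1252 bytes.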

Hope this helps!

Just want to remove all the Unicode characters from some text and end up with "regular" UTF-8.

Thanks

Hey @OliverBarrett,

UTF-8 stands for "Unicode Transformation Format – 8-bit". It is an encoding of Unicode, so regular UTF-8 can encode the entire Unicode character set (144 697 characters).

Do you mean limiting the result to code points that UTF-8 encodes in a single byte? If so, you are really trying to convert Windows-1252 to ASCII.
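To make the distinction concrete, here is a small Python sketch (the helper name is hypothetical) showing that a smart quote is a perfectly valid UTF-8 character, just not a one-byte one, and that the one-byte range of UTF-8 is exactly ASCII.

```python
left_quote = "\u201c"  # left double quotation mark (a "smart quote")

# The same character takes one byte in Windows-1252 but three in UTF-8.
cp1252_len = len(left_quote.encode("cp1252"))
utf8_len = len(left_quote.encode("utf-8"))
print(cp1252_len, utf8_len)

# For ASCII characters, UTF-8 and ASCII produce identical bytes.
same_bytes = "A".encode("utf-8") == "A".encode("ascii")
print(same_bytes)


def is_ascii_only(text: str) -> bool:
    """True if every character would be a single byte in UTF-8."""
    return all(ord(ch) < 128 for ch in text)


print(is_ascii_only("plain text"), is_ascii_only("\u201cquoted\u201d"))
```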

Just trying to strip out the smart quotes and other (possible) 1252 UTF stuff in Windows 1252.

Thanks

I get it.

I suggest you look at using the Substitute function for the select characters you want translated.
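The same substitution approach, sketched in Python rather than as a FileMaker calculation. The replacement table is only an assumption about which characters matter; extend it to whatever shows up in your data.

```python
# Hypothetical table of "problem" punctuation and its plain ASCII stand-in.
REPLACEMENTS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
}


def to_plain_punctuation(text: str) -> str:
    """Replace each listed character with its ASCII stand-in."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    return text


print(to_plain_punctuation("\u201cHi\u201d \u2013 it\u2019s done\u2026"))
```

Characters that are not in the table pass through unchanged, so accented letters and the like are left alone.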

BTW… Keep in mind that text is stored as Unicode encoded characters in FileMaker… except in container fields or calculation fields returning container results. Whether the encoding uses UTF-8, UTF-16 or UTF-32, I don't know.

I'm actually already doing the equivalent technique in another language where the data are coming from FileMaker. I was just wondering if I had missed an easy step where this parsing wasn't needed.

Appreciate your replies!

See also the Text.ConvertToTextEncoding and Text.ConvertFromTextEncoding functions in the MBS FileMaker Plugin, which convert between different text encodings, even very exotic ones.

Interesting. How does your code work? Do you parse the text and remove things like smart quotes and such, or are you using a library internally?
Really wondering!
Thanks Christian.

It uses the iconv library.
Some characters are substituted, e.g. – (wide dash) becomes -, if the target encoding has no – character.
Other characters may simply be changed to ? in the conversion.
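A rough Python imitation of that behavior, for readers who want to see the effect. This is not the plugin's code and does not call iconv; the transliteration table and function name are assumptions made for illustration.

```python
# Hypothetical transliteration table, used only when the target
# encoding cannot represent the character directly.
TRANSLIT = {"\u2013": "-", "\u2014": "-"}


def lossy_encode(text: str, encoding: str) -> bytes:
    """Encode text, substituting look-alikes first and '?' as a last resort."""
    out = []
    for ch in text:
        try:
            ch.encode(encoding)               # representable as-is?
            out.append(ch)
        except UnicodeEncodeError:
            out.append(TRANSLIT.get(ch, ch))  # fall back to a look-alike
    # Anything still unrepresentable is replaced with '?'.
    return "".join(out).encode(encoding, errors="replace")


print(lossy_encode("a \u2013 b \u20ac", "ascii"))   # dash substituted, euro sign lost
print(lossy_encode("caf\u00e9 \u2013", "cp437"))    # old DOS code page
print(lossy_encode("\u2013", "cp1252"))             # dash exists here, so it is kept
```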

We built it to convert text to some exotic old DOS encodings for sending to a device.

Thanks