UUID & indexing language

I've never bothered changing the indexing language of any field using a UUID to 'Unicode' instead of 'English'. Mostly because when fields are duplicated, that stuff sticks and I then risk having a plain field indexed as Unicode, and I know it will take me forever to figure out why I'm not getting what I expect out of a simple basic query.

That said, how much am I at risk of performing a find against a UUID and getting multiple records back because two or more UUIDs contain exactly the same letters and numbers, differing only in that one uses uppercase for a given letter and the other uses lowercase?

Statistics were not my favorite subject in school, and that was too long ago anyway for me to attempt solving a problem like this one. I hope the community will pardon my laziness...

Thanks to anyone who is brave enough to chime in with an answer.

Hello @Bobino

Are you using Get( UUID ) ?

I have never seen that function return a value containing lower-case letters. All the output I have seen has used upper-case alpha only (plus numbers and hyphens).

Such behavior could probably be confirmed with Claris (or anyone else in the know).

Would this not obviate the concern you posed? Am I missing some other aspect about this?

Kind regards,

-steve

You are so right! I'll add this to the things that are weird in the system I inherited.

I noticed it because when debugging, I wanted to find a record and simply entered a substring of the UUID, the first few characters. Manually typing on my keyboard (using lower case), it returned nothing and that's when I noticed the difference. I guess the original developer introduced it for no good reason. (It got me thinking there could be lower case characters in the UUID).
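To illustrate the symptom described above, here is a rough sketch in Python (not FileMaker code). It only mimics the idea that a Unicode-indexed field compares text case-sensitively while an English-indexed field ignores case; the stored value and the `find_prefix` helper are made up for the example and do not reflect how FileMaker implements its indexes.

```python
# Hypothetical stored key, upper case like the Get ( UUID ) output seen in this thread.
stored = ["E47E7AE0-5CF0-4F45-B3AD-C12B3E765CD5"]


def find_prefix(values, typed, case_sensitive):
    """Return the values that begin with the typed text."""
    if case_sensitive:
        # Stand-in for a Unicode index: "e" and "E" are different characters.
        return [v for v in values if v.startswith(typed)]
    # Stand-in for a language index such as English: case is ignored.
    return [v for v in values if v.casefold().startswith(typed.casefold())]


unicode_like = find_prefix(stored, "e47e", case_sensitive=True)
english_like = find_prefix(stored, "e47e", case_sensitive=False)

print(unicode_like)   # no match when typed in lower case
print(english_like)   # the stored key is found
```

Typing the first characters in upper case would match in both cases, which fits what I saw when searching manually.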

The help makes no mention of using only upper case characters, but I trust only upper case letters are used.

I'm simply happy I don't have to use Unicode for the indexing language. :face_palm:

I'm trying to imagine what their reasoning might be.

Perhaps the Unicode index performs faster or takes up less space because it does not deal with a word index as a text field indexed by language might....

[EDIT: I had provided an example (now removed) to help illustrate, but as @Bobino helped me realize (see post below), I wasn't thinking it through correctly. :crazy_face: ]

If index is set to minimal, my understanding is that only the value index will be created. Is that correct or not?

Ah! Good point! I had thoughtlessly set up my English language test field with full indexes. Creating a new field with minimal English language index indeed shows the same behavior as the Unicode index.

Apologies for the bum steer!

-steve

I'm just puzzled as to why someone would change the indexing language to Unicode for a field that contains UUIDs.

Sometimes it is best not to think too hard about what others were thinking; we may end up concluding they did not put much thought into it :smile:

Letters in a UUID are hex digits and, according to RFC 4122, shall be lower case.
I am using UUIDNumber for internal key fields in order to avoid any issues related to FM’s implementation of UUID. Documentation suggests the use of UUIDNumber over UUID.

Lower ( UUID ) makes the UUID string literal compliant with RFC 4122.
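As a quick cross-check outside FileMaker, here is a small Python sketch using the standard `uuid` module. The upper-case value is a made-up example, not real Get ( UUID ) output. It shows that the upper-case and lower-case spellings parse to the same UUID and that the lower-case form is what the module prints as the canonical string.

```python
import uuid

# Hypothetical upper-case key in the style discussed in this thread.
fm_style = "E47E7AE0-5CF0-4F45-B3AD-C12B3E765CD5"

parsed = uuid.UUID(fm_style)      # parsing accepts upper-case hex digits
canonical = str(parsed)           # str() emits lower-case hex digits

# Both spellings denote the same 128-bit value.
same_value = uuid.UUID(fm_style) == uuid.UUID(fm_style.lower())

print(canonical)
print(same_value)
```

So lower-casing changes only the spelling, never the identity of the key.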

In terms of 'uniqueness', UUIDNumber should do better than UUID, with 192 bits compared to 128 bits.
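For the original question about risk, here is a back-of-the-envelope sketch in Python using the usual birthday approximation. Two keys that differ only in letter case have the same hex value, so a case-insensitive match between different records is just an ordinary collision. The big assumption is that every bit is uniformly random, which nothing in this thread confirms for either function, so treat the numbers as a best case. The one billion record count is an arbitrary example.

```python
import math


def collision_probability(n_keys: int, random_bits: int) -> float:
    """Birthday approximation: chance that at least two of n_keys
    uniformly random values of random_bits bits are identical."""
    space = 2 ** random_bits
    exponent = n_keys * (n_keys - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)


one_billion = 10 ** 9  # arbitrary example record count

p_uuid = collision_probability(one_billion, 128)         # assumed fully random
p_uuid_number = collision_probability(one_billion, 192)  # assumed fully random

print(f"128-bit keys: {p_uuid:.3e}")
print(f"192-bit keys: {p_uuid_number:.3e}")
```

Even with a billion records the estimate for 128 bits stays far below anything worth worrying about in practice, and the 192-bit figure is smaller still.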

In German, the hyphen in the UUID is treated as a word separator, which can lead to broken behaviour when working with the index. So on German systems with German indexes I set UUID fields to Unicode. Others say it's safe to just set the index to minimal to avoid any complications. I like to wear belt and braces.

Holger
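A rough way to picture the hyphen problem, sketched in Python rather than FileMaker: if the index treats the hyphen as a word separator, each UUID is stored as five short fragments, and two unrelated keys can share a fragment. This is only an analogy, not FileMaker's actual indexing algorithm, and both keys and both helper functions are invented for the example.

```python
# Hypothetical keys that happen to share the second group "5CF0".
keys = [
    "E47E7AE0-5CF0-4F45-B3AD-C12B3E765CD5",
    "0B1D9C22-5CF0-41AA-9E10-7A3355F0D2C4",
]


def word_match(values, term):
    """Word-style lookup: match if any hyphen-separated fragment equals the term."""
    return [v for v in values if term in v.split("-")]


def value_match(values, term):
    """Value-style lookup: match only when the whole key equals the term."""
    return [v for v in values if v == term]


word_hits = word_match(keys, "5CF0")       # both keys contain this fragment
value_hits = value_match(keys, keys[0])    # only the exact key

print(len(word_hits))
print(value_hits)
```

Treating the whole key as one value, whether through a Unicode index or a minimal index, avoids the fragment matches.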

Maybe the system was converted from something else and the Unicode setting was there to deal with some legacy stuff.