Regular Expressions in Script Search

Are you an advanced developer using FileMaker? Then you probably know regular expressions already: they let you write search patterns to find interesting things in text. Let us use them for our regular

The old way to search was to enter multiple words. We split the search text at space characters and search for every word in a line. If all are found, the line turns yellow. That is what you know, and we expect that the majority of users type only one thing to find. But there is a small fraction of power users who ask for more.
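
As a rough illustration of that older behavior, here is a minimal JavaScript sketch. The function name lineMatchesAllWords is hypothetical, and the case-insensitive comparison is an assumption; the post does not say how the actual search handles case.

```javascript
// Hypothetical helper: true when every space-separated word of the query
// occurs somewhere in the script line.
function lineMatchesAllWords(line, query) {
  const words = query.split(" ").filter((word) => word.length > 0);
  const haystack = line.toLowerCase(); // assumption: case is ignored
  return words.every((word) => haystack.includes(word.toLowerCase()));
}

console.log(lineMatchesAllWords("Set Variable [ $abc ; Value: 1 ]", "Set $abc")); // true
console.log(lineMatchesAllWords("Set Field [ Table::Name ]", "Set $abc"));        // false
```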

For version 13.1 we use a regular expression for each of these words, so you can do OR searches or search with wildcards. But since you like to search for $ variables, we escape a $ or $$ at the beginning of the search word for you. Otherwise you may use backslashes to escape special characters like $ or brackets. If the text is

Let us show you a few examples:

$abc
If you search for plain text, it finds all lines containing this text. The leading $ is escaped for you.

Set $abc
Finds all lines that contain both Set and $abc anywhere, e.g. a Set Field step that uses the variable in its expression.

^Set
We use our first regular expression here. The caret character defines the beginning of a line, so this searches for all lines starting with Set, e.g. Set Variable or Set Field.

^Set $abc
This finds lines that start with Set and also contain $abc somewhere.

Feld|Field
Find either Feld or Field in a script line.

\].$
This finds all lines ending with a closing square bracket. When we query the text from FileMaker, the line gets an extra space at the end, so we put a dot in the expression to match that space and still find all lines ending in the square bracket.

$m(.*)x
This finds variables with a name starting with m and ending in x. Because .* matches any characters, the x may also come later in the same line.

\bError
Searches for Error at the start of a word. \b matches a word boundary.

Feld\[\d
Finds field access by an index that starts with a digit.

\bclaris:|\bfmp1?9?:
Finds fmp URLs with an optional version number, or URLs with the claris scheme. Either one must start at a word boundary.
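
The rules described above (split the search text at spaces, treat each word as a regular expression, escape a leading $ or $$) can be sketched in JavaScript as follows. This is only an illustration under assumptions: the names wordToRegExp and lineMatches are hypothetical, the "i" flag is a guess at the case handling, and the actual search may differ in its regex dialect.

```javascript
// Hypothetical helper: turn one search word into a RegExp.
function wordToRegExp(word) {
  // Escape one or two leading dollar signs so $abc and $$abc match literally.
  const pattern = word.replace(/^\${1,2}/, (m) => "\\$".repeat(m.length));
  return new RegExp(pattern, "i"); // assumption: case is ignored
}

// Hypothetical helper: a line is a hit when every word's pattern matches it.
function lineMatches(line, query) {
  return query
    .split(" ")
    .filter((word) => word.length > 0)
    .every((word) => wordToRegExp(word).test(line));
}

// Sample script lines, each with the extra trailing space mentioned above.
const lines = [
  "Set Variable [ $abc ; Value: 1 ] ",
  "Set Field [ Table::Name ; $abc ] ",
  "Exit Script ",
];

// Lines ending in a square bracket followed by the extra space.
const hits = lines.filter((line) => lineMatches(line, "\\].$"));
console.log(hits.length); // 2
```

An invalid pattern makes new RegExp throw a SyntaxError, so a real search field would want to catch that and fall back to a plain text comparison.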

Feel free to try this with the 13.1pr2 version and let us know if you have problems, suggestions or questions.

Any chance you know of a regular expression that would split text with the same logic as FileMaker's native wordCount() function?

I've come close but the final count is usually off by a few words. There are lots of special characters that only count under certain conditions, so lots of lookaheads/lookbehinds.

I was working on this about a year ago, and was looking for a single REGEX that would split a string into the same word count as FileMaker. I got pretty close with this expression:

\s|[!@#$%^&*()_+={}[\]|\\;"<>?~`]|((?<=[^a-z])['.]|['.](?=[^a-z])|(?<=[^0-9])[-:\/]|[-:\/](?=[^0-9]))

space: \s| OR

any character that's always a separator: [!@#$%^&*()_+={}[\]|\\;"<>?~`]| OR

period or single quote, unless alpha on both sides: ((?<=[^a-z])['.]|['.](?=[^a-z])| OR

hyphen, colon, forward slash, unless number on both sides: (?<=[^0-9])[-:\/]|[-:\/](?=[^0-9])) 

... based on the definition from here:
https://support.claris.com/s/article/Word-separators-in-FileMaker-Pro-1503692915258?language=en_US
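
For anyone who wants to experiment with the expression above, here is a minimal JavaScript sketch that splits on it and counts the non-empty pieces. The function name countWords is hypothetical. The pattern is used as posted, except that the last group is made non-capturing so that split returns only the words. No flags are set, so only lowercase letters count as alpha. The counts in the comments are what this sketch produces; they have not been checked against FileMaker.

```javascript
// The posted pattern: whitespace, always-separators, then the conditional
// separators (period/quote next to a non-letter, hyphen/colon/slash next to
// a non-digit). The last group is non-capturing here.
const SEPARATOR =
  /\s|[!@#$%^&*()_+={}[\]|\\;"<>?~`]|(?:(?<=[^a-z])['.]|['.](?=[^a-z])|(?<=[^0-9])[-:\/]|[-:\/](?=[^0-9]))/;

// Hypothetical helper: split on the separators and count non-empty tokens.
function countWords(text) {
  return text.split(SEPARATOR).filter((token) => token.length > 0).length;
}

console.log(countWords("don't stop")); // 2: the quote sits between letters
console.log(countWords("e.g. now"));   // 2: only the second period splits
console.log(countWords("10:30 a:b"));  // 3: the colon between digits stays
```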

It's not perfect, but it was close enough for my use case so I moved on. ChatGPT wasn't around at the time, but...

I just asked it to write a REGEX based on this set of rules, and it seems to require more than just a single regex split.

function splitTextByFileMakerProRules(text) {
  return text.split(/([\s!@#$%^&*()_+=\{\}\[\]\|\\:;"\'<>?\/*~0-9]+)/)
    .filter(word => word.length > 0)
    .map(word => {
      if (word.match(/^[\s!@#$%^&*()_+=\{\}\[\]\|\\:;"\'<>?\/*~0-9]+$/)) {
        return word.replace(/([\s!@#$%^&*()_+=\{\}\[\]\|\\:;"\'<>?\/*~0-9]+)/g, ' ');
      } else {
        return word;
      }
    });
}

I like this approach, and it's easy to understand the logic. You split by all the characters that are always word separators first, then throw out the extra white space, then map over and swap out the conditional characters with something you can easily separate by.
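
One detail to keep in mind when counting the pieces returned by a split like the one above: in JavaScript, String.prototype.split also returns the text matched by capturing groups, so the separators themselves end up in the array. A small sketch with a simplified whitespace pattern (not the full separator list) shows the difference:

```javascript
// With a capturing group, the matched separator is kept in the result.
const withCapture = "one two".split(/(\s+)/);

// Without the group, only the pieces between separators are returned.
const withoutCapture = "one two".split(/\s+/);

console.log(withCapture);    // [ 'one', ' ', 'two' ]
console.log(withoutCapture); // [ 'one', 'two' ]
```

So a word count based on the array length needs either a non-capturing pattern or a filter that drops the separator entries.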

But why not a single expression?? I asked ChatGPT again:

function splitTextByFileMakerProRules(text) {
  return text.split(/(?<!\w)([\s!@#$%^&*()_+=\{\}\[\]\|\\:;"\'<>?\/*~0-9]+(?!\w))/);
}

Hmm, so why not just give me a single regex in the first place? What was wrong with mine?

Your regular expression is close, but there are a couple of issues with it:

The first alternative \s matches spaces and other white-space characters, but it will also match the tab character and other control characters. You may need to modify this alternative to only match the space character.

The second alternative [!@#$%^&*()_+={}[\]|\\;"<>?~`] matches a set of characters that are always treated as word separators, but it does not include the hyphen, colon, or forward slash.

The third alternative ((?<=[^a-z])['.]|['.](?=[^a-z])|(?<=[^0-9])[-:\/]|[-:\/](?=[^0-9])) attempts to match the hyphen, colon, forward slash, single quote, and period based on the characters adjacent to them, but it has a couple of issues

Then I asked it for a single REGEX again and got a slightly different response. So I can't trust it.

Can anyone familiar with the FileMaker codebase confirm if this is more than just a REGEX split? Is there other logic involved?

I'd think the status for any output by ChatGPT is "unverified."

But regex is so complicated that there will be more than one way to achieve a result. Did any of the suggestions do what you wanted?

I think you need our RegEx.FindMatches function.
That would give you a list with all the results.

A few of them are close, but I haven't found anything that matches exactly. I was hoping someone might know whether it really is a single regex, or whether other logic is needed to replicate it.