The Levenshtein distance between two strings is the minimum number of insertion, deletion, and substitution operations required to transform the first string into the second. The distance is named after the Russian scientist Vladimir Levenshtein, who introduced it in 1965.
The Levenshtein algorithm is used in Google search and, for example, in Microsoft Word for spelling correction.
I have created an Add-On that lets you use this algorithm in any FileMaker database. To make the results available for any table, a parameter table is placed upstream, in which the necessary information can be entered.
There is an extension of Levenshtein’s algorithm that also recognizes transposed letters. A transposition is counted as a single operation, while the original algorithm requires two (for example, "abc" vs. "acb" costs two substitutions under plain Levenshtein, but only one transposition here). This extension is called Damerau-Levenshtein. It is quite helpful in practice, as transposed letters are a common typing error and hard to spot.
For detailed information and a free download of this Add-On, please visit my website: https://www.woisoft.de
Interesting.
The Levenshtein distance is so simple to implement using a little REST service, FileMaker, and just a wee bit of code. We’ve been using this for many years.
In Python, for example, a Damerau-Levenshtein implementation might look like this:
def damerau_levenshtein(str1, str2):
    # Initialize the matrix that stores distances between prefixes
    len_str1, len_str2 = len(str1), len(str2)
    matrix = [[0] * (len_str2 + 1) for _ in range(len_str1 + 1)]

    # Base cases: distances to and from the empty string
    for i in range(len_str1 + 1):
        matrix[i][0] = i
    for j in range(len_str2 + 1):
        matrix[0][j] = j

    # Fill in the matrix using dynamic programming
    for i in range(1, len_str1 + 1):
        for j in range(1, len_str2 + 1):
            cost = 0 if str1[i - 1] == str2[j - 1] else 1

            # Standard edit operations (insertion, deletion, substitution)
            matrix[i][j] = min(
                matrix[i - 1][j] + 1,         # Deletion
                matrix[i][j - 1] + 1,         # Insertion
                matrix[i - 1][j - 1] + cost,  # Substitution
            )

            # Damerau transposition (adjacent character swap)
            if i > 1 and j > 1 and str1[i - 1] == str2[j - 2] and str1[i - 2] == str2[j - 1]:
                matrix[i][j] = min(matrix[i][j], matrix[i - 2][j - 2] + cost)

    # The final distance is in the bottom-right cell of the matrix
    return matrix[len_str1][len_str2]
# Example usage
str1 = "kitten"
str2 = "sitting"
distance = damerau_levenshtein(str1, str2)
print(f"The Damerau-Levenshtein distance between '{str1}' and '{str2}' is {distance}.")
This problem benefits from dynamic programming: filling the matrix takes O(n·m) time for strings of lengths n and m, whereas the naive recursive solution is exponential.
Dynamic programming turns exponential‑time recursive solutions into polynomial‑time algorithms by systematically reusing work. It’s a cornerstone technique for algorithmic challenges ranging from classic textbook problems to real‑world optimization tasks.
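As a small illustration of that reuse of work, here is a minimal sketch of the same Levenshtein recurrence written top-down and memoized with functools.lru_cache (the function name is my own, just for illustration). Deleting the lru_cache line gives the same answers but exponential running time, because the same subproblems recur over and over:

from functools import lru_cache

def levenshtein_memoized(a, b):
    # dist(i, j) = distance between the first i chars of a and the first j chars of b.
    # The cache makes each (i, j) subproblem cost O(1) after its first evaluation,
    # so the whole call runs in O(len(a) * len(b)) time.
    @lru_cache(maxsize=None)
    def dist(i, j):
        if i == 0:
            return j  # insert the remaining j characters
        if j == 0:
            return i  # delete the remaining i characters
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(
            dist(i - 1, j) + 1,         # deletion
            dist(i, j - 1) + 1,         # insertion
            dist(i - 1, j - 1) + cost,  # substitution
        )

    return dist(len(a), len(b))

print(levenshtein_memoized("kitten", "sitting"))  # prints 3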
I’m no expert in this area, but I spent a decent amount of time learning about it for an app I built that scores student writing samples for a university research team. There are a bunch of other similar measurements that can be implemented in FileMaker with a webviewer and a little JS.
After I finished building the app, I wrote about the different measurements I had learned about, like Lexical Density and Flesch-Kincaid Readability score. Really interesting stuff, and fun to replicate with code.
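For instance, the Flesch reading-ease score boils down to a few lines of Python. The syllable counter below is a crude vowel-group heuristic I'm using just for illustration; real implementations rely on pronunciation dictionaries or language-specific rules:

import re

def count_syllables(word):
    # Crude heuristic for English: count runs of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    # Flesch Reading Ease: higher scores mean easier text.
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (n_syllables / n_words)

print(round(flesch_reading_ease("The cat sat on the mat. It was happy."), 1))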
This is all quite new to me. I did some reading and found the corresponding formulas and algorithms for counting syllables in both English and German text, so an implementation in FileMaker would be relatively easy. But the question for me is: what do I do with the result in a database solution? Where can I use it? Where are texts analyzed for readability other than at universities or schools?
Readability and text cohesion metrics are useful for analysing technical documentation for software. Most of the use cases are around writing analysis of some kind, so maybe not much use in FileMaker unless you’re building for a university study.
What’s the practical application in FM that is performant? In the past I have tried to use this primarily as a search tool, to find matching records with a fuzzy search. The issue I have is that, for any given search, the distance has to be calculated for each individual record. You can’t “store” a distance on a record, because it depends entirely on what you are searching for in real time.
In short, it makes any search basically akin to searching on an unstored calculation in terms of performance. Perhaps “fast” searching is not a great use case?
I implemented the fuzzy search using the same technique as a virtual list: an SQL query writes the two fields relevant to the search from all records into a string, and the Levenshtein distance is then computed against each line of that string. This is quite efficient. For practical use in FM projects, I would say fuzzy search is most useful where manual entries are made, such as in an address table; the number of records in such a table should then remain manageable.
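Outside FileMaker, the idea can be sketched in a few lines of Python, assuming the damerau_levenshtein function from earlier in the thread and a list of "lastname firstname" strings such as an SQL query might produce (the names and threshold here are made up for illustration):

def fuzzy_find(query, rows, max_distance=2):
    # rows: one "lastname firstname" string per record, e.g. built by an SQL query.
    # Returns (distance, row) pairs within the threshold, best matches first.
    hits = []
    for row in rows:
        d = damerau_levenshtein(query.lower(), row.lower())
        if d <= max_distance:
            hits.append((d, row))
    return sorted(hits)

rows = ["Maier Hans", "Meyer Hanna", "Mayer Hans", "Schmidt Anna"]
print(fuzzy_find("Mayr Hans", rows))  # [(1, 'Mayer Hans'), (2, 'Maier Hans')]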