Fuzzy-matching helpers:
- Measure distance between two strings.
- Compute phonetic string code.
- Transliterate a string.
Adapted from libstrcmp by Ross Bayer and spellfix.c by D. Richard Hipp.
If you want a ready-to-use mechanism to search a large vocabulary for close matches, see the spellfix extension.
Measure distance between two strings:
dlevenshtein(x, y)
- Damerau-Levenshtein distance,edit_distance(x, y)
- Spellcheck edit distance,hamming(x, y)
- Hamming distance,jaro_winkler(x, y)
- Jaro-Winkler distance,levenshtein(x, y)
- Levenshtein distance,osa_distance(x, y)
- Optimal String Alignment distance.
sqlite> select dlevenshtein('awesome', 'aewsme');
2
sqlite> select edit_distance('awesome', 'aewsme');
215
sqlite> select hamming('awesome', 'aewsome');
2
sqlite> select jaro_winkler('awesome', 'aewsme');
0.907
sqlite> select levenshtein('awesome', 'aewsme');
3
sqlite> select osa_distance('awesome', 'aewsme');
3
Only ASCII strings are supported.
Compute phonetic string code:
caverphone(x)
- Caverphone code,phonetic_hash(x)
- Spellcheck phonetic code,soundex(x)
- Soundex code,rsoundex(x)
- Refined Soundex code.
sqlite> select caverphone('awesome');
AWSM111111
sqlite> select phonetic_hash('awesome');
ABACAMA
sqlite> select soundex('awesome');
A250
sqlite> select rsoundex('awesome');
A03080
Only ASCII strings are supported.
Transliteration converts the input string from UTF-8 into pure ASCII by converting all non-ASCII characters to some combination of characters in the ASCII subset.
Distance and phonetics functions are ASCII-only, so to work with Unicode string one should transliterate it first.
sqlite> select translit('привет');
privet
SQLite command-line interface:
sqlite> .load ./fuzzy
sqlite> select soundex('hello');
See How to Install an Extension for usage with IDE, Python, etc.