Loupe can be fine-tuned to match your requirements. As with any search engine, Loupe optimizes the documents you index for efficient retrieval later on. This means that indexing takes considerably longer than searching. Moreover, Loupe is typo tolerant, which is achieved using a State Set Index implementation. Loupe ships with sane defaults, but you may want to tweak the parameters for your use case.
But let's start with the basic configuration.
Every document has to have an identifier in Loupe. By default, Loupe expects every document you index to have an id key. But you can adjust that to your needs:
$configuration = \Loupe\Loupe\Configuration::create()
    ->withPrimaryKey('uuid')
;
By default, Loupe indexes all the attributes of your documents. This makes the search index considerably bigger. So be sure to configure which attributes you want to search through later on:
$configuration = \Loupe\Loupe\Configuration::create()
    ->withSearchableAttributes(['firstname', 'lastname'])
;
By default, no attribute can be filtered on in Loupe. Any attribute you want to filter on needs to be defined as such before you start indexing. Notice that the attributes can be single values (scalar) but also arrays; Loupe handles both for you:
$configuration = \Loupe\Loupe\Configuration::create()
    ->withFilterableAttributes(['departments', 'age'])
;
Loupe can order your results by any scalar attribute of your document:
$configuration = \Loupe\Loupe\Configuration::create()
    ->withSortableAttributes(['age', 'lastname'])
;
In order to optimize tokenization for your use case, read the "Tokenizer" section of the docs. These are the options:
$configuration = \Loupe\Loupe\Configuration::create()
    ->withMaxQueryTokens(12)
    ->withLanguages(['en', 'fr', 'de'])
;
In Loupe, as in MeiliSearch, we follow the philosophy of prefix search.
Prefix search means that it's not necessary to type a word in its entirety to find documents containing that word; you can just type the first few letters. So "huck" would also find "huckleberry".
Prefix search is only performed on the last word in a search query. Prior words must be typed out fully to get accurate results. E.g. "my friend huck" would find documents containing "huckleberry"; "huck is my friend", however, would not.
Searching by prefix (rather than using complete words) has a significant impact on search time. The shorter the query term, the more possible matches in the dataset.
That's why you can also configure the minimum number of characters that a term must contain before the prefix search kicks in. By default, this is configured to 3. So searching for "h" would not find "huckleberry" while "huc" would.
You can configure this behavior:
$configuration = \Loupe\Loupe\Configuration::create()
    ->withMinTokenLengthForPrefixSearch(1)
;
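The prefix rules described above can be pictured with a small, self-contained sketch. It does not use Loupe and is not how Loupe is implemented internally; the function name sketchMatches() and its $minPrefixLength parameter are hypothetical, and the sketch assumes (for simplicity) that every query word must occur in the document and that the mbstring extension is available.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration of prefix search semantics (not Loupe's internals):
 * every word except the last must match a document word exactly, the last word
 * may match as a prefix once it is at least $minPrefixLength characters long.
 */
function sketchMatches(string $query, string $document, int $minPrefixLength = 3): bool
{
    // Lowercase and split on non-word characters.
    $split = static fn (string $text): array =>
        preg_split('/\W+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    $queryTokens = $split($query);
    $documentTokens = $split($document);

    if ($queryTokens === []) {
        return false;
    }

    // Only the last query word is eligible for prefix matching.
    $lastToken = array_pop($queryTokens);

    foreach ($queryTokens as $token) {
        if (!in_array($token, $documentTokens, true)) {
            return false; // prior words must be typed out fully
        }
    }

    foreach ($documentTokens as $candidate) {
        if ($candidate === $lastToken) {
            return true; // exact match always counts
        }
        if (mb_strlen($lastToken) >= $minPrefixLength && str_starts_with($candidate, $lastToken)) {
            return true; // prefix match, only if the term is long enough
        }
    }

    return false;
}

var_dump(sketchMatches('my friend huck', 'my friend huckleberry'));       // bool(true)
var_dump(sketchMatches('huck is my friend', 'huckleberry is my friend')); // bool(false)
var_dump(sketchMatches('h', 'huckleberry'));                              // bool(false)
var_dump(sketchMatches('h', 'huckleberry', 1));                           // bool(true)
```

Lowering the minimum length to 1, as in the configuration above, makes even a single letter match by prefix, which is why short terms produce many more candidates.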
Loupe is typo tolerant! This is achieved by implementing the algorithm presented in the 2012 research paper "Efficient Similarity Search in Very Large String Sets" by Dandy Fenz, Dustin Lange, Astrid Rheinländer, Felix Naumann, and Ulf Leser from the Hasso Plattner Institute, Potsdam, Germany and Humboldt-Universität zu Berlin, Department of Computer Science, Berlin, Germany.
The algorithm makes it possible to search huge datasets efficiently while tolerating typos (Levenshtein distance) and keeping the index size small. Refer to the paper for all the details.
Typo tolerance is configured as a sub-object of the Configuration class:
$typoTolerance = \Loupe\Loupe\Config\TypoTolerance::create();
$configuration = \Loupe\Loupe\Configuration::create()
    ->withTypoTolerance($typoTolerance)
;
In the following examples, we're thus only going to look at the TypoTolerance method calls.
By default, typo tolerance is enabled, but you can disable typo tolerance entirely. It's as easy as this:
$typoTolerance = \Loupe\Loupe\Config\TypoTolerance::disabled();
The alphabet size and the index length are the two major configuration values that affect basically everything in Loupe:
- The index size
- The indexing performance
- The search performance
It's pretty hard to explain the State Set Index algorithm in a few short words, but I tried my very best to explain some of it in the Performance section. It's best to read the academic paper referenced above. However, one thing to note: You cannot get wrong search results no matter what values you configure. Those values are basically about the number of potential false positives that then have to be filtered out by running the Levenshtein algorithm on all results. The higher the values, the fewer false positives, but also the more space required for the index.
The alphabet size is configured to 4 by default, the index length to 14.
$typoTolerance = \Loupe\Loupe\Config\TypoTolerance::create()
    ->withAlphabetSize(5)
    ->withIndexLength(18)
;
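To picture why these values never change the final results, here is a minimal sketch of the filtering step described above. It is an assumption-laden illustration, not Loupe's code: sketchFilterCandidates() is a hypothetical name, the candidate list stands in for whatever the index returns, and PHP's built-in levenshtein() works on bytes, so the example sticks to ASCII words.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration: the index may return candidates that are too far
 * away from the search term (false positives). Running the Levenshtein
 * algorithm on every candidate removes them again.
 *
 * @param list<string> $candidates Terms returned by the (imaginary) index lookup
 * @return list<string> Candidates within $maxTypos edits of $term
 */
function sketchFilterCandidates(string $term, array $candidates, int $maxTypos): array
{
    return array_values(array_filter(
        $candidates,
        static fn (string $candidate): bool => levenshtein($term, $candidate) <= $maxTypos
    ));
}

// "engine" and "saerch" play the role of false positives for one allowed typo.
print_r(sketchFilterCandidates('search', ['search', 'searcj', 'saerch', 'engine'], 1));
```

A coarser index would hand more false positives to this step, which costs time but not correctness.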
Usually, the longer the words, the more typos should be tolerated. It makes no sense to tolerate 6 typos for a word like "search" as it would mean that "engine" matches as well.
By default, Loupe tolerates 2 typos for words that are 9 or more characters long and 1 typo for words that are 5 to 8 characters long. You can configure those thresholds. The key is the threshold and the value represents the allowed typos:
$typoTolerance = \Loupe\Loupe\Config\TypoTolerance::create()
    ->withTypoThresholds([
        8 => 2, // 8 or more characters allow for 2 typos
        3 => 1, // 3 - 7 characters allow for 1 typo
    ])
;
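How such a threshold map translates into an allowed number of typos can be sketched as follows. This is only an illustration of the rule stated above, not Loupe's implementation: sketchAllowedTypos() is a hypothetical helper, its default argument mirrors the defaults described in the text, and mb_strlen() assumes the mbstring extension.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration of a threshold lookup: the key is the minimum word
 * length, the value is the number of typos tolerated from that length upwards.
 *
 * @param array<int, int> $thresholds Minimum length => allowed typos
 */
function sketchAllowedTypos(string $word, array $thresholds = [9 => 2, 5 => 1]): int
{
    // Check the largest threshold first so the most generous matching rule wins.
    krsort($thresholds);

    $length = mb_strlen($word);

    foreach ($thresholds as $minLength => $typos) {
        if ($length >= $minLength) {
            return $typos;
        }
    }

    return 0; // shorter than every threshold: no typos tolerated
}

var_dump(sketchAllowedTypos('huckleberry'));            // int(2)
var_dump(sketchAllowedTypos('search'));                 // int(1)
var_dump(sketchAllowedTypos('huck'));                   // int(0)
var_dump(sketchAllowedTypos('huck', [8 => 2, 3 => 1])); // int(1)
```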
Typos at the beginning of a word are less likely than typos further into the word. Thus, Loupe counts a typo at the first character of a word as two typos by default. You can disable this behavior like so:
$typoTolerance = \Loupe\Loupe\Config\TypoTolerance::create()
    ->withFirstCharTypoCountsDouble(false)
;
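The effect of counting a first-character typo twice can be approximated with the following sketch. It is a simplification under stated assumptions, not Loupe's actual logic: sketchCountTypos() is a hypothetical helper that adds one extra typo to the plain Levenshtein distance whenever the first characters differ, and it relies on PHP's byte-based levenshtein(), so it is only meaningful for ASCII words.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration: plain Levenshtein distance, plus a penalty of one
 * additional typo if the two words do not start with the same character.
 */
function sketchCountTypos(string $typed, string $indexed, bool $firstCharCountsDouble = true): int
{
    $distance = levenshtein($typed, $indexed);

    if ($firstCharCountsDouble && $typed !== '' && $indexed !== '' && $typed[0] !== $indexed[0]) {
        $distance++; // the typo at the first character counts as two
    }

    return $distance;
}

var_dump(sketchCountTypos('zearch', 'search'));        // int(2)
var_dump(sketchCountTypos('zearch', 'search', false)); // int(1)
var_dump(sketchCountTypos('searcj', 'search'));        // int(1)
```

With the default thresholds, a six-character word tolerates one typo, so in this sketch "zearch" would only match "search" once the double counting is disabled.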
By default, Loupe will not allow typos on prefixes. So if you e.g. search for "Huckle", it will find "Huckleberry" but if you search for "Hukcle", it won't. This is for performance reasons. However, you can enable typo tolerance on prefix search. Just be aware that you probably shouldn't do this if you have tens of thousands of documents:
$typoTolerance = \Loupe\Loupe\Config\TypoTolerance::create()
    ->withEnabledForPrefixSearch(true)
;
You may pass a PSR-3 logger to Loupe. For the sake of simplicity, Loupe also ships with a very simple InMemoryLogger so you don't have to require any special package only to quickly debug internals:
$logger = new \Loupe\Loupe\Logger\InMemoryLogger();
$configuration = \Loupe\Loupe\Configuration::create()
    ->withLogger($logger)
;
print_r($logger->getRecords());