README update
ts-thomas committed Feb 12, 2019
1 parent f22b231 commit 8b11bf4
Showing 2 changed files with 75 additions and 16 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,9 @@
# Changelog

#### v0.5.1

- Provide customizable scoring resolution

#### v0.5.0

- Where / Find Documents
87 changes: 71 additions & 16 deletions README.md
@@ -946,6 +946,33 @@ var index = new FlexSearch({
});
```

Using a custom stemmer, e.g.:
```js
var index = new FlexSearch({

    stemmer: function(value){

        // apply some replacements
        // ...

        return value;
    }
});
```

Using a custom filter, e.g.:
```js
var index = new FlexSearch({

    filter: function(value){

        // just add values with length > 1 to the index

        return value.length > 1;
    }
});
```
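
To make the two callbacks more concrete, here is a minimal standalone sketch that does not call FlexSearch at all. The suffix list and the names `exampleStemmer`, `exampleFilter` and `preprocess` are illustrative assumptions, and the order in which the library itself applies filter and stemmer may differ:

```js
// Hypothetical stemmer: strips a few common English suffixes.
// The suffix list is an assumption made for illustration only.
function exampleStemmer(value){

    return value.replace(/(ing|ly|s)$/, "");
}

// Same predicate as in the filter example above:
// keep only values with length > 1.
function exampleFilter(value){

    return value.length > 1;
}

// Hypothetical helper that mimics a preprocessing step:
// lowercase, split on whitespace, filter, then stem.
function preprocess(text){

    return text.toLowerCase()
               .split(/\s+/)
               .filter(exampleFilter)
               .map(exampleStemmer);
}

console.log(preprocess("A quickly walking fox jumps"));
// logs: [ 'quick', 'walk', 'fox', 'jump' ]
```

The single-character word "A" is dropped by the filter, and the remaining words are reduced by the stemmer before they would be added to an index.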

Or assign stemmer/filters globally to a language:

> Stemmers are passed as an object (key-value pairs), filters as an array.
@@ -1112,6 +1139,30 @@ var index = new FlexSearch({
});
```

You can also provide custom presets for each field separately:

```js
var index = new FlexSearch({
    doc: {
        id: "id",
        field: {
            title: {
                encode: "extra",
                tokenize: "reverse",
                threshold: 7
            },
            cat: {
                encode: false,
                tokenize: function(val){
                    return [val];
                }
            },
            content: "memory"
        }
    }
});
```
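
The custom `tokenize` function used for the `cat` field returns the whole value as a single token. The following standalone sketch compares that with a plain whitespace split; `wholeValueTokenizer`, `splitTokenizer` and the sample value are hypothetical names chosen for illustration and are not part of the library:

```js
// Same logic as the tokenizer of the "cat" field above:
// the complete value becomes exactly one token.
function wholeValueTokenizer(val){

    return [val];
}

// Hypothetical alternative for comparison: one token per word.
function splitTokenizer(val){

    return val.split(/\s+/);
}

console.log(wholeValueTokenizer("science fiction")); // [ 'science fiction' ]
console.log(splitTokenizer("science fiction"));      // [ 'science', 'fiction' ]
```

With a single token per value, a field like a category would presumably be indexed by its complete value only, which keeps the index for that field small.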

#### Complex Objects

Assume the document array looks more complex (has nested branches etc.), e.g.:
@@ -1150,6 +1201,8 @@ var index = new FlexSearch({
});
```

> __Hint:__ This is an alternative approach for indexing documents which are much more complex: https://github.com/nextapps-de/flexsearch/issues/36

#### Add/Update/Remove Documents to/from the Index

Just pass the document array (or a single object) to the index:
@@ -1286,6 +1339,8 @@ To get by ID, you can also use short form:
index.find(1);
```

Getting a doc by ID is actually the fastest way to retrieve a result from documents.

Find by a custom function:
```js
index.find(function(item){
@@ -1362,7 +1417,7 @@ index.search("foo", {

> __IMPORTANT NOTICE:__ This feature will be removed due to the lack of scaling and redundancy.
Tagging is pretty much the same like adding an additional index to a database column. Whenever you use ___where___ on an indexed/tagged attribute will improve performance drastically but also at a cost of additional memory.
Tagging is much like adding an additional index to a database column. Using ___index.where()___ on an indexed/tagged attribute improves performance considerably, but at the cost of some additional memory.

> The colon notation also has to be applied for tags respectively.
@@ -1410,7 +1465,7 @@ Find all documents by an attribute:
index.where({"cat": "comedy"}, 10);
```

Since the attribute "cat" was tagged (has its own index) this expression performs extremely fast. This is actually the fastest way to retrieve results from documents.
Since the attribute "cat" was tagged (has its own index), this expression performs very fast. This is actually the fastest way to retrieve multiple results from documents.

Search documents and also apply a where-clause:
```js
@@ -1426,7 +1481,7 @@ index.search("foo", {
});
```

For a better understanding, using the same expression without the where clause has pretty much the same performance. On the other hand, using a where-clause without a tag on its property has an additional cost.
An additional where-clause has a significant cost. Using the same expression without _where_ performs significantly better (depending on the count of matches).
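
The difference between a tagged and an untagged attribute can be pictured with a small standalone sketch. This is not how FlexSearch is implemented internally; the sample documents and the helpers `whereByScan` and `buildTagIndex` are assumptions that only illustrate why a dedicated index avoids checking every document:

```js
// Hypothetical sample documents.
var docs = [
    { id: 1, cat: "comedy", title: "foo" },
    { id: 2, cat: "drama",  title: "bar" },
    { id: 3, cat: "comedy", title: "baz" }
];

// Untagged lookup: every document has to be inspected on each call.
function whereByScan(docs, key, value){

    return docs.filter(function(doc){

        return doc[key] === value;
    });
}

// Tagged lookup: build a map from attribute value to documents once
// (this costs additional memory), then each lookup is a single key access.
function buildTagIndex(docs, key){

    var tagIndex = Object.create(null);

    docs.forEach(function(doc){

        (tagIndex[doc[key]] || (tagIndex[doc[key]] = [])).push(doc);
    });

    return tagIndex;
}

var catIndex = buildTagIndex(docs, "cat");

console.log(whereByScan(docs, "cat", "comedy").length); // 2
console.log(catIndex["comedy"].length);                 // 2
```

Both calls return the same documents; the scan does work proportional to the number of documents on every call, while the map pays that cost once when it is built.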

<a name="sort"></a>
## Custom Sort
@@ -1728,44 +1783,43 @@ Tokenizer effects the required memory also as query time and flexibility of part
<tr>
<td><b>"strict"</b></td>
<td>index whole words</td>
<td><b>foobar</b></td>
<td><code>foobar</code></td>
<td>* 1</td>
</tr>
<tr></tr>
<!--
<tr>
<td><b>"ngram"</b> (default)</td>
<td>index words partially through phonetic n-grams</td>
<td><b>foo</b>bar<br>foo<b>bar</b></td>
<td><code>foo</code>bar<br>foo<code>bar</code></td>
<td>* n / 3</td>
</tr>
<tr></tr>
-->
<tr>
<td><b>"forward"</b></td>
<td>incrementally index words in forward direction</td>
<td><b>fo</b>obar<br><b>foob</b>ar<br></td>
<td><code>fo</code>obar<br><code>foob</code>ar<br></td>
<td>* n</td>
</tr>
<tr></tr>
<tr>
<td><b>"reverse"</b></td>
<td>incrementally index words in both directions</td>
<td>foob<b>ar</b><br>fo<b>obar</b></td>
<td>foob<code>ar</code><br>fo<code>obar</code></td>
<td>* 2n - 1</td>
</tr>
<tr></tr>
<tr>
<td><b>"full"</b></td>
<td>index every possible combination</td>
<td>fo<b>oba</b>r<br>f<b>oob</b>ar</td>
<td>fo<code>oba</code>r<br>f<code>oob</code>ar</td>
<td>* n * (n - 1)</td>
</tr>

</table>
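
The tokenizer modes in the table can be approximated with a short standalone sketch. It is only an illustration of the idea and not the library's implementation; the function name `tokenize` and details such as whether single-character tokens are kept are assumptions:

```js
// Returns the distinct partial tokens of a word for a given mode.
function tokenize(word, mode){

    var tokens = new Set();
    var n = word.length;
    var i, j;

    switch(mode){

        case "strict":
            // the whole word only
            tokens.add(word);
            break;

        case "forward":
            // all prefixes: f, fo, foo, ...
            for(i = 1; i <= n; i++) tokens.add(word.substring(0, i));
            break;

        case "reverse":
            // all prefixes plus all shorter suffixes: ..., bar, ar, r
            for(i = 1; i <= n; i++) tokens.add(word.substring(0, i));
            for(i = 1; i < n; i++) tokens.add(word.substring(i));
            break;

        case "full":
            // every contiguous substring
            for(i = 0; i < n; i++){
                for(j = i + 1; j <= n; j++) tokens.add(word.substring(i, j));
            }
            break;

        default:
            throw new Error("Unknown mode: " + mode);
    }

    return Array.from(tokens);
}

["strict", "forward", "reverse", "full"].forEach(function(mode){

    console.log(mode, tokenize("foobar", mode).length);
});
```

For "foobar" (n = 6) this sketch yields 1, 6, 11 and 20 distinct tokens. The counts for "forward" and "reverse" correspond to the factors n and 2n - 1 in the table; the count for "full" is simply the number of distinct substrings and is not meant to reproduce the table's factor exactly.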

<a name="phonetic"></a>
## Phonetic Encoding
## Encoders

Encoding affects the required memory as well as query time and phonetic matching. Try to choose the topmost of these encoders that fits your needs, or pass in a <a href="#flexsearch.encoder">custom encoder</a>:

@@ -1814,14 +1868,14 @@ Encoding effects the required memory also as query time and phonetic matches. Tr
<tr></tr>
<tr>
<td><b>function()</b></td>
<td>Pass custom encoding: function(string):string</td>
<td>Pass custom encoding via <i>function(string):string</i></td>
<td></td>
<td></td>
</tr>
</table>

<a name="compare" id="compare"></a>
#### Comparison (Matching)
#### Encoder Matching Comparison

> Reference String: __"Björn-Phillipp Mayer"__
@@ -1967,7 +2021,7 @@ The required memory for the index depends on several options:
</tr>
<tr>
<td align="left">Mode</td>
<td align="left">Multiplied with: (n = <u>average</u> length of indexed words)</td>
<td align="left">Multiplied with: (n = average length of indexed words)</td>
</tr>
<tr>
<td>"strict"</td>
@@ -2005,7 +2059,7 @@ The required memory for the index depends on several options:
</tr>
</table>

Adding, removing or updating existing items has a similar complexity.
Adding, removing or updating existing items has a similar complexity. The contextual index grows exponentially, which is why it is only supported for the tokenizer ___"strict"___.

<a name="consumption"></a>
#### Compare Memory Consumption
@@ -2126,15 +2180,16 @@ Performance Checklist:

- Using just id-content pairs for the index usually performs faster than using docs
- An additional where-clause in `index.search()` has a significant cost
- When adding multiple fields of documents to the index try to set the lowest possible preset for each field
- When adding multiple fields of documents to the index try to set the lowest possible preset for each field separately
- Make sure the auto-balanced ___cache___ is enabled and has a meaningful value
- Using `index.where()` to find documents is very slow when not using a tagged field
- Getting a document by ID via `index.find(id)` is extremely fast
- Do not enable ___async___ or ___worker___ when the index does not require it
- Use numeric IDs (the datatype length of IDs influences the memory consumption significantly)
- Verify if you can activate _contextual index_ by setting the ___depth___ to a minimum meaningful value and tokenizer to ___"strict"___
- Try to enable _contextual index_ by setting the ___depth___ to a minimum meaningful value and tokenizer to ___"strict"___
- Pass a ___limit___ when searching (lower values perform better)
- Pass a minimum ___threshold___ when searching (higher values perform better)
- Try to minimize the size of indexed documents by adding only the attributes you really need to get back from results

## Best Practices
