guillaumekln/simdoc

simdoc

simdoc associates each document in a dataset with its most similar documents, using the TF-IDF method.
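To make the idea concrete, here is a minimal sketch of TF-IDF weighting followed by cosine similarity. It is not taken from the simdoc sources: the names `tokenize`, `tfidf` and `cosine_similarity` are hypothetical, the tokenizer is a naive whitespace split, and the weighting (raw term count times log(N / df)) and the cosine measure are assumptions, since this README does not say which variants simdoc uses.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Sparse vector mapping a term to its weight (hypothetical representation).
typedef std::map<std::string, double> SparseVector;

// Split a document on whitespace. A real tokenizer would also normalize
// case and punctuation; this one is deliberately naive.
std::vector<std::string> tokenize(const std::string& text) {
  std::istringstream stream(text);
  std::vector<std::string> tokens;
  std::string token;
  while (stream >> token)
    tokens.push_back(token);
  return tokens;
}

// Build one TF-IDF vector per document, assuming tf = raw count and
// idf = log(N / df), where df is the number of documents containing the term.
std::vector<SparseVector> tfidf(const std::vector<std::string>& documents) {
  std::vector<SparseVector> vectors(documents.size());
  std::map<std::string, std::size_t> document_frequency;

  for (std::size_t i = 0; i < documents.size(); ++i) {
    for (const std::string& token : tokenize(documents[i]))
      vectors[i][token] += 1.0;  // term frequency
    for (const auto& entry : vectors[i])
      ++document_frequency[entry.first];  // counted once per document
  }

  const double n = static_cast<double>(documents.size());
  for (SparseVector& vector : vectors)
    for (auto& entry : vector)
      entry.second *= std::log(n / document_frequency[entry.first]);
  return vectors;
}

// Cosine similarity between two sparse vectors; returns 0 when either
// vector has zero norm.
double cosine_similarity(const SparseVector& a, const SparseVector& b) {
  double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
  for (const auto& entry : a) {
    norm_a += entry.second * entry.second;
    const SparseVector::const_iterator match = b.find(entry.first);
    if (match != b.end())
      dot += entry.second * match->second;
  }
  for (const auto& entry : b)
    norm_b += entry.second * entry.second;
  if (norm_a == 0.0 || norm_b == 0.0)
    return 0.0;
  return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```

With this weighting, a term that occurs in every document gets a weight of zero, so documents only score as similar when they share terms that are rare in the dataset.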

Requirements

  • C++11
  • boost
  • Intel TBB
  • CMake

Usage

./simdoc [options] input

where input is either a directory or a text file listing one file path per line.

See the --help option for the complete usage.

Example

The command

./simdoc -t 8 -c 5 -r data/ > output.json

associates each document in the data directory and its sub-directories with its 5 most similar documents, using 8 threads.
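The selection step implied by this example (keep only the few best matches per document) can be sketched as follows. This is an illustration under assumptions, not simdoc's implementation: `most_similar` is a hypothetical helper, it takes precomputed similarity scores for one document, and it runs single-threaded. Each document's selection is independent of the others, which is what makes this kind of work easy to spread over several threads.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Return the indices of the `count` highest-scoring documents, excluding
// the document itself. `scores[i]` is assumed to hold the similarity
// between document `self` and document `i`.
std::vector<std::size_t> most_similar(const std::vector<double>& scores,
                                      std::size_t self,
                                      std::size_t count) {
  std::vector<std::size_t> candidates;
  for (std::size_t i = 0; i < scores.size(); ++i)
    if (i != self)
      candidates.push_back(i);

  // Never ask for more results than there are other documents.
  count = std::min(count, candidates.size());

  // Only the first `count` positions need to be sorted.
  std::partial_sort(
      candidates.begin(),
      candidates.begin() + static_cast<std::ptrdiff_t>(count),
      candidates.end(),
      [&scores](std::size_t a, std::size_t b) {
        if (scores[a] != scores[b])
          return scores[a] > scores[b];  // higher similarity first
        return a < b;                    // break ties by index
      });
  candidates.resize(count);
  return candidates;
}
```

Using `std::partial_sort` instead of a full sort keeps the cost close to linear in the number of documents when `count` is small, as with the 5 matches requested above.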

See examples/output.json for an output example.
