rorqual (finner)

TL;DR: Like Apache OpenNLP (TokenNameFinder) but faster and actually working.

Detects all kinds of entities in financial news texts, like ISINs, FIGIs (BBGIDs), monetary values, date and time stamps, etc.

Red tape

Licence: BSD-3cl
Project: https://www.fresse.org/finner/
Downloads: https://github.com/hroptatyr/finner/releases
Issues: https://github.com/hroptatyr/finner/issues

Motivation

Unlike the NERs from the big NLP packages this is rule based and hence the false positive rate is low while the false negative rate is zero.

Instead of accounting for every edge case finner aims to cover the top 95% of use cases and is meant to be part of a booster setup. This way a bunch of heuristics, each one mediocre in coverage and accuracy, can outperform a well-trained single (but more complex) model.

Technicalities

The command-line tool finner comes with a built-in tokeniser. This is mostly due to the finner anno subcommand which would otherwise be hard to implement. However, if interactive annotation is not one of your use cases anyway and the built-in tokeniser has proved to be problematic for your input, it's highly advised to pre-tokenise the input and pass it to finner.

A somewhat more comprehensive tokeniser, based on Unicode's character classes (and therefore with full UTF8 support) is terms(1) from the glod project.

Example

A simple showcase:

$ echo '5 May 2015 Issue of $500,000,000 0.875 percent' | finner anno
5 May 2015/date Issue of $500,000,000/amt(USD) 0.875 percent/num(*0.01)

Or alternatively

$ echo '5 May 2015 Issue of $500,000,000 0.875 percent' | finner extr
5 May 2015      date    [0,10)
5       num     [0,1)
May     date    [2,5)
2015    date    [6,10)
$ 500,000,000   amt(USD)        [20,32)
$       ccy(USD)        [20,21)
500,000,000     num     [21,32)
0.875 percent   num(*0.01)      [33,46)
0.875   num     [33,38)
percent unit(*0.01)     [39,46)

Name		Name	Last commit message	Last commit date
Latest commit History 261 Commits
build-aux		build-aux
m4		m4
snap		snap
src		src
test		test
.gitattributes		.gitattributes
.gitignore		.gitignore
.travis.yml		.travis.yml
GNUmakefile		GNUmakefile
Makefile.am		Makefile.am
README.md		README.md
configure.ac		configure.ac
version.mk.in		version.mk.in

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

rorqual (finner)

Red tape

Motivation

Technicalities

Example

About

Releases 2

Packages

Languages

hroptatyr/finner

Folders and files

Latest commit

History

Repository files navigation

rorqual (finner)

Red tape

Motivation

Technicalities

Example

About

Topics

Resources

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages