
librAIry NLP toolkit

The librAIry nlp service provides an efficient and easy way to analyze large amounts of text through standard HTTP and TCP APIs.


Built on top of several open-source NLP tools, it offers:

  • Part-of-Speech Tagger (and filter)
  • Lemmatizer
  • N-Grams Identifier
  • Wikipedia Relations
  • WordNet Synsets

All of this is available through JSON-based queries over HTTP or TCP, deployed with Docker containers.

Quick Start


This setup is intended for small load tests. If you need more resources, we recommend starting an instance on your own servers.

Run locally

  1. Install Docker
  2. Run the service:
    $ docker run --rm -p 7777:7777 -e "REST_PATH=/nlp" librairy/nlp:latest
  3. Once started, the service should be available at: http://localhost:7777/nlp

Run in distributed mode

Create a Docker Swarm and configure as many services as you need.


The service can be tuned through the following environment variables when defining the Docker container:

Variable             Description
REST_PATH            Service namespace (/ by default)
REST_PORT            HTTP listening port (7777 by default)
NLP_AVRO_PORT        TCP listening port (65111 by default)
SPOTLIGHT_ENDPOINT   DBpedia Spotlight URL (required to discover Wikipedia references)
SPOTLIGHT_THRESHOLD  DBpedia Spotlight confidence threshold (required to discover Wikipedia references)
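When scripting deployments, it can help to assemble the container options from these variables programmatically. The following is a minimal Python sketch, assuming you want both the HTTP and the TCP port published on the host under the same numbers the container listens on (the quick-start command publishes only the HTTP port). The helper name `docker_run_args` is hypothetical; it only builds the argument list and does not start anything.

```python
import shlex

# Default ports as listed in the environment-variable table.
DEFAULT_PORTS = {"REST_PORT": "7777", "NLP_AVRO_PORT": "65111"}


def docker_run_args(env=None, image="librairy/nlp:latest"):
    """Return a `docker run` argument list for the librAIry NLP container.

    `env` maps variable names from the table (REST_PATH, REST_PORT, ...)
    to values. Both listening ports are published on the host under the
    same numbers the container uses (an assumption of this sketch).
    """
    env = dict(env or {})
    args = ["docker", "run", "--rm"]
    for name in ("REST_PORT", "NLP_AVRO_PORT"):
        port = env.get(name, DEFAULT_PORTS[name])
        args += ["-p", f"{port}:{port}"]
    # Forward every configured variable to the container with -e.
    for name, value in sorted(env.items()):
        args += ["-e", f"{name}={value}"]
    args.append(image)
    return args


if __name__ == "__main__":
    # Print the command for inspection instead of running it.
    print(shlex.join(docker_run_args({"REST_PATH": "/nlp"})))
```

Passing the returned list to `subprocess.run` would actually start the container; `shlex.join` (Python 3.8+) is used here only to print a copy-pasteable command.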


All services can include lemmatization, part-of-speech tagging, and n-gram identification.

Given a text, the service offers the following endpoints:

  • /annotations: adds grammatical information.
  • /groups: creates a bag-of-words.
  • /tokens: filters words from the request text.
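A minimal Python client sketch for these endpoints, using only the standard library. The request format is not documented in this section, so the HTTP method (POST), the `text` field, and the helper names `build_request` and `send` are assumptions for illustration; only the `references` option name comes from this README's DBpedia Spotlight notes. Check the service's own API documentation before relying on any field.

```python
import json
import urllib.request

# Quick-start address (REST_PATH=/nlp on port 7777).
BASE_URL = "http://localhost:7777/nlp"
ENDPOINTS = {"annotations", "groups", "tokens"}


def build_request(endpoint, text, **options):
    """Build (but do not send) a JSON POST request for one endpoint.

    The payload layout ("text" plus any keyword options) is hypothetical.
    """
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    body = json.dumps({"text": text, **options}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_request("annotations", "Text mining with librAIry.")
    print(req.get_method(), req.full_url)
```

For example, `send(build_request("annotations", text, references=True))` would ask for Wikipedia references, which requires the DBpedia Spotlight deployment described in the next section.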

DBpedia Spotlight

The service uses DBpedia Spotlight to identify Wikipedia resources referenced in the text.

If you enable the references option in requests to the service, the corresponding DBpedia Spotlight module must also be deployed:

  1. Install Docker-Compose
  2. Create and move into a directory named nlp/
  3. Create a file named docker-compose.yml with the following content:
    version: '2'
    services:
      dbpedia-en-spotlight:
        image: dbpedia/spotlight-english:latest
        command: java -Dfile.encoding=UTF-8 -Xmx15G -Dthreads.max=15 -Dthreads.core=15 -jar /opt/spotlight/dbpedia-spotlight-nightly-build.jar /opt/spotlight/en
        restart: always
      nlp:
        image: librairy/nlp:latest
        restart: always
        ports:
          - "7777:7777"
        environment:
          - REST_PATH=/nlp
          - JAVA_OPTS=-Xmx32768m
  4. Make sure you have added the DBpedia Spotlight modules for the languages you are going to use.
  5. Check that you have the necessary resources (e.g. memory, CPU, disk), as these modules are very demanding.
  6. Start the services:
    $ docker-compose up
  7. Check that the following traces appear (depending on the environment, this may take a few minutes):
    nlp_1 | [main] INFO  e.u.o.l.n.Application - Started Application in 16.847 seconds (JVM running for 18.964)
    dbpedia-en-spotlight_1  | Server started in / listening on
  8. Once started, the service should be available at: http://localhost:7777/nlp
  • The above command runs two services, DBpedia Spotlight and librAIry NLP, using the settings specified in docker-compose.yml.
  • The DBpedia Spotlight service starts lazily, so the first requests will be slower until all its resources are initialized.
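Because the stack can take several minutes to start and Spotlight warms up lazily, a client script may want to wait until the NLP endpoint is reachable before sending real requests. The following is a minimal Python sketch using only the standard library. The function name `wait_for_service`, the polling interval, and the assumption that any HTTP reply (even a 404) means the server is listening are choices made for this example, not part of librAIry.

```python
import time
import urllib.error
import urllib.request


def wait_for_service(url, attempts=30, delay=10.0, opener=urllib.request.urlopen):
    """Poll `url` until the HTTP server answers, or raise TimeoutError.

    Any HTTP response, even an error status such as 404, is taken to mean
    the server is listening; connection failures and socket timeouts
    (all OSError subclasses) mean it is still starting.
    Returns the number of attempts used.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=delay):
                return attempt
        except urllib.error.HTTPError:
            return attempt  # the server replied, just not with a 2xx status
        except OSError:
            # Connection refused, DNS failure, or timeout: try again later.
            if attempt < attempts:
                time.sleep(delay)
    raise TimeoutError(f"{url} not reachable after {attempts} attempts")


if __name__ == "__main__":
    used = wait_for_service("http://localhost:7777/nlp")
    print(f"service reachable after {used} attempt(s)")
```

The `opener` parameter exists only so the polling logic can be exercised without a running container. Even after the endpoint answers, keep generous request timeouts for the first calls that use references, since Spotlight may still be loading its resources.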


You can use the following to cite the service:

 author = {Badenes-Olmedo, Carlos and Redondo-Garcia, Jos{\'e} Luis and Corcho, Oscar},
 title = {Distributing Text Mining Tasks with librAIry},
 booktitle = {Proceedings of the 2017 ACM Symposium on Document Engineering},
 series = {DocEng '17},
 year = {2017},
 isbn = {978-1-4503-4689-4},
 pages = {63--66},
 numpages = {4},
 url = {},
 doi = {10.1145/3103010.3121040},
 acmid = {3121040},
 publisher = {ACM},
 keywords = {data integration, large-scale text analysis, nlp, scholarly data, text mining},


This repository is maintained by Carlos Badenes-Olmedo. Please send me an e-mail or open a GitHub issue if you have questions.


NLP toolkit for cross-lingual texts







