
An isanlp wrapper for the winning solution of the GramEval-2020 shared task (morphology, lemmatization, and UD syntax parsing for Russian).


tchewik/isanlp_qbic


Description

An IsaNLP library wrapper and Docker container for the ru_bert_final_model model from the 1st-place solution of the GramEval-2020 competition.

Usage example

  1. Install IsaNLP and its dependencies:
pip install grpcio
pip install git+https://github.com/IINemo/isanlp.git
  2. Deploy the Docker container with the qbic model for lemmatization, morphology, and syntax annotation on the host machine (host port 3334 is mapped to port 3333 inside the container):
docker run --rm -p 3334:3333 tchewik/isanlp_qbic
  3. Connect from Python using PipelineCommon with an external tokenizer (in this example, the UDPipe module):
from isanlp import PipelineCommon
from isanlp.processor_remote import ProcessorRemote


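# Placeholders: set these to the host and port of your running UDPipe service
address_udpipe = ('host_address', port)
# The qbic container started above listens on host port 3334
address_grameval2020 = ('host_address', 3334)

ppl_qbic = PipelineCommon([
    # Stage 1: UDPipe splits the raw text into sentences and tokens
    (ProcessorRemote(address_udpipe[0], address_udpipe[1], '0'),
     ['text'],
     {'sentences': 'sentences',
      'tokens': 'tokens'}),
    # Stage 2: the qbic model adds lemmas, POS tags, morphological features,
    # and a UD dependency tree for each sentence
    (ProcessorRemote(address_grameval2020[0], address_grameval2020[1], '0'),
     ['tokens', 'sentences'],
     {'lemma': 'lemma',
      'postag': 'postag',
      'morph': 'morph',
      'syntax_dep_tree': 'syntax_dep_tree'})
])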

text = "По нашим данным, кости обнаружили при рытье котлована торгового центра «Европа» еще в 2006 году."

res = ppl_qbic(text)
  4. The variable res['syntax_dep_tree'] can be visualized as follows:
        ┌──► По         case
        │ ┌► нашим      det
      ┌►└─└─ данным     parataxis
      │ └──► ,          punct
      │   ┌► кости      obj
┌─┌───└─┌─└─ обнаружили 
│ │     │ ┌► при        case
│ │     └►└─ рытье      obl
│ │   ┌─└──► котлована  nmod
│ │   │   ┌► торгового  amod
│ │ ┌─└──►└─ центра     nmod
│ │ │     ┌► «          punct
│ │ └──►┌─└─ Европа     appos
│ │     └──► »          punct
│ │   ┌────► еще        advmod
│ │   │ ┌──► в          case
│ │   │ │ ┌► 2006       amod
│ └──►└─└─└─ году       obl
└──────────► .          punct
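The tree above can also be walked programmatically. The sketch below assumes isanlp's usual annotation layout, which this README does not spell out: `res['tokens']` holds objects with a `.text` attribute, `res['sentences']` holds objects whose `.begin`/`.end` are token indices (end exclusive), and `res['syntax_dep_tree']` holds, per sentence, objects with `.parent` (sentence-local index of the head, -1 for the root) and `.link_name`. The classes `Tok`, `Sent`, `Arc` and the helper `dependency_triples` are hypothetical placeholders so the snippet runs without a live pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical stand-ins for isanlp annotation objects; only the attribute
# names (text, begin/end, parent/link_name) are assumed to match the real ones.
@dataclass
class Tok:
    text: str


@dataclass
class Sent:
    begin: int  # index of the sentence's first token in the token list
    end: int    # index one past its last token


@dataclass
class Arc:
    parent: int     # sentence-local index of the head token; -1 marks the root (assumed)
    link_name: str  # UD dependency relation


def dependency_triples(tokens, sentences, syntax_dep_tree) -> List[List[Tuple[str, str, str]]]:
    """Return (dependent, head, relation) triples for each sentence."""
    result = []
    for sent, arcs in zip(sentences, syntax_dep_tree):
        words = [tok.text for tok in tokens[sent.begin:sent.end]]
        triples = []
        for word, arc in zip(words, arcs):
            head = 'ROOT' if arc.parent == -1 else words[arc.parent]
            triples.append((word, head, arc.link_name))
        result.append(triples)
    return result


# Toy input mirroring a fragment of the tree above (not real model output)
tokens = [Tok('кости'), Tok('обнаружили'), Tok('.')]
sentences = [Sent(0, 3)]
tree = [[Arc(1, 'obj'), Arc(-1, 'root'), Arc(1, 'punct')]]

for dependent, head, relation in dependency_triples(tokens, sentences, tree)[0]:
    print(f'{dependent:<12}{relation:<8}{head}')
```

With a live pipeline, the call would be `dependency_triples(res['tokens'], res['sentences'], res['syntax_dep_tree'])`, provided the real isanlp objects expose the attributes assumed here.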

See example.ipynb for the full example code and a toy comparison with UDPipe output.
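A toy comparison of two parsers can be reduced to attachment agreement. The sketch below is a hypothetical helper, not the code from example.ipynb: it takes two parses of the same tokens as lists of (head index, label) pairs and reports the share of tokens with the same head and the share with the same head and label. This is analogous to UAS/LAS, but measured between two parsers rather than against gold annotation. The parses in the usage lines are invented data.

```python
from typing import List, Tuple

# One parse = per-token (head index, relation label); -1 marks the root.
Parse = List[Tuple[int, str]]


def attachment_agreement(a: Parse, b: Parse) -> Tuple[float, float]:
    """Share of tokens where two parses agree on the head, and on head plus label."""
    if len(a) != len(b):
        raise ValueError('both parses must cover the same tokens')
    if not a:
        return 0.0, 0.0
    same_head = sum(head_a == head_b for (head_a, _), (head_b, _) in zip(a, b))
    same_arc = sum(arc_a == arc_b for arc_a, arc_b in zip(a, b))
    return same_head / len(a), same_arc / len(a)


# Toy parses of "кости обнаружили ." that differ only in one label (invented data)
qbic_parse = [(1, 'obj'), (-1, 'root'), (1, 'punct')]
other_parse = [(1, 'nsubj'), (-1, 'root'), (1, 'punct')]
unlabeled, labeled = attachment_agreement(qbic_parse, other_parse)
print(f'same head: {unlabeled:.2f}, same head and label: {labeled:.2f}')
```

Assuming isanlp's `parent`/`link_name` attributes, a sentence's parse could be converted with `[(arc.parent, arc.link_name) for arc in res['syntax_dep_tree'][0]]`. The comparison is only meaningful when both parses cover identical tokens, as when both are computed over the same UDPipe tokenization.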
