This version is intended only for verifying the experimental results reported in the paper "Linear co-occurrence rate networks (L-CRNs) for sequence labeling" (Zhemin Zhu, Djoerd Hiemstra, Peter Apers, Statistical Language and Speech Processing 2014, Springer, pp. 185-196). All rights to the datasets belong to their original authors.
Co-occurrence rate networks (CRNs) are models for sequence labeling tasks such as named entity recognition and part-of-speech tagging. This software is used in much the same way as CRF toolkits (see CRF++, http://crfpp.googlecode.com/svn/trunk/doc/index.html). However, CRNs can be trained much faster than CRFs while obtaining better or very competitive results.
The software was compiled on Ubuntu 12.04 with gcc 4.7.3. We do not know whether it works on other systems. If your gcc is older, you can update it to gcc 4.7.3 with the following steps:
- sudo add-apt-repository ppa:ubuntu-toolchain-r/test
- sudo apt-get update
- sudo apt-get install gcc-4.7
- sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.6 60 --slave /usr/bin/g++ g++ /usr/bin/g++-4.6
- sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.7 40 --slave /usr/bin/g++ g++ /usr/bin/g++-4.7
- sudo update-alternatives --config gcc
To compile the software:
- sudo make
Usage:
- ./train train_file template_file model_folder
- ./decode model_folder test_file result_file
Example:
- ./train ./data/ner_en.train ./data/ner.template ./model/
- ./decode ./model/ ./data/ner_en.testa testa.result
- ./decode ./model/ ./data/ner_en.testb testb.result
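The compiler check and the example commands above can be sketched as a small shell script. This is only an illustration: the helper names version_ge and crn_commands are hypothetical and not part of the software, the comparison relies on GNU sort -V, and the script only prints the train and decode commands instead of running them. It compares against "4.7" rather than "4.7.3" because some gcc builds report only a shortened version from -dumpversion; passing the check does not mean the software is known to work with that compiler.

```shell
#!/usr/bin/env bash
# Sketch only: version_ge and crn_commands are hypothetical helpers,
# not part of the CRN software.

# Succeed if version $1 is at least version $2 (uses GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the example train/decode commands for a given model folder.
crn_commands() {
    local model_dir="$1"
    echo "./train ./data/ner_en.train ./data/ner.template $model_dir"
    for split in testa testb; do
        echo "./decode $model_dir ./data/ner_en.$split $split.result"
    done
}

# Report whether the installed gcc is older than the 4.7 series.
if command -v gcc >/dev/null 2>&1; then
    gcc_version="$(gcc -dumpversion)"
    if version_ge "$gcc_version" "4.7"; then
        echo "gcc $gcc_version: not older than 4.7"
    else
        echo "gcc $gcc_version: older than 4.7, consider the update steps"
    fi
else
    echo "gcc not found"
fi

# Show the commands for the English NER example.
crn_commands ./model/
```

To actually run the printed commands after compiling, they can be copied into a terminal one by one, starting with the ./train line.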
Training and decoding data share the same format; see the examples in ./data.
For the template format, see the example in ./data.