This repository contains state-of-the-art language models and a classifier for Sanskrit, an ancient Indian language.
The models trained here are used in the Natural Language Toolkit for Indic Languages (iNLTK).

Architecture/Dataset | Sanskrit Wikipedia Articles |
---|---|
ULMFiT | ~6 |
TransformerXL | ~3 |


Dataset | Accuracy | Kappa Score |
---|---|---|
Sanskrit Shlokas Dataset | 84.3 | 76.1 |

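
To make the two classifier metrics concrete, here is a minimal sketch of how accuracy and a kappa score can be computed from true and predicted labels. It assumes the "Kappa Score" column refers to Cohen's kappa and that both columns are reported as percentages; neither is confirmed by this README. The function names and the labels `"A"` and `"B"` are hypothetical and are not taken from this repository or from the Sanskrit Shlokas Dataset.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that both label the same class
    # if they were independent with these marginal frequencies.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        # Both sequences use one identical class; treating this as perfect
        # agreement is a convention chosen for this sketch.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy labels, for illustration only.
y_true = ["A", "A", "B", "B"]
y_pred = ["A", "A", "B", "A"]
print(accuracy(y_true, y_pred))     # 0.75
print(cohen_kappa(y_true, y_pred))  # 0.5
```

Kappa is lower than accuracy here because half of the observed agreement would be expected by chance given the label frequencies, which is why the two numbers in the table differ.
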

Architecture | Visualization |
---|---|
ULMFiT | Embeddings projection |
TransformerXL | Embeddings projection |

Download the pretrained language model from here.

Download the classifier from here.

The tokenizer was trained using Google's sentencepiece.

Download the trained tokenizer model and vocabulary from here.