We propose word embeddings derived from syllables and morphemes to improve language model performance. Our method achieves state-of-the-art Key Stroke Saving (KSS) compared with existing device input prediction methods and has been commercialized.
This is the data set used to evaluate the model proposed in "".
The evaluation data was manually curated to compare our performance with existing word prediction methods. The data set consists of 67 sentences (825 words, 7,531 characters), a collection of formal and informal utterances from various sources that covers general keyboard scenarios.
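To make the KSS metric concrete, here is a minimal sketch of how a keystroke-saving score could be computed over a list of sentences. It is not the evaluation code behind the reported results. It assumes the commonly used definition, KSS = 100 * (total characters - typed characters) / total characters, counts one keystroke per Unicode character (a real Korean keyboard types individual jamo), ignores spaces, and does not count the tap that selects a suggestion. The names `keystrokes_for_word`, `key_stroke_saving` and `toy_predict` are hypothetical.

```python
from typing import Callable, List, Sequence

# A predictor takes the previous words and the prefix typed so far,
# and returns a list of suggested words (hypothetical interface).
Predictor = Callable[[Sequence[str], str], List[str]]


def keystrokes_for_word(word: str, context: Sequence[str], predict: Predictor) -> int:
    """Number of characters typed before `word` appears in the suggestions."""
    for typed in range(len(word)):
        if word in predict(context, word[:typed]):
            return typed
    # The word was never suggested, so every character had to be typed.
    return len(word)


def key_stroke_saving(sentences: Sequence[str], predict: Predictor) -> float:
    """KSS in percent, under the assumptions stated in the lead-in."""
    total = 0
    typed = 0
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            total += len(word)
            typed += keystrokes_for_word(word, words[:i], predict)
    if total == 0:
        return 0.0
    return 100.0 * (total - typed) / total


def toy_predict(context: Sequence[str], prefix: str) -> List[str]:
    """Toy stand-in for a language model: top-1 prefix completion."""
    vocab = ["hello", "help", "world"]
    return [w for w in vocab if w.startswith(prefix)][:1]
```

Any real predictor can be plugged in by wrapping it in a function with the same two arguments as `toy_predict`.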
You can download the data set directly from the command line:
git clone https://github.com/Meinwerk/SyllableLevelLanguageModel.git
You can also download the data set as a zip file using the following URL:
https://github.com/Meinwerk/SyllableLevelLanguageModel/archive/master.zip
./eval_kss_ko.txt
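After downloading, the file can be sanity-checked by counting its sentences, words and characters. The sketch below assumes the file is UTF-8 text with one sentence per line, splits words on whitespace, and counts spaces as characters. These are assumptions about the file layout and counting convention, so the numbers may not match the stated statistics exactly. `corpus_stats` and `file_stats` are hypothetical helper names.

```python
from typing import Iterable, Tuple


def corpus_stats(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (sentences, words, characters) for non-empty lines."""
    sentences = [line.strip() for line in lines if line.strip()]
    words = sum(len(s.split()) for s in sentences)
    # Character count includes the spaces inside each sentence (assumption).
    chars = sum(len(s) for s in sentences)
    return len(sentences), words, chars


def file_stats(path: str) -> Tuple[int, int, int]:
    """Compute the same statistics for a text file on disk."""
    with open(path, encoding="utf-8") as f:
        return corpus_stats(f)
```

For example, calling `file_stats("eval_kss_ko.txt")` from the repository root returns the three counts as a tuple.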
Contact: Seunghak Yu, Nilesh Satish Kulkarni
Email: <full_name>@gmail.com