ElleryXii/Author_Identification
A brief exploration of author identification for English and Chinese text.

Features: TF-IDF, word count, and word vectors.
Classifiers: Logistic Regression, Naive Bayes.

For details, see report.pdf.

Word embeddings can be downloaded from the following locations:

English word embedding: download glove.840B.300d.zip from https://nlp.stanford.edu/projects/glove/, rename the word embedding file to "en_wordembedding.txt", and put it in the data folder.

Chinese word embedding: download from https://pan.baidu.com/s/1IG8IxNp2s7vVklz-vyZR9A, rename the word embedding file to "cn_wordembedding.txt", and put it in the data folder.
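To show how a renamed embedding file could be turned into a word-vector feature, here is a minimal Python sketch. It is not taken from this repository: the function names `load_embeddings` and `average_vector` are hypothetical, and averaging token vectors per document is only one common choice, assumed here for illustration. It also assumes the GloVe text layout (a token followed by space-separated floats on each line); the Chinese file may use a different layout.

```python
import io


def load_embeddings(lines, dim):
    """Parse GloVe-style text lines: a token followed by `dim` floats.

    Splitting from the right keeps tokens that themselves contain spaces
    intact, which occurs in some GloVe releases.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip("\n").rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip header lines or malformed rows
        token, values = parts[0], parts[1:]
        try:
            vectors[token] = [float(v) for v in values]
        except ValueError:
            continue  # skip rows whose values are not numeric
    return vectors


def average_vector(tokens, vectors, dim):
    """Mean of the vectors of known tokens; all zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


# Tiny in-memory stand-in for data/en_wordembedding.txt (3 dims, not 300).
sample = io.StringIO("the 1.0 0.0 2.0\ncat 3.0 4.0 0.0\n")
emb = load_embeddings(sample, dim=3)
doc_vec = average_vector(["the", "cat", "sat"], emb, dim=3)
print(doc_vec)  # [2.0, 2.0, 1.0]
```

With the real English file, the same loader would be called as `load_embeddings(open("data/en_wordembedding.txt", encoding="utf-8"), dim=300)`, since glove.840B.300d holds 300-dimensional vectors. Loading the full file into Python lists is memory-hungry, so restricting it to the vocabulary of the corpus is a reasonable refinement.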