A brief exploration into author identification of English and Chinese text.

ElleryXii/Author_Identification

A brief exploration of author identification for English and Chinese text.
Features: TF-IDF, word counts, and word vectors.
Classifiers: logistic regression and naive Bayes.
For details, see report.pdf.
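
To make the word-count feature and the naive Bayes classifier concrete, here is a minimal sketch that uses only the Python standard library. It is not the code from this repository: the function names (`tokenize`, `train_naive_bayes`, `predict_author`), the toy sentences, and the author labels are all hypothetical, and the whitespace tokenizer is an assumption that would not work for unsegmented Chinese text.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace tokenization. Chinese text would need a
    # word segmenter before this step.
    return text.lower().split()


def train_naive_bayes(texts, authors):
    """Collect per-author word counts, per-author document counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, author in zip(texts, authors):
        tokens = tokenize(text)
        word_counts[author].update(tokens)
        doc_counts[author] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict_author(text, model):
    """Return the author with the highest multinomial naive Bayes log score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_author, best_score = None, -math.inf
    for author in doc_counts:
        # Log prior: share of training documents written by this author.
        score = math.log(doc_counts[author] / total_docs)
        total_words = sum(word_counts[author].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothed log likelihood of the token.
            score += math.log(
                (word_counts[author][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_author, best_score = author, score
    return best_author


# Hypothetical toy data, for illustration only.
texts = [
    "the whale rose from the sea",
    "the sea was calm and the ship sailed",
    "she danced at the ball with the gentleman",
    "the gentleman admired her at the ball",
]
authors = ["author_a", "author_a", "author_b", "author_b"]

model = train_naive_bayes(texts, authors)
print(predict_author("the ship and the whale", model))
```

The same word counts could be reweighted by inverse document frequency to obtain TF-IDF features, or fed to a logistic regression classifier instead of naive Bayes.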


The word embeddings must be downloaded separately:

English word embedding:
download glove.840B.300d.zip from https://nlp.stanford.edu/projects/glove/
Unzip it, rename the word embedding file to "en_wordembedding.txt", and put it in the data folder.

Chinese word embedding:
download from https://pan.baidu.com/s/1IG8IxNp2s7vVklz-vyZR9A
Rename the word embedding file to "cn_wordembedding.txt" and put it in the data folder.
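
Once the files are in place, the word-vector feature can be built by averaging the vectors of the words in a document. The sketch below is one possible way to do that, not the repository's own code. It assumes each embedding file is plain text with one token per line followed by its space-separated vector components (the layout of the GloVe release); whether the Chinese file follows the same layout is an assumption. `load_embeddings` and `document_vector` are hypothetical names, and the tiny in-memory sample stands in for a real file.

```python
import io


def load_embeddings(lines, dim):
    """Parse 'token v1 ... v_dim' lines into a dict mapping token -> list of floats."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) <= dim:
            continue  # skip header lines or malformed rows
        # Take the last `dim` fields as the vector so that tokens which
        # themselves contain spaces are still parsed correctly.
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors


def document_vector(tokens, vectors, dim):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Hypothetical 3-dimensional sample standing in for a real embedding file.
sample = io.StringIO("whale 1.0 0.0 2.0\nsea 3.0 2.0 0.0\n")
vectors = load_embeddings(sample, dim=3)
print(document_vector(["whale", "sea", "unknown"], vectors, dim=3))
# prints [2.0, 1.0, 1.0]
```

For the real English file, the same function could be called on an open file handle, for example `load_embeddings(open("data/en_wordembedding.txt", encoding="utf-8"), dim=300)`, where the relative path is an assumption about where the script is run from. Loading the full file into Python lists needs a lot of memory, so restricting it to the corpus vocabulary may be preferable.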
