GitHub - ElleryXii/Author_Identification: A brief exploration into author identification of English and Chinese text.

A brief exploration of author identification of English and Chinese text.
Features: TF-IDF, word counts, and word vectors.
Classifiers: Logistic Regression, Naive Bayes.
For details, see Report.pdf.
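
To make the feature/classifier pairing concrete, here is a minimal sketch of the word count plus Naive Bayes combination, written in plain Python. It is not taken from AuthorIdentify.py: the function names, the whitespace tokenizer, the add-one smoothing, and the toy sentences are all assumptions made for illustration. Chinese text has no spaces between words, so it would need word segmentation before a step like this.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization (English only).
    return text.lower().split()


def train_naive_bayes(texts, authors):
    """Collect per-author word counts, per-author document counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter(authors)
    vocab = set()
    for text, author in zip(texts, authors):
        tokens = tokenize(text)
        word_counts[author].update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict_author(text, model):
    """Return the author with the highest log-posterior under multinomial Naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_author, best_score = None, -math.inf
    for author in doc_counts:
        # Log prior from the share of training documents by this author.
        score = math.log(doc_counts[author] / total_docs)
        total_words = sum(word_counts[author].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing so unseen author/word pairs are not zero.
            count = word_counts[author][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_author, best_score = author, score
    return best_author


# Hypothetical toy data, not from the repository's data folder.
texts = [
    "the whale swam beneath the ship",
    "the sea and the whale and the harpoon",
    "she danced at the ball in a silk gown",
    "the ball was a grand evening of dancing",
]
authors = ["A", "A", "B", "B"]
model = train_naive_bayes(texts, authors)
print(predict_author("a whale near the ship", model))  # A
```

In practice a library implementation of the vectorizer and classifier would likely replace the hand-written counting; the sketch only shows how word counts feed the classifier.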


Word embeddings must be downloaded separately:

English word embedding: download glove.840B.300d.zip from https://nlp.stanford.edu/projects/glove/
Unzip it, rename the word embedding file to "en_wordembedding.txt", and put it in the data folder.

Chinese word embedding: download from https://pan.baidu.com/s/1IG8IxNp2s7vVklz-vyZR9A
Rename the word embedding file to "cn_wordembedding.txt" and put it in the data folder.
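
Once the files are in place, a word vector feature can be built by averaging the vectors of the words in a document. The sketch below is one possible way to do that, not the code in AuthorIdentify.py. It assumes the embedding file is plain text with one "token v1 ... v_dim" entry per line (the GloVe text layout); load_embeddings, average_vector, and the three-dimensional sample are hypothetical. Lines are split from the right because some tokens in glove.840B.300d contain spaces, and lines with the wrong number of fields (such as a "count dim" header, if the Chinese file has one) are skipped.

```python
import io


def load_embeddings(lines, dim):
    """Parse 'token v1 ... v_dim' lines into a dict mapping token to a list of floats."""
    vectors = {}
    for line in lines:
        # Split from the right so tokens that contain spaces stay intact.
        parts = line.rstrip().rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip header or malformed lines
        try:
            vectors[parts[0]] = [float(x) for x in parts[1:]]
        except ValueError:
            continue  # skip lines whose values are not numeric
    return vectors


def average_vector(tokens, vectors, dim):
    """Mean of the vectors of known tokens; a zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Hypothetical 3-dimensional stand-in for data/en_wordembedding.txt.
sample = io.StringIO("whale 1.0 0.0 2.0\nship 3.0 2.0 0.0\n")
vectors = load_embeddings(sample, dim=3)
doc_vector = average_vector("the whale ship".split(), vectors, dim=3)
print(doc_vector)  # [2.0, 1.0, 1.0]
```

With the real English file, the same function could be called as load_embeddings(open("data/en_wordembedding.txt", encoding="utf-8"), dim=300). The dimension of the Chinese embedding is not stated here, so dim would have to be set to match that file.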