Put all dataset files in a directory named "data", then run all cells of the Jupyter notebook. The result file created is "result.json".
python3 hw3.py
For this competition, I used a clustering method built with sklearn tools. Clustering is based on co-author names, with a pre-processing step that merges each first name and last name into a single token.
The clusters are built with the "cosine" metric, and the other parameters were tuned to get a better score (0.594).
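
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in hw3.py: the choice of DBSCAN, the eps value, the helper names merge_name and cluster_papers, and the sample author names are all assumptions made for the example. Only the use of sklearn, the merged first name/last name tokens, and the "cosine" metric come from the description above.

```python
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import CountVectorizer


def merge_name(full_name):
    """Collapse 'First Last' into one token such as 'first_last' (hypothetical helper)."""
    return "_".join(full_name.lower().split())


def cluster_papers(coauthor_lists, eps=0.5):
    """Group papers whose co-author lists are close in cosine distance.

    eps is an illustrative threshold, not the tuned value behind the reported score.
    """
    # Each paper becomes a list of merged name tokens.
    docs = [[merge_name(name) for name in names] for names in coauthor_lists]
    # The analyzer receives an already-tokenized paper and returns it unchanged,
    # so every merged name is counted as one feature.
    vectors = CountVectorizer(analyzer=lambda tokens: tokens).fit_transform(docs)
    # DBSCAN is an assumed algorithm; min_samples=1 means no paper is labeled as noise.
    model = DBSCAN(eps=eps, min_samples=1, metric="cosine")
    return model.fit_predict(vectors.toarray()).tolist()


# Made-up sample data: two pairs of papers sharing co-authors.
papers = [
    ["Ann Lee", "Bob Chan"],
    ["Ann Lee", "Bob Chan", "Carl Diaz"],
    ["Dana Evans", "Eli Ford"],
    ["Eli Ford", "Dana Evans"],
]
labels = cluster_papers(papers)
print(labels)
```

Papers that share most of their co-authors end up with the same label, while papers with no co-author in common have a cosine distance of 1 and fall into separate clusters.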