This competition is hosted and run for free using GitHub and Travis CI!
Deadline: 1st of July 2018
The task is to learn to automatically assign one of the tags (Research, Project, News, Discussion) to each post title on /r/MachineLearning.
The data has been gathered from this public BigQuery dataset.
It is useful to learn a classifier for 2 reasons:
- Automatically classify new posts
- Classify past posts that do not have a tag
All you have to do is implement a `sol.py` (see here for an example) and open a pull request!
See How To Compete for the full flow.
The top 3 at the deadline are listed in this repo:
# | Name | Score |
---|---|---|
1 | pwiercinski_simple_tfidf | 0.6404 |
2 | kootenpv_classical_nlp | 0.6284 |
3 | kootenpv_simple_neural | 0.5629 |
You can also check out the full leaderboard
Maybe the subreddit moderators can use the model?
One of the example solutions takes just 14 lines of Python.
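To give a feel for how small a title classifier can be, here is a minimal sketch of a bag-of-words Naive Bayes model in plain Python. It is not one of the solutions in this repo and does not follow the `sol.py` interface (see the example solution for that); the function names `tokenize`, `train` and `predict` and the toy titles are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a title and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def train(titles, tags):
    """Count words per tag and how often each tag occurs."""
    word_counts = defaultdict(Counter)
    tag_counts = Counter(tags)
    for title, tag in zip(titles, tags):
        word_counts[tag].update(tokenize(title))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, tag_counts, vocab


def predict(model, title):
    """Return the tag with the highest add-one-smoothed log-probability."""
    word_counts, tag_counts, vocab = model
    total = sum(tag_counts.values())
    best_tag, best_score = None, -math.inf
    for tag, n in tag_counts.items():
        score = math.log(n / total)  # log prior of the tag
        denom = sum(word_counts[tag].values()) + len(vocab)
        for word in tokenize(title):
            score += math.log((word_counts[tag][word] + 1) / denom)
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


# Made-up toy data, only for illustration (not from the challenge dataset).
titles = [
    "A new optimizer for deep networks",
    "I built a face detector in PyTorch",
    "Company X releases a new dataset",
    "What is the best way to learn ML?",
]
tags = ["Research", "Project", "News", "Discussion"]
model = train(titles, tags)
print(predict(model, "I built a chatbot in PyTorch"))  # prints "Project"
```

A real submission would train on the challenge data and would likely do better with TF-IDF features or a library classifier, as the leaderboard names suggest.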
- Fork the repository on GitHub.
- `mkdir challenge/solutions/my_solution`
- Adapt `challenge/solutions/kootenpv_classical_nlp/sol.py` and save it in your `my_solution` folder
- Commit and push
- Open up a pull request
- The build system will automatically score your solution
Include changes in the `Dockerfile`. I'm not guaranteeing they will be accepted, but you can try :) Changes will not be "reasonably withheld".
Make changes in your `solutions/` folder and run:
docker build -t reddit_ml_challenge . && docker run --rm -it reddit_ml_challenge
If you do not want to use Docker, you can also try running:
cd challenge
ls -1rtd solutions/** | tail -n 1 > solution_test.txt
pytest -s test_solution.py
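The `ls -1rtd solutions/** | tail -n 1` step picks the most recently modified entry under `solutions/` and writes its path to `solution_test.txt`. If your shell handles `**` differently or you are not on a Unix-like system, a rough Python equivalent could look like the sketch below. It assumes `**` behaves like `*` (top-level entries only, hidden files skipped), and the helper names `latest_entry` and `write_latest` are made up for illustration.

```python
from pathlib import Path


def latest_entry(solutions_dir):
    """Return the most recently modified non-hidden entry inside solutions_dir."""
    entries = [p for p in Path(solutions_dir).iterdir()
               if not p.name.startswith(".")]
    if not entries:
        raise FileNotFoundError(f"no entries found in {solutions_dir}")
    return max(entries, key=lambda p: p.stat().st_mtime)


def write_latest(solutions_dir="solutions", out_file="solution_test.txt"):
    """Write the path of the latest entry to out_file, as the shell one-liner does."""
    latest = latest_entry(solutions_dir)
    Path(out_file).write_text(f"{latest}\n")
    return latest
```

Calling `write_latest()` from inside the `challenge` directory and then running `pytest -s test_solution.py` should mirror the commands above.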
But please do make sure it is compatible with the format of the challenge when submitting.
If you also want to run such a challenge, it would be cool if you refer to this project :)