
kootenpv/reddit_ml_challenge


Reddit Machine Learning: Tagging Challenge


This competition is hosted and run for free using GitHub and Travis CI!

Deadline: 1st of July 2018


Description

The task is to build a model that automatically assigns one of the tags (Research, Project, News, Discussion) to each post title on /r/MachineLearning.

The data was gathered from a public BigQuery dataset.

A classifier is useful for two reasons:

  • Automatically classifying new posts
  • Classifying older posts that do not have a tag
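The challenge does not prescribe a model. To make the task concrete, here is a minimal sketch of a multinomial Naive Bayes title tagger in plain Python (standard library only). This is a swapped-in baseline technique, not the method behind any leaderboard entry, and it does not follow the sol.py interface the test harness expects. The names `tokenize` and `NaiveBayesTagger` and the toy titles are hypothetical and made up for illustration; they are not taken from the challenge data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a title and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


class NaiveBayesTagger:
    """Hypothetical multinomial Naive Bayes over title words, add-one smoothing."""

    def fit(self, titles, tags):
        self.class_counts = Counter(tags)  # number of titles per tag
        self.word_counts = defaultdict(Counter)  # tag -> word frequencies
        for title, tag in zip(titles, tags):
            self.word_counts[tag].update(tokenize(title))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {tag: sum(c.values()) for tag, c in self.word_counts.items()}
        self.n_titles = len(tags)
        return self

    def predict_one(self, title):
        # Words never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(title) if t in self.vocab]
        best_tag, best_score = None, -math.inf
        for tag, n in self.class_counts.items():
            # log P(tag) + sum of log P(word | tag); log space avoids underflow
            score = math.log(n / self.n_titles)
            denom = self.total_words[tag] + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[tag][tok] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag

    def predict(self, titles):
        return [self.predict_one(t) for t in titles]


# Hypothetical toy titles, made up for illustration (not from the challenge data).
train_titles = [
    "new paper on attention mechanisms for neural networks",
    "paper proposes a faster optimizer",
    "I built an open source library for training models",
    "my side project: a web app that generates music",
    "company announces a new research lab",
    "conference deadline announced for next year",
    "what do you think about peer review",
    "how do you choose a learning rate",
]
train_tags = ["Research", "Research", "Project", "Project",
              "News", "News", "Discussion", "Discussion"]

model = NaiveBayesTagger().fit(train_titles, train_tags)
print(model.predict([
    "paper on a new optimizer for neural networks",
    "what do you think about this optimizer",
]))  # ['Research', 'Discussion']
```

Naive Bayes is chosen here only because it fits in a few dozen dependency-free lines; judging by their names, the top leaderboard entries use TF-IDF features and a neural model instead.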

All you have to do is implement a sol.py (see the example solutions in challenge/solutions/) and open a pull request! See "How to compete?" below for the full flow.

Prize

The top 3 at the deadline are listed in this repository:

Rank  Name                       Score
1     pwiercinski_simple_tfidf   0.6404
2     kootenpv_classical_nlp     0.6284
3     kootenpv_simple_neural     0.5629

You can also check out the full leaderboard.

Maybe the subreddit moderators can use the model?

How to compete?

One of the example solutions takes just 14 lines of Python.

  • Fork the repository on GitHub.
  • Create a folder for your solution: mkdir challenge/solutions/my_solution
  • Adapt challenge/solutions/kootenpv_classical_nlp/sol.py and save it in your my_solution folder.
  • Commit and push.
  • Open a pull request.
  • The build system will automatically score your solution.
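The folder setup steps above can also be scripted. The following is a minimal sketch, assuming it is run from the root of a local clone of your fork; `scaffold_solution` is a hypothetical helper, not part of the repository. It only creates the folder and copies the example sol.py as a starting point, leaving the adaptation, commit, and pull request to you.

```python
import shutil
from pathlib import Path


def scaffold_solution(repo_root, name, template="kootenpv_classical_nlp"):
    """Create challenge/solutions/<name>/ and copy the template's sol.py into it.

    Hypothetical helper mirroring the manual mkdir + copy steps; it is not
    part of the repository. Returns the path of the new sol.py.
    """
    solutions = Path(repo_root) / "challenge" / "solutions"
    target = solutions / name
    # exist_ok=False: refuse to overwrite an existing solution folder
    target.mkdir(parents=True, exist_ok=False)
    return Path(shutil.copy(solutions / template / "sol.py", target / "sol.py"))
```

For example, `scaffold_solution(".", "my_solution")` from the clone root creates challenge/solutions/my_solution/sol.py, which you then edit.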

Want to make changes to the build?

Include your changes in the Dockerfile. I can't guarantee they will be accepted, but you can try :) Acceptance will not be "unreasonably withheld".

Local development flow

Make changes in your solutions/ folder and run:

docker build -t reddit_ml_challenge . && docker run --rm -it reddit_ml_challenge

If you do not want to use Docker, you can also run:

cd challenge
ls -1rtd solutions/** | tail -n 1 > solution_test.txt
pytest -s test_solution.py
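The `ls -1rtd solutions/** | tail -n 1` line lists the entries under solutions/ oldest first by modification time and keeps the last one, so solution_test.txt ends up holding the most recently modified solution, presumably the one test_solution.py then scores. A rough Python equivalent is sketched below, assuming bash's default globbing (without `shopt -s globstar`, `**` matches like `*`, one level deep, skipping hidden names). The function names are hypothetical, and ties in modification time may be broken differently than `ls` does.

```python
from pathlib import Path


def latest_solution(solutions_dir="solutions"):
    """Return the most recently modified entry in solutions_dir as a path string.

    Approximates `ls -1rtd solutions/** | tail -n 1` under bash's default
    globbing: one level deep, hidden entries skipped.
    """
    entries = [p for p in Path(solutions_dir).iterdir() if not p.name.startswith(".")]
    if not entries:
        raise FileNotFoundError(f"no solutions found in {solutions_dir!r}")
    newest = max(entries, key=lambda p: p.stat().st_mtime)
    return newest.as_posix()


def write_solution_test(solutions_dir="solutions", out_file="solution_test.txt"):
    """Write the newest solution path, newline-terminated like ls, to out_file."""
    path = latest_solution(solutions_dir)
    Path(out_file).write_text(path + "\n")
    return path
```

Because the newest entry wins, touching any other solution folder after editing yours would make the local test pick that folder instead.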

But please make sure your solution is still compatible with the challenge format when you submit it.

Attribution

If you want to run a similar challenge, it would be cool if you referred to this project :)
