Spam Video Detecting System for Self-media Website Bilibili
We design a spam video detector. Given the information a user sees before watching a video (title, cover image, click count, and type), it outputs the probability that the video is spam.
We define a spam video as follows:
- It has a high click count but few collections or likes.
- It may have an eye-catching cover image containing pornographic or bizarre content.
- It may have an eye-catching title that is misleading or designed to provoke curiosity.
Author: https://github.com/GANGE666
A multiprocess crawler that collects comprehensive information about videos on Bilibili, including avid, title, view count, cover image URL, etc.
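The multiprocess pattern can be sketched roughly as follows. This is only an illustration, not the crawler's actual code: `fetch_video_info` is a hypothetical placeholder that returns an empty record instead of making a network request, and the field names `views` and `cover_url` are assumptions chosen to mirror the fields listed above.

```python
from multiprocessing import Pool


def fetch_video_info(avid: int) -> dict:
    """Hypothetical placeholder for fetching one video's metadata.

    A real crawler would request the video's page or API here and
    parse the response. This stub performs no network access.
    """
    return {"avid": avid, "title": None, "views": None, "cover_url": None}


def crawl(avids, processes: int = 4) -> list:
    """Fetch metadata for many videos using a pool of worker processes."""
    with Pool(processes) as pool:
        # pool.map returns results in the same order as the input ids.
        return pool.map(fetch_video_info, avids)


if __name__ == "__main__":
    records = crawl(range(1, 9), processes=2)
    print(len(records), "records collected")
```

Because each video is fetched independently, the work splits cleanly across processes. A real crawler would also need rate limiting and error handling, which this sketch omits.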
Author: https://github.com/ZhiyangZhang98
Outputs a score indicating whether a video is clickbait. It takes the number of favorites, coins, views, and comments into account.
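One way such a score could be computed is sketched below. The formula is an assumption made for illustration, not the project's actual scoring rule: it treats favorites, coins, and comments as equally weighted engagement, and `expected_rate` is a hypothetical threshold for a healthy engagement-to-view ratio.

```python
def bait_score(views: int, favorites: int, coins: int, comments: int,
               expected_rate: float = 0.05) -> float:
    """Return a score in [0, 1]; higher means more likely clickbait.

    Assumption: a video is suspicious when its engagement per view
    falls well below `expected_rate` (a hypothetical threshold).
    """
    if views <= 0:
        # No views means there is no evidence either way.
        return 0.0
    engagement_rate = (favorites + coins + comments) / views
    # Score is 1 with no engagement and drops to 0 at or above expected_rate.
    return max(0.0, 1.0 - engagement_rate / expected_rate)


if __name__ == "__main__":
    # Many views, almost no engagement: score close to 1.
    print(bait_score(views=100_000, favorites=10, coins=5, comments=20))
    # Modest views, strong engagement: score of 0.
    print(bait_score(views=1_000, favorites=50, coins=30, comments=20))
```

A ratio-based score matches the definition of a spam video as one with many clicks but few collections or likes. Per-signal weights could replace the plain sum if some signals prove more informative.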
Uses a BERT model for natural language processing. Here it is cast as a binary classifier.
Author: https://github.com/KiedaTamashi
BERT source repo: https://github.com/google-research/bert
Uses Resnet_v2_101 to classify video cover images as spam or not spam.
Author: me (adonis-dym) https://github.com/adonis-dym
Author: https://github.com/GANGE666
Uses our results to tag videos in the browser.