NSFW - not safe for work
Trained on 600,000 labeled pictures:
porn
- pornography images
hentai
- hentai images, but also includes pornographic drawings
sexy
- sexually explicit images, but not pornography. Think nude photos, playboy, bikini, etc.
neutral
- safe for work neutral images of everyday things and people
drawings
- safe for work drawings (including anime)
Requirements: PyTorch 0.4.0
#train
python train.py --model resnet101 --epochs 90 --batch-size 512 --checkpoint ./checkpoint --data-dir ./data
#test
python test_confusion_matrix.py
#predict
python predict --model resnet101 --checkpoint ./checkpoint/x
#if your machine is connected to the internet and you don't want to download the images to disk
cat urls.txt | python predict_url.py
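As a rough illustration of the pipe above, here is a minimal sketch of how a script could read URLs from standard input and fetch each image straight into memory instead of writing it to disk. This is not the actual contents of predict_url.py; the helper names `read_urls` and `fetch_image_bytes` and the 10-second timeout are assumptions made for the example.

```python
import io
import sys
import urllib.request


def read_urls(stream):
    """Yield non-empty, stripped lines from a text stream (one URL per line)."""
    for line in stream:
        url = line.strip()
        if url:
            yield url


def fetch_image_bytes(url, timeout=10):
    """Download an image into memory and return it as a file-like object."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return io.BytesIO(response.read())


def main(stream=sys.stdin):
    """Hypothetical entry point: fetch every URL on stdin without touching disk."""
    for url in read_urls(stream):
        buffer = fetch_image_bytes(url)
        # A real script would decode `buffer` (for example with PIL) and run the model here.
        print(url, len(buffer.getvalue()), "bytes")
```

Calling `main()` at the end of such a script would make it usable as `cat urls.txt | python your_script.py`, where `your_script.py` is a placeholder name.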
Special thanks to the nsfw_data_scraper for the training data. If you're interested in a more detailed analysis of types of NSFW images, you could probably use this repo code with this data.
If you want better results, contact me. I can provide the best training data.
The sexy and porn classes are somewhat similar. In my view, it doesn't matter much.
SEXY
NEUTRAL
I have tried various methods, including pretrained models such as resnet and inceptionv3, data augmentation, and fine-tuning.
Here are some tips that had a large effect on the final result:
- Make the batch size bigger. (The bigger the better; I set it to 512 on my P40.)
- Use a pretrained model. (torchvision provides them; a pretrained model helps training converge faster.)
- Lock some layers and fine-tune the FC layer. (After running train_init.py, lock some layers and fine-tune only the FC layer.)
- Adjust the learning rate. (Change the lr dynamically during training to help the optimizer get past saddle points.)
- Select an appropriate pretrained model. (I chose resnet101 because it gave better results than resnet50 or inceptionv3.)
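The layer-locking and learning-rate tips can be sketched roughly as follows. This is only an illustration under assumptions, not the code in train_init.py or train.py: it freezes every layer except a new FC head (the source only says "some layers"), uses a tiny stand-in network (`TinyNet`, hypothetical) so it runs without downloading weights, and picks step decay with made-up values (`lr=0.01`, `step_size=30`, `gamma=0.1`) as one possible way to make the lr dynamic.

```python
import torch
import torch.nn as nn


def freeze_all_but_fc(model, num_classes):
    """Freeze every existing parameter, then attach a new trainable FC head."""
    for param in model.parameters():
        param.requires_grad = False
    # A freshly created layer has requires_grad=True by default.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model


def make_optimizer_and_scheduler(model, lr=0.01, step_size=30, gamma=0.1):
    """SGD over the trainable parameters only, with step-wise lr decay."""
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(trainable, lr=lr, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.StepLR(
        optimizer, step_size=step_size, gamma=gamma
    )
    return optimizer, scheduler


class TinyNet(nn.Module):
    """Stand-in for a torchvision ResNet: a feature extractor plus an `fc` head."""

    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 4)
        self.fc = nn.Linear(4, 1000)

    def forward(self, x):
        return self.fc(torch.relu(self.features(x)))


# Five output classes: porn, hentai, sexy, neutral, drawings.
model = freeze_all_but_fc(TinyNet(), num_classes=5)
optimizer, scheduler = make_optimizer_and_scheduler(model)
```

With torchvision, `TinyNet()` would be replaced by a resnet101 loaded with pretrained weights; how the pretrained weights are requested depends on the torchvision version. In a training loop, `scheduler.step()` would be called once per epoch after the optimizer updates.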
Thanks to my wife, FeiFei Li. She gave me lots of encouragement and made the beautiful logo for the NSFW project.
Thanks to my workmate Kuai Li, who gave me many good suggestions.
If you have good ideas, join us!
You can contact me at:
[email protected]
https://twitter.com/yangbisheng2009
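A sexy & porn classification model trained on 600,000 images. The labels are:
porn
- pornography
hentai
- anime pornography and pornographic drawings
sexy
- sexy images
neutral
- ordinary images
drawings
- ordinary anime and drawings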
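Open-source data kindly provided by: nsfw_data_scraper.
If you want to train a more robust, better-performing model, you can use this data.
If you want to train a high-precision, high-recall model suitable for industrial production environments, you can contact me.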
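In this project I tried a wide variety of methods: histogram, Fourier-transform, and wavelet-transform features combined with traditional machine learning, as well as transfer learning with inceptionv3, resnetX, and other models. My conclusions:
- Make batch_size as large as possible; on my P40 machine I set it to 512.
- I ended up using the resnet101 pretrained model, which performed best among the many options.
- After fine-tuning the whole model, you can lock the first N layers and fine-tune once more. I think you all understand why this gives better results.
- Adjust your learning rate dynamically according to your dataset, model choice, and other factors.
- I used many data augmentation methods, such as color transformation, Gaussian noise, rotation, translation, shearing, and hue/contrast changes. They turned out not to help much (for this project, at least), so in the end I kept only a small subset.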
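This project took a long time, and I ran into all sorts of difficulties in model selection, hyperparameter tuning, and data filtering (of course, I believe that if you want really good results, your data is the most important thing). Thanks to my wife, Ms. FeiFei Li, who made the NSFW logo, the red one at the start of this article. Pretty, isn't it? Thanks also to my partner Kuai Li, who gave me a lot of good suggestions.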