- Microsoft Translator, AI4Bharat, IIT Madras
- Hyderabad, India
- http://anoopk.in
Stars
Chat Templates for 🤗 HuggingFace Large Language Models
Explain complex systems using visuals and simple terms. Help you prepare for system design interviews.
Access a database of word frequencies, in various natural languages.
Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13
Implementation of ChatGPT RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
This repository provides details and links to the ACL anthology corpus/collection including .bib, .pdf and grobid extractions of the pdfs
A list of publicly available audio data that anyone can download for ASR or other speech activities
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
OpenNyAI is a mission aimed at developing open source software and datasets to catalyze the creation of AI-powered solutions to improve access to justice in India. BUILD is the first benchmark data…
Custom work around the Universal Declaration of Human Rights in Unicode
A Python framework for sequence labeling evaluation (named-entity recognition, POS tagging, etc.)
These are lists for a variety of languages containing words that are distinctive to each language.
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
A machine translation reading list maintained by Tsinghua Natural Language Processing Group
Appraise code used as part of WMT21 human evaluation campaign
Exploring representations for word similarity in Hindi
A neural word aligner based on multilingual BERT
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Neural Machine Translation Toolkit by Natlang Laboratory at SFU
A collection of links and notes on forced alignment tools
Article extraction benchmark: dataset and evaluation scripts
Fast and robust date extraction from web pages, with Python or on the command-line