GitHub - ancjainil/Text_Summa: This is a research based on Gujarati Text Summarization

Text_Summarization

This is a research project on Gujarati text summarization. Model: Transformer-XL (attention layer).

Dataset

I used a bifurcated dataset available on the Indic NLP website. In addition, I created a dataset through web scraping, which can also be used for Gujarati text classification.
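A minimal sketch of how article links might be collected from a news page during scraping, using only the Python standard library. The class name `story-link`, the `example.com` URLs, and the helper names `LinkCollector` and `extract_links` are hypothetical; the repository's own scraping notebooks may work differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, optionally filtered by CSS class."""

    def __init__(self, base_url, css_class=None):
        super().__init__()
        self.base_url = base_url
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # The class attribute may hold several space-separated names.
        classes = (attr_map.get("class") or "").split()
        if self.css_class is None or self.css_class in classes:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url, css_class=None):
    parser = LinkCollector(base_url, css_class)
    parser.feed(html)
    return parser.links


# Hypothetical page fragment; a real page would be fetched over HTTP first.
SAMPLE_HTML = """
<div>
  <a class="story-link" href="/news/1">સમાચાર ૧</a>
  <a class="nav" href="/about">About</a>
  <a class="story-link featured" href="https://example.com/news/2">સમાચાર ૨</a>
</div>
"""

links = extract_links(SAMPLE_HTML, "https://example.com", css_class="story-link")
print(links)
```

Filtering on a class keeps navigation and footer links out of the dataset; passing `css_class=None` collects every anchor on the page.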

Goals

Make a Gujarati news dataset (I used web scraping here).

Generate output in Gujarati itself (regardless of accuracy) that is 20% to 30% of the length of the input text. Once this is accomplished, focus on accuracy and on fine-tuning the model.
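The 20% to 30% target can be checked mechanically. The sketch below assumes length is measured in whitespace-separated words, which is an assumption and not something the README specifies; `length_ratio` and `within_target` are hypothetical helper names.

```python
def length_ratio(source_text, summary_text):
    """Return summary length divided by source length, counted in words."""
    source_words = source_text.split()
    summary_words = summary_text.split()
    if not source_words:
        raise ValueError("source text is empty")
    return len(summary_words) / len(source_words)


def within_target(source_text, summary_text, low=0.20, high=0.30):
    """True when the summary is between 20% and 30% of the source length."""
    return low <= length_ratio(source_text, summary_text) <= high


# Placeholder tokens stand in for a real article and its summary.
source = " ".join(f"શબ્દ{i}" for i in range(20))
summary = " ".join(f"શબ્દ{i}" for i in range(5))

print(length_ratio(source, summary))
print(within_target(source, summary))
```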

Input and Output

FUNCTIONALITY

The intention is to create a coherent and fluent summary having only the main points outlined in the document. Automatic text summarization is a common problem in machine learning and natural language processing (NLP).

The summary can be used as a headline for the news article.

LOGIC

--> Web scraping for news dataset generation.

--> Use HTML classes to extract URLs, especially from anchor tags.

--> Pre-process this dataset and divide it into pickle (.pkl) files.

--> For the summarization process, the flow is as follows:

Here I have implemented TF-IDF to predict words around the target words.
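The README does not show how TF-IDF is wired into the pipeline, so the following is only an illustrative sketch of the weighting itself, plus one common use of it: ranking sentences and keeping the highest-scoring ones. The function names, the whitespace tokenization, and the `keep_ratio` parameter are assumptions, not the repository's code.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # term frequency times log inverse document frequency
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_sentences(sentences, keep_ratio=0.3):
    """Keep the highest-scoring sentences, returned in their original order."""
    tokenized = [s.split() for s in sentences]
    scores = [sum(w.values()) for w in tf_idf(tokenized)]
    keep = max(1, round(len(sentences) * keep_ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:keep])]


# Toy token sequences, not grammatical sentences.
sentences = [
    "ગુજરાત સમાચાર",
    "સમાચાર સમાચાર",
    "સમાચાર આજે વરસાદ",
]
print(top_sentences(sentences))
```

Terms that occur in every sentence get a weight of zero, so sentences made up only of common words fall to the bottom of the ranking.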

Folders and files (11 commits)

.ipynb_checkpoints
Pkls
Tokenization
__pycache__
language-model
utils
.gitattributes
README.md
Untitled.ipynb
WikiDesc.ipynb
WikiLinks.ipynb
__init__.py
all_gujarati_wikipedia_links.pkl
prepro.ipynb
preprocessor.py
stemmer.ipynb
