This is a research project on Gujarati text summarization.

ancjainil/Text_Summa

Text_Summarization

This is a research project on Gujarati text summarization. Model: Transformer-XL (attention layer).

Dataset

I have used a bifurcated dataset available on the Indic NLP website. In addition, I have created a dataset by web scraping, which can also be used for Gujarati text classification.
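To make the scraping step concrete, here is a minimal sketch of collecting article links from anchor tags that carry a particular CSS class. It uses only the Python standard library. The class name `story-link`, the base URL, and the sample HTML are hypothetical placeholders, not details of the actual scraper or news site used in this project.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags that carry a given CSS class."""

    def __init__(self, base_url, css_class):
        super().__init__()
        self.base_url = base_url
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if href and self.css_class in classes:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_article_links(html, base_url, css_class):
    """Return absolute URLs of all matching anchor tags in the HTML string."""
    parser = LinkCollector(base_url, css_class)
    parser.feed(html)
    return parser.links


# Hypothetical page fragment; a real page would be downloaded first.
SAMPLE_HTML = """
<div>
  <a class="story-link" href="/news/1">સમાચાર ૧</a>
  <a class="nav" href="/about">About</a>
  <a class="story-link featured" href="https://example.com/news/2">સમાચાર ૨</a>
</div>
"""

links = extract_article_links(SAMPLE_HTML, "https://example.com", "story-link")
print(links)
```

Filtering on the class attribute keeps navigation and footer links out of the dataset; the right class name has to be read off the target site's markup.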

Goals

Build a Gujarati news dataset (I used web scraping here).

Generate output in Gujarati itself (regardless of accuracy) whose length is 20% to 30% of the input text. Once this is accomplished, work on accuracy and focus on fine-tuning the model.
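The length goal can be checked with a few lines of code. The sketch below assumes length is measured in whitespace-separated tokens; the README does not say whether the 20% to 30% target is counted in words, characters, or sentences, so that choice and the function names are illustrative only.

```python
def compression_ratio(source_text, summary_text):
    """Return summary length divided by source length, in whitespace tokens."""
    source_tokens = source_text.split()
    summary_tokens = summary_text.split()
    if not source_tokens:
        raise ValueError("source_text is empty")
    return len(summary_tokens) / len(source_tokens)


def meets_length_goal(source_text, summary_text, low=0.20, high=0.30):
    """Check whether the summary falls inside the target length band."""
    return low <= compression_ratio(source_text, summary_text) <= high


# Placeholder tokens stand in for a real article and its summary.
source = " ".join(f"w{i}" for i in range(20))
summary = " ".join(f"w{i}" for i in range(5))
print(compression_ratio(source, summary), meets_length_goal(source, summary))
```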

Input and Output

image

FUNCTIONALITY

The goal is to produce a coherent, fluent summary that contains only the main points of the document. Automatic text summarization is a common problem in machine learning and natural language processing (NLP).

The summary can be used as a headline for the news article.

LOGIC

--> Web scraping for news dataset generation.

--> Use HTML classes to extract URLs, especially from anchor tags.

--> Pre-process this dataset and split it into pickle (.pkl) files.

--> For the summarization process, the flow is as follows:

image

Here I have implemented TF-IDF to predict words around the target words.
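For readers unfamiliar with TF-IDF, the sketch below shows one common way it is used in summarization: each sentence is treated as a document, words are weighted by TF-IDF, and the highest-scoring sentences are kept. This is an illustrative extractive baseline, not the implementation in this repository. The sentence splitter, the whitespace tokenizer, and the 0.3 ratio are assumptions.

```python
import math
from collections import Counter


def split_sentences(text):
    # Assumption: sentences end with ".", "?", "!" or the danda "।".
    for mark in "?!।":
        text = text.replace(mark, ".")
    return [s.strip() for s in text.split(".") if s.strip()]


def tfidf_sentence_scores(sentences):
    """Score each sentence by the length-normalized sum of its TF-IDF weights."""
    tokenized = [s.split() for s in sentences]
    n = len(tokenized)
    # Document frequency: number of sentences containing each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = sum(
            (count / len(tokens)) * math.log(n / df[word])
            for word, count in tf.items()
        )
        scores.append(score)
    return scores


def summarize(text, ratio=0.3):
    """Keep roughly `ratio` of the sentences, in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    keep = max(1, round(len(sentences) * ratio))
    scores = tfidf_sentence_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    return ". ".join(sentences[i] for i in sorted(top)) + "."


demo = "alpha beta. alpha gamma delta. alpha beta."
print(summarize(demo))
```

Words that occur in every sentence get a weight of zero, so sentences made up of rarer words rank higher. A real Gujarati pipeline would need a proper tokenizer and stop-word handling.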
