Abstractive text summarization on the CNN/DailyMail and Inshorts datasets.
The code uses a standard Transformer architecture as a baseline and a pretrained BERT encoder for the final model. To run successfully, the project needs at least 16 GB of VRAM and 32 GB of RAM, an initialized wandb (Weights & Biases) setup, and some tweaks to the BERT architecture.
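A minimal sketch of the final model's shape, assuming PyTorch and the Hugging Face `transformers` library: a pretrained BERT encoder feeding a vanilla Transformer decoder. The checkpoint name, hidden size, and layer counts here are illustrative assumptions, not the project's exact configuration.

```python
# Sketch: pretrained BERT encoder + Transformer decoder for abstractive summarization.
# Checkpoint and hyperparameters are placeholders, not the repo's actual settings.
import torch.nn as nn
from transformers import BertModel

class BertSummarizer(nn.Module):
    def __init__(self, vocab_size, d_model=768, num_decoder_layers=6, nhead=8):
        super().__init__()
        # Pretrained BERT serves as the source-document encoder.
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        self.embed = nn.Embedding(vocab_size, d_model)
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=num_decoder_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # Encode the article; memory has shape (batch, src_len, d_model).
        memory = self.encoder(input_ids=src_ids, attention_mask=src_mask).last_hidden_state
        tgt = self.embed(tgt_ids)
        # Causal mask so the decoder only attends to earlier summary tokens.
        causal = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.decoder(tgt, memory, tgt_mask=causal)
        return self.out(hidden)
```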
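Since wandb must be initialized before training, a minimal setup might look like the following; the project name and config keys are hypothetical placeholders.

```python
# Sketch of the wandb initialization the project expects (names are placeholders).
import wandb

wandb.init(
    project="abstractive-summarization",  # hypothetical project name
    config={"dataset": "cnn_dailymail", "encoder": "bert-base-uncased", "lr": 3e-5},
)
# During training, metrics can then be logged per step, e.g.:
# wandb.log({"loss": loss.item()})
```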