Repo for NAACL 2025 Paper "Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization"

Alibaba-NLP/CHRONOS

CHRONOS: News Timeline Summarization

📑 Paper: https://arxiv.org/abs/2501.00888

🌏 Chinese Web Demo: https://modelscope.cn/studios/vickywu1022/CHRONOS

🚀 Overview

  • We propose CHRONOS, a novel retrieval-based approach to Timeline Summarization (TLS) by iteratively posing questions about the topic and the retrieved documents to generate chronological summaries.
  • We construct an up-to-date dataset for open-domain TLS, which surpasses existing public datasets in both size and timeline duration.
  • Experiments demonstrate that our method is effective on open-domain TLS and achieves results comparable to state-of-the-art closed-domain TLS methods, with significant improvements in efficiency and scalability.

⚗️ OPEN-TLS Dataset

We release our Open-TLS dataset for open-domain Timeline Summarization.

The target news queries are listed in news_keywords.py, and the ground-truth timeline for each query is stored in data/open/{NEWS_KEYWORD}/timelines.jsonl in the following format:

[["YYYY-MM-DDT00:00:00", ["", "", ""]]]
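
The format above can be read with a few lines of Python. The following is a minimal sketch, not code from this repo: it assumes each line of timelines.jsonl holds one timeline as a JSON list of [date, [sentences]] pairs, and the helper name parse_timelines and the sample events are made up for illustration.

```python
import json
from datetime import datetime


def parse_timelines(lines):
    """Parse JSONL lines into timelines.

    Assumption: each non-empty line is one timeline of the form
    [["YYYY-MM-DDT00:00:00", ["sentence", ...]], ...].
    """
    timelines = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entries = json.loads(line)
        timeline = [
            (datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%S"), list(sentences))
            for date_str, sentences in entries
        ]
        timelines.append(timeline)
    return timelines


# Made-up sample line in the documented format (not taken from the dataset).
sample = [
    '[["2024-01-05T00:00:00", ["Event A happens.", "Officials respond."]], '
    '["2024-02-10T00:00:00", ["Event B follows."]]]'
]

for date, sentences in parse_timelines(sample)[0]:
    print(date.date(), len(sentences))
```

To read a real file, pass an open file handle for data/open/{NEWS_KEYWORD}/timelines.jsonl instead of the sample list, since iterating over a file yields its lines.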

Statistics of Open-TLS: [figure: open]

🛠 Running CHRONOS

Step 1. Dependencies

pip install -r requirements.txt

Step 2. Example Question Generation

The second step is to construct a topic-questions example pool for datasets in data/.

python question_exampler.py

Or, you can use our provided data/question_examples.json, which contains examples for the Crisis, T17 and Open-TLS datasets.

Step 3. Running CHRONOS

We have released the CHRONOS code for the open-domain Timeline Summarization task. You may also refer to our modelscope repo to build an app with streamlit.

Replacing Keys

Before running, replace the placeholders in src/model.py with your own API keys to call either Qwen or GPT models.

DASHSCOPE_API_KEY = "YOUR_API_KEY"
OPENAI_API_KEY = "YOUR_API_KEY"

Also replace the placeholder in src/searcher.py with your own Bing Web Search API key so that CHRONOS can search for news on the Internet.

BING_SEARCH_KEY = "YOUR_API_KEY"

If you want CHRONOS to use the full page instead of only the search snippet, also replace the placeholder in src/reader.py with your own Jina API key.

JINA_API_KEY = "YOUR_API_KEY"
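
If you would rather not commit keys to source files, one possible alternative is to read them from environment variables. This is a sketch only and is not how the repo is set up: the helpers load_key and missing_keys are hypothetical, and using it would mean editing the assignments in src/model.py, src/searcher.py and src/reader.py yourself.

```python
import os

PLACEHOLDER = "YOUR_API_KEY"


def load_key(name, placeholder=PLACEHOLDER):
    """Return the key from the environment, or the placeholder if it is unset."""
    return os.environ.get(name, placeholder)


def missing_keys(keys, placeholder=PLACEHOLDER):
    """List the names of keys that still hold the placeholder value."""
    return sorted(name for name, value in keys.items() if value == placeholder)


# Variable names match the constants shown above; the environment
# variable names are an assumption chosen to mirror them.
keys = {
    "DASHSCOPE_API_KEY": load_key("DASHSCOPE_API_KEY"),
    "OPENAI_API_KEY": load_key("OPENAI_API_KEY"),
    "BING_SEARCH_KEY": load_key("BING_SEARCH_KEY"),
    "JINA_API_KEY": load_key("JINA_API_KEY"),
}

print("Keys still unset:", missing_keys(keys))
```

Checking for leftover placeholders before a run makes a forgotten key visible up front rather than as a failed API call partway through.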

Running Script

To experiment with the Open-TLS dataset, run:

python main.py \
      --model_name "$model" \
      --max_round "$round" \
      --dataset open \
      --output "$output_dir" \
      --question_exs

where "$model" names the model to call, "$round" is the maximum number of self-questioning rounds, and "$output_dir" sets the output directory, which will contain: (1) retrieved news, (2) generated timelines and (3) evaluation scores.
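
To compare several values of the maximum round, the command above can be assembled programmatically. The sketch below only builds and prints the command lines. It assumes --question_exs is a switch that takes no value (as it appears above), and the helper build_command, the model name placeholder and the output paths are hypothetical.

```python
import shlex
import sys


def build_command(model, max_round, output_dir, dataset="open", use_question_examples=True):
    """Build the main.py invocation using the flags from the command shown above."""
    cmd = [
        sys.executable, "main.py",
        "--model_name", model,
        "--max_round", str(max_round),
        "--dataset", dataset,
        "--output", output_dir,
    ]
    if use_question_examples:
        # Assumption: --question_exs is a boolean switch with no argument.
        cmd.append("--question_exs")
    return cmd


# Hypothetical sweep: one output directory per round setting.
for rounds in (1, 2, 3):
    cmd = build_command("YOUR_MODEL_NAME", rounds, f"outputs/round_{rounds}")
    print(shlex.join(cmd))
    # To launch the run, import subprocess and call: subprocess.run(cmd, check=True)
```

Writing each setting to its own output directory keeps the retrieved news, timelines and scores of different runs from overwriting one another.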

📝 Citation

@article{wu2025unfoldingheadlineiterativeselfquestioning,
      title={Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization}, 
      author={Weiqi Wu and Shen Huang and Yong Jiang and Pengjun Xie and Fei Huang and Hai Zhao},
      year={2025},
      eprint={2501.00888},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.00888}, 
}
