txye/Summarization

Summarization

data

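Holds the cv and train data.
Inside both cv and train: docs contains the input multi-document sets (with tags removed), model contains the ground truth for each document set, and headline contains the headline of each document (extracted from the tags in the original files).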

article.py

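Merges the articles in a cluster, filters out stray symbols, removes redundant whitespace, splits the text into sentences, and then computes a vector for each sentence (TF-ISF, computed with sklearn's TF-IDF package, with each sentence treated as a document).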
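
The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's article.py: the function names (clean_text, split_sentences, tf_isf_vectors, cluster_to_vectors), the character whitelist, and the regex-based sentence splitter are all assumptions. The TF-ISF weights are computed by hand here, with a smoothed inverse sentence frequency, instead of through sklearn, so that the weighting itself is visible.

```python
import math
import re
from collections import Counter


def clean_text(text):
    """Drop stray symbols and collapse repeated whitespace."""
    # Keep word characters, whitespace, and basic punctuation (assumed whitelist).
    text = re.sub(r"[^\w\s.,;:!?'()-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tf_isf_vectors(sentences):
    """Return (vocab, vectors) with one TF-ISF vector per sentence."""
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(sentences)
    # Sentence frequency: number of sentences that contain each term.
    sf = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed inverse sentence frequency, so no present term gets weight 0.
        vectors.append(
            [tf[t] * (math.log((1 + n) / (1 + sf[t])) + 1) for t in vocab]
        )
    return vocab, vectors


def cluster_to_vectors(articles):
    """Merge a cluster's articles, clean them, split, and vectorize."""
    merged = clean_text(" ".join(articles))
    sentences = split_sentences(merged)
    vocab, vectors = tf_isf_vectors(sentences)
    return sentences, vocab, vectors
```

A term that appears in few sentences receives a higher weight than one that appears in many. Fitting sklearn's TfidfVectorizer on the list of sentences, instead of on whole documents, would give a comparable sentence-level weighting.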

summary.py

Generates the summary, mainly using a submodular method.
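
As an illustration of what a submodular approach to extractive summarization can look like, the sketch below greedily maximizes a saturated coverage function under a length budget. The README does not say which objective summary.py uses, so the objective, the alpha parameter, the budget, and every function name here are assumptions, not the repository's code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coverage(selected, sim, alpha):
    """f(S) = sum_i min(sum_{j in S} sim[i][j], alpha * sum_j sim[i][j])."""
    total = 0.0
    for row in sim:
        total += min(sum(row[j] for j in selected), alpha * sum(row))
    return total


def greedy_summary(vectors, lengths, budget, alpha=0.5):
    """Greedily pick sentence indices by coverage gain per unit length.

    lengths[i] is the (positive) cost of sentence i; the total cost of the
    selected sentences never exceeds budget.
    """
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    selected, used = [], 0
    candidates = set(range(n))
    while candidates:
        base = coverage(selected, sim, alpha)
        best, best_ratio = None, 0.0
        for k in sorted(candidates):
            if used + lengths[k] > budget:
                continue  # sentence does not fit in the remaining budget
            gain = coverage(selected + [k], sim, alpha) - base
            ratio = gain / lengths[k]
            if ratio > best_ratio:
                best, best_ratio = k, ratio
        if best is None:
            break  # nothing fits, or nothing adds coverage
        selected.append(best)
        used += lengths[best]
        candidates.discard(best)
    return sorted(selected)
```

Because each sentence's contribution saturates at a fraction alpha of its total similarity mass, adding a near-duplicate of an already selected sentence yields little or no gain, which pushes the greedy step toward sentences that cover different content.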

About

Multi-document summarization
