GitHub - ppwwyyxx/SoPaper at 7bc7d02076436656276c390535577e699f65b857


SoPaper, So Easy

This is a project designed for researchers to conveniently access papers they need.

A command line tool, paper-downloader.py, is included. Given the title of a paper, it automatically searches for the paper on the Internet and downloads it. The downloaded paper gets a readable file name (I wrote the tool because I was tired of seeing random strings as file names). It mainly supports searching for papers in computer science.
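To illustrate what a "readable file name" could look like, here is a minimal sketch that derives a file name from a paper title. The helper title_to_filename and its parameters are hypothetical and are not taken from SoPaper's code; the sketch is written for Python 3 and uses only the standard library.

```python
import re
import unicodedata


def title_to_filename(title, ext=".pdf", max_len=120):
    """Turn a paper title into a filesystem-friendly file name (hypothetical helper)."""
    # Fold accented characters to plain ASCII where possible.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace characters that are unsafe in file names on common filesystems.
    cleaned = re.sub(r'[\\/:*?"<>|]+', " ", ascii_title)
    # Collapse runs of whitespace and trim the ends.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Keep the name reasonably short, then add the extension.
    return cleaned[:max_len].rstrip() + ext


print(title_to_filename("Distinctive image features from scale-invariant keypoints"))
```

The title is kept almost verbatim so the file stays easy to recognise; only characters that commonly break file systems are replaced.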

How to Use

To run the command line tool, you'll need the following installed:

requests (pip install --user requests)
BeautifulSoup4 (pip install --user beautifulsoup4)
termcolor (pip install --user termcolor)
pdftk command line executable.
poppler-utils (optional)

Usage:

$ ./paper-downloader.py --help
$ ./paper-downloader.py "Distinctive image features from scale-invariant keypoints"
$ ./paper-downloader.py "http://arxiv.org/abs/1506.03184"

NOTE: If you are not on a school network, you may need to set a proxy through the environment variables http_proxy and https_proxy to be able to download from certain sites (such as 'dl.acm.org').
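As a rough illustration of how these two variables are picked up, the sketch below sets them for the current process and asks the Python 3 standard library which proxies it detects. The proxy address is a made-up placeholder and use_proxy is a hypothetical helper. The requests library also honours these variables by default, so exporting them in the shell before running the tool should have the same effect.

```python
import os
import urllib.request


def use_proxy(proxy_url):
    """Set proxy variables for this process and return what the stdlib detects."""
    # The same proxy is used for both schemes here; that is an assumption.
    os.environ["http_proxy"] = proxy_url
    os.environ["https_proxy"] = proxy_url
    # getproxies_environment() collects every *_proxy variable from the environment.
    return urllib.request.getproxies_environment()


# Hypothetical proxy address; replace it with the one your institution provides.
detected = use_proxy("http://proxy.example.edu:8080")
print(detected.get("http"), detected.get("https"))
```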

Features

The searcher module will fuzzy search and analyse results in

Google Scholar
Google

and the fetcher module will further analyse the results and download papers from the following sources:

Searcher and Fetcher are extensible to support more resources.
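One common way to make such modules extensible is a small registry that maps URL patterns to fetcher functions. The sketch below shows that pattern only; register_fetcher, pick_fetcher and the stub fetchers are hypothetical names and may differ from how SoPaper actually registers its searchers and fetchers.

```python
import re

# Hypothetical registry: (compiled URL pattern, fetcher function) pairs.
FETCHERS = []


def register_fetcher(url_pattern):
    """Decorator that registers a fetcher for URLs matching url_pattern."""
    def decorator(func):
        FETCHERS.append((re.compile(url_pattern), func))
        return func
    return decorator


@register_fetcher(r"^https?://arxiv\.org/(abs|pdf)/")
def fetch_arxiv(url):
    # A real fetcher would download the PDF; this stub only names the source.
    return "arxiv"


@register_fetcher(r"^https?://dl\.acm\.org/")
def fetch_acm(url):
    # Stub for illustration only.
    return "dl.acm.org"


def pick_fetcher(url):
    """Return the first registered fetcher whose pattern matches url, or None."""
    for pattern, func in FETCHERS:
        if pattern.search(url):
            return func
    return None


print(pick_fetcher("http://arxiv.org/abs/1506.03184").__name__)
```

With this layout, supporting a new site means adding one decorated function; the dispatch code stays unchanged.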

The command line tool downloads the paper directly and gives it a clean filename. All downloaded papers are compressed using ps2pdf from poppler-utils, if available.
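The "compress if available" behaviour can be sketched as below. This is an assumption about the general approach, not SoPaper's actual code: compress_pdf is a hypothetical helper, and it assumes ps2pdf (which usually ships with Ghostscript) accepts a PDF as input and writes the result to the given output path.

```python
import shutil
import subprocess


def compress_pdf(src, dst, tool="ps2pdf"):
    """Compress src into dst with an external tool; return False if the tool is missing."""
    executable = shutil.which(tool)
    if executable is None:
        # Tool not installed: skip compression and keep the original file.
        return False
    # Run "<tool> <input> <output>"; check=True raises CalledProcessError on failure.
    subprocess.run([executable, src, dst], check=True)
    return True
```

A caller would typically keep the original file whenever the function returns False or raises.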

The server provides:

RESTful APIs on papers
Interactive paper reading UI supported by pdf2htmlEX

The command line tool is sufficient on its own. If you'd like to play with the server, you'll need:

Python2 with virtualenv. Python headers are needed (python-dev on debian/ubuntu).
ghostscript
libcurl (libcurl4-{openssl,nss,gnutls}-dev on debian/ubuntu)
xapian (libxapian-dev & python2-xapian on debian/ubuntu)
pdf2htmlEX. See its download guide
poppler-utils, which provides the 'pdftotext' command line util

Note: if you need to run the server on debian/ubuntu, make sure the 'python2-bson' package is not installed.

TODO

Fetcher dedup: when both the arxiv abs and pdf links appear in search results, the page is downloaded twice (maybe add a cache for requests)
Don't trust arxiv link from google scholar
Is title correctly updated for dlacm?
Extract title from bibtex -- more accurate?
Fetcher for other sites
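The dedup item could be approached with a cache keyed by a normalized paper identifier. The following is only a sketch of that idea under stated assumptions: cache_key, CachedFetcher and the download callback are hypothetical, and treating the arxiv abs and pdf URLs of one paper as the same cache entry is an assumption about what the fetcher needs.

```python
import re

# Matches arxiv abs/pdf URLs and captures the paper identifier.
ARXIV_URL = re.compile(r"arxiv\.org/(?:abs|pdf)/([^?#]+?)(?:\.pdf)?(?:[?#].*)?$")


def cache_key(url):
    """Map arxiv abs/pdf URLs of one paper to the same key; other URLs map to themselves."""
    match = ARXIV_URL.search(url)
    if match:
        return "arxiv:" + match.group(1)
    return url


class CachedFetcher:
    """Wrap a download function so each paper is fetched at most once."""

    def __init__(self, download):
        self._download = download  # callable taking a URL (hypothetical interface)
        self._cache = {}

    def fetch(self, url):
        key = cache_key(url)
        if key not in self._cache:
            self._cache[key] = self._download(url)
        return self._cache[key]


print(cache_key("http://arxiv.org/abs/1506.03184"))
print(cache_key("http://arxiv.org/pdf/1506.03184.pdf"))
```

Versioned identifiers (for example a trailing "v2") would produce a separate key in this sketch; whether they should be merged is left open.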
