archives: add full archives list, use yaml and convert to json before compose build

use ARCHIVE_JSON to specify file to be mapped to /archives.json as volume
pywb: support custom unrewritten_url paths
ikreymer committed Nov 30, 2015
1 parent e3dce7f commit 2aa4190
Showing 6 changed files with 160 additions and 100 deletions.
88 changes: 0 additions & 88 deletions archives.json

This file was deleted.

145 changes: 145 additions & 0 deletions archives.yaml
@@ -0,0 +1,145 @@
# Wayback Based
- id: ia
  name: Internet Archive
  timegate: http://web.archive.org/web/
  timemap: http://web.archive.org/web/timemap/link/

- id: ba
  name: Bibliotheca Alexandrina Web Archive
  timegate: http://web.archive.bibalex.org/web/
  timemap: http://web.archive.bibalex.org/web/timemap/link/

- id: blarchive
  name: UK Web Archive
  timegate: http://www.webarchive.org.uk/wayback/archive/
  timemap: http://www.webarchive.org.uk/wayback/archive/timemap/link/

- id: loc
  name: Library of Congress
  timegate: http://webarchive.loc.gov/all/
  timemap: http://webarchive.loc.gov/all/timemap/link/

- id: archiveit
  name: Archive-It
  timegate: http://wayback.archive-it.org/all/
  timemap: http://wayback.archive-it.org/all/timemap/link/

- id: is
  name: Icelandic Web Archive
  timegate: http://wayback.vefsafn.is/wayback/
  timemap: http://wayback.vefsafn.is/wayback/timemap/link/

- id: swa
  name: Stanford Web Archive
  timegate: https://swap.stanford.edu/
  timemap: https://swap.stanford.edu/timemap/link/

# LANL Proxy
- id: es
  name: Estonian Web Archive
  timegate: http://timetravel.mementoweb.org/es/timegate/
  timemap: http://timetravel.mementoweb.org/es/timemap/link/
  unrewritten_url: http://veebiarhiiv.nlib.ee/a/{timestamp}id_/{url}

- id: si
  name: Slovenian Web Archive
  timegate: http://timetravel.mementoweb.org/si/timegate/
  timemap: http://timetravel.mementoweb.org/si/timemap/link/
  unrewritten_url: http://nukrobi2.nuk.uni-lj.si:8080/wayback/{timestamp}id_/{url}

- id: nara
  name: NARA Web Archive
  timegate: http://timetravel.mementoweb.org/nara/timegate/
  timemap: http://timetravel.mementoweb.org/nara/timemap/link/
  unrewritten_url: '{archivehost}/{timestamp}id_/{url}'


# other
- id: ukparliament
  name: UK Parliament Web Archive
  timegate: http://webarchive.parliament.uk/timegate/
  timemap: http://webarchive.parliament.uk/timemap/
  unrewritten_url: http://webarchive.parliament.uk/frame/{timestamp}/{url}

- id: proni
  name: PRONI Web Archive
  timegate: http://webarchive.proni.gov.uk/timegate/
  timemap: http://webarchive.proni.gov.uk/timemap/
  unrewritten_url: http://webarchive.proni.gov.uk/frame/{timestamp}/{url}


# Rhizome Collections
- id: excellences-and-perfections
  name: 'Excellences & Perfections by Amalia Ulman'
  timegate: http://webenact.rhizome.org/excellences-and-perfections/
  timemap: http://webenact.rhizome.org/excellences-and-perfections/timemap/*/

- id: my-body-a-wunderkammer
  name: "my body -- a Wunderkammer by Shelley Jackson"
  timegate: http://webenact.rhizome.org/my-body-a-wunderkammer/
  timemap: http://webenact.rhizome.org/my-body-a-wunderkammer/timemap/*/

- id: beautiful-frog
  name: 'Beautiful Frog by Porpentine'
  timegate: http://webenact.rhizome.org/beautiful-frog/
  timemap: http://webenact.rhizome.org/beautiful-frog/timemap/*/

- id: howling-dogs
  name: 'Howling Dogs by Porpentine'
  timegate: http://webenact.rhizome.org/howling-dogs/
  timemap: http://webenact.rhizome.org/howling-dogs/timemap/*/

- id: either-we-inspire-or-we-expire
  name: 'Either we inspire or we expire by Liam Gillick and Nate Silver'
  timegate: http://webenact.rhizome.org/either-we-inspire-or-we-expire/
  timemap: http://webenact.rhizome.org/either-we-inspire-or-we-expire/timemap/*/

- id: do-you-want-love-or-lust
  name: 'Do you want love or lust? by Claude Closky'
  timegate: http://webenact.rhizome.org/do-you-want-love-or-lust/
  timemap: http://webenact.rhizome.org/do-you-want-love-or-lust/timemap/*/

- id: smiling-at-the-past
  name: 'Smiling at the Past by Constant Dullaart'
  timegate: http://webenact.rhizome.org/smiling-at-the-past/
  timemap: http://webenact.rhizome.org/smiling-at-the-past/timemap/*/

- id: vvork
  name: 'VVORK by Rhizome'
  timegate: http://webenact.rhizome.org/vvork/
  timemap: http://webenact.rhizome.org/vvork/timemap/*/

- id: yelp-droitcour
  name: 'Yelp! reviews by Brian Droitcour'
  timegate: http://webenact.rhizome.org/yelp-droitcour/
  timemap: http://webenact.rhizome.org/yelp-droitcour/timemap/*/

- id: keeping-up-appearances
  name: 'keeping up appearances by Mendi+Keith Obadike'
  timegate: http://webenact.rhizome.org/keeping-up-appearances/
  timemap: http://webenact.rhizome.org/keeping-up-appearances/timemap/*/

- id: big-data-little-narration
  name: 'Big Data, Little Narration by Dragan Espenschied'
  timegate: http://webenact.rhizome.org/big-data-little-narration/
  timemap: http://webenact.rhizome.org/big-data-little-narration/timemap/*/

- id: v4ult
  name: 'V4ULT by V4ULT'
  timegate: http://webenact.rhizome.org/v4ult/
  timemap: http://webenact.rhizome.org/v4ult/timemap/*/

- id: sqrrl
  name: 'SQRRL by John Russel'
  timegate: http://webenact.rhizome.org/sqrrl/
  timemap: http://webenact.rhizome.org/sqrrl/timemap/*/

- id: untitled-scrollbars
  name: 'Scrollbar Composition by Jan Robert Leegte'
  timegate: http://webenact.rhizome.org/untitled-scrollbars/
  timemap: http://webenact.rhizome.org/untitled-scrollbars/timemap/*/

- id: queers-in-love-at-the-end-of-the-world
  name: 'queers in love at the end of the world by Anna Anthropy'
  timegate: http://webenact.rhizome.org/queers-in-love-at-the-end-of-the-world/
  timemap: http://webenact.rhizome.org/queers-in-love-at-the-end-of-the-world/timemap/*/
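
Because the docker-compose changes in this commit mount the generated JSON into both the pywb and memgator containers, a quick sanity check on the list can catch copy-paste slips such as a repeated id or name. The following is only a sketch: `check_archives` and `REQUIRED_KEYS` are hypothetical names that are not part of this commit, and the sample entries use placeholder URLs.

```python
import json

# Keys that every entry in archives.yaml carries; unrewritten_url is optional.
REQUIRED_KEYS = ("id", "name", "timegate", "timemap")


def check_archives(entries):
    """Return a list of problem descriptions (empty if none were found)."""
    problems = []
    first_seen = {"id": {}, "name": {}}
    for index, entry in enumerate(entries):
        label = entry.get("id") or "entry #%d" % index
        # Every entry needs a non-empty value for each required key.
        for key in REQUIRED_KEYS:
            if not entry.get(key):
                problems.append("%s: missing '%s'" % (label, key))
        # Report ids and names that repeat an earlier entry.
        for key in ("id", "name"):
            value = entry.get(key)
            if not value:
                continue
            if value in first_seen[key]:
                problems.append("%s: %s %r already used by %s"
                                % (label, key, value, first_seen[key][value]))
            else:
                first_seen[key][value] = label
    return problems


# Hypothetical sample data with placeholder URLs, only to exercise the check.
sample = json.loads("""
[
  {"id": "a", "name": "Archive A",
   "timegate": "http://a.example/", "timemap": "http://a.example/timemap/"},
  {"id": "b", "name": "Archive A",
   "timegate": "http://b.example/", "timemap": "http://b.example/timemap/"}
]
""")
for problem in check_archives(sample):
    print(problem)
```

Running the same function over the parsed contents of the generated JSON file would flag any entry whose name duplicates an earlier one.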
14 changes: 7 additions & 7 deletions docker-compose.yml
@@ -2,10 +2,10 @@ pywb:
   build: ./pywb
   restart: always
   environment:
-    - ARCHIVE_JSON=${ARCHIVE_JSON}
+    - ARCHIVE_JSON=/archives.json

   volumes:
-    - ./archives.json:/archives.json
+    - ${ARCHIVE_JSON}:/archives.json

   # for init order
   volumes_from:
@@ -15,12 +15,12 @@ memgator:
   image: ibnesayeed/memgator:master
   restart: always

-  command: --arcs=${ARCHIVE_JSON} server
-  #ports:
-  # - 1208:1208
+  command: --arcs=/archives.json server
+  ports:
+    - 1209:1208

   volumes:
-    - ./archives.json:/archives.json
+    - ${ARCHIVE_JSON}:/archives.json

 nginx:
   build: ./nginx
@@ -33,7 +33,7 @@ nginx:
   ports:
     - 80:80
     - 1208:1208
-    #- 1210:1210
+    # - 1210:1210

 redis:
   image: redis:latest
6 changes: 4 additions & 2 deletions pywb/archivereplayview.py
@@ -174,7 +174,7 @@ def _get_urls_to_try(self, cdx, skip_hosts, wbrequest):

         #full_url = self.archive_template + wbrequest.coll + '/' + cdx['timestamp'] + 'id_/' + url
         full_url = self.archive_template.format(timestamp=cdx['timestamp'],
-                                         url=cdx['url'])
+                                                url=cdx['url'])

         try_urls = [full_url]
         return try_urls, self.archive_template, self.archive_name
@@ -208,7 +208,9 @@ def load_archive_info_json(self, url):
             id_ = arc['id']
             name = arc['name']
             uri = arc['timegate']
-            unrewritten_url = uri + '{timestamp}id_/{url}'
+            unrewritten_url = arc.get('unrewritten_url')
+            if not unrewritten_url:
+                unrewritten_url = uri + '{timestamp}id_/{url}'

             self.archive_infos[id_] = {'uri': uri,
                                        'name': name,
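
The fallback added to `load_archive_info_json`, together with the `str.format` call in `_get_urls_to_try`, can be sketched in isolation as follows. This is a standalone illustration rather than pywb's actual class: `resolve_template` and `build_replay_url` are hypothetical helper names, the entries are plain dicts shaped like those in archives.yaml, and the timestamp and target URL are made-up sample values.

```python
# Suffix appended to the timegate when no explicit template is configured.
DEFAULT_SUFFIX = "{timestamp}id_/{url}"


def resolve_template(arc):
    """Use the entry's unrewritten_url if present, else derive it from the timegate."""
    template = arc.get("unrewritten_url")
    if not template:
        template = arc["timegate"] + DEFAULT_SUFFIX
    return template


def build_replay_url(arc, timestamp, url):
    """Fill in the template with str.format, passing only timestamp and url."""
    return resolve_template(arc).format(timestamp=timestamp, url=url)


# Entry without a custom path: falls back to the Wayback-style default.
ia = {"id": "ia", "timegate": "http://web.archive.org/web/"}

# Entry with a custom path, as in the "other" group of archives.yaml.
proni = {
    "id": "proni",
    "timegate": "http://webarchive.proni.gov.uk/timegate/",
    "unrewritten_url": "http://webarchive.proni.gov.uk/frame/{timestamp}/{url}",
}

print(build_replay_url(ia, "20151130000000", "http://example.com/"))
print(build_replay_url(proni, "20151130000000", "http://example.com/"))
```

In this sketch, `str.format` raises `KeyError` for any placeholder other than the two that are supplied, so a template such as `'{archivehost}/{timestamp}id_/{url}'` would need that extra value passed in as well.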
4 changes: 2 additions & 2 deletions pywb/config.yaml
@@ -32,8 +32,8 @@ collections:

 # Specify memento archivelist XML
 #memento_archive_xml: 'http://labs.mementoweb.org/aggregator_config/archivelist.xml'
-#memento_archive_json: '/archives.json'
-memento_archive_json: $ARCHIVE_JSON
+memento_archive_json: '/archives.json'
+#memento_archive_json: $ARCHIVE_JSON

 reverse_proxy_prefix: http://netcapsule_nginx_1:1210/

3 changes: 2 additions & 1 deletion run-local.sh
@@ -1,4 +1,5 @@
 #export ARCHIVE_JSON=http://webenact.rhizome.org/collinfo.json
-export ARCHIVE_JSON=/archives.json
+export ARCHIVE_JSON=./archives.gen.json
+python -c "import yaml; import json; data = yaml.load(open('archives.yaml')); open('$ARCHIVE_JSON', 'w').write(json.dumps(data))"
 docker-compose --x-networking build
 docker-compose --x-networking up -d
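
The YAML-to-JSON one-liner in run-local.sh can be written out as a small script for readability. This is a sketch under assumptions: it requires the third-party PyYAML package, it substitutes `yaml.safe_load` for the script's `yaml.load` (recent PyYAML releases require an explicit loader argument for `yaml.load`), and `convert` and `convert_file` are hypothetical function names. The default file names follow the ones used in the script.

```python
import json

import yaml  # PyYAML, a third-party package


def convert(yaml_text):
    """Parse YAML text and return the equivalent JSON text."""
    data = yaml.safe_load(yaml_text)
    return json.dumps(data, indent=2)


def convert_file(src="archives.yaml", dest="archives.gen.json"):
    """Read the YAML archive list and write it back out as JSON."""
    with open(src) as infile:
        json_text = convert(infile.read())
    with open(dest, "w") as outfile:
        outfile.write(json_text)
```

Calling `convert_file()` from the repository root before `docker-compose build` would mirror what the shell script does, with the output path matching the value exported as `ARCHIVE_JSON`.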
