forked from acl-org/acl-anthology
Mirroring infrastructure (acl-org#1124)
* Remove build_hugo, closes acl-org#1089

  It has been unused since the introduction of the Makefile ages ago.

* Makefile: documentation, remove unneeded dependencies

  hugo already depends on bibtex, mods, and endnote, so site does not need to depend on them as well.

* Mirroring infrastructure

  This commit adds
  - a script to download all ACL files not in the git repo
  - a configurable website directory, including the ability to host in any subdirectory (e.g. /anthology/, but also top-level)
  - a reworked anthology-files directory, symlinked into the anthology web directory (and automatically adapted in the .htaccess file)
  - renamed constants in anthology/data.py, including a facility to set them via environment variables
  - additional Makefile documentation
  - checks in the Makefile and fewer dependencies on phony tasks

  create_mirror.py reads a list of anthology XML files, checks the checksums of files that were already downloaded, downloads new ones, and verifies their checksums so that only correct files end up in the download directory. It can be parallelized by calling it several times with different sets of XML files.

  The website is now built under a "website" directory; otherwise, top-level builds would not be separated from other generated data. The Makefile creates a symlink inside the anthology directory that points to the path where the anthology-files will be on the server. Apache needs to follow symlinks for this to work.

  The environment variable ANTHOLOGY_PREFIX defines the host and directory under which the site is supposed to be hosted. The constants in anthology/data.py have been renamed (a long-standing TODO), and the canonical URL template has been separated from the host and prefix used for hosting a copy. It is therefore now possible to host a mirror of only the HTML, or of the HTML plus files.

* Workflow adjustments

  No longer use GitHub secrets for mirroring. The GitHub "publish" workflow now uses a Make target to sync to the main server at aclweb.org. Added a workflow that uses the new mirroring infrastructure to automatically create and (untested) remove branch previews at `https://aclanthology.org/previews/{branchname}`.

Co-authored-by: Matt Post <[email protected]>
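The commit message describes create_mirror.py only in prose. The loop it outlines (verify what is already on disk, download what is missing or wrong, and keep a download only if its checksum matches) might look roughly like the Python sketch below. This is not the script from the commit. The `expected` mapping of relative paths to checksums, the base URL and the 8-digit CRC32 hex checksum format are assumptions made for illustration, and the fetcher is injectable so that the logic can be exercised offline.

```python
import os
import tempfile
import zlib
from typing import Callable, Dict
from urllib.request import urlopen


def crc32_hex(data: bytes) -> str:
    """Checksum as 8 lowercase hex digits (assumed format)."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08x}"


def http_fetch(url: str) -> bytes:
    """Default fetcher: download a URL into memory."""
    with urlopen(url) as response:
        return response.read()


def mirror_files(
    expected: Dict[str, str],  # relative path -> expected checksum (hypothetical input)
    base_url: str,  # hypothetical location the files are downloaded from
    target_dir: str,
    fetch: Callable[[str], bytes] = http_fetch,
) -> Dict[str, str]:
    """Fetch missing or corrupt files; keep only downloads whose checksum matches."""
    status: Dict[str, str] = {}
    for relpath, checksum in expected.items():
        dest = os.path.join(target_dir, relpath)
        # 1. Files already on disk with the right checksum are left alone.
        if os.path.exists(dest):
            with open(dest, "rb") as fh:
                if crc32_hex(fh.read()) == checksum:
                    status[relpath] = "ok"
                    continue
        # 2. Download and verify in memory before touching target_dir.
        data = fetch(f"{base_url}/{relpath}")
        if crc32_hex(data) != checksum:
            status[relpath] = "checksum-mismatch"
            continue
        # 3. Write to a temporary file next to the destination, then rename it,
        #    so an interrupted run never leaves a partial file under the real name.
        dest_dir = os.path.dirname(dest) or "."
        os.makedirs(dest_dir, exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=dest_dir)
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        # mkstemp creates the file as 0600; make it readable for the web server.
        os.chmod(tmp, 0o644)
        os.replace(tmp, dest)
        status[relpath] = "downloaded"
    return status
```

Each call only touches the paths in its own `expected` mapping. Several processes that are given disjoint sets of files can therefore run side by side, which matches the kind of parallelism the commit message describes for disjoint sets of XML files.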
Showing 19 changed files with 516 additions and 116 deletions.
```
@@ -0,0 +1,42 @@
name: preview

on:
  push:
    branches:
      - '*'
      - '!master'

jobs:
  preview:
    runs-on: ubuntu-20.04
    steps:
      - name: install hugo
        run: wget https://github.com/gohugoio/hugo/releases/download/v0.58.3/hugo_extended_0.58.3_Linux-64bit.deb && sudo dpkg -i hugo_extended*.deb
      - name: update
        run: sudo apt-get update
      - name: install other deps
        run: sudo apt-get install -y jing bibutils openssh-client rsync libyaml-dev libpython3.8-dev
      - name: dump secret key
        env:
          SSH_KEY: ${{ secrets.PUBLISH_SSH_KEY }}
        run: |
          mkdir -p $HOME/.ssh/
          echo "$SSH_KEY" > $HOME/.ssh/id_rsa
          chmod 600 $HOME/.ssh/id_rsa
      - uses: actions/checkout@v1
      - name: extract branch name
        shell: bash
        run: echo "##[set-output name=branch;]$(echo ${GITHUB_REF#refs/heads/})"
        id: extract_branch
      - name: build
        shell: bash
        env:
          ANTHOLOGY_PREFIX: https://aclanthology.org/previews/${{ steps.extract_branch.outputs.branch }}
        run: |
          echo "Running make ANTHOLOGY_PREFIX=${ANTHOLOGY_PREFIX} NOBIB=true check site"
          make ANTHOLOGY_PREFIX=${ANTHOLOGY_PREFIX} NOBIB=true check site
      - name: preview
        env:
          ANTHOLOGY_PREFIX: https://aclanthology.org/previews/${{ steps.extract_branch.outputs.branch }}
        run: |
          make ANTHOLOGY_PREFIX=${ANTHOLOGY_PREFIX} NOBIB=true preview
```
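The `extract branch name` step strips `refs/heads/` from `GITHUB_REF` with `${GITHUB_REF#refs/heads/}`. The later steps then append the result to the preview base URL. A small Python model of that string handling (hypothetical, not part of the workflow) makes two consequences explicit. First, a branch such as `feature/x` produces a nested preview directory. Second, a prefix that ends in `/` is rejected by the Makefile, and one containing `#` breaks its Perl substitution.

```python
PREVIEW_BASE = "https://aclanthology.org/previews"


def preview_prefix(github_ref: str, base: str = PREVIEW_BASE) -> str:
    """Model of the workflow: strip refs/heads/ and append the branch to the base URL."""
    head = "refs/heads/"
    # Same as ${GITHUB_REF#refs/heads/}: drop the prefix only if it is present.
    branch = github_ref[len(head):] if github_ref.startswith(head) else github_ref
    prefix = f"{base}/{branch}"
    # The Makefile rejects a trailing "/" and warns that "#" breaks a Perl regex.
    if prefix.endswith("/") or "#" in prefix:
        raise ValueError(f"branch {branch!r} gives an unusable ANTHOLOGY_PREFIX")
    return prefix
```

For example, `preview_prefix("refs/heads/feature/x")` gives `https://aclanthology.org/previews/feature/x`, so the Makefile's `ANTHOLOGYDIR` becomes `previews/feature/x`.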
```
@@ -0,0 +1,34 @@
name: remove-preview

on:
  delete:
    branches:
      - '*'
      - '!master'

jobs:
  remove-preview:
    runs-on: ubuntu-20.04
    steps:
      - name: update
        run: sudo apt-get update
      - name: install other deps
        run: sudo apt-get install -y openssh-client rsync
      - name: dump secret key
        env:
          SSH_KEY: ${{ secrets.PUBLISH_SSH_KEY }}
        run: |
          mkdir -p $HOME/.ssh/
          echo "$SSH_KEY" > $HOME/.ssh/id_rsa
          chmod 600 $HOME/.ssh/id_rsa
      - uses: actions/checkout@v1
      - name: extract branch name
        shell: bash
        run: echo "##[set-output name=branch;]$(echo ${GITHUB_REF#refs/heads/})"
        id: extract_branch
      - name: remove-preview
        env:
          BRANCH: ${{ steps.extract_branch.outputs.branch }}
        run: |
          echo "Would delete branch ${BRANCH}"
          echo ssh -o StrictHostKeyChecking=accept-new rm -rf /var/www/aclanthology.org/previews/${BRANCH}
```
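The `remove-preview` job only echoes the deletion command. The echoed `ssh` line also has no host argument, so `ssh` would treat `rm` as the host name if the `echo` were simply dropped. Before enabling the job, it helps to build the command defensively, because an empty or `..` branch name would turn `rm -rf /var/www/aclanthology.org/previews/${BRANCH}` into a deletion of every preview. The sketch below shows one way to do that in Python. The branch-name whitelist and the `user@mirror.example` host are hypothetical placeholders. According to GitHub's documentation of the `delete` event, `GITHUB_REF` then names the default branch rather than the deleted one, so the sketch takes the branch name as an explicit argument.

```python
import re
import shlex
from typing import List

PREVIEW_ROOT = "/var/www/aclanthology.org/previews"
# Conservative whitelist for branch names used as directory names (an assumption).
SAFE_BRANCH = re.compile(r"[A-Za-z0-9._-]+(?:/[A-Za-z0-9._-]+)*")


def remove_preview_command(branch: str, host: str) -> List[str]:
    """Return an ssh argv that deletes exactly one preview directory."""
    parts = branch.split("/")
    # Empty names, "." or ".." components, or odd characters would widen rm -rf.
    if not SAFE_BRANCH.fullmatch(branch) or any(p in (".", "..") for p in parts):
        raise ValueError(f"refusing to remove preview for branch {branch!r}")
    path = f"{PREVIEW_ROOT}/{branch}"
    # ssh hands the remote command to a shell on the server, so quote the path.
    return ["ssh", "-o", "StrictHostKeyChecking=accept-new", host,
            "rm -rf " + shlex.quote(path)]
```

You can pass the returned list directly to `subprocess.run(..., check=True)`. Because no local shell is involved, only the remote side needs quoting.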
```
@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright 2019 Arne Köhn <[email protected]>
# Copyright 2019-2021 Arne Köhn <[email protected]>
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
```
```
@@ -15,23 +15,60 @@
# limitations under the License.

# Instructions:
# - if you edit the a command running python, make sure to
#   run . $(VENV) && python3 -- this sets up the virtual environment.
# - all targets running python somewhere should have venv as a dependency.
# - if you edit a command running python, make sure to
#   write . $(VENV) && python3 -- this sets up the virtual environment.
#   if you just write "python3 foo.py" without the ". $(VENV) && " before,
#   the libraries will not be loaded during run time.
# - all targets running python somewhere should have venv/bin/activate as a dependency.
#   this makes sure that all required packages are installed.
# - Disable bibtex etc. targets by setting NOBIB=true (for debugging etc.)
#   (e.g., make -j4 NOBIB=true)

SHELL = /bin/sh
ANTHOLOGYHOST := "https://www.aclweb.org"
ANTHOLOGYDIR := anthology

# If you want to host the anthology on your own, set ANTHOLOGY_PREFIX
# in your call to make to your prefix, e.g.
#
# ANTHOLOGY_PREFIX="https://example.com" make
#
# (There is no need to change the value here.). PLEASE NOTE that the prefix
# cannot contain any '#' character, or a Perl regex below will fail.
# The following line ensures that it is exported as an environment variable
# for all sub-processes

export ANTHOLOGY_PREFIX ?= https://www.aclweb.org/anthology

SLASHATEND:=$(shell echo ${ANTHOLOGY_PREFIX} | grep -q '/$$'; echo $$?)

ifeq (${SLASHATEND},0)
	$(error ANTHOLOGY_PREFIX is not allowed to have a slash at the end.)
endif

# hugo wants to know the host and base dir on its own, so
# we sed the prefix into those parts.
ANTHOLOGYHOST := $(shell echo "${ANTHOLOGY_PREFIX}" | sed 's|\(https*://[^/]*\).*|\1|')
ANTHOLOGYDIR := $(shell echo "${ANTHOLOGY_PREFIX}" | sed 's|https*://[^/]*/\(.*\)|\1|')

# the regexp above only matches if we actually have a subdirectory.
# make the dir empty if only a tld was provided as the prefix.
ifeq ($(ANTHOLOGY_PREFIX),$(ANTHOLOGYDIR))
	ANTHOLOGYDIR :=
endif

# We create a symlink from $ANTHOLOGYDIR/anthology-files to this dir
# to always have the same internal link to PDFs etc.
# This is the directory where you have to put all the papers and attachments.
ANTHOLOGYFILES ?= /var/www/html/anthology-files

HUGO_ENV ?= production

sourcefiles=$(shell find data -type f '(' -name "*.yaml" -o -name "*.xml" ')')
xmlstaged=$(shell git diff --staged --name-only --diff-filter=d data/xml/*.xml)
pysources=$(shell git ls-files | egrep "\.pyi?$$")
pystaged=$(shell git diff --staged --name-only --diff-filter=d | egrep "\.pyi?$$")

# these are shown in the generated html so everyone knows when the data
# was generated.
timestamp=$(shell date -u +"%d %B %Y at %H:%M %Z")
githash=$(shell git rev-parse HEAD)
githashshort=$(shell git rev-parse --short HEAD)
```
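The two `sed` calls and the `ifeq` fallback split `ANTHOLOGY_PREFIX` into the host and directory that Hugo needs. A Python function that follows the same rules makes the three cases easy to check: default prefix, nested directory and top-level host. It reads the variable from the environment, as any sub-process started by make would after the `export`. It assumes a plain `http`/`https` URL and is an illustration only, not the code in `anthology/data.py`.

```python
import os
import re
from typing import Mapping, Optional, Tuple

DEFAULT_PREFIX = "https://www.aclweb.org/anthology"  # default from the Makefile


def split_prefix(environ: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (host, dir) for ANTHOLOGY_PREFIX, following the Makefile's sed logic."""
    env = os.environ if environ is None else environ
    prefix = env.get("ANTHOLOGY_PREFIX", DEFAULT_PREFIX)
    if prefix.endswith("/"):
        raise ValueError("ANTHOLOGY_PREFIX is not allowed to have a slash at the end.")
    # sed 's|\(https*://[^/]*\).*|\1|' keeps scheme and host.
    host_match = re.match(r"(https*://[^/]*)", prefix)
    host = host_match.group(1) if host_match else prefix
    # sed 's|https*://[^/]*/\(.*\)|\1|' keeps everything after the first path slash.
    # Without a path, sed leaves the input unchanged, and the Makefile's ifeq
    # turns that case into an empty directory; here it is "" directly.
    dir_match = re.match(r"https*://[^/]*/(.*)", prefix)
    directory = dir_match.group(1) if dir_match else ""
    return host, directory
```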
```
@@ -68,7 +105,7 @@ HAS_BIB2XML=$(shell which bib2xml > /dev/null && echo true || echo false)
VENV := "venv/bin/activate"

.PHONY: site
site: bibtex mods endnote hugo sitemap
site: build/.hugo build/.sitemap


# Split the file sitemap into Google-ingestible chunks.
```
```
@@ -77,10 +114,10 @@ site: bibtex mods endnote hugo sitemap
sitemap: build/.sitemap

build/.sitemap: venv/bin/activate build/.hugo
	. $(VENV) && python3 bin/split_sitemap.py build/anthology/sitemap.xml
	@rm -f build/anthology/sitemap_*.xml.gz
	@gzip -9n build/anthology/sitemap_*.xml
	@bin/create_sitemapindex.sh `ls build/anthology/ | grep 'sitemap_.*xml.gz'` > build/anthology/sitemapindex.xml
	. $(VENV) && python3 bin/split_sitemap.py build/website/$(ANTHOLOGYDIR)/sitemap.xml
	@rm -f build/website/$(ANTHOLOGYDIR)/sitemap_*.xml.gz
	@gzip -9n build/website/$(ANTHOLOGYDIR)/sitemap_*.xml
	@bin/create_sitemapindex.sh `ls build/website/$(ANTHOLOGYDIR)/ | grep 'sitemap_.*xml.gz'` > build/website/$(ANTHOLOGYDIR)/sitemapindex.xml
	@touch build/.sitemap

.PHONY: venv
```
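`bin/split_sitemap.py` is not part of this diff. The following rough sketch shows what splitting into "Google-ingestible chunks" involves. The sitemaps.org protocol caps a single sitemap at 50,000 URLs (and 50 MB uncompressed), so a large URL list is cut into chunks, and a sitemap index lists the resulting files. The file naming (`sitemap_1.xml.gz`, ...) is a hypothetical choice that only matches the `sitemap_*.xml.gz` pattern used in the recipe.

```python
import xml.etree.ElementTree as ET
from typing import List, Sequence

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit from the sitemaps.org protocol
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk_urls(urls: Sequence[str], size: int = MAX_URLS_PER_SITEMAP) -> List[List[str]]:
    """Split a flat URL list into consecutive chunks of at most `size` entries."""
    return [list(urls[i:i + size]) for i in range(0, len(urls), size)]


def sitemap_index(base: str, n_chunks: int) -> str:
    """Build a sitemap index that points at sitemap_1.xml.gz ... sitemap_N.xml.gz."""
    root = ET.Element("sitemapindex", xmlns=NS)
    for i in range(1, n_chunks + 1):
        loc = ET.SubElement(ET.SubElement(root, "sitemap"), "loc")
        loc.text = f"{base}/sitemap_{i}.xml.gz"
    return ET.tostring(root, encoding="unicode")
```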
```
@@ -115,13 +152,14 @@ static: build/.static

build/.static: build/.basedirs $(shell find hugo -type f)
	@echo "INFO Creating and populating build directory..."
	@echo "INFO Split ${ANTHOLOGY_PREFIX} into HOST=${ANTHOLOGYHOST} DIR=${ANTHOLOGYDIR}"
	@cp -r hugo/* build
	@echo >> build/config.toml
	@echo "[params]" >> build/config.toml
	@echo "  githash = \"${githash}\"" >> build/config.toml
	@echo "  githashshort = \"${githashshort}\"" >> build/config.toml
	@echo "  timestamp = \"${timestamp}\"" >> build/config.toml
	@perl -pi -e "s/ANTHOLOGYDIR/$(ANTHOLOGYDIR)/g" build/index.html
	@perl -pi -e "s#ANTHOLOGYDIR#$(ANTHOLOGYDIR)#g" build/website/index.html
	@touch build/.static

.PHONY: yaml
```
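The static recipe now substitutes the directory using `#` as the Perl delimiter. `ANTHOLOGYDIR` may now contain slashes (for example `previews/my-branch`), so the old `s/ANTHOLOGYDIR/.../g` form would end the expression early. This is also why the Makefile warns that the prefix must not contain `#`. The hypothetical helpers below illustrate the constraint and a delimiter-free alternative. They do not handle other characters that Perl interpolates in the replacement, such as `$` or `@`.

```python
from pathlib import Path


def perl_subst(placeholder: str, value: str, delim: str = "#") -> str:
    """Build the s<d>pattern<d>replacement<d>g expression passed to `perl -pi -e`."""
    # If the delimiter occurs in the value, perl would see the expression end early.
    if delim in placeholder or delim in value:
        raise ValueError(f"{value!r} clashes with delimiter {delim!r}")
    return f"s{delim}{placeholder}{delim}{value}{delim}g"


def fill_placeholder(path: Path, placeholder: str, value: str) -> None:
    """Delimiter-free alternative: literal, in-place text replacement."""
    text = path.read_text(encoding="utf-8")
    path.write_text(text.replace(placeholder, value), encoding="utf-8")
```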
```
@@ -202,16 +240,27 @@ build/.hugo: build/.static build/.pages build/.bibtex build/.mods build/.endnote
	@echo "INFO Running Hugo... this may take a while."
	@cd build && \
	    hugo -b $(ANTHOLOGYHOST)/$(ANTHOLOGYDIR) \
	         -d $(ANTHOLOGYDIR) \
	         -d website/$(ANTHOLOGYDIR) \
	         -e $(HUGO_ENV) \
	         --cleanDestinationDir \
	         --minify
	@cd build/website/$(ANTHOLOGYDIR) \
	    && perl -i -pe 's|ANTHOLOGYDIR|$(ANTHOLOGYDIR)|g' .htaccess
	@cd build/website/$(ANTHOLOGYDIR) && ln -s $(ANTHOLOGYFILES) anthology-files
	@touch build/.hugo

.PHONY: mirror
mirror: venv/bin/activate
	. $(VENV) && bin/create_mirror.py data/xml/*xml

.PHONY: mirror-no-attachments
mirror-no-attachments: venv/bin/activate
	. $(VENV) && bin/create_mirror.py --only-papers data/xml/*xml

.PHONY: test
test: hugo
	diff -u build/anthology/P19-1007.bib test/data/P19-1007.bib
	diff -u build/anthology/P19-1007.xml test/data/P19-1007.xml
	diff -u build/website/$(ANTHOLOGYDIR)/P19-1007.bib test/data/P19-1007.bib
	diff -u build/website/$(ANTHOLOGYDIR)/P19-1007.xml test/data/P19-1007.xml

.PHONY: clean
clean:
```
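The Hugo recipe links the externally stored files into the built site with `ln -s $(ANTHOLOGYFILES) anthology-files`, which is why Apache has to follow symlinks. If a similar step is ever run against an output directory that may already contain the link, note that `ln -s` without `-n` follows an existing link to a directory and creates the new link inside it. A re-runnable variant, sketched in Python under that assumption (`link_files` is a hypothetical name), checks the existing entry first.

```python
import os
from pathlib import Path


def link_files(site_dir: Path, files_dir: Path, name: str = "anthology-files") -> Path:
    """Create or refresh site_dir/name -> files_dir without touching real directories."""
    link = site_dir / name
    if link.is_symlink():
        if os.readlink(link) == str(files_dir):
            return link  # already points at the right place
        link.unlink()  # stale link: remove only the link, never its target
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(files_dir, target_is_directory=True)
    return link
```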
```
@@ -235,14 +284,14 @@ check_staged_xml:
	fi

.PHONY: check_commit
check_commit: check_staged_xml venv
check_commit: check_staged_xml venv/bin/activate
	@. $(VENV) && pre-commit run
	@if [ ! -z "$(pystaged)" ]; then \
	    . $(VENV) && black --check $(pystaged) ;\
	 fi

.PHONY: autofix
autofix: check_staged_xml venv
autofix: check_staged_xml venv/bin/activate
	@. $(VENV) && \
	 EXIT_STATUS=0 ;\
	 pre-commit run || EXIT_STATUS=$$? ;\
```
```
@@ -255,7 +304,7 @@ autofix: check_staged_xml venv
.PHONY: serve
serve:
	@echo "INFO Starting a server at http://localhost:8000/"
	@cd build && python3 -m http.server 8000
	@cd build/website && python3 -m http.server 8000

# this target does not use ANTHOLOGYDIR because the official website
# only works if ANTHOLOGYDIR == anthology.
```
```
@@ -265,8 +314,14 @@ upload:
		echo "WARNING: Can't upload because ANTHOLOGYDIR was set to '$(ANTHOLOGYDIR)' instead of 'anthology'"; \
		exit 1; \
	fi
	@echo "INFO Running rsync..."
	# main site
	@rsync -azve ssh --delete build/anthology/ [email protected]:anthology-static
	# aclanthology.org
	# @rsync -azve ssh --delete build/anthology/ [email protected]:/var/www/html
	@echo "INFO Running rsync for main site and mirror..."
	# main site
	@rsync -aze "ssh -o StrictHostKeyChecking=accept-new" --delete build/website/anthology/ [email protected]:anthology-static
	# mirror
	@rsync -aze "ssh -o StrictHostKeyChecking=accept-new" --delete build/website/anthology/ [email protected]:/var/www/aclanthology.org

# Push a preview to the mirror
.PHONY: preview
preview:
	@echo "INFO Running rsync for the '${ANTHOLOGYDIR}' branch preview..."
	@rsync -avze "ssh -o StrictHostKeyChecking=accept-new" --delete build/website/${ANTHOLOGYDIR}/ [email protected]:/var/www/aclanthology.org/${ANTHOLOGYDIR}
```
@@ -0,0 +1,55 @@ | ||
#! /bin/bash | ||
# -*- coding: utf-8 -*- | ||
# | ||
# Copyright 2021 Arne Köhn <[email protected]> | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
set -e | ||
set -u | ||
|
||
# modify these two variables to your needs. | ||
# This is the URL under which your mirror will be accessible. | ||
# Note: There is no slash at the end. | ||
export ANTHOLOGY_PREFIX="https://example.com/aclmirror" | ||
|
||
# The directory under which the HTML files will reside | ||
export ANTHOLOGY_HTML_ROOT="/var/www/aclmirror" | ||
|
||
# this is the directory under which the additional files | ||
# will be stored. This directory will be symlinked | ||
# into the ANTHOLOGY_HTML_ROOT and needs to be accessible | ||
# by the webserver (depending on your configuration, it | ||
# might not need to be under the www document root). | ||
export ANTHOLOGYFILES="/var/www/html/anthology-files" | ||
|
||
# This is the directory where the anthology git will be cloned | ||
# to and the website will be built. | ||
export GITDIR="/home/anthology/anthology-git-dir" | ||
|
||
# initialize if necessary | ||
if [[ ! -e $GITDIR ]]; then | ||
mkdir -p $GITDIR | ||
fi | ||
cd $GITDIR | ||
if [[ ! -e .git ]]; then | ||
git clone https://github.com/acl-org/acl-anthology . | ||
fi | ||
|
||
ANTHOLOGYDIR=$(echo "${ANTHOLOGY_PREFIX}" | sed 's|https*://[^/]*/\(.*\)|\1|') | ||
|
||
if git pull -q; then | ||
make -j4 | ||
make mirror-no-attachments | ||
rsync -av --delete build/website/$ANTHOLOGYDIR $ANTHOLOGY_HTML_ROOT | ||
fi |
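The cron script's final `rsync` passes its source without a trailing slash. The Makefile's `upload` and `preview` targets, by contrast, use `build/website/anthology/` and `build/website/${ANTHOLOGYDIR}/`. rsync treats these differently: a trailing slash copies the directory's contents, and no slash copies the directory itself. With the script's example values, the HTML therefore ends up in `/var/www/aclmirror/aclmirror`. That fits a setup where `ANTHOLOGY_HTML_ROOT` is the document root and the mirror is served under `/aclmirror`. The hypothetical helper below models only this one rule, to make the resulting paths easy to check.

```python
import posixpath


def rsync_destination(src: str, dest: str) -> str:
    """Directory that receives the files of `src`, per rsync's trailing-slash rule."""
    if src.endswith("/"):
        # "src/" copies the contents of src directly into dest.
        return dest.rstrip("/") or "/"
    # "src" copies the directory itself, i.e. into dest/basename(src).
    return posixpath.join(dest, posixpath.basename(src))
```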