Dockerfile for CKAN #1755
This commit adds a Dockerfile and support files (including nginx and runit configuration) for a binary CKAN 2.2 Docker image. Specifically, this allows you to build a Docker image including CKAN 2.2, running behind nginx and gunicorn, by running:

    docker build .

For example, to build an image called ckan/ckan tagged at version 2.2, you might run:

    docker build -t ckan/ckan:2.2 .

The resulting image contains only CKAN, with a nearly vanilla configuration. In order to use it, you must do one of two things. You can either use the vanilla configuration as-is, and this requires that you specify the location of a Postgres database and a Solr core on startup:

    docker run -i -t -p 80:80 \
        -e DATABASE_URL=postgres://user:pass@hostname/db \
        -e SOLR_URL=http://hostname:8983/solr/ckan_default \
        ckan/ckan:2.2

This will run CKAN, connect to the database, and initialise it if need be. Configuring Solr will have to be done separately.

Alternatively, you can use this image as a base for extension. If a configuration file is injected to /etc/ckan/default.ini, the image will use that and ignore the DATABASE_URL and SOLR_URL environment variables.

Lastly, by default the CKAN file store is at /var/lib/ckan, and you may well wish to mount this data volume outside the running container:

    docker run ... -v /mnt/storage:/var/lib/ckan ...
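The start-up rule described above (an injected config file wins, otherwise DATABASE_URL and SOLR_URL are required) can be sketched as a small shell function. This is a hypothetical illustration, not the run script from this PR: the function name choose_config_source, the CONFIG_PATH override and the file:/env: output format are all invented for the example.

```shell
#!/bin/sh
# Hypothetical sketch of the config-selection rule; the PR's real run script
# may differ. CONFIG_PATH defaults to the path named in the commit message
# but can be overridden (e.g. for testing).
CONFIG_PATH="${CONFIG_PATH:-/etc/ckan/default.ini}"

# Print which configuration source the container would use, or fail with a
# message if neither an injected file nor the required variables are present.
choose_config_source() {
    if [ -f "$CONFIG_PATH" ]; then
        # An injected config file wins; DATABASE_URL and SOLR_URL are ignored.
        echo "file:$CONFIG_PATH"
        return 0
    fi
    if [ -z "${DATABASE_URL:-}" ] || [ -z "${SOLR_URL:-}" ]; then
        echo "error: set DATABASE_URL and SOLR_URL, or inject $CONFIG_PATH" >&2
        return 1
    fi
    # Vanilla configuration: settings come from the environment.
    echo "env:$DATABASE_URL $SOLR_URL"
}
```

A caller would branch on the prefix of the output; making the file check come first is what lets a derived image override the environment-driven defaults simply by adding a file.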
This ensures that we can configure error emails from the CKAN instance inside the container. An optional environment variable, ERROR_EMAIL, can be set for the container. If set, it will configure CKAN to send error emails to $ERROR_EMAIL. If unset, no emails will be sent.
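A minimal sketch of that behaviour, assuming the container writes CKAN's email_to and error_email_from settings (the commit does not say which settings it uses). The function name error_email_settings and the ERROR_EMAIL_FROM override are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: emit the CKAN error-email settings only when
# ERROR_EMAIL is set. Assumes CKAN's email_to and error_email_from options;
# ERROR_EMAIL_FROM is an invented override, not part of this PR.
error_email_settings() {
    # No ERROR_EMAIL means no error emails: print nothing.
    [ -n "${ERROR_EMAIL:-}" ] || return 0
    printf 'email_to = %s\n' "$ERROR_EMAIL"
    printf 'error_email_from = %s\n' "${ERROR_EMAIL_FROM:-ckan@localhost}"
}
```

The container's entry point could append this output to the generated ini file; where exactly those lines belong in the file is left to the caller here.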
As discussed previously, we're going to change one thing at a time so this sticks to the default installation instructions and uses mod_wsgi to run CKAN rather than gunicorn. Also revert paths to the defaults (/etc/ckan/default, /usr/lib/ckan/default, etc.)
This adds a Dockerfile which can build a simple container running Solr 4.8.1 with a "ckan" core which uses the schema from the ckan/config/solr directory.
It's great to see that this pull request follows the canonical approach to installing CKAN as written in the docs. Along the same lines, it may be better to use a canonical Ubuntu image instead of phusion/baseimage. The phusion image makes significant deviations from a stock Ubuntu system, including replacing the init system, using different ways of setting ENV vars, and adding its own run scripts that it recommends the user use. While some of these changes are nice, they a) make the dev environment deviate from your ultimate production environment, when the two should be as close as possible, and b) require any dev working with and changing the Dockerfile to also familiarize themselves with the opinionated baseimage. As far as I can see, there's no reason why CKAN couldn't work on an official base Docker image (https://github.com/dotcloud/docker/wiki/Public-docker-images#official-images). Since in the previous pull request some have expressed a desire to stick to the canonical way of installing CKAN, it would be great to see ubuntu:12.04 or ubuntu:14.04 supported.
Adds a page to the installation documentation explaining how to use and customise the Docker image. Note that not all of the commands listed here will work until we merge the Dockerfile and configure the building of the image on the Docker Hub.
Absolutely, it does, and that's precisely why I'm using it. If I replace the base of this image with The
I have to admit this makes me uncomfortable as well, for exactly the same reasons. I don't like it. But if using the stock Ubuntu 14.04 image would mean we have to re-implement most of what phusion/baseimage does anyway, then I guess we may as well use the phusion one. Along the same lines: the current CKAN package and source install (and all our deployed sites) are on Ubuntu 12.04. We do want to migrate them all to 14.04 at some point, of course. But in the meantime it would be nice not to have everything on 12.04 except the Docker image on 14.04. Maybe this doesn't make much difference, though. The point about phusion deviating from the dev environment will be alleviated once we have a version of the Docker container for dev, so we'll be using phusion in dev as well. (Then we will just have the Docker production and dev environments deviating from the non-Docker ones...)
I can change this, but I really don't see how it matters. If you can point me to a way in which the fact that the version of Let's get the damn thing merged and then fix minor problems with it later. This is
sudo apt-get install -y postgresql solr-jetty

after which you should follow the instructions in :ref:`postgres-setup` and
:ref:`setting up solr` to configure these services.
Postgres and Solr configurations default to listening only on loopback (when I installed them, at least), so I think we need a discussion of how to listen on the Docker interface and how to find out what address to use.
IIRC (and I can't seem to find it quickly) the Debian policy is that a daemon should, by default, not be open to the world, and should be bound to loopback unless specifically opened up by a config file (etc) change.
So I had a go at this (without too much spare time) and got stuck when Solr and Postgres refused connections from the Docker container. Just to clarify (as I could not find it in the docs), what's the recommended approach here? Should I allow remote connections to my local Solr/Postgres, or run separate Solr/Postgres containers? If the latter, can we get more details on how it works? Thanks
Getting CKAN to talk to Postgres and Solr on the Docker host is perfectly possible, but explaining how to set up appropriate routes in iptables is outside the scope of the CKAN installation docs. This commit updates the CKAN run script to support connection to Pg and Solr through Docker links, and updates the documentation accordingly.
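Docker's (legacy) links expose a linked container's address through environment variables named after the link alias, for example DB_PORT_5432_TCP_ADDR and DB_PORT_5432_TCP_PORT for --link somecontainer:db. The sketch below shows how a run script could turn those into DATABASE_URL and SOLR_URL. It is hypothetical: the function name, the ckan:ckan credentials and the ckan database name are assumptions, and the ckan core name follows the Solr image described earlier in this PR.

```shell
#!/bin/sh
# Hypothetical sketch: derive DATABASE_URL and SOLR_URL from the variables
# Docker injects for legacy links (--link <container>:db and
# --link <container>:solr), unless they were set explicitly.
# Credentials, database name and core name are assumptions.
urls_from_links() {
    if [ -z "${DATABASE_URL:-}" ] && [ -n "${DB_PORT_5432_TCP_ADDR:-}" ]; then
        DATABASE_URL="postgresql://ckan:ckan@${DB_PORT_5432_TCP_ADDR}:${DB_PORT_5432_TCP_PORT:-5432}/ckan"
    fi
    if [ -z "${SOLR_URL:-}" ] && [ -n "${SOLR_PORT_8983_TCP_ADDR:-}" ]; then
        SOLR_URL="http://${SOLR_PORT_8983_TCP_ADDR}:${SOLR_PORT_8983_TCP_PORT:-8983}/solr/ckan"
    fi
    # Make the results visible to CKAN processes started afterwards.
    export DATABASE_URL SOLR_URL
}
```

Explicitly set values still win, so the -e style of invocation keeps working alongside links. Newer Docker releases have since superseded links with user-defined networks, where a script like this would be unnecessary.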
@wardi, @amercader: I hear you. Setting up containers to talk to the Docker host is a level of detail I'd rather avoid going into in our documentation. As such, I've just pushed a commit which adds

Running dockerised CKAN is now as simple as:

No Postgres installation, no Solr installation, just those three commands.
Postfix initialisation is a bit more complicated than the service file assumed and doesn't actually work due to pieces missing from the chroot. Delegate to the sysv init script and tail the log to keep the process running. This is potentially a problem if postfix crashes and doesn't come back, but I reckon finding a crashing bug in postfix is a few orders of magnitude less likely than finding a similar bug in CKAN.
No idea why the tests have failed here. I haven't touched any CKAN code proper. Other than somebody kicking Travis, are we waiting for anything else here?
@nickstenning It's in my review queue. I've been sick this week, that's the only delay. I'll try to merge this before Tuesday.
@nickstenning There doesn't seem to be a ckan/ckan image. Is something going wrong here?
RUN a2dissite 000-default

# Configure nginx
ADD ./contrib/docker/nginx.conf /etc/nginx/nginx.conf
We have nginx around as a proxy. Would it make sense to have that outside of the docker instance so we could run multiple ckan dockers proxied through one nginx outside?
This is not a blocker and I'm just asking questions for future improvement.
Absolutely, but using nginx outside the container doesn't preclude you from using it inside as well. The running container is supposed to be opaque -- it serves CKAN on some port. As such, it should include the nginx proxy caching that's necessary to make serving static files through the WSGI app not a completely awful idea ;).
If we can get ckan/ckan working, I'm happy for this to go in, and we can discuss improvements in a new issue. Before we do that, though, can we please add a note that Docker support is beta, or perhaps alpha?
I haven't pushed
I'll build
As discovered by the inimitable bug-hunter @nigelbabu, on the version of Nginx in the image, the value of types_hash_bucket_size is set at runtime to the processor cache line size. This means that on hardware with small cache lines (16 bytes) the image can fail to start with:

    nginx: [emerg] could not build the types_hash, you should increase either types_hash_max_size: 1024 or types_hash_bucket_size: 32

This commit sets types_hash_bucket_size and types_hash_max_size to the values used by the latest Nginx (1.7.2 at the time of writing).
We've run into a few bugs, but the basic stuff now works.
Can I suggest opening multiple new issues labelled docker for anything you'd like to see but don't think needs to block this PR? I have a bunch I was going to start adding soon.
I'm running:

And getting:

from the final

The db and solr servers seemed to run fine, except for this warning:
@seanh What version of Docker are you on? I had this problem with the version of Docker from the distro packages.
@nickstenning Can you add to the documentation that this needs the latest version of Docker? After that, I'm happy to merge this in.
@nigelbabu Are you sure that issue is caused by using an older version of Docker? I've upgraded to 1.0.1 and I'm still getting it.
After deleting all my Docker containers and images created by Docker 0.9 and starting again with 1.0.1, it works (at least I can get CKAN running by following the docs; I haven't gotten further).
A couple of small things:
I've tried both
The built images don't appear to work with Docker<0.10.0, in particular not with the version of Docker that currently ships as "docker.io" in the Ubuntu 14.04 repositories. The install instructions do link directly to the official install docs for Docker, which don't recommend the distro packages, but we make this explicit now in a callout at the top of the page.
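A pre-flight check for that requirement might look like the sketch below (function names hypothetical). The detail worth getting right is that version strings must be compared component by component: compared as plain strings, 0.9.1 sorts after 0.10.0. The sketch relies on sort -V from GNU coreutils.

```shell
#!/bin/sh
# Sketch of a pre-flight check for the Docker >= 0.10.0 requirement.
# Uses `sort -V` (GNU coreutils) for version ordering; a plain string
# comparison would wrongly rank 0.9.1 above 0.10.0.

# Extract "1.0.1" from output such as "Docker version 1.0.1, build 990021a".
extract_docker_version() {
    printf '%s\n' "$1" | sed -n 's/^Docker version \([0-9][0-9.]*\).*/\1/p'
}

# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, version_ge "$(extract_docker_version "$(docker --version)")" 0.10.0 || echo "Docker 0.10.0 or later is required" >&2. The exact format of the docker --version output is an assumption and may vary between releases.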
Support for these parts of CKAN will come in a future PR.
Hurrah!
Thank you @nigelbabu, @seanh, @wardi, @amercader and everyone else who contributed here.
What is this?
This pull request is a continuation of the discussion in #1724
This PR adds a Dockerfile and support files (including nginx and postfix configuration) for a binary CKAN Docker image. Specifically, this allows you to build a Docker image including CKAN, running behind nginx and mod_wsgi, by running:

For example, to build an image called ckan/ckan, you might run:

The resulting image contains only CKAN, with a nearly vanilla configuration. In order to use it, you must do one of two things. You can either use the vanilla configuration as-is, and this requires that you specify the location of a Postgres database and a Solr core on startup:
This will run CKAN, connect to the database, and initialise it if need be. Configuring Solr will have to be done separately. There are a couple of other environment variables you can use to customise the deployment, including ERROR_EMAIL, which does what you might expect.

Alternatively, and perhaps more realistically, you can use this image as a base for extension. If a configuration file is injected to /etc/ckan/default.ini, the image will use that and ignore the DATABASE_URL, SOLR_URL, and ERROR_EMAIL environment variables.

A minimal Dockerfile that uses this as a base might look something like this:
Lastly, by default the CKAN file store is at /var/lib/ckan, and in a production environment you would almost certainly mount this data volume outside the running container:

Why should I care?
I'm of the opinion that deploying CKAN at the moment is too complicated. The package installation makes certain assumptions (Postgres and Solr on the same server; only one CKAN per machine) which seem unrealistically restrictive (not to say unwise) for production environments.
This setup allows you to trivially run multiple CKAN instances on a single machine, all pointing to Postgres and Solr by URL (either local or remote, it doesn't matter), without having to resort to the 10-step source install. It also makes it trivial to use LXC to memory-constrain individual instances (docker run -m 1g ...), which will be important in high-density deployments.

Perhaps more excitingly, this can be used as a binary build base for more complicated deployments, including those which need additional extensions and configuration. You can simply start from this image and add extensions/additional services as necessary. The image builds on baseimage-docker, which makes running additional supervised services very easy.
What's not right yet?

site_title, site_logo, site_description...) should move into the database?

What else should I know?

ckan/ckan we can.