Dockerfile for CKAN #1755

Merged
merged 10 commits into from
Jun 30, 2014

Conversation

nickstenning
Contributor

What is this?

This pull request is a continuation of the discussion in #1724

This PR adds a Dockerfile and support files (including nginx and postfix configuration) for a binary CKAN docker image. Specifically, this allows you to build a docker image including CKAN, running behind nginx and mod_wsgi, by running

docker build .

For example, to build an image called ckan/ckan, you might run

docker build -t ckan/ckan .

The resulting image contains only CKAN, with a nearly vanilla configuration. To use it, you must do one of two things. You can use the vanilla configuration as-is, which requires that you specify the location of a Postgres database and a Solr core on startup:

docker run -i -t -p 80:80 \
  -e DATABASE_URL=postgres://user:pass@hostname/db \
  -e SOLR_URL=http://hostname:8983/solr/ckan_default \
  ckan/ckan

This will run CKAN, connect to the database, and initialise it if need be. Configuring Solr will have to be done separately. There are a couple of other environment variables you can use to customise the deployment, including ERROR_EMAIL, which does what you might expect.
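
The vanilla start-up above can be wrapped in a small shell function. This is only a sketch: `run_ckan` is a hypothetical helper, not something shipped in this PR, and it assumes the image is tagged ckan/ckan as in the example. It refuses to start without the two required variables and passes ERROR_EMAIL through only when it is set.

```shell
# Hypothetical helper: start the vanilla ckan/ckan image from variables
# already present in the calling environment.
run_ckan() {
  # Both variables are required by the vanilla configuration.
  if [ -z "${DATABASE_URL:-}" ] || [ -z "${SOLR_URL:-}" ]; then
    echo "DATABASE_URL and SOLR_URL must both be set" >&2
    return 1
  fi

  # Collect the -e options as positional parameters.
  set -- -e "DATABASE_URL=$DATABASE_URL" -e "SOLR_URL=$SOLR_URL"

  # ERROR_EMAIL is optional; forward it only when it has a value.
  if [ -n "${ERROR_EMAIL:-}" ]; then
    set -- "$@" -e "ERROR_EMAIL=$ERROR_EMAIL"
  fi

  docker run -i -t -p 80:80 "$@" ckan/ckan
}
```

To use it, export DATABASE_URL and SOLR_URL (and optionally ERROR_EMAIL) in the shell, then call `run_ckan`.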

Alternatively, and perhaps more realistically, you can use this image as a base for extension. If a configuration file is injected at /etc/ckan/default.ini, the image will use that and ignore the DATABASE_URL, SOLR_URL, and ERROR_EMAIL environment variables.

A minimal Dockerfile that uses this as a base might look something like this:

FROM ckan/ckan

ADD ./mycustomconfig.ini /etc/ckan/default.ini
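
The precedence rule described above (an injected file wins over the environment variables) amounts to a single existence check. The following illustrates that rule only; it is not the image's actual start-up script, and `config_source` is a made-up name.

```shell
# Illustration only: report where the configuration would come from.
# The default path is the one named in the PR description.
config_source() {
  config=${1:-/etc/ckan/default.ini}
  if [ -e "$config" ]; then
    # An injected file takes precedence; DATABASE_URL, SOLR_URL and
    # ERROR_EMAIL are ignored in this case.
    echo "file"
  else
    # No file present: fall back to the environment variables.
    echo "environment"
  fi
}
```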

Lastly, by default the CKAN file store is at /var/lib/ckan, and in a production environment you would almost certainly mount this data volume outside the running container:

docker run ... -v /mnt/storage:/var/lib/ckan ...

Why should I care?

I'm of the opinion that deploying CKAN at the moment is too complicated. The package installation makes certain assumptions (Postgres and Solr on the same server; only one CKAN per machine) which seem unrealistically restrictive (not to say unwise) for production environments.

This setup allows you to trivially run multiple CKAN instances on a single machine, all pointing to Postgres and Solr by URL (either local or remote, it doesn't matter), without having to resort to the 10-step source install. It also makes it trivial to use LXC to memory-constrain individual instances (docker run -m 1g ...), which will be important in high-density deployments.
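
As a rough sketch of what running several memory-limited instances on one machine could look like: `start_ckan_instances` below is a hypothetical helper. It assumes one database and one Solr core per instance, each named after the instance, and host ports counting up from 8001. The hostnames and credentials are the same placeholders used earlier in this description.

```shell
# Hypothetical helper: start one container per instance name, each with
# the same memory limit and its own host port.
# Usage: start_ckan_instances <memory-limit> <name>...
start_ckan_instances() {
  mem=$1
  shift
  port=8001   # assumed starting port; pick whatever suits the host
  for name in "$@"; do
    docker run -d --name "ckan-$name" -m "$mem" -p "$port:80" \
      -e "DATABASE_URL=postgres://user:pass@hostname/$name" \
      -e "SOLR_URL=http://hostname:8983/solr/$name" \
      ckan/ckan
    port=$((port + 1))
  done
}
```

Calling `start_ckan_instances 1g alpha beta` would start ckan-alpha on port 8001 and ckan-beta on port 8002, each limited to 1 GB of memory.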

Perhaps more excitingly, this can be used as a binary build base for more complicated deployments, including those which need additional extensions and configuration. You can simply start from this image and add extensions/additional services as necessary. The image builds on baseimage-docker which makes running additional supervised services very easy.

What's not right yet?

  1. Perhaps the most important omission from this PR is documentation updates. I'd like to have the discussion with people about whether this is something they want to see in the main CKAN repository before I commit to writing up the docs.
  2. CKAN is currently autoconfigured using some questionable techniques. Perhaps we should instead permit configuring certain key config properties using environment variables? Perhaps some of what is currently configuration (site_title, site_logo, site_description...) should move into the database?
  3. I don't know. You guys know far more about what a sensible CKAN deployment does than I do. Tell me what's missing.

What else should I know?

  • I've taken the liberty of reserving the "ckan" username at the Docker index, so if we want to push this to ckan/ckan, as suggested, we can.

This commit adds a Dockerfile and support files (including nginx and
runit configuration) for a binary CKAN 2.2 docker image.

Specifically, this allows you to build a docker image including CKAN
2.2, running behind nginx and gunicorn, by running

    docker build .

For example, to build an image called ckan/ckan tagged at version 2.2,
you might run

    docker build -t ckan/ckan:2.2 .

The resulting image contains only CKAN, with a nearly vanilla
configuration. In order to use it, you must do one of two things. You
can either use the vanilla configuration as-is, and this requires that
you specify the location of a Postgres database and a Solr core on
startup:

    docker run -i -t -p 80:80 \
      -e DATABASE_URL=postgres://user:pass@hostname/db \
      -e SOLR_URL=http://hostname:8983/solr/ckan_default \
      ckan/ckan:2.2

This will run CKAN, connect to the database, and initialise it if need
be. Configuring Solr will have to be done separately.

Alternatively, you can use this image as a base for extension. If a
configuration file is injected to /etc/ckan/default.ini, the image will
use that and ignore the DATABASE_URL and SOLR_URL environment variables.

Lastly, by default the CKAN file store is at /var/lib/ckan, and you may
well wish to mount this data volume outside the running container:

    docker run ... -v /mnt/storage:/var/lib/ckan ...

This ensures that we can configure error emails from the CKAN instance
inside the container.

An optional environment variable, ERROR_EMAIL, can be set for the
container. If set, it will configure CKAN to send error emails to
$ERROR_EMAIL. If unset, no emails will be sent.

As discussed previously, we're going to change one thing at a time so
this sticks to the default installation instructions and uses mod_wsgi
to run CKAN rather than gunicorn.

Also revert paths to the defaults (/etc/ckan/default,
/usr/lib/ckan/default, etc.)

This adds a Dockerfile which can build a simple container running Solr
4.8.1 with a "ckan" core which uses the schema from the ckan/config/solr
directory.
@deniszgonjanin
Contributor

That's great to see that this pull request is following the canonical approach to installing CKAN as written in the docs.

Along the same lines, it may be better to use a canonical Ubuntu image instead of phusion/baseimage. The phusion image makes significant deviations from a stock Ubuntu system, including replacing the init system, different ways of setting ENV vars, and adding its own run scripts that it recommends the user use.

While some of these changes are nice, it a) makes the dev environment deviate from your ultimate production environment, when the two should be as close as possible and b) requires any dev working with and changing the Dockerfile to now also familiarize themselves with the opinionated baseimage.

As far as I can see, there doesn't seem to be a reason why CKAN couldn't work on an official base docker image (https://github.com/dotcloud/docker/wiki/Public-docker-images#official-images). Since some in the previous pull request expressed a desire to stick to the canonical way of installing CKAN, it would be great to see ubuntu:12.04 or ubuntu:14.04 supported.

Adds a page to the installation documentation explaining how to use and
customise the Docker image. Note that not all of the commands listed
here will work until we merge the Dockerfile and configure the building
of the image on the Docker Hub.

@nickstenning changed the title from "[WIP] Dockerfile for CKAN" to "Dockerfile for CKAN" on Jun 18, 2014
@nickstenning
Contributor Author

Documentation added. Please review @wardi @seanh, et al.

@nickstenning
Contributor Author

Along the same lines, it may be better to use a canonical Ubuntu image instead of phusion/baseimage. The phusion image makes significant deviations from a stock Ubuntu system, including replacing the init system, different ways of setting ENV vars, and adding its own run scripts that it recommends the user use.

Absolutely it does, and that's precisely why I'm using it. If I replace the base of this image with ubuntu:14.04 then I will have to reimplement most of what phusion/baseimage provides anyway. This container runs multiple processes (including syslog, cron, and postfix) as well as the web application. Running sysv init in order to do this isn't a sensible option in a container, and so phusion/baseimage provides runit, which is what we use to run these services.

The phusion/baseimage documentation does a good job of explaining the problems with running multiple processes in containers based on the default images.

@seanh
Contributor

seanh commented Jun 18, 2014

Along the same lines, it may be better to use a canonical Ubuntu image instead of phusion/baseimage. The phusion image makes significant deviations from a stock Ubuntu system, including replacing the init system, different ways of setting ENV vars, and adding its own run scripts that it recommends the user use.

While some of these changes are nice, it a) makes the dev environment deviate from your ultimate production environment, when the two should be as close as possible and b) requires any dev working with and changing the Dockerfile to now also familiarize themselves with the opinionated baseimage.

I have to admit this makes me uncomfortable as well, for exactly the same reasons. I don't like it. But if using the stock Ubuntu 14.04 image would mean we have to re-implement most of what phusion/baseimage does anyway, then I guess we may as well use the phusion one.

Along the same lines: The current CKAN package and source install (and all our deployed sites) are Ubuntu 12.04. We do want to migrate them all to 14.04 at some point of course. But in the meantime it would be nice not to have everything on 12.04 except the docker image on 14.04. Maybe this doesn't make much difference, though.

The point about phusion deviating from the dev environment will be alleviated once we have a version of the docker container for dev, so we'll be using phusion in dev as well. (Then we will just have the Docker production and dev environments deviating from the non-Docker ones...)

@nickstenning
Contributor Author

But in the meantime it would be nice not to have everything on 12.04 except the docker image on 14.04. Maybe this doesn't make much difference, though.

I can change this, but I really don't see how it matters. If you can point me to a way in which the fact that the version of crond or apache isn't exactly what's specified in the default install instructions causes any change whatsoever in the behaviour of CKAN, I'll happily modify it. But all reasonable expectations dictate that it's irrelevant at this stage.

Let's get the damn thing merged and then fix minor problems with it later. This is master, not a gold-plated release branch.

sudo apt-get install -y postgresql solr-jetty

after which you should follow the instructions in :ref:`postgres-setup` and
:ref:`setting up solr` to configure these services.
Contributor

postgres and solr configurations default to listen only on loopback (when I installed them at least) so I think we need a discussion of how to listen on the docker interface and how to find out what to use

Contributor

postgres and solr configurations default to listen only on loopback (when I installed them at least) so I think we need a discussion of how to listen on the docker interface and how to find out what to use

IIRC (and I can't seem to find it quickly) the Debian policy is that a daemon should, by default, not be open to the world, and should be bound to loopback unless specifically opened up by a config file (etc) change.

@amercader
Member

So I had a go at this (without too much spare time) and got stuck when Solr and Postgres refused connections from the docker container. Just to clarify (as I could not find it in the docs), what's the recommended approach here? Should I allow remote connections to my local Solr/Postgres or run separate Solr/Postgres containers? If the latter, can we get more details on how it works?

Thanks

Getting CKAN to talk to Postgres and Solr on the Docker host is
perfectly possible, but explaining how to set up appropriate routes in
iptables is outside the scope of the CKAN installation docs.

This commit updates the CKAN run script to support connection to Pg and
Solr through Docker links, and updates the documentation accordingly.
@nickstenning
Contributor Author

@wardi, @amercader: I hear you. Setting up containers to talk to the Docker host is a level of detail I'd rather avoid going into in our documentation.

As such, I've just pushed a commit which adds ckan/solr and ckan/postgresql containers which can be used as the default option.

Running dockerised CKAN is now as simple as:

docker run -d --name db ckan/postgresql
docker run -d --name solr ckan/solr
docker run -d --link db:db --link solr:solr -p 80:80 ckan/ckan

No Postgres installation, no Solr installation, just those three commands.
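
With --link db:db and --link solr:solr, Docker exposes each linked container's address to the CKAN container as environment variables named after the link alias and the exposed port. A run script can turn those into the URLs CKAN needs. The sketch below shows the idea; it is not the run script from this PR. The credentials, database name, and Solr core path are placeholders, and it assumes the linked images expose the usual ports (5432 and 8983).

```shell
# Build a Postgres URL from the variables Docker sets for a link aliased "db".
# The user, password and database name ("ckan") are placeholders.
link_database_url() {
  if [ -z "${DB_PORT_5432_TCP_ADDR:-}" ]; then
    echo "no link aliased 'db' found" >&2
    return 1
  fi
  echo "postgres://ckan:ckan@${DB_PORT_5432_TCP_ADDR}:${DB_PORT_5432_TCP_PORT:-5432}/ckan"
}

# Build a Solr URL from the variables Docker sets for a link aliased "solr".
# The core path (/solr/ckan) is an assumption.
link_solr_url() {
  if [ -z "${SOLR_PORT_8983_TCP_ADDR:-}" ]; then
    echo "no link aliased 'solr' found" >&2
    return 1
  fi
  echo "http://${SOLR_PORT_8983_TCP_ADDR}:${SOLR_PORT_8983_TCP_PORT:-8983}/solr/ckan"
}
```

Inside a linked container, `DATABASE_URL=$(link_database_url)` and `SOLR_URL=$(link_solr_url)` would then yield values of the same shape as the ones passed with -e in the earlier example.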

Postfix initialisation is a bit more complicated than the service file assumed
and doesn't actually work due to pieces missing from the chroot. Delegate to
the sysv init script and tail the log to keep the process running.

This is potentially a problem if postfix crashes and doesn't come back, but I
reckon finding a crashing bug in postfix is a few orders of magnitude less
likely than finding a similar bug in CKAN.
@nickstenning
Contributor Author

No idea why the tests have failed here. I haven't touched any CKAN code proper.

Other than somebody kicking Travis, are we waiting for anything else here?

@nigelbabu
Contributor

@nickstenning It's in my queue for review. I've been sick this week; that's the only delay. I'll try to merge this before Tuesday.

@nigelbabu
Contributor

@nickstenning There doesn't seem to be a ckan/ckan. Something going wrong here?

RUN a2dissite 000-default

# Configure nginx
ADD ./contrib/docker/nginx.conf /etc/nginx/nginx.conf
Contributor

We have nginx around as a proxy. Would it make sense to have that outside of the docker instance so we could run multiple ckan dockers proxied through one nginx outside?

This is not a blocker and I'm just asking questions for future improvement.

Contributor Author

Absolutely, but using nginx outside the container doesn't preclude you from using it inside as well. The running container is supposed to be opaque -- it serves CKAN on some port. As such, it should include the nginx proxy caching that's necessary to make serving static files through the WSGI app not a completely awful idea ;).

@nigelbabu
Contributor

If we could get ckan/ckan working, I'm happy for this to go in and we can discuss improvements in a new issue. Can we please add that docker support is beta or perhaps alpha before we do that though?

@nickstenning
Contributor Author

I haven't pushed ckan/ckan to the Docker hub because I don't want to mislead people before this pull request is merged. You can either use ckan/tmp for the moment, which is just ckan/ckan pushed to a different name, or you can build the image yourself:

docker build -t ckan/ckan .

@nigelbabu
Contributor

I'll build

As discovered by the inimitable bug-hunter @nigelbabu, on the version of
Nginx in the image, the value of types_hash_bucket_size is set at
runtime to the processor cache line size.

This means that on hardware with small cache lines (16 bytes) the image
can fail to start with:

    nginx: [emerg] could not build the types_hash, you should increase
    either types_hash_max_size: 1024 or types_hash_bucket_size: 32

This commit sets types_hash_bucket_size and types_hash_max_size to the
values used by latest Nginx (1.7.2 at the time of writing).
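
One way to guard against this class of failure is to check that a given nginx.conf pins both directives explicitly instead of relying on a hardware-dependent default. The function below is a sketch of such a check; `has_pinned_types_hash` is a hypothetical name, and the check only looks for the directives, not for particular values.

```shell
# Succeed only if the given nginx config sets both hash directives
# explicitly (any numeric value), one directive per line.
has_pinned_types_hash() {
  conf=$1
  grep -Eq '^[[:space:]]*types_hash_bucket_size[[:space:]]+[0-9]+;' "$conf" &&
    grep -Eq '^[[:space:]]*types_hash_max_size[[:space:]]+[0-9]+;' "$conf"
}
```

For the file added in this PR that would be `has_pinned_types_hash contrib/docker/nginx.conf`. Running `nginx -t` inside the image remains the authoritative test of whether the configuration loads.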
@nigelbabu
Contributor

We've run into a few bugs but the basic stuff now works.

@seanh
Contributor

seanh commented Jun 27, 2014

Can I suggest making multiple new issues labelled docker for anything that you'd like to see but that you don't think needs to block this PR? I have a bunch I was going to start adding soon.

@seanh
Contributor

seanh commented Jun 29, 2014

I'm running:

$ docker run -d --name db ckan/postgresql
$ docker run -d --name solr ckan/solr
$ docker run -d -p 80:80 --link db:db --link solr:solr ckan/tmp

And getting:

Error: Cannot start container 75fe8c23d3220d5a9b9723c13d228a409ce3fb4fd1cd5df35a77406ec8b212aa: Cannot link to a non running container: /db AS /pensive_pike/db

from the final docker run command.

The db and solr servers seemed to run fine, except for this warning:

WARNING: Local (127.0.0.1) DNS resolver found in resolv.conf and containers can't use it. Using default external servers : [8.8.8.8 8.8.4.4]

@seanh
Contributor

seanh commented Jun 29, 2014

docker ps shows a Solr container running but not a postgresql one. The postgresql container seems to run and then stop immediately (no error), doesn't keep running like the java -jar start.jar command does for the Solr one.
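
A quick way to confirm this kind of diagnosis before attempting the --link step is to ask Docker for each container's running state. The helper below is a sketch; `require_running` is a hypothetical name, and the container names are the ones used in the commands above.

```shell
# Fail if any of the named containers is not currently running.
# Usage: require_running <container>...
require_running() {
  for name in "$@"; do
    # Prints "true" or "false"; treat a failed inspect (for example an
    # unknown container name) as not running.
    state=$(docker inspect -f '{{.State.Running}}' "$name" 2>/dev/null) || state=false
    if [ "$state" != "true" ]; then
      echo "container '$name' is not running; try: docker logs $name" >&2
      return 1
    fi
  done
}
```

For example, `require_running db solr && docker run -d -p 80:80 --link db:db --link solr:solr ckan/tmp` only attempts the link once both dependencies are up.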

@nigelbabu
Contributor

@seanh What version of docker are you on? I had this problem with the version of docker from packages.

@nigelbabu
Contributor

@nickstenning Can you add to the documentation that this needs the latest version of docker? After that, I'm happy to merge this in.

@seanh
Contributor

seanh commented Jun 30, 2014

@nigelbabu Are you sure that issue is caused by using an older version of docker? I've upgraded to 1.0.1 and I'm still getting it

@seanh
Contributor

seanh commented Jun 30, 2014

After deleting all my docker containers and images created by docker 0.9 and starting again with 1.0.1, it works (at least I can get CKAN running by following the docs, haven't gotten further)
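
For anyone needing the same clean slate, the sketch below removes every container and every image on the host. It is destructive and not limited to CKAN, so it is only appropriate on a machine where nothing else in Docker needs to be kept. `docker_clean_slate` is a hypothetical name.

```shell
# DESTRUCTIVE: remove all containers (running or stopped) and all images.
docker_clean_slate() {
  containers=$(docker ps -a -q)
  if [ -n "$containers" ]; then
    # Word splitting is intended here: one container ID per argument.
    docker rm -f $containers
  fi

  # An image with several tags is listed once per tag, so de-duplicate.
  images=$(docker images -q | sort -u)
  if [ -n "$images" ]; then
    docker rmi $images
  fi
}
```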

@seanh
Contributor

seanh commented Jun 30, 2014

A couple of small things:

  • The solr container is called ckan/solr and the link to it is solr:solr as well. The PostgreSQL container is called ckan/postgresql but the link to it is db:db. Shouldn't it be postgresql:postgresql? Or just rename them both to ckan/postgres and postgres:postgres, which is easier to type.
  • Why are all the docker files in a contrib dir? Why not just have the docker dir at the root of the repo? Not sure what contrib means here

I've now tried both docker run ckan/tmp and docker build -t ckan/ckan .; docker run ckan/ckan. They both work for me, at least as far as getting CKAN running. It would be interesting to run the tests in the docker container.

The built images don't appear to work with Docker<0.10.0, in particular
not with the version of Docker that currently ships as "docker.io" in
the Ubuntu 14.04 repositories.

The install instructions do link directly to the official install docs
for Docker, which don't recommend the distro packages, but we make this
explicit now in a callout at the top of the page.
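
A pre-flight check for the minimum version could look like the sketch below. The function names are hypothetical, the parsing assumes the usual "Docker version X.Y.Z, build ..." output of `docker --version`, and the comparison relies on `sort -V`, which is available in GNU coreutils but not in every sort implementation.

```shell
# Extract the dotted version number from `docker --version`.
docker_version() {
  docker --version | sed -n 's/^Docker version \([0-9][0-9.]*\).*/\1/p'
}

# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
  # With version sort, the smaller of the two comes first; $1 >= $2
  # exactly when that smallest entry is $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Fail with a message if the installed Docker is older than the minimum
# (0.10.0 by default, per the note above).
check_docker_version() {
  min=${1:-0.10.0}
  found=$(docker_version)
  if [ -z "$found" ] || ! version_ge "$found" "$min"; then
    echo "Docker >= $min required, found '${found:-none}'" >&2
    return 1
  fi
}
```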
Support for these parts of CKAN will come in a future PR.
nigelbabu added a commit that referenced this pull request Jun 30, 2014
@nigelbabu merged commit de550a7 into ckan:master on Jun 30, 2014
@nigelbabu
Contributor

Hurrah!

@nickstenning deleted the docker branch on June 30, 2014 12:24
@nickstenning
Contributor Author

Thank you @nigelbabu, @seanh, @wardi, @amercader and everyone else who contributed here.
