Deploying documentation to GitHub Pages with continuous integration

Originally published as a guest post on the CircleCI Blog.

Continuous integration (CI) tools have been evolving towards flexible, general-purpose computing environments. They aren’t just used for running tests and reporting results, but often run full builds and send artifacts to external systems. If you’re already relying on a CI system for these other needs, it can be convenient to build and deploy your documentation using the same platform rather than pulling in an additional tool or service.

This post gives an overview of some popular options currently available for building and deploying documentation before diving into the details of using CircleCI to deploy documentation to GitHub Pages, a workflow that will be convenient for teams already using those tools for hosting code and running automated tests.

Options for deploying documentation

API documentation is generally rendered from a codebase using a language-specific documentation tool (sphinx for Python, javadoc for Java, etc.). The building of the documentation can be done on the developer’s local machine, in a CI environment, or in a documentation-specific hosting service.

Services for hosting documentation are generally language-specific and can be a great low-friction option for a team that tends to write projects in a single language. For example, Read the Docs has been a standard tool in the Python community for many years. Read the Docs uses webhooks to watch commits to a hosted repository and automatically builds and renders documentation for each code update. It also offers some nice conveniences that could be difficult to replicate in your own pipeline, such as deploying multiple versions of documentation and maintaining links from rendered docs to source code. Its limitations come into play if teams need to deploy docs for additional languages or if builds require uncommon system dependencies that can’t be installed via the pip or conda package managers. Using a documentation-specific service also means maintaining another set of user accounts and permissions for that additional service.

Conversely, the least infrastructure-dependent workflow for building documentation is for developers to build docs locally and check the results into the project repository. Most teams prefer to keep generated content out of source control to keep code reviews simpler and to lessen developer responsibility for building and committing the content, but some may enjoy seeing the revision history of documentation alongside the code. GitHub has developed support for this workflow by offering the option to render contents of a docs directory to GitHub Pages. Other setups may still need a separate deploy step for documentation in a CI system.

If a team instead decides to build documentation as part of a CI flow, the content could be deployed to a wide variety of destinations such as a locally maintained server, an object store like Amazon S3, GitHub Pages, or some other external hosting service. In most cases, the CI job will need some form of credentials in order to authenticate with the destination, which can be the most complex part of the flow. One of the main advantages of GitHub Pages as a documentation host is the consolidation of permissions; any developer with admin access on a repository can set up deploys to GitHub Pages and provision the deploy keys needed for a CI service to commit content.

Options for deploying to GitHub Pages

GitHub offers three options for deploying a site to GitHub Pages, with different implications for workflows and credentials.

The oldest option, and the one we’ll use in our walkthrough, is for pushes to a special gh-pages branch to trigger deploys. This is generally maintained as an “orphan” branch with a completely separate revision history from master, which can be a bit difficult to maintain. In our case, we’ll build a CircleCI workflow that builds documentation, commits changes to the gh-pages branch using a library, and then pushes the branch to GitHub using a deploy key that we’ll provision.

The second option is to have GitHub Pages render the master branch. This can be useful for a repository that exists only to host documentation, but doesn’t help much if your goal is to benefit from keeping code and rendered documentation close together with a single permissions model.

Finally, GitHub Pages can render a docs directory on the master branch, which supports workflows where developers are expected to generate and commit documentation as part of their local workflows. This requires no CI platform and no additional credentials, but most teams prefer not to include generated content in their master branch as discussed in the previous section.

Step-by-step walkthrough

Creating a basic Python project

Let’s build a small Python package that uses standard Python ecosystem tools for tests (pytest) and documentation (sphinx). We’ll configure CircleCI to run tests, build documentation, and finally deploy to GitHub Pages via a gh-pages branch. Full code for the project is available in jklukas/docs-on-gh-pages.

In a fresh directory, we’ll create a simple package called mylib with a single hello function. mylib/__init__.py looks like:

def hello():
    return 'Hello'

We also need to create a test directory with an empty __init__.py file and test_hello.py containing:

import mylib


def test_hello():
    assert mylib.hello() == 'Hello'

To actually run the tests, we’ll need to have pytest installed, so let’s specify that in a requirements.txt file. We’ll also request sphinx, the documentation tool we’ll be using in the next section:

sphinx==1.8.1
pytest==3.10.0

At this point, we can write a very simple CircleCI workflow containing a single job that will run our test. We create a .circleci/config.yml file that looks like:

version: 2
    
jobs:
  test:
    docker:
      - image: python:3.7
    steps:
      - checkout
      - run:
          name: Install dependencies
          command: pip install -r requirements.txt
      - run:
          name: Test
          command: pytest
    
workflows:
  version: 2
  build:
    jobs:
      - test

We commit all these files, push them to a new GitHub repository, and enable that repository in CircleCI. CircleCI should then run an initial build for the master branch, which should come back green.

Add docs

Now that we have a basic library with tests, let’s set up the documentation framework. At this point, you’ll need to have sphinx installed locally, so you may want to create a virtual environment using the venv tool and then call pip install -r requirements.txt, which makes the sphinx-quickstart command-line tool available for generating a documentation skeleton. We’ll invoke it like this:

sphinx-quickstart docs/ --project 'mylib' --author 'J. Doe'
# accept defaults at all the interactive prompts

sphinx-quickstart generated a Makefile for us, so building docs is as simple as calling make html from the docs/ directory. Let’s codify that in a new job in our CircleCI flow. We can add the following underneath jobs:

  docs-build:
    docker:
      - image: python:3.7
    steps:
      - checkout
      - run:
          name: Install dependencies
          command: pip install -r requirements.txt
      - run:
          name: Build docs
          command: cd docs/ && make html
      - persist_to_workspace:
          root: docs/_build
          paths: html

Invoking make html populates a docs/_build/html directory containing the content that we want to deploy. The final persist_to_workspace step of our new docs-build job saves the contents of that directory to an intermediate location that will be accessible to later jobs in our workflow. For now, we’ll add this new job to our workflow:

workflows:
  version: 2
  build:
    jobs:
      - test
      - docs-build

and commit the results.

Even without deploying the rendered content, this job is now serving as a check on the integrity of our docs. If sphinx is unable to run successfully, this job will fail, letting you know something is wrong.
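If you also want the job to fail when the build finishes but produces no entry page, a small script can inspect the output directory before it is persisted. The sketch below is a hypothetical check_build helper (not part of the original project); it assumes the docs/_build/html layout used above and treats a missing index.html as a failure:

```python
import sys
from pathlib import Path


def check_build(html_dir="docs/_build/html"):
    """Return a list of problems found in a sphinx HTML build directory.

    The default path and the use of index.html as the entry page are
    assumptions based on the `make html` layout in this walkthrough.
    """
    problems = []
    root = Path(html_dir)
    if not root.is_dir():
        problems.append(f"missing build directory: {root}")
    elif not (root / "index.html").is_file():
        problems.append(f"missing entry page: {root / 'index.html'}")
    return problems


def main(argv):
    html_dir = argv[1] if len(argv) > 1 else "docs/_build/html"
    problems = check_build(html_dir)
    for problem in problems:
        print(problem, file=sys.stderr)
    # A nonzero exit code makes the CircleCI run step, and thus the job, fail.
    return 1 if problems else 0
```

Saved as a hypothetical check_build.py with the usual if __name__ == "__main__" guard calling sys.exit(main(sys.argv)), it could run as an extra step right after Build docs in the docs-build job.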

Deploying rendered docs to a gh-pages branch

We’re ready at this point to start building the final piece of our CI workflow, a job that will deploy the built documentation by pushing it to the gh-pages branch of our repository.

We want gh-pages to be an “orphan” branch that tracks only the rendered docs and has a separate timeline from the source code in master. It’s possible to create such a branch and copy content into it using bare git command-line invocations, but it can be full of edge cases and easily lead to a corrupted work environment if anything goes wrong. Pulling in a purpose-built tool is a reasonable choice in this case, and there are several available as open source projects. The most popular among these at the moment is a Node.js module called gh-pages that includes a command-line interface, which is what we’ll use here.

You would be completely justified in questioning why we’d choose an application requiring a JavaScript environment for deploying Python docs. It seems like added complexity at first glance, but it actually fits in our workflow fairly seamlessly since our CI environment supports Docker containers natively and we can choose independent base images for each of our jobs. We get to build the documentation inside a container with a Python runtime, then share the output with a new container with a Node.js runtime.

Let’s go ahead and write a first version of a docs-deploy job underneath the jobs section of our config.yml file and walk through the steps:

  docs-deploy:
    docker:
      - image: node:8.10.0
    steps:
      - checkout
      - attach_workspace:
          at: docs/_build
      - run:
          name: Install and configure dependencies
          command: |
            npm install -g --silent gh-pages@2.0.1
            git config user.email "ci-build@klukas.net"
            git config user.name "ci-build"
      - run:
          name: Deploy docs to gh-pages branch
          command: gh-pages --dist docs/_build/html

We use a node base image so that the npm package manager and Node.js runtime are available. The attach_workspace step mounts the rendered documentation from the docs-build job into our container, then we call npm install to download the gh-pages module, which includes a command-line utility, also named gh-pages, that we’ll invoke in the next step. The git config commands are required per the module documentation. Finally, the invocation of gh-pages --dist docs/_build/html copies the contents of the html directory into the root of the gh-pages branch and pushes the results to GitHub.

Let’s add this new job to our workflow. The workflows section now looks like:

workflows:
  version: 2
  build:
    jobs:
      - test
      - docs-build
      - docs-deploy:
          requires:
            - test
            - docs-build
          filters:
            branches:
              only: master

We made the docs-deploy job dependent on the other two jobs, meaning that it won’t run until both of those jobs complete successfully. This ensures we don’t accidentally publish docs for a state of the repository that doesn’t pass tests. We also set a filter to specify that the docs-deploy job should be skipped except for builds of the master branch. That way, we don’t overwrite the published docs for changes that are still in flight on other branches.
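The combined effect of requires and the branch filter can be modeled as a small predicate. This is a hypothetical sketch of the rule for docs-deploy, not CircleCI's actual scheduling logic; the job names and the master branch come from the workflow above:

```python
def should_deploy(branch, job_results, required=("test", "docs-build"), only_branch="master"):
    """Hypothetical model of when docs-deploy runs in the workflow above.

    job_results maps job names to True (passed) or False (failed); a job
    that is missing from the mapping counts as not passed.
    """
    if branch != only_branch:
        # filters.branches.only: the job is skipped on every other branch.
        return False
    # requires: every upstream job must have completed successfully.
    return all(job_results.get(name, False) for name in required)


print(should_deploy("master", {"test": True, "docs-build": True}))     # True
print(should_deploy("feature-x", {"test": True, "docs-build": True}))  # False
```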

If we check in all these changes and let CircleCI run the workflow, our new job will fail:

ERROR: The key you are authenticating with has been marked as read only.

So there’s a bit more work we need to do to clean this up and make sure our CI job has the necessary credentials.

Provisioning a deploy key

As discussed in CircleCI’s integration docs, GitHub provides a few options for giving a job access to change a repository. Generally, GitHub permissions are tied to users, so a credential must either be tied to a single human user account or a special machine user account must be provisioned. There’s a lot of flexibility there for granting access across repositories, but it can become somewhat complex.

We opt instead to provision a read/write deploy key. This is an ssh key pair specific to a single repository rather than a user. This is nice for teams, because it means access doesn’t disappear if the user who provisions the key leaves the organization or deletes their account. It also means that any user who is an administrator on the repository can follow the steps below to get the integration set up.

Let’s follow the instructions in the CircleCI docs and apply them to our case.

We start by creating an ssh key pair on our local machine:

ssh-keygen -t rsa -b 4096 -C "ci-build@klukas.net"
# Accept the default of no passphrase for the key (This is a special case!)
# Choose a destination such as 'docs_deploy_key_rsa'

We end up with a private key docs_deploy_key_rsa and a public key docs_deploy_key_rsa.pub. We hand over the private key to CircleCI by navigating to https://circleci.com/gh/jklukas/docs-on-gh-pages/edit#ssh, hitting “Add SSH Key”, entering “github.com” as the hostname, and pasting in the contents of the private key file. At this point, we can go ahead and delete the private key from our system, as only our CircleCI project should have access:

rm docs_deploy_key_rsa

The https://circleci.com/gh/jklukas/docs-on-gh-pages/edit#ssh page will show us the fingerprint for our key, which is a unique identifier that’s safe to expose publicly (unlike the private key itself, which is sufficient to give an attacker write access to your repository). We add a step in our docs-deploy job to grant the job access to the key with this fingerprint:

      - add_ssh_keys:
          fingerprints:
            - "59:ad:fd:64:71:eb:81:01:6a:d7:1a:c9:0c:19:39:af"
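To confirm that the fingerprint shown on the CircleCI settings page belongs to the key you just generated, you can compute it from the public key file. A minimal sketch, assuming the colon-separated form above is an MD5 fingerprint (16 hex pairs) taken over the base64-decoded key blob; the function name is hypothetical:

```python
import base64
import hashlib


def md5_fingerprint(public_key_line):
    """Colon-separated MD5 fingerprint of an OpenSSH public key line.

    public_key_line is the single line in a .pub file, such as
    'ssh-rsa AAAA... ci-build@klukas.net'. The second field is the
    base64-encoded key blob; its MD5 digest is printed as hex pairs.
    """
    fields = public_key_line.split()
    blob = base64.b64decode(fields[1])
    digest = hashlib.md5(blob).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

Passing it the contents of docs_deploy_key_rsa.pub should give the same value that ssh-keygen -l -E md5 -f docs_deploy_key_rsa.pub reports after its MD5: prefix. Only the public key is needed, so this check works even after the private key has been deleted.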

While we’re on the subject of security, we’ll head to https://circleci.com/gh/jklukas/docs-on-gh-pages/edit#advanced-settings and double check that “Pass secrets to builds from forked pull requests” is set to its default of “Off”. SSH keys are one of the types of secrets that we only want to make available if we trust the code being run; if we allowed this key to be available to forks, an attacker could craft a pull request that prints the contents of our private key to the CircleCI logs.

Now, we need to upload the public key to GitHub so that it knows to trust a connection from CircleCI initiated with our private key. We head to https://github.com/jklukas/docs-on-gh-pages/settings/keys > Add Deploy Key, make the title of it “CircleCI write key” and paste in the contents of docs_deploy_key_rsa.pub. If you haven’t already deleted the private key, be extra careful you’re not accidentally copying from docs_deploy_key_rsa!
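Because the two files sit side by side, a quick check before pasting can catch that mistake. The sketch below is a rough heuristic, not a key parser, and the accepted key-type prefixes are an assumption covering common OpenSSH key types:

```python
def looks_like_public_key(text):
    """Heuristic guard before pasting into GitHub's Add Deploy Key form.

    OpenSSH public keys are a single line starting with a key type such
    as 'ssh-rsa', while private key files carry a
    '-----BEGIN ... PRIVATE KEY-----' header.
    """
    stripped = text.strip()
    if "PRIVATE KEY" in stripped:
        return False
    lines = stripped.splitlines()
    return len(lines) == 1 and lines[0].startswith(("ssh-rsa ", "ssh-ed25519 ", "ecdsa-sha2-"))
```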

Some final fixups

Before we test that our CircleCI workflow can successfully push changes to GitHub, let’s address a few final details.

First, our built documentation contains directories starting with _, which have special meaning to jekyll, the static site engine built into GitHub Pages. We don’t want jekyll to alter our content, so we need to add a .nojekyll file and pass the --dotfiles flag to gh-pages since that utility will otherwise ignore all dotfiles.
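To see why both pieces are needed, the sketch below models the two problems: it lists the underscore-prefixed entries that Jekyll may skip or process rather than copy as-is (for example sphinx's _static directory), writes the .nojekyll marker, and mimics a deploy step that drops dotfiles unless told otherwise. The function names are hypothetical, and files_to_publish is a simplified model of the dotfile filtering described above, not the gh-pages module's code:

```python
from pathlib import Path


def jekyll_hidden_entries(html_dir):
    """Top-level names in the build output that start with '_'."""
    return sorted(p.name for p in Path(html_dir).iterdir() if p.name.startswith("_"))


def disable_jekyll(html_dir):
    """Same effect as: touch docs/_build/html/.nojekyll"""
    marker = Path(html_dir) / ".nojekyll"
    marker.touch()
    return marker


def files_to_publish(html_dir, include_dotfiles=False):
    """Model of dotfile filtering: any path component starting with '.'
    is skipped unless include_dotfiles is True (like --dotfiles)."""
    root = Path(html_dir)
    selected = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if not include_dotfiles and any(part.startswith(".") for part in rel.parts):
            continue
        selected.append(rel.as_posix())
    return selected
```

In this model, the .nojekyll file only reaches the branch when include_dotfiles is True, which is the role the --dotfiles flag plays in the real deploy step.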

Second, we need to provide a custom commit message that includes [skip ci], which instructs CircleCI not to start a new build when we push this content to the gh-pages branch. The gh-pages branch contains only rendered HTML content, not the source code and config.yml, so such a build would have nothing to do and would simply show up as failing in CircleCI. Our full job now looks like:

  docs-deploy:
    docker:
      - image: node:8.10.0
    steps:
      - checkout
      - attach_workspace:
          at: docs/_build
      - run:
          name: Disable jekyll builds
          command: touch docs/_build/html/.nojekyll
      - run:
          name: Install and configure dependencies
          command: |
            npm install -g --silent gh-pages@2.0.1
            git config user.email "ci-build@klukas.net"
            git config user.name "ci-build"
      - add_ssh_keys:
          fingerprints:
            - "59:ad:fd:64:71:eb:81:01:6a:d7:1a:c9:0c:19:39:af"
      - run:
          name: Deploy docs to gh-pages branch
          command: gh-pages --dotfiles --message "[skip ci] Updates" --dist docs/_build/html
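The [skip ci] marker works because CircleCI looks at the commit message before starting a build. A simplified check, assuming a case-sensitive match anywhere in the message and also accepting the [ci skip] spelling that CircleCI documents:

```python
import re

# Simplification: matches either marker anywhere in the message and
# ignores any further rules CircleCI may apply.
SKIP_CI = re.compile(r"\[(?:skip ci|ci skip)\]")


def skips_ci(message):
    """Return True if a commit message asks CircleCI not to build it."""
    return SKIP_CI.search(message) is not None
```

With the message passed to gh-pages above, skips_ci("[skip ci] Updates") returns True, so pushes to gh-pages should not trigger a build.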

We’re ready to commit our updated configuration and let CircleCI run the workflow. Once it shows green, we should notice that our repository now has a gh-pages branch and that the rendered content is now available at https://jklukas.github.io/docs-on-gh-pages/.
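The published address follows GitHub Pages' convention for project sites. A small sketch of that rule with a hypothetical helper name; it ignores custom domains, which change the address entirely:

```python
def pages_url(owner, repo):
    """URL where GitHub Pages serves a repository's site.

    Project repositories are served under a path named after the repo;
    a repository named '<owner>.github.io' is the user or organization
    site and is served at the root instead.
    """
    host = f"{owner.lower()}.github.io"
    if repo.lower() == host:
        return f"https://{host}/"
    return f"https://{host}/{repo}/"
```

For this project, pages_url("jklukas", "docs-on-gh-pages") gives https://jklukas.github.io/docs-on-gh-pages/.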

Conclusion

There is no one obvious “best way” to build and deploy documentation. The path of least resistance for your team is going to depend on the particular mix of workflows, tools, and infrastructure that you are already familiar with. Your organizational structure is important as well, as it will have implications for who needs to be involved to provision credentials and get systems talking to one another.

The particular solution presented here is currently a good fit for the data platform team at Mozilla (see an example in practice at mozilla/python_moztelemetry) because it is adaptable to different languages (our team also maintains projects in Java and Scala), it minimizes the number of tools to be familiar with (we are already invested in GitHub and CircleCI), the permissions model gives our team autonomy in setting up and controlling the documentation workflow, and we haven’t seen a need for any of the more advanced features available from documentation-specific hosting providers.