Triggering trusted CI jobs on untrusted forks

Originally published as a guest post on the CircleCI Blog.

On public code repositories, arbitrary users are generally allowed to make forks and issue pull requests (which we will refer to as “forked PRs”). These users could be from outside your organization and are generally considered untrusted for the purposes of running automated builds. This causes a problem if you want a Continuous Integration service to run tests that need access to credentials or sensitive data as a malicious developer could propose code that exposes the credentials to CI logs. Even for a private repository, you may want some users within your organization to have view access to the repository without the potential of triggering a build that exposes secrets.

Previously, in Managing secrets when you have pull requests from outside contributors, we discussed how to hide credentials from forked PRs and skip the builds that require them. But that assumed the credentialed builds were intended for staging build artifacts and were only truly necessary for merges to master. If the output of a trusted job is instead necessary for proper review of the change before merging, then we can’t simply skip the job. What we want is a way for a trusted member of our team to check over a forked PR, verify it doesn’t introduce changes that might spill secrets, and then kick off the trusted CI jobs we need for ensuring integrity of the changes. In the first half of this post, we will discuss how to use Git itself as a means for marking code as trusted and enabling this workflow. The second half will walk through a full demonstration of how these concepts can be applied to a specific repository host (GitHub) and CI provider (CircleCI).

Marking code as trusted by pushing upstream

The act of merging a pull request into a repository can be seen partly as an act of marking code as valid and trusted. Forks can be seen as a way of isolating a user’s changes until they are validated. Outside contributors generally cannot push code to any branches of the main repository, but only to their own forks.

Providers like GitHub have set up their security controls in a way that supports this model: generally, only a specific set of named users are allowed access to make any changes to a repository. CI providers likewise set up their permissions with this in mind: CI permissions are generally configured so that the CI system will expose credentials only to jobs triggered from pushes to the main repository.

Any system that puts build configuration directly in the repository risks running untrusted code if builds are triggered automatically upon creation of a PR. But this also means that the act of pushing code to an upstream branch can be used as an expression of trust, allowing trusted jobs to run. If the code from a forked PR is pushed to an upstream branch, the Git references (“refs”) for the upstream branch and forked branch will be identical, pointing to the same commit. GitHub understands that they are identical code and many of GitHub’s API endpoints care only about the commit ref rather than the branch the ref is associated with. In the case of CI jobs, GitHub indexes by commit, so a trusted job that runs as a result of pushing code to an upstream branch will also be associated with the forked PR that points to that identical commit.

Implementing a workflow for triggering trusted CI jobs

Now, let’s put the above theory into practice by building a sample project. We’ll have a basic test job that runs on any PR, even if it’s from a fork, but we’ll also have a more comprehensive test-with-data job that requires AWS credentials to pull some private data from Amazon S3. We want to avoid triggering this latter job until someone with commit access on the repository reviews the PR to ensure it’s not going to leak credentials.

Creating the project

Let’s start with the same small Java project we created in Managing secrets when you have pull requests from outside contributors. Using Apache Maven as the build tool, we can generate the project via:

mvn archetype:generate -DgroupId=com.mycompany.app -DartifactId=managing-secrets -DarchetypeArtifactId=maven-archetype-quickstart -Dversion=1.3 -DinteractiveMode=false

Now, we’ll create a simple CircleCI workflow that runs a single test job:

version: 2.1

jobs:
  test:
    docker:
      - image: circleci/openjdk:8u171-jdk
    steps:
      - checkout
      - run: mvn clean test

workflows:
  version: 2
  build:
    jobs:
      - test

Once we commit this to GitHub and enable it as a project in CircleCI, each push will trigger a run of the build workflow in CircleCI. PRs issued from any branch on the main repository will show the status of the test job.

Enabling CI for forked pull requests

Now, we’d like to enable this same workflow for pull requests originating from forked repositories. Not only does this allow proposed changes from contributors without commit access , but it’s also helpful for committers who prefer to work from their own forks.

To enable CI for forked pull requests, we go to the settings page for our project within CircleCI and choose Build Settings > Advanced Settings and enable the “Build forked pull requests” option.

While we’re there, notice the next option, “Pass secrets to builds from forked pull requests”. That’s disabled by default, which is exactly what we want here. In the next step, we’re going to upload AWS credentials and we don’t want to accidentally expose them to users outside our organization.

Adding secrets

We make our AWS credentials available to trusted builds by setting them as project-specific environment variables. Note that it’s also possible to create a Context that is shared by multiple projects.

We’ll now move to the Build Settings > Environment Variables section of the project configuration in CircleCI and add AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables that contain credentials allowed to read from a location in Amazon S3 where we have staged our private test data. These variables will only get set for pushes to branches on the main repository initiated by someone with commit access, not for CircleCI jobs triggered from a forked pull request.

Adding a trusted job that does not run for forked PRs

We’re now ready to set up an additional job that accesses credentials and run tests based on the private data. Let’s take a look at the updated config:

version: 2.1

jobs:
  test:
    docker:
      - image: circleci/openjdk:8u171-jdk
    steps:
      - checkout
      - run: mvn clean test
  test-with-data:
    docker:
      - image: circleci/openjdk:8u171-jdk
    steps:
      - checkout
      - run: |
          if [ -z "$AWS_ACCESS_KEY_ID" ]; then
            echo "No AWS_ACCESS_KEY_ID is set! Failing..."
            exit 1;
          else
            echo "Credentials are available. Let's fetch the data!"
          fi

workflows:
  version: 2
  build:
    jobs:
      - test
      - test-with-data:
        filters:
          branches:
            # Forked pull requests have CIRCLE_BRANCH set to pull/XXX
            ignore: /pull\/[0-9]+/

The new test-with-data job is a bit of a sham; to keep this post brief, we haven’t implemented any real fetching of data or tests that use that data. Instead, we have stubbed this out by just checking for the existence of credentials. If this job is initiated by a push to the main repository, then environment variables should be made available and the job should pass.

We put the test-with-data job into our workflow with a filter that ensures the job will not run if it was trigger by a forked PR. In the previous post, we instead implemented a technique for returning early in the case of a forked PR; the filter accomplishes much the same thing, but is less verbose.

Testing out the review workflow for forked PRs

Let’s make a small configuration change on our repository now to make it more obvious that tests are being skipped; after all, if we’re going through the trouble of adding a trusted job to run our tests with real data, then making sure those tests pass is likely central to ensuring the integrity of our project.

In our GitHub repo, we’ll go to Settings > Branches and we’ll add a Branch Protection Rule. In our particular case, the target branch is triggering and we want to enable the toggle for “Require status checks to pass before merging” and ensure that we select both of our CI jobs as required (which show up as ci/circleci: test and ci/circleci: test-with-data).

Now, when someone issues a PR from a fork, the test job will run and (hopefully) turn green on success. The test-with-data job won’t run (since this is a forked PR), but because we’ve marked it as required, it will still show up in the status checks section of the PR page in status “Expected — Waiting for status to be reported”, drawing the reviewer’s attention to the fact that that job will need to be run before merging. Even better, we can create a PULL_REQUEST_TEMPLATE.md with a checklist for the reviewer, explicitly instructing them to look for potential security problems before kicking off a trusted build.

Let’s consider an example of a PR in this state – jklukas/managing-secrets#4. A reviewer taking a look at this PR would do an initial scan of the code, catch that the PR introduces a change in the CI job that prints credentials to logs, and either ask for a change or close the PR; test-with-data never runs, so credentials are never exposed.

If instead the change were benign and a reviewer decided it was safe to run, they could kick off the trusted job by pushing the PR’s commits to a branch of the main repository. This can be done manually via direct git invocations, but it is a bit tedious. The general workflow is to add the PR’s fork as a remote, pull down the PR’s branch, push that branch to the main repository, and then clean up. You can install a small bash script from jklukas/git-push-fork-to-upstream-branch to make this a more convenient one-liner:

git-push-fork-to-upstream-branch upstream <fork_username>:<fork_branch>

Once the code has been pushed to an upstream branch, the test-with-data status check on the forked PR will go into the executing state, and hopefully get us to a fully green state, ready to merge.