Continuous integration (CI)

The Zulip server uses CircleCI for continuous integration. CircleCI runs frontend, backend and end-to-end production installer tests. This page documents useful tools and tips for using CircleCI and for debugging issues with it.

Goals

The overall goal of our CI is to avoid regressions and minimize the total time spent debugging Zulip. We do that by trying to catch as many potential future bugs as possible, while minimizing both latency and false positives, both of which can waste a lot of developer time. There are a few implications of this overall goal:

  • If a test is failing nondeterministically in CI, we consider that to be an urgent problem.

  • If the tests become a lot slower, that is also an urgent problem.

  • Everything we do in CI should also have a way to run it quickly (under 1 minute, preferably under 3 seconds), in order to iterate fast in development. Except when working on the CI configuration itself, a developer should never have to repeatedly wait 10 minutes for a full CI run to iteratively debug something.

CircleCI

Useful debugging tips and tools

  • Zulip uses the ts tool (from the moreutils package) to prefix every line of output in our CircleCI scripts with the current time. You can use these timestamps to determine which steps are actually consuming a lot of time.

  • You can sign up your personal repo for CircleCI so that every remote branch you push will be tested, which can be helpful when debugging something complicated.

  • With your personal repo signed up, CircleCI allows you to SSH into the job container if a job fails. SSHing into the container can be helpful, especially in rare cases where the tests pass on your computer but fail in CI. Make sure that you have uploaded your SSH keys to GitHub: CircleCI uses those SSH keys for authentication.
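As a rough illustration of reading that timestamped output, the following Python sketch ranks log lines by the gap before the next line, which approximates how long each step took. It assumes ts was run with its default `%b %d %H:%M:%S` format (for example `Jun 03 14:02:11 Running backend tests`); the function name `slowest_lines` is hypothetical and not part of Zulip’s tooling.

```python
from datetime import datetime

# Assumes each line starts with a timestamp in ts's default format,
# e.g. "Jun 03 14:02:11 Running backend tests" (always 15 characters).
TS_FORMAT = "%b %d %H:%M:%S"
TS_WIDTH = len("Jun 03 14:02:11")


def slowest_lines(log_lines, top=5):
    """Return (seconds, text) pairs for the lines followed by the longest gaps.

    The gap between a line and the next one approximates how long the work
    announced by that line took. Lines without a timestamp are skipped, and
    year boundaries are ignored.
    """
    parsed = []
    for line in log_lines:
        stamp, text = line[:TS_WIDTH], line[TS_WIDTH:].strip()
        try:
            # ts omits the year; pin a leap year so "Feb 29" still parses.
            when = datetime.strptime("2000 " + stamp, "%Y " + TS_FORMAT)
        except ValueError:
            continue
        parsed.append((when, text))

    gaps = [
        ((end - start).total_seconds(), text)
        for (start, text), (end, _) in zip(parsed, parsed[1:])
    ]
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]
```

Passing it the lines of a saved CI log lists the slowest steps first.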

Suites

The main CircleCI configuration file, which defines how the tests are run, is .circleci/config.yml. Our code for running the tests in CI lives under tools/ci, but those scripts are mostly thin wrappers around Zulip’s test suites or production installer tooling.

We run multiple jobs during a CircleCI build to run Zulip’s test suites on our supported production platforms. They are currently:

  • bionic-backend-frontend

  • focal-backend

Each runs the Zulip backend test suites on the indicated platform/OS. As suggested by the names, only one job runs the frontend test suites, since those are not platform-dependent.
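The test-suite job names follow a `<platform>-<suite>` pattern, so which jobs cover which suites can be read straight off them. A minimal Python sketch, assuming that naming convention holds for the two jobs listed above (it does not describe the installer jobs below); `parse_job_name` and `jobs_running` are hypothetical helpers:

```python
# The test-suite jobs listed above.
JOBS = ["bionic-backend-frontend", "focal-backend"]


def parse_job_name(name):
    """Split e.g. "bionic-backend-frontend" into ("bionic", {"backend", "frontend"})."""
    platform, *suites = name.split("-")
    return platform, set(suites)


def jobs_running(suite, jobs=JOBS):
    """Return the jobs whose names say they run the given suite."""
    return [job for job in jobs if suite in parse_job_name(job)[1]]
```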

Additionally, there are a few jobs designed to do an end-to-end test of Zulip’s production installer:

  • bionic-production-build

  • bionic-production-install

  • xenial-legacy

The production-build job builds a Zulip release tarball, which is then installed in a fresh container in the production-install job; various Nagios and other checks are run to confirm the installation worked.
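Because the install job consumes the tarball produced by the build job, it can only start once the build job has finished; CircleCI workflows express this kind of ordering with a job’s `requires` key. A small Python sketch of that dependency using the standard library’s `graphlib` (Python 3.9+); the dictionary is an illustrative assumption, not a copy of Zulip’s configuration:

```python
from graphlib import TopologicalSorter

# Illustrative dependency map: each job lists the jobs it requires.
# The install job needs the release tarball built by the build job.
REQUIRES = {
    "bionic-production-build": set(),
    "bionic-production-install": {"bionic-production-build"},
}


def run_order(requires):
    """Return one valid order in which the jobs could run."""
    return list(TopologicalSorter(requires).static_order())
```

`TopologicalSorter` raises `graphlib.CycleError` if the requirements ever form a loop.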

The xenial-legacy tests are just designed to ensure we give the right error messages when trying to install or upgrade a Xenial system to master.
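The kind of check these tests exercise can be sketched as a guard that reads `/etc/os-release` and refuses an unsupported release with a clear message. The Python below is hypothetical: the supported-codename set, the message text and the function names are assumptions for illustration, not Zulip’s installer code.

```python
# Hypothetical set of supported Ubuntu codenames.
SUPPORTED_CODENAMES = {"bionic", "focal"}


def parse_os_release(text):
    """Parse KEY=value lines (the /etc/os-release format) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip("\"'")
    return fields


def check_platform(os_release_text):
    """Return the codename, or exit with a readable error if unsupported."""
    codename = parse_os_release(os_release_text).get("VERSION_CODENAME", "")
    if codename not in SUPPORTED_CODENAMES:
        raise SystemExit(f"Unsupported platform: {codename or 'unknown'}")
    return codename
```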

Configuration

The remaining details in this section are primarily relevant for doing development on our CI system and/or provisioning process.

The first key of each job’s section is docker, which specifies the image CircleCI should pull from Docker Hub to run the job. Once CircleCI fetches the image, it spins up a Docker container from it. See the Images section below to learn more about the images we use for testing in CircleCI.

After booting the container from the configured image, CircleCI creates the directory given in working_directory, and all the steps are run from there.

The steps section describes everything the job does: fetching the Zulip code, provisioning, fetching cached data, running tests and uploading coverage reports. Steps prefixed with * reference aliases, which are defined in the aliases section at the top of the file.
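To make the anatomy of a job concrete, here is a Python mirror of its layout. It is only an illustration: the docker, working_directory and steps keys and the built-in checkout step are real CircleCI configuration, but the image name, directory, alias names and commands are hypothetical placeholders. In the real file the aliases are YAML anchors (`&name`) that the YAML loader expands wherever `*name` appears; here the references are plain strings, and the hypothetical `resolve_steps` helper imitates that expansion.

```python
# Hypothetical contents of the aliases section, keyed by anchor name.
ALIASES = {
    "provision": {"run": {"name": "provision", "command": "echo provision"}},
    "run_tests": {"run": {"name": "run tests", "command": "echo test"}},
}

# One job, shaped like an entry in the jobs section of .circleci/config.yml.
JOB = {
    "docker": [{"image": "example/zulip-ci:bionic"}],  # pulled from Docker Hub
    "working_directory": "~/zulip",  # created first; every step runs from here
    "steps": ["checkout", "*provision", "*run_tests"],
}


def resolve_steps(steps, aliases):
    """Replace each "*name" reference with the step definition it aliases."""
    resolved = []
    for step in steps:
        if isinstance(step, str) and step.startswith("*"):
            resolved.append(aliases[step[1:]])
        else:
            resolved.append(step)  # e.g. the built-in "checkout" step
    return resolved
```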

Images

CircleCI tests are run in containers spun up from images maintained by the Zulip team. The Dockerfiles for the various images can be generated by running ./tools/ci/generate-dockerfiles. This command generates the Dockerfiles for the three Ubuntu releases in the ./tools/ci/images/{release_name} directories. Take a look at ./tools/ci/images.yml to see how the Dockerfiles for the three releases differ from each other. To build images from the Dockerfiles and upload them to Docker Hub, follow the instructions in the generated Dockerfiles.
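The general approach of such a generator, one template rendered once per release with per-release settings, can be sketched in Python with `string.Template`. The settings dictionary below stands in for tools/ci/images.yml, and the release list, template, keys and output file name are assumptions; the real script and file formats may differ. The sketch returns the rendered files instead of writing them to disk.

```python
from string import Template

# Hypothetical stand-in for tools/ci/images.yml: per-release settings.
IMAGES = {
    "xenial": {"base_image": "ubuntu:xenial"},
    "bionic": {"base_image": "ubuntu:bionic"},
    "focal": {"base_image": "ubuntu:focal"},
}

# Hypothetical shared template; only the per-release values change.
DOCKERFILE_TEMPLATE = Template(
    "# Generated for $release; edit the template, not this file.\n"
    "FROM $base_image\n"
)


def generate_dockerfiles(images):
    """Return {path: contents}, one Dockerfile per release directory."""
    return {
        f"tools/ci/images/{release}/Dockerfile": DOCKERFILE_TEMPLATE.substitute(
            release=release, **settings
        )
        for release, settings in images.items()
    }
```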

Performance optimizations

Caching

An important element of making CircleCI perform effectively is preserving, between jobs, the various caches that live under /srv/ in a Zulip development or production environment. In particular, we cache the following:

  • Python virtualenvs

  • node_modules directories

This has a huge impact on the performance of running tests in CircleCI; without these caches, the average test time would be several times longer.

We have designed these caches carefully (they are also used in production and in the Zulip development environment) so that each is named by a hash of its dependencies and the Ubuntu distribution name; as a result, Zulip should always use the same versions of dependencies it would have used had the cache not existed. In practice, bugs are always possible, so keep the caches in mind as a possible cause when debugging.

A consequence of this caching is that test jobs for branches which modify package.json, requirements/, and other key dependencies will be significantly slower than normal, because they won’t get to benefit from the cache.
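A minimal Python sketch of this naming scheme, assuming a SHA-256 digest over the distribution name and the contents of the dependency files; the function and hashing details are illustrative, not Zulip’s exact implementation. It shows why the slowdown happens: changing any dependency file, or the distribution, produces a different name, so no existing cache matches.

```python
import hashlib


def cache_key(dependency_files, distro):
    """Hypothetical cache name derived from dependency contents and distro.

    dependency_files maps a file path to its contents as bytes.
    """
    digest = hashlib.sha256()
    digest.update(distro.encode() + b"\0")
    for path in sorted(dependency_files):  # sorted, so dict order is irrelevant
        digest.update(path.encode() + b"\0")
        digest.update(dependency_files[path] + b"\0")
    return digest.hexdigest()
```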