Continuous integration (CI)

The Zulip server uses GitHub Actions for continuous integration. GitHub Actions runs frontend, backend and end-to-end production installer tests. This page documents useful tools and tips when using GitHub Actions and debugging issues with it.

Goals

The overall goal of our CI is to avoid regressions and minimize the total time spent debugging Zulip. We do that by trying to catch as many future bugs as possible while minimizing both latency and false positives, each of which can waste a lot of developer time. There are a few implications of this overall goal:

If a test is failing nondeterministically in CI, we consider that to be an urgent problem.
If the tests become a lot slower, that is also an urgent problem.
Everything we run in CI should also be runnable quickly in development (under 1 minute, preferably under 3 seconds), so that developers can iterate fast. Except when working on the CI configuration itself, a developer should never have to repeatedly wait 10 minutes for a full CI run to iteratively debug something.

GitHub Actions

Useful debugging tips and tools

GitHub Actions stores a timestamp for every line in the logs. Timestamps are hidden by default; you can see them by toggling the Show timestamps option in the menu on any job’s log page. (You can get this sort of timestamp in a development environment by piping output to ts.)
GitHub Actions runs on every branch you push on your Zulip fork. This is helpful when debugging something complicated.
You can also SSH into a container to debug failures. This can be helpful, especially in the rare cases where the tests pass on your computer but fail in CI. Various Actions are available on GitHub Marketplace to help you SSH into a container. Use whichever you find easiest to set up.
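As an illustration of the timestamp tip above, here is a minimal Python sketch that prefixes each line of output with the time it was read, similar in spirit to piping through ts. The function name timestamp_lines, the now parameter, and the timestamp format are assumptions made for this example; they are not part of Zulip's tooling or of ts itself.

```python
import sys
from datetime import datetime, timezone
from typing import Callable, Iterable, Iterator


def utc_now() -> datetime:
    """Default clock: the current time in UTC."""
    return datetime.now(timezone.utc)


def timestamp_lines(
    lines: Iterable[str],
    now: Callable[[], datetime] = utc_now,
) -> Iterator[str]:
    """Yield each input line prefixed with the time at which it was read.

    The clock is injectable (hypothetical `now` parameter) so the
    function can be tested with a fixed time.
    """
    for line in lines:
        stamp = now().strftime("%Y-%m-%dT%H:%M:%S.%fZ")
        text = line.rstrip("\n")
        yield f"{stamp} {text}"


def main() -> None:
    """Read lines from standard input and print them with timestamps."""
    for stamped in timestamp_lines(sys.stdin):
        print(stamped, flush=True)
```

To use it as a filter, one could call main() at the bottom of a script and pipe a command's output into that script. Comparing consecutive timestamps then shows which step of a slow command is taking the time.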

Suites

We run multiple jobs during a GitHub Actions build to efficiently run Zulip’s various test suites, some of them multiple times because we support multiple versions of the base OS. See the Actions tab for the full list of Actions that we run.

Files that define GitHub workflows live in the .github/workflows directory. zulip-ci.yml is the main file, where most of the tests are run. production-suite.yml builds a Zulip release tarball, which is then installed in a fresh container. Various Nagios and other checks are then run to confirm that the installation worked.

zulip-ci.yml is designed to run our main test suites on all of our supported platforms. Only one of those platforms runs the frontend tests, since Puppeteer is slow and unlikely to catch issues that depend on the version of the base OS and/or Python. Similarly, only one (different) platform runs the documentation tests.

Our code for running the tests in CI lives under tools/ci, but that logic is mostly thin wrappers around Zulip’s test suites or production installer.

The Legacy OS tests are designed to ensure we give good error messages when someone tries to upgrade a Zulip server running on a very old base OS version with an EOL Python version that Zulip no longer supports.

Configuration

The remaining details in this section are primarily relevant for doing development on our CI system and/or provisioning process.

The first key of the job section is docker. The docker key specifies the image GitHub Actions should fetch from Docker Hub to run the job. Once GitHub Actions has fetched the image, it spins up a Docker container from it. See the Images section to learn more about the images we use in GitHub Actions for testing.

After booting the container from the configured image, GitHub Actions creates the directory specified in working_directory, and all the steps are run from there.

The steps section describes everything: fetching the Zulip code, provisioning, fetching cached data, running tests, and uploading coverage reports. Steps prefixed with * reference aliases, which are defined in the aliases section at the top of the file.

Images

GitHub Actions tests run in containers that are spun up from images maintained by the Zulip team. The Docker images can be generated by running tools/ci/build-docker-images; see the instructions at the top of tools/ci/Dockerfile for more information.

Performance optimizations

Caching

An important part of making GitHub Actions perform well is preserving, between jobs, the various caches that live under /srv/ in a Zulip development or production environment. In particular, we cache the following:

Python virtualenvs
node_modules directories

This has a huge impact on the performance of running tests in GitHub Actions CI; without these caches, the average test time would be several times longer.

We have designed these caches carefully (they are also used in production and the Zulip development environment) so that each is named by a hash of its dependencies and the Ubuntu distribution name. Zulip should therefore always use the same versions of its dependencies that it would have used had the cache not existed. In practice, bugs are always possible, so be mindful of this possibility.
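To make the naming idea concrete, here is a minimal sketch of how a cache name could be derived from dependency files and a distribution name. It is an illustration only, not Zulip's actual implementation: the function cache_key, the use of SHA-1, and the key layout are all assumptions made for this example.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def cache_key(dependency_files: Iterable[Path], distro: str) -> str:
    """Return a cache name that changes when any input changes.

    Hypothetical helper: hashes the distribution name together with
    the name and contents of every dependency file.
    """
    digest = hashlib.sha1()
    digest.update(distro.encode())
    # Sort so the key does not depend on the order files are listed in.
    for path in sorted(dependency_files):
        digest.update(b"\0" + path.name.encode() + b"\0")
        digest.update(path.read_bytes())
    return f"{distro}-{digest.hexdigest()}"
```

Because the name is a pure function of its inputs, an unchanged set of dependency files on the same distribution always maps to the same cache, and any change to those files maps to a new name rather than reusing a stale cache.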

A consequence of this caching is that test jobs for branches which modify package.json, pyproject.toml, or other key dependency files will be significantly slower than normal, because they won’t benefit from the cache.
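As a rough sketch of that consequence, the following helper checks whether a list of changed paths touches a dependency file, which would suggest a slower, cache-missing CI run. The helper name invalidates_cache and the DEPENDENCY_FILES set are hypothetical; the set contains only the two files named above, whereas the real set of cache inputs is larger.

```python
from typing import FrozenSet, Iterable

# Assumption: only the two files named in the text. The real list of
# files that feed into the caches includes others.
DEPENDENCY_FILES: FrozenSet[str] = frozenset({"package.json", "pyproject.toml"})


def invalidates_cache(
    changed_paths: Iterable[str],
    dependency_files: FrozenSet[str] = DEPENDENCY_FILES,
) -> bool:
    """Return True if any changed path is a known dependency file."""
    return any(path in dependency_files for path in changed_paths)
```

For example, a branch that changes only documentation would return False here, while one that edits package.json would return True and should be expected to take longer in CI.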