On (not) making Wikimedia CI faster: part 2, Selenium

2021-12-15 10:25:54 +0100

In the previous post, we saw that the Selenium job in CI is by far the slowest one in the group, at around 22 minutes. Looking at the build trends, though, the longest-running recent job took 38 minutes 🙀. What could we do to improve this?

⚠️ Disclaimer

As in the previous post, I’ll repeat that I’m an interested observer and occasional participant in trying to improve Selenium job times. This post isn’t a critique of anyone’s efforts past or present, just some observations that will hopefully improve things.

With that out of the way, let’s have a closer look.

Console output review

My starting point is to look at what is happening in each phase of the test. I’ll use build #127253 as an example.

First let’s review the console output for the job. We see it started at 9:05:04 and ended at 9:22:23. Let’s break down what happens in those 17 minutes.

Setup (1 minute 24 seconds)

09:05:04 Started by user unknown or anonymous
09:05:04 Running as SYSTEM
09:05:04 Building remotely on integration-agent-docker-1005 (pipelinelib Docker blubber) in workspace /srv/jenkins/workspace/workspace/wmf-quibble-selenium-php72-docker
09:05:05 [wmf-quibble-selenium-php72-docker] $ /bin/bash -xe /tmp/jenkins12548012000970804262.sh
09:05:05 + mkdir -m 2777 -p cache
09:05:06 [wmf-quibble-selenium-php72-docker] $ /bin/bash /tmp/jenkins4414388339502769169.sh
09:05:06 + set -o pipefail
09:05:06 ++ pwd
09:05:06 + exec docker run --volume /srv/jenkins/workspace/workspace/wmf-quibble-selenium-php72-docker/cache:/cache --security-opt seccomp=unconfined --init --rm --label jenkins.job=wmf-quibble-selenium-php72-docker --label jenkins.build=127253 --env-file /dev/fd/63 docker-registry.wikimedia.org/releng/castor:0.2.4 load
09:05:06 ++ /usr/bin/env
09:05:06 ++ egrep -v '^(HOME|SHELL|PATH|LOGNAME|MAIL)='
09:05:06 Defined: CASTOR_NAMESPACE="castor-mw-ext-and-skins/master/wmf-quibble-selenium-php72-docker"
09:05:06 Syncing...
09:05:07 rsync: failed to set times on "/cache/.": Operation not permitted (1)
09:06:23 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1668) [generator=3.1.2]
09:06:23 
09:06:23 Done

First, the job loads a cache (using the castor image) for use in the build. I don’t know whether the rsync error is a problem, and it’s also unclear what exactly is being rsynced. The sync takes 1 minute and 16 seconds (09:05:07 to 09:06:23), so it is perhaps worth looking at more closely.
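To find slow steps like this one without reading the whole log by eye, the gaps between consecutive timestamps can be computed. The following is a minimal sketch, assuming every line starts with an HH:MM:SS timestamp and the build does not cross midnight; the sample lines are abbreviated from the console output above, and the function names are my own.

```python
from datetime import datetime

# A few lines from the console output above, abbreviated for brevity.
LOG_LINES = [
    "09:05:06 Syncing...",
    '09:05:07 rsync: failed to set times on "/cache/.": Operation not permitted (1)',
    "09:06:23 rsync error: some files/attrs were not transferred (code 23)",
    "09:06:23 Done",
    "09:06:24 + set -eux",
]


def parse_time(line):
    """Parse the leading HH:MM:SS timestamp of a console line."""
    return datetime.strptime(line[:8], "%H:%M:%S")


def largest_gaps(lines, top=3):
    """Return (gap, line) pairs, largest first, where gap is the pause after that line."""
    gaps = []
    for current, following in zip(lines, lines[1:]):
        gaps.append((parse_time(following) - parse_time(current), current))
    return sorted(gaps, key=lambda pair: pair[0], reverse=True)[:top]


if __name__ == "__main__":
    for gap, line in largest_gaps(LOG_LINES):
        print(f"{gap} after: {line[:60]}")
```

For the sample lines, the largest gap is the 1 minute and 16 seconds that follows the first rsync message.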

09:06:24 [wmf-quibble-selenium-php72-docker] $ /bin/bash -xe /tmp/jenkins13589427653727773721.sh
09:06:24 + set -eux
09:06:24 + mkdir -m 2777 -p log
09:06:24 [wmf-quibble-selenium-php72-docker] $ /bin/bash /tmp/jenkins5429125508407867888.sh
09:06:24 + set -o pipefail
09:06:24 + exec docker run --user=nobody --entrypoint=/usr/bin/find --volume /srv/jenkins/workspace/workspace/wmf-quibble-selenium-php72-docker:/workspace --security-opt seccomp=unconfined --init --rm --label jenkins.job=wmf-quibble-selenium-php72-docker --label jenkins.build=127253 --env-file /dev/fd/63 docker-registry.wikimedia.org/buster:latest /workspace/log -mindepth 1 -delete
09:06:24 ++ /usr/bin/env
09:06:24 ++ egrep -v '^(HOME|SHELL|PATH|LOGNAME|MAIL)='
09:06:25 [wmf-quibble-selenium-php72-docker] $ /bin/bash -xe /tmp/jenkins3239550946283738087.sh
09:06:25 + set -eux
09:06:25 + mkdir -m 2777 -p src
09:06:26 [wmf-quibble-selenium-php72-docker] $ /bin/bash /tmp/jenkins17595688023582983572.sh
09:06:26 + set -o pipefail
09:06:26 + exec docker run --user=nobody --entrypoint=/usr/bin/find --volume /srv/jenkins/workspace/workspace/wmf-quibble-selenium-php72-docker:/workspace --security-opt seccomp=unconfined --init --rm --label jenkins.job=wmf-quibble-selenium-php72-docker --label jenkins.build=127253 --env-file /dev/fd/63 docker-registry.wikimedia.org/buster:latest /workspace/src -mindepth 1 -delete
09:06:26 ++ /usr/bin/env
09:06:26 ++ egrep -v '^(HOME|SHELL|PATH|LOGNAME|MAIL)='
09:06:27 [wmf-quibble-selenium-php72-docker] $ /bin/bash -eu /tmp/jenkins6221564899181341474.sh
09:06:27 + chmod 2777 src
09:06:27 [wmf-quibble-selenium-php72-docker] $ /bin/bash /tmp/jenkins8547938132531281831.sh
09:06:27 + set -o pipefail
09:06:27 ++ pwd
09:06:27 ++ pwd
09:06:27 ++ pwd
09:06:27 + exec docker run --tmpfs /workspace/db:size=320M --volume /srv/git:/srv/git:ro --volume /srv/jenkins/workspace/workspace/wmf-quibble-selenium-php72-docker/src:/unused --volume /srv/jenkins/workspace/workspace/wmf-quibble-selenium-php72-docker/cache:/cache --volume /srv/jenkins/workspace/workspace/wmf-quibble-selenium-php72-docker/log:/workspace/log --security-opt seccomp=unconfined --init --rm --label jenkins.job=wmf-quibble-selenium-php72-docker --label jenkins.build=127253 --env-file /dev/fd/63 docker-registry.wikimedia.org/releng/quibble-buster-php72:1.2.0-s3 --packages-source vendor --db mysql --db-dir /workspace/db --git-parallel=8 --run selenium

The Quibble command is ready to start now.

Zuul clone with parameters (34 seconds)

Hard to see how this could be optimized further; this is where we use the cache from the previous step to load core, skins and extensions, and then update to the relevant commits needed for the build.

Extension and skin submodule update (8 seconds)

Also not worth looking into more; here we are running git submodule update for skins/extensions.

Install composer dev-requires for vendor.git (10 seconds)

Again, nothing to optimize here. We’re using a composer cache, so this is fast.

Start backends, Install MediaWiki, npm install in /workspace/src, start backends

Each of these takes a few seconds, and is not worth optimizing.

Browser tests for projects

Quibble first runs browser tests for the extension/skin under test, then runs tests for other extensions. In this case, the build I’m looking at is for GrowthExperiments, so the first tests run are for GrowthExperiments. We can see that npm install takes 20 seconds. The total test time is from 9:07:36 to 9:08:57, 1 minute and 21 seconds.

A few things that stand out:

There are lots of calls to api.php
There are 85 HTTP 404 responses and 26 HTTP 301 responses
npm install time varies from 20 seconds for GrowthExperiments to 1 minute for Wikibase and MobileFrontend.
The tests are all running serially
Some assertions in the tests can be achieved via PHPUnit integration tests

Let’s look at these in detail below.

🕵️ Optimize what happens in the tests

The best thing to do here is to look at spec files in tests/selenium in core and various extensions/skins. Some of my observations:

Some tests have (occasionally elaborate) setup processes to put content in place, and these are all executed via the browser. It would probably be faster and less error-prone to have a documented entry point for importing content via maintenance/importDump.php. Perhaps the convention could be for Quibble to look for XML files in tests/selenium/seed-content/ and import them; or the main wdio configuration file in core could have a setup step that uses childProcess.spawnSync() to run the import and runJobs maintenance commands.
Some spec files require a new login for each test; this is probably not always needed, and reusing a user would save some seconds in spec files with lots of tests.
Many spec files create new instances of Api.bot() in each test; some overhead could be reduced by reusing a single instance (T284443).
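The seed-content idea from the first observation could look roughly like this on the Quibble side. This is only a sketch of the proposal, not existing Quibble behaviour: the tests/selenium/seed-content/ directory is the convention suggested above, and the function names seed_commands and import_seed_content are hypothetical.

```python
import subprocess
from pathlib import Path


def seed_commands(project_dir, mediawiki_dir):
    """Build the maintenance commands that would import a project's seed content.

    Returns an empty list when the project ships no XML dumps.
    """
    # Hypothetical convention: XML dumps live in tests/selenium/seed-content/.
    seed_dir = Path(project_dir) / "tests" / "selenium" / "seed-content"
    dumps = sorted(seed_dir.glob("*.xml"))
    if not dumps:
        return []
    maintenance = Path(mediawiki_dir) / "maintenance"
    commands = [["php", str(maintenance / "importDump.php"), str(dump)] for dump in dumps]
    # Run queued jobs once, after all imports, so derived data is ready before the tests start.
    commands.append(["php", str(maintenance / "runJobs.php")])
    return commands


def import_seed_content(project_dir, mediawiki_dir):
    """Run the import commands in order, stopping at the first failure."""
    for command in seed_commands(project_dir, mediawiki_dir):
        subprocess.run(command, check=True)
```

Building the command list separately from running it keeps the convention easy to test without a MediaWiki install.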

⛓️ Parallelization

While the above optimizations would help, none of them (except for standardizing the content fixture setup process) is necessarily an improvement in the developer experience, and they are probably not going to make a substantial difference in test execution time either.

Is there anything we can run in parallel to speed things up?

Selenium

We have a task from 2019, “Run browser tests in parallel”, and @awight has put a lot of effort into the Quibble infrastructure and into experimenting with Wikibase tests. There are still a few issues to sort out. One is that we need an Apache backend to handle requests in parallel (more on that below). Another is that various tests and spec files assume they are run serially, and without that guarantee they become more flaky.

So while running the browser tests in parallel (either the extensions, or the tests within each spec, or both) remains a work in progress, one opportunity for an immediate gain would be to run the npm install step required for each extension in parallel.

There’s a patch pending in Quibble that shows a reduction in build times from 18 minutes 13 seconds to 13 minutes 20 seconds when parallelism is used to run npm install for all extensions/skins under test.
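To illustrate the idea (this is not the code from the pending Quibble patch, which I’m not reproducing here), a parallel install step could be sketched in Python with a thread pool that starts one child process per project directory. The function name run_in_each and the default of four workers are assumptions for the example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_in_each(directories, command=("npm", "install"), max_workers=4):
    """Run `command` in every directory concurrently; return {directory: exit code}."""

    def run(directory):
        # Threads are sufficient here: the real work happens in the child process.
        result = subprocess.run(list(command), cwd=directory, capture_output=True, text=True)
        return directory, result.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run, directories))
```

A caller would pass the checkout directories of the extensions and skins under test and fail the build if any exit code is non-zero. Whether concurrent npm processes contend on a shared cache is something I have not measured.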

Another option for parallelism is to split the jobs being run into two or more groups. For example, wmf-quibble-selenium-group-A and wmf-quibble-selenium-group-B, where group A runs tests for half of the extensions under test, and group B runs tests for the other half. Coupled with the parallel npm install for each build, that could feasibly bring the build time down to less than 8 minutes, assuming that there are enough executors available to run an extra job.
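How the extensions get divided between the groups matters, since the slower group determines the build time. One simple approach is a greedy split: sort projects by expected duration and always add the next one to the lighter group. The sketch below uses the npm install times mentioned earlier as stand-in weights; a real split would need measured test times per extension, and the function name balance_groups is hypothetical.

```python
def balance_groups(durations, group_count=2):
    """Greedily assign projects to groups, longest first, always filling the lightest group."""
    groups = [{"projects": [], "total": 0} for _ in range(group_count)]
    # Sort by duration (descending), then by name so the result is deterministic.
    ordered = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for name, seconds in ordered:
        lightest = min(groups, key=lambda group: group["total"])
        lightest["projects"].append(name)
        lightest["total"] += seconds
    return groups


if __name__ == "__main__":
    # Stand-in weights in seconds: the npm install times quoted in this post.
    weights = {"GrowthExperiments": 20, "Wikibase": 60, "MobileFrontend": 60}
    for label, group in zip("AB", balance_groups(weights)):
        print(f"group {label}: {group['projects']} ({group['total']}s)")
```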

Apache

We currently use the single-threaded PHP built-in server (e.g. php -S localhost:9412) with the browser tests. That is problematic because a typical page view in MediaWiki involves multiple HTTP requests: a request to index.php to get the HTML content, multiple calls to load.php to fetch JS/CSS, and calls to api.php to get or set preferences. Each of these requests is processed serially. Quibble has support for using Apache as the web backend, but we need to make the switch to use it in CI (T285649). That will require some care when it is rolled out, because some tests may become flaky when run against a web server that can process requests in parallel.
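The cost of serial request handling can be illustrated with a toy model. This is not a measurement of MediaWiki or of either web server: each “request” is just a sleep with a made-up service time, handled by a pool with either one worker (like the built-in server) or several (like a parallel backend).

```python
import time
from concurrent.futures import ThreadPoolExecutor

# One page view: the HTML, two resource loader calls, and an API call.
REQUESTS = ["index.php", "load.php", "load.php", "api.php"]
SECONDS_PER_REQUEST = 0.05  # made-up service time, for illustration only


def handle(request):
    """Pretend to serve a request by sleeping for the service time."""
    time.sleep(SECONDS_PER_REQUEST)
    return request


def timed(workers):
    """Return the wall-clock time needed to serve all requests with `workers` workers."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(handle, REQUESTS))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"one worker:   {timed(1):.2f}s")
    print(f"four workers: {timed(len(REQUESTS)):.2f}s")
```

With one worker the total is roughly the sum of the service times; with one worker per request it approaches the time of the slowest single request.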

✂️ Run fewer tests

The last point is to write fewer Selenium tests, or rather, to choose which Selenium tests we write wisely. Selenium tests are slow, often fragile, and difficult to maintain. On the other hand, they are invaluable in being able to simulate an actual browser and interactions with your UX, in a way that PHPUnit tests and QUnit tests cannot. Keeping this balance in mind, it’s worthwhile reviewing the tests we have to see what could be moved to PHPUnit / QUnit, and what should be preserved, optimized and relied on as a browser test.

So, that’s part 2. Next time I’ll write some more about parallelization. See you in part 3!