Perf Matters at Wikipedia in 2018
Why performance matters
Ian Marlier explains why web performance is important, particularly for the Wikimedia Foundation.
📝 Read more at Why performance matters.
Singapore caching data center goes live
In the lead-up to Wikimedia’s fifth data center going live, we put measurements in place to capture the before and after, and to quantify the impact.
Peter expanded our synthetic testing to include agents across the Asian continent (T168416). Gilles developed a geographic oversampling capability for our real-user monitoring (T169522).
📝 Read more: New Singapore DC helps people access Wikipedia around the globe.
WebPageTest goes Linux
WebPageTest is the software that powers our synthetic testing across different browsers. While open source, the WebPageTest agent code historically assumed a Windows operating system. In 2017, upstream WebPageTest started work on a multi-platform agent. The flagship webpagetest.org service started switching to Linux in January 2018.
Peter Hedenskog migrated our virtual servers from Windows to Linux, too, for several reasons:
- The Linux agent is cheaper to run than Windows (on AWS).
- Linux enables us to locally debug and run tests ourselves, fix upstream bugs, and easily review and iterate on pull requests.
- Linux allows us to host the service in-house in the future, in alignment with our guiding principles, and as required for production servers. Self-hosting would give us significantly more capacity to test on a wider range of wikis, more kinds of articles/URLs, and at a higher frequency.
📝 Read more at wikitech: WebPageTest/UpdateToLinux and T165626.
Hello, WebPageReplay
We concluded last year that WebPageReplay is our best choice for greater stability and granularity in synthetic testing.
This year, Peter deployed WebPageReplay (alongside Browsertime), slowly replacing WebPageTest. We also added support for Chrome Tracelog (T182510).
📝 Read more at Performance testing in a controlled lab environment
📝 Documentation at wikitech: WebPageReplay
Private wiki support for Thumbor and Thumbor goes active/active!
In 2017, we deployed Thumbor as MediaWiki’s service for generating thumbnail images on all public wikis. Read more in Journey to Thumbor: 3-part blog series.
This year, we took the final steps in the Thumbor project: making it work on private wikis.
📝 Read more at Thumbor support for private wikis deployed.
As part of the Multi-DC MediaWiki initiative, we made Thumbor serve traffic in both data centers simultaneously. Thumbor is now an active-active service that the CDN can call from its nearest edge (T201858). This eases annual switchovers and brings Wikipedia closer to becoming a multi-datacenter deployment.
Literature review and our own study on Perceived Performance
The field of web performance tends to assume that faster is always better. Guidelines (like Google’s RAIL) imply the existence of absolute, universal values for what feels fast or slow. Over the years, many academic studies have looked into performance perception, and a careful review of this body of work can shed light on what is actually known and what is yet to be studied.
Gilles carried out an extensive literature review on performance perception (T165272).
📝 The resulting essay: Perceived Performance (2018)
📝 Read more at Performance – Magic numbers
This has changed our perspective on how much we really know, injecting a healthy dose of skepticism in what we do.
Existing frontend metrics correlated poorly with user-perceived page load performance. This suggests there is still significant work to be done in creating metrics that correlate well with what people perceive and value. It’s also clear that the best way to understand perceived performance is still to ask people directly about their experience. We set out to run our own survey to do exactly that, and look for correlations between new performance metrics and the lived experience. We partnered with Dario Rossi to carry out this research over the course of the next year (T187299).
📝 Machine learning: how to undersample the wrong way
📝 Mobile web performance: the importance of the device
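To illustrate the kind of analysis such a study involves, here is a toy sketch (hand-rolled Pearson coefficient over made-up numbers, not our actual survey data or metric names): we compute how strongly a candidate load-time metric correlates with self-reported perception scores.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy data: a hypothetical load-time metric (ms) vs. survey scores
# (1 = felt slow, 5 = felt fast).
load_time = [900, 1200, 2500, 3100, 4800]
survey_score = [5, 4, 3, 3, 1]

r = pearson(load_time, survey_score)
print(r)  # strongly negative here: slower loads, lower scores
```

A metric that correlates well with perception would show a consistently strong coefficient across many such samples; the literature review suggests most existing frontend metrics do not.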
Prepare for multi-DC operations
We introduced WANCache in 2015 as a developer-friendly interface for Memcached in MediaWiki, offering high resiliency at Wikipedia scale. Its design protects MediaWiki developers from incorrect assumptions, and allows teams to treat cached data as comparable in staleness to data from a database replica. This works through automatic broadcasting of purges and tombstones.
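The purge-and-tombstone idea can be sketched with a toy model (hypothetical Python, not the actual WANCache code): a delete leaves a short-lived tombstone behind, and during that hold-off window regenerated values are served but not cached, since they may have been computed from a lagged database replica.

```python
import time

TOMBSTONE_TTL = 11  # hypothetical hold-off window, in seconds

class WanCacheSketch:
    """Toy model of a purge-with-tombstone cache (not the real WANCache)."""

    def __init__(self):
        self.store = {}  # key -> (value, is_tombstone, timestamp)

    def delete(self, key):
        # Instead of removing the key, leave a tombstone so that values
        # recomputed from lagged replicas are treated as possibly stale.
        self.store[key] = (None, True, time.time())

    def get_with_set_callback(self, key, callback):
        entry = self.store.get(key)
        now = time.time()
        if entry is not None:
            value, is_tombstone, ts = entry
            if not is_tombstone:
                return value
            if now - ts < TOMBSTONE_TTL:
                # Within the hold-off window: recompute but do NOT cache,
                # since replicas may still return pre-purge data.
                return callback()
        value = callback()
        self.store[key] = (value, False, now)
        return value

cache = WanCacheSketch()
print(cache.get_with_set_callback("user:1", lambda: "v1"))  # computed and cached
cache.delete("user:1")  # leaves a tombstone, not a hard delete
print(cache.get_with_set_callback("user:1", lambda: "v2"))  # recomputed, not cached
```

In the real system, purges and tombstones are additionally broadcast to other data centers, which is what makes cached data comparable in staleness to a database replica.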
From our multi-datacenter MediaWiki roadmap, we implemented the plan for cross-DC Memcached purging. Mcrouter is a Memcached proxy that can relay events across clusters. SRE deployed Mcrouter proxies, and we rolled out Mcrouter as MediaWiki’s WANCache backend.
Aaron Schulz led the MediaWiki work and the rollout. Along with the various bugs we encountered and fixed along the way, this is tracked under T198239.
Adoption of WANCache has continually increased since its introduction in 2015. Of particular note this year is adoption in the Echo extension for MediaWiki (T164860). Echo powers notifications on Wikipedia. Echo used a “recache-on-change” strategy, which is unreliable in practice and incompatible with Multi-DC requirements.
Faster loading pages: Two-stage rocket
ResourceLoader is Wikipedia’s delivery system for CSS, JavaScript, and localization. Its JavaScript bundler works in two stages to improve HTML cache performance. Until now, our two-stage design manifested as three network roundtrips, rather than two. This was due to the ResourceLoader client depending on a set of base modules (the “mw” base library, and jQuery).
This year we’ve eliminated one of the round trips required for web pages to be “interaction ready”. Our refactoring made “mw.loader” a dependency-free ResourceLoader client, written in vanilla JavaScript. Aaron Schulz and Timo Tijhof re-structured the payloads to embed “mw.loader” within the “startup” module (T192623).
ResourceLoader is designed to control when code executes, separate from when it arrives from the network (by using a closure). This means it can download base modules and other dependencies, concurrent with the modules that depend on them!
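The arrival/execution split can be illustrated with a small sketch (hypothetical Python standing in for the JavaScript client, with made-up module names): each module ships its code wrapped in a closure, and the loader only invokes that closure once all of the module’s dependencies have executed, regardless of the order in which payloads arrive over the network.

```python
class LoaderSketch:
    """Toy dependency-aware loader: payloads may arrive in any order,
    but a module's closure only runs after its dependencies have run."""

    def __init__(self):
        self.pending = {}   # name -> (deps, closure)
        self.executed = {}  # name -> return value

    def implement(self, name, deps, closure):
        # Called when a module's payload arrives from the network.
        self.pending[name] = (deps, closure)
        self._flush()

    def _flush(self):
        # Execute every pending module whose dependencies are satisfied,
        # repeating until no further progress is possible.
        progress = True
        while progress:
            progress = False
            for name, (deps, closure) in list(self.pending.items()):
                if all(d in self.executed for d in deps):
                    del self.pending[name]
                    self.executed[name] = closure()
                    progress = True

loader = LoaderSketch()
# The dependent module's payload happens to arrive first...
loader.implement("mediawiki.page", ["jquery"], lambda: "page ready")
assert "mediawiki.page" not in loader.executed  # held until jquery runs
# ...and executes as soon as its dependency does.
loader.implement("jquery", [], lambda: "jquery ready")
assert loader.executed["mediawiki.page"] == "page ready"
```

This is why the base modules can be downloaded concurrently with everything that depends on them: ordering is enforced at execution time, not at download time.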
We gained a ~10% speed up in JS delivery time on web pages, by rewriting an internal recursive function as a loop instead. This rewrite had the bonus of making our stack traces easier to understand. There is now clearer attribution of how time is spent in a given MediaWiki feature. The recursion meant that each module appeared to include all modules after it, creating an infinite staircase. In some cases, the visualization in browser DevTools broke in other ways (details at T202703).
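The shape of that rewrite can be sketched generically (a simplified illustration, not the actual ResourceLoader code): in the recursive form, each module’s stack frame stays alive while every later module runs, so profilers attribute the later modules’ time to the earlier ones; the loop form pops each frame before the next module executes.

```python
def run_modules_recursive(modules, i=0):
    # Each frame remains on the stack while all later modules execute,
    # so stack traces show module 0 "containing" modules 1, 2, 3, ...
    if i >= len(modules):
        return
    modules[i]()
    run_modules_recursive(modules, i + 1)

def run_modules_loop(modules):
    # Flat iteration: each module's frame is gone before the next runs,
    # giving clean, non-nested stack traces and the same execution order.
    for module in modules:
        module()

order = []
mods = [lambda n=n: order.append(n) for n in range(3)]
run_modules_loop(mods)
assert order == [0, 1, 2]
```

Both versions run the modules in the same order; only the shape of the call stack (and therefore the profiler attribution) differs.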
Miscellaneous
- The team published five articles in the Web Performance Calendar this year.
- Gilles implemented a new “Backend-Timing” metric on Apache PHP web servers. This is our first full-scale measurement of MediaWiki backend latencies. It is also among the first MediaWiki-related metrics to adopt Prometheus instead of Statsd/Graphite (T131894).
- Timo created Fresnel CI, which gives automatic web performance feedback on every patch (T133646).
- We reduced the number of data formats and top-level directories in MediaWiki core by phasing out “.ser” files. We relocated these files to more suitable places within specific components, rather than a generic directory. The replacements use JSON, PHP static arrays, or CDB (Gerrit change sets, related RFC).
- Ian evolved the navtiming service. This service submits real-user monitoring beacons to Graphite, for visualization and alerting. The service is now multi-DC compliant, with automatic failover and switchover based on Kafka and Etcd. The code is now also in its own repository and deployed through Scap, like our other services. (T191994)
- Ian solved long-standing tech debt by decoupling Coal and Coal-web from Graphite (T159354, T158837). Coal now writes to Graphite over the network (instead of to disk). Coal-web now proxies the Graphite API with caching (instead of reading directly from disk). Both are no longer co-located on Graphite hardware, are multi-DC aware, and are deployed with Scap like our other services.
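Regarding the move toward Prometheus mentioned in the Backend-Timing bullet above: Prometheus-style latency metrics are typically histograms with cumulative buckets, which can be sketched in a few lines (toy bucket bounds and a hand-rolled recorder, not the actual Backend-Timing configuration or a real client library).

```python
# Buckets are cumulative, as in Prometheus histograms: each observation
# increments every bucket whose upper bound it fits under.
BUCKETS = [0.05, 0.1, 0.25, 0.5, 1.0, float("inf")]  # seconds (hypothetical)

counts = {b: 0 for b in BUCKETS}
total = 0.0  # sum of all observed latencies
n = 0        # number of observations

def observe(seconds):
    """Record one request latency into the histogram."""
    global total, n
    n += 1
    total += seconds
    for bound in BUCKETS:
        if seconds <= bound:
            counts[bound] += 1

for latency in [0.03, 0.08, 0.3, 0.7]:
    observe(latency)

assert counts[0.05] == 1
assert counts[0.1] == 2            # cumulative: includes the 0.03s request
assert counts[float("inf")] == 4   # the +Inf bucket counts everything
```

The cumulative layout is what lets the server compute quantile estimates and aggregate across hosts without storing individual samples.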
Further reading
- Around the world: How Wikipedia became a multi-datacenter deployment
- Profiling PHP in production at scale
- Perf Matters at Wikipedia in 2019
- Perf Matters at Wikipedia in 2017
- Perf Matters at Wikipedia in 2016
- Perf Matters at Wikipedia in 2015
About this post
Feature image by Jason Zhang, CC BY-SA 3.0, via Wikimedia Commons.