By Noam Rosenthal with Gilles Dubuc, Wikimedia Performance Team
It might seem surprising to hear that the Wikimedia Foundation is commissioning the implementation of a web browser feature. Yet, all major browser engines are FLOSS projects nowadays, and sometimes browser vendors have different priorities that don’t match the needs of the Wikimedia Foundation.
In this instance, we decided to commission the implementation of a feature that lets us observe web performance from an end-user perspective. This web browser feature tells us at what point in time content started to appear on the screen for a visitor. It was first available as vendor-specific APIs, and is now increasingly available as the standardized Paint Timing API, initially in Chrome. However, it has never been possible to measure this in any form in Apple’s Safari web browser.
Being unable to measure such a basic event in the user experience for Safari visitors means that we are blind to an important category of performance issues for 20% of our visitors in the field. As different browsers behave differently, this means that a category of issues or bugs could negatively impact the initial rendering of pages on Safari specifically and we wouldn’t be able to know. While we can measure this metric in a simulated environment, lab tests can never reproduce the real-world conditions of hundreds of millions of visitors. Measuring web performance data in the field is critical to our monitoring. It’s an essential part of the toolset that lets us keep Wikimedia sites fast for everyone.
The Paint Timing API
The Paint Timing API is one of several performance APIs that allow web developers to observe and improve the performance of their web applications. It measures the time between the moment a user navigates to a URL and the moment the browser has displayed something.
The Paint Timing API has been available in Chrome since version 60 (2017). Its metrics have become useful for understanding loading experience, and it is now part of Google’s Web Vitals initiative.
The Paint Timing API has two metrics:
- First paint: when the user sees anything at all that’s not the browser’s default background.
- First contentful paint: when the user sees something “contentful”, like an image or text. For example, a DIV with a background color is not considered “contentful”.
The Paint Timing API, specifically first contentful paint, is especially useful for understanding loading performance trends, as it measures something that’s simple yet directly meaningful to users: the time between navigating to a website and being able to consume some of the content visually.
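As a rough illustration of how a page could read these two metrics, here is a minimal sketch. The `PaintEntryLike` and `PaintTimings` types and the `collectPaintTimings` helper are names made up for this example; the entry names `first-paint` and `first-contentful-paint` and the `paint` entry type are the ones the API uses. The browser hookup is guarded, so it does nothing in environments that don't report paint entries.

```typescript
// Minimal shape of a paint entry: the two fields of a paint timing entry
// that this sketch reads. (Hypothetical type name, for illustration only.)
interface PaintEntryLike {
  name: string;
  startTime: number; // milliseconds since the start of navigation
}

// Hypothetical result type: either metric may be missing.
interface PaintTimings {
  firstPaint?: number;
  firstContentfulPaint?: number;
}

// Hypothetical helper: fold a list of paint entries into the two metrics.
function collectPaintTimings(entries: PaintEntryLike[]): PaintTimings {
  const timings: PaintTimings = {};
  for (const entry of entries) {
    if (entry.name === "first-paint") {
      timings.firstPaint = entry.startTime;
    } else if (entry.name === "first-contentful-paint") {
      timings.firstContentfulPaint = entry.startTime;
    }
  }
  return timings;
}

// Browser hookup. The cast to `any` keeps the sketch independent of DOM typings.
const observerCtor = (globalThis as any).PerformanceObserver;
if (observerCtor && (observerCtor.supportedEntryTypes ?? []).includes("paint")) {
  new observerCtor((list: { getEntries(): PaintEntryLike[] }) => {
    console.log(collectPaintTimings(list.getEntries()));
  }).observe({ type: "paint", buffered: true });
}
```

Both fields are optional on purpose: a browser may report only one of the two metrics, so calling code should not assume that `firstPaint` is always present.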
Apple’s WebKit lacked Paint Timing
Although Paint Timing has become a major metric, it was only available in one of the three major browser engines: Google’s Chromium, which powers Chrome. The other two, Gecko, which powers Firefox, and WebKit, Apple’s browser engine powering Safari, had Paint Timing on their to-do lists, but it was given a low priority. The priority lists of core browser engine teams are dynamic and influenced by web developer wishes, as well as by the internal pressures and interests of the software engineers working on those projects.
Over the years, WebKit has become more and more transparent about its project goals and feature roadmap. External contributions and priorities from web players like Wikimedia (or, in the past, Bloomberg and others) have also been a factor in prioritization. If someone outside the core team contributes the code and tests for a feature, why not review it and accept it?
Wikimedia cares deeply about having good and stable metrics, especially for mobile users, and one of the Performance team’s primary missions is to measure and understand web performance. The team had been waiting for years for Apple to implement this requested browser feature, but Apple had prioritized other improvements. Luckily, WebKit is an Open Source project, which means anyone can offer contributions to it. We decided to invest in doing what it takes to make the Paint Timing API available in WebKit. WebKit is the engine for all the browsers on iOS, an operating system used by 20% of our visitors, and it’s essential to collect the metrics that will improve their experience.
We approached Apple
For a new feature to be implemented in WebKit, especially an API available to web developers, it needs to be aligned with the project goals, and Apple has to be behind it. Although it’s possible to experiment with new features in the WebKit project to some extent, and although anyone can offer patches and possibly become a reviewer (like the undersigned), a feature would only be available in Safari and in any iOS browser if Apple chose to enable it. Our first step was to reach out to Apple to begin the conversation.
We discussed the Paint Timing API with the WebKit team at Apple, and they raised several points:
- Great, Apple generally wants it!
- First-paint is problematic; only first-contentful-paint is desirable. The WebKit implementation delays the first paint if there are pending styles and fonts unless a meaningful amount of content is ready to be rendered (equivalent to 1024 pixels of images, or 200 characters of text). Apple’s main argument was that web developers might optimize for first-paint if it’s available, which is less desirable because pages would appear quickly but with missing content.
- The spec is too vague. It defines what “contentful” is only in general terms, and leaves too much interpretation regarding what it means in practice.
- It’s not clear how Paint Timing relates to the viewport. Is it supposed to occur only when elements that are in the viewport are painted? What is the “viewport” in this case?
- Cross-origin iframes shouldn’t necessarily be given access to this API, as it may expose information to them about the root frame, at least in Safari.
We updated the initial spec
After talking with Apple, we went back to the spec drawing board, together with Google, who had written the spec originally and had a corresponding working implementation, and Mozilla, who joined the discussion later on.
It’s been an interesting challenge to explicitly define what “contentful” means when measuring “first contentful paint.” It took several iterations to create a thorough definition of contentful. In short, any image, text, or replaced element (canvas, video, and the like) that is visible and has valid coordinates triggers first-contentful-paint.
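To make the definition a little more concrete, here is a simplified sketch of a “contentful” check. It is not the spec’s algorithm or any browser’s code: the `RenderedNode` type, the `NodeKind` values, and the reading of “valid coordinates” as a finite, non-empty rectangle are all assumptions made for this example.

```typescript
// Hypothetical, simplified description of a rendered node.
// This is not a DOM type and not a WebKit type.
type NodeKind = "text" | "image" | "background-image" | "svg" | "canvas" | "video" | "box";

interface RenderedNode {
  kind: NodeKind;
  visible: boolean;
  // Painted rectangle in document coordinates; null if the node has no box.
  rect: { x: number; y: number; width: number; height: number } | null;
}

// Text, images, and replaced elements count; a plain box (for example a
// DIV that only has a background color) does not.
function isContentfulKind(kind: NodeKind): boolean {
  switch (kind) {
    case "text":
    case "image":
    case "background-image":
    case "svg":
    case "canvas":
    case "video":
      return true;
    default:
      return false;
  }
}

// Assumption: "valid coordinates" is modelled as a finite, non-empty rectangle.
function isContentful(node: RenderedNode): boolean {
  if (!node.visible || node.rect === null) {
    return false;
  }
  const { x, y, width, height } = node.rect;
  const finite = [x, y, width, height].every((value) => isFinite(value));
  return finite && width > 0 && height > 0 && isContentfulKind(node.kind);
}
```

Note that the check never looks at where the rectangle sits relative to the window, so a paragraph far down the page still counts.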
Another modification to the spec was to separate the Paint Timing API from the viewport. Instead of measuring when elements go below or above the fold, which is affected by user window sizes and scrolling, first-contentful-paint is measured based only on which elements are available, so a page can report its first contentful paint even when all of its content is below the fold. This was a controversial decision. However, detecting when elements are in the viewport can be achieved by other means, such as the Intersection Observer API. In addition, we “relaxed” some of the requirements of the API: first-paint is not mandatory, and feature detection is document-specific, meaning that the browser can decide whether an iframe gets it or not.
To test all these particular angles of the spec, we added over 25 tests to the “web platform tests” project to make sure browsers are aligned when implementing it.
The spec discussions showed something that I personally appreciate about how the web works: the different pulls and pushes from different players create a balance. In this case, Google wanted to keep the spec relatively close to its implementation, while also showing flexibility in changing it after it had been delivered. Apple pushed for something that has all its details fleshed out with regard to compatibility, performance, and privacy, and for making sure that the spec fits in with the details of the WebKit rendering process. Wikimedia pushed for something that’s clear and useful for web developers.
Having all these players (plus Mozilla and others) in the spec discussion, and making compromises that came from these discussions, helped us end up with something that feels tight and generic enough to become a cross-browser API.
The first obstacle in the WebKit implementation was aligning first-contentful-paint with WebKit’s “Visually Non-Empty” (VNE) heuristics, the code that delays painting until enough pixels or characters are available. That heuristic, unlike the FCP spec, only triggered paint for text and image elements, and ignored SVG, canvas, video, and background images, which the Paint Timing API does consider contentful.
Before implementing the actual FCP metric, we leveled the playing field by making those also trigger the first paint (for example, https://trac.webkit.org/changeset/260045/webkit).
Doing this exposed what makes some of these heuristics problematic when it comes to edge cases. For example, a page with 100 characters of text and a gradient background might still delay painting in some situations, as it doesn’t pass the 200-character / 1024-pixel threshold.
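The edge case is easier to see with the thresholds written out. The sketch below is a deliberately simplified model of the heuristic as described in this post, not WebKit’s actual code: the function and type names are made up, the “greater than or equal” comparison is an assumption, and the gradient background is modelled as contributing no pixels.

```typescript
// Thresholds as described in this post; everything else here is a
// simplified, hypothetical model rather than WebKit's implementation.
const CHARACTER_THRESHOLD = 200;
const PIXEL_THRESHOLD = 1024;

interface PendingContent {
  textCharacters: number; // characters of text ready to render
  imagePixels: number; // pixels of image content ready to render
}

// Assumption: reaching either threshold counts as "meaningful" content.
function passesVisuallyNonEmptyThreshold(content: PendingContent): boolean {
  return (
    content.textCharacters >= CHARACTER_THRESHOLD ||
    content.imagePixels >= PIXEL_THRESHOLD
  );
}

// The first paint is delayed only while styles or fonts are still pending
// and there is not yet a meaningful amount of content.
function shouldDelayFirstPaint(
  hasPendingStylesOrFonts: boolean,
  content: PendingContent
): boolean {
  return hasPendingStylesOrFonts && !passesVisuallyNonEmptyThreshold(content);
}

// The edge case from the text: 100 characters over a gradient background.
const shortTextOnGradient: PendingContent = { textCharacters: 100, imagePixels: 0 };
console.log(shouldDelayFirstPaint(true, shortTextOnGradient));
```

In this model the short page stays below both thresholds, so its first paint is held back for as long as styles or fonts are pending, even though everything it will ever show is already available.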
The second challenge was writing this into the WebKit rendering codebase without slowing down rendering.
Initially, we tried a naive implementation that recursively walked the render tree (WebKit’s internal representation of all renderable objects) and looked for objects that are contentful, visible, and intersect with the document rect. This proved to be quite costly, as checking the exact geometry of every element in a big page requires a lot of information and matrix math.
Instead, the implementation works by invoking a “fake paint.” It goes through the motions of painting without changing any pixel. The fake paint mechanism in WebKit had been implemented before, as a way to trigger paint-related events, such as async image decoders.
First contentful paint reporting in WebKit works by running that “fake paint” process before the actual paint, in a standard step called “update the rendering,” to determine if the document contains any elements that match the paintable and contentful criteria. If such an element is found, the current time is marked as the timing of first-contentful-paint, and that step does not run again.
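The control flow can be sketched in a few lines. This is a toy model rather than WebKit’s C++ code: `RenderObject`, `fakePaintFindsContent`, and `FirstContentfulPaintRecorder` are hypothetical names, and treating an invisible object as hiding its whole subtree is a simplification made for the example.

```typescript
// Toy stand-in for an object in the render tree (hypothetical type).
interface RenderObject {
  contentful: boolean;
  visible: boolean;
  children: RenderObject[];
}

// "Fake paint": walk the tree the way a paint would, without touching any
// pixel, and stop as soon as one contentful paintable object is found.
function fakePaintFindsContent(node: RenderObject): boolean {
  if (!node.visible) {
    return false; // simplification: an invisible object hides its subtree
  }
  if (node.contentful) {
    return true;
  }
  return node.children.some((child) => fakePaintFindsContent(child));
}

class FirstContentfulPaintRecorder {
  private firstContentfulPaint: number | null = null;

  // Called once per "update the rendering" step, before the real paint.
  updateTheRendering(root: RenderObject, now: number): void {
    if (this.firstContentfulPaint !== null) {
      return; // already reported; the check never runs again
    }
    if (fakePaintFindsContent(root)) {
      this.firstContentfulPaint = now;
    }
  }

  // The recorded time, or null if nothing contentful has been painted yet.
  timing(): number | null {
    return this.firstContentfulPaint;
  }
}
```

Because the recorder returns early once a time has been stored, the cost of the extra traversal is only paid on the frames leading up to the first contentful paint.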
Paint Timing has been available in Safari Tech Preview since version 106, and you can turn it on in the “Develop > Experimental Features” menu. To make it available to the general public in Safari releases, Apple would need to test that the fake paint doesn’t create a performance regression, and to run an extra review of the code. The fate of Paint Timing in Safari has now moved from the hands of the community in the WebKit Open Source project to Apple, which will test it internally and then either turn the flag on or request changes.
In the meantime, the folks at Mozilla were making progress on implementing Paint Timing in Gecko/Firefox, and made some contributions to the spec and to the platform tests.