
Simple Way To Measure A/B Test Flicker Impact


“Flickering”, or “Flash Of Original Content” (FOOC), is a phenomenon where there’s a (typically) slight but observable delay in the browser updating the site or element layout when the user is included in a variant group for experimentation. It manifests as the original, unmodified element being rendered in the visible portion of the page before the experiment library updates it with the variant.

There are ways to mitigate the flicker:

  1. Add the A/B testing library directly into the page template and don’t load it via some other, asynchronously loaded dependency (e.g. Google Tag Manager).
  2. Load the A/B testing library synchronously, and have it hide the element that is being tested until the library is loaded.
  3. Utilize some kind of anti-flicker tech.
  4. Run the experiments server-side, and render content with the variant in place.

Typically, the only non-intrusive and consistent way to avoid the flicker is to look into server-side rendering for your experiments. For example, tools like Conductrics offer a robust set of APIs for doing all the decision-making logic on your server. Then there are tools like Google Optimize that require you to do the variant selection and assignment manually, but the tool can then handle the data collection and reporting.

However, if you’ve read this far, it’s probably because you’re worried about client-side testing.



Introducing the problem

With JavaScript-based experimentation libraries, you’re subject to the rules and limitations of the page render in the browser. The flicker happens because the page is rendered from the HTML source with the original element in place, while the experimentation library has to wait for an opening before the browser can process the experiment changes.

This is most often a problem when you’re running scripts asynchronously. Async load means that once the browser starts to download the library, it doesn’t wait for the download to complete. Instead, it proceeds with the page render. Once the download is complete, and as soon as the browser has an available slot in its single thread of execution, it will start parsing and executing the JavaScript within the library.
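To make the distinction concrete, here’s a rough illustration of how a script ends up being loaded asynchronously when it’s injected by another script (which is what happens when, for example, Google Tag Manager loads the library). The URL is just a placeholder.

```javascript
// Illustration only: dynamic injection, the way tag managers load their
// dependencies. The URL below is a placeholder.
var el = document.createElement('script');
el.src = 'https://www.example.com/experiment-library.js';
el.async = true; // download in parallel, execute whenever the thread is free
document.head.appendChild(el);
// The browser keeps parsing and rendering the page in the meantime,
// which is exactly the window during which the original element can flash.
```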

By moving from asynchronous to synchronous loading, you solve part of this issue. However, synchronous loading doesn’t actually fix anything by itself. Since the library is loaded at the top of <head>, a synchronously loaded library doesn’t have access to the elements it’s designed to modify (since those elements are created in the <body>, which hasn’t yet been parsed).

Instead, libraries like Google Optimize, when loaded synchronously, actually hide the element being tested. They inject a style declaration that sets the visibility of all elements matching the CSS selectors of the experimentation targets to hidden. Only once the element has actually been added to the page can Optimize modify it and unhide it. This is fairly elegant, but it might introduce a slight flicker of another kind, where the element seems to “pop” into place out of sequence with the rest of the render.
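Here’s a rough sketch of that hide-until-ready pattern. This is my simplified illustration, not Optimize’s actual source, and the .hero-banner selector is just a placeholder for the experimentation target.

```javascript
// Sketch of the hide-until-ready pattern (not Optimize's actual code).
// Inject a style rule that hides the tested element before it renders.
var hideRule = document.createElement('style');
hideRule.textContent = '.hero-banner { visibility: hidden !important; }';
document.head.appendChild(hideRule);

// Later, once the element exists and the variant has been applied,
// remove the rule to reveal the (already modified) element.
function revealTestedElement() {
  if (document.querySelector('.hero-banner')) {
    hideRule.parentNode.removeChild(hideRule);
  }
}
```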

A similar solution is anti-flicker JavaScript. The purpose here is to hide the entire page until the experimentation library has loaded. This is, and has been, my biggest objection to how A/B-testing tools are implemented. I simply can’t fathom the logic behind potentially sacrificing the usability of the entire page just to get better data quality for your experimentation.

Considering how crucial page performance and perceived page performance are these days, I steer clear of anti-flicker snippets that hide the entire page. It doesn’t matter that there are mitigations in place for ad blockers and download errors. If the endpoint is unresponsive or lags, Google Optimize’s default anti-flicker snippet has the page wait for a maximum of 4 seconds (this is adjustable) before revealing the content. Naturally, if the container loads before that, the page is revealed sooner. But still, OUCH!
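For reference, this is a simplified sketch of what an anti-flicker snippet does. The real Optimize snippet is more compact and differs in detail, but the mechanism is the same: hide the whole page, and reveal it when the container signals readiness or when a timeout expires.

```javascript
// Simplified sketch of an anti-flicker snippet (the real Optimize snippet
// differs in detail). First, hide the entire page with a CSS class.
var hideStyle = document.createElement('style');
hideStyle.textContent = '.async-hide { opacity: 0 !important }';
document.head.appendChild(hideStyle);
document.documentElement.className += ' async-hide';

// Reveal the page again by removing the class.
function reveal() {
  document.documentElement.className =
    document.documentElement.className.replace(' async-hide', '');
}

// Optimize calls dataLayer.hide.end() once it has applied its changes...
window.dataLayer = window.dataLayer || [];
window.dataLayer.hide = { end: reveal };

// ...and a safety timeout reveals the page after 4 seconds regardless.
setTimeout(reveal, 4000);
```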

Measuring the impact of flicker

So, let’s assume the situation is as follows:

You’ve got an experiment running that treats a home page element, one that is visible above the fold when the page loads, without any scrolling required.

You’ve deployed Google Optimize using the new, asynchronous snippet, and you are not using the anti-flicker JavaScript, so there’s a visible and measurable flicker in place.

Flicker of the original (grey background) before the variant (red background)

In order to measure the severity of this flicker, we need to collect a number of timings:

  1. Time when the original element was added to the page,
  2. Time when the original element became visible in the viewport,
  3. Time when the experimentation library was loaded,
  4. Time when the experiment change was applied to the page.

The flicker is the time delta between (2) and (4). If the element isn’t visible in the viewport, or if the experiment is applied before the base element becomes visible, the flicker is not a problem. (3) is interesting metadata about how the experimentation library itself works, and how fast it manages to apply the change after loading.
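As a concrete, made-up example of the arithmetic:

```javascript
// Hypothetical timestamps in milliseconds since navigation start:
var timings = {
  elementAdded: 180,    // (1) original element added to the DOM
  elementVisible: 220,  // (2) original element visible in the viewport
  libraryLoaded: 450,   // (3) experimentation library loaded
  variantApplied: 510   // (4) variant applied to the element
};

// The flicker is the time the unmodified original was visible to the user:
var flicker = Math.max(0, timings.variantApplied - timings.elementVisible);
// 510 - 220 = 290 ms of visible flicker in this example.
```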

Introduction to the JavaScript we’ll need

The solution will rely on two pieces of JavaScript code running directly in the page template. You can’t execute this code reliably through a dependency like Google Tag Manager, because Google Tag Manager in many cases loads after all steps (1)-(4) have already happened, meaning you won’t get accurate measurements.

The first bit of JavaScript is run at the very top of <head>, even before the Optimize snippet. This script uses the optimize.callback API to collect the timestamp of the experimentation library load. This is timing number (3) in the list above.
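The registration looks roughly like this. The experiment ID is a placeholder, and the window._timings object is just my own scratchpad for the collected timestamps.

```javascript
// Register a callback with Optimize to capture when the library has
// loaded and evaluated the experiment. EXPERIMENT_ID is a placeholder.
window.dataLayer = window.dataLayer || [];
window.dataLayer.push({
  'event': 'optimize.callback',
  'name': 'EXPERIMENT_ID',
  'callback': function(value, name) {
    // Timing (3): the experimentation library has loaded.
    window._timings = window._timings || {};
    window._timings.libraryLoaded = performance.now();
  }
});
```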

The second JavaScript snippet is added to the top of <body>, because the observers need access to document.body. Here’s what it does:

  • A MutationObserver watches the page and reacts to two changes: when the element is first added to the page, and when the element is updated with the variant. These are timings (1) and (4), respectively, in the list above.
  • An IntersectionObserver is added to the page as soon as the original element is rendered. The purpose of the IntersectionObserver is to fire a callback as soon as the original element is visible in the viewport. This is timing (2) in the list above.

Once the timings have been collected, they are pushed into dataLayer to be used in Google Tag Manager.
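Here’s a rough sketch of how these pieces fit together. The #hero selector and the window._timings object are placeholders of my own, and the data-test attribute used to detect the variant is explained in the next section.

```javascript
// Sketch of the measurement snippet, added to the top of <body>.
// '#hero' is a hypothetical selector for the tested element.
(function() {
  var timings = window._timings = window._timings || {};

  // Timing (2): fires when the original element becomes visible.
  function observeVisibility(el) {
    var io = new IntersectionObserver(function(entries) {
      if (entries[0].isIntersecting && !timings.elementVisible) {
        timings.elementVisible = performance.now();
        io.disconnect();
      }
    });
    io.observe(el);
  }

  // Timings (1) and (4): element added, then modified with the variant.
  var mo = new MutationObserver(function() {
    var el = document.querySelector('#hero');
    if (el && !timings.elementAdded) {
      timings.elementAdded = performance.now(); // (1)
      observeVisibility(el);
    }
    if (el && el.getAttribute('data-test') === 'true' && !timings.variantApplied) {
      timings.variantApplied = performance.now(); // (4)
      mo.disconnect();
      // Hand the collected timings over to Google Tag Manager.
      window.dataLayer = window.dataLayer || [];
      window.dataLayer.push({ event: 'flickerTimings', timings: timings });
    }
  });
  mo.observe(document.documentElement, {
    childList: true,
    subtree: true,
    attributes: true
  });
})();
```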

Other preparations

To best measure the application of the experiment element, I have added the data attribute data-test="true" to the variant. This makes it easier for me to locate the element using CSS selectors.

The attribute is added via the Optimize editor, and is thus only present on the element after it’s modified by Google Optimize.
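Locating the modified element then becomes a one-liner (the #hero selector is, again, a placeholder):

```javascript
// Matches only after Optimize has applied the variant:
var variantEl = document.querySelector('#hero[data-test="true"]');
```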

Finally, I’m collecting all this data using Google Tag Manager, and I’m sending it to App + Web because I want to collect it in BigQuery for more granular analysis.

You could just as well calculate the delta directly in the client and send it to, for example, Universal Analytics as an event. This is entirely up to you. I opted for the BigQuery approach – I justify this later in the article.
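If you go the client-side route, the event dispatch could look something like this, assuming analytics.js with a default tracker and the timings collected above:

```javascript
// Compute the delta in the client and send it to Universal Analytics.
var flicker = Math.round(
  window._timings.variantApplied - window._timings.elementVisible
);
ga('send', 'event', {
  eventCategory: 'A/B Test',
  eventAction: 'Flicker',
  eventValue: flicker,  // milliseconds of visible flicker
  nonInteraction: true  // don't let this event affect bounce rate
});
```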

Installing the Optimize library and callback

To install the Optimize library, I’m adding the
