2 misunderstandings of Google’s Core Web Vitals

Timothy Bednar
Published in Waterfaller.dev · Nov 20, 2020 · 4 min read

This post assumes that you have been working to improve Core Web Vitals and need clarity on how Google uses these metrics. Please visit Waterfaller.dev to fix page speed and Core Web Vitals issues on your website.

I have been working on making pages faster for over a decade and have seen all kinds of shenanigans when it comes to page speed. Way back in the day, we mistakenly used server response times, which were soon replaced by time to first byte (TTFB). Oh, and then we got fancy.

We started measuring “page load” (specifically, the window load event, which corresponds to when the browser stops “spinning”). But it soon became clear that we could cheat that metric to look good on internal reports without any real-world improvement. (Oddly enough, Google Analytics still uses this metric in its Site Speed report.)
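For reference, that old “page load” number is still easy to pull from the browser. Below is a minimal sketch using the Navigation Timing API; the logging is illustrative, not how Google Analytics actually collects it.

// Measure the classic "page load" and time-to-first-byte numbers.
window.addEventListener('load', () => {
  // loadEventEnd is only filled in after the load handlers finish, so defer one tick.
  setTimeout(() => {
    const nav = performance.getEntriesByType('navigation')[0] as
      PerformanceNavigationTiming | undefined;
    if (!nav) return;

    const pageLoadMs = nav.loadEventEnd - nav.startTime; // the "spinner stops" moment
    const ttfbMs = nav.responseStart - nav.startTime;    // time to first byte
    console.log({ pageLoadMs, ttfbMs });
  }, 0);
});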

How fast is fast enough?

One of my questions used to be, “How fast is fast enough?” This was answered differently at every organization where I have worked. In some cases, it was fast enough when speed no longer improved conversion rates. Or it was fast enough exactly when other business objectives took priority. Other times, the problem was solved when an executive no longer complained.

Google has announced that, starting in May 2021, Core Web Vitals will become part of the signals used to determine search rankings. For many businesses, this answers both what to measure and how fast is fast enough. But even with that clarity, two common misunderstandings might keep us from fixing our issues. Today, I will focus on first input delay as measured by Google in Search Console.

Misunderstanding #1 — field metrics

It is common to misunderstand the new Core Web Vitals metric called first input delay (FID). The main blocker to properly testing FID is that it is a “field metric”, collected through real user monitoring (RUM). It is designed to measure the delay experienced by actual users, not test machines. So this story is useless:

As an application, I want to improve the FID so that it is less than 300ms.
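To see why this is untestable in a lab, here is roughly how FID gets captured in the field. It is a sketch built on the browser’s first-input performance entry (the web-vitals library wraps the same thing); the console.log stands in for whatever RUM beacon you actually use.

// FID only exists once a real person interacts with the page for the first time.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    const firstInput = entry as PerformanceEntry & { processingStart: number };
    // Delay = time between the user's first interaction and the moment
    // the browser was free to start running the event handler.
    const fid = firstInput.processingStart - firstInput.startTime;
    console.log('FID (ms):', fid); // in real RUM, send this to analytics
  }
}).observe({ type: 'first-input', buffered: true });

A build server has no user to tap or click anything, so this observer never fires there; that is what makes the story above impossible to verify directly.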

Instead, you can use a “lab metric” called total blocking time (TBT) as a proxy for first input delay. TBT sums up the time the main thread is blocked between First Contentful Paint (FCP) and Time to Interactive (TTI). The idea is that if we reduce TBT, the probability of a user hitting a long delay goes down. So you could write something more testable:

As an application, I want to reduce the Total Blocking Time of our landing page so that it is less than 600ms.

I have recently found that while this is testable, the scope of the fix could still make the story unsuccessful, depending on our situation. TBT adds up the blocking time of “long tasks”, meaning tasks that take longer than 50ms. So an even more actionable story might be:

As an application, I want to reduce the number of long tasks that occur on our landing page to improve total blocking time.

The above story forces us to inventory all the long tasks on the page, find the scripts responsible, and then target specific solutions. Depending on what we discover, we might create a user story for each task we address. This specificity gives us stories that are genuinely testable.
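As a rough starting point for that inventory, a sketch like the one below lists each long task and keeps a running total of its blocking time. It assumes Chrome’s longtask performance entries; the attribution data is often just “unknown”, so you will usually still need a performance trace to pin down the exact script.

// Inventory long tasks (> 50 ms) and approximate the blocking time each one adds.
let totalBlockingMs = 0;

new PerformanceObserver((list) => {
  for (const task of list.getEntries()) {
    const blockingMs = task.duration - 50; // only the part beyond 50 ms blocks input
    totalBlockingMs += blockingMs;

    // Attribution, when present, points at the frame or container responsible.
    const source = (task as any).attribution?.[0];
    console.log({
      startTime: Math.round(task.startTime),
      durationMs: Math.round(task.duration),
      blockingMs: Math.round(blockingMs),
      container: source?.containerSrc || source?.containerName || 'unknown',
      runningTotalMs: Math.round(totalBlockingMs),
    });
  }
}).observe({ type: 'longtask', buffered: true });

Lab TBT only counts long tasks between First Contentful Paint and Time to Interactive, so treat this running total as a looser, whole-session proxy.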

Misunderstanding #2 — percentiles

Now the reality is that we may release code that reduces the number of long tasks and drops our total blocking time. But this does not mean that our Core Web Vitals report is fixed. Why?

The issues identified in Search Console are based on percentiles. Every day, Google rolls up the field data for the URLs on our domain and calculates the 75th percentile for first input delay. It then applies its rubric to that number: up to 100ms is good, 100–300ms is labeled “needs improvement”, and anything over 300ms is “poor”. If a page is visited 48 times and its FID is “poor”, it means that at least a quarter of those visitors, roughly 12 of them, experienced an input delay of more than 300ms.
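To make that arithmetic concrete, here is a small sketch of the percentile logic. The 48 visits and their FID values are made up for illustration; they are not real field data.

// Nearest-rank percentile over a set of FID samples.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const index = Math.max(0, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[index];
}

// 48 hypothetical visits: 35 fast ones (~40 ms) and 13 slow ones (~450 ms).
const fidSamples = [
  ...Array.from({ length: 35 }, () => 40),
  ...Array.from({ length: 13 }, () => 450),
];

const p75 = percentile(fidSamples, 75); // 450 ms, so the page is labeled "poor"
const slowVisits = fidSamples.filter((v) => v > 300).length; // 13 of 48
console.log({ p75, slowVisits });

Only 13 of the 48 visits are slow, yet the page’s 75th percentile lands in “poor”; that slow quarter is the audience the fix has to reach.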

So to make sure our FID problem is actually fixed, our testing needs to focus on that slowest quarter of visitors. For example, suppose we discover that the users getting a poor FID are mobile users visiting from India. That insight now informs the acceptance criteria for our story.

Story: As a mobile visitor, I want my browser to process fewer long tasks so that I can quickly interact with the call to action.

Scenario: mobile visitor from India visits landing page
Given that I’m located in India
and using a Moto g4 on a slow 3G connection
When I visit the landing page
Then my browser processes fewer long tasks
and I can quickly interact with the call to action

For this story, we can test by comparing the before-and-after counts of long tasks, and also use the Web Vitals Chrome extension to measure FID while interacting with the call to action.
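If we want to automate that before-and-after comparison, a sketch along these lines could reproduce the scenario’s device and network conditions. It assumes Node with an older Puppeteer release where puppeteer.devices exposes the “Moto G4” descriptor (newer versions call this KnownDevices); the URL and throttling numbers are illustrative.

import puppeteer from 'puppeteer';

// Count long tasks on a page while emulating a Moto G4 on a slow connection.
async function countLongTasks(url: string): Promise<number> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Device from the scenario's acceptance criteria.
  await page.emulate(puppeteer.devices['Moto G4']);

  // Approximate slow 3G and a slow CPU through the DevTools protocol.
  const client = await page.target().createCDPSession();
  await client.send('Network.emulateNetworkConditions', {
    offline: false,
    latency: 400,                          // round-trip latency in ms
    downloadThroughput: (500 * 1024) / 8,  // ~500 kbit/s down
    uploadThroughput: (500 * 1024) / 8,    // ~500 kbit/s up
  });
  await client.send('Emulation.setCPUThrottlingRate', { rate: 4 });

  // Install a long-task counter before any page script runs.
  await page.evaluateOnNewDocument(() => {
    (window as any).__longTasks = 0;
    new PerformanceObserver((list) => {
      (window as any).__longTasks += list.getEntries().length;
    }).observe({ type: 'longtask', buffered: true });
  });

  await page.goto(url, { waitUntil: 'networkidle0' });
  const longTasks = await page.evaluate(() => (window as any).__longTasks as number);

  await browser.close();
  return longTasks;
}

// Run this against the landing page before and after the release and compare.
countLongTasks('https://example.com/landing').then((n) => console.log('long tasks:', n));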

If Core Web Vitals and Google rankings are critical to our business, we would do well to avoid these common misunderstandings, which also apply to largest contentful paint and cumulative layout shift.

I work on improving page speed every day, and as a side hustle, I created Waterfaller. I appreciate your comments.
