Analytics Toolkit was conceived in 2012 as a set of tools that automate essential Google Analytics-related tasks and augment the GA functionalities in various ways. This goal was achieved in the years since with the release of over a dozen tools utilizing the Google Analytics API. These were accompanied by dozens of in-depth technical articles on the same topic posted on this very blog which gathered hundreds of thousands of views over time. The toolkit served hundreds of digital agencies and freelance experts working in web analytics.
Given the above, the decision to shut down the Google Analytics API integration on Jul 1, 2023 is not one we made lightly. The discussion below is intended to help our customers understand why this move is necessary, but it may also be useful to a wider audience of experts in web analytics or A/B testing.
Why we are parting ways with Google Analytics
The short answer is that the “Universal Analytics” version of the product will be discontinued as an active product after July 1, 2023 and will be superseded by “Google Analytics 4”.
As a consequence:
- Google Analytics will no longer offer interface or API access to data with the accuracy needed to perform statistical analyses, such as those in A/B testing, regressions, or time-series analyses.
- It seems Google Analytics will no longer be a viable tool for the majority of our core user base (small and medium businesses, large businesses not focused on digital, as well as the digital agencies serving them).
- With Analytics Toolkit’s main focus having shifted from providing Google Analytics tools towards tooling for statistical planning and analysis of A/B tests (randomized controlled experiments), it makes even less sense to develop new tools tailored to the needs of GA4 users.
The third point is self-explanatory, while the first two are discussed in detail below.
Lack of sufficiently accurate data
With any kind of statistical analysis a key ingredient is accurate data, or at least data whose inaccuracy can be modeled in a useful and dependable manner. Most statistical models used in the analysis of experiments, time-series analyses, and regressions assume independent observations. Sessions or pageviews from the same user are not independent of each other, so these analyses are typically performed on user-level metrics such as average revenue per user, various conversion rates per user, etc. For more on this and other issues with session-based and similar metrics, consider reading When Session-Based Metrics Lie.
The problem with getting accurate user counts is that it can be very expensive in terms of memory requirements. An often-used workaround is to calculate so-called cardinality estimates, with a prominent example being HyperLogLog and its variants. However, using such estimates instead of actual counts poses a grave issue for any statistical analysis performed on metrics based on these estimates. To our knowledge, we published the first in-depth exploration of the extent of the issues caused by the use of HyperLogLog-like estimates back in 2020. To put it briefly, any statistical estimates you get from such data, including p-values, confidence intervals, point estimates, etc. would be so bad as to be unusable. It essentially makes it impossible to do valid A/B testing, time-series analysis, or regressions.
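To make the effect concrete, here is a minimal simulation sketch. It is not the analysis from the 2020 article and not a model of how Google Analytics computes its counts. It assumes that a HyperLogLog-like estimator adds independent Gaussian relative noise to each reported user count, with a standard deviation of about 1.04/sqrt(m) for m registers (m = 2^14 is picked purely for illustration), and it uses a normal approximation for the number of converting users. The function name and all parameter values are hypothetical.

```python
import math
import random


def false_positive_rate(n_users, base_rate, rel_error_sd, n_sims=4000, seed=1):
    """Share of simulated A/A tests flagged as significant at the two-sided 5% level.

    Both arms share the same true conversion rate, so every flagged test is a
    false positive. rel_error_sd is the ASSUMED relative standard error of the
    reported user count (0.0 means exact counts).
    """
    rng = random.Random(seed)
    z_crit = 1.959964  # two-sided 5% critical value of the standard normal
    flagged = 0
    for _ in range(n_sims):
        rates, counts = [], []
        for _arm in range(2):
            # Normal approximation to the binomial number of converting users.
            conversions = rng.gauss(
                n_users * base_rate,
                math.sqrt(n_users * base_rate * (1 - base_rate)),
            )
            # Reported user count: true count distorted by relative noise.
            reported_users = n_users * (1 + rng.gauss(0, rel_error_sd))
            rates.append(conversions / reported_users)
            counts.append(reported_users)
        # Standard pooled z-test for a difference in proportions, computed
        # from the reported (possibly noisy) user counts.
        pooled = (rates[0] * counts[0] + rates[1] * counts[1]) / (counts[0] + counts[1])
        std_err = math.sqrt(pooled * (1 - pooled) * (1 / counts[0] + 1 / counts[1]))
        if abs(rates[0] - rates[1]) / std_err > z_crit:
            flagged += 1
    return flagged / n_sims


if __name__ == "__main__":
    hll_like_sd = 1.04 / math.sqrt(2 ** 14)  # assumed register count, about 0.8% error
    print("exact counts:    ", false_positive_rate(1_000_000, 0.05, 0.0))
    print("estimated counts:", false_positive_rate(1_000_000, 0.05, hll_like_sd))
```

Under these assumptions the test on exact counts stays close to its nominal 5% false positive rate, while the same test on noisy counts rejects far more often. The gap grows with the number of users, because the sampling error shrinks while the relative estimation error does not.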
The issue with estimates in Google Analytics data is explored in detail in The Perils of Using Google Analytics User Counts in A/B Testing. While Universal Analytics uses such estimates, there is a way to circumvent them and to extract the actual count both in the GA interface and through the API which Analytics Toolkit relies on.
Not so for Google Analytics 4, which uses a similar count estimation algorithm in the interface and for API requests, but offers no way to extract accurate counts other than a raw data export via BigQuery. This greatly increases the monetary and technological cost of extracting useful data from GA4 for everyone. Given that there is no way to do so via the GA4 API, it makes no sense for Analytics Toolkit to support an API integration as we did for Universal Analytics.
If the data used as input in our A/B testing hub or the various standalone statistical calculators is garbage, the output will also be garbage. Given the current state of Google Analytics 4 there is only harm to be found in offering a GA4 API integration. Those technically skilled users who are happy to use BigQuery to extract useful data from Google Analytics will find that it is even easier to use our Data API and Reporting API to perform statistical analysis of their online A/B tests (more on this below).
The shifting focus of Google Analytics 4
In our view, with “Google Analytics 4” Google has decided to turn Google Analytics from an accessible and practical tool suitable for all but the most highly sophisticated enterprise customers into a tool heavily geared towards highly sophisticated enterprise customers. Obviously, this makes it much less suitable for use by the majority of our typical users: small and medium businesses as well as large companies that are not focused solely on digital and may have only a small team of web analysts on payroll. In our experience, digital agencies are likewise finding it difficult to make GA work for their clients.
In making this transition, Google Analytics seems to have failed to take into account the needs of its core user base, making functionalities difficult to use, adopting a model that makes little sense for web tracking, and discarding many beloved and highly necessary functionalities such as views and view-level filters, easy exclusion of URL parameters, content groupings, not to mention the many useful pre-configured reports. Add to this the bulky GTAG library which replaced the lightweight analytics.js one (hurting page load speeds) and an arguably nightmarish UX, and you have a product whose users are seeking alternatives more than ever before, with new competitors springing up every day and existing ones registering high growth rates.
As our typical user is less likely to use Google Analytics, or at least to use it to any meaningful extent, it becomes impractical to seek to create new tools tailored to GA4.
How to use A/B test data stored in Google Analytics
While a direct integration with both Google Analytics and Google Optimize (which uses GA as storage, and is also to be retired later in 2023) will no longer be offered following Jul 1, 2023, there are ways to get accurate data stored in GA into our statistical tools in an automated fashion. To do so, one can utilize our Data API by sending data extracted from GA4 via BigQuery to it. The BigQuery data would use raw counts and not HyperLogLog estimates, so it will be suitable for any kind of statistical analysis one might want to perform with any of the tools at Analytics Toolkit.
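As a rough illustration of the extraction step, the sketch below turns raw event rows into exact per-variant user counts. It assumes the rows have already been pulled from the BigQuery export and that each row carries the `user_pseudo_id` and `event_name` columns of that export, plus a `variant` field. The `variant` field, the function name, and the `purchase` conversion event are hypothetical, since how test assignment is stored depends on your own tracking setup. Sending the result to the Data API is left out because its request format is not covered here.

```python
from collections import defaultdict


def exact_user_counts(rows, conversion_event="purchase"):
    """Count distinct users and distinct converting users per test variant.

    rows: iterable of dicts with keys "user_pseudo_id", "event_name" and
    "variant" ("variant" is a hypothetical field, see the note above).
    """
    users = defaultdict(set)
    converters = defaultdict(set)
    for row in rows:
        variant = row["variant"]
        user_id = row["user_pseudo_id"]
        # Sets give exact distinct counts, unlike HyperLogLog-style estimates.
        users[variant].add(user_id)
        if row["event_name"] == conversion_event:
            converters[variant].add(user_id)
    return {
        variant: {
            "users": len(users[variant]),
            "converted_users": len(converters[variant]),
        }
        for variant in sorted(users)
    }


if __name__ == "__main__":
    sample_rows = [
        {"user_pseudo_id": "u1", "event_name": "page_view", "variant": "control"},
        {"user_pseudo_id": "u1", "event_name": "purchase", "variant": "control"},
        {"user_pseudo_id": "u2", "event_name": "page_view", "variant": "control"},
        {"user_pseudo_id": "u3", "event_name": "page_view", "variant": "treatment"},
        {"user_pseudo_id": "u3", "event_name": "purchase", "variant": "treatment"},
        {"user_pseudo_id": "u3", "event_name": "purchase", "variant": "treatment"},
    ]
    print(exact_user_counts(sample_rows))
```

A user who converts more than once is counted as a single converting user, which keeps the resulting conversion rate a per-user metric.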
Obviously, such roundabout integration requires some technical skills, but so does making use of BigQuery in itself. Given how easy it is to start using our API, it should not be a challenge to most, and we are always happy to provide customers looking to start using the Data API with the support they need to do so.
In addition, we’ve recently launched our Reporting API which adds another layer of automation to the planning and statistical analysis of A/B tests in our platform.
If you are a customer of Analytics Toolkit we hope you appreciate our decision to drop Google Analytics integration as merely a continuation of our mission to deliver unrivaled statistical rigor. Only with such rigor can experiments perform their primary role as tools for risk management and estimation and currently it cannot be achieved via a Google Analytics API integration. If you aren’t yet a customer, then we hope this was a useful read.