Eliminate Duplicate Transactions in GA4 & Google Ads: Why & How
As marketeers in the digital age, data accuracy and integrity is paramount. With every campaign we run, every decision we make, and every strategy we implement, data sits at the…
Google Analytics is not showing you all your data, and it is not the only tool that holds data back. Google and other online advertising providers are known for using machine learning to optimize the way they serve ads. As a result, advertisers who once had access to all the interaction data to optimize their campaigns increasingly call these systems a black box. Unfortunately, GA4 now uses the same techniques, which can make it harder for analysts to understand what their users want.
That is not what this article is about, though. Here we will look at another technique that makes analysis less reliable. I am talking, of course, about data sampling. So let's look at what data sampling is, why you should care, how to tell when your data is sampled, and how to avoid it. Let's go!
First, we need to understand what data sampling is. Sampling is a statistical analysis technique used to identify patterns and trends in a larger data set by selecting, manipulating, and analyzing a representative subset of data points. By working with a small, manageable amount of data, data scientists, predictive modelers, and other data analysts can build and run analytical models more quickly while still getting reasonably accurate results.
In other words, it takes time to calculate and present the results in your GA reports. So, in some cases, GA takes part of the data and uses it to estimate the total. That is data sampling. You might already see some of the disadvantages of this practice, but why should you care?
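To make the idea concrete, here is a minimal Python sketch of sample-and-scale estimation: draw a simple random sample of sessions, total a metric over that sample, and scale the total up to the full data set. It is a simplified illustration built on our own assumptions (uniform random sampling, hypothetical conversion data), not GA4's actual sampling algorithm.

```python
import random


def sampled_estimate(values, sample_rate, seed=0):
    """Estimate sum(values) from a simple random sample of the rows."""
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    n = len(values)
    k = max(1, round(n * sample_rate))  # number of rows actually read
    sample = random.Random(seed).sample(values, k)
    # Scale the sample total up to the full population size.
    return sum(sample) * n / k


# Hypothetical data: 600,000 sessions, roughly 2% of which convert (1 = conversion).
data_rng = random.Random(42)
sessions = [1 if data_rng.random() < 0.02 else 0 for _ in range(600_000)]

exact = sum(sessions)
estimate = sampled_estimate(sessions, sample_rate=0.1, seed=7)
print(f"exact: {exact}, estimate from a 10% sample: {estimate:.0f}")
```

The estimate lands close to the exact count but rarely matches it, and a different seed gives a different figure. That gap is exactly what a sampled report hides from you.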
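So why does it matter that some analytics tools, like GA4, sample data and others do not? Well, sampled data is not the whole data set, so it is less accurate and less reliable. It's that simple: every number in a sampled report is an estimate extrapolated from a subset, and the smaller that subset (or the segment you are looking at), the further the estimate can drift from the real value.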
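How much less accurate depends on how big the sample is. For a simple random sample, the textbook margin of error of a proportion (such as a conversion rate) shrinks as the sample grows. The sketch below assumes simple random sampling without replacement and a 95% confidence level (z = 1.96); GA4 does not necessarily sample this way, so read the output as an illustration of the trend rather than of GA4's exact error. The segment size and conversion rate are hypothetical.

```python
import math


def margin_of_error(population, sample_rate, p, z=1.96):
    """Approximate margin of error for a proportion p estimated from a sample.

    Assumes simple random sampling without replacement, so the finite
    population correction (N - n) / (N - 1) is applied.
    """
    n = round(population * sample_rate)
    if not 1 <= n <= population:
        raise ValueError("sample size must be between 1 and the population size")
    if population == 1:
        return 0.0
    fpc = (population - n) / (population - 1)
    return z * math.sqrt(p * (1 - p) / n * fpc)


# Hypothetical segment: a 2% conversion rate measured over 20,000 sessions.
for rate in (1.0, 0.5, 0.1, 0.01):
    moe = margin_of_error(20_000, rate, p=0.02)
    print(f"sample rate {rate:>5.0%}: 2.00% ± {moe:.2%}")
```

With the full data set the margin is zero, while a 1% sample leaves a margin of roughly ±1.9 percentage points, almost as large as the 2% rate itself.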
Of course, data sampling also has benefits. The catch is that these benefits mostly go to the tool provider. Still, let's look at the pros and cons of data sampling.
A major benefit of data sampling is that it reduces the computing costs of producing your report. Of course, that is a benefit only for the analytics vendor.
So, as you can see, data sampling in a tool like GA4 does little for the quality of your results. The only direct advantage for you is that your reports load faster, but at what cost? It is therefore important to detect when your reports are based on sampled data.
Google Analytics 4 doesn't always sample your reports; it only happens when you deal with very large data sets. To be precise, your reports will be sampled from 500k sessions in GA4 and from 100M sessions in GA360 (at view level).
One plus, though: standard reports are always unsampled. Only advanced reports such as explorations can be sampled. So if you run a smaller website and keep your analysis to a few months at a time, you are unlikely to run into data sampling.
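The rules from the last two paragraphs fit in a few lines. This hypothetical helper encodes the thresholds quoted above (500k sessions for GA4, 100M for GA360) and the rule that standard reports are never sampled. The function name and report-type labels are our own, and the thresholds live in a constant so you can update them if Google changes its limits.

```python
# Sampling thresholds as quoted in this article (sessions per query).
SAMPLING_THRESHOLDS = {"ga4": 500_000, "ga360": 100_000_000}


def may_be_sampled(report_type, sessions_in_range, edition="ga4"):
    """Return True if a report could be sampled under the article's rules.

    report_type: "standard" or "exploration" (hypothetical labels).
    sessions_in_range: sessions covered by the report's date range.
    edition: "ga4" or "ga360".
    """
    if report_type == "standard":
        return False  # standard reports are always unsampled
    if report_type != "exploration":
        raise ValueError(f"unknown report type: {report_type!r}")
    return sessions_in_range >= SAMPLING_THRESHOLDS[edition]


print(may_be_sampled("standard", 2_000_000))              # False
print(may_be_sampled("exploration", 2_000_000))           # True
print(may_be_sampled("exploration", 2_000_000, "ga360"))  # False
```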
As you can see, GA360 users have a bit more flexibility. The sampling threshold is much higher, and Google recently added a feature that lets users toggle between faster analysis (with data sampling applied) and more accurate results (without data sampling). There is also an unsampled explorations beta for GA360 users, which lets them create explorations with up to 50 billion events for greater accuracy.
Look at the top of your report. If there is a green check, the report is unsampled. If you see a yellow or red % icon, your report is sampled. Hover over the icon to see the percentage of sampling applied.
In GA360, this is also where you find the toggle between the two analysis modes.
Reducing the date range decreases the size of the data set. Once it falls below 500k sessions, your report is no longer sampled.
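If you need a long period anyway, one workaround is to split it into consecutive sub-ranges that each stay below the threshold, run the exploration once per sub-range, and add up the results. Below is a minimal sketch that assumes you already know your daily session counts (the traffic here is hypothetical) and uses the article's 500k-session threshold.

```python
from datetime import date, timedelta

# The GA4 sampling threshold quoted in this article, in sessions.
SAMPLING_THRESHOLD = 500_000


def split_date_range(daily_sessions, threshold=SAMPLING_THRESHOLD):
    """Group consecutive days into sub-ranges that each stay below threshold.

    daily_sessions: (date, session_count) tuples, sorted by date.
    Returns a list of (first_day, last_day, total_sessions) tuples.
    """
    ranges = []
    first = last = None
    total = 0
    for day, count in daily_sessions:
        if count >= threshold:
            raise ValueError(f"{day} alone reaches the threshold")
        # Close the current sub-range if this day would take it to the threshold.
        if first is not None and total + count >= threshold:
            ranges.append((first, last, total))
            first, total = None, 0
        if first is None:
            first = day
        last = day
        total += count
    if first is not None:
        ranges.append((first, last, total))
    return ranges


# Hypothetical traffic: 30 days at 40,000 sessions a day (1.2M in total).
start = date(2024, 1, 1)
daily = [(start + timedelta(days=i), 40_000) for i in range(30)]
for first_day, last_day, sessions in split_date_range(daily):
    print(first_day, last_day, sessions)
# 2024-01-01 2024-01-12 480000
# 2024-01-13 2024-01-24 480000
# 2024-01-25 2024-01-30 240000
```

Only additive metrics such as sessions, events, or conversions can be summed across sub-ranges this way. Unique counts like total users cannot, because the same user may show up in several sub-ranges.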
Simplifying your reports will also reduce the size of the data set and therefore the chance that your data will be sampled.
When you use BigQuery to analyze your data, data sampling is never applied. That is what is great about BigQuery. The downside is that it's not free: beyond a monthly free tier, you pay for the amount of data your queries process, plus storage.
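To get a feel for the cost, this small sketch estimates one month's on-demand query bill from the bytes each query processes. The price per TiB is an input you should take from Google's current BigQuery pricing page (6.25 USD below is an assumed example rate), the 1 TiB monthly free allowance reflects on-demand pricing at the time of writing, and the query volumes are hypothetical.

```python
TIB = 2**40  # bytes in one tebibyte (TiB)
GIB = 2**30  # bytes in one gibibyte (GiB)


def monthly_query_cost(bytes_per_query, price_per_tib, free_tib=1.0):
    """Estimate one month's on-demand BigQuery query cost.

    bytes_per_query: bytes processed by each query run during the month.
    price_per_tib: on-demand price per TiB processed (take it from the
        current BigQuery pricing page).
    free_tib: monthly free query allowance in TiB.
    Storage is billed separately and is not included here.
    """
    total_tib = sum(bytes_per_query) / TIB
    billable_tib = max(0.0, total_tib - free_tib)
    return billable_tib * price_per_tib


# Hypothetical month: 300 queries, each scanning 5 GiB of GA4 export data.
queries = [5 * GIB] * 300
cost = monthly_query_cost(queries, price_per_tib=6.25)  # assumed example rate
print(f"{sum(queries) / TIB:.2f} TiB processed, estimated cost: ${cost:.2f}")
```

Because you are billed by bytes processed, selecting only the columns and date range you need keeps each query cheaper.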
Google made it very easy to start using BigQuery with GA4 data. In the property column of your GA4 Admin panel, you will find a BigQuery integration option. Once you link the accounts there, your GA4 data will automatically flow into BigQuery for analysis.
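Once the export is running, GA4 writes one table per day (named events_YYYYMMDD) into a dataset called analytics_ followed by your property ID. As a sketch, the Python function below builds a standard SQL query that counts sessions, identified by user_pseudo_id plus the ga_session_id event parameter, over a date range. The project and property IDs are placeholders, and the resulting counts may not match the GA4 interface exactly.

```python
from datetime import date


def sessions_query(project_id, property_id, start, end):
    """Build a BigQuery SQL query counting sessions in the GA4 export.

    A session is identified by user_pseudo_id combined with the
    ga_session_id event parameter. The query reads the daily events_*
    tables and filters them on _TABLE_SUFFIX (YYYYMMDD).
    """
    if start > end:
        raise ValueError("start must not be after end")
    table = f"`{project_id}.analytics_{property_id}.events_*`"
    return f"""
SELECT
  COUNT(DISTINCT CONCAT(
    user_pseudo_id,
    CAST((SELECT value.int_value FROM UNNEST(event_params)
          WHERE key = 'ga_session_id') AS STRING)
  )) AS sessions
FROM {table}
WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'
""".strip()


# Placeholder IDs: replace with your own Google Cloud project and GA4 property.
sql = sessions_query("my-project", "123456789", date(2024, 1, 1), date(2024, 1, 31))
print(sql)
```

With the google-cloud-bigquery library you could run it through bigquery.Client().query(sql).result(). Passing job_config=bigquery.QueryJobConfig(dry_run=True) first lets you read the job's total_bytes_processed, and therefore estimate the cost, before running the query for real.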
As mentioned, GA360 gives you more flexibility and higher data sampling thresholds.
Many other web analytics tools don't use data sampling in any of their reports. If you want to learn more about alternatives to Google Analytics, check out our side-by-side feature comparison tool. It lets you compare over 40 tools in depth and see which ones use data sampling and which ones don't.
As you have seen, data sampling is a handy technique, but it can give you a distorted picture of your users. Learn how to spot it and take it into account, or simply choose an alternative. Happy analysing!