Knowledge base article

Missing data in GA4: Data sampling and its solutions

Google Analytics is not showing you all your data, and it is not the only tool that does this. Google and other online advertising providers are known for using machine learning to optimize the way they serve ads. As a result, advertisers who used to have access to all the interaction data for optimizing their campaigns increasingly call these systems a black box. Unfortunately, the same techniques are now used in GA4, which can make it harder for analysts to understand what their users want.

That is not what this article is about, though. Here we will look at another technique that can make your analysis less reliable: data sampling. So let’s have a look at what data sampling is, why you should care, how to know when your data is sampled, and how to avoid it. Let’s go!

What is data sampling?

First, we need to understand what data sampling is. Sampling is a statistical analysis technique used to identify patterns and trends in a larger data set by selecting, manipulating, and analyzing a representative subset of data points. Using a small, manageable amount of data, data scientists, predictive modelers, and other data analysts can build and run analytical models more quickly, while still achieving accuracy.

In other words: it takes time to calculate and present the results in your GA reports. So, in some cases, GA takes only part of the data and uses it to estimate the total. That is data sampling. You might already see some of the disadvantages of this practice, but why should you care?
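To make the idea of estimating a total from part of the data concrete, here is a minimal sketch. It assumes a simple random sample and made-up session data; Google does not publish this as its sampling method, and the function name `estimate_total_from_sample` is hypothetical.

```python
import random


def estimate_total_from_sample(values, sample_rate, seed=0):
    """Estimate sum(values) by looking at only a random subset of the values."""
    rng = random.Random(seed)
    sample_size = max(1, round(len(values) * sample_rate))
    sample = rng.sample(values, sample_size)
    # Scale the sample sum up by the inverse of the fraction actually sampled.
    return sum(sample) * len(values) / sample_size


# Hypothetical data: 100,000 sessions, each with roughly a 3% chance of a purchase.
data_rng = random.Random(42)
sessions = [1 if data_rng.random() < 0.03 else 0 for _ in range(100_000)]

true_total = sum(sessions)
estimated_total = estimate_total_from_sample(sessions, sample_rate=0.10)

print("true purchases:     ", true_total)
print("estimated purchases:", round(estimated_total))
```

The estimate is usually close to the real number, but it is not the real number. The smaller the sample, the further off it can be.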

Why should you care?

So why does it matter that some analytics tools like GA4 sample data and others do not? Well, sampled data is not the whole data set, and therefore it is less accurate and less reliable. It’s that simple.

Of course, data sampling also has benefits, although they mostly go to the tool provider. Let’s have a look at the pros and cons of data sampling.

The benefits of data sampling

1. Low cost of sampling

A great benefit of data sampling is that it reduces the computing cost of producing your report. Of course, this mainly benefits the analytics vendor.

2. Less time consuming

Sampling also takes less time. This way, analytics tools like GA4 can provide you with the report you want much faster.

3. More manageable datasets

Sampling makes it easier to manage and process large amounts of data, which is especially helpful when working with limited computing resources.

The downsides of data sampling

1. Chances of bias

A serious limitation of sampling is that the selection can be biased, which leads us to draw wrong conclusions. Bias arises when the method used to select the sample is faulty.

2. Difficulties in selecting a truly representative sample

A sample produces reliable and accurate results only when it is representative of the whole data set. Selecting such a sample is difficult.

3. Limited precision of results

Sampling can lead to imprecise results, as the sample size may not be large enough to capture all the nuances of the data.
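The link between sample size and precision can be sketched with the standard margin-of-error formula for a rate. This assumes a simple random sample, which may not match how an analytics tool actually samples, and the 3% conversion rate is a made-up example value.

```python
import math


def margin_of_error(conversion_rate, sample_size, z=1.96):
    """Approximate 95% margin of error for a rate measured on a simple random sample."""
    return z * math.sqrt(conversion_rate * (1 - conversion_rate) / sample_size)


# Hypothetical 3% conversion rate measured on samples of different sizes.
for n in (1_000, 10_000, 100_000):
    moe = margin_of_error(0.03, n)
    print(f"sample of {n:>7,} sessions: 3.0% +/- {moe * 100:.2f} percentage points")
```

The margin shrinks only with the square root of the sample size, so a sample that is 100 times larger is just 10 times more precise.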

So, as you can see, data sampling in a tool like GA4 is not very beneficial for your results. The only direct advantage for you is that your reports load faster, but at what cost? It is important, then, to detect when your reports are based on sampled data.

When might your data be sampled in GA4?

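Google Analytics 4 doesn’t always sample your reports. It only happens when a report has to process a very large amount of data. To be precise, an exploration is sampled when it covers more than 10 million events in a standard GA4 property, or more than 1 billion events in a GA360 property. The often-quoted limits of 500k sessions and 100M sessions (at view level) applied to the older Universal Analytics, not to GA4.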

One plus, though: standard reports are always unsampled. Only advanced reports like explorations can be sampled. So when you run a smaller website and limit your analysis to periods of a few months at a time, you are unlikely to run into data sampling.

As you can see, GA360 users have a bit more flexibility. The sampling threshold is much higher, and Google recently added a feature that lets GA360 users toggle between faster analysis (with data sampling applied) and more accurate results (no data sampling applied). There is also an unsampled exploration beta available for GA360 users. It allows them to create explorations with up to 50 billion events for more accuracy.

How do you know if your data is sampled in GA4?

Look at the top of your report. If there is a green check, then it’s unsampled. If you see a yellow or red % icon your report is sampled. Hover over it to see the percentage of sampling applied.

In GA360, this is where you find the toggle between the two analysis modes.

How to avoid data sampling in GA4?

1. Reduce the date range

When you reduce the date range, you decrease the size of the data set. Once it drops below the sampling threshold, your report is no longer sampled.
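If you need a long period anyway, one option is to split it into shorter ranges and run the same report once per range. The sketch below only does the date arithmetic; `split_date_range` and the 30-day chunk size are illustrative choices, not something GA4 provides.

```python
from datetime import date, timedelta


def split_date_range(start, end, max_days):
    """Split the inclusive range [start, end] into consecutive chunks of at most max_days days."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


# Example: the first quarter of 2023 in chunks of at most 30 days.
chunks = split_date_range(date(2023, 1, 1), date(2023, 3, 31), max_days=30)
for chunk_start, chunk_end in chunks:
    print(chunk_start.isoformat(), "to", chunk_end.isoformat())
```

Keep in mind that counts such as events or sessions can be added up across the chunks, but unique users cannot, because the same user may appear in more than one chunk.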

2. Simplify your report

Simplifying your reports will also reduce the size of the data set and therefore the chance that your data will be sampled.

3. Use BigQuery

When you use BigQuery to analyze your data, your queries run on the raw exported events, so no data sampling is applied. That is what is great about BigQuery. The downside is that it is not always free: with on-demand pricing, you pay for the amount of data each query processes.

Google made it very easy to start using BigQuery with GA4 data. In your GA4 admin panel, in the property column, you will find a BigQuery integration option. Once you link the accounts there, your GA4 data will automatically flow into BigQuery for analysis.
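Once the export is running, a query on the raw events could look roughly like the sketch below. It only builds the SQL text, which you can paste into the BigQuery console. The project name `my-project` and the dataset name `analytics_123456789` are placeholders, and the helper `build_daily_event_count_query` is hypothetical; the sketch assumes the standard GA4 export layout with one `events_YYYYMMDD` table per day.

```python
def build_daily_event_count_query(project_id, dataset_id, start_date, end_date):
    """Return BigQuery SQL that counts events and users per day in a GA4 export dataset."""
    for value in (start_date, end_date):
        # Dates must look like YYYYMMDD, matching the suffix of the daily export tables.
        if not (len(value) == 8 and value.isdigit()):
            raise ValueError(f"expected a date like 20230131, got {value!r}")
    return f"""
SELECT
  event_date,
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM `{project_id}.{dataset_id}.events_*`
WHERE _TABLE_SUFFIX BETWEEN '{start_date}' AND '{end_date}'
GROUP BY event_date
ORDER BY event_date
""".strip()


# Placeholder project and dataset names; replace them with your own.
query = build_daily_event_count_query(
    "my-project", "analytics_123456789", "20230101", "20230131"
)
print(query)
```

Limiting the query with `_TABLE_SUFFIX` keeps it to the daily tables you actually need, which also limits the amount of data processed and therefore the cost.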

4. Use GA360

As I mentioned, GA360 gives you more flexibility and higher data sampling thresholds.

5. Use another web analytics tool

There are many other web analytics tools that don’t use data sampling in any of their reports. If you want to learn more about some alternatives for Google Analytics you can check out our side-by-side feature comparison tool. It allows you to compare over 40 tools in-depth and see which ones use data sampling and which ones don’t.

The takeaway

As you have seen, data sampling is quite a handy technique, but it might give you a wrong picture of your users. Know how to spot it and take it into account, or simply choose an alternative. Happy analysing!

By Freek Kampen

Data & Analytics specialist and co-owner of New North Digital. With a background in online advertising, I solve tracking and data issues for entrepreneurs and agencies.
