4 Easy Ways to Reduce Cardinality Today

What is cardinality?

Cardinality is the number of elements in a set or other grouping, as a property of that grouping. For example, the set A = {2, 4, 6} contains 3 elements, and, therefore, A has a cardinality of 3. High cardinality means there are a lot of unique values in the set or grouping. Having low cardinality means there are few unique values in the set or grouping.
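As a quick illustration, cardinality can be computed by counting distinct values. The sketch below is a minimal Python example; the `page_hits` list is made-up sample data, not taken from any real report.

```python
def cardinality(values):
    """Return the number of distinct values in a collection."""
    return len(set(values))

A = [2, 4, 6]

# Hypothetical page paths: two of them differ only by a session ID,
# which is enough to make each one a separate unique value.
page_hits = [
    "/home",
    "/home",
    "/about",
    "/home?sessionid=1",
    "/home?sessionid=2",
]

print(cardinality(A))          # 3
print(cardinality(page_hits))  # 4
```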

How can I tell if I am experiencing high cardinality in Google Analytics?

Google Analytics limits the number of unique values it stores for a dimension per day. Daily processed tables store up to 50K rows for standard Analytics and up to 75K rows for Google Analytics 360. Multi-day processed tables store up to 100K rows for standard Analytics and up to 150K rows for Google Analytics 360. Multi-day processed tables contain 4 days’ worth of data.

When the number of unique values exceeds this limit, Google Analytics automatically chooses the top values to display and rolls the remaining values up into a single row labeled “(other)”.
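The rollup can be pictured with a small Python sketch. This is only a model of the behavior described above, not Google Analytics' actual implementation: the function name `rollup_to_other`, the tiny `row_limit`, and the sample paths are all assumptions made for illustration.

```python
from collections import Counter

def rollup_to_other(values, row_limit):
    """Keep the `row_limit` most frequent values; fold the rest into "(other)"."""
    counts = Counter(values)
    if len(counts) <= row_limit:
        return dict(counts)  # under the limit, nothing is rolled up
    kept = dict(counts.most_common(row_limit))
    # Everything that did not make the cut is summed into one row.
    kept["(other)"] = sum(counts.values()) - sum(kept.values())
    return kept

# Hypothetical hits with 4 unique pages and a row limit of 2.
hits = ["/a"] * 5 + ["/b"] * 3 + ["/c"] + ["/d"]
print(rollup_to_other(hits, row_limit=2))
# {'/a': 5, '/b': 3, '(other)': 2}
```

The totals are preserved, but the detail for "/c" and "/d" is no longer visible, which is why reducing the number of unique values matters.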

 


Dimensions with high cardinality potential:

  • Page
  • Event Label
  • Custom Dimensions

 


You might also see this message pop up on the top right side of your browser.


How do I get rid of (other)?

There are several steps and workarounds for dealing with high cardinality in Google Analytics. Before reading further, note that we highly recommend trying each of the following steps in a new view or a test view before applying it to your production views.

 

Solution 1: Query Parameter Settings

The easiest and quickest step you can take to reduce cardinality is to change your query parameter settings. You can reduce the number of possible values in the Page dimension by filtering out dynamic session or customer ID variables in the query parameter settings.

Any query parameters, such as unique session IDs, that you do not want to appear in your URLs can be entered in the setting as a comma-separated list. For example, if you do not want to see “sessionid” in your URLs because it is causing high cardinality, enter sessionid in the Exclude URL Query Parameters setting.

 


It’s important to note that this setting is applied before filters. This means that if the parameter appears in the original URI as “SessionID”, you will need to enter “SessionID”, even if a lowercase Request URI filter is applied to this view.
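The effect of the setting, including its case sensitivity, can be sketched in Python. This is an approximation for illustration only: `exclude_query_parameters` is a hypothetical helper, and the sample URLs are invented.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def exclude_query_parameters(url, excluded):
    """Drop the named query parameters from a URL (names are matched case-sensitively)."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in excluded
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

excluded = {"sessionid"}

print(exclude_query_parameters("/cart?sessionid=abc123&color=red", excluded))
# /cart?color=red

# A different casing is a different name, so it is not removed.
print(exclude_query_parameters("/cart?SessionID=abc123", excluded))
# /cart?SessionID=abc123
```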


Solution 2: Site Search Settings

Having a search function on your website can lead to high cardinality. When a user enters text into the search function, the text is captured in the URL.

 

Tracking every distinct string that users enter in the search box can lead to high cardinality. To solve this issue, go to the view settings in Google Analytics and check the “Strip query parameters out of URL” box.

Original URL:
www.example.com/?s=xxx

 

New URL:
www.example.com/

 

Instead of appearing in the URL data, the search terms can now be found under the Site Search reports, where they are cleaner and easier to analyze.
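Conceptually, the setting splits each search URL into a clean page path and a separate search term. The Python sketch below models that split; `split_site_search` is a hypothetical helper, and the default parameter name `s` is taken from the example URL above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def split_site_search(url, search_param="s"):
    """Return (page, search_term) with the search parameter stripped from the page."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    terms = [value for name, value in pairs if name == search_param]
    kept = [(name, value) for name, value in pairs if name != search_param]
    page = urlunsplit(parts._replace(query=urlencode(kept)))
    return page, (terms[0] if terms else None)

print(split_site_search("www.example.com/?s=xxx"))
# ('www.example.com/', 'xxx')
```

Every search now maps to the same page value, while the term itself is still available as its own field.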

Before the ‘Strip query parameters out of URL’ checkbox is checked, this is what the page dimension might look like:

 

After applying the site search settings, the search terms report shows this:

 

 

The page dimension now shows:

 

 

Solution 3: Custom Tables

Custom Tables give you access to all of the data for a particular set of metrics, dimensions, segments, and filters on a daily basis. The main purpose of Custom Tables is to avoid sampling of your data when a report covers a large volume of data, such as a large number of sessions. Although this does not eliminate cardinality issues, Custom Tables, unlike standard daily processed tables, have a limit of 1M unique rows per day, which is 925K higher than the normal limit of 75K rows for GA 360.

 

Solution 4: Filters

There are many filters you can use to reduce high cardinality. Before applying any filters, please note these rules:

 

  • Filters are destructive. Filtering your incoming hits permanently includes, excludes, or alters those hits in that view, according to the type of filter. Therefore, you should ALWAYS maintain an unfiltered view of your data so you always have access to your full data set.
  • Filters require up to 24 hours before they are applied to your data.
  • Fields specified in a filter must exist in the hit and not be null in order for the filter to be applied to that hit. For example, if you are filtering on Hostname, but the hit does not contain that field (perhaps the hit was sent via the Measurement Protocol and that request did not contain the &dh parameter), then any filters acting on Hostname will be ignored and the hit will be processed as if there was no filter.
  • Filters are account-level objects. If you edit a filter at the view level, you are also changing the filter at the account level, and any other views that use the filter are also affected by the change. If you want to customize a single instance of an existing filter used by multiple views, create a new filter and apply it to that single view.
  • Filters are applied in the order in which they are assigned to the view, so filter order matters. Google Analytics processes each hit through the filters in that order, and the output of one filter becomes the input of the next. To avoid accidentally losing data, make sure your filters are assigned in the correct order.
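To see why filter order matters, consider the following Python sketch. Both filters are hypothetical stand-ins written for this example (a lowercase filter and a case-sensitive search-and-replace), not exports of real Google Analytics filters.

```python
import re

def lowercase(uri):
    """Stand-in for a lowercase Request URI filter."""
    return uri.lower()

def drop_session_id(uri):
    """Stand-in for a case-sensitive search-and-replace that removes SessionID."""
    return re.sub(r"[?&]SessionID=[^&]*", "", uri)

def apply_in_order(uri, filters):
    """Run the filters one after another; each sees the previous filter's output."""
    for apply_filter in filters:
        uri = apply_filter(uri)
    return uri

uri = "/Cart?SessionID=42"

print(apply_in_order(uri, [drop_session_id, lowercase]))
# /cart

# Lowercasing first means the case-sensitive pattern no longer matches.
print(apply_in_order(uri, [lowercase, drop_session_id]))
# /cart?sessionid=42
```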

 

Lowercase Request URI Filter:

The first and easiest filter to apply is a lowercase Request URI filter to remove casing fragmentation.

The page dimension might look something like this:


Because Google Analytics treats page paths as case sensitive, we recommend lowercasing the Request URI using the filter below:

 

 

After applying the filter, the Page dimension should look like this:

 

 

Casing fragmentation of page paths makes for messy reports, which can lead to inaccurate analysis of the data.
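The reduction in unique values can be demonstrated with a few lines of Python. The page paths are invented sample data, and `lowercase_request_uri` is a hypothetical stand-in for the filter.

```python
def lowercase_request_uri(uri):
    """Stand-in for the lowercase Request URI filter."""
    return uri.lower()

# Hypothetical page paths that differ only by casing.
pages = ["/Products/Shoes", "/products/shoes", "/PRODUCTS/SHOES", "/about"]

unique_before = len(set(pages))
unique_after = len({lowercase_request_uri(page) for page in pages})

print(unique_before)  # 4
print(unique_after)   # 2
```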

 

Remove All Query Parameter Filter:

If excluding URL query parameters and stripping site search query parameters in the view settings did not solve your high cardinality problem, try removing all query parameters with this filter.


To avoid losing any data, first create a custom dimension that contains the full URL. Create a Full URL custom dimension in Google Analytics, then create a Full URL custom dimension filter like the one below:

 

 

Afterward, create this filter to remove all of the query parameters.
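The two-step idea (copy the full URL first, then strip the query string) can be sketched in Python as follows. The field names `full_url_cd` and `request_uri`, the helper `process_hit`, and the pattern `\?.*$` are assumptions for illustration; they are not the exact filter configuration used in the article.

```python
import re

def process_hit(hostname, request_uri):
    """Preserve the full URL, then remove everything from the first '?' onward."""
    hit = {
        # Step 1: copy the untouched URL into a (hypothetical) custom dimension.
        "full_url_cd": hostname + request_uri,
    }
    # Step 2: strip the whole query string from the Request URI.
    hit["request_uri"] = re.sub(r"\?.*$", "", request_uri)
    return hit

print(process_hit("www.example.com", "/search?q=shoes&page=2"))
# {'full_url_cd': 'www.example.com/search?q=shoes&page=2', 'request_uri': '/search'}
```

Because the copy happens before the strip, the original query string is still available for analysis in the custom dimension.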

 

 

Customized Regex Filter:

What if you do not want to delete the whole query string, but just part of it? What if one or more sections of the URL are causing high cardinality? What if fields other than Page are experiencing high cardinality? You can always write a custom regex filter to solve these issues.

For example, let’s say Event Action was experiencing high cardinality.

 

 

The data after Browser was neither useful nor informative to the user. Instead of having Classifieds|Browser|xxxxx, the user wanted to consolidate it all to Classifieds|Browser.

To do this, an event action filter was applied.

 

 

The filter tells the end user that the field after Browser has been moved to another custom dimension. Now, the user will see this when looking at the Event Action report.

 

 

In this scenario, the field that was replaced with ‘{Moved to CD}’ was not providing valuable information to the GA report user. Including this value fragmented the rows and led to messy data, which can cause inaccurate analysis.
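A regex of this kind might look like the Python sketch below. The pattern and the helper `consolidate_event_action` are assumptions reconstructed from the description above, not the exact filter from the original configuration.

```python
import re

# Capture the stable prefix and the high-cardinality tail separately.
PATTERN = re.compile(r"^(Classifieds\|Browser)\|(.+)$")

def consolidate_event_action(action):
    """Return (new_action, moved_value); non-matching actions pass through unchanged."""
    match = PATTERN.match(action)
    if match is None:
        return action, None
    # The tail is replaced by a marker and returned so it can be stored elsewhere,
    # for example in a custom dimension.
    return f"{match.group(1)}|{{Moved to CD}}", match.group(2)

print(consolidate_event_action("Classifieds|Browser|abc123"))
# ('Classifieds|Browser|{Moved to CD}', 'abc123')
print(consolidate_event_action("Video|Play"))
# ('Video|Play', None)
```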

Here is another example of rows of data that can be consolidated.

 

 

After applying a filter, the page data now looks like this.

 

 

In this scenario, before the filter was applied, the user had to filter the report to include only testauthor in order to view the total pageviews for that author. After the filter was applied, no report filter is needed, because the pageviews and other metrics associated with testauthor are consolidated into one row.

Filter applied:

 

 

Let us know in the comments other ways you have tried to reduce cardinality!
