Noisy Reports Reduce Insight and Discovery
You’ll often discover page URLs that make your GA reports unreadable. Some URLs contain IDs, SKUs, or query parameters that cause page fragmentation, which spreads the metrics for what is really one page across many URLs. We would prefer to record similar URLs as a single row in the data.
Analytics Pros solves this problem using GA Advanced Filters. These filters allow us to use regular expressions that match fragmented URLs and clean them to improve readability in your reports. We know data needs to be clean to be trustworthy, and reducing cardinality is a great step toward clean data!
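To make the cardinality point concrete, here is a minimal Python sketch, run outside of GA, that sums pageviews before and after similar URLs are collapsed. The paths, the counts, and the {id} placeholder are hypothetical values invented for illustration, and the pattern assumes an id is a run of three or more digits.

```python
import re
from collections import Counter

# Hypothetical (page path, pageviews) rows, invented for illustration only.
ROWS = [
    ("/product/1001/details", 12),
    ("/product/1002/details", 7),
    ("/product/1003/details", 5),
    ("/about", 9),
]

def aggregate(rows, normalize=lambda path: path):
    """Sum pageviews per page path after applying `normalize` to each path."""
    totals = Counter()
    for path, views in rows:
        totals[normalize(path)] += views
    return dict(totals)

def collapse_ids(path):
    """Assumption: an id is any run of 3 or more digits; '{id}' is a made-up placeholder."""
    return re.sub(r"\d{3,}", "{id}", path)

fragmented = aggregate(ROWS)                  # one row per distinct product URL
consolidated = aggregate(ROWS, collapse_ids)  # product URLs share a single row

print(len(fragmented), "rows before;", len(consolidated), "rows after")
```

The total pageview count is unchanged; only the number of rows it is spread across goes down.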
Common Scenarios that lead to Noisy Page URLs
URLs show up in Google Analytics in a few ways. The most prominent is the page dimension. Other popular dimensions that include the URL are landing page, hostname, and referrer path.
The term dimension refers to a column of data in Google Analytics that represents a label, like a date or a string of text. Conversely, a metric is a column of data that can be measured or otherwise enumerated.
Example from StackOverflow
A StackOverflow user asked about a scenario where they needed to reduce the complexity of the page dimension by removing references to the dynamic product id. You can see the question and answer here: https://stackoverflow.com/q/46678036/7498378
For your convenience, the following is a summary of our approach to the problem.
GA Advanced Filter
If [product id] were 3 or more consecutive digits, e.g. 123456789, then we would navigate to the Admin >> View >> Filters screen and select + Add Filter.
- Select Advanced Filter
- Field A: Request URI:
- Field B: (empty)
- Output To: Request URI:
- Check Field A Required and Override Output Field
The above configuration will rewrite the Request URI from:
One question to consider is the importance of the placeholder. Should it be removed entirely, or should there be evidence of the edit? My answer to that question is that we leave the placeholder in the URI. I believe it important to represent the edited state of the URI, and demonstrate that more information exists if the user desires, but this particular data point has been aggregated.
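The same rewrite can be sketched offline in Python to test a pattern before committing it to a filter. This is an illustration only and not the GA filter syntax: the example paths and the {id} placeholder are hypothetical, and the pattern assumes the product id is a whole path segment of three or more digits.

```python
import re

# Assumption: the product id is a full path segment of 3 or more digits,
# followed by another "/", a query string, or the end of the URI.
PRODUCT_ID_SEGMENT = re.compile(r"/\d{3,}(?=/|\?|$)")

# Hypothetical placeholder that leaves evidence of the edit in the URI.
PLACEHOLDER = "/{id}"

def clean_request_uri(uri: str) -> str:
    """Replace every product id segment in `uri` with the placeholder."""
    return PRODUCT_ID_SEGMENT.sub(PLACEHOLDER, uri)

if __name__ == "__main__":
    # Hypothetical request URIs for illustration.
    for uri in ("/product/123456789/details",
                "/product/987654?ref=home",
                "/product/12/details"):
        print(uri, "->", clean_request_uri(uri))
```

Segments with fewer than three digits, such as /12/ in the last example, are left untouched, which matches the "3 or more consecutive digits" rule described earlier.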
In this case, the product id was a very simple pattern: a series of three or more digits. If the id were more complicated, such as alphanumeric, then we would want to ensure we used an appropriate regex pattern to match it. We will review more advanced pattern matching techniques in a future blog post. In the meantime, check out our RegEx Cheat Sheet and keep in touch @AnalyticsPros! Talk to you soon!