Study shows the widespread usage of user tracking code

May 19, 2016 15:45 GMT  ·  By

Princeton Assistant Professor Arvind Narayanan and graduate student Steven Englehardt have conducted a massive study of the techniques websites use to track users.

The results of the Princeton Web Census research, which the researchers describe as the largest to date, show that Google, through multiple domains, tracks users on around 80 percent of the Top 1 Million domains.

Google is a serial tracker

Researchers reveal that Google-owned domains, from which browsers load tracking code, account for the five most popular trackers and 12 of the top 20 tracker domains.

In fact, across the Top 1 Million sites, the researchers found tracking code being loaded from over 81,000 different domains. Taking a closer look at the data, they found that only 123 of these third-party trackers appear on more than 1 percent of all sites.

"This suggests that the number of third parties that a regular user will encounter on a daily basis is relatively small," Princeton Web Census researchers explain. "The effect is accentuated when we consider that different third parties may be owned by the same entity. In fact, Google, Facebook, and Twitter are the only third-party entities present on more than 10% of sites."
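
The distinction the researchers draw between third-party domains and the entities that own them can be illustrated with a small calculation. The sketch below is a hypothetical illustration, not the study's code: the crawl records, the `OWNER` map, and the helper names `prevalence` and `above` are all made up for the example. It computes the share of sites on which each domain appears, then repeats the calculation after grouping domains by owner, which is how several mid-sized domains can add up to one dominant entity.

```python
from collections import defaultdict

# Hypothetical crawl output: (first-party site, third-party domain loaded there).
CRAWL = [
    ("a.example", "google-analytics.com"),
    ("a.example", "facebook.net"),
    ("b.example", "doubleclick.net"),
    ("c.example", "gstatic.com"),
    ("c.example", "smalltracker.example"),
    ("d.example", "google-analytics.com"),
    ("e.example", "facebook.net"),
]

# Hypothetical domain-to-owner map (illustrative only, not the study's data).
OWNER = {
    "google-analytics.com": "Google",
    "doubleclick.net": "Google",
    "gstatic.com": "Google",
    "facebook.net": "Facebook",
}

def prevalence(records, total_sites, key=lambda domain: domain):
    """Share of crawled sites on which each key (domain or owner) appears."""
    seen_on = defaultdict(set)
    for site, domain in records:
        seen_on[key(domain)].add(site)
    # Divide by every crawled site, including sites with no third parties.
    return {k: len(sites) / total_sites for k, sites in seen_on.items()}

def above(prev, threshold):
    """Keys present on more than `threshold` of sites, sorted by name."""
    return sorted(k for k, share in prev.items() if share > threshold)

domain_prev = prevalence(CRAWL, total_sites=5)
owner_prev = prevalence(CRAWL, total_sites=5, key=lambda d: OWNER.get(d, d))
print(above(domain_prev, 0.3))  # ['facebook.net', 'google-analytics.com']
print(above(owner_prev, 0.3))   # ['Facebook', 'Google']
```

In this toy data no single Google domain appears on more than 40 percent of sites, yet Google as an entity appears on 80 percent once its domains are grouped together.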

All of this means that when you visit a website or click on a link, there is a high chance that one of the three companies mentioned above already knows about it. This is especially true for Google, which loads some sort of tracking code on four out of five websites.

Most encountered tracking domains

Third-party trackers move to implement audio-based fingerprinting

To collect their results, researchers said they used a custom piece of software called OpenWPM, which they also open-sourced. OpenWPM loads websites in Chrome, Firefox and Internet Explorer, collecting data on the tracking technology loaded on each page.
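
A basic step in any crawl of this kind is deciding which of the resources a page loads come from third parties. The sketch below is not OpenWPM's code; it assumes you already have a page URL and the list of resource URLs it requested, and it uses a deliberately naive rule (keep the last two hostname labels) in place of the Public Suffix List that real measurement tools rely on. The function names and URLs are hypothetical.

```python
from urllib.parse import urlsplit

def registrable_domain(host):
    # Naive approximation: keep the last two labels. Real tools use the
    # Public Suffix List, which correctly handles names like "example.co.uk".
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

def third_party_domains(page_url, resource_urls):
    """Return the sorted set of registrable domains that differ from the page's own."""
    first_party = registrable_domain(urlsplit(page_url).hostname or "")
    found = set()
    for url in resource_urls:
        host = urlsplit(url).hostname
        if host is None:
            continue  # skip data: URLs and other host-less resources
        domain = registrable_domain(host)
        if domain != first_party:
            found.add(domain)
    return sorted(found)

# Hypothetical page and resource list.
PAGE = "https://www.news.example/article"
RESOURCES = [
    "https://static.news.example/app.js",
    "https://www.google-analytics.com/analytics.js",
    "https://connect.facebook.net/en_US/sdk.js",
    "https://stats.g.doubleclick.net/dc.js",
]
print(third_party_domains(PAGE, RESOURCES))
```

Here `static.news.example` counts as first party because it shares `news.example` with the page, while the other three resources are reported as third-party domains.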

OpenWPM looks for third-party JavaScript files, Flash objects, cookies, fonts, and fingerprinting techniques such as those that employ HTML5 canvas, the AudioContext API, and WebRTC local IP discovery.
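
WebRTC local IP discovery works by reading the network "candidates" a browser gathers while setting up a peer connection: a page script can create an `RTCPeerConnection`, listen for its `icecandidate` events, and read the addresses embedded in each candidate string. The sketch below covers only the parsing step, on made-up candidate strings; the helper names are hypothetical. It assumes the standard ICE candidate field order, where `host` candidates carry local interface addresses and `srflx` candidates carry the public address a STUN server observed. Newer browsers may mask `host` addresses behind `.local` names, which this sketch does not handle specially.

```python
# Hypothetical ICE candidate strings, in the shape a page script might collect
# from RTCPeerConnection "icecandidate" events (addresses are made up).
CANDIDATES = [
    "candidate:1 1 udp 2122260223 192.168.1.23 54321 typ host generation 0",
    "candidate:2 1 udp 1686052607 203.0.113.7 54321 typ srflx raddr 192.168.1.23 rport 54321 generation 0",
]

def parse_candidate(line):
    """Return (candidate type, connection address) from an ICE candidate line."""
    if line.startswith("candidate:"):
        line = line[len("candidate:"):]
    fields = line.split()
    # Field order: foundation, component, transport, priority, address, port, "typ", type, ...
    address = fields[4]
    cand_type = fields[fields.index("typ") + 1]
    return cand_type, address

def exposed_addresses(lines):
    """Group addresses by candidate type: 'host' = local interface, 'srflx' = public address seen by STUN."""
    found = {}
    for line in lines:
        cand_type, address = parse_candidate(line)
        found.setdefault(cand_type, set()).add(address)
    return {t: sorted(addrs) for t, addrs in found.items()}

print(exposed_addresses(CANDIDATES))
```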

Of these, the newest tracking technique the researchers discovered leverages the AudioContext API. Third-party trackers use it to have the browser generate and process an audio signal, then measure the result: small differences in how each machine handles the signal produce a unique fingerprint based on the user's hardware and software. If you want to check your own audio fingerprint, the researchers have set up a demo page.
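
The idea behind audio fingerprinting can be simulated outside a browser. Scripts of this kind chain Web Audio nodes such as an `OscillatorNode` and a `DynamicsCompressorNode`, render the signal, and reduce the output to a value that can be compared across visits. The Python sketch below is only an analogy under stated assumptions: the triangle wave, the frequency, the toy compressor, and the summing-then-hashing step are illustrative choices, not values taken from the study, and the "platform difference" is imitated by rounding samples to 32-bit floats rather than by a real audio stack.

```python
import hashlib
import math
import struct

def to_float32(x):
    # Round-trip through a 32-bit float to mimic lower-precision audio code.
    return struct.unpack("f", struct.pack("f", x))[0]

def render(n_samples=5000, freq=10000.0, rate=44100.0,
           threshold=0.5, ratio=12.0, float32=False):
    """Generate a triangle wave and pass it through a toy static compressor."""
    quantize = to_float32 if float32 else (lambda v: v)
    out = []
    for i in range(n_samples):
        phase = (i * freq / rate) % 1.0
        tri = 4.0 * abs(phase - 0.5) - 1.0       # triangle wave in [-1, 1]
        mag = abs(tri)
        if mag > threshold:                      # compress peaks above threshold
            mag = threshold + (mag - threshold) / ratio
        out.append(quantize(math.copysign(mag, tri)))
    return out

def fingerprint(samples):
    """Reduce the rendered buffer to one number, then hash it into a short ID."""
    total = sum(abs(s) for s in samples)
    return hashlib.sha256(repr(total).encode()).hexdigest()[:16]

print(fingerprint(render()))              # "platform" A: 64-bit processing
print(fingerprint(render(float32=True)))  # "platform" B: 32-bit processing
```

The same input rendered twice on one "platform" yields the same ID, while a tiny numeric difference in processing yields a different one, which is what makes the output usable for tracking.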

User tracking is everywhere

The results of this study aren't a surprise. A similar study by researchers from MIT and Oxford, published this week, revealed that the location tags on just a few tweets can expose many details about the account's owner, such as their real-world address, hobbies, and medical history. This data is, of course, collected by Twitter and used to deliver more targeted advertising.

But tracking is not limited to the web. Another recently released study, this one by researchers from Stanford, shows that phone call metadata can also be used to infer personal details about a phone's owner. Even if mobile carriers are not listening in on calls, they may already know as much about you as Google does, if not more.

All of these studies show that it is becoming increasingly hard to keep a low profile and maintain a private life. Some users may block HTML5 canvas fingerprinting but still use their phones every day. Those who avoid phones may still have their real IP address exposed through WebRTC, even from behind a VPN. Whatever users do to avoid third-party trackers, somebody is always watching from the shadows.

The Princeton study's results are available as a free-to-download PDF and as raw SQL data files. The research is titled "Online tracking: A 1-million-site measurement and analysis."

A sample audio fingerprint created with the AudioContext API
