What Type of Data Do Websites Collect About You?

When the web first became mainstream in the mid-90s, one of its key characteristics was anonymity. Hardly anyone used their real name, and you could live a second life online at a blazing 33 kbps.

The web of today is very different. Not only is there a strong push to deanonymize people, but the websites you visit on a daily basis can also record all sorts of information about you. What kinds of information? Read on to find out.

    Your IP Address

    This is the most common type of information that a website will log. Your IP or Internet Protocol address is a number that denotes where on the internet you are located.

    It’s basically the same thing as a real-world address. If someone wants to send you a letter, they’ll write your address on it. When you receive it, their return address will be on the back, so you know where it came from.

    If you replace “letter” with “internet packet” you basically know how an IP address works. The problem is that a website can actually figure out quite a lot of private information about you from just your IP address.

    They’ll know more or less where you are browsing from and which ISP you’re using. With a little more detective work (and perhaps a legal warrant) an IP address can lead someone directly to your door.

    This is why so many people are using VPNs (virtual private networks) these days. The VPN acts as a middleman, so only its IP address is visible to the site you are visiting.
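
    To make the logging step concrete, here is a minimal sketch in Python of how a site might record a visitor's address. It assumes a WSGI-style server that passes the connecting address in REMOTE_ADDR. The names log_visit and app are made up for illustration and do not come from any particular site.

```python
from datetime import datetime, timezone


def log_visit(environ):
    """Build one log line from a WSGI-style request environment."""
    # REMOTE_ADDR is the address the connection came from, as seen by the server.
    ip = environ.get("REMOTE_ADDR", "unknown")
    path = environ.get("PATH_INFO", "/")
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return f"{stamp} {ip} {path}"


def app(environ, start_response):
    """Hypothetical WSGI app that logs every visit before answering."""
    print(log_visit(environ))
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]
```

    If you connect through a VPN, the address that ends up in a log like this is the VPN server's rather than your own.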

    Hardware & Software Details

    Web browsers report all sorts of information to a website that asks for it. This includes a wealth of information about the computer you are using.

    The site can learn your operating system, processor, GPU and more. This may seem innocent, but it could be used to track or identify a specific machine.

    One way to get around this is to browse from within a virtual machine, which reports the virtual machine's more generic system details to the website instead of your real hardware.
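
    As a rough illustration of how little effort this takes, the sketch below guesses the operating system from the User-Agent header that browsers send with each request. It covers only that one header (details such as the GPU are gathered in other ways), and the function name guess_os and its short list of keywords are assumptions made for this example.

```python
def guess_os(user_agent: str) -> str:
    """Guess the visitor's operating system from a User-Agent string."""
    ua = user_agent.lower()
    # Order matters: Android agents also contain "linux", and iPhone
    # agents also contain "like Mac OS X".
    checks = [
        ("windows", "Windows"),
        ("android", "Android"),
        ("iphone", "iOS"),
        ("ipad", "iOS"),
        ("mac os x", "macOS"),
        ("linux", "Linux"),
    ]
    for needle, name in checks:
        if needle in ua:
            return name
    return "unknown"


example = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
print(guess_os(example))
```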

    1st & 3rd Party Cookies

    A cookie is a small file that a site leaves on your computer to keep a record of things such as your site preferences, so the next time you visit, it will already know things about you.

    Cookie technology is not a bad thing in itself. Session cookies, for example, delete themselves when you close the browser. You also get first-party persistent cookies, which are the ones saved to your device by the site for its own use.

    A tracking cookie is a persistent, third-party cookie. It is set by a domain other than the site you are visiting, usually through embedded content such as ads, and it is sent back to that domain from every site that embeds the same content. That lets the cookie’s creator accumulate information about your web activities across many sites.
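
    The difference between a session cookie and a persistent one comes down to whether an expiry is attached. Here is a small sketch using Python's standard http.cookies module. The cookie names and values are invented for the example, and a real site would send these strings in Set-Cookie response headers.

```python
from http.cookies import SimpleCookie


def build_cookies():
    """Return Set-Cookie values for one session and one persistent cookie."""
    jar = SimpleCookie()

    # No expiry attribute: the browser drops this when it is closed.
    jar["session_id"] = "abc123"

    # Max-Age makes the cookie persistent (here, 30 days in seconds).
    jar["theme"] = "dark"
    jar["theme"]["max-age"] = 60 * 60 * 24 * 30
    jar["theme"]["path"] = "/"

    return [morsel.OutputString() for morsel in jar.values()]


for value in build_cookies():
    print("Set-Cookie:", value)
```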

    Legislation about how and when cookies can be used has been tightening in recent years, which is why almost every site has its cookie policy pop up the minute you visit it for the first time. If you decline that policy, the site is not supposed to store non-essential cookies on your machine.

    However, there is nothing stopping a rogue site from peppering your machine with tracking cookies without your knowledge. Luckily you can use your browser’s privacy settings to block and delete cookies as desired.

    Invisible Trackers

    Cookies are perhaps one example of an invisible tracker, but as a larger category, invisible trackers also include web apps and external sites embedded in a legitimate site.

    Major news sites and other popular web pages often have advertising content embedded at the bottom of an article which includes some form of tracking. Google does this as well. This is why, when you search for a specific product on Google, you’ll see ads for it pop up on other sites that feature Google AdSense.

    Luckily there are privacy-focused search engines such as DuckDuckGo which explicitly don’t track you.

    Modern browsers now also support a feature known as “do not track”, which tells a site that it should turn off its tracking technology when you visit. However, this is a voluntary agreement so the site can ignore it if it wants to.
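
    Under the hood, the preference travels as a request header, DNT: 1. The sketch below shows how a cooperative site might check for it before loading its trackers. The function name wants_no_tracking is hypothetical, and nothing forces a real site to run a check like this.

```python
def wants_no_tracking(headers: dict) -> bool:
    """Return True if the request carries the Do Not Track preference."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"


request_headers = {"User-Agent": "ExampleBrowser/1.0", "DNT": "1"}
if wants_no_tracking(request_headers):
    print("Visitor asked not to be tracked; skipping analytics.")
```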

    One of the most effective tools in the fight against invisible trackers is the EFF’s Privacy Badger.

    Autofill Data

    You’ve probably noticed that when you have to fill in shipping details on a site you’ve never visited before, your browser automatically fills in details like your name and address. It’s a convenient feature, but it is also a privacy nightmare.

    Unscrupulous sites can be coded to capture that information the second it’s autofilled. This means the site has now captured your full details without your knowledge. As you can imagine, information such as an address, full name or social security number can be used to wreak havoc in the wrong hands.

    It’s best to just disable autofill in your browser settings.

    Other Accounts You’re Logged In To

    When you visit a site, it can detect what other accounts you are currently logged into by the traces they leave on your machine. This is actually very valuable information, because combined with a known email address it tells hackers which other accounts you have.

    So if one of those accounts has been part of a data breach and your password is uncovered, you may be in trouble. Many people use the same or similar passwords across accounts, so this makes it much easier for hackers to breach your security.

    The best thing to do here is use strong, unique passwords for every account. A good password manager that generates those random passwords is highly recommended.
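
    For a sense of what a password manager does when it generates a password, here is a minimal sketch using Python's secrets module, which is designed for security-sensitive randomness. The function name generate_password and the default length of 20 are choices made for this example, not a recommendation from any specific tool.

```python
import secrets
import string


def generate_password(length: int = 20) -> str:
    """Generate a random password from letters, digits and punctuation."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    # secrets.choice draws from the operating system's secure random source.
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(generate_password())
```

    Each call produces a different password, which is the point: a breach of one account then reveals nothing about the others.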

    Detailed Input Logs

    It’s possible for websites to be coded in such a way that every keystroke and every mouse movement you make is recorded in detail. The tracking abilities of websites in this regard are pretty extensive.

    A research paper detailing “session replay scripts” demonstrated that many popular websites record your keystrokes and mouse movements while you are visiting and then use these recordings for further analysis. You can probably imagine the sorts of privacy issues this could cause.

    Browser Fingerprints

    A browser “fingerprint” is simply the unique combination of browser data, such as which cookies are on your system and what plugins are installed. The longer a browser is used and the more it is customized, the easier it is to link to a specific user.

    For example, even if you use a VPN to access a site, the site can still record your fingerprint. So if you visit another site using that same browser without the protection of anonymity, a clear link between those activities can be made.
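
    The linking works because the same set of browser details always boils down to the same identifier. The sketch below hashes a handful of attributes into one value. The attribute names and the choice of SHA-256 are assumptions for illustration, and real fingerprinting scripts collect far more signals than this.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Reduce a set of browser attributes to a single stable identifier."""
    # Sorting the keys means the same attributes always hash the same way.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


browser = {
    "user_agent": "ExampleBrowser/1.0",
    "plugins": ["pdf-viewer", "video-codec"],
    "screen": "1920x1080",
    "timezone": "UTC+2",
}
print(fingerprint(browser))
```

    Note that the IP address is not among the inputs, so switching a VPN on or off leaves the identifier unchanged.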

    Using a privacy-oriented browser such as the Tor Browser is a good way to prevent this sort of de-anonymization.

    How To Check What You Are Leaking

    Several websites exist that will help you figure out where and how you are leaking information. Panopticlick (since relaunched as Cover Your Tracks) is a great tool by the Electronic Frontier Foundation which does just that.

    Just click the big “test me” button and all your paranoid fears may be confirmed. Luckily there’s never a bad time to sharpen up your privacy practices.