To understand the operation of web browser cookies, we must first understand how web browsers work. If you read this page carefully just once, you will understand how insidious tracking on the web has become...
How Web Browsing Works . . .
The key concept is that individual page assets exist separately on remote servers, and each must be requested separately by the web browser.
As depicted in the diagram below, a single web page can request and receive web page assets from more than one server. Since individual web page assets are identified by URL addresses, just like web pages, they can originate from anywhere on the Internet.
Put another way, a displayed web page can contain, and usually does contain, URL references to page assets that exist on web servers other than the site we're visiting. To retrieve all of the page's content, our web browser will then, without notifying or asking us in any way, send queries to every referenced “third-party” web server to request the required page assets.
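To make this concrete, here is a minimal Python sketch that scans a page's HTML for asset references and lists the hosts, other than the page's own, that a browser would contact. The page URL, the HTML, and the `.example` host names are all invented for illustration. Comparing whole host names is a simplification; real browsers apply more elaborate rules when deciding what counts as “third-party.”

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AssetCollector(HTMLParser):
    """Collects the URLs found in src attributes (images, scripts, frames)."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.asset_urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "src" and value:
                # Resolve relative references against the page's own URL.
                self.asset_urls.append(urljoin(self.page_url, value))


def third_party_hosts(page_url, html):
    """Return the set of asset hosts that differ from the page's own host."""
    collector = AssetCollector(page_url)
    collector.feed(html)
    first_party = urlparse(page_url).hostname
    return {
        urlparse(url).hostname
        for url in collector.asset_urls
        if urlparse(url).hostname != first_party
    }


# Hypothetical page: one local image plus two assets from other servers.
SAMPLE_PAGE_URL = "https://www.site-a.example/index.htm"
SAMPLE_HTML = """
<html><body>
  <img src="/images/logo.png">
  <img src="https://ads.server-x.example/banner.gif">
  <script src="https://stats.server-y.example/count.js"></script>
</body></html>
"""

print(sorted(third_party_hosts(SAMPLE_PAGE_URL, SAMPLE_HTML)))
# prints ['ads.server-x.example', 'stats.server-y.example']
```

Each host in that list would receive its own query from the browser, complete with the headers examined in the rest of this page.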
As we will now see, what's disturbing about this is the amount of potentially private information that's automatically sent to unknown & untrusted “third-party” organizations located anywhere on the Internet . . .
The following table shows the complete contents of a single query for a single web page asset.
Please don't worry about understanding the details of the material below. (There will not be a test on this.) We just wanted to give you one complete sample of a query so that you could see how much “stuff” a web browser supplies during every query:
A Web Browser's Query to a Web Server
Header Name | Header Value |
GET | /cookies/browser4.png HTTP/1.1 |
Host: | www.grc.com |
User-Agent: | Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16 |
Accept: | image/png,*/*;q=0.5 |
Accept-Language: | en-us,en;q=0.5 |
Accept-Encoding: | gzip,deflate |
Accept-Charset: | ISO-8859-1,utf-8;q=0.7,*;q=0.7 |
Keep-Alive: | 300 |
Connection: | keep-alive |
Referer: | https://www.grc.com/cookies/operation.htm |
Cookie: | temp=ii1cggumuicc5; perm=f5boqn3rlw3pu |
If-Modified-Since: | Fri, 25 Jul 2008 21:13:39 GMT |
If-None-Match: | "2875f4f9beec81:9cf" |
Cache-Control: | max-age=0 |
Web browser queries are composed of a series of lines of information, called headers, with each header line containing a “name” and a “value” (known as name-value pairs). Not all queries contain all of the same items. Some may contain additional name-value pairs, some fewer. The example above is typical and you can probably infer much of the intent from the names and values themselves.
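For readers who like to see such things in code, the following Python sketch splits a raw query into its first line and its name-value pairs. The function name `parse_query` and the shortened sample query are our own illustration, not part of any browser or server. Strictly speaking, the first line of a query is the “request line” rather than a header, so the sketch handles it separately. It also keeps only the last value if a header name repeats, which is a simplification.

```python
# A shortened copy of the sample query above, as it travels over the wire.
RAW_QUERY = (
    "GET /cookies/browser4.png HTTP/1.1\r\n"
    "Host: www.grc.com\r\n"
    "Referer: https://www.grc.com/cookies/operation.htm\r\n"
    "Cookie: temp=ii1cggumuicc5; perm=f5boqn3rlw3pu\r\n"
    "\r\n"
)


def parse_query(raw):
    """Split a raw HTTP query into its request line and a dict of headers."""
    head = raw.split("\r\n\r\n", 1)[0]  # ignore any body after the blank line
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as URLs contain colons.
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "target": target, "version": version, "headers": headers}


query = parse_query(RAW_QUERY)
print(query["target"])              # prints /cookies/browser4.png
print(query["headers"]["referer"])  # prints https://www.grc.com/cookies/operation.htm
```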
Once your eyes have uncrossed from encountering all of the gory details of one complete query, we'll focus upon just the few details that are important for our exploration:
Header Name | Header Value |
GET | /cookies/browser4.png HTTP/1.1 |
Host: | www.grc.com |
Referer: | https://www.grc.com/cookies/operation.htm |
Cookie: | temp=ii1cggumuicc5; perm=f5boqn3rlw3pu |
GET | The “GET” header line provides the “target address” of the web asset being requested, and specifies to the web server the highest HTTP protocol level understood by the web browser. In the example shown above, the browser is requesting a graphic image named “browser4.png” located in the “/cookies” directory of the web server, and the browser is declaring that it understands HTTP protocol v1.1. |
Host: | The “Host” header specifies the “machine name” to which the query is directed. This is used for at least two purposes: First, multiple web sites can be located (hosted) at a single IP address. So when a connection is made to a shared IP address, the Host query header must be used to “disambiguate” the query so that it applies to the correct web site among those sharing that address. The second use for the Host header is for domain tagging of any cookies that may also be included with the query. Web browsers store cookies “per domain” and return previously received cookies in subsequent queries when the domain name under which they were received and stored matches the domain specified by the query. |
Referer: | The “Referer” header provides the full URL of the referring object. If a query is being made for a web page's image asset, the Referer header will provide the URL of the web page making the request. If a query is a link being clicked by the user, the Referer header provides the URL of the page containing the link. (The "Referer" header was originally misspelled and has remained misspelled ever since.) |
Cookie: | The “Cookie” header returns to the web server the cookies that the browser previously received and stored for the domain specified in the Host header. (The server sets those cookies in the first place with the companion “Set-Cookie” header in one of its replies.) A server can set multiple cookies, and each can have an expiration date, be marked secure so that it will only be sent over encrypted SSL/TLS connections, and have various other properties. The web server can also delete a previously set cookie by setting it again with an expiration date in the past. |
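The per-domain storage described in the Host and Cookie rows can be sketched in a few lines of Python. Cookies arrive in a reply's Set-Cookie header and go back to the same host in the Cookie header of later queries. The class `TinyCookieJar` is a toy of our own invention that matches on the exact host name only. A real browser also honors paths, expiration dates, the secure flag, and parent-domain matching. The cookie values are the ones from the sample query, and `ads.server-x.example` is a made-up host.

```python
from http.cookies import SimpleCookie


class TinyCookieJar:
    """A toy per-host cookie store; real browsers apply far richer rules."""

    def __init__(self):
        self._by_host = {}  # host name -> {cookie name: cookie value}

    def store(self, host, set_cookie_value):
        """Record a cookie received in a reply's Set-Cookie header."""
        parsed = SimpleCookie()
        parsed.load(set_cookie_value)
        jar = self._by_host.setdefault(host, {})
        for name, morsel in parsed.items():
            jar[name] = morsel.value

    def cookie_header(self, host):
        """Build the Cookie header value for a query to `host`, or None."""
        jar = self._by_host.get(host)
        if not jar:
            return None
        return "; ".join(f"{name}={value}" for name, value in jar.items())


jar = TinyCookieJar()
jar.store("www.grc.com", "temp=ii1cggumuicc5; Path=/")
jar.store("www.grc.com", "perm=f5boqn3rlw3pu; Path=/")

print(jar.cookie_header("www.grc.com"))           # prints temp=ii1cggumuicc5; perm=f5boqn3rlw3pu
print(jar.cookie_header("ads.server-x.example"))  # prints None
```

The second lookup returns nothing because no cookie was ever stored under that host name. Cookies go back only to the domain that set them, which is exactly why a tracker needs its own asset on every page it wants to observe.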
With an understanding of the concepts provided above, the “trick” of third-party web browser cookie tracking can be readily understood:
FIRST, a user visits Web Site A containing an advertisement from (third-party) Web Site X ...
From then on, as this user visits any other web sites also containing an advertisement being served by the same (third-party advertising) Web Server X ...
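The two stages outlined above can be modeled in a short Python sketch. This is a toy under stated assumptions, not how any particular advertising server is built: the class `TrackingServer`, the `id1`-style cookie values, and the `.example` site names are all hypothetical. The sketch assumes only what the headers discussion established: the browser returns Server X's cookie with every query to Server X, and the Referer header names the page being viewed.

```python
import itertools


class TrackingServer:
    """Toy model of third-party Web Server X; not a real ad server."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.visits = {}  # cookie ID -> list of referring page URLs

    def serve_ad(self, referer, cookie=None):
        """Handle one ad request; return the cookie the browser should keep."""
        if cookie is None:
            # First contact: mint a unique ID and hand it back as a cookie.
            cookie = f"id{next(self._ids)}"
        # Every request reveals which page the user is viewing.
        self.visits.setdefault(cookie, []).append(referer)
        return cookie


server_x = TrackingServer()
browser_cookie = None  # the browser starts with no cookie for Server X

pages = [
    "https://www.site-a.example/news.htm",
    "https://www.site-b.example/shop.htm",
    "https://www.site-c.example/health.htm",
]
for page in pages:
    # Each page embeds an ad from Server X, so the browser queries Server X,
    # sending the page's URL as the Referer and any cookie it already holds.
    browser_cookie = server_x.serve_ad(referer=page, cookie=browser_cookie)

print(browser_cookie)                        # prints id1
print(len(server_x.visits[browser_cookie]))  # prints 3
```

After three visits to three unrelated sites, Server X holds a single browsing history keyed by one cookie value.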
The Final Bit of (Very) Disturbing Truth
Up to this point all of the information collected about the user has been anonymous. Using the common third-party cookie, the third-party web server is able to track this anonymous user wherever they go when visiting other sites also hosting advertisements or other third-party assets sourced from the tracking server.
However, a huge economic motivation exists to break this anonymity.
Advertisers would love to know the postal address and zip code of these tracked users since from that they can learn a great deal about the socioeconomic status of the less and less anonymous users being tracked by their database. So here's the really bad news: Thanks to the tracking linkage provided by third-party cookies, it takes only one colluding web site, knowing the names, addresses and other non-anonymous information about its users, to break the anonymity for every site its users visit. Some web sites that establish advertising relationships with their vendors obtain a “kick back” for sharing the real world names, addresses, eMail, and other personal information with their “partners”. And if you read the fine print of the agreement you clicked when you provided that information, you'll see that you inadvertently agreed to this back-channel information sharing.
Your account number often appears somewhere embedded in the URL address of the logged-in pages you visit on a web site. If so, that URL containing your account number in the “Referer” header is sent along with the third-party cookie to the third-party server. Now the third-party server knows your private account number at the referring web site. That colluding web site then only needs to send its third-party advertising “partner” a record containing that account number and all the information they have about you. The third-party server merges all of that information into its ever-growing database.
Or, if a logged-in user's account number doesn't appear in the site's URL addresses, the site can simply add it to the end of the URL requesting the third-party advertisement. This essentially says: “The user with this account is requesting a page with this advertisement.” Since the user's browser will include the third-party cookie with the request, the advertising partner again obtains all the information required to link the account number to the previously anonymous third-party cookie. That cookie is anonymous no longer.
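The linkage just described amounts to a simple join on the account number. The Python sketch below illustrates it under invented assumptions: the query parameter name `acct`, the record layout, the customer data, and the function names are all hypothetical, and no real site or advertiser is being quoted.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical customer records held by the colluding web site.
SITE_RECORDS = {"884211": {"name": "Pat Example", "zip": "92653"}}


def account_from_ad_url(ad_url):
    """Pull the account number the site appended to the ad request URL."""
    values = parse_qs(urlparse(ad_url).query).get("acct")
    return values[0] if values else None


def deanonymize(tracker_profiles, cookie_id, ad_url, site_records):
    """Attach the site's customer record to a previously anonymous profile."""
    account = account_from_ad_url(ad_url)
    profile = tracker_profiles.setdefault(cookie_id, {"pages": []})
    if account in site_records:
        # The cookie ID and the account number arrived in the same request,
        # so the tracker can now tie the cookie to a real-world identity.
        profile["identity"] = dict(site_records[account])
    return profile


# The tracker's existing, anonymous history for one cookie.
profiles = {"id1": {"pages": ["https://www.site-b.example/shop.htm"]}}

ad_url = "https://ads.server-x.example/banner.gif?acct=884211"
profile = deanonymize(profiles, "id1", ad_url, SITE_RECORDS)
print(profile["identity"]["name"])  # prints Pat Example
```

Note that the browsing history collected before the link was made is now attached to the identity as well.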
Although the technology of web browsing was never explicitly designed to facilitate this sort of privacy leakage and identity disclosure, neither was it designed with this problem in mind. The original designers of the system were thinking about how functional and cool the technology was . . . not how that cool functionality could be abused for commercial privacy-invading purposes.
Unfortunately, so much pressure exists to track and profile users across the Internet that third-party cookies are no longer the sole means of accomplishing such tracking. Nevertheless, because they are supported by all browsers and enabled by default in all browsers other than Apple's Safari, third-party cookies remain the most popular and powerful means of tracking users. The other non-cookie tricks are generally used to “reconstitute” third-party cookies that the browser's user has deliberately deleted in an attempt to prevent Internet tracking and surveillance.
Therefore, the easiest and most immediate thing you can do is to configure your web browser to neither accept nor return cookies offered to it by third-party web servers. The other pages in this region of this web site (see the link block below) will assist you in configuring your browser appropriately.
Gibson Research Corporation is owned and operated by Steve Gibson. The contents of this page are Copyright (c) 2024 Gibson Research Corporation. SpinRite, ShieldsUP, NanoProbe, and any other indicated trademarks are registered trademarks of Gibson Research Corporation, Laguna Hills, CA, USA. GRC's web and customer privacy policy. |
Last Edit: May 04, 2013 at 17:12