Web Browser Cookies
Operation, Use & Abuse




To understand the operation of web browser cookies, we must first understand how web browsers work. If you read this page carefully just once, you will understand how insidious tracking on the web has become...

How Web Browsing Works . . .

Introducing an important new term:   “Web Page Asset”
Although web browsers display whole scrollable pages, each of the many different pieces used to compose a single page (the text, pictures, photos, diagrams, animations, advertisements and so on) actually exists on the Internet as an individual web page “asset.”

While people often use the term “asset” to describe financial property, “asset” is also the term used to refer to individual pieces or components of web pages.

Individual web page assets are anything web pages display or use, such as the page's textual content, page layout and formatting information, executable scripts, images, animations, videos, advertisements, and so on.
With that definition in mind, a web browser's “operation model” is simple and straightforward; it is just a series of individual queries and replies:
1. Query: The browser requests an “asset” from a remote server.
2. Reply: If the server has the requested asset, it is returned and provided to the browser.
The key concept is that individual page assets exist separately on remote servers, and each must be requested separately by the web browser.
How is an entire web page displayed?
Since each individual page asset must be separately requested from remote web servers, pages are literally built-up and assembled by requesting, receiving, and accumulating many separate assets onto a single page.

1. The main "body" of the page is first retrieved from a remote web server:
[Diagram: browser1]
2. After receiving the page's text, the web browser searches for all references to additional assets contained on the page and sends out a second wave of requests for each of the page's additional assets:
[Diagram: browser2]
3. The remote web server (or servers) locate and return all of the requested page assets to the requesting browser, thereby completing the page and allowing it to be displayed in finished form:
[Diagram: browser3]
During your web wanderings, you may have noticed some pages where the images appear after some or all of the page's text has already been displayed. Now you know why this happens: The web browser displayed what it had (the main body of the page) while it was waiting for its requests for the rest of the page's pieces to be fulfilled. Then, as it received the additional pieces of page content they were popped onto the page until the page was complete.
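The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary FAKE_WEB stands in for the Internet, the host names site-a.example and ads-x.example are invented, and only img and script references are followed. A real browser does far more.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the Internet: each URL maps to one asset.
FAKE_WEB = {
    "http://site-a.example/index.htm": (
        "<html><body><p>Welcome</p>"
        '<img src="http://site-a.example/logo.png">'
        '<img src="http://ads-x.example/banner.gif">'
        "</body></html>"
    ),
    "http://site-a.example/logo.png": "<logo image data>",
    "http://ads-x.example/banner.gif": "<advertisement image data>",
}


class AssetFinder(HTMLParser):
    """Collects the URLs of assets referenced by img and script tags."""

    def __init__(self):
        super().__init__()
        self.asset_urls = []

    def handle_starttag(self, tag, attrs):
        if tag in ("img", "script"):
            src = dict(attrs).get("src")
            if src:
                self.asset_urls.append(src)


def fetch(url):
    """One query, one reply: return the asset, or None if it is missing."""
    return FAKE_WEB.get(url)


def load_page(page_url):
    body = fetch(page_url)                # step 1: retrieve the main body
    finder = AssetFinder()
    finder.feed(body)                     # step 2: find the asset references
    assets = {url: fetch(url) for url in finder.asset_urls}  # step 3: gather
    return body, assets


body, assets = load_page("http://site-a.example/index.htm")
for url in assets:
    print("fetched", url)
```

Note that the second wave of queries goes out without the user being asked, and that one of the two assets comes from a different server than the page itself.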
As depicted in the diagram below, a single web page can request and receive web page assets from more than one server. Since individual web page assets are identified by URL addresses, just like web pages, they can originate from anywhere on the Internet.
[Diagram: browser4]
4. As shown in the diagram above, the “first-party server” is addressed by the page's main URL. It provides the web page's main page & body, and usually provides most of the page's other assets. “Third-party servers” are any other servers, located at different web addresses, which pitch in to provide some of the page's content.
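To make the first-party versus third-party distinction concrete, here is a minimal Python sketch that labels an asset by comparing its host name with the host name of the page. The function name party and the ads-x.example address are invented for illustration. Comparing exact host names is a simplifying assumption; real browsers compare domains in a more forgiving way.

```python
from urllib.parse import urlsplit


def party(page_url, asset_url):
    """Label an asset by comparing its host with the page's host.

    Simplifying assumption: an exact host name match means first-party.
    """
    page_host = urlsplit(page_url).hostname
    asset_host = urlsplit(asset_url).hostname
    return "first-party" if asset_host == page_host else "third-party"


page = "https://www.grc.com/cookies/operation.htm"
print(party(page, "https://www.grc.com/cookies/browser4.png"))
print(party(page, "https://ads-x.example/banner.gif"))
```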
How web-based advertising works:
Web advertisements are often provided by “third-party ad servers” which inject their advertisements into hosting web pages in the fashion shown above.
What we know so far:
  • Web pages are assembled from many separate bits & pieces.
  • The main body of the web page reserves space for the page's additional assets such as images, animations, videos, advertisements, etc.
  • After receiving the main page body, the web browser emits individual queries to gather the page's remaining assets.
  • Individual page assets are addressed and identified, like the page itself, by an Internet URL, so a page asset can be requested from anywhere on the Internet.
  • The first-party server is the primary source of the page. It's the web site the user is visiting and appears in the browser's URL address bar. It provides the page's main body and, usually, most of the page's other assets.
  • Third-party servers are located at other Internet addresses, but web browsers will request page assets from them just as they do from the page's first-party server.
  • The powerful flexibility provided by third-party assets allows third-party advertisements to be seamlessly “plugged-in” to web pages.

Another way to express what we summarized above would be to say that a displayed web page can contain, and usually does contain, URL references to page assets existing on web servers other than the site we're visiting. To retrieve all of the page's content, our web browser will then, without notifying or asking us in any way, send queries to all referenced “third-party” web servers requesting the required page assets.

As we will now see, what's disturbing about this is the amount of potentially private information that's automatically sent to unknown & untrusted “third-party” organizations located anywhere on the Internet . . .

The anatomy of a browser query:

The following table shows the complete contents of a single query for a single web page asset.

Please don't worry about understanding the details of the material below. (There will not be a test on this.) We just wanted to give you one complete sample of a query so that you could see how much “stuff” a web browser supplies during every query:

A Web Browser's Query to a Web Server

    Header Name          Header Value
    GET                  /cookies/browser4.png HTTP/1.1
    Host:                www.grc.com
    User-Agent:          Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16
    Accept:              image/png,*/*;q=0.5
    Accept-Language:     en-us,en;q=0.5
    Accept-Encoding:     gzip,deflate
    Accept-Charset:      ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive:          300
    Connection:          keep-alive
    Referer:             https://www.grc.com/cookies/operation.htm
    Cookie:              temp=ii1cggumuicc5; perm=f5boqn3rlw3pu
    If-Modified-Since:   Fri, 25 Jul 2008 21:13:39 GMT
    If-None-Match:       "2875f4f9beec81:9cf"
    Cache-Control:       max-age=0

Web browser queries are composed of a series of lines of information, called headers, with each header line containing a “name” and a “value” (known as name-value pairs). Not all queries contain all of the same items. Some may contain additional name-value pairs, some fewer. The example above is typical and you can probably infer much of the intent from the names and values themselves.
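For readers who like to see such things in code, the name-value structure of a query can be pulled apart in a few lines of Python. This is a rough sketch, not a complete HTTP parser: the function name parse_query is invented, and the sample text reuses four lines of the query shown above.

```python
RAW_QUERY = (
    "GET /cookies/browser4.png HTTP/1.1\r\n"
    "Host: www.grc.com\r\n"
    "Referer: https://www.grc.com/cookies/operation.htm\r\n"
    "Cookie: temp=ii1cggumuicc5; perm=f5boqn3rlw3pu\r\n"
    "\r\n"
)


def parse_query(raw):
    """Split a raw query into its first line and a dict of name-value pairs."""
    lines = raw.split("\r\n")
    method, target, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # a blank line marks the end of the headers
        # Split at the first colon only; values such as URLs contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, target, version, headers


method, target, version, headers = parse_query(RAW_QUERY)
print(method, target, version)
for name, value in headers.items():
    print(f"{name}: {value}")
```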

Once your eyes have uncrossed from encountering all of the gory details of one complete query, we'll focus upon just the few details that are important for our exploration:

    Header Name          Header Value
    GET                  /cookies/browser4.png HTTP/1.1
    Host:                www.grc.com
    Referer:             https://www.grc.com/cookies/operation.htm
    Cookie:              temp=ii1cggumuicc5; perm=f5boqn3rlw3pu
GET The “GET” header line provides the “target address” of the web asset being requested, and specifies to the web server the highest HTTP protocol level understood by the web browser. In the example shown above, the browser is requesting a graphic image named “browser4.png” located in the “/cookies” directory of the web server, and the browser is declaring that it understands HTTP protocol v1.1.
Host: The “Host” header specifies the “machine name” to which the query is directed. This is used for at least two purposes: First, multiple web sites can be located (hosted) at a single IP. So when a connection is made to a shared IP, the Host query header must be used to “disambiguate” the query so that it applies to the correct web site sharing a single IP address. The second use for the Host header is for domain tagging of any cookies that may also be included with the query. Web browsers store cookies “per domain” and return previously received cookies in subsequent queries when the domain name under which they were received and stored matches the domain specified by the query.
Referer: The “Referer” header provides the full URL of the referring object. If a query is being made for a web page's image asset, the Referer header will provide the URL of the web page making the request. If a query is a link being clicked by the user, the Referer header provides the URL of the page containing the link. (The "Referer" header was originally misspelled and has remained misspelled ever since.)
Cookie: The “Cookie” header returns to the web server any cookies the browser previously received for the domain specified in the Host header. The web server sets those cookies in its replies, using the “Set-Cookie” reply header. Multiple cookies can be set, they can have expiration dates, be secured so that they will only be sent over encrypted SSL connections, and have various other properties. The web server can also delete previously set cookies by setting them again with an expiration date in the past, typically with their value set to nothing.
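The “per domain” storage and return of cookies can be sketched as follows. This is a minimal illustration, not how any particular browser is implemented: the class name DomainCookieStore is invented, expiration dates and the other cookie properties are ignored, and exact host name matching is a simplifying assumption. The cookie values are the ones from the sample query above.

```python
from http.cookies import SimpleCookie


class DomainCookieStore:
    """Keeps received cookies filed under the host that set them."""

    def __init__(self):
        self._by_host = {}

    def store(self, host, set_cookie_value):
        """File the cookie(s) from one Set-Cookie reply header under host."""
        parsed = SimpleCookie()
        parsed.load(set_cookie_value)
        cookies = self._by_host.setdefault(host, {})
        for name, morsel in parsed.items():
            cookies[name] = morsel.value

    def cookie_header(self, host):
        """Build the Cookie header value for a query directed at host."""
        cookies = self._by_host.get(host, {})
        return "; ".join(f"{name}={value}" for name, value in cookies.items())


store = DomainCookieStore()
store.store("www.grc.com", "temp=ii1cggumuicc5")
store.store("www.grc.com", "perm=f5boqn3rlw3pu; Path=/")

print(store.cookie_header("www.grc.com"))         # cookies for this host only
print(repr(store.cookie_header("site-b.example")))  # nothing stored for this host
```

The important point is the lookup key: cookies go back only to the domain that set them, no matter which page triggered the query.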
How Third-Party Cookies can be used for Tracking:

With an understanding of the concepts provided above, the “trick” of third-party web browser cookie tracking can be readily understood:

FIRST, a user visits Web Site A containing an advertisement from (third-party) Web Site X ...

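[Diagram: tpc-transactions-1]
As depicted above, the user's browser retrieves the page from Web Site A, finds the reference to the advertisement, and queries Web Server X for it. The Referer header of that query names the page at Web Site A, and Web Server X can return a cookie along with the advertisement, which the browser stores under X's domain.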

From then on, as this user visits any other web sites also containing an advertisement being served by the same (third-party advertising) Web Server X ...

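[Diagram: tpc-transactions-2]
As depicted above, every query for an advertisement from Web Server X carries the cookie X set earlier, along with a Referer header naming the page being visited. Web Server X can therefore recognize the same browser at every site hosting its advertisements.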
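The tracking “trick” can be simulated in a few lines of Python. This is a toy model under stated assumptions: the class name TrackingServer, the cookie format “id1”, and the site addresses are all invented, and the browser is reduced to a single variable holding the cookie it received from Web Server X.

```python
import itertools


class TrackingServer:
    """Toy model of third-party Web Server X."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.history = {}  # cookie value -> pages on which the ad was requested

    def handle_ad_query(self, referer, cookie=None):
        """Serve an ad; issue a cookie if the browser did not send one."""
        if cookie is None:
            cookie = f"id{next(self._next_id)}"  # first sighting of this browser
        self.history.setdefault(cookie, []).append(referer)
        return cookie  # sent back so the browser can store and return it


server_x = TrackingServer()
browser_cookie_for_x = None  # the browser starts with no cookie from X

for page in ("http://site-a.example/news.htm",
             "http://site-b.example/shop.htm",
             "http://site-c.example/forum.htm"):
    # Each page embeds an ad from X, so the browser queries X with a Referer.
    browser_cookie_for_x = server_x.handle_ad_query(page, browser_cookie_for_x)

print(server_x.history)
```

After three visits to three unrelated sites, Web Server X holds a single record listing all three pages under one cookie value.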
How Anonymity is Lost:

The Final Bit of (Very) Disturbing Truth

Up to this point all of the information collected about the user has been anonymous. Using the common third-party cookie, the third-party web server is able to track this anonymous user wherever they go when visiting other sites also hosting advertisements or other third-party assets sourced from the tracking server.

However, a huge economic motivation exists to break this anonymity.

Advertisers would love to know the postal address and zip code of these tracked users, since from that they can learn a great deal about the socioeconomic status of the less and less anonymous users being tracked by their database. So here's the really bad news: Thanks to the tracking linkage provided by third-party cookies, it takes only one colluding web site, knowing the names, addresses and other non-anonymous information about its users, to break the anonymity for every site its users visit. Some web sites that establish advertising relationships with their vendors obtain a “kickback” for sharing the real world names, addresses, email addresses, and other personal information of their users with their “partners”. And if you read the fine print of the agreement you clicked when you provided that information, you'll see that you inadvertently agreed to this back-channel information sharing.

Your account number often appears somewhere embedded in the URL address of the logged-in pages you visit on a web site. If so, that URL containing your account number in the “Referer” header is sent along with the third-party cookie to the third-party server. Now the third-party server knows your private account number at the referring web site. That colluding web site then only needs to send its third-party advertising “partner” a record containing that account number and all the information they have about you. The third-party server merges all of that information into its ever-growing database.

Or, if a logged-in user's account number doesn't appear in the site's URL addresses, the site can simply add it to the end of the URL requesting the third-party advertisement. This essentially says: “The user with this account is requesting a page with this advertisement.” Since the user's browser will include the third-party cookie with the request, the advertising partner again obtains all the information required to link the account number to the previously anonymous third-party cookie. That cookie is anonymous no longer.
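The linkage described in the last two paragraphs amounts to a simple lookup, sketched below in Python. Everything specific here is hypothetical: the URL parameter name “account”, the function names, the sample account number, and the shape of the record supplied by the colluding site.

```python
from urllib.parse import urlsplit, parse_qs


def account_from_url(url, param="account"):
    """Return the account number embedded in a URL, or None if absent."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None


def link_identity(cookie_id, referer, partner_records, profiles):
    """Attach a partner site's customer record to a tracking cookie."""
    account = account_from_url(referer)
    if account in partner_records:
        profiles[cookie_id] = partner_records[account]
    return profiles.get(cookie_id)


# Record shared by the colluding site (hypothetical data).
partner_records = {"12345": {"name": "Pat Example", "zip": "00000"}}
profiles = {}

who = link_identity(
    "id1",                                         # the third-party cookie
    "https://shop.example/orders?account=12345",   # Referer of the ad query
    partner_records,
    profiles,
)
print(who)
```

The same function serves the second case as well: if the site appends the account number to the advertisement's own URL, that URL is passed in place of the Referer.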

Although the technology of web browsing was never explicitly designed to facilitate this sort of privacy leakage and identity disclosure, neither was it designed with this problem in mind. The original designers of the system were thinking about how functional and cool the technology was  . . . not how that cool functionality could be abused for commercial privacy-invading purposes.

What Can Be Done?

Unfortunately, so much pressure exists to track and profile users across the Internet that the use of third-party cookies is no longer the sole means for accomplishing such tracking. Nevertheless, because they are supported by all browsers and enabled by default in all browsers other than Apple's Safari browser, third-party cookies remain the most popular and powerful means of tracking users. The other non-cookie tricks are generally used to “reconstitute” third-party cookies that the browser's user has deliberately deleted in an attempt to prevent Internet tracking and surveillance.

Therefore, the easiest and most immediate thing you can do is to simply configure your web browser to not accept or return cookies offered to it by third-party web servers. The other pages in this region of this web site (see the link block below) will assist you in configuring your browser appropriately.
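In effect, that browser setting adds one test before any cookie is accepted or returned. A minimal sketch of that test follows, assuming (as a simplification) that “third-party” means the queried host differs from the host shown in the address bar. The function name cookies_allowed and the ads-x.example address are invented, and actual browsers apply more elaborate rules.

```python
from urllib.parse import urlsplit


def cookies_allowed(address_bar_url, query_url, block_third_party=True):
    """Decide whether cookies may be accepted from or returned to a server."""
    first_party_host = urlsplit(address_bar_url).hostname
    query_host = urlsplit(query_url).hostname
    is_third_party = query_host != first_party_host
    return not (block_third_party and is_third_party)


page = "https://www.grc.com/cookies/operation.htm"
print(cookies_allowed(page, "https://www.grc.com/cookies/browser4.png"))
print(cookies_allowed(page, "https://ads-x.example/banner.gif"))
```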


Internet Privacy & Tracking Pages:

Gibson Research Corporation is owned and operated by Steve Gibson.  The contents
of this page are Copyright (c) 2016 Gibson Research Corporation. SpinRite, ShieldsUP,
NanoProbe, and any other indicated trademarks are registered trademarks of Gibson
Research Corporation, Laguna Hills, CA, USA. GRC's web and customer privacy policy.

Last Edit: May 04, 2013 at 18:12