Web Browser Cookies
Operation, Use & Abuse




How Web Browsing Works . . .
Introducing an important new term: “Web Page Asset”
Although web browsers display whole scrollable pages, each of the many different pieces used to compose a single page — the text, pictures, photos, diagrams, animations, advertisements and so on — actually exist on the Internet as individual web page “assets.”

While people often use the term “asset” to describe financial property, “asset” is also the term used to refer to individual pieces or components of web pages.

Individual web page assets are anything web pages display or use, such as the page's textual content, page layout and formatting information, executable scripts, images, animations, videos, advertisements, and so on.
With that definition in mind, a web browser's “operation model” is simple and straightforward; it is just a series of individual queries and replies:
1. Query: The browser requests an “asset” from a remote server.
2. Reply: If the server has the requested asset, it returns the asset to the browser.
The key concept is that individual page assets exist separately on remote servers, and each must be requested separately by the web browser.
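To make the query/reply model concrete, here is a minimal Python sketch. It is a toy simulation, not real networking: the `SERVERS` table and the `query` helper are hypothetical stand-ins for remote web servers, and a missing asset simply yields `None` where a real server would send back an error reply.

```python
from urllib.parse import urlsplit

# Hypothetical in-memory stand-in for remote servers: each host name maps
# asset paths to their contents. Real servers are reached over HTTP.
SERVERS = {
    "www.grc.com": {
        "/cookies/operation.htm": "<html>...</html>",
        "/cookies/browser4.png": b"\x89PNG...",
    },
}

def query(url):
    """One query/reply exchange: ask the server named in the URL for one asset.

    Returns the asset if that server has it, otherwise None (a real server
    would reply with an error status such as 404 Not Found).
    """
    parts = urlsplit(url)
    server = SERVERS.get(parts.hostname, {})
    return server.get(parts.path)

# Usage: each asset is requested with its own, separate query.
image = query("http://www.grc.com/cookies/browser4.png")
missing = query("http://www.grc.com/missing.png")   # None
```

Note that the URL alone tells the browser which server to ask, which is why a page's assets can be spread across many servers.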
How is an entire web page displayed?
Since each individual page asset must be separately requested from remote web servers, pages are literally built up and assembled by requesting, receiving, and accumulating many separate assets onto a single page.

1. The main “body” of the page is first retrieved from a remote web server.
2. After receiving the page's text, the web browser searches for all references to additional assets contained on the page and sends out a second wave of requests for each of the page's additional assets.
3. The remote web server or servers locate and return all of the requested page assets to the requesting browser, completing the page and allowing it to be displayed in finished form.
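Step 2, finding the page's references to additional assets, can be sketched with Python's standard `html.parser` module. The `AssetFinder` class and `find_assets` helper are hypothetical names, the tag/attribute pairs checked are only an illustrative subset of the ways a page can reference assets, and the `ads.example.net` address in the usage example is made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class AssetFinder(HTMLParser):
    """Collects the URLs of additional assets referenced by a page body."""

    # Tag/attribute pairs that point at separately fetched assets
    # (an illustrative subset, not an exhaustive list).
    ASSET_ATTRS = {("img", "src"), ("script", "src"), ("iframe", "src")}

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in self.ASSET_ATTRS and value:
                # Resolve relative references against the page's own URL.
                self.assets.append(urljoin(self.page_url, value))

def find_assets(page_url, body):
    """Return the asset URLs the browser would request in its second wave."""
    finder = AssetFinder(page_url)
    finder.feed(body)
    finder.close()
    return finder.assets

body = ('<p>Text</p><img src="browser4.png">'
        '<script src="http://ads.example.net/ad.js"></script>')
assets = find_assets("http://www.grc.com/cookies/operation.htm", body)
# assets == ['http://www.grc.com/cookies/browser4.png',
#            'http://ads.example.net/ad.js']
```

A relative reference such as `browser4.png` is resolved against the page's own address, while an absolute URL can name any server at all.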
During your web wanderings, you may have noticed some pages where the images appear after some or all of the page's text has already been displayed. Now you know why this happens: The web browser displayed what it had (the main body of the page) while it was waiting for its requests for the rest of the page's pieces to be fulfilled. Then, as it received the additional pieces of page content, they were popped onto the page until the page was complete.
As depicted in the diagram below, a single web page can request and receive web page assets from more than one server. Since individual web page assets are identified by URL addresses, just like web pages, they can originate from anywhere on the Internet.
[Diagram: a single web page assembled from assets supplied by a first-party server and by third-party servers]
4. As shown in the diagram above, the “first-party server” is the one addressed by the page's main URL. It provides the page's main body and usually provides most of the page's other assets. “Third-party servers” are any other servers, located at different web addresses, that pitch in to provide some of the page's content.
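The first-party/third-party distinction can be sketched as a comparison of host names. This is a simplified illustration under a stated assumption: `classify_assets` is a hypothetical helper that treats an asset as first-party only when its host exactly matches the host in the page's main URL, and `ads.example.net` is a made-up third-party address.

```python
from urllib.parse import urlsplit

def classify_assets(page_url, asset_urls):
    """Split asset URLs into (first_party, third_party) lists.

    Simplifying assumption: an asset counts as first-party only when its
    host name exactly matches the host in the page's main URL.
    """
    page_host = urlsplit(page_url).hostname
    first_party, third_party = [], []
    for url in asset_urls:
        if urlsplit(url).hostname == page_host:
            first_party.append(url)
        else:
            third_party.append(url)
    return first_party, third_party

first, third = classify_assets(
    "http://www.grc.com/cookies/operation.htm",
    ["http://www.grc.com/cookies/browser4.png", "http://ads.example.net/ad.js"],
)
# first == ['http://www.grc.com/cookies/browser4.png']
# third == ['http://ads.example.net/ad.js']
```

Exact host matching is the simplest possible rule; browsers and privacy tools commonly compare the registrable domain instead, so that, for example, `images.grc.com` would count as part of the same site as `www.grc.com`.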
How web-based advertising works:
Web advertisements are provided by “third-party ad servers” which inject their advertisements into hosting web pages in exactly this fashion.

What we know so far:
  • Web pages are assembled from many separate bits & pieces.
  • The main body of the web page reserves space for the page's additional assets such as images, animations, videos, advertisements, etc.
  • After receiving the main page body, the web browser emits individual queries to gather the page's remaining assets.
  • Individual page assets are addressed and identified, like the page itself, by an Internet URL, so a page asset can be requested from anywhere on the Internet.
  • The first-party server is the primary source of the page. It provides the page's main body and, usually, most of the page's other assets.
  • Third-party servers are located at other Internet addresses, but web browsers will request page assets from them just as they do from the page's first-party server.
  • The powerful flexibility provided by third-party assets allows third-party advertisements to be seamlessly “plugged-in” to web pages.
Another way to express what we summarized above would be to say that when a web page is displayed, it can (and usually does) contain URL references to page assets existing on web servers other than the site we're visiting. To retrieve all of the page's content, our web browser will then, without notifying or asking us in any way, send queries to all mentioned “third-party” web servers requesting the required page assets.

As we will now see, what's disturbing about this is the amount of potentially private information that's automatically sent to unknown & untrusted “third-party” organizations located anywhere on the Internet . . .
The anatomy of a browser query:
The following table shows the complete contents of a single query for a single web page asset.

Please don't worry about understanding the details of the material below. (There will not be a test on this.) We just wanted to give you one complete sample of a query so that you could see how much “stuff” a web browser supplies during every query:
GET /cookies/browser4.png HTTP/1.1
Host: www.grc.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16
Accept: image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.grc.com/cookies/operation.htm
Cookie: temp=ii1cggumuicc5; perm=f5boqn3rlw3pu
If-Modified-Since: Fri, 25 Jul 2008 21:13:39 GMT
If-None-Match: "2875f4f9beec81:9cf"
Cache-Control: max-age=0
Web browser queries are composed of a series of lines of information. The first line gives the request method (GET), the path of the requested asset, and the protocol version; each line after it contains a “name” and a “value”. Not all queries contain the same items: some may contain additional “name:value pairs”, some fewer. The example above is typical, and you can probably infer much of the intent from the names and values themselves.
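The structure just described, one request line followed by name:value pairs, is simple enough to take apart in a few lines of Python. `parse_query` is a hypothetical, deliberately simplified helper (real HTTP parsers treat header names case-insensitively and handle repeated names), applied here to a few lines of the sample query above.

```python
def parse_query(raw):
    """Split a raw browser query into its request line and name:value pairs.

    Simplified: header names stay case-sensitive and a repeated name
    overwrites the earlier value.
    """
    lines = raw.strip().splitlines()
    # First line: method, requested path, and protocol version.
    method, path, version = lines[0].split()
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as URLs and dates
        # contain colons of their own.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, path, version, headers

SAMPLE = """\
GET /cookies/browser4.png HTTP/1.1
Host: www.grc.com
Referer: http://www.grc.com/cookies/operation.htm
Cookie: temp=ii1cggumuicc5; perm=f5boqn3rlw3pu
If-Modified-Since: Fri, 25 Jul 2008 21:13:39 GMT
"""

method, path, version, headers = parse_query(SAMPLE)
# headers["Referer"] == 'http://www.grc.com/cookies/operation.htm'
```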

Once your eyes have uncrossed from encountering all of the gory details of one complete query, we will focus upon just the few details that are important for our exploration:
GET /cookies/browser4.png HTTP/1.1
Host: www.grc.com
Referer: http://www.grc.com/cookies/operation.htm
Cookie: temp=ii1cggumuicc5; perm=f5boqn3rlw3pu
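These four fields can be read from the receiving server's point of view. The sketch below is illustrative only: `what_the_server_learns` and its dictionary keys are hypothetical names, and it uses Python's standard `http.cookies.SimpleCookie` to split the Cookie line into individual name/value pairs. The `Referer` value names the page the user was reading when the asset was requested, and the cookie values are ones the browser sent back automatically.

```python
from http.cookies import SimpleCookie

def what_the_server_learns(headers):
    """Summarize the privacy-relevant fields of one query's headers."""
    cookie = SimpleCookie()
    cookie.load(headers.get("Cookie", ""))
    return {
        # The server (web site) the query was addressed to.
        "asked_for_host": headers.get("Host"),
        # The page the user was viewing when this asset was requested.
        "page_being_read": headers.get("Referer"),
        # Cookie values the browser returned without asking the user.
        "cookies": {name: morsel.value for name, morsel in cookie.items()},
    }

headers = {
    "Host": "www.grc.com",
    "Referer": "http://www.grc.com/cookies/operation.htm",
    "Cookie": "temp=ii1cggumuicc5; perm=f5boqn3rlw3pu",
}
info = what_the_server_learns(headers)
# info["cookies"] == {'temp': 'ii1cggumuicc5', 'perm': 'f5boqn3rlw3pu'}
```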



THIS PAGE IS UNDER ACTIVE CONSTRUCTION






The architects of the original world wide web designed the powerful and flexible system we still use today. Unfortunately, the trusting, academic world they designed the system for is different from the world we live in today.

While on the surface this seems like a reasonable thing to do, a web browser's page asset queries contain so much information — which is being provided to arbitrary third-party servers the user has not chosen to trust — that