This page defines the many terms used throughout these “cookie pages” to help clear up any confusion that might arise from terminology that may be new to you. We have worked to minimize confusion by using terms consistently throughout these pages, so with a bit of attention these ideas should become clear . . .
The world wide web (www) consists of web browsers acting as clients of web servers. Web servers reside at Internet addresses that can be found using their domain names, such as “microsoft.com”, “yahoo.com” and “amazon.com”. These web servers listen 24/7 for incoming connections and queries from users' web browsers located anywhere in the world. So . . . web browsers are clients being served by web servers.
The interaction between web browsers and web servers is surprisingly simple and abbreviated: The web browser asks (queries) for just one “thing” at a time, and the web server returns it (replies). Complete web pages, which are composed of many “things” (which we refer to as Web Page Assets throughout these pages), are assembled by having the web browser query and query and query as the web server replies and replies and replies. The web browser first queries the web server for the page's “HTML” contents. This contains the page's text, as well as “references” to all of the other “things” (assets for the page) that will be needed to complete the page. The web browser reads through the page's HTML and, as it reaches references to page assets it doesn't already have, it sends off successive web queries requesting each asset in turn. As the web server receives those queries, it replies with each asset one by one.
You have probably noticed evidence of this process where pieces of a page “fill in” after the text of the page has been displayed and laid out. Those pieces are arriving later and being patched into the page on the fly.
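To make the query-and-reply sequence a bit more concrete, here is a minimal Python sketch of just the “read through the HTML and find the references” step, using the standard library's `html.parser`. The page URL, its HTML, and the set of tags treated as asset references are all made-up assumptions for illustration; a real browser recognizes many more kinds of references and actually sends the resulting queries.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetFinder(HTMLParser):
    """Collects the URLs of assets referenced by an HTML page (a simplified sketch)."""

    # Tag -> attribute pairs treated as asset references. This is an assumption;
    # real browsers also follow CSS url() references, srcset, and much more.
    ASSET_ATTRS = {"img": "src", "script": "src", "iframe": "src", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        wanted = self.ASSET_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page's own URL.
                url = urljoin(self.base_url, value)
                if url not in self.assets:  # don't queue the same asset twice
                    self.assets.append(url)


PAGE_URL = "https://www.example.com/index.html"  # hypothetical page
PAGE_HTML = """
<html><head><link rel="stylesheet" href="/style.css"></head>
<body>
  <p>Welcome!</p>
  <img src="images/logo.png">
  <img src="https://ads.adnetwork.example/banner.gif">
  <img src="images/logo.png">
  <script src="https://cdn.example.net/menu.js"></script>
</body></html>
"""

finder = AssetFinder(PAGE_URL)
finder.feed(PAGE_HTML)
for asset_url in finder.assets:
    # A real browser would now send one query per asset.
    print("next query:", asset_url)
```

Notice that two of the four assets live on servers other than the one that served the page itself.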
As was explained above (see Browser Queries & Server Replies), we use the term “page asset” to refer to the components of a web page. These include the page's base text plus all of its additional components, such as images, advertisements, scripting, animations, menus, and other independent pieces. The notion of separate assets is important because not all pieces of modern web pages are served by the same (first-party) server that serves the page's base text.
“Metadata” is an important concept to grasp because that's what cookies are. In computer science, “metadata” refers to supplementary information about some object. For example, a computer's file system maintains metadata on files such as each file's size, its dates of creation and last modification, and perhaps who has permission to read and write to it. Those attributes are not part of the file itself, but they are “about” the file.
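The file-system example can be seen directly with Python's `os.stat`, which asks the operating system for a file's metadata without reading the file's contents. This is a small sketch; the temporary file and its text are arbitrary.

```python
import os
import tempfile
import time

# Write a small file; its *contents* are just these bytes.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"Hello, cookies!")
    path = f.name

info = os.stat(path)  # ask the file system for metadata *about* the file
print("size in bytes:  ", info.st_size)
print("last modified:  ", time.ctime(info.st_mtime))
print("permission bits:", oct(info.st_mode & 0o777))

with open(path, "rb") as f:
    contents = f.read()  # none of the metadata above appears in these bytes

os.remove(path)
```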
In the case of world wide web (www) assets, a web browser's query to a server will specify the format(s) it can accept for the server's reply, the language it would prefer, the web page that referred to the asset being requested, the browser's own “user-agent” designation, the name of the server being queried at the connected IP address, and whether the connection should be kept alive and re-used for subsequent queries and replies or closed once the server has replied. All of those details are the metadata accompanying the web browser's query.
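Here is a sketch of what such a query can look like on the wire, assembled by hand in Python. Each metadata item listed above travels as a named HTTP header: `Accept`, `Accept-Language`, `Referer`, `User-Agent`, `Host` and `Connection`. The host names, path and header values are hypothetical examples, not what any particular browser sends.

```python
def build_query(host, path, referer):
    """Return the text of an HTTP/1.1 GET query, including its metadata headers."""
    headers = {
        "Host": host,                           # name of the server being queried
        "User-Agent": "ExampleBrowser/1.0",     # browser's self-designation (made-up value)
        "Accept": "text/html,image/png;q=0.9",  # formats acceptable in the reply
        "Accept-Language": "en-US,en;q=0.8",    # preferred language
        "Referer": referer,                     # page that referred to this asset
        "Connection": "keep-alive",             # keep the connection open for re-use
    }
    lines = [f"GET {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"  # a blank line ends the metadata


query = build_query("ads.adnetwork.example", "/banner.gif",
                    "https://www.example.com/index.html")
print(query)
```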
Web server replies carry even more metadata, such as the status of the query's reply (OK, Not Found, etc.), the last-modified date of the asset (which lets the browser know whether any copy it might already have in its cache is still current), the overall length of the asset being returned, the asset's type and format, the asset's cacheable lifetime and whether the browser is allowed to (and should) retain a copy for future use, one or more cookies to be associated with the web domain being browsed, and whether the server is going to close the connection once the reply is completed.
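A reply's metadata arrives the same way: a status line, then headers, then a blank line before the asset itself. The sketch below uses the Python standard library's `http.client.parse_headers` to pull apart a hand-written example reply; the reply and its cookie names are invented for illustration. Note that each cookie arrives in its own `Set-Cookie` header.

```python
import io
from http.client import parse_headers

# A made-up example reply: status line, metadata headers, blank line, then the asset.
RAW_REPLY = (
    b"HTTP/1.1 200 OK\r\n"
    b"Last-Modified: Fri, 04 Feb 2011 17:57:00 GMT\r\n"
    b"Content-Length: 16\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Cache-Control: max-age=3600\r\n"
    b"Set-Cookie: session=abc123; Path=/\r\n"
    b"Set-Cookie: theme=dark; Path=/\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"<html>...</html>"
)


def parse_reply(raw):
    """Split a raw HTTP reply into (status code, reason, headers, body)."""
    fp = io.BytesIO(raw)
    status_line = fp.readline().decode("iso-8859-1").rstrip("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = parse_headers(fp)  # reads header lines up to the blank line
    body = fp.read()             # whatever follows is the asset itself
    return int(code), reason, headers, body


code, reason, headers, body = parse_reply(RAW_REPLY)
print(code, reason)                   # 200 OK
print(headers["Content-Type"])        # text/html; charset=utf-8
print(headers.get_all("Set-Cookie"))  # ['session=abc123; Path=/', 'theme=dark; Path=/']
```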
The term “first-party” refers to the remote web server a user is deliberately and directly accessing over the Internet's world wide web by putting its URL address into the browser's address bar or clicking on a link to go to that server's web site.
The first-party server may (and almost always will) include one or more web browser cookies with any replies it returns to the visiting web browser. Cookies returned by the first-party server are therefore referred to as first-party cookies.
Unlike third-party cookies (see below), first-party cookies must be received and returned during subsequent visits so that the web sites you visit can identify you. They enable you to remain logged on and to receive web services. First-party cookies are therefore both benign and required for the use of modern Internet web sites.
The term “third-party” refers to remote web servers which YOU are not visiting directly, but which are providing content for the first-party web sites you are visiting. In other words, when you visit a (first-party) web site, that site might return a web page containing advertisements that it doesn't itself provide, but which are provided by other third-party web servers.
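Deciding whether an asset is first-party or third-party comes down to comparing the asset's domain with the domain of the page being visited. The following Python sketch makes a deliberately naive assumption: that a “site” is simply the last two labels of the host name. Real browsers consult the Public Suffix List instead, since a rule like this one would wrongly treat unrelated sites under names such as “co.uk” as the same site. The helper names and URLs are hypothetical.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Naively take the last two labels of the host name as the 'site'."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def party(page_url, asset_url):
    """Classify an asset relative to the page the user deliberately visited."""
    return "first-party" if site_of(page_url) == site_of(asset_url) else "third-party"


page = "https://www.example.com/index.html"
for asset in ["https://static.example.com/logo.png",
              "https://ads.adnetwork.example/banner.gif",
              "https://cdn.example.net/menu.js"]:
    print(party(page, asset), asset)
```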
The controversial and distressing fact is that these third-party servers, which you have not visited and over which you have little control and can't really even detect, are equally able to return their own web browser cookies with their replies. Since these cookies are being returned by third-party servers, they are called third-party cookies. And unless your web browser has been deliberately configured to ignore and reject third-party cookies (only Apple's Safari web browser is initially configured this way) your browser will accept, retain, and return these cookies in the future.
As you wander around the Internet, other first-party web sites you visit will be supplying advertisements and other content from these same third-party servers. Since your web browser will have previously received a cookie from the third-party sites, unless it has been configured to block third-party cookies, it will return the original cookie, thus allowing the third-party servers to track your movements around the Internet.
Third-party web browser cookies therefore function like an identifier beacon, uniquely identifying you and your web browser from among the millions of other Internet users. Over time, the places you visit, the searches you enter, and the links you click on are compiled into an increasingly comprehensive profile.
A “cookie” is HTTP metadata, as described above, which is passed back and forth between web browsers and web servers. Any time a web browser queries a web server for any asset, it checks to see whether it already has one or more “cookies” associated with the query's web domain. If so, it adds those cookies as metadata to its query. Similarly, any time a web server wants to “set a cookie” (give a web browser a cookie), it simply adds one or more cookies to the metadata of its reply. Since cookie acceptance and return is the default behavior for all of today's web browsers (other than Apple's Safari, which wonderfully blocks third-party cookies by default), web browsers that have not been configured to block cookies will return any and all cookies they have previously received from any server with the same domain name.
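This give-and-take can be modeled with a tiny cookie jar. The Python sketch below uses the standard library's `http.cookies.SimpleCookie` to parse `Set-Cookie` values and keys cookies by exact host name, which is a simplifying assumption: real browsers also apply each cookie's Domain, Path, expiration and security attributes. The `TinyCookieJar` class and the host names are hypothetical.

```python
from http.cookies import SimpleCookie


class TinyCookieJar:
    """A deliberately simplified cookie jar: cookies are keyed by exact host name."""

    def __init__(self):
        self.by_host = {}  # host -> {cookie name: value}

    def receive_reply(self, host, set_cookie_headers):
        """Store every cookie a server 'set' in the metadata of its reply."""
        stored = self.by_host.setdefault(host, {})
        for header in set_cookie_headers:
            parsed = SimpleCookie()
            parsed.load(header)
            for name, morsel in parsed.items():
                stored[name] = morsel.value

    def cookie_header_for(self, host):
        """Return the Cookie header a query to this host would carry, or None."""
        cookies = self.by_host.get(host)
        if not cookies:
            return None
        return "; ".join(f"{name}={value}" for name, value in cookies.items())


jar = TinyCookieJar()
print(jar.cookie_header_for("www.example.com"))  # None (nothing received yet)
jar.receive_reply("www.example.com", ["session=abc123; Path=/", "theme=dark"])
print(jar.cookie_header_for("www.example.com"))  # session=abc123; theme=dark
```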
The “Referer” (historically misspelled and now unfixable) is web browser query metadata sent with nearly every browser query. Any time a web browser requests a web page or any of the assets (images, charts, advertisements, etc.) required to complete a web page, the browser indicates on whose behalf it is asking by including the URL of the referring page. “Referer” is the name of the query header that conveys this referring information.
The ubiquitous presence of the “Referer” information is of concern because this is the way third-party (tracking) web servers know where Internet users are travelling as they move around the Net. Each web page they visit which requests an advertisement or other common asset from a third-party server transmits the URL of that web page in the Referer header as part of the advertisement (or other) asset query.
Thus, by storing the URLs provided by Referer headers alongside the tracking cookie that identifies each browser, a third-party server can accumulate a long-term log of a user's Internet use.
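Putting the tracking cookie and the Referer header together, the third-party server's side of this can be sketched in a few lines of Python. The tracker class, the ID format and the page URLs are all hypothetical; the point is only that one cookie, returned from every site that embeds the tracker's content, lets the Referer values pile up under a single identity.

```python
import itertools
from collections import defaultdict


class HypotheticalTracker:
    """A third-party server that hands out ID cookies and logs Referer headers."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.log = defaultdict(list)  # tracking ID -> list of referring pages

    def handle_query(self, cookie_id, referer):
        """Serve an ad; return the ID cookie the browser should keep."""
        if cookie_id is None:  # first sighting: set a brand-new ID cookie
            cookie_id = f"uid{next(self._next_id)}"
        self.log[cookie_id].append(referer)  # record where this browser is
        return cookie_id


tracker = HypotheticalTracker()
browser_cookie = None  # the browser starts with no cookie for the tracker
for page in ["https://news.example.com/politics",
             "https://shop.example.org/cart",
             "https://health.example.net/symptoms"]:
    # Each first-party page embeds the same ad, so the browser queries the
    # tracker, returning any cookie it holds and naming the page in Referer.
    browser_cookie = tracker.handle_query(browser_cookie, referer=page)

print(dict(tracker.log))
```

Three unrelated sites, one identity: all three visits end up in the same profile.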
In the context of web browser cookies, “Context” refers to whether a cookie is being sent to, or received from, the same domain as the browser's page (thus, a first-party cookie) or some other domain (thus, a third-party cookie). In other words, a cookie's “context” is the “partyness”, first or third, of the cookie.
Although this might seem like a simple issue that's hardly worthy of its own term, things can quickly become tricky when we're dealing with world wide web interactions and clever designers who very much want to track us. For example, when some web browsers are configured to block third-party cookies, they only block their receipt but not their return. That might seem fine, since if cookies are blocked from getting in, there's no cookie in the browser to get out. But web designers are tricky. Using something called “redirection”, a web server can reply to a request for a web page by redirecting the browser to some other server, such as a third-party advertiser's web server. The browser then goes to that server and, since the browser is now visiting it directly, the interaction has a first-party context. If the advertiser's web server then gives a cookie to the browser as it bounces (re-redirects) the browser back to the original first-party site, that advertiser's cookie will have been received in a first-party context and will be accepted by the web browser. Now an advertiser's cookie has sneaked in. If the web browser doesn't block the return of existing cookies, that tracking cookie will now be sent back even with third-party queries, thus allowing tracking to occur.
The most privacy-aware browsers tag the “receipt context” of each of their cookies and only return a cookie when the context of the browser's query matches the context under which that cookie was received. We provide a complete treatment of this, and a test for your browser(s), on our Cookie Contexts page.
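The difference between blocking only cookie receipt and tagging each cookie's receipt context can be shown with a toy Python model of the redirection trick. The `Browser` class, its `tag_contexts` switch and the tracker's host name are hypothetical simplifications; no real browser is implemented this way, but the two policies behave as described above.

```python
from dataclasses import dataclass


@dataclass
class StoredCookie:
    host: str
    name: str
    value: str
    received_as: str  # receipt context: "first-party" or "third-party"


class Browser:
    """Toy browser that always blocks third-party cookie *receipt*.
    With tag_contexts=True it also refuses to return a cookie in any context
    other than the one it was received in (the stricter policy)."""

    def __init__(self, tag_contexts):
        self.tag_contexts = tag_contexts
        self.cookies = []

    def receive(self, host, name, value, context):
        if context == "third-party":
            return  # third-party receipt is blocked
        self.cookies.append(StoredCookie(host, name, value, context))

    def cookies_to_send(self, host, context):
        return [c.name + "=" + c.value for c in self.cookies
                if c.host == host
                and (not self.tag_contexts or c.received_as == context)]


def run(browser):
    # 1. The browser is redirected to the tracker as a direct visit, so the
    #    tracker's cookie arrives in a first-party context and is accepted.
    browser.receive("ads.adnetwork.example", "uid", "42", context="first-party")
    # 2. Later, another site embeds the tracker's ad: a third-party query.
    return browser.cookies_to_send("ads.adnetwork.example", context="third-party")


print("receipt-only blocking:", run(Browser(tag_contexts=False)))  # ['uid=42']
print("context tagging:      ", run(Browser(tag_contexts=True)))   # []
```

With receipt-only blocking, the bounced-in cookie goes straight back out in a third-party context; with context tagging it is withheld.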
Gibson Research Corporation is owned and operated by Steve Gibson. The contents of this page are Copyright (c) 2024 Gibson Research Corporation. SpinRite, ShieldsUP, NanoProbe, and any other indicated trademarks are registered trademarks of Gibson Research Corporation, Laguna Hills, CA, USA.
Last Edit: Feb 04, 2011 at 17:57