The fundamental problem is that the Internet has outgrown the original system
we are still trying to use. And rather than fixing it, the Internet industry has
largely stopped using it . . . and hoped no one would notice or care.
Each certificate authority publishes its own continuously changing certificate revocation list (CRL): a list of the certificates it has issued that have been revoked but have not yet expired. Each issued certificate contains a URL pointing to this list, so that the operating system or web browser relying upon the certificate's veracity knows where to obtain the list for inspection.
This creates problems. Since certificates are continually being revoked, replaced or invalidated for various reasons, the “true list” is a continually moving target, and any copy of the list is out of date almost from the moment it is published. Certificate authorities update their published revocation lists weekly. Since the certificate authority (CA) model is “trust unless proven invalid,” an aging CRL will not contain information about recently revoked certificates, so those certificates will be erroneously trusted when they should not be.
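The staleness problem can be made concrete with a small simulation. This is a minimal sketch, not any browser's actual logic: `publish_crl` and `client_trusts` are hypothetical names, and a CRL is reduced to a publication time plus a set of revoked serial numbers.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CrlSnapshot:
    """A published CRL, reduced to when it was issued and which serials it lists."""
    published: datetime
    revoked_serials: frozenset[int]


def publish_crl(revocations: dict[int, datetime], at: datetime) -> CrlSnapshot:
    """Build the list a CA would publish at time `at`: only revocations made so far."""
    listed = frozenset(serial for serial, when in revocations.items() if when <= at)
    return CrlSnapshot(published=at, revoked_serials=listed)


def client_trusts(serial: int, cached: CrlSnapshot) -> bool:
    """'Trust unless proven invalid': any serial missing from the cached CRL is trusted."""
    return serial not in cached.revoked_serials


# Certificate 1001 is revoked before the weekly CRL is published; 1002 just after it.
revocations = {1001: datetime(2014, 4, 1), 1002: datetime(2014, 4, 3)}
weekly_crl = publish_crl(revocations, at=datetime(2014, 4, 2))

print(client_trusts(1001, weekly_crl))  # False: listed, so rejected
print(client_trusts(1002, weekly_crl))  # True: revoked, but trusted until the next CRL
```

The client holding `weekly_crl` keeps trusting certificate 1002 for as long as it goes without fetching a newer list.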
The obvious solution is to update the CRLs as often as possible. So already we face a tradeoff in a system that, to be secure, should not have tradeoffs. The question is: how often do we update? The answer is complicated by the fact that many certificate revocation lists are quite large:
Credit: Websense blog, July 10th, 2013
Note that the vertical axis is logarithmic. Thus, this chart tends to obscure the true magnitudes.
The smallest of these CRLs is 236 bytes, while the largest is a stunning 28,198,759 bytes! In other words, it is completely infeasible to download the larger CRLs on the fly . . . yet the CRL system only works if all web browsing clients have copies of the very latest revocation lists.
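To see why on-the-fly downloading is impractical, here is the idealized transfer time for the largest CRL at a few link speeds. It is a back-of-the-envelope sketch: it assumes the full link rate is available and ignores latency, TCP slow start and protocol overhead, so real downloads would be slower.

```python
LARGEST_CRL_BYTES = 28_198_759  # largest CRL in the Websense chart
SMALLEST_CRL_BYTES = 236        # smallest CRL in the same chart


def download_seconds(size_bytes: int, megabits_per_second: float) -> float:
    """Ideal transfer time: bits to move divided by link rate in bits per second."""
    return size_bytes * 8 / (megabits_per_second * 1_000_000)


for mbps in (1, 10, 100):
    print(f"{mbps:>3} Mbit/s: {download_seconds(LARGEST_CRL_BYTES, mbps):6.1f} s")
# prints 225.6 s, 22.6 s and 2.3 s respectively

print(f"size ratio: {LARGEST_CRL_BYTES / SMALLEST_CRL_BYTES:,.0f}x")  # 119,486x
```

Even on a fast link, a browser that had to fetch this list before opening a connection would stall for seconds, and that is before counting every other CA's list.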
Fortunately, only certificates which are both non-expired and invalid need to be listed in the CRLs. Domain Validation (DV) certificates that are used to authenticate the identity of remote web servers have, at most, a three-year life, and the life of Extended Validation (EV) certificates is limited to two years. So the CRLs only need to maintain lists of invalid certificates for, at most, three years. But, as shown below, the rapid escalation in certificate revocation rates has outpaced even the expiration mechanism:
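The expiration rule is what keeps a CRL from growing forever: an entry can be dropped once the certificate it names would be rejected as expired anyway. A minimal sketch, assuming (hypothetically) that the CA keeps each revoked certificate's notAfter date alongside its serial number; the CRL entries themselves carry the serial number and revocation date, not the expiry. `prune_expired` is an illustrative name.

```python
from datetime import datetime


def prune_expired(revoked: dict[int, datetime], now: datetime) -> dict[int, datetime]:
    """Keep only revoked certificates whose notAfter is still in the future.

    `revoked` maps serial number -> the certificate's notAfter date. An expired
    certificate fails validation on its own, so listing it adds nothing.
    """
    return {serial: not_after for serial, not_after in revoked.items() if not_after > now}


revoked = {
    7: datetime(2012, 6, 1),  # already expired: can be dropped
    8: datetime(2015, 6, 1),  # still valid on paper: must stay listed
    9: datetime(2016, 1, 1),
}
print(sorted(prune_expired(revoked, now=datetime(2013, 7, 10))))  # [8, 9]
```

Pruning only helps when entries expire at least as fast as new revocations arrive; with certificate lifetimes of up to three years and a rising revocation rate, the list keeps growing.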
Credit: Websense blog, July 10th, 2013
Note that the apparent decline in 2013 is caused only by the fact that the data covers just the first five months of 2013. During those five months alone, 766,451 revocations occurred. Thus, 2013 was well on its way to hugely exceeding the previous year's total.
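A simple linear extrapolation shows the pace. It assumes, purely for illustration, that the remaining seven months of 2013 continued at the same average monthly rate as the first five:

```latex
\[
\text{2013 estimate} \approx 766{,}451 \times \frac{12}{5}
  = \frac{9{,}197{,}412}{5}
  \approx 1{,}839{,}482 \ \text{revocations}
\]
```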
The Heartbleed Effect
None of this takes into account the “revocation tsunami,” as it's being called, resulting from the many tens of thousands of certificates revoked in the wake of the Heartbleed vulnerability. Heartbleed has massively increased the size of CRLs for the next three years. [Reference: A page at SANS tracking Heartbleed-related revocations with lots of great detail.]
The (pre-Heartbleed) explosion in certificate revocation depicted in the chart above raises the question: Why are all of these certificates being revoked? CRLs allow their publisher to specify the reason for each revocation. Although not all CAs take advantage of this feature, most do. The following chart summarizes the available data:
Credit: Websense blog, July 10th, 2013
This data demonstrates that while the most prevalent reason for revocation is web server key compromise (or believed compromise), the strong runner-up is “Cessation of Operation.” In these cases, certificate authorities are absolutely required to revoke any certificates that they had issued to domains whose ownership has changed.
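RFC 5280 defines the reason codes a CA may attach to a CRL entry. The sketch below encodes that enumeration and tallies reasons the way a survey like the one charted here might. `tally_reasons` is a hypothetical helper, and counting entries that carry no reason code as `UNSPECIFIED` is a choice made here for illustration.

```python
from collections import Counter
from enum import IntEnum
from typing import Iterable, Optional


class CRLReason(IntEnum):
    """CRLReason values from RFC 5280, section 5.3.1."""
    UNSPECIFIED = 0
    KEY_COMPROMISE = 1
    CA_COMPROMISE = 2
    AFFILIATION_CHANGED = 3
    SUPERSEDED = 4
    CESSATION_OF_OPERATION = 5
    CERTIFICATE_HOLD = 6
    # value 7 is not used
    REMOVE_FROM_CRL = 8
    PRIVILEGE_WITHDRAWN = 9
    AA_COMPROMISE = 10


def tally_reasons(codes: Iterable[Optional[int]]) -> Counter:
    """Count CRL entries per reason; entries with no reason code count as UNSPECIFIED."""
    return Counter(
        CRLReason.UNSPECIFIED if code is None else CRLReason(code) for code in codes
    )


sample = [1, 1, 5, None, 4, 5, 5]  # made-up reason codes, not real CRL data
for reason, count in tally_reasons(sample).most_common():
    print(f"{reason.name:<24}{count}")
```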
Many years before the CRL crisis became this acute, the Internet's
engineers were working on a replacement . . . known as OCSP.
From the abstract of RFC 2560, which defined OCSP in June 1999: “This document specifies a protocol useful in determining the current status of a digital certificate without requiring CRLs. Additional mechanisms addressing PKIX operational requirements are specified in separate documents.”
The OCSP protocol allows anyone relying upon the validity of an apparently valid certificate to directly query the issuing certificate authority, on the fly, to determine whether the certificate is still valid.
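Concretely, an OCSP request names the certificate by a CertID (hashes of the issuer's name and public key, plus the certificate's serial number), and the responder answers good, revoked or unknown. The toy model below is a sketch, not a DER encoder: `make_cert_id` and `ToyResponder` are hypothetical names, SHA-1 is used because it was the common choice, and the byte strings stand in for the DER-encoded issuer name and the issuer's key bits. It also makes the privacy issue visible: every query hands the responder the serial number of the site being visited.

```python
import hashlib
from dataclasses import dataclass, field
from enum import Enum


class CertStatus(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class CertID:
    """The fields an OCSP request uses to name one certificate."""
    issuer_name_hash: bytes
    issuer_key_hash: bytes
    serial_number: int


def make_cert_id(issuer_name_der: bytes, issuer_key_bits: bytes, serial: int) -> CertID:
    """Hash the issuer's name and key with SHA-1 and pair them with the serial."""
    return CertID(
        hashlib.sha1(issuer_name_der).digest(),
        hashlib.sha1(issuer_key_bits).digest(),
        serial,
    )


@dataclass
class ToyResponder:
    issued: set = field(default_factory=set)       # CertIDs this CA knows about
    revoked: set = field(default_factory=set)      # the subset that has been revoked
    query_log: list = field(default_factory=list)  # what the CA learns from clients

    def status(self, cert_id: CertID) -> CertStatus:
        self.query_log.append(cert_id.serial_number)  # privacy leak: which site?
        if cert_id in self.revoked:
            return CertStatus.REVOKED
        if cert_id in self.issued:
            return CertStatus.GOOD
        return CertStatus.UNKNOWN


ca_name, ca_key = b"example-issuer-name", b"example-issuer-key"
site = make_cert_id(ca_name, ca_key, serial=0x1234)
responder = ToyResponder(issued={site})
print(responder.status(site))  # CertStatus.GOOD
print(responder.query_log)     # [4660]
```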
At the time of this writing, April of 2014, the OCSP protocol is nearly fifteen years old. The primary incentive behind OCSP was to deal with the inherent “always out of date” nature of any static revocation list. So the impetus was to move to a dynamic, real-time revocation system.
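There were always two problems with this system. Every OCSP query tells the certificate authority which site's certificate the user is checking, and every browser must query the CA's responder for every site it visits.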
In other words, OCSP leaks browsing behavior
and, like CRLs, it does not scale very well.
There was also a HUGE problem for the browser vendors: What should a browser do when no reply arrives?
If an OCSP “responder” quickly affirms that a certificate is valid, everybody is happy. And if an OCSP responder affirms that a certificate which a site is attempting to use has previously been revoked, then the user is protected from what is almost certainly a site with malicious intent. So that's good too.
But what does, or should, the web browser do when it receives no response from the OCSP responder? How long should it wait for a response? Remember that the browser's goal is to protect its user from possibly malicious websites. So the web browser must suspend the new connection until it can determine whether it is safe to proceed. Due to the way the current certificate authority system operates, the only reliable way to know about the true up-to-the-second status of any certificate is to use OCSP to ask the certificate issuer directly whether the possibly-still-valid certificate is truly still valid. But this imposes a massive and ultimately undeliverable burden upon the OCSP servers. They are too often overwhelmed and unable to respond.
To make matters worse, there are situations where an OCSP query and/or response might be administratively blocked, such as when “captive portals” are used. A captive portal, commonly found where free WiFi is offered, blocks the user's access to the wider Internet until they have logged on with their credentials, agreed to the portal's terms of service, watched an advertisement, or jumped through whatever hoops the bandwidth provider requires. Such portal pages may require a user logon and may therefore be secured with a certificate. But if the portal disallows all access to the wider Internet, the user's OCSP query to confirm the portal's own security certificate cannot be answered. If web browsers had always enforced strong revocation checks, captive portals would have been designed to permit those checks. But that's not the way history has been written.
Needing to find a practical solution, web browsers do not block their user in the event of no reply to an OCSP request. If web browsers bother to perform OCSP checks at all (and many do not under their default security settings), they adopt a “fail soft” policy, meaning that they treat no reply as a good reply. Only the Firefox browser offers an option to treat no reply as invalid. It's alone in the pack, and even that option is disabled unless the user turns it on.
Today, only one browser offers you the choice to be totally safe: Firefox. If you enable its option to treat an unanswered OCSP query as invalid, you will be completely protected at the theoretical cost of false positive blocks. We have always had it enabled and have never encountered a single false positive.
The nearly universal adoption of OCSP fail-soft policy opens the revocation system to malicious interference. If an attacker can simply arrange to block a web browser's access to the OCSP system, the browser will fail soft and treat a revoked certificate as valid.
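The difference between the two policies comes down to a single branch. This is a minimal sketch with hypothetical names (`allow_connection`, `OcspAnswer`); real browsers layer caching and timeouts on top of this decision.

```python
from enum import Enum
from typing import Optional


class OcspAnswer(Enum):
    GOOD = "good"
    REVOKED = "revoked"


def allow_connection(answer: Optional[OcspAnswer], hard_fail: bool) -> bool:
    """Decide whether to proceed; `answer` is None when no OCSP reply arrived."""
    if answer is OcspAnswer.REVOKED:
        return False
    if answer is OcspAnswer.GOOD:
        return True
    # Silence: a timeout, an overloaded responder, a captive portal, or an attacker.
    return not hard_fail


# An attacker holding a revoked certificate simply blocks the OCSP query:
print(allow_connection(None, hard_fail=False))  # True: fail soft accepts it
print(allow_connection(None, hard_fail=True))   # False: hard fail blocks it
```

Because the function cannot tell an innocent timeout from deliberate blocking, fail soft hands the decision to whoever controls the network path.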
Unfortunately, this original design of the OCSP system results in a single point of failure for the certificate verification system. If an OCSP server is offline, overloaded, under attack or unable to reply for any reason, certificate validity cannot be confirmed. And even when it is, the user's privacy is threatened. We still need a better solution.
As the very real problems of OCSP began to surface, clever Internet
engineers invented another solution known as “OCSP Stapling.”
The idea is for the web server itself to periodically obtain a current, signed and time-stamped OCSP response for its own certificate from the issuing certificate authority. Then, as defined by RFC 6066 in January 2011, the TLS protocol was extended to allow a web browser to request, and a web server to supply, this OCSP information in the initial connection handshake. If the connection-initiating web browser indicates that it is aware of this TLS extension, and the web server offers the feature, the OCSP assertion is provided at the time of the connection.
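Aside from the fact that it is considerably more elegant to have the web site that's offering a certificate also able to reassert that certificate's continued and current validity, “stapling” the current OCSP status into the initial TLS handshake solves most of OCSP's longstanding problems: the certificate authority no longer learns which sites each user visits, a single response fetched by the server serves all of its visitors instead of every browser querying the CA, and the browser needs no separate connection to an OCSP responder that could time out or be blocked.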
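The sketch below models stapling under stated simplifications: the client signals support with the TLS `status_request` extension (RFC 6066), and the server answers from an OCSP response it fetched earlier and cached. `StaplingServer` and its methods are hypothetical names, and the byte string stands in for a signed OCSP response. The counter shows the scaling benefit, and the privacy benefit follows from it: only the server ever talks to the CA.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StaplingServer:
    stapling_enabled: bool = True
    cached_response: Optional[bytes] = None
    responder_fetches: int = 0  # how often this server queried the CA

    def refresh_staple(self) -> None:
        """Periodically fetch a fresh signed OCSP response for our own certificate."""
        self.responder_fetches += 1
        self.cached_response = b"signed OCSP response #%d" % self.responder_fetches

    def handshake(self, client_sent_status_request: bool) -> Optional[bytes]:
        """Return the staple to send during the handshake, or None if none is sent."""
        if client_sent_status_request and self.stapling_enabled:
            return self.cached_response
        return None


server = StaplingServer()
server.refresh_staple()
staples = [server.handshake(client_sent_status_request=True) for _ in range(10_000)]
print(len(staples), server.responder_fetches)  # 10000 1: one CA query served them all
print(server.handshake(False))                 # None: the client did not ask
```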
Note that we didn't say that OCSP stapling solved every OCSP
problem . . . only MOST of them. What's the remaining problem?
A July 2013 survey by Netcraft showed that only 22% of all certificates were served with a stapled response:
Credit: Netcraft Archives, July 19th, 2013
And of those 22%, more than 95% of them were served by Microsoft servers. Microsoft's servers have had OCSP stapling built in and enabled by default for many years. So all recently deployed and updated Microsoft-based websites will be offering stapling:
Credit: Netcraft Archives, July 19th, 2013
There is no known reason for the majority of the Internet's non-Microsoft web servers to have stapling disabled, so there is no excuse. All of the major servers offer it, but, as can be seen above, few websites bother to use it.
OCSP stapling has had many birthdays. It's growing up. There's
NO excuse for both web browsers and servers not to be using it.
But now we know that a stapled OCSP response which is bundled right into the connection's opening handshake doesn't require access to another OCSP server. So it cannot be blocked.
So then the engineers explain, again correctly, that if bad guys steal a certificate and set up a malicious duplicate website, they will turn off their own server's OCSP stapling. Assuming that they can also block the web browser's direct access to the OCSP server, and the client is set to fail soft, the web browser will assume everything's fine and proceed with the malicious connection.
So what's the final solution?
It's known as “Must Staple” and it's coming soon
to your web browser . . . or at least to Firefox!
How can a browser know that a site “must staple”?
There are two proposals, both of which are likely to be adopted:
1: Add a “must staple” assertion to the site's security certificate:
Just as the TLS protocol has “extensions”, so, too, do security certificates. A formal IETF proposal has been made to add a new certificate extension for this purpose.
Once this feature is implemented, any website wishing to protect its visitors from the possibility of revoked certificate abuse can include that assertion in its certificate. The web browser that receives this certificate can verify that an OCSP stapled reply was provided by the server and flatly refuse to proceed (hard fail) with the connection otherwise. It's the best possible solution to the revocation problem.
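The browser-side rule is simple once the certificate can carry the assertion. This sketch assumes the form the proposal later took in RFC 7633: a TLS Feature extension (OID 1.3.6.1.5.5.7.1.24) listing TLS extension number 5, `status_request`. The certificate is reduced to a dictionary of already-parsed extensions, and `must_staple` and `accept_handshake` are hypothetical names; no real X.509 parsing is done.

```python
from typing import Optional

TLS_FEATURE_OID = "1.3.6.1.5.5.7.1.24"  # id-pe-tlsfeature (RFC 7633)
STATUS_REQUEST = 5                      # TLS extension number used for OCSP stapling


def must_staple(cert_extensions: dict) -> bool:
    """True if the certificate's TLS Feature extension lists status_request."""
    return STATUS_REQUEST in cert_extensions.get(TLS_FEATURE_OID, ())


def accept_handshake(cert_extensions: dict, staple_status: Optional[str]) -> bool:
    """`staple_status` is the status in a valid stapled response, or None if absent."""
    if staple_status == "revoked":
        return False
    if must_staple(cert_extensions):
        return staple_status == "good"  # hard fail unless a good staple was provided
    return True  # no assertion: this sketch accepts, mirroring fail-soft browsers


stolen_cert = {TLS_FEATURE_OID: [STATUS_REQUEST]}
print(accept_handshake(stolen_cert, None))    # False: attacker sent no staple
print(accept_handshake(stolen_cert, "good"))  # True: genuine site with a fresh staple
print(accept_handshake({}, None))             # True: no assertion, fail soft as before
```

An attacker with a stolen, revoked certificate is stuck either way: omitting the staple triggers the hard fail, and a current staple would say "revoked".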
Any attacker who attempts to abuse such a certificate will be out of luck, because the certificate itself asserts that OCSP stapling must be provided. But the attacker cannot provide OCSP stapling because that would identify their use of the certificate as fraudulent. This renders stolen and revoked certificates completely useless, as they were always meant to be . . . but never really have been.
No changes are required to the web's servers other than updating Apache, nginx, and LiteSpeed to the latest versions that support OCSP stapling and enabling it. But it will take some time to get this certificate extension added to the Internet standards. In the meantime, we have the second solution . . .
2: Create a new HTTP response header similar to HSTS:
An alternative and immediately available solution has been proposed and is being worked on by members of the Mozilla Firefox team. Any web server that wants to protect its users from certificate fraud, and thus offers OCSP stapling as the first step, will soon be able to add a “Must-Staple:” response header to their server replies. The header includes a “max-age” specification which is usually set to a large value. Once this header is received – over a stapled connection to prevent abuse – any aware web browsers will note and retain that information for many months or years. This prevents bad guys from excluding that header and not using OCSP stapling. All web servers support the addition of “static response headers” through simple configuration options, so adding this assertion to the server takes just minutes.
As you will have already guessed, if the web browser has flagged a site as Must-Staple, because it once saw that header and the age hasn't yet expired, it will hard fail if (a) it cannot obtain OCSP from stapling or (b) as a backup it also cannot obtain it from the OCSP provider designated in the certificate. And once again we have high-reliability robust enforcement of certificate revocation.
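The header-based variant works like HSTS: remember the assertion per host until max-age runs out, and enforce it on later visits. This is illustrative only: the header name follows the text, but its exact syntax was still a proposal, so the `max-age=<seconds>` parsing, the `MustStapleStore` class and its methods are assumptions. The usage lines also show the first-visit gap: a host the browser has never seen is not yet protected.

```python
import re
import time
from typing import Optional


class MustStapleStore:
    """Remembers which hosts asserted Must-Staple, and until when."""

    def __init__(self) -> None:
        self._expiry: dict = {}  # host -> expiry time in seconds since the epoch

    def note_header(self, host: str, header_value: str, stapled: bool, now: float) -> None:
        """Record the header, but only if it arrived over a stapled connection."""
        match = re.search(r"max-age=(\d+)", header_value)
        if stapled and match:
            self._expiry[host] = now + int(match.group(1))

    def is_must_staple(self, host: str, now: float) -> bool:
        return self._expiry.get(host, 0.0) > now

    def allow(self, host: str, staple: Optional[str], fetched: Optional[str], now: float) -> bool:
        """`staple` / `fetched`: OCSP status via stapling / direct query, None if unavailable."""
        status = staple if staple is not None else fetched
        if status == "revoked":
            return False
        if self.is_must_staple(host, now):
            return status == "good"  # hard fail: no OCSP status at all means no connection
        return True  # not flagged: fail soft, as browsers do today


store = MustStapleStore()
now = time.time()
print(store.allow("example.com", None, None, now))    # True: first visit, unprotected
store.note_header("example.com", "max-age=31536000", stapled=True, now=now)
print(store.allow("example.com", None, None, now))    # False: attacker blocked OCSP
print(store.allow("example.com", None, "good", now))  # True: backup OCSP query answered
```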
The “first visit” problem: As with the HSTS (HTTP Strict Transport Security) header, this solution does suffer from the “first visit” problem: Without having at least once previously visited the authentic site to receive and retain the Must-Staple assertion, a web browser would not know to always insist upon OCSP for that site. This creates an opening for an attacker who can interfere with a web browser's first-ever visit to a website. While this is indeed possible, it's an instance where we should not allow the quest for a perfect solution to prevent us from using something that's very good in the meantime. Binding the Must-Staple assertion into a site's security certificate is the ultimate solution. But until then, we'll be able to have nearly-as-good enforcement almost immediately. And from a practical standpoint, the simple response header solution fully protects the websites we visit routinely where we are most likely to have a relationship requiring strong security, rather than sites we've never visited before.
Brian Smith's late-October 2013 posting of the must-staple proposal to the IETF mailing list is archived here: http://www.ietf.org/mail-archive/web/tls/current/msg10351.html
The best imperfect system possible
The only way to achieve that “instant global revocation” level of perfection would be for the security of every TLS connection being made everywhere on the Internet to be individually verified, in real time, by the issuing certificate authority. Doing this securely would require not only absolutely reliable real-time access to the issuing certificate authority, but also unique queries containing random nonces, each individually cryptographically signed to prevent both spoofing and reuse. There is no known practical way to achieve those goals today. None.
If it's not possible to do it perfectly, how, exactly, does the OCSP Must-Staple system perform when faced with a supremely powerful and capable attacker?
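Here is the worst-case attack scenario for a site protected by the OCSP Must-Staple system: an attacker who has stolen the site's private key, and obtained a stapled OCSP response while the certificate was still valid, can continue to staple that “good” response after the certificate is revoked, but only until that response expires. After that, the attacker cannot obtain a fresh “good” response for the revoked certificate, and every Must-Staple-aware browser will refuse the connection.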
No, it's not instantaneous, but we've already established that “instantaneous” is not possible, and this system cannot be bypassed. That's huge. It is enforceable, and it provides the best protection our current technology can provide. To minimize the post-revocation vulnerability window, sites requiring heightened security may negotiate to obtain shorter OCSP response lifetimes from the issuing authority.
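The size of that window follows from the OCSP response's validity period. A sketch of the worst case, assuming the attacker holds the last "good" response signed just before the revocation, which stays acceptable until its nextUpdate time. `exposure_after_revocation` is a hypothetical name, and the 7-day and 1-day lifetimes are illustrative values, not figures from any CA.

```python
from datetime import datetime, timedelta


def exposure_after_revocation(
    this_update: datetime, lifetime: timedelta, revoked_at: datetime
) -> timedelta:
    """How long a 'good' response signed at `this_update` outlives the revocation."""
    next_update = this_update + lifetime
    return max(next_update - revoked_at, timedelta(0))


signed = datetime(2014, 4, 28, 9, 0)
revoked = signed + timedelta(minutes=1)  # worst case: revoked right after signing
for lifetime in (timedelta(days=7), timedelta(days=1)):
    print(lifetime, "->", exposure_after_revocation(signed, lifetime, revoked))
# 7 days, 0:00:00 -> 6 days, 23:59:00
# 1 day, 0:00:00 -> 23:59:00
```

The window is bounded by the response lifetime, which is why shortening that lifetime shrinks the exposure.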
When you consider that traditional certificate revocation lists, when they worked at all, were published weekly, this represents a considerable improvement.
It's where we need to go.
Web server engineers have generally stayed on the leading edge of improvements. However, when server software has not enabled those features by default, they have often remained disabled.
Web browsers have faced the daunting task of “just working” for users who are focused upon the page's content and have no interest whatsoever in what's going on behind the scenes . . . and never want to. But at the same time, those users demand that their web browsers detect and protect them from any possible malicious activity.
With the advent of “Must-Staple” OCSP stapling, we finally appear to be nearing the end of a decades-long struggle to find a solution to the certificate revocation problem which is both reliable and secure.
All major web servers can offer OCSP stapling today, and Microsoft's have done so, by default, for years.
The Mozilla team appears to be leading the effort to introduce near-term support for response header based must-staple, and the IETF is on the way to moving this into security certificates where it ultimately belongs.
All of the required technology is in place. Technically astute end users should ask their browser and operating system vendors to give priority attention to adding support for OCSP must-staple.
The Internet's engineers keep saying there's no demand for it. We believe that's because everyone assumes the current revocation systems work. Now we know they don't.
Only by asking for change will any change happen.
There is still a bit more coming: The surprising truth about how the various operating systems and web browsers operate today. Stay tuned!
Gibson Research Corporation is owned and operated by Steve Gibson. The contents of this page are Copyright (c) 2024 Gibson Research Corporation. SpinRite, ShieldsUP, NanoProbe, and any other indicated trademarks are registered trademarks of Gibson Research Corporation, Laguna Hills, CA, USA.
Last Edit: Apr 28, 2014 at 09:52