Closed Bug 360420 Opened 18 years ago Closed 12 years ago

Implement OCSP Stapling in libSSL

Categories

(NSS :: Libraries, enhancement, P1)

enhancement

Tracking

(blocking2.0 -)

RESOLVED FIXED
Tracking Status
blocking2.0 --- -

People

(Reporter: nelson, Assigned: KaiE)

References

(Blocks 1 open bug)

Details

Attachments

(7 files, 34 obsolete files)

(deleted), patch
rrelyea
: review+
Details | Diff | Splinter Review
(deleted), patch
rrelyea
: review+
Details | Diff | Splinter Review
(deleted), patch
rrelyea
: review+
Details | Diff | Splinter Review
(deleted), patch
rrelyea
: review+
Details | Diff | Splinter Review
(deleted), patch
KaiE
: review+
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
wtc
: review+
Details | Diff | Splinter Review
RFC 4366 specifies an extension for the TLS hello messages, known as the
"Certificate Status Request" extension, better known as "OCSP Stapling".

The idea is that the server sends its certificate AND the OCSP response
from it certificate's issuer, declaring that the server's cert is not 
revoked.  The OCSP response is thus "stapled" to the certificate. 

If the response is sufficiently recent and its signature can be verified
as coming from the issuer or the issuer's designated OCSP responder, then
the stapled response obviates an additional round trip for the OCSP request
and response.  

We should implement this in our libSSL.  To do so, we need the OCSP response
cache to be implemented.  The implementation would validate the response 
and then insert it into the cache.  Then later, at the point where we are
validating the server cert, and we're about to issue the OCSP request, we
will see the response in the cache and use it instead.  

It is debatable whether this needs to be implemented before OCSP can be 
enabled by default in the browser.
Assignee: nobody → ngm+mozilla
Nelson: are you able to provide a quick technical update here? For example, my understanding is that NSS now implements an OCSP response cache... is that right?

Gerv
Yes, the OCSP cache was implemented in bug 205406 in 2007 and has been in 
NSS since NSS 3.12.0.  I could provide much more update than than, but it
would be political, not technical.
A New Internet-Draft is available from the on-line Internet-Drafts
directories.

  Title       : Adding Multiple TLS Certificate Status Extension requests
  Author(s)   : Y. Pettersen
  Filename    : draft-pettersen-tls-ext-multiple-ocsp-00.txt
  Pages       : 9
  Date        : 2009-10-16

This document introduces a replacement of the TLS Certificate Status Extension 
to allow clients to specify and support multiple certificate status methods.  
Also being introduced is a new OCSP based method that servers can use to 
provide status information not just about the server's own certificate, 
but also the status of intermediate certificates in the chain.

A URL for this Internet-Draft is:
http://www.ietf.org/id/draft-pettersen-tls-ext-multiple-ocsp-00.txt
I am informed that Nagendra Modadugu, the assignee of this bug, is working on it and plans to complete it. :-)

Gerv
Attached patch OCSP stapling support for libssl (obsolete) (deleted) — Splinter Review
Here's a prelim patch to add OCSP stapling support to libssl.

I think the OCSP cache is supposed to be separate from libssl, so there's no interaction here. What it does provide is a way for the user of libssl to get the raw DER bytes of the response.

It's a shame that the RFC says that CertificateStatus comes after Certificate as this patch wants it the other way round. So it buffers the Certificate message until it either has a CertificateStatus, or it gets a message that means that a CertificateStatus isn't coming.

If we pulled the bottom half of ssl3_HandleCertificate into its own function then we could do without buffering the message. I must admit that the error handling in ssl3_HandleCertificate scared me away and I went for the easy answer.

(Only tested against Apache SVN. If anyone knows of a site that supports OCSP stapling, then I'd be happy to know.)
Attachment #418746 - Flags: review?
Adam, thank you for the patch.  To accept this patch we
also need a server-side implmenetation.  At least, that
will allow us to test NSS against itself.

Nagendra, what's the status of your work?  If you won't
be able to finish it soon, could you attach your work
in progress as a reference?  Thanks.
I can write the server side too. I'd like a little guidance about the interface.

In the client-side patch I keep OCSP completely separate from the rest of NSS since I believe that libssl is supposed to be somewhat independent. So would a server side patch that added a callback (i.e. "CertificateStatusCallback") be the correct design? The libssl user could register that callback and provide an OCSP reply (or not) for each new pertinent connection.
Comment on attachment 418746 [details] [diff] [review]
OCSP stapling support for libssl

I just glanced at this patch and found it to be much smaller than I would have guessed.  Is it thought to be complete (for the client side)? 
I will review it at length, hopefully within a week's time.
Attachment #418746 - Flags: review? → review?(nelson)
Depends on: 542510
Depends on: 542538
Adam, I discussed this with Robert Relyea and Wan-Teh and we agreed that the server side only needs to be implemented enough to allow effective testing of the client side code; we don't expect any NSS server products to enable the OCSP stapling support unless/until a server product vendor provides a real server side implementation. If you have any work in progress for the server side, please attach it to the bug.

Nominated as a FF 4.0 blocker. I discussed this with Sid Stamm and Dan Veditz and it was determined that this is important enough to try to get into the FF 4.0 release because it is a big privacy and performance win when the server supports it, and because it will help strengthen our CA policy (see bug 585122) to do EV revocation checking correctly. Besides this patch, we would need to create a patch to implement enough of the server side to enable automated tests.

We should also use the SSL Labs list, at least, to test NSS against existing OCSP-stapling-enabled servers. I will also find out the exact default OCSP stapling behavior of IE8 and IE9. If no regressions are detected using the SSL Labs tests and IE already supports stapling by default, then that should provide enough assurance that OCSP stapling will not be problematic to enable for FF 4.0.
blocking2.0: --- → ?
I don't consider this a priority I'm afraid. OCSP stapling is useful for very large sites to reduce load on OCSP responders and variance in revocation checking times. (That's why google.com uses it.) But it's busted by design. It can only carry a single response and hardly any sites have only one OCSP certificate in their chain these days.

So it doesn't eliminate the OCSP lookup delay, which it's primary attraction.

In the future, a multi-OCSP extension could be deployed, but I don't currently have the capacity to push that through at the moment.
The primary attraction for sites is to eliminate the delay (although that could also be done, to an extent, by making the request asynchronous and aborting document loading if it fails), but the primary attraction for CAs is to avoid melting their OCSP servers.

I agree that the right topology for certs has an (offline) root, an intermediate and an end-entity cert, and the latter two should have OCSP capability. But surely we can e.g. have the end-entity response returned stapled, and it's quite possible the intermediate (if it's from a common CA) will be cached?

Having said that, permitting the stapling of only one response was rather, er, short-sighted. :-|

Gerv
(In reply to comment #11)
>
> Having said that, permitting the stapling of only one response was rather, er,
> short-sighted. :-|

Could we suggest to the authors of RFC 4366, or the IETF-TLS mailing list, that a new variable-length TLS-extension should be created, that may n OCSP responses, where n <= length of chain? Maybe that's not too difficult to get RFC'd?
Being able to get the one OCSP response stapled is still a big benefit, especially for popular CAs as it is likely we would already have the CA OCSP responses in the cache (especially after we implement the persistent cache).

> Could we suggest to the authors of RFC 4366, or the IETF-TLS mailing list, that
> a new variable-length TLS-extension should be created, that may n OCSP
> responses, where n <= length of chain? Maybe that's not too difficult to get
> RFC'd?

There is/was already a draft for such an extension. See Bug 611836.
blocking2.0: ? → -
Whiteboard: [d?]
(In reply to comment #9)
<snip>
> Nominated as a FF 4.0 blocker. I discussed this with Sid Stamm and Dan Veditz
> and it was determined that this is important enough to try to get into the FF
> 4.0 release because it is a big privacy and performance win when the server
> supports it, and because it will help strengthen our CA policy (see bug 585122)
> to do EV revocation checking correctly. Besides this patch, we would need to
> create a patch to implement enough of the server side to enable automated
> tests.

I understand that FF 4.0 is likely to be released very soon.  Will it feature support for OCSP Stapling?
Rob: no. I'm afraid we are well past the time for taking changes of this magnitude. This bug has been marked "blocking2.0-", which means the product team will not block releasing Firefox for this bug. Also, no work has been done on the patch since 2009.

Gerv
(side note: Chromium has been running with this patch on Linux for some time and, therefore, should support OCSP stapling. I have, sadly, been too busy recently to actually verify that everything is actually working, but it certainly didn't break anything. We only support it on Linux because, on other platforms, we use the platform's certificate validation. On Windows this means that we can plumb the stapling response from libssl into the validation. On Mac, I don't think we support stapling.)
I agree with the blocking decision here, removing [d?]
Whiteboard: [d?]
Do we know of sites who already implement OCSP stapling, that could be used to test this code?
Blocks: 611836
You can try Google's websites.  Adam Langley set up OCSP stapling
on them.
Actually, Google isn't running OCSP stapling at the moment. It was shutdown due to the OpenSSL security issue [1] and I didn't switch it back on afterwards. It wasn't clear that the extra 1KB per connection was worthwhile when clients were caching the OCSP reply on disk anyway.

accounts.live.com:443 is doing stapling however if you need a test.

[1] http://www.openssl.org/news/secadv_20110208.txt
Attached patch patch v2 - untested work in progress (obsolete) (deleted) — Splinter Review
Thank you very much for this initial patch, which was a helpful starting point. I merged the patch to NSS trunk and enhanced it.

We should try to avoid the need to allocate-and-save the certificate message.
This enhanced patch attempts to implement that.

The idea is, split funtion ssl3_HandCertificate into two pieces. The second piece is the authentication piece, which get delayed, until we're sure the CertStatus has been received and imported into the cache.

I didn't like the proposed implementation of GetStapledData, because the caller should be able to retrieve the required buffer size, and not be required to guess it. I enhanced flexibility of the function, see the updated interface description in the patch.

r=kaie on the locking in SSL_GetOCSPStapledData, because that is the same as being used in similar functions.

The patch did not yet insert any data into the cache (this has already been mentioned in the bug). I made an initial attempt to implement that. It builds, but is not yet tested.

Thanks to the person who had implmented CERT_CacheOCSPResponseFromSideChannel already. 

This is a local diff. I'll attach a real cvs diff with more context etc. later.
Attachment #418746 - Attachment is obsolete: true
Attachment #418746 - Flags: review?(nelson)
I agree that allocating and saving the certificate message was a mess. I think I like your solution better.

The old GetStapledData could be called with NULL, 0 in order to get the size of the buffer but, if that wasn't obvious, then it was a bad API.

Also, the reason for not calling CERT_CacheOCSPResponseFromSideChannel was that, in Chromium, we don't ship with libnss on Linux so we can't know if that function is available. However, that shouldn't influence the code that goes into NSS.
(In reply to comment #21)
> The idea is, split funtion ssl3_HandCertificate into two pieces. The second
> piece is the authentication piece, which get delayed, until we're sure the
> CertStatus has been received and imported into the cache.

This is a good idea, but don't rename the ssl3_HandleCertficate to ssl3_HandleCertificateData. In ssl3con, all the ssl3_Handle* functions are named to match the handshake messages they process, and we should not change this.

> Thanks to the person who had implmented
> CERT_CacheOCSPResponseFromSideChannel already. 

We will need to add the automated tests for this function, to make sure it behaves as we expect. (We should also document our expectations of the semantics. A "revoked" status in the cache should never be replaced with a status that isn't "revoked," even if the non-revoked status is newer. Also, how do we deal with "try later", etc.)

I am no fan of "goto," but I don't think the refactoring of the error handling in ssl3_HandleCertificate is necessary and it makes the patch very hard to review. Please leave the error handling as it was.

Your new ss->ssl3.hs.pending_certs member is basically the same as ss->sec.peerCertChain. It is OK to assign ss->sec.peerCertChain in ssl3_HandleCertificate instead. In fact, it is useful to do so, not just for this bug, but for bug 651996 too. Note also that everything in ss->sec is "pending" during a handshake.
(In reply to comment #23)
> (In reply to comment #21)
> > The idea is, split funtion ssl3_HandCertificate into two pieces. The second
> > piece is the authentication piece, which get delayed, until we're sure the
> > CertStatus has been received and imported into the cache.
> 
> This is a good idea, but don't rename the ssl3_HandleCertficate to
> ssl3_HandleCertificateData. In ssl3con, all the ssl3_Handle* functions are
> named to match the handshake messages they process, and we should not change
> this.

ok


> > Thanks to the person who had implmented
> > CERT_CacheOCSPResponseFromSideChannel already. 
> 
> We will need to add the automated tests for this function, to make sure it
> behaves as we expect. (We should also document our expectations of the
> semantics. A "revoked" status in the cache should never be replaced with a
> status that isn't "revoked," even if the non-revoked status is newer.

I agree in general, but the logic might need improvement for "temporarily rejected" messages.


> Also,
> how do we deal with "try later", etc.)

I think the existing cache only stores meaningful responses.


> I am no fan of "goto," but I don't think the refactoring of the error
> handling in ssl3_HandleCertificate is necessary and it makes the patch very
> hard to review. Please leave the error handling as it was.

Removing the goto was necessary in order to split the function.

I don't understand why you see a reason to do a newer patch that tries to add gotos back. I think this is a good opportunity to get rid of the gotos.



> Your new ss->ssl3.hs.pending_certs member is basically the same as
> ss->sec.peerCertChain. It is OK to assign ss->sec.peerCertChain in
> ssl3_HandleCertificate instead. In fact, it is useful to do so, not just for
> this bug, but for bug 651996 too. Note also that everything in ss->sec is
> "pending" during a handshake.

I would have to look at this in more detail.
(In reply to comment #24)
> (In reply to comment #23)
> > (We should also document our expectations of the semantics. A "revoked"
> > status in the cache should never be replaced with a status that isn't
> > "revoked," even if the non-revoked status is newer.
>
> I agree in general, but the logic might need improvement for "temporarily
> rejected" messages.

RFC 2560 says:
   The "revoked" state indicates that the certificate has been revoked
   (either permanantly [sic] or temporarily (on hold)).

So I think that a newer non-revoked status *should* replace an older revoked status in the cache.
Attached patch library patch v3 (obsolete) (deleted) — Splinter Review
This is the previous patch merged to NSS trunk.

This patch passed my tests. I'll provide some updated on the tests I performed.

Let's try to get this into 3.13.3
Attachment #524993 - Attachment is obsolete: true
Target Milestone: --- → 3.13.3
Blocks: 700693
Blocks: 700701
Comment on attachment 572853 [details] [diff] [review]
library patch v3

Review of attachment 572853 [details] [diff] [review]:
-----------------------------------------------------------------

Heads up: Your refactoring of ssl3_HandleCertificate has many conflicts with the work I am doing for bug 542832.

I did not review the entire patch but I have added a few comments for ssl3con.c only.

::: mozilla/security/nss/lib/ssl/ssl3con.c
@@ +7878,5 @@
>   * ssl3 Certificate message.
>   * Caller must hold Handshake and RecvBuf locks.
>   */
>  static SECStatus
> +ssl3_HandleCertificateData(sslSocket *ss, SSL3Opaque *b, PRUint32 length)

Don't rename this function. All ssl3_HandleXXX functions are named after the handshake message that they parse, and we should maintain this naming convention.

@@ +8033,3 @@
>  
> +static SECStatus
> +ssl3_HandleCertificateAuth(sslSocket *ss, SSL3Opaque *b, PRUint32 length)

Let's not use the ssl3_Handle* naming convention for this function because ssl3_Handle* means something specific in libssl. I suggest "ssl3_AuthCertificate".

@@ +8034,5 @@
> +static SECStatus
> +ssl3_HandleCertificateAuth(sslSocket *ss, SSL3Opaque *b, PRUint32 length)
> +{
> +    SECStatus        rv;
> +    PRBool           isServer   = (PRBool)(!!ss->sec.isServer);

Need to get or assert that we have a lock here (I think ss->sec.isServer is protected by the 1st handshake lock).

@@ +8036,5 @@
> +{
> +    SECStatus        rv;
> +    PRBool           isServer   = (PRBool)(!!ss->sec.isServer);
> +    int              errCode    = SSL_ERROR_RX_MALFORMED_CERTIFICATE;
> +    PRBool isTLS = (PRBool)(ss->ssl3.prSpec->version > SSL_LIBRARY_VERSION_3_0);

AFAICT, you can get the negotiated version from ss->version instead. Otherwise, you would need to get or assert the spec. read lock before accessing ss->ssl3.prSpec.

@@ +8037,5 @@
> +    SECStatus        rv;
> +    PRBool           isServer   = (PRBool)(!!ss->sec.isServer);
> +    int              errCode    = SSL_ERROR_RX_MALFORMED_CERTIFICATE;
> +    PRBool isTLS = (PRBool)(ss->ssl3.prSpec->version > SSL_LIBRARY_VERSION_3_0);
> +    ssl3CertNode * certs = ss->ssl3.hs.pending_certs;

Need to get or assert the SSL3 handshake lock before accessing ss->ssl3.hs

@@ +8229,5 @@
> +loser(sslSocket *ss, ssl3CertNode *certs, SSL3AlertDescription desc, int errCode)
> +{
> +    ss->ssl3.peerCertChain = certs;
> +    /* what's the purpose of assigning null to these pointers?
> +     * is this an old leak?

No, it isn't a leak and I have clarified the code in my patch for bug 542832.

@@ +8820,4 @@
>  	rv = ssl3_HandleServerHello(ss, b, length);
>  	break;
>      case certificate:
> +	rv = ssl3_HandleCertificateData(ss, b, length);

It is important, for performance reasons, to start the certificate authentication as early as possible. In particular, we should call ssl3_HandleCertificateAuth here if !ss->ssl3.hs.may_get_cert_status.

@@ +8835,5 @@
>  	    return SECFailure;
>  	}
> +        rv = ssl3_HandleCertificateAuth(ss, b, length);
> +	if (rv != SECSuccess)
> +	    break;

AFAICT, this call isn't necessary if we do what I suggested above (adding the ssl3_HandleCertificateAuth call back to the processing of the Certificate message.)

@@ +8845,4 @@
>  	    PORT_SetError(SSL_ERROR_RX_UNEXPECTED_CERT_REQUEST);
>  	    return SECFailure;
>  	}
> +        rv = ssl3_HandleCertificateAuth(ss, b, length);

Ditto (see previous comment).

@@ +8861,4 @@
>  	    PORT_SetError(SSL_ERROR_RX_UNEXPECTED_HELLO_DONE);
>  	    return SECFailure;
>  	}
> +        rv = ssl3_HandleCertificateAuth(ss, b, length);

Ditto.

@@ +9714,5 @@
>  	ss->ssl3.hs.messages.len = 0;
>  	ss->ssl3.hs.messages.space = 0;
>      }
> +    ss->ssl3.hs.pending_certs = NULL;
> +    if (ss->ssl3.hs.cert_status.data) {

We should free the cert_status data as soon as we've called CERT_CacheOCSPResponseFromSideChannel, and in ssl3_DestroySSL3Info.

::: mozilla/security/nss/lib/ssl/ssl3ext.c
@@ +1056,5 @@
>  SECStatus
> +ssl3_ClientHandleCertificateStatusXtn(sslSocket *ss, PRUint16 ex_type,
> +                                      SECItem *data)
> +{
> +    /* If we didn't request this extension, then the server may not echo it. */

I believe that libssl enforces this automatically. (I would double-check though).
> > This is a good idea, but don't rename the ssl3_HandleCertficate to
> > ssl3_HandleCertificateData. 

I will reverted this change in the next patch.


> > > Thanks to the person who had implmented
> > > CERT_CacheOCSPResponseFromSideChannel already. 
> > 
> > We will need to add the automated tests for this function, to make sure it
> > behaves as we expect.

I cannot offer automated tests for this yet.
But I performed manual testing.

I've set up 3 server ports that support OCSP stapling.
Thanks to StartCom for providing me with free certs and free revocation for test purposes.

- https://kuix.de:5143
  uses a valid cert, with OCSP data saying "good cert"

- https://kuix.de:5144
  uses a revoked cert, with OCSP data saying "bad cert"

- https://kuix.de:5145
  uses an old expired test cert, where the OCSP server responds with "unauthorized request"

For testing purposes, I built Firefox with this patch, plus the patch from bug 700693 to enable client side request, set the pref to "true/enabled", and traced the OCSP behaviour, both using a network sniffing tool (tcpdump) and the OCSP tracing mechanisms (https://wiki.mozilla.org/NSS:Tracing ). Also I set OCSP to mandatory in Firefox validation prefs.

At port 5143, the good OCSP response gets inserted into the cache.
As expected, NSS will not attempt to contact the OCSP server.
This confirms that CERT_CacheOCSPResponseFromSideChannel works.

At port 5144 and port 5145, Firefox will ignore the stapled responses coming in from the SSL server. This is expected as commented in the patch. (We don't want a MITM server to send false information and poison our cache.) As expected, NSS will contact the OCSP server for current information.


> (We should also document our expectations of the
> > semantics.

Brian, where do you want the behaviour of CERT_CacheOCSPResponseFromSideChannel documented?
Which documentation is missing?
Please have a look at the implementation of CERT_CacheOCSPResponseFromSideChannel which already has a detailed comment.


> > > A "revoked"
> > > status in the cache should never be replaced with a status that isn't
> > > "revoked," even if the non-revoked status is newer.
> >
> > I agree in general, but the logic might need improvement for "temporarily
> > rejected" messages.
> 
> RFC 2560 says:
>    The "revoked" state indicates that the certificate has been revoked
>    (either permanantly [sic] or temporarily (on hold)).
> 
> So I think that a newer non-revoked status *should* replace an older revoked
> status in the cache.

In my opinion, for the sake of simplicity, this detail should be dealt with in a separate bug.

This concern is a general remark about the behaviour of the OCSP cache, which is not modified in this bug/task.
(In reply to Kai Engert (:kaie) from comment #28)
> > > We will need to add the automated tests for this function, to make sure it
> > > behaves as we expect.
> 
> I cannot offer automated tests for this yet.
> But I performed manual testing.

I think I have some code laying around to support OCSP unit tests. I will find it and post it for you.

> - https://kuix.de:5143
>   uses a valid cert, with OCSP data saying "good cert"
> 
> - https://kuix.de:5144
>   uses a revoked cert, with OCSP data saying "bad cert"
> 
> - https://kuix.de:5145
>   uses an old expired test cert, where the OCSP server responds with
> "unauthorized request"

> > (We should also document our expectations of the
> > > semantics.
> 
> Brian, where do you want the behaviour of
> CERT_CacheOCSPResponseFromSideChannel documented?
> Which documentation is missing?
> Please have a look at the implementation of
> CERT_CacheOCSPResponseFromSideChannel which already has a detailed comment.

I mean we should state clearly what semantics libssl's OCSP stapling support should have, not what semantics CERT_CacheOCSPResponseFromSideChannel should have. This will be necessary for the security review. I will post a separate comment detailing the semantics I think we should have once I have thought about it more.

> In my opinion, for the sake of simplicity, this detail should be dealt with
> in a separate bug.
>
> This concern is a general remark about the behaviour of the OCSP cache,
> which is not modified in this bug/task.

We have to answer these questions to know if using CERT_CacheOCSPResponseFromSideChannel is appropriate in the first place.
Kai, I remember why we cannot add the call to CERT_CacheOCSPResponseFromSideChannel inside of libssl. CERT_CacheOCSPResponseFromSideChannel internally calls CERT_VerifyCert*. In fact, IIRC, it calls it multiple times, wastefully. We cannot call CERT_VerifyCert* within libssl, because:

(1) AIA fetching, CRL fetching, etc. may need to be done, and that may need to be done on the thread that the call into libssl was done on
(2) We do not know how the application wishes to validate certificates. It may use a non-NSS certificate validation mechanism but still want to support OCSP stapling.

I believe that the original patch that Adam Langley posted in this bug added a function that allows the application to retrieve the raw OCSP response bytes during certificate validation. Then, the application is responsible for getting its certificate path validation engine to see the OCSP response (e.g. by calling CERT_CacheOCSPResponseFromSideChannel). I think we should take that approach. Perhaps we can even take that patch as-is, since it was already reviewed for Chromium.

(Also, in my half-started work on OCSP improvements in libpkix, I noticed that OCSP validation in NSS is needlessly inefficient. In particular, it doesn't take advantage of the fact that the OCSP response must either be signed by the signer of the certificate, or by a certificate that is directly signed by the signer of the certificate. Instead, when verifying an OCSP response, it needlessly builds a path all the way to the root CA cert and then verifies that. This causes many unnecessary PKI operations and certdb accesses. AFAICT, the only way to have efficient OCSP response verification is to extend libpkix so that we can pass the stapled OCSP response to CERT_PKIXVerifyCert.)
Assignee: ngm+mozilla → kaie
(3) It uses non-libpkix certificate path validation instead of libpkix.
(4) If we were to change CERT_CacheOCSPResponseFromSideChannel to use libpkix, we would also have to add a mechanism to libssl to configure the libpkix cert_pi_* and/or cert_po_* flags for that validation. That would be very messy.
Ok.

My understanding is, that my plan WOULD work based on the old logic and implementation of libssl and PSM's SSL thread.

I had a chat with Brian, and he's telling me that he's doing a major reorganization of this code.

I understand I will have to wait until this work is done.
(In reply to Kai Engert (:kaie) from comment #32)
> 
> I understand I will have to wait until this work is done.


Brian, which bug is the main one that tracks your reorganization work, that I should wait for (and that this bug should depend on)?
Bug 674147 is the bug you are referring to. But, there is no reason for that Firefox bug to block work on this bug. IIRC, Chromium works in a very similar way and Adam or Wan-Teh could probably refute or confirm my comments in comment 30 and comment 31.
I don't want to repeat the mistake of writing code that is no longer appropriate. Instead of trying to write code against a moving target, I would like to wait until your work has settled down and is known to be stable.
Depends on: 674147
Ok, after digesting comment 30 + 31...

I agree that the fact that our OCSP handling code has CERT_Verify* calls embedded is a problem, because the app wants to decide how to verify the OCSP signer cert (classic vs. pkix code). Verification should probably use the registered AuthCertificate callback.

I agree with your request that the application should trigger the verification, allowing it to control the thread that calls into verification.

However, where should the code reside, that processes the OCSP data, drives verification and cache injection? I tend to add this code to NSS, but allow the application to trigger this code at its prefered point of time.

Here is a proposed design:


- libSSL remembers the CertStatus data retrieved during the handshake,
  it can be queried using the file descriptor.

  (the patch already does that in function SSL_GetOCSPStapledData,
   I propose to rename the function to SSL_RevealCertificateStatus)


- libSSL triggers the cert validation by calling authCertificate,
  which calls either default function SSL_AuthCertificate or
  the application registered AuthCertificateHook
  
  (we already do that as of today)


- we implement a new function SSL_ProcessStapledCertStatus,
  which retrieves the stapled information using SSL_RevealCertificateStatus,
  iterates over the list of stapled OCSP responses and
  injects them into the cache.

  (It could call existing CERT_CacheOCSPResponseFromSideChannel 
   for each of the stapled responses).

  The function is a no-op if no status was retrieved from the server.


- we change the default SSL_AuthCertificate implementation
  to run SSL_ProcessStapledCertStatus prior to cert verification.


- in PSM, in the application provided authCertificate code,
  we'll call SSL_ProcessStapledCertStatus, prior to executing the 
  cert verification.

  PSM can return the would-block status prior to calling 
  SSL_ProcessStapledCertStatus, which means it can be executed in 
  the same asynchronous job that performs the certificate verification.

  With this approach, the handshake must be restarted only once,
  despite potentially verifying multiple certificates.


How can we control whether classic or pkix verification code will be called?

In my opinion, we must allow NSS code to execute verification functions on its own,
I assume NSS already does that in various scenarios.

In my opinion, the OCSP response handling code (in particular function
CERT_VerifyOCSPResponseSignature), which as of today calls CERT_VerifyCert,
should *not* be changed to call an application provided callback.

(In particular the authCertificate hook is inappropriate, because there is
 no SSL socket involved at this point.)

We should change CERT_VerifyOCSPResponseSignature to behave depending on
NSS' global flag usePKIXValidationEngine, which can be set by the application
using CERT_SetUsePKIXForValidation.

Depending on the flag, CERT_VerifyOCSPResponseSignature should call either
classic or pkix validation code, using NSS-level defined parameters.

(If you didn't like this approach, we whould have to add a new callback function
 for the purpose of application performed cert verification,
 independent from the context of a file descriptor.)


(Also note that PSM currently does not make any calls to the default 
 implementation SSL_AuthCertificate. With the above plan, processing of the
 cert status will happen only once.)


Another related proposal:
I think the application should not deal with the raw OCSP bytes, because some of the knowledge required for decoding the response might be private? I might be wrong here, but instead, my preference is to do the following:

Function SSL_GetOCSPStapledData (renamed to SSL_RevealCertificateStatus) should return a list of OCSP responses.

Why? My preference is that our implementation is already prepared with future multi-stapling. The code that retrieves data from SSL_RevealCertificateStatus shouldn't have to guess whether the data is a single OCSP response or a list of responses. By returning a list of OCSP responses, this code can be compatible with today and the future requirement of a multiple stapling handshakes.
Whiteboard: [feedback wanted for comment 36]
(In reply to Adam Langley from comment #20)
> 
> accounts.live.com:443 is doing stapling however if you need a test.

This site is gone, but https://login.live.com/ now uses stapling.
CVS Patch for NSS: I'm unable to create that at this time, because bug 542832 is not checked in, and my code needs careful merging with that code.

I can attach a patch that was created using hg - it applies fine on the NSS tree, after having applied the patches from bug 542832 first.
Attached patch patch v10 (NSS library portion) (obsolete) (deleted) — Splinter Review
patch on top of bug 542832
Attachment #572853 - Attachment is obsolete: true
Patch v10 implements most of comment 36.

It does not yet implement the dynamic switch between classic/pkix cert verification code inside the OCSP verification code. I would like to do that in a future step.

I attached a patch with PSM level changes to bug 700693.

Initial testing with Firefox looks good. (Not yet tested with the NSS test suite.)
(In reply to Brian Smith (:bsmith) from comment #29)
> 
> I think I have some code laying around to support OCSP unit tests. I will
> find it and post it for you.

Have you found it?

I would appreciate help or advice on adding automated tests for OCSP staplig.

What's today's state of OCSP testing?
Does it create OCSP responses on its own?

(If we didn't have any automated creation of OCSP responses yet, then I'd argue, manual testing should be sufficient to land this code.)
(In reply to Kai Engert (:kaie) from comment #36)
> I agree that the fact that our OCSP handling code has CERT_Verify* calls
> embedded is a problem, because the app wants to decide how to verify the
> OCSP signer cert (classic vs. pkix code). Verification should probably use
> the registered AuthCertificate callback.

I think it is OK to leave the CERT_Verify* calls for now, as long as we don't call the functions within libssl, or on the socket transport thread within Gecko.

> - we implement a new function SSL_ProcessStapledCertStatus,
>   which retrieves the stapled information using SSL_RevealCertificateStatus,
>   iterates over the list of stapled OCSP responses and
>   injects them into the cache.
>
>   (It could call existing CERT_CacheOCSPResponseFromSideChannel 
>    for each of the stapled responses).

I would prefer that we do not couple libssl to certhigh or any of the certificate validation/OCSP functions. Google Chrome on Windows uses libssl without using those functions, for example. I think it is reasonable to just have the application (PSM) loop through all the stapled OCSP response data itself for now, calling CERT_CacheOCSPResponseFromSideChannel itself.

> - we change the default SSL_AuthCertificate implementation
>   to run SSL_ProcessStapledCertStatus prior to cert verification.

+1. 

>   PSM can return the would-block status prior to calling 
>   SSL_ProcessStapledCertStatus, which means it can be executed in 
>   the same asynchronous job that performs the certificate verification.

PSM needs to extract out the OCSP response data before returning would-block, put them call do the calls to CacheOCSPResponseFromSideChannel on one of the certificate validation threads, for the reasons I explained in bug 700693 comment 3.

> We should change CERT_VerifyOCSPResponseSignature to behave depending on
> NSS' global flag usePKIXValidationEngine, which can be set by the application
> using CERT_SetUsePKIXForValidation.

I believe that, since CERT_VerifyOCSPResponseSignature calls CERT_VerifyCert, and CERT_VerifyCert already respects the "use libpkix" flag, this already happens.

> Another related proposal:
> I think the application should not deal with the raw OCSP bytes, because
> some of the knowledge required for decoding the response might be private? I
> might be wrong here, but instead, my preference is to do the following:
> 
> Function SSL_GetOCSPStapledData (renamed to SSL_RevealCertificateStatus)
> should return a list of OCSP responses.
> 
> Why? My preference is that our implementation is already prepared with
> future multi-stapling. The code that retrieves data from
> SSL_RevealCertificateStatus shouldn't have to guess whether the data is a
> single OCSP response or a list of responses. By returning a list of OCSP
> responses, this code can be compatible with today and the future requirement
> of a multiple stapling handshakes.

I agree that, if it is simple and clear how to add multi-stapling support, then it is OK to take that into consideration for this API. But, I would prefer to return a list of SECItems that contain the raw OCSP response data. I think this would accomplish the same thing with respect to future multi-stapling support. I would avoid adding any references to the CERTOCSPResponse type to the libssl API for the reason I mentioned previously--we shouldn't add a new dependency on certhigh's OCSP types or data structures, to support applications that use non-NSS certificate validation with libssl, like Chromium.
(In reply to Brian Smith (:bsmith) from comment #42)
> PSM needs to extract out the OCSP response data before returning
> would-block, put them call do the calls to CacheOCSPResponseFromSideChannel

s/put them/but then/
Attached patch patch v11 (obsolete) (deleted) — Splinter Review
Brian, I believe this patch addresses all your requests and suggestions.

I still want to provide a kind of default implementation.
In order to avoid the dependency between OCSP and SSL code, I introduced a new callback function.

An application that wishes to use the default processing can call 
  SSL_ProcessCertStatusHook(fd, CERT_CacheOCSPResponseFromSideChannel)
and it will be called from inside the default SSL_AuthCertificate code.
Attachment #581511 - Attachment is obsolete: true
Attachment #581795 - Flags: superreview?
Attachment #581795 - Flags: review?(bsmith)
Whiteboard: [feedback wanted for comment 36]
Comment on attachment 581795 [details] [diff] [review]
patch v11

Review of attachment 581795 [details] [diff] [review]:
-----------------------------------------------------------------

Please include showfunc = true in your hgrc, so that the patch context includes the function name.

You set the sr? but did not provide an email address. Who do you intend to do the sr? I recommend Wan-Teh, especially if agl doesn't have time to provide feedback.

::: security/nss/lib/certhigh/ocsp.c
@@ +809,5 @@
>      SECStatus rv;
>      OCSPCacheItem *cacheItem;
>      OCSP_TRACE(("OCSP ocsp_CreateOrUpdateCacheEntry\n"));
>    
> +    if (certIDWasConsumed)

Kai, why not just pass a valid pointer for certIDWasConsumed in the new code below? Then we can avoid changing this function, which seems to be used via many code paths, which do not pass NULL here.

@@ +4893,5 @@
>   *   CERTCertDBHandle *handle
>   *     certificate DB of the cert that is being checked
>   *   CERTCertificate *cert
>   *     the certificate being checked
> + *     if cert is null, function will cache all valid responses

A caller should never pass NULL for cert. At least for OCSP stapling, we never need to do so.

The check that that the OCSP response is applicable to a given certificate is a very important one for security. If you don't think this check is needed, please explain why you think it is safe to omit it, and also why it is necessary for this change to be made for enabling EE OCSP stapling.

::: security/nss/lib/ssl/ssl.h
@@ +397,5 @@
> + * should call
> + *      SSL_ProcessCertStatusHook(fd, CERT_CacheOCSPResponseFromSideChannel)
> + * after each socket construction.
> + */
> +SSL_IMPORT SECStatus SSL_ProcessCertStatusHook(PRFileDesc *fd, 

Sorry I wasn't clear before. I think it is OK for SSL_AuthCertificate to call CERT_CacheOCSPResponseFromSideChannel directly, because (a) SSL_AuthCertificate already has a dependency on certhigh, (b) applications that don't want to use NSS for certificate validation will not use SSL_AuthCertificate, and (c) Chromium builds NSS in such a way that SSL_AuthCertificate doesn't get linked in, so it shouldn't matter for them, AFAICT.

Accordingly, we don't need SSL_ProcessCertStatusHook.

::: security/nss/lib/ssl/ssl3con.c
@@ +7827,5 @@
> +    SSL3AlertDescription desc;
> +
> +    if (!ss->ssl3.hs.may_get_cert_status ||
> +       ss->ssl3.hs.ws != wait_server_cert ||
> +       ss->ssl3.hs.cert_status.len) {

AFAICT, this condition should just be ss->ssl3.hs.ws != wait_cert_status. The spec says:

> If a server returns a "CertificateStatus" message, then
> the server MUST have included an extension of type 
> "status_request" with empty "extension_data" in the
> extended server hello.

In ssl3_HandleCertificate, we can set ss->ssl3.hs.ws to wait_cert_status, and check in ssl3_HandleCertificateStatus that ws == wait_cert_status. That way, we can guard against the case where a CertificateStatus message is sent before (or instead of) the Certificate message, and/or the case where multiple CertificateStatus messages are sent.

To handle the case where the server negotiated the extension but doesn't include a CertificateStatus message, ssl3_HandleCertificateAuth should advance from the wait_cert_status to the next state. (In fact, it already advances to the next state.)

Such changes may make pendingHandleCertificateAuth redundant. (I am not sure.)

@@ +7835,5 @@
> +    }
> +
> +    status = ssl3_ConsumeHandshakeNumber(ss, 1, &b, &length);
> +    if (status != 1)
> +       goto format_loser;

This 4-space-then-3-space indention style doesn't match the rest of libssl. Tabs are always 4 spaces and it is also required to use tabs instead of 8 leading spaces in libssl. (Maybe you are using 8 spaces and this is just a weird thing in the Splinter review tool; in any case, these should be tabs.)

@@ +7845,5 @@
> +    SECITEMARRAY_Alloc(NULL, &ss->ssl3.hs.cert_status, 1);
> +    if (!ss->ssl3.hs.cert_status.items)
> +       return SECFailure;
> +
> +    ss->ssl3.hs.cert_status.items[0].data = PORT_Alloc(length);

We need to check that the OCSP response is of a sane size, like we do for the Certificate message. I suggest 16KB is more than enough.

@@ +7859,5 @@
> +    return SECSuccess;
> +
> +format_loser:
> +    errCode = SSL_ERROR_BAD_CERT_STATUS_RESPONSE_ALERT;
> +    desc = bad_certificate_status_response;

I believe that the bad_certificate_status_response is only supposed to be returned if there is something wrong with the OCSP response itself, not the CertificateStatus message. I think decode_error is the correct alert for these types of errors. (And then, you can use your new decode_loser function to report the errror.)

The cases where bad_certificate_status_response should be sent are cases like (a) the OCSP response's cert ID doesn't match the EE certificate, (b) the OCSP response was signed by a CA that didn't issue the certificate, (c) the OCSP response couldn't be parsed, etc. Basically, these are the cases where CERT_CacheOCSPResponseFromSideChannel returns an error (other than memory allocation errors and the like).

Unfortunately, the spec says that we MUST return bad_certificate_status_response when the OCSP response is bad. AFAICT, the only way we can do that is to document that the cert auth hook must return SSL_ERROR_BAD_CERT_STATUS_RESPONSE_ALERT if there is a problem with the OCSP response, and then send the alert if it does so.

But, that means that the async interface I defined in bug 542832 is insufficient, because it doesn't allow an async auth cert hook to return errors to libssl. It looks like I need to change that interface so that it can do so. :(

@@ +7914,5 @@
>  	CERT_DestroyCertificate(ss->sec.peerCert);
>  	ss->sec.peerCert = NULL;
>      }
> +    
> +    ss->ssl3.hs.pendingHandleCertificateAuth = PR_TRUE;

Note that in the "no certificate" case for servers, pendingHandleCertificateAuth should be set to false before we "return SECSuccess" below, right?

@@ +7944,5 @@
> +            return loser(ss, desc, errCode);
> +	}
> +        ss->ssl3.hs.ws = wait_client_key;
> +        return SECSuccess;
> +        /*return check_loser(rv, ss, desc);*/

remove the comment.

@@ +8033,5 @@
> +    return SECSuccess;
> +}
> +
> +static SECStatus
> +ssl3_HandleCertificateAuth(sslSocket *ss, SSL3Opaque *b, PRUint32 length)

I suggest using the name ssl3_AuthCertificateIfNeeded. ssl3_Handle* is the convention for functions that parse peer records/messages.

@@ +8037,5 @@
> +ssl3_HandleCertificateAuth(sslSocket *ss, SSL3Opaque *b, PRUint32 length)
> +{
> +    SECStatus        rv;
> +    PRBool           isServer   = (PRBool)(!!ss->sec.isServer);
> +    int              errCode    = SSL_ERROR_RX_MALFORMED_CERTIFICATE;

Do not initialize errCode to SSL_ERROR_RX_MALFORMED_CERTIFICATE because that error code is not valid for anything this function checks.

@@ +8039,5 @@
> +    SECStatus        rv;
> +    PRBool           isServer   = (PRBool)(!!ss->sec.isServer);
> +    int              errCode    = SSL_ERROR_RX_MALFORMED_CERTIFICATE;
> +    PRBool isTLS = (PRBool)(ss->ssl3.prSpec->version > SSL_LIBRARY_VERSION_3_0);
> +    SSL3AlertDescription desc   = bad_certificate;

We may need to send either bad_certificate or bad_certificate_status_response depending on why the cert authentication failed.

(Again, there is a bug in my definition of SSL_RestartHandshakeAfterAuthCertificate that causes us to never send a bad_certificate alert.)

@@ +8809,5 @@
>  	}
>  	rv = ssl3_HandleServerHello(ss, b, length);
>  	break;
>      case certificate:
>  	rv = ssl3_HandleCertificate(ss, b, length);

Here, we should call ssl3_HandleCertificateAuth if the certificate status extension wasn't negotiated, because we want to start authenticating the certificate ASAP for performance reasons.

@@ +9699,5 @@
>  	ss->ssl3.hs.messages.len = 0;
>  	ss->ssl3.hs.messages.space = 0;
>      }
>  
> +    if (ss->ssl3.hs.cert_status.len) {

This is good. But, we should be careful to document in ssl.h that the certificate status data is only available within the auth certificate hook, and not elsewhere.

::: security/nss/lib/ssl/ssl3ext.c
@@ +1056,5 @@
> +ssl3_ClientHandleCertificateStatusXtn(sslSocket *ss, PRUint16 ex_type,
> +                                      SECItem *data)
> +{
> +    /* If we didn't request this extension, then the server may not echo it. */
> +    if (!ss->opt.enableOCSPStapling)

Is this check necessary? i thought there was already a check in the code that would call this, such that this function will only be called when the extension was correctly negotiated.

@@ +1060,5 @@
> +    if (!ss->opt.enableOCSPStapling)
> +	return SECFailure;
> +
> +    /* The echoed extension must be empty. */
> +    if (data->len != 0)

Need to set error code.

@@ +1063,5 @@
> +    /* The echoed extension must be empty. */
> +    if (data->len != 0)
> +	return SECFailure;
> +
> +    ss->ssl3.hs.may_get_cert_status = PR_TRUE;

AFAICT, we don't need this variable. We can just call ssl3_ExtensionNegotiated in ssl3_HandleCertificateStatus.

::: security/nss/lib/ssl/sslauth.c
@@ +263,5 @@
>      CERTCertDBHandle * handle;
>      sslSocket *        ss;
>      SECCertUsage       certUsage;
>      const char *             hostname    = NULL;
> +    PRTime             tnow = PR_Now();

Nit: "now" is a better name.

::: security/nss/lib/ssl/sslimpl.h
@@ +779,5 @@
>      SECItem               ca_list;     /* used only by client */
>      PRBool                isResuming;  /* are we resuming a session */
>      PRBool                usedStepDownKey;  /* we did a server key exchange. */
>      PRBool                sendingSCSV; /* instead of empty RI */
> +    PRBool                may_get_cert_status; /* the server echoed a

use camelCase.

@@ +784,5 @@
> +                                                  cert_status extension so may
> +                                                  send a CertificateStatus
> +                                                  handshake message. */
> +    PRBool                pendingHandleCertificateAuth;
> +    SECItemArray          cert_status; /* an array of OCSP responses */

use camelCase.

::: security/nss/lib/ssl/sslreveal.c
@@ +65,5 @@
>  
> +/* TODO: Implement new function SSL_RevealCertList
> + *       that returns all certificates sent by the server.
> + *       It will be required to implement tls-ext-multiple-ocsp.
> + */

Remove this comment. There is already an open bug in bugzilla for this feature.

@@ +105,5 @@
>  
> +/* Given PRFileDesc, returns the OCSP information that was sent by the server
> + * during the handshake.
> + * We use a list of OCSP responses, to prepare for tls-ext-multiple-ocsp.
> + * Caller must free data when done.

Move the application-level documentation to ssl.h.

Document that the function may only be called from the auth certificate hook (or bad cert handler?).
Attachment #581795 - Flags: review?(bsmith)
Attachment #581795 - Flags: review-
Attachment #581795 - Flags: feedback?(agl)
I suggest that we do the following:

1. define a new error code SSL_ERROR_BAD_CERTIFICATE_STATUS_RESPONSE.

2. If the auth certificate hook returns SECfailure with PR_GetError() returning SSL_ERROR_BAD_CERTIFICATE_STATUS_RESPONSE, then send the bad_certificate_status_response alert. Otherwise, send the bad_certificate alert for other error codes as is currently done.

3. Update the patch in bug 542832 so that the SSL_RestartHandshakeAfterAuthCertificate function is given the following signature:

  SECStatus SSL_AuthCertificateComplete(PRFileDesc* fd, PRErrorCode status);

Then, interpret the value of status like we would interpret the value of PR_GetError() in #2 above (sending an alert mapped from the error code, if/as appropriate).

I am updating the patch in bug 542832 to do this now.
I modified the patches in bug 542832 so that the application can pass the error back to libssl, so that libssl can send the appropriate alert. In the process of doing so, I had to factor out the code that maps errors to alerts in ssl3_HandleCertificate (see bug 542832 attachment 584595 [details] [diff] [review]). I believe this will make your patch much simpler as, AFAICT, it will no longer have to refactor the error handling in ssl3_HandleCertificate.
Blocks: 714477
Blocks: 725985
Priority: -- → P1
Target Milestone: 3.13.3 → 3.13.4
Version: 3.11.3 → trunk
Target Milestone: 3.13.4 → 3.13.5
Attached patch OCSP stapling patch used in Chromium (obsolete) (deleted) — Splinter Review
This is the current version of Adam Langley's patch (attachment 418746 [details] [diff] [review]).
Comment on attachment 616669 [details] [diff] [review]
OCSP stapling patch used in Chromium

Two comments on this patch:

1. The use of "unsigned int *len" by SSL_GetStapledOCSPResponse as an in/out argument
does not match the NSS convention.  We should add a separate "unsigned int maxLen"
input argument, and use "unsigned int *len" as an output argument.

2. Adam's original patch used the -O option to enable OCSP stapling in tstclnt,
presumably because "OCSP" starts with "O".

The -O option has since been taken for synchronous certificate validation.  So I
changed the patch to use -P to enable OCSP stapling.
Comment on attachment 616669 [details] [diff] [review]
OCSP stapling patch used in Chromium

The alternative chromium patch described in comment 48 and 49 appears to be incomplete.

I don't see any interaction with the OCSP cache

(As a reminder, the interaction with the OCSP cache, using a strategy that avoids a direct call from libSSL into the OCSP cache, that was the reason why I have introduced callbacks and why my patch is longer.)
Attachment #616669 - Flags: feedback-
Attached patch Patch v12 (obsolete) (deleted) — Splinter Review
Patch v11 == bitrotted

Merged to trunk => Patch v12

Now with function names and larger context (diff -u20 -p)
Attachment #581795 - Attachment is obsolete: true
Attachment #581795 - Flags: superreview?
Attachment #581795 - Flags: feedback?(agl)
(In reply to Kai Engert (:kaie) from comment #50)
> I don't see any interaction with the OCSP cache

I think the code you're looking for is...
http://src.chromium.org/viewvc/chrome/trunk/src/net/socket/ssl_client_socket_nss.cc?view=markup
(Search for the first instance of "CertSetCertificateContextProperty").
(In reply to Rob Stradling from comment #52)
> 
> I think the code you're looking for is...
> http://src.chromium.org/viewvc/chrome/trunk/src/net/socket/
> ssl_client_socket_nss.cc?view=markup
> (Search for the first instance of "CertSetCertificateContextProperty").

Thanks, but in my understanding that's application code, while this bug rather needs general purpose library level patches.
> 
> 2. Adam's original patch used the -O option to enable OCSP stapling in
> tstclnt,
> presumably because "OCSP" starts with "O".
> 
> The -O option has since been taken for synchronous certificate validation. 
> So I
> changed the patch to use -P to enable OCSP stapling.


Let's handle enhancements of the tools in bug 700701.
Comment on attachment 618252 [details] [diff] [review]
Patch v12

Patch fails in test suite, needs investigation
Attachment #618252 - Flags: feedback-
(In reply to Kai Engert (:kaie) from comment #53)
> Thanks, but in my understanding that's application code, while this bug
> rather needs general purpose library level patches.

You are both right. For Firefox (and Chrome of course) we should do things similar to the way Chrome does it now, but also the default SSL certificate validation function SSL_AuthCertificate should be modified to provide an example (and default implementation) for applications, using the same functions that Firefox and Chrome would use to extract out the stapled OCSP response and add it to the OCSP cache, as they do not use SSL_AuthCertificate. This was discussed already in previous review comments.
> For Firefox (and Chrome of course) we should do things
> similar to the way Chrome does it now, 

I don't understand why you are proposing this.

Obviously the Chrome patch isn't a complete NSS library patch.

My patch is, and my patch already contains a solution for injecting the data into the OCSP cache.
> using the same
> functions that Firefox and Chrome would use to extract out the stapled OCSP
> response and add it to the OCSP cache

I'm opposing any solution that requires application level code for this to function.

I don't want a solution that requires the application to deal with this.

I want a general purpose solution at the NSS library level, that will work by default for any application using NSS SSL.

Can I get any support for this point of view?
Attached patch Patch v14 (obsolete) (deleted) — Splinter Review
Patch v12 required some small additions.

This patch v14 successfully passes the NSS test suite.
Attachment #618252 - Attachment is obsolete: true
I'll now try to work through the change proposals from comment 45 and later.
(In reply to Kai Engert (:kaie) from comment #58)
> I don't want a solution that requires the application to deal with this.

> I want a general purpose solution at the NSS library level, that will work
> by default for any application using NSS SSL.

I agree that it is good to have a default implementation for most applications to use, and especially for use in testing at the NSS level.

But, AFAICT, it is easy to make your changes to ocsp/*, sslauth.c on top of Google's patch to get a default implementation, right? That would be much less code to review because AGL and WTC already wrote, reviewed, tested, and deployed to production the vast majority of the code.

In my previous comments, I mentioned that it is important that there not be any calls to ocsp_*/OCSP_* in the core of libssl. I see that you addeded the processCertStatus in response to that. I was unclear in what I meant: it is OK to call ocsp_*/OCSP_* in SSL_AuthCertificate, but not elsewhere in libssl, because applications for which those calls would be problematic (Chrome, Firefox) also need to avoid calling SSL_AuthCertificate for similar reasons. So, the complexity added by the processCertStatus callback part is unnecessary.

Also, in your patch, I didn't see a default implementation of the processCertStatus hook. Did I overlook it?
(In reply to Brian Smith (:bsmith) from comment #61)
> 
> But, AFAICT, it is easy to make your changes to ocsp/*, sslauth.c on top of
> Google's patch to get a default implementation, right? 


Why should *I* write a new patch on top of someone's else patch, if I already have my own patch, which is more complete?
Whiteboard: [rename cert_status=>certStatus]
Whiteboard: [rename cert_status=>certStatus]
(In reply to Brian Smith (:bsmith) from comment #45)
> 
> > +    if (certIDWasConsumed)
> 
> why not just pass a valid pointer for certIDWasConsumed in the new code
> below? Then we can avoid changing this function, which seems to be used via
> many code paths, which do not pass NULL here.


Sorry, I don't understand your proposal,
or the way I understand it doesn't make sense to me.

I'm not changing the way how this function is being used,
it should be fine to keep this function in the way I change it.

If you still think it's absolutely necessary that this change must be made,
the please explain in more detail.
Or maybe you should implement this change yourself, 
in order to push things forward.


> > + *     if cert is null, function will cache all valid responses
> 
> A caller should never pass NULL for cert. At least for OCSP stapling, we
> never need to do so.


Ok. I had misinterpreted the multi-stapling RFC (or it might have changed
after I read it the last time).

The latest draft 3 says that the multi stapling response may contain
OCSP responses for certificates found in the handshake message,
but not for any unrelated certificates.

I'm changing the code to require that a cert is provided.


> ::: security/nss/lib/ssl/ssl3con.c
> @@ +7827,5 @@
> > +    SSL3AlertDescription desc;
> > +
> > +    if (!ss->ssl3.hs.may_get_cert_status ||
> > +       ss->ssl3.hs.ws != wait_server_cert ||
> > +       ss->ssl3.hs.cert_status.len) {
> 
> AFAICT, this condition should just be ss->ssl3.hs.ws != wait_cert_status.


I had taken this code unchanged from the original patch.

Following your argument that the Chrome code has
already been tested, I propose we follow the latest implementation
of function ssl3_HandleCertificateStatus from the chrome patch

(... ignoring a small delta related to allocation,
 caused by the nature of the different "pending data"
 strategy used in the chrome patch).

I think this suggests that we should keep the current expression.

Given that this code works, I don't see a reason to follow
your suggestion.


> This 4-space-then-3-space indention style doesn't match the rest of libssl.


Please let's postpone any complaints about whitespace, 
and let's rather focus on wording towards a functionally acceptable patch.


> > +    SECITEMARRAY_Alloc(NULL, &ss->ssl3.hs.cert_status, 1);
> > +    if (!ss->ssl3.hs.cert_status.items)
> > +       return SECFailure;
> > +
> > +    ss->ssl3.hs.cert_status.items[0].data = PORT_Alloc(length);
> 
> We need to check that the OCSP response is of a sane size, like we do for
> the Certificate message. I suggest 16KB is more than enough.


I agree with the idea of a maximum, but let's be prepared for multi-stapling.

16 KB seems to small to me, if multiple certificates can be involved.
Even a much larger limit will protect us from
malicious servers that try to crash the client.

We should use an official limit, if there is one.
Let's start with a larger value.
You can file a follow-up bug if you want to lower this limit,
or research the correct limit.


> > +format_loser:
> > +    errCode = SSL_ERROR_BAD_CERT_STATUS_RESPONSE_ALERT;
> > +    desc = bad_certificate_status_response;
> 
> I believe that the bad_certificate_status_response is only supposed to be
> returned if there is something wrong with the OCSP response itself, not the
> CertificateStatus message. 


Given that you're willing to accept Chrome's patch,
and given that that patch uses the same approach,
I propose we postpone your worry to some followup bug.


> > +    ss->ssl3.hs.pendingHandleCertificateAuth = PR_TRUE;
> 
> Note that in the "no certificate" case for servers,
> pendingHandleCertificateAuth should be set to false before we "return
> SECSuccess" below, right?

Yes. I also discovered this today because of a test suite failure.
Done


> > +        /*return check_loser(rv, ss, desc);*/
> 
> remove the comment.

ok


> > +ssl3_HandleCertificateAuth(sslSocket *ss, SSL3Opaque *b, PRUint32 length)
> 
> I suggest using the name ssl3_AuthCertificateIfNeeded. ssl3_Handle* is the
> convention for functions that parse peer records/messages.

ok




> > +ssl3_HandleCertificateAuth(sslSocket *ss, SSL3Opaque *b, PRUint32 length)
> > +{
> > +    SECStatus        rv;
> > +    PRBool           isServer   = (PRBool)(!!ss->sec.isServer);
> > +    int              errCode    = SSL_ERROR_RX_MALFORMED_CERTIFICATE;
> 
> Do not initialize errCode to SSL_ERROR_RX_MALFORMED_CERTIFICATE because that
> error code is not valid for anything this function checks.

It's not "my" error code init.
This is the same default that was used in this code path.
Let's keep this default for now, rather than introducing new scenarios.


> > +    SECStatus        rv;
> > +    PRBool           isServer   = (PRBool)(!!ss->sec.isServer);
> > +    int              errCode    = SSL_ERROR_RX_MALFORMED_CERTIFICATE;
> > +    PRBool isTLS = (PRBool)(ss->ssl3.prSpec->version > SSL_LIBRARY_VERSION_3_0);
> > +    SSL3AlertDescription desc   = bad_certificate;
> 
> We may need to send either bad_certificate or
> bad_certificate_status_response depending on why the cert authentication
> failed.

This code path handles the certificate message, not the status message.


> >  	}
> >  	rv = ssl3_HandleServerHello(ss, b, length);
> >  	break;
> >      case certificate:
> >  	rv = ssl3_HandleCertificate(ss, b, length);
> 
> Here, we should call ssl3_HandleCertificateAuth if the certificate status
> extension wasn't negotiated, because we want to start authenticating the
> certificate ASAP for performance reasons.


This isn't a change, it's context.
This bug is about getting the functionality implemented.
Let's do performance tweaks in a follow up bug.


> > +    if (ss->ssl3.hs.cert_status.len) {
> 
> This is good. But, we should be careful to document in ssl.h that the
> certificate status data is only available within the auth certificate hook,
> and not elsewhere.

Should be unnecessary.
This cleanup code is called from ssl3_DestroySSL3Info,
which is called from ssl_DestroySocketContents,
and all the socket information is destroyed together.


> > +ssl3_ClientHandleCertificateStatusXtn(sslSocket *ss, PRUint16 ex_type,
> > +                                      SECItem *data)
> > +{
> > +    /* If we didn't request this extension, then the server may not echo it. */
> > +    if (!ss->opt.enableOCSPStapling)
> 
> Is this check necessary? i thought there was already a check in the code
> that would call this, such that this function will only be called when the
> extension was correctly negotiated.

I don't know.
Given this check is in both the original and the latest chrome patch,
let's keep it.


> > +    if (!ss->opt.enableOCSPStapling)
> > +	return SECFailure;
> > +
> > +    /* The echoed extension must be empty. */
> > +    if (data->len != 0)
> 
> Need to set error code.

Chrome patch doesn't set error code.


> > +    ss->ssl3.hs.may_get_cert_status = PR_TRUE;
> 
> AFAICT, we don't need this variable. We can just call
> ssl3_ExtensionNegotiated in ssl3_HandleCertificateStatus.

Chrome patch still uses it.
Let's keep it, as your proposal would require rewriting the logic,
and I'm glad that it's working.


> > +    PRBool                may_get_cert_status; /* the server echoed a
> 
> use camelCase.

ok


> > +                                                  handshake message. */
> > +    PRBool                pendingHandleCertificateAuth;
> > +    SECItemArray          cert_status; /* an array of OCSP responses */
> 
> use camelCase.

If we make this change, it will make it quite difficult 
to compare earlier patches with the newer patches.

This is an unnecessary change request,
there are plenty of other member variables inside the same structure
that use the underscore style.


> > +/* TODO: Implement new function SSL_RevealCertList
> > + *       that returns all certificates sent by the server.
> > + *       It will be required to implement tls-ext-multiple-ocsp.
> > + */
> 
> Remove this comment. There is already an open bug in bugzilla for this
> feature.


I don't understand the benefit of removing a comment.
I think it's good to keep it, because I added function name SSL_RevealCertList
at two places in the code. I believe that's a very helpful recording
of knowledge for the later work of getting it implemented.

I shortened the comment and added the bug number 611836.


> > +/* Given PRFileDesc, returns the OCSP information that was sent by the server
> > + * during the handshake.
> > + * We use a list of OCSP responses, to prepare for tls-ext-multiple-ocsp.
> > + * Caller must free data when done.
> 
> Move the application-level documentation to ssl.h.

All functions in sslreveal.c have their documentation next to the implementation.


> Document that the function may only be called from the auth certificate hook
> (or bad cert handler?).

We don't appear to have this limitation.

The function uses locking, it duplicates the data.
It seems it's fine to call this function after the handshake,
until the socket gets destroyed.


(In reply to Brian Smith (:bsmith) from comment #61)
> 
> In my previous comments, I mentioned that it is important that there not be
> any calls to ocsp_*/OCSP_* in the core of libssl. I see that you addeded the
> processCertStatus in response to that. I was unclear in what I meant: it is
> OK to call ocsp_*/OCSP_* in SSL_AuthCertificate, but not elsewhere in
> libssl, because applications for which those calls would be problematic
> (Chrome, Firefox) also need to avoid calling SSL_AuthCertificate for similar
> reasons. So, the complexity added by the processCertStatus callback part is
> unnecessary.


So you've changed your opinion that calling OCSP code from within
SSL_AuthCertificate shall be fine, but what about the
#include dependency?

It was my understanding that such a dependency is considered
undesirable by the chrome guys.

(Sigh. I had already removed this code, until I discovered 
the #include dependency. I added it back.)

The code I've written avoids that dependency
Can we handle comment 46 and comment 47 in a follow up bug?
Maybe Brian can do this work, after the initial step is done?
I don't know if this work is still necessary.
Attached patch Patch v15 (obsolete) (deleted) — Splinter Review
Attachment #618380 - Attachment is obsolete: true
Wan-Teh, thank you very much for attaching the latest patch used by chrome.

I compared my latest patch (something after v14) and the chrome patch.

I copied/merged the various enhancements that you made, they are contained in patch v15.
Attached patch a special subset and reordered variation of v15 (obsolete) (deleted) — Splinter Review
I would like to make it easy for everyone to compare chrome's latest patch and my latest patch.

Therefore I manually created a version of v15, where I rearranged the pieces, so that they are in the same order as in attachment 616669 [details] [diff] [review].

You could download both attachment 616669 [details] [diff] [review] and this attachment and compare them, for example using "tkdiff".
(In reply to Brian Smith (:bsmith) from comment #61)
> 
> Also, in your patch, I didn't see a default implementation of the
> processCertStatus hook. Did I overlook it?


True, all I have so far is this:

+ * An application that uses the default SSL_AuthCertificateHook
+ * and wishes to enable NSS' default processing of received OCSP stapling data
+ * should call
+ *      SSL_ProcessCertStatusHook(fd, CERT_CacheOCSPResponseFromSideChannel)
+ * after each socket construction.
(In reply to Kai Engert (:kaie) from comment #68)
> 
> True, all I have so far is this:
> 
> + * An application that uses the default SSL_AuthCertificateHook
> + * and wishes to enable NSS' default processing of received OCSP stapling
> data
> + * should call
> + *      SSL_ProcessCertStatusHook(fd, CERT_CacheOCSPResponseFromSideChannel)
> + * after each socket construction.


Given that a requirement was
  "no include dependency from libSSL to the rest of NSS"
it's difficult to provide a full default implementation,
since such an implementation would have to live inside libSSL and call
CERT_CacheOCSPResponseFromSideChannel.
Target Milestone: 3.13.5 → 3.14
Attachment #618505 - Flags: review?(wtc)
Comment on attachment 618505 [details] [diff] [review]
Patch v15

We decided that introducing a direct call from libSSL into the OCSP cache is acceptable, and preferable, because it avoids complexity.

I'll attach such a patch soon.
Attachment #618505 - Flags: review?(wtc)
Attached patch Patch v16 (obsolete) (deleted) — Splinter Review
Attachment #618505 - Attachment is obsolete: true
Attachment #618749 - Flags: review?
Comment on attachment 618749 [details] [diff] [review]
Patch v16

In case you wonder why I added this fragment:

>     case client_key_exchange:
> 	if (!ss->sec.isServer) {
> 	    (void)SSL3_SendAlert(ss, alert_fatal, unexpected_message);
> 	    PORT_SetError(SSL_ERROR_RX_UNEXPECTED_CLIENT_KEY_EXCH);
> 	    return SECFailure;
> 	}
>+        rv = ssl3_AuthCertificateIfNeeded(ss, b, length);
>+        if (rv != SECSuccess)
>+            break;
> 	rv = ssl3_HandleClientKeyExchange(ss, b, length);
> 	break;


Without it, the test suite fails if the server requests client authentication.
Depends on: 554369
I ran into a crash with the patch from bug 700701,
when the handshake-callback called SSL_RevealCertStatus.

We hit an assertion related to firstHandshakeLock
caused by the locking in SSL_RevealCertStatus
(I had copied the locking from the chrome patch).

I haven't studied how the locks work in libSSL code,
so I'd welcome advice.

This function should be safe to call any time after the initial handshake has completed,
and inside the lock we just need to copy a buffer.

If the buffer can never change after the first handshake,
we could remove the lock, check that (ss->firstHsDone) is true,
and simply copy the buffer.

Both SSL_RevealCert and SSL_RevealURL use this strategy (no lock).

Function SSL_HandshakeNegotiatedExtension reads (ss->firstHsDone) without locking.
#0  0xb7fff424 in __kernel_vsyscall ()
#1  0x4326798f in raise () from /lib/libc.so.6
#2  0x432692d5 in abort () from /lib/libc.so.6
#3  0xb7d930ec in PR_Assert (s=0xb7ff0e58 "PZ_InMonitor((ss)->firstHandshakeLock) || !ssl_HaveRecvBufLock(ss)", file=0xb7ff0e4c "sslreveal.c", ln=84) at ../../../../pr/src/io/prlog.c:554
#4  0xb7fd500e in SSL_RevealCertStatus (fd=0x807b2d8) at sslreveal.c:84
#5  0x0804bb0b in printSecurityInfo (fd=0x807b2d8) at tstclnt.c:156
#6  0x0804bb91 in handshakeCallback (fd=0x807b2d8, client_data=0x0) at tstclnt.c:171
#7  0xb7fc3583 in ssl3_FinishHandshake (ss=0x807bbf8) at ssl3con.c:9061
#8  0xb7fc3420 in ssl3_HandleFinished (ss=0x807bbf8, 
    b=0x808367c "x\267\224]\256ph\250\305c\234$\246@\360\257\246\362\227\035\tE\360kc\220\203\325x\355\211ԇ< WeE\367\021\363\347\365\070L\256ˆ\325\375f\244+\"MJ\343\356K\016\251_VA\b\270", <incomplete sequence \321>, length=12, 
    hashes=0xbfffd92c) at ssl3con.c:9037
#9  0xb7fc3e3a in ssl3_HandleHandshakeMessage (ss=0x807bbf8, 
    b=0x808367c "x\267\224]\256ph\250\305c\234$\246@\360\257\246\362\227\035\tE\360kc\220\203\325x\355\211ԇ< WeE\367\021\363\347\365\070L\256ˆ\325\375f\244+\"MJ\343\356K\016\251_VA\b\270", <incomplete sequence \321>, length=12)
    at ssl3con.c:9269
#10 0xb7fc4098 in ssl3_HandleHandshake (ss=0x807bbf8, origBuf=0x807be60) at ssl3con.c:9343
#11 0xb7fc4f8d in ssl3_HandleRecord (ss=0x807bbf8, cText=0xbfffdae0, databuf=0x807be60) at ssl3con.c:9773
#12 0xb7fc68bd in ssl3_GatherCompleteHandshake (ss=0x807bbf8, flags=0) at ssl3gthr.c:361
#13 0xb7fc6a54 in ssl3_GatherAppDataRecord (ss=0x807bbf8, flags=0) at ssl3gthr.c:404
#14 0xb7fd6676 in DoRecv (ss=0x807bbf8, out=0xbfffdc5c "GET / HTTP/1.0\r\n\r\n", len=4000, flags=0) at sslsecur.c:535
#15 0xb7fd7a07 in ssl_SecureRecv (ss=0x807bbf8, buf=0xbfffdc5c "GET / HTTP/1.0\r\n\r\n", len=4000, flags=0) at sslsecur.c:1143
#16 0xb7fe0a84 in ssl_Recv (fd=0x807b2d8, buf=0xbfffdc5c, len=4000, flags=0, timeout=4294967295) at sslsock.c:2003
#17 0xb7d8f31f in PR_Recv (fd=0x807b2d8, buf=0xbfffdc5c, amount=4000, flags=0, timeout=4294967295) at ../../../../pr/src/io/priometh.c:188
#18 0x0804eb83 in main (argc=14, argv=0xbfffeee4) at tstclnt.c:1442
(In reply to Kai Engert (:kaie) from comment #73)
> I haven't studied how the locks work in libSSL code,
> so I'd welcome advice.

Some functions in libssl are only (thread-)safe to call when they are called from callback functions. See bug 707394 where I started to document this.

In the case of SSL_RevealCertStatus, we should document that it must only be called within the cert auth hook, and remove the locking.
> In the case of SSL_RevealCertStatus, we should document that it must only be
> called within the cert auth hook

Maybe we can allow the handshake-callback, too?
Attached patch Patch v20 [multiple bugs] [includes testing] (obsolete) (deleted) — Splinter Review
Patch v20 combines (this) bug 360420
and bug 554369 and bug 700701 and bug 749714.

It adds basic OCSP sapling testing against known servers,
the test suite passes.

This patch is for backup and demonstration purposes, if you want to try it yourself.
We should probably continue to review the patches individuall, 
and I'll attach (later) an updated subset of the patch.
(In reply to Kai Engert (:kaie) from comment #76)
> > In the case of SSL_RevealCertStatus, we should document that it must only be
> > called within the cert auth hook
> 
> Maybe we can allow the handshake-callback, too?

Let's consider bug 731478. If we need to cache the intermediate CA certificates in the session cache, shouldn't we also cache the OCSP data in the session cache too? It seems we should do so, if the purpose of bug 731478 is to allow an application to have access to all the data needed to re-validate the certificate in a resumption handshake. In that case, it does make sense to do the memory allocation and save the OCSP data in ss->sec.ci.sid->cert_status.

I guess that because Wan-Teh filed bug 731478, Chromium must need to have the information needed to do cert re-verification in its handshake callback, since that is the only callback that gets called during resumptions.

So, it seems like, for Chromium's sake, at least, we should document support for accessing the OCSP data in the handshake callback in addition to the auth certificate hook.
When reviewing the patches, I noticed that *both* patches are more complicated than necessary. I believe that Adam's patch is too optimized for minimizing the diff to mainline CVS at the cost of readability/understandability. Kai's patch is too complicated regarding the refactoring of ssl3_HandleCertificate.

Also, I am concerned about the additional complexity in Kai's patch to support multi-stapling. Without actually implementing the entire multi-stapling spec, it is hard to tell whether that extra complexity is worthwhile. I talked with EKR and AFAICT, we both agree that multi-stapling is a good idea but that it isn't a high priority because most browsers have alternate mechanisms for dealing with intermediate certificate revocations for the vast majority of websites. Also, I suspect that there will still be a lot of changes to the multi-stapling spec itself as it is still going through standardization. IMO, it is better to experiment with implementing it in its own bug, bug 611836. I think that most of Kai's patch will be useful for fixing that bug.

That means it makes sense to use a patch more like Adam's. However, I think that Kai's restructuring of ssl3_HandleCertificate is much better than Adam's, except for the replacement of gotos with return XXX_error() calls.

Since Kai asked me to make any significant changes to his patch myself, I have done a three-way merge of the two patches and done further simplifications; I believe the merged, simplified patch will be more smaller, more obviously correct, and easier to maintain than either of the current ones. I will finish it up tomorrow, after I verify that it is simple to modify it to handle the changes that will be necessary as part of bug 731478.

FWIW, I spoke with EKR about this and he is in favor of taking Google's patch as-is, or with minor changes. I am also in favor of doing that, if we do not agree that my simplified version of Kai's changes to ssl3_AuthCertificate are not worthwhile.

Wan-Teh objected to the signature of SSL_GetStapledOCSPResponse in Adam's patch. In that patch, the caller of SSL_GetStapledOCSPResponse is responsible for providing a buffer for libssl to copy the OCSP response data into, and the function uses a relatively uncommon (for libssl) pattern for doing that copying. AFAICT, it is not really necessary to do any copying at all, if we allow SSL_GetStapledOCSPResponse to return a pointer to the internal SECItem that holds the response data, and document that the application must not access that SECItem outside of the callback function that it is calling SSL_GetStapledOCSPResponse in. This would be more efficient. The signature would be:

   const SECItem * SSL_GetStapledOCSPResponse(PRFileDesc * fd);

Does this seem reasonable to people?

Kai's patch fixes some bugs in CERT_CacheOCSPResponseFromSideChannel and provides a default OCSP stapling implementation in SSL_AuthCertificate. I suggest that we handle the fixes to CERT_CacheOCSPResponseFromSideChannel in separate bugs. I also suggest that we factor out his implementation of SSL_AuthCertificate into a separate bug, because I think there are some unresolved questions about the how OCSP stapling is really supposed to work, that require more clarification (documentation and testing) of the implemented semantics.

One high-level issue that came up: the CertificateStatus extension is not specific to OCSP. Do we want to hard-code OCSP into the API, or do we want the API to be more flexible and allow other status types? In the interest of expediency, and keeping with the YAGNI principle, it is better to just keep the hard-coding of a single OCSP response in the API, like both patches do. But, we should note that is a conscious decision that may result in a new more flexible CertificateStatus API later, that would make this OCSP-specific API somewhat redundant.
Assignee: kaie → nobody
" Also, I suspect that there will still be a lot of changes to the multi-stapling spec itself as it is still going through standardization. IMO, it is better to experiment with implementing it in its own bug, bug 611836. I think that most of Kai's patch will be useful for fixing that bug."

To elaborate on this topic, we *just* accepted it as a WG item, so I expect it will undergo the usual torturing by the WG and we won't have it in final form (which won't be the same as now) for at least a year. I think having regular stapling now would be a good thing.
Whiteboard: [review patch v16]
Attached patch patch: incremental build fixes (obsolete) (deleted) — Splinter Review
these obvious syntax fixes are necessary to build on windows
Attached patch Part 1 - Cleanup of agl's patch (obsolete) (deleted) — Splinter Review
NOTE: I think there is a bug in AGL's original patch, and in this modified patch too: In the case of a renegotiation, where server stapled an OCSP response in the previous handshake, but did not staple an OCSP response in the current handshake, I believe that the OCSP response from the previous handshake still gets reported from SSL_GetStapledOCSPResponse/SSL_PeerStapledOCSPResponse. This would be confusing especially in the case where the server sent a different certificate chain in the current handshake than it sent in the previous handshake.

In the next comment, I will use the Splinter interface to explain the changes I made to AGL's patch. I recommend using the splinter tool to read the comments.

In the next patch, I will merge a simplified version of Kai's changes to the state machine, which avoids the problem of the OCSP response from the previous handshake on a connection being reported for subsequent handshakes that do not staple a response.
Assignee: nobody → bsmith
Attachment #622676 - Flags: review?(wtc)
It is easier to review these changes using this interdiff, which is slightly different than the version that I just posted.
Attachment #622676 - Attachment is obsolete: true
Attachment #622680 - Flags: review?(wtc)
Attachment #622676 - Flags: review?(wtc)
Comment on attachment 622680 [details] [diff] [review]
Part 1 - Cleanup of AGL's patch -- interdiff from AGL's patch in attachment 616669 [details] [diff] [review]

Review of attachment 622680 [details] [diff] [review]:
-----------------------------------------------------------------

Like I mentioned above, it will probably be easier to review this interdiff with the comments I added via Splinter.

Also, like I mentioned above, I think there is a bug regarding renegotiations, but it is different than what I said above:

ssl3_HandleCertificateStatus contains this check:

    if (!ss->ssl3.hs.may_get_cert_status ||
	ss->ssl3.hs.ws != wait_server_cert ||
	!ss->ssl3.hs.pending_cert_msg.data ||
	ss->ssl3.hs.cert_status.data) {
	errCode = SSL_ERROR_RX_UNEXPECTED_CERT_STATUS;
	desc = unexpected_message;
	goto alert_loser;
    }

ss->ssl3.hs.cert_status.data will be non-NULL in a renegotiation that has a CertificateStatus that occurs after a previous handshake contained a CertificateStatus, because, as far as I can tell, ss->ssl3.hs.cert_status.data is not cleared between handshakes. If we were to remove this check, then we would run into the problem where a CertificateStatus from the previous handshake could get carried over to the next handshake, and we would have a memory leak. So, AFAICT, the simplest correct solution is to free cert_status between handshakes on a connection. (In the next patch, I do something different than that.)

::: ../ocsp-agl/mozilla/security/nss/lib/ssl/ssl.def
@@ +145,5 @@
>  ;+    global:
>  DTLS_GetTimeout;
>  DTLS_ImportFD;
>  SSL_ExportKeyingMaterial;
> +SSL_PeerStapledOCSPResponse;

I renamed the function to make it clearer that the OCSP response returned is the peer's OCSP response. If we implement OCSP stapling on the server side, then there may be a SSL_LocalStapledOCSPResponse. This naming follows the convention set by SSL_PeerCertificate/SSL_LocalCertificate.

::: ../ocsp-agl/mozilla/security/nss/lib/ssl/ssl.h
@@ +416,2 @@
>   */
> +SSL_IMPORT const SECItem * SSL_PeerStapledOCSPResponse(PRFileDesc *fd);

Extended documentation and simplified calling convention. Most callers are probably going to pass the response to CERT_CacheOCSPResponseFromSideChannel, so there's no need to do any allocations or copying.

Returning const SECItem* helps ensure that the caller doesn't try to free the SECItem.

::: ../ocsp-agl/mozilla/security/nss/lib/ssl/ssl3con.c
@@ +8194,5 @@
>  	goto format_loser;
>      }
>  
>      if (SECITEM_AllocItem(NULL, &ss->ssl3.hs.cert_status, length) == NULL) {
> +	return SECFailure;

replaced spaces with tabs.

@@ +8202,5 @@
>  
>      return SECSuccess;
>  
>  format_loser:
> +    return ssl3_DecodeError(ss);

The bad_certificate_status_response alert is supposed to be sent when there is a problem with parsing/verifying the OCSP response, not when there is a problem parsing the CertificateStatus message.

@@ +9181,5 @@
>  		(void)ssl_MapLowLevelError(SSL_ERROR_RX_UNEXPECTED_CERTIFICATE);
>  		return SECFailure;
>  	    }
>  	    if (SECITEM_AllocItem(NULL, &ss->ssl3.hs.pending_cert_msg,
> +				  length) == NULL) {

replaced spaces with tabs.

::: ../ocsp-agl/mozilla/security/nss/lib/ssl/ssl3ext.c
@@ -631,5 @@
>  {
> -    /* If we didn't request this extension, then the server may not echo it. */
> -    if (!ss->opt.enableOCSPStapling)
> -	return SECFailure;
> -

ssl3_HandleHelloExtensions already does this check via ssl3_ClientExtensionAdvertised:

http://mxr.mozilla.org/mozilla/source/security/nss/lib/ssl/ssl3ext.c#1458

::: ../ocsp-agl/mozilla/security/nss/lib/ssl/sslimpl.h
@@ +799,1 @@
>      SECItem               cert_status; /* an OCSP response */

Whitespace changes.

::: ../ocsp-agl/mozilla/security/nss/lib/ssl/sslsock.c
@@ -1796,4 @@
>      }
>  
> -    ssl_Get1stHandshakeLock(ss);
> -    ssl_GetSSL3HandshakeLock(ss);

We cannot acquire these locks in this function because this function will usually be called in the authenticate certificate callback, and we do that with these locks already held.

@@ -1805,5 @@
> -	*len = ss->ssl3.hs.cert_status.len;
> -	PORT_Memcpy(out_data, ss->ssl3.hs.cert_status.data, todo);
> -    } else {
> -	*len = 0;
> -    }

Simplified calling convention.
I will explain the changes using Splinter in the next comment.
Attachment #622684 - Flags: review?(wtc)
Comment on attachment 622684 [details] [diff] [review]
Part 2 - Simplify state machine changes using logic from Kai's patch (simplified) and move OCSP response to session - Interdiff from attachment 622680 [details] [diff] [review]

Review of attachment 622684 [details] [diff] [review]:
-----------------------------------------------------------------

::: ../ocsp-agl/mozilla/security/nss/lib/ssl/ssl3con.c
@@ +8174,2 @@
>  
> +    PORT_Assert(ss->ssl3.hs.ws == wait_certificate_status);

ssl3_HandleHandshakeMessage now ensures that ssl3_HandleCertificateStatus is only called when appropriate.

@@ +8189,4 @@
>  	return SECFailure;
>      }
> +    ss->sec.ci.sid->peerCertStatus.type = siBuffer;
> +    PORT_Memcpy(ss->sec.ci.sid->peerCertStatus.data, b, length);

Moved the OCSP response to the session object, so that it can be retrieved for resumed handshakes too, and because the session object automatically gets reset during a full handshake renegotation, so we don't have to worry about the state carrying over from one handshake on a connection to the next one.

@@ +8355,5 @@
> +    }
> +    
> +    return rv;
> +
> +ambiguous_err:

The actual change (hard to see from this diff) is that we've hoisted the code for ssl3_AuthCertificate out of this function. All the error handling logic here is unchanged from the original logic, except for one exception noted below.

I believe that ssl3_AuthCertificate as a standalone function separate from the parsing of the Certificate message will be useful in later patches. And, doing things this way lets us avoid needing the pending_cert_msg buffer.

@@ +8376,5 @@
> +
> +alert_loser:
> +    (void)SSL3_SendAlert(ss, alert_fatal, desc);
> +
> +loser:

See note below about what was removed from this label.

@@ -8493,5 @@
> -
> -    if (ss->sec.peerCert != NULL) {
> -	CERT_DestroyCertificate(ss->sec.peerCert);
> -	ss->sec.peerCert = NULL;
> -    }

There is no need to clean up these things here because we will do so when the socket is closed.
Comment on attachment 619227 [details] [diff] [review]
Patch v20 [multiple bugs] [includes testing]

In order to allow for early testing of this proposed patch, I've made binaries available at http://flowerbeetle.org
Target Milestone: 3.14 → 3.14.1
Blocks: 803582
I have voted in favor of this bug. I understand that proposed patch needs to be reviewed and approved, but would it help any if I were to get more people to vote?  I can email members of the CA/B Forum, etc.  If that would not be helpful, what other things could I do to move this forward? Thanks, Ben
Voting is irrelevant if the developers block each other, because they cannot reach agreements.
I have emailed Wah-Teh to ask him to comment on the best way to make progress here.

Gerv
We have an unclear mix of attachments.

I understand Brian's proposal involves the following set of patches attached on top of each other:
- Adam's attachment 616669 [details] [diff] [review]
- Brian's attachment 622680 [details] [diff] [review]
- Brian's attachment 622684 [details] [diff] [review]

The latter patches even remove stuff that was added by the first one, which makes it very difficult to review.

(I think contributors should do better when asking for reviews, and attach consolidated patches.)

I just spent one hour and, one after the other, manually merged the bitrotted patches into the latest NSS trunk. I'm attaching it here. (It builds, but I haven't tested it.) I'm marking the old patches as obsolete. 

Brian, feel free to re-request review on this consolidated patch from Wan-Teh.

I don't like this patch, because it's incomplete. It makes no attempt to inject the OCSP data received into the OCSP cache. I think it's inappropriate to require each NSS based application to handle that injection on their own. NSS should be a comfortable toolkit with default functionality, not require applications to do a lot of stuff by foot.

(The patch that I had proposed implements that injection.)
Attachment #622680 - Attachment is obsolete: true
Attachment #622684 - Attachment is obsolete: true
Attachment #622680 - Flags: review?(wtc)
Attachment #622684 - Flags: review?(wtc)
The following decisions should be made by the NSS team in tomorrow's NSS conference call:

1:
(a) do what Brian suggests, and strictly focus on single-stapling for now, in order to use simpler patches
(b) do what Kai suggests, and use the multi-stapling preparation code that Kai has already implemented in the attached patches


2:
Do OCSP cache injection as part of this work from within libSSL.
(a) no, as omitted by Brian
(b) yes, as proposed by Kai in my latest patch


In addition, given the 5 months of review stalling that we've seen in this bug, I'd like to ask for a clarification:

3:
Given that Google Chrome has already implemented this feature using a locally patched version of NSS, are Google employees still interested in getting this feature to work in the shared NSS code (e.g. by reviewing patches) - or should we rather try to get this done without their help?

Thanks
Whiteboard: [review patch v16]
(In reply to Kai Engert (:kaie) from comment #92)
> The following decisions should be made by the NSS team in tomorrow's NSS
> conference call:
> 
> 1:
> (a) do what Brian suggests, and strictly focus on single-stapling for now,
> in order to use simpler patches
> (b) do what Kai suggests, and use the multi-stapling preparation code that
> Kai has already implemented in the attached patches

FYI, I doubt that Firefox will ever offer the multi-stapling extension to servers because we are already planning different optimizations for revocation checking of intermediates (as discussed in the NSS workweek), and the currently-proposed multi-stapling extension would interfere with those optimizations and other ones. For the same reason, I doubt that Chromium would be better off offering the multi-stapling extension either. And, accordingly, I am not sure any servers will ever implement it, if browsers do not choose to send it.

> Do OCSP cache injection as part of this work from within libSSL.
> (a) no, as omitted by Brian
> (b) yes, as proposed by Kai in my latest patch

It is important that the application have direct access to the OCSP responses, and that the application can avoid using the NSS OCSP cache if it doesn't want to use it. In particular, it seems likely that Firefox may (eventually) not use the NSS OCSP cache at all, because it would need to implement its own (persistent) cache for other reasons.

As discussed above, the best solution is to make the SSL certificate auth hook responsible for dealing with the OCSP responses. Then, NSS's default certificate hook can put those responses in the OCSP cache if it wants, though (again, as discussed above) there are problems with doing so.

> In addition, given the 5 months of review stalling that we've seen in this
> bug, I'd like to ask for a clarification:

Rather, progress has been blocked on me updating the patches.

Note that I also raised some concern about whether all of these patches are doing the correct thing when an OCSP response is received during a renegotiation handshake. IIRC, there is such a problem with the patches that were posted before mine. In my patch, I attempted to addressed that issue, but I did so by storing the OCSP responses in the session state; IIRC, Wan-Teh had asked that we avoid storing the OCSP responses in the session cache.
Brian: if there are onlookers interested in helping this patch make progress, what can they do? Is there testing they can perform, perhaps using a copy of Flowerbeetle? Is it worth anyone else taking up the work of improving the patch, or is it close enough to done that this is unlikely to help?

Gerv
(In reply to Gervase Markham [:gerv] from comment #94)
> Brian: if there are onlookers interested in helping this patch make
> progress, what can they do? Is there testing they can perform, perhaps using
> a copy of Flowerbeetle? Is it worth anyone else taking up the work of
> improving the patch, or is it close enough to done that this is unlikely to
> help?

Previously, Kai asked me to implement my review comments on his patch myself, which I did. But, like Kai mentioned, there are some things that need to be agreed on that might require modifications of that patch.

For Gecko's use, we need to write the PSM-side code and we need to write the xpcshell tests that verify that Gecko is implementing OCSP stapling correctly. I'd speculate that fixing bug 466524 and bug 663733 would be very helpful for writing said tests. Having those tests already running and passing before we do the final reviews would also make the reviews easier. For example, having a test that we do the right thing during renegotation handshakes would be especially useful. Depending on who does what, it may also be better to get the changes in bug 754365 landed before attempting the PSM side of this.
I'm not interested in work that requires PSM side pieces. The PSM pieces should be discussed and implemented elsewhere.

This is an NSS level bug. I am interested to see a solution that is complete inside NSS, thereby allowing any NSS based application to make use of the OCSP stapling feature.
This NSS level bug shouldn't be blocked by Gecko/PSM requirements, be it PSM code or be it Gecko level testing. The Mozilla/PSM/Gecko teams can start writing their tests after we have finished an initial implementation in NSS.
Thanks.  Could I ask that Gerv be prepared to provide a status update to the CA/Browser Forum during a telephone call on Thursday?  Thanks again.
Attached patch Patch v24 (obsolete) (deleted) — Splinter Review
For what it's worth...

I've also merged my own patches (which include the work from others) to latest trunk. (Includes a fix for an incorrect pointer that I discovered today.)

This matches what I described earlier as patch v20 (based on v16).

The test script passes, including the new tests against some known OCSP stapling servers. (Note the expired+revoked cert donated by a commercial CA for testing, which runs on my test server, will expire on December 8, after which we will have reduced CA testing coverage, unless we get new commercial certificate contributions.)
Attachment #618507 - Attachment is obsolete: true
Attachment #618749 - Attachment is obsolete: true
Attachment #619227 - Attachment is obsolete: true
Attachment #621944 - Attachment is obsolete: true
Attachment #618749 - Flags: review?
Attachment #674688 - Flags: review?
FWIW - the CAB/Forum also assigned a new EKU OID for "mustStaple". It is 2.23.140.16.1.  (2.23.140 is the "ca-browser-forum" and 16 is "ocsp").  Last I read, the EKU would mean "only use this key when it's accompanied by some stapled certificate status information."
OK, here's my take:

My assumptions.

1) IIRC only single stapling has been completely defined in the RFC.
2) There is a proposal for multi-stapling
3) Mozilla's current trajectory requires only single stapling.

My conclusions:

1) OCPS stapling patch should not be held up waiting for multi-stapling.
2) In the long run NSS should support both single stapling and multi-stapling, even if Mozilla only cares about the former (for now).

If we have patches that are complete for single stapling and does not interfere with the multi-stapling spec, we should get those patches in. If we have a choice between a set of complete single stapling patches and a complete set of multi-stapling patches, we should check in the former.

In all cases complete means both server and client side need to be implemented... at least to the point that we can run automated testing on the feature.

bob
(In reply to Robert Relyea from comment #101)
> 1) OCPS stapling patch should not be held up waiting for multi-stapling.

Nobody proposed to wait for multi stapling.

My patches use a "list" instead of "single fixed item" in the APIs, thereby offering a simple way to be "prepared" for a potential later multi stapling implementation. That's all there is to say.

But Brian doesn't like this preparation and suggests to undo that work.

In my opinion, this preparation isn't harmful. But if that were the only area of dispute, I'd be fine to revert code to an always-single-item API.


> 2) In the long run NSS should support both single stapling and
> multi-stapling, even if Mozilla only cares about the former (for now).
> 
> If we have patches that are complete for single stapling and does not
> interfere with the multi-stapling spec,
> we should get those patches in.


I believe that's what the attached patch v24 does.

The client side code is complete and does cache injection.

It full supports single stapling.

It doesn't attept to implement multi stapling in any way, except by "being prepared" for potential future larger sets of OCSP data.



> If
> we have a choice between a set of complete single stapling patches and a
> complete set of multi-stapling patches, we should check in the former.


We currently have a choice between 

(a) incomplete single stapling as implemented by Adam/Brian, because that code is missing cache injection and requires the application to do additional work

and

(b) complete single stapling as put together by me (using earlier contributions), including basic automated testing of the client side, against known stapling servers.


> In all cases complete means both server and client side need to be
> implemented... at least to the point that we can run automated testing on
> the feature.


I'm not aware of any server side code in NSS for OCSP stapling, because we currently don't have any NSS code to produce OCSP responses.

We have client testing against known external servers.
Comment on attachment 674688 [details] [diff] [review]
Patch v24

Review of attachment 674688 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozilla/security/nss/lib/ssl/ssl3con.c
@@ +8386,5 @@
> +static SECStatus ambiguous_err(sslSocket *ss, PRBool isTLS, SSL3AlertDescription desc);
> +static SECStatus decode_loser(sslSocket *ss, PRBool isTLS, SSL3AlertDescription desc, int errCode);
> +static SECStatus alert_loser(sslSocket *ss, SSL3AlertDescription desc, int errCode);
> +static SECStatus loser(sslSocket *ss, SSL3AlertDescription desc, int errCode);
> +static SECStatus check_loser(SECStatus rv, sslSocket *ss, SSL3AlertDescription desc);

It seems like this is very similar to the patch I r-d a long time ago. Please see the previous review comments about fixing things like this unnecessary splitting up of ssl3_HandleCertificate into many functions. It isn't necessary and it makes the code harder to understand, especially because there are similarly-named functions that work slightly differently in this file, and also because these are not general-purpose functions. The error handling logic for the change to support OCSP stapling can be made very simple when this is undone.

it seems most or all of my previous review comments still seem to apply to this patch. Please address them before you post a patch for review. Thanks.
Attachment #674688 - Flags: review?
Brian, I know that you don't like my patch, and I didn't ask you for review. I simply attached the patch to demonstrate that we have working code, and maybe other people will like it.
Besides, it's not strictly my patch. It combines a lot of code from the other bugs, too. It's an attempt go get something done.

Brian, when I suggested that you might rewrite the patch, I had hoped that you would produce a result that is similarly complete as the patch that I attached. I think you didn't do that. Therefore I think patch v24 is better than yours.

Feel free to use as much from patch v24 as you like, combine it with your own preferred code, and produce something that is similarly complete and that you like better.

But without a similarly complete alternative patch, I propose to take what we have, get it reviewed and checked in, and go from there. Let's not wait another couple of months for a new patch, simply because Brian is unhappy about stylistic details.

And instead of this continuing and stuck debate between Brian and me, it's time to hear opinions and get review comments from third parties. Regards.
Kai, it's not clear from your comment whether or not you want a detailed review, but I interpreted to be that you are not looking for a review at this time.

If you're asking for a preference between these two patches, I support Brian's change. I believe that we should keep the changes as minimal and as well-defined as possible, and build upwards, as I view Brian's patch as doing, rather than implementing a much larger area, but only half-implemented in some areas, as I see patch v24 doing.

This is nothing personal, I just believe that a smaller patch is easier to reason about, easier to spot bugs in, and easier to test. The fact that Brian's patch is 1/4 the size of v24 I believe highlights the amount of code changes that exist in v24. I agree with Brian that I do not think it's appropriate to tie clean up to this patch, and I think that's one thing holding it back.
Guys, I really feel misunderstood.

Rejecting my patch based on the argument of size is funny.

My patch is larger because it contains additional functionality and testing.

How is Brian's patch actually enabling NSS applications to make use of the OCSP stapling data? It doesn't!

My patch implements OCSP cache injection in the default authentication code (and that requires some code for duplicating the cert-id data structure).

Also, all new code is supposed to contain testing, right? Brian's patch doesn't have any real testing!

Brian's patch only has a dummy test in tstclnt, that simply tests whether ANY data has been received in the stapling message. I don't call that a functional test.

A big set of my patch is an enhancement to tstclnt that actually tests whether the OCSP cache injection has worked, and whether the data was a good status or was a revoked status or invalid data. And my patch also extends the OCSP test scripts to invokes this testing.

Furthermore, my patch adds an implementation to ssltap that can dump the contents of the stapled data. That's trivial code fully contained in the ssltap, and shouldn't be seen as a disadvantage for reviewing the rest.

The real dispute here is that Brian doesn't like the portions in the core SSL protocol implementation, that I had put together based on the earlier code contributions.

But I'm making a step towards you. I appreciate that Brian has written the code in the way he prefers (even though I had to merge three patches before I was able to understand what he did).

I hereby give up on my SUBSET of my patch for the actual SSL protocol enhancement. I removed these parts from my patch, and I replaced it with Brian's patch.

Another step towards you, in order to simplify the patch, I give up argueing about the "multi stapling preparation", and I have removed the code that duplicates the secitem-array, and I'm accepting Brian's interface that takes only a single secitem with a single response. (Which IMHO is a bit shortsighted, because using a list and the array duplication code is trivial, but so be it, if it helps move this forward.)

I'm attaching a new patch v26 that is said combination, uses Brian's SSL protocol code, plus the code to enable testing.
Attachment #674627 - Attachment is obsolete: true
Attachment #674688 - Attachment is obsolete: true
Attachment #675068 - Flags: review?
Attachment #674627 - Attachment is obsolete: false
Output created by new tests:

ocsp.sh: startssl valid, supports OCSP stapling
ocsp.sh: #1596: startssl valid, supports OCSP stapling  - PASSED
ocsp.sh: startssl revoked, supports OCSP stapling
ocsp.sh: #1597: startssl revoked, supports OCSP stapling  - PASSED
ocsp.sh: comodo trial test expired revoked, supports OCSP stapling
ocsp.sh: #1598: comodo trial test expired revoked, supports OCSP stapling  - PASSED
ocsp.sh: thawte valid, supports OCSP stapling
ocsp.sh: #1599: thawte valid, supports OCSP stapling  - PASSED
ocsp.sh: thawte revoked, supports OCSP stapling
ocsp.sh: #1600: thawte revoked, supports OCSP stapling  - PASSED
ocsp.sh: live valid, supports OCSP stapling
ocsp.sh: #1601: live valid, supports OCSP stapling  - PASSED
ocsp.sh: startssl valid, don't support OCSP stapling
ocsp.sh: #1602: startssl valid, don't support OCSP stapling  - PASSED
ocsp.sh: cacert untrusted, don't support OCSP stapling
ocsp.sh: #1603: cacert untrusted, don't support OCSP stapling  - PASSED
Just documenting what I see as I go along, Brian's patch is missing the check-for-max-sane-cert-status-length that you have suggested yourself in comment 45 (and was contained in my patch v24).

So, Brian, since you have asked that we drop my parts of the patch that implements the core protocol, you might want to double check if you have missed anything else that I had already fixed...

(I'll soon give an update on what has been discussed today, so please wait a little. This comment doesn't mean a change of plan.)
Attached patch p29-agl-616669 [1st of 5] (obsolete) (deleted) — Splinter Review
Attached patch p29-bsmith-20120510-622680-622684 [2nd of 5] (obsolete) (deleted) — Splinter Review
Attached patch p29-kaie-injection-and-agl-554369 (obsolete) (deleted) — Splinter Review
Attached patch p29-kaie-tools-700701-and-tests [4th of 5] (obsolete) (deleted) — Splinter Review
Attached patch p29-kaie-prepare-multi (obsolete) (deleted) — Splinter Review
Comment on attachment 622680 [details] [diff] [review]
Part 1 - Cleanup of AGL's patch -- interdiff from AGL's patch in attachment 616669 [details] [diff] [review]

Re-requesting review on this patch, as that's what has been agreed on today.

Brian's patch.
Attachment #622680 - Flags: review?(wtc)
Attachment #622680 - Attachment is obsolete: false
Comment on attachment 622684 [details] [diff] [review]
Part 2 - Simplify state machine changes using logic from Kai's patch (simplified) and move OCSP response to session - Interdiff from attachment 622680 [details] [diff] [review]

Re-requesting review on this patch, as that's what has been agreed on today.

Brian's patch.
Attachment #622684 - Attachment is obsolete: false
Attachment #622684 - Flags: review?(wtc)
Comment on attachment 674627 [details] [diff] [review]
Brian's/Adam's 616669+622680+622684 combined and merged to trunk

obsoleting this patch, as Brian/Wan-Teh want to perform the review based on the older, original patches.

Brian, FYI, your patches are attached in combined form, merged to trunk, in the attachment named p29-bsmith-20120510-622680-622684.

If you are going to address change requests from Wan-Teh, then you might find attachment p29-agl-616669 helpful, which is the version of AGL's patch that you had used as the base for your patches, merged to latest trunk.


It would be nice if we can get this reviewed and done in the short term, saving me from future manual merging. When producing new patches, it would be good to have the other following p29-* attachments in mind, to avoid merge conflicts. Thank you.
Attachment #674627 - Attachment is obsolete: true
Attached patch p29 - NOT FOR REVIEW (obsolete) (deleted) — Splinter Review
This is simply the combination of all five p29* patches applied on top of each other.
Attachment #675068 - Attachment is obsolete: true
Attachment #675068 - Flags: review?
Blocks: 436414
Comment on attachment 675275 [details] [diff] [review]
p29-kaie-prepare-multi

(noticed a copy/paste mistake, too many plus signs (+) in nssutil.def)
No longer depends on: 663733
Blocks: 811331
Comment on attachment 675273 [details] [diff] [review]
p29-kaie-injection-and-agl-554369

Review of attachment 675273 [details] [diff] [review]:
-----------------------------------------------------------------

Kai: I have some questions and suggested changes for
this patch. It is best to view my comments in Splinter
Review. Thanks.

::: security/nss/lib/certhigh/ocsp.c
@@ +781,1 @@
>      *certIDWasConsumed = PR_FALSE;

Why do we need to allow certIDWasConsumed to be NULL?

@@ +790,5 @@
> +            myCertID = certID;
> +            *certIDWasConsumed = PR_TRUE;
> +        }
> +        else {
> +            myCertID = CERT_DupCertID(certID);

Nit: the "else" line should be formatted like
        } else {

It is strange that myCertID is a duplicate of certID
only if certIDWasConsumed is NULL. Why?

Why do we set *certIDWasConsumed to PR_TRUE early, before
we know ocsp_CreateCacheItemAndConsumeCertID has succeeded?

@@ +803,1 @@
>                                                    &cacheItem);

Do we need to destroy myCertID if it is a duplicate of
certID (i.e., if certIDWasConsumed is NULL)?

@@ +808,4 @@
>      }
>      if (single) {
> +        PRTime thisUpdate;
> +        rv = DER_GeneralizedTimeToTime(&thisUpdate, &single->thisUpdate);

Is it OK for this DER_GeneralizedTimeToTime to fail?
I guess that just means the response has an empty
thisUpdate field?

@@ +818,5 @@
> +                PR_ExitMonitor(OCSP_Global.monitor);
> +                return rv;
> +            }
> +        } else {
> +            OCSP_TRACE(("Not caching responce because the cache is newer"));

Typo: responce => response

Another reason we don't cache the response is that the
DER_GeneralizedTimeToTime call failed. This trace message
probably should say something like:
  Not caching response because the response is not newer than the cache

@@ +824,5 @@
>      } else {
> +        if (cacheItem->certStatusArena) {
> +            PORT_FreeArena(cacheItem->certStatusArena, PR_FALSE);
> +            cacheItem->certStatusArena = NULL;
> +        }

I suggest we do this after setting cacheItem->missingResponseError,
just in case the PORT_FreeArena call may modify the
current error code.

@@ +1786,5 @@
> +    }
> +
> +    arena = PORT_NewArena(DER_DEFAULT_CHUNKSIZE);
> +    if (!arena)
> +        goto loser;

We should return NULL here, because the code under the
'loser' label seems to assume 'arena' is not NULL.

@@ +1795,5 @@
> +
> +#define DUPHELP(element) \
> +    if (src->element.data) {  \
> +        SECITEM_CopyItem(arena, &dest->element, &src->element); \
> +        if (!dest->element.data) \

It is better to check the return value of SECITEM_CopyItem
rather than !dest->element.data.

@@ +1810,5 @@
> +    DUPHELP(issuerMD2NameHash)
> +    DUPHELP(issuerSHA1KeyHash)
> +    DUPHELP(issuerMD5KeyHash)
> +    DUPHELP(issuerMD2KeyHash)
> +    

Nit: remove these white spaces (also on line 1802). You can
see the white spaces in the Splinter Review.

@@ +1818,5 @@
> +loser:
> +    PORT_FreeArena(arena, PR_FALSE);
> +    PORT_SetError(PR_OUT_OF_MEMORY_ERROR);
> +    return NULL;
> +

Nit: delete this blank line.

@@ +4935,5 @@
> +     * side channel.
> +     */
> +
> +    if (!cert)
> +        return SECFailure;

Call PORT_SetError().

@@ +4944,5 @@
>          certID, time, PR_FALSE, /* ignoreGlobalOcspFailureSetting */
>          &rvOcsp, &dummy_error_code);
>      if (rv == SECSuccess && rvOcsp == SECSuccess) {
> +        /* The cached value is good. We don't want to waste time validating
> +         * this OCSP response. This the first column in the table above. */

Add "is" between "This" and "the first column".

@@ +4952,5 @@
>  
> +    /* The logic for caching the more recent response is handled in
> +     * ocsp_CreateOrUpdateCacheEntry, which is called by this function. */
> +    rv = ocsp_CacheEncodedOCSPResponse(handle, certID, cert, time,
> +                                       NULL /* no pwArg */, encodedResponse,

Why do you pass a NULL pwArg argument?

@@ +5161,5 @@
>  
>      *rv_ocsp = ocsp_SingleResponseCertHasGoodStatus(single, time);
>  
>  loser:
> +    /* If single == NULL here then the response was invalid. */

This comment seems to be redundant with the existing comment
on line 5169.

::: security/nss/lib/nss/nss.def
@@ +1012,5 @@
>  ;+       *;
>  ;+};
> +;+NSS_3.14.1 {    # NSS 3.14.1 release
> +;+    global:
> +CERT_DupCertID;

Rename this function CERT_DupOCSPCertID.

I am not sure if it needs to be exported from libnss3.so.
It probably can be a static function inside ocsp.c.  If so,
use lowercase prefix (cert_DupOCSPCertID) and do not declare
it in ocsp.h.

::: security/nss/lib/ssl/sslauth.c
@@ +231,5 @@
> +                                              ss->sec.peerCert,
> +                                              now,
> +                                              &ss->sec.ci.sid->peerCertStatus,
> +                                              arg);
> +    }

I think this file should be moved to the libSSL patch.
It seems unrelated to the changes to ocsp.{h,cc}.
Comment on attachment 675275 [details] [diff] [review]
p29-kaie-prepare-multi

Review of attachment 675275 [details] [diff] [review]:
-----------------------------------------------------------------

I reviewed the SECItemArray portions of this patch.

::: security/nss/lib/util/seccomon.h
@@ +60,5 @@
> +
> +struct SECItemArrayStr {
> +    SECItem *items;
> +    unsigned int len;
> +};

I looked into how NSS is using SECItem arrays. They exist.
You can find them by search for "NewArray" on this MXR query
results page:
http://mxr.mozilla.org/security/ident?i=SECItem

I found that NSS has been defining custom types for SECItem
arrays. For example,

1. CERTDistNames: this is similar to SECItemArray, with a
field for the number of elements in the array:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/certdb/certt.h&rev=1.57&mark=414,416-417#411

2. CERTOidSequence: this is an array of SECItem* pointers.
The array is terminated with a null pointer:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/certdb/certt.h&rev=1.57&mark=827-828#825

3. CMMFPOPODecKeyRespContent: this is also a null-terminated
array of SECItem* pointers:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/crmf/cmmfit.h&rev=1.3&mark=99-100#99

In some cases we just use SECItem** and don't even bother
defining a custom type. For example, the cert_EncodeGeneralNames
function returns a SECItem** value:
http://mxr.mozilla.org/security/ident?i=cert_EncodeGeneralNames

I believe this is why we have been getting by without a
SECItemArray type. To be consistent with existing usage
in NSS, I recommend we define a custom type for an array
of OCSP responses. The struct would contain an internal
PLArenaPool field, where the SECItems and the data they
point to are allocated from.

::: security/nss/lib/util/secitem.h
@@ +93,5 @@
> +extern SECItemArray *SECITEMARRAY_Alloc(PLArenaPool *arena,
> +                                        SECItemArray *array,
> +                                        unsigned int len);
> +extern void SECITEMARRAY_Free(SECItemArray *array, PRBool freeit);
> +extern void SECITEMARRAY_Zfree(SECItemArray *array, PRBool freeit);

If we still want to add a SECItemArray type, I suggest that
we name these functions

  SECITEM_AllocArray
  SECITEM_FreeArray
  SECITEM_ZfreeArray

to avoid adding a new SECITEMARRAY_ prefix.
Comment on attachment 675273 [details] [diff] [review]
p29-kaie-injection-and-agl-554369

Review of attachment 675273 [details] [diff] [review]:
-----------------------------------------------------------------

Wan-Teh, thanks for your review comments. I'll address most of your proposals, see comments below. I hope you will be able to read my repsonses in splinter review, too.

::: security/nss/lib/certhigh/ocsp.c
@@ +781,1 @@
>      *certIDWasConsumed = PR_FALSE;

We have one code path, where the caller isn't able to transfer ownership of the certID object to us for consumption.

This inability to transfer ownership is indicated by setting certIDWasConsumed to NULL.

If we are unable to return a consumption status back to the caller, then we conclude that we aren't allowed to consume ownership, and that's the scenario where we must duplicate the certID object.

The code path is:
  CERT_GetOCSPStatusForCertID -> cert_ProcessOCSPResponse -> 

Because the definition of the old exported API doesn't allow ownership transfer of the certID, we communicate this inability all the way down to function ocsp_CreateOrUpdateCacheEntry, which will duplicate it only if necessary.

I notice I haven't yet enabled this ability. Right now, we'll never update the cache if an external caller calls CERT_GetOCSPStatusForCertID. But with this code infrastructure preparation,  we can enable that later.

I suspect I had created this ability in earlier iterations of this work, as I had another code path with a similar scenario. I don't remember. But given that it can be helpful to enhance the CERT_GetOCSPStatusForCertID, I'd prefer to keep this ability.

If you want, I'll add a comment that will document the purpose of the certIDWasConsumed parameter.

I've added this comment to the function:

 * If the caller is unable to transfer ownership of certID,
 * then the caller must set certIDWasConsumed to NULL,
 * and this function will potentially duplicate the certID object.

@@ +790,5 @@
> +            myCertID = certID;
> +            *certIDWasConsumed = PR_TRUE;
> +        }
> +        else {
> +            myCertID = CERT_DupCertID(certID);

"else" updated in my local tree

"strange": see comment above. If we're able to communicate our consumption back to the original caller, then we don't need to duplicate.

"consume early": Because a CERTOCSPCertID always has its own arena, and ocsp_CreateCacheItemAndConsumeCertID makes use of the same arena. In the failure scenario, that function will destroy that arena. That's why we cannot recover from a failure, we cannot allow the caller to keep ownership of the certID after we've called ocsp_CreateCacheItemAndConsumeCertID. That's also the reason why we don't need to destroy the duplicated certID in the failure case here.

@@ +803,1 @@
>                                                    &cacheItem);

no, see above explanation, because the failure scenario of ocsp_CreateCacheItemAndConsumeCertID will destroy the arena that was used to allocate the certID.

@@ +808,4 @@
>      }
>      if (single) {
> +        PRTime thisUpdate;
> +        rv = DER_GeneralizedTimeToTime(&thisUpdate, &single->thisUpdate);

The presence of thisUpdate is mandatory, according to my reading of RFC 2560.

I took this change from Adam Langley, bug 554369 attachment 434271 [details] [diff] [review].

@@ +818,5 @@
> +                PR_ExitMonitor(OCSP_Global.monitor);
> +                return rv;
> +            }
> +        } else {
> +            OCSP_TRACE(("Not caching responce because the cache is newer"));

ok

@@ +824,5 @@
>      } else {
> +        if (cacheItem->certStatusArena) {
> +            PORT_FreeArena(cacheItem->certStatusArena, PR_FALSE);
> +            cacheItem->certStatusArena = NULL;
> +        }

good idea

@@ +1786,5 @@
> +    }
> +
> +    arena = PORT_NewArena(DER_DEFAULT_CHUNKSIZE);
> +    if (!arena)
> +        goto loser;

good catch. I guess we should also set PORT_SetError(PR_OUT_OF_MEMORY_ERROR);

I'll change
    PRArenaPool *arena = NULL;
and update the loser block to
    if (arena)
        PORT_FreeArena(arena, PR_FALSE);

@@ +1795,5 @@
> +
> +#define DUPHELP(element) \
> +    if (src->element.data) {  \
> +        SECITEM_CopyItem(arena, &dest->element, &src->element); \
> +        if (!dest->element.data) \

ok. Changed to:

        if (SECITEM_CopyItem(arena, &dest->element, &src->element) \
            != SECSuccess) \
            goto loser;     \

@@ +1810,5 @@
> +    DUPHELP(issuerMD2NameHash)
> +    DUPHELP(issuerSHA1KeyHash)
> +    DUPHELP(issuerMD5KeyHash)
> +    DUPHELP(issuerMD2KeyHash)
> +    

ok

@@ +1818,5 @@
> +loser:
> +    PORT_FreeArena(arena, PR_FALSE);
> +    PORT_SetError(PR_OUT_OF_MEMORY_ERROR);
> +    return NULL;
> +

ok

@@ +4935,5 @@
> +     * side channel.
> +     */
> +
> +    if (!cert)
> +        return SECFailure;

ok, added PORT_SetError(SEC_ERROR_INVALID_ARGS);

@@ +4944,5 @@
>          certID, time, PR_FALSE, /* ignoreGlobalOcspFailureSetting */
>          &rvOcsp, &dummy_error_code);
>      if (rv == SECSuccess && rvOcsp == SECSuccess) {
> +        /* The cached value is good. We don't want to waste time validating
> +         * this OCSP response. This the first column in the table above. */

ok

@@ +4952,5 @@
>  
> +    /* The logic for caching the more recent response is handled in
> +     * ocsp_CreateOrUpdateCacheEntry, which is called by this function. */
> +    rv = ocsp_CacheEncodedOCSPResponse(handle, certID, cert, time,
> +                                       NULL /* no pwArg */, encodedResponse,

I can't remember. Thanks for catching.

I looked at the full sequence of my local patches - and I made this change at the same time when I started to change the testing tools. Maybe I ran into trouble and wanted to continue my tests, and forgot to change it back afterwards.

I'll revert it to pass pwArg - and I'll have to test in order to verify it works.

@@ +5161,5 @@
>  
>      *rv_ocsp = ocsp_SingleResponseCertHasGoodStatus(single, time);
>  
>  loser:
> +    /* If single == NULL here then the response was invalid. */

I think the redudancy is helpful.
This first comment explains what single==NULL means.
The second comment below confirms that it's correct to call the function with single==NULL and what effect that will have.

::: security/nss/lib/nss/nss.def
@@ +1012,5 @@
>  ;+       *;
>  ;+};
> +;+NSS_3.14.1 {    # NSS 3.14.1 release
> +;+    global:
> +CERT_DupCertID;

agreed. changed to static.

::: security/nss/lib/ssl/sslauth.c
@@ +231,5 @@
> +                                              ss->sec.peerCert,
> +                                              now,
> +                                              &ss->sec.ci.sid->peerCertStatus,
> +                                              arg);
> +    }

I'd prefer to keep this change as part of this patch.

The earlier libSSL patch is Brian's work, I don't want to ask Brian to add my change to his patch.

Furthermore, the intention of the changes to ocsp.c is to make it work correctly with CERT_CacheOCSPResponseFromSideChannel. I don't know if CERT_CacheOCSPResponseFromSideChannel works correctly without these changes to ocsp.c. I find it reasonable to combine this action with this patch. This patch is labeled "injection", and this piece of code performs the injection.
(In reply to Wan-Teh Chang from comment #122)
> To be consistent with existing usage
> in NSS, I recommend we define a custom type for an array
> of OCSP responses. The struct would contain an internal
> PLArenaPool field, where the SECItems and the data they
> point to are allocated from.

I'm surprised that you prefer consistency so much, that you prefer that I throw away my implementation of this new container data structure, and that you prefer that we rather continue to go the more complicated path of having to maintain arrays manually. IMHO a working general-purpose container data structure can avoid mistakes and can ease readability.
(In reply to Kai Engert (:kaie) from comment #124)
> IMHO a working general-purpose container data
> structure can avoid mistakes and can ease readability.

... and can shorten future coding efforts that also need an array.


(In reply to Wan-Teh Chang from comment #122)
> If we still want to add a SECItemArray type, I suggest that
> we name these functions
>   SECITEM_AllocArray
>   SECITEM_FreeArray
>   SECITEM_ZfreeArray
> to avoid adding a new SECITEMARRAY_ prefix.

agreed, I'll rename as suggested
Attachment #675271 - Attachment description: p29-agl-616669 → p29-agl-616669 [1st of 5]
Attachment #675272 - Attachment description: p29-bsmith-20120510-622680-622684. → p29-bsmith-20120510-622680-622684 [2nd of 5]
Attached patch p31-kaie-injection-and-agl-554369 [3rd of 5] (obsolete) (deleted) — Splinter Review
addressed most of Wan-Teh's comments
Attachment #675273 - Attachment is obsolete: true
Attachment #675274 - Attachment description: p29-kaie-tools-700701-and-tests → p29-kaie-tools-700701-and-tests [4th of 5]
Attached patch p31-kaie-prepare-multi [5th of 5] (obsolete) (deleted) — Splinter Review
Attachment #675275 - Attachment is obsolete: true
Attached patch p31 - not for review [all 5 patches combined] (obsolete) (deleted) — Splinter Review
Attachment #675279 - Attachment is obsolete: true
Target Milestone: 3.14.1 → 3.14.2
RFC 6066, the successor of RFC 4366, says:

   Clients requesting an OCSP response and receiving an OCSP response in
   a "CertificateStatus" message MUST check the OCSP response and abort
   the handshake if the response is not satisfactory with
   bad_certificate_status_response(113) alert.  This alert is always
   fatal.

I'd interpret the requirement to "check that a response is satisfactory" 
as "require that ssl3_HandleCertificateStatus succeeds",
and if not, return the alert.

We don't sent that alert yet,
but that's an straightforward addition to our existing patches,
and I can add that alert.


Any objections to implementing the simple stragy described above?


In case you had objections, we would have to discuss the following:

{ Start of Alternative discussion:

An alternative interpretation of the RFC could be: "must fully verify the OCSP response", 
including a check for a valid OCSP signature.

A call to CERT_CacheOCSPResponseFromSideChannel
prior to (or as part of authentication) would be mandatory,
(as I have suggested in the "kaie-injection" patches).

Furthermore, and that's not in my patches yet,
we would have to check the result value of CERT_CacheOCSPResponseFromSideChannel.

(CERT_CacheOCSPResponseFromSideChannel gets called as part of ssl3_AuthCertificate)

If we decided to go this stricter route,
we would have to clarify in NSS documentation, that an application provided 
authentication hook (registed with SSL_AuthCertificateHook) MUST query
SSL_PeerStapledOCSPResponses and feed them into CERT_CacheOCSPResponseFromSideChannel.

However, if the NSS team thought this was an inappropriate requirement,
because it cannot be enforced by NSS,
then we would have to consider to enforce it by processing 
the stapled OCSP responses as part of the SSL protocol code.

} End of Alternative discussion.
kaie: In order to successfully implement the above strategy, you need to implement an async hook to do so, because "check the OCSP response" includes further chain building and verification.

Microsoft's CryptoAPI demonstrates this fairly clearly, with its recursive checks of OCSP-signer-and-chain for revocation information. In general, this information should be cached (from a previous root->leaf verification, such as of the subscriber cert), but if you're checking the OCSP status *before* cert chain validation, none of that data will be cached.

So for that reason, I would strongly discourage you from doing said check in ssl3_HandleCertificateStatus. I don't see any reason why this can't be delayed until when verifying the actual leaf certificate, for example. But I can say for certainty that spec non-compliance on this (presumably temporarily, by separating the rest into further patches) is not going to break inter-op, so we should not introduce all of that complexity yet.
Ryan, thanks for your thoughts.

If I understand correctly, you have argued that the simpler proposal in the first half of comment 129 is sufficient, and that we should avoid the more complicated approach described in the second part of comment 129.

Going the simpler route is fine with me.
Attached patch p33 - backup of latest work (obsolete) (deleted) — Splinter Review
Attachment #684461 - Attachment is obsolete: true
Attachment #695043 - Attachment is obsolete: true
Given the silence from Brian and Wan-Teh, given that there has been zero visible progress recently on the first part of the work - the SSL protocol code - which is a prerequirement for any later patches - I conclude that Mozilla and Google don't have a sufficiently high motivation to push this feature forward, when compared with other of your priorities.

Given that we had also agreed on a deadline last end of November, for this initial work to get completed, and given this deadline has passed 6 weeks ago, I'd like to give up on waiting for Mozilla/Google people, and would like to ask Bob to help me push this forward.

As Bob hasn't reviewed any of these patches yet, I'll merge patches as appropriate. In particular, I'll combine the two initial patches (agl + kaie + bsmith-shuffling), which add and then later remove things, in order to avoid confusion.
Assignee: bsmith → kaie
Implement client side of SSL protocol code.

Based on Adam Langleys original code, plus adjustments made by me and by Brian Smith.

This patch requests the data, receives it and remembers it for the duration of the socket's lifetime, but doesn't process it.

The patch ensures that we delay verifying the certificate until after we've received a potential cert status message.

This implementation is tested to correctly talk to existing public servers that support OCSP stapling.

(This is a merger of the three patches, attachment 616669 [details] [diff] [review], attachment 622680 [details] [diff] [review], attachment 622684 [details] [diff] [review].)
Attachment #616669 - Attachment is obsolete: true
Attachment #622680 - Attachment is obsolete: true
Attachment #622684 - Attachment is obsolete: true
Attachment #675271 - Attachment is obsolete: true
Attachment #675272 - Attachment is obsolete: true
Attachment #622680 - Flags: review?(wtc)
Attachment #622684 - Flags: review?(wtc)
Attachment #701823 - Flags: review?(rrelyea)
- our default SSL authentication function will inject the received
  stapled data into our cache, making it available for the verification task.

- in order for the above to work, we need code to duplicate a certificate ID,
  which is used as the primary key in the OCSP cache.

- we must fix the behavior of the existing CERT_CacheOCSPResponseFromSideChannel
  function. This part was contributed by Adam Langley in bug 554369.
  I agree to it, and marked it as r=kaie
  Please read that (short) bug for its effect.
Attachment #684457 - Attachment is obsolete: true
Attachment #701824 - Flags: review?(rrelyea)
This patch implements bug 700701.

- it enhances ssltap to understand OCSP stapling.

- it enhances tstclnt to speak OCSP stapling to a server, 
  and also introduces new parameters, that can be used to require
  that valid OCSP stapling data has been received.

- it enhances the NSS test suite to run tstclnt against a few
  public servers that are known to support OCSP stapling.

(there is one block inside #if 0 - please ignore it - we can probably remove that block - I need to double check what it was about).
Attachment #675274 - Attachment is obsolete: true
Attachment #701832 - Flags: review?(rrelyea)
This patch tweaks the earlier patches, and prepares for a future implementation of OCSP-multi-stapling.
Attachment #684458 - Attachment is obsolete: true
Attachment #701833 - Flags: review?(rrelyea)
Attachment #695322 - Attachment is obsolete: true
In addition, I've updated the patch for self-contained testing, using enhanced selfserv, which can self-sign OCSP responses. It's attached to bug 811331.

The 5 combined patches of "round 36" have passed the NSS test suite on Linux, Mac and Windows.
General comments on the patches:

1) Where is the minimal server side support? I'm not really thrilled with testing by connecting out to your own servers. This is the biggest lack in the patch. The support can be as simple as an API which says "here's an OCSP response to staple", or simply fetching the OCSP response at startup for the server's cert and stapling it. We don't need to worry about refreshing the response to get this patch in.


Other than that one I do have some minor comments, none of which would disqualify the patches, but fixes would be prefered.

In patch 1 of 4, the handling of wait_cert_status as optional is unusual. The current SSL engine handles optional commands by checking the status on the next expected command, So whatever state your new ssl3_AuthCertificate() would set, that expected Handle function should check for hs.ws == wait_cert_status, and call ssl3_AuthCertificate() at that point.
Then you could handle ssl3_handleCertificateStatus() in the normal swith. It would make sure that hs.ws was set to wait_cert_status, and it could call ssl3_AuthCertificate() as its last step.

This is more in line with how the SSL state machine normally functions.

In patch 2 of 4:

Nit: fix the indentation where you added the new PRBool in the declaration of ocsp_CacheEncodedOCSPResponse (probably a tab issue).

You've changed:
     if (*rv_ocsp == SECSuccess || cacheInvalid) {
to:
     if (*rv_ocsp == SECSuccess || cacheInvalid || single != NULL) {

My question is, can *rv_ocsp == SECSuccess if singe == NULL? can't you just drop the *rv_ocsp == SECSuccess now?

Patch 3 of 4

Need to fix the test cases to tstclient connects to selfserv (once you fixed the above issue). selfserv should get the OCSP response from our ocsp test cases (either directly from from an ocspclient that fetches a response and just passes it to selfserv). You can actually kill the OCSP server before running tstclient so that we know the responses had to be stapled.

in print_status_response() you assert that the response is <= ocspResponse_other. If this value actually comes from the line, we shouldn't assert, but print big warnings that the response was wrong and continue. ssltap shouldn't die because of bad data, it should log the data and keep going.

path 4 of 4 looks good, but it's dependent on the other patches.
(In reply to Robert Relyea from comment #141)
> General comments on the patches:
> 
> 1) Where is the minimal server side support? I'm not really thrilled with
> testing by connecting out to your own servers. This is the biggest lack in
> the patch.

Fully implemented in bug 811331
(In reply to Robert Relyea from comment #141)
> The support can be as simple as an API which says "here's an OCSP
> response to staple"

I've implemented it in function SSL_SetStapledOCSPResponses
which is contained in bug 811311 attachment 703310 [details] [diff] [review] - that's the patch that implements the server side protocol of OCSP stapling.

I would like to check in the patches from this bug together with the patches from bug 811331 at the same time.


> In patch 1 of 4, the handling of wait_cert_status as optional is unusual.
> The current SSL engine handles optional commands by checking the status on
> the next expected command, So whatever state your new ssl3_AuthCertificate()
> would set, that expected Handle function should check for hs.ws ==
> wait_cert_status, and call ssl3_AuthCertificate() at that point.
> Then you could handle ssl3_handleCertificateStatus() in the normal swith. It
> would make sure that hs.ws was set to wait_cert_status, and it could call
> ssl3_AuthCertificate() as its last step.
> 
> This is more in line with how the SSL state machine normally functions.


Note that this part has already been the subject of debates.

I'm not sure what change exactly you're asking for here, I'm still trying to digest and fully understand your comment.

But let me say, this current part of the code has been re-shuffled by Brian, in the way that he prefers.

Originally, that code was different, and I suspect that original code might actually have been closer to what you are asking for now.

Could you please have a look at the obsolete patch v16, in file ssl3con.c, and tell me if you prefer the way it's done in v16?

That older code uses a function "ssl3_AuthCertificateIfNeeded", which might get called from four (4) different places - depending on which command is seen after the cert/cert_status message.


> Patch 3 of 4
> 
> Need to fix the test cases to tstclient connects to selfserv (once you fixed
> the above issue). selfserv should get the OCSP response from our ocsp test
> cases (either directly from from an ocspclient that fetches a response and
> just passes it to selfserv). You can actually kill the OCSP server before
> running tstclient so that we know the responses had to be stapled.


We don't have an OCSP server yet, so there's nothing I could kill.

I've implemented the self-contained testing of OCSP stapling using enhancements to selfserv, which generates its own OCSP response - without having to talk to an external OCSP server.

Is that approach acceptable?


(I'll address your other requests separately.)
(In reply to Robert Relyea from comment #141)
> 
> Nit: fix the indentation where you added the new PRBool in the declaration
> of ocsp_CacheEncodedOCSPResponse (probably a tab issue).

done


> You've changed:
>      if (*rv_ocsp == SECSuccess || cacheInvalid) {
> to:
>      if (*rv_ocsp == SECSuccess || cacheInvalid || single != NULL) {
> 
> My question is, can *rv_ocsp == SECSuccess if singe == NULL? can't you just
> drop the *rv_ocsp == SECSuccess now?


Can "*rv_ocsp == SECSuccess" and "single == NULL" ever happen at the same time ?

Answer: no!

- if single == NULL, then *rv_ocsp remains at its default value SECFailure

- *rv_ocsp can only become SECSuccess, if single had been set to != NULL


Can we remove the test for "*rv_ocsp == SECSuccess" ?

Answer: yes!


Thank your for catching the redundancy in the test!


In theory, we could either
- remove "*rv_ocsp == SECSuccess" (as you suggested)
or
- remove "single != NULL" (keeping the original test prior to my change).


But I agree with your proposal, changing it to:

    if (cacheInvalid || single != NULL)

because that's better aligned with the following code, which also talks about the state of the "single" variable.


However, I would like to reorder the test to say

    if (single != NULL || cacheInvalid)

to make it clearer that the first condition is the main expectation, while the second condition is optional.


(Sorry for my delay in rethinking this code.)


> in print_status_response() you assert that the response is <=
> ocspResponse_other. If this value actually comes from the line, we shouldn't
> assert, but print big warnings that the response was wrong and continue.
> ssltap shouldn't die because of bad data, it should log the data and keep
> going.


I agree, and I'm glad you mention it. It reminded me of a tool regression caused by bug 811317, which I intend to fix in bug 833857.

I'll attach an updated patch, which will depend on bug 833857.
Depends on: 833857
Updated "patch 2 of 4".
Attachment #701824 - Attachment is obsolete: true
Attachment #701824 - Flags: review?(rrelyea)
Attachment #705428 - Flags: review?(rrelyea)
Attachment #701835 - Attachment is obsolete: true
Updated "patch 3 of 4".
Attachment #701832 - Attachment is obsolete: true
Attachment #701832 - Flags: review?(rrelyea)
Attachment #705429 - Flags: review?(rrelyea)
Comment on attachment 705428 [details] [diff] [review]
round 38 - patch 2 of 4 - cache injection and caching fixes

Use this link to view the changes in this new revision of "patch 2 of 4".
https://bugzilla.mozilla.org/attachment.cgi?oldid=701824&action=interdiff&newid=705428&headers=1
Comment on attachment 705429 [details] [diff] [review]
round 38 - patch 3 of 4 - tools and testing enhancements - client side

Use this link to view the changes in this new revision of "patch 3 of 4".
https://bugzilla.mozilla.org/attachment.cgi?oldid=701832&action=interdiff&newid=705429&headers=1
Latest summary:

- Regarding Bob's comments on testing,
  I told him that bug 811331 implements it, and we are now waiting for Bob
  to complete his review of the patches in that bug.

- Regarding Bob's comment on the SSL state machine.
  I explained that the current result is already the result of some debates.
  He considers to accept the current code in order to get things checked in,
  and have us improve the state machine in a follow-up bug.

- Regarding Bob's other review comments for the patches in this bug:
  I believe I have addressed all of them in the "round 38" patches.

- no changes for patches "1 of 4" and "4 of 4" in round 38.
Minor adjustment for "patch 3 of 4", based on the checked-in fix for bug 833857.
Attachment #705429 - Attachment is obsolete: true
Attachment #705429 - Flags: review?(rrelyea)
Attachment #705603 - Flags: review?(rrelyea)
(In reply to Kai Engert (:kaie) from comment #143)
> (In reply to Robert Relyea from comment #141)
> > In patch 1 of 4, the handling of wait_cert_status as optional is unusual.
> > The current SSL engine handles optional commands by checking the status on
> > the next expected command, So whatever state your new ssl3_AuthCertificate()
> > would set, that expected Handle function should check for hs.ws ==
> > wait_cert_status, and call ssl3_AuthCertificate() at that point.
> > Then you could handle ssl3_handleCertificateStatus() in the normal swith. It
> > would make sure that hs.ws was set to wait_cert_status, and it could call
> > ssl3_AuthCertificate() as its last step.
> > 
> > This is more in line with how the SSL state machine normally functions.
> 
> Could you please have a look at the obsolete patch v16, in file ssl3con.c,
> and tell me if you prefer the way it's done in v16?
> 
> That older code uses a function "ssl3_AuthCertificateIfNeeded", which might
> get called from four (4) different places - depending on which command is
> seen after the cert/cert_status message.

(In reply to Kai Engert (:kaie) from comment #149)
> - Regarding Bob's comment on the SSL state machine.
>   He considers to accept the current code in order to get things checked in,
>   and have us improve the state machine in a follow-up bug.

I reworked the code so that ssl3_AuthCertificate is only called in two places instead of 5 or 6. I think it is too tedious (and dangerous) to have to evalulate whether we need to add the conditional call to ssl3_AuthCertificate for every new handshake message we need to support. So, while I agree that this style is different than the normal style in libssl, I think that the organization I chose is safer. Also, future patches to libssl will add new calls to ssl3_AuthCertificate for different reasons, and IMO it is worth simplifying the logic as much as we can now so that the added complexity from those future new uses is more bearable.
Kai: Do you have an updated target version for this bug?
Status: NEW → ASSIGNED
Target Milestone: 3.14.2 → 3.14.3
Comment on attachment 701823 [details] [diff] [review]
round 36 - patch 1 of 4 - client SSL protocol code

r+ I would like a separate patch to address the ssl state macine issue, but lets get this clear first.

brian- my complaint is the patch dies not follow how the ssl state machine works today I'm sympathetic to you woories, but adding and unexpected methode for handling optional messages were there us already an existing method has even more dangers. We should not put state machine desogn changes in peicemeal.
Attachment #701823 - Flags: review?(rrelyea) → review+
Comment on attachment 705428 [details] [diff] [review]
round 38 - patch 2 of 4 - cache injection and caching fixes

r+ relyea.
Attachment #705428 - Flags: review?(rrelyea) → review+
Comment on attachment 705603 [details] [diff] [review]
round 39 - patch 3 of 4 - tools and testing enhancements - client side

r+ but I would like the following:
1) remove the dead if 0 block.
provide a supplemental patch that can test SSL_Authenticate. We want to make sure we still regression test that function.
Attachment #705603 - Flags: review?(rrelyea) → review+
Comment on attachment 701833 [details] [diff] [review]
round 36 - patch 4 of 4 - prepare multi stapling

r+ now that the rest are approved.
Attachment #701833 - Flags: review?(rrelyea) → review+
(In reply to Robert Relyea from comment #155)
> provide a supplemental patch that can test SSL_Authenticate. We want to make
> sure we still regression test that function.

Bob, I think you were referring to SSL_AuthCertificate. I think the code path in tstclnt to test SSL_AuthCertificate still exists, please see the explanation below:


tstclnt does async cert validation by default
    shouldPause == TRUE

-O requests sync validation
    shouldPause == FALSE


In the past we had:
    if (serverCertAuth.shouldPause) {
	SSL_AuthCertificateHook(s, ownAuthCertificate, &serverCertAuth);
    } else {
	SSL_AuthCertificateHook(s, SSL_AuthCertificate, serverCertAuth.dbHandle);
    }

Now we always set:
    SSL_AuthCertificateHook(s, ownAuthCertificate, &serverCertAuth);


But the old code path still gets tested, because:
    ownAuthCertificate()
    {
        if (!shouldPause) {
            if ( ! "newly introduced, optional stapling test mode" ) {
                return SSL_AuthCertificate()
            }
            ...
        }
        ...
    }
(In reply to Robert Relyea from comment #155)
> r+ but I would like the following:
> 1) remove the dead if 0 block.

This patch removes the block.

The patch is otherwise identical to what Bob gave r+ to.
Attachment #711550 - Flags: review+
Blocks: 841792
Comment on attachment 701823 [details] [diff] [review]
round 36 - patch 1 of 4 - client SSL protocol code

Checking in security/nss/lib/ssl/SSLerrs.h;
/cvsroot/mozilla/security/nss/lib/ssl/SSLerrs.h,v  <--  SSLerrs.h
new revision: 1.24; previous revision: 1.23
done
Checking in security/nss/lib/ssl/ssl.def;
/cvsroot/mozilla/security/nss/lib/ssl/ssl.def,v  <--  ssl.def
new revision: 1.40; previous revision: 1.39
done
Checking in security/nss/lib/ssl/ssl.h;
/cvsroot/mozilla/security/nss/lib/ssl/ssl.h,v  <--  ssl.h
new revision: 1.60; previous revision: 1.59
done
Checking in security/nss/lib/ssl/ssl3con.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3con.c,v  <--  ssl3con.c
new revision: 1.202; previous revision: 1.201
done
Checking in security/nss/lib/ssl/ssl3ext.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3ext.c,v  <--  ssl3ext.c
new revision: 1.31; previous revision: 1.30
done
Checking in security/nss/lib/ssl/ssl3prot.h;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3prot.h,v  <--  ssl3prot.h
new revision: 1.23; previous revision: 1.22
done
Checking in security/nss/lib/ssl/sslerr.h;
/cvsroot/mozilla/security/nss/lib/ssl/sslerr.h,v  <--  sslerr.h
new revision: 1.26; previous revision: 1.25
done
Checking in security/nss/lib/ssl/sslimpl.h;
/cvsroot/mozilla/security/nss/lib/ssl/sslimpl.h,v  <--  sslimpl.h
new revision: 1.110; previous revision: 1.109
done
Checking in security/nss/lib/ssl/sslnonce.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslnonce.c,v  <--  sslnonce.c
new revision: 1.29; previous revision: 1.28
done
Checking in security/nss/lib/ssl/sslsock.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslsock.c,v  <--  sslsock.c
new revision: 1.100; previous revision: 1.99
done
Checking in security/nss/lib/ssl/sslt.h;
/cvsroot/mozilla/security/nss/lib/ssl/sslt.h,v  <--  sslt.h
new revision: 1.24; previous revision: 1.23
done
Attachment #701823 - Flags: checked-in+
Comment on attachment 705428 [details] [diff] [review]
round 38 - patch 2 of 4 - cache injection and caching fixes

Checking in security/nss/lib/certhigh/ocsp.c;
/cvsroot/mozilla/security/nss/lib/certhigh/ocsp.c,v  <--  ocsp.c
new revision: 1.78; previous revision: 1.77
done
Checking in security/nss/lib/certhigh/ocsp.h;
/cvsroot/mozilla/security/nss/lib/certhigh/ocsp.h,v  <--  ocsp.h
new revision: 1.25; previous revision: 1.24
done
Checking in security/nss/lib/ssl/sslauth.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslauth.c,v  <--  sslauth.c
new revision: 1.19; previous revision: 1.18
done
Attachment #705428 - Flags: checked-in+
Comment on attachment 711550 [details] [diff] [review]
round 40 - patch 3 of 4 - tools and testing enhancements - client side

Checking in security/nss/cmd/selfserv/selfserv.c;
/cvsroot/mozilla/security/nss/cmd/selfserv/selfserv.c,v  <--  selfserv.c
new revision: 1.103; previous revision: 1.102
done
Checking in security/nss/cmd/ssltap/ssltap.c;
/cvsroot/mozilla/security/nss/cmd/ssltap/ssltap.c,v  <--  ssltap.c
new revision: 1.24; previous revision: 1.23
done
Checking in security/nss/cmd/tstclnt/tstclnt.c;
/cvsroot/mozilla/security/nss/cmd/tstclnt/tstclnt.c,v  <--  tstclnt.c
new revision: 1.73; previous revision: 1.72
done
Checking in security/nss/tests/ocsp/ocsp.sh;
/cvsroot/mozilla/security/nss/tests/ocsp/ocsp.sh,v  <--  ocsp.sh
new revision: 1.4; previous revision: 1.3
done
Attachment #711550 - Flags: checked-in+
Comment on attachment 701833 [details] [diff] [review]
round 36 - patch 4 of 4 - prepare multi stapling

Checking in security/nss/cmd/tstclnt/tstclnt.c;
/cvsroot/mozilla/security/nss/cmd/tstclnt/tstclnt.c,v  <--  tstclnt.c
new revision: 1.74; previous revision: 1.73
done
Checking in security/nss/lib/ssl/ssl.def;
/cvsroot/mozilla/security/nss/lib/ssl/ssl.def,v  <--  ssl.def
new revision: 1.41; previous revision: 1.40
done
Checking in security/nss/lib/ssl/ssl.h;
/cvsroot/mozilla/security/nss/lib/ssl/ssl.h,v  <--  ssl.h
new revision: 1.61; previous revision: 1.60
done
Checking in security/nss/lib/ssl/ssl3con.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3con.c,v  <--  ssl3con.c
new revision: 1.203; previous revision: 1.202
done
Checking in security/nss/lib/ssl/sslauth.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslauth.c,v  <--  sslauth.c
new revision: 1.20; previous revision: 1.19
done
Checking in security/nss/lib/ssl/sslimpl.h;
/cvsroot/mozilla/security/nss/lib/ssl/sslimpl.h,v  <--  sslimpl.h
new revision: 1.111; previous revision: 1.110
done
Checking in security/nss/lib/ssl/sslnonce.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslnonce.c,v  <--  sslnonce.c
new revision: 1.30; previous revision: 1.29
done
Checking in security/nss/lib/ssl/sslsock.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslsock.c,v  <--  sslsock.c
new revision: 1.101; previous revision: 1.100
done
Checking in security/nss/lib/util/nssutil.def;
/cvsroot/mozilla/security/nss/lib/util/nssutil.def,v  <--  nssutil.def
new revision: 1.19; previous revision: 1.18
done
Checking in security/nss/lib/util/seccomon.h;
/cvsroot/mozilla/security/nss/lib/util/seccomon.h,v  <--  seccomon.h
new revision: 1.9; previous revision: 1.8
done
Checking in security/nss/lib/util/secitem.c;
/cvsroot/mozilla/security/nss/lib/util/secitem.c,v  <--  secitem.c
new revision: 1.19; previous revision: 1.18
done
Checking in security/nss/lib/util/secitem.h;
/cvsroot/mozilla/security/nss/lib/util/secitem.h,v  <--  secitem.h
new revision: 1.10; previous revision: 1.9
done
Attachment #701833 - Flags: checked-in+
Attached patch patch to fix testing deadlock (deleted) — Splinter Review
Checking in ocsp.sh;
/cvsroot/mozilla/security/nss/tests/ocsp/ocsp.sh,v  <--  ocsp.sh
new revision: 1.6; previous revision: 1.5
done
Attachment #714785 - Flags: checked-in+
All tests are green, marking fixed.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: 3.14.3 → 3.14.4
awesome news
Attached patch patch to fix .def versions (deleted) — Splinter Review
Thanks to Wan-Teh for noticing that I failed to update the version numbers in the .def files to the correct release.

Attaching patch to fix it.
Attachment #718505 - Flags: review?(wtc)
Comment on attachment 718505 [details] [diff] [review]
patch to fix .def versions

r=wtc.
Attachment #718505 - Flags: review?(wtc) → review+
Comment on attachment 718505 [details] [diff] [review]
patch to fix .def versions

checked in:
https://hg.mozilla.org/projects/nss//rev/fbac6f829f1c63378a3d6db1cd0cd599b3053842
Attachment #718505 - Flags: checked-in+
Target Milestone: 3.14.4 → 3.15
Blocks: 866363
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: