After some digging, it appears that the key pairs used to negotiate the shared secret that encrypts a TLS connection are actually ephemeral. The server certificate's private key is not used to encrypt the shared secret; instead, it signs a CertificateVerify message, which consists of padding, a context string, and a hash of the handshake messages (which include both client- and server-provided random data). I interpret this as meaning that it should not be possible for a client to make the server sign (decrypt) arbitrary data. In fact, if the server's key is only ever used for signing (both the handshake transcript of a TLS session and any certificates it issues) and never for encryption or decryption, then this attack should not be feasible (at least for TLS 1.3; I didn't research previous versions of TLS). Though my analysis may be off, I had enough confidence at this point that the aforementioned attack wasn't readily exploitable.
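As a concrete sketch, the content a TLS 1.3 server signs in CertificateVerify is laid out in RFC 8446 §4.4.3; the transcript input below is a placeholder, but the padding and context string are the real values:

```shell
# Build the to-be-signed content of a TLS 1.3 server CertificateVerify
# (RFC 8446 §4.4.3). The transcript here is a stand-in; in a real handshake
# it is the concatenation of the handshake messages so far, which includes
# both the client's and the server's random data.
{
  printf '\040%.0s' $(seq 1 64)                 # 64 bytes of 0x20 padding
  printf 'TLS 1.3, server CertificateVerify'    # fixed context string (33 bytes)
  printf '\000'                                 # single zero separator
  printf '%s' 'placeholder transcript' \
    | openssl dgst -sha256 -binary              # transcript hash (32 bytes)
} > to_be_signed.bin
wc -c < to_be_signed.bin                        # 64 + 33 + 1 + 32 = 130 bytes
```

The server signs this blob with the certificate's private key (e.g. via `openssl pkeyutl -sign`); at no point does that key decrypt client-chosen data.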
Nonetheless, given that other protocols may not make such clever use of keys, my (limited) understanding of cryptography has led me to be cautious about using the same keys for signing and encryption. According to Schneier, the aforementioned attack works when “...the encrypting operation is the same as the signature-verifying operation and the decrypting operation is the same as the signature operation,” and one of the methods to mitigate this potential attack is to use separate keys for signing and encryption. Thus I set out to design v0.2.0 of the infrastructure with this in mind.
Having the infrastructure drawn out, the next thought on my mind was: how am I going to convince myself that the infrastructure is viable? The answer I chose was to enumerate various use cases that I would want the infrastructure to support. I could then write test cases for each of them in order to convince myself of the infrastructure's integrity. While I was brainstorming the use cases, it became clear to me that there was a useful distinction between the reason for a use case and the mechanism for fulfilling it; for example, the admin may wish to invite a friend (the reason), but this is accomplished by issuing a friend's client certificate (the mechanism). Hence I decided to name all of my use cases by their mechanism, knowing that each may have multiple reasons for being used. The list was as follows:
The purpose of the “revoke-referred” test was to allow an admin to ban a referred user without having to wait for the referrer to issue a Certificate Revocation List (CRL) for the user. I had thought that I'd be able to use indirect CRLs to easily implement this test, but I couldn't figure out how to actually do so, and my searches kept turning up similar but ultimately irrelevant posts. For example, this issue seemed to me to be about how CRLs were searched for, not how to issue indirect CRLs. This question was similar to my need, but did not explain how to generate an indirect CRL. This lone question appears to focus on verification, but, without generation, is not useful to me. Most promising was this old forum post, which provided an example of an indirect CRL, but the example appeared to rely on the fact that the indirect CRL issuer was only issuing CRLs for a particular CA rather than for multiple CAs (the latter being my requirement); nonetheless I tried to get the example working but didn't meet with any success. While nothing solved my specific need, I did learn from this post that the purpose of the serial number is to uniquely identify a certificate issued by a given CA. For the rest, I took some time to study the RFC and hypothesize a way forward.
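From my reading of the RFC, the mechanism for marking a CRL as indirect is the issuingDistributionPoint extension's indirectCRL flag, which in OpenSSL configuration syntax would look something like the fragment below (the section names and URI are hypothetical); knowing the extension's shape, unfortunately, did not get me to a working end-to-end setup:

```
# Hypothetical openssl.cnf fragment for issuing an indirect CRL.
[ crl_ext ]
issuingDistributionPoint = critical, @idp

[ idp ]
fullname    = URI:http://crl.example.org/afr.crl
indirectCRL = TRUE
```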
Since I couldn't figure out how to implement “revoke-referred” properly, I came up with a stop-gap test instead. While there didn't appear to be a way to ban client certificates globally in InspIRCd, it was possible to ban a given certificate fingerprint on a channel, so I designed a test that would have a user join a temporary channel, set a ban on the channel for the certificate fingerprint of the other user, then have the other user attempt to join the channel and fail; for parity, a third user who wasn't banned on the channel would then join the channel successfully. Plan in hand, I attempted this only to find that whenever I joined the temporary channel I would not be an operator as I expected, thus I couldn't set any bans on the channel. After a bit of searching I found that my test server's InspIRCd options tag was set with defaultmodes="nt" when it should have been defaultmodes="not", talk about Murphy's Law! Fixing this issue allowed me to set a ban on the channel and implement the test.
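Sketched as an IRC session, the test looks like this (nicks and channel name illustrative; the z: extended ban matches a TLS certificate fingerprint and comes, if I recall correctly, from InspIRCd's sslmodes module):

```
alice:   JOIN #afr-test
alice:   MODE #afr-test +b z:<bob's certificate fingerprint>
bob:     JOIN #afr-test        <- rejected by the fingerprint ban
charlie: JOIN #afr-test        <- succeeds, since charlie isn't banned
```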
Implementing the next test went smoothly, but I then realized that there wasn't much point to implementing the last three tests (“friend-revoke-client”, “friend-revoke-referrer”, and “referred-revoke-client”) at this time, since there does not appear to be a way for users to self-revoke their certificates, and having the admin revoke the friend certificates (for the former two) and the friend revoke the referred certificate (for the latter-most) would simply be a repeat of previous tests. These tests would make sense if I had some kind of network protocol for certificate management and could mimic a client authenticating with their certificate and requesting a revocation on the given certificate, but they did not appear useful as tests with the current implementation.
With tests implemented which verified that the infrastructure appeared to work as I expected (indirect CRLs aside), I began to look ahead to actually deploying it. Given the sheer volume of complex commands I had to issue for testing, I decided that there was no way I could realistically run these commands by hand and be sure that I had done the same as in my tests. I would have to develop tools to do this work for me.
The next choice I had was where to put everything. After consulting the Filesystem Hierarchy Standard v3, I settled on /var/lib/afr/instance, where instance might one day allow multiple independent AFR PKIs; though I would be exposing some information in that directory to consumers, the /srv path felt like it was more for Web services. System AFR configuration would go to /etc/afr/afr.conf, and I also ended up sticking the OpenSSL configuration file in the same directory, though, in retrospect, /usr/share/afr might have been more appropriate.
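The resulting layout, sketched out (the instance name is a placeholder):

```
/var/lib/afr/<instance>/   # PKI state, one directory per AFR instance
/etc/afr/afr.conf          # system AFR configuration
/etc/afr/openssl.cnf       # OpenSSL configuration (same directory, for now)
```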
Layout in place, I began to implement the basic admin use cases, and found that I needed to add auxiliary commands in order to implement parts of certain use cases; for example, receive-crl, which applies a CRL issued by a referrer certificate. Once the admin use cases were done, I then had to figure out how to implement the use cases for friends and referreds. While it might have been possible to add non-admin-specific subcommands, names such as friend-init and friend-revoke-referred would be cumbersome for users to type, and users would be liable to conflate non-admin subcommands with admin subcommands such as the regular init. The solution that I chose was to add a separate afrc command for clients with its own set of subcommands; functionality common to both afr and afrc would be put into afr_lib.sh, and both friends and referreds would use the same command set due to the similarities between the two user types. I then implemented the subcommands needed for client use cases, although the implementation was slightly awkward, since I forced clients to use make install in order to place afr_lib.sh and openssl.cnf in expected locations; I hadn't thought up a particularly easy workaround for that, and it was “good enough”.
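The split can be sketched as a thin dispatcher over a shared library; everything below is illustrative (the subcommand names and messages are hypothetical, not the actual AFR implementation):

```shell
#!/bin/sh
# Hypothetical sketch of afrc's subcommand dispatch. In the real layout the
# shared helpers live in afr_lib.sh, sourced by both afr and afrc.
afrc() {
  case "${1-}" in
    init)            echo "init: generate a client key and certificate request" ;;
    revoke-referred) echo "revoke-referred: revoke a referred user's certificate" ;;
    *)               echo "usage: afrc {init|revoke-referred}" >&2; return 2 ;;
  esac
}

afrc init
```

Keeping the client-facing subcommands under a separate afrc entry point avoids the friend-init naming clutter while still reusing the common library.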
A trick that I used in order to convince myself of the basic robustness of my tooling was to refactor the AFR tests that I'd written for InspIRCd to use the new AFR utilities. With each AFR subcommand implemented, I'd refactor the test to use the corresponding subcommand rather than raw OpenSSL commands. Thus, by the time I'd finished writing the tooling, I had tests that verified the tooling and the PKI! While this was good news for when I deployed a new AFR PKI, I still had to figure out how I was going to migrate from my existing PKI to the new AFR PKI.
I'm not sure why it took reading about cross-certification for this vulnerability to occur to me, but, as I studied path construction, I realized that the AFR PKI had nothing in it that would prevent a referrer certificate from issuing server certificates! D'oh. Thankfully, the solution was straightforward enough: add a critical extension to the signing certificate and all referrer certificates which only authorizes those certificates for client authentication; specifically: extendedKeyUsage=critical,clientAuth. Strangely enough, the RFC section on Extended Key Usage states: “In general, this extension will appear only in end entity certificates.” This caused me a bit of worry, since a malicious referrer could leave the extension off an issued service certificate, but thankfully my test showed that the extension was effective despite not being on the end-entity certificate. With that fixed, I could then begin brainstorming a transitional PKI.
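In OpenSSL configuration terms, the constraint amounts to something like the following fragment (the section name is hypothetical), applied when issuing the signing and referrer certificates:

```
# Hypothetical extension section; basicConstraints shown only for context,
# since referrer certificates must themselves be CAs to issue client certs.
[ referrer_ext ]
basicConstraints = critical, CA:TRUE
extendedKeyUsage = critical, clientAuth
```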
I had referred to the previous PKI (v0.1.0) as “Friend-of-Friend”, and it was structured as such:
After a bit of contemplation, I realized that I could authorize friends using client certificates issued by the old PKI by signing the old root certificate as a referrer certificate. As a downside, any friend-of-friend certificates issued by friends wouldn't be authorized due to the pathlen:0 restriction on the cross-signed old root certificate, but no one had yet bothered to use that feature (surprisingly, users don't like typing in esoteric OpenSSL commands) so the solution was “good enough”. This then gave the following series of PKI paths:
Configuring the service such that clients who trusted both the old root and the new root would be able to authorize the service was a bit trickier. This is in part because the validation path would branch depending on which trust anchor the client was using, and the trust anchor would in either case be outside of the certificates sent by the server. The trick here was to have the old root certificate sign the new root certificate, then have the server present both the cross-signed new root certificate and the service certificate:
Naturally, I wanted to make sure that the planned infrastructure actually worked, so I wrote some transition tests for that purpose; in addition to verifying that old and new certificates worked with the transition PKI, I made sure to check that new certificates did not work with the old PKI. I did have some worries that the client software which trusted the new root would not authenticate the server due to the extraneous, cross-signed certificate sent by the server, but, to my relief, the tests showed otherwise. All of the tests ended up passing as I had hoped.
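The cross-signing trick and both validation branches can be exercised end to end with throwaway keys; this is a self-contained sketch, not the actual AFR tooling, and all file names and subjects are illustrative:

```shell
#!/bin/sh
# Demonstrate that a service certificate issued by the new root validates
# from either trust anchor once the old root cross-signs the new root.
set -e
tmp=$(mktemp -d); cd "$tmp"

# Two self-signed roots standing in for the old and new PKI roots.
openssl req -x509 -newkey rsa:2048 -nodes -subj /CN=old-root -days 2 \
  -keyout old.key -out old.pem 2>/dev/null
openssl req -x509 -newkey rsa:2048 -nodes -subj /CN=new-root -days 2 \
  -keyout new.key -out new.pem 2>/dev/null

# Old root cross-signs the new root (as a CA, so it can appear mid-chain).
printf 'basicConstraints=critical,CA:TRUE\n' > ca.ext
openssl x509 -x509toreq -in new.pem -signkey new.key -out new.csr 2>/dev/null
openssl x509 -req -in new.csr -CA old.pem -CAkey old.key -set_serial 1 \
  -days 2 -extfile ca.ext -out new-cross.pem 2>/dev/null

# Service certificate issued by the new root.
openssl req -newkey rsa:2048 -nodes -subj /CN=service \
  -keyout svc.key -out svc.csr 2>/dev/null
openssl x509 -req -in svc.csr -CA new.pem -CAkey new.key -set_serial 2 \
  -days 2 -out svc.pem 2>/dev/null

# Branch 1: the client trusts the old root, so the server must also present
# the cross-signed new root. Branch 2: the client trusts the new root directly.
openssl verify -CAfile old.pem -untrusted new-cross.pem svc.pem
openssl verify -CAfile new.pem svc.pem
```

Both verify commands should report the service certificate as valid, mirroring the branch in the validation path depending on which trust anchor the client holds.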
One last thing that I did in order to help remind users to transition was to write a quick, hacky patch that would warn users with old client certificates that they should upgrade them. Finally, since the tests were functional and the tooling was in place, I decided that it was time to deploy the changes.
With the server running the new AFR PKI, all that remained was for me to document my work. You know, so I can use it to pick up women at bars. Posteriority. Or something. To this end, I decided to write a LaTeX white paper and add it to the AFR tools doc directory. I then realized that, much to my annoyance, re-configuring my test environment to use AFR had broken my hacky InspIRCd tests, so I wrote a commit to fix them; perhaps one day I'll implement an elegant solution for managing test environment configuration. Last of all, I wrote (am writing?) this blog post in order to finish documenting my work; this was most useful for the transitional PKI, since that fell outside the scope of the AFR white paper.
There's still a bunch of work that I'd like to do on AFR in order to get it into something that might actually be generally useful for people. First and foremost will be figuring out whether I can actually even use indirect CRLs in a manner which suits my needs; if not, the whole scheme may fall apart without some clever alterations. After that, usability will be key. This may mean writing a network protocol, integrating AFR with a technology such as PAM, or writing some kind of wizard program. While I'm glad to have this version complete, there's still plenty of work to be done.