The news broke yesterday that the entire trove of diplomatic cables held by WikiLeaks is now public and unredacted, due to a bizarre cryptographic mixup between WikiLeaks and the Guardian. WikiLeaks broke the story, after keeping quiet about it for months (I think people outside of the organisation had started to put the pieces together), accusing the Guardian, a UK newspaper, of publishing a book that negligently divulged the password to the encrypted file. The Guardian fired back, claiming that WikiLeaks said the password was temporary and that it’s their fault for making the encrypted data available. I don’t want to go into the politics here — this is a technical blog. I have already seen a lot of confusion online, both in the media and in the more technically literate (such as on Slashdot). I want to put together this FAQ to answer some of the common criticisms. Here I will try to remain politically impartial, and answer these questions from a cryptography standpoint. I’m not a qualified cryptographer (I don’t know where to begin a cryptanalysis of AES), but I do have a detailed understanding of the technology involved here. I would appreciate any corrections you might have in the comments section.
I am not affiliated with WikiLeaks. I am piecing together this information from various sources, but as you shall see, the facts are really not in question here (at least, everything relevant to a cryptographic discussion of the situation) — what is in question is who is at fault, and cryptography can tell us that.
The bottom line is, as I wrote yesterday, that cryptographically, WikiLeaks is in the right and Guardian is in the wrong. From what I can gather, WikiLeaks followed reasonable security expectations, and Guardian broke them. I would encourage anyone who places the blame on WikiLeaks to read this FAQ in detail (or at least skim through for any questions you may have).
The executive summary
It makes no sense to blame WikiLeaks. As I will address below, the security practices could have been improved to prevent this from happening, but I don’t think it is reasonable to expect WikiLeaks to plan for this scenario. By definition (by giving the cables to David Leigh, the Guardian editor responsible for disclosing the passphrase), WikiLeaks chief Julian Assange was trusting Leigh with everything. There are a million things Leigh could have done to break the security. He could have left the cables sitting unencrypted on a laptop connected to the Internet (and in fact, according to WikiLeaks, he did that too). He could have uploaded the cables to a public server if he’d been malicious. Assange had to trust that he wasn’t going to be malicious or stupid with the cables, by the definition of the operation. Divulging the password was just another possible malicious/stupid thing that Assange had to trust Leigh not to do, and he did it. In any other security model they could have used to transfer the cables, there are equally stupid things Leigh could have done to mess up. So I feel that it is unfair to blame WikiLeaks for not predicting Leigh’s stupidity.
Ignoring stupidity on the part of trusted parties, the system as described above was entirely secure. This is my main point: WikiLeaks made an encrypted file public. Guardian made the passphrase public. Either one of these is safe on its own, but if both are public, the system breaks. So clearly, one of the two parties is to blame. The clear cryptographic answer is: Guardian is to blame. When we use encryption, it is entirely expected and normal for the encrypted text to be visible to the public. If the encrypted text isn’t visible to the public, it can’t possibly be transmitted. You can’t run a secure system under the assumption that an encrypted file will not be seen by others. The entire point of cryptography is that we transmit the encrypted text “in the clear” (without further encryption). On the flipside, the entire point of cryptography is that we don’t divulge the encryption key. So WikiLeaks was in the right to make the encrypted file public (however that happened), assuming that the passphrase would be kept private. The Guardian was in the wrong to make the passphrase public, assuming that the encrypted file would be kept private. By definition, the encrypted file was public because it was available from a public server for at least a few hours.
Again, Leigh was a non-technical person, so we can’t expect him to have understood all of these subtleties. But let’s remember what we’re dealing with here: arguably the most important secret documents in history. The man should have gotten a better technical understanding of cryptography before he did this, and failing that, given that he was not an expert, he should not have presumed it was safe to disclose the password. Even assuming no technical knowledge, it is completely idiotic to publish any kind of password, even an expired one. If nothing else, it would have been safe in case Assange did re-use the same password again. And just to dig a bit deeper: Assange took the extra step of creating a salt, the additional word that he told Leigh to remember and insert into the password but not write down, for the express purpose of protecting the data in the event that someone got a hold of the piece of paper. The final, clinching, idiotic move is that Leigh wrote the salt in the book as well as the password — the one thing he was never supposed to write down.
What are the facts?
Firstly, let me state the facts as I see them. Feel free to skip this lengthy section, but I will refer to it later on. I’ll begin by quoting a passage from the book WikiLeaks: Inside Julian Assange’s War on Secrecy. This book, published in February 2011, is available in book stores and can be found online. It was written by two Guardian journalists David Leigh and Luke Harding. Chapter 11, entitled “The Cables,” details the story of how Julian Assange, the editor in chief of WikiLeaks, sent over 251,000 U.S. diplomatic cables to David Leigh, for the purpose of redacting and publishing the cables via the Guardian. The following passage appears verbatim from the book, except that I have replaced the passphrases with XXXs (if you want to find them, they are readily available online as of yesterday). This is largely the same passage that WikiLeaks quoted in their release (but I got it from another copy to be sure):
Eventually, Assange capitulated. Late at night, after a two-hour debate, he started the process on one of his little netbooks that would enable Leigh to download the entire tranche of cables. The Guardian journalist had to set up the PGP encryption system on his laptop at home across the other side of London. Then he could feed in a password. Assange wrote down on a scrap of paper: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX. “That’s the password,” he said. “But you have to add one extra word when you type it in. You have to put in the word ‘XXXXXXXXXX’ before the word ‘XXXXXXX’. Can you remember that?”
“I can remember that.”
Leigh set off home, and successfully installed the PGP software. He typed in the lengthy password, and was gratified to be able to download a huge file from Assange’s temporary website. Then he realized it was zipped up – compressed using a format called 7z which he had never heard of, and couldn’t understand. He got back in his car and drove through the deserted London streets in the small hours, to Assange’s headquarters in Southwick Mews. Assange smiled a little pityingly, and unzipped it for him.
This is about all the information I need to go on, and it’s written in Leigh’s (or his associate’s) own words, so it is difficult to dispute. WikiLeaks quoted this above passage, adding that the reason for having to add the extra word ‘XXXXXXXXXX’ was “so if the paper were seized, the password would not work without Leigh’s co-operation.”
The Guardian’s full statement from yesterday appears below:
It’s nonsense to suggest the Guardian’s WikiLeaks book has compromised security in any way.
Our book about WikiLeaks was published last February. It contained a password, but no details of the location of the files, and we were told it was a temporary password which would expire and be deleted in a matter of hours.
It was a meaningless piece of information to anyone except the person(s) who created the database.
No concerns were expressed when the book was published and if anyone at WikiLeaks had thought this compromised security they have had seven months to remove the files. That they didn’t do so clearly shows the problem was not caused by the Guardian’s book.
Damning for WikiLeaks, and the media have sure picked up on it. Essentially, WikiLeaks was incompetent in their handling of the situation. If WikiLeaks had done their job properly, it wouldn’t have mattered whether or not the password was published.
Except that because I understand the cryptography, I don’t need the Guardian to tell me what WikiLeaks “should” have done. I can piece together the blame just from reading the story of what went down that night when Assange gave the cables to Leigh. The only interesting claim from Guardian’s statement is this: “we were told it was a temporary password which would expire and be deleted in a matter of hours.” That is the only fact that is in dispute here — not whether or not the password was temporary (it is an indisputable fact that it wasn’t), but whether or not WikiLeaks told the Guardian that it was temporary. If they did, then some of the blame can be put on them. But just some. WikiLeaks vehemently denies that they told Guardian that the password was temporary, tweeting today: “It is strictly false that the Guardian was told the password or file were temporary, hence the elaborate password handover method.” We will never know whether WikiLeaks did or did not tell Guardian that the password was temporary. But we can deduce from the facts that the password was not temporary, and anyone with a basic knowledge of security (certainly anyone entrusted with the most politically explosive documents in history) should have known not to publish a password that a serious-looking white-haired Aussie in a black trench coat gave to you on a piece of paper and then explicitly told you a second part of the password which you were not to write down.
So the password is now public knowledge. But what about the archive? The password is useless without access to the encrypted files. We know that the encrypted archive is circulating around the torrents as we speak. It shouldn’t be too hard to find. The question is, where did it come from? Obviously it was originally created by Julian Assange. The details on how it leaked out are very sketchy, but most articles seem to point to this anonymous PasteBin post which seems to link things together (hyperlinks mine):
No-one took note of the Leigh book since where the encrypted file was located was a mystery! Enter the 2nd bad guy of our story: Daniel Domscheit-Berg. DDB said he didn’t know the password before reading the Leigh book, but apparently *did* know the hidden file name on Bittorrent. Using this these two facts (pw+hidden file location), he then went around ingratiating himself with various players by handing them the entire Cablegate archive under the mutually deniable cover of “warning” them about the Leigh book. Enraged after being expelled from the CCC [Chaos Computer Club] he “gave” the cables in this way to more and more people in exchange for alliances and positive spin culminating with the now infamous Freitag and Information.dk articles and now the thing is fucking everywhere…
So it’s hard to say how that file leaked out, but speculation says that since it was on the WikiLeaks server temporarily, and WikiLeaks was aggressively mirroring their site to avoid being taken down, it was copied within the few hours that it was available online, and spread from there. Spiegel reports (courtesy Jonathan for the link) that the file was left on the WikiLeaks server for an extended time, and that Daniel Domscheit-Berg took a copy of the data when he left, and it spread from there. This seems to be corroborated by a report by Nigel Parry (an excellent write-up of the whole debacle — courtesy Will for that link). In any case, it is largely irrelevant how this file got out, because it was already public in the first place, and that shouldn’t have been a problem.
So it seems, the facts were these:
- Assange created an encrypted archive with PGP, using symmetric encryption (passphrase). It is unclear when this took place (whether this was created specifically for Leigh, or whether it was created earlier).
- Around July 2010, he met Leigh in person, giving him most of the passphrase on a piece of paper, and verbally giving him the remainder of the password as a “salt”. It is unclear whether Assange told Leigh the password was temporary or would expire.
- Assange uploaded the encrypted file to the public WikiLeaks server, where it was visible to the public in its encrypted form. It is unclear (and irrelevant) whether SSL was used.
- Leigh downloaded the encrypted file from the WikiLeaks server.
- Leigh used PGP and the passphrase given to him by Assange to decrypt the file.
- Somehow, the encrypted data file ended up in circulation on BitTorrent. The facts surrounding this are sufficiently shady that I won’t go into details here.
- About seven months later, in February 2011, Leigh published a book containing the full passphrase given to him by Assange, including the “salt” that he was told to remember.
- WikiLeaks immediately found out about this, but kept quiet so as not to draw attention to it.
- In August 2011, rumours begin to circulate on the Internet that both the encrypted file and the password are readily available.
- On 1 September, WikiLeaks makes a public announcement calling out the Guardian.
Another thing that is unclear is whether WikiLeaks created a separate archive/passphrase for each journalistic organisation they shared the cables with, or whether there was one passphrase shared between all the journalists. Either scenario is feasible. Many people are denouncing the “stupidity” that Assange would reuse the same passphrase that he gave to Leigh when he later distributed the file, but given that we don’t know where the encrypted file came from, it may in fact be that he only used this passphrase one time, and the archive that spread on BitTorrent is the same one that was only available for Leigh to download for a short time.
Assuming the above facts, I will address most of the questions or points of misinformation that I have seen in the past 48 hours.
Why didn’t Assange transmit the file over a secure connection, such as SSL?
We don’t know whether SSL, (TLS) was used, and it doesn’t matter. SSL doesn’t authenticate the client, it authenticates the server. If Leigh had connected to the WikiLeaks server over SSL, then he would have known he was talking to WikiLeaks, but WikiLeaks wouldn’t have known they were talking to Leigh. To demonstrate, visit https://www.google.com/accounts/. This page is sent to your browser via a secure connection, so you know you are talking to Google. But anybody in the world can visit that page. Google doesn’t know who you are (yet).
The file didn’t leak because of a man in the middle capturing the HTTP traffic between WikiLeaks and Leigh. It leaked because it was available on a public server. If I haven’t answered your question, please read the next one.
Okay, why didn’t Assange require that Leigh log in to the server (via SSL) to download the file?
So firstly, let’s assume that we’re using this as an alternative to PGP — that is, the file is made available to Leigh in plain text, but that it required a secure (SSL) log-in with a password that only Leigh knew. This would work much the same way as requiring someone log in to Gmail to retrieve a file — there would be a secure connection between WikiLeaks and Leigh and this time WikiLeaks would know that Leigh was the one downloading the file. Nobody else could log in to that file, and nobody could snoop the HTTP traffic between WikiLeaks and Leigh. Importantly, it would have been a temporary password, as Leigh believed. Once Leigh finished downloading the file, WikiLeaks would permanently delete it, ensuring that the password was now useless. In hindsight, perhaps this would have been a better idea (but once again, Assange would have to have predicted that Leigh would be stupid enough to disclose the passphrase).
It’s a reasonable solution, but has a few technical flaws. Firstly, you have to assume that WikiLeaks is set up with server software that handles authenticated file downloads. Such software is readily available as open source, but takes time to set up and understand its security implications. Also, no software is perfect, and such programs are commonly found to have security flaws which let the wrong person log in. So it would have been more dangerous to trust such software. (PGP, on the other hand, isn’t being used for authentication, but for encryption — there is no software bug that would let someone without the passphrase decrypt the file, since the passphrase is mathematically required to decrypt it.)
Secondly, we would have to assume that the connection to the server and Leigh’s client were secure. SSL has a number of well-known vulnerabilities which make it inappropriate for security of this importance. SSL requires verification from a trusted root certificate authority. WikiLeaks does not seem to have a verified certificate, and given the current political climate, it might be difficult for them to get one. Either way, a man in the middle could be using a dodgy certificate to snoop on the traffic as it is transferred. This also assumes that Leigh’s browser is clean: it doesn’t have any bad certificate authorities installed, it doesn’t have any dodgy browser extensions, he isn’t using Internet Explorer 6, and so on. Any spyware on his machine could grab either his authentication password or the document itself and transfer it to another party. That is why Assange explicitly requested that the cables only be decrypted on a computer not connected to the Internet. If the cables were transferred in the clear over SSL, they would have immediately become stored on a machine connected to the Internet — not to mention the fact that WikiLeaks would have had to store the cables unencrypted on their machine too.
So it would have been much less secure to use authenticated SSL instead of PGP. But that doesn’t mean Assange couldn’t have combined both of these approaches for maximum security. To pull this off, he’d have to have given Leigh two separate passwords: one for the authenticated login, and one for the decryption. He’d have to have set up a web server with SSL and an authentication system, and created a user account for Leigh, with the password given, then encrypted the file with PGP and uploaded it into Leigh’s private directory on the server. Then, he would have had to ensure that Leigh was actually handshaking with his server, and not a man in the middle — WikiLeaks would need a valid SSL certificate. He could have obtained one from a CA, but as I said above, that could have been difficult given the political state. He could alternatively have self-signed it, and then instructed Leigh to install the WikiLeaks root certificate into his browser. Then he would still have had to trust that there was no dodgy software on Leigh’s computer and that his browser was up to date. And after all that, he would still have had to tell Leigh to take the encrypted file off the computer onto a separate offline machine in order to run PGP to decrypt the file. If, after all that, Leigh disclosed the password, it would probably have been okay, because it is unlikely that anybody would have intercepted that encrypted file.
But think about it from Assange’s perspective (without the benefit of hindsight). Why would you go to all the trouble of setting up and training someone to use the above setup (someone who, by his own admission, isn’t even capable of using 7-zip — imagine guiding him through the steps of installing a root certificate in the browser), when a) your basic security is already bulletproof assuming nobody divulges the passphrase, and b) the additional more complex security is riddled with the sorts of holes I mentioned above? I just wouldn’t consider it to be worth the effort. I would consider “just” PGP to be “good enough”. You might say, “but Julian, this is a super duper top secret document, and you really should go to the extra effort.” You may have a point. But I don’t think it’s fair to blame Mr. Assange for “only” using the most robust encryption known to man once, rather than using it twice.
Let’s be clear on how secure the PGP solution is: You are sending encrypted data over the Internet. I still can’t believe that I’m reading technical people berating WikiLeaks for sending an encrypted message “in the clear” (i.e., not taking pains to prevent the encrypted document from going public) — this is a silly argument because by definition, an encrypted message sent “in the clear” is not in the clear — it is encrypted.
Why didn’t WikiLeaks use asymmetric (public/private key) cryptography instead of symmetric (passphrase)?
It wouldn’t have made a significant difference to the security. A lot of people are saying this, going on a general understanding that “asymmetric cryptography is better than symmetric.” Asymmetric cryptography is useful for a lot of things symmetric isn’t, but that doesn’t make it intrinsically “better.”
The asymmetric solution would have been for Leigh to generate a public/private key pair using PGP, and keep his private key safe. Leigh would have had to send his public key to Assange, who wouldn’t be able to trust that it belonged to Leigh because it was sent over the Internet. They would still have to have met in person in order to exchange Leigh’s public key (for Assange to be totally confident that the public key did in fact belong to Leigh) — this sort of in-person meeting is not out of the ordinary for even normal people not engaged in secret international diplomacy. Then Assange would have been able to encrypt the document using Leigh’s public key and send it over the Internet to Leigh, who would then have used PGP to decrypt the document using his private key. We shall assume that, as above, the encrypted document managed to get out into the public sphere.
Note the similarities here between the symmetric and asymmetric version. It would still require an in-person meeting, and it would still require that Leigh kept on his computer a secret which would expose the document if it ever got out. You might say that Leigh wouldn’t have divulged his private key, because he knew how important it was, but then again, you might have thought the same thing about a super top secret passphrase. Fundamentally, both systems are equivalent.
The reason why asymmetric cryptography is useful over symmetric is that it doesn’t require individualised in-person key exchanges. Once you have established the trustworthiness of a public key, you can send encrypted documents to that person forever. Since this was presumably a one-time exchange, and the two would have had to have met in person anyway to be very confident in establishing trust, there is no advantage in this case of using asymmetric cryptography over symmetric.
Why didn’t WikiLeaks choose a meaningless passphrase?
This is a criticism I share. If you have seen the actual passphrase, you will know that it is an ordinary English phrase which is fairly descriptive of the source material. It is long and properly padded with random characters, so would likely never have been brute forced. But it still would have been best if the password was a sequence of random characters, not a sentence. In particular, Leigh possibly wrote the passphrase down in the book to show how tantalising it was when he saw the passphrase. In that sense, the passphrase forms part of his narrative, in a way that “6105ffc4#098f#4f99#acab#c2bcf7b71bd4” would not have. A similar argument would apply to the private key option discussed above (by definition, a private key is not a meaningful phrase).
But once again, Assange would have had to reason “if I choose an English passphrase, Leigh might be tempted to write it down in his book, so I had better choose a random string of digits.” That thought probably never crossed his mind, because the idea that Leigh would write a passphrase in a book just because it was an English passphrase is nonsensical.
Why didn’t WikiLeaks remove the file immediately after the transfer?
It’s not clear to me whether they did or didn’t do this. Spiegel claims that Assange simply left the file there for an extended period of time. This probably wasn’t wise, but again, it wasn’t tremendously irresponsible either, given that the file was encrypted, and had already been public for several hours anyway. Assange could have assumed that the file was already public, and so if the password leaked out, the damage would be done either way. Still, if this is true, it doesn’t sit well for Assange. It would have been a good precaution to delete the file anyway.
Why didn’t WikiLeaks remove the files once they discovered the password printed in the book?
It was too late. This was the Guardian’s retort when this whole thing exploded yesterday: “No concerns were expressed when the book was published and if anyone at WikiLeaks had thought this compromised security they have had seven months to remove the files.” This statement betrays a fatal misunderstanding of the issue here. By the time the book was published, WikiLeaks had no control over that file. It wasn’t a password to a file on the WikiLeaks servers. It was a password to a file that was, by that time, mirrored on hundreds of other servers and circulating the Internet at large. There was no way to delete it. On the other hand, there wasn’t really any evidence linking the password to the file, so there was a chance that nobody knew about it. WikiLeaks did the best thing they could do: they said nothing about it to anyone. Seven months later, they finally broke their silence after it became clear that the Internet was piecing things together.
Why didn’t WikiLeaks simply set the encrypted file to expire after a certain period of time?
This is one of the most common criticisms. Many technical people have stated: “PGP lets you set keys to expire at a certain date. Assange could have used that technology.”
It isn’t true. In fact, if you have such technology, Microsoft, Sony and the MPAA would love to hear from you, because that would be the holy grail of DRM technology. You can’t make bits that “expire” or “self-destruct” — they are infinitely copyable and don’t disappear unless every single person who has a copy of them simultaneously and voluntarily deletes them. That tends not to happen.
When people say “PGP lets keys expire,” they’re thinking of signatures, not encryption. PGP has two main uses: signing files (“is this file really from the person who claims to be its author?”) and encrypting files (“nobody can read this file except its intended recipient”). On the opposite side, you can verify a signature or decrypt a file. When verifying a signature, it is in the client (the one verifying the file)’s best interests to fail if there is any doubt, so the client will first check that the signature is valid, and then check that the key’s expiration date is in the future compared to the computer’s clock. You could set your computer’s clock back if you wanted to trick your computer into passing the verification, or hack the source to PGP to disable this check, but then you would only be hurting yourself: you’d be tricking yourself into thinking the signature is valid when in reality it might not be.
With decryption, it’s the other way around. It is in the client (the one decrypting the file)’s best interests to succeed wherever possible. I can imagine a version of PGP that first decrypts the file, then checks whether the key’s expiration date is in the future compared to the computer’s clock, and if it isn’t, it says “could not decrypt.” But that would be pointless, because any hacker who really wanted the contents of that file would either set her clock back, to trick the computer into passing the time check, or hack the source to PGP to disable it. Since there is this obvious possibility, PGP doesn’t include that feature at all. You cannot make bits rot.
Returning to the DRM analogy, it has been pointed out that companies like Ubisoft and Blizzard make games that require an “always-on” Internet connection to work. Why couldn’t Assange put similar DRM on the files, requiring Leigh to be always connected to his server to access it (and therefore, have the ability to revoke access when necessary). The problem is that DRM is just all of the above discussion wrapped in a different coating. Most DRM relies on security through obscurity — the video game software is hard-coded to fail without an Internet connection, but the data is still present locally. That’s why most of these games get hacked (and if people are willing to put the time into hacking Driver, surely it’ll be worth it to get the WikiLeaks cables). The “phone home” DRM would come in one of two forms: either the data is stored locally, but it is encrypted, and the decryption key would be sent from the server, or the server would be streaming the data live upon request. Both of these are vulnerable to the same problems as the SSL scenario I talked about above: either you are sending the key, or the cables, over the wire, and that isn’t good. It is much safer for Assange to meet Leigh and hand him a piece of paper than to be sending keys over the network. It is really not feasible to use DRM to make a document expire after some time.
It is possible that Leigh believed that Assange had used this sort of technology, not being a technical person. But that was his misunderstanding — one that he should have clarified before publishing the passphrase in a book.
Why did Assange tell Leigh that it was a temporary password?
Whether or not he said this is something we’ll never know the answer to, since it’s WikiLeaks’ word against the Guardian. It’s not scientific of me to make guesses like this, but I’m going to, because I know Mr. Assange’s reputation. Before he was a WikiLeaks activist, Assange was a cryptography researcher. He created the Rubberhose file system to allow people to safely carry digital secrets without divulging their existence. I cannot say for sure what Assange told Leigh about that passphrase, and I have never met Mr. Assange, but judging by his reputation alone, he knows cryptography inside out. He knows which pieces of information are safe to divulge, and which aren’t. I find it hard to believe that Assange would have accidentally told Leigh that this was a temporary password, when we know just by virtue of the fact he used PGP that it wasn’t temporary.
If I can make some further speculation, I would imagine that Assange told Leigh something along these lines: “I am going to give you access to a file on my web server that will be temporarily available. After a few hours, the file will not be available any more, so you have to download it soon. Also, here is the password which you can use to decrypt the file.”
It’s possible that a non-technical person may have misunderstood the above sentences as suggesting that the password would be useless after those few hours. That still doesn’t excuse the divulging of a password. If someone says something about a red button which you didn’t fully understand, it is probably not a good idea to push the red button.
Also, I can only imagine that Assange did stress the utmost importance of keeping the password secure, and not writing down the additional “salt” word — after all, why would he tell him to remember the salt in the first place if it was safe to write it down?
Why did WikiLeaks re-use the same password for subsequent distribution of the same file?
It isn’t clear how that encrypted file got out, or whether or not WikiLeaks reused the password. This is an (as far as I can tell) unsubstantiated claim that is going around. But let us assume that this is the case. That is: We shall assume that subsequent (or prior) to their dealings with Leigh, WikiLeaks distributed the same cables to other journalistic organisations (as they are known to have done), and in their haste, used the same passphrase to avoid re-encrypting the data. I have found no evidence to support this, but let’s assume it for the time being.
It has been claimed that this violated the principle of discretionary access control, or similar: much like you wouldn’t let different users run software under the same user account, you shouldn’t use the same password for different organisations. There is some wisdom to this. Let us assume that WikiLeaks gave the encrypted file to N organisations, and that there is a probability p, where 0 ≤ p ≤ 1, that any given encrypted file will leak out, and another probability, q, where 0 ≤ q ≤ 1, that the corresponding passphrase will leak out. If you make a separate encrypted bundle for each organisation, then the document will be exposed if and only if one of the encrypted files leaked out, and the passphrase corresponding to that same file also leaked out. It doesn’t matter if the Times encrypted file is leaked and the Guardian key is leaked, since the key doesn’t unlock that file. The probability of a “game over” scenario (the cables go public) is 1 – (1 – p×q)N. That’s because p×q is the probability that both the data and key for the same file get loose, (1 – p×q) is the probability that they don’t both get loose, and N is the number of keys you have to worry about (so multiply the safe probability that many times, then invert it to get the game over probability). For example, if p = 0.3, q = 0.1, and N = 10, the game over probability is 1 – (1 – 0.03)10 = 1 – 0.9710 = 1 – 0.74 = 0.26. Compare that to the scenario in which you make the same bundle and passphrase for all the journalists. Now the document will be exposed if and only if any of the encrypted files leaks out and any passphrase leaks out. The probability of the “game over” scenario is now (1 – (1-p)N)×(1 – (1-q)N). That’s because we now measure the individual probability of any data leaking and any key leaking, and multiply them together. This gives us much worse odds: For example, if p = 0.3, q = 0.1, and N = 10, the game over probability is now (1 – (1 – 0.3)10)×(1 – (1 – 0.1)10) = (1 – 0.710)×(1 – 0.910) = (1 – 0.028)×(1 – 0.35) = 0.97 × 0.65 = 0.63. So the underlying maths confirms our intuition that it is better to give each person we deal with a different key.
But now let’s try to make some reasonable estimates for p and q (and continue assuming that N = 10). Given how important the passphrase is, we’d like to think that q is extremely extremely low. Again, Assange was operating under the assumption that q is near 0, but for the sake of argument, let’s keep q at the generously high 0.1. What is a reasonable value for p? As I’ve stated above, we are operating under the assumption that all of the encrypted data is public, at least for a short while. Every time we share this (for each N), the data will be on a public web server for several hours: a web server that is possibly being mirrored by thousands of other computers across the world in real-time. So if I had to guess, I would place p at 1, or very close to it. That’s not a silly assumption, given that again, the whole point of cryptography is that it is safe to assume the encrypted data is being seen by everyone. Basically, although he didn’t realise it, Leigh was betting that p was 0 when he published the passphrase. Now let’s run the numbers again with p = 1 and q = 0.1. If you give everybody their own individual key, the game over probability is 1 – (1 – 1*0.1)10 = 1 – 0.910 = 1 – 0.35 = 0.65. If you share the key between all the journalists, the game over probability is (1 – (1 – 1)10)×(1 – (1 – 0.1)10) = (1 – 010)×(1 – 0.910) = (1 – 0)×(1 – 0.35) = 1 × 0.65 = 0.65. If p (or q) is 1, then it makes no difference. If you are operating under the assumption that all the encrypted files are being captured by the public (and we are), then you cannot afford to let any single key slip out. There is one other advantage to having separate keys for each journalist, and that is that if a key does slip out, you know who to blame. But that isn’t the problem here.
To use a real-world analogy, say there is an office with 10 rooms and 10 keys. It is more secure if each key only unlocks one room, because then if someone steals a key, they can only access the contents of one room and not all 10. But that isn’t a good analogy for the cables because all the organisations were given the same file. If any one leaks out, it’s game over. So a more appropriate analogy is if there is an office with one big room with 10 doors and 10 keys. Now it doesn’t matter if each key only unlocks one specific door, or if all the keys unlock all the doors. Either way, if someone gets one key, they can steal everything in the room. Given that the whole point of this operation was to trust the journalists with the entire set of cables, there isn’t much more security you can have. You simply have to trust that none of the journalists will reveal his key. It doesn’t matter whether all the journalists have the same key, or if they are given individual ones.
In any event, p is probably not quite 1, so it probably would have been prudent for WikiLeaks to use separate passphrases for each journalist, and it certainly wouldn’t have hurt. But did they? It isn’t clear that Assange distributed the same file and passphrase to multiple people, and it isn’t clear how the file leaked out at all (more on that in the next section). But the point of this section is to show that, however negligent it was to leave that encrypted file on a hard drive, it pales into insignificance next to the negligence of divulging the passphrase. Again, we are operating under the assumption that the encrypted file is available to the public (there is no other way to run a cryptosystem), and therefore we require that the password is secure. We assume that p is 1 or close to it, and we require that q is as close as possible to 0.
How did the encrypted file get out onto the Internet?
I’ve put this question last, as it’s the least important. Hopefully by now, I’ve made the case that it doesn’t matter how the data file got out, who is to blame, or whether or not it was deliberate, because of the basic tenet of cryptography that your encrypted text is not something you need to keep secure. But it deserves to be addressed. I have read many theories, and have not yet seen any clear evidence about this:
- First and foremost, there is some chance that it was grabbed from the WikiLeaks server during the hours that it was online for Leigh to download.
- According to the Spiegel article, the file was only used the one time (to give the data to Leigh), but it leaked out because it wasn’t deleted by Assange, instead remaining on the server for an extended time before it was ultimately mirrored.
- An insider at WikiLeaks could have posted the encrypted file online (either accidentally or deliberately).
- The encrypted file could have been leaked from the Guardian (either accidentally or deliberately). WikiLeaks claims that “David Leigh and the Guardian … violated … our requirements that the unpublished cables be kept safe from state intelligence services by keeping them only on computers not connected to the internet.” If this claim is true, it is not unlikely that the files were stolen from the Guardian computers by government agents or others.
- If WikiLeaks did use the same passphrase and encrypted file to give the cables to the handful of other organisations they worked with, any one of them could have leaked the encrypted file online (either accidentally or deliberately).
Basically, we don’t know how. But it is the least important fact in this debate.
Update: Several people have written to point out that the encrypted data file, z.gpg, was published by WikiLeaks themselves as part of the November 28, 2010 dump of the full WikiLeaks server archive. A post on the front page of wikileaks.org still reads “All released leaks archived (2010-11-28): Due to recent attacks on our infrastructure, we’ve decided to make sure everyone can reach our content. As part of this process we’re releasing archived copy of all files we ever released – that’s almost 20,000 files. The archive linked here contains a torrent generated for each file and each directory.” That link (now removed), I am told, contained the encrypted file unlocked by Leigh’s password. (Thanks to Justin for a full explanation.) I must stress that this file was made available by WikiLeaks in November 2010, three months before Leigh’s book was published.
Clearly, an avalanche of “mistakes” has led to this unfortunate ending. I genuinely believe that Leigh never meant to disclose a live password, but this doesn’t excuse his ignorance of basic security practices (passwords are secret: even if you think they have expired, it’s best not to assume, especially when international diplomacy is at stake). Roughly outlined, the “mistakes” I have seen people claim are as follows:
- Assange decided to trust Leigh in the first place,
- Assange decided to send a file encrypted with PGP “in the clear” (without a second layer of encryption), allowing anyone online to get access to it,
- Assange chose to symmetrically encrypt the file with a passphrase, rather than require Leigh establish a public/private key pair,
- Assange chose a passphrase that was an English phrase and not random gibberish,
- Assange possibly told Leigh that it was a temporary passphrase (according to Leigh),
- Assange possibly left the file sitting on the drive after it had been sent (according to Spiegel),
- Assange possibly re-used the same file and passphrase to give to multiple journalists (according to some rumours I’ve seen going around),
- Leigh published the passphrase in a book.
The above dot points roughly outline what appears to be a rather damning case against Assange. But let’s not forget the first rule of cryptography: establish what is a secret and what is not. Once you have done that, you can set about protecting the secrets at all costs, and operate under the assumption that everything else can be seen by anyone, without any concern. It is not “lazy” or “irresponsible” to encrypt a file, and then send that encrypted file over a public network. It is not “lazy” or “irresponsible” to publish an encrypted file on BitTorrent and point at it with a public tweet saying “I have posted an encrypted file, which I would like as many people as possible to download.” (Assange did that too.) Much of the modern world’s infrastructure is built upon these cryptographic principles, from the DRM on your iPhone apps, to the transfer details you give to your bank website. Every single one of the “mistakes” Assange made was only a mistake in the context of the final error — the publishing of the passphrase in the book. Assange shouldn’t be held accountable for his “mistakes” because they were all rational decisions when operating under the assumption that your passphrase is safe. If you aren’t operating under that assumption, then it’s basically game over anyway.
This document has made the case that protecting the data from your collaborator wilfully publishing the passphrase is a lot more complicated than the media and the commentators are giving Assange credit for. He would have had to go to extreme lengths to seal up all of the issues I’ve mentioned, and that would have bought him a modest amount of additional security. But it probably wouldn’t have been worth it, and it’s pretty hard to make a case for doing so before the fact. What we’re really talking about in this article is shades of caution. Assange could have been more cautious, but in the end, you are pretty much hosed if your collaborator gives up the encryption passphrase, and if you want to blame someone, it should be them.
As my friend Richard said, “[The Guardian have] done the harm that Wikileaks never did but copped months of trash talk and threat of prosecution for.” WikiLeaks will likely take the fall for this, but as cryptographers, we know where the real blame lies.
Acknowledgements to Will for reading through this post and suggesting improvements.