Thursday, January 23, 2014

Random Number Generation, Revisited

On the last day of the workshop Yevgeniy Dodis gave a fast-paced talk on his recent CCS paper "Security analysis of pseudo-random number generators with input: /dev/random is not robust" [http://www.cs.nyu.edu/~dodis/ps/rng.pdf]. It had something for everyone, from a complicated security model to brightly coloured charts showing implementation performance data.

Classically a PRNG is modelled as a function that takes a small secret seed of good randomness and expands it into more randomness. However finding this initial core of randomness is hard, so PRNGs in the real world take in extra input from whatever unpredictable sources they have access to. In practice (in Linux) this data comes from the timing of disk accesses, mouse movements, keyboard presses, and system interrupts [http://eprint.iacr.org/2006/086.pdf].

In 2005 Barak and Halevi [https://eprint.iacr.org/2005/029.pdfformally modelled the idea of a PRNG with input and came up with the security property of robustness, capturing the property that the PRNG can recover from internal state compromise. Dodis et al. strengthen this notion, requiring that it do so even if fresh entropy is accumulated slowly.

The adversary is powerful, and has control over the input as long as it gives an accurate bound on the entropy of the data stream it provides. Understanding of the security model is hampered by the bizarre choice to refer to public parameters as a seed, while the value S plays the role of the seed in a traditional PRNG. But in the real world no one cares about the choice of notation in your security model, so on to the practice.

The potential problems uncovered in the Linux PRNG that prevent it achieving robustness are an entropy estimator given the hugely difficult task of keeping track of how much entropy is in the internal state, and a mixing function that folds in new entropy in a linear way. The entropy estimator values irregular events highly, so that a stream of regular events with unpredictable data will be considered to have low entropy, while irregular events with predictable data are considered to have high entropy. Dodis et al. use this observation to fool the PRNG into outputting bits when it has insufficient entropy in its internal state.

This attack relies on the PRNG ignoring fresh entropy when it estimates it already has enough, which the maintainer for the Linux PRNG states is not how it behaves [https://news.ycombinator.com/item?id=6550256]. However no one is saying this is a realistic attack, and a more valid criticism Dodis made is that the Linux PRNG is very complicated. 

To this end he presented a simple construction meeting the strong notion of robustness. It makes use of a traditional PRNG and some field arithmetic, and looks clean and modular, while the pretty charts show it's competitive with the Linux PRNG. The construction echoes Barak and Halevi, and further back Ferguson and Schneier's Fortuna [https://www.schneier.com/book-ce.html] in ditching the entropy estimator. The takeaway point is that entropy estimation is hard, so let's just not do it.



Friday, January 17, 2014

Efficient provable security from pencil to processor

The long, hard road from theoretical security to practical security is one that has received a lot of attention this year at Real World Crypto. We’ve had Moti Young remind of us of Adi Shamir’s words that “crypto is dead” because of practical considerations due to APTs, Bruce Schneier reassure us that the NSA is not capable of mathematic magic and has probably been using practical security holes to read data that should theoretically be unreadable and Douglas Stebila show us how that all known attacks against TLS have been against its implementation and never against the cryptographic primitives themselves.

A common theme seems to be repeating itself here in New York. Namely that the main problems in cryptographic implementation today are not to do with the security of the primitives but with the engineering. You may have an algorithm for which all the mathematics point towards a highly secure cryptographic scheme but if the implementation of that scheme leaks its state, the security that was proved in theory means nothing. So was there any good news this year at Real World Crypto for theoreticians lamenting the downfall of the cryptographic algorithms they took so long to prove secure because of engineering and implementation? Well, according to Francois Dupressoir there is.

In a talk entitled “Efficient Provably Secure Machine Code from High-Level Implementations” Francois presented his paper [1] with the same title in which a method of ensuring that the proven theoretical security of an algorithm can be carried through to its machine code representation. The method described for doing this was twofold. 1. Ensuring that the high level language (C in this case) is a secure representation of the algorithm and that there is no leakage beyond what is specified in the security definition. 2. That the code is compiled in such a way that the semantic correctness of the high level code (not just its consistency) is strongly maintained in the machine code.

The first of these was done using EasyCrypt to prove the security of the C code. EasyCrypt is an SMT-based interactive prover geared towards proving the security properties of cryptographic schemes. This was extended with a number of libraries to lower the level of abstraction of specifications and run to ensure that the C code is written to a specified definition of security. The main aim of this was to make the C code secure (as defined in the cryptographic scheme specifications) against idealized probabilistic operations in an enhanced security model where it is assumed that the attacker has access to execution traces meant to capture side channel leakage. This enabled the authors to claim that the C code had the same level of security as defined in the theoretical model of the encryption scheme.

Following the development of provably secure C code the second stage was to compile it into equally secure machine code. To do this the authors sought to strongly maintain the correctness (semantic preservation) of the C code using CompCert. CompCert is formally verified optimising compiller which has the proven capability of producing machine code with stong correctness and reasonable efficiency when compared with other general purpose C compilers. By using CompCert to maintain the correctness of the EasyCrypt tested C code, the resulting machine code could be taken to have the same security properties as the C code, which had previously been shown to have the same level of security as the theoretical encryption scheme.

This means that the level of security that was proved originally in theory for the encryption scheme can be proved for the machine code implementation of it, or to put it another way the security of the scheme can be proven from pencil to processor.

[1] Certified computer-aided cryptography: efficient provably secure machine code from high-level implementations. Jose Bacelar Almeida, Manuel Barbos, Gilles Barthe, Francois Dupressoir. 2013.

Thursday, January 16, 2014

Mid-game Attacks & the nuances of drop-in replacement

On the third day of Real World Crypto 2014, Moti Yung presented a talk entitled "Let's get Real about the real world". After beginning with a look at some key developments in cryptography from the past twenty years and some predictions on the direction of future security research, the focus of the talk was "midgame attacks", where an attacker learns the complete intermediate state. It was concluded with the teasing reveal of a SpongeWrap-like Authenticated Encryption method that I'm sure we'll be hearing more about later in the year.

The first section of the talk provided an interesting selection of events cited by the speaker as being important landmarks in the recent history cryptography for how they modified the attack model, starting with the development of side-channel techniques by Paul Kocher et al in the late 90's, which demonstrated that implementation details were security-critical. Citing recent trends, including Adi Shamir's comments at RSA2013 and related articles (such as Nigel Smart's essay on this blog) and especially the Snowden revelations, the speaker noted that we are entering a new paradigm, in which we must assume that the attacker is already within our machine.

Having justified why such an attack model is of practical interest, the talk moved on to "Mid-game Attacks". In the game an attacker is allowed to pause the algorithm in the middle of calculation, and read the entire internal state. The attackers victory condition clearly depends on the situation being modelled, but for example an attacker who 'pauses' an Authenticated Encryption algorithm like AES-GCM mid-calculation would learn the key or key schedule from the state, leading to a trivial key-recovery attack.

This provides a very powerful adversary, so the obvious question is are any current schemes secure against it? Well, as described in this blog clearly not: the attacker should simply stop calculation immediately and read off the inputs. So, we should not expect to secure an entire scheme to this notion. However, if we allow a small amount of pre- and post- processing to be done in a safe environment, the security game models a very real threat. There are many possible scenarios in which this can occur, such as in a computer with limited amounts of trusted hardware or HSMs. Another, very relevant to modern developments, is cloud computation, where a user wishes to push the majority of a computation off to an untrusted cloud without sacrificing security. Overall, formalisation of such a game appears to provide a neat alternative to the more general option of recasting a primitive in the 2-player MPC setting.

Sticking with symmetric crypto, are there any schemes secure in this new setting? The answer is yes, an examples of which are HMAC-SHA1 and HMAC-SHA256. To see this, recall that the HMAC of message m under key k is HMAC(k,m):=h(k,h(k,m)), where h(.) is the appropriate hash function. First, we begin the calculation of h(k,m) on the local (secure) machine, stopping as soon as k has been absorbed into the internal state of the hash function. Passing this state and the message m to the cloud (insecure) machine majority of the calculation of y=h(k,m) can then be pushed out to the cloud (insecure) machine, by simply resuming the calculation. Finally, we calculate HMAC(k,m)=h(k,y) locally. Since h is a hash function and thus pre-image resistant, no matter when the attacker reads the state of the cloud machine, he will be unable to recover the key.

What a reader may notice is that I did not list HMAC-SHA3 as an example. To be blunt, this is because it isn't. This is because the 'proof' given above sketches over a very important detail: when the attacker asks the state, he is also provided with the internal state of the hash function.  As such it is not simply sufficient for the hash function to be one-way, it's internals must be also.
This isn't a problem for hash functions build with the Merkle–Damgård construction (like both those given above) since they internally use a construction believed to be one-way. However, SHA-3 is different. As a Sponge construction, critical to it's security is the assumption that part of the internal state (known as the 'capacity') remains hidden. If this is leaked, the construction is reversible (at least until an unknown input is reached). The same observation can be made of similar constructions such as the SpongeWrap authenticated encryption mode.

A solution for this apparent weakness of SpongeWrap was presented by adding xor-ing the previous capacity into the new one (meaning s_r,s_c  f(s_r , s_c) ⊕ 0....0||s_c ). The effect of this is to make the iterating function one-way. Why this is not implemented in 'standard' Keccak constructions is simply because it leads to an increase in implementation size without strengthening the construction towards its stated objectives.

So, what are the key things to take away from this talk? The attack model is interesting, and the new Authenticated Encryption method presented may turn out to be very significant over time. However,  in terms of a workshop on Real World Crypto, perhaps the most important thing to take away from this talk is how careful we should be with primitives. When changing design paradigms, we must be wary of applications utilising 'accidental features': properties standard across all previous designs, but outside the definition of the primitive. If presenting a new primitive as a drop-in replacement for another, it must be just that, and if not it should be made clear to developers that some properties they may have expected will not be provided.

Tuesday, January 14, 2014

Bitcoin : Anonymous...sort of....not really...eventually.

TL;DR version:

Real World Crypto New York 2014 first session was focused on Bitcoin. Bitcoin is not anonymous to the average user. Zerocoin isn't really an option at the moment. Zerocash may be an option soon. Bitcoin network is evolving fast and as profit margins are being squeezed, it is not entirely clear what/where the future of the network is.

Full version:

The focus for the first session at Real World Crypto (RWC) has it's sights set on Bitcoin and specifically the questions around anonymity followed by a brief debate about some of the more philosophical questions around the Bitcoin system.

The three speakers and their respective talks titles were as follows:

Arvind Narayanan -- Is Bitcoin anonymous?
Matthew Green -- Towards making Bitcoing anonymous.
Yifu Guo -- Avalon

Arvind steps up as the first speaker with a review on some of the problems facing the Bitcoin system with respect to the anonymity.

First, a bit of context on how the Bitcoin systems works with respect to transactions and how users verify the validity of transactions.

Consider two users, Alice and Bob. Alice owns 1 Bitcoin and would like to send it to Bob. Alice begins by generating a transaction which states that she would like to transfer 1 Bitcoin to Bob and signs the transaction using her private key. The transaction is broadcast to the network, verified and subsequently incorporated into the blockchain history.

The verification of a transaction checks the following:

-- the signature on the transaction corresponds to the spender (in this case Alice)
-- The source of the spend (Alice) owns sufficient funds to make the transaction
-- There has been no other valid transaction made to spend the same coins (no double spend)

All transactions are publicly available in a ledger at http://blockchain.info or http://blockexplorer.com

One important aspect of Bitcoin is the addressing scheme. It is trivial for a user to generate a new address each time they wish to carry out a transaction. An example of this is the donation page for wikileaks which generates a new address each time the page is loaded.

It may seem intuitive that if you want to keep your identity secret, simply use a new address for each transaction. Sounds easy enough. In practice though, this proves to be tricky. Consider an example where Alice owns 3 addresses X,Y and Z with the balance of 5, 3 and 6 Bitcoins respectively. Alice would like to purchase a product from Bob which costs 8 Bitcoins. Alice can combine the balance of X and Y to send 8 coins to Bob. Assuming that Bob is to receive all 8 coins at a single address, this now creates some link between the addresses X and Y. Someone wishing to analyse the network will notice the link and collapse the transaction to a single address or identifier. This was first attempted by F.Reid and M.Harrigan PASSAT 2011[5].

The focus of the paper tracked the transactions of a heist that saw 25,000 Bitcoins being stolen. The paper doesn't go as far as to reveal the real identity of the thieves but clusters the transactions believed to be associated with them.

This is all well and good, and so far reveals nothing (or very little) about the identity of the Bitcoin users. This model works well if Bitcoin is used independently from any other service, but of course this would also make it largely useless. Some ways in which this an users identity is revealed:

- A user openly claims to be the owner of an address. This turns out to be a fairly common occurrence and if you look through a few posts on Bitcointalk.org donation addresses are then linked to forum usernames and in some cases, real names.

- A Bitcoin user may wish to exchange Bitcoins for other conventional currency or visa versa. Signing up to an exchange often requires verification (e.g. as on mtGox) and although the user details are not public, a subpoena to the company holding the details will quickly reveal the users details (possibly a scanned copy of their passport!).

- Service providers and vendors which accept Bitcoin may still require a registration which again falls to a subpoena and possibly other avenues such as data sharing.

It is likely that many more exist but these cover quite a large range of possibilities. Remaining clear of the above scenarios require a particularly paranoid user that would have to go to fairly hefty measures to avoid leaking their identity (such as only ever exchanging Bitcoins using cash-in-hand or only using services with no identity checking).

Two papers [1][2] attempt to analyse the public address data in conjunction with the transaction linkability heuristics to analyse and paint a picture of the network transactions.

Backtracking a bit, I mentioned that a user may use a different address for each transaction and this helps keep some semblance of anonymity by making it difficult to link and track transactions to a single user. One difficulty with this is the notion of 'change' from a transaction.

Recall the running example where Alice wishes to transfer Bitcoins to Bob. Now let's assume that Alice wants to give 7.5 Bitcoins to Bob but has three addresses X, Y and Z in denominations of 3, 5 and 6 respectively. Using addresses X and Y to transfer 7.5 Bitcoins to Bob and receiving the 0.5 Bitcoins as 'change' in a new address C. If C is ever used in conjunction with Z then the two are linked and once again can be collapsed under a single entity address. A paper by Meiklejohn et al. [3] include the 'change' transactions in their analysis to present an up-to-date picture of the network. Perhaps a good observation from the paper is the high centralisation in the services currently running on Bitcoin. This shows that a large number of the Bitcoin network flow has at some point passed through a large service provider (such as Satoshi Dice or Mt.Gox). This falls back to the subpoena problem highlighted earlier.

This is not to say that the heuristics do not have their limitations. The 'change' address detection is fragile and can easily lead to the mis-characterisation of traffic. There exist services such as mixing pools which attempt to mix Bitcoins, in effect Bitcoin laundering, but the volume of transactions are often too small to have a clear impact.

Matthew Green presented next to address some of the anonymisation concerns through the use of Zerocoin[4] and a new system built out of this called Zerocash.

I will not go into too much detail about this but Matthew assured us that a Zerocash paper will be published soon (accompanied with an implementation). In short, Zerocoin was introduced as a Bitcoin add-on to facilitate anonymity over Bitcoin. Forbes' Andy Greenberg also wrote an article covering the talk here.

The Zerocoin implementation was found to be too cumbersome and was never fully adopted by the Bitcoin community/developers. Zerocash addresses many of the issues present in Zerocoin by making use of smaller zero-knowledge proofs and a trusted third party. But we'll find out more about this in their publication later this year.

The final talk (or rather -- discussion) was presented by Yifu Guo from Avalon[6]. This sparked more of a philosophical debate rather than a technical presentation. Some interesting points were discussed such as:

-- Consider the Bitcoin network as it stands today. Pooled mining has led us back to a centralised system where the pools are effectively banks. It no longer follows the decentralised model which Bitcoin was once praised for. Yifu suggests that Saitoshi (Bitcoin's creator) may have been aware of this feature (or flaw). So the question should be; Did he design it this way or is it a flaw which has grown well beyond his expectations.

-- Power vs Return. I suppose this is more of a moral dilemma, profit at the cost of electricity (and by extension the environment). The efficiency of mining technology is constantly evolving, pushing the hash/watt ratio to its limit (some companies going down to 20nm ASICs[7] where Intel's current 'state-of-the-art' processor is 14nm). This being said, the amount of mining rigs coming online is also growing. Bitcoin was designed such that as the network hash rate increases, mining becomes proportionally harder. In turn, as more people jump on the bandwagon, to invest in Bitcoin mining equipment, the energy footprint becomes alarmingly high. Yifu estimates that at today's values, the cost of running the network is somewhere around 10% of the return. This, of course, relies on you having a reasonably new and efficient mining rig. As time goes on, the mining rewards will halve and the profit will need to come out of the transaction fees. Currently the fees do not make it viable to mine but this can easily change based on a few factors; pool policies (to only accept transactions that pay a fee), value of Bitcoin, volume of transactions etc.

-- Use(less) computation. As it stands, the Bitcoin network doesn't actually compute on anything useful. Can you use the ASICs to break hashes? In short - no. As the name suggests Application Specific Integrated Circuit (ASIC) means that most mining rigs perform a double Sha-256 hash. There is no allowance for re-purposing the device. So we can already write-off the hardware in that respect. On the other hand, there has been some interest in trying to use the ledger in some way. For example, it is possible to encode data into the Bitcoin ledger. This may be used as a way to draw up contracts which are public and can not be modified (unless the network falls to an attack).

It is worth noting that Avalon originally entered the ASIC race to promote decentralisation. Unfortunately this didn't quite take off and resulted in adding to the problem (of the monopoly on pooled mining).


[1] http://eprint.iacr.org/2012/584.pdf
[2] http://eprint.iacr.org/2012/596.pdf
[3] http://cseweb.ucsd.edu/~smeiklejohn/files/imc13.pdf
[4] http://spar.isi.jhu.edu/~mgreen/ZerocoinOakland.pdf
[5] http://arxiv.org/pdf/1107.4524.pdf
[6] http://avalon-asics.com/
[7] https://www.kncminer.com/categories/miners

Monday, January 13, 2014

Firefox Sync - ease of use vs security

As I am blogging from the Real World Crypto workshop, I wanted to choose a talk that has a real world use. Now based on the name of the conference all the presentations are to do with real world crypto but while bitcoin (for example) is real world, I wanted to choose something that is both real world and mainstream, that the average person uses, and hence choose to blog about "Firefox Sync: Securely Sharing Browser Data" by Brian Warner [1].

The talk was essentially a history of Firefox Sync and how it uses cryptography to provides this desirable functionality but before I go into this I want to recap what Firefox Sync actually provides.

Firefox sync is used to give you access to your Firefox bookmarks, history, tabs and usernames/passwords no matter which one of your multiple devices you are logged onto. This should be a seamless process; I could be browsing a website at work and create an account on it and when I get home this tab should still be open on my laptop with the account details stored on the machine for my use. Now this instantly implies several security concerns, the two major ones being as follows. Firstly no-one else should be able to pretend to be me and have access to my data, and secondly not even the server that is storing my details should be able to read my username and details to my online banking (or any other online account for that matter).

Now that I have recapped what the problem is that is trying to be solved, I will now give the history of Firefox Sync that lead it to be the product it is today.

Pre-2007: This functionality was not provided by Firefox itself (and wasn't even a plug-in) but it was provided by a third-party website. The first problem with this is that it isn't an integrated solution, you would log on to your third-party website of choice and it would display all of the links to your bookmarks. Due to the lack of integration it clearly couldn't do history and tabs, and for reasons that I will explain in a moment, it also doesn't do usernames and passwords. Now a bigger issue than the integration is the issue of security - there is none, everything is stored in the clear. This is why usernames and passwords being stored is clearly a bad idea but even despite this you don't want people to know what your favourite website is to look at photos of cats and thus this solution is not ideal.

2007: Firefox create a plug-in called Weave to provide this functionality. Now that it is integrated, as well as doing bookmarks, it can also do tabs, history and usernames/passwords. Despite being useful to have your history to hand on any device, Firefox having your history has another benefit - it uses this history to train the URL recommendation engine and thus this has its accuracy increased by having access to the history from all of your devices. The major leap forward with this iteration is that this is now all stored encrypted for your account, where you provide a passphrase which is used to generate your encryption key.

2010: Weave trys a new approach to sync devices using J-PAKE [2]. A side effect of this is that the user will no longer have a passphrase that they must create/remember to sync their devices. When the first device is set up it creates a random encryption key to encrypt all the data under (this is not known by the user). When the second (or more) device wants to be synced with the first the following happens; the first device creates a "pairing" key (think bluetooth pairing not elliptic curve pairing) and displays it on the screen. The user then takes this key and types it into the second device at which point the key is used with J-PAKE to sync all the data (including the random secret key generated by the first device) down to this new device. After this has been done the syncing between all devices will be done in the background without any user input.

2011: At this point Weave comes embedded into Firefox and a version for android is now released.

Feedback Phase: This approach turned out to be confusing to most regular users and almost solves a problem that isn't the major one that people have a use for. Instead of using it to sync devices most people were trying to use this as a back-up system for a single machine. They would have it on a single system and when that system dies they would get a replacement and try and get Firefox Sync to bring the data back but of course they can't get a code from the first machine and thus can not retrieve their data...

Q2-2014: The whole of Firefox Sync has an overhaul based on user-feedback. It now works like signing into any other website - each new device you add asks for your username/email and password and logs you on and then will sync all of your data. While this works like people would "expect" it also has its downsides - it trades security in favour of familiarity. Using passwords is inherently less secure, they can be brute forced and if your email is compromise they can wipe all your passwords (thankfully it doesn't give access to all of your other saved passwords/booksmarks) and access all of your bookmarks and other passwords. This version will be released in April this year.

Future Plans: There are (thankfully) plans to release another version that has the option to reuse the pairing option over the standard password option in favour of security. This option is fine for syncing between multiple devices but does not work as a backup solution.

This talk was interesting not due to the technical content but due to seeing how cryptography can be used to solve a problem that is used day to day by millions of people around the world. It was particularly interesting to hear about how the security and primitives had to be adjusted based on the fact that it was not working how the everyday user feels that security products should work.


[1] http://people.mozilla.org/~bwarner/warner-rwc2014/#/
[2] http://en.wikipedia.org/wiki/Password_Authenticated_Key_Exchange_by_Juggling

Real World Cryptography

Today saw the start of the third Real World Cryptography workshop. This follows on from two earlier meetings (one in Cambridge UK, and one in Stanford, USA). This years event is being held in a the Great Hall at the City University of New York, an amazing venue with stained glass windows, painted walls, and lots of ornate carvings.  The event is going from strength to strength with over 400 registrants at this years meeting, making it one of the largest cryptographic conferences to be held this year.
 
The programme contains a number of sessions ranging in topics from Bitcoin, TLS, PRNGs, Automobile security, Multi-Party Computation, Payment Card Systems. On Wednesday we are going to have a talk and  Q&A session with Bruce Schneier on "The Fallout from Snowden". Around half the talks are from industry and around half from academia. So true to prior meetings the event provides a meeting place where academia and industry can exchange ideas and problems.
 
The conference comes at the start of 2014, which has been called the year of encryption in a recent BBC article. It would appear that following the revelations last year, we are finally going to see encryption turned on by default in various applications. In addition many organizations are working to turn off legacy systems and move to more modern systems. 
 
One can see this in articles which say that companies are moving to "Forward Secure" systems; this is a form of cryptography where even if one records all your encrypted conversations it is impossible to decrypt the conversation in the future by breaking someones long term key. In other words for each encrypted conversation one needs to break an individual session key to recover the data. This is because we have learned to protect systems against very powerful adversaries who could be able to record all of our encrypted conversations.

Follow the conference on Twitter using hashtags: #realworldcrypto and #rwcw