Bristol Cryptography Blog: September 2015

Wednesday, September 30, 2015

Challenges in Cloud Storage

This post will discuss parts of the talk by Angelo De Caro (IBM Zurich) on 'data storage-oriented cryptographic protocols,' given at the workshop on secure and trustworthy computing in Bucharest.

The cloud provides an attractive opportunity for users and enterprises (collectively 'clients') to outsource their storage. Many providers offer low-cost storage, with straightforward management and ubiquitous access (from multiple devices). However users inherently lose direct control, meaning they have no guarantees about privacy or the future availability of their files.

Deduplication is the process by which a provider saves itself storage space (and consequently money) by only storing one copy of each file, and using some method of tracking which users own that file. This concept is so desirable because there is a large amount of redundancy in many contexts, such as media (movies, music etc.), system files/software and email attachments. When server providers want to deduplicate they have a number of choices:
- file-level or block-level (latter allows better dedup in systems where updates to large files are common)
- single-user/cross-user (latter is more desirable for providers)
- client-side/server-side

The final point introduces some interesting concerns. Server-side dedup means that the client needs to send the whole file to the server, meaning a high bandwidth cost. For client-side deuduplication Alice takes a hash of her file $F$ and sends this value $H(F)$ to the server: if the server hasn't seen this before then instructs Alice to upload the file, if not then deduplication occurs. This means that having $H(F)$ is enough to download the file (DropBox previously used a system where this was the case). This means Alice can send all her friends $H(F)$ where $F$ is a movie, meaning that the provider acts as a content distribution network (this is not a privacy problem: server doesn't want to become such a provider and will have to pay for bandwidth etc). This method also creates a covert channel that reveals which files are stored on the server: in the so-called 'salary attack' discussed by Pinkas et al. an adversary Eve has knowledge of what a company's payslips look like, so can learn an employee's salary by creating a large number of possible payslips and beginning this upload process for each one until the server informs Eve that it already has the file.

Proofs of Ownership
These issues mean we'd rather use proofs of ownership (PoWs)--a way for Alice to convince the server that she owns the file--and this means we need to avoid short file identifiers. In 2011 Halevi et al. suggested the following framework for such a paradigm: in the preprocessing phase the server stores some short info per file (file itself is located in some secondary storage), then in the proof phase (done only during file upload) there is some challenge/response mechanism. This procedure needs to be bandwidth-efficient, and computation (particularly for the client) should be efficient. The authors suggested a method using Merkle trees with special encodings, where the prover is 'challenged' on a certain block in the hash tree. In the preprocessing phase a server is sent the file and computes a Merkle tree, then stores the root. To prove, the server asks the client to present sibling paths to $t$ random leaves, the client computes them, server authenticates. This solution is bandwidth efficient, and space efficient at server side, but the client has to do quite a lot of computation. A client that knows a large proportion of the file will likely be able to 'cheat,' so we need a way to 'spread' the entropy across the file to stop the scenario like the salary attack. Using an erasure code is a good way of doing this (cheating probability $2^{-t}$) but constructions are fairly computationally expensive. An alternative approach was taken by Di Pietro and Sorniotti (AsiaCCS 2012), which is considerably more efficient at on the client side but worse on the server side, and the challenge values have to be recomputed when they are exhausted.

Proofs of Retrievability
Alice outsources a file and wants to know that it's retrievable, meaning not only that the server still holds the file but also that the server hasn't modified the file. There is a trivial solution: just download the file on a regular basis. A better approach is to use a keyed hash function and store $H(k,F)$; if Alice wants to verify she can just send $k$ to the server S and ask S to compute $H(k,F)$ and compare. This is storage efficient for Alice, but S needs to read the entire file and can only verify once. Even better is to use 'sentinels' (short, random strings): Alice embeds sentinels in random positions in $F$, encrypts block-wise, and sends this file $F'$ to server and keeps the sentinels. To verify Alice just asks for sentinel $s_i$ and checks if it is correct. Protocol means Alice doesn't need to store all of F and can detect large erasure (which would remove more than one sentinel), but Alice has to store the sentinels and cannot detect small erasure. One can improve this by computing $MAC(k,s_i)$ and appends this value to the file (server doesn't know to which sentinels these MACs correspond), only needing to store $k$. Alice doesn't need to store any of F and can detect large erasure but still can't detect small erasure. We can solve the 'small erasure' problem by using error-correcting code, however this makes it more expensive.

Confidentiality
Both PoWs and PoRs are tangential to the goal of confidentiality of files from an untrusted server. If two clients upload the same file, encrypted under their own keys, then we'd expect these two ciphertexts to be distinct and (assuming a strong method of encryption) the server shouldn't be able to learn that the two ciphertexts correspond to the same file. Douceur et al. gave an initial solution to this problem: hash the file and use this value H(F) as the encryption key, and this was generalised by Bellare et al. (Eurocrypt 2013). Since encryption is deterministic we can only expect some sort of security if files have high entropy, and indeed this approach allows offline brute-force attacks (Eve is sent a challenge ciphertext $C^*$, and if the message space is small Eve just computes hash of each file and creates ciphertexts until she finds a collision with $C^*$). The same authors of the EC13 paper gave a solution to this problem: their system called DupLESS uses an independent key server (KS), and a user engages in an oblivious PRF with KS to get an encryption key (this means that this key server needs to enforce a per-client rate-limiting strategy to stop brute force attacks). At CCS next month Liu, Asokan and Pinkas will present a paper that removes the need for the key server, by distributing its role among the clients using PAKE (Asokan gave a talk on this paper earlier in the workshop).

This presentation complemented Asokan's talk, Florian Kerschbaum's talk about computing on encrypted data (slides here) and the talk given by Marc Lacoste from Orange Labs who discussed the goals of the Super Cloud project and the security challenges involved in the development of 5G communication standards and IoT.

Monday, September 28, 2015

52 Things: Number 49: Describe the basic ideas behind IPSec and TLS.

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year. This week we discuss the basic ideas behind IPSec and TLS.

Internet Protocol Security (IPsec) and Transport Layer Security (TLS) both aim to create a secure communication channel between two parties over an insecure network. In general, both use some mechanism to establish a private session key (either pre-shared or via a key negotiation protocol) and use symmetric key cryptography for the bulk of the communication. There are some further details with regards to authentication but I'll skip over that. Although these two ultimately have similar goals, they differ considerably in their implementation.

IPSec sits on the network layer of the OSI model and aims to provide integrity, authenticity and confidentiality between two end points. As it sits on network layer, it blindly encrypts, MACs and packages up the data from the above layers before sending it down the line. This effectively creates a virtual network link between the two end-points without the need to ensure the end-point application has secured the data appropriately. This is often deployed for enterprise VPN solutions as it is a fast solution for remote access to an enterprise network. The downside however is that once a connection is up, it is tricky to restrict applications from using the connection once it is up.

TLS on the other hand establishes a secure connection at the application layer of the OSI model. We see TLS heavily used for securing web protocols such as HTTPS, STARTTLS etc. and as a consequence, each connection/application will negotiate/set up a secure connection independently. From a security perspective, this is quite attractive as a single compromised channel *should* have no bearing on the remaining channels. Whilst TLS can be viewed as a more flexible approach, it does incur some overhead over IPSec for a large number of connections between two nodes.

It's easy to get into very fine details but I think that should cover the 'basic' ideas of the two.

Saturday, September 26, 2015

52 Things: Number 48: What is the purpose and use of a TPM?

This is the latest in a series of blog posts to address the list of '52 Things Every PhD Student Should Know To Do Cryptography': a set of questions compiled to give PhD candidates a sense of what they should know by the end of their first year.

Before examining the point of this question (namely what the purpose and use of a TPM is) it's worth trying to understand the problem a TPM is designed to overcome. The problem is really one of trust. Trusting what? Well, primarily the memory and software running on a computer. These things can be directly accessed by the operating system and so secret information (such as cryptographic keys) can be accessed by an attacker who has access to the machine at the operating system level. If these keys are being stored directly in memory and being accessed by software, it could be fairly trivial for an attacker to read off the memory location where the keys are being stored and then compromise security.

One way around this problem is make sure that keys are never stored directly in the computers memory which can be accessed by software. Given that the keys are required for secure applications they must at some point be presented in a state that can be used by the software so how could this be possible? Well, one way is to protect the secret keys stored in memory by wrapping them using a key that the software does not have access to. By having a separate piece of hardware for instance that has a key burned into it and which is able to perform certain cryptographic operations with that key. This piece of hardware could therefore be employed by the software to do various things with this secret key that is stored on the hardware to do things such as wrap keys to be stored in memory, but never have access to this key directly.

This is essentially what a TPM does. A TPM has an RSA key pair called the Storage Root Key (SRK). The private part of this key is kept secret from everything and everyone. Using this private key, other keys (that software uses) can be wrapped (often called “binding”) using the SRK, protecting them from disclosure. In addition to simply wrapping keys, TPMs can also wrap keys and tie them to certain platform measurements. This type of key can only be unwrapped when those platform measurements have the same values that they had when the key was created. This process is known as “sealing.” TPMs can also be used for cryptographic key generation and perform other cryptographic tasks one of which is know as remote attestation, which creates a hash key summary of the hardware and software configuration allowing a third party to verify that the software has not been changed.

The real point to understand here is that by pushing security down to the hardware level and ensuring that it is given over to a separate piece of hardware that has it's own firmware and circuits that can't be altered from the outside, the system is not exposed to software vulnerabilities and is therefore more trustworthy.

So what is the purpose of a TPM? To overcome the problem of trusting (or rather not trusting) software to be completely reliable.

What is the use of a TPM? We mentioned a number of them. First of all was binding, which essentially wraps a key using the private key of the SRK. The second was sealing which also ties the wraped key to a particular platform measurements. And thirdly we looked at remote attestation and noted that TPMs can also be used for other cryptographic functions such as key generation.

Distance bounding protocols and physical layer security

Yesterday at the Summer School on Secure and Trustworthy Computing Srdjan Capkun gave an interesting talk on physical layer security and distance bounding protocols. I'll try to give some insights of why I found this talk particularly interesting.

Bounding the distance between two devices is becoming an increasingly important problem. Think of you contactless card, your car key. Crypto alone can not answer this questions. Indeed, this is a property of the underlying physical space, not of abstract data/identities. To solve that question, distance bounding protocols have been using time of flight measurements (mixed with some more classical crypto). The intuition is that if you know that the signal can travel from A to B in time less than t, then A and B are at distance at most t*speed of light. The talk wasn't so much about how to study formally such protocols (see [BCSS11] for example of such an analysis), but on the physical properties of the underlying systems needed for these protocols to actually work as intended.

The main assumption is that an adversary can not transmit information faster than the speed of light. This looks like quite a reasonable assumption, as the theory of relativity seems to be a hard problem to break ¹. However, as the talk made quite clear, the situation is more complex. Outputing a bit on a channel is not an instantaneous operation, and time matters a lot here. A nano-light-second is about 15cm, and usually transmitting a bit takes micro/milli seconds... This can practically be used in attacks where the adversary guesses the bit you're trying to send based on only part of the transmission. As a consequence it becomes essential for the length of a bit in that setting to be as low as possible. The technical details were somewhat lost on me, but, according to the speaker, it is possible to reduce this time to a few nano-seconds. This reduces the time of flight uncertainty to less than a meter[under the assumption that FTL travel is not possible], which is pretty good.

The second unusual problem tackled by this talk is as follows: assume that the machine that wants to prove its proximity to you is adversarial, can we still bound the distance? Distance measurement between A and B is done by A sending some signal to B and B answering a processed version of this signal back to A. A then computes the distance: (total time - processing time)*c/2. An adversarial machine can not cheat on the time of flight part, but it can cheat on the processing time. In the end, not only do we need short bits, we also need extremely short processing time. This means that for this approach to become practical, we need to build systems that receive, process and send signal in the span of nano-seconds. This talk provided with an example of such extremely efficient, completely analog computing/transmiting node. Interestingly this also entails that the kind of computations you can do in that framework is quite limited, making the problem extra interesting.

The talk was concluded by a fun use case: secure positioning. In a world where drones and self driving cars are getting more and more comon, the ability for someone to make your system believe it's in the wrong location, this might well become a real problem. GPS positioning is far from secure (see https://en.wikipedia.org/wiki/Iran%E2%80%93U.S._RQ-170_incident), and there is little hope that any similar non-interactive system will ever yield the integrity guarantees we need. This nicely justifies the need for secure distance bounding protocols with the appropriate architecture.

All in all, after this talk my impression is that the interactions between the physical layer and the protocol layer might very well be the key for future developements of secure distance bounding.

1 I guess that if you break it, distance bounding will anyway be the least of your problems^↩

Thursday, September 24, 2015

Intel SGX: The Death of MPC?

An earlier version of post contained errors about the signing of binaries. Thanks to Guillaume Scerri for pointing them out.

On the first day of the summer school on secure and trustworthy computing, considerable time was dedicated to speakers from Intel talking about security-related CPU features. SGX is an upcoming such feature. Even though the introduction seemed to be more directed towards programmers rather than security researchers or even cryptographers, I was left with the following insights.

The core idea of SGX is to obscure the memory of an application from the operating system. Motivated by possible flaws in the OS, the applications themselves are separated from the OS. This is achieved by encrypting the application's memory using a key generated by the CPU, which never leaves the CPU. It is unclear to me whether SGX will hide memory access patterns of the application. However, this probably can be achieved using oblivious RAM within the application. Furthermore, I/O will be still be controlled by the OS, thus leaving the possibility of keylogging etc.

In order to convince a remote party that the binary is indeed running on SGX, processors will contain a private key to sign the binary being executed, with the corresponding public key being provided by the manufacturer. It is not clear whether the private key will be global or per CPU. A private key in the CPU would make it easy to also have the binaries encrypted, but it seems that this is not planned at the moment.

With SGX, Intel obviously targets cloud applications in an attempt to restrict the cloud provider's access to data in the cloud. Since there are no processors with SGX yet, it is hard to estimate the cost and the feasibility for SGX on mobile processors. However, if SGX eventually comes to mobile processors, rooting or jail-breaking becomes somewhat useless because the power of the OS will be restricted. This could be seen as another step of an evolution that went from proprietary software computing locally on personal data to personal data being held in the cloud. With SGX available on all processors, an app provider could exercise almost complete control over all cloud and mobile instances of the app, given the trustworthiness of the processor manufacturer. As a result, apps might effectively become black boxes with all interfaces defined by the app provider.

The second talk on SGX suggested its use as a replacement for multiparty computation as follows: The parties agree on a piece of software executing the computation. The software is then run using SGX and generates a public key, which the parties use to encrypt their inputs. After running the computation, the software only outputs the previously agreed results to the parties. One can argue that the security should be similar to the security of MPC because one has to trust the processor to execute the local computation correctly and without leakage to the adversary. Therefore, one might as well trust the processor with SGX to execute the software correctly while preserving confidentiality by encrypting the memory. However, this scheme introduces the CPU manufacturer as a third party possibly knowing the private key, which is used to confirm that the software runs on SGX. The manufacturer is inherently trusted not to collude with any party. More concisely, one might ask: You trust your CPU. Do you also trust the manufacturer?

Tuesday, September 15, 2015

CHES 2015: Evaluation and Improvement of Generic-Emulating DPA Attacks

On Tuesday afternoon at the ‘Palais Grand Large’ (or, as my long-unrehearsed school-level French childishly insists on translating it, the ‘Big Fat Palace’), Weijia Wang presented ‘Evaluation and Improvement of Generic-Emulating DPA Attacks’. This paper (joint work with Yu Yu, Junrong Liu, Zheng Guo, François-Xavier Standaert, Dawu Gu, Sen Xu and Rong Fu) is of particular interest to me as it builds on work done in part at Bristol and presented at CT-RSA 2014 ([WOS2014]; ePrint).

‘Generic DPA’ attacks -- methods which succeed against an arbitrary physical implementation without the need to meaningfully characterise the power model -- have been hotly pursued in the side-channel literature. However, it has been shown ([WOS2014]) that all such proposals rely on noninjectivity of the target function (among other properties) to succeed. Attacks against injective targets necessarily fail unless provided with some compressing mapping corresponding meaningfully to something about the true leakage of the specific device.

We introduced the notion of ‘generic-emulating DPA’ to enable key recovery attacks against injective functions such as the AES S-Box requiring only some minimal intuition about the form of the leakage. The instantiation we presented used stepwise linear regression to find the key hypothesis producing the most ‘simply explained’ model for the measured leakages (see [WOS2014] for details).

Wang et al. experiment further with our proof-of-concept example and show that the stepwise approach is unstable in small samples. They suggest two alternatives -- ridge-based and lasso-based regression -- which perform better in practice. Against low degree leakage functions, the best difference-of-means (DoM) attacks outperform all three tested generic-emulating methods. However, against complex leakages (for example, a polynomial in eight target bits comprised only of terms of degree 5 and above) their proposals demonstrate an advantage.

They also incorporate cross-validation into their strategies, which is shown to further improve outcomes so that even against typical smartcard leakages the ridge-based and lasso-based approaches begin to rival the best DoM attacks. The question and answer session drew attention to the computational cost of such methods relative to (cheap) DoM, but the authors were keen to stress that this was not prohibitive. Whilst no thoroughly convincing application scenario has yet been discovered for generic-emulating DPA, it is interesting to see further progress from our basic suggestion into more practical territory.

[WOS2014]: The Myth of Generic DPA … and the Magic of Learning, Carolyn Whitnall, Elisabeth Oswald, François-Xavier Standaert, CT-RSA 2014, vol 8366 LNCS pp 183-205.

Secure Protocols in a Hostile World

It is Tuesday afternoon of CHES in Saint Malo and I will be blogging the only invited talk of the conference entitled "Secure Protocols in a Hostile World" given by Matthew Green from John Hopkins University.

The talk begins with a slide entitled "Cryptography is a solved problem" and contains lots of quotes such as "Crypto is the strongest link in the chain" and "Why would you focus on making the strongest part of a system stronger". It is explained that this is wrong and the remainder of the talk is justifying why these people's views aren't correct and that we should still care about creating new crypto. Crypto can be seen as being split into the following five parts:

Algorithms
Protocol Design
Implementation
Library API Design
Deployment and Correct Usage

We as cryptographers spend most of our time working on the first point and we are very good at it but the further down the list you go the more unsolved the problems are. Taking implementation as an example there are schemes (such as ElGamal) which are secure because they come with proofs of security but implementing them is hard and if not done carefully leads to sidechannels which allows the scheme to be broken.

Why does it matter that we focus on the first point and leave the rest to others? Well Heartbleed and FREAK to name two reasons why this matters and is becoming more of an issue. In 2015 we seem very good at designing new strong crypto but have still not got the hang of deploying strong crypto out into the real world. We will now look at a case study.

SSL/TLS

SSL has been described as "The most important security protocol on the Internet", however recently it has had its own share of troubles, including; Heartbleed, Lucky13, FREAK, CRIME, BEAST and RC4 (well unlike the rest RC4 isn't an attack but has caused its fair share of trouble recently).

So below I will give a few examples of issues with SSL/TLS;

CBC with MAC then Encrypt
Bad use of IVs
Compression
RC4 (I think my favourite quote of the conference so far has been "When RC4 is the solution, you need a better problem")

All of these problems are well known - I mean MAC then Encrypt not (always) being secure is taught in undergraduate crypto courses so what is it doing in TLS?! The answer is communication, the people doing the implementation are not cryptographers so don't always know about these things and cryptographers tend to be looking for the 'next big thing' instead of working within all five of the points above. To further emphasize the need for communication; when FREAK was released three TLS implementations had the exact same bug that allowed FREAK to happen. There is a clear disjoint here that needs to be resolved.

Another example was then given in the form of a downgrade attack from DHE to DHE_EXPLOIT and then solving the corresponding discrete logarithm problem. This is known as LogJam and will appear at CCS15 so I won't go into detail here but it is certainly worth looking into (and was fixed in browsers only a couple of weeks ago).

Conclusion

I am going to conclude using the same bullet points that Matthew used on his own slides as I think this hits the nail on the head.

Cryptography is hard

Cryptographers fail to push best practices to engineers

Engineers fail to pull best practices from us or from the literature

Cryptography is becoming more complex

It has got to the point this process can no longer be tolerated and something needs to be done about it