Wednesday, July 23, 2014
The day opened with E. Clarke reflecting on the history of model checking before announcing his retirement. That talk took a bit of effort to follow as I wasn't familiar with the various combinations of temporal quantifiers - does AG imply AF or the other way round again? - but I caught the gist of it and it's impressive just how far we've advanced in solving practical instances of what's supposed to be an undecidable problem. The idea I caught is that instead of reasoning about a particular system, you define a set of axioms that you'd hope your system to have and express them in a suitable logic, then verify properties on the level of axioms alone. Anything you can prove here must hold for any model of the axioms, so all that remains for your concrete system is to enforce the constraints of the axioms and you know a lot about what your system will or will not do.
The second invited talk on Tuesday was Acquisti's "Privacy in the age of augmented reality." This took us through a number of experiments performed to explore people's attitudes to privacy. In one experiment, students were asked questions such as whether they'd cheated in exams, which they could also decline to answer. One group was told their results would only be shared with other students, another group that lecturers may see the results as well. As expected, the second group declined to answer more often. But this was just the control - the question was, how much of a time delay do you have to insert between giving people a privacy statement (whether or not lecturers can see the results) and asking the questions, for the difference in declined answers between the two groups to disappear? The answer, it turns out, is about 15 seconds.
In another experiment, researchers set up a number of fake facebook profiles with "unique" names and sent out job applications in these names using the same template to many employers across the US. The facebook profiles between two groups in the same experiment differed in one protected characteristic, so any difference in the response rates implies that employers (1) are looking up candidates' facebook profiles (if only by searching for the name - all these profiles were public) and (2) whether consciously or not, are discriminating based on information they should never have.
The result - sexual orientation didn't make a statistically significant difference anywhere. Religion however did: while in Democrat and "mixed" states there was no significant difference, in a Republican state saying that you're Muslim on your facebook profile all but guarantees you won't even get a response to your CV. For some reason they didn't consider gender in this experiment, but one could also argue that the existence of gender discrimination in hiring is proven beyond all reasonable doubt already*.
The augmented reality part involved an experiment to see just how good face recognition and data mining have got nowadays. Based on a sample of around 100k public facebook profiles linking names to pictures, Acquisti's group constructed an app that lets you point your phone at someone and take a photo, then uploads it to the cloud and tries to find them - if successful, it displays the name as an overlay on the screen. Within about a minute, they could in some cases find not just the name but also the date of birth and most digits of the social security number - can't quite remember what the thing about the encoding of SSNs is but it's somehow linked to a public register of births and deaths. In case the app wasn't creepy enough, google glass was mentioned as the next platform they wanted to look at.
* The following, while not statistically significant on its own, is one example: http://qz.com/103453/i-understood-gender-discrimination-after-i-added-mr-to-my-resume-and-landed-a-job/
Tuesday, July 22, 2014
As a case study, Barthe discussed OAEP and a few hundred variations thereof, parametrising them as a kind of Feistel network with several points where optional randomness can be added and then proving or disproving security of many instances of this model.
In the session on protocol verification, several authors turned their attention to key-exchange protocols. We're not short of models in this area but I find the formal-modelling approach a welcome addition to the area, particularly for its tendency to provide a concrete attack with each failed proof.
While the general probem "Is my Diffie-Hellman type protocol secure?" is undecidable in the formal world, Dougherty and Guttman showed that for a certain class of "lightwieght" protocols, a decision procedure exists. Their approach has two novel components: first, rather than just considering a term algebra, they explicitly model the exponent field as a finite field. Secondly, they show a finite-model theorem that gives the required decidability: a common problem in formal approaches to key-exchange protocols is that you can easily check whether or not there are any attacks that use at most 0, 1, 2, ... sessions but extending this to an unbounded number of sessions is undecidable. The class of lightwieght protocols in the talk have the property that one can find a bound b such that if there is any attack, there is also one with at most b sessions. This looks like a very promising approach to me that I'd hope one can generalise to more protocols: with a suitable bound, one hopes that one could let loose a theorem prover to "clean up" the rest of the problem.
The session ended with Mödersheim and Katsoris addressing the parsing problem. We know from Paterson et al. that there is a big difference between MAC-then-encrypt and MAC-then-pad-then-encrypt; the key to such TLS attacks as Lucky 13 is that the adversary can maul a message to change the way it is parsed. Any analysis of a protocol in a symbolic model with a term algebra operates at too high a level to capture such attacks, as a model of a term algebra implies a perfect encoding/parsing scheme. The mentioned paper refines a formal model down to the byte level and applies this to among other things the XML format and (simplified) TLS messages. A result of the paper is some easy to verify conditions on formats that could well be put to practical use, both for designers/implementors of new message formats and for security analysts who now have a tool to extend their symbolic model to the level of bits and bytes.
Monday, July 21, 2014
We had two sessions on cryptography in the afternoon but the session that I enjoyed most was the one on usable security in the morning; I wouldn't mind if having a mandatory usable security session in every conference to get the word out that humans behave in certain, scientifically understood ways and we shouldn't forget that when designing security protocols (insert standard rant about SSL certificate warnings here).
One interesting finding I took away from the talks on usable security is that if you ask people what security problems can occur for a specific task such as "logging in to a website", they will cite things like weak passwords, keyloggers etc. but as soon as you put the whole thing in a broader context of a primary task such as "an app for ride-sharing", people will list mostly task-specific risks such as "getting kidnapped" and not mention the IT security angle at all.
Also, risks linked to a primary task tend to elicit statements in a personal language "I'm afraid that I could get kidnapped" whereas IT risks were, if at all, described in an abstract and impersonal language "There could be fraud".
There is a lot that I could say about the banquet in the evening, but all I'll say here is that the format was unexpected yet enjoyable and the food was good, which is what really matters. We were also treated to an open-air thunderstorm that beats any firework show I've seen so far.
Sunday, July 20, 2014
Oh yes, the conference. Saturday opened with a panel on discrepancies between theory and practice in complexity (joint between CSF and other conferences). The story goes a bit like this: industry has a problem and gives it to some theoreticians. They define a general class of problems containing the real one and study how hard it is to solve a random instance of the class; the answer is usually "very". So they go and prove that the class is PSPACE-complete (NP-completeness no longer gets you a paper these days) or even undecidable.
Meanwhile, practicioners go ahead and solve the particular problem they were given: SAT-solvers and SMT-solvers are real things, and they tend to solve things. Interestingly, if you let them loose on a "random" instance of a class of problems containing your real one and they perform badly, but one thing discussed at the panel is why such algorithms perform unexpectedly well on the particular problems that arise in reality.
One attempt to bridge this gap has been to hold regular solver competitions, the winners getting a paper in the conference organizing the competition. Here, the common complaint is that solvers nowadays seem to be optimized for scoring points in these competitions, at the expense of progress on actual industrial problems. Also some of these papers seem to offer very little insight.
As a cryptographer, I'm glad that we're not the only ones with a chasm between theory and practice. How to bridge it? I have a small idea, perhaps not so much a bridge as a foundation on which one could support a pillar on which one could build a bridge, but before I propose it let's look at another talk I particularly liked. Back in 2012, IEEE S&P infamously sent an acceptance notification to all submitted papers, followed by an apology to all the rejected ones. (I had a submission there myself and have never quite forgiven them.) This led Popescu et al. to wonder how one would build a better conference management system. So they defined some actors (author, PC member, chair) and objects (paper, review, comment, decision) and did what looks like system-theoretic analysis to me to get some dataflow requirements and security constraints (reviewers can see papers they're assigned to; authors cannot see their reviewers' identities etc.) and formally coded all this, then verified its soundness in the theorem prover Isabelle. The theory they needed to do this seems quite involved, definitely not a trivial task - the main notion used is nondeducibility [Sutherland 1986] with improvements from a string of more recent papers.
At this point they could have written a paper on "yet another proposal for a new conference management system". Instead, they used a code generator to get a verified core, wrapped a web server around it and wrote client programs. Then, they used it for a real conference (a 2014 one on the theorem prover Isabelle) and it sounds like it worked a treat. The result is called CoCon [ http://www21.in.tum.de/~popescua/rs3/GNE.html ] and on the last slide of the talk we were all invited to try it out if we have an upcoming conference to organise.
Generating code from a fromally verified core is certainly a promising strategy - if we can do CoCon today, perhaps a SSL implementation tomorrow? What I liked most about the authors' work though is that they did not stop at "we've got the theory worked out, it looks practical to us, let's write a paper", they actually built the thing and used it for real.
Which brings me to my proposal. It's not actually that new an idea but based on the concept of technlogy readiness levels, as used in aerospace engineering for example. This is a set of well-understood levels describing how mature a technology is, from "drawing board" through "has been built and stress-tested in isolation" through to "used in production applications for X years and data on long-term perfomance in real environments is avaliable".
What I propose is that we introduce a similar classification for papers presenting new cryptographic technologies, which would greatly help us to share research outcomes with practicioners. The number and descriptions of categories are of course open to debate but here's a proposal off the top of my head. The classification only applies to papers proposing new constructions or schemes (or improving on existing ones); there are other classes besides the ones below such as work that synthesises existing knowledge, automates modes of reasoning or breaks an existing scheme. I propose that authors of papers presenting new constructions should be obliged to say which class they believe their constructions to lie in; whether this claim is justified will be one of the criteria by which papers are judged.
R : realised
This construction has been implemented and successfully used in practice to solve a real-world problem. Concrete examples of the type and size of problem solved are given. The value of such a paper is not in the theoretical insight that it provides (though such insights are by no means forbidden in a type-R paper) but in demonstrating the practicality of the construction and presenting insights gained during implementation and use. Concrete parameters such as the size of the problem solved, key sizes and timings are provided.
Examples of a class-R papers could be the CoCon one presented here at CSF or De Marneffe et al.'s work on using Helios at Louvain-la-Neuve.
I : implemented
The authors have implemented their construction, run it and provided timings, space requirements etc. for realistic problem domain and key sizes. The authors consider the implementation ready to be applied to real-world problems but it need not have been used "for real" yet. A central question in judging a type-I paper shall be whether the implementation stands up to the claim of practicality.
The 2008 paper by Adida et al. that introduced Helios to the world could be an example of a type-I paper.
P : proposed
The construction is worked out to the point that the paper could be used as a basis for implementation by engineers, without requiring any more theoretical advances to build the scheme. The authors further believe that an implementation of the proposed scheme is practical (for parameters specified in the paper) and the paper shall be judged among other things on these claims.
An implementation and timings are not required for type-P papers.
An example of a type-P paper could be the Oakland 2008 one on Civitas. While Civitas is definitely type-R today, the 2008 presentation contains the lines "This work is about advancing the science, not selling a particular system ... not suitable for running a national election (yet):"
A : advance in the theory
A paper in this class presents an advance that has no immediate claims to practicality yet but may be a step towards a practical solution. It could realise an existing notion under weaker assumptions or lower resource bounds.
The first advances in fully homomorphic encryption after Gentry's founding paper come to mind as examples of type A: orders of magnitude faster than the original yet still orders of magnitude away from being worth implementing (for our current understanding of suitable lattice key sizes). I would personally put many early voting protocols in this class as they make unrealistic assumptions on voters' abilities.
T : theoretical
A construction in this class asks to be judged on its theoretical contribution alone. It could solve an open problem or show that a particular class of scheme is realisable in principle; it could also be valued for the theoretical insight and understanding gained. The theoretical contribution is sufficient that short- or medium-term practical applications are not relevant.
Example constructions in this class include Gentry's original paper on fully homomorphic encryption or the result that zero-knowledge proofs (without random oracles) exist for all statements in class NP.
Note that the classification is nominal in the sense that a paper describing a type-R scheme is not a priori better or worse than one describing a type-T one; rather the classes express that there are different valuable goals in the design and construction of cryptographic schemes. Reviewers get the extra power not just to judge how interesting a problem is per se but how interesting it is to solve a particular problem in a particular class - results in different classes are not directly comparable in our opinion. Conferences may restrict which classes of problems are within their scope of course. One could also produce statistics on submitted and accepted papers in each class and conference organisers could incentivise submissions in particular classes in their calls for papers if they judge this to be in the interest of the cryptographic community.
The biggest advantage that I claim for this classification (though the actual categories would need to be discussed and refined) is that we gain an extra level of discourse to present our work to the wider world and give more practically oriented computer scientists a better way to understand the state of the art of cryptography.
Friday, July 18, 2014
But we have a bigger problem to solve. For a few years now, we've proudly been citing Norway as an example of where e-voting is used on a national level. Unfortunately, Norway has now abandoned e-voting due to concerns about the security of the system. Not that anyone has broken it - but a good scheme must do two things, it must offer security and it must offer a perception of security. The two are not the same. Sure, an obviously broken scheme won't be perceived as secure either when people's votes get leaked. But building a scheme that no-one has broken to date or even found a hint of how one could break it, as we've seen in Norway, isn't enough to make people comfortable. Looking at the schemes I know of, it strikes me how much we've concentrated on the former at the expense of the latter - how many cryptographers working on voting have even tried to validate their ideas with tests "in the field"?
For example, one of the mantras of cryptographers working on voting is verify, verify, verify: voters can verify their ballots as a protection against fraud, they can verify each others' ballots which goes some way towards preventing ballot stuffing and independent parties can verify elections, preventing the authorities from cheating on the result. Except that when Olembo et al. ( http://link.springer.com/chapter/10.1007%2F978-3-642-39185-9_9 ) actually asked German voters what they thought about a verifiable system, it seems no-one could be bothered to verify. Why? Because voters' mental model is that if a system is good enough for the authorities to approve, then presumably it's good enough that they don't have to check it themselves! And if voters don't verify, we lose our supposed safeguard against someone manipulating the election. Perhaps this suggests that we should look at alternatives to voter-verifiable schemes, which would represent a major shift in the focus of cryptographic voting research.
Papers like this tend to have a hard time in academia - supposedly not "hard" enough science? But in my opinion, this is exactly what we should, nay MUST be doing - cryptographically, e-voting is a solved problem. Homomorphic ElGamal has been around a while and we've got decent mix-nets too. What we don't seem to have is a fully worked out scheme that could be used in a nation-scale election, or (so it seems) the motivation to build one. With Norway's withdrawal, we now have one less example to cite of why our research is supposedly practical. My personal feeling, as the raeder may have guessed by now, is that we should be spending more of our time on focusing on what a secure voting scheme should look like in the eyes of the general public, and then go away and build something that satisfies those criteria. The impact factor we'd get out of a fully working voting scheme would tower over anything we're doing today.
If cryptographers need any more motivation than that, I'd suggest that if we don't come up with the first real* crypto-voting protocol, someone else will - and they'll have a better sense of salesmanship and business than us, but less understanding of cryptography. And what that will produce, I can only speculate - Diebold Voting Machines certainly sold well.
*Helios is real, but specifically designed NOT for political elections.
I guess that's why I'm on the train to Vienna now: although a cryptographer by trade, the more I can work with people in the more general field that comes under the umbrella term of security, the more I'm likely to discover what the real open problems are. My work with TU Darmstadt so far has been very enlightening in this area. So while I'm very much looking forward to the upcoming talks on logic and verification, I don't think that the next big thing in voting will come from there.
Friday, May 30, 2014
In this week's study group Gaven presented a paper that will be appearing at Crypto later in the year, titled "Indistinguishability Obfuscation and UCEs: The Case of Computationally Unpredictable Sources" [BFM14]. In a nutshell, this work says that if indistinguishability obfuscation exists, then a notion of security called UCE1 security cannot be achieved.
To understand what this result means, we have to go back over twenty years to the introduction of the random oracle model [BR93], a useful tool for making heuristic arguments about security. In the random oracle model algorithms have access to a truly random function (the random oracle), which plays the role a hash function will fulfil in the implemented scheme. Security is then proved with respect to this random oracle, the idea being that if the hash function "behaves like a random oracle" then the scheme will be secure in practice.
The main benefits of the random oracle model are its power and its conceptual simplicity. However it isn't without its drawbacks, the main one being that once the random oracle is instantiated with a hash function the original proof no longer applies. It has been shown [CGH98] that there exist schemes secure in the random oracle model that become insecure under any instantiation of the oracle. It could be argued that this construction (and other examples following it) is unnatural and that actual schemes with proofs in the random oracle remain unbroken, but the whole debate is really played out by now, so I'm not going to go into it.
An attempt to address these issues is the notion of Universal Computational Extractors (UCEs) [BHK13]. UCEs seek to formalise what it means to "behave like a random oracle". Schemes can then be proved secure based on the existence of a UCE rather than a random oracle, the difference being that the necessary properties of the instantiating hash are now explicit so that a scheme instantiated with a suitable hash is provably secure.
Behaving like a random oracle
The obvious definition of behaving like a random oracle would be the following game, when the adversary is given the hash key and access to either the real hash or a random oracle, and must determine which function he is interacting with.
This is clearly unsatisfiable as the adversary can query a point x to get response y, then use the hash key to evaluate for himself the hash at point x and check if the resulting value is y. To prevent this trivial win, the adversary is split into a source S and a distinguisher D, with only S having access to the function and only D having the hash key.
The source can aid the distinguisher in his task by providing leakage L. If this leakage is unrestricted then the adversary can use the same attack as before, so for the definition to be useful some restriction must be put on S's output. The restriction is that the source must be unpredictable, which roughly says that given the leakage L it is hard to identify any of the queries the source S made. If no distinguisher can tell real from random with the help of any unpredictable source, then the hash is UCE1 secure. With a UCE1-secure hash in hand, a whole bunch of primitives become achievable in the standard model. Unfortunately, the result of [BFM14] shows that a UCE1-secure hash cannot exist.
An obfuscator is an algorithm that takes in a program and outputs an indecipherable mess of a program with the same functionality, the idea being that you can give this new program to someone and they can't work out anything about its functionality beyond what interacting with the original program in a blackbox way would reveal. It turns out this is impossible [BGI+01], but achieving a weaker notion called indistinguishability obfuscation might be possible. An indistinguishability obfuscator is an algorithm that obfuscates a circuit in such a way that the obfuscations of any two functionally equivalent circuits are indistinguishable, that is, given the obfuscation of either circuit 0 or circuit 1 an adversary can't tell which he was given. Such an obfuscator was constructed last year [GGH+13] using multilinear maps. Using an indistinguishability obfuscator, [BFM14] give an attack that breaks UCE1 security of any hash.
The source behaves as follows. First it queries its oracle at a randomly chosen point x and receives a response y. It then constructs a circuit that takes as input a hash key, evaluates H under that hash key at point x, and then outputs true if the result is y and false otherwise. The source runs the indistinguishability obfuscator on this circuit, and outputs the obfuscated circuit as its leakage L. The distinguisher D is given the hash key and the obfuscated circuit, and evaluates the circuit with the hash key as input. If the source was interacting with the real hash then the result will be true, otherwise it will almost certainly be false. Such a source and distinguisher can thus tell the real hash from a random oracle.
To break UCE1 security the source must be unpredictable, which can be shown through a standard game hopping argument. It is assumed that the output length of the hash is at least twice the key length, but this assumption is just to give high-level intuition and can be removed at the expense of simplicity. The first game is the unpredictability game where a predictor is given the leakage from S and to win must output a point in the set of queries S made. We move to a game that terminates if there exists a hash key such that the leaked obfuscated circuit evaluates to true. The probability that this bad event occurs can be bounded using the assumption to show that it is negligible, so the first two games are indistinguishable. Finally we move to a game where the source leaks an obfuscation of a circuit that always outputs false. Since the bad event of the previous game means it terminates whenever a hash key exists under which the circuit would evaluate to true, if the game does not terminate then the circuit must evaluate to false, and so in the last two games the circuit in each always outputs false. Indistinguishability obfuscation says that the obfuscation of the always false circuit in the second game and the obfuscation of the always false circuit in the last game are indistinguishable, so the games themselves are indistinguishable. In this final game S leaks no information about x, which is chosen at random, so the predictor has no information about x and hence the source is unpredictable.
With UCE1 security unattainable but the UCE framework still a desirable tool, various modifications to the definition have been suggested, including moving from computational to statistical definitions and to split sources. Very recently it was shown [BM14] that a stronger form of unpredictability might be the answer, as it is shown that this notion is sufficient to achieve results in correlated-input secure hash functions, universal hardcore functions and PKE, and crucially a construction meeting this notion is given.
- [BFM14] Indistinguishability Obfuscation and UCEs: The Case of Computationally Unpredictable Sources - Christina Brzuska and Pooya Farshim and Arno Mittelbach - https://eprint.iacr.org/2014/099
- [BGI+01] On the (Im)possibility of Obfuscating Programs - Boaz Barak and Oded Goldreich and Rusell Impagliazzo and Steven Rudich and Amit Sahai and Salil Vadhan and Ke Yang - http://www.iacr.org/archive/crypto2001/21390001.pdf
- [BHK13] Instantiating Random Oracles via UCEs - Mihir Bellare and Viet Tung Hoang and Sriram Keelveedhi - http://eprint.iacr.org/2013/424
- [BM14] Using Indistinguishability Obfuscation via UCEs - Christina Brzuska and Arno Mittelbach - http://eprint.iacr.org/2014/381
- [BR93] Random Oracles are Practical: A Paradigm for Designing Efficient Protocols - Mihir Bellare and Phillip Rogaway - http://cseweb.ucsd.edu/~mihir/papers/ro.pdf
- [CGH98] The Random Oracle Methodology, Revisited - Ran Canetti and Oded Goldreich and Shai Halevi - http://eprint.iacr.org/1998/011
- [GGH+13] Candidate Indistinguishability Obfuscation and Functional Encryption for all circuits - Sanjam Garg and Craig Gentry and Shai Halevi and Mariana Raykova and Amit Sahai and Brent Waters - http://eprint.iacr.org/2013/451