Thursday, April 10, 2014

Life's not fair? Bitcoin can help.

Last week, Marcel presented a study group on secure computation using bitcoin. In a secure multi-party computation (MPC) protocol, several players interact to compute a function of their inputs, without revealing the inputs to each other. These days MPC is a huge area of research in cryptography and there always tends to be several interesting papers at the big conferences.

One fundamental issue that is often overlooked in MPC is fairness. A protocol is said to be fair if it is impossible for a malicious player to abort the protocol early and learn the result of the computation, before the other players have managed to do so. In general this is impossible to achieve, as the first player to learn the result can usually just abort instead of passing this on to the other players [1]. However, using the power of the bitcoin network, fairness can be enforced for any function.

Bitcoin transactions


A transaction in the blockchain typically contains: [2]
- value
- previous transaction ID
- in-script: outputs signature on current transaction
- out-script: verifies signature from next transaction's in-script
- (timelock)

The value and previous transaction ID are self-explanatory; the two scripts provide the key functionality. For a normal bitcoin transaction, the in-script outputs a signature on the current transaction, and the out-script is used to verify the signature from the in-script of the next transaction using the appropriate public key.

The key point is that the in-script and out-script are not just a signature and a public key. They are pieces of code written in a (very simple) scripting language [3], so can actually be used to do something entirely different!

The final, optional item in a transaction is a timelock: this can be used to indicate a time, before which the transaction may not be added to the ledger.

Timed Commitment


A fundamental building block for fair secure computation using bitcoin  transactions is timed commitment, which functions as a normal cryptographic commitment, except the committer must pay a penalty if they do not open it within a set time period. The Committer, C, sends a standard commitment to the receiver, R, and posts an initial commitment transaction to the blockchain:

in-script: (output signature as normal)
out-script: check sigC AND (opening OR sigR)

Here the out-script verifies that the next transaction was signed by C (the  committer), and also that the next transaction either contains the required  decommitment information, opening, or was signed by R (the receiver). This  means that either of the following two transactions may be added to Commit in the blockchain:

in: sigC, opening
out: C

in: sigC, sigR
out: R

The Fuse transaction must be created (and signed by both parties) at the start and given to the Receiver. Once Commit has been posted, the Commiter must post the Open transaction, with the relevant decommitment information, before the timelock expires, at which point the receiver can post Fuse to gain their reward if Open hasn't been added.

Fair MPC using Mutual Commitment


Similarly to the timed commitment protocol above, a *mutual commitment*  between any number of parties can be created. Here every party must commit to something, and is penalised if they do not decommit within the time period. Adapting this to MPC is quite straightforward. At the end of a typical MPC protocol, every party will hold a *share* of the result, which must be revealed so that everyone can reconstruct the result. Mutual commitment guarantees that everyone will reveal their share, or pay the penalty.

Clearly using bitcoin doesn't quite guarantee fairness in the traditional sense; it is still possible for a player to abort, just at a high monetary cost. But in a model with rational adversaries, if the penalty is set appropriately then this seems a reasonable solution.

Further reading:
The first paper we discussed, on timed commitment and secure lotteries without a trusted third party.
The second paper, on fairness in two-party computation.
Another recent paper on fairness and MPC.

[1] See Enrique's blog post for details of a nice, recent work that characterizes situations where fairness is possible.
[2] This is probably not entirely accurate, I recommend the bitcoin wiki for a more thorough description.
[3] The Script language is a simple stack machine, deliberately not Turing-Complete so that non-terminating scripts can be detected easily.

Wednesday, April 9, 2014

Who are you to access this?

Today's study group was given by Bin on the paper "Enforcing Confinement in Distributed Storage and a Cryptographic Model for Access Control" by Halevi et al. It deals with the T10 OSD protocol, which provides access control in a network storage setting. The contributions of the paper are two-fold: First, the authors propose binding access credentials to clients, which prevents clients from passing on credentials to other clients of the system. Second, they formalize access control in the model of universal composability and proof the security of the amended protocol therein.

The protocol involves four parties: a client, a security manager, a policy manager, and a storage server. It is required that the storage server and the security manager share a secret key. If the client wants to access an object on the storage server, it sends a request to the security manager, which in turn request a capability from the policy manager. If the request is accepted, the latter returns a capability, which is then included in the credential sent from the security manager the client. The client then uses the credential to access the desired object on the storage server. Finally, the security manager can also revoke credentials by communication directly with the storage server.

There are four security modes requiring increasing level of checks from the storage server: NoSec, CAPKEY, CMDRSD, ALLDATA. The talk was only concerned with CAPKEY. In this mode, the storage server checks the expiry time included in the capability as well as the so-called VTag. This is a MAC on a security token (a random number for every client-server pair) using a key that is derived by a PRF on the capability using the secret key shared by the security and the storage server. This authenticates the credential. However, this does not bind the credential a particular client. The authors therefore suggest to include a tag derived by PRF on the client ID using the same secret key.

To prove the security of the resulting protocol, the authors propose an ideal functionality keeping tracks of all credential issued and revoked, and they provide a simulator in the hybrid model assuming secure channels and a common reference string.

Friday, March 28, 2014

Something is leaking in the state of Denmark

A recent study group brought about a micro-preview of this year's Eurocrypt, as we visited one of the 'Best Paper' nominated submissions, Unifying leakage models: from probing attacks to noisy leakage [by A. Duc, S. Dziembowski and S. Faust]. Speaking of Eurocypt, Susan's (who is now part of our Bristol Crypto group) and her co-authors' paper is also on this year's programme.

Side-channel attacks target implementation-specific characteristics of algorithms deployed on physical devices, such as power consumption, running time and so on. Sure, this information is "observed" with specific pieces of hardware (e.g., an oscilloscope), but since such equipment is relatively cheap, it is plausible to assume that even a not-so-experienced attacker can gain access to sensitive information. There is a significant number of papers describing scenarios in which and methods by which the then attacker walks successful (at least in theory), even when side-channel information is incomplete or erroneous. This is a highly optimistic perspective, perhaps a more realistic one would be the cat-and-mouse game, where deployers constantly develop countermeasures and attackers/evaluators strive to come up with ingenious ways around them, both tasks being equally hard. And let's not forget that the attackers may not have it easy to begin with: alas, although in theory there may be no difference between theory and practice, in practice, there always is [fact]. In particular, the two shortcomings mentioned above hold base for the most common paradigms in leakage modelling: there's only so much information which is actually available (bounded leakage) and physical measurments will always reflect some degree of intrinsic noise (noisy leakage).

The paper that we have visited aims at unifying leakage models, and of course does so by introducing a new leakage model. Moreover, it extends the previous work of Prouff and Rivain presented at last year's Eurocrypt [paper]. In particular, the latter paper analyzed the security of a block-cipher implementation protected with additive masking, but had some shortcomings, namely 1. some components were assumed leak-free, 2. the security was guaranteed against random (i.e., not chosen) messages 3. it had limited application.

Duc et al. make their argument based on simulation. Clearly, the first issue will then be how to model noise. Basically, noise is regarded as a randomized function over a finite set of values, independent for all elementary operations, bounded by some parameter, δ (δ-noisy), and is as such a statistical distance from the "ideal" leakage.

Three models for adversaries are studied, the noisy, the random probing and the t-threshold-probing model. In this third example, the adversary can learn the value of t intermediate values that are processed during the computation. A random probing adversary recovers an intermediate value with probability e and obtains a special symbol ⊥ with probability 1−e. Quite intuitively, it is possible to change from this model to the t-threshold-probing adversary, the intuition being accompanied by a formal proof and examples.

On the other hand, in order to address the topic of the noise-resilience of cryptographic computations, a model for arithmetic circuits over a finite field is presented. A (stateful arithmetic) circuit Γ over a field F is an oriented graph. The nodes are called gates, and belong to a pre-determined set (e.g., input/output gates, multiplication/addition, random/constant, memory and so on). A black-box circuit adversary A is a machine that adaptively interacts with a circuit Γ via the input and output interface. Given two stateful circuits Γ and Γ′ over some field F, Γ′ is said to be a (δ, ξ)-noise resilient implementation of a circuit Γ with respect to some (encryption) function f, if the output of Γ with some chosen key k and input is equally likely observed as the output of Γ' with the f(k) and same input, and for every δ-noisy adversary A there exists a black-box circuit adversary S such that the distance between the output of A after interacting with Γ with key k and the output of S after interacting with Γ' with key f(k)  is bounded by ξ. Finally, security in the probing model and resilience to leakage are addressed and a comparison to the previous work is given.

Thursday, March 27, 2014

"Nobody" has written this document...

In today's study group, Panos discussed a paper that focuses on Forensic Stylometry, a form of Authorship Attribution. The paper proposes the Classify-Verify method and it can be found at: There are 2 basic categories in Authorship Attribution (AAtr): The Closed World problem and the Open World problem. The former category implies that the author is in the suspect set. In Open World problems the author might not be in the set. For this reason, the authors merge CLASSIFICATION and VERIFICATION. They utilise an existing distance-based authorship VERIFICATION method but they also add per-feature standard deviations normalisation and per-author threshold normalisation to the scheme.

Stylometry is used to analyse anonymous written communications. The goal is to de-anonymize them. Traditional methods need a suspect set to do that in a reliable way. The paper bypasses the strong assumption that the author is inside the suspect set. The current state-of-the-art methods rely basically on AI techniques. They can identify with an accuracy of 90% individuals in sets of 50 people. However, there are certain restrictions to the already proposed algorithms. In Authorship Attribution, given a document D (unknown authorship) and a set of authors (A = {A1, A2, … , An}), we have to determine the author Ai of D. With authorship verification, given D and A we must decide if D is written by A. The paper suggests merging the two worlds: Given a D of unknown authorship and documents written by a set of known Authors, determine the author Ai ∈ A of D, OR highlight that the author of D is not in the Authors Set.

The authors are using two different corpora for their experiments: EBG corpus (45 different authors/at least 6,500 words per author) + adversarial documents and ICWSM 2009 Spinn3r Blog (blog corpus, around 44M blog posts). For classification they use a CLOSED world SVM classifier provided by Weka (SMO SVM with complexity parameter C = 1) and they choose only one type of feature sets from the Writeprints feature set, which was used to quantify the EBG corpus (over 90% accuracy for 50 authors). The evaluation was done using 10-fold cross validation. (Measured on the most common k, 1-5grams, 50 < k < 1000, step = 50). Finally, they chose the 500 most common character bigrams (they call it <500,2>-chars) as their feature set. FEATURE EXTRACTION is done using JStylo and JGAAP authorship attribution APIs.

For verification they use Classifier-induced verifiers that require a closed-world classifier and use its class output for verification. Another family of verifiers is the Standalone verifiers, which rely on a model built using author training data, independent of other classifiers and authors. Classifier-Induced Verification use distance-based classifiers. A higher confidence in an author may indicate that the author is in the suspect set while a lower confidence may indicate that he is not. So, after the classification, verification is based on setting an acceptance threshold t. We thus have to measure the confidence of the classifier and accept the classification if it is above t. We use probabilities in this category. The paper evaluates three different verification methods: P1, P1 – P2 –Diff and Gap-Conf.

For Standalone Verification the evaluation of the Classify-Verify model is done using either Distractorless verification or the proposed Sigma Verification. For the basic Verification (V: Distractorless) we use distance combined with a threshold: The method suggests to set an acceptance threshold t, model as feature vectors document D and author A, measure distance between them and decide that D is written by A if distance is below t.

The proposed Sigma Verification scheme applies two adjustments to the former method. a) Vσ: Per-Feature SD Normalisation, which is based on the variance of the author’s writing style (uses standard deviation of features) and b) Vα: Per-Author Threshold Normalisation. The evaluation of the former approaches indicates that there is no specific verification method preferable than the other and the selection of a verifier should rely on empirical testing. Sometimes it happens V to outperform Vσ or Vα and the opposite.

The proposed C-V algorithm is an abstaining classifier. In other words, it is a classifier that refrains from classification in certain cases to reduce misclassifications. Basically, the authors expand the closed world authorship problem to open world adding another class: the class ‘UNKNOWN’. Therefore, closed-world classification is applied on D and A = {A1, A2, … , An} and the output is given to the verifier. The verifier determines whether to accept Ai or to reject it (⊥) because C-V is a classifier over the suspect set A∪{⊥}. The threshold of the C-V verifier can be determined as follows:
  • Manually: Manually set by user (make classifier strict or relaxed), 
  • p-Induced Threshold: The threshold can be set empirically over the training set to the one that maximises the target measurement, e.g. F1-score, in an automated process, 
  • in-set/not-in-set-Robust: If p is not known we examine various p and various t. There will be a point where the curves intersect. (p is the likelihood the author of the document to be in the set).

The authors use 2 different settings for evaluation: when the authors of the documents are in the set of suspects (in-set) and when they are not (not-in-set). One assumption they make is that if D is written by A, classified as B but the verifier replaces B with ⊥, they consider the result as true. For their CLASSIFICATION phase they train n (n-1)-class classifiers using the SMO SVM as discussed previously. For the VERIFICATION phase they evaluate several methods: A standalone method for each corpus and all the classifier-induced methods: Gap-Conf, P1, P1-P2-Diff.  Also, they use F1-score because it provides a balanced measurement of precision and recall. For the threshold they use two automatic verification methods: If p is known they use the p-induces threshold that maximises the F1-score on the training set (p = 0.5). If p is unknown they use the robust threshold p-F1R. As baseline they compare F1-scores with 10-fold cross validation results of closed-world classification using the SMO SVM with the <500,2>-chars feature set. Finally, for the Adversarial Settings Evaluation, they train their models on non-adversarial documents of EBG and test them on the imitation documents to test how well C-V thwarts attacks.

Their results can be aggregated as follows: For both EBG and the blog corpora, 0.5-F1 results significantly outperform 0.5-Base using any of the underlying verification methods, at a confidence level of p-val < 0.01. Generally, the underlying verifiers (leading to an overall increase in F1-score) thwarted a large proportion of misclassifications. Among all verification methods, P1-P2-Diff is proven the most preferable verifier to use, since it consistently outperforms the other methods across almost all values of p for both corpora, which implies it is robust to domain variation. For adversarial settings, they prove that the closed-world classifier is highly vulnerable to these types of attacks but C-V manages to thwart the majority of the attacks. In addition, the results suggest that classifier-induced verifiers consistently outperform the standalone ones. Overall, the method is able to replace wrong assertions with more honest and useful statements of “unknown”.

Replacing Random Oracles using Indistinguishability Obfuscation

This week's study group looked at the powerful tool and burgeoning tongue-twister that is indistinguishability obfuscation, with specific focus on the forthcoming Eurocrypt '14 paper titled 'Replacing a Random Oracle: Full Domain Hash From Indistinguishability Obfuscation' by Hohenberger, Sahai and Waters (HSW). Since the work of Garg et al. last year there has been a flurry of work on obfuscation (25 papers on ePrint in 2013 with 'obfuscation' in the title and two whole sessions at TCC 2014 devoted to it), and naturally this will be a well-attended topic at all the major conferences this year.

The idea of full-domain hash (FDH) signatures dates back to 1993 and Bellare-Rogaway's seminal 'Random Oracles are Practical' paper. The idea allows construction of a signature scheme from any trapdoor permutation (TDP) by simply signing the hash of a message using the trapdoor: σ = g-1sk(H(m)). Verification simply employs the inverse of this permutation using the public key. The idea is that the hash function maps an arbitrary message space to the domain (and co-domain) of the permutation, and when modeled as a random oracle, this scheme is secure for any TDP. The proof, employing a reduction to the adversary's success against a TDP challenger, was influential and straightforward: for all but one of the (unique) queries to the RO, the reduction algorithm chooses a random value from the domain and outputs gpk(t), and at one query point m* it programs the output of the RO to be z* = gpk (t*) where z* was the value given by the TDP challenger.

Since this landmark work, a number of 'FDH-like' schemes have appeared, such as the seminal Boneh-Franklin IBE scheme and Boneh-Lynn-Shacham signatures, which both utilise the 'random oracle programming' idea in their proof method. The idea of the HSW paper is to provide a replacement hash function for the random oracle without modifying the underlying signature construction. To do this, the authors utilise an indistinguishability obfuscator and constrained PRFs. More information on indistinguishability obfuscation can be found in Essam's blog post of Joop's study group last October, and many of the recent papers on obfuscation provide a concise overview of indistinguishability obfuscation. A major challenge in the field of obfuscation is dealing with the black-box impossibility results, and basically this paper manages to avoid these issues because the hash function constructions are inherently black-box. The idea of a constrained PRF is that it is only defined on a subset of the usual input space, and HSW focus on puncturable PRFs meaning PRFs that can be defined on all bit strings of a certain length, except for any polynomial-size set of inputs.

The main idea of the proof for selectively secure FDH signatures is that the hash function is an obfuscation of gpk(F(K,m)) where gpk is the public direction of the TDP, F is a puncturable PRF and K the key for this PRF. Since the scenario is selective security, the proof involves hopping from this hash function to an obfuscation of the 'punctured version' of the hash function, and since the input/output behaviour is the same this hop is dealt with by the security of indistinguishability obfuscation. In essence, this punctured version fixes a point (that is programmed to be the signature security game challenge) such that the only way an adversary can win is by defeating the PRF itself. This result is particularly neat because it essentially follows the same strategy as the original RO proof. In the adaptive case things get slightly more complicated, as in past proofs the hash function needs to operate in two modes: a 'normal' mode where the hash function parameters are chosen at random, and a 'partitioning' mode where the parameters are chosen in such a way that inversion is possible for a large fraction of points. However obfuscation can defeat this difficulty too, because the description of the hash function itself can be obfuscated. The paper demonstrates the power of indistinguishability obfuscation and the methods used may well be useful in circumventing or expanding on the black-box impossibility results in the literature.

Thursday, March 13, 2014

You can have the result, but you're not getting my data!

A recent study group focused on the ZQL query language that was published in USENIX 2013 by Microsft research ( The ZQL query language was created to address a simple problem. Given a set of private client data, how can an external server learn the result of a function on that data without ever (directly) learning anything about the private data. We focus on a setting of a single client that knows all the data and does not want to give it to the server. Although it may sound trivial, devising a practical method to achieve this whilst providing the desired security guarantees is far from it. ZQL, which supports a subset of SQL functions, achieves this through the use of zero-knowledge protocols to produce code that performs the data certification, computation and verification. The final compiled code aims to provide the following security properties:
  • Correctness. For any given source inputs, the sequential composition of the cryptographic queries for the data sources, the user, and the verifier yields the same result as the source query.
  • Integrity. An adversary given the capabilities of the user cannot get the verifier to accept any other result (except with a negligible probability).
  • Privacy. An adversary given the capabilities of the verifier, able to choose any two collections of in- puts such that the source query yields the same result, and given the result of the user’s cryptographic query, cannot tell which of the two inputs was used.

One running use case in the paper is in the setting of smart metering. A household may be billed at a different rate as per the usage at different times of the day (e.g. economy7 meters but more fine-grained). The privacy concern here is that a user may not want the energy company to know the exact usage for each second of the day (possibly allowing the energy supplier to learn information about the users habits - such as when the user has a cup of tea, leaves for work in the morning or what time they watch T.V.) but the energy supplier is entitled to know what the bill for the user should be. ZQL allows for the meter to perform the bill calculations (as per the request of the supplier), certify the data and prove that the resulting bill is correct without revealing any of the data used to compute the bill.

The authors constructed ZQL to have sufficient abstraction such that using ZQL does not require a thorough understanding of the underlying crypto functions (making it appealing/practical for industrial deployment). Each time a query is constructed, the ZQL compiler will synthesize all the required zero-knowledge protocols (currently supported by RSA and Elliptic Curve primitives). The package is tied up nicely with cost metrics for evaluating and verifying the queries on the data. On the other hand - ZQL is still in it's infancy and limited to relatively simple SQL-like functions in order to remain practical and preserve the security guarantees.

I would say this is a nicely presented paper. My understanding of SQL is sparse at best, but that being said, the authors do well to present the intuition of what they are trying to achieve and what they have attained thus far. People are becoming increasingly aware of the data being given away about their personal habits and this paper does well to demonstrate the application of various crypto and computer science primitves to piece together a nice solution to this challenging problem.

Tuesday, March 4, 2014

Beware – Your user agent could be a double agent!

Even though there were many interesting talks at the RSA 2014 conference in San Francisco, including a few discussing the recent NSA-RSA privacy drama, I chose to blog about a short talk by Mike Shema from Qualys centering around security and privacy on the web.

In his talk, Mike highlighted the difference between having a secure browser and secure date. Basically, every day we use web browsers (known as user agents since they interact on behalf of the user) to access the web. Many websites rely on the collection of users' data to generate their revenue and thus as users access websites, those websites interact with users' browsers behind the scene not always necessarily the way the users want.

Many vendors offer big money prizes to help them identify and mitigate security vulnerabilities in their browsers. Although there have been many recent initiatives towards making browsers more secure, including the advent of HTML5, we still have not reached what we consider default stands for privacy. Vendors have conflicting perspectives regarding not tracking users on the web. The implementation of the recently proposed “Do Not Track” HTTP header is one obvious example. Internet Explorer suggested that the default value of such a field should be set to enabled so that tracking is prevented by default unless the user decides to opt in. On the other hand, Google Chrome had a different point of view. Advertisers also thought such a feature would impact their business.

Constantly, different security features are added to browsers to make them more secure. As well as self updates, among other things, such features include: process separation and sand boxing. Informally, the former allows different tabs/web pages to be run as separate processes, whereas the latter limits access to user's resources on per-application basis. As an example, latest versions of Google Chrome embed Flash on its own.

There are many ways one can envision metrics used in evaluating browsers' and data security or assessing whether one browser is more secure than another or if a data set is more private than another. On the one hand, there are Malwares (short for malicious software) which attack browsers directly. On the other hand, we have attacks like resources framing, Clickjacking and Cross-Site Request Forgery (CSRF). CSRF (sometimes is also abbreviated as XSRF) allows an attacker to force the victim to send HTTP requests to another target website and therefore making use of any capabilities the victim has with the target website, e.g. exploiting an existing authenticated session that the victim has with the target website. Recently, Mozilla and Safari turned on TLS 1.1/TLS 1.2 and are using the recommended cipher suite by default to promote network encryption which is vital for data security.

Many web pages these days are built with multiple origins and their content comes from different sources. Advertising, in particular, is inherently cross-origin. It was not until about 10 years ago that the risks of mixing different contents from multiple origins in a web page came into the fore.
Even though the Same Origin Policy used by browsers ensures that a resource cannot read a response from another origin, it still does not provide proper isolation of cookies and resources across different origins and thus it does not rule out CSRFs. More precisely, the same origin policy does not stop a resource from one origin from making a request to another one from a different origin.

By ensuring that websites use SSL (Secure Socket Layer) as emphasized by the Electronic Frontier Foundation (EFF) in its “HTTPS Everywhere” initiative, many security vulnerabilities such as mixed content and information leakage can be prevented. However, there have been examples of many browsers (especially on mobile devices) which ran HTTPs but still was vulnerable because they were skipping important steps such as certificate validation.

In order to enhance users' data privacy, many potential solutions could be implemented. One important step is to impose penalties on servers which do not honour the Do Not Track requests made by the user. Another possible countermeasure is using data pollution where, for instance, users can swap their cookies such as Google PREF IDs or Double-Click cookies in order to achieve anonymity.

In summary, there a number of ways which could help preserve users' privacy on the web. Firstly, enabling “deny third party” cookies as a default as done by Safari. Secondly, using identity management and separating cookie jars in order to ensure that the interaction of a user with one website is independent from their interaction with another one.