A list of all past and present grant-funded research projects in which I am involved.
| Project | Acronym | Principal Investigator | Team members | Duration | Funding | Type |
|---|---|---|---|---|---|---|
| Secure Computation over | SCENE | Geoffroy Couteau | Adi Rosén, | 2021 – 2024 | €170.000 | ANR JCJC |
ANR Project — SCENE
A vast body of work has been dedicated to developing widely adopted methods for protecting the exchange of sensitive data over large communication networks, such as the web. Almost 75% of total internet traffic is now encrypted; in parallel, a vast ecosystem of data-driven applications has emerged, building upon the impressive development of machine learning algorithms. When fed with large sets of labelled data, deep neural networks and other supervised learning methods have the potential to revolutionize numerous sectors, from autonomous cars to the discovery of new therapeutics. This creates a paradoxical situation: the importance of protecting individuals' privacy is now widely recognized, and encrypted communication has become the default form of communication; yet there are strong incentives to publicly reveal the very private information that encryption methods were designed to protect – at the individual level, because doing so enables the use of applications such as social networks and recommendation systems, and at the societal level, because datasets of sensitive information are the core resource needed by machine learning methods.
To resolve this contradictory situation, the approach until now has been to release private databases stripped of obvious identifying data (name, address, email, phone number). For instance, anonymized medical data have often not been considered private across the world, and are made publicly accessible or distributed to researchers and industry. However, it is now widely recognized that this approach completely fails to reconcile privacy with data usability: even after anonymization, the remaining information in these public datasets suffices, in the overwhelming majority of cases, to fully identify almost every individual in the dataset (1, 2, 3, among many others – for example, the combination of ZIP code, birth date, and gender already suffices to uniquely identify 87% of the American population, see 4). This creates a major privacy concern, and as policy makers come to realize it, regulations are evolving accordingly (including modifications to the US HIPAA and the EU GDPR).
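The re-identification risk described above can be illustrated with a toy linkage attack. The sketch below uses entirely synthetic records and hypothetical field names; it only shows the mechanism: when a quasi-identifier such as (ZIP code, birth date, gender) is unique, joining a public record against an "anonymized" release re-identifies the row.

```python
# Toy linkage attack on "anonymized" data. All records are synthetic,
# for illustration only; field names are hypothetical.
anonymized_medical = [
    {"zip": "02138", "birth_date": "1965-07-31", "gender": "F", "diagnosis": "A"},
    {"zip": "02139", "birth_date": "1970-01-02", "gender": "M", "diagnosis": "B"},
]
# A public record (e.g. a voter roll entry) carrying the same quasi-identifier.
public_record = {"name": "Alice", "zip": "02138",
                 "birth_date": "1965-07-31", "gender": "F"}

QUASI_ID = ("zip", "birth_date", "gender")
matches = [row for row in anonymized_medical
           if all(row[k] == public_record[k] for k in QUASI_ID)]
# A single match links the name to the "anonymous" medical record.
print(len(matches))  # 1
```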
Secure computation is an active research area, introduced in 1986 in the seminal work of Yao. In the conflicting interplay between the need for large datasets in machine learning applications and the crucial importance of protecting sensitive data, secure computation aims at achieving the best of both worlds and fully reconciling these two goals. It allows different parties to jointly evaluate arbitrary functions on their private data without disclosing these data publicly – in fact, without ever disclosing anything beyond the outcome of the computation, even to the participants. Hence, rather than anonymizing datasets before using them in computations, secure computation allows them never to be revealed at all. With the failure of data anonymization, secure computation has emerged as the most promising approach for guaranteeing the privacy of sensitive data without giving up on the promises of modern data-driven applications. While early feasibility results were mainly of theoretical interest, there have been tremendous improvements over the past decade, to the point that modern protocols are now within the reach of the computational power of modern computers. As a consequence, secure computation solutions are now being proposed by several companies (1, 2), and secure computation protocols have already been used in a variety of real-world situations where important computations had to be performed on data that could not be disclosed, from auctions in agricultural markets to IT studies in Estonia, and from tax-fraud detection to the computation of pay-equity metrics by gender and ethnicity in the Boston area.
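A minimal sketch of the idea, assuming the simplest possible setting: the code below implements additive secret sharing, one standard building block of secure computation protocols (not the specific protocols developed in this project). Each party splits its input into random shares; any single share reveals nothing about the value, yet local addition of shares lets the parties reconstruct the sum – and only the sum.

```python
import secrets

MOD = 2**32  # work modulo a fixed power of two

def share(value, n_parties):
    """Split `value` into n additive shares that sum to `value` mod MOD.
    Any n-1 shares are uniformly random and reveal nothing."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

def secure_sum(private_inputs):
    """Toy n-party secure addition: each party shares its input,
    party i locally adds the i-th share of every input, and the
    partial sums are combined. Only the total is ever revealed."""
    n = len(private_inputs)
    all_shares = [share(x, n) for x in private_inputs]
    partials = [sum(all_shares[j][i] for j in range(n)) % MOD
                for i in range(n)]
    return sum(partials) % MOD

print(secure_sum([12, 30, 58]))  # 100
```

Real protocols of course go much further (multiplication, malicious security, preprocessing), but the share-locally-compute-then-combine pattern above is the core principle.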
However, as of today, the deployment of concrete secure computation solutions remains severely limited: in the above examples, preparing the secure protocol required months to years of work by a dedicated team, and executing it required hours of computation on large servers. This remains very far from a satisfying solution to the usability-versus-privacy problem, which would require a large-scale, on-demand secure computation solution that can be run quickly and efficiently on standard machines, between any group of users interacting over an encrypted communication network.
In this context, our goal is to push the efficiency boundaries of large-scale secure computation, both asymptotically (by obtaining upper and lower bounds on the efficiency that secure computation protocols can reach in various models) and with respect to concrete runtime (by pushing forward a new approach to overcome the limitations of the standard paradigm of secure computation). Our aim is also to precisely evaluate the concrete efficiency of our protocols through runtime analysis; as the coordinator of this project did in previous work on overcoming efficiency barriers for secure computation, we will also seek to obtain optimized implementations of some of the protocols developed in this project, through collaborations with researchers outside the team, in order to precisely evaluate their impact on secure computation in real-world situations.