Getting Started on Pseudorandom Correlation Generators

Pseudorandom correlation generators provide a means for multiple parties, typically two, to generate a large amount of correlated pseudorandomness without any communication, given short initial correlated seeds. The kinds of correlations we usually consider are those useful in secure computation, which include

  • OT correlations: Alice gets a pair of pseudorandom strings $(s_0,s_1)$ and Bob gets $(b,s_b)$ where $b$ is a pseudorandom bit
  • OLE correlations (over some field or ring): Alice gets a pseudorandom linear function $(u,v)$ (specifying the function $f_{u,v}:x\mapsto u\cdot x+v$) and Bob gets $(x,f_{u,v}(x)) = (x,u\cdot x+v)$ for a pseudorandom $x$. Note that over $\mathbb{F}_2$, this is equivalent to the OT correlation.
  • Beaver triples (over some field or ring): Alice and Bob get additive shares of pseudorandom values $x$, $y$, and of their product $x\cdot y$. A Beaver triple can be locally constructed from two OLE correlations.
  • Authenticated Beaver triple: same as above, except that Alice and Bob also get shares of a pseudorandom MAC $\Delta$, together with shares of $\Delta\cdot x$, $\Delta\cdot y$, and $\Delta\cdot xy$. If there are many authenticated triples to generate, they share the same $\Delta$.
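To make the OLE-to-Beaver-triple conversion mentioned above concrete, here is a minimal Python sketch (an idealized dealer samples the OLEs, and the modulus is an arbitrary toy choice of mine): two OLEs, one in each direction, yield one Beaver triple with purely local post-processing.

```python
import random

p = 2**31 - 1  # toy prime modulus (illustrative choice)

def sample_ole():
    """Idealized dealer sample of one OLE over F_p:
    one party gets (u, v), the other gets (x, w) with w = u*x + v."""
    u, v, x = (random.randrange(p) for _ in range(3))
    return (u, v), (x, (u * x + v) % p)

# Two OLEs, one in each direction, to cover the two cross terms.
(u1, v1), (x1, w1) = sample_ole()   # Alice holds (u1, v1), Bob holds (x1, w1)
(u2, v2), (x2, w2) = sample_ole()   # roles swapped: Bob holds (u2, v2)

# Alice's triple shares (local computation only):
xA, yA = u1, x2
zA = (xA * yA - v1 + w2) % p
# Bob's triple shares (local computation only):
xB, yB = u2, x1
zB = (xB * yB + w1 - v2) % p

# The shares reconstruct a valid Beaver triple: z = x * y.
assert (zA + zB) % p == ((xA + xB) * (yA + yB)) % p
```

The check works because each OLE turns one cross term into additive shares: $u_1 x_1 = w_1 - v_1$ handles $x_A\cdot y_B$, and $u_2 x_2 = w_2 - v_2$ handles $x_B\cdot y_A$.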

These pseudorandom correlations can be used by MPC protocols in the preprocessing model. These protocols work under the assumption that a trusted dealer initially generates and distributes long correlated strings among the parties. They achieve unconditional security (in this trusted dealer model) and very high concrete efficiency. Different correlations enable different protocols: OT correlations are best suited to securely evaluating Boolean circuits (they can be used for circuits over a larger field $\mathbb{F}$, but at the cost of a significant overhead compared to the alternatives), OLEs and Beaver triples target general arithmetic circuits, and authenticated triples are used to get malicious security. PCGs provide a means to replace the trusted dealer with a secure distributed setup using a minimal amount of communication.
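As a concrete illustration of how preprocessed correlations are consumed, here is a sketch of the standard Beaver multiplication trick, which uses up one triple to multiply two secret-shared values with a single round of openings (toy field and idealized dealer, both arbitrary choices for the example):

```python
import random

p = 101  # toy prime field

# Preprocessing (idealized dealer): shares of a Beaver triple (x, y, z = x*y).
x, y = random.randrange(p), random.randrange(p)
z = x * y % p
xA, yA, zA = (random.randrange(p) for _ in range(3))
xB, yB, zB = (x - xA) % p, (y - yA) % p, (z - zA) % p

# Online phase: multiply secret-shared inputs a and b.
a, b = 7, 13
aA = random.randrange(p); aB = (a - aA) % p
bA = random.randrange(p); bB = (b - bA) % p

# The parties open d = a - x and e = b - y (the only communication).
d = (aA - xA + aB - xB) % p
e = (bA - yA + bB - yB) % p

# Local reconstruction of shares of a*b; the public d*e term goes to one party.
cA = (d * e + d * yA + e * xA + zA) % p
cB = (d * yB + e * xB + zB) % p
assert (cA + cB) % p == a * b % p
```

Since $d$ and $e$ are masked by the uniformly random $x$ and $y$, the openings reveal nothing about $a$ and $b$; this is what makes cheap preprocessed triples so valuable online.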

There are many other kinds of protocols that benefit from more exotic forms of correlated randomness: vector-OLEs are very useful in zero-knowledge proofs and private set intersection; unit-vector correlations have nice implications for private information retrieval; matrix triples are useful for secure linear algebra; topology-dependent correlations allow better protocols when the circuit topology is known a priori; higher-degree correlations enable communication savings for layered circuits; etc, etc. For each of those, one can design suitable PCG candidates.

Learning about PCGs

PCGs are an exciting and fast-evolving research area, and I often get requests of the form: where do I learn about them? If I want to start doing research on PCGs, where do I start, and what papers should I read? That’s a great question, to which I have not always provided a satisfying answer! The trouble with young and fast-moving research fields is that the research has not always had time to crystallize into surveys, textbooks, and reference papers, and it can be hard to navigate the space. The main purpose of this blog post is to provide some helpful pointers to whoever wants to learn more about them.

Books and surveys

A good coverage of the basics, starting with MPC in the correlated randomness model and introducing the core framework for PCGs from coding-theoretic assumptions, is given in my HDR manuscript. Last year, I slightly expanded and revised the content of the manuscript to turn it into a book, An Introduction to Silent Secure Computation – but if you don’t want to buy a book, the HDR manuscript covers most of what the book covers. These are the only general introductions to the topic I know of. Below, I list a few complementary resources that cover specific aspects or related areas.

HSS: PCGs are closely related to function secret sharing (a core building block of the standard PCG framework) and homomorphic secret sharing (with connections in both directions), and many recent works on PCGs employ ideas from the HSS domain. A slightly outdated book chapter, co-authored with Elette Boyle, Niv Gilboa, and Yuval Ishai, is available here. It covers some of the fundamentals on FSS and HSS, as well as some applications and relations to PCGs.

VOLE-based ZK: PCGs, and especially the original PCG for vector-OLE from these works, have initiated a whole research area of VOLE-based zero-knowledge proofs. The area has since evolved on its own – in particular, the latest developments (VOLE-in-the-Head) don’t use PCGs anymore. Nevertheless, the two lines of work remain closely intertwined. A good survey on VOLE-based ZK (from June 2023) is available here.

Linear tests: A common framework for analyzing the security of code-based PCGs is the linear test framework, which (heuristically) relates the security of the assumption to the dual distance of the underlying code. I wrote a short introduction to the framework in a blog post. More discussion can be found in the PhD thesis of my former PhD student, Clément Ducros.

Papers

If, like me, your preferred way to get into a new topic is by skimming through academic papers, you’ll find this section to be the most useful. I’ve gathered the main papers about PCGs (as well as related primitives such as PCFs, and cryptanalysis papers that have an impact on the security of PCGs) in a tree that (I hope) provides a visual overview of how the field evolved. You can drag-and-drop the tree or zoom in on sections, and click on any node to see more details. The colors and icons are explained by the “Show Legend” button. I originally made this tree for myself, to keep track of the field and get an easy way to point students to the state-of-the-art on specific PCG-related questions.

Don’t take the tree structure too seriously: its aim is to loosely represent a follow-up relationship while grouping related papers in the same subtrees, but links can be quite arbitrary, since it is often the case that a new result has implications for several previous works without being directly a follow-up to any of them. I’ve decided against drawing too many edges – the graph would have been unreadable – and clarified the position in the notes when I felt it was necessary. If you think I missed important papers, drop me a mail! If you disagree with how I arranged things in the tree, it’s probably fair, but no need to drop me a mail for that :) I hope to maintain the tree and enrich it in the future with new results in the area.

Other: videos, blog posts

If you’re more into watching talks and tutorials, I recommend checking this awesome 4-part introduction to pseudorandom correlation generators, by Yuval Ishai and Peter Scholl: Part 1, Part 2, Part 3, Part 4. For shorter material covering the same topic, you can check MPC with Silent Preprocessing via Pseudorandom Correlation Generators, by Lisa Kohl (Simons Institute workshop), Efficient Pseudorandom Correlation Generators: MPC with Silent Preprocessing, by Peter Scholl (talk at TPMPC), or Pseudorandom Correlation Generators: Secure Computation with Silent Preprocessing, by Yuval Ishai (talk at the NIST Workshop on Multiparty Threshold Schemes (MPTS) 2020). I also regularly give talks providing an introduction to the topic, see for example Pseudorandom Correlation Generators, Constructions and Applications. If you are into PCFs, you can check Correlated Pseudorandom Functions from Variable-Density LPN (BUsec seminar).

For material relating more to VOLE-ZK, you can have a look at Chainlink Labs Research’s series of posts on VOLE-based ZK, which covers the same basic code-based construction as most introductory talks (though the aim is VOLE-ZK rather than PCGs per se). See also this follow-up blog post. For tutorials, I recommend A dive into VOLE-based Zero Knowledge, by Xiao Wang, or ZK Proofs From VOLE and VOLE-in-the-Head, by Peter Scholl.

Finally, you’ll find on YouTube the talks for most of the papers in the tree above, except for the ones that have not been presented at a conference yet.

Research Questions

I often find that the best way to learn about a new topic is to approach it with a research question in mind. This provides a goal and a motivation to understand things at a deeper level – without such motivation, I often find myself tempted to stick to a more superficial level of understanding (which is good enough for “storing pointers” that let me efficiently retrieve the material later when needed, but not good enough to acquire knowledge and mastery of a topic). So, without further ado, here are, in no particular order, some of the main open problems in the PCG landscape (according to my own biased view):

Coding theory, cryptanalysis

The linear test framework relates the hardness of breaking LPN variants against a host of standard attacks to properties of the underlying code – specifically, the minimum distance of the code (see my previous blog post on linear tests). Thus, a major goal in PCGs is to find the perfect code according to this metric: a (family of) mapping(s) $H$ (a compressing matrix, viewed as the mapping $x \mapsto H\cdot x$) that can be computed as fast as possible (not only strictly linear time, but also with small constants), such that a random $H$ from the family satisfies that $H^\top$ generates a code with high minimum distance (ideally, close to the Gilbert-Varshamov bound). There have been a host of proposals: quasi-cyclic codes, LDPC codes, Expand-Accumulate codes, Expand-Convolute codes, Repeat-Accumulate (or Repeat-Accumulate-Accumulate) codes, etc, etc. They come relatively close to the dream goal, but are not quite there yet!
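For intuition, here is a toy brute-force computation (uniformly random $H$ with tiny parameters of my own choosing – real proposals use structured codes precisely because random matrices are slow to evaluate) comparing the minimum distance of the code generated by $H^\top$ to the Gilbert-Varshamov guarantee:

```python
import random
from itertools import product
from math import comb

random.seed(0)
k, n = 8, 16  # toy parameters: H is a k x n compressing matrix over F_2
H = [[random.getrandbits(1) for _ in range(n)] for _ in range(k)]

def min_distance(rows):
    """Minimum weight of a nonzero codeword in the code generated by H^T
    (the row space of H), by brute force over all 2^k - 1 messages."""
    best = n
    for m in product([0, 1], repeat=len(rows)):
        if not any(m):
            continue
        cw = [sum(mi * r[j] for mi, r in zip(m, rows)) % 2 for j in range(n)]
        w = sum(cw)
        if 0 < w < best:
            best = w
    return best

d = min_distance(H)

# Gilbert-Varshamov: an [n, k, d_gv] code exists whenever
# sum_{i=0}^{d_gv - 2} C(n-1, i) < 2^(n-k).
d_gv = 1
while sum(comb(n - 1, i) for i in range(d_gv - 1)) < 2 ** (n - k):
    d_gv += 1
d_gv -= 1
print(f"min distance d = {d}, GV guarantee d_gv = {d_gv}")
```

The brute force is exponential in $k$, of course; the point of the research program described above is to get codes whose distance is provably or heuristically high while $x \mapsto H\cdot x$ stays linear-time with small constants.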

Question: Is there a "dream code" for PCGs that stays close to the Gilbert-Varshamov bound with overwhelming probability, yet yields $O(n)$ seed expansion cost with tiny constants?

The linear test framework seems to capture resistance against known attacks reasonably well, but it has deficiencies. First, it is too conservative in some respects: parameters derived from the framework are much worse than parameters selected according to the best known cryptanalysis. Second, it fails to capture a large class of algebraic attacks (see most of the rightmost part of the tree above, in light green). Third, it is not even clear whether it adequately captures modern ISD variants, e.g., those using representation techniques. It’s more of a personal pet peeve than a fundamental research question, but I nevertheless feel like it should be included!

Question: Can the linear test framework be formally shown to capture modern ISD variants? Can it be refined to provide tighter security guarantees?

Speaking of algebraic attacks, there have been very exciting developments in the past few years that brought some interesting diversity to the cryptanalysis landscape for PCGs. Yet, these attacks seem to require a lot of samples (this makes sense intuitively, since one solves a system of equations of degree at least 2), while many modern PCGs use a dual-LPN approach with a small number of samples (e.g., $2n$, where $n$ is the LPN dimension). Can we do better? This is out of my league – I’m not a cryptanalyst by any means – but I’d be really interested to see people working on that!

Question: Can algebraic attacks beat linear tests / ISD on PCGs from dual LPN?

Setup costs

Perhaps the most fundamental PCG-related question, in my eyes, is the existence of a sublinear setup. Currently, if you want to generate $N$ correlations, then the distributed setup costs $\Omega(N)$, even though the seeds are of size $\mathsf{polylog}(N)$. It’s not that much of an issue if you want to use these correlations right after the setup: the setup actually produces them all “for free”! But if you want to store the seeds for later use, that seems unnecessary. This is especially annoying in two settings:

  • Threshold signatures: several signature schemes admit an efficient threshold protocol given OLEs (examples include Schnorr and ECDSA, but also many more). However, in each instance, you want to use a small number of OLEs. Using current PCG setups to generate those OLEs forces you to do work proportional to an upper bound on the total number of signing instances you might want to execute in the future!
  • PCFs aim to generate an unbounded amount of correlated randomness. However, with current PCG setups, this cannot possibly materialize, since the setup cost grows with an upper bound on the total amount! Indeed, current PCFs typically assume that a trusted dealer will distribute the short keys to sidestep this issue.

Of course, one can get a $\mathsf{polylog}(N)$-computation protocol to distribute the PCG seeds (or PCF keys) using generic MPC, but such protocols make heavy non-black-box use of cryptography, and perform very poorly in practice. There are also some recent PCFs, following a different framework (introduced here), that have an efficient distributed setup, but they seem inherently limited to producing OT correlations. Unfortunately, some of the most exciting applications of PCFs require OLE or VOLE correlations. In my eyes, the number 1 question that, if answered positively, would have a tremendous impact on the widescale adoption of PCGs and PCFs, is:

Question: Is there an efficient setup protocol for PCG or PCF (for VOLE or OLE) whose cost does not scale with the maximum number of correlations?

OLE correlations over general fields or rings are among the most useful, as they enable secure computation over a wide variety of structures. They let you work over $\mathbb{Z}_q$ for a $256$-bit prime $q$ for threshold-Schnorr or threshold-ECDSA, or over a $64$-bit field to get $1/2^{64}$ cheating probability when using them as information-theoretic MACs, or over $\mathbb{F}_{16}$ if you want to distributively compute a UOV signature… You see the picture! Yet, current PCGs for OLE are quite costly. I mentioned their setup costs above (a limitation common to all known types of PCGs), but another limitation is their large seed size: while PCGs for OTs have a seed size growing as $\tilde O(\lambda\cdot w) = \tilde O(\lambda^2)$ (where $w = O(\lambda)$ denotes the noise weight), typically in the tens of kB, PCGs for OLEs have seed size $\tilde O(\lambda\cdot w^2) = \tilde O(\lambda^3)$, typically a few MB. As a result, their benefits only kick in when generating millions of OLEs, which is a showstopper for many applications. A smaller seed size would significantly expand their range of applications.
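Ignoring the polylog factors and constants hidden in the asymptotics (so this is only a back-of-the-envelope sketch, with $w = \lambda$ as an arbitrary instantiation of $w = O(\lambda)$), plugging in $\lambda = 128$ illustrates the gap:

```python
lam = 128           # security parameter
w = lam             # noise weight, taking w = O(lambda) with constant 1

ot_seed_bits  = lam * w        # ~ lambda^2, up to polylog factors
ole_seed_bits = lam * w ** 2   # ~ lambda^3, up to polylog factors

print(f"OT  seed: ~{ot_seed_bits // 8 // 1024} KiB (times polylog factors)")
print(f"OLE seed: ~{ole_seed_bits // 8 // 1024} KiB (times polylog factors)")
```

The raw terms come out to roughly 2 KiB versus 256 KiB; once the polylog factors and concrete constants are included, these land in the tens-of-kB and few-MB ranges quoted above.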

Question: Is there an efficient PCG for OLE, or a programmable PCG for OT, whose seed size is $o(\lambda^3)$?

Malicious security

By now, we have very fast PCGs for authenticated Beaver triples over $\mathbb{F}_2$ in the 2-party case (e.g., via Expand-Convolute codes), reasonably efficient PCGs for (non-authenticated) Beaver triples over any field $\mathbb{F}$ in the $n$-party case (via quasi-abelian syndrome decoding), and reasonably efficient authenticated Beaver triples over any field $\mathbb{F}$ in the 2-party case (also via quasi-abelian syndrome decoding). This leaves essentially one tantalizing open question, the “final boss” of PCGs, the case of authenticated triples for $n>2$ parties:

Question: Can we build an efficient multiparty PCG for authenticated Beaver triples, over $\mathbb{F}_2$ or over any field $\mathbb{F}$?

The core issue is that state-of-the-art results build upon 2-party PCGs that enjoy a property called programmability. In short, this property allows one to convert a $d$-party PCG for degree-$d$ correlations into an $n$-party PCG for degree-$d$ correlations, at the cost of an ${n \choose d}$ blowup. Beaver triples, a degree-$2$ correlation, can be generated for $n$ parties given programmable 2-party PCGs, which we know how to build from ring-LPN or quasi-abelian decoding. On the other hand, authenticated Beaver triples are a degree-$3$ correlation, which requires a 3-party PCG, which we only know how to build if we have a 3-party DPF (see this work). But while we have 2-party DPFs over a size-$N$ domain with key size $O(\log N)$, the best known (efficient) 3-party DPF has key size $O(\sqrt{N})$, which is considerably more expensive! Improving this state of affairs would not only solve the above problem, but would also unlock a host of other cryptographic applications.
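A quick numerical illustration of the two costs at play (the party count, domain size, and "units" below are arbitrary choices of mine, since the asymptotics hide constants):

```python
from math import comb, log2, sqrt

n = 10  # number of parties (illustrative)

# Programmability blowup: an n-party degree-d correlation from d-party PCGs
# costs one PCG instance per size-d subset of parties, i.e. C(n, d) instances.
print("degree-2 (Beaver triples):       ", comb(n, 2), "2-party PCG instances")
print("degree-3 (authenticated triples):", comb(n, 3), "3-party PCG instances")

# DPF key-size gap over a domain of size N:
N = 2 ** 20
print(f"2-party DPF key: O(log N)  ~ {log2(N):.0f} units")
print(f"3-party DPF key: O(sqrt N) ~ {sqrt(N):.0f} units")
```

The ${n \choose d}$ blowup is tolerable; it is the $O(\sqrt{N})$ versus $O(\log N)$ key-size gap that dominates and makes the 3-party case so much more expensive.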

Question: Is there an efficient 3-party DPF with polylogarithmic key size?

PCFs

PCFs generate near-unbounded amounts of correlations on-the-fly. For the VOLE and OT correlations, we have fairly efficient PCFs (here and here), though they lag one or two orders of magnitude behind the fastest PCGs. For OLEs, however, or for OTs with programmability, the gap is much worse: the best known PCF for OLEs, from here, has gigantic key sizes (in the GBs) and fairly slow runtimes (two to three orders of magnitude slower than for VOLE/OT).

Question: Is there an efficient 2-party PCF for the OLE correlation over large fields, or an efficient 2-party programmable PCF for OTs over $\mathbb{F}_2$?

A personal favorite of mine is the question of building public-key PCFs: PCFs where the setup can be instantiated non-interactively, à la Diffie-Hellman key exchange: given each other’s public key, two parties should be able to non-interactively derive PCF keys. This not only solves the setup problem (discussed earlier), but also enjoys strong benefits when used in an “Internet-scale” setting: parties need not be online synchronously to run the setup, and $N$ parties can generate PCF keys between every pair of nodes using $O(N)$ (broadcast) communication instead of $O(N^2)$. A recent work showed the first efficient PK-PCF for OTs (it generates 15k to 30k OTs/s), but several core questions remain wide open: nothing is known for the VOLE correlation (let alone for the OLE correlation, or for authenticated Beaver triples), and the construction requires quantumly-insecure assumptions. In my eyes, the following two questions are of fundamental interest:

Question: Is there an efficient 2-party public-key PCF for the VOLE correlation?

Of course, a positive answer to the above would automatically raise the next question of getting a PK-PCF for OLEs, or for authenticated Beaver triples.

Question: Is there an efficient post-quantum 2-party public-key PCF for OT?

Note that a PK-PCF implies in particular a non-interactive key exchange, which is notoriously less efficient in the post-quantum setting than with group-based cryptography, so one can at best hope for something with a reasonable cost, even if it is significantly higher than from quantumly-insecure assumptions.

Other correlations

Finally, there are many other correlations of interest one could want PCGs or PCFs for – too many to list them all! The questions below are just a few illustrative samples, but one can ask many more questions along these lines:

Question: Is there an efficient PCG for generating matrix Beaver triples, where the seed size is polylogarithmic in both the matrix dimensions and the number of samples?
Question: Is there an efficient PCG for generating shares of the same pseudorandom bit over two different fields, e.g., $\mathbb{F}_2$ and $\mathbb{F}_p$ for some $p>2$?
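For concreteness, here is what the matrix Beaver triple correlation of the first question looks like, sampled by an idealized dealer in a toy Python sketch (the field and dimensions are arbitrary choices for the example; the open question is to generate such triples with seeds polylogarithmic in all of them):

```python
import random

p = 97  # toy prime field

def rand_mat(r, c):
    return [[random.randrange(p) for _ in range(c)] for _ in range(r)]

def mat_add(A, B):
    return [[(a + b) % p for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_mul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) % p
             for j in range(len(B[0]))] for i in range(len(A))]

# A matrix Beaver triple: additive shares of random X (m x k), Y (k x n),
# and of Z = X*Y -- here sampled by an idealized trusted dealer.
m, k, n = 3, 4, 2
X, Y = rand_mat(m, k), rand_mat(k, n)
Z = mat_mul(X, Y)
XA, YA, ZA = rand_mat(m, k), rand_mat(k, n), rand_mat(m, n)
XB = [[(x - a) % p for x, a in zip(rx, ra)] for rx, ra in zip(X, XA)]
YB = [[(y - a) % p for y, a in zip(ry, ra)] for ry, ra in zip(Y, YA)]
ZB = [[(z - a) % p for z, a in zip(rz, ra)] for rz, ra in zip(Z, ZA)]

# The shares reconstruct Z = X*Y, the matrix analogue of a Beaver triple.
assert mat_add(ZA, ZB) == mat_mul(mat_add(XA, XB), mat_add(YA, YB))
```

One such triple lets parties multiply two secret-shared matrices with a single Beaver-style opening, which is why these correlations are so useful for secure linear algebra.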