The aim of this post is to provide a simple, ready-to-use LaTeX article template for writing research papers on cryptography. This is the setup I generally use in my papers, and I figured it could be useful, e.g., for PhD students in crypto. The template contains:

- My default main file, for conference versions or full versions of my research papers;
- My default header file;
- The basic additional files which are usually needed, such as the llncs class file, which is the (typically mandatory) class for articles submitted to IACR conferences.

This is not a post about how to install LaTeX, or how to set up a work environment; I assume that you already have a working LaTeX distribution, together with a text editor (I personally use Sublime Text) and a pdf reader (I use Skim). Also, I’m not the author of the template; I probably got it from David Pointcheval or Fabrice Benhamouda (they also likely got it from somewhere else themselves), and made a bunch of modifications here and there.

You can download the basic template here. To start using it, you will also need the crypto.bib file (not included directly since it’s a bit heavy), which contains bib references for most standard crypto conferences and journals, and for ePrint papers. To get the file, just go to cryptobib.di.ens.fr and download the crypto.bib file (on the left) into the cryptobib folder of the template.

Most of it is self-explanatory. The main file is main.tex. Setting \fullversion to 1 will switch to a format with smaller margins, while setting it to 0 recovers the default margins which are mandatory for submissions to most IACR conferences, such as CRYPTO and EUROCRYPT. Other toggles control whether the submission is anonymous, or whether todos should be shown.
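For illustration, such a toggle can be implemented along the following lines. The macro name `\fullversion` matches the template's switch described above; the geometry values below are illustrative, not the template's exact settings:

```latex
% Illustrative sketch of a \fullversion toggle (example values,
% not the template's exact code).
\newcommand{\fullversion}{1}

\ifnum\fullversion=1
  % Full version: smaller margins for a more compact layout.
  \usepackage[margin=2.5cm]{geometry}
\else
  % Submission version: keep the llncs class defaults,
  % as required by most IACR conferences.
\fi
```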

I usually put all other LaTeX files in the directory tex_files. All standard packages and macros are in the file ZZ_header.tex, in the tex_files folder. If you plan to use the template, take a few minutes to scroll through it to get a grasp of the many useful shortcuts (with standard crypto notations such as \Enc and \Dec, or useful math notations such as \F for $\mathbb{F}$, \bit for $\{0,1\}$, etc.).
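To give an idea, the shortcuts in question are simple macro definitions; the following are plausible reconstructions (not the template's exact code) of the macros mentioned above:

```latex
% Plausible reconstructions of the kind of shortcuts found in ZZ_header.tex.
\newcommand{\Enc}{\mathsf{Enc}}   % encryption algorithm
\newcommand{\Dec}{\mathsf{Dec}}   % decryption algorithm
\newcommand{\F}{\mathbb{F}}       % a finite field
\newcommand{\bit}{\{0,1\}}        % the set {0,1}
```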

I usually create a new LaTeX file for each new section, and input it directly in the main file, like this:

```
\section{Introduction}
\label{sec:introduction}
\input{tex_files/01_introduction}
```

Most references you will need can be found in the crypto.bib file. In many situations, downloading the file once will suffice for your needs; but sometimes the project runs for a longer time and involves more people, in which case you might want crypto.bib updates to be pulled into your project automatically. This can be done using submodules on git, or externals on svn. This is all well explained in the manual.

I usually add all missing citations in add.bib. To get them, I look for the paper on Google Scholar. The bibtex can be found under the “cite” icon (a quote sign).

The default template for citations is the following:

```
[venue_acronym]:[author_initials][year]
```

where:

- venue_acronym is the standard shortcut for the name of the crypto conference or journal, e.g. EC for Eurocrypt, AC for Asiacrypt, EPRINT for ePrint, JoC for the Journal of Cryptology…
- author_initials is the author's full last name for papers with a single author (e.g. EC:Couteau19), the first three letters of each author's last name for papers with two or three authors (e.g. C:CouHar20), and the first letter of each author's last name for papers with four or more authors (e.g. CCS:BCGIKRS19).
- year is the last two digits of the year (e.g. 21 for 2021).
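For illustration, the naming rules above can be sketched in a few lines of Python (a toy helper written for this post, not an official cryptobib tool):

```python
def cryptobib_key(venue, last_names, year):
    """Build a cryptobib-style citation key from the rules above.

    Toy illustration of the naming convention, not an official tool.
    """
    if len(last_names) == 1:
        initials = last_names[0]                       # full last name
    elif len(last_names) <= 3:
        initials = "".join(n[:3] for n in last_names)  # first 3 letters each
    else:
        initials = "".join(n[0] for n in last_names)   # first letter each
    return f"{venue}:{initials}{year % 100:02d}"

print(cryptobib_key("EC", ["Couteau"], 2019))             # EC:Couteau19
print(cryptobib_key("C", ["Couteau", "Hartmann"], 2020))  # C:CouHar20
```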

In the years I spent browsing the stackexchange network, I devoted a fair amount of time to answering questions about cryptography, from the most basic questions, such as solving homework problems or understanding fundamental concepts, to more advanced questions about recent works, the state of the art, and general summaries of what is known on a specific subject. Since some of these answers could be helpful to others, I decided to select the most useful and detailed ones, and I organized them by category.

The learning parity with noise assumption (LPN) is one of the most fundamental assumptions of cryptography. It states that given a random secret vector $\vec s$ over $\mathbb{F}_2$, and given access to (an arbitrary polynomial number of) samples of the form $(\vec a, \langle \vec a, \vec s\rangle + e)$, where $\vec a$ is a random vector and $e$ is random Bernoulli noise (i.e., $e$ is $1$ with some probability $p$, and $0$ otherwise), it is infeasible to recover $\vec s$. In other words: while linear systems of equations are easy to solve (using Gaussian elimination), it becomes infeasible to solve them as soon as you add a bit of noise to the equations.
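As a toy illustration of the definition, here is a minimal Python sketch that samples LPN instances over $\mathbb{F}_2$ (illustrative code, not a cryptographic implementation):

```python
import random

def lpn_samples(k, n, p, seed=None):
    """Sample n LPN instances (a, <a,s> + e mod 2) with secret dimension k
    and Bernoulli noise rate p. Toy illustration of the definition above."""
    rng = random.Random(seed)
    s = [rng.randrange(2) for _ in range(k)]
    samples = []
    for _ in range(n):
        a = [rng.randrange(2) for _ in range(k)]
        e = 1 if rng.random() < p else 0
        b = (sum(ai * si for ai, si in zip(a, s)) + e) % 2
        samples.append((a, b))
    return s, samples

# With p = 0 the samples are noiseless linear equations in s, solvable by
# Gaussian elimination; any constant p > 0 is what makes recovery hard.
s, samples = lpn_samples(k=8, n=16, p=0.0, seed=1)
assert all(b == sum(ai * si for ai, si in zip(a, s)) % 2 for a, b in samples)
```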

LPN has been widely used in cryptography, and exists in many different variants: for different noise distributions, for a bounded number of samples (where it becomes equivalent to the syndrome decoding problem), for different distributions of the vectors $\vec a$, over a different field $\mathbb{F}$, etc. The goal of this post is not to cover the variants and applications of LPN. Rather, I would like to state a very useful folklore result regarding the security of LPN. While this result is folklore (and has likely been known for decades), I had not seen it stated explicitly until very recently (my coauthors and I stated it in our CRYPTO’21 paper on silent OT extension from structured LDPC codes; see my publication page). Since I believe it is a useful observation and guiding principle when analyzing the security of LPN variants, I decided to write a post about it. What follows is essentially taken from our CRYPTO’21 paper.

We define the LPN assumption over a ring $\mathcal{R}$ with dimension $k$, number of samples $n$, w.r.t. a code generation algorithm $\mathsf{CodeGen}$, and a noise distribution $\mathcal{D}$:

Let $\mathcal{D}(\mathcal{R}) = \{\mathcal{D}_{k,n}(\mathcal{R})\}_{k,n\in\mathbb{N}}$ denote a family of efficiently sampleable distributions over a ring $\mathcal{R}$, such that for any $k,n\in\mathbb{N}$, $\mathsf{Image}(\mathcal{D}_{k,n}(\mathcal{R}))\subseteq\mathcal{R}^n$. Let $\mathsf{CodeGen}$ be a probabilistic code generation algorithm such that $\mathsf{CodeGen}(k,n,\mathcal{R})$ outputs a matrix $A\in \mathcal{R}^{n\times k}$. For dimension $k=k(\lambda)$, number of samples (or block length) $n=n(\lambda)$, and ring $\mathcal{R} = \mathcal{R}(\lambda)$, the (primal) $(\mathcal{D},\mathsf{CodeGen},\mathcal{R})\text{-}LPN(k,n)$ assumption states that

\[\{(A, \vec{b}) \;|\; A\gets_r\mathsf{CodeGen}(k,n,\mathcal{R}), \vec{e}\gets_r\mathcal{D}_{k,n}(\mathcal{R}), \vec{s}\gets_r\mathcal{R}^k, \vec{b}\gets A\cdot\vec{s} + \vec{e}\}\]

\[\approx \{(A, \vec{b}) \;|\; A\gets_r\mathsf{CodeGen}(k,n,\mathcal{R}), \vec{b}\gets_r\mathcal{R}^n\}.\]

The above definition is very general, and captures in particular not only the standard LPN assumption and its variants, but also assumptions such as LWE or the multivariate quadratic assumption. However, we will typically restrict our attention to assumptions where the noise distribution outputs sparse vectors with high probability. The standard LPN assumption with dimension $k$, noise rate $r$, and $n$ samples is obtained by setting $A$ to be a uniformly random matrix over $\mathbb{F}_2^{n\times k}$, and the noise distribution to be the Bernoulli distribution $\mathsf{Ber}^n_r(\mathbb{F}_2)$, where each coordinate of $\vec e$ is independently set to $1$ with probability $r$ and to $0$ with probability $1-r$. The term *primal* in the above definition comes from the fact that the assumption can come in two equivalent forms: the primal form as above, but also a *dual form*. Viewing $A$ as the transpose of the parity-check matrix $H$ of a linear code generated by a matrix $G$, i.e. $A=H^\intercal$, the hardness of distinguishing $H^\intercal \cdot \vec x + \vec e$ from random is equivalent to the hardness of distinguishing $G\cdot (H^\intercal \cdot \vec x + \vec e) = G \cdot \vec e$ from random (since $G\cdot H^\intercal = 0$).

Over the past few decades, a tremendous number of attacks against LPN have been proposed. These attacks include, but are not limited to, attacks based on Gaussian elimination and the BKW algorithm (and variants based on covering codes), information set decoding attacks, statistical decoding attacks, generalized birthday attacks, linearization attacks, attacks based on finding low weight code vectors, or on finding correlations with low-degree polynomials.

In light of this situation, it would be excessively cumbersome, when introducing a new variant of LPN, to go over the entire literature of existing attacks and analyze their potential impact on the new variant. The crucial observation, however, is that this is not necessary, as *all the above attacks* (and more generally, essentially all known attacks against LPN and its variants) fit in a common framework, usually denoted the *linear test framework*. Furthermore, the asymptotic resistance of any LPN variant against any attack from the linear test framework can be deduced from two simple properties of the underlying code ensemble and noise distribution. Informally, if

- the code generated by $G$ has high minimum distance, and
- for any large enough subset $S$ of coordinates, with high probability over the choice of $\vec e \gets \mathcal{D}$, at least one of the coordinates in $S$ of $\vec e$ will be nonzero,

then the LPN assumption with code matrix $G$ and noise distribution $\mathcal{D}$ cannot be broken by any attack from the linear test framework.

The common feature of essentially all known attacks against LPN and its variants is that the distinguisher can be implemented as a (nonzero) *linear function of the samples* (the linear test), where the coefficients of the linear combination can depend arbitrarily on the code matrix. Therefore, all these attacks can be formulated as distinguishing LPN samples from random samples by checking whether the output of some linear test (with coefficients depending arbitrarily on the code matrix) is biased away from the uniform distribution. Formally,

**Security Against Linear Tests.** Let $\mathbb{F}$ be an arbitrary finite field, and let $\mathcal{D} = \{\mathcal{D}_{m,n}\}_{m,n\in\mathbb{N}}$ denote a family of noise distributions over $\mathbb{F}^n$. Let $\mathsf{CodeGen}$ be a probabilistic code generation algorithm such that $\mathsf{CodeGen}(m,n)$ outputs a matrix $A\in \mathbb{F}^{n\times m}$. Let $\varepsilon, \delta: \mathbb{N} \mapsto [0,1]$ be two functions. We say that the $(\mathcal{D},\mathsf{CodeGen},\mathbb{F})\text{-}LPN(m,n)$ assumption with dimension $m = m(\lambda)$ and $n = n(\lambda)$ samples is *$(\varepsilon,\delta)$-secure against linear tests* if for any (possibly inefficient) adversary $\mathcal{A}$ which, on input a matrix $A\in \mathbb{F}^{n\times m}$, outputs a nonzero $\vec v \in \mathbb{F}^n$, it holds that

$\Pr[A \gets_r \mathsf{CodeGen}(m,n), \vec v \gets_r \mathcal{A}(A)\;:\; \mathsf{bias}_{\vec v}(\mathcal{D}_{A}) \geq \varepsilon(\lambda) ] \leq \delta(\lambda),$

where $\mathsf{bias}$ denotes the *bias* of the distribution (the bias of a distribution is defined as $\mathsf{bias}(\mathcal{D}) = \max_{\vec u \neq \vec 0} |\mathbb{E}_{\vec x \sim \mathcal{D}}[\vec u^\intercal \cdot \vec x] - \mathbb{E}_{\vec x \sim \mathcal{U}_n}[\vec u^\intercal \cdot \vec x]|$), and $\mathcal{D}_{A}$ denotes the distribution induced by sampling $\vec s \gets_r \mathbb{F}_2^m$, $\vec e \gets \mathcal{D}_{m,n}$, and outputting the LPN samples $A\cdot \vec s + \vec e$.

Now, define the *dual distance* of a matrix $M$, written $\mathsf{dd}(M)$, to be the largest integer $d$ such that every subset of $d$ rows of $M$ is linearly independent. The name dual distance stems from the fact that $\mathsf{dd}(M)$ is also the minimum distance of the dual of the code generated by $M$ (i.e., the code generated by the left null space of $M$). The following lemma is folklore:

**Lemma:** For any $d\in \mathbb{N}$, the $(\mathcal{D},\mathsf{CodeGen},\mathbb{F})\text{-}LPN(m,n)$ assumption with dimension $m = m(\lambda)$ and $n = n(\lambda)$ samples is $(\varepsilon_d,1-\delta_d)$-secure against linear tests, where

- $\varepsilon_d = \max_{\mathsf{HW}(\vec v) > d}\mathsf{bias}_{\vec v}(\mathcal{D}_{m,n})$, and
- $\delta_d = \Pr_{A \gets_r \mathsf{CodeGen}(m,n)}[\mathsf{dd}(A) \geq d]$.

($\mathsf{HW}(\vec v)$ denotes the Hamming weight of $\vec v$)

**Proof:** The proof is straightforward: fix any integer $d$. Then with probability $\delta_d$ over the choice of $A$, $\mathsf{dd}(A) \geq d$. Consider any (possibly unbounded) adversary $\mathcal{A}$ outputting $\vec v$. Two cases can occur:

- Either $\mathsf{HW}(\vec v) \leq d \leq \mathsf{dd}(A)$. In this case, the bias with respect to $\vec v$ of the distribution $\{A \cdot \vec s \;|\; \vec s \gets_r \mathbb{F}^m\}$ is $0$ (since this distribution is $d$-wise independent). Since the bias of the XOR of two distributions is at most the smallest bias among them, we get $\mathsf{bias}_{\vec v}(\mathcal{D}_{A}) = 0$.
- Or $\mathsf{HW}(\vec v) > d$, in which case $\mathsf{bias}_{\vec v}(\mathcal{D}_A) \leq \mathsf{bias}_{\vec v}(\mathcal{D}_{m,n}) \leq \varepsilon_d$.

The above follows directly from simple lemmas on bias, which are recalled in my cheat sheet.
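For small matrices, the dual distance appearing in the lemma can be computed by brute force; here is a toy Python sketch over $\mathbb{F}_2$ (the search is exponential, so this is for intuition only, not for real parameter sizes):

```python
from itertools import combinations

def dual_distance(M):
    """Largest d such that every subset of d rows of the binary matrix M
    is linearly independent (brute force, exponential in len(M))."""
    def independent(rows):
        # Gaussian elimination over F_2 on the chosen rows.
        rows = [list(r) for r in rows]
        rank = 0
        for col in range(len(rows[0])):
            pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
            if pivot is None:
                continue
            rows[rank], rows[pivot] = rows[pivot], rows[rank]
            for i in range(len(rows)):
                if i != rank and rows[i][col]:
                    rows[i] = [x ^ y for x, y in zip(rows[i], rows[rank])]
            rank += 1
        return rank == len(rows)

    d = 0
    while d < len(M) and all(independent(S) for S in combinations(M, d + 1)):
        d += 1
    return d

# Any 2 rows of the matrix below are independent, but the 3 rows sum to
# zero over F_2, so the dual distance is 2.
assert dual_distance([(1, 0), (0, 1), (1, 1)]) == 2
```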

An instructive example is to consider the case of LPN with a uniformly random code matrix over $\mathbb{F}_2$, and a Bernoulli noise distribution $\mathcal{D}_{m,n} = \mathsf{Ber}^n_r(\mathbb{F}_2)$, for some noise rate $r$. The probability that $d$ random vectors over $\mathbb{F}_2^m$ are linearly independent is at least

\[\prod_{i=0}^{d-1} \frac{2^m - 2^i}{2^m} \geq (1-2^{d-1-m})^d \geq 1 - 2^{2d - m}.\]

Therefore, by a union bound, the probability that a random matrix $A \gets_r \mathbb{F}_2^{n\times m}$ satisfies $\mathsf{dd}(A) \geq d$ is at least $1 - {n \choose d}\cdot 2^{2d - m} \geq 1 - 2^{(2+\log n)d - m}$. On the other hand, for any $d$ and any $\vec v$ with $\mathsf{HW}(\vec v) > d$, we have by the piling-up lemma (see the cheat sheet):

\[\Pr[\vec e \gets \mathsf{Ber}^n_r(\mathbb{F}_2)\; : \; \vec v^\intercal \cdot \vec e = 1] = \frac{1 - (1-2r)^{\mathsf{HW}(\vec v)}}{2},\]

hence $\mathsf{bias}_{\vec v}(\mathsf{Ber}^n_r(\mathbb{F}_2)) = (1-2r)^{\mathsf{HW}(\vec v)} \leq (1-2r)^d \leq e^{-2rd}$. In particular, setting $d = O(m/\log n)$ suffices to guarantee that with probability at least $\delta_d = 1 - 2^{-O(m)}$, the LPN samples will have bias (with respect to any possible nonzero vector $\vec v$) at most $\varepsilon_d = e^{-O(rm/\log n)}$. Hence, any attack that fits in the linear test framework against the standard LPN assumption with dimension $m$ and noise rate $r$ requires on the order of $e^{O(rm/\log n)}$ iterations. Note that this lower bound still leaves a gap with respect to the best known linear attacks, which require time of the order of $e^{O(rm)}$, $e^{O(rm/\log \log m)}$, and $e^{O(rm/\log m)}$ when $n = O(m)$, $n = \mathsf{poly}(m)$, and $n = 2^{O(m/\log m)}$ respectively.
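The exact bias computation above is easy to check numerically: the following toy Python script enumerates the coordinates hit by a weight-$w$ test vector and compares against the piling-up prediction (here the bias is taken as $|1 - 2\Pr[\cdot = 1]|$, the convention matching the bound):

```python
from itertools import product

def bias_weight_w(w, r):
    """Exact bias of Ber_r^n noise w.r.t. a weight-w test vector,
    by enumerating the w relevant coordinates (toy sanity check)."""
    p1 = 0.0
    for bits in product((0, 1), repeat=w):
        prob = 1.0
        for b in bits:
            prob *= r if b else (1 - r)
        if sum(bits) % 2 == 1:
            p1 += prob
    return abs(1 - 2 * p1)  # bias = |1 - 2*Pr[parity = 1]|

# Matches the piling-up prediction (1 - 2r)^w.
w, r = 6, 0.1
assert abs(bias_weight_w(w, r) - (1 - 2 * r) ** w) < 1e-12
```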

It is straightforward to extend the above to general fields, but I’ll leave that as an exercise to the reader ;)

- Is the RSA cryptosystem provably secure?
- Are there public-key cryptosystems not relying on arithmetic over finite fields? (short answer: yes)
- Are there knapsack-based cryptosystems which have not been broken? (short answer: yes)
- Are there practical universal PKE schemes? (short answer: no)
- Are there encryption schemes where the sender cannot prove that a given plaintext was encrypted?
- Can we use ElGamal over $\mathbb{Z}_{n^2}$? (short answer: not directly)

- Can FHE compute comparisons? (short answer: yes)
- Are there FHE schemes for deep learning operations? (short answer: yes)
- How short can an FHE ciphertext be? (short answer: almost as short as the message it encrypts)
- Are all homomorphic encryption schemes based on lattices? (short answer: it depends)
- Is there a BGN-like encryption scheme without restriction on the plaintext space? (short answer: yes)
- Are there additive homomorphic encryption schemes that support exponentiation?

- Why do we use multilinear maps in obfuscation schemes? (short answer: they are essentially necessary in a well defined sense – though that does not mean constructions must explicitly go through them!)
- Can we obfuscate functions that are mostly zero? (short answer: there is a gradation of increasingly complex obfuscation schemes, from increasingly strong assumptions, for increasingly large subclasses of mostly-zero functions; the linked answer provides a detailed overview.)

- Why is it plausible that hash functions are quantum secure?
- Does IND-CPA security imply PRFs? (short answer: yes)
- Is it possible to build a symmetric encryption scheme with beyond-brute-force security? (short answer: yes, but only if the messages come from a specific distribution)
- What are the differences between the various notions of CPA security?

- Is it hard to compute $(g^{ab})$ given $(g^a, g^b, a/b)$? (short answer: yes, under the square Diffie-Hellman assumption)
- Why do subexponential attacks on the DLP not work for ECDLP? (short answer: you don’t have small prime values over elliptic curves in general)
- What relation is known between LWE and LPN?
- Are there known reductions from LWE to MLWE or RLWE? (short answer: no)
- Are there decisional variants of the discrete logarithm problem? (short answer: yes)

- What are smooth projective hash functions?
- Why do we use pairing-based cryptography?
- Is there a password-authenticated key-exchange making only a blackbox use of standard cryptographic primitives? (short answer: this seems open, and it is a great question)
- Why are generalized Pedersen commitments secure?
- Are there results on the space complexity of cryptographic primitives? (short answer: yes)
- Why are VDFs preferred to proofs of sequential work?
- What are the cryptographic advantages of using the subgroup of pseudosquares?
- What are the standard constructions of PRFs from well-known assumptions?
- Can you build searchable encryption from the discrete logarithm problem?
- Is it possible to prove that a document was generated long ago?
- Can Pedersen commitments be made deterministic? (short answer: only if you have high min-entropy plaintexts and settle for a weaker security notion)

- Do OWF imply $\mathsf{P} \neq \mathsf{NP}?$ (short answer: yes)
- How does the random oracle model help with constructing secure cryptographic primitives?
- What are the common idealized models in cryptography?
- Can we have cryptography in a world where $\mathsf{P} = \mathsf{NP}$? (short answer: possibly!)
- How do we estimate that an assumption is sufficiently safe?
- Can the hardness of “breaking a cryptosystem” be based on an NP-complete problem?
- Are there good candidate OWFs with a very simple structure (like, “4 lines of code”-simple)? (short answer: yes)
- How to prove that weak OWFs cannot have a polysize range?

- How to multiply two additively shared values?
- How can two parties conditionally disclose a secret to a third party?
- How to generate Beaver triples?
- Can two parties securely check whether they hold the same value? (keyword: socialist millionaires problem)
- How is silent oblivious transfer constructed?
- How to do secure bitwise multiplication?

- What is the difference between boolean circuits and arithmetic circuits?
- Are there MPC schemes that do not use secret sharing?
- Can Charlie privately compute $ab$ using a single message from Alice (who knows $a$) and Bob (who knows $b$)?
- Can we do secure computation over incomplete networks?
- Why are garbled circuits randomized encodings?
- How important is round complexity in MPC?

- How to explain ZK proofs to a 7 year old?
- Are there lower bounds on the size and interaction of a ZK proof with a given soundness error?
- Can we do zero-knowledge proofs for BPP statements? (short answer: just send the witness!)
- What is the difference between ZK proofs and ZK proofs of knowledge? (short answer: in ZK proof of knowledge, an extractor can recover the witness)
- Why is computational ZK the most general notion of zero-knowledge?
- How to choose the security parameter for a ZK proof?
- Are there lower bounds on the round complexity and communication of ZK proofs? (and also: how far are existing ZK proofs from these bounds)
- How do common reference strings help in ZK proofs? (short answer: they help reduce round complexity)
- What are the links between ZK proofs and homomorphic encryption? (short answer: there are many links.)

- Why is “sending a hash of the witness” not a valid ZK proof of knowledge of the witness? (also includes a walkthrough of why Schnorr’s proof is an honest-verifier ZK proof)
- Why is the Diffie-Hellman key-exchange protocol not a proof of knowledge of a discrete logarithm? (also includes a discussion about knowledge-of-exponent assumptions, and public-coin versus private coin proofs)

- How to prove that a committed value is the square of the other?
- How to prove that a committed number belongs to a certain range?
- What are the different techniques to build a range proof?
- Where did the four square decomposition technique originate?
- How to prove that a number is not within a range?
- How to prove knowledge of a witness for a DDH tuple?
- Is it possible to prove knowledge of an AES key without showing it? (short answer: yes)
- How to prove correctness of a long computation with a short proof?
- Can we replace pairings by homomorphic encryption in SNARGs/SNARKs? (short answer: yes, but it becomes designated-verifier)
- How to prove soundness of $\Sigma$-protocols in unknown-order groups?

- Is it possible to randomize a non-interactive ZK proof? (short answer: yes)
- What are the security issues of making ZK proofs non-interactive with Fiat-Shamir? (short answer: transferability, computational soundness)
- How to verify a signature on an encrypted message?
- Why is Schnorr’s proof not an argument? (short answer: recovering the discrete log in unbounded time does not contradict soundness)
- Why is a transparent setup desirable for SNARKs?
- Do NIZKs imply simulation-sound NIZKs? (short answer: yes)
- Are there post-quantum SNARGs?

In the course of working on various projects, I found myself spending an excessive amount of time skimming through textbooks and Wikipedia pages to be reminded of the exact statement of various simple probability facts and lemmas. In some cases, this was to check that I was getting the constants right, or not forgetting a condition – in others, this was just out of laziness. To simplify future searches, I decided to centralize in a cheat sheet a bunch of standard probability lemmas, starting from the most basic facts, but also including some slightly more advanced lemmas. These lemmas showed up several times in my work, and are likely to be useful to cryptographers and theoretical computer scientists. An outdated PDF version of this cheat sheet, in a compact two-column format, is also available here.

Let $\mathsf{Ber}_p$ denote the Bernoulli distribution with probability $p$. $\mathsf{SD}(X,Y)$ denotes the statistical distance between random variables $X$ and $Y$ over a set $S$, defined as

\begin{align}
\mathsf{SD}(X,Y) &= \frac{1}{2} \cdot \sum_{x\in S} |\Pr[X = x] - \Pr[Y = x]|\\
&= \max_{f:S\to\{0,1\}} |\Pr[f(X)=1] - \Pr[f(Y) = 1]|\\
&= \max_{Z\subseteq S} |\Pr[X \in Z] - \Pr[Y \in Z]|.
\end{align}

**Union Bound:** $\Pr[A \cup B] \leq \Pr[A] + \Pr[B].$

**Bayes’ Rule:** $\Pr[A | B] = \frac{\Pr[B | A]\cdot \Pr[A]}{\Pr[B]}.$

\begin{align} \Pr[A \cap B] &\leq \min\{\Pr[A],\Pr[B],\Pr[A|B], \Pr[B|A]\}\\ \Pr[A] + \Pr[B] - 1 &\leq \Pr[A \cap B] \end{align}

If $X$ is a random variable taking nonnegative integer values, then

\[\mathbb{E}[X] = \sum_{k=1}^{\infty} \Pr[X \geq k].\]

For nonnegative $X$ and differentiable $f$,

\[\mathbb{E}[f(X)] = f(0) + \int_{0}^{\infty} f'(x)\Pr[X \geq x]dx.\]

**Cauchy-Schwarz:** $|\mathbb{E}[XY]| \leq \mathbb{E}[|XY|] \leq \sqrt{\mathbb{E}[|X|^2]\mathbb{E}[|Y|^2]}.$

**Jensen:** $\text{For }\phi \text{ convex, } \phi(\mathbb{E}[X]) \leq \mathbb{E}[\phi(X)].$
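The tail-sum identity for nonnegative integer-valued variables above is easy to sanity-check on a toy distribution (illustrative Python):

```python
# Check E[X] = sum_{k>=1} Pr[X >= k] on a small integer-valued distribution.
dist = {0: 0.2, 1: 0.3, 2: 0.1, 3: 0.4}   # Pr[X = x]

expectation = sum(x * p for x, p in dist.items())
tail_sum = sum(sum(p for x, p in dist.items() if x >= k)
               for k in range(1, max(dist) + 1))

assert abs(expectation - tail_sum) < 1e-12
```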

Given a distribution $\mathcal{D}$ over $\mathbb{F}^n$ and a vector $\vec u \in \mathbb{F}^n$, the bias of $\mathcal{D}$ with respect to $\vec u$, denoted $\mathsf{bias}_{\vec u}(\mathcal{D})$, is equal to

\[\mathsf{bias}_{\vec u}(\mathcal{D}) = \left|\mathbb{E}_{\vec x \sim \mathcal{D}}[\vec u^\intercal \cdot \vec x] - \mathbb{E}_{\vec x \sim \mathbb{U}_n}[\vec u^\intercal \cdot \vec x] \right| = \left|\mathbb{E}_{\vec x \sim \mathcal{D}}[\vec u^\intercal \cdot \vec x] - \frac{1}{|\mathbb{F}|} \right|,\]

where $\mathbb{U}_n$ denotes the uniform distribution over $\mathbb{F}^n$. The bias of $\mathcal{D}$, denoted $\mathsf{bias}(\mathcal{D})$, is the maximum bias of $\mathcal{D}$ with respect to any nonzero vector $\vec u$.
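For small $n$, the bias of a distribution over $\mathbb{F}_2^n$ can be computed by brute force directly from the definition (toy Python sketch, exponential in $n$):

```python
from itertools import product

def bias(dist, n):
    """Bias of a distribution over F_2^n given as {tuple: probability}:
    max over nonzero u of |Pr[<u,x> = 1] - 1/2| (brute force in 2^n)."""
    best = 0.0
    for u in product((0, 1), repeat=n):
        if not any(u):
            continue  # the maximum is over nonzero vectors u
        p1 = sum(p for x, p in dist.items()
                 if sum(a * b for a, b in zip(u, x)) % 2 == 1)
        best = max(best, abs(p1 - 0.5))
    return best

# The uniform distribution has zero bias w.r.t. every nonzero u.
uniform = {x: 0.125 for x in product((0, 1), repeat=3)}
assert bias(uniform, 3) == 0.0
```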

Given $t$ distributions $(\mathcal{D}_1, \cdots, \mathcal{D}_t)$ over $\mathbb{F}_2^n$, we denote by $\bigoplus_{i\leq t} \mathcal{D}_i$ the distribution obtained by independently sampling $\vec v_i \gets_r \mathcal{D}_i$ for $i=1$ to $t$ and outputting $ \vec v \gets\vec v_1 \oplus \cdots \oplus \vec v_t$. Then $\mathsf{bias}( \bigoplus_{i\leq t} \mathcal{D}_i ) \leq 2^{t-1}\cdot \prod_{i=1}^t \mathsf{bias}(\mathcal{D}_i) \leq \min_{i \leq t} \mathsf{bias}(\mathcal{D}_i)$. Note that the piling up lemma (given below) can provide a tighter bound if needed.

**Markov Inequality:** Let $X$ be a nonnegative random variable with finite expected value $\mu$. Then for any $k > 0$,

\[\Pr[X \geq k] \leq \frac{\mu}{k}.\]

**Bienaymé-Chebyshev Inequality:** Let $X$ be a random variable with finite expected value $\mu$ and finite nonzero variance $\sigma^2$. Then for any $k > 0$,

\[\Pr[|X - \mu| \geq k\sigma] \leq \frac{1}{k^2}.\]

**Chernoff Inequality:** Let $n\in\mathbb{N}$ and let $(X_1, \cdots, X_n)$ be independent random variables taking values in $\{0,1\}$. Let $X$ denote their sum and $\mu \gets \mathbb{E}[X]$. Then for any $\delta \in [0,1]$,

\[\Pr[|X - \mu| \geq \delta\mu] \leq 2\exp\left(-\frac{\delta^2\mu}{3}\right).\]

Furthermore, for any $\delta \geq 0$,

\[\Pr[X \geq (1+\delta)\mu] \leq \exp\left(-\frac{\delta^2\mu}{2+\delta}\right).\]

Note also the tighter, but dirtier bounds:

\[\Pr[X \geq (1+\delta)\mu] \leq \left(\frac{e^\delta}{(1+\delta)^{1+\delta}}\right)^{\mu}\text{ and } \Pr[X \leq (1-\delta)\mu] \leq \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.\]

**Generalized Chernoff Inequality (here):** Let $n\in\mathbb{N}$ be an integer and let $(X_1, \cdots, X_n)$ be boolean random variables such that, for some $\delta\in [0,1]$, it holds that for every subset $S \subset [n]$, $\Pr[\wedge_{i\in S} X_i] \leq \delta^{|S|}.$ Then for any $\gamma \in [\delta, 1]$,

\[\Pr\left[\sum_{i=1}^n X_i \geq \gamma n\right] \leq e^{-n\cdot D(\gamma\|\delta)},\]

where $D(\gamma||\delta)$ denotes the relative entropy function, satisfying $D(\gamma||\delta) \geq 2(\gamma-\delta)^2$. For more discussions and a constructive proof of the generalized Chernoff bound, see Impagliazzo and Kabanets.
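As a quick numeric sanity check of the bound $\Pr[X \geq (1+\delta)\mu] \leq \exp(-\delta^2\mu/(2+\delta))$ on a small binomial example (illustrative Python):

```python
from math import comb, exp

def binom_tail(n, p, t):
    """Pr[X >= t] for X ~ Bin(n, p), computed exactly."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t, n + 1))

# Check Pr[X >= (1+delta)*mu] <= exp(-delta^2 * mu / (2+delta)).
n, p, delta = 100, 0.3, 0.5
mu = n * p
tail = binom_tail(n, p, int((1 + delta) * mu))   # Pr[X >= 45]
bound = exp(-delta**2 * mu / (2 + delta))
assert tail <= bound
```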

**Bernstein Inequality:** Let $X_1, \cdots, X_m$ be independent zero-mean random variables, and let $M$ be a bound such that $|X_i| \leq M$ almost surely for $i=1$ to $m$. Let $X$ denote the random variable $\sum_{i=1}^m X_i$. It holds that, for all $t > 0$,

\[\Pr[X \geq t] \leq \exp\left(-\frac{t^2/2}{\sum_{i=1}^m \mathbb{E}[X_i^2] + Mt/3}\right).\]

**Bounded Difference Inequality:** First proved by McDiarmid, in a more general form than below. Special case of the Azuma inequality. Let $(n,m)\in\mathbb{N}^2$ be two integers. We say that a function $\Phi:[n]^m\to \mathbb{R}$ satisfies the *Lipschitz property with constant $d$* if for every $\vec x, \vec x' \in [n]^m$ which differ in a single coordinate, it holds that $|\Phi(\vec x) - \Phi(\vec x')| \leq d.$ Then, the statement of the bounded difference inequality is as follows: let $\Phi:[n]^m\to \mathbb{R}$ be a function satisfying the Lipschitz property with constant $d$, and let $(X_1, \cdots, X_m)$ be independent random variables over $[n]$. It holds that, for any $t > 0$,

\[\Pr[|\Phi(X_1, \cdots, X_m) - \mathbb{E}[\Phi(X_1, \cdots, X_m)]| \geq t] \leq 2\exp\left(-\frac{2t^2}{m\cdot d^2}\right).\]

Let $\mathsf{H}(x) = x\log(1/x) + (1-x)\log(1/(1-x))$ be the binary entropy function. We let

\[\mathsf{H}_1(X),\; \mathsf{H}_\infty(X),\; \mathsf{H}_\infty(X\;|\; Z),\; \mathsf{H}^{\varepsilon}_\infty(X)\]

denote respectively the Shannon entropy, min-entropy, average min-entropy conditioned on $Z$, and $\varepsilon$-smooth min-entropy of a random variable $X$, defined as

\begin{align}
\mathsf{H}_1(X) &= - \sum_{x\in \mathsf{Supp}(X)}\Pr[X= x]\cdot \log\Pr[X= x]\\
\mathsf{H}_\infty(X) &= - \log \max_{x\in \mathsf{Supp}(X)}\Pr[X = x]\\
\mathsf{H}_\infty(X\;|\; Z) &= - \log\mathbb{E}_{z\gets Z}[2^{-\mathsf{H}_\infty(X\;|\;Z=z)}]\\
\mathsf{H}^{\varepsilon}_\infty(X) &= \max_{\mathsf{SD}(X,Y)\leq \varepsilon} \mathsf{H}_\infty(Y)
\end{align}

Note that $\mathsf{H}_1(\mathsf{Ber}_p) = \mathsf{H}(p)$.
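These entropy notions are easy to compute for small distributions; the following Python snippet checks the identity $\mathsf{H}_1(\mathsf{Ber}_p) = \mathsf{H}(p)$ numerically (toy illustration):

```python
from math import log2

def shannon(dist):
    """Shannon entropy H_1 of a distribution given as {value: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def min_entropy(dist):
    """Min-entropy H_inf: -log2 of the most likely outcome."""
    return -log2(max(dist.values()))

H = lambda x: x * log2(1 / x) + (1 - x) * log2(1 / (1 - x))  # binary entropy

ber = {0: 0.75, 1: 0.25}  # Ber_{1/4}
assert abs(shannon(ber) - H(0.25)) < 1e-12   # H_1(Ber_p) = H(p)
assert abs(min_entropy(ber) - log2(4 / 3)) < 1e-12
```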

**Dodis et al., Lemma 2.2a:** For any $\delta > 0$, $\mathsf{H}_\infty(X|Z = z)$ is at least $\mathsf{H}_\infty(X\;|\; Z) - \log(1/\delta)$ with probability at least $1-\delta$ over the choice of $z$.

**Dodis et al., Lemma 2.2b:** Conditioning on $Z_1$, which carries at most $\log |\mathsf{Supp}(Z_1)|$ bits of information, reduces the entropy of $X$ by at most that amount: $\mathsf{H}_\infty(X\;|\; Z_1,Z_2) \geq \mathsf{H}_\infty(X, Z_1\;|\; Z_2) - \log |\mathsf{Supp}(Z_1)|$.

**Sums of Binomial Coefficients:** For any $0 < \mu < 1/2$ and $m\in\mathbb{N}$,

\[\sum_{i=0}^{\mu m} {m \choose i} \leq 2^{m\cdot\mathsf{H}(\mu)}.\]

Alternatively, writing

\[\sum_{i=1}^{\mu m} {m \choose i} = {m \choose \mu m} \left [ 1 + \frac{\mu m}{m-\mu m +1} + \frac{\mu m (\mu m - 1)}{(m-\mu m +1)(m - \mu m + 2)} + \cdots \right ],\]

we get, bounding the above by a geometric series:

\[\sum_{i=1}^{\mu m} {m \choose i} \leq {m \choose \mu m} \cdot \frac{1-\mu}{1-2\mu}.\]

**Stirling Bounds for Binomial Coefficients:**

\[\frac{1}{\sqrt{2\pi n \delta (1-\delta)}} \exp\left(n\cdot\mathsf{H}(\delta) - \frac{1}{12 n \delta (1-\delta)} \right) \leq {n \choose \delta n} \leq \frac{1}{\sqrt{2\pi n \delta(1-\delta)}}\exp\left(n\cdot \mathsf{H}(\delta)\right).\]

In a slightly different form, with $k = \delta n$:

\[\sqrt{\frac{n}{8k(n-k)}}\cdot e^{n\cdot h(k/n)} \leq {n \choose k} \leq \sqrt{\frac{n}{2\pi k(n-k)}}\cdot e^{n\cdot h(k/n)},\]

where $h(\cdot)$ denotes the binary entropy function. The version above is taken from Bob Gallager’s *Information Theory and Reliable Communication*.

- For $k = o(n)$, $\log {n\choose k} = (1+o(1))k \log \frac{n}{k}$.
- For any $(k,n)$, $\left(\frac{n}{k}\right)^k \leq {n \choose k} \leq \frac{n^k}{k!} < \left(\frac{ne}{k}\right)^k$.
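The elementary bounds in the second bullet can be checked numerically for small parameters (illustrative Python):

```python
from math import comb, e

# Check (n/k)^k <= C(n,k) <= (n*e/k)^k for a few (n, k) pairs.
for n, k in [(10, 3), (50, 7), (100, 25)]:
    c = comb(n, k)
    assert (n / k) ** k <= c <= (n * e / k) ** k
```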

- $\forall x > 0$, $\exp(-x) > 1-x$.
- $\forall\; 0 < x < \frac{2-\sqrt{2}}{2}$, $1-x > 2^{- \frac{2+\sqrt{2}}{2} x}$.
- $\forall n\geq 1$, $\left(1-\frac{1}{n}\right)^{n} \leq \exp\left(-1\right)$, and $\exp(-1) \leq \left(1-\frac{1}{n}\right)^{n-1}$.
- $\forall \delta > 0$, $\frac{2\delta}{2+\delta} \leq \log(1+\delta)$.

**Splitting Lemma:** Let $A\subset X \times Y$ be such that $\Pr[(x,y) \in A] \geq \varepsilon$. For any $\varepsilon' < \varepsilon$, define $B = \{(x, y) \in X \times Y \;|\; \Pr_{y'\gets_r Y}[(x, y') \in A] \geq \varepsilon - \varepsilon'\}$. Then it holds that:

- $\Pr[B]\geq \varepsilon'$;
- for all $(x,y)\in B$, $\Pr_{y'}[(x,y')\in A]\geq \varepsilon - \varepsilon'$;
- $\Pr[B\;|\;A] \geq \varepsilon'/\varepsilon$.

**General Forking Lemma:** For any $q\geq 1$, any set $H$ with $|H| \geq 2$, any randomized PPT algorithm $\mathcal{A}$ which, on input $(x, h_1, \cdots, h_q)$, returns a pair $(J,\sigma) \in \{0,\cdots,q\}\times \{0,1\}^{*}$, and any input distribution $\mathcal{D}$, let

\[\mathsf{acc} \gets \Pr[x \gets_r \mathcal{D}, (h_1, \cdots, h_q) \gets_r H, (J,\sigma) \gets_r \mathcal{A}(x, h_1, \cdots, h_q) : J \geq 1].\]

Then define the following algorithm $F_{\mathcal{A}}$: on input $x \in \mathsf{Supp}(\mathcal{D})$, $F_{\mathcal{A}}(x)$ picks coins $r$, $(h_1, \cdots, h_q) \gets_r H$, and runs $(I, \sigma) \gets \mathcal{A}(x, h_1, \cdots, h_q; r)$. If $I=0$, it returns $(0, \varepsilon, \varepsilon)$ (where $\varepsilon$ denotes the empty string). Else, it picks $(h'_1, \cdots, h'_q) \gets_r H$, and runs $(I', \sigma') \gets \mathcal{A}(x, h_1, \cdots, h_{I-1}, h'_I, \cdots, h'_q; r)$. If $I=I'$ and $h_I \neq h'_I$, it returns $(1, \sigma, \sigma')$; else, it returns $(0,\varepsilon, \varepsilon)$. Let

\[\mathsf{frk} \gets \Pr[x \gets_r \mathcal{D}, (b,\sigma, \sigma') \gets_r F_{\mathcal{A}}(x) : b = 1].\]

Then

\[\mathsf{acc} \leq \frac{q}{|H|} + \sqrt{q\cdot \mathsf{frk}}.\]

To be added later.

For $0 < \mu < 1/2$ and random variables $(X_1, \cdots, X_t)$ i.i.d. to $\mathsf{Ber}_\mu$, it holds that

\[\Pr\left[\bigoplus_{i=1}^t X_i = 0\right] = \frac{1}{2}\cdot\left(1 + (1-2\mu)^t\right) = \frac{1}{2} + 2^{-c_\mu t-1},\]

where $c_\mu = \log \frac{1}{1-2\mu}$. In other terms, for any $0 < \mu \leq \mu' < 1/2$, it holds that

\[\mathsf{Ber}_\mu \oplus \mathsf{Ber}_{\frac{\mu'-\mu}{1-2\mu}} = \mathsf{Ber}_{\mu'}.\]

To come: universal hashing, pairwise independent hashing.
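The composition identity above can be checked numerically: XORing $\mathsf{Ber}_\mu$ with an independent $\mathsf{Ber}_{(\mu'-\mu)/(1-2\mu)}$ yields exactly noise rate $\mu'$ (illustrative Python):

```python
def xor_ber(mu, nu):
    """Noise rate of the XOR of independent Ber_mu and Ber_nu samples:
    the XOR is 1 iff exactly one of the two bits is 1."""
    return mu * (1 - nu) + nu * (1 - mu)

# Ber_mu XOR Ber_{(mu'-mu)/(1-2mu)} is exactly Ber_{mu'}.
mu, mu_prime = 0.1, 0.25
nu = (mu_prime - mu) / (1 - 2 * mu)
assert abs(xor_ber(mu, nu) - mu_prime) < 1e-12
```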
