# Input Transformation Based Zero-Knowledge Argument System for Arbitrary Circuits 

Frank Y.C. Lu<br>YinYao Inc.


#### Abstract

We introduce a new efficient, transparent, interactive zero-knowledge argument system that is based on the new input transformation concept that we will introduce in this paper. The core of this concept is a mechanism that converts input parameters into a format that can be processed directly by the circuit so that the circuit output can be verified through direct computation of the circuit. Our benchmark result shows our approach can significantly improve both prover runtime and verifier runtime performance by either close to or more than one order of magnitude over the state of the art while keeping the communication cost competitive with that of the state of the art. Specifically, when processing an deep circuit of $2^{20}$ sequential multiplication gates with 960 input bits on a single CPU thread, the performance of the BinaryBoost version of our protocol is: 0.8 seconds for the prover runtime cost; 17 milliseconds for the verifier runtime cost; and 55 kilobytes for the communication cost. Our approach also allows our protocol to be memory-efficient without forcing it to require a designated verifier. The theoretical memory cost of our protocol is $\leq O\left(m_{p}{ }^{\frac{1}{2}}\right)$ without requiring a designated verifier.


## 1 Introduction

Ever since the discoveries of interactive proofs (IPs) 22 and probabilistically checkable proofs (PCPs) [5] [4] 3] 2n the late last century, there has been a tremendous amount of research in the area of proof systems. More recently, the rise of blockchain and Web3 has finally triggered real-world deployments of zero-knowledge systems.

Due to the expensive computation cost in the setup phase of earlier SNARKs (Succinct NonInteractive Argument of Knowledge), it has become a significant interest to have the structured reference string (SRS) be constructible in a "universal and updatable" fashion, meaning that the same SRS can be used for statements about all circuits of a certain bounded size. The first universal SNARK was in Groth et al. [23], and Maller et al. improved the SRS size from quadratic to linear in Sonic [26]. More recently developed protocols such as PLONK [20, MARLIN [15] are universal fully-succinct SNARK with significantly improved prover runtime compared to the fully-succinct Sonic. However, many of these universal succinct SNARKs systems require trusted setup, and the prover run-time of these protocols is prohibitively expensive even with the latest improvements such as HyperPlonk [14, usually takes over 100 seconds on a single-threaded CPU for a circuit with over $2^{20}$ constraints.

Protocols belong to the Goldwasser, Kalai, and Rothblum (GKR) class such as Hyrax [31], Virgo [36]; MPC-in-the-head class of Kushilevitz, Ostrovsky, and Sahai such as ZKBoo [21] and Ligero/Ligero++ [1] [8] offer efficient prover runtimes that are at least one order of magnitude more efficient than pairingbased SNARKs, and many of these protocols do not require trusted setups. However, these protocols are largely ignored by the industry (e.g., the blockchain community) due to their expensive verifier runtime and high communication cost (hundreds of KBs) compared to fully succinct protocols such as STARK [7, PLONK, MARLIN, and Supersonic [13. Furthermore, state-of-the-art GKR protocols generally have additional dependency on circuit depth, where protocol complexity increases and performance significantly degrades as the circuit depth gets longer, making them less attractive to the industry where complex business logics (e.g., inputs are floating point numbers) are expected on smart contracts.

Memory-efficient privacy-free garbled circuits [25] [19] [24] and Vector Oblivious Linear Evaluation (VOLE) protocols [10 [28 [12] [11 [35 (32] [6] 34 generally offer better prover performance. However, their verifier runtimes are just as expensive as their prover runtime and generally cannot be easily made fully non-interactive, and their communication cost is easily many orders of magnitude more expensive than other approaches.

NIZKs such as SpartanNIZK [29] and later Lakonia 30 seem to offer a much more balanced approach, where they offer efficient prover runtime ( $6-18$ seconds single thread) and competitive communication costs for large circuits ( $2^{20}$ constraints) while not being layer dependent. However, the downside of these protocols is that their verifier performance is still expensive, usually in the $400+$ ms range on a single-threaded CPU.

Our aim is to create a new transparent zero-knowledge protocol that offers great flexibility to optimize and the best overall performance. Specifically, we want to keep the prover runtime cost and communication cost comparable to those of the state-of-the-art and improve the verifier runtime by one order of magnitude over that of the state-of-the-art. Finally, we also want our new system to be memory-efficient, or at least have the option to be memory-efficient.

### 1.1 Summary of Contributions

Our approach is to design a new class of protocols that allows verifiers to validate circuit outputs by directly examining circuit inputs without going through some intermediate translation phase. In our protocol, circuit inputs in the Pedersen commitment form are converted to linear polynomials in the integer field so that verifiers can use standard integer operations to compute and verify each circuit output. In addition to performance gains, we believe such an approach offers more flexibility in designing customized sub-circuits/gates and would allow developers to code business rules exactly as they are described in business language.

There were past attempts that somewhat enabled verifiers to "execute" each multiplication gate on its own, such as Cramer and Damgåd [16] and more recent designated-verifier (which is a limitation itself) VOLE (LPKZ in particular) [18 [34 [17 [33 based protocols. In these older strategies, each multiplication gate computation is actually not computed but "confirmed" by the verifier using transcripts tight to each multiplication gate. As a result, the communication/verifier costs of these earlier protocols are generally linear with the number of multiplication gates in a circuit.

On the other hand, the input transformation technique introduced by our protocol allows verifiers to use transformed inputs to directly (one operation for each operation, like we do with clear text data) compute the circuit (important), and the verifier computed output is still bound to the challenge $x$. This is a first and brings us three direct benefits; 1) After "computing" the whole circuit using transformed inputs, the verifier can now validate sub-linear sized proof transcripts in sub-linear runtime. 2) Since the whole circuit is linearly/directly computed by the verifier, we can break a large circuit into several smaller sub-circuits, minimizing the memory footprint to that of a sub-circuit. 3) Because the circuit is linearly "computed" by both the prover and the verifier, it gives the developer the power to significantly reduce the size of the circuit by combining "other" protocols in the middle and bypassing the "inactive" part of the circuit logic.

In our protocol, we begin by transforming each committed input parameter in $\mathbb{G}$ into its linear polynomial form in $\mathbb{Z}_{q}$. For a simple circuit $a_{1}{ }^{d}+a_{2}{ }^{d}+a_{3}{ }^{d}=r$ s.t. circuit inputs are $a_{1}, a_{2}, a_{3}$ and the circuit output is $r$. In our protocol, inputs $a_{1}, a_{2}, a_{3}$ and output $r$ are committed by the prover using Pedersen commitment. The prover then provides the transformed inputs $a_{1}, a_{2}, a_{3}$ in the linear polynomial form $a_{1}^{\prime}, a_{2}^{\prime}, a_{3}^{\prime} \in \mathbb{Z}_{q}$ s.t. $a_{i}^{\prime}=a_{i}+X \alpha_{i} \in \mathbb{Z}_{q}$ ( $\alpha_{i}$ is its blinding key). Since the transformed inputs are in $\mathbb{F}$, the verifier can plug these values directly into the circuit and just "execute" them to get the output $o$ e.g. $a_{1}^{\prime d}+a_{2}^{\prime d}+a_{3}^{\prime d}=o$. The circuit output $o \in \mathbb{Z}_{p}$ is the evaluation at point $x$ of a degree $d$ polynomial s.t. $f(x)=o$. The constant term of this polynomial is the circuit output $r$ and all other $d$ coefficients are its blinding keys. If the prover can prove 1) it knows all coefficients of the output polynomial before the evaluation point is given (e.g., using a polynomial commitment) 2) all input transformations are legit, then we say the proof is legit.

The output polynomial in the example above has a degree of $d$ because the transformed inputs (linear polynomials) are of degree 1 . Taking to its $d$ th power will give a polynomial with a degree of $d$. So if the circuit is something like $a_{1}{ }^{3}+a_{2}{ }^{3}+a_{3}{ }^{3}+, \ldots,+a_{t}{ }^{3}=r$, the degree of the output polynomial is 3 regardless of the value of $t$. Throughout our paper, we use the symbol $m_{p}$ (short for "multiplication path") to denote the maximum number of multiplications included in the path that leads to the circuit output, which is one less than the degree of the output polynomial (e.g. if the degree of the output polynomial is 3 , then $m_{p}=2$ ). $m_{p}$ is different from "multiplication depth". For example, for a circuit $a_{1}{ }^{3} \cdot{a_{2}}^{5}+a_{3}{ }^{6}=r, m_{p}=7\left(a_{1}{ }^{3} \cdot a_{2}{ }^{5}=\right.$ a polynomial of degree 8$)$, which is bigger than the multiplication depth (5) but smaller than the total number of multiplications (12).

For a deep circuit where $m_{p}$ value is large (unlike GKR based protocols, layers made up of addition gates make negligible performance impact in our protocol), the prover runtime of the base version of our protocol (Protocol 1) is dominated by $O\left(m_{p}^{2}+m_{p}+l\right)$ field operations and $O\left(m_{p}+m_{p}^{1 / 2}+l\right)$ group exponentiations, where $m_{p}$ stands for the total number of multiplication gates included in the path that contains most multiplications and $l$ stands for the number of inputs to a circuit; the verifier runtime is dominated by $O\left(n+m_{p}^{1 / 2}+l\right)$ field operations and $O\left(m_{p}^{1 / 2}+l\right)$ group exponentiations; and the communication cost is dominated by $O\left(m_{p}{ }^{1 / 2}+1\right)$ group elements and $O\left(m_{p}{ }^{1 / 2}+l\right)$ field elements.

On the other hand, if the circuit is shallow (e.g., for a circuit with $n / 2$ addition operations and $n / 2$ multiplication operations: $r=\sum_{i=1}^{n} a \cdot b$ where $r$ is the circuit output and $a, b$ are circuit inputs and $n$ stands for the total number of gates in a circuit., we have $m_{p}=1$ ), the prover work would be dominated by $O(n / 2)$ field addition operations, which is very cheap.

Our protocol is specifically efficient for proving complex application logic where the circuit depth is high and can be greatly simplified using customized gates. Furthermore, our protocol gives the developer the power to significantly reduce the size of the circuit by bypassing the "inactive" part of the circuit logic.

Specifically, when processing an deep circuit of $2^{20}$ sequential multiplication gates ( $m_{p}=n$ ) with 960 input bits on a single CPU thread, the performance of the BinaryBoost version of our protocol is: 0.8 seconds for the prover runtime cost; 17 milliseconds for the verifier runtime cost; and 55 kilobytes for the communication cost. To the best of our knowledge, our protocol offers the best prover/verifier runtime performance in literature (transparent/non-interactive/high-depth protocols) by a large margin.

On the memory side, the theoretical memory cost of our protocol is $\leq O\left(m_{p}{ }^{\frac{1}{2}}\right)$. This makes our protocol extremely attractive because VOLE-based memory-efficient protocols generally require one round of interaction and are extremely expensive in terms of verifier runtime cost and communication cost.

We introduce our protocol in an interactive setting where all verifier challenges are random field elements. In practice, we assume the Fiat-Shamir heuristic is applied to our protocol to obtain a non-interactive zero-knowledge argument in the random oracle model.

## 2 Preliminaries

### 2.1 Assumption

Definition 1. (Discrete Logarithmic Relation) For all PPT adversaries $\mathcal{A}$ and for all $n \geq 2$ there exists a negligible function $\boldsymbol{n e g l}(\lambda)$ s.t.

$$
\operatorname{Pr}\left[\left.\begin{array}{c}
\mathbb{G}=\operatorname{Setup}\left(1^{\lambda}\right), g_{0}, \ldots, g_{n-1} \stackrel{\$}{\leftarrow} \mathbb{G} \\
a_{0}, \ldots, a_{n-1} \in \mathbb{Z}_{p} \leftarrow \mathcal{A}\left(\mathbb{G}, g_{0}, \ldots, g_{n-1}\right)
\end{array} \right\rvert\, \exists a_{i} \neq 0 \wedge \prod_{i=0}^{n-1} g_{i}^{a_{i}}=1\right] \leq \boldsymbol{n e g l}(\lambda)
$$

The Discrete Logarithmic Relation assumption states that an adversary can't find a non-trivial relation between the randomly chosen group elements $g_{0}, \ldots, g_{n-1} \in \mathbb{G}^{n}$, and that $\prod_{i=0}^{n-1} g_{i}^{a_{i}}=1$ is a
non-trivial discrete $\log$ relation among $g_{0}, \ldots, g_{n-1}$. Please note the generators we use in this paper are $g, h, \vec{u} \in \mathbb{G}$.

### 2.2 Zero-Knowledge Argument of Knowledge

Interactive arguments are interactive proofs in which security holds only against computationally bounded provers. In an interactive argument of knowledge for a relation $\mathcal{R}$, a prover convinces a verifier that it knows a witness $w$ for a statement $x$ s.t. $(x, w) \in \mathcal{R}$ without revealing the witness itself to the verifier.

Let $(\mathcal{P}, \mathcal{V})$ denote a pair of PPT interactive algorithms, and $\boldsymbol{S e t u p}$ denotes a non-interactive setup algorithm that outputs public parameters $p p$ given a security parameter $\lambda$. Let $\langle\mathcal{P}(p p, x, w), \mathcal{V}(p p, x)\rangle$ denote the output of $\mathcal{V}$ on input $x$ after its interaction with $\mathcal{P}$, who has knowledge of witness $w$. The triple $(\boldsymbol{S e t u p}, \mathcal{P}, \mathcal{V})$ is called an argument for relation $\mathcal{R}$ if for all non-uniform PPT adversaries $\mathcal{A}$ it satisfies completeness, soundness, and zero-knowledge definitions defined below:

Definition 2. (Perfect Completeness) The triple (Setup, $\mathcal{P}, \mathcal{V})$ satisfies perfect completeness if for all PPT $\mathcal{A}$ :

$$
\operatorname{Pr}\left[\begin{array}{c|c}
(p p, x, w) \notin \mathcal{R} \text { or } & p p \leftarrow \boldsymbol{\operatorname { S e t u p }}\left(1^{\lambda}\right) \\
\langle\mathcal{P}(p p, x, w), \mathcal{V}(p p, x)\rangle=1 & (x, w) \leftarrow \mathcal{A}(p p)
\end{array}\right]=1
$$

The soundness notion we consider in this work is computational witness-extended emulation.
Definition 3. (Computational Witness-Extended Emulation or CWEE) Given a public-coin interactive argument tuple $(\operatorname{Setup}, \mathcal{P}, \mathcal{V})$ and arbitrary prover algorithm $\mathcal{P}^{*}$, let $\boldsymbol{\operatorname { R e c o r d e r }}\left(\mathcal{P}^{*}, p p, x, s\right)$ denote the message transcript between $\mathcal{P}^{*}$ and $\mathcal{V}$ on shared input $x$, initial prover state $s$, and $p p$ generated by Setup. Furthermore, let $\mathcal{E}$ Recorder ( $\mathcal{P}, p p, x, s$ ) denote a machine $\mathcal{E}$ with a transcript oracle for this interaction that can rewind to any round and run again with fresh verifier randomness. The tuple $(\operatorname{Setup}, \mathcal{P}, \mathcal{V})$ has CWEE if for every deterministic polynomial time $\mathcal{P}$ there exists an expected polynomial time emulator $\mathcal{E}$ s.t. for all non-uniform polynomial time adversaries $\mathcal{A}$ the following holds:

$$
\begin{array}{c|c}
\mid \operatorname{Pr}[\mathcal{A}(\operatorname{tr})=1 & \left.\begin{array}{c}
p p \leftarrow \operatorname{Setup}\left(1^{\lambda}\right) \\
(x, s) \leftarrow \mathcal{A}(p p) \\
\operatorname{tr} \leftarrow \operatorname{Recorder}\left(\mathcal{P}^{*}, p p, x, s\right)
\end{array}\right]- \\
\operatorname{Pr}\left[\begin{array}{c}
\mathcal{A}(\operatorname{tr})=1 \wedge \\
\operatorname{tr} \operatorname{accepting} \\
\Longrightarrow(x, w) \in \mathcal{R}
\end{array}\right. & \left.\begin{array}{c}
p p \leftarrow \operatorname{Setup}\left(1^{\lambda}\right) \\
(x, s) \leftarrow \mathcal{A}(p p) \\
\hline \operatorname{tr}, w) \leftarrow \mathcal{E}^{\operatorname{Recorder}\left(\mathcal{P}^{*}, p p, x, s\right)}(p p, x)
\end{array}\right]
\end{array}
$$

The zero-knowledge property requires that the verifier doesn't learn anything about the witness from its interaction with an honest prover.
Definition 4. (Perfect Special Honest Verifier Zero Knowledge for Interactive Arguments) An interactive proof is (Setup $, \mathcal{P}, \mathcal{V})$ is a perfect special honest verifier zero knowledge (PHVZK) argument of knowledge for $\mathcal{R}$ if there exists a probabilistic polynomial time simulator $\mathcal{S}$ such that all pairs of interactive adversaries $\mathcal{A}_{1}, \mathcal{A}_{2}$ have the following property for every $(x, w, \sigma) \leftarrow \mathcal{A}_{2}(p p) \wedge(p p, x, w) \in \mathcal{R}$, where $\sigma$ stands for verifier's public coin randomness for challenges

$$
\begin{aligned}
& \operatorname{Pr}\left[\begin{array}{c|c}
\mathcal{A}_{1}(t r)=1 & p p \leftarrow \operatorname{Setup}\left(1^{\lambda}\right), \\
& \operatorname{tr} \leftarrow\langle\mathcal{P}(p p, x, w), \mathcal{V}(p p, x)\rangle
\end{array}\right]= \\
& \operatorname{Pr}\left[\begin{array}{l|c}
\mathcal{A}_{1}(\operatorname{tr})=1 & p p \leftarrow \operatorname{Setup}\left(1^{\lambda}\right), \\
& \operatorname{tr} \leftarrow \mathcal{S}(p p, x, \sigma)
\end{array}\right]
\end{aligned}
$$

Above property states that the adversary chooses a distribution over statements $x$ and witnesses $w$ but is not able to distinguish between the simulated transcripts and the honestly generated transcripts for a valid statement/witnesses pair, and that the simulator has access to the randomness used by the verifier.

Definition 5. (Public Coin) All messages sent from $\mathcal{V}$ to $\mathcal{P}$ are chosen uniformly at random and independently of $\mathcal{P}$ 's messages.

### 2.3 Polynomial Commitment Function

As in the case of other popular zero-knowledge protocols that offer succinct proof size, our protocol uses a polynomial commitment evaluation protocol to construct most of our proof transcript. Our protocol uses a version of the polynomial commitment scheme defined by Bootle. et al. 9]. The polynomial commitment function PolyCommitEval is defined as:

- PolyCommitEval( $C, y, x ; \vec{\tau}, \phi) \rightarrow$ boolean $C=u_{1}^{\tau_{1}} u_{2}^{\tau_{2}}, \ldots, u_{n}^{\tau_{n}}$ is the committed polynomial (vector commitment) in $\mathbb{G}$ where $\vec{u}$ are generators, $\vec{\tau}$ are its coefficients and $\phi$ is its blinding key. The function returns a boolean value"true" if it can be correctly evaluated at point $x$ s.t. $y=f(x)$.

The polynomial commitment scheme we use in our benchmark testing is based on the one defined by Bootle et al. 9 .

### 2.4 Zero Knowledge Proof of Discrete Logarithm

For a prover to prove it has the knowledge of a discrete logarithmic $\kappa$ of some group element $s=g^{\kappa} \in \mathbb{G}$. We define the relation for this protocol as $\mathcal{R}_{P o D}=\left\{(h, s ; \kappa): s=g^{\kappa}\right\}$. We also define two functions (ProveDL, Verify $\boldsymbol{D L})$ for provers and verifiers to create and verify proof transcripts:

- Prove $\boldsymbol{D} \boldsymbol{L}(g, \kappa) \rightarrow t r_{\kappa}$ generates the proof transcript $t r_{\kappa}$, where $\kappa$ is the witness.
- Verify $\boldsymbol{D L}\left(g, s, t r_{\kappa}\right) \rightarrow b \in\{0,1\}$ takes a proof transcript $t r_{\kappa}$ and a pair of group elements with discrete $\log$ relation $\left(g, s \in \mathbb{G} \wedge s=h^{\kappa}\right)$, and outputs true if the knowledge of the relation is verified, false otherwise.

In this paper, we assume the underlying implementation of the proof of discrete logarithm protocol is Schnorr's protocol [27. We know for a fact that Schnorr's protocol has perfect completeness, special honest verifier zero knowledge, and computational witness-extended emulation.

### 2.5 Notations

Let $\mathbb{G}$ denote any type of secure cyclic group of prime order $p$, and let $\mathbb{Z}_{p}$ denote an integer field modulo $p$. Group elements other than generators are denoted by capital letters. e.g., $C=u_{1}^{a_{1}} u_{2}^{a_{2}} \ldots u_{n}^{a_{n}} \in \mathbb{G}$ is a commitment committed to a vector $\vec{a}$ denoted by a capital letter, and $B \in \mathbb{G}$ is a random group element also denoted by a capital letter. For generators used as base points to compute other group elements in our protocol, such as $\vec{g}, h \in \mathbb{G}$, we use lower case letters to denote them. Greek letters are used to label hidden key values. e.g., $v$ is the blinding key for Pedersen commitment $P$ on generator $h \in \mathbb{G}$ s.t. $P=g^{a} h^{v}$. Finally, we use standard vector notation $\vec{v}$ to denote vectors. i.e., $\vec{a} \in \mathbb{Z}_{p}^{n}$ is a list of $n$ integers $a_{i}$ for $i=\{1,2, \ldots, n\}$.

We write $\mathcal{R}=\{($ Public Inputs $;$ Witnesses $)$ : Relation $\}$ to denote the relation $\mathcal{R}$ using the specified public inputs and witnesses.

## 3 Protocol for Arbitrary Circuits

We first define the relation for the base version of our protocol. For $l$ input parameters, let $\mathcal{C}_{\mathbb{F}}$ represent the set of arbitrary arithmetic circuits in $\mathbb{F}$, there exists a zero knowledge argument for the relation:

$$
\begin{gather*}
\left\{\left(g, h, \vec{u}, \vec{P}, R \in \mathbb{G}, E \in \mathcal{C}_{\mathbb{F}} ; \vec{a}, \vec{v}, r, \phi \in \mathbb{Z}_{q}\right): E(\vec{a})=r\right.  \tag{1}\\
\left.\wedge P_{i}=g^{a_{i}} h^{v_{i}} \forall_{i} \in[1,|\vec{P}|] \wedge R=g^{r} h^{\phi}\right\}
\end{gather*}
$$

$g, h, \vec{u}$ are initial public parameters $p p$ generated during setup. The above relation states that each input parameter to a circuit is represented by a commitment $P_{i}$ in $\mathbb{G}$, which hides each input value $a_{i}$ with a blinding key $v_{i}$. $r$ is the output of circuit $E$ computed from inputs $\vec{a}$, which is also a committed value $R \in \mathbb{G}$ with blinding key $\phi$.

The main idea behind the "input transformation" concept is the process of transforming committed inputs in $\mathbb{G}$ to linear polynomials in $\mathbb{F}$, where the verifier can perform addition and multiplication operations "as is". For an input commitment $P_{i}$ s.t. $P=g^{a_{i}} h^{v_{i}} \in \mathbb{G}$ where $a_{i}$ is the input value and $v_{i}$ is its blinding key, we create a corresponding integer value in linear polynomial form $a_{i}{ }^{\prime} \in \mathbb{Z}_{q}$ :

$$
\begin{equation*}
a_{i}^{\prime}=a_{i}+X \alpha_{i} \in \mathbb{Z}_{q} \tag{2}
\end{equation*}
$$

Note that the blinding key of each input is replaced by a random $\alpha_{i}$ s.t. $\alpha_{i} \neq v_{i}$. Likewise, the circuit output commitment $R=g^{r} h^{\phi} \in \mathbb{G}$ also has a matching linear polynomial in $\mathbb{Z}_{q}$ with blinding key $\epsilon$.

$$
\begin{equation*}
r^{\prime}=r+X \epsilon \in \mathbb{Z}_{q} \tag{3}
\end{equation*}
$$

Since inputs represented by linear polynomials are just integer values, verifiers can perform arithmetic operations on them just as they do on decrypted numbers. The output value of a circuit evaluation is now a polynomial with $m_{p}+1$ degrees evaluated at point $X$. The constant term of the output polynomial is the circuit output $r$ and the coefficient of the degree one term is the blinding key $\epsilon$ of the circuit output.

In the next two sub-sections, we explain our protocol in two steps: In the first step, we introduce a sub-protocol (Protocol InputMapping) that allows the prover to prove each input in $\mathbb{G}$ is correctly transformed to that in $\mathbb{Z}_{q}$; In the second step, we introduce the full protocol (Protocol Baseline) that uses the aforementioned sub-protocol to validate transformed inputs and prove the circuit output is correctly computed from circuit inputs as relation 1 states.

### 3.1 The Sub-Protocol for Linear Polynomial to Pedersen Commitment Mapping Validation

A sub-protocol that validates committed inputs in $\mathbb{G}$ is correctly mapped to transformed inputs in $\mathbb{Z}_{q}$. The relation we try to prove for this sub-protocol is:

$$
\begin{align*}
& \left\{\left(g, h \in \mathbb{G}, \vec{P} \in \mathbb{G}^{l}, \vec{a}^{\prime}, \in \mathbb{Z}_{q}^{l} ; \vec{a}, \vec{\alpha} \in \mathbb{Z}_{q}^{l}, \vec{v} \in \mathbb{Z}_{p}^{l}\right):\right.  \tag{4}\\
& \left.\quad P_{i}=g^{a_{i}} h^{v_{i}} \wedge a_{i}^{\prime}=a_{i}+X \alpha_{i} \wedge \forall_{i} \in[1, l]\right\}
\end{align*}
$$

An important requirement for input transformation is that we need to transform the hidden value in Pedersen commitment to a prime field that is friendly to NTT. When multiplying two polynomials of degree $m_{p}$, the trivial approach to compute the resultant polynomial's coefficients would require a runtime cost of $O\left(m_{p}{ }^{2}\right)$, whereas the NTT based approach would reduce that to $O\left(m_{p} \log m_{p}\right)$.

NTT requires a prime modulo $q$ of the form $q=r \cdot 2^{k}+1$ to be the prime order of the group, where $k$ and $c$ are arbitrary constants, so we need to pick a prime $q$ that is NTT friendly. Also note
that the prime $q$ is expected to be smaller than $p$ because: 1 ) computation in $p$ must not overflow 2 ) the smaller the $q$ value in bits, the lower the communication cost.

To start, the prover commits to a new set of blinding elements to facilitate the transformation:

$$
\begin{array}{ll}
S_{i}=g^{\omega_{i} \cdot q} u^{v_{i}} \in \mathbb{G} & i=\{1, \ldots, l\} \\
T_{i}=g^{v_{i}-\alpha_{i}} \in \mathbb{G} & i=\{1, \ldots, l\} \tag{6}
\end{array}
$$

The prover then sends $S_{i}, T_{i}$ for $i=\{1, . ., l\}$ to the verifier. When the challenge $x \in \mathbb{Z}_{q}$ is available, the prover sends $e_{i}$ s.t.:

$$
\begin{equation*}
e_{i}=\left(\left(x \alpha_{i} \bmod q\right)-x \alpha_{i}\right) \cdot x+\omega_{i} \cdot q \quad i=\{1, \ldots, l\} \tag{7}
\end{equation*}
$$

The idea here is that when we subtract $e_{i}$ from $a_{i} \cdot x$ in the next step, we can subtract out the blinding modulo $q$ element $\left(x \alpha_{i} \bmod q\right.$ ) from $a_{i}^{\prime}$ (e.g., $a_{i}^{\prime} \cdot x-e_{i}=\left(a_{i}+x \alpha_{i}\right) \cdot x-\omega_{i} q$ ), assuming there is no overflow. The verifier can replace the $x^{2} \alpha_{i}-\omega_{i} q$ part with the new blinding element $x^{2} v_{i}$ as the exponent of generator $g$ by adding the previously committed values $S_{i}, T_{i}$.

$$
\begin{equation*}
g^{a_{i}^{\prime} \cdot x-e_{i}} \cdot T_{i}^{x^{2}} \cdot S_{i}=\left(g^{x}\right)^{a_{i}+x v_{i}} u^{v_{i}} \in \mathbb{G} \quad i=\{1, \ldots, l\} \tag{8}
\end{equation*}
$$

With $\left(g^{x}\right)^{a_{i}+x v_{i}}$ available, the verifier can trivially divide each $P_{i}$ and take their sum with powers of $k$ to get $P K_{v_{t}}$.

$$
\begin{equation*}
P K_{v_{t}}=\prod_{i=1}^{l}\left(\frac{P_{i}^{x}}{g^{a_{i}^{\prime} \cdot x-e_{i}} \cdot T_{i}^{x^{2}} \cdot S_{i}}\right)^{k^{i}} \in \mathbb{G} \tag{9}
\end{equation*}
$$

$P K_{v_{t}}=\left(h^{x} /\left(g^{x^{2}} u\right)\right)^{v_{t}}$. The verifier can confirm the correctness of the transformation except with negligible probability if the prover can prove the knowledge of $v_{t}$ on generator $h^{x} /\left(g^{x^{2}} u\right) \in \mathbb{G}$.

Finally, the verifier needs to make sure $e_{i}$ doesn't alter the value of $a_{i}$. This can be done by taking the modulus $q$ of $e_{i}$ and checking if it returns 0 . This is trivial to understand since $a_{i}^{\prime}$ is in $\mathbb{Z}_{q}$. If $e_{i}$ is a multiple of $q$ then it is obvious that it cannot alter the value of $a_{i}$.

$$
\begin{equation*}
\text { if }\left(e_{i} \bmod q\right) \stackrel{?}{=} 0, \text { then continue } \tag{10}
\end{equation*}
$$

This test also implies the transformation process explained in this section is sound since the soundness of equation 9 is trivial to prove except for a negligible probability. For example, if $a_{i}^{\prime *}=a_{i}^{*}+x \alpha_{i}=$ $a_{i}+\delta+x \alpha_{i}$. Knowing that $e_{i}$ must be a multiple of $q$ for equation 10 to be true, we have $a_{i}^{\prime *} \cdot x-e_{i}=$ $\left(a_{i}+x \alpha_{i}\right) \cdot x-\omega_{i} q=x\left(a_{i}+x \alpha_{i}\right)+x \delta$. In order for equality 8 to be true, the left side of the equality 8 must offset $x \delta$ using committed values $T_{i}$ and $S_{i}$, s.t. $x \delta$ will be removed after applying challenge $x$ to these elements (i.e., $T_{i}^{x^{2}} \cdot S_{i}$, note exponents are different powers of $x$ ), which only happens for a negligible chance of $1 / q$ when the dishonest prover successfully guessed $x$ correctly.

This transformation process is zero-knowledge because $e_{i}$ does not leak any information to the verifier either. This is because the first part of $e_{i}:\left(\left(x \alpha_{i} \bmod q\right)-x \alpha_{i}\right) \cdot x$ is a multiple of $q$, and can be represented as $s \cdot q$ for some $s$. This implies $e_{i}=(s+\omega) \cdot q$ for some randomly chosen $\omega$ where $\omega \gg s$. Since every other transcript (in $\mathbb{G}$ ) is trivially zero-knowledge, we say this transformation process is zero-knowledge if $\omega$ is correctly chosen. See full proof in Appendix A.

We have so far skipped the overflow problem. If $a_{i}+\left(x \alpha_{i} \bmod q\right)>q$, then we will have an overflow problem in equation 89 when computing $a_{i}^{\prime} \cdot x-e_{i}$. To get around this, the prover simply needs to check if $a_{i}+\left(x \alpha_{i}\right) \bmod q$ overflows $q$, and subtract $q \cdot x$ from $e_{i}$ if that's the case.

$$
\begin{equation*}
\text { if } a_{i}+\left(x \alpha_{i} \bmod q\right) \geq q, \text { then } e_{i}=e_{i}-q \cdot x \quad i=\{1, \ldots, l\} \tag{11}
\end{equation*}
$$

The sub-protocol for input validation is defined in two parts: Setup and Verify. We omit the challenge generation part since that is also part of the main protocol.

```
Input: \(\left(\vec{P}, g, h \in \mathbb{G} ; \vec{a} \in \mathbb{Z}_{q}, \vec{v} \in \mathbb{Z}_{p}\right)\)
    \(\mathcal{P}^{\prime}\) sinput: \((\vec{P}, g, h ; \vec{a}, \vec{v}) ; \quad \mathcal{V}^{\prime}\) sinput : \((\vec{P}, g, h)\)
    \(\mathcal{P}\) compute :
\begin{tabular}{ll}
\(\alpha_{i} \stackrel{\&}{\leftarrow} \mathbb{Z}_{p}\) & \(i=\{1, \ldots, l\}\) \\
\(\omega_{i} \stackrel{\&}{\leftarrow} \mathbb{Z}_{p}\) & \(i=\{1, \ldots, l\}\) \\
\(S_{i}=g^{\omega_{i} \cdot q} u^{v_{i}} \in \mathbb{G}\) & \(i=\{1, \ldots, l\}\) \\
\(T_{i}=g^{v_{i}-\alpha_{i}} \in \mathbb{G}\) & \(i=\{1, \ldots, l\}\) \\
\(\mathcal{P} \rightarrow \mathcal{V}: \vec{S}, \vec{T}\) &
\end{tabular}
```

Protocol InputMapping - Setup
The verifier generates challenge $x$ and send it to the prover. After the challenge is received, the prover generates $\vec{a}^{\prime}$ and sends them to the verifier. Since this challenge generation part is shared with the main protocol, we omit it here. Next, the verify part of the protocol validates the mapping between transformed inputs in field $\mathbb{Z}_{q}$ to those in group $\mathbb{G}$.

```
Input: \(\left(\vec{P}, \vec{T}, g, h \in \mathbb{G}, \vec{a}^{\prime} \in \mathbb{Z}_{q} ; \vec{a}, \vec{\alpha} \in \mathbb{Z}_{q}, \vec{v} \in \mathbb{Z}_{p}\right)\)
    \(\mathcal{P}^{\prime}\) sinput : \((\vec{P}, \vec{T}, g, h ; \vec{a}, \vec{v}) ; \quad \mathcal{V}^{\prime}\) s input \(:(\vec{P}, \vec{T} g, h)\)
    \(\mathcal{P}\) compute :
        \(e_{i}=\left(\left(x \alpha_{i} \bmod q\right)-x \alpha_{i}\right) x+\omega_{i} q \quad i=\{1, \ldots, l\}\)
        if \(a_{i}+\left(x \alpha_{i} \bmod q\right)>q\), then \(e_{i}=e_{i}-q \cdot x \quad i=\{1, \ldots, l\}\)
    \(\mathcal{P} \rightarrow \mathcal{V}: \vec{e}, \vec{a}^{\prime}\)
    \(\mathcal{V} \rightarrow \mathcal{P}: k \stackrel{\&}{\leftarrow} \mathbb{Z}_{p}\)
    \(\mathcal{P}\) compute :
    \(v_{t}=\sum_{i=1}^{l} v_{i} k^{i} \in \mathbb{Z}_{p}\)
    \(\operatorname{tr}_{v_{t}}=\operatorname{ProveDL}\left(\left(h^{x} /\left(g^{x} u\right)\right), v_{t}\right)\)
\(\mathcal{P} \rightarrow \mathcal{V}: \operatorname{tr}_{v_{t}}\)
\(\mathcal{V}\) verify inputs :
    if \(\left(e_{i} \bmod q\right) \stackrel{?}{=} 0\), then continue \(\quad i=\{1, \ldots, l\}\)
    else reject
    \(P K_{v_{t}}=\prod_{i=1}^{l}\left(\frac{P_{i}^{x}}{g^{a_{i}^{\prime} \cdot x-e_{i}} \cdot T_{i}^{x^{2}} \cdot S_{i}}\right)^{k^{i}} \in \mathbb{G}\)
    if \(\operatorname{Verify} D L\left(\left(h^{x} /\left(g^{x^{2}} u\right)\right), P K_{v_{t}}, t r_{v_{t}}\right)\), then accept
    else reject
```

Protocol for InputMapping - Verify
Theorem 1. (The Input-Mapping Protocol). The proof system presented in this section has perfect completeness, PHVZK, and CWEE. The proof for Theorem 1 is presented in Appendix A.

### 3.2 The Baseline Protocol

To prove the circuit output is correctly computed from transformed inputs $\vec{a}^{\prime}$, the prover needs to show it knows all coefficients of the output polynomial. For example, for a simple circuit that just outputs the sum of two inputs, the prover needs to show it knows the constant term $r$ and the coefficient of the degree 1 term $\epsilon$ of the output polynomial :

$$
\begin{equation*}
o=a_{1}^{\prime}+a_{2}^{\prime}=r+X \cdot \epsilon \tag{12}
\end{equation*}
$$

Computing the output polynomial of the addition operation is the same as adding two polynomials, where $r=\left(a_{1}+a_{2}\right)$ and the blinding key is $\epsilon=\left(\alpha_{1}+\alpha_{2}\right)$. Likewise, multiplying two inputs $a_{1}{ }^{\prime}, a_{2}{ }^{\prime}$ is the same as multiplying two polynomials:

$$
\begin{equation*}
o=a_{1}{ }^{\prime} \cdot a_{2}{ }^{\prime}=r+X \cdot \epsilon+X^{2} \cdot \tau \tag{13}
\end{equation*}
$$

Where $r=a_{1} \cdot a_{2}, \epsilon=a_{2} \alpha_{1}+a_{1} \alpha_{2}$, and $\tau=\alpha_{1} \cdot \alpha_{2}$. We use the label " $o$ " to represent the circuit output, which is equivalent to the output polynomial evaluated at a point $X$. The degree of the polynomial will increase after each multiplication operation, so the efficiency will drop as the maximum number of multiplications that leads to the circuit output $\left(m_{p}\right)$ increases.

To get the linear polynomial we need from the raw output $o$, the verifier needs to subtract out terms with degrees higher than one. In the multiplication circuit above, the verifier needs to eliminate the term of degree 2 to get the linear polynomial. To do so, the prover commits to $\tau$ before the challenge $x$ is known. When the challenge $x$ is available, the prover sends the evaluation value $y$ to the verifier and proofs to prove $f(x)=X^{2} \tau=y$. The verifier can subtract $y$ from $o$ to get the output in linear polynomial form:

$$
r^{\prime}=o-y
$$

We call value $y$ a "breaker". Breaker(s) subtracts all "noises" (polynomial terms of degree higher than one) from the raw output $o$. The prover and the verifier can engage in a polynomial commitment evaluation protocol to confirm $f(x)=y$.

$$
\begin{equation*}
C=\prod_{i=1}^{m_{p}} u_{i}^{\tau_{i}} \in \mathbb{G} \tag{14}
\end{equation*}
$$

We can compute the whole circuit in one run s.t. the protocol evaluates to one polynomial of $m_{p}$ degrees as its output and use just one breaker $y$ to get the circuit output $r^{\prime}$. However, that would be pretty inefficient because the bigger the polynomial gets, the less efficient it would be to compute its coefficients (even with NTT). Instead, we can break a big circuit into $m_{b}$ smaller sub-circuits. Each sub-circuit computes a polynomial of at most $b+1$ degrees, which evaluates to an intermediate result $r_{i}^{\prime}$ s.t.:

$$
\begin{equation*}
r_{i}^{\prime}=o_{i}-y_{i} \quad \text { for } \quad i=\left\{1, \ldots, m_{b}\right\} \tag{15}
\end{equation*}
$$

Each $o_{i}$ is the output of a degree $b+1$ polynomial at evaluation point $x$, and each breaker $y_{i}$ is the breaker for that polynomial.

We just need to make a simple modification to the polynomial commitment evaluation protocol defined by Bootle et al. to enable verifiers to evaluate $m_{b}$ breakers all at once. We start by aligning each breaker $y_{i}$ to the coefficients of the $i$ th generator of vector commitments (columns of the $m_{b} \times b$ matrix).

$$
\begin{array}{ccccc}
u_{1}^{y_{1}^{\prime}} \\
\cdot \\
\cdot \\
u_{m_{b}}^{y_{m_{b}}^{\prime}}
\end{array}=\left(\begin{array}{ccccc}
u_{1}^{\tau_{1,1}} & u_{1}^{\tau_{1,2}} & \cdot & \cdot & u_{1}^{\tau_{1, b}} \\
\cdot & \cdot & \cdot & & \cdot \\
\cdot & \cdot & & \cdot & \cdot \\
u_{m_{b}}^{\tau_{m_{b}, 1}} & u_{m_{b}}^{\tau_{m_{b}, 2}} & \cdot & \cdot & u_{m_{b}}^{\tau_{m_{b}, b}}
\end{array}\right)\left(\begin{array}{c}
x^{2} \\
\cdot \\
x^{b+1}
\end{array}\right)
$$

Figure 1
Let $y_{i}=\left(y_{i}^{\prime}\right) \bmod q$, we can observe from figure 1 that each $y_{i}^{\prime}$ can be computed from the sum of products of exponents of $u_{i}$ (e.g. $y_{i}^{\prime}=\tau_{i, 1} x^{2}+\tau_{i, 2} x^{3}+\ldots+\tau_{i, b} x^{b+1}$ ).

In our protocol, the prover commits to the columns of the matrix in figure 1 in the same way as in Bootle et al.'s polynomial commitment evaluation scheme.

$$
\begin{equation*}
C_{j}=\prod_{i=1}^{m_{b}} u_{i}^{\tau_{i, j}} \quad \text { for } \quad j=\{1, \ldots, b\} \tag{16}
\end{equation*}
$$

When the evaluation point $x$ is known, the verifier computes the exponent of each $C_{j}$. If the equality below is true, all breakers are verified.

$$
\begin{equation*}
\prod_{j=1}^{b} C_{j}^{x^{j}} \stackrel{?}{=} \prod_{i=1}^{m_{b}} u_{i}^{y_{i}^{\prime}} \tag{17}
\end{equation*}
$$

Note that if each breaker $y_{i}^{\prime}$ is equal to the sum of products of $b$ terms and each term is a product of $x^{j} \in \mathbb{Z}_{q}$ and $\tau_{i, j} \in \mathbb{Z}_{q}$, then the bit length of $\left|y_{i}^{\prime}\right|$ is approximately $\left|y_{i}^{\prime}\right| \leq 2 \cdot|q|+|b|$. The value of $y_{i}^{\prime}$ can also be expressed as $y_{i}^{\prime}=y_{i}+v \cdot q$ for some $v$ and $|v| \leq|q|+|b|$ (this is important in Protocol BinaryBoost we will cover in the next section). However, passing raw $y_{i}^{\prime}$ values to the verifier may leak some information about the coefficients.

To cope with that, we make the prover commit to a blinding vector $\vec{\mu} \in \mathbb{Z}_{m}^{m_{b}}$. Each $y_{i}^{\prime}$ is now computed as:

$$
\begin{equation*}
y_{i}^{\prime}=\sum_{j=1}^{b} \tau_{i, j} x^{j}+\mu_{i} \quad \text { for } \quad i=\left\{1, \ldots, m_{b}\right\} \tag{18}
\end{equation*}
$$

If $|m| \gg\left|\sum_{j=1}^{b} \tau_{i, j} x^{j}\right|$, then $\mathbb{Z}_{m}$ perfectly hides $\sum_{j=1}^{b} \tau_{i, j} x^{j}$ (130-bit for 61 -bit prime and $\mathrm{b}=2^{8}$ ) with high probability. For example, if we set $m$ to a randomly chosen 216 -bit value, then there is less than a $2^{-85}$ probability that $\mu_{j}$ does not perfectly hide $\vec{\tau}_{j}$ (e.g. when $y_{i}$ overflows $m$ or less than 131 bits). Note that the power (exponent) of $x$ in each term in the equation above is one degree lower than it needs to be, so to get $y_{i}$ from $y_{i}^{\prime}$ the verifier needs to multiply $x$ one more time:

$$
\begin{equation*}
y_{i}=\left(y_{i}^{\prime} \cdot x\right) \bmod q \in \mathbb{Z}_{q} \quad \text { for } \quad i=\left\{1, \ldots, m_{b}\right\} \tag{19}
\end{equation*}
$$

We also need to adjust the blinding key of $r^{\prime}$ by adding $\mu_{i}$ to each $r_{i}$. The updated equality graph is shown in figure 2 below.

Let $M=\prod_{i=1}^{m_{b}} u_{i}^{\beta_{i}}$, the equality in figure 2 can also be expressed using the equality check below:

$$
\begin{equation*}
\prod_{j=1}^{b} C_{j}^{x^{j}} \cdot M \stackrel{?}{=} \prod_{i=1}^{m_{b}} u_{i}^{y_{i}^{\prime}} \tag{20}
\end{equation*}
$$

If the commitments $\vec{C}, M$ and vector $\vec{y}^{\prime}$ satisfy the equation above, then we know breakers $\vec{y}^{\prime}$ are valid. The verifier applies equation 19 to each $y_{i}^{\prime}$ to get the actual breakers $y_{i}$ used in computing $o$.

We define two more functions for our protocol. function
computeSubCircuitKeys is used by the prover to compute the keys of each sub-circuit or "row" in Figure 1, and function computeSubCircuit is used by the verifier to compute the value of a sub-circuit at the evaluation point $x$ :

1. function computeSubCircuitKeys ${ }_{i}$ ("input values", "input keys"): for $i=\left\{1, . ., m_{b}\right\}$, it takes input values $\vec{a}$ and keys $\vec{\alpha}$ to evaluate the sub-circuit and outputs $r_{i}, \epsilon_{i}, \vec{\tau}_{i}$ (coefficients of $o_{i}$ ).
2. function computeSubCircuit ${ }_{i}$ ("inputs in linear polynomial form", "output from the previous computeSubCircuit function"): for $i=\left\{1, . ., m_{b}\right\}$, it trivially computes the result $o_{i}$ from the inputs to the sub-circuit.

For example, if the logic of the $i$ th sub-circuit is to return the product of $l$ inputs, then the computeSubCircuit $_{i}$ function simply performs $o_{i}=a_{1} \times a_{2} \times, \ldots, \times a_{l}$. Since $a_{1}, \ldots, a_{l}$ are linear polynomials evaluated at point $\mathrm{X}, o_{i}$ is the evaluation of the output polynomial at point X , and the computeSubCircuitKeys function computes all coefficients of the output polynomial. We are now $^{\text {for }}$ ready to introduce Protocol Baseline as follows:

$$
\begin{align*}
& \text { Input: }\left(g, h, \vec{u}, \vec{P} \in \mathbb{G}, \vec{a}^{\prime} \in \mathbb{Z}_{q}^{l} ; \vec{a}, \vec{v} \in \mathbb{Z}_{q}\right)  \tag{21}\\
& \mathcal{P}^{\prime} \text { sinput: }\left(g, h, \vec{u}, R, \vec{a}^{\prime} ; \vec{a}, \vec{\alpha}, \phi\right) ; \quad \mathcal{V}^{\prime} \text { s input : }\left(g, h, \vec{u}, R, \vec{a}^{\prime}\right)  \tag{22}\\
& \mathcal{P} \text { compute : }  \tag{23}\\
& \rho_{i} \stackrel{\$}{\leftarrow} \mathbb{Z}_{m},  \tag{24}\\
& i=\left\{1, \ldots, m_{b}\right\} \\
& M=\prod_{i=1}^{m_{b}} u_{i}^{\mu_{i}} \in \mathbb{G}  \tag{25}\\
& r_{1}, \epsilon_{1}, \vec{\tau}_{1}=\text { computeSubCircuitKeys }{ }_{1}(\vec{a}, \vec{\alpha}) \text {; }  \tag{26}\\
& \epsilon_{1}=\left(\epsilon_{1}-\mu_{1}\right) \bmod q \in \mathbb{Z}_{q}  \tag{27}\\
& \text { for } i=2, \ldots, m_{b} \quad\{  \tag{28}\\
& \vec{a}^{\prime}=\vec{a}\left\|r_{i-1}, \quad \vec{\alpha}^{\prime}=\vec{\alpha}\right\| \epsilon_{i-1} ;  \tag{29}\\
& r_{i}, \epsilon_{i}, \vec{\tau}_{i}=\text { computeSubCircuitKeys }{ }_{i}\left(\vec{a}^{\prime}, \vec{\alpha}^{\prime}\right) \text {; }  \tag{30}\\
& \left.\epsilon_{i}=\left(\epsilon_{i}-\mu_{i}\right) \bmod q \in \mathbb{Z}_{q}\right\}  \tag{31}\\
& C_{j}=\prod_{i=1}^{m_{b}} u_{i}^{\tau_{i, j}} \in \mathbb{G} \quad i=\{1, \ldots, b\}  \tag{32}\\
& \phi \stackrel{\$}{\leftarrow} \mathbb{Z}_{p}  \tag{33}\\
& R=g^{r_{m_{b}}} h^{\phi} \in \mathbb{G}  \tag{34}\\
& \text { InputMapping-Setup }(\vec{P}\|R, g ; \vec{v}\| \phi) \rightarrow \mathcal{P}: \vec{S}, \vec{T}  \tag{35}\\
& \mathcal{P} \rightarrow \mathcal{V}: \vec{C}, M, \vec{S}, \vec{T}  \tag{36}\\
& \mathcal{V} \rightarrow \mathcal{P}: \quad x \stackrel{\$ \mathbb{Z}_{q}}{\leftarrow}  \tag{37}\\
& \mathcal{P} \text { compute: }  \tag{38}\\
& a_{i}^{\prime}=a_{i}+x \cdot \alpha_{i} \in \mathbb{Z}_{q}  \tag{39}\\
& i=\{1, \ldots, l\} \\
& y_{i}^{\prime}=\sum_{j}^{b} \tau_{i, j} \cdot x^{j}+\beta_{i} \in \mathbb{Z}_{p} \quad i=\left\{1, \ldots, m_{b}\right\}  \tag{40}\\
& \mathcal{P} \rightarrow \mathcal{V}: \overrightarrow{y^{\prime}}, \overrightarrow{a^{\prime}}  \tag{41}\\
& \mathcal{V} \text { verify final output: }  \tag{42}\\
& \text { if }\left(\prod_{i=1}^{m_{b}} u_{i}^{y_{i}^{\prime}} \stackrel{?}{=} \prod_{j=1}^{b} C_{j}^{x^{j}} \cdot M\right) \text { then continue }  \tag{43}\\
& \text { else reject }  \tag{44}\\
& \text { for } \quad i=1, \ldots, m_{b} \quad\{  \tag{45}\\
& o_{i}=\text { computeSubCircuit }_{i}\left(\vec{a}^{\prime}\right) \in \mathbb{Z}_{q} \tag{46}
\end{align*}
$$

$$
\begin{align*}
& \quad y_{i}=\left(y_{i}^{\prime} \cdot x\right) \bmod q \in \mathbb{Z}_{q}  \tag{47}\\
&  \tag{48}\\
& r_{i}^{\prime}=o_{i}-y_{i} \in \mathbb{Z}_{q}  \tag{49}\\
&  \tag{50}\\
& \left.\vec{a}^{\prime}=\vec{a}^{\prime} \| r^{\prime} \in \mathbb{Z}_{q}\right\}  \tag{51}\\
& \text { if }  \tag{52}\\
& \text { InputMapping-Verify }\left(\vec{P}\left\|R, \vec{S}, \vec{T}, g, h, \vec{a}^{\prime}\right\| r_{m_{b}}^{\prime} ; \vec{a}\left\|r_{m_{b}}, \vec{\alpha}\right\| \epsilon, \vec{v} \| \phi\right) \\
& \\
& \text { then continue } \\
& \text { else } \text { reject }
\end{align*}
$$

Protocol Baseline

Theorem 2. (The Baseline Protocol). The proof system presented in this section has perfect completeness, PHVZK, and CWEE. The proof for Theorem 2 is presented in Appendix B.

### 3.3 Optimization Techniques and Customized Sub-Circuit

We can optimize performance through the use of specially designed sub-circuits. The idea is similar to that of "custom gates" found in SNARKs protocols in principle but very different in implementation. In our case, the goal is to utilize existing algorithms/protocols to handle operations that would otherwise be expensive in our protocol (or any other protocol).

In particular, we can use range proof inside an arithmetic circuit to handle all comparison operations $(>,<, \geq, \leq)$ and decimal point reductions. This is huge in practice because either using a boolean circuit directly or converting to/from a boolean circuit inside an arithmetic circuit is expensive.

For example, to prove $a_{1}>a_{2}$ (or $P_{1}>P_{2}$ ), the prover can do the following:

1. Commit to their difference $C=g^{c} h^{v}$ s.t. $c=a_{1}-a_{2}$ (or compute $C$ from $P_{1}, P_{2}$ using additive homomorphism e.g. $C=P_{1} / P_{2}$ ).
2. Call protocol InputMapping to check $c^{\prime}=a_{1}^{\prime}-a_{2}^{\prime} \in \mathbb{Z}_{q}$ maps $C \in \mathbb{G}$;
3. Use a range-proof protocol to prove $C>0$. If returns true, then we know $a_{1}>a_{2}$.

An example usage is as follows:

```
computeSubCircuit : \(\left(\vec{a}^{\prime} \in \mathbb{Z}^{q}, \vec{C} \in \mathbb{G}\right)\)
    \(c_{1}^{\prime}=a_{1}^{\prime}-a_{2}^{\prime} \in \mathbb{Z}_{q}\)
    if Protocol RangeProof \(\left(C_{1}, 0\right)\)
        \(c_{2}^{\prime}=a_{3}^{\prime}-a_{4}^{\prime} \in \mathbb{Z}_{q}\)
        if Protocol RangeProof \(\left(C_{2}, 0\right)\)
            \(o=\) do something
        else
            \(o=\) do something else
    else
        \(o=\) do something
    if Protocol InputMapping \(\left(C_{1}\left\|C_{2}, c_{1}^{\prime}\right\| c_{2}^{\prime}\right)\) return \(o\)
    else reject
```

Function computeSubCircuit (Customized)
In the computeSubCircuit function defined above, the circuit first tests if $a_{1}^{\prime}>a_{2}^{\prime}$ and then tests if $a_{3}^{\prime}>a_{4}^{\prime}$. Before the function returns $o$, it batch checks the mapping between each $c_{i}^{\prime}$ and $C_{i}$ pair. In practice, all calls to range proof should also be batch verified at the end of the function.

We can bypass the "inactive" part of the circuit (similar to that of suBlonk (ePrint)). For example, if the first range proof returns false (e.g., $a_{1}<a_{2}$ ), then both the prover and the verifier can bypass the two "else" parts of the circuit above. However, it is worth noticing that using a customized sub-circuit bypassing parts of the circuit may leak information about data to attackers, so one must use such a strategy with extreme caution.

We believe combining the arithmetic circuit and range-proof protocols is the most efficient way to run a zero-knowledge test in the real world. What makes our protocol unique in such an approach is that we can batch proof all comparison logics using one range proof while still keeping the main line of the circuit process inside one protocol run. If it were done in SNARK protocols, it would require cutting a large circuit into multiple smaller ones, making both the communication cost and verifier runtime cost linear to the number of smaller circuits (or the number of comparisons in a circuit).

It is also not necessary that all breakers $y_{1}, \ldots, y_{m_{b}}$ have the same $b$ value. For example, if some value $a_{i}$ is taken to its 100th power ( $a_{i}^{\prime 100}$, degree term $m_{p}=100$ ) and will be used as inputs in multiple places of a circuit, then it would be wise to use a breaker to cut its degree to 1 (in linear polynomial form $a_{i}^{100}+X \alpha_{i}$ ) before being used as inputs in other places of a circuit.

### 3.4 Memory Efficiency

The memory consumption cost of protocol Baseline is $O\left(m_{p}\right)$. We can improve the memory consumption cost of our protocol to $O(b)$ or approximately $O\left(m_{p}^{\frac{1}{2}}\right)$.

Instead of computing $\vec{C}$ after all $\vec{\tau}$ are computed s.t. $|\vec{\tau}|=m_{p}$. We ask the prover to compute $\vec{C}$ iteratively. When the prover computes $r_{i}, \epsilon_{i}, \vec{\tau}_{i}$ in each loop for $i=\left\{1, \ldots, m_{b}\right\}$, the prover also updates $\vec{C}$ with each new "row" of coefficients $\vec{\tau}_{i}$. This can be achieved by replacing lines 28-32 with the following lines:

$$
\begin{align*}
& C_{j}=u_{i}^{\tau_{1, j}} \in \mathbb{G}  \tag{53}\\
& \text { for } \quad i=2, \ldots, m_{b} \quad\{  \tag{54}\\
& \quad \vec{a}^{\prime}=\vec{a}\left\|r_{i-1}, \quad \vec{\alpha}^{\prime}=\vec{\alpha}\right\| \epsilon_{i-1} ;  \tag{55}\\
& \quad r_{i}, \epsilon_{i}, \vec{\tau}_{i}=\text { computeSubCircuitKeys }\left(\vec{a}^{\prime}, \vec{\alpha}^{\prime}\right) ;  \tag{56}\\
& \left.\quad \epsilon_{i}=\left(\epsilon_{i}-\mu_{i}\right) \bmod q \in \mathbb{Z}_{q}\right\} \tag{57}
\end{align*}
$$

Because the coefficients of each iteration $\vec{\tau}_{i}$ will get discarded from the memory after each iteration completes, we can therefore achieve a memory cost of $O(b)$. However, we now have to recompute $\vec{\tau}_{i}$ after challenge $x$ is available so that $\vec{y}^{\prime}$ can be computed (in line 42 ).

The obvious caveat is that we can no longer use Pippenger acceleration to compute each $C_{j}$ as we did in protocol Baseline, and we also have to recompute coefficients for each sub-circuit to get $\vec{y}$.

A meet-in-the-middle approach is to break the computation of $\vec{C}$ into $t b$ segments, and in each segment we compute $\frac{1}{t b} m_{p}$ coefficients and update $\vec{C}$ after each segment. By doing so, we can achieve a consumption cost of $O\left(\frac{1}{t b} m_{p}\right)$.

## 4 The Binary-Boost Protocol for Boolean Circuits

For boolean circuits, we can leverage the fact that all input/output values of each gate can only be either 0 or 1 to construct a more efficient protocol-BinaryBoost.

### 4.1 Protocol for boolean circuit validation

A common requirement in proving boolean circuits is to enforce input data $b_{i} \in\{0,1\}$ for $i \in\{1, \ldots, l\}$ (this is not new, but we need this sub-protocol defined to make our main protocol easier to parse),
such relation is defined as:

$$
\begin{equation*}
\left\{\left(\vec{b}^{\prime}, \in \mathbb{Z}_{q}^{l} ; \vec{b}, \vec{\beta} \in \mathbb{Z}_{q}^{l}\right): b_{i}^{\prime}=b_{i}+X \beta_{i} \wedge \forall_{b_{i}} \in[0,1] \wedge \forall_{i} \in[1, . ., l]\right\} \tag{58}
\end{equation*}
$$

In practice, it is useful to decompose $l$ full integer inputs into $l \cdot 32$ bits (assuming we use 32 bits to represent a full integer). If a committed value $b_{i}$ is in $[0,1]$, then its linear polynomial form $b_{i}$ must have the following property:

$$
\begin{equation*}
\left(b_{i}^{\prime} \cdot b_{i}^{\prime}-b_{i}^{\prime}\right)=\delta_{1, i} x+\delta_{2, i} x^{2} \tag{59}
\end{equation*}
$$

Where $\delta_{2, i}=\beta_{i}^{2}$, and $\delta_{1, i}=\beta_{i}$ when $b_{i}$ is 1 and $\delta_{1, i}=-\beta$ when $b_{i}$ is 0 . To prove the correctness for all $b_{i} \in\{0,1\}$, the prover commits to polynomials $D_{1}, D_{2}$ :

$$
D_{1}=u_{1}^{\delta_{1,1}} u_{2}^{\delta_{1,2}} \ldots u_{l}^{\delta_{1, l}} h^{\rho_{1}}, \quad D_{2}=u_{1}^{\delta_{2,1}} u_{2}^{\delta_{2,2}} \ldots u_{l}^{\delta_{2, l}} h^{\rho_{2}}
$$

Where $D_{1}$ commits to coefficients on the $x$ term and $D_{2}$ commits to coefficients on the $x^{2}$ term. The prover can easily join two polynomial commitments into one and sends only one element $D$ to the verifier.

$$
\begin{equation*}
D=\prod_{i=1}^{l} u_{i}^{\delta_{1 i}} \cdot \prod_{i=1}^{l} u_{i+l}^{\delta_{2 i}} \cdot h^{\rho} \in \mathbb{G} \tag{60}
\end{equation*}
$$

When the challenge $k$ is received, the prover sends the evaluation results $y_{1}, y_{2}$ to the verifier, and then engages with the verifier to verify the correctness of $y_{1}, y_{2}$ at point $k$, and checks if the equality below is true:

$$
\begin{equation*}
y_{1} \cdot X+y_{2} \cdot X^{2}=\sum_{j=1}^{l \cdot 32}\left(b_{i}^{\prime} \cdot b_{i}^{\prime}-b_{i}^{\prime}\right) \cdot k^{i} \tag{61}
\end{equation*}
$$

Once we know all linear polynomials map to either 0 or 1 , it is trivial to recompose the linear polynomial form of a full integer input $a_{i}^{\prime}$ from 32 decomposed bits $b_{i, j}^{\prime}$ for $j=\{1, \ldots, 32\}$.

$$
\begin{equation*}
a_{i}^{\prime}=\sum_{j=1}^{32} b_{i, j}^{\prime} \cdot 2^{j} \tag{62}
\end{equation*}
$$

We define the protocol BooleanityTest using two sub-protocols:

$$
\begin{aligned}
& \text { Input: }\left(\vec{b}^{\prime} \in \mathbb{Z}_{q}: \vec{b}, \vec{\beta}^{\prime} \in \mathbb{Z}_{q}\right) \\
& \begin{array}{ll}
\mathcal{P} \text { compute : } & \\
\quad l=\left|\vec{b}^{\prime}\right| \in \mathbb{Z}_{q} & \\
\quad \rho \stackrel{\$}{\leftarrow} \mathbb{Z}_{q} & i=\{1, \ldots, l\} \\
\delta_{1 i}=b_{i} \beta_{i}+b_{i} \beta_{i}-\beta_{i} \in \mathbb{Z}_{q} & i=\{1, \ldots, l\} \\
\delta_{2 i}=\beta_{i}^{2} \in \mathbb{Z}_{q} & \\
D=\prod_{i=1}^{l} u_{i}^{\delta_{1, i}} \cdot \prod_{i=1}^{l} u_{i+l}^{\delta_{2, i}} \cdot h^{\rho} \in \mathbb{G} &
\end{array}
\end{aligned}
$$

Protocol BooleanityTest-Setup
After a challenge $x$ is sent from the verifier to the prover, the protocol moves to the verification stage defined by the following sub-protocol.

```
Input: \(\left(\vec{b}^{\prime} \in \mathbb{Z}_{q} ; \rho, \overrightarrow{\delta_{1}},{\overrightarrow{\delta_{2}}}^{\prime} \in \mathbb{Z}_{q}\right)\)
    \(\mathcal{P}^{\prime}\) s input: \(\left(\overrightarrow{b^{\prime}} ; \rho, \overrightarrow{\delta_{1}}, \overrightarrow{\delta_{2}}{ }^{\prime}\right) ; \mathcal{V}^{\prime}\) s input: \(\left(\vec{b}^{\prime}\right)\)
\(\mathcal{V} \rightarrow \mathcal{P}: \quad k \stackrel{\$ \mathbb{Z}_{q}}{\leftarrow}\)
\(\mathcal{P}\) compute :
    \(y_{1}=\sum_{i=1}^{l} \delta_{1, i} k^{i} \in \mathbb{Z}_{q}, \quad y_{2}=\sum_{i=1}^{l} \delta_{2, i} k^{i} \in \mathbb{Z}_{q}\)
\(\mathcal{P} \rightarrow \mathcal{V}: y_{1}, y_{2}\)
\(\mathcal{P}, \mathcal{V}\) engate to evaluate :
    if PolyCommitEval \(\left(D, y_{1}+y_{2} k^{l}, k ; \overrightarrow{\delta_{1}} \| \overrightarrow{\delta_{2}}, \rho\right)\)
    \(\wedge y_{1} \cdot x+y_{2} \cdot x^{2} \stackrel{?}{=} \sum_{j=1}^{l \cdot 32}\left(b_{i}^{\prime} \cdot b_{i}^{\prime}-b_{i}^{\prime}\right) \cdot k^{i}\)
        return true
    else return false
```

Protocol BooleanityTest-Verify
Theorem 3. (The Booleanity Test Protocol). The proof system presented in this section has perfect completeness, PHVZK, and CWEE. The proof for Theorem 3 is presented in Appendix C.

### 4.2 Batch Validate Subcircuit Rows

In a boolean circuit, we know for a fact that each $r_{i}$ must be either 0 or 1 s.t.

$$
o_{i}-y_{i}=r_{i}+X \epsilon_{i} \quad \text { s.t. } \quad r_{i} \in[0,1]
$$

We can confirm the boolean property of $\vec{r}$ with the booleanityTest protocol. This property implies a dishonest prover can only alter each $y_{i}$ by $\pm 1$ s.t.

$$
y_{i}^{*}=y_{i} \pm 1 ;
$$

The equation above also implies $y_{i}^{* \prime}=y_{i}^{\prime} \pm 1$ as far as our protocol is concerned since $y_{i}^{*}=$ $\left(y_{i}^{* \prime} \cdot x\right) \bmod q$. Leveraging this fact, we can now batch commit each $C_{j}$ (committing coefficients $\tau_{1, j}, \ldots, \tau_{m_{b}, j}$ ) with fewer generators by using a simple binary trick: multiply each row $i$ by a power of $2^{i}$.

Instead of committing $\tau_{1, j}, . ., \tau_{m_{b}, j}$ using $m_{b}$ generators $u_{1}, . ., u_{m_{b}}$, the prover now uses $\frac{m_{b}}{t b}$ generators $u_{1}, . ., u_{t_{b}}$ to commit each $C_{j}$, where $t b$ is the size of each batch $i^{\prime}$. Each $i^{\prime}$ covers $t b$ "rows" (See Figure 2). Let $s t=\left(i^{\prime}-1\right) \cdot t b$ we have:

$$
\begin{array}{ll}
\gamma_{i^{\prime}, j}=\sum_{i=1}^{t b} \tau_{s t+i, j} \cdot 2^{i} & j=\{1, \ldots, b\} \\
C_{j}=\prod_{i^{\prime}=1}^{m_{b} / t b} u_{i^{\prime}}^{\gamma_{i^{\prime}, j}} \in \mathbb{G} & i^{\prime}=\left\{1, \ldots, m_{b} / t b\right\} \tag{64}
\end{array}
$$

This can be visualized in Figure 2. For example, the exponent of generator $u_{1}$ in $C_{j}$ is $\gamma_{1, j}=$ $\sum_{i=1}^{32} \tau_{s t+i, j} \cdot 2^{i}$, and its (batched) blinding key is $\mu_{i^{\prime}}=\sum_{i=1}^{t b} \mu_{s t+i} \cdot 2^{i}$. After applying evaluation
point $x$, the opening of each $u_{i}$ ("row") is the summation of $t b$ values:

$$
\begin{equation*}
\sum_{i=1}^{t b} y_{s t+i}^{\prime} \cdot 2^{i}=\sum_{i^{\prime}=1}^{m_{b} / t b}\left(\gamma_{i^{\prime}, j} \cdot x^{j}+\mu_{i^{\prime}}\right) \tag{65}
\end{equation*}
$$

A dishonest prover cannot convince verifiers $\vec{y}^{\prime *} \neq \vec{y}^{\prime}$ by committing to $\vec{\gamma}^{*} \neq \vec{\gamma}$ s.t. $\vec{r}^{\prime *} \neq \vec{r}^{\prime}$ still passes the booleanity test. For example, let's say $y_{i}^{\prime *}=y_{i}^{\prime}+d_{i}$ s.t. $d_{i} \in\{0, \pm 1\}$ and $\gamma_{i^{\prime}, j}^{*}=\gamma_{i^{\prime}, j}+f_{i^{\prime}}$ where $f_{i^{\prime}} \in\left\{0, \delta_{i}\right\}$. To convince $\vec{r}^{*}$ the dishonest prover needs to find a pair of sets $\vec{d}, \vec{f}$ that satisfies the following equality before committing $\vec{\gamma}^{*}$ :

$$
\begin{equation*}
\sum_{i=1}^{t b} d_{s t+i} \cdot 2^{i}=\sum_{i^{\prime}=1}^{m_{b} / t b} f_{i^{\prime}, j} \cdot x^{j} \tag{66}
\end{equation*}
$$

Which cannot be done without prior knowledge of $x$. Note that although $\vec{y}^{\prime *}$ is provided by the prover after $x$ is known, the prover needs to set $\vec{d}$, which defines committed values $\vec{r}^{*}, \vec{\epsilon}^{*}, \vec{\gamma}^{*}$, before $x$ is available. The binary trick (multiplying each "row" by $2^{i}$ ) guards against the case where a dishonest prover finds a $\vec{y}^{\prime *}$ set s.t. $\sum_{i=1}^{t b} y_{s t+i}^{\prime *}=\sum_{i=1}^{t b} y_{s t+i}^{\prime}$, in which case the dishonest prover would not need to find a set $\vec{f}$ since $\sum_{i=1}^{t b} d_{s t+i}=0$.

To protect our protocol from the overflow attack, the $t b$ value (we default to 32 ) must be smaller than $|q|$, or else equation 19 would not be safe (e.g., a dishonest prover can find a set $\vec{d}$ s.t. $\left(\sum_{i=1}^{t b} d_{s t+i}\right)$ $\bmod q=0$ ).

Figure 2
The verifier can now batch commit $t b$ "rows" by using their "binary sum" $z_{i^{\prime}}$ :

$$
\begin{equation*}
z_{i^{\prime}}=\sum_{i=1}^{t b} y_{s t+i}^{\prime} \cdot 2^{i} \quad \text { for } \quad i^{\prime}=\left\{1, \ldots, m_{b} / t b\right\} \tag{67}
\end{equation*}
$$

The equation 20 is now updated to batch validate $t b$ "rows" at once.

$$
\begin{equation*}
\prod_{j=1}^{b} C_{j}^{x^{j}} \cdot M \stackrel{?}{=} \prod_{i^{\prime}}^{m_{b} / t b} u_{i^{\prime}}^{z_{i^{\prime}}} \tag{68}
\end{equation*}
$$

### 4.3 Leveraging Batched Rows to Shrink Communication Cost

We can leverage the batch validation approach to reduce the communication cost. To do so, the prover now computes and sends $\vec{y} \in \mathbb{Z}_{q}$ to the verifier instead of $\vec{y}^{\prime}$. Since $\left|y_{i}\right| \approx \frac{1}{4}\left|y_{i}^{\prime}\right|$, sending $\vec{y}$ will save approximately $24 \times m_{b}$ bytes in communication cost.

The equation 19 will now be performed by the prover instead of the verifier. From equation 19 , we can infer that $\left(y_{i} \cdot x^{-1}\right) \bmod q=\left(y_{i}^{\prime}\right) \bmod q$. We can therefore conclude that each $y_{i}^{\prime}$ can be represented as:

$$
y_{i}^{\prime}=\left(y_{i} \cdot x^{-1}\right) \bmod q+v_{i} \cdot q
$$

Note that the equation above must not create an overflow in $\mathbb{Z}_{p}$ (e.g. $y_{i}^{\prime}<P$ ). Rearranging the equation above, we get an equation that computes $v_{i}$ :

$$
v_{i}=\left(y_{i}^{\prime}-\left(y_{i}^{\prime}\right) \bmod q\right) / q
$$

Since we want to prove the committed value in batch, we need $v_{i^{\prime}}=\sum_{i=1}^{t b} v_{s t+i} \cdot 2^{i}$ instead, which we can compute directly from:

$$
\begin{equation*}
v_{i^{\prime}}=\left(\sum_{i=1}^{m_{b} / t b}\left(y_{s t+i}^{\prime}-\left(y_{s t+i}^{\prime}\right) \bmod q\right) \cdot 2^{i}\right) / q \tag{69}
\end{equation*}
$$

Since the prover now passes $\vec{y}, \vec{v}$ instead of $\vec{y}^{\prime}$ to the verifier, the verifier needs to compute $z_{i^{\prime}}$ with the following equation instead of equation 67 .

$$
\begin{equation*}
z_{i^{\prime}}=\left(\sum_{i=1}^{m_{b} / t b}\left(y_{s t+i} \cdot x^{-1}\right) \bmod q \cdot 2^{i}\right)+v_{i^{\prime}} \cdot q \tag{70}
\end{equation*}
$$

Lastly, $v_{i^{\prime}}$ must not overflow $p$ because it can be used as an attacking surface:

$$
\begin{equation*}
z_{i^{\prime}}<p \tag{71}
\end{equation*}
$$

We now define the updated protocol for the boolean circuit - BinaryBoost:

$$
\begin{align*}
& \text { Input: }\left(g, h, \vec{u}, \vec{P} \in \mathbb{G}, \vec{a}^{\prime} \in \mathbb{Z}_{q}^{l} ; \vec{a}, \vec{v} \in \mathbb{Z}_{q}\right)  \tag{72}\\
& \mathcal{P}^{\prime} \text { s input: }\left(g, h, \vec{u}, R, \vec{a}^{\prime} ; \vec{a}, \vec{\alpha}\right) \quad \mathcal{V}^{\prime} \text { s input }:\left(g, h, \vec{u}, R, \vec{a}^{\prime}\right)  \tag{73}\\
& \mathcal{P} \text { compute: }  \tag{74}\\
& \mu_{i} \stackrel{\$}{\leftarrow} \mathbb{Z}_{m}, \quad i=\left\{1, \ldots, m_{b}\right\}  \tag{75}\\
& \mu_{i^{\prime}}=\sum_{i=1}^{t b} \mu_{\left(i^{\prime}-1\right) t b+i} \cdot 2^{i} \quad i^{\prime}=\left\{1, \ldots, m_{b} / t b\right\}  \tag{76}\\
& M=\prod_{i=1}^{m_{b}} u_{i}^{\mu_{i^{\prime}}} \in \mathbb{G}  \tag{77}\\
& b_{i} \in \mathbb{Z}_{2}, \quad \beta_{i} \stackrel{\$}{\leftarrow} \mathbb{Z}_{q} \quad i=\{1, \ldots, l \cdot 32\}  \tag{78}\\
& r_{1}, \epsilon_{1}, \vec{\tau}_{1}=\text { computeSubCircuitKeys }(\vec{b}, \vec{\beta}) \text {; }  \tag{79}\\
& \epsilon_{1}=\left(\epsilon_{1}-\mu_{1}\right) \bmod q \in \mathbb{Z}_{q}  \tag{80}\\
& \text { for } \quad i=2, \ldots, m_{b} \quad\{  \tag{81}\\
& \vec{b}=\vec{b}\left\|r_{i-1}, \quad \vec{\beta}=\vec{\beta}\right\| \epsilon_{i-1} ; \tag{82}
\end{align*}
$$

$$
\begin{align*}
& r_{i}, \epsilon_{i}, \vec{\tau}_{i}=\text { computeSubCircuitKeys }_{i}(\vec{b}, \vec{\beta}) ;  \tag{83}\\
& \left.\epsilon_{i}=\left(\epsilon_{i}-\mu_{i}\right) \bmod q \in \mathbb{Z}_{q} \quad\right\}  \tag{84}\\
& \phi \stackrel{\$}{\leftarrow} \mathbb{Z}_{p}  \tag{85}\\
& R=g^{r} h^{\phi} \in \mathbb{G}  \tag{86}\\
& \text { InputMapping-Setup }(\vec{P}\|R, g ; \vec{v}\| \phi) \rightarrow P: \vec{S}, \vec{T}  \tag{87}\\
& \text { for } i^{\prime}=1, \ldots, m_{b} / t b \quad\{  \tag{88}\\
& s t=\left(i^{\prime}-1\right) \cdot t b  \tag{89}\\
& \gamma_{i^{\prime}, j}=\sum_{i=1}^{t b} \tau_{s t+i, j} \cdot 2^{i-1} \in Z_{p} \quad j=\{1, \ldots, b\}  \tag{90}\\
& \left.C_{i^{\prime}, j}=u_{i^{\prime}}^{\gamma_{i^{\prime}, j}} \in \mathbb{G} \quad j=\{1, \ldots, b\}\right\}  \tag{91}\\
& C_{j}=\prod_{i^{\prime}=1}^{m_{b} / t b} C_{i^{\prime}, j} \in \mathbb{G} \quad i^{\prime}=\left\{1, \ldots, m_{b} / t b\right\}  \tag{92}\\
& \text { Booleanity-Setup }\left(\vec{b}|\mid \vec{r}, \vec{\beta} \| \vec{\epsilon}) \rightarrow \mathcal{P}: D, \rho, \overrightarrow{\delta_{1}}, \overrightarrow{\delta_{2}}\right.  \tag{93}\\
& \mathcal{P} \rightarrow \mathcal{V}: \vec{C}, M, D, \vec{S}, \vec{T}  \tag{94}\\
& \mathcal{V} \rightarrow \mathcal{P}: x \stackrel{\$}{\leftarrow} \mathbb{Z}_{q}  \tag{95}\\
& \mathcal{P} \text { compute : }  \tag{96}\\
& a_{i}^{\prime}=a_{i}+x \cdot \alpha_{i} \in \mathbb{Z}_{q}  \tag{97}\\
& i=\{1, \ldots, l\} \\
& y_{i}^{\prime}=\sum_{j}^{b} \tau_{i, j} \cdot x^{j}+\beta_{i} \in \mathbb{Z}_{p}  \tag{98}\\
& i=\left\{1, \ldots, m_{b}\right\} \\
& y_{i}=\left(y_{i}^{\prime} \cdot x\right) \bmod q \in \mathbb{Z}_{q}  \tag{99}\\
& i=\left\{1, \ldots, m_{b}\right\} \\
& \text { for } \quad i^{\prime}=1, \ldots, m_{b} / t b \quad\{  \tag{100}\\
& s t=\left(i^{\prime}-1\right) \cdot t b  \tag{101}\\
& \left.v_{i^{\prime}}=\left(\sum_{i=1}^{m_{b} / t b}\left(y_{s t+i}^{\prime}-\left(y_{s t+i}^{\prime}\right) \bmod q\right) \cdot 2^{i}\right) / q\right\}  \tag{102}\\
& \mathcal{P} \rightarrow \mathcal{V}: \vec{a}^{\prime}, \vec{b}^{\prime}, \vec{y}, \vec{v}  \tag{103}\\
& \mathcal{V} \text { verify final output: }  \tag{104}\\
& \text { for } i^{\prime}=1, \ldots, m_{b} / t b \quad\{  \tag{105}\\
& s t=\left(i^{\prime}-1\right) \cdot t b  \tag{106}\\
& \left.z_{i^{\prime}}=\left(\sum_{i=1}^{m_{b} / t b}\left(y_{i} \cdot x^{-1}\right) \bmod q \cdot 2^{i}\right)+v_{i^{\prime}} \cdot q\right\}  \tag{107}\\
& \text { if }\left(\prod_{i^{\prime}}^{m_{b} / t b} u_{i^{\prime}}^{z_{i^{\prime}}} \stackrel{?}{=} \prod_{j=1}^{b} C_{j}^{x^{j}} \cdot M\right) \text { then continue }  \tag{108}\\
& \text { else reject }  \tag{109}\\
& \text { for } \quad i=1, \ldots, m_{b} \quad\{  \tag{110}\\
& o_{i}=\text { computeSubCircuit }_{i}\left(\vec{b}^{\prime}\right) \in \mathbb{Z}_{q}  \tag{111}\\
& r_{i}^{\prime}=o_{i}-y_{i} \in \mathbb{Z}_{q}  \tag{112}\\
& \left.\vec{a}^{\prime}=\vec{a}^{\prime}| | r_{i}^{\prime} \in \mathbb{Z}_{q} \quad\right\} \tag{113}
\end{align*}
$$

# if InputMapping-Verify $\left(\vec{P}\left\|R, \vec{S}, \vec{T}, g, h, \vec{a}^{\prime}\right\| r^{\prime} ; \vec{a}\|r, \vec{\alpha}\| \epsilon, \vec{v} \| \phi\right)$ <br> $\wedge$ Booleanity-Verify $\left(\overrightarrow{b^{\prime}} ; \rho, \overrightarrow{\delta_{1}}, \overrightarrow{\delta_{2}}{ }^{\prime}\right)$ <br> $\wedge a_{i}^{\prime}=\sum_{j=1}^{32} b_{i, j}^{\prime} \cdot 2^{j} \quad i=\{1, \ldots, l\}$ <br> then accept <br> else reject 

> Protocol BinaryBoost

Theorem 4. (Input Transformation Protocol with Binary Boost). The proof system presented in this section has perfect completeness, PHVZK, and CWEE. The proof for Theorem 4 is presented in Appendix $D$.

### 4.4 The Asymptotic Cost

The prover runtime of our baseline protocol (Protocol Baseline) is dominated by $O\left(\iota m_{p} \log m_{p}+l\right)$ field operations and $O\left(m_{p}+l\right)$ group exponentiations. The value of $\iota$ depends on how the circuit is wired. For the sequential multiplication test case that we benchmark against, $\iota=m_{b} \sum_{i=1}^{\log b} i$; the verifier runtime is dominated by $O\left(n+m_{p}^{\frac{1}{2}}+l\right)$ field operations and $O\left(m^{\frac{1}{2}}+l\right)$ group exponentiations; and the communication cost is dominated by $O\left(m^{\frac{1}{2}}+l\right)$ group elements and $O\left(m_{p}{ }^{\frac{1}{2}}+l\right)$ field elements.

The binary-boost protocol improves the asymptotic group exponentiation cost of the prover runtime to $O\left(\iota m_{p} \log m_{p}+l\right)$ field operations and $O\left(\frac{1}{t b} m_{p}+m_{\left.p^{\frac{1}{2}}+l\right) \text { group exponentiations, where } t b}^{t b}\right.$ is set to 32 . While the asymptotic cost of the communication cost is the same, the concrete communication cost is reduced by approximately $20 \%$, which is achieved by sending the smaller $\vec{y}$ instead of the larger $\vec{y}^{\prime}$.

Our protocol is also natively faster than its asymptotic cost indicates because group exp. operations of our protocol operate mostly in $q$ (61-bits or 93 -bits), which is significantly smaller than $p$ in ECC. This gives us approximately $2-2.5 \mathrm{X}$ performance gains when performing multi-exponentiations. If the circuit is shallow (e.g., for a circuit with the final output $r=\sum_{i=1}^{n} a \cdot b$ where $a, b$ are circuit inputs and $n$ stands for the total number of gates in a circuit., we have $m_{p}=1$ ), the prover work would be dominated by inexpensive $O(n / 2)$ field addition operations.

## 5 Performance Comparison

We compare the performance of our protocol to some of the most popular transparent zero-knowledge protocols for which open source codes are available. Our test runs are performed on an $\operatorname{Intel}(\mathrm{R})$ Core(TM) i $7-9750 \mathrm{H}$ CPU @ 2.60 Ghz . Only one core is being utilized, and all tests are run on a single CPU thread. Our test code is a non-interactive implementation (using Fiat-Shamir heuristic) of Protocol 3 (not the memory-efficient option mentioned in section 4.5 because we want to leverage the Pippenger acceleration to get the most optimal runtime result).

The baseline protocols we picked are Hyrax, Ligero, Aurora, and Spartan-NIZK. These protocols were chosen because they are the most representative of popular zero-knowledge protocols and can be verified with open source code. In particular, Aurora outperforms STARK in all key parameters (prover runtime, verifier runtime, proof size), and the NIZK version of Spartan offers the most balanced performance across all performance parameters. We also do not consider SNARKs even though most of them can be made transparent by switching to a transparent polynomial commitment scheme, as they are hardly efficient after the switch.

We didn't consider transparent protocols that depends on circuit depth such as GKR-based protocols simply because they can't handle $2^{20}$ sequential multiplications. We also don't consider voice-based protocols, as they are only optimized for prover work and generally require one round of interaction. Other popular transparent schemes such as Bulletproofs are also not being considered because they have a linear verifier runtime and therefore are not succinct.

Spartan++ and Lakonia are two more recent developments that we didn't include in our benchmark testing but are worth mentioning. The improvement of Spartan++ over SpartanNIZK is marginal, and the performance of Lakonia is largely comparable to that of SpartanNIZK (the prover performance of SpartanNIZK is approximately 3 X more efficient, and the verifier performance is 1.5 X more efficient than that of Lakonia, while Lakonia is 4X more efficient than SpartanNIZK in proof size).

We set the number of inputs to our protocol to 30 integers, and each input is represented by 32 bits so that there are a total of $30 \cdot 32=960$ input bits to the circuit. The circuit we use performs $n$ sequential multiplications on $l$ inputs, so we have $m_{p}=n$, likely much closer to the worst-case scenario of our protocol than the test cases of other protocols that we are comparing against. If we run a shallow circuit where $m_{p}$ number is small, the benchmark result will likely be significantly better. For example, if we have a circuit where $m_{p}=1$, then its prover runtime performance will be comparable to the verifier runtime of our protocol.

We picked two NTT prime numbers for our benchmark testing, one for the interactive case and another for the non-interactive case.

The NTT prime number we picked for the non-interactive case is $q=1945555039024054273$, a 61-bit number that implies the soundness error will be at most $2^{-51}$ for a circuit with $2^{20}$ sequential multiplications where $m_{p}=n$, which is more than enough in real-life applications where one interaction is allowed.

For the non-interactive case (through using the Fiat-Shamir heuristic), the prime number for our integer field is 5241902353849032101525979137 , a 93 -bit prime that implies the soundness error will be at most $2^{-83}$ for a deep circuit with $2^{20}$ sequential multiplications where $m_{p}=n$.

To maximize the advantage of the NTT algorithm in computing sequential multiplications, we process each segment $\left(1, \ldots, m_{b}\right)$ of our circuit in binary tree format to represent layers we would see in the real world. Such tuning will likely not be required in real-world applications since large circuits are usually layered and multiplication gates should be somewhat balanced out across layers already.

For group operations, we use the curve25519-dalek implementation, and Pippenger acceleration is applied to all sum-of-product group operations. For field operations, we use the Montgomery algorithm to accelerate modular multiplications on the prime $q$. One challenge we had was building an efficient 128 -bit multiplication function. Unlike the 64 -bit (provided by the CPU) and 256 -bit (heavily optimized with assembly code for various crypto computations such as ECC) multiplications, we couldn't find any efficient multiplication function for 128-bit multiplication so we had to do it ourselves. We did the best we could to optimize our 128-bit multiplication without using assembly code, so there are still rooms for improvement if assembly code is used.

We set $m_{b}=b$ to get a more balanced result. Alternatively, one can set $m_{b}>b$ to get better prover runtime performance in exchange for more expensive communication costs. This is because doing so will 1) evade expensive NTT computations at high degrees and 2) better leverage Pippenger acceleration in computing $\vec{C}$, which will continuously improve group exponentiation operations before peaking out at around $m_{b}=2^{14}$.

Table 1 shows that as the circuit size gets bigger, the prover performance of our protocol is becoming increasingly more efficient than all of our baseline protocols. This is because the cost associated with the number of inputs to the circuit is fixed ( 960 bits), and its impact relative to the cost of evaluating the whole circuit gradually declines as the circuit size gets bigger (the same effect will also apply to verifier runtime and proof size benchmarks below). To the best of our knowledge, our protocol offers the best prover performance in the non-interactive setting in the literature.

| Circuit size | $2^{10}$ | $2^{12}$ | $2^{14}$ | $2^{16}$ | $2^{18}$ | $2^{20}$ |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| Hyrax | 1 | 2.8 | 9 | 36 | 117 | 486 |
| Ligero | 0.1 | 0.4 | 1.6 | 4 | 17 | 69 |
| Aurora | 0.5 | 1.6 | 6.5 | 27 | 116 | 485 |
| SpartanNIZK | 0.02 | 0.05 | 0.16 | 0.6 | 1.7 | 6.2 |
| Baseline(61 bit) | 0.008 | 0.02 | 0.05 | 0.18 | 0.7 | 2.5 |
| Baseline(93 bit) | 0.06 | 0.12 | 0.3 | 0.6 | 1.7 | 4.9 |
| B-Boost(61 bit) | 0.01 | 0.01 | 0.02 | 0.06 | 0.2 | 0.8 |

Table 1. Prover performance comparison (seconds)

| Circuit size | $2^{10}$ | $2^{12}$ | $2^{14}$ | $2^{16}$ | $2^{18}$ | $2^{20}$ |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| Hyrax | 14 | 17 | 21 | 28 | 38 | 58 |
| Ligero | 546 | 1,076 | 2,100 | 5,788 | 10,527 | 19,828 |
| Aurora | 477 | 610 | 810 | 1,069 | 1,315 | 1,603 |
| SpartanNIZK | 9 | 12 | 15 | 21 | 30 | 48 |
| Baseline(61bit) | 16 | 17 | 21 | 27 | 39 | 65 |
| Baseline(93bit) | 20 | 22 | 26 | 33 | 48 | 77 |
| B-Boost(61bit) | 15 | 16 | 19 | 24 | 34 | 55 |

Table 2. Proof size comparison (kilobytes)

Table 2 shows that the communication cost of our protocol dominates that of Ligero and Aurora, while largely comparable to SpartanNIZK and Hyrax.

| Circuit size | $2^{10}$ | $2^{12}$ | $2^{14}$ | $2^{16}$ | $2^{18}$ | $2^{20}$ |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| Hyrax | 206 | 253 | 331 | 594 | 1.6 s | 8.1 s |
| Ligero | 50 | 179 | 700 | 2 s | 7.5 s | 33 s |
| Aurora | 192 | 590 | 2 s | 7.2 s | 29.8 s | 118 s |
| SpartanNIZK | 7 | 11 | 17 | 36 | 103 | 387 |
| Baseline(61bit) | 2 | 2 | 3 | 4 | 7 | 16 |
| Baseline(93bit) | 5 | 5 | 6 | 8 | 15 | 37 |
| B-Boost(61bit) | 2 | 2 | 3 | 4 | 6 | 17 |

Table 3. Verifier performance comparison (milliseconds)

Table 3 demonstrates that our protocol achieves a significant improvement of over one order of magnitude in verifier runtime compared to other protocols.

It is worth noticing that input transformation costs can be shared across multiple circuits if the inputs are reused in other circuit verifications. Only the "base" case is the pure circuit cost that cannot be shared between circuits; this may lead to further reductions in communication costs in the real world.

## 6 Conclusion

We believe the best way to run our protocol in the real world is to use protocol Baseline to run an arithmetic circuit and embed an existing range-proof protocol to perform comparison operations as explained in section 3.3. We also present an ultra-efficient boolean circuit protocol BinaryBoost just in case a boolean circuit is necessary.

## Appendix

## A. Proof for Theorem One

Proof. Perfect completeness follows from the fact that protocol InputMapping Validation is trivially complete.

To prove PHVZK for relation 4. we define a simulator $\mathcal{S}_{\text {input }}$. To start, simulator $\mathcal{S}_{\text {input }}$ randomly generates $\vec{S}, \vec{T}$ and sends them to the verifier. After receiving challenge $x$ from the verifier, the simulator randomly generates and sends $\vec{a}^{\prime}, \vec{e}$ according to the protocol specification (s.t. $\left.\left(e_{i}\right) \bmod q=0\right)$ and sends them to the verifier.

When challenge $k$ is received, the simulator $\mathcal{S}_{\text {input }}$ calls simulator $\mathcal{S}_{\text {dlog }}$ to generate transcript $\operatorname{tr}_{v_{t}}$ and send it to the verifier.

The verifier then follows the protocol to compute $P K_{v_{t}}$ using transcripts $\vec{S}, \vec{T}, \vec{a}^{\prime}, \vec{e}$ and calls the Verify $D L$ function to test it against the input transcript $t r_{v_{t}}$, We already know for a fact that simulator $\mathcal{S}_{\text {dlog }}$ can extract witnesses from any discrete-log relation and that simulators $\mathcal{S}_{\text {input }}$ and $\mathcal{S}_{\text {dlog }}$ choose all proof elements and challenges according to the randomness supplied by the adversary from their respective domains or compute them directly as described in the protocol. Since all elements in proof transcripts are either independently randomly distributed or their relationship is fully defined by the verification equations, we can conclude that protocol InputMapping is PHVZK.

To prove CWEE, we construct an extractor $\mathcal{X}$ that also uses extractor $\mathcal{X}_{d l o g}$ to extract witnesses from proof of knowledge transcripts. To start, the extractor $\mathcal{X}$ interacts with the prover and receives $\vec{S}, \vec{T}$ from the prover. The extractor $\mathcal{X}$ then generates a challenge $x_{1}$ and forwards it to the prover. After receiving $\vec{e}_{1}, \vec{a}_{1}^{\prime}$, the extractor rewinds and repeats this step with another challenge $x_{2}$ to retrieve $\vec{e}_{2}, \vec{a}_{2}^{\prime}$.

After receiving transcripts $\vec{S}, \vec{T}$ and transformed inputs $\vec{a}^{\prime}$ from the prover, the extractor generates $k_{1}$ and then follows the protocol to get $\operatorname{tr}_{v_{t 1}}$ (from prover), $P K_{v_{t 1}}$. The extractor $\mathcal{X}$ then calls the extractor $\mathcal{X}_{d l o g}$ to retrieve $v_{t 1}$ from generator $h^{x} /\left(g^{x^{2}} u\right)$. The extractor then rewinds and repeats this step $l$ times to retrieve $v_{t 2}, \ldots, v_{t l+1}$. Through interpolation, the extractor retrieves witnesses $v_{i}$ for all $i$ in $\{1, \ldots, l\}$. Since we know for a fact that $e_{i}$ cannot alter $a_{i}$ and committed values $\vec{P}, \vec{S}, \vec{T}$ all applied to different powers of $x, a$ cannot be altered except for a negligible probability or we find a non-trivial relationship between generators $g, h, u$.

Using any two different challenges $x_{i}, x_{i+1}$ we mentioned earlier, the extractor gets $\vec{a}_{1}^{\prime}$ and $\vec{a}_{2}^{\prime}$ from the prover, which we can trivially retrieve $\vec{\alpha}$ for all $i=\{1, \ldots, l\}$ using the equation below.

$$
\begin{equation*}
a_{1_{i}}^{\prime}-a_{2_{i}}^{\prime}=\alpha_{i}\left(x_{1}-x_{2}\right) \tag{119}
\end{equation*}
$$

With $\vec{a}, \vec{\alpha}$ extracted, we can also extract $\omega$ from equation 7. Plugging witnesses $\vec{a}, \vec{\alpha}, \vec{v}, \vec{\omega}$ to generators $g, h, u$ we can re-write the left and right sides of equation 9 to:

$$
\begin{equation*}
\left(h^{x} /\left(g^{x^{2}} u\right)\right)^{\sum_{i}^{l} v_{i} \cdot k_{i}}=P K_{v_{t}}=\prod_{i=1}^{l}\left(\frac{g^{a_{i} \cdot x} h^{v_{i} \cdot x}}{g^{a_{i}^{\prime} \cdot x-e_{i}+\left(v_{i}-\alpha_{i}\right) \cdot x^{2}+\omega_{i} \cdot q} \cdot u^{v_{i}}}\right)^{k^{i}} \in \mathbb{G} \tag{120}
\end{equation*}
$$

The equality above must be true for a computationally bounded prover, or we find a non-trivial relationship between generators $g, h, u$.

## B. Proof for Theorem Two

Proof. Perfect completeness follows from the fact that the protocol Baseline is trivially complete.
To prove PHVZK for relation 1, we define a simulator $\mathcal{S}$. Simulator $\mathcal{S}$ calls on simulators $\mathcal{S}_{\text {input }}$ defined earlier to generate transcripts and simulate interactions in the InputMapping sub-protocol used in our baseline protocol.

We have already showed $\mathcal{S}_{\text {input }}$ can simulate all interactions needed in sub-protocol InputMapping, which includes generating transcripts $\vec{S}, \vec{T}$. We now show how $\mathcal{S}$ generates the rest of the transcripts according to the randomness supplied by the adversary from their respective domains or computes them directly as described in the protocol.

Simulator $\mathcal{S}$ randomly generates committed transcripts $\vec{C}, M$ and sends them to the verifier. After receiving challenge $x$ from the verifier, the prover rewinds and regenerates $M$ with a randomly generated $\vec{y}^{\prime}$ s.t. the updated $M^{*}$ is:

$$
\begin{equation*}
M^{*}=\frac{\prod_{i}^{m_{b}} u^{y_{i}^{\prime}}}{\prod_{j=1}^{b} C_{j}^{x^{j}}} \in \mathbb{G} \tag{121}
\end{equation*}
$$

The simulator then randomly generates $\vec{a}$ before sending them to the verifier. Since all elements in proof transcripts are either independently randomly distributed or their relationship is fully defined by the verification equations, we can conclude that the protocol Baseline is PHVZK.

To prove CWEE, we define extractor $\mathcal{X}$ that calls on extractors $\mathcal{X}_{\text {input }}$ defined earlier to extract witnesses for the two sub-protocols used in the protocol Baseline.

We already know $\mathcal{X}$ can extract $\vec{a}, \vec{\alpha}, \vec{v}$ and $\vec{r}, \vec{\epsilon}$ using extractor $\mathcal{X}_{\text {input }}$ from committed transcripts $\vec{S}, \vec{T}$. Using input witnesses, we can use the function computeSubCircuitKeys to compute circuit witnesses $\vec{r}, \vec{\epsilon}, \vec{\tau}$.

Next, we verify the validity of circuit witnesses $\vec{r}, \vec{\epsilon}, \vec{\tau}$ by checking if we can extract the same circuit witnesses from commitments $\vec{C}, M$. The extractor $\mathcal{X}$ interacts with the prover and receives $\vec{C}, M$ from the prover. The extractor $\mathcal{X}$ then generates $b+1$ challenges $\vec{x}$ and forwards them to the prover. After receiving circuit inputs $\vec{a}^{\prime}$ and circuit evaluation transcripts $\vec{y}_{1}^{\prime}$ computed from the first challenge $x$, the extractor rewinds and repeats this step $b$ times to generate challenges $x_{2}, \ldots, x_{b+1}$ and retrieve witnesses $\vec{y}_{2}^{\prime}, \ldots, \vec{y}_{b+1}^{\prime}$ from interpolation. The retrieved witnesses must pass equality 17 for a computationally bounded prover, or else we find a non-trivial discrete log relationship between generators $\vec{u}$.

We then apply polynomial interpolation to retrieve circuit witnesses $\vec{\tau}_{1}, \ldots, \vec{\tau}_{m_{b}}$ and $\vec{\mu}$ from $\vec{y}_{1}^{\prime}, \ldots, \vec{y}_{b+1}^{\prime}$ The extractor then follows the protocol to compute $\vec{r}^{\prime}$, and then trivially extracts $\vec{r}, \vec{\epsilon}$ with any pair of challenge $x_{1}, x_{2}$.

Finally, we check if the extracted circuit witnesses $\vec{r}, \vec{\epsilon}, \vec{\tau}$ match that computed from input witnesses $\vec{a}, \vec{\alpha}$ using computeSubCircuitKeys functions, which must be true for a computationally bounded prover except with negligible probability s.t. the dishonest prover made a correct guess on challenge $x$ or else we find a non-trivial discrete log relationship between generators $g, h$ (for input witnesses) and/or between generators $\vec{u}$ (for circuit witnesses).

## C. Proof for Theorem Three

Proof. Perfect completeness follows from the fact that the protocol BooleanityTest is trivially complete.

To prove PHVZK for relation 58, we define a simulator $\mathcal{S}_{b-t e s t}$. To start, $\mathcal{S}_{b-\text { test }}$ randomly generates a group element $D$, which represents the committed polynomial. After challenge $x$ is received from the verifier,
$\mathcal{S}_{b-\text { test }}$ uses a simulator $\mathcal{S}_{p}$ to simulate proof transcripts needed for polynomial commitment evaluation, which we know exist for a fact [9].

The simulators $\mathcal{S}_{b-\text { test }}$ and $\mathcal{S}_{p}$ choose all proof elements and challenges according to the randomness supplied by the adversary from their respective domains or compute them directly as described in the protocol. Since all elements in proof transcripts are either independently randomly distributed or their relationship is fully defined by the verification equations, we can conclude that protocol BooleanityTest is PHVZK.

To prove this protocol has CWEE, we first define an extractor $\mathcal{X}_{b-\text { test }}$ for Protocol Booleanity that extracts witnesses $\vec{b}, \vec{\beta}$.

The extractor first receives a vector commitment $D$ from the prover. The extractor then generates $2 l+1$ challenges $k_{1}, \ldots, k_{2 l+1}$ and retrieves $2 l+1 \overrightarrow{y_{1}}$ and $2 l+1 \overrightarrow{y_{2}}$ through repeated rewinding. The extractor $\mathcal{X}_{b-\text { test }}$ then calls on an extractor for the polynomial commitment evaluation protocol (which we know exists for a fact [9]) to extract witnesses $\overrightarrow{\delta_{1}}, \overrightarrow{\delta_{2}}, \phi$ or else we find a non-trivial relationship between elements in $\vec{u}, h$.

It is trivial to extract witnesses $b_{i}, \beta_{i}$ from $\delta_{1} . \delta_{2}$ for $i=\{1, \ldots, l\}$ s.t. they must pass the equality test defined in equation 61 except with negligible probability of a dishonest prover making the right guess on $x$

## D. Proof for Theorem Four

Proof. Perfect completeness follows from the fact that the protocol Binary Boost is trivially complete.
To prove PHVZK for relation 1, we define a simulator $\mathcal{S}$. Simulator $\mathcal{S}$ calls on simulators $\mathcal{S}_{\text {input }}$ and $\mathcal{S}_{b-t e s t}$ defined earlier to generate transcripts and simulate interactions in the two sub-protocols used in the Binary Boost protocol.

We have already shown $\mathcal{S}_{\text {input }}$ can simulate all interactions needed in sub-protocol Input-Mapping, which includes generating transcripts $\vec{S}, \vec{T}$. We have also shown that $\mathcal{S}_{b-t e s t}$ can simulate all interactions needed in the sub-protocol Booleanity Test, which covers generating transcripts $D$. We now show $\mathcal{S}$ generates the rest of the transcripts according to the randomness supplied by the adversary from their respective domains or computes them directly as described in the protocol.

Simulator $\mathcal{S}$ randomly generates committed transcripts $\vec{C}, D, M$ and sends them to the verifier. After receiving challenge $x$ from the verifier, the prover rewinds and regenerates $M^{*}$ with a randomly generated $\overrightarrow{z_{i^{\prime}}}$ s.t. $M^{*}$ is:

$$
\begin{equation*}
M^{*}=\frac{\prod_{i^{\prime}}^{m_{b} / t b} u^{z_{i^{\prime}}}}{\prod_{j=1}^{b} C_{j}^{x^{j}}} \in \mathbb{G} \tag{122}
\end{equation*}
$$

The simulator then reversely computes $\vec{y}, \vec{v}$ from generated $\overrightarrow{z_{i^{\prime}}}$ and randomly generates $\vec{a}^{\prime}$ and $\vec{b}^{\prime}$ according to protocol specification before sending them to the verifier. Since all elements in proof transcripts are either independently randomly distributed or their relationship is fully defined by the verification equations, we can conclude that the protocol Binary Boost is PHVZK.

To prove CWEE, we define an extractor $\mathcal{X}$ that calls on extractors $\mathcal{X}_{\text {input }}$ and $\mathcal{X}_{b-\text { test }}$ defined earlier to extract witnesses for the two sub-protocols used in the BinaryBoost protocol.

We already know we can extract input witnesses $\vec{a}, \vec{\alpha}, v$ and $\vec{r}, \vec{\epsilon}$ using extractor $\mathcal{X}_{\text {input }}$ from $\vec{a}^{\prime}, \vec{r}^{\prime}$ and committed transcripts $\vec{S}, \vec{T}$. We also know that we can extract boolean witnesses $\vec{b}, \vec{\beta}$ from $\vec{b}^{\prime}$ and committed value $D$ using extractor $\mathcal{X}_{b-t e s t}$. Since we can extract witnesses for $\vec{a}^{\prime}$ and that of their boolean constituents $\vec{a}^{\prime}$, equality (line TBD-115 of the protocol) 62 must be true for a computationally bounded prover except for the negligible chance of making the right guess on $x$.

Using these boolean witnesses, we can use the function computeSubCircuitKeys to compute circuit witnesses $\vec{r}, \vec{\epsilon}, \vec{\tau}$. We validate the CWEE of the protocol by testing if the witnesses computed from input witnesses match those committed by the prover using commitments $\vec{C}, M$.

The extractor $\mathcal{X}$ interacts with the prover and receives $\vec{C}, M, D$ from the prover, then generates $b+1$ challenges $\vec{x}$ and forwards them to the prover. After receiving circuit inputs $\vec{a}^{\prime}, \vec{b}^{\prime}$ and circuit evaluation transcripts $\vec{y}_{1}, \vec{v}_{1}$ computed from the first challenge $x$, the extractor rewinds and repeats this step $b$ times to retrieve $\vec{y}_{2}, \ldots, \vec{y}_{b+1}, \vec{v}_{2}, \ldots, \vec{v}_{b+1}$. We then apply polynomial interpolation to retrieve circuit witnesses $\vec{\tau}$ and $\vec{\mu}$ from $\vec{y}_{1}, \ldots, \vec{y}_{b+1}$, and retrieve batched circuit witnesses $\vec{\gamma}$ and $\vec{\mu}^{\prime}$ from $\vec{y}_{1}, \ldots, \vec{y}_{b+1}$ and $\vec{v}_{1}, \ldots, \vec{v}_{b+1}$. Where each element $\gamma_{i^{\prime}, j}$ in $\vec{\gamma}$ is the sum of the products of $t b$ elements in $\vec{\tau}_{j}$ as defined in equation 63 .

Apply any challenge $x$ to each $\gamma_{i^{\prime}, j}$ and masking key $\mu_{i}$ we get the exponents of the left hand side generators of equality 68 , which are exponents of $\vec{C}(\vec{\tau})$ multiplied by powers of $x$ plus exponents of $M(\vec{\mu})$. Apply any challenge $x$ to $\vec{y}$ and multiply each element in $\vec{v}$ by $q$, we get the exponents of the right-hand side of equality $68 \vec{u}\left(z_{i^{\prime}}\right)$ (defined in equation 70 :

$$
\sum_{j=1}^{b} \gamma_{i^{\prime}, j} x^{j}+\mu=\left(\sum_{i=1}^{m_{b} / t b}\left(y_{i} \cdot x^{-1}\right) \bmod q \cdot 2^{i}\right)+v_{i^{\prime}} \cdot q
$$

The equality above must be true for each $i^{\prime}=\left\{1, \ldots, m_{b} / t b\right\}$ unless we find a non-trivial discrete log relation between generators $\vec{u}$. After checking that the that the exponents of both sides do not overflow $p$ and $t b<|q|$, we multiply both sides by $x$ and then apply $\bmod q$ to both sides we get:

$$
\begin{equation*}
\sum_{j=1}^{b} \gamma_{i^{\prime}, j} x^{j+1}+\mu_{i^{\prime}} x=\sum_{i=1}^{t b} y_{i} \cdot 2^{i} \in \mathbb{Z}_{q} \tag{123}
\end{equation*}
$$

We know that $y_{i}=\sum_{j=1}^{b} \tau_{i, j} x^{i+1}+\mu_{i} x \in \mathbb{Z}_{q}$ by protocol definition, so we can replace $y_{i}$ by using in $\vec{\tau}, \overrightarrow{m u}$ received from the prover.

$$
\begin{equation*}
\sum_{j=1}^{b} \gamma_{i^{\prime}, j} x^{j+1}+\mu_{i^{\prime}} x=\sum_{i=1}^{t b}\left(\sum_{j=1}^{b} \tau_{i, j} x^{i+1}+\mu_{i} x\right) \cdot 2^{i} \in \mathbb{Z}_{q} \tag{124}
\end{equation*}
$$

Assuming the dishonest prover can only alter $y_{i}^{*}$ by $y_{i} \pm 1$, the right hand side of the equality above can only be altered to $\sum_{i=1}^{t b}\left(\sum_{j=1}^{b} \tau_{i, j} x^{i+1}+\mu_{i} x\right) \cdot 2^{i} \pm 2^{i}$. Since $\pm 2^{i}$ is in the constant term and no combination of $\sum_{i=1}^{t b} \pm 2^{i}$ can possibly compute to 0 , we say that $\vec{\tau}$ is legit except for a negligible probability of making a correct guess on challenge $x$.

This $( \pm 1)$ property can be validated by checking if we can successfully extract witnesses $\vec{r}, \vec{\epsilon}$ from transcripts $\vec{r}^{\prime}$ (computed by the verifier following the protocol) and $D$ using extractor $\mathcal{X}_{b-\text { test }}$ s.t. each $r_{i} \in\{0,1\}$. If we can confirm each $r_{i} \in\{0,1\}$, then we know $y_{i}^{*}$ can only be altered by $\pm 1$ since $o_{i}=r_{i}+y_{i}$.

Finally, we check if the extracted circuit witnesses $\vec{r}, \vec{\epsilon}, \vec{\tau}$ match that computed from input witnesses $\vec{b}, \vec{\beta}$ using computeSubCircuitKeys functions, which must be true for a computationally bounded prover except with negligible probability s.t. the dishonest prover made a correct guess on challenge $x$ or else we find a non-trivial d-log relationship between generators $g$, $h$ (for input witnesses) and/or between generators $\vec{u}$ (for circuit witnesses).

## References

1. Ames, S., Hazay, C., Ishai, Y., Venkitasubramaniam, M.: Ligero: Lightweight sublinear arguments without a trusted setup. In: Thuraisingham, B.M., Evans, D., Malkin, T., Xu, D. (eds.) ACM CCS 2017. pp. 20872104. ACM Press, Dallas, TX, USA (Oct 31 - Nov 2, 2017). https://doi.org/10.1145/3133956.3134104
2. Arora, S., Lund, C., Motwani, R., Sudan, M., Szegedy, M.: Proof verification and hardness of approximation problems. In: 33rd FOCS. pp. 14-23. IEEE Computer Society Press, Pittsburgh, PA, USA (Oct 24-27, 1992). https://doi.org/10.1109/SFCS.1992.267823
3. Arora, S., Safra, S.: Probabilistic checking of proofs; A new characterization of NP. In: 33rd FOCS. pp. 2-13. IEEE Computer Society Press, Pittsburgh, PA, USA (Oct 24-27, 1992). https://doi.org/10.1109/SFCS.1992.267824
4. Babai, L., Fortnow, L., Levin, L.A., Szegedy, M.: Checking computations in polylogarithmic time. In: 23rd ACM STOC. pp. 21-31. ACM Press, New Orleans, LA, USA (May 6-8, 1991). https://doi.org/10.1145/103418.103428
5. Babai, L., Fortnow, L., Lund, C.: Non-deterministic exponential time has two-prover interactive protocols. In: 31st FOCS. pp. 16-25. IEEE Computer Society Press, St. Louis, MO, USA (Oct 22-24, 1990). https://doi.org/10.1109/FSCS.1990.89520
6. Baum, C., Malozemoff, A.J., Rosen, M.B., Scholl, P.: Mac'n'cheese: Zero-knowledge proofs for boolean and arithmetic circuits with nested disjunctions. In: Malkin, T., Peikert, C. (eds.) CRYPTO 2021, Part IV. LNCS, vol. 12828, pp. 92-122. Springer, Heidelberg, Germany, Virtual Event (Aug 16-20, 2021). https://doi.org/10.1007/978-3-030-84259-84
7. Ben-Sasson, E., Bentov, I., Horesh, Y., Riabzev, M.: Scalable zero knowledge with no trusted setup. In: Boldyreva, A., Micciancio, D. (eds.) CRYPTO 2019, Part III. LNCS, vol. 11694, pp. 701-732. Springer, Heidelberg, Germany, Santa Barbara, CA, USA (Aug 18-22, 2019). https://doi.org/10.1007/978-3-030-26954-823
8. Bhadauria, R., Fang, Z., Hazay, C., Venkitasubramaniam, M., Xie, T., Zhang, Y.: Ligero++: A new optimized sublinear IOP. In: Ligatti, J., Ou, X., Katz, J., Vigna, G. (eds.) ACM CCS 2020. pp. 20252038. ACM Press, Virtual Event, USA (Nov 9-13, 2020). https://doi.org/10.1145/3372297.3417893
9. Bootle, J., Cerulli, A., Chaidos, P., Groth, J., Petit, C.: Efficient zero-knowledge arguments for arithmetic circuits in the discrete log setting. In: Fischlin, M., Coron, J.S. (eds.) EUROCRYPT 2016, Part II. LNCS, vol. 9666, pp. 327-357. Springer, Heidelberg, Germany, Vienna, Austria (May 8-12, 2016). https://doi.org/10.1007/978-3-662-49896-512
10. Boyle, E., Couteau, G., Gilboa, N., Ishai, Y.: Compressing vector OLE. In: Lie, D., Mannan, M., Backes, M., Wang, X. (eds.) ACM CCS 2018. pp. 896-912. ACM Press, Toronto, ON, Canada (Oct 15-19, 2018). https://doi.org/10.1145/3243734.3243868
11. Boyle, E., Couteau, G., Gilboa, N., Ishai, Y., Kohl, L., Rindal, P., Scholl, P.: Efficient two-round OT extension and silent non-interactive secure computation. In: Cavallaro, L., Kinder, J., Wang, X., Katz, J. (eds.) ACM CCS 2019. pp. 291-308. ACM Press, London, UK (Nov 11-15, 2019). https://doi.org/10.1145/3319535.3354255
12. Boyle, E., Couteau, G., Gilboa, N., Ishai, Y., Kohl, L., Scholl, P.: Efficient pseudorandom correlation generators: Silent OT extension and more. In: Boldyreva, A., Micciancio, D. (eds.) CRYPTO 2019, Part III. LNCS, vol. 11694, pp. 489-518. Springer, Heidelberg, Germany, Santa Barbara, CA, USA (Aug 18-22, 2019). https://doi.org/10.1007/978-3-030-26954-816
13. Bünz, B., Fisch, B., Szepieniec, A.: Transparent SNARKs from DARK compilers. In: Canteaut, A., Ishai, Y. (eds.) EUROCRYPT 2020, Part I. LNCS, vol. 12105, pp. 677-706. Springer, Heidelberg, Germany, Zagreb, Croatia (May 10-14, 2020). https://doi.org/10.1007/978-3-030-45721-124
14. Chen, B., Bünz, B., Boneh, D., Zhang, Z.: HyperPlonk: Plonk with linear-time prover and high-degree custom gates. In: Hazay, C., Stam, M. (eds.) EUROCRYPT 2023, Part II. LNCS, vol. 14005, pp. 499530. Springer, Heidelberg, Germany, Lyon, France (Apr 23-27, 2023). https://doi.org/10.1007/978-3-031-30617-4 7
15. Chiesa, A., Hu, Y., Maller, M., Mishra, P., Vesely, P., Ward, N.P.: Marlin: Preprocessing zkSNARKs with universal and updatable SRS. In: Canteaut, A., Ishai, Y. (eds.) EUROCRYPT 2020, Part I. LNCS, vol. 12105, pp. 738-768. Springer, Heidelberg, Germany, Zagreb, Croatia (May 10-14, 2020). https://doi.org/10.1007/978-3-030-45721-1 ${ }_{2} 6$
16. Cramer, R., Damgård, I.: Zero-knowledge proofs for finite field arithmetic; or: Can zero-knowledge be for free? In: Krawczyk, H. (ed.) CRYPTO'98. LNCS, vol. 1462, pp. 424-441. Springer, Heidelberg, Germany, Santa Barbara, CA, USA (Aug 23-27, 1998). https://doi.org/10.1007/BFb0055745
17. Dittmer, S., Ishai, Y., Lu, S., Ostrovsky, R.: Improving line-point zero knowledge: Two multiplications for the price of one. In: Yin, H., Stavrou, A., Cremers, C., Shi, E. (eds.) ACM CCS 2022. pp. 829-841. ACM Press, Los Angeles, CA, USA (Nov 7-11, 2022). https://doi.org/10.1145/3548606.3559385
18. Dittmer, S., Ishai, Y., Ostrovsky, R.: Line-point zero knowledge and its applications. Cryptology ePrint Archive, Report 2020/1446 (2020), https://eprint.iacr.org/2020/1446
19. Frederiksen, T.K., Nielsen, J.B., Orlandi, C.: Privacy-free garbled circuits with applications to efficient zero-knowledge. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015, Part II. LNCS, vol. 9057, pp. 191-219. Springer, Heidelberg, Germany, Sofia, Bulgaria (Apr 26-30, 2015). https://doi.org/10.1007/978-3-662-46803-67
20. Gabizon, A., Williamson, Z.J., Ciobotaru, O.: PLONK: Permutations over lagrange-bases for oecumenical noninteractive arguments of knowledge. Cryptology ePrint Archive, Report 2019/953 (2019), https:// eprint.iacr.org/2019/953
21. Giacomelli, I., Madsen, J., Orlandi, C.: ZKBoo: Faster zero-knowledge for Boolean circuits. In: Holz, T., Savage, S. (eds.) USENIX Security 2016. pp. 1069-1083. USENIX Association, Austin, TX, USA (Aug 10-12, 2016)
22. Goldwasser, S., Micali, S., Rackoff, C.: The knowledge complexity of interactive proof-systems (extended abstract). In: 17th ACM STOC. pp. 291-304. ACM Press, Providence, RI, USA (May 6-8, 1985). https://doi.org/10.1145/22145.22178
23. Groth, J., Kohlweiss, M., Maller, M., Meiklejohn, S., Miers, I.: Updatable and universal common reference strings with applications to zk-SNARKs. In: Shacham, H., Boldyreva, A. (eds.) CRYPTO 2018, Part III. LNCS, vol. 10993, pp. 698-728. Springer, Heidelberg, Germany, Santa Barbara, CA, USA (Aug 19-23, 2018). https://doi.org/10.1007/978-3-319-96878-0 $2_{2}$
24. Heath, D., Kolesnikov, V.: Stacked garbling for disjunctive zero-knowledge proofs. In: Canteaut, A., Ishai, Y. (eds.) EUROCRYPT 2020, Part III. LNCS, vol. 12107, pp. 569-598. Springer, Heidelberg, Germany, Zagreb, Croatia (May 10-14, 2020). https://doi.org/10.1007/978-3-030-45727-319
25. Kiayias, A., Tang, Q.: How to keep a secret: leakage deterring public-key cryptosystems. In: Sadeghi, A.R., Gligor, V.D., Yung, M. (eds.) ACM CCS 2013. pp. 943-954. ACM Press, Berlin, Germany (Nov 4-8, 2013). https://doi.org/10.1145/2508859.2516691
26. Maller, M., Bowe, S., Kohlweiss, M., Meiklejohn, S.: Sonic: Zero-knowledge SNARKs from linearsize universal and updatable structured reference strings. In: Cavallaro, L., Kinder, J., Wang, X., Katz, J. (eds.) ACM CCS 2019. pp. 2111-2128. ACM Press, London, UK (Nov 11-15, 2019). https://doi.org/10.1145/3319535.3339817
27. Schnorr, C.P.: Efficient identification and signatures for smart cards. In: Brassard, G. (ed.) CRYPTO'89. LNCS, vol. 435, pp. 239-252. Springer, Heidelberg, Germany, Santa Barbara, CA, USA (Aug 20-24, 1990). https://doi.org/10.1007/0-387-34805-0 22
28. Schoppmann, P., Gascón, A., Reichert, L., Raykova, M.: Distributed vector-OLE: Improved constructions and implementation. In: Cavallaro, L., Kinder, J., Wang, X., Katz, J. (eds.) ACM CCS 2019. pp. 10551072. ACM Press, London, UK (Nov 11-15, 2019). https://doi.org/10.1145/3319535.3363228
29. Setty, S.: Spartan: Efficient and general-purpose zkSNARKs without trusted setup. In: Micciancio, D., Ristenpart, T. (eds.) CRYPTO 2020, Part III. LNCS, vol. 12172, pp. 704-737. Springer, Heidelberg, Germany, Santa Barbara, CA, USA (Aug 17-21, 2020). https://doi.org/10.1007/978-3-030-56877-125
30. Setty, S., Lee, J.: Quarks: Quadruple-efficient transparent zkSNARKs. Cryptology ePrint Archive, Report 2020/1275 (2020), https://eprint.iacr.org/2020/1275
31. Wahby, R.S., Tzialla, I., shelat, a., Thaler, J., Walfish, M.: Doubly-efficient zkSNARKs without trusted setup. Cryptology ePrint Archive, Report 2017/1132 (2017), https://eprint.iacr.org/2017/1132
32. Weng, C., Yang, K., Katz, J., Wang, X.: Wolverine: Fast, scalable, and communication-efficient zeroknowledge proofs for boolean and arithmetic circuits. In: 2021 IEEE Symposium on Security and Privacy. pp. 1074-1091. IEEE Computer Society Press, San Francisco, CA, USA (May 24-27, 2021). https://doi.org/10.1109/SP40001.2021.00056
33. Weng, C., Yang, K., Yang, Z., Xie, X., Wang, X.: AntMan: Interactive zero-knowledge proofs with sublinear communication. In: Yin, H., Stavrou, A., Cremers, C., Shi, E. (eds.) ACM CCS 2022. pp. 2901-2914. ACM Press, Los Angeles, CA, USA (Nov 7-11, 2022). https://doi.org/10.1145/3548606.3560667
34. Yang, K., Sarkar, P., Weng, C., Wang, X.: QuickSilver: Efficient and affordable zero-knowledge proofs for circuits and polynomials over any field. In: Vigna, G., Shi, E. (eds.) ACM CCS 2021. pp. 2986-3001. ACM Press, Virtual Event, Republic of Korea (Nov 15-19, 2021). https://doi.org/10.1145/3460120.3484556
35. Yang, K., Weng, C., Lan, X., Zhang, J., Wang, X.: Ferret: Fast extension for correlated OT with small communication. In: Ligatti, J., Ou, X., Katz, J., Vigna, G. (eds.) ACM CCS 2020. pp. 1607-1626. ACM Press, Virtual Event, USA (Nov 9-13, 2020). https://doi.org/10.1145/3372297.3417276
36. Zhang, J., Xie, T., Zhang, Y., Song, D.: Transparent polynomial delegation and its applications to zero knowledge proof. In: 2020 IEEE Symposium on Security and Privacy. pp. 859-876. IEEE Computer Society Press, San Francisco, CA, USA (May 18-21, 2020). https://doi.org/10.1109/SP40000.2020.00052
