Problem: calculate $∑_{i=0}k_{i}P_{i}$, where $k_{i}$ is a scalar and $P_{i}$ is point on an EC.

Scalar size: $b$ bits

## Naive

Complexity: $N(b−1)$ squaring and $N(b−1)$ point additions

MSM can be divided into two main parts:

- Modular multiplication
- Point additions

## Pippenger Bucketed Approach

Squaring: double add, i.e.

`2*P`

, where P is an EC point.

- Divide scalar into windows of m, each with $w$ bits.
- calculate: $P=∑_{i}k_{i}P_{i}=∑_{j}2_{cj}(∑_{i}k_{ij}P_{i})=∑_{j}2_{cj}B_{j}$
- $B_{j}=∑_{i}k_{ij}P_{i}=∑_{2_{w}−1}λ∑_{u(λ)}P_{u}$
- Example: c=3, j=15, $B_{1}=4P_{1}+3P_{2}+5P_{3}+1P_{4}+4P_{5}+6P_{7}+6P_{8}+…+3P_{14}+5P_{15}$
- Now, group points with same coefficients together, i.e. create buckets of $2_{c}−1$, and $B_{j}=∑_{λ}λS_{jλ}$
- Take partial sums:

- Sum each window.

Complexity: $cb (2_{c}+N+1)$ additions +. $b$ squarings

- Part 1: window result calculation: $(2_{c}+N)b/c$
- Part 2: sum over all windows: each iteration require $1$ addition and $c$ squarings. $b/c$ windows → $cb +cb ∗c$ squarings

## Point Addition Optimisations

Majority of complexity is due to point additions. Point addition complexity for affine coordinates is 1 division, 2 multiplications, 6 additions on $F$. Division is very costly.

Use projective coordinates. Division can be deferred to when there is need for switch back to affine coordinates.

Complexity: 7 mults, 4 squarings, 9 additions, 3 mults by 2, 1 by 1. But no division is required.

### Batch Affine

Can be used in cases where aim is to find: $G_{i}=P_{i}+Q_{i}$ where P, Q are points on EC.

Denote: $a_{i}=x_{i,2}−x_{i,1}$

- $s$: $∏_{i=1}a_{i}$
- $l_{i}$: $∏_{j=1}a_{j}$
- $r_{i}$: $∏_{j=i+1}a_{j}$
- $G_{i}=x_{i,2}−x_{i,1}1 =s∗l_{i}∗r_{i}$

### GLV with Endomorphism

Goal: accelerate single point scalar mult $k∗P$

#### Naive Approach

- Divide into windows of c bits
- pre-compute $i∗P∀i∈[0,…,2_{c}−1]$
- compute $k_{i}$ for each window $i$
- $d=b/c$
- $R=0$
- For i from d-1 to 0:
- $R=2_{c}∗R$ (require c squaring)
- $R=R+(k_{i}∗P)$ (require 1 point addition)

- return $R$

Complexity: $2_{c}(precompute)+d$ point additions and $c∗d=b$ squarings

#### Endomorphism

For some curves like BN254, we can use cube roots of unity to find new points on the curve for same y.

Equation of BN254:

`y^2=x^3+b`

.

So, $k∗P$ can be divided into $k_{1}∗P_{1}+k_{2}∗P_{2}$, where

- $k=k_{1}+(s∗k_{2})$
- $P_{2}=s∗P_{1}$

This reduces no of point doublings by half.

$k_{1}$ and $k_{2}$ are found using barret-reduction techniques.

Complexity: b/2 squaring and $d/2+2_{c+1}$ additions

### WNAF (windowed Non-adjacent form)

Most of the time in MSM is spent in additions, and number of additions depend on hamming weight of the scalar. Less number of additions will have to be done if the number of 1’s in the scalar bits is less.

That’s exactly what NAF is used for in terms of MSM. Instead of representing number in bits 0, 1. It is represented in {-1, 0, 1}. This reduces the number of number of non-zero bits in the number by 1/3rd.

Also, when the b-bit scalars are divided into c-bit slices, value of each slice is used as bucket index. In total, we need $2_{c}−1$ buckets. Using NAF, we map $0,1,…,2_{c}−1$ to $−2_{c−1},…,−1,0,1,…,2_{c−1}$. For example, for slice $s_{i}∈1,2,3,4,5,6,7$ that needs $2_{3}−1=7$ buckets in total, if we map $s_{i}$ to $−4,−3,−2,−1,1,2,3$, only $2_{(}3−1)=4$ buckets are needed.

Pseudocode to find NAF:

Input: I = $(b_{n−1},b_{n−1},…,b_{0})$ Output O = $(ω_{n−1},ω_{n−2},…,ω_{0})$

```
i=0
while I > 0:
if I is odd:
w(i) = I mod 2^2
I = I - w(i)
else:
w(i) = 0
I = I/2
i = i+1
return O
```

Let’s take a scalar $s$, with c-bit window size, $L=2_{c}$. Split scalar into K slices, where $K=ceil(b/c)$.

$s=s_{0}+s_{1}L+…+s_{K−1}L_{K−1}=i∑ L_{i}K_{i}$Now, to take advantage of wnaf, use following relation:

$s_{i}L_{i}+s_{i+1}L_{i+1}=(s_{i}−L)L_{i}+(s_{i+1}+1)L_{i+1}$i.e. subtract $2_{c}$ from current slice and add 1 to next slice, whenever $s_{i}>=L/2$. After this, slice comes in range $[−L/2,L/2)$. To convert it into array form, whenever sign of $s_{i}$ is negative, subtract it from ith index in bucket array.

```
sequenceDiagram
Prover->>Verifier: Hello
```

## Resources

- zkStudyClub: Multi-scalar multiplication
- Known optimisation for MSM
- Aztec’s wnaf
- Optimizing Multi-Scalar Multiplication (MSM): Learning from ZPRIZE
- EC arithmetic
- Hardware Review: GPUs , FPGAs and Zero Knowledge Proofs
- Optimisation of MSM
- Accelerating the PlonK zkSNARK Proving System using GPU Architectures
- FPGA Acceleration of Multi-Scalar Multiplication: CycloneMSM
- EdMSM: Multi-Scalar-Multiplication for SNARKs and Faster Montgomery multiplication
- cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs
- PipeMSM: Hardware Acceleration for Multi-Scalar Multiplication
- DJB’s Pippenger logs