Problem: calculate , where is a scalar and is point on an EC.
Scalar size: bits
Naive
Complexity: squaring and point additions
MSM can be divided into two main parts:
- Modular multiplication
- Point additions
Pippenger Bucketed Approach
Squaring: double add, i.e. , where P is an EC point.
- Divide scalar into windows of m, each with bits.
- calculate:
- Example: c=3, j=15,
- Now, group points with same coefficients together, i.e. create buckets of , and
- Take partial sums:
- Sum each window.
Complexity: additions +. squarings
- Part 1: window result calculation:
- Part 2: sum over all windows: each iteration require addition and squarings. windows → squarings
Point Addition Optimisations
Majority of complexity is due to point additions. Point addition complexity for affine coordinates is 1 division, 2 multiplications, 6 additions on . Division is very costly.
Use projective coordinates. Division can be deferred to when there is need for switch back to affine coordinates.
Complexity: 7 mults, 4 squarings, 9 additions, 3 mults by 2, 1 by 1. But no division is required.
Batch Affine
Can be used in cases where aim is to find: where P, Q are points on EC.
Denote:
- :
- :
- :
GLV with Endomorphism
Goal: accelerate single point scalar mult
Naive Approach
- Divide into windows of c bits
- pre-compute
- compute for each window
- For i from d-1 to 0:
- (require c squaring)
- (require 1 point addition)
- return
Complexity: point additions and squarings
Endomorphism
For some curves like BN254, we can use cube roots of unity to find new points on the curve for same y.
Equation of BN254: .
So, can be divided into , where
This reduces no of point doublings by half.
and are found using barret-reduction techniques.
Complexity: b/2 squaring and additions
WNAF (windowed Non-adjacent form)
Most of the time in MSM is spent in additions, and number of additions depend on hamming weight of the scalar. Less number of additions will have to be done if the number of 1’s in the scalar bits is less.
That’s exactly what NAF is used for in terms of MSM. Instead of representing number in bits 0, 1. It is represented in {-1, 0, 1}. This reduces the number of number of non-zero bits in the number by 1/3rd.
Also, when the b-bit scalars are divided into c-bit slices, value of each slice is used as bucket index. In total, we need buckets. Using NAF, we map to . For example, for slice that needs buckets in total, if we map to , only buckets are needed.
Pseudocode to find NAF:
Input: I = Output O =
i=0
while I > 0:
if I is odd:
w(i) = I mod 2^2
I = I - w(i)
else:
w(i) = 0
I = I/2
i = i+1
return O
Let’s take a scalar , with c-bit window size, . Split scalar into K slices, where .
Now, to take advantage of wnaf, use following relation:
i.e. subtract from current slice and add 1 to next slice, whenever . After this, slice comes in range . To convert it into array form, whenever sign of is negative, subtract it from ith index in bucket array.
sequenceDiagram
Prover->>Verifier: Hello
Resources
- zkStudyClub: Multi-scalar multiplication
- Known optimisation for MSM
- Aztec’s wnaf
- Optimizing Multi-Scalar Multiplication (MSM): Learning from ZPRIZE
- EC arithmetic
- Hardware Review: GPUs , FPGAs and Zero Knowledge Proofs
- Optimisation of MSM
- Accelerating the PlonK zkSNARK Proving System using GPU Architectures
- FPGA Acceleration of Multi-Scalar Multiplication: CycloneMSM
- EdMSM: Multi-Scalar-Multiplication for SNARKs and Faster Montgomery multiplication
- cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs
- PipeMSM: Hardware Acceleration for Multi-Scalar Multiplication
- DJB’s Pippenger logs