Tezos DAL - Cryptography Explained

A comprehensive guide to understanding the cryptographic foundations of Tezos DAL, including polynomial commitments, Reed-Solomon erasure coding, and KZG proofs.

Introduction
The Core Problem
Solution Overview
Part 1: Polynomial Representation
Part 2: Reed-Solomon Erasure Coding
Part 3: KZG Polynomial Commitments
Part 4: Elliptic Curve Pairings
Part 5: The Complete Flow
Part 6: Security Analysis
Part 7: Why This Design?
Practical Parameters
Conclusion

Introduction

The Tezos Data Availability Layer (DAL) is a critical component enabling Layer 2 scalability. It allows large amounts of data (up to ~4MB per block) to be published and verified without storing everything on-chain permanently.

The cryptographic foundation of DAL combines three powerful techniques:

Polynomial Representation - Data encoded as mathematical polynomials
Reed-Solomon Erasure Coding - Redundancy that allows recovery from partial data
KZG Commitments - Cryptographic proofs using elliptic curve pairings

This guide explains each component and how they work together to create a secure, efficient data availability system.

The Core Problem

The Challenge

Consider a Layer 2 smart rollup that needs to publish transaction data:

Data size: 126 KB per slot
Frequency: Multiple slots per block (up to 32)
Requirements:
- ✅ Data must be available to anyone who needs it
- ✅ Validators must verify data integrity
- ✅ System must tolerate network failures
- ✅ No single validator should be required
- ✅ Efficient bandwidth usage

Why Not Simple Solutions?

Option 1: Store everything on-chain

❌ Too expensive (blockchain bloat)
❌ Not scalable

Option 2: Trust validators to have data

❌ No proof of availability
❌ Validators could lie

Option 3: Send hash of data

❌ Can't verify pieces individually
❌ Need entire data to check hash

Solution: DAL's Cryptographic Approach

✅ Distributed storage (validators hold pieces)
✅ Cryptographic verification (can prove pieces are correct)
✅ Redundancy (can recover from 87.5% data loss)
✅ Efficient (constant-size proofs)

Solution Overview

The DAL Approach

126 KB Data ↓ Represent as Polynomial P(x) ↓ Evaluate at 512 points → 512 shards ↓ Create KZG Commitment (48 bytes) ↓ Distribute shards to validators (each gets 32) ↓ Validators verify their shards against commitment ↓ Validators attest if shards are valid ↓ Anyone can reconstruct data from any 64 shards

Key Properties

Redundancy: 8x (512 shards, only need 64)
Efficiency: Each validator downloads 6.25% (32/512 shards)
Security: Cryptographically verifiable
Scalability: Constant-size commitments (48 bytes)

Part 1: Polynomial Representation

Why Polynomials?

Polynomials have special mathematical properties that make them perfect for data availability:

Can encode arbitrary data as coefficients
Easy to evaluate at different points
Support error correction (Reed-Solomon codes)
Enable cryptographic operations (KZG commitments)

How Data Becomes a Polynomial

Example: Encode 4 bytes of data [10, 20, 30, 40]

Step 1: Treat bytes as polynomial coefficients

P(x) = 10 + 20x + 30x² + 40x³

Step 2: Evaluate at different points to create shards

P(0) = 10
P(1) = 10 + 20(1) + 30(1)² + 40(1)³ = 100
P(2) = 10 + 20(2) + 30(4) + 40(8) = 490
P(3) = 10 + 20(3) + 30(9) + 40(27) = 1420
...

Each evaluation P(i) becomes a shard.

The Magic: Polynomial Interpolation

Fundamental Theorem: A degree-n polynomial is uniquely determined by n+1 points.

Our example polynomial has degree 3, so we need any 4 evaluations to recover it completely.

Lagrange Interpolation Formula:

Given points (x₀, y₀), (x₁, y₁), ..., (xₙ, yₙ), reconstruct:

P(x) = Σᵢ yᵢ · Lᵢ(x)

where Lᵢ(x) = ∏ⱼ≠ᵢ (x - xⱼ) / (xᵢ - xⱼ)

This is how Reed-Solomon decoding works!

In DAL

Original data: 126,944 bytes
Treated as: 64 polynomial coefficients (~1,983 bytes each)
Polynomial degree: 63
Minimum points needed: 64
Total shards created: 512 (8x redundancy)

Part 2: Reed-Solomon Erasure Coding

What is Erasure Coding?

Erasure coding adds redundancy so you can recover data even if parts are lost. Unlike simple replication, it's mathematically optimal.

How Reed-Solomon Works

Step 1: Create the polynomial

Data: D₀, D₁, D₂, ..., D₆₃ (64 chunks)
Polynomial: P(x) = D₀ + D₁x + D₂x² + ... + D₆₃x⁶³

Step 2: Generate shards

Shard₀ = P(0)
Shard₁ = P(1)
Shard₂ = P(2)
...
Shard₅₁₁ = P(511)

Step 3: Distribute shards

Each validator gets 32 shards (deterministically assigned).

Step 4: Recovery

Collect any 64 shards, use Lagrange interpolation to recover P(x), extract coefficients = original data.

Visual Example

Original Data (4 chunks): [A] [B] [C] [D] After Reed-Solomon (8 shards with 2x redundancy): [A] [B] [C] [D] [P1] [P2] [P3] [P4] ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ Original chunks Parity chunks If we lose 50%: [A] [X] [C] [X] [P1] [X] [P3] [X] Can still recover! Any 4 shards = full recovery

Key Properties

Optimal redundancy: Minimum overhead for given fault tolerance
Flexible recovery: Any subset of required size works
No privileged shards: Original and parity shards are mathematically equivalent
Deterministic: Same data → same shards always

In DAL

8x redundancy: 64 original → 512 total shards
Fault tolerance: Can lose up to 448 shards (87.5%)
Recovery threshold: Any 64 shards suffice
Validator load: Each downloads 32 shards (6.25%)

Part 3: KZG Polynomial Commitments

What Problem Do KZG Commitments Solve?

Scenario: Publisher claims to have polynomial P(x) and provides commitment C.

Requirements:

Commitment must cryptographically bind to P(x)
Can't change P(x) after creating C
Can verify individual evaluations P(i) without revealing P(x)
Proofs must be small (constant size)

Traditional Approach: Merkle Trees

Root / \ H12 H34 / \ / \ H1 H2 H3 H4 | | | | S1 S2 S3 S4 (shards)

Problems:

❌ Proof size: O(log n) hashes = 9 hashes for 512 shards
❌ Each hash ~32 bytes = ~288 bytes per proof
❌ No mathematical structure for Reed-Solomon

KZG Approach: Polynomial Commitments

Key Idea: Use elliptic curves to "hide" polynomial evaluation at secret point τ.

Step 1: Trusted Setup

Generate a secret value τ (tau), then destroy it permanently.

Publish "powers of tau" encrypted in elliptic curve groups:

[τ⁰]₁, [τ¹]₁, [τ²]₁, [τ³]₁, ..., [τⁿ]₁

Where [x]₁ means "x encrypted as a point on elliptic curve G₁"

Security: Even with [τⁱ]₁, you can't extract τ (discrete log problem).

In Practice: Community ceremony with 1000+ participants. Only ONE needs to honestly destroy their secret for security.

Step 2: Create Commitment

Given polynomial:

P(x) = a₀ + a₁x + a₂x² + a₃x³

Compute commitment:

C = [P(τ)]₁ = [a₀ + a₁τ + a₂τ² + a₃τ³]₁

We can compute this WITHOUT knowing τ:

C = a₀·[τ⁰]₁ + a₁·[τ¹]₁ + a₂·[τ²]₁ + a₃·[τ³]₁

Result: 48-byte commitment that binds us to P(x).

Step 3: Generate Proof

To prove P(i) = y:

Algebraic trick:

P(i) = y  ⟺  P(x) - y = Q(x)·(x - i)  for some Q(x)

Why? If P(i) = y, then (x - i) is a factor of (P(x) - y).

Compute quotient polynomial:

Q(x) = (P(x) - y) / (x - i)

Create proof:

π = [Q(τ)]₁

Proof size: 48 bytes (constant!)

Step 4: Verify Proof

Verification equation (using pairings):

e(C - [y]₁, [1]₂) = e(π, [τ]₂ - [i]₂)

If equation holds: ✅ Proof valid, shard is correct

If equation fails: ❌ Proof invalid, shard is corrupt

Verification time: ~1 millisecond

Why This Works

Substitute the definitions:

Left side:  e([P(τ) - y]₁, [1]₂)
Right side: e([Q(τ)]₁, [τ - i]₂)

By pairing bilinearity:

e([P(τ) - y]₁, [1]₂) = e([1]₁, [1]₂)^(P(τ) - y)
e([Q(τ)]₁, [τ - i]₂) = e([1]₁, [1]₂)^(Q(τ)·(τ - i))

These are equal if and only if:

P(τ) - y = Q(τ)·(τ - i)

Which is true if and only if P(i) = y! 🎉

KZG Properties

✅ Binding: Can't change P(x) after commitment
✅ Hiding: Commitment doesn't reveal polynomial
✅ Constant size: Commitment and proofs are always 48 bytes
✅ Efficient: Fast verification (~1ms)
✅ Batch-verifiable: Can verify multiple proofs together

In DAL

Commitment format: sh1... (51 characters base58-encoded)
Example: sh1zMUeaXFM8ndMU2FWdpoyhQSWGuKda6EuVqKyW2Gex5AkYLtQ...
Represents: Commitment to 126KB of data
Enables: Verification of 512 individual shards

Part 4: Elliptic Curve Pairings

What is a Pairing?

A pairing is a special mathematical function:

e: G₁ × G₂ → Gₜ

It takes two elliptic curve points (one from group G₁, one from G₂) and outputs a value in target group Gₜ.

The Magic Property: Bilinearity

e([a]₁, [b]₂) = e([1]₁, [1]₂)^(a·b)

What this means:

Multiplication "inside" the groups becomes exponentiation "outside"
Can check equations about encrypted values
Without ever decrypting them!

Example: Checking Multiplication

Goal: Verify that a · b = c without revealing a or b.

Given: [a]₁, [b]₂, [c]ₜ (encrypted values)

Check:

e([a]₁, [b]₂) ?= [c]ₜ

Using bilinearity:

e([a]₁, [b]₂) = e([1]₁, [1]₂)^(a·b)

So this checks if a·b = c!

Why Pairings Enable KZG

Remember we want to verify: P(i) = y

Algebraic identity:

P(i) = y  ⟺  P(τ) - y = Q(τ)·(τ - i)

Check with pairing:

e([P(τ) - y]₁, [1]₂) ?= e([Q(τ)]₁, [τ - i]₂)

Left side: e([1]₁, [1]₂)^(P(τ) - y)
Right side: e([1]₁, [1]₂)^(Q(τ)·(τ - i))

Equal if and only if P(τ) - y = Q(τ)·(τ - i), which is true if and only if P(i) = y!

The BLS12-381 Curve

Tezos DAL uses the BLS12-381 elliptic curve:

Parameters:

Field size: 381 bits
Security level: ~128 bits (equivalent to AES-128)
G₁ point (compressed): 48 bytes
G₂ point (compressed): 96 bytes
Pairing computation: ~1-2 milliseconds

Why BLS12-381?

Optimized for pairing operations
Good performance on modern CPUs
Well-studied and widely used (Ethereum, Zcash, Filecoin)
Strong security guarantees

Pairing-Based Cryptography Security

Based on hardness of:

Discrete logarithm problem in G₁, G₂, Gₜ
Computational Diffie-Hellman
Decisional Diffie-Hellman

Attack complexity: ~2¹²⁸ operations (infeasible)

Quantum resistance: Shor's algorithm reduces to ~2⁶⁴ (still very hard)

Part 5: The Complete Flow

Publishing Flow

┌─────────────────────────────────────────────────────────────┐ │ 1. PREPARE DATA (Publisher/L2) │ ├─────────────────────────────────────────────────────────────┤ │ Original data: 126,944 bytes │ │ Split into 64 chunks (~1,983 bytes each) │ │ Treat as polynomial coefficients │ │ P(x) = c₀ + c₁x + c₂x² + ... + c₆₃x⁶³ │ └─────────────────────────────────────────────────────────────┘ ↓ ┌─────────────────────────────────────────────────────────────┐ │ 2. ENCODE (DAL Node) │ ├─────────────────────────────────────────────────────────────┤ │ Apply Reed-Solomon encoding (8x redundancy) │ │ Evaluate polynomial at 512 points: │ │ Shard₀ = P(0), Shard₁ = P(1), ..., Shard₅₁₁ = P(511) │ └─────────────────────────────────────────────────────────────┘ ↓ ┌─────────────────────────────────────────────────────────────┐ │ 3. COMMIT (Publisher) │ ├─────────────────────────────────────────────────────────────┤ │ Create KZG commitment: │ │ C = [P(τ)]₁ using trusted setup │ │ Result: 48-byte commitment │ │ │ │ Generate proof for each shard: │ │ For each i: πᵢ = [(P(x) - P(i))/(x - i) evaluated at τ] │ └─────────────────────────────────────────────────────────────┘ ↓ ┌─────────────────────────────────────────────────────────────┐ │ 4. PUBLISH TO L1 (On-chain) │ ├─────────────────────────────────────────────────────────────┤ │ Submit commitment to blockchain: │ │ - Commitment: sh1... (51 characters) │ │ - Slot index: 0-31 │ │ - Block level: N │ │ │ │ Cost: ~800 mutez for publish operation │ └─────────────────────────────────────────────────────────────┘ ↓ ┌─────────────────────────────────────────────────────────────┐ │ 5. DISTRIBUTE (P2P Network) │ ├─────────────────────────────────────────────────────────────┤ │ DAL node broadcasts via GossipSub: │ │ - 512 shards │ │ - 512 proofs │ │ │ │ Each validator receives: │ │ - 32 assigned shards (deterministic by PKH/level/slot) │ │ - 32 corresponding proofs │ └─────────────────────────────────────────────────────────────┘

Attestation Flow

┌─────────────────────────────────────────────────────────────┐ │ 6. VERIFY (Validator's DAL Node) │ ├─────────────────────────────────────────────────────────────┤ │ For each received shard i with value vᵢ and proof πᵢ: │ │ │ │ Check pairing equation: │ │ e(C - [vᵢ]₁, [1]₂) ?= e(πᵢ, [τ]₂ - [i]₂) │ │ │ │ If all 32 shards verify: ✅ Ready to attest │ │ If any shard fails: ❌ Do not attest │ │ │ │ Time per verification: ~1ms │ │ Total verification: ~32ms for all shards │ └─────────────────────────────────────────────────────────────┘ ↓ ┌─────────────────────────────────────────────────────────────┐ │ 7. ATTEST (Validator) │ ├─────────────────────────────────────────────────────────────┤ │ 8 blocks after publish (attestation lag): │ │ │ │ If received all assigned shards: │ │ - Sign attestation with companion key (BLS) │ │ - Submit to blockchain │ │ - Earn DAL rewards │ │ │ │ Attestations aggregated into bitset: │ │ dal_attestation = binary representation │ │ Each bit = one slot (1 = attested, 0 = not) │ │ │ │ Threshold: 66% of validators must attest │ └─────────────────────────────────────────────────────────────┘ ↓ ┌─────────────────────────────────────────────────────────────┐ │ 8. RECONSTRUCT (L2 Consumer) │ ├─────────────────────────────────────────────────────────────┤ │ See on-chain: slot attested successfully │ │ │ │ Contact validators, request shards │ │ Collect any 64 shards (out of 512) │ │ │ │ Verify each shard against commitment │ │ │ │ Apply Lagrange interpolation: │ │ Recover P(x) from 64 evaluations │ │ │ │ Extract coefficients = original 126KB data │ └─────────────────────────────────────────────────────────────┘

Timeline Example

Level N: Slot published

Commitment: sh1zMUeaXFM8ndMU2FWdpoyhQ...
Shards distributed via P2P gossip
Validators begin downloading assigned shards

Level N+1 to N+7: Download & verify period

Validators receive shards
Validators verify KZG proofs
Validators store verified shards locally

Level N+8: Attestation due!

Validators check: Did I get all my shards?
If YES: Sign attestation (companion key)
Submit attestations to blockchain
Baker aggregates into bitset

Level N+8 onward: Data available

Bitset shows which slots attested
L2 consumers can reconstruct data
Window for retrieval (limited time)

Part 6: Security Analysis

Attack Scenario 1: Publisher Lies

Attack: Publisher creates commitment C but distributes wrong shards

Defense:

Validator verifies: e(C - [vᵢ]₁, [1]₂) ?= e(πᵢ, [τ]₂ - [i]₂)

If shard is wrong:
  → KZG proof won't verify
  → Validator rejects shard
  → Validator doesn't attest
  → Slot not confirmed

Result: ✅ Attack fails, detected by validators

Attack Scenario 2: Fake Commitment

Attack: Create commitment C' that "commits" to different data

Required:

Break discrete log on BLS12-381
Find different P'(τ) where [P'(τ)]₁ = [P(τ)]₁

Difficulty:

~2¹²⁸ operations
More operations than atoms in observable universe
Would take millions of years with all computing power on Earth

Result: ✅ Computationally infeasible

Attack Scenario 3: Forge Proof

Attack: Create fake proof π for wrong value

Required:

Make equation hold: e(C - [fake_value]₁, [1]₂) = e(π, [τ]₂ - [i]₂)
Without knowing τ

Difficulty:

Would require solving discrete log problem
Or breaking pairing function
Both ~2¹²⁸ security level

Result: ✅ Cryptographically infeasible

Attack Scenario 4: Withhold Shards (Censorship)

Attack: Malicious validators refuse to distribute shards

Defense:

8x redundancy: Need 448/512 = 87.5% malicious validators
Attestation threshold: 66% required
If attack succeeds: Slot not attested (detected on-chain)

Economics:

Malicious validators lose rewards
Attack is expensive (need to control 87.5%)
Attack is detectable (low attestation rate)

Result: ✅ Very expensive, easily detected

Security Summary

Attack Vector	Difficulty	Detection	Result
Fake commitment	2¹²⁸ ops	Automatic	✅ Infeasible
Forge proof	2¹²⁸ ops	Automatic	✅ Infeasible
Wrong shards	Easy	Cryptographic	✅ Detected & rejected
Withhold shards	87.5% control	On-chain metrics	✅ Very expensive
Compromise setup	All 1000+ participants	—	✅ Extremely unlikely

Overall Security: Very high confidence based on:

Strong cryptographic assumptions (discrete log, pairings)
Economic incentives (rewards for honesty)
Redundancy (8x factor)
Distributed trust (no single point of failure)

Part 7: Why This Design?

Decision Matrix

Property	Hash	Merkle	KZG	Verkle	FRI
Proof size	N/A	~300B	48B	~200B	~100KB
Verify time	Fast	Fast	Fast	Fast	Slow
Shard verification	❌	✅	✅	✅	✅
Math structure	❌	❌	✅	✗	✅
Trusted setup	No	No	Yes	Yes	No
Quantum-safe	Yes	Yes	No	No	Yes

Tezos chose KZG because:

Smallest proofs (48 bytes) → lowest bandwidth
Fastest verification (~1ms) → real-time block production
Perfect match with Reed-Solomon → polynomial structure
Battle-tested → used in production (Ethereum, Filecoin)
Trade-off accepted: Trust setup ceremony over quantum resistance

Design Trade-offs

Redundancy Factor (8x chosen)

Factor	Recovery Threshold	Fault Tolerance	Bandwidth
4x	25%	75%	Lower
8x	12.5%	87.5%	Medium
16x	6.25%	93.75%	Higher

Decision: 8x provides excellent fault tolerance without excessive bandwidth

Shard Count (512 chosen)

Count	Shard Size	Granularity	Flexibility
256	~500B	Lower	Lower
512	~248B	Good	Good
1024	~124B	Higher	Higher

Decision: 512 balances shard size, network overhead, and assignment flexibility

Practical Parameters

DAL Configuration (Seoul Protocol)

Slots:

per_block: 32
size_bytes: 126944

Shards:

per_slot: 512
size_bytes: ~248
redundancy_factor: 8
recovery_threshold: 64

Commitments:

type: KZG
curve: BLS12-381
size_bytes: 48

Timing:

attestation_lag: 8 blocks
attestation_window: ~80 seconds

Thresholds:

attestation_required: 66%
participation_sufficient: 64%

Validator Configuration

Per-slot resource requirements:

Download: 32 shards × 248 bytes = ~7,934 bytes
Verification: 32 proofs × 1ms = ~32ms
Storage: Temporary (attestation window only)
Bandwidth: ~8KB per slot

Example calculation for validator with 1% stake:

Expected slots per cycle: ~15
Data per cycle: 15 × 7,934 = ~119KB
Verification time: 15 × 32ms = ~480ms total
Rewards: ~8-10 ꜩ per cycle

Performance Characteristics

Operation	Time	Size
Commitment Creation	~50ms	48 bytes
Proof Generation (per shard)	~10ms	48 bytes
Proof Verification (per shard)	~1ms	48 bytes input
Batch Verification (10 proofs)	~5ms	—
Data Recovery	~100ms	Any 64 shards

Conclusion

What We've Learned

The Tezos Data Availability Layer uses a sophisticated cryptographic stack:

Polynomial Representation: Data encoded as math objects
Reed-Solomon Coding: 8x redundancy enables 87.5% fault tolerance
KZG Commitments: Constant-size proofs using elliptic curves
Pairing-Based Verification: Check proofs without revealing data

Key Takeaways

✅ Efficient: 48-byte commitments, 48-byte proofs, 1ms verification
✅ Secure: Based on well-studied cryptographic assumptions
✅ Scalable: Each validator only handles 6.25% of data
✅ Robust: Tolerates 87.5% data loss
✅ Flexible: Any 64 shards can reconstruct full data

The Big Picture

Traditional Approach:
Every node stores everything → Doesn't scale

DAL Approach:
Distribute data + Cryptographic proofs → Scales!

This same approach is being adopted by:

Ethereum (EIP-4844 proto-danksharding)
Celestia (data availability layer)
Avail (standalone DA)

Tezos was early to production with this technology!

Table of Contents

Introduction

The Core Problem

The Challenge

Why Not Simple Solutions?

Option 1: Store everything on-chain

Option 2: Trust validators to have data

Option 3: Send hash of data

Solution: DAL's Cryptographic Approach

Solution Overview

The DAL Approach

Key Properties

Part 1: Polynomial Representation

Why Polynomials?

How Data Becomes a Polynomial

Step 1: Treat bytes as polynomial coefficients

Step 2: Evaluate at different points to create shards

The Magic: Polynomial Interpolation

In DAL

Part 2: Reed-Solomon Erasure Coding

What is Erasure Coding?

How Reed-Solomon Works

Step 1: Create the polynomial

Step 2: Generate shards

Step 3: Distribute shards

Step 4: Recovery

Visual Example

Key Properties

In DAL

Part 3: KZG Polynomial Commitments

What Problem Do KZG Commitments Solve?

Traditional Approach: Merkle Trees

KZG Approach: Polynomial Commitments

Step 1: Trusted Setup

Step 2: Create Commitment

Step 3: Generate Proof

Step 4: Verify Proof

Why This Works

KZG Properties

In DAL

Part 4: Elliptic Curve Pairings

What is a Pairing?

The Magic Property: Bilinearity

Example: Checking Multiplication

Why Pairings Enable KZG

The BLS12-381 Curve

Pairing-Based Cryptography Security

Part 5: The Complete Flow

Publishing Flow

Attestation Flow

Timeline Example

Level N: Slot published

Level N+1 to N+7: Download & verify period

Level N+8: Attestation due!

Level N+8 onward: Data available

Part 6: Security Analysis

Attack Scenario 1: Publisher Lies

Attack Scenario 2: Fake Commitment

Attack Scenario 3: Forge Proof

Attack Scenario 4: Withhold Shards (Censorship)

Security Summary

Part 7: Why This Design?

Decision Matrix

Design Trade-offs

Redundancy Factor (8x chosen)

Shard Count (512 chosen)

Practical Parameters

DAL Configuration (Seoul Protocol)

Slots:

Shards:

Commitments:

Timing:

Thresholds:

Validator Configuration

Performance Characteristics

Conclusion

What We've Learned

Key Takeaways

The Big Picture

Further Reading