A comprehensive guide to understanding the cryptographic foundations of Tezos DAL, including polynomial commitments, Reed-Solomon erasure coding, and KZG proofs.
Table of Contents
Introduction
The Tezos Data Availability Layer (DAL) is a critical component enabling Layer 2 scalability. It allows large amounts of data (up to ~4MB per block) to be published and verified without storing everything on-chain permanently.
The cryptographic foundation of DAL combines three powerful techniques:
- Polynomial Representation - Data encoded as mathematical polynomials
- Reed-Solomon Erasure Coding - Redundancy that allows recovery from partial data
- KZG Commitments - Cryptographic proofs using elliptic curve pairings
This guide explains each component and how they work together to create a secure, efficient data availability system.
The Core Problem
The Challenge
Consider a Layer 2 smart rollup that needs to publish transaction data:
- Data size: 126 KB per slot
- Frequency: Multiple slots per block (up to 32)
- Requirements:
- ✅ Data must be available to anyone who needs it
- ✅ Validators must verify data integrity
- ✅ System must tolerate network failures
- ✅ No single validator should be required
- ✅ Efficient bandwidth usage
Why Not Simple Solutions?
Option 1: Store everything on-chain
- ❌ Too expensive (blockchain bloat)
- ❌ Not scalable
Option 2: Trust validators to have data
- ❌ No proof of availability
- ❌ Validators could lie
Option 3: Send hash of data
- ❌ Can't verify pieces individually
- ❌ Need entire data to check hash
Solution: DAL's Cryptographic Approach
- ✅ Distributed storage (validators hold pieces)
- ✅ Cryptographic verification (can prove pieces are correct)
- ✅ Redundancy (can recover from 87.5% data loss)
- ✅ Efficient (constant-size proofs)
Solution Overview
The DAL Approach
Key Properties
- Redundancy: 8x (512 shards, only need 64)
- Efficiency: Each validator downloads 6.25% (32/512 shards)
- Security: Cryptographically verifiable
- Scalability: Constant-size commitments (48 bytes)
Part 1: Polynomial Representation
Why Polynomials?
Polynomials have special mathematical properties that make them perfect for data availability:
- Can encode arbitrary data as coefficients
- Easy to evaluate at different points
- Support error correction (Reed-Solomon codes)
- Enable cryptographic operations (KZG commitments)
How Data Becomes a Polynomial
Example: Encode 4 bytes of data [10, 20, 30, 40]
Step 1: Treat bytes as polynomial coefficients
P(x) = 10 + 20x + 30x² + 40x³
Step 2: Evaluate at different points to create shards
P(0) = 10
P(1) = 10 + 20(1) + 30(1)² + 40(1)³ = 100
P(2) = 10 + 20(2) + 30(4) + 40(8) = 490
P(3) = 10 + 20(3) + 30(9) + 40(27) = 1420
...
Each evaluation P(i) becomes a shard.
The Magic: Polynomial Interpolation
Fundamental Theorem: A degree-n polynomial is uniquely determined by n+1 points.
Our example polynomial has degree 3, so we need any 4 evaluations to recover it completely.
Lagrange Interpolation Formula:
Given points (x₀, y₀), (x₁, y₁), ..., (xₙ, yₙ), reconstruct:
P(x) = Σᵢ yᵢ · Lᵢ(x)
where Lᵢ(x) = ∏ⱼ≠ᵢ (x - xⱼ) / (xᵢ - xⱼ)
This is how Reed-Solomon decoding works!
In DAL
- Original data: 126,944 bytes
- Treated as: 64 polynomial coefficients (~1,983 bytes each)
- Polynomial degree: 63
- Minimum points needed: 64
- Total shards created: 512 (8x redundancy)
Part 2: Reed-Solomon Erasure Coding
What is Erasure Coding?
Erasure coding adds redundancy so you can recover data even if parts are lost. Unlike simple replication, it's mathematically optimal.
How Reed-Solomon Works
Step 1: Create the polynomial
Data: D₀, D₁, D₂, ..., D₆₃ (64 chunks)
Polynomial: P(x) = D₀ + D₁x + D₂x² + ... + D₆₃x⁶³
Step 2: Generate shards
Shard₀ = P(0)
Shard₁ = P(1)
Shard₂ = P(2)
...
Shard₅₁₁ = P(511)
Step 3: Distribute shards
Each validator gets 32 shards (deterministically assigned).
Step 4: Recovery
Collect any 64 shards, use Lagrange interpolation to recover P(x), extract coefficients = original data.
Visual Example
Key Properties
- Optimal redundancy: Minimum overhead for given fault tolerance
- Flexible recovery: Any subset of required size works
- No privileged shards: Original and parity shards are mathematically equivalent
- Deterministic: Same data → same shards always
In DAL
- 8x redundancy: 64 original → 512 total shards
- Fault tolerance: Can lose up to 448 shards (87.5%)
- Recovery threshold: Any 64 shards suffice
- Validator load: Each downloads 32 shards (6.25%)
Part 3: KZG Polynomial Commitments
What Problem Do KZG Commitments Solve?
Scenario: Publisher claims to have polynomial P(x) and provides commitment C.
Requirements:
- Commitment must cryptographically bind to P(x)
- Can't change P(x) after creating C
- Can verify individual evaluations P(i) without revealing P(x)
- Proofs must be small (constant size)
Traditional Approach: Merkle Trees
Problems:
- ❌ Proof size: O(log n) hashes = 9 hashes for 512 shards
- ❌ Each hash ~32 bytes = ~288 bytes per proof
- ❌ No mathematical structure for Reed-Solomon
KZG Approach: Polynomial Commitments
Key Idea: Use elliptic curves to "hide" polynomial evaluation at secret point τ.
Step 1: Trusted Setup
Generate a secret value τ (tau), then destroy it permanently.
Publish "powers of tau" encrypted in elliptic curve groups:
[τ⁰]₁, [τ¹]₁, [τ²]₁, [τ³]₁, ..., [τⁿ]₁
Where [x]₁ means "x encrypted as a point on elliptic curve G₁"
Security: Even with [τⁱ]₁, you can't extract τ (discrete log problem).
In Practice: Community ceremony with 1000+ participants. Only ONE needs to honestly destroy their secret for security.
Step 2: Create Commitment
Given polynomial:
P(x) = a₀ + a₁x + a₂x² + a₃x³
Compute commitment:
C = [P(τ)]₁ = [a₀ + a₁τ + a₂τ² + a₃τ³]₁
We can compute this WITHOUT knowing τ:
C = a₀·[τ⁰]₁ + a₁·[τ¹]₁ + a₂·[τ²]₁ + a₃·[τ³]₁
Result: 48-byte commitment that binds us to P(x).
Step 3: Generate Proof
To prove P(i) = y:
Algebraic trick:
P(i) = y ⟺ P(x) - y = Q(x)·(x - i) for some Q(x)
Why? If P(i) = y, then (x - i) is a factor of (P(x) - y).
Compute quotient polynomial:
Q(x) = (P(x) - y) / (x - i)
Create proof:
π = [Q(τ)]₁
Proof size: 48 bytes (constant!)
Step 4: Verify Proof
Verification equation (using pairings):
e(C - [y]₁, [1]₂) = e(π, [τ]₂ - [i]₂)
If equation holds: ✅ Proof valid, shard is correct
If equation fails: ❌ Proof invalid, shard is corrupt
Verification time: ~1 millisecond
Why This Works
Substitute the definitions:
Left side: e([P(τ) - y]₁, [1]₂)
Right side: e([Q(τ)]₁, [τ - i]₂)
By pairing bilinearity:
e([P(τ) - y]₁, [1]₂) = e([1]₁, [1]₂)^(P(τ) - y)
e([Q(τ)]₁, [τ - i]₂) = e([1]₁, [1]₂)^(Q(τ)·(τ - i))
These are equal if and only if:
P(τ) - y = Q(τ)·(τ - i)
Which is true if and only if P(i) = y! 🎉
KZG Properties
- ✅ Binding: Can't change P(x) after commitment
- ✅ Hiding: Commitment doesn't reveal polynomial
- ✅ Constant size: Commitment and proofs are always 48 bytes
- ✅ Efficient: Fast verification (~1ms)
- ✅ Batch-verifiable: Can verify multiple proofs together
In DAL
- Commitment format:
sh1...(51 characters base58-encoded) - Example:
sh1zMUeaXFM8ndMU2FWdpoyhQSWGuKda6EuVqKyW2Gex5AkYLtQ... - Represents: Commitment to 126KB of data
- Enables: Verification of 512 individual shards
Part 4: Elliptic Curve Pairings
What is a Pairing?
A pairing is a special mathematical function:
e: G₁ × G₂ → Gₜ
It takes two elliptic curve points (one from group G₁, one from G₂) and outputs a value in target group Gₜ.
The Magic Property: Bilinearity
e([a]₁, [b]₂) = e([1]₁, [1]₂)^(a·b)
What this means:
- Multiplication "inside" the groups becomes exponentiation "outside"
- Can check equations about encrypted values
- Without ever decrypting them!
Example: Checking Multiplication
Goal: Verify that a · b = c without revealing a or b.
Given: [a]₁, [b]₂, [c]ₜ (encrypted values)
Check:
e([a]₁, [b]₂) ?= [c]ₜ
Using bilinearity:
e([a]₁, [b]₂) = e([1]₁, [1]₂)^(a·b)
So this checks if a·b = c!
Why Pairings Enable KZG
Remember we want to verify: P(i) = y
Algebraic identity:
P(i) = y ⟺ P(τ) - y = Q(τ)·(τ - i)
Check with pairing:
e([P(τ) - y]₁, [1]₂) ?= e([Q(τ)]₁, [τ - i]₂)
Left side: e([1]₁, [1]₂)^(P(τ) - y)
Right side: e([1]₁, [1]₂)^(Q(τ)·(τ - i))
Equal if and only if P(τ) - y = Q(τ)·(τ - i), which is true if and only if P(i) = y!
The BLS12-381 Curve
Tezos DAL uses the BLS12-381 elliptic curve:
Parameters:
- Field size: 381 bits
- Security level: ~128 bits (equivalent to AES-128)
- G₁ point (compressed): 48 bytes
- G₂ point (compressed): 96 bytes
- Pairing computation: ~1-2 milliseconds
Why BLS12-381?
- Optimized for pairing operations
- Good performance on modern CPUs
- Well-studied and widely used (Ethereum, Zcash, Filecoin)
- Strong security guarantees
Pairing-Based Cryptography Security
Based on hardness of:
- Discrete logarithm problem in G₁, G₂, Gₜ
- Computational Diffie-Hellman
- Decisional Diffie-Hellman
Attack complexity: ~2¹²⁸ operations (infeasible)
Quantum resistance: Shor's algorithm reduces to ~2⁶⁴ (still very hard)
Part 5: The Complete Flow
Publishing Flow
Attestation Flow
Timeline Example
Level N: Slot published
- Commitment: sh1zMUeaXFM8ndMU2FWdpoyhQ...
- Shards distributed via P2P gossip
- Validators begin downloading assigned shards
Level N+1 to N+7: Download & verify period
- Validators receive shards
- Validators verify KZG proofs
- Validators store verified shards locally
Level N+8: Attestation due!
- Validators check: Did I get all my shards?
- If YES: Sign attestation (companion key)
- Submit attestations to blockchain
- Baker aggregates into bitset
Level N+8 onward: Data available
- Bitset shows which slots attested
- L2 consumers can reconstruct data
- Window for retrieval (limited time)
Part 6: Security Analysis
Attack Scenario 1: Publisher Lies
Attack: Publisher creates commitment C but distributes wrong shards
Defense:
Validator verifies: e(C - [vᵢ]₁, [1]₂) ?= e(πᵢ, [τ]₂ - [i]₂)
If shard is wrong:
→ KZG proof won't verify
→ Validator rejects shard
→ Validator doesn't attest
→ Slot not confirmed
Result: ✅ Attack fails, detected by validators
Attack Scenario 2: Fake Commitment
Attack: Create commitment C' that "commits" to different data
Required:
- Break discrete log on BLS12-381
- Find different P'(τ) where [P'(τ)]₁ = [P(τ)]₁
Difficulty:
- ~2¹²⁸ operations
- More operations than atoms in observable universe
- Would take millions of years with all computing power on Earth
Result: ✅ Computationally infeasible
Attack Scenario 3: Forge Proof
Attack: Create fake proof π for wrong value
Required:
- Make equation hold: e(C - [fake_value]₁, [1]₂) = e(π, [τ]₂ - [i]₂)
- Without knowing τ
Difficulty:
- Would require solving discrete log problem
- Or breaking pairing function
- Both ~2¹²⁸ security level
Result: ✅ Cryptographically infeasible
Attack Scenario 4: Withhold Shards (Censorship)
Attack: Malicious validators refuse to distribute shards
Defense:
- 8x redundancy: Need 448/512 = 87.5% malicious validators
- Attestation threshold: 66% required
- If attack succeeds: Slot not attested (detected on-chain)
Economics:
- Malicious validators lose rewards
- Attack is expensive (need to control 87.5%)
- Attack is detectable (low attestation rate)
Result: ✅ Very expensive, easily detected
Security Summary
| Attack Vector | Difficulty | Detection | Result |
|---|---|---|---|
| Fake commitment | 2¹²⁸ ops | Automatic | ✅ Infeasible |
| Forge proof | 2¹²⁸ ops | Automatic | ✅ Infeasible |
| Wrong shards | Easy | Cryptographic | ✅ Detected & rejected |
| Withhold shards | 87.5% control | On-chain metrics | ✅ Very expensive |
| Compromise setup | All 1000+ participants | — | ✅ Extremely unlikely |
Overall Security: Very high confidence based on:
- Strong cryptographic assumptions (discrete log, pairings)
- Economic incentives (rewards for honesty)
- Redundancy (8x factor)
- Distributed trust (no single point of failure)
Part 7: Why This Design?
Decision Matrix
| Property | Hash | Merkle | KZG | Verkle | FRI |
|---|---|---|---|---|---|
| Proof size | N/A | ~300B | 48B | ~200B | ~100KB |
| Verify time | Fast | Fast | Fast | Fast | Slow |
| Shard verification | ❌ | ✅ | ✅ | ✅ | ✅ |
| Math structure | ❌ | ❌ | ✅ | ✗ | ✅ |
| Trusted setup | No | No | Yes | Yes | No |
| Quantum-safe | Yes | Yes | No | No | Yes |
Tezos chose KZG because:
- Smallest proofs (48 bytes) → lowest bandwidth
- Fastest verification (~1ms) → real-time block production
- Perfect match with Reed-Solomon → polynomial structure
- Battle-tested → used in production (Ethereum, Filecoin)
- Trade-off accepted: Trust setup ceremony over quantum resistance
Design Trade-offs
Redundancy Factor (8x chosen)
| Factor | Recovery Threshold | Fault Tolerance | Bandwidth |
|---|---|---|---|
| 4x | 25% | 75% | Lower |
| 8x | 12.5% | 87.5% | Medium |
| 16x | 6.25% | 93.75% | Higher |
Decision: 8x provides excellent fault tolerance without excessive bandwidth
Shard Count (512 chosen)
| Count | Shard Size | Granularity | Flexibility |
|---|---|---|---|
| 256 | ~500B | Lower | Lower |
| 512 | ~248B | Good | Good |
| 1024 | ~124B | Higher | Higher |
Decision: 512 balances shard size, network overhead, and assignment flexibility
Practical Parameters
DAL Configuration (Seoul Protocol)
Slots:
- per_block: 32
- size_bytes: 126944
Shards:
- per_slot: 512
- size_bytes: ~248
- redundancy_factor: 8
- recovery_threshold: 64
Commitments:
- type: KZG
- curve: BLS12-381
- size_bytes: 48
Timing:
- attestation_lag: 8 blocks
- attestation_window: ~80 seconds
Thresholds:
- attestation_required: 66%
- participation_sufficient: 64%
Validator Configuration
Per-slot resource requirements:
- Download: 32 shards × 248 bytes = ~7,934 bytes
- Verification: 32 proofs × 1ms = ~32ms
- Storage: Temporary (attestation window only)
- Bandwidth: ~8KB per slot
Example calculation for validator with 1% stake:
- Expected slots per cycle: ~15
- Data per cycle: 15 × 7,934 = ~119KB
- Verification time: 15 × 32ms = ~480ms total
- Rewards: ~8-10 ꜩ per cycle
Performance Characteristics
| Operation | Time | Size |
|---|---|---|
| Commitment Creation | ~50ms | 48 bytes |
| Proof Generation (per shard) | ~10ms | 48 bytes |
| Proof Verification (per shard) | ~1ms | 48 bytes input |
| Batch Verification (10 proofs) | ~5ms | — |
| Data Recovery | ~100ms | Any 64 shards |
Conclusion
What We've Learned
The Tezos Data Availability Layer uses a sophisticated cryptographic stack:
- Polynomial Representation: Data encoded as math objects
- Reed-Solomon Coding: 8x redundancy enables 87.5% fault tolerance
- KZG Commitments: Constant-size proofs using elliptic curves
- Pairing-Based Verification: Check proofs without revealing data
Key Takeaways
- ✅ Efficient: 48-byte commitments, 48-byte proofs, 1ms verification
- ✅ Secure: Based on well-studied cryptographic assumptions
- ✅ Scalable: Each validator only handles 6.25% of data
- ✅ Robust: Tolerates 87.5% data loss
- ✅ Flexible: Any 64 shards can reconstruct full data
The Big Picture
Traditional Approach:
Every node stores everything → Doesn't scale
DAL Approach:
Distribute data + Cryptographic proofs → Scales!
This same approach is being adopted by:
- Ethereum (EIP-4844 proto-danksharding)
- Celestia (data availability layer)
- Avail (standalone DA)
Tezos was early to production with this technology!
Further Reading
Academic Papers:
- "Constant-Size Commitments to Polynomials and Their Applications" (Kate, Zaverucha, Goldberg, 2010) - Original KZG paper
- "A Practical Introduction to Polynomial Commitment Schemes" (Gabizon, 2019)
- "Proofs, Arguments, and Zero-Knowledge" (Thaler, 2023) - Comprehensive reference
Technical Resources:
Document Version: 1.0
Last Updated: October 2025
Author: DAL Technical Documentation Team
License: CC BY 4.0