Tezos Data Availability Layer (DAL)

Cryptography Explained

A comprehensive guide to understanding the cryptographic foundations of Tezos DAL, including polynomial commitments, Reed-Solomon erasure coding, and KZG proofs.

Introduction

The Tezos Data Availability Layer (DAL) is a critical component enabling Layer 2 scalability. It allows large amounts of data (up to ~4MB per block) to be published and verified without storing everything on-chain permanently.

The cryptographic foundation of DAL combines three powerful techniques:

  1. Polynomial Representation - Data encoded as mathematical polynomials
  2. Reed-Solomon Erasure Coding - Redundancy that allows recovery from partial data
  3. KZG Commitments - Cryptographic proofs using elliptic curve pairings

This guide explains each component and how they work together to create a secure, efficient data availability system.

The Core Problem

The Challenge

Consider a Layer 2 smart rollup that needs to publish transaction data:

  • Data size: 126 KB per slot
  • Frequency: Multiple slots per block (up to 32)
  • Requirements:
    • ✅ Data must be available to anyone who needs it
    • ✅ Validators must verify data integrity
    • ✅ System must tolerate network failures
    • ✅ No single validator should be required
    • ✅ Efficient bandwidth usage

Why Not Simple Solutions?

Option 1: Store everything on-chain

  • ❌ Too expensive (blockchain bloat)
  • ❌ Not scalable

Option 2: Trust validators to have data

  • ❌ No proof of availability
  • ❌ Validators could lie

Option 3: Send hash of data

  • ❌ Can't verify pieces individually
  • ❌ Need entire data to check hash

Solution: DAL's Cryptographic Approach

  • ✅ Distributed storage (validators hold pieces)
  • ✅ Cryptographic verification (can prove pieces are correct)
  • ✅ Redundancy (can recover from 87.5% data loss)
  • ✅ Efficient (constant-size proofs)

Solution Overview

The DAL Approach

126 KB Data
    ↓
Represent as Polynomial P(x)
    ↓
Evaluate at 512 points → 512 shards
    ↓
Create KZG Commitment (48 bytes)
    ↓
Distribute shards to validators (each gets 32)
    ↓
Validators verify their shards against commitment
    ↓
Validators attest if shards are valid
    ↓
Anyone can reconstruct data from any 64 shards

Key Properties

  • Redundancy: 8x (512 shards, only need 64)
  • Efficiency: Each validator downloads 6.25% (32/512 shards)
  • Security: Cryptographically verifiable
  • Scalability: Constant-size commitments (48 bytes)

Part 1: Polynomial Representation

Why Polynomials?

Polynomials have special mathematical properties that make them perfect for data availability:

  1. Can encode arbitrary data as coefficients
  2. Easy to evaluate at different points
  3. Support error correction (Reed-Solomon codes)
  4. Enable cryptographic operations (KZG commitments)

How Data Becomes a Polynomial

Example: Encode 4 bytes of data [10, 20, 30, 40]

Step 1: Treat bytes as polynomial coefficients

P(x) = 10 + 20x + 30x² + 40x³

Step 2: Evaluate at different points to create shards

P(0) = 10
P(1) = 10 + 20(1) + 30(1)² + 40(1)³ = 100
P(2) = 10 + 20(2) + 30(4) + 40(8) = 490
P(3) = 10 + 20(3) + 30(9) + 40(27) = 1420
...

Each evaluation P(i) becomes a shard.
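
A minimal sketch of this step in Python (toy integer arithmetic; the real implementation evaluates over the BLS12-381 scalar field and at many more points):

# Toy encoding: 4 data values become the coefficients of P(x); shards are evaluations.
data = [10, 20, 30, 40]

def eval_poly(coeffs, x):
    """Horner's rule: P(x) = c0 + c1*x + c2*x^2 + ..."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

shards = [eval_poly(data, i) for i in range(8)]   # 8 points = 2x redundancy
print(shards)   # [10, 100, 490, 1420, ...] -- matches the hand calculations above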

The Magic: Polynomial Interpolation

Fundamental Theorem: A degree-n polynomial is uniquely determined by any n+1 points with distinct x-coordinates.

Our example polynomial has degree 3, so we need any 4 evaluations to recover it completely.

Lagrange Interpolation Formula:

Given points (x₀, y₀), (x₁, y₁), ..., (xₙ, yₙ), reconstruct:

P(x) = Σᵢ yᵢ · Lᵢ(x)

where Lᵢ(x) = ∏ⱼ≠ᵢ (x - xⱼ) / (xᵢ - xⱼ)

This is how Reed-Solomon decoding works!
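
A sketch of that reconstruction in Python, using exact rational arithmetic (the real decoder works over a finite field): given any four evaluations of the degree-3 example above, Lagrange interpolation recovers P at any other point.

from fractions import Fraction

def lagrange_eval(points, x):
    """Evaluate the unique polynomial through `points` at x (exact arithmetic)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

# Any 4 evaluations of P(x) = 10 + 20x + 30x^2 + 40x^3 pin it down completely:
pts = [(1, 100), (2, 490), (3, 1420), (5, 5860)]
print([int(lagrange_eval(pts, k)) for k in range(4)])   # [10, 100, 490, 1420]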

In DAL

  • Original data: 126,944 bytes
  • Treated as: 64 polynomial coefficients (~1,983 bytes each)
  • Polynomial degree: 63
  • Minimum points needed: 64
  • Total shards created: 512 (8x redundancy)

Part 2: Reed-Solomon Erasure Coding

What is Erasure Coding?

Erasure coding adds redundancy so you can recover data even if parts are lost. Unlike simple replication, it's mathematically optimal.

How Reed-Solomon Works

Step 1: Create the polynomial

Data: D₀, D₁, D₂, ..., D₆₃ (64 chunks)
Polynomial: P(x) = D₀ + D₁x + D₂x² + ... + D₆₃x⁶³

Step 2: Generate shards

Shard₀ = P(0)
Shard₁ = P(1)
Shard₂ = P(2)
...
Shard₅₁₁ = P(511)

Step 3: Distribute shards

Each validator gets 32 shards (deterministically assigned).

Step 4: Recovery

Collect any 64 shards, use Lagrange interpolation to recover P(x), extract coefficients = original data.
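
A toy version of the full encode / lose / recover cycle over a small prime field (k = 4 data chunks, n = 8 shards). The field, chunk values, and helper names are assumptions of this sketch; production DAL uses FFT-based encoding over the BLS12-381 scalar field.

P_MOD = 2**31 - 1                     # toy prime field (not the DAL field)

def rs_encode(chunks, n):
    """Shard i = P(i) mod p, where the data chunks are P's coefficients."""
    return [sum(c * pow(i, e, P_MOD) for e, c in enumerate(chunks)) % P_MOD
            for i in range(n)]

def poly_mul(a, b):                   # multiply two coefficient lists mod P_MOD
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P_MOD
    return out

def rs_decode(pairs, k):
    """Recover the k coefficients from any k (index, shard) pairs via Lagrange interpolation."""
    coeffs = [0] * k
    for idx, (xi, yi) in enumerate(pairs):
        num, denom = [1], 1
        for jdx, (xj, _) in enumerate(pairs):
            if jdx != idx:
                num = poly_mul(num, [(-xj) % P_MOD, 1])     # multiply by (x - xj)
                denom = denom * (xi - xj) % P_MOD
        scale = yi * pow(denom, P_MOD - 2, P_MOD) % P_MOD   # divide by prod (xi - xj)
        for e, c in enumerate(num):
            coeffs[e] = (coeffs[e] + c * scale) % P_MOD
    return coeffs

data = [10, 20, 30, 40]
shards = rs_encode(data, 8)
# Lose half the shards; any 4 survivors are enough:
survivors = [(1, shards[1]), (3, shards[3]), (5, shards[5]), (7, shards[7])]
print(rs_decode(survivors, 4))        # [10, 20, 30, 40] -- original data recovered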

Visual Example

Original Data (4 chunks):

[A] [B] [C] [D]

After Reed-Solomon (8 shards with 2x redundancy):

[A] [B] [C] [D] [P1] [P2] [P3] [P4]
(original chunks)  (parity chunks)

If we lose 50%:

[A] [X] [C] [X] [P1] [X] [P3] [X]

Can still recover! Any 4 shards = full recovery

Key Properties

  1. Optimal redundancy: Minimum overhead for given fault tolerance
  2. Flexible recovery: Any subset of required size works
  3. No privileged shards: Original and parity shards are mathematically equivalent
  4. Deterministic: Same data → same shards always

In DAL

  • 8x redundancy: 64 original → 512 total shards
  • Fault tolerance: Can lose up to 448 shards (87.5%)
  • Recovery threshold: Any 64 shards suffice
  • Validator load: Each downloads 32 shards (6.25%)

Part 3: KZG Polynomial Commitments

What Problem Do KZG Commitments Solve?

Scenario: Publisher claims to have polynomial P(x) and provides commitment C.

Requirements:

  1. Commitment must cryptographically bind to P(x)
  2. Can't change P(x) after creating C
  3. Can verify individual evaluations P(i) without revealing P(x)
  4. Proofs must be small (constant size)

Traditional Approach: Merkle Trees

            Root
           /    \
        H12      H34
       /   \    /   \
     H1    H2  H3    H4
     |     |   |     |
     S1    S2  S3    S4   (shards)

Problems:

  • ❌ Proof size: O(log n) hashes = 9 hashes for 512 shards
  • ❌ Each hash ~32 bytes = ~288 bytes per proof
  • ❌ No mathematical structure for Reed-Solomon

KZG Approach: Polynomial Commitments

Key Idea: Use elliptic curves to "hide" polynomial evaluation at secret point τ.

Step 1: Trusted Setup

Generate a secret value τ (tau), then destroy it permanently.

Publish "powers of tau" encrypted in elliptic curve groups:

[τ⁰]₁, [τ¹]₁, [τ²]₁, [τ³]₁, ..., [τⁿ]₁

Where [x]₁ means "x encrypted as a point on elliptic curve G₁"

Security: Even with [τⁱ]₁, you can't extract τ (discrete log problem).

In Practice: Community ceremony with 1000+ participants. Only ONE needs to honestly destroy their secret for security.

Step 2: Create Commitment

Given polynomial:

P(x) = a₀ + a₁x + a₂x² + a₃x³

Compute commitment:

C = [P(τ)]₁ = [a₀ + a₁τ + a₂τ² + a₃τ³]₁

We can compute this WITHOUT knowing τ:

C = a₀·[τ⁰]₁ + a₁·[τ¹]₁ + a₂·[τ²]₁ + a₃·[τ³]₁

Result: 48-byte commitment that binds us to P(x).
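
A toy sketch of this computation, using exponentiation in a plain multiplicative group mod a prime in place of elliptic-curve points. The modulus, base, and coefficient values here are illustrative assumptions, not DAL parameters; the point is that the commitment is just a linear combination of the published setup points.

# Toy stand-in for the real scheme: "[x]" is modeled as g^x mod q instead of a curve point.
import secrets

q = 2**61 - 1                  # toy prime modulus
g = 3                          # toy base
r = q - 1                      # exponents are taken modulo the group order

# Trusted setup: publish [tau^0], [tau^1], ..., then forget tau.
tau = secrets.randbelow(r)
setup = [pow(g, pow(tau, i, r), q) for i in range(4)]

# Commit to P(x) = 10 + 20x + 30x^2 + 40x^3 WITHOUT using tau:
coeffs = [10, 20, 30, 40]
C = 1
for a, pt in zip(coeffs, setup):
    C = C * pow(pt, a, q) % q          # C = prod [tau^i]^(a_i) = [P(tau)]

# Sanity check, only possible here because the toy still remembers tau:
P_tau = sum(a * pow(tau, i, r) for i, a in enumerate(coeffs)) % r
print(C == pow(g, P_tau, q))           # True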

Step 3: Generate Proof

To prove P(i) = y:

Algebraic trick:

P(i) = y  ⟺  P(x) - y = Q(x)·(x - i)  for some Q(x)

Why? If P(i) = y, then (x - i) is a factor of (P(x) - y).

Compute quotient polynomial:

Q(x) = (P(x) - y) / (x - i)

Create proof:

π = [Q(τ)]₁

Proof size: 48 bytes (constant!)
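
A sketch of that division step over the integers (synthetic division; the real code works in the BLS12-381 scalar field). Note the remainder: it is zero exactly when the claimed value is correct, which is why no valid proof can even be formed for a wrong value.

def quotient_poly(p, i, y):
    """Return (Q, remainder) with P(x) - y = Q(x)*(x - i) + remainder, via synthetic division."""
    num = [p[0] - y] + list(p[1:])       # coefficients of P(x) - y, lowest degree first
    d = len(num) - 1
    q = [0] * d
    q[d - 1] = num[d]
    for e in range(d - 1, 0, -1):
        q[e - 1] = num[e] + i * q[e]
    return q, num[0] + i * q[0]

print(quotient_poly([10, 20, 30, 40], 2, 490))   # ([240, 110, 40], 0)  -- correct claim, exact division
print(quotient_poly([10, 20, 30, 40], 2, 491))   # remainder != 0       -- wrong value, no proof possible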

Step 4: Verify Proof

Verification equation (using pairings):

e(C - [y]₁, [1]₂) = e(π, [τ]₂ - [i]₂)

If equation holds: ✅ Proof valid, shard is correct

If equation fails: ❌ Proof invalid, shard is corrupt

Verification time: ~1 millisecond

Why This Works

Substitute the definitions:

Left side:  e([P(τ) - y]₁, [1]₂)
Right side: e([Q(τ)]₁, [τ - i]₂)

By pairing bilinearity:

e([P(τ) - y]₁, [1]₂) = e([1]₁, [1]₂)^(P(τ) - y)
e([Q(τ)]₁, [τ - i]₂) = e([1]₁, [1]₂)^(Q(τ)·(τ - i))

These are equal if and only if:

P(τ) - y = Q(τ)·(τ - i)

Which is true if and only if P(i) = y! 🎉
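
Putting Steps 2 through 4 together in toy field arithmetic (a sketch only: the field size and τ value below are assumptions, and the real verifier only ever sees the 48-byte C and π and runs the pairing check, never learning τ, P, or Q):

# Toy end-to-end check of the identity the pairing verifies.
r = 2**61 - 1                    # toy prime field

def ev(coeffs, x):               # evaluate a polynomial mod r
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % r
    return acc

P   = [10, 20, 30, 40]           # committed polynomial
tau = 987654321                  # the setup's secret point (toy value)
i   = 2
y   = ev(P, i)                   # claim: P(2) = 490

# Prover: Q(x) = (P(x) - y) / (x - i) via synthetic division
Q = [0] * (len(P) - 1)
Q[-1] = P[-1]
for e in range(len(P) - 2, 0, -1):
    Q[e - 1] = (P[e] + i * Q[e]) % r

C  = ev(P, tau)                  # stands in for the commitment [P(tau)]1
pi = ev(Q, tau)                  # stands in for the proof      [Q(tau)]1

# Verifier: the pairing equation holds exactly when P(tau) - y == Q(tau) * (tau - i)
print((C - y) % r == pi * (tau - i) % r)    # True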

KZG Properties

  • Binding: Can't change P(x) after commitment
  • Hiding: the commitment does not reveal the polynomial's coefficients (though anyone who already knows a candidate P(x) can check it against C)
  • Constant size: Commitment and proofs are always 48 bytes
  • Efficient: Fast verification (~1ms)
  • Batch-verifiable: Can verify multiple proofs together

In DAL

  • Commitment format: sh1... (51 characters base58-encoded)
  • Example: sh1zMUeaXFM8ndMU2FWdpoyhQSWGuKda6EuVqKyW2Gex5AkYLtQ...
  • Represents: Commitment to 126KB of data
  • Enables: Verification of 512 individual shards

Part 4: Elliptic Curve Pairings

What is a Pairing?

A pairing is a special mathematical function:

e: G₁ × G₂ → Gₜ

It takes two elliptic curve points (one from group G₁, one from G₂) and outputs a value in target group Gₜ.

The Magic Property: Bilinearity

e([a]₁, [b]₂) = e([1]₁, [1]₂)^(a·b)

What this means:

  • Multiplication "inside" the groups becomes exponentiation "outside"
  • Can check equations about encrypted values
  • Without ever decrypting them!

Example: Checking Multiplication

Goal: Verify that a · b = c without revealing a or b.

Given: [a]₁, [b]₂, [c]ₜ (encrypted values)

Check:

e([a]₁, [b]₂) ?= [c]ₜ

Using bilinearity:

e([a]₁, [b]₂) = e([1]₁, [1]₂)^(a·b)

So this checks if a·b = c!
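
The same check can be run against a real pairing. A minimal sketch using the open-source py_ecc Python library; the dependency itself, the module path py_ecc.bls12_381, the G2-first argument order of pairing, and exponentiating its result with ** are assumptions of this example and may differ between library versions.

# pip install py_ecc   (assumed dependency; not part of the DAL codebase)
from py_ecc.bls12_381 import G1, G2, multiply, pairing

a, b = 7, 11
A = multiply(G1, a)                  # [a]1
B = multiply(G2, b)                  # [b]2

lhs = pairing(B, A)                  # e([a]1, [b]2)   (py_ecc takes the G2 point first)
rhs = pairing(G2, G1) ** (a * b)     # e([1]1, [1]2)^(a*b)
print(lhs == rhs)                    # True: bilinearity lets us check a*b "in the exponent"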

Why Pairings Enable KZG

Remember we want to verify: P(i) = y

Algebraic identity:

P(i) = y  ⟺  P(τ) - y = Q(τ)·(τ - i)

Check with pairing:

e([P(τ) - y]₁, [1]₂) ?= e([Q(τ)]₁, [τ - i]₂)

Left side: e([1]₁, [1]₂)^(P(τ) - y)
Right side: e([1]₁, [1]₂)^(Q(τ)·(τ - i))

Equal if and only if P(τ) - y = Q(τ)·(τ - i), which is true if and only if P(i) = y!

The BLS12-381 Curve

Tezos DAL uses the BLS12-381 elliptic curve:

Parameters:

  • Field size: 381 bits
  • Security level: ~128 bits (equivalent to AES-128)
  • G₁ point (compressed): 48 bytes
  • G₂ point (compressed): 96 bytes
  • Pairing computation: ~1-2 milliseconds

Why BLS12-381?

  • Optimized for pairing operations
  • Good performance on modern CPUs
  • Well-studied and widely used (Ethereum, Zcash, Filecoin)
  • Strong security guarantees

Pairing-Based Cryptography Security

Based on hardness of:

  1. Discrete logarithm problem in G₁, G₂, Gₜ
  2. Computational Diffie-Hellman
  3. Decisional Diffie-Hellman

Attack complexity: ~2¹²⁸ operations (infeasible)

Quantum resistance: none. Shor's algorithm solves discrete logarithms efficiently, so a large-scale quantum computer would break these assumptions (see the trade-off discussion in Part 7).

Part 5: The Complete Flow

Publishing Flow

1. PREPARE DATA (Publisher/L2)

  • Original data: 126,944 bytes
  • Split into 64 chunks (~1,983 bytes each)
  • Treat as polynomial coefficients: P(x) = c₀ + c₁x + c₂x² + ... + c₆₃x⁶³

↓

2. ENCODE (DAL Node)

  • Apply Reed-Solomon encoding (8x redundancy)
  • Evaluate polynomial at 512 points: Shard₀ = P(0), Shard₁ = P(1), ..., Shard₅₁₁ = P(511)

↓

3. COMMIT (Publisher)

  • Create KZG commitment: C = [P(τ)]₁ using trusted setup
  • Result: 48-byte commitment
  • Generate a proof for each shard: for each i, πᵢ = [(P(x) - P(i))/(x - i) evaluated at τ]

↓

4. PUBLISH TO L1 (On-chain)

  • Submit commitment to blockchain:
    • Commitment: sh1... (51 characters)
    • Slot index: 0-31
    • Block level: N
  • Cost: ~800 mutez for publish operation

↓

5. DISTRIBUTE (P2P Network)

  • DAL node broadcasts via GossipSub: 512 shards + 512 proofs
  • Each validator receives its 32 assigned shards (deterministic by PKH/level/slot) and the 32 corresponding proofs

Attestation Flow

6. VERIFY (Validator's DAL Node)

  • For each received shard i with value vᵢ and proof πᵢ, check the pairing equation: e(C - [vᵢ]₁, [1]₂) ?= e(πᵢ, [τ]₂ - [i]₂)
  • If all 32 shards verify: ✅ Ready to attest
  • If any shard fails: ❌ Do not attest
  • Time per verification: ~1ms (~32ms for all shards)

↓

7. ATTEST (Validator)

  • 8 blocks after publish (attestation lag), if all assigned shards were received:
    • Sign attestation with companion key (BLS)
    • Submit to blockchain
    • Earn DAL rewards
  • Attestations are aggregated into a bitset: dal_attestation, one bit per slot (1 = attested, 0 = not)
  • Threshold: 66% of validators must attest

↓

8. RECONSTRUCT (L2 Consumer)

  • See on-chain that the slot was attested successfully
  • Contact validators and request shards; collect any 64 of the 512
  • Verify each shard against the commitment
  • Apply Lagrange interpolation to recover P(x) from the 64 evaluations
  • Extract coefficients = original 126KB data

Timeline Example

Level N: Slot published

  • Commitment: sh1zMUeaXFM8ndMU2FWdpoyhQ...
  • Shards distributed via P2P gossip
  • Validators begin downloading assigned shards

Level N+1 to N+7: Download & verify period

  • Validators receive shards
  • Validators verify KZG proofs
  • Validators store verified shards locally

Level N+8: Attestation due!

  • Validators check: Did I get all my shards?
  • If YES: Sign attestation (companion key)
  • Submit attestations to blockchain
  • Baker aggregates into bitset

Level N+8 onward: Data available

  • Bitset shows which slots attested
  • L2 consumers can reconstruct data
  • Window for retrieval (limited time)

Part 6: Security Analysis

Attack Scenario 1: Publisher Lies

Attack: Publisher creates commitment C but distributes wrong shards

Defense:

Validator verifies: e(C - [vᵢ]₁, [1]₂) ?= e(πᵢ, [τ]₂ - [i]₂)

If shard is wrong:
  → KZG proof won't verify
  → Validator rejects shard
  → Validator doesn't attest
  → Slot not confirmed

Result: ✅ Attack fails, detected by validators

Attack Scenario 2: Fake Commitment

Attack: Create commitment C' that "commits" to different data

Required:

  • Break discrete log on BLS12-381
  • Find a different polynomial P'(x) with [P'(τ)]₁ = [P(τ)]₁

Difficulty:

  • ~2¹²⁸ ≈ 3×10³⁸ operations
  • Far beyond the combined capacity of every computer ever built
  • Would take millions of years with all computing power on Earth

Result: ✅ Computationally infeasible

Attack Scenario 3: Forge Proof

Attack: Create fake proof π for wrong value

Required:

  • Make equation hold: e(C - [fake_value]₁, [1]₂) = e(π, [τ]₂ - [i]₂)
  • Without knowing τ

Difficulty:

  • Would require solving discrete log problem
  • Or breaking pairing function
  • Both ~2¹²⁸ security level

Result: ✅ Cryptographically infeasible

Attack Scenario 4: Withhold Shards (Censorship)

Attack: Malicious validators refuse to distribute shards

Defense:

  • 8x redundancy: more than 448 of the 512 shards (over 87.5%) must be withheld before recovery becomes impossible
  • Attestation threshold: 66% required
  • If attack succeeds: Slot not attested (detected on-chain)

Economics:

  • Malicious validators lose rewards
  • Attack is expensive (need to control 87.5%)
  • Attack is detectable (low attestation rate)

Result: ✅ Very expensive, easily detected

Security Summary

Attack Vector     | Difficulty              | Detection         | Result
Fake commitment   | 2¹²⁸ ops                | Automatic         | ✅ Infeasible
Forge proof       | 2¹²⁸ ops                | Automatic         | ✅ Infeasible
Wrong shards      | Easy                    | Cryptographic     | ✅ Detected & rejected
Withhold shards   | 87.5% control           | On-chain metrics  | ✅ Very expensive
Compromise setup  | All 1000+ participants  | —                 | ✅ Extremely unlikely

Overall Security: Very high confidence based on:

  • Strong cryptographic assumptions (discrete log, pairings)
  • Economic incentives (rewards for honesty)
  • Redundancy (8x factor)
  • Distributed trust (no single point of failure)

Part 7: Why This Design?

Decision Matrix

Property           | Hash | Merkle | KZG  | Verkle | FRI
Proof size         | N/A  | ~300B  | 48B  | ~200B  | ~100KB
Verify time        | Fast | Fast   | Fast | Fast   | Slow
Shard verification | ❌   | ✅     | ✅   | ✅     | ✅
Math structure     | ❌   | ❌     | ✅   | ✅     | ✅
Trusted setup      | No   | No     | Yes  | Yes    | No
Quantum-safe       | Yes  | Yes    | No   | No     | Yes

Tezos chose KZG because:

  1. Smallest proofs (48 bytes) → lowest bandwidth
  2. Fastest verification (~1ms) → real-time block production
  3. Perfect match with Reed-Solomon → polynomial structure
  4. Battle-tested → used in production (Ethereum, Filecoin)
  5. Trade-offs accepted: requires a trusted setup ceremony and is not quantum-safe

Design Trade-offs

Redundancy Factor (8x chosen)

Factor | Recovery Threshold | Fault Tolerance | Bandwidth
4x     | 25%                | 75%             | Lower
8x     | 12.5%              | 87.5%           | Medium
16x    | 6.25%              | 93.75%          | Higher

Decision: 8x provides excellent fault tolerance without excessive bandwidth

Shard Count (512 chosen)

Count | Shard Size | Granularity | Flexibility
256   | ~500B      | Lower       | Lower
512   | ~248B      | Good        | Good
1024  | ~124B      | Higher      | Higher

Decision: 512 balances shard size, network overhead, and assignment flexibility

Practical Parameters

DAL Configuration (Seoul Protocol)

Slots:

  • per_block: 32
  • size_bytes: 126944

Shards:

  • per_slot: 512
  • size_bytes: ~248
  • redundancy_factor: 8
  • recovery_threshold: 64

Commitments:

  • type: KZG
  • curve: BLS12-381
  • size_bytes: 48

Timing:

  • attestation_lag: 8 blocks
  • attestation_window: ~80 seconds

Thresholds:

  • attestation_required: 66%
  • participation_sufficient: 64%

Validator Configuration

Per-slot resource requirements:

  • Download: 32 shards × ~248 bytes ≈ 7,934 bytes
  • Verification: 32 proofs × 1ms = ~32ms
  • Storage: Temporary (attestation window only)
  • Bandwidth: ~8KB per slot

Example calculation for a validator with 1% stake:

  • Expected slots per cycle: ~15
  • Data per cycle: 15 × 7,934 = ~119KB
  • Verification time: 15 × 32ms = ~480ms total
  • Rewards: ~8-10 ꜩ per cycle
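
These figures are straightforward to reproduce; a quick sketch of the arithmetic (the per-cycle slot count and reward range are the rough estimates quoted above, not protocol constants):

slot_size = 126_944            # bytes per slot
num_shards = 512
assigned = 32                  # shards per validator per slot
verify_ms = 1                  # ~1 ms per KZG proof check

shard_size = slot_size / num_shards              # ~248 bytes
per_slot_download = assigned * shard_size        # ~7,934 bytes
per_slot_verify = assigned * verify_ms           # ~32 ms

slots_per_cycle = 15                             # rough expectation at ~1% stake
print(f"{shard_size:.0f} B/shard, "
      f"{per_slot_download:,.0f} B and {per_slot_verify} ms per slot, "
      f"{slots_per_cycle * per_slot_download / 1000:.0f} kB per cycle")
# -> 248 B/shard, 7,934 B and 32 ms per slot, 119 kB per cycle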

Performance Characteristics

Operation                       | Time   | Size
Commitment creation             | ~50ms  | 48 bytes
Proof generation (per shard)    | ~10ms  | 48 bytes
Proof verification (per shard)  | ~1ms   | 48-byte input
Batch verification (10 proofs)  | ~5ms   | —
Data recovery                   | ~100ms | Any 64 shards

Conclusion

What We've Learned

The Tezos Data Availability Layer uses a sophisticated cryptographic stack:

  1. Polynomial Representation: Data encoded as math objects
  2. Reed-Solomon Coding: 8x redundancy enables 87.5% fault tolerance
  3. KZG Commitments: Constant-size proofs using elliptic curves
  4. Pairing-Based Verification: Check proofs without revealing data

Key Takeaways

  • Efficient: 48-byte commitments, 48-byte proofs, 1ms verification
  • Secure: Based on well-studied cryptographic assumptions
  • Scalable: Each validator only handles 6.25% of data
  • Robust: Tolerates 87.5% data loss
  • Flexible: Any 64 shards can reconstruct full data

The Big Picture

Traditional Approach:
Every node stores everything → Doesn't scale

DAL Approach:
Distribute data + Cryptographic proofs → Scales!

This same approach is being adopted by:

  • Ethereum (EIP-4844 proto-danksharding)
  • Celestia (data availability layer)
  • Avail (standalone DA)

Tezos was early to production with this technology!

Further Reading

Academic Papers:

  • "Constant-Size Commitments to Polynomials and Their Applications" (Kate, Zaverucha, Goldberg, 2010) - Original KZG paper
  • "A Practical Introduction to Polynomial Commitment Schemes" (Gabizon, 2019)
  • "Proofs, Arguments, and Zero-Knowledge" (Thaler, 2023) - Comprehensive reference

Document Version: 1.0
Last Updated: October 2025
Author: DAL Technical Documentation Team
License: CC BY 4.0