DDTree

Accelerating Speculative Decoding with Block Diffusion Draft Trees

Liran Ringel, Yaniv Romano

DDTree (Diffusion Draft Tree) builds a draft tree from one block diffusion pass, then verifies the whole tree in one target-model forward pass.


Side by side speed comparison

Autoregressive decoding, DFlash, and DDTree run on the same prompt, with the lower DDTree verification panel showing the draft tree and accepted path.

One DDTree decoding round

A single round starts from the previous bonus token, builds a draft tree, verifies it with one forward pass, and carries the next bonus token forward.

The block diffusion draft model runs one forward pass and outputs per-position distributions for the next block.

Speedups relative to autoregressive decoding

Speedups across datasets, per model and temperature.

Example: on HumanEval at T=0.0 with Qwen3-30B-MoE, DDTree reaches 8.22× versus 6.09× for DFlash, a gain of +2.13×.

Chart: Speedup Relative to Autoregressive Decoding (legend: DFlash, DFlash + DDTree). The chart contains six series, listed here in their original order; the model, temperature, and method labels of each series are not part of the text.

| Dataset | Series 1 | Series 2 | Series 3 | Series 4 | Series 5 | Series 6 |
|---|---|---|---|---|---|---|
| MATH-500 | 7.5× | 6.6× | 7.5× | 6.6× | 6.2× | 5.9× |
| GSM8K | 6.6× | 6.2× | 6.8× | 6.3× | 5.8× | 5.7× |
| AIME 2024 | 7.3× | 5.3× | 7.3× | 5.4× | 5.9× | 4.9× |
| AIME 2025 | 7.2× | 5.1× | 7.0× | 5.3× | 5.9× | 5.0× |
| HumanEval | 6.8× | 6.4× | 6.9× | 6.1× | 8.2× | 7.9× |
| MBPP | 6.4× | 6.1× | 6.4× | 5.9× | 7.7× | 7.5× |
| LiveCodeBench | 6.8× | 6.4× | 7.1× | 6.5× | 6.5× | 5.5× |
| SWE-bench Lite | 4.3× | 3.7× | 4.2× | 3.5× | 4.4× | 3.8× |
| MT-Bench | 4.2× | 3.9× | 4.1× | 3.8× | 3.3× | 3.1× |
| Alpaca | 3.3× | 3.2× | 3.4× | 3.2× | 2.5× | 2.4× |

How DDTree works

Vanilla DFlash already delivers strong acceptance lengths and speedups while keeping drafting cheap, because it predicts a whole block in one diffusion pass. However, it verifies only one drafted trajectory per round, so most of the drafter's output is never used. DDTree instead uses the drafter's per-position predictions to build a tree of likely continuations rather than collapsing them into a single path.

At each round, DDTree builds a draft tree under a fixed node budget, choosing the branches that look most promising. The target model then verifies the whole tree in one forward pass with tree attention.
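The budgeted tree construction and the tree attention pattern can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a path's score is the sum of the drafter's per-position log-probabilities (the text above does not spell out the scoring rule), and the names `build_draft_tree` and `tree_attention_mask`, as well as the input format, are hypothetical.

```python
import heapq


def build_draft_tree(log_probs, budget):
    """Best-first tree construction under a node budget (illustrative sketch).

    log_probs[d] is a non-empty dict mapping token -> log-probability at
    block position d (an assumed input format). Returns a list of nodes
    (parent_index, depth, token); parent_index -1 means the node hangs
    directly off the root, i.e. the current bonus token.
    """
    # Candidates at each position, most likely first.
    ranked = [sorted(p.items(), key=lambda kv: -kv[1]) for p in log_probs]
    nodes = []
    if not ranked or budget <= 0:
        return nodes
    counter = 0  # tie-breaker so heap entries never compare beyond this field
    # Heap entry: (-path_score, counter, parent_index, depth, rank_at_depth)
    heap = [(-ranked[0][0][1], counter, -1, 0, 0)]
    while heap and len(nodes) < budget:
        neg_score, _, parent, depth, rank = heapq.heappop(heap)
        token, lp = ranked[depth][rank]
        nodes.append((parent, depth, token))
        idx = len(nodes) - 1
        # Next-best sibling: same parent, next-ranked token at this depth.
        if rank + 1 < len(ranked[depth]):
            counter += 1
            sibling_score = -neg_score - lp + ranked[depth][rank + 1][1]
            heapq.heappush(heap, (-sibling_score, counter, parent, depth, rank + 1))
        # Best child: top-ranked token at the next block position.
        if depth + 1 < len(ranked):
            counter += 1
            child_score = -neg_score + ranked[depth + 1][0][1]
            heapq.heappush(heap, (-child_score, counter, idx, depth + 1, 0))
    return nodes


def tree_attention_mask(nodes):
    """mask[i][j] is True when tree node i may attend to tree node j.

    Each node sees itself and its ancestors only. In a real verifier every
    node would also attend to the already-committed context, omitted here.
    """
    n = len(nodes)
    mask = [[False] * n for _ in range(n)]
    for i, (parent, _, _) in enumerate(nodes):
        mask[i][i] = True
        while parent != -1:
            mask[i][parent] = True
            parent = nodes[parent][0]
    return mask
```

Because a child is pushed onto the heap only when its parent is popped, every node's parent appears earlier in the returned list, which keeps the tree well-formed at any budget.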

The procedure remains lossless: the target model uses its own decoding rule, so DDTree preserves exactly the target model's output distribution. The verifier walks down the draft tree for as long as the token chosen by the target model matches a child of the current node, then commits the matched prefix. When the walk leaves the tree, DDTree stops and carries the first unmatched target token into the next round as the new bonus token.
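The walk described above can be sketched in a few lines. This is an illustrative sketch under assumed data structures, not the authors' code: `children` and `target_choice` are hypothetical names, and `target_choice` stands in for the tokens the target model selects at every tree node from its single verification pass.

```python
def verify_walk(children, target_choice, root=0):
    """Follow the target model's choices through the draft tree.

    children:      dict node_id -> {token: child_node_id} (assumed layout)
    target_choice: dict node_id -> token the target model picks after that
                   node, all obtained from one verification forward pass
    Returns (accepted_tokens, next_bonus_token).
    """
    node = root
    accepted = []
    while True:
        token = target_choice[node]
        child = children.get(node, {}).get(token)
        if child is None:
            # The walk left the tree: this token becomes the next bonus token.
            return accepted, token
        accepted.append(token)
        node = child


# Usage: the root (id 0) is the previous bonus token.
children = {0: {"the": 1, "a": 2}, 1: {"cat": 3}}
target_choice = {0: "the", 1: "cat", 2: "dog", 3: "sat"}
accepted, bonus = verify_walk(children, target_choice)
```

Every committed token is one the target model itself selected, which is why the output matches what the target model would have produced on its own.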

**Budget tradeoff**
On MATH-500, acceptance length rises steadily as the DDTree node budget grows, but speedup peaks at an intermediate budget once verifier cost becomes dominant.
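One way to read this tradeoff is through a simple per-round cost accounting. The symbols below are assumptions introduced for illustration and do not come from the paper: $\tau(B)$ is the average number of tokens committed per round at node budget $B$ (including the bonus token), $T_{\text{AR}}$ is the time of one autoregressive step, $T_{\text{draft}}$ is the time of one drafter pass, and $T_{\text{verify}}(B)$ is the time of one tree verification pass.

```latex
\[
\text{speedup}(B) \;\approx\;
\frac{\tau(B)\, T_{\text{AR}}}{T_{\text{draft}} + T_{\text{verify}}(B)}
\]
```

Under this model, increasing $B$ helps only while $\tau(B)$ grows faster than the denominator. Once $T_{\text{verify}}(B)$ dominates, further gains in acceptance length no longer pay for the larger tree.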

**Acceptance histogram**
On MATH-500, DDTree shifts mass toward longer accepted prefixes, making short acceptances rarer and full-block acceptances substantially more common.

Citation

@article{ringel2026ddtree,
  title={Accelerating Speculative Decoding with Block Diffusion Draft Trees},
  author={Ringel, Liran and Romano, Yaniv},
  journal={arXiv preprint arXiv:2604.12989},
  year={2026}
}