DDTree

Accelerating Speculative Decoding with Block Diffusion Draft Trees

DDTree (Diffusion Draft Tree) builds a draft tree from one block diffusion pass, then verifies the whole tree in one target-model forward pass.

Side-by-side speed comparison

Autoregressive decoding, DFlash, and DDTree run on the same prompt, with the lower DDTree verification panel showing the draft tree and accepted path.

One DDTree decoding round

A single round starts from the previous bonus token, builds a draft tree, verifies it with one forward pass, and carries the next bonus token forward.

The previous bonus token enters as the root of the tree. A draft tree is selected, and the target model scores the whole tree in one forward pass with tree attention. The walk then follows matching children, and the first unmatched token becomes the next bonus token.

Figure: one DDTree decoding round. The context is followed by mask tokens (m) for positions 1 to 3, which yield candidate tokens per position (b, c, ... at position 1; d, e, f, ... at position 2; g, h, i, ... at position 3). A single verifier forward pass scores the tree; a is the previous bonus token and a' is the next bonus token.
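The tree attention mentioned in this round can be pictured as an ancestor mask: each tree node attends to itself and to the nodes on its path back to the root, but not to sibling branches. The sketch below is a minimal illustration under that assumption, not the project's implementation. The `parents` encoding and the function name `tree_attention_mask` are hypothetical, and attention to the shared context (which every node would also see) is left out for brevity.

```python
from typing import List, Optional


def tree_attention_mask(parents: List[Optional[int]]) -> List[List[bool]]:
    """mask[i][j] is True when node i may attend to node j,
    i.e. when j is node i itself or one of its ancestors."""
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j: Optional[int] = i
        while j is not None:       # climb from node i up to the root
            mask[i][j] = True
            j = parents[j]
    return mask


# Toy tree (hypothetical): node 0 is the root (previous bonus token),
# nodes 1 and 2 are its children, node 3 is a child of node 1.
parents = [None, 0, 0, 1]
mask = tree_attention_mask(parents)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x...
# xx..
# x.x.
# xx.x
```

Because sibling branches never see each other, every root-to-node path is scored as if it were the only continuation, which is what lets one forward pass stand in for many separate ones.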

The block diffusion draft model runs one forward pass and outputs per-position distributions for the next block.

Speedups relative to autoregressive decoding

Speedups across datasets, per model and temperature.

Highlighted example: HumanEval, T=0.0, Qwen3-30B-MoE: 8.22× with DDTree versus 6.09× with DFlash, a gain of +2.13×.

Speedup relative to autoregressive decoding (chart legend: DFlash, DFlash + DDTree). Each column is one model-and-temperature panel of the chart, in page order.

Dataset         Panel 1  Panel 2  Panel 3  Panel 4  Panel 5  Panel 6
MATH-500        7.5×     6.6×     7.5×     6.6×     6.2×     5.9×
GSM8K           6.6×     6.2×     6.8×     6.3×     5.8×     5.7×
AIME 2024       7.3×     5.3×     7.3×     5.4×     5.9×     4.9×
AIME 2025       7.2×     5.1×     7.0×     5.3×     5.9×     5.0×
HumanEval       6.8×     6.4×     6.9×     6.1×     8.2×     7.9×
MBPP            6.4×     6.1×     6.4×     5.9×     7.7×     7.5×
LiveCodeBench   6.8×     6.4×     7.1×     6.5×     6.5×     5.5×
SWE-bench Lite  4.3×     3.7×     4.2×     3.5×     4.4×     3.8×
MT-Bench        4.2×     3.9×     4.1×     3.8×     3.3×     3.1×
Alpaca          3.3×     3.2×     3.4×     3.2×     2.5×     2.4×

How DDTree works

Vanilla DFlash already delivers strong acceptance lengths and speedups, and it keeps drafting cheap by predicting a whole block in one diffusion pass. However, it verifies only one drafted trajectory per round, so most of the drafter's per-position output is never used. DDTree instead uses those per-position predictions to build a tree of likely continuations rather than collapsing them into a single path.

At each round, DDTree builds a draft tree under a fixed node budget, choosing the branches that look most promising. The target model then verifies the whole tree in one forward pass with tree attention.
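One plausible way to choose "the branches that look most promising" under a node budget is a best-first search over the drafter's per-position distributions. The sketch below is an illustration only, not necessarily DDTree's selection rule. It assumes that a path is scored by the product of its per-position probabilities and that all probabilities are strictly positive. The function name `build_draft_tree` and the toy distributions are hypothetical, and the returned paths hang below the root (the previous bonus token), which is not listed.

```python
import heapq
import math
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


def build_draft_tree(probs: List[Dict[str, float]], budget: int) -> List[Path]:
    """Pick up to `budget` token paths below the root, highest score first.

    probs[d] maps each candidate token at block position d to its probability.
    Assumption: a path's score is the product of its per-position probabilities.
    """
    # Rank each position's candidates once, most probable first.
    ranked = [sorted(p.items(), key=lambda kv: -kv[1]) for p in probs]
    nodes: List[Path] = []
    if budget <= 0 or not ranked or not ranked[0]:
        return nodes
    first_tok, first_p = ranked[0][0]
    # Heap entry: (cost, path, rank of the last token at its depth, parent cost),
    # where cost is the negative log-probability of the path.
    heap = [(-math.log(first_p), (first_tok,), 0, 0.0)]
    while heap and len(nodes) < budget:
        cost, path, rank, parent_cost = heapq.heappop(heap)
        nodes.append(path)
        depth = len(path) - 1
        # Candidate 1: the next-best sibling (same parent, next-ranked token).
        if rank + 1 < len(ranked[depth]):
            tok, p = ranked[depth][rank + 1]
            heapq.heappush(
                heap,
                (parent_cost - math.log(p), path[:-1] + (tok,), rank + 1, parent_cost),
            )
        # Candidate 2: the best child (extend the path by one position).
        if depth + 1 < len(ranked) and ranked[depth + 1]:
            tok, p = ranked[depth + 1][0]
            heapq.heappush(heap, (cost - math.log(p), path + (tok,), 0, cost))
    return nodes


# Toy per-position distributions (made up for illustration).
probs = [{"b": 0.7, "c": 0.3}, {"d": 0.6, "e": 0.4}, {"g": 0.8, "h": 0.2}]
tree = build_draft_tree(probs, budget=5)
for path in tree:
    print(" ".join(path))
# b
# b d
# b d g
# c
# b e
```

A node is only pushed onto the heap after its parent or its better-ranked sibling has been selected, so the selected set is always a connected tree, whatever the budget.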

The procedure remains lossless: the target model uses its own decoding rule, so DDTree preserves the target model's output distribution exactly. Starting at the root, the verifier walks down the tree for as long as the token chosen by the target model matches a child of the current node, and it commits the matched prefix. When the walk leaves the tree, DDTree stops and carries the first unmatched target token into the next round as the new bonus token.
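The walk described above can be sketched in a few lines. This is a minimal illustration with hypothetical names (`verify_walk`, `children`, `target_choice`): `children` maps a node id to its child tokens, and `target_choice[node]` stands in for whatever token the target model's own decoding rule (argmax or a sample) picks after that node, as read off the single verification pass.

```python
from typing import Dict, List, Tuple


def verify_walk(
    children: Dict[int, Dict[str, int]],
    target_choice: Dict[int, str],
    root: int = 0,
) -> Tuple[List[str], str]:
    """Follow the target model's choices through the draft tree.

    Returns (accepted tokens, next bonus token).
    """
    accepted: List[str] = []
    node = root
    while True:
        tok = target_choice[node]                  # the target model's token here
        child = children.get(node, {}).get(tok)    # is it one of the drafted children?
        if child is None:
            return accepted, tok                   # walk leaves the tree: tok is the bonus
        accepted.append(tok)                       # match: commit and descend
        node = child


# Toy tree (hypothetical): root 0 has children b -> 1 and c -> 2,
# node 1 has children d -> 3 and e -> 4, node 3 has child g -> 5.
children = {0: {"b": 1, "c": 2}, 1: {"d": 3, "e": 4}, 3: {"g": 5}}
target_choice = {0: "b", 1: "d", 2: "d", 3: "h", 4: "g", 5: "z"}
accepted, bonus = verify_walk(children, target_choice)
print(accepted, bonus)  # ['b', 'd'] h
```

Every committed token, including the bonus token, is one the target model chose itself, which is why the draft tree affects speed but not the output.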

Budget tradeoff

On MATH-500, acceptance length rises steadily as the DDTree node budget grows, but speedup peaks at an intermediate budget once verifier cost becomes dominant.
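A simple cost model makes this tradeoff concrete. It is a back-of-the-envelope sketch under stated assumptions, not a formula from the source: let $B$ be the node budget, $\tau(B)$ the average number of tokens committed per round, $t_{\mathrm{AR}}$ the time of one autoregressive target step, $t_{\mathrm{draft}}$ the time of one drafter pass, and $t_{\mathrm{verify}}(B)$ the time of one verification pass over a tree of $B$ nodes. All of these symbols are introduced here for illustration.

```latex
\[
  \mathrm{speedup}(B) \;\approx\;
  \frac{\tau(B)\, t_{\mathrm{AR}}}{t_{\mathrm{draft}} + t_{\mathrm{verify}}(B)}
\]
```

Under this model, $\tau(B)$ keeps growing with $B$ but is capped by the block length, while $t_{\mathrm{verify}}(B)$ eventually grows with the number of nodes scored. Once the denominator grows faster than the numerator, the ratio falls, which is consistent with a peak at an intermediate budget.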

Acceptance histogram

On MATH-500, DDTree shifts mass toward longer accepted prefixes, making short acceptances rarer and full-block acceptances substantially more common.