
Simplify block production (leanSpec PR #464) #251

Draft

pablodeymo wants to merge 1 commit into main from simplify-block-production


Conversation

@pablodeymo

Motivation

leanSpec PR #464 ("simplify block production") rewrites build_block() to work directly with aggregated payloads keyed by AttestationData instead of individual Attestation objects. This eliminates an unnecessary round-trip: the old approach extracted individual attestations from aggregated payloads, filtered them, then re-aggregated. The new approach filters and selects proofs directly from the payload map.

Spec PR: leanEthereum/leanSpec#464

Description

Storage layer (crates/storage/src/store.rs)

  • PayloadBuffer::grouped_by_data_root() — New method that groups buffer entries by data_root (the tree hash root of AttestationData), deduplicating proofs by comparing their participant bitfield bytes. Returns HashMap<H256, Vec<AggregatedSignatureProof>>.
  • Store::known_payloads_by_data_root() — Public wrapper exposing the grouped payloads to the blockchain crate.
  • Added AggregatedSignatureProof to the import list.
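The grouping-and-dedup behavior described above can be sketched as follows. The types here are simplified stand-ins, not the crate's actual definitions; the real method operates on PayloadBuffer entries and real H256/proof types.

```rust
use std::collections::HashMap;

// Simplified stand-in types for illustration only.
type H256 = [u8; 32];

#[derive(Clone, PartialEq)]
struct AggregatedSignatureProof {
    participants: Vec<u8>, // participant bitfield bytes
}

// Sketch: group buffer entries by data_root, deduplicating proofs whose
// participant bitfields are byte-identical.
fn grouped_by_data_root(
    entries: &[(H256, AggregatedSignatureProof)],
) -> HashMap<H256, Vec<AggregatedSignatureProof>> {
    let mut grouped: HashMap<H256, Vec<AggregatedSignatureProof>> = HashMap::new();
    for (data_root, proof) in entries {
        let bucket = grouped.entry(*data_root).or_default();
        // Dedup by comparing participant bitfield bytes.
        if !bucket.iter().any(|p| p.participants == proof.participants) {
            bucket.push(proof.clone());
        }
    }
    grouped
}
```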

Blockchain layer (crates/blockchain/src/store.rs)

New: extend_proofs_greedily()

Greedy set-cover algorithm that selects proofs maximizing new validator coverage for a given attestation data entry. Each selected proof produces one AggregatedAttestation in the block body. This replaces both aggregate_attestations_by_data() and select_aggregated_proofs().
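A minimal sketch of greedy set-cover proof selection, with participation bitfields modeled as `Vec<bool>` for clarity (illustrative only; the real function works on `AggregatedSignatureProof` bitfields and emits `AggregatedAttestation`s):

```rust
// Greedy set cover: repeatedly pick the proof covering the most
// not-yet-covered validators, stopping when no proof adds new coverage.
// Returns the indices of the selected proofs.
fn extend_proofs_greedily(proofs: &[Vec<bool>]) -> Vec<usize> {
    let mut covered = vec![false; proofs.first().map_or(0, |p| p.len())];
    let mut selected = Vec::new();
    loop {
        // Compute each unselected proof's marginal coverage gain.
        let best = proofs
            .iter()
            .enumerate()
            .filter(|(i, _)| !selected.contains(i))
            .map(|(i, p)| {
                let gain = p.iter().zip(&covered).filter(|(v, c)| **v && !**c).count();
                (i, gain)
            })
            .max_by_key(|&(_, gain)| gain);
        match best {
            Some((i, gain)) if gain > 0 => {
                // Mark this proof's validators as covered and select it.
                for (c, v) in covered.iter_mut().zip(&proofs[i]) {
                    *c |= *v;
                }
                selected.push(i);
            }
            _ => return selected,
        }
    }
}
```

Note that iterating a slice (rather than a `HashSet`) makes tie-breaking deterministic, which is relevant to the review comments below.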

Rewritten: build_block()

Before: Took &[Attestation] (individual per-validator attestations) and a HashMap<SignatureKey, Vec<AggregatedSignatureProof>> keyed by (validator_id, data_root). Used a fixed-point loop to collect individual attestations, then called aggregate_attestations_by_data() to group them, then select_aggregated_proofs() to find proofs. Returned (Block, State, Vec<AggregatedSignatureProof>).

After: Takes HashMap<H256, (AttestationData, Vec<AggregatedSignatureProof>)> keyed by data_root. Sorts entries by target.slot for deterministic order, filters by known block roots and source checkpoint match, then calls extend_proofs_greedily() directly. Handles genesis edge case (slot 0 justified root = parent_root). Loops when justification advances to unlock attestation data with new source checkpoints. Returns (Block, Vec<AggregatedSignatureProof>).
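The deterministic ordering and source-checkpoint filtering can be sketched as below, with hypothetical simplified types; the real build_block additionally tracks processed roots, handles the genesis case, and re-runs the loop when justification advances.

```rust
use std::collections::HashMap;

// Simplified stand-in types for illustration only.
type H256 = [u8; 32];

#[derive(Clone, Copy, PartialEq)]
struct Checkpoint { root: H256, slot: u64 }

#[derive(Clone)]
struct AttestationData { source: Checkpoint, target: Checkpoint }

// Sketch: order entries deterministically by target.slot, then keep only
// those whose source matches the current justified checkpoint.
fn eligible_entries(
    payloads: &HashMap<H256, AttestationData>,
    current_justified: Checkpoint,
) -> Vec<(H256, AttestationData)> {
    let mut entries: Vec<_> = payloads
        .iter()
        .map(|(root, data)| (*root, data.clone()))
        .collect();
    entries.sort_by_key(|(_, data)| data.target.slot);
    entries.retain(|(_, data)| data.source == current_justified);
    entries
}
```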

Simplified: produce_block_with_signatures()

Before: Called iter_known_aggregated_payloads(), then extract_latest_attestations() to reconstruct per-validator Attestation objects, then re-keyed the payloads by SignatureKey for build_block().

After: Calls known_payloads_by_data_root() to get deduplicated proofs grouped by data_root, resolves AttestationData for each via get_attestation_data_by_root(), and passes the result directly to build_block().

Routing change: aggregate_committee_signatures()

Own aggregation output now goes directly to known_payloads (immediately usable for block building and fork choice) instead of new_payloads. Gossip-received aggregated attestations still go through new -> known promotion at intervals 0/4.

Deleted

  • aggregate_attestations_by_data() — Functionality absorbed into extend_proofs_greedily()
  • select_aggregated_proofs() — Functionality absorbed into extend_proofs_greedily()

Backward Compatibility

  • Block format: Unchanged. Wire-compatible.
  • Storage format: Unchanged. No migration needed.
  • Public API: produce_block_with_signatures() same signature and return type.
  • Gossip handling: on_gossip_attestation, on_gossip_aggregated_attestation, on_block_core unchanged.
  • Fork choice: extract_latest_attestations and related methods unchanged.

How to Test

make fmt    # ✅ Passes
make lint   # ✅ Passes
make test   # ✅ All 107 tests pass (27 forkchoice, 8 signature, 14 STF, rest unit)

For full validation:

  • make run-devnet with --is-aggregator: verify that blocks contain attestations and that justified_slot and finalized_slot advance
  • Multi-client devnet: .claude/skills/test-pr-devnet/scripts/test-branch.sh

…leanSpec PR #464)

Rewrite build_block() to filter and select proofs directly from a
payload map keyed by data_root, eliminating the round-trip of extracting
individual attestations from aggregated payloads, filtering them, then
re-aggregating. This matches the simplified approach in leanSpec.

Key changes:
- Add PayloadBuffer::grouped_by_data_root() for proof deduplication
- Replace aggregate_attestations_by_data() + select_aggregated_proofs()
  with extend_proofs_greedily() greedy set-cover selection
- Simplify produce_block_with_signatures() to resolve AttestationData
  per data_root instead of reconstructing per-validator attestations
- Route own aggregation output directly to known_payloads (immediately
  usable for block building) instead of going through new -> known
@github-actions

🤖 Codex Code Review

  1. High: block production can now include multiple votes from the same validator across different attestation data, which can over-count justification/finalization weight.

    The old path reduced the pool to the latest attestation per validator before block assembly via crates/storage/src/store.rs:867. This PR replaces that with known_payloads_by_data_root() in crates/blockchain/src/store.rs:798 and crates/storage/src/store.rs:151, then build_block greedily includes proofs per data_root without any global per-validator dedup across roots in crates/blockchain/src/store.rs:948 and crates/blockchain/src/store.rs:1037. If validator v first attests to target A and later to target B, both roots can now be packed into one block. process_attestations counts votes independently per target root in crates/blockchain/state_transition/src/lib.rs:283, so the same validator can contribute to multiple checkpoint justifications in the same block. For consensus code, that is a correctness bug. The fix is to restore a latest-per-validator filter before block assembly, or maintain a global “already represented validators” set across all selected proofs.
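A minimal sketch of the second suggested fix — a global "already represented validators" set carried across all selected proofs — using illustrative types (validator-id lists rather than the crate's bitfields):

```rust
use std::collections::HashSet;

// Sketch: track validators already represented by any selected proof,
// across all data roots, and skip a candidate proof when it adds no
// validator that is not already represented.
fn select_with_global_dedup(
    proofs_by_root: &[Vec<Vec<u64>>], // per data_root: proofs as validator-id lists
) -> Vec<Vec<u64>> {
    let mut represented: HashSet<u64> = HashSet::new();
    let mut selected = Vec::new();
    for proofs in proofs_by_root {
        for proof in proofs {
            // Validators in this proof not yet represented anywhere.
            let fresh: Vec<u64> = proof
                .iter()
                .copied()
                .filter(|v| !represented.contains(v))
                .collect();
            if !fresh.is_empty() {
                represented.extend(&fresh);
                selected.push(proof.clone());
            }
        }
    }
    selected
}
```

Note this only prevents a proof that adds zero new validators from being included; a full fix along the review's first suggestion would instead keep only the latest attestation per validator before selection.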

I did not run the test suite: cargo/rustup could not create temp/toolchain files in this sandbox. A regression test for “same validator has two known attestations on different roots; only the latest may be included in a produced block” would be important here.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

🤖 Claude Code Review

Here is the review:


Review: Simplify block production (leanSpec PR #464)

The refactoring is well-motivated and the logic is substantially cleaner. The core ideas are sound. A few observations below.


Non-determinism in extend_proofs_greedily (blockchain/store.rs:959)

let mut remaining_indices: HashSet<usize> = (0..proofs.len()).collect();

HashSet<usize> iteration order is non-deterministic due to Rust's hash randomization. When two proofs have equal new coverage, max_by_key breaks ties by the last element encountered in iteration order — which is undefined. Two proposers building on the same head could include different proof subsets.

Since proofs arrives ordered by insertion (FIFO from PayloadBuffer), using Vec<usize> for remaining_indices and swapping out exhausted entries would preserve insertion order and make selection deterministic.
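An illustrative sketch of that Vec-based deterministic selection (hypothetical helper, not the PR's actual code):

```rust
// Sketch: keep candidate indices in a Vec (insertion order) instead of a
// HashSet, so ties in coverage gain are broken by a stable, well-defined
// order. Picks the best remaining index by gain and removes it.
fn pick_best_deterministic(gains: &[usize], remaining: &mut Vec<usize>) -> Option<usize> {
    let (pos, &idx) = remaining
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| gains[**a].cmp(&gains[**b]))?;
    // A proof contributing no new coverage is never selected.
    if gains[idx] == 0 {
        return None;
    }
    remaining.swap_remove(pos);
    Some(idx)
}
```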


Sort instability when target.slot ties (blockchain/store.rs:1032)

sorted_entries.sort_by_key(|(_, (data, _))| data.target.slot);

When two entries share the same target.slot, their relative order depends on HashMap's iteration order. Adding a secondary key (e.g., data_root) makes the sort fully deterministic:

sorted_entries.sort_by_key(|(data_root, (data, _))| (data.target.slot, *data_root));

Pre-existing metrics double-count (not introduced here, but worth noting)

inc_pq_sig_aggregated_signatures and inc_pq_sig_attestations_in_aggregated_signatures are called in both:

  • aggregate_committee_signatures at lines 202–203 (aggregation time)
  • extend_proofs_greedily at lines 988–989 (inclusion time)

This double-counting was present before this PR (select_aggregated_proofs also called these metrics). The counter semantics are ambiguous — "aggregated" could mean "was aggregated" or "was included in a block". Worth resolving separately, but not introduced by this change.


Silent drop of unresolvable data roots (blockchain/store.rs:805)

.filter_map(|(data_root, proofs)| {
    let data = store.get_attestation_data_by_root(&data_root)?;
    Some((data_root, (data, proofs)))
})

Missing attestation data silently drops valid proofs. This is acceptable in practice (the payload was stored without its data, which shouldn't happen), but a warn! log here would surface inconsistencies in devnet debugging without overhead in the happy path.


Routing change correctness: own aggregation → known directly

aggregate_committee_signatures now inserts directly to known_payloads, bypassing the new → known promotion cycle. This is correct: the promotion cycle exists to batch-accept gossip-received attestations at intervals 0/4; own aggregations at interval 2 have no reason to wait. Fork choice weights now reflect own attestations immediately after interval 2. The comment at line 206–208 correctly documents this distinction.


Genesis edge case handling (blockchain/store.rs:1019)

The pre-computation of current_justified for the genesis case is correct. When head_state.latest_block_header.slot == 0, process_block_header will update the justified root to parent_root, so pre-loading that root ensures the first iteration's source filter matches what the state transition will produce.


Fixed-point loop termination

Guaranteed: processed_data_roots grows monotonically, and iteration continues only when found_new && justification_advanced. Since the payload map is finite and each data_root is processed at most once, the loop terminates. ✓


Minor: redundant state transition at loop end

The final process_slots + process_block at lines 1099–1101 replicates the last loop iteration's state transition in cases where the loop ran at least once. The extra work is minimal (common path is 1–2 iterations), but the intermediate post_state from the last loop iteration is discarded rather than reused. Acceptable given the simplicity gain.


Overall

The simplification is correct and a net improvement. The two determinism points (Items 1 and 2) are worth addressing before merge — non-deterministic block content between proposers is not a consensus safety issue here (any valid block is accepted), but it makes testing and debugging harder.


Automated review by Claude (Anthropic) · sonnet · custom prompt
