
Persistent MonitorEvents#4491

Open
valentinewallace wants to merge 20 commits into lightningdevkit:main from
valentinewallace:2026-03-persistent-mon-evs

Conversation

@valentinewallace
Contributor

@valentinewallace valentinewallace commented Mar 17, 2026

As part of #4482, we're looking into changing our architecture -- currently, the Channel{Manager} is responsible for managing the resolution of off-chain HTLCs, and the ChannelMonitor is responsible for them once they're on-chain. See the issue description for the complexity that results from this design.

Quoting the issue, "Instead, we want to do all resolution in ChannelMonitors (in response to ChannelMonitorUpdates) and pass them back to ChannelManager in the form of MonitorEvents (similar to how HTLCs are resolved after channels are closed). In order to have reliable resolution, we'll need to keep MonitorEvents around in the ChannelMonitor until the ChannelManager has finished processing them - adding a new MonitorEvent resolution path through a new method (rather than via ChannelMonitorUpdates)."

Here we don't add any new MonitorEvents but add support for making the ones we do have persistent across restarts until they are explicitly ack'd, via a new chain::Watch API.
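As a rough illustration of the flow described above (all names here are hypothetical stand-ins, not the actual LDK API): the monitor hands events to the manager but retains them, keyed by id, until the manager explicitly acks them, so that un-acked events can be replayed after a restart.

```rust
// Hypothetical sketch of persistent monitor events. `Monitor`, its fields,
// and its methods are illustrative only and do not match real LDK types.
#[derive(Clone, Debug, PartialEq)]
struct MonitorEvent(String);

#[derive(Default)]
struct Monitor {
    // Events not yet handed to the manager.
    pending: Vec<(u64, MonitorEvent)>,
    // Events handed out but not yet acked; persisted and replayed on restart.
    provided: Vec<(u64, MonitorEvent)>,
}

impl Monitor {
    fn push(&mut self, id: u64, ev: MonitorEvent) {
        self.pending.push((id, ev));
    }

    // Hand out pending events, but keep copies in `provided` until acked.
    fn provide_events(&mut self) -> Vec<(u64, MonitorEvent)> {
        let evs = std::mem::take(&mut self.pending);
        self.provided.extend(evs.iter().cloned());
        evs
    }

    // Called once the manager has irrevocably processed the event.
    fn ack(&mut self, id: u64) {
        self.provided.retain(|(ev_id, _)| *ev_id != id);
    }

    // On restart, un-acked events are simply replayed.
    fn replay_on_startup(&self) -> Vec<(u64, MonitorEvent)> {
        self.provided.clone()
    }
}

fn main() {
    let mut mon = Monitor::default();
    mon.push(0, MonitorEvent("htlc-failed".into()));
    mon.push(1, MonitorEvent("htlc-claimed".into()));
    let evs = mon.provide_events();
    assert_eq!(evs.len(), 2);
    // Only event 0 is acked; event 1 survives a "restart".
    mon.ack(0);
    let replayed = mon.replay_on_startup();
    assert_eq!(replayed.len(), 1);
    assert_eq!(replayed[0].0, 1);
    println!("replayed {} un-acked event(s)", replayed.len());
}
```

This mirrors the contrast with today's `get_and_clear_pending_monitor_events`, which drops events as soon as they are fetched.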

  • update commit messages

@ldk-reviews-bot

ldk-reviews-bot commented Mar 17, 2026

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

(17, in_flight_monitor_updates, option),
(19, peer_storage_dir, optional_vec),
(21, async_receive_offer_cache, (default_value, async_receive_offer_cache)),
(23, persistent_monitor_events, (default_value, false)),
Collaborator

Given the logic isn't complete, should we write this as an even tlv on the write side and deliberately not implement read for it yet?

Contributor Author

Do you mean use an even tlv and conditionally write it only if it's set to true? That makes sense to me

Contributor Author

Is this the plan for upgrades/downgrades?

  • Until ~everything's in place, use even TLVs for things but don't write them in prod, and error in prod if they are set on read
  • Then have a version or two that allows the even TLVs to be set on read, but doesn't write them, to allow downgrades
  • Then have a version that allows the even TLVs to be missing, runs the legacy reconstruction logic once, then sets persistent_monitor_events in the manager/monitors at the end of manager read so even TLVs will be henceforth written
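For context on why writing an even TLV forces old versions to error: LDK's serialization follows the BOLT "it's OK to be odd" convention, where unknown even TLV types must cause a read failure while unknown odd types are silently ignored. A toy sketch of that rule (not LDK's actual serialization macros):

```rust
// Toy sketch of the "it's OK to be odd" TLV rule, not LDK's real
// `impl_writeable_tlv_based!` machinery: unknown even types are a hard
// error, unknown odd types are skipped.
#[derive(Debug, PartialEq)]
enum ReadError {
    UnknownRequiredTlv(u64),
}

fn read_tlvs(tlvs: &[(u64, Vec<u8>)], known: &[u64]) -> Result<Vec<u64>, ReadError> {
    let mut understood = Vec::new();
    for (typ, _value) in tlvs {
        if known.contains(typ) {
            understood.push(*typ);
        } else if *typ % 2 == 0 {
            // Even => required: an old reader must refuse to load state it
            // cannot interpret (e.g. new persistent-monitor-event data).
            return Err(ReadError::UnknownRequiredTlv(*typ));
        }
        // Odd => optional: silently ignored by older readers.
    }
    Ok(understood)
}

fn main() {
    // An old reader that only knows odd types 17..=23.
    let known = [17u64, 19, 21, 23];
    // Writing new state under a hypothetical even type breaks old readers
    // on purpose, preventing a silent downgrade...
    let new_state = [(23u64, vec![]), (24u64, vec![1u8])];
    assert_eq!(read_tlvs(&new_state, &known), Err(ReadError::UnknownRequiredTlv(24)));
    // ...while an unknown odd type would downgrade cleanly.
    let odd_only = [(23u64, vec![]), (25u64, vec![1u8])];
    assert!(read_tlvs(&odd_only, &known).is_ok());
    println!("even TLV rejected, odd TLV ignored");
}
```

The TLV type 24 above is purely illustrative; the PR discussion does not pin down which even type would be used.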

Collaborator

Yea, that's basically what I was thinking.

@valentinewallace valentinewallace force-pushed the 2026-03-persistent-mon-evs branch from 2a22b3e to 6930ccd on March 19, 2026 17:11
@valentinewallace valentinewallace self-assigned this Mar 19, 2026
@valentinewallace valentinewallace added the "weekly goal" label Mar 19, 2026
@valentinewallace valentinewallace force-pushed the 2026-03-persistent-mon-evs branch from 6930ccd to 2eba975 on March 19, 2026 20:24
@valentinewallace valentinewallace marked this pull request as ready for review March 19, 2026 20:27
@valentinewallace valentinewallace requested review from TheBlueMatt and removed request for wpaulino March 19, 2026 20:28
@codecov

codecov bot commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 87.41573% with 56 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.17%. Comparing base (d830f10) to head (63fc698).
⚠️ Report is 4 commits behind head on main.

Files with missing lines               | Patch % | Lines
lightning/src/ln/channelmanager.rs     | 80.72%  | 17 Missing and 15 partials ⚠️
lightning/src/chain/channelmonitor.rs  | 92.30%  | 6 Missing and 6 partials ⚠️
lightning/src/ln/channel.rs            | 88.04%  | 9 Missing and 2 partials ⚠️
lightning/src/chain/chainmonitor.rs    | 94.73%  | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4491      +/-   ##
==========================================
- Coverage   86.18%   86.17%   -0.01%     
==========================================
  Files         160      160              
  Lines      107536   107826     +290     
  Branches   107536   107826     +290     
==========================================
+ Hits        92680    92923     +243     
- Misses      12231    12261      +30     
- Partials     2625     2642      +17     
Flag Coverage Δ
tests 86.17% <87.41%> (-0.01%) ⬇️


Collaborator

@TheBlueMatt TheBlueMatt left a comment

Basically LGTM!

HTLCFailureMsg::Relay(fail_htlc) => HTLCForwardInfo::FailHTLC {
htlc_id: fail_htlc.htlc_id,
err_packet: fail_htlc.into(),
monitor_event_source: None,
Collaborator

Hmm we're failing to forward, presumably there's a monitor event from the inbound edge we have to complete when we fail the HTLC? Bit funny to fail an HTLC back with a monitor event completion for the same channel but makes sense I think?

Contributor Author

I don't think the inbound edge generates any monitor events for forwarded HTLCs right now. Do you mean we should do this in the future?

@ldk-reviews-bot

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, you can assign a second reviewer.

@valentinewallace valentinewallace force-pushed the 2026-03-persistent-mon-evs branch from 2eba975 to b8644fb on March 20, 2026 21:34
@valentinewallace
Contributor Author

Rebased with no changes due to conflicts

@ldk-claude-review-bot
Collaborator

ldk-claude-review-bot commented Mar 20, 2026

Review Summary

I performed a thorough review of all files in this PR. Several issues from my prior review pass appear to now be resolved in the current code (the pending_monitor_events field removal and its references, and the blocked monitor update event handling). Here is the one new issue I found:

Inline comments posted:

  • lightning/src/chain/channelmonitor.rs:6420 — The offered_preimage_claim duplicate check only scans pending_monitor_events, missing .chain(self.provided_monitor_events.iter()). This is inconsistent with the accepted_preimage_claim check at line 6398 which was correctly updated. Could produce duplicate HTLCEvents when persistent_events_enabled is true.

Prior comments status:

  • chainmonitor.rs:385 (removed field still referenced) — Appears resolved in current code; the field is removed and all call sites push directly into monitors.
  • channelmonitor.rs:5982 (duplicate check missing provided_monitor_events) — Appears resolved; line 5981 now chains both.
  • channelmonitor.rs:6399 (accepted_preimage_claim duplicate check) — Appears resolved; line 6398 now chains both. The actual unfixed check is the offered_preimage_claim at line 6420 (covered by my new inline comment above).
  • channel.rs:8916 (blocked monitor update losing events) — Appears resolved; PendingChannelMonitorUpdate now has post_update_ackable_events field.
  • channelmanager.rs:9363 (missing comment about why only first HTLC gets source) — Still applicable as a documentation concern.
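The dedup pattern these comments refer to can be sketched as follows (field and type names are simplified from the PR, so treat them as illustrative): once events can live in either the pending queue or the provided-but-unacked queue, any "have we already generated this event?" check must scan both.

```rust
// Simplified sketch of the duplicate-event check discussed above; in the
// real PR these fields live on ChannelMonitor and hold MonitorEvents
// keyed by id.
#[derive(Debug, PartialEq, Clone)]
struct HtlcEvent {
    source: u32,
}

struct MonitorState {
    pending_monitor_events: Vec<(u64, HtlcEvent)>,
    provided_monitor_events: Vec<(u64, HtlcEvent)>,
}

impl MonitorState {
    // Correct: chain both queues, since events already handed to the
    // manager (but not yet acked) would otherwise be re-generated.
    fn already_have_event(&self, source: u32) -> bool {
        self.pending_monitor_events
            .iter()
            .chain(self.provided_monitor_events.iter())
            .any(|(_, ev)| ev.source == source)
    }

    // Buggy variant from the review comment: misses provided events.
    fn already_have_event_buggy(&self, source: u32) -> bool {
        self.pending_monitor_events.iter().any(|(_, ev)| ev.source == source)
    }
}

fn main() {
    let state = MonitorState {
        pending_monitor_events: vec![],
        // Event was handed to the manager but not yet acked.
        provided_monitor_events: vec![(7, HtlcEvent { source: 42 })],
    };
    assert!(state.already_have_event(42));
    // The un-chained check misses it, so a duplicate would be pushed.
    assert!(!state.already_have_event_buggy(42));
    println!("chained check found the provided event");
}
```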

@valentinewallace valentinewallace force-pushed the 2026-03-persistent-mon-evs branch from b8644fb to bbf1786 Compare March 20, 2026 22:11
Comment on lines 9331 to +9363
@@ -9245,6 +9360,7 @@ impl<
&incoming_trampoline_shared_secret,
&None,
*htlc_id,
if i == 0 { trampoline_monitor_event_source } else { None },
Collaborator

Nit/concern: Unlike the claim path (line 10217-10221), there's no comment explaining why only the first HTLC gets the monitor_event_source for the failure case. The reasoning is different here — for claims, the preimage is stored in the monitor and will be replayed to remaining hops via pending_claims_to_replay. For failures, there's no equivalent replay mechanism for the remaining inbound HTLCs.

If the outbound monitor event is acked (because the first inbound failure was committed and the outbound monitor happened to persist), but the node crashes before the remaining inbound HTLCs are failed, those remaining failures would not be re-triggered on restart.

This is acknowledged by the TODO at line 9327, but it might be worth a comment here explaining why the risk is acceptable today (e.g., the ack is not durable since the outbound monitor doesn't persist on ack, so the event will replay in practice).

Contributor Author

Trampoline replay w/r/t persistent monitor events is a known TODO. We plan to rewrite/revisit this whole hunk IIUC, so I think we're okay here.

Currently, the resolution of HTLCs (and decisions on when HTLCs can be
forwarded) is the responsibility of Channel objects (a part of ChannelManager)
until the channel is closed, and then the ChannelMonitor thereafter. This leads
to some complexity around race conditions for HTLCs right around channel
closure. Additionally, there is lots of complexity reconstructing the state of
all HTLCs in the ChannelManager deserialization/loading logic.

Instead, we want to do all resolution in ChannelMonitors (in response to
ChannelMonitorUpdates) and pass them back to ChannelManager in the form of
MonitorEvents (similar to how HTLCs are resolved after channels are closed). In
order to have reliable resolution, we'll need to keep MonitorEvents around in
the ChannelMonitor until the ChannelManager has finished processing them. This
will simplify things - on restart, instead of examining the set of HTLCs in
monitors we can simply replay all the pending MonitorEvents.

As a first step towards this, here we persist a flag in the ChannelManager and
ChannelMonitors indicating whether this new feature is enabled. It will be used
in upcoming commits to maintain compatibility and create an upgrade/downgrade
path between LDK versions.
Cleans up the next commit
This field will be deprecated in upcoming commits when we start persisting
MonitorEvent ids alongside the MonitorEvents.

Here we add an as-yet-unused API to chain::Watch to allow the ChannelManager to
tell a ChannelMonitor that a MonitorEvent has been irrevocably processed and
can be deleted.

To allow the ChannelManager to ack specific monitor events once they are
resolved in upcoming commits, here we give each MonitorEvent a corresponding
unique id. It's implemented in such a way that we can delete legacy monitor
event code in the future when the new persistent monitor events flag is enabled
by default.

Here we add a MonitorUpdateCompletionAction that will allow the ChannelManager
to tell a monitor that an event is complete, upon completion of a particular
ChannelMonitorUpdate. For example, this will be used when an HTLC is
irrevocably failed backwards post-channel-close, to delete the corresponding
MonitorEvent::HTLCEvent.

Here we take care of the monitor events that are trivial to ACK, since all
except HTLCEvents can be ACK'd immediately after the event is handled and don't
need to be re-processed on startup.

If an outbound payment resolved on-chain and we get a monitor event for it, we
want to mark that monitor event as resolved once the user has processed a
PaymentSent or a PaymentFailed event.

If the outbound edge of a forwarded payment is resolved on-chain via preimage,
and we get a monitor event for it, we want to mark that monitor event resolved
once the preimage is durably persisted in the inbound edge. Hence, we add the
monitor event resolution to the completion of the inbound edge's preimage
monitor update.

To ensure we can resolve HTLC monitor events for forward failures, we need to
pipe the event id through the HTLC failure pipeline, starting from when we
first handle the monitor event and call fail_htlc_backwards. To get it from
there to the next stage, here we store the monitor event id in the manager's
HTLCForwardInfo::Fail*. In upcoming commits we will eventually get to the point
of acking the monitor event when the HTLC is irrevocably removed from the
inbound edge via monitor update.

To ensure we can resolve HTLC monitor events for forward failures, we need to
pipe the event id through the HTLC failure pipeline. We started this process
in the previous commit, and here we continue by storing the monitor event id
in the Channel's holding cell HTLC failures. In upcoming commits we will
eventually get to the point of acking the monitor event when the HTLC is
irrevocably removed from the inbound edge via monitor update.

To ensure we can resolve HTLC monitor events for forward failures, we need to
pipe the event id through the HTLC failure pipeline. Here we pipe the monitor
event source from the manager to the channel, so it can be put in the channel's
holding cell.

In upcoming commits we will eventually get to the point of acking the monitor
event when the HTLC is irrevocably removed from the inbound edge via monitor
update.

Here we build on recent commits by ACK'ing monitor events for forward failures
once the monitor update that marks them as failed on the inbound edge is
complete.

Here we build on recent commits by ACK'ing monitor events for forward failures
once the monitor update that marks them as failed on the inbound edge is
complete, when the HTLC was irrevocably failed via revoke_and_ack.

Here we build on recent commits by ACK'ing monitor events for forward failures
if we try to fail backwards and discover the inbound edge is closed. If that's
the case, there's nothing for us to do, as the HTLC will be resolved via
timeout by the inbound edge counterparty.

Here we build on recent commits by ACK'ing monitor events for forward failures
if we go to fail on the inbound edge and get an error when queueing the
backwards failure. An error in this case means the failure is a duplicate or
the channel is closed, and in either case it's fine to mark the event as
resolved.
@valentinewallace valentinewallace force-pushed the 2026-03-persistent-mon-evs branch from bbf1786 to ec3ee13 on March 23, 2026 19:56
@valentinewallace
Contributor Author

valentinewallace commented Mar 23, 2026

Addressed feedback with diff

Edit: with one additional rustfmt fix

Previously, we would drop the list of monitor events that could be ack'd after
a ChannelMonitorUpdate completed, if that monitor update was generated within
the Channel and ended up in the Channel's blocked-updates queue.

Here we complete the work built up over recent commits and actually start
re-providing monitor events on startup if they went un-acked during runtime.
This isn't supported in prod yet, so the new code is enabled randomly in tests
to ensure we still exercise the old paths.
@valentinewallace valentinewallace force-pushed the 2026-03-persistent-mon-evs branch from ec3ee13 to 63fc698 on March 23, 2026 19:58
Comment on lines 6420 to 6423
 if !self.pending_monitor_events.iter().any(
-	|update| if let &MonitorEvent::HTLCEvent(ref upd) = update {
+	|(_, update)| if let &MonitorEvent::HTLCEvent(ref upd) = update {
 		upd.source == source
 	} else { false }) {
Collaborator

Bug (latent, test-only for now): This offered_preimage_claim duplicate check only scans pending_monitor_events, but when persistent_events_enabled is true, events may have been moved to provided_monitor_events by a prior get_and_clear_pending_monitor_events() call.

This is inconsistent with the accepted_preimage_claim check just above at line 6398, which was correctly updated to chain both:

self.pending_monitor_events.iter().chain(self.provided_monitor_events.iter())

Without chaining provided_monitor_events here, a duplicate HTLCEvent could be pushed when the same on-chain spend is re-scanned after events have been provided but not yet acked.

Suggested change
-if !self.pending_monitor_events.iter().any(
+if !self.pending_monitor_events.iter().chain(self.provided_monitor_events.iter()).any(
 	|(_, update)| if let &MonitorEvent::HTLCEvent(ref upd) = update {
 		upd.source == source
 	} else { false }) {


Labels

weekly goal: Someone wants to land this this week


4 participants