fix: temporal supervision shift in XYZ loss mask uses t+1 instead of t #52
Open
atharrva01 wants to merge 1 commit into DevoLearn:main from
Conversation
Use birth_times[c] <= t instead of <= (t+1) so the XYZ loss supervises predictions against cells alive at the current snapshot, not one step ahead. Signed-off-by: atharrva01 <atharvaborade568@gmail.com>
Hi @devoworm, this PR fixes an off-by-one in the XYZ loss mask: it was including t+1 cells (unborn at the current snapshot) in MSE supervision, silently training the model on future targets it can't observe.
Description
While digging through the training loop, I found a silent but critical bug in how the XYZ loss mask is constructed in NDP-HNN/train.py. The mask was using birth_times[c] <= (t + 1), which pulls in cells born at the next time step: cells that do not yet exist in the graph the model is currently processing. So at every snapshot t, the MSE loss was forcing the model's output to match the coordinates of cells it hasn't observed yet. No crash, no NaN; it just silently trains the model on a shifted supervision signal for every cell-division event in the dataset, which in a growing C. elegans embryo is essentially every training snapshot.
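A minimal sketch of the off-by-one, using hypothetical birth times (the array values and shapes here are illustrative, not the repo's actual data):

```python
import numpy as np

# Hypothetical birth times (in snapshot indices) for 5 cells.
birth_times = np.array([0, 0, 2, 3, 5])

t = 2  # current snapshot being processed

mask_buggy = birth_times <= (t + 1)  # leaks the cell born at t = 3
mask_fixed = birth_times <= t        # only cells alive at snapshot t

print(mask_buggy.tolist())  # [True, True, True, True, False]
print(mask_fixed.tolist())  # [True, True, True, False, False]
```

The buggy mask marks the cell born at t+1 as supervisable even though it has no corresponding node in the snapshot-t graph.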
What was wrong
The model processes snapshot t and produces pred_xyz for the cells present at t. But mask_next was selecting rows for cells born at t+1, so the loss was minimizing MSE against coordinates the model had no causal way to observe or predict.

Fix
A single-character change. Everything else (target_xyz indexing, the MSE computation, incidence_bce, the detach pattern) is fine.

Impact
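For context, the corrected masked loss looks roughly like this. The function name and signature are my own for illustration; the repo's actual code likely uses PyTorch tensors rather than NumPy, but the masking logic is the same:

```python
import numpy as np

def masked_xyz_mse(pred_xyz, target_xyz, birth_times, t):
    """MSE over cells already born at snapshot t (illustrative sketch)."""
    mask = birth_times <= t  # the one-character fix: t, not t + 1
    if not mask.any():
        return 0.0  # no supervisable cells at this snapshot
    diff = pred_xyz[mask] - target_xyz[mask]
    return float((diff ** 2).mean())

# Cells 0 and 1 are alive at t = 1; cell 2 (born at t = 3) is excluded.
pred = np.array([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [9.0, 9.0, 9.0]])
target = np.zeros((3, 3))
birth_times = np.array([0, 1, 3])

print(masked_xyz_mse(pred, target, birth_times, t=1))  # 2.5
```

With the old `<= (t + 1)` condition, cell 2's large error at t = 1 would have entered the loss and pushed gradients toward a state the model could not yet observe.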
The model was accumulating RNN gradients from a supervision signal that was off by one time step across the entire training run. After the fix, the learned developmental program actually corresponds to the real embryonic state at each snapshot. Before this, results would show the model appearing to "anticipate" future cell divisions, which looked biologically interesting but was entirely an artifact of the leaky mask.