Skip to content

Fix: return next=None on last page of search results#377

Open
jamditis wants to merge 2 commits intoMuckRock:masterfrom
jamditis:fix/search-pagination-next-null
Open

Fix: return next=None on last page of search results#377
jamditis wants to merge 2 commits intoMuckRock:masterfrom
jamditis:fix/search-pagination-next-null

Conversation

@jamditis
Copy link

Summary

Fixes #372. Search results no longer return a next URL pointing to an empty page when on the last page of results.

Root cause: The previous implementation used math.ceil(results.hits / per_page) to calculate the max page number, but Solr can return approximate hit counts, causing next to point to an empty page.

Fix: For page-number pagination (v1.0), request per_page + 1 rows from Solr and use the extra row's presence to determine whether more results exist. The extra row is truncated before any response formatting. This is more reliable than hit count arithmetic.

Cursor pagination (v2.0) is unchanged — Solr's nextCursorMark equality check already correctly detects the last page. The N+1 approach doesn't work cleanly with Solr cursors because the cursor mark advances past all returned rows (including the probe row), which would skip a document between pages.

Changes

  • search.py_paginate_page() now requests rows + 1 from Solr. _format_response() uses len(results.docs) > per_page instead of hit count math. Removed unused math import.
  • test_search.py — added test_search_last_page_next_is_none (11 docs, per_page=10, verifies page 2 has next=None) and test_search_exact_page_boundary_next_is_none (per_page=11, all results fit on one page).

Test plan

  • Two new tests cover the last-page and exact-boundary cases
  • Existing search tests pass (requires Solr test instance)
  • Manual verification with the example query from the issue: https://api.www.documentcloud.org/api/documents/search/?project=224036

For page-number pagination, request per_page + 1 rows from Solr and
use the extra row's presence to determine if more results exist. This
is more reliable than using results.hits which can be approximate.

The cursor pagination path is unchanged — Solr's nextCursorMark
equality check already correctly detects the last page.

Fixes MuckRock#372
Copilot AI review requested due to automatic review settings March 24, 2026 19:32
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes an issue where v1.0 (page-number) search pagination could return a next URL that points to an empty page by avoiding reliance on Solr’s potentially inexact hit counts.

Changes:

  • For v1.0 pagination, request per_page + 1 rows from Solr and use the extra row to determine whether next should be None.
  • Truncate the extra probe row before formatting/returning results.
  • Add tests asserting next=None on the last page and when results exactly fill a page.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
documentcloud/documents/search.py Implements N+1 probe for page-number pagination and updates next computation logic.
documentcloud/documents/tests/test_search.py Adds regression tests for next=None on last page and exact page boundary.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +598 to +599
# Request one extra row to detect whether a next page exists
return {"rows": rows + 1, "start": start}, {"page": page, "rows": rows}
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_paginate_page now queries Solr with rows + 1, which can exceed the configured max_page_size / settings.SOLR_ANON_MAX_ROWS by 1 when the client requests the maximum per_page. That weakens the intended cap and may cause avoidable load or Solr-side errors if a hard row limit is enforced. Consider clamping the Solr rows parameter (e.g., only probe with +1 when rows < max_page_size, or otherwise fall back to a different last-page detection strategy for the max-size case) while still returning at most rows results to the client.

Suggested change
# Request one extra row to detect whether a next page exists
return {"rows": rows + 1, "start": start}, {"page": page, "rows": rows}
# Request one extra row to detect whether a next page exists, but never exceed max_page_size
solr_rows = rows + 1 if rows < max_page_size else rows
return {"rows": solr_rows, "start": start}, {"page": page, "rows": rows}

Copilot uses AI. Check for mistakes.
Copilot review: rows + 1 could exceed max_page_size when the client
requests the maximum per_page. Now only probes with +1 when
rows < max_page_size; falls back to hit-count check at the cap.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Search results return next parameter even when there are no more results

2 participants