
SOLR-18147 Make a new Grafana dashboard for Solr 10.x#4210

Draft
janhoy wants to merge 8 commits into apache:main from janhoy:001-grafana-dashboard-solr10

Conversation


@janhoy janhoy commented Mar 12, 2026

https://issues.apache.org/jira/browse/SOLR-18147

  • Brand-new dashboard, built from a mixin source that can regenerate both the dashboard and the alerts.
  • Brings back the monitoring-with-prometheus-and-grafana refguide page, written from scratch, with a new diagram showing Prometheus scraping each Solr node.
  • A solr/monitoring/dev folder with a docker-compose file that starts two Solr nodes, Prometheus, Grafana, Alertmanager and a traffic-ingester container, to easily test metric/Grafana changes locally.
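
For orientation, a stack like the one described might look roughly like the compose sketch below. This is an illustration only; the service names, images, ports and commands are assumptions, not the contents of the actual dev/ compose file in this PR:

```yaml
# Sketch only -- services, images, ports and commands are illustrative
# assumptions, not the real docker-compose file from this PR.
services:
  solr1:
    image: solr:10
    ports: ["8983:8983"]
    command: ["solr", "start", "-f", "-c"]          # SolrCloud, embedded ZK
  solr2:
    image: solr:10
    ports: ["8984:8983"]
    command: ["solr", "start", "-f", "-c", "-z", "solr1:9983"]
    depends_on: [solr1]
  prometheus:
    image: prom/prometheus
    ports: ["9090:9090"]
    volumes: ["./prometheus.yml:/etc/prometheus/prometheus.yml:ro"]
  grafana:
    image: grafana/grafana
    ports: ["3000:3000"]
  alertmanager:
    image: prom/alertmanager
    ports: ["9093:9093"]
```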

Want to review?

This is a first draft, the things most ready for review are the mixin build logic and the dev/ compose setup for local testing.

I would not recommend a detail-focused review of each dashboard panel yet. The dashboard and panels themselves I'd categorize as a first LLM draft; I have done no more than fix them so they display data and react to the variable dropdowns. Everything related to the choice of dashboard rows, the selection and presentation of metrics, and the design of the panels is up for discussion, so the most useful review feedback on the dashboard at this stage is high-level: which rows and panels do we need, and in what style?

I give every committer permission to commit fixes and improvements to this branch, after first announcing what you intend to do in a review comment or ordinary comment. I am not strongly attached to the current row+panel selection.

Current dashboard layout (Draft)

The rows are:

  • Node Overview (open by default) — query/index request rates, latency, cores, disk
  • JVM (open by default) — heap, GC, threads, CPU
  • SolrCloud (collapsed) — Overseer queues, ZK ops, shard leaders
  • Index Health (collapsed) — segments, index size, merge rates, MMap efficiency
  • Cache Efficiency (collapsed) — filter/query/document cache hit rates and evictions
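
For a sense of the queries behind a row like Cache Efficiency, a filter-cache hit-ratio panel is typically a ratio of counter rates. The metric and label names below are placeholders, not necessarily the series Solr 10 actually exports:

```promql
# Filter cache hit ratio over the last 5 minutes (placeholder metric names)
sum(rate(solr_cache_hits_total{cache="filterCache"}[5m]))
  /
sum(rate(solr_cache_lookups_total{cache="filterCache"}[5m]))
```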

Screenshots (attached in the PR): node-overview, jvm-panel, solrcloud, index-health, cache-panel.

Disclaimer: All of this is built by Claude Code.

@janhoy janhoy marked this pull request as draft March 12, 2026 15:52
@github-actions github-actions bot added documentation Improvements or additions to documentation scripts labels Mar 12, 2026
@janhoy janhoy marked this pull request as ready for review March 12, 2026 19:51
@janhoy janhoy requested a review from Copilot March 12, 2026 19:51


@janhoy janhoy requested a review from mlbiscoc March 12, 2026 22:38

janhoy commented Mar 13, 2026

So the foundation is laid, I believe. Technically it is working, and I generally like the "rows" and panels chosen by the AI.

But there are probably useful changes to make. Here are some I can think of:

  • Add a panel for system memory (dependent on SOLR-18159 Add metrics for system memory #4209), perhaps a stacked area with heap-max in it
  • Distinguish between "collection QPS" and "per-core" QPS. I think the metrics include a label for whether they are "local" or not?
  • Add a panel for the number of ZooKeeper nodes "up"
  • Add a panel for the number of Solr nodes "up"
  • Other panels for cluster-level things like number of collections, shard leadership over time
  • Gather more user feedback for what they lack
  • Add OTEL collector to the docker-compose and have it push metrics to the same prometheus, but with a different "cluster" or "environment" label, to test those dropdowns.
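
The two "up" panels above can likely be driven by Prometheus's built-in `up` series, which is 1 for every target scraped successfully. The `job` label values here are assumptions about the scrape config:

```promql
# Solr nodes currently reachable by Prometheus (job label is an assumption)
count(up{job="solr"} == 1)

# Same idea for ZooKeeper, if ZK exporters are scraped under their own job
count(up{job="zookeeper"} == 1)
```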

@janhoy janhoy marked this pull request as draft March 13, 2026 08:58

gus-asf commented Mar 13, 2026

Latency graphs should always show the max, p50 is basically useless... https://www.youtube.com/watch?v=lJ8ydIuPFeU

Also update latency is only rarely interesting... throughput is what most folks care about for indexing, that and stuck/failed documents.
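
For reference, if request latency is exported as a Prometheus histogram, a max-focused panel might pair something like the queries below. The metric and label names are placeholders, not Solr's actual series; note that `histogram_quantile(1.0, ...)` only approximates the max as the upper bound of the highest bucket with observations, so an explicit `*_max` series, if the exporter publishes one, is preferable:

```promql
# p50 (hides the tail, per the comment above)
histogram_quantile(0.5,
  sum by (le) (rate(solr_request_duration_seconds_bucket{handler="/select"}[5m])))

# Approximate worst case: upper bound of the highest bucket seeing traffic
histogram_quantile(1.0,
  sum by (le) (rate(solr_request_duration_seconds_bucket{handler="/select"}[5m])))
```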

@mlbiscoc
Contributor

Thanks Jan, this looks like a great start. I'll find some time to take a look. I really love the docker-compose setup making it easy to test. Something we should also add is a way to turn on the tracing module with this, so we can also see the exemplars Solr now supports in these dashboards. Maybe in a second iteration, since that is definitely out of scope here.


janhoy commented Mar 13, 2026

Latency graphs should always show the max, p50 is basically useless... https://www.youtube.com/watch?v=lJ8ydIuPFeU

Good feedback; I'm adding a max graph to the search latency panel. Let's do that.

Also update latency is only rarely interesting... throughput is what most folks care about for indexing, that and stuck/failed documents.

Yeah, because /update is non-blocking, so it won't tell much other than how large the payload was and perhaps how busy the server was. Let's use that real estate for something better.


janhoy commented Mar 13, 2026

Something we should also add is a way to turn on the tracing module with this, so we can also see the exemplars Solr now supports in these dashboards.

Thought of it, but wanted to keep scope somewhat low, so I think this PR should focus on a Grafana dashboard. Follow-up work could then add an OTEL collector and Jaeger to the dev/ setup. I also discovered Microsoft's Aspire Dashboard project, and I think I'll add it to the compose file. It shows you in real time what OTLP packets (metrics, traces, logs) are received, you can inspect the content of each, and it has a simple trace viewer.

@Jesssullivan

Looking good! +1 on lacing up an OTEL collector next 👀

Thought of it but wanted to keep scope somewhat low, so I think this PR should focus on a GA dashboard. Then follow up work could add OTEL collector and Jaeger to the dev/ setup.


janhoy commented Mar 19, 2026

Are you ok with the location in the monorepo, solr/monitoring? In some ways it belongs more at the top level, but I try to avoid adding stuff there. I considered a separate git repo, but that breaks with our monorepo style, and it is useful to keep the dashboard in sync with the evolution of the app.


mlbiscoc commented Mar 19, 2026

I like the solr/monitoring location over having it at the root, and over putting it in a separate repo. With a separate repo, if we add or change metrics, it'd be hard to see without switching between two repos. I'd vote to keep it how it is.

# ./stack.sh --help # All options
#
# Services (full stack):
# solr1 http://localhost:8983 (SolrCloud node 1, embedded ZooKeeper)

I ❤️ this!


@epugh epugh left a comment


Good progress... There is a lot here that I don't quite grok. Is trafficgen coming out of other perf-related efforts, or just "hey, we need some load"? ;-)


janhoy commented Mar 21, 2026

Good progress... There is a lot here that I don't quite grok. Is trafficgen coming out of other perf-related efforts, or just "hey, we need some load"? ;-)

Trafficgen is just something I wrote earlier, not written for perf testing at all, but to have something happening in a cluster, as it is boring to view a dashboard or traces with nothing going on. This dev/ hack is just convenience tooling to assist when developing or changing dashboards, metrics, modifying OTEL Collector configuration, etc.

Do you feel it is too much to add? Should the entire dev/ folder move to /dev-tools/monitoring instead, and trafficgen to /dev-tools/trafficgen?


6 participants