[fix](build) fix macOS BE startup crash caused by fd_number overflow #61770

Open
924060929 wants to merge 1 commit into apache:master from 924060929:fix-macos-be-fd-overflow

Conversation

@924060929
Contributor

On macOS, getrlimit(RLIMIT_NOFILE) can report RLIM_INFINITY (INT64_MAX, i.e. 9223372036854775807). The calculation fd_number / 100 * segment_cache_fd_percentage then yields 1844674407370955160, which does not fit in uint32_t, so cast_set<uint32_t> in the SegmentCache constructor fails and BE crashes on startup. On Linux the limit is bounded by fs.nr_open (default 1048576), so only macOS is affected.
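
The arithmetic can be checked with a minimal sketch. All names below are hypothetical stand-ins, not the Doris identifiers; `checked_narrow` only imitates a checked narrowing cast and is not `cast_set`. The percentage of 20 is not stated in the description; it is the value implied by the reported result (9223372036854775807 / 100 * 20 = 1844674407370955160).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: RLIM_INFINITY on macOS equals INT64_MAX, as the description states.
constexpr uint64_t kRlimInfinityMac =
        static_cast<uint64_t>(std::numeric_limits<int64_t>::max());
// Assumption: percentage implied by the number quoted in the description.
constexpr uint64_t kImpliedPercentage = 20;

// The capacity formula quoted in the description.
constexpr uint64_t segment_cache_capacity(uint64_t fd_number, uint64_t percentage) {
    return fd_number / 100 * percentage;
}

// True when the value can be narrowed to uint32_t without loss.
constexpr bool fits_in_uint32(uint64_t v) {
    return v <= std::numeric_limits<uint32_t>::max();
}

// Hypothetical checked narrowing cast: aborts (in debug builds) on lossy input.
inline uint32_t checked_narrow(uint64_t v) {
    assert(fits_in_uint32(v) && "value does not fit in uint32_t");
    return static_cast<uint32_t>(v);
}

// With an unlimited fd limit the capacity is far beyond the uint32_t range.
static_assert(segment_cache_capacity(kRlimInfinityMac, kImpliedPercentage) ==
                      1844674407370955160ULL,
              "matches the value quoted in the description");
static_assert(!fits_in_uint32(segment_cache_capacity(kRlimInfinityMac, kImpliedPercentage)),
              "uncapped capacity cannot be narrowed to uint32_t");
```

Calling `checked_narrow` on the uncapped capacity would trip the assertion, which is the same class of failure as the startup crash described above.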

Fix: cap fd_number on macOS before the segment cache capacity calculation.
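
A capping step along these lines might look like the sketch below. It is not the code from this PR: the cap value (1048576, chosen to mirror the Linux fs.nr_open default), the use of the soft limit `rlim_cur`, the fallback on failure, and every identifier are assumptions made for illustration. The PR applies the cap on macOS only, while this sketch clamps unconditionally to stay short.

```cpp
#include <sys/resource.h>

#include <cassert>
#include <cstdint>

// Assumption: illustrative cap mirroring the Linux fs.nr_open default.
constexpr uint64_t kAssumedMaxFdNumber = 1048576;

// Clamp a raw fd limit so that later arithmetic stays within a small range.
constexpr uint64_t clamp_fd_number(uint64_t raw_limit,
                                   uint64_t cap = kAssumedMaxFdNumber) {
    return raw_limit < cap ? raw_limit : cap;
}

// Read the soft RLIMIT_NOFILE and clamp it.
// Assumption: if getrlimit fails, fall back to the cap itself.
inline uint64_t capped_fd_number() {
    struct rlimit limits {};
    if (getrlimit(RLIMIT_NOFILE, &limits) != 0) {
        return kAssumedMaxFdNumber;
    }
    const uint64_t capped = clamp_fd_number(static_cast<uint64_t>(limits.rlim_cur));
    assert(capped <= kAssumedMaxFdNumber);
    return capped;
}
```

With this cap, the largest possible capacity at a percentage of 20 is 1048576 / 100 * 20 = 209700, which fits comfortably in uint32_t.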

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@924060929
Contributor Author

run buildall

@924060929 924060929 force-pushed the fix-macos-be-fd-overflow branch from ed9e5c2 to b907d19 on March 26, 2026 10:29
@924060929 924060929 force-pushed the fix-macos-be-fd-overflow branch from b907d19 to ce8a098 on March 26, 2026 10:36
@924060929
Contributor Author

run buildall

@github-actions github-actions bot added the approved label Mar 26, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 26362 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ce8a0982d2728f7737ed5e380c994502e0386f53, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17598	4474	4316	4316
q2	q3	10647	810	516	516
q4	4675	350	257	257
q5	7548	1207	1009	1009
q6	173	174	146	146
q7	761	875	674	674
q8	9302	1479	1307	1307
q9	4768	4709	4534	4534
q10	6256	1898	1636	1636
q11	456	255	254	254
q12	699	568	474	474
q13	18018	2653	1956	1956
q14	230	233	217	217
q15	q16	730	741	659	659
q17	700	792	474	474
q18	5863	5410	5352	5352
q19	1104	1000	602	602
q20	538	504	372	372
q21	4412	1816	1372	1372
q22	343	286	235	235
Total cold run time: 94821 ms
Total hot run time: 26362 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4982	4710	4616	4616
q2	q3	3887	4360	3836	3836
q4	892	1205	780	780
q5	4054	4423	4324	4324
q6	195	175	145	145
q7	1780	1664	1519	1519
q8	2528	2716	2539	2539
q9	7732	7354	7451	7354
q10	3876	4036	3668	3668
q11	497	421	416	416
q12	478	582	439	439
q13	2468	2926	2080	2080
q14	271	280	274	274
q15	q16	696	783	729	729
q17	1182	1338	1308	1308
q18	7163	6795	6757	6757
q19	870	852	1065	852
q20	2054	2156	1984	1984
q21	3960	3491	3300	3300
q22	430	429	376	376
Total cold run time: 49995 ms
Total hot run time: 47296 ms

@doris-robot

TPC-DS: Total hot run time: 169367 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ce8a0982d2728f7737ed5e380c994502e0386f53, data reload: false

query5	4338	625	510	510
query6	346	231	205	205
query7	4237	473	259	259
query8	343	251	237	237
query9	8736	2728	2745	2728
query10	508	386	350	350
query11	7033	5090	4897	4897
query12	187	135	127	127
query13	1281	467	362	362
query14	5852	3729	3493	3493
query14_1	2813	2848	2837	2837
query15	206	198	180	180
query16	1002	490	459	459
query17	1136	746	638	638
query18	2450	453	368	368
query19	225	220	184	184
query20	136	127	127	127
query21	215	142	112	112
query22	13255	14106	14690	14106
query23	16677	16463	16107	16107
query23_1	15870	15667	15687	15667
query24	7264	1624	1256	1256
query24_1	1243	1254	1223	1223
query25	548	469	420	420
query26	1243	264	146	146
query27	2783	479	299	299
query28	4483	1829	1844	1829
query29	852	564	493	493
query30	293	231	192	192
query31	1011	943	875	875
query32	78	68	71	68
query33	519	346	297	297
query34	895	865	514	514
query35	633	680	617	617
query36	1078	1110	999	999
query37	142	98	86	86
query38	2923	2902	2876	2876
query39	860	849	817	817
query39_1	794	795	798	795
query40	235	154	136	136
query41	64	60	60	60
query42	256	257	254	254
query43	241	245	225	225
query44	
query45	199	195	181	181
query46	874	993	604	604
query47	2833	2151	2075	2075
query48	313	316	224	224
query49	636	462	405	405
query50	706	300	223	223
query51	4077	4088	4029	4029
query52	260	265	250	250
query53	292	338	282	282
query54	300	280	268	268
query55	88	88	84	84
query56	318	328	316	316
query57	1917	1746	1706	1706
query58	322	272	264	264
query59	2778	2941	2747	2747
query60	346	334	322	322
query61	164	161	160	160
query62	629	593	541	541
query63	312	275	279	275
query64	5113	1299	1011	1011
query65	
query66	1451	455	348	348
query67	24272	24271	24121	24121
query68	
query69	407	315	286	286
query70	950	963	927	927
query71	337	321	298	298
query72	2868	2685	2605	2605
query73	538	541	320	320
query74	9628	9565	9437	9437
query75	2859	2775	2453	2453
query76	2283	1051	666	666
query77	368	390	336	336
query78	10957	11181	10492	10492
query79	1124	779	565	565
query80	917	625	554	554
query81	540	262	226	226
query82	1350	157	118	118
query83	348	264	240	240
query84	257	119	96	96
query85	929	517	481	481
query86	421	305	289	289
query87	3128	3100	3024	3024
query88	3576	2661	2662	2661
query89	422	366	346	346
query90	1911	183	178	178
query91	172	170	141	141
query92	80	76	73	73
query93	921	826	498	498
query94	546	332	287	287
query95	600	338	323	323
query96	636	528	227	227
query97	2475	2475	2414	2414
query98	236	222	216	216
query99	997	989	915	915
Total cold run time: 250391 ms
Total hot run time: 169367 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.91% (19924/37659)
Line Coverage 36.42% (186632/512495)
Region Coverage 32.64% (144588/442973)
Branch Coverage 33.87% (63451/187312)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.72% (26452/36884)
Line Coverage 54.56% (278769/510950)
Region Coverage 51.82% (231690/447098)
Branch Coverage 53.25% (100050/187878)

Contributor

@dataroaring dataroaring left a comment

LGTM

Labels

approved Indicates a PR has been approved by one committer. reviewed

5 participants