Improve performance of name normalization by jaraco · Pull Request #533 · python/importlib_metadata

jaraco · 2026-03-20T07:42:26Z

Backport of changes from python/cpython#143658

…ance of `importlib.metadata.Prepared.normalized` (#143660) Co-authored-by: Henry Schreiner <henryschreineriii@gmail.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com> Co-authored-by: Bartosz Sławecki <bartosz@ilikepython.com>

…ce of `importlib.metadata.Prepared.normalized` (#144083) Co-authored-by: Henry Schreiner <henryschreineriii@gmail.com>

jaraco · 2026-03-20T07:56:49Z

I don't understand why the performance test is not showing any effect:

exercises.py:cached distribution: 0:00:00.000169 (+-1 day, 23:59:59.999974, -13%)
exercises.py:discovery: 0:00:00.000173 (+0:00:00, 0%)
exercises.py:entry_points(): 0:00:00.002320 (+0:00:00.000060, 3%)
exercises.py:entrypoint_regexp_perf: 0:00:00.000069 (+-1 day, 23:59:59.999998, -3%)
exercises.py:normalize_perf: 0:00:00 (+0:00:00, 0%)
exercises.py:uncached distribution: 0:00:00.000273 (+-1 day, 23:59:59.999978, -7%)

It seems to indicate that normalize('sample') runs in zero time.

If I add a sleep, it does reveal some performance degradation:

diff --git a/exercises.py b/exercises.py
index b346cc05f8..a48c57fe6e 100644
--- a/exercises.py
+++ b/exercises.py
@@ -49,6 +49,8 @@ def entrypoint_regexp_perf():
 
 def normalize_perf():
     # python/cpython#143658
+    import time
     import importlib_metadata  # end warmup
 
+    time.sleep(0.001)
     importlib_metadata.Prepared.normalize('sample')

exercises.py:normalize_perf: 0:00:00.001270 (+0:00:00, 0%)

So maybe the operation is too small for timeit to measure?

jaraco · 2026-03-20T08:03:01Z

Running timeit directly reveals some run time:

 🐚 .tox/py/bin/python -m timeit --setup 'import importlib_metadata' -- 'importlib_metadata.Prepared("sample")'
1000000 loops, best of 5: 231 nsec per loop

Aah - so 231 nsec is smaller than the resolution of Python's timedelta, which only handles microseconds.

 importlib_metadata backport-cpython-143658 🐚 pip-run tempora
Python 3.14.3 (main, Feb  3 2026, 15:32:20) [Clang 17.0.0 (clang-1700.6.3.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tempora
>>> tempora.parse_timedelta('231 nsec')
datetime.timedelta(0)
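The same rounding can be reproduced with the standard library alone. This is a minimal sketch, not what pytest-perf does internally; the variable names `measured` and `seconds` are illustrative only.

```python
from datetime import timedelta

# timedelta stores whole microseconds; this is its smallest nonzero step.
print(timedelta.resolution)  # one microsecond

# 231 ns expressed as a fraction of a microsecond rounds to a zero duration.
measured = timedelta(microseconds=0.231)
print(measured == timedelta(0))

# Keeping the raw measurement as a float of seconds avoids the loss.
seconds = 231e-9
print(f"{seconds * 1e9:.0f} ns")
```

Reporting sub-microsecond results as plain numbers (or repeating the operation enough times to exceed a microsecond) sidesteps the problem.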

jaraco · 2026-03-20T08:35:40Z

The latest benchmark shows a 74% reduction in execution time.

exercises.py:normalize_perf: 0:00:00.000100 (+-1 day, 23:59:59.999722, -74%)

I get different numbers when I run timeit manually (~230ns vs ~510ns, 55% reduction).
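For reference, a rough sketch of how a per-call figure and a percentage reduction like the ones above can be derived with the `timeit` module. The `normalize` function here is a stand-in I've assumed for illustration (PEP 503 normalization with dashes as underscores), and `per_call_ns` and `reduction` are hypothetical helpers, not part of importlib_metadata or pytest-perf.

```python
import re
import timeit


def normalize(name):
    """Stand-in for the operation under test (assumed, for illustration)."""
    return re.sub(r"[-_.]+", "-", name).lower().replace("-", "_")


def per_call_ns(func, arg, number=100_000, repeat=5):
    """Best-of-`repeat` time per call, in nanoseconds.

    Running the call `number` times per sample keeps each sample well
    above the timer's resolution. The lambda adds a small fixed overhead.
    """
    timings = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(timings) / number * 1e9


def reduction(before, after):
    """Fractional reduction in run time going from `before` to `after`."""
    return (before - after) / before


if __name__ == "__main__":
    print(f"{per_call_ns(normalize, 'sample'):.0f} ns per call")
    # The manual timeit figures quoted above: ~510 ns before, ~230 ns after.
    print(f"{reduction(510, 230):.0%} reduction")
```

Absolute numbers will vary by machine and interpreter, so only the ratio is worth comparing across runs.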

Was it worth it to save a few hundred nanoseconds?

hugovk · 2026-03-20T11:52:42Z

In many cases, the speedup won't have any noticeable effect. The change is similar to an improvement from packaging, which does have real benefits for pip when doing large resolutions (https://iscinumpy.dev/post/packaging-faster/).

Let's also ask if @henryiii is aware of any use cases.

Personally, I would make this change, I don't think it makes the code harder to read or maintain. But it's fine if you'd prefer to reject this and/or revert in CPython.

btw I'd already opened #529 against main, and this is against maint/8.x. I couldn't find a contrib guide, what's the normal workflow? Can this be documented somewhere, or did I miss it?

This is the better PR -- same prod code change, but parametrised tests and a performance benchmark added.

jaraco · 2026-03-20T15:42:13Z

btw I'd already opened #529 against main, and this is against maint/8.x. I couldn't find a contrib guide, what's the normal workflow? Can this be documented somewhere, or did I miss it?

Thanks for that PR; I'd just not gotten to it as my attention/bandwidth are limited.

There's some documentation in https://github.com/python/importlib_metadata/wiki/Development-Methodology. And the contribution to main was the correct thing at the time. But because I'm now getting this change in place (along with some others) and there's a 9.0 release that I'd like to exclude for the time being, I'm targeting 8.x.


jaraco · 2026-03-20T16:05:50Z

Personally, I would make this change, I don't think it makes the code harder to read or maintain. But it's fine if you'd prefer to reject this and/or revert in CPython.

I would argue it is less readable and has other concerns.

It introduces a new value variable whose state is set in multiple locations. I try to avoid mutated variables and follow a functional paradigm wherever possible. In fact, I'm tempted to wrap the while loop in another function replace_all in order to encapsulate these imperative behaviors.
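A sketch of what that encapsulation might look like. Both `replace_all` and this `normalize` are hypothetical and assume the merged code lowercases the name, replaces `-` and `.` with `_`, and then collapses repeated underscores in a loop; this is not the code in the PR.

```python
def replace_all(value, old, new):
    """Replace `old` with `new` repeatedly until no occurrence remains."""
    if old in new:
        # Guard against a loop that could never terminate.
        raise ValueError("replacement must not contain the search string")
    while old in value:
        value = value.replace(old, new)
    return value


def normalize(name):
    """Lowercase, unify separators to '_', then collapse runs of '_'."""
    unified = name.lower().replace("-", "_").replace(".", "_")
    return replace_all(unified, "__", "_")


print(normalize("Foo-.__Bar"))
```

The mutation is still there, but it is confined to a helper whose contract can be stated in one line.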

It also repeats itself, using the .replace operation an arbitrary number of times both statically and dynamically.

I'd also argue that it's less sophisticated, using lower-level operations instead of a more succinct and reusable approach. Although regex comes with its own pitfalls, it elevates the conversation by re-using well-known (and documented) approaches.

Additionally, although the spec is "PEP 503 normalization plus dashes as underscores", it doesn't actually perform PEP 503 normalization, but compiles in the aggregate operation to produce an equivalent output. Compare that with the original implementation, where the code reflected precisely the specified behavior, even though it could have simply subbed to _ and eliminated the .replace, making the code more self-documenting.
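To illustrate the distinction, here is a sketch contrasting a spec-shaped version with the shortcut. The regular expression is the one PEP 503 gives for name normalization; the function names are made up for this example, and the first function is my assumption of what "the original implementation" looked like.

```python
import re


def normalize_as_specified(name):
    """PEP 503 normalization, then dashes replaced with underscores."""
    pep503 = re.sub(r"[-_.]+", "-", name).lower()
    return pep503.replace("-", "_")


def normalize_shortcut(name):
    """Equivalent output, substituting '_' directly in one step."""
    return re.sub(r"[-_.]+", "_", name).lower()


samples = ["sample", "Foo--Bar.baz_qux", "a.-_b", "UPPER.case"]
for sample in samples:
    assert normalize_as_specified(sample) == normalize_shortcut(sample)

print(normalize_as_specified("Foo--Bar.baz_qux"))
```

Both produce the same strings; the first simply keeps the two steps of the spec visible in the code.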

These concerns are mostly just me being pedantic, but I wanted to articulate why I would prefer the prior implementation absent any performance concerns, and why I'd prefer we get a strong rationale for the benefits. Still, I missed my opportunity to get my review in before it was merged into CPython, so I'm planning to proceed with this change to get the code bases aligned and we can consider reverting later (while keeping the performance and unit tests).

hugovk and others added 5 commits March 20, 2026 03:04

gh-143658: Use str.lower and replace to further improve performan…

001db0d

…ce of `importlib.metadata.Prepared.normalized` (#144083) Co-authored-by: Henry Schreiner <henryschreineriii@gmail.com>

Remove CPython news fragments.

852e44f

Use parameterize fixture for parameterized tests.

1b0be12

Add performance test for Prepared.normalize.

a77d0d1

jaraco mentioned this pull request Mar 20, 2026

Better support for sub-microsecond operations jaraco/pytest-perf#18

Open

Repeat the operation to get performance visibility.

cbadafc

jaraco mentioned this pull request Mar 20, 2026

importlib.metadata: Use str.translate to improve performance of importlib.metadata.Prepared.normalized python/cpython#143658

Closed

Move behavior description into the docstring. Remove references to in…

27169dc

…termediate implementations. Reference the rationale.

jaraco merged commit 8c5d91b into maint/8.x Mar 20, 2026
15 of 30 checks passed

jaraco deleted the backport-cpython-143658 branch March 20, 2026 16:06

This was referenced Mar 20, 2026

Use str.translate to improve performance of normalize #529

Closed

gh-146228: Better fork support in cached FastPath python/cpython#146231

Merged
