Skip to content

Improve performance of name normalization#533

Merged
jaraco merged 7 commits intomaint/8.xfrom
backport-cpython-143658
Mar 20, 2026
Merged

Improve performance of name normalization#533
jaraco merged 7 commits intomaint/8.xfrom
backport-cpython-143658

Conversation

@jaraco
Copy link
Member

@jaraco jaraco commented Mar 20, 2026

Backport of changes from python/cpython#143658

hugovk and others added 5 commits March 20, 2026 03:04
…ance of `importlib.metadata.Prepared.normalized` (#143660)

Co-authored-by: Henry Schreiner <henryschreineriii@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Bartosz Sławecki <bartosz@ilikepython.com>
…ce of `importlib.metadata.Prepared.normalized` (#144083)

Co-authored-by: Henry Schreiner <henryschreineriii@gmail.com>
@jaraco
Copy link
Member Author

jaraco commented Mar 20, 2026

I don't understand why the performance test is not showing any effect:

exercises.py:cached distribution: 0:00:00.000169 (+-1 day, 23:59:59.999974, -13%)
exercises.py:discovery: 0:00:00.000173 (+0:00:00, 0%)
exercises.py:entry_points(): 0:00:00.002320 (+0:00:00.000060, 3%)
exercises.py:entrypoint_regexp_perf: 0:00:00.000069 (+-1 day, 23:59:59.999998, -3%)
exercises.py:normalize_perf: 0:00:00 (+0:00:00, 0%)
exercises.py:uncached distribution: 0:00:00.000273 (+-1 day, 23:59:59.999978, -7%)

It's seeming to indicate that the normalize('sample') runs in 0 time.

If I add a sleep, it does reveal some performance degradation:

diff --git a/exercises.py b/exercises.py
index b346cc05f8..a48c57fe6e 100644
--- a/exercises.py
+++ b/exercises.py
@@ -49,6 +49,8 @@ def entrypoint_regexp_perf():
 
 def normalize_perf():
     # python/cpython#143658
+    import time
     import importlib_metadata  # end warmup
 
+    time.sleep(0.001)
     importlib_metadata.Prepared.normalize('sample')
exercises.py:normalize_perf: 0:00:00.001270 (+0:00:00, 0%)

So maybe the operation is too small for timeit to measure?

@jaraco
Copy link
Member Author

jaraco commented Mar 20, 2026

Running timeit directly reveals some run time:

 🐚 .tox/py/bin/python -m timeit --setup 'import importlib_metadata' -- 'importlib_metadata.Prepared("sample")'
1000000 loops, best of 5: 231 nsec per loop

Aah - so 231 nsec is smaller than the precision of Python's timespan, which only handles microseconds.

 importlib_metadata backport-cpython-143658 🐚 pip-run tempora
Python 3.14.3 (main, Feb  3 2026, 15:32:20) [Clang 17.0.0 (clang-1700.6.3.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tempora
>>> tempora.parse_timedelta('231 nsec')
datetime.timedelta(0)

@jaraco
Copy link
Member Author

jaraco commented Mar 20, 2026

The latest benchmark shows a 74% reduction in execution time.

exercises.py:normalize_perf: 0:00:00.000100 (+-1 day, 23:59:59.999722, -74%)

I get different numbers when I run timeit manually (~230ns vs ~510ns, 55% reduction).

Was it worth it to save a few hundred nanoseconds?

@hugovk
Copy link
Member

hugovk commented Mar 20, 2026

In many cases, the speedup won't make any noticeable effect. The change is similar to an improvement from packaging, which was does have real benefits for pip when doing large resolutions (https://iscinumpy.dev/post/packaging-faster/).

Let's also ask if @henryiii is aware of any use cases.

Personally, I would make this change, I don't think it makes the code harder to read or maintain. But it's fine if you'd prefer to reject this and/or revert in CPython.


btw I'd already opened #529 against main, and this is against maint/8.x. I couldn't find a contrib guide, what's the normal workflow? Can this be documented somewhere, or did I miss it?

This is the better PR -- same prod code change, but parametrised tests and a performance benchmark added.

@jaraco
Copy link
Member Author

jaraco commented Mar 20, 2026

btw I'd already opened #529 against main, and this is against maint/8.x. I couldn't find a contrib guide, what's the normal workflow? Can this be documented somewhere, or did I miss it?

Thanks for that PR; I'd just not gotten to it as my attention/bandwidth are limited.

There's some documentation in https://github.com/python/importlib_metadata/wiki/Development-Methodology. And the contribution to main was the correct thing at the time. But because I'm now getting this change in place (along with some others) and there's a 9.0 release that I'd like to exclude for the time being, that's why I'm targeting 8.x.

…termediate implementations. Reference the rationale.
@jaraco
Copy link
Member Author

jaraco commented Mar 20, 2026

Personally, I would make this change, I don't think it makes the code harder to read or maintain. But it's fine if you'd prefer to reject this and/or revert in CPython.

I would argue it is less readable and has other concerns.

It introduces a new value variable whose state set in multiple locations. I try to avoid mutated variables and follow a functional paradigm wherever possible. In fact, I'm tempted to wrap the while loop in another function replace_all in order to encapsulate these imperative behaviors.

It also repeats itself, using the .replace operation an arbitrary number of times both statically and dynamically.

I'd also argue that it's less sophisticated, using lower-level operations instead of a more succinct and reusable approach. Although regex comes with its own pitfalls, it elevates the conversation by re-using well-known (and documented) approaches.

Additionally, although the spec is "PEP 503 normalization plus dashes as underscores", it doesn't actually perform PEP 503 normalization, but compiles in the aggregate operation to produce an equivalent output. Compare that with the original implementation, where the code reflected precisely the specified behavior, even though it could have simply subbed to _ and eliminated the .replace, making the code more self-documenting.


These concerns are mostly just me being pedantic, but I wanted to articulate why I would prefer the prior implementation absent any performance concerns, and why I'd prefer we get a strong rationale for the benefits. Still, I missed my opportunity to get my review in before merged with CPython, so I'm planning to proceed with this change to get the code bases aligned and we can consider reverting later (while keeping the performance and unit tests).

@jaraco jaraco merged commit 8c5d91b into maint/8.x Mar 20, 2026
15 of 30 checks passed
@jaraco jaraco deleted the backport-cpython-143658 branch March 20, 2026 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants