Improve performance of name normalization#533
Conversation
…ance of `importlib.metadata.Prepared.normalized` (#143660) Co-authored-by: Henry Schreiner <henryschreineriii@gmail.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com> Co-authored-by: Bartosz Sławecki <bartosz@ilikepython.com>
…ce of `importlib.metadata.Prepared.normalized` (#144083) Co-authored-by: Henry Schreiner <henryschreineriii@gmail.com>
|
I don't understand why the performance test is not showing any effect: It's seeming to indicate that the If I add a sleep, it does reveal some performance degradation: diff --git a/exercises.py b/exercises.py
index b346cc05f8..a48c57fe6e 100644
--- a/exercises.py
+++ b/exercises.py
@@ -49,6 +49,8 @@ def entrypoint_regexp_perf():
def normalize_perf():
# python/cpython#143658
+ import time
import importlib_metadata # end warmup
+ time.sleep(0.001)
importlib_metadata.Prepared.normalize('sample')So maybe the operation is too small for timeit to measure? |
|
Running timeit directly reveals some run time: Aah - so 231 nsec is smaller than the precision of Python's timespan, which only handles microseconds. |
|
The latest benchmark shows a 74% reduction in execution time. I get different numbers when I run timeit manually (~230ns vs ~510ns, 55% reduction). Was it worth it to save a few hundred nanoseconds? |
|
In many cases, the speedup won't make any noticeable effect. The change is similar to an improvement from Let's also ask if @henryiii is aware of any use cases. Personally, I would make this change, I don't think it makes the code harder to read or maintain. But it's fine if you'd prefer to reject this and/or revert in CPython. btw I'd already opened #529 against This is the better PR -- same prod code change, but parametrised tests and a performance benchmark added. |
Thanks for that PR; I'd just not gotten to it as my attention/bandwidth are limited. There's some documentation in https://github.com/python/importlib_metadata/wiki/Development-Methodology. And the contribution to main was the correct thing at the time. But because I'm now getting this change in place (along with some others) and there's a 9.0 release that I'd like to exclude for the time being, that's why I'm targeting 8.x. |
…termediate implementations. Reference the rationale.
I would argue it is less readable and has other concerns. It introduces a new It also repeats itself, using the I'd also argue that it's less sophisticated, using lower-level operations instead of a more succinct and reusable approach. Although regex comes with its own pitfalls, it elevates the conversation by re-using well-known (and documented) approaches. Additionally, although the spec is "PEP 503 normalization plus dashes as underscores", it doesn't actually perform PEP 503 normalization, but compiles in the aggregate operation to produce an equivalent output. Compare that with the original implementation, where the code reflected precisely the specified behavior, even though it could have simply subbed to These concerns are mostly just me being pedantic, but I wanted to articulate why I would prefer the prior implementation absent any performance concerns, and why I'd prefer we get a strong rationale for the benefits. Still, I missed my opportunity to get my review in before merged with CPython, so I'm planning to proceed with this change to get the code bases aligned and we can consider reverting later (while keeping the performance and unit tests). |
Backport of changes from python/cpython#143658