
Improvements to inference using int8 compressed kv's #871

Merged
copybara-service[bot] merged 1 commit into dev from test_875150774 on Mar 24, 2026


@copybara-service copybara-service bot commented Mar 13, 2026

Improvements to inference using int8 compressed kv's
Multiplication is done using int16*int16 multiplication instructions, which avoids expensive conversion to f32/bf16.
2x speedup on Zen 3.
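
To illustrate the idea in the description, here is a minimal scalar sketch of a dot product over int8-quantized vectors that widens each operand to int16, multiplies in the integer domain, accumulates in int32, and applies the floating-point scales only once at the end. It is not the code from this PR: the `QuantizedVec` struct, the `QuantizeInt8` and `DotInt16` names, the symmetric per-vector scale, and the choice to quantize both operands to int8 are all assumptions made for illustration. On x86, a loop of this shape can map onto instructions such as `pmaddwd`, which multiply pairs of int16 values and add the products into int32 lanes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: int8 values plus one f32 scale for the whole vector,
// so that real_value[i] is approximately values[i] * scale.
struct QuantizedVec {
  std::vector<int8_t> values;
  float scale;
};

// Illustrative symmetric quantization of a float vector to int8.
QuantizedVec QuantizeInt8(const std::vector<float>& x) {
  float max_abs = 0.0f;
  for (float v : x) max_abs = std::max(max_abs, std::fabs(v));

  QuantizedVec q;
  q.scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
  q.values.reserve(x.size());
  for (float v : x) {
    q.values.push_back(static_cast<int8_t>(std::lround(v / q.scale)));
  }
  return q;
}

// Dot product computed entirely in integers: each int8 operand is widened to
// int16, the int16*int16 product is accumulated in int32, and the two scales
// are applied once at the end instead of converting every element to float.
float DotInt16(const QuantizedVec& q, const QuantizedVec& k) {
  assert(q.values.size() == k.values.size());
  int32_t acc = 0;
  for (size_t i = 0; i < q.values.size(); ++i) {
    const int16_t a = q.values[i];
    const int16_t b = k.values[i];
    acc += static_cast<int32_t>(a) * static_cast<int32_t>(b);
  }
  return static_cast<float>(acc) * q.scale * k.scale;
}
```

In this sketch the only floating-point work per dot product is the final scaling, whereas a float-domain version would convert and scale every element before multiplying.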

@copybara-service copybara-service bot force-pushed the test_875150774 branch 5 times, most recently from 690c719 to ac4cb67 on March 24, 2026 15:15
Multiplication is done using int16*int16 multiplication instructions, which avoids expensive conversion to f32/bf16.
2x speedup on Zen 3.

PiperOrigin-RevId: 888690192
@copybara-service copybara-service bot merged commit f56d18d into dev Mar 24, 2026
@copybara-service copybara-service bot deleted the test_875150774 branch March 24, 2026 15:51