
Batch wgpu submit and present across immediate viewports #7961

Open
gcailly wants to merge 1 commit into emilk:main from gcailly:fix/viewport-perf


gcailly commented Mar 6, 2026

Summary

This PR addresses the FPS drop reported in #7885 when multiple immediate viewports are open.

The root cause is that each viewport does its own queue.submit() + present() sequentially, causing redundant GPU synchronization. This PR splits paint_and_update_textures into three phases:

  • paint_prepare — upload textures/buffers, acquire surface texture, record render pass, encode commands
  • paint_submit — single queue.submit() for all viewports at once
  • paint_present — present all viewports after GPU work is done
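The three-phase split can be sketched with mock types. This is a minimal illustration, not the PR's code: `CommandBuffer`, `Queue`, `PreparedFrame` and the free functions below are hypothetical stand-ins (only the `IntoIterator` shape of the submit call mirrors `wgpu::Queue::submit`). The point it shows is that once encoding is separated from submission, several viewports' command buffers can travel in a single submit call.

```rust
// Hypothetical stand-ins for wgpu types: NOT the real egui-wgpu or wgpu API.
struct CommandBuffer;

#[derive(Default)]
struct Queue {
    submit_calls: usize,
    buffers_submitted: usize,
}

impl Queue {
    // Like `wgpu::Queue::submit`, accepts any iterator of command buffers.
    fn submit<I: IntoIterator<Item = CommandBuffer>>(&mut self, buffers: I) {
        self.submit_calls += 1;
        self.buffers_submitted += buffers.into_iter().count();
    }
}

/// Work for one viewport that has been encoded but not yet submitted or presented.
struct PreparedFrame {
    viewport: String,
    commands: Option<CommandBuffer>,
}

/// Phase 1: upload data, acquire the surface texture, encode commands (no submit yet).
fn paint_prepare(viewport: &str) -> PreparedFrame {
    PreparedFrame {
        viewport: viewport.to_owned(),
        commands: Some(CommandBuffer),
    }
}

/// Phase 2: one queue submit carrying the command buffers of every prepared frame.
fn paint_submit(queue: &mut Queue, frames: &mut [PreparedFrame]) {
    queue.submit(frames.iter_mut().filter_map(|f| f.commands.take()));
}

/// Phase 3: present each frame; real code would call `SurfaceTexture::present()` here.
/// Returns the viewports in the order they were presented.
fn paint_present(frames: Vec<PreparedFrame>) -> Vec<String> {
    frames.into_iter().map(|f| f.viewport).collect()
}

fn main() {
    let mut queue = Queue::default();
    let mut frames: Vec<PreparedFrame> =
        ["root", "w0", "w1"].iter().map(|&v| paint_prepare(v)).collect();
    paint_submit(&mut queue, &mut frames);
    let presented = paint_present(frames);
    println!(
        "{} submit call(s) for {} command buffers; presented {:?}",
        queue.submit_calls, queue.buffers_submitted, presented
    );
}
```

Three viewports are prepared, but the mock queue records only one submit call carrying all three command buffers.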

Immediate viewports now accumulate their PreparedFrames, and the parent viewport batches everything into one submit+present cycle.
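To make the difference in call counts concrete, here is a small counting model of the two strategies. It is an illustrative sketch that does no rendering; `GpuCalls`, `render_unbatched` and `render_batched` are hypothetical names, and the ordering (children prepared before the parent) is an assumption of the sketch rather than something stated in the PR.

```rust
// Illustrative model only: counts GPU calls, does no rendering, and is not egui-wgpu code.
#[derive(Default, Debug)]
struct GpuCalls {
    submits: usize,
    presented: Vec<usize>, // viewport ids in present order (0 = parent)
}

/// A viewport's encoded-but-unsubmitted work (hypothetical stand-in).
struct PreparedFrame {
    viewport: usize,
}

/// Before: every immediate viewport, then the parent, runs its own submit + present.
fn render_unbatched(num_immediate: usize) -> GpuCalls {
    let mut gpu = GpuCalls::default();
    for viewport in (1..=num_immediate).chain([0]) {
        gpu.submits += 1;
        gpu.presented.push(viewport);
    }
    gpu
}

/// After: immediate viewports only prepare and push into the parent's pending list;
/// the parent then submits once and presents everything.
fn render_batched(num_immediate: usize) -> GpuCalls {
    let mut gpu = GpuCalls::default();
    let mut pending: Vec<PreparedFrame> = Vec::new();
    for viewport in 1..=num_immediate {
        pending.push(PreparedFrame { viewport });
    }
    // Assumption of this sketch: the parent's own frame is prepared after its children.
    pending.push(PreparedFrame { viewport: 0 });
    gpu.submits += 1; // a single queue.submit() carrying all command buffers
    gpu.presented.extend(pending.iter().map(|f| f.viewport));
    gpu
}

fn main() {
    for n in [0, 1, 10] {
        let (before, after) = (render_unbatched(n), render_batched(n));
        println!("{n} immediate: {} submits before, {} after", before.submits, after.submits);
    }
}
```

With N immediate viewports the unbatched path issues N + 1 submits per frame, while the batched path issues one; the number of presents is unchanged.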

paint_and_update_textures is kept as a convenience wrapper calling the three phases sequentially, so the public API remains backward-compatible. Deferred viewports are unaffected (they still go through the wrapper).
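The wrapper idea can be sketched as follows. The method names come from the description above, but the `Painter` struct, its `log` field and every signature here are invented for illustration; the real egui-wgpu methods take more arguments.

```rust
// Hypothetical painter; method names mirror the PR description, signatures are invented.
#[derive(Default)]
struct Painter {
    log: Vec<&'static str>, // records which phase ran, in order
}

/// Placeholder for one viewport's encoded work.
struct PreparedFrame;

impl Painter {
    fn paint_prepare(&mut self) -> PreparedFrame {
        self.log.push("prepare");
        PreparedFrame
    }

    fn paint_submit(&mut self, _frames: &[PreparedFrame]) {
        self.log.push("submit");
    }

    fn paint_present(&mut self, _frames: Vec<PreparedFrame>) {
        self.log.push("present");
    }

    /// Backward-compatible wrapper: the three phases back to back for one viewport,
    /// which is the path deferred viewports keep using.
    fn paint_and_update_textures(&mut self) {
        let frames = vec![self.paint_prepare()];
        self.paint_submit(&frames);
        self.paint_present(frames);
    }
}

fn main() {
    let mut painter = Painter::default();
    painter.paint_and_update_textures();
    println!("{:?}", painter.log);
}
```

Existing callers see the same single-call API, and each call still performs its own submit, so behaviour for deferred viewports is unchanged.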

Before and after

Tested on Windows 11, release mode, with a minimal benchmark spawning 0 to 10 immediate viewports (vsync off, high-performance GPU).

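| Viewports | FPS before | FPS after | Gain |
|:---------:|-----------:|----------:|-----:|
| 0 | 1366 | 1410 | +3% |
| 1 | 695 | 770 | +11% |
| 2 | 506 | 622 | +23% |
| 3 | 397 | 511 | +29% |
| 4 | 315 | 431 | +37% |
| 5 | 270 | 367 | +36% |
| 6 | 236 | 305 | +29% |
| 7 | 202 | 285 | +41% |
| 8 | 184 | 249 | +35% |
| 9 | 159 | 236 | +48% |
| 10 | 149 | 208 | +40% |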
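The Gain column is the relative FPS change, (after − before) / before × 100, rounded to a whole percent. The snippet below recomputes it from the FPS values in the table; it only checks the arithmetic and adds no new measurements.

```rust
/// (viewports, FPS before, FPS after), copied from the benchmark table.
const RESULTS: [(usize, f64, f64); 11] = [
    (0, 1366.0, 1410.0),
    (1, 695.0, 770.0),
    (2, 506.0, 622.0),
    (3, 397.0, 511.0),
    (4, 315.0, 431.0),
    (5, 270.0, 367.0),
    (6, 236.0, 305.0),
    (7, 202.0, 285.0),
    (8, 184.0, 249.0),
    (9, 159.0, 236.0),
    (10, 149.0, 208.0),
];

/// Relative FPS gain in percent: (after - before) / before * 100.
fn gain_percent(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    for (viewports, before, after) in RESULTS {
        println!("{viewports:>2} viewports: {:+.0}%", gain_percent(before, after));
    }
}
```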

Note: Initial testing on a different configuration showed a more dramatic improvement (~60 → ~180 FPS with 10 viewports). After more thorough benchmarking, the gain is closer to +29–48% with 4 or more viewports.

Benchmark code
```rust
// In release builds this selects the Windows GUI subsystem, so the println! output
// below typically does not appear in the terminal; results are also shown in the window.
#![cfg_attr(not(debug_assertions), windows_subsystem = "windows")]

use eframe::egui_wgpu::{WgpuConfiguration, WgpuSetup, WgpuSetupCreateNew};
use egui::{Id, ViewportId};
use std::time::Instant;
use wgpu::{PowerPreference, PresentMode};

const SECONDS_PER_STEP: f64 = 3.0;
const WARMUP_SECS: f64 = 1.0;
const MAX_VIEWPORTS: usize = 10;

fn main() -> eframe::Result {
    // Disable vsync and prefer the discrete GPU so FPS reflects rendering cost.
    let mut wgpu_options = WgpuConfiguration::default();
    wgpu_options.present_mode = PresentMode::AutoNoVsync;
    wgpu_options.wgpu_setup = match wgpu_options.wgpu_setup {
        WgpuSetup::CreateNew(create_new) => WgpuSetup::CreateNew(WgpuSetupCreateNew {
            power_preference: PowerPreference::HighPerformance,
            ..create_new
        }),
        _ => unreachable!(),
    };

    let native_options = eframe::NativeOptions {
        viewport: egui::ViewportBuilder::default()
            .with_inner_size([400.0, 300.0])
            .with_min_inner_size([300.0, 220.0]),
        vsync: false,
        wgpu_options,
        ..Default::default()
    };

    println!("| Viewports | FPS |");
    println!("|:---------:|:---:|");

    eframe::run_native(
        "viewport_perf",
        native_options,
        Box::new(|_cc| Ok(Box::new(App::new()))),
    )
}

struct App {
    current_step: usize, // number of immediate viewports in the current step
    frame_count: usize,
    step_start: Instant,
    warming_up: bool,
    done: bool,
    results: Vec<(usize, usize)>, // (viewports, FPS)
}

impl App {
    fn new() -> Self {
        Self {
            current_step: 0,
            frame_count: 0,
            step_start: Instant::now(),
            warming_up: true,
            done: false,
            results: Vec::new(),
        }
    }
}

impl eframe::App for App {
    fn ui(&mut self, ui: &mut egui::Ui, _frame: &mut eframe::Frame) {
        let elapsed = self.step_start.elapsed().as_secs_f64();

        if self.done {
            egui::CentralPanel::default().show_inside(ui, |ui| {
                ui.heading("Benchmark complete!");
                ui.separator();
                for (vp, fps) in &self.results {
                    ui.label(format!("{vp} viewports: {fps} FPS"));
                }
            });
            return;
        }

        // Each step: warm up, then count frames for SECONDS_PER_STEP.
        if self.warming_up {
            if elapsed >= WARMUP_SECS {
                self.warming_up = false;
                self.frame_count = 0;
                self.step_start = Instant::now();
            }
        } else if elapsed >= SECONDS_PER_STEP {
            let fps = (self.frame_count as f64 / elapsed).round() as usize;
            println!("| {:<9} | {fps:>5} |", self.current_step);
            self.results.push((self.current_step, fps));

            self.current_step += 1;
            self.frame_count = 0;
            self.step_start = Instant::now();
            self.warming_up = true;

            if self.current_step > MAX_VIEWPORTS {
                self.done = true;
                println!("\nDone! You can close the window.");
                return;
            }
        }

        if !self.warming_up {
            self.frame_count += 1;
        }

        egui::CentralPanel::default().show_inside(ui, |ui| {
            ui.heading(format!("Benchmarking: {} viewport(s)...", self.current_step));
            if self.warming_up {
                ui.label("Warming up...");
            } else {
                ui.label(format!(
                    "Measuring ({:.1}s / {SECONDS_PER_STEP}s)",
                    self.step_start.elapsed().as_secs_f64()
                ));
            }
        });

        let viewport_ids: Vec<ViewportId> = (0..self.current_step)
            .map(|i| ViewportId(Id::new(format!("w{i}"))))
            .collect();

        // Open `current_step` immediate viewports every frame.
        for viewport_id in &viewport_ids {
            ui.ctx().show_viewport_immediate(
                *viewport_id,
                egui::ViewportBuilder::default()
                    .with_inner_size([400.0, 300.0])
                    .with_min_inner_size([300.0, 220.0]),
                |ui, _class| {
                    egui::CentralPanel::default().show_inside(ui, |ui| {
                        ui.heading("Extra Window");
                    });
                },
            );
        }

        // Keep repainting continuously so FPS is limited only by rendering.
        ui.ctx().request_repaint();
    }
}
```

Disclosure

I'm not a Rust developer; I used Claude Code to help me write this. I hope I'm not making a mess; I just wanted to help! Please don't hesitate to point out anything wrong.

Test plan

  • cargo test -p egui-wgpu -p eframe — all tests pass
  • cargo clippy -p egui-wgpu -p eframe — no warnings
  • multiple_viewports example — works correctly
  • Tested with 0–10 immediate viewports — ~30–48% FPS improvement with 4+ viewports

Split `paint_and_update_textures` into three phases (`paint_prepare`,
`paint_submit`, `paint_present`) so that immediate viewports can
accumulate their prepared frames and submit them in a single
`queue.submit()` call instead of one per viewport.

This reduces frame time with many immediate viewports by eliminating
redundant GPU synchronization between each viewport's submit+present
cycle.

Closes emilk#7885
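The frame-time reduction can be read off the benchmark table by converting FPS to milliseconds per frame (1000 / FPS). The sketch below also estimates an average per-viewport cost from the 0- and 10-viewport rows; treating that cost as linear in the number of viewports is a simplifying assumption, and the helper names are hypothetical.

```rust
/// Convert frames per second to milliseconds per frame.
fn frame_ms(fps: f64) -> f64 {
    1000.0 / fps
}

/// Average extra frame time each immediate viewport adds, using the 0- and N-viewport rows.
/// Assumes (as a rough approximation) that cost grows linearly with viewport count.
fn marginal_ms_per_viewport(fps_0: f64, fps_n: f64, n: usize) -> f64 {
    (frame_ms(fps_n) - frame_ms(fps_0)) / n as f64
}

fn main() {
    // FPS values taken from the benchmark table (0 and 10 viewports).
    let before = marginal_ms_per_viewport(1366.0, 149.0, 10);
    let after = marginal_ms_per_viewport(1410.0, 208.0, 10);
    println!("10 viewports: {:.2} ms -> {:.2} ms per frame", frame_ms(149.0), frame_ms(208.0));
    println!("per-viewport cost: {before:.3} ms before, {after:.3} ms after");
}
```

On these numbers, each immediate viewport costs roughly 0.6 ms per frame before the change and roughly 0.41 ms after, and total frame time at 10 viewports drops by about 1.9 ms.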
gcailly requested a review from Wumpf as a code owner March 6, 2026 14:10

github-actions bot commented Mar 6, 2026

Preview available at https://egui-pr-preview.github.io/pr/7961-fixviewport-perf
Note that it might take a couple seconds for the update to show up after the preview_build workflow has completed.

View snapshot changes at kitdiff

liusuchao commented:

Thanks for this awesome optimization! The results are impressive — 3x FPS improvement with 10 viewports is a huge win. Hope this gets reviewed and merged soon. Keep up the great work! 🚀

gcailly (author) commented Mar 12, 2026

Thanks @liusuchao! I have to be honest: I was a bit too optimistic with the initial numbers. After more thorough benchmarking (see the updated PR description), the real-world gain is closer to 30–40% than the 3x improvement I initially reported.
