Benchmark draw optimisations

1. Canvas-level transform rewrite

This is the clearest first win because it lets the existing world-space Rect cache actually pay off.

Problem

The current painter recomputes left/top/width/height in screen space for every rect on every frame and creates a new Rect per object while panning and zooming.

What needs to be done

Change the painter to apply one canvas transform that preserves the current mapping screen = world * zoom + pan, then draw the precomputed world-space rects directly.

Logic and mechanics

The viewport transform is global state, not per-object state. Applying it once, with canvas.save(), the correct translate/scale combination, and canvas.restore(), eliminates both the repeated per-object math and the per-rect Rect.fromLTWH(...) allocations.
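A minimal sketch of what this could look like, assuming a CustomPainter with pan (Offset), zoom (double), worldRects (List<Rect>), and fillPaint fields; these names are illustrative, not the benchmark's actual API:

```dart
@override
void paint(Canvas canvas, Size size) {
  canvas.save();
  // Order matters: translate first, then scale, so every drawn point
  // lands at screen = world * zoom + pan.
  canvas.translate(pan.dx, pan.dy);
  canvas.scale(zoom);
  for (final rect in worldRects) {
    // Cached world-space rects are drawn directly: no per-rect
    // conversion math, no per-frame Rect.fromLTWH(...) allocations.
    canvas.drawRect(rect, fillPaint);
  }
  canvas.restore();
}
```

Because the transform wraps the whole loop, stroke widths and other paint metrics are also scaled by zoom; if constant-width strokes matter later, that is a separate adjustment.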

Expected benefit

This should lower UI-thread cost on every frame, regardless of how many objects are visible, and it is the safest optimization because it does not change scene data or interaction ownership.

2. Viewport culling

After the transform rewrite, the next biggest waste is drawing objects that are not on screen.

Problem

The painter currently iterates and draws every rect every frame, so pan and zoom cost scale with total object count rather than visible object count.

What needs to be done

Compute the visible viewport in world coordinates from pan, zoom, and the canvas Size, then skip objects whose world-space rect does not intersect that visible area.

Logic and mechanics

Because rects are already cached in world space, the painter can derive a visibleWorldRect once per frame and use cheap overlap checks before calling drawRect. Add a small overscan margin so edge objects do not flicker in and out due to rounding.
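The derivation above can be sketched by inverting screen = world * zoom + pan for the canvas corners. Assumes the same hypothetical pan/zoom fields as before; the overscan margin size is an arbitrary illustrative choice:

```dart
Rect visibleWorldRect(Size size) {
  // Invert screen = world * zoom + pan at the two canvas corners.
  final topLeft = Offset(-pan.dx / zoom, -pan.dy / zoom);
  final bottomRight = Offset(
    (size.width - pan.dx) / zoom,
    (size.height - pan.dy) / zoom,
  );
  // Divide by zoom so the margin stays ~8 screen pixels at any zoom level,
  // preventing edge objects from flickering due to rounding.
  final overscan = 8.0 / zoom;
  return Rect.fromPoints(topLeft, bottomRight).inflate(overscan);
}

// In paint(), after applying the canvas transform:
//   final visible = visibleWorldRect(size);
//   for (final rect in worldRects) {
//     if (rect.overlaps(visible)) canvas.drawRect(rect, fillPaint);
//   }
```

Rect.overlaps is a handful of comparisons per object, so the check is far cheaper than the drawRect call it avoids.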

Expected benefit

This should cut both UI and raster work whenever the user is zoomed in or panned away from most of the scene. It will help least in the worst-case “fully zoomed out and almost everything visible” scenario, but it is still the right next step.

3. Additional rect/render-data caching

This should come after the first two phases, because the best cache depends on what still shows up in measurement.

Problem

The current cache only stores world-space rects. That removes one conversion step at load time, but it does not yet help with visibility lookup, scene bounds, or repeated scratch work during paint.

What needs to be done

Extend BenchRenderData only with data that stays valid across pan and zoom, such as scene bounds, culling-friendly metadata, and possibly a simple spatial grouping if linear culling still costs too much.

Logic and mechanics

Good caches here are world-stable caches. Screen-space caches are a bad fit because every pan or zoom invalidates them immediately. The most justified next-level cache is therefore something that makes culling cheaper, not something that tries to memoize per-frame transformed geometry.
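One world-stable shape such a cache could take is a uniform grid built once at load time. The sketch below extends a hypothetical BenchRenderData; the cellSize value and all field names are illustrative assumptions, not the benchmark's existing API:

```dart
class BenchRenderData {
  BenchRenderData(this.worldRects, {this.cellSize = 256.0}) {
    // Scene bounds computed once; valid across every pan and zoom.
    sceneBounds = worldRects.reduce((a, b) => a.expandToInclude(b));
    // Bucket each rect into every grid cell it touches. Buckets are keyed
    // by world-space cell coordinates, so they never need invalidation.
    for (var i = 0; i < worldRects.length; i++) {
      final r = worldRects[i];
      for (var gx = (r.left / cellSize).floor();
          gx <= (r.right / cellSize).floor();
          gx++) {
        for (var gy = (r.top / cellSize).floor();
            gy <= (r.bottom / cellSize).floor();
            gy++) {
          (grid[(gx, gy)] ??= <int>[]).add(i);
        }
      }
    }
  }

  final List<Rect> worldRects;
  final double cellSize;
  late final Rect sceneBounds;
  final Map<(int, int), List<int>> grid = {};
}
```

At paint time the painter would visit only the cells overlapped by visibleWorldRect, turning the per-frame scan from O(total objects) into roughly O(visible objects), which is exactly the case where linear culling still hurts.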

Expected benefit

This can reduce the remaining CPU cost of deciding what to draw, especially if the benchmark grows beyond 100k objects or if linear culling still leaves too much per-frame scanning overhead.

4. Broader rendering optimization pass

This phase should be a cleanup pass guided by fresh profile data, not a grab bag of speculative tweaks.

Problem

Even after better transforms and culling, immediate-mode drawing can still struggle when many rects remain visible, and there may be smaller sources of avoidable overhead around repaint scope and paint configuration.

What needs to be done

Re-profile and then apply bounded improvements such as tightening repaint isolation, reusing paint configuration, disabling antialiasing for plain axis-aligned fills, and checking whether the canvas subtree alone is repainting.

Logic and mechanics

These changes are smaller individually, but together they reduce incidental overhead around the draw loop. They also give clearer evidence for whether the remaining problem is still CPU-side setup or simply too many visible primitives for this rendering path.
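Two of these bounded improvements can be sketched concretely. The widget names are standard Flutter; the surrounding tree shape is an assumption about how the benchmark is wired up:

```dart
// Reuse one Paint instead of constructing it per frame or per rect.
// Axis-aligned solid fills gain nothing from antialiasing, so turning
// it off can shave raster cost on large visible counts.
final fillPaint = Paint()
  ..style = PaintingStyle.fill
  ..isAntiAlias = false;

// Repaint isolation: a RepaintBoundary keeps pan/zoom repaints scoped to
// the canvas subtree instead of invalidating sibling widgets.
// RepaintBoundary(
//   child: CustomPaint(painter: benchPainter),
// )
```

Whether the canvas subtree alone is repainting can be checked with the debugRepaintRainbowEnabled flag or the DevTools repaint overlay before and after the change.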

Expected benefit

This phase should smooth out the remaining frame-time spikes and clarify whether the current architecture is “good enough” or whether the fully zoomed-out 100k case needs a bigger product-level compromise later.