How Much Linear Memory Access Is Enough?

Published: 2026-04-11

For basically any high-performance computation, memory layout and access pattern are critical. Common wisdom is that linear, contiguous memory performs best and should almost always be preferred. However, it should be intuitively clear that this has diminishing returns: processing a single 32 GB block vs processing two 16 GB blocks will not meaningfully differ in performance. Working with smaller blocks enables some interesting data structures, so I've set out to experimentally determine what block size is needed to effectively capture the full performance.

Findings

Setup and detailed analysis below, but my personal takeaway is:

- 1 MB blocks are enough for basically any workload of this kind
- 128 kB blocks suffice once you have at least ~1 cycle per processed byte
- 4 kB blocks are already enough once you're above roughly ~10 cycles per processed byte (for raw data processing, not necessarily if there are other per-block costs)

This is the full results chart for my Ryzen 9 7950X3D, effectively showing the block sizes needed for peak performance across different workloads. The rest of this post will go over the setup and discuss a few isolated graphs.

Code and results are available here: github.com/solidean/bench-linear-access

Setup

The memory hierarchy of modern CPUs is famously complex, so I've tried to create an experimental setup where we can isolate and control the effects well enough to make our results generalize. Our main question is: when we have to process a data set in …
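
As a rough illustration of the kind of comparison discussed here, the following Python sketch processes the same bytes once as a single contiguous buffer and once as separately allocated fixed-size blocks visited in shuffled order. All names (`make_blocks`, `shuffled_order`, `checksum_contiguous`, `checksum_blocked`) are hypothetical and are not taken from the linked repository, and the byte-sum workload is only an assumed stand-in for "processing". Interpreter overhead dominates in Python, so this shows the access pattern only and says nothing about the measured performance.

```python
import random


def make_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split `data` into blocks of at most `block_size` bytes.

    Each slice is its own Python object, so the blocks are not
    guaranteed to sit next to each other in memory (assumption: this
    stands in for independently allocated blocks).
    """
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def shuffled_order(n: int, seed: int = 0) -> list[int]:
    """Return the indices 0..n-1 in a reproducible shuffled order."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


def checksum_contiguous(data: bytes) -> int:
    """Process the whole data set as one linear pass."""
    return sum(data)


def checksum_blocked(blocks: list[bytes], order: list[int]) -> int:
    """Process the data set block by block, in the given block order.

    Access is linear inside each block, but jumps between blocks.
    """
    total = 0
    for idx in order:
        total += sum(blocks[idx])
    return total


if __name__ == "__main__":
    data = bytes(range(256)) * 4096           # 1 MiB of sample data
    blocks = make_blocks(data, 4096)          # 4 KiB blocks (illustrative size)
    order = shuffled_order(len(blocks))
    # Both traversals must see exactly the same bytes.
    print(checksum_blocked(blocks, order) == checksum_contiguous(data))
```

Varying the `block_size` argument corresponds to the block-size axis in the findings; a real measurement would need a compiled language and careful control of caches and allocation.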

Originally sourced from Hacker News
