twoodfin 8 hours ago

I’m guessing this is less about average latency / throughput tradeoffs and more about providing predictable performance (all else being equal) for both.

Plenty of legacy software out there that a) will never be optimized for NUMA, b) scales via more cores touching more shared memory, and c) needs to hit SLAs, beyond which extra performance is effectively wasted.

  • mgerdts 6 hours ago

    A workload that uses only a fraction of such a system can be corralled onto a single socket (or a portion of one) and kept on local memory using cgroups.
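
    A rough sketch of that corralling with the cgroup v2 cpuset controller (the mount point, the "dbworkload" group name, and the "socket 0 = CPUs 0-31, NUMA node 0" layout are assumptions; adjust to your topology):

      from pathlib import Path

      ROOT = Path("/sys/fs/cgroup")      # assumes cgroup v2 mounted here
      GROUP = ROOT / "dbworkload"        # hypothetical group name

      def corral(pid: int, cpus: str = "0-31", mems: str = "0") -> None:
          """Pin a process to one socket's cores and its local NUMA node."""
          # Enable the cpuset controller for children of the root cgroup.
          (ROOT / "cgroup.subtree_control").write_text("+cpuset")
          GROUP.mkdir(exist_ok=True)
          # Restrict the group to one socket's CPUs and its local memory node.
          (GROUP / "cpuset.cpus").write_text(cpus)
          (GROUP / "cpuset.mems").write_text(mems)
          # Move the workload in; new allocations then stay node-local.
          (GROUP / "cgroup.procs").write_text(str(pid))

    (numactl --cpunodebind/--membind gets to roughly the same place without writing the cgroup files by hand.)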

    Most likely other workloads will also run on this machine. They can be similarly bound to meet their needs.

    With Kubernetes, the CPU Manager can be a big help.
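
    For the Kubernetes case, a minimal sketch of what the CPU Manager's static policy keys on: a Guaranteed-QoS pod whose CPU request is an integer and equals its limit (pod name, image, and sizes below are made up):

      from kubernetes import client, config

      config.load_kube_config()
      # requests == limits with an integer CPU count puts the pod in the
      # Guaranteed QoS class; under the kubelet's static CPU Manager policy,
      # the container then gets exclusive cores instead of the shared pool.
      pod = client.V1Pod(
          metadata=client.V1ObjectMeta(name="db"),
          spec=client.V1PodSpec(containers=[client.V1Container(
              name="db",
              image="example/db:latest",
              resources=client.V1ResourceRequirements(
                  requests={"cpu": "16", "memory": "64Gi"},
                  limits={"cpu": "16", "memory": "64Gi"},
              ),
          )]),
      )
      client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)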

    • twoodfin 6 hours ago

      That’s not the kind of software I had in mind. I mean single large logical systems—databases being likely the largest and most common—that can’t meaningfully be distributed & are still growing in size and workload scale.