@hyeyoo @lkundrak @sj @vbabka bro I'm not, I'm an imposter, you're the real one. I don't even work in kernel mm (atm anyway)
Plus age/qualifications don't matter, you got talent which can't be taught. I have an undergrad in civ eng + taught myself :)
I'd say the main benefit of NUMA isn't avoiding a bottleneck, but rather accounting for the different time taken for local vs remote memory accesses, thus allowing the kernel to stop you doing something stupid.
I always picture the literal physical setup of a 2 socket system where there's RAM attached to each socket and a slow interconnect between the two. You don't want to be using that interconnect!
I guess you could say you are trying to avoid the 'global bus' if this == the interconnect.
You find that by default most x86 just has NUMA turned on anyway even in laptop situations, I mean my desktop does too, just with a single node.
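If you want to check this on your own box, here's a rough sketch. It assumes a Linux kernel built with CONFIG_NUMA that exposes /sys/devices/system/node/online; the function names are just mine, not any real API.

```python
from pathlib import Path


def parse_node_list(text):
    """Parse a kernel-style list string such as '0' or '0-1,4' into node ids."""
    nodes = []
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A bare "3" has no upper bound, so treat it as the range 3-3.
        nodes.extend(range(int(lo), int(hi or lo) + 1))
    return nodes


def online_nodes(path="/sys/devices/system/node/online"):
    """Return the online NUMA node ids, or None if the sysfs file is absent."""
    p = Path(path)
    if not p.exists():
        # Assumption: a missing file means no NUMA support or not Linux.
        return None
    return parse_node_list(p.read_text())


if __name__ == "__main__":
    print(online_nodes())
```

On a single-node desktop like the one described I'd expect this to report just node 0, but I haven't checked it across different machines.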
I sort of feel like we should have CONFIG_NUMA turned on by default, as it would simplify the code, and just say ok there's 1 node, and all the various mem policy stuff won't make any difference.
One thing that bugged me on my arm64 laptop is that it puts everything in ZONE_DMA because zones don't matter there. But still kind of... ugly