
Started Nov 1999 by Kanoj Sarcar <kanoj@sgi.com>

The intent of this file is to have an up-to-date, running commentary
from different people about NUMA-specific code in the Linux VM.

What is NUMA? It is an architecture where the memory access times
for different regions of memory from a given processor vary
according to the "distance" of the memory region from the processor.
Each region of memory to which access times are the same from any
cpu is called a node. On such architectures, it is beneficial if
the kernel tries to minimize inter-node communication. Schemes
for this range from replicating kernel text and read-only data
across nodes to trying to house all the data structures that
key components of the kernel need in memory on the node that uses them.
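To make the "distance" idea concrete, here is a minimal C sketch assuming a hypothetical four-node machine. The distance values are invented relative costs that follow the ACPI SLIT convention of 10 for local access; node_distance, access_cost() and cheapest_node() are illustrative names, not kernel interfaces. Placing data on the accessing cpu's own node keeps every reference at the local cost.

```c
#include <assert.h>

/* Hypothetical four-node machine. Distances are invented relative costs;
 * 10 means local access, as in the ACPI SLIT convention. */
#define NR_NODES 4

static const int node_distance[NR_NODES][NR_NODES] = {
    { 10, 20, 20, 30 },
    { 20, 10, 30, 20 },
    { 20, 30, 10, 20 },
    { 30, 20, 20, 10 },
};

/* Relative cost of `accesses` memory references made by a cpu on
 * cpu_node to memory that lives on mem_node. */
static long access_cost(int cpu_node, int mem_node, long accesses)
{
    assert(cpu_node >= 0 && cpu_node < NR_NODES);
    assert(mem_node >= 0 && mem_node < NR_NODES);
    return accesses * node_distance[cpu_node][mem_node];
}

/* The node with the lowest access cost from cpu_node. With a table like
 * this one, where the diagonal is smallest, it is the cpu's own node. */
static int cheapest_node(int cpu_node)
{
    int best = 0;

    assert(cpu_node >= 0 && cpu_node < NR_NODES);
    for (int n = 1; n < NR_NODES; n++)
        if (node_distance[cpu_node][n] < node_distance[cpu_node][best])
            best = n;
    return best;
}
```

With this table, 100 references from node 0 cost 1000 units when the data is local but 3000 units when it sits on node 3.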

Currently, all the NUMA support exists to provide efficient handling
of widely discontiguous physical memory, so architectures that
are not NUMA but can have huge holes in the physical address space
can use the same code. All this code is bracketed by CONFIG_DISCONTIGMEM.
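The following sketch illustrates, with assumed values, why discontiguous memory support helps even on non-NUMA machines. The range table, pfn_to_node() and the two page-count helpers are hypothetical: each range of page frames belongs to one node, and the gaps between ranges are holes. Keeping bookkeeping per range means per-page structures are only needed for frames that actually exist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical physical layout: each node owns one contiguous range of
 * page frame numbers (pfns); the gaps between ranges are holes. */
struct mem_range {
    unsigned long start_pfn;   /* first page frame in the range */
    unsigned long end_pfn;     /* one past the last page frame */
    int node;                  /* node that owns the range */
};

/* Sorted by start_pfn. */
static const struct mem_range ranges[] = {
    { 0x00000, 0x08000, 0 },
    { 0x40000, 0x48000, 1 },   /* hole from 0x08000 to 0x40000 */
    { 0x80000, 0x84000, 2 },   /* hole from 0x48000 to 0x80000 */
};

#define NR_RANGES (sizeof(ranges) / sizeof(ranges[0]))

/* Return the node owning pfn, or -1 if the frame lies in a hole. */
static int pfn_to_node(unsigned long pfn)
{
    for (size_t i = 0; i < NR_RANGES; i++)
        if (pfn >= ranges[i].start_pfn && pfn < ranges[i].end_pfn)
            return ranges[i].node;
    return -1;
}

/* Frames that need a page descriptor when each range has its own map. */
static unsigned long pages_mapped_per_range(void)
{
    unsigned long total = 0;

    for (size_t i = 0; i < NR_RANGES; i++)
        total += ranges[i].end_pfn - ranges[i].start_pfn;
    return total;
}

/* Frames covered by one flat map spanning the lowest to the highest
 * address; relies on ranges[] being sorted by start_pfn. */
static unsigned long pages_mapped_flat(void)
{
    return ranges[NR_RANGES - 1].end_pfn - ranges[0].start_pfn;
}
```

With the sample layout, only 0x14000 of the 0x84000 frames spanned need a descriptor; a flat map would waste the rest on holes.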

The initial port includes NUMAizing the bootmem allocator code by
encapsulating all of its state in a bootmem_data_t
structure. Node-specific calls have been added to the allocator.
In theory, any platform that uses the bootmem allocator should
be able to put the bootmem and mem_map data structures anywhere
it deems best.
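A minimal sketch, assuming a deliberately simplified per-node boot allocator: struct toy_bootmem and toy_alloc_bootmem_node() are hypothetical stand-ins, not the real bootmem_data_t layout or the kernel's node-specific bootmem calls. Each node keeps its own bitmap of reserved pages, so a caller can ask for early memory on a specific node, for example to place that node's mem_map locally.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGES_PER_NODE 64   /* hypothetical, tiny for illustration */

/* Simplified stand-in for a per-node bootmem_data_t: the first page
 * frame the node covers and a bitmap of reserved pages (bit set means
 * reserved). The real structure has different fields. */
struct toy_bootmem {
    unsigned long start_pfn;
    unsigned char map[PAGES_PER_NODE / 8];
};

static bool page_used(const struct toy_bootmem *b, int i)
{
    return b->map[i / 8] & (1u << (i % 8));
}

static void mark_used(struct toy_bootmem *b, int i)
{
    b->map[i / 8] |= (unsigned char)(1u << (i % 8));
}

static void toy_bootmem_init(struct toy_bootmem *b, unsigned long start_pfn)
{
    b->start_pfn = start_pfn;
    memset(b->map, 0, sizeof(b->map));
}

/* First-fit allocation of npages contiguous pages from one node's
 * bitmap. Returns the first page frame number, or -1 if no run of
 * npages free pages exists on this node. */
static long toy_alloc_bootmem_node(struct toy_bootmem *b, int npages)
{
    assert(npages > 0);
    for (int first = 0; first + npages <= PAGES_PER_NODE; first++) {
        int i;

        for (i = 0; i < npages; i++)
            if (page_used(b, first + i))
                break;
        if (i == npages) {
            for (i = 0; i < npages; i++)
                mark_used(b, first + i);
            return (long)b->start_pfn + first;
        }
    }
    return -1;
}
```

Because every node has its own toy_bootmem, an allocation aimed at node 1 never consumes node 0's pages; a platform is free to decide which node each early structure lands on.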

Each node's page allocation data structures have also been encapsulated
into a pg_data_t. The bootmem_data_t is just one part of this. To 
make the code look uniform between NUMA and regular UMA platforms, 
UMA platforms have a statically allocated pg_data_t too (contig_page_data).
For the sake of uniformity, the function num_online_nodes() is also defined
for all platforms. As we run benchmarks, we might decide to NUMAize
more variables, such as low_on_memory and nr_free_pages, into the pg_data_t.
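The uniformity argument can be sketched with a hypothetical toy_pg_data_t; the field names and helpers below are assumptions, not the kernel's definitions. A UMA build gets one statically allocated descriptor playing the role of contig_page_data, while a NUMA build links one descriptor per node. Code that walks the list, such as the node count or a system-wide free-page total, is identical in both cases.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for pg_data_t: one per node, holding that node's
 * page allocation state. Field names are hypothetical. */
typedef struct toy_pg_data {
    int node_id;
    unsigned long nr_free_pages;   /* per-node counter, not a global */
    struct toy_pg_data *next;      /* singly linked list of nodes */
} toy_pg_data_t;

/* UMA: exactly one node, allocated statically, in the role that
 * contig_page_data plays in the kernel. */
static toy_pg_data_t toy_contig_page_data = { 0, 0, NULL };
static toy_pg_data_t *toy_pgdat_list = &toy_contig_page_data;

/* NUMA: link an array of per-node descriptors and make it the list. */
static void toy_numa_setup(toy_pg_data_t *nodes, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        nodes[i].node_id = i;
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
    }
    toy_pgdat_list = &nodes[0];
}

/* Defined the same way on every platform. */
static int toy_num_online_nodes(void)
{
    int n = 0;

    for (toy_pg_data_t *p = toy_pgdat_list; p != NULL; p = p->next)
        n++;
    return n;
}

/* A system-wide figure is the sum of the per-node counters. */
static unsigned long toy_total_free_pages(void)
{
    unsigned long sum = 0;

    for (toy_pg_data_t *p = toy_pgdat_list; p != NULL; p = p->next)
        sum += p->nr_free_pages;
    return sum;
}
```

On the UMA default the count is 1 and the total equals the single node's counter; after toy_numa_setup() the same two functions report the per-node sums without any #ifdef.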

The NUMA-aware page allocation code currently tries to allocate pages
from different nodes in a round-robin manner. This will be changed to
do a concentric circle search, starting from the current node, once the
NUMA port achieves more maturity. The call alloc_pages_node() has been
added so that drivers can use it without worrying about whether
they are running on a NUMA or UMA platform.
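The two search orders can be compared in a small userspace model. Everything here is hypothetical: the per-node free-page counters, the invented distance table, and toy_alloc_page_node(), which takes only a node id and hands out a single page, unlike the kernel's alloc_pages_node(). Round-robin spreads allocations across nodes regardless of the caller, whereas the concentric search tries the requested node first and then falls back to progressively more distant ones.

```c
#include <assert.h>

#define NR_NODES 4

/* Hypothetical per-node free-page counts, set up by the caller. */
static long free_pages[NR_NODES];

/* Invented relative distances; 10 means local. */
static const int node_distance[NR_NODES][NR_NODES] = {
    { 10, 20, 20, 30 },
    { 20, 10, 30, 20 },
    { 20, 30, 10, 20 },
    { 30, 20, 20, 10 },
};

/* Take one page from nid if it has any; returns 1 on success. */
static int take_page(int nid)
{
    if (free_pages[nid] > 0) {
        free_pages[nid]--;
        return 1;
    }
    return 0;
}

/* Round-robin: each call resumes the rotation where the previous call
 * stopped, regardless of where the caller runs. Returns the node that
 * supplied the page, or -1 if every node is empty. */
static int rr_next;

static int alloc_round_robin(void)
{
    for (int tries = 0; tries < NR_NODES; tries++) {
        int nid = rr_next;

        rr_next = (rr_next + 1) % NR_NODES;
        if (take_page(nid))
            return nid;
    }
    return -1;
}

/* Concentric search: visit nodes in order of increasing distance from
 * nid (nid itself first, since 10 is the smallest entry in its row;
 * ties go to the lower node number). Returns the node used, or -1. */
static int toy_alloc_page_node(int nid)
{
    int tried[NR_NODES] = { 0 };

    assert(nid >= 0 && nid < NR_NODES);
    for (int round = 0; round < NR_NODES; round++) {
        int best = -1;

        for (int n = 0; n < NR_NODES; n++)
            if (!tried[n] && (best < 0 ||
                node_distance[nid][n] < node_distance[nid][best]))
                best = n;
        tried[best] = 1;
        if (take_page(best))
            return best;
    }
    return -1;
}
```

Because callers only pass a node id, the same calling code would work unchanged in this model if NR_NODES were 1, which is the UMA case.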