Started Nov 1999 by Kanoj Sarcar <kanoj@sgi.com>

The intent of this file is to have an up-to-date, running commentary
from different people about NUMA-specific code in the Linux VM.

What is NUMA? It is an architecture where the memory access times
for different regions of memory from a given processor vary
according to the "distance" of the memory region from the processor.
Each region of memory to which access times are the same from any
cpu is called a node. On such architectures, it is beneficial if
the kernel tries to minimize inter-node communications. Schemes
for this range from replicating kernel text and read-only data
across nodes to trying to house all the data structures that
key components of the kernel need in memory local to the node
that uses them.

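As a rough illustration of the "distance" idea, the sketch below models
a hypothetical 4-node machine with a made-up distance table and picks
the closest node that still has free memory. The table values and the
names example_distance and example_nearest_node are assumptions made
for this example only; they are not kernel interfaces.

```c
#include <assert.h>

#define EXAMPLE_NR_NODES 4	/* hypothetical machine size */

/*
 * Made-up relative access costs: row = node of the cpu,
 * column = node of the memory. Local access is cheapest.
 */
static const int example_distance[EXAMPLE_NR_NODES][EXAMPLE_NR_NODES] = {
	{ 10, 20, 30, 40 },
	{ 20, 10, 20, 30 },
	{ 30, 20, 10, 20 },
	{ 40, 30, 20, 10 },
};

/*
 * Return the node with free pages that is closest to 'from',
 * or -1 if no node has any free pages. Ties go to the lower node id.
 */
static int example_nearest_node(int from,
				const unsigned long free_pages[EXAMPLE_NR_NODES])
{
	int best = -1;

	assert(from >= 0 && from < EXAMPLE_NR_NODES);
	for (int nid = 0; nid < EXAMPLE_NR_NODES; nid++) {
		if (free_pages[nid] == 0)
			continue;
		if (best < 0 ||
		    example_distance[from][nid] < example_distance[from][best])
			best = nid;
	}
	return best;
}
```

A cpu on node 3 whose local memory is exhausted would be steered to
node 2 before node 0, since node 2 is "closer" in this table.
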
Currently, all the NUMA support does is provide efficient handling
of widely discontiguous physical memory, so architectures which
are not NUMA but can have huge holes in the physical address space
can use the same code. All this code is bracketed by CONFIG_DISCONTIGMEM.

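To make the "holes in the physical address space" point concrete, here
is a minimal userspace sketch that maps a page frame number to the node
owning it, given per-node ranges separated by large gaps. The ranges,
the struct, and example_pfn_to_node are hypothetical and only model the
idea; they do not reproduce the CONFIG_DISCONTIGMEM code.

```c
#include <assert.h>

/* One contiguous chunk of physical memory: pfns in [start_pfn, end_pfn). */
struct example_node_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Made-up layout with large holes between the nodes. */
static const struct example_node_range example_ranges[] = {
	{ 0x00000, 0x10000 },	/* node 0 */
	{ 0x40000, 0x50000 },	/* node 1 */
	{ 0x80000, 0x88000 },	/* node 2 */
};

#define EXAMPLE_NR_RANGES \
	((int)(sizeof(example_ranges) / sizeof(example_ranges[0])))

/* Return the node owning 'pfn', or -1 if 'pfn' falls into a hole. */
static int example_pfn_to_node(unsigned long pfn)
{
	for (int nid = 0; nid < EXAMPLE_NR_RANGES; nid++) {
		assert(example_ranges[nid].start_pfn < example_ranges[nid].end_pfn);
		if (pfn >= example_ranges[nid].start_pfn &&
		    pfn < example_ranges[nid].end_pfn)
			return nid;
	}
	return -1;
}
```

Only the populated ranges need per-page bookkeeping in such a scheme;
nothing has to be allocated for the holes.
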
The initial port includes NUMAizing the bootmem allocator code by
encapsulating all the pieces of information into a bootmem_data_t
structure. Node-specific calls have been added to the allocator.
In theory, any platform which uses the bootmem allocator should
be able to put the bootmem and mem_map data structures anywhere
it deems best.

Each node's page allocation data structures have also been encapsulated
into a pg_data_t. The bootmem_data_t is just one part of this. To
make the code look uniform between NUMA and regular UMA platforms,
UMA platforms have a statically allocated pg_data_t too (contig_page_data).
For the sake of uniformity, the function num_online_nodes() is also defined
for all platforms. As we run benchmarks, we might decide to NUMAize
more variables like low_on_memory, nr_free_pages, etc. into the pg_data_t.

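The uniformity argument can be sketched in a few lines of userspace C.
Everything below is a simplified model: the example_ prefixed types,
their fields, and the EXAMPLE_NUMA switch are placeholders invented for
this illustration and do not match the real pg_data_t or bootmem_data_t
layouts.

```c
#include <assert.h>

/* Placeholder for the per-node boot-time allocator state. */
typedef struct example_bootmem_data {
	unsigned long start_pfn;
	unsigned long low_pfn;
} example_bootmem_data_t;

/* Placeholder for the per-node page allocation state. */
typedef struct example_pg_data {
	int node_id;
	unsigned long node_size;		/* pages on this node */
	example_bootmem_data_t *bdata;		/* bootmem is one part of it */
} example_pg_data_t;

#ifdef EXAMPLE_NUMA
#define EXAMPLE_MAX_NODES 4
/* A NUMA port would fill these in per node during boot. */
static example_pg_data_t example_numa_nodes[EXAMPLE_MAX_NODES];
static int example_num_online_nodes(void) { return EXAMPLE_MAX_NODES; }
#else
/* UMA: one statically allocated node, in the spirit of contig_page_data. */
static example_bootmem_data_t example_contig_bootmem_data;
static example_pg_data_t example_contig_page_data = {
	.bdata = &example_contig_bootmem_data,
};
static int example_num_online_nodes(void) { return 1; }
#endif

/* Look up a node's data; callers do not care which variant is built. */
static example_pg_data_t *example_node_data(int nid)
{
	assert(nid >= 0 && nid < example_num_online_nodes());
#ifdef EXAMPLE_NUMA
	return &example_numa_nodes[nid];
#else
	return &example_contig_page_data;
#endif
}

/* Generic code: the same loop works on UMA and NUMA builds. */
static unsigned long example_total_pages(void)
{
	unsigned long total = 0;

	for (int nid = 0; nid < example_num_online_nodes(); nid++)
		total += example_node_data(nid)->node_size;
	return total;
}
```

Because the UMA build also exposes a node count and a per-node
structure, example_total_pages() needs no #ifdef of its own.
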
The NUMA-aware page allocation code currently tries to allocate pages
from different nodes in a round-robin manner.  This will be changed to
do a concentric circle search, starting from the current node, once the
NUMA port achieves more maturity. The call alloc_pages_node has been
added, so that drivers can make the call and not worry about whether
they are running on a NUMA or UMA platform.
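
The round-robin policy described above can be modeled roughly as
follows. This is a userspace sketch under stated assumptions: the free
page counters, example_alloc_page_node, and
example_alloc_page_round_robin are hypothetical stand-ins and do not
mirror the real alloc_pages_node interface.

```c
#include <assert.h>

#define EXAMPLE_NR_NODES 3	/* hypothetical machine size */

/* Made-up free page counts per node. */
static unsigned long example_free_pages[EXAMPLE_NR_NODES] = { 2, 0, 1 };

/* Node at which the next round-robin allocation starts. */
static int example_next_node;

/* Take one page from node 'nid'. Returns 0 on success, -1 if it is empty. */
static int example_alloc_page_node(int nid)
{
	assert(nid >= 0 && nid < EXAMPLE_NR_NODES);
	if (example_free_pages[nid] == 0)
		return -1;
	example_free_pages[nid]--;
	return 0;
}

/*
 * Round robin: try each node once, starting after the node tried last,
 * and wrap around. Returns the node that supplied the page, or -1 if
 * every node is empty.
 */
static int example_alloc_page_round_robin(void)
{
	for (int tries = 0; tries < EXAMPLE_NR_NODES; tries++) {
		int nid = example_next_node;

		example_next_node = (example_next_node + 1) % EXAMPLE_NR_NODES;
		if (example_alloc_page_node(nid) == 0)
			return nid;
	}
	return -1;
}
```

A caller that wants memory from a particular node would use the
node-specific helper directly, while a caller with no preference would
use the round-robin wrapper; on a single-node build both reduce to the
same thing.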