Started Nov 1999 by Kanoj Sarcar <kanoj@sgi.com>

The intent of this file is to have an up-to-date, running commentary
from different people about NUMA-specific code in the Linux VM.

What is NUMA? It is an architecture where the memory access times
for different regions of memory from a given processor vary
according to the "distance" of the memory region from the processor.
Each region of memory within which the access time from any given
CPU is the same is called a node. On such architectures, it is
beneficial if the kernel tries to minimize inter-node communication.
Schemes for this range from replicating kernel text and read-only
data across nodes to trying to house all the data structures that
key components of the kernel need in memory local to the node on
which those components run.

Currently, all the NUMA support exists to provide efficient handling
of widely discontiguous physical memory, so architectures that are
not NUMA but have huge holes in their physical address space can use
the same code. All of this code is bracketed by CONFIG_DISCONTIGMEM.

The initial port includes NUMAizing the bootmem allocator code by
encapsulating all of its state in a bootmem_data_t structure, one
per node. Node-specific calls have been added to the allocator.
In theory, any platform that uses the bootmem allocator should
be able to put the bootmem and mem_map data structures wherever
it deems best.

Each node's page allocation data structures have also been encapsulated
into a pg_data_t. The bootmem_data_t is just one part of this. To
make the code look uniform between NUMA and regular UMA platforms,
UMA platforms have a statically allocated pg_data_t too (contig_page_data).
For the sake of uniformity, the function num_online_nodes() is also defined
for all platforms. As we run benchmarks, we might decide to NUMAize
more variables, such as low_on_memory and nr_free_pages, into the pg_data_t.

The NUMA-aware page allocation code currently tries to allocate pages
from different nodes in a round-robin manner. This will be changed to
do a concentric circle search, starting from the current node, once the
NUMA port matures. The call alloc_pages_node() has been added so that
drivers can make the call without worrying about whether they are
running on a NUMA or UMA platform.
