Split page table lock
=====================
					
						
Originally, the mm->page_table_lock spinlock protected all page tables of
the mm_struct. But this approach leads to poor page fault scalability of
multi-threaded applications due to high contention on the lock. To improve
scalability, the split page table lock was introduced.
					
						
With the split page table lock, each page table has its own lock to
serialize access to that table. At the moment we use the split lock for PTE
and PMD tables. Access to higher-level tables is protected by
mm->page_table_lock.
					
						
There are helpers to lock/unlock a table and other accessor functions:
 - pte_offset_map_lock()
	maps the PTE and takes the PTE table lock; returns a pointer to the
	mapped PTE and passes back a pointer to the taken lock through its
	last argument;
 - pte_unmap_unlock()
	unlocks and unmaps the PTE table;
 - pte_alloc_map_lock()
	allocates the PTE table if needed and takes the lock; returns a
	pointer to the mapped PTE (with the taken lock passed back through
	its last argument) or NULL if allocation failed;
 - pte_lockptr()
	returns a pointer to the PTE table lock;
 - pmd_lock()
	takes the PMD table lock, returns a pointer to the taken lock;
 - pmd_lockptr()
	returns a pointer to the PMD table lock;
					
						
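The calling convention shared by the lock helpers above (pick the right
lock, take it, hand a pointer to it back to the caller, and let the caller
unlock through that pointer) can be sketched in userspace. This is a minimal
model, not kernel code: toy_spinlock_t, struct toy_mm, struct toy_table,
toy_pmd_lockptr() and toy_pmd_lock() are hypothetical stand-ins, and the
split_enabled flag stands in for a decision the kernel makes at compile time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock built on a C11 atomic flag; a stand-in for spinlock_t. */
typedef struct { atomic_flag held; } toy_spinlock_t;

static inline void toy_spin_lock(toy_spinlock_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin until the current holder clears the flag */
}

/* Returns true if the lock was free and is now held by the caller. */
static inline bool toy_spin_trylock(toy_spinlock_t *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static inline void toy_spin_unlock(toy_spinlock_t *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical model of mm_struct: one lock for the whole address space. */
struct toy_mm {
	toy_spinlock_t page_table_lock;
	bool split_enabled;	/* assumption: runtime flag instead of Kconfig */
};

/* Hypothetical model of one page table with its own lock. */
struct toy_table {
	toy_spinlock_t ptl;
};

/* Modelled on pmd_lockptr(): select the lock that guards this table. */
static inline toy_spinlock_t *toy_pmd_lockptr(struct toy_mm *mm,
					      struct toy_table *table)
{
	return mm->split_enabled ? &table->ptl : &mm->page_table_lock;
}

/* Modelled on pmd_lock(): take the lock and return a pointer to it. */
static inline toy_spinlock_t *toy_pmd_lock(struct toy_mm *mm,
					   struct toy_table *table)
{
	toy_spinlock_t *ptl = toy_pmd_lockptr(mm, table);

	toy_spin_lock(ptl);
	return ptl;
}
```

In this model, two threads working on different tables contend only when
split_enabled is false, because only then do both tables resolve to the same
lock.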
					
						
Split page table lock for PTE tables is enabled at compile time if
CONFIG_SPLIT_PTLOCK_CPUS (usually 4) is less than or equal to NR_CPUS.
If the split lock is disabled, all tables are guarded by
mm->page_table_lock.
					
						
Split page table lock for PMD tables is enabled if it is enabled for PTE
tables and the architecture supports it (see below).
					
						
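The two enabling conditions amount to a pair of boolean expressions. The
sketch below is a userspace illustration only: the kernel evaluates these
conditions at compile time, whereas the functions here (hypothetical names)
take the configuration values as arguments so the logic can be exercised
directly.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * PTE split lock: enabled when CONFIG_SPLIT_PTLOCK_CPUS <= NR_CPUS.
 * Both values are passed in as plain integers for illustration.
 */
static inline bool toy_split_pte_ptlocks(int nr_cpus, int split_ptlock_cpus)
{
	return split_ptlock_cpus <= nr_cpus;
}

/*
 * PMD split lock: requires the PTE split lock and, in addition, that the
 * architecture has opted in (CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK).
 */
static inline bool toy_split_pmd_ptlocks(int nr_cpus, int split_ptlock_cpus,
					 bool arch_enable_split_pmd)
{
	return toy_split_pte_ptlocks(nr_cpus, split_ptlock_cpus) &&
	       arch_enable_split_pmd;
}
```

For example, with the usual threshold of 4, a kernel built for 2 CPUs gets
neither split lock, regardless of what the architecture supports.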
					
						
Hugetlb and split page table lock
---------------------------------
					
						
Hugetlb can support several page sizes. We use the split lock only for the
PMD level, not for PUD.
					
						
Hugetlb-specific helpers:
 - huge_pte_lock()
	takes the PMD split lock for a PMD_SIZE page, mm->page_table_lock
	otherwise;
 - huge_pte_lockptr()
	returns a pointer to the table lock;
					
						
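The lock selection described for the hugetlb helpers can be sketched as a
small userspace model. All names below are hypothetical, the page sizes are
assumed values (2 MiB for PMD_SIZE and 1 GiB for the next level, as on
x86-64 with 4 KiB base pages), and the model assumes the PMD split lock is
enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for illustration; real values are architecture-specific. */
#define TOY_PMD_SIZE (1UL << 21)
#define TOY_PUD_SIZE (1UL << 30)

/* Placeholder lock type; only its address matters in this model. */
typedef struct { int locked; } toy_spinlock_t;

struct toy_mm {
	toy_spinlock_t page_table_lock;	/* mm-wide lock */
};

struct toy_pmd_table {
	toy_spinlock_t ptl;		/* per-table split lock */
};

/*
 * Modelled on huge_pte_lockptr(): PMD-sized huge pages use the split lock
 * of the PMD table, every other huge page size falls back to the mm-wide
 * lock.
 */
static inline toy_spinlock_t *
toy_huge_pte_lockptr(unsigned long huge_page_size, struct toy_mm *mm,
		     struct toy_pmd_table *pmd_table)
{
	if (huge_page_size == TOY_PMD_SIZE)
		return &pmd_table->ptl;
	return &mm->page_table_lock;
}
```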
					
						
Support of split page table lock by an architecture
---------------------------------------------------
					
						
No special enabling is needed for the PTE split page table lock:
everything required is done by pgtable_page_ctor() and pgtable_page_dtor(),
which must be called on PTE table allocation / freeing.
					
						
Make sure the architecture doesn't use the slab allocator for page table
allocation: slab uses page->slab_cache and page->first_page for its pages.
These fields share storage with page->ptl.
					
						
The PMD split lock only makes sense if you have more than two page table
levels.
					
						
Enabling the PMD split lock requires a pgtable_pmd_page_ctor() call on PMD
table allocation and a pgtable_pmd_page_dtor() call on freeing.
					
						
Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
paths: e.g. X86_PAE preallocates a few PMDs in pgd_alloc().
					
						
With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.
					
						
NOTE: pgtable_page_ctor() and pgtable_pmd_page_ctor() can fail -- this must
be handled properly.
					
						
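What "handled properly" means for an allocation path can be sketched in
userspace: if the constructor fails, the freshly allocated page is released
and the failure is reported to the caller. This is only a model under stated
assumptions; struct toy_page, toy_pgtable_ctor(), toy_pgtable_dtor(),
toy_pte_alloc_one() and toy_pte_free() are hypothetical names, and the
simulate_failure argument exists purely to exercise the error path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the page backing a page table. */
struct toy_page {
	void *ptl;	/* stand-in for a dynamically allocated lock */
};

/* Model of a constructor that fails when the lock cannot be allocated. */
static inline bool toy_pgtable_ctor(struct toy_page *page,
				    bool simulate_failure)
{
	page->ptl = simulate_failure ? NULL : malloc(sizeof(long));
	return page->ptl != NULL;
}

/* Model of the matching destructor: release what the ctor set up. */
static inline void toy_pgtable_dtor(struct toy_page *page)
{
	free(page->ptl);
	page->ptl = NULL;
}

/* Allocation path: undo the page allocation if the ctor fails. */
static inline struct toy_page *toy_pte_alloc_one(bool simulate_failure)
{
	struct toy_page *page = calloc(1, sizeof(*page));

	if (!page)
		return NULL;
	if (!toy_pgtable_ctor(page, simulate_failure)) {
		free(page);	/* do not leak the page on ctor failure */
		return NULL;
	}
	return page;
}

/* Freeing path: the dtor runs before the page itself is released. */
static inline void toy_pte_free(struct toy_page *page)
{
	toy_pgtable_dtor(page);
	free(page);
}
```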
					
						
page->ptl
---------
					
						
page->ptl is used to access the split page table lock, where 'page' is the
struct page of the page containing the table. It shares storage with
page->private (and a few other fields in the union).
					
						
To avoid increasing the size of struct page and to get the best
performance, we use a trick:
 - if spinlock_t fits into long, we use page->ptl as the spinlock itself, so
   we can avoid indirect access and save a cache line;
 - if the size of spinlock_t is bigger than the size of long, we use
   page->ptl as a pointer to spinlock_t and allocate it dynamically. This
   allows the split lock to be used with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC
   enabled, but costs one more cache line for indirect access.
					
						
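The two cases of the trick can be illustrated with a userspace model. The
types below are hypothetical stand-ins: toy_small_lock_t plays the role of a
spinlock_t that fits into a long, toy_debug_lock_t plays the role of one
that has grown debug fields, and the two page structures show the embedded
and the indirect layout side by side. The static assertions check the point
of the trick, namely that neither layout makes the page structure larger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for a spinlock_t without debugging: fits into a long. */
typedef struct { int raw; } toy_small_lock_t;

/* Stand-in for a spinlock_t with debug state: larger than a long. */
typedef struct {
	int raw;
	unsigned int magic;
	void *owner;
	long dep_map[4];
} toy_debug_lock_t;

/* Case 1: the lock itself is embedded, sharing storage with 'private'. */
struct toy_page_inline {
	union {
		unsigned long private;
		toy_small_lock_t ptl;
	};
};

/* Case 2: only a pointer is embedded; the lock is allocated separately. */
struct toy_page_indirect {
	union {
		unsigned long private;
		toy_debug_lock_t *ptl;
	};
};

_Static_assert(sizeof(toy_small_lock_t) <= sizeof(long),
	       "small lock fits into long");
_Static_assert(sizeof(toy_debug_lock_t) > sizeof(long),
	       "debug lock does not fit into long");
_Static_assert(sizeof(struct toy_page_inline) == sizeof(unsigned long),
	       "embedding the lock does not grow the page structure");
_Static_assert(sizeof(struct toy_page_indirect) == sizeof(void *),
	       "embedding a pointer does not grow the page structure");

/* Case 1 accessor: direct access, no extra cache line touched. */
static inline toy_small_lock_t *
toy_inline_ptlock_ptr(struct toy_page_inline *page)
{
	return &page->ptl;
}

/* Case 2 setup: dynamic allocation, which can fail. */
static inline bool toy_indirect_ptlock_alloc(struct toy_page_indirect *page)
{
	page->ptl = calloc(1, sizeof(*page->ptl));
	return page->ptl != NULL;
}

/* Case 2 accessor: one extra pointer dereference to reach the lock. */
static inline toy_debug_lock_t *
toy_indirect_ptlock_ptr(struct toy_page_indirect *page)
{
	return page->ptl;
}

static inline void toy_indirect_ptlock_free(struct toy_page_indirect *page)
{
	free(page->ptl);
	page->ptl = NULL;
}
```

Hiding both layouts behind accessor functions is what lets callers stay
unaware of which case applies.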
					
						
The spinlock_t is allocated in pgtable_page_ctor() for PTE tables and in
pgtable_pmd_page_ctor() for PMD tables.
					
						
Please never access page->ptl directly -- use the appropriate helper.