| 
									
										
										
										
											2005-10-29 18:16:52 -07:00
										 |  |  | #ifndef __LINUX_MEMORY_HOTPLUG_H
 | 
					
						
							|  |  |  | #define __LINUX_MEMORY_HOTPLUG_H
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | #include <linux/mmzone.h>
 | 
					
						
							|  |  |  | #include <linux/spinlock.h>
 | 
					
						
							| 
									
										
										
										
											2005-10-29 18:16:54 -07:00
										 |  |  | #include <linux/notifier.h>
 | 
					
						
							| 
									
										
										
										
											2011-11-23 20:12:59 -05:00
										 |  |  | #include <linux/bug.h>
 | 
					
						
							| 
									
										
										
										
											2005-10-29 18:16:52 -07:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2006-03-06 15:42:49 -08:00
										 |  |  | struct page; | 
					
						
							|  |  |  | struct zone; | 
					
						
							|  |  |  | struct pglist_data; | 
					
						
							| 
									
										
										
										
											2008-04-28 02:12:01 -07:00
										 |  |  | struct mem_section; | 
					
						
							| 
									
										
										
										
											2012-10-08 16:34:01 -07:00
										 |  |  | struct memory_block; | 
					
						
							| 
									
										
										
										
											2006-03-06 15:42:49 -08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-10-29 18:16:52 -07:00
										 |  |  | #ifdef CONFIG_MEMORY_HOTPLUG
 | 
					
						
							| 
									
										
											  
											
												memory hotplug: register section/node id to free
This patch set is to free pages which is allocated by bootmem for
memory-hotremove.  Some structures of memory management are allocated by
bootmem.  ex) memmap, etc.
To remove memory physically, some of them must be freed according to
circumstance.  This patch set makes basis to free those pages, and free
memmaps.
Basic my idea is using remain members of struct page to remember information
of users of bootmem (section number or node id).  When the section is
removing, kernel can confirm it.  By this information, some issues can be
solved.
  1) When the memmap of removing section is allocated on other
     section by bootmem, it should/can be free.
  2) When the memmap of removing section is allocated on the
     same section, it shouldn't be freed. Because the section has to be
     logical memory offlined already and all pages must be isolated against
     page allocator. If it is freed, page allocator may use it which will
     be removed physically soon.
  3) When removing section has other section's memmap,
     kernel will be able to show easily which section should be removed
     before it for user. (Not implemented yet)
  4) When the above case 2), the page isolation will be able to check and skip
     memmap's page when logical memory offline (offline_pages()).
     Current page isolation code fails in this case because this page is
     just reserved page and it can't distinguish this pages can be
     removed or not. But, it will be able to do by this patch.
     (Not implemented yet.)
  5) The node information like pgdat has similar issues. But, this
     will be able to be solved too by this.
     (Not implemented yet, but, remembering node id in the pages.)
Fortunately, current bootmem allocator just keeps PageReserved flags,
and doesn't use any other members of page struct. The users of
bootmem doesn't use them too.
This patch:
This is to register information which is node or section's id.  Kernel can
distinguish which node/section uses the pages allocated by bootmem.  This is
basis for hot-remove sections or nodes.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											
										 
											2008-04-28 02:13:31 -07:00
										 |  |  | 
 | 
					
						
							|  |  |  | /*
 | 
					
						
							| 
									
										
										
										
											2011-01-13 15:47:00 -08:00
										 |  |  |  * Types for free bootmem stored in page->lru.next. These have to be in | 
					
						
							|  |  |  |  * some random range in unsigned long space for debugging purposes. | 
					
						
							| 
									
										
											  
											
											
										 
											2008-04-28 02:13:31 -07:00
										 |  |  |  */ | 
					
						
							| 
									
										
										
										
											2011-01-13 15:47:00 -08:00
										 |  |  | enum { | 
					
						
							|  |  |  | 	MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE = 12, | 
					
						
							|  |  |  | 	SECTION_INFO = MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE, | 
					
						
							|  |  |  | 	MIX_SECTION_INFO, | 
					
						
							|  |  |  | 	NODE_INFO, | 
					
						
							|  |  |  | 	MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE = NODE_INFO, | 
					
						
							|  |  |  | }; | 
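The MIN/MAX enumerators above bracket the valid bootmem type values, so a stored value can be range-checked before it is trusted. A minimal sketch of such a check follows; the helper name `is_hotplug_bootmem_type` is hypothetical and does not appear in this header.

```c
#include <stdbool.h>

/* Copy of the enum from this header so the sketch compiles on its own. */
enum {
	MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE = 12,
	SECTION_INFO = MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE,
	MIX_SECTION_INFO,
	NODE_INFO,
	MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE = NODE_INFO,
};

/*
 * Hypothetical helper (an assumption, not part of the header): returns true
 * if the value lies in the closed range [MIN_BOOTMEM_TYPE, MAX_BOOTMEM_TYPE].
 */
static bool is_hotplug_bootmem_type(unsigned long type)
{
	return type >= MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE &&
	       type <= MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE;
}
```

Because the range starts at 12 rather than 0, a zeroed or otherwise uninitialized field fails the check, which matches the "for debugging purposes" remark in the comment above the enum.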
					
						
							| 
									
										
											  
											
											
										 
											2008-04-28 02:13:31 -07:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2014-08-06 16:05:13 -07:00
										 |  |  | /* Types for controlling the zone type of onlined and offlined memory */ | 
					
						
							| 
									
										
											  
											
												mm, memory-hotplug: dynamic configure movable memory and portion memory
Add online_movable and online_kernel for logic memory hotplug.  This is
the dynamic version of "movablecore" & "kernelcore".
We have the same reason to introduce it as to introduce "movablecore" &
"kernelcore".  It has the same motive as "movablecore" & "kernelcore", but
it is dynamic/running-time:
o We can configure memory as kernelcore or movablecore after boot.
  When userspace workload increases and we need more hugepages, we can use
  "online_movable" to add memory and allow the system to use more
  THP (transparent huge pages), and vice versa when kernel workload increases.
  Also help for virtualization to dynamic configure host/guest's memory,
  to save/(reduce waste) memory.
  Memory capacity on Demand
o When a new node is physically online after boot, we need to use
  "online_movable" or "online_kernel" to configure/portion it as we
  expected when we logic-online it.
  This configuration also helps for physically-memory-migrate.
o all benefit as the same as existed "movablecore" & "kernelcore".
o Preparing for movable-node, which is very important for power-saving,
  hardware partitioning and high-available-system(hardware fault
  management).
(Note, we don't introduce movable-node here.)
Action behavior:
When a memoryblock/memorysection is onlined by "online_movable", the kernel
will not have direct references to the pages of the memoryblock,
thus we can remove that memory any time when needed.
When it is onlined by "online_kernel", the kernel can use it.
When it is onlined by "online", the zone type is not changed.
Current constraints:
Only the memoryblock which is adjacent to the ZONE_MOVABLE
can be online from ZONE_NORMAL to ZONE_MOVABLE.
[akpm@linux-foundation.org: use min_t, cleanups]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
											
										 
											2012-12-11 16:03:16 -08:00
										 |  |  | enum { | 
					
						
							| 
									
										
										
										
											2014-08-06 16:05:13 -07:00
										 |  |  | 	MMOP_OFFLINE = -1, | 
					
						
							|  |  |  | 	MMOP_ONLINE_KEEP, | 
					
						
							|  |  |  | 	MMOP_ONLINE_KERNEL, | 
					
						
							|  |  |  | 	MMOP_ONLINE_MOVABLE, | 
					
						
							| 
									
										
											  
											
											
										 
											2012-12-11 16:03:16 -08:00
										 |  |  | }; | 
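The commit message for this enum names three state strings: "online", "online_kernel" and "online_movable". A minimal sketch of how such strings could be mapped onto the MMOP_* values follows. The helper `parse_online_type` is hypothetical, and the "offline" string for MMOP_OFFLINE is an assumption made for illustration.

```c
#include <string.h>

/* Copy of the enum from this header so the sketch compiles on its own. */
enum {
	MMOP_OFFLINE = -1,
	MMOP_ONLINE_KEEP,
	MMOP_ONLINE_KERNEL,
	MMOP_ONLINE_MOVABLE,
};

/*
 * Hypothetical helper (not part of the header): translate a state string
 * into an MMOP_* value. Returns 0 and stores the value in *out on success,
 * or -1 if the string is not recognised (*out is left untouched).
 */
static int parse_online_type(const char *s, int *out)
{
	if (strcmp(s, "online_kernel") == 0)
		*out = MMOP_ONLINE_KERNEL;
	else if (strcmp(s, "online_movable") == 0)
		*out = MMOP_ONLINE_MOVABLE;
	else if (strcmp(s, "online") == 0)
		*out = MMOP_ONLINE_KEEP;	/* zone type is not changed */
	else if (strcmp(s, "offline") == 0)
		*out = MMOP_OFFLINE;		/* assumed string, see lead-in */
	else
		return -1;
	return 0;
}
```

The sketch reports success through the return code and the value through `*out` because MMOP_OFFLINE is itself -1, so a negative return value could not double as the error signal.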
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-10-29 18:16:52 -07:00
										 |  |  | /*
 | 
					
						
							|  |  |  |  * pgdat resizing functions | 
					
						
							|  |  |  |  */ | 
					
						
							|  |  |  | static inline | 
					
						
							|  |  |  | void pgdat_resize_lock(struct pglist_data *pgdat, unsigned long *flags) | 
					
						
							|  |  |  | { | 
					
						
							|  |  |  | 	spin_lock_irqsave(&pgdat->node_size_lock, *flags); | 
					
						
							|  |  |  | } | 
					
						
							|  |  |  | static inline | 
					
						
							|  |  |  | void pgdat_resize_unlock(struct pglist_data *pgdat, unsigned long *flags) | 
					
						
							|  |  |  | { | 
					
						
							| 
									
										
										
										
											2005-10-29 18:16:53 -07:00
										 |  |  | 	spin_unlock_irqrestore(&pgdat->node_size_lock, *flags); | 
					
						
							| 
									
										
										
										
											2005-10-29 18:16:52 -07:00
										 |  |  | } | 
					
						
							|  |  |  | static inline | 
					
						
							|  |  |  | void pgdat_resize_init(struct pglist_data *pgdat) | 
					
						
							|  |  |  | { | 
					
						
							|  |  |  | 	spin_lock_init(&pgdat->node_size_lock); | 
					
						
							|  |  |  | } | 
					
						
							| 
									
										
										
										
											2005-10-29 18:16:53 -07:00
										 |  |  | /*
 | 
					
						
							|  |  |  |  * Zone resizing functions | 
					
						
							| 
									
										
											  
											
											
										 
											2012-12-11 16:03:16 -08:00
										 |  |  |  * | 
					
						
							|  |  |  |  * Note: any attempt to resize a zone should hold both pgdat_resize_lock() | 
					
						
							|  |  |  |  * and zone_span_writelock(). This ensures the size of a zone | 
					
						
							|  |  |  |  * can't be changed while pgdat_resize_lock() is held. | 
					
						
							| 
									
										
										
										
											2005-10-29 18:16:53 -07:00
										 |  |  |  */ | 
					
						
							|  |  |  | static inline unsigned zone_span_seqbegin(struct zone *zone) | 
					
						
							|  |  |  | { | 
					
						
							|  |  |  | 	return read_seqbegin(&zone->span_seqlock); | 
					
						
							|  |  |  | } | 
					
						
							|  |  |  | static inline int zone_span_seqretry(struct zone *zone, unsigned iv) | 
					
						
							|  |  |  | { | 
					
						
							|  |  |  | 	return read_seqretry(&zone->span_seqlock, iv); | 
					
						
							|  |  |  | } | 
					
						
							|  |  |  | static inline void zone_span_writelock(struct zone *zone) | 
					
						
							|  |  |  | { | 
					
						
							|  |  |  | 	write_seqlock(&zone->span_seqlock); | 
					
						
							|  |  |  | } | 
					
						
							|  |  |  | static inline void zone_span_writeunlock(struct zone *zone) | 
					
						
							|  |  |  | { | 
					
						
							|  |  |  | 	write_sequnlock(&zone->span_seqlock); | 
					
						
							|  |  |  | } | 
					
						
							|  |  |  | static inline void zone_seqlock_init(struct zone *zone) | 
					
						
							|  |  |  | { | 
					
						
							|  |  |  | 	seqlock_init(&zone->span_seqlock); | 
					
						
							|  |  |  | } | 
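The zone_span_seqbegin()/zone_span_seqretry() pair above is meant to be used in a read-retry loop: sample the sequence, read the span fields, and retry if a writer intervened. A minimal userspace sketch of that pattern follows. Every `toy_*` name is hypothetical, the counter is a simplified stand-in for the kernel's seqlock_t, and writer-versus-writer exclusion (which the real seqlock provides with a spinlock) is deliberately left out.

```c
#include <stdatomic.h>

/* Toy stand-in for seqlock_t: even = stable, odd = write in progress. */
struct toy_zone {
	atomic_uint span_seq;
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

static void toy_zone_init(struct toy_zone *z, unsigned long start,
			  unsigned long pages)
{
	atomic_init(&z->span_seq, 0);
	z->zone_start_pfn = start;
	z->spanned_pages = pages;
}

/* Reader side: wait for an even (stable) sequence and return it. */
static unsigned toy_span_seqbegin(struct toy_zone *z)
{
	unsigned s;

	while ((s = atomic_load_explicit(&z->span_seq,
					 memory_order_acquire)) & 1)
		;	/* a writer is active, spin */
	return s;
}

/* Reader side: nonzero if the sequence changed since toy_span_seqbegin(). */
static int toy_span_seqretry(struct toy_zone *z, unsigned iv)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&z->span_seq, memory_order_relaxed) != iv;
}

/* Writer side: make the sequence odd before, and even after, an update. */
static void toy_span_writelock(struct toy_zone *z)
{
	atomic_fetch_add_explicit(&z->span_seq, 1, memory_order_acq_rel);
}

static void toy_span_writeunlock(struct toy_zone *z)
{
	atomic_fetch_add_explicit(&z->span_seq, 1, memory_order_release);
}

/* Read-retry loop: repeat until a consistent (start, size) pair was read. */
static int toy_pfn_in_zone(struct toy_zone *z, unsigned long pfn)
{
	unsigned seq;
	int ret;

	do {
		seq = toy_span_seqbegin(z);
		ret = pfn >= z->zone_start_pfn &&
		      pfn < z->zone_start_pfn + z->spanned_pages;
	} while (toy_span_seqretry(z, seq));
	return ret;
}
```

Readers never block a writer in this scheme; they only pay for a retry when a resize actually overlaps the read, which suits data that is read often and resized rarely.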
					
						
							| 
									
										
										
										
											2005-10-29 18:16:54 -07:00
										 |  |  | extern int zone_grow_free_lists(struct zone *zone, unsigned long new_nr_pages); | 
					
						
							|  |  |  | extern int zone_grow_waitqueues(struct zone *zone, unsigned long nr_pages); | 
					
						
							|  |  |  | extern int add_one_highpage(struct page *page, int pfn, int bad_ppro); | 
					
						
							|  |  |  | /* VM interface that may be used by firmware interface */ | 
					
						
							| 
									
										
											  
											
											
										 
											2012-12-11 16:03:16 -08:00
										 |  |  | extern int online_pages(unsigned long, unsigned long, int); | 
					
						
							| 
									
										
										
										
											2014-10-09 15:26:31 -07:00
										 |  |  | extern int test_pages_in_a_zone(unsigned long, unsigned long); | 
					
						
							| 
									
										
										
										
											2007-10-16 01:26:12 -07:00
										 |  |  | extern void __offline_isolated_pages(unsigned long, unsigned long); | 
					
						
							| 
									
										
										
										
											2007-10-16 01:26:14 -07:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2011-07-25 17:12:05 -07:00
										 |  |  | typedef void (*online_page_callback_t)(struct page *page); | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | extern int set_online_page_callback(online_page_callback_t callback); | 
					
						
							|  |  |  | extern int restore_online_page_callback(online_page_callback_t callback); | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | extern void __online_page_set_limits(struct page *page); | 
					
						
							|  |  |  | extern void __online_page_increment_counters(struct page *page); | 
					
						
							|  |  |  | extern void __online_page_free(struct page *page); | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2013-11-12 15:07:25 -08:00
										 |  |  | extern int try_online_node(int nid); | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2010-10-26 14:21:30 -07:00
										 |  |  | #ifdef CONFIG_MEMORY_HOTREMOVE
 | 
					
						
							|  |  |  | extern bool is_pageblock_removable_nolock(struct page *page); | 
					
						
							| 
									
										
										
										
											2013-02-22 16:32:58 -08:00
										 |  |  | extern int arch_remove_memory(u64 start, u64 size); | 
					
						
							| 
									
										
										
										
											2013-04-29 15:08:22 -07:00
										 |  |  | extern int __remove_pages(struct zone *zone, unsigned long start_pfn, | 
					
						
							|  |  |  | 	unsigned long nr_pages); | 
					
						
							| 
									
										
										
										
											2010-10-26 14:21:30 -07:00
										 |  |  | #endif /* CONFIG_MEMORY_HOTREMOVE */
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-10-29 18:16:54 -07:00
										 |  |  | /* reasonably generic interface to expand the physical pages in a zone  */ | 
					
						
							| 
									
										
										
										
											2009-01-06 14:39:14 -08:00
										 |  |  | extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn, | 
					
						
							| 
									
										
										
										
											2005-10-29 18:16:54 -07:00
										 |  |  | 	unsigned long nr_pages); | 
					
						
							| 
									
										
										
										
											2006-06-27 02:53:30 -07:00
										 |  |  | 
 | 
					
						
							|  |  |  | #ifdef CONFIG_NUMA
 | 
					
						
							|  |  |  | extern int memory_add_physaddr_to_nid(u64 start); | 
					
						
							|  |  |  | #else
 | 
					
						
							|  |  |  | static inline int memory_add_physaddr_to_nid(u64 start) | 
					
						
							|  |  |  | { | 
					
						
							|  |  |  | 	return 0; | 
					
						
							|  |  |  | } | 
					
						
							|  |  |  | #endif
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2006-06-27 02:53:32 -07:00
										 |  |  | #ifdef CONFIG_HAVE_ARCH_NODEDATA_EXTENSION
 | 
					
						
							|  |  |  | /*
 | 
					
						
							|  |  |  |  * For supporting node-hotadd, we have to allocate a new pgdat. | 
					
						
							|  |  |  |  * | 
					
						
							|  |  |  |  * If an arch has generic style NODE_DATA(), | 
					
						
							|  |  |  |  * node_data[nid] = kzalloc() works well. But it depends on the architecture. | 
					
						
							|  |  |  |  * | 
					
						
							|  |  |  |  * In general, generic_alloc_nodedata() is used. | 
					
						
							|  |  |  |  * Now, arch_free_nodedata() is just defined for error path of node_hot_add. | 
					
						
							|  |  |  |  * | 
					
						
							|  |  |  |  */ | 
					
						
							| 
									
										
										
										
											2006-06-27 02:53:40 -07:00
										 |  |  | extern pg_data_t *arch_alloc_nodedata(int nid); | 
					
						
							|  |  |  | extern void arch_free_nodedata(pg_data_t *pgdat); | 
					
						
							| 
									
										
											  
											
												[PATCH] pgdat allocation and update for ia64 of memory hotplug: update pgdat address array
This is to refresh node_data[] array for ia64.  As I mentioned previous
patches, ia64 has copies of information of pgdat address array on each node as
per node data.
At v2 of node_add, this function used stop_machine_run() to update them.  (I
wished that they were copied as safely as possible.) But, in this patch,
these arrays are just copied simply, and the node_online_map bit is set after
completion of pgdat initialization.
So, kernel must touch NODE_DATA() macro after checking node_online_map().
(Current code has already done it.) This is more simple way for just
hot-add.....
Note : It will be problem when hot-remove will occur,
       because, even if online_map bit is set, kernel may
       touch NODE_DATA() due to race condition. :-(
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
											
										 
											2006-06-27 02:53:39 -07:00
										 |  |  | extern void arch_refresh_nodedata(int nid, pg_data_t *pgdat); | 
					
						
							| 
									
										
										
										
											2006-06-27 02:53:32 -07:00
										 |  |  | 
 | 
					
						
#else /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */

#define arch_alloc_nodedata(nid)	generic_alloc_nodedata(nid)
#define arch_free_nodedata(pgdat)	generic_free_nodedata(pgdat)

#ifdef CONFIG_NUMA
/*
 * If ARCH_HAS_NODEDATA_EXTENSION=n, this func is used to allocate pgdat.
 * XXX: kmalloc_node() can't work well to get new node's memory at this time.
 *	Because, pgdat for the new node is not allocated/initialized yet itself.
 *	To use new node's memory, more consideration will be necessary.
 */
#define generic_alloc_nodedata(nid)				\
({								\
	kzalloc(sizeof(pg_data_t), GFP_KERNEL);			\
})
/*
 * This definition is just for error path in node hotadd.
 * For node hotremove, we have to replace this.
 */
#define generic_free_nodedata(pgdat)	kfree(pgdat)
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2006-06-27 02:53:33 -07:00
extern pg_data_t *node_data[];
static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
{
	node_data[nid] = pgdat;
}
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2006-06-27 02:53:32 -07:00
#else /* !CONFIG_NUMA */

/* never called */
static inline pg_data_t *generic_alloc_nodedata(int nid)
{
	BUG();
	return NULL;
}
static inline void generic_free_nodedata(pg_data_t *pgdat)
{
}
					
						
							| 
									
										
										
										
											2006-06-27 02:53:33 -07:00
static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
{
}
					
						
							| 
									
										
										
										
											2006-06-27 02:53:32 -07:00
										 |  |  | #endif /* CONFIG_NUMA */
 | 
					
						
							|  |  |  | #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2013-02-22 16:33:00 -08:00
										 |  |  | #ifdef CONFIG_HAVE_BOOTMEM_INFO_NODE
 | 
					
						
							|  |  |  | extern void register_page_bootmem_info_node(struct pglist_data *pgdat); | 
					
						
							|  |  |  | #else
 | 
					
						
							| 
									
										
											  
											
											
										 
											2008-04-28 02:13:31 -07:00
static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
{
}
					
						
							|  |  |  | #endif
 | 
					
						
							| 
									
										
										
										
											2013-02-22 16:33:00 -08:00
extern void put_page_bootmem(struct page *page);
extern void get_page_bootmem(unsigned long info, struct page *page,
			     unsigned long type);
					
						
							| 
									
										
											  
											
											
										 
											2008-04-28 02:13:31 -07:00
										 |  |  | 
 | 
					
						
							| 
									
										
											  
											
												mem-hotplug: implement get/put_online_mems
kmem_cache_{create,destroy,shrink} need to get a stable value of
cpu/node online mask, because they init/destroy/access per-cpu/node
kmem_cache parts, which can be allocated or destroyed on cpu/mem
hotplug.  To protect against cpu hotplug, these functions use
{get,put}_online_cpus.  However, they do nothing to synchronize with
memory hotplug - taking the slab_mutex does not eliminate the
possibility of race as described in patch 2.
What we need there is something like get_online_cpus, but for memory.
We already have lock_memory_hotplug, which serves the purpose, but
it's a bit of a hammer right now, because it's backed by a mutex.  As a
result, it imposes some limitations to locking order, which are not
desirable, and can't be used just like get_online_cpus.  That's why in
patch 1 I substitute it with get/put_online_mems, which work exactly
like get/put_online_cpus except they block not cpu, but memory hotplug.
[ v1 can be found at https://lkml.org/lkml/2014/4/6/68.  I NAK'ed it
  myself, because it used an rw semaphore for get/put_online_mems,
  making them deadlock prone.  ]
This patch (of 2):
{un}lock_memory_hotplug, which is used to synchronize against memory
hotplug, is currently backed by a mutex, which makes it a bit of a
hammer - threads that only want to get a stable value of online nodes
mask won't be able to proceed concurrently.  Also, it imposes some
strong locking ordering rules on it, which narrows down the set of its
usage scenarios.
This patch introduces get/put_online_mems, which are the same as
get/put_online_cpus, but for memory hotplug, i.e.  executing code
inside a get/put_online_mems section guarantees a stable value of
online nodes, present pages, etc.
lock_memory_hotplug()/unlock_memory_hotplug() are removed altogether.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
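The read-side guarantee described in the message above can be illustrated with a small userspace model. This is a sketch under loud assumptions, not the kernel implementation: it is single-threaded, `mem_hotplug_readers`, `nr_online_nodes_model`, `try_hotadd_node_model()` and `stable_node_count()` are hypothetical names, and the modelled writer simply refuses to run while a reader is active, whereas a real hotplug operation would wait for readers to leave.

```c
#include <assert.h>

/* Hypothetical model state: active readers and the value they want stable. */
static int mem_hotplug_readers;
static int nr_online_nodes_model = 1;

/* Enter a read-side section: hotplug may not change state until we leave. */
static void get_online_mems(void)
{
	mem_hotplug_readers++;
}

/* Leave the read-side section; calls must be balanced with get_online_mems(). */
static void put_online_mems(void)
{
	assert(mem_hotplug_readers > 0);
	mem_hotplug_readers--;
}

/* Modelled writer: refuses to change state while any reader is active. */
static int try_hotadd_node_model(void)
{
	if (mem_hotplug_readers > 0)
		return -1;	/* a real writer would block here instead */
	nr_online_nodes_model++;
	return 0;
}

/* Reader: the node count cannot change between the two reads. */
static int stable_node_count(void)
{
	int first, second;

	get_online_mems();
	first = nr_online_nodes_model;
	(void)try_hotadd_node_model();	/* refused: a reader is active */
	second = nr_online_nodes_model;
	put_online_mems();

	assert(first == second);
	return first;
}
```

Several readers can be inside the section at once because they only bump a counter, which is the concurrency the message says the old mutex-backed lock_memory_hotplug() did not allow.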
											
										 
											2014-06-04 16:07:18 -07:00
										 |  |  | void get_online_mems(void); | 
					
						
							|  |  |  | void put_online_mems(void); | 
					
						
							| 
									
										
										
										
											2010-12-02 14:31:19 -08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-10-29 18:16:52 -07:00
#else /* ! CONFIG_MEMORY_HOTPLUG */
/*
 * Stub functions for when hotplug is off
 */
static inline void pgdat_resize_lock(struct pglist_data *p, unsigned long *f) {}
static inline void pgdat_resize_unlock(struct pglist_data *p, unsigned long *f) {}
static inline void pgdat_resize_init(struct pglist_data *pgdat) {}
					
						
							| 
									
										
										
										
											2005-10-29 18:16:53 -07:00
										 |  |  | 
 | 
					
						
static inline unsigned zone_span_seqbegin(struct zone *zone)
{
	return 0;
}
static inline int zone_span_seqretry(struct zone *zone, unsigned iv)
{
	return 0;
}
static inline void zone_span_writelock(struct zone *zone) {}
static inline void zone_span_writeunlock(struct zone *zone) {}
static inline void zone_seqlock_init(struct zone *zone) {}
					
						
							| 
									
										
										
										
											2005-10-29 18:16:54 -07:00
										 |  |  | 
 | 
					
						
static inline int mhp_notimplemented(const char *func)
{
	printk(KERN_WARNING "%s() called, with CONFIG_MEMORY_HOTPLUG disabled\n", func);
	dump_stack();
	return -ENOSYS;
}
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
											  
											
											
										 
											2008-04-28 02:13:31 -07:00
static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
{
}
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2013-11-12 15:07:25 -08:00
static inline int try_online_node(int nid)
{
	return 0;
}
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
											  
											
											
										 
											2014-06-04 16:07:18 -07:00
										 |  |  | static inline void get_online_mems(void) {} | 
					
						
							|  |  |  | static inline void put_online_mems(void) {} | 
					
						
							| 
									
										
										
										
											2010-12-02 14:31:19 -08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-10-29 18:16:53 -07:00
										 |  |  | #endif /* ! CONFIG_MEMORY_HOTPLUG */
 | 
					
						
							| 
									
										
										
										
											2006-04-07 19:49:15 +02:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2008-07-23 21:28:19 -07:00
										 |  |  | #ifdef CONFIG_MEMORY_HOTREMOVE
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | extern int is_mem_section_removable(unsigned long pfn, unsigned long nr_pages); | 
					
						
							| 
									
										
										
										
											2013-02-22 16:33:27 -08:00
										 |  |  | extern void try_offline_node(int nid); | 
					
						
							| 
									
										
										
										
											2013-06-01 22:24:07 +02:00
										 |  |  | extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages); | 
					
						
							|  |  |  | extern void remove_memory(int nid, u64 start, u64 size); | 
					
						
							| 
									
										
										
										
											2008-07-23 21:28:19 -07:00
										 |  |  | 
 | 
					
						
							|  |  |  | #else
 | 
					
						
static inline int is_mem_section_removable(unsigned long pfn,
					unsigned long nr_pages)
{
	return 0;
}
					
						
							| 
									
										
										
										
											2013-02-22 16:33:27 -08:00
										 |  |  | 
 | 
					
						
							|  |  |  | static inline void try_offline_node(int nid) {} | 
					
						
							| 
									
										
										
										
											2013-06-01 22:24:07 +02:00
										 |  |  | 
 | 
					
						
static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
{
	return -EINVAL;
}
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | static inline void remove_memory(int nid, u64 start, u64 size) {} | 
					
						
							| 
									
										
										
										
											2008-07-23 21:28:19 -07:00
										 |  |  | #endif /* CONFIG_MEMORY_HOTREMOVE */
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2013-05-08 00:29:49 +02:00
extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
		void *arg, int (*func)(struct memory_block *, void *));
					
						
							| 
									
										
										
										
											2006-06-27 02:53:30 -07:00
										 |  |  | extern int add_memory(int nid, u64 start, u64 size); | 
					
						
							| 
									
										
											  
											
												memory-hotplug: add zone_for_memory() for selecting zone for new memory
This series of patches fixes a problem that occurs when memory is added in
a bad manner.  For example, on an x86_64 machine booted with "mem=400M" and
with 2GiB of memory installed, the following commands cause a problem:
  # echo 0x40000000 > /sys/devices/system/memory/probe
 [   28.613895] init_memory_mapping: [mem 0x40000000-0x47ffffff]
  # echo 0x48000000 > /sys/devices/system/memory/probe
 [   28.693675] init_memory_mapping: [mem 0x48000000-0x4fffffff]
  # echo online_movable > /sys/devices/system/memory/memory9/state
  # echo 0x50000000 > /sys/devices/system/memory/probe
 [   29.084090] init_memory_mapping: [mem 0x50000000-0x57ffffff]
  # echo 0x58000000 > /sys/devices/system/memory/probe
 [   29.151880] init_memory_mapping: [mem 0x58000000-0x5fffffff]
  # echo online_movable > /sys/devices/system/memory/memory11/state
  # echo online> /sys/devices/system/memory/memory8/state
  # echo online> /sys/devices/system/memory/memory10/state
  # echo offline> /sys/devices/system/memory/memory9/state
 [   30.558819] Offlined Pages 32768
  # free
              total       used       free     shared    buffers     cached
 Mem:        780588 18014398509432020     830552          0          0      51180
 -/+ buffers/cache: 18014398509380840     881732
 Swap:            0          0          0
This is because the above commands probe higher memory after onlining a
section with online_movable, which causes ZONE_HIGHMEM (or ZONE_NORMAL
for systems without ZONE_HIGHMEM) to overlap ZONE_MOVABLE.
After the second online_movable, the problem can be observed in
zoneinfo:
  # cat /proc/zoneinfo
  ...
  Node 0, zone  Movable
    pages free     65491
          min      250
          low      312
          high     375
          scanned  0
          spanned  18446744073709518848
          present  65536
          managed  65536
  ...
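The "spanned" value in the zoneinfo output above equals 2^64 - 32768, which is the result an unsigned 64-bit subtraction produces when 32768 is taken from zero. The sketch below only reproduces that arithmetic. Which kernel computation actually produced the value is not shown in this message, so treating it as a wrapped page counter is an assumption, and `span_sub()` is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Subtract b from a with unsigned 64-bit wraparound, the way an
 * unchecked page counter would behave (hypothetical helper).
 */
static uint64_t span_sub(uint64_t a, uint64_t b)
{
	return a - b;
}
```

With these inputs, `span_sub(0, 32768)` yields 18446744073709518848, the number reported as "spanned" for the Movable zone.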
This series of patches solves the problem by checking ZONE_MOVABLE when
choosing a zone for new memory.  If the new memory is inside or higher than
ZONE_MOVABLE, it goes there instead.
After applying this series of patches, the free and zoneinfo results are
as follows (after offlining memory9):
  bash-4.2# free
                total       used       free     shared    buffers     cached
   Mem:        780956      80112     700844          0          0      51180
   -/+ buffers/cache:      28932     752024
   Swap:            0          0          0
  bash-4.2# cat /proc/zoneinfo
  Node 0, zone      DMA
    pages free     3389
          min      14
          low      17
          high     21
          scanned  0
          spanned  4095
          present  3998
          managed  3977
      nr_free_pages 3389
  ...
    start_pfn:         1
    inactive_ratio:    1
  Node 0, zone    DMA32
    pages free     73724
          min      341
          low      426
          high     511
          scanned  0
          spanned  98304
          present  98304
          managed  92958
      nr_free_pages 73724
    ...
    start_pfn:         4096
    inactive_ratio:    1
  Node 0, zone   Normal
    pages free     32630
          min      120
          low      150
          high     180
          scanned  0
          spanned  32768
          present  32768
          managed  32768
      nr_free_pages 32630
  ...
    start_pfn:         262144
    inactive_ratio:    1
  Node 0, zone  Movable
    pages free     65476
          min      241
          low      301
          high     361
          scanned  0
          spanned  98304
          present  65536
          managed  65536
      nr_free_pages 65476
  ...
    start_pfn:         294912
    inactive_ratio:    1
This patch (of 7):
Introduce zone_for_memory() in arch independent code for
arch_add_memory() to use.
Many arch_add_memory() functions simply select ZONE_HIGHMEM or
ZONE_NORMAL and add the new memory to it.  However, with the existence of
ZONE_MOVABLE, the selection method should be carefully considered: if
new, higher memory is added after ZONE_MOVABLE is set up, the default
zone and ZONE_MOVABLE may overlap each other.
should_add_memory_movable() checks the status of ZONE_MOVABLE.  If it
already contains memory, the address of the new memory is compared with
that of the movable memory.  If the new memory is higher than the movable
memory, it should be added to ZONE_MOVABLE instead of the default zone.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "Mel Gorman" <mgorman@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
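The selection rule described in the message above can be sketched as a userspace model. It is an approximation written from the prose, not the kernel code: `struct zone_model` and `struct node_model` are hypothetical stand-ins for the kernel's zone and pgdat, the functions take the node model directly instead of a `nid`, `PAGE_SHIFT` is assumed to be 12, and "higher than movable" is modelled as "at or above the start of ZONE_MOVABLE".

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE, MAX_NR_ZONES };

/* Hypothetical, reduced stand-ins for struct zone and pg_data_t. */
struct zone_model {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

struct node_model {
	struct zone_model node_zones[MAX_NR_ZONES];
};

/* Nonzero when ZONE_MOVABLE already holds memory and the new range is not below it. */
static int should_add_memory_movable(const struct node_model *node, uint64_t start)
{
	const struct zone_model *movable = &node->node_zones[ZONE_MOVABLE];
	unsigned long start_pfn = (unsigned long)(start >> PAGE_SHIFT);

	if (movable->spanned_pages == 0)	/* empty zone: keep the default */
		return 0;
	return movable->zone_start_pfn <= start_pfn;
}

/* Pick ZONE_MOVABLE when the rule above applies, else the caller's default zone. */
static int zone_for_memory(const struct node_model *node, uint64_t start,
			   int zone_default)
{
	if (should_add_memory_movable(node, start))
		return ZONE_MOVABLE;
	return zone_default;
}
```

Using the numbers from the log above, a Movable zone starting at pfn 294912 (physical address 0x48000000) sends a probe at 0x50000000 to ZONE_MOVABLE, while a probe at 0x40000000 stays in the default zone.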
											
										 
											2014-08-06 16:07:36 -07:00
										 |  |  | extern int zone_for_memory(int nid, u64 start, u64 size, int zone_default); | 
					
						
							| 
									
										
										
										
											2006-06-27 02:53:30 -07:00
										 |  |  | extern int arch_add_memory(int nid, u64 start, u64 size); | 
					
						
							| 
									
										
										
										
											2012-10-08 16:33:58 -07:00
										 |  |  | extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages); | 
					
						
							| 
									
										
										
										
											2013-02-22 16:32:52 -08:00
										 |  |  | extern bool is_memblock_offlined(struct memory_block *mem); | 
					
						
							| 
									
										
										
										
											2013-05-27 12:58:46 +02:00
										 |  |  | extern void remove_memory(int nid, u64 start, u64 size); | 
					
						
							| 
									
										
										
										
											2013-11-12 15:07:42 -08:00
										 |  |  | extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn); | 
					
						
							| 
									
										
										
										
											2008-04-28 02:12:01 -07:00
										 |  |  | extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms); | 
					
						
							| 
									
										
											  
											
											
										 
											2008-04-28 02:13:31 -07:00
extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
					  unsigned long pnum);
					
						
							| 
									
										
										
										
											2006-04-07 19:49:15 +02:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2005-10-29 18:16:52 -07:00
										 |  |  | #endif /* __LINUX_MEMORY_HOTPLUG_H */
 |