Commit graph

88873 commits

Author SHA1 Message Date
Ron Rindjunsky
8114fcf185 iwlwifi: A-MPDU Tx conform API to mac80211
This patch alters the current API in order to fit the new
API mac80211 gives for A-MPDU Tx

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:18 -05:00
Ron Rindjunsky
d92684e660 mac80211: A-MPDU Tx add delBA from recipient support
This patch adds the ability to handle delBA from recipient to initiator
during an A-MPDU session

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:18 -05:00
Ron Rindjunsky
483fdcecc5 mac80211: A-MPDU Tx change tx_status to support Block Ack data
This patch adds fields to ieee80211_tx_status in order to allow block ack
information exchange between low-level driver,mac80211 and rate scaling
module.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:18 -05:00
Ron Rindjunsky
eb2ba62ee5 mac80211: A-MPDU add debugfs support
This patch adds A-MPDU status report per STA to the debugfs.
The option to de/activate A-MPDU through debugfs is also present.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:17 -05:00
Ron Rindjunsky
fe3bf0f59e mac80211: A-MPDU Tx MLME data initialization
This patch initialize A-MPDU MLME data for Tx sessions.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:17 -05:00
Ron Rindjunsky
9e72349237 mac80211: A-MPDU Tx adding qdisc support
This patch allows qdisc support in A-MPDU Tx. a method to
handle QoS <-> TID switches is present in this patch.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:17 -05:00
Ron Rindjunsky
eadc8d9e90 mac80211: A-MPDU Tx adding basic functionality
This patch adds the following abilities to mac80211:
 - start A-MPDU Tx session
 - stop A-MPDU Tx session
 - call backs to start/stop A-MPDU Tx session
 - sending addBA request
 - processing addBA response

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:14 -05:00
Ron Rindjunsky
80656c2031 mac80211: A-MPDU Tx add MLME structures
This patch adds the needed structures to describe the Tx aggregation MLME
per STA
new:
 - struct tid_ampdu_tx: TID aggregation information (Tx)
changed:
 - struct sta_ampdu_mlme: Tx aggregation information per TID and
			  dialog token creator were added
 - struct sta_info: tid_to_tx_q added for tid<->tx queue mapping

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:13 -05:00
Ron Rindjunsky
0df3ef45a3 mac80211: A-MPDU Tx add session's and low level driver's API
This patch adds the API for 3 stages in A-MPDU Tx session flow:
- request mac80211 to start/stop A-MPDU Tx session for specific TID. such a
  request should be issued by a load aware element, either mac80211 itself
  or external element.
- requests by mac80211 to low-level driver to start/stop Tx aggregation.
  notice that low level driver responds now with Starting Sequence Number.
- async feedback by low-level to mac80211 to inform that HW is ready for
  next A-MPDU Tx state.
Changes in API to Rx A-MPDU were also made, reflected in iwlwifi changes as
well.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:13 -05:00
Holger Schurig
8816edcea9 libertas: rename/document scan_channel
Rename last_scanned_channel to scan_channel, just so that a
grep for struct bss_descriptor's last_scanned element doesn't
show up so many positives.

Also documented the variable and moved it to other scan related
entries in lbs_private.

Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:12 -05:00
Holger Schurig
23ff50361f libertas: make lbs_unset_basic_rate_flags() static
... by moving it into the file where it's sole user resides

Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:11 -05:00
Holger Schurig
e226868ec3 libertas: make lbs_sync_channel() static
... by moving it into the file where it's sole user resides

Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:11 -05:00
Holger Schurig
52507c2087 libertas: make association debug output nicer
This also fixes a bug where should_deauth_infrastructure() always
returned 0.

Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:10 -05:00
Holger Schurig
1afc09ab7c libertas: trim overly long debug statement
Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:09 -05:00
Johannes Berg
7d185b8bb1 mac80211: allow sending multicast frames through virtual ports
When reworking the port access control code, I forgot multicast frames
and those are now always rejected because the destination station is
not known. This changes the code to allow through multicast frames and
also avoid the sta hash lookup (which is bound to fail) for them.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:09 -05:00
Tomas Winkler
b220525696 mac80211: set assoc flag to bss_conf
Only BSS_CHANGED_ASSOC was set in the 'changed' bitmask. Assignment  to
bss_conf.assoc was absent.
This patch assign value to  bss_conf.assoc according the association state.

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-02-29 15:19:08 -05:00
Ilpo Järvinen
03a64c93b6 [LLC]: Kill static inline llc_addrany
After the patch:
$ git-grep llc_addrany | wc -l
0

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:46:17 -08:00
Ilpo Järvinen
a90bcbd651 [SCTP]: Kill unused static inline sctp_sysctl_jiffies_ms
After the patch:
$ git-grep sctp_sysctl_jiffies_ms | wc -l
0

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:45:34 -08:00
Pavel Emelyanov
95a363582b [NET]: Use existing device list walker for /proc/dev_mcast.
The seq_file_operations' dev_mc_seq_xxx callbacks do the same thing as
the dev_seq_xxx ones do, but skip the SEQ_START_TOKEN.

So use the existing exported dev_seq_xxx calls and handle the
SEQ_START_TOKEN in the dev_mc_seq_show().

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:44:14 -08:00
Denis V. Lunev
fd80eb942a [INET]: Remove struct dst_entry *dst from request_sock_ops.rtx_syn_ack.
It looks like dst parameter is used in this API due to historical
reasons.  Actually, it is really used in the direct call to
tcp_v4_send_synack only.  So, create a wrapper for tcp_v4_send_synack
and remove dst from rtx_syn_ack.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:43:03 -08:00
Neil Horman
58fbbed4fb [SCTP]: extend exported data in /proc/net/sctp/assoc
RFC 3873 specifies several MIB objects that can't be obtained by the
current data set exported by /proc/sys/net/sctp/assoc.  This patch
adds the missing pieces of data that allow us to compute all the
objects in the sctpAssocTable object.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:40:56 -08:00
Pavel Emelyanov
665bba1087 [NETFILTER/RXRPC]: Don't use seq_release_private where inappropriate.
Some netfilter code and rxrpc one use seq_open() to open
a proc file, but seq_release_private to release one.

This is harmless, but ambiguous.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:39:17 -08:00
Pavel Emelyanov
c20932d2c9 [ATALK/DECNET]: Use seq_open_private in appletalk and decnet.
These two also perform manual seq_open_private, so patch them both at
once. But unlike ATM code, these already use the seq_release_private,
so I splitted this patch from the previous one.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:38:24 -08:00
Pavel Emelyanov
9a8c09e73b [ATM]: Use seq_open/release_privade instead of manual manipulations.
lec_seq_open/lec_seq_release and __vcc_seq_open/vcc_seq_release
do seq_open/release_private's job.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:37:02 -08:00
David S. Miller
45af1754bc [NET]: sk_release_kernel needs to be exported to modules
Fixes:

ERROR: "sk_release_kernel" [net/ipv6/ipv6.ko] undefined!

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:33:19 -08:00
Pavel Emelyanov
459eea7410 [SCTP]: Use proc_create to setup de->proc_fops.
In addition to commit 160f17 ("[SCTP]: Use proc_create() to setup
->proc_fops first") use proc_create in two more places.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:24:45 -08:00
Denis V. Lunev
98c6d1b261 [NETNS]: Make icmpv6_sk per namespace.
All preparations are done. Now just add a hook to perform an
initialization on namespace startup and replace icmpv6_sk macro with
proper inline call.  Actual namespace the packet belongs too will be
passed later along with the one for the routing.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:21:22 -08:00
Paul E. McKenney
c9e71002aa rcupreempt: remove never-migrates assumption from rcu_process_callbacks()
This patch fixes a potentially invalid access to a per-CPU variable in
rcu_process_callbacks().

This per-CPU access needs to be done in such a way as to guarantee that
the code using it cannot move to some other CPU before all uses of the
value accessed have completed.  Even though this code is currently only
invoked from softirq context, which currrently cannot migrate to some
other CPU, life would be better if this code did not silently make such
an assumption.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-29 20:21:13 +01:00
Paul E. McKenney
ae778869ae rcupreempt: fix hibernate/resume in presence of PREEMPT_RCU and hotplug
This fixes a oops encountered when doing hibernate/resume in presence of
PREEMPT_RCU.

The problem was that the code failed to disable preemption when
accessing a per-CPU variable.  This is OK when called from code that
already has preemption disabled, but such is not the case from the
suspend/resume code path.

Reported-by: Dave Young <hidave.darkstar@gmail.com>
Tested-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-29 20:21:13 +01:00
Denis V. Lunev
4a6ad7a141 [NETNS]: Make icmp_sk per namespace.
All preparations are done. Now just add a hook to perform an
initialization on namespace startup and replace icmp_sk macro with
proper inline call.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:19:58 -08:00
Denis V. Lunev
5c8cafd65e [NETNS]: icmp(v6)_sk should not pin a namespace.
So, change icmp(v6)_sk creation/disposal to the scheme used in the
netlink for rtnl, i.e. create a socket in the context of the init_net
and assign the namespace without getting a referrence later.

Also use sk_release_kernel instead of sock_release to properly destroy
such sockets.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:19:22 -08:00
Denis V. Lunev
edf0208702 [NET]: Make netlink_kernel_release publically available as sk_release_kernel.
This staff will be needed for non-netlink kernel sockets, which should
also not pin a namespace like tcp_socket and icmp_socket.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:18:32 -08:00
Denis V. Lunev
9dfbec1fb2 [NETLINK]: No need for a separate __netlink_release call.
Merge it to netlink_kernel_release.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:17:56 -08:00
Denis V. Lunev
79c9115953 [ICMP]: Allocate data for __icmp(v6)_sk dynamically.
Own __icmp(v6)_sk should be present in each namespace. So, it should be
allocated dynamically. Though, alloc_percpu does not fit the case as it
implies additional dereferrence for no bonus.

Allocate data for pointers just like __percpu_alloc_mask does and place
pointers to struct sock into this array.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:17:11 -08:00
Denis V. Lunev
405666db84 [ICMP]: Pass proper ICMP socket into icmp(v6)_xmit_(un)lock.
We have to get socket lock inside icmp(v6)_xmit_lock/unlock. The socket
is get from global variable now. When this code became namespaces, one
should pass a namespace and get socket from it.

Though, above is useless. Socket is available in the caller, just pass
it inside. This saves a bit of code now and saves more later.

add/remove: 0/0 grow/shrink: 1/3 up/down: 1/-169 (-168)
function                                     old     new   delta
icmp_rcv                                     718     719      +1
icmpv6_rcv                                  2343    2303     -40
icmp_send                                   1566    1518     -48
icmp_reply                                   549     468     -81

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:16:46 -08:00
Denis V. Lunev
b7e729c4b4 [ICMP]: Store sock rather than socket for ICMP flow control.
Basically, there is no difference, what to store: socket or sock. Though,
sock looks better as there will be 1 less dereferrence on the fast path.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:16:08 -08:00
Denis V. Lunev
1e3cf6834e [ICMP]: Optimize icmp_socket usage.
Use this macro only once in a function to save a bit of space.

add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-98 (-98)
function                                     old     new   delta
icmp_reply                                   562     561      -1
icmp_push_reply                              305     258     -47
icmp_init                                    273     223     -50

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:15:42 -08:00
Denis V. Lunev
a5710d6582 [ICMP]: Add return code to icmp_init.
icmp_init could fail and this is normal for namespace other than initial.
So, the panic should be triggered only on init_net initialization path.

Additionally create rollback path for icmp_init as a separate function.
It will also be used later during namespace destruction.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:14:50 -08:00
Denis V. Lunev
9b0f976f27 [INET]: Remove struct net_proto_family* from _init calls.
struct net_proto_family* is not used in icmp[v6]_init, ndisc_init,
igmp_init and tcp_v4_init. Remove it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:13:15 -08:00
Wang Chen
5e47879f49 [IRDA]: Use proc_create() to setup ->proc_fops first
Use proc_create() to make sure that ->proc_fops be setup before gluing
PDE to main tree.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 10:34:45 -08:00
Linus Torvalds
076d84bbdb Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
  softlockup: fix task state setting
  rcu: add support for dynamic ticks and preempt rcu
2008-02-29 10:19:27 -08:00
Jeremy Fitzhardinge
d40e705903 xen: mask out SEP from CPUID
Fix 32-on-64 pvops kernel:

we don't want userspace using syscall/sysenter, even if the hypervisor
supports it, so mask it out from CPUID.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-29 18:55:43 +01:00
Dave Anderson
53c5858810 x86 ptrace: fix ptrace_bts_config structure declaration
The 2.6.25 ptrace_bts_config structure in asm-x86/ptrace-abi.h
is defined with u32 types:

   #include <asm/types.h>

   /* configuration/status structure used in PTRACE_BTS_CONFIG and
      PTRACE_BTS_STATUS commands.
   */
   struct ptrace_bts_config {
           /* requested or actual size of BTS buffer in bytes */
           u32 size;
           /* bitmask of below flags */
           u32 flags;
           /* buffer overflow signal */
           u32 signal;
           /* actual size of bts_struct in bytes */
           u32 bts_size;
   };
   #endif

But u32 is only accessible in asm-x86/types.h if __KERNEL__,
leading to compile errors when ptrace.h is included from
user-space. The double-underscore versions that are exported
to user-space in asm-x86/types.h should be used instead.

Signed-off-by: Dave Anderson <anderson@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-29 18:55:43 +01:00
Ingo Molnar
b4ef95de00 x86: disable BTS ptrace extensions for now
revert the BTS ptrace extension for now.

based on general objections from Roland McGrath:

    http://lkml.org/lkml/2008/2/21/323

we'll let the BTS functionality cook some more and re-enable
it in v2.6.26. We'll leave the dead code around to help the
development of this code.

(X86_BTS is not defined at the moment)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-29 18:55:42 +01:00
Thomas Gleixner
8be8f54bae x86: CPA: avoid split of alias mappings
avoid over-eager large page splitup.

When the target area needs to be split or is split already (ioremap)
then the current code enforces the split of large mappings in the alias
regions even if we could avoid it.

Use a separate variable processed in the cpa_data structure to carry
the number of pages which have been processed instead of reusing the
numpages variable. This keeps numpages intact and gives the alias code
a chance to keep large mappings intact.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-29 18:55:42 +01:00
Ingo Molnar
757265b8c5 x86: delay the export removal of init_mm
delay the removal of this symbol export by one more kernel release,
giving external modules such as VirtualBox a chance to stop using it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-29 18:55:42 +01:00
Ingo Molnar
b16bf712f4 x86: fix leak un ioremap_page_range() failure
Jan Beulich noticed it during code review that if a driver's ioremap()
fails (say due to -ENOMEM) then we might leak the struct vm_area.

Free it properly.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-29 18:55:42 +01:00
Roland McGrath
f2dbe03dcc x86 vdso: fix build locale dependency
Priit Laes discovered that the sed command processing nm output was
sensitive to locale settings.  This was addressed in commit
03994f01e8 by using [:alnum:] in place of
[a-zA-Z0-9].

But that solution too is locale-dependent and may not always match
the identifiers it needs to.  The better fix is just to run sed et al
with a fixed locale setting in all builds.

Signed-off-by: Roland McGrath <roland@redhat.com>
CC: Priit Laes <plaes@plaes.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-29 18:55:39 +01:00
Thomas Gleixner
d67bbacb4b x86: restore vsyscall64 prochandler
a recent fix:

  commit ce28b9864b
  Author: Thomas Gleixner <tglx@linutronix.de>
  Date:   Wed Feb 20 23:57:30 2008 +0100

    x86: fix vsyscall wreckage

removed the broken /kernel/vsyscall64 handler completely.
This triggers the following debug check:

  sysctl table check failed: /kernel/vsyscall64  No proc_handler

Restore the sane part of the proc handler.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-29 18:55:39 +01:00
Hans Rosenfeld
cded932b75 x86: fix pmd_bad and pud_bad to support huge pages
I recently stumbled upon a problem in the support for huge pages. If a
program using huge pages does not explicitly unmap them, they remain
mapped (and therefore, are lost) after the program exits.

I observed that the free huge page count in /proc/meminfo decreased when
running my program, and it did not increase after the program exited.
After running the program a few times, no more huge pages could be
allocated.

The reason for this seems to be that the x86 pmd_bad and pud_bad
consider pmd/pud entries having the PSE bit set invalid. I think there
is nothing wrong with this bit being set, it just indicates that the
lowest level of translation has been reached. This bit has to be (and
is) checked after the basic validity of the entry has been checked, like
in this fragment from follow_page() in mm/memory.c:

  if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
          goto no_page_table;

  if (pmd_huge(*pmd)) {
          BUG_ON(flags & FOLL_GET);
          page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
          goto out;
  }

Note that this code currently doesn't work as intended if the pmd refers
to a huge page, the pmd_huge() check can not be reached if the page is
huge.

Extending pmd_bad() (and, for future 1GB page support, pud_bad()) to
allow for the PSE bit being set fixes this. For similar reasons,
allowing the NX bit being set is necessary, too. I have seen huge pages
having the NX bit set in their pmd entry, which would cause the same
problem.

Signed-Off-By: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-29 18:55:39 +01:00