Commit graph

170001 commits

Author SHA1 Message Date
Kees Cook
0e1a6ef2de sysctl: require CAP_SYS_RAWIO to set mmap_min_addr
Currently the mmap_min_addr value can only be bypassed during mmap when
the task has CAP_SYS_RAWIO.  However, the mmap_min_addr sysctl value itself
can be adjusted to 0 if euid == 0, allowing a bypass without CAP_SYS_RAWIO.
This patch adds a check for the capability before allowing mmap_min_addr to
be changed.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-11-09 08:34:22 +11:00
Theodore Ts'o
1e424a3483 ext4: partial revert to fix double brelse WARNING()
This is a partial revert of commit 6487a9d (only the changes made to
fs/ext4/namei.c), since it is causing the following brelse()
double-free warning when running fsstress on a file system with 1k
blocksize and we run into a block allocation failure while converting
a single-block directory to a multi-block hash-tree indexed directory.

WARNING: at fs/buffer.c:1197 __brelse+0x2e/0x33()
Hardware name: 
VFS: brelse: Trying to free free buffer
Modules linked in:
Pid: 2226, comm: jbd2/sdd-8 Not tainted 2.6.32-rc6-00577-g0003f55 #101
Call Trace:
 [<c01587fb>] warn_slowpath_common+0x65/0x95
 [<c0158869>] warn_slowpath_fmt+0x29/0x2c
 [<c021168e>] __brelse+0x2e/0x33
 [<c0288a9f>] jbd2_journal_refile_buffer+0x67/0x6c
 [<c028a9ed>] jbd2_journal_commit_transaction+0x319/0x14d8
 [<c0164d73>] ? try_to_del_timer_sync+0x58/0x60
 [<c0175bcc>] ? sched_clock_cpu+0x12a/0x13e
 [<c017f6b4>] ? trace_hardirqs_off+0xb/0xd
 [<c0175c1f>] ? cpu_clock+0x3f/0x5b
 [<c017f6ec>] ? lock_release_holdtime+0x36/0x137
 [<c0664ad0>] ? _spin_unlock_irqrestore+0x44/0x51
 [<c0180af3>] ? trace_hardirqs_on_caller+0x103/0x124
 [<c0180b1f>] ? trace_hardirqs_on+0xb/0xd
 [<c0164d73>] ? try_to_del_timer_sync+0x58/0x60
 [<c0290d1c>] kjournald2+0x11a/0x310
 [<c017118e>] ? autoremove_wake_function+0x0/0x38
 [<c0290c02>] ? kjournald2+0x0/0x310
 [<c0170ee6>] kthread+0x66/0x6b
 [<c0170e80>] ? kthread+0x0/0x6b
 [<c01251b3>] kernel_thread_helper+0x7/0x10
---[ end trace 5579351b86af61e3 ]---

Commit 6487a9d was an attempt some buffer head leaks in an ENOSPC
error path, but in some cases it actually results in an excess ENOSPC,
as shown above.  Fixing this means cleaning up who is responsible for
releasing the buffer heads from the callee to the caller of
add_dirent_to_buf().

Since that's a relatively complex change, and we're late in the rcX
development cycle, I'm reverting this now, and holding back a more
complete fix until after 2.6.32 ships.  We've lived with this
buffer_head leak on ENOSPC in ext3 and ext4 for a very long time; a
few more months won't kill us.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Curt Wohlgemuth <curtw@google.com>
2009-11-08 15:45:44 -05:00
Russell King
bfd2e29f04 [ARM] Fix test for unimplemented ARM syscalls
The existing test always failed since 'no' was always greater than
0x7ff.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-11-08 20:05:28 +00:00
Cyrill Gorcunov
f4a70c5537 x86, apic: Get rid of apicid_to_cpu_present assign on 64-bit
In fact it's never get used on x86-64 (for 64 bit platform
we use differ technique to enumerate io-units).

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091108131645.GD5300@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 19:46:17 +01:00
Dominik Brodowski
9ac3e58cef pcmcia: deprecate CS_CHECK (bluetooth)
Remove all usages of the CS_CHECK macro and replace them with proper
Linux style calling and return value checking. The extra error reporting may
be dropped, as the PCMCIA core already complains about any (non-driver-author)
errors.

CC: linux-bluetooth@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-08 18:23:15 +01:00
Dominik Brodowski
444486a5f9 pcmcia: use dynamic debug infrastructure, deprecate CS_CHECK (ide)
ide-cs.c is the only PCMCIA device driver making use of CONFIG_PCMCIA_DEBUG,
so convert it to use the dynamic debug infrastructure.

Also, remove all usages of the CS_CHECK macro and replace them with proper
Linux style calling and return value checking. The extra error reporting may
be dropped, as the PCMCIA core already complains about any (non-driver-author)
errors.

CC: linux-ide@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-08 18:23:14 +01:00
Dominik Brodowski
6d9a299f67 pcmcia: extend error reporting and debug messages in core
Add a few more error and debug messages to the PCMCIA core.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-08 18:23:14 +01:00
Dominik Brodowski
c9f50dddd1 pcmcia: use dynamic debug in PCMCIA socket drivers
Make use of the dynamic debug infrastructure in various PCMCIA socket
drivers. By doing so, only the drivers relying on soc_common make use
of CONFIG_PCMCIA_DEBUG. Therefore, update the Kconfig entry accordingly.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-08 18:23:13 +01:00
Dominik Brodowski
d50dbec3ce pcmcia: use dynamic debug instead of custom infrastructure
Use the generic "dynamic debug" infrastructure instead of
CONIG_PCMCIA_DEBUG in the PCMCIA core (pcmcia.ko and pcmcia_core.ko). To
enable debugging, enable CONFIG_DYNAMIC_DEBUG, mount debugfs and

$ echo -n 'module pcmcia_core +p' > /sys/kernel/debug/dynamic_debug/control

for the complete module "pcmcia_core", for example. For more detailled
instructions, please see Documentation/dynamic-debug-howto.txt

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-08 18:23:12 +01:00
Dominik Brodowski
18a7a19b37 pcmcia: remove pcmcia_get_{first,next}_tuple()
Remove the pcmcia_get_{first,next}_tuple() calls no longer needed by
(current) pcmcia device drivers.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-08 18:23:11 +01:00
Dominik Brodowski
18b61b9729 pcmcia: convert pcmciamtd driver to use new CIS helpers
Convert the (broken) pcmciamtd driver to use the new CIS helpers.

CC: David.Woodhouse@intel.com
CC: linux-mtd@lists.infradead.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-08 18:23:10 +01:00
Dominik Brodowski
37ace3d413 pcmcia: convert ssb pcmcia driver to use new CIS helpers
SSB is a prime example of how to make use of the new CIS helpers.

CC: Michael Buesch <mb@bu3sch.de>
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-08 18:23:09 +01:00
Dominik Brodowski
dddfbd824b pcmcia: convert net pcmcia drivers to use new CIS helpers
Use the new CIS helpers in net pcmcia drivers, which allows for
a few code cleanups.

This revision does not remove the phys_addr assignment in
3c589_cs.c -- a bug noted by Komuro <komurojun-mbn@nifty.com>

CC: David S. Miller <davem@davemloft.net>
CC: netdev@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-08 18:23:05 +01:00
Dominik Brodowski
91284224da pcmcia: add new CIS access helpers
As a replacement to pcmcia_get_{first,next}_tuple() and
pcmcia_get_tuple_data(), three new -- and easier to use --
functions are added:

- pcmcia_get_tuple() to get the very first CIS entry of one
  type.

- pcmcia_loop_tuple() to loop over all CIS entries of one type.

- pcmcia_get_mac_from_cis() to read out the hardware MAC address
  from CISTPL_FUNCE.

Only a handful of drivers need these functions anyway, as most
CIS access is already handled by pcmcia_loop_config(), which
now shares the same backed (pccard_loop_tuple()) with
pcmcia_loop_tuple().

A pcmcia_get_mac_from_cis() bug noted by Komuro
<komurojun-mbn@nifty.com> has been fixed in this revision.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-08 18:06:54 +01:00
Dominik Brodowski
af757923a9 ipwireless: make more use of pcmcia_loop_config()
Within the pcmcia_loop_config() callback, we already have all
tuple data available we need. Also add a fix to release the IO
resource (at least within pcmcia_loop_config() error path).

CC: Jiri Kosina <jkosina@suse.cz>
CC: David Sterba <dsterba@suse.cz>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-08 18:06:53 +01:00
Dominik Brodowski
aaa8cfdada pcmcia: use pcmcia_loop_config in misc pcmcia drivers
Use pcmcia_loop_config() in a few drivers missed during the first
round. On fmvj18x_cs.c it -- strangely -- only requries us to set
conf.ConfigIndex, which is done by the core, so include an empty
loop function which returns 0 unconditionally.

CC: David S. Miller <davem@davemloft.net>
CC: David Sterba <dsterba@suse.cz>
CC: netdev@vger.kernel.org
CC: linux-wireless@vger.kernel.org
For the ipwireless part: Acked-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-08 18:06:53 +01:00
Dominik Brodowski
7d2e8d00b4 pcmcia: use pre-determined values
A few PCMCIA network drivers can make use of values provided by the pcmcia
core, instead of tedious, independent CIS parsing.

xirc32ps_cs.c: manf_id

hostap_cs.c: multifunction count

b43/pcmcia.c: ConfigBase address and "Present"

smc91c92_cs.c:  By default, mhz_setup() can use VERS_1 as it is stored
in struct pcmcia_device. Only some cards require workarounds, such as
reading out VERS_1 twice.

CC: David S. Miller <davem@davemloft.net>
CC: netdev@vger.kernel.org
CC: linux-wireless@vger.kernel.org
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-08 18:06:32 +01:00
Clark Williams
549104f22b perf tools: Modify perf routines to use new debugfs routines
modify perf.c get_debugfs_mntpnt() to use the util/debugfs.c
debugfs_find_mountpoint()

modify util/parse-events.c to use debugfs_valid_mountpoint().

Signed-off-by: Clark Williams <williams@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091101155720.624cc87e@torg>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 18:01:35 +01:00
Clark Williams
afe61f6778 perf tools: Add debugfs utility routines for perf
Add routines to locate the debugfs mount point and to manage the
mounting and unmounting of the debugfs.

Signed-off-by: Clark Williams <williams@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091101155621.2b3503ee@torg>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 18:01:34 +01:00
Russell King
5418983113 Merge branch 'for-rmk' of git://git.marvell.com/orion 2009-11-08 16:40:38 +00:00
Cyrill Gorcunov
4343fe1024 x86, ioapic: Use snrpintf while set names for IO-APIC resourses
We should be ready that one day MAX_IO_APICS may raise its
number. To prevent memory overwrite we're to use safe
snprintf while set IO-APIC resourse name.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20091108155431.GC25940@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 17:06:23 +01:00
Cyrill Gorcunov
46dc281b1b x86, apic: Use PAGE_SIZE instead of numbers
The whole page is reserved for IO-APIC fixmap
due to non-cacheable requirement. So lets note
this explicitly instead of playing with numbers.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
LKML-Reference: <20091108155356.GB25940@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 17:06:22 +01:00
Pekka Enberg
c10edee2e1 perf tools: Fix permission checks
The perf_event_open() system call returns EACCES if the user is
not root which results in a very confusing error message:

  $ perf record -A -a -f

    Error: perfcounter syscall returned with -1 (Permission denied)

    Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?

It turns out that's because perf tools are checking only for
EPERM. Fix that up to get a much better error message:

  $ perf record -A -a -f
    Fatal: Permission error - are you root?

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1257696066-4046-1-git-send-email-penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 17:04:54 +01:00
Jan Beulich
eb647138ac x86/PCI: Adjust GFP mask handling for coherent allocations
Rather than forcing GFP flags and DMA mask to be inconsistent,
GFP flags should be determined even for the fallback device
through dma_alloc_coherent_mask()/dma_alloc_coherent_gfp_flags().

This restores 64-bit behavior as it was prior to commits
8965eb1938 and
4a367f3a9d (not sure why there are
two of them), where GFP_DMA was forced on for 32-bit, but not
for 64-bit, with the slight adjustment that afaict even 32-bit
doesn't need this without CONFIG_ISA.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
LKML-Reference: <4AF18187020000780001D8AA@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-08 07:44:30 -08:00
Li Zefan
30ff21e31f ksym_tracer: Remove KSYM_SELFTEST_ENTRY
The macro used to be used in both trace_selftest.c and
trace_ksym.c, but no longer, so remove it from header file.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-11-08 16:21:01 +01:00
Frederic Weisbecker
ba1c813a6b hw-breakpoints: Arbitrate access to pmu following registers constraints
Allow or refuse to build a counter using the breakpoints pmu following
given constraints.

We keep track of the pmu users by using three per cpu variables:

- nr_cpu_bp_pinned stores the number of pinned cpu breakpoints counters
  in the given cpu

- nr_bp_flexible stores the number of non-pinned breakpoints counters
  in the given cpu.

- task_bp_pinned stores the number of pinned task breakpoints in a cpu

The latter is not a simple counter but gathers the number of tasks that
have n pinned breakpoints.
Considering HBP_NUM the number of available breakpoint address
registers:
   task_bp_pinned[0] is the number of tasks having 1 breakpoint
   task_bp_pinned[1] is the number of tasks having 2 breakpoints
   [...]
   task_bp_pinned[HBP_NUM - 1] is the number of tasks having the
   maximum number of registers (HBP_NUM).

When a breakpoint counter is created and wants an access to the pmu,
we evaluate the following constraints:

== Non-pinned counter ==

- If attached to a single cpu, check:

    (per_cpu(nr_bp_flexible, cpu) || (per_cpu(nr_cpu_bp_pinned, cpu)
         + max(per_cpu(task_bp_pinned, cpu)))) < HBP_NUM

       -> If there are already non-pinned counters in this cpu, it
          means there is already a free slot for them.
          Otherwise, we check that the maximum number of per task
          breakpoints (for this cpu) plus the number of per cpu
          breakpoint (for this cpu) doesn't cover every registers.

- If attached to every cpus, check:

    (per_cpu(nr_bp_flexible, *) || (max(per_cpu(nr_cpu_bp_pinned, *))
           + max(per_cpu(task_bp_pinned, *)))) < HBP_NUM

       -> This is roughly the same, except we check the number of per
          cpu bp for every cpu and we keep the max one. Same for the
          per tasks breakpoints.

== Pinned counter ==

- If attached to a single cpu, check:

       ((per_cpu(nr_bp_flexible, cpu) > 1)
            + per_cpu(nr_cpu_bp_pinned, cpu)
            + max(per_cpu(task_bp_pinned, cpu))) < HBP_NUM

       -> Same checks as before. But now the nr_bp_flexible, if any,
          must keep one register at least (or flexible breakpoints will
          never be be fed).

- If attached to every cpus, check:

      ((per_cpu(nr_bp_flexible, *) > 1)
           + max(per_cpu(nr_cpu_bp_pinned, *))
           + max(per_cpu(task_bp_pinned, *))) < HBP_NUM

Changes in v2:

- Counter -> event rename

Changes in v5:

- Fix unreleased non-pinned task-bound-only counters. We only released
  it in the first cpu. (Thanks to Paul Mackerras for reporting that)

Changes in v6:

- Currently, events scheduling are done in this order: cpu context
  pinned + cpu context non-pinned + task context pinned + task context
  non-pinned events. Then our current constraints are right theoretically
  but not in practice, because non-pinned counters may be scheduled
  before we can apply every possible pinned counters. So consider
  non-pinned counters as pinned for now.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jan Kiszka <jan.kiszka@web.de>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
2009-11-08 16:20:47 +01:00
Frederic Weisbecker
24f1e32c60 hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events
This patch rebase the implementation of the breakpoints API on top of
perf events instances.

Each breakpoints are now perf events that handle the
register scheduling, thread/cpu attachment, etc..

The new layering is now made as follows:

       ptrace       kgdb      ftrace   perf syscall
          \          |          /         /
           \         |         /         /
                                        /
            Core breakpoint API        /
                                      /
                     |               /
                     |              /

              Breakpoints perf events

                     |
                     |

               Breakpoints PMU ---- Debug Register constraints handling
                                    (Part of core breakpoint API)
                     |
                     |

             Hardware debug registers

Reasons of this rewrite:

- Use the centralized/optimized pmu registers scheduling,
  implying an easier arch integration
- More powerful register handling: perf attributes (pinned/flexible
  events, exclusive/non-exclusive, tunable period, etc...)

Impact:

- New perf ABI: the hardware breakpoints counters
- Ptrace breakpoints setting remains tricky and still needs some per
  thread breakpoints references.

Todo (in the order):

- Support breakpoints perf counter events for perf tools (ie: implement
  perf_bpcounter_event())
- Support from perf tools

Changes in v2:

- Follow the perf "event " rename
- The ptrace regression have been fixed (ptrace breakpoint perf events
  weren't released when a task ended)
- Drop the struct hw_breakpoint and store generic fields in
  perf_event_attr.
- Separate core and arch specific headers, drop
  asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
- Use new generic len/type for breakpoint
- Handle off case: when breakpoints api is not supported by an arch

Changes in v3:

- Fix broken CONFIG_KVM, we need to propagate the breakpoint api
  changes to kvm when we exit the guest and restore the bp registers
  to the host.

Changes in v4:

- Drop the hw_breakpoint_restore() stub as it is only used by KVM
- EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
  module
- Restore the breakpoints unconditionally on kvm guest exit:
  TIF_DEBUG_THREAD doesn't anymore cover every cases of running
  breakpoints and vcpu->arch.switch_db_regs might not always be
  set when the guest used debug registers.
  (Waiting for a reliable optimization)

Changes in v5:

- Split-up the asm-generic/hw-breakpoint.h moving to
  linux/hw_breakpoint.h into a separate patch
- Optimize the breakpoints restoring while switching from kvm guest
  to host. We only want to restore the state if we have active
  breakpoints to the host, otherwise we don't care about messed-up
  address registers.
- Add asm/hw_breakpoint.h to Kbuild
- Fix bad breakpoint type in trace_selftest.c

Changes in v6:

- Fix wrong header inclusion in trace.h (triggered a build
  error with CONFIG_FTRACE_SELFTEST

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jan Kiszka <jan.kiszka@web.de>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
2009-11-08 15:34:42 +01:00
Tejun Heo
2ae8bb75db x86: Fix iommu=nodac parameter handling
iommu=nodac should forbid dac instead of enabling it. Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Matteo Frigo <athena@fftw.org>
Cc: <stable@kernel.org> # .32.x and older
LKML-Reference: <4AE5B52A.4050408@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 13:19:05 +01:00
Lai Jiangshan
d8c80ce091 sched, no_hz: Remove unused rq->last_tick_seen field
In 15934a3732,
field last_tick_seen is added to struct rq.
But it is unused now.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Guillaume Chazarain <guichaz@yahoo.fr>
LKML-Reference: <4AE6A513.6010100@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 13:17:47 +01:00
Cyrill Gorcunov
e9036b36ee sched: Use root_task_group_empty only with FAIR_GROUP_SCHED
root_task_group_empty is used only with FAIR_GROUP_SCHED
so if we use other scheduler options we get:

  kernel/sched.c:314: warning: 'root_task_group_empty' defined but not used

So move CONFIG_FAIR_GROUP_SCHED up that it covers
root_task_group_empty().

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091026192414.GB5321@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 13:15:48 +01:00
Cyrill Gorcunov
c82a43d40b irq: Do not attempt to create subdirectories if /proc/irq/<irq> failed
If a parent directory (ie /proc/irq/<irq>) could not be created
we should not attempt to create subdirectories. Otherwise it
would lead that "smp_affinity" and "spurious" entries are may be
registered under /proc root instead of a proper place.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20091026202811.GD5321@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 13:14:22 +01:00
FUJITA Tomonori
338bac527e x86: Use x86_platform for iommu_shutdown
This patch cleans up pci_iommu_shutdown() a bit to use
x86_platform (similar to how IA64 initializes an IOMMU driver).

This adds iommu_shutdown() to x86_platform to avoid calling
every IOMMUs' shutdown functions in pci_iommu_shutdown() in
order. The IOMMU shutdown functions are platform specific (we
don't have multiple different IOMMU hardware) so the current way
is pointless.

An IOMMU driver sets x86_platform.iommu_shutdown to the shutdown
function if necessary.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: joerg.roedel@amd.com
LKML-Reference: <20091027163358F.fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 13:12:26 +01:00
Nicolas Pitre
158bc5af3d ARM: 5784/1: fix early boot machine ID mismatch error display
That code was refactored a long time ago, but one particular label
didn't get adjusted properly which broke the listing of supported
machines.

Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-11-08 11:58:54 +00:00
Xiaotian Feng
de2a47cf2b x86: Fix error return sequence in __ioremap_caller()
kernel missed to free memtype if get_vm_area_caller failed in
__ioremap_caller.

This patch introduces error path to fix this and cleans up the
repetitive error return sequences that contributed to the
creation of the bug.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1257389031-20429-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 12:48:58 +01:00
Rusty Russell
0d0fbbddcc x86, msr, cpumask: Use struct cpumask rather than the deprecated cpumask_t
This makes the declarations match the definitions, which already
use 'struct cpumask'.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <200911052245.41803.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 11:58:38 +01:00
Masami Hiramatsu
c12a229bc5 x86: Remove unused thread_return label from switch_to()
Remove unused thread_return label from switch_to() macro on
x86-64. Since this symbol cuts into schedule(), backtrace at the
latter half of schedule() was always shown as thread_return().

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
LKML-Reference: <20091105160359.5181.26225.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 11:57:13 +01:00
Randy Dunlap
968c86458a sched: Fix kernel-doc function parameter name
Fix variable name in sched.c kernel-doc notation.

Fixes this DocBook warning:

 Warning(kernel/sched.c:2008): No description found for parameter
 'p' Warning(kernel/sched.c:2008): Excess function parameter 'k'
 description in 'kthread_bind'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <4AF4B1BC.8020604@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 11:26:25 +01:00
Ryusuke Konishi
c083234f15 nilfs2: fix missing cleanup of gc cache on error cases
This fixes an -rc1 regression brought by the commit:
1cf58fa840 ("nilfs2: shorten freeze
period due to GC in write operation v3").

Although the patch moved out a function call of
nilfs_ioctl_move_blocks() to nilfs_ioctl_clean_segments() from
nilfs_ioctl_prepare_clean_segments(), it didn't move corresponding
cleanup job needed for the error case.

This will move the missing cleanup job to the destination function.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Jiro SEKIBA <jir@unicus.jp>
2009-11-08 19:04:25 +09:00
Ryusuke Konishi
5399dd1fc8 nilfs2: fix kernel oops in error case of nilfs_ioctl_move_blocks
This fixes a kernel oops reported by Markus Trippelsdorf in the email
titled "[NILFS users] kernel Oops while running nilfs_cleanerd".

The oops was caused by a bug of error path in
nilfs_ioctl_move_blocks() function, which was inlined in
nilfs_ioctl_clean_segments().

nilfs_ioctl_move_blocks checks duplication of blocks which will be
moved in garbage collection.  But, the check should have be done
within nilfs_ioctl_move_inode_block() to prevent list corruption among
buffers storing the target blocks.

To fix the kernel oops, this moves forward the duplication check
before the list insertion.

I also tested this for stable trees [2.6.30, 2.6.31].

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable <stable@kernel.org>
2009-11-08 19:01:35 +09:00
Arnaldo Carvalho de Melo
8d06367fa7 perf symbols: Use the buildids if present
With this change 'perf record' will intercept PERF_RECORD_MMAP
calls, creating a linked list of DSOs, then when the session
finishes, it will traverse this list and read the buildids,
stashing them at the end of the file and will set up a new
feature bit in the header bitmask.

'perf report' will then notice this feature and populate the
'dsos' list and set the build ids.

When reading the symtabs it will refuse to load from a file that
doesn't have the same build id. This improves the
reliability of the profiler output, as symbols and profiling
data is more guaranteed to match.

Example:

 [root@doppio ~]# perf report | head
 /home/acme/bin/perf with build id b1ea544ac3746e7538972548a09aadecc5753868 not found, continuing without symbols
  # Samples: 2621434559
  #
  # Overhead          Command                  Shared Object  Symbol
  # ........  ...............  .............................  ......
  #
       7.91%             init  [kernel]        [k] read_hpet
       7.64%             init  [kernel]        [k] mwait_idle_with_hints
       7.60%          swapper  [kernel]        [k] read_hpet
       7.60%          swapper  [kernel]        [k] mwait_idle_with_hints
       3.65%             init  [kernel]        [k] 0xffffffffa02339d9
[root@doppio ~]#

In this case the 'perf' binary was an older one, vanished,
so its symbols probably wouldn't match or would cause subtly
different (and misleading) output.

Next patches will support the kernel as well, reading the build
id notes for it and the modules from /sys.

Another patch should also introduce a new plumbing command:

'perf list-buildids'

that will then be used in porcelain that is distro specific to
fetch -debuginfo packages where such buildids are present. This
will in turn allow for one to run 'perf record' in one machine
and 'perf report' in another.

Future work on having the buildid sent directly from the kernel
in the PERF_RECORD_MMAP event is needed to close races, as the
DSO can be changed during a 'perf record' session, but this
patch at least helps with non-corner cases and current/older
kernels.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1257367843-26224-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 10:44:36 +01:00
Frederic Weisbecker
444a2a3bcd tracing, perf_events: Protect the buffer from recursion in perf
While tracing using events with perf, if one enables the
lockdep:lock_acquire event, it will infect every other perf
trace events.

Basically, you can enable whatever set of trace events through
perf but if this event is part of the set, the only result we
can get is a long list of lock_acquire events of rcu read lock,
and only that.

This is because of a recursion inside perf.

1) When a trace event is triggered, it will fill a per cpu
   buffer and submit it to perf.

2) Perf will commit this event but will also protect some data
   using rcu_read_lock

3) A recursion appears: rcu_read_lock triggers a lock_acquire
   event that will fill the per cpu event and then submit the
   buffer to perf.

4) Perf detects a recursion and ignores it

5) Perf continues its work on the previous event, but its buffer
   has been overwritten by the lock_acquire event, it has then
   been turned into a lock_acquire event of rcu read lock

Such scenario also happens with lock_release with
rcu_read_unlock().

We could turn the rcu_read_lock() into __rcu_read_lock() to drop
the lock debugging from perf fast path, but that would make us
lose the rcu debugging and that doesn't prevent from other
possible kind of recursion from perf in the future.

This patch adds a recursion protection based on a counter on the
perf trace per cpu buffers to solve the problem.

-v2: Fixed lost whitespace, added reviewed-by tag

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Jason Baron <jbaron@redhat.com>
LKML-Reference: <1257477185-7838-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 10:31:42 +01:00
Hitoshi Mitake
bfde82ef51 perf bench: Add subcommand 'bench' to the Makefile
This patch modifies Makefile for new files related to 'bench'
subcommand. The new code is active from this point on.

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: fweisbec@gmail.com
Cc: Jiri Kosina <jkosina@suse.cz>
LKML-Reference: <1257381097-4743-8-git-send-email-mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 10:19:20 +01:00
Hitoshi Mitake
dcba8848d3 perf bench: Add new subcommand 'bench' to perf.c
This patch modifies perf.c for invoking 'bench' subcommand.

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: fweisbec@gmail.com
Cc: Jiri Kosina <jkosina@suse.cz>
LKML-Reference: <1257381097-4743-7-git-send-email-mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 10:19:19 +01:00
Hitoshi Mitake
11bd341c04 perf bench: Modify builtin.h for new prototype
This patch modifies builtin.h to add prototype of cmd_bench().

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: fweisbec@gmail.com
Cc: Jiri Kosina <jkosina@suse.cz>
LKML-Reference: <1257381097-4743-6-git-send-email-mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 10:19:19 +01:00
Hitoshi Mitake
629cc35665 perf bench: Add builtin-bench.c: General framework for benchmark suites
This patch adds builtin-bench.c
builtin-bench.c is a general framework for benchmark suites.

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: fweisbec@gmail.com
Cc: Jiri Kosina <jkosina@suse.cz>
LKML-Reference: <1257381097-4743-5-git-send-email-mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 10:19:18 +01:00
Hitoshi Mitake
c7d9300f36 perf bench: Add sched-pipe.c: Benchmark for pipe() system call
This patch adds bench/sched-pipe.c.

bench/sched-pipe.c is a benchmark program
to measure performance of pipe() system call.
This benchmark is based on pipe-test-1m.c by Ingo Molnar:

   http://people.redhat.com/mingo/cfs-scheduler/tools/pipe-test-1m.c

Example of use:

% perf bench sched pipe
  (executing 1000000 pipe operations between two tasks)

          Total time:4.499 sec
                  4.499179 usecs/op
                  222262 ops/sec

% perf bench sched pipe -s -l 1000
0.015

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: fweisbec@gmail.com
Cc: Jiri Kosina <jkosina@suse.cz>
LKML-Reference: <1257381097-4743-4-git-send-email-mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 10:19:18 +01:00
Hitoshi Mitake
e27454cc63 perf bench: Add sched-messaging.c: Benchmark for scheduler and IPC mechanisms based on hackbench
This patch adds bench/sched-messaging.c.

This benchmark measures performance of scheduler and IPC
mechanisms, and is based on hackbench by Rusty Russell.

Example of usage:

  % perf bench sched messaging -g 20 -l 1000 -s
  5.432  	  	       	    	    	     # in sec

  % perf bench sched messaging                 # run with default
  options (20 sender and receiver processes per group)
  (10 groups == 400 processes run)

        Total time:0.308 sec

  % perf bench sched messaging -t -g 20	     # # be multi-thread,
  with 20 groups (20 sender and receiver threads per group)
  (20 groups == 800 threads run)

        Total time:0.582 sec

( Rusty is the original author of hackbench.c and he said the code is
  and was under the GPLv2 so fine to be merged. )

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: fweisbec@gmail.com
Cc: Jiri Kosina <jkosina@suse.cz>
LKML-Reference: <1257381097-4743-3-git-send-email-mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 10:19:17 +01:00
Hitoshi Mitake
c426bba069 perf bench: Add new directory and header for new subcommand 'bench'
This patch adds bench/ directory and bench/bench.h.

bench/ directory will contain modules for bench subcommand.
bench/bench.h is for listing prototypes of module functions.

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: fweisbec@gmail.com
Cc: Jiri Kosina <jkosina@suse.cz>
LKML-Reference: <1257381097-4743-2-git-send-email-mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 10:19:15 +01:00
Sebastian Siewior
2606289779 net/fsl_pq_mdio: add module license GPL
or it will taint the kernel and fail to load becuase
of_address_to_resource() is GPL only.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 00:49:04 -08:00
Wolfgang Grandegger
53a0ef866d can: fix WARN_ON dump in net/core/rtnetlink.c:rtmsg_ifinfo()
On older kernels, e.g. 2.6.27, a WARN_ON dump in rtmsg_ifinfo()
is thrown when the CAN device is registered due to insufficient
skb space, as reported by various users. This patch adds the
rtnl_link_ops "get_size" to fix the problem. I think this patch
is required for more recent kernels as well, even if no WARN_ON
dumps are triggered. Maybe we also need "get_xstats_size" for
the CAN xstats.

Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 00:45:48 -08:00