* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (23 commits)
bridge: Fix LRO crash with tun
IPv6: fix to set device name when new IPv6 over IPv6 tunnel device is created.
gianfar: Fix boot hangs while bringing up gianfar ethernet
netfilter: xt_sctp: sctp chunk mapping doesn't work
netfilter: ctnetlink: fix echo if not subscribed to any multicast group
netfilter: ctnetlink: allow changing NAT sequence adjustment in creation
netfilter: nf_conntrack_ipv6: don't track ICMPv6 negotiation message
netfilter: fix tuple inversion for Node information request
netxen: fix msi-x interrupt handling
de2104x: force correct order when writing to rx ring
tun: Fix unicast filter overflow
drivers/isdn: introduce missing kfree
drivers/atm: introduce missing kfree
sunhme: Don't match PCI devices in SBUS probe.
9p: fix endian issues [attempt 3]
net_dma: call dmaengine_get only if NET_DMA enabled
3c509: Fix resume from hibernation for PnP mode.
sungem: Soft lockup in sungem on Netra AC200 when switching interface up
RxRPC: Fix a potential NULL dereference
r8169: Don't update statistics counters when interface is down
...
When overcommit is disabled, the core VM accounts for pages used by anonymous
shared, private mappings and special mappings. It keeps track of VMAs that
should be accounted for with VM_ACCOUNT and VMAs that never had a reserve
with VM_NORESERVE.
Overcommit for hugetlbfs is much riskier than overcommit for base pages
due to contiguity requirements. It avoids overcommiting on both shared and
private mappings using reservation counters that are checked and updated
during mmap(). This ensures (within limits) that hugepages exist in the
future when faults occurs or it is too easy to applications to be SIGKILLed.
As hugetlbfs makes its own reservations of a different unit to the base page
size, VM_ACCOUNT should never be set. Even if the units were correct, we would
double account for the usage in the core VM and hugetlbfs. VM_NORESERVE may
be set because an application can request no reserves be made for hugetlbfs
at the risk of getting killed later.
With commit fc8744adc8, VM_NORESERVE and
VM_ACCOUNT are getting unconditionally set for hugetlbfs-backed mappings. This
breaks the accounting for both the core VM and hugetlbfs, can trigger an
OOM storm when hugepage pools are too small lockups and corrupted counters
otherwise are used. This patch brings hugetlbfs more in line with how the
core VM treats VM_NORESERVE but prevents VM_ACCOUNT being set.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: fix to prevent a kernel crash on fault
If for some reason the pointer to the parent function on the
stack takes a fault, the fix up code will not return back to
the original faulting code. This can lead to unpredictable
results and perhaps even a kernel panic.
A fault should not happen, but if it does, we should simply
disable the tracer, warn, and continue running the kernel.
It should not lead to a kernel crash.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch is to make the function return early on failure, and give
correct return value on success.
Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
The C99 specification states in section 6.11.5:
The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
The constraint used for retrieving and restoring the parent function
pointer is incorrect. The parent variable is a pointer, and the
address of the pointer is modified by the asm statement and not
the pointer itself. It is incorrect to pass it in as an output
constraint since the asm will never update the pointer.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Bits 31-8 are marked as reserved and should be ignored while
interpreting a region's code.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The clearing of a vha's req_ques were overrunning during vport
creation. During deletion, vport queues should be torn-down
after all cleanup has occurred.
Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
To ensure smooth operations amongst the FCoE and NIC side
components of the ISP81xx chip, the FCoE driver (qla2xxx) must
ensure the 10gb NIC driver (qlge) does not timeout waiting for
IDC (Inter-Driver Communication) acknowledgments. The
acknowledgment requirements are trivial -- a simple mirroring of
incoming mailbox registers during the AEN to a process-context
capable mailbox command.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Correct response-queue-0 processing by instructing the firmware
to run with interrupt-handshaking disabled, similarly to what is
now done for all non-0 response queues. Since all
response-queues now run in the same mode, the driver no longer
needs the hot-path 'is-disabled-HCCR' test.
Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Error handling code following a kmalloc should free the allocated data.
The semantic match that finds the problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r exists@
local idexpression x;
statement S;
expression E;
identifier f,l;
position p1,p2;
expression *ptr != NULL;
@@
(
if ((x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...)) == NULL) S
|
x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
...
if (x == NULL) S
)
<... when != x
when != if (...) { <+...x...+> }
x->f = E
...>
(
return \(0\|<+...x...+>\|ptr\);
|
return@p2 ...;
)
@script:python@
p1 << r.p1;
p2 << r.p2;
@@
print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Yanling Qi from LSI found the root cause of the panic, below is his
analysis:
Problem description: the open iscsi driver installs eh_timed_out handler
to the
blank_transport_template of the scsi middle level that causes panic of
timed
out command of other host
Here are the details
Iscsi Session creation
During iscsi session creation time, the iscsi_tcp_session_create() of
iscsi_tpc.c will create a scsi-host for the session. See the statement
marked
with the label A. The statement B replaces the shost->transportt point
with a
local struct variable.
static struct iscsi_cls_session *
iscsi_tcp_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max,
uint16_t qdepth, uint32_t initial_cmdsn,
uint32_t *hostno)
{
struct iscsi_cls_session *cls_session;
struct iscsi_session *session;
struct Scsi_Host *shost;
int cmd_i;
if (ep) {
printk(KERN_ERR "iscsi_tcp: invalid ep %p.\n", ep);
return NULL;
}
A shost = iscsi_host_alloc(&iscsi_sht, 0, qdepth);
if (!shost)
return NULL;
B shost->transportt = iscsi_tcp_scsi_transport;
shost->max_lun = iscsi_max_lun;
Please note the scsi host is allocated by invoking isccsi_host_alloc()
in
libiscsi.c
Polluting the middle level blank_transport_template in
iscsi_host_alloc() of
libiscsi.c
The iscsi_host_alloc() invokes the middle level function
scsi_host_alloc() in
hosts.c for allocating a scsi_host. Then the statement marked with C
assigns
the iscsi_eh_cmd_timed_out handler to the eh_timed_out callback
function.
struct Scsi_Host *iscsi_host_alloc(struct scsi_host_template *sht,
int dd_data_size, uint16_t qdepth)
{
struct Scsi_Host *shost;
struct iscsi_host *ihost;
shost = scsi_host_alloc(sht, sizeof(struct iscsi_host) +
dd_data_size);
if (!shost)
return NULL;
C shost->transportt->eh_timed_out = iscsi_eh_cmd_timed_out;
Please note the shost->transport is the middle level
blank_transport_template
as shown in the code segment below. We see two problems here. 1.
iscsi_eh_cmd_timed_out is installed to the blank_transport_template that
will
cause some body else problem. 2. iscsi_eh_cmd_timed_out will never be
invoked
when iscsi command gets timeout because the statement B resets the
pointer.
Middle level blank_transport_template
In the middle level function scsi_host_alloc() of hosts.c, the middle
level
assigns a blank_transport_template for those hosts not implementing its
transport layer. All HBAs without supporting a specific scsi_transport
will
share the middle level blank_transport_template. Please see the
statement D
struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int
privsize)
{
struct Scsi_Host *shost;
gfp_t gfp_mask = GFP_KERNEL;
int rval;
if (sht->unchecked_isa_dma && privsize)
gfp_mask |= __GFP_DMA;
shost = kzalloc(sizeof(struct Scsi_Host) + privsize, gfp_mask);
if (!shost)
return NULL;
shost->host_lock = &shost->default_lock;
spin_lock_init(shost->host_lock);
shost->shost_state = SHOST_CREATED;
INIT_LIST_HEAD(&shost->__devices);
INIT_LIST_HEAD(&shost->__targets);
INIT_LIST_HEAD(&shost->eh_cmd_q);
INIT_LIST_HEAD(&shost->starved_list);
init_waitqueue_head(&shost->host_wait);
mutex_init(&shost->scan_mutex);
shost->host_no = scsi_host_next_hn++; /* XXX(hch): still racy */
shost->dma_channel = 0xff;
/* These three are default values which can be overridden */
shost->max_channel = 0;
shost->max_id = 8;
shost->max_lun = 8;
/* Give each shost a default transportt */
D shost->transportt = &blank_transport_template;
Why we see panic at iscsi_eh_cmd_timed_out()
The mpp virtual HBA doesn’t have a specific scsi_transport. Therefore,
the
blank_transport_template will be assigned to the virtual host of the MPP
virtual HBA by SCSI middle level. Please note that the statement C has
assigned
iscsi-transport eh_timedout handler to the blank_transport_template.
When a mpp
virtual command gets timedout, the iscsi_eh_cmd_timed_out() will be
invoked to
handle mpp virtual command timeout from the middle level
scsi_times_out()
function of the scsi_error.c.
enum blk_eh_timer_return scsi_times_out(struct request *req)
{
struct scsi_cmnd *scmd = req->special;
enum blk_eh_timer_return (*eh_timed_out)(struct scsi_cmnd *);
enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;
scsi_log_completion(scmd, TIMEOUT_ERROR);
if (scmd->device->host->transportt->eh_timed_out)
E eh_timed_out =
scmd->device->host->transportt->eh_timed_out;
else if (scmd->device->host->hostt->eh_timed_out)
eh_timed_out = scmd->device->host->hostt->eh_timed_out;
else
eh_timed_out = NULL;
if (eh_timed_out) {
rtn = eh_timed_out(scmd);
It is very easy to understand why we get panic in the
iscsi_eh_cmd_timed_out().
A scsi_cmnd from a no-iscsi device definitely can not resolve out a
session and
session->lock. The panic can be happed anywhere during the differencing.
static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd
*scmd)
{
struct iscsi_cls_session *cls_session;
struct iscsi_session *session;
struct iscsi_conn *conn;
enum blk_eh_timer_return rc = BLK_EH_NOT_HANDLED;
cls_session = starget_to_session(scsi_target(scmd->device));
session = cls_session->dd_data;
debug_scsi("scsi cmd %p timedout\n", scmd);
spin_lock(&session->lock);
This patch fixes the problem by moving the setting of the
iscsi_eh_cmd_timed_out to iscsi_add_host, which is after the LLDs
have set their transport template to shost->transportt.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
During cancel testing it has been shown that 15 seconds is not
nearly long enough for the VIOS to respond to a cancel under
loaded situations. Increasing this timeout to 60 seconds allows
time for the VIOS to cancel the outstanding commands and prevents
us from escalating to a full host reset, which can take much longer.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The ibmvfc driver has a bug in its SCN handling. If it receives
an ELS event such asn an N-Port SCN event or an unsolicited PLOGI,
or any other SCN event which causes ibmvfc_reinit_host to be called,
it is possible that we will call fc_remote_port_add for a target
that already has an rport added, which can result in duplicate
rports getting created for the same targets. Fix this by calling
fc_remote_port_rolechg in this scenario instead to report any possible
role change that may have occurred.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Currently the ibmvfc driver sets the IBMVFC_CLASS_3_ERR flag
in the VFC Frame if both the adapter and the device claim support
for Class 3. However, this bit actually refers to Class 3 Error
Recovery, which is currently not supported by the VIOS. Setting this
bit can cause lots of command timeout responses from the VIOS resulting
in general instability. Fix this by never setting this bit.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Hi,
we have run into an issue with blktrace being started for sg devices.
Please apply.
Thanks,
Martin
From: Martin Peschke <mpeschke@linux.vnet.ibm.com>
The device number denoting a generic SCSI devices (sg) in a blktrace
trace is broken; major and minor are always 0. It looks like
sdp->device->sdev_gendev.devt is not initialized properly.
The fix below uses other data to make up a valid device number,
similar to the way an sg device number is generated for sysfs output.
Reported-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
We were running i/o and performing a bunch of hba resets in a loop.
This forces a lot of target removes and then rescans. Since the
resets are occuring during scan it's causing the scan i/o to timeout,
invoking error recovery, etc. We end up getting some nasty crashing
in scsi_scan.c due to references to old sdevs that are failing
but had some lingering references that kept them around.
Fix by setting device state to SDEV_DEL if the LLD's slave_alloc
fails.
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The ibmvscsi client driver is not unmapping the SCSI command after
encountering a DMA mapping error while trying to map an indirect
scattergather list for the event pool. This leads to a leak of DMA
entitlement that could result in the device failing future DMA operations
in a CMO environment.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The rec_len field in the directory entry is 16 bits, so there was a
problem representing rec_len for filesystems with a 64k block size in
the case where the directory entry takes the entire 64k block.
Unfortunately, there were two schemes that were proposed; one where
all zeros meant 65536 and one where all ones (65535) meant 65536.
E2fsprogs used 0, whereas the kernel used 65535. Oops. Fortunately
this case happens extremely rarely, with the most common case being
the lost+found directory, created by mke2fs.
So we will be liberal in what we accept, and accept both encodings,
but we will continue to encode 65536 as 65535. This will require a
change in e2fsprogs, but with fortunately ext4 filesystems normally
have the dir_index feature enabled, which precludes having a
completely empty directory block.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If we race with commit code setting i_transaction to NULL, we could
possibly dereference it. Proper locking requires the journal pointer
(to access journal->j_list_lock), which we don't have. So we have to
change the prototype of the function so that filesystem passes us the
journal pointer. Also add a more detailed comment about why the
function jbd2_journal_begin_ordered_truncate() does what it does and
how it should be used.
Thanks to Dan Carpenter <error27@gmail.com> for pointing to the
suspitious code.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Joel Becker <joel.becker@oracle.com>
CC: linux-ext4@vger.kernel.org
CC: ocfs2-devel@oss.oracle.com
CC: mfasheh@suse.de
CC: Dan Carpenter <error27@gmail.com>
Impact: change API and init bpage when copy
ring_buffer_read_page()/rb_remove_entries() may be called for
a partially consumed page.
Add a parameter for rb_remove_entries() and make it update
cpu_buffer->entries correctly for partially consumed pages.
ring_buffer_read_page() now returns the offset to the next event.
Init the bpage's time_stamp when return value is 0.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: Fix bug
I found several very very curious line.
It's so curious that it may be brought by typing mistake.
When (cpu_buffer->reader_page == cpu_buffer->commit_page):
1) We haven't copied it for bpage is changed:
bpage = cpu_buffer->reader_page->page;
memcpy(bpage->data, cpu_buffer->reader_page->page->data + read ... )
2) We need update cpu_buffer->reader_page->read, but
"cpu_buffer->reader_page += read;" is not right.
[
This bug was a typo. The commit->reader_page is a page pointer
and not an index into the page. The line should have been
commit->reader_page->read += read. The other changes
by Lai are nice clean ups to the code. - SDR
]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
In i8237A_resume(), when resetting the DMA controller, the parameters to
dma_outb() were mixed up.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
[ cleaned up the file a tiny bit. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This undoes commit 14ce0cb411.
Since jbd2_journal_start_commit() is now fixed to return 1 when we
started a transaction commit, there's some transaction waiting to be
committed or there's a transaction already committing, we don't
need to call ext4_force_commit() in ext4_sync_fs(). Furthermore
ext4_force_commit() can unnecessarily create sync transaction which is
expensive so it's worthwhile to remove it when we can.
http://bugzilla.kernel.org/show_bug.cgi?id=12224
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: linux-ext4@vger.kernel.org
The function jbd2_journal_start_commit() returns 1 if either a
transaction is committing or the function has queued a transaction
commit. But it returns 0 if we raced with somebody queueing the
transaction commit as well. This resulted in ext4_sync_fs() not
functioning correctly (description from Arthur Jones):
In the case of a data=ordered umount with pending long symlinks
which are delayed due to a long list of other I/O on the backing
block device, this causes the buffer associated with the long
symlinks to not be moved to the inode dirty list in the second
phase of fsync_super. Then, before they can be dirtied again,
kjournald exits, seeing the UMOUNT flag and the dirty pages are
never written to the backing block device, causing long symlink
corruption and exposing new or previously freed block data to
userspace.
This can be reproduced with a script created by Eric Sandeen
<sandeen@redhat.com>:
#!/bin/bash
umount /mnt/test2
mount /dev/sdb4 /mnt/test2
rm -f /mnt/test2/*
dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512
touch /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
ln -s /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
/mnt/test2/link
umount /mnt/test2
mount /dev/sdb4 /mnt/test2
ls /mnt/test2/
This patch fixes jbd2_journal_start_commit() to always return 1 when
there's a transaction committing or queued for commit.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
CC: Eric Sandeen <sandeen@redhat.com>
CC: linux-ext4@vger.kernel.org
With a postfix decrement the timeout will reach -1 rather than 0,
so the warning will not be issued.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The C99 specification states in section 6.11.5:
The placement of a storage-class specifier other than at the beginning of the
declaration specifiers in a declaration is an obsolescent feature.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
arch/powerpc/platforms/pseries/hotplug-memory.c uses
remove_section_mapping() but doesn't include sparsemem.h which defines
it. This can cause compilation fails for some configs.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The new legacy_mem file in sysfs is causing problems with X on machines
that don't support legacy memory access. The way I initially implemented
it, we would fail with -ENXIO when trying to mmap it, thus exposing to
X that we do support the API but there is no legacy memory.
Unfortunately, X poor error handling is causing it to fail to start when
it gets this error.
This implements a workaround hack that instead maps anonymous memory
instead (using shmem if VM_SHARED is set, just like /dev/zero does).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
arch/powerpc/oprofile/cell/spu_profiler.c is missing a asm/time.h
include which is required for ppc_proc_freq. This can cause compile
failures for some config combinations.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: fix dynamic ftrace with large modules in PPC64
The math to calculate the offset into the TOC that is taken from reading
the trampoline is incorrect. The bottom half of the offset is a signed
extended short. The current code was using an OR to create the offset
when it should have been using an addition.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently emulate_step() emulates mr. instructions without updating cr0
and this can be disastrous. Don't emulate mr.
This bug has been around for a while, but I am not sure if its a worthy
-stable candidate. I'll leave it to Ben do decide.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fixed v_mapped_by_tlbcam() and p_mapped_by_tlbcam() to use phys_addr_t
instead of unsigned long. In 36-bit physical mode we really need these
functions to deal with phys_addr_t when trying to match a physical
address or when returning one.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Impact: fix broken /proc/profile on UP machines
Commit c309b917ca "cpumask: convert
kernel/profile.c" broke profiling. prof_cpu_mask was previously
initialized to CPU_MASK_ALL, but left uninitialized in that commit.
We need to copy cpu_possible_mask (cpu_online_mask is not enough).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: stack protector for x86_32
Implement stack protector for x86_32. GDT entry 28 is used for it.
It's set to point to stack_canary-20 and have the length of 24 bytes.
CONFIG_CC_STACKPROTECTOR turns off CONFIG_X86_32_LAZY_GS and sets %gs
to the stack canary segment on entry. As %gs is otherwise unused by
the kernel, the canary can be anywhere. It's defined as a percpu
variable.
x86_32 exception handlers take register frame on stack directly as
struct pt_regs. With -fstack-protector turned on, gcc copies the
whole structure after the stack canary and (of course) doesn't copy
back on return thus losing all changed. For now, -fno-stack-protector
is added to all files which contain those functions. We definitely
need something better.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: pt_regs changed, lazy gs handling made optional, add slight
overhead to SAVE_ALL, simplifies error_code path a bit
On x86_32, %gs hasn't been used by kernel and handled lazily. pt_regs
doesn't have place for it and gs is saved/loaded only when necessary.
In preparation for stack protector support, this patch makes lazy %gs
handling optional by doing the followings.
* Add CONFIG_X86_32_LAZY_GS and place for gs in pt_regs.
* Save and restore %gs along with other registers in entry_32.S unless
LAZY_GS. Note that this unfortunately adds "pushl $0" on SAVE_ALL
even when LAZY_GS. However, it adds no overhead to common exit path
and simplifies entry path with error code.
* Define different user_gs accessors depending on LAZY_GS and add
lazy_save_gs() and lazy_load_gs() which are noop if !LAZY_GS. The
lazy_*_gs() ops are used to save, load and clear %gs lazily.
* Define ELF_CORE_COPY_KERNEL_REGS() which always read %gs directly.
xen and lguest changes need to be verified.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
On x86_32, %gs is handled lazily. It's not saved and restored on
kernel entry/exit but only when necessary which usually is during task
switch but there are few other places. Currently, it's done by
calling savesegment() and loadsegment() explicitly. Define
get_user_gs(), set_user_gs() and task_user_gs() and use them instead.
While at it, clean up register access macros in signal.c.
This cleans up code a bit and will help future changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Use .macro instead of cpp #define where approriate. This cleans up
code and will ease future changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: no default -fno-stack-protector if stackp is enabled, cleanup
Stackprotector make rules had the following problems.
* cc support test and warning are scattered across makefile and
kernel/panic.c.
* -fno-stack-protector was always added regardless of configuration.
Update such that cc support test and warning are contained in makefile
and -fno-stack-protector is added iff stackp is turned off. While at
it, prepare for 32bit support.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: misc udpate
* wrap content with CONFIG_CC_STACK_PROTECTOR so that other arch files
can include it directly
* add missing includes
This will help future changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>