Original 2.6.9 patch and explanation from somewhere within HP via
bugzilla...
ia64 stores a success/failure code in r10, and the return value (normal
return, or *positive* errno) in r8. The patch also sets the exit code to
negative errno if it's a failure result for consistency with other
architectures.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This patch fixes a couple of bugs revealed in new features recently
added to -mm1:
* fixes warnings due to inconsistent use of const struct inode *inode
* fixes bug that prevent a kernel from booting with audit on, and SELinux off
due to a missing function in security/dummy.c
* fixes a bug that throws spurious audit_panic() messages due to a missing
return just before an error_path label
* some reasonable house cleaning in audit_ipc_context(),
audit_inode_context(), and audit_log_task_context()
Signed-off-by: Dustin Kirkland <dustin.kirkland@us.ibm.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This patch extends existing audit records with subject/object context
information. Audit records associated with filesystem inodes, ipc, and
tasks now contain SELinux label information in the field "subj" if the
item is performing the action, or in "obj" if the item is the receiver
of an action.
These labels are collected via hooks in SELinux and appended to the
appropriate record in the audit code.
This additional information is required for Common Criteria Labeled
Security Protection Profile (LSPP).
[AV: fixed kmalloc flags use]
[folded leak fixes]
[folded cleanup from akpm (kfree(NULL)]
[folded audit_inode_context() leak fix]
[folded akpm's fix for audit_ipc_perm() definition in case of !CONFIG_AUDIT]
Signed-off-by: Dustin Kirkland <dustin.kirkland@us.ibm.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- Add a new, 5th filter called "exclude".
- And add a new field AUDIT_MSGTYPE.
- Define a new function audit_filter_exclude() that takes a message type
as input and examines all rules in the filter. It returns '1' if the
message is to be excluded, and '0' otherwise.
- Call the audit_filter_exclude() function near the top of
audit_log_start() just after asserting audit_initialized. If the
message type is not to be audited, return NULL very early, before
doing a lot of work.
[combined with followup fix for bug in original patch, Nov 4, same author]
[combined with later renaming AUDIT_FILTER_EXCLUDE->AUDIT_FILTER_TYPE
and audit_filter_exclude() -> audit_filter_type()]
Signed-off-by: Dustin Kirkland <dustin.kirkland@us.ibm.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch augments the collection of inode info during syscall
processing. It represents part of the functionality that was provided
by the auditfs patch included in RHEL4.
Specifically, it:
- Collects information for target inodes created or removed during
syscalls. Previous code only collects information for the target
inode's parent.
- Adds the audit_inode() hook to syscalls that operate on a file
descriptor (e.g. fchown), enabling audit to do inode filtering for
these calls.
- Modifies filtering code to check audit context for either an inode #
or a parent inode # matching a given rule.
- Modifies logging to provide inode # for both parent and child.
- Protect debug info from NULL audit_names.name.
[AV: folded a later typo fix from the same author]
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The audit hooks (to be added shortly) will want to see dentry->d_inode
too, not just the name.
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
The attached patch updates various items for the new user space
messages. Please apply.
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Currently, audit only supports the "=" and "!=" operators in the -F
filter rules.
This patch reworks the support for "=" and "!=", and adds support
for ">", ">=", "<", and "<=".
This turned out to be a pretty clean, and simply process. I ended up
using the high order bits of the "field", as suggested by Steve and Amy.
This allowed for no changes whatsoever to the netlink communications.
See the documentation within the patch in the include/linux/audit.h
area, where there is a table that explains the reasoning of the bitmask
assignments clearly.
The patch adds a new function, audit_comparator(left, op, right).
This function will perform the specified comparison (op, which defaults
to "==" for backward compatibility) between two values (left and right).
If the negate bit is on, it will negate whatever that result was. This
value is returned.
Signed-off-by: Dustin Kirkland <dustin.kirkland@us.ibm.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
- add kerneldoc for non-static functions;
- don't init static data to 0;
- limit lines to < 80 columns;
- fix long-format style;
- delete whitespace at end of some lines;
(chrisw: resend and update to current audit-2.6 tree)
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
hi,
The motivation behind the patch below was to address messages in
/var/log/messages such as:
Jan 31 10:54:15 mets kernel: audit(:0): major=252 name_count=0: freeing
multiple contexts (1)
Jan 31 10:54:15 mets kernel: audit(:0): major=113 name_count=0: freeing
multiple contexts (2)
I can reproduce by running 'get-edid' from:
http://john.fremlin.de/programs/linux/read-edid/.
These messages come about in the log b/c the vm86 calls do not exit via
the normal system call exit paths and thus do not call
'audit_syscall_exit'. The next system call will then free the context for
itself and for the vm86 context, thus generating the above messages. This
patch addresses the issue by simply adding a call to 'audit_syscall_exit'
from the vm86 code.
Besides fixing the above error messages the patch also now allows vm86
system calls to become auditable. This is useful since strace does not
appear to properly record the return values from sys_vm86.
I think this patch is also a step in the right direction in terms of
cleaning up some core auditing code. If we can correct any other paths
that do not properly call the audit exit and entries points, then we can
also eliminate the notion of context chaining.
I've tested this patch by verifying that the log messages no longer
appear, and that the audit records for sys_vm86 appear to be correct.
Also, 'read_edid' produces itentical output.
thanks,
-Jason
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We need to ensure that all writes to the XDR buffers are done before
req->rq_received is visible to other processors.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Kudos to Neil Brown for spotting the problem:
"in nfs_sync_inode, there is effectively the sequence:
nfs_wait_on_requests
nfs_flush_inode
nfs_commit_inode
This seems a bit racy to me as if the only requests are on the
->commit list, and nfs_commit_inode is called separately after
nfs_wait_on_requests completes, and before nfs_commit_inode start
(say: by nfs_write_inode) then none of these function will return
>0, yet there will be some pending request that aren't waited for."
The solution is to search for requests to wait upon, search for dirty
requests, and search for uncommitted requests while holding the
nfsi->req_lock
The patch also cleans up nfs_sync_inode(), getting rid of the redundant
FLUSH_WAIT flag. It turns out that we were always setting it.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We don't need to set PG_private for readahead pages, since they never get
unlocked while I/O is in progress. However there is a small race in
nfs_readpage_release() whereby the page may be unlocked, and have
PG_private set.
Fix is to have PG_private set only for the case of writes...
Also fix a bug in nfs_clear_page_writeback(): Don't attempt to clear the
radix_tree tag if we've already deleted the radix tree entry.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the callback daemon is signalled, but is unable to exit because it still
has users, then we need to flush signals. If not, then svc_recv() can
never sleep, and so we hang.
If we flush signals, then we also have to be prepared to resend them when
we want the thread to exit.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently it returns NULL, which usually gets interpreted as ENOMEM. In
fact it can mean a host of issues.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The mount statistics patches introduced a call to nfs_free_iostats that is
not only redundant, but actually causes an oops.
Also fix a memory leak due to the lack of a call to nfs_free_iostats on
unmount.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The struct file_lock does not carry a properly initialised lock,
so don't copy it as if it were.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Now that we have aio writes, it is possible for dreq->outstanding to be
zero, but for the I/O not to have completed. Convert struct nfs_direct_req
to use a completion to signal when the I/O is done.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Introduced by NFS aio+dio patches.
Test plan:
Compile kernel with CONFIG_NFS enabled on 64-bit hardware.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Introduced by NFS metrics patch.
Test plan:
Compile kernel with CONFIG_NFS enabled on a 64-bit platform.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The patch "stop abusing file_lock_list introduces a couple of bugs since
the locks may be copied and need to be removed from the lists when they are
destroyed.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently lockd directly access the file_lock_list from fs/locks.c.
It does so to mark locks granted or reclaimable. This is very
suboptimal, because a) lockd needs to poke into locks.c internals, and
b) it needs to iterate over all locks in the system for marking locks
granted or reclaimable.
This patch adds lists for granted and reclaimable locks to the nlm_host
structure instead, and adds locks to those.
nlmclnt_lock:
now adds the lock to h_granted instead of setting the
NFS_LCK_GRANTED, still O(1)
nlmclnt_mark_reclaim:
goes away completely, replaced by a list_splice_init.
Complexity reduced from O(locks in the system) to O(1)
reclaimer:
iterates over h_reclaim now, complexity reduced from
O(locks in the system) to O(locks per nlm_host)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When doing NLM_GRANTED requests, lockd may end up blocking if we use
rpc_create_client() due to the synchronous call to rpc_ping(). Instead, use
rpc_new_client().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently it uses nlmclnt_lookup_host(), which puts the resulting host
structure on a different list.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The caller of posix_test_lock() should never need to look at the lock
private data, so do not copy that information. This also means that there
is no need to call the fl_release_private methods.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The struct nfs_direct_req currently keeps a pointer to the file descriptor
without referencing it. This may cause problems if the parent process is
killed.
The nfs_open_context should normally have all the information that we're
currently using the filp for, and unlike fput(), is safe to release from
an rpciod process context.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently NFS O_DIRECT writes use FILE_SYNC so that a COMMIT is not
necessary. This simplifies the internal logic, but this could be a
difficult workload for some servers.
Instead, let's send UNSTABLE writes, and after they all complete, send a
COMMIT for the dirty range. After the COMMIT returns successfully, then do
the wake_up or fire off aio_complete().
Test plan:
Async direct I/O tests against Solaris (or any server that requires
committed unstable writes). Reboot server during test.
Based on an earlier patch by Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
^C against "iozone -I" is hitting the assertion in nfs_clear_inode().
Test plan:
"iozone -i0 -I -a -c" against a slow server, then control C. This should
not cause an oops.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Three atomic_t variables cause a lot of bus locking. Because they are all
used in the same places in the code, just use a single spin lock.
Now that the atomic_t variables are gone, we can remove the request size
limitation since the code no longer depends on the limited width of atomic_t
on some platforms.
Test plan:
Compile with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled. Millions of fsx
operations, iozone, OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up tab damage and comments. Replace "file_offset" with more commonly
used "pos".
Test plan:
Compile with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
For async iocb's, the NFS direct write path now returns EIOCBQUEUED,
and calls aio_complete when all the requested writes are finished. The
synchronous part of the NFS direct write path behaves exactly as it
was before.
Shared mapped NFS files will have some coherency difficulties when
accessed concurrently with aio+dio. Will need to explore how this
is handled in the local file system case.
Test plan:
aio-stress with "-O". OraSim.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>