Merge branch 'linus' into timers/hpet
commit
85e9ca333d
8847 changed files with 637384 additions and 500938 deletions
11
.gitignore
vendored
|
@ -3,6 +3,10 @@
|
||||||
# subdirectories here. Add them in the ".gitignore" file
|
# subdirectories here. Add them in the ".gitignore" file
|
||||||
# in that subdirectory instead.
|
# in that subdirectory instead.
|
||||||
#
|
#
|
||||||
|
# NOTE! Please use 'git-ls-files -i --exclude-standard'
|
||||||
|
# command after changing this file, to see if there are
|
||||||
|
# any tracked files which get ignored after the change.
|
||||||
|
#
|
||||||
# Normal rules
|
# Normal rules
|
||||||
#
|
#
|
||||||
.*
|
.*
|
||||||
|
@ -18,18 +22,21 @@
|
||||||
*.lst
|
*.lst
|
||||||
*.symtypes
|
*.symtypes
|
||||||
*.order
|
*.order
|
||||||
|
*.elf
|
||||||
|
*.bin
|
||||||
|
*.gz
|
||||||
|
|
||||||
#
|
#
|
||||||
# Top-level generic files
|
# Top-level generic files
|
||||||
#
|
#
|
||||||
tags
|
tags
|
||||||
TAGS
|
TAGS
|
||||||
vmlinux*
|
vmlinux
|
||||||
!vmlinux.lds.S
|
|
||||||
System.map
|
System.map
|
||||||
Module.markers
|
Module.markers
|
||||||
Module.symvers
|
Module.symvers
|
||||||
!.gitignore
|
!.gitignore
|
||||||
|
!.mailmap
|
||||||
|
|
||||||
#
|
#
|
||||||
# Generated include files
|
# Generated include files
|
||||||
|
|
16
CREDITS
|
@ -317,6 +317,14 @@ S: 2322 37th Ave SW
|
||||||
S: Seattle, Washington 98126-2010
|
S: Seattle, Washington 98126-2010
|
||||||
S: USA
|
S: USA
|
||||||
|
|
||||||
|
N: Muli Ben-Yehuda
|
||||||
|
E: mulix@mulix.org
|
||||||
|
E: muli@il.ibm.com
|
||||||
|
W: http://www.mulix.org
|
||||||
|
D: trident OSS sound driver, x86-64 dma-ops and Calgary IOMMU,
|
||||||
|
D: KVM and Xen bits and other misc. hackery.
|
||||||
|
S: Haifa, Israel
|
||||||
|
|
||||||
N: Johannes Berg
|
N: Johannes Berg
|
||||||
E: johannes@sipsolutions.net
|
E: johannes@sipsolutions.net
|
||||||
W: http://johannes.sipsolutions.net/
|
W: http://johannes.sipsolutions.net/
|
||||||
|
@ -2611,8 +2619,9 @@ S: Perth, Western Australia
|
||||||
S: Australia
|
S: Australia
|
||||||
|
|
||||||
N: Miguel Ojeda Sandonis
|
N: Miguel Ojeda Sandonis
|
||||||
E: maxextreme@gmail.com
|
E: miguel.ojeda.sandonis@gmail.com
|
||||||
W: http://maxextreme.googlepages.com/
|
W: http://miguelojeda.es
|
||||||
|
W: http://jair.lab.fi.uva.es/~migojed/
|
||||||
D: Author of the ks0108, cfag12864b and cfag12864bfb auxiliary display drivers.
|
D: Author of the ks0108, cfag12864b and cfag12864bfb auxiliary display drivers.
|
||||||
D: Maintainer of the auxiliary display drivers tree (drivers/auxdisplay/*)
|
D: Maintainer of the auxiliary display drivers tree (drivers/auxdisplay/*)
|
||||||
S: C/ Mieses 20, 9-B
|
S: C/ Mieses 20, 9-B
|
||||||
|
@ -3343,8 +3352,7 @@ S: Spain
|
||||||
N: Linus Torvalds
|
N: Linus Torvalds
|
||||||
E: torvalds@linux-foundation.org
|
E: torvalds@linux-foundation.org
|
||||||
D: Original kernel hacker
|
D: Original kernel hacker
|
||||||
S: 12725 SW Millikan Way, Suite 400
|
S: Portland, Oregon 97005
|
||||||
S: Beaverton, Oregon 97005
|
|
||||||
S: USA
|
S: USA
|
||||||
|
|
||||||
N: Marcelo Tosatti
|
N: Marcelo Tosatti
|
||||||
|
|
|
@ -359,8 +359,6 @@ telephony/
|
||||||
- directory with info on telephony (e.g. voice over IP) support.
|
- directory with info on telephony (e.g. voice over IP) support.
|
||||||
time_interpolators.txt
|
time_interpolators.txt
|
||||||
- info on time interpolators.
|
- info on time interpolators.
|
||||||
tipar.txt
|
|
||||||
- information about Parallel link cable for Texas Instruments handhelds.
|
|
||||||
tty.txt
|
tty.txt
|
||||||
- guide to the locking policies of the tty layer.
|
- guide to the locking policies of the tty layer.
|
||||||
uml/
|
uml/
|
||||||
|
|
|
@ -26,3 +26,37 @@ Description:
|
||||||
I/O statistics of partition <part>. The format is the
|
I/O statistics of partition <part>. The format is the
|
||||||
same as the above-written /sys/block/<disk>/stat
|
same as the above-written /sys/block/<disk>/stat
|
||||||
format.
|
format.
|
||||||
|
|
||||||
|
|
||||||
|
What: /sys/block/<disk>/integrity/format
|
||||||
|
Date: June 2008
|
||||||
|
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||||
|
Description:
|
||||||
|
Metadata format for integrity capable block device.
|
||||||
|
E.g. T10-DIF-TYPE1-CRC.
|
||||||
|
|
||||||
|
|
||||||
|
What: /sys/block/<disk>/integrity/read_verify
|
||||||
|
Date: June 2008
|
||||||
|
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||||
|
Description:
|
||||||
|
Indicates whether the block layer should verify the
|
||||||
|
integrity of read requests serviced by devices that
|
||||||
|
support sending integrity metadata.
|
||||||
|
|
||||||
|
|
||||||
|
What: /sys/block/<disk>/integrity/tag_size
|
||||||
|
Date: June 2008
|
||||||
|
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||||
|
Description:
|
||||||
|
Number of bytes of integrity tag space available per
|
||||||
|
512 bytes of data.
|
||||||
|
|
||||||
|
|
||||||
|
What: /sys/block/<disk>/integrity/write_generate
|
||||||
|
Date: June 2008
|
||||||
|
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||||
|
Description:
|
||||||
|
Indicates whether the block layer should automatically
|
||||||
|
generate checksums for write requests bound for
|
||||||
|
devices that support receiving integrity metadata.
|
||||||
|
|
35
Documentation/ABI/testing/sysfs-bus-css
Normal file
|
@ -0,0 +1,35 @@
|
||||||
|
What: /sys/bus/css/devices/.../type
|
||||||
|
Date: March 2008
|
||||||
|
Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
|
||||||
|
linux-s390@vger.kernel.org
|
||||||
|
Description: Contains the subchannel type, as reported by the hardware.
|
||||||
|
This attribute is present for all subchannel types.
|
||||||
|
|
||||||
|
What: /sys/bus/css/devices/.../modalias
|
||||||
|
Date: March 2008
|
||||||
|
Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
|
||||||
|
linux-s390@vger.kernel.org
|
||||||
|
Description: Contains the module alias as reported with uevents.
|
||||||
|
It is of the format css:t<type> and present for all
|
||||||
|
subchannel types.
|
||||||
|
|
||||||
|
What: /sys/bus/css/drivers/io_subchannel/.../chpids
|
||||||
|
Date: December 2002
|
||||||
|
Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
|
||||||
|
linux-s390@vger.kernel.org
|
||||||
|
Description: Contains the ids of the channel paths used by this
|
||||||
|
subchannel, as reported by the channel subsystem
|
||||||
|
during subchannel recognition.
|
||||||
|
Note: This is an I/O-subchannel specific attribute.
|
||||||
|
Users: s390-tools, HAL
|
||||||
|
|
||||||
|
What: /sys/bus/css/drivers/io_subchannel/.../pimpampom
|
||||||
|
Date: December 2002
|
||||||
|
Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
|
||||||
|
linux-s390@vger.kernel.org
|
||||||
|
Description: Contains the PIM/PAM/POM values, as reported by the
|
||||||
|
channel subsystem when last queried by the common I/O
|
||||||
|
layer (this implies that this attribute is not necessarily
|
||||||
|
in sync with the values current in the channel subsystem).
|
||||||
|
Note: This is an I/O-subchannel specific attribute.
|
||||||
|
Users: s390-tools, HAL
|
20
Documentation/ABI/testing/sysfs-dev
Normal file
|
@ -0,0 +1,20 @@
|
||||||
|
What: /sys/dev
|
||||||
|
Date: April 2008
|
||||||
|
KernelVersion: 2.6.26
|
||||||
|
Contact: Dan Williams <dan.j.williams@intel.com>
|
||||||
|
Description: The /sys/dev tree provides a method to look up the sysfs
|
||||||
|
path for a device using the information returned from
|
||||||
|
stat(2). There are two directories, 'block' and 'char',
|
||||||
|
beneath /sys/dev containing symbolic links with names of
|
||||||
|
the form "<major>:<minor>". These links point to the
|
||||||
|
corresponding sysfs path for the given device.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
$ readlink /sys/dev/block/8:32
|
||||||
|
../../block/sdc
|
||||||
|
|
||||||
|
Entries in /sys/dev/char and /sys/dev/block will be
|
||||||
|
dynamically created and destroyed as devices enter and
|
||||||
|
leave the system.
|
||||||
|
|
||||||
|
Users: mdadm <linux-raid@vger.kernel.org>
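
The lookup described in this file can be sketched in Python (used here only for illustration): take the device number that stat(2) reports and build the matching /sys/dev link name. The helper name `sysfs_dev_link` is hypothetical and not part of any kernel or library API.

```python
import os
import stat


def sysfs_dev_link(st_mode: int, st_rdev: int) -> str:
    """Build the /sys/dev symlink path for a device node's stat(2) data.

    st_mode selects the 'block' or 'char' directory; st_rdev supplies
    the "<major>:<minor>" link name.
    """
    if stat.S_ISBLK(st_mode):
        kind = "block"
    elif stat.S_ISCHR(st_mode):
        kind = "char"
    else:
        raise ValueError("not a block or character device")
    return f"/sys/dev/{kind}/{os.major(st_rdev)}:{os.minor(st_rdev)}"


if __name__ == "__main__":
    # Fabricated stat data for a block device 8:32; no real device is read.
    print(sysfs_dev_link(stat.S_IFBLK | 0o660, os.makedev(8, 32)))
```

On a live system, one would presumably pass the fields of `os.stat()` on a device node and then call `os.readlink()` on the result to get a relative path such as the `../../block/sdc` in the example above.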
|
24
Documentation/ABI/testing/sysfs-devices-memory
Normal file
|
@ -0,0 +1,24 @@
|
||||||
|
What: /sys/devices/system/memory
|
||||||
|
Date: June 2008
|
||||||
|
Contact: Badari Pulavarty <pbadari@us.ibm.com>
|
||||||
|
Description:
|
||||||
|
The /sys/devices/system/memory directory contains a snapshot of the
|
||||||
|
internal state of the kernel memory blocks. Files could be
|
||||||
|
added or removed dynamically to represent hot-add/remove
|
||||||
|
operations.
|
||||||
|
|
||||||
|
Users: hotplug memory add/remove tools
|
||||||
|
https://w3.opensource.ibm.com/projects/powerpc-utils/
|
||||||
|
|
||||||
|
What: /sys/devices/system/memory/memoryX/removable
|
||||||
|
Date: June 2008
|
||||||
|
Contact: Badari Pulavarty <pbadari@us.ibm.com>
|
||||||
|
Description:
|
||||||
|
The file /sys/devices/system/memory/memoryX/removable
|
||||||
|
indicates whether this memory block is removable or not.
|
||||||
|
This is useful for a user-level agent to
|
||||||
|
identify removable sections of the memory before attempting
|
||||||
|
a potentially expensive hot-remove memory operation.
|
||||||
|
|
||||||
|
Users: hotplug memory remove tools
|
||||||
|
https://w3.opensource.ibm.com/projects/powerpc-utils/
|
|
@ -29,46 +29,46 @@ Description:
|
||||||
|
|
||||||
$ cd /sys/firmware/acpi/interrupts
|
$ cd /sys/firmware/acpi/interrupts
|
||||||
$ grep . *
|
$ grep . *
|
||||||
error:0
|
error: 0
|
||||||
ff_gbl_lock:0
|
ff_gbl_lock: 0 enable
|
||||||
ff_pmtimer:0
|
ff_pmtimer: 0 invalid
|
||||||
ff_pwr_btn:0
|
ff_pwr_btn: 0 enable
|
||||||
ff_rt_clk:0
|
ff_rt_clk: 2 disable
|
||||||
ff_slp_btn:0
|
ff_slp_btn: 0 invalid
|
||||||
gpe00:0
|
gpe00: 0 invalid
|
||||||
gpe01:0
|
gpe01: 0 enable
|
||||||
gpe02:0
|
gpe02: 108 enable
|
||||||
gpe03:0
|
gpe03: 0 invalid
|
||||||
gpe04:0
|
gpe04: 0 invalid
|
||||||
gpe05:0
|
gpe05: 0 invalid
|
||||||
gpe06:0
|
gpe06: 0 enable
|
||||||
gpe07:0
|
gpe07: 0 enable
|
||||||
gpe08:0
|
gpe08: 0 invalid
|
||||||
gpe09:174
|
gpe09: 0 invalid
|
||||||
gpe0A:0
|
gpe0A: 0 invalid
|
||||||
gpe0B:0
|
gpe0B: 0 invalid
|
||||||
gpe0C:0
|
gpe0C: 0 invalid
|
||||||
gpe0D:0
|
gpe0D: 0 invalid
|
||||||
gpe0E:0
|
gpe0E: 0 invalid
|
||||||
gpe0F:0
|
gpe0F: 0 invalid
|
||||||
gpe10:0
|
gpe10: 0 invalid
|
||||||
gpe11:60
|
gpe11: 0 invalid
|
||||||
gpe12:0
|
gpe12: 0 invalid
|
||||||
gpe13:0
|
gpe13: 0 invalid
|
||||||
gpe14:0
|
gpe14: 0 invalid
|
||||||
gpe15:0
|
gpe15: 0 invalid
|
||||||
gpe16:0
|
gpe16: 0 invalid
|
||||||
gpe17:0
|
gpe17: 1084 enable
|
||||||
gpe18:0
|
gpe18: 0 enable
|
||||||
gpe19:7
|
gpe19: 0 invalid
|
||||||
gpe1A:0
|
gpe1A: 0 invalid
|
||||||
gpe1B:0
|
gpe1B: 0 invalid
|
||||||
gpe1C:0
|
gpe1C: 0 invalid
|
||||||
gpe1D:0
|
gpe1D: 0 invalid
|
||||||
gpe1E:0
|
gpe1E: 0 invalid
|
||||||
gpe1F:0
|
gpe1F: 0 invalid
|
||||||
gpe_all:241
|
gpe_all: 1192
|
||||||
sci:241
|
sci: 1194
|
||||||
|
|
||||||
sci - The total number of times the ACPI SCI
|
sci - The total number of times the ACPI SCI
|
||||||
has claimed an interrupt.
|
has claimed an interrupt.
|
||||||
|
@ -89,6 +89,13 @@ Description:
|
||||||
|
|
||||||
error - an interrupt that can't be accounted for above.
|
error - an interrupt that can't be accounted for above.
|
||||||
|
|
||||||
|
invalid: it's either a wakeup GPE or a GPE/Fixed Event that
|
||||||
|
doesn't have an event handler.
|
||||||
|
|
||||||
|
disable: the GPE/Fixed Event is valid but disabled.
|
||||||
|
|
||||||
|
enable: the GPE/Fixed Event is valid and enabled.
|
||||||
|
|
||||||
Root has permission to clear any of these counters. Eg.
|
Root has permission to clear any of these counters. Eg.
|
||||||
# echo 0 > gpe11
|
# echo 0 > gpe11
|
||||||
|
|
||||||
|
@ -97,3 +104,43 @@ Description:
|
||||||
|
|
||||||
None of these counters has an effect on the function
|
None of these counters has an effect on the function
|
||||||
of the system, they are simply statistics.
|
of the system, they are simply statistics.
|
||||||
|
|
||||||
|
Besides this, a user can also write specific strings to these files
|
||||||
|
to enable/disable/clear ACPI interrupts in user space, which can be
|
||||||
|
used to debug some ACPI interrupt storm issues.
|
||||||
|
|
||||||
|
Note that writing is allowed only to a VALID GPE/Fixed Event,
|
||||||
|
i.e. user can only change the status of runtime GPE and
|
||||||
|
Fixed Event with event handler installed.
|
||||||
|
|
||||||
|
Take the power button fixed event as an example. First, kill acpid
|
||||||
|
and other user space applications so that the machine won't shut down
|
||||||
|
when pressing the power button.
|
||||||
|
# cat ff_pwr_btn
|
||||||
|
0
|
||||||
|
# press the power button 3 times;
|
||||||
|
# cat ff_pwr_btn
|
||||||
|
3
|
||||||
|
# echo disable > ff_pwr_btn
|
||||||
|
# cat ff_pwr_btn
|
||||||
|
disable
|
||||||
|
# press the power button 3 times;
|
||||||
|
# cat ff_pwr_btn
|
||||||
|
disable
|
||||||
|
# echo enable > ff_pwr_btn
|
||||||
|
# cat ff_pwr_btn
|
||||||
|
4
|
||||||
|
/*
|
||||||
|
* this is because the status bit is set even if the enable bit is cleared,
|
||||||
|
* and it triggers an ACPI fixed event when the enable bit is set again
|
||||||
|
*/
|
||||||
|
# press the power button 3 times;
|
||||||
|
# cat ff_pwr_btn
|
||||||
|
7
|
||||||
|
# echo disable > ff_pwr_btn
|
||||||
|
# press the power button 3 times;
|
||||||
|
# echo clear > ff_pwr_btn /* clear the status bit */
|
||||||
|
# echo disable > ff_pwr_btn
|
||||||
|
# cat ff_pwr_btn
|
||||||
|
7
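
The `name: count status` lines that the new side of this file shows for `grep . *` can be split into fields with a few lines of Python. This is a minimal sketch that assumes every line carries a numeric count and at most one trailing status word; `AcpiCounter` and `parse_counter` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class AcpiCounter(NamedTuple):
    name: str
    count: int
    status: Optional[str]  # "enable", "disable", "invalid", or None if absent


def parse_counter(line: str) -> AcpiCounter:
    """Parse one 'name: count [status]' line as printed by `grep . *`."""
    name, _, rest = line.partition(":")
    fields = rest.split()
    # Lines such as "sci: 1194" have a count but no status word.
    status = fields[1] if len(fields) > 1 else None
    return AcpiCounter(name.strip(), int(fields[0]), status)


if __name__ == "__main__":
    for sample in ("gpe02:  108   enable", "ff_rt_clk:  2  disable", "sci: 1194"):
        print(parse_counter(sample))
```

Lines without a numeric count, such as the bare `disable` printed in the transcript above, are outside this sketch and would raise `ValueError`.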
|
||||||
|
|
||||||
|
|
71
Documentation/ABI/testing/sysfs-firmware-memmap
Normal file
|
@ -0,0 +1,71 @@
|
||||||
|
What: /sys/firmware/memmap/
|
||||||
|
Date: June 2008
|
||||||
|
Contact: Bernhard Walle <bwalle@suse.de>
|
||||||
|
Description:
|
||||||
|
On all platforms, the firmware provides a memory map which the
|
||||||
|
kernel reads. The resources from that memory map are registered
|
||||||
|
in the kernel resource tree and exposed to userspace via
|
||||||
|
/proc/iomem (together with other resources).
|
||||||
|
|
||||||
|
However, on most architectures that firmware-provided memory
|
||||||
|
map is modified afterwards by the kernel itself, either because
|
||||||
|
the kernel merges that memory map with other information or
|
||||||
|
just because the user overwrites that memory map via command
|
||||||
|
line.
|
||||||
|
|
||||||
|
kexec needs the raw firmware-provided memory map to set up the
|
||||||
|
parameter segment of the kernel that should be booted with
|
||||||
|
kexec. Also, the raw memory map is useful for debugging. For
|
||||||
|
that reason, /sys/firmware/memmap is an interface that provides
|
||||||
|
the raw memory map to userspace.
|
||||||
|
|
||||||
|
The structure is as follows: Under /sys/firmware/memmap there
|
||||||
|
are subdirectories with the number of the entry as their name:
|
||||||
|
|
||||||
|
/sys/firmware/memmap/0
|
||||||
|
/sys/firmware/memmap/1
|
||||||
|
/sys/firmware/memmap/2
|
||||||
|
/sys/firmware/memmap/3
|
||||||
|
...
|
||||||
|
|
||||||
|
The maximum depends on the number of memory map entries provided
|
||||||
|
by the firmware. The order is just the order that the firmware
|
||||||
|
provides.
|
||||||
|
|
||||||
|
Each directory contains three files:
|
||||||
|
|
||||||
|
start : The start address (as hexadecimal number with the
|
||||||
|
'0x' prefix).
|
||||||
|
end : The end address, inclusive (regardless of whether the
|
||||||
|
firmware provides inclusive or exclusive ranges).
|
||||||
|
type : Type of the entry as string. See below for a list of
|
||||||
|
valid types.
|
||||||
|
|
||||||
|
So, for example:
|
||||||
|
|
||||||
|
/sys/firmware/memmap/0/start
|
||||||
|
/sys/firmware/memmap/0/end
|
||||||
|
/sys/firmware/memmap/0/type
|
||||||
|
/sys/firmware/memmap/1/start
|
||||||
|
...
|
||||||
|
|
||||||
|
Currently, the following types exist:
|
||||||
|
|
||||||
|
- System RAM
|
||||||
|
- ACPI Tables
|
||||||
|
- ACPI Non-volatile Storage
|
||||||
|
- reserved
|
||||||
|
|
||||||
|
The following shell snippet can be used to display that memory
|
||||||
|
map in a human-readable format:
|
||||||
|
|
||||||
|
-------------------- 8< ----------------------------------------
|
||||||
|
#!/bin/bash
|
||||||
|
cd /sys/firmware/memmap
|
||||||
|
for dir in * ; do
|
||||||
|
start=$(cat $dir/start)
|
||||||
|
end=$(cat $dir/end)
|
||||||
|
type=$(cat $dir/type)
|
||||||
|
printf "%016x-%016x (%s)\n" "$start" "$((end + 1))" "$type"
|
||||||
|
done
|
||||||
|
-------------------- >8 ----------------------------------------
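
For comparison, the same formatting can be sketched in Python. This assumes the layout described above (numbered subdirectories, each with `start`, `end` and `type` files); the function names `format_entry` and `read_memmap` are hypothetical.

```python
import os


def format_entry(start: str, end: str, kind: str) -> str:
    """Format one entry as an exclusive range, like the shell snippet.

    start and end are the hexadecimal strings read from sysfs; end is
    inclusive there, so 1 is added before printing.
    """
    start_addr = int(start, 16)
    end_addr = int(end, 16) + 1
    return f"{start_addr:016x}-{end_addr:016x} ({kind})"


def _read(path: str) -> str:
    with open(path) as handle:
        return handle.read().strip()


def read_memmap(root: str = "/sys/firmware/memmap") -> list:
    """Return one formatted line per numbered entry directory under root."""
    lines = []
    # Assumption: every directory name is a decimal entry number.
    for entry in sorted(os.listdir(root), key=int):
        base = os.path.join(root, entry)
        lines.append(format_entry(_read(os.path.join(base, "start")),
                                  _read(os.path.join(base, "end")),
                                  _read(os.path.join(base, "type"))))
    return lines


if __name__ == "__main__":
    # Made-up values for illustration; nothing is read from sysfs here.
    print(format_entry("0x0", "0x9fbff", "System RAM"))
```

Unlike `for dir in *`, which iterates in lexical order, this sketch sorts the entries numerically so that entry 10 follows entry 9.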
|
6
Documentation/ABI/testing/sysfs-kernel-mm
Normal file
|
@ -0,0 +1,6 @@
|
||||||
|
What: /sys/kernel/mm
|
||||||
|
Date: July 2008
|
||||||
|
Contact: Nishanth Aravamudan <nacc@us.ibm.com>, VM maintainers
|
||||||
|
Description:
|
||||||
|
/sys/kernel/mm/ should contain any and all VM
|
||||||
|
related information in /sys/kernel/.
|
15
Documentation/ABI/testing/sysfs-kernel-mm-hugepages
Normal file
|
@ -0,0 +1,15 @@
|
||||||
|
What: /sys/kernel/mm/hugepages/
|
||||||
|
Date: June 2008
|
||||||
|
Contact: Nishanth Aravamudan <nacc@us.ibm.com>, hugetlb maintainers
|
||||||
|
Description:
|
||||||
|
/sys/kernel/mm/hugepages/ contains a number of subdirectories
|
||||||
|
of the form hugepages-<size>kB, where <size> is the page size
|
||||||
|
of the hugepages supported by the kernel/CPU combination.
|
||||||
|
|
||||||
|
Under these directories are a number of files:
|
||||||
|
nr_hugepages
|
||||||
|
nr_overcommit_hugepages
|
||||||
|
free_hugepages
|
||||||
|
surplus_hugepages
|
||||||
|
resv_hugepages
|
||||||
|
See Documentation/vm/hugetlbpage.txt for details.
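
The `hugepages-<size>kB` naming scheme makes the page size recoverable from the directory name alone. A minimal Python sketch, with the hypothetical helper `hugepage_size_kb`, might look like this:

```python
import re

# Matches directory names of the documented form hugepages-<size>kB.
_HUGEPAGE_DIR = re.compile(r"^hugepages-(\d+)kB$")


def hugepage_size_kb(dirname: str) -> int:
    """Return <size> in kB from a 'hugepages-<size>kB' directory name."""
    match = _HUGEPAGE_DIR.match(dirname)
    if match is None:
        raise ValueError(f"unexpected directory name: {dirname!r}")
    return int(match.group(1))


if __name__ == "__main__":
    # Example name only; the sizes present depend on the kernel/CPU combination.
    print(hugepage_size_kb("hugepages-2048kB"))
```

A caller could apply this to each name returned by `os.listdir("/sys/kernel/mm/hugepages")` and then read files such as `nr_hugepages` from the matching directory.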
|
|
@ -474,25 +474,29 @@ make a good program).
|
||||||
So, you can either get rid of GNU emacs, or change it to use saner
|
So, you can either get rid of GNU emacs, or change it to use saner
|
||||||
values. To do the latter, you can stick the following in your .emacs file:
|
values. To do the latter, you can stick the following in your .emacs file:
|
||||||
|
|
||||||
(defun linux-c-mode ()
|
(defun c-lineup-arglist-tabs-only (ignored)
|
||||||
"C mode with adjusted defaults for use with the Linux kernel."
|
"Line up argument lists by tabs, not spaces"
|
||||||
(interactive)
|
(let* ((anchor (c-langelem-pos c-syntactic-element))
|
||||||
(c-mode)
|
(column (c-langelem-2nd-pos c-syntactic-element))
|
||||||
(c-set-style "K&R")
|
(offset (- (1+ column) anchor))
|
||||||
(setq tab-width 8)
|
(steps (floor offset c-basic-offset)))
|
||||||
(setq indent-tabs-mode t)
|
(* (max steps 1)
|
||||||
(setq c-basic-offset 8))
|
c-basic-offset)))
|
||||||
|
|
||||||
This will define the M-x linux-c-mode command. When hacking on a
|
(add-hook 'c-mode-hook
|
||||||
module, if you put the string -*- linux-c -*- somewhere on the first
|
(lambda ()
|
||||||
two lines, this mode will be automatically invoked. Also, you may want
|
(let ((filename (buffer-file-name)))
|
||||||
to add
|
;; Enable kernel mode for the appropriate files
|
||||||
|
(when (and filename
|
||||||
|
(string-match "~/src/linux-trees" filename))
|
||||||
|
(setq indent-tabs-mode t)
|
||||||
|
(c-set-style "linux")
|
||||||
|
(c-set-offset 'arglist-cont-nonempty
|
||||||
|
'(c-lineup-gcc-asm-reg
|
||||||
|
c-lineup-arglist-tabs-only))))))
|
||||||
|
|
||||||
(setq auto-mode-alist (cons '("/usr/src/linux.*/.*\\.[ch]$" . linux-c-mode)
|
This will make emacs go better with the kernel coding style for C
|
||||||
auto-mode-alist))
|
files below ~/src/linux-trees.
|
||||||
|
|
||||||
to your .emacs file if you want to have linux-c-mode switched on
|
|
||||||
automagically when you edit source files under /usr/src/linux.
|
|
||||||
|
|
||||||
But even if you fail in getting emacs to do sane formatting, not
|
But even if you fail in getting emacs to do sane formatting, not
|
||||||
everything is lost: use "indent".
|
everything is lost: use "indent".
|
||||||
|
|
|
@ -298,10 +298,10 @@ recommended that you never use these unless you really know what the
|
||||||
cache width is.
|
cache width is.
|
||||||
|
|
||||||
int
|
int
|
||||||
dma_mapping_error(dma_addr_t dma_addr)
|
dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
|
||||||
|
|
||||||
int
|
int
|
||||||
pci_dma_mapping_error(dma_addr_t dma_addr)
|
pci_dma_mapping_error(struct pci_dev *hwdev, dma_addr_t dma_addr)
|
||||||
|
|
||||||
In some circumstances dma_map_single and dma_map_page will fail to create
|
In some circumstances dma_map_single and dma_map_page will fail to create
|
||||||
a mapping. A driver can check for these errors by testing the returned
|
a mapping. A driver can check for these errors by testing the returned
|
||||||
|
|
|
@ -22,3 +22,12 @@ ready and available in memory. The DMA of the "completion indication"
|
||||||
could race with data DMA. Mapping the memory used for completion
|
could race with data DMA. Mapping the memory used for completion
|
||||||
indications with DMA_ATTR_WRITE_BARRIER would prevent the race.
|
indications with DMA_ATTR_WRITE_BARRIER would prevent the race.
|
||||||
|
|
||||||
|
DMA_ATTR_WEAK_ORDERING
|
||||||
|
----------------------
|
||||||
|
|
||||||
|
DMA_ATTR_WEAK_ORDERING specifies that reads and writes to the mapping
|
||||||
|
may be weakly ordered, that is that reads and writes may pass each other.
|
||||||
|
|
||||||
|
Since it is optional for platforms to implement DMA_ATTR_WEAK_ORDERING,
|
||||||
|
those that do not will simply ignore the attribute and exhibit default
|
||||||
|
behavior.
|
||||||
|
|
|
@ -524,6 +524,44 @@ These utilities include endpoint autoconfiguration.
|
||||||
<!-- !Edrivers/usb/gadget/epautoconf.c -->
|
<!-- !Edrivers/usb/gadget/epautoconf.c -->
|
||||||
</sect1>
|
</sect1>
|
||||||
|
|
||||||
|
<sect1 id="composite"><title>Composite Device Framework</title>
|
||||||
|
|
||||||
|
<para>The core API is sufficient for writing drivers for composite
|
||||||
|
USB devices (with more than one function in a given configuration),
|
||||||
|
and also multi-configuration devices (also more than one function,
|
||||||
|
but not necessarily sharing a given configuration).
|
||||||
|
There is however an optional framework which makes it easier to
|
||||||
|
reuse and combine functions.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>Devices using this framework provide a <emphasis>struct
|
||||||
|
usb_composite_driver</emphasis>, which in turn provides one or
|
||||||
|
more <emphasis>struct usb_configuration</emphasis> instances.
|
||||||
|
Each such configuration includes at least one
|
||||||
|
<emphasis>struct usb_function</emphasis>, which packages a user
|
||||||
|
visible role such as "network link" or "mass storage device".
|
||||||
|
Management functions may also exist, such as "Device Firmware
|
||||||
|
Upgrade".
|
||||||
|
</para>
|
||||||
|
|
||||||
|
!Iinclude/linux/usb/composite.h
|
||||||
|
!Edrivers/usb/gadget/composite.c
|
||||||
|
|
||||||
|
</sect1>
|
||||||
|
|
||||||
|
<sect1 id="functions"><title>Composite Device Functions</title>
|
||||||
|
|
||||||
|
<para>At this writing, a few of the current gadget drivers have
|
||||||
|
been converted to this framework.
|
||||||
|
Near-term plans include converting all of them, except for "gadgetfs".
|
||||||
|
</para>
|
||||||
|
|
||||||
|
!Edrivers/usb/gadget/f_acm.c
|
||||||
|
!Edrivers/usb/gadget/f_serial.c
|
||||||
|
|
||||||
|
</sect1>
|
||||||
|
|
||||||
|
|
||||||
</chapter>
|
</chapter>
|
||||||
|
|
||||||
<chapter id="controllers"><title>Peripheral Controller Drivers</title>
|
<chapter id="controllers"><title>Peripheral Controller Drivers</title>
|
||||||
|
|
|
@ -219,10 +219,10 @@
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<sect1 id="lock-intro">
|
<sect1 id="lock-intro">
|
||||||
<title>Three Main Types of Kernel Locks: Spinlocks, Mutexes and Semaphores</title>
|
<title>Two Main Types of Kernel Locks: Spinlocks and Mutexes</title>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
There are three main types of kernel locks. The fundamental type
|
There are two main types of kernel locks. The fundamental type
|
||||||
is the spinlock
|
is the spinlock
|
||||||
(<filename class="headerfile">include/asm/spinlock.h</filename>),
|
(<filename class="headerfile">include/asm/spinlock.h</filename>),
|
||||||
which is a very simple single-holder lock: if you can't get the
|
which is a very simple single-holder lock: if you can't get the
|
||||||
|
@ -239,14 +239,6 @@
|
||||||
can't sleep (see <xref linkend="sleeping-things"/>), and so have to
|
can't sleep (see <xref linkend="sleeping-things"/>), and so have to
|
||||||
use a spinlock instead.
|
use a spinlock instead.
|
||||||
</para>
|
</para>
|
||||||
<para>
|
|
||||||
The third type is a semaphore
|
|
||||||
(<filename class="headerfile">include/linux/semaphore.h</filename>): it
|
|
||||||
can have more than one holder at any time (the number decided at
|
|
||||||
initialization time), although it is most commonly used as a
|
|
||||||
single-holder lock (a mutex). If you can't get a semaphore, your
|
|
||||||
task will be suspended and later on woken up - just like for mutexes.
|
|
||||||
</para>
|
|
||||||
<para>
|
<para>
|
||||||
Neither type of lock is recursive: see
|
Neither type of lock is recursive: see
|
||||||
<xref linkend="deadlock"/>.
|
<xref linkend="deadlock"/>.
|
||||||
|
@ -278,7 +270,7 @@
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
Semaphores still exist, because they are required for
|
Mutexes still exist, because they are required for
|
||||||
synchronization between <firstterm linkend="gloss-usercontext">user
|
synchronization between <firstterm linkend="gloss-usercontext">user
|
||||||
contexts</firstterm>, as we will see below.
|
contexts</firstterm>, as we will see below.
|
||||||
</para>
|
</para>
|
||||||
|
@ -289,18 +281,17 @@
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
If you have a data structure which is only ever accessed from
|
If you have a data structure which is only ever accessed from
|
||||||
user context, then you can use a simple semaphore
|
user context, then you can use a simple mutex
|
||||||
(<filename>linux/linux/semaphore.h</filename>) to protect it. This
|
(<filename>include/linux/mutex.h</filename>) to protect it. This
|
||||||
is the most trivial case: you initialize the semaphore to the number
|
is the most trivial case: you initialize the mutex. Then you can
|
||||||
of resources available (usually 1), and call
|
call <function>mutex_lock_interruptible()</function> to grab the mutex,
|
||||||
<function>down_interruptible()</function> to grab the semaphore, and
|
and <function>mutex_unlock()</function> to release it. There is also a
|
||||||
<function>up()</function> to release it. There is also a
|
<function>mutex_lock()</function>, which should be avoided, because it
|
||||||
<function>down()</function>, which should be avoided, because it
|
|
||||||
will not return if a signal is received.
|
will not return if a signal is received.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
Example: <filename>linux/net/core/netfilter.c</filename> allows
|
Example: <filename>net/netfilter/nf_sockopt.c</filename> allows
|
||||||
registration of new <function>setsockopt()</function> and
|
registration of new <function>setsockopt()</function> and
|
||||||
<function>getsockopt()</function> calls, with
|
<function>getsockopt()</function> calls, with
|
||||||
<function>nf_register_sockopt()</function>. Registration and
|
<function>nf_register_sockopt()</function>. Registration and
|
||||||
|
@ -515,7 +506,7 @@
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>
|
<para>
|
||||||
If you are in a process context (any syscall) and want to
|
If you are in a process context (any syscall) and want to
|
||||||
lock other process out, use a semaphore. You can take a semaphore
|
lock other process out, use a mutex. You can take a mutex
|
||||||
and sleep (<function>copy_from_user*(</function> or
|
and sleep (<function>copy_from_user*(</function> or
|
||||||
<function>kmalloc(x,GFP_KERNEL)</function>).
|
<function>kmalloc(x,GFP_KERNEL)</function>).
|
||||||
</para>
|
</para>
|
||||||
|
@ -662,7 +653,7 @@
|
||||||
<entry>SLBH</entry>
|
<entry>SLBH</entry>
|
||||||
<entry>SLBH</entry>
|
<entry>SLBH</entry>
|
||||||
<entry>SLBH</entry>
|
<entry>SLBH</entry>
|
||||||
<entry>DI</entry>
|
<entry>MLI</entry>
|
||||||
<entry>None</entry>
|
<entry>None</entry>
|
||||||
</row>
|
</row>
|
||||||
|
|
||||||
|
@ -692,8 +683,8 @@
|
||||||
<entry>spin_lock_bh</entry>
|
<entry>spin_lock_bh</entry>
|
||||||
</row>
|
</row>
|
||||||
<row>
|
<row>
|
||||||
<entry>DI</entry>
|
<entry>MLI</entry>
|
||||||
<entry>down_interruptible</entry>
|
<entry>mutex_lock_interruptible</entry>
|
||||||
</row>
|
</row>
|
||||||
|
|
||||||
</tbody>
|
</tbody>
|
||||||
|
@ -1310,7 +1301,7 @@ as Alan Cox says, <quote>Lock data, not code</quote>.
|
||||||
<para>
|
<para>
|
||||||
There is a coding bug where a piece of code tries to grab a
|
There is a coding bug where a piece of code tries to grab a
|
||||||
spinlock twice: it will spin forever, waiting for the lock to
|
spinlock twice: it will spin forever, waiting for the lock to
|
||||||
be released (spinlocks, rwlocks and semaphores are not
|
be released (spinlocks, rwlocks and mutexes are not
|
||||||
recursive in Linux). This is trivial to diagnose: not a
|
recursive in Linux). This is trivial to diagnose: not a
|
||||||
stay-up-five-nights-talk-to-fluffy-code-bunnies kind of
|
stay-up-five-nights-talk-to-fluffy-code-bunnies kind of
|
||||||
problem.
|
problem.
|
||||||
|
@ -1335,7 +1326,7 @@ as Alan Cox says, <quote>Lock data, not code</quote>.
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
This complete lockup is easy to diagnose: on SMP boxes the
|
This complete lockup is easy to diagnose: on SMP boxes the
|
||||||
watchdog timer or compiling with <symbol>DEBUG_SPINLOCKS</symbol> set
|
watchdog timer or compiling with <symbol>DEBUG_SPINLOCK</symbol> set
|
||||||
(<filename>include/linux/spinlock.h</filename>) will show this up
|
(<filename>include/linux/spinlock.h</filename>) will show this up
|
||||||
immediately when it happens.
|
immediately when it happens.
|
||||||
</para>
|
</para>
|
||||||
|
@ -1558,7 +1549,7 @@ the amount of locking which needs to be done.
|
||||||
<title>Read/Write Lock Variants</title>
|
<title>Read/Write Lock Variants</title>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
Both spinlocks and semaphores have read/write variants:
|
Both spinlocks and mutexes have read/write variants:
|
||||||
<type>rwlock_t</type> and <structname>struct rw_semaphore</structname>.
|
<type>rwlock_t</type> and <structname>struct rw_semaphore</structname>.
|
||||||
These divide users into two classes: the readers and the writers. If
|
These divide users into two classes: the readers and the writers. If
|
||||||
you are only reading the data, you can get a read lock, but to write to
|
you are only reading the data, you can get a read lock, but to write to
|
||||||
|
@ -1681,7 +1672,7 @@ the amount of locking which needs to be done.
|
||||||
#include <linux/slab.h>
|
#include <linux/slab.h>
|
||||||
#include <linux/string.h>
|
#include <linux/string.h>
|
||||||
+#include <linux/rcupdate.h>
|
+#include <linux/rcupdate.h>
|
||||||
#include <linux/semaphore.h>
|
#include <linux/mutex.h>
|
||||||
#include <asm/errno.h>
|
#include <asm/errno.h>
|
||||||
|
|
||||||
struct object
|
struct object
|
||||||
|
@ -1913,7 +1904,7 @@ machines due to caching.
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>
|
<para>
|
||||||
<function> put_user()</function>
|
<function>put_user()</function>
|
||||||
</para>
|
</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
|
@ -1927,13 +1918,13 @@ machines due to caching.
|
||||||
|
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>
|
<para>
|
||||||
<function>down_interruptible()</function> and
|
<function>mutex_lock_interruptible()</function> and
|
||||||
<function>down()</function>
|
<function>mutex_lock()</function>
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
There is a <function>down_trylock()</function> which can be
|
There is a <function>mutex_trylock()</function> which can be
|
||||||
used inside interrupt context, as it will not sleep.
|
used inside interrupt context, as it will not sleep.
|
||||||
<function>up()</function> will also never sleep.
|
<function>mutex_unlock()</function> will also never sleep.
|
||||||
</para>
|
</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
|
@ -2023,7 +2014,7 @@ machines due to caching.
|
||||||
<para>
|
<para>
|
||||||
Prior to 2.5, or when <symbol>CONFIG_PREEMPT</symbol> is
|
Prior to 2.5, or when <symbol>CONFIG_PREEMPT</symbol> is
|
||||||
unset, processes in user context inside the kernel would not
|
unset, processes in user context inside the kernel would not
|
||||||
preempt each other (ie. you had that CPU until you have it up,
|
preempt each other (ie. you had that CPU until you gave it up,
|
||||||
except for interrupts). With the addition of
|
except for interrupts). With the addition of
|
||||||
<symbol>CONFIG_PREEMPT</symbol> in 2.5.4, this changed: when
|
<symbol>CONFIG_PREEMPT</symbol> in 2.5.4, this changed: when
|
||||||
in user context, higher priority tasks can "cut in": spinlocks
|
in user context, higher priority tasks can "cut in": spinlocks
|
||||||
|
|
|
@ -84,10 +84,9 @@
|
||||||
runs an instance of gdb against the vmlinux file which contains
|
runs an instance of gdb against the vmlinux file which contains
|
||||||
the symbols (not boot image such as bzImage, zImage, uImage...).
|
the symbols (not boot image such as bzImage, zImage, uImage...).
|
||||||
In gdb the developer specifies the connection parameters and
|
In gdb the developer specifies the connection parameters and
|
||||||
connects to kgdb. Depending on which kgdb I/O modules exist in
|
connects to kgdb. The type of connection a developer makes with
|
||||||
the kernel for a given architecture, it may be possible to debug
|
gdb depends on the availability of kgdb I/O modules compiled as
|
||||||
the test machine's kernel with the development machine using a
|
built-ins or kernel modules in the test machine's kernel.
|
||||||
rs232 or ethernet connection.
|
|
||||||
</para>
|
</para>
|
||||||
</chapter>
|
</chapter>
|
||||||
<chapter id="CompilingAKernel">
|
<chapter id="CompilingAKernel">
|
||||||
|
@ -223,7 +222,7 @@
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
IMPORTANT NOTE: Using this option with kgdb over the console
|
IMPORTANT NOTE: Using this option with kgdb over the console
|
||||||
(kgdboc) or kgdb over ethernet (kgdboe) is not supported.
|
(kgdboc) is not supported.
|
||||||
</para>
|
</para>
|
||||||
</sect1>
|
</sect1>
|
||||||
</chapter>
|
</chapter>
|
||||||
|
@ -249,18 +248,11 @@
|
||||||
(gdb) target remote /dev/ttyS0
|
(gdb) target remote /dev/ttyS0
|
||||||
</programlisting>
|
</programlisting>
|
||||||
<para>
|
<para>
|
||||||
Example (kgdb to a terminal server):
|
Example (kgdb to a terminal server on tcp port 2012):
|
||||||
</para>
|
</para>
|
||||||
<programlisting>
|
<programlisting>
|
||||||
% gdb ./vmlinux
|
% gdb ./vmlinux
|
||||||
(gdb) target remote udp:192.168.2.2:6443
|
(gdb) target remote 192.168.2.2:2012
|
||||||
</programlisting>
|
|
||||||
<para>
|
|
||||||
Example (kgdb over ethernet):
|
|
||||||
</para>
|
|
||||||
<programlisting>
|
|
||||||
% gdb ./vmlinux
|
|
||||||
(gdb) target remote udp:192.168.2.2:6443
|
|
||||||
</programlisting>
|
</programlisting>
|
||||||
<para>
|
<para>
|
||||||
Once connected, you can debug a kernel the way you would debug an
|
Once connected, you can debug a kernel the way you would debug an
|
||||||
|
|
|
@ -29,12 +29,12 @@
|
||||||
|
|
||||||
<revhistory>
|
<revhistory>
|
||||||
<revision>
|
<revision>
|
||||||
<revnumber>1.0 </revnumber>
|
<revnumber>1.0</revnumber>
|
||||||
<date>May 30, 2001</date>
|
<date>May 30, 2001</date>
|
||||||
<revremark>Initial revision posted to linux-kernel</revremark>
|
<revremark>Initial revision posted to linux-kernel</revremark>
|
||||||
</revision>
|
</revision>
|
||||||
<revision>
|
<revision>
|
||||||
<revnumber>1.1 </revnumber>
|
<revnumber>1.1</revnumber>
|
||||||
<date>June 3, 2001</date>
|
<date>June 3, 2001</date>
|
||||||
<revremark>Revised after comments from linux-kernel</revremark>
|
<revremark>Revised after comments from linux-kernel</revremark>
|
||||||
</revision>
|
</revision>
|
||||||
|
|
|
@ -21,6 +21,18 @@
|
||||||
</affiliation>
|
</affiliation>
|
||||||
</author>
|
</author>
|
||||||
|
|
||||||
|
<copyright>
|
||||||
|
<year>2006-2008</year>
|
||||||
|
<holder>Hans-Jürgen Koch.</holder>
|
||||||
|
</copyright>
|
||||||
|
|
||||||
|
<legalnotice>
|
||||||
|
<para>
|
||||||
|
This documentation is Free Software licensed under the terms of the
|
||||||
|
GPL version 2.
|
||||||
|
</para>
|
||||||
|
</legalnotice>
|
||||||
|
|
||||||
<pubdate>2006-12-11</pubdate>
|
<pubdate>2006-12-11</pubdate>
|
||||||
|
|
||||||
<abstract>
|
<abstract>
|
||||||
|
@ -29,6 +41,12 @@
|
||||||
</abstract>
|
</abstract>
|
||||||
|
|
||||||
<revhistory>
|
<revhistory>
|
||||||
|
<revision>
|
||||||
|
<revnumber>0.5</revnumber>
|
||||||
|
<date>2008-05-22</date>
|
||||||
|
<authorinitials>hjk</authorinitials>
|
||||||
|
<revremark>Added description of write() function.</revremark>
|
||||||
|
</revision>
|
||||||
<revision>
|
<revision>
|
||||||
<revnumber>0.4</revnumber>
|
<revnumber>0.4</revnumber>
|
||||||
<date>2007-11-26</date>
|
<date>2007-11-26</date>
|
||||||
|
@ -57,20 +75,9 @@
|
||||||
</bookinfo>
|
</bookinfo>
|
||||||
|
|
||||||
<chapter id="aboutthisdoc">
|
<chapter id="aboutthisdoc">
|
||||||
<?dbhtml filename="about.html"?>
|
<?dbhtml filename="aboutthis.html"?>
|
||||||
<title>About this document</title>
|
<title>About this document</title>
|
||||||
|
|
||||||
<sect1 id="copyright">
|
|
||||||
<?dbhtml filename="copyright.html"?>
|
|
||||||
<title>Copyright and License</title>
|
|
||||||
<para>
|
|
||||||
Copyright (c) 2006 by Hans-Jürgen Koch.</para>
|
|
||||||
<para>
|
|
||||||
This documentation is Free Software licensed under the terms of the
|
|
||||||
GPL version 2.
|
|
||||||
</para>
|
|
||||||
</sect1>
|
|
||||||
|
|
||||||
<sect1 id="translations">
|
<sect1 id="translations">
|
||||||
<?dbhtml filename="translations.html"?>
|
<?dbhtml filename="translations.html"?>
|
||||||
<title>Translations</title>
|
<title>Translations</title>
|
||||||
|
@ -189,6 +196,30 @@ interested in translating it, please email me
|
||||||
represents the total interrupt count. You can use this number
|
represents the total interrupt count. You can use this number
|
||||||
to figure out if you missed some interrupts.
|
to figure out if you missed some interrupts.
|
||||||
</para>
|
</para>
|
||||||
|
<para>
|
||||||
|
For some hardware that has more than one interrupt source internally,
|
||||||
|
but not separate IRQ mask and status registers, there might be
|
||||||
|
situations where userspace cannot determine what the interrupt source
|
||||||
|
was if the kernel handler disables them by writing to the chip's IRQ
|
||||||
|
register. In such a case, the kernel has to disable the IRQ completely
|
||||||
|
to leave the chip's register untouched. Now the userspace part can
|
||||||
|
determine the cause of the interrupt, but it cannot re-enable
|
||||||
|
interrupts. Another corner case is chips where re-enabling interrupts
|
||||||
|
is a read-modify-write operation to a combined IRQ status/acknowledge
|
||||||
|
register. This would be racy if a new interrupt occurred
|
||||||
|
simultaneously.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
To address these problems, UIO also implements a write() function. It
|
||||||
|
is normally not used and can be ignored for hardware that has only a
|
||||||
|
single interrupt source or has separate IRQ mask and status registers.
|
||||||
|
If you need it, however, a write to <filename>/dev/uioX</filename>
|
||||||
|
will call the <function>irqcontrol()</function> function implemented
|
||||||
|
by the driver. You have to write a 32-bit value that is usually either
|
||||||
|
0 or 1 to disable or enable interrupts. If a driver does not implement
|
||||||
|
<function>irqcontrol()</function>, <function>write()</function> will
|
||||||
|
return with <varname>-ENOSYS</varname>.
|
||||||
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
To handle interrupts properly, your custom kernel module can
|
To handle interrupts properly, your custom kernel module can
|
||||||
|
@ -362,6 +393,14 @@ device is actually used.
|
||||||
<function>open()</function>, you will probably also want a custom
|
<function>open()</function>, you will probably also want a custom
|
||||||
<function>release()</function> function.
|
<function>release()</function> function.
|
||||||
</para></listitem>
|
</para></listitem>
|
||||||
|
|
||||||
|
<listitem><para>
|
||||||
|
<varname>int (*irqcontrol)(struct uio_info *info, s32 irq_on)
|
||||||
|
</varname>: Optional. If you need to be able to enable or disable
|
||||||
|
interrupts from userspace by writing to <filename>/dev/uioX</filename>,
|
||||||
|
you can implement this function. The parameter <varname>irq_on</varname>
|
||||||
|
will be 0 to disable interrupts and 1 to enable them.
|
||||||
|
</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
|
|
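The write() interface described in the UIO hunks above can be exercised from userspace roughly as follows. This is a minimal sketch, not code from the commit: uio_irq_set() is a hypothetical helper name, and the caller is assumed to have already opened /dev/uioX for writing. Only the POSIX write() call is real API; the 32-bit 0/1 value follows the description in the text.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Sketch: ask a UIO driver to enable (1) or disable (0) interrupts by
 * writing one 32-bit value to an already-open /dev/uioX descriptor.
 * uio_irq_set() is a hypothetical helper name, not part of any UIO API.
 * Returns 0 on success, -1 if write() fails or is short; on failure
 * errno is left as set by write() (for example ENOSYS when the driver
 * does not implement irqcontrol()). */
static int uio_irq_set(int fd, int enable)
{
	int32_t irq_on = enable ? 1 : 0;
	ssize_t n;

	assert(fd >= 0);	/* caller must pass an open descriptor */
	n = write(fd, &irq_on, sizeof(irq_on));

	return (n == (ssize_t)sizeof(irq_on)) ? 0 : -1;
}
```

A typical caller would read() the interrupt count from the device, service the hardware, and then call uio_irq_set(fd, 1) to re-enable interrupts.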
|
@ -358,7 +358,7 @@ Here is a list of some of the different kernel trees available:
|
||||||
- pcmcia, Dominik Brodowski <linux@dominikbrodowski.net>
|
- pcmcia, Dominik Brodowski <linux@dominikbrodowski.net>
|
||||||
git.kernel.org:/pub/scm/linux/kernel/git/brodo/pcmcia-2.6.git
|
git.kernel.org:/pub/scm/linux/kernel/git/brodo/pcmcia-2.6.git
|
||||||
|
|
||||||
- SCSI, James Bottomley <James.Bottomley@SteelEye.com>
|
- SCSI, James Bottomley <James.Bottomley@hansenpartnership.com>
|
||||||
git.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git
|
git.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git
|
||||||
|
|
||||||
- x86, Ingo Molnar <mingo@elte.hu>
|
- x86, Ingo Molnar <mingo@elte.hu>
|
||||||
|
@ -377,7 +377,7 @@ Bug Reporting
|
||||||
bugzilla.kernel.org is where the Linux kernel developers track kernel
|
bugzilla.kernel.org is where the Linux kernel developers track kernel
|
||||||
bugs. Users are encouraged to report all bugs that they find in this
|
bugs. Users are encouraged to report all bugs that they find in this
|
||||||
tool. For details on how to use the kernel bugzilla, please see:
|
tool. For details on how to use the kernel bugzilla, please see:
|
||||||
http://test.kernel.org/bugzilla/faq.html
|
http://bugzilla.kernel.org/page.cgi?id=faq.html
|
||||||
|
|
||||||
The file REPORTING-BUGS in the main kernel source directory has a good
|
The file REPORTING-BUGS in the main kernel source directory has a good
|
||||||
template for how to report a possible kernel bug, and details what kind
|
template for how to report a possible kernel bug, and details what kind
|
||||||
|
|
|
@ -1,17 +1,26 @@
|
||||||
|
ChangeLog:
|
||||||
|
Started by Ingo Molnar <mingo@redhat.com>
|
||||||
|
Update by Max Krasnyansky <maxk@qualcomm.com>
|
||||||
|
|
||||||
SMP IRQ affinity, started by Ingo Molnar <mingo@redhat.com>
|
SMP IRQ affinity
|
||||||
|
|
||||||
|
|
||||||
/proc/irq/IRQ#/smp_affinity specifies which target CPUs are permitted
|
/proc/irq/IRQ#/smp_affinity specifies which target CPUs are permitted
|
||||||
for a given IRQ source. It's a bitmask of allowed CPUs. It's not allowed
|
for a given IRQ source. It's a bitmask of allowed CPUs. It's not allowed
|
||||||
to turn off all CPUs, and if an IRQ controller does not support IRQ
|
to turn off all CPUs, and if an IRQ controller does not support IRQ
|
||||||
affinity then the value will not change from the default 0xffffffff.
|
affinity then the value will not change from the default 0xffffffff.
|
||||||
|
|
||||||
Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting
|
/proc/irq/default_smp_affinity specifies the default affinity mask that applies
|
||||||
the IRQ to CPU4-7 (this is an 8-CPU SMP box):
|
to all non-active IRQs. Once an IRQ is allocated/activated, its affinity bitmask
|
||||||
|
will be set to the default mask. It can then be changed as described above.
|
||||||
|
The default mask is 0xffffffff.
|
||||||
|
|
||||||
|
Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting
|
||||||
|
it to CPU4-7 (this is an 8-CPU SMP box):
|
||||||
|
|
||||||
|
[root@moon 44]# cd /proc/irq/44
|
||||||
[root@moon 44]# cat smp_affinity
|
[root@moon 44]# cat smp_affinity
|
||||||
ffffffff
|
ffffffff
|
||||||
|
|
||||||
[root@moon 44]# echo 0f > smp_affinity
|
[root@moon 44]# echo 0f > smp_affinity
|
||||||
[root@moon 44]# cat smp_affinity
|
[root@moon 44]# cat smp_affinity
|
||||||
0000000f
|
0000000f
|
||||||
|
@ -21,17 +30,27 @@ PING hell (195.4.7.3): 56 data bytes
|
||||||
--- hell ping statistics ---
|
--- hell ping statistics ---
|
||||||
6029 packets transmitted, 6027 packets received, 0% packet loss
|
6029 packets transmitted, 6027 packets received, 0% packet loss
|
||||||
round-trip min/avg/max = 0.1/0.1/0.4 ms
|
round-trip min/avg/max = 0.1/0.1/0.4 ms
|
||||||
[root@moon 44]# cat /proc/interrupts | grep 44:
|
[root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
|
||||||
44: 0 1785 1785 1783 1783 1
|
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
|
||||||
1 0 IO-APIC-level eth1
|
44: 1068 1785 1785 1783 0 0 0 0 IO-APIC-level eth1
|
||||||
|
|
||||||
|
As can be seen from the line above IRQ44 was delivered only to the first four
|
||||||
|
processors (0-3).
|
||||||
|
Now let's restrict that IRQ to CPU(4-7).
|
||||||
|
|
||||||
[root@moon 44]# echo f0 > smp_affinity
|
[root@moon 44]# echo f0 > smp_affinity
|
||||||
|
[root@moon 44]# cat smp_affinity
|
||||||
|
000000f0
|
||||||
[root@moon 44]# ping -f h
|
[root@moon 44]# ping -f h
|
||||||
PING hell (195.4.7.3): 56 data bytes
|
PING hell (195.4.7.3): 56 data bytes
|
||||||
..
|
..
|
||||||
--- hell ping statistics ---
|
--- hell ping statistics ---
|
||||||
2779 packets transmitted, 2777 packets received, 0% packet loss
|
2779 packets transmitted, 2777 packets received, 0% packet loss
|
||||||
round-trip min/avg/max = 0.1/0.5/585.4 ms
|
round-trip min/avg/max = 0.1/0.5/585.4 ms
|
||||||
[root@moon 44]# cat /proc/interrupts | grep 44:
|
[root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
|
||||||
44: 1068 1785 1785 1784 1784 1069 1070 1069 IO-APIC-level eth1
|
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
|
||||||
[root@moon 44]#
|
44: 1068 1785 1785 1783 1784 1069 1070 1069 IO-APIC-level eth1
|
||||||
|
|
||||||
|
This time around IRQ44 was delivered only to the last four processors.
|
||||||
|
i.e. the counters for CPU0-3 did not change.
|
||||||
|
|
||||||
|
|
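The masks written to smp_affinity in the example above (0f for CPU0-3, f0 for CPU4-7) are plain bitmasks with bit N standing for CPU N. A small sketch of how such a mask could be computed is shown here; cpu_range_mask() is a hypothetical helper introduced for illustration only, and it assumes a 32-bit mask as in the 0xffffffff default quoted in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build an smp_affinity-style bitmask covering
 * CPUs first..last inclusive (bit N set means CPU N is allowed).
 * Assumes first <= last <= 31, i.e. a 32-bit mask. */
static uint32_t cpu_range_mask(unsigned int first, unsigned int last)
{
	uint32_t upto_last;
	uint32_t below_first;

	assert(first <= last && last <= 31);

	/* All bits 0..last; shifting by 32 is undefined, so special-case it. */
	upto_last = (last == 31) ? 0xffffffffu : ((1u << (last + 1)) - 1u);
	/* All bits strictly below first. */
	below_first = (1u << first) - 1u;

	return upto_last & ~below_first;
}
```

Printing the result with a "%x" format gives the hexadecimal string that would be echoed into smp_affinity.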
|
@ -48,7 +48,7 @@ IOVA generation is pretty generic. We used the same technique as vmalloc()
|
||||||
but these are not global address spaces, but separate for each domain.
|
but these are not global address spaces, but separate for each domain.
|
||||||
Different DMA engines may support different number of domains.
|
Different DMA engines may support different number of domains.
|
||||||
|
|
||||||
We also allocate gaurd pages with each mapping, so we can attempt to catch
|
We also allocate guard pages with each mapping, so we can attempt to catch
|
||||||
any overflow that might happen.
|
any overflow that might happen.
|
||||||
|
|
||||||
|
|
||||||
|
@ -112,4 +112,4 @@ TBD
|
||||||
|
|
||||||
- For compatibility testing, could use unity map domain for all devices, just
|
- For compatibility testing, could use unity map domain for all devices, just
|
||||||
provide a 1-1 for all useful memory under a single domain for all devices.
|
provide a 1-1 for all useful memory under a single domain for all devices.
|
||||||
- API for paravirt ops for abstracting functionlity for VMM folks.
|
- API for paravirt ops for abstracting functionality for VMM folks.
|
||||||
|
|
|
@ -93,6 +93,9 @@ Since NMI handlers disable preemption, synchronize_sched() is guaranteed
|
||||||
not to return until all ongoing NMI handlers exit. It is therefore safe
|
not to return until all ongoing NMI handlers exit. It is therefore safe
|
||||||
to free up the handler's data as soon as synchronize_sched() returns.
|
to free up the handler's data as soon as synchronize_sched() returns.
|
||||||
|
|
||||||
|
Important note: for this to work, the architecture in question must
|
||||||
|
invoke irq_enter() and irq_exit() on NMI entry and exit, respectively.
|
||||||
|
|
||||||
|
|
||||||
Answer to Quick Quiz
|
Answer to Quick Quiz
|
||||||
|
|
||||||
|
|
|
@ -52,6 +52,10 @@ of each iteration. Unfortunately, chaotic relaxation requires highly
|
||||||
structured data, such as the matrices used in scientific programs, and
|
structured data, such as the matrices used in scientific programs, and
|
||||||
is thus inapplicable to most data structures in operating-system kernels.
|
is thus inapplicable to most data structures in operating-system kernels.
|
||||||
|
|
||||||
|
In 1992, Henry (now Alexia) Massalin completed a dissertation advising
|
||||||
|
parallel programmers to defer processing when feasible to simplify
|
||||||
|
synchronization. RCU makes extremely heavy use of this advice.
|
||||||
|
|
||||||
In 1993, Jacobson [Jacobson93] verbally described what is perhaps the
|
In 1993, Jacobson [Jacobson93] verbally described what is perhaps the
|
||||||
simplest deferred-free technique: simply waiting a fixed amount of time
|
simplest deferred-free technique: simply waiting a fixed amount of time
|
||||||
before freeing blocks awaiting deferred free. Jacobson did not describe
|
before freeing blocks awaiting deferred free. Jacobson did not describe
|
||||||
|
@ -138,6 +142,13 @@ blocking in read-side critical sections appeared [PaulEMcKenney2006c],
|
||||||
Robert Olsson described an RCU-protected trie-hash combination
|
Robert Olsson described an RCU-protected trie-hash combination
|
||||||
[RobertOlsson2006a].
|
[RobertOlsson2006a].
|
||||||
|
|
||||||
|
2007 saw the journal version of the award-winning RCU paper from 2006
|
||||||
|
[ThomasEHart2007a], as well as a paper demonstrating use of Promela
|
||||||
|
and Spin to mechanically verify an optimization to Oleg Nesterov's
|
||||||
|
QRCU [PaulEMcKenney2007QRCUspin], a design document describing
|
||||||
|
preemptible RCU [PaulEMcKenney2007PreemptibleRCU], and the three-part
|
||||||
|
LWN "What is RCU?" series [PaulEMcKenney2007WhatIsRCUFundamentally,
|
||||||
|
PaulEMcKenney2008WhatIsRCUUsage, and PaulEMcKenney2008WhatIsRCUAPI].
|
||||||
|
|
||||||
Bibtex Entries
|
Bibtex Entries
|
||||||
|
|
||||||
|
@ -202,6 +213,20 @@ Bibtex Entries
|
||||||
,Year="1991"
|
,Year="1991"
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@phdthesis{HMassalinPhD
|
||||||
|
,author="H. Massalin"
|
||||||
|
,title="Synthesis: An Efficient Implementation of Fundamental Operating
|
||||||
|
System Services"
|
||||||
|
,school="Columbia University"
|
||||||
|
,address="New York, NY"
|
||||||
|
,year="1992"
|
||||||
|
,annotation="
|
||||||
|
Mondo optimizing compiler.
|
||||||
|
Wait-free stuff.
|
||||||
|
Good advice: defer work to avoid synchronization.
|
||||||
|
"
|
||||||
|
}
|
||||||
|
|
||||||
@unpublished{Jacobson93
|
@unpublished{Jacobson93
|
||||||
,author="Van Jacobson"
|
,author="Van Jacobson"
|
||||||
,title="Avoid Read-Side Locking Via Delayed Free"
|
,title="Avoid Read-Side Locking Via Delayed Free"
|
||||||
|
@ -635,3 +660,86 @@ Revised:
|
||||||
"
|
"
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@unpublished{PaulEMcKenney2007PreemptibleRCU
|
||||||
|
,Author="Paul E. McKenney"
|
||||||
|
,Title="The design of preemptible read-copy-update"
|
||||||
|
,month="October"
|
||||||
|
,day="8"
|
||||||
|
,year="2007"
|
||||||
|
,note="Available:
|
||||||
|
\url{http://lwn.net/Articles/253651/}
|
||||||
|
[Viewed October 25, 2007]"
|
||||||
|
,annotation="
|
||||||
|
LWN article describing the design of preemptible RCU.
|
||||||
|
"
|
||||||
|
}
|
||||||
|
|
||||||
|
########################################################################
|
||||||
|
#
|
||||||
|
# "What is RCU?" LWN series.
|
||||||
|
#
|
||||||
|
|
||||||
|
@unpublished{PaulEMcKenney2007WhatIsRCUFundamentally
|
||||||
|
,Author="Paul E. McKenney and Jonathan Walpole"
|
||||||
|
,Title="What is {RCU}, Fundamentally?"
|
||||||
|
,month="December"
|
||||||
|
,day="17"
|
||||||
|
,year="2007"
|
||||||
|
,note="Available:
|
||||||
|
\url{http://lwn.net/Articles/262464/}
|
||||||
|
[Viewed December 27, 2007]"
|
||||||
|
,annotation="
|
||||||
|
Lays out the three basic components of RCU: (1) publish-subscribe,
|
||||||
|
(2) wait for pre-existing readers to complete, and (3) maintain
|
||||||
|
multiple versions.
|
||||||
|
"
|
||||||
|
}
|
||||||
|
|
||||||
|
@unpublished{PaulEMcKenney2008WhatIsRCUUsage
|
||||||
|
,Author="Paul E. McKenney"
|
||||||
|
,Title="What is {RCU}? Part 2: Usage"
|
||||||
|
,month="January"
|
||||||
|
,day="4"
|
||||||
|
,year="2008"
|
||||||
|
,note="Available:
|
||||||
|
\url{http://lwn.net/Articles/263130/}
|
||||||
|
[Viewed January 4, 2008]"
|
||||||
|
,annotation="
|
||||||
|
Lays out six uses of RCU:
|
||||||
|
1. RCU is a Reader-Writer Lock Replacement
|
||||||
|
2. RCU is a Restricted Reference-Counting Mechanism
|
||||||
|
3. RCU is a Bulk Reference-Counting Mechanism
|
||||||
|
4. RCU is a Poor Man's Garbage Collector
|
||||||
|
5. RCU is a Way of Providing Existence Guarantees
|
||||||
|
6. RCU is a Way of Waiting for Things to Finish
|
||||||
|
"
|
||||||
|
}
|
||||||
|
|
||||||
|
@unpublished{PaulEMcKenney2008WhatIsRCUAPI
|
||||||
|
,Author="Paul E. McKenney"
|
||||||
|
,Title="{RCU} part 3: the {RCU} {API}"
|
||||||
|
,month="January"
|
||||||
|
,day="17"
|
||||||
|
,year="2008"
|
||||||
|
,note="Available:
|
||||||
|
\url{http://lwn.net/Articles/264090/}
|
||||||
|
[Viewed January 10, 2008]"
|
||||||
|
,annotation="
|
||||||
|
Gives an overview of the Linux-kernel RCU API and a brief annotated RCU
|
||||||
|
bibliography.
|
||||||
|
"
|
||||||
|
}
|
||||||
|
|
||||||
|
@article{DinakarGuniguntala2008IBMSysJ
|
||||||
|
,author="D. Guniguntala and P. E. McKenney and J. Triplett and J. Walpole"
|
||||||
|
,title="The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with {Linux}"
|
||||||
|
,Year="2008"
|
||||||
|
,Month="April"
|
||||||
|
,journal="IBM Systems Journal"
|
||||||
|
,volume="47"
|
||||||
|
,number="2"
|
||||||
|
,pages="@@-@@"
|
||||||
|
,annotation="
|
||||||
|
RCU, realtime RCU, sleepable RCU, performance.
|
||||||
|
"
|
||||||
|
}
|
||||||
|
|
|
@ -13,10 +13,13 @@ over a rather long period of time, but improvements are always welcome!
|
||||||
detailed performance measurements show that RCU is nonetheless
|
detailed performance measurements show that RCU is nonetheless
|
||||||
the right tool for the job.
|
the right tool for the job.
|
||||||
|
|
||||||
The other exception would be where performance is not an issue,
|
Another exception is where performance is not an issue, and RCU
|
||||||
and RCU provides a simpler implementation. An example of this
|
provides a simpler implementation. An example of this situation
|
||||||
situation is the dynamic NMI code in the Linux 2.6 kernel,
|
is the dynamic NMI code in the Linux 2.6 kernel, at least on
|
||||||
at least on architectures where NMIs are rare.
|
architectures where NMIs are rare.
|
||||||
|
|
||||||
|
Yet another exception is where the low real-time latency of RCU's
|
||||||
|
read-side primitives is critically important.
|
||||||
|
|
||||||
1. Does the update code have proper mutual exclusion?
|
1. Does the update code have proper mutual exclusion?
|
||||||
|
|
||||||
|
@ -39,9 +42,10 @@ over a rather long period of time, but improvements are always welcome!
|
||||||
|
|
||||||
2. Do the RCU read-side critical sections make proper use of
|
2. Do the RCU read-side critical sections make proper use of
|
||||||
rcu_read_lock() and friends? These primitives are needed
|
rcu_read_lock() and friends? These primitives are needed
|
||||||
to suppress preemption (or bottom halves, in the case of
|
to prevent grace periods from ending prematurely, which
|
||||||
rcu_read_lock_bh()) in the read-side critical sections,
|
could result in data being unceremoniously freed out from
|
||||||
and are also an excellent aid to readability.
|
under your read-side code, which can greatly increase the
|
||||||
|
actuarial risk of your kernel.
|
||||||
|
|
||||||
As a rough rule of thumb, any dereference of an RCU-protected
|
As a rough rule of thumb, any dereference of an RCU-protected
|
||||||
pointer must be covered by rcu_read_lock() or rcu_read_lock_bh()
|
pointer must be covered by rcu_read_lock() or rcu_read_lock_bh()
|
||||||
|
@ -54,15 +58,30 @@ over a rather long period of time, but improvements are always welcome!
|
||||||
be running while updates are in progress. There are a number
|
be running while updates are in progress. There are a number
|
||||||
of ways to handle this concurrency, depending on the situation:
|
of ways to handle this concurrency, depending on the situation:
|
||||||
|
|
||||||
a. Make updates appear atomic to readers. For example,
|
a. Use the RCU variants of the list and hlist update
|
||||||
|
primitives to add, remove, and replace elements on an
|
||||||
|
RCU-protected list. Alternatively, use the RCU-protected
|
||||||
|
trees that have been added to the Linux kernel.
|
||||||
|
|
||||||
|
This is almost always the best approach.
|
||||||
|
|
||||||
|
b. Proceed as in (a) above, but also maintain per-element
|
||||||
|
locks (that are acquired by both readers and writers)
|
||||||
|
that guard per-element state. Of course, fields that
|
||||||
|
the readers refrain from accessing can be guarded by the
|
||||||
|
update-side lock.
|
||||||
|
|
||||||
|
This works quite well, also.
|
||||||
|
|
||||||
|
c. Make updates appear atomic to readers. For example,
|
||||||
pointer updates to properly aligned fields will appear
|
pointer updates to properly aligned fields will appear
|
||||||
atomic, as will individual atomic primitives. Operations
|
atomic, as will individual atomic primitives. Operations
|
||||||
performed under a lock and sequences of multiple atomic
|
performed under a lock and sequences of multiple atomic
|
||||||
primitives will -not- appear to be atomic.
|
primitives will -not- appear to be atomic.
|
||||||
|
|
||||||
This is almost always the best approach.
|
This can work, but is starting to get a bit tricky.
|
||||||
|
|
||||||
b. Carefully order the updates and the reads so that
|
d. Carefully order the updates and the reads so that
|
||||||
readers see valid data at all phases of the update.
|
readers see valid data at all phases of the update.
|
||||||
This is often more difficult than it sounds, especially
|
This is often more difficult than it sounds, especially
|
||||||
given modern CPUs' tendency to reorder memory references.
|
given modern CPUs' tendency to reorder memory references.
|
||||||
|
@ -123,18 +142,22 @@ over a rather long period of time, but improvements are always welcome!
|
||||||
when publicizing a pointer to a structure that can
|
when publicizing a pointer to a structure that can
|
||||||
be traversed by an RCU read-side critical section.
|
be traversed by an RCU read-side critical section.
|
||||||
|
|
||||||
5. If call_rcu(), or a related primitive such as call_rcu_bh(),
|
5. If call_rcu(), or a related primitive such as call_rcu_bh() or
|
||||||
is used, the callback function must be written to be called
|
call_rcu_sched(), is used, the callback function must be
|
||||||
from softirq context. In particular, it cannot block.
|
written to be called from softirq context. In particular,
|
||||||
|
it cannot block.
|
||||||
|
|
||||||
6. Since synchronize_rcu() can block, it cannot be called from
|
6. Since synchronize_rcu() can block, it cannot be called from
|
||||||
any sort of irq context.
|
any sort of irq context. Ditto for synchronize_sched() and
|
||||||
|
synchronize_srcu().
|
||||||
|
|
||||||
7. If the updater uses call_rcu(), then the corresponding readers
|
7. If the updater uses call_rcu(), then the corresponding readers
|
||||||
must use rcu_read_lock() and rcu_read_unlock(). If the updater
|
must use rcu_read_lock() and rcu_read_unlock(). If the updater
|
||||||
uses call_rcu_bh(), then the corresponding readers must use
|
uses call_rcu_bh(), then the corresponding readers must use
|
||||||
rcu_read_lock_bh() and rcu_read_unlock_bh(). Mixing things up
|
rcu_read_lock_bh() and rcu_read_unlock_bh(). If the updater
|
||||||
will result in confusion and broken kernels.
|
uses call_rcu_sched(), then the corresponding readers must
|
||||||
|
disable preemption. Mixing things up will result in confusion
|
||||||
|
and broken kernels.
|
||||||
|
|
||||||
One exception to this rule: rcu_read_lock() and rcu_read_unlock()
|
One exception to this rule: rcu_read_lock() and rcu_read_unlock()
|
||||||
may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
|
may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
|
||||||
|
@ -143,9 +166,9 @@ over a rather long period of time, but improvements are always welcome!
|
||||||
such cases is a must, of course! And the jury is still out on
|
such cases is a must, of course! And the jury is still out on
|
||||||
whether the increased speed is worth it.
|
whether the increased speed is worth it.
|
||||||
|
|
||||||
8. Although synchronize_rcu() is a bit slower than is call_rcu(),
|
8. Although synchronize_rcu() is slower than call_rcu(), it
|
||||||
it usually results in simpler code. So, unless update
|
usually results in simpler code. So, unless update performance
|
||||||
performance is critically important or the updaters cannot block,
|
is critically important or the updaters cannot block,
|
||||||
synchronize_rcu() should be used in preference to call_rcu().
|
synchronize_rcu() should be used in preference to call_rcu().
|
||||||
|
|
||||||
An especially important property of the synchronize_rcu()
|
An especially important property of the synchronize_rcu()
|
||||||
|
@ -187,23 +210,23 @@ over a rather long period of time, but improvements are always welcome!
|
||||||
number of updates per grace period.
|
number of updates per grace period.
|
||||||
|
|
||||||
9. All RCU list-traversal primitives, which include
|
9. All RCU list-traversal primitives, which include
|
||||||
list_for_each_rcu(), list_for_each_entry_rcu(),
|
rcu_dereference(), list_for_each_rcu(), list_for_each_entry_rcu(),
|
||||||
list_for_each_continue_rcu(), and list_for_each_safe_rcu(),
|
list_for_each_continue_rcu(), and list_for_each_safe_rcu(),
|
||||||
must be within an RCU read-side critical section. RCU
|
must be either within an RCU read-side critical section or
|
||||||
|
must be protected by appropriate update-side locks. RCU
|
||||||
read-side critical sections are delimited by rcu_read_lock()
|
read-side critical sections are delimited by rcu_read_lock()
|
||||||
and rcu_read_unlock(), or by similar primitives such as
|
and rcu_read_unlock(), or by similar primitives such as
|
||||||
rcu_read_lock_bh() and rcu_read_unlock_bh().
|
rcu_read_lock_bh() and rcu_read_unlock_bh().
|
||||||
|
|
||||||
Use of the _rcu() list-traversal primitives outside of an
|
The reason that it is permissible to use RCU list-traversal
|
||||||
RCU read-side critical section causes no harm other than
|
primitives when the update-side lock is held is that doing so
|
||||||
a slight performance degradation on Alpha CPUs. It can
|
can be quite helpful in reducing code bloat when common code is
|
||||||
also be quite helpful in reducing code bloat when common
|
shared between readers and updaters.
|
||||||
code is shared between readers and updaters.
|
|
||||||
|
|
||||||
10. Conversely, if you are in an RCU read-side critical section,
|
10. Conversely, if you are in an RCU read-side critical section,
|
||||||
you -must- use the "_rcu()" variants of the list macros.
|
and you don't hold the appropriate update-side lock, you -must-
|
||||||
Failing to do so will break Alpha and confuse people reading
|
use the "_rcu()" variants of the list macros. Failing to do so
|
||||||
your code.
|
will break Alpha and confuse people reading your code.
|
||||||
|
|
||||||
11. Note that synchronize_rcu() -only- guarantees to wait until
|
11. Note that synchronize_rcu() -only- guarantees to wait until
|
||||||
all currently executing rcu_read_lock()-protected RCU read-side
|
all currently executing rcu_read_lock()-protected RCU read-side
|
||||||
|
@ -230,6 +253,14 @@ over a rather long period of time, but improvements are always welcome!
|
||||||
must use whatever locking or other synchronization is required
|
must use whatever locking or other synchronization is required
|
||||||
to safely access and/or modify that data structure.
|
to safely access and/or modify that data structure.
|
||||||
|
|
||||||
|
RCU callbacks are -usually- executed on the same CPU that executed
|
||||||
|
the corresponding call_rcu(), call_rcu_bh(), or call_rcu_sched(),
|
||||||
|
but are by -no- means guaranteed to be. For example, if a given
|
||||||
|
CPU goes offline while having an RCU callback pending, then that
|
||||||
|
RCU callback will execute on some surviving CPU. (If this was
|
||||||
|
not the case, a self-spawning RCU callback would prevent the
|
||||||
|
victim CPU from ever going offline.)
|
||||||
|
|
||||||
14. SRCU (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu())
|
14. SRCU (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu())
|
||||||
may only be invoked from process context. Unlike other forms of
|
may only be invoked from process context. Unlike other forms of
|
||||||
RCU, it -is- permissible to block in an SRCU read-side critical
|
RCU, it -is- permissible to block in an SRCU read-side critical
|
||||||
|
|
|
@ -10,23 +10,30 @@ status messages via printk(), which can be examined via the dmesg
|
||||||
command (perhaps grepping for "torture"). The test is started
|
command (perhaps grepping for "torture"). The test is started
|
||||||
when the module is loaded, and stops when the module is unloaded.
|
when the module is loaded, and stops when the module is unloaded.
|
||||||
|
|
||||||
However, actually setting this config option to "y" results in the system
|
CONFIG_RCU_TORTURE_TEST_RUNNABLE
|
||||||
running the test immediately upon boot, and ending only when the system
|
|
||||||
is taken down. Normally, one will instead want to build the system
|
It is also possible to specify CONFIG_RCU_TORTURE_TEST=y, which will
|
||||||
with CONFIG_RCU_TORTURE_TEST=m and to use modprobe and rmmod to control
|
result in the tests being loaded into the base kernel. In this case,
|
||||||
the test, perhaps using a script similar to the one shown at the end of
|
the CONFIG_RCU_TORTURE_TEST_RUNNABLE config option is used to specify
|
||||||
this document. Note that you will need CONFIG_MODULE_UNLOAD in order
|
whether the RCU torture tests are to be started immediately during
|
||||||
to be able to end the test.
|
boot or whether the /proc/sys/kernel/rcutorture_runnable file is used
|
||||||
|
to enable them. This /proc file can be used to repeatedly pause and
|
||||||
|
restart the tests, regardless of the initial state specified by the
|
||||||
|
CONFIG_RCU_TORTURE_TEST_RUNNABLE config option.
|
||||||
|
|
||||||
|
You will normally -not- want to start the RCU torture tests during boot
|
||||||
|
(and thus the default is CONFIG_RCU_TORTURE_TEST_RUNNABLE=n), but doing
|
||||||
|
this can sometimes be useful in finding boot-time bugs.
|
||||||
|
|
||||||
|
|
||||||
MODULE PARAMETERS
|
MODULE PARAMETERS
|
||||||
|
|
||||||
This module has the following parameters:
|
This module has the following parameters:
|
||||||
|
|
||||||
nreaders This is the number of RCU reading threads supported.
|
irqreaders Says to invoke RCU readers from irq level. This is currently
|
||||||
The default is twice the number of CPUs. Why twice?
|
done via timers. Defaults to "1" for variants of RCU that
|
||||||
To properly exercise RCU implementations with preemptible
|
permit this. (Or, more accurately, variants of RCU that do
|
||||||
read-side critical sections.
|
-not- permit this know to ignore this variable.)
|
||||||
|
|
||||||
nfakewriters This is the number of RCU fake writer threads to run. Fake
|
nfakewriters This is the number of RCU fake writer threads to run. Fake
|
||||||
writer threads repeatedly use the synchronous "wait for
|
writer threads repeatedly use the synchronous "wait for
|
||||||
|
@ -37,6 +44,16 @@ nfakewriters This is the number of RCU fake writer threads to run. Fake
|
||||||
to trigger special cases caused by multiple writers, such as
|
to trigger special cases caused by multiple writers, such as
|
||||||
the synchronize_srcu() early return optimization.
|
the synchronize_srcu() early return optimization.
|
||||||
|
|
||||||
|
nreaders This is the number of RCU reading threads supported.
|
||||||
|
The default is twice the number of CPUs. Why twice?
|
||||||
|
To properly exercise RCU implementations with preemptible
|
||||||
|
read-side critical sections.
|
||||||
|
|
||||||
|
shuffle_interval
|
||||||
|
The number of seconds to keep the test threads affinitied
|
||||||
|
to a particular subset of the CPUs, defaults to 3 seconds.
|
||||||
|
Used in conjunction with test_no_idle_hz.
|
||||||
|
|
||||||
stat_interval The number of seconds between output of torture
|
stat_interval The number of seconds between output of torture
|
||||||
statistics (via printk()). Regardless of the interval,
|
statistics (via printk()). Regardless of the interval,
|
||||||
statistics are printed when the module is unloaded.
|
statistics are printed when the module is unloaded.
|
||||||
|
@ -44,10 +61,11 @@ stat_interval The number of seconds between output of torture
|
||||||
be printed -only- when the module is unloaded, and this
|
be printed -only- when the module is unloaded, and this
|
||||||
is the default.
|
is the default.
|
||||||
|
|
||||||
shuffle_interval
|
stutter The length of time to run the test before pausing for this
|
||||||
The number of seconds to keep the test threads affinitied
|
same period of time. Defaults to "stutter=5", so as
|
||||||
to a particular subset of the CPUs, defaults to 5 seconds.
|
to run and pause for (roughly) five-second intervals.
|
||||||
Used in conjunction with test_no_idle_hz.
|
Specifying "stutter=0" causes the test to run continuously
|
||||||
|
without pausing, which is the old default behavior.
|
||||||
|
|
||||||
test_no_idle_hz Whether or not to test the ability of RCU to operate in
|
test_no_idle_hz Whether or not to test the ability of RCU to operate in
|
||||||
a kernel that disables the scheduling-clock interrupt to
|
a kernel that disables the scheduling-clock interrupt to
|
||||||
|
|
|
@ -1,3 +1,11 @@
|
||||||
|
Please note that the "What is RCU?" LWN series is an excellent place
|
||||||
|
to start learning about RCU:
|
||||||
|
|
||||||
|
1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
|
||||||
|
2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
|
||||||
|
3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
|
||||||
|
|
||||||
|
|
||||||
What is RCU?
|
What is RCU?
|
||||||
|
|
||||||
RCU is a synchronization mechanism that was added to the Linux kernel
|
RCU is a synchronization mechanism that was added to the Linux kernel
|
||||||
|
@ -772,26 +780,18 @@ Linux-kernel source code, but it helps to have a full list of the
|
||||||
APIs, since there does not appear to be a way to categorize them
|
APIs, since there does not appear to be a way to categorize them
|
||||||
in docbook. Here is the list, by category.
|
in docbook. Here is the list, by category.
|
||||||
|
|
||||||
Markers for RCU read-side critical sections:
|
|
||||||
|
|
||||||
rcu_read_lock
|
|
||||||
rcu_read_unlock
|
|
||||||
rcu_read_lock_bh
|
|
||||||
rcu_read_unlock_bh
|
|
||||||
srcu_read_lock
|
|
||||||
srcu_read_unlock
|
|
||||||
|
|
||||||
RCU pointer/list traversal:
|
RCU pointer/list traversal:
|
||||||
|
|
||||||
rcu_dereference
|
rcu_dereference
|
||||||
list_for_each_rcu (to be deprecated in favor of
|
|
||||||
list_for_each_entry_rcu)
|
|
||||||
list_for_each_entry_rcu
|
list_for_each_entry_rcu
|
||||||
list_for_each_continue_rcu (to be deprecated in favor of new
|
|
||||||
list_for_each_entry_continue_rcu)
|
|
||||||
hlist_for_each_entry_rcu
|
hlist_for_each_entry_rcu
|
||||||
|
|
||||||
RCU pointer update:
|
list_for_each_rcu (to be deprecated in favor of
|
||||||
|
list_for_each_entry_rcu)
|
||||||
|
list_for_each_continue_rcu (to be deprecated in favor of new
|
||||||
|
list_for_each_entry_continue_rcu)
|
||||||
|
|
||||||
|
RCU pointer/list update:
|
||||||
|
|
||||||
rcu_assign_pointer
|
rcu_assign_pointer
|
||||||
list_add_rcu
|
list_add_rcu
|
||||||
|
@ -799,16 +799,36 @@ RCU pointer update:
|
||||||
list_del_rcu
|
list_del_rcu
|
||||||
list_replace_rcu
|
list_replace_rcu
|
||||||
hlist_del_rcu
|
hlist_del_rcu
|
||||||
|
hlist_add_after_rcu
|
||||||
|
hlist_add_before_rcu
|
||||||
hlist_add_head_rcu
|
hlist_add_head_rcu
|
||||||
|
hlist_replace_rcu
|
||||||
|
list_splice_init_rcu()
|
||||||
|
|
||||||
RCU grace period:
|
RCU: Critical sections Grace period Barrier
|
||||||
|
|
||||||
|
rcu_read_lock synchronize_net rcu_barrier
|
||||||
|
rcu_read_unlock synchronize_rcu
|
||||||
|
call_rcu
|
||||||
|
|
||||||
|
|
||||||
|
bh: Critical sections Grace period Barrier
|
||||||
|
|
||||||
|
rcu_read_lock_bh call_rcu_bh rcu_barrier_bh
|
||||||
|
rcu_read_unlock_bh
|
||||||
|
|
||||||
|
|
||||||
|
sched: Critical sections Grace period Barrier
|
||||||
|
|
||||||
|
[preempt_disable] synchronize_sched rcu_barrier_sched
|
||||||
|
[and friends] call_rcu_sched
|
||||||
|
|
||||||
|
|
||||||
|
SRCU: Critical sections Grace period Barrier
|
||||||
|
|
||||||
|
srcu_read_lock synchronize_srcu N/A
|
||||||
|
srcu_read_unlock
|
||||||
|
|
||||||
synchronize_net
|
|
||||||
synchronize_sched
|
|
||||||
synchronize_rcu
|
|
||||||
synchronize_srcu
|
|
||||||
call_rcu
|
|
||||||
call_rcu_bh
|
|
||||||
|
|
||||||
See the comment headers in the source code (or the docbook generated
|
See the comment headers in the source code (or the docbook generated
|
||||||
from them) for more information.
|
from them) for more information.
|
||||||
|
|
|
@ -528,7 +528,33 @@ See more details on the proper patch format in the following
|
||||||
references.
|
references.
|
||||||
|
|
||||||
|
|
||||||
|
16) Sending "git pull" requests (from Linus emails)
|
||||||
|
|
||||||
|
Please write the git repo address and branch name alone on the same line
|
||||||
|
so that I can't even by mistake pull from the wrong branch, and so
|
||||||
|
that a triple-click just selects the whole thing.
|
||||||
|
|
||||||
|
So the proper format is something along the lines of:
|
||||||
|
|
||||||
|
"Please pull from
|
||||||
|
|
||||||
|
git://jdelvare.pck.nerim.net/jdelvare-2.6 i2c-for-linus
|
||||||
|
|
||||||
|
to get these changes:"
|
||||||
|
|
||||||
|
so that I don't have to hunt-and-peck for the address and inevitably
|
||||||
|
get it wrong (actually, I've only gotten it wrong a few times, and
|
||||||
|
checking against the diffstat tells me when I get it wrong, but I'm
|
||||||
|
just a lot more comfortable when I don't have to "look for" the right
|
||||||
|
thing to pull, and double-check that I have the right branch-name).
|
||||||
|
|
||||||
|
|
||||||
|
Please use "git diff -M --stat --summary" to generate the diffstat:
|
||||||
|
the -M enables rename detection, and the summary enables a summary of
|
||||||
|
new/deleted or renamed files.
|
||||||
|
|
||||||
|
With rename detection, the statistics are rather different [...]
|
||||||
|
because git will notice that a fair number of the changes are renames.
|
||||||
|
|
||||||
-----------------------------------
|
-----------------------------------
|
||||||
SECTION 2 - HINTS, TIPS, AND TRICKS
|
SECTION 2 - HINTS, TIPS, AND TRICKS
|
||||||
|
|
|
@ -11,6 +11,7 @@ the delays experienced by a task while
|
||||||
a) waiting for a CPU (while being runnable)
|
a) waiting for a CPU (while being runnable)
|
||||||
b) completion of synchronous block I/O initiated by the task
|
b) completion of synchronous block I/O initiated by the task
|
||||||
c) swapping in pages
|
c) swapping in pages
|
||||||
|
d) memory reclaim
|
||||||
|
|
||||||
and makes these statistics available to userspace through
|
and makes these statistics available to userspace through
|
||||||
the taskstats interface.
|
the taskstats interface.
|
||||||
|
@ -41,7 +42,7 @@ this structure. See
|
||||||
include/linux/taskstats.h
|
include/linux/taskstats.h
|
||||||
for a description of the fields pertaining to delay accounting.
|
for a description of the fields pertaining to delay accounting.
|
||||||
It will generally be in the form of counters returning the cumulative
|
It will generally be in the form of counters returning the cumulative
|
||||||
delay seen for cpu, sync block I/O, swapin etc.
|
delay seen for cpu, sync block I/O, swapin, memory reclaim etc.
|
||||||
|
|
||||||
Taking the difference of two successive readings of a given
|
Taking the difference of two successive readings of a given
|
||||||
counter (say cpu_delay_total) for a task will give the delay
|
counter (say cpu_delay_total) for a task will give the delay
|
||||||
|
@ -94,7 +95,9 @@ CPU count real total virtual total delay total
|
||||||
7876 92005750 100000000 24001500
|
7876 92005750 100000000 24001500
|
||||||
IO count delay total
|
IO count delay total
|
||||||
0 0
|
0 0
|
||||||
MEM count delay total
|
SWAP count delay total
|
||||||
|
0 0
|
||||||
|
RECLAIM count delay total
|
||||||
0 0
|
0 0
|
||||||
|
|
||||||
Get delays seen in executing a given simple command
|
Get delays seen in executing a given simple command
|
||||||
|
@ -108,5 +111,7 @@ CPU count real total virtual total delay total
|
||||||
6 4000250 4000000 0
|
6 4000250 4000000 0
|
||||||
IO count delay total
|
IO count delay total
|
||||||
0 0
|
0 0
|
||||||
MEM count delay total
|
SWAP count delay total
|
||||||
|
0 0
|
||||||
|
RECLAIM count delay total
|
||||||
0 0
|
0 0
|
||||||
|
|
|
@ -196,14 +196,18 @@ void print_delayacct(struct taskstats *t)
|
||||||
" %15llu%15llu%15llu%15llu\n"
|
" %15llu%15llu%15llu%15llu\n"
|
||||||
"IO %15s%15s\n"
|
"IO %15s%15s\n"
|
||||||
" %15llu%15llu\n"
|
" %15llu%15llu\n"
|
||||||
"MEM %15s%15s\n"
|
"SWAP %15s%15s\n"
|
||||||
|
" %15llu%15llu\n"
|
||||||
|
"RECLAIM %12s%15s\n"
|
||||||
" %15llu%15llu\n",
|
" %15llu%15llu\n",
|
||||||
"count", "real total", "virtual total", "delay total",
|
"count", "real total", "virtual total", "delay total",
|
||||||
t->cpu_count, t->cpu_run_real_total, t->cpu_run_virtual_total,
|
t->cpu_count, t->cpu_run_real_total, t->cpu_run_virtual_total,
|
||||||
t->cpu_delay_total,
|
t->cpu_delay_total,
|
||||||
"count", "delay total",
|
"count", "delay total",
|
||||||
t->blkio_count, t->blkio_delay_total,
|
t->blkio_count, t->blkio_delay_total,
|
||||||
"count", "delay total", t->swapin_count, t->swapin_delay_total);
|
"count", "delay total", t->swapin_count, t->swapin_delay_total,
|
||||||
|
"count", "delay total",
|
||||||
|
t->freepages_count, t->freepages_delay_total);
|
||||||
}
|
}
|
||||||
|
|
||||||
void task_context_switch_counts(struct taskstats *t)
|
void task_context_switch_counts(struct taskstats *t)
|
||||||
|
|
|
@ -6,7 +6,7 @@ This document contains an explanation of the struct taskstats fields.
|
||||||
There are three different groups of fields in the struct taskstats:
|
There are three different groups of fields in the struct taskstats:
|
||||||
|
|
||||||
1) Common and basic accounting fields
|
1) Common and basic accounting fields
|
||||||
If CONFIG_TASKSTATS is set, the taskstats inteface is enabled and
|
If CONFIG_TASKSTATS is set, the taskstats interface is enabled and
|
||||||
the common fields and basic accounting fields are collected for
|
the common fields and basic accounting fields are collected for
|
||||||
delivery at do_exit() of a task.
|
delivery at do_exit() of a task.
|
||||||
2) Delay accounting fields
|
2) Delay accounting fields
|
||||||
|
@ -24,6 +24,10 @@ There are three different groups of fields in the struct taskstats:
|
||||||
|
|
||||||
4) Per-task and per-thread context switch count statistics
|
4) Per-task and per-thread context switch count statistics
|
||||||
|
|
||||||
|
5) Time accounting for SMT machines
|
||||||
|
|
||||||
|
6) Extended delay accounting fields for memory reclaim
|
||||||
|
|
||||||
Future extension should add fields to the end of the taskstats struct, and
|
Future extension should add fields to the end of the taskstats struct, and
|
||||||
should not change the relative position of each field within the struct.
|
should not change the relative position of each field within the struct.
|
||||||
|
|
||||||
|
@ -164,4 +168,13 @@ struct taskstats {
|
||||||
__u64 nvcsw; /* Context voluntary switch counter */
|
__u64 nvcsw; /* Context voluntary switch counter */
|
||||||
__u64 nivcsw; /* Context involuntary switch counter */
|
__u64 nivcsw; /* Context involuntary switch counter */
|
||||||
|
|
||||||
|
5) Time accounting for SMT machines
|
||||||
|
__u64 ac_utimescaled; /* utime scaled on frequency etc */
|
||||||
|
__u64 ac_stimescaled; /* stime scaled on frequency etc */
|
||||||
|
__u64 cpu_scaled_run_real_total; /* scaled cpu_run_real_total */
|
||||||
|
|
||||||
|
6) Extended delay accounting fields for memory reclaim
|
||||||
|
/* Delay waiting for memory reclaim */
|
||||||
|
__u64 freepages_count;
|
||||||
|
__u64 freepages_delay_total;
|
||||||
}
|
}
|
||||||
|
|
|
@ -138,14 +138,8 @@ So, what's changed?
|
||||||
|
|
||||||
Set active the IRQ edge(s)/level. This replaces the
|
Set active the IRQ edge(s)/level. This replaces the
|
||||||
SA1111 INTPOL manipulation, and the set_GPIO_IRQ_edge()
|
SA1111 INTPOL manipulation, and the set_GPIO_IRQ_edge()
|
||||||
function. Type should be one of the following:
|
function. Type should be one of IRQ_TYPE_xxx defined in
|
||||||
|
<linux/irq.h>
|
||||||
#define IRQT_NOEDGE (0)
|
|
||||||
#define IRQT_RISING (__IRQT_RISEDGE)
|
|
||||||
#define IRQT_FALLING (__IRQT_FALEDGE)
|
|
||||||
#define IRQT_BOTHEDGE (__IRQT_RISEDGE|__IRQT_FALEDGE)
|
|
||||||
#define IRQT_LOW (__IRQT_LOWLVL)
|
|
||||||
#define IRQT_HIGH (__IRQT_HIGHLVL)
|
|
||||||
|
|
||||||
3. set_GPIO_IRQ_edge() is obsolete, and should be replaced by set_irq_type.
|
3. set_GPIO_IRQ_edge() is obsolete, and should be replaced by set_irq_type.
|
||||||
|
|
||||||
|
|
|
@ -3,7 +3,7 @@
|
||||||
===================================
|
===================================
|
||||||
|
|
||||||
License: GPLv2
|
License: GPLv2
|
||||||
Author & Maintainer: Miguel Ojeda Sandonis <maxextreme@gmail.com>
|
Author & Maintainer: Miguel Ojeda Sandonis
|
||||||
Date: 2006-10-27
|
Date: 2006-10-27
|
||||||
|
|
||||||
|
|
||||||
|
@ -22,7 +22,7 @@ Date: 2006-10-27
|
||||||
1. DRIVER INFORMATION
|
1. DRIVER INFORMATION
|
||||||
---------------------
|
---------------------
|
||||||
|
|
||||||
This driver support one cfag12864b display at time.
|
This driver supports a cfag12864b LCD.
|
||||||
|
|
||||||
|
|
||||||
---------------------
|
---------------------
|
||||||
|
|
|
@ -4,7 +4,7 @@
|
||||||
* Description: cfag12864b LCD userspace example program
|
* Description: cfag12864b LCD userspace example program
|
||||||
* License: GPLv2
|
* License: GPLv2
|
||||||
*
|
*
|
||||||
* Author: Copyright (C) Miguel Ojeda Sandonis <maxextreme@gmail.com>
|
* Author: Copyright (C) Miguel Ojeda Sandonis
|
||||||
* Date: 2006-10-31
|
* Date: 2006-10-31
|
||||||
*
|
*
|
||||||
* This program is free software; you can redistribute it and/or modify
|
* This program is free software; you can redistribute it and/or modify
|
||||||
|
|
|
@ -3,7 +3,7 @@
|
||||||
==========================================
|
==========================================
|
||||||
|
|
||||||
License: GPLv2
|
License: GPLv2
|
||||||
Author & Maintainer: Miguel Ojeda Sandonis <maxextreme@gmail.com>
|
Author & Maintainer: Miguel Ojeda Sandonis
|
||||||
Date: 2006-10-27
|
Date: 2006-10-27
|
||||||
|
|
||||||
|
|
||||||
|
@ -21,7 +21,7 @@ Date: 2006-10-27
|
||||||
1. DRIVER INFORMATION
|
1. DRIVER INFORMATION
|
||||||
---------------------
|
---------------------
|
||||||
|
|
||||||
This driver support the ks0108 LCD controller.
|
This driver supports the ks0108 LCD controller.
|
||||||
|
|
||||||
|
|
||||||
---------------------
|
---------------------
|
||||||
|
|
327
Documentation/block/data-integrity.txt
Normal file
|
@ -0,0 +1,327 @@
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
1. INTRODUCTION
|
||||||
|
|
||||||
|
Modern filesystems feature checksumming of data and metadata to
|
||||||
|
protect against data corruption. However, the detection of the
|
||||||
|
corruption is done at read time which could potentially be months
|
||||||
|
after the data was written. At that point the original data that the
|
||||||
|
application tried to write is most likely lost.
|
||||||
|
|
||||||
|
The solution is to ensure that the disk is actually storing what the
|
||||||
|
application meant it to. Recent additions to both the SCSI family
|
||||||
|
protocols (SBC Data Integrity Field, SCC protection proposal) as well
|
||||||
|
as SATA/T13 (External Path Protection) try to remedy this by adding
|
||||||
|
support for appending integrity metadata to an I/O. The integrity
|
||||||
|
metadata (or protection information in SCSI terminology) includes a
|
||||||
|
checksum for each sector as well as an incrementing counter that
|
||||||
|
ensures the individual sectors are written in the right order and,
|
||||||
|
for some protection schemes, that the I/O is written to the right
|
||||||
|
place on disk.
|
||||||
|
|
||||||
|
Current storage controllers and devices implement various protective
|
||||||
|
measures, for instance checksumming and scrubbing. But these
|
||||||
|
technologies are working in their own isolated domains or at best
|
||||||
|
between adjacent nodes in the I/O path. The interesting thing about
|
||||||
|
DIF and the other integrity extensions is that the protection format
|
||||||
|
is well defined and every node in the I/O path can verify the
|
||||||
|
integrity of the I/O and reject it if corruption is detected. This
|
||||||
|
allows not only corruption prevention but also isolation of the point
|
||||||
|
of failure.
|
||||||
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
2. THE DATA INTEGRITY EXTENSIONS
|
||||||
|
|
||||||
|
As written, the protocol extensions only protect the path between
|
||||||
|
controller and storage device. However, many controllers actually
|
||||||
|
allow the operating system to interact with the integrity metadata
|
||||||
|
(IMD). We have been working with several FC/SAS HBA vendors to enable
|
||||||
|
the protection information to be transferred to and from their
|
||||||
|
controllers.
|
||||||
|
|
||||||
|
The SCSI Data Integrity Field works by appending 8 bytes of protection
|
||||||
|
information to each sector. The data + integrity metadata is stored
|
||||||
|
in 520 byte sectors on disk. Data + IMD are interleaved when
|
||||||
|
transferred between the controller and target. The T13 proposal is
|
||||||
|
similar.
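
The sector arithmetic above can be sketched in C. This is a minimal
illustration only: the struct name, field names and helper are hypothetical,
and the 2/2/4 byte split of the 8 bytes follows the usual description of the
T10 DIF tuple (checksum, application tag, incrementing reference counter)
rather than anything stated in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical view of the 8 bytes of protection information appended
 * to each sector.  Field names are illustrative; on the wire the
 * fields are big-endian, which this sketch does not model.
 */
struct dif_tuple {
	uint16_t guard_tag;	/* checksum over the sector's data */
	uint16_t app_tag;	/* tag space owned by the block device owner */
	uint32_t ref_tag;	/* incrementing counter */
};

/* The protection information must be exactly 8 bytes. */
static_assert(sizeof(struct dif_tuple) == 8, "DIF tuple must be 8 bytes");

/* Size of a sector on disk once the protection information is appended. */
static size_t protected_sector_size(size_t data_bytes)
{
	return data_bytes + sizeof(struct dif_tuple);
}
```

With these definitions, protected_sector_size(512) is 520 and
protected_sector_size(4096) is 4104, matching the sector sizes quoted in the
text.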
|
||||||
|
|
||||||
|
Because it is highly inconvenient for operating systems to deal with
|
||||||
|
520 (and 4104) byte sectors, we approached several HBA vendors and
|
||||||
|
encouraged them to allow separation of the data and integrity metadata
|
||||||
|
scatter-gather lists.
|
||||||
|
|
||||||
|
The controller will interleave the buffers on write and split them on
|
||||||
|
read. This means that Linux can DMA the data buffers to and from
|
||||||
|
host memory without changes to the page cache.
|
||||||
|
|
||||||
|
Also, the 16-bit CRC checksum mandated by both the SCSI and SATA specs
|
||||||
|
is somewhat heavy to compute in software. Benchmarks found that
|
||||||
|
calculating this checksum had a significant impact on system
|
||||||
|
performance for a number of workloads. Some controllers allow a
|
||||||
|
lighter-weight checksum to be used when interfacing with the operating
|
||||||
|
system. Emulex, for instance, supports the TCP/IP checksum instead.
|
||||||
|
The IP checksum received from the OS is converted to the 16-bit CRC
|
||||||
|
when writing and vice versa. This allows the integrity metadata to be
|
||||||
|
generated by Linux or the application at very low cost (comparable to
|
||||||
|
software RAID5).
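
To make the cost difference concrete, here is a rough sketch of the two
checksums in plain C. It assumes the "16-bit CRC" is the CRC with the T10 DIF
polynomial 0x8BB7 and that the "IP checksum" is the RFC 1071 one's-complement
sum computed over big-endian 16-bit words. The function names are
hypothetical, and the bit-at-a-time CRC loop is for illustration, not the
kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16 using polynomial 0x8BB7, MSB first, initial value 0. */
static uint16_t crc16_t10dif(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)((uint16_t)buf[i] << 8);
		/* Eight shift/conditional-xor steps per input byte. */
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 0x8000)
				crc = (uint16_t)((crc << 1) ^ 0x8BB7);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}

/* RFC 1071 Internet checksum: one add per 16-bit word, then a fold. */
static uint16_t ip_checksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)			/* odd trailing byte, zero padded */
		sum += (uint32_t)buf[i] << 8;
	while (sum >> 16)		/* fold carries back into 16 bits */
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Check both routines against published test vectors. */
static void checksum_selftest(void)
{
	static const uint8_t rfc1071[] = {
		0x00, 0x01, 0xf2, 0x03, 0xf4, 0xf5, 0xf6, 0xf7
	};

	assert(crc16_t10dif((const uint8_t *)"123456789", 9) == 0xD0DB);
	assert(ip_checksum(rfc1071, sizeof(rfc1071)) == 0x220D);
}
```

The CRC does eight dependent steps per byte unless a lookup table or hardware
support is used, whereas the IP checksum needs a single addition per 16-bit
word. That gap is why the lighter checksum is attractive on the host side.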
|
||||||
|
|
||||||
|
The IP checksum is weaker than the CRC in terms of detecting bit
|
||||||
|
errors. However, the strength is really in the separation of the data
|
||||||
|
buffers and the integrity metadata. These two distinct buffers must
|
||||||
|
match up for an I/O to complete.
|
||||||
|
|
||||||
|
The separation of the data and integrity metadata buffers as well as
|
||||||
|
the choice in checksums is referred to as the Data Integrity
|
||||||
|
Extensions. As these extensions are outside the scope of the protocol
|
||||||
|
bodies (T10, T13), Oracle and its partners are trying to standardize
|
||||||
|
them within the Storage Networking Industry Association.
|
||||||
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
3. KERNEL CHANGES
|
||||||
|
|
||||||
|
The data integrity framework in Linux enables protection information
|
||||||
|
to be pinned to I/Os and sent to/received from controllers that
|
||||||
|
support it.
|
||||||
|
|
||||||
|
The advantage to the integrity extensions in SCSI and SATA is that
|
||||||
|
they enable us to protect the entire path from application to storage
|
||||||
|
device. However, at the same time this is also the biggest
|
||||||
|
disadvantage. It means that the protection information must be in a
|
||||||
|
format that can be understood by the disk.
|
||||||
|
|
||||||
|
Generally Linux/POSIX applications are agnostic to the intricacies of
|
||||||
|
the storage devices they are accessing. The virtual filesystem switch
|
||||||
|
and the block layer make things like hardware sector size and
|
||||||
|
transport protocols completely transparent to the application.
|
||||||
|
|
||||||
|
However, this level of detail is required when preparing the
|
||||||
|
protection information to send to a disk. Consequently, the very
|
||||||
|
concept of an end-to-end protection scheme is a layering violation.
|
||||||
|
It is completely unreasonable for an application to be aware whether
|
||||||
|
it is accessing a SCSI or SATA disk.
|
||||||
|
|
||||||
|
The data integrity support implemented in Linux attempts to hide this
|
||||||
|
from the application. As far as the application (and to some extent
|
||||||
|
the kernel) is concerned, the integrity metadata is opaque information
|
||||||
|
that's attached to the I/O.
|
||||||
|
|
||||||
|
The current implementation allows the block layer to automatically
|
||||||
|
generate the protection information for any I/O. Eventually the
|
||||||
|
intent is to move the integrity metadata calculation to userspace for
|
||||||
|
user data. Metadata and other I/O that originates within the kernel
|
||||||
|
will still use the automatic generation interface.
|
||||||
|
|
||||||
|
Some storage devices allow each hardware sector to be tagged with a
|
||||||
|
16-bit value. The owner of this tag space is the owner of the block
|
||||||
|
device. I.e. the filesystem in most cases. The filesystem can use
|
||||||
|
this extra space to tag sectors as it sees fit. Because the tag
|
||||||
|
space is limited, the block interface allows tagging bigger chunks by
|
||||||
|
way of interleaving. This way, 8*16 bits of information can be
|
||||||
|
attached to a typical 4KB filesystem block.
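
The "8*16 bits" figure follows from simple arithmetic, assuming 512-byte
hardware sectors (the sector size is an assumption here; the text only gives
the 4KB block and the 16-bit tag). A small sketch with a hypothetical helper
name:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes of tag space available for one filesystem block when the tags
 * of all its hardware sectors are interleaved.  The caller must pass a
 * block size that is a whole multiple of the sector size.
 */
static size_t tag_bytes_per_block(size_t block_size, size_t sector_size,
				  size_t tag_bytes_per_sector)
{
	assert(sector_size != 0 && block_size % sector_size == 0);
	return (block_size / sector_size) * tag_bytes_per_sector;
}
```

For a 4096-byte block on 512-byte sectors with a 2-byte tag,
tag_bytes_per_block(4096, 512, 2) is 16 bytes, i.e. 8*16 bits.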
|
||||||
|
|
||||||
|
This also means that applications such as fsck and mkfs will need
|
||||||
|
access to manipulate the tags from user space. A passthrough
|
||||||
|
interface for this is being worked on.
|
||||||
|
|
||||||
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
4. BLOCK LAYER IMPLEMENTATION DETAILS
|
||||||
|
|
||||||
|
4.1 BIO
|
||||||
|
|
||||||
|
The data integrity patches add a new field to struct bio when
|
||||||
|
CONFIG_BLK_DEV_INTEGRITY is enabled. bio->bi_integrity is a pointer
|
||||||
|
to a struct bip which contains the bio integrity payload. Essentially
|
||||||
|
a bip is a trimmed down struct bio which holds a bio_vec containing
|
||||||
|
the integrity metadata and the required housekeeping information (bvec
|
||||||
|
pool, vector count, etc.)
|
||||||
|
|
||||||
|
A kernel subsystem can enable data integrity protection on a bio by
|
||||||
|
calling bio_integrity_alloc(bio). This will allocate and attach the
|
||||||
|
bip to the bio.
|
||||||
|
|
||||||
|
Individual pages containing integrity metadata can subsequently be
|
||||||
|
attached using bio_integrity_add_page().
|
||||||
|
|
||||||
|
bio_free() will automatically free the bip.
|
||||||
|
|
||||||
|
|
||||||
|
4.2 BLOCK DEVICE
|
||||||
|
|
||||||
|
Because the format of the protection data is tied to the physical
|
||||||
|
disk, each block device has been extended with a block integrity
|
||||||
|
profile (struct blk_integrity). This optional profile is registered
|
||||||
|
with the block layer using blk_integrity_register().
|
||||||
|
|
||||||
|
The profile contains callback functions for generating and verifying
|
||||||
|
the protection data, as well as getting and setting application tags.
|
||||||
|
The profile also contains a few constants to aid in completing,
|
||||||
|
merging and splitting the integrity metadata.
|
||||||
|
|
||||||
|
Layered block devices will need to pick a profile that's appropriate
|
||||||
|
for all subdevices. blk_integrity_compare() can help with that. DM
|
||||||
|
and MD linear, RAID0 and RAID1 are currently supported. RAID4/5/6
|
||||||
|
will require extra work due to the application tag.
|
||||||
|
|
||||||
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
5.0 BLOCK LAYER INTEGRITY API
|
||||||
|
|
||||||
|
5.1 NORMAL FILESYSTEM
|
||||||
|
|
||||||
|
The normal filesystem is unaware that the underlying block device
|
||||||
|
is capable of sending/receiving integrity metadata. The IMD will
|
||||||
|
be automatically generated by the block layer at submit_bio() time
|
||||||
|
in case of a WRITE. A READ request will cause the I/O integrity
|
||||||
|
to be verified upon completion.
|
||||||
|
|
||||||
|
IMD generation and verification can be toggled using the
|
||||||
|
|
||||||
|
/sys/block/<bdev>/integrity/write_generate
|
||||||
|
|
||||||
|
and
|
||||||
|
|
||||||
|
/sys/block/<bdev>/integrity/read_verify
|
||||||
|
|
||||||
|
flags.
|
||||||
|
|
||||||
|
|
||||||
|
5.2 INTEGRITY-AWARE FILESYSTEM
|
||||||
|
|
||||||
|
A filesystem that is integrity-aware can prepare I/Os with IMD
|
||||||
|
attached. It can also use the application tag space if this is
|
||||||
|
supported by the block device.
|
||||||
|
|
||||||
|
|
||||||
|
int bdev_integrity_enabled(block_device, int rw);
|
||||||
|
|
||||||
|
bdev_integrity_enabled() will return 1 if the block device
|
||||||
|
supports integrity metadata transfer for the data direction
|
||||||
|
specified in 'rw'.
|
||||||
|
|
||||||
|
bdev_integrity_enabled() honors the write_generate and
|
||||||
|
read_verify flags in sysfs and will respond accordingly.
|
||||||
|
|
||||||
|
|
||||||
|
int bio_integrity_prep(bio);
|
||||||
|
|
||||||
|
To generate IMD for WRITE and to set up buffers for READ, the
|
||||||
|
filesystem must call bio_integrity_prep(bio).
|
||||||
|
|
||||||
|
Prior to calling this function, the bio data direction and start
|
||||||
|
sector must be set, and the bio should have all data pages
|
||||||
|
added. It is up to the caller to ensure that the bio does not
|
||||||
|
change while I/O is in progress.
|
||||||
|
|
||||||
|
bio_integrity_prep() should only be called if
|
||||||
|
bio_integrity_enabled() returned 1.
|
||||||
|
|
||||||
|
|
||||||
|
int bio_integrity_tag_size(bio);
|
||||||
|
|
||||||
|
If the filesystem wants to use the application tag space it will
|
||||||
|
first have to find out how much storage space is available.
|
||||||
|
Because tag space is generally limited (usually 2 bytes per
|
||||||
|
sector regardless of sector size), the integrity framework
|
||||||
|
supports interleaving the information between the sectors in an
|
||||||
|
I/O.
|
||||||
|
|
||||||
|
Filesystems can call bio_integrity_tag_size(bio) to find out how
|
||||||
|
many bytes of storage are available for that particular bio.
|
||||||
|
|
||||||
|
Another option is bdev_get_tag_size(block_device) which will
|
||||||
|
return the number of available bytes per hardware sector.
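As a rough illustration of the arithmetic described above, the following user-space sketch computes how much opaque tag space a single I/O offers when the per-sector tag bytes are interleaved across all of its sectors. The helper name and its plain integer arguments are assumptions made for illustration only; this is not the block layer's implementation of bio_integrity_tag_size().

```c
/*
 * Hypothetical helper (not a kernel function): total tag bytes for
 * one I/O, assuming every hardware sector contributes the same
 * number of application tag bytes.
 */
static unsigned int io_tag_bytes(unsigned int nr_sectors,
				 unsigned int tag_bytes_per_sector)
{
	return nr_sectors * tag_bytes_per_sector;
}
```

With 2 tag bytes per sector, an 8-sector I/O would leave 16 bytes for the filesystem's tag buffer, and a device that exposes no tag space yields 0 regardless of I/O size.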
|
||||||
|
|
||||||
|
|
||||||
|
int bio_integrity_set_tag(bio, void *tag_buf, len);
|
||||||
|
|
||||||
|
After a successful return from bio_integrity_prep(),
|
||||||
|
bio_integrity_set_tag() can be used to attach an opaque tag
|
||||||
|
buffer to a bio. Obviously this only makes sense if the I/O is
|
||||||
|
a WRITE.
|
||||||
|
|
||||||
|
|
||||||
|
int bio_integrity_get_tag(bio, void *tag_buf, len);
|
||||||
|
|
||||||
|
Similarly, at READ I/O completion time the filesystem can
|
||||||
|
retrieve the tag buffer using bio_integrity_get_tag().
|
||||||
|
|
||||||
|
|
||||||
|
5.3 PASSING EXISTING INTEGRITY METADATA
|
||||||
|
|
||||||
|
Filesystems that either generate their own integrity metadata or
|
||||||
|
are capable of transferring IMD from user space can use the
|
||||||
|
following calls:
|
||||||
|
|
||||||
|
|
||||||
|
struct bip * bio_integrity_alloc(bio, gfp_mask, nr_pages);
|
||||||
|
|
||||||
|
Allocates the bio integrity payload and hangs it off of the bio.
|
||||||
|
nr_pages indicates how many pages of protection data need to be
|
||||||
|
stored in the integrity bio_vec list (similar to bio_alloc()).
|
||||||
|
|
||||||
|
The integrity payload will be freed at bio_free() time.
|
||||||
|
|
||||||
|
|
||||||
|
int bio_integrity_add_page(bio, page, len, offset);
|
||||||
|
|
||||||
|
Attaches a page containing integrity metadata to an existing
|
||||||
|
bio. The bio must have an existing bip,
|
||||||
|
i.e. bio_integrity_alloc() must have been called. For a WRITE,
|
||||||
|
the integrity metadata in the pages must be in a format
|
||||||
|
understood by the target device with the notable exception that
|
||||||
|
the sector numbers will be remapped as the request traverses the
|
||||||
|
I/O stack. This implies that the pages added using this call
|
||||||
|
will be modified during I/O! The first reference tag in the
|
||||||
|
integrity metadata must have a value of bip->bip_sector.
|
||||||
|
|
||||||
|
Pages can be added using bio_integrity_add_page() as long as
|
||||||
|
there is room in the bip bio_vec array (nr_pages).
|
||||||
|
|
||||||
|
Upon completion of a READ operation, the attached pages will
|
||||||
|
contain the integrity metadata received from the storage device.
|
||||||
|
It is up to the receiver to process them and verify data
|
||||||
|
integrity upon completion.
|
||||||
|
|
||||||
|
|
||||||
|
5.4 REGISTERING A BLOCK DEVICE AS CAPABLE OF EXCHANGING INTEGRITY
|
||||||
|
METADATA
|
||||||
|
|
||||||
|
To enable integrity exchange on a block device the gendisk must be
|
||||||
|
registered as capable:
|
||||||
|
|
||||||
|
int blk_integrity_register(gendisk, blk_integrity);
|
||||||
|
|
||||||
|
The blk_integrity struct is a template and should contain the
|
||||||
|
following:
|
||||||
|
|
||||||
|
static struct blk_integrity my_profile = {
|
||||||
|
.name = "STANDARDSBODY-TYPE-VARIANT-CSUM",
|
||||||
|
.generate_fn = my_generate_fn,
|
||||||
|
.verify_fn = my_verify_fn,
|
||||||
|
.get_tag_fn = my_get_tag_fn,
|
||||||
|
.set_tag_fn = my_set_tag_fn,
|
||||||
|
.tuple_size = sizeof(struct my_tuple_size),
|
||||||
|
.tag_size = <tag bytes per hw sector>,
|
||||||
|
};
|
||||||
|
|
||||||
|
'name' is a text string which will be visible in sysfs. This is
|
||||||
|
part of the userland API so choose it carefully and never change
|
||||||
|
it. The format is standards body-type-variant.
|
||||||
|
E.g. T10-DIF-TYPE1-IP or T13-EPP-0-CRC.
|
||||||
|
|
||||||
|
'generate_fn' generates appropriate integrity metadata (for WRITE).
|
||||||
|
|
||||||
|
'verify_fn' verifies that the data buffer matches the integrity
|
||||||
|
metadata.
|
||||||
|
|
||||||
|
'tuple_size' must be set to match the size of the integrity
|
||||||
|
metadata per sector. I.e. 8 for DIF and EPP.
|
||||||
|
|
||||||
|
'tag_size' must be set to identify how many bytes of tag space
|
||||||
|
are available per hardware sector. For DIF this is either 2 or
|
||||||
|
0 depending on the value of the Control Mode Page ATO bit.
|
||||||
|
|
||||||
|
See 5.2 for a description of get_tag_fn and set_tag_fn.
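To make the 'tuple_size' and 'tag_size' values more concrete, here is a minimal sketch of an 8-byte protection tuple in the style of T10 DIF (16-bit guard tag, 16-bit application tag, 32-bit reference tag). The struct, field and function names are invented for this example and are not taken from the kernel sources; on the wire the fields are big-endian, which this sketch does not model.

```c
#include <stdint.h>

/*
 * Illustrative per-sector protection tuple. The names are made up
 * for this example; only the field widths follow the DIF layout.
 */
struct my_dif_tuple {
	uint16_t guard_tag;	/* checksum over the data sector */
	uint16_t app_tag;	/* opaque application tag space */
	uint32_t ref_tag;	/* usually derived from the sector number */
};

/* Value a profile like the template above would put in .tuple_size. */
static unsigned int my_tuple_size(void)
{
	return sizeof(struct my_dif_tuple);
}

/* Tag bytes per sector when the application tag is usable. */
static unsigned int my_tag_size(void)
{
	return sizeof(((struct my_dif_tuple *)0)->app_tag);
}
```

Under these assumptions the tuple size comes out as 8 and the tag size as 2, matching the numbers quoted in the field descriptions.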
|
||||||
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
2007-12-24 Martin K. Petersen <martin.petersen@oracle.com>
|
67
Documentation/bt8xxgpio.txt
Normal file
|
@ -0,0 +1,67 @@
|
||||||
|
===============================================================
|
||||||
|
== BT8XXGPIO driver ==
|
||||||
|
== ==
|
||||||
|
== A driver for a selfmade cheap BT8xx based PCI GPIO-card ==
|
||||||
|
== ==
|
||||||
|
== For advanced documentation, see ==
|
||||||
|
== http://www.bu3sch.de/btgpio.php ==
|
||||||
|
===============================================================
|
||||||
|
|
||||||
|
|
||||||
|
A generic digital 24-port PCI GPIO card can be built out of an ordinary
|
||||||
|
Brooktree bt848, bt849, bt878 or bt879 based analog TV tuner card. The
|
||||||
|
Brooktree chip is used in old analog Hauppauge WinTV PCI cards. You can easily
|
||||||
|
find them used for low prices on the net.
|
||||||
|
|
||||||
|
The bt8xx chip has 24 digital GPIO ports.
|
||||||
|
These ports are accessible via 24 pins on the SMD chip package.
|
||||||
|
|
||||||
|
|
||||||
|
==============================================
|
||||||
|
== How to physically access the GPIO pins ==
|
||||||
|
==============================================
|
||||||
|
|
||||||
|
There are several ways to access these pins. One might unsolder the whole chip
|
||||||
|
and put it on a custom PCI board, or one might only unsolder each individual
|
||||||
|
GPIO pin and solder that to some tiny wire. As the chip package really is tiny
|
||||||
|
there are some advanced soldering skills needed in any case.
|
||||||
|
|
||||||
|
The physical pinouts are drawn in the following ASCII art.
|
||||||
|
The GPIO pins are marked with G00-G23.
|
||||||
|
|
||||||
|
G G G G G G G G G G G G G G G G G G
|
||||||
|
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
|
||||||
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
|
||||||
|
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|
||||||
|
---------------------------------------------------------------------------
|
||||||
|
--| ^ ^ |--
|
||||||
|
--| pin 86 pin 67 |--
|
||||||
|
--| |--
|
||||||
|
--| pin 61 > |-- G18
|
||||||
|
--| |-- G19
|
||||||
|
--| |-- G20
|
||||||
|
--| |-- G21
|
||||||
|
--| |-- G22
|
||||||
|
--| pin 56 > |-- G23
|
||||||
|
--| |--
|
||||||
|
--| Brooktree 878/879 |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| O |--
|
||||||
|
--| |--
|
||||||
|
---------------------------------------------------------------------------
|
||||||
|
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|
||||||
|
^
|
||||||
|
This is pin 1
|
||||||
|
|
|
@ -390,6 +390,10 @@ If you have several tasks to attach, you have to do it one after another:
|
||||||
...
|
...
|
||||||
# /bin/echo PIDn > tasks
|
# /bin/echo PIDn > tasks
|
||||||
|
|
||||||
|
You can attach the current shell task by echoing 0:
|
||||||
|
|
||||||
|
# echo 0 > tasks
|
||||||
|
|
||||||
3. Kernel API
|
3. Kernel API
|
||||||
=============
|
=============
|
||||||
|
|
||||||
|
|
|
@ -13,7 +13,7 @@ either an integer or * for all. Access is a composition of r
|
||||||
The root device cgroup starts with rwm to 'all'. A child device
|
The root device cgroup starts with rwm to 'all'. A child device
|
||||||
cgroup gets a copy of the parent. Administrators can then remove
|
cgroup gets a copy of the parent. Administrators can then remove
|
||||||
devices from the whitelist or add new entries. A child cgroup can
|
devices from the whitelist or add new entries. A child cgroup can
|
||||||
never receive a device access which is denied its parent. However
|
never receive a device access which is denied by its parent. However
|
||||||
when a device access is removed from a parent it will not also be
|
when a device access is removed from a parent it will not also be
|
||||||
removed from the child(ren).
|
removed from the child(ren).
|
||||||
|
|
||||||
|
@ -29,7 +29,11 @@ allows cgroup 1 to read and mknod the device usually known as
|
||||||
|
|
||||||
echo a > /cgroups/1/devices.deny
|
echo a > /cgroups/1/devices.deny
|
||||||
|
|
||||||
will remove the default 'a *:* mrw' entry.
|
will remove the default 'a *:* rwm' entry. Doing
|
||||||
|
|
||||||
|
echo a > /cgroups/1/devices.allow
|
||||||
|
|
||||||
|
will add the 'a *:* rwm' entry to the whitelist.
|
||||||
|
|
||||||
3. Security
|
3. Security
|
||||||
|
|
||||||
|
|
|
@ -242,8 +242,7 @@ rmdir() if there are no tasks.
|
||||||
1. Add support for accounting huge pages (as a separate controller)
|
1. Add support for accounting huge pages (as a separate controller)
|
||||||
2. Make per-cgroup scanner reclaim not-shared pages first
|
2. Make per-cgroup scanner reclaim not-shared pages first
|
||||||
3. Teach controller to account for shared-pages
|
3. Teach controller to account for shared-pages
|
||||||
4. Start reclamation when the limit is lowered
|
4. Start reclamation in the background when the limit is
|
||||||
5. Start reclamation in the background when the limit is
|
|
||||||
not yet hit but the usage is getting closer
|
not yet hit but the usage is getting closer
|
||||||
|
|
||||||
Summary
|
Summary
|
||||||
|
|
|
@ -122,7 +122,7 @@ around '10000' or more.
|
||||||
show_sampling_rate_(min|max): the minimum and maximum sampling rates
|
show_sampling_rate_(min|max): the minimum and maximum sampling rates
|
||||||
available that you may set 'sampling_rate' to.
|
available that you may set 'sampling_rate' to.
|
||||||
|
|
||||||
up_threshold: defines what the average CPU usaged between the samplings
|
up_threshold: defines what the average CPU usage between the samplings
|
||||||
of 'sampling_rate' needs to be for the kernel to make a decision on
|
of 'sampling_rate' needs to be for the kernel to make a decision on
|
||||||
whether it should increase the frequency. For example when it is set
|
whether it should increase the frequency. For example when it is set
|
||||||
to its default value of '80' it means that between the checking
|
to its default value of '80' it means that between the checking
|
||||||
|
|
|
@ -154,13 +154,15 @@ browsing and modifying the cpusets presently known to the kernel. No
|
||||||
new system calls are added for cpusets - all support for querying and
|
new system calls are added for cpusets - all support for querying and
|
||||||
modifying cpusets is via this cpuset file system.
|
modifying cpusets is via this cpuset file system.
|
||||||
|
|
||||||
The /proc/<pid>/status file for each task has two added lines,
|
The /proc/<pid>/status file for each task has four added lines,
|
||||||
displaying the tasks cpus_allowed (on which CPUs it may be scheduled)
|
displaying the tasks cpus_allowed (on which CPUs it may be scheduled)
|
||||||
and mems_allowed (on which Memory Nodes it may obtain memory),
|
and mems_allowed (on which Memory Nodes it may obtain memory),
|
||||||
in the format seen in the following example:
|
in the two formats seen in the following example:
|
||||||
|
|
||||||
Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff
|
Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff
|
||||||
|
Cpus_allowed_list: 0-127
|
||||||
Mems_allowed: ffffffff,ffffffff
|
Mems_allowed: ffffffff,ffffffff
|
||||||
|
Mems_allowed_list: 0-63
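The relationship between the two formats can be sketched in user space as follows. This is only an illustration of how a comma-separated hex mask maps to the list form; the function name, the fixed bit limit and the error handling are assumptions of this example, not the kernel's own formatting code.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_BITS 1024	/* arbitrary limit chosen for this sketch */

/*
 * Convert a mask such as "ffffffff,ffffffff" (most significant word
 * first) into list form such as "0-63". Returns 0 on success and -1
 * on malformed input or if the output buffer is too small.
 */
static int mask_to_list(const char *mask, char *out, size_t outlen)
{
	unsigned char set[MAX_BITS] = { 0 };
	size_t nbits = 0, used = 0, i;

	if (outlen == 0)
		return -1;

	/* Walk right to left: the last hex digit holds bits 0-3. */
	for (i = strlen(mask); i > 0; i--) {
		char c = mask[i - 1];
		int v, b;

		if (c == ',')
			continue;
		if (!isxdigit((unsigned char)c) || nbits + 4 > MAX_BITS)
			return -1;
		v = isdigit((unsigned char)c) ?
			c - '0' : tolower((unsigned char)c) - 'a' + 10;
		for (b = 0; b < 4; b++)
			set[nbits++] = (v >> b) & 1;
	}

	out[0] = '\0';
	/* Emit each run of set bits as "first-last" or a single number. */
	for (i = 0; i < nbits; i++) {
		size_t first = i;
		int n;

		if (!set[i])
			continue;
		while (i + 1 < nbits && set[i + 1])
			i++;
		if (first == i)
			n = snprintf(out + used, outlen - used, "%s%zu",
				     used ? "," : "", first);
		else
			n = snprintf(out + used, outlen - used, "%s%zu-%zu",
				     used ? "," : "", first, i);
		if (n < 0 || (size_t)n >= outlen - used)
			return -1;
		used += (size_t)n;
	}
	return 0;
}
```

Feeding the Cpus_allowed mask from the example through this helper gives the Cpus_allowed_list string shown next to it, and the same holds for the Mems_allowed pair.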
|
||||||
|
|
||||||
Each cpuset is represented by a directory in the cgroup file system
|
Each cpuset is represented by a directory in the cgroup file system
|
||||||
containing (on top of the standard cgroup files) the following
|
containing (on top of the standard cgroup files) the following
|
||||||
|
@ -542,7 +544,10 @@ otherwise initial value -1 that indicates the cpuset has no request.
|
||||||
2 : search cores in a package.
|
2 : search cores in a package.
|
||||||
3 : search cpus in a node [= system wide on non-NUMA system]
|
3 : search cpus in a node [= system wide on non-NUMA system]
|
||||||
( 4 : search nodes in a chunk of node [on NUMA system] )
|
( 4 : search nodes in a chunk of node [on NUMA system] )
|
||||||
( 5~ : search system wide [on NUMA system])
|
( 5 : search system wide [on NUMA system] )
|
||||||
|
|
||||||
|
The system default is architecture dependent. The system default
|
||||||
|
can be changed using the relax_domain_level= boot parameter.
|
||||||
|
|
||||||
This file is per-cpuset and affect the sched domain where the cpuset
|
This file is per-cpuset and affect the sched domain where the cpuset
|
||||||
belongs to. Therefore if the flag 'sched_load_balance' of a cpuset
|
belongs to. Therefore if the flag 'sched_load_balance' of a cpuset
|
||||||
|
|
|
@ -14,9 +14,8 @@ represent the thread siblings to cpu X in the same physical package;
|
||||||
To implement it in an architecture-neutral way, a new source file,
|
To implement it in an architecture-neutral way, a new source file,
|
||||||
drivers/base/topology.c, is to export the 4 attributes.
|
drivers/base/topology.c, is to export the 4 attributes.
|
||||||
|
|
||||||
If one architecture wants to support this feature, it just needs to
|
For an architecture to support this feature, it must define some of
|
||||||
implement 4 defines, typically in file include/asm-XXX/topology.h.
|
these macros in include/asm-XXX/topology.h:
|
||||||
The 4 defines are:
|
|
||||||
#define topology_physical_package_id(cpu)
|
#define topology_physical_package_id(cpu)
|
||||||
#define topology_core_id(cpu)
|
#define topology_core_id(cpu)
|
||||||
#define topology_thread_siblings(cpu)
|
#define topology_thread_siblings(cpu)
|
||||||
|
@ -25,17 +24,10 @@ The 4 defines are:
|
||||||
The type of **_id is int.
|
The type of **_id is int.
|
||||||
The type of siblings is cpumask_t.
|
The type of siblings is cpumask_t.
|
||||||
|
|
||||||
To be consistent on all architectures, the 4 attributes should have
|
To be consistent on all architectures, include/linux/topology.h
|
||||||
default values if their values are unavailable. Below is the rule.
|
provides default definitions for any of the above macros that are
|
||||||
1) physical_package_id: If cpu has no physical package id, -1 is the
|
not defined by include/asm-XXX/topology.h:
|
||||||
default value.
|
1) physical_package_id: -1
|
||||||
2) core_id: If cpu doesn't support multi-core, its core id is 0.
|
2) core_id: 0
|
||||||
3) thread_siblings: Just include itself, if the cpu doesn't support
|
3) thread_siblings: just the given CPU
|
||||||
HT/multi-thread.
|
4) core_siblings: just the given CPU
|
||||||
4) core_siblings: Just include itself, if the cpu doesn't support
|
|
||||||
multi-core and HT/Multi-thread.
|
|
||||||
|
|
||||||
So be careful when declaring the 4 defines in include/asm-XXX/topology.h.
|
|
||||||
|
|
||||||
If an attribute isn't defined on an architecture, it won't be exported.
|
|
||||||
|
|
||||||
|
|
|
@ -222,74 +222,9 @@ both csrow2 and csrow3 are populated, this indicates a dual ranked
|
||||||
set of DIMMs for channels 0 and 1.
|
set of DIMMs for channels 0 and 1.
|
||||||
|
|
||||||
|
|
||||||
Within each of the 'mc','mcX' and 'csrowX' directories are several
|
Within each of the 'mcX' and 'csrowX' directories are several
|
||||||
EDAC control and attribute files.
|
EDAC control and attribute files.
|
||||||
|
|
||||||
|
|
||||||
============================================================================
|
|
||||||
DIRECTORY 'mc'
|
|
||||||
|
|
||||||
In directory 'mc' are EDAC system overall control and attribute files:
|
|
||||||
|
|
||||||
|
|
||||||
Panic on UE control file:
|
|
||||||
|
|
||||||
'edac_mc_panic_on_ue'
|
|
||||||
|
|
||||||
An uncorrectable error will cause a machine panic. This is usually
|
|
||||||
desirable. It is a bad idea to continue when an uncorrectable error
|
|
||||||
occurs - it is indeterminate what was uncorrected and the operating
|
|
||||||
system context might be so mangled that continuing will lead to further
|
|
||||||
corruption. If the kernel has MCE configured, then EDAC will never
|
|
||||||
notice the UE.
|
|
||||||
|
|
||||||
LOAD TIME: module/kernel parameter: panic_on_ue=[0|1]
|
|
||||||
|
|
||||||
RUN TIME: echo "1" >/sys/devices/system/edac/mc/edac_mc_panic_on_ue
|
|
||||||
|
|
||||||
|
|
||||||
Log UE control file:
|
|
||||||
|
|
||||||
'edac_mc_log_ue'
|
|
||||||
|
|
||||||
Generate kernel messages describing uncorrectable errors. These errors
|
|
||||||
are reported through the system message log system. UE statistics
|
|
||||||
will be accumulated even when UE logging is disabled.
|
|
||||||
|
|
||||||
LOAD TIME: module/kernel parameter: log_ue=[0|1]
|
|
||||||
|
|
||||||
RUN TIME: echo "1" >/sys/devices/system/edac/mc/edac_mc_log_ue
|
|
||||||
|
|
||||||
|
|
||||||
Log CE control file:
|
|
||||||
|
|
||||||
'edac_mc_log_ce'
|
|
||||||
|
|
||||||
Generate kernel messages describing correctable errors. These
|
|
||||||
errors are reported through the system message log system.
|
|
||||||
CE statistics will be accumulated even when CE logging is disabled.
|
|
||||||
|
|
||||||
LOAD TIME: module/kernel parameter: log_ce=[0|1]
|
|
||||||
|
|
||||||
RUN TIME: echo "1" >/sys/devices/system/edac/mc/edac_mc_log_ce
|
|
||||||
|
|
||||||
|
|
||||||
Polling period control file:
|
|
||||||
|
|
||||||
'edac_mc_poll_msec'
|
|
||||||
|
|
||||||
The time period, in milliseconds, for polling for error information.
|
|
||||||
Too small a value wastes resources. Too large a value might delay
|
|
||||||
necessary handling of errors and might loose valuable information for
|
|
||||||
locating the error. 1000 milliseconds (once each second) is the current
|
|
||||||
default. Systems which require all the bandwidth they can get, may
|
|
||||||
increase this.
|
|
||||||
|
|
||||||
LOAD TIME: module/kernel parameter: poll_msec=[0|1]
|
|
||||||
|
|
||||||
RUN TIME: echo "1000" >/sys/devices/system/edac/mc/edac_mc_poll_msec
|
|
||||||
|
|
||||||
|
|
||||||
============================================================================
|
============================================================================
|
||||||
'mcX' DIRECTORIES
|
'mcX' DIRECTORIES
|
||||||
|
|
||||||
|
@ -392,7 +327,7 @@ Sdram memory scrubbing rate:
|
||||||
'sdram_scrub_rate'
|
'sdram_scrub_rate'
|
||||||
|
|
||||||
Read/Write attribute file that controls memory scrubbing. The scrubbing
|
Read/Write attribute file that controls memory scrubbing. The scrubbing
|
||||||
rate is set by writing a minimum bandwith in bytes/sec to the attribute
|
rate is set by writing a minimum bandwidth in bytes/sec to the attribute
|
||||||
file. The rate will be translated to an internal value that gives at
|
file. The rate will be translated to an internal value that gives at
|
||||||
least the specified rate.
|
least the specified rate.
|
||||||
|
|
||||||
|
@ -537,7 +472,6 @@ Channel 1 DIMM Label control file:
|
||||||
motherboard specific and determination of this information
|
motherboard specific and determination of this information
|
||||||
must occur in userland at this time.
|
must occur in userland at this time.
|
||||||
|
|
||||||
|
|
||||||
============================================================================
|
============================================================================
|
||||||
SYSTEM LOGGING
|
SYSTEM LOGGING
|
||||||
|
|
||||||
|
@ -570,7 +504,6 @@ error type, a notice of "no info" and then an optional,
|
||||||
driver-specific error message.
|
driver-specific error message.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
============================================================================
|
============================================================================
|
||||||
PCI Bus Parity Detection
|
PCI Bus Parity Detection
|
||||||
|
|
||||||
|
@ -604,6 +537,74 @@ Enable/Disable PCI Parity checking control file:
|
||||||
echo "0" >/sys/devices/system/edac/pci/check_pci_parity
|
echo "0" >/sys/devices/system/edac/pci/check_pci_parity
|
||||||
|
|
||||||
|
|
||||||
|
Parity Count:
|
||||||
|
|
||||||
|
'pci_parity_count'
|
||||||
|
|
||||||
|
This attribute file will display the number of parity errors that
|
||||||
|
have been detected.
|
||||||
|
|
||||||
|
|
||||||
|
============================================================================
|
||||||
|
MODULE PARAMETERS
|
||||||
|
|
||||||
|
Panic on UE control file:
|
||||||
|
|
||||||
|
'edac_mc_panic_on_ue'
|
||||||
|
|
||||||
|
An uncorrectable error will cause a machine panic. This is usually
|
||||||
|
desirable. It is a bad idea to continue when an uncorrectable error
|
||||||
|
occurs - it is indeterminate what was uncorrected and the operating
|
||||||
|
system context might be so mangled that continuing will lead to further
|
||||||
|
corruption. If the kernel has MCE configured, then EDAC will never
|
||||||
|
notice the UE.
|
||||||
|
|
||||||
|
LOAD TIME: module/kernel parameter: edac_mc_panic_on_ue=[0|1]
|
||||||
|
|
||||||
|
RUN TIME: echo "1" > /sys/module/edac_core/parameters/edac_mc_panic_on_ue
|
||||||
|
|
||||||
|
|
||||||
|
Log UE control file:
|
||||||
|
|
||||||
|
'edac_mc_log_ue'
|
||||||
|
|
||||||
|
Generate kernel messages describing uncorrectable errors. These errors
|
||||||
|
are reported through the system message log system. UE statistics
|
||||||
|
will be accumulated even when UE logging is disabled.
|
||||||
|
|
||||||
|
LOAD TIME: module/kernel parameter: edac_mc_log_ue=[0|1]
|
||||||
|
|
||||||
|
RUN TIME: echo "1" > /sys/module/edac_core/parameters/edac_mc_log_ue
|
||||||
|
|
||||||
|
|
||||||
|
Log CE control file:
|
||||||
|
|
||||||
|
'edac_mc_log_ce'
|
||||||
|
|
||||||
|
Generate kernel messages describing correctable errors. These
|
||||||
|
errors are reported through the system message log system.
|
||||||
|
CE statistics will be accumulated even when CE logging is disabled.
|
||||||
|
|
||||||
|
LOAD TIME: module/kernel parameter: edac_mc_log_ce=[0|1]
|
||||||
|
|
||||||
|
RUN TIME: echo "1" > /sys/module/edac_core/parameters/edac_mc_log_ce
|
||||||
|
|
||||||
|
|
||||||
|
Polling period control file:
|
||||||
|
|
||||||
|
'edac_mc_poll_msec'
|
||||||
|
|
||||||
|
The time period, in milliseconds, for polling for error information.
|
||||||
|
Too small a value wastes resources. Too large a value might delay
|
||||||
|
necessary handling of errors and might lose valuable information for
|
||||||
|
locating the error. 1000 milliseconds (once each second) is the current
|
||||||
|
default. Systems which require all the bandwidth they can get, may
|
||||||
|
increase this.
|
||||||
|
|
||||||
|
LOAD TIME: module/kernel parameter: edac_mc_poll_msec=[0|1]
|
||||||
|
|
||||||
|
RUN TIME: echo "1000" > /sys/module/edac_core/parameters/edac_mc_poll_msec
|
||||||
|
|
||||||
|
|
||||||
Panic on PCI PARITY Error:
|
Panic on PCI PARITY Error:
|
||||||
|
|
||||||
|
@ -614,21 +615,13 @@ Panic on PCI PARITY Error:
|
||||||
error has been detected.
|
error has been detected.
|
||||||
|
|
||||||
|
|
||||||
module/kernel parameter: panic_on_pci_parity=[0|1]
|
module/kernel parameter: edac_panic_on_pci_pe=[0|1]
|
||||||
|
|
||||||
Enable:
|
Enable:
|
||||||
echo "1" >/sys/devices/system/edac/pci/panic_on_pci_parity
|
echo "1" > /sys/module/edac_core/parameters/edac_panic_on_pci_pe
|
||||||
|
|
||||||
Disable:
|
Disable:
|
||||||
echo "0" >/sys/devices/system/edac/pci/panic_on_pci_parity
|
echo "0" > /sys/module/edac_core/parameters/edac_panic_on_pci_pe
|
||||||
|
|
||||||
|
|
||||||
Parity Count:
|
|
||||||
|
|
||||||
'pci_parity_count'
|
|
||||||
|
|
||||||
This attribute file will display the number of parity errors that
|
|
||||||
have been detected.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
131
Documentation/fb/sh7760fb.txt
Normal file
|
@ -0,0 +1,131 @@
|
||||||
|
SH7760/SH7763 integrated LCDC Framebuffer driver
|
||||||
|
================================================
|
||||||
|
|
||||||
|
0. Overview
|
||||||
|
-----------
|
||||||
|
The SH7760/SH7763 have an integrated LCD Display controller (LCDC) which
|
||||||
|
supports (in theory) resolutions ranging from 1x1 to 1024x1024,
|
||||||
|
with color depths ranging from 1 to 16 bits, on STN, DSTN and TFT Panels.
|
||||||
|
|
||||||
|
Caveats:
|
||||||
|
* Framebuffer memory must be a large chunk allocated at the top
|
||||||
|
of Area3 (HW requirement). Because of this requirement you should NOT
|
||||||
|
make the driver a module since at runtime it may become impossible to
|
||||||
|
get a large enough contiguous chunk of memory.
|
||||||
|
|
||||||
|
* The driver does not support changing resolution while loaded
|
||||||
|
(displays aren't hotpluggable anyway)
|
||||||
|
|
||||||
|
* Heavy flickering may be observed
|
||||||
|
a) if you're using 15/16bit color modes at >= 640x480 px resolutions,
|
||||||
|
b) during PCMCIA (or any other slow bus) activity.
|
||||||
|
|
||||||
|
* Rotation works only 90 degrees clockwise, and only if horizontal
|
||||||
|
resolution is <= 320 pixels.
|
||||||
|
|
||||||
|
files: drivers/video/sh7760fb.c
|
||||||
|
include/asm-sh/sh7760fb.h
|
||||||
|
Documentation/fb/sh7760fb.txt
|
||||||
|
|
||||||
|
1. Platform setup
|
||||||
|
-----------------
|
||||||
|
SH7760:
|
||||||
|
Video data is fetched via the DMABRG DMA engine, so you have to
|
||||||
|
configure the SH DMAC for DMABRG mode (write 0x94808080 to the
|
||||||
|
DMARSRA register somewhere at boot).
|
||||||
|
|
||||||
|
PFC registers PCCR and PCDR must be set to peripheral mode.
|
||||||
|
(write zeros to both).
|
||||||
|
|
||||||
|
The driver does NOT do the above for you since board setup is, well, the job
|
||||||
|
of the board setup code.
|
||||||
|
|
||||||
|
2. Panel definitions
|
||||||
|
--------------------
|
||||||
|
The LCDC must explicitly be told about the type of LCD panel
|
||||||
|
attached. Data must be wrapped in a "struct sh7760fb_platdata" and
|
||||||
|
passed to the driver as platform_data.
|
||||||
|
|
||||||
|
Suggest you take a closer look at the SH7760 Manual, Section 30.
|
||||||
|
(http://documentation.renesas.com/eng/products/mpumcu/e602291_sh7760.pdf)
|
||||||
|
|
||||||
|
The following code illustrates what needs to be done to
|
||||||
|
get the framebuffer working on a 640x480 TFT:
|
||||||
|
|
||||||
|
====================== cut here ======================================
|
||||||
|
|
||||||
|
#include <linux/fb.h>
|
||||||
|
#include <asm/sh7760fb.h>
|
||||||
|
|
||||||
|
/*
|
||||||
|
* NEC NL6448BC26-01 640x480 TFT
|
||||||
|
* dotclock 25175 kHz
|
||||||
|
* Xres 640 Yres 480
|
||||||
|
* Htotal 800 Vtotal 525
|
||||||
|
* HsynStart 656 VsynStart 490
|
||||||
|
* HsynLen 30 VsynLen 2
|
||||||
|
*
|
||||||
|
* The linux framebuffer layer does not use the syncstart/synclen
|
||||||
|
* values but right/left/upper/lower margin values. The comments
|
||||||
|
* for the x_margin explain how to calculate those from given
|
||||||
|
* panel sync timings.
|
||||||
|
*/
|
||||||
|
static struct fb_videomode nl6448bc26 = {
|
||||||
|
.name = "NL6448BC26",
|
||||||
|
.refresh = 60,
|
||||||
|
.xres = 640,
|
||||||
|
.yres = 480,
|
||||||
|
.pixclock = 39683, /* in picoseconds! */
|
||||||
|
.hsync_len = 30,
|
||||||
|
.vsync_len = 2,
|
||||||
|
.left_margin = 114, /* HTOT - (HSYNLEN + HSYNSTART) */
|
||||||
|
.right_margin = 16, /* HSYNSTART - XRES */
|
||||||
|
.upper_margin = 33, /* VTOT - (VSYNLEN + VSYNSTART) */
|
||||||
|
.lower_margin = 10, /* VSYNSTART - YRES */
|
||||||
|
.sync = FB_SYNC_HOR_HIGH_ACT | FB_SYNC_VERT_HIGH_ACT,
|
||||||
|
.vmode = FB_VMODE_NONINTERLACED,
|
||||||
|
.flag = 0,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct sh7760fb_platdata sh7760fb_nl6448 = {
|
||||||
|
.def_mode = &nl6448bc26,
|
||||||
|
.ldmtr = LDMTR_TFT_COLOR_16, /* 16bit TFT panel */
|
||||||
|
.lddfr = LDDFR_8BPP, /* we want 8bit output */
|
||||||
|
.ldpmmr = 0x0070,
|
||||||
|
.ldpspr = 0x0500,
|
||||||
|
.ldaclnr = 0,
|
||||||
|
.ldickr = LDICKR_CLKSRC(LCDC_CLKSRC_EXTERNAL) |
|
||||||
|
LDICKR_CLKDIV(1),
|
||||||
|
.rotate = 0,
|
||||||
|
.novsync = 1,
|
||||||
|
.blank = NULL,
|
||||||
|
};
|
||||||
|
|
||||||
|
/* SH7760:
|
||||||
|
* 0xFE300800: 256 * 4byte xRGB palette ram
|
||||||
|
* 0xFE300C00: 42 bytes ctrl registers
|
||||||
|
*/
|
||||||
|
static struct resource sh7760_lcdc_res[] = {
|
||||||
|
[0] = {
|
||||||
|
.start = 0xFE300800,
|
||||||
|
.end = 0xFE300CFF,
|
||||||
|
.flags = IORESOURCE_MEM,
|
||||||
|
},
|
||||||
|
[1] = {
|
||||||
|
.start = 65,
|
||||||
|
.end = 65,
|
||||||
|
.flags = IORESOURCE_IRQ,
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct platform_device sh7760_lcdc_dev = {
|
||||||
|
.dev = {
|
||||||
|
.platform_data = &sh7760fb_nl6448,
|
||||||
|
},
|
||||||
|
.name = "sh7760-lcdc",
|
||||||
|
.id = -1,
|
||||||
|
.resource = sh7760_lcdc_res,
|
||||||
|
.num_resources = ARRAY_SIZE(sh7760_lcdc_res),
|
||||||
|
};
|
||||||
|
|
||||||
|
====================== cut here ======================================
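The margin arithmetic given in the comments of the example can be checked with a small stand-alone sketch. The struct and function names below are hypothetical and exist only for this illustration; they are not part of the driver or of the framebuffer layer. The pixel clock helper assumes the usual conversion from a dot clock in kHz to a period in picoseconds.

```c
/* Panel timings as a datasheet typically lists them (illustrative). */
struct panel_timing {
	unsigned int xres, yres;
	unsigned int htotal, vtotal;
	unsigned int hsync_start, vsync_start;
	unsigned int hsync_len, vsync_len;
};

/* Margins in the form struct fb_videomode expects. */
struct fb_margins {
	unsigned int left, right, upper, lower;
};

/* Apply the formulas from the comments in the example above. */
static struct fb_margins timing_to_margins(const struct panel_timing *t)
{
	struct fb_margins m;

	m.left  = t->htotal - (t->hsync_len + t->hsync_start);
	m.right = t->hsync_start - t->xres;
	m.upper = t->vtotal - (t->vsync_len + t->vsync_start);
	m.lower = t->vsync_start - t->yres;
	return m;
}

/* Pixel clock period in picoseconds for a dot clock given in kHz. */
static unsigned long khz_to_picos(unsigned long khz)
{
	return 1000000000UL / khz;
}
```

For the timings quoted in the example (640x480, Htotal 800, Vtotal 525, sync start 656/490, sync length 30/2) this reproduces the margins 114, 16, 33 and 10. Note that a dot clock of 25175 kHz converts to about 39721 ps, whereas the pixclock of 39683 used in the example corresponds to roughly 25200 kHz, so it is worth checking the panel datasheet before copying either number.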
|
|
@ -3,11 +3,25 @@ Tridentfb is a framebuffer driver for some Trident chip based cards.
|
||||||
The following list of chips is thought to be supported although not all are
|
The following list of chips is thought to be supported although not all are
|
||||||
tested:
|
tested:
|
||||||
|
|
||||||
those from the Image series with Cyber in their names - accelerated
|
those from the TGUI series 9440/96XX and with Cyber in their names
|
||||||
those with Blade in their names (Blade3D,CyberBlade...) - accelerated
|
those from the Image series and with Cyber in their names
|
||||||
the newer CyberBladeXP family - nonaccelerated
|
those with Blade in their names (Blade3D,CyberBlade...)
|
||||||
|
the newer CyberBladeXP family
|
||||||
|
|
||||||
Only PCI/AGP based cards are supported, none of the older Tridents.
|
All families are accelerated. Only PCI/AGP based cards are supported,
|
||||||
|
none of the older Tridents.
|
||||||
|
The driver supports 8, 16 and 32 bits per pixel depths.
|
||||||
|
The TGUI family requires the line length to be a power of 2 if acceleration
|
||||||
|
is enabled. This means that the range of possible resolutions and bpp is
|
||||||
|
limited compared to the range available when acceleration is disabled (see list
|
||||||
|
of parameters below).
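
The power-of-2 restriction for the TGUI family can be pictured with the sketch below. It is only an illustration under stated assumptions: the line length is taken to be the visible width multiplied by the bytes per pixel, and all function names are invented here. The text above does not say whether the driver measures the line in pixels or bytes, or whether it pads the line.

```c
#include <stdbool.h>

/* Assumption: line length in bytes = visible width * bytes per pixel. */
static unsigned int line_length_bytes(unsigned int xres, unsigned int bpp)
{
	return xres * (bpp / 8);
}

/* True when exactly one bit is set, i.e. n is a power of two. */
static bool is_power_of_two(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical check: would this mode satisfy the TGUI restriction? */
static bool tgui_accel_mode_ok(unsigned int xres, unsigned int bpp)
{
	return is_power_of_two(line_length_bytes(xres, bpp));
}
```

Under these assumptions 1024 pixels at 8 or 16 bpp passes, while 800 pixels at 16 bpp (1600 bytes per line) does not, which matches the statement that the usable range shrinks when acceleration is on.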
|
||||||
|
|
||||||
|
Known bugs:
|
||||||
|
1. The driver randomly locks up on 3DImage975 chip with acceleration
|
||||||
|
enabled. The same happens in X11 (Xorg).
|
||||||
|
2. The ramdac speeds require some more fine tuning. It is possible to
|
||||||
|
switch to a resolution which the chip does not support at some depths for
|
||||||
|
older chips.
|
||||||
|
|
||||||
How to use it?
|
How to use it?
|
||||||
==============
|
==============
|
||||||
|
@ -17,12 +31,11 @@ video=tridentfb
|
||||||
|
|
||||||
The parameters for tridentfb are concatenated with a ':' as in this example.
|
The parameters for tridentfb are concatenated with a ':' as in this example.
|
||||||
|
|
||||||
video=tridentfb:800x600,bpp=16,noaccel
|
video=tridentfb:800x600-16@75,noaccel
|
||||||
|
|
||||||
The second level parameters that tridentfb understands are:
|
The second level parameters that tridentfb understands are:
|
||||||
|
|
||||||
noaccel - turns off acceleration (when it doesn't work for your card)
|
noaccel - turns off acceleration (when it doesn't work for your card)
|
||||||
accel - force text acceleration (for boards which by default are noacceled)
|
|
||||||
|
|
||||||
fp - use flat panel related stuff
|
fp - use flat panel related stuff
|
||||||
crt - assume monitor is present instead of fp
|
crt - assume monitor is present instead of fp
|
||||||
|
@ -31,21 +44,24 @@ center - for flat panels and resolutions smaller than native size center the
|
||||||
image, otherwise use
|
image, otherwise use
|
||||||
stretch
|
stretch
|
||||||
|
|
||||||
memsize - integer value in Kb, use if your card's memory size is misdetected.
|
memsize - integer value in KB, use if your card's memory size is misdetected.
|
||||||
look at the driver output to see what it says when initializing.
|
look at the driver output to see what it says when initializing.
|
||||||
memdiff - integer value in Kb,should be nonzero if your card reports
|
|
||||||
more memory than it actually has.For instance mine is 192K less than
|
memdiff - integer value in KB, should be nonzero if your card reports
|
||||||
|
more memory than it actually has. For instance mine is 192K less than
|
||||||
detection says in all three BIOS selectable situations 2M, 4M, 8M.
|
detection says in all three BIOS selectable situations 2M, 4M, 8M.
|
||||||
Only use if your video memory is taken from main memory hence of
|
Only use if your video memory is taken from main memory hence of
|
||||||
configurable size.Otherwise use memsize.
|
configurable size. Otherwise use memsize.
|
||||||
If in some modes which barely fit the memory you see garbage at the bottom
|
If in some modes which barely fit the memory you see garbage
|
||||||
this might help by not letting change to that mode anymore.
|
at the bottom this might help by not letting change to that mode
|
||||||
|
anymore.
|
||||||
|
|
||||||
nativex - the width in pixels of the flat panel.If you know it (usually 1024
|
nativex - the width in pixels of the flat panel.If you know it (usually 1024
|
||||||
800 or 1280) and it is not what the driver seems to detect use it.
|
800 or 1280) and it is not what the driver seems to detect use it.
|
||||||
|
|
||||||
bpp - bits per pixel (8,16 or 32)
|
bpp - bits per pixel (8,16 or 32)
|
||||||
mode - a mode name like 800x600 (as described in Documentation/fb/modedb.txt)
|
mode - a mode name like 800x600-8@75 as described in
|
||||||
|
Documentation/fb/modedb.txt
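
To make the shape of a mode name such as 800x600-16@75 concrete, here is a minimal userspace sketch that splits it into its four numbers. It handles only the complete `<xres>x<yres>-<bpp>@<refresh>` form used in the example; the full grammar is the one in Documentation/fb/modedb.txt, and the struct and function names below are invented for illustration rather than taken from the driver.

```c
#include <stdio.h>

/* Hypothetical container for the four fields of a mode name. */
struct mode_request {
	unsigned int xres, yres, bpp, refresh;
};

/* Parses "<xres>x<yres>-<bpp>@<refresh>" only.
 * Returns 0 on success, -1 if the string has any other shape. */
static int parse_mode(const char *s, struct mode_request *m)
{
	char trailing;
	int n = sscanf(s, "%ux%u-%u@%u%c",
		       &m->xres, &m->yres, &m->bpp, &m->refresh, &trailing);

	/* Exactly four conversions: all fields present, nothing after them. */
	return n == 4 ? 0 : -1;
}
```

A shorter name such as 800x600 is rejected by this sketch even though it is a valid mode name; supporting the optional parts is left out to keep the example short.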
|
||||||
|
|
||||||
Using insane values for the above parameters will probably result in driver
|
Using insane values for the above parameters will probably result in driver
|
||||||
misbehaviour so take care(for instance memsize=12345678 or memdiff=23784 or
|
misbehaviour so take care(for instance memsize=12345678 or memdiff=23784 or
|
||||||
|
|
|
@ -47,6 +47,30 @@ Who: Mauro Carvalho Chehab <mchehab@infradead.org>
|
||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
|
What: old tuner-3036 i2c driver
|
||||||
|
When: 2.6.28
|
||||||
|
Why: This driver is for VERY old i2c-over-parallel port teletext receiver
|
||||||
|
boxes. Rather than spending effort on converting this driver to V4L2,
|
||||||
|
and since it is extremely unlikely that anyone still uses one of these
|
||||||
|
devices, it was decided to drop it.
|
||||||
|
Who: Hans Verkuil <hverkuil@xs4all.nl>
|
||||||
|
Mauro Carvalho Chehab <mchehab@infradead.org>
|
||||||
|
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
What: V4L2 dpc7146 driver
|
||||||
|
When: 2.6.28
|
||||||
|
Why: Old driver for the dpc7146 demonstration board that is no longer
|
||||||
|
relevant. The last time this was tested on actual hardware was
|
||||||
|
probably around 2002. Since this is a driver for a demonstration
|
||||||
|
board the decision was made to remove it rather than spending a
|
||||||
|
lot of effort continually updating this driver to stay in sync
|
||||||
|
with the latest internal V4L2 or I2C API.
|
||||||
|
Who: Hans Verkuil <hverkuil@xs4all.nl>
|
||||||
|
Mauro Carvalho Chehab <mchehab@infradead.org>
|
||||||
|
|
||||||
|
---------------------------
|
||||||
|
|
||||||
What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl])
|
What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl])
|
||||||
When: November 2005
|
When: November 2005
|
||||||
Files: drivers/pcmcia/: pcmcia_ioctl.c
|
Files: drivers/pcmcia/: pcmcia_ioctl.c
|
||||||
|
@ -138,24 +162,6 @@ Who: Kay Sievers <kay.sievers@suse.de>
|
||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
What: find_task_by_pid
|
|
||||||
When: 2.6.26
|
|
||||||
Why: With pid namespaces, calling this funciton will return the
|
|
||||||
wrong task when called from inside a namespace.
|
|
||||||
|
|
||||||
The best way to save a task pid and find a task by this
|
|
||||||
pid later, is to find this task's struct pid pointer (or get
|
|
||||||
it directly from the task) and call pid_task() later.
|
|
||||||
|
|
||||||
If someone really needs to get a task by its pid_t, then
|
|
||||||
he most likely needs the find_task_by_vpid() to get the
|
|
||||||
task from the same namespace as the current task is in, but
|
|
||||||
this may be not so in general.
|
|
||||||
|
|
||||||
Who: Pavel Emelyanov <xemul@openvz.org>
|
|
||||||
|
|
||||||
---------------------------
|
|
||||||
|
|
||||||
What: ACPI procfs interface
|
What: ACPI procfs interface
|
||||||
When: July 2008
|
When: July 2008
|
||||||
Why: ACPI sysfs conversion should be finished by January 2008.
|
Why: ACPI sysfs conversion should be finished by January 2008.
|
||||||
|
@ -222,13 +228,6 @@ Who: Thomas Gleixner <tglx@linutronix.de>
|
||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
What: i2c-i810, i2c-prosavage and i2c-savage4
|
|
||||||
When: May 2008
|
|
||||||
Why: These drivers are superseded by i810fb, intelfb and savagefb.
|
|
||||||
Who: Jean Delvare <khali@linux-fr.org>
|
|
||||||
|
|
||||||
---------------------------
|
|
||||||
|
|
||||||
What (Why):
|
What (Why):
|
||||||
- include/linux/netfilter_ipv4/ipt_TOS.h ipt_tos.h header files
|
- include/linux/netfilter_ipv4/ipt_TOS.h ipt_tos.h header files
|
||||||
(superseded by xt_TOS/xt_tos target & match)
|
(superseded by xt_TOS/xt_tos target & match)
|
||||||
|
@ -307,8 +306,41 @@ Who: ocfs2-devel@oss.oracle.com
|
||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
What: asm/semaphore.h
|
What: SCTP_GET_PEER_ADDRS_NUM_OLD, SCTP_GET_PEER_ADDRS_OLD,
|
||||||
When: 2.6.26
|
SCTP_GET_LOCAL_ADDRS_NUM_OLD, SCTP_GET_LOCAL_ADDRS_OLD
|
||||||
Why: Implementation became generic; users should now include
|
When: June 2009
|
||||||
linux/semaphore.h instead.
|
Why: A newer version of the options was introduced in 2005 that
|
||||||
Who: Matthew Wilcox <willy@linux.intel.com>
|
removes the limitations of the old API. The sctp library has been
|
||||||
|
converted to use these new options at the same time. Any user
|
||||||
|
space app that directly uses the old options should convert to using
|
||||||
|
the new options.
|
||||||
|
Who: Vlad Yasevich <vladislav.yasevich@hp.com>
|
||||||
|
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
What: CONFIG_THERMAL_HWMON
|
||||||
|
When: January 2009
|
||||||
|
Why: This option was introduced just to allow older lm-sensors userspace
|
||||||
|
to keep working over the upgrade to 2.6.26. At the scheduled time of
|
||||||
|
removal fixed lm-sensors (2.x or 3.x) should be readily available.
|
||||||
|
Who: Rene Herman <rene.herman@gmail.com>
|
||||||
|
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
What: Code that is now under CONFIG_WIRELESS_EXT_SYSFS
|
||||||
|
(in net/core/net-sysfs.c)
|
||||||
|
When: After the only user (hal) has seen a release with the patches
|
||||||
|
for enough time, probably some time in 2010.
|
||||||
|
Why: Over 1K .text/.data size reduction, data is available in other
|
||||||
|
ways (ioctls)
|
||||||
|
Who: Johannes Berg <johannes@sipsolutions.net>
|
||||||
|
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
What: CONFIG_NF_CT_ACCT
|
||||||
|
When: 2.6.29
|
||||||
|
Why: Accounting can now be enabled/disabled without kernel recompilation.
|
||||||
|
Currently used only to set a default value for a feature that is also
|
||||||
|
controlled by a kernel/module/sysfs/sysctl parameter.
|
||||||
|
Who: Krzysztof Piotr Oledzki <ole@ans.pl>
|
||||||
|
|
||||||
|
|
|
@ -510,6 +510,7 @@ prototypes:
|
||||||
void (*close)(struct vm_area_struct*);
|
void (*close)(struct vm_area_struct*);
|
||||||
int (*fault)(struct vm_area_struct*, struct vm_fault *);
|
int (*fault)(struct vm_area_struct*, struct vm_fault *);
|
||||||
int (*page_mkwrite)(struct vm_area_struct *, struct page *);
|
int (*page_mkwrite)(struct vm_area_struct *, struct page *);
|
||||||
|
int (*access)(struct vm_area_struct *, unsigned long, void*, int, int);
|
||||||
|
|
||||||
locking rules:
|
locking rules:
|
||||||
BKL mmap_sem PageLocked(page)
|
BKL mmap_sem PageLocked(page)
|
||||||
|
@ -517,6 +518,7 @@ open: no yes
|
||||||
close: no yes
|
close: no yes
|
||||||
fault: no yes
|
fault: no yes
|
||||||
page_mkwrite: no yes no
|
page_mkwrite: no yes no
|
||||||
|
access: no yes
|
||||||
|
|
||||||
->page_mkwrite() is called when a previously read-only page is
|
->page_mkwrite() is called when a previously read-only page is
|
||||||
about to become writeable. The file system is responsible for
|
about to become writeable. The file system is responsible for
|
||||||
|
@ -525,6 +527,11 @@ taking to lock out truncate, the page range should be verified to be
|
||||||
within i_size. The page mapping should also be checked that it is not
|
within i_size. The page mapping should also be checked that it is not
|
||||||
NULL.
|
NULL.
|
||||||
|
|
||||||
|
->access() is called when get_user_pages() fails in
|
||||||
|
access_process_vm(), typically used to debug a process through
|
||||||
|
/proc/pid/mem or ptrace. This function is needed only for
|
||||||
|
VM_IO | VM_PFNMAP VMAs.
|
||||||
|
|
||||||
================================================================================
|
================================================================================
|
||||||
Dubious stuff
|
Dubious stuff
|
||||||
|
|
||||||
|
|
|
@ -26,11 +26,11 @@ You can simplify mounting by just typing:
|
||||||
|
|
||||||
this will allocate the first available loopback device (and load loop.o
|
this will allocate the first available loopback device (and load loop.o
|
||||||
kernel module if necessary) automatically. If the loopback driver is not
|
kernel module if necessary) automatically. If the loopback driver is not
|
||||||
loaded automatically, make sure that your kernel is compiled with kmod
|
loaded automatically, make sure that you have compiled the module and
|
||||||
support (CONFIG_KMOD) enabled. Beware that umount will not
|
that modprobe is functioning. Beware that umount will not deallocate
|
||||||
deallocate /dev/loopN device if /etc/mtab file on your system is a
|
/dev/loopN device if /etc/mtab file on your system is a symbolic link to
|
||||||
symbolic link to /proc/mounts. You will need to do it manually using
|
/proc/mounts. You will need to do it manually using "-d" switch of
|
||||||
"-d" switch of losetup(8). Read losetup(8) manpage for more info.
|
losetup(8). Read losetup(8) manpage for more info.
|
||||||
|
|
||||||
To create the BFS image under UnixWare you need to find out first which
|
To create the BFS image under UnixWare you need to find out first which
|
||||||
slice contains it. The command prtvtoc(1M) is your friend:
|
slice contains it. The command prtvtoc(1M) is your friend:
|
||||||
|
|
|
@ -279,7 +279,7 @@ static struct config_item *simple_children_make_item(struct config_group *group,
|
||||||
|
|
||||||
simple_child = kzalloc(sizeof(struct simple_child), GFP_KERNEL);
|
simple_child = kzalloc(sizeof(struct simple_child), GFP_KERNEL);
|
||||||
if (!simple_child)
|
if (!simple_child)
|
||||||
return NULL;
|
return ERR_PTR(-ENOMEM);
|
||||||
|
|
||||||
|
|
||||||
config_item_init_type_name(&simple_child->item, name,
|
config_item_init_type_name(&simple_child->item, name,
|
||||||
|
@ -366,7 +366,7 @@ static struct config_group *group_children_make_group(struct config_group *group
|
||||||
simple_children = kzalloc(sizeof(struct simple_children),
|
simple_children = kzalloc(sizeof(struct simple_children),
|
||||||
GFP_KERNEL);
|
GFP_KERNEL);
|
||||||
if (!simple_children)
|
if (!simple_children)
|
||||||
return NULL;
|
return ERR_PTR(-ENOMEM);
|
||||||
|
|
||||||
|
|
||||||
config_group_init_type_name(&simple_children->group, name,
|
config_group_init_type_name(&simple_children->group, name,
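
Both hunks above switch the allocation-failure path from returning NULL to returning an error pointer. In the kernel the helpers for this are ERR_PTR(), IS_ERR() and PTR_ERR() from <linux/err.h>. The userspace imitation below is only a sketch of the idea: the lowercase helpers, `struct simple_child_demo` and `make_item_demo()` are invented for illustration and are not the configfs example code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095	/* largest errno value encoded in a pointer */

/* Userspace stand-ins for the kernel helpers of the same (uppercase) names. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int is_err(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct simple_child_demo {
	int storeme;
};

/* 'fail' simulates an allocation failure so both paths can be exercised. */
static struct simple_child_demo *make_item_demo(int fail)
{
	struct simple_child_demo *child;

	child = fail ? NULL : calloc(1, sizeof(*child));
	if (!child)
		return err_ptr(-ENOMEM);	/* was: return NULL */

	return child;
}
```

The point of the pattern is that the caller can tell why the call failed (`ptr_err()` yields -ENOMEM here) instead of only seeing a NULL.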
|
||||||
|
|
|
@ -13,72 +13,93 @@ Mailing list: linux-ext4@vger.kernel.org
|
||||||
1. Quick usage instructions:
|
1. Quick usage instructions:
|
||||||
===========================
|
===========================
|
||||||
|
|
||||||
- Grab updated e2fsprogs from
|
- Compile and install the latest version of e2fsprogs (as of this
|
||||||
ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs-interim/
|
writing version 1.41) from:
|
||||||
This is a patchset on top of e2fsprogs-1.39, which can be found at
|
|
||||||
|
http://sourceforge.net/project/showfiles.php?group_id=2406
|
||||||
|
|
||||||
|
or
|
||||||
|
|
||||||
ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/
|
ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/
|
||||||
|
|
||||||
- It's still mke2fs -j /dev/hda1
|
or grab the latest git repository from:
|
||||||
|
|
||||||
- mount /dev/hda1 /wherever -t ext4dev
|
git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
|
||||||
|
|
||||||
- To enable extents,
|
- Create a new filesystem using the ext4dev filesystem type:
|
||||||
|
|
||||||
mount /dev/hda1 /wherever -t ext4dev -o extents
|
# mke2fs -t ext4dev /dev/hda1
|
||||||
|
|
||||||
- The filesystem is compatible with the ext3 driver until you add a file
|
Or configure an existing ext3 filesystem to support extents and set
|
||||||
which has extents (ie: `mount -o extents', then create a file).
|
the test_fs flag to indicate that it's ok for an in-development
|
||||||
|
filesystem to touch this filesystem:
|
||||||
|
|
||||||
NOTE: The "extents" mount flag is temporary. It will soon go away and
|
# tune2fs -O extents -E test_fs /dev/hda1
|
||||||
extents will be enabled by the "-o extents" flag to mke2fs or tune2fs
|
|
||||||
|
If the filesystem was created with 128 byte inodes, it can be
|
||||||
|
converted to use 256 byte inodes for greater efficiency via:
|
||||||
|
|
||||||
|
# tune2fs -I 256 /dev/hda1
|
||||||
|
|
||||||
|
(Note: we currently do not have tools to convert an ext4dev
|
||||||
|
filesystem back to ext3; so please do not try this on production
|
||||||
|
filesystems.)
|
||||||
|
|
||||||
|
- Mounting:
|
||||||
|
|
||||||
|
# mount -t ext4dev /dev/hda1 /wherever
|
||||||
|
|
||||||
- When comparing performance with other filesystems, remember that
|
- When comparing performance with other filesystems, remember that
|
||||||
ext3/4 by default offers higher data integrity guarantees than most. So
|
ext3/4 by default offers higher data integrity guarantees than most.
|
||||||
when comparing with a metadata-only journalling filesystem, use `mount -o
|
So when comparing with a metadata-only journalling filesystem, such
|
||||||
data=writeback'. And you might as well use `mount -o nobh' too along
|
as ext3, use `mount -o data=writeback'. And you might as well use
|
||||||
with it. Making the journal larger than the mke2fs default often helps
|
`mount -o nobh' too along with it. Making the journal larger than
|
||||||
performance with metadata-intensive workloads.
|
the mke2fs default often helps performance with metadata-intensive
|
||||||
|
workloads.
|
||||||
|
|
||||||
2. Features
|
2. Features
|
||||||
===========
|
===========
|
||||||
|
|
||||||
2.1 Currently available
|
2.1 Currently available
|
||||||
|
|
||||||
* ability to use filesystems > 16TB
|
* ability to use filesystems > 16TB (e2fsprogs support not available yet)
|
||||||
* extent format reduces metadata overhead (RAM, IO for access, transactions)
|
* extent format reduces metadata overhead (RAM, IO for access, transactions)
|
||||||
* extent format more robust in face of on-disk corruption due to magics,
|
* extent format more robust in face of on-disk corruption due to magics,
|
||||||
* internal redunancy in tree
|
* internal redunancy in tree
|
||||||
|
* improved file allocation (multi-block alloc)
|
||||||
2.1 Previously available, soon to be enabled by default by "mkefs.ext4":
|
* fix 32000 subdirectory limit
|
||||||
|
* nsec timestamps for mtime, atime, ctime, create time
|
||||||
* dir_index and resize inode will be on by default
|
* inode version field on disk (NFSv4, Lustre)
|
||||||
* large inodes will be used by default for fast EAs, nsec timestamps, etc
|
* reduced e2fsck time via uninit_bg feature
|
||||||
|
* journal checksumming for robustness, performance
|
||||||
|
* persistent file preallocation (e.g for streaming media, databases)
|
||||||
|
* ability to pack bitmaps and inode tables into larger virtual groups via the
|
||||||
|
flex_bg feature
|
||||||
|
* large file support
|
||||||
|
* Inode allocation using large virtual block groups via flex_bg
|
||||||
|
* delayed allocation
|
||||||
|
* large block (up to pagesize) support
|
||||||
|
* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
|
||||||
|
the ordering)
|
||||||
|
|
||||||
2.2 Candidate features for future inclusion
|
2.2 Candidate features for future inclusion
|
||||||
|
|
||||||
There are several under discussion, whether they all make it in is
|
* Online defrag (patches available but not well tested)
|
||||||
partly a function of how much time everyone has to work on them:
|
* reduced mke2fs time via lazy itable initialization in conjunction with
|
||||||
|
the uninit_bg feature (capability to do this is available in e2fsprogs
|
||||||
|
but a kernel thread to do lazy zeroing of unused inode table blocks
|
||||||
|
after filesystem is first mounted is required for safety)
|
||||||
|
|
||||||
* improved file allocation (multi-block alloc, delayed alloc; basically done)
|
There are several others under discussion, whether they all make it in is
|
||||||
* fix 32000 subdirectory limit (patch exists, needs some e2fsck work)
|
partly a function of how much time everyone has to work on them. Features like
|
||||||
* nsec timestamps for mtime, atime, ctime, create time (patch exists,
|
metadata checksumming have been discussed and planned for a bit but no patches
|
||||||
needs some e2fsck work)
|
exist yet so I'm not sure they're in the near-term roadmap.
|
||||||
* inode version field on disk (NFSv4, Lustre; prototype exists)
|
|
||||||
* reduced mke2fs/e2fsck time via uninitialized groups (prototype exists)
|
|
||||||
* journal checksumming for robustness, performance (prototype exists)
|
|
||||||
* persistent file preallocation (e.g for streaming media, databases)
|
|
||||||
|
|
||||||
Features like metadata checksumming have been discussed and planned for
|
The big performance win will come with mballoc, delalloc and flex_bg
|
||||||
a bit but no patches exist yet so I'm not sure they're in the near-term
|
grouping of bitmaps and inode tables. Some test results available here:
|
||||||
roadmap.
|
|
||||||
|
|
||||||
The big performance win will come with mballoc and delalloc. CFS has
|
- http://www.bullopensource.org/ext4/20080530/ffsb-write-2.6.26-rc2.html
|
||||||
been using mballoc for a few years already with Lustre, and IBM + Bull
|
- http://www.bullopensource.org/ext4/20080530/ffsb-readwrite-2.6.26-rc2.html
|
||||||
did a lot of benchmarking on it. The reason it isn't in the first set of
|
|
||||||
patches is partly a manageability issue, and partly because it doesn't
|
|
||||||
directly affect the on-disk format (outside of much better allocation)
|
|
||||||
so it isn't critical to get into the first round of changes. I believe
|
|
||||||
Alex is working on a new set of patches right now.
|
|
||||||
|
|
||||||
3. Options
|
3. Options
|
||||||
==========
|
==========
|
||||||
|
@ -222,9 +243,11 @@ stripe=n Number of filesystem blocks that mballoc will try
|
||||||
to use for allocation size and alignment. For RAID5/6
|
to use for allocation size and alignment. For RAID5/6
|
||||||
systems this should be the number of data
|
systems this should be the number of data
|
||||||
disks * RAID chunk size in file system blocks.
|
disks * RAID chunk size in file system blocks.
|
||||||
|
delalloc (*) Deferring block allocation until write-out time.
|
||||||
|
nodelalloc Disable delayed allocation. Blocks are allocated
|
||||||
|
when data is copied from user to page cache.
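
The stripe= rule quoted in the option list above (number of data disks * RAID chunk size in filesystem blocks) is plain arithmetic. The helper below is a hypothetical illustration, not part of ext4 or e2fsprogs, and it assumes the chunk size is a whole multiple of the filesystem block size.

```c
/* stripe = data disks * (RAID chunk size expressed in filesystem blocks).
 * Assumes chunk_bytes is a multiple of fs_block_bytes. */
static unsigned long stripe_blocks(unsigned int data_disks,
				   unsigned long chunk_bytes,
				   unsigned long fs_block_bytes)
{
	return data_disks * (chunk_bytes / fs_block_bytes);
}
```

For instance, a 4-disk RAID5 array has three data disks; with a 64 KiB chunk and 4 KiB filesystem blocks this gives stripe=48.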
|
||||||
Data Mode
|
Data Mode
|
||||||
---------
|
=========
|
||||||
There are 3 different data modes:
|
There are 3 different data modes:
|
||||||
|
|
||||||
* writeback mode
|
* writeback mode
|
||||||
|
@ -236,10 +259,10 @@ typically provide the best ext4 performance.
|
||||||
|
|
||||||
* ordered mode
|
* ordered mode
|
||||||
In data=ordered mode, ext4 only officially journals metadata, but it logically
|
In data=ordered mode, ext4 only officially journals metadata, but it logically
|
||||||
groups metadata and data blocks into a single unit called a transaction. When
|
groups metadata information related to data changes with the data blocks into a
|
||||||
it's time to write the new metadata out to disk, the associated data blocks
|
single unit called a transaction. When it's time to write the new metadata
|
||||||
are written first. In general, this mode performs slightly slower than
|
out to disk, the associated data blocks are written first. In general,
|
||||||
writeback but significantly faster than journal mode.
|
this mode performs slightly slower than writeback but significantly faster than journal mode.
|
||||||
|
|
||||||
* journal mode
|
* journal mode
|
||||||
data=journal mode provides full data and metadata journaling. All new data is
|
data=journal mode provides full data and metadata journaling. All new data is
|
||||||
|
@ -247,7 +270,8 @@ written to the journal first, and then to its final location.
|
||||||
In the event of a crash, the journal can be replayed, bringing both data and
|
In the event of a crash, the journal can be replayed, bringing both data and
|
||||||
metadata into a consistent state. This mode is the slowest except when data
|
metadata into a consistent state. This mode is the slowest except when data
|
||||||
needs to be read from and written to disk at the same time where it
|
needs to be read from and written to disk at the same time where it
|
||||||
outperforms all others modes.
|
outperforms all other modes. Currently ext4 does not have delayed
|
||||||
|
allocation support if this data journalling mode is selected.
|
||||||
|
|
||||||
References
|
References
|
||||||
==========
|
==========
|
||||||
|
@ -256,7 +280,8 @@ kernel source: <file:fs/ext4/>
|
||||||
<file:fs/jbd2/>
|
<file:fs/jbd2/>
|
||||||
|
|
||||||
programs: http://e2fsprogs.sourceforge.net/
|
programs: http://e2fsprogs.sourceforge.net/
|
||||||
http://ext2resize.sourceforge.net
|
|
||||||
|
|
||||||
useful links: http://fedoraproject.org/wiki/ext3-devel
|
useful links: http://fedoraproject.org/wiki/ext3-devel
|
||||||
http://www.bullopensource.org/ext4/
|
http://www.bullopensource.org/ext4/
|
||||||
|
http://ext4.wiki.kernel.org/index.php/Main_Page
|
||||||
|
http://fedoraproject.org/wiki/Features/Ext4
|
||||||
|
|
114
Documentation/filesystems/gfs2-glocks.txt
Normal file
|
@ -0,0 +1,114 @@
|
||||||
|
Glock internal locking rules
|
||||||
|
------------------------------
|
||||||
|
|
||||||
|
This documents the basic principles of the glock state machine
|
||||||
|
internals. Each glock (struct gfs2_glock in fs/gfs2/incore.h)
|
||||||
|
has two main (internal) locks:
|
||||||
|
|
||||||
|
1. A spinlock (gl_spin) which protects the internal state such
|
||||||
|
as gl_state, gl_target and the list of holders (gl_holders)
|
||||||
|
2. A non-blocking bit lock, GLF_LOCK, which is used to prevent other
|
||||||
|
threads from making calls to the DLM, etc. at the same time. If a
|
||||||
|
thread takes this lock, it must then call run_queue (usually via the
|
||||||
|
workqueue) when it releases it in order to ensure any pending tasks
|
||||||
|
are completed.
|
||||||
|
|
||||||
|
The gl_holders list contains all the queued lock requests (not
|
||||||
|
just the holders) associated with the glock. If there are any
|
||||||
|
held locks, then they will be contiguous entries at the head
|
||||||
|
of the list. Locks are granted in strictly the order that they
|
||||||
|
are queued, except for those marked LM_FLAG_PRIORITY which are
|
||||||
|
used only during recovery, and even then only for journal locks.
|
||||||
|
|
||||||
|
There are three lock states that users of the glock layer can request,
|
||||||
|
namely shared (SH), deferred (DF) and exclusive (EX). Those translate
|
||||||
|
to the following DLM lock modes:
|
||||||
|
|
||||||
|
Glock mode | DLM lock mode
------------------------------
UN         | IV/NL Unlocked (no DLM lock associated with glock) or NL
SH         | PR (Protected read)
DF         | CW (Concurrent write)
EX         | EX (Exclusive)
|
||||||
|
|
||||||
|
Thus DF is basically a shared mode which is incompatible with the "normal"
|
||||||
|
shared lock mode, SH. In GFS2 the DF mode is used exclusively for direct I/O
|
||||||
|
operations. The glocks are basically a lock plus some routines which deal
|
||||||
|
with cache management. The following rules apply for the cache:
|
||||||
|
|
||||||
|
Glock mode | Cache data | Cache Metadata | Dirty Data | Dirty Metadata
--------------------------------------------------------------------------
UN         | No         | No             | No         | No
SH         | Yes        | Yes            | No         | No
DF         | No         | Yes            | No         | No
EX         | Yes        | Yes            | Yes        | Yes
|
||||||
|
|
||||||
|
These rules are implemented using the various glock operations which
|
||||||
|
are defined for each type of glock. Not all types of glocks use
|
||||||
|
all the modes. Only inode glocks use the DF mode for example.
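
The two tables above (glock mode to DLM mode, and the cache rules per mode) can be read as one lookup table. The sketch below encodes them that way. The enum, struct and function names are invented for illustration; they are not the identifiers used in fs/gfs2, and only the table contents come from the text.

```c
#include <stdbool.h>

/* Hypothetical names for the four glock modes described above. */
enum glock_mode { GL_UN, GL_SH, GL_DF, GL_EX, GL_NR_MODES };

struct glock_cache_rules {
	const char *dlm_mode;	/* corresponding DLM lock mode */
	bool cache_data;
	bool cache_metadata;
	bool dirty_data;
	bool dirty_metadata;
};

/* One row per glock mode, copied from the two tables above. */
static const struct glock_cache_rules glock_rules[GL_NR_MODES] = {
	[GL_UN] = { "IV/NL", false, false, false, false },
	[GL_SH] = { "PR",    true,  true,  false, false },
	[GL_DF] = { "CW",    false, true,  false, false },
	[GL_EX] = { "EX",    true,  true,  true,  true  },
};

/* True if a holder in this mode may have dirty data or metadata cached. */
static bool glock_may_dirty(enum glock_mode mode)
{
	return glock_rules[mode].dirty_data || glock_rules[mode].dirty_metadata;
}
```

Seen this way, DF stands out as the only mode that caches metadata but not data, which fits its use for direct I/O.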
|
||||||
|
|
||||||
|
Table of glock operations and per type constants:
|
||||||
|
|
||||||
|
Field            | Purpose
----------------------------------------------------------------------------
go_xmote_th      | Called before remote state change (e.g. to sync dirty data)
go_xmote_bh      | Called after remote state change (e.g. to refill cache)
go_inval         | Called if remote state change requires invalidating the cache
go_demote_ok     | Returns boolean value of whether it's ok to demote a glock
                 | (e.g. checks timeout, and that there is no cached data)
go_lock          | Called for the first local holder of a lock
go_unlock        | Called on the final local unlock of a lock
go_dump          | Called to print content of object for debugfs file, or on
                 | error to dump glock to the log.
go_type          | The type of the glock, LM_TYPE_.....
go_min_hold_time | The minimum hold time
|
||||||
|
|
||||||
|
The minimum hold time for each lock is the time after a remote lock
|
||||||
|
grant for which we ignore remote demote requests. This is in order to
|
||||||
|
prevent a situation where locks are being bounced around the cluster
|
||||||
|
from node to node with none of the nodes making any progress. This
|
||||||
|
tends to show up most with shared mmaped files which are being written
|
||||||
|
to by multiple nodes. By delaying the demotion in response to a
|
||||||
|
remote callback, that gives the userspace program time to make
|
||||||
|
some progress before the pages are unmapped.
|
||||||
|
|
||||||
|
There is a plan to try and remove the go_lock and go_unlock callbacks
|
||||||
|
if possible, in order to try and speed up the fast path through the locking.
|
||||||
|
Also, eventually we hope to make the glock "EX" mode locally shared
|
||||||
|
such that any local locking will be done with the i_mutex as required
|
||||||
|
rather than via the glock.
|
||||||
|
|
||||||
|
Locking rules for glock operations:
|
||||||
|
|
||||||
|
Operation    | GLF_LOCK bit lock held | gl_spin spinlock held
-----------------------------------------------------------------
go_xmote_th  | Yes                    | No
go_xmote_bh  | Yes                    | No
go_inval     | Yes                    | No
go_demote_ok | Sometimes              | Yes
go_lock      | Yes                    | No
go_unlock    | Yes                    | No
go_dump      | Sometimes              | Yes
|
||||||
|
|
||||||
|
N.B. Operations must not drop either the bit lock or the spinlock
|
||||||
|
if it's held on entry. go_dump and go_demote_ok must never block.
|
||||||
|
Note that go_dump will only be called if the glock's state
|
||||||
|
indicates that it is caching uptodate data.
|
||||||
|
|
||||||
|
Glock locking order within GFS2:
|
||||||
|
|
||||||
|
1. i_mutex (if required)
|
||||||
|
2. Rename glock (for rename only)
|
||||||
|
3. Inode glock(s)
|
||||||
|
(Parents before children, inodes at "same level" with same parent in
|
||||||
|
lock number order)
|
||||||
|
4. Rgrp glock(s) (for (de)allocation operations)
|
||||||
|
5. Transaction glock (via gfs2_trans_begin) for non-read operations
|
||||||
|
6. Page lock (always last, very important!)
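
The ordering in the list above can be expressed as a rank per lock class, with the rule that a thread never takes a lock of lower rank than one it already holds. The checker below is a hypothetical debugging aid written for this illustration; none of its names exist in GFS2. It does not model the ordering inside rank 3 (parents before children, then lock number order).

```c
#include <stdbool.h>
#include <stddef.h>

/* Ranks follow the numbered list above; the names are illustrative. */
enum gfs2_lock_rank {
	RANK_I_MUTEX = 1,
	RANK_RENAME_GLOCK,
	RANK_INODE_GLOCK,
	RANK_RGRP_GLOCK,
	RANK_TRANS_GLOCK,
	RANK_PAGE_LOCK,
};

/* Returns true if the locks in seq[] are taken in non-decreasing rank
 * order. Equal ranks are allowed, since several inode or rgrp glocks
 * may be taken one after another. */
static bool lock_order_ok(const enum gfs2_lock_rank *seq, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (seq[i] < seq[i - 1])
			return false;
	return true;
}
```

A sequence that takes the page lock before an inode glock is rejected, which is the case the list marks as "always last, very important!".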
|
||||||
|
|
||||||
|
There are two glocks per inode. One deals with access to the inode
|
||||||
|
itself (locking order as above), and the other, known as the iopen
|
||||||
|
glock is used in conjunction with the i_nlink field in the inode to
|
||||||
|
determine the lifetime of the inode in question. Locking of inodes
|
||||||
|
is on a per-inode basis. Locking of rgrps is on a per rgrp basis.
|
||||||
|
|
|
@ -5,7 +5,7 @@
|
||||||
################################################################################
|
################################################################################
|
||||||
|
|
||||||
Author: NetApp and Open Grid Computing
|
Author: NetApp and Open Grid Computing
|
||||||
Date: April 15, 2008
|
Date: May 29, 2008
|
||||||
|
|
||||||
Table of Contents
|
Table of Contents
|
||||||
~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~
|
||||||
|
@ -60,16 +60,18 @@ Installation
|
||||||
The procedures described in this document have been tested with
|
The procedures described in this document have been tested with
|
||||||
distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).
|
distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).
|
||||||
|
|
||||||
- Install nfs-utils-1.1.1 or greater on the client
|
- Install nfs-utils-1.1.2 or greater on the client
|
||||||
|
|
||||||
An NFS/RDMA mount point can only be obtained by using the mount.nfs
|
An NFS/RDMA mount point can be obtained by using the mount.nfs command in
|
||||||
command in nfs-utils-1.1.1 or greater. To see which version of mount.nfs
|
nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
|
||||||
you are using, type:
|
version with support for NFS/RDMA mounts, but for various reasons we
|
||||||
|
recommend using nfs-utils-1.1.2 or greater). To see which version of
|
||||||
|
mount.nfs you are using, type:
|
||||||
|
|
||||||
> /sbin/mount.nfs -V
|
$ /sbin/mount.nfs -V
|
||||||
|
|
||||||
If the version is less than 1.1.1 or the command does not exist,
|
If the version is less than 1.1.2 or the command does not exist,
|
||||||
then you will need to install the latest version of nfs-utils.
|
you should install the latest version of nfs-utils.
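The version check described above could be automated along these lines. This is a minimal sketch that assumes the dotted version string (for example "1.1.2") has already been extracted from the output of mount.nfs -V, whose exact format is not shown here; the function names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Parse "major.minor.patch"; returns 0 on success, -1 on malformed input. */
int parse_version(const char *s, int v[3])
{
    assert(s != NULL);
    return sscanf(s, "%d.%d.%d", &v[0], &v[1], &v[2]) == 3 ? 0 : -1;
}

/* Returns 1 if `have` is at least `need`, 0 if it is older,
 * and -1 if either string cannot be parsed. */
int version_at_least(const char *have, const char *need)
{
    int a[3], b[3];

    if (parse_version(have, a) != 0 || parse_version(need, b) != 0)
        return -1;
    for (int i = 0; i < 3; i++) {
        if (a[i] != b[i])
            return a[i] > b[i];
    }
    return 1; /* all three components are equal */
}
```

Comparing component by component as integers avoids the trap of a plain string comparison, which would rank "1.1.10" below "1.1.2".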
|
||||||
|
|
||||||
Download the latest package from:
|
Download the latest package from:
|
||||||
|
|
||||||
|
@ -77,22 +79,33 @@ Installation
|
||||||
|
|
||||||
Uncompress the package and follow the installation instructions.
|
Uncompress the package and follow the installation instructions.
|
||||||
|
|
||||||
If you will not be using GSS and NFSv4, the installation process
|
If you will not need the idmapper and gssd executables (you do not need
|
||||||
can be simplified by disabling these features when running configure:
|
these to create an NFS/RDMA enabled mount command), the installation
|
||||||
|
process can be simplified by disabling these features when running
|
||||||
|
configure:
|
||||||
|
|
||||||
> ./configure --disable-gss --disable-nfsv4
|
$ ./configure --disable-gss --disable-nfsv4
|
||||||
|
|
||||||
For more information on this see the package's README and INSTALL files.
|
To build nfs-utils you will need the tcp_wrappers package installed. For
|
||||||
|
more information on this see the package's README and INSTALL files.
|
||||||
|
|
||||||
After building the nfs-utils package, there will be a mount.nfs binary in
|
After building the nfs-utils package, there will be a mount.nfs binary in
|
||||||
the utils/mount directory. This binary can be used to initiate NFS v2, v3,
|
the utils/mount directory. This binary can be used to initiate NFS v2, v3,
|
||||||
or v4 mounts. To initiate a v4 mount, the binary must be called mount.nfs4.
|
or v4 mounts. To initiate a v4 mount, the binary must be called
|
||||||
The standard technique is to create a symlink called mount.nfs4 to mount.nfs.
|
mount.nfs4. The standard technique is to create a symlink called
|
||||||
|
mount.nfs4 to mount.nfs.
|
||||||
|
|
||||||
NOTE: mount.nfs and therefore nfs-utils-1.1.1 or greater is only needed
|
This mount.nfs binary should be installed at /sbin/mount.nfs as follows:
|
||||||
|
|
||||||
|
$ sudo cp utils/mount/mount.nfs /sbin/mount.nfs
|
||||||
|
|
||||||
|
In this location, mount.nfs will be invoked automatically for NFS mounts
|
||||||
|
by the system mount command.
|
||||||
|
|
||||||
|
NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
|
||||||
on the NFS client machine. You do not need this specific version of
|
on the NFS client machine. You do not need this specific version of
|
||||||
nfs-utils on the server. Furthermore, only the mount.nfs command from
|
nfs-utils on the server. Furthermore, only the mount.nfs command from
|
||||||
nfs-utils-1.1.1 is needed on the client.
|
nfs-utils-1.1.2 is needed on the client.
|
||||||
|
|
||||||
- Install a Linux kernel with NFS/RDMA
|
- Install a Linux kernel with NFS/RDMA
|
||||||
|
|
||||||
|
@ -156,8 +169,8 @@ Check RDMA and NFS Setup
|
||||||
this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
|
this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
|
||||||
card:
|
card:
|
||||||
|
|
||||||
> modprobe ib_mthca
|
$ modprobe ib_mthca
|
||||||
> modprobe ib_ipoib
|
$ modprobe ib_ipoib
|
||||||
|
|
||||||
If you are using InfiniBand, make sure there is a Subnet Manager (SM)
|
If you are using InfiniBand, make sure there is a Subnet Manager (SM)
|
||||||
running on the network. If your IB switch has an embedded SM, you can
|
running on the network. If your IB switch has an embedded SM, you can
|
||||||
|
@ -166,7 +179,7 @@ Check RDMA and NFS Setup
|
||||||
|
|
||||||
If an SM is running on your network, you should see the following:
|
If an SM is running on your network, you should see the following:
|
||||||
|
|
||||||
> cat /sys/class/infiniband/driverX/ports/1/state
|
$ cat /sys/class/infiniband/driverX/ports/1/state
|
||||||
4: ACTIVE
|
4: ACTIVE
|
||||||
|
|
||||||
where driverX is mthca0, ipath5, ehca3, etc.
|
where driverX is mthca0, ipath5, ehca3, etc.
|
||||||
|
@ -174,10 +187,10 @@ Check RDMA and NFS Setup
|
||||||
To further test the InfiniBand software stack, use IPoIB (this
|
To further test the InfiniBand software stack, use IPoIB (this
|
||||||
assumes you have two IB hosts named host1 and host2):
|
assumes you have two IB hosts named host1 and host2):
|
||||||
|
|
||||||
host1> ifconfig ib0 a.b.c.x
|
host1$ ifconfig ib0 a.b.c.x
|
||||||
host2> ifconfig ib0 a.b.c.y
|
host2$ ifconfig ib0 a.b.c.y
|
||||||
host1> ping a.b.c.y
|
host1$ ping a.b.c.y
|
||||||
host2> ping a.b.c.x
|
host2$ ping a.b.c.x
|
||||||
|
|
||||||
For other device types, follow the appropriate procedures.
|
For other device types, follow the appropriate procedures.
|
||||||
|
|
||||||
|
@ -202,11 +215,11 @@ NFS/RDMA Setup
|
||||||
/vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
|
/vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
|
||||||
/vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)
|
/vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)
|
||||||
|
|
||||||
The IP address(es) is(are) the client's IPoIB address for an InfiniBand HCA or the
|
The IP address(es) is(are) the client's IPoIB address for an InfiniBand
|
||||||
client's iWARP address(es) for an RNIC.
|
HCA or the client's iWARP address(es) for an RNIC.
|
||||||
|
|
||||||
NOTE: The "insecure" option must be used because the NFS/RDMA client does not
|
NOTE: The "insecure" option must be used because the NFS/RDMA client does
|
||||||
use a reserved port.
|
not use a reserved port.
|
||||||
|
|
||||||
Each time a machine boots:
|
Each time a machine boots:
|
||||||
|
|
||||||
|
@ -214,43 +227,45 @@ NFS/RDMA Setup
|
||||||
|
|
||||||
For InfiniBand using a Mellanox adapter:
|
For InfiniBand using a Mellanox adapter:
|
||||||
|
|
||||||
> modprobe ib_mthca
|
$ modprobe ib_mthca
|
||||||
> modprobe ib_ipoib
|
$ modprobe ib_ipoib
|
||||||
> ifconfig ib0 a.b.c.d
|
$ ifconfig ib0 a.b.c.d
|
||||||
|
|
||||||
NOTE: use unique addresses for the client and server
|
NOTE: use unique addresses for the client and server
|
||||||
|
|
||||||
- Start the NFS server
|
- Start the NFS server
|
||||||
|
|
||||||
If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config),
|
If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
|
||||||
load the RDMA transport module:
|
kernel config), load the RDMA transport module:
|
||||||
|
|
||||||
> modprobe svcrdma
|
$ modprobe svcrdma
|
||||||
|
|
||||||
Regardless of how the server was built (module or built-in), start the server:
|
Regardless of how the server was built (module or built-in), start the
|
||||||
|
server:
|
||||||
|
|
||||||
> /etc/init.d/nfs start
|
$ /etc/init.d/nfs start
|
||||||
|
|
||||||
or
|
or
|
||||||
|
|
||||||
> service nfs start
|
$ service nfs start
|
||||||
|
|
||||||
Instruct the server to listen on the RDMA transport:
|
Instruct the server to listen on the RDMA transport:
|
||||||
|
|
||||||
> echo rdma 2050 > /proc/fs/nfsd/portlist
|
$ echo rdma 2050 > /proc/fs/nfsd/portlist
|
||||||
|
|
||||||
- On the client system
|
- On the client system
|
||||||
|
|
||||||
If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config),
|
If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
|
||||||
load the RDMA client module:
|
kernel config), load the RDMA client module:
|
||||||
|
|
||||||
> modprobe xprtrdma.ko
|
$ modprobe xprtrdma
|
||||||
|
|
||||||
Regardless of how the client was built (module or built-in), issue the mount.nfs command:
|
Regardless of how the client was built (module or built-in), use this
|
||||||
|
command to mount the NFS/RDMA server:
|
||||||
|
|
||||||
> /path/to/your/mount.nfs <IPoIB-server-name-or-address>:/<export> /mnt -i -o rdma,port=2050
|
$ mount -o rdma,port=2050 <IPoIB-server-name-or-address>:/<export> /mnt
|
||||||
|
|
||||||
To verify that the mount is using RDMA, run "cat /proc/mounts" and check the
|
To verify that the mount is using RDMA, run "cat /proc/mounts" and check
|
||||||
"proto" field for the given mount.
|
the "proto" field for the given mount.
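The /proc/mounts check can also be done programmatically. The sketch below uses the glibc getmntent(3) family and assumes that an RDMA mount reports its transport as "proto=rdma" in the option string; that value and the function name mount_uses_rdma are assumptions of this example.

```c
#include <assert.h>
#include <mntent.h>
#include <string.h>

/* Returns 1 if the entry's option string contains proto=rdma, else 0. */
int mount_uses_rdma(const struct mntent *m)
{
    const char *opt;

    assert(m != NULL && m->mnt_opts != NULL);
    /* hasmntopt() returns a pointer to the start of the "proto" option
     * inside mnt_opts, or NULL when the option is absent. */
    opt = hasmntopt(m, "proto");
    if (opt == NULL)
        return 0;
    return strncmp(opt, "proto=rdma", 10) == 0 &&
           (opt[10] == ',' || opt[10] == '\0');
}
```

A caller would typically open /proc/mounts with setmntent(), loop over getmntent(), and apply this function to the entry whose mnt_dir matches the mount point of interest.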
|
||||||
|
|
||||||
Congratulations! You're using NFS/RDMA!
|
Congratulations! You're using NFS/RDMA!
|
||||||
|
|
106
Documentation/filesystems/omfs.txt
Normal file
|
@ -0,0 +1,106 @@
|
||||||
|
Optimized MPEG Filesystem (OMFS)
|
||||||
|
|
||||||
|
Overview
|
||||||
|
========
|
||||||
|
|
||||||
|
OMFS is a filesystem created by SonicBlue for use in the ReplayTV DVR
|
||||||
|
and Rio Karma MP3 player. The filesystem is extent-based, utilizing
|
||||||
|
block sizes from 2k to 8k, with hash-based directories. This
|
||||||
|
filesystem driver may be used to read and write disks from these
|
||||||
|
devices.
|
||||||
|
|
||||||
|
Note, it is not recommended that this FS be used in place of a general
|
||||||
|
filesystem for your own streaming media device. Native Linux filesystems
|
||||||
|
will likely perform better.
|
||||||
|
|
||||||
|
More information is available at:
|
||||||
|
|
||||||
|
http://linux-karma.sf.net/
|
||||||
|
|
||||||
|
Various utilities, including mkomfs and omfsck, are included with
|
||||||
|
omfsprogs, available at:
|
||||||
|
|
||||||
|
http://bobcopeland.com/karma/
|
||||||
|
|
||||||
|
Instructions are included in its README.
|
||||||
|
|
||||||
|
Options
|
||||||
|
=======
|
||||||
|
|
||||||
|
OMFS supports the following mount-time options:
|
||||||
|
|
||||||
|
uid=n - make all files owned by specified user
|
||||||
|
gid=n - make all files owned by specified group
|
||||||
|
umask=xxx - set permission umask to xxx
|
||||||
|
fmask=xxx - set umask to xxx for files
|
||||||
|
dmask=xxx - set umask to xxx for directories
|
||||||
|
|
||||||
|
Disk format
|
||||||
|
===========
|
||||||
|
|
||||||
|
OMFS discriminates between "sysblocks" and normal data blocks. The sysblock
|
||||||
|
group consists of super block information, file metadata, directory structures,
|
||||||
|
and extents. Each sysblock has a header containing CRCs of the entire
|
||||||
|
sysblock, and may be mirrored in successive blocks on the disk. A sysblock may
|
||||||
|
have a smaller size than a data block, but since they are both addressed by the
|
||||||
|
same 64-bit block number, any remaining space in the smaller sysblock is
|
||||||
|
unused.
|
||||||
|
|
||||||
|
Sysblock header information:
|
||||||
|
|
||||||
|
struct omfs_header {
|
||||||
|
__be64 h_self; /* FS block where this is located */
|
||||||
|
__be32 h_body_size; /* size of useful data after header */
|
||||||
|
__be16 h_crc; /* crc-ccitt of body_size bytes */
|
||||||
|
char h_fill1[2];
|
||||||
|
u8 h_version; /* version, always 1 */
|
||||||
|
char h_type; /* OMFS_INODE_X */
|
||||||
|
u8 h_magic; /* OMFS_IMAGIC */
|
||||||
|
u8 h_check_xor; /* XOR of header bytes before this */
|
||||||
|
__be32 h_fill2;
|
||||||
|
};
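As an illustration of the h_check_xor field, the following sketch computes the XOR over the header bytes that precede it. It works on a raw byte buffer and assumes the on-disk header is packed with no padding, so that h_check_xor sits at offset 19; the function and macro names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of h_check_xor, derived from the field sizes listed above:
 * 8 + 4 + 2 + 2 + 1 + 1 + 1 = 19 (assumes a packed layout). */
#define OMFS_XOR_OFFSET 19

/* XOR together the header bytes that precede h_check_xor. */
uint8_t omfs_header_xor(const uint8_t *hdr)
{
    uint8_t x = 0;

    assert(hdr != NULL);
    for (size_t i = 0; i < OMFS_XOR_OFFSET; i++)
        x ^= hdr[i];
    return x;
}

/* Returns 1 if the stored h_check_xor matches the computed value. */
int omfs_header_xor_ok(const uint8_t *hdr)
{
    return omfs_header_xor(hdr) == hdr[OMFS_XOR_OFFSET];
}
```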
|
||||||
|
|
||||||
|
Files and directories are both represented by omfs_inode:
|
||||||
|
|
||||||
|
struct omfs_inode {
|
||||||
|
struct omfs_header i_head; /* header */
|
||||||
|
__be64 i_parent; /* parent containing this inode */
|
||||||
|
__be64 i_sibling; /* next inode in hash bucket */
|
||||||
|
__be64 i_ctime; /* ctime, in milliseconds */
|
||||||
|
char i_fill1[35];
|
||||||
|
char i_type; /* OMFS_[DIR,FILE] */
|
||||||
|
__be32 i_fill2;
|
||||||
|
char i_fill3[64];
|
||||||
|
char i_name[OMFS_NAMELEN]; /* filename */
|
||||||
|
__be64 i_size; /* size of file, in bytes */
|
||||||
|
};
|
||||||
|
|
||||||
|
Directories in OMFS are implemented as a large hash table. Filenames are
|
||||||
|
hashed then prepended into the bucket list beginning at OMFS_DIR_START.
|
||||||
|
Lookup requires hashing the filename, then seeking across i_sibling pointers
|
||||||
|
until a match is found on i_name. Empty buckets are represented by block
|
||||||
|
pointers with all-1s (~0).
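The lookup described above can be sketched with an in-memory model. Everything here is a stand-in: the hash function is a placeholder (the real OMFS hash is not described in this text), the flat array replaces reading blocks from disk, and the sketch assumes a sibling chain ends with the same all-1s value that marks an empty bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EMPTY_BLOCK (~(uint64_t)0) /* all-1s: empty bucket or end of chain */
#define NBUCKETS 4                 /* hypothetical table size */

struct toy_inode {
    uint64_t block;   /* block number of this inode */
    uint64_t sibling; /* next inode in the same bucket */
    const char *name; /* stands in for i_name */
};

/* Placeholder hash, not the OMFS one. */
unsigned toy_hash(const char *name)
{
    unsigned h = 0;

    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % NBUCKETS;
}

/* Find an inode by block number in a flat array standing in for the disk. */
const struct toy_inode *toy_read(const struct toy_inode *disk, size_t n,
                                 uint64_t block)
{
    for (size_t i = 0; i < n; i++)
        if (disk[i].block == block)
            return &disk[i];
    return NULL;
}

/* Hash the name, then follow sibling pointers until the name matches. */
const struct toy_inode *toy_lookup(const uint64_t *buckets,
                                   const struct toy_inode *disk, size_t n,
                                   const char *name)
{
    uint64_t b;

    assert(buckets != NULL && disk != NULL && name != NULL);
    b = buckets[toy_hash(name)];
    while (b != EMPTY_BLOCK) {
        const struct toy_inode *ino = toy_read(disk, n, b);

        if (ino == NULL)
            return NULL; /* dangling pointer: treat as not found */
        if (strcmp(ino->name, name) == 0)
            return ino;
        b = ino->sibling;
    }
    return NULL;
}
```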
|
||||||
|
|
||||||
|
A file is an omfs_inode structure followed by an extent table beginning at
|
||||||
|
OMFS_EXTENT_START:
|
||||||
|
|
||||||
|
struct omfs_extent_entry {
|
||||||
|
__be64 e_cluster; /* start location of a set of blocks */
|
||||||
|
__be64 e_blocks; /* number of blocks after e_cluster */
|
||||||
|
};
|
||||||
|
|
||||||
|
struct omfs_extent {
|
||||||
|
__be64 e_next; /* next extent table location */
|
||||||
|
__be32 e_extent_count; /* total # extents in this table */
|
||||||
|
__be32 e_fill;
|
||||||
|
struct omfs_extent_entry e_entry; /* start of extent entries */
|
||||||
|
};
|
||||||
|
|
||||||
|
Each extent holds the block offset followed by number of blocks allocated to
|
||||||
|
the extent. The final extent in each table is a terminator with e_cluster
|
||||||
|
being ~0 and e_blocks being ones'-complement of the total number of blocks
|
||||||
|
in the table.
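The terminator rule can be checked as follows. This sketch assumes the entries have already been converted from big-endian to host byte order, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct extent_entry { /* host-endian stand-in for omfs_extent_entry */
    uint64_t cluster;
    uint64_t blocks;
};

/* Returns 1 if the last of `count` entries is a valid terminator for
 * the entries before it, 0 otherwise. */
int extent_table_ok(const struct extent_entry *e, size_t count)
{
    uint64_t total = 0;

    assert(e != NULL);
    if (count == 0)
        return 0;
    /* Sum the block counts of every entry except the terminator. */
    for (size_t i = 0; i + 1 < count; i++)
        total += e[i].blocks;
    return e[count - 1].cluster == ~(uint64_t)0 &&
           e[count - 1].blocks == ~total;
}
```

For a table holding extents of 8 and 4 blocks, the terminator would carry e_cluster = ~0 and e_blocks = ~12.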
|
||||||
|
|
||||||
|
If this table overflows, a continuation inode is written and pointed to by
|
||||||
|
e_next. These have a header but lack the rest of the inode structure.
|
||||||
|
|
|
@ -296,6 +296,7 @@ Table 1-4: Kernel info in /proc
|
||||||
uptime System uptime
|
uptime System uptime
|
||||||
version Kernel version
|
version Kernel version
|
||||||
video bttv info of video resources (2.4)
|
video bttv info of video resources (2.4)
|
||||||
|
vmallocinfo Show vmalloced areas
|
||||||
..............................................................................
|
..............................................................................
|
||||||
|
|
||||||
You can, for example, check which interrupts are currently in use and what
|
You can, for example, check which interrupts are currently in use and what
|
||||||
|
@ -380,28 +381,35 @@ i386 and x86_64 platforms support the new IRQ vector displays.
|
||||||
Of some interest is the introduction of the /proc/irq directory to 2.4.
|
Of some interest is the introduction of the /proc/irq directory to 2.4.
|
||||||
It could be used to set IRQ to CPU affinity, this means that you can "hook" an
|
It could be used to set IRQ to CPU affinity, this means that you can "hook" an
|
||||||
IRQ to only one CPU, or to exclude a CPU from handling IRQs. The contents of the
|
IRQ to only one CPU, or to exclude a CPU from handling IRQs. The contents of the
|
||||||
irq subdir is one subdir for each IRQ, and one file; prof_cpu_mask
|
irq subdir is one subdir for each IRQ, and two files; default_smp_affinity and
|
||||||
|
prof_cpu_mask.
|
||||||
|
|
||||||
For example
|
For example
|
||||||
> ls /proc/irq/
|
> ls /proc/irq/
|
||||||
0 10 12 14 16 18 2 4 6 8 prof_cpu_mask
|
0 10 12 14 16 18 2 4 6 8 prof_cpu_mask
|
||||||
1 11 13 15 17 19 3 5 7 9
|
1 11 13 15 17 19 3 5 7 9 default_smp_affinity
|
||||||
> ls /proc/irq/0/
|
> ls /proc/irq/0/
|
||||||
smp_affinity
|
smp_affinity
|
||||||
|
|
||||||
The contents of the prof_cpu_mask file and each smp_affinity file for each IRQ
|
smp_affinity is a bitmask, in which you can specify which CPUs can handle the
|
||||||
is the same by default:
|
IRQ, you can set it by doing:
|
||||||
|
|
||||||
> cat /proc/irq/0/smp_affinity
|
> echo 1 > /proc/irq/10/smp_affinity
|
||||||
|
|
||||||
|
This means that only the first CPU will handle the IRQ, but you can also echo
|
||||||
|
5 which means that only the first and third CPU can handle the IRQ.
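To make the bitmask arithmetic concrete, here is a small sketch that builds and tests such a mask. CPU numbers are zero-based bit positions, so the value 5 (binary 101) has bits 0 and 2 set. The function names are invented for this example, and the mask is limited to 32 CPUs to match the ffffffff default shown in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build an smp_affinity-style mask from zero-based CPU indices. */
uint32_t cpu_mask(const unsigned *cpus, unsigned n)
{
    uint32_t mask = 0;

    assert(cpus != NULL || n == 0);
    for (unsigned i = 0; i < n; i++)
        if (cpus[i] < 32)
            mask |= (uint32_t)1 << cpus[i];
    return mask;
}

/* Returns 1 if zero-based CPU `cpu` is set in `mask`. */
int cpu_in_mask(uint32_t mask, unsigned cpu)
{
    return cpu < 32 && ((mask >> cpu) & 1u);
}
```

The resulting value would be written to the smp_affinity file in hexadecimal.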
|
||||||
|
|
||||||
|
The contents of each smp_affinity file is the same by default:
|
||||||
|
|
||||||
|
> cat /proc/irq/0/smp_affinity
|
||||||
ffffffff
|
ffffffff
|
||||||
|
|
||||||
It's a bitmask, in which you can specify which CPUs can handle the IRQ, you can
|
The default_smp_affinity mask applies to all non-active IRQs, which are the
|
||||||
set it by doing:
|
IRQs which have not yet been allocated/activated, and hence which lack a
|
||||||
|
/proc/irq/[0-9]* directory.
|
||||||
|
|
||||||
> echo 1 > /proc/irq/prof_cpu_mask
|
prof_cpu_mask specifies which CPUs are to be profiled by the system wide
|
||||||
|
profiler. Default value is ffffffff (all cpus).
|
||||||
This means that only the first CPU will handle the IRQ, but you can also echo 5
|
|
||||||
which means that only the first and third CPU can handle the IRQ.
|
|
||||||
|
|
||||||
The way IRQs are routed is handled by the IO-APIC, and it's Round Robin
|
The way IRQs are routed is handled by the IO-APIC, and it's Round Robin
|
||||||
between all the CPUs which are allowed to handle it. As usual the kernel has
|
between all the CPUs which are allowed to handle it. As usual the kernel has
|
||||||
|
@ -550,6 +558,49 @@ VmallocTotal: total size of vmalloc memory area
|
||||||
VmallocUsed: amount of vmalloc area which is used
|
VmallocUsed: amount of vmalloc area which is used
|
||||||
VmallocChunk: largest contiguous block of vmalloc area which is free
|
VmallocChunk: largest contiguous block of vmalloc area which is free
|
||||||
|
|
||||||
|
..............................................................................
|
||||||
|
|
||||||
|
vmallocinfo:
|
||||||
|
|
||||||
|
Provides information about vmalloced/vmapped areas. One line per area,
|
||||||
|
containing the virtual address range of the area, size in bytes,
|
||||||
|
caller information of the creator, and optional information depending
|
||||||
|
on the kind of area:
|
||||||
|
|
||||||
|
pages=nr number of pages
|
||||||
|
phys=addr if a physical address was specified
|
||||||
|
ioremap I/O mapping (ioremap() and friends)
|
||||||
|
vmalloc vmalloc() area
|
||||||
|
vmap vmap()ed pages
|
||||||
|
user VM_USERMAP area
|
||||||
|
vpages buffer for page pointers was vmalloced (huge area)
|
||||||
|
N<node>=nr (Only on NUMA kernels)
|
||||||
|
Number of pages allocated on memory node <node>
|
||||||
|
|
||||||
|
> cat /proc/vmallocinfo
|
||||||
|
0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204 ...
|
||||||
|
/0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128
|
||||||
|
0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204 ...
|
||||||
|
/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
|
||||||
|
0xffffc20000302000-0xffffc20000304000 8192 acpi_tb_verify_table+0x21/0x4f...
|
||||||
|
phys=7fee8000 ioremap
|
||||||
|
0xffffc20000304000-0xffffc20000307000 12288 acpi_tb_verify_table+0x21/0x4f...
|
||||||
|
phys=7fee7000 ioremap
|
||||||
|
0xffffc2000031d000-0xffffc2000031f000 8192 init_vdso_vars+0x112/0x210
|
||||||
|
0xffffc2000031f000-0xffffc2000032b000 49152 cramfs_uncompress_init+0x2e ...
|
||||||
|
/0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
|
||||||
|
0xffffc2000033a000-0xffffc2000033d000 12288 sys_swapon+0x640/0xac0 ...
|
||||||
|
pages=2 vmalloc N1=2
|
||||||
|
0xffffc20000347000-0xffffc2000034c000 20480 xt_alloc_table_info+0xfe ...
|
||||||
|
/0x130 [x_tables] pages=4 vmalloc N0=4
|
||||||
|
0xffffffffa0000000-0xffffffffa000f000 61440 sys_init_module+0xc27/0x1d00 ...
|
||||||
|
pages=14 vmalloc N2=14
|
||||||
|
0xffffffffa000f000-0xffffffffa0014000 20480 sys_init_module+0xc27/0x1d00 ...
|
||||||
|
pages=4 vmalloc N1=4
|
||||||
|
0xffffffffa0014000-0xffffffffa0017000 12288 sys_init_module+0xc27/0x1d00 ...
|
||||||
|
pages=2 vmalloc N1=2
|
||||||
|
0xffffffffa0017000-0xffffffffa0022000 45056 sys_init_module+0xc27/0x1d00 ...
|
||||||
|
pages=10 vmalloc N0=10
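A reader of this file might parse the leading fields of each line as sketched below. The struct and function names are hypothetical, only the address range and size are handled, and the consistency check (size equals end minus start) is an observation from the sample lines above, not a documented guarantee.

```c
#include <assert.h>
#include <stdio.h>

struct vm_area_line {
    unsigned long long start, end, size;
};

/* Parse the leading "0xSTART-0xEND SIZE" fields of one line.
 * Returns 0 on success, -1 if the fields are missing or inconsistent. */
int parse_vmallocinfo(const char *line, struct vm_area_line *out)
{
    assert(line != NULL && out != NULL);
    /* %llx accepts an optional 0x prefix; a space matches any whitespace. */
    if (sscanf(line, "%llx-%llx %llu", &out->start, &out->end,
               &out->size) != 3)
        return -1;
    return (out->end - out->start == out->size) ? 0 : -1;
}
```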
|
||||||
|
|
||||||
1.3 IDE devices in /proc/ide
|
1.3 IDE devices in /proc/ide
|
||||||
----------------------------
|
----------------------------
|
||||||
|
@ -880,7 +931,7 @@ group_prealloc max_to_scan mb_groups mb_history min_to_scan order2_req
|
||||||
stats stream_req
|
stats stream_req
|
||||||
|
|
||||||
mb_groups:
|
mb_groups:
|
||||||
This file gives the details of mutiblock allocator buddy cache of free blocks
|
This file gives the details of multiblock allocator buddy cache of free blocks
|
||||||
|
|
||||||
mb_history:
|
mb_history:
|
||||||
Multiblock allocation history.
|
Multiblock allocation history.
|
||||||
|
@ -1423,7 +1474,7 @@ used because pages_free(1355) is smaller than watermark + protection[2]
|
||||||
normal page requirement. If requirement is DMA zone(index=0), protection[0]
|
normal page requirement. If requirement is DMA zone(index=0), protection[0]
|
||||||
(=0) is used.
|
(=0) is used.
|
||||||
|
|
||||||
zone[i]'s protection[j] is calculated by following exprssion.
|
zone[i]'s protection[j] is calculated by following expression.
|
||||||
|
|
||||||
(i < j):
|
(i < j):
|
||||||
zone[i]->protection[j]
|
zone[i]->protection[j]
|
||||||
|
|
|
@ -294,6 +294,16 @@ user-defined data with a channel, and is immediately available
|
||||||
(including in create_buf_file()) via chan->private_data or
|
(including in create_buf_file()) via chan->private_data or
|
||||||
buf->chan->private_data.
|
buf->chan->private_data.
|
||||||
|
|
||||||
|
Buffer-only channels
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
These channels have no files associated and can be created with
|
||||||
|
relay_open(NULL, NULL, ...). Such channels are useful in scenarios such
|
||||||
|
as when doing early tracing in the kernel, before the VFS is up. In these
|
||||||
|
cases, one may open a buffer-only channel and then call
|
||||||
|
relay_late_setup_files() when the kernel is ready to handle files,
|
||||||
|
to expose the buffered data to the userspace.
|
||||||
|
|
||||||
Channel 'modes'
|
Channel 'modes'
|
||||||
---------------
|
---------------
|
||||||
|
|
||||||
|
|
|
@ -248,6 +248,7 @@ The top level sysfs directory looks like:
|
||||||
block/
|
block/
|
||||||
bus/
|
bus/
|
||||||
class/
|
class/
|
||||||
|
dev/
|
||||||
devices/
|
devices/
|
||||||
firmware/
|
firmware/
|
||||||
net/
|
net/
|
||||||
|
@ -274,6 +275,11 @@ fs/ contains a directory for some filesystems. Currently each
|
||||||
filesystem wanting to export attributes must create its own hierarchy
|
filesystem wanting to export attributes must create its own hierarchy
|
||||||
below fs/ (see ./fuse.txt for an example).
|
below fs/ (see ./fuse.txt for an example).
|
||||||
|
|
||||||
|
dev/ contains two directories char/ and block/. Inside these two
|
||||||
|
directories there are symlinks named <major>:<minor>. These symlinks
|
||||||
|
point to the sysfs directory for the given device. /sys/dev provides a
|
||||||
|
quick way to lookup the sysfs interface for a device from the result of
|
||||||
|
a stat(2) operation.
|
||||||
|
|
||||||
More information on driver-model specific features can be found in
|
More information on driver-model specific features can be found in
|
||||||
Documentation/driver-model/.
|
Documentation/driver-model/.
|
||||||
|
|
164
Documentation/filesystems/ubifs.txt
Normal file
|
@ -0,0 +1,164 @@
|
||||||
|
Introduction
|
||||||
|
=============
|
||||||
|
|
||||||
|
UBIFS file-system stands for UBI File System. UBI stands for "Unsorted
|
||||||
|
Block Images". UBIFS is a flash file system, which means it is designed
|
||||||
|
to work with flash devices. It is important to understand that UBIFS
|
||||||
|
is completely different to any traditional file-system in Linux, like
|
||||||
|
Ext2, XFS, JFS, etc. UBIFS represents a separate class of file-systems
|
||||||
|
which work with MTD devices, not block devices. The other Linux
|
||||||
|
file-system of this class is JFFS2.
|
||||||
|
|
||||||
|
To make it more clear, here is a small comparison of MTD devices and
|
||||||
|
block devices.
|
||||||
|
|
||||||
|
1 MTD devices represent flash devices and they consist of eraseblocks of
|
||||||
|
rather large size, typically about 128KiB. Block devices consist of
|
||||||
|
small blocks, typically 512 bytes.
|
||||||
|
2 MTD devices support 3 main operations - read from some offset within an
|
||||||
|
eraseblock, write to some offset within an eraseblock, and erase a whole
|
||||||
|
eraseblock. Block devices support 2 main operations - read a whole
|
||||||
|
block and write a whole block.
|
||||||
|
3 The whole eraseblock has to be erased before it becomes possible to
|
||||||
|
re-write its contents. Blocks may be just re-written.
|
||||||
|
4 Eraseblocks become worn out after some number of erase cycles -
|
||||||
|
typically 100K-1G for SLC NAND and NOR flashes, and 1K-10K for MLC
|
||||||
|
NAND flashes. Blocks do not have the wear-out property.
|
||||||
|
5 Eraseblocks may become bad (only on NAND flashes) and software should
|
||||||
|
deal with this. Blocks on hard drives typically do not become bad,
|
||||||
|
because hardware has mechanisms to substitute bad blocks, at least in
|
||||||
|
modern LBA disks.
|
||||||
|
|
||||||
|
It should be quite obvious why UBIFS is very different to traditional
|
||||||
|
file-systems.
|
||||||
|
|
||||||
|
UBIFS works on top of UBI. UBI is a separate software layer which may be
|
||||||
|
found in drivers/mtd/ubi. UBI is basically a volume management and
|
||||||
|
wear-leveling layer. It provides so-called UBI volumes, which are a higher
|
||||||
|
level abstraction than an MTD device. The programming model of UBI devices
|
||||||
|
is very similar to MTD devices - they still consist of large eraseblocks,
|
||||||
|
they have read/write/erase operations, but UBI devices are devoid of
|
||||||
|
limitations like wear and bad blocks (items 4 and 5 in the above list).
|
||||||
|
|
||||||
|
In a sense, UBIFS is a next generation of JFFS2 file-system, but it is
|
||||||
|
very different and incompatible to JFFS2. The following are the main
|
||||||
|
differences.
|
||||||
|
|
||||||
|
* JFFS2 works on top of MTD devices, UBIFS depends on UBI and works on
|
||||||
|
top of UBI volumes.
|
||||||
|
* JFFS2 does not have on-media index and has to build it while mounting,
|
||||||
|
which requires full media scan. UBIFS maintains the FS indexing
|
||||||
|
information on the flash media and does not require full media scan,
|
||||||
|
so it mounts many times faster than JFFS2.
|
||||||
|
* JFFS2 is a write-through file-system, while UBIFS supports write-back,
|
||||||
|
which makes UBIFS much faster on writes.
|
||||||
|
|
||||||
|
Similarly to JFFS2, UBIFS supports on-the-fly compression which makes
|
||||||
|
it possible to fit quite a lot of data to the flash.
|
||||||
|
|
||||||
|
Similarly to JFFS2, UBIFS is tolerant of unclean reboots and power-cuts.
|
||||||
|
It does not need tools like fsck.ext2. UBIFS automatically replays its
|
||||||
|
journal and recovers from crashes, ensuring that the on-flash data
|
||||||
|
structures are consistent.
|
||||||
|
|
||||||
|
UBIFS scales logarithmically (most of the data structures it uses are
|
||||||
|
trees), so the mount time and memory consumption do not linearly depend
|
||||||
|
on the flash size, like in case of JFFS2. This is because UBIFS
|
||||||
|
maintains the FS index on the flash media. However, UBIFS depends on
|
||||||
|
UBI, which scales linearly. So overall UBI/UBIFS stack scales linearly.
|
||||||
|
Nevertheless, UBI/UBIFS scales considerably better than JFFS2.
|
||||||
|
|
||||||
|
The authors of UBIFS believe that it is possible to develop UBI2 which
|
||||||
|
would scale logarithmically as well. UBI2 would support the same API as UBI,
|
||||||
|
but it would be binary incompatible to UBI. So UBIFS would not need to be
|
||||||
|
changed to use UBI2.
|
||||||
|
|
||||||
|
|
||||||
|
Mount options
|
||||||
|
=============
|
||||||
|
|
||||||
|
(*) == default.
|
||||||
|
|
||||||
|
norm_unmount (*) commit on unmount; the journal is committed
|
||||||
|
when the file-system is unmounted so that the
|
||||||
|
next mount does not have to replay the journal
|
||||||
|
and it becomes very fast;
|
||||||
|
fast_unmount do not commit on unmount; this option makes
|
||||||
|
unmount faster, but the next mount slower
|
||||||
|
because of the need to replay the journal.
|
||||||
|
|
||||||
|
|
||||||
|
Quick usage instructions
|
||||||
|
========================
|
||||||
|
|
||||||
|
The UBI volume to mount is specified using "ubiX_Y" or "ubiX:NAME" syntax,
|
||||||
|
where "X" is UBI device number, "Y" is UBI volume number, and "NAME" is
|
||||||
|
UBI volume name.
|
||||||
|
|
||||||
|
Mount volume 0 on UBI device 0 to /mnt/ubifs:
|
||||||
|
$ mount -t ubifs ubi0_0 /mnt/ubifs
|
||||||
|
|
||||||
|
Mount "rootfs" volume of UBI device 0 to /mnt/ubifs ("rootfs" is volume
|
||||||
|
name):
|
||||||
|
$ mount -t ubifs ubi0:rootfs /mnt/ubifs
|
||||||
|
|
||||||
|
The following is an example of the kernel boot arguments to attach mtd0
|
||||||
|
to UBI and mount volume "rootfs":
|
||||||
|
ubi.mtd=0 root=ubi0:rootfs rootfstype=ubifs
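The two volume specification forms can be told apart with a small parser. This is a sketch only: the struct, the function name and the 128-byte name buffer are assumptions of this example, and volume names containing whitespace are not handled.

```c
#include <assert.h>
#include <stdio.h>

struct ubi_spec {
    int dev;        /* X: UBI device number */
    int vol;        /* Y: volume number, or -1 when a name was given */
    char name[128]; /* NAME, or empty when a number was given */
};

/* Parse "ubiX_Y" or "ubiX:NAME". Returns 0 on success, -1 otherwise. */
int parse_ubi_spec(const char *s, struct ubi_spec *out)
{
    int used = 0;

    assert(s != NULL && out != NULL);
    out->dev = -1;
    out->vol = -1;
    out->name[0] = '\0';

    /* Numeric form: %n records how much input was consumed so that
     * trailing garbage can be rejected. */
    if (sscanf(s, "ubi%d_%d%n", &out->dev, &out->vol, &used) == 2 &&
        s[used] == '\0' && out->dev >= 0 && out->vol >= 0)
        return 0;

    /* Named form. */
    out->vol = -1;
    if (sscanf(s, "ubi%d:%127s%n", &out->dev, out->name, &used) == 2 &&
        s[used] == '\0' && out->dev >= 0)
        return 0;

    return -1;
}
```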
|
||||||
|
|
||||||
|
|
||||||
|
Module Parameters for Debugging
|
||||||
|
===============================
|
||||||
|
|
||||||
|
When UBIFS has been compiled with debugging enabled, there are 3 module
|
||||||
|
parameters that are available to control aspects of testing and debugging.
|
||||||
|
The parameters are unsigned integers where each bit controls an option.
|
||||||
|
The parameters are:
|
||||||
|
|
||||||
|
debug_msgs Selects which debug messages to display, as follows:
|
||||||
|
|
||||||
|
Message Type Flag value
|
||||||
|
|
||||||
|
General messages 1
|
||||||
|
Journal messages 2
|
||||||
|
Mount messages 4
|
||||||
|
Commit messages 8
|
||||||
|
LEB search messages 16
|
||||||
|
Budgeting messages 32
|
||||||
|
Garbage collection messages 64
|
||||||
|
Tree Node Cache (TNC) messages 128
|
||||||
|
LEB properties (lprops) messages 256
|
||||||
|
Input/output messages 512
|
||||||
|
Log messages 1024
|
||||||
|
Scan messages 2048
|
||||||
|
Recovery messages 4096
|
||||||
|
|
||||||
|
debug_chks Selects extra checks that UBIFS can do while running:
|
||||||
|
|
||||||
|
Check Flag value
|
||||||
|
|
||||||
|
General checks 1
|
||||||
|
Check Tree Node Cache (TNC) 2
|
||||||
|
Check indexing tree size 4
|
||||||
|
Check orphan area 8
|
||||||
|
Check old indexing tree 16
|
||||||
|
Check LEB properties (lprops) 32
|
||||||
|
Check leaf nodes and inodes 64
|
||||||
|
|
||||||
|
debug_tsts Selects a mode of testing, as follows:
|
||||||
|
|
||||||
|
Test mode Flag value
|
||||||
|
|
||||||
|
Force in-the-gaps method 2
|
||||||
|
Failure mode for recovery testing 4
|
||||||
|
|
||||||
|
For example, set debug_msgs to 5 to display General messages and Mount
|
||||||
|
messages.
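The flag arithmetic for debug_msgs can be spelled out in C. The numeric values are copied from the table above; the identifier names are invented for this sketch and are not UBIFS symbols.

```c
#include <assert.h>

/* Values from the debug_msgs table; names are hypothetical. */
enum ubifs_dbg_msg {
    MSG_GENERAL = 1,
    MSG_JOURNAL = 2,
    MSG_MOUNT = 4,
    MSG_COMMIT = 8,
    MSG_LEB_SEARCH = 16,
    MSG_BUDGET = 32,
    MSG_GC = 64,
    MSG_TNC = 128,
    MSG_LPROPS = 256,
    MSG_IO = 512,
    MSG_LOG = 1024,
    MSG_SCAN = 2048,
    MSG_RECOVERY = 4096
};

/* Returns 1 if `flag` is enabled in a debug_msgs value. */
int dbg_msg_enabled(unsigned value, enum ubifs_dbg_msg flag)
{
    assert(flag != 0);
    return (value & (unsigned)flag) != 0;
}
```

Here MSG_GENERAL | MSG_MOUNT evaluates to 5, matching the example in the text.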
|
||||||
|
|
||||||
|
|
||||||
|
References
|
||||||
|
==========
|
||||||
|
|
||||||
|
UBIFS documentation and FAQ/HOWTO at the MTD web site:
|
||||||
|
http://www.linux-mtd.infradead.org/doc/ubifs.html
|
||||||
|
http://www.linux-mtd.infradead.org/faq/ubifs.html
|
|
@ -96,6 +96,14 @@ shortname=lower|win95|winnt|mixed
|
||||||
emulate the Windows 95 rule for create.
|
emulate the Windows 95 rule for create.
|
||||||
Default setting is `lower'.
|
Default setting is `lower'.
|
||||||
|
|
||||||
|
tz=UTC -- Interpret timestamps as UTC rather than local time.
|
||||||
|
This option disables the conversion of timestamps
|
||||||
|
between local time (as used by Windows on FAT) and UTC
|
||||||
|
(which Linux uses internally). This is particularly
|
||||||
|
useful when mounting devices (like digital cameras)
|
||||||
|
that are set to UTC in order to avoid the pitfalls of
|
||||||
|
local time.
|
||||||
|
|
||||||
<bool>: 0,1,yes,no,true,false
|
<bool>: 0,1,yes,no,true,false
|
||||||
|
|
||||||
TODO
|
TODO
|
||||||
|
|
|
@ -143,7 +143,7 @@ struct file_system_type {
|
||||||
|
|
||||||
The get_sb() method has the following arguments:
|
The get_sb() method has the following arguments:
|
||||||
|
|
||||||
struct file_system_type *fs_type: decribes the filesystem, partly initialized
|
struct file_system_type *fs_type: describes the filesystem, partly initialized
|
||||||
by the specific filesystem code
|
by the specific filesystem code
|
||||||
|
|
||||||
int flags: mount flags
|
int flags: mount flags
|
||||||
|
@ -895,9 +895,9 @@ struct dentry_operations {
|
||||||
iput() yourself
|
iput() yourself
|
||||||
|
|
||||||
d_dname: called when the pathname of a dentry should be generated.
|
d_dname: called when the pathname of a dentry should be generated.
|
||||||
Usefull for some pseudo filesystems (sockfs, pipefs, ...) to delay
|
Useful for some pseudo filesystems (sockfs, pipefs, ...) to delay
|
||||||
pathname generation. (Instead of doing it when dentry is created,
|
pathname generation. (Instead of doing it when dentry is created,
|
||||||
its done only when the path is needed.). Real filesystems probably
|
it's done only when the path is needed.). Real filesystems probably
|
||||||
don't want to use it, because their dentries are present in global
|
don't want to use it, because their dentries are present in global
|
||||||
dcache hash, so their hash should be an invariant. As no lock is
|
dcache hash, so their hash should be an invariant. As no lock is
|
||||||
held, d_dname() should not try to modify the dentry itself, unless
|
held, d_dname() should not try to modify the dentry itself, unless
|
||||||
|
|
1360
Documentation/ftrace.txt
Normal file
File diff suppressed because it is too large
|
@ -347,15 +347,12 @@ necessarily be nonportable.
|
||||||
Dynamic definition of GPIOs is not currently standard; for example, as
|
Dynamic definition of GPIOs is not currently standard; for example, as
|
||||||
a side effect of configuring an add-on board with some GPIO expanders.
|
a side effect of configuring an add-on board with some GPIO expanders.
|
||||||
|
|
||||||
These calls are purely for kernel space, but a userspace API could be built
|
|
||||||
on top of them.
|
|
||||||
|
|
||||||
|
|
||||||
GPIO implementor's framework (OPTIONAL)
|
GPIO implementor's framework (OPTIONAL)
|
||||||
=======================================
|
=======================================
|
||||||
As noted earlier, there is an optional implementation framework making it
|
As noted earlier, there is an optional implementation framework making it
|
||||||
easier for platforms to support different kinds of GPIO controller using
|
easier for platforms to support different kinds of GPIO controller using
|
||||||
the same programming interface.
|
the same programming interface. This framework is called "gpiolib".
|
||||||
|
|
||||||
As a debugging aid, if debugfs is available a /sys/kernel/debug/gpio file
|
As a debugging aid, if debugfs is available a /sys/kernel/debug/gpio file
|
||||||
will be found there. That will list all the controllers registered through
|
will be found there. That will list all the controllers registered through
|
||||||
|
@ -392,11 +389,21 @@ either NULL or the label associated with that GPIO when it was requested.
|
||||||
|
|
||||||
Platform Support
|
Platform Support
|
||||||
----------------
|
----------------
|
||||||
To support this framework, a platform's Kconfig will "select HAVE_GPIO_LIB"
|
To support this framework, a platform's Kconfig will "select" either
|
||||||
|
ARCH_REQUIRE_GPIOLIB or ARCH_WANT_OPTIONAL_GPIOLIB
|
||||||
and arrange that its <asm/gpio.h> includes <asm-generic/gpio.h> and defines
|
and arrange that its <asm/gpio.h> includes <asm-generic/gpio.h> and defines
|
||||||
three functions: gpio_get_value(), gpio_set_value(), and gpio_cansleep().
|
three functions: gpio_get_value(), gpio_set_value(), and gpio_cansleep().
|
||||||
They may also want to provide a custom value for ARCH_NR_GPIOS.
|
They may also want to provide a custom value for ARCH_NR_GPIOS.
|
||||||
|
|
||||||
|
ARCH_REQUIRE_GPIOLIB means that the gpio-lib code will always get compiled
|
||||||
|
into the kernel on that architecture.
|
||||||
|
|
||||||
|
ARCH_WANT_OPTIONAL_GPIOLIB means the gpio-lib code defaults to off and the user
|
||||||
|
can enable it and build it into the kernel optionally.
|
||||||
|
|
||||||
|
If neither of these options is selected, the platform does not support
|
||||||
|
GPIOs through GPIO-lib and the code cannot be enabled by the user.
|
||||||
|
|
||||||
Trivial implementations of those functions can directly use framework
|
Trivial implementations of those functions can directly use framework
|
||||||
code, which always dispatches through the gpio_chip:
|
code, which always dispatches through the gpio_chip:
|
||||||
|
|
||||||
|
@ -439,4 +446,120 @@ becomes available. That may mean the device should not be registered until
|
||||||
calls for that GPIO can work. One way to address such dependencies is for
|
calls for that GPIO can work. One way to address such dependencies is for
|
||||||
such gpio_chip controllers to provide setup() and teardown() callbacks to
|
such gpio_chip controllers to provide setup() and teardown() callbacks to
|
||||||
board specific code; those board specific callbacks would register devices
|
board specific code; those board specific callbacks would register devices
|
||||||
once all the necessary resources are available.
|
once all the necessary resources are available, and remove them later when
|
||||||
|
the GPIO controller device becomes unavailable.
|
||||||
|
|
||||||
|
|
||||||
|
Sysfs Interface for Userspace (OPTIONAL)
|
||||||
|
========================================
|
||||||
|
Platforms which use the "gpiolib" implementors framework may choose to
|
||||||
|
configure a sysfs user interface to GPIOs. This is different from the
|
||||||
|
debugfs interface, since it provides control over GPIO direction and
|
||||||
|
value instead of just showing a gpio state summary. Plus, it could be
|
||||||
|
present on production systems without debugging support.
|
||||||
|
|
||||||
|
Given appropriate hardware documentation for the system, userspace could
|
||||||
|
know for example that GPIO #23 controls the write protect line used to
|
||||||
|
protect boot loader segments in flash memory. System upgrade procedures
|
||||||
|
may need to temporarily remove that protection, first importing a GPIO,
|
||||||
|
then changing its output state, then updating the code before re-enabling
|
||||||
|
the write protection. In normal use, GPIO #23 would never be touched,
|
||||||
|
and the kernel would have no need to know about it.
|
||||||
|
|
||||||
|
Again depending on appropriate hardware documentation, on some systems
|
||||||
|
userspace GPIO can be used to determine system configuration data that
|
||||||
|
standard kernels won't know about. And for some tasks, simple userspace
|
||||||
|
GPIO drivers could be all that the system really needs.
|
||||||
|
|
||||||
|
Note that standard kernel drivers exist for common "LEDs and Buttons"
|
||||||
|
GPIO tasks: "leds-gpio" and "gpio_keys", respectively. Use those
|
||||||
|
instead of talking directly to the GPIOs; they integrate with kernel
|
||||||
|
frameworks better than your userspace code could.
|
||||||
|
|
||||||
|
|
||||||
|
Paths in Sysfs
|
||||||
|
--------------
|
||||||
|
There are three kinds of entry in /sys/class/gpio:
|
||||||
|
|
||||||
|
- Control interfaces used to get userspace control over GPIOs;
|
||||||
|
|
||||||
|
- GPIOs themselves; and
|
||||||
|
|
||||||
|
- GPIO controllers ("gpio_chip" instances).
|
||||||
|
|
||||||
|
That's in addition to standard files including the "device" symlink.
|
||||||
|
|
||||||
|
The control interfaces are write-only:
|
||||||
|
|
||||||
|
/sys/class/gpio/
|
||||||
|
|
||||||
|
"export" ... Userspace may ask the kernel to export control of
|
||||||
|
a GPIO to userspace by writing its number to this file.
|
||||||
|
|
||||||
|
Example: "echo 19 > export" will create a "gpio19" node
|
||||||
|
for GPIO #19, if that's not requested by kernel code.
|
||||||
|
|
||||||
|
"unexport" ... Reverses the effect of exporting to userspace.
|
||||||
|
|
||||||
|
Example: "echo 19 > unexport" will remove a "gpio19"
|
||||||
|
node exported using the "export" file.
|
||||||
|
|
||||||
|
GPIO signals have paths like /sys/class/gpio/gpio42/ (for GPIO #42)
|
||||||
|
and have the following read/write attributes:
|
||||||
|
|
||||||
|
/sys/class/gpio/gpioN/
|
||||||
|
|
||||||
|
"direction" ... reads as either "in" or "out". This value may
|
||||||
|
normally be written. Writing as "out" defaults to
|
||||||
|
initializing the value as low. To ensure glitch free
|
||||||
|
operation, values "low" and "high" may be written to
|
||||||
|
configure the GPIO as an output with that initial value.
|
||||||
|
|
||||||
|
Note that this attribute *will not exist* if the kernel
|
||||||
|
doesn't support changing the direction of a GPIO, or
|
||||||
|
it was exported by kernel code that didn't explicitly
|
||||||
|
allow userspace to reconfigure this GPIO's direction.
|
||||||
|
|
||||||
|
"value" ... reads as either 0 (low) or 1 (high). If the GPIO
|
||||||
|
is configured as an output, this value may be written;
|
||||||
|
any nonzero value is treated as high.
|
||||||
|
|
||||||
|
GPIO controllers have paths like /sys/class/gpio/gpiochip42/ (for the
|
||||||
|
controller implementing GPIOs starting at #42) and have the following
|
||||||
|
read-only attributes:
|
||||||
|
|
||||||
|
/sys/class/gpio/gpiochipN/
|
||||||
|
|
||||||
|
"base" ... same as N, the first GPIO managed by this chip
|
||||||
|
|
||||||
|
"label" ... provided for diagnostics (not always unique)
|
||||||
|
|
||||||
|
"ngpio" ... how many GPIOs this manages (N to N + ngpio - 1)
|
||||||
|
|
||||||
|
Board documentation should in most cases cover what GPIOs are used for
|
||||||
|
what purposes. However, those numbers are not always stable; GPIOs on
|
||||||
|
a daughtercard might be different depending on the base board being used,
|
||||||
|
or other cards in the stack. In such cases, you may need to use the
|
||||||
|
gpiochip nodes (possibly in conjunction with schematics) to determine
|
||||||
|
the correct GPIO number to use for a given signal.
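The sysfs paths described above can be driven from a small userspace
program. The following is a minimal sketch, assuming the GPIO is not
already requested by kernel code and that its "direction" attribute
exists; the helper names are hypothetical and error handling is kept
deliberately simple.

```c
#include <stdio.h>

/* Build "/sys/class/gpio/gpioN/<attr>" into buf.
 * Returns 0 on success, -1 if the buffer is too small. */
int gpio_attr_path(char *buf, size_t len, unsigned gpio, const char *attr)
{
	int n = snprintf(buf, len, "/sys/class/gpio/gpio%u/%s", gpio, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a short string to a sysfs file; returns 0 on success. */
int sysfs_write(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(val, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* Export a GPIO and configure it as an output that starts high.
 * Writing "high" to "direction" avoids a glitch on the line. */
int gpio_export_output_high(unsigned gpio)
{
	char path[64], num[16];

	snprintf(num, sizeof(num), "%u", gpio);
	if (sysfs_write("/sys/class/gpio/export", num))
		return -1;
	if (gpio_attr_path(path, sizeof(path), gpio, "direction"))
		return -1;
	return sysfs_write(path, "high");
}
```

For example, gpio_export_output_high(19) corresponds to "echo 19 > export"
followed by "echo high > gpio19/direction".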
|
||||||
|
|
||||||
|
|
||||||
|
Exporting from Kernel code
|
||||||
|
--------------------------
|
||||||
|
Kernel code can explicitly manage exports of GPIOs which have already been
|
||||||
|
requested using gpio_request():
|
||||||
|
|
||||||
|
/* export the GPIO to userspace */
|
||||||
|
int gpio_export(unsigned gpio, bool direction_may_change);
|
||||||
|
|
||||||
|
/* reverse gpio_export() */
|
||||||
|
void gpio_unexport(unsigned gpio);
|
||||||
|
|
||||||
|
After a kernel driver requests a GPIO, it may only be made available in
|
||||||
|
the sysfs interface by gpio_export(). The driver can control whether the
|
||||||
|
signal direction may change. This helps drivers prevent userspace code
|
||||||
|
from accidentally clobbering important system state.
|
||||||
|
|
||||||
|
This explicit exporting can help with debugging (by making some kinds
|
||||||
|
of experiments easier), or can provide an always-there interface that's
|
||||||
|
suitable for documenting as part of a board support package.
|
||||||
|
|
|
@ -2,17 +2,12 @@ Naming and data format standards for sysfs files
|
||||||
------------------------------------------------
|
------------------------------------------------
|
||||||
|
|
||||||
The libsensors library offers an interface to the raw sensors data
|
The libsensors library offers an interface to the raw sensors data
|
||||||
through the sysfs interface. See libsensors documentation and source for
|
through the sysfs interface. Since lm-sensors 3.0.0, libsensors is
|
||||||
further information. As of writing this document, libsensors
|
completely chip-independent. It assumes that all the kernel drivers
|
||||||
(from lm_sensors 2.8.3) is heavily chip-dependent. Adding or updating
|
implement the standard sysfs interface described in this document.
|
||||||
support for any given chip requires modifying the library's code.
|
This makes adding or updating support for any given chip very easy, as
|
||||||
This is because libsensors was written for the procfs interface
|
libsensors, and applications using it, do not need to be modified.
|
||||||
older kernel modules were using, which wasn't standardized enough.
|
This is a major improvement compared to lm-sensors 2.
|
||||||
Recent versions of libsensors (from lm_sensors 2.8.2 and later) have
|
|
||||||
support for the sysfs interface, though.
|
|
||||||
|
|
||||||
The new sysfs interface was designed to be as chip-independent as
|
|
||||||
possible.
|
|
||||||
|
|
||||||
Note that motherboards vary widely in the connections to sensor chips.
|
Note that motherboards vary widely in the connections to sensor chips.
|
||||||
There is no standard that ensures, for example, that the second
|
There is no standard that ensures, for example, that the second
|
||||||
|
@ -35,19 +30,17 @@ access this data in a simple and consistent way. That said, such programs
|
||||||
will have to implement conversion, labeling and hiding of inputs. For
|
will have to implement conversion, labeling and hiding of inputs. For
|
||||||
this reason, it is still not recommended to bypass the library.
|
this reason, it is still not recommended to bypass the library.
|
||||||
|
|
||||||
If you are developing a userspace application please send us feedback on
|
|
||||||
this standard.
|
|
||||||
|
|
||||||
Note that this standard isn't completely established yet, so it is subject
|
|
||||||
to changes. If you are writing a new hardware monitoring driver those
|
|
||||||
features can't seem to fit in this interface, please contact us with your
|
|
||||||
extension proposal. Keep in mind that backward compatibility must be
|
|
||||||
preserved.
|
|
||||||
|
|
||||||
Each chip gets its own directory in the sysfs /sys/devices tree. To
|
Each chip gets its own directory in the sysfs /sys/devices tree. To
|
||||||
find all sensor chips, it is easier to follow the device symlinks from
|
find all sensor chips, it is easier to follow the device symlinks from
|
||||||
/sys/class/hwmon/hwmon*.
|
/sys/class/hwmon/hwmon*.
|
||||||
|
|
||||||
|
Up to lm-sensors 3.0.0, libsensors looks for hardware monitoring attributes
|
||||||
|
in the "physical" device directory. Since lm-sensors 3.0.1, attributes found
|
||||||
|
in the hwmon "class" device directory are also supported. Complex drivers
|
||||||
|
(e.g. drivers for multifunction chips) may want to use this possibility to
|
||||||
|
avoid namespace pollution. The only drawback will be that older versions of
|
||||||
|
libsensors won't support the driver in question.
|
||||||
|
|
||||||
All sysfs values are fixed point numbers.
|
All sysfs values are fixed point numbers.
|
||||||
|
|
||||||
There is only one value per file, unlike the older /proc specification.
|
There is only one value per file, unlike the older /proc specification.
|
||||||
|
|
|
@ -1,47 +0,0 @@
|
||||||
Kernel driver i2c-i810
|
|
||||||
|
|
||||||
Supported adapters:
|
|
||||||
* Intel 82810, 82810-DC100, 82810E, and 82815 (GMCH)
|
|
||||||
* Intel 82845G (GMCH)
|
|
||||||
|
|
||||||
Authors:
|
|
||||||
Frodo Looijaard <frodol@dds.nl>,
|
|
||||||
Philip Edelbrock <phil@netroedge.com>,
|
|
||||||
Kyösti Mälkki <kmalkki@cc.hut.fi>,
|
|
||||||
Ralph Metzler <rjkm@thp.uni-koeln.de>,
|
|
||||||
Mark D. Studebaker <mdsxyz123@yahoo.com>
|
|
||||||
|
|
||||||
Main contact: Mark Studebaker <mdsxyz123@yahoo.com>
|
|
||||||
|
|
||||||
Description
|
|
||||||
-----------
|
|
||||||
|
|
||||||
WARNING: If you have an '810' or '815' motherboard, your standard I2C
|
|
||||||
temperature sensors are most likely on the 801's I2C bus. You want the
|
|
||||||
i2c-i801 driver for those, not this driver.
|
|
||||||
|
|
||||||
Now for the i2c-i810...
|
|
||||||
|
|
||||||
The GMCH chip contains two I2C interfaces.
|
|
||||||
|
|
||||||
The first interface is used for DDC (Data Display Channel) which is a
|
|
||||||
serial channel through the VGA monitor connector to a DDC-compliant
|
|
||||||
monitor. This interface is defined by the Video Electronics Standards
|
|
||||||
Association (VESA). The standards are available for purchase at
|
|
||||||
http://www.vesa.org .
|
|
||||||
|
|
||||||
The second interface is a general-purpose I2C bus. It may be connected to a
|
|
||||||
TV-out chip such as the BT869 or possibly to a digital flat-panel display.
|
|
||||||
|
|
||||||
Features
|
|
||||||
--------
|
|
||||||
|
|
||||||
Both busses use the i2c-algo-bit driver for 'bit banging'
|
|
||||||
and support for specific transactions is provided by i2c-algo-bit.
|
|
||||||
|
|
||||||
Issues
|
|
||||||
------
|
|
||||||
|
|
||||||
If you enable bus testing in i2c-algo-bit (insmod i2c-algo-bit bit_test=1),
|
|
||||||
the test may fail; if so, the i2c-i810 driver won't be inserted. However,
|
|
||||||
we think this has been fixed.
|
|
|
@ -1,23 +0,0 @@
|
||||||
Kernel driver i2c-prosavage
|
|
||||||
|
|
||||||
Supported adapters:
|
|
||||||
|
|
||||||
S3/VIA KM266/VT8375 aka ProSavage8
|
|
||||||
S3/VIA KM133/VT8365 aka Savage4
|
|
||||||
|
|
||||||
Author: Henk Vergonet <henk@god.dyndns.org>
|
|
||||||
|
|
||||||
Description
|
|
||||||
-----------
|
|
||||||
|
|
||||||
The Savage4 chips contain two I2C interfaces (aka a I2C 'master' or
|
|
||||||
'host').
|
|
||||||
|
|
||||||
The first interface is used for DDC (Data Display Channel) which is a
|
|
||||||
serial channel through the VGA monitor connector to a DDC-compliant
|
|
||||||
monitor. This interface is defined by the Video Electronics Standards
|
|
||||||
Association (VESA). The standards are available for purchase at
|
|
||||||
http://www.vesa.org . The second interface is a general-purpose I2C bus.
|
|
||||||
|
|
||||||
Usefull for gaining access to the TV Encoder chips.
|
|
||||||
|
|
|
@ -1,26 +0,0 @@
|
||||||
Kernel driver i2c-savage4
|
|
||||||
|
|
||||||
Supported adapters:
|
|
||||||
* Savage4
|
|
||||||
* Savage2000
|
|
||||||
|
|
||||||
Authors:
|
|
||||||
Alexander Wold <awold@bigfoot.com>,
|
|
||||||
Mark D. Studebaker <mdsxyz123@yahoo.com>
|
|
||||||
|
|
||||||
Description
|
|
||||||
-----------
|
|
||||||
|
|
||||||
The Savage4 chips contain two I2C interfaces (aka a I2C 'master'
|
|
||||||
or 'host').
|
|
||||||
|
|
||||||
The first interface is used for DDC (Data Display Channel) which is a
|
|
||||||
serial channel through the VGA monitor connector to a DDC-compliant
|
|
||||||
monitor. This interface is defined by the Video Electronics Standards
|
|
||||||
Association (VESA). The standards are available for purchase at
|
|
||||||
http://www.vesa.org . The DDC bus is not yet supported because its register
|
|
||||||
is not directly memory-mapped.
|
|
||||||
|
|
||||||
The second interface is a general-purpose I2C bus. This is the only
|
|
||||||
interface supported by the driver at the moment.
|
|
||||||
|
|
|
@ -49,7 +49,7 @@ $ modprobe max6875 force=0,0x50
|
||||||
|
|
||||||
The MAX6874/MAX6875 ignores address bit 0, so this driver attaches to multiple
|
The MAX6874/MAX6875 ignores address bit 0, so this driver attaches to multiple
|
||||||
addresses. For example, for address 0x50, it also reserves 0x51.
|
addresses. For example, for address 0x50, it also reserves 0x51.
|
||||||
The even-address instance is called 'max6875', the odd one is 'max6875 subclient'.
|
The even-address instance is called 'max6875', the odd one is 'dummy'.
|
||||||
|
|
||||||
|
|
||||||
Programming the chip using i2c-dev
|
Programming the chip using i2c-dev
|
||||||
|
|
|
@ -7,7 +7,7 @@ drivers/gpio/pca9539.c instead.
|
||||||
Supported chips:
|
Supported chips:
|
||||||
* Philips PCA9539
|
* Philips PCA9539
|
||||||
Prefix: 'pca9539'
|
Prefix: 'pca9539'
|
||||||
Addresses scanned: 0x74 - 0x77
|
Addresses scanned: none
|
||||||
Datasheet:
|
Datasheet:
|
||||||
http://www.semiconductors.philips.com/acrobat/datasheets/PCA9539_2.pdf
|
http://www.semiconductors.philips.com/acrobat/datasheets/PCA9539_2.pdf
|
||||||
|
|
||||||
|
@ -23,6 +23,14 @@ The input sense can also be inverted.
|
||||||
The 16 lines are split between two bytes.
|
The 16 lines are split between two bytes.
|
||||||
|
|
||||||
|
|
||||||
|
Detection
|
||||||
|
---------
|
||||||
|
|
||||||
|
The PCA9539 is difficult to detect and not commonly found in PC machines,
|
||||||
|
so you have to pass the I2C bus and address of the installed PCA9539
|
||||||
|
devices explicitly to the driver at load time via the force=... parameter.
|
||||||
|
|
||||||
|
|
||||||
Sysfs entries
|
Sysfs entries
|
||||||
-------------
|
-------------
|
||||||
|
|
||||||
|
|
|
@ -4,13 +4,13 @@ Kernel driver pcf8574
|
||||||
Supported chips:
|
Supported chips:
|
||||||
* Philips PCF8574
|
* Philips PCF8574
|
||||||
Prefix: 'pcf8574'
|
Prefix: 'pcf8574'
|
||||||
Addresses scanned: I2C 0x20 - 0x27
|
Addresses scanned: none
|
||||||
Datasheet: Publicly available at the Philips Semiconductors website
|
Datasheet: Publicly available at the Philips Semiconductors website
|
||||||
http://www.semiconductors.philips.com/pip/PCF8574P.html
|
http://www.semiconductors.philips.com/pip/PCF8574P.html
|
||||||
|
|
||||||
* Philips PCF8574A
|
* Philips PCF8574A
|
||||||
Prefix: 'pcf8574a'
|
Prefix: 'pcf8574a'
|
||||||
Addresses scanned: I2C 0x38 - 0x3f
|
Addresses scanned: none
|
||||||
Datasheet: Publicly available at the Philips Semiconductors website
|
Datasheet: Publicly available at the Philips Semiconductors website
|
||||||
http://www.semiconductors.philips.com/pip/PCF8574P.html
|
http://www.semiconductors.philips.com/pip/PCF8574P.html
|
||||||
|
|
||||||
|
@ -38,12 +38,10 @@ For more informations see the datasheet.
|
||||||
Accessing PCF8574(A) via /sys interface
|
Accessing PCF8574(A) via /sys interface
|
||||||
-------------------------------------
|
-------------------------------------
|
||||||
|
|
||||||
! Be careful !
|
|
||||||
The PCF8574(A) is plainly impossible to detect ! Stupid chip.
|
The PCF8574(A) is plainly impossible to detect ! Stupid chip.
|
||||||
So every chip with address in the interval [20..27] and [38..3f] are
|
So, you have to pass the I2C bus and address of the installed PCF8574
|
||||||
detected as PCF8574(A). If you have other chips in this address
|
and PCF8574A devices explicitly to the driver at load time via the
|
||||||
range, the workaround is to load this module after the one
|
force=... parameter.
|
||||||
for your others chips.
|
|
||||||
|
|
||||||
On detection (i.e. insmod, modprobe et al.), directories are being
|
On detection (i.e. insmod, modprobe et al.), directories are being
|
||||||
created for each detected PCF8574(A):
|
created for each detected PCF8574(A):
|
||||||
|
|
|
@ -40,12 +40,9 @@ Detection
|
||||||
---------
|
---------
|
||||||
|
|
||||||
There is no method known to detect whether a chip on a given I2C address is
|
There is no method known to detect whether a chip on a given I2C address is
|
||||||
a PCF8575 or whether it is any other I2C device. So there are two alternatives
|
a PCF8575 or whether it is any other I2C device, so you have to pass the I2C
|
||||||
to let the driver find the installed PCF8575 devices:
|
bus and address of the installed PCF8575 devices explicitly to the driver at
|
||||||
- Load this driver after any other I2C driver for I2C devices with addresses
|
load time via the force=... parameter.
|
||||||
in the range 0x20 .. 0x27.
|
|
||||||
- Pass the I2C bus and address of the installed PCF8575 devices explicitly to
|
|
||||||
the driver at load time via the probe=... or force=... parameters.
|
|
||||||
|
|
||||||
/sys interface
|
/sys interface
|
||||||
--------------
|
--------------
|
||||||
|
|
127
Documentation/i2c/fault-codes
Normal file
|
@ -0,0 +1,127 @@
|
||||||
|
This is a summary of the most important conventions for use of fault
|
||||||
|
codes in the I2C/SMBus stack.
|
||||||
|
|
||||||
|
|
||||||
|
A "Fault" is not always an "Error"
|
||||||
|
----------------------------------
|
||||||
|
Not all fault reports imply errors; "page faults" should be a familiar
|
||||||
|
example. Software often retries idempotent operations after transient
|
||||||
|
faults. There may be fancier recovery schemes that are appropriate in
|
||||||
|
some cases, such as re-initializing (and maybe resetting). After such
|
||||||
|
recovery, triggered by a fault report, there is no error.
|
||||||
|
|
||||||
|
In a similar way, sometimes a "fault" code just reports one defined
|
||||||
|
result for an operation ... it doesn't indicate that anything is wrong
|
||||||
|
at all, just that the outcome wasn't on the "golden path".
|
||||||
|
|
||||||
|
In short, your I2C driver code may need to know these codes in order
|
||||||
|
to respond correctly. Other code may need to rely on YOUR code reporting
|
||||||
|
the right fault code, so that it can (in turn) behave correctly.
|
||||||
|
|
||||||
|
|
||||||
|
I2C and SMBus fault codes
|
||||||
|
-------------------------
|
||||||
|
These are returned as negative numbers from most calls, with zero or
|
||||||
|
some positive number indicating a non-fault return. The specific
|
||||||
|
numbers associated with these symbols differ between architectures,
|
||||||
|
though most Linux systems use <asm-generic/errno*.h> numbering.
|
||||||
|
|
||||||
|
Note that the descriptions here are not exhaustive. There are other
|
||||||
|
codes that may be returned, and other cases where these codes should
|
||||||
|
be returned. However, drivers should not return other codes for these
|
||||||
|
cases (unless the hardware doesn't provide unique fault reports).
|
||||||
|
|
||||||
|
Also, codes returned by adapter probe methods follow rules which are
|
||||||
|
specific to their host bus (such as PCI, or the platform bus).
|
||||||
|
|
||||||
|
|
||||||
|
EAGAIN
|
||||||
|
Returned by I2C adapters when they lose arbitration in master
|
||||||
|
transmit mode: some other master was transmitting different
|
||||||
|
data at the same time.
|
||||||
|
|
||||||
|
Also returned when trying to invoke an I2C operation in an
|
||||||
|
atomic context, when some task is already using that I2C bus
|
||||||
|
to execute some other operation.
|
||||||
|
|
||||||
|
EBADMSG
|
||||||
|
Returned by SMBus logic when an invalid Packet Error Code byte
|
||||||
|
is received. This code is a CRC covering all bytes in the
|
||||||
|
transaction, and is sent before the terminating STOP. This
|
||||||
|
fault is only reported on read transactions; the SMBus slave
|
||||||
|
may have a way to report PEC mismatches on writes from the
|
||||||
|
host. Note that even if PECs are in use, you should not rely
|
||||||
|
on these as the only way to detect incorrect data transfers.
|
||||||
|
|
||||||
|
EBUSY
|
||||||
|
Returned by SMBus adapters when the bus was busy for longer
|
||||||
|
than allowed. This usually indicates some device (maybe the
|
||||||
|
SMBus adapter) needs some fault recovery (such as resetting),
|
||||||
|
or that the reset was attempted but failed.
|
||||||
|
|
||||||
|
EINVAL
|
||||||
|
This rather vague error means an invalid parameter has been
|
||||||
|
detected before any I/O operation was started. Use a more
|
||||||
|
specific fault code when you can.
|
||||||
|
|
||||||
|
One example would be a driver trying an SMBus Block Write
|
||||||
|
with block size outside the range of 1-32 bytes.
|
||||||
|
|
||||||
|
EIO
|
||||||
|
This rather vague error means something went wrong when
|
||||||
|
performing an I/O operation. Use a more specific fault
|
||||||
|
code when you can.
|
||||||
|
|
||||||
|
ENODEV
|
||||||
|
Returned by driver probe() methods. This is a bit more
|
||||||
|
specific than ENXIO, implying the problem isn't with the
|
||||||
|
address, but with the device found there. Driver probes
|
||||||
|
may verify the device returns *correct* responses, and
|
||||||
|
return this as appropriate. (The driver core will warn
|
||||||
|
about probe faults other than ENXIO and ENODEV.)
|
||||||
|
|
||||||
|
ENOMEM
|
||||||
|
Returned by any component that can't allocate memory when
|
||||||
|
it needs to do so.
|
||||||
|
|
||||||
|
ENXIO
|
||||||
|
Returned by I2C adapters to indicate that the address phase
|
||||||
|
of a transfer didn't get an ACK. While it might just mean
|
||||||
|
an I2C device was temporarily not responding, usually it
|
||||||
|
means there's nothing listening at that address.
|
||||||
|
|
||||||
|
Returned by driver probe() methods to indicate that they
|
||||||
|
found no device to bind to. (ENODEV may also be used.)
|
||||||
|
|
||||||
|
EOPNOTSUPP
|
||||||
|
Returned by an adapter when asked to perform an operation
|
||||||
|
that it doesn't, or can't, support.
|
||||||
|
|
||||||
|
For example, this would be returned when an adapter that
|
||||||
|
doesn't support SMBus block transfers is asked to execute
|
||||||
|
one. In that case, the driver making that request should
|
||||||
|
have verified that functionality was supported before it
|
||||||
|
made that block transfer request.
|
||||||
|
|
||||||
|
Similarly, if an I2C adapter can't execute all legal I2C
|
||||||
|
messages, it should return this when asked to perform a
|
||||||
|
transaction it can't. (These limitations can't be seen in
|
||||||
|
the adapter's functionality mask, since the assumption is
|
||||||
|
that if an adapter supports I2C it supports all of I2C.)
|
||||||
|
|
||||||
|
EPROTO
|
||||||
|
Returned when slave does not conform to the relevant I2C
|
||||||
|
or SMBus (or chip-specific) protocol specifications. One
|
||||||
|
case is when the length of an SMBus block data response
|
||||||
|
(from the SMBus slave) is outside the range 1-32 bytes.
|
||||||
|
|
||||||
|
ETIMEDOUT
|
||||||
|
This is returned by drivers when an operation took too much
|
||||||
|
time, and was aborted before it completed.
|
||||||
|
|
||||||
|
SMBus adapters may return it when an operation took more
|
||||||
|
time than allowed by the SMBus specification; for example,
|
||||||
|
when a slave stretches clocks too far. I2C has no such
|
||||||
|
timeouts, but it's normal for I2C adapters to impose some
|
||||||
|
arbitrary limits (much longer than SMBus!) too.
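As an illustration of how calling code might react to these codes, the
sketch below sorts a return value into a few coarse actions. The enum,
the function name and the policy itself are assumptions made for this
example, not part of the I2C stack; a real driver would choose its own
policy per operation.

```c
#include <errno.h>

/* Hypothetical coarse-grained reactions to an I2C/SMBus return value. */
enum i2c_fault_action {
	I2C_FAULT_NONE,		/* zero or positive: not a fault */
	I2C_FAULT_RETRY,	/* transient; an idempotent operation may be retried */
	I2C_FAULT_NO_DEVICE,	/* nothing (usable) at that address */
	I2C_FAULT_UNSUPPORTED,	/* adapter can't perform this operation */
	I2C_FAULT_FAIL,		/* anything else: report to the caller */
};

enum i2c_fault_action i2c_classify_fault(int ret)
{
	if (ret >= 0)
		return I2C_FAULT_NONE;

	switch (-ret) {
	case EAGAIN:		/* lost arbitration, or bus in use */
		return I2C_FAULT_RETRY;
	case ENXIO:		/* no ACK in the address phase */
	case ENODEV:		/* device present but not the expected one */
		return I2C_FAULT_NO_DEVICE;
	case EOPNOTSUPP:
		return I2C_FAULT_UNSUPPORTED;
	default:
		return I2C_FAULT_FAIL;
	}
}
```

Note that faults are passed around as negative numbers, so the switch
compares against -ret.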
|
||||||
|
|
|
@ -42,8 +42,8 @@ Count (8 bits): A data byte containing the length of a block operation.
|
||||||
[..]: Data sent by I2C device, as opposed to data sent by the host adapter.
|
[..]: Data sent by I2C device, as opposed to data sent by the host adapter.
|
||||||
|
|
||||||
|
|
||||||
SMBus Quick Command: i2c_smbus_write_quick()
|
SMBus Quick Command
|
||||||
=============================================
|
===================
|
||||||
|
|
||||||
This sends a single bit to the device, at the place of the Rd/Wr bit.
|
This sends a single bit to the device, at the place of the Rd/Wr bit.
|
||||||
|
|
||||||
|
|
281
Documentation/i2c/upgrading-clients
Normal file
|
@ -0,0 +1,281 @@
|
||||||
|
Upgrading I2C Drivers to the new 2.6 Driver Model
|
||||||
|
=================================================
|
||||||
|
|
||||||
|
Ben Dooks <ben-linux@fluff.org>
|
||||||
|
|
||||||
|
Introduction
|
||||||
|
------------
|
||||||
|
|
||||||
|
This guide outlines how to alter existing Linux 2.6 client drivers from
|
||||||
|
the old to the new binding methods.
|
||||||
|
|
||||||
|
|
||||||
|
Example old-style driver
|
||||||
|
------------------------
|
||||||
|
|
||||||
|
|
||||||
|
struct example_state {
|
||||||
|
struct i2c_client client;
|
||||||
|
....
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct i2c_driver example_driver;
|
||||||
|
|
||||||
|
static unsigned short ignore[] = { I2C_CLIENT_END };
|
||||||
|
static unsigned short normal_addr[] = { OUR_ADDR, I2C_CLIENT_END };
|
||||||
|
|
||||||
|
I2C_CLIENT_INSMOD;
|
||||||
|
|
||||||
|
static int example_attach(struct i2c_adapter *adap, int addr, int kind)
|
||||||
|
{
|
||||||
|
struct example_state *state;
|
||||||
|
struct device *dev = &adap->dev; /* to use for dev_ reports */
|
||||||
|
int ret;
|
||||||
|
|
||||||
|
state = kzalloc(sizeof(struct example_state), GFP_KERNEL);
|
||||||
|
if (state == NULL) {
|
||||||
|
dev_err(dev, "failed to create our state\n");
|
||||||
|
return -ENOMEM;
|
||||||
|
}
|
||||||
|
|
||||||
|
state->client.addr = addr;
|
||||||
|
state->client.flags = 0;
|
||||||
|
state->client.adapter = adap;
|
||||||
|
|
||||||
|
i2c_set_clientdata(&state->client, state);
|
||||||
|
strlcpy(state->client.name, "example", I2C_NAME_SIZE);
|
||||||
|
|
||||||
|
ret = i2c_attach_client(&state->client);
|
||||||
|
if (ret < 0) {
|
||||||
|
dev_err(dev, "failed to attach client\n");
|
||||||
|
kfree(state);
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
dev = &state->client.dev;
|
||||||
|
|
||||||
|
/* rest of the initialisation goes here. */
|
||||||
|
|
||||||
|
dev_info(dev, "example client created\n");
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int __devexit example_detach(struct i2c_client *client)
|
||||||
|
{
|
||||||
|
struct example_state *state = i2c_get_clientdata(client);
|
||||||
|
|
||||||
|
i2c_detach_client(client);
|
||||||
|
kfree(state);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int example_attach_adapter(struct i2c_adapter *adap)
|
||||||
|
{
|
||||||
|
return i2c_probe(adap, &addr_data, example_attach);
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct i2c_driver example_driver = {
|
||||||
|
.driver = {
|
||||||
|
.owner = THIS_MODULE,
|
||||||
|
.name = "example",
|
||||||
|
},
|
||||||
|
.attach_adapter = example_attach_adapter,
|
||||||
|
.detach_client = __devexit_p(example_detach),
|
||||||
|
.suspend = example_suspend,
|
||||||
|
.resume = example_resume,
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
Updating the client
|
||||||
|
-------------------
|
||||||
|
|
||||||
|
The new style binding model will check against a list of supported
|
||||||
|
devices and their associated address supplied by the code registering
|
||||||
|
the busses. This means that the driver .attach_adapter and
|
||||||
|
.detach_client methods can be removed, along with the addr_data,
|
||||||
|
as follows:
|
||||||
|
|
||||||
|
- static struct i2c_driver example_driver;
|
||||||
|
|
||||||
|
- static unsigned short ignore[] = { I2C_CLIENT_END };
|
||||||
|
- static unsigned short normal_addr[] = { OUR_ADDR, I2C_CLIENT_END };
|
||||||
|
|
||||||
|
- I2C_CLIENT_INSMOD;
|
||||||
|
|
||||||
|
- static int example_attach_adapter(struct i2c_adapter *adap)
|
||||||
|
- {
|
||||||
|
- return i2c_probe(adap, &addr_data, example_attach);
|
||||||
|
- }
|
||||||
|
|
||||||
|
static struct i2c_driver example_driver = {
|
||||||
|
- .attach_adapter = example_attach_adapter,
|
||||||
|
- .detach_client = __devexit_p(example_detach),
|
||||||
|
}
|
||||||
|
|
||||||
|
Add the probe and remove methods to the i2c_driver, as so:
|
||||||
|
|
||||||
|
static struct i2c_driver example_driver = {
|
||||||
|
+ .probe = example_probe,
|
||||||
|
+ .remove = __devexit_p(example_remove),
|
||||||
|
}
|
||||||
|
|
||||||
|
Change the example_attach method to accept the new parameters
|
||||||
|
which include the i2c_client that it will be working with:
|
||||||
|
|
||||||
|
- static int example_attach(struct i2c_adapter *adap, int addr, int kind)
|
||||||
|
+ static int example_probe(struct i2c_client *client,
|
||||||
|
+ const struct i2c_device_id *id)
|
||||||
|
|
||||||
|
Change the name of example_attach to example_probe to align it with the
|
||||||
|
i2c_driver entry names. The rest of the probe routine will now need to be
|
||||||
|
changed as the i2c_client has already been set up for use.
|
||||||
|
|
||||||
|
The necessary client fields have already been set up before
|
||||||
|
the probe function is called, so the following client setup
|
||||||
|
can be removed:
|
||||||
|
|
||||||
|
- state->client.addr = addr;
|
||||||
|
- state->client.flags = 0;
|
||||||
|
- state->client.adapter = adap;
|
||||||
|
-
|
||||||
|
- strlcpy(state->client.name, "example", I2C_NAME_SIZE);
|
||||||
|
|
||||||
|
The i2c_set_clientdata is now:
|
||||||
|
|
||||||
|
- i2c_set_clientdata(&state->client, state);
|
||||||
|
+ i2c_set_clientdata(client, state);
|
||||||
|
|
||||||
|
The call to i2c_attach_client is no longer needed; if the probe
|
||||||
|
routine exits successfully, then the driver will be automatically
|
||||||
|
attached by the core. Change the probe routine as so:
|
||||||
|
|
||||||
|
- ret = i2c_attach_client(&state->client);
|
||||||
|
- if (ret < 0) {
|
||||||
|
- dev_err(dev, "failed to attach client\n");
|
||||||
|
- kfree(state);
|
||||||
|
- return ret;
|
||||||
|
- }
|
||||||
|
|
||||||
|
|
||||||
|
Remove the storage of 'struct i2c_client' from the 'struct example_state'
|
||||||
|
as we are provided with the i2c_client in our example_probe. Instead we
|
||||||
|
store a pointer to it for when it is needed.
|
||||||
|
|
||||||
|
struct example_state {
|
||||||
|
- struct i2c_client client;
|
||||||
|
+ struct i2c_client *client;
|
||||||
|
|
||||||
|
Take the device used for dev_ reports from the new i2c client, as so:
|
||||||
|
|
||||||
|
- struct device *dev = &adap->dev; /* to use for dev_ reports */
|
||||||
|
+ struct device *dev = &i2c_client->dev; /* to use for dev_ reports */
|
||||||
|
|
||||||
|
And remove the change after our client is attached, as the driver no
|
||||||
|
longer needs to register a new client structure with the core:
|
||||||
|
|
||||||
|
- dev = &state->client.dev;
|
||||||
|
|
||||||
|
In the probe routine, ensure that the new state has the client stored
|
||||||
|
in it:
|
||||||
|
|
||||||
|
static int example_probe(struct i2c_client *i2c_client,
|
||||||
|
const struct i2c_device_id *id)
|
||||||
|
{
|
||||||
|
struct example_state *state;
|
||||||
|
struct device *dev = &i2c_client->dev;
|
||||||
|
int ret;
|
||||||
|
|
||||||
|
state = kzalloc(sizeof(struct example_state), GFP_KERNEL);
|
||||||
|
if (state == NULL) {
|
||||||
|
dev_err(dev, "failed to create our state\n");
|
||||||
|
return -ENOMEM;
|
||||||
|
}
|
||||||
|
|
||||||
|
+ state->client = i2c_client;
|
||||||
|
|
||||||
|
Update the detach method by changing the name to _remove and
|
||||||
|
deleting the i2c_detach_client call. It is possible that you
|
||||||
|
can also remove the ret variable as it is not needed for
|
||||||
|
any of the core functions.
|
||||||
|
|
||||||
|
- static int __devexit example_detach(struct i2c_client *client)
|
||||||
|
+ static int __devexit example_remove(struct i2c_client *client)
|
||||||
|
{
|
||||||
|
struct example_state *state = i2c_get_clientdata(client);
|
||||||
|
|
||||||
|
- i2c_detach_client(client);
|
||||||
|
|
||||||
|
And finally ensure that we have the correct ID table for the i2c-core
|
||||||
|
and other utilities:
|
||||||
|
|
||||||
|
+ struct i2c_device_id example_idtable[] = {
|
||||||
|
+ { "example", 0 },
|
||||||
|
+ { }
|
||||||
|
+};
|
||||||
|
+
|
||||||
|
+MODULE_DEVICE_TABLE(i2c, example_idtable);
|
||||||
|
|
||||||
|
static struct i2c_driver example_driver = {
|
||||||
|
.driver = {
|
||||||
|
.owner = THIS_MODULE,
|
||||||
|
.name = "example",
|
||||||
|
},
|
||||||
|
+ .id_table = example_idtable,
|
||||||
|
|
||||||
|
|
||||||
|
Our driver should now look like this:
|
||||||
|
|
||||||
|
struct example_state {
|
||||||
|
struct i2c_client *client;
|
||||||
|
....
|
||||||
|
};
|
||||||
|
|
||||||
|
static int example_probe(struct i2c_client *client,
|
||||||
|
const struct i2c_device_id *id)
|
||||||
|
{
|
||||||
|
struct example_state *state;
|
||||||
|
struct device *dev = &client->dev;
|
||||||
|
|
||||||
|
state = kzalloc(sizeof(struct example_state), GFP_KERNEL);
|
||||||
|
if (state == NULL) {
|
||||||
|
dev_err(dev, "failed to create our state\n");
|
||||||
|
return -ENOMEM;
|
||||||
|
}
|
||||||
|
|
||||||
|
state->client = client;
|
||||||
|
i2c_set_clientdata(client, state);
|
||||||
|
|
||||||
|
/* rest of the initialisation goes here. */
|
||||||
|
|
||||||
|
dev_info(dev, "example client created\n");
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int __devexit example_remove(struct i2c_client *client)
|
||||||
|
{
|
||||||
|
struct example_state *state = i2c_get_clientdata(client);
|
||||||
|
|
||||||
|
kfree(state);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct i2c_device_id example_idtable[] = {
|
||||||
|
{ "example", 0 },
|
||||||
|
{ }
|
||||||
|
};
|
||||||
|
|
||||||
|
MODULE_DEVICE_TABLE(i2c, example_idtable);
|
||||||
|
|
||||||
|
static struct i2c_driver example_driver = {
|
||||||
|
.driver = {
|
||||||
|
.owner = THIS_MODULE,
|
||||||
|
.name = "example",
|
||||||
|
},
|
||||||
|
.id_table = example_idtable,
|
||||||
|
.probe = example_probe,
|
||||||
|
.remove = __devexit_p(example_remove),
|
||||||
|
.suspend = example_suspend,
|
||||||
|
.resume = example_resume,
|
||||||
|
};
|
|
@ -25,14 +25,29 @@ routines, and should be zero-initialized except for fields with data you
|
||||||
provide. A client structure holds device-specific information like the
|
provide. A client structure holds device-specific information like the
|
||||||
driver model device node, and its I2C address.
|
driver model device node, and its I2C address.
|
||||||
|
|
||||||
|
/* iff driver uses driver model ("new style") binding model: */
|
||||||
|
|
||||||
|
static struct i2c_device_id foo_idtable[] = {
|
||||||
|
{ "foo", my_id_for_foo },
|
||||||
|
{ "bar", my_id_for_bar },
|
||||||
|
{ }
|
||||||
|
};
|
||||||
|
|
||||||
|
MODULE_DEVICE_TABLE(i2c, foo_idtable);
|
||||||
|
|
||||||
static struct i2c_driver foo_driver = {
|
static struct i2c_driver foo_driver = {
|
||||||
.driver = {
|
.driver = {
|
||||||
.name = "foo",
|
.name = "foo",
|
||||||
},
|
},
|
||||||
|
|
||||||
/* iff driver uses driver model ("new style") binding model: */
|
/* iff driver uses driver model ("new style") binding model: */
|
||||||
|
.id_table = foo_idtable,
|
||||||
.probe = foo_probe,
|
.probe = foo_probe,
|
||||||
.remove = foo_remove,
|
.remove = foo_remove,
|
||||||
|
/* if device autodetection is needed: */
|
||||||
|
.class = I2C_CLASS_SOMETHING,
|
||||||
|
.detect = foo_detect,
|
||||||
|
.address_data = &addr_data,
|
||||||
|
|
||||||
/* else, driver uses "legacy" binding model: */
|
/* else, driver uses "legacy" binding model: */
|
||||||
.attach_adapter = foo_attach_adapter,
|
.attach_adapter = foo_attach_adapter,
|
||||||
|
@ -173,10 +188,9 @@ handle may be used during foo_probe(). If foo_probe() reports success
|
||||||
(zero not a negative status code) it may save the handle and use it until
|
(zero not a negative status code) it may save the handle and use it until
|
||||||
foo_remove() returns. That binding model is used by most Linux drivers.
|
foo_remove() returns. That binding model is used by most Linux drivers.
|
||||||
|
|
||||||
Drivers match devices when i2c_client.driver_name and the driver name are
|
The probe function is called when an entry in the id_table name field
|
||||||
the same; this approach is used in several other busses that don't have
|
matches the device's name. It is passed the entry that was matched so
|
||||||
device typing support in the hardware. The driver and module name should
|
the driver knows which one in the table matched.
|
||||||
match, so hotplug/coldplug mechanisms will modprobe the driver.
|
|
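
The matching rule described above can be sketched in userspace C. This
is a simplified model under stated assumptions, not the i2c core
implementation: struct sketch_device_id and sketch_match_id() are
hypothetical names, and the table is terminated by a NULL name here,
whereas the real struct i2c_device_id table ends with an empty entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct i2c_device_id: a name plus
 * driver-private data. */
struct sketch_device_id {
	const char *name;
	unsigned long driver_data;
};

/* Table terminated by an entry with a NULL name. */
static const struct sketch_device_id foo_idtable[] = {
	{ "foo", 1 },
	{ "bar", 2 },
	{ NULL, 0 }
};

/* Return the first entry whose name matches the device's name, or
 * NULL if no entry matches. The matched entry is what a probe
 * function would be handed. */
static const struct sketch_device_id *
sketch_match_id(const struct sketch_device_id *table, const char *name)
{
	assert(table != NULL && name != NULL);

	for (; table->name != NULL; table++) {
		if (strcmp(table->name, name) == 0)
			return table;
	}
	return NULL;
}
```

Because the matched entry is passed on, a driver that supports
several chips can read driver_data to tell them apart.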
||||||
|
|
||||||
|
|
||||||
Device Creation (Standard driver model)
|
Device Creation (Standard driver model)
|
||||||
|
@ -207,6 +221,31 @@ in the I2C bus driver. You may want to save the returned i2c_client
|
||||||
reference for later use.
|
reference for later use.
|
||||||
|
|
||||||
|
|
||||||
|
Device Detection (Standard driver model)
|
||||||
|
----------------------------------------
|
||||||
|
|
||||||
|
Sometimes you do not know in advance which I2C devices are connected to
|
||||||
|
a given I2C bus. This is for example the case of hardware monitoring
|
||||||
|
devices on a PC's SMBus. In that case, you may want to let your driver
|
||||||
|
detect supported devices automatically. This is how the legacy model
|
||||||
|
worked, and it is now available as an extension to the standard
|
||||||
|
driver model (so that we can finally get rid of the legacy model.)
|
||||||
|
|
||||||
|
You simply have to define a detect callback which will attempt to
|
||||||
|
identify supported devices (returning 0 for supported ones and -ENODEV
|
||||||
|
for unsupported ones), a list of addresses to probe, and a device type
|
||||||
|
(or class) so that only I2C buses which may have that type of device
|
||||||
|
connected (and not otherwise enumerated) will be probed. The i2c
|
||||||
|
core will then call you back as needed and will instantiate a device
|
||||||
|
for you for every successful detection.
|
||||||
|
|
||||||
|
Note that this mechanism is purely optional and not suitable for all
|
||||||
|
devices. You need some reliable way to identify the supported devices
|
||||||
|
(typically using device-specific, dedicated identification registers),
|
||||||
|
otherwise misdetections are likely to occur and things can go wrong
|
||||||
|
quickly.
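
The detection flow described above can be modelled roughly as follows.
This is a userspace sketch under stated assumptions, not the i2c core:
sketch_scan(), sketch_detect() and SKETCH_ADDR_END are hypothetical
names, and the "identification" is reduced to a fixed address check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_ADDR_END 0xffffu	/* hypothetical list terminator */

/* Hypothetical detect callback: 0 if a supported device answers at
 * addr, -ENODEV otherwise. */
typedef int (*sketch_detect_fn)(unsigned short addr);

/* Probe each candidate address and count successful detections; a
 * real core would instantiate a device for each one instead. */
static int sketch_scan(const unsigned short *addrs, sketch_detect_fn detect)
{
	int found = 0;

	assert(addrs != NULL && detect != NULL);

	for (; *addrs != SKETCH_ADDR_END; addrs++) {
		if (detect(*addrs) == 0)
			found++;
	}
	return found;
}

/* Pretend identification: only address 0x2d carries the expected
 * chip ID. A real callback would read identification registers. */
static int sketch_detect(unsigned short addr)
{
	return addr == 0x2d ? 0 : -ENODEV;
}
```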
|
||||||
|
|
||||||
|
|
||||||
Device Deletion (Standard driver model)
|
Device Deletion (Standard driver model)
|
||||||
---------------------------------------
|
---------------------------------------
|
||||||
|
|
||||||
|
@ -559,7 +598,6 @@ SMBus communication
|
||||||
in terms of it. Never use this function directly!
|
in terms of it. Never use this function directly!
|
||||||
|
|
||||||
|
|
||||||
extern s32 i2c_smbus_write_quick(struct i2c_client * client, u8 value);
|
|
||||||
extern s32 i2c_smbus_read_byte(struct i2c_client * client);
|
extern s32 i2c_smbus_read_byte(struct i2c_client * client);
|
||||||
extern s32 i2c_smbus_write_byte(struct i2c_client * client, u8 value);
|
extern s32 i2c_smbus_write_byte(struct i2c_client * client, u8 value);
|
||||||
extern s32 i2c_smbus_read_byte_data(struct i2c_client * client, u8 command);
|
extern s32 i2c_smbus_read_byte_data(struct i2c_client * client, u8 command);
|
||||||
|
@ -568,30 +606,31 @@ SMBus communication
|
||||||
extern s32 i2c_smbus_read_word_data(struct i2c_client * client, u8 command);
|
extern s32 i2c_smbus_read_word_data(struct i2c_client * client, u8 command);
|
||||||
extern s32 i2c_smbus_write_word_data(struct i2c_client * client,
|
extern s32 i2c_smbus_write_word_data(struct i2c_client * client,
|
||||||
u8 command, u16 value);
|
u8 command, u16 value);
|
||||||
|
extern s32 i2c_smbus_read_block_data(struct i2c_client * client,
|
||||||
|
u8 command, u8 *values);
|
||||||
extern s32 i2c_smbus_write_block_data(struct i2c_client * client,
|
extern s32 i2c_smbus_write_block_data(struct i2c_client * client,
|
||||||
u8 command, u8 length,
|
u8 command, u8 length,
|
||||||
u8 *values);
|
u8 *values);
|
||||||
extern s32 i2c_smbus_read_i2c_block_data(struct i2c_client * client,
|
extern s32 i2c_smbus_read_i2c_block_data(struct i2c_client * client,
|
||||||
u8 command, u8 length, u8 *values);
|
u8 command, u8 length, u8 *values);
|
||||||
|
|
||||||
These ones were removed in Linux 2.6.10 because they had no users, but could
|
|
||||||
be added back later if needed:
|
|
||||||
|
|
||||||
extern s32 i2c_smbus_read_block_data(struct i2c_client * client,
|
|
||||||
u8 command, u8 *values);
|
|
||||||
extern s32 i2c_smbus_write_i2c_block_data(struct i2c_client * client,
|
extern s32 i2c_smbus_write_i2c_block_data(struct i2c_client * client,
|
||||||
u8 command, u8 length,
|
u8 command, u8 length,
|
||||||
u8 *values);
|
u8 *values);
|
||||||
|
|
||||||
|
These ones were removed from i2c-core because they had no users, but could
|
||||||
|
be added back later if needed:
|
||||||
|
|
||||||
|
extern s32 i2c_smbus_write_quick(struct i2c_client * client, u8 value);
|
||||||
extern s32 i2c_smbus_process_call(struct i2c_client * client,
|
extern s32 i2c_smbus_process_call(struct i2c_client * client,
|
||||||
u8 command, u16 value);
|
u8 command, u16 value);
|
||||||
extern s32 i2c_smbus_block_process_call(struct i2c_client *client,
|
extern s32 i2c_smbus_block_process_call(struct i2c_client *client,
|
||||||
u8 command, u8 length,
|
u8 command, u8 length,
|
||||||
u8 *values)
|
u8 *values)
|
||||||
|
|
||||||
All these transactions return -1 on failure. The 'write' transactions
|
All these transactions return a negative errno value on failure. The 'write'
|
||||||
return 0 on success; the 'read' transactions return the read value, except
|
transactions return 0 on success; the 'read' transactions return the read
|
||||||
for read_block, which returns the number of values read. The block buffers
|
value, except for block transactions, which return the number of values
|
||||||
need not be longer than 32 bytes.
|
read. The block buffers need not be longer than 32 bytes.
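
A caller therefore has to split the s32 result of a 'read' transaction
into an error code and a payload. The sketch below shows one way to do
that in userspace C for byte and word reads; struct sketch_read_result
and sketch_decode_read() are hypothetical names, not part of the i2c
API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Outcome of interpreting the combined return value of a byte or
 * word 'read' transaction. */
struct sketch_read_result {
	int err;	/* 0 on success, negative errno value otherwise */
	uint16_t value;	/* valid only when err == 0 */
};

/* Negative values are errors; anything else is the value read. */
static struct sketch_read_result sketch_decode_read(int32_t ret)
{
	struct sketch_read_result r = { 0, 0 };

	/* A byte or word read never yields more than 16 bits. */
	assert(ret <= 0xffff);

	if (ret < 0)
		r.err = (int)ret;
	else
		r.value = (uint16_t)ret;
	return r;
}
```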
|
||||||
|
|
||||||
You can read the file `smbus-protocol' for more information about the
|
You can read the file `smbus-protocol' for more information about the
|
||||||
actual SMBus protocol.
|
actual SMBus protocol.
|
||||||
|
|
|
@ -50,9 +50,9 @@ Note: For step 2, please make sure that host page size == TARGET_PAGE_SIZE of qe
|
||||||
/usr/local/bin/qemu-system-ia64 -smp xx -m 512 -hda $your_image
|
/usr/local/bin/qemu-system-ia64 -smp xx -m 512 -hda $your_image
|
||||||
(xx is the number of virtual processors for the guest, now the maximum value is 4)
|
(xx is the number of virtual processors for the guest, now the maximum value is 4)
|
||||||
|
|
||||||
5. Known possibile issue on some platforms with old Firmware.
|
5. Known possible issue on some platforms with old Firmware.
|
||||||
|
|
||||||
If meet strange host crashe issues, try to solve it through either of the following ways:
|
In the event of strange host crash issues, try to solve it through either of the following ways:
|
||||||
|
|
||||||
(1): Upgrade your Firmware to the latest one.
|
(1): Upgrade your Firmware to the latest one.
|
||||||
|
|
||||||
|
@ -65,8 +65,8 @@ index 0b53344..f02b0f7 100644
|
||||||
mov ar.pfs = loc1
|
mov ar.pfs = loc1
|
||||||
mov rp = loc0
|
mov rp = loc0
|
||||||
;;
|
;;
|
||||||
- srlz.d // seralize restoration of psr.l
|
- srlz.d // serialize restoration of psr.l
|
||||||
+ srlz.i // seralize restoration of psr.l
|
+ srlz.i // serialize restoration of psr.l
|
||||||
+ ;;
|
+ ;;
|
||||||
br.ret.sptk.many b0
|
br.ret.sptk.many b0
|
||||||
END(ia64_pal_call_static)
|
END(ia64_pal_call_static)
|
||||||
|
|
137
Documentation/ia64/paravirt_ops.txt
Normal file
|
@ -0,0 +1,137 @@
|
||||||
|
Paravirt_ops on IA64
|
||||||
|
====================
|
||||||
|
21 May 2008, Isaku Yamahata <yamahata@valinux.co.jp>
|
||||||
|
|
||||||
|
|
||||||
|
Introduction
|
||||||
|
------------
|
||||||
|
The aim of this documentation is to help with maintainability and/or to
|
||||||
|
encourage people to use paravirt_ops/IA64.
|
||||||
|
|
||||||
|
paravirt_ops (pv_ops for short) is a mechanism for virtualization support
|
||||||
|
in the Linux kernel on x86. Several approaches to virtualization support
|
||||||
|
were proposed; paravirt_ops is the one that was adopted.
|
||||||
|
There are now also several IA64 virtualization technologies, such as
|
||||||
|
kvm/IA64, xen/IA64 and many academic IA64 hypervisors, so it is
|
||||||
|
worthwhile to add a generic virtualization infrastructure to
|
||||||
|
Linux/IA64.
|
||||||
|
|
||||||
|
|
||||||
|
What is paravirt_ops?
|
||||||
|
---------------------
|
||||||
|
It has been developed on x86 as virtualization support via API, not ABI.
|
||||||
|
It allows each hypervisor to override operations which are important for
|
||||||
|
hypervisors at API level. And it allows a single kernel binary to run on
|
||||||
|
all supported execution environments including native machine.
|
||||||
|
Essentially paravirt_ops is a set of function pointers which represent
|
||||||
|
operations corresponding to low level sensitive instructions and high
|
||||||
|
level functionality in various areas. But one significant difference
|
||||||
|
from a usual function pointer table is that it allows optimization with
|
||||||
|
binary patch. It is because some of these operations are very
|
||||||
|
performance sensitive and indirect call overhead is not negligible.
|
||||||
|
With a binary patch, an indirect C function call can be transformed into
|
||||||
|
a direct C function call or in-place execution to eliminate the overhead.
|
||||||
|
|
||||||
|
Thus, operations of paravirt_ops are classified into three categories.
|
||||||
|
- simple indirect call
|
||||||
|
These operations correspond to high level functionality so that the
|
||||||
|
overhead of indirect call isn't very important.
|
||||||
|
|
||||||
|
- indirect call which allows optimization with binary patch
|
||||||
|
Usually these operations correspond to low level instructions. They
|
||||||
|
are called frequently and are performance critical. So the overhead is
|
||||||
|
very important.
|
||||||
|
|
||||||
|
- a set of macros for hand written assembly code
|
||||||
|
Hand written assembly code (.S files) also needs paravirtualization
|
||||||
|
because it includes sensitive instructions or because some of its code
|
||||||
|
paths are very performance critical.
|
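
The "simple indirect call" category can be pictured with a small
function pointer table. This is a userspace sketch with hypothetical
names (struct sketch_pv_ops, sketch_get_flags() and so on) and
placeholder return values; it is not the actual IA64 pv_ops layout.

```c
#include <assert.h>

/* Hypothetical, much reduced operations table: one hook only. */
struct sketch_pv_ops {
	unsigned long (*get_flags)(void);
};

static unsigned long sketch_native_get_flags(void)
{
	return 0x1;	/* placeholder for reading the real register */
}

static unsigned long sketch_hv_get_flags(void)
{
	return 0x2;	/* placeholder for asking the hypervisor */
}

/* The table defaults to the native implementation. */
static struct sketch_pv_ops sketch_ops = {
	.get_flags = sketch_native_get_flags,
};

/* A hypervisor port overrides hooks early in boot. */
static void sketch_install_hv(void)
{
	sketch_ops.get_flags = sketch_hv_get_flags;
}

/* Call sites go through the table. This is the indirect call whose
 * overhead a binary patch would remove by rewriting the call site. */
static unsigned long sketch_get_flags(void)
{
	assert(sketch_ops.get_flags != 0);
	return sketch_ops.get_flags();
}
```

The same kernel binary runs natively or under a hypervisor depending
only on which functions were installed in the table.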
||||||
|
|
||||||
|
|
||||||
|
The relation to the IA64 machine vector
|
||||||
|
---------------------------------------
|
||||||
|
Linux/IA64 has the IA64 machine vector functionality which allows the
|
||||||
|
kernel to switch implementations (e.g. initialization, ipi, dma api...)
|
||||||
|
depending on executing platform.
|
||||||
|
We can replace some implementations very easily by defining a new machine
|
||||||
|
vector. Thus another approach for virtualization support would be
|
||||||
|
enhancing the machine vector functionality.
|
||||||
|
But the paravirt_ops approach was taken because
|
||||||
|
- virtualization support needs wider support than machine vector does.
|
||||||
|
e.g. low level instruction paravirtualization. It must be
|
||||||
|
initialized very early before platform detection.
|
||||||
|
|
||||||
|
- virtualization support needs more functionality like binary patch.
|
||||||
|
The calling overhead is probably not very large compared to the
|
||||||
|
emulation overhead of virtualization. However in the native case, the
|
||||||
|
overhead should be eliminated completely.
|
||||||
|
A single kernel binary should run on each environment including native,
|
||||||
|
and the overhead of paravirt_ops on native environment should be as
|
||||||
|
small as possible.
|
||||||
|
|
||||||
|
- for full virtualization technology, e.g. KVM/IA64 or
|
||||||
|
Xen/IA64 HVM domain, the result would be
|
||||||
|
(the emulated platform machine vector. probably dig) + (pv_ops).
|
||||||
|
This means that the virtualization support layer should be under
|
||||||
|
the machine vector layer.
|
||||||
|
|
||||||
|
Possibly it might be better to move some function pointers from
|
||||||
|
paravirt_ops to machine vector. In fact, Xen domU case utilizes both
|
||||||
|
pv_ops and machine vector.
|
||||||
|
|
||||||
|
|
||||||
|
IA64 paravirt_ops
|
||||||
|
-----------------
|
||||||
|
In this section, the concrete paravirt_ops will be discussed.
|
||||||
|
Because of the architecture difference between ia64 and x86, the
|
||||||
|
resulting set of functions is very different from x86 pv_ops.
|
||||||
|
|
||||||
|
- C function pointer tables
|
||||||
|
They are not very performance critical, so a simple C indirect
|
||||||
|
function call is acceptable. The following structures are defined at
|
||||||
|
this moment. For details see linux/include/asm-ia64/paravirt.h
|
||||||
|
- struct pv_info
|
||||||
|
This structure describes the execution environment.
|
||||||
|
- struct pv_init_ops
|
||||||
|
This structure describes the various initialization hooks.
|
||||||
|
- struct pv_iosapic_ops
|
||||||
|
This structure describes hooks to iosapic operations.
|
||||||
|
- struct pv_irq_ops
|
||||||
|
This structure describes hooks to irq related operations.
|
||||||
|
- struct pv_time_op
|
||||||
|
This structure describes hooks to steal time accounting.
|
||||||
|
|
||||||
|
- a set of indirect calls which need optimization
|
||||||
|
Currently this class of functions corresponds to a subset of IA64
|
||||||
|
intrinsics. At this moment the optimization with binary patch isn't
|
||||||
|
implemented yet.
|
||||||
|
struct pv_cpu_op is defined. For details see
|
||||||
|
linux/include/asm-ia64/paravirt_privop.h
|
||||||
|
Mostly they correspond to ia64 intrinsics 1-to-1.
|
||||||
|
Caveat: Now they are defined as C indirect function pointers, but in
|
||||||
|
order to support binary patch optimization, they will be changed
|
||||||
|
using GCC extended inline assembly code.
|
||||||
|
|
||||||
|
- a set of macros for hand written assembly code (.S files)
|
||||||
|
For maintainability, the approach taken for .S files is to keep a single
|
||||||
|
source and compile it multiple times with different macro definitions.
|
||||||
|
Each pv_ops instance must define those macros to compile.
|
||||||
|
The important thing here is that sensitive, but non-privileged
|
||||||
|
instructions must be paravirtualized and that some privileged
|
||||||
|
instructions also need paravirtualization for reasonable performance.
|
||||||
|
Developers who modify .S files must be aware of that. At this moment
|
||||||
|
an easy checker is implemented to detect paravirtualization breakage.
|
||||||
|
But it doesn't cover all the cases.
|
||||||
|
|
||||||
|
Sometimes this set of macros is called pv_cpu_asm_op. But there is no
|
||||||
|
corresponding structure in the source code.
|
||||||
|
Those macros mostly 1:1 correspond to a subset of privileged
|
||||||
|
instructions. See linux/include/asm-ia64/native/inst.h.
|
||||||
|
Some functions written in assembly also need to be overridden, so
|
||||||
|
each pv_ops instance has to define some macros. Again see
|
||||||
|
linux/include/asm-ia64/native/inst.h.
|
||||||
|
|
||||||
|
|
||||||
|
Those structures must be initialized very early before start_kernel.
|
||||||
|
They are probably initialized in head.S using multiple entry points or some other trick.
|
||||||
|
For native case implementation see linux/arch/ia64/kernel/paravirt.c.
|
|
@ -31,7 +31,7 @@ The driver works with ALSA drivers simultaneously. For example, the xracer
|
||||||
uses joystick as input device and PCM device as sound output in one time.
|
uses joystick as input device and PCM device as sound output in one time.
|
||||||
There are no sound or input collisions detected. The source code have
|
There are no sound or input collisions detected. The source code have
|
||||||
comments about them; but I've found the joystick can be initialized
|
comments about them; but I've found the joystick can be initialized
|
||||||
separately of ALSA modules. So, you canm use only one joystick driver
|
separately of ALSA modules. So, you can use only one joystick driver
|
||||||
without ALSA drivers. The ALSA drivers are not needed to compile or
|
without ALSA drivers. The ALSA drivers are not needed to compile or
|
||||||
run this driver.
|
run this driver.
|
||||||
|
|
||||||
|
|
|
@ -1,5 +1,3 @@
|
||||||
$Id: gameport-programming.txt,v 1.3 2001/04/24 13:51:37 vojtech Exp $
|
|
||||||
|
|
||||||
Programming gameport drivers
|
Programming gameport drivers
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
|
|
@ -1,7 +1,6 @@
|
||||||
Linux Input drivers v1.0
|
Linux Input drivers v1.0
|
||||||
(c) 1999-2001 Vojtech Pavlik <vojtech@ucw.cz>
|
(c) 1999-2001 Vojtech Pavlik <vojtech@ucw.cz>
|
||||||
Sponsored by SuSE
|
Sponsored by SuSE
|
||||||
$Id: input.txt,v 1.8 2002/05/29 03:15:01 bradleym Exp $
|
|
||||||
----------------------------------------------------------------------------
|
----------------------------------------------------------------------------
|
||||||
|
|
||||||
0. Disclaimer
|
0. Disclaimer
|
||||||
|
|
|
@ -5,8 +5,6 @@
|
||||||
|
|
||||||
7 Aug 1998
|
7 Aug 1998
|
||||||
|
|
||||||
$Id: joystick-api.txt,v 1.2 2001/05/08 21:21:23 vojtech Exp $
|
|
||||||
|
|
||||||
1. Initialization
|
1. Initialization
|
||||||
~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
|
|
@ -2,7 +2,6 @@
|
||||||
(c) 1998-2000 Vojtech Pavlik <vojtech@ucw.cz>
|
(c) 1998-2000 Vojtech Pavlik <vojtech@ucw.cz>
|
||||||
(c) 1998 Andree Borrmann <a.borrmann@tu-bs.de>
|
(c) 1998 Andree Borrmann <a.borrmann@tu-bs.de>
|
||||||
Sponsored by SuSE
|
Sponsored by SuSE
|
||||||
$Id: joystick-parport.txt,v 1.6 2001/09/25 09:31:32 vojtech Exp $
|
|
||||||
----------------------------------------------------------------------------
|
----------------------------------------------------------------------------
|
||||||
|
|
||||||
0. Disclaimer
|
0. Disclaimer
|
||||||
|
|
|
@ -1,7 +1,6 @@
|
||||||
Linux Joystick driver v2.0.0
|
Linux Joystick driver v2.0.0
|
||||||
(c) 1996-2000 Vojtech Pavlik <vojtech@ucw.cz>
|
(c) 1996-2000 Vojtech Pavlik <vojtech@ucw.cz>
|
||||||
Sponsored by SuSE
|
Sponsored by SuSE
|
||||||
$Id: joystick.txt,v 1.12 2002/03/03 12:13:07 jdeneux Exp $
|
|
||||||
----------------------------------------------------------------------------
|
----------------------------------------------------------------------------
|
||||||
|
|
||||||
0. Disclaimer
|
0. Disclaimer
|
||||||
|
|
|
@ -117,6 +117,7 @@ Code Seq# Include File Comments
|
||||||
<mailto:natalia@nikhefk.nikhef.nl>
|
<mailto:natalia@nikhefk.nikhef.nl>
|
||||||
'c' 00-7F linux/comstats.h conflict!
|
'c' 00-7F linux/comstats.h conflict!
|
||||||
'c' 00-7F linux/coda.h conflict!
|
'c' 00-7F linux/coda.h conflict!
|
||||||
|
'c' 80-9F asm-s390/chsc.h
|
||||||
'd' 00-FF linux/char/drm/drm/h conflict!
|
'd' 00-FF linux/char/drm/drm/h conflict!
|
||||||
'd' 00-DF linux/video_decoder.h conflict!
|
'd' 00-DF linux/video_decoder.h conflict!
|
||||||
'd' F0-FF linux/digi1.h
|
'd' F0-FF linux/digi1.h
|
||||||
|
|
|
@ -508,12 +508,13 @@ HDIO_DRIVE_RESET execute a device reset
|
||||||
|
|
||||||
error returns:
|
error returns:
|
||||||
EACCES Access denied: requires CAP_SYS_ADMIN
|
EACCES Access denied: requires CAP_SYS_ADMIN
|
||||||
|
ENXIO No such device: phy dead or ctl_addr == 0
|
||||||
|
EIO I/O error: reset timed out or hardware error
|
||||||
|
|
||||||
notes:
|
notes:
|
||||||
|
|
||||||
Abort any current command, prevent anything else from being
|
Execute a reset on the device as soon as the current IO
|
||||||
queued, execute a reset on the device, and issue BLKRRPART
|
operation has completed.
|
||||||
ioctl on the block device.
|
|
||||||
|
|
||||||
Executes an ATAPI soft reset if applicable, otherwise
|
Executes an ATAPI soft reset if applicable, otherwise
|
||||||
executes an ATA soft reset on the controller.
|
executes an ATA soft reset on the controller.
|
||||||
|
|
|
@ -1,6 +1,6 @@
|
||||||
To decode a hex IOCTL code:
|
To decode a hex IOCTL code:
|
||||||
|
|
||||||
Most architecures use this generic format, but check
|
Most architectures use this generic format, but check
|
||||||
include/ARCH/ioctl.h for specifics, e.g. powerpc
|
include/ARCH/ioctl.h for specifics, e.g. powerpc
|
||||||
uses 3 bits to encode read/write and 13 bits for size.
|
uses 3 bits to encode read/write and 13 bits for size.
|
||||||
|
|
||||||
|
@ -18,7 +18,7 @@ uses 3 bits to encode read/write and 13 bits for size.
|
||||||
7-0 function #
|
7-0 function #
|
||||||
|
|
||||||
|
|
||||||
So for example 0x82187201 is a read with arg length of 0x218,
|
So for example 0x82187201 is a read with arg length of 0x218,
|
||||||
character 'r' function 1. Grepping the source reveals this is:
|
character 'r' function 1. Grepping the source reveals this is:
|
||||||
|
|
||||||
#define VFAT_IOCTL_READDIR_BOTH _IOR('r', 1, struct dirent [2])
|
#define VFAT_IOCTL_READDIR_BOTH _IOR('r', 1, struct dirent [2])
|
||||||
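The worked example in this hunk (0x82187201) can be reproduced with a few shifts and masks. The following is a minimal sketch, assuming the generic layout this file describes (direction in bits 31-30, size in bits 29-16, character in bits 15-8, function in bits 7-0). The names `struct ioctl_fields`, `decode_ioctl` and `print_ioctl` are hypothetical and exist only for illustration; architectures such as powerpc split the upper bits differently, so this would not apply there unchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical holder for the decoded fields (not a kernel type). */
struct ioctl_fields {
	unsigned int dir;	/* bits 31-30: 0 none, 1 write, 2 read, 3 read/write */
	unsigned int size;	/* bits 29-16: argument length in bytes */
	char type;		/* bits 15-8: ASCII character */
	unsigned int nr;	/* bits 7-0: function number */
};

/* Split a 32-bit ioctl code using the generic (assumed) bit layout. */
void decode_ioctl(uint32_t code, struct ioctl_fields *out)
{
	assert(out != NULL);
	out->dir = (code >> 30) & 0x3;
	out->size = (code >> 16) & 0x3fff;
	out->type = (char)((code >> 8) & 0xff);
	out->nr = code & 0xff;
}

/* Print the decoded fields in one line. */
void print_ioctl(uint32_t code)
{
	struct ioctl_fields f;

	decode_ioctl(code, &f);
	printf("0x%08x: dir=%u size=0x%x type='%c' nr=%u\n",
	       (unsigned int)code, f.dir, f.size, f.type, f.nr);
}
```

Calling `print_ioctl(0x82187201)` decodes the value into a read (dir 2) with argument length 0x218, character 'r' and function 1, matching the text above.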
|
|
|
@ -143,7 +143,7 @@ disk and partition statistics are consistent again. Since we still don't
|
||||||
keep record of the partition-relative address, an operation is attributed to
|
keep record of the partition-relative address, an operation is attributed to
|
||||||
the partition which contains the first sector of the request after the
|
the partition which contains the first sector of the request after the
|
||||||
eventual merges. As requests can be merged across partition, this could lead
|
eventual merges. As requests can be merged across partition, this could lead
|
||||||
to some (probably insignificant) innacuracy.
|
to some (probably insignificant) inaccuracy.
|
||||||
|
|
||||||
Additional notes
|
Additional notes
|
||||||
----------------
|
----------------
|
||||||
|
|
6
Documentation/isdn/README.mISDN
Normal file
|
@ -0,0 +1,6 @@
|
||||||
|
mISDN is a new modular ISDN driver, in the long term it should replace
|
||||||
|
the old I4L driver architecture for passive ISDN cards.
|
||||||
|
It was designed to allow a broad range of applications and interfaces
|
||||||
|
but only have the basic function in kernel, the interface to the user
|
||||||
|
space is based on sockets with its own address family, AF_ISDN.
|
||||||
|
|
|
@ -65,26 +65,26 @@ Install kexec-tools
|
||||||
|
|
||||||
2) Download the kexec-tools user-space package from the following URL:
|
2) Download the kexec-tools user-space package from the following URL:
|
||||||
|
|
||||||
http://www.kernel.org/pub/linux/kernel/people/horms/kexec-tools/kexec-tools-testing.tar.gz
|
http://www.kernel.org/pub/linux/kernel/people/horms/kexec-tools/kexec-tools.tar.gz
|
||||||
|
|
||||||
This is a symlink to the latest version, which at the time of writing is
|
This is a symlink to the latest version.
|
||||||
20061214, the only release of kexec-tools-testing so far. As other versions
|
|
||||||
are released, the older ones will remain available at
|
|
||||||
http://www.kernel.org/pub/linux/kernel/people/horms/kexec-tools/
|
|
||||||
|
|
||||||
Note: Latest kexec-tools-testing git tree is available at
|
The latest kexec-tools git tree is available at:
|
||||||
|
|
||||||
git://git.kernel.org/pub/scm/linux/kernel/git/horms/kexec-tools-testing.git
|
git://git.kernel.org/pub/scm/linux/kernel/git/horms/kexec-tools.git
|
||||||
or
|
or
|
||||||
http://www.kernel.org/git/?p=linux/kernel/git/horms/kexec-tools-testing.git;a=summary
|
http://www.kernel.org/git/?p=linux/kernel/git/horms/kexec-tools.git
|
||||||
|
|
||||||
|
More information about kexec-tools can be found at
|
||||||
|
http://www.kernel.org/pub/linux/kernel/people/horms/kexec-tools/README.html
|
||||||
|
|
||||||
3) Unpack the tarball with the tar command, as follows:
|
3) Unpack the tarball with the tar command, as follows:
|
||||||
|
|
||||||
tar xvpzf kexec-tools-testing.tar.gz
|
tar xvpzf kexec-tools.tar.gz
|
||||||
|
|
||||||
4) Change to the kexec-tools directory, as follows:
|
4) Change to the kexec-tools directory, as follows:
|
||||||
|
|
||||||
cd kexec-tools-testing-VERSION
|
cd kexec-tools-VERSION
|
||||||
|
|
||||||
5) Configure the package, as follows:
|
5) Configure the package, as follows:
|
||||||
|
|
||||||
|
@ -109,7 +109,7 @@ There are two possible methods of using Kdump.
|
||||||
2) Or use the system kernel binary itself as dump-capture kernel and there is
|
2) Or use the system kernel binary itself as dump-capture kernel and there is
|
||||||
no need to build a separate dump-capture kernel. This is possible
|
no need to build a separate dump-capture kernel. This is possible
|
||||||
only with the architectures which support a relocatable kernel. As
|
only with the architectures which support a relocatable kernel. As
|
||||||
of today i386 and ia64 architectures support relocatable kernel.
|
of today, i386, x86_64 and ia64 architectures support relocatable kernel.
|
||||||
|
|
||||||
Building a relocatable kernel is advantageous from the point of view that
|
Building a relocatable kernel is advantageous from the point of view that
|
||||||
one does not have to build a second kernel for capturing the dump. But
|
one does not have to build a second kernel for capturing the dump. But
|
||||||
|
|
|
@ -87,7 +87,8 @@ parameter is applicable:
|
||||||
SH SuperH architecture is enabled.
|
SH SuperH architecture is enabled.
|
||||||
SMP The kernel is an SMP kernel.
|
SMP The kernel is an SMP kernel.
|
||||||
SPARC Sparc architecture is enabled.
|
SPARC Sparc architecture is enabled.
|
||||||
SWSUSP Software suspend is enabled.
|
SWSUSP Software suspend (hibernation) is enabled.
|
||||||
|
SUSPEND System suspend states are enabled.
|
||||||
TS Appropriate touchscreen support is enabled.
|
TS Appropriate touchscreen support is enabled.
|
||||||
USB USB support is enabled.
|
USB USB support is enabled.
|
||||||
USBHID USB Human Interface Device support is enabled.
|
USBHID USB Human Interface Device support is enabled.
|
||||||
|
@ -147,10 +148,16 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
default: 0
|
default: 0
|
||||||
|
|
||||||
acpi_sleep= [HW,ACPI] Sleep options
|
acpi_sleep= [HW,ACPI] Sleep options
|
||||||
Format: { s3_bios, s3_mode, s3_beep }
|
Format: { s3_bios, s3_mode, s3_beep, s4_nohwsig, old_ordering }
|
||||||
See Documentation/power/video.txt for s3_bios and s3_mode.
|
See Documentation/power/video.txt for s3_bios and s3_mode.
|
||||||
s3_beep is for debugging; it makes the PC's speaker beep
|
s3_beep is for debugging; it makes the PC's speaker beep
|
||||||
as soon as the kernel's real-mode entry point is called.
|
as soon as the kernel's real-mode entry point is called.
|
||||||
|
s4_nohwsig prevents ACPI hardware signature from being
|
||||||
|
used during resume from hibernation.
|
||||||
|
old_ordering causes the ACPI 1.0 ordering of the _PTS
|
||||||
|
control method, wrt putting devices into low power
|
||||||
|
states, to be enforced (the ACPI 2.0 ordering of _PTS is
|
||||||
|
used by default).
|
||||||
|
|
||||||
acpi_sci= [HW,ACPI] ACPI System Control Interrupt trigger mode
|
acpi_sci= [HW,ACPI] ACPI System Control Interrupt trigger mode
|
||||||
Format: { level | edge | high | low }
|
Format: { level | edge | high | low }
|
||||||
|
@ -271,6 +278,17 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
aic79xx= [HW,SCSI]
|
aic79xx= [HW,SCSI]
|
||||||
See Documentation/scsi/aic79xx.txt.
|
See Documentation/scsi/aic79xx.txt.
|
||||||
|
|
||||||
|
amd_iommu= [HW,X86-64]
|
||||||
|
Pass parameters to the AMD IOMMU driver in the system.
|
||||||
|
Possible values are:
|
||||||
|
isolate - enable device isolation (each device, as far
|
||||||
|
as possible, will get its own protection
|
||||||
|
domain)
|
||||||
|
amd_iommu_size= [HW,X86-64]
|
||||||
|
Define the size of the aperture for the AMD IOMMU
|
||||||
|
driver. Possible values are:
|
||||||
|
'32M', '64M' (default), '128M', '256M', '512M', '1G'
|
||||||
|
|
||||||
amijoy.map= [HW,JOY] Amiga joystick support
|
amijoy.map= [HW,JOY] Amiga joystick support
|
||||||
Map of devices attached to JOY0DAT and JOY1DAT
|
Map of devices attached to JOY0DAT and JOY1DAT
|
||||||
Format: <a>,<b>
|
Format: <a>,<b>
|
||||||
|
@ -295,7 +313,7 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
when initialising the APIC and IO-APIC components.
|
when initialising the APIC and IO-APIC components.
|
||||||
|
|
||||||
apm= [APM] Advanced Power Management
|
apm= [APM] Advanced Power Management
|
||||||
See header of arch/i386/kernel/apm.c.
|
See header of arch/x86/kernel/apm_32.c.
|
||||||
|
|
||||||
arcrimi= [HW,NET] ARCnet - "RIM I" (entirely mem-mapped) cards
|
arcrimi= [HW,NET] ARCnet - "RIM I" (entirely mem-mapped) cards
|
||||||
Format: <io>,<irq>,<nodeID>
|
Format: <io>,<irq>,<nodeID>
|
||||||
|
@ -560,6 +578,8 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
|
|
||||||
debug_objects [KNL] Enable object debugging
|
debug_objects [KNL] Enable object debugging
|
||||||
|
|
||||||
|
debugpat [X86] Enable PAT debugging
|
||||||
|
|
||||||
decnet.addr= [HW,NET]
|
decnet.addr= [HW,NET]
|
||||||
Format: <area>[,<node>]
|
Format: <area>[,<node>]
|
||||||
See also Documentation/networking/decnet.txt.
|
See also Documentation/networking/decnet.txt.
|
||||||
|
@ -599,6 +619,29 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
See drivers/char/README.epca and
|
See drivers/char/README.epca and
|
||||||
Documentation/digiepca.txt.
|
Documentation/digiepca.txt.
|
||||||
|
|
||||||
|
disable_mtrr_cleanup [X86]
|
||||||
|
enable_mtrr_cleanup [X86]
|
||||||
|
The kernel tries to adjust MTRR layout from continuous
|
||||||
|
to discrete, to make X server driver able to add WB
|
||||||
|
entry later. This parameter enables/disables that.
|
||||||
|
|
||||||
|
mtrr_chunk_size=nn[KMG] [X86]
|
||||||
|
Used for mtrr cleanup. It is the largest continuous chunk
|
||||||
|
that could hold holes aka. UC entries.
|
||||||
|
|
||||||
|
mtrr_gran_size=nn[KMG] [X86]
|
||||||
|
Used for mtrr cleanup. It is granularity of mtrr block.
|
||||||
|
Default is 1.
|
||||||
|
Large value could prevent small alignment from
|
||||||
|
using up MTRRs.
|
||||||
|
|
||||||
|
mtrr_spare_reg_nr=n [X86]
|
||||||
|
Format: <integer>
|
||||||
|
Range: 0,7 : spare reg number
|
||||||
|
Default : 1
|
||||||
|
Used for mtrr cleanup. It is spare mtrr entries number.
|
||||||
|
Set to 2 or more if your graphical card needs more.
|
||||||
|
|
||||||
disable_mtrr_trim [X86, Intel and AMD only]
|
disable_mtrr_trim [X86, Intel and AMD only]
|
||||||
By default the kernel will trim any uncacheable
|
By default the kernel will trim any uncacheable
|
||||||
memory out of your available memory pool based on
|
memory out of your available memory pool based on
|
||||||
|
@ -638,7 +681,7 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
|
|
||||||
elanfreq= [X86-32]
|
elanfreq= [X86-32]
|
||||||
See comment before function elanfreq_setup() in
|
See comment before function elanfreq_setup() in
|
||||||
arch/i386/kernel/cpu/cpufreq/elanfreq.c.
|
arch/x86/kernel/cpu/cpufreq/elanfreq.c.
|
||||||
|
|
||||||
elevator= [IOSCHED]
|
elevator= [IOSCHED]
|
||||||
Format: {"anticipatory" | "cfq" | "deadline" | "noop"}
|
Format: {"anticipatory" | "cfq" | "deadline" | "noop"}
|
||||||
|
@ -722,9 +765,6 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
hd= [EIDE] (E)IDE hard drive subsystem geometry
|
hd= [EIDE] (E)IDE hard drive subsystem geometry
|
||||||
Format: <cyl>,<head>,<sect>
|
Format: <cyl>,<head>,<sect>
|
||||||
|
|
||||||
hd?= [HW] (E)IDE subsystem
|
|
||||||
hd?lun= See Documentation/ide/ide.txt.
|
|
||||||
|
|
||||||
highmem=nn[KMG] [KNL,BOOT] forces the highmem zone to have an exact
|
highmem=nn[KMG] [KNL,BOOT] forces the highmem zone to have an exact
|
||||||
size of <nn>. This works even on boxes that have no
|
size of <nn>. This works even on boxes that have no
|
||||||
highmem otherwise. This also works to reduce highmem
|
highmem otherwise. This also works to reduce highmem
|
||||||
|
@ -737,8 +777,22 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
hisax= [HW,ISDN]
|
hisax= [HW,ISDN]
|
||||||
See Documentation/isdn/README.HiSax.
|
See Documentation/isdn/README.HiSax.
|
||||||
|
|
||||||
hugepages= [HW,X86-32,IA-64] Maximal number of HugeTLB pages.
|
hugepages= [HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
|
||||||
hugepagesz= [HW,IA-64,PPC] The size of the HugeTLB pages.
|
hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
|
||||||
|
On x86-64 and powerpc, this option can be specified
|
||||||
|
multiple times interleaved with hugepages= to reserve
|
||||||
|
huge pages of different sizes. Valid page sizes on
|
||||||
|
x86-64 are 2M (when the CPU supports "pse") and 1G
|
||||||
|
(when the CPU supports the "pdpe1gb" cpuinfo flag)
|
||||||
|
Note that 1GB pages can only be allocated at boot time
|
||||||
|
using hugepages= and not freed afterwards.
|
||||||
|
default_hugepagesz=
|
||||||
|
[same as hugepagesz=] The size of the default
|
||||||
|
HugeTLB page size. This is the size represented by
|
||||||
|
the legacy /proc/ hugepages APIs, used for SHM, and
|
||||||
|
default size when mounting hugetlbfs filesystems.
|
||||||
|
Defaults to the architecture's default huge page size
|
||||||
|
if not specified.
|
||||||
|
|
||||||
i8042.direct [HW] Put keyboard port into non-translated mode
|
i8042.direct [HW] Put keyboard port into non-translated mode
|
||||||
i8042.dumbkbd [HW] Pretend that controller can only read data from
|
i8042.dumbkbd [HW] Pretend that controller can only read data from
|
||||||
|
@ -785,7 +839,7 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
See Documentation/ide/ide.txt.
|
See Documentation/ide/ide.txt.
|
||||||
|
|
||||||
idle= [X86]
|
idle= [X86]
|
||||||
Format: idle=poll or idle=mwait
|
Format: idle=poll or idle=mwait, idle=halt, idle=nomwait
|
||||||
Poll forces a polling idle loop that can slightly improve the performance
|
Poll forces a polling idle loop that can slightly improve the performance
|
||||||
of waking up an idle CPU, but will use a lot of power and make the system
|
of waking up an idle CPU, but will use a lot of power and make the system
|
||||||
run hot. Not recommended.
|
run hot. Not recommended.
|
||||||
|
@ -793,6 +847,9 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
to not use it because it doesn't save as much power as a normal idle
|
to not use it because it doesn't save as much power as a normal idle
|
||||||
loop use the MONITOR/MWAIT idle loop anyways. Performance should be the same
|
loop use the MONITOR/MWAIT idle loop anyways. Performance should be the same
|
||||||
as idle=poll.
|
as idle=poll.
|
||||||
|
idle=halt. Halt is forced to be used for CPU idle.
|
||||||
|
In such case C2/C3 won't be used again.
|
||||||
|
idle=nomwait. Disable mwait for CPU C-states
|
||||||
|
|
||||||
ide-pci-generic.all-generic-ide [HW] (E)IDE subsystem
|
ide-pci-generic.all-generic-ide [HW] (E)IDE subsystem
|
||||||
Claim all unknown PCI IDE storage controllers.
|
Claim all unknown PCI IDE storage controllers.
|
||||||
|
@ -1166,7 +1223,7 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
or
|
or
|
||||||
memmap=0x10000$0x18690000
|
memmap=0x10000$0x18690000
|
||||||
|
|
||||||
memtest= [KNL,X86_64] Enable memtest
|
memtest= [KNL,X86] Enable memtest
|
||||||
Format: <integer>
|
Format: <integer>
|
||||||
range: 0,4 : pattern number
|
range: 0,4 : pattern number
|
||||||
default : 0 <disable>
|
default : 0 <disable>
|
||||||
|
@ -1185,6 +1242,14 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
|
|
||||||
mga= [HW,DRM]
|
mga= [HW,DRM]
|
||||||
|
|
||||||
|
mminit_loglevel=
|
||||||
|
[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
|
||||||
|
parameter allows control of the logging verbosity for
|
||||||
|
the additional memory initialisation checks. A value
|
||||||
|
of 0 disables mminit logging and a level of 4 will
|
||||||
|
log everything. Information is printed at KERN_DEBUG
|
||||||
|
so loglevel=8 may also need to be specified.
|
||||||
|
|
||||||
mousedev.tap_time=
|
mousedev.tap_time=
|
||||||
[MOUSE] Maximum time between finger touching and
|
[MOUSE] Maximum time between finger touching and
|
||||||
leaving touchpad surface for touch to be considered
|
leaving touchpad surface for touch to be considered
|
||||||
|
@ -1208,6 +1273,11 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
mtdparts= [MTD]
|
mtdparts= [MTD]
|
||||||
See drivers/mtd/cmdlinepart.c.
|
See drivers/mtd/cmdlinepart.c.
|
||||||
|
|
||||||
|
mtdset= [ARM]
|
||||||
|
ARM/S3C2412 JIVE boot control
|
||||||
|
|
||||||
|
See arch/arm/mach-s3c2412/mach-jive.c
|
||||||
|
|
||||||
mtouchusb.raw_coordinates=
|
mtouchusb.raw_coordinates=
|
||||||
[HW] Make the MicroTouch USB driver use raw coordinates
|
[HW] Make the MicroTouch USB driver use raw coordinates
|
||||||
('y', default) or cooked coordinates ('n')
|
('y', default) or cooked coordinates ('n')
|
||||||
|
@ -1234,6 +1304,13 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
This usage is only documented in each driver source
|
This usage is only documented in each driver source
|
||||||
file if at all.
|
file if at all.
|
||||||
|
|
||||||
|
nf_conntrack.acct=
|
||||||
|
[NETFILTER] Enable connection tracking flow accounting
|
||||||
|
0 to disable accounting
|
||||||
|
1 to enable accounting
|
||||||
|
Default value depends on CONFIG_NF_CT_ACCT that is
|
||||||
|
going to be removed in 2.6.29.
|
||||||
|
|
||||||
nfsaddrs= [NFS]
|
nfsaddrs= [NFS]
|
||||||
See Documentation/filesystems/nfsroot.txt.
|
See Documentation/filesystems/nfsroot.txt.
|
||||||
|
|
||||||
|
@ -1496,6 +1573,9 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
Use with caution as certain devices share
|
Use with caution as certain devices share
|
||||||
address decoders between ROMs and other
|
address decoders between ROMs and other
|
||||||
resources.
|
resources.
|
||||||
|
norom [X86-32,X86_64] Do not assign address space to
|
||||||
|
expansion ROMs that do not already have
|
||||||
|
BIOS assigned address ranges.
|
||||||
irqmask=0xMMMM [X86-32] Set a bit mask of IRQs allowed to be
|
irqmask=0xMMMM [X86-32] Set a bit mask of IRQs allowed to be
|
||||||
assigned automatically to PCI devices. You can
|
assigned automatically to PCI devices. You can
|
||||||
make the kernel exclude IRQs of your ISA cards
|
make the kernel exclude IRQs of your ISA cards
|
||||||
|
@ -1571,6 +1651,10 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
Format: { parport<nr> | timid | 0 }
|
Format: { parport<nr> | timid | 0 }
|
||||||
See also Documentation/parport.txt.
|
See also Documentation/parport.txt.
|
||||||
|
|
||||||
|
pmtmr= [X86] Manual setup of pmtmr I/O Port.
|
||||||
|
Override pmtimer IOPort with a hex value.
|
||||||
|
e.g. pmtmr=0x508
|
||||||
|
|
||||||
pnpacpi= [ACPI]
|
pnpacpi= [ACPI]
|
||||||
{ off }
|
{ off }
|
||||||
|
|
||||||
|
@ -1679,6 +1763,10 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
Format: <reboot_mode>[,<reboot_mode2>[,...]]
|
Format: <reboot_mode>[,<reboot_mode2>[,...]]
|
||||||
See arch/*/kernel/reboot.c or arch/*/kernel/process.c
|
See arch/*/kernel/reboot.c or arch/*/kernel/process.c
|
||||||
|
|
||||||
|
relax_domain_level=
|
||||||
|
[KNL, SMP] Set scheduler's default relax_domain_level.
|
||||||
|
See Documentation/cpusets.txt.
|
||||||
|
|
||||||
reserve= [KNL,BUGS] Force the kernel to ignore some iomem area
|
reserve= [KNL,BUGS] Force the kernel to ignore some iomem area
|
||||||
|
|
||||||
reservetop= [X86-32]
|
reservetop= [X86-32]
|
||||||
|
@ -1971,6 +2059,9 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
|
|
||||||
snd-ymfpci= [HW,ALSA]
|
snd-ymfpci= [HW,ALSA]
|
||||||
|
|
||||||
|
softlockup_panic=
|
||||||
|
[KNL] Should the soft-lockup detector generate panics.
|
||||||
|
|
||||||
sonypi.*= [HW] Sony Programmable I/O Control Device driver
|
sonypi.*= [HW] Sony Programmable I/O Control Device driver
|
||||||
See Documentation/sonypi.txt
|
See Documentation/sonypi.txt
|
||||||
|
|
||||||
|
@ -2035,6 +2126,12 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
|
|
||||||
tdfx= [HW,DRM]
|
tdfx= [HW,DRM]
|
||||||
|
|
||||||
|
test_suspend= [SUSPEND]
|
||||||
|
Specify "mem" (for Suspend-to-RAM) or "standby" (for
|
||||||
|
standby suspend) as the system sleep state to briefly
|
||||||
|
enter during system startup. The system is woken from
|
||||||
|
this state using a wakeup-capable RTC alarm.
|
||||||
|
|
||||||
thash_entries= [KNL,NET]
|
thash_entries= [KNL,NET]
|
||||||
Set number of hash buckets for TCP connection
|
Set number of hash buckets for TCP connection
|
||||||
|
|
||||||
|
@ -2062,13 +2159,6 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
<deci-seconds>: poll all this frequency
|
<deci-seconds>: poll all this frequency
|
||||||
0: no polling (default)
|
0: no polling (default)
|
||||||
|
|
||||||
tipar.timeout= [HW,PPT]
|
|
||||||
Set communications timeout in tenths of a second
|
|
||||||
(default 15).
|
|
||||||
|
|
||||||
tipar.delay= [HW,PPT]
|
|
||||||
Set inter-bit delay in microseconds (default 10).
|
|
||||||
|
|
||||||
tmscsim= [HW,SCSI]
|
tmscsim= [HW,SCSI]
|
||||||
See comment before function dc390_setup() in
|
See comment before function dc390_setup() in
|
||||||
drivers/scsi/tmscsim.c.
|
drivers/scsi/tmscsim.c.
|
||||||
|
@ -2102,6 +2192,10 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
Note that genuine overcurrent events won't be
|
Note that genuine overcurrent events won't be
|
||||||
reported either.
|
reported either.
|
||||||
|
|
||||||
|
unknown_nmi_panic
|
||||||
|
[X86-32,X86-64]
|
||||||
|
Set unknown_nmi_panic=1 early on boot.
|
||||||
|
|
||||||
usbcore.autosuspend=
|
usbcore.autosuspend=
|
||||||
[USB] The autosuspend time delay (in seconds) used
|
[USB] The autosuspend time delay (in seconds) used
|
||||||
for newly-detected USB devices (default 2). This
|
for newly-detected USB devices (default 2). This
|
||||||
|
@ -2112,6 +2206,9 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||||
usbhid.mousepoll=
|
usbhid.mousepoll=
|
||||||
[USBHID] The interval which mice are to be polled at.
|
[USBHID] The interval which mice are to be polled at.
|
||||||
|
|
||||||
|
add_efi_memmap [EFI; x86-32,X86-64] Include EFI memory map in
|
||||||
|
kernel's map of available physical RAM.
|
||||||
|
|
||||||
vdso= [X86-32,SH,x86-64]
|
vdso= [X86-32,SH,x86-64]
|
||||||
vdso=2: enable compat VDSO (default with COMPAT_VDSO)
|
vdso=2: enable compat VDSO (default with COMPAT_VDSO)
|
||||||
vdso=1: enable VDSO (default)
|
vdso=1: enable VDSO (default)
|
||||||
|
|
|
@ -864,7 +864,7 @@ payload contents" for more information.
|
||||||
request_key_with_auxdata() respectively.
|
request_key_with_auxdata() respectively.
|
||||||
|
|
||||||
These two functions return with the key potentially still under
|
These two functions return with the key potentially still under
|
||||||
construction. To wait for contruction completion, the following should be
|
construction. To wait for construction completion, the following should be
|
||||||
called:
|
called:
|
||||||
|
|
||||||
int wait_for_key_construction(struct key *key, bool intr);
|
int wait_for_key_construction(struct key *key, bool intr);
|
||||||
|
|
|
@ -172,6 +172,7 @@ architectures:
|
||||||
- ia64 (Does not support probes on instruction slot1.)
|
- ia64 (Does not support probes on instruction slot1.)
|
||||||
- sparc64 (Return probes not yet implemented.)
|
- sparc64 (Return probes not yet implemented.)
|
||||||
- arm
|
- arm
|
||||||
|
- ppc
|
||||||
|
|
||||||
3. Configuring Kprobes
|
3. Configuring Kprobes
|
||||||
|
|
||||||
|
|
|
@ -174,8 +174,6 @@ The LED is exposed through the LED subsystem, and can be found in:
|
||||||
The mail LED is autodetected, so if you don't have one, the LED device won't
|
The mail LED is autodetected, so if you don't have one, the LED device won't
|
||||||
be registered.
|
be registered.
|
||||||
|
|
||||||
If you have a mail LED that is not green, please report this to me.
|
|
||||||
|
|
||||||
Backlight
|
Backlight
|
||||||
*********
|
*********
|
||||||
|
|
||||||
|
|
|
@ -1,7 +1,7 @@
|
||||||
ThinkPad ACPI Extras Driver
|
ThinkPad ACPI Extras Driver
|
||||||
|
|
||||||
Version 0.20
|
Version 0.21
|
||||||
April 09th, 2008
|
May 29th, 2008
|
||||||
|
|
||||||
Borislav Deianov <borislav@users.sf.net>
|
Borislav Deianov <borislav@users.sf.net>
|
||||||
Henrique de Moraes Holschuh <hmh@hmh.eng.br>
|
Henrique de Moraes Holschuh <hmh@hmh.eng.br>
|
||||||
|
@ -621,7 +621,8 @@ Bluetooth
|
||||||
---------
|
---------
|
||||||
|
|
||||||
procfs: /proc/acpi/ibm/bluetooth
|
procfs: /proc/acpi/ibm/bluetooth
|
||||||
sysfs device attribute: bluetooth_enable
|
sysfs device attribute: bluetooth_enable (deprecated)
|
||||||
|
sysfs rfkill class: switch "tpacpi_bluetooth_sw"
|
||||||
|
|
||||||
This feature shows the presence and current state of a ThinkPad
|
This feature shows the presence and current state of a ThinkPad
|
||||||
Bluetooth device in the internal ThinkPad CDC slot.
|
Bluetooth device in the internal ThinkPad CDC slot.
|
||||||
|
@ -643,8 +644,12 @@ Sysfs notes:
|
||||||
0: disables Bluetooth / Bluetooth is disabled
|
0: disables Bluetooth / Bluetooth is disabled
|
||||||
1: enables Bluetooth / Bluetooth is enabled.
|
1: enables Bluetooth / Bluetooth is enabled.
|
||||||
|
|
||||||
Note: this interface will be probably be superseded by the
|
Note: this interface has been superseded by the generic rfkill
|
||||||
generic rfkill class, so it is NOT to be considered stable yet.
|
class. It has been deprecated, and it will be removed in year
|
||||||
|
2010.
|
||||||
|
|
||||||
|
rfkill controller switch "tpacpi_bluetooth_sw": refer to
|
||||||
|
Documentation/rfkill.txt for details.
|
||||||
|
|
||||||
Video output control -- /proc/acpi/ibm/video
|
Video output control -- /proc/acpi/ibm/video
|
||||||
--------------------------------------------
|
--------------------------------------------
|
||||||
|
@ -1374,7 +1379,8 @@ EXPERIMENTAL: WAN
|
||||||
-----------------
|
-----------------
|
||||||
|
|
||||||
procfs: /proc/acpi/ibm/wan
|
procfs: /proc/acpi/ibm/wan
|
||||||
sysfs device attribute: wwan_enable
|
sysfs device attribute: wwan_enable (deprecated)
|
||||||
|
sysfs rfkill class: switch "tpacpi_wwan_sw"
|
||||||
|
|
||||||
This feature is marked EXPERIMENTAL because the implementation
|
This feature is marked EXPERIMENTAL because the implementation
|
||||||
directly accesses hardware registers and may not work as expected. USE
|
directly accesses hardware registers and may not work as expected. USE
|
||||||
|
@ -1404,8 +1410,12 @@ Sysfs notes:
|
||||||
0: disables WWAN card / WWAN card is disabled
|
0: disables WWAN card / WWAN card is disabled
|
||||||
1: enables WWAN card / WWAN card is enabled.
|
1: enables WWAN card / WWAN card is enabled.
|
||||||
|
|
||||||
Note: this interface will be probably be superseded by the
|
Note: this interface has been superseded by the generic rfkill
|
||||||
generic rfkill class, so it is NOT to be considered stable yet.
|
class. It has been deprecated, and it will be removed in year
|
||||||
|
2010.
|
||||||
|
|
||||||
|
rfkill controller switch "tpacpi_wwan_sw": refer to
|
||||||
|
Documentation/rfkill.txt for details.
|
||||||
|
|
||||||
Multiple Commands, Module Parameters
|
Multiple Commands, Module Parameters
|
||||||
------------------------------------
|
------------------------------------
|
||||||
|
|
|
@ -59,7 +59,7 @@ Hardware accelerated blink of LEDs
|
||||||
|
|
||||||
Some LEDs can be programmed to blink without any CPU interaction. To
|
Some LEDs can be programmed to blink without any CPU interaction. To
|
||||||
support this feature, a LED driver can optionally implement the
|
support this feature, a LED driver can optionally implement the
|
||||||
blink_set() function (see <linux/leds.h>). If implemeted, triggers can
|
blink_set() function (see <linux/leds.h>). If implemented, triggers can
|
||||||
attempt to use it before falling back to software timers. The blink_set()
|
attempt to use it before falling back to software timers. The blink_set()
|
||||||
function should return 0 if the blink setting is supported, or -EINVAL
|
function should return 0 if the blink setting is supported, or -EINVAL
|
||||||
otherwise, which means that LED blinking will be handled by software.
|
otherwise, which means that LED blinking will be handled by software.
|
||||||
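The return convention described in this hunk (0 when the hardware can blink at the requested rate, -EINVAL when the caller must fall back to a software timer) can be illustrated with a small userspace mock. This is only a sketch under stated assumptions: `struct mock_led`, `mock_blink_set` and the supported delay range are hypothetical, and the real callback is the one declared in <linux/leds.h>, which this code does not reproduce.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical LED with a hardware blinker limited to a delay range. */
struct mock_led {
	unsigned long min_ms;		/* shortest delay the hardware supports */
	unsigned long max_ms;		/* longest delay the hardware supports */
	unsigned long hw_on_ms;		/* currently programmed on-time */
	unsigned long hw_off_ms;	/* currently programmed off-time */
};

/*
 * Program the (mock) hardware blinker.  Returns 0 if both delays are
 * supported, or -EINVAL so the caller handles blinking in software.
 */
int mock_blink_set(struct mock_led *led, unsigned long delay_on,
		   unsigned long delay_off)
{
	assert(led != NULL);

	if (delay_on < led->min_ms || delay_on > led->max_ms ||
	    delay_off < led->min_ms || delay_off > led->max_ms)
		return -EINVAL;

	led->hw_on_ms = delay_on;
	led->hw_off_ms = delay_off;
	return 0;
}
```

A trigger written against this mock would call `mock_blink_set()` first and start its own timer only when the call returns -EINVAL, leaving the previously programmed hardware state untouched.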
|
|
|
@ -36,11 +36,13 @@
|
||||||
#include <sched.h>
|
#include <sched.h>
|
||||||
#include <limits.h>
|
#include <limits.h>
|
||||||
#include <stddef.h>
|
#include <stddef.h>
|
||||||
|
#include <signal.h>
|
||||||
#include "linux/lguest_launcher.h"
|
#include "linux/lguest_launcher.h"
|
||||||
#include "linux/virtio_config.h"
|
#include "linux/virtio_config.h"
|
||||||
#include "linux/virtio_net.h"
|
#include "linux/virtio_net.h"
|
||||||
#include "linux/virtio_blk.h"
|
#include "linux/virtio_blk.h"
|
||||||
#include "linux/virtio_console.h"
|
#include "linux/virtio_console.h"
|
||||||
|
#include "linux/virtio_rng.h"
|
||||||
#include "linux/virtio_ring.h"
|
#include "linux/virtio_ring.h"
|
||||||
#include "asm-x86/bootparam.h"
|
#include "asm-x86/bootparam.h"
|
||||||
/*L:110 We can ignore the 39 include files we need for this program, but I do
|
/*L:110 We can ignore the 39 include files we need for this program, but I do
|
||||||
|
@ -64,8 +66,8 @@ typedef uint8_t u8;
|
||||||
#endif
|
#endif
|
||||||
/* We can have up to 256 pages for devices. */
|
/* We can have up to 256 pages for devices. */
|
||||||
#define DEVICE_PAGES 256
|
#define DEVICE_PAGES 256
|
||||||
/* This will occupy 2 pages: it must be a power of 2. */
|
/* This will occupy 3 pages: it must be a power of 2. */
|
||||||
#define VIRTQUEUE_NUM 128
|
#define VIRTQUEUE_NUM 256
|
||||||
|
|
||||||
/*L:120 verbose is both a global flag and a macro. The C preprocessor allows
|
/*L:120 verbose is both a global flag and a macro. The C preprocessor allows
|
||||||
* this, and although I wouldn't recommend it, it works quite nicely here. */
|
* this, and although I wouldn't recommend it, it works quite nicely here. */
|
||||||
|
@ -74,12 +76,19 @@ static bool verbose;
|
||||||
do { if (verbose) printf(args); } while(0)
|
do { if (verbose) printf(args); } while(0)
|
||||||
/*:*/
|
/*:*/
|
||||||
|
|
||||||
/* The pipe to send commands to the waker process */
|
/* File descriptors for the Waker. */
|
||||||
static int waker_fd;
|
struct {
|
||||||
|
int pipe[2];
|
||||||
|
int lguest_fd;
|
||||||
|
} waker_fds;
|
||||||
|
|
||||||
/* The pointer to the start of guest memory. */
|
/* The pointer to the start of guest memory. */
|
||||||
static void *guest_base;
|
static void *guest_base;
|
||||||
/* The maximum guest physical address allowed, and maximum possible. */
|
/* The maximum guest physical address allowed, and maximum possible. */
|
||||||
static unsigned long guest_limit, guest_max;
|
static unsigned long guest_limit, guest_max;
|
||||||
|
/* The pipe for signal handler to write to. */
|
||||||
|
static int timeoutpipe[2];
|
||||||
|
static unsigned int timeout_usec = 500;
|
||||||
|
|
||||||
/* a per-cpu variable indicating whose vcpu is currently running */
|
/* a per-cpu variable indicating whose vcpu is currently running */
|
||||||
static unsigned int __thread cpu_id;
|
static unsigned int __thread cpu_id;
|
||||||
|
@ -155,11 +164,14 @@ struct virtqueue
|
||||||
/* Last available index we saw. */
|
/* Last available index we saw. */
|
||||||
u16 last_avail_idx;
|
u16 last_avail_idx;
|
||||||
|
|
||||||
/* The routine to call when the Guest pings us. */
|
/* The routine to call when the Guest pings us, or timeout. */
|
||||||
void (*handle_output)(int fd, struct virtqueue *me);
|
void (*handle_output)(int fd, struct virtqueue *me, bool timeout);
|
||||||
|
|
||||||
/* Outstanding buffers */
|
/* Outstanding buffers */
|
||||||
unsigned int inflight;
|
unsigned int inflight;
|
||||||
|
|
||||||
|
/* Is this blocked awaiting a timer? */
|
||||||
|
bool blocked;
|
||||||
};
|
};
|
||||||
|
|
||||||
/* Remember the arguments to the program so we can "reboot" */
|
/* Remember the arguments to the program so we can "reboot" */
|
||||||
|
@ -190,6 +202,9 @@ static void *_convert(struct iovec *iov, size_t size, size_t align,
|
||||||
return iov->iov_base;
|
return iov->iov_base;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* Wrapper for the last available index. Makes it easier to change. */
|
||||||
|
#define lg_last_avail(vq) ((vq)->last_avail_idx)
|
||||||
|
|
||||||
/* The virtio configuration space is defined to be little-endian. x86 is
|
/* The virtio configuration space is defined to be little-endian. x86 is
|
||||||
* little-endian too, but it's nice to be explicit so we have these helpers. */
|
* little-endian too, but it's nice to be explicit so we have these helpers. */
|
||||||
#define cpu_to_le16(v16) (v16)
|
#define cpu_to_le16(v16) (v16)
|
||||||
|
@ -199,6 +214,33 @@ static void *_convert(struct iovec *iov, size_t size, size_t align,
|
||||||
#define le32_to_cpu(v32) (v32)
|
#define le32_to_cpu(v32) (v32)
|
||||||
#define le64_to_cpu(v64) (v64)
|
#define le64_to_cpu(v64) (v64)
|
||||||
|
|
||||||
|
/* Is this iovec empty? */
|
||||||
|
static bool iov_empty(const struct iovec iov[], unsigned int num_iov)
|
||||||
|
{
|
||||||
|
unsigned int i;
|
||||||
|
|
||||||
|
for (i = 0; i < num_iov; i++)
|
||||||
|
if (iov[i].iov_len)
|
||||||
|
return false;
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Take len bytes from the front of this iovec. */
|
||||||
|
static void iov_consume(struct iovec iov[], unsigned num_iov, unsigned len)
|
||||||
|
{
|
||||||
|
unsigned int i;
|
||||||
|
|
||||||
|
for (i = 0; i < num_iov; i++) {
|
||||||
|
unsigned int used;
|
||||||
|
|
||||||
|
used = iov[i].iov_len < len ? iov[i].iov_len : len;
|
||||||
|
iov[i].iov_base += used;
|
||||||
|
iov[i].iov_len -= used;
|
||||||
|
len -= used;
|
||||||
|
}
|
||||||
|
assert(len == 0);
|
||||||
|
}
|
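The two iovec helpers added above are easiest to follow with concrete numbers. The following is a minimal sketch, not the Launcher's code: `iov_consume_sketch` repeats the logic of `iov_consume()` with a `char *` cast (so the pointer arithmetic is standard C), and `iov_remaining` is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Same logic as iov_consume() in the diff, but with a char * cast so the
 * pointer arithmetic does not rely on the GNU void * extension. */
static void iov_consume_sketch(struct iovec iov[], unsigned num_iov, unsigned len)
{
	unsigned int i;

	for (i = 0; i < num_iov; i++) {
		unsigned int used;

		/* Take as much as this element holds, up to what is still wanted. */
		used = iov[i].iov_len < len ? iov[i].iov_len : len;
		iov[i].iov_base = (char *)iov[i].iov_base + used;
		iov[i].iov_len -= used;
		len -= used;
	}
	/* Asking for more bytes than the iovec holds is a caller bug. */
	assert(len == 0);
}

/* Hypothetical helper (not in the diff): bytes left across all elements. */
static size_t iov_remaining(const struct iovec iov[], unsigned num_iov)
{
	size_t total = 0;
	unsigned int i;

	for (i = 0; i < num_iov; i++)
		total += iov[i].iov_len;
	return total;
}
```

Consuming 6 bytes from a 4-byte element followed by an 8-byte element empties the first and advances the second by 2, which is the state `iov_empty()` would then report as non-empty.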
||||||
|
|
||||||
/* The device virtqueue descriptors are followed by feature bitmasks. */
|
/* The device virtqueue descriptors are followed by feature bitmasks. */
|
||||||
static u8 *get_feature_bits(struct device *dev)
|
static u8 *get_feature_bits(struct device *dev)
|
||||||
{
|
{
|
||||||
|
@ -254,6 +296,7 @@ static void *map_zeroed_pages(unsigned int num)
|
||||||
PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, fd, 0);
|
PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, fd, 0);
|
||||||
if (addr == MAP_FAILED)
|
if (addr == MAP_FAILED)
|
||||||
err(1, "Mmaping %u pages of /dev/zero", num);
|
err(1, "Mmaping %u pages of /dev/zero", num);
|
||||||
|
close(fd);
|
||||||
|
|
||||||
return addr;
|
return addr;
|
||||||
}
|
}
|
||||||
|
@ -540,69 +583,64 @@ static void add_device_fd(int fd)
|
||||||
* watch, but handing a file descriptor mask through to the kernel is fairly
|
* watch, but handing a file descriptor mask through to the kernel is fairly
|
||||||
* icky.
|
* icky.
|
||||||
*
|
*
|
||||||
* Instead, we fork off a process which watches the file descriptors and writes
|
* Instead, we clone off a thread which watches the file descriptors and writes
|
||||||
* the LHREQ_BREAK command to the /dev/lguest file descriptor to tell the Host
|
* the LHREQ_BREAK command to the /dev/lguest file descriptor to tell the Host
|
||||||
* stop running the Guest. This causes the Launcher to return from the
|
* stop running the Guest. This causes the Launcher to return from the
|
||||||
* /dev/lguest read with -EAGAIN, where it will write to /dev/lguest to reset
|
* /dev/lguest read with -EAGAIN, where it will write to /dev/lguest to reset
|
||||||
* the LHREQ_BREAK and wake us up again.
|
* the LHREQ_BREAK and wake us up again.
|
||||||
*
|
*
|
||||||
* This, of course, is merely a different *kind* of icky.
|
* This, of course, is merely a different *kind* of icky.
|
||||||
|
*
|
||||||
|
* Given my well-known antipathy to threads, I'd prefer to use processes. But
|
||||||
|
* it's easier to share Guest memory with threads, and trivial to share the
|
||||||
|
* devices.infds as the Launcher changes it.
|
||||||
*/
|
*/
|
||||||
static void wake_parent(int pipefd, int lguest_fd)
|
static int waker(void *unused)
|
||||||
{
|
{
|
||||||
/* Add the pipe from the Launcher to the fdset in the device_list, so
|
/* Close the write end of the pipe: only the Launcher has it open. */
|
||||||
* we watch it, too. */
|
close(waker_fds.pipe[1]);
|
||||||
add_device_fd(pipefd);
|
|
||||||
|
|
||||||
for (;;) {
|
for (;;) {
|
||||||
fd_set rfds = devices.infds;
|
fd_set rfds = devices.infds;
|
||||||
unsigned long args[] = { LHREQ_BREAK, 1 };
|
unsigned long args[] = { LHREQ_BREAK, 1 };
|
||||||
|
unsigned int maxfd = devices.max_infd;
|
||||||
|
|
||||||
|
/* We also listen to the pipe from the Launcher. */
|
||||||
|
FD_SET(waker_fds.pipe[0], &rfds);
|
||||||
|
if (waker_fds.pipe[0] > maxfd)
|
||||||
|
maxfd = waker_fds.pipe[0];
|
||||||
|
|
||||||
/* Wait until input is ready from one of the devices. */
|
/* Wait until input is ready from one of the devices. */
|
||||||
select(devices.max_infd+1, &rfds, NULL, NULL, NULL);
|
select(maxfd+1, &rfds, NULL, NULL, NULL);
|
||||||
/* Is it a message from the Launcher? */
|
|
||||||
if (FD_ISSET(pipefd, &rfds)) {
|
/* Message from Launcher? */
|
||||||
int fd;
|
if (FD_ISSET(waker_fds.pipe[0], &rfds)) {
|
||||||
/* If read() returns 0, it means the Launcher has
|
char c;
|
||||||
* exited. We silently follow. */
|
/* If this fails, then assume Launcher has exited.
|
||||||
if (read(pipefd, &fd, sizeof(fd)) == 0)
|
* Don't do anything on exit: we're just a thread! */
|
||||||
exit(0);
|
if (read(waker_fds.pipe[0], &c, 1) != 1)
|
||||||
/* Otherwise it's telling us to change what file
|
_exit(0);
|
||||||
* descriptors we're to listen to. Positive means
|
continue;
|
||||||
* listen to a new one, negative means stop
|
}
|
||||||
* listening. */
|
|
||||||
if (fd >= 0)
|
/* Send LHREQ_BREAK command to snap the Launcher out of it. */
|
||||||
FD_SET(fd, &devices.infds);
|
pwrite(waker_fds.lguest_fd, args, sizeof(args), cpu_id);
|
||||||
else
|
|
||||||
FD_CLR(-fd - 1, &devices.infds);
|
|
||||||
} else /* Send LHREQ_BREAK command. */
|
|
||||||
pwrite(lguest_fd, args, sizeof(args), cpu_id);
|
|
||||||
}
|
}
|
||||||
|
return 0;
|
||||||
}
|
}
|
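The new `waker()` loop watches a control pipe alongside the device descriptors: a byte on the pipe means "go round the loop again", and end-of-file means the Launcher has gone. The sketch below isolates that pattern under simplifying assumptions: it watches a single device fd rather than `devices.infds`, and `wait_for_wakeup` and the `wake_reason` values are hypothetical names, not part of the Launcher.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical return codes, for this sketch only. */
enum wake_reason {
	WAKE_ERROR = -1,
	WAKE_CONTROL = 0,	/* a byte arrived on the control pipe */
	WAKE_DEVICE = 1,	/* the device fd is readable */
	WAKE_CLOSED = 2		/* the control pipe's writer closed its end */
};

/* Block until the control pipe or the device fd becomes readable. */
static enum wake_reason wait_for_wakeup(int control_fd, int device_fd)
{
	fd_set rfds;
	int maxfd = control_fd > device_fd ? control_fd : device_fd;
	char c;

	FD_ZERO(&rfds);
	FD_SET(control_fd, &rfds);
	FD_SET(device_fd, &rfds);

	if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
		return WAKE_ERROR;

	/* The control pipe is checked first, as in waker(). */
	if (FD_ISSET(control_fd, &rfds)) {
		/* read() returns 0 at end-of-file: the writer has gone away. */
		if (read(control_fd, &c, 1) != 1)
			return WAKE_CLOSED;
		return WAKE_CONTROL;
	}
	return WAKE_DEVICE;
}
```

In the diff, the `WAKE_DEVICE` case corresponds to the `pwrite()` of `LHREQ_BREAK`, and the `WAKE_CLOSED` case to `_exit(0)`.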
||||||
|
|
||||||
/* This routine just sets up a pipe to the Waker process. */
|
/* This routine just sets up a pipe to the Waker process. */
|
||||||
static int setup_waker(int lguest_fd)
|
static void setup_waker(int lguest_fd)
|
||||||
{
|
{
|
||||||
int pipefd[2], child;
|
/* This pipe is closed when Launcher dies, telling Waker. */
|
||||||
|
if (pipe(waker_fds.pipe) != 0)
|
||||||
|
err(1, "Creating pipe for Waker");
|
||||||
|
|
||||||
/* We create a pipe to talk to the Waker, and also so it knows when the
|
/* Waker also needs to know the lguest fd */
|
||||||
* Launcher dies (and closes pipe). */
|
waker_fds.lguest_fd = lguest_fd;
|
||||||
pipe(pipefd);
|
|
||||||
child = fork();
|
|
||||||
if (child == -1)
|
|
||||||
err(1, "forking");
|
|
||||||
|
|
||||||
if (child == 0) {
|
if (clone(waker, malloc(4096) + 4096, CLONE_VM | SIGCHLD, NULL) == -1)
|
||||||
/* We are the Waker: close the "writing" end of our copy of the
|
err(1, "Creating Waker");
|
||||||
* pipe and start waiting for input. */
|
|
||||||
close(pipefd[1]);
|
|
||||||
wake_parent(pipefd[0], lguest_fd);
|
|
||||||
}
|
|
||||||
/* Close the reading end of our copy of the pipe. */
|
|
||||||
close(pipefd[0]);
|
|
||||||
|
|
||||||
/* Here is the fd used to talk to the waker. */
|
|
||||||
return pipefd[1];
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
|
@ -661,19 +699,22 @@ static unsigned get_vq_desc(struct virtqueue *vq,
|
||||||
unsigned int *out_num, unsigned int *in_num)
|
unsigned int *out_num, unsigned int *in_num)
|
||||||
{
|
{
|
||||||
unsigned int i, head;
|
unsigned int i, head;
|
||||||
|
u16 last_avail;
|
||||||
|
|
||||||
/* Check it isn't doing very strange things with descriptor numbers. */
|
/* Check it isn't doing very strange things with descriptor numbers. */
|
||||||
if ((u16)(vq->vring.avail->idx - vq->last_avail_idx) > vq->vring.num)
|
last_avail = lg_last_avail(vq);
|
||||||
|
if ((u16)(vq->vring.avail->idx - last_avail) > vq->vring.num)
|
||||||
errx(1, "Guest moved used index from %u to %u",
|
errx(1, "Guest moved used index from %u to %u",
|
||||||
vq->last_avail_idx, vq->vring.avail->idx);
|
last_avail, vq->vring.avail->idx);
|
||||||
|
|
||||||
/* If there's nothing new since last we looked, return invalid. */
|
/* If there's nothing new since last we looked, return invalid. */
|
||||||
if (vq->vring.avail->idx == vq->last_avail_idx)
|
if (vq->vring.avail->idx == last_avail)
|
||||||
return vq->vring.num;
|
return vq->vring.num;
|
||||||
|
|
||||||
/* Grab the next descriptor number they're advertising, and increment
|
/* Grab the next descriptor number they're advertising, and increment
|
||||||
* the index we've seen. */
|
* the index we've seen. */
|
||||||
head = vq->vring.avail->ring[vq->last_avail_idx++ % vq->vring.num];
|
head = vq->vring.avail->ring[last_avail % vq->vring.num];
|
||||||
|
lg_last_avail(vq)++;
|
||||||
|
|
||||||
/* If their number is silly, that's a fatal mistake. */
|
/* If their number is silly, that's a fatal mistake. */
|
||||||
if (head >= vq->vring.num)
|
if (head >= vq->vring.num)
|
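The sanity check in this hunk depends on 16-bit modular arithmetic: both indices are free-running counters, so their difference is only meaningful after the cast back to `u16`. A minimal sketch of that arithmetic follows, using `uint16_t` in place of the Launcher's `u16`; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Entries the Guest has published that have not been consumed yet.
 * The subtraction is promoted to int, so the cast back to 16 bits is what
 * makes the result correct across counter wraparound. */
static uint16_t avail_pending(uint16_t avail_idx, uint16_t last_avail)
{
	return (uint16_t)(avail_idx - last_avail);
}

/* True if the Guest claims more pending entries than the ring can hold. */
static bool avail_idx_is_bogus(uint16_t avail_idx, uint16_t last_avail,
			       unsigned int ring_num)
{
	return avail_pending(avail_idx, last_avail) > ring_num;
}
```

For example, with `last_avail` at 65534 and the Guest's index at 2, four entries are pending. An index that appears to move backwards (3 after 5) yields 65534, which exceeds any realistic ring size and so trips the check.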
||||||
|
@ -821,8 +862,8 @@ static bool handle_console_input(int fd, struct device *dev)
|
||||||
unsigned long args[] = { LHREQ_BREAK, 0 };
|
unsigned long args[] = { LHREQ_BREAK, 0 };
|
||||||
/* Close the fd so Waker will know it has to
|
/* Close the fd so Waker will know it has to
|
||||||
* exit. */
|
* exit. */
|
||||||
close(waker_fd);
|
close(waker_fds.pipe[1]);
|
||||||
/* Just in case waker is blocked in BREAK, send
|
/* Just in case Waker is blocked in BREAK, send
|
||||||
* unbreak now. */
|
* unbreak now. */
|
||||||
write(fd, args, sizeof(args));
|
write(fd, args, sizeof(args));
|
||||||
exit(2);
|
exit(2);
|
||||||
|
@ -839,7 +880,7 @@ static bool handle_console_input(int fd, struct device *dev)
|
||||||
|
|
||||||
/* Handling output for console is simple: we just get all the output buffers
|
/* Handling output for console is simple: we just get all the output buffers
|
||||||
* and write them to stdout. */
|
* and write them to stdout. */
|
||||||
static void handle_console_output(int fd, struct virtqueue *vq)
|
static void handle_console_output(int fd, struct virtqueue *vq, bool timeout)
|
||||||
{
|
{
|
||||||
unsigned int head, out, in;
|
unsigned int head, out, in;
|
||||||
int len;
|
int len;
|
||||||
|
@ -854,6 +895,21 @@ static void handle_console_output(int fd, struct virtqueue *vq)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static void block_vq(struct virtqueue *vq)
|
||||||
|
{
|
||||||
|
struct itimerval itm;
|
||||||
|
|
||||||
|
vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
|
||||||
|
vq->blocked = true;
|
||||||
|
|
||||||
|
itm.it_interval.tv_sec = 0;
|
||||||
|
itm.it_interval.tv_usec = 0;
|
||||||
|
itm.it_value.tv_sec = 0;
|
||||||
|
itm.it_value.tv_usec = timeout_usec;
|
||||||
|
|
||||||
|
setitimer(ITIMER_REAL, &itm, NULL);
|
||||||
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* The Network
|
* The Network
|
||||||
*
|
*
|
||||||
|
@ -861,22 +917,34 @@ static void handle_console_output(int fd, struct virtqueue *vq)
|
||||||
* and write them (ignoring the first element) to this device's file descriptor
|
* and write them (ignoring the first element) to this device's file descriptor
|
||||||
* (/dev/net/tun).
|
* (/dev/net/tun).
|
||||||
*/
|
*/
|
||||||
static void handle_net_output(int fd, struct virtqueue *vq)
|
static void handle_net_output(int fd, struct virtqueue *vq, bool timeout)
|
||||||
{
|
{
|
||||||
unsigned int head, out, in;
|
unsigned int head, out, in, num = 0;
|
||||||
int len;
|
int len;
|
||||||
struct iovec iov[vq->vring.num];
|
struct iovec iov[vq->vring.num];
|
||||||
|
static int last_timeout_num;
|
||||||
|
|
||||||
/* Keep getting output buffers from the Guest until we run out. */
|
/* Keep getting output buffers from the Guest until we run out. */
|
||||||
while ((head = get_vq_desc(vq, iov, &out, &in)) != vq->vring.num) {
|
while ((head = get_vq_desc(vq, iov, &out, &in)) != vq->vring.num) {
|
||||||
if (in)
|
if (in)
|
||||||
errx(1, "Input buffers in output queue?");
|
errx(1, "Input buffers in output queue?");
|
||||||
/* Check header, but otherwise ignore it (we told the Guest we
|
len = writev(vq->dev->fd, iov, out);
|
||||||
* supported no features, so it shouldn't have anything
|
if (len < 0)
|
||||||
* interesting). */
|
err(1, "Writing network packet to tun");
|
||||||
(void)convert(&iov[0], struct virtio_net_hdr);
|
|
||||||
len = writev(vq->dev->fd, iov+1, out-1);
|
|
||||||
add_used_and_trigger(fd, vq, head, len);
|
add_used_and_trigger(fd, vq, head, len);
|
||||||
|
num++;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Block further kicks and set up a timer if we saw anything. */
|
||||||
|
if (!timeout && num)
|
||||||
|
block_vq(vq);
|
||||||
|
|
||||||
|
if (timeout) {
|
||||||
|
if (num < last_timeout_num)
|
||||||
|
timeout_usec += 10;
|
||||||
|
else if (timeout_usec > 1)
|
||||||
|
timeout_usec--;
|
||||||
|
last_timeout_num = num;
|
||||||
}
|
}
|
||||||
}
|
}
|
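The timeout adjustment at the end of `handle_net_output()` can be read as a small pure function. The sketch below restates it that way as an illustration only: `adjust_timeout` is a hypothetical name, and it takes plain `int` counts, whereas the diff compares an `unsigned int` against a `static int`.

```c
#include <assert.h>

/* Called only when the flush was triggered by the timer.
 * num:      packets flushed on this timeout
 * last_num: packets flushed on the previous timeout
 * Fewer packets than last time lengthens the delay by 10 usec; otherwise
 * it shrinks by 1 usec, but never below 1. */
static unsigned int adjust_timeout(unsigned int timeout_usec, int num, int last_num)
{
	if (num < last_num)
		timeout_usec += 10;
	else if (timeout_usec > 1)
		timeout_usec--;
	return timeout_usec;
}
```

Starting from the initial `timeout_usec = 500`, a timeout that flushes 3 packets after one that flushed 5 gives 510, while an equal or larger batch gives 499.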
||||||
|
|
||||||
|
@ -887,7 +955,6 @@ static bool handle_tun_input(int fd, struct device *dev)
|
||||||
unsigned int head, in_num, out_num;
|
unsigned int head, in_num, out_num;
|
||||||
int len;
|
int len;
|
||||||
struct iovec iov[dev->vq->vring.num];
|
struct iovec iov[dev->vq->vring.num];
|
||||||
struct virtio_net_hdr *hdr;
|
|
||||||
|
|
||||||
/* First we need a network buffer from the Guest's recv virtqueue. */
|
/* First we need a network buffer from the Guest's recv virtqueue. */
|
||||||
head = get_vq_desc(dev->vq, iov, &out_num, &in_num);
|
head = get_vq_desc(dev->vq, iov, &out_num, &in_num);
|
||||||
|
@ -896,25 +963,23 @@ static bool handle_tun_input(int fd, struct device *dev)
|
||||||
* early, the Guest won't be ready yet. Wait until the device
|
* early, the Guest won't be ready yet. Wait until the device
|
||||||
* status says it's ready. */
|
* status says it's ready. */
|
||||||
/* FIXME: Actually want DRIVER_ACTIVE here. */
|
/* FIXME: Actually want DRIVER_ACTIVE here. */
|
||||||
if (dev->desc->status & VIRTIO_CONFIG_S_DRIVER_OK)
|
|
||||||
warn("network: no dma buffer!");
|
/* Now tell it we want to know if new things appear. */
|
||||||
|
dev->vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
|
||||||
|
wmb();
|
||||||
|
|
||||||
/* We'll turn this back on if input buffers are registered. */
|
/* We'll turn this back on if input buffers are registered. */
|
||||||
return false;
|
return false;
|
||||||
} else if (out_num)
|
} else if (out_num)
|
||||||
errx(1, "Output buffers in network recv queue?");
|
errx(1, "Output buffers in network recv queue?");
|
||||||
|
|
||||||
/* First element is the header: we set it to 0 (no features). */
|
|
||||||
hdr = convert(&iov[0], struct virtio_net_hdr);
|
|
||||||
hdr->flags = 0;
|
|
||||||
hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE;
|
|
||||||
|
|
||||||
/* Read the packet from the device directly into the Guest's buffer. */
|
/* Read the packet from the device directly into the Guest's buffer. */
|
||||||
len = readv(dev->fd, iov+1, in_num-1);
|
len = readv(dev->fd, iov, in_num);
|
||||||
if (len <= 0)
|
if (len <= 0)
|
||||||
err(1, "reading network");
|
err(1, "reading network");
|
||||||
|
|
||||||
/* Tell the Guest about the new packet. */
|
/* Tell the Guest about the new packet. */
|
||||||
add_used_and_trigger(fd, dev->vq, head, sizeof(*hdr) + len);
|
add_used_and_trigger(fd, dev->vq, head, len);
|
||||||
|
|
||||||
verbose("tun input packet len %i [%02x %02x] (%s)\n", len,
|
verbose("tun input packet len %i [%02x %02x] (%s)\n", len,
|
||||||
((u8 *)iov[1].iov_base)[0], ((u8 *)iov[1].iov_base)[1],
|
((u8 *)iov[1].iov_base)[0], ((u8 *)iov[1].iov_base)[1],
|
||||||
|
@ -927,11 +992,18 @@ static bool handle_tun_input(int fd, struct device *dev)
|
||||||
/*L:215 This is the callback attached to the network and console input
|
/*L:215 This is the callback attached to the network and console input
|
||||||
* virtqueues: it ensures we try again, in case we stopped console or net
|
* virtqueues: it ensures we try again, in case we stopped console or net
|
||||||
* delivery because Guest didn't have any buffers. */
|
* delivery because Guest didn't have any buffers. */
|
||||||
static void enable_fd(int fd, struct virtqueue *vq)
|
static void enable_fd(int fd, struct virtqueue *vq, bool timeout)
|
||||||
{
|
{
|
||||||
add_device_fd(vq->dev->fd);
|
add_device_fd(vq->dev->fd);
|
||||||
/* Tell waker to listen to it again */
|
/* Snap the Waker out of its select loop. */
|
||||||
write(waker_fd, &vq->dev->fd, sizeof(vq->dev->fd));
|
write(waker_fds.pipe[1], "", 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void net_enable_fd(int fd, struct virtqueue *vq, bool timeout)
|
||||||
|
{
|
||||||
|
/* We don't need to know again when Guest refills receive buffer. */
|
||||||
|
vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
|
||||||
|
enable_fd(fd, vq, timeout);
|
||||||
}
|
}
|
||||||
|
|
||||||
/* When the Guest tells us they updated the status field, we handle it. */
|
/* When the Guest tells us they updated the status field, we handle it. */
|
||||||
|
@ -951,7 +1023,7 @@ static void update_device_status(struct device *dev)
|
||||||
for (vq = dev->vq; vq; vq = vq->next) {
|
for (vq = dev->vq; vq; vq = vq->next) {
|
||||||
memset(vq->vring.desc, 0,
|
memset(vq->vring.desc, 0,
|
||||||
vring_size(vq->config.num, getpagesize()));
|
vring_size(vq->config.num, getpagesize()));
|
||||||
vq->last_avail_idx = 0;
|
lg_last_avail(vq) = 0;
|
||||||
}
|
}
|
||||||
} else if (dev->desc->status & VIRTIO_CONFIG_S_FAILED) {
|
} else if (dev->desc->status & VIRTIO_CONFIG_S_FAILED) {
|
||||||
warnx("Device %s configuration FAILED", dev->name);
|
warnx("Device %s configuration FAILED", dev->name);
|
||||||
|
@ -960,10 +1032,10 @@ static void update_device_status(struct device *dev)
|
||||||
|
|
||||||
verbose("Device %s OK: offered", dev->name);
|
verbose("Device %s OK: offered", dev->name);
|
||||||
for (i = 0; i < dev->desc->feature_len; i++)
|
for (i = 0; i < dev->desc->feature_len; i++)
|
||||||
verbose(" %08x", get_feature_bits(dev)[i]);
|
verbose(" %02x", get_feature_bits(dev)[i]);
|
||||||
verbose(", accepted");
|
verbose(", accepted");
|
||||||
for (i = 0; i < dev->desc->feature_len; i++)
|
for (i = 0; i < dev->desc->feature_len; i++)
|
||||||
verbose(" %08x", get_feature_bits(dev)
|
verbose(" %02x", get_feature_bits(dev)
|
||||||
[dev->desc->feature_len+i]);
|
[dev->desc->feature_len+i]);
|
||||||
|
|
||||||
if (dev->ready)
|
if (dev->ready)
|
||||||
|
@ -1000,7 +1072,7 @@ static void handle_output(int fd, unsigned long addr)
|
||||||
if (strcmp(vq->dev->name, "console") != 0)
|
if (strcmp(vq->dev->name, "console") != 0)
|
||||||
verbose("Output to %s\n", vq->dev->name);
|
verbose("Output to %s\n", vq->dev->name);
|
||||||
if (vq->handle_output)
|
if (vq->handle_output)
|
||||||
vq->handle_output(fd, vq);
|
vq->handle_output(fd, vq, false);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -1014,6 +1086,29 @@ static void handle_output(int fd, unsigned long addr)
|
||||||
strnlen(from_guest_phys(addr), guest_limit - addr));
|
strnlen(from_guest_phys(addr), guest_limit - addr));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static void handle_timeout(int fd)
|
||||||
|
{
|
||||||
|
char buf[32];
|
||||||
|
struct device *i;
|
||||||
|
struct virtqueue *vq;
|
||||||
|
|
||||||
|
/* Clear the pipe */
|
||||||
|
read(timeoutpipe[0], buf, sizeof(buf));
|
||||||
|
|
||||||
|
/* Check each device and virtqueue: flush blocked ones. */
|
||||||
|
for (i = devices.dev; i; i = i->next) {
|
||||||
|
for (vq = i->vq; vq; vq = vq->next) {
|
||||||
|
if (!vq->blocked)
|
||||||
|
continue;
|
||||||
|
|
||||||
|
vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
|
||||||
|
vq->blocked = false;
|
||||||
|
if (vq->handle_output)
|
||||||
|
vq->handle_output(fd, vq, true);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
/* This is called when the Waker wakes us up: check for incoming file
|
/* This is called when the Waker wakes us up: check for incoming file
|
||||||
* descriptors. */
|
* descriptors. */
|
||||||
static void handle_input(int fd)
|
static void handle_input(int fd)
|
||||||
|
@ -1024,16 +1119,20 @@ static void handle_input(int fd)
|
||||||
for (;;) {
|
for (;;) {
|
||||||
struct device *i;
|
struct device *i;
|
||||||
fd_set fds = devices.infds;
|
fd_set fds = devices.infds;
|
||||||
|
int num;
|
||||||
|
|
||||||
|
num = select(devices.max_infd+1, &fds, NULL, NULL, &poll);
|
||||||
|
/* Could get interrupted */
|
||||||
|
if (num < 0)
|
||||||
|
continue;
|
||||||
/* If nothing is ready, we're done. */
|
/* If nothing is ready, we're done. */
|
||||||
if (select(devices.max_infd+1, &fds, NULL, NULL, &poll) == 0)
|
if (num == 0)
|
||||||
break;
|
break;
|
||||||
|
|
||||||
/* Otherwise, call the device(s) which have readable file
|
/* Otherwise, call the device(s) which have readable file
|
||||||
* descriptors and a method of handling them. */
|
* descriptors and a method of handling them. */
|
||||||
for (i = devices.dev; i; i = i->next) {
|
for (i = devices.dev; i; i = i->next) {
|
||||||
if (i->handle_input && FD_ISSET(i->fd, &fds)) {
|
if (i->handle_input && FD_ISSET(i->fd, &fds)) {
|
||||||
int dev_fd;
|
|
||||||
if (i->handle_input(fd, i))
|
if (i->handle_input(fd, i))
|
||||||
continue;
|
continue;
|
||||||
|
|
||||||
|
@ -1043,13 +1142,12 @@ static void handle_input(int fd)
|
||||||
* buffers to deliver into. Console also uses
|
* buffers to deliver into. Console also uses
|
||||||
* it when it discovers that stdin is closed. */
|
* it when it discovers that stdin is closed. */
|
||||||
FD_CLR(i->fd, &devices.infds);
|
FD_CLR(i->fd, &devices.infds);
|
||||||
/* Tell waker to ignore it too, by sending a
|
|
||||||
* negative fd number (-1, since 0 is a valid
|
|
||||||
* FD number). */
|
|
||||||
dev_fd = -i->fd - 1;
|
|
||||||
write(waker_fd, &dev_fd, sizeof(dev_fd));
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* Is this the timeout fd? */
|
||||||
|
if (FD_ISSET(timeoutpipe[0], &fds))
|
||||||
|
handle_timeout(fd);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -1098,7 +1196,7 @@ static struct lguest_device_desc *new_dev_desc(u16 type)
|
||||||
/* Each device descriptor is followed by the description of its virtqueues. We
|
/* Each device descriptor is followed by the description of its virtqueues. We
|
||||||
* specify how many descriptors the virtqueue is to have. */
|
* specify how many descriptors the virtqueue is to have. */
|
||||||
static void add_virtqueue(struct device *dev, unsigned int num_descs,
|
static void add_virtqueue(struct device *dev, unsigned int num_descs,
|
||||||
void (*handle_output)(int fd, struct virtqueue *me))
|
void (*handle_output)(int, struct virtqueue *, bool))
|
||||||
{
|
{
|
||||||
unsigned int pages;
|
unsigned int pages;
|
||||||
struct virtqueue **i, *vq = malloc(sizeof(*vq));
|
struct virtqueue **i, *vq = malloc(sizeof(*vq));
|
||||||
|
@ -1114,6 +1212,7 @@ static void add_virtqueue(struct device *dev, unsigned int num_descs,
|
||||||
vq->last_avail_idx = 0;
|
vq->last_avail_idx = 0;
|
||||||
vq->dev = dev;
|
vq->dev = dev;
|
||||||
vq->inflight = 0;
|
vq->inflight = 0;
|
||||||
|
vq->blocked = false;
|
||||||
|
|
||||||
/* Initialize the configuration. */
|
/* Initialize the configuration. */
|
||||||
vq->config.num = num_descs;
|
vq->config.num = num_descs;
|
||||||
|
@ -1246,6 +1345,24 @@ static void setup_console(void)
|
||||||
}
|
}
|
||||||
/*:*/
|
/*:*/
|
||||||
|
|
||||||
|
static void timeout_alarm(int sig)
|
||||||
|
{
|
||||||
|
write(timeoutpipe[1], "", 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void setup_timeout(void)
|
||||||
|
{
|
||||||
|
if (pipe(timeoutpipe) != 0)
|
||||||
|
err(1, "Creating timeout pipe");
|
||||||
|
|
||||||
|
if (fcntl(timeoutpipe[1], F_SETFL,
|
||||||
|
fcntl(timeoutpipe[1], F_GETFL) | O_NONBLOCK) != 0)
|
||||||
|
err(1, "Making timeout pipe nonblocking");
|
||||||
|
|
||||||
|
add_device_fd(timeoutpipe[0]);
|
||||||
|
signal(SIGALRM, timeout_alarm);
|
||||||
|
}
|
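`setup_timeout()`, `timeout_alarm()` and `block_vq()` together turn a `SIGALRM` into a readable file descriptor, so the timer can be handled in the same `select()` loop as device input. The following is a self-contained sketch of that pattern under stated assumptions: `sketch_pipe`, `sketch_alarm` and `wait_one_timeout` are hypothetical names, and the function creates and tears down its own pipe instead of registering it with `add_device_fd()`.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

static int sketch_pipe[2];

static void sketch_alarm(int sig)
{
	(void)sig;
	/* write() is async-signal-safe. The write end is non-blocking, so a
	 * full pipe drops the byte instead of hanging the handler. */
	(void)write(sketch_pipe[1], "", 1);
}

/* Arm a one-shot timer and wait until the handler makes the pipe readable.
 * usec must be in 1..999999 (0 would disarm the timer).
 * Returns 1 once the timeout has been delivered, -1 on error. */
static int wait_one_timeout(unsigned int usec)
{
	struct itimerval itm;
	char buf[32];
	int flags;

	if (pipe(sketch_pipe) != 0)
		return -1;
	flags = fcntl(sketch_pipe[1], F_GETFL);
	if (flags < 0 || fcntl(sketch_pipe[1], F_SETFL, flags | O_NONBLOCK) != 0)
		return -1;
	if (signal(SIGALRM, sketch_alarm) == SIG_ERR)
		return -1;

	itm.it_interval.tv_sec = 0;	/* one-shot: no reload value */
	itm.it_interval.tv_usec = 0;
	itm.it_value.tv_sec = 0;
	itm.it_value.tv_usec = usec;
	if (setitimer(ITIMER_REAL, &itm, NULL) != 0)
		return -1;

	for (;;) {
		fd_set fds;

		FD_ZERO(&fds);
		FD_SET(sketch_pipe[0], &fds);
		if (select(sketch_pipe[0] + 1, &fds, NULL, NULL, NULL) < 0) {
			/* The signal itself may interrupt select(): retry. */
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (FD_ISSET(sketch_pipe[0], &fds))
			break;
	}

	/* Clear the pipe, as handle_timeout() does. */
	if (read(sketch_pipe[0], buf, sizeof(buf)) < 1)
		return -1;
	close(sketch_pipe[0]);
	close(sketch_pipe[1]);
	return 1;
}
```

The retry on `EINTR` plays the same role as the `num < 0` check this commit adds to `handle_input()`.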
||||||
|
|
||||||
/*M:010 Inter-guest networking is an interesting area. Simplest is to have a
|
/*M:010 Inter-guest networking is an interesting area. Simplest is to have a
|
||||||
* --sharenet=<name> option which opens or creates a named pipe. This can be
|
* --sharenet=<name> option which opens or creates a named pipe. This can be
|
||||||
* used to send packets to another guest in a 1:1 manner.
|
* used to send packets to another guest in a 1:1 manner.
|
||||||
|
@ -1264,10 +1381,25 @@ static void setup_console(void)
|
||||||
|
|
||||||
static u32 str2ip(const char *ipaddr)
|
static u32 str2ip(const char *ipaddr)
|
||||||
{
|
{
|
||||||
unsigned int byte[4];
|
unsigned int b[4];
|
||||||
|
|
||||||
sscanf(ipaddr, "%u.%u.%u.%u", &byte[0], &byte[1], &byte[2], &byte[3]);
|
if (sscanf(ipaddr, "%u.%u.%u.%u", &b[0], &b[1], &b[2], &b[3]) != 4)
|
||||||
return (byte[0] << 24) | (byte[1] << 16) | (byte[2] << 8) | byte[3];
|
errx(1, "Failed to parse IP address '%s'", ipaddr);
|
||||||
|
return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
|
||||||
|
}
|
||||||
|
|
||||||
|
static void str2mac(const char *macaddr, unsigned char mac[6])
|
||||||
|
{
|
||||||
|
unsigned int m[6];
|
||||||
|
if (sscanf(macaddr, "%02x:%02x:%02x:%02x:%02x:%02x",
|
||||||
|
&m[0], &m[1], &m[2], &m[3], &m[4], &m[5]) != 6)
|
||||||
|
errx(1, "Failed to parse mac address '%s'", macaddr);
|
||||||
|
mac[0] = m[0];
|
||||||
|
mac[1] = m[1];
|
||||||
|
mac[2] = m[2];
|
||||||
|
mac[3] = m[3];
|
||||||
|
mac[4] = m[4];
|
||||||
|
mac[5] = m[5];
|
||||||
}
|
}
|
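Both parsers above now check the `sscanf()` return value, which is the number of fields successfully converted. The sketch below shows the same check in a form that can be exercised in isolation. It is an assumption-laden variant, not the Launcher's code: `parse_ip` and `parse_mac` are hypothetical names, they return `false` instead of calling `errx()`, and the range check on each octet is an addition that the diff does not make.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Parse dotted-quad text into a host-order 32-bit address. */
static bool parse_ip(const char *s, uint32_t *ip)
{
	unsigned int b[4];

	/* sscanf() returns the number of fields converted; anything short
	 * of 4 means the string was not a full dotted quad. */
	if (sscanf(s, "%u.%u.%u.%u", &b[0], &b[1], &b[2], &b[3]) != 4)
		return false;
	/* Extra check, not in the diff: each octet must fit in 8 bits. */
	if (b[0] > 255 || b[1] > 255 || b[2] > 255 || b[3] > 255)
		return false;
	*ip = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
		| ((uint32_t)b[2] << 8) | (uint32_t)b[3];
	return true;
}

/* Parse "xx:xx:xx:xx:xx:xx" into six bytes. */
static bool parse_mac(const char *s, unsigned char mac[6])
{
	unsigned int m[6];
	int i;

	if (sscanf(s, "%02x:%02x:%02x:%02x:%02x:%02x",
		   &m[0], &m[1], &m[2], &m[3], &m[4], &m[5]) != 6)
		return false;
	/* %x needs an unsigned int target, hence the temporary array. */
	for (i = 0; i < 6; i++)
		mac[i] = (unsigned char)m[i];
	return true;
}
```

The temporary `unsigned int` array mirrors `str2mac()` in the diff: scanning `%x` directly into `unsigned char` storage would write past each byte.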
||||||
|
|
||||||
/* This code is "adapted" from libbridge: it attaches the Host end of the
|
/* This code is "adapted" from libbridge: it attaches the Host end of the
|
||||||
|
@ -1288,6 +1420,7 @@ static void add_to_bridge(int fd, const char *if_name, const char *br_name)
|
||||||
errx(1, "interface %s does not exist!", if_name);
|
errx(1, "interface %s does not exist!", if_name);
|
||||||
|
|
||||||
strncpy(ifr.ifr_name, br_name, IFNAMSIZ);
|
strncpy(ifr.ifr_name, br_name, IFNAMSIZ);
|
||||||
|
ifr.ifr_name[IFNAMSIZ-1] = '\0';
|
||||||
ifr.ifr_ifindex = ifidx;
|
ifr.ifr_ifindex = ifidx;
|
||||||
if (ioctl(fd, SIOCBRADDIF, &ifr) < 0)
|
if (ioctl(fd, SIOCBRADDIF, &ifr) < 0)
|
||||||
err(1, "can't add %s to bridge %s", if_name, br_name);
|
err(1, "can't add %s to bridge %s", if_name, br_name);
|
||||||
|
@ -1296,64 +1429,90 @@ static void add_to_bridge(int fd, const char *if_name, const char *br_name)
|
||||||
/* This sets up the Host end of the network device with an IP address, brings
|
/* This sets up the Host end of the network device with an IP address, brings
|
||||||
 * it up so packets will flow, then copies the MAC address into the hwaddr
|
 * it up so packets will flow, then copies the MAC address into the hwaddr
|
||||||
* pointer. */
|
* pointer. */
|
||||||
static void configure_device(int fd, const char *devname, u32 ipaddr,
|
static void configure_device(int fd, const char *tapif, u32 ipaddr)
|
||||||
unsigned char hwaddr[6])
|
|
||||||
{
|
{
|
||||||
struct ifreq ifr;
|
struct ifreq ifr;
|
||||||
struct sockaddr_in *sin = (struct sockaddr_in *)&ifr.ifr_addr;
|
struct sockaddr_in *sin = (struct sockaddr_in *)&ifr.ifr_addr;
|
||||||
|
|
||||||
/* Don't read these incantations. Just cut & paste them like I did! */
|
|
||||||
memset(&ifr, 0, sizeof(ifr));
|
memset(&ifr, 0, sizeof(ifr));
|
||||||
strcpy(ifr.ifr_name, devname);
|
strcpy(ifr.ifr_name, tapif);
|
||||||
|
|
||||||
|
/* Don't read these incantations. Just cut & paste them like I did! */
|
||||||
sin->sin_family = AF_INET;
|
sin->sin_family = AF_INET;
|
||||||
sin->sin_addr.s_addr = htonl(ipaddr);
|
sin->sin_addr.s_addr = htonl(ipaddr);
|
||||||
if (ioctl(fd, SIOCSIFADDR, &ifr) != 0)
|
if (ioctl(fd, SIOCSIFADDR, &ifr) != 0)
|
||||||
err(1, "Setting %s interface address", devname);
|
err(1, "Setting %s interface address", tapif);
|
||||||
ifr.ifr_flags = IFF_UP;
|
ifr.ifr_flags = IFF_UP;
|
||||||
if (ioctl(fd, SIOCSIFFLAGS, &ifr) != 0)
|
if (ioctl(fd, SIOCSIFFLAGS, &ifr) != 0)
|
||||||
err(1, "Bringing interface %s up", devname);
|
err(1, "Bringing interface %s up", tapif);
|
||||||
|
}
|
||||||
|
|
||||||
|
static void get_mac(int fd, const char *tapif, unsigned char hwaddr[6])
|
||||||
|
{
|
||||||
|
struct ifreq ifr;
|
||||||
|
|
||||||
|
memset(&ifr, 0, sizeof(ifr));
|
||||||
|
strcpy(ifr.ifr_name, tapif);
|
||||||
|
|
||||||
/* SIOC stands for Socket I/O Control. G means Get (vs S for Set
|
/* SIOC stands for Socket I/O Control. G means Get (vs S for Set
|
||||||
* above). IF means Interface, and HWADDR is hardware address.
|
* above). IF means Interface, and HWADDR is hardware address.
|
||||||
* Simple! */
|
* Simple! */
|
||||||
if (ioctl(fd, SIOCGIFHWADDR, &ifr) != 0)
|
if (ioctl(fd, SIOCGIFHWADDR, &ifr) != 0)
|
||||||
err(1, "getting hw address for %s", devname);
|
err(1, "getting hw address for %s", tapif);
|
||||||
memcpy(hwaddr, ifr.ifr_hwaddr.sa_data, 6);
|
memcpy(hwaddr, ifr.ifr_hwaddr.sa_data, 6);
|
||||||
}
|
}
|
||||||
|
|
||||||
/*L:195 Our network is a Host<->Guest network. This can either use bridging or
|
static int get_tun_device(char tapif[IFNAMSIZ])
|
||||||
* routing, but the principle is the same: it uses the "tun" device to inject
|
|
||||||
* packets into the Host as if they came in from a normal network card. We
|
|
||||||
* just shunt packets between the Guest and the tun device. */
|
|
||||||
static void setup_tun_net(const char *arg)
|
|
||||||
{
|
{
|
||||||
struct device *dev;
|
|
||||||
struct ifreq ifr;
|
struct ifreq ifr;
|
||||||
int netfd, ipfd;
|
int netfd;
|
||||||
u32 ip;
|
|
||||||
const char *br_name = NULL;
|
/* Start with this zeroed. Messy but sure. */
|
||||||
struct virtio_net_config conf;
|
memset(&ifr, 0, sizeof(ifr));
|
||||||
|
|
||||||
/* We open the /dev/net/tun device and tell it we want a tap device. A
|
/* We open the /dev/net/tun device and tell it we want a tap device. A
|
||||||
* tap device is like a tun device, only somehow different. To tell
|
* tap device is like a tun device, only somehow different. To tell
|
||||||
* the truth, I completely blundered my way through this code, but it
|
* the truth, I completely blundered my way through this code, but it
|
||||||
* works now! */
|
* works now! */
|
||||||
netfd = open_or_die("/dev/net/tun", O_RDWR);
|
netfd = open_or_die("/dev/net/tun", O_RDWR);
|
||||||
memset(&ifr, 0, sizeof(ifr));
|
ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
|
||||||
ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
|
|
||||||
strcpy(ifr.ifr_name, "tap%d");
|
strcpy(ifr.ifr_name, "tap%d");
|
||||||
if (ioctl(netfd, TUNSETIFF, &ifr) != 0)
|
if (ioctl(netfd, TUNSETIFF, &ifr) != 0)
|
||||||
err(1, "configuring /dev/net/tun");
|
err(1, "configuring /dev/net/tun");
|
||||||
|
|
||||||
|
if (ioctl(netfd, TUNSETOFFLOAD,
|
||||||
|
TUN_F_CSUM|TUN_F_TSO4|TUN_F_TSO6|TUN_F_TSO_ECN) != 0)
|
||||||
|
err(1, "Could not set features for tun device");
|
||||||
|
|
||||||
/* We don't need checksums calculated for packets coming in this
|
/* We don't need checksums calculated for packets coming in this
|
||||||
* device: trust us! */
|
* device: trust us! */
|
||||||
ioctl(netfd, TUNSETNOCSUM, 1);
|
ioctl(netfd, TUNSETNOCSUM, 1);
|
||||||
|
|
||||||
|
memcpy(tapif, ifr.ifr_name, IFNAMSIZ);
|
||||||
|
return netfd;
|
||||||
|
}
|
||||||
|
|
||||||
|
/*L:195 Our network is a Host<->Guest network. This can either use bridging or
|
||||||
|
* routing, but the principle is the same: it uses the "tun" device to inject
|
||||||
|
* packets into the Host as if they came in from a normal network card. We
|
||||||
|
* just shunt packets between the Guest and the tun device. */
|
||||||
|
static void setup_tun_net(char *arg)
|
||||||
|
{
|
||||||
|
struct device *dev;
|
||||||
|
int netfd, ipfd;
|
||||||
|
u32 ip = INADDR_ANY;
|
||||||
|
bool bridging = false;
|
||||||
|
char tapif[IFNAMSIZ], *p;
|
||||||
|
struct virtio_net_config conf;
|
||||||
|
|
||||||
|
netfd = get_tun_device(tapif);
|
||||||
|
|
||||||
/* First we create a new network device. */
|
/* First we create a new network device. */
|
||||||
dev = new_device("net", VIRTIO_ID_NET, netfd, handle_tun_input);
|
dev = new_device("net", VIRTIO_ID_NET, netfd, handle_tun_input);
|
||||||
|
|
||||||
/* Network devices need a receive and a send queue, just like
|
/* Network devices need a receive and a send queue, just like
|
||||||
* console. */
|
* console. */
|
||||||
add_virtqueue(dev, VIRTQUEUE_NUM, enable_fd);
|
add_virtqueue(dev, VIRTQUEUE_NUM, net_enable_fd);
|
||||||
add_virtqueue(dev, VIRTQUEUE_NUM, handle_net_output);
|
add_virtqueue(dev, VIRTQUEUE_NUM, handle_net_output);
|
||||||
|
|
||||||
/* We need a socket to perform the magic network ioctls to bring up the
|
/* We need a socket to perform the magic network ioctls to bring up the
|
||||||
|
@ -1364,28 +1523,56 @@ static void setup_tun_net(const char *arg)
|
||||||
|
|
||||||
/* If the command line was --tunnet=bridge:<name> do bridging. */
|
/* If the command line was --tunnet=bridge:<name> do bridging. */
|
||||||
if (!strncmp(BRIDGE_PFX, arg, strlen(BRIDGE_PFX))) {
|
if (!strncmp(BRIDGE_PFX, arg, strlen(BRIDGE_PFX))) {
|
||||||
ip = INADDR_ANY;
|
arg += strlen(BRIDGE_PFX);
|
||||||
br_name = arg + strlen(BRIDGE_PFX);
|
bridging = true;
|
||||||
add_to_bridge(ipfd, ifr.ifr_name, br_name);
|
}
|
||||||
} else /* It is an IP address to set up the device with */
|
|
||||||
|
/* A mac address may follow the bridge name or IP address */
|
||||||
|
p = strchr(arg, ':');
|
||||||
|
if (p) {
|
||||||
|
str2mac(p+1, conf.mac);
|
||||||
|
*p = '\0';
|
||||||
|
} else {
|
||||||
|
p = arg + strlen(arg);
|
||||||
|
/* None supplied; query the randomly assigned mac. */
|
||||||
|
get_mac(ipfd, tapif, conf.mac);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* arg is now either an IP address or a bridge name */
|
||||||
|
if (bridging)
|
||||||
|
add_to_bridge(ipfd, tapif, arg);
|
||||||
|
else
|
||||||
ip = str2ip(arg);
|
ip = str2ip(arg);
|
||||||
|
|
||||||
/* Set up the tun device, and get the mac address for the interface. */
|
/* Set up the tun device. */
|
||||||
configure_device(ipfd, ifr.ifr_name, ip, conf.mac);
|
configure_device(ipfd, tapif, ip);
|
||||||
|
|
||||||
/* Tell Guest what MAC address to use. */
|
/* Tell Guest what MAC address to use. */
|
||||||
add_feature(dev, VIRTIO_NET_F_MAC);
|
add_feature(dev, VIRTIO_NET_F_MAC);
|
||||||
add_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY);
|
add_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY);
|
||||||
|
/* Expect Guest to handle everything except UFO */
|
||||||
|
add_feature(dev, VIRTIO_NET_F_CSUM);
|
||||||
|
add_feature(dev, VIRTIO_NET_F_GUEST_CSUM);
|
||||||
|
add_feature(dev, VIRTIO_NET_F_MAC);
|
||||||
|
add_feature(dev, VIRTIO_NET_F_GUEST_TSO4);
|
||||||
|
add_feature(dev, VIRTIO_NET_F_GUEST_TSO6);
|
||||||
|
add_feature(dev, VIRTIO_NET_F_GUEST_ECN);
|
||||||
|
add_feature(dev, VIRTIO_NET_F_HOST_TSO4);
|
||||||
|
add_feature(dev, VIRTIO_NET_F_HOST_TSO6);
|
||||||
|
add_feature(dev, VIRTIO_NET_F_HOST_ECN);
|
||||||
set_config(dev, sizeof(conf), &conf);
|
set_config(dev, sizeof(conf), &conf);
|
||||||
|
|
||||||
/* We don't need the socket any more; setup is done. */
|
/* We don't need the socket any more; setup is done. */
|
||||||
close(ipfd);
|
close(ipfd);
|
||||||
|
|
||||||
verbose("device %u: tun net %u.%u.%u.%u\n",
|
devices.device_num++;
|
||||||
devices.device_num++,
|
|
||||||
(u8)(ip>>24),(u8)(ip>>16),(u8)(ip>>8),(u8)ip);
|
if (bridging)
|
||||||
if (br_name)
|
verbose("device %u: tun %s attached to bridge: %s\n",
|
||||||
verbose("attached to bridge: %s\n", br_name);
|
devices.device_num, tapif, arg);
|
||||||
|
else
|
||||||
|
verbose("device %u: tun %s: %s\n",
|
||||||
|
devices.device_num, tapif, arg);
|
||||||
}
|
}
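The argument handling in the new setup_tun_net() accepts an optional "bridge:" prefix, then an IP address or bridge name, then an optional ":<macaddr>" suffix. A minimal standalone sketch of that parsing order follows. The names `struct tunnet_arg`, `parse_mac` and `parse_tunnet_arg` are hypothetical and do not appear in the Launcher, and unlike the original this version copies the target instead of modifying the argument in place.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BRIDGE_PFX "bridge:"

/* Hypothetical result type; the Launcher keeps these in local variables. */
struct tunnet_arg {
	bool bridging;          /* argument started with "bridge:" */
	bool have_mac;          /* a MAC address followed the first ':' */
	char target[64];        /* IP address or bridge name */
	unsigned char mac[6];
};

/* Parse "aa:bb:cc:dd:ee:ff"; reject trailing characters. */
static bool parse_mac(const char *s, unsigned char mac[6])
{
	unsigned int b[6];
	char extra;
	int i;

	if (sscanf(s, "%2x:%2x:%2x:%2x:%2x:%2x%c",
		   &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &extra) != 6)
		return false;
	for (i = 0; i < 6; i++)
		mac[i] = (unsigned char)b[i];
	return true;
}

/* Split "[bridge:]<name-or-ip>[:<macaddr>]" into its parts. */
static bool parse_tunnet_arg(const char *arg, struct tunnet_arg *out)
{
	const char *p;
	size_t len;

	memset(out, 0, sizeof(*out));
	if (!strncmp(BRIDGE_PFX, arg, strlen(BRIDGE_PFX))) {
		arg += strlen(BRIDGE_PFX);
		out->bridging = true;
	}

	/* Everything up to the first ':' is the IP address or bridge name. */
	p = strchr(arg, ':');
	len = p ? (size_t)(p - arg) : strlen(arg);
	if (len == 0 || len >= sizeof(out->target))
		return false;
	memcpy(out->target, arg, len);
	out->target[len] = '\0';

	/* A MAC address may follow; otherwise the caller queries the device. */
	if (p) {
		if (!parse_mac(p + 1, out->mac))
			return false;
		out->have_mac = true;
	}
	return true;
}
```

When `have_mac` is false, a caller would fall back to querying the interface for its randomly assigned address, as the diff does with get_mac().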
|
||||||
|
|
||||||
/* Our block (disk) device should be really simple: the Guest asks for a block
|
/* Our block (disk) device should be really simple: the Guest asks for a block
|
||||||
|
@ -1550,7 +1737,7 @@ static bool handle_io_finish(int fd, struct device *dev)
|
||||||
}
|
}
|
||||||
|
|
||||||
/* When the Guest submits some I/O, we just need to wake the I/O thread. */
|
/* When the Guest submits some I/O, we just need to wake the I/O thread. */
|
||||||
static void handle_virtblk_output(int fd, struct virtqueue *vq)
|
static void handle_virtblk_output(int fd, struct virtqueue *vq, bool timeout)
|
||||||
{
|
{
|
||||||
struct vblk_info *vblk = vq->dev->priv;
|
struct vblk_info *vblk = vq->dev->priv;
|
||||||
char c = 0;
|
char c = 0;
|
||||||
|
@ -1621,6 +1808,64 @@ static void setup_block_file(const char *filename)
|
||||||
verbose("device %u: virtblock %llu sectors\n",
|
verbose("device %u: virtblock %llu sectors\n",
|
||||||
devices.device_num, le64_to_cpu(conf.capacity));
|
devices.device_num, le64_to_cpu(conf.capacity));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* Our random number generator device reads from /dev/random into the Guest's
|
||||||
|
* input buffers. The usual case is that the Guest doesn't want random numbers
|
||||||
|
* and so has no buffers although /dev/random is still readable, whereas
|
||||||
|
* console is the reverse.
|
||||||
|
*
|
||||||
|
* The same logic applies, however. */
|
||||||
|
static bool handle_rng_input(int fd, struct device *dev)
|
||||||
|
{
|
||||||
|
int len;
|
||||||
|
unsigned int head, in_num, out_num, totlen = 0;
|
||||||
|
struct iovec iov[dev->vq->vring.num];
|
||||||
|
|
||||||
|
/* First we need a buffer from the Guest's virtqueue. */
|
||||||
|
head = get_vq_desc(dev->vq, iov, &out_num, &in_num);
|
||||||
|
|
||||||
|
/* If they're not ready for input, stop listening to this file
|
||||||
|
* descriptor. We'll start again once they add an input buffer. */
|
||||||
|
if (head == dev->vq->vring.num)
|
||||||
|
return false;
|
||||||
|
|
||||||
|
if (out_num)
|
||||||
|
errx(1, "Output buffers in rng?");
|
||||||
|
|
||||||
|
/* This is why we convert to iovecs: the readv() call uses them, and so
|
||||||
|
* it reads straight into the Guest's buffer. We loop to make sure we
|
||||||
|
* fill it. */
|
||||||
|
while (!iov_empty(iov, in_num)) {
|
||||||
|
len = readv(dev->fd, iov, in_num);
|
||||||
|
if (len <= 0)
|
||||||
|
err(1, "Read from /dev/random gave %i", len);
|
||||||
|
iov_consume(iov, in_num, len);
|
||||||
|
totlen += len;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Tell the Guest about the new input. */
|
||||||
|
add_used_and_trigger(fd, dev->vq, head, totlen);
|
||||||
|
|
||||||
|
/* Everything went OK! */
|
||||||
|
return true;
|
||||||
|
}
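The loop in handle_rng_input() keeps calling readv() until every Guest buffer is full, consuming the filled portion of the iovec array after each call. The sketch below shows that pattern in isolation. The helper names `iov_is_empty`, `iov_advance` and `read_full_iov` are hypothetical stand-ins, written on the assumption that the Launcher's iov_empty() and iov_consume() behave the same way; the sketch returns -1 instead of calling err().

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* True once every element has been completely filled. */
static bool iov_is_empty(const struct iovec *iov, unsigned int num)
{
	unsigned int i;

	for (i = 0; i < num; i++)
		if (iov[i].iov_len)
			return false;
	return true;
}

/* Skip the first len bytes of the iovec array (they are now filled). */
static void iov_advance(struct iovec *iov, unsigned int num, size_t len)
{
	unsigned int i;

	for (i = 0; i < num && len; i++) {
		size_t used = len < iov[i].iov_len ? len : iov[i].iov_len;

		iov[i].iov_base = (char *)iov[i].iov_base + used;
		iov[i].iov_len -= used;
		len -= used;
	}
}

/* Read until all buffers are full; returns total bytes or -1 on failure. */
static ssize_t read_full_iov(int fd, struct iovec *iov, unsigned int num)
{
	ssize_t total = 0;

	while (!iov_is_empty(iov, num)) {
		ssize_t len = readv(fd, iov, (int)num);

		if (len <= 0)
			return -1;	/* error or unexpected end of file */
		iov_advance(iov, num, (size_t)len);
		total += len;
	}
	return total;
}
```

Because the helpers modify the iovec array in place, a caller that needs the original base pointers afterwards should keep its own copies.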
|
||||||
|
|
||||||
|
/* And this creates a "hardware" random number device for the Guest. */
|
||||||
|
static void setup_rng(void)
|
||||||
|
{
|
||||||
|
struct device *dev;
|
||||||
|
int fd;
|
||||||
|
|
||||||
|
fd = open_or_die("/dev/random", O_RDONLY);
|
||||||
|
|
||||||
|
/* The device responds to return from I/O thread. */
|
||||||
|
dev = new_device("rng", VIRTIO_ID_RNG, fd, handle_rng_input);
|
||||||
|
|
||||||
|
/* The device has one virtqueue, where the Guest places inbufs. */
|
||||||
|
add_virtqueue(dev, VIRTQUEUE_NUM, enable_fd);
|
||||||
|
|
||||||
|
verbose("device %u: rng\n", devices.device_num++);
|
||||||
|
}
|
||||||
/* That's the end of device setup. */
|
/* That's the end of device setup. */
|
||||||
|
|
||||||
/*L:230 Reboot is pretty easy: clean up and exec() the Launcher afresh. */
|
/*L:230 Reboot is pretty easy: clean up and exec() the Launcher afresh. */
|
||||||
|
@ -1628,11 +1873,12 @@ static void __attribute__((noreturn)) restart_guest(void)
|
||||||
{
|
{
|
||||||
unsigned int i;
|
unsigned int i;
|
||||||
|
|
||||||
/* Closing pipes causes the Waker thread and io_threads to die, and
|
/* Since we don't track all open fds, we simply close everything beyond
|
||||||
* closing /dev/lguest cleans up the Guest. Since we don't track all
|
* stderr. */
|
||||||
* open fds, we simply close everything beyond stderr. */
|
|
||||||
for (i = 3; i < FD_SETSIZE; i++)
|
for (i = 3; i < FD_SETSIZE; i++)
|
||||||
close(i);
|
close(i);
|
||||||
|
|
||||||
|
/* The exec automatically gets rid of the I/O and Waker threads. */
|
||||||
execv(main_args[0], main_args);
|
execv(main_args[0], main_args);
|
||||||
err(1, "Could not exec %s", main_args[0]);
|
err(1, "Could not exec %s", main_args[0]);
|
||||||
}
|
}
|
||||||
|
@ -1663,7 +1909,7 @@ static void __attribute__((noreturn)) run_guest(int lguest_fd)
|
||||||
/* ERESTART means that we need to reboot the guest */
|
/* ERESTART means that we need to reboot the guest */
|
||||||
} else if (errno == ERESTART) {
|
} else if (errno == ERESTART) {
|
||||||
restart_guest();
|
restart_guest();
|
||||||
/* EAGAIN means the Waker wanted us to look at some input.
|
/* EAGAIN means a signal (timeout).
|
||||||
* Anything else means a bug or incompatible change. */
|
* Anything else means a bug or incompatible change. */
|
||||||
} else if (errno != EAGAIN)
|
} else if (errno != EAGAIN)
|
||||||
err(1, "Running guest failed");
|
err(1, "Running guest failed");
|
||||||
|
@ -1691,13 +1937,14 @@ static struct option opts[] = {
|
||||||
{ "verbose", 0, NULL, 'v' },
|
{ "verbose", 0, NULL, 'v' },
|
||||||
{ "tunnet", 1, NULL, 't' },
|
{ "tunnet", 1, NULL, 't' },
|
||||||
{ "block", 1, NULL, 'b' },
|
{ "block", 1, NULL, 'b' },
|
||||||
|
{ "rng", 0, NULL, 'r' },
|
||||||
{ "initrd", 1, NULL, 'i' },
|
{ "initrd", 1, NULL, 'i' },
|
||||||
{ NULL },
|
{ NULL },
|
||||||
};
|
};
|
||||||
static void usage(void)
|
static void usage(void)
|
||||||
{
|
{
|
||||||
errx(1, "Usage: lguest [--verbose] "
|
errx(1, "Usage: lguest [--verbose] "
|
||||||
"[--tunnet=(<ipaddr>|bridge:<bridgename>)\n"
|
"[--tunnet=(<ipaddr>:<macaddr>|bridge:<bridgename>:<macaddr>)\n"
|
||||||
"|--block=<filename>|--initrd=<filename>]...\n"
|
"|--block=<filename>|--initrd=<filename>]...\n"
|
||||||
"<mem-in-mb> vmlinux [args...]");
|
"<mem-in-mb> vmlinux [args...]");
|
||||||
}
|
}
|
||||||
|
@ -1765,6 +2012,9 @@ int main(int argc, char *argv[])
|
||||||
case 'b':
|
case 'b':
|
||||||
setup_block_file(optarg);
|
setup_block_file(optarg);
|
||||||
break;
|
break;
|
||||||
|
case 'r':
|
||||||
|
setup_rng();
|
||||||
|
break;
|
||||||
case 'i':
|
case 'i':
|
||||||
initrd_name = optarg;
|
initrd_name = optarg;
|
||||||
break;
|
break;
|
||||||
|
@ -1783,6 +2033,9 @@ int main(int argc, char *argv[])
|
||||||
/* We always have a console device */
|
/* We always have a console device */
|
||||||
setup_console();
|
setup_console();
|
||||||
|
|
||||||
|
/* We can timeout waiting for Guest network transmit. */
|
||||||
|
setup_timeout();
|
||||||
|
|
||||||
/* Now we load the kernel */
|
/* Now we load the kernel */
|
||||||
start = load_kernel(open_or_die(argv[optind+1], O_RDONLY));
|
start = load_kernel(open_or_die(argv[optind+1], O_RDONLY));
|
||||||
|
|
||||||
|
@ -1826,10 +2079,10 @@ int main(int argc, char *argv[])
|
||||||
* /dev/lguest file descriptor. */
|
* /dev/lguest file descriptor. */
|
||||||
lguest_fd = tell_kernel(pgdir, start);
|
lguest_fd = tell_kernel(pgdir, start);
|
||||||
|
|
||||||
/* We fork off a child process, which wakes the Launcher whenever one
|
/* We clone off a thread, which wakes the Launcher whenever one of the
|
||||||
* of the input file descriptors needs attention. We call this the
|
* input file descriptors needs attention. We call this the Waker, and
|
||||||
* Waker, and we'll cover it in a moment. */
|
* we'll cover it in a moment. */
|
||||||
waker_fd = setup_waker(lguest_fd);
|
setup_waker(lguest_fd);
|
||||||
|
|
||||||
/* Finally, run the Guest. This doesn't return. */
|
/* Finally, run the Guest. This doesn't return. */
|
||||||
run_guest(lguest_fd);
|
run_guest(lguest_fd);
|
||||||
|
|
|
@ -36,7 +36,7 @@ It can be done by slightly modifying the standard atomic operations : only
|
||||||
their UP variant must be kept. It typically means removing LOCK prefix (on
|
their UP variant must be kept. It typically means removing LOCK prefix (on
|
||||||
i386 and x86_64) and any SMP synchronization barrier. If the architecture does
|
i386 and x86_64) and any SMP synchronization barrier. If the architecture does
|
||||||
not have a different behavior between SMP and UP, including asm-generic/local.h
|
not have a different behavior between SMP and UP, including asm-generic/local.h
|
||||||
in your archtecture's local.h is sufficient.
|
in your architecture's local.h is sufficient.
|
||||||
|
|
||||||
The local_t type is defined as an opaque signed long by embedding an
|
The local_t type is defined as an opaque signed long by embedding an
|
||||||
atomic_long_t inside a structure. This is made so a cast from this type to a
|
atomic_long_t inside a structure. This is made so a cast from this type to a
|
||||||
|
|
|
@ -236,6 +236,11 @@ All md devices contain:
|
||||||
writing the word for the desired state, however some states
|
writing the word for the desired state, however some states
|
||||||
cannot be explicitly set, and some transitions are not allowed.
|
cannot be explicitly set, and some transitions are not allowed.
|
||||||
|
|
||||||
|
Select/poll works on this file. All changes except between
|
||||||
|
active_idle and active (which can be frequent and are not
|
||||||
|
very interesting) are notified. active->active_idle is
|
||||||
|
reported if the metadata is externally managed.
|
||||||
|
|
||||||
clear
|
clear
|
||||||
No devices, no size, no level
|
No devices, no size, no level
|
||||||
Writing is equivalent to STOP_ARRAY ioctl
|
Writing is equivalent to STOP_ARRAY ioctl
|
||||||
|
@ -292,6 +297,10 @@ Each directory contains:
|
||||||
writemostly - device will only be subject to read
|
writemostly - device will only be subject to read
|
||||||
requests if there are no other options.
|
requests if there are no other options.
|
||||||
This applies only to raid1 arrays.
|
This applies only to raid1 arrays.
|
||||||
|
blocked - device has failed, metadata is "external",
|
||||||
|
and the failure hasn't been acknowledged yet.
|
||||||
|
Writes that would write to this device if
|
||||||
|
it were not faulty are blocked.
|
||||||
spare - device is working, but not a full member.
|
spare - device is working, but not a full member.
|
||||||
This includes spares that are in the process
|
This includes spares that are in the process
|
||||||
of being recovered to
|
of being recovered to
|
||||||
|
@ -301,6 +310,12 @@ Each directory contains:
|
||||||
Writing "remove" removes the device from the array.
|
Writing "remove" removes the device from the array.
|
||||||
Writing "writemostly" sets the writemostly flag.
|
Writing "writemostly" sets the writemostly flag.
|
||||||
Writing "-writemostly" clears the writemostly flag.
|
Writing "-writemostly" clears the writemostly flag.
|
||||||
|
Writing "blocked" sets the "blocked" flag.
|
||||||
|
Writing "-blocked" clears the "blocked" flag and allows writes
|
||||||
|
to complete.
|
||||||
|
|
||||||
|
This file responds to select/poll. Any change to 'faulty'
|
||||||
|
or 'blocked' causes an event.
|
||||||
|
|
||||||
errors
|
errors
|
||||||
An approximate count of read errors that have been detected on
|
An approximate count of read errors that have been detected on
|
||||||
|
@ -332,7 +347,7 @@ Each directory contains:
|
||||||
for storage of data. This will normally be the same as the
|
for storage of data. This will normally be the same as the
|
||||||
component_size. This can be written while assembling an
|
component_size. This can be written while assembling an
|
||||||
array. If a value less than the current component_size is
|
array. If a value less than the current component_size is
|
||||||
written, component_size will be reduced to this value.
|
written, it will be rejected.
|
||||||
|
|
||||||
|
|
||||||
An active md device will also contain an entry for each active device
|
An active md device will also contain an entry for each active device
|
||||||
|
@ -381,6 +396,19 @@ also have
|
||||||
'check' and 'repair' will start the appropriate process
|
'check' and 'repair' will start the appropriate process
|
||||||
providing the current state is 'idle'.
|
providing the current state is 'idle'.
|
||||||
|
|
||||||
|
This file responds to select/poll. Any important change in the value
|
||||||
|
triggers a poll event. Sometimes the value will briefly be
|
||||||
|
"recover" if a recovery seems to be needed, but cannot be
|
||||||
|
achieved. In that case, the transition to "recover" isn't
|
||||||
|
notified, but the transition away is.
|
||||||
|
|
||||||
|
degraded
|
||||||
|
This contains a count of the number of devices by which the
|
||||||
|
array is degraded. So an optimal array will show '0'. A
|
||||||
|
single failed/missing drive will show '1', etc.
|
||||||
|
This file responds to select/poll, any increase or decrease
|
||||||
|
in the count of missing devices will trigger an event.
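The select/poll support described for these md attributes follows the usual sysfs convention: read the current value, wait for an exceptional condition (POLLPRI or POLLERR) on the file descriptor, then seek back to the start and read again. The sketch below assumes that convention applies here. The helper names are hypothetical, and the path in the usage note is only an example.

```c
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Re-read an attribute from offset 0 into buf, NUL-terminating it. */
static ssize_t read_attr(int fd, char *buf, size_t size)
{
	ssize_t n;

	if (size == 0)
		return -1;
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	n = read(fd, buf, size - 1);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}

/* Wait for a change notification: 1 = changed, 0 = timeout, -1 = error. */
static int wait_attr_change(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };
	int r = poll(&pfd, 1, timeout_ms);

	if (r < 0)
		return -1;
	return r > 0;
}
```

A monitor could open, for example, /sys/block/md0/md/degraded with O_RDONLY, call read_attr() once to record the starting value, and then alternate wait_attr_change() with read_attr() to pick up each new value.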
|
||||||
|
|
||||||
mismatch_count
|
mismatch_count
|
||||||
When performing 'check' and 'repair', and possibly when
|
When performing 'check' and 'repair', and possibly when
|
||||||
performing 'resync', md will count the number of errors that are
|
performing 'resync', md will count the number of errors that are
|
||||||
|
|
|
@ -1,14 +1,22 @@
|
||||||
=============================================================================
|
=============================================================================
|
||||||
|
MOXA Smartio/Industio Family Device Driver Installation Guide
|
||||||
|
for Linux Kernel 2.4.x, 2.6.x
|
||||||
|
Copyright (C) 2008, Moxa Inc.
|
||||||
|
=============================================================================
|
||||||
|
Date: 01/21/2008
|
||||||
|
|
||||||
MOXA Smartio Family Device Driver Ver 1.1 Installation Guide
|
|
||||||
for Linux Kernel 2.2.x and 2.0.3x
|
|
||||||
Copyright (C) 1999, Moxa Technologies Co, Ltd.
|
|
||||||
=============================================================================
|
|
||||||
Content
|
Content
|
||||||
|
|
||||||
1. Introduction
|
1. Introduction
|
||||||
2. System Requirement
|
2. System Requirement
|
||||||
3. Installation
|
3. Installation
|
||||||
|
3.1 Hardware installation
|
||||||
|
3.2 Driver files
|
||||||
|
3.3 Device naming convention
|
||||||
|
3.4 Module driver configuration
|
||||||
|
3.5 Static driver configuration for Linux kernel 2.4.x and 2.6.x.
|
||||||
|
3.6 Custom configuration
|
||||||
|
3.7 Verify driver installation
|
||||||
4. Utilities
|
4. Utilities
|
||||||
5. Setserial
|
5. Setserial
|
||||||
6. Troubleshooting
|
6. Troubleshooting
|
||||||
|
@ -16,27 +24,48 @@ Content
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
1. Introduction
|
1. Introduction
|
||||||
|
|
||||||
The Smartio family Linux driver, Ver. 1.1, supports following multiport
|
The Smartio/Industio/UPCI family Linux driver supports following multiport
|
||||||
boards.
|
boards.
|
||||||
|
|
||||||
-C104P/H/HS, C104H/PCI, C104HS/PCI, CI-104J 4 port multiport board.
|
- 2 ports multiport board
|
||||||
-C168P/H/HS, C168H/PCI 8 port multiport board.
|
CP-102U, CP-102UL, CP-102UF
|
||||||
|
CP-132U-I, CP-132UL,
|
||||||
|
CP-132, CP-132I, CP132S, CP-132IS,
|
||||||
|
CI-132, CI-132I, CI-132IS,
|
||||||
|
(C102H, C102HI, C102HIS, C102P, CP-102, CP-102S)
|
||||||
|
|
||||||
This driver has been modified a little and cleaned up from the Moxa
|
- 4 ports multiport board
|
||||||
contributed driver code and merged into Linux 2.2.14pre. In particular
|
CP-104EL,
|
||||||
official major/minor numbers have been assigned which are different to
|
CP-104UL, CP-104JU,
|
||||||
those the original Moxa supplied driver used.
|
CP-134U, CP-134U-I,
|
||||||
|
C104H/PCI, C104HS/PCI,
|
||||||
|
CP-114, CP-114I, CP-114S, CP-114IS, CP-114UL,
|
||||||
|
C104H, C104HS,
|
||||||
|
CI-104J, CI-104JS,
|
||||||
|
CI-134, CI-134I, CI-134IS,
|
||||||
|
(C114HI, CT-114I, C104P)
|
||||||
|
POS-104UL,
|
||||||
|
CB-114,
|
||||||
|
CB-134I
|
||||||
|
|
||||||
|
- 8 ports multiport board
|
||||||
|
CP-118EL, CP-168EL,
|
||||||
|
CP-118U, CP-168U,
|
||||||
|
C168H/PCI,
|
||||||
|
C168H, C168HS,
|
||||||
|
(C168P),
|
||||||
|
CB-108
|
||||||
|
|
||||||
This driver and installation procedure have been developed upon Linux Kernel
|
This driver and installation procedure have been developed upon Linux Kernel
|
||||||
2.2.5 and backward compatible to 2.0.3x. This driver supports Intel x86 and
|
2.4.x and 2.6.x. This driver supports Intel x86 hardware platform. In order
|
||||||
Alpha hardware platform. In order to maintain compatibility, this version
|
to maintain compatibility, this version has also been properly tested with
|
||||||
has also been properly tested with RedHat, OpenLinux, TurboLinux and
|
RedHat, Mandrake, Fedora and S.u.S.E Linux. However, if compatibility problem
|
||||||
S.u.S.E Linux. However, if compatibility problem occurs, please contact
|
occurs, please contact Moxa at support@moxa.com.tw.
|
||||||
Moxa at support@moxa.com.tw.
|
|
||||||
|
|
||||||
In addition to device driver, useful utilities are also provided in this
|
In addition to device driver, useful utilities are also provided in this
|
||||||
version. They are
|
version. They are
|
||||||
- msdiag Diagnostic program for detecting installed Moxa Smartio boards.
|
- msdiag Diagnostic program for displaying installed Moxa
|
||||||
|
Smartio/Industio boards.
|
||||||
- msmon Monitor program to observe data count and line status signals.
|
- msmon Monitor program to observe data count and line status signals.
|
||||||
- msterm A simple terminal program which is useful in testing serial
|
- msterm A simple terminal program which is useful in testing serial
|
||||||
ports.
|
ports.
|
||||||
|
@ -47,8 +76,7 @@ Content
|
||||||
GNU General Public License in this version. Please refer to GNU General
|
GNU General Public License in this version. Please refer to GNU General
|
||||||
Public License announcement in each source code file for more detail.
|
Public License announcement in each source code file for more detail.
|
||||||
|
|
||||||
In Moxa's ftp sites, you may always find latest driver at
|
In Moxa's Web sites, you may always find latest driver at http://web.moxa.com.
|
||||||
ftp://ftp.moxa.com or ftp://ftp.moxa.com.tw.
|
|
||||||
|
|
||||||
This version of driver can be installed as Loadable Module (Module driver)
|
This version of driver can be installed as Loadable Module (Module driver)
|
||||||
or built-in into kernel (Static driver). You may refer to following
|
or built-in into kernel (Static driver). You may refer to following
|
||||||
|
@ -61,8 +89,8 @@ Content
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
2. System Requirement
|
2. System Requirement
|
||||||
- Hardware platform: Intel x86 or Alpha machine
|
- Hardware platform: Intel x86 machine
|
||||||
- Kernel version: 2.0.3x or 2.2.x
|
- Kernel version: 2.4.x or 2.6.x
|
||||||
- gcc version 2.72 or later
|
- gcc version 2.72 or later
|
||||||
- Maximum 4 boards can be installed in combination
|
- Maximum 4 boards can be installed in combination
|
||||||
|
|
||||||
|
@ -70,9 +98,18 @@ Content
|
||||||
3. Installation
|
3. Installation
|
||||||
|
|
||||||
3.1 Hardware installation
|
3.1 Hardware installation
|
||||||
|
3.2 Driver files
|
||||||
|
3.3 Device naming convention
|
||||||
|
3.4 Module driver configuration
|
||||||
|
3.5 Static driver configuration for Linux kernel 2.4.x, 2.6.x.
|
||||||
|
3.6 Custom configuration
|
||||||
|
3.7 Verify driver installation
|
||||||
|
|
||||||
There are two types of buses, ISA and PCI, for Smartio family multiport
|
|
||||||
board.
|
3.1 Hardware installation
|
||||||
|
|
||||||
|
There are two types of buses, ISA and PCI, for Smartio/Industio
|
||||||
|
family multiport board.
|
||||||
|
|
||||||
ISA board
|
ISA board
|
||||||
---------
|
---------
|
||||||
|
@ -81,47 +118,57 @@ Content
|
||||||
installation procedure in User's Manual before proceed any further.
|
installation procedure in User's Manual before proceed any further.
|
||||||
Please make sure the JP1 is open after the ISA board is set properly.
|
Please make sure the JP1 is open after the ISA board is set properly.
|
||||||
|
|
||||||
PCI board
|
PCI/UPCI board
|
||||||
---------
|
--------------
|
||||||
You may need to adjust IRQ usage in BIOS to avoid from IRQ conflict
|
You may need to adjust IRQ usage in BIOS to avoid from IRQ conflict
|
||||||
with other ISA devices. Please refer to hardware installation
|
with other ISA devices. Please refer to hardware installation
|
||||||
procedure in User's Manual in advance.
|
procedure in User's Manual in advance.
|
||||||
|
|
||||||
IRQ Sharing
|
PCI IRQ Sharing
|
||||||
-----------
|
-----------
|
||||||
Each port within the same multiport board shares the same IRQ. Up to
|
Each port within the same multiport board shares the same IRQ. Up to
|
||||||
4 Moxa Smartio Family multiport boards can be installed together on
|
4 Moxa Smartio/Industio PCI Family multiport boards can be installed
|
||||||
one system and they can share the same IRQ.
|
together on one system and they can share the same IRQ.
|
||||||
|
|
||||||
3.2 Driver files and device naming convention
|
|
||||||
|
3.2 Driver files
|
||||||
|
|
||||||
The driver file may be obtained from ftp, CD-ROM or floppy disk. The
|
The driver file may be obtained from ftp, CD-ROM or floppy disk. The
|
||||||
first step, anyway, is to copy driver file "mxser.tgz" into specified
|
first step, anyway, is to copy driver file "mxser.tgz" into specified
|
||||||
directory. e.g. /moxa. The execute commands as below.
|
directory. e.g. /moxa. The execute commands as below.
|
||||||
|
|
||||||
|
# cd /
|
||||||
|
# mkdir moxa
|
||||||
# cd /moxa
|
# cd /moxa
|
||||||
# tar xvf /dev/fd0
|
# tar xvf /dev/fd0
|
||||||
|
|
||||||
or
|
or
|
||||||
|
|
||||||
|
# cd /
|
||||||
|
# mkdir moxa
|
||||||
# cd /moxa
|
# cd /moxa
|
||||||
# cp /mnt/cdrom/<driver directory>/mxser.tgz .
|
# cp /mnt/cdrom/<driver directory>/mxser.tgz .
|
||||||
# tar xvfz mxser.tgz
|
# tar xvfz mxser.tgz
|
||||||
|
|
||||||
|
|
||||||
|
3.3 Device naming convention
|
||||||
|
|
||||||
You may find all the driver and utilities files in /moxa/mxser.
|
You may find all the driver and utilities files in /moxa/mxser.
|
||||||
Following installation procedure depends on the model you'd like to
|
Following installation procedure depends on the model you'd like to
|
||||||
run the driver. If you prefer module driver, please refer to 3.3.
|
run the driver. If you prefer module driver, please refer to 3.4.
|
||||||
If static driver is required, please refer to 3.4.
|
If static driver is required, please refer to 3.5.
|
||||||
|
|
||||||
Dialin and callout port
|
Dialin and callout port
|
||||||
-----------------------
|
-----------------------
|
||||||
This driver remains traditional serial device properties. There're
|
This driver remains traditional serial device properties. There are
|
||||||
two special file name for each serial port. One is dial-in port
|
two special file name for each serial port. One is dial-in port
|
||||||
which is named "ttyMxx". For callout port, the naming convention
|
which is named "ttyMxx". For callout port, the naming convention
|
||||||
is "cumxx".
|
is "cumxx".
|
||||||
|
|
||||||
Device naming when more than 2 boards installed
|
Device naming when more than 2 boards installed
|
||||||
-----------------------------------------------
|
-----------------------------------------------
|
||||||
Naming convention for each Smartio multiport board is pre-defined
|
Naming convention for each Smartio/Industio multiport board is
|
||||||
as below.
|
pre-defined as below.
|
||||||
|
|
||||||
Board Num. Dial-in Port Callout port
|
Board Num. Dial-in Port Callout port
|
||||||
1st board ttyM0 - ttyM7 cum0 - cum7
|
1st board ttyM0 - ttyM7 cum0 - cum7
|
||||||
|
@ -129,6 +176,12 @@ Content
|
||||||
3rd board ttyM16 - ttyM23 cum16 - cum23
|
3rd board ttyM16 - ttyM23 cum16 - cum23
|
||||||
4th board ttyM24 - ttyM31 cum24 - cum31
|
4th board ttyM24 - ttyM31 cum24 - cum31
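The table above reserves eight consecutive device numbers per board, so a dial-in name can be computed from the board position and the port index. The sketch below assumes that the rows not shown in this hunk follow the same pattern (a second board at ttyM8 - ttyM15). The function name `moxa_dialin_name` and the two constants are hypothetical and are not part of the driver or its utilities.

```c
#include <stddef.h>
#include <stdio.h>

#define PORTS_PER_BOARD 8	/* each board owns eight device numbers */
#define MAX_BOARDS      4	/* the guide allows at most four boards */

/*
 * Build the dial-in device name for a 1-based board number and a
 * 0-based port index. Returns 0 on success, -1 on bad input or if
 * the name does not fit in buf.
 */
static int moxa_dialin_name(unsigned int board, unsigned int port,
			    char *buf, size_t size)
{
	int n;

	if (board < 1 || board > MAX_BOARDS || port >= PORTS_PER_BOARD)
		return -1;
	n = snprintf(buf, size, "ttyM%u",
		     (board - 1) * PORTS_PER_BOARD + port);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For a 4-port board only port indexes 0 to 3 would have hardware behind them, although the numbering still advances by eight per board according to the table.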
|
||||||
|
|
||||||
|
|
||||||
|
!!!!!!!!!!!!!!!!!!!! NOTE !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
|
||||||
|
Under Kernel 2.6 the cum Device is Obsolete. So use ttyM*
|
||||||
|
device instead.
|
||||||
|
!!!!!!!!!!!!!!!!!!!! NOTE !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
|
||||||
|
|
||||||
Board sequence
|
Board sequence
|
||||||
--------------
|
--------------
|
||||||
This driver will activate ISA boards according to the parameter set
|
This driver will activate ISA boards according to the parameter set
|
||||||
|
@ -138,69 +191,131 @@ Content
|
||||||
For PCI boards, their sequence will be after ISA boards and C168H/PCI
|
For PCI boards, their sequence will be after ISA boards and C168H/PCI
|
||||||
has higher priority than C104H/PCI boards.
|
has higher priority than C104H/PCI boards.
|
||||||
|
|
||||||
3.3 Module driver configuration
|
3.4 Module driver configuration
|
||||||
Module driver is easiest way to install. If you prefer static driver
|
Module driver is easiest way to install. If you prefer static driver
|
||||||
installation, please skip this paragraph.
|
installation, please skip this paragraph.
|
||||||
1. Find "Makefile" in /moxa/mxser, then run
|
|
||||||
|
|
||||||
# make install
|
|
||||||
|
|
||||||
The driver files "mxser.o" and utilities will be properly compiled
|
------------- Prepare to use the MOXA driver--------------------
|
||||||
and copied to system directories respectively.Then run
|
3.4.1 Create tty device with correct major number
|
||||||
|
Before using MOXA driver, your system must have the tty devices
|
||||||
|
which are created with driver's major number. We offer one shell
|
||||||
|
script "msmknod" to simplify the procedure.
|
||||||
|
This step is only needed to be executed once. But you still
|
||||||
|
need to do this procedure when:
|
||||||
|
a. You change the driver's major number. Please refer the "3.7"
|
||||||
|
section.
|
||||||
|
b. Your total installed MOXA boards number is changed. Maybe you
|
||||||
|
add/delete one MOXA board.
|
||||||
|
c. You want to change the tty name. This needs to modify the
|
||||||
|
shell script "msmknod"
|
||||||
|
|
||||||
# insmod mxser
|
The procedure is:
|
||||||
|
|
||||||
to activate the modular driver. You may run "lsmod" to check
|
|
||||||
if "mxser.o" is activated.
|
|
||||||
|
|
||||||
2. Create special files by executing "msmknod".
|
|
||||||
# cd /moxa/mxser/driver
|
# cd /moxa/mxser/driver
|
||||||
# ./msmknod
|
# ./msmknod
|
||||||
|
|
||||||
Default major numbers for dial-in device and callout device are
|
This shell script will require the major number for dial-in
|
||||||
174, 175. Msmknod will delete any special files occupying the same
|
device and callout device to create tty device. You also need
|
||||||
device naming.
|
to specify the total installed MOXA board number. Default major
|
||||||
|
numbers for dial-in device and callout device are 30, 35. If
|
||||||
|
you need to change to other number, please refer section "3.7"
|
||||||
|
for more detailed procedure.
|
||||||
|
Msmknod will delete any special files occupying the same device
|
||||||
|
naming.
|
||||||
|
|
||||||
3. Up to now, you may manually execute "insmod mxser" to activate
|
3.4.2 Build the MOXA driver and utilities
|
||||||
this driver and run "rmmod mxser" to remove it. However, it's
|
Before using the MOXA driver and utilities, you need to compile
|
||||||
better to have a boot time configuration to eliminate manual
|
all the source code. This step only needs to be executed once.
|
||||||
operation.
|
But you still need to re-compile the source code if you modify the source
|
||||||
Boot time configuration can be achieved by rc file. Run following
|
code. For example, if you change the driver's major number (see
|
||||||
command for setting rc files.
|
"3.7" section), then you need to do this step again.
|
||||||
|
|
||||||
|
Find "Makefile" in /moxa/mxser, then run
|
||||||
|
|
||||||
|
# make clean; make install
|
||||||
|
|
||||||
|
!!!!!!!!!! NOTE !!!!!!!!!!!!!!!!!
|
||||||
|
For Red Hat 9, Red Hat Enterprise Linux AS3/ES3/WS3 & Fedora Core1:
|
||||||
|
# make clean; make installsp1
|
||||||
|
|
||||||
|
For Red Hat Enterprise Linux AS4/ES4/WS4:
|
||||||
|
# make clean; make installsp2
|
||||||
|
!!!!!!!!!! NOTE !!!!!!!!!!!!!!!!!
|
||||||
|
|
||||||
|
The driver files "mxser.o" and utilities will be properly compiled
|
||||||
|
and copied to system directories respectively.
|
||||||
|
|
||||||
|
------------- Load MOXA driver--------------------
|
||||||
|
3.4.3 Load the MOXA driver
|
||||||
|
|
||||||
|
# modprobe mxser <argument>
|
||||||
|
|
||||||
|
will activate the module driver. You may run "lsmod" to check
|
||||||
|
if "mxser" is activated. If the MOXA board is ISA board, the
|
||||||
|
<argument> is needed. Please refer to section "3.4.5" for more
|
||||||
|
information.
|
||||||
|
|
||||||
|
|
||||||
|
------------- Load MOXA driver on boot --------------------
|
||||||
|
3.4.4 For the above description, you may manually execute
|
||||||
|
"modprobe mxser" to activate this driver and run
|
||||||
|
"rmmod mxser" to remove it.
|
||||||
|
However, it's better to have a boot time configuration to
|
||||||
|
eliminate manual operation. Boot time configuration can be
|
||||||
|
achieved by rc file. We offer one "rc.mxser" file to simplify
|
||||||
|
the procedure under "moxa/mxser/driver".
|
||||||
|
|
||||||
|
But if you use ISA board, please modify the "modprobe ..." command
|
||||||
|
to add the argument (see "3.4.5" section). After modifying the
|
||||||
|
rc.mxser, please try to execute "/moxa/mxser/driver/rc.mxser"
|
||||||
|
manually to make sure the modification is ok. If any error
|
||||||
|
encountered, please try to modify again. If the modification is
|
||||||
|
completed, follow the below step.
|
||||||
|
|
||||||
|
Run following command for setting rc files.
|
||||||
|
|
||||||
# cd /moxa/mxser/driver
|
# cd /moxa/mxser/driver
|
||||||
# cp ./rc.mxser /etc/rc.d
|
# cp ./rc.mxser /etc/rc.d
|
||||||
# cd /etc/rc.d
|
# cd /etc/rc.d
|
||||||
|
|
||||||
You may have to modify part of the content in rc.mxser to specify
|
Check "rc.serial" is existed or not. If "rc.serial" doesn't exist,
|
||||||
parameters for ISA board. Please refer to rc.mxser for more detail.
|
create it by vi, run "chmod 755 rc.serial" to change the permission.
|
||||||
Find "rc.serial". If "rc.serial" doesn't exist, create it by vi.
|
Add "/etc/rc.d/rc.mxser" in last line,
|
||||||
Add "rc.mxser" in last line. Next, open rc.local by vi
|
|
||||||
and append following content.
|
|
||||||
|
|
||||||
if [ -f /etc/rc.d/rc.serial ]; then
|
Reboot and check if moxa.o activated by "lsmod" command.
|
||||||
sh /etc/rc.d/rc.serial
|
|
||||||
fi
|
|
||||||
|
|
||||||
4. Reboot and check if mxser.o activated by "lsmod" command.
|
3.4.5. If you'd like to drive Smartio/Industio ISA boards in the system,
|
||||||
5. If you'd like to drive Smartio ISA boards in the system, you'll
|
you'll have to add parameter to specify CAP address of given
|
||||||
have to add parameter to specify CAP address of given board while
|
board while activating "mxser.o". The format for parameters are
|
||||||
activating "mxser.o". The format for parameters are as follows.
|
as follows.
|
||||||
|
|
||||||
insmod mxser ioaddr=0x???,0x???,0x???,0x???
|
modprobe mxser ioaddr=0x???,0x???,0x???,0x???
|
||||||
| | | |
|
| | | |
|
||||||
| | | +- 4th ISA board
|
| | | +- 4th ISA board
|
||||||
| | +------ 3rd ISA board
|
| | +------ 3rd ISA board
|
||||||
| +------------ 2nd ISA board
|
| +------------ 2nd ISA board
|
||||||
+------------------- 1st ISA board
|
+------------------- 1st ISA board
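As an illustration of the parameter format above, the following Python sketch assembles the "modprobe mxser ioaddr=..." command line from a list of CAP addresses. The helper name build_modprobe_command is hypothetical, and the one-to-four address check is an assumption taken from the four board slots in the diagram. The sketch only builds the string; it does not load the module.

```python
def build_modprobe_command(cap_addresses):
    """Return the 'modprobe mxser ioaddr=...' string for the given CAP addresses.

    cap_addresses is a list of integers, first ISA board first.
    The 1..4 limit is an assumption based on the four slots in the diagram.
    """
    if not 1 <= len(cap_addresses) <= 4:
        raise ValueError("expected between 1 and 4 ISA board addresses")
    # Format each address as lowercase hex with a 0x prefix, comma separated.
    ioaddr = ",".join(f"0x{addr:x}" for addr in cap_addresses)
    return f"modprobe mxser ioaddr={ioaddr}"


print(build_modprobe_command([0x280, 0x180]))
```

The addresses 0x280 and 0x180 are reused from the mxserBoardCAP example in this document; any other values are up to the board's actual jumper or utility settings.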
|
||||||
|
|
||||||
3.4 Static driver configuration
|
3.5 Static driver configuration for Linux kernel 2.4.x and 2.6.x
|
||||||
|
|
||||||
1. Create link
|
Note: To use static driver, you must install the linux kernel
|
||||||
|
source package.
|
||||||
|
|
||||||
|
3.5.1 Backup the built-in driver in the kernel.
|
||||||
|
# cd /usr/src/linux/drivers/char
|
||||||
|
# mv mxser.c mxser.c.old
|
||||||
|
|
||||||
|
For Red Hat 7.x user, you need to create link:
|
||||||
|
# cd /usr/src
|
||||||
|
# ln -s linux-2.4 linux
|
||||||
|
|
||||||
|
3.5.2 Create link
|
||||||
# cd /usr/src/linux/drivers/char
|
# cd /usr/src/linux/drivers/char
|
||||||
# ln -s /moxa/mxser/driver/mxser.c mxser.c
|
# ln -s /moxa/mxser/driver/mxser.c mxser.c
|
||||||
|
|
||||||
2. Add CAP address list for ISA boards
|
3.5.3 Add CAP address list for ISA boards. For PCI boards user,
|
||||||
|
please skip this step.
|
||||||
|
|
||||||
In module mode, the CAP address for ISA board is given by
|
In module mode, the CAP address for ISA board is given by
|
||||||
parameter. In static driver configuration, you'll have to
|
parameter. In static driver configuration, you'll have to
|
||||||
assign it within driver's source code. If you will not
|
assign it within driver's source code. If you will not
|
||||||
|
@ -222,73 +337,55 @@ Content
|
||||||
static int mxserBoardCAP[]
|
static int mxserBoardCAP[]
|
||||||
= {0x280, 0x180, 0x00, 0x00};
|
= {0x280, 0x180, 0x00, 0x00};
|
||||||
|
|
||||||
3. Modify tty_io.c
|
3.5.4 Setup kernel configuration
|
||||||
# cd /usr/src/linux/drivers/char/
|
|
||||||
# vi tty_io.c
|
|
||||||
Find pty_init(), insert "mxser_init()" as
|
|
||||||
|
|
||||||
pty_init();
|
Configure the kernel:
|
||||||
mxser_init();
|
|
||||||
|
|
||||||
4. Modify tty.h
|
# cd /usr/src/linux
|
||||||
# cd /usr/src/linux/include/linux
|
# make menuconfig
|
||||||
# vi tty.h
|
|
||||||
Find extern int tty_init(void), insert "mxser_init()" as
|
|
||||||
|
|
||||||
extern int tty_init(void);
|
You will go into a menu-driven system. Please select [Character
|
||||||
extern int mxser_init(void);
|
devices][Non-standard serial port support], enable the [Moxa
|
||||||
|
SmartIO support] driver with "[*]" for built-in (not "[M]"), then
|
||||||
5. Modify Makefile
|
select [Exit] to exit this program.
|
||||||
# cd /usr/src/linux/drivers/char
|
|
||||||
# vi Makefile
|
|
||||||
Find L_OBJS := tty_io.o ...... random.o, add
|
|
||||||
"mxser.o" at last of this line as
|
|
||||||
L_OBJS := tty_io.o ....... mxser.o
|
|
||||||
|
|
||||||
6. Rebuild kernel
|
3.5.5 Rebuild kernel
|
||||||
The following are for Linux kernel rebuilding,for your reference only.
|
The following are for Linux kernel rebuilding, for your
|
||||||
|
reference only.
|
||||||
For appropriate details, please refer to the Linux document.
|
For appropriate details, please refer to the Linux document.
|
||||||
|
|
||||||
If 'lilo' utility is installed, please use 'make zlilo' to rebuild
|
|
||||||
kernel. If 'lilo' is not installed, please follow the following steps.
|
|
||||||
|
|
||||||
a. cd /usr/src/linux
|
a. cd /usr/src/linux
|
||||||
b. make clean /* take a few minutes */
|
b. make clean /* take a few minutes */
|
||||||
c. make bzImage /* take probably 10-20 minutes */
|
c. make dep /* take a few minutes */
|
||||||
d. Backup original boot kernel. /* optional step */
|
d. make bzImage /* take probably 10-20 minutes */
|
||||||
e. cp /usr/src/linux/arch/i386/boot/bzImage /boot/vmlinuz
|
e. make install /* copy boot image to correct position */
|
||||||
f. Please make sure the boot kernel (vmlinuz) is in the
|
f. Please make sure the boot kernel (vmlinuz) is in the
|
||||||
correct position. If you use 'lilo' utility, you should
|
correct position.
|
||||||
check /etc/lilo.conf 'image' item specified the path
|
g. If you use 'lilo' utility, you should check /etc/lilo.conf
|
||||||
which is the 'vmlinuz' path, or you will load wrong
|
'image' item specified the path which is the 'vmlinuz' path,
|
||||||
(or old) boot kernel image (vmlinuz).
|
or you will load wrong (or old) boot kernel image (vmlinuz).
|
||||||
g. chmod 400 /vmlinuz
|
After checking /etc/lilo.conf, please run "lilo".
|
||||||
h. lilo
|
|
||||||
i. rdev -R /vmlinuz 1
|
|
||||||
j. sync
|
|
||||||
|
|
||||||
Note that if the result of "make zImage" is ERROR, then you have to
|
Note that if the result of "make bzImage" is ERROR, then you have to
|
||||||
go back to Linux configuration Setup. Type "make config" in directory
|
go back to Linux configuration Setup. Type "make menuconfig" in
|
||||||
/usr/src/linux or "setup".
|
directory /usr/src/linux.
|
||||||
|
|
||||||
Since system include file, /usr/src/linux/include/linux/interrupt.h,
|
|
||||||
is modified each time the MOXA driver is installed, kernel rebuilding
|
|
||||||
is inevitable. And it takes about 10 to 20 minutes depends on the
|
|
||||||
machine.
|
|
||||||
|
|
||||||
7. Make utility
|
3.5.6 Make tty device and special file
|
||||||
# cd /moxa/mxser/utility
|
|
||||||
# make install
|
|
||||||
|
|
||||||
8. Make special file
|
|
||||||
# cd /moxa/mxser/driver
|
# cd /moxa/mxser/driver
|
||||||
# ./msmknod
|
# ./msmknod
|
||||||
|
|
||||||
9. Reboot
|
3.5.7 Make utility
|
||||||
|
# cd /moxa/mxser/utility
|
||||||
|
# make clean; make install
|
||||||
|
|
||||||
3.5 Custom configuration
|
3.5.8 Reboot
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
3.6 Custom configuration
|
||||||
Although this driver already provides you default configuration, you
|
Although this driver already provides you default configuration, you
|
||||||
still can change the device name and major number.The instruction to
|
still can change the device name and major number. The instruction to
|
||||||
change these parameters are shown as below.
|
change these parameters are shown as below.
|
||||||
|
|
||||||
Change Device name
|
Change Device name
|
||||||
|
@ -306,33 +403,37 @@ Content
|
||||||
2 free major numbers for this driver. There are 3 steps to change
|
2 free major numbers for this driver. There are 3 steps to change
|
||||||
major numbers.
|
major numbers.
|
||||||
|
|
||||||
1. Find free major numbers
|
3.6.1 Find free major numbers
|
||||||
In /proc/devices, you may find all the major numbers occupied
|
In /proc/devices, you may find all the major numbers occupied
|
||||||
in the system. Please select 2 major numbers that are available.
|
in the system. Please select 2 major numbers that are available.
|
||||||
e.g. 40, 45.
|
e.g. 40, 45.
|
||||||
2. Create special files
|
3.6.2 Create special files
|
||||||
Run /moxa/mxser/driver/msmknod to create special files with
|
Run /moxa/mxser/driver/msmknod to create special files with
|
||||||
specified major numbers.
|
specified major numbers.
|
||||||
3. Modify driver with new major number
|
3.6.3 Modify driver with new major number
|
||||||
Run vi to open /moxa/mxser/driver/mxser.c. Locate the line
|
Run vi to open /moxa/mxser/driver/mxser.c. Locate the line
|
||||||
contains "MXSERMAJOR". Change the content as below.
|
contains "MXSERMAJOR". Change the content as below.
|
||||||
#define MXSERMAJOR 40
|
#define MXSERMAJOR 40
|
||||||
#define MXSERCUMAJOR 45
|
#define MXSERCUMAJOR 45
|
||||||
4. Run # make install in /moxa/mxser/driver.
|
3.6.4 Run "make clean; make install" in /moxa/mxser/driver.
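The first of the steps above (finding free major numbers) can be sketched in Python as follows. This is a minimal sketch, not part of the driver package: the function names are hypothetical, and the sample text only imitates the usual layout of /proc/devices (a "Character devices:" section followed by a "Block devices:" section). On a real system the text would be read from /proc/devices instead.

```python
def used_char_majors(proc_devices_text):
    """Collect the major numbers listed under 'Character devices:'."""
    used = set()
    in_char_section = False
    for line in proc_devices_text.splitlines():
        line = line.strip()
        if line.endswith("devices:"):
            # Section header: only the character device section is relevant
            # for tty drivers.
            in_char_section = line.startswith("Character")
            continue
        if in_char_section and line:
            used.add(int(line.split()[0]))
    return used


def free_majors(proc_devices_text, candidates):
    """Return the candidate major numbers that are not yet occupied."""
    used = used_char_majors(proc_devices_text)
    return [major for major in candidates if major not in used]


# Made-up sample in /proc/devices layout (device names are placeholders).
SAMPLE = """Character devices:
  1 mem
  4 tty
 30 example_a
 35 example_b

Block devices:
  8 sd
"""

print(free_majors(SAMPLE, [30, 35, 40, 45]))
```

With this sample, 30 and 35 are reported as occupied, so 40 and 45 remain as candidates for MXSERMAJOR and MXSERCUMAJOR.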
|
||||||
|
|
||||||
3.6 Verify driver installation
|
3.7 Verify driver installation
|
||||||
You may refer to /var/log/messages to check the latest status
|
You may refer to /var/log/messages to check the latest status
|
||||||
log reported by this driver whenever it's activated.
|
log reported by this driver whenever it's activated.
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
4. Utilities
|
4. Utilities
|
||||||
There are 3 utilities contained in this driver. They are msdiag, msmon and
|
There are 3 utilities contained in this driver. They are msdiag, msmon and
|
||||||
msterm. These 3 utilities are released in form of source code. They should
|
msterm. These 3 utilities are released in form of source code. They should
|
||||||
be compiled into executable file and copied into /usr/bin.
|
be compiled into executable file and copied into /usr/bin.
|
||||||
|
|
||||||
|
Before using these utilities, please load driver (refer 3.4 & 3.5) and
|
||||||
|
make sure you had run the "msmknod" utility.
|
||||||
|
|
||||||
msdiag - Diagnostic
|
msdiag - Diagnostic
|
||||||
--------------------
|
--------------------
|
||||||
This utility provides the function to detect what Moxa Smartio multiport
|
This utility provides the function to display what Moxa Smartio/Industio
|
||||||
board exists in the system.
|
board found by driver in the system.
|
||||||
|
|
||||||
msmon - Port Monitoring
|
msmon - Port Monitoring
|
||||||
-----------------------
|
-----------------------
|
||||||
|
@ -353,12 +454,13 @@ Content
|
||||||
application, for example, sending AT command to a modem connected to the
|
application, for example, sending AT command to a modem connected to the
|
||||||
port or used as a terminal for login purpose. Note that this is only a
|
port or used as a terminal for login purpose. Note that this is only a
|
||||||
dumb terminal emulation without handling full screen operation.
|
dumb terminal emulation without handling full screen operation.
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
5. Setserial
|
5. Setserial
|
||||||
|
|
||||||
Supported Setserial parameters are listed as below.
|
Supported Setserial parameters are listed as below.
|
||||||
|
|
||||||
uart set UART type(16450-->disable FIFO, 16550A-->enable FIFO)
|
uart set UART type(16450-->disable FIFO, 16550A-->enable FIFO)
|
||||||
close_delay set the amount of time(in 1/100 of a second) that DTR
|
close_delay set the amount of time(in 1/100 of a second) that DTR
|
||||||
should be kept low while being closed.
|
should be kept low while being closed.
|
||||||
closing_wait set the amount of time(in 1/100 of a second) that the
|
closing_wait set the amount of time(in 1/100 of a second) that the
|
||||||
|
@ -366,7 +468,13 @@ Content
|
||||||
being closed, before the receiver is disable.
|
being closed, before the receiver is disable.
|
||||||
spd_hi Use 57.6kb when the application requests 38.4kb.
|
spd_hi Use 57.6kb when the application requests 38.4kb.
|
||||||
spd_vhi Use 115.2kb when the application requests 38.4kb.
|
spd_vhi Use 115.2kb when the application requests 38.4kb.
|
||||||
|
spd_shi Use 230.4kb when the application requests 38.4kb.
|
||||||
|
spd_warp Use 460.8kb when the application requests 38.4kb.
|
||||||
spd_normal Use 38.4kb when the application requests 38.4kb.
|
spd_normal Use 38.4kb when the application requests 38.4kb.
|
||||||
|
spd_cust Use the custom divisor to set the speed when the
|
||||||
|
application requests 38.4kb.
|
||||||
|
divisor This option set the custom divison.
|
||||||
|
baud_base This option set the base baud rate.
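The speed-related parameters above can be summarized as a lookup of the rate actually used when an application requests 38.4kb. The Python sketch below is an illustration only. The function name effective_rate is hypothetical, and the spd_cust rule (baud_base divided by divisor) is the usual setserial convention, assumed here rather than stated in this document.

```python
# Rate used for each flag when the application requests 38400,
# taken from the parameter list above.
SPD_FLAG_RATES = {
    "spd_normal": 38400,
    "spd_hi": 57600,
    "spd_vhi": 115200,
    "spd_shi": 230400,
    "spd_warp": 460800,
}


def effective_rate(flag, baud_base=None, divisor=None):
    """Return the line rate used when the application asks for 38400."""
    if flag == "spd_cust":
        # Assumption: custom speed = baud_base / divisor.
        if not baud_base or not divisor:
            raise ValueError("spd_cust needs non-zero baud_base and divisor")
        return baud_base / divisor
    return SPD_FLAG_RATES[flag]


print(effective_rate("spd_vhi"))
print(effective_rate("spd_cust", baud_base=921600, divisor=8))
```

The baud_base value 921600 is only an example; the correct value depends on the board.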
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
6. Troubleshooting
|
6. Troubleshooting
|
||||||
|
@ -375,8 +483,9 @@ Content
|
||||||
possible. If all the possible solutions fail, please contact our technical
|
possible. If all the possible solutions fail, please contact our technical
|
||||||
support team to get more help.
|
support team to get more help.
|
||||||
|
|
||||||
Error msg: More than 4 Moxa Smartio family boards found. Fifth board and
|
|
||||||
after are ignored.
|
Error msg: More than 4 Moxa Smartio/Industio family boards found. Fifth board
|
||||||
|
and after are ignored.
|
||||||
Solution:
|
Solution:
|
||||||
To avoid this problem, please unplug fifth and after board, because Moxa
|
To avoid this problem, please unplug fifth and after board, because Moxa
|
||||||
driver supports up to 4 boards.
|
driver supports up to 4 boards.
|
||||||
|
@ -384,7 +493,7 @@ Content
|
||||||
Error msg: Request_irq fail, IRQ(?) may be conflict with another device.
|
Error msg: Request_irq fail, IRQ(?) may be conflict with another device.
|
||||||
Solution:
|
Solution:
|
||||||
Other PCI or ISA devices occupy the assigned IRQ. If you are not sure
|
Other PCI or ISA devices occupy the assigned IRQ. If you are not sure
|
||||||
which device causes the situation,please check /proc/interrupts to find
|
which device causes the situation, please check /proc/interrupts to find
|
||||||
free IRQ and simply change another free IRQ for Moxa board.
|
free IRQ and simply change another free IRQ for Moxa board.
|
||||||
|
|
||||||
Error msg: Board #: C1xx Series(CAP=xxx) interrupt number invalid.
|
Error msg: Board #: C1xx Series(CAP=xxx) interrupt number invalid.
|
||||||
|
@ -397,15 +506,18 @@ Content
|
||||||
Moxa ISA board needs an interrupt vector.Please refer to user's manual
|
Moxa ISA board needs an interrupt vector.Please refer to user's manual
|
||||||
"Hardware Installation" chapter to set interrupt vector.
|
"Hardware Installation" chapter to set interrupt vector.
|
||||||
|
|
||||||
Error msg: Couldn't install MOXA Smartio family driver!
|
Error msg: Couldn't install MOXA Smartio/Industio family driver!
|
||||||
Solution:
|
Solution:
|
||||||
Load Moxa driver fail, the major number may conflict with other devices.
|
Load Moxa driver fail, the major number may conflict with other devices.
|
||||||
Please refer to previous section 3.5 to change a free major number for
|
Please refer to previous section 3.7 to change a free major number for
|
||||||
Moxa driver.
|
Moxa driver.
|
||||||
|
|
||||||
Error msg: Couldn't install MOXA Smartio family callout driver!
|
Error msg: Couldn't install MOXA Smartio/Industio family callout driver!
|
||||||
Solution:
|
Solution:
|
||||||
Load Moxa callout driver fail, the callout device major number may
|
Load Moxa callout driver fail, the callout device major number may
|
||||||
conflict with other devices. Please refer to previous section 3.5 to
|
conflict with other devices. Please refer to previous section 3.7 to
|
||||||
change a free callout device major number for Moxa driver.
|
change a free callout device major number for Moxa driver.
|
||||||
|
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
|
@ -289,35 +289,73 @@ downdelay
|
||||||
fail_over_mac
|
fail_over_mac
|
||||||
|
|
||||||
Specifies whether active-backup mode should set all slaves to
|
Specifies whether active-backup mode should set all slaves to
|
||||||
the same MAC address (the traditional behavior), or, when
|
the same MAC address at enslavement (the traditional
|
||||||
enabled, change the bond's MAC address when changing the
|
behavior), or, when enabled, perform special handling of the
|
||||||
active interface (i.e., fail over the MAC address itself).
|
bond's MAC address in accordance with the selected policy.
|
||||||
|
|
||||||
Fail over MAC is useful for devices that cannot ever alter
|
Possible values are:
|
||||||
their MAC address, or for devices that refuse incoming
|
|
||||||
broadcasts with their own source MAC (which interferes with
|
|
||||||
the ARP monitor).
|
|
||||||
|
|
||||||
The down side of fail over MAC is that every device on the
|
none or 0
|
||||||
network must be updated via gratuitous ARP, vs. just updating
|
|
||||||
a switch or set of switches (which often takes place for any
|
|
||||||
traffic, not just ARP traffic, if the switch snoops incoming
|
|
||||||
traffic to update its tables) for the traditional method. If
|
|
||||||
the gratuitous ARP is lost, communication may be disrupted.
|
|
||||||
|
|
||||||
When fail over MAC is used in conjuction with the mii monitor,
|
This setting disables fail_over_mac, and causes
|
||||||
devices which assert link up prior to being able to actually
|
bonding to set all slaves of an active-backup bond to
|
||||||
transmit and receive are particularly susecptible to loss of
|
the same MAC address at enslavement time. This is the
|
||||||
the gratuitous ARP, and an appropriate updelay setting may be
|
default.
|
||||||
required.
|
|
||||||
|
|
||||||
A value of 0 disables fail over MAC, and is the default. A
|
active or 1
|
||||||
value of 1 enables fail over MAC. This option is enabled
|
|
||||||
automatically if the first slave added cannot change its MAC
|
|
||||||
address. This option may be modified via sysfs only when no
|
|
||||||
slaves are present in the bond.
|
|
||||||
|
|
||||||
This option was added in bonding version 3.2.0.
|
The "active" fail_over_mac policy indicates that the
|
||||||
|
MAC address of the bond should always be the MAC
|
||||||
|
address of the currently active slave. The MAC
|
||||||
|
address of the slaves is not changed; instead, the MAC
|
||||||
|
address of the bond changes during a failover.
|
||||||
|
|
||||||
|
This policy is useful for devices that cannot ever
|
||||||
|
alter their MAC address, or for devices that refuse
|
||||||
|
incoming broadcasts with their own source MAC (which
|
||||||
|
interferes with the ARP monitor).
|
||||||
|
|
||||||
|
The down side of this policy is that every device on
|
||||||
|
the network must be updated via gratuitous ARP,
|
||||||
|
vs. just updating a switch or set of switches (which
|
||||||
|
often takes place for any traffic, not just ARP
|
||||||
|
traffic, if the switch snoops incoming traffic to
|
||||||
|
update its tables) for the traditional method. If the
|
||||||
|
gratuitous ARP is lost, communication may be
|
||||||
|
disrupted.
|
||||||
|
|
||||||
|
When this policy is used in conjuction with the mii
|
||||||
|
monitor, devices which assert link up prior to being
|
||||||
|
able to actually transmit and receive are particularly
|
||||||
|
susecptible to loss of the gratuitous ARP, and an
|
||||||
|
appropriate updelay setting may be required.
|
||||||
|
|
||||||
|
follow or 2
|
||||||
|
|
||||||
|
The "follow" fail_over_mac policy causes the MAC
|
||||||
|
address of the bond to be selected normally (normally
|
||||||
|
the MAC address of the first slave added to the bond).
|
||||||
|
However, the second and subsequent slaves are not set
|
||||||
|
to this MAC address while they are in a backup role; a
|
||||||
|
slave is programmed with the bond's MAC address at
|
||||||
|
failover time (and the formerly active slave receives
|
||||||
|
the newly active slave's MAC address).
|
||||||
|
|
||||||
|
This policy is useful for multiport devices that
|
||||||
|
either become confused or incur a performance penalty
|
||||||
|
when multiple ports are programmed with the same MAC
|
||||||
|
address.
|
||||||
|
|
||||||
|
|
||||||
|
The default policy is none, unless the first slave cannot
|
||||||
|
change its MAC address, in which case the active policy is
|
||||||
|
selected by default.
|
||||||
|
|
||||||
|
This option may be modified via sysfs only when no slaves are
|
||||||
|
present in the bond.
|
||||||
|
|
||||||
|
This option was added in bonding version 3.2.0. The "follow"
|
||||||
|
policy was added in bonding version 3.3.0.
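The three fail_over_mac policies described above can be compared with a small model. The Python sketch below is not bonding driver code. It is a simplified illustration with hypothetical function names, and it assumes that the first slave added supplies the bond's MAC address and is the initially active slave.

```python
def enslave(policy, permanent_macs):
    """Return (bond_mac, slave_macs) after all slaves have been added.

    permanent_macs maps slave name -> its own MAC; the first entry is
    treated as the first slave added (an assumption of this model).
    """
    names = list(permanent_macs)
    bond_mac = permanent_macs[names[0]]
    if policy == "none":
        # Traditional behavior: every slave gets the bond's MAC.
        slave_macs = {name: bond_mac for name in names}
    elif policy in ("active", "follow"):
        # Slaves keep their own MAC while in a backup role.
        slave_macs = dict(permanent_macs)
    else:
        raise ValueError("unknown fail_over_mac policy: " + policy)
    return bond_mac, slave_macs


def fail_over(policy, bond_mac, slave_macs, old_active, new_active):
    """Return (bond_mac, slave_macs) after the active slave changes."""
    slave_macs = dict(slave_macs)
    if policy == "active":
        # The bond takes the MAC of the newly active slave.
        bond_mac = slave_macs[new_active]
    elif policy == "follow":
        # The new active slave is programmed with the bond's MAC; the
        # formerly active slave receives the new active slave's MAC.
        slave_macs[old_active], slave_macs[new_active] = (
            slave_macs[new_active],
            bond_mac,
        )
    # "none": nothing changes, all slaves already share the bond's MAC.
    return bond_mac, slave_macs


macs = {"eth0": "00:00:00:00:00:0a", "eth1": "00:00:00:00:00:0b"}
for policy in ("none", "active", "follow"):
    bond_mac, slaves = enslave(policy, macs)
    print(policy, fail_over(policy, bond_mac, slaves, "eth0", "eth1"))
```

In this model only the "active" policy changes the bond's own MAC address at failover, which is why the text above notes that every device on the network must then be updated via gratuitous ARP.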
|
||||||
|
|
||||||
lacp_rate
|
lacp_rate
|
||||||
|
|
||||||
|
@ -338,7 +376,8 @@ max_bonds
|
||||||
Specifies the number of bonding devices to create for this
|
Specifies the number of bonding devices to create for this
|
||||||
instance of the bonding driver. E.g., if max_bonds is 3, and
|
instance of the bonding driver. E.g., if max_bonds is 3, and
|
||||||
the bonding driver is not already loaded, then bond0, bond1
|
the bonding driver is not already loaded, then bond0, bond1
|
||||||
and bond2 will be created. The default value is 1.
|
and bond2 will be created. The default value is 1. Specifying
|
||||||
|
a value of 0 will load bonding, but will not create any devices.
|
||||||
|
|
||||||
miimon
|
miimon
|
||||||
|
|
||||||
|
@ -501,6 +540,17 @@ mode
|
||||||
swapped with the new curr_active_slave that was
|
swapped with the new curr_active_slave that was
|
||||||
chosen.
|
chosen.
|
||||||
|
|
||||||
|
num_grat_arp
|
||||||
|
|
||||||
|
Specifies the number of gratuitous ARPs to be issued after a
|
||||||
|
failover event. One gratuitous ARP is issued immediately after
|
||||||
|
the failover, subsequent ARPs are sent at a rate of one per link
|
||||||
|
monitor interval (arp_interval or miimon, whichever is active).
|
||||||
|
|
||||||
|
The valid range is 0 - 255; the default value is 1. This option
|
||||||
|
affects only the active-backup mode. This option was added for
|
||||||
|
bonding version 3.3.0.
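The timing described for num_grat_arp can be written out as a small schedule. The Python sketch below is illustrative only: the function name is hypothetical, times are offsets in milliseconds from the failover event, and the monitor interval stands for arp_interval or miimon, whichever is active.

```python
def grat_arp_schedule(num_grat_arp, monitor_interval_ms):
    """Return send offsets (ms after failover) for the gratuitous ARPs.

    The first ARP goes out immediately; each subsequent one follows
    one link monitor interval later.
    """
    if not 0 <= num_grat_arp <= 255:
        raise ValueError("num_grat_arp must be in the range 0 - 255")
    return [i * monitor_interval_ms for i in range(num_grat_arp)]


print(grat_arp_schedule(3, 100))
```

With the default value of 1, only the immediate gratuitous ARP is sent.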
|
||||||
|
|
||||||
primary
|
primary
|
||||||
|
|
||||||
A string (eth0, eth2, etc) specifying which slave is the
|
A string (eth0, eth2, etc) specifying which slave is the
|
||||||
|
@ -581,7 +631,7 @@ xmit_hash_policy
|
||||||
in environments where a layer3 gateway device is
|
in environments where a layer3 gateway device is
|
||||||
required to reach most destinations.
|
required to reach most destinations.
|
||||||
|
|
||||||
This algorithm is 802.3ad complient.
|
This algorithm is 802.3ad compliant.
|
||||||
|
|
||||||
layer3+4
|
layer3+4
|
||||||
|
|
||||||
|
|
|
@ -186,7 +186,7 @@ solution for a couple of reasons:
|
||||||
|
|
||||||
The Linux network devices (by default) just can handle the
|
The Linux network devices (by default) just can handle the
|
||||||
transmission and reception of media dependent frames. Due to the
|
transmission and reception of media dependent frames. Due to the
|
||||||
arbritration on the CAN bus the transmission of a low prio CAN-ID
|
arbitration on the CAN bus the transmission of a low prio CAN-ID
|
||||||
may be delayed by the reception of a high prio CAN frame. To
|
may be delayed by the reception of a high prio CAN frame. To
|
||||||
reflect the correct* traffic on the node the loopback of the sent
|
reflect the correct* traffic on the node the loopback of the sent
|
||||||
data has to be performed right after a successful transmission. If
|
data has to be performed right after a successful transmission. If
|
||||||
|
@ -481,7 +481,7 @@ solution for a couple of reasons:
|
||||||
- stats_timer: To calculate the Socket CAN core statistics
|
- stats_timer: To calculate the Socket CAN core statistics
|
||||||
(e.g. current/maximum frames per second) this 1 second timer is
|
(e.g. current/maximum frames per second) this 1 second timer is
|
||||||
invoked at can.ko module start time by default. This timer can be
|
invoked at can.ko module start time by default. This timer can be
|
||||||
disabled by using stattimer=0 on the module comandline.
|
disabled by using stattimer=0 on the module commandline.
|
||||||
|
|
||||||
- debug: (removed since SocketCAN SVN r546)
|
- debug: (removed since SocketCAN SVN r546)
|
||||||
|
|
||||||
|
|
Some files were not shown because too many files have changed in this diff