linux-uconsole/kernel/time
Vladis Dronov 0393b87201 ptp: fix the race between the release of ptp_clock and cdev
[ Upstream commit a33121e548 ]

In a case when a ptp chardev (like /dev/ptp0) is open but an underlying
device is removed, closing this file leads to a race. This reproduces
easily in a kvm virtual machine:

ts# cat openptp0.c
int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); }
ts# uname -r
5.5.0-rc3-46cf053e
ts# cat /proc/cmdline
... slub_debug=FZP
ts# modprobe ptp_kvm
ts# ./openptp0 &
[1] 670
opened /dev/ptp0, sleeping 10s...
ts# rmmod ptp_kvm
ts# ls /dev/ptp*
ls: cannot access '/dev/ptp*': No such file or directory
ts# ...woken up
[   48.010809] general protection fault: 0000 [#1] SMP
[   48.012502] CPU: 6 PID: 658 Comm: openptp0 Not tainted 5.5.0-rc3-46cf053e #25
[   48.014624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
[   48.016270] RIP: 0010:module_put.part.0+0x7/0x80
[   48.017939] RSP: 0018:ffffb3850073be00 EFLAGS: 00010202
[   48.018339] RAX: 000000006b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: ffff89a476c00ad0
[   48.018936] RDX: fffff65a08d3ea08 RSI: 0000000000000247 RDI: 6b6b6b6b6b6b6b6b
[   48.019470] ...                                              ^^^ a slub poison
[   48.023854] Call Trace:
[   48.024050]  __fput+0x21f/0x240
[   48.024288]  task_work_run+0x79/0x90
[   48.024555]  do_exit+0x2af/0xab0
[   48.024799]  ? vfs_write+0x16a/0x190
[   48.025082]  do_group_exit+0x35/0x90
[   48.025387]  __x64_sys_exit_group+0xf/0x10
[   48.025737]  do_syscall_64+0x3d/0x130
[   48.026056]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   48.026479] RIP: 0033:0x7f53b12082f6
[   48.026792] ...
[   48.030945] Modules linked in: ptp i6300esb watchdog [last unloaded: ptp_kvm]
[   48.045001] Fixing recursive fault but reboot is needed!

This happens in:

static void __fput(struct file *file)
{   ...
    if (file->f_op->release)
        file->f_op->release(inode, file); <<< cdev is kfree'd here
    if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
             !(mode & FMODE_PATH))) {
        cdev_put(inode->i_cdev); <<< cdev fields are accessed here

Namely:

__fput()
  posix_clock_release()
    kref_put(&clk->kref, delete_clock) <<< the last reference
      delete_clock()
        delete_ptp_clock()
          kfree(ptp) <<< cdev is embedded in ptp
  cdev_put
    module_put(p->owner) <<< *p is kfree'd, bang!

Here cdev is embedded in posix_clock which is embedded in ptp_clock.
The race happens because ptp_clock's lifetime is controlled by two
refcounts: kref and cdev.kobj in posix_clock. This is wrong.

Make ptp_clock's sysfs device a parent of cdev with cdev_device_add()
created especially for such cases. This way the parent device with its
ptp_clock is not released until all references to the cdev are released.
This adds a requirement that an initialized but not exposed struct
device should be provided to posix_clock_register() by a caller instead
of a simple dev_t.

This approach was adopted from the commit 72139dfa24 ("watchdog: Fix
the race between the release of watchdog_core_data and cdev"). See
details of the implementation in the commit 233ed09d7f ("chardev: add
helper function to register char devs with a struct device").

Link: https://lore.kernel.org/linux-fsdevel/20191125125342.6189-1-vdronov@redhat.com/T/#u
Analyzed-by: Stephen Johnston <sjohnsto@redhat.com>
Analyzed-by: Vern Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04 19:13:35 +01:00
..
alarmtimer.c alarmtimer: Use EOPNOTSUPP instead of ENOTSUPP 2019-10-05 13:10:07 +02:00
clockevents.c clockevents: Warn if cpu_all_mask is used as cpumask 2018-08-02 14:55:53 +02:00
clocksource.c clocksource: Revert "Remove kthread" 2018-09-06 23:38:35 +02:00
hrtimer.c hrtimer: Annotate lockless access to timer->state 2020-01-04 19:13:32 +01:00
itimer.c pid: Implement PIDTYPE_TGID 2018-07-21 10:43:12 -05:00
jiffies.c jiffies: Revert bogus conversion of NSEC_PER_SEC to TICK_NSEC 2017-03-07 11:03:28 +01:00
Kconfig sched/isolation: Eliminate NO_HZ_FULL_ALL 2018-02-15 15:40:37 -08:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ntp.c ntp: Limit TAI-UTC offset 2019-07-26 09:14:10 +02:00
ntp_internal.h timekeeping/ntp: Constify some function arguments 2018-07-19 17:08:05 -07:00
posix-clock.c ptp: fix the race between the release of ptp_clock and cdev 2020-01-04 19:13:35 +01:00
posix-cpu-timers.c posix-cpu-timers: Sanitize bogus WARNONS 2019-10-05 13:09:47 +02:00
posix-stubs.c posix-timers: Use new ktime_get_*_ts64() helpers 2018-06-19 09:56:27 +02:00
posix-timers.c posix-timers: Fix division by zero bug 2018-12-29 13:37:56 +01:00
posix-timers.h posix-timers: Make forward callback return s64 2018-07-02 11:33:25 +02:00
sched_clock.c timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze() 2019-04-27 09:36:38 +02:00
test_udelay.c
tick-broadcast-hrtimer.c tick: broadcast-hrtimer: Fix a race in bc_set_next 2019-10-11 18:21:28 +02:00
tick-broadcast.c tick/broadcast: Use for_each_cpu() specially on UP kernels 2018-05-15 22:45:54 +02:00
tick-common.c timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze() 2019-04-27 09:36:38 +02:00
tick-internal.h Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME 2018-04-26 14:53:32 +02:00
tick-oneshot.c clockevents: Fix kernel messages split across multiple lines 2018-04-17 17:18:04 +02:00
tick-sched.c nohz: Fix local_timer_softirq_pending() 2018-07-31 22:08:44 +02:00
tick-sched.h nohz: Gather tick_sched booleans under a common flag field 2018-04-09 11:54:57 +02:00
time.c y2038: make do_gettimeofday() and get_seconds() inline 2019-11-20 18:45:24 +01:00
timeconst.bc time: Introduce jiffies64_to_nsecs() 2017-02-01 09:13:45 +01:00
timeconv.c
timecounter.c clocksource: Use a plain u64 instead of cycle_t 2016-12-25 11:04:12 +01:00
timekeeping.c y2038: make do_gettimeofday() and get_seconds() inline 2019-11-20 18:45:24 +01:00
timekeeping.h timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze() 2019-04-27 09:36:38 +02:00
timekeeping_debug.c timekeeping/ntp: Constify some function arguments 2018-07-19 17:08:05 -07:00
timekeeping_internal.h timekeeping/ntp: Constify some function arguments 2018-07-19 17:08:05 -07:00
timer.c timer: Read jiffies once when forwarding base clk 2019-10-11 18:20:59 +02:00
timer_list.c timer_list: Guard procfs specific code 2019-07-26 09:14:10 +02:00