mirror of https://git.FreeBSD.org/src.git synced 2024-12-24 11:29:10 +00:00
freebsd/sys
Attilio Rao 3d7acbbabf Fix several callout migration races:
- Problem1:
   Hypothesis: thread1 is doing a callout_reset_on() from within its
   callout handler, intending to implicitly or explicitly migrate the
   callout.  thread2 is draining the callout.

   Thesis:
   * thread1 calls callout_lock() and locks the old callout cpu
   * thread1 performs the checks in the first part of
     callout_reset_on()
   * thread1 hits this code fragment:
       /*
        * If the lock must migrate we have to check the state again as
        * we can't hold both the new and old locks simultaneously.
        */
       if (c->c_cpu != cpu) {
               c->c_cpu = cpu;
               CC_UNLOCK(cc);
               goto retry;
       }

     which means it will drop the lock and 'retry'
   * thread2 calls callout_lock() and locks the new callout cpu.
     thread1 spins on the new lock and cannot proceed for the
     moment.
   * thread2 checks that the callout is not pending (as the callout is
     currently running) and that it is not on cc->cc_curr (because cc
     now refers to the new callout cpu and the callout is running on
     the old callout cpu), thus it thinks it is done and returns.
   * thread1 now acquires the lock and adds the callout to the new
     callout cpu queue

   This is an obvious race: callout_stop() falsely reports the
   callout stopped or, worse, callout_drain() falsely returns
   while the callout is still in use.
 - Solution1:
   Fixing this problem would require, in general, locking both
   callout cpus at once while switching the c_cpu field, while avoiding
   cyclic deadlocks between the callout cpu locks.
   The concept of CPUBLOCK is then introduced (working more or less
   like the blocked_lock for the thread_lock() function) meaning:
   "in callout_lock(), spin until c->c_cpu is different from
   CPUBLOCK". That way the "original" callout cpu, referred to in the
   above mentioned code snippet, will remain blocked until the lock
   handover is over, so the critical path remains covered.
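
The CPUBLOCK handover described above can be modelled in userspace as a
minimal sketch. This is not the kernel code: the cc_cpu[] array, the
cc_lock()/cc_unlock() helpers, the cc_locked field, NCPU and the
callout_cpu_switch() name are assumptions made for illustration, and a
C11 atomic flag stands in for the per-cpu spin mutex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NCPU      4
#define CPUBLOCK  NCPU	/* assumed sentinel: never a valid cc_cpu[] index */

struct callout_cpu {
	atomic_bool cc_locked;	/* stand-in for the per-cpu spin mutex */
};

struct callout {
	atomic_int c_cpu;	/* owning callout cpu, or CPUBLOCK */
};

static struct callout_cpu cc_cpu[NCPU];

static void
cc_lock(struct callout_cpu *cc)
{
	/* Spin until the previous value was "unlocked". */
	while (atomic_exchange(&cc->cc_locked, true))
		;
}

static void
cc_unlock(struct callout_cpu *cc)
{
	assert(atomic_load(&cc->cc_locked));
	atomic_store(&cc->cc_locked, false);
}

/* Lock the callout cpu that currently owns c, waiting out any handover. */
static struct callout_cpu *
callout_lock(struct callout *c)
{
	for (;;) {
		int cpu = atomic_load(&c->c_cpu);

		if (cpu == CPUBLOCK) {
			/* A migration is in flight: spin until it ends. */
			while (atomic_load(&c->c_cpu) == CPUBLOCK)
				;
			continue;
		}
		struct callout_cpu *cc = &cc_cpu[cpu];
		cc_lock(cc);
		/* Re-check: the callout may have moved before we locked. */
		if (cpu == atomic_load(&c->c_cpu))
			return (cc);
		cc_unlock(cc);
	}
}

/*
 * Hand the callout over from cc (held by the caller) to new_cpu.
 * The two cpu locks are never held together; CPUBLOCK covers the gap.
 */
static struct callout_cpu *
callout_cpu_switch(struct callout *c, struct callout_cpu *cc, int new_cpu)
{
	struct callout_cpu *new_cc = &cc_cpu[new_cpu];

	atomic_store(&c->c_cpu, CPUBLOCK);
	cc_unlock(cc);
	cc_lock(new_cc);
	atomic_store(&c->c_cpu, new_cpu);
	return (new_cc);
}
```

In this model a concurrent callout_lock() caller can never observe the
callout as belonging to the new cpu before the migrating thread holds
the new lock, which is the window thread2 slipped through in Problem1.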

 - Problem2:
   Having the callout currently executing on a specific callout cpu
   and simultaneously pending on another callout cpu (as can happen
   with the current code) breaks, at least, the assumption that
   callout_drain() returns only once the callout can no longer be
   referenced.
 - Solution2:
   Callout migration is deferred if the callout is currently
   executing.
   The best place to do that is in softclock(), and new members are
   added to the callout cpu structure in order to record that a
   migration is pending. That is necessary because the callout
   cannot be trusted (it may have been freed) after the execution
   of the callout handler.
   In the "deferred migration" case, CPUBLOCK prevents the callout
   from being freed by stalling any callout_stop() and
   callout_drain() activity until the migration is actually
   performed.
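
The deferred migration can be illustrated with a single-threaded
sketch. It is a simplified model under stated assumptions, not the
committed code: locking and CPUBLOCK are omitted, each callout cpu has
a one-slot queue, and the field names cc_queue, cc_migration_cpu,
cc_migration_func, c_pending and the CPUNONE sentinel are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPU     2
#define CPUNONE  (-1)	/* assumed sentinel: no migration requested */

struct callout;
typedef void callout_fn(struct callout *);

struct callout {
	int         c_cpu;	/* callout cpu that owns this callout */
	bool        c_pending;	/* queued and waiting to run */
	callout_fn *c_func;
};

struct callout_cpu {
	struct callout *cc_curr;	/* callout whose handler is running */
	struct callout *cc_queue;	/* one-slot stand-in for the wheel */
	int             cc_migration_cpu;	/* deferred target cpu */
	callout_fn     *cc_migration_func;	/* deferred handler */
};

static struct callout_cpu cc_cpu[NCPU] = {
	{ .cc_migration_cpu = CPUNONE },
	{ .cc_migration_cpu = CPUNONE },
};

static void
callout_reset_on(struct callout *c, callout_fn *fn, int cpu)
{
	struct callout_cpu *cc = &cc_cpu[c->c_cpu];

	if (cc->cc_curr == c && cpu != c->c_cpu) {
		/*
		 * The handler is running here: record the request in the
		 * callout cpu and let softclock() migrate afterwards.
		 */
		cc->cc_migration_cpu = cpu;
		cc->cc_migration_func = fn;
		return;
	}
	if (c->c_pending)
		cc->cc_queue = NULL;	/* dequeue from the old cpu */
	c->c_func = fn;
	c->c_cpu = cpu;
	c->c_pending = true;
	cc_cpu[cpu].cc_queue = c;
}

static void
softclock(int cpu)
{
	struct callout_cpu *cc = &cc_cpu[cpu];
	struct callout *c = cc->cc_queue;

	if (c == NULL)
		return;
	cc->cc_queue = NULL;
	c->c_pending = false;
	cc->cc_curr = c;
	c->c_func(c);
	cc->cc_curr = NULL;
	/* Only the callout cpu state is consulted before touching c. */
	if (cc->cc_migration_cpu != CPUNONE) {
		int new_cpu = cc->cc_migration_cpu;
		callout_fn *fn = cc->cc_migration_func;

		cc->cc_migration_cpu = CPUNONE;
		cc->cc_migration_func = NULL;
		callout_reset_on(c, fn, new_cpu);
	}
}

static int handler_runs;

/* Demo handler that asks to be rescheduled on cpu 1 while running. */
static void
migrating_handler(struct callout *c)
{
	handler_runs++;
	callout_reset_on(c, migrating_handler, 1);
	assert(!c->c_pending);	/* never running here and pending elsewhere */
}
```

Calling callout_reset_on(&c, migrating_handler, 0) followed by
softclock(0) runs the handler once on cpu 0 and only then queues the
callout on cpu 1, so it is never executing on one callout cpu while
pending on another.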

 - Problem3:
   There is a further race in callout_drain().
   In order to avoid a race between sleepqueue lock and callout cpu
   spinlock, in _callout_stop_safe(), the callout cpu lock is dropped,
   the sleepqueue lock is acquired and a new callout cpu lookup is
   performed.  Note that the channel used for locking the sleepqueue is
   obtained from the "current" callout cpu (&cc->cc_waiting).
   If the callout migrated in the meantime, callout_drain() will end up
   using the wrong wchan for the sleepqueue (the locked one will be the
   older one, while the new one will not really be locked), leading to
   a lock leak and racy access to the sleepqueue.
 - Solution3:
   It is enough to check whether a migration happened between
   acquiring the sleepqueue lock and acquiring the new callout cpu
   lock and, if so, unwind both and try again.
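
The check-and-retry can be sketched in a single-threaded model. This
is an illustration under assumptions rather than the committed code:
plain integers stand in for the callout cpu spinlock and the sleepqueue
chain lock, drain_lock_both() is a hypothetical helper, and
unlocked_window_hook is a hypothetical test hook that simulates a
migration happening while no callout cpu lock is held.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU 2

struct callout_cpu {
	int cc_locked;		/* models the callout cpu spinlock */
	int cc_waiting_locked;	/* models the sleepqueue lock on &cc->cc_waiting */
};

struct callout {
	int c_cpu;
};

static struct callout_cpu cc_cpu[NCPU];

/* Hypothetical hook: runs in the window where the cpu lock is dropped. */
static void (*unlocked_window_hook)(struct callout *);

static struct callout_cpu *
callout_lock(struct callout *c)
{
	struct callout_cpu *cc = &cc_cpu[c->c_cpu];

	assert(!cc->cc_locked);
	cc->cc_locked = 1;
	return (cc);
}

static void
cc_unlock(struct callout_cpu *cc)
{
	assert(cc->cc_locked);
	cc->cc_locked = 0;
}

/*
 * Return with both the sleepqueue lock and the callout cpu lock held
 * for the same callout cpu, retrying if the callout migrated between.
 */
static struct callout_cpu *
drain_lock_both(struct callout *c, int *retries)
{
	struct callout_cpu *cc, *old_cc;

	cc = callout_lock(c);
	for (;;) {
		/* The sleepqueue lock comes first, so drop the cpu lock. */
		cc_unlock(cc);
		old_cc = cc;
		old_cc->cc_waiting_locked = 1;
		if (unlocked_window_hook != NULL)
			unlocked_window_hook(c);	/* may migrate c */
		cc = callout_lock(c);
		if (cc == old_cc)
			return (cc);
		/* Migrated: the wrong wchan is locked. Unwind and retry. */
		old_cc->cc_waiting_locked = 0;
		(*retries)++;
	}
}

/* Demo hook: migrate the callout to cpu 1 exactly once. */
static void
migrate_once(struct callout *c)
{
	c->c_cpu = 1;
	unlocked_window_hook = NULL;
}
```

With unlocked_window_hook set to migrate_once, drain_lock_both()
retries once and returns holding both locks for cpu 1, with nothing
left locked on cpu 0.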

These problems can lead to deadly races on moderate (4-way) SMP
environments, easily leading to panics or deadlocks.
The reporter's 24-way machine could easily panic, under a completely
normal workload, almost daily.
gianni@ kindly wrote the following proof-of-concept which can
panic a FreeBSD machine in less than one hour, on smaller SMP systems:
http://www.freebsd.org/~attilio/callout/test.c

Reported by:	Nicholas Esborn <nick at desert dot net>, DesertNet
In collaboration with:	gianni, pho, Nicholas Esborn
Reviewed by:	jhb
MFC after:	1 week (*)

* Usually, I would aim for a larger MFC timeout, but I really want this
  in before 8.2-RELEASE, thus re@ accepted a shorter timeout as a special
  case for this patch
2010-12-29 18:17:36 +00:00
amd64 Increase size of pcb_flags to four bytes. 2010-12-22 19:57:03 +00:00
arm IXP4XX_GPIO_{,UN}LOCK() don't take args. Remove the sc here to make 2010-12-23 19:28:50 +00:00
boot Give a bit of a hint of the failure (read != expected) but don't make 2010-11-25 03:16:31 +00:00
bsm
cam Fix a few issues related to the XPT_GDEV_ADVINFO CCB. 2010-12-10 21:38:51 +00:00
cddl cyclic xcall: use smp_no_rendevous_barrier as setup function parameter 2010-12-17 18:22:50 +00:00
compat Merge amd64 and i386 bus.h and move the resulting header to x86. Replace 2010-12-20 16:39:43 +00:00
conf MIPS has lots of flavors as well 2010-12-28 22:49:28 +00:00
contrib Update firmware for wpi(4) from version 2.14.4 to 15.32.2.9. 2010-12-19 11:37:44 +00:00
crypto Remove DEBUG sections. 2010-11-27 15:41:44 +00:00
ddb
dev Add reporting of GEOM::candelete BIO_GETATTR for md(4) and geom_disk(4). 2010-12-29 12:11:07 +00:00
fs Delete the nfsvno_localconflict() function in the experimental 2010-12-28 23:50:13 +00:00
gdb there must be only one SYSINIT with SI_SUB_RUN_SCHEDULER+SI_ORDER_ANY order 2010-09-30 17:05:23 +00:00
geom Add reporting of GEOM::candelete BIO_GETATTR for md(4) and geom_disk(4). 2010-12-29 12:11:07 +00:00
gnu Remove prtactive variable and related printf()s in the vop_inactive 2010-11-19 21:17:34 +00:00
i386 Revert r216777, per jhb@ 2010-12-28 22:45:29 +00:00
ia64 Revert r216134. This checkin broke platforms where bus_space are macros: 2010-12-03 07:09:23 +00:00
isa bus_add_child: change type of order parameter to u_int 2010-09-10 11:19:03 +00:00
kern Fix several callout migration races: 2010-12-29 18:17:36 +00:00
kgssapi
libkern Add support for asterisk characters when filling in the GELI password 2010-11-14 14:12:43 +00:00
mips When allocating memory from bootmem for the kernel to use, try to leave about 2010-12-28 20:11:54 +00:00
modules Update firmware for wpi(4) from version 2.14.4 to 15.32.2.9. 2010-12-19 11:37:44 +00:00
net Introduce and use a new VM interface for temporarily pinning pages. This 2010-12-25 21:26:56 +00:00
net80211 The meshid element is memcpy()'ed into se_meshid if included in either 2010-11-22 19:01:47 +00:00
netatalk
netgraph Simplify ng_pipe locking model by relying on the netgraph framework 2010-11-24 16:02:58 +00:00
netinet Add a comment for the ccv member of struct tcpcb. 2010-12-28 12:37:57 +00:00
netinet6 Improve plausibility check in sctp_handle_sack(). 2010-12-22 17:59:38 +00:00
netipsec After some off-list discussion, revert a number of changes to the 2010-11-22 19:32:54 +00:00
netipx
netnatm
netncp
netsmb
nfs Fix the type of the 3rd argument for nm_getinfo so that it works 2010-10-19 11:55:58 +00:00
nfsclient Remove prtactive variable and related printf()s in the vop_inactive 2010-11-19 21:17:34 +00:00
nfsserver ZFS might not return monotonically increasing directory offset cookies, 2010-12-28 21:12:15 +00:00
nlm Modify the NFS clients and the NLM so that the NLM can be used 2010-10-19 00:20:00 +00:00
opencrypto Let cryptosoft(4) add its pseudo-device with a specific unit number and its 2010-11-14 13:09:32 +00:00
pc98 Merge amd64 and i386 bus.h and move the resulting header to x86. Replace 2010-12-20 16:39:43 +00:00
pci Remove standard PCI configuration space register definitions. 2010-11-08 22:10:51 +00:00
powerpc Only keep track of PTE validity statistics for pages not locked in the 2010-12-28 17:02:15 +00:00
rpc Fix the krpc so that it can handle NFSv3,UDP mounts with a read/write 2010-10-13 00:57:14 +00:00
security Fix typos. 2010-11-09 10:59:09 +00:00
sparc64 On UltraSPARC-III+ and greater take advantage of ASI_ATOMIC_QUAD_LDD_PHYS, 2010-12-29 16:59:33 +00:00
sun4v Revert r216134. This checkin broke platforms where bus_space are macros: 2010-12-03 07:09:23 +00:00
sys - Follow r216313, the sched_unlend_user_prio is no longer needed, always 2010-12-29 09:26:46 +00:00
teken Use proper bounds checking on VPA. 2010-12-05 10:15:23 +00:00
tools Add an extra comment to the SDT probes definition. This allows us to get 2010-08-22 11:18:57 +00:00
ufs Add kernel side support for BIO_DELETE/TRIM on UFS. 2010-12-29 12:25:28 +00:00
vm Move the increment of vm object generation count into 2010-12-29 12:53:53 +00:00
x86 Drop the icu_lock spinlock while pausing briefly after masking the 2010-12-23 15:17:28 +00:00
xdr
xen Fix a typo in a comment. 2010-12-14 20:57:40 +00:00
Makefile Add lex and yacc sources to things cscope'd. 2010-11-21 03:58:11 +00:00