1
0
mirror of https://git.FreeBSD.org/src.git synced 2024-12-22 11:17:19 +00:00
freebsd/sys/dev
Bill Paul 7c1968ad82 Finally bring an end to the great "make the Atheros NDIS driver
work on SMP" saga. After several weeks and much gnashing of teeth,
I have finally tracked down all the problems, despite their best
efforts to confound and annoy me.

Problem nunmber one: the Atheros windows driver is _NOT_ a de-serialized
miniport! It used to be that NDIS drivers relied on the NDIS library
itself for all their locking and serialization needs. Transmit packet
queues were all handled internally by NDIS, and all calls to
MiniportXXX() routines were guaranteed to be appropriately serialized.
This proved to be a performance problem however, and Microsoft
introduced de-serialized miniports with the NDIS 5.x spec. Microsoft
still supports serialized miniports, but recommends that all new drivers
written for Windows XP and later be deserialized. Apparently Atheros
wasn't listening when they said this.

This means (among other things) that we have to serialize calls to
MiniportSendPackets(). We also have to serialize calls to MiniportTimer()
that are triggered via the NdisMInitializeTimer() routine. It finally
dawned on me why NdisMInitializeTimer() takes a special
NDIS_MINIPORT_TIMER structure and a pointer to the miniport block:
the timer callback must be serialized, and it's only by saving the
miniport block handle that we can get access to the serialization
lock during the timer callback.

Problem number two: haunted hardware. The thing that was _really_
driving me absolutely bonkers for the longest time is that, for some
reason I couldn't understand, my test machine would occasionally freeze
or more frustratingly, reset completely. That's reset and in *pow!*
back to the BIOS startup. No panic, no crashdump, just a reset. This
appeared to happen most often when MiniportReset() was called. (As
to why MiniportReset() was being called, see problem three below.)
I thought maybe I had created some sort of horrible deadlock
condition in the process of adding the serialization, but after three
weeks, at least 6 different locking implementations and heroic efforts
to debug the spinlock code, the machine still kept resetting. Finally,
I started single stepping through the MiniportReset() routine in
the driver using the kernel debugger, and this ultimately led me to
the source of the problem.

One of the last things the Atheros MiniportReset() routine does is
call NdisReadPciSlotInformation() several times to inspect a portion
of the device's PCI config space. It reads the same chunk of config
space repeatedly, in rapid succession. Presumeably, it's polling
the hardware for some sort of event. The reset occurs partway through
this process. I discovered that when I single-stepped through this
portion of the routine, the reset didn't occur. So I inserted a 1
microsecond delay into the read loop in NdisReadPciSlotInformation().
Suddenly, the reset was gone!!

I'm still very puzzled by the whole thing. What I suspect is happening
is that reading the PCI config space so quickly is causing a severe
PCI bus error. My test system is a Sun w2100z dual Opteron system,
and the NIC is a miniPCI card mounted in a miniPCI-to-PCI carrier card,
plugged into a 100Mhz PCI slot. It's possible that this combination of
hardware causes a bus protocol violation in this scenario which leads
to a fatal machine check. This is pure speculation though. Really all I
know for sure is that inserting the delay makes the problem go away.
(To quote Homer Simpson: "I don't know how it works, but fire makes
it good!")

Problem number three: NdisAllocatePacket() needs to make sure to
initialize the npp_validcounts field in the 'private' section of
the NDIS_PACKET structure. The reason if_ndis was calling the
MiniportReset() routine in the first place is that packet transmits
were sometimes hanging. When sending a packet, an NDIS driver will
call NdisQueryPacket() to learn how many physical buffers the packet
resides in. NdisQueryPacket() is actually a macro, which traverses
the NDIS_BUFFER list attached to the NDIS_PACKET and stashes some
of the results in the 'private' section of the NDIS_PACKET. It also
sets the npp_validcounts field to TRUE To indicate that the results are
now valid. The problem is, now that if_ndis creates a pool of transmit
packets via NdisAllocatePacketPool(), it's important that each time
a new packet is allocated via NdisAllocatePacket() that validcounts
be initialized to FALSE. If it isn't, and a previously transmitted
NDIS_PACKET is pulled out of the pool, it may contain stale data
from a previous transmission which won't get updated by NdisQueryPacket().
This would cause the driver to miscompute the number of fragments
for a given packet, and botch the transmission.

Fixing these three problems seems to make the Atheros driver happy
on SMP, which hopefully means other serialized miniports will be
happy too.

And there was much rejoicing.

Other stuff fixed along the way:

- Modified ndis_thsuspend() to take a mutex as an argument. This
  allows KeWaitForSingleObject() and KeWaitForMultipleObjects() to
  avoid any possible race conditions with other routines that
  use the dispatcher lock.

- Fixed KeCancelTimer() so that it returns the correct value for
  'pending' according to the Microsoft documentation

- Modfied NdisGetSystemUpTime() to use ticks and hz rather than
  calling nanouptime(). Also added comment that this routine wraps
  after 49.7 days.

- Added macros for KeAcquireSpinLock()/KeReleaseSpinLock() to hide
  all the MSCALL() goop.

- For x86, KeAcquireSpinLockRaiseToDpc() needs to be a separate
  function. This is because it's supposed to be _stdcall on the x86
  arch, whereas KeAcquireSpinLock() is supposed to be _fastcall.
  On amd64, all routines use the same calling convention so we can
  just map KeAcquireSpinLockRaiseToDpc() directly to KfAcquireSpinLock()
  and it will work. (The _fastcall attribute is a no-op on amd64.)

- Implement and use IoInitializeDpcRequest() and IoRequestDpc() (they're
  just macros) and use them for interrupt handling. This allows us to
  move the ndis_intrtask() routine from if_ndis.c to kern_ndis.c.

- Fix the MmInitializeMdl() macro so that is uses sizeof(vm_offset_t)
  when computing mdl_size instead of uint32_t, so that it matches the
  MmSizeOfMdl() routine.

- Change a could of M_WAITOKs to M_NOWAITs in the unicode routines in
  subr_ndis.c.

- Use the dispatcher lock a little more consistently in subr_ntoskrnl.c.

- Get rid of the "wait for link event" hack in ndis_init(). Now that
  I fixed NdisReadPciSlotInformation(), it seems I don't need it anymore.
  This should fix the witness panic a couple of people have reported.

- Use MSCALL1() when calling the MiniportHangCheck() function in
  ndis_ticktask(). I accidentally missed this one when adding the
  wrapping for amd64.
2005-03-27 10:14:36 +00:00
..
aac purge dead code 2005-03-26 23:37:54 +00:00
acpi_support Use device_set_desc_copy() for non-constant strings. 2005-03-24 21:07:55 +00:00
acpica If a device_add_child fails (i.e. low memory situation), be sure to free 2005-03-27 03:37:43 +00:00
adlink Instead of a rather useless generation number, use a sample number to 2005-03-19 12:55:46 +00:00
advansys Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 19:24:22 +00:00
agp
aha
ahb
aic
aic7xxx Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 19:24:22 +00:00
amd Use BUS_PROBE_DEFAULT 2005-03-06 06:55:11 +00:00
amr Fix a null pointer de-ref when passthrough ioctls are issued. This 2005-03-13 06:25:53 +00:00
an Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 19:06:12 +00:00
ar Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 19:06:12 +00:00
arl
asr Bring back some of the ioctl junk that was removed in rev 1.59 as a 2005-03-17 01:20:49 +00:00
ata Whitespace nit. Clarifies which body this line belongs to. 2005-03-06 10:17:30 +00:00
ath fix braino introduced when converting from madwifi 2005-03-20 01:55:02 +00:00
atkbdc
auxio
awi reclaim mbuf chain when ieee80211_crypto_encap fails 2005-03-08 17:01:03 +00:00
bfe Releasing TX/RX descriptor dmamaps during device detachment instead of 2005-03-17 13:59:30 +00:00
bge Adding new device ID for BCM5751M support. 2005-03-12 06:51:25 +00:00
bktr Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 19:06:12 +00:00
buslogic
cardbus Use STAILQ in preference to SLIST for the resources. Insert new resources 2005-03-18 05:19:50 +00:00
ciss Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:30:12 +00:00
cm
cnw
cp Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:30:12 +00:00
cpufreq Only activate ICH speedstep if we're going to use it. No bugs were observed 2005-03-20 01:25:21 +00:00
cs
ct
ctau
cx netchild's mega-patch to isolate compiler dependencies into a central 2005-03-02 21:33:29 +00:00
cy Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:30:12 +00:00
dc Bugger, wiped out a needed comma in the previous commit. 2005-03-09 00:54:55 +00:00
dcons
de
dec
digi Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:30:12 +00:00
dpt Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:30:12 +00:00
drm
ed Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:30:12 +00:00
eisa Now that the Adaptec 2842 has its own probe routine, no need to have 2005-03-17 17:36:07 +00:00
em Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:30:12 +00:00
en Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:30:12 +00:00
ep
esp The existing locking in the esp driver appears to be fairly adequate, so 2005-03-02 15:56:42 +00:00
ex
exca
fatm Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:30:12 +00:00
fb check copyin return values when loading pallete 2005-03-26 18:01:35 +00:00
fdc If we fail a sanity check for the resources just allocated, make sure 2005-03-15 08:02:47 +00:00
fe
firewire Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:30:12 +00:00
fxp Unload and destroy the TX DMA maps before destroying the DMA tag 2005-03-16 16:39:04 +00:00
gem Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:30:12 +00:00
gfb
harp
hatm Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:30:12 +00:00
hfa Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:30:12 +00:00
hifn Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:30:12 +00:00
hme Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:30:12 +00:00
hptmv Don't read past the end of pVDevice[]. (Previously, we would iterate 2005-03-18 05:43:34 +00:00
ic
ichsmb Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:17:35 +00:00
ichwd
ida Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:17:35 +00:00
idt Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:17:35 +00:00
ie
ieee488 Add placeholder mutex argument to new_unrhdr(). 2005-03-07 11:05:47 +00:00
if_ndis Finally bring an end to the great "make the Atheros NDIS driver 2005-03-27 10:14:36 +00:00
iicbus
iir Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:17:35 +00:00
io
ips Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:17:35 +00:00
isp Prefer <sys/cdefs.h>'s __printflike() macro to the recently added 2005-03-07 15:29:11 +00:00
ispfw
ixgb Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:17:35 +00:00
joy
kbd
led Add placeholder mutex argument to new_unrhdr(). 2005-03-07 11:05:47 +00:00
lge Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:17:35 +00:00
lnc Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:17:35 +00:00
mc146818
mca
mcd netchild's mega-patch to isolate compiler dependencies into a central 2005-03-02 21:33:29 +00:00
md
mem
mii
mk48txx
mlx Don't call mlx_free() i mlx_attach() in case of failure. Doing so 2005-03-26 21:58:09 +00:00
mly Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:10:49 +00:00
mpt Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:10:49 +00:00
mse
musycc Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:10:49 +00:00
my Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:10:49 +00:00
ncv
nge deref correct mbuf ptr to collect any vlan tag 2005-03-26 18:47:17 +00:00
nmdm
nsp
null
nve Support MCP versions 4-11. 2005-03-24 18:55:07 +00:00
ofw
owi
patm Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:10:49 +00:00
pbio
pccard deal with malloc failure 2005-03-26 21:34:12 +00:00
pccbb Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:10:49 +00:00
pcf
pci fix a copy/paste typo for scanner/gameport... 2005-03-26 22:17:48 +00:00
pdq Offer unhandled IOCTLS to fddi_ioctl(). 2005-03-24 01:58:20 +00:00
ppbus When locking a MTX_SPIN, one needs to use mtx_lock_spin. 2005-03-17 20:45:24 +00:00
ppc
pst Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:10:49 +00:00
puc Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:10:49 +00:00
random Fix off-by-one (too little!) array size problem. 2005-03-18 07:13:35 +00:00
ray
rc
re Unbreak build with POLLING. I should really listen and test with NOTES 2005-03-13 01:54:41 +00:00
rndtest
rp - Use pci_get_device() and pci_get_vendor() when we only want one part 2005-03-25 03:10:51 +00:00
sab
safe Use BUS_PROBE_DEFAULT in preference to 0 and BUS_PROBE_LOW_PRIORITY in 2005-03-01 08:58:06 +00:00
sbni
sbsh Use BUS_PROBE_DEFAULT in preference to 0 and BUS_PROBE_LOW_PRIORITY in 2005-03-01 08:58:06 +00:00
scd netchild's mega-patch to isolate compiler dependencies into a central 2005-03-02 21:33:29 +00:00
sf
si Use BUS_PROBE_DEFAULT in preference to 0 and BUS_PROBE_LOW_PRIORITY in 2005-03-01 08:58:06 +00:00
sio
sk handle malloc failure and sk_vpd_prodname potentially being null for 2005-03-26 22:57:28 +00:00
smbus
sn
snc
snp Disable two users of findcdev. They do the wrong thing now and will 2005-03-15 12:39:30 +00:00
sound Return BUS_PROBE_DEFAULT in preference to 0. 2005-03-20 20:13:21 +00:00
speaker
sr Use BUS_PROBE_DEFAULT in preference to 0 and BUS_PROBE_LOW_PRIORITY in 2005-03-01 08:58:06 +00:00
stg Use BUS_PROBE_DEFAULT in preference to 0 and BUS_PROBE_LOW_PRIORITY in 2005-03-01 08:58:06 +00:00
streams
sx Use BUS_PROBE_DEFAULT in preference to 0 and BUS_PROBE_LOW_PRIORITY in 2005-03-01 08:58:06 +00:00
sym eliminate double free when sym_cam_attach fails 2005-03-26 18:17:58 +00:00
syscons
tdfx Use BUS_PROBE_DEFAULT in preference to 0. Also for vx, return 2005-03-01 07:50:12 +00:00
tga Use BUS_PROBE_DEFAULT in preference to 0. Also for vx, return 2005-03-01 07:50:12 +00:00
ti deal with malloc failure when setting up the multicast filter 2005-03-26 23:26:49 +00:00
trm Use BUS_PROBE_DEFAULT in preference to 0. Also for vx, return 2005-03-01 07:50:12 +00:00
twa check copyin+copyout return values when processing TWA_IOCTL_GET_LOCK 2005-03-27 00:29:37 +00:00
twe Use correct flags for bus_dma_tag_create(). 2005-03-06 20:57:54 +00:00
tx Use BUS_PROBE_DEFAULT in preference to 0. Also for vx, return 2005-03-01 07:50:12 +00:00
txp Use BUS_PROBE_DEFAULT in preference to 0. Also for vx, return 2005-03-01 07:50:12 +00:00
uart In uart_cpu_getdev_console() when determinig whether we should use 2005-03-12 17:06:03 +00:00
ubsec Use BUS_PROBE_DEFAULT in preference to 0. Also for vx, return 2005-03-01 07:50:12 +00:00
usb Comment out rue_miibus_statchg() function. Using trial-and-error approach I 2005-03-25 20:19:18 +00:00
utopia
vge Use BUS_PROBE_DEFAULT in preference to 0. Also for vx, return 2005-03-01 07:50:12 +00:00
vkbd
vr
vx Use BUS_PROBE_DEFAULT in preference to 0. Also for vx, return 2005-03-01 07:50:12 +00:00
watchdog
wds
wi purge dead code 2005-03-26 23:51:39 +00:00
wl
xe
zs