the target directory or file. This case should fail in the filesystem
anyway and perhaps kern_rename() should catch it.
Sponsored by: Isilon Systems, Inc.
branch:
Integrate audit.c to audit_worker.c, so as to migrate the worker
thread implementation to its own .c file.
Populate audit_worker.c using parts now removed from audit.c:
- Move audit rotation global variables.
- Move audit_record_write(), audit_worker_rotate(),
audit_worker_drain(), audit_worker(), audit_rotate_vnode().
- Create audit_worker_init() from relevant parts of audit_init(),
which now calls this routine.
- Recreate audit_free(), which wraps uma_zfree() so that
audit_record_zone can be static to audit.c.
- Unstaticize various types and variables relating to the audit
record queue so that audit_worker can get to them. We may want
to wrap these in accessor methods at some point.
- Move AUDIT_PRINTF() to audit_private.h.
Addition of audit_worker.c to kernel configuration, missed in
earlier submit.
Obtained from: TrustedBSD Project
Add ioctls to audit pipes in order to allow querying of the current
record queue state, setting of the queue limit, and querying of pipe
statistics.
Obtained from: TrustedBSD Project
net.inet.ip.portrange.reservedlow apply to IPv6 aswell as IPv4.
We could have made new sysctls for IPv6, but that potentially makes
things complicated for mapped addresses. This seems like the least
confusing option and least likely to cause obscure problems in the
future.
This change makes the mac_portacl module useful with IPv6 apps.
Reviewed by: ume
MFC after: 1 month
really breaking things. Simple "close(0); dup(fd)" does not return descriptor
"0" in some cases. Further, this change also breaks some MAC interactions with
mac_execve_will_transition(). Under certain circumstances, fdcheckstd() can
be called in execve(2) causing an assertion that checks to make sure that
stdin, stdout and stderr reside at indexes 0, 1 and 2 in the process fd table
to fail, resulting in a kernel panic when INVARIANTS is on.
This should also kill the "dup(2) regression on 6.x" show stopper item on the
6.1-RELEASE TODO list.
This is a RELENG_6 candidate.
PR: kern/87208
Silence from: des
MFC after: 1 week
Change send_trigger() prototype to return an int, so that user
space callers can tell if the message was successfully placed
in the trigger queue. This isn't quite the same as it being
successfully received, but is close enough that we can generate
a more useful warning message in audit(8).
Obtained from: TrustedBSD Project
transfers. This fixes some cases where the software toggle tracking
was not doing the right thing. For example, a short transfer that
transferred 0 bytes of the requested qTD transfer size does cause
a toggle change, but the existing code was assuming it didn't.
Reported and tested by: pav
MFC after: 2 weeks
Add bus attachment for the ohci device on this chip. The bus and hub
are detected correctly, but the children devices aren't detected
correctly for reasons unknown.
o update TODO list
o Better use of busdma
o mark RX dtors as COHERENT. This helps performance a lot by not requiring
so many EXPENSIVE cache flushes. The cost of accessing it non-cached
is much smaller.
o Copy data from Rx buffers to make IP header 4 byte aligned.
o CRC length included in reported length, so cope
o Don't free TX buffer twice
o Manage TX buffers better.
o Enable just the interrupts we want.
o Manage OACTIVE better
# Some of these done by cognet
# These changes let us get to # via NFS root.
o Add memory barrier to bus space
o Allow for up to 3 IRQs per device
o Move to table driven population of children devices.
o Add support for usb ohci memory mapped controller resource allocation.
o Clean up a bunch of extra writes to disable interrupts that are now
done elsewhere.
o Force all system interrupt handlers be fast. We get deadlock if they
aren't.
o Disable all interrupts that the ST can generate until we have an ISR
to service them.
o Correct clock calculation to make DELAY the right length...
Submitted by: cognet (#2)
request, the FreeBSD NFS client will quickly back off to a excessively
long wait (days, then weeks) before retrying the request.
Change the behavior of the FreeBSD NFS client to match the behavior of
the reference NFS client implementation (Solaris). This provides a fixed
delay of 10 seconds between each retry by default. A sysctl, called
nfs3_jukebox_delay, is now available to tune the delay. Unlike Solaris,
the sysctl value on FreeBSD is in seconds, rather than in HZ.
Sponsored by: Network Appliance, Incorporated
Reviewed by: rick
Approved by: silby
MFC after: 3 days
sockets as long as the sockets have not been aborted or detached. Do
not try to free the socket in pru_detach(), since sofree() will do so,
if needed, once pru_detach() returns.
Annotate a bug in ddp_abort(), which fails to free the socket; this
is probably OK as ddp_abort() should never be called, so should
instead be deleted.
pcb's allocated:
- Universally ensure (and assert) that so_pcb is not NULL, removing lots
of checks and error cases. Don't free the pcb without clearing the
so_pcb pointer.
- Don't try to free the socket in pru_detach(), since the caller will
immediately free the socket.
- Do retain the sotryfree() in pru_abort() for now, although eventually
the caller will do it unconditionally.
defined for an in-use socket. This allows us to eliminate countless tests
of whether so_pcb is non-NULL, eliminating dozens of error cases. For
now, retain the call to sotryfree() in the uipc_abort() path, but this
will eventually move to soabort().
These new assumptions should be largely correct, and will become more so
as the socket/pcb reference model is fixed. Removing the notion that
so_pcb can be non-NULL is a critical step towards further fine-graining
of the UNIX domain socket locking, as the so_pcb reference no longer
needs to be protected using locks, instead it is a property of the socket
life cycle.
K&R style function declarations with ANSI style. Also fix endian bugs
accessing ioctl arguments that are passed by value.
PR: kern/93897
Submitted by: Vilmos Nebehaj < vili at huwico dot hu >
MFC after: 1 week
debugging message if the flag PMC_F_OLDVALUE was specified in the
PMC_OP_RW request being acted upon. This should fix Coverity bug
CID 671.
Found by: Coverity Prevent
MFC after: 3 weeks
userland. The comment indicated that something in userland needed them, but
make universe can't seem to find any traces of it.
Move <sys/queue.h> include up.
consumers ignore the return value, soabort() is required to succeed,
and protocols produce errors here to report multiple freeing of the
pcb, which we hope to eliminate.
especially for vchans. It turns out that channel numbering always depend
on d->devcount counter (which keep increasing), while PCMMKMINOR() truncate
everything to 8bit length. At some point the truncation cause the newly
created character device overlapped with the existence one, causing erratic
overall system behaviour and panic. Easily reproduce with something like:
(Luckily, only root can reproduce this)
while : ; do
sysctl hw.snd.pcm0.vchans=200
sysctl hw.snd.pcm0.vchans=100
done
- Enforce channel/chardev numbering within 8bit boundary. Return E2BIG
if necessary.
- Traverse d->channels SLIST and try to reclaim "free" counter during channel
creation. Don't rely on d->devcount at all.
- Destroy vchans in reverse order.
Anyway, this is not the fault of vchans. It is just that vchans are so cute
and begging to be abused ;) . Don't blame her.
Old, hidden bugs.. sigh..
MFC after: 3 days
this corrects problems with drivers that rely on the host to do
crypto (iwi, ipw, ral, ural, wi (hostap), awi)
Hard work by: luigi, mlaier
Reviewed by: luigi, mlaier
MFC after: 1 week
applications.
Open[BGP|OSPF]D make use of this to determine the link status of
interfaces to make the right routing descisions.
Obtained from: OpenBSD
MFC after: 3 days
- Allow RTM_CHANGE to change a number of route flags as specified by
RTF_FMASK.
- The unused rtm_use field in struct rt_msghdr is redesignated as
rtm_fmask field to communicate route flag changes in RTM_CHANGE
messages from userland. The use count of a route was moved to
rtm_rmx a long time ago. For source code compatibility reasons
a define of rtm_use to rtm_fmask is provided.
These changes faciliate running of multiple cooperating routing
daemons at the same time without causing undesired interference.
Open[BGP|OSPF]D make use of these features to have IGP routes
override EGP ones.
Obtained from: OpenBSD (claudio@)
MFC after: 3 days
protocol to the socket. Normally protocol references are weak: that is,
the socket layer can tear down the socket (and hence protocol state)
when it finds convenient. This flag will allow the protocol to
explicitly declare to the socket layer that it is maintaining a
strong reference, rather than the current implicit model associated
with so_pcb pointer values and repeated attempts to possibly free the
socket.
rather than panicking later. This can occur if the kernel calls
vn_open() on a fifo, as there will be no associated file descriptor,
and therefore the file descriptor operations cannot be modified to
point to the fifo operation set.
MFC after: 3 days
Reported by: Martin <nakal at nurfuerspam dot de>
PR: 94278
Previously, we tried to allow this only for root. However, we were calling
suser() on the *target* process rather than the current process. This
means that if you can ptrace() a process running as root you can set a
hardware watch point in the kernel. In practice I think you probably have
to be root in order to pass the p_candebug() checks in ptrace() to attach
to a process running as root anyway. Rather than fix the suser(), I just
axed the entire idea, as I can't think of any good reason _at all_ for
userland to set hardware watch points for KVM.
MFC after: 3 days
Also thinks hardware watch points on KVM from userland are bad: bde, rwatson
processing, this actually means there's a double slash recorded in the
symbolic link's path name. We used to start over from / then, which
caused link targets like ../../bsdi.1.0/include//pathnames.h to be
interpreted as /pathnahes.h. This is both contradictionary to our
conventional slash interpretation, as well as potentially dangerous.
The right thing to do is (obviously) to just ignore that element.
bde once pointed out that mistake when he noticed it on the
4.4BSD-Lite2 CD-ROM, and asked me for help.
Reviewed by: bde (about half a year ago)
MFC after: 3 days
specified, the rightmost option takes effect." Fix code to obey
this. This makes e.g. "mount -r /usr" or "mount -ar" actually
mount file systems read-only.
to be 'long' instead of 'int' so that sysctl(8) correctly displays
the 8 returned bytes as a single 'long' instead of two 'int' values.
Submitted by: peter
Submitted by: green
- Speed up synchronization process by using configurable number of I/O
requests in parallel.
+ Add kern.geom.raid3.sync_requests tunable which defines how many parallel
I/O requests should be used.
+ Retire kern.geom.raid3.reqs_per_sync and kern.geom.raid3.syncs_per_sec
sysctls.
- Fix race between regular and synchronization requests.
- Reimplement raid3's data synchronization - do not use the topology lock
for this purpose, as it may case deadlocks.
- Stop synchronization from pre-sync hook.
- Fix some other minor issues.
Tested by: Mike Tancsa <mike@sentex.net>
MFC after: 3 days
requests in parallel.
+ Add kern.geom.mirror.sync_requests tunable which defines how many parallel
I/O requests should be used.
+ Retire kern.geom.mirror.reqs_per_sync and kern.geom.mirror.syncs_per_sec
sysctls.
- Fix race between regular and synchronization requests.
- Reimplement mirror's data synchronization - do not use the topology lock
for this purpose, as it may case deadlocks.
- Stop synchronization from pre-sync hook.
- Fix some other minor issues.
MFC after: 3 days
o call firmware_put() early to release the firmware module
o on firmware panics or watchdog timeouts, schedule a task to reinitialize
the interface (we may sleep in iwi_init())
o discard oversized rx frames
This does not do what I wanted as all dirty buffers must be flushed
by the call to ffs_sync and any remaining dependency work would mean
that this failed.
Pointed out by: tegge
This does not do what I wanted as all dirty buffers must be flushed
by the call to ffs_sync and any remaining dependency work would mean
that this failed.
Pointed out by: tegge
Fix detection of active unlinked files by checking VI_OWEINACT and
VI_DOINGINACT in addition to v_usecount.
Defer inactive handling for unlinked files if the file system is mostly
suspended (secondary writes being blocked).
Perform deferred inactive handling after the file system is resumed.
flag isn't enough - the filter needs to be set up too, or no multicast frames
are accepted.
Sponsored by: Philips Industrial Applications (indirectly)
MFC after: 3 days
o stop processing interrupts after a firmware fatal error or a radio kill
o clarify the possible values for the 'antenna' sysctl.
o by default, let the firmware do antenna diversity.
the firmware will periodically switch to another antenna to evaluate the
signal quality.
ATA framework. Mainly written to be able to use USB Flash keys.
This is work in progress so use with care :)
Doesn't need CAM and cannot coexist with umass.c
means that old problem was triggered (when two providers end at the same
offset, eg. ad0 and ad0s1 and the wrong was is picked up by gmirror/graid3).
Reported by: Michal Suszko <dry@dry.pl>
MFC after: 3 days
Use 'BOOT_SENSITIVE_INFO=YES' variable to turn them on.
- Use 'uint*_t' instead of 'u_int*_t', correct compilation warnings, and
update copyright while I am here.
In at least one benchmark this showed around a 20% performance increase.
If other workloads do benefit from having hyperthreads service interrupts,
we can always make this a loader tunable.
MFC after: 3 days
Tested by: ps
which is normal for own files of a device driver.
DEV_FOO should be used if an unrelated kernel file needs to know of
the `foo' driver's static presence. Obviously, module source files
should never use DEV_*.
triggers.
This should eliminate all the trivial messages which result from minor
increases in cpu_tick frequency.
Machines which don't du cpu clock fiddling shouldn't issue "backwards"
messages now.
Laptops and other machines where the initial estimate of cputicks may be
waaaay off will still issue warnings.
replacement for vn_write_suspend_wait() to better account for secondary write
processing.
Close race where secondary writes could be started after ffs_sync() returned
but before the file system was marked as suspended.
Detect if secondary writes or softdep processing occurred during vnode sync
loop in ffs_sync() and retry the loop if needed.
actually is an mbuf to process. This catches the missing mbuf before it
would otherwise causes a NULL pointer dereference, which could be
triggered by a 0 length RPC record before the check for such records was
added in rev 1.97.
Approved by: cperciva (mentor)
The client's READDIRPLUS logic skips the attributes and
filehandle of the ".." entry. If the server doesn't send
attributes but does send a filehandle for "..", the
client's logic doesn't account for the extra "value
follows" field that indicates whether the filehandle is
present, causing the remaining entries in the reply
to be ignored.
Sponsored by: Network Appliance, Inc.
Reviewed by: rick, mohans
Approved by: silby
MFC after: 2 weeks
table. Previously, the ddb code knew of each linker set of auxiliary
commands and which explicit command list they were tied to. These changes
add a simple command_table struct that contains both the static list of
commands and the pointers for any auxiliary linker set of additional
commands. This also makes it possible for other arbitrary command tables
to be defined in other parts of the kernel w/o having to edit ddb itself.
The DB_SET macro has also been trimmed down to just creating an entry in
a linker set. A new DB_FUNC macro does what the old DB_SET did which is
to not only add an entry to the linker set but also to include a function
prototype for the function being added. With these changes, it's now also
possible to create aliases for ddb functions using DB_SET() directly if
desired.
slightly more tricky than on x86 as although the PCC is 64-bits, it is not
a simple 64-bit counter like the TSC. Instead, the upper 32-bits have
PAL-defined behavior and the lower 32-bits run as a free-running 32-bit
counter. To handle this, we detect overflows by maintaining a small amount
of per-cpu state and use this to simulate the upper 32-bits of the counter
providing a full 64-bit counter to the consumers of cpu_ticks().
an interrupt handler for the i8254. (Our clock interrupts come from
elsewhere.) Instead, use the same algo that i386 uses when the lapic
timer is in use. This lets us remove a lot of cruft that tried to handle
the i8254 interrupts that we weren't even using or setting up a handler
for.
- G/C a bunch of unused cruft while I'm here.
- Fix the code to not use the rpcc timecounter (similar to TSC) on SMP
machines to only disable that timecounter if more than one CPU is in
use by the kernel. Previously, a UP kernel on a machine with multiple
CPUs would needlessly disable this timecounter.
MFC after: 1 week
whether or not to allocate a full mbuf cluster rather than just a plain
mbuf when adding on additional mbufs in m_getm(). In practice, there wasn't
any resulting mem trashing since m_getm() doesn't ever allocate an mbuf with
a packet header, and MINCLSIZE is the available payload in an mbuf with a
header rather than the available payload in a plain mbuf.
Discussed with: andre (lightly)
will remain in the disabled state until another link event happens in the
future (if at all). Add a timer to periodically check the interface state and
recover.
Reported by: Nik Lam <freebsdnik j2d.lam.net.au>
MFC after: 3 days
enabled by default in NETSMB and smbfs.ko.
With the most of modern SMB providers requiring encryption by
default, there is little sense left in keeping the crypto part
of NETSMB optional at the build time.
This will also return smbfs.ko to its former properties users
are rather accustomed to.
Discussed with: freebsd-stable, re (scottl)
Not objected by: bp, tjr (silence)
MFC after: 5 days
generations of 802.11abg chipsets from Ralink Technology.
Get rid of the pccard front-end while I'm here since all adapters are
cardbus ones.
Obtained from: OpenBSD
vnode and a mode and checks if a given access mode is permitted.
This centralises the mac_bsdextended_enabled check and the GETATTR
calls and makes the implementation of the mac policy methods simple.
This should make it easier for us to match vnodes on more complex
attributes than just uid and gid in the future, but for now there
should be no functional change.
Approved/Reviewed by: rwatson, trhodes
MFC after: 1 month
sysinstall(8) still bogusly puts first partition at offset 0 instead of 16,
so glabel/ufs will find file system on slice instead of partition.
Before sysinstall is fixed, we must keep this code, which means that we
wont't be able to detect UFS file systems created with 'newfs -s ...'.
PS. bsdlabel(8) creates partitions properly.
MFC after: 3 days
- Include audit_internal.h to get definition of internal audit record
structures, as it's no longer in audit.h. Forward declare au_record
in audit_private.h as not all audit_private.h consumers care about
it.
- Remove __APPLE__ compatibility bits that are subsumed by configure
for user space.
- Don't expose in6_addr internals (non-portable, but also cleaner
looking).
- Avoid nested include of audit.h in audit_private.h.
Obtained from: TrustedBSD Project
- Add new comments.
- Move private data structures from public audit.h to audit_internal.h to
avoid exposing queue.h macros to undesiring consumers.
Obtained from: TrustedBSD Project
Releasing items from the mt_zone can not be done by a simple
uma_zfree() call since mt_zone is allocated with the UMA_ZONE_MALLOC
flag. Use uma_zfree_arg instead and supply the slab.
This bug caused panics in low memory situations on unloading kernel
modules containing MALLOC_DEFINE(..) statements.
Submitted by: ups
into a separate module. Accordingly, convert the option into a device
named similarly.
Note for MFC: Perhaps the option should stay in RELENG_6 for POLA reasons.
Suggested by: scottl
Reviewed by: cokane
MFC after: 5 days
re-organizing the monitor return logic. We perform interface monitoring
checks after we have determined if the CRC is still on the packet, if
it is, m_adj() is called which will adjust the packet length. This
ensures that we are not including CRC lengths in the byte counters for
each packet.
Discussed with: andre, glebius
that we might have address collisions, so make sure that this hardware address
isn't already in use on another bridge.
Submitted by: csjp
MFC after: 1 month
destination interface as a member of our bridge or this is a unicast packet,
push it through the bpf(4) machinery.
For broadcast or multicast packets, don't bother with the bpf(4) because it will
be re-injected into ether_input. We do this before we pass the packets through
the pfil(9) framework, as it is possible that pfil(9) will drop the packet or
possibly modify it, making it very difficult to debug firewall issues on the
bridge.
Further, implemented IFF_MONITOR for bridge interfaces. This does much the same
thing that it does for regular network interfaces: it pushes the packet to any
bpf(4) peers and then returns. This bypasses all of the bridge machinery,
saving mutex acquisitions, list traversals, and other operations performed by
the bridging code.
This change to the bridging code is useful in situations where individuals use a
bridge to multiplex RX/TX signals from two interfaces, as is required by some
network taps for de-multiplexing links and transmitting the RX/TX signals
out through two separate interfaces. This behaviour is quite common for network
taps monitoring links, especially for certain manufacturers.
Reviewed by: thompsa
MFC after: 1 month
Sponsored by: Seccuris Labs
be called without any vnode locks held. Remove calls to vn_start_write() and
vn_finished_write() in vnode_pager_putpages() and add these calls before the
vnode lock is obtained to most of the callers that don't already have them.
has many positive effects including improved smp locking, reducing
interdependencies between mounts that can lead to deadlocks, etc.
- Add the softdep worklist and various counters to the ufsmnt structure.
- Add a mount pointer to the workitem and remove mount pointers from the
various structures derived from the workitem as they are now redundant.
- Remove the poor-man's semaphore protecting softdep_process_worklist and
softdep_flushworklist. Several threads may now process the list
simultaneously.
- Add softdep_waitidle() to block the thread until all pending
dependencies being operated on by other threads have been flushed.
- Use softdep_waitidle() in unmount and snapshots to block either
operation until the fs is stable.
- Remove softdep worklist processing from the syncer and move it into the
softdep_flush() thread. This thread processes all softdep mounts
once each second and when it is called via the new softdep_speedup()
when there is a resource shortage. This removes the softdep hook
from the kernel and various hacks in header files to support it.
Reviewed by/Discussed with: tegge, truckman, mckusick
Tested by: kris
speeds to perform below the desired bitrate and throughput will be erratic.
This makes queueing work on the Geode SC1100, K5 model 0 and IDT WinChip C6
processors.
MFC after: 3 days
with malloc() or contigmalloc() as usual, but try to re-map the allocated
memory into a VA outside the KVA, non-cached, thus making the calls to
bus_dmamap_sync() for these buffers useless.
to call back for completition and something else is holding the taskqueue
waiting for ATA to return data.
This should clear up the "semaphore timeout !! DANGER Will Robinson !!"
in most situations, and log "taskqueue timeout - completing request directly"
instead, with a delayed "WARNING - freeing taskqueue zombie request" when
the taskqueue finally calls us back with the now stale request.
(It would have been nice if there was a way to remove a scheduled item from
a taskqueue, but that is not currently implemented in the kernel).
A real fix for this is in the works but wont make it to 6.1RELEASE
definite MFC candidate.
- Don't use a common buffer in the softc to store per-command data. Reserve
a buffer in the command itself.
- Don't allocate DMA memory for the kernel command structures when all you
really need is DMA memory for the scratch buffer embedded in them. Instead
allocate a slab for the scratch buffers and divide it up as needed.
- Call bus_dmamap_unload() at the completion of commands.
- Preserve and clear the CAM CCB status flags at completion.
- Reorder some low-level command operations to try to close races.
- Limit the simq to 32 commands for now. There are some serious problems
with the driver under load that are not well understood, so keeping the
simq lower helps avoid this. It has been tested at a higher value, but
this is a safe value that doesn't show much performance degredation.
These changes allow the driver to work reliably with >4GB of memory on i386
and amd64, and also work around deadlocks seen under very high load in
certain situations. The work-around is far from ideal, but without and
documentation it is hard to know what the right fix is.
MFC candidate
By default syscons(4) will look for the kbdmux(4) keyboard first, and then,
if not found, look for any keyboard.
Current kbd code is modified so if kbdmux(4) is the current keyboard, all
new keyboards are automatically added to the kbdmux(4).
Switch to kbdmux(4) can be done at boot time, by loading kbdmux module at
the loader prompt, or at runtime, by kldload'ing the kbdmux module and
releasing current active keyboard.
If, for whatever reason, kbdmux(4) is not required/desired then just do
not load it and everything should work as before. It is also possible to
kldunload kbdmux at runtime and syscons(4) will automatically switch to
the first available keyboard.
No response from: freebsd-current@
MFC after: 1 day
right from the beginning and partly clean up the differences in handling
between SYN_SENT and SYN_RCVD (syncache).
Further changes to this code to come. This is a first incremental step
to a general overhaul and streamlining of the TCP code.
PR: kern/15095
PR: kern/92690 (partly)
Reviewed by: qingli (and tested with ANVL)
Sponsored by: TCP/IP Optimization Fundraise 2005
- Throw out all of the logical APIC ID stuff. The Intel docs are somewhat
ambiguous, but it seems that the "flat" cluster model we are currently
using is only supported on Pentium and P6 family CPUs. The other
"hierarchy" cluster model that is supported on all Intel CPUs with
local APICs is severely underdocumented. For example, it's not clear
if the OS needs to glean the topology of the APIC hierarchy from
somewhere (neither ACPI nor MP Table include it) and setup the logical
clusters based on the physical hierarchy or not. Not only that, but on
certain Intel chipsets, even though there were 4 CPUs in a logical
cluster, all the interrupts were only sent to one CPU anyway.
- We now bind interrupts to individual CPUs using physical addressing via
the local APIC IDs. This code has also moved out of the ioapic PIC
driver and into the common interrupt source code so that it can be
shared with MSI interrupt sources since MSI is addressed to APICs the
same way that I/O APIC pins are.
- Interrupt source classes grow a new method pic_assign_cpu() to bind an
interrupt source to a specific local APIC ID.
- The SMP code now tells the interrupt code which CPUs are avaiable to
handle interrupts in a simpler and more intuitive manner. For one thing,
it means we could now choose to not route interrupts to HT cores if we
wanted to (this code is currently in place in fact, but under an #if 0
for now).
- For now we simply do static round-robin of IRQs to CPUs when the first
interrupt handler just as before, with the change that IRQs are now
bound to individual CPUs rather than groups of up to 4 CPUs.
- Because the IRQ to CPU mapping has now been moved up a layer, it would
be easier to manage this mapping from higher levels. For example, we
could allow drivers to specify a CPU affinity map for their interrupts,
or we could allow a userland tool to bind IRQs to specific CPUs.
The MFC is tentative, but I want to see if this fixes problems some folks
had with UP APIC kernels on 6.0 on SMP machines (an SMP kernel would work
fine, but a UP APIC kernel (such as GENERIC in RELENG_6) would lose
interrupts).
MFC after: 1 week
to allow PHOLD()'s of processes that have P_WEXIT set as once that flag
is set we aren't guaranteed to block in exit1() waiting for the PRELE()
(we might already be past the wait). However, curproc is a bit of a
special case. By the time P_WEXIT is set, the process is single-threaded,
so the only thread for which can do a PHOLD(curproc) is the thread
executing in exit1(). The fact that this thread is executing ensures
that the process won't go away before the current hold is released via
PRELE(). This fixes some panics due to kicking off softupdate operations
inside of exit1() after the recent PHOLD changes to fix ptrace/procfs vs
exit races.
MFC after: 1 week
Tested by: pho