MPLOCKED. The cleaning in rev.1.25 was supposed to have been undone
by rev.1.26, but 1.26 could never have actually affected asm files
since atomic.h is full of C declarations so including it in asm files
would just give syntax errors. The asm MPLOCKED is even less needed
than when misplaced definitions of it were first removed, and is now
unused in any asm file in the src tree except in anachronisms in
sys/i386/i386/support.s.
These functions are used a lot for mutexes, so this reduces the text
size of an average kernel by about 0.75%. This wasn't intended to
be a significant optimization, but it somehow increased the maximum
number of packets per second that can be transmitted by my bge hardware
from 320000 to 460000 (this benchmark is CPU-bound and remarkably
sensitive to changes in the text section).
Details: we would prefer to leave the result of the cmpxchg in %al,
but cannot tell gcc that it is there, so we have to convert it to an
integer register. We converted to %al, then to %[re]ax, but the
latter step is usually wasted since gcc usually only wants the condition
code and can recover it from %al just as easily as from %[re]ax. Let
gcc promote %al in the few cases where this is needed.
Nearby style fixes:
- let gcc manage the load of `res', and don't abuse `res' for a copy of `exp'
- don't echo `res's name in comments
- consistently spell the condition code as 'e' after comparison for equality
- don't hard-code %al anywhere except in constraints
- for the version that doesn't use cmpxchg, there is no requirement to use
%al anywhere, so don't hard-code it in the constraints either.
Style non-fix:
- for the versions that use cmpxchg, keep using "a" (was %[re]ax, now %al)
for the main output operand, although this is not required. The input
and output operands that use the "a" constraint are now decoupled, and
this makes things clearer except for the reason that the output register
is hard-coded. It is now just a hack to tell gcc that the input "a" has
been clobbered without increasing the number of operands.
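For illustration only, the operand layout described above can be sketched
in user space as follows. This is not the atomic.h source: cmpset_int() is
a hypothetical name, the "+m" constraint is a modern shorthand, and the
non-x86 fallback to a compiler builtin is an assumption added so the
sketch builds elsewhere.

```c
/*
 * Sketch only: cmpset_int() is a hypothetical user-space stand-in.
 * Returns nonzero if *dst was equal to expect and has been replaced
 * by src, zero otherwise.
 */
static inline int
cmpset_int(volatile unsigned int *dst, unsigned int expect, unsigned int src)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned char res;

	__asm__ __volatile__(
	"	lock cmpxchgl %2,%1 ;	"
	"	sete	%0 ;		"
	: "=a" (res),		/* 0: byte result, placed in %al */
	  "+m" (*dst)		/* 1 */
	: "r" (src),		/* 2 */
	  "a" (expect)		/* 3: %eax, overwritten by the asm */
	: "memory", "cc");
	return (res);		/* the compiler widens %al only if needed */
#else
	/* Assumption: non-x86 builds fall back to a compiler builtin. */
	return (__sync_bool_compare_and_swap(dst, expect, src));
#endif
}
```

Only the constraints name a register; the template refers to %0, so the
byte result stays wherever the "a" output constraint puts it.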
- Move linux_nanosleep() from src/sys/amd64/linux32/linux32_machdep.c to
src/sys/compat/linux/linux_time.c.
- Validate timespec ranges before use as Linux kernel does.
- Fix l_timespec structure.
- Clean up style(9) nits.
running thread's id on each cpu. This allows us to add in-kernel adaptive
spinning for user-level mutexes. While spinning in user space is possible,
it can hardly be implemented efficiently without wasting cpu cycles unless
the kernel exports the correct thread running state, and exporting that
state is unlikely to be implemented soon because the interfaces have to be
designed and stabilized first. This implementation is transparent to user
space and can be disabled dynamically. With this change, the performance
of a mutex ping-pong program improves massively on SMP machines, and the
mysql super-smack select benchmark improves by about 7% on an Intel dual
dual-core2 Xeon machine. This indicates that on systems with several cpus
and low system-call overhead (athlon64, opteron, and core-2 are known to be
fast), adaptive spinning does help performance.
Added sysctls:
kern.threads.umtx_dflt_spins
if the sysctl value is non-zero, a zero umutex.m_spincount will
cause the sysctl value to be used as the spin cycle count.
kern.threads.umtx_max_spins
the sysctl sets the upper limit of the spin cycle count.
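How the two sysctls might combine with the per-mutex count can be sketched
as below. The helper name umtx_effective_spins() is hypothetical, and
applying the upper limit as a simple clamp is an assumption; the commit
only states what each sysctl means.

```c
/* Hypothetical helper; the name and the clamping step are assumptions. */
static unsigned int
umtx_effective_spins(unsigned int m_spincount, unsigned int dflt_spins,
    unsigned int max_spins)
{
	unsigned int spins = m_spincount;

	/* A zero per-mutex count defers to a non-zero default sysctl. */
	if (spins == 0 && dflt_spins != 0)
		spins = dflt_spins;
	/* The max sysctl caps whatever count was chosen. */
	if (spins > max_spins)
		spins = max_spins;
	return (spins);
}
```

With both sysctls at zero the result is zero, which would correspond to
no spinning in this sketch.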
Tested on: Athlon64 X2 3800+, Dual Xeon 5130
passed by value (trap frames) as if they were in fact being passed by
reference. For better or worse, this incorrect behaviour is no longer
present in gcc 4.1. In this patch I convert all trapframe arguments to
be explicitly pass by reference. I also remove vm86_initflags, pushing
the very little work that it actually does up into vm86_prepcall.
Reviewed by: kan
Tested by: kan
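The difference the conversion relies on can be shown with a minimal
sketch. struct frame is a hypothetical stand-in for a trap frame, not the
real struct trapframe, and both handler names are made up for the
example.

```c
/* Hypothetical stand-in for a trap frame; not the real layout. */
struct frame {
	int	eax;
	int	eflags;
};

/* By value: the callee changes its own copy; the caller's is untouched. */
static void
handler_by_value(struct frame f)
{
	f.eax = 42;
	(void)f;
}

/* By reference: the change is visible to the caller, as intended. */
static void
handler_by_ref(struct frame *f)
{
	f->eax = 42;
}
```

Code that wants its updates to reach the frame that is restored on return
has to use the second form.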
behave as expected.
Also:
- Return an error if WD_PASSIVE is passed in to the ioctl as only
WD_ACTIVE is implemented at the moment. See sys/watchdog.h for an
explanation of the difference between WD_ACTIVE and WD_PASSIVE.
- Remove the I_HAVE_TOTALLY_LOST_MY_SENSE_OF_HUMOR define. If you've
lost your sense of humor, then don't add a define.
Specific changes:
i80321_wdog.c
Don't roll your own passive watchdog tickle as this would defeat the
purpose of an active (userland) watchdog tickle.
ichwd.c / ipmi.c:
WD_ACTIVE means active patting of the watchdog by a userland process,
not whether the watchdog is active. See sys/watchdog.h.
kern_clock.c:
(software watchdog) Remove a check for WD_ACTIVE as this does not make
sense here. This reverts r1.181.
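A rough sketch of the ioctl-side check follows. The SK_ flag values are
placeholders (the real definitions live in sys/watchdog.h), the function
name is hypothetical, and ENOSYS is an assumption: the text above only
says that an error is returned.

```c
#include <errno.h>

/* Placeholder flag values for illustration only. */
#define	SK_WD_ACTIVE	0x1u
#define	SK_WD_PASSIVE	0x2u

/* Hypothetical check: reject passive mode, which is not implemented. */
static int
wd_check_flags(unsigned int u)
{
	if (u & SK_WD_PASSIVE)
		return (ENOSYS);	/* assumed error code */
	return (0);
}
```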
pcib_alloc_msix() methods instead of using the method from the generic
PCI-PCI bridge driver as the PCI-PCI methods will be gaining some PCI-PCI
specific logic soon.
to work around the problem with SMP kernels on Turion64 X2 processors
described in kern/104678 and may be useful in other situations too.
MFC after: 3 days
Make part of John Birrell's KSE patch permanent.
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.
Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.
The ULE scheduler compiles again but I have no idea if it works.
The 4bsd scheduler still requires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.
Tested by David Xu and Dan Eischen using libthr and libpthread.
and by only delaying when an RTC register is written to. The delay
after writing to the data register is now not just a workaround.
This reduces the number of ISA accesses in the usual case from 4 to
1. The usual case is 2 rtcin()'s for each RTC interrupt. The index
register is almost always RTC_INTR for this. The 3 extra ISA accesses
were 1 for writing the index and 2 for delays. Some delays are needed
in theory, but in practice they now just slow down slow accesses some
more since almost everyone including us does them wrong so modern systems
enforce sufficient delays in hardware. I used to have the delays ifdefed
out, but with the index register optimization the delays are rarely
executed so the old magic ones can be kept or even implemented non-
magically without significant cost.
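The access counts above can be reproduced with a small simulation. All
names here are hypothetical, the ISA bus is replaced by a counter, and
modelling each delay as one ISA access is an assumption taken from the
"2 for delays" accounting; this is not the driver code.

```c
#define	SK_RTC_INTR	0x0c	/* illustrative register index */

static int		isa_accesses;	/* simulated ISA bus cycles */
static unsigned char	fake_regs[128];	/* simulated RTC registers */
static int		hw_index;	/* index latched in the "chip" */
static int		cached_index = -1; /* last index we wrote, if any */

static void
sim_write_index(int reg)
{
	isa_accesses++;
	hw_index = reg;
}

static unsigned char
sim_read_data(void)
{
	isa_accesses++;
	return (fake_regs[hw_index]);
}

static void
sim_delay(void)
{
	isa_accesses++;		/* delay modelled as a dummy ISA access */
}

/* Old scheme: always write the index and delay around the read. */
static unsigned char
rtcin_old(int reg)
{
	unsigned char val;

	sim_write_index(reg);
	sim_delay();
	val = sim_read_data();
	sim_delay();
	return (val);
}

/* New scheme: skip the index write when it is unchanged; delay only
 * after a write. */
static unsigned char
rtcin_new(int reg)
{
	if (cached_index != reg) {
		sim_write_index(reg);
		sim_delay();
		cached_index = reg;
	}
	return (sim_read_data());
}
```

In this model the old path costs 4 accesses per read, while the new path
costs 1 once the index register already holds the wanted value.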
Optimizing RTC interrupt handling is more interesting than it used to
be because RTC interrupts are currently needed to fix the more efficient
apic timer interrupts on some systems. apic_timer_hz is normally 2000
so the RTC interrupt rate needs to be 2048 to keep the apic timer
firing on such systems. Without these changes, each RTC interrupt
normally took 10 ISA accesses (2 PIC accesses and 2 sets of 4 RTC
accesses). Each ISA access takes 1-1.5uS so 10 of them at 2048 Hz
takes 2-3% of a CPU. Now 4 of them take 0.8-1.2% of a CPU.
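The percentages follow directly from the access count, the per-access
time and the interrupt rate. A throwaway helper (hypothetical name) makes
the arithmetic explicit:

```c
/*
 * Percentage of one CPU spent on ISA accesses, given the number of
 * accesses per interrupt, microseconds per access and the interrupt rate.
 */
static double
isa_cpu_percent(int accesses_per_intr, double usec_per_access, int intr_hz)
{
	double busy_usec_per_sec;

	busy_usec_per_sec = accesses_per_intr * usec_per_access * intr_hz;
	return (busy_usec_per_sec / 1e6 * 100.0);
}
```

For 10 accesses at 2048 Hz this gives about 2.05% at 1 us and about 3.07%
at 1.5 us; for 4 accesses it gives about 0.82% and 1.23%.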
by default for sun4v where it is absolutely required.
This change moves the buffer from struct pcpu to the stack to avoid
using the critical section which created a LOR in a couple of cases
due to interaction with the tty code and kqueue. The LOR can't be
fixed with the critical section and the pcpu buffer can't be used
without the critical section.
Putting the buffer on the stack was my initial solution, but it was
pointed out that the stress on the stack might cause problems
depending on the call path. We don't have a way of creating tests
for those possible cases, so it's best to leave this as an option
for the time being. In time we may get enough data to enable this
option more generally.
of various scattered magic values.
- Pretty print the address of hardware watchpoints in 'show watch' rather
than just displaying hex.
- Expand address field width on amd64 for 64-bit pointers.
- Drop the printf in intr_machdep.c when we assign an interrupt source to
a CPU. Each source already has a more detailed printf.
- Don't output a line for each ioapic pin showing its initial state, this
has outlived its usefulness.
- When an APIC enumerator sets the bus, polarity, or trigger mode of an
ioapic pin, just return success without printing anything if the new
value matches the current one.
MFC after: 2 weeks
- Add a new apic_alloc_vectors() method to the local APIC support code
to allocate N contiguous IDT vectors (aligned on an M >= N boundary).
This function is used to allocate IDT vectors for a group of MSI
messages.
- Add MSI and MSI-X PICs. The PIC code here provides methods to manage
edge-triggered MSI messages as x86 interrupt sources. In addition to
the PIC methods, msi.c also includes methods to allocate and release
MSI and MSI-X messages. For x86, we allow for up to 128 different
MSI IRQs starting at IRQ 256 (IRQs 0-15 are reserved for ISA IRQs,
16-254 for APIC PCI IRQs, and IRQ 255 is reserved).
- Add pcib_(alloc|release)_msi[x]() methods to the MD x86 PCI bridge
drivers to bubble the request up to the nexus driver.
- Add pcib_(alloc|release)_msi[x]() methods to the x86 nexus drivers that
ask the MSI PIC code to allocate resources and IDT vectors.
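The "N contiguous vectors aligned on an M boundary" search can be sketched
as a first-fit scan over a small table. This is not the apic code: the
function name, the table size and the zero-based slot numbering are all
assumptions made for the example.

```c
#define	SK_NVECS	64	/* illustrative table size */

/*
 * Hypothetical first-fit allocator: find count free slots starting at a
 * multiple of align, mark them used and return the first slot, or -1.
 */
static int
sk_alloc_vectors(unsigned char used[SK_NVECS], int count, int align)
{
	int first, i;

	if (count <= 0 || align < count)
		return (-1);
	for (first = 0; first + count <= SK_NVECS; first += align) {
		for (i = 0; i < count; i++)
			if (used[first + i])
				break;
		if (i < count)
			continue;	/* run interrupted, try next boundary */
		for (i = 0; i < count; i++)
			used[first + i] = 1;
		return (first);
	}
	return (-1);
}
```

Stepping the candidate start by align keeps every returned block on the
requested boundary.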
MFC after: 2 months
of NKPT is no longer enough to run amd64 with 16G of RAM, as it
doesn't have space for mapping a kernel (a 16M kernel would require
8 additional page tables).
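The figure of 8 comes from each amd64 page table page holding 512 entries
of 4K each, i.e. mapping 2M. A small helper (hypothetical name, constants
prefixed to avoid clashing with system headers) shows the calculation:

```c
#define	SK_PAGE_SIZE	4096UL	/* bytes per 4K page */
#define	SK_NPTEPG	512UL	/* entries per page table page */

/* Number of page table pages needed to map the given number of bytes. */
static unsigned long
sk_page_tables_for(unsigned long bytes)
{
	unsigned long span = SK_PAGE_SIZE * SK_NPTEPG;	/* 2M per table */

	return ((bytes + span - 1) / span);
}
```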
specific privilege names to a broad range of privileges. These may
require some future tweaking.
Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>
it as a default.
For the record, the KDTRACE option caused _no_ additional source files
to be compiled in; certainly no CDDL source files. All it did was to
allow existing BSD licensed kernel files to include one or more CDDL
header files.
By removing this from DEFAULTS, the onus is on a kernel builder to add
the option to the kernel config, possibly by including GENERIC and
customising from there. It means that DTrace won't be a feature
available in FreeBSD by default, which is the way I intended it to be.
Without this option, you can't load the dtrace module (which contains
the dtrace device and the DTrace framework). This is equivalent to
requiring an option in a kernel config before you can load the linux
emulation module, for example.
I think it is a mistake to have DTrace ported to FreeBSD, but not
to have it available to everyone, all the time. The only exception
to this is the companies which distribute systems with FreeBSD embedded.
Those companies will customise their systems anyway. The KDTRACE
option was intended for them, and only them.
adds the hooks that DTrace modules register with, and adds a few functions
which have the dtrace_ prefix so that the DTrace FBT (function boundary
trace) provider can avoid tracing them, because they are called from DTrace
probe context.
Unlike other forms of tracing and debug, DTrace support in the kernel
incurs negligible run-time cost.
I think the only reason why anyone wouldn't want to have kernel support
enabled for DTrace would be due to the license (CDDL) under which DTrace
is released.
a lock to prevent interspersed strings written from different CPUs
at the same time.
To avoid putting a buffer on the stack or having to malloc one,
space is incorporated in the per-cpu structure. The buffer
size is 128 bytes; chosen because it's the next power of 2 size
up from 80 characters.
String writes to the console are buffered up to the end of the line
or until the buffer fills. Then the buffer is flushed to all
console devices.
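The buffering rule (flush at end of line or when the buffer fills) can be
sketched as follows. The names are hypothetical, the sink array stands in
for "all console devices", and locking and per-cpu placement are left out;
this is not the kernel code.

```c
#include <stddef.h>
#include <string.h>

#define	SK_LINEBUF_SIZE	128	/* the 128-byte size quoted above */

struct sk_linebuf {
	char	data[SK_LINEBUF_SIZE];
	size_t	len;
};

/* Stand-in for the console devices, so the sketch can be checked. */
static char	sk_sink[1024];
static size_t	sk_sink_len;
static int	sk_flush_count;

static void
sk_linebuf_flush(struct sk_linebuf *lb)
{
	if (lb->len == 0)
		return;
	if (sk_sink_len + lb->len <= sizeof(sk_sink)) {
		memcpy(sk_sink + sk_sink_len, lb->data, lb->len);
		sk_sink_len += lb->len;
	}
	sk_flush_count++;
	lb->len = 0;
}

/* Buffer one character; flush on newline or when the buffer is full. */
static void
sk_linebuf_putc(struct sk_linebuf *lb, char c)
{
	lb->data[lb->len++] = c;
	if (c == '\n' || lb->len == SK_LINEBUF_SIZE)
		sk_linebuf_flush(lb);
}

static void
sk_linebuf_puts(struct sk_linebuf *lb, const char *s)
{
	while (*s != '\0')
		sk_linebuf_putc(lb, *s++);
}
```

A string without a trailing newline stays in the buffer until a later
write completes the line or fills the buffer.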
Existing low level console output via cnputc() is unaffected by
this change. ithread calls to log() are also unaffected to avoid
blocking those threads.
A minor change to the behaviour in a panic situation is that
console output will still be buffered, but won't be written to
a tty as before. This should prevent interspersed panic output
as a number of CPUs panic before we end up single threaded
running ddb.
Reviewed by: scottl, jhb
MFC after: 2 weeks
dynamic nature (if no native aio code is available, the linux part
returns ENOSYS because of missing requisites) should be solved differently
than it is.
All this will be done in P4.
Not included in this commit is a backout of the changes to the native aio
code (removing static in some places). Those changes (and some more) will
also be needed when the reworked linux aio stuff reenters the tree.
Requested by: rwatson
Discussed with: rwatson
not completely decided at config time. Just don't default to using
the TSC if there are multiple active CPUs. Also, don't default to
using the TSC if it is broken. SMP ifdefs are still used to disallow
using perfmon since perfmon is always broken if SMP is just configured.
This only helps much for SMP kernels running on 1 CPU. The overheads
for using the i8254 cputime clock were a bit too high on 486/33's, and
now on multi-GHz CPUs they are usually in the 99-99.9% range. Switching
from the old default of an i8254 clock to the TSC works poorly because
the overheads are not recalibrated.
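The run-time default described above amounts to a small decision made
after the CPUs are counted. The sketch below uses hypothetical names and
flags and falls back to the i8254 clock; it only restates the two
conditions given in the text.

```c
enum sk_cputime_clock {
	SK_CPUTIME_CLOCK_I8254,
	SK_CPUTIME_CLOCK_TSC
};

/*
 * Hypothetical chooser: default to the TSC only when there is a single
 * active CPU and the TSC is present and not known to be broken.
 */
static enum sk_cputime_clock
sk_default_cputime_clock(int active_cpus, int tsc_present, int tsc_broken)
{
	if (active_cpus > 1 || !tsc_present || tsc_broken)
		return (SK_CPUTIME_CLOCK_I8254);
	return (SK_CPUTIME_CLOCK_TSC);
}
```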
Use the same condition for declaring perfmon stuff as for using it.