GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.
For example, during a buildworld more than half of the calls do not
generate an IPI because the only TLB entry invalidated is on the calling
processor. This revision pushes down the acquisition and release of
smp_ipi_mtx into smp_tlb_shootdown() and smp_targeted_tlb_shootdown() and
instead uses sched_pin() and sched_unpin() in pmap_invalidate_*() so that
thread migration doesn't lead to a missed TLB invalidation.
Reviewed by: jhb
MFC after: 3 weeks
RTC state, then it may clobber the RTC index register, so the index
register must be restored before using it to restore control registers
in rtc_restore().
The following problems remain:
- rtc_restore() is only called if pmtimer is configured. Buggy
suspend/resumes are more likely to clobber the index register than
a control register, so pmtimer is more needed than it used to be.
- pmtimer doesn't exist for amd64.
- Restoring of the RTC state may race with rtcintr(). If an RTC
interrupt is handled before the state is restored, then rtcin(RTC_INTR)
in rtcintr() may read from the wrong register, so rtcintr() may spin
forever. This may be mitigated by the most common state clobbering
being to turn off RTC interrupts.
Linux does not check file descriptor when MAP_ANONYMOUS is set.
This should fix recent LTP test regressions.
Reported by: Scot Hetzel (swhetzel at gmail dot com)
netchild
- Dont "return" in linux_clone() after we forked the new process in a case
of problems.
- Move the copyout of p2->p_pid outside the emul_lock coverage in
linux_clone().
- Cache the em->pdeath_signal in a local variable and move the copyout
out of the emul_lock coverage.
- Move the free() out of the emul_shared_lock coverage in a preparation
to switch emul_lock to non-sleepable lock (mutex).
Submitted by: rdivacky
- PROT_READ, PROT_WRITE, or PROT_EXEC implies PROT_READ and PROT_EXEC.
Linux/ia64's i386 emulation layer does this and it complies with Linux
header files. This fixes mmap05 LTP test case on amd64.
- Do not adjust stack size when failure has occurred.
- Synchronize i386 mmap/mprotect with amd64.
partitioning class that supports multiple schemes. Current
schemes supported are APM (Apple Partition Map) and GPT.
Change all GEOM_APPLE anf GEOM_GPT options into GEOM_PART_APM
and GEOM_PART_GPT (resp).
The ctlreq interface supports verbs to create and destroy
partitioning schemes on a disk; to add, delete and modify
partitions; and to commit or undo changes made.
variable to avoid invalid constraints in dead code. Use an array of
u_char's (inside a struct) instead of a char/short/int/long variable so
that the variable and its accesses can be spelled in the same way in all
cases and code doesn't need to be cloned just to hold the spelling
differences.
Fixed strict-aliasing errors in PCPU_SET() and in the amd64 PCPU_GET().
Cast to (void *) as in rev.1.37 of the i386 version where the errors
were fixed for the i386 PCPU_GET() only. It would be more correct to
copy to and from the temp. variable using memcpy(), but then an
ifdef tangle would be required to ensure using the builtin memcpy().
We depend on fairly aggressive optimization to put the temp. variable
only in a register despite it being copied using
*(type *)(void *)&anothertype and could depend on this when using
memcpy() too. This seems to work right even for -O0, but the -O0 case
has not been completely tested.
This change gives identical object code for all object files in LINT
on amd64 (except for one file with a __TIME__ stamp). For LINT on
i386 it gives unimportant differences in instruction order and padding
in a few object files. This was only tested for -O.
This change (actually a previous version of it) gives the following
reductions in the number of object files in LINT that fail to compile
with -O2 but without the -fno-strict-aliasing kludge:
- amd64: 29 (down from 211)
- i386: 36 (down from 47)
gcc-3.4.6 actually allows the invalid constraints that result from not
using the temp. variable, at least with -O[1-2], but gcc-3.3.3 crashes
on them and I don't want to depend on compiler bugs.
o remove errata_a0 and introduce the corresponding flags into 'errata'.
o introduce a new errata for K8, namely some platform might set the
PENDING_BIT but aren't able to unset it, also don't loop forever
waiting PENDING_BIT being cleared.
o try to introduce a workaround for the PENDING_BIT stuck problem,
o support now half multipliers for K8.
Tested by: Abdullah Al-Marrie
Approved by: njl
setrunqueue() was mostly empty. The few asserts and thread state
setting were moved to the individual schedulers. sched_add() was
chosen to displace it for naming consistency reasons.
- Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be
different on all three schedulers where it was only called in one place
each.
- Remove the long ifdef'd out remrunqueue code.
- Remove the now redundant ts_state. Inspect the thread state directly.
- Don't set TSF_* flags from kern_switch.c, we were only doing this to
support a feature in one scheduler.
- Change sched_choose() to return a thread rather than a td_sched. Also,
rely on the schedulers to return the idlethread. This simplifies the
logic in choosethread(). Aside from the run queue links kern_switch.c
mostly does not care about the contents of td_sched.
Discussed with: julian
- Move the idle thread loop into the per scheduler area. ULE wants to
do something different from the other schedulers.
Suggested by: jhb
Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.
used by clock code, so don't export it to the world for machdep.c to
initialize. There is a minor problem initializing it before it is
used, since although clock initialization is split up so that parts
of it can be done early, the first part was never done early enough
to actually work. Split it up a bit more and do the first part as
late as possible to document the necessary order. The functions that
implement the split are still bogusly exported.
Cleaned up initialization of the i8254 clock hardware using the new
split. Actually initialize it early enough, and don't work around it
not being initialized in DELAY() when DELAY() is called early for
initialization of some console drivers.
This unfortunately moves a little more code before the early debugger
breakpoint so that it is harder to debug. The ordering of console and
related initialization is delicate because we want to do as little as
possible before the breakpoint, but must initialize a console.
- First off, device drivers really do need to know if they are allocating
MSI or MSI-X messages. MSI requires allocating powerof2() messages for
example where MSI-X does not. To address this, split out the MSI-X
support from pci_msi_count() and pci_alloc_msi() into new driver-visible
functions pci_msix_count() and pci_alloc_msix(). As a result,
pci_msi_count() now just returns a count of the max supported MSI
messages for the device, and pci_alloc_msi() only tries to allocate MSI
messages. To get a count of the max supported MSI-X messages, use
pci_msix_count(). To allocate MSI-X messages, use pci_alloc_msix().
pci_release_msi() still handles both MSI and MSI-X messages, however.
As a result of this change, drivers using the existing API will only
use MSI messages and will no longer try to use MSI-X messages.
- Because MSI-X allows for each message to have its own data and address
values (and thus does not require all of the messages to have their
MD vectors allocated as a group), some devices allow for "sparse" use
of MSI-X message slots. For example, if a device supports 8 messages
but the OS is only able to allocate 2 messages, the device may make the
best use of 2 IRQs if it enables the messages at slots 1 and 4 rather
than default of using the first N slots (or indicies) at 1 and 2. To
support this, add a new pci_remap_msix() function that a driver may call
after a successful pci_alloc_msix() (but before allocating any of the
SYS_RES_IRQ resources) to allow the allocated IRQ resources to be
assigned to different message indices. For example, from the earlier
example, after pci_alloc_msix() returned a value of 2, the driver would
call pci_remap_msix() passing in array of integers { 1, 4 } as the
new message indices to use. The rid's for the SYS_RES_IRQ resources
will always match the message indices. Thus, after the call to
pci_remap_msix() the driver would be able to access the first message
in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at
SYS_RES_IRQ rid 4. Note that the message slots/indices are 1-based
rather than 0-based so that they will always correspond to the rid
values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt).
To support this API, a new PCIB_REMAP_MSIX() method was added to the
pcib interface to change the message index for a single IRQ.
Tested by: scottl
Dont expose em->shared to the outside world before its properly
initialized. Might not affect anything but its at least a better
coding style.
Dont expose em via p->p_emuldata until its properly initialized.
This also enables us to get rid of some locking and simplify the
code because we are workin on a local copy.
In linux_fork and linux_vfork create the process in stopped state
to be sure that the new process runs with fully initialized emuldata
structure [1]. Also fix the vfork (both in linux_clone and linux_vfork)
race that could result in never woken up process [2].
Reported by: Scot Hetzel [1]
Suggested by: jhb [2]
Reviewed by: jhb (at least some important parts)
Submitted by: rdivacky
Tested by: Scot Hetzel (on amd64)
Change 2 comments (in the new code) to comply to style(9).
Suggested by: jhb
KERNBASE for the first 1 MB of RAM instead of calling pmap_mapdev().
pmap_mapdev() knows how to handle the first 1 MB (and has known for a
while now) and properly maps the memory as UC to boot.
MFC after: 2 weeks
we actually issue preemptions.
- Remove the #ifdef IPI_PREEMPTION so it is always compiled in. Leave
the option which optionally enables support in sched_4bsd. sched_ule.c
will soon use this functionality as a run time rather than compile time
option.
- Compare against the idlethread rather than the priority. There are some
idle prio tasks that we can preempt.
Discussed with: ups
Tested on: i386, amd64