Problem:
selwakeup required calling pfind which would cause lock order
reversals with the allproc_lock and the per-process filedesc lock.
Solution:
Instead of recording the pid of the select()'ing process into the
selinfo structure, actually record a pointer to the thread. To
avoid dereferencing a bad address all the selinfo structures that
are in use by a thread are kept in a list hung off the thread
(protected by sellock). When a selwakeup occurs the selinfo is
removed from that threads list, it is also removed on the way out
of select or poll where the thread will traverse its list removing
all the selinfos from its own list.
Problem:
Previously the PROC_LOCK was used to provide the mutual exclusion
needed to ensure proper locking, this couldn't work because there
was a single condvar used for select and poll and condvars can
only be used with a single mutex.
Solution:
Introduce a global mutex 'sellock' which is used to provide mutual
exclusion when recording events to wait on as well as performing
notification when an event occurs.
Interesting note:
schedlock is required to manipulate the per-thread TDF_SELECT
flag, however if given its own field it would not need schedlock,
also because TDF_SELECT is only manipulated under sellock one
doesn't actually use schedlock for syncronization, only to protect
against corruption.
Proc locks are no longer used in select/poll.
Portions contributed by: davidc
enabled in critical sections and streamline critical_enter() and
critical_exit().
This commit allows an architecture to leave interrupts enabled inside
critical sections if it so wishes. Architectures that do not wish to do
this are not effected by this change.
This commit implements the feature for the I386 architecture and provides
a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For
now you can turn the sysctl on and off at any time in order to test the
architectural changes or track down bugs.
This commit is just the first stage. Some areas of the code, specifically
the MACHINE_CRITICAL_ENTER #ifdef'd code, is strictly temporary and will
be cleaned up in the STAGE-2 commit when the critical_*() functions are
moved entirely into MD files.
The following changes have been made:
* critical_enter() and critical_exit() for I386 now simply increment
and decrement curthread->td_critnest. They no longer disable
hard interrupts. When critical_exit() decrements the counter to
0 it effectively calls a routine to deal with whatever interrupts
were deferred during the time the code was operating in a critical
section.
Other architectures are unaffected.
* fork_exit() has been conditionalized to remove MD assumptions for
the new code. Old code will still use the old MD assumptions
in regards to hard interrupt disablement. In STAGE-2 this will
be turned into a subroutine call into MD code rather then hardcoded
in MI code.
The new code places the burden of entering the critical section
in the trampoline code where it belongs.
* I386: interrupts are now enabled while we are in a critical section.
The interrupt vector code has been adjusted to deal with the fact.
If it detects that we are in a critical section it currently defers
the interrupt by adding the appropriate bit to an interrupt mask.
* In order to accomplish the deferral, icu_lock is required. This
is i386-specific. Thus icu_lock can only be obtained by mainline
i386 code while interrupts are hard disabled. This change has been
made.
* Because interrupts may or may not be hard disabled during a
context switch, cpu_switch() can no longer simply assume that
PSL_I will be in a consistent state. Therefore, it now saves and
restores eflags.
* FAST INTERRUPT PROVISION. Fast interrupts are currently deferred.
The intention is to eventually allow them to operate either while
we are in a critical section or, if we are able to restrict the
use of sched_lock, while we are not holding the sched_lock.
* ICU and APIC vector assembly for I386 cleaned up. The ICU code
has been cleaned up to match the APIC code in regards to format
and macro availability. Additionally, the code has been adjusted
to deal with deferred interrupts.
* Deferred interrupts use a per-cpu boolean int_pending, and
masks ipending, spending, and fpending. Being per-cpu variables
it is not currently necessary to lock; bus cycles modifying them.
Note that the same mechanism will enable preemption to be
incorporated as a true software interrupt without having to
further hack up the critical nesting code.
* Note: the old critical_enter() code in kern/kern_switch.c is
currently #ifdef to be compatible with both the old and new
methodology. In STAGE-2 it will be moved entirely to MD code.
Performance issues:
One of the purposes of this commit is to enhance critical section
performance, specifically to greatly reduce bus overhead to allow
the critical section code to be used to protect per-cpu caches.
These caches, such as Jeff's slab allocator work, can potentially
operate very quickly making the effective savings of the new
critical section code's performance very significant.
The second purpose of this commit is to allow architectures to
enable certain interrupts while in a critical section. Specifically,
the intention is to eventually allow certain FAST interrupts to
operate rather then defer.
The third purpose of this commit is to begin to clean up the
critical_enter()/critical_exit()/cpu_critical_enter()/
cpu_critical_exit() API which currently has serious cross pollution
in MI code (in fork_exit() and ast() for example).
The fourth purpose of this commit is to provide a framework that
allows kernel-preempting software interrupts to be implemented
cleanly. This is currently used for two forward interrupts in I386.
Other architectures will have the choice of using this infrastructure
or building the functionality directly into critical_enter()/
critical_exit().
Finally, this commit is designed to greatly improve the flexibility
of various architectures to manage critical section handling,
software interrupts, preemption, and other highly integrated
architecture-specific details.
is not configured. Including <isa/isavar.h> when it is not used is
harmful as well as bogus, since it includes "isa_if.h" which is not
generated when isa is not configured.
This was fixed in 1999 but was broken by unconditionalizing PNPBIOS.
and it's associated state variables: icu_lock with the name "icu". This
renames the imen_mtx for x86 SMP, but also uses the lock to protect
access to the 8259 PIC on x86 UP. This also adds an appropriate lock to
the various Alpha chipsets which fixes problems with Alpha SMP machines
dropping interrupts with an SMP kernel.
otherwise breaks on the Alpha arch. I think this is wrong since i'd
actually like to probe for a PC architecture, not for a particular CPU
type. Anyway, now it's again the way it used to be.
. The main device node now supports automatic density selection for
commonly used media densities. So you can stuff your 1.44 MB and
720 KB media into your drive and just access /dev/fd0, no questions
asked. It's all that easy, isn't it? :)
. Device density handling has been completely overhauled. The old way
of hardwired kernel density knowledge is no longer there. Instead,
the kernel now implements 16 subdevices per drive. The first
subdevice uses automatic density selection, while the remaining 15
devices are freely programmable. They can be assigned an arbitrary
name of the form /dev/fd[:digit]+.[:digit:]{1,4}, where the second
number is meant to either implement device names that are mnemonic
for their raw capacity (as it used to be), or they can alternatively
be created as "anonymous" devices like fd0.1 through fd0.15,
depending on the taste of the administrator. After creating a
subdevice, it is initialized to the maximal native density of the
respective drive type, so it needs to be customized for other
densities by using fdcontrol(8). Pseudo-partition devices (fd0a
through fd0h) are still supported as symlinks.
. The old hack to use flags 0x1 to always assume drive 0 were there is
no longer supported; this is now supposed to be done by wiring the
devices down from the loader via device flags. On IA32
architectures, the first two drives are looked up in the CMOS
configuration records though. On PCMCIA (i. e., the Y-E Data
controller of the Toshiba Libretto), a single drive is always
assumed.
. Other specialities like disabling the FIFO and not probing the drive
at boot-time are selected by per-controller or per-drive flags, too.
. Unit attentions (media has been changed) are supposed to be detected
now; density autoselection only occurs after a unit attention. (Can
be turned off by a per-drive flag, this will cause each Fdopen() to
perform the autoselection.)
. FM floppies can be handled now (on controllers that actually support
it -- not all do these days).
. Fdopen() can be told to avoid density selection by setting
O_NONBLOCK; this leaves the descriptor in a half-opened state where
only a few ioctls are accepted. This is necessary to run fdformat
on a device that uses automatic density selection (since you cannot
autoselect on an unformatted medium, obviously).
. Just differentiate between a plain old NE765 and the enhanced chips,
but don't try more; the existing code was wrong and only misdetected
the chips anyway.
BUGS and TODOs:
. All documentation update still needs to be done.
. Formatting not-so-standard format yields unpredictable results; i
have yet to figure out why this happens. "Standard" formats like
720 and 1440 KB do work, however.
. rc scripts are needed to setup device nodes with nonstandard
densities (like the old /dev/fdN.MMM we used to have).
. Obtaining device flags from the kernel environment doesn't work yet,
thus currently only drives that are present in (IA32) CMOS are
really detected. Someone who knows the odds and ends about device
flags is needed here, i can't figure out what i'm doing wrong.
. 2.88 MB still needs to be done.
- Now that apm loadable module can inform its existence to other kernel
components (e.g. i386/isa/clock.c:startrtclock()'s TCS hack).
- Exchange priority of SI_SUB_CPU and SI_SUB_KLD for above purpose.
- Add simple arbitration mechanism for APM vs. ACPI. This prevents
the kernel enables both of them.
- Remove obsolete `#ifdef DEV_APM' related code.
- Add abstracted interface for Powermanagement operations. Public apm(4)
functions, such as apm_suspend(), should be replaced new interfaces.
Currently only power_pm_suspend (successor of apm_suspend) is implemented.
Reviewed by: peter, arch@ and audit@
sio_pccard_detach to use new siodetach. Add an extra arg to sioprobe
to tell driver to probe/not probe the device for IRQs.
This incorporates most of Bruce's review material. I'm at a good
checkpoint, but there will be more to come based on bde's further
reviews.
Reviewed by: bde
Move sio from isa/sio.c to dev/sio/sio.c. The next step is to break
out the front end attachments, improve support for these parts on
different busses, and maybe, if we're lucky, merging in pc98 support.
It will also be MI and live in conf/files rather than files.*.
Approved by: bde
Tested with: i386, pc98
- Count the number of this error.
- When the error is detected for the first time, the psm driver will
throw few data bytes (up to entire packet size) and see if it can
get back to sync.
- If the error still persists, the psm driver disable/enable the mouse
and see if it works.
- If the error still persists and the count goes up to 20,
the psm driver reset and reinitialize the mouse. The counter
is reset to zero.
- It also discards an incomplete data packet when the interval
between two consequtive bytes are longer than pre-defined timeout
(2 seconds). The last byte which arrived late will be regarded as
the first byte of a new packet. This is louie's idea.
You may see the following error logs during the above operations:
"psmintr: delay too long; resetting byte count"
"psmintr: out of sync (%04x != %04x)"
"psmintr: discard a byte (%d)"
"psmintr: re-enable the mouse"
"psmintr: reset the mouse"
MFC after: 1 month
sio_lock has been initialized. This prevents the low level console
output (kernel printf) from clobbering the sio settings if the system
happens to be in the middle of comstart().
problems currently experienced in -CURRENT.
This should fix the problem that the PS/2 mouse is detected
twice if the acpi module is not loaded on some systems.
I am not sure if this is absolutely necessary on all systems. Yet,
there certainly are motherboards and notebook systems which require
this, although there are other systems which just don't. I hope we
shall know when to do this on which systems, as the development of our
ACPI subsystem progresses... (I know we didn't need this for the APM
resume.)
mandatory "card" identifier string. A logical devices on the ISA PnP
card may optionally have a "device" identifier string. Do not confuse
them.
The "card" identifier string is assigned to a logical device as the
default description string when the device is found. (If the "card"
identifier string has not been found, use the EISA PnP ID string.
Strictly speaking, this is an error.) We will override it when a
"device" identifier string is found later.
- Add workaround for the problematic PnP BIOS which does not assign
irq resource for the PS/2 mouse device node; if there is no irq
assigned for the PS/2 mouse node, refer to device.hints for an
irq number. If we still don't find an irq number in the hints
database, use a hard-coded value.
- Delete unused ivars.
- Bit of clean up in probe/attach.
- Add PnP ID for the PS/2 mouse port on some IBM ThinkPad models.
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
more cleanly and consistently in all APCI, PnP BIOS, and "hint"
cases.
NOTE: this doesn't necessarily solve the problem that the PS/2
mouse is not detected after the recent ACPI update.
the following bugs.
- When constructing a resource configuration, respect the order
in which resource descriptors are read, in order to establish
the correct mapping between the descriptors and configuration
registers.
"Plug and Play ISA Specification, Version 1.0a", Sec 4.6.1, May 5,
1994. "Clarifications to the Plug and Play ISA Specification,
Version 1.0a", Sec 6.2.1, Dec. 10, 1994.
- Do not ignore null (empty) descriptors; they are valid descriptors
acting as filler.
"Clarifications to the Plug and Play ISA Specification, Version 1.0a",
Sec 6.2.1.
- Correctly set up logical device configuration registers for null
resources.
"Clarifications to the Plug and Play ISA Specification, Version 1.0a"
- Handle null resources properly in the resource allocator for the
ISA bus.
with system statistics monitoring tools (such as systat, vmstat...)
because of stopping RTC interrupts generation.
Restore all the timers (RTC and i8254) atomically.
Reviewed by: bde
MFC after: 1 week
to properly clear the interrupt register on the no error case. Also,
set the mcr register to zero when we find we can't support the chip.
This fixes the hang on sio driver attach problem in the new pci pccard
code that some people have reported. At least on my machine. I'd
like to get this into 4.4.
Submitted by: bde
PR: kern/29742
MFC after: 1 day
events. Otherwise you would see unexpected results if shift or
locking keys are defined to give different actions depending
on other shift/locking keys' state.
Please keep the ukbd module and the kernel in sync, otherwise
the USB keyboard won't work after this change.
MFC after: 10 days
. Integrate fdc.h into fd.c, with the removal of ft(4) there's no longer
a reason to scatter things across two files.
. Sanitize comments. Convert them into the style(9)-recommended
multi-line form, make them sentences where apprpriate, etc.
. Declare all functions on top, and declare them in the order they
appear in the file. This order is totally chaotic, but Bruce
convinced me that reordering the file wouldn't make it better either.
. Kill a `possibly uninitialized' warning (only seen with -O2) in
fd_read_status().
. Make the comments at return (0|1) statements in fdstate() consistent.
. Nuke a ``keep the compiler happy'' dummy return at the end of fdstate(),
gcc is smart enough to detect that it would never be reached anyway.
haven't been probed successfully. It's a known bug that ISA hints
processing instantiates those devices, and prematurely killing them
has other unwanted side-effects.
where they will never succeed. Add a stop-gap measure that will at
least eventually timeout the operation instead of retrying it
indefinately.
MFC after: 1 month
Despite of a few cosmetic things like adding ``irritating silly
parentheses'' around all return values, this mainly improves FDC reset
handling by no longer gratuitously resetting the FDC all the time
(which causes it to lose the notion of the current track) but only in
case of errors, and it sanitizes the block and offset calculations in
fdstrategy() and fdstate(). Some additional cleanup added by me, in
particular the large switch in fdstate() now always uses return to
break out, and no branch falls off the end of the switch statement
anymore. Per Bruce's suggestion, removed M_NOWAIT from the malloc()s
to simplify things.
Submitted by: bde (mostly)
destroyed properly (otherwise bad things would happen after a clone
dev had been created, and the module was kldunloaded). Allocated
children that have not successfully probed are being deleted again
(otherwise fd0 and fd1 have always been allocated, even if only
fd0 was acutally present, and fd1 even survived kldunloading the
module).
Still, kldunloading leaves remnants of the previously existing devices
intact. Why doesn't it destroy all the devices? As a consequence,
since dev->descr now points into no longer allocated memory, the
system panics deep inside printf(9) when running devinfo(1) after
kldunloading the module. Ideas sought...
Also, when kldloading the module on a hints-populated isab0, this bus
somehow has already created an fdc0 entry (a dummy) so the load
attempt fails and will register fdc1 instead. What are those dummy
entries for? Loading the module from the bootloader works, and it
can be unloaded an re-loaded then later.
the sector ID.
Based on numerous comments made by Bruce, rewrite a good part of the
old fdformat() function, and merge it with fdreadid() into a single
unified fdmisccmd() function. Various style and a couple of more
serious bugs fixed there.
While i was at it, i also fixed the long-standing "TODO: don't
allocate buffer on stack." in fdcioctl(), fixed a number of style bugs
there, and finally implemented the FD_DEBUG ioctl command that has
been advertised in <sys/fdcio.h> (formerly <machine/ioctl_fd.h>) for
almost seven years now. ;-)
Submitted by: bde (a lot of fixes for fdformat())
a KLD. Still doesn't work well except in the PCMCIA case (now if only
pccardd(8) could load and unload drivers dynamically...). Mainly, it
tries to find fdc0 on the PCI bus for whatever obscure reasons, but i
need someone who understands driver(9) to fix this. However, it's at least
already better than before, and i'm tired of maintaining too many private
changes in my tree, given the large patches bde submitted. :)
Idea of a KLD triggered by: Michael Reifenberger <root@nihil.plaut.de>
by now (except of a compile test), but i believe this to contain no
actual functional changes.
. Fix the copyright of the Regents i accidentally broke in rev 1.197
(although only a very small part of the original driver survived
at all...).
. Bump MAX_CYLINDER since some obscure formats really use more than 80
cylinders.
. Correctly handle BIO_FORMAT which used to be a bitmask but is now a BIO
command of its own.
. Numerous stylistic fixes.
Submitted by: bde
. staticize out_fdc(), there's no longer an ft(4) driver sharing its use
. remove in_fdc(), has been used by ft(4) last time, long since obsoleted
by fd_in()
. move the declaration of fd_clone() to where most of the other function
declarations are
. de-__P()ify fd_clone(), it's been the only _P()ed function in the
entire file
the console device was open. At other times, the interrupts that
are used to detect the break signal or ~^B sequence were disabled,
so these events would not be noticed until the next open (e.g. the
next kernel printf). This was mainly a problem while there was no
getty running on the console, such as during bootup or shutdown.
For serial consoles with break-to-debugger support, we now enable
the generation of interrupts at attach time, and we leave them
enabled while the device is closed.
Reviewed by: bde (I've since made chages as per his suggestions)
- Replace some very poorly thought out API hacks that should have been
fixed a long while ago.
- Provide some much more flexible search functions (resource_find_*())
- Use strings for storage instead of an outgrowth of the rather
inconvenient temporary ioconf table from config(). We already had a
fallback to using strings before malloc/vm was running anyway.
. remove stale comments and a stale #define (from the old days of ft(4))
. make MAX_SEC_SIZE (used in isa_dmainit()) a #define
. fix a typo in a string
. use 0 as the blocksize in devstat_add_entry(), since the actual blocksize
is unknown (devstat(9) suggests to use 0 in that case)
I think bde even reviewed it once.
Also, change the name of ActionTEC pat to more generic Lucent Kermit
chip. Add stub for Xircom card. Add cardbus attachment too.
can be made userland-visible as <dev/ic/...>. Also, those files are
not supposed to contain any bus-specific details at all, so placing
them under .../isa/ has been a misnomer from the beginning.
The files in src/sys/dev/ic/ have been repo-copied from their old
location (this commit is a forced null commit there to record this
message).
memory I/O space. Otherwise, our resource allocation system might
mistakenly assign pccard, plug and play devices or other things
addresses that conflict with ROMs.
I cleaned up his code a little from the submited driver: style(9)
issues, commentary on why something that looks incorrect really is
correct. Also noted that while a checksum field is defined for the
ROMs, enough common hardware neglects it to make it not worthwhile
checking.
Submitted by: Nikolai Saoukh <nms@otdel-1.org>
PR: 22078
. FD_CLRERR clears the error counter, thus re-enables kernel error
printf()s,
. FD_GSTAT obtains the last FDC operation state, if any,
. FDOPT_NOERRLOG (temporarily) turns off kernel printf() floppy
error logging,
. FDOPT_NOERROR makes the kernel ignore an FDC error, thus can
enable the transfer of an erroneous sector to the user application
All options are being cleared on (last) close.
Prime consumer of the last features will be fdread(1), to be committed
shortly.
(FD_CLRERR should be wired into fdcontrol(8), but then fdcontrol(8)
needs a major rewrite anyway.)
other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
been made machine independent and various other adjustments have been made
to support Alpha SMP.
- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delievered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwared_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.
Reviewed by: jake, peter
Looked over by: eivind
tsc_present in the right places (together with other variables of the
same linkage), and don't use messy ifdefs just to avoid exporting it in
some cases.
Some things needed bits of <i386/include/lock.h> - cy.c now has its
own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK()
has been moved to <i386/include/apic.h> (AKA <machine/apic.h>).
Reviewed by: jhb
- Use swi_* function names.
- Use void * to hold cookies to handlers instead of struct intrhand *.
- In sio.c, use 'driver_name' instead of "sio" as the name of the driver
lock to minimize diffs with cy(4).
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.
Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)