Quote from kern/37573:
There is an obvious race in netinet/ip_dummynet.c:config_pipe().
Interrupts are not blocked when changing the params of an
existing pipe. The specific crash observed:
... -> config_pipe -> set_fs_parms -> config_red
malloc a new w_q_lookup table but take an interrupt before
intializing it, interrupt handler does:
... -> dummynet_io -> red_drops
red_drops dereferences the uninitialized (zeroed) w_q_lookup
table.
o Flush accumulated credits for idle pipes.
o Flush accumulated credits when change pipe characteristics.
o Change dn_flow_queue.numbytes type to unsigned long.
Overlapping dn_flow_queue->numbytes in ready_event() leads to
numbytes becomes negative and SET_TICKS() macro returns a very
big value. heap_insert() overlaps dn_key again and inserts a
queue to a ready heap with a sched_time points to the past.
That leads to an "infinity" loop.
PR: kern/33234, kern/37573, misc/42459, kern/43133,
kern/44045, kern/48099
Submitted by: Mike Hibler <mike@cs.utah.edu> (kern/37573)
MFC after: 6 weeks
pages which represent actual physical memory we must strip off the fake
page in order to allow illegal aliases to be detected. Otherwise we map
uncacheable in the virtual and physical caches and set the side effect bit,
as is required for mapping device memory.
This fixes gstat on sparc64, which wants to mmap kernel memory through a
character device.
that the feature can be enabled during the boot process. Note the
continued limitation that FreeBSD fails so rapidly with this setting
enabled that it's hard to narrow down particular failures for
correction; we really need per-malloc type failure rates.
flag (M_DONTWAIT / M_TRYWAIT) to a malloc(9) blocking disposition flag
(M_NOWAIT, M_WAITOK). The semantic match isn't perfect, but for
scenarios where malloc data is used in the network stack, such as for
MAC labeling or for m_tags, we sometimes need to map from one to the
other to get the right blocking behavior.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
in a debugging feature causing M_NOWAIT allocations to fail at
a specified rate. This can be useful for detecting poor
handling of M_NOWAIT: the most frequent problems I've bumped
into are unconditional deference of the pointer even though
it's NULL, and hangs as a result of a lost event where memory
for the event couldn't be allocated. Two sysctls are added:
debug.malloc.failure_rate
How often to generate a failure: if set to 0 (default), this
feature is disabled. Otherwise, the frequency of failures --
I've been using 10 (one in ten mallocs fails), but other
popular settings might be much lower or much higher.
debug.malloc.failure_count
Number of times a coerced malloc failure has occurred as a
result of this feature. Useful for tracking what might have
happened and whether failures are being generated.
Useful possible additions: tying failure rate to malloc type,
printfs indicating the thread that experienced the coerced
failure.
Reviewed by: jeffr, jhb
This keeps the logical cpu's halted in the idle loop. By default
the logical cpu's are halted at startup. It is also possible to
halt any cpu in the idle loop now using machdep.hlt_cpus.
Examples of how to use this:
machdep.hlt_cpus=1 halt cpu0
machdep.hlt_cpus=2 halt cpu1
machdep.hlt_cpus=4 halt cpu2
machdep.hlt_cpus=3 halt cpu0,cpu1
Reviewed by: jhb, peter
1) Its critical for HTT. There's less foot-shooting opportunity.
2) I've seen significant improvements in interactive response to commands
over ssh sessions. I assume this is less lock contention.
3) As incentive to finish the idle cpu IPI wakeup stuff.
4) The machine on my desk was blowing hot air in my general direction
because somebody forgot to turn the hlt on, and it saves 50 watts per
cpu..
The machdep.cpu_idle_hlt sysctl is still available, but now the default
is the same as on UP kernels.
opening the POSIX fifo; convert ENXIO error returns to EOPNOTSUPP.
This improves handling of the case where the /var/run/lock fifo exists
but there is no listener: we immediately return EOPNOTSUPP rather
than blocking until a listener turns up. This could occur during a
diskless boot before rpc.lockd is loaded, or if the lock file persists
across a reboot following the disabling of rpc.lockd. This may have
suddenly started to occur due to fifo blocking fixes--previously it
looks like attempts to read on a fifo with no listener would time out
due to insufficient resources.
Reviewed by: alfred
- Add data structuress for doing 64-bit scatter/gather
- Move busdma tag creations around so that only the parent is
created in aac_pci.c.
- Retrieve the capabilities word from the firmware before setting
up command structures and tags. This allows the driver to decide
whether to do 64-bit commands, and if work-arounds are needed for
systems with >2GB of RAM.
- Only enable the SCSI passthrough if it's enabled in the capabilities
word in the firmware.
This should fix problems with the 2120S and 2200S cards in systems with more
than 2GB of RAM. Full 64-bit support is forthcoming.
MFC-After: 1 week
additional flags argument to indicate blocking disposition, and
pass in M_NOWAIT from the IP reassembly code to indicate that
blocking is not OK when labeling a new IP fragment reassembly
queue. This should eliminate some of the WITNESS warnings that
have started popping up since fine-grained IP stack locking
started going in; if memory allocation fails, the creation of
the fragment queue will be aborted.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
and be prepared to wait much longer for codec to become ready.
Credit to Hugo D. Valentim <hvalentim@gmx.net> for reporting the
problem, providing useful pointers, and repeated diff testing.
channel and disable DXS3. Several users have reported DXS3 as playing
at half speed on the 8233A revision of the chipset. This implicitly
means no SPDIF for VIA8233A users.
cdcleanup(). This fixes sysctl problems ("can't re-use a leaf") when
someone adds another peripheral at the same unit number. (e.g. rescan da0,
it goes away, then rescan again and da0 comes back, but since we haven't
cleaned up the sysctl variables from the last da0 instance, we can't
register the variables for the new instance under the same name.)
Reported by: njl
Tested by: njl
- Don't try to fragment the packet if it's smaller than mbuf_frag_size.
- Preserve the size of the mbuf chain which is modified by m_split().
- Check that m_split() didn't return NULL.
- Make it so we don't end up with two M_PKTHDR mbuf in the chain.
- Use m->m_pkthdr.len instead of m->m_len so that we fragment the whole
chain and not just the first mbuf.
- Fix a nearby style bug and rework the logic of the loops so that it's
more clear.
This is still not quite right, because we're clearly abusing m_split() to
do something it was not designed for, but at least it works now. We
should probably move this code into a m_fragment() function when it's
correct.
The former fakes a valid response to an inquiry command. (I am completely
blown away that there are devices which hang upon receiving inquiry). The
latter returns "invalid request" to any inquiry commands with EVPD set.
NO_INQUIRY implies NO_INQUIRY_EVPD but not vice versa. Both quirks have been
tested separately on my USB key although it didn't require either of them.
While I'm here, fix wildcarding so that any/all of vendor, product, revision
can be wildcarded.
Idea from: Linux
MFC after: 2 weeks
allows you to tell ip_output to fragment all outgoing packets
into mbuf fragments of size net.inet.ip.mbuf_frag_size bytes.
This is an excellent way to test if network drivers can properly
handle long mbuf chains being passed to them.
net.inet.ip.mbuf_frag_size defaults to 0 (no fragmentation)
so that you can at least boot before your network driver dies. :)
functionality for the following entry pints:
mac_test_init_proc_label()
mac_test_destroy_proc_label()
For process labeling entry points, now also track the use of process
labels and test assertions about their integrity and life cycle.
mac_test_thread_userret()
mac_test_check_kenv_dump()
mac_test_check_kenv_get()
mac_test_check_kenv_set()
mac_test_check_kenv_unset()
mac_test_check_kld_load()
mac_test_check_kld_stat()
mac_test_check_kld_unload()
mac_test_check_sysarch_ioperm()
mac_test_check_system_acct()
mac_test_check_system_reboot()
mac_test_check_system_settime()
mac_test_check_system_swapon()
mac_test_check_system_swapoff()
mac_test_check_system_sysctl()
For other entry points, just provide testing stubs.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
for enforcement:
mac_mls_check_system_swapon() - Require that the subject and the
swapfile target vnode labels dominate one another. An additional
check is probably needed here to require that the swapfile target
has a label of mls/high to prevent information leakage through
swapfiles.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
include a new entry point available for enforcement:
mac_bsdextended_check_system_swapon() - Apply extended access
control checks to the file target of swap.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
points available for enforcement:
mac_biba_check_sysarch_ioperm() - Require Biba privilege to make
use of privileged machine-dependent interfaces, protecting against
bypass of the policy via various mechanisms.
mac_biba_check_system_swapoff() - Require Biba privilege to disable
swapping against a vnode target.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
where physical addresses larger than virtual addresses, such as i386s
with PAE.
- Use this to represent physical addresses in the MI vm system and in the
i386 pmap code. This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
detection code, and due to kvtop returning vm_paddr_t instead of u_long.
Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.
Sponsored by: DARPA, Network Associates Laboratories
Discussed with: re, phk (cdevsw change)
flexible process_fork, process_exec, and process_exit eventhandlers. This
reduces code duplication and also means that I don't have to go duplicate
the eventhandler locking three more times for each of at_fork, at_exec, and
at_exit.
Reviewed by: phk, jake, almost complete silence on arch@
is set to 0, it now has the same affect as setting witness_dead used to
have.
- Added a sysctl handler that allows root to change witness_watch from a
non-zero value to zero to disable witness at runtime. Note that you
can't turn witness back on once it is off. You can only turn it off as
a one-way switch.
- Added a comment describing the possible values of witness_watch.
Otherwise sysbeep() makes an annoying clicking sound on some systems.
'kbdcontrol -b off' just sets the duration and pitch to zero, it doesn't
set the QUIET_BELL flag.
Tested by: SorAlx <soralx@cydem.zp.ua>
PR: misc/41772
MFC after: 1 week
fifo_open() waiting for another reader or writer if one arrived and
departed while we were waiting (or a little earlier).
Rev.1.79 broke blocking opens of fifos by making them time out after 1
second. This was bad for at least apsfilter.
Tested by: "Simon 'corecode' Schubert" <corecode@corecode.ath.cx>,
Alexander Leidinger <Alexander@leidinger.net>,
phk
MFC after: 4 weeks
doesn't do it. This fixes all known causes of "Context switches not
allowed in the debugger" in mi_switch(). The main cause was trap_fatal()
calling kdb_trap() with interrupts enabled. Switching to ithreads for
interrupt handling then made fatal traps more fatal and harder to debug.
The problem was limited in -current because most interrupt handlers are
blocked by Giant, but it occurred almost deterministically for me because
my clock interrupt handlers are non-fast and not blocked by Giant.
- Clear PCIM_CMD_MWRICEN:
some chips seem to have problem with write invalidate.
clearing this bit fixes SBP timeout problem.
Tested by: Michael Reifenberger <Michael.Reifenberger@Plaut.de>
- Set PCIM_CMD_SERRESPEN and PCIM_CMD_PERRESPEN
- Moderate value for latency timer.
which are no longer required now that we have UFS2 with extended
attribute transactions.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Change 27224 by imp@imp_hammer on 2003/03/22 00:16:22
Put what I think are the correct TX RATE translation tables
in place for LUCENT firmware. This is based on the 4.x driver.
Maybe it should be table driven?
ifconfig wi0 media DS/11Mbps still fails, but it fails before
we even get to the txrate stuff, so other things are wrong.
Change 27225 by imp@imp_hammer on 2003/03/22 00:45:11
Default ic_fixed_rate to -1. This is the same thing as autoselect.
There really should be a #define for this...
- Use it in atacontrol(8) when listing ATA devices instead of
stopping at the first ENXIO received.
This makes atacontrol list work on my sparc64 where the two ATA
channels I have are numbered 2 and 3.
Reviewed by: sos
functions are now all basically identical except that alpha linux uses
Elf64 arguments and svr4 and i386 linux use Elf32. The fixups include
changing the first argument to be a register_t ** to match the prototype
for fixup functions, asserting that the process in the image_params struct
is always curproc and removing unnecessary locking to read credentials as a
result, and a few style fixes.
in busdma tags. There are currently no tags shared accross
different drivers so this isn't needed at the moment, but it
will be required when we'll have a proper newbus method to get
the parent busdma tag.
switches. Not as lazy as it could be. Changing FPU state with sigcontext
still TODO.
fpu.c - convert some asm to inline C, and macroize fpu loads/stores
swtch.S - call out to save/restore fpu routines
trap.c - always call enable_fpu, since this shouldn't be called once
the FPU has been enabled for a thread
genassym.c - define for pcb fpu flag
on future UltraSPARC cpus for which the data cache is not direct mapped.
- Move UltraSPARC I and II (spitfire, blackbird, sapphire, sabre) specific
functions to spitfire.c, and add cheetah.c for UltraSPARC III specific
functions. Initially just cache flushing, but there are a few other
functions that will need to move here.
- Add an ipi handler for data cache flushing on UltraSPARC III.
- Use function pointers to select the right cache flushing functions based
on cpu_impl.
With this it is possible to boot single user from an mfs root on UltraSPARC
III systems, including spinning up secondary processors. There is currently
no support for the host to pci bridge, and no documentation for it is
publically available.
Thanks to Oleg Derevenetz for providing access to a system with UltraSPARC
III+ cpus.
UltraSPARC III and higher cpus and do needed setup.
- Disable the "system tick" interrupt for UltraSPARC III. This avoids
an interrupt storm on startup since we're not prepared for these at
all. This feature has questionable use anyway.
- Clear tick on startup and then leave it alone.
kse_mailbox to schedule an upcall, this is useful for userland timeout
routine, for example pthread_cond_timedwait().
Also extract upcall scheduling code from kse_reassign and create
a new function called thread_switchout to include these code.
Reviewed by: julain
the devstat is for an "interior" GEOM node and register using the
name argument as a geom identity pointer. Do not put these devstat
structures on the list returned by the sysctl.
This gives us the ability to tell the two kinds of nodes apart and
leave the current "strictly physical" view of devstat intact without
modifications, yet be able to use devstat for both kinds of devices.
It also saves us bloating struct devstat with another 48 bytes of
space for the name. At least for now.
Reviewed by: ken
Add a mutex and protect the allocation and traversal of the list with it.
When we allocate a page for devstat use we drop the mutex and use
M_WAITOK this is not nice, but under the given circumstances the
best we can do.
In the sysctl handler for returning the devstat entries we do not want to
hold the mutex across copyout(9) calls, so we keep a very careful eye on
the devstat_generation count, and abandon with EBUSY if it changes under
our feet.
Specifically test for BIO_WRITE, rather than default non-read,non-deletes
as write. Make the default be DEVSTAT_NO_DATA.
Add atomic increments of the sequence[01] fields so applications using the
mmap'ed view stand a chance of detecting updates in progress.
Reviewed by: ken
driver should use port or memory based IO, determine it dynamically
at runtime, preferring MMIO where possible. This helps us support newer
arches which dislike port based access better.
Tested on i386 & sparc64, with 3c900, 905, 905b, and 905C cards.
(in varying combinations by both jake and myself)
one tx buffer for these cards. The old driver only used one. We use
1 for symbol, and 3 for prism cards.
o Don't do the maximum loops thing in the ISR. In fact, revert to the
old interrupt handler. Lucent cards don't seem to work too well if
you don't disable/enable interrupts from the card in the ISR.
Between these two changes, Lucent cards suck less. They work in
autoselect mode only. And seem to get 1Mbps or 2Mbps only. Setting a
specific media speed doesn't work, and I've had a few issues even with
these patches. They turn a former brick into a nearly useful card.
These patches work on the prism 2 and 2.5 PC Card cards that I have.
I've not tested this on PCI cards. I suspect, but couldn't find
proof, that they were the reason that the ISR was changed so radically
from its FreeBSD roots in NetBSD. We might need to have a variant ISR
if so.
4 bits. This reportedly fixes booting on the SW7500CW2. Much thanks to
the submitter for tracking this down!
Submitted by: Brian Buchanan <brian@ncircle.com>
Reviewed by: peter
MFC after: 3 days
- Use SYSINIT to initialize the structures instead of checking
total_bpages against 0 in alloc_bounce_pages(), which could lead
to several initializations being done at the same time.
- Add missing locking in bus_dmamap_load(), the bounce pages mutex
must be held when calling reserve_bounce_pages() and when touching
the bounce_map_waitinglist list.
- Remove the useless splhigh() and splx() calls.
- Assert that the bounce pages mutex is held in reserve_bounce_pages()
to catch regressions.
code both seem to call wi_start (directly or via the if_start pointer)
without checking to see if OACTIVE is 0. In addition, I think that
with the use of 3 transmit buffers this routine can be called with
OACTIVE set, but I might be mistaken about that (and it doesn't
matter).
Reviewed by: sam
Noticed by: imp, alfred, ambrisko
but I decided that it was important for this patch to not bit-rot, and
since it is mainly moving code around, the total amount of entropy is
epsilon /phk)
This is a patch to move the common parts of linux_getcwd() back into
kern/vfs_cache.c so that the standard FreeBSD libc getcwd() can use it's
extended functionality. The linux syscall linux_getcwd() in
compat/linux/linux_getcwd.c has been rewritten to use it too. It should
be possible to simplify libc's getcwd() after this. No doubt this code
needs some cleaning up, since I've left in the sysctl variables I used
for debugging.
PR: 48169
Submitted by: James Whitwell <abacau@yahoo.com.au>
to avoid a "locking against myself" panic when udf_hashins() tries
to lock it again. Lock the vnode in udf_hashins() before adding it to
the hash bucket.
are machine dependent because they are not required to update the tlb when
mappings are added or removed, and doing so is machine dependent.
In addition, an implementation may require that pages mapped with pmap_kenter
have a backing vm_page_t, which is not necessarily true of all physical
pages, and so may choose to pass the vm_page_t to pmap_kenter instead of the
physical address in order to make this requirement clear.
- Support IFF_MONITOR.
- Borrow some consistency for if_input() routines from if_ethersubr.c.
- Correct comments regarding fddi_input() that no longer apply.
Kernel:
Change statistics to use the *uptime() timescale (ie: relative to
boottime) rather than the UTC aligned timescale. This makes the
device statistics code oblivious to clock steps.
Change timestamps to bintime format, they are cheaper.
Remove the "busy_count", and replace it with two counter fields:
"start_count" and "end_count", which are updated in the down and
up paths respectively. This removes the locking constraint on
devstat.
Add a timestamp argument to devstat_start_transaction(), this will
normally be a timestamp set by the *_bio() function in bp->bio_t0.
Use this field to calculate duration of I/O operations.
Add two timestamp arguments to devstat_end_transaction(), one is
the current time, a NULL pointer means "take timestamp yourself",
the other is the timestamp of when this transaction started (see
above).
Change calculation of busy_time to operate on "the salami principle":
Only when we are idle, which we can determine by the start+end
counts being identical, do we update the "busy_from" field in the
down path. In the up path we accumulate the timeslice in busy_time
and update busy_from.
Change the byte_* and num_* fields into two arrays: bytes[] and
operations[].
Userland:
Change the misleading "busy_time" name to be called "snap_time" and
make the time long double since that is what most users need anyway,
fill it using clock_gettime(CLOCK_MONOTONIC) to put it on the same
timescale as the kernel fields.
Change devstat_compute_etime() to operate on struct bintime.
Remove the version 2 legacy interface: the change to bintime makes
compatibility far too expensive.
Fix a bug in systat's "vm" page where boot relative busy times would
be bogus.
Bump __FreeBSD_version to 500107
Review & Collaboration by: ken
now unnecessary hack from the previous commit;
- Add support for Interrupt Latch Register (ILR) into puc(4). So far only
ILRs compatible with specifications from Digi International are supported.
Support for other types of ILRs could be easily added later;
- Correct clock frequency for IC Book Labs Dreadnought x16 Lite board;
- Enable ILR detection/usage for IC Book Labs Dreadnought x16 boards.
Sponsored by: IC Book Labs
MFC after: 2 weeks