lots of errors. Blind substitution of "dev_t foo" by "struct cdev *foo"
in comments usually just created an English syntax error (e.g.,
"struct cdev *changes"), but here it did less than that since the dev_t
is a user dev_t.
in npxsetregs() too. npxsetregs() must overwrite the previous state, and
it is never paired with an npxgetregs() that would defuse the previous
state (since npxgetregs() would have fninit'ed the state, leaving nothing
to do).
PR: 68058 (this should complete the fix)
Tested by: Simon Barner <barner@in.tum.de>
to vm_map_find() that is less likely to be outside of addressable memory
for 32-bit processes: just past the end of the largest possible heap.
This is the same hint that mmap() uses.
ki_childutime, and ki_emul. Also uses the timevaladd() routine to
correct the calculation of ki_childtime. That will correct the value
returned when ki_childtime.tv_usec > 1,000,000.
This also implements a new KERN_PROC_GID option for kvm_getprocs().
(there will be a similar update to lib/libkvm/kvm_proc.c)
Submitted by: Cyrille Lefevre
which just mark areas which are empty due to issues with the alignment
of already-existing fields. This defines several unrelated variables
in one shot, because most of the work for updating kinfo_proc is making
sure the sizeof(struct kinfo_proc) remains the same across all hardware
platforms, and that no space is wasted on any platform due to alignment
issues with the new variables.
Submitted by: some by Cyrille Lefevre, some by me
unnecessary because cpu_setregs() and/or npxinit() always sets CR0_TS
during system initialization, and CR0_TS is set in the next statement
(fpstate_drop()) if necessary after system initialization. Setting
it unnecessarily was less than a pessimization since it broke the
invariant that the npx can be used without an npxdna() trap if
fpucurthread is non-null. The broken invariant became harmful when I
added an fnclex to npxdrop().
Removed setting of CR0_MP in exec_setregs(). This was similarly
unnecessary but was harmless.
Updated comments (mainly by removing them). Things are simpler now
that we have cpu_setregs() and don't support a math emulator or pretend
to support not having either a math emulator or an npx.
Removed the ifdef for avoiding setting CR0_NE in the !SMP case in
cpu_setregs(). npx_probe() should reverse the setting if it wants to
force IRQ13 exception handling for testing.
lock state. Convert tsleep() into msleep() with socket buffer mutex
as argument. Hold socket buffer lock over sbunlock() to protect sleep
lock state.
Assert socket buffer lock in sbwait() to protect the socket buffer
wait state. Convert tsleep() into msleep() with socket buffer mutex
as argument.
Modify sofree(), sosend(), and soreceive() to acquire SOCKBUF_LOCK()
in order to call into these functions with the lock, as well as to
start protecting other socket buffer use in their implementation. Drop
the socket buffer mutexes around calls into the protocol layer, around
potentially blocking operations, for copying to/from user space, and
VM operations relating to zero-copy. Assert the socket buffer mutex
strategically after code sections or at the beginning of loops. In
some cases, modify return code to ensure locks are properly dropped.
Convert the potentially blocking allocation of storage for the remote
address in soreceive() into a non-blocking allocation; we may wish to
move the allocation earlier so that it can block prior to acquisition
of the socket buffer lock.
Drop some spl use.
NOTE: Some races exist in the current structuring of sosend() and
soreceive(). This commit only merges basic socket locking in this
code; follow-up commits will close additional races. As merged,
these changes are not sufficient to run without Giant safely.
Reviewed by: juli, tjr
ip_ctloutput(), as it may need to perform blocking memory allocations.
This also improves consistency with locking relative to other points
that call into ip_ctloutput().
Bumped into by: Grover Lines <grover@ceribus.net>
output to permanently (not ephemerally) go to the console. It is also
sent to any other console specified by TIOCCONS as normal.
While I'm here, document the kern.log_console_output sysctl.
NULL ifc.ifc_buf pointer, to determine the expected buffer size.
The submitted fix only takes account of interfaces with an AF_INET
address configured. This could no doubt be improved.
PR: kern/45753
Submitted by: Jacques Garrigue (with cleanups)
was received on a broadcast address on the input path. Under certain
circumstances this could result in a panic, notably for locally-generated
packets which do not have m_pkthdr.rcvif set.
This is a similar situation to that which is solved by
src/sys/netinet/ip_icmp.c rev 1.66.
PR: kern/52935
be suspended in thread_suspend_check, after they are resumed, all
threads will call thread_single, but only one can be success,
others should retry and will exit in thread_suspend_check.
wasn't actually clean, it was saving the xmm registers as left over by the
bios. fninit() doesn't clear those.
In fpudna(), instead of doing a fninit() and forgetting to load the initial
mxcsr, do a full fxrstor(&fpu_cleanstate). Otherwise we hand over whatever
random values are left in the xmm registers by the last user.
I'm not certain of whether this is excessive paranoia or not, but there was
an outright bug in neglecting to set the mxcsr value that caused awk to
SIGFPE in some case. Especially for Tim Robbins. :-)
i386 probably should do something about the mxcsr setings too.
Found by: tjr
encapsulated within an IPv6 datagram, do not abuse the 'ipov' pointer
when registering trace records. 'ipov' is specific to IPv4, and
will therefore be uninitialized.
[This fandango is only necessary in the first place because of our
host-byte-order IP field pessimization.]
PR: kern/60856
Submitted by: Galois Zheng
rwatson_netperf:
Introduce conditional locking of the socket buffer in fifofs kqueue
filters; KNOTE() will be called holding the socket buffer locks in
fifofs, but sometimes the kqueue() system call will poll using the
same entry point without holding the socket buffer lock.
Introduce conditional locking of the socket buffer in the socket
kqueue filters; KNOTE() will be called holding the socket buffer
locks in the socket code, but sometimes the kqueue() system call
will poll using the same entry points without holding the socket
buffer lock.
Simplify the logic in sodisconnect() since we no longer need spls.
NOTE: To remove conditional locking in the kqueue filters, it would
make sense to use a separate kqueue API entry into the socket/fifo
code when calling from the kqueue() system call.
unless the segment really contains the last of the data for the stream.
PR: kern/34619
Obtained from: OpenBSD (tcp_output.c rev 1.47)
Noticed by: Joseph Ishac
Reviewed by: George Neville-Neil
frstor can trap despite it being a control instruction, since it bogusly
checks for pending exceptions in the state that it is overwriting.
This used to be a non-problem because frstor was always paired with a
previous fnsave, and fnsave does an implicit fninit so any pending
exceptions only remain live in the saved state. Now frstor is sometimes
paired with npxdrop() and we must do a little more than just forget
that the npx was used in npxdrop() to avoid a trap later. This is a
non-problem in the FXSR case because fxrstor doesn't do the bogus check.
FXSR is part of SSE, and npxdrop() is only in FreeBSD-5.x, so this bug
only affected old machines running FreeBSD-5.x.
PR: 68058
- Lock down low hanging fruit use of sb_flags with socket buffer
lock.
- Lock down low hanging fruit use of so_state with socket lock.
- Lock down low hanging fruit use of so_options.
- Lock down low-hanging fruit use of sb_lowwat and sb_hiwat with
socket buffer lock.
- Annotate situations in which we unlock the socket lock and then
grab the receive socket buffer lock, which are currently actually
the same lock. Depending on how we want to play our cards, we
may want to coallesce these lock uses to reduce overhead.
- Convert a if()->panic() into a KASSERT relating to so_state in
soaccept().
- Remove a number of splnet()/splx() references.
More complex merging of socket and socket buffer locking to
follow.
devclass will be present even if the driver was disabled by a hint. Using
device_get_softc() provides the right info even if it's overkill.
Explained by: jhb
The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()
Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
- prevent an endless loop with route-to lo0, fixes PR 3736 (dhartmei@)
- The rule_number parameter for pf_get_pool() needs to be 32 bits, not 8 -
this fixes corruption of the address pools with large rulesets.
(mcbride@, pb@)
Reviewed-by: dhartmei
timeout values in the CAM CCBs. Divide by 1000 to get values in seconds
which are what ata(4) timeouts internally use.
This does lose granularity, though, and small values can now round down
to zero. It's probably worth making all ata(4) timeouts in terms of
hz/ticks/milliseconds/something.
Version 3.5 brings:
- Atomic commits of ruleset changes (reduce the chance of ending up in an
inconsistent state).
- A 30% reduction in the size of state table entries.
- Source-tracking (limit number of clients and states per client).
- Sticky-address (the flexibility of round-robin with the benefits of
source-hash).
- Significant improvements to interface handling.
- and many more ...
- Remove pflog and pfsync modules. Things will change in such a fashion
that there will be one module with pf+pflog that can be loaded into
GENERIC without problems (which is what most people want). pfsync is no
longer possible as a module.
- Add multicast address for in-kernel multicast pfsync protocol. Protocol
glue will follow once the import is done.
- Add one more mbuf tag
placates gcc which seems to like to complain about -1 being assigned to
an unsigned value. It is well defined and intended, but since signess bugs
are being hunted just change to 0xffffffff.
o Mask the lower 8 bits, not the lower 4 bits for the ai_capabilities word.
All 8 bits are defined and the 0xf was almost certainly a typo.
o Define APM_UNKNOWN to 0xff for emulation layer.
do not pick up the first local ip address for the source
ip address, return ENETUNREACH instead.
Submitted by: Gleb Smirnoff
Reviewed by: -current (silence)
mode tunnel, take the per-route MTU into account, *if* and *only if* it
is non-zero (as found in struct rt_metrics/rt_metrics_lite).
PR: kern/42727
Obtained from: NetBSD (ip_input.c rev 1.151)
fixes the problem of UDP sockets getting wedged in a connected state (and
bound to their destination) under heavy load.
Temporary bind/connect should probably be deleted in future
as an optimization, as described in "A Faster UDP" [Partridge/Pink 1993].
Notes:
- INP_LOCK() is already held in udp_output(). The connection is in effect
happening at a layer lower than the socket layer, therefore in theory
socket locking should not be needed.
- Inlining the in_pcbdisconnect() operation buys us nothing (in the case
of the current state of the code), as laddr is not part of the
inpcb hash or the udbinfo hash. Therefore there should be no need
to rehash after restoring laddr in the error case (this was a
concern of the original author of the patch).
PR: kern/41765
Requested by: gnn
Submitted by: Jinmei Tatuya (with cleanups)
Tested by: spray(8)
so_timeo Used as a sleep/wakeup address, no locking.
sb_* Almost all socket buffer fields locked with
sockbuf lock for the oskcet buffer.
so_cred Static after socket creation.
using rawcb_mtx. Hold this mutex while modifying or iterating over
the control list; this means that the mutex is held over calls into
socket delivery code, which no longer causes a lock order reversal as
the routing socket code uses a netisr to avoid recursing socket ->
routing -> socket.
Note: Locking of IPsec consumers of rawcb_list is not included in this
commit.
ALTQ enabled versions of IFQ_* macros by default, as requested by serveral
others. This is a follow-up to the quick fix I committed yesterday which
turned off the ALTQ checks for non-ALTQ kernels.
other modules to explode. eg: snd_ich->snd_pcm and umass->usb.
The problem was that I was using the unified base address of the module
instead of finding the start address of the section in question.
1. Remove a race whereby contigmalloc() would deadlock against the
running processes in the system if they kept reinstantiating
the memory on the active and inactive page queues that it was
trying to flush out. The process doing the contigmalloc() would
sit in "swwrt" forever and the swap pager would be going at full
force, but never get anywhere. Instead of doing it until the
queues are empty, launder for as many iterations as there are
pages in the queue.
2. Do all laundering to swap synchronously; previously, the vnode
laundering was synchronous and the swap laundering not.
3. Increase the number of launder-or-allocate passes to three, from
two, while failing without bothering to do all the laundering on
the third pass if allocation was not possible. This effectively
gives exactly two chances to launder enough contiguous memory,
helpful with high memory churn where a lot of memory from one pass
to the next (and during a single laundering loop) becomes dirtied
again.
I can now reliably hot-plug hardware requiring a 256KB contigmalloc()
without having the kldload/cbb ithread sit around failing to make
progress, while running a busy X session. Previously, it took killing
X to get contigmalloc() to get further (that is, quiescing the system),
and even then contigmalloc() returned failure.
live in src/lib/libc/gmon/gmon.c. glibc puts these prototypes in the same
header, so put them here for the sake of consistency.
PR: bin/4459
Reviewed by: bde
success and a proper errno value on failure. This makes it
consistent with cv_timedwait(), and paves the way for the
introduction of functions such as sema_timedwait_sig() which can
fail in multiple ways.
Bump __FreeBSD_version and add a note to UPDATING.
Approved by: scottl (ips driver), arch
flags relating to several aspects of socket functionality. This change
breaks out several bits relating to send and receive operation into a
new per-socket buffer field, sb_state, in order to facilitate locking.
This is required because, in order to provide more granular locking of
sockets, different state fields have different locking properties. The
following fields are moved to sb_state:
SS_CANTRCVMORE (so_state)
SS_CANTSENDMORE (so_state)
SS_RCVATMARK (so_state)
Rename respectively to:
SBS_CANTRCVMORE (so_rcv.sb_state)
SBS_CANTSENDMORE (so_snd.sb_state)
SBS_RCVATMARK (so_rcv.sb_state)
This facilitates locking by isolating fields to be located with other
identically locked fields, and permits greater granularity in socket
locking by avoiding storing fields with different locking semantics in
the same short (avoiding locking conflicts). In the future, we may
wish to coallesce sb_state and sb_flags; for the time being I leave
them separate and there is no additional memory overhead due to the
packing/alignment of shorts in the socket buffer structure.
allocated ressouces should be ultimately freed in gv_destroy_geom()
(when unloading the module and not earlier), but I need to look at this
more closely.
I added bounds checking to the patch and cg improved
the formular.
Submitted by: Andriy Gapon <avg@icyb.net.ua>
PR: kern/65485
Approved by: cg
Reviewed by: imp, rwatson, le
I've had this sitting in my tree for a long time and I can't seem to
find who sent it to me in the first place, apologies to whoever is
missing out on a Contributed by: line here.
I belive it works as it should.
In the end drivers should be building with ALTQ checks by default, but for
now build them with the old macros for non-ALTQ kernels.
Note: Check new features w/ LINT *and* w/ LINT minus the new feature.
Found-by: rwatson
allocation was passed up to nexus. Now, we probe sysresource objects and
manage the resources they describe in a local rman pool. This helps
devices which attach/detach varying resources (like the _CST object) and
module loads/unloads. The allocation/release routines now check to see if
the resource is described in a child sysresource object and if so,
allocate from the local rman. Sysresource objects add their resources to
the pool and reserve them upon boot. This means sysresources need to be
probed before other ACPI devices.
Changes include:
* Add ordering to the child device probe. The current order is: system
resource objects, embedded controllers, then everything else.
* Make acpi_MatchHid take a handle instead of a device_t arg.
* Replace acpi_{get,set}_resource with the generic equivalents.
- Define type rlim_t and struct timeval. This makes autoconf
happier. (PR: 62388)
- Add RLIMIT_AS, which is an alias for our RLIMIT_VMEM.
- structs orlimit and loadavg, as well as macros CP*, should
only appear if __BSD_VISIBLE.
- Use underscored versions of int32_t and fixpt_t in case
<sys/types.h> is not included.
- Document areas of non-conformance.
your (network) modules as well as any userland that might make sense of
sizeof(struct ifnet).
This does not change the queueing yet. These changes will follow in a
seperate commit. Same with the driver changes, which need case by case
evaluation.
__FreeBSD_version bump will follow.
Tested-by: (i386)LINT
conform to the rfc2734 and rfc3146 standard for IP over firewire and
should eventually supercede the fwe driver. Right now the broadcast
channel number is hardwired and we don't support MCAP for multicast
channel allocation - more infrastructure is required in the firewire
code itself to fix these problems.
in a TAILQ. Re-arrange some of the ecb elements so that they can stay
stable through alloc/free cycles while the rest get bzero'd.
- Use the tag_id from the ecb rather than fro the ccb. The latter is only
for target mode.
- Honor the ccb flags for tag_action when deciding whether to do a tagged
or untagged transaction.
- Re-arrange autosense completion so that it works correctly in failure
cases.
- Turn on the PI_TAG_ABLE flag so that CAM will send us tagged transactions.
This enables tagged queueing in the driver.
SOCK_LOCK(so):
- Hold socket lock over calls to MAC entry points reading or
manipulating socket labels.
- Assert socket lock in MAC entry point implementations.
- When externalizing the socket label, first make a thread-local
copy while holding the socket lock, then release the socket lock
to externalize to userspace.
order definition for witness. Send lock before receive lock, and
socket locks after accept but before select:
filedesc -> accept -> so_snd -> so_rcv -> sellck
All routing locks after send lock:
so_rcv -> radix node head
All protocol locks before socket locks:
unp -> so_snd
udp -> udpinp -> so_snd
tcp -> tcpinp -> so_snd
before calling sotryfree().
-- Body of earlier bulk commit this belonged with --
Log:
Extend coverage of SOCK_LOCK(so) to include so_count, the socket
reference count:
- Assert SOCK_LOCK(so) macros that directly manipulate so_count:
soref(), sorele().
- Assert SOCK_LOCK(so) in macros/functions that rely on the state of
so_count: sofree(), sotryfree().
- Acquire SOCK_LOCK(so) before calling these functions or macros in
various contexts in the stack, both at the socket and protocol
layers.
- In some cases, perform soisdisconnected() before sotryfree(), as
this could result in frobbing of a non-present socket if
sotryfree() actually frees the socket.
- Note that sofree()/sotryfree() will release the socket lock even if
they don't free the socket.
Submitted by: sam
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS
reference count:
- Assert SOCK_LOCK(so) macros that directly manipulate so_count:
soref(), sorele().
- Assert SOCK_LOCK(so) in macros/functions that rely on the state of
so_count: sofree(), sotryfree().
- Acquire SOCK_LOCK(so) before calling these functions or macros in
various contexts in the stack, both at the socket and protocol
layers.
- In some cases, perform soisdisconnected() before sotryfree(), as
this could result in frobbing of a non-present socket if
sotryfree() actually frees the socket.
- Note that sofree()/sotryfree() will release the socket lock even if
they don't free the socket.
Submitted by: sam
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS
Otherwise, the setting of the PG_M bit by one processor could be lost if
another processor is simultaneously changing the PG_W bit.
Reviewed by: tegge@
rig a PREPEND macro for ALTQ as the POLL/DEQUEUE semantic is very bad in
terms of locking. We make this a full functional queue to allow "bulk
dequeue" which will further reduce the locking overhead (for non-altq
enabled devices). Drivers will access this via the following macros, which
will show up in <net/if_var.h> once we expose ALTQ to the build:
IFQ_DRV_DEQUEUE(ifq, m) - takes a mbuf off the queue (driver queue first)
IFQ_DRV_PREPEND(ifq, m) - pushes a mbuf back to the driver queue
IFQ_DRV_PURGE(ifq) - drops all packets in both queues
IFQ_DRV_IS_EMPTY(ifq) - checks for pending mbufs in either queue
One has to make sure that the first three are protected by a driver mutex.
At the moment most network drivers still require Giant, so this is not an
issue. Even those that have thier own mutex usually hold it in if_start and
the like, so this requirement is almost always satisfied.
This evolved from a discussion with Andrew Gallatin.
protect fields in the socket buffer. Add accessor macros to use the
mutex (SOCKBUF_*()). Initialize the mutex in soalloc(), and destroy
it in sodealloc(). Add addition, add SOCK_*() access macros which
will protect most remaining fields in the socket; for the time being,
use the receive socket buffer mutex to implement socket level locking
to reduce memory overhead.
Submitted by: sam
Sponosored by: FreeBSD Foundation
Obtained from: BSD/OS
that the command succeeded. Sheesh! This makes CDROMs no longer cause an
instant panic at boot. Thanks to Jake Burkholder for providing a remote
test setup.
Also make device resets work, thanks to another typo.
pass any traffic. Unfortunately this means no full-duplex link with auto-
negotiation on hme(4) using DP83840A PHYs again.
I really thought I had tested this also on a Netra t1 100...
- add locking
- disable ALTQ3_COMPAT by default (do not remove the code to keep the diff
towards KAME small)
- put some more code under ALTQ3 conditional compilation as it should be
- account for if_xname
- some more minor compile fixes
As people started wondering:
The strange path layout "altq/altq" is there to avoid "-Isys/contrib" and
make it "-Isys/contrib/altq" instead, as we will need at least <altq/altq.h>
and <altq/if_altq.h> for kernel compilation.
The "freebsd4_..." in the privious commit is just the best tag name in the
KAME tree I could find to classify this in order to track its history. It
does *not* mean that this will go to 4-STABLE or anything of that kind.
HEAD at this point). This will not exactly live in a vendor branch, but have
the vendor backing to make it easier to exchange diffs.
This will be followed by a diff which takes most of the .c files off the
vendor branch in order to:
- add locking
- disable ALTQ3_COMPAT code (which is outdated and "un-lockable")
There is work in progress to refine the configuration API. Import this "as
is" now to have more exposure time before 5-STABLE.
This is only the import, it will be some more days until you will actually
be able to compile ALTQ support into your kernel so don't hold your breath.
HEADUPs will be posted on current@ and net@ before this is actually enabled.
No-objection: re(scottl), core(rwatson)
ruleset, the pcb is looked up once per ipfw_chk() activation.
This is done by extracting the required information out of the PCB
and caching it to the ipfw_chk() stack. This should greatly reduce
PCB looking contention and speed up the processing of UID/GID based
firewall rules (especially with large UID/GID rulesets).
Some very basic benchmarks were taken which compares the number
of in_pcblookup_hash(9) activations to the number of firewall
rules containing UID/GID based contraints before and after this patch.
The results can be viewed here:
o http://people.freebsd.org/~csjp/ip_fw_pcb.png
Reviewed by: andre, luigi, rwatson
Approved by: bmilekic (mentor)