release of Giant around the direct manipulation of the vm_object and
the optional call to pmap_object_init_pt().
o In vm_map_findspace(), remove GIANT_REQUIRED. Instead, acquire and
release Giant around the occasional call to pmap_growkernel().
o In vm_map_find(), remove GIANT_REQUIRED.
when machdep.tsc_freq returned a negative number on a 2.2GHz Xeon.
Submitted by: Brian Harrison <bharrison@ironport.com>
Reviewed by: phk
MFC after: 1 week
packet forwarding state ("annotations") during ip processing.
The code is considerably cleaner now.
The variables removed by this change are:
ip_divert_cookie used by divert sockets
ip_fw_fwd_addr used for transparent ip redirection
last_pkt used by dynamic pipes in dummynet
Removal of the first two has been done by carrying the annotations
into volatile structs prepended to the mbuf chains, and adding
appropriate code to add/remove annotations in the routines which
make use of them, i.e. ip_input(), ip_output(), tcp_input(),
bdg_forward(), ether_demux(), ether_output_frame(), div_output().
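For illustration, a minimal sketch of the idea (type and field names
here are hypothetical, not the ones actually committed):

    /*
     * Hypothetical sketch: per-packet state travels with the packet as
     * a struct prepended to the mbuf chain, instead of in a global.
     */
    #include <sys/types.h>
    #include <netinet/in.h>

    struct pkt_annotation {                 /* hypothetical name */
            int     pa_type;                /* which annotation this is */
            union {
                    u_int16_t          pa_divert_cookie; /* divert sockets */
                    struct sockaddr_in pa_fwd_addr;      /* transparent fwd */
            } pa_data;
    };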
In passing, fix a bug in the divert handling of fragmented packets.
Now it is the fragment at offset 0 which sets the divert status of
the whole packet, whereas formerly the last incoming fragment
decided.
Removal of last_pkt required a change in the interface of ip_fw_chk()
and dummynet_io(). In passing, use the same mechanism for dummynet
annotations and for divert/forward annotations.
option IPFIREWALL_FORWARD is effectively useless; the code to
implement it is very small and is now compiled in by default to avoid
the obfuscation of conditionally compiled code.
NOTES:
* there is at least one global variable left, sro_fwd, in ip_output().
I am not sure if/how this can be removed.
* I have deliberately avoided gratuitous style changes in this commit
to avoid cluttering the diffs. Minor style cleanup will likely be
necessary.
* this commit only focused on the IP layer. I am sure there are a
number of global variables used in the TCP and maybe UDP stack.
* despite the number of files touched, there are absolutely no APIs
or data structures changed by this commit (except the interfaces of
ip_fw_chk() and dummynet_io(), which are internal anyway), so
an MFC is quite safe and unintrusive (and desirable, given the
improved readability of the code).
MFC after: 10 days
release of Giant. (Annotate as MPSAFE.)
o Also, in vnode_pager_alloc(), remove an unnecessary re-initialization
of struct vm_object::flags and move a statement that is duplicated
in both branches of an if-else.
filesystem expands the inode to 256 bytes to make space for 64-bit
block pointers. It also adds a file-creation time field, an ability
to use jumbo blocks per inode to allow extent-like pointer density,
and space for extended attributes (up to twice the filesystem block
size worth of attributes, e.g., on a 16K filesystem, there is space
for 32K of attributes). UFS2 fully supports and runs existing UFS1
filesystems. New filesystems built using newfs can be built in either
UFS1 or UFS2 format using the -O option. In this commit UFS1 is
the default format, so if you want to build UFS2 format filesystems,
you must specify -O 2. This default will be changed to UFS2 when
UFS2 proves itself to be stable. In this commit the boot code for
reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c)
as there is insufficient space in the boot block. Once the size of the
boot block is increased, this code can be defined.
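Usage sketch (the device names below are only examples):

    newfs -O 2 /dev/da0s1e          # build a UFS2 filesystem
    newfs -O 1 /dev/da0s1e          # explicitly request UFS1 (the default)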
Things to note: the definition of SBSIZE has changed to SBLOCKSIZE.
The header file <ufs/ufs/dinode.h> must be included before
<ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and
ufs_lbn_t.
Still TODO:
Verify that the first level bootstraps work for all the architectures.
Convert the utility ffsinfo to understand UFS2 and test growfs.
Add support for the extended attribute storage. Update soft updates
to ensure integrity of extended attribute storage. Switch the
current extended attribute interfaces to use the extended attribute
storage. Add the extent-like functionality (framework is there,
but is currently never used).
Sponsored by: DARPA & NAI Labs.
Reviewed by: Poul-Henning Kamp <phk@freebsd.org>
As the comment in the code says, eventually there will be a proper
data structure (e.g., NetBSD's struct m_tag) to store chains of
annotations, and mbuf-handling procedures will handle these chains
in the correct way.
Right now, these chains do not exist, and we just use the constants
defined here to implement simple ad-hoc solutions to remove some global
variables used so far to pass around information about packets
being processed.
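As a sketch of the shape of these constants (the values and comments
here are illustrative, not necessarily the committed ones):

    #define PACKET_TAG_NONE         0       /* nothing here */
    #define PACKET_TAG_DUMMYNET     1       /* dummynet pipe state */
    #define PACKET_TAG_DIVERT       2       /* divert cookie */
    #define PACKET_TAG_IPFORWARD    3       /* ipfw forward address */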
Global variables are not only ugly and make the code unreadable, they
also prevent the use of parallelism in network stack processing.
(the 3-day MFC only refers to this commit, i.e. the PACKET_TAG_*
constants; the full mechanism will be committed and MFC'ed on a
longer timescale).
MFC after: 3 days
a linked list. This is to allow the merging of the mount
options in the MNT_UPDATE case, as the current data structure
is unsuitable for this.
There are no functional differences in this commit.
Reviewed by: phk
for example, break an sbrk(>=4GB) on 64-bit architectures
even if the resource limit allowed it.
o Correct an off-by-one error.
o Correct a spelling error in a comment.
o Reorder an && expression so that the commonly FALSE expression
comes first.
Submitted by: bde (bullets 1 and 2)
4 u_ints but needs to be an array of 4 uint32_t's to work, at least
if unsigned ints have less than 32 bits. It should be a non-array of
1 uint128_t on 128-bit machines, especially if u_int has 128 bits.
The headers that declare uint32_t (actually __uint32_t) are intentionally
not included here since this header should only be included by other
headers.
Fixed some style bugs (space instead of tab after #ifndef and #endif).
implementations can provide a base zero ffs function if they wish.
This changes
#define RQB_FFS(mask) (ffs64(mask))
foo = RQB_FFS(mask) - 1;
to
#define RQB_FFS(mask) (ffs64(mask) - 1)
foo = RQB_FFS(mask);
On some platforms we can get the "- 1" for free, e.g., those that use the
C code for ffs64().
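A sketch of why the "- 1" can be free in the C version (hypothetical
implementation, not the actual one):

    static __inline int
    ffs64_base0(unsigned long long mask)
    {
            int bit;

            if (mask == 0)
                    return (-1);    /* no bit set */
            for (bit = 0; (mask & 1) == 0; bit++)
                    mask >>= 1;
            return (bit);           /* zero-based; no "- 1" at call sites */
    }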
Reviewed by: jake (in principle)
Consequently, use vm_map_insert() and vm_map_delete(), which expect
the vm_map to be locked, instead of vm_map_find() and vm_map_remove(),
which do not.
Register the ISR early, but do not actually kick off the timer until we
see some activity. This still saves us from running the arp timers on
a system with no network cards.
- Added a mutex, kld_mtx, to protect the kernel_linker system. Note that
while ``classes'' is global (to that file), it is read-only after
SI_SUB_KLD, SI_ORDER_ANY.
- Add a SYSINIT to flip a flag that disallows class registration after
SI_SUB_KLD, SI_ORDER_ANY.
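A minimal sketch of the second bullet (the variable and function names
are hypothetical):

    static int linker_no_more_classes = 0;  /* hypothetical flag */

    static void
    linker_class_shutoff(void *dummy)
    {
            linker_no_more_classes = 1;     /* reject registrations from now on */
    }
    SYSINIT(kldclasses, SI_SUB_KLD, SI_ORDER_ANY, linker_class_shutoff, NULL);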
Idea for ``classes'' read only by: jake
Reviewed by: jake
allocator.
- Properly set M_ZERO when talking to the back end page allocators for
non malloc zones. This forces us to zero fill pages when they are first
brought into a cache.
- Properly handle M_ZERO in uma_zalloc_internal. This fixes a problem where
per cpu buckets weren't always getting zeroed.
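For example, a caller relying on the fixed behavior (sketch):

    item = uma_zalloc(zone, M_WAITOK | M_ZERO); /* item comes back zero-filled */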
in a user process gaining visibility into the 'old' contents of a filesystem
block. There were two cases: (1) when uiomove() fails (user process issues
illegal write), and (2) when uiomove() overlaps a mmap() of the same file at
the same offset (fault -> recursive buffer I/O reads contents of old block).
Unfortunately 1.72 also had the unintended effect of forcing the filesystem
to do a read-before-write in the case of a full-block-write (non append case),
e.g. 'dd if=/dev/zero of=test.dat bs=1m count=256 conv=notrunc'. This
destroys performance: not only is a read forced for every write, but
clustering breaks as well.
The solution is to clear the buffer manually in the full-block case rather
than asking BALLOC to do it (BALLOC issues the read-before-write). In the
partial-block case we want BALLOC to do it because the read-before-write
is necessary. This patch should greatly improve database and news-feed
server performance.
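A simplified sketch of the new logic (treat this as pseudocode for
ffs_write(), not the literal diff):

    if (xfersize == fs->fs_bsize)
            flags &= ~B_CLRBUF;     /* full block: skip the read-before-write */
    else
            flags |= B_CLRBUF;      /* partial block: BALLOC must read it */
    error = VOP_BALLOC(vp, uio->uio_offset, xfersize, cred, flags, &bp);
    if (error == 0 && (flags & B_CLRBUF) == 0)
            vfs_bio_clrbuf(bp);     /* clear the fresh block by hand */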
Found by: MKI <mki@mozone.net>
MFC after: 3 days
uifind() with a proc lock held.
change_ruid() and change_euid() have been modified to take a uidinfo
structure which will be pre-allocated by callers, they will then
call uihold() on the uidinfo structure so that the caller's logic
is simplified.
This allows one to call uifind() before locking the proc struct and
thereby avoid a potential blocking allocation with the proc lock
held.
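The resulting calling pattern looks roughly like this (a sketch, not
the exact setuid() code):

    uip = uifind(uid);              /* may allocate; no locks held yet */
    PROC_LOCK(p);
    /* ... permission checks ... */
    change_ruid(newcred, uip);      /* uses the pre-allocated uidinfo */
    PROC_UNLOCK(p);
    uifree(uip);                    /* drop the reference uifind() gave us */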
This may need revisiting, perhaps keeping a spare uidinfo allocated
per process to handle this situation or re-examining if the proc
lock needs to be held over the entire operation of changing real
or effective user id.
Submitted by: Don Lewis <dl-freebsd@catspoiler.org>
release of Giant.
o Reduce the scope of GIANT_REQUIRED in vm_map_insert().
These changes will enable us to remove the acquisition and release
of Giant from obreak().
so that /dev/mumble can be the entrypoint to some networking graph,
e.g. a tunnel or a remote tape drive or whatever...
Not fully tested (by me) yet.
Submitted by: Mark Santcroos <marks@ripe.net>
MFC after: 3 weeks
This facilitates the use in circumstances where you are using a serial
console as well. GDB doesn't support anything higher than 9600 baud (19k2
if you are lucky), but the console does.
allocated slabs and bucket caches for free items. It will not go ask the vm
for pages. This differs from M_NOWAIT in that it not only doesn't block, it
doesn't even ask.
- Add a new zcreate option ZONE_VM that sets the BUCKETCACHE zflag. This
tells uma that it should only allocate buckets out of the bucket cache, and
not from the VM. It does this by using the M_NOVM option to zalloc when
getting a new bucket. This is so that the VM doesn't recursively enter
itself while trying to allocate buckets for vm_map_entry zones. If there
are already allocated buckets when we get here we'll still use them but
otherwise we'll skip it.
- Use the ZONE_VM flag on vm map entries and pv entries on x86.
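A sketch of how such a zone might be set up (arguments abbreviated;
consult uma.h for the real prototypes):

    zone = uma_zcreate("MAP ENTRY", sizeof(struct vm_map_entry),
        NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, ZONE_VM);
    entry = uma_zalloc(zone, M_NOVM | M_NOWAIT);    /* never goes to the VM */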
during the previous probe are stale.
What really should be done is route the probe through
device_probe_and_attach but this is one of those ICBBATIASS (I can't be
bothered as there is a simpler solution). The user can easily replug the
device after kldloading a new device driver.
'make load' if an object dir was used, like it is in /sys/modules. I.e.
cd /sys/modules/umass
make obj
make
make load
works again without having to install the module.
If no objdir was used the module in the current directory is used.
magic numbers. Use stxa_sync instead of stxa; membar #Sync; to ensure
that no instruction is placed between the two; an intervening instruction
can cause random corruption even though interrupts are already disabled.
o Move pmap_pageable() outside of Giant in vm_fault_unwire().
(pmap_pageable() is a no-op on all supported architectures.)
o Remove the acquisition and release of Giant from mlock().
CAM_QUIRK_HILUN devices we loop thru 32 bits of lun. Oops.
Switch to using USEC_DELAY rather than USEC_SLEEP at isp_reset time.
Try to paper around a defect in clients that don't correctly register
themselves with the fabric nameserver.
Minor updates for Mirapoint support- they still use code that is not
HANDLE_LOOPSTATE_IN_OUTER_LAYERS, and, surprise surprise, this old
stuff had some bugs in it.
Clean up some target mode stuff.
MFC after: 1 week
topology, speed, loopid, WWPN/WWNN, etc.
Beef up target mode. Add isp_handle_platform_notify_scsi and
isp_handle_platform_notify_fc platform handlers to handle immediate
notifies (isp_handle_platform_notify_scsi is still stubbed out).
In the implementation of isp_handle_platform_notify_fc, for IN_ABORT_TASK,
peel off a pending XPT_IMMED_NOTIFY and call xpt_done on it and hope
that somebody upstream is listening.
Make sure on final CTIO2s that we set residual correctly. These are
absolutely crucial. Make sure we set relative offset for each CTIO2
based upon bytes we've already xferred. This is what the private
adjunct data to the original ATIO is. Note the state of the command so
we can figure out where to find it if we get an ABORT from the firmware.
Make sure we *always* set CAM_TAG_ACTION_VALID for ATIO2s. Make sure
we keep track of the original lun.
If we sent status (or we're otherwise done with the command), don't
forget to free the adjunct structure.
(so we can, when things get lost, find out who is currently processing
on behalf of this open exchange. Invariably, when things are lost and
wedged, it's CAM).
Keep an atio resource counter locally.
MFC after: 1 week
running ABOUT FIRMWARE with some that were started by BIOS downloads).
Redo CTIO2 dma mapping- use continuation segments instead of multiple
CTIO2s. Thanks to Veritas for sponsoring this work (in a different
context).
MFC after: 1 week
to *not* do flow control based upon resource counts for the firmware.
Increase default immediate notify count to 16.
Change isp_target_async to a function returning an integer.
within the HARP atm stack and the hea and hfa device drivers, but since
all of these systems were changed to use UMA zones, there is no use for
the api any longer.
vm_map_user_pageable().
o Remove vm_map_pageable() and vm_map_user_pageable().
o Remove vm_map_clear_recursive() and vm_map_set_recursive(). (They were
only used by vm_map_pageable() and vm_map_user_pageable().)
Reviewed by: tegge
the necessary uma_zcreate() and uma_zdestroy() calls into module loading
handler and the device attach handling.
- Change the related HARP netatm code to use UMA zone functions when
dealing with the zones that were formerly the ATM interface (hea, hfa)
storage pools.
- Have atm_physif_freenifs() now get passed an uma_zone_t so that we can
properly free the allocated NIF's back to their zone.
This should be the last commit to remove any code that makes use of the
netatm storage pool api. I will be removing the api code in the near
future.
Reviewed by: mdodd
indication of whether this happened so the calling function
knows whether or not to unlock the pcb.
Submitted by: Jennifer Yang (yangjihui@yahoo.com)
Bug reported by: Sid Carter (sidcarter@symonds.net)
Ensure that the syn cache's syn-ack packets contain the same
ip_tos, ip_ttl, and DF bits as all other tcp packets.
PR: 39141
MFC after: 2 weeks
This time, make sure that ipv4 specific code (aka all of the above)
is only run in the ipv4 case.
adding vnode to hash. The fix is to use atomic hash-lookup-and-add-if-
not-found operation. The odd thing is that this race can't actually
happen because the lowervp vnode is now locked exclusively during the
whole process of null node creation. This must be thought of as a step
toward shared lookups.
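A sketch of the lookup-and-add pattern (the lock and function names
here are hypothetical):

    mtx_lock(&null_hash_mtx);
    if ((vp = null_hashget(lowervp)) == NULL)
            null_hashins(xp);       /* insert while still holding the lock */
    mtx_unlock(&null_hash_mtx);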
Also remove vp->v_mount checks when looking for a match in the hash,
as this is a vestige.
Also add comments and cosmetic changes.
be a few bits left to clean from the HARP code in terms of what is using
the storage pools; once that's done, the memory management code can be
removed entirely.
This commit effectively changes the use of dynamic memory routines from
atm_allocate, atm_free, atm_release_pool to uma_zcreate, uma_zalloc,
uma_zfree, uma_zdestroy.
handshake between the ISR and the worker thread. Move the mutex lock
so that it only protects the cv_wait. This eliminates the "not sleeping
with pccbb1 held" messages some people were seeing.
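The narrowed scope looks roughly like this (sketch; field names
hypothetical):

    mtx_lock(&sc->sc_mtx);
    cv_wait(&sc->sc_cv, &sc->sc_mtx);   /* mutex held only across the wait */
    mtx_unlock(&sc->sc_mtx);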
Reviewed by: jhb (at least an early version)
code. Both tasks are not always performed completely by the firmware.
The former is required to get some e450 models to boot; the latter fixes
the repeated fifo underruns with hme(4)s and gem(4)s observed on some
machines (and probably performance problems with other peripherals as
well).
do not blunder around enabling interrupts and running trap handlers.
trap_pfault() will normally pass control to ddb's fault handler which
will normally do the right thing.
This bug is very old, but in old versions of FreeBSD it is probably only
serious for trap handling that involves sleeping. In -current, attempting
to examine unmapped memory while stopped at a breakpoint at mi_switch()
was always fatal.
Submitted by: tegge
o Eliminate the "!mapentzone" check from vm_map_entry_create() and
vm_map_entry_dispose(). Reviewed by: tegge
o Fix white-space usage in vm_map_entry_create().
or user vm_maps. This implementation has two key benefits when compared
to vm_map_{user_,}pageable(): (1) it avoids a race condition through
the use of "in-transition" vm_map entries and (2) it eliminates lock
recursion on the vm_map.
Note: there is still an error case that requires clean up.
Reviewed by: tegge
obviously bogus return value of ad1816chan_setformat().
PR: 37932
Submitted by: Martin Kaeske <Martin.Kaeske@Stud.TU-Ilmenau.DE>
Reviewed by: hm
MFC after: 10 days
o Add a stub for vm_map_wire().
Note: the description of the previous commit had an error. The in-
transition flag actually blocks the deallocation of a vm_map_entry by
vm_map_delete() and vm_map_simplify_entry().
in their tlb which the prom doesn't clear out, so we have to do so manually
before mapping the kernel page table or the cpu can hang due to various
conditions which cause undefined behaviour from the tlb.
or user vm_maps. In accordance with the standards for munlock(2),
and in contrast to vm_map_user_pageable(), this implementation does not
allow holes in the specified region. This implementation uses the
"in transition" flag described below.
o Introduce a new flag, "in transition," to the vm_map_entry.
Eventually, vm_map_delete() and vm_map_simplify_entry() will respect
this flag by deallocating in-transition vm_map_entrys, allowing
the vm_map lock to be safely released in vm_map_unwire() and (the
forthcoming) vm_map_wire().
o Modify vm_map_simplify_entry() to respect the in-transition flag.
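A sketch of the intended use of the flag (simplified; error handling
omitted):

    entry->eflags |= MAP_ENTRY_IN_TRANSITION;
    vm_map_unlock(map);
    /* ... unwire the pages without holding the map lock ... */
    vm_map_lock(map);
    entry->eflags &= ~MAP_ENTRY_IN_TRANSITION;
    /* the flag kept the entry from being deallocated while unlocked */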
In collaboration with: tegge
options do. Comments should be in NOTES and having the comments in two
places usually means that one place will just bitrot. Thus, remove the
comment for KTRACE_REQUEST_POOL from the previous revision.
Requested by: bde
- ktrace no longer requires Giant so do ktrace syscall events before and
after acquiring and releasing Giant, respectively.
- For i386, ia32 syscalls on ia64, powerpc, and sparc64, get rid of the
goto bad hack and instead use the model on ia64 and alpha where we
skip the actual syscall invocation if error != 0. This fixes a bug
where, if the copyin() of the arguments failed for a syscall that
was not marked MP safe, we would try to release Giant when we had
not acquired it.
operations to dump a ktrace event out to an output file are now handled
asynchronously by a ktrace worker thread. This enables most ktrace events
to not need Giant once p_tracep and p_traceflag are suitably protected by
the new ktrace_lock.
There is a single todo list of pending ktrace requests. The various
ktrace tracepoints allocate a ktrace request object and tack it onto the
end of the queue. The ktrace kernel thread grabs requests off the head of
the queue and processes them using the trace vnode and credentials of the
thread triggering the event.
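A sketch of the todo-list handling (names hypothetical; the real
structures live in kern_ktrace.c):

    /* tracepoint side */
    mtx_lock(&ktrace_mtx);
    STAILQ_INSERT_TAIL(&ktr_todo, req, ktr_list);
    mtx_unlock(&ktrace_mtx);
    wakeup(&ktr_todo);

    /* worker thread side */
    mtx_lock(&ktrace_mtx);
    while ((req = STAILQ_FIRST(&ktr_todo)) != NULL) {
            STAILQ_REMOVE_HEAD(&ktr_todo, ktr_list);
            mtx_unlock(&ktrace_mtx);
            /* write the event using req's vnode and credentials */
            mtx_lock(&ktrace_mtx);
    }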
Since we cannot assume that the user memory referenced when doing a
ktrgenio() will be valid and since we can't access it from the ktrace
worker thread without a bit of hassle anyways, ktrgenio() requests are
still handled synchronously. However, in order to ensure that the requests
from a given thread still maintain relative order to one another, when a
synchronous ktrace event (such as a genio event) is triggered, we still put
the request object on the todo list to synchronize with the worker thread.
The original thread blocks atomically with putting the item on the queue.
When the worker thread comes across a synchronous request, it wakes up
the original thread and then blocks to ensure it doesn't manage to write a
later event before the original thread has a chance to write out the
synchronous event. When the original thread wakes up, it writes out the
synchronous event using its own context and then finally wakes the worker
thread back up. Yuck. The synchronous events aren't pretty but they do work.
Since ktrace events can be triggered in fairly low-level areas (msleep()
and cv_wait() for example) the ktrace code is designed to use very few
locks when posting an event (currently just the ktrace_mtx lock and the
vnode interlock to bump the refcount on the trace vnode). This also means
that we can't allocate a ktrace request object when an event is triggered.
Instead, ktrace request objects are allocated from a pre-allocated pool
and returned to the pool after a request is serviced.
The size of this pool defaults to 100 objects, which is about 13k on an
i386 kernel. The size of the pool can be adjusted at compile time via the
KTRACE_REQUEST_POOL kernel option, at boot time via the
kern.ktrace_request_pool loader tunable, or at runtime via the
kern.ktrace_request_pool sysctl.
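For example, to double the default pool size (values illustrative):

    options KTRACE_REQUEST_POOL=200         # kernel config file
    kern.ktrace_request_pool="200"          # /boot/loader.conf
    sysctl kern.ktrace_request_pool=200     # at runtime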
If the pool of request objects is exhausted, then a warning message is
printed to the console. The message is rate-limited in that it is only
printed once until the size of the pool is adjusted via the sysctl.
I have tested all kernel traces but have not tested user traces submitted
by utrace(2), though they should work fine in theory.
Since a ktrace request has several properties (content of event, trace
vnode, details of originating process, credentials for I/O, etc.), I chose
to drop the first argument to the various ktrfoo() functions. Currently
the functions just assume the event is posted from curthread. If there is
a great desire to do so, I suppose I could instead put back the first
argument but this time make it a thread pointer instead of a vnode pointer.
Also, KTRPOINT() now takes a thread as its first argument instead of a
process. This is because the check for a recursive ktrace event is now
per-thread instead of process-wide.
Tested on: i386
Compiles on: sparc64, alpha
when a thread is in the ktrace subsystem to avoid ktrace'ing internal
ktrace events.
- Update the locking notes for p_traceflag and p_tracep taking into account
the new ktrace_lock mutex.