freebsd

mirror of https://git.FreeBSD.org/src.git synced 2024-12-19 10:53:58 +00:00

Author	SHA1	Message	Date
Archie Cobbs	48d183faca	Fix a bug in m_split(): the "m->m_ext.ext_size" field of an mbuf was being set to zero. This field indicates the total space in the external buffer and therefore should not be modified after the external buffer is added. Add a comment warning that the mbufs returned by m_split() might be read-only. Fix M_TRAILINGSPACE() to return zero if !M_WRITABLE(m). Reviewed by: freebsd-net Obtained from: Vernier Networks, Inc. MFC after: 1 week	2002-05-31 22:09:57 +00:00
Jeffrey Hsu	4037698769	Fix corner case where m_len was not being initialized. Submitted by: Maksim Yevmenkin <myevmenk@digisle.net> MFC after: 1 week	2002-04-12 00:01:50 +00:00
Matthew Dillon	ecde8f7c29	Get rid of the twisted MFREE() macro entirely. Reviewed by: dg, bmilekic MFC after: 3 days	2002-02-05 02:00:56 +00:00
David E. O'Brien	a48740b6c5	Update to C99, s/__FUNCTION__/__func__/.	2001-12-10 05:51:45 +00:00
Julian Elischer	a8cfc0ee40	Forgot to remove this un-needed test. (M_WAITOK won't fail) I vaguely remember someone once proving it COULD return NULL.. was that changed? Reminded by: BDE MFC after: 2 weeks	2001-08-19 04:30:13 +00:00
Bosko Milekic	08442f8a82	Introduce numerous SMP friendly changes to the mbuf allocator. Namely, introduce a modified allocation mechanism for mbufs and mbuf clusters; one which can scale under SMP and which offers the possibility of resource reclamation to be implemented in the future. Notable advantages: o Reduce contention for SMP by offering per-CPU pools and locks. o Better use of data cache due to per-CPU pools. o Much less code cache pollution due to excessively large allocation macros. o Framework for `grouping' objects from same page together so as to be able to possibly free wired-down pages back to the system if they are no longer needed by the network stacks. Additional things changed with this addition: - Moved some mbuf specific declarations and initializations from sys/conf/param.c into mbuf-specific code where they belong. - m_getclr() has been renamed to m_get_clrd() because the old name is really confusing. m_getclr() HAS been preserved though and is defined to the new name. No tree sweep has been done "to change the interface," as the old name will continue to be supported and is not depracated. The change was merely done because m_getclr() sounds too much like "m_get a cluster." - TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and systat(1) (see TODO below). - Fixed systat(1) to display number of "free mbufs" based on new per-CPU stat structures. - Fixed netstat(1) to display new per-CPU stats based on sysctl-exported per-CPU stat structures. All infos are fetched via sysctl. TODO (in order of priority): - Re-enable mbtypes statistics in both netstat(1) and systat(1) after introducing an SMP friendly way to collect the mbtypes stats under the already introduced per-CPU locks (i.e. hopefully don't use atomic() - it seems too costly for a mere stat update, especially when other locks are already present). - Optionally have systat(1) display not only "total free mbufs" but also "total free mbufs per CPU pool." - Fix minor length-fetching issues in netstat(1) related to recently re-enabled option to read mbuf stats from a core file. - Move reference counters at least for mbuf clusters into an unused portion of the cluster itself, to save space and need to allocate a counter. - Look into introducing resource freeing possibly from a kproc. Reviewed by (in parts): jlemon, jake, silby, terry Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha) Preliminary performance measurements: jlemon (and me, obviously) URL: http://people.freebsd.org/~bmilekic/mb_alloc/	2001-06-22 06:35:32 +00:00
Bosko Milekic	f5eece3fb9	Change m_devget()'s outdated and unused `offset' argument to actually mean something: offset into the first mbuf of the target chain before copying the source data over. Make drivers using m_devget() with a first argument "data - ETHER_ALIGN" to use the offset argument to pass ETHER_ALIGN in. The way it was previously done is potentially dangerous if the source data was at the top of a page and the offset caused the previous page to be copied (if the previous page has not yet been appropriately mapped). The old `offset' argument in m_devget() is not used anywhere (it's always 0) and dates back to ~1995 (and earlier?) when support for ethernet trailers existed. With that support gone, it was merely collecting dust. Tested on alpha by: jlemon Partially submitted by: jlemon Reviewed by: jlemon MFC after: 3 weeks	2001-06-20 19:48:35 +00:00
Peter Wemm	db957588c9	Patch up a blunder I made a few days ago. nmbcnt was being initialized too late. Noted by: bmilekic Pointy-hat to: peter	2001-06-13 00:36:41 +00:00
Hajimu UMEMOTO	3384154590	Sync with recent KAME. This work was based on kame-20010528-freebsd43-snap.tgz and some critical problem after the snap was out were fixed. There are many many changes since last KAME merge. TODO: - The definitions of SADB_* in sys/net/pfkeyv2.h are still different from RFC2407/IANA assignment because of binary compatibility issue. It should be fixed under 5-CURRENT. - ip6po_m member of struct ip6_pktopts is no longer used. But, it is still there because of binary compatibility issue. It should be removed under 5-CURRENT. Reviewed by: itojun Obtained from: KAME MFC after: 3 weeks	2001-06-11 12:39:29 +00:00
Peter Wemm	0978669829	"Fix" the previous initial attempt at fixing TUNABLE_INT(). This time around, use a common function for looking up and extracting the tunables from the kernel environment. This saves duplicating the same function over and over again. This way typically has an overhead of 8 bytes + the path string, versus about 26 bytes + the path string.	2001-06-08 05:24:21 +00:00
Peter Wemm	4422746fdf	Back out part of my previous commit. This was a last minute change and I botched testing. This is a perfect example of how NOT to do this sort of thing. :-(	2001-06-07 03:17:26 +00:00
Peter Wemm	81930014ef	Make the TUNABLE_() macros look and behave more consistantly like the SYSCTL_() macros. TUNABLE_INT_DECL() was an odd name because it didn't actually declare the int, which is what the name suggests it would do.	2001-06-06 22:17:08 +00:00
David E. O'Brien	240ef84277	Back out jesper's 2001/05/31 14:58:11 PDT commit. It does not compile.	2001-06-01 09:51:14 +00:00
Jesper Skriver	e916d96e64	Move the definition of NMBCLUSTERS from src/sys/kern/uipc_mbuf.c to <sys/param.h>, so it's available to src/sys/netinet/ip_input.c, and remove the now unneeded includes of "opt_param.h". MFC after: 1 week	2001-05-31 21:56:44 +00:00
Bosko Milekic	629db60492	Increment mbstat.m_mpfail, not mbstat.m_mcfail, when m_pullup() fails. This slipped in accidently a few commits back.	2001-05-23 20:44:54 +00:00
Mark Murray	fb919e4d5a	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)	2001-05-01 08:13:21 +00:00
Bosko Milekic	d04d50d1f7	Fix inconsistency in setup of kernel_map: we need to make sure that we also reserve _adequate_ space for the mb_map submap; i.e. we need space for nmbclusters, nmbufs, _and_ nmbcnt. Furthermore, we need to rounddown, and not roundup, so that we are consistent. Pointed out by: bde	2001-04-18 23:54:13 +00:00
Bosko Milekic	4b8ae40a7c	- Change the msleep()s to condition variables. The mbuf and mcluster free lists now each "own" a condition variable, m_starved. - Clean up minor indentention issues in sys/mbuf.h caused by previous commit.	2001-04-03 04:50:13 +00:00
Alfred Perlstein	5ea487f34d	Use only one mutex for the entire mbuf subsystem. Don't use atomic operations for the stats updating, instead protect the counts with the mbuf mutex. Most twiddling of the stats was done right before or after releasing a mutex. By doing this we reduce the number of locked ops needed as well as allow a sysctl to gain a consitant view of the entire stats structure. In the future... This will allow us to chain common mbuf operations that would normally need to aquire/release 2 or 3 of the locks to build an mbuf with a cluster or external data attached into a single op requiring only one lock. Simplify the per-cpu locks that are planned. There's also some if (1) code that should check if the "how" operation specifies blocking/non-blocking behavior, we _could_ make it so that we hold onto the mutex through calls into kmem_alloc when non-blocking requests are made, but for safety reasons we currently drop and reaquire the mutex around the calls. Also, note that calling kmem_alloc is rare and only happens during a shortage so drop/re-getting the mutex will not be a common occurance. Remove some #define's that seemed to obfuscate the code to me. Remove an extranious comment. Remove an XXX, including mutex.h isn't a crime. Reviewed by: bmilekic	2001-04-03 03:15:11 +00:00
Bosko Milekic	2ba1a89559	Move the atomic() mbstat.m_drops incrementing to the MGET(HDR) and MCLGET macros in order to avoid incrementing the drop count twice. Otherwise, in some cases, we may increment m_drops once in m_mballoc() for example, and increment it again in m_mballoc_wait() if the wait fails.	2001-03-24 23:47:52 +00:00
Bosko Milekic	9612101e4c	Fix a couple of things in the internal mbuf allocation interface: - Make sure that m_mballoc() really doesn't allow over nmbufs mbufs to be allocated from mb_map. In the case where nmbufs-reserved space is not an exact multiple of PAGE_SIZE (which it should be, but anyway...), we hold nmbufs as an absolute maximum which need not ever be reached. - Clean up m_clalloc(); make it more consistent in the sense that the first argument `ncl' really means "the number of clusters ensured to be allocated" and not "the number of pages worth of clusters to be allocated," as was previously the case. This also makes it consistent with m_mballoc() as well as the comment that preceeds it. Reviewed by: jlemon	2001-03-17 23:23:24 +00:00
Boris Popov	03137ec82e	Fix parameter order in the calls to MGET().	2001-02-21 09:24:13 +00:00
Luigi Rizzo	5fe86675f0	Preserve alignment of first mbuf in m_copypacket. This is useful when doing copies of packet where some leading space has been preallocated to insert protocol headers. Note that there are in fact almost no users of m_copypacket. MFC candidate.	2001-02-20 08:23:41 +00:00
Bosko Milekic	fffd12bd72	Implement m_getm() which will perform an "all or nothing" mbuf + cluster allocation, as required. If m_getm() receives NULL as a first argument, then it allocates `len' (second argument) bytes worth of mbufs + clusters and returns the chain only if it was able to allocate everything. If the first argument is non-NULL, then it should be an existing mbuf chain (e.g. pre-allocated mbuf sitting on a ring, on some list, etc.) and so it will allocate `len' bytes worth of clusters and mbufs, as needed, and append them to the tail of the passed in chain, only if it was able to allocate everything requested. If allocation fails, only what was allocated by the routine will be freed, and NULL will be returned. Also, get rid of existing m_getm() in netncp code and replace calls to it to calls to this new generic code. Heavily Reviewed by: bp	2001-02-14 05:13:04 +00:00
Bosko Milekic	122a814af5	Long awaited style fixup in mbuf code. Get rid of K&R style prototyping and function argument declarations. Make sure that functions that are supposed to return a pointer return NULL in case of failure. Don't cast NULL. Finally, get rid of annoying `register' uses.	2001-02-11 05:02:06 +00:00
Bosko Milekic	9ed346bab0	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)	2001-02-09 06:11:45 +00:00
John Baldwin	5dbc7fe2d7	Don't bother with acquiring/releasing Giant around kmem_malloc() and kmem_free() for now. Kmem_malloc() and kmem_free() now have appropriate assertions in place, and these checks aren't feasible until more of the networking code is locked down. Also, the extra assertions here should already be caught by the WITNESS code as lock order violations should mutex operations on Giant be reintroduced here later.	2001-02-08 00:27:38 +00:00
Bosko Milekic	56acb799b2	When short of mbufs or mbuf clusters, we sleep on appropriate "counters." The counters are incremented when a thread goes to sleep and decremented either when a thread is woken up by another thread or when the sleep times out. There existed a race where the sleep count could be decremented twice resulting in an eventual underflow. Move the decrementing of the "counters" to the thread initiating the sleep and thus remedy the problem.	2001-01-20 21:29:10 +00:00
Bosko Milekic	35c05ac61b	Add some KASSERTs valid if WITNESS is defined to verify that the mbuf allocation routines are being called safely. Since we drop our relevant mbuf mutex and acquire Giant before we call kmem_malloc(), we have to make sure that this does not pave the way for a fatal lock order reversal. Check that either Giant is already held (in which case it's safe to grab it again and recurse on it) or, if Giant is not held, that no other locks are held before we try to acquire Giant. Similarily, add a KASSERT valid in the WITNESS case in m_reclaim() to nail callers who end up in m_reclaim() and hold a lock. Pointed out by: jhb	2001-01-16 01:53:13 +00:00
Bosko Milekic	d113d3857e	In m_mballoc_wait(), drop the mmbfree mutex lock prior to calling m_reclaim() and re-acquire it when m_reclaim() returns. This means that we now call the drain routines without holding the mutex lock and recursing into it. This was done for mainly two reasons: (i) Avoid the long recursion; long recursions are typically bad and this is the case here because we block all other code from freeing mbufs if they need to. Doing that is kind of counter-productive, since we're really hoping that someone will free. (ii) More importantly, avoid a potential lock order reversal. Right now, not all the locks have been added to our networking code; but without this change, we're introducing the possibility for deadlock. Consider for example ip_drain(). We will likely eventually introduce a lock for ipq there, and so ip_freef() will be called with ipq lock held. But, ip_freef() calls m_freem() which in turn acquires the mmbfree lock. Since we were previously calling ip_drain() with mmbfree held, our lock order would be: mmbfree->ipq->mmbfree. Some other code may very well lock ipq first and then call ip_freef(). This would result in the regular lock order, ipq->mmbfree. Clearly, we have deadlock if one thread acquires the ipq lock and sits waiting for mmbfree while another thread calling m_reclaim() acquires mmbfree and sits waiting for the ipq lock. Also, make sure to add a comment above m_reclaim()'s definition briefly explaining this. Also document this above the call to m_reclaim() in m_mballoc_wait(). Suggested and reviewed by: alfred	2001-01-09 23:58:56 +00:00
Bosko Milekic	2a0c503e7a	* Rename M_WAIT mbuf subsystem flag to M_TRYWAIT. This is because calls with M_WAIT (now M_TRYWAIT) may not wait forever when nothing is available for allocation, and may end up returning NULL. Hopefully we now communicate more of the right thing to developers and make it very clear that it's necessary to check whether calls with M_(TRY)WAIT also resulted in a failed allocation. M_TRYWAIT basically means "try harder, block if necessary, but don't necessarily wait forever." The time spent blocking is tunable with the kern.ipc.mbuf_wait sysctl. M_WAIT is now deprecated but still defined for the next little while. * Fix a typo in a comment in mbuf.h * Fix some code that was actually passing the mbuf subsystem's M_WAIT to malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the value of the M_WAIT flag, this could have became a big problem.	2000-12-21 21:44:31 +00:00
John Baldwin	35e0e5b311	Catch up to moving headers: - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h	2000-10-20 07:58:15 +00:00
Bosko Milekic	181d2a1564	Add nmbcnt sysctl and make it tunable at boottime; nmbcnt is the number of ext_buf counters that are possibly allocatable. Do this because: (i) It will make it easier to influence EXT_COUNTERS for if_sk, if_ti (or similar) users where the driver allocates its own ext_bufs and where it is important for the mbuf system to take it into account when reserving necessary space for counters. (ii) Facilitate some percentile calculation for netstat(1)	2000-10-15 06:24:07 +00:00
Bosko Milekic	7d03271452	Big mbuf subsystem diff #1 : incorporate mutexes and fix things up somewhat to accomodate the changes. Here's a list of things that have changed (I may have left out a few); for a relatively complete list, see http://people.freebsd.org/~bmilekic/mtx_journal * Remove old (once useful) mcluster code for MCLBYTES > PAGE_SIZE which nobody uses anymore. It was great while it lasted, but now we're moving onto bigger and better things (Approved by: wollman). * Practically re-wrote the allocation macros in sys/sys/mbuf.h to accomodate new allocations which grab the necessary lock. * Make sure that necessary mbstat variables are manipulated with corresponding atomic() routines. * Changed the "wait" routines, cleaned it up, made one routine that does the job. * Generalized MWAKEUP() macro. Got rid of m_retry and m_retryhdr, as they are now included in the generalized "wait" routines. * Sleep routines now use msleep(). * Free lists have locks. * etc... probably other stuff I'm missing... Things to look out for and work on later: * find a better way to (dynamically) adjust EXT_COUNTERS * move necessity to recurse on a lock from drain routines by providing lock-free lower-level version of MFREE() (and possibly m_free()?). * checkout include of mutex.h in sys/sys/mbuf.h - probably violating general philosophy here. The code has been reviewed quite a bit, but problems may arise... please, don't panic! Send me Emails: bmilekic@freebsd.org Reviewed by: jlemon, cp, alfred, others?	2000-09-30 06:30:39 +00:00
Peter Wemm	9579e8c145	m_mballoc_wait() had a spl/tsleep race. mbufs can be freed in interrupt context, which can cause a wakeup.. which can race with this.	2000-08-25 22:28:08 +00:00
David Malone	a5c4836d39	Replace the mbuf external reference counting code with something that should be better. The old code counted references to mbuf clusters by using the offset of the cluster from the start of memory allocated for mbufs and clusters as an index into an array of chars, which did the reference counting. If the external storage was not a cluster then reference counting had to be done by the code using that external storage. NetBSD's system of linked lists of mbufs was cosidered, but Alfred felt it would have locking issues when the kernel was made more SMP friendly. The system implimented uses a pool of unions to track external storage. The union contains an int for counting the references and a pointer for forming a free list. The reference counts are incremented and decremented atomically and so should be SMP friendly. This system can track reference counts for any sort of external storage. Access to the reference counting stuff is now through macros defined in mbuf.h, so it should be easier to make changes to the system in the future. The possibility of storing the reference count in one of the referencing mbufs was considered, but was rejected 'cos it would often leave extra mbufs allocated. Storing the reference count in the cluster was also considered, but because the external storage may not be a cluster this isn't an option. The size of the pool of reference counters is available in the stats provided by "netstat -m". PR: 19866 Submitted by: Bosko Milekic <bmilekic@dsuper.net> Reviewed by: alfred (glanced at by others on -net)	2000-08-19 08:32:59 +00:00
Alfred Perlstein	9ad48853de	mbstat should be a read-only sysctl. Submitted by: Bosko Milekic <bmilekic@dsuper.net>	2000-07-31 09:24:32 +00:00
Alfred Perlstein	af0e6bcdf0	Make mbstat.m_mtypes seperate and viewable via sysctl, also expand the size from short to ulong Submitted by: Ian Dowse <iedowse@maths.tcd.ie> PR: kern/19809	2000-07-15 06:02:48 +00:00
Jun-ichiro itojun Hagino	686cdd19b1	sync with kame tree as of july00. tons of bug fixes/improvements. API changes: - additional IPv6 ioctls - IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8). (also syntax change)	2000-07-04 16:35:15 +00:00
Mike Smith	736e4b67ae	Actively limit the allocation of mbufs to NMBUFS/nmbufs and mbuf clusters to NMBCLUSTERS/nmbclusters/kern.ipc.nmbclusters. Add a read-only sysctl kern.ipc.nmbufs matching kern.ipc.nmbclusters. Submitted by: Bosko Milekic <bmilekic@dsuper.net>	1999-12-28 06:35:57 +00:00
Eivind Eklund	6357e7b507	Make m_print const correct (avoids a warning)	1999-12-20 18:10:00 +00:00
Brian Feldman	2cfa173c50	Woops, I'm so sorry I forgot this! From the last mbuf.h change: m_mballoc_wakeup() (inline) -> MMBWAKEUP() (macro) m_clalloc_wakeup() (inline) -> MCLWAKEUP() (macro) Noticed by: peter	1999-12-18 20:04:19 +00:00
Brian Feldman	6c54410230	Bug fix: The variables "m_mclalloc_wid" and "m_mballoc_wid" were not in the proper place. They should have been in uipc_mbuf.c and have been global, not in mbuf.h and local per each file that uses mbuf.h. Sorta bug fix: In mbuf.h, the definitions of various things for KERNEL and not KERNEL cases were very screwy. This fixes all of that which I could find.	1999-12-14 02:23:14 +00:00
Brian Feldman	f48b807fc0	This is Bosko Milekic's mbuf allocation waiting code. Basically, this means that running out of mbuf space isn't a panic anymore, and code which runs out of network memory will sleep to wait for it. Submitted by: Bosko Milekic <bmilekic@dsuper.net> Reviewed by: green, wollman	1999-12-12 05:52:51 +00:00
Archie Cobbs	1c38f2ea70	The functions m_copym() and m_copypacket() return read-only copies, because in the case of mbuf clusters they only increment the reference count rather than actually copying the data. Add comments to this effect, and add a new routine called m_dup() that returns a real, writable copy of an mbuf chain. This is preliminary work required for implementing 'ipfw tee'. Reviewed by: julian	1999-12-01 22:31:32 +00:00
Peter Wemm	eed9fd425d	Fix a warning.	1999-11-18 06:29:57 +00:00
Poul-Henning Kamp	ce4a64f787	New function: m_print(struct mbuf *); hexdumps a mbuf.	1999-11-01 15:03:20 +00:00
Alfred Perlstein	e0a653ddba	change identical and "programming error" panic("mcopy*")'s into more verbose messages using KASSERT. Reviewed by: eivind, des	1999-10-13 09:55:42 +00:00
Peter Wemm	c3aac50f28	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
Mike Smith	134c934ce7	Move the initialisation/tuning of nmbclusters from param.c/machdep.c into uipc_mbuf.c. This reduces three sets of identical tunable code to one set, and puts the initialisation with the mbuf code proper. Make NMBUFs tunable as well. Move the nmbclusters sysctl here as well. Move the initialisation of maxsockets from param.c to uipc_socket2.c, next to its corresponding sysctl. Use the new tunable macros for the kern.vm.kmem.size tunable (this should have been in a separate commit, whoops).	1999-07-05 08:52:54 +00:00

1 2

90 Commits