1
0
mirror of https://git.FreeBSD.org/src.git synced 2024-12-29 12:03:03 +00:00
Commit Graph

86 Commits

Author SHA1 Message Date
Mark Johnston
08cfa56ea3 Extend uma_reclaim() to permit different reclamation targets.
The page daemon periodically invokes uma_reclaim() to reclaim cached
items from each zone when the system is under memory pressure.  This
is important since the size of these caches is unbounded by default.
However it also results in bursts of high latency when allocating from
heavily used zones as threads miss in the per-CPU caches and must
access the keg in order to allocate new items.

With r340405 we maintain an estimate of each zone's usage of its
(per-NUMA domain) cache of full buckets.  Start making use of this
estimate to avoid reclaiming the entire cache when under memory
pressure.  In particular, introduce TRIM, DRAIN and DRAIN_CPU
verbs for uma_reclaim() and uma_zone_reclaim().  When trimming, only
items in excess of the estimate are reclaimed.  Draining a zone
reclaims all of the cached full buckets (the previous behaviour of
uma_reclaim()), and may further drain the per-CPU caches in extreme
cases.

Now, when under memory pressure, the page daemon will trim zones
rather than draining them.  As a result, heavily used zones do not incur
bursts of bucket cache misses following reclamation, but large, unused
caches will be reclaimed as before.

Reviewed by:	jeff
Tested by:	pho (an earlier version)
MFC after:	2 months
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D16667
2019-09-01 22:22:43 +00:00
Jeff Roberson
c168508655 Add two new kernel options to control memory locality on NUMA hardware.
- UMA_XDOMAIN enables an additional per-cpu bucket for freed memory that
   was freed on a different domain from where it was allocated.  This is
   only used for UMA_ZONE_NUMA (first-touch) zones.
 - UMA_FIRSTTOUCH sets the default UMA policy to be first-touch for all
   zones.  This tries to maintain locality for kernel memory.

Reviewed by:	gallatin, alc, kib
Tested by:	pho, gallatin
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20929
2019-08-06 21:50:34 +00:00
Gleb Smirnoff
bec2d7e9a2 The KVM code also needs a fix similar to r344269.
Reported by:	pho
2019-05-29 03:14:46 +00:00
Gleb Smirnoff
aa43298309 With r343051 UMA switched from atomic counts to counter(9) and now kernel
reports snap counts of how much a zone alloced and how much it freed.  It
may happen that snap values doesn't match, e.g alloced - freed < 0.
Workaround that in memstat library.

Reported by:	pho
2019-02-18 21:27:13 +00:00
Gleb Smirnoff
cf64f5197b This was missed in r343051: make uz_allocs, uz_frees and uz_fails counter(9). 2019-01-15 18:47:19 +00:00
Gleb Smirnoff
bb15d1c778 o Move zone limit from keg level up to zone level. This means that now
two zones sharing a keg may have different limits. Now this is going
  to work:

  zone = uma_zcreate();
  uma_zone_set_max(zone, limit);
  zone2 = uma_zsecond_create(zone);
  uma_zone_set_max(zone2, limit2);

  Kegs no longer have uk_maxpages field, but zones have uz_items. When
  set, it may be rounded up to minimum possible CPU bucket cache size.
  For small limits bucket cache can also be reconfigured to be smaller.
  Counter uz_items is updated whenever items transition from keg to a
  bucket cache or directly to a consumer. If zone has uz_maxitems set and
  it is reached, then we are going to sleep.

o Since new limits don't play well with multi-keg zones, remove them. The
  idea of multi-keg zones was introduced exactly 10 years ago, and never
  have had a practical usage. In discussion with Jeff we came to a wild
  agreement that if we ever want to reintroduce the idea of a smart allocator
  that would be able to choose between two (or more) totally different
  backing stores, that choice should be made one level higher than UMA,
  e.g. in malloc(9) or in mget(), or whatever and choice should be controlled
  by the caller.

o Sleeping code is improved to account number of sleepers and wake them one
  by one, to avoid thundering herd problem.

o Flag UMA_ZONE_NOBUCKETCACHE removed, instead uma_zone_set_maxcache()
  KPI added. Having no bucket cache basically means setting maxcache to 0.

o Now with many fields added and many removed (no multi-keg zones!) make
  sure that struct uma_zone is perfectly aligned.

Reviewed by:	markj, jeff
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D17773
2019-01-15 00:02:06 +00:00
Mateusz Guzik
f38828cbc7 libmemstat: adjust for per-cpu stats after r338899
Reported by:	yuripv
Reviewed by:	kib, markj
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17490
2018-10-11 23:25:14 +00:00
Dag-Erling Smørgrav
6bff85ff9a Reduce <sys/queue.h> pollution.
While <sys/sysctl.h> includes <sys/queue.h> unconditionally, it is only
actually used in code which is conditional on _KERNEL.  Make the #include
itself conditional as well, and fix userland code that uses <sys/queue.h>
for other purposes but relied on <sys/sysctl.h> to bring it in.

MFC after:	1 week
2018-05-11 00:01:43 +00:00
Jeff Roberson
ab3185d15e Implement NUMA support in uma(9) and malloc(9). Allocations from specific
domains can be done by the _domain() API variants.  UMA also supports a
first-touch policy via the NUMA zone flag.

The slab layer is now segregated by VM domains and is precise.  It handles
iteration for round-robin directly.  The per-cpu cache layer remains
a mix of domains according to where memory is allocated and freed.  Well
behaved clients can achieve perfect locality with no performance penalty.

The direct domain allocation functions have to visit the slab layer and
so require per-zone locks which come at some expense.

Reviewed by:	Attilio (a slightly older version)
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2018-01-12 23:25:05 +00:00
Pedro F. Giffuni
5e53a4f90f lib: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using mis-identified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-26 02:00:33 +00:00
Bryan Drewery
ea825d0274 DIRDEPS_BUILD: Update dependencies.
Sponsored by:	Dell EMC Isilon
2017-10-31 00:07:04 +00:00
Justin Hibbits
4026b44790 Fix buildworld for powerpc.
vmpage requires struct pmap to exist and contain a pm_stats field.  As of
r308817, either AIM or BOOKE is required to be set in order to get their
respective pmap structs.  Rather than expose them both, or try to unify them
unnecessarily, add a third option which contains only a pm_stats field, and
change the two existing pmap structures to place the common fields at the
beginning of the struct.  This actually fixes the stats collection by libkvm on
AIM hardware, because before it was accessing a possibly different offset, which
would cause it to read garbage.

Bump __FreeBSD_version to denote this ABI change, so that ports which depend on
libkvm can be rebuilt.
2016-11-20 06:10:12 +00:00
Glen Barber
30922917c8 MFH
Sponsored by:	The FreeBSD Foundation
2016-02-10 04:20:39 +00:00
Gleb Smirnoff
b28cc462ad Include sys/_task.h into uma_int.h, so that taskqueue.h isn't a
requirement for uma_int.h.

Suggested by:	jhb
2016-02-09 20:22:35 +00:00
Glen Barber
bbb51924bb MFH
Sponsored by:	The FreeBSD Foundation
2016-02-08 12:16:01 +00:00
Glen Barber
a70cba9582 First pass through library packaging.
Sponsored by:	The FreeBSD Foundation
2016-02-04 21:16:35 +00:00
Gleb Smirnoff
9508a0e1fe Fix build. 2016-02-04 00:23:21 +00:00
Bryan Drewery
7b3ea376a2 META MODE: Prefer INSTALL=tools/install.sh to lessen the need for xinstall.host.
This both avoids some dependencies on xinstall.host and allows
bootstrapping on older releases to work due to lack of at least 'install -l'
support.

Sponsored by:	EMC / Isilon Storage Division
2015-11-25 19:10:28 +00:00
Simon J. Gerraty
ccfb965433 Add META_MODE support.
Off by default, build behaves normally.
WITH_META_MODE we get auto objdir creation, the ability to
start build from anywhere in the tree.

Still need to add real targets under targets/ to build packages.

Differential Revision:       D2796
Reviewed by: brooks imp
2015-06-13 19:20:56 +00:00
Simon J. Gerraty
44d314f704 dirdeps.mk now sets DEP_RELDIR 2015-06-08 23:35:17 +00:00
Simon J. Gerraty
98e0ffaefb Merge sync of head 2015-05-27 01:19:58 +00:00
Baptiste Daroussin
6b129086dc Convert libraries to use LIBADD
While here reduce a bit overlinking
2014-11-25 11:07:26 +00:00
Simon J. Gerraty
ee7b0571c2 Merge head from 7/28 2014-08-19 06:50:54 +00:00
Baptiste Daroussin
2b7af31cf5 use .Mt to mark up email addresses consistently (part3)
PR:		191174
Submitted by:	Franco Fichtner  <franco at lastsummer.de>
2014-06-23 08:23:05 +00:00
Simon J. Gerraty
fae50821ae Updated dependencies 2014-05-16 14:09:51 +00:00
Simon J. Gerraty
76b28ad6ab Updated dependencies 2014-05-10 05:16:28 +00:00
Simon J. Gerraty
9d2ab4a62d Merge head 2014-04-27 08:13:43 +00:00
Gleb Smirnoff
345e3f4dd7 Expose real size of UMA allocations via libmemstat(3).
Sponsored by:	Nginx, Inc.
2014-02-10 20:09:10 +00:00
Simon J. Gerraty
d1d0158641 Merge from head 2013-09-05 20:18:59 +00:00
Jeff Roberson
fc03d22b17 Refine UMA bucket allocation to reduce space consumption and improve
performance.

 - Always free to the alloc bucket if there is space.  This gives LIFO
   allocation order to improve hot-cache performance.  This also allows
   for zones with a single bucket per-cpu rather than a pair if the entire
   working set fits in one bucket.
 - Enable per-cpu caches of buckets.  To prevent recursive bucket
   allocation one bucket zone still has per-cpu caches disabled.
 - Pick the initial bucket size based on a table driven maximum size
   per-bucket rather than the number of items per-page.  This gives
   more sane initial sizes.
 - Only grow the bucket size when we face contention on the zone lock, this
   causes bucket sizes to grow more slowly.
 - Adjust the number of items per-bucket to account for the header space.
   This packs the buckets more efficiently per-page while making them
   not quite powers of two.
 - Eliminate the per-zone free bucket list.  Always return buckets back
   to the bucket zone.  This ensures that as zones grow into larger
   bucket sizes they eventually discard the smaller sizes.  It persists
   fewer buckets in the system.  The locking is slightly trickier.
 - Only switch buckets in zalloc, not zfree, this eliminates pathological
   cases where we ping-pong between two buckets.
 - Ensure that the thread that fills a new bucket gets to allocate from
   it to give a better upper bound on allocation time.

Sponsored by:	EMC / Isilon Storage Division
2013-06-18 04:50:20 +00:00
Simon J. Gerraty
7cf3a1c6b2 Updated dependencies 2013-03-11 17:21:52 +00:00
Simon J. Gerraty
f5f7c05209 Updated dependencies 2013-02-16 01:23:54 +00:00
Simon J. Gerraty
7cd2dcf076 Updated/new Makefile.depend 2012-11-08 21:24:17 +00:00
Simon J. Gerraty
23090366f7 Sync from head 2012-11-04 02:52:03 +00:00
Matthew D Fleming
bb196eb480 Const-ify the zone name argument to uma_zcreate(9).
MFC after:	3 days
2012-10-26 17:51:05 +00:00
Marcel Moolenaar
7750ad47a9 Sync FreeBSD's bmake branch with Juniper's internal bmake branch.
Requested by: Simon Gerraty <sjg@juniper.net>
2012-08-22 19:25:57 +00:00
Glen Barber
3102cfe2e2 Fix various typos in manual pages.
Submitted by:	amdmi3
PR:		165431
MFC after:	1 week
2012-02-25 14:31:25 +00:00
Sergey Kandaurov
cfc9e655ba Cosmetic cleanup: remove #define LIBMEMSTAT used to prevent a nested
include of opt_vmpage.h from vm/vm_page.h.  opt_vmpage.h was retired
before 7.0 together with options PQ_NOOPT.

Approved by:	re (kib)
MFC after:	3 days
2011-09-02 14:10:42 +00:00
Sergey Kandaurov
1882360b9b Get rid of MAXCPU knowledge used for internal needs only. Switch to
dynamic memory allocation to hold per-CPU memory types data (sized to
mp_maxid for UMA, and to mp_maxcpus for malloc to match the kernel).

That fixes libmemstat with arbitrary large MAXCPU values and therefore
eliminates MEMSTAT_ERROR_TOOMANYCPUS error type.

Reviewed by:	jhb
Approved by:	re (kib)
2011-08-01 09:43:35 +00:00
Attilio Rao
1de471dfee Revert r222363, as bde@ pointed out the initial solution was far more
correct.
2011-05-31 20:59:53 +00:00
Attilio Rao
d361ed4b1c Style fix: cast to size_t rather than u_long when comparing to sizeof()
rets.

Requested by:	kib
2011-05-27 16:01:51 +00:00
Attilio Rao
d40d5f64a2 Sync with -CURRENT 2011-05-10 18:01:53 +00:00
Attilio Rao
be720a4061 Fix a mismerge. 2011-05-08 14:45:53 +00:00
Attilio Rao
34e4a6f408 Revert MAXCPU introduction. In userland it is always 1.
Noted by:	marcel
2011-05-08 14:29:25 +00:00
Attilio Rao
71a19bdc64 Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).

Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
  different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
  accessed avoiding migration, because the size of cpuset_t should be
  considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
  primirally done in order to leave some more space in userland to cope
  with KBI extensions). If you need to access kernel cpuset_t from the
  userland please refer to example in this patch on how to do that
  correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.

Tested by:	pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by:	jeff, jhb, sbruno
2011-05-05 14:39:14 +00:00
Attilio Rao
1d221389b2 Remove the redefinition of MEMSTAT_MAXCPU and just use MAXCPU for that.
Reviewed by:	sbruno
2011-05-02 17:13:40 +00:00
Attilio Rao
77bdc950f4 MFC @ r221286 2011-05-01 00:48:03 +00:00
Joel Dahl
799162a628 Spelling fixes. 2010-08-03 17:40:09 +00:00
Sean Bruno
bf96595915 Add a new column to the output of vmstat -z to indicate the number
of times the system was forced to sleep when requesting a new allocation.

Expand the debugger hook, db_show_uma, to display these results as well.

This has proven to be very useful in out of memory situations when
it is not known why systems have become sluggish or fail in odd ways.

Reviewed by:	rwatson alc
Approved by:	scottl (mentor) peter
Obtained from:	Yahoo Inc.
2010-06-15 19:28:37 +00:00
Ulrich Spörlein
aa12cea2cc mdoc: order prologue macros consistently by Dd/Dt/Os
Although groff_mdoc(7) gives another impression, this is the ordering
most widely used and also required by mdocml/mandoc.

Reviewed by:	ru
Approved by:	philip, ed (mentors)
2010-04-14 19:08:06 +00:00