on a disconnected client, without running the time-consuming rsyncs.
This is useful when a build is interrupted and needs to be restarted.
* After we have cleaned up the machine, reset the queue counter by using
pollmachine -queue. This has a race condition if other builds are being
dispatched to the machine (e.g. builds on another branch):
getmachine can claim a directory and increment the counter, then the
machine is polled and finds e.g. 0 chroots in use, and resets the
counter to 0, then claim-chroot is run and the build dispatched, with
the counter now off-by-one. This could be fixed by running
claim-chroot with the .lock held, but this turns out to be too
time-consuming. A two-level lock approach might also fix this
efficiently.
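
The interleaving described above can be replayed deterministically. This is only a sketch: it assumes the queue counter is a plain integer and that the poll resets it to the number of chroots it sees in use. The function name is hypothetical and is not one of the real scripts.

```python
def replay_race():
    counter = 0          # job count recorded in the queue file
    chroots_in_use = 0   # chroots actually claimed on the build machine

    # 1. getmachine claims a slot and increments the counter.
    counter += 1
    # 2. The machine is polled before claim-chroot has run: it finds 0
    #    chroots in use and resets the counter to match.
    counter = chroots_in_use
    # 3. claim-chroot now runs and the build is dispatched.
    chroots_in_use += 1
    return counter, chroots_in_use


counter, in_use = replay_race()
print("queue counter:", counter, "chroots in use:", in_use)
```

The counter ends up one lower than the real number of running jobs, which is the off-by-one noted above.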
same time, assuming that the admin has already built the INDEX and
INDEX.old in advance.
* Adapt to new method of calculating build concurrency, by summing the
value of ${maxjobs} listed in every portbuild.${machine}
* Support 5-exp builds
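
A rough sketch of the concurrency calculation, assuming each portbuild.${machine} file contains a shell-style `maxjobs=N` assignment and that the files sit together in one directory. The file layout, the regular expression and the function name are assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed format: a line such as  maxjobs=4  or  maxjobs="4"
MAXJOBS_RE = re.compile(r'^\s*maxjobs\s*=\s*"?(\d+)"?\s*$', re.MULTILINE)


def total_concurrency(conf_dir, machines):
    """Sum maxjobs over the per-machine config files that define it."""
    total = 0
    for machine in machines:
        conf = Path(conf_dir) / f"portbuild.{machine}"
        match = MAXJOBS_RE.search(conf.read_text())
        if match:
            total += int(match.group(1))
    return total
```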
(i.e. if the package lists a dependency on the relevant package in the
PACKAGE_BUILDING case). This allows packages that require an
available DISPLAY to again build (with some forthcoming fixes to
existing ports).
Improve the reporting of detected filesystem anomalies (extra files
left behind after deinstallation, changes to and removal of
pre-existing files)
synchronously instead of probabilistically scheduling jobs, which
means that the job load on a machine never exceeds a desired
threshold, and we can preferentially use faster machines when they are
available. This has a dramatic effect on package build throughput,
although I don't yet have precise measurements of the performance
improvements.
Specifically, the changes are:
* Introduce the new variable maxjobs in portbuild. This replaces the
build scheduling weights previously listed in the mlist file, which
now changes format to list the build machines only, ranked in order of
preference for job dispatches (i.e. faster machines first).
* The ${arch}/queue directory is used to list machines available for
jobs (file content is the number of jobs currently running on the
machine). Changes to files in this directory are serialized using
lockf on the .lock file.
* Claim a machine with the getmachine script, with the .lock held.
This picks the machine with the fewest jobs running; if several
machines have equal load, the one listed highest in the mlist file
wins. The job counter is incremented, and the file removed if
the counter reaches ${maxjobs} for that machine. If all machines are
busy, sleep for 15 seconds and retry.
* After we have claimed a machine, we run claim-chroot on it to claim
an empty chroot, as before. If the claim fails, release the job from
the queue with the releasemachine script and retry after a 15 second
wait.
* When the build is finished, decrement the job counter with the
releasemachine script, with .lock held.
* The checkmachines script now exists only to poll the load averages
for admin convenience (every 2 minutes), and to ping for unreachable
machines. When a machine cannot be reached, remove the entry in the
queue directory to stop further job dispatches to it. This needs more
work to deal with reinitialization of machines after they become
available again.
Additional changes to this file:
* Exit if passed a null package name, to avoid badness later on
* Send a nag-mail if pkg-plist errors are detected in the build
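
The machine-claiming step described above might look roughly like the following. This is a sketch and not the getmachine script itself: it assumes one file per available machine in the queue directory holding the current job count, takes mlist as an already-parsed list in preference order, and takes maxjobs as a dict. The function name and arguments are hypothetical.

```python
import fcntl
import os


def get_machine(queue_dir, mlist, maxjobs):
    """Claim the least-loaded machine; return its name, or None if all are busy."""
    with open(os.path.join(queue_dir, ".lock"), "a") as lock:
        fcntl.lockf(lock, fcntl.LOCK_EX)          # serialize queue updates
        try:
            best, best_load = None, None
            for machine in mlist:                 # mlist is in preference order
                path = os.path.join(queue_dir, machine)
                if not os.path.exists(path):      # full or unreachable machine
                    continue
                with open(path) as f:
                    load = int(f.read().strip() or 0)
                # Strict '<' means ties go to the machine listed first.
                if best is None or load < best_load:
                    best, best_load = machine, load
            if best is None:
                return None
            path = os.path.join(queue_dir, best)
            if best_load + 1 >= maxjobs[best]:
                os.remove(path)                   # machine has reached maxjobs
            else:
                with open(path, "w") as f:
                    f.write(str(best_load + 1))
            return best
        finally:
            fcntl.lockf(lock, fcntl.LOCK_UN)
```

A caller would sleep for 15 seconds and call again whenever this returns None.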
/rescue/mount -t linprocfs, so assume that the i386 build hosts have
statically-built copies of the necessary binaries in /sbin, until this is
fixed.
Create /usr/X11R6 inside the chroot so that mtree has something to do, since
this directory is otherwise orphaned.
List the extra/removed/changed files separately, and colour-code the
serious errors (files left behind outside of /usr/local and /usr/X11R6;
files removed that were installed by another port, and files with changed
permissions or ownership)
the port deinstall; mtree does not recurse into subdirectories it does
not know about
* Break out the 'files incorrectly removed' and 'files incorrectly changed'
into their own sections
* Remove USE_QT2 since it's obsolete now. [2]
* Clarify comments about ARCH. [3]
* Speed up 'make readmes'. Add a perl script "Tools/make_readmes"
and modify bsd.port.subdir.mk to avoid recursing into individual
port directories to create README.html. [4]
* Fix 'make search' to allow case-insensitive search on 5-x/6-x. [5]
* Add the possibility to search the ports by category. [6]
* Remove tk42 and tcl76 from virtual categories since they're
obsolete. [7]
* Introduce new variable - DISTVERSION, vendor version of the
distribution, that can be set instead of PORTVERSION and is
automatically converted into a conforming PORTVERSION. [8]
* Use --suffix instead of -b option for patch(1) to make it
compatible with BSD patch(1) [9]
* Fix {WANT,WITH}_MYSQL_VER behavior, to deal with conflicting
versions. [10]
PR: ports/68895 [1], ports/69486 [2], ports/68539 [3],
ports/70018 [4], ports/68896 [5], ports/73299 [6],
ports/73570 [7], ports/67171 [8], ports/72182 [9]
Submitted by: linimon [1][3], arved [2][7], cperciva [4],
Matthew Seaman <m.seaman@infracaninophile.co.uk> [5],
Radek Kozlowski <radek@raadradd.com> [6],
eik [8], Andreas Hauser <andy-freebsd@splashground.de> [9],
clement [10]
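
To illustrate the DISTVERSION conversion mentioned in [8]: the log does not spell out the rules, so the substitutions below are an assumption, modelled on how the conversion is commonly documented (lowercase, shorten letter runs to one letter, separate digits from letters, collapse other separators to dots). The authoritative rule is the one in bsd.port.mk.

```python
import re


def distversion_to_portversion(distversion):
    """Approximate a conforming PORTVERSION from a vendor version string."""
    v = distversion.lower()
    v = re.sub(r'([a-z])[a-z]+', r'\1', v)      # "alpha" -> "a", "pre" -> "p"
    v = re.sub(r'([0-9])([a-z])', r'\1.\2', v)  # "1d" -> "1.d"
    v = re.sub(r':(.)', r'\1', v)               # drop a ':' before a character
    v = re.sub(r'[^a-z0-9+]+', '.', v)          # other separators become '.'
    return v


for sample in ("0.7.1d", "10Alpha3", "3Beta7-pre2", "8:f_17"):
    print(sample, "->", distversion_to_portversion(sample))
```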
restricted ports' instead of 'don't build any restricted ports' since
the former is useful when we're not intending to publish the results
of a build, but the latter is not.
Move the build preprocessing (directory setup, old build rotation,
etc) out from under -nobuild, so that we can set up a new build using
that option.
${arch}/${branch}/latest/${portdir}. We will use this in the
processfail script, so that the "new package build errors" webpages do
not have out-of-date links but instead link to the most recent copy of
the build error.
that it may be called by hand.
Support new portbuild.conf variables
client_user = user to connect to on the client (not necessarily
root). This user must have write permission to the
/var/portbuild tree if disconnected=1 (i.e. we're
going to run rsync).
rsync_gzip = set to "-z" to enable compression on low-bandwidth
disconnected clients.
Approved by: portmgr (self)
ssh times out)
* Support new portbuild.conf settings:
client_user = user to connect to on the client (not necessarily root)
sudo_cmd = If ssh'ing to a non-root user, run this command to gain
root privs (set to empty string for client_user=root,
or sudo for !root). Cannot require interactivity, of
course.
Approved by: portmgr (self)
because this file is a chronological history of port builds that have
failed, the files listed may not be present in the current set of
error logs, and we currently have no easy way to find the most recent
failure log to use instead.
i386-5-latest that are linked to from the index.html are symlinks to
dated directories (e.5.`date`), so the URLs in the error reports will
expire with the start of the next build when the symlink is repointed.
This change makes the URLs in the error reports use the realpath of
the target file, so they do not expire.
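
A small sketch of the idea, assuming the error logs are served from a directory tree whose layout mirrors the URL paths. The base URL, web root and function name are placeholders for illustration.

```python
import os


def stable_log_url(base_url, web_root, log_path):
    """Build a URL from the resolved path, so it survives symlink repointing."""
    real = os.path.realpath(log_path)                       # follow symlinks
    rel = os.path.relpath(real, os.path.realpath(web_root))
    return base_url.rstrip("/") + "/" + rel.replace(os.sep, "/")
```

A log reached through a "latest" symlink therefore yields a URL naming the dated directory the symlink pointed to at report time.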
* Clients no longer have ssh access to the master, so we need to
push/pull everything on the client from here. This means we need to
know where the build took place so we can go in and get the files
after it finishes. Introduce the claim-chroot script which
atomically claims a free chroot directory on the host and returns
the name. This directory is later populated by the portbuild script
if it does not already contain an extracted bindist.
* Use the per-node portbuild.$(hostname) config file to decide where
in the filesystem to claim the chroot on the build host.
* If a port failed unexpectedly (i.e. is not marked BROKEN), or if
something strange happened when trying to pull in build results from
a client, then send me email (XXX should be configurable).
* Clean up after the build finishes and we have everything we need, by
dispatching the clean-chroot script on the client.
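
One way to claim a chroot atomically is to rely on mkdir failing when the directory already exists. The sketch below assumes numbered chroot directories under a base directory and a marker subdirectory named "used"; the numbering, the marker name and the slot limit are all assumptions and may differ from what claim-chroot really does.

```python
import os


def claim_chroot(base_dir, max_slots=100):
    """Return the path of a newly claimed chroot, or None if none is free."""
    os.makedirs(base_dir, exist_ok=True)
    for n in range(1, max_slots + 1):
        chroot = os.path.join(base_dir, str(n))
        os.makedirs(chroot, exist_ok=True)
        try:
            # mkdir either creates the marker or fails; two claimants
            # can never both succeed on the same chroot.
            os.mkdir(os.path.join(chroot, "used"))
        except FileExistsError:
            continue                      # someone else owns this chroot
        return chroot
    return None


def release_chroot(chroot):
    """Give the chroot back by removing the (hypothetical) marker."""
    os.rmdir(os.path.join(chroot, "used"))
```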
if requested (".keep" file in the port directory), no matter where
we fail.
* Add package dependencies before the corresponding build stage
(e.g. FETCH_DEPENDS before 'make fetch'), and remove them again
afterwards. This allows us to catch ports that list their
dependencies too early/late.
* No need to check for set[ug]id files here, the security-check target
in bsd.port.mk does it for us.
* Exclude some more directories and files from showing up in the mtree
before/after comparison, to trim down the false positives in the
pkg-plist check.
* Other minor changes
it's done properly^Wbetter in makeparallel
* Script accepts new arguments:
-nodoccvs: skip cvs update of the doc tree
-trybroken: try to build BROKEN ports (off by default because the
i386 cluster is fast enough now that when doing incremental builds we
were spending most of the time rebuilding things we know are probably
going to fail anyway. Conversely, the other clusters are slow enough
that we also usually don't want to waste time on BROKEN ports).
-incremental: compare the interesting fields of the new INDEX with
the previous one, remove packages and log files for the old ports that
have changed, and rebuild the rest. This substantially cuts down on
build times since we don't rebuild ports that we know have not
changed. XXX checkpoint of work-in-progress, not yet working as
committed.
* When setting up the nodes, read in per-node config files
("portbuild.$(hostname)") before dispatching the setupnode script on
each node. For disconnected nodes (which don't mount the master via
NFS), we also rsync the interesting files required by the builds
(ports/src/doc trees, bindist tarballs, scripts) into place on the
client. They will be mounted locally via nullfs in the build chroots.
* Break out the restricted.sh generation into a makerestr script so it
can be called manually as needed.
* Remove the -nocvsup argument which has been unused for a long time.
* For now, don't prune the list of failed ports with prunefail,
since when -trybroken is not specified, every BROKEN port ends up in
the duds file (so the build is skipped), and as a result we would
prune almost everything from the list of failed ports. XXX
prunefailure should be run conditionally on -trybroken, or I should
find a way to prune in both cases.
* Don't run index in the background; it was thrashing against makeduds
and not saving any time by doing it concurrently.
* Build with 'make quickports all' to kick off the quickports builds
earlier.
* Delete restricted and/or cdrom distfiles *after* post-processing the
distfiles, otherwise the script doesn't remove any of them since
they're not in the expected place.
* Miscellaneous other minor changes and cleanups
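
The -incremental comparison could be sketched as below. It assumes the usual pipe-separated INDEX layout with the package name in the first field and the port directory in the second. Which fields count as "interesting" is not stated here, so the choice of package name plus the build and run dependency fields is an assumption, as are the function names.

```python
def parse_index(text, fields=(0, 7, 8)):
    """Map port directory -> tuple of the fields we compare."""
    entries = {}
    for line in text.splitlines():
        cols = line.split("|")
        if len(cols) < 2:
            continue                      # skip blank or malformed lines
        entries[cols[1]] = tuple(
            cols[i] if i < len(cols) else "" for i in fields
        )
    return entries


def changed_ports(old_text, new_text):
    """Ports that are new, or whose compared fields differ from the old INDEX."""
    old, new = parse_index(old_text), parse_index(new_text)
    return sorted(port for port, sig in new.items() if old.get(port) != sig)
```

Packages and logs for the returned ports would be removed so that only those ports are rebuilt.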
tells us whether the node has NFS access to the master.
* Also copy the bindist-$(hostname).tar file to allow local
customization of the build chroots (e.g. resolv.conf and make.conf
files for disconnected systems)
* For disconnected hosts, we don't copy the bindist files from the
master, but just set up the local directories and let the server rsync
them into place later. Also set up dangling symlinks to the bindist
files in the build area, which will be filled in by the server too (in
the NFS case it makes sense to cache the bindist files locally to
avoid extra NFS traffic, but here we know the file is local so a
symlink is fine)
* Remove an apparently spurious 'killall fetch' that snuck in for what
were probably transient reasons.
* Forcibly clean up old chroot directories since we are preparing to
start another build and don't want old (possibly orphaned) builds to
skew the job scheduling or use up resources.