nodes, but most regularly on sparc64. (Occasionally, on amd64 and ia64).
For reasons I haven't quite been able to track down, on some occasions
a pkg_add command is unable to extract a dependency; the tarfile shows
up as truncated. This does not seem to be due to low-disk or low-memory
conditions, nor is it a problem with scp; the MD5 of the file is fine
when examined afterwards.
The only clue so far is that it seems to happen on systems with the most
package builds running simultaneously -- and thus, possibly more than one
pkg_add running in parallel.
analyzed for how much they will slow this script down; consider this a
rush-job.)
- dirent denotes some change in the usage of dirent.h.
- termios denotes the deprecation of <sys/termios.h>.
- uname denotes the hiding of the uname symbol. This has been backed
out in src so let's hope this case can go away soon.
- utmp_x denotes the replacement of utmp.h with utmpx.h.
Together these catch ~150 new errors on i386-9. However, there are more
that are not caught (second-order effects).
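For illustration only, a classifier along these lines could tag the new
cases; the patterns and category names below are guesses, not the
expressions actually used by the script:

  # hypothetical sketch of tagging a failed build log with one of the new
  # error categories; the real patterns are more involved
  classify_log() {
      log=$1
      if grep -q 'sys/termios\.h.*deprecated' "${log}"; then
          echo termios
      elif grep -q 'utmp\.h' "${log}"; then
          echo utmp_x
      elif grep -q 'undefined reference to .*uname' "${log}"; then
          echo uname
      elif grep -q 'dirent' "${log}"; then
          echo dirent
      else
          echo '???'
      fi
  }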
available on the ftp mirrors alongside the packages [1]
- While I'm here, remove a NOOP check for FreeBSD 4.x -exp
PR: 135024 [1]
Requested by: Dominic Fandrey <kamikaze@bsdforen.de>
- bring this closer to the default FreeBSD page style
- remove unsupported releases
- remove the date stamps, which no longer work
- remove obsolete commented-out junk
Discussed on: portmgr, some time ago
list : lists available builds
clone : creates a new build by cloning a previous one
portsupdate : update a ports tree to the latest ZFS snapshot
srcupdate : update a src tree to the latest ZFS snapshot
cleanup : clean up or remove a build on the clients
destroy : remove a build on the server
There is some trickiness here in that various commands expect to run
either as root or as a ports-* user. For the latter
case we can easily use su to proxy as the ports user when running as
root; for the former we use the buildproxy to validate and re-execute
the command as root.
the ports-* users. Currently it is not possible to delegate
management of ZFS filesystems to non-root users, so root privilege
is required to manipulate them. We validate the command passed on
a local domain socket and re-execute the build script with the requested
parameters.
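As a rough sketch of the two directions (the paths, socket location,
and use of nc here are illustrative assumptions, not the actual
implementation):

  #!/bin/sh
  # illustrative only
  arch=$1; cmd=$2

  if [ "$(id -u)" = "0" ]; then
      # root -> ports user: su lets root run the command as the build user
      su -m "ports-${arch}" -c "build ${cmd} ${arch}"
  else
      # ports user -> root: hand the request to the buildproxy listening on
      # a local domain socket; it validates the request and re-executes the
      # build script as root with the requested parameters
      echo "${cmd} ${arch}" | nc -U /var/portbuild/buildproxy.sock
  fi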
ports and source trees. Since we have one or more consumers of these
trees that run frequently but do not insist on up-to-the-second trees,
it makes sense to "pre-update" them regularly and then re-use them in
all of the consumers, instead of potentially doing several updates
simultaneously or on demand. Consumers can clone the ZFS snapshot into
their local filesystem, which takes a couple of seconds instead of the
minutes or tens of minutes a CVS update takes.
We update to a date stamp instead of "." because this avoids
ambiguity of commits that happen while the tree update is in progress
(unfortunately it's slower).
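A minimal sketch of the idea; the dataset names, paths, and exact cvs
invocation are illustrative rather than what the cron job actually does:

  # updatesnap (cron): roll the shared ports tree forward to a fixed date
  # stamp and snapshot it
  now=$(date -u '+%Y-%m-%d %H:%M:%S UTC')
  stamp=$(date -u '+%Y%m%d%H%M')
  (cd /a/portbuild/ports && cvs -q update -Pd -D "${now}")
  zfs snapshot a/portbuild/ports@${stamp}

  # consumer: cloning the snapshot takes a couple of seconds, versus minutes
  # or tens of minutes for a fresh CVS update
  zfs clone a/portbuild/ports@${stamp} \
      a/portbuild/i386/8/builds/20090601/ports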
listed filesystem we take a new snapshot each time it is run and, if
the last full backup was not too long ago, do a compressed incremental
backup from the previous backup.
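Roughly, with illustrative dataset and file names (the real script also
decides when a fresh full backup is due, which is elided here):

  fs=a/portbuild
  prev=$(zfs list -H -t snapshot -o name | grep "^${fs}@backup-" | tail -1)
  snap=${fs}@backup-$(date -u +%Y%m%d)
  zfs snapshot "${snap}"
  if [ -n "${prev}" ]; then
      # compressed incremental stream relative to the previous backup snapshot
      zfs send -i "${prev}" "${snap}" | gzip > /backup/${snap#*@}.incr.gz
  else
      zfs send "${snap}" | gzip > /backup/${snap#*@}.full.gz
  fi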
* Catch up to build ID directory changes
* Support a meta-hostname of 'all' for setting up all clients at once.
This is better than the old way of running one copy of the script
for each client by hand, since it is easier and involves less
duplicated work.
* We copy in the per-build ports, src, and bindist .tbz files and .md5
checksums, and also refresh the build scripts and the
bindist-$(hostname).tar customization tarball.
* The -force switch forces copying of files and re-extraction of the
tarballs on the client. This is necessary in order to propagate
local changes to the tarballs after the initial client setup
(e.g. if you need to change a file in the ports tree, it must be
recompressed, redistributed, and re-extracted on the client).
* The -queue switch will poll the client's job queue after completion
of the setup. This is racy and should only be used when the machine
is not currently accepting jobs.
* For cleaning up a build, the 'build cleanup' command should now be
used instead. It calls back into this command but also allows full
cleanup of build-local files on the client.
TODO: "all" setups are hard on the server since they may spawn dozens
of rsyncs at once. A better solution would be to have a worker pool
of setup tasks to limit the maximum load.
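A simple version of that worker pool could be as little as the following
(illustrative only; setup_one_client stands in for the per-client portion
of this script, and the client names are hypothetical):

  # limit the load on the server to four concurrent client setups instead
  # of firing off one rsync per client all at once
  all_clients="gohan10 gohan11 gohan12 gohan13"
  echo ${all_clients} | xargs -n 1 -P 4 setup_one_client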
* Catch up to build ID directory changes
* Make it easier to kill a build by not running dopackages in the background
where it is detached from shell job control. Now, sending a termination
signal to this process (e.g. ^C) will also kill off the dopackages script
and in turn the processes created by it. Some background processes
spawned by dopackages, pdispatch, etc, may still remain and need to be
killed by hand.
* Catch up to build ID directory changes
* Improve usage()
* Fix a variety of small bugs
* Remove support for -ftp builds: we have not supported direct
uploading for many years due to the desire to manually inspect build
output for quality
* All data associated with a build is now localized in its own directory
named according to a build ID:
/var/portbuild/${arch}/${branch}/builds/${buildid}, where ${buildid}
is the creation time. These directories are actually ZFS filesystems.
* Tasks such as cloning a new build, updating a ZFS snapshot, and
cleaning up a build are exported to the "build" script, which can be
used independently.
* Creating a new build is done by ZFS cloning and takes a couple of
seconds since it is copy-on-write (i.e. no data needs to be copied).
* Ports and source trees are also cloned from pre-updated ZFS images
(updated regularly from the "updatesnap" cron job). In most cases
we do not care if we are building a ports tree that is an hour or so
old, since it will become outdated almost immediately anyway; no matter
what we do, there will be times when a port has already been fixed by
the time the build error is generated by a client.
* In case an up-to-the-second tree is desired, the -portscvs and
-srccvs switches update the existing ports tree via CVS.
* -noports and -nosrc can be used to prevent any automatic changes to
the ports tree. This is useful for dealing with local
modifications (e.g. for -exp builds), since the default when
creating a new build is to replace the previous trees with fresh,
pristine trees. If you forget to use these switches, any local changes
that are not also present in other trees will be lost.
* By default we keep two builds for each arch/branch pair. These
build IDs may also be referred to via "latest" and "previous"
symlinks. When creating a new build, the old "previous" build is
destroyed by default, unless it was originally created using the
-keep switch, which prevents it from being destroyed automatically.
* By default when a build finishes all of the clients are completely
cleaned up (i.e. all build data such as ports trees, tarballs,
client chroots, etc are deleted). This is needed to save space on
the clients. If you expect to *immediately* perform further builds
after this one completes, the -nocleanup switch prevents this step.
Otherwise they will just be set up again if further builds are
scheduled.
* Try to parallelize build pre-processing as much as possible, by
running jobs in the background wherever possible. In several places
we operate on the same parts of the filesystem from multiple jobs,
so we can make good use of caching to improve performance
* Clients no longer need to be set up explicitly at the start of the
build; they will be set up on demand when the first job is
dispatched to them. This allows fast clients or those that already
have been set up to begin building ports as soon as possible, while
slow clients are set up in the background. It also improves
robustness of client recovery, e.g. if the client was offline at the
time of build startup but later brought back online.
* Optimize copying back in the previous set of restricted packages by
hardlinking instead of copying.
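The hardlink optimization in the last item amounts to something like this
(the paths and the "restricted" directory name are illustrative):

  old=/var/portbuild/${arch}/${branch}/builds/previous
  new=/var/portbuild/${arch}/${branch}/builds/latest
  # both builds live on the same filesystem, so linking the restricted
  # packages instead of copying them is nearly free
  (cd ${old}/restricted && find . -type f | cpio -pdlmu ${new}/restricted)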
TODO: The record of failed ports is arch/branch-global still. This is
the only thing preventing us from running concurrent builds of the
same arch/branch (e.g. while one is stuck building openoffice, the
next build can start to keep the cluster busy). The difficulty is
that one build from a later ports tree may signal that a build was
successful, then a phase 2 build from an earlier ports tree may
indicate that it was broken. The solution is probably to migrate this
to a real database instead of a flat file, and query it for the set of
broken ports as of a certain ports tree date.
* Clients no longer mount ports/src trees via NFS (even the FreeBSD.org
local clients). This was putting too much load on the server and
slowing down builds.
* Instead, ports and src .tbz files are pushed to the clients and
unpacked; MD5 checksums are used to verify correctness (see the
sketch at the end of this entry).
* -force forces re-extraction of the tarballs even if they exist and
appear to be checked out.
* Also unpack the compressed bindist.
TODO: When we are not using md or ZFS builds it would be even faster
to keep an unpacked copy of the bindist on the scratch filesystem and
hardlink the files into the target directory
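The verification step mentioned above boils down to something like this
(file names illustrative):

  # on the client: refuse to unpack a tarball whose MD5 does not match the
  # .md5 file pushed alongside it
  if [ "$(md5 -q ports.tbz)" = "$(awk '{print $NF}' ports.tbz.md5)" ]; then
      tar xjpf ports.tbz
  else
      echo "ports.tbz checksum mismatch" >&2
      exit 1
  fi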
* Catch up to build ID directory changes
* Improved support for ZFS builds
* Improved robustness
* Report status verbosely to the caller: whether we succeeded in claiming
a chroot, whether the caller first needs to set up the client, or
whether a setup is in progress.
* If we discover that the client has not been set up, either because it
freshly booted and newfs'ed its filesystem or because it has not yet
encountered this particular build, atomically claim a cookie and
report this to the caller to act on.
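The atomic claim in the last item can be done with nothing more than
mkdir; a rough sketch, with a made-up cookie path and messages:

  buildid=$1
  # mkdir either creates the cookie or fails, atomically, so exactly one
  # claim-chroot invocation ends up telling its caller to start the setup
  if mkdir "/tmp/.setup-${buildid}" 2>/dev/null; then
      echo "setup needed"        # caller should initiate client setup
  else
      echo "setup in progress"   # another caller already claimed the cookie
  fi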
* Catch up to build ID directory changes
* Add helper functions for resolving a build ID symlink and
validating an arch/branch combination (centralize instead of doing it
in many scripts)
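Sketch of the helpers, assuming the directory layout described earlier
(the function names here are made up):

  # map "latest"/"previous" (or an explicit ID) to a concrete build ID
  resolve_buildid() {
      arch=$1; branch=$2; buildid=$3
      dir=/var/portbuild/${arch}/${branch}/builds/${buildid}
      if [ -L "${dir}" ]; then
          buildid=$(readlink "${dir}")
      fi
      [ -d "/var/portbuild/${arch}/${branch}/builds/${buildid}" ] && \
          echo "${buildid}"
  }

  validate_arch_branch() {
      [ -d "/var/portbuild/$1/$2" ]
  }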
* Catch up to build ID directory changes
* Add support for ssh_cmd and scp_cmd to allow using HPN-SSH with the
none cipher where possible (for performance)
* Lazy client setup: claim-chroot will report if the client still needs
to be set up for this buildid, and we initiate the setup and poll
until it is complete (see the sketch below). This allows fast clients
to begin building before slow ones have finished setting up.
TODO: a better solution would be to avoid trying to dispatch jobs onto
clients that are in the process of setting up, since they often have low
loads and are picked preferentially by the job scheduler.
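The setup poll looks roughly like this on the dispatch side (a sketch
only; claim_chroot and setup_client are placeholders for the real
scripts, not their actual names or interfaces):

  node=$1
  # if the client reports that it has never seen this build, kick off the
  # setup once (the claim is atomic on the client) and wait for it to finish
  if [ "$(claim_chroot ${node})" = "setup needed" ]; then
      setup_client ${node} &
  fi
  while [ "$(claim_chroot ${node})" = "setup in progress" ]; do
      sleep 15
  done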
* Remove vestiges of archaic support for building bindists from FTP
snapshots; we haven't used this for years and building a world is no
longer a challenge
* Revert the half-baked bindist generation number and make it per-buildid
instead. Compress the bindist and generate an MD5 checksum for
distribution to the clients.
TODO: Merge with makeworld?
checkmachines script. Polls build machines for their status, either
once-off or regularly as a daemon. Optionally it will update the
queue entries, but this remains subject to race conditions.
TODO: Integrate with queue manager and forward machine status changes
to it
targets.
* Use /rescue/sh for index builds instead of /bin/sh, when it exists.
The former is statically linked and faster to execute, which becomes
significant when executing it tens of thousands of times. This
trick can be used with other recursive targets by passing in
__MAKE_SHELL.
* Get rid of make variable assignments that use != command invocations
in the critical path, using several methods:
- rewriting logic to use shell or make builtins instead of external command executions
- macroizing commands and executing them in the targets where they
are needed instead of with every invocation of make
- precomputing the results of invariant commands in
bsd.port.subdir.mk and passing them in explicitly to child makes,
and using this to avoid recalculation in all the children. NB: the
commands are still run one per top-level subdirectory but this
does not currently seem to be a major issue. They could be moved
further up into the top-level Makefile at the cost of some
cleanliness.
- Committers are strongly discouraged from adding further "bare" !=
assignments to the ports tree, even in their own ports. One of
the above strategies should be used to avoid future bloat.
* Rewrite the core 'describe' target to work entirely within a single
shell process using only builtin commands. The old version is
retained as a backup for use on systems older than 603104, which do
not have the make :u modifier. This cuts down the number of
processes executed during the course of a 'make index' by an order
of magnitude, and we are now essentially amortized down to the
minimum of a single make + sh instance per port, plus whatever
commands the port makefile itself executes (which are usually
unnecessary and bogus).
* Less validation of the WWW: target is performed; this can instead be
policed at the port level by portlint. Specifically we look at the
second word of the first line beginning with "WWW:" in pkg-descr,
and append "http://" to it unless it already begins with "http://",
"https://" or "ftp://" (see the sketch after this list). Thanks to
dougb for the idea of how to extract WWW: using shell builtins.
* Use the "true" shell builtin instead of echo > /dev/null for a
measurable decrease in CPU use.
* Add a note about dubious escaping strategy in bsd.port.subdir.mk
* Minor change in output of 'make describe': it no longer strips
trailing CR characters from pkg-descr files with MSDOS CR/LF
termination. Instead, the makeindex perl script that post-processes
the 'make describe' output into the INDEX is tweaked to strip them
on input.
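The WWW: extraction described above can be done with builtins alone,
roughly like this (a sketch of the idea rather than the committed code):

  www=""
  # take the second word of the first line beginning with "WWW:"
  while read -r key word rest; do
      case ${key} in
      WWW:) www=${word}; break ;;
      esac
  done < pkg-descr
  # prepend http:// unless a scheme is already present
  case ${www} in
  ""|http://*|https://*|ftp://*) ;;
  *) www="http://${www}" ;;
  esac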
The bottom line is that on my test hardware INDEX builds are now
faster by more than a factor of 2, with system time reduced by a
factor of 4-8 depending on configuration.
Because the $FreeBSD$ keyword isn't expanded in the new version, we can't
just do a diff, check the return value and ignore the output.
Every new modules file, whether or not its contents changed, produces
at least four lines of diff output (line number, old line, separator,
new line). Only commit it if the diff is more than four lines.
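In shell terms the check is equivalent to the following (the actual
script is Perl; file names illustrative):

  # the unexpanded $FreeBSD$ line always differs, producing exactly four
  # lines of diff output even when nothing else changed, so only commit
  # when the diff is larger than that
  if [ "$(diff modules.old modules.new | wc -l)" -gt 4 ]; then
      cvs commit -m "regenerate modules" modules
  fi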
- added -c, doesn't change anything
- added -n, to deal with an already checked-out tree
- removed negative logic in favor of 'unless'
- switch to the 3-arg form of open()
- don't use globs for filehandles; this has been obsolete
since at least 5.6.1
- handle possible errors in close()
- allow CVSROOT to be overridden in the environment
PR: ports/125025
Submitted by: "Philip M. Gollucci" <pgollucci@p6m7g8.com>
* Remove 5.x support
* Leave the archaic ftp snapshot support for now; it is not hurting anything
but will not work
* Be more careful when removing files (use absolute paths)
* Switch to bindist/tmp for the tmp dir
* Fix the recording of the bindist.tar generation number
* Get rid of redundant or useless processing of the world image
* Record the CVS update stamp in some extra places and make sure to remove it
if the build is started with -noportscvs (since this probably means the
ports tree was updated by hand at some random time)
invocations). It also fixes some edge cases that were not handled in
the previous version.
TODO: Correctly report IPv6 sockets (already in use by the sparc64 build)
ordering, which had become too limited.
We now build the packages that are part of the longest dependency
chains first. This has the effect of building the deepest parts of the
tree first and levelling out the tree height, hopefully avoiding the
situation we currently face where bottlenecks appear late in the build
and the cluster sits mostly idle while waiting for a few long
dependency chains to finish building before it can become fully loaded
again.
The algorithm is that we sort the list of remaining packages according
to height (longest dependency chain), then add leaf packages from each
in order until we have filled a queue of length between 100 and 200,
to amortise the cost of this queue rebalancing while not losing the
height averaging property. Jobs are dispatched from this queue into
worker threads as machine slots become available.
Unlike the make-based solution, which required a fixed -j concurrency
value and could not respond to the addition or removal of build
resources, we can now dynamically add new machines to the queue as
they become available.
The other advantage of using python is that we have more
customisability and visibility into the build status, e.g. we
periodically report the number of remaining packages, as well as the
list of deepest packages that we are working on.
TODO:
* Implement mtime checking for parent package staleness, so that
parents are rebuilt if their dependencies are touched more recently.
Currently packages will not be rebuilt if they exist, whether or not
they are "stale" wrt their dependencies.
* Offload the machine selection into an external queue manager.
Currently the queue manager used here doesn't interoperate with the
old one (getmachine/releasemachine) because it's not possible to use
the lockf()-based mutual exclusion within a multithreaded client.
Doing that will also allow for a more flexible job placement
algorithm as well as finer queue customization.