doc: add CUDA contributing section and document passthru test attributes

Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>
Connor Baker 2025-06-06 16:27:55 +00:00
parent b71d7d1820
commit 91e91bc596
2 changed files with 181 additions and 113 deletions


@@ -65,97 +65,6 @@ for your specific card(s).
Library maintainers should consult [NVCC Docs](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/)
and release notes for their software package.
## Adding a new CUDA release {#adding-a-new-cuda-release}
> **WARNING**
>
> This section of the docs is still very much in progress. Feedback is welcome in GitHub Issues tagging @NixOS/cuda-maintainers or on [Matrix](https://matrix.to/#/#cuda:nixos.org).
The CUDA Toolkit is a suite of CUDA libraries and software meant to provide a development environment for CUDA-accelerated applications. Until the release of CUDA 11.4, NVIDIA had only made the CUDA Toolkit available as a multi-gigabyte runfile installer, which we provide through the [`cudaPackages.cudatoolkit`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.cudatoolkit) attribute. From CUDA 11.4 and onwards, NVIDIA has also provided CUDA redistributables (“CUDA-redist”): individually packaged CUDA Toolkit components meant to facilitate redistribution and inclusion in downstream projects. These packages are available in the [`cudaPackages`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages) package set.
All new projects should use the CUDA redistributables available in [`cudaPackages`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages) in place of [`cudaPackages.cudatoolkit`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.cudatoolkit), as they are much easier to maintain and update.
### Updating CUDA redistributables {#updating-cuda-redistributables}
1. Go to NVIDIA's index of CUDA redistributables: <https://developer.download.nvidia.com/compute/cuda/redist/>
2. Make a note of the new version of CUDA available.
3. Run
```bash
nix run github:connorbaker/cuda-redist-find-features -- \
download-manifests \
--log-level DEBUG \
--version <newest CUDA version> \
https://developer.download.nvidia.com/compute/cuda/redist \
./pkgs/development/cuda-modules/cuda/manifests
```
This will download a copy of the manifest for the new version of CUDA.
4. Run
```bash
nix run github:connorbaker/cuda-redist-find-features -- \
process-manifests \
--log-level DEBUG \
--version <newest CUDA version> \
https://developer.download.nvidia.com/compute/cuda/redist \
./pkgs/development/cuda-modules/cuda/manifests
```
This will generate a `redistrib_features_<newest CUDA version>.json` file in the same directory as the manifest.
5. Update the `cudaVersionMap` attribute set in `pkgs/development/cuda-modules/cuda/extension.nix`.
### Updating cuTensor {#updating-cutensor}
1. Repeat the steps present in [Updating CUDA redistributables](#updating-cuda-redistributables) with the following changes:
- Use the index of cuTensor redistributables: <https://developer.download.nvidia.com/compute/cutensor/redist>
- Use the newest version of cuTensor available instead of the newest version of CUDA.
- Use `pkgs/development/cuda-modules/cutensor/manifests` instead of `pkgs/development/cuda-modules/cuda/manifests`.
- Skip the step of updating `cudaVersionMap` in `pkgs/development/cuda-modules/cuda/extension.nix`.
### Updating supported compilers and GPUs {#updating-supported-compilers-and-gpus}
1. Update `nvccCompatibilities` in `pkgs/development/cuda-modules/_cuda/db/bootstrap/nvcc.nix` to include the newest release of NVCC, as well as any newly supported host compilers.
2. Update `cudaCapabilityToInfo` in `pkgs/development/cuda-modules/_cuda/db/bootstrap/cuda.nix` to include any new GPUs supported by the new release of CUDA.
### Updating the CUDA Toolkit runfile installer {#updating-the-cuda-toolkit}
> **WARNING**
>
> While the CUDA Toolkit runfile installer is still available in Nixpkgs as the [`cudaPackages.cudatoolkit`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.cudatoolkit) attribute, its use is not recommended and should it be considered deprecated. Please migrate to the CUDA redistributables provided by the [`cudaPackages`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages) package set.
>
> To ensure packages relying on the CUDA Toolkit runfile installer continue to build, it will continue to be updated until a migration path is available.
1. Go to NVIDIA's CUDA Toolkit runfile installer download page: <https://developer.nvidia.com/cuda-downloads>
2. Select the appropriate OS, architecture, distribution, and version, and installer type.
- For example: Linux, x86_64, Ubuntu, 22.04, runfile (local)
- NOTE: Typically, we use the Ubuntu runfile. It is unclear if the runfile for other distributions will work.
3. Take the link provided by the installer instructions on the webpage after selecting the installer type and get its hash by running:
```bash
nix store prefetch-file --hash-type sha256 <link>
```
4. Update `pkgs/development/cuda-modules/cudatoolkit/releases.nix` to include the release.
### Updating the CUDA package set {#updating-the-cuda-package-set}
1. Include a new `cudaPackages_<major>_<minor>` package set in `pkgs/top-level/all-packages.nix`.
- NOTE: Changing the default CUDA package set should occur in a separate PR, allowing time for additional testing.
2. Successfully build the closure of the new package set, updating `pkgs/development/cuda-modules/cuda/overrides.nix` as needed. Below are some common failures:
| Unable to ... | During ... | Reason | Solution | Note |
| --- | --- | --- | --- | --- |
| Find headers | `configurePhase` or `buildPhase` | Missing dependency on a `dev` output | Add the missing dependency | The `dev` output typically contain the headers |
| Find libraries | `configurePhase` | Missing dependency on a `dev` output | Add the missing dependency | The `dev` output typically contain CMake configuration files |
| Find libraries | `buildPhase` or `patchelf` | Missing dependency on a `lib` or `static` output | Add the missing dependency | The `lib` or `static` output typically contain the libraries |
In the scenario you are unable to run the resulting binary: this is arguably the most complicated as it could be any combination of the previous reasons. This type of failure typically occurs when a library attempts to load or open a library it depends on that it does not declare in its `DT_NEEDED` section. As a first step, ensure that dependencies are patched with [`autoAddDriverRunpath`](https://search.nixos.org/packages?channel=unstable&type=packages&query=autoAddDriverRunpath). Failing that, try running the application with [`nixGL`](https://github.com/guibou/nixGL) or a similar wrapper tool. If that works, it likely means that the application is attempting to load a library that is not in the `RPATH` or `RUNPATH` of the binary.
## Running Docker or Podman containers with CUDA support {#cuda-docker-podman}
It is possible to run Docker or Podman containers with CUDA support. The recommended mechanism to perform this task is to use the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html).
@@ -201,7 +110,7 @@ $ nix run nixpkgs#jq -- -r '.devices[].name' < /var/run/cdi/nvidia-container-too
all
```
### Specifying what devices to expose to the container {#specifying-what-devices-to-expose-to-the-container}
### Specifying what devices to expose to the container {#cuda-specifying-what-devices-to-expose-to-the-container}
You can choose which devices are exposed to your containers by using the identifiers from the generated CDI specification, as follows:
@@ -222,7 +131,7 @@ GPU 1: NVIDIA GeForce RTX 2080 SUPER (UUID: <REDACTED>)
By default, the NVIDIA Container Toolkit uses the GPU index to identify specific devices. You can change how devices are identified via the `hardware.nvidia-container-toolkit.device-name-strategy` NixOS option.
:::
### Using docker-compose {#using-docker-compose}
### Using docker-compose {#cuda-using-docker-compose}
It's possible to expose GPUs to a `docker-compose` environment as well, with a `docker-compose.yaml` file like the following:
@@ -256,3 +165,142 @@ services:
- nvidia.com/gpu=0
- nvidia.com/gpu=1
```
## Contributing {#cuda-contributing}
::: {.warning}
This section of the docs is still very much in progress. Feedback is welcome in GitHub Issues (tag @NixOS/cuda-maintainers) or on [Matrix](https://matrix.to/#/#cuda:nixos.org).
:::
### Package set maintenance {#cuda-package-set-maintenance}
The CUDA Toolkit is a suite of CUDA libraries and software meant to provide a development environment for CUDA-accelerated applications. Until the release of CUDA 11.4, NVIDIA had only made the CUDA Toolkit available as a multi-gigabyte runfile installer, which we provide through the [`cudaPackages.cudatoolkit`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.cudatoolkit) attribute. From CUDA 11.4 and onwards, NVIDIA has also provided CUDA redistributables (“CUDA-redist”): individually packaged CUDA Toolkit components meant to facilitate redistribution and inclusion in downstream projects. These packages are available in the [`cudaPackages`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages) package set.
All new projects should use the CUDA redistributables available in [`cudaPackages`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages) in place of [`cudaPackages.cudatoolkit`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.cudatoolkit), as they are much easier to maintain and update.
#### Updating redistributables {#cuda-updating-redistributables}
1. Go to NVIDIA's index of CUDA redistributables: <https://developer.download.nvidia.com/compute/cuda/redist/>
2. Make a note of the new version of CUDA available.
3. Run
```bash
nix run github:connorbaker/cuda-redist-find-features -- \
download-manifests \
--log-level DEBUG \
--version <newest CUDA version> \
https://developer.download.nvidia.com/compute/cuda/redist \
./pkgs/development/cuda-modules/cuda/manifests
```
This will download a copy of the manifest for the new version of CUDA.
4. Run
```bash
nix run github:connorbaker/cuda-redist-find-features -- \
process-manifests \
--log-level DEBUG \
--version <newest CUDA version> \
https://developer.download.nvidia.com/compute/cuda/redist \
./pkgs/development/cuda-modules/cuda/manifests
```
This will generate a `redistrib_features_<newest CUDA version>.json` file in the same directory as the manifest.
5. Update the `cudaVersionMap` attribute set in `pkgs/development/cuda-modules/cuda/extension.nix`.
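To make the final step concrete, here is a minimal sketch of the kind of entry added to `cudaVersionMap`, assuming the set simply maps a CUDA version to a manifest version; the actual schema in `extension.nix` may differ:
```nix
# Hypothetical sketch only; consult extension.nix for the real schema.
{
  cudaVersionMap = {
    # Existing entries map a CUDA version to the manifest version to use.
    "12.3" = "12.3.2";
    # Entry for the newly downloaded manifests (versions are illustrative).
    "12.4" = "12.4.1";
  };
}
```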
#### Updating cuTensor {#cuda-updating-cutensor}
1. Repeat the steps in [Updating redistributables](#cuda-updating-redistributables) with the following changes:
- Use the index of cuTensor redistributables: <https://developer.download.nvidia.com/compute/cutensor/redist>
- Use the newest version of cuTensor available instead of the newest version of CUDA.
- Use `pkgs/development/cuda-modules/cutensor/manifests` instead of `pkgs/development/cuda-modules/cuda/manifests`.
- Skip the step of updating `cudaVersionMap` in `pkgs/development/cuda-modules/cuda/extension.nix`.
#### Updating supported compilers and GPUs {#cuda-updating-supported-compilers-and-gpus}
1. Update `nvccCompatibilities` in `pkgs/development/cuda-modules/_cuda/data/nvcc.nix` to include the newest release of NVCC, as well as any newly supported host compilers.
2. Update `cudaCapabilityToInfo` in `pkgs/development/cuda-modules/_cuda/data/cuda.nix` to include any new GPUs supported by the new release of CUDA.
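As a rough illustration of the shape of these updates, consider the following hedged sketch; every field name below is an assumption, not the actual schema of the files in `pkgs/development/cuda-modules/_cuda/data/`:
```nix
# Hypothetical sketch only; field names are illustrative assumptions.
{
  nvccCompatibilities = {
    # A new NVCC release together with the host-compiler ranges its
    # release notes document as supported.
    "12.4" = {
      gccMaxMajorVersion = "13";
      clangMaxMajorVersion = "17";
    };
  };
  cudaCapabilityToInfo = {
    # A GPU capability newly supported by this CUDA release.
    "9.0" = {
      archName = "Hopper";
      isJetsonDevice = false;
    };
  };
}
```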
#### Updating the CUDA Toolkit runfile installer {#cuda-updating-the-cuda-toolkit}
::: {.warning}
While the CUDA Toolkit runfile installer is still available in Nixpkgs as the [`cudaPackages.cudatoolkit`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.cudatoolkit) attribute, its use is not recommended, and it should be considered deprecated. Please migrate to the CUDA redistributables provided by the [`cudaPackages`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages) package set.
To ensure packages relying on the CUDA Toolkit runfile installer continue to build, it will continue to be updated until a migration path is available.
:::
1. Go to NVIDIA's CUDA Toolkit runfile installer download page: <https://developer.nvidia.com/cuda-downloads>
2. Select the appropriate OS, architecture, distribution, version, and installer type.
- For example: Linux, x86_64, Ubuntu, 22.04, runfile (local)
- NOTE: Typically, we use the Ubuntu runfile. It is unclear if the runfile for other distributions will work.
3. Take the link provided by the installer instructions on the webpage after selecting the installer type and get its hash by running:
```bash
nix store prefetch-file --hash-type sha256 <link>
```
4. Update `pkgs/development/cuda-modules/cudatoolkit/releases.nix` to include the release.
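For orientation, a release entry might look like the following hedged sketch; the URL pattern and field names are assumptions, and the hash placeholder stands for the value printed by `nix store prefetch-file`:
```nix
# Hypothetical sketch only; check releases.nix for the actual schema.
{
  "12.4" = {
    version = "12.4.1";
    url = "https://developer.download.nvidia.com/compute/cuda/<version>/local_installers/<runfile name>";
    sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="; # placeholder
  };
}
```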
#### Updating the CUDA package set {#cuda-updating-the-cuda-package-set}
1. Include a new `cudaPackages_<major>_<minor>` package set in `pkgs/top-level/all-packages.nix`.
- NOTE: Changing the default CUDA package set should occur in a separate PR, allowing time for additional testing.
2. Successfully build the closure of the new package set, updating `pkgs/development/cuda-modules/cuda/overrides.nix` as needed. Below are some common failures:
| Unable to ... | During ... | Reason | Solution | Note |
| -------------- | -------------------------------- | ------------------------------------------------ | -------------------------- | ------------------------------------------------------------ |
| Find headers | `configurePhase` or `buildPhase` | Missing dependency on a `dev` output | Add the missing dependency | The `dev` output typically contains the headers |
| Find libraries | `configurePhase` | Missing dependency on a `dev` output | Add the missing dependency | The `dev` output typically contains CMake configuration files |
| Find libraries | `buildPhase` or `patchelf` | Missing dependency on a `lib` or `static` output | Add the missing dependency | The `lib` or `static` output typically contains the libraries |
Failure to run the resulting binary is typically the most challenging to diagnose, as it may involve a combination of the aforementioned issues. This type of failure typically occurs when a library attempts to load or open a library it depends on that it does not declare in its `DT_NEEDED` section. Try the following debugging steps:
1. First ensure that dependencies are patched with [`autoAddDriverRunpath`](https://search.nixos.org/packages?channel=unstable&type=packages&query=autoAddDriverRunpath).
2. Failing that, try running the application with [`nixGL`](https://github.com/guibou/nixGL) or a similar wrapper tool.
3. If that works, it likely means that the application is attempting to load a library that is not in the `RPATH` or `RUNPATH` of the binary.
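For the first step, wiring in the hook is usually a one-line change to the package expression. A minimal sketch, assuming a hypothetical package `my-cuda-app`:
```nix
# Minimal sketch; `my-cuda-app` and its source details are hypothetical.
{ stdenv, fetchFromGitHub, autoAddDriverRunpath }:
stdenv.mkDerivation {
  pname = "my-cuda-app";
  version = "1.0";
  src = fetchFromGitHub {
    owner = "example";
    repo = "my-cuda-app";
    rev = "v1.0";
    hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="; # placeholder
  };
  # The hook runs during fixup and adds the driver's library path to the
  # runpath of ELF files in the outputs, so libcuda.so is found at runtime.
  nativeBuildInputs = [ autoAddDriverRunpath ];
}
```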
### Writing tests {#cuda-writing-tests}
::: {.caution}
The existence of `passthru.testers` and `passthru.tests` should be considered an implementation detail: they are not meant to be a public or stable interface.
:::
In general, there are two attribute sets in `passthru` used to build and run tests for CUDA packages: `passthru.testers` and `passthru.tests`. Each may contain a nested attribute set named `cuda`, which holds CUDA-specific derivations. The `cuda` attribute set separates CUDA-specific derivations from those which support multiple implementations (e.g., OpenCL or ROCm) or have different licenses. For an example of such generic derivations, see the `magma` package.
::: {.note}
Derivations are nested under the `cuda` attribute due to an OfBorg quirk: if evaluation of an attribute fails (e.g., because of an unfree license), the entire enclosing attribute set is discarded, preventing the other attributes in the set from being discovered, evaluated, or built. Nesting the CUDA-specific derivations contains such failures.
:::
#### `passthru.testers` {#cuda-passthru-testers}
Attributes added to `passthru.testers` are derivations that produce an executable which runs a test. The produced executable should:
- Take care to set up the environment, make temporary directories, and so on.
- Be registered as the derivation's `meta.mainProgram` so that it can be run directly.
::: {.note}
Testers which always require CUDA should be placed in `passthru.testers.cuda`, while those which are generic should be placed in `passthru.testers`.
:::
The `passthru.testers` attribute set allows running tests outside the Nix sandbox (see the sketch after this list). This is useful because such a test:
- Can be run on non-NixOS systems, when wrapped with utilities like `nixGL` or `nix-gl-host`.
- Has network access patterns which are difficult or impossible to sandbox.
- Is free to produce output which is not deterministic, such as timing information.
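A minimal sketch of such a tester, assuming a hypothetical package `saxpy` that installs an executable of the same name:
```nix
# Hedged sketch; `saxpy` and `saxpy-tester` are illustrative names.
{ writeShellApplication, saxpy }:
writeShellApplication {
  name = "saxpy-tester";
  runtimeInputs = [ saxpy ];
  text = ''
    # Set up an isolated scratch directory before running the test.
    tmpdir="$(mktemp -d)"
    trap 'rm -rf "$tmpdir"' EXIT
    cd "$tmpdir"
    saxpy
  '';
}
```
Because the installed script matches the derivation name, the tester can be started directly, e.g. with `nix run` on the corresponding attribute.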
#### `passthru.tests` {#cuda-passthru-tests}
Attributes added to `passthru.tests` are derivations that run tests inside the Nix sandbox. Tests should:
- Use the executables produced by `passthru.testers`, where possible, to avoid duplication of test logic.
- Include `requiredSystemFeatures = [ "cuda" ];`, possibly conditioned on the value of `cudaSupport` if they are generic, to ensure that they are only run on systems exposing a CUDA-capable GPU.
::: {.note}
Tests which always require CUDA should be placed in `passthru.tests.cuda`, while those which are generic should be placed in `passthru.tests`.
:::
This is useful for tests that are deterministic (e.g., checking exit codes) and that can be provided with all necessary resources inside the sandbox.
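A minimal in-sandbox sketch reusing the tester from the previous subsection (`saxpy-tester` is the hypothetical executable built there):
```nix
# Hedged sketch; names are illustrative.
{ runCommand, saxpy-tester }:
runCommand "saxpy-test"
  {
    nativeBuildInputs = [ saxpy-tester ];
    # Only build on machines that expose a CUDA-capable GPU.
    requiredSystemFeatures = [ "cuda" ];
  }
  ''
    saxpy-tester
    touch $out
  ''
```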


@@ -2756,33 +2756,53 @@
"cuda": [
"index.html#cuda"
],
"adding-a-new-cuda-release": [
"index.html#adding-a-new-cuda-release"
],
"updating-cuda-redistributables": [
"index.html#updating-cuda-redistributables"
],
"updating-cutensor": [
"index.html#updating-cutensor"
],
"updating-supported-compilers-and-gpus": [
"index.html#updating-supported-compilers-and-gpus"
],
"updating-the-cuda-toolkit": [
"index.html#updating-the-cuda-toolkit"
],
"updating-the-cuda-package-set": [
"index.html#updating-the-cuda-package-set"
"cuda-contributing": [
"index.html#cuda-contributing"
],
"cuda-docker-podman": [
"index.html#cuda-docker-podman"
],
"specifying-what-devices-to-expose-to-the-container": [
"cuda-package-set-maintenance": [
"index.html#cuda-package-set-maintenance",
"index.html#adding-a-new-cuda-release"
],
"cuda-passthru-testers": [
"index.html#cuda-passthru-testers"
],
"cuda-passthru-tests": [
"index.html#cuda-passthru-tests"
],
"cuda-specifying-what-devices-to-expose-to-the-container": [
"index.html#cuda-specifying-what-devices-to-expose-to-the-container",
"index.html#specifying-what-devices-to-expose-to-the-container"
],
"using-docker-compose": [
"cuda-updating-redistributables": [
"index.html#cuda-updating-redistributables",
"index.html#updating-cuda-redistributables"
],
"cuda-updating-cutensor": [
"index.html#cuda-updating-cutensor",
"index.html#updating-cutensor"
],
"cuda-updating-supported-compilers-and-gpus": [
"index.html#cuda-updating-supported-compilers-and-gpus",
"index.html#updating-supported-compilers-and-gpus"
],
"cuda-updating-the-cuda-package-set": [
"index.html#cuda-updating-the-cuda-package-set",
"index.html#updating-the-cuda-package-set"
],
"cuda-updating-the-cuda-toolkit": [
"index.html#cuda-updating-the-cuda-toolkit",
"index.html#updating-the-cuda-toolkit"
],
"cuda-using-docker-compose": [
"index.html#cuda-using-docker-compose",
"index.html#using-docker-compose"
],
"cuda-writing-tests": [
"index.html#cuda-writing-tests"
],
"cuelang": [
"index.html#cuelang"
],