doc: add CUDA contributing section and document passthru test attributes

Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>
Connor Baker 2025-06-06 16:27:55 +00:00
parent b71d7d1820
commit 91e91bc596
2 changed files with 181 additions and 113 deletions


@@ -65,97 +65,6 @@ for your specific card(s).
Library maintainers should consult [NVCC Docs](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/)
and release notes for their software package.
## Adding a new CUDA release {#adding-a-new-cuda-release}
> **WARNING**
>
> This section of the docs is still very much in progress. Feedback is welcome in GitHub Issues tagging @NixOS/cuda-maintainers or on [Matrix](https://matrix.to/#/#cuda:nixos.org).
The CUDA Toolkit is a suite of CUDA libraries and software meant to provide a development environment for CUDA-accelerated applications. Until the release of CUDA 11.4, NVIDIA had only made the CUDA Toolkit available as a multi-gigabyte runfile installer, which we provide through the [`cudaPackages.cudatoolkit`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.cudatoolkit) attribute. From CUDA 11.4 and onwards, NVIDIA has also provided CUDA redistributables (“CUDA-redist”): individually packaged CUDA Toolkit components meant to facilitate redistribution and inclusion in downstream projects. These packages are available in the [`cudaPackages`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages) package set.
All new projects should use the CUDA redistributables available in [`cudaPackages`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages) in place of [`cudaPackages.cudatoolkit`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.cudatoolkit), as they are much easier to maintain and update.
### Updating CUDA redistributables {#updating-cuda-redistributables}
1. Go to NVIDIA's index of CUDA redistributables: <https://developer.download.nvidia.com/compute/cuda/redist/>
2. Make a note of the new version of CUDA available.
3. Run
```bash
nix run github:connorbaker/cuda-redist-find-features -- \
download-manifests \
--log-level DEBUG \
--version <newest CUDA version> \
https://developer.download.nvidia.com/compute/cuda/redist \
./pkgs/development/cuda-modules/cuda/manifests
```
This will download a copy of the manifest for the new version of CUDA.
4. Run
```bash
nix run github:connorbaker/cuda-redist-find-features -- \
process-manifests \
--log-level DEBUG \
--version <newest CUDA version> \
https://developer.download.nvidia.com/compute/cuda/redist \
./pkgs/development/cuda-modules/cuda/manifests
```
This will generate a `redistrib_features_<newest CUDA version>.json` file in the same directory as the manifest.
5. Update the `cudaVersionMap` attribute set in `pkgs/development/cuda-modules/cuda/extension.nix`.
### Updating cuTensor {#updating-cutensor}
1. Repeat the steps present in [Updating CUDA redistributables](#updating-cuda-redistributables) with the following changes:
- Use the index of cuTensor redistributables: <https://developer.download.nvidia.com/compute/cutensor/redist>
- Use the newest version of cuTensor available instead of the newest version of CUDA.
- Use `pkgs/development/cuda-modules/cutensor/manifests` instead of `pkgs/development/cuda-modules/cuda/manifests`.
- Skip the step of updating `cudaVersionMap` in `pkgs/development/cuda-modules/cuda/extension.nix`.
### Updating supported compilers and GPUs {#updating-supported-compilers-and-gpus}
1. Update `nvccCompatibilities` in `pkgs/development/cuda-modules/_cuda/db/bootstrap/nvcc.nix` to include the newest release of NVCC, as well as any newly supported host compilers.
2. Update `cudaCapabilityToInfo` in `pkgs/development/cuda-modules/_cuda/db/bootstrap/cuda.nix` to include any new GPUs supported by the new release of CUDA.
### Updating the CUDA Toolkit runfile installer {#updating-the-cuda-toolkit}
> **WARNING**
>
> While the CUDA Toolkit runfile installer is still available in Nixpkgs as the [`cudaPackages.cudatoolkit`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.cudatoolkit) attribute, its use is not recommended and should it be considered deprecated. Please migrate to the CUDA redistributables provided by the [`cudaPackages`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages) package set.
>
> To ensure packages relying on the CUDA Toolkit runfile installer continue to build, it will continue to be updated until a migration path is available.
1. Go to NVIDIA's CUDA Toolkit runfile installer download page: <https://developer.nvidia.com/cuda-downloads>
2. Select the appropriate OS, architecture, distribution, and version, and installer type.
- For example: Linux, x86_64, Ubuntu, 22.04, runfile (local)
- NOTE: Typically, we use the Ubuntu runfile. It is unclear if the runfile for other distributions will work.
3. Take the link provided by the installer instructions on the webpage after selecting the installer type and get its hash by running:
```bash
nix store prefetch-file --hash-type sha256 <link>
```
4. Update `pkgs/development/cuda-modules/cudatoolkit/releases.nix` to include the release.
### Updating the CUDA package set {#updating-the-cuda-package-set}
1. Include a new `cudaPackages_<major>_<minor>` package set in `pkgs/top-level/all-packages.nix`.
- NOTE: Changing the default CUDA package set should occur in a separate PR, allowing time for additional testing.
2. Successfully build the closure of the new package set, updating `pkgs/development/cuda-modules/cuda/overrides.nix` as needed. Below are some common failures:
| Unable to ... | During ... | Reason | Solution | Note |
| --- | --- | --- | --- | --- |
| Find headers | `configurePhase` or `buildPhase` | Missing dependency on a `dev` output | Add the missing dependency | The `dev` output typically contain the headers |
| Find libraries | `configurePhase` | Missing dependency on a `dev` output | Add the missing dependency | The `dev` output typically contain CMake configuration files |
| Find libraries | `buildPhase` or `patchelf` | Missing dependency on a `lib` or `static` output | Add the missing dependency | The `lib` or `static` output typically contain the libraries |
In the scenario you are unable to run the resulting binary: this is arguably the most complicated as it could be any combination of the previous reasons. This type of failure typically occurs when a library attempts to load or open a library it depends on that it does not declare in its `DT_NEEDED` section. As a first step, ensure that dependencies are patched with [`autoAddDriverRunpath`](https://search.nixos.org/packages?channel=unstable&type=packages&query=autoAddDriverRunpath). Failing that, try running the application with [`nixGL`](https://github.com/guibou/nixGL) or a similar wrapper tool. If that works, it likely means that the application is attempting to load a library that is not in the `RPATH` or `RUNPATH` of the binary.
## Running Docker or Podman containers with CUDA support {#cuda-docker-podman}
It is possible to run Docker or Podman containers with CUDA support. The recommended mechanism to perform this task is to use the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html).
@@ -201,7 +110,7 @@ $ nix run nixpkgs#jq -- -r '.devices[].name' < /var/run/cdi/nvidia-container-too
all
```
### Specifying what devices to expose to the container {#specifying-what-devices-to-expose-to-the-container}
### Specifying what devices to expose to the container {#cuda-specifying-what-devices-to-expose-to-the-container}
You can choose which devices are exposed to your containers by using the identifiers from the generated CDI specification, as follows:
@@ -222,7 +131,7 @@ GPU 1: NVIDIA GeForce RTX 2080 SUPER (UUID: <REDACTED>)
By default, the NVIDIA Container Toolkit uses the GPU index to identify specific devices. You can change how devices are identified via the `hardware.nvidia-container-toolkit.device-name-strategy` NixOS option.
:::
### Using docker-compose {#using-docker-compose}
### Using docker-compose {#cuda-using-docker-compose}
It's possible to expose GPUs to a `docker-compose` environment as well, with a `docker-compose.yaml` file like the following:
@@ -256,3 +165,142 @@ services:
- nvidia.com/gpu=0
- nvidia.com/gpu=1
```
## Contributing {#cuda-contributing}
::: {.warning}
This section of the docs is still very much in progress. Feedback is welcome in GitHub Issues (tag @NixOS/cuda-maintainers) or on [Matrix](https://matrix.to/#/#cuda:nixos.org).
:::
### Package set maintenance {#cuda-package-set-maintenance}
The CUDA Toolkit is a suite of CUDA libraries and software meant to provide a development environment for CUDA-accelerated applications. Until the release of CUDA 11.4, NVIDIA had only made the CUDA Toolkit available as a multi-gigabyte runfile installer, which we provide through the [`cudaPackages.cudatoolkit`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.cudatoolkit) attribute. From CUDA 11.4 and onwards, NVIDIA has also provided CUDA redistributables (“CUDA-redist”): individually packaged CUDA Toolkit components meant to facilitate redistribution and inclusion in downstream projects. These packages are available in the [`cudaPackages`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages) package set.
All new projects should use the CUDA redistributables available in [`cudaPackages`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages) in place of [`cudaPackages.cudatoolkit`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.cudatoolkit), as they are much easier to maintain and update.
#### Updating redistributables {#cuda-updating-redistributables}
1. Go to NVIDIA's index of CUDA redistributables: <https://developer.download.nvidia.com/compute/cuda/redist/>
2. Make a note of the new version of CUDA available.
3. Run
```bash
nix run github:connorbaker/cuda-redist-find-features -- \
download-manifests \
--log-level DEBUG \
--version <newest CUDA version> \
https://developer.download.nvidia.com/compute/cuda/redist \
./pkgs/development/cuda-modules/cuda/manifests
```
This will download a copy of the manifest for the new version of CUDA.
4. Run
```bash
nix run github:connorbaker/cuda-redist-find-features -- \
process-manifests \
--log-level DEBUG \
--version <newest CUDA version> \
https://developer.download.nvidia.com/compute/cuda/redist \
./pkgs/development/cuda-modules/cuda/manifests
```
This will generate a `redistrib_features_<newest CUDA version>.json` file in the same directory as the manifest.
5. Update the `cudaVersionMap` attribute set in `pkgs/development/cuda-modules/cuda/extension.nix`.
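To make the final step concrete, here is a minimal sketch of the kind of entry added to `cudaVersionMap`, assuming the set simply maps a CUDA version to a manifest version; the actual schema in `extension.nix` may differ:
```nix
# Hypothetical sketch only; consult extension.nix for the real schema.
{
  cudaVersionMap = {
    # Existing entries map a CUDA version to the manifest version to use.
    "12.3" = "12.3.2";
    # Entry for the newly downloaded manifests (versions are illustrative).
    "12.4" = "12.4.1";
  };
}
```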
#### Updating cuTensor {#cuda-updating-cutensor}
1. Repeat the steps in [Updating redistributables](#cuda-updating-redistributables) with the following changes:
- Use the index of cuTensor redistributables: <https://developer.download.nvidia.com/compute/cutensor/redist>
- Use the newest version of cuTensor available instead of the newest version of CUDA.
- Use `pkgs/development/cuda-modules/cutensor/manifests` instead of `pkgs/development/cuda-modules/cuda/manifests`.
- Skip the step of updating `cudaVersionMap` in `pkgs/development/cuda-modules/cuda/extension.nix`.
#### Updating supported compilers and GPUs {#cuda-updating-supported-compilers-and-gpus}
1. Update `nvccCompatibilities` in `pkgs/development/cuda-modules/_cuda/data/nvcc.nix` to include the newest release of NVCC, as well as any newly supported host compilers.
2. Update `cudaCapabilityToInfo` in `pkgs/development/cuda-modules/_cuda/data/cuda.nix` to include any new GPUs supported by the new release of CUDA.
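As a rough illustration of the shape of these updates, consider the following hedged sketch; every field name below is an assumption, not the actual schema of the files in `pkgs/development/cuda-modules/_cuda/data/`:
```nix
# Hypothetical sketch only; field names are illustrative assumptions.
{
  nvccCompatibilities = {
    # A new NVCC release together with the host-compiler ranges its
    # release notes document as supported.
    "12.4" = {
      gccMaxMajorVersion = "13";
      clangMaxMajorVersion = "17";
    };
  };
  cudaCapabilityToInfo = {
    # A GPU capability newly supported by this CUDA release.
    "9.0" = {
      archName = "Hopper";
      isJetsonDevice = false;
    };
  };
}
```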
#### Updating the CUDA Toolkit runfile installer {#cuda-updating-the-cuda-toolkit}
::: {.warning}
While the CUDA Toolkit runfile installer is still available in Nixpkgs as the [`cudaPackages.cudatoolkit`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.cudatoolkit) attribute, its use is not recommended, and it should be considered deprecated. Please migrate to the CUDA redistributables provided by the [`cudaPackages`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages) package set.
To ensure packages relying on the CUDA Toolkit runfile installer continue to build, it will continue to be updated until a migration path is available.
:::
1. Go to NVIDIA's CUDA Toolkit runfile installer download page: <https://developer.nvidia.com/cuda-downloads>
2. Select the appropriate OS, architecture, distribution, version, and installer type.
- For example: Linux, x86_64, Ubuntu, 22.04, runfile (local)
- NOTE: Typically, we use the Ubuntu runfile. It is unclear if the runfile for other distributions will work.
3. Take the link provided by the installer instructions on the webpage after selecting the installer type and get its hash by running:
```bash
nix store prefetch-file --hash-type sha256 <link>
```
4. Update `pkgs/development/cuda-modules/cudatoolkit/releases.nix` to include the release.
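For orientation, a release entry might look like the following hedged sketch; the URL pattern and field names are assumptions, and the hash placeholder stands for the value printed by `nix store prefetch-file`:
```nix
# Hypothetical sketch only; check releases.nix for the actual schema.
{
  "12.4" = {
    version = "12.4.1";
    url = "https://developer.download.nvidia.com/compute/cuda/<version>/local_installers/<runfile name>";
    sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="; # placeholder
  };
}
```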
#### Updating the CUDA package set {#cuda-updating-the-cuda-package-set}
1. Include a new `cudaPackages_<major>_<minor>` package set in `pkgs/top-level/all-packages.nix`.
- NOTE: Changing the default CUDA package set should occur in a separate PR, allowing time for additional testing.
2. Successfully build the closure of the new package set, updating `pkgs/development/cuda-modules/cuda/overrides.nix` as needed. Below are some common failures:
| Unable to ... | During ... | Reason | Solution | Note |
| -------------- | -------------------------------- | ------------------------------------------------ | -------------------------- | ------------------------------------------------------------ |
| Find headers | `configurePhase` or `buildPhase` | Missing dependency on a `dev` output | Add the missing dependency | The `dev` output typically contains the headers |
| Find libraries | `configurePhase` | Missing dependency on a `dev` output | Add the missing dependency | The `dev` output typically contains CMake configuration files |
| Find libraries | `buildPhase` or `patchelf` | Missing dependency on a `lib` or `static` output | Add the missing dependency | The `lib` or `static` output typically contains the libraries |
Failure to run the resulting binary is typically the most challenging to diagnose, as it may involve a combination of the aforementioned issues. This type of failure typically occurs when a library attempts to load or open a library it depends on that it does not declare in its `DT_NEEDED` section. Try the following debugging steps:
1. First ensure that dependencies are patched with [`autoAddDriverRunpath`](https://search.nixos.org/packages?channel=unstable&type=packages&query=autoAddDriverRunpath).
2. Failing that, try running the application with [`nixGL`](https://github.com/guibou/nixGL) or a similar wrapper tool.
3. If that works, it likely means that the application is attempting to load a library that is not in the `RPATH` or `RUNPATH` of the binary.
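For the first step, wiring in the hook is usually a one-line change to the package expression. A minimal sketch, assuming a hypothetical package `my-cuda-app`:
```nix
# Minimal sketch; `my-cuda-app` and its source details are hypothetical.
{ stdenv, fetchFromGitHub, autoAddDriverRunpath }:
stdenv.mkDerivation {
  pname = "my-cuda-app";
  version = "1.0";
  src = fetchFromGitHub {
    owner = "example";
    repo = "my-cuda-app";
    rev = "v1.0";
    hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="; # placeholder
  };
  # The hook runs during fixup and adds the driver's library path to the
  # runpath of ELF files in the outputs, so libcuda.so is found at runtime.
  nativeBuildInputs = [ autoAddDriverRunpath ];
}
```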
### Writing tests {#cuda-writing-tests}
::: {.caution}
The existence of `passthru.testers` and `passthru.tests` should be considered an implementation detail: they are not meant to be a public or stable interface.
:::
In general, there are two attribute sets in `passthru` used to build and run tests for CUDA packages: `passthru.testers` and `passthru.tests`. Each may contain a nested attribute set named `cuda`, which holds CUDA-specific derivations. The `cuda` attribute set separates CUDA-specific derivations from those which support multiple implementations (e.g., OpenCL or ROCm) or have different licenses. For an example of such generic derivations, see the `magma` package.
::: {.note}
Derivations are nested under the `cuda` attribute due to an OfBorg quirk: if evaluation of an attribute fails (e.g., because of an unfree license), the entire enclosing attribute set is discarded, preventing the other attributes in the set from being discovered, evaluated, or built. Nesting the CUDA-specific derivations contains such failures.
:::
#### `passthru.testers` {#cuda-passthru-testers}
Attributes added to `passthru.testers` are derivations that produce an executable which runs a test. The produced executable should:
- Take care to set up the environment, make temporary directories, and so on.
- Be registered as the derivation's `meta.mainProgram` so that it can be run directly.
::: {.note}
Testers which always require CUDA should be placed in `passthru.testers.cuda`, while those which are generic should be placed in `passthru.testers`.
:::
The `passthru.testers` attribute set allows running tests outside the Nix sandbox (see the sketch after this list). This is useful because such a test:
- Can be run on non-NixOS systems, when wrapped with utilities like `nixGL` or `nix-gl-host`.
- Has network access patterns which are difficult or impossible to sandbox.
- Is free to produce output which is not deterministic, such as timing information.
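A minimal sketch of such a tester, assuming a hypothetical package `saxpy` that installs an executable of the same name:
```nix
# Hedged sketch; `saxpy` and `saxpy-tester` are illustrative names.
{ writeShellApplication, saxpy }:
writeShellApplication {
  name = "saxpy-tester";
  runtimeInputs = [ saxpy ];
  text = ''
    # Set up an isolated scratch directory before running the test.
    tmpdir="$(mktemp -d)"
    trap 'rm -rf "$tmpdir"' EXIT
    cd "$tmpdir"
    saxpy
  '';
}
```
Because the installed script matches the derivation name, the tester can be started directly, e.g. with `nix run` on the corresponding attribute.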
#### `passthru.tests` {#cuda-passthru-tests}
Attributes added to `passthru.tests` are derivations that run tests inside the Nix sandbox. Tests should:
- Use the executables produced by `passthru.testers`, where possible, to avoid duplication of test logic.
- Include `requiredSystemFeatures = [ "cuda" ];`, possibly conditioned on the value of `cudaSupport` if they are generic, to ensure that they are only run on systems exposing a CUDA-capable GPU.
::: {.note}
Tests which always require CUDA should be placed in `passthru.tests.cuda`, while those which are generic should be placed in `passthru.tests`.
:::
This is useful for tests that are deterministic (e.g., checking exit codes) and that can be provided with all necessary resources inside the sandbox.
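A minimal in-sandbox sketch reusing the tester from the previous subsection (`saxpy-tester` is the hypothetical executable built there):
```nix
# Hedged sketch; names are illustrative.
{ runCommand, saxpy-tester }:
runCommand "saxpy-test"
  {
    nativeBuildInputs = [ saxpy-tester ];
    # Only build on machines that expose a CUDA-capable GPU.
    requiredSystemFeatures = [ "cuda" ];
  }
  ''
    saxpy-tester
    touch $out
  ''
```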


@@ -2756,33 +2756,53 @@
"cuda": [
"index.html#cuda"
],
"adding-a-new-cuda-release": [
"index.html#adding-a-new-cuda-release"
],
"updating-cuda-redistributables": [
"index.html#updating-cuda-redistributables"
],
"updating-cutensor": [
"index.html#updating-cutensor"
],
"updating-supported-compilers-and-gpus": [
"index.html#updating-supported-compilers-and-gpus"
],
"updating-the-cuda-toolkit": [
"index.html#updating-the-cuda-toolkit"
],
"updating-the-cuda-package-set": [
"index.html#updating-the-cuda-package-set"
"cuda-contributing": [
"index.html#cuda-contributing"
],
"cuda-docker-podman": [
"index.html#cuda-docker-podman"
],
"specifying-what-devices-to-expose-to-the-container": [
"cuda-package-set-maintenance": [
"index.html#cuda-package-set-maintenance",
"index.html#adding-a-new-cuda-release"
],
"cuda-passthru-testers": [
"index.html#cuda-passthru-testers"
],
"cuda-passthru-tests": [
"index.html#cuda-passthru-tests"
],
"cuda-specifying-what-devices-to-expose-to-the-container": [
"index.html#cuda-specifying-what-devices-to-expose-to-the-container",
"index.html#specifying-what-devices-to-expose-to-the-container"
],
"using-docker-compose": [
"cuda-updating-redistributables": [
"index.html#cuda-updating-redistributables",
"index.html#updating-cuda-redistributables"
],
"cuda-updating-cutensor": [
"index.html#cuda-updating-cutensor",
"index.html#updating-cutensor"
],
"cuda-updating-supported-compilers-and-gpus": [
"index.html#cuda-updating-supported-compilers-and-gpus",
"index.html#updating-supported-compilers-and-gpus"
],
"cuda-updating-the-cuda-package-set": [
"index.html#cuda-updating-the-cuda-package-set",
"index.html#updating-the-cuda-package-set"
],
"cuda-updating-the-cuda-toolkit": [
"index.html#cuda-updating-the-cuda-toolkit",
"index.html#updating-the-cuda-toolkit"
],
"cuda-using-docker-compose": [
"index.html#cuda-using-docker-compose",
"index.html#using-docker-compose"
],
"cuda-writing-tests": [
"index.html#cuda-writing-tests"
],
"cuelang": [
"index.html#cuelang"
],