Full diagnosis

`cuda-doctor doctor`

This command answers what is broken, what is risky, and whether the current stack can support the local GPU and intended runtime.

What it reports

GPU model, compute capability, VRAM, and architecture family.
Driver version, health, and expected compatibility envelope.
CUDA toolkit version, `nvcc` availability, and architecture target support.
Relevant runtimes and libraries needed for real workloads.
PyTorch compatibility and whether the installed wheel can target the local GPU.
Build-chain readiness for compiling CUDA projects.
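
As a rough illustration of how the first two report items could be gathered, the sketch below shells out to `nvidia-smi` and parses one CSV row per GPU. It is not how `cuda-doctor` is implemented. The `GpuFacts` type and the function names are hypothetical, and the `compute_cap` query field is assumed to be available, which is only true of reasonably recent drivers.

```python
import subprocess
from dataclasses import dataclass

# Fields requested from nvidia-smi; compute_cap is assumed to be supported
# by the installed driver (older drivers reject it).
QUERY_FIELDS = "name,compute_cap,memory.total,driver_version"


@dataclass
class GpuFacts:
    """Hypothetical container for the basic facts a diagnosis starts from."""
    name: str
    compute_capability: str
    vram_mib: int
    driver_version: str


def parse_gpu_line(line: str) -> GpuFacts:
    """Parse one CSV row in the QUERY_FIELDS order (units already stripped)."""
    name, cap, mem, driver = [field.strip() for field in line.split(",")]
    return GpuFacts(name, cap, int(mem), driver)


def query_gpus() -> list[GpuFacts]:
    """Run nvidia-smi and return one GpuFacts per visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [parse_gpu_line(row) for row in result.stdout.splitlines() if row.strip()]
```

Keeping the parser separate from the subprocess call means the parsing logic can be exercised on machines without a GPU.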

Failure modes it should catch explicitly

Missing `sm_120` support

The GPU is Blackwell class but the local toolchain or build flags cannot target it.
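
One way to test for this condition, sketched under assumptions: recent CUDA toolkits accept `nvcc --list-gpu-code`, which prints the `sm_XX` targets the compiler supports, one per line. The helper names below are hypothetical and this is not the tool's own check.

```python
import subprocess


def parse_supported_archs(listing: str) -> set[str]:
    """Collect the sm_XX tokens from the text printed by nvcc --list-gpu-code."""
    return {token for token in listing.split() if token.startswith("sm_")}


def toolkit_can_target(listing: str, target: str = "sm_120") -> bool:
    """True when the toolkit lists the requested architecture as a code target."""
    return target in parse_supported_archs(listing)


def nvcc_gpu_code_listing() -> str:
    """Ask the local nvcc which sm_XX targets it supports.

    Assumes nvcc is on PATH and is new enough to accept --list-gpu-code.
    """
    result = subprocess.run(
        ["nvcc", "--list-gpu-code"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

A toolkit that passes this check can still produce unusable binaries if the project's build flags omit the target, so the flags need a separate look.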

Driver/runtime mismatch

`nvidia-smi` works, but the driver is too old for the intended runtime or framework stack.
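
The comparison itself is simple once a minimum driver version is known. The sketch below assumes dotted numeric version strings and leaves the minimum as a caller-supplied value, since the real threshold depends on the intended runtime and should come from NVIDIA's compatibility documentation. Function names are hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '570.86.16' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_satisfies(installed: str, minimum: str) -> bool:
    """True when the installed driver is at least the required minimum.

    `minimum` is supplied by the caller; this sketch does not know which
    driver a given runtime or framework actually requires.
    """
    return parse_version(installed) >= parse_version(minimum)
```

Comparing tuples of integers avoids the classic string-comparison mistake where `"99.0"` sorts above `"570.0"`.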

PyTorch wheel drift

PyTorch imports successfully but is built against a CUDA runtime that cannot execute correctly on this machine.
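
A minimal sketch of one such check, assuming the usual CUDA compatibility rules: a compiled `sm_XY` binary runs on devices with the same major version and an equal or higher minor version, and an embedded `compute_XY` PTX target can be JIT-compiled for any equal or newer device. The rule is simplified (architecture-specific suffixes such as `sm_90a` are ignored) and the function names are hypothetical. `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` are the PyTorch calls that supply the inputs.

```python
def wheel_can_target(arch_list: list[str], capability: tuple[int, int]) -> bool:
    """Decide whether any architecture in the wheel can serve the local device."""
    dev_major, dev_minor = capability
    for arch in arch_list:
        kind, _, number = arch.partition("_")
        if not number.isdigit():
            continue  # skip suffixed variants such as sm_90a in this sketch
        major, minor = int(number[:-1]), int(number[-1])
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            return True  # binary-compatible within the same major version
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            return True  # PTX can be JIT-compiled for a newer device
    return False


def read_from_torch() -> tuple[list[str], tuple[int, int]]:
    """Fetch the wheel's arch list and the capability of device 0 (needs a GPU)."""
    import torch  # imported lazily so the pure check works without PyTorch
    return torch.cuda.get_arch_list(), torch.cuda.get_device_capability(0)
```

For example, `wheel_can_target(["sm_80", "sm_86", "sm_90"], (12, 0))` returns `False`: the wheel imports fine but carries nothing a capability 12.0 device can run.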

Kernel module mismatch

Linux has loaded the wrong NVIDIA kernel module flavor (open versus proprietary) for the desired stack.
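
The loaded flavor can usually be read from `/proc/driver/nvidia/version`. The sketch below assumes the `NVRM version:` line contains the phrase `Open Kernel Module` when the open flavor is loaded; that wording is an assumption about the driver's output format, and the function names are hypothetical. Which flavor a given stack needs is outside the scope of this sketch.

```python
from pathlib import Path


def module_flavor(version_text: str) -> str:
    """Classify the loaded NVIDIA kernel module from the proc version text.

    Returns 'open', 'proprietary', or 'unknown'. Relies on the assumed
    wording of the 'NVRM version:' line.
    """
    for line in version_text.splitlines():
        if line.startswith("NVRM version:"):
            return "open" if "Open Kernel Module" in line else "proprietary"
    return "unknown"


def read_module_flavor(path: str = "/proc/driver/nvidia/version") -> str:
    """Read the proc file if it exists; 'unknown' means no module is loaded."""
    proc_file = Path(path)
    if not proc_file.exists():
        return "unknown"
    return module_flavor(proc_file.read_text())
```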

Example diagnostic posture

Illustrative output

```text
Risk: high
GPU: RTX 5090
Architecture: Blackwell
Target capability: sm_120
Driver: present but too old for intended runtime
Toolkit: nvcc detected, but local flags do not include sm_120
PyTorch: installed wheel targets an incompatible CUDA runtime
Action: repair required before validation or build can succeed
```

No fake green checks

Presence checks are not enough. If the machine is likely to fail during a real kernel launch, the diagnosis should say so directly.
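
The principle can be sketched as a verdict function that refuses to report success on presence checks alone. It is an illustration, not the tool's actual logic: `diagnose` and `torch_launch_probe` are hypothetical names, and the probe assumes PyTorch with a CUDA device is available when it is called.

```python
from typing import Callable, Mapping


def diagnose(presence: Mapping[str, bool], launch_probe: Callable[[], None]) -> str:
    """Return 'pass' only if everything is present AND a real launch succeeds."""
    missing = [name for name, ok in presence.items() if not ok]
    if missing:
        return "fail: missing " + ", ".join(missing)
    try:
        launch_probe()  # must raise if the kernel cannot actually run
    except Exception as exc:
        return f"fail: kernel launch probe raised {type(exc).__name__}: {exc}"
    return "pass"


def torch_launch_probe() -> None:
    """Launch a trivial kernel and verify its result on the host."""
    import torch  # imported lazily; requires a CUDA-enabled PyTorch build
    x = torch.ones(1024, device="cuda")
    total = (x + x).sum().item()  # .item() forces synchronization
    if total != 2048.0:
        raise RuntimeError(f"unexpected kernel result: {total}")
```

Passing the probe in as a callable keeps the verdict logic testable without a GPU, while the real probe exercises memory allocation, a kernel launch, and a device-to-host copy.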

Related docs

Repair

doctor auto

Apply compatible repairs to a broken or misleading CUDA environment and refuse success until validation passes.

Diagnose

check

Run a lighter, read-only inspection for CI, scripting, or quick triage.

Execution

validate

Prove that device selection, memory transfer, kernel launch, and runtime behavior work on the local GPU.