
Full diagnosis

`cuda-doctor doctor`

This command reports what is broken, what is risky, and whether the current driver, toolkit, and framework stack can support the local GPU and the intended runtime.

v0.0.0

What it reports

  • GPU model, compute capability, VRAM, and architecture family.
  • Driver version, health, and expected compatibility envelope.
  • CUDA toolkit version, `nvcc` availability, and architecture target support.
  • Relevant runtimes and libraries needed for real workloads.
  • PyTorch compatibility and whether the installed wheel can target the local GPU.
  • Build-chain readiness for compiling CUDA projects.
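One way to model the GPU portion of the report is sketched below. It assumes the GPU details come from `nvidia-smi --query-gpu=name,compute_cap,memory.total --format=csv,noheader,nounits`. The `GpuInfo` class, the `parse_query_line` helper, and the family lookup tables are all hypothetical, and the tables cover only a few common architectures.

```python
from dataclasses import dataclass

# Hypothetical family tables: exact (major, minor) matches win over major-only ones.
# Capability 10.x (datacenter) and 12.x (consumer) are both Blackwell.
_EXACT = {(7, 5): "Turing", (8, 9): "Ada Lovelace"}
_BY_MAJOR = {7: "Volta", 8: "Ampere", 9: "Hopper", 10: "Blackwell", 12: "Blackwell"}


@dataclass
class GpuInfo:
    name: str
    major: int
    minor: int
    vram_mib: int

    @property
    def sm_target(self) -> str:
        # Compute capability 12.0 maps to the build target "sm_120".
        return f"sm_{self.major}{self.minor}"

    @property
    def family(self) -> str:
        return _EXACT.get((self.major, self.minor), _BY_MAJOR.get(self.major, "unknown"))


def parse_query_line(line: str) -> GpuInfo:
    """Parse one 'name, compute_cap, memory.total' CSV line from nvidia-smi."""
    # rsplit keeps any commas inside the GPU name intact.
    name, cap, mem = (field.strip() for field in line.rsplit(",", 2))
    major, minor = (int(part) for part in cap.split("."))
    return GpuInfo(name=name, major=major, minor=minor, vram_mib=int(mem))
```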

Failure modes it should catch explicitly

Missing `sm_120` support

The GPU is a Blackwell-class part with compute capability 12.0 (for example, the RTX 5090), but the local toolchain or build flags cannot target `sm_120`.
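Here is a minimal sketch of the `sm_120` flag check. It assumes the build flags are available as a single nvcc-style string. CMake's `CMAKE_CUDA_ARCHITECTURES` or PyTorch's `TORCH_CUDA_ARCH_LIST` would need their own parsing. The `flags_cover_target` helper is hypothetical and deliberately conservative.

```python
import re

# Matches nvcc architecture tokens such as "sm_120", "compute_120" or "sm_90a".
_ARCH_TOKEN = re.compile(r"\b(sm|compute)_(\d+)([a-z]?)\b")


def flags_cover_target(flags: str, target: str = "sm_120") -> bool:
    """Conservative check: True only if the flags name the target's digits
    as either a real (sm_XY) or virtual (compute_XY) architecture."""
    wanted = target.split("_", 1)[1]  # "sm_120" -> "120"
    # PTX for an older virtual architecture could still be JIT-compiled for
    # the target; this sketch deliberately does not count that path.
    return any(digits == wanted for _kind, digits, _suffix in _ARCH_TOKEN.findall(flags))
```

Toolchain support can be checked the same way against the output of `nvcc --list-gpu-code`, which lists the real architectures the installed `nvcc` can emit.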

Driver/runtime mismatch

`nvidia-smi` works, but the driver is too old for the intended runtime or framework stack.
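One simplified way to spot this mismatch is to compare two versions. The first is the `CUDA Version` that `nvidia-smi` prints in its header, which is the newest CUDA version the driver supports. The second is the runtime the stack intends to use. The helpers below are hypothetical. The rule ignores CUDA minor-version compatibility, so treat it as a first-pass heuristic rather than a definitive verdict.

```python
from __future__ import annotations

import re

# The nvidia-smi header contains a fragment like "CUDA Version: 12.4".
_CUDA_VERSION = re.compile(r"CUDA Version:\s*(\d+)\.(\d+)")


def driver_cuda_ceiling(smi_output: str) -> tuple[int, int] | None:
    """Return the newest CUDA version the driver reports, or None if absent."""
    match = _CUDA_VERSION.search(smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def driver_too_old(smi_output: str, runtime: tuple[int, int]) -> bool:
    """Flag the driver if its reported CUDA version is below the intended runtime."""
    ceiling = driver_cuda_ceiling(smi_output)
    if ceiling is None:
        return True  # an unreadable header counts as a failure, not a pass
    return ceiling < runtime
```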

PyTorch wheel drift

PyTorch imports successfully, but the installed wheel was built against a CUDA runtime, or for a set of GPU architectures, that cannot execute correctly on this machine.
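PyTorch exposes the inputs for this check through `torch.version.cuda`, `torch.cuda.get_arch_list()`, and `torch.cuda.get_device_capability()`. The hypothetical `wheel_can_run` helper below takes those values as plain inputs and applies the CUDA compatibility rules:

  • SASS for capability X.y runs on X.z when z >= y.
  • PTX can be JIT-compiled for equal or newer capabilities.

It skips architecture-specific entries such as `sm_90a`, which makes it stricter than necessary.

```python
from __future__ import annotations

import re

_ENTRY = re.compile(r"^(sm|compute)_(\d{2,})([a-z]?)$")


def _split(digits: str) -> tuple[int, int]:
    # "120" -> (12, 0), "86" -> (8, 6): the last digit is the minor version.
    return int(digits[:-1]), int(digits[-1])


def wheel_can_run(arch_list: list[str], capability: tuple[int, int]) -> str | None:
    """Return 'sass', 'ptx-jit', or None for a wheel's compiled arch list."""
    verdict = None
    for entry in arch_list:
        match = _ENTRY.match(entry)
        if not match or match.group(3):
            continue  # unparseable or arch-specific ("a") entries are ignored
        kind, (major, minor) = match.group(1), _split(match.group(2))
        if kind == "sm" and major == capability[0] and minor <= capability[1]:
            return "sass"  # native binary code the GPU can run directly
        if kind == "compute" and (major, minor) <= capability:
            verdict = "ptx-jit"  # runnable only after JIT compilation
    return verdict
```

With PyTorch installed, the call would be `wheel_can_run(torch.cuda.get_arch_list(), torch.cuda.get_device_capability())`. A `None` result means the wheel carries no code the local GPU can run.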

Kernel module mismatch

Linux is loading the wrong NVIDIA kernel module flavor (proprietary or open GPU kernel modules) for the desired stack.
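Here is a sketch of the flavor check on Linux. It makes two assumptions. First, the loaded module is described in `/proc/driver/nvidia/version`. Second, the open modules identify themselves with the phrase `Open Kernel Module` on the `NVRM version:` line. The Blackwell rule reflects NVIDIA's statement that Blackwell GPUs are supported only by the open GPU kernel modules. The function names are hypothetical.

```python
def module_flavor(proc_version_text: str) -> str:
    """Classify /proc/driver/nvidia/version text as 'open', 'proprietary', or 'unknown'."""
    for line in proc_version_text.splitlines():
        if line.startswith("NVRM version:"):
            # Assumption: only the open modules mention "Open Kernel Module" here.
            return "open" if "Open Kernel Module" in line else "proprietary"
    return "unknown"


def read_flavor(path: str = "/proc/driver/nvidia/version") -> str:
    """Read the version file; a missing file (no driver loaded) yields 'unknown'."""
    try:
        with open(path, encoding="utf-8") as handle:
            return module_flavor(handle.read())
    except OSError:
        return "unknown"


def flavor_mismatch(flavor: str, family: str) -> bool:
    # Blackwell needs the open modules; an unknown flavor is treated as a mismatch.
    return family == "Blackwell" and flavor != "open"
```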

Example diagnostic posture

Illustrative output:

```text
Risk: high
GPU: RTX 5090
Architecture: Blackwell
Target capability: sm_120
Driver: present but too old for intended runtime
Toolkit: nvcc detected, but local flags do not include sm_120
PyTorch: installed wheel targets an incompatible CUDA runtime
Action: repair required before validation or build can succeed
```
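The illustrative summary can be produced by a small aggregation step. In this hypothetical sketch, each check yields a `Finding`, and any blocking finding raises the risk to `high`. The two-level risk scale and the wording of the low-risk `Action` line are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Finding:
    label: str      # report field, e.g. "Driver"
    detail: str     # human-readable explanation
    blocking: bool  # True if a real workload is expected to fail


def render_report(findings: list[Finding]) -> str:
    """Render findings in the 'Risk / field / Action' layout."""
    blocking = any(f.blocking for f in findings)
    lines = [f"Risk: {'high' if blocking else 'low'}"]
    lines += [f"{f.label}: {f.detail}" for f in findings]
    lines.append(
        "Action: repair required before validation or build can succeed"
        if blocking
        else "Action: none required"  # hypothetical wording
    )
    return "\n".join(lines)
```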

No fake green checks

Presence checks are not enough. If the machine is likely to fail during a real kernel launch, the diagnosis should say so directly.
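A successful `import torch` does not prove that kernels run. Here is a minimal launch test, assuming PyTorch is the framework in question. It runs a few small operations on the GPU and then calls `torch.cuda.synchronize()`. The synchronize call makes asynchronous launch errors surface immediately, such as a missing kernel image for the local architecture. The `launch_smoke_test` name is hypothetical.

```python
from __future__ import annotations


def launch_smoke_test() -> tuple[bool, str]:
    """Launch a few tiny kernels and synchronize; return (ok, detail)."""
    try:
        import torch
    except (ImportError, OSError) as exc:
        return False, f"torch could not be imported: {exc}"
    if not torch.cuda.is_available():
        return False, "torch.cuda.is_available() returned False"
    try:
        x = torch.ones(1024, device="cuda")  # fill kernel
        total = (x * 2).sum()                # elementwise and reduction kernels
        torch.cuda.synchronize()             # surface asynchronous errors here
        value = total.item()
        return value == 2048.0, f"result={value}"
    except RuntimeError as exc:
        return False, f"kernel launch failed: {exc}"
```

A `False` result here should outweigh any earlier presence check that passed.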