• General
  • Installing CUDA 12.8, cuDNN, Anaconda, and PyTorch in WSL2-Ubuntu and verifying the installation

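2025/03/11: reinstalled once more, following this post.
I don't need the earlier installation steps for the Windows environment, so I start directly from
"2. Install CUDA, cuDNN, and Anaconda in the WSL2-Ubuntu system"
onward.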

    1. Install CUDA

    cd /mnt/download
    apt-get install build-essential
    wget https://developer.download.nvidia.com/compute/cuda/12.8.1/local_installers/cuda_12.8.1_570.124.06_linux.run
    sh cuda_12.8.1_570.124.06_linux.run

    For my current environment, the file listed under
    CUDA Toolkit 12.8 Update 1 Downloads
    is
    cuda_12.8.1_570.124.06_linux.run
    and it is about 5 GB.

    Running the sh command produced the following error:
    root@study:/mnt/download# sh cuda_12.8.1_570.124.06_linux.run
    Installation failed. See log at /var/log/cuda-installer.log for details.
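The installer message only points at the log file, so a reasonable first step is to pull the failure lines out of it. A minimal sketch, assuming the log marks problems with words such as "error" or "fail" (the keyword list is a guess about the log wording, not a documented format); `show_install_errors` is a hypothetical helper name:

```shell
# Hypothetical helper: print lines that look like errors from an installer log.
# The keywords are an assumption about the log wording, not a documented format.
show_install_errors() {
    log="${1:-/var/log/cuda-installer.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # -i: case-insensitive, -n: show line numbers, -E: extended regex
    grep -inE 'error|fail' "$log"
}

# Usage:
#   show_install_errors                 # default log path
#   show_install_errors /tmp/other.log  # any other log file
```

The default path is the one named in the installer message above.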

    Following
    https://blog.csdn.net/wr1997/article/details/106909423
    I disabled nouveau.
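The steps from the linked post are not reproduced in these notes. The usual way to disable nouveau on Ubuntu is a modprobe blacklist file followed by an initramfs rebuild, so here is a sketch under that assumption. The file name `blacklist-nouveau.conf` and the helper `write_nouveau_blacklist` are illustrative names, not taken from the post:

```shell
# Hypothetical helper: write a modprobe blacklist for nouveau into a directory.
# Defaults to /etc/modprobe.d; pass another directory to try it out safely.
write_nouveau_blacklist() {
    dir="${1:-/etc/modprobe.d}"
    cat > "$dir/blacklist-nouveau.conf" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
}

# As root, on the real system:
#   write_nouveau_blacklist
#   update-initramfs -u
#   reboot
# After the reboot, `lsmod | grep nouveau` should print nothing.
```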

    After disabling it and rebooting, the main screen showed nothing at all; the only way in was over ssh.

    It failed again:
    root@study:/mnt/download# sh cuda_12.8.1_570.124.06_linux.run
    sh: 1: dkms: not found
    Installation failed. See log at /var/log/cuda-installer.log for details.

    Without further ado, first run:
    apt-get install dkms
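Since the installer only complains about one missing tool per run, it can save a retry to check for the prerequisites up front. A small sketch; `missing_tools` is a hypothetical helper, and the tool list (gcc and make from build-essential, plus dkms) is an assumption based on the errors seen above, not an official requirement list:

```shell
# Hypothetical helper: print every command from the argument list that is
# not on PATH; prints nothing if all of them are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed prerequisite list: dkms is what failed above, gcc and make
# come from build-essential.
missing=$(missing_tools gcc make dkms)
if [ -n "$missing" ]; then
    echo "missing before running the installer:" $missing
fi
```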

    During installation I chose to install nvidia-fs, and got the following error:
    mofed is not installed

    Installing again, this time without selecting nvidia-fs, gave the following result:

    root@study:/mnt/download# sh cuda_12.8.1_570.124.06_linux.run


    = Summary =


    Driver: Installed
    Toolkit: Installed in /usr/local/cuda-12.8/

    Please make sure that

    • PATH includes /usr/local/cuda-12.8/bin

    • LD_LIBRARY_PATH includes /usr/local/cuda-12.8/lib64, or, add /usr/local/cuda-12.8/lib64 to /etc/ld.so.conf and run ldconfig as root

      To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.8/bin
      To uninstall the NVIDIA Driver, run nvidia-uninstall
      Logfile is /var/log/cuda-installer.log

    Verify:

    root@study:/mnt/download# nvidia-smi
    Tue Mar 11 09:41:08 2025
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA GeForce RTX 3060        Off |   00000000:01:00.0 Off |                  N/A |
    | 30%   29C    P0             33W /  170W |       1MiB /  12288MiB |      2%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+

    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |  No running processes found                                                             |
    +-----------------------------------------------------------------------------------------+

    So the driver installed successfully.

      The original post's section
      "After installation, edit the environment variables"
      seemed unnecessary to me.
      But right now, running
      nvcc -V
      shows

      root@study:/mnt/download# nvcc -V
      Command 'nvcc' not found, but can be installed with:
      apt install nvidia-cuda-toolkit

        Look at this part of the output from right after the driver installation:

        Driver: Installed
        Toolkit: Installed in /usr/local/cuda-12.8/

        Please make sure that

        PATH includes /usr/local/cuda-12.8/bin

        LD_LIBRARY_PATH includes /usr/local/cuda-12.8/lib64, or, add /usr/local/cuda-12.8/lib64 to /etc/ld.so.conf and run ldconfig as root

        To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.8/bin
        To uninstall the NVIDIA Driver, run nvidia-uninstall
        Logfile is /var/log/cuda-installer.log

        So the PATH definitions are needed after all.

        Edit the ~/.bashrc file:
        nano ~/.bashrc
        Add the following:
        export PATH=/usr/local/cuda-12.8/bin:$PATH
        export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH

        Press Ctrl + X, then Y to confirm saving, and finally Enter to exit.
        After saving the file, run the following command so the variables take effect:
        source ~/.bashrc
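Repeating this edit by hand on a later reinstall tends to leave duplicate export lines in ~/.bashrc. A small sketch that appends a line only when it is not already present; `add_line_once` is a hypothetical helper, and the two export lines are the ones from the notes above:

```shell
# Hypothetical helper: append LINE to FILE unless an identical line exists.
add_line_once() {
    file="$1"
    line="$2"
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep $PATH unexpanded until the shell reads ~/.bashrc):
#   add_line_once ~/.bashrc 'export PATH=/usr/local/cuda-12.8/bin:$PATH'
#   add_line_once ~/.bashrc 'export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH'
#   . ~/.bashrc
```

Running it twice leaves the file unchanged the second time, so it is safe to keep in a setup script.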

        Now the result is:

        root@study:/mnt/download# nvcc -V
        nvcc: NVIDIA (R) Cuda compiler driver
        Copyright (c) 2005-2025 NVIDIA Corporation
        Built on Fri_Feb_21_20:23:50_PST_2025
        Cuda compilation tools, release 12.8, V12.8.93
        Build cuda_12.8.r12.8/compiler.35583870_0

        At this point, CUDA is installed and verified.
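For a scripted version of this check, the release number can be pulled out of the `nvcc -V` text and compared with the expected one. A sketch that relies only on the "release X.Y" wording visible in the output above; `nvcc_release` is a hypothetical helper name:

```shell
# Hypothetical helper: read `nvcc -V` output on stdin and print the
# release number (for example 12.8).
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Usage: compare against the version that was meant to be installed.
#   [ "$(nvcc -V | nvcc_release)" = "12.8" ] && echo "toolkit OK"
```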

          Having reached this step,
          Driver Installer (recommended):
          run
          apt-get install -y nvidia-open
          The result was:

          root@study:/mnt/download# apt-get install -y nvidia-open
          Reading package lists... Done
          Building dependency tree... Done
          Reading state information... Done
          E: Unable to locate package nvidia-open

          Following this post
          https://blog.csdn.net/guilutian0541/article/details/119928323
          I added a PPA mirror source.
          Still no luck.
          Giving up!

            2. Install cuDNN
            The original poster only ran
            wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
            sudo dpkg -i cuda-keyring_1.1-1_all.deb
            sudo apt-get update
            sudo apt-get -y install cudnn

            these four commands,
            but the page
            https://developer.nvidia.com/cudnn-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=24.04&target_type=deb_network
            lists one more command below them:
            sudo apt-get -y install cudnn-cuda-12

            There is no need to run that last command; the unversioned one above has already installed it!
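To confirm what the unversioned package actually installed, one option is to read the version macros from the cuDNN header. A sketch, assuming the development header `cudnn_version.h` was installed and defines `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL`; the header path in the usage comment is a guess, and `cudnn_header_version` is a hypothetical helper:

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCH from a cudnn_version.h file.
cudnn_header_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$1"
}

# Usage; the path is an assumption and depends on how cuDNN was packaged.
# Locate the header first with: find /usr -name 'cudnn_version*.h'
#   cudnn_header_version /usr/include/x86_64-linux-gnu/cudnn_version.h
# The installed packages can also be listed with:
#   dpkg -l | grep -i cudnn
```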

              3. Install Anaconda
              I ran the installation under
              /mnt/download
              following the original post,
              and nothing went wrong.

              Check the Conda version
              conda --version
              (base) root@study:~# conda --version
              conda 24.9.2

              I'm skipping creating and activating a new environment for now and moving on.

                4. Install PyTorch and verify the CUDA 12.8, cuDNN, Anaconda, and PyTorch installation

                CUDA 12.8 is not yet supported for installation via the conda command.
                Use the official pip command to install the Preview (Nightly) build:
                pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128

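                Tsk.
                It looks like it's downloading quite a lot; I really should have run this from the download directory. Nothing to do but wait.

                Last night I went to bed and, without paying attention, shut the machine down.
                Continuing the install this morning.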
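Once the pip install finishes, the verification named in the heading can be done in one go from the shell. A sketch, assuming `python3` in the active conda environment is the interpreter that received the nightly wheels; `check_torch` is a hypothetical helper, and the `PYTHON` override is only there so a different interpreter can be pointed at:

```shell
# Hypothetical helper: print the torch version, the CUDA version torch was
# built against, and whether a GPU is visible. Set PYTHON to override the
# interpreter (assumed default: python3).
check_torch() {
    py="${PYTHON:-python3}"
    "$py" - <<'EOF'
import torch
print("torch:", torch.__version__)
print("built for CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device 0:", torch.cuda.get_device_name(0))
    print("cuDNN:", torch.backends.cudnn.version())
EOF
}

# Usage:
#   check_torch
```

If "CUDA available" prints False while nvidia-smi works, the first thing to check is whether a CPU-only wheel was installed instead of the cu128 build.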