Fixing NVIDIA Driver Load Failures on Ubuntu 24.04 with Secure Boot

On Ubuntu 24.04 with Secure Boot enabled, the NVIDIA driver was installed and dkms status reported it as installed, yet nvidia-smi failed with "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver". Loading the module by hand was rejected outright: Key was rejected by service

This post records how I fixed it.

Environment

  • Ubuntu 24.04.3 LTS
  • Kernel: 6.14.0-1015-nvidia
  • GPU: NVIDIA GeForce RTX 4090
  • Driver: 590.48.01
  • Secure Boot: enabled

Symptoms

Running nvidia-smi fails:

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.
Make sure that the latest NVIDIA driver is installed and running.

Checking the module state shows that nothing is loaded at all:

$ lsmod | grep nvidia
(no output)

Yet DKMS clearly reports the module as installed:

$ dkms status
nvidia/590.48.01, 6.14.0-1015-nvidia, x86_64: installed

Loading the module manually produces the telling error:

$ sudo modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': Key was rejected by service

Root cause

The message Key was rejected by service is a clear sign that Secure Boot is the culprit.

$ mokutil --sb-state
SecureBoot enabled

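Secure Boot requires every kernel module to pass signature verification. nvidia-dkms-open is built locally by DKMS, and the resulting module is not signed by any key in the MOK (Machine Owner Key) database, so the kernel refuses to load it.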
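
To confirm this diagnosis on another machine, the two checks can be sketched as a small script. This is only a sketch: the function name describe_signing is my own, and it assumes mokutil and modinfo are installed (it just says so when they are not). An empty signer field is taken to mean the module carries no signature.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original troubleshooting session):
# print the Secure Boot state and the signer recorded in a kernel module.
describe_signing() {
    mod="${1:-nvidia}"

    if command -v mokutil >/dev/null 2>&1; then
        # On EFI systems this prints "SecureBoot enabled" or "SecureBoot disabled".
        state=$(mokutil --sb-state 2>&1 || true)
        echo "$state" | head -n 1
    else
        echo "mokutil not found"
    fi

    if command -v modinfo >/dev/null 2>&1; then
        # -F signer prints only the signer field; it is empty for unsigned modules.
        signer=$(modinfo -F signer "$mod" 2>/dev/null || true)
        echo "signer of $mod: ${signer:-<none, or module not found>}"
    else
        echo "modinfo not found"
    fi
}

describe_signing nvidia
```

If the signer line names a key that the kernel does not trust, or no signer at all, the modprobe rejection above is the expected result.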

Solution

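If you would rather keep Secure Boot on, switch to the NVIDIA kernel modules that Ubuntu ships pre-signed. These modules are already signed by Canonical, and the system trusts Canonical's certificate out of the box.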

Step 1: Find the matching pre-signed package

$ apt-cache search linux-modules-nvidia | grep "$(uname -r)"
linux-modules-nvidia-590-open-6.14.0-1015-nvidia - Linux kernel nvidia modules for version 6.14.0-1015
linux-modules-nvidia-590-server-open-6.14.0-1015-nvidia - Linux kernel nvidia modules for version 6.14.0-1015
...

I had been using nvidia-driver-open (the open kernel module), so the matching pre-signed package is linux-modules-nvidia-590-open-6.14.0-1015-nvidia.
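
The package name follows a visible pattern: driver branch, then flavour, then kernel release. A minimal sketch of building it, assuming that pattern holds for your branch and kernel; presigned_pkg is a hypothetical helper name, not an Ubuntu tool:

```shell
#!/bin/sh
# Hypothetical helper: assemble the pre-signed package name from its parts,
# following the pattern seen in the apt-cache output above.
presigned_pkg() {
    branch="$1"                  # driver branch, e.g. 590
    flavour="$2"                 # "open" or "server-open"
    kernel="${3:-$(uname -r)}"   # defaults to the running kernel
    printf 'linux-modules-nvidia-%s-%s-%s\n' "$branch" "$flavour" "$kernel"
}

presigned_pkg 590 open 6.14.0-1015-nvidia
# prints: linux-modules-nvidia-590-open-6.14.0-1015-nvidia
```

Passing the result to apt-cache policy, for example apt-cache policy "$(presigned_pkg 590 open)", shows whether a candidate version actually exists before you try to install it.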

Step 2: Install the pre-signed modules

$ sudo apt-get update
$ sudo apt-get install -y linux-modules-nvidia-590-open-6.14.0-1015-nvidia

The installation automatically removes the DKMS build of the driver:

The following packages will be REMOVED:
  nvidia-dkms-open nvidia-driver-open nvidia-kernel-common nvidia-open
The following NEW packages will be installed:
  linux-modules-nvidia-590-open-6.14.0-1015-nvidia
  nvidia-firmware-590-590.48.01 nvidia-kernel-common-590

Step 3: Resolve package conflicts (if you hit any)

I ran into an nvidia-firmware conflict during installation:

dpkg: error processing archive nvidia-firmware-590-590.48.01_..._amd64.deb (--unpack):
 trying to overwrite '/lib/firmware/nvidia/590.48.01/gsp_ga10x.bin',
 which is also in package nvidia-firmware 590.48.01-0ubuntu1

This happens because the nvidia-firmware package I had installed earlier from the NVIDIA CUDA repository and the nvidia-firmware-590-590.48.01 package from the Ubuntu repository ship files at the same paths.

Force-remove the old package, then let apt repair the installation:

$ sudo dpkg --remove --force-remove-reinstreq nvidia-firmware
$ sudo apt-get install -f -y
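
If the conflict on your machine involves a different package, the name to remove can be read from the dpkg error itself. A small sketch, assuming the error has the same three-line shape as the one above; conflicting_pkg is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read a dpkg "trying to overwrite" error on standard
# input and print the name of the package that already owns the file.
conflicting_pkg() {
    sed -n 's/^ *which is also in package \([^ ]*\) .*$/\1/p'
}

conflicting_pkg <<'EOF'
dpkg: error processing archive nvidia-firmware-590-590.48.01_..._amd64.deb (--unpack):
 trying to overwrite '/lib/firmware/nvidia/590.48.01/gsp_ga10x.bin',
 which is also in package nvidia-firmware 590.48.01-0ubuntu1
EOF
# prints: nvidia-firmware
```

Running dpkg -S on the path from the error, here /lib/firmware/nvidia/590.48.01/gsp_ga10x.bin, is another way to confirm which installed package owns the file before removing anything.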

Step 4: Load the module and verify

$ sudo modprobe nvidia
$ lsmod | grep nvidia
nvidia              14770176  0

The module loads! Finally, confirm with nvidia-smi:

$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.48.01              Driver Version: 590.48.01      CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:46:00.0 Off |                  Off |
| 30%   41C    P0             57W /  450W |       1MiB /  24564MiB |      2%      Default |
+-----------------------------------------+------------------------+----------------------+

Conclusion

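With Secure Boot enabled, a DKMS-built NVIDIA module is rejected because it is not signed by a trusted key. There are two ways out:

  1. Disable Secure Boot: the simplest option, but it may raise security concerns
  2. Use Ubuntu's pre-signed modules: the approach taken in this post, which keeps Secure Boot enabled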

If, like me, you do not want to disable Secure Boot, remember to use the linux-modules-nvidia-* packages instead of nvidia-dkms-*.
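
One thing worth keeping in mind: the package name embeds a kernel release, so it seems reasonable to assume each new kernel needs its own matching package. Below is a rough check to run after a kernel upgrade. It assumes the open flavour and a Debian-style system with dpkg-query; has_presigned_modules is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the pre-signed open-module package
# for the given kernel release is installed.
has_presigned_modules() {
    branch="$1"                  # driver branch, e.g. 590
    kernel="${2:-$(uname -r)}"   # defaults to the running kernel
    pkg="linux-modules-nvidia-${branch}-open-${kernel}"

    # Without dpkg-query we cannot tell, so report "not installed".
    command -v dpkg-query >/dev/null 2>&1 || return 1
    status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null || true)
    [ "$status" = "install ok installed" ]
}

if has_presigned_modules 590; then
    echo "pre-signed modules present for $(uname -r)"
else
    echo "no pre-signed modules for $(uname -r); install the matching package"
fi
```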