Querying GPU Information and Troubleshooting on Linux
2026/5/4 8:58:24

Check the installed graphics cards

lspci | grep -i vga
lspci | grep -E "VGA|3D|Display"
Output:
04:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
4b:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
65:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
To identify the model, look up the device ID (2204 here) on a PCI ID lookup site.
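When lspci cannot name a card it prints only "Device" followed by a four-digit hex ID, as in the output above. The sketch below pulls those IDs out so they can be looked up. The function name unnamed_device_ids is hypothetical, and the pattern assumes the "Device XXXX" wording shown here.

```shell
# Print the unique device IDs that lspci shows as an unnamed "Device XXXX".
# stdin: lspci output (one device per line)
unnamed_device_ids() {
    sed -n 's/.* Device \([0-9a-fA-F]\{4\}\).*/\1/p' | sort -u
}

# Usage on a live system:
#   lspci | grep -E "VGA|3D|Display" | unnamed_device_ids
```

Running lspci -nn instead prints the numeric [vendor:device] pair next to each entry, which may be enough on its own.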
More commands
Find the physical position of each GPU

nvidia-smi -q | grep -E "GPU 0000|Product Name|Bus Id"
GPU 00000000:4B:00.0
Product Name: *****
Bus Id: 00000000:4B:00.0
GPU 00000000:B1:00.0
Product Name: *****
Bus Id: 00000000:B1:00.0
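The same fields can be condensed to one line per GPU. This is a minimal sketch: the function name gpu_bus_and_name is hypothetical, and it assumes that "Product Name" appears before "Bus Id" within each GPU block, as in the output above.

```shell
# Print one "<bus-id><TAB><product name>" line per GPU.
# stdin: output of `nvidia-smi -q`
gpu_bus_and_name() {
    awk '
        # Strip everything up to and including the first colon, keep the value.
        /Product Name/ { sub(/^[^:]*:[ \t]*/, ""); name = $0 }
        /Bus Id/       { sub(/^[^:]*:[ \t]*/, ""); print $0 "\t" name }
    '
}

# Usage on a live system:
#   nvidia-smi -q | gpu_bus_and_name
```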

List all slot bus addresses

sudo dmidecode -t slot | grep -E "Designation|Bus Address"
Designation: OCPA_CPU0_NVME0
Bus Address: 0000:17:00.0
Designation: OCPA_CPU0_NVME1
Bus Address: 0000:18:00.0
Designation: PCIE1_CPU0_SLOT0
Bus Address: 0000:4b:00.0
Designation: SLIM0_CPU0
Bus Address: 0000:65:00.0
Designation: PCIE0_CPU1_SLOT1
Bus Address: 0000:b1:00.0
Designation: SLIM0_CPU1
Bus Address: 0000:e3:00.0

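Matching the nvidia-smi bus IDs against these bus addresses gives PCIE1_CPU0_SLOT0 for 00000000:4B:00.0 and PCIE0_CPU1_SLOT1 for 00000000:B1:00.0. On the Inspur NF5280M6, 00000000:4B:00.0 is the slot on the left, and 00000000:B1:00.0 is the slot on the right or in the middle.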
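The matching can be scripted. The sketch below is one possible approach, not a definitive tool: the function name slot_for_bus_id is hypothetical, and it assumes dmidecode prints each Designation line before its Bus Address line. Because nvidia-smi prints an 8-digit domain in upper case and dmidecode a 4-digit domain in lower case, both sides are reduced to lower-case bus:device.function before comparing.

```shell
# Print the slot designation whose bus address matches a GPU bus ID.
# $1:    bus ID as printed by nvidia-smi, e.g. 00000000:4B:00.0
# stdin: output of `dmidecode -t slot`
slot_for_bus_id() {
    # Lower-case the ID and drop the leading PCI domain.
    want=$(printf '%s\n' "$1" | tr 'A-F' 'a-f' | sed 's/^[0-9a-f]*://')
    awk -v want="$want" '
        /Designation:/ { sub(/^.*Designation:[ \t]*/, ""); name = $0 }
        /Bus Address:/ {
            sub(/^.*Bus Address:[ \t]*/, "")
            addr = tolower($0)
            sub(/[ \t]+$/, "", addr)       # trim trailing whitespace
            sub(/^[0-9a-f]*:/, "", addr)   # drop the PCI domain
            if (addr == want) print name
        }
    '
}

# Usage on a live system:
#   sudo dmidecode -t slot | slot_for_bus_id 00000000:4B:00.0
```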

Show error messages

dmesg | grep -i "XID"
[357.496185] NVRM: Xid (PCI:0000:b1:00): 79, GPU has fallen off the bus.
[357.497065] NVRM: Xid (PCI:0000:4b:00): 154, GPU recovery action changed from 0x0 (None) to 0x2 (Node Reboot Required)
[357.497078] NVRM: Xid (PCI:0000:b1:00): 154, GPU recovery action changed from 0x0 (None) to 0x2 (Node Reboot Required)
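For longer logs it can help to reduce each Xid line to the PCI address and the code. This is a minimal sketch under the assumption that the lines follow the "NVRM: Xid (PCI:address): code," layout shown above; the function name xid_events is hypothetical.

```shell
# Print "<pci-address> <xid-code>" for every NVRM Xid line.
# stdin: kernel log text, e.g. the output of dmesg
xid_events() {
    sed -n 's/.*NVRM: Xid *(PCI:\([0-9a-fA-F:.]*\)): *\([0-9]*\),.*/\1 \2/p'
}

# Usage on a live system:
#   dmesg | xid_events
```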

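Xid errors table
Common errors

Xid code | Meaning | Severity | Typical causes
32 | Invalid or corrupted push buffer stream | - | Driver bug, GPU memory overflow
43 | GPU execution timeout | - | Deadlocked compute job, poor cooling
74 | NVLink error | Fatal | Hardware damage, firmware fault
79 | GPU has fallen off the bus | Fatal | Power fault, poor contact in the PCIe slot
48 | Double-bit ECC error | - | Uncorrectable GPU memory fault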
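The severity column can be turned into a quick check. The sketch below only encodes the two codes this table marks as fatal (74 and 79); it is not a complete classification of Xid codes, and the function name is_fatal_xid is hypothetical.

```shell
# Succeed (exit status 0) if the table above marks the Xid code as fatal.
# $1: numeric Xid code
is_fatal_xid() {
    case "$1" in
        74|79) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage:
#   if is_fatal_xid 79; then echo "fatal: check hardware"; fi
```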

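Common repair methods

For Xid (PCI:0000:b1:00): 79, GPU has fallen off the bus.: swap the two GPUs between their slots, boot the server, and run the GPUs as usual. If a card drops off again, check the log with dmesg | grep -i "XID". If the error is still reported as Xid (PCI:0000:b1:00): 79, GPU has fallen off the bus., the fault stayed with the slot, so the PCIe slot or the power cable is at fault. If it is now reported as Xid (PCI:0000:4b:00): 79, GPU has fallen off the bus., the fault followed the card, so the GPU itself is at fault.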
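The decision rule of the swap test can be written down as a small helper. This is a sketch, not a diagnostic tool: the function name diagnose_swap is hypothetical, both addresses are assumed to come from dmesg in the same format, and the "inconclusive" branch for a missing second error is an added assumption that the text does not cover.

```shell
# Interpret the swap test for Xid 79.
# $1: PCI address in the Xid 79 line before the swap, e.g. 0000:b1:00
# $2: PCI address in the Xid 79 line after the swap (empty if none appeared)
diagnose_swap() {
    before=$1
    after=$2
    if [ -z "$after" ]; then
        echo "no Xid 79 after the swap: inconclusive, keep monitoring"
    elif [ "$before" = "$after" ]; then
        echo "fault stayed at $after: suspect the PCIe slot or power cable"
    else
        echo "fault moved to $after: suspect the GPU card itself"
    fi
}

# Usage:
#   diagnose_swap 0000:b1:00 0000:4b:00
```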
