본문 바로가기
NVIDIA

[NVIDIA] In use by another client(프로세스 충돌)

by Yoon_estar 2024. 10. 18.
728x90
반응형

ERR 로그

# nvidia-smi --gpu-reset
The following GPUs could not be reset:
  GPU 00000000:B8:00.0: In use by another client

1 device is currently being used by one or more other processes (e.g., Fabric Manager, CUDA application, graphics application such as an X server, or a monitoring application such as another instance of nvidia-smi). Please first kill all processes using this device and all compute applications running in the system.
root@clunix:~# nvidia-smi -lgipp
ERROR: Option -lgipp is not recognized. Please run 'nvidia-smi -h'.

# nvidia-smi mig -lgipp
GPU  0 Profile ID 19 Placements: {0,1,2,3,4,5,6}:1
GPU  0 Profile ID 20 Placements: {0,1,2,3,4,5,6}:1
GPU  0 Profile ID 15 Placements: {0,2,4,6}:2
GPU  0 Profile ID 14 Placements: {0,2,4}:2
GPU  0 Profile ID  9 Placements: {0,4}:4
GPU  0 Profile ID  5 Placement : {0}:4
GPU  0 Profile ID  0 Placement : {0}:8

# nvidia-smi mig -lgip
+-----------------------------------------------------------------------------+
| GPU instance profiles:                                                      |
| GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |
|                              Free/Total   GiB              CE    JPEG  OFA  |
|=============================================================================|
|   0  MIG 1g.10gb       19     7/7        9.75       No     14     1     0   |
|                                                             1     1     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.10gb+me    20     1/1        9.75       No     14     1     0   |
|                                                             1     1     1   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.20gb       15     4/4        19.62      No     14     1     0   |
|                                                             1     1     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 2g.20gb       14     3/3        19.62      No     30     2     0   |
|                                                             2     2     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 3g.40gb        9     2/2        39.38      No     46     3     0   |
|                                                             3     3     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 4g.40gb        5     1/1        39.38      No     62     4     0   |
|                                                             4     4     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 7g.80gb        0     1/1        79.12      No     114    7     0   |
|                                                             8     7     1   |
+-----------------------------------------------------------------------------+

# nvidia-smi mig -cgi 19
Unable to create a GPU instance on GPU  0 using profile 19: In use by another client
Failed to create GPU instances: In use by another client

해결

CUDA 드라이버를 설치하고 ~/.bashrc 에 자동으로 환경변수를 잡도록 설정하여서 프로세스 충돌 오류가 발생하였다. 

# nvidia-smi mig -cgi 19
Successfully created GPU instance ID 13 on GPU  0 using profile MIG 1g.10gb (ID 19)
반응형