Error log
# nvidia-smi --gpu-reset
The following GPUs could not be reset:
GPU 00000000:B8:00.0: In use by another client
1 device is currently being used by one or more other processes (e.g., Fabric Manager, CUDA application, graphics application such as an X server, or a monitoring application such as another instance of nvidia-smi). Please first kill all processes using this device and all compute applications running in the system.
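The message asks you to stop every process that has the device open before retrying. Standard tools such as `fuser -v /dev/nvidia*` or `lsof /dev/nvidia*` answer that directly. The sketch below does the same thing without extra packages by scanning the `/proc/<pid>/fd` symlinks. `list_nvidia_holders` is a hypothetical helper of my own, not an NVIDIA tool, and it has to run as root to see other users' processes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every process that holds an NVIDIA device node
# (/dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm, ...) open, as "PID COMMAND".
# It reads /proc/<pid>/fd symlinks, so run it as root to see all processes.
list_nvidia_holders() {
  local proc_root="${1:-/proc}"   # optional override, handy for testing
  local fd pid target
  for fd in "$proc_root"/[0-9]*/fd/*; do
    # Skip fds that vanished or that we are not allowed to read.
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
      /dev/nvidia*)
        pid=${fd#"$proc_root"/}   # strip the proc prefix -> "PID/fd/N"
        pid=${pid%%/*}            # keep only the PID part
        printf '%s %s\n' "$pid" "$(cat "$proc_root/$pid/comm" 2>/dev/null)"
        ;;
    esac
  done | sort -u                  # one line per process, not per open fd
}
```

Run `list_nvidia_holders` and stop whatever it reports (for example a monitoring agent or a leftover CUDA job) before retrying `--gpu-reset` or `mig -cgi`.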
root@clunix:~# nvidia-smi -lgipp
ERROR: Option -lgipp is not recognized. Please run 'nvidia-smi -h'.
# nvidia-smi mig -lgipp
GPU 0 Profile ID 19 Placements: {0,1,2,3,4,5,6}:1
GPU 0 Profile ID 20 Placements: {0,1,2,3,4,5,6}:1
GPU 0 Profile ID 15 Placements: {0,2,4,6}:2
GPU 0 Profile ID 14 Placements: {0,2,4}:2
GPU 0 Profile ID 9 Placements: {0,4}:4
GPU 0 Profile ID 5 Placement : {0}:4
GPU 0 Profile ID 0 Placement : {0}:8
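`-lgipp` is only recognized under the `mig` subcommand, which is why the first attempt failed. Each output line lists, per profile, the slice indices where an instance may start and, after the colon, the instance size (my reading: in memory slices, since 7g.80gb takes all 8). The hypothetical `parse_placements` helper below turns those lines into key=value pairs, assuming the exact line format shown above, including the singular `Placement :` form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvidia-smi mig -lgipp` output on stdin and print
# "gpu=N profile=N starts=a,b,... size=N" for each placement line.
# "Placements\{0,1\}" accepts both "Placements:" and "Placement :".
parse_placements() {
  sed -n 's/^GPU \([0-9]*\) Profile ID \([0-9]*\) Placements\{0,1\} *: *{\([0-9,]*\)}:\([0-9]*\) *$/gpu=\1 profile=\2 starts=\3 size=\4/p'
}
```

Usage: `nvidia-smi mig -lgipp | parse_placements`.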
# nvidia-smi mig -lgip
+-----------------------------------------------------------------------------+
| GPU instance profiles:                                                      |
| GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |
|                              Free/Total   GiB              CE    JPEG  OFA  |
|=============================================================================|
|   0  MIG 1g.10gb       19     7/7        9.75       No     14     1     0   |
|                                                             1     1     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.10gb+me    20     1/1        9.75       No     14     1     0   |
|                                                             1     1     1   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.20gb       15     4/4        19.62      No     14     1     0   |
|                                                             1     1     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 2g.20gb       14     3/3        19.62      No     30     2     0   |
|                                                             2     2     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 3g.40gb        9     2/2        39.38      No     46     3     0   |
|                                                             3     3     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 4g.40gb        5     1/1        39.38      No     62     4     0   |
|                                                             4     4     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 7g.80gb        0     1/1        79.12      No    114     7     0   |
|                                                             8     7     1   |
+-----------------------------------------------------------------------------+
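Before running `-cgi`, the Free/Total column shows whether a profile still has room. A small awk filter can pull that out per profile ID. This is a sketch that assumes the `-lgip` layout above (fields split on whitespace, profile name in the fourth field, profile ID in the fifth). `free_instances` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvidia-smi mig -lgip` output on stdin and print
# "<profile name> <Free/Total>" for the given GPU instance profile ID.
# Only the first row of each profile starts with "| <gpu> MIG ...", so the
# second (CE/JPEG/OFA) row is ignored automatically.
free_instances() {
  awk -v id="$1" '$3 == "MIG" && $5 == id { print $4, $6 }'
}
```

Usage: `nvidia-smi mig -lgip | free_instances 19`. With the table above, this prints `1g.10gb 7/7`.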
# nvidia-smi mig -cgi 19
Unable to create a GPU instance on GPU 0 using profile 19: In use by another client
Failed to create GPU instances: In use by another client
Solution
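The process conflict occurred because, after installing the CUDA driver, I had configured ~/.bashrc to set the CUDA environment variables automatically. Once that conflict was cleared, creating the GPU instance succeeded: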
# nvidia-smi mig -cgi 19
Successfully created GPU instance ID 13 on GPU 0 using profile MIG 1g.10gb (ID 19)
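If you hit the same error, it can help to review what your shell startup file does with CUDA or NVIDIA. The hypothetical helper below simply prints the matching lines with their line numbers. It does not claim to know which specific line caused the conflict in the setup described here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print numbered lines of a shell startup file
# (default: ~/.bashrc) that mention CUDA or NVIDIA, case-insensitively.
show_gpu_startup_lines() {
  local file="${1:-$HOME/.bashrc}"
  grep -n -i -E 'cuda|nvidia' "$file"
}
```

Usage: `show_gpu_startup_lines` or `show_gpu_startup_lines /etc/profile`.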