1. Enable MIG
- Check the current state (MIG is disabled on every GPU)
# nvidia-smi
Wed Jul 31 10:57:26 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 Off | 00000000:19:00.0 Off | 0 |
| N/A 37C P0 115W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 Off | 00000000:2D:00.0 Off | 0 |
| N/A 39C P0 115W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA H100 80GB HBM3 Off | 00000000:3F:00.0 Off | 0 |
| N/A 39C P0 116W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA H100 80GB HBM3 Off | 00000000:66:00.0 Off | 0 |
| N/A 37C P0 118W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA H100 80GB HBM3 Off | 00000000:9B:00.0 Off | 0 |
| N/A 36C P0 116W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA H100 80GB HBM3 Off | 00000000:AE:00.0 Off | 0 |
| N/A 41C P0 128W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA H100 80GB HBM3 Off | 00000000:BF:00.0 Off | 0 |
| N/A 40C P0 123W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA H100 80GB HBM3 Off | 00000000:E4:00.0 Off | 0 |
| N/A 37C P0 116W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
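- Enable MIG on GPU 2, then confirm that it is enabled (the output below also shows GPUs 1, 3, and 4 in MIG mode, which the single command shown here does not set)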
# nvidia-smi -i 2 -mig 1
# nvidia-smi
Wed Jul 31 10:58:40 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 Off | 00000000:19:00.0 Off | 0 |
| N/A 37C P0 115W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 Off | 00000000:2D:00.0 Off | On |
| N/A 38C P0 115W / 700W | 1MiB / 81559MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA H100 80GB HBM3 Off | 00000000:3F:00.0 Off | On |
| N/A 39C P0 116W / 700W | 1MiB / 81559MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA H100 80GB HBM3 Off | 00000000:66:00.0 Off | On |
| N/A 37C P0 118W / 700W | 1MiB / 81559MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA H100 80GB HBM3 Off | 00000000:9B:00.0 Off | On |
| N/A 36C P0 115W / 700W | 1MiB / 81559MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA H100 80GB HBM3 Off | 00000000:AE:00.0 Off | 0 |
| N/A 40C P0 127W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA H100 80GB HBM3 Off | 00000000:BF:00.0 Off | 0 |
| N/A 40C P0 123W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA H100 80GB HBM3 Off | 00000000:E4:00.0 Off | 0 |
| N/A 37C P0 115W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| No MIG devices found |
+-----------------------------------------------------------------------------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
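For scripting, the same check can be done without reading the full table. The following is a minimal sketch, assuming the installed `nvidia-smi` supports the `mig.mode.current` query field; the helper name `mig_enabled_gpus` is hypothetical and not part of any NVIDIA tool.

```shell
# Print the index of every GPU whose current MIG mode is "Enabled".
# Expects CSV lines of the form "index, mode" on stdin.
mig_enabled_gpus() {
    awk -F', *' '$2 == "Enabled" { print $1 }'
}

# On a host with the NVIDIA driver installed, feed it live data.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader | mig_enabled_gpus
fi
```

If a mode change is still waiting for a GPU reset, querying `mig.mode.pending` in the same way should show the requested state rather than the active one.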
2. Create MIG GIs and CIs
- 2.1 Create the GPU instances (GI)
# nvidia-smi mig -i 2 -cgi 5,9
Successfully created GPU instance ID 1 on GPU 2 using profile MIG 4g.40gb (ID 5)
Successfully created GPU instance ID 2 on GPU 2 using profile MIG 3g.40gb (ID 9)
# nvidia-smi mig -lgi
+-------------------------------------------------------+
| GPU instances: |
| GPU Name Profile Instance Placement |
| ID ID Start:Size |
|=======================================================|
| 2 MIG 3g.40gb 9 2 4:4 |
+-------------------------------------------------------+
| 2 MIG 4g.40gb 5 1 0:4 |
+-------------------------------------------------------+
- 2.2 Create the compute instances (CI)
# nvidia-smi mig -i 2 -cci -gi 2,1
Successfully created compute instance ID 0 on GPU 2 GPU instance ID 2 using profile MIG 3g.40gb (ID 2)
Successfully created compute instance ID 0 on GPU 2 GPU instance ID 1 using profile MIG 4g.40gb (ID 3)
# nvidia-smi mig -lci
+--------------------------------------------------------------------+
| Compute instances: |
| GPU GPU Name Profile Instance Placement |
| Instance ID ID Start:Size |
| ID |
|====================================================================|
| 2 2 MIG 3g.40gb 2 0 0:4 |
+--------------------------------------------------------------------+
| 2 1 MIG 4g.40gb 3 0 0:4 |
+--------------------------------------------------------------------+
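Steps 2.1 and 2.2 can also be combined: `nvidia-smi mig -cgi` accepts `-C`, which creates the default compute instance for each new GPU instance. The wrapper below is only a sketch. `create_mig` and the `DRY_RUN` variable are hypothetical names, and the function prints the command instead of running it unless `DRY_RUN=0` is set.

```shell
# Build the command that creates GPU instances plus their default
# compute instances (-C) in one step.
# Usage: create_mig <gpu-index> <comma-separated profile IDs>
create_mig() {
    gpu=$1
    profiles=$2
    set -- nvidia-smi mig -i "$gpu" -cgi "$profiles" -C
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"      # dry run: show the command only
    else
        "$@"           # run it (needs root and MIG mode enabled)
    fi
}

# Same GPU and profile IDs as in step 2.1 (dry run by default).
create_mig 2 5,9
```

Profiles 5 (4g.40gb) and 9 (3g.40gb) together take 4 + 3 = 7 compute slices, which is why the two instances in the listing above sit at placements 0:4 and 4:4 and no further instance fits on GPU 2.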
- 2.3 Verify that the MIG devices were created
# nvidia-smi
...
+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 2 1 0 0 | 51MiB / 40320MiB | 64 0 | 4 0 4 0 4 |
| | 0MiB / 65535MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 2 2 0 1 | 38MiB / 40320MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
...
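`nvidia-smi -L` lists each MIG device together with a UUID, which is useful when a script needs stable device identifiers. The sketch below extracts those UUIDs. It assumes the `MIG-<uuid>` format used by recent drivers; `mig_uuids` is a hypothetical helper, and the sample line in the comment uses a placeholder UUID, not output from this system.

```shell
# Extract MIG device UUIDs from `nvidia-smi -L` style output on stdin.
# Example input line (placeholder UUID):
#   MIG 4g.40gb Device 0: (UUID: MIG-00000000-0000-0000-0000-000000000000)
mig_uuids() {
    grep -oE 'MIG-[0-9a-fA-F-]{36}'
}

# On a host with the NVIDIA driver installed, list the real UUIDs.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L | mig_uuids
fi
```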
3. DOCKER
- Add the Docker and nvidia-docker repositories, then install Docker and nvidia-docker
# dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
# dnf -y install containerd.io
# dnf -y install docker-ce
# curl https://nvidia.github.io/nvidia-docker/rhel8.0/nvidia-docker.repo > /etc/yum.repos.d/nvidia-docker.repo
# dnf -y install nvidia-docker2
# systemctl restart docker
# systemctl status docker
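After the restart, it is worth confirming that Docker has registered the NVIDIA runtime. The sketch below assumes `nvidia-docker2` registered a runtime named `nvidia` (its usual default); `has_nvidia_runtime` is a hypothetical helper.

```shell
# Succeed if the runtime list on stdin contains an entry named "nvidia".
has_nvidia_runtime() {
    grep -q '"nvidia"'
}

# On a host with Docker installed, inspect the registered runtimes.
if command -v docker >/dev/null 2>&1; then
    if docker info --format '{{json .Runtimes}}' 2>/dev/null | has_nvidia_runtime; then
        echo "nvidia runtime registered"
    else
        echo "nvidia runtime not found"
    fi
fi
```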
4. Submit jobs
- 4.1 Submit a PyTorch job
# docker run --gpus '"device=2:0"' nvcr.io/nvidia/pytorch:21.12-py3 /bin/bash -c "cd /opt/pytorch/examples/upstream/mnist && python main.py"
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
463d966bb1f1 nvcr.io/nvidia/pytorch:21.12-py3 "/opt/nvidia/nvidia_…" 12 minutes ago Up 12 minutes 6006/tcp, 8888/tcp cranky_benz
# nvidia-smi
.....
+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 2 1 0 0 | 724MiB / 40320MiB | 64 0 | 4 0 4 0 4 |
| | 3MiB / 65535MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 2 2 0 1 | 38MiB / 40320MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 2 1 0 743978 C python 664MiB |
+-----------------------------------------------------------------------------------------+
.....
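In `--gpus '"device=2:0"'`, the value before the colon is the GPU index and the value after it is the MIG device index from the `MIG Dev` column, so `2:0` selects the 4g.40gb instance and `2:1` the 3g.40gb instance. The sketch below builds that argument; `mig_gpus_arg` is a hypothetical helper, and the final line only prints a command rather than running it.

```shell
# Print the value for docker's --gpus option that selects one MIG device.
# Usage: mig_gpus_arg <gpu-index> <mig-device-index>
mig_gpus_arg() {
    printf '"device=%s:%s"\n' "$1" "$2"
}

# Show the full command for MIG device 1 on GPU 2 without running it.
echo docker run --rm --gpus "'$(mig_gpus_arg 2 1)'" nvcr.io/nvidia/pytorch:21.12-py3 nvidia-smi -L
```

Running the printed command should list only the selected MIG device inside the container, which is a quick way to check the mapping before submitting a longer job.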
- 4.2 Add a second job and verify
# docker run --gpus '"device=2:1"' nvcr.io/nvidia/pytorch:21.12-py3 /bin/bash -c "cd /opt/pytorch/examples/upstream/mnist && python main.py"
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ae6d21ebdd37 nvcr.io/nvidia/pytorch:21.12-py3 "/opt/nvidia/nvidia_…" 6 minutes ago Up 6 minutes 6006/tcp, 8888/tcp upbeat_ganguly
463d966bb1f1 nvcr.io/nvidia/pytorch:21.12-py3 "/opt/nvidia/nvidia_…" 24 minutes ago Up 24 minutes 6006/tcp, 8888/tcp cranky_benz
# nvidia-smi
...
+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 2 1 0 0 | 890MiB / 40320MiB | 64 0 | 4 0 4 0 4 |
| | 3MiB / 65535MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 2 2 0 1 | 295MiB / 40320MiB | 60 0 | 3 0 3 0 3 |
| | 3MiB / 65535MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 2 1 0 743978 C python 830MiB |
| 2 2 0 755998 C python 248MiB |
+-----------------------------------------------------------------------------------------+