728x90
반응형
MIG 활성화하기
활성화 전
# nvidia-smi
Fri Oct 18 10:04:07 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 PCIe Off | 00000000:B8:00.0 Off | 0 |
| N/A 57C P0 57W / 350W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
활성화
# nvidia-smi -mig 1
활성화 후
# nvidia-smi
Fri Oct 18 10:03:47 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 PCIe Off | 00000000:B8:00.0 Off | On |
| N/A 58C P0 88W / 350W | 0MiB / 81559MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| No MIG devices found |
+-----------------------------------------------------------------------------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
GPU 리셋(활성 / 비활성 시 gpu 리셋)
nvidia-smi --gpu-reset
GI 생성 후 GI 조회
# nvidia-smi mig -cgi 19
Successfully created GPU instance ID 13 on GPU 0 using profile MIG 1g.10gb (ID 19)
# nvidia-smi mig -lgi
+-------------------------------------------------------+
| GPU instances: |
| GPU Name Profile Instance Placement |
| ID ID Start:Size |
|=======================================================|
| 0 MIG 1g.10gb 19 13 6:1 |
+-------------------------------------------------------+
CI 생성 후 CI 조회
# nvidia-smi mig -cci -gi 13
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 13 using profile MIG 1g.10gb (ID 0)
# nvidia-smi mig -lci
+--------------------------------------------------------------------+
| Compute instances: |
| GPU GPU Name Profile Instance Placement |
| Instance ID ID Start:Size |
| ID |
|====================================================================|
| 0 13 MIG 1g.10gb 0 0 0:1 |
+--------------------------------------------------------------------+
MIG 생성 확인
# nvidia-smi
Fri Oct 18 10:10:55 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 PCIe Off | 00000000:B8:00.0 Off | On |
| N/A 54C P0 55W / 350W | 12MiB / 81559MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 0 13 0 0 | 12MiB / 9984MiB | 14 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
MIG 의 UUID 확인
# nvidia-smi -L
GPU 0: NVIDIA H100 PCIe (UUID: GPU-c48bbfa5-9a67-255a-a6a3-8fecf44951b4)
MIG 1g.10gb Device 0: (UUID: MIG-d56fa0ad-7e21-56d6-aee3-fa508d0e6fc9)
도커 실행
# docker run --gpus '"device=MIG-d56fa0ad-7e21-56d6-aee3-fa508d0e6fc9"' -it --rm tensorflow/tensorflow:latest-gpu
________ _______________
___ __/__________________________________ ____/__ /________ __
__ / _ _ \_ __ \_ ___/ __ \_ ___/_ /_ __ /_ __ \_ | /| / /
_ / / __/ / / /(__ )/ /_/ / / _ __/ _ / / /_/ /_ |/ |/ /
/_/ \___//_/ /_//____/ \____//_/ /_/ /_/ \____/____/|__/
WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.
To avoid this, run the container by specifying your user's userid:
$ docker run -u $(id -u):$(id -g) args...
root@fc4dbfe09ff8:/# nvidia-smi
Fri Oct 18 01:28:02 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 PCIe Off | 00000000:B8:00.0 Off | On |
| N/A 53C P0 54W / 350W | N/A | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 0 13 0 0 | 12MiB / 9984MiB | 14 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
반응형
'Docker' 카테고리의 다른 글
[Docker] GPU 컨테이너 생성 (1) | 2024.10.23 |
---|---|
[Docker] Podman (3) | 2024.10.11 |
[Docker] 서로 다른 서버의 컨테이너 간 통신 (3) | 2024.10.01 |
[Docker] 도커 네트워크 bridge 설정 (0) | 2024.10.01 |
[Docker] ceth, veth 각각 연결 vs 페어 연결 (0) | 2024.10.01 |