Original: https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html
Authors: Sung Kim, Jenny Kang
Translator: bat67
Proofreaders: FontTian, 片刻, yearing1017
In this tutorial, we will learn how to use multiple GPUs with data parallelism (`DataParallel`).
PyTorch makes it very easy to use a GPU. You can put a model on a GPU like this:

```python
device = torch.device("cuda:0")
model.to(device)
```
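As a quick check of the device string used above, the following sketch parses a few device specifications. It assumes only that PyTorch is installed. Creating a `torch.device` only describes a location and does not allocate anything on a GPU, so it also runs on CPU-only machines.

```python
import torch

# Building a torch.device only records a location; it does not touch the GPU,
# so this runs on CPU-only machines too.
gpu0 = torch.device("cuda:0")          # note: no space after the colon
print(gpu0.type, gpu0.index)           # cuda 0

# Without an index, the device refers to the current CUDA device.
any_gpu = torch.device("cuda")
print(any_gpu.type, any_gpu.index)     # cuda None

# The (type, index) constructor is equivalent to the string form.
print(torch.device("cuda", 0) == gpu0)  # True
```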
Then you can copy all your tensors to the GPU:

```python
mytensor = my_tensor.to(device)
```
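Note that calling `my_tensor.to(device)` returns a new copy of `my_tensor` on the GPU rather than modifying `my_tensor` in place. You need to assign the result to a new variable and use that tensor on the GPU.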
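You can observe the copy-versus-in-place distinction without a GPU. This sketch assumes only that PyTorch is installed. It uses a dtype conversion as a stand-in for a device move, because `Tensor.to` follows the same rule for both. It also contrasts `Tensor.to` with `nn.Module.to`, which converts the module in place and returns the module itself. That is why `model.to(device)` above needs no reassignment.

```python
import torch
import torch.nn as nn

# Tensor.to() is not in-place: it returns a new tensor and leaves the original untouched.
# A dtype change stands in for a device move so this runs without a GPU.
my_tensor = torch.ones(3)                  # float32 by default
converted = my_tensor.to(torch.float64)
print(my_tensor.dtype, converted.dtype)    # torch.float32 torch.float64

# nn.Module.to() converts the module's parameters in place and returns the module itself.
model = nn.Linear(5, 2)
returned = model.to(torch.float64)
print(returned is model, model.weight.dtype)  # True torch.float64
```

One caveat: if a tensor already has the requested dtype and device, `Tensor.to` returns that same tensor rather than a copy.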
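Running forward and backward passes on multiple GPUs is natural. However, PyTorch uses only one GPU by default. You can easily run your operations on multiple GPUs by making your model run in parallel with `DataParallel`:

```python
model = nn.DataParallel(model)
```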
This is the core idea behind this tutorial, and we will go through it in more detail below.
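`nn.DataParallel` is a wrapper, so the original model stays accessible and the parameter names change. The following sketch shows this on whatever hardware is available. It assumes a recent PyTorch release, in which the wrapper simply calls the wrapped module when no CUDA device is visible. The layer sizes (5 inputs, 2 outputs, a batch of 30) are borrowed from the parameters used later in this tutorial.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

base = nn.Linear(5, 2)
model = nn.DataParallel(base)   # wraps `base`; does not copy it
model.to(device)

# The wrapped module stays reachable through `.module`.
print(model.module is base)                    # True

# Parameter names in the wrapper's state_dict gain a "module." prefix.
print(list(model.state_dict().keys()))         # ['module.weight', 'module.bias']

# The inner module's state_dict keeps the unprefixed names.
print(list(model.module.state_dict().keys()))  # ['weight', 'bias']

# A forward pass looks the same whether zero, one, or several GPUs are visible.
out = model(torch.randn(30, 5, device=device))
print(out.shape)                               # torch.Size([30, 2])
```

The `module.` prefix is the usual reason a checkpoint saved from a wrapped model fails to load into an unwrapped one. Saving `model.module.state_dict()` avoids the problem.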
Import the PyTorch modules and define the parameters.

```python
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Parameters and DataLoaders
input_size = 5
output_size = 2

batch_size = 30
data_size = 100
```
Device:

```python
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```
To make a dummy (random) dataset, you only need to implement `__getitem__` and `__len__`.
```python
class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)
```
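With `data_size = 100` and `batch_size = 30`, the loader cannot divide the data evenly. `DataLoader` keeps the leftover samples as a final, smaller batch unless you pass `drop_last=True` (the default is `False`). The helper `batch_sizes` below is a hypothetical, plain-Python sketch of that arithmetic, not part of PyTorch. It accounts for the batch of 10 at the end of each output listing in this tutorial.

```python
def batch_sizes(data_size, batch_size, drop_last=False):
    """Hypothetical helper: sizes of the batches a DataLoader would yield.

    Mirrors DataLoader's default (drop_last=False): full batches first, then one
    smaller batch with the leftover samples unless drop_last is True.
    """
    full, remainder = divmod(data_size, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


print(batch_sizes(100, 30))                  # [30, 30, 30, 10]
print(batch_sizes(100, 30, drop_last=True))  # [30, 30, 30]
```

Shuffling changes which samples land in each batch but not the batch sizes.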
For the demo, our model just takes an input, performs a linear operation, and returns the output. However, you can use `DataParallel` on any model (CNN, RNN, Capsule Net, etc.).
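We have placed a print statement inside the model to monitor the sizes of the input and output tensors. Pay attention to the first (batch) dimension of the printed sizes.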
```python
class Model(nn.Module):
    # Our model

    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("\tIn Model: input size", input.size(),
              "output size", output.size())

        return output
```
This is the core part of the tutorial. First, we create a model instance and check whether we have multiple GPUs. If we do, we wrap the model with `nn.DataParallel`. Then we put the model on the GPU with `model.to(device)`.
```python
model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)

model.to(device)
```
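The comment in the code above hints at how each batch is split along dimension 0. In the output listings in this tutorial, the per-GPU sizes match the rule used by `torch.chunk`. Each piece gets `ceil(batch / num_gpus)` rows and the last piece takes the remainder. When the batch is small, there are fewer pieces than GPUs. `scatter_sizes` is a hypothetical plain-Python sketch of that rule, checked against the sizes printed for 2, 3, and 8 GPUs.

```python
import math


def scatter_sizes(batch, num_gpus):
    """Hypothetical helper: rows each GPU receives when a batch is split along dim 0."""
    if batch == 0:
        return []
    step = math.ceil(batch / num_gpus)  # size of every chunk except possibly the last
    return [min(step, batch - start) for start in range(0, batch, step)]


for gpus in (2, 3, 8):
    # 30 = a full batch, 10 = the final partial batch from the DataLoader
    print(gpus, "GPUs:", scatter_sizes(30, gpus), scatter_sizes(10, gpus))
# 2 GPUs: [15, 15] [5, 5]
# 3 GPUs: [10, 10, 10] [4, 4, 2]
# 8 GPUs: [4, 4, 4, 4, 4, 4, 4, 2] [2, 2, 2, 2, 2]
```

For example, the final batch of 10 rows on 8 GPUs yields only five pieces of 2 rows, so only five replicas do any work for that batch. The `In Model` lines in the listings may also appear out of device order, because the replicas run concurrently.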
Output:
Let's use 2 GPUs!
Now we can look at the sizes of the input and output tensors.
```python
for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Outside: input size", input.size(),
          "output_size", output.size())
```
Output:
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
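If you have no GPU or only one GPU, a batch of 30 inputs reaches the model whole, and the model produces 30 outputs, as expected. With multiple GPUs, however, you get results like the following.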
With 2 GPUs, you will see:
Let's use 2 GPUs!
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
With 3 GPUs, you will see:
Let's use 3 GPUs!
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
With 8 GPUs, you will see:
Let's use 8 GPUs!
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
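`DataParallel` automatically splits your data along the batch dimension and sends the work to model replicas on multiple GPUs. After each replica finishes its work, `DataParallel` collects and merges the results before returning them to you.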
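The scatter, parallel-apply, and gather steps described above can be modelled in plain Python. This is a conceptual sketch only. The helpers are hypothetical stand-ins named after the stages in `torch.nn.parallel`, a thread pool plays the role of the GPUs, and lists of numbers play the role of tensor rows. The real `DataParallel` additionally replicates the module onto each device and gathers the outputs onto a single output device.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def scatter(batch, num_replicas):
    """Split a list of rows into consecutive chunks with torch.chunk-style sizes."""
    if not batch:
        return []
    step = math.ceil(len(batch) / num_replicas)
    return [batch[i:i + step] for i in range(0, len(batch), step)]


def parallel_apply(fn, chunks):
    """Run `fn` on every chunk concurrently, one worker thread per chunk."""
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # pool.map returns results in input order, whatever order the threads finish in.
        return list(pool.map(fn, chunks))


def gather(outputs):
    """Concatenate the per-replica outputs back into a single batch."""
    return [row for chunk in outputs for row in chunk]


def data_parallel_forward(fn, batch, num_replicas):
    chunks = scatter(batch, num_replicas)
    if not chunks:
        return []
    return gather(parallel_apply(fn, chunks))


def toy_model(chunk):
    """Stand-in for the model: doubles each row and reports the chunk size it saw."""
    print("\tIn Model: input size", len(chunk))
    return [2 * x for x in chunk]


batch = list(range(10))
result = data_parallel_forward(toy_model, batch, num_replicas=3)
print("Outside: input size", len(batch), "output size", len(result))
```

As in the listings above, the `In Model` lines report the per-chunk sizes (4, 4, and 2 here, possibly interleaved), while the caller sees one batch of the original size in the original order.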
For more information, see: https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html