(experimental) Static Quantization with Eager Mode in PyTorch

Original: https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html

Author: Raghuraman Krishnamoorthi

Edited by: Seth Weidman

This tutorial shows how to do post-training static quantization, as well as illustrating two more advanced techniques, per-channel quantization and quantization-aware training, that can further improve the model's accuracy. Note that quantization is currently only supported for CPUs, so we will not be using GPUs / CUDA in this tutorial.

By the end of this tutorial, you will see how quantization in PyTorch can result in significant decreases in model size while increasing speed. Furthermore, you'll see how to easily apply some of the advanced quantization techniques shown here, so that your quantized models take much less of an accuracy hit than they otherwise would.

Warning: we use a lot of boilerplate code from other PyTorch repos to, for example, define the MobileNetV2 model architecture and define the data loaders. We of course encourage you to read it; but if you want to get to the quantization features, feel free to skip to the "4. Post-training static quantization" section.

We'll start by doing the necessary imports:

import numpy as np
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader
from torchvision import datasets
import torchvision.transforms as transforms
import os
import time
import sys
import torch.quantization


# Setup warnings
import warnings
warnings.filterwarnings(
    action='ignore',
    category=DeprecationWarning,
    module=r'.*'
)
warnings.filterwarnings(
    action='default',
    module=r'torch.quantization'
)


# Specify random seed for repeatable results
torch.manual_seed(191009)

1. Model architecture

We first define the MobileNetV2 model architecture, with several notable modifications to enable quantization:

  • Replacing addition with nn.quantized.FloatFunctional
  • Inserting QuantStub and DeQuantStub at the beginning and end of the network.
  • Replacing ReLU6 with ReLU
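
To make these modifications concrete, here is a minimal toy module (an illustrative sketch of ours, not part of the tutorial's code) showing all three changes in isolation:

import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub


class TinyQuantizableNet(nn.Module):
    def __init__(self):
        super(TinyQuantizableNet, self).__init__()
        self.quant = QuantStub()        # marks where float input becomes quantized
        self.conv = nn.Conv2d(3, 3, 1)
        self.relu = nn.ReLU()           # ReLU instead of ReLU6
        self.skip_add = nn.quantized.FloatFunctional()  # replaces the '+' operator
        self.dequant = DeQuantStub()    # marks where output becomes float again

    def forward(self, x):
        x = self.quant(x)
        x = self.skip_add.add(x, self.relu(self.conv(x)))
        return self.dequant(x)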

Note: this code is taken from here.

from torch.quantization import QuantStub, DeQuantStub


def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    :param v:
    :param divisor:
    :param min_value:
    :return:
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v
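
# Quick sanity check of _make_divisible (illustrative values of ours):
#   _make_divisible(32 * 0.75, 8) -> 24  (already a multiple of 8, unchanged)
#   _make_divisible(10, 8)        -> 16  (the 10% guard rounds up rather than down to 8)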


class ConvBNReLU(nn.Sequential):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_planes, momentum=0.1),
            # Replace with ReLU
            nn.ReLU(inplace=False)
        )


class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]


        hidden_dim = int(round(inp * expand_ratio))
        self.use_res_connect = self.stride == 1 and inp == oup


        layers = []
        if expand_ratio != 1:
            # pw
            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1))
        layers.extend([
            # dw
            ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim),
            # pw-linear
            nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
            nn.BatchNorm2d(oup, momentum=0.1),
        ])
        self.conv = nn.Sequential(*layers)
        # Replace torch.add with floatfunctional
        self.skip_add = nn.quantized.FloatFunctional()


    def forward(self, x):
        if self.use_res_connect:
            return self.skip_add.add(x, self.conv(x))
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, width_mult=1.0, inverted_residual_setting=None, round_nearest=8):
        """
        MobileNet V2 main class


        Args:
            num_classes (int): Number of classes
            width_mult (float): Width multiplier - adjusts number of channels in each layer by this amount
            inverted_residual_setting: Network structure
            round_nearest (int): Round the number of channels in each layer to be a multiple of this number
            Set to 1 to turn off rounding
        """
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = 32
        last_channel = 1280


        if inverted_residual_setting is None:
            inverted_residual_setting = [
                # t, c, n, s
                [1, 16, 1, 1],
                [6, 24, 2, 2],
                [6, 32, 3, 2],
                [6, 64, 4, 2],
                [6, 96, 3, 1],
                [6, 160, 3, 2],
                [6, 320, 1, 1],
            ]


        # only check the first element, assuming user knows t,c,n,s are required
        if len(inverted_residual_setting) == 0 or len(inverted_residual_setting[0]) != 4:
            raise ValueError("inverted_residual_setting should be non-empty "
                             "or a 4-element list, got {}".format(inverted_residual_setting))


        # building first layer
        input_channel = _make_divisible(input_channel * width_mult, round_nearest)
        self.last_channel = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)
        features = [ConvBNReLU(3, input_channel, stride=2)]
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * width_mult, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        features.append(ConvBNReLU(input_channel, self.last_channel, kernel_size=1))
        # make it nn.Sequential
        self.features = nn.Sequential(*features)
        self.quant = QuantStub()
        self.dequant = DeQuantStub()
        # building classifier
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(self.last_channel, num_classes),
        )


        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)


    def forward(self, x):


        x = self.quant(x)


        x = self.features(x)
        x = x.mean([2, 3])
        x = self.classifier(x)
        x = self.dequant(x)
        return x


    # Fuse Conv+BN and Conv+BN+Relu modules prior to quantization
    # This operation does not change the numerics
    def fuse_model(self):
        for m in self.modules():
            if type(m) == ConvBNReLU:
                torch.quantization.fuse_modules(m, ['0', '1', '2'], inplace=True)
            if type(m) == InvertedResidual:
                for idx in range(len(m.conv)):
                    if type(m.conv[idx]) == nn.Conv2d:
                        torch.quantization.fuse_modules(m.conv, [str(idx), str(idx + 1)], inplace=True)

2. Helper functions

Next, we define several helper functions to help with model evaluation. These mostly come from here.

class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self, name, fmt=':f'):
        self.name = name
        self.fmt = fmt
        self.reset()


    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0


    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


    def __str__(self):
        fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
        return fmtstr.format(**self.__dict__)


def accuracy(output, target, topk=(1,)):
    """Computes the accuracy over the k top predictions for the specified values of k"""
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)


        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))


        res = []
        for k in topk:
            correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res
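
# Example (our note, hypothetical shapes): with output of shape (batch, 1000) and
# integer target of shape (batch,), accuracy(output, target, topk=(1, 5)) returns
# the batch's top-1 and top-5 accuracies, each as a percentage tensor.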


def evaluate(model, criterion, data_loader, neval_batches):
    model.eval()
    top1 = AverageMeter('Acc@1', ':6.2f')
    top5 = AverageMeter('Acc@5', ':6.2f')
    cnt = 0
    with torch.no_grad():
        for image, target in data_loader:
            output = model(image)
            loss = criterion(output, target)
            cnt += 1
            acc1, acc5 = accuracy(output, target, topk=(1, 5))
            print('.', end = '')
            top1.update(acc1[0], image.size(0))
            top5.update(acc5[0], image.size(0))
            if cnt >= neval_batches:
                 return top1, top5


    return top1, top5


def load_model(model_file):
    model = MobileNetV2()
    state_dict = torch.load(model_file)
    model.load_state_dict(state_dict)
    model.to('cpu')
    return model


def print_size_of_model(model):
    torch.save(model.state_dict(), "temp.p")
    print('Size (MB):', os.path.getsize("temp.p")/1e6)
    os.remove('temp.p')

3. Define dataset and data loaders

As our last major setup step, we define the data loaders for our training and test sets.

ImageNet Data

我們?yōu)楸窘坛虅?chuàng)建的特定數(shù)據(jù)集僅包含來自 ImageNet 數(shù)據(jù)的 1000 張圖像,每個(gè)類別都有一張(該數(shù)據(jù)集的大小剛好超過 250 MB,可以相對輕松地下載)。 此自定義數(shù)據(jù)集的 URL 為:

https://s3.amazonaws.com/pytorch-tutorial-assets/imagenet_1k.zip

To download this data locally using Python, you could use:

import os
import requests


url = 'https://s3.amazonaws.com/pytorch-tutorial-assets/imagenet_1k.zip'
filename = os.path.expanduser('~/Downloads/imagenet_1k_data.zip')  # expand '~' so open() can use the path


r = requests.get(url)


with open(filename, 'wb') as f:
    f.write(r.content)

For this tutorial, we downloaded this data and moved it to the right place using these lines from the Makefile.
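
If you are not using the Makefile, a minimal Python equivalent (our sketch; it assumes the zip was saved to the path above and unpacks into the data_path = 'data/imagenet_1k' used later) would be:

import os
import zipfile

# Assumption: the archive contains an imagenet_1k/ folder with train/ and val/ inside.
with zipfile.ZipFile(os.path.expanduser('~/Downloads/imagenet_1k_data.zip')) as zf:
    zf.extractall('data/')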

On the other hand, to run the code in this tutorial using the entire ImageNet dataset, you could download the data using torchvision following here. For example, to download the training set and apply some standard transforms to it, you could use:

import torchvision
import torchvision.transforms as transforms


imagenet_dataset = torchvision.datasets.ImageNet(
    '~/.data/imagenet',
    split='train',
    download=True,
    transform=transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ]))

Once you've downloaded the data, we show functions below that define the data loaders we'll use to read it in. These functions mostly come from here.

def prepare_data_loaders(data_path):


    traindir = os.path.join(data_path, 'train')
    valdir = os.path.join(data_path, 'val')
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])


    dataset = torchvision.datasets.ImageFolder(
        traindir,
        transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize,
        ]))


    dataset_test = torchvision.datasets.ImageFolder(
        valdir,
        transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            normalize,
        ]))


    train_sampler = torch.utils.data.RandomSampler(dataset)
    test_sampler = torch.utils.data.SequentialSampler(dataset_test)


    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=train_batch_size,
        sampler=train_sampler)


    data_loader_test = torch.utils.data.DataLoader(
        dataset_test, batch_size=eval_batch_size,
        sampler=test_sampler)


    return data_loader, data_loader_test

Next, we'll load in the pre-trained MobileNetV2 model. We provide the URL to download the model from torchvision here.

data_path = 'data/imagenet_1k'
saved_model_dir = 'data/'
float_model_file = 'mobilenet_pretrained_float.pth'
scripted_float_model_file = 'mobilenet_quantization_scripted.pth'
scripted_quantized_model_file = 'mobilenet_quantization_scripted_quantized.pth'


train_batch_size = 30
eval_batch_size = 30


data_loader, data_loader_test = prepare_data_loaders(data_path)
criterion = nn.CrossEntropyLoss()
float_model = load_model(saved_model_dir + float_model_file).to('cpu')

Next, we'll "fuse modules"; this can both make the model faster by saving on memory accesses while also improving numerical accuracy. While fusion can be used with any model, it is especially common with quantized models.

print('\n Inverted Residual Block: Before fusion \n\n', float_model.features[1].conv)
float_model.eval()


# Fuses modules
float_model.fuse_model()


# Note fusion of Conv+BN+Relu and Conv+Relu
print('\n Inverted Residual Block: After fusion\n\n', float_model.features[1].conv)

Out:

Inverted Residual Block: Before fusion


 Sequential(
  (0): ConvBNReLU(
    (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)


 Inverted Residual Block: After fusion


 Sequential(
  (0): ConvBNReLU(
    (0): ConvReLU2d(
      (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)
      (1): ReLU()
    )
    (1): Identity()
    (2): Identity()
  )
  (1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1))
  (2): Identity()
)

Finally, to get a "baseline" accuracy, let's see the accuracy of our un-quantized model with fused modules:

num_eval_batches = 10


print("Size of baseline model")
print_size_of_model(float_model)


top1, top5 = evaluate(float_model, criterion, data_loader_test, neval_batches=num_eval_batches)
print('Evaluation accuracy on %d images, %2.2f'%(num_eval_batches * eval_batch_size, top1.avg))
torch.jit.save(torch.jit.script(float_model), saved_model_dir + scripted_float_model_file)

Out:

Size of baseline model
Size (MB): 13.981375
..........Evaluation accuracy on 300 images, 78.00

We see 78% accuracy on 300 images, a solid baseline for ImageNet, especially considering our model is just 14.0 MB.

This will be our baseline to compare to. Next, let's try different quantization methods.

4. Post-training static quantization

Post-training static quantization involves not just converting the weights from float to int, as in dynamic quantization, but also performing the additional step of first feeding batches of data through the network and computing the resulting distributions of the different activations (specifically, this is done by inserting observer modules at different points to record this data). These distributions are then used to determine how exactly the different activations should be quantized at inference time (a simple technique would be to divide the entire range of activations into 256 levels, but we support more sophisticated methods as well). Importantly, this additional step allows us to pass quantized values between operations instead of converting these values to floats, and then back to ints, between every operation, resulting in a significant speed-up.
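
To make the "256 levels" idea concrete, here is a small sketch of the affine quantization that a simple min/max observer implies (an illustrative helper of ours, not PyTorch's internal implementation):

def affine_quantize(x, num_levels=256):
    # Map the observed float range [min, max] onto the integers 0..255.
    min_val, max_val = x.min().item(), x.max().item()
    scale = (max_val - min_val) / (num_levels - 1)
    zero_point = int(round(-min_val / scale))
    q = torch.clamp(torch.round(x / scale) + zero_point, 0, num_levels - 1)
    return q.to(torch.uint8), scale, zero_point


x = torch.randn(4)
q, scale, zp = affine_quantize(x)
x_hat = (q.float() - zp) * scale  # dequantize: approximately recovers x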

num_calibration_batches = 10


myModel = load_model(saved_model_dir + float_model_file).to('cpu')
myModel.eval()


# Fuse Conv, bn and relu
myModel.fuse_model()


# Specify quantization configuration
# Start with simple min/max range estimation and per-tensor quantization of weights
myModel.qconfig = torch.quantization.default_qconfig
print(myModel.qconfig)
torch.quantization.prepare(myModel, inplace=True)


# Calibrate first
print('Post Training Quantization Prepare: Inserting Observers')
print('\n Inverted Residual Block:After observer insertion \n\n', myModel.features[1].conv)


# Calibrate with the training set
evaluate(myModel, criterion, data_loader, neval_batches=num_calibration_batches)
print('Post Training Quantization: Calibration done')


# Convert to quantized model
torch.quantization.convert(myModel, inplace=True)
print('Post Training Quantization: Convert done')
print('\n Inverted Residual Block: After fusion and quantization, note fused modules: \n\n',myModel.features[1].conv)


print("Size of model after quantization")
print_size_of_model(myModel)


top1, top5 = evaluate(myModel, criterion, data_loader_test, neval_batches=num_eval_batches)
print('Evaluation accuracy on %d images, %2.2f'%(num_eval_batches * eval_batch_size, top1.avg))

Out:

QConfig(activation=functools.partial(<class 'torch.quantization.observer.MinMaxObserver'>, reduce_range=True), weight=functools.partial(<class 'torch.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric))
Post Training Quantization Prepare: Inserting Observers


 Inverted Residual Block:After observer insertion


 Sequential(
  (0): ConvBNReLU(
    (0): ConvReLU2d(
      (0): Conv2d(
        32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32
        (activation_post_process): MinMaxObserver(min_val=None, max_val=None)
      )
      (1): ReLU(
        (activation_post_process): MinMaxObserver(min_val=None, max_val=None)
      )
    )
    (1): Identity()
    (2): Identity()
  )
  (1): Conv2d(
    32, 16, kernel_size=(1, 1), stride=(1, 1)
    (activation_post_process): MinMaxObserver(min_val=None, max_val=None)
  )
  (2): Identity()
)
..........Post Training Quantization: Calibration done
Post Training Quantization: Convert done


 Inverted Residual Block: After fusion and quantization, note fused modules:


 Sequential(
  (0): ConvBNReLU(
    (0): QuantizedConvReLU2d(32, 32, kernel_size=(3, 3), stride=(1, 1), scale=0.15092508494853973, zero_point=0, padding=(1, 1), groups=32)
    (1): Identity()
    (2): Identity()
  )
  (1): QuantizedConv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), scale=0.1737997829914093, zero_point=72)
  (2): Identity()
)
Size of model after quantization
Size (MB): 3.58906
..........Evaluation accuracy on 300 images, 63.33

For this quantized model, we see a significantly lower accuracy of just ~62% on these same 300 images. Nevertheless, we did reduce the size of our model down to just under 3.6 MB, almost a 4x decrease.

In addition, we can significantly improve on the accuracy simply by using a different quantization configuration. We repeat the same exercise with the recommended configuration for quantizing for x86 architectures. This configuration does the following:

  • Quantizes weights on a per-channel basis
  • Uses a histogram observer that collects a histogram of activations and then picks quantization parameters in an optimal manner.

per_channel_quantized_model = load_model(saved_model_dir + float_model_file)
per_channel_quantized_model.eval()
per_channel_quantized_model.fuse_model()
per_channel_quantized_model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
print(per_channel_quantized_model.qconfig)


torch.quantization.prepare(per_channel_quantized_model, inplace=True)
evaluate(per_channel_quantized_model,criterion, data_loader, num_calibration_batches)
torch.quantization.convert(per_channel_quantized_model, inplace=True)
top1, top5 = evaluate(per_channel_quantized_model, criterion, data_loader_test, neval_batches=num_eval_batches)
print('Evaluation accuracy on %d images, %2.2f'%(num_eval_batches * eval_batch_size, top1.avg))
torch.jit.save(torch.jit.script(per_channel_quantized_model), saved_model_dir + scripted_quantized_model_file)

Out:

QConfig(activation=functools.partial(<class 'torch.quantization.observer.HistogramObserver'>, reduce_range=True), weight=functools.partial(<class 'torch.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric))
....................Evaluation accuracy on 300 images, 77.33
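
For reference, the 'fbgemm' qconfig printed above could be assembled by hand from the observer classes (a sketch of ours; get_default_qconfig('fbgemm') is the supported way to obtain it):

from torch.quantization import QConfig, HistogramObserver, PerChannelMinMaxObserver

# Roughly equivalent to the QConfig printed in the output above.
my_qconfig = QConfig(
    activation=HistogramObserver.with_args(reduce_range=True),
    weight=PerChannelMinMaxObserver.with_args(
        dtype=torch.qint8, qscheme=torch.per_channel_symmetric))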

Changing just this quantization configuration method resulted in an increase of the accuracy to over 76%! Still, this is 1-2% worse than the baseline of 78% achieved above. So let's try quantization-aware training.

5. Quantization-aware training

Quantization-aware training (QAT) is the quantization method that typically results in the highest accuracy. With QAT, all weights and activations are "fake quantized" during both the forward and backward passes of training: that is, float values are rounded to mimic int8 values, but all computations are still done with floating point numbers. Thus, all the weight adjustments during training are made while "aware" of the fact that the model will ultimately be quantized; after quantizing, therefore, this method usually yields higher accuracy than either dynamic quantization or post-training static quantization.
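
As an illustration, fake quantization amounts to a round trip through the quantized grid while staying in floating point (a simplified sketch of ours, not PyTorch's FakeQuantize module, which also handles gradients via a straight-through estimator):

def fake_quantize(x, scale, zero_point, qmin=0, qmax=255):
    # Round to the nearest representable quantized value, then map back to float.
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale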

The overall workflow for actually performing QAT is very similar to before:

  • We can use the same model as before: there is no additional preparation needed for quantization-aware training.
  • We need to use a qconfig specifying what kind of fake-quantization is to be inserted after weights and activations, instead of specifying observers.

We first define a training function:

def train_one_epoch(model, criterion, optimizer, data_loader, device, ntrain_batches):
    model.train()
    top1 = AverageMeter('Acc@1', ':6.2f')
    top5 = AverageMeter('Acc@5', ':6.2f')
    avgloss = AverageMeter('Loss', ':1.5f')  # leading ':' so the format string in __str__ is valid


    cnt = 0
    for image, target in data_loader:
        start_time = time.time()
        print('.', end = '')
        cnt += 1
        image, target = image.to(device), target.to(device)
        output = model(image)
        loss = criterion(output, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        acc1, acc5 = accuracy(output, target, topk=(1, 5))
        top1.update(acc1[0], image.size(0))
        top5.update(acc5[0], image.size(0))
        avgloss.update(loss, image.size(0))
        if cnt >= ntrain_batches:
            print('Loss', avgloss.avg)


            print('Training: * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'
                  .format(top1=top1, top5=top5))
            return


    print('Full imagenet train set:  * Acc@1 {top1.global_avg:.3f} Acc@5 {top5.global_avg:.3f}'
          .format(top1=top1, top5=top5))
    return

We fuse modules as before:

qat_model = load_model(saved_model_dir + float_model_file)
qat_model.fuse_model()


optimizer = torch.optim.SGD(qat_model.parameters(), lr = 0.0001)
qat_model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')

Finally, prepare_qat performs the "fake quantization", preparing the model for quantization-aware training:

torch.quantization.prepare_qat(qat_model, inplace=True)
print('Inverted Residual Block: After preparation for QAT, note fake-quantization modules \n',qat_model.features[1].conv)

Out:

Inverted Residual Block: After preparation for QAT, note fake-quantization modules
 Sequential(
  (0): ConvBNReLU(
    (0): ConvBnReLU2d(
      32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
      (activation_post_process): FakeQuantize(
        fake_quant_enabled=True, observer_enabled=True,            scale=None, zero_point=None
        (activation_post_process): MovingAverageMinMaxObserver(min_val=None, max_val=None)
      )
      (weight_fake_quant): FakeQuantize(
        fake_quant_enabled=True, observer_enabled=True,            scale=None, zero_point=None
        (activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=None, max_val=None)
      )
    )
    (1): Identity()
    (2): Identity()
  )
  (1): ConvBn2d(
    32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False
    (activation_post_process): FakeQuantize(
      fake_quant_enabled=True, observer_enabled=True,            scale=None, zero_point=None
      (activation_post_process): MovingAverageMinMaxObserver(min_val=None, max_val=None)
    )
    (weight_fake_quant): FakeQuantize(
      fake_quant_enabled=True, observer_enabled=True,            scale=None, zero_point=None
      (activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=None, max_val=None)
    )
  )
  (2): Identity()
)

Training a quantized model with high accuracy requires accurate modeling of numerics at inference time. For quantization-aware training, therefore, we modify the training loop to:

  • Switch batch norm to use running mean and variance towards the end of training, to better match inference numerics.
  • Freeze the quantizer parameters (scale and zero-point) and fine-tune the weights.

num_train_batches = 20


# Train and check accuracy after each epoch
for nepoch in range(8):
    train_one_epoch(qat_model, criterion, optimizer, data_loader, torch.device('cpu'), num_train_batches)
    if nepoch > 3:
        # Freeze quantizer parameters
        qat_model.apply(torch.quantization.disable_observer)
    if nepoch > 2:
        # Freeze batch norm mean and variance estimates
        qat_model.apply(torch.nn.intrinsic.qat.freeze_bn_stats)


    # Check the accuracy after each epoch
    quantized_model = torch.quantization.convert(qat_model.eval(), inplace=False)
    quantized_model.eval()
    top1, top5 = evaluate(quantized_model,criterion, data_loader_test, neval_batches=num_eval_batches)
    print('Epoch %d :Evaluation accuracy on %d images, %2.2f'%(nepoch, num_eval_batches * eval_batch_size, top1.avg))

Out:

....................Loss tensor(2.0660, grad_fn=<DivBackward0>)
Training: * Acc@1 53.000 Acc@5 77.167
..........Epoch 0 :Evaluation accuracy on 300 images, 78.67
....................Loss tensor(2.0398, grad_fn=<DivBackward0>)
Training: * Acc@1 56.000 Acc@5 77.667
..........Epoch 1 :Evaluation accuracy on 300 images, 74.67
....................Loss tensor(2.0917, grad_fn=<DivBackward0>)
Training: * Acc@1 52.833 Acc@5 77.333
..........Epoch 2 :Evaluation accuracy on 300 images, 75.33
....................Loss tensor(1.9406, grad_fn=<DivBackward0>)
Training: * Acc@1 55.000 Acc@5 79.333
..........Epoch 3 :Evaluation accuracy on 300 images, 77.67
....................Loss tensor(1.8255, grad_fn=<DivBackward0>)
Training: * Acc@1 59.833 Acc@5 82.000
..........Epoch 4 :Evaluation accuracy on 300 images, 77.00
....................Loss tensor(1.8275, grad_fn=<DivBackward0>)
Training: * Acc@1 58.167 Acc@5 80.167
..........Epoch 5 :Evaluation accuracy on 300 images, 76.67
....................Loss tensor(1.9429, grad_fn=<DivBackward0>)
Training: * Acc@1 56.333 Acc@5 79.833
..........Epoch 6 :Evaluation accuracy on 300 images, 76.33
....................Loss tensor(1.8643, grad_fn=<DivBackward0>)
Training: * Acc@1 57.333 Acc@5 81.000
..........Epoch 7 :Evaluation accuracy on 300 images, 75.67

Here, we just perform quantization-aware training for a small number of epochs. Nevertheless, quantization-aware training yields an accuracy of over 71% on the entire imagenet dataset, which is close to the floating point accuracy of 71.9%.

More on quantization-aware training:

  • QAT is a super-set of post-training quantization techniques that allows for more debugging. For example, we can analyze whether the accuracy of the model is limited by weight or activation quantization.
  • Since we use fake-quantization to model the numerics of actual quantized arithmetic, we can also simulate the accuracy of the quantized model in floating point (see the sketch after this list).
  • We can mimic post-training quantization easily, too.
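
For example, to simulate the accuracy of the quantized model while still in floating point (second bullet above), we can evaluate the prepared-but-unconverted QAT model directly, reusing the helpers defined earlier (our sketch, not a step from the original tutorial):

qat_model.eval()
top1, top5 = evaluate(qat_model, criterion, data_loader_test, neval_batches=num_eval_batches)
print('Simulated quantized accuracy (fp32 with fake-quant): %2.2f' % top1.avg)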

Speedup from quantization

Finally, let's confirm something we alluded to above: do our quantized models actually perform inference faster? Let's test it:

def run_benchmark(model_file, img_loader):
    elapsed = 0
    model = torch.jit.load(model_file)
    model.eval()
    num_batches = 5
    # Run the scripted model on a few batches of images
    for i, (images, target) in enumerate(img_loader):
        if i < num_batches:
            start = time.time()
            output = model(images)
            end = time.time()
            elapsed = elapsed + (end-start)
        else:
            break
    num_images = images.size()[0] * num_batches


    print('Elapsed time: %3.0f ms' % (elapsed/num_images*1000))
    return elapsed


run_benchmark(saved_model_dir + scripted_float_model_file, data_loader_test)


run_benchmark(saved_model_dir + scripted_quantized_model_file, data_loader_test)

Out:

Elapsed time:  16 ms
Elapsed time:  10 ms

Running this locally on a MacBook Pro yielded 61 ms for the regular model and just 20 ms for the quantized model, illustrating the typical 2-4x speedup we see for quantized models compared to floating point ones.

Conclusion

In this tutorial, we showed two quantization methods, post-training static quantization and quantization-aware training, described what they do "under the hood", and showed how to use them in PyTorch.

Thanks for reading! As always, we welcome any feedback, so please create an issue here if you have any.

Total running time of the script: (9 minutes 43.065 seconds)

Download Python source code: static_quantization_tutorial.py
Download Jupyter notebook: static_quantization_tutorial.ipynb

