
Converting a CNN from Fixed-Size Image Input to Arbitrary-Size Image Input

Date: 2024-2-2 13:13:31

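In this article we look at how to classify images of arbitrary size without resorting to a computationally expensive sliding window. We do so by modifying the ResNet-18 CNN architecture, which expects 224×224 input images, so that it accepts images of any size.

First, let us clear up a misconception about convolutional neural networks (CNNs).

Convolutional neural networks do not require a fixed-size input

If you have used a CNN to classify images, you have probably cropped or resized the input image to match the input size the network expects. This practice is very common, but it has some limitations.

1. Loss of resolution: If a large image contains a small dog that occupies only a small part of the picture, resizing the image makes the dog even smaller, possibly so small that the image can no longer be classified correctly.

2. Non-square aspect ratio: Image classification networks are usually trained on square images. If the input image is not square, we typically either take a square crop from the center or scale the width and height by different factors to make the image square. In the first case we may discard important features that are not in the center. In the second case the image content is distorted by the non-uniform scaling.

3. Computationally expensive: To work around these problems, we can take overlapping crops of the image and run the classifier on each window. This is computationally expensive and entirely unnecessary.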
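To get a feel for the cost of the sliding-window approach in item 3, the sketch below counts how many crops a classifier would have to process. The 1920×725 image size is the one used later in this article; the 224×224 window and the stride of 32 pixels are assumptions chosen only for illustration.

```python
def count_windows(image_size, window, stride):
    """Number of window positions along one axis (no padding)."""
    if image_size < window:
        return 0
    return (image_size - window) // stride + 1


def sliding_window_passes(width, height, window=224, stride=32):
    """Total number of crops, i.e. forward passes, for a sliding window."""
    return count_windows(width, window, stride) * count_windows(height, window, stride)


passes = sliding_window_passes(1920, 725)
print(passes)  # 54 positions across * 16 positions down = 864
```

Under these assumptions a single image needs 864 separate forward passes, most of which recompute features for heavily overlapping regions.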

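Interestingly, many people do not realize that with a small modification to the network, a CNN can accept images of any size as input, and no retraining is required! In this article we show how to do this by modifying a standard network.

Modifying an image classification architecture to handle images of arbitrary size

Almost all classification architectures end with a fully connected (FC) layer. (Note: in PyTorch an FC layer is called a "Linear" layer.) The problem with FC layers is that they require input of a fixed size. If we change the size of the input image, the computation can no longer be carried out. We therefore need to replace the FC layer with something else, but before doing so we need to understand why image classification architectures use fully connected layers in the first place.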
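The fixed-size requirement of an FC layer is easy to reproduce. The following is a minimal sketch with toy sizes assumed for illustration (a 16-channel feature map and 120 outputs); the helper name `try_fc` is hypothetical.

```python
import torch
import torch.nn as nn

# A toy FC layer that expects a flattened 5x5x16 feature map (400 values).
fc = nn.Linear(5 * 5 * 16, 120)


def try_fc(height, width):
    """Flatten a 16-channel feature map and pass it through the FC layer."""
    features = torch.randn(1, 16, height, width)
    flat = torch.flatten(features, 1)
    try:
        return tuple(fc(flat).shape)
    except RuntimeError:
        # The flattened length no longer matches fc.in_features.
        return None


print(try_fc(5, 5))  # (1, 120)
print(try_fc(6, 6))  # None: 6*6*16 = 576 values do not fit a 400-input layer
```

A slightly larger feature map already breaks the layer, which is why the FC layer is the part of the network that has to change.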

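Modern CNN architectures consist of several blocks of convolutional layers followed by a few FC layers at the end. This structure dates back to early neural network research. The convolutional layers act as "smart" filters that extract semantic information from the image, and to some extent they preserve the spatial relationships between objects in the image. To classify the objects in an image, however, we do not need this spatial information, so the output of the last convolutional layer is usually flattened into one long vector. This long vector is the input to the FC layer, which takes no account of spatial information. The FC layer simply computes a weighted sum of the deep features from all spatial locations in the image.

In practice this structure works well, as a large body of practical experience confirms. Because of the FC layer, however, the network can only accept input of a fixed size. We therefore need to replace the FC layer with a kind of layer that does not require a fixed-size input: the convolutional layer, which is not tied to the size of its input!

What we need to do next is replace the FC layer with an equivalent convolutional layer.

Converting a fully connected layer to a convolutional layer

FC and convolutional layers differ in what part of the input they look at: a convolutional layer focuses on local input regions, whereas an FC layer combines global features. Both, however, compute dot products, so they are fundamentally similar, and one can be converted into the other.

Let us illustrate this with an example.

Suppose an FC layer takes the output of a convolutional layer as its input, and that the convolutional layer outputs a 5x5x16 tensor. Suppose further that the FC layer has an output size of 120. With an FC layer, the 5x5x16 volume is first flattened into a 400×1 vector (5 × 5 × 16 = 400). With the equivalent convolutional layer, we instead use a kernel of size 5x5x16. In a CNN the depth of the kernel (16 here) always equals the depth of the input, and the width and height are usually equal (5 here), so we can simply say the kernel size is 5 rather than 5x5x16. The number of filters must equal the desired output size, so it is set to 120. The stride is set to 1 and the padding to 0.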
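The example above can be checked numerically. The sketch below builds the FC layer and the convolutional layer with the sizes from the text, copies the FC weights into the convolution, and compares the outputs. It assumes PyTorch's channels-first layout, so the 5x5x16 tensor is stored as (16, 5, 5); the 8×9 input at the end is an arbitrary size chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

fc = nn.Linear(5 * 5 * 16, 120)
conv = nn.Conv2d(in_channels=16, out_channels=120, kernel_size=5, stride=1, padding=0)

with torch.no_grad():
    # Copy the FC weights into the conv layer: (120, 400) -> (120, 16, 5, 5).
    # torch.flatten orders a (C, H, W) tensor as C, H, W, so a plain view matches.
    conv.weight.copy_(fc.weight.view(120, 16, 5, 5))
    conv.bias.copy_(fc.bias)

    x = torch.randn(1, 16, 5, 5)
    fc_out = fc(torch.flatten(x, 1))   # shape (1, 120)
    conv_out = conv(x)                 # shape (1, 120, 1, 1)
    same = torch.allclose(fc_out, conv_out.flatten(1), atol=1e-4)

    # On a larger input the kernel slides and yields a spatial map of outputs.
    big_out = conv(torch.randn(1, 16, 8, 9))  # shape (1, 120, 4, 5)

print(same, tuple(conv_out.shape), tuple(big_out.shape))
```

On the 5×5 input the two layers agree up to floating-point error, while on the larger input the convolution produces one 120-element output per kernel position instead of failing.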

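Modifying the ResNet-18 architecture

ResNet-18 is a popular CNN architecture that expects input images of size 224×224. We will modify it to accept input of arbitrary size.

The figure below shows the composition of the architecture.

In PyTorch, the ResNet-18 architecture starts with a convolutional layer called conv1 (see the code below), followed by a pooling layer.

Next come four convolutional blocks, shown in pink, purple, yellow, and orange in the figure. These blocks are named layer1, layer2, layer3, and layer4. Each block contains 4 convolutional layers.

Finally, there is an average pooling layer. Its output is flattened and fed to the final fully connected layer, fc.

The following code is from the ResNet implementation.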

# from torchvision's implementation of ResNet
class ResNet(nn.Module):
    # ... (inside __init__)
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2, dilate=replace_stride_with_dilation[0])
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2, dilate=replace_stride_with_dilation[1])
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2, dilate=replace_stride_with_dilation[2])
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)
    # ...

    def _forward_impl(self, x):
        # See note [TorchScript super()]
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x

We create a new class, FullyConvolutionalResnet18, by inheriting from the original ResNet class:

class FullyConvolutionalResnet18(models.ResNet):
    def __init__(self, num_classes=1000, pretrained=False, **kwargs):

        # Start with standard resnet18 defined here
        super().__init__(block=models.resnet.BasicBlock, layers=[2, 2, 2, 2], num_classes=num_classes, **kwargs)
        if pretrained:
            # Note: models.resnet.model_urls is available in older torchvision
            # releases and may be missing in newer ones.
            state_dict = load_state_dict_from_url(models.resnet.model_urls["resnet18"], progress=True)
            self.load_state_dict(state_dict)

        # Replace AdaptiveAvgPool2d with standard AvgPool2d
        self.avgpool = nn.AvgPool2d((7, 7))

        # Convert the original fc layer to a convolutional layer.
        self.last_conv = torch.nn.Conv2d(in_channels=self.fc.in_features, out_channels=num_classes, kernel_size=1)
        self.last_conv.weight.data.copy_(self.fc.weight.data.view(*self.fc.weight.data.shape, 1, 1))
        self.last_conv.bias.data.copy_(self.fc.bias.data)

    # Reimplementing forward pass.
    def _forward_impl(self, x):
        # Standard forward for resnet18
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)

        # Notice, there is no forward pass
        # through the original fully connected layer.
        # Instead, we forward pass through the last conv layer
        x = self.last_conv(x)
        return x
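The two changes made to the head of the network (a fixed 7×7 average pooling instead of adaptive pooling, and a 1×1 convolution instead of the FC layer) can be checked in isolation, without downloading any weights. The following is a sketch on random feature maps; the 10 output classes and the helper names `original_head` and `converted_head` are assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_features, num_classes = 512, 10  # 10 classes is an arbitrary toy value

# Original head: global average pooling, flatten, then a Linear layer.
adaptive_pool = nn.AdaptiveAvgPool2d((1, 1))
fc = nn.Linear(num_features, num_classes)

# Converted head: fixed 7x7 average pooling, then a 1x1 convolution.
avg_pool = nn.AvgPool2d((7, 7))
last_conv = nn.Conv2d(num_features, num_classes, kernel_size=1)
with torch.no_grad():
    last_conv.weight.copy_(fc.weight.view(num_classes, num_features, 1, 1))
    last_conv.bias.copy_(fc.bias)


def original_head(features):
    return fc(torch.flatten(adaptive_pool(features), 1))


def converted_head(features):
    return last_conv(avg_pool(features))


with torch.no_grad():
    small = torch.randn(1, num_features, 7, 7)    # stands in for layer4 output at 224x224
    large = torch.randn(1, num_features, 23, 60)  # a larger, non-square feature map
    heads_agree = torch.allclose(
        original_head(small), converted_head(small).flatten(1), atol=1e-4
    )
    large_shape = tuple(converted_head(large).shape)

print(heads_agree, large_shape)
```

On a 7×7 feature map both heads give the same scores, so the pretrained weights remain valid; on a larger feature map the converted head returns a grid of class scores rather than a single vector.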

Using the fully convolutional ResNet-18

With the definition above, we now have a ResNet-18 that can process images of arbitrary size. Next we show how to use it.

# 1. Import the required libraries
import torch
import torch.nn as nn
from torchvision import models
from torchvision import transforms  # needed for the preprocessing step
from torch.hub import load_state_dict_from_url
from PIL import Image
import cv2
import numpy as np
from matplotlib import pyplot as plt

# 2. Read the ImageNet class ID to name mapping
if __name__ == "__main__":
    # Read ImageNet class id to name mapping
    with open('imagenet_classes.txt') as f:
        labels = [line.strip() for line in f.readlines()]

Read the image and convert it into a form that PyTorch can work with.

Input image: note that the camel is not centered in the image.

    # Read image
    original_image = cv2.imread('camel.jpg')

    # Convert original image to RGB format
    image = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)

    # Transform input image
    # 1. Convert to Tensor
    # 2. Subtract mean
    # 3. Divide by standard deviation
    transform = transforms.Compose([
        transforms.ToTensor(),  # Convert image to tensor.
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],  # Subtract mean
            std=[0.229, 0.224, 0.225]    # Divide by standard deviation
        )])

    image = transform(image)
    image = image.unsqueeze(0)

Load the FullyConvolutionalResnet18 model with pretrained weights.

    # Load modified resnet18 model with pretrained ImageNet weights
    model = FullyConvolutionalResnet18(pretrained=True).eval()

Run the network and obtain the result.

    with torch.no_grad():
        # Perform inference.
        # Instead of a 1x1000 vector, we will get a
        # 1x1000xnxm output (i.e. a probability map
        # of size n x m for each of the 1000 classes,
        # where n and m depend on the size of the image).
        preds = model(image)
        preds = torch.softmax(preds, dim=1)

        print('Response map shape : ', preds.shape)

        # Find the class with the maximum score in the n x m output map
        pred, class_idx = torch.max(preds, dim=1)
        print(class_idx)

        row_max, row_idx = torch.max(pred, dim=1)
        col_max, col_idx = torch.max(row_max, dim=1)
        predicted_class = class_idx[0, row_idx[0, col_idx], col_idx]

        # Print top predicted class
        print('Predicted Class : ', labels[predicted_class], predicted_class)
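The three successive max operations above locate the single highest score in the whole response map. The same lookup can be written as one flat argmax; the sketch below shows this alternative formulation in NumPy on a synthetic map. The toy sizes and the planted peak are assumptions for illustration, not values from the article's model.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, n, m = 5, 3, 8  # toy sizes, assumed for illustration

# Fake probability map of shape (1, num_classes, n, m).
logits = rng.normal(size=(1, num_classes, n, m))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Plant an unmistakable peak: class 3 at row 1, column 6.
probs[0, :, 1, 6] = 0.0
probs[0, 3, 1, 6] = 1.0

# Flat argmax over (class, row, col), then unravel into three indices.
class_idx, row, col = np.unravel_index(np.argmax(probs[0]), probs[0].shape)
print(class_idx, row, col)  # 3 1 6
```

Besides the predicted class, this form also returns the cell of the response map where the peak occurs, which is a rough indication of where the object lies.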

Running the code above gives the following output.

Response map shape : torch.Size([1, 1000, 3, 8])
tensor([[[977, 977, 977, 977, 977, 978, 354, 437],
         [978, 977, 980, 977, 858, 970, 354, 461],
         [977, 978, 977, 977, 977, 977, 354, 354]]])
Predicted Class : Arabian camel, dromedary, Camelus dromedarius tensor([354])

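In the original ResNet, the output is a vector of 1000 elements, each corresponding to the class probability of one of the 1000 ImageNet classes.

In the fully convolutional version, we get a response map of size [1, 1000, n, m], where n and m depend on the size of the original image and on the network itself.

In our example, feeding in an image of size 1920×725 yields a response map of size [1, 1000, 3, 8].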
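The values of n and m can be worked out by hand from the kernel sizes, strides, and paddings of the layers. The sketch below does this arithmetic, assuming the default, non-dilated ResNet-18 in which conv1 uses a 7×7 kernel with stride 2 and padding 3, the max pooling and the first convolution of layer2, layer3, and layer4 use 3×3 kernels with stride 2 and padding 1, and layer1 keeps the spatial size. The function names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def response_map_size(height, width):
    """Spatial size (n, m) of the fully convolutional ResNet-18 output."""
    sizes = []
    for s in (height, width):
        s = conv_out(s, kernel=7, stride=2, padding=3)      # conv1
        s = conv_out(s, kernel=3, stride=2, padding=1)      # maxpool
        for _ in range(3):                                  # layer2, layer3, layer4
            s = conv_out(s, kernel=3, stride=2, padding=1)
        s = conv_out(s, kernel=7, stride=7, padding=0)      # AvgPool2d((7, 7))
        sizes.append(s)
    return tuple(sizes)


print(response_map_size(725, 1920))  # (3, 8)
print(response_map_size(224, 224))   # (1, 1)
```

For the 1920×725 image this reproduces the 3×8 map reported above, and for the standard 224×224 input it collapses to a single cell, matching the original network.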

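Response map of the predicted class

Next, we find the response map of the predicted class and upsample it to fit the original image. We threshold the response map to obtain the region of interest and find a bounding box around it. The code is as follows: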

        # Find the n x m score map for the predicted class.
        # predicted_class is a 1-element tensor, so the slice has shape (1, n, m);
        # indexing with [0] drops the leading axis.
        score_map = preds[0, predicted_class, :, :].cpu().numpy()
        score_map = score_map[0]

        # Resize score map to the original image size (cv2 expects (width, height))
        score_map = cv2.resize(score_map, (original_image.shape[1], original_image.shape[0]))

        # Binarize score map
        _, score_map_for_contours = cv2.threshold(score_map, 0.25, 1, type=cv2.THRESH_BINARY)
        score_map_for_contours = score_map_for_contours.astype(np.uint8).copy()

        # Find the contour of the binary blob
        contours, _ = cv2.findContours(score_map_for_contours, mode=cv2.RETR_EXTERNAL, method=cv2.CHAIN_APPROX_SIMPLE)

        # Find bounding box around the object (first contour found).
        rect = cv2.boundingRect(contours[0])
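The idea behind the thresholding and bounding-box step can also be shown without OpenCV. The following NumPy sketch is a simplified stand-in, not the code used above: it boxes all pixels above the threshold, whereas cv2.boundingRect is applied to one contour. The toy score map and the function name `bounding_box` are assumptions for illustration.

```python
import numpy as np

# Toy 6x8 score map with a bright 2x3 region (values assumed for illustration).
score_map = np.zeros((6, 8), dtype=np.float32)
score_map[2:4, 3:6] = 0.9


def bounding_box(score_map, threshold=0.25):
    """Return (x, y, w, h) around all pixels above the threshold, or None."""
    rows, cols = np.nonzero(score_map > threshold)
    if rows.size == 0:
        return None
    x, y = cols.min(), rows.min()
    w, h = cols.max() - x + 1, rows.max() - y + 1
    return int(x), int(y), int(w), int(h)


print(bounding_box(score_map))  # (3, 2, 3, 2)
```

The result uses the same (x, y, width, height) layout as the rect computed above, so the box corners are (x, y) and (x + w, y + h).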

Displaying the results

The following code displays the results as images. Brighter regions of the response map indicate regions of higher likelihood.

        # Apply score map as a mask to original image
        score_map = score_map - np.min(score_map[:])
        score_map = score_map / np.max(score_map[:])

Next, we multiply the response map with the original image and draw the bounding box.

        score_map = cv2.cvtColor(score_map, cv2.COLOR_GRAY2BGR)
        masked_image = (original_image * score_map).astype(np.uint8)

        # Display bounding box; rect is (x, y, width, height)
        cv2.rectangle(masked_image, rect[:2], (rect[0] + rect[2], rect[1] + rect[3]), (0, 0, 255), 2)

        # Display images
        cv2.imshow("Original Image", original_image)
        cv2.imshow("scaled_score_map", score_map)
        cv2.imshow("activations_and_bbox", masked_image)
        cv2.waitKey(0)

In the results, only the camel is highlighted. The bounding box created by thresholding the response map captures the camel. In this sense, a fully convolutional image classifier acts like an object detector!
