Abstract
PSPNet is a semantic segmentation model that addresses two weaknesses of FCN-8s: segmentation that is not fine-grained enough and a failure to use contextual information. One of its core innovations is the pyramid pooling module (PPM), which captures global and local context through pooling at several scales. The module divides the feature map into sub-regions of different sizes, pools each sub-region, then upsamples and fuses the resulting maps into a feature representation that carries multi-scale context. PSPNet also proposes an optimization strategy based on a deep-supervision loss: during training, an auxiliary classification loss is added to the segmentation loss of the main branch. The auxiliary branch is attached to an intermediate ResNet layer and uses its classification loss to improve that layer's feature representation. This strategy speeds up convergence and strengthens the model's ability to learn global features. Although PSPNet performs well on semantic segmentation, it has two drawbacks: the gain in accuracy comes at the cost of higher computational complexity, and its results on small objects are weaker.
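1. Introduction
In computer vision, scene parsing is a task derived from semantic segmentation: it must predict a category label for every pixel and, in the rendered segmentation map, show which color corresponds to which category. These requirements make it useful for applications such as autonomous driving and robot perception.
The difficulty of scene parsing is closely tied to the diversity of scenes and labels. Most scene parsing models are built on fully convolutional networks (FCNs). Although these CNN-based methods improve the understanding of dynamic objects, challenges remain: the model must handle a wide variety of scenes, and the object categories in a scene may fall outside a predefined fixed vocabulary. In addition, objects with a similar appearance can be mistaken for one another. This last problem can be mitigated by using spatial pyramid pooling to exploit the context around each object.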
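2. Framework
PSPNet runs as follows. The input image first passes through a convolutional network (a fully convolutional network based on ResNet) for feature extraction. The extracted feature map then goes through the pyramid pooling module, which produces four feature maps of different sizes. These maps are upsampled to the size of the feature map that entered the module and concatenated with that feature map. Finally, the concatenated map passes through a convolution and is used for per-pixel prediction.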
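To make the pipeline concrete, the following pure-Python sketch traces tensor shapes through the network. It is only an illustration: the channel counts, strides, dilations and the default of 18 classes are read off the PyTorch implementation listed later in this post, and `conv_out` and `trace_pspnet_shapes` are hypothetical helper names introduced here.

```python
def conv_out(size, kernel, stride, padding, dilation=1):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def trace_pspnet_shapes(height, width, n_classes=18, sizes=(1, 2, 3, 6)):
    """List (stage, (channels, height, width)) for a single input image."""
    shapes = [("input", (3, height, width))]

    # Stem: 7x7 convolution (stride 2) followed by 3x3 max-pooling (stride 2).
    h, w = conv_out(height, 7, 2, 3), conv_out(width, 7, 2, 3)
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    shapes.append(("stem", (64, h, w)))

    # layer1 keeps the resolution; layer2 halves it with a stride-2 3x3 conv.
    shapes.append(("layer1", (256, h, w)))
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    shapes.append(("layer2", (512, h, w)))

    # layer3 and layer4 use dilation (2 and 4) with stride 1: size is unchanged.
    h, w = conv_out(h, 3, 1, 2, 2), conv_out(w, 3, 1, 2, 2)
    shapes.append(("layer3", (1024, h, w)))
    h, w = conv_out(h, 3, 1, 4, 4), conv_out(w, 3, 1, 4, 4)
    shapes.append(("layer4", (2048, h, w)))

    # Pyramid pooling: the input map plus one upsampled branch per level.
    shapes.append(("ppm_concat", (2048 * (len(sizes) + 1), h, w)))
    shapes.append(("bottleneck", (1024, h, w)))

    # Three x2 upsampling stages, then a 1x1 classifier over n_classes.
    for name, channels in (("up_1", 256), ("up_2", 64), ("up_3", 64)):
        h, w = 2 * h, 2 * w
        shapes.append((name, (channels, h, w)))
    shapes.append(("output", (n_classes, h, w)))
    return shapes


for stage, shape in trace_pspnet_shapes(256, 256):
    print(f"{stage:<11}{shape}")
```

For a 256×256 crop the backbone output is 32×32 (one eighth of the input), and the three ×2 upsampling stages bring the prediction back to 256×256.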
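2.1 Pyramid pooling module
Global average pooling is itself a global context prior, but it collapses the feature map into a single vector along the channel dimension, which discards spatial relations and can cause ambiguity. Both global context and sub-region context help distinguish object categories, so the paper proposes the pyramid pooling module to use the two together while reducing the loss of context information between sub-regions.
The module has four pyramid levels. The coarsest level (marked in red in the paper's figure) applies global pooling; the other three levels divide the feature map into sub-regions and form pooled representations for the different locations. The outputs of the levels are feature maps of different sizes (bin sizes $1\times1$, $2\times2$, $3\times3$ and $6\times6$ in the paper). Each of them passes through a $1\times1$ convolution that, in the paper, reduces the channel count to $1/N$ of the input for $N$ pyramid levels ($N=4$ here), so that the global feature keeps its weight in the final representation. (The PyTorch code later in this post keeps the channel count unchanged in this convolution.) The final output of the module is the concatenation of the input feature map with the four sub-region maps after they have been upsampled to the input size by bilinear interpolation.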
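The pooling-and-upsampling step can be sketched on a single-channel map in plain Python. This is a simplified illustration, not the module itself: the $1\times1$ convolution is omitted, nearest-neighbour upsampling stands in for bilinear interpolation, and the helper names are hypothetical. The bin boundaries follow the usual adaptive-pooling rule (start at floor(i·H/size), end at ceil((i+1)·H/size)).

```python
def adaptive_avg_pool(feature, size):
    """Average-pool a 2D list of numbers into a size x size grid of bins."""
    h, w = len(feature), len(feature[0])
    pooled = []
    for i in range(size):
        r0 = (i * h) // size
        r1 = ((i + 1) * h + size - 1) // size  # integer ceiling
        row = []
        for j in range(size):
            c0 = (j * w) // size
            c1 = ((j + 1) * w + size - 1) // size
            cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled, h, w):
    """Resize a size x size grid back to h x w (nearest neighbour, for brevity)."""
    size = len(pooled)
    return [[pooled[r * size // h][c * size // w] for c in range(w)] for r in range(h)]


def pyramid_pool(feature, sizes=(1, 2, 3, 6)):
    """Return the input map followed by one upsampled branch per pyramid level."""
    h, w = len(feature), len(feature[0])
    branches = [upsample_nearest(adaptive_avg_pool(feature, s), h, w) for s in sizes]
    return [feature] + branches  # "concatenation" along the channel axis


feature = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
channels = pyramid_pool(feature)
print(len(channels), "channels of size", len(channels[0]), "x", len(channels[0][0]))
```

On this 6×6 example the $1\times1$ level is the global mean (17.5 everywhere), while the $6\times6$ level reproduces the input, which shows how the levels range from purely global to purely local context.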
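2.2 Supervision of the feature extractor
The paper computes a classification loss, Loss2, from the feature map produced by an intermediate layer of the feature extractor, and a segmentation loss, Loss1, from the extractor's final output after it has passed through the remaining PSPNet modules. The total loss is therefore Loss1 + 0.4 × Loss2. Down-weighting Loss2 lets the segmentation loss dominate training, while the classification loss helps the extractor learn better features. During training both losses guide the network; at test time, usually only the optimized main branch is used for prediction.
Below is the PyTorch implementation of the ResNet101 feature extractor.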
import math
from collections import OrderedDict

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils import model_zoo


def load_weights_sequential(target, source_state):
    # Match tensors by position rather than by name. BatchNorm counters
    # (num_batches_tracked) are skipped because older checkpoints lack them.
    target_keys = [k for k in target.state_dict() if not k.endswith('num_batches_tracked')]
    source_vals = [v for k, v in source_state.items() if not k.endswith('num_batches_tracked')]
    new_dict = OrderedDict(zip(target_keys, source_vals))
    target.load_state_dict(new_dict, strict=False)


model_urls = {
    'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
}


def conv3x3(in_planes, out_planes, stride=1, dilation=1):
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, dilation=dilation, bias=False)


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, dilation=1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, dilation=dilation,
                               padding=dilation, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual  # residual (skip) connection
        out = self.relu(out)
        return out


class ResNet(nn.Module):
    def __init__(self, block, layers=(3, 4, 23, 3)):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        # The last two stages use dilation instead of stride to keep resolution.
        self.layer3 = self._make_layer(block, 256, layers[2], stride=1, dilation=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=1, dilation=4)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1, dilation=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = [block(self.inplanes, planes, stride, downsample)]
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes, dilation=dilation))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x_3 = self.layer3(x)   # intermediate features for the auxiliary branch
        x = self.layer4(x_3)   # final features for the main branch

        return x, x_3


def resnet101(pretrained=True):
    model = ResNet(Bottleneck, [3, 4, 23, 3])
    if pretrained:
        load_weights_sequential(model, model_zoo.load_url(model_urls['resnet101']))
    return model

2.3 Training details
Below is the PyTorch implementation of PSPNet.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumption: the ResNet code above is saved as extractors.py, as in the source repository.
import extractors


class PSPModule(nn.Module):
    def __init__(self, features, out_features=1024, sizes=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList([self._make_stage(features, size) for size in sizes])
        # The input map and one branch per pyramid level are concatenated.
        self.bottleneck = nn.Conv2d(features * (len(sizes) + 1), out_features, kernel_size=1)
        self.relu = nn.ReLU()

    def _make_stage(self, features, size):
        prior = nn.AdaptiveAvgPool2d(output_size=(size, size))
        conv = nn.Conv2d(features, features, kernel_size=1, bias=False)
        return nn.Sequential(prior, conv)

    def forward(self, feats):
        h, w = feats.size(2), feats.size(3)
        # F.interpolate replaces the deprecated F.upsample.
        priors = [F.interpolate(stage(feats), size=(h, w), mode='bilinear', align_corners=False)
                  for stage in self.stages] + [feats]
        bottle = self.bottleneck(torch.cat(priors, 1))
        return self.relu(bottle)


class PSPUpsample(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.PReLU()
        )

    def forward(self, x):
        h, w = 2 * x.size(2), 2 * x.size(3)
        p = F.interpolate(x, size=(h, w), mode='bilinear', align_corners=False)
        return self.conv(p)


class PSPNet(nn.Module):
    # The default backend is resnet101 because it is the only extractor defined in this post.
    def __init__(self, n_classes=18, sizes=(1, 2, 3, 6), psp_size=2048, deep_features_size=1024,
                 backend='resnet101', pretrained=True):
        super().__init__()
        self.feats = getattr(extractors, backend)(pretrained)
        self.psp = PSPModule(psp_size, 1024, sizes)
        self.drop_1 = nn.Dropout2d(p=0.3)

        self.up_1 = PSPUpsample(1024, 256)
        self.up_2 = PSPUpsample(256, 64)
        self.up_3 = PSPUpsample(64, 64)

        self.drop_2 = nn.Dropout2d(p=0.15)
        self.final = nn.Sequential(
            nn.Conv2d(64, n_classes, kernel_size=1),
            nn.LogSoftmax(dim=1)  # per-pixel log-probabilities over classes
        )

        # Auxiliary image-level classifier on the intermediate features.
        self.classifier = nn.Sequential(
            nn.Linear(deep_features_size, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes)
        )

    def forward(self, x):
        f, class_f = self.feats(x)
        p = self.psp(f)
        p = self.drop_1(p)

        p = self.up_1(p)
        p = self.drop_2(p)

        p = self.up_2(p)
        p = self.drop_2(p)

        p = self.up_3(p)
        p = self.drop_2(p)

        auxiliary = F.adaptive_max_pool2d(input=class_f, output_size=(1, 1)).view(-1, class_f.size(1))

        return self.final(p), self.classifier(auxiliary)

Below is the training code.
import logging
import os

import click
import numpy as np
import torch
import torch.nn as nn
from torch import optim
from torch.optim.lr_scheduler import MultiStepLR
from torch.utils.data import DataLoader
from tqdm import tqdm

# Assumption: the PSPNet code above is saved as pspnet.py, as in the source repository.
from pspnet import PSPNet

models = {
    'resnet101': lambda: PSPNet(sizes=(1, 2, 3, 6), psp_size=2048, deep_features_size=1024, backend='resnet101')
}


def build_network(snapshot, backend):
    epoch = 0
    backend = backend.lower()
    net = models[backend]()
    net = nn.DataParallel(net)
    if snapshot is not None:
        # Snapshot files are named "PSPNet_<epoch>".
        _, epoch = os.path.basename(snapshot).split('_')
        epoch = int(epoch)
        net.load_state_dict(torch.load(snapshot))
        logging.info("Snapshot for epoch {} loaded from {}".format(epoch, snapshot))
    net = net.cuda()
    return net, epoch


@click.command()
@click.option('--data-path', type=str, help='Path to dataset folder')
@click.option('--models-path', type=str, help='Path for storing model snapshots')
@click.option('--backend', type=str, default='resnet101', help='Feature extractor')
@click.option('--snapshot', type=str, default=None, help='Path to pretrained weights')
@click.option('--crop_x', type=int, default=256, help='Horizontal random crop size')
@click.option('--crop_y', type=int, default=256, help='Vertical random crop size')
@click.option('--batch-size', type=int, default=16)
@click.option('--alpha', type=float, default=1.0, help='Coefficient for classification loss term')
@click.option('--epochs', type=int, default=20, help='Number of training epochs to run')
@click.option('--gpu', type=str, default='0', help='List of GPUs for parallel training, e.g. 0,1,2,3')
@click.option('--start-lr', type=float, default=0.001)
@click.option('--milestones', type=str, default='10,20,30', help='Milestones for LR decreasing')
def train(data_path, models_path, backend, snapshot, crop_x, crop_y, batch_size, alpha, epochs, start_lr,
          milestones, gpu):
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu
    net, starting_epoch = build_network(snapshot, backend)
    data_path = os.path.abspath(os.path.expanduser(data_path))
    models_path = os.path.abspath(os.path.expanduser(models_path))
    os.makedirs(models_path, exist_ok=True)

    # Placeholders, left unset as in the source: supply a DataLoader that yields
    # (x, y, y_cls) batches (images, segmentation maps, per-image class indicators).
    train_loader, class_weights, n_images = None, None, None

    optimizer = optim.Adam(net.parameters(), lr=start_lr)
    scheduler = MultiStepLR(optimizer, milestones=[int(x) for x in milestones.split(',')])

    for epoch in range(starting_epoch, starting_epoch + epochs):
        seg_criterion = nn.NLLLoss(weight=class_weights)  # replaces the removed NLLLoss2d
        cls_criterion = nn.BCEWithLogitsLoss(weight=class_weights)
        epoch_losses = []
        train_iterator = tqdm(train_loader, total=n_images // batch_size + 1)
        net.train()
        for x, y, y_cls in train_iterator:
            optimizer.zero_grad()
            x, y, y_cls = x.cuda(), y.cuda(), y_cls.cuda().float()
            out, out_cls = net(x)
            seg_loss, cls_loss = seg_criterion(out, y), cls_criterion(out_cls, y_cls)
            # The paper weights the auxiliary loss by 0.4; pass --alpha 0.4 to match it.
            loss = seg_loss + alpha * cls_loss
            epoch_losses.append(loss.item())
            status = '[{0}] loss = {1:0.5f} avg = {2:0.5f}, LR = {3:0.7f}'.format(
                epoch + 1, loss.item(), np.mean(epoch_losses), scheduler.get_last_lr()[0])
            train_iterator.set_description(status)
            loss.backward()
            optimizer.step()
        scheduler.step()
        torch.save(net.state_dict(), os.path.join(models_path, '_'.join(["PSPNet", str(epoch + 1)])))
        train_loss = np.mean(epoch_losses)


if __name__ == '__main__':
    train()

3. Innovations and shortcomings
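3.1 Innovations
One of PSPNet's core innovations is the pyramid pooling module (PPM), which captures global and local context through pooling at several scales. The PPM divides the feature map into sub-regions of different sizes (for example 1×1, 2×2, 3×3 and 6×6 bins), pools each sub-region, then upsamples and fuses the resulting maps into a feature representation that carries multi-scale context. PSPNet also proposes an optimization strategy based on a deep-supervision loss: during training, an auxiliary classification loss is added to the segmentation loss of the main branch. The auxiliary branch is attached to an intermediate ResNet layer and uses its classification loss to improve that layer's features. This strategy speeds up convergence and strengthens the model's ability to learn global features.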
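As a rough illustration of the deep-supervision objective, the sketch below combines a per-pixel segmentation loss with a weighted auxiliary loss in plain Python. It assumes, as the training code in this post does, that the auxiliary branch is an image-level multi-label classifier trained with binary cross-entropy on logits; the function names are hypothetical and the 0.4 weight is the value quoted from the paper.

```python
import math


def pixel_nll(logits, target):
    """Negative log-likelihood of the target class for one pixel (stable log-softmax)."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[target]


def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def total_loss(pixel_logits, pixel_targets, class_logits, class_labels, aux_weight=0.4):
    """Main segmentation loss plus the down-weighted auxiliary classification loss."""
    seg = sum(pixel_nll(l, t) for l, t in zip(pixel_logits, pixel_targets)) / len(pixel_targets)
    cls = sum(bce_with_logits(z, y) for z, y in zip(class_logits, class_labels)) / len(class_labels)
    return seg + aux_weight * cls


# Two pixels and three classes, with uninformative (all-zero) logits everywhere.
loss = total_loss(
    pixel_logits=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    pixel_targets=[0, 2],
    class_logits=[0.0, 0.0, 0.0],
    class_labels=[1, 0, 1],
)
print(round(loss, 4))
```

With all-zero logits the segmentation term equals log 3 and the auxiliary term equals log 2, so the total is log 3 + 0.4·log 2; keeping the weight below 1 is what lets the segmentation term dominate the gradient.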
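3.2 Shortcomings
The pyramid pooling module captures multi-scale context and improves segmentation accuracy, but it also adds considerable computational cost. PSPNet therefore needs more compute for high-resolution images, and training and inference are relatively slow. In addition, although the module handles large-scale context well, PSPNet may perform worse on small objects than models optimized specifically for them, because a small object occupies too small a share of the global context and is easily overlooked.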
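A back-of-the-envelope count gives a feel for this cost. The sketch below tallies the parameters and multiply-accumulate operations (MACs) of the pyramid pooling module as configured in the PyTorch code in this post (2048 input channels, four levels, a 1024-channel bottleneck); the helper names are hypothetical and the 32×32 feature-map size assumes a 256×256 input.

```python
def conv_params(in_channels, out_channels, kernel=1, bias=True):
    """Number of learnable parameters in a 2D convolution."""
    weights = in_channels * out_channels * kernel * kernel
    return weights + (out_channels if bias else 0)


def conv_macs(in_channels, out_channels, height, width, kernel=1):
    """Multiply-accumulates for one forward pass over a height x width output."""
    return in_channels * out_channels * kernel * kernel * height * width


def ppm_params(features=2048, out_features=1024, sizes=(1, 2, 3, 6)):
    """Return (parameters in the per-level 1x1 convs, parameters in the bottleneck)."""
    stage = sum(conv_params(features, features, bias=False) for _ in sizes)
    bottleneck = conv_params(features * (len(sizes) + 1), out_features)
    return stage, bottleneck


stage, bottleneck = ppm_params()
print("per-level convs:", stage)
print("bottleneck conv:", bottleneck)
print("bottleneck MACs at 32x32:", conv_macs(2048 * 5, 1024, 32, 32))
```

Under these assumptions the module adds about 27 million parameters, and the bottleneck convolution alone costs roughly 10.7 billion MACs on a 32×32 map. Because that cost grows with the number of spatial positions, doubling the input resolution in both dimensions quadruples it.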
References
Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, et al. Pyramid Scene Parsing Network.
Code source: https://github.com/Lextal/pspnet-pytorch
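Summary
PSPNet addresses two weaknesses of fully convolutional networks in semantic segmentation: segmentation that is not fine-grained enough and a failure to use contextual information. The input image first passes through a convolutional network (a fully convolutional network based on ResNet) for feature extraction. The extracted feature map goes through the pyramid pooling module, which produces four feature maps of different sizes; these are upsampled to the size of the module's input feature map and concatenated with it. The concatenated map then passes through a convolution and is used for per-pixel prediction. Although PSPNet performs well on semantic segmentation, its accuracy gain comes at the cost of higher computational complexity, and its results on small objects are weaker.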