
news 2026/4/21 13:18:43

Published: March 7, 2023
Paper: https://arxiv.org/abs/2303.03667
Code: https://github.com/JierunChen/FasterNet

FasterNet-t0 is 2.8×, 3.3×, and 2.4× faster than MobileViT-XXS on GPU, CPU, and ARM processors respectively, while being 2.9% more accurate. The large FasterNet-L reaches an impressive 83.5% top-1 accuracy, on par with the emerging Swin-B, with 36% higher inference throughput on GPU and 37% less compute time on CPU. According to the FasterNet authors, the core of the model is the PConv module. It reduces FLOPs by cutting redundant computation (like GhostNet, it assumes that convolutions contain redundancy), and it also lowers the memory access cost (MAC), because most of the input passes straight through to the output. As a result it achieves good latency: high FPS on GPU and the lowest latency on CPU and ARM devices. This post therefore takes a closer look at the design and implementation of PConv.

1. Paper information

1.1 Module design

Compared with regular and grouped convolution, PConv applies a dense (regular) convolution to only a small subset of the input channels and passes the remaining channels straight through to the output. This greatly reduces the amount of computation: for example, if the input channels are split into 4 parts, only one part is convolved and the other 3 go directly to the next layer. It also reduces the memory access cost: for example, with C_in = 400 and only a quarter of the channels convolved, the memory access is 100wh + 100wh = 200wh (input plus output), a quarter of the 800wh needed by a regular convolution.

The PConv implementation is shown below. It is simply split -> conv -> cat:

```python
class Partial_conv3(nn.Module):

    def __init__(self, dim, n_div, forward):
        super().__init__()
        self.dim_conv3 = dim // n_div
        self.dim_untouched = dim - self.dim_conv3
        self.partial_conv3 = nn.Conv2d(self.dim_conv3, self.dim_conv3, 3, 1, 1, bias=False)

        if forward == 'slicing':
            self.forward = self.forward_slicing
        elif forward == 'split_cat':
            self.forward = self.forward_split_cat
        else:
            raise NotImplementedError

    def forward_slicing(self, x: Tensor) -> Tensor:
        # only for inference
        x = x.clone()  # !!! Keep the original input intact for the residual connection later
        x[:, :self.dim_conv3, :, :] = self.partial_conv3(x[:, :self.dim_conv3, :, :])
        return x

    def forward_split_cat(self, x: Tensor) -> Tensor:
        # for training/inference
        x1, x2 = torch.split(x, [self.dim_conv3, self.dim_untouched], dim=1)
        x1 = self.partial_conv3(x1)
        x = torch.cat((x1, x2), 1)
        return x
```

The paper also discusses combining PConv with PWConv, or viewing the pair as a T-shaped Conv, but at the code level none of this touches the PConv class itself. PConv is simply combined with Conv1x1 inside the FasterNet Block, where the Conv1x1 provides the cross-channel information exchange.

1.2 Model structure

In the overall FasterNet architecture (see the architecture figure in the paper), PConv is only a small part. The authors combine PConv with Conv1x1, BN, ReLU and a residual connection to form the FasterNet Block, and the FasterNet Block is the main building block of the model. The model also borrows many design choices from ViT, such as PatchEmbed and the MLP; it just has no Transformer module.

PatchEmbed appears at the input layer of the model, and the MLP is simply the Conv1x1 + BN + ReLU + Conv1x1 that follows PConv.

The concrete configurations come in the versions t0, t1, t2, s, m and l. After the Embedding layer the data has already been downsampled to 1/4 resolution. Each subsequent stage (a stack of FasterNet Blocks) only extracts features, and the Merging layer that follows it (Conv2d + BN) performs the downsampling.

1.3 Structure comparison

Module performance comparison. The paper compares conv, grouped conv, depthwise separable conv (DWConv) and PConv. From one feature-map size to the next the number of elements halves (for example, 192x28x28 has half the elements of 96x56x56). Only DWConv's FLOPs halve with it; the FLOPs of the other methods do not decrease. Judging by FPS and latency, DWConv is the most cost-effective structure here and PConv comes second. Only on ARM (Cortex-A72, using a single thread) does PConv beat DWConv.

Note 1: with r = 1/4, PConv has the same FLOPs as a grouped convolution with 16 groups, but the memory access differs.
Note 2: the depthwise separable convolution consists of a fully grouped (depthwise) convolution with ksize 3 and as many groups as channels, which only exchanges spatial information, plus a pointwise convolution with ksize 1, which exchanges channel information.

By using each operator to fit (approximate) a regular Conv, the authors find that PConv gives the lowest loss. Because neither GConv nor PConv can exchange information across all channels, a PWConv is needed after them; for a like-for-like comparison, DWConv is also given a PWConv. The losses differ by only 0.001~0.002, which in practice is no difference at all (for reference, compare the output differences seen when fusing ddb_conv or RepConv).

Memory access cost comparison. Equation 2 in the paper is for PConv (h × w × 2c' + k² × c'² ≈ h × w × 2c') and Equation 3 is for regular conv (h × w × 2c + k² × c² ≈ h × w × 2c). Since c' is 1/4 of c, the memory access cost of PConv is about 1/4 that of conv. This assumes that the input and output channel counts are both c, hence the factor 2c; otherwise it would be (c_in + c_out).

1.4 Model results

In the overall comparison, FasterNet reaches the highest FPS on GPU and the lowest latency on CPU and ARM. The figures and tables in the paper show FasterNet achieving the best performance among both lightweight and heavyweight models.

2. Code implementation and analysis

2.1 PConv code

The PConv implementation, after simplification, is shown below. It is just a simple split/cat operation. In 2023 the author of this post made a similar attempt, replacing every conv with pconv, and did not get good training results.

```python
class Partial_conv3(nn.Module):

    def __init__(self, dim, n_div, forward):
        super().__init__()
        self.dim_conv3 = dim // n_div
        self.dim_untouched = dim - self.dim_conv3
        self.partial_conv3 = nn.Conv2d(self.dim_conv3, self.dim_conv3, 3, 1, 1, bias=False)

    def forward(self, x: Tensor) -> Tensor:
        # only for inference
        x = x.clone()  # !!! Keep the original input intact for the residual connection later
        x[:, :self.dim_conv3, :, :] = self.partial_conv3(x[:, :self.dim_conv3, :, :])
        return x
```

2.2 Faster Block code

The spatial_mixing object is the pconv layer, and the mlp object holds the non-pconv layers of the Faster Block. The forward code is:

```python
def forward(self, x: Tensor) -> Tensor:
    shortcut = x
    x = self.spatial_mixing(x)
    x = shortcut + self.drop_path(self.mlp(x))
    return x
```

The complete implementation is:

```python
class MLPBlock(nn.Module):

    def __init__(self,
                 dim,
                 n_div,
                 mlp_ratio,
                 drop_path,
                 layer_scale_init_value,
                 act_layer,
                 norm_layer,
                 pconv_fw_type):
        super().__init__()
        self.dim = dim
        self.mlp_ratio = mlp_ratio
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.n_div = n_div

        mlp_hidden_dim = int(dim * mlp_ratio)

        # Conv1x1 -> norm -> activation -> Conv1x1: the channel-mixing "MLP"
        mlp_layer: List[nn.Module] = [
            nn.Conv2d(dim, mlp_hidden_dim, 1, bias=False),
            norm_layer(mlp_hidden_dim),
            act_layer(),
            nn.Conv2d(mlp_hidden_dim, dim, 1, bias=False)
        ]
        self.mlp = nn.Sequential(*mlp_layer)

        self.spatial_mixing = Partial_conv3(dim, n_div, pconv_fw_type)

        if layer_scale_init_value > 0:
            self.layer_scale = nn.Parameter(
                layer_scale_init_value * torch.ones((dim)), requires_grad=True)
            self.forward = self.forward_layer_scale
        else:
            self.forward = self.forward

    def forward(self, x: Tensor) -> Tensor:
        shortcut = x
        x = self.spatial_mixing(x)
        x = shortcut + self.drop_path(self.mlp(x))
        return x

    def forward_layer_scale(self, x: Tensor) -> Tensor:
        shortcut = x
        x = self.spatial_mixing(x)
        x = shortcut + self.drop_path(
            self.layer_scale.unsqueeze(-1).unsqueeze(-1) * self.mlp(x))
        return x
```

There is also a BasicStage class, whose main job is to stack several MLPBlock (i.e. Faster Block) layers.

2.3 PatchEmbed and PatchMerging

PatchEmbed is similar to the image patching in ViT: it moves spatial information into the channel dimension. PatchMerging uses a strided conv to lower the feature-map resolution while increasing the number of channels.

```python
class PatchEmbed(nn.Module):

    def __init__(self, patch_size, patch_stride, in_chans, embed_dim, norm_layer):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size,
                              stride=patch_stride, bias=False)
        if norm_layer is not None:
            self.norm = norm_layer(embed_dim)
        else:
            self.norm = nn.Identity()

    def forward(self, x: Tensor) -> Tensor:
        x = self.norm(self.proj(x))
        return x


class PatchMerging(nn.Module):

    def __init__(self, patch_size2, patch_stride2, dim, norm_layer):
        super().__init__()
        self.reduction = nn.Conv2d(dim, 2 * dim, kernel_size=patch_size2,
                                   stride=patch_stride2, bias=False)
        if norm_layer is not None:
            self.norm = norm_layer(2 * dim)
        else:
            self.norm = nn.Identity()

    def forward(self, x: Tensor) -> Tensor:
        x = self.norm(self.reduction(x))
        return x
```

2.4 Model code

```python
class FasterNet(nn.Module):

    def __init__(self,
                 in_chans=3,
                 num_classes=1000,
                 embed_dim=96,
                 depths=(1, 2, 8, 2),
                 mlp_ratio=2.,
                 n_div=4,
                 patch_size=4,
                 patch_stride=4,
                 patch_size2=2,  # for subsequent layers
                 patch_stride2=2,
                 patch_norm=True,
                 feature_dim=1280,
                 drop_path_rate=0.1,
                 layer_scale_init_value=0,
                 norm_layer='BN',
                 act_layer='RELU',
                 fork_feat=False,
                 init_cfg=None,
                 pretrained=None,
                 pconv_fw_type='split_cat',
                 **kwargs):
        super().__init__()

        if norm_layer == 'BN':
            norm_layer = nn.BatchNorm2d
        else:
            raise NotImplementedError

        if act_layer == 'GELU':
            act_layer = nn.GELU
        elif act_layer == 'RELU':
            act_layer = partial(nn.ReLU, inplace=True)
        else:
            raise NotImplementedError

        if not fork_feat:
            self.num_classes = num_classes
        self.num_stages = len(depths)
        self.embed_dim = embed_dim
        self.patch_norm = patch_norm
        self.num_features = int(embed_dim * 2 ** (self.num_stages - 1))
        self.mlp_ratio = mlp_ratio
        self.depths = depths

        # split image into non-overlapping patches
        self.patch_embed = PatchEmbed(
            patch_size=patch_size,
            patch_stride=patch_stride,
            in_chans=in_chans,
            embed_dim=embed_dim,
            norm_layer=norm_layer if self.patch_norm else None
        )

        # stochastic depth decay rule
        dpr = [x.item()
               for x in torch.linspace(0, drop_path_rate, sum(depths))]

        # build layers
        stages_list = []
        for i_stage in range(self.num_stages):
            stage = BasicStage(dim=int(embed_dim * 2 ** i_stage),
                               n_div=n_div,
                               depth=depths[i_stage],
                               mlp_ratio=self.mlp_ratio,
                               drop_path=dpr[sum(depths[:i_stage]):sum(depths[:i_stage + 1])],
                               layer_scale_init_value=layer_scale_init_value,
                               norm_layer=norm_layer,
                               act_layer=act_layer,
                               pconv_fw_type=pconv_fw_type)
            stages_list.append(stage)

            # patch merging layer
            if i_stage < self.num_stages - 1:
                stages_list.append(
                    PatchMerging(patch_size2=patch_size2,
                                 patch_stride2=patch_stride2,
                                 dim=int(embed_dim * 2 ** i_stage),
                                 norm_layer=norm_layer)
                )

        self.stages = nn.Sequential(*stages_list)

        self.fork_feat = fork_feat

        if self.fork_feat:
            self.forward = self.forward_det
            # add a norm layer for each output
            self.out_indices = [0, 2, 4, 6]
            for i_emb, i_layer in enumerate(self.out_indices):
                if i_emb == 0 and os.environ.get('FORK_LAST3', None):
                    raise NotImplementedError
                else:
                    layer = norm_layer(int(embed_dim * 2 ** i_emb))
                layer_name = f'norm{i_layer}'
                self.add_module(layer_name, layer)
        else:
            self.forward = self.forward_cls
            # Classifier head
            self.avgpool_pre_head = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(self.num_features, feature_dim, 1, bias=False),
                act_layer()
            )
            self.head = nn.Linear(feature_dim, num_classes) \
                if num_classes > 0 else nn.Identity()

        self.apply(self.cls_init_weights)
        self.init_cfg = copy.deepcopy(init_cfg)
        if self.fork_feat and (self.init_cfg is not None or pretrained is not None):
            self.init_weights()

    def cls_init_weights(self, m):
        if isinstance(m, nn.Linear):
            trunc_normal_(m.weight, std=.02)
            if isinstance(m, nn.Linear) and m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, (nn.Conv1d, nn.Conv2d)):
            trunc_normal_(m.weight, std=.02)
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, (nn.LayerNorm, nn.GroupNorm)):
            nn.init.constant_(m.bias, 0)
            nn.init.constant_(m.weight, 1.0)

    # init for mmdetection by loading imagenet pre-trained weights
    def init_weights(self, pretrained=None):
        logger = get_root_logger()
        if self.init_cfg is None and pretrained is None:
            logger.warn(f'No pre-trained weights for '
                        f'{self.__class__.__name__}, '
                        f'training start from scratch')
            pass
        else:
            assert 'checkpoint' in self.init_cfg, f'Only support ' \
                                                  f'specify Pretrained in ' \
                                                  f'init_cfg in ' \
                                                  f'{self.__class__.__name__} '
            if self.init_cfg is not None:
                ckpt_path = self.init_cfg['checkpoint']
            elif pretrained is not None:
                ckpt_path = pretrained

            ckpt = _load_checkpoint(
                ckpt_path, logger=logger, map_location='cpu')
            if 'state_dict' in ckpt:
                _state_dict = ckpt['state_dict']
            elif 'model' in ckpt:
                _state_dict = ckpt['model']
            else:
                _state_dict = ckpt

            state_dict = _state_dict
            missing_keys, unexpected_keys = \
                self.load_state_dict(state_dict, False)

            # show for debug
            print('missing_keys: ', missing_keys)
            print('unexpected_keys: ', unexpected_keys)

    def forward_cls(self, x):
        # output only the features of last layer for image classification
        x = self.patch_embed(x)
        x = self.stages(x)
        x = self.avgpool_pre_head(x)  # B C 1 1
        x = torch.flatten(x, 1)
        x = self.head(x)
        return x

    def forward_det(self, x: Tensor) -> Tensor:
        # output the features of four stages for dense prediction
        x = self.patch_embed(x)
        outs = []
        for idx, stage in enumerate(self.stages):
            x = stage(x)
            if self.fork_feat and idx in self.out_indices:
                norm_layer = getattr(self, f'norm{idx}')
                x_out = norm_layer(x)
                outs.append(x_out)
        return outs
```

2.5 Complete model code

The complete model code is only used for the FLOPs analysis in 3.2. The file consists of the imports below, followed by Partial_conv3 (the full version from 1.1), MLPBlock (2.2), BasicStage (below), PatchEmbed and PatchMerging (2.3), and FasterNet (2.4), all unchanged from the listings above.

```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import torch
import torch.nn as nn
from timm.models.layers import DropPath, to_2tuple, trunc_normal_
from functools import partial
from typing import List
from torch import Tensor
import copy
import os

try:
    from mmdet.models.builder import BACKBONES as det_BACKBONES
    from mmdet.utils import get_root_logger
    from mmcv.runner import _load_checkpoint
    has_mmdet = True
except ImportError:
    print("If for detection, please install mmdetection first")
    has_mmdet = False


class BasicStage(nn.Module):

    def __init__(self,
                 dim,
                 depth,
                 n_div,
                 mlp_ratio,
                 drop_path,
                 layer_scale_init_value,
                 norm_layer,
                 act_layer,
                 pconv_fw_type):
        super().__init__()

        # Stack `depth` MLPBlocks, each with its own drop-path rate
        blocks_list = [
            MLPBlock(
                dim=dim,
                n_div=n_div,
                mlp_ratio=mlp_ratio,
                drop_path=drop_path[i],
                layer_scale_init_value=layer_scale_init_value,
                norm_layer=norm_layer,
                act_layer=act_layer,
                pconv_fw_type=pconv_fw_type
            )
            for i in range(depth)
        ]
        self.blocks = nn.Sequential(*blocks_list)

    def forward(self, x: Tensor) -> Tensor:
        x = self.blocks(x)
        return x
```

3. Analysis

3.1 Can PConv replace Conv?

No. It is only a drop-in substitute for conv when C_in equals C_out, and it only exchanges local spatial information on part of the channels. Most channels are wired straight to the output, so input data ends up being passed directly into the deep layers of the network. Dense, fully connected convolution layers are therefore still needed for cross-channel information exchange. Across the paper's experiments there is also no comparison in which the pconv in FasterNet is replaced by a regular Conv. FasterNet's advantage may come purely from its architecture, especially the PatchEmbed on the input, which reduces the spatial size to 1/16 of the original; in other words, using Conv instead of pconv might still come out ahead in accuracy and latency. Likewise, there is no equivalent comparison for PWConv: replacing the pconv in FasterNet with PWConv might bring yet another performance gain. After all, in the authors' experiments PWConv has the edge over pconv in GPU inference speed, and its fitting ability is on par with pconv.

3.2 FLOPs distribution in FasterNet

The following code builds a minimal FasterNet model and prints the FLOPs of every layer:

```python
if __name__ == "__main__":
    model = FasterNet(
        depths=(1, 1, 1, 1),
    )
    from fvcore.nn import flop_count_table, FlopCountAnalysis, ActivationCountAnalysis
    x = torch.randn(1, 3, 256, 256)
    print(f'params: {sum(map(lambda x: x.numel(), model.parameters()))}')
    print(flop_count_table(FlopCountAnalysis(model, x),
                           activations=ActivationCountAnalysis(model, x)))
    output = model(x)
    print(output.shape)
```

The output is shown below. Within the key module, the FasterBlock, the bulk of the FLOPs sits in blocks.0.mlp; spatial_mixing.partial_conv3 (the pconv) accounts for only about 12% of the block's computation, at 21.234M FLOPs.

```
| module                                            | #parameters or shape   | #flops     | #activations   |
|:--------------------------------------------------|:-----------------------|:-----------|:---------------|
| model                                             | 7.4M                   | 0.948G     | 3.136M         |
|  patch_embed                                      |  4.8K                  |  20.84M    |  0.393M        |
|   patch_embed.proj                                |   4.608K               |   18.874M  |   0.393M       |
|    patch_embed.proj.weight                        |    (96, 3, 4, 4)       |            |                |
|   patch_embed.norm                                |   0.192K               |   1.966M   |   0            |
|    patch_embed.norm.weight                        |    (96,)               |            |                |
|    patch_embed.norm.bias                          |    (96,)               |            |                |
|  stages                                           |  5.131M                |  0.924G    |  2.74M         |
|   stages.0.blocks.0                               |   42.432K              |   0.176G   |   1.278M       |
|    stages.0.blocks.0.mlp                          |    37.248K             |    0.155G  |    1.18M       |
|    stages.0.blocks.0.spatial_mixing.partial_conv3 |    5.184K              |    21.234M |    98.304K     |
|   stages.1                                        |   74.112K              |   76.481M  |   0.197M       |
|    stages.1.reduction                             |    73.728K             |    75.497M |    0.197M      |
|    stages.1.norm                                  |    0.384K              |    0.983M  |    0           |
|   stages.2.blocks.0                               |   0.169M               |   0.174G   |   0.639M       |
|    stages.2.blocks.0.mlp                          |    0.148M              |    0.153G  |    0.59M       |
|    stages.2.blocks.0.spatial_mixing.partial_conv3 |    20.736K             |    21.234M |    49.152K     |
|   stages.3                                        |   0.296M               |   75.989M  |   98.304K      |
|    stages.3.reduction                             |    0.295M              |    75.497M |    98.304K     |
|    stages.3.norm                                  |    0.768K              |    0.492M  |    0           |
|   stages.4.blocks.0                               |   0.674M               |   0.173G   |   0.319M       |
|    stages.4.blocks.0.mlp                          |    0.591M              |    0.152G  |    0.295M      |
|    stages.4.blocks.0.spatial_mixing.partial_conv3 |    82.944K             |    21.234M |    24.576K     |
|   stages.5                                        |   1.181M               |   75.743M  |   49.152K      |
|    stages.5.reduction                             |    1.18M               |    75.497M |    49.152K     |
|    stages.5.norm                                  |    1.536K              |    0.246M  |    0           |
|   stages.6.blocks.0                               |   2.694M               |   0.173G   |   0.16M        |
|    stages.6.blocks.0.mlp                          |    2.362M              |    0.151G  |    0.147M      |
|    stages.6.blocks.0.spatial_mixing.partial_conv3 |    0.332M              |    21.234M |    12.288K     |
|  avgpool_pre_head                                 |  0.983M                |  1.032M    |  1.28K         |
|   avgpool_pre_head.1                              |   0.983M               |   0.983M   |   1.28K        |
|    avgpool_pre_head.1.weight                      |    (1280, 768, 1, 1)   |            |                |
|   avgpool_pre_head.0                              |                        |   49.152K  |   0            |
|  head                                             |  1.281M                |  1.28M     |  1K            |
|   head.weight                                     |   (1000, 1280)         |            |                |
|   head.bias                                       |   (1000,)              |            |                |
```

3.3 FLOPs change when PConv is replaced by Conv

Replace the original Partial_conv3 class with the following code:

```python
class Partial_conv3(nn.Module):

    def __init__(self, dim, n_div, forward):
        super().__init__()
        # Regular 3x3 conv over all channels; n_div and forward are unused here
        self.conv = nn.Conv2d(dim, dim, 3, 1, 1, bias=False)

    def forward(self, x: Tensor) -> Tensor:
        # only for inference
        x = x.clone()  # !!! Keep the original input intact for the residual connection later
        x = self.conv(x)
        return x
```

After re-running the script from 3.2, the total is 2.222G FLOPs, about 2.3 times the original 0.948G. In the new FasterBlock, spatial_mixing.conv accounts for roughly 69% of the block's FLOPs at 0.34G, which is 16 times the original 21M.

```
| module                                   | #parameters or shape   | #flops     | #activations   |
|:-----------------------------------------|:-----------------------|:-----------|:---------------|
| model                                    | 14.009M                | 2.222G     | 3.689M         |
|  patch_embed                             |  4.8K                  |  20.84M    |  0.393M        |
|   patch_embed.proj                       |   4.608K               |   18.874M  |   0.393M       |
|    patch_embed.proj.weight               |    (96, 3, 4, 4)       |            |                |
|   patch_embed.norm                       |   0.192K               |   1.966M   |   0            |
|    patch_embed.norm.weight               |    (96,)               |            |                |
|    patch_embed.norm.bias                 |    (96,)               |            |                |
|  stages                                  |  11.74M                |  2.199G    |  3.293M        |
|   stages.0.blocks.0                      |   0.12M                |   0.495G   |   1.573M       |
|    stages.0.blocks.0.mlp                 |    37.248K             |    0.155G  |    1.18M       |
|    stages.0.blocks.0.spatial_mixing.conv |    82.944K             |    0.34G   |    0.393M      |
|   stages.1                               |   74.112K              |   76.481M  |   0.197M       |
|    stages.1.reduction                    |    73.728K             |    75.497M |    0.197M      |
|    stages.1.norm                         |    0.384K              |    0.983M  |    0           |
|   stages.2.blocks.0                      |   0.48M                |   0.493G   |   0.786M       |
|    stages.2.blocks.0.mlp                 |    0.148M              |    0.153G  |    0.59M       |
|    stages.2.blocks.0.spatial_mixing.conv |    0.332M              |    0.34G   |    0.197M      |
|   stages.3                               |   0.296M               |   75.989M  |   98.304K      |
|    stages.3.reduction                    |    0.295M              |    75.497M |    98.304K     |
|    stages.3.norm                         |    0.768K              |    0.492M  |    0           |
|   stages.4.blocks.0                      |   1.918M               |   0.492G   |   0.393M       |
|    stages.4.blocks.0.mlp                 |    0.591M              |    0.152G  |    0.295M      |
|    stages.4.blocks.0.spatial_mixing.conv |    1.327M              |    0.34G   |    98.304K     |
|   stages.5                               |   1.181M               |   75.743M  |   49.152K      |
|    stages.5.reduction                    |    1.18M               |    75.497M |    49.152K     |
|    stages.5.norm                         |    1.536K              |    0.246M  |    0           |
|   stages.6.blocks.0                      |   7.671M               |   0.491G   |   0.197M       |
|    stages.6.blocks.0.mlp                 |    2.362M              |    0.151G  |    0.147M      |
|    stages.6.blocks.0.spatial_mixing.conv |    5.308M              |    0.34G   |    49.152K     |
|  avgpool_pre_head                        |  0.983M                |  1.032M    |  1.28K         |
|   avgpool_pre_head.1                     |   0.983M               |   0.983M   |   1.28K        |
|    avgpool_pre_head.1.weight             |    (1280, 768, 1, 1)   |            |                |
|   avgpool_pre_head.0                     |                        |   49.152K  |   0            |
|  head                                    |  1.281M                |  1.28M     |  1K            |
|   head.weight                            |   (1000, 1280)         |            |                |
|   head.bias                              |   (1000,)              |            |                |
torch.Size([1, 1000])
```

3.4 Overall conclusion

Based on the analysis in 3.1-3.3, we cannot simply replace every conv layer in a model with pconv, but we can use it to replace individual conv layers with particularly large FLOPs. pconv is only one option for approximating conv. It works within the FasterNet architecture; dropped directly into other models it is bound to fit poorly, and it needs extra PWConv layers for information exchange. Even so, FasterNet gives us a strong backbone that, per the paper's results, reaches the fastest speed at the best accuracy among both lightweight and heavyweight models, and it can be used for image classification and object detection. In our own experiments we might try replacing the Pconv in FasterNet with DWConv, which may improve the backbone further. After all, the authors did not run this comparison, and it is also possible that they found Pconv to be worse than DWConv and left that part of the experimental data out.
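
As a rough cross-check of the FLOPs figures in 3.2 and 3.3 and of the memory-access argument in 1.1, the sketch below evaluates the paper's formulas for stage 0 of the toy model (256x256 input, 4x4 patch embedding, 96 channels, n_div = 4). The function names are my own and are not part of the FasterNet repository. FLOPs are counted as multiply-accumulates, which is an assumption about how fvcore counts convolutions that happens to agree with the tables above.

```python
def conv_flops(h, w, c, k=3):
    """Multiply-accumulates of a regular k x k conv with c input and c output channels."""
    return h * w * k * k * c * c


def pconv_flops(h, w, c, n_div=4, k=3):
    """Same count for PConv: only c // n_div channels are convolved."""
    c_p = c // n_div
    return h * w * k * k * c_p * c_p


def conv_mem(h, w, c, k=3):
    """Approximate memory access of a regular conv: input + output maps plus weights."""
    return h * w * 2 * c + k * k * c * c


def pconv_mem(h, w, c, n_div=4, k=3):
    """Approximate memory access of PConv: only the convolved channels are touched."""
    c_p = c // n_div
    return h * w * 2 * c_p + k * k * c_p * c_p


# Stage 0 of the toy model: 256 / 4 = 64, so a 64x64 map with 96 channels.
h = w = 256 // 4
c = 96

print("pconv FLOPs:", pconv_flops(h, w, c))   # 21233664, i.e. the 21.234M in the first table
print("conv  FLOPs:", conv_flops(h, w, c))    # 339738624, i.e. the 0.34G in the second table
print("FLOPs ratio:", conv_flops(h, w, c) / pconv_flops(h, w, c))  # 16.0
print("memory ratio:", conv_mem(h, w, c) / pconv_mem(h, w, c))     # a little above 4
```

The memory ratio comes out slightly above 4 rather than exactly 4 because the weight term shrinks by 1/16 while the feature-map term shrinks by 1/4; the "1/4" quoted in the text is the approximation that ignores the weights.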
