1. Introduction

This post presents a backbone improvement: a feature-extraction network named EfficientViT (it shares its name with a network covered earlier, but is a different model). Its core idea is to make Vision Transformers efficient at high-resolution vision tasks, using two novel architectural building blocks: a sandwich layout and a cascaded group attention module. It is a highly efficient feature extractor: training is very fast, inference is faster than the baseline backbone, and it clearly outperforms earlier lightweight models such as MobileNetV3. Welcome to subscribe to this column; it is updated with 3-5 new mechanisms every week and includes all my modified files plus access to the discussion group.
Welcome to subscribe to my column and study YOLO together!

Column index: YOLOv8 Improvement Series Directory | over a hundred mechanisms covering convolutions, backbones, detection heads, attention, and neck designs

Column review: YOLOv8 Improvement Series — continuously reproducing top-conference work — essential for research

Contents
1. Introduction
2. How EfficientViT Works
2.1 The Basic Principle of EfficientViT
3. Core Code of EfficientViT
4. Step-by-Step: Adding EfficientViT
4.1 Modification 1
4.2 Modification 2
4.3 Modification 3
4.4 Modification 4
4.5 Modification 5
4.6 Modification 6
4.7 Modification 7
4.8 Modification 8
Note: an extra modification
Modification 8 (extra)
Caveats
5. The EfficientViT YAML File
5.1 Training Script
6. A Successful Run
7. Summary

2. How EfficientViT Works

Paper: official paper link
Code: official code link

2.1 The Basic Principle of EfficientViT
The basic principle of EfficientViT is to make Vision Transformers efficient at high-resolution vision tasks. It relies on two novel architectural building blocks:

1. Sandwich layout: a single memory-bound multi-head self-attention (MHSA) layer is placed between feed-forward network (FFN) layers, improving memory efficiency.

2. Cascaded group attention: different slices of the feature map are fed to different attention heads, which reduces computational redundancy and increases attention diversity.
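The sandwich idea can be sketched in a few lines of PyTorch. The block below is an illustrative toy, not the paper's implementation: the module names are mine, and a plain `nn.MultiheadAttention` stands in for the paper's memory-efficient MHSA. The point is the structure — a single attention layer wrapped by FFNs on both sides:

```python
import torch
import torch.nn as nn

class ToyFFN(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(dim, hidden, 1), nn.ReLU(), nn.Conv2d(hidden, dim, 1))
    def forward(self, x):
        return x + self.net(x)  # residual FFN

class SandwichBlock(nn.Module):
    """FFN -> single MHSA -> FFN: attention appears only once per block."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.ffn0 = ToyFFN(dim, dim * 2)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn1 = ToyFFN(dim, dim * 2)
    def forward(self, x):                      # x: (B, C, H, W)
        x = self.ffn0(x)
        B, C, H, W = x.shape
        t = x.flatten(2).transpose(1, 2)       # (B, H*W, C) tokens
        t = t + self.attn(t, t, t)[0]          # the single attention layer
        x = t.transpose(1, 2).reshape(B, C, H, W)
        return self.ffn1(x)

y = SandwichBlock(32)(torch.randn(2, 32, 8, 8))
print(y.shape)  # torch.Size([2, 32, 8, 8])
```

Because the expensive attention layer appears only once per block while the cheap FFNs appear twice, memory traffic per block drops relative to a standard Transformer block.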
The figure below shows the overall architecture of EfficientViT and its key components:

(a) Architecture overview: the network is organized into three stages, each containing several EfficientViT blocks; as the stages progress, the spatial resolution of the feature maps shrinks while the channel count grows.

(b) Sandwich layout block: the internal structure of an EfficientViT block, in which the self-attention layer (green) is sandwiched between two feed-forward network (FFN) layers.

(c) Cascaded group attention: a novel attention mechanism that splits the input features into parts and feeds each part to a different attention head.

3. Core Code of EfficientViT
See Section 4 for how to use this code.
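Before the full implementation, the cascade idea in (c) can be shown in isolation. The toy module below is my simplification (1x1-conv qkv, no depthwise conv on the query, no attention biases), not the code that follows; it only demonstrates the channel split and the head-to-head cascade:

```python
import torch
import torch.nn as nn

class ToyCascadedAttention(nn.Module):
    """Split channels across heads; each head also receives the previous head's output."""
    def __init__(self, dim, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        d = dim // heads
        self.qkv = nn.ModuleList(nn.Conv2d(d, 3 * d, 1) for _ in range(heads))
        self.proj = nn.Conv2d(dim, dim, 1)
    def forward(self, x):                        # x: (B, C, H, W)
        B, C, H, W = x.shape
        chunks = x.chunk(self.heads, dim=1)      # one channel slice per head
        outs, feat = [], chunks[0]
        for i, qkv in enumerate(self.qkv):
            if i > 0:                            # cascade: add the previous head's output
                feat = feat + chunks[i]
            q, k, v = qkv(feat).chunk(3, dim=1)
            q, k, v = (t.flatten(2) for t in (q, k, v))    # (B, d, N)
            attn = (q.transpose(-2, -1) @ k) * q.shape[1] ** -0.5
            feat = (v @ attn.softmax(-1).transpose(-2, -1)).view(B, -1, H, W)
            outs.append(feat)
        return self.proj(torch.cat(outs, dim=1))

y = ToyCascadedAttention(32)(torch.randn(2, 32, 8, 8))
print(y.shape)  # torch.Size([2, 32, 8, 8])
```

Each head sees only `dim / heads` channels, so the qkv projections are cheap, and feeding head i's output into head i+1 lets later heads refine earlier ones instead of recomputing similar attention maps.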
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.checkpoint as checkpoint
import itertools
from timm.models.layers import SqueezeExcite
import numpy as np
__all__ = ['EfficientViT_M0', 'EfficientViT_M1', 'EfficientViT_M2',
           'EfficientViT_M3', 'EfficientViT_M4', 'EfficientViT_M5']


class Conv2d_BN(torch.nn.Sequential):
    def __init__(self, a, b, ks=1, stride=1, pad=0, dilation=1,
                 groups=1, bn_weight_init=1, resolution=-10000):
        super().__init__()
        self.add_module('c', torch.nn.Conv2d(a, b, ks, stride, pad, dilation, groups, bias=False))
        self.add_module('bn', torch.nn.BatchNorm2d(b))
        torch.nn.init.constant_(self.bn.weight, bn_weight_init)
        torch.nn.init.constant_(self.bn.bias, 0)

    @torch.no_grad()
    def switch_to_deploy(self):
        c, bn = self._modules.values()
        w = bn.weight / (bn.running_var + bn.eps) ** 0.5
        w = c.weight * w[:, None, None, None]
        b = bn.bias - bn.running_mean * bn.weight / (bn.running_var + bn.eps) ** 0.5
        m = torch.nn.Conv2d(w.size(1) * self.c.groups, w.size(0), w.shape[2:],
                            stride=self.c.stride, padding=self.c.padding,
                            dilation=self.c.dilation, groups=self.c.groups)
        m.weight.data.copy_(w)
        m.bias.data.copy_(b)
        return m


def replace_batchnorm(net):
    for child_name, child in net.named_children():
        if hasattr(child, 'fuse'):
            setattr(net, child_name, child.fuse())
        elif isinstance(child, torch.nn.BatchNorm2d):
            setattr(net, child_name, torch.nn.Identity())
        else:
            replace_batchnorm(child)


class PatchMerging(torch.nn.Module):
    def __init__(self, dim, out_dim, input_resolution):
        super().__init__()
        hid_dim = int(dim * 4)
        self.conv1 = Conv2d_BN(dim, hid_dim, 1, 1, 0, resolution=input_resolution)
        self.act = torch.nn.ReLU()
        self.conv2 = Conv2d_BN(hid_dim, hid_dim, 3, 2, 1, groups=hid_dim, resolution=input_resolution)
        self.se = SqueezeExcite(hid_dim, .25)
        self.conv3 = Conv2d_BN(hid_dim, out_dim, 1, 1, 0, resolution=input_resolution // 2)

    def forward(self, x):
        x = self.conv3(self.se(self.act(self.conv2(self.act(self.conv1(x))))))
        return x


class Residual(torch.nn.Module):
    def __init__(self, m, drop=0.):
        super().__init__()
        self.m = m
        self.drop = drop

    def forward(self, x):
        if self.training and self.drop > 0:
            return x + self.m(x) * torch.rand(x.size(0), 1, 1, 1,
                                              device=x.device).ge_(self.drop).div(1 - self.drop).detach()
        else:
            return x + self.m(x)


class FFN(torch.nn.Module):
    def __init__(self, ed, h, resolution):
        super().__init__()
        self.pw1 = Conv2d_BN(ed, h, resolution=resolution)
        self.act = torch.nn.ReLU()
        self.pw2 = Conv2d_BN(h, ed, bn_weight_init=0, resolution=resolution)

    def forward(self, x):
        x = self.pw2(self.act(self.pw1(x)))
        return x


class CascadedGroupAttention(torch.nn.Module):
    r"""Cascaded Group Attention.

    Args:
        dim (int): Number of input channels.
        key_dim (int): The dimension for query and key.
        num_heads (int): Number of attention heads.
        attn_ratio (int): Multiplier for the query dim for value dimension.
        resolution (int): Input resolution, corresponds to the window size.
        kernels (List[int]): The kernel size of the dw conv on query.
    """

    def __init__(self, dim, key_dim, num_heads=8,
                 attn_ratio=4,
                 resolution=14,
                 kernels=[5, 5, 5, 5], ):
        super().__init__()
        self.num_heads = num_heads
        self.scale = key_dim ** -0.5
        self.key_dim = key_dim
        self.d = int(attn_ratio * key_dim)
        self.attn_ratio = attn_ratio

        qkvs = []
        dws = []
        for i in range(num_heads):
            qkvs.append(Conv2d_BN(dim // (num_heads), self.key_dim * 2 + self.d, resolution=resolution))
            dws.append(Conv2d_BN(self.key_dim, self.key_dim, kernels[i], 1, kernels[i] // 2,
                                 groups=self.key_dim, resolution=resolution))
        self.qkvs = torch.nn.ModuleList(qkvs)
        self.dws = torch.nn.ModuleList(dws)
        self.proj = torch.nn.Sequential(
            torch.nn.ReLU(),
            Conv2d_BN(self.d * num_heads, dim, bn_weight_init=0, resolution=resolution))

        points = list(itertools.product(range(resolution), range(resolution)))
        N = len(points)
        attention_offsets = {}
        idxs = []
        for p1 in points:
            for p2 in points:
                offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1]))
                if offset not in attention_offsets:
                    attention_offsets[offset] = len(attention_offsets)
                idxs.append(attention_offsets[offset])
        self.attention_biases = torch.nn.Parameter(torch.zeros(num_heads, len(attention_offsets)))
        self.register_buffer('attention_bias_idxs', torch.LongTensor(idxs).view(N, N))

    @torch.no_grad()
    def train(self, mode=True):
        super().train(mode)
        if mode and hasattr(self, 'ab'):
            del self.ab
        else:
            self.ab = self.attention_biases[:, self.attention_bias_idxs]

    def forward(self, x):  # x (B,C,H,W)
        B, C, H, W = x.shape
        trainingab = self.attention_biases[:, self.attention_bias_idxs]
        feats_in = x.chunk(len(self.qkvs), dim=1)
        feats_out = []
        feat = feats_in[0]
        for i, qkv in enumerate(self.qkvs):
            if i > 0:  # add the previous output to the input
                feat = feat + feats_in[i]
            feat = qkv(feat)
            q, k, v = feat.view(B, -1, H, W).split([self.key_dim, self.key_dim, self.d], dim=1)  # B, C/h, H, W
            q = self.dws[i](q)
            q, k, v = q.flatten(2), k.flatten(2), v.flatten(2)  # B, C/h, N
            attn = ((q.transpose(-2, -1) @ k) * self.scale
                    + (trainingab[i] if self.training else self.ab[i]))
            attn = attn.softmax(dim=-1)  # BNN
            feat = (v @ attn.transpose(-2, -1)).view(B, self.d, H, W)  # BCHW
            feats_out.append(feat)
        x = self.proj(torch.cat(feats_out, 1))
        return x


class LocalWindowAttention(torch.nn.Module):
    r"""Local Window Attention.

    Args:
        dim (int): Number of input channels.
        key_dim (int): The dimension for query and key.
        num_heads (int): Number of attention heads.
        attn_ratio (int): Multiplier for the query dim for value dimension.
        resolution (int): Input resolution.
        window_resolution (int): Local window resolution.
        kernels (List[int]): The kernel size of the dw conv on query.
    """

    def __init__(self, dim, key_dim, num_heads=8,
                 attn_ratio=4,
                 resolution=14,
                 window_resolution=7,
                 kernels=[5, 5, 5, 5], ):
        super().__init__()
        self.dim = dim
        self.num_heads = num_heads
        self.resolution = resolution
        assert window_resolution > 0, 'window_size must be greater than 0'
        self.window_resolution = window_resolution
        self.attn = CascadedGroupAttention(dim, key_dim, num_heads,
                                           attn_ratio=attn_ratio,
                                           resolution=window_resolution,
                                           kernels=kernels, )

    def forward(self, x):
        B, C, H, W = x.shape
        if H <= self.window_resolution and W <= self.window_resolution:
            x = self.attn(x)
        else:
            x = x.permute(0, 2, 3, 1)
            pad_b = (self.window_resolution - H % self.window_resolution) % self.window_resolution
            pad_r = (self.window_resolution - W % self.window_resolution) % self.window_resolution
            padding = pad_b > 0 or pad_r > 0
            if padding:
                x = torch.nn.functional.pad(x, (0, 0, 0, pad_r, 0, pad_b))
            pH, pW = H + pad_b, W + pad_r
            nH = pH // self.window_resolution
            nW = pW // self.window_resolution
            # window partition, BHWC -> B(nHh)(nWw)C -> BnHnWhwC -> (BnHnW)hwC -> (BnHnW)Chw
            x = x.view(B, nH, self.window_resolution, nW, self.window_resolution, C).transpose(2, 3).reshape(
                B * nH * nW, self.window_resolution, self.window_resolution, C).permute(0, 3, 1, 2)
            x = self.attn(x)
            # window reverse, (BnHnW)Chw -> (BnHnW)hwC -> BnHnWhwC -> B(nHh)(nWw)C -> BHWC
            x = x.permute(0, 2, 3, 1).view(B, nH, nW, self.window_resolution, self.window_resolution,
                                           C).transpose(2, 3).reshape(B, pH, pW, C)
            if padding:
                x = x[:, :H, :W].contiguous()
            x = x.permute(0, 3, 1, 2)
        return x


class EfficientViTBlock(torch.nn.Module):
    """A basic EfficientViT building block.

    Args:
        type (str): Type for token mixer. Default: 's' for self-attention.
        ed (int): Number of input channels.
        kd (int): Dimension for query and key in the token mixer.
        nh (int): Number of attention heads.
        ar (int): Multiplier for the query dim for value dimension.
        resolution (int): Input resolution.
        window_resolution (int): Local window resolution.
        kernels (List[int]): The kernel size of the dw conv on query.
    """

    def __init__(self, type,
                 ed, kd, nh=8,
                 ar=4,
                 resolution=14,
                 window_resolution=7,
                 kernels=[5, 5, 5, 5], ):
        super().__init__()
        self.dw0 = Residual(Conv2d_BN(ed, ed, 3, 1, 1, groups=ed, bn_weight_init=0., resolution=resolution))
        self.ffn0 = Residual(FFN(ed, int(ed * 2), resolution))
        if type == 's':
            self.mixer = Residual(LocalWindowAttention(ed, kd, nh, attn_ratio=ar,
                                                       resolution=resolution,
                                                       window_resolution=window_resolution,
                                                       kernels=kernels))
        self.dw1 = Residual(Conv2d_BN(ed, ed, 3, 1, 1, groups=ed, bn_weight_init=0., resolution=resolution))
        self.ffn1 = Residual(FFN(ed, int(ed * 2), resolution))

    def forward(self, x):
        return self.ffn1(self.dw1(self.mixer(self.ffn0(self.dw0(x)))))


class EfficientViT(torch.nn.Module):
    def __init__(self, img_size=400,
                 patch_size=16,
                 frozen_stages=0,
                 in_chans=3,
                 stages=['s', 's', 's'],
                 embed_dim=[64, 128, 192],
                 key_dim=[16, 16, 16],
                 depth=[1, 2, 3],
                 num_heads=[4, 4, 4],
                 window_size=[7, 7, 7],
                 kernels=[5, 5, 5, 5],
                 down_ops=[['subsample', 2], ['subsample', 2], ['']],
                 pretrained=None,
                 distillation=False, ):
        super().__init__()

        resolution = img_size
        self.patch_embed = torch.nn.Sequential(
            Conv2d_BN(in_chans, embed_dim[0] // 8, 3, 2, 1, resolution=resolution),
            torch.nn.ReLU(),
            Conv2d_BN(embed_dim[0] // 8, embed_dim[0] // 4, 3, 2, 1, resolution=resolution // 2),
            torch.nn.ReLU(),
            Conv2d_BN(embed_dim[0] // 4, embed_dim[0] // 2, 3, 2, 1, resolution=resolution // 4),
            torch.nn.ReLU(),
            Conv2d_BN(embed_dim[0] // 2, embed_dim[0], 3, 1, 1, resolution=resolution // 8))

        resolution = img_size // patch_size
        attn_ratio = [embed_dim[i] / (key_dim[i] * num_heads[i]) for i in range(len(embed_dim))]
        self.blocks1 = []
        self.blocks2 = []
        self.blocks3 = []
        for i, (stg, ed, kd, dpth, nh, ar, wd, do) in enumerate(
                zip(stages, embed_dim, key_dim, depth, num_heads, attn_ratio, window_size, down_ops)):
            for d in range(dpth):
                eval('self.blocks' + str(i + 1)).append(
                    EfficientViTBlock(stg, ed, kd, nh, ar, resolution, wd, kernels))
            if do[0] == 'subsample':
                # ('Subsample' stride)
                blk = eval('self.blocks' + str(i + 2))
                resolution_ = (resolution - 1) // do[1] + 1
                blk.append(torch.nn.Sequential(
                    Residual(Conv2d_BN(embed_dim[i], embed_dim[i], 3, 1, 1, groups=embed_dim[i],
                                       resolution=resolution)),
                    Residual(FFN(embed_dim[i], int(embed_dim[i] * 2), resolution)), ))
                blk.append(PatchMerging(*embed_dim[i:i + 2], resolution))
                resolution = resolution_
                blk.append(torch.nn.Sequential(
                    Residual(Conv2d_BN(embed_dim[i + 1], embed_dim[i + 1], 3, 1, 1, groups=embed_dim[i + 1],
                                       resolution=resolution)),
                    Residual(FFN(embed_dim[i + 1], int(embed_dim[i + 1] * 2), resolution)), ))
        self.blocks1 = torch.nn.Sequential(*self.blocks1)
        self.blocks2 = torch.nn.Sequential(*self.blocks2)
        self.blocks3 = torch.nn.Sequential(*self.blocks3)

        self.width_list = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]

    def forward(self, x):
        outs = []
        x = self.patch_embed(x)
        outs.append(x)
        x = self.blocks1(x)
        outs.append(x)
        x = self.blocks2(x)
        outs.append(x)
        x = self.blocks3(x)
        outs.append(x)
        return outs


EfficientViT_m0 = {
    'img_size': 224, 'patch_size': 16,
    'embed_dim': [64, 128, 192], 'depth': [1, 2, 3], 'num_heads': [4, 4, 4],
    'window_size': [7, 7, 7], 'kernels': [7, 5, 3, 3],
}

EfficientViT_m1 = {
    'img_size': 224, 'patch_size': 16,
    'embed_dim': [128, 144, 192], 'depth': [1, 2, 3], 'num_heads': [2, 3, 3],
    'window_size': [7, 7, 7], 'kernels': [7, 5, 3, 3],
}

EfficientViT_m2 = {
    'img_size': 224, 'patch_size': 16,
    'embed_dim': [128, 192, 224], 'depth': [1, 2, 3], 'num_heads': [4, 3, 2],
    'window_size': [7, 7, 7], 'kernels': [7, 5, 3, 3],
}

EfficientViT_m3 = {
    'img_size': 224, 'patch_size': 16,
    'embed_dim': [128, 240, 320], 'depth': [1, 2, 3], 'num_heads': [4, 3, 4],
    'window_size': [7, 7, 7], 'kernels': [5, 5, 5, 5],
}

EfficientViT_m4 = {
    'img_size': 224, 'patch_size': 16,
    'embed_dim': [128, 256, 384], 'depth': [1, 2, 3], 'num_heads': [4, 4, 4],
    'window_size': [7, 7, 7], 'kernels': [7, 5, 3, 3],
}

EfficientViT_m5 = {
    'img_size': 224, 'patch_size': 16,
    'embed_dim': [192, 288, 384], 'depth': [1, 3, 4], 'num_heads': [3, 3, 4],
    'window_size': [7, 7, 7], 'kernels': [7, 5, 3, 3],
}


def EfficientViT_M0(pretrained='', frozen_stages=0, distillation=False, fuse=False,
                    pretrained_cfg=None, model_cfg=EfficientViT_m0):
    model = EfficientViT(frozen_stages=frozen_stages, distillation=distillation,
                         pretrained=pretrained, **model_cfg)
    if pretrained:
        model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)['model']))
    if fuse:
        replace_batchnorm(model)
    return model


def EfficientViT_M1(pretrained='', frozen_stages=0, distillation=False, fuse=False,
                    pretrained_cfg=None, model_cfg=EfficientViT_m1):
    model = EfficientViT(frozen_stages=frozen_stages, distillation=distillation,
                         pretrained=pretrained, **model_cfg)
    if pretrained:
        model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)['model']))
    if fuse:
        replace_batchnorm(model)
    return model


def EfficientViT_M2(pretrained='', frozen_stages=0, distillation=False, fuse=False,
                    pretrained_cfg=None, model_cfg=EfficientViT_m2):
    model = EfficientViT(frozen_stages=frozen_stages, distillation=distillation,
                         pretrained=pretrained, **model_cfg)
    if pretrained:
        model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)['model']))
    if fuse:
        replace_batchnorm(model)
    return model


def EfficientViT_M3(pretrained='', frozen_stages=0, distillation=False, fuse=False,
                    pretrained_cfg=None, model_cfg=EfficientViT_m3):
    model = EfficientViT(frozen_stages=frozen_stages, distillation=distillation,
                         pretrained=pretrained, **model_cfg)
    if pretrained:
        model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)['model']))
    if fuse:
        replace_batchnorm(model)
    return model


def EfficientViT_M4(pretrained='', frozen_stages=0, distillation=False, fuse=False,
                    pretrained_cfg=None, model_cfg=EfficientViT_m4):
    model = EfficientViT(frozen_stages=frozen_stages, distillation=distillation,
                         pretrained=pretrained, **model_cfg)
    if pretrained:
        model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)['model']))
    if fuse:
        replace_batchnorm(model)
    return model


def EfficientViT_M5(pretrained='', frozen_stages=0, distillation=False, fuse=False,
                    pretrained_cfg=None, model_cfg=EfficientViT_m5):
    model = EfficientViT(frozen_stages=frozen_stages, distillation=distillation,
                         pretrained=pretrained, **model_cfg)
    if pretrained:
        model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)['model']))
    if fuse:
        replace_batchnorm(model)
    return model


def update_weight(model_dict, weight_dict):
    idx, temp_dict = 0, {}
    for k, v in weight_dict.items():
        # k = k[9:]
        if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):
            temp_dict[k] = v
            idx += 1
    model_dict.update(temp_dict)
    print(f'loading weights... {idx}/{len(model_dict)} items')
    return model_dict

4. Step-by-Step: Adding EfficientViT
4.1 Modification 1
Step one is, as usual, to create the file. Go to the ultralytics/nn/modules folder and create a directory named Addmodules (if you are using the files from the group, it already exists); inside it, create a new .py file and paste the core code above into it.

Note: we have already added a different EfficientViT in an earlier post, so append a 2 to the file name here; the two models share a name but are not the same network.

4.2 Modification 2
Step two: in the same directory, create a new .py file named __init__.py (already present if you use the group files) and import our module inside it, as shown in the figure below.

4.3 Modification 3
Step three: open the file ultralytics/nn/tasks.py and import and register our module there (if you use the group files, it is already imported; skip straight to step four).

From today on, all tutorials will follow this format, since I assume everyone is working from the files shared in the group.

4.4 Modification 4
Add the following two lines of code:

4.5 Modification 5
Find the block at roughly line seven hundred (use the screenshot as a guide) and add the part inside the red box. Note that the class names are listed without parentheses; only the names themselves appear:

elif m in {EfficientViT_M0, EfficientViT_M1, EfficientViT_M2,
           EfficientViT_M3, EfficientViT_M4, EfficientViT_M5}:  # add the corresponding model names; the pattern is the same for all
    m = m(*args)
    c2 = m.width_list  # returns the channel list
    backbone = True

4.6 Modification 6
Both of the red boxes below need to be changed:

if isinstance(c2, list):
    m_ = m
    m_.backbone = True
else:
    m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
t = str(m)[8:-2].replace('__main__.', '')  # module type
m.np = sum(x.numel() for x in m_.parameters())  # number params
m_.i, m_.f, m_.type = i + 4 if backbone else i, f, t  # attach index, 'from' index, type

4.7 Modification 7
The following also needs to be modified; follow mine exactly. Replace the original code with this:

if verbose:
    LOGGER.info(f'{i:>3}{str(f):>20}{n_:>3}{m.np:10.0f}  {t:<45}{str(args):<30}')  # print
save.extend(x % (i + 4 if backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
layers.append(m_)
if i == 0:
    ch = []
if isinstance(c2, list):
    ch.extend(c2)
    if len(c2) != 5:
        ch.insert(0, 0)
else:
    ch.append(c2)

4.8 Modification 8
Modification 8 is different from the previous ones: it changes part of the forward pass, and at this point we have left the parse_model method (we are still in the same tasks.py file; the line numbers are visible in the screenshot). Several forward-pass methods in this area look very similar, so do not confuse them: the one to change is at roughly line 70. The full replacement code is provided below so you can copy and paste it; I may record a video for this part when I have time.

The code is as follows:

def _predict_once(self, x, profile=False, visualize=False):
    """Perform a forward pass through the network.

    Args:
        x (torch.Tensor): The input tensor to the model.
        profile (bool): Print the computation time of each layer if True, defaults to False.
        visualize (bool): Save the feature maps of the model if True, defaults to False.

    Returns:
        (torch.Tensor): The last output of the model.
    """
    y, dt = [], []  # outputs
    for m in self.model:
        if m.f != -1:  # if not from previous layer
            x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
        if profile:
            self._profile_one_layer(m, x, dt)
        if hasattr(m, 'backbone'):
            x = m(x)
            if len(x) != 5:  # 0 - 5
                x.insert(0, None)
            for index, i in enumerate(x):
                if index in self.save:
                    y.append(i)
                else:
                    y.append(None)
            x = x[-1]  # pass the last output on to the next layer
        else:
            x = m(x)  # run
            y.append(x if m.i in self.save else None)  # save output
        if visualize:
            feature_visualization(x, m.type, m.i, save_dir=visualize)
    return x
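Stripped of profiling and visualization, the routing change above amounts to this: a multi-output backbone pushes each of its feature maps into the cache y (padded to five entries), then forwards only its last map, while ordinary layers cache a single output. The toy below uses plain functions in place of layers; all names and values are illustrative, not the real ultralytics objects:

```python
def run_model(layers, x, save):
    """layers: list of (fn, is_backbone, layer_index); mirrors the modified _predict_once."""
    y = []
    for fn, is_backbone, i in layers:
        if is_backbone:
            outs = list(fn(x))
            if len(outs) != 5:            # pad to a fixed length of 5
                outs.insert(0, None)
            for idx, o in enumerate(outs):
                y.append(o if idx in save else None)
            x = outs[-1]                  # only the last map feeds the next layer
        else:
            x = fn(x)
            y.append(x if i in save else None)
    return x, y

backbone = lambda x: [x + 1, x + 2, x + 3, x + 4]   # four "feature maps"
head = lambda x: x * 10
out, cache = run_model([(backbone, True, 0), (head, False, 4)], 1, save={2, 4})
print(out, cache)  # 50 [None, None, 3, None, 5, 50]
```

This also shows why the index offset i + 4 appears in parse_model: one backbone occupies five slots in the cache, so every subsequent layer index must be shifted to keep the 'from' references valid.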
That completes the modifications, but there are many details here. Take care not to replace any extra code, and do not skip any step; either mistake will make the run fail with errors that are very hard to trace.

Note: an extra modification

Those of you who follow me know that most of my modifications are identical across posts. This network needs one extra change: a parameter named s. Change the s shown below to 640 and everything runs perfectly.

Modification 8 (extra)
Open the file ultralytics/utils/torch_utils.py and modify it as shown in the picture; otherwise the FLOPs count may fail to print.

Caveats

If you hit a shape-mismatch error during validation, you can fix the validation image size as follows:

Open ultralytics/models/yolo/detect/train.py. In the DetectionTrainer class, find the build_dataset function and change the parameter rect=mode == 'val' to rect=False.

5. The EfficientViT YAML File
Copy the following yaml file to run the model:
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80  # number of classes
scales:  # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
  s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
  m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
  l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, EfficientViT_M0, []]  # 4  (you can load the official pretrained weights: put the path as a string in the args list)
  - [-1, 1, SPPF, [1024, 5]]  # 5

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]  # 6
  - [[-1, 3], 1, Concat, [1]]  # 7 cat backbone P4
  - [-1, 3, C2f, [512]]  # 8

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]  # 9
  - [[-1, 2], 1, Concat, [1]]  # 10 cat backbone P3
  - [-1, 3, C2f, [256]]  # 11 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]  # 12
  - [[-1, 8], 1, Concat, [1]]  # 13 cat head P4
  - [-1, 3, C2f, [512]]  # 14 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]  # 15
  - [[-1, 5], 1, Concat, [1]]  # 16 cat head P5
  - [-1, 3, C2f, [1024]]  # 17 (P5/32-large)

  - [[11, 14, 17], 1, Detect, [nc]]  # Detect(P3, P4, P5)

5.1 Training Script
You can copy my training script and run it:

import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('replace-with-your-yaml-path')
    model.load('yolov8n.pt')
    model.train(data=r'replace-with-your-dataset-path',
                cache=False,
                imgsz=640,
                epochs=150,
                batch=4,
                close_mosaic=0,
                workers=0,
                device='0',
                optimizer='SGD',
                amp=False,
                )

6. A Successful Run
Below is a screenshot of a successful run: one epoch of training has completed, and the second epoch has started (the image is too large to capture in full).

7. Summary

This concludes the article. Here I recommend my YOLOv8 effective-improvement column. It is newly opened, with an average quality score of 98; going forward I will reproduce papers from the latest top conferences and also add more classic improvement mechanisms. If this article helped you, subscribe to the column and follow for more updates~

Column review: YOLOv8 Improvement Series — continuously reproducing top-conference work — essential for research