Welcome to follow my CSDN: https://spike.blog.csdn.net/
Article link: https://spike.blog.csdn.net/article/details/143749468
Disclaimer: This article is based on personal knowledge and public materials, and is intended for academic exchange only; discussion is welcome, reposting is not permitted.

The network modules that dominate the parameter count of a (multimodal) large language model are three kinds: Linear, Embedding, and Norm (LayerNorm or RMSNorm); a multimodal model additionally includes Conv3D. The parameter count computed by hand agrees exactly with the count reported directly by PyTorch.
PyTorch source:

def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

We use Qwen2-VL-7B-Instruct, Qwen2-7B-Instruct, and Llama-3.1-8B-Instruct as examples.
Parameter counts by network module
- Linear: a weight matrix, optionally plus a bias. Linear(in_features=w, out_features=h, bias=True) has w*h + h parameters; with bias=False, it has w*h.
- Embedding: can be viewed as a Linear without bias, i.e. vocab_size * hidden_size parameters.
- Norm: LayerNorm has 2 trainable parameters, γ and β, per dimension; with hidden_size = h, every dimension contributes two parameters, i.e. 2*h in total. RMSNorm has only 1 trainable parameter per dimension, i.e. h in total.
- Conv3D: Conv3d(3, 1280, kernel_size=(2, 14, 14), stride=(2, 14, 14), bias=False) has in_channels * out_channels * kernel volume = 3*1280*2*14*14 = 1505280 parameters.
- RotaryEmbedding, Activation, and Dropout: rotary position embeddings, activation functions, and Dropout have no trainable parameters.
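The per-module formulas above can be expressed as small helper functions for quick checks (a minimal sketch; the function names are my own, not from any library):

```python
import math

# Hypothetical helpers mirroring the per-module parameter formulas above.
def linear_params(w, h, bias=True):
    # weight matrix w*h, plus h bias terms when bias=True
    return w * h + (h if bias else 0)

def embedding_params(vocab_size, hidden_size):
    # an Embedding is a Linear without bias
    return vocab_size * hidden_size

def layernorm_params(hidden_size):
    # gamma and beta per dimension
    return 2 * hidden_size

def rmsnorm_params(hidden_size):
    # one scale parameter per dimension
    return hidden_size

def conv3d_params(c_in, c_out, kernel, bias=False):
    # in_channels * out_channels * kernel volume (+ out_channels bias terms)
    return c_in * c_out * math.prod(kernel) + (c_out if bias else 0)

print(conv3d_params(3, 1280, (2, 14, 14)))  # 1505280
```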
Llama-3.1-8B-Instruct parameter count:

128256*4096 + 32*(4096*4096*2 + 4096*1024*2 + 4096*14336*3 + 2*4096) + 4096 + 4096*128256 = 8030261248 ≈ 8B
That is:

Parameters = Embedding + layers * (Linear_QKVO + Linear_mlp + RMSNorm) + RMSNorm + Linear
Computed parameter count: [Info] parameters: 8030261248
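The formula can be verified with a few lines of arithmetic (a sketch; the variable names are mine, and the hyperparameters are those quoted above):

```python
# Llama-3.1-8B-Instruct hyperparameters
vocab, hidden, inter, layers, kv_dim = 128256, 4096, 14336, 32, 1024

embed = vocab * hidden                      # embed_tokens
per_layer = (hidden * hidden * 2            # q_proj + o_proj
             + hidden * kv_dim * 2          # k_proj + v_proj
             + hidden * inter * 3           # gate_proj + up_proj + down_proj
             + 2 * hidden)                  # two RMSNorms per layer
# final RMSNorm + lm_head added at the end
total = embed + layers * per_layer + hidden + hidden * vocab
print(total)  # 8030261248
```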
Network structure of the large language model Llama-3.1-8B-Instruct:
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
      )
    )
    (norm): LlamaRMSNorm((4096,), eps=1e-05)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
)

Network structure of the multimodal vision-language model Qwen2-VL-7B-Instruct:
Qwen2VLForConditionalGeneration(
  (visual): Qwen2VisionTransformerPretrainedModel(
    (patch_embed): PatchEmbed(
      (proj): Conv3d(3, 1280, kernel_size=(2, 14, 14), stride=(2, 14, 14), bias=False)
    )
    (rotary_pos_emb): VisionRotaryEmbedding()
    (blocks): ModuleList(
      (0-31): 32 x Qwen2VLVisionBlock(
        (norm1): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
        (norm2): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
        (attn): VisionSdpaAttention(
          (qkv): Linear(in_features=1280, out_features=3840, bias=True)
          (proj): Linear(in_features=1280, out_features=1280, bias=True)
        )
        (mlp): VisionMlp(
          (fc1): Linear(in_features=1280, out_features=5120, bias=True)
          (act): QuickGELUActivation()
          (fc2): Linear(in_features=5120, out_features=1280, bias=True)
        )
      )
    )
    (merger): PatchMerger(
      (ln_q): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
      (mlp): Sequential(
        (0): Linear(in_features=5120, out_features=5120, bias=True)
        (1): GELU(approximate='none')
        (2): Linear(in_features=5120, out_features=3584, bias=True)
      )
    )
  )
  (model): Qwen2VLModel(
    (embed_tokens): Embedding(152064, 3584)
    (layers): ModuleList(
      (0-27): 28 x Qwen2VLDecoderLayer(
        (self_attn): Qwen2VLSdpaAttention(
          (q_proj): Linear(in_features=3584, out_features=3584, bias=True)
          (k_proj): Linear(in_features=3584, out_features=512, bias=True)
          (v_proj): Linear(in_features=3584, out_features=512, bias=True)
          (o_proj): Linear(in_features=3584, out_features=3584, bias=False)
          (rotary_emb): Qwen2VLRotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
          (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
          (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
      )
    )
    (norm): Qwen2RMSNorm((3584,), eps=1e-06)
    (rotary_emb): Qwen2VLRotaryEmbedding()
  )
  (lm_head): Linear(in_features=3584, out_features=152064, bias=False)
)

Total parameter count: [Info] parameters: 8291375616
Vision model parameter count: [Info] parameters model.visual: 675759104
Language model parameter count: [Info] parameters model.model: 7070619136, [Info] parameters model.lm_head: 544997376
That is: 675759104 (8.15%) + 7070619136 (85.28%) + 544997376 (6.57%) = 8291375616 ≈ 8B
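The three components and their shares can be checked directly (a sketch using the counts reported above):

```python
# Component counts reported by count_parameters above
visual, language, lm_head = 675759104, 7070619136, 544997376
total = visual + language + lm_head
print(total)  # 8291375616

# Share of each component in the total
for name, n in [("visual", visual), ("language", language), ("lm_head", lm_head)]:
    print(f"{name}: {n / total:.2%}")  # 8.15% / 85.28% / 6.57%
```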
Parameter count of Qwen2-VL-7B-Instruct's Qwen2VisionTransformerPretrainedModel:
- patch_embed parameter count: 3*1280*2*14*14 = 1505280
- blocks parameter count: [Info] parameters model.visual.blocks: 629678080; in detail: 32*(1280*2*2 + (1280+1)*3840 + (1280+1)*1280 + 1280*5121 + 5120*1281) = 629678080
- merger parameter count: 1280*2 + 5120*5121 + (5120+1)*3584 = 44575744
Combined formula:

3*1280*2*14*14 + 32*(1280*2*2 + (1280+1)*3840 + (1280+1)*1280 + 1280*5121 + 5120*1281) + 1280*2 + 5120*5121 + (5120+1)*3584 = 675759104
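The vision-tower formula can be verified term by term (a sketch; the variable names are my own):

```python
patch_embed = 3 * 1280 * 2 * 14 * 14                 # Conv3d, no bias
per_block = (1280 * 2 * 2                            # norm1 + norm2 (LayerNorm)
             + (1280 + 1) * 3840                     # qkv, bias=True
             + (1280 + 1) * 1280                     # proj, bias=True
             + 5120 * 1281                           # fc1 (1280 -> 5120), bias=True
             + 1280 * 5121)                          # fc2 (5120 -> 1280), bias=True
merger = 1280 * 2 + 5120 * 5121 + (5120 + 1) * 3584  # ln_q + two biased Linears
visual = patch_embed + 32 * per_block + merger
print(visual)  # 675759104
```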
Parameter count of Qwen2-VL-7B-Instruct's Qwen2VLModel:

152064*3584 + 28*((3584+1)*3584 + (3584+1)*512*2 + 3584*3584 + 3584*18944*3 + 2*3584) + 3584 = 7070619136

and of the lm_head: 3584*152064 = 544997376
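The language-model side checks out the same way (a sketch with my own variable names; note that q/k/v here carry biases while o_proj does not):

```python
# Qwen2VLModel hyperparameters
vocab, hidden, inter, layers, kv_dim = 152064, 3584, 18944, 28, 512

embed = vocab * hidden
per_layer = ((hidden + 1) * hidden          # q_proj, bias=True
             + (hidden + 1) * kv_dim * 2    # k_proj + v_proj, bias=True
             + hidden * hidden              # o_proj, bias=False
             + hidden * inter * 3           # gate_proj + up_proj + down_proj
             + 2 * hidden)                  # two RMSNorms per layer
language = embed + layers * per_layer + hidden  # + final RMSNorm
lm_head = hidden * vocab
print(language, lm_head)  # 7070619136 544997376
```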
Thus, the manually computed parameter counts for Qwen2-VL-7B align exactly with PyTorch's.
Test
# Pretrained models: inspect their vocabulary sizes
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

print(f"[Info] transformers version: {transformers.__version__}")

def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# ------------ Qwen2-VL-7B ----------- #
model_path = "[your path]/llm/Qwen/Qwen2-VL-7B-Instruct"
print(f"[Info] model_path: {model_path}")

# Load the model in half-precision on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_path)
configuration = model.config
print(f"[Info] Qwen2-VL-7B vocab_size: {configuration.vocab_size}")
print(model)
print(f"[Info] parameters: {count_parameters(model)}")
print(f"[Info] parameters model.visual: {count_parameters(model.visual)}")
print(f"[Info] parameters model.model: {count_parameters(model.model)}")
print(f"[Info] parameters model.lm_head: {count_parameters(model.lm_head)}")
print(f"[Info] parameters model.visual.patch_embed: {count_parameters(model.visual.patch_embed)}")
print(f"[Info] parameters model.visual.blocks: {count_parameters(model.visual.blocks)}")
print(f"[Info] parameters model.visual.blocks[0].norm1: {count_parameters(model.visual.blocks[0].norm1)}")
print(f"[Info] parameters model.visual.blocks[0].norm2: {count_parameters(model.visual.blocks[0].norm2)}")
print(f"[Info] parameters model.visual.blocks[0].attn: {count_parameters(model.visual.blocks[0].attn)}")
print(f"[Info] parameters model.visual.blocks[0].mlp: {count_parameters(model.visual.blocks[0].mlp)}")
# ------------ Qwen2-VL-7B ----------- #

# ------------ Qwen2-7B ----------- #
model_path = "[your path]/llm/Qwen/Qwen2-7B-Instruct"
print(f"[Info] model_path: {model_path}")
device = "cuda"  # the device to load the model onto
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)
print(f"[Info] Qwen2-7B vocab_size: {tokenizer.vocab_size}")
print(model)
print(f"[Info] parameters: {count_parameters(model)}")
# ------------ Qwen2-7B ----------- #

# ------------ Llama-3.1-8B ----------- #
model_path = "[your path]/llm/Meta-Llama-3.1-8B-Instruct"
print(f"[Info] model_path: {model_path}")
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(f"[Info] Llama-3.1-8B vocab_size: {tokenizer.vocab_size}")
print(model)
print(f"[Info] parameters: {count_parameters(model)}")
# ------------ Llama-3.1-8B ----------- #

The parameter count of Qwen2-7B is 7615616512, i.e. 7070619136 + 544997376 = 7615616512.