# PyTorch: Easily Drive Multiple CPUs/GPUs/TPUs with Accelerate

## Preface

CPU? GPU? TPU? Too many compute devices and messy environment switching? Tired of rewriting large chunks of code every time you move? Not sure how to use multiple CPUs/GPUs/TPUs, or just want an easy way to do it?

OK, the Accelerate library from HuggingFace solves exactly these problems: with only a few lines of changes, your code adapts automatically to whatever compute devices are available.

Links:

- Official docs: https://huggingface.co/docs/accelerate/index
- GitHub: https://github.com/huggingface/accelerate
- Install (version 0.14 is recommended):

```shell
$ pip install accelerate
```

Below is how to use it. You can also look directly at my complete notebook example on Kaggle.

## Official example

Here is a first look. Remove the old `.to(device)` calls, create an `Accelerator`, and let it handle the model, optimizer, data loader, and `loss.backward()`:

```python
import torch
import torch.nn.functional as F
from datasets import load_dataset
from accelerate import Accelerator

# device = "cpu"
accelerator = Accelerator()

# model = torch.nn.Transformer().to(device)
model = torch.nn.Transformer()
optimizer = torch.optim.Adam(model.parameters())

dataset = load_dataset("my_dataset")
data = torch.utils.data.DataLoader(dataset, shuffle=True)

model, optimizer, data = accelerator.prepare(model, optimizer, data)

model.train()
for epoch in range(10):
    for source, targets in data:
        # source = source.to(device)
        # targets = targets.to(device)
        optimizer.zero_grad()
        output = model(source)
        loss = F.cross_entropy(output, targets)
        # loss.backward()
        accelerator.backward(loss)
        optimizer.step()
```

## Controlling multiple CPUs/GPUs/TPUs within a single program

See the official examples for full details.

In short: for a single compute device, the small changes shown above are enough. With multiple devices (e.g. multiple GPUs) there are a few extra points to handle, so below is a complete PyTorch training example. You can compare it with my earlier post on CNN image classification with FashionMNIST (CNN图像分类-FashionMNIST), or look directly at my complete notebook example on Kaggle.

### Device environment

The current GPU setup is two Tesla T4s:

```shell
$ nvidia-smi
Thu Apr 27 10:53:26 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   43C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   41C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

Install or upgrade Accelerate:

```shell
!pip install --upgrade accelerate
```

### Imports

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor, Compose
import torchvision.datasets as datasets
from accelerate import Accelerator
from accelerate import notebook_launcher
```

### Load the FashionMNIST data

```python
train_data = datasets.FashionMNIST(
    root="./data",
    train=True,
    download=True,
    transform=Compose([ToTensor()])
)

test_data = datasets.FashionMNIST(
    root="./data",
    train=False,
    download=True,
    transform=Compose([ToTensor()])
)

print(train_data.data.shape)
print(test_data.data.shape)
```

### Create a simple CNN model

```python
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        self.module1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.module2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.flatten = nn.Flatten()
        self.linear1 = nn.Linear(7 * 7 * 64, 64)
        self.linear2 = nn.Linear(64, 10)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.module1(x)
        out = self.module2(out)
        out = self.flatten(out)
        out = self.linear1(out)
        out = self.relu(out)
        out = self.linear2(out)
        return out
```

### Training function: training only

Pay attention to the `accelerator`-related code. For multi-device training, the code at the end of the `for epoch in range(epoch_num):` loop is essential.

```python
def training_function():
    # Hyperparameters
    epoch_num = 4
    batch_size = 64
    learning_rate = 0.005

    # device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    # Data
    train_loader = DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(test_data, batch_size=batch_size, shuffle=True)

    # Model / loss function / optimizer
    # model = CNNModel().to(device)
    model = CNNModel()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    accelerator = Accelerator()
    model, optimizer, train_loader, val_loader = accelerator.prepare(
        model, optimizer, train_loader, val_loader
    )

    # Start training
    for epoch in range(epoch_num):
        model.train()
        for i, (X_train, y_train) in enumerate(train_loader):
            # X_train = X_train.to(device)
            # y_train = y_train.to(device)
            out = model(X_train)
            loss = criterion(out, y_train)
            optimizer.zero_grad()
            # loss.backward()
            accelerator.backward(loss)
            optimizer.step()
            if (i + 1) % 100 == 0:
                print(f"{accelerator.device} Train... [epoch {epoch + 1}/{epoch_num}, step {i + 1}/{len(train_loader)}]\t[loss {loss.item()}]")

        # Wait until every GPU has finished the current epoch, then sync/merge the model
        accelerator.wait_for_everyone()
        model = accelerator.unwrap_model(model)
        # The model is now identical on all GPUs, so it is safe to save
        accelerator.save(model, "model.pth")
```

### Training function: training and validation

Compared with the previous version, this one adds validation code. Validation is a special case under multi-device training: the validation results produced on each device have to be merged.

```python
def training_function():
    # Hyperparameters
    epoch_num = 4
    batch_size = 64
    learning_rate = 0.005

    # Data
    train_loader = DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(test_data, batch_size=batch_size, shuffle=True)

    # Model / loss function / optimizer
    model = CNNModel()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    accelerator = Accelerator()
    model, optimizer, train_loader, val_loader = accelerator.prepare(
        model, optimizer, train_loader, val_loader
    )

    # Start training
    for epoch in range(epoch_num):
        # Train
        model.train()
        for i, (X_train, y_train) in enumerate(train_loader):
            out = model(X_train)
            loss = criterion(out, y_train)
            optimizer.zero_grad()
            accelerator.backward(loss)
            optimizer.step()
            if (i + 1) % 100 == 0:
                print(f"{accelerator.device} Train... [epoch {epoch + 1}/{epoch_num}, step {i + 1}/{len(train_loader)}]\t[loss {loss.item()}]")

        # Validate
        model.eval()
        correct, total = 0, 0
        for X_val, y_val in val_loader:
            with torch.no_grad():
                output = model(X_val)
                _, pred = torch.max(output, 1)
                # Merge the validation results from every GPU
                pred, y_val = accelerator.gather_for_metrics((pred, y_val))
                total += y_val.size(0)
                correct += (pred == y_val).sum()
        # Print the accuracy from the main process only
        accelerator.print(f"epoch {epoch + 1}/{epoch_num}, accuracy {100 * (correct.item() / total):.2f}")

        # Wait until every GPU has finished the current epoch, then sync/merge the model
        accelerator.wait_for_everyone()
        model = accelerator.unwrap_model(model)
        # The model is now identical on all GPUs, so it is safe to save
        accelerator.save(model, "model.pth")
```

### Training

If you are training locally, simply call the `training_function` defined above, then start the script from the command line with `$ accelerate launch example.py`:

```python
training_function()
```

If you are on Kaggle/Colab, use `notebook_launcher` instead:

```python
# num_processes=2: use 2 GPUs, since this session has two Tesla T4s
notebook_launcher(training_function, num_processes=2)
```

Sample console output when training on 2 GPUs:

```
Launching training on 2 GPUs.
cuda:0 Train... [epoch 1/4, step 100/469]	[loss 0.43843933939933777]
cuda:1 Train... [epoch 1/4, step 100/469]	[loss 0.5267877578735352]
cuda:0 Train... [epoch 1/4, step 200/469]	[loss 0.39918822050094604]
cuda:1 Train... [epoch 1/4, step 200/469]	[loss 0.2748252749443054]
cuda:1 Train... [epoch 1/4, step 300/469]	[loss 0.54105544090271]
cuda:0 Train... [epoch 1/4, step 300/469]	[loss 0.34716445207595825]
cuda:1 Train... [epoch 1/4, step 400/469]	[loss 0.2694844901561737]
cuda:0 Train... [epoch 1/4, step 400/469]	[loss 0.4343942701816559]
epoch 1/4, accuracy 88.49
cuda:0 Train... [epoch 2/4, step 100/469]	[loss 0.19695354998111725]
cuda:1 Train... [epoch 2/4, step 100/469]	[loss 0.2911057770252228]
cuda:0 Train... [epoch 2/4, step 200/469]	[loss 0.2948791980743408]
cuda:1 Train... [epoch 2/4, step 200/469]	[loss 0.292676717042923]
cuda:0 Train... [epoch 2/4, step 300/469]	[loss 0.222089946269989]
cuda:1 Train... [epoch 2/4, step 300/469]	[loss 0.28814008831977844]
cuda:0 Train... [epoch 2/4, step 400/469]	[loss 0.3431250751018524]
cuda:1 Train... [epoch 2/4, step 400/469]	[loss 0.2546379864215851]
epoch 2/4, accuracy 87.31
cuda:1 Train... [epoch 3/4, step 100/469]	[loss 0.24118559062480927]
cuda:0 Train... [epoch 3/4, step 100/469]	[loss 0.363821804523468]
cuda:0 Train... [epoch 3/4, step 200/469]	[loss 0.36783623695373535]
cuda:1 Train... [epoch 3/4, step 200/469]	[loss 0.18346744775772095]
cuda:0 Train... [epoch 3/4, step 300/469]	[loss 0.23459288477897644]
cuda:1 Train... [epoch 3/4, step 300/469]	[loss 0.2887689769268036]
cuda:0 Train... [epoch 3/4, step 400/469]	[loss 0.3079166114330292]
cuda:1 Train... [epoch 3/4, step 400/469]	[loss 0.18255220353603363]
epoch 3/4, accuracy 88.46
cuda:1 Train... [epoch 4/4, step 100/469]	[loss 0.27428603172302246]
cuda:0 Train... [epoch 4/4, step 100/469]	[loss 0.17705145478248596]
cuda:1 Train... [epoch 4/4, step 200/469]	[loss 0.2811894416809082]
cuda:0 Train... [epoch 4/4, step 200/469]	[loss 0.22682836651802063]
cuda:0 Train... [epoch 4/4, step 300/469]	[loss 0.2291710525751114]
cuda:1 Train... [epoch 4/4, step 300/469]	[loss 0.32024848461151123]
cuda:0 Train... [epoch 4/4, step 400/469]	[loss 0.24648766219615936]
cuda:1 Train... [epoch 4/4, step 400/469]	[loss 0.0805584192276001]
epoch 4/4, accuracy 89.38
```

Sample console output when training on 1 TPU:

```
Launching training on CPU.
xla:0 Train... [epoch 1/4, step 100/938]	[loss 0.6051161289215088]
xla:0 Train... [epoch 1/4, step 200/938]	[loss 0.27442359924316406]
xla:0 Train... [epoch 1/4, step 300/938]	[loss 0.557417631149292]
xla:0 Train... [epoch 1/4, step 400/938]	[loss 0.1840067058801651]
xla:0 Train... [epoch 1/4, step 500/938]	[loss 0.5252436399459839]
xla:0 Train... [epoch 1/4, step 600/938]	[loss 0.2718536853790283]
xla:0 Train... [epoch 1/4, step 700/938]	[loss 0.2763175368309021]
xla:0 Train... [epoch 1/4, step 800/938]	[loss 0.39897507429122925]
xla:0 Train... [epoch 1/4, step 900/938]	[loss 0.28720396757125854]
epoch 0, accuracy 86.36
xla:0 Train... [epoch 2/4, step 100/938]	[loss 0.24496735632419586]
xla:0 Train... [epoch 2/4, step 200/938]	[loss 0.37713131308555603]
xla:0 Train... [epoch 2/4, step 300/938]	[loss 0.3106330633163452]
xla:0 Train... [epoch 2/4, step 400/938]	[loss 0.40438592433929443]
xla:0 Train... [epoch 2/4, step 500/938]	[loss 0.38303741812705994]
xla:0 Train... [epoch 2/4, step 600/938]	[loss 0.39199298620224]
xla:0 Train... [epoch 2/4, step 700/938]	[loss 0.38932573795318604]
xla:0 Train... [epoch 2/4, step 800/938]	[loss 0.26298171281814575]
xla:0 Train... [epoch 2/4, step 900/938]	[loss 0.21517205238342285]
epoch 1, accuracy 90.07
xla:0 Train... [epoch 3/4, step 100/938]	[loss 0.366019606590271]
xla:0 Train... [epoch 3/4, step 200/938]	[loss 0.27360212802886963]
xla:0 Train... [epoch 3/4, step 300/938]	[loss 0.2014923095703125]
xla:0 Train... [epoch 3/4, step 400/938]	[loss 0.21998485922813416]
xla:0 Train... [epoch 3/4, step 500/938]	[loss 0.28129786252975464]
xla:0 Train... [epoch 3/4, step 600/938]	[loss 0.42534705996513367]
xla:0 Train... [epoch 3/4, step 700/938]	[loss 0.22158119082450867]
xla:0 Train... [epoch 3/4, step 800/938]	[loss 0.359947144985199]
xla:0 Train... [epoch 3/4, step 900/938]	[loss 0.3221997022628784]
epoch 2, accuracy 90.36
xla:0 Train... [epoch 4/4, step 100/938]	[loss 0.2814193069934845]
xla:0 Train... [epoch 4/4, step 200/938]	[loss 0.16465164721012115]
xla:0 Train... [epoch 4/4, step 300/938]	[loss 0.2897304892539978]
xla:0 Train... [epoch 4/4, step 400/938]	[loss 0.13403896987438202]
xla:0 Train... [epoch 4/4, step 500/938]	[loss 0.1135573536157608]
xla:0 Train... [epoch 4/4, step 600/938]	[loss 0.14964193105697632]
xla:0 Train... [epoch 4/4, step 700/938]	[loss 0.20239461958408356]
xla:0 Train... [epoch 4/4, step 800/938]	[loss 0.23625142872333527]
xla:0 Train... [epoch 4/4, step 900/938]	[loss 0.3418393135070801]
epoch 3, accuracy 90.11
```

## Controlling multiple CPUs/GPUs/TPUs across servers and processes

See the official examples for full details, which cover:

- multiple processes on a single server controlling multiple compute devices
- multiple processes across multiple servers controlling multiple compute devices

Once the code is ready, first run `$ accelerate config` on every server to generate the corresponding configuration file. A sample session:

```shell
(huggingface) PS C:\Users\alion\temp> accelerate config
------------------------------------------------------------------------
In which compute environment are you running? This machine
------------------------------------------------------------------------
Which type of machine are you using? multi-GPU
How many different machines will you use (use more than 1 for multi-node training)? [1]: 2
------------------------------------------------------------------------
What is the rank of this machine? 0
What is the IP address of the machine that will host the main process? 192.168.101
What is the port you will use to communicate with the main process? 12345
Are all the machines on the same local network? Answer no if nodes are on the cloud and/or on different network hosts [YES/no]: yes
Do you wish to optimize your script with torch dynamo?[yes/NO]:no
Do you want to use DeepSpeed? [yes/NO]: no
Do you want to use FullyShardedDataParallel? [yes/NO]: no
Do you want to use Megatron-LM ? [yes/NO]: no
How many GPU(s) should be used for distributed training? [1]:2
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:0
------------------------------------------------------------------------
Do you wish to use FP16 or BF16 (mixed precision)? fp16
accelerate configuration saved at C:\Users\alion/.cache\huggingface\accelerate\default_config.yaml
```

Finally, start the training script on every server with `$ accelerate launch example.py`. If you are running multiple processes on a single server, just launch the script on that one machine and you are done.

## References

- https://github.com/huggingface/accelerate
- https://www.kaggle.com/code/muellerzr/multi-gpu-and-accelerate
- https://github.com/huggingface/notebooks/blob/main/examples/accelerate_examples/simple_nlp_example.ipynb
- https://github.com/huggingface/accelerate/tree/main/examples
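For reference, the answers given in the `accelerate config` dialog above end up in `default_config.yaml` roughly as follows. This is a hand-written sketch, not the file the tool actually emitted; exact field names and defaults vary between Accelerate versions, so treat it as illustrative only.

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
gpu_ids: '0'
machine_rank: 0
main_process_ip: 192.168.101
main_process_port: 12345
mixed_precision: fp16
num_machines: 2
num_processes: 2
same_network: true
```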
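A detail worth noting from the console logs above: `accelerator.prepare()` shards the DataLoader across processes, which is why the same run shows 938 steps per epoch on a single device but only 469 steps per GPU on two. The plain-Python arithmetic below reproduces those step counts from the FashionMNIST sizes; `steps_per_epoch` is an illustrative helper of mine, not an Accelerate API.

```python
import math

# FashionMNIST has 60,000 training images; the example uses batch_size = 64
train_size, batch_size = 60_000, 64

def steps_per_epoch(dataset_size, per_device_batch, num_processes):
    # Each process sees roughly dataset_size / num_processes samples,
    # so the per-process step count shrinks as processes are added.
    return math.ceil(dataset_size / (per_device_batch * num_processes))

print(steps_per_epoch(train_size, batch_size, 1))  # 938, matching the 1-TPU log
print(steps_per_epoch(train_size, batch_size, 2))  # 469, matching the 2-GPU log
```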
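Why the validation loop above needs `accelerator.gather_for_metrics()` can be shown with a framework-free sketch: each process scores only its shard of the validation set, so the shards must be recombined before an overall accuracy makes sense. The `shard` helper here is a toy stand-in for the distributed sampler, not an Accelerate API, and the labels/predictions are made up for illustration.

```python
def shard(data, rank, world_size):
    # Round-robin split, similar in spirit to a distributed sampler
    return data[rank::world_size]

labels = [0, 1, 2, 1, 0, 2, 1, 0]
preds  = [0, 1, 1, 1, 0, 2, 0, 0]

world_size = 2
gathered_preds, gathered_labels = [], []
for rank in range(world_size):  # stand-in for two GPU processes
    # Each "process" evaluates only its shard; the results are then merged,
    # which is the role gather_for_metrics plays in the real training loop
    gathered_preds += shard(preds, rank, world_size)
    gathered_labels += shard(labels, rank, world_size)

correct = sum(p == y for p, y in zip(gathered_preds, gathered_labels))
print(f"accuracy {100 * correct / len(gathered_labels):.2f}")  # accuracy 75.00
```

Without the merge step, each process would report an accuracy computed on half the data, and the main process would print a misleading number.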