-1:
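numpy must be version 1.20.0; any other version raises a version-conflict error.
0. RL
value-based: for example, Q-learning (maze solving) computes a value for each action available in the current state and uses a greedy policy to search over all options and pick the best state-action pair. In a continuous action space, however, there are infinitely many action values, and discretizing them causes a dimensionality explosion. Loss: MSE.
policy-based: for example, a mobile robot navigating an indoor environment. The policy network takes the robot's current sensor readings (lidar data, camera images) as input and outputs a probability distribution over actions (for continuous actions). A policy gradient algorithm updates the network parameters from reward signals such as whether the robot reached the goal position and how long it took. Training can get stuck in local optima, and the policy gradient estimate can have high variance, which makes training unstable. LOSS: the negative sign is there because we use a gradient-descent optimizer, whereas the policy gradient objective calls for gradient ascent.
--PPO is an improved policy gradient algorithm designed for better training stability and sample efficiency. It limits the step size of each policy update so that an overly large update does not degrade performance.
1. Code walkthrough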
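As background for the code in this section, the two PPO points from section 0 (limit the size of each policy update, and negate the objective so a gradient-descent optimizer can be used) can be sketched in plain Python. This is a minimal sketch of the standard clipped surrogate loss, not the repository's implementation; the function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions made for illustration.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate loss, averaged over samples (a value to minimize).

    All names and the clip_eps default are illustrative assumptions.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps] to limit the update step.
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic choice between the unclipped and the clipped objective.
        total += min(ratio * adv, clipped * adv)
    # Negative sign: the objective is maximized, but optimizers minimize.
    return -total / len(advantages)


# The ratio is 2.0, but the objective is capped at (1 + 0.2) * advantage.
print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))
```

With a ratio of 2 and a positive advantage, the clipped term wins, so the sample contributes no incentive to move the policy further in that direction.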
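1.0 on_policy_runner.py: class OnPolicyRunner.learn
    tot_iter = self.current_learning_iteration + num_learning_iterations
    for it in range(self.current_learning_iteration, tot_iter):
        start = time.time()
        with torch.inference_mode():
            for i in range(self.num_steps_per_env):
                # Shape note from the source: 2048884(6419), likely 2048*8*84.
                # The transformer predicts an action distribution, from which
                # actions for 2048*19 joints are sampled. 84 is the observation
                # size; the observation combines several robot states (attitude,
                # angular velocity, commands, joint positions and velocities,
                # actions), and perception inputs and noise can be added
                # through the config.
                actions = self.alg.act(obs, critic_obs)
                # Execute the actions above in the simulation:
                # action -> compute_torques; compute observations, rewards, resets.
                obs, privileged_obs, rewards, dones, infos = self.env.step(actions)
                critic_obs = privileged_obs if privileged_obs is not None else obs
                obs, critic_obs, rewards, dones = obs.to(self.device), critic_obs.to(self.device), rewards.to(self.device), dones.to(self.device)
                # Process the result of one environment time step: store the
                # rewards and termination signals, handle timeouts, record the
                # transition, and reset the agent. This is the basis for the
                # later learning and decision steps.
                self.alg.process_env_step(rewards, dones, infos)
1.1 HST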
--class H1(): legged_gym/env/h1/h1.py
----init: self._super_init -> self.create_sim -> self._create_envs
----step
----post_physics_step
----reset
----compute_reward
----compute_observations
----create_sim
----_compute_torques
----_create_envs
----render
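The methods listed for class H1 are called in a fixed order on every environment step. The toy class below is a sketch that only records that order; it does not simulate anything. The ordering inside `post_physics_step` (reward, then reset, then observations) follows the usual legged_gym convention and is an assumption about H1, and `ToyLeggedEnv`, `max_episode_length` and the placeholder return values are hypothetical.

```python
class ToyLeggedEnv:
    """Records the call order of the hooks named in the H1 outline."""

    def __init__(self, max_episode_length=3):
        self.calls = []
        self.t = 0
        self.max_episode_length = max_episode_length

    def _compute_torques(self, actions):
        self.calls.append("_compute_torques")
        return list(actions)  # placeholder: a real env maps actions to torques

    def step(self, actions):
        self.calls.append("step")
        self._compute_torques(actions)
        # A real env would advance the physics simulation here.
        return self.post_physics_step()

    def post_physics_step(self):
        self.calls.append("post_physics_step")
        self.t += 1
        reward = self.compute_reward()
        done = self.t >= self.max_episode_length
        if done:
            self.reset()
        obs = self.compute_observations()
        return obs, reward, done

    def compute_reward(self):
        self.calls.append("compute_reward")
        return 0.0  # placeholder reward

    def reset(self):
        self.calls.append("reset")
        self.t = 0

    def compute_observations(self):
        self.calls.append("compute_observations")
        return [0.0] * 84  # 84 matches the observation size in section 1.0


env = ToyLeggedEnv(max_episode_length=2)
env.step([0.0] * 19)
env.step([0.0] * 19)
print(env.calls)
```

The second call reaches the episode limit, so `reset` shows up between `compute_reward` and `compute_observations` in the recorded list.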
--class H1RoughCfg( BaseConfig ): legged_gym/env/h1/h1_config.py
----class human:
----class env:
----class terrain:
----class commands:
----class init_state:
----class control:
----class asset:
----class domain_rand:
----class rewards:
----class noise:
----class sim:
--class H1RoughCfgPPO(BaseConfig):
----class policy:
----class algorithm:
----class runner:
2. Network structure
1. ActorNET: Transformer MLP
Actor MLP: Transformer(
  (input_layer): Sequential(
    (0): Linear(in_features=84, out_features=128, bias=True)
    (1): Dropout(p=0.1, inplace=False)
  )
  (weight_pos_embed): Embedding(8, 128)
  (attention_blocks): Sequential(
    (0): Transformer_Block(
      (ln_1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (attn): MultiheadAttention(
        (out_proj): NonDynamicallyQuantizableLinear(in_features=128, out_features=128, bias=True)
      )
      (ln_2): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (mlp): Sequential(
        (0): Linear(in_features=128, out_features=512, bias=True)
        (1): GELU()
        (2): Linear(in_features=512, out_features=128, bias=True)
        (3): Dropout(p=0.1, inplace=False)
      )
    )
    (1): Transformer_Block(
      (ln_1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (attn): MultiheadAttention(
        (out_proj): NonDynamicallyQuantizableLinear(in_features=128, out_features=128, bias=True)
      )
      (ln_2): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (mlp): Sequential(
        (0): Linear(in_features=128, out_features=512, bias=True)
        (1): GELU()
        (2): Linear(in_features=512, out_features=128, bias=True)
        (3): Dropout(p=0.1, inplace=False)
      )
    )
    (2): Transformer_Block(
      (ln_1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (attn): MultiheadAttention(
        (out_proj): NonDynamicallyQuantizableLinear(in_features=128, out_features=128, bias=True)
      )
      (ln_2): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (mlp): Sequential(
        (0): Linear(in_features=128, out_features=512, bias=True)
        (1): GELU()
        (2): Linear(in_features=512, out_features=128, bias=True)
        (3): Dropout(p=0.1, inplace=False)
      )
    )
    (3): Transformer_Block(
      (ln_1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (attn): MultiheadAttention(
        (out_proj): NonDynamicallyQuantizableLinear(in_features=128, out_features=128, bias=True)
      )
      (ln_2): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (mlp): Sequential(
        (0): Linear(in_features=128, out_features=512, bias=True)
        (1): GELU()
        (2): Linear(in_features=512, out_features=128, bias=True)
        (3): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (output_layer): Sequential(
    (0): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
    (1): Linear(in_features=128, out_features=19, bias=True)
  )
)
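The actor ends in a 19-way linear layer, matching the 19 joint actions mentioned in section 1.0, where actions are sampled from a predicted distribution. The sketch below shows one way such sampling could work, assuming a diagonal Gaussian whose means are the 19 outputs. The printout does not show where the standard deviation comes from, so `std` and the function names are hypothetical.

```python
import math
import random


def sample_actions(means, std, rng):
    """Draw one action per joint from N(mean, std^2) (assumed diagonal Gaussian)."""
    return [rng.gauss(mu, std) for mu in means]


def log_prob(actions, means, std):
    """Sum of per-joint Gaussian log-densities, as used in policy-gradient losses."""
    return sum(
        -0.5 * ((a - mu) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)
        for a, mu in zip(actions, means)
    )


rng = random.Random(0)      # fixed seed so the draw is repeatable
means = [0.0] * 19          # stand-in for the actor's 19 outputs
actions = sample_actions(means, std=1.0, rng=rng)
print(len(actions), log_prob(actions, means, std=1.0))
```

The summed log-probability is the quantity that enters the probability ratio in a PPO-style update.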
2. CriticNet: MLP Transformer
Critic MLP: Transformer(
  (input_layer): Sequential(
    (0): Linear(in_features=84, out_features=128, bias=True)
    (1): Dropout(p=0.1, inplace=False)
  )
  (weight_pos_embed): Embedding(8, 128)
  (attention_blocks): Sequential(
    (0): Transformer_Block(
      (ln_1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (attn): MultiheadAttention(
        (out_proj): NonDynamicallyQuantizableLinear(in_features=128, out_features=128, bias=True)
      )
      (ln_2): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (mlp): Sequential(
        (0): Linear(in_features=128, out_features=512, bias=True)
        (1): GELU()
        (2): Linear(in_features=512, out_features=128, bias=True)
        (3): Dropout(p=0.1, inplace=False)
      )
    )
    (1): Transformer_Block(
      (ln_1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (attn): MultiheadAttention(
        (out_proj): NonDynamicallyQuantizableLinear(in_features=128, out_features=128, bias=True)
      )
      (ln_2): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (mlp): Sequential(
        (0): Linear(in_features=128, out_features=512, bias=True)
        (1): GELU()
        (2): Linear(in_features=512, out_features=128, bias=True)
        (3): Dropout(p=0.1, inplace=False)
      )
    )
    (2): Transformer_Block(
      (ln_1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (attn): MultiheadAttention(
        (out_proj): NonDynamicallyQuantizableLinear(in_features=128, out_features=128, bias=True)
      )
      (ln_2): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (mlp): Sequential(
        (0): Linear(in_features=128, out_features=512, bias=True)
        (1): GELU()
        (2): Linear(in_features=512, out_features=128, bias=True)
        (3): Dropout(p=0.1, inplace=False)
      )
    )
    (3): Transformer_Block(
      (ln_1): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (attn): MultiheadAttention(
        (out_proj): NonDynamicallyQuantizableLinear(in_features=128, out_features=128, bias=True)
      )
      (ln_2): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      (mlp): Sequential(
        (0): Linear(in_features=128, out_features=512, bias=True)
        (1): GELU()
        (2): Linear(in_features=512, out_features=128, bias=True)
        (3): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (output_layer): Sequential(
    (0): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
    (1): Linear(in_features=128, out_features=1, bias=True)
  )
)
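The two printouts differ only in the last linear layer (19 outputs for the actor, 1 for the critic). A rough parameter count can be worked out from the printed sizes alone. This sketch assumes the attention layer uses PyTorch's default packed input projection (a 3d x d weight plus bias, which the printout does not list) and counts only the modules shown, so any extra parameters such as a learned action standard deviation are not included. The helper names are hypothetical.

```python
def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weight + bias


def layer_norm_params(d):
    return 2 * d  # scale + shift (elementwise_affine=True)


def attention_params(d):
    # Assumed packed q/k/v projection (3*d*d weights, 3*d biases) plus out_proj.
    return 3 * d * d + 3 * d + linear_params(d, d)


def block_params(d, hidden):
    mlp = linear_params(d, hidden) + linear_params(hidden, d)
    return 2 * layer_norm_params(d) + attention_params(d) + mlp


def transformer_params(obs_dim, d, hidden, context, n_blocks, out_dim):
    return (
        linear_params(obs_dim, d)         # input_layer
        + context * d                     # weight_pos_embed: Embedding(8, 128)
        + n_blocks * block_params(d, hidden)
        + layer_norm_params(d)            # output_layer LayerNorm
        + linear_params(d, out_dim)       # output_layer Linear
    )


actor = transformer_params(84, 128, 512, 8, 4, 19)
critic = transformer_params(84, 128, 512, 8, 4, 1)
print(actor, critic)  # 807699 805377
```

Under these assumptions each network has roughly 0.8 million parameters, almost all of them in the four transformer blocks.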