Reinforcement Learning: The Linear Baseline for Actor-Critic Algorithms in Continuous Control Problems

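I have recently been reading about continuous control problems and came across a way to manually expand features and set up a linear baseline in Actor-Critic algorithms. Both techniques come from the paper "Benchmarking Deep Reinforcement Learning for Continuous Control".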

When the features are low-dimensional, we can expand them manually:

Code implementation:

return torch.cat([observations, observations ** 2, al, al ** 2, al ** 3, ones], dim=2)
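
To make the one-line return above concrete, here is a minimal self-contained sketch of how such a feature tensor might be assembled. It assumes observations of shape (sequence_length, batch_size, obs_dim), that `ones` is a per-step validity mask, and that `al` is a step counter scaled by 1/100. The function name `linear_baseline_features`, the optional `mask` argument, and the 1/100 scaling are assumptions made for illustration; they are not stated in the text above.

```python
import torch


def linear_baseline_features(observations, mask=None):
    """Build [o, o^2, t, t^2, t^3, 1] features (hypothetical helper).

    observations: tensor of shape (seq_len, batch_size, obs_dim).
    mask: optional (seq_len, batch_size) tensor, 1 for valid steps, 0 for padding.
    """
    seq_len, batch_size, _ = observations.shape
    if mask is None:
        mask = torch.ones(seq_len, batch_size, dtype=observations.dtype,
                          device=observations.device)
    ones = mask.unsqueeze(2)              # (seq_len, batch_size, 1)
    observations = observations * ones    # zero out padded steps
    # Assumed time feature: 1-based step counter, scaled to keep t^3 small.
    al = torch.cumsum(ones, dim=0) * ones / 100.0
    return torch.cat(
        [observations, observations ** 2, al, al ** 2, al ** 3, ones], dim=2
    )


obs = torch.randn(5, 3, 4)  # 5 time steps, 3 episodes, 4-dimensional observations
features = linear_baseline_features(obs)
print(features.shape)  # last dimension is 2 * obs_dim + 4
```

The trailing column of ones acts as the bias term, so the linear model fitted on these features needs no separate intercept.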

—————————————————–

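Linear baseline: in the AC algorithm it plays the role of the Critic and is used to reduce the variance of the policy-gradient estimate. A simple linear fit is used here, obtained by least squares: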
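
Written out, one way to state this fit is as a ridge-regularized least-squares problem. The symbols are chosen here for illustration and are not taken from the paper: $X$ is the matrix whose rows are the feature vectors, $R$ is the column vector of observed returns, $w$ is the weight vector, and $\lambda$ is a small regularization coefficient.

```latex
\min_{w} \; \lVert X w - R \rVert_2^2 + \lambda \lVert w \rVert_2^2
\quad\Longrightarrow\quad
\left( X^\top X + \lambda I \right) w = X^\top R
```

The baseline value for a feature vector $\phi$ is then $\phi^\top w$.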

Code:

def fit(self, episodes):
    # Design matrix X: (sequence_length * batch_size) x feature_size
    featmat = self._feature(episodes).view(-1, self.feature_size)
    # Regression targets R: (sequence_length * batch_size) x 1
    returns = episodes.returns.view(-1, 1)

    reg_coeff = self._reg_coeff
    eye = torch.eye(self.feature_size, dtype=torch.float32,
                    device=self.linear.weight.device)
    # Solve (X^T X + reg_coeff * I) w = X^T R. After every failed attempt
    # the regularization is raised by 10, for at most 5 attempts.
    for _ in range(5):
        try:
            coeffs = torch.linalg.lstsq(
                torch.matmul(featmat.t(), featmat) + reg_coeff * eye,
                torch.matmul(featmat.t(), returns)
            ).solution
            break
        except RuntimeError:
            reg_coeff += 10
    else:
        # for/else: reached only when no attempt hit `break`
        raise RuntimeError('Unable to solve the normal equations in '
                           '`LinearFeatureBaseline`. The matrix X^T*X (with X the design '
                           'matrix) is not full-rank, regardless of the regularization '
                           '(maximum regularization: {0}).'.format(reg_coeff))
    # Store the fitted weights in the linear layer (shape 1 x feature_size)
    self.linear.weight.data = coeffs.data.t()
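
The method above depends on the surrounding class, so here is a standalone sketch of the same idea that can be run on its own. It uses synthetic data in which the returns are exactly linear in the features, and the function name `fit_linear_baseline`, its default arguments, and the data are all hypothetical. It also shows one way the fitted baseline might be used: subtracting its prediction from the returns to get the quantity that weights the policy gradient.

```python
import torch


def fit_linear_baseline(featmat, returns, reg_coeff=1e-5, max_tries=5):
    """Return w solving (X^T X + reg_coeff * I) w = X^T R (hypothetical helper).

    featmat: (num_samples, feature_size) design matrix X.
    returns: (num_samples, 1) observed returns R.
    """
    eye = torch.eye(featmat.shape[1], dtype=featmat.dtype, device=featmat.device)
    for _ in range(max_tries):
        try:
            return torch.linalg.lstsq(
                featmat.t() @ featmat + reg_coeff * eye,
                featmat.t() @ returns,
            ).solution
        except RuntimeError:
            reg_coeff += 10  # retry with stronger regularization
    raise RuntimeError('Unable to solve the normal equations.')


# Synthetic data (made up for this sketch): returns exactly linear in the features.
torch.manual_seed(0)
featmat = torch.cat([torch.randn(200, 3), torch.ones(200, 1)], dim=1)
true_w = torch.tensor([[1.0], [-2.0], [0.5], [3.0]])
returns = featmat @ true_w

w = fit_linear_baseline(featmat, returns)
values = featmat @ w           # baseline prediction for every sample
advantages = returns - values  # returns with the baseline subtracted
print(w.squeeze(1))
print(advantages.abs().max())
```

Because the synthetic returns contain no noise, the recovered weights should be close to `true_w` and the advantages close to zero. With real rollouts the residual is what remains after the baseline has explained the predictable part of the return.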

===============================================

Full code:

https://gitee.com/devilmaycry812839668/MAML-Pytorch-RL/blob/master/maml_rl/baseline.py
