Distributional Offline Continuous-Time Reinforcement Learning with Neural Physics-Informed PDEs (SciPhy RL for DOCTRL)
This paper addresses distributional offline continuous-time reinforcement learning (DOCTRL) with stochastic policies for high-dimensional optimal control. A soft distributional version of the classical Hamilton-Jacobi-Bellman (HJB) equation is given by a semilinear partial differential equation (PDE). This 'soft HJB equation' can be learned from offline data without assuming that the data were generated by a previous optimal or near-optimal policy. A data-driven solution of the soft HJB equation uses methods of Neural PDEs and Physics-Informed Neural Networks developed in the field of Scientific Machine Learning (SciML). The suggested approach, dubbed 'SciPhy RL', thus reduces DOCTRL to solving neural PDEs from data. The resulting algorithm, Deep DOCTRL, converts offline high-dimensional data into an optimal policy in one step by reducing the problem to supervised learning, instead of relying on value iteration or policy iteration methods. The method enables a computable approach to the quality control of obtained policies in terms of both their expected returns and uncertainties about their values.
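The physics-informed approach the abstract describes can be illustrated with a minimal sketch: a neural network represents the value function, automatic differentiation supplies the PDE terms, and the training loss penalizes the residual of a semilinear, entropy-regularized ("soft") HJB-type equation at sampled collocation points. This is a generic PINN sketch under assumed toy dynamics, not the paper's actual algorithm; the names (`ValueNet`, `hjb_residual`), the two-action toy control set, and the coefficients `sigma` and `tau` are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not the paper's method): a physics-informed network
# for a 1D semilinear HJB-type PDE
#   V_t + 0.5*sigma^2*V_xx + soft-Hamiltonian(x, V_x) = 0,
# where the soft Hamiltonian is a temperature-tau log-sum-exp over actions.

class ValueNet(nn.Module):
    """Neural approximation of the value function V(t, x)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, t, x):
        return self.net(torch.cat([t, x], dim=-1))

def hjb_residual(model, t, x, sigma=0.5, tau=0.1):
    """PDE residual of the toy soft HJB equation at collocation points."""
    t = t.requires_grad_(True)
    x = x.requires_grad_(True)
    v = model(t, x)
    # PDE terms via automatic differentiation.
    v_t = torch.autograd.grad(v.sum(), t, create_graph=True)[0]
    v_x = torch.autograd.grad(v.sum(), x, create_graph=True)[0]
    v_xx = torch.autograd.grad(v_x.sum(), x, create_graph=True)[0]
    # Soft (entropy-regularized) Hamiltonian over a toy action set {-1, +1}:
    # tau * logsumexp stands in for the log-partition term of a soft HJB.
    actions = torch.tensor([-1.0, 1.0])
    q = torch.stack([a * v_x - 0.5 * x**2 for a in actions], dim=-1)
    soft_ham = tau * torch.logsumexp(q / tau, dim=-1)
    return v_t + 0.5 * sigma**2 * v_xx + soft_ham

# Training minimizes the mean squared residual on sampled points
# (a terminal-condition loss would be added in a full implementation).
model = ValueNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
t = torch.rand(256, 1)
x = torch.randn(256, 1)
res = hjb_residual(model, t, x)
loss = res.pow(2).mean()
loss.backward()
opt.step()
```

Minimizing this residual is the "supervised learning" reduction in spirit: the offline data would define where the PDE is enforced and how rewards and dynamics enter the Hamiltonian, so no value-iteration or policy-iteration loop is needed.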