Continuous-Time Discounting
Advanced RL Theory, Planning & TD Learning DS practice problem on Onlearn.
Difficulty: medium.
Topics: Continuous-Time Discounting, Discount Rate Parameterization, Infinitesimal Generator, Exponential Decay Kernel, Continuous-Time Bellman Operator, Martingale Representation Theorem, Stochastic Calculus, Dynamic Programming, Measure Theory, Functional Analysis, Control Theory, Itô Calculus, Hamilton-Jacobi-Bellman Equations, Lebesgue Integration, Operator Semigroups, Optimal Control.
Implement a function continuous_time_discount that computes the discounted return of a trajectory in a continuous-time setting, where transitions can take variable amounts of real time. In standard discrete-time reinforcement learning, the reward at step t is discounted by a fixed factor gamma raised to the power t. In continuous-time settings (such as Semi-Markov Decision Processes), the time between transitions varies, so discounting must account for the actual elapsed time rather than simply counting steps.

Your function should accept:
- beta: a non-negative continuous discount rate
- rewards: a list of rewards, one received at each transition
- durations: a list of time durations, one per transition (the time spent before receiving the corresponding reward)
- unit_dt: a positive float giving the reference time interval used to compute an equivalent discrete discount factor (default 1.0)

The function should return a dictionary with:
- 'cumulative_times': the cumulative elapsed time at which each reward is received, rounded to 4 decimal places
- 'discount_factors': the continuous-time discount factor applied to each reward, rounded to 4 decimal places
- 'discounted_return': the total discounted return, rounded to 4 decimal places
- 'equivalent_gamma': the equivalent discrete-time discount factor for the given unit_dt, rounded to 4 decimal places

The rewards and durations lists always have the same length and contain at least one element.
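A minimal sketch in Python, assuming the standard exponential discount kernel. Under this assumption, the k-th reward is received at cumulative time T_k = d_1 + ... + d_k and weighted by exp(-beta * T_k), and the equivalent discrete factor is gamma = exp(-beta * unit_dt). The problem statement does not spell out these formulas, so treat them as an assumption. The input checks, and the choice to sum the return with unrounded factors (rounding only the reported values), are also assumptions rather than part of the specification.

```python
import math
from typing import Dict, List, Union


def continuous_time_discount(
    beta: float,
    rewards: List[float],
    durations: List[float],
    unit_dt: float = 1.0,
) -> Dict[str, Union[float, List[float]]]:
    """Sketch: discount each reward by exp(-beta * t), t = cumulative elapsed time."""
    # Assumed input checks; the specification guarantees most of these already.
    if beta < 0:
        raise ValueError("beta must be non-negative")
    if unit_dt <= 0:
        raise ValueError("unit_dt must be positive")
    if not rewards or len(rewards) != len(durations):
        raise ValueError("rewards and durations must be non-empty and equal length")

    cumulative_times: List[float] = []
    discount_factors: List[float] = []
    discounted_return = 0.0
    elapsed = 0.0
    for reward, duration in zip(rewards, durations):
        elapsed += duration  # reward k arrives after the first k durations
        factor = math.exp(-beta * elapsed)  # exponential decay kernel (assumed)
        cumulative_times.append(elapsed)
        discount_factors.append(factor)
        discounted_return += reward * factor  # accumulate with unrounded factors

    return {
        "cumulative_times": [round(t, 4) for t in cumulative_times],
        "discount_factors": [round(f, 4) for f in discount_factors],
        "discounted_return": round(discounted_return, 4),
        # One unit_dt of real time corresponds to one discrete step.
        "equivalent_gamma": round(math.exp(-beta * unit_dt), 4),
    }
```

For example, continuous_time_discount(math.log(2), [1.0, 1.0], [1.0, 1.0]) gives cumulative times [1.0, 2.0], discount factors [0.5, 0.25], a discounted return of 0.75, and an equivalent gamma of 0.5: a rate of ln 2 halves a reward's weight per unit of elapsed time. With beta = 0, every factor is 1.0 and the discounted return is simply the sum of the rewards.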