Feature Selection for RL Value Function Approximation

Representation Learning, Advanced Theory & Miscellaneous DS practice problem on Onlearn.

Difficulty: medium.

Topics: Feature Selection for RL Value Function Approximation, Bellman Residual Minimization, Lasso Penalty, Basis Function Expansion, Mutual Information Maximization, Singular Value Decomposition, Reinforcement Learning, Statistical Learning Theory, Optimization Theory, Information Theory, Linear Algebra, Value Function Approximation, Dimensionality Reduction, Regularization Techniques, Stochastic Approximation, Kernel Methods.

In reinforcement learning with function approximation, choosing the right state features is critical for learning good value functions. Irrelevant or redundant features can slow convergence and degrade performance.

Implement a function rl_feature_selection that selects the most relevant features for linear value function approximation using a two-pass Bellman error correlation method:

1. Using the given experience data (state features, next-state features, rewards, terminal flags, and discount factor), compute initial TD targets assuming a zero-valued bootstrap (i.e., the initial value function estimate is zero everywhere).
2. Fit a preliminary linear value function to these initial targets using least squares.
3. Recompute improved TD targets by bootstrapping with the preliminary value estimate on the next states.
4. Score each feature by its absolute Pearson correlation with the improved TD targets.
5. Select the top k features by score (highest first), breaking ties in favor of lower feature indices.

Return a dictionary with two keys: 'selected_features', a list of integer feature indices (sorted by score descending, ties broken by index ascending), and 'scores', the list of corresponding absolute correlation scores rounded to 4 decimal places.

Use the population standard deviation (ddof=0) for correlation calculations. If a feature has zero variance, or the target has zero variance, assign that feature a score of 0.0.
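Written out as formulas, the two passes look roughly like this. The notation is introduced here for illustration and is not part of the problem: Φ is the n × d matrix whose i-th row is the feature vector φ(s_i), φ(s'_i) is the next-state feature vector, t_i ∈ {0, 1} is the terminal flag, and σ_j, σ_y are the population standard deviations of column j and of y^(1). It also assumes the usual TD convention that a terminal flag cancels the bootstrap term, and that the least-squares fit has no intercept; the statement does not spell out either point.

```latex
\begin{aligned}
y^{(0)}_i &= r_i + \gamma\,(1 - t_i)\cdot 0 = r_i \\
\hat{w} &= \arg\min_{w}\,\lVert \Phi w - y^{(0)} \rVert_2^2 \\
y^{(1)}_i &= r_i + \gamma\,(1 - t_i)\,\phi(s'_i)^\top \hat{w} \\
\mathrm{score}_j &= \left| \frac{\frac{1}{n}\sum_{i=1}^{n} (\Phi_{ij} - \bar{\Phi}_j)\,(y^{(1)}_i - \bar{y}^{(1)})}{\sigma_j\,\sigma_y} \right|,
\qquad \mathrm{score}_j = 0 \text{ if } \sigma_j = 0 \text{ or } \sigma_y = 0
\end{aligned}
```

Because the first pass uses a zero bootstrap, y^(0) is simply the reward vector; only the second pass brings in information about next states.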
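A minimal NumPy sketch of the five steps follows. The signature rl_feature_selection(states, next_states, rewards, dones, gamma, k) and its argument order are hypothetical, since the problem does not fix them. The code also assumes that a terminal flag of 1 drops the bootstrap term, that the least-squares fit uses no intercept column, and that a standard deviation below 1e-12 counts as zero variance.

```python
import numpy as np


def rl_feature_selection(states, next_states, rewards, dones, gamma, k):
    # Hypothetical signature: argument names and order are not fixed by the problem.
    # states, next_states: (n_samples, n_features); rewards, dones: (n_samples,)
    X = np.asarray(states, dtype=float)
    X_next = np.asarray(next_states, dtype=float)
    r = np.asarray(rewards, dtype=float)
    # Assumption: a terminal flag of 1 removes the bootstrap term.
    not_done = 1.0 - np.asarray(dones, dtype=float)

    # Step 1: with V(s') = 0 everywhere, r + gamma * (1 - done) * 0 is just r.
    y0 = r

    # Step 2: least-squares fit of V(s) = X @ w (assumption: no intercept column).
    # lstsq returns the minimum-norm solution when X is rank-deficient.
    w, *_ = np.linalg.lstsq(X, y0, rcond=None)

    # Step 3: improved TD targets, bootstrapping with the preliminary estimate.
    y1 = r + gamma * not_done * (X_next @ w)

    # Step 4: |Pearson correlation| using population std (ddof=0, NumPy's default).
    eps = 1e-12  # assumption: a std below this is treated as zero variance
    n_features = X.shape[1]
    scores = np.zeros(n_features)
    y_dev = y1 - y1.mean()
    y_std = y1.std()
    for j in range(n_features):
        x_std = X[:, j].std()
        if x_std < eps or y_std < eps:
            continue  # zero variance: the score stays 0.0
        cov = np.mean((X[:, j] - X[:, j].mean()) * y_dev)
        scores[j] = abs(cov / (x_std * y_std))

    # Step 5: score descending, ties broken by lower index; keep the top k.
    order = sorted(range(n_features), key=lambda j: (-scores[j], j))[:k]
    return {
        "selected_features": [int(j) for j in order],
        "scores": [round(float(scores[j]), 4) for j in order],
    }


# Usage: column 0 equals the rewards, column 2 is its mirror image,
# column 1 is constant, and column 3 is uncorrelated with the rewards.
X = np.array([[1, 5, 4, 1], [2, 5, 3, 0], [3, 5, 2, 0], [4, 5, 1, 1]], dtype=float)
r = np.array([1, 2, 3, 4], dtype=float)
print(rl_feature_selection(X, X, r, np.zeros(4), 0.0, 3))
# {'selected_features': [0, 2, 1], 'scores': [1.0, 1.0, 0.0]}
```

In this sketch ties are resolved on the unrounded scores. Two features whose scores differ only beyond the fourth decimal therefore keep their score order rather than falling back to index order. If a grader compares rounded scores first, the sort key would use round(scores[j], 4) instead.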