Unified Task Handler for Multi-Task RL Agent

RL Environments, Games & Applications DS practice problem on Onlearn.

Difficulty: medium.

Topics: Unified Task Handler for Multi-Task RL Agent, Reward Function Normalization, Shared Backbone Architecture, Experience Replay Buffer, Task-Specific Head Branching, Non-Stationary Environment Dynamics, Reinforcement Learning, Software Engineering, Distributed Systems, Control Theory, Computational Complexity, Multi-Task Learning, Asynchronous Task Scheduling, Environment Abstraction, Policy Gradient Methods, Inter-Process Communication.

In many reinforcement learning settings, a single agent must handle multiple different tasks using a shared representation. Rather than maintaining separate policies or value functions for each task, the agent uses a unified system that conditions its behavior on a task identifier.

Implement a function unified_task_handler that computes task-conditioned action values and an action selection policy for a multi-task RL agent.

Given:

- state_features: a NumPy array of shape (num_actions, feature_dim) representing the feature vector associated with each available action in the current state.
- task_weights: a NumPy array of shape (num_tasks, feature_dim) containing learned weight vectors, one per task.
- task_id: an integer specifying which task the agent is currently performing.
- epsilon: a float (default 0.0) for epsilon-greedy action selection.

Your function should return a dictionary with:

- 'q_values': a NumPy array of shape (num_actions,) containing the estimated action values under the specified task.
- 'greedy_action': an integer indicating the action with the highest Q-value. In the case of ties, select the action with the lowest index.
- 'action_probs': a NumPy array of shape (num_actions,) containing the epsilon-greedy probability of selecting each action.

Use only NumPy.
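One possible reading of the specification is sketched below. It is not a reference solution and rests on several assumptions the problem statement does not spell out. The action values are taken to be linear, so each Q-value is the dot product of an action's feature vector with the weight vector of the selected task. The epsilon-greedy distribution is taken to be the usual one, where every action receives epsilon / num_actions and the greedy action receives an additional 1 - epsilon. The underscored identifiers (unified_task_handler, state_features, task_weights, task_id, q_values, greedy_action, action_probs) are assumed spellings of the names in the statement.

```python
import numpy as np


def unified_task_handler(state_features, task_weights, task_id, epsilon=0.0):
    """Sketch of a task-conditioned linear Q-function with epsilon-greedy selection."""
    state_features = np.asarray(state_features, dtype=float)
    task_weights = np.asarray(task_weights, dtype=float)

    # Assumption: Q(s, a | task) = features(a) . weights(task), a linear model.
    q_values = state_features @ task_weights[task_id]

    # np.argmax returns the first index of the maximum, which gives the
    # required lowest-index tie-break.
    greedy_action = int(np.argmax(q_values))

    # Assumption: standard epsilon-greedy. Spread epsilon uniformly over all
    # actions, then give the remaining 1 - epsilon mass to the greedy action.
    num_actions = q_values.shape[0]
    action_probs = np.full(num_actions, epsilon / num_actions)
    action_probs[greedy_action] += 1.0 - epsilon

    return {
        "q_values": q_values,
        "greedy_action": greedy_action,
        "action_probs": action_probs,
    }


# Illustrative inputs (made up for this example): 3 actions, 2 features, 2 tasks.
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
weights = np.array([[1.0, 2.0], [3.0, -1.0]])

result = unified_task_handler(features, weights, task_id=0, epsilon=0.3)
# q_values are [1, 2, 3], so the greedy action is 2 and the probabilities
# are approximately [0.1, 0.1, 0.8].
```

With epsilon left at its default of 0.0, the distribution collapses to a one-hot vector on the greedy action. Switching task_id changes only which row of task_weights is used, which is what makes the handler "unified" in this sketch.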