Gibbs Softmax Action Selection
Foundations & Tabular RL DS practice problem on Onlearn.
Difficulty: medium.
Topics: Understanding Gibbs (Softmax) Action Selection in Reinforcement Learning, Temperature Parameter, Softmax Normalization, Numerical Stability (Log-Sum-Exp), Categorical Sampling, Action Preference Scaling, Reinforcement Learning, Probability Theory, Numerical Optimization, Decision Theory, Statistical Modeling, Exploration vs Exploitation, Policy Gradient Methods, Stochastic Policies, Action Preferences, Boltzmann Distribution.
Implement a Gibbs (softmax) action selection function. Given a list of action preferences (Q-values) and a temperature parameter 'tau' (which must be positive), return a probability distribution over the actions, where the probability of action i is proportional to exp(Q_i / tau). Then implement a second function that samples a single action index from that distribution.
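One possible approach is sketched below in plain Python. It is a minimal illustration under stated assumptions, not a reference solution: the function names gibbs_softmax and sample_action, the default tau of 1.0, and the optional rng argument are all hypothetical choices, since the problem statement does not fix a signature. The sketch subtracts the maximum scaled preference before exponentiating (the usual log-sum-exp style shift), which leaves the probabilities unchanged but avoids overflow for large Q-values or small tau.

```python
import math
import random
from typing import List, Optional


def gibbs_softmax(preferences: List[float], tau: float = 1.0) -> List[float]:
    """Return softmax probabilities for the given preferences at temperature tau.

    Names and defaults here are illustrative assumptions, not part of the task.
    """
    if not preferences:
        raise ValueError("preferences must be non-empty")
    if tau <= 0:
        raise ValueError("tau must be positive")

    # Scale preferences by the temperature.
    scaled = [q / tau for q in preferences]

    # Shift by the maximum so the largest exponent is exp(0) = 1.
    # This does not change the result but prevents overflow.
    shift = max(scaled)
    exps = [math.exp(s - shift) for s in scaled]

    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs: List[float], rng: Optional[random.Random] = None) -> int:
    """Sample one action index from a categorical distribution via inverse CDF."""
    rng = rng or random.Random()
    u = rng.random()  # uniform in [0, 1)

    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index

    # Guard against floating-point rounding leaving the sum slightly below 1.
    return len(probs) - 1
```

With this sketch, a large tau flattens the distribution toward uniform (more exploration), while a small tau concentrates probability on the highest-preference action (more exploitation). Passing a seeded random.Random instance as rng makes the sampling reproducible.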