Epsilon-Greedy Action Selection for n-Armed Bandit
Foundations & Tabular RL DS practice problem on Onlearn.
Difficulty: easy.
Topics: Understanding Epsilon-Greedy Action Selection in Multi-Armed Bandits, Epsilon Parameter, Argmax Selection, Random Sampling, Action Space Indexing, Tie-breaking in Greedy Selection, Reinforcement Learning, Probability Theory, Decision Theory, Optimization, Statistics, Exploration-Exploitation Trade-off, Multi-Armed Bandit Problem, Action-Value Estimation, Greedy Policy, Stochastic Processes.
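Implement an epsilon-greedy action selection function for an n-armed bandit problem. The function should take the current estimated action values (Q-values), the exploration rate epsilon, and a random seed. With probability epsilon it should select an action uniformly at random (explore); otherwise it should select the action with the highest estimated value (exploit). It should return the index of the selected action.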
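One possible shape for such a function is sketched below in Python with NumPy. This is not a reference solution. The name `epsilon_greedy`, the use of `numpy.random.default_rng` for seeding, and the tie-breaking rule (lowest index wins, which is what `np.argmax` does) are all assumptions, since the problem statement does not fix them. An expected answer tied to a different random generator or a different order of random draws could yield different indices for the same seed.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, seed=None):
    """Return the index of the action chosen by epsilon-greedy selection.

    Hypothetical signature: q_values is a 1-D sequence of estimated action
    values, epsilon is the exploration probability in [0, 1], and seed
    initialises the random generator so results are reproducible.
    """
    q = np.asarray(q_values, dtype=float)
    if q.ndim != 1 or q.size == 0:
        raise ValueError("q_values must be a non-empty 1-D sequence")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")

    # Assumption: a fresh generator is created from the seed on every call.
    rng = np.random.default_rng(seed)

    # rng.random() is uniform on [0, 1), so epsilon = 0 never explores
    # and epsilon = 1 always explores.
    if rng.random() < epsilon:
        # Explore: every action index is equally likely.
        return int(rng.integers(q.size))

    # Exploit: np.argmax returns the first maximal index, so ties are
    # broken in favour of the lowest index (an assumed convention).
    return int(np.argmax(q))
```

With `epsilon = 0` the call reduces to a plain argmax, so `epsilon_greedy([0.1, 0.9, 0.5], 0.0, seed=0)` selects index 1. With `epsilon = 1` the returned index depends only on the seed, and repeating the call with the same seed returns the same index.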