Dynamic Channel Allocation with Afterstates

Planning, Dynamics & Decision Systems DS practice problem on Onlearn.

Difficulty: hard.

Topics: Dynamic Channel Allocation with Afterstates, Afterstate Representation, Bellman Equation, Exploration-Exploitation Tradeoff, State-Action Value Function, Discount Factor, Reinforcement Learning, Stochastic Optimization, Control Theory, Information Theory, Probability and Statistics, Temporal Difference Learning, Markov Decision Processes, Dynamic Programming, Resource Scheduling, Value Function Approximation.

Implement a function that performs one step of dynamic channel allocation using afterstate value functions. You manage a cellular system with n cells cells and n channels available frequency channels. The current state is represented by a binary allocation matrix of shape (n cells, n channels), where a 1 indicates that a cell is actively using that channel. When a new call arrives at a specific cell, you must decide which free channel to assign (or reject the call entirely). The key insight is to use afterstates the allocation pattern that results immediately after your channel assignment, but before any future stochastic events (new calls, call departures). For each candidate action (assign a free channel or reject): Accepting on channel ch: the immediate reward is 1.0 additional interference, where additional interference is the number of new conflicting pairs created (pairs of adjacent cells now sharing the same channel that were not already conflicting). Rejecting : the immediate reward is 0.0 and the afterstate is the unchanged allocation. Select the action maximizing reward + gamma V(afterstate), where V is a dictionary mapping afterstate keys (tuple of tuples of ints) to estimated values (default 0.0 for unseen afterstates). Break ties by preferring acceptance over rejection, and among accepting options, prefer the lowest channel index. If prev afterstate key is not None, perform a TD(0) update on the previous afterstate: V(prev) < V(prev) + alpha (prev reward + gamma V(current afterstate) V(prev)) Return a dictionary with keys 'chosen channel' (int, 1 if rejected), 'reward' (float rounded to 4 decimals), 'afterstate key' (tuple of tuples), and 'V' (updated value dict with values rounded to 4 decimals).