TDNN for Variable-Length Sequences

Sequence Models & Generative Models DS practice problem on Onlearn.

Difficulty: medium.

Topics: TDNN for Variable-Length Sequences, Dilated Convolutions, Weight Sharing, Receptive Field Expansion, Global Average Pooling, Backpropagation Through Time, Deep Learning Foundations, Signal Processing, Sequence Modeling, Computational Complexity, Optimization Theory, Temporal Convolutional Networks, Feature Extraction Architectures, Time-Invariant Representations, Dynamic Programming, Stochastic Gradient Descent.

Implement a forward pass for a Time Delay Neural Network (TDNN) that processes variable-length sequences through one or more TDNN layers.

A TDNN layer operates by looking at specific time context offsets relative to each time step. For a given time step t, the layer gathers feature vectors at positions t + o for each offset o in the offset list, concatenates them into a single vector, applies a linear transformation (matrix multiply + bias), and then optionally applies an activation function.

Only time steps where ALL context positions fall within the valid range of the sequence should produce output (no zero padding). This means that for a sequence of length T with offsets that span from min_offset to max_offset, the output sequence will generally be shorter than the input. When multiple layers are stacked, each subsequent layer operates on the (shortened) output of the previous layer.

Write a function tdnn_forward(sequences, layer_configs) that takes:

sequences: a list of numpy arrays, each of shape (T_i, D), where T_i may differ across sequences.

layer_configs: a list of dicts, each containing:
    'weights': numpy array of shape (len(offsets) * D_in, D_out)
    'bias': numpy array of shape (D_out,)
    'offsets': list of integer context offsets (e.g., [-1, 0, 1] or [-2, 0, 2])
    'activation': string, either 'relu' or 'none'

The function should return a list of results, one per input sequence. Each result is the output array (after all layers) converted to a nested Python list via .tolist(), with values rounded to 4 decimal places. If a sequence is too short for the required context at any layer, that sequence's output should be an empty list.
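
One possible way to approach the specification above is sketched here. It is a minimal illustration, not a reference solution. The helper name tdnn_layer is hypothetical, and the sketch makes two assumptions the statement does not spell out: the concatenation follows the order in which offsets are listed, and the time step t itself must be a valid index (so the bounds are clamped when all offsets share the same sign).

```python
import numpy as np


def tdnn_layer(x, weights, bias, offsets, activation):
    """Apply one TDNN layer to x of shape (T, D_in); returns (T_out, D_out).

    Hypothetical helper, not part of the problem statement.
    """
    T = x.shape[0]
    d_out = bias.shape[0]
    # Valid steps t satisfy 0 <= t + o <= T - 1 for every offset o.
    # Assumption: t itself must also be a valid index, hence the clamping.
    t_start = max(0, -min(offsets))
    t_end = T - max(0, max(offsets))  # exclusive upper bound
    if t_end <= t_start:
        # Sequence too short for this context: no output frames.
        return np.zeros((0, d_out))
    # For each offset, slice the frames x[t + o] for all valid t at once,
    # then concatenate along the feature axis in offset-list order.
    context = np.concatenate(
        [x[t_start + o:t_end + o] for o in offsets], axis=1
    )
    out = context @ weights + bias
    if activation == "relu":
        out = np.maximum(out, 0.0)
    return out


def tdnn_forward(sequences, layer_configs):
    results = []
    for seq in sequences:
        x = np.asarray(seq, dtype=float)
        for cfg in layer_configs:
            x = tdnn_layer(
                x,
                np.asarray(cfg["weights"], dtype=float),
                np.asarray(cfg["bias"], dtype=float),
                cfg["offsets"],
                cfg["activation"],
            )
            if x.shape[0] == 0:
                break  # too short at this layer; later layers cannot recover
        if x.shape[0] == 0:
            results.append([])
        else:
            results.append(np.round(x, 4).tolist())
    return results


# Small usage example with made-up data: one layer that sums three frames.
example_cfg = {
    "weights": np.ones((3, 1)),
    "bias": np.zeros(1),
    "offsets": [-1, 0, 1],
    "activation": "none",
}
example_seqs = [np.array([[1.0], [2.0], [3.0], [4.0]]), np.array([[1.0], [2.0]])]
print(tdnn_forward(example_seqs, [example_cfg]))  # [[[6.0], [9.0]], []]
```

In the example, the length-4 sequence loses one frame at each end and yields two output frames, while the length-2 sequence cannot cover the three-frame context and yields an empty list. Slicing once per offset avoids a Python loop over time steps; a per-step loop that gathers and concatenates frames would give the same result.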