Engram Context-Aware Gating
Vision-Language & Cross-Modal Systems DS practice problem on Onlearn.
Difficulty: medium.
Topics: Engram Context-Aware Gating, Sigmoid Activation, Kullback-Leibler Divergence, Multi-Head Self-Attention, Feature Map Concatenation, Weight Decay, Deep Learning Architectures, Multimodal Representation Learning, Attention Mechanisms, Optimization Theory, Information Theory, Gating Networks, Cross-Modal Fusion, Transformer Blocks, Stochastic Gradient Descent, Entropy Regularization.
Implement the context-aware gating mechanism from the Engram architecture (DeepSeek). This mechanism dynamically modulates retrieved static N-gram embeddings based on the current hidden-state context.

The Engram module retrieves static embeddings from an N-gram memory table, but these embeddings are context-independent and may contain noise from hash collisions or polysemy. The context-aware gating mechanism resolves this by using the hidden state (which has aggregated global context) to compute a scalar gate that suppresses irrelevant memory.

Given:
h: Hidden states of shape (T, d) representing contextualized token representations
e: Retrieved memory embeddings of shape (T, d_mem) from the N-gram lookup
W_K: Key projection matrix of shape (d_mem, d)
W_V: Value projection matrix of shape (d_mem, d)

Your function should return the gated output tensor of shape (T, d).
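The statement above does not spell out the gate formula, so the following is only a minimal NumPy sketch under stated assumptions. It assumes the memory embedding is projected to a key and a value with the key and value projection matrices (named `W_K` and `W_V` here), and that the per-token scalar gate is the sigmoid of a scaled dot product between the RMS-normalized hidden state and the RMS-normalized key. The gain-free RMS normalization, the 1/sqrt(d) scaling, the `eps` value, and the function names `rms_norm`, `sigmoid`, and `engram_context_gate` are illustrative choices, not something the problem confirms.

```python
import numpy as np


def rms_norm(x, eps=1e-6):
    # Gain-free RMS normalization over the last axis (assumed form).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)


def sigmoid(z):
    # tanh form of the logistic function; avoids overflow for large |z|.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def engram_context_gate(h, e, W_K, W_V, eps=1e-6):
    """Gate retrieved memory embeddings by the hidden-state context.

    h:   (T, d)      contextualized hidden states
    e:   (T, d_mem)  retrieved N-gram memory embeddings
    W_K: (d_mem, d)  key projection
    W_V: (d_mem, d)  value projection
    Returns the gated output of shape (T, d).
    """
    h = np.asarray(h, dtype=np.float64)
    e = np.asarray(e, dtype=np.float64)

    k = e @ W_K  # (T, d) keys projected from memory
    v = e @ W_V  # (T, d) values projected from memory

    d = h.shape[-1]
    # One scalar score per token: normalized dot product, scaled by sqrt(d).
    score = np.sum(rms_norm(h, eps) * rms_norm(k, eps), axis=-1, keepdims=True)
    alpha = sigmoid(score / np.sqrt(d))  # (T, 1), values in (0, 1)

    # Broadcasting the scalar gate over the feature axis scales each value row.
    return alpha * v


# Usage with random inputs (shapes only; the values carry no meaning).
rng = np.random.default_rng(0)
T, d, d_mem = 5, 8, 4
out = engram_context_gate(
    rng.normal(size=(T, d)),
    rng.normal(size=(T, d_mem)),
    rng.normal(size=(d_mem, d)),
    rng.normal(size=(d_mem, d)),
)
print(out.shape)
```

Because the gate is a single scalar per token, it can only shrink the projected value toward zero, never rotate it. Under these assumptions, a retrieved embedding whose key disagrees with the context is suppressed rather than re-mixed.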