Unigram Probability from Corpus
Probability Theory DS practice problem on Onlearn.
Difficulty: easy.
Topics: Calculating Unigram Probability from a Corpus, Unigram Model, Maximum Likelihood Estimation, Boundary Tokenization, Corpus Normalization, Floating Point Precision, Natural Language Processing, Probability and Statistics, Data Structures and Algorithms, Information Theory, Computational Linguistics, Language Modeling, Frequency Analysis, Corpus Preprocessing, Discrete Probability Distributions, Tokenization Strategies.
Implement a function that calculates the unigram probability of a given word in a corpus of sentences. Count the start token <s> and the end token </s> as ordinary tokens, both when counting occurrences and when computing the total number of tokens. Round the resulting probability to 4 decimal places.
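One possible reading of the task is sketched below. It uses the maximum likelihood estimate P(w) = count(w) / N, where N is the total number of tokens including the boundary tokens. The function name `unigram_probability`, the choice of a list of sentence strings as input, whitespace tokenization, and returning 0.0 for an empty corpus are all assumptions made for illustration. The problem statement does not fix an input format, and if the corpus already contains <s> and </s>, the wrapping step should be skipped.

```python
from collections import Counter


def unigram_probability(sentences, word):
    """Return the MLE unigram probability of `word`, rounded to 4 places.

    Assumption: `sentences` is a list of strings without boundary tokens;
    each one is wrapped in <s> ... </s> here and split on whitespace.
    """
    tokens = []
    for sentence in sentences:
        tokens.append("<s>")             # start-of-sentence token
        tokens.extend(sentence.split())  # naive whitespace tokenization
        tokens.append("</s>")            # end-of-sentence token

    if not tokens:
        # Assumption: an empty corpus yields probability 0.0.
        return 0.0

    counts = Counter(tokens)
    # Counter returns 0 for unseen words, so no KeyError is possible.
    return round(counts[word] / len(tokens), 4)


corpus = ["Jack I like", "Jack I do like"]
print(unigram_probability(corpus, "Jack"))  # 2 of 11 tokens -> 0.1818
```

In this example the corpus has 7 word tokens plus 4 boundary tokens, so N = 11 and "Jack" gets 2/11, which rounds to 0.1818. Rounding is applied only to the final ratio so that no intermediate precision is lost.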