Unigram Probability from Corpus
Text Representation & Classical NLP DS practice problem on Onlearn.
Difficulty: easy.
Topics: Calculating Unigram Probability from a Corpus, Unigram Model, Maximum Likelihood Estimation, Boundary Tokenization, Floating Point Precision, Frequency Count Normalization, Natural Language Processing, Probability and Statistics, Data Structures and Algorithms, Software Engineering, Information Theory, Language Modeling, Corpus Preprocessing, Frequency Analysis, Tokenization Strategies, Computational Complexity.
Implement a function that calculates the unigram probability of a given word in a corpus of sentences. Count the start-of-sentence token <s> and the end-of-sentence token </s> as part of the corpus, so they contribute to the total token count. Round the resulting probability to 4 decimal places.
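One possible approach is the maximum likelihood estimate P(w) = count(w) / N, where N is the total number of tokens including the boundary tokens. The sketch below is not a reference solution. It assumes the corpus arrives as a list of sentence strings without boundary tokens, that tokens are separated by whitespace, and that matching is case-sensitive. The function name unigram_probability and its parameters are illustrative choices, not part of the problem statement.

```python
from collections import Counter


def unigram_probability(sentences, word):
    """Estimate P(word) by maximum likelihood over a list of sentences.

    Assumptions (not given by the problem statement): each sentence is a
    plain string, tokens are whitespace-separated, and <s> / </s> are
    added once per sentence before counting.
    """
    counts = Counter()
    for sentence in sentences:
        # Wrap every sentence in boundary tokens so they enter the counts.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(tokens)

    total = sum(counts.values())
    if total == 0:
        # Empty corpus: no tokens to normalize by.
        return 0.0

    # Counter returns 0 for unseen words, so unknown words get probability 0.0.
    return round(counts[word] / total, 4)


if __name__ == "__main__":
    corpus = ["Jack I like", "Jack I do like"]
    # 11 tokens in total once boundary tokens are added; "Jack" occurs twice.
    print(unigram_probability(corpus, "Jack"))  # 0.1818
```

If the corpus is instead supplied as a single string that already contains <s> and </s>, the wrapping step can be dropped and the string split directly. Unseen words receive probability 0.0 here because no smoothing is applied.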