Google TechTalks
March 17, 2006
Ramesh Nallapati

ABSTRACT

A base distribution that generates data samples is a core component of generative graphical models. In the recent past, the multinomial distribution has become the default distribution for text, owing mainly to its simplicity in representation and estimation. However, it has been shown that the multinomial fails to capture the burstiness, or heavy-tailed behavior, of term occurrences and is thus a poor fit for text. The Dirichlet-Compound-Multinomial (DCM) distribution, on the other hand, overcomes this flaw, but pays the price in terms of estimation complexity, rendering it unattractive for information retrieval tasks which typically require...
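The burstiness contrast between the multinomial and the DCM can be illustrated by evaluating both likelihoods on two toy documents of equal length. One document repeats a single word ("bursty") and the other spreads its words evenly.

This is a minimal sketch under several assumptions. It uses the standard DCM (Polya) probability mass function. It picks an arbitrary symmetric Dirichlet parameter alpha = 0.5 over a hypothetical 4-word vocabulary. It matches the multinomial to the DCM's mean, theta = alpha / sum(alpha). It only evaluates probabilities and does not perform the parameter estimation the talk is concerned with.

```python
import math
from typing import Sequence


def log_multinomial_coef(counts: Sequence[int]) -> float:
    """log( n! / prod_w x_w! ), shared by the multinomial and the DCM."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)


def multinomial_logpmf(counts: Sequence[int], theta: Sequence[float]) -> float:
    """Multinomial log-probability; words with zero count contribute nothing."""
    return log_multinomial_coef(counts) + sum(
        c * math.log(t) for c, t in zip(counts, theta) if c > 0
    )


def dcm_logpmf(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """Dirichlet-Compound-Multinomial (Polya) log-probability:
    coef * Gamma(A) / Gamma(n + A) * prod_w Gamma(x_w + a_w) / Gamma(a_w),
    where A = sum(alpha) and n = sum(counts)."""
    n = sum(counts)
    a_sum = sum(alpha)
    return (
        log_multinomial_coef(counts)
        + math.lgamma(a_sum)
        - math.lgamma(n + a_sum)
        + sum(math.lgamma(c + a) - math.lgamma(a) for c, a in zip(counts, alpha))
    )


if __name__ == "__main__":
    # Hypothetical 4-word vocabulary with a symmetric Dirichlet parameter.
    alpha = [0.5] * 4
    theta = [a / sum(alpha) for a in alpha]  # multinomial with the same mean

    bursty = [4, 0, 0, 0]  # one word repeated four times
    spread = [1, 1, 1, 1]  # four distinct words

    for name, doc in [("bursty", bursty), ("spread", spread)]:
        p_mult = math.exp(multinomial_logpmf(doc, theta))
        p_dcm = math.exp(dcm_logpmf(doc, alpha))
        print(f"{name}: multinomial={p_mult:.6f}  DCM={p_dcm:.6f}")
```

With these toy settings, the multinomial makes the bursty document 24 times less likely than the spread one (0.00390625 vs 0.09375). The DCM reverses that ordering and makes the bursty document about 4.4 times more likely (0.0546875 vs 0.0125). The DCM achieves this because a word that has already appeared raises the probability of its own repetition, which the multinomial's fixed theta cannot express.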