What is the assumption on the distribution of data in Gaussian mixture models?

by Olórin   Last Updated March 14, 2019 20:19 PM

I am reading about Gaussian mixture models from this slide


However, I am very confused by its very first line.

It says:

We have a dataset of some data $x_i$

Each data point is assumed to be generated i.i.d. from an underlying distribution. We assume that the underlying distribution is a mixture of Gaussians.
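
For reference, the assumption quoted above is usually written as the density below. The symbols $K$ (number of components), $\pi_k$ (mixing weights), $\mu_k$ and $\Sigma_k$ (component means and covariances) are standard notation assumed here; they are not taken from the slide.

$$p(x_i) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k), \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1$$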

I do not understand why we assume that the underlying distribution of the data is a mixture of Gaussians.

This seems to me to be completely false.

The data distribution could be anything. We are only fitting a mixture-of-Gaussians model to whatever that underlying distribution is: we maximize the log-likelihood using EM so that the GMM approximates that distribution.

Why do people assume that the data themselves are generated through Gaussians?

Is my interpretation correct?

Answers 2

When we model a (true) distribution with a mixture of Gaussians (MG), we can be said to have assumed that the distribution is an MG. Similarly, in linear regression we say that we assume the relation between Y and X is linear, even though it is unlikely to be exactly linear. "Assuming" should not be read as "believing": we do not believe the assumption, we merely adopt it, and it may be an obviously unrealistic simplification. This is why we speak of "simplifying assumptions": we admit our ignorance right at the beginning.
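
To make the point concrete, here is a minimal sketch of fitting a GMM to data that is clearly not a Gaussian mixture (uniform samples). The function name `fit_gmm_1d`, the choice of three components, the initialization, and the variance floor are all assumptions made for this illustration; it uses plain EM in one dimension with only the standard library, not any particular library's implementation.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k=3, n_iter=30, seed=0):
    """Fit a 1-D Gaussian mixture with plain EM (hypothetical helper).

    Returns (weights, means, variances, history), where history holds the
    log-likelihood of the data under the parameters at the start of each
    iteration.
    """
    rng = random.Random(seed)
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n

    # Initialization (an arbitrary choice): equal weights, means drawn from
    # the data, every variance set to the overall sample variance.
    weights = [1.0 / k] * k
    means = rng.sample(data, k)
    variances = [overall_var] * k
    history = []

    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        loglik = 0.0
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([j / total for j in joint])
        history.append(loglik)

        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor guards against collapse

    return weights, means, variances, history


# Uniform data on [0, 1]: by construction NOT generated by Gaussians.
rng = random.Random(42)
data = [rng.random() for _ in range(500)]

weights, means, variances, history = fit_gmm_1d(data, k=3, n_iter=30)
print("weights:  ", weights)
print("means:    ", means)
print("variances:", variances)
print("log-likelihood, first and last iteration:", history[0], history[-1])
```

EM runs without complaint and the log-likelihood does not decrease from one iteration to the next, even though the "mixture of Gaussians" assumption is false for this data. The fitted model is just the best approximation EM finds within the GMM family.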

March 14, 2019 20:03 PM

Actually, the GMM assumes that the underlying data are generated from a mixture of Gaussians. By accepting and using the model, you automatically put yourself in the position of assuming this about the data. What you actually believe is that the GMM will be able to represent your data approximately well enough. Almost every algorithm comes with assumptions that you accept when you use it; for example, Naive Bayes assumes that the features are conditionally independent given the class. Remember that almost all models are wrong.

March 14, 2019 20:09 PM
