by Olórin
Last Updated March 14, 2019 20:19 PM

I am reading about Gaussian mixture models from this slide

https://www.ics.uci.edu/~smyth/courses/cs274/notes/EMnotes.pdf

However, I am super confused at the very first line.

It says:

We have a dataset of some data $x_i$

Each data point is assumed to be generated i.i.d. from an underlying distribution. We assume that the underlying distribution is a mixture of Gaussian distributions.

I do not understand why we assume that the underlying distribution of the data is a mixture of Gaussians.

This seems to me to be completely false.

The data distribution could be anything. We are only fitting a mixture-of-Gaussians model to whatever that underlying distribution is. We are maximizing the log-likelihood using EM to approximate that distribution with the GMM.

Why do people assume that the data themselves are generated through Gaussians?

Is my interpretation correct?

When we *model* a (true) distribution with a Mixture of Gaussians (MG), it can be said that we *assumed* the distribution is MG. Similarly, in linear regression, we say we assume the relation between Y and X is linear, even though it is unlikely to be exactly linear. We should not interpret "assuming" as "believing". We don't believe it; we just assume it, and the assumption may be an obvious, unrealistic simplification. This is why we speak of "simplifying assumptions": we are admitting our ignorance right at the beginning.
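One way to make the difference between "assuming" and "believing" precise is the following sketch. The symbols are my own notation, not taken from the linked notes: $p^*$ is the unknown true density and $q_\theta(x) = \sum_k \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$ is the mixture being fitted. For large $n$, maximizing the average log-likelihood is approximately the same as minimizing the KL divergence from $p^*$ to $q_\theta$:

$$\hat\theta = \arg\max_\theta \frac{1}{n}\sum_{i=1}^{n} \log q_\theta(x_i) \;\approx\; \arg\max_\theta \, \mathbb{E}_{p^*}\!\left[\log q_\theta(x)\right] = \arg\min_\theta \, \mathrm{KL}\!\left(p^* \,\|\, q_\theta\right)$$

The last equality holds because $\mathrm{KL}(p^* \,\|\, q_\theta) = \mathbb{E}_{p^*}[\log p^*(x)] - \mathbb{E}_{p^*}[\log q_\theta(x)]$ and the first term does not depend on $\theta$. So even when $p^*$ is not a mixture of Gaussians, the fit still has a meaning: it is the mixture closest to $p^*$ in this sense.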

Actually, the GMM assumes the underlying data is generated from a mixture of Gaussians. By accepting and using the model, you automatically put yourself in the position of assuming the *Gaussianity* of the data. What you actually believe is that the GMM will be able to represent your data well enough, at least approximately. Almost every algorithm comes with assumptions that you accept when you use it, e.g. Naive Bayes assumes independence between features. Remember that almost all models are wrong.
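To illustrate that EM runs regardless of whether the assumption is true, here is a minimal sketch (not taken from the linked notes) that fits a two-component 1-D GMM to data drawn from a uniform distribution, which is clearly not a Gaussian mixture. The function names, the initialization scheme, and the variance floor are my own choices for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k=2, n_iter=50, seed=0):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(data)
    # Initialization (an arbitrary choice): random data points as means,
    # the overall variance for every component, and equal weights.
    means = rng.sample(data, k)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    log_liks = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        log_lik = 0.0
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            log_lik += math.log(total)
            resp.append([p / total for p in joint])
        log_liks.append(log_lik)
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid a collapsing component
    return weights, means, variances, log_liks


# Data from Uniform(0, 1): the "true" distribution is NOT a Gaussian mixture.
data_rng = random.Random(1)
data = [data_rng.random() for _ in range(500)]

weights, means, variances, log_liks = fit_gmm_1d(data, k=2)
print("weights:", weights)
print("means:", means)
print("variances:", variances)
print("log-likelihood, first vs. last iteration:", log_liks[0], log_liks[-1])
```

The fit goes through without complaint and the log-likelihood does not decrease from one iteration to the next, but nothing in the output tells you whether the mixture is a good description of the data. That judgment is the "assumption" you accept by choosing the model.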
