# Question about the Multiple Linear Regression: why and how does it work?

by user1337   Last Updated May 16, 2019 03:19 AM

I know this question is quite simple, and maybe naive as well, but I would like some help. The general linear model can be expressed as

$$\textbf{Y} = \textbf{X}\beta + \epsilon$$

where $\textbf{Y}\sim\mathcal{N}(\textbf{X}\beta,\sigma^{2}\textbf{I})$ represents the random component, $\textbf{X}\beta$ represents the systematic component, and the link function is the identity, $g(\mu) = \mu = \textbf{X}\beta$.
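Written one row at a time, with $\textbf{x}_i^{\top}$ denoting the $i$-th row of $\textbf{X}$, the same model reads as follows. This is only a restatement of $\textbf{Y}\sim\mathcal{N}(\textbf{X}\beta,\sigma^{2}\textbf{I})$: a diagonal covariance matrix makes the jointly normal $Y_i$ independent.

$$
\begin{aligned}
Y_i &= \textbf{x}_i^{\top}\beta + \epsilon_i, \qquad \epsilon_1,\ldots,\epsilon_n \overset{\text{iid}}{\sim} \mathcal{N}(0,\sigma^{2}),\\
\mathbb{E}[Y_i] &= \mu_i = \textbf{x}_i^{\top}\beta, \qquad \operatorname{Var}(Y_i) = \sigma^{2}, \qquad i = 1,\ldots,n.
\end{aligned}
$$

So all the $Y_i$ share the same variance $\sigma^{2}$, but their means can differ from one $i$ to another through $\textbf{x}_i$.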

My question is: why do we assume that the response vector $\textbf{Y} = (Y_{1},Y_{2},\ldots,Y_{n})$ equals the mean $\mu = \textbf{X}\beta$ plus a normally distributed error $\epsilon$? Moreover, how should we interpret the mean of each component $Y_{i}$? Since each $Y_{i}$ is an observation of the random variable whose distribution describes the data, why should they have different means? Does each $Y_{i}$ represent a "person" from the target population?
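One common way to read $\mu_i$ (an interpretation, not something stated above) is as the average of $Y_i$ over hypothetical repeated draws with the covariates held fixed at $\textbf{x}_i$, i.e. the conditional mean of the response given those covariate values. The minimal sketch below simulates that idea under stated assumptions: the coefficients `BETA`, the noise level `SIGMA`, the three covariate rows `X_ROWS` and all helper names are hypothetical, chosen only for illustration.

```python
import random
import statistics

# Hypothetical values, chosen only for illustration (not from the question).
BETA = (10.0, 2.0, -1.5)   # (beta0, beta1, beta2)
SIGMA = 3.0                # error standard deviation, the same for every unit i
X_ROWS = [                 # each row is (1, x_i1, x_i2); the leading 1 multiplies beta0
    (1.0, 4.0, 2.0),
    (1.0, 6.0, 1.0),
    (1.0, 2.0, 5.0),
]


def mean_response(row, beta):
    """mu_i = x_i^T beta, the systematic component for unit i."""
    return sum(x * b for x, b in zip(row, beta))


def draw_response(row, beta, sigma, rng):
    """One draw of Y_i = mu_i + eps_i with eps_i ~ N(0, sigma^2)."""
    return mean_response(row, beta) + rng.gauss(0.0, sigma)


def replicate(rows, beta, sigma, n_rep, seed=0):
    """Redraw Y_i n_rep times for each fixed row.

    Returns a list of (mu_i, sample mean of draws, sample sd of draws) per unit.
    """
    rng = random.Random(seed)
    summary = []
    for row in rows:
        draws = [draw_response(row, beta, sigma, rng) for _ in range(n_rep)]
        summary.append(
            (mean_response(row, beta), statistics.fmean(draws), statistics.stdev(draws))
        )
    return summary


if __name__ == "__main__":
    for i, (mu, avg, sd) in enumerate(replicate(X_ROWS, BETA, SIGMA, 20000), start=1):
        print(f"unit {i}: mu = {mu:.2f}, mean of draws = {avg:.2f}, sd of draws = {sd:.2f}")
```

With these made-up numbers the three means are 15, 20.5 and 6.5: the sample average of each unit's draws settles near its own $\mu_i$, while the spread of the draws is close to `SIGMA` for every unit.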

Here is an example. Suppose that $\mu_{i} = \beta_{0} + \beta_{1}x_{i1} + \beta_{2}x_{i2}$, where $\mu_{i}$ is the average income of the population living in city $i$, $1\leq i\leq 3$, and the $x_{ij}$ are features of city $i$ that influence this value. Then we will most likely obtain different values for the means $\mu_{1}$, $\mu_{2}$ and $\mu_{3}$. Why is it reasonable to state that $Y_{i} = \mu_{i} + \epsilon_{i}$, where $\epsilon_{i}$ is normally distributed?
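To connect the city example to estimation, here is a minimal pure-Python sketch that simulates many cities from $Y_i = \mu_i + \epsilon_i$ and recovers $\beta$ by ordinary least squares via the normal equations $\textbf{X}^{\top}\textbf{X}\hat\beta = \textbf{X}^{\top}\textbf{Y}$. The coefficients `TRUE_BETA`, the noise level `SIGMA`, the uniform feature ranges and the helper names are all hypothetical. It deliberately uses more than three cities: with $n = 3$ observations and three coefficients, $\textbf{X}$ is square, the fit (when $\textbf{X}$ is invertible) reproduces the data exactly, and nothing is left over to estimate $\sigma^{2}$.

```python
import random

# Hypothetical "true" coefficients and noise level for the city example.
TRUE_BETA = [20.0, 1.5, -0.8]   # beta0, beta1, beta2
SIGMA = 2.0


def simulate_cities(n, beta, sigma, seed=1):
    """Simulate n cities with two made-up features and Y_i = mu_i + eps_i."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.uniform(0, 10), rng.uniform(0, 10)
        row = [1.0, x1, x2]                      # leading 1 for the intercept
        mu = sum(a * b for a, b in zip(row, beta))
        X.append(row)
        y.append(mu + rng.gauss(0.0, sigma))     # add N(0, sigma^2) noise
    return X, y


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):                 # back substitution
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def ols(X, y):
    """Least-squares estimate: solve (X^T X) beta_hat = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


if __name__ == "__main__":
    X, y = simulate_cities(2000, TRUE_BETA, SIGMA)
    print("estimated beta:", [round(b, 3) for b in ols(X, y)])
```

Plain elimination only keeps the sketch dependency-free; in practice a linear-algebra routine such as NumPy's `numpy.linalg.lstsq` is the more robust choice.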

Any help is appreciated. Thanks in advance!
