9.1.7 Estimation for Random Vectors
Linear MMSE for Random Vectors:
Suppose that we would like to have an estimator for the random vector $\mathbf{X}$ in the form of
\begin{align}
\hat{\mathbf{X}}_L=\mathbf{A} \mathbf{Y}+ \mathbf{b},
\end{align}
where $\mathbf{A}$ is a fixed matrix and $\mathbf{b}$ is a fixed vector, both to be determined. Recall that for two random variables $X$ and $Y$, the linear MMSE estimator of $X$ given $Y$ is
\begin{align}
\hat{X}_L&=\frac{\textrm{Cov}(X,Y)}{\textrm{Var}(Y)} (Y-EY)+ EX\\
&=\frac{\textrm{Cov}(X,Y)}{\textrm{Cov}(Y,Y)} (Y-EY)+ EX.
\end{align}
We can extend this result to the case of random vectors. More specifically, we can show that the linear MMSE estimator of the random vector $\mathbf{X}$ given the random vector $\mathbf{Y}$ is given by
\begin{align}
\hat{\mathbf{X}}_L=\textbf{C}_\textbf{XY} \textbf{C}_\textbf{Y}^{-1} (\mathbf{Y}-E[\textbf{Y}])+ E[\textbf{X}].
\end{align}
In the above equation, $\textbf{C}_\textbf{Y}$ is the covariance matrix of $\mathbf{Y}$, defined as
\begin{align}
\nonumber \textbf{C}_\textbf{Y}=E[(\textbf{Y}-E\textbf{Y})(\textbf{Y}-E\textbf{Y})^{T}],
\end{align}
and $\textbf{C}_\textbf{XY}$ is the cross-covariance matrix of $\mathbf{X}$ and $\mathbf{Y}$, defined as
\begin{align}
\nonumber \textbf{C}_\textbf{XY}=E[(\textbf{X}-E\textbf{X})(\textbf{Y}-E\textbf{Y})^T].
\end{align}
The above calculations can easily be done using MATLAB or other packages. However, it is sometimes easier to use the orthogonality principle to find $\hat{\mathbf{X}}_L$. We now explain how to use the orthogonality principle to find linear MMSE estimators.
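As a concrete illustration of the matrix formula above, the following Python/NumPy sketch evaluates $\hat{\mathbf{X}}_L=\textbf{C}_\textbf{XY} \textbf{C}_\textbf{Y}^{-1} (\mathbf{Y}-E[\textbf{Y}])+ E[\textbf{X}]$ for one observed value of $\mathbf{Y}$. The means and covariance matrices here are made up purely for illustration; in a real problem they come from the underlying model.

    import numpy as np

    # Hypothetical second-order statistics: X is 2-dimensional, Y is 3-dimensional.
    EX  = np.array([1.0, 0.0])                 # E[X]
    EY  = np.array([0.0, 2.0, 1.0])            # E[Y]
    CY  = np.array([[4.0, 1.0, 0.5],           # C_Y, covariance matrix of Y
                    [1.0, 3.0, 0.2],
                    [0.5, 0.2, 2.0]])
    CXY = np.array([[1.0, 0.4, 0.0],           # C_XY, cross-covariance of X and Y
                    [0.2, 0.9, 0.3]])

    y = np.array([0.5, 1.5, 2.0])              # an observed value of Y

    # X_hat_L = C_XY C_Y^{-1} (y - E[Y]) + E[X]; solving a linear system
    # is numerically preferable to forming the inverse explicitly.
    X_hat = CXY @ np.linalg.solve(CY, y - EY) + EX
    print(X_hat)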
Using the Orthogonality Principle to Find Linear MMSE Estimators for Random Vectors:
Suppose that we are estimating a vector $\textbf{X}$,
\begin{equation}
\nonumber \textbf{X} = \begin{bmatrix}
X_1 \\
X_2 \\
\vdots \\
X_m
\end{bmatrix},
\end{equation}
given that we have observed the random vector $\textbf{Y}$. Let
\begin{equation}
\nonumber \hat{\mathbf{X}}_L= \begin{bmatrix}
\hat{X}_1 \\
\hat{X}_2 \\
\vdots \\
\hat{X}_m
\end{bmatrix}
\end{equation}
be the vector estimate. We define the MSE as
\begin{align}
\nonumber MSE=\sum_{k=1}^{m} E[(X_k-\hat{X}_k)^2].
\end{align}
Therefore, to minimize the MSE, it suffices to minimize each $E[(X_k-\hat{X}_k)^2]$ individually. This means that we only need to discuss estimating a random variable $X$ given that we have observed the random vector $\textbf{Y}$. Since we would like our estimator to be linear, we can write
\begin{align}
\hat{X}_L=\sum_{k=1}^{n}a_k Y_k+b.
\end{align}
The error in our estimate, $\tilde{X}$, is then given by
\begin{align}
\tilde{X}&=X-\hat{X}_L\\
&=X-\sum_{k=1}^{n}a_k Y_k-b.
\end{align}
Similar to the proof of Theorem 9.1, we can show that the linear MMSE estimator must satisfy
\begin{align}
&E[\tilde{X}]=0,\\
&\textrm{Cov}(\tilde{X},Y_j)=E[\tilde{X} Y_j]=0, \quad \textrm{ for all }j=1,2,\cdots, n.
\end{align}
The above equations are called the orthogonality principle. The orthogonality principle is often stated as follows: the error ($\tilde{X}$) must be orthogonal to the observations ($Y_1$, $Y_2$, $\cdots$, $Y_n$). Note that there are $n+1$ unknowns ($a_1$, $a_2$, $\cdots$, $a_n$, and $b$) and $n+1$ equations, so the coefficients can be found by solving a linear system, as sketched below.
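The orthogonality conditions translate directly into such a system: $\textrm{Cov}(\tilde{X},Y_j)=0$ for $j=1,\cdots,n$ gives the $n$ equations $\sum_{k=1}^{n} a_k \textrm{Cov}(Y_k,Y_j)=\textrm{Cov}(X,Y_j)$, and $E[\tilde{X}]=0$ gives $b=EX-\sum_{k=1}^{n} a_k EY_k$. The Python/NumPy sketch below solves this system; the function name and the sample numbers are illustrative choices, not part of the text.

    import numpy as np

    def linear_mmse_coeffs(EX, EY, CY, cXY):
        # Orthogonality, Cov(X - X_hat, Y_j) = 0 for j = 1,...,n, reads
        #   sum_k a_k Cov(Y_k, Y_j) = Cov(X, Y_j),   i.e.   CY a = cXY.
        # The condition E[X - X_hat] = 0 then gives b = E[X] - a^T E[Y].
        a = np.linalg.solve(CY, cXY)
        b = EX - a @ EY
        return a, b

    # Hypothetical numbers with n = 2 observations:
    a, b = linear_mmse_coeffs(EX=1.0,
                              EY=np.array([0.0, 1.0]),
                              CY=np.array([[2.0, 0.5],
                                           [0.5, 1.0]]),
                              cXY=np.array([0.8, 0.3]))
    print(a, b)    # coefficients of X_hat_L = a[0] Y_1 + a[1] Y_2 + b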
Let us look at an example to see how we can apply the orthogonality principle.

Example
Let $X$ be an unobserved random variable with $EX=0$ and $\textrm{Var}(X)=4$. Assume that we have observed $Y_1$ and $Y_2$ given by
\begin{align}
Y_1&=X+W_1,\\
Y_2&=X+W_2,
\end{align}
where $EW_1=EW_2=0$, $\textrm{Var}(W_1)=1$, and $\textrm{Var}(W_2)=4$. Assume that $W_1$, $W_2$, and $X$ are independent random variables. Find the linear MMSE estimator of $X$, given $Y_1$ and $Y_2$.
- Solution
- The linear MMSE estimator of $X$ given $Y_1$ and $Y_2$ has the form \begin{align} \hat{X}_L=aY_1+bY_2+c. \end{align} We use the orthogonality principle. We have \begin{align} E[\tilde{X}]&=-aEY_1-bEY_2-c\\ &=-a \cdot 0- b \cdot 0-c=-c. \end{align} Using $E[\tilde{X}]=0$, we conclude $c=0$. Next, we note \begin{align} \textrm{Cov}(\hat{X}_L,Y_1) &= \textrm{Cov}(aY_1+bY_2,Y_1)\\ &=a \textrm{Cov}(Y_1, Y_1)+ b \textrm{Cov}(Y_1,Y_2)\\ &=a \textrm{Cov} (X+W_1,X+W_1)+b \textrm{Cov} (X+W_1,X+W_2)\\ &=a (\textrm{Var}(X) +\textrm{Var}(W_1))+b \textrm{Var}(X)\\ &=5a +4b. \end{align} Similarly, we find \begin{align} \textrm{Cov}(\hat{X}_L,Y_2) &= \textrm{Cov}(aY_1+bY_2,Y_2)\\ &=a \textrm{Var}(X) +b (\textrm{Var}(X)+\textrm{Var}(W_2))\\ &=4a +8b. \end{align} We need to have \begin{align} &\textrm{Cov}(\tilde{X},Y_j)=0, \quad \textrm{ for }j=1,2, \end{align} which is equivalent to \begin{align} &\textrm{Cov}(\hat{X}_L,Y_j)=\textrm{Cov}(X,Y_j), \quad \textrm{ for }j=1,2. \end{align} Since $\textrm{Cov}(X,Y_1)=\textrm{Cov}(X,Y_2)=\textrm{Var}(X)=4$, we conclude \begin{align} &5a +4b=4,\\ &4a +8b=4. \end{align} Solving for $a$ and $b$, we obtain $a=\frac{2}{3}$ and $b=\frac{1}{6}$. Therefore, the linear MMSE estimator of $X$, given $Y_1$ and $Y_2$, is \begin{align} \hat{X}_L=\frac{2}{3} Y_1+ \frac{1}{6} Y_2. \end{align}
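As a numerical check of this example, the two covariance equations $5a+4b=4$ and $4a+8b=4$ can be solved with the same NumPy approach. The short Monte Carlo run afterwards uses an assumed Gaussian simulation model purely as a sanity check; the linear MMSE derivation itself does not require Gaussianity.

    import numpy as np

    # Second-order statistics from the example:
    #   Cov(Y_1,Y_1) = Var(X) + Var(W_1) = 5,   Cov(Y_1,Y_2) = Var(X) = 4,
    #   Cov(Y_2,Y_2) = Var(X) + Var(W_2) = 8,   Cov(X,Y_1) = Cov(X,Y_2) = 4.
    CY  = np.array([[5.0, 4.0],
                    [4.0, 8.0]])
    cXY = np.array([4.0, 4.0])

    a, b = np.linalg.solve(CY, cXY)
    print(a, b)                            # 0.666..., 0.166..., i.e. a = 2/3, b = 1/6

    # Monte Carlo sanity check with Gaussian X, W_1, W_2 of the stated variances.
    rng = np.random.default_rng(0)
    N  = 10**6
    X  = 2.0 * rng.standard_normal(N)      # Var(X)  = 4
    W1 = rng.standard_normal(N)            # Var(W1) = 1
    W2 = 2.0 * rng.standard_normal(N)      # Var(W2) = 4
    Y1, Y2 = X + W1, X + W2
    X_hat = a * Y1 + b * Y2
    print(np.mean((X - X_hat) ** 2))       # close to 2/3, the MSE of this estimator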