Johnson-Lindenstrauss Lemma

The Johnson-Lindenstrauss (JL) Lemma is an important result in mathematics that provides a way to reduce the dimensionality of a set of data points while preserving the pairwise distances between them, up to a bounded distortion. In this blog post, we introduce the JL Lemma and present a proof of the theorem.

The proof is based on Dasgupta and Gupta's paper An elementary proof of a theorem of Johnson and Lindenstrauss. For a more detailed explanation, please refer to the original paper.

JL Lemma

For any set \(\mathcal{O}\) of \(n\) points in \(\mathbb{R}^d\), there exists a map \(f:\mathbb{R}^{d}\to\mathbb{R}^k\), such that for all \(\boldsymbol{o_i},\boldsymbol{o_j}\in\mathcal{O}\):

\[ (1-\epsilon)\Vert\boldsymbol{o_i}-\boldsymbol{o_j}\Vert_2^2\le\Vert\boldsymbol{f(o_i)-f(o_j)}\Vert_2^2\le(1+\epsilon)\Vert\boldsymbol{o_i}-\boldsymbol{o_j}\Vert_2^2 \]

where \(0\lt\epsilon\lt 1\) and \(k\) is a positive integer satisfying:

\[ k\gt\frac{24}{3\epsilon^2-2\epsilon^3}\log{n} \]
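For concreteness, reading \(\log\) as the natural logarithm (as in Dasgupta and Gupta's paper), a set of \(n=10^6\) points with \(\epsilon=0.2\) gives:

\[ k\gt\frac{24}{3(0.2)^2-2(0.2)^3}\ln 10^6=\frac{24}{0.104}\ln 10^6\approx 3188.2 \]

so any integer \(k\ge 3189\) suffices, regardless of the original dimension \(d\).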

Let \(A\) be a random matrix of size \(k\times d\), where each entry is independently drawn from the standard normal distribution \(\mathcal{N}(0,1)\). Using this random matrix, we can construct a mapping that satisfies the JL Lemma. For any point \(\boldsymbol{p}\in\mathbb{R}^d\), the mapping is defined as:

\[ \boldsymbol{f(p)}=\frac{A}{\sqrt{k}}\cdot\boldsymbol{p} \]
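To make the construction concrete, here is a minimal NumPy sketch of this mapping. The sizes \(n\), \(d\), \(\epsilon\), the helper name `jl_map`, and the reading of \(\log\) as the natural logarithm are illustrative assumptions, not part of the lemma.

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative sizes: n points in d dimensions, distortion eps
n, d, eps = 100, 1000, 0.5
# target dimension from k > 24 / (3*eps^2 - 2*eps^3) * log(n), with log = ln
k = int(np.ceil(24.0 / (3 * eps**2 - 2 * eps**3) * np.log(n))) + 1

A = rng.standard_normal((k, d))          # entries drawn i.i.d. from N(0, 1)

def jl_map(p):
    """Project p from R^d to R^k via f(p) = (A / sqrt(k)) p."""
    return (A / np.sqrt(k)) @ p

# distances between two sample points are preserved up to a (1 +- eps) factor w.h.p.
p, q = rng.standard_normal(d), rng.standard_normal(d)
ratio = np.sum((jl_map(p) - jl_map(q)) ** 2) / np.sum((p - q) ** 2)
print(f"k = {k}, squared-distance ratio = {ratio:.3f}")
```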

Proof of JL Lemma

To prove the JL Lemma, we first establish some auxiliary lemmas and inequalities. For convenience, we will omit the subscript \(2\) on norms, so \(\Vert\cdot\Vert\) denotes the \(2\)-norm by default.

Lemma 1

For any point \(\boldsymbol{p}\in\mathbb{R}^d\), let \(\boldsymbol{Y}=\frac{A}{\sqrt{k}}\cdot\boldsymbol{p}\). Then, we have:

\[ \mathrm{E}[\Vert\boldsymbol{Y}\Vert^2]=\Vert\boldsymbol{p}\Vert^2 \]

Proof of Lemma 1

From the definition of \(\boldsymbol{Y}\), each component is:

\[ Y_i=\frac{1}{\sqrt{k}}\sum_{j=1}^dA_{ij}p_j \]

Now, we compute \(\text{E}\left[\Vert\boldsymbol{Y}\Vert^2\right]\):

\[ \begin{align*} \text{E}\left[\Vert\boldsymbol{Y}\Vert^2\right]&=\text{E}\left[\sum_{i=1}^{k}\frac{1}{k}\left(\sum_{j=1}^dA_{ij}p_j\right)^2\right] \\ &=\text{E}\left[\sum_{i=1}^{k}\frac{1}{k}\sum_{j=1}^{d}\sum_{t=1}^{d}A_{ij}A_{it}p_jp_t\right] \\ &=\sum_{i=1}^k\frac{1}{k}\sum_{j=1}^{d}\sum_{t=1}^{d}p_jp_t\text{E}\left[A_{ij}A_{it}\right] \end{align*} \]

  • if \(j\ne t\), then: \[ \text{E}\left[A_{ij}A_{it}\right]=\text{E}\left[A_{ij}\right]\text{E}\left[A_{it}\right]=0 \]
  • if \(j=t\), then: \[ \text{E}\left[A_{ij}A_{it}\right]=\text{E}\left[A_{ij}^{2}\right]=\text{Var}(A_{ij})+\text{E}^2\left[A_{ij}\right]=1 \]

Thus, we get:

\[ \text{E}\left[\Vert\boldsymbol{Y}\Vert^2\right]=\sum_{i=1}^{k}\frac{1}{k}\sum_{j=1}^{d}p_j^2=\sum_{i=1}^{k}\frac{1}{k}\Vert\boldsymbol{p}\Vert^2=\Vert\boldsymbol{p}\Vert^2 \]
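As a quick sanity check of Lemma 1, the sketch below estimates \(\text{E}\left[\Vert\boldsymbol{Y}\Vert^2\right]\) by Monte Carlo and compares it with \(\Vert\boldsymbol{p}\Vert^2\). The dimensions and trial count are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, trials = 64, 16, 20_000

p = rng.standard_normal(d)
estimates = []
for _ in range(trials):
    A = rng.standard_normal((k, d))          # fresh random matrix per trial
    Y = (A / np.sqrt(k)) @ p
    estimates.append(np.sum(Y ** 2))

# the empirical mean should be close to ||p||^2
print(f"empirical E[||Y||^2] ~ {np.mean(estimates):.3f}")
print(f"||p||^2              = {np.sum(p ** 2):.3f}")
```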

Lemma 2

Let \(X\) be a random variable distributed as \(\mathcal{N}(0,1)\). For a constant \(\lambda\lt\frac{1}{2}\), we have: \[ \text{E}\left[e^{\lambda X^2}\right]=\frac{1}{\sqrt{1-2\lambda}} \]

Proof of Lemma 2

The PDF of the standard normal distribution is:

\[ f(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}} \]

We now compute \(\text{E}\left[e^{\lambda X^2}\right]\):

\[ \text{E}\left[e^{\lambda X^2}\right]=\int_{-\infty}^{+\infty}\frac{1}{\sqrt{2\pi}}e^{\lambda x^2}e^{-\frac{x^2}{2}}\mathrm{d}x=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-(1-2\lambda)\frac{x^2}{2}}\mathrm{d}x \]

Next, since \(1-2\lambda\gt 0\), we can substitute \(y=x\sqrt{1-2\lambda}\), so that \(\mathrm{d}y=\sqrt{1-2\lambda}\,\mathrm{d}x\). This gives:

\[ \text{E}\left[e^{\lambda X^2}\right]=\frac{1}{\sqrt{1-2\lambda}}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-\frac{y^2}{2}}\mathrm{d}y \]

Using the well-known result for the standard normal distribution:

\[ \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-\frac{y^2}{2}}\mathrm{d}y=1 \]

Therefore:

\[ \text{E}\left[e^{\lambda X^2}\right]=\frac{1}{\sqrt{1-2\lambda}} \]
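Lemma 2 can be spot-checked the same way, by comparing the empirical mean of \(e^{\lambda X^2}\) with the closed form \(\frac{1}{\sqrt{1-2\lambda}}\). The value of \(\lambda\) and the sample size below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 0.2                                   # must satisfy lam < 1/2
x = rng.standard_normal(1_000_000)

empirical = np.mean(np.exp(lam * x ** 2))
closed_form = 1.0 / np.sqrt(1.0 - 2.0 * lam)
print(f"Monte Carlo : {empirical:.4f}")
print(f"Closed form : {closed_form:.4f}")
```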

Lemma 3

For any point \(\boldsymbol{p}\in\mathbb{R}^{d}\), let \(\boldsymbol{Y}=\frac{A}{\sqrt{k}}\cdot\boldsymbol{p}\), where \(k\) satisfies the bound stated in the JL Lemma. The following inequalities hold:

\[ \begin{align*} \Pr\left[\Vert\boldsymbol{Y}\Vert^2\ge(1+\epsilon)\Vert\boldsymbol{p}\Vert^2\right]&\le\frac{1}{n^2}\\ \Pr\left[\Vert\boldsymbol{Y}\Vert^2\le(1-\epsilon)\Vert\boldsymbol{p}\Vert^2\right]&\le\frac{1}{n^2} \end{align*} \]

Proof of Lemma 3

We begin by simplifying the inequalities. From the definition of \(\boldsymbol{Y}\), we have:

\[ Y_i=\frac{1}{\sqrt{k}}\sum_{j=1}^{d}A_{ij}p_j \]

We now compute \(\Vert\boldsymbol{Y}\Vert^2\):

\[ \Vert\boldsymbol{Y}\Vert^2=\sum_{i=1}^{k}\left(\frac{1}{\sqrt{k}}\sum_{j=1}^{d}A_{ij}p_j\right)^2=\sum_{i=1}^{k}\frac{1}{k}\left(\sum_{j=1}^{d}A_{ij}p_j\right)^2 \]

Since the standard normal distribution is a \(2\)-stable distribution, the sum \(\sum_{j=1}^{d}A_{ij}p_j\) has the same distribution as \(\Vert\boldsymbol{p}\Vert X_i\), where \(X_i\sim\mathcal{N}(0,1)\). Hence:

\[ \Vert\boldsymbol{Y}\Vert^2=\sum_{i=1}^{k}\frac{1}{k}\Vert\boldsymbol{p}\Vert^2X_{i}^{2}=\frac{\Vert\boldsymbol{p}\Vert^2}{k}\sum_{i=1}^{k}X_{i}^{2} \]

where the \(X_i\) are independent standard normal random variables (one per row of \(A\)). Substituting this back, the inequalities we need to prove reduce to:

\[ \begin{align*} \Pr\left[\sum_{i=1}^{k}X_{i}^{2}\ge k(1+\epsilon)\right]&\le\frac{1}{n^2}\\ \Pr\left[\sum_{i=1}^{k}X_{i}^{2}\le k(1-\epsilon)\right]&\le\frac{1}{n^2} \end{align*} \]

For the first inequality, let \(\lambda>0\), then:

\[ \Pr\left[\sum_{i=1}^{k}X_{i}^{2}\ge k(1+\epsilon)\right]=\Pr\left[e^{\lambda\sum_{i=1}^{k}X_i^2}\ge e^{\lambda k(1+\epsilon)}\right] \]

Using Markov's inequality, we have:

\[ \Pr\left[e^{\lambda\sum_{i=1}^{k}X_i^2}\ge e^{\lambda k(1+\epsilon)}\right]\le\frac{\text{E}\left[e^{\lambda\sum_{i=1}^{k}X_i^2}\right]}{e^{\lambda k(1+\epsilon)}}=\frac{\text{E}^{k}\left[e^{\lambda X_1^2}\right]}{e^{\lambda k(1+\epsilon)}} \]

where the last equality uses the fact that the \(X_i\) are i.i.d.

Let \(\lambda=\frac{\epsilon}{2(1+\epsilon)}\lt\frac{1}{2}\), so that \(1-2\lambda=\frac{1}{1+\epsilon}\). By Lemma 2, we have:

\[ \frac{\text{E}^{k}\left[e^{\lambda X_1^2}\right]}{e^{\lambda k(1+\epsilon)}}=\left(\frac{1}{\sqrt{1-2\lambda}}\right)^{k}e^{-\lambda k(1+\epsilon)}=\left[(1+\epsilon)e^{-\epsilon}\right]^{\frac{k}{2}} \]

Using the Taylor expansion for \(\ln(1+\epsilon)\):

\[ \ln(1+\epsilon)=\epsilon-\frac{\epsilon^2}{2}+\frac{\epsilon^3}{3}-\frac{\epsilon^4}{4}+\cdots \]

Since this is an alternating series whose terms decrease in magnitude for \(0\lt\epsilon\lt 1\), truncating it after the positive \(\frac{\epsilon^3}{3}\) term gives an upper bound:

\[ 1+\epsilon\le e^{\epsilon-\frac{\epsilon^2}{2}+\frac{\epsilon^3}{3}} \]

Thus:

\[ \left[(1+\epsilon)e^{-\epsilon}\right]^{\frac{k}{2}}\le \left(e^{\epsilon-\frac{\epsilon^2}{2}+\frac{\epsilon^3}{3}}e^{-\epsilon}\right)^{\frac{k}{2}}=e^{-\frac{k(3\epsilon^2-2\epsilon^3)}{12}} \]

If \(k\) satisfies:

\[ k\gt\frac{24}{3\epsilon^2-2\epsilon^3}\log{n} \]

then:

\[ e^{-\frac{k(3\epsilon^2-2\epsilon^3)}{12}}\le e^{-2\log{n}}=\frac{1}{n^2} \]

Therefore:

\[ \Pr\left[\sum_{i=1}^{k}X_{i}^{2}\ge k(1+\epsilon)\right]\le\frac{1}{n^2} \]

Similarly, for the second inequality, let \(\lambda>0\), then:

\[ \begin{align*} \Pr\left[\sum_{i=1}^{k}X_{i}^{2}\le k(1-\epsilon)\right]&=\Pr\left[e^{-\lambda\sum_{i=1}^{k}X_i^2}\ge e^{-\lambda k(1-\epsilon)}\right] \\ &\le\frac{\text{E}\left[e^{-\lambda\sum_{i=1}^{k}X_i^2}\right]}{e^{-\lambda k(1-\epsilon)}} \\ &=\frac{\text{E}^{k}\left[e^{-\lambda X_1^2}\right]}{e^{-\lambda k(1-\epsilon)}} \end{align*} \]

Let \(\lambda=\frac{\epsilon}{2(1-\epsilon)}\gt 0\). Since \(-\lambda\lt 0\lt\frac{1}{2}\), Lemma 2 applies with \(-\lambda\) in place of \(\lambda\), and \(1+2\lambda=\frac{1}{1-\epsilon}\). We have:

\[ \frac{\text{E}^{k}\left[e^{-\lambda X_1^2}\right]}{e^{-\lambda k(1-\epsilon)}}=\left(\frac{1}{\sqrt{1+2\lambda}}\right)^ke^{\lambda k(1-\epsilon)}=\left[(1-\epsilon)e^{\epsilon}\right]^{\frac{k}{2}} \]
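As an aside, the snippet below numerically spot-checks the two closed-form simplifications used above, with \(\lambda=\frac{\epsilon}{2(1+\epsilon)}\) for the upper tail and \(\lambda=\frac{\epsilon}{2(1-\epsilon)}\) for the lower tail. It only verifies the algebra at a few sample points and is not part of the proof.

```python
import math

for eps in (0.1, 0.3, 0.5):
    for k in (10, 100):
        # upper tail: lam = eps / (2(1+eps))
        lam = eps / (2 * (1 + eps))
        lhs = (1 - 2 * lam) ** (-k / 2) * math.exp(-lam * k * (1 + eps))
        rhs = ((1 + eps) * math.exp(-eps)) ** (k / 2)
        assert math.isclose(lhs, rhs, rel_tol=1e-12)

        # lower tail: lam = eps / (2(1-eps))
        lam = eps / (2 * (1 - eps))
        lhs = (1 + 2 * lam) ** (-k / 2) * math.exp(lam * k * (1 - eps))
        rhs = ((1 - eps) * math.exp(eps)) ** (k / 2)
        assert math.isclose(lhs, rhs, rel_tol=1e-12)

print("both simplifications hold at the sampled points")
```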

Similarly, using the Taylor expansion for \(\ln(1-\epsilon)\):

\[ \ln(1-\epsilon)=-\epsilon-\frac{\epsilon^2}{2}-\frac{\epsilon^3}{3}-\cdots \]

Since every term of this series is negative, \(\ln(1-\epsilon)\le-\epsilon-\frac{\epsilon^2}{2}\), and therefore:

\[ 1-\epsilon\le e^{-\epsilon-\frac{\epsilon^2}{2}} \]

Thus, using the bound on \(k\) and the fact that \(3\epsilon^2-2\epsilon^3\lt 3\epsilon^2\) (so that \(\frac{6\epsilon^2}{3\epsilon^2-2\epsilon^3}\gt 2\)):

\[ \left[(1-\epsilon)e^{\epsilon}\right]^{\frac{k}{2}}\le e^{-\frac{k\epsilon^2}{4}}\lt e^{-\frac{6\epsilon^2}{3\epsilon^2-2\epsilon^3}\log{n}}\lt e^{-2\log{n}}=\frac{1}{n^2} \]

Therefore:

\[ \Pr\left[\sum_{i=1}^{k}X_{i}^{2}\le k(1-\epsilon)\right]\le\frac{1}{n^2} \]
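Since \(\sum_{i=1}^{k}X_i^2\) follows a chi-square distribution with \(k\) degrees of freedom, Lemma 3 can also be checked empirically. The sketch below samples that distribution directly and compares the two tail probabilities against the \(\frac{1}{n^2}\) bound; the values of \(n\), \(\epsilon\), and the trial count are arbitrary demo choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, eps, trials = 20, 0.3, 1_000_000
k = int(np.ceil(24.0 / (3 * eps**2 - 2 * eps**3) * np.log(n))) + 1

# sum of k squared standard normals has the chi-square(k) law
S = rng.chisquare(df=k, size=trials)
upper = np.mean(S >= k * (1 + eps))
lower = np.mean(S <= k * (1 - eps))
print(f"k = {k}")
print(f"Pr[S >= k(1+eps)] ~ {upper:.1e}   (bound 1/n^2 = {1 / n ** 2:.1e})")
print(f"Pr[S <= k(1-eps)] ~ {lower:.1e}   (bound 1/n^2 = {1 / n ** 2:.1e})")
```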

Proof of the JL Lemma

Given the set \(\mathcal{O}\) of \(n\) points in \(\mathbb{R}^d\), we will use Lemma 3 and Boole's inequality (the union bound) to show that the JL Lemma holds.

From Lemma 3, combining the two one-sided bounds, for any point \(\boldsymbol{p}\in\mathbb{R}^{d}\) the following holds:

\[ \Pr\left[\Vert\boldsymbol{f(p)}\Vert^2\notin\left[(1-\epsilon)\Vert\boldsymbol{p}\Vert^{2},(1+\epsilon)\Vert\boldsymbol{p}\Vert^{2}\right]\right]\le\frac{2}{n^2} \]

Since this bound holds for any point \(\boldsymbol{p}\in\mathbb{R}^d\), it also holds for the difference vector \(\boldsymbol{u}=\boldsymbol{o_i}-\boldsymbol{o_j}\) of any pair of points \(\boldsymbol{o_i},\boldsymbol{o_j}\in\mathcal{O}\).

Furthermore, using the linearity of the projection, we have:

\[ \boldsymbol{f(o_i-o_j)}=\boldsymbol{f(o_i)}-\boldsymbol{f(o_j)} \]

Therefore, applying Lemma 3 to the difference between two points \(\boldsymbol{o_i}\) and \(\boldsymbol{o_j}\), we obtain:

\[ \Pr\left[\Vert\boldsymbol{f(o_i)}-\boldsymbol{f(o_j)}\Vert^2\notin\left[(1-\epsilon)\Vert\boldsymbol{o_i}-\boldsymbol{o_j}\Vert^{2},(1+\epsilon)\Vert\boldsymbol{o_i}-\boldsymbol{o_j}\Vert^{2}\right]\right]\le\frac{2}{n^2} \]

The number of pairs of points \(\boldsymbol{o_i},\boldsymbol{o_j}\in\mathcal{O}\) is \(\binom{n}{2}\). Thus, by Boole's inequality, the probability that at least one pair of points falls outside the desired error bound is at most:

\[ \frac{n(n-1)}{2}\cdot\frac{2}{n^2}=\frac{n-1}{n}=1-\frac{1}{n} \]

Thus, with probability at least \(\frac{1}{n}\gt 0\), all pairs of points in \(\mathcal{O}\) fall within the desired error bounds. In particular, a map \(f\) satisfying the JL Lemma exists, which completes the proof.
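To close, here is an end-to-end illustration of the final union-bound argument: draw \(A\) once, project a random point set, and count how many of the \(\binom{n}{2}\) pairs violate the \((1\pm\epsilon)\) bound on squared distances. The point set and all sizes are arbitrary assumptions for the demo; with \(k\) chosen as above, the count is typically zero.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, eps = 100, 2000, 0.5
k = int(np.ceil(24.0 / (3 * eps**2 - 2 * eps**3) * np.log(n))) + 1

points = rng.standard_normal((n, d))        # the point set, one row per point
A = rng.standard_normal((k, d))
projected = points @ (A / np.sqrt(k)).T     # projected points, one row per point

violations = 0
for i in range(n):
    for j in range(i + 1, n):
        orig = np.sum((points[i] - points[j]) ** 2)
        proj = np.sum((projected[i] - projected[j]) ** 2)
        if not (1 - eps) * orig <= proj <= (1 + eps) * orig:
            violations += 1

print(f"k = {k}, violating pairs: {violations} / {n * (n - 1) // 2}")
```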