Dimensionality Reduction

Curse of Dimensionality
- Theoretically, increasing the number of features should improve performance.
- In practice, too many features lead to worse performance.
- The number of training examples required increases exponentially with dimensionality:
  - 1 dimension: 10 positions
  - 2 dimensions: 100 positions
  - 3 dimensions: 1,000 positions

Solution: Dimensionality Reduction
- Data can often be represented by fewer dimensions (features).
- Reduce dimensionality by selecting a subset of the features (feature elimination).
- Or combine features with linear and non-linear transformations.

Example: two features, height and cigarettes per day. Both features increase together (they are correlated). Can we reduce the number of features to one? Yes: create a single feature that is a combination of height and cigarettes per day. This is Principal Component Analysis (PCA). A minimal sketch of this idea follows below.
[Figure: scatter plot of cigarettes/day versus height, with a single combined "height + cigarettes/day" axis drawn through the data.]
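A minimal sketch of combining two correlated features into one, using made-up height and cigarettes-per-day values (the data, variable names, and coefficients below are illustrative assumptions, not from the slides):

```python
# Illustrative only: synthetic "height" and "cigarettes/day" data, combined
# into one feature with PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=100)                  # hypothetical heights (cm)
cigarettes = 0.3 * height + rng.normal(0, 3, size=100)  # made correlated with height

X = np.column_stack([height, cigarettes])               # 100 samples x 2 features

# Reduce the two correlated features to a single combined feature
pca = PCA(n_components=1)
combined = pca.fit_transform(X)

print(combined.shape)                     # (100, 1): one feature per sample
print(pca.explained_variance_ratio_)      # fraction of variance the single feature keeps
```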
Dimensionality Reduction: Formal Statement
- Given an N-dimensional data set x, find an N × K matrix U such that y = U^T x, where y has K dimensions and K < N:
  x = (x_1, x_2, ..., x_N)^T  →  y = U^T x = (y_1, y_2, ..., y_K)^T,  with K < N.

Principal Component Analysis (PCA)
- [Figure: data scattered in the X1–X2 plane with two principal directions: v_1 with length lambda_1 and v_2 with length lambda_2.]

Singular Value Decomposition (SVD)
- SVD is a matrix factorization method normally used to compute PCA.
- It does not require a square data set.
- SVD is what scikit-learn uses for PCA.
- Factorization: A (m × n) = U (m × m) S (m × n) V^T (n × n), where S is diagonal and holds the singular values.

Truncated Singular Value Decomposition
- How can SVD be used for dimensionality reduction?
- The principal components are calculated from U S.
- "Truncated SVD" keeps only the k largest singular values (n → k):
  A (m × n) ≈ U (m × k) S (k × k) V^T (k × n).

Importance of Feature Scaling
- PCA and SVD seek the vectors that capture the most variance.
- Variance is sensitive to the scale of each axis.
- The data must be scaled first.
- [Figure: the same scatter plotted unscaled (X1 ranging to 500) and scaled (X1 ranging to 50); the dominant direction changes.]

PCA: The Syntax
- Import the class containing the dimensionality reduction method:
  from sklearn.decomposition import PCA
- Create an instance of the class:
  PCAinst = PCA(n_components=3, whiten=True)
  n_components is the final number of dimensions; whiten scales and centers the data.
- Fit the instance on the data and then transform the data:
  X_trans = PCAinst.fit_transform(X_train)
- PCA does not work with sparse matrices.
- A runnable sketch on sample data follows below.
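As a runnable version of the syntax above, here is a sketch using the iris data as a stand-in for X_train (the dataset and the use of StandardScaler are assumptions, not part of the slides):

```python
# PCA syntax from the slide, applied to a small sample dataset.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_train = load_iris().data                     # 150 samples x 4 dense features

# Scale the features first, since variance-based methods are sensitive to axis scale
X_scaled = StandardScaler().fit_transform(X_train)

PCAinst = PCA(n_components=3, whiten=True)     # keep 3 dimensions
X_trans = PCAinst.fit_transform(X_scaled)

print(X_trans.shape)                           # (150, 3)
print(PCAinst.explained_variance_ratio_)       # variance captured by each component
```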
Truncated SVD: The Syntax
- Import the class containing the dimensionality reduction method:
  from sklearn.decomposition import TruncatedSVD
- Create an instance of the class:
  SVD = TruncatedSVD(n_components=3)
- Fit the instance on the data and then transform the data:
  X_trans = SVD.fit_transform(X_sparse)
- TruncatedSVD does not center the data, so it works with sparse matrices; it is often used with text data for Latent Semantic Analysis (LSA).

Moving Beyond Linearity
- The transformations calculated with PCA/SVD are linear.
- Data can have non-linear features, which can cause linear dimensionality reduction to fail.
- [Figure: two concentric classes in the original space remain mixed after projection by PCA.]

Kernel PCA
- Solution: kernels can be used to perform non-linear PCA.
- This is like the kernel trick introduced for SVMs: a non-linear map Φ takes the data from R^2 into a feature space F, where linear PCA is applied.
- [Figure: original space versus the projection obtained by kernel PCA (KPCA).]

Kernel PCA: The Syntax
- Import the class containing the dimensionality reduction method:
  from sklearn.decomposition import KernelPCA
- Create an instance of the class:
  kPCA = KernelPCA(n_components=3, kernel='rbf', gamma=1.0)
- Fit the instance on the data and then transform the data:
  X_trans = kPCA.fit_transform(X_train)
- A sketch on non-linearly structured data follows below.
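To see why the kernel matters, here is a small sketch (not from the slides) comparing linear PCA and RBF-kernel PCA on two concentric circles; the dataset, the gamma value, and the printed summary are illustrative choices:

```python
# Linear PCA vs. kernel PCA on data with non-linear structure.
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: no single linear direction separates them
X_train, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_lin = PCA(n_components=1).fit_transform(X_train)
kPCA = KernelPCA(n_components=1, kernel='rbf', gamma=10.0)
X_trans = kPCA.fit_transform(X_train)

# The first kernel-PCA component tends to separate inner (y=1) from outer (y=0)
# points, while the first linear component does not.
for name, proj in [("linear PCA", X_lin), ("kernel PCA", X_trans)]:
    print(f"{name}: inner mean = {proj[y == 1, 0].mean():.2f}, "
          f"outer mean = {proj[y == 0, 0].mean():.2f}")
```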
MDS: The Syntax
- Import the class containing the dimensionality reduction method:
  from sklearn.manifold import MDS
- Create an instance of the class:
  mdsMod = MDS(n_components=2)
- Fit the instance on the data and then transform the data:
  X_trans = mdsMod.fit_transform(X_train)
- Many other manifold dimensionality reduction methods exist, e.g. Isomap and TSNE.

Example: PCA for Image Compression
- Divide the image into 12 × 12 pixel sections.
- Flatten each section to create a row of data with 144 features.
- Perform PCA on all of the resulting data points.
- Reconstructing each section from only the leading principal components compresses the image: 144 → 60 dimensions, 144 → 16 dimensions, 144 → 4 dimensions, and 144 → 1 dimension.
- [Figure: relative L2 reconstruction error versus the number of PCA dimensions retained (x-axis: PCA dimension, 20–140; y-axis: relative error, 0.2–1.0).]
- A runnable sketch of this procedure follows below.
- Image source: https://commons.wikimedia.org/wiki/File:Monarch_In_May.jpg
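Below is a sketch of the patch-based PCA compression described above. It uses scikit-image's built-in camera photo as a stand-in for the butterfly image (downloading the Wikimedia file is not assumed), and the error formula is one reasonable reading of the slide's relative L2 error:

```python
# Patch-based PCA image compression: 12 x 12 patches -> 144 features -> k components.
import numpy as np
from skimage import data
from sklearn.decomposition import PCA

img = data.camera().astype(float)            # 512 x 512 grayscale stand-in image
patch = 12
h = (img.shape[0] // patch) * patch          # crop so the patch size divides evenly
w = (img.shape[1] // patch) * patch
img = img[:h, :w]

# Divide into 12 x 12 sections and flatten each into a row of 144 features
patches = (img.reshape(h // patch, patch, w // patch, patch)
              .swapaxes(1, 2)
              .reshape(-1, patch * patch))

# Keep k principal components, then project back to 144 dimensions
k = 4
pca = PCA(n_components=k)
reconstructed = pca.inverse_transform(pca.fit_transform(patches))

# Relative L2 reconstruction error, in the spirit of the error plot on the slide
rel_error = np.linalg.norm(patches - reconstructed) / np.linalg.norm(patches)
print(f"{k} components: relative L2 error = {rel_error:.3f}")

# Reassemble the reconstructed patches into a full image for display
img_out = (reconstructed.reshape(h // patch, w // patch, patch, patch)
                        .swapaxes(1, 2)
                        .reshape(h, w))
```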