Dimensionality Reduction

Curse of Dimensionality
- Theoretically, increasing the number of features should improve performance.
- In practice, too many features lead to worse performance.
- The number of training examples required increases exponentially with dimensionality:
  - 1 dimension: 10 positions
  - 2 dimensions: 100 positions
  - 3 dimensions: 1,000 positions

Solution: Dimensionality Reduction
- Data can often be represented by fewer dimensions (features).
- Reduce dimensionality by selecting a subset of the features (feature elimination).
- Or combine features with linear and non-linear transformations.

Example: two features, height and cigarettes per day. Both features increase together (they are correlated). Can we reduce the number of features to one? Yes: create a single feature that is a combination of height and cigarettes per day. This is Principal Component Analysis (PCA). A minimal sketch of this idea follows below.
[Figure: scatter plot of cigarettes/day versus height, with a single combined "height + cigarettes/day" axis drawn through the data.]
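A minimal sketch of combining two correlated features into one, using made-up height and cigarettes-per-day values (the data, variable names, and coefficients below are illustrative assumptions, not from the slides):

```python
# Illustrative only: synthetic "height" and "cigarettes/day" data, combined
# into one feature with PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=100)                  # hypothetical heights (cm)
cigarettes = 0.3 * height + rng.normal(0, 3, size=100)  # made correlated with height

X = np.column_stack([height, cigarettes])               # 100 samples x 2 features

# Reduce the two correlated features to a single combined feature
pca = PCA(n_components=1)
combined = pca.fit_transform(X)

print(combined.shape)                     # (100, 1): one feature per sample
print(pca.explained_variance_ratio_)      # fraction of variance the single feature keeps
```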
Dimensionality Reduction: Formal Statement
- Given an N-dimensional data set x, find an N × K matrix U such that y = U^T x, where y has K dimensions and K < N:
  x = (x_1, x_2, ..., x_N)^T  →  y = U^T x = (y_1, y_2, ..., y_K)^T,  with K < N.

Principal Component Analysis (PCA)
- [Figure: data scattered in the X1–X2 plane with two principal directions: v_1 with length lambda_1 and v_2 with length lambda_2.]

Singular Value Decomposition (SVD)
- SVD is a matrix factorization method normally used to compute PCA.
- It does not require a square data set.
- SVD is what scikit-learn uses for PCA.
- Factorization: A (m × n) = U (m × m) S (m × n) V^T (n × n), where S is diagonal and holds the singular values.

Truncated Singular Value Decomposition
- How can SVD be used for dimensionality reduction?
- The principal components are calculated from U S.
- "Truncated SVD" keeps only the k largest singular values (n → k):
  A (m × n) ≈ U (m × k) S (k × k) V^T (k × n).

Importance of Feature Scaling
- PCA and SVD seek the vectors that capture the most variance.
- Variance is sensitive to the scale of each axis.
- The data must be scaled first.
- [Figure: the same scatter plotted unscaled (X1 ranging to 500) and scaled (X1 ranging to 50); the dominant direction changes.]

PCA: The Syntax
- Import the class containing the dimensionality reduction method:
  from sklearn.decomposition import PCA
- Create an instance of the class:
  PCAinst = PCA(n_components=3, whiten=True)
  n_components is the final number of dimensions; whiten scales and centers the data.
- Fit the instance on the data and then transform the data:
  X_trans = PCAinst.fit_transform(X_train)
- PCA does not work with sparse matrices.
- A runnable sketch on sample data follows below.
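As a runnable version of the syntax above, here is a sketch using the iris data as a stand-in for X_train (the dataset and the use of StandardScaler are assumptions, not part of the slides):

```python
# PCA syntax from the slide, applied to a small sample dataset.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_train = load_iris().data                     # 150 samples x 4 dense features

# Scale the features first, since variance-based methods are sensitive to axis scale
X_scaled = StandardScaler().fit_transform(X_train)

PCAinst = PCA(n_components=3, whiten=True)     # keep 3 dimensions
X_trans = PCAinst.fit_transform(X_scaled)

print(X_trans.shape)                           # (150, 3)
print(PCAinst.explained_variance_ratio_)       # variance captured by each component
```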
Truncated SVD: The Syntax
- Import the class containing the dimensionality reduction method:
  from sklearn.decomposition import TruncatedSVD
- Create an instance of the class:
  SVD = TruncatedSVD(n_components=3)
- Fit the instance on the data and then transform the data:
  X_trans = SVD.fit_transform(X_sparse)
- TruncatedSVD does not center the data, so it works with sparse matrices; it is often used with text data for Latent Semantic Analysis (LSA).

Moving Beyond Linearity
- The transformations calculated with PCA/SVD are linear.
- Data can have non-linear features, which can cause linear dimensionality reduction to fail.
- [Figure: two concentric classes in the original space remain mixed after projection by PCA.]

Kernel PCA
- Solution: kernels can be used to perform non-linear PCA.
- This is like the kernel trick introduced for SVMs: a non-linear map Φ takes the data from R^2 into a feature space F, where linear PCA is applied.
- [Figure: original space versus the projection obtained by kernel PCA (KPCA).]

Kernel PCA: The Syntax
- Import the class containing the dimensionality reduction method:
  from sklearn.decomposition import KernelPCA
- Create an instance of the class:
  kPCA = KernelPCA(n_components=3, kernel='rbf', gamma=1.0)
- Fit the instance on the data and then transform the data:
  X_trans = kPCA.fit_transform(X_train)
- A sketch on non-linearly structured data follows below.
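To see why the kernel matters, here is a small sketch (not from the slides) comparing linear PCA and RBF-kernel PCA on two concentric circles; the dataset, the gamma value, and the printed summary are illustrative choices:

```python
# Linear PCA vs. kernel PCA on data with non-linear structure.
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: no single linear direction separates them
X_train, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_lin = PCA(n_components=1).fit_transform(X_train)
kPCA = KernelPCA(n_components=1, kernel='rbf', gamma=10.0)
X_trans = kPCA.fit_transform(X_train)

# The first kernel-PCA component tends to separate inner (y=1) from outer (y=0)
# points, while the first linear component does not.
for name, proj in [("linear PCA", X_lin), ("kernel PCA", X_trans)]:
    print(f"{name}: inner mean = {proj[y == 1, 0].mean():.2f}, "
          f"outer mean = {proj[y == 0, 0].mean():.2f}")
```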
MDS: The Syntax
- Import the class containing the dimensionality reduction method:
  from sklearn.manifold import MDS
- Create an instance of the class:
  mdsMod = MDS(n_components=2)
- Fit the instance on the data and then transform the data:
  X_trans = mdsMod.fit_transform(X_train)
- Many other manifold dimensionality reduction methods exist, e.g. Isomap and TSNE.

Example: PCA for Image Compression
- Divide the image into 12 × 12 pixel sections.
- Flatten each section to create a row of data with 144 features.
- Perform PCA on all of the resulting data points.
- Reconstructing each section from only the leading principal components compresses the image: 144 → 60 dimensions, 144 → 16 dimensions, 144 → 4 dimensions, and 144 → 1 dimension.
- [Figure: relative L2 reconstruction error versus the number of PCA dimensions retained (x-axis: PCA dimension, 20–140; y-axis: relative error, 0.2–1.0).]
- A runnable sketch of this procedure follows below.
- Image source: https://commons.wikimedia.org/wiki/File:Monarch_In_May.jpg
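Below is a sketch of the patch-based PCA compression described above. It uses scikit-image's built-in camera photo as a stand-in for the butterfly image (downloading the Wikimedia file is not assumed), and the error formula is one reasonable reading of the slide's relative L2 error:

```python
# Patch-based PCA image compression: 12 x 12 patches -> 144 features -> k components.
import numpy as np
from skimage import data
from sklearn.decomposition import PCA

img = data.camera().astype(float)            # 512 x 512 grayscale stand-in image
patch = 12
h = (img.shape[0] // patch) * patch          # crop so the patch size divides evenly
w = (img.shape[1] // patch) * patch
img = img[:h, :w]

# Divide into 12 x 12 sections and flatten each into a row of 144 features
patches = (img.reshape(h // patch, patch, w // patch, patch)
              .swapaxes(1, 2)
              .reshape(-1, patch * patch))

# Keep k principal components, then project back to 144 dimensions
k = 4
pca = PCA(n_components=k)
reconstructed = pca.inverse_transform(pca.fit_transform(patches))

# Relative L2 reconstruction error, in the spirit of the error plot on the slide
rel_error = np.linalg.norm(patches - reconstructed) / np.linalg.norm(patches)
print(f"{k} components: relative L2 error = {rel_error:.3f}")

# Reassemble the reconstructed patches into a full image for display
img_out = (reconstructed.reshape(h // patch, w // patch, patch, patch)
                        .swapaxes(1, 2)
                        .reshape(h, w))
```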