# How to reduce dimensionality using PCA in Python?

This recipe helps you reduce dimensionality using PCA in Python.

In many datasets the number of features is very large, which makes training a model computationally expensive. To reduce the number of features we can use Principal Component Analysis (PCA). PCA reduces dimensionality by projecting the data onto a smaller set of new features (principal components) that retain most of the variance of the original data.

So this recipe is a short example of how to reduce dimensionality using PCA in Python.

```
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
```

Here we have imported the modules we need: PCA, datasets and StandardScaler, each from a different part of scikit-learn. We will see how each one is used in the code snippets below.

For now just have a look at these imports.

Here we have used datasets to load the built-in digits dataset.
```
digits = datasets.load_digits()
```
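
As a quick sanity check (our addition, not part of the original recipe), you can print the shape of the data: the digits dataset has 1797 samples, each with 64 features corresponding to 8x8 pixel intensities.

```
# Quick look at the raw data: 1797 samples x 64 features (8x8 pixel images)
print(digits.data.shape)
print(digits.target[:10])  # labels for the first ten digits
```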

StandardScaler is used to scale the data so that each feature has a mean of 0 and a standard deviation of 1, putting all features on a comparable scale before PCA. (Note that it rescales the features; it does not remove outliers.)
```
X = StandardScaler().fit_transform(digits.data)
print(X)
```
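
To confirm the scaling worked, a minimal check (again our addition) is to look at the per-feature mean and standard deviation:

```
import numpy as np

# After standardization each column should have mean ~0 and standard
# deviation ~1 (constant pixel columns keep a standard deviation of 0).
print(np.round(X.mean(axis=0), 2))
print(np.round(X.std(axis=0), 2))
```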

Next we apply Principal Component Analysis (PCA), which reduces the dimensionality by creating new features that capture most of the variance of the original data. Passing n_components=0.85 tells scikit-learn to keep as many components as are needed to explain 85% of the variance (a fraction of the variance, not a percentage of the features). Setting whiten=True additionally scales each component to unit variance. We have also printed the number of features in the original and reduced datasets.
```
pca = PCA(n_components=0.85, whiten=True)
X_pca = pca.fit_transform(X)
print(X_pca)
print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_pca.shape[1])
```
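
To verify that the retained components really cover 85% of the variance, you can inspect the fitted pca object (a small check we have added for illustration):

```
# Fraction of variance explained by each retained component; the
# cumulative sum should reach at least 0.85 by the last component.
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.cumsum())
```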

For comparison we apply PCA again, this time passing n_components=2, which keeps exactly two principal components (an integer value is a component count, not a variance fraction). Again we print the number of features in the original and reduced datasets.
```
pca = PCA(n_components=2, whiten=True)
X_pca = pca.fit_transform(X)
print(X_pca)
print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_pca.shape[1])
```
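
With only two components the data is easy to plot. As an optional extra step (assuming matplotlib is installed; this is not part of the original recipe), a scatter plot colored by digit label shows how the classes spread out in the reduced space:

```
import matplotlib.pyplot as plt

# Scatter plot of the two principal components, colored by digit label
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=digits.target, cmap="tab10", s=10)
plt.xlabel("First principal component")
plt.ylabel("Second principal component")
plt.colorbar(label="digit")
plt.show()
```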

As an output we get:

```
[[ 0.         -0.33501649 -0.04308102 ... -1.14664746 -0.5056698  -0.19600752]
 [ 0.         -0.33501649 -1.09493684 ...  0.54856067 -0.5056698  -0.19600752]
 [ 0.         -0.33501649 -1.09493684 ...  1.56568555  1.6951369  -0.19600752]
 ...
 [ 0.         -0.33501649 -0.88456568 ... -0.12952258 -0.5056698  -0.19600752]
 [ 0.         -0.33501649 -0.67419451 ...  0.8876023  -0.5056698  -0.19600752]
 [ 0.         -0.33501649  1.00877481 ...  0.8876023  -0.26113572 -0.19600752]]

[[ 0.70631939 -0.39512814 -1.73816236 ...  0.60320435 -0.94455291 -0.60204272]
 [ 0.21732591  0.38276482  1.72878893 ... -0.56722002  0.61131544  1.02457999]
 [ 0.4804351  -0.13130437  1.33172761 ... -1.51284419 -0.48470912 -0.52826811]
 ...
 [ 0.37732433 -0.0612296   1.0879821  ...  0.04925597  0.29271531 -0.33891255]
 [ 0.39705007 -0.15768102 -1.08160094 ...  1.31785641  0.38883981 -1.21854835]
 [-0.46407544 -0.92213976  0.12493334 ... -1.27242756 -0.34190284 -1.17852306]]
Original number of features: 64
Reduced number of features: 25

[[ 0.70634542 -0.39504744]
 [ 0.21730901  0.38270788]
 [ 0.48044955 -0.13126596]
 ...
 [ 0.37733004 -0.06120936]
 [ 0.39703595 -0.15774013]
 [-0.46406594 -0.92210953]]
Original number of features: 64
Reduced number of features: 2
```

