Dimensionality Reduction

alnavar8
Jul 4, 2020

In the era of technology, data is being produced at every instant. This data is crucial for building intelligent systems and is what data scientists work with. For machine learning, however, very large datasets are difficult to handle. When a dataset has a large number of dimensions, or features, a model's accuracy can drop and the model becomes more prone to overfitting. This creates the need for dimensionality reduction. Dimensions can be reduced in two ways: feature elimination and feature extraction.

In feature elimination, some features are dropped and only those considered important are kept. The drawback is that whatever information the dropped features carried is lost, which can reduce the accuracy of prediction. In feature extraction, by contrast, new variables are created as combinations of the original variables, typically constructed so that they are uncorrelated with each other, and only the most informative of these new variables are kept. This method reduces the dimensionality while retaining most of the non-trivial information in the data. PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis) are methods of feature extraction.
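To make feature elimination concrete, here is a minimal sketch that drops columns whose variance falls below a threshold. Using variance as the measure of "importance" is an assumption made for illustration, and it depends on how the features are scaled. The function name `drop_low_variance` and the toy data are hypothetical.

```python
from statistics import pvariance

def drop_low_variance(rows, threshold):
    """Feature elimination sketch: keep columns whose variance exceeds threshold."""
    columns = list(zip(*rows))  # transpose rows into columns
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[i] for i in keep] for row in rows]
    return reduced, keep

# Toy data: column 1 is constant and column 2 barely varies.
rows = [
    [1.0, 5.0, 0.10],
    [2.0, 5.0, 0.20],
    [3.0, 5.0, 0.10],
    [4.0, 5.0, 0.20],
]

reduced, kept = drop_low_variance(rows, threshold=0.01)
print(kept)     # indices of the surviving columns
print(reduced)  # the data with the other columns removed
```

Whatever the dropped columns contained is gone after this step, which is the trade-off described above.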

PCA applies a linear transformation that projects the data into a lower-dimensional space while preserving as much of its variance as possible. PCA does not use class labels, so this variance reflects the overall spread of the data and may or may not line up with the differences among the classes. The resulting features are called principal components, and the component with the largest eigenvalue carries the most information. Components with smaller eigenvalues carry less information and can be discarded. PCA is used in fields such as bioinformatics, psychology, data mining and finance, where the data has many dimensions. It is seen in facial recognition, image compression and computer vision. PCA itself is unsupervised and is often used as a preprocessing step before classification.
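The eigenvalue view of PCA can be sketched in a few lines of NumPy. This is an illustrative version under simple assumptions (centred data, covariance eigendecomposition), not a production implementation. The function name `pca` and the synthetic data are made up for the example.

```python
import numpy as np

def pca(X, n_components):
    """PCA sketch via eigendecomposition of the covariance matrix."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]
    projected = X_centered @ components
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return projected, components, explained_ratio

# Synthetic data: three features, but almost all variation lies along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2 * t + 0.1 * rng.normal(size=200),
    0.1 * rng.normal(size=200),
])

projected, components, ratio = pca(X, n_components=1)
print(projected.shape)  # one column per kept component
print(ratio)            # share of total variance kept by each component
```

Here a single principal component keeps nearly all of the variance, so the two components with small eigenvalues can be discarded with little loss.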

Linear discriminant analysis is another technique for dimensionality reduction. It aims to reduce the variance within each class and increase the distance between the means of the classes. LDA is mainly used for supervised classification problems, where the dataset is labelled. LDA can be used in the field of computer science, just like PCA. It aims to project the data so that the classes overlap as little as possible.
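For the two-class case, the idea of shrinking within-class variance while pushing the class means apart can be sketched with Fisher's linear discriminant. This is a minimal illustration that assumes exactly two classes labelled 0 and 1. The function name `fisher_lda_direction` and the synthetic data are hypothetical.

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Two-class Fisher LDA sketch: direction that best separates the class means
    relative to the within-class scatter."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: summed spread of each class around its own mean.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

# Synthetic data: both classes spread widely along feature 0,
# but their means differ only along feature 1.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0], scale=[3.0, 0.5], size=(100, 2))
X1 = rng.normal(loc=[0.0, 3.0], scale=[3.0, 0.5], size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

w = fisher_lda_direction(X, y)
z = X @ w  # the data reduced from two dimensions to one
print(w)
print(z[y == 0].mean(), z[y == 1].mean())
```

In this toy setup the direction of largest variance is feature 0, which says nothing about the classes, while the LDA direction points almost entirely along feature 1, where the classes actually differ. That is the practical difference between using labels (LDA) and ignoring them (PCA).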

Learnt something new? Then hit the heart! How else would I know you like my content? Don't forget to comment below and discuss how data can become the biggest asset of a company. Also, subscribe to THE AI STUDIO for more AI-related content.
