Principal Component Analysis with Structured Factors
Professor Andrea Montanari
Many modern datasets are naturally presented as data matrices. Examples include recommendation systems (with rows corresponding to products, and columns to customers), hyperspectral imaging (with rows corresponding to pixels, and columns to frequencies), and gene expression data (with rows corresponding to patients, and columns to genes). Principal component analysis aims to reduce the dimensionality of such datasets by projecting samples onto a few directions of maximum variability.
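As a concrete illustration of this projection step, the following sketch computes the top principal directions of a toy data matrix via the SVD of the centered data (a standard construction; the data here is synthetic and purely for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data matrix: rows are samples, columns are features.
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# Center the columns, then take the top-k right singular vectors
# of the centered matrix: these are the directions of maximum variance.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
directions = Vt[:k]             # principal directions (orthonormal rows)
projected = Xc @ directions.T   # samples projected onto those directions
```

The rows of `Vt` are orthonormal, so `projected` gives the low-dimensional coordinates of each sample in the principal subspace.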
The principal vectors are often interpretable and convey important information about the data. This in turn imposes constraints on these vectors: for instance, in some cases the principal directions are known to be non-negative or sparse. Substantial work has been devoted to exploiting these constraints to improve statistical accuracy. Despite this, it is still poorly understood whether and when this is a good idea. I will discuss three examples that illustrate the mathematical richness of this problem, and its far-reaching practical implications. [Based on joint work with Yash Deshpande and Emile Richard.]
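To make the sparsity constraint concrete, here is a minimal sketch of one common way to exploit it: power iteration with hard thresholding, which keeps only the `s` largest-magnitude coordinates at each step. This is a generic simplified approach, not the specific algorithms discussed in the talk:

```python
import numpy as np

def sparse_pc(A, s, iters=100, seed=0):
    """Estimate the leading s-sparse principal direction of A
    by power iteration on A^T A with hard thresholding.
    A simplified illustration, not the method from this work."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=A.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = A.T @ (A @ v)
        # Zero out all but the s largest-magnitude coordinates.
        keep = np.argsort(np.abs(w))[-s:]
        w_sparse = np.zeros_like(w)
        w_sparse[keep] = w[keep]
        v = w_sparse / np.linalg.norm(w_sparse)
    return v

A = np.random.default_rng(1).normal(size=(50, 20))
v = sparse_pc(A, s=5)
```

The returned vector has unit norm and at most `s` nonzero entries; whether enforcing such constraints actually helps statistically is precisely the question the abstract raises.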