Modern data analysis methods are expected to handle massive amounts of high-dimensional data that are being collected in a variety of domains. The high dimensionality of such data introduces numerous challenges, typically referred to as the “curse of dimensionality”, which render traditional statistical learning approaches impractical or ineffective for their analysis. To cope with these challenges, significant effort has been focused on developing geometric data analysis approaches that model and capture the intrinsic geometry of processed data, rather than directly modeling their distribution. In this course we will explore such approaches and provide an analytical study of the models and algorithms they use. We will start by considering supervised learning and distinguish classifiers that are based on geometric principles from statistical learning approaches, such as Bayesian classification. Next, we will consider the unsupervised learning task of clustering data and contrast density-based clustering with partitional and hierarchical clustering methods that rely on metric spaces or graph constructions. Finally, we will consider more fundamental tasks in intrinsic representation learning, with particular focus on dimensionality reduction and manifold learning methods, such as Isomap, Diffusion Maps, LLE, and t-SNE. Time permitting, the course will also include guest talks discussing recent developments in related research areas.
The course will be suitable for CS, statistics, and applied math students interested in data science and machine learning.