Machine Learning for Dimension Reduction in High‐Dimensional Datasets


    Cardiff University
    United Kingdom
    Formal sciences



In today’s environment, where computer processors are powerful and computer memory is cheap, researchers can collect and store huge amounts of data. Analysing these data requires sophisticated statistical and computational methods, as most classical statistical methodology was developed in an era when data collection was harder and datasets were many orders of magnitude smaller. Sufficient dimension reduction (SDR) is a class of methods for feature extraction in regression and classification problems whose purpose is to reduce a multidimensional dataset to a few important features.

Reducing the data in this way can improve visualization of the most important relationships between the variables. This project will focus on improving existing methodology to obtain more accurate and computationally faster estimation algorithms for SDR. One of the most interesting suggestions in the literature uses machine learning algorithms, specifically Support Vector Machines (SVM). The method, although powerful, can be improved in several ways, so there are a number of directions a student can take on this project. A few examples are: deriving new SDR methodology that is robust to outliers; deriving sparse SDR methodology; deriving SDR methodology for missing predictors; and deriving SDR methodology for functional data, among many others.
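To make the SDR idea concrete, here is a minimal sketch of sliced inverse regression (SIR), a classic and simple SDR estimator. It is not the SVM-based method mentioned above; it is swapped in purely for illustration. The function name, its parameters, and the use of NumPy are assumptions and not part of the project description.

```python
import numpy as np


def sliced_inverse_regression(X, y, n_slices=10, n_directions=1):
    """Illustrative SIR estimate of SDR directions (hypothetical helper).

    Returns a (p, n_directions) matrix whose columns are unit-length
    estimates of directions spanning the reduced feature space.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardise the predictors: Z = (X - mean) @ Sigma^{-1/2}.
    # Assumes the sample covariance is positive definite (n > p, no
    # collinear columns).
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ inv_sqrt

    # Slice the observations by the sorted response into groups of
    # roughly equal size.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # Weighted covariance of the within-slice means of Z.
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Leading eigenvectors of M (eigh sorts eigenvalues ascending),
    # mapped back to the original predictor scale and normalised.
    _, vecs = np.linalg.eigh(M)
    top = vecs[:, ::-1][:, :n_directions]
    B = inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)
```

Given an n-by-p predictor matrix `X` and a response `y`, `B = sliced_inverse_regression(X, y, n_directions=2)` would return a p-by-2 matrix, and `X @ B` gives the extracted low-dimensional features. SIR is known to miss some structure (for example, symmetric dependence on a direction), which is one reason alternative estimators are of interest.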

Moreover, there are many modern applications (such as text data analysis) where the data are very high-dimensional and do not come from a Gaussian distribution. For such cases, the literature offers few computationally effective methods for efficient dimension reduction. We aim to develop both supervised and unsupervised dimension reduction methods (such as non-Gaussian PCA and non-Gaussian CCA) that are computationally efficient and accurate, especially in the nonlinear feature extraction setting. Interested students can pursue a number of directions: sparse methodology, real-time algorithms, or applications to real datasets.

What is funded

Self-funded students only.

How to Apply

Applicants should submit an application for postgraduate study via the online application service: http://www.cardiff.ac.uk/study/postgraduate/research/programmes/programm...

In the research proposal section of your application, please specify the project title and supervisors of this project.

We are interested in pursuing this project and welcome applications if you are self-funded or have funding from other sources, including government sponsorships or your employer.

If you are applying for more than one Cardiff University project, please note this in the research proposal section.

