Yankai Cao

Assistant Professor

Research Interests

Optimization
Artificial intelligence
Renewable energy systems
Process control

Research Options

I am available and interested in collaborations (e.g. clusters, grants).
I am interested in and conduct interdisciplinary research.
I am interested in working with undergraduate students on research projects.
 
 

Research Methodology

Machine learning
Stochastic optimization

Recruitment

Master's students
Doctoral students
Postdoctoral Fellows
Any time / year round

My research group focuses on the design and implementation of large-scale local and global optimization algorithms to tackle problems that arise in diverse decision-making paradigms such as machine learning, stochastic optimization, and optimal control. Our algorithms combine mathematical techniques and emerging high-performance computing hardware to achieve computational scalability.

The problems we address are of unprecedented complexity and defy state-of-the-art methods. For example, in recent work we developed a novel global optimization algorithm capable of solving k-center clustering problems (an unsupervised learning task) with up to 1 billion samples, whereas state-of-the-art approaches in the literature can only handle a few thousand samples.
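To make the k-center objective concrete, the minimal sketch below computes the quantity such a global optimization algorithm certifies: the largest distance from any sample to its nearest cluster center. The function and toy data are illustrative placeholders, not the group's code.

    import numpy as np

    def k_center_cost(X, centers):
        # k-center objective: the largest distance from any sample to its
        # nearest center; global optimization seeks centers that provably
        # minimize this worst-case distance
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        return dists.min(axis=1).max()

    # toy usage with random data (nothing like the billion-sample setting above)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))
    centers = X[rng.choice(len(X), size=5, replace=False)]
    print(k_center_cost(X, centers))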

We are currently using our tools to address engineering and scientific questions that arise in diverse application domains, including optimal decision trees, optimal clustering, deep-learning-based control, optimal power system planning, AI for bioprocess operation, and optimal design of zero energy buildings.

 

Selected publications

K. Hua, J. Ren, and Y. Cao. “A Scalable Deterministic Global Optimization Algorithm for Training Optimal Decision Tree on Large Datasets.” Conference on Neural Information Processing Systems (NeurIPS), 2022.

Y. Li, K. Hua, and Y. Cao. “Using stochastic programming to train neural network approximation of nonlinear MPC laws.” Automatica, 146, 110665, 2022. https://doi.org/10.1016/j.automatica.2022.110665

M. Shi, K. Hua, J. Ren, and Y. Cao. “Global Optimization of K-Center Clustering.” International Conference on Machine Learning (ICML), 2022. https://proceedings.mlr.press/v162/shi22b.html

K. Hua, M. Shi, and Y. Cao. “A Scalable Deterministic Global Optimization Algorithm for Clustering Problems.” International Conference on Machine Learning (ICML), 2021. http://proceedings.mlr.press/v139/hua21a.html

M. Mehrtash and Y. Cao. “A New Global Solver for Transmission Expansion Planning with AC Network Model.” IEEE Transactions on Power Systems, 37(1), 282–293, 2021. https://doi.org/10.1109/TPWRS.2021.3086085

I support public scholarship, e.g. through the Public Scholars Initiative, and am available to supervise students and Postdocs interested in collaborating with external partners as part of their research.
I support experiential learning experiences, such as internships and work placements, for my graduate students and Postdocs.
I am open to hosting Visiting International Research Students (non-degree, up to 12 months).
I am interested in hiring Co-op students for research placements.

Complete these steps before you reach out to a faculty member!

Check requirements
  • Familiarize yourself with program requirements. You want to learn as much as possible from the information available to you before you reach out to a faculty member. Be sure to visit the graduate degree program listing and program-specific websites.
  • Check whether the program requires you to seek commitment from a supervisor prior to submitting an application. For some programs this is an essential step while others match successful applicants with faculty members within the first year of study. This is either indicated in the program profile under "Admission Information & Requirements" - "Prepare Application" - "Supervision" or on the program website.
Focus your search
  • Identify specific faculty members who are conducting research in your specific area of interest.
  • Establish that your research interests align with the faculty member’s research interests.
    • Read up on the faculty members in the program and the research being conducted in the department.
    • Familiarize yourself with their work, read their recent publications and past theses/dissertations that they supervised. Be certain that their research is indeed what you are hoping to study.
Make a good impression
  • Compose an error-free and grammatically correct email addressed to your specifically targeted faculty member, and remember to use their correct titles.
    • Do not send non-specific, mass emails to everyone in the department hoping for a match.
    • Address the faculty members by name. Your contact should be genuine rather than generic.
  • Include a brief outline of your academic background, why you are interested in working with the faculty member, and what experience you could bring to the department. The supervision enquiry form guides you with targeted questions; be sure to craft compelling answers to them.
  • Highlight your achievements and why you are a top student. Faculty members receive dozens of requests from prospective students and you may have less than 30 seconds to pique someone’s interest.
  • Demonstrate that you are familiar with their research:
    • Convey the specific ways you are a good fit for the program.
    • Convey the specific ways the program/lab/faculty member is a good fit for the research you are interested in/already conducting.
  • Be enthusiastic, but don’t overdo it.
Attend an information session

G+PS regularly offers virtual sessions that focus on admission requirements and procedures, as well as tips on how to improve your application.

 


Graduate Student Supervision

Doctoral Student Supervision

Dissertations completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest dissertations.

Data-driven degradation modeling of lithium-ion batteries (2024)

Ensuring the safe and reliable usage of lithium-ion batteries (LIBs) necessitates accurate degradation modeling. While data-driven methods offer promising prospects for modeling battery degradation, their intricate structures often make them specific to particular datasets. Furthermore, the black-box nature of data-driven models complicates the understanding of their decision-making process. In this thesis, we delve into data-driven modeling of battery degradation, focusing on capacity estimation and cycle life prediction, to address the challenges of generalizability and interpretability. To improve the generalizability of the model, we first propose adopting a simple and robust machine learning model, partial least squares regression (PLSR), for joint battery capacity estimation and remaining useful life (RUL) prediction. Experimental results on three battery cells cycled at varied conditions demonstrate superior generalizability of the proposed model over complex and sophisticated methods. Another approach we propose to improve generalizability is transfer learning, which handles the significant diversity across battery types well because it transfers the knowledge contained in well-studied batteries to a new battery. The key idea is to train a model on one type of battery with sufficient data; the model can then be applied to a new type of battery by fine-tuning some parameters with limited data. Experimental results confirm that transfer learning can effectively enhance the generalizability of data-driven models in capacity estimation and cycle life prediction across different battery types. To build interpretable models, we advocate the use of decision trees for capacity estimation. We start with a classic regression tree with parallel splits, but it requires a tree depth of 11 to achieve satisfactory performance. To address this challenge, we adopt optimal regression trees with hyperplane splits and propose a novel algorithm, DE-LR-ORTH, to train such a tree. DE-LR-ORTH initially conducts a one-step optimal hyperplane split at each branch node via differential evolution, followed by logistic regression-based fine-tuning to achieve overall optimality. Additionally, a GPU-accelerated implementation is proposed to significantly reduce the training time. Experimental results reveal a 1.0% capacity estimation error at depth 6 while maintaining high interpretability.
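As a rough illustration of the PLSR idea described above, the sketch below fits a partial least squares model to placeholder cycle-level features with two targets standing in for capacity and RUL. It is a minimal example using scikit-learn and synthetic data, not the thesis code or datasets.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import train_test_split

    # placeholder data: rows are charge/discharge cycles, columns are cycle-level
    # health indicators; the two targets stand in for capacity and RUL
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))
    y = np.column_stack([X @ rng.normal(size=8), X @ rng.normal(size=8)])
    y += 0.1 * rng.normal(size=y.shape)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # PLSR projects the inputs onto a few latent components and regresses both
    # targets jointly, keeping the model simple and robust
    pls = PLSRegression(n_components=4).fit(X_tr, y_tr)
    print("held-out R^2:", pls.score(X_te, y_te))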


Interpretable and stable soft sensor modeling for industrial applications (2024)

Soft sensor technology is an effective way to estimate parameters that are hard to measure in real time. It is of significant importance for the monitoring, control, and optimization of industrial production processes. With the richness of process data and the rapid development of machine learning techniques, data-driven soft sensor technologies are increasingly favored. Although soft sensor models have great potential and value in industrial applications, they still face significant challenges, particularly in the areas of model interpretability and stability. Ensuring interpretability and stability is crucial because it directly impacts the reliability and safety of operations in hazardous industrial environments. This dissertation provides a detailed exploration of soft sensor technologies, focusing on enhancing their interpretability and stability for industrial process monitoring. Chapters 2 and 3 focus on improving the interpretability of soft sensors. Chapter 2 introduces the Extra Trees (ET) algorithm and employs SHapley Additive exPlanations (SHAP) to enhance the interpretability of this inherently accurate but complex model. Chapter 3 explores interpretable feature selection techniques, particularly emphasizing the role of SHAP in the selection of meaningful features from complex industrial data. Subsequently, we utilize the selected interpretable features to establish a simple soft sensor model. In Chapter 4, the main topic shifts to the stability of the soft sensor model; we propose a stable learning algorithm based on the generation of virtual samples to improve stability in the face of industrial disturbances and data scarcity. Chapter 5 delves into the role of causality in soft sensor modeling, demonstrating how mining causal relationships between variables can significantly improve both stability and interpretability. It also emphasizes the importance of incorporating process knowledge to ensure precision in the discovery of causal relationships. Chapter 6 presents two methods for extracting unsupervised and supervised latent causal features. By extracting latent causal features, not only is interpretability retained, but our model also becomes more robust. Finally, we analyze the main contributions and consider how they can be utilized in industrial contexts to improve the efficiency, safety, reliability, and interpretability of soft sensors.
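The sketch below illustrates the Chapter 2 combination of an Extra Trees model with SHAP attributions on placeholder process data. It is a minimal example using scikit-learn and the shap package, not the dissertation's models or datasets.

    import numpy as np
    import shap
    from sklearn.ensemble import ExtraTreesRegressor

    # placeholder process data: easy-to-measure process variables predicting a
    # hard-to-measure quality variable (the soft sensor target)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 6))
    y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=2000)

    model = ExtraTreesRegressor(n_estimators=200, random_state=0).fit(X, y)

    # SHAP attributes each prediction to the input variables, recovering some
    # interpretability from the otherwise black-box tree ensemble
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X[:100])
    print(shap_values.shape)  # (100, 6): per-sample, per-feature contributions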


Master's Student Supervision

Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.

Deep learning-based approximation of model predictive control using mixture networks (2023)

Model Predictive Control (MPC) is an optimization-based control scheme exploited in various industrial processes. It determines optimal control inputs that achieve the desired outcome by predicting future behavior based on models while satisfying system constraint sets. The consideration of complex system dynamics and multiple constraints enables the control of nonlinear processes with complicated behavior. Furthermore, because of its extensive applicability, MPC has been applied to the design of supply chain management, especially to scheduling problems that are formulated as mixed-integer linear programming (MILP) problems. However, the online implementation of MPC is challenging, especially for large-scale systems, due to the prohibitive computation cost. In recent years, the approximation method of MPC control laws using deep neural networks (DNNs) has been studied to address this issue. Nevertheless, it struggles to provide accurate approximation when multiple optimal control inputs exist for each system state. In this case, the MPC control laws follow one-to-many mappings, which DNNs cannot correctly approximate as they can only provide one-to-one mappings. Therefore, we propose mixture network-based approximation methods. Mixture networks, with components of probability (density) distributions in the output layer, can approximate the MPC control laws through a combination of conditional probabilities provided by mixing several estimated probability distributions. This approach then generates multiple control inputs with the highest probabilities. Notably, the proposed method can be applied to various problems by selecting an appropriate probability distribution, such as using a Gaussian distribution for nonlinear problems and a Bernoulli distribution for MILP problems. In this thesis, we investigate two case studies: a benchmark problem for nonlinear problems and a scheduling problem in the steel-making process for MILP problems. The simulation results demonstrate that the mixture network-based approximation method outperforms the DNN-based approximation method.
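A minimal sketch of the Gaussian-mixture variant described above: a small PyTorch network maps a system state to mixture weights, means, and spreads over a single control input, and is trained by maximizing the likelihood of (state, optimal control) pairs. The architecture, dimensions, and data below are placeholder assumptions, not the thesis implementation.

    import torch
    import torch.nn as nn

    class GaussianMixtureNet(nn.Module):
        # maps a system state to a K-component Gaussian mixture over the control
        # input, so states with several optimal controls get multi-modal outputs
        def __init__(self, state_dim=2, n_components=3, hidden=64):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden), nn.ReLU())
            self.weight_head = nn.Linear(hidden, n_components)   # mixing weights
            self.mean_head = nn.Linear(hidden, n_components)     # component means
            self.log_std_head = nn.Linear(hidden, n_components)  # component spreads

        def forward(self, x):
            h = self.body(x)
            return (torch.log_softmax(self.weight_head(h), dim=-1),
                    self.mean_head(h), self.log_std_head(h))

    def mixture_nll(log_w, mu, log_std, u):
        # negative log-likelihood of the recorded MPC controls u under the mixture
        comp = torch.distributions.Normal(mu, log_std.exp())
        log_prob = comp.log_prob(u.unsqueeze(-1)) + log_w
        return -torch.logsumexp(log_prob, dim=-1).mean()

    # placeholder training data: (state, optimal control) pairs such as would be
    # generated offline by an MPC solver; random numbers here just to show shapes
    x, u = torch.randn(256, 2), torch.randn(256)
    net = GaussianMixtureNet()
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(200):
        opt.zero_grad()
        loss = mixture_nll(*net(x), u)
        loss.backward()
        opt.step()

For the MILP scheduling case, the abstract indicates the Gaussian components would be replaced by Bernoulli distributions over the discrete decisions.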


Global optimization of clustering problems (2022)

Clustering is a fundamental unsupervised machine learning task that aims to aggregate similar data into one cluster and separate dissimilar data into different clusters. Cluster analysis can always be formulated as an optimization problem, and different objective functions lead to different clustering problems. In this thesis, we concentrate on the k-means and k-center problems, each of which can be formulated as a mixed-integer nonlinear programming problem. The work on k-means clustering optimization was published at ICML 2021 [30], and the work on global optimization of k-center clustering was submitted to ICML 2022 and accepted in Phase 1 of review. This thesis provides a practical global optimization algorithm for these two tasks based on a reduced-space spatial branch and bound (BB) scheme. The algorithm guarantees convergence to the global optimum by branching only on the centers of clusters, so the branching space is independent of the dataset's cardinality. We also design several methods to construct lower and upper bounds at each node in the BB scheme. In addition, for the k-center problem, a set of feasibility-based bounds tightening techniques is proposed to narrow down the domain of the centers and significantly accelerate convergence. To demonstrate the capacity of this algorithm, we present computational results on UCI datasets and compare our proposed algorithms with off-the-shelf global optimization solvers and classical local optimization algorithms. For k-means clustering, the numerical experiments demonstrate our algorithm's ability to handle datasets with up to 200,000 samples. For k-center clustering, a serial implementation of the algorithm attains the global optimum to an optimality gap of 0.1% within 2 hours on a dataset with 14 million samples and 3 features.
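To make the reduced-space idea concrete, the sketch below shows the kind of node bounds such a branch-and-bound scheme can use for k-center: each node restricts every center to a box, sending each sample to its nearest box gives a valid lower bound, and the box midpoints give a feasible upper bound. Branching splits a center box rather than per-sample assignments, so the branching space does not grow with the number of samples. This is an illustrative toy, not the thesis implementation, which adds bounds tightening and other refinements.

    import numpy as np

    def dist_to_box(X, lo, hi):
        # Euclidean distance from each sample to the nearest point of the box [lo, hi]
        return np.linalg.norm(np.maximum(np.maximum(lo - X, X - hi), 0.0), axis=1)

    def node_bounds(X, boxes):
        # One branch-and-bound node constrains each cluster center to a box.
        # Lower bound: no centers placed inside the boxes can beat sending each
        # sample to its nearest box. Upper bound: the box midpoints are one
        # feasible choice of centers, so their k-center cost is achievable.
        lower = np.min([dist_to_box(X, lo, hi) for lo, hi in boxes], axis=0).max()
        mids = np.array([(lo + hi) / 2.0 for lo, hi in boxes])
        upper = np.linalg.norm(X[:, None, :] - mids[None], axis=2).min(axis=1).max()
        return lower, upper

    # toy data: two well-separated clusters (nothing like the scales reported above)
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([0, 0], 0.2, (100, 2)), rng.normal([3, 3], 0.2, (100, 2))])

    # root node: both center boxes span the whole data range, so the lower bound is 0
    root = [(X.min(axis=0), X.max(axis=0)), (X.min(axis=0), X.max(axis=0))]
    print(node_bounds(X, root))

    # a node deep in the tree whose boxes have shrunk onto the two cluster centers:
    # the two bounds are close, and shrinking the boxes further drives them together;
    # nodes whose boxes sit away from the data get lower bounds above the incumbent
    # and are pruned
    deep = [(np.array([-0.05, -0.05]), np.array([0.05, 0.05])),
            (np.array([2.95, 2.95]), np.array([3.05, 3.05]))]
    print(node_bounds(X, deep))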


 
 
