
Statistician Yiyuan She To Build a Data Science Innovation Hub at Westlake University

2025-07-21 15:20:18

When Prof. Yiyuan She, an internationally renowned statistician, recently joined Westlake University as a chair professor in the School of Science and the Institute for Theoretical Sciences, the odds were pretty good that his office would quickly become a popular hub for faculty discussion and collaboration.

That's because statistics, as the core foundation of data science, is an indispensable pillar for fields like machine learning and artificial intelligence, serving a wide range of disciplines including the natural sciences, engineering and social sciences.

In fact, statistics is everywhere. It influences not only how we interpret and process data but also profoundly shapes our scientific philosophies and beliefs.

She's research spans high-dimensional statistics, machine learning, optimization techniques, big data analysis and robust statistics. His work, integrating theory, computation and application, lies at the intersection of statistics, mathematics and computer science.

She's interdisciplinary academic journey saw him study mathematics and computer science at Peking University and then pursue a doctorate in statistics at Stanford University in the United States, one of the top institutions in the field. After earning his Ph.D. in 2008, he joined the Department of Statistics at Florida State University, where he became a full professor in 2018.

He is a fellow of the American Statistical Association, a fellow of the Institute of Mathematical Statistics, and an elected member of the International Statistical Institute. In 2014, he received the U.S. National Science Foundation's CAREER Award, given to junior faculty members who have the potential to become academic role models in research and education.

The first statistician recruited by Westlake, She will help to establish the university as a leading research and talent hub in data science and its interdisciplinary applications.

Statistics infers general patterns from data and provides a rigorous theoretical foundation for real-world decision-making.

In She's words, it is "a science centered on data, focused on understanding and dealing with uncertainty".

Why focus on uncertainty? It arises from measurement error, individual differences, sampling bias and the inherent complexity and incompleteness of real-world models. Through systematic analysis and modeling, statistics helps us understand and quantify these uncertainties, improving the reliability of scientific inferences and decision-making.

One of the fields in which statistics plays a key role is AI, where ideal models must not only perform well on given data but also exhibit statistical validity: the ability to generalize to broader, unknown scenarios.

In statistics and machine learning, this is called generalization. Without it, models often suffer from overfitting, losing their ability to adapt beyond the training data.

Today's complex models may have trillions of parameters, far exceeding human cognitive capacity. One of She's major focuses, high-dimensional statistics, tackles the challenge of making inferences when the number of variables far exceeds the number of samples, a setting closely tied to the so-called "curse of dimensionality".

He says that "a core challenge of modern statistics is to recover the underlying structure of high-dimensional data from limited and imperfect samples".

Recent statistical research shows that even the most complex data often has hidden patterns, making accurate prediction and analysis possible.

She's research integrates non-asymptotic theory, which is a way of getting useful results from limited amounts of data, with efficient optimization algorithms and advanced regularization techniques (such as variable selection, projection and clustering), aiming to extract deep relationships from data and make seemingly chaotic high-dimensional information clear and interpretable.

His work in high-dimensional statistics, low-rank modeling, robust inference, and non-convex/non-smooth optimization provides novel methods for machine learning to discover patterns in complex data and offers powerful tools and fresh insights for fields such as biomedicine and economics.

Real-world data is often messy: outliers, labeling errors and high-leverage points are common. These anomalies can be highly disruptive. Even a single extreme outlier can render traditional estimation and inference methods completely ineffective.
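To make the point about a single outlier concrete, here is a minimal sketch using only Python's standard library. The numbers are invented for illustration and do not come from She's work. It compares the sample mean, a traditional estimator, with the median, a classical robust alternative, before and after one extreme value is added.

```python
import statistics

# Five well-behaved measurements clustered around 10 (made-up values).
clean = [9.8, 10.1, 10.0, 9.9, 10.2]

# The same data with one extreme outlier appended.
contaminated = clean + [1000.0]

# The sample mean is dragged far away from the bulk of the data.
mean_clean = statistics.mean(clean)
mean_contaminated = statistics.mean(contaminated)

# The median barely moves, because it depends on ranks, not magnitudes.
median_clean = statistics.median(clean)
median_contaminated = statistics.median(contaminated)

print(f"mean:   {mean_clean:.2f} -> {mean_contaminated:.2f}")
print(f"median: {median_clean:.2f} -> {median_contaminated:.2f}")
```

In this toy case one bad value moves the mean from about 10 to 175, while the median shifts only from 10.0 to 10.05.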

But in modern big data applications, manually detecting outliers is nearly impossible. Moreover, parameter estimation, outlier detection, and statistical inference are deeply intertwined, particularly in the supervised learning scenarios seen in the development of AI large language models.

So how do we find the normal within the abnormal? According to She, the rise of data science presents new opportunities and challenges for robust statistics, which is a way of dealing with extreme data points.

Traditional robust estimation methods often treat anomalies as "noise" and focus on suppressing their impact. But in reality, outliers may carry crucial information. Just as in criminal investigations, seemingly anomalous clues can often lead to breakthroughs.

That makes it crucial to quantify the anomaly risk of each data point at the same time as modeling, estimation and inference are carried out. To address this challenge, She bridged robust loss functions and high-dimensional statistical regularization, unifying anomaly detection and parameter estimation within a single framework by integrating sparse constraints and non-convex optimization techniques. His methods, backed by finite-sample theoretical guarantees, offer efficient algorithms for big-data analysis.
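The idea of estimating parameters and flagging outliers in one step can be illustrated with a toy example. The sketch below is a simplification written for this article and should not be read as She's actual method. It assumes a one-parameter "mean-shift" model in which each observation equals a common center `mu` plus an individual shift `gamma_i` that is zero for normal points. The function name `mean_shift_location`, the threshold `lam` and the data are all hypothetical.

```python
import statistics


def hard_threshold(value, lam):
    """Keep a shift only if it is larger than lam in absolute value."""
    return value if abs(value) > lam else 0.0


def mean_shift_location(y, lam, n_iter=50):
    """Alternately update per-point shifts (sparse) and the common center."""
    # Start from the median: a non-robust start such as the mean can make
    # every point look anomalous and leave the iteration stuck there.
    mu = statistics.median(y)
    gamma = [0.0] * len(y)
    for _ in range(n_iter):
        # Sparse step: only large residuals are explained by an outlier shift.
        gamma = [hard_threshold(yi - mu, lam) for yi in y]
        # Estimation step: average the data after removing the shifts.
        mu = statistics.fmean([yi - gi for yi, gi in zip(y, gamma)])
    return mu, gamma


# Made-up data: five ordinary points and one extreme value.
y = [9.8, 10.1, 10.0, 9.9, 10.2, 1000.0]
mu_hat, gamma_hat = mean_shift_location(y, lam=3.0)

# Points with a nonzero shift are the ones flagged as outliers.
flagged = [i for i, g in enumerate(gamma_hat) if g != 0.0]
print(f"estimated center: {mu_hat:.3f}, flagged indices: {flagged}")
```

The hard-thresholding step corresponds to a non-convex sparsity penalty on the shifts, so a single loop yields both an estimate of the center and a list of suspicious points. The result depends on the choice of `lam` and on the starting value, which is one reason theoretical guarantees matter for methods of this kind.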

To statisticians, the world is inherently uncertain, and so is the data we observe, which is full of randomness and variability.

So, does data shape our beliefs, or do our beliefs shape how we interpret data? The astounding capabilities of some complex models today are largely data-driven, but more research is needed to truly understand these mechanisms.

That is where She is leading his students: "To explore the hidden, grasp the essential, understand the ordinary, and adapt to changes."