Leeds Statistics & Probability PGR Seminar Term 1 2023-24

This is Term 1 of 2023-24, so all dates below fall in 2023.

Copyright disclaimer: The titles and abstracts of the talks are the property of the speakers and their collaborators.

18th October

Euler-Maruyama scheme for SDEs with distributional drifts: linear and McKean-Vlasov

Luis Mario Chaparro Jáquez

We present an Euler scheme developed to solve SDEs with distributional drifts belonging to a Hölder-Zygmund space and unit diffusion driven by a standard Brownian motion. In particular, we discuss empirical and theoretical results for the linear SDE case and our progress on an alternative approach to the numerical solution of one type of McKean-Vlasov SDEs.

This is joint work with Elena Issoglio (University of Torino) and Jan Palczewski (University of Leeds).
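
For readers unfamiliar with the scheme, here is a minimal sketch of a plain Euler-Maruyama iteration for an SDE with unit diffusion, $dX_t = b(X_t)\,dt + dW_t$. The function name `euler_maruyama`, the smooth placeholder drift `np.tanh`, and all numerical choices are illustrative and not the speakers' setting, where the drift is distributional and must be handled via a suitable approximation.

```python
# A minimal sketch (not the speakers' code) of Euler-Maruyama for
#   dX_t = b(X_t) dt + dW_t,  X_0 = x0,
# with unit diffusion. The drift here is a smooth placeholder standing in
# for a (mollified) distributional drift in a Hoelder-Zygmund space.
import numpy as np

def euler_maruyama(b, x0, T, n_steps, n_paths, rng):
    """Simulate n_paths Euler-Maruyama paths of dX = b(X) dt + dW on [0, T]."""
    dt = T / n_steps
    X = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        dW = rng.normal(scale=np.sqrt(dt), size=n_paths)  # Brownian increments
        X = X + b(X) * dt + dW                            # unit diffusion coefficient
    return X

rng = np.random.default_rng(0)
b = np.tanh   # illustrative smooth drift, not the talk's distributional drift
X_T = euler_maruyama(b, x0=0.0, T=1.0, n_steps=1_000, n_paths=10_000, rng=rng)
print(X_T.mean(), X_T.std())   # Monte Carlo summaries of the law of X_T
```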

25th October

Decision-making Using Social Information

Anna Sigalou

For the first part of this talk, I will present my recent (and ongoing) work on the evolution of decision-making rules in groups of social agents. In the second part, I will give a brief introduction to social information, its relevance as a research subject, and some applications and open questions.

1st November

The Regular Case of Optimal Stopping

Jacob Smith

The study of optimal stopping problems has a long history, but the seminal work by El Karoui expanded the reach of the analysis. In this talk, I will present the first generalisation of optimal stopping problems and the methods used to solve them.
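
For orientation, and in notation that is not necessarily the talk's, a classical optimal stopping problem on a horizon $[0,T]$ reads

$$ V_t = \operatorname*{ess\,sup}_{\tau \in \mathcal{T}_{t,T}} \mathbb{E}\big[\, G_\tau \,\big|\, \mathcal{F}_t \,\big], $$

where $G$ is the gain process, $\mathcal{T}_{t,T}$ is the set of stopping times with values in $[t,T]$, and the value process $V$ is the Snell envelope of $G$, the smallest supermartingale dominating it.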

8th November

Stats new starters introduction

This is an opportunity for the new PhD students (starting in 2023) to introduce themselves, meet the rest of the department and talk a bit about their interests.

15th November

Interpreting machine learning models based on Shapley values

Mingrui Zhang

Machine learning models excel at prediction but are hard to interpret because they are “black boxes”. Our study focuses on finding the important variables in machine learning models to gain some insight into them. First, we use Shapley values to compute a contribution of each variable to each prediction. Second, we run a Shapley regression of the predictions on those Shapley values. By the efficiency property of Shapley values, the coefficients of the important variables in the Shapley regression should be close to 1. However, the amount of data needed to observe convergence to 1 may be large. We tried penalised linear regression and a Bayesian approach to “speed up” the convergence. We discuss why these two methods do not work well, and the pitfalls of Shapley regression.
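
To make the pipeline concrete, here is a self-contained sketch under toy choices of data and model, none of which come from the talk: exact interventional Shapley values computed by brute force for a fitted regressor, followed by a regression of the predictions on those values, whose coefficients the efficiency property pushes towards 1. The function names `coalition_value` and `shapley_values` are hypothetical helpers for this sketch.

```python
# Illustrative sketch (not the speaker's code): brute-force interventional
# Shapley values for a fitted model, then a "Shapley regression" of the
# model's predictions on those values.
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, d = 400, 3
X = rng.normal(size=(n, d))
y = 2 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=n)    # x2 is irrelevant

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
background = X[:50]                                      # reference sample

def coalition_value(x, coal):
    """Mean prediction with features in `coal` fixed at x, the rest drawn from the background."""
    Z = background.copy()
    Z[:, list(coal)] = x[list(coal)]
    return model.predict(Z).mean()

def shapley_values(x):
    """Exact Shapley values of the prediction at x (feasible only for small d)."""
    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            for S in combinations(others, size):
                w = factorial(size) * factorial(d - size - 1) / factorial(d)
                phi[j] += w * (coalition_value(x, S + (j,)) - coalition_value(x, S))
    return phi

idx = list(range(100))
Phi = np.array([shapley_values(x) for x in X[idx]])
preds = model.predict(X[idx])

# Shapley regression: by the efficiency property, each prediction equals the
# base value plus the sum of its Shapley values, so the coefficients of the
# important variables (here x0 and x1) should be close to 1.
reg = LinearRegression().fit(Phi, preds)
print(reg.coef_)
```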

Functional Principal Component Analysis for Longitudinally Captured MRI Datasets

Sonia Dembowska

Imaging plays an integral role in the diagnosis and monitoring of neurodegenerative diseases. Currently, the common methods for such diseases include models for biomarkers extracted from imaging data. Methods for the temporal modeling of images themselves remain limited due to the high dimensionality of the problem. Our aim is to model disease progression longitudinally using complete images. We propose a novel Functional Principal Component Analysis (FPCA) model which includes a cross-sectional component for the high-dimensional data and a time component to model longitudinal dependency. Our contributions include the proposed model and an estimation procedure that circumvents the computational burden due to the high dimensionality.

Our model is obtained by applying Mercer’s theorem to the marginal covariance function of the images, obtained by integrating over time. The time component is modeled through the individual-specific score functions. For estimation of the model, we propose a method which comprises a singular value decomposition of the inner products of the underlying processes at each time point and a regression component to estimate the scores. As we are modeling disease progression, we account for longitudinal data in the temporal component by using splines for the scores and longitudinal mixed models.
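
As a generic illustration of the marginal-FPCA idea (not the proposed estimator, and with placeholder Gaussian data standing in for images), one can pool vectorised images over subjects and visits, extract eigen-images from an SVD of the centred data matrix, and project each image onto them to obtain scores that can subsequently be modelled over time:

```python
# Generic sketch of marginal FPCA via SVD (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_voxels = 200, 5000                    # (subject, visit) pairs x voxels
images = rng.normal(size=(n_obs, n_voxels))    # placeholder for vectorised images

mean_image = images.mean(axis=0)
centred = images - mean_image

# Economy-size SVD of the centred data matrix: rows of Vt are the vectorised
# eigen-images; pooling observations over time amounts to working with the
# marginal (time-integrated) covariance of the images.
U, s, Vt = np.linalg.svd(centred, full_matrices=False)

K = 5                                          # number of retained components
eigen_images = Vt[:K]
scores = centred @ eigen_images.T              # one K-vector of scores per image
explained = s[:K] ** 2 / (s ** 2).sum()        # proportion of variance explained
print(explained)
```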

We apply our method to Alzheimer’s Disease Neuroimaging Initiative data, which provide imaging data and diagnoses for 382 subjects with Alzheimer’s disease. We will relate the score functions to outcomes. Separately, we study the model’s performance in simulation studies to measure our ability to reconstruct the original data and to forecast a patient’s disease progression using the scores.

21st November

Bayesian CART for insurance pricing

Yaojun Zhang

The accuracy and interpretability of a (non-life) insurance pricing model are essential qualities for ensuring fair and transparent premiums for policyholders that reflect their risk. In recent years, classification and regression trees (CARTs) and their ensembles have gained popularity in the actuarial literature, since they offer good prediction performance and are relatively easy to interpret. In this paper, we introduce Bayesian CART models for insurance pricing. To this end, we introduce a general MCMC algorithm using data augmentation methods for posterior tree exploration. We also introduce the deviance information criterion (DIC) for tree model selection. The proposed models are able to identify trees which can better classify the policyholders into risk groups. Simulations and real insurance data will be used to illustrate the applicability of these models.
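
For reference, the deviance information criterion in its standard general form (not a detail specific to this paper) is

$$ \mathrm{DIC} = D(\bar{\theta}) + 2\,p_D, \qquad p_D = \overline{D(\theta)} - D(\bar{\theta}), \qquad D(\theta) = -2 \log p(y \mid \theta), $$

where $\overline{D(\theta)}$ is the posterior mean of the deviance and $\bar{\theta}$ is a posterior point estimate of the parameters; smaller values indicate a preferred model.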

29th November

Regularisation by multiplicative noise for reaction-diffusion equations

Teodor Holland

We consider the stochastic reaction-diffusion equation in $1+1$ dimensions driven by multiplicative space-time white noise, with a distributional drift belonging to the negative Besov-Hölder space $C^\alpha$ with $\alpha > -1$. We assume that the diffusion coefficient is sufficiently regular and nondegenerate. By using a combination of stochastic sewing and Malliavin calculus, we show that the equation admits a unique strong solution. As a by-product we also obtain quantitative bounds for the density of the multiplicative stochastic heat equation and its derivatives.
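
Schematically, and in notation that may differ from the talk's, the equation is of the form

$$ \partial_t u = \partial_x^2 u + b(u) + \sigma(u)\,\xi, $$

where $\xi$ is space-time white noise on $(0,T]$ times a one-dimensional spatial domain, $b \in C^\alpha$ with $\alpha > -1$ is the distributional reaction term, and $\sigma$ is the sufficiently regular, nondegenerate diffusion coefficient.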

6th December

Classification trees based on distance measures and discriminant functions

Yijun Fu

The Classification and Regression Tree (CART) (Breiman, 1984), a decision tree algorithm used for classification, is one of the most important machine learning algorithms and has been widely used in many fields. However, CART generates classification trees using only one feature in each split, which may lead to a complex tree structure and high computational cost. Various distance-based trees and trees based on linear or quadratic discriminants have been proposed. This talk investigates four distance-based classification tree models which simplify the tree structure and can improve the performance of CART. We propose classification tree models that use the Euclidean distance, the Mahalanobis distance, the linear discriminant function and the quadratic discriminant function to define splitting rules. A modified cost-complexity pruning method is combined with our proposed distance-based classification trees to avoid overfitting. We compare the performance of our proposed models with that of CART on simulated and real data sets.
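
As a rough illustration of a distance-based splitting rule (an illustrative sketch, not the proposed algorithm), a node can route each observation towards the nearest class centroid, with the Mahalanobis variant using the node's sample covariance. The function `distance_split` and the toy two-class Gaussian data are hypothetical.

```python
# Illustrative sketch of one distance-based split: compute per-class
# centroids at a node and route each observation towards its nearest
# centroid. A full tree would apply this recursively, with pruning.
import numpy as np

def distance_split(X, y, metric="euclidean"):
    """Return a boolean mask: True = observation is routed towards class 0."""
    classes = np.unique(y)
    assert len(classes) == 2, "this sketch assumes a two-class split"
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    if metric == "mahalanobis":
        VI = np.linalg.inv(np.cov(X, rowvar=False))     # inverse sample covariance at the node
    else:
        VI = np.eye(X.shape[1])                         # identity recovers Euclidean distance
    diffs = X[:, None, :] - centroids[None, :, :]       # shape (n, 2, p)
    d2 = np.einsum("ncp,pq,ncq->nc", diffs, VI, diffs)  # squared distances to each centroid
    return d2[:, 0] <= d2[:, 1]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)), rng.normal(2.0, 1.0, size=(50, 2))])
y = np.repeat([0, 1], 50)
mask = distance_split(X, y, metric="mahalanobis")
print("observations routed towards class 0:", int(mask.sum()), "of", len(y))
```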

Exit Games with Asymmetric Information

Yukun Cao

Exit timing is one of the most important decisions in the business world, and numerous empirical studies show that a mistake can result in substantial financial loss. The exit game concerns two firms competing in a declining industry, each trying to decide when to exit the market. Each firm knows its own private type, i.e. its profit or loss if it chooses to leave the market. We will examine the best strategy for a firm under two different settings: the first is where it is optimal for a monopolist to stay in the market forever, and the second is where there exists an optimal stopping time for the monopolist. We use a guess-and-verify approach and construct a belief process.
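
Schematically, and in notation that is not the talk's, the single-firm (monopolist) benchmark is an optimal stopping problem of the form

$$ V(x) = \sup_{\tau} \mathbb{E}_x\!\left[ \int_0^{\tau} e^{-rt}\, \pi(X_t)\, dt \right], $$

where $X$ models the declining demand, $\pi$ the profit flow and $r$ a discount rate; the two settings above correspond to the supremum being attained by never exiting ($\tau = \infty$) versus by a finite optimal exit time.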

13th December

Functional principal component analysis for time-varying binary networks

Puchong Paophan

This study addresses the challenge of analysing time-varying binary network data, commonly seen in dynamic systems like brain region interactions and muscle movements. Due to the limited capabilities of current statistical tools for such data, we propose a novel method using functional principal component analysis (FPCA). This technique transforms the interaction networks into vectors of binary functions (multivariate functional data). By adopting a finite Karhunen-Loève representation for each univariate component, we establish a Bernoulli likelihood and apply a majorisation-minimisation algorithm for parameter estimation. Subsequently, we leverage the connection between univariate and multivariate FPCA in the finite decomposition to derive multivariate functional principal components and scores. These are useful for subsequent analyses, including regression and clustering. The effectiveness of our approach is validated through simulations and an application to data involving muscle interactions, demonstrating its capability to extract meaningful insights from time-varying binary networks.
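
One schematic way to read this kind of model (my notation, which need not match the speaker's exact specification) is, for subject $i$ and edge $j$,

$$ Y_{ij}(t) \sim \mathrm{Bernoulli}\!\big( g^{-1}(Z_{ij}(t)) \big), \qquad Z_{ij}(t) = \mu_j(t) + \sum_{k=1}^{K} \xi_{ijk}\, \phi_{jk}(t), $$

with a link $g$ (e.g. logistic), mean functions $\mu_j$, eigenfunctions $\phi_{jk}$ and subject-specific scores $\xi_{ijk}$ from a truncated Karhunen-Loève expansion; the Bernoulli likelihood of the observed binary edges is then maximised, e.g. by majorisation-minimisation.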

Statistical Analyses on Human Smiles

Xiangyu Wu

Dynamic facial expressions convey crucial emotional information and hence form a critical part of daily life. Among these expressions, the smile is extraordinarily important, as it is the most salient and the most frequently used. Some people are born with a facial disfigurement, such as a cleft lip. Studies have shown that people with a cleft lip may experience a range of difficulties in everyday life and social problems, as they may be marginalised by society. Surgery can resolve at least part of these abnormalities for cleft patients. Hence, it is important to quantify to what extent surgical interventions can recover a normal lip. This involves assessing bilateral symmetry statistically. We perform various analyses on data containing cleft patients and normal subjects in order to distinguish them.