# Leeds Stats PGR Seminar 2022-23 Term 1

This is *Term 1 2022-23*, so all the dates below correspond to the year 2022.

*Copyright disclaimer: The titles and abstracts of the talks are the property of the speakers and their colaborators.*

## October 14th

### An Euler-Maruyama scheme for SDEs with distributional drift

*Luis Mario Chaparro Jaquez*

The Euler-Maruyama scheme is a widely used numerical method to compute approximations of the solutions to SDEs. When the coefficients have sufficient regularity the strong convergence rate is known to be $1/2$. Current research studies the convergence rate when those conditions are not satisfied. In particular, we proved that when we have an SDE

$$ X_t = \int_0^t b(s, X_s) ds + W_t $$

with the drift coefficient $b$ in the space of Schwartz distributions $\mathcal{S}’(\mathbb{R})$ we can create a numerical scheme with convergence rate up to $1/2-\epsilon$.

### Evolution of social decision-making rules

*Anna Sigalou*

Social animals live in uncertain environments and rely on cues to navigate them. To make decisions, they use public and social information which they process using decision-making strategies. This process is often modelled by observing collective behaviour, and finding a model that leads to these collective dynamics. Here we attempt the opposite: we first try to identify plausible individual decision-making rules, and then see how they can lead to the observed collective behaviours. We investigate how individual decision-making affects the groupâ€™s collective behaviour, how this group reaches optimal and evolutionary stable behaviour, and we explore how the different decision-making strategies compete with each-other.

## October 28th

### Distance-Based Classification Trees

*Yijun Fu*

Abstract: The classification tree is an important supervised machine learning approach for classification problems. The four proposed distance-based classification tree algorithms and comparisons among their performances is our main interest. Hence, this talk will give a brief introduction about classification trees and discuss the performances of four proposed distance-based classification tree algorithms, including some primary applications. At the end, the main factors that affect the accuracy of proposed distance-based classification trees will be explained.

### Dynkin Games with Partial and Asymmetric Information

*Jacob Smith*

Abstract: Dynkin games are two player stopping games where both players have an incentive to wait for the other to stop. We introduce an asymmetry of information through an exogenous random variable, whose outcome is only known by one player. Moreover, we let the players choose from mixed strategies (randomised stopping times), which allows us to find equilibria in a wider selection of games. Our work consists of: finding the gameâ€™s value and Nash equilibria at all times during the game; finding properties of the optimal strategies; and finding relevant martingale and sub/ super-martingale properties of the value.

## November 11th

### Network-valued functional data analysis with application in New York taxi network

*Puchong Paophan*

Abstract: Network-valued functional data or dynamic network is a network which changes over time. This type of functional data is increasingly available in many applications, such as social interactions, connections between regions in the brain or transport networks. However, it lacks a suitable statistical analysis tool, so it motivated us to develop a framework for the statistical analysis of dynamic networks. In this work, we focus on networks represented as graph Laplacian matrices for which we choose an appropriate metric and define projections to transform graph Laplacian matrices in Euclidean space. This allows us to calculate the means of dynamic networks in the original space and perform functional principal component analysis (FPCA) in the Euclidean space. Finally, we apply the methodology to New York taxi networks.

### Interpreting machine learning models

*Mingrui Zhang*

Abstract: Machine learning models excel in prediction but lack in interpretation because they are â€śblack boxesâ€ť. Our study focuses on finding the important variables in machine learning models to get some insight of them. Firstly, we use Shapley values to calculate a contribution for each variable on each prediction. Second, we do Shapley regression on those Shapley values and predictions. From the efficiency property of Shapley values, the coefficients for important variables in Shapley regression should be close to 1. However, the number of data needed to observe convergence to 1 may be large. We consider a special case that specialists might have prior knowledge of variablesâ€™ importance. Hence, we add a Bayesian prior to the coefficients to â€śspeed upâ€ť the convergence. We demonstrate Bayesian Shapley regression for a simple simulation and show we can successfully detect the noise variable.

## November 25th

### Spatio-Functional Principal Component Analysis for brain fMRI

*Sonia Dembowska*

Abstract: Images are frequently used in clinical settings, but statistical models are still limited. Our work is motivated by experimental data on brain fMRI images captured on 15 subjects whilst performing decision-making tasks. The images were acquired every 2.5 s during the investigation providing each subject with a time series of 3 dimensional images representing their brain activity, each patient also received a clinically-derived risk score at the end. The goal of the study was to associate the images to the clinical outcome. The method used to achieve this goal reduced the temporal dataset to one image by taking the difference of the first three images in one time series. This approach is arbitrary and does not address the full response of a subject over time. We propose a novel method using functional data analysis which decomposes the spatio-temporal images into time-independent principal components and time-dependent score functions. Our functional principal component analysis (FPCA) method preserves the spatial relationships between pixels in the observed data including the temporal aspect. The components hold the spatial information and are three-dimensional whilst we let the scores depend on time and represent individual variation with respect to the components. The novelty of the method is two-fold: firstly, we create a functional principal component analysis model to deal with three-dimensional images for clinical analysis where space and time are detangled. Secondly, the method of parameter estimation is done such that computational requirements are minimal, as we avoid calculating the covariance matrix used to estimate principal components. This makes our method computationally fast. Data analysis reveals that our score functions are able to predict the clinical risk with high accuracy, outperforming the previous method that is unable to account for time. In 10 of the tasks performed, the individual score functions have predicted the risk rating of subjects with an average RMSE of 0.788 whilst the average RMSE of the old method is 2.497. Our result shows meaningful insights and better clinical predictions can be obtained when modelling temporal aspects of image data.

### Discriminative Performance for Non-Proportional Hazards Survival Models

*Alfensi Faruk*

Abstract: Standard discriminative measures are no more appropriately applied to evaluate the performance of survival models due to censoring. We propose a discriminative measure for non-proportional hazard models as an unbiased version of Antolini’s time-dependent concordance. The convergence of our estimator is shown based on Nolan and Pollard’s results for U-statistics. We apply them to survival neural networks and random survival forests hyperparameter tuning.

## December 9th

### Probabilistic Volumetric Positioning of High Energy Photons in Monolithic Crystals for Positron Emission Tomography

*Viet Dao*

Abstract: In this work we employ simulations to calibrate the positioning algorithm and determine its error for a Lutetiumâ€“ Yttrium Oxyorthosilicate monolithic crystal (LYSO)-based position emission tomography (PET) scanner. Using the centre of gravity method, with a bias, to extract position of photoelectric interactions, we measure the standard deviation of estimated position (x, y and depth) from true position. Finally, we simulate a point source scan with the system. The resulting data are processed and reconstructed using Software for Tomographic Image Reconstruction (STIR) with depth encoding. We found that introducing bias decreases standard deviation for X, Y position within a monolithic crystal from â€ś2.62mmâ€ť to â€ś1.81mmâ€ť.