Presenter

Dmitrii Usynin, Technical University of Munich
I am a PhD student at the Joint Academy of Doctoral Studies (JADS) launched between Imperial College London and the Technical University of Munich. My research interests lie at the intersection of collaborative machine learning (CML) and trustworthy artificial intelligence (TAI). In particular, I am interested in topics such as privacy-preserving machine learning (PPML), attacks on CML, adversarial robustness, federated learning and memorisation in ML. Additionally, I am interested in applications of my research in the domain of collaborative biomedical imaging. Some of my recent works include gradient-based model inversion attacks on collaboratively trained computer vision models (ACM TOPS 2023), low-cost empirical defences against privacy adversaries (PoPETS 2022), a framework for trustworthy collaborative medical image analysis (Nature Machine Intelligence 2021) and an overview of the current state of PPML and attacks on CML (Nature Machine Intelligence 2021). Outside of my PhD, I am an Investment Partner at CreatorFund, leading early-stage deep tech investment in Europe. Previously, I was a privacy researcher at OpenMined, working on federated learning and differential privacy in healthcare. Outside of all that, I am a rower and a WSET-certified expert in beer.
Abstract:
Obtaining high-quality data to train well-generalisable machine learning models can be a challenging task due to (a) regulatory concerns and (b) a lack of incentives for data owners to participate. The first issue can be addressed through the combination of distributed machine learning techniques (e.g. federated learning) and privacy-enhancing technologies (PETs), such as differentially private (DP) model training. The second challenge can be addressed by rewarding participants for giving access to data that is beneficial to the model being trained, which is of particular importance in federated settings, where the data is unevenly distributed. However, many PETs that make such collaborations compliant with data protection regulations can inadvertently affect the fairness of the reward distribution. Taking DP as a practical example, we see that randomised noise can adversely affect underrepresented and atypical (yet often informative) data samples, making it difficult to assess their usefulness for the final model and potentially reducing the monetary incentives for underrepresented subgroups. To resolve this problem, we need to answer the following questions: (a) why do we need PETs in the first place, (b) how can we apply PETs to make large-scale ML regulation-compliant, and (c) what adaptations are needed to make reward allocation more meaningful in such settings?
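To make the DP example above concrete, the following is a minimal, self-contained sketch (not taken from the talk): it uses per-sample gradient norms as a naive proxy for a sample's contribution to training and shows how DP-SGD-style clipping and Gaussian noise can blur the distinction between typical and atypical (informative) samples. The synthetic data, the gradient-norm contribution proxy, and the parameters C and sigma are illustrative assumptions, not the speaker's method.

```python
# Illustrative sketch: DP-style clipping + noise obscures a naive per-sample
# contribution signal. All names and parameters here are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic logistic-regression data: a large "typical" group and a small,
# shifted "atypical" (underrepresented but informative) subgroup.
n_typical, n_atypical, d = 900, 100, 10
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_typical, d)),
    rng.normal(3.0, 1.0, size=(n_atypical, d)),
])
w_true = rng.normal(size=d)
y = (X @ w_true + rng.normal(0, 0.1, size=len(X)) > 0).astype(float)

w = np.zeros(d)  # current model parameters (untrained, for illustration only)

def per_sample_grads(w, X, y):
    """Per-sample gradients of the logistic loss with respect to w."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return (p - y)[:, None] * X  # shape (n, d)

grads = per_sample_grads(w, X, y)
raw_scores = np.linalg.norm(grads, axis=1)  # naive "contribution" proxy

# DP-SGD-style treatment: clip each per-sample gradient to norm C,
# then add Gaussian noise scaled by sigma * C.
C, sigma = 1.0, 1.0
clip_factor = np.minimum(1.0, C / (raw_scores + 1e-12))
dp_grads = grads * clip_factor[:, None] + rng.normal(0, sigma * C, size=grads.shape)
dp_scores = np.linalg.norm(dp_grads, axis=1)

print("mean raw score   typical / atypical:",
      raw_scores[:n_typical].mean(), raw_scores[n_typical:].mean())
print("mean noisy score typical / atypical:",
      dp_scores[:n_typical].mean(), dp_scores[n_typical:].mean())
```

Under these assumptions, the gap between the two groups' scores is large before clipping and noising and much smaller afterwards, which is the sense in which DP can make atypical contributions harder to value and reward.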
Problem to be solved:
Standardised, verified and versatile open-source frameworks to connect individual methods for trustworthy ML into unified pipelines. Currently, each research lab has its own benchmarks, libraries and custom connectors, making collaboration and verification of previous results significantly more challenging.