Umberto Picchini's Research

My research is about **statistical inference for stochastic modelling**, typically stochastic dynamical systems. That is, given a model
representing mathematically the time-evolution of a system subject to random *noise*, I am particularly interested in the estimation of the relevant unknown model-parameters using
available data.

Most of my research focuses on models that do not admit readily available analytic expressions for inference. Typically, this means that the likelihood function is unavailable
in closed form and/or is computationally difficult to approximate.
Therefore, my main interest at the moment is to construct ways to deal with models having **intractable likelihoods**. This usually implies building probabilistic algorithms based on different types of Monte Carlo strategies.

I am particularly interested in inference for state-space models and stochastic differential equation (SDE) models with applications to biomedicine. More specifically, some of my interests are: accelerating MCMC computations for Bayesian inference; mixed-effects modelling for SDEs; likelihood-based and Bayesian inference for SDEs; approximate Bayesian computation (ABC); stochastic modelling for tumor growth, protein folding and glycemia-insulinemia dynamics.

Finally, as a consequence of the need to apply modern (but demanding!) inferential methods for complex high-dimensional stochastic models, I am particularly fascinated by computationally challenging probabilistic methods (such as Markov chain Monte Carlo and Sequential Monte Carlo) and their efficient computer implementations.

I have received a research grant from the Swedish Research Council for the interdisciplinary project "Statistical Inference and Stochastic Modelling of Protein Folding" (here is an accessible description), for which I am the
principal investigator, in collaboration with Kresten Lindorff-Larsen (Dept. Biology, Copenhagen University) and Julie Lyng Forman (Dept. Biostatistics, Copenhagen University).

**Selected talks:** slides from recent and not-so-recent talks are available at my SlideShare account.

Below you can find short descriptions of some of the research areas and applications I have been involved with, together with links to the relevant publications (note
that most of my preprints are freely available on my publications page):

- Approximate Bayesian computation (ABC) and other likelihood-free methods
- Approximate maximum likelihood estimation
- Modelling of protein-folding data
- Stochastic models for glycemia dynamics
- Mixed-effects models defined via SDEs

Approximate Bayesian computation (ABC) is a *likelihood-free* methodology that
is enjoying increasing popularity, as it provides a practical approach to inference
for models that, because of an intractable likelihood function, would otherwise be
computationally too challenging to consider (see the reviews by Sisson and Fan (2010) and Marin et al. (2011)). Ideally we wish to make inference about unknown parameters using Bayesian methods, i.e.
given some data we want to simulate from the posterior distribution on the parameter space. However, for many complex models not only is a closed-form expression
for the posterior unavailable, but even very general Markov chain Monte Carlo (MCMC)
methods such as Metropolis-Hastings may fail for a number of reasons, including
difficulties in exploring the parameter space, multimodality
of the posterior surface, and difficulties in constructing adequate proposal densities.
For stochastic models of my interest the likelihood function is typically unavailable. ABC circumvents
the evaluation of the intractable likelihood function while still targeting the posterior distribution
or an approximation thereof. ABC methodology is fascinating and extremely flexible. Here I considered ABC for stochastic differential equation models observed with error.
The case of partially observed systems is also considered. Simulations for pharmacokinetics/pharmacodynamics and for stochastic chemical reactions studies
are presented. The `abc-sde` MATLAB package implementing the methodology is freely available.
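
As a rough illustration of how ABC avoids evaluating the likelihood, here is a minimal rejection-sampling sketch in Python. It is a toy example under stated assumptions, not the ABC-MCMC methodology of the papers nor the interface of the `abc-sde` package: the Gaussian simulator, the uniform prior on [-5, 5], the sample-mean summary statistic and the names `simulate_data` and `abc_rejection` are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_data(theta, n, rng):
    """Toy simulator (an assumption): n i.i.d. draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lies within `tolerance` of the observed one."""
    s_obs = statistics.fmean(observed)        # summary statistic of the data
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)        # draw from the assumed uniform prior
        simulated = simulate_data(theta, len(observed), rng)
        s_sim = statistics.fmean(simulated)
        if abs(s_sim - s_obs) <= tolerance:   # only simulation is needed, no likelihood
            accepted.append(theta)
    return accepted


rng = random.Random(1)
observed = simulate_data(2.0, 50, rng)        # synthetic "data" generated with theta = 2
posterior_draws = abc_rejection(observed, 5000, 0.1, rng)
print(len(posterior_draws), statistics.fmean(posterior_draws))
```

The accepted values are draws from an approximation to the posterior; shrinking `tolerance` tightens the approximation at the cost of fewer acceptances.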

In another paper with Julie Lyng Forman we used ABC for inference on a (relatively) large dataset and a computationally challenging sum-of-diffusions model for protein folding data.

In joint work with Rachele Anderson we consider parameter estimation for a general class of models using a hybrid MLE-Bayesian strategy, ultimately leading to a maximum likelihood estimator
while making use of an ABC-MCMC sampler: this work is based on the strategy popularly known as "data cloning".

In a joint work with Adeline Samson we embed ABC within SAEM (stochastic approximation EM) for maximum likelihood estimation in state-space models (also known
as hidden Markov models).

**Relevant papers:** paper #1, paper #2, paper #3, paper #4. **Software:** my `abc-sde` package

"Likelihood-free" methods are of course also relevant outside the Bayesian paradigm considered above.

A popular methodology for models having some hidden (unobserved) component is the EM algorithm for maximum likelihood estimation. An important implementation of EM is a stochastic version called SAEM (where the E-step in EM is approximated).
While SAEM has nice algorithmic and theoretical properties, its application is restricted to models having an analytically tractable "complete likelihood" function. SAEM also requires the user to specify analytically the sufficient statistics for that likelihood.
This is typically impossible (or at best difficult) for most models of realistic complexity.
In a single-authored work I have enabled SAEM for complex intractable models, using the concept of *synthetic likelihood*. The resulting strategy is SAEM-SL, a "likelihood-free" version of SAEM (demo MATLAB code is available).
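
To give a flavour of what a synthetic likelihood is, the sketch below follows the generic construction: simulate many datasets at a candidate parameter, compute summary statistics for each, fit a Gaussian to those summaries, and evaluate its log-density at the observed summaries. This is a hedged toy example and not the SAEM-SL algorithm or its MATLAB demo: the Gaussian simulator, the two summaries (sample mean and log sample variance) and the names `summaries` and `synthetic_loglik` are assumptions made for illustration.

```python
import math
import random
import statistics


def summaries(data):
    """Two summary statistics (an assumed choice): sample mean and log sample variance."""
    return (statistics.fmean(data), math.log(statistics.variance(data)))


def synthetic_loglik(theta, s_obs, n_obs, n_sim, rng):
    """Gaussian synthetic log-likelihood of the observed summaries at theta = (mu, sigma)."""
    mu, sigma = theta
    sims = [summaries([rng.gauss(mu, sigma) for _ in range(n_obs)])
            for _ in range(n_sim)]
    # Sample mean vector and 2x2 covariance matrix of the simulated summaries.
    m0 = statistics.fmean([s[0] for s in sims])
    m1 = statistics.fmean([s[1] for s in sims])
    c00 = sum((s[0] - m0) ** 2 for s in sims) / (n_sim - 1)
    c11 = sum((s[1] - m1) ** 2 for s in sims) / (n_sim - 1)
    c01 = sum((s[0] - m0) * (s[1] - m1) for s in sims) / (n_sim - 1)
    det = c00 * c11 - c01 ** 2
    d0, d1 = s_obs[0] - m0, s_obs[1] - m1
    # Quadratic form d' C^{-1} d, using the closed-form inverse of a 2x2 matrix.
    quad = (c11 * d0 ** 2 - 2.0 * c01 * d0 * d1 + c00 * d1 ** 2) / det
    return -0.5 * (2.0 * math.log(2.0 * math.pi) + math.log(det) + quad)


rng = random.Random(2)
observed = [rng.gauss(1.0, 2.0) for _ in range(100)]   # synthetic "data"
s_obs = summaries(observed)
ll_true = synthetic_loglik((1.0, 2.0), s_obs, 100, 200, rng)
ll_far = synthetic_loglik((4.0, 2.0), s_obs, 100, 200, rng)
print(ll_true > ll_far)
```

Because the function only needs forward simulation from the model, it can stand in for an intractable likelihood inside an optimiser or a sampler.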

However, even when standard SAEM can be implemented (possibly after non-negligible effort), it can fail in some cases. For example, for state-space models, Adeline Samson and I have shown how the popular *bootstrap filter* sequential Monte Carlo algorithm (which
is a way to provide SAEM with paths for the latent process) can make SAEM produce very biased inference in some cases. A simple modification of the filter, involving an approximate Bayesian computation strategy, is able to produce better paths in some specific circumstances. See the resulting SAEM-ABC method and a MATLAB demo.
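
For readers unfamiliar with the bootstrap filter, a minimal Python sketch for a toy linear Gaussian state-space model follows. It shows the standard algorithm only (propagate from the state transition, weight by the observation density, resample) and not the ABC-modified filter of SAEM-ABC; the model, the standard normal initial distribution and the function names are assumptions made for illustration.

```python
import math
import random


def normal_logpdf(y, mean, sd):
    """Log-density of Normal(mean, sd^2) evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - 0.5 * ((y - mean) / sd) ** 2


def bootstrap_filter(ys, phi, sigma_x, sigma_y, n_particles, rng):
    """Bootstrap filter for x_t = phi*x_{t-1} + sigma_x*eps_t, y_t = x_t + sigma_y*eta_t."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # assumed initial law
    loglik, filtered_means = 0.0, []
    for y in ys:
        # Propagate from the state transition alone, without looking at the new observation.
        particles = [phi * x + sigma_x * rng.gauss(0.0, 1.0) for x in particles]
        logw = [normal_logpdf(y, x, sigma_y) for x in particles]
        max_logw = max(logw)
        w = [math.exp(lw - max_logw) for lw in logw]   # stabilised unnormalised weights
        total = sum(w)
        loglik += max_logw + math.log(total / n_particles)
        filtered_means.append(sum(wi * x for wi, x in zip(w, particles)) / total)
        # Multinomial resampling with probabilities proportional to the weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return loglik, filtered_means


rng = random.Random(3)
x, ys = 0.0, []
for _ in range(50):                                    # simulate synthetic observations
    x = 0.9 * x + 0.5 * rng.gauss(0.0, 1.0)
    ys.append(x + 0.3 * rng.gauss(0.0, 1.0))
loglik, means = bootstrap_filter(ys, 0.9, 0.5, 0.3, 500, rng)
print(round(loglik, 2))
```

Note that the propagation step ignores the incoming observation, so when observations are very informative only a few particles may receive non-negligible weight.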

**Relevant papers:** paper SAEM-SL, paper SAEM-ABC. **Software:** MATLAB demo for `SAEM-SL`; MATLAB demo for `SAEM-ABC`

In joint work with Julie Forman we have considered the problem of estimating folding rates for a protein having a coordinate that switches between the *folded* and *unfolded* states, which is noticeable in the picture above.
The so-called "protein-folding problem" has been referred to as *"the Holy Grail of biochemistry and biophysics"*, and we certainly do not claim to solve it (!).
However, some contribution can be made from the inference point of view: we have proposed a new dynamical model (expressed as the sum of two diffusions) and a fairly fast computational strategy based on approximate Bayesian computation (ABC, see above) that seems to work well and could be used in place of exact Bayesian inference
when large datasets make the latter infeasible.

**Relevant papers:** paper 2014

In my early work, together with Andrea De Gaetano (Rome)
and Susanne Ditlevsen (Copenhagen), I considered the problem of formulating models able to accommodate stochastic variability in glycemia dynamics. Previous attempts in the literature focussed on deterministic modelling (based on ODEs and DDEs), which is intrinsically
unable to represent randomness in the modelled (physiological) system, so that the only random variability that could be contemplated had to be interpreted as measurement error.
With stochastic differential equations this is no longer the case. In a 2006 paper we considered likelihood-based inference via computer-intensive Monte Carlo simulation and separated intrinsic stochasticity in glycemia dynamics from
measurement error variability. To ease the inference, a more computationally feasible model was proposed in a 2008 paper, where the likelihood function is approximated in closed form,
but this time measurement error is not modelled.
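
The distinction between intrinsic stochasticity and measurement error can be illustrated with a small simulation. The Python sketch below uses the Euler-Maruyama scheme on a toy mean-reverting (Ornstein-Uhlenbeck) SDE and then adds independent observation noise. It is an assumption-laden illustration, not the glycemia models of the 2006 and 2008 papers: the SDE, the parameter values and the name `simulate_ou_with_error` are hypothetical.

```python
import math
import random


def simulate_ou_with_error(x0, rate, mean, sigma, sigma_err, dt, n_steps, rng):
    """Euler-Maruyama path of dX = -rate*(X - mean) dt + sigma dW, plus noisy observations."""
    x, states, observations = x0, [], []
    for _ in range(n_steps):
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)        # Brownian increment over dt
        x = x - rate * (x - mean) * dt + sigma * dw     # intrinsic (system) noise
        states.append(x)
        # Measurement error is added to the observation only; it never feeds back into x.
        observations.append(x + sigma_err * rng.gauss(0.0, 1.0))
    return states, observations


rng = random.Random(4)
states, obs = simulate_ou_with_error(10.0, 0.5, 5.0, 0.8, 0.3, 0.01, 1000, rng)
print(len(states), len(obs))
```

Setting `sigma = 0` recovers an Euler discretisation of the corresponding ODE, in which case all randomness in the observations comes from `sigma_err`, as in the deterministic models described above.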

**Relevant papers:** JMB 2006, MMB 2008.

It is often the case that a given experiment involves repeated measurements, particularly in biomedicine, where the replicates might be
measurements from the same experiment performed on different subjects or animals. By introducing random parameters, mixed-effects models help to model variability when the aim is
to capture the overall behaviour of the entire "population" of subjects, that is, to make simultaneous inference on the collective dynamics of all subjects rather than on individual (subject-specific) behaviour. This
allows for a more precise estimation of *population parameters*.
Since the early 1980s mixed-effects dynamical models have had a deterministic flavour, i.e. they were based on ODEs. More recently support for SDEs has been introduced, thus allowing
the simultaneous representation of *within-subject* stochastic variability in addition to collective (*between-subjects*) variation. Together with Susanne Ditlevsen
(Copenhagen) and Andrea De Gaetano (Rome) I have considered likelihood-based inferential methods for mixed-effects models defined via SDEs; see a methodological 2010 paper and a more
computational paper from 2011. See also an application to neuronal models.
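
The two sources of variability can be made concrete with a short simulation. In the Python sketch below each subject receives its own decay rate drawn from a log-normal population distribution (between-subjects variation), while each path is driven by its own Brownian motion (within-subject variation). The decay SDE, the log-normal random effect, the initial value of 1 and the name `simulate_population` are assumptions made for illustration and are not taken from the papers.

```python
import math
import random


def simulate_population(n_subjects, log_rate_mean, log_rate_sd, sigma, dt, n_steps, rng):
    """Simulate one Euler-Maruyama path per subject of dX = -rate_i * X dt + sigma dW."""
    paths = []
    for _ in range(n_subjects):
        # Random effect: subject-specific decay rate, log-normal across the population.
        rate_i = math.exp(rng.gauss(log_rate_mean, log_rate_sd))
        x, path = 1.0, []
        for _ in range(n_steps):
            # Within-subject variability enters through the Brownian increment.
            x += -rate_i * x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            path.append(x)
        paths.append(path)
    return paths


rng = random.Random(5)
paths = simulate_population(10, math.log(0.5), 0.3, 0.1, 0.01, 500, rng)
print(len(paths), len(paths[0]))
```

In this toy setting the population parameters are `log_rate_mean`, `log_rate_sd` and `sigma`; inference would target these rather than each subject's individual rate.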

With Julie Lyng Forman I have considered a "likelihood-free" methodology named *synthetic likelihood* to estimate a model of tumor growth in mice.

**Relevant papers:** SJS 2010, CSDA 2011, NECO 2008, tumor 2016