Journal of Statistical Planning and Inference, Mar 1, 2019
We address the problem of parameter estimation for partially observed linear Ordinary Differential Equations. Estimation from time series with standard estimators can give misleading results because estimation is often ill-posed, or the models are misspecified. The addition of a forcing function u that represents uncertainties in the original ODE can overcome these problems, as shown in [Clairon and Brunel, 2017]. A general regularized estimation procedure is derived that corresponds to an Optimal Control Problem (OCP) solved by the Pontryagin Maximum Principle for nonlinear ODEs. Here, we focus on the linear case and solve the OCP with a computationally fast deterministic Kalman filter, which allows weakening of the conditions needed for √n-consistency. A significant improvement is the avoidance of the estimation of initial conditions thanks to a profiling step. Consequently, we can deal with more elaborate penalties and also provide a profiled semiparametric estimation procedure in the case of time-varying parameters. Simulations and real data examples show that our approach is generally more accurate and more reliable than reference methods when the Fisher information matrix is badly conditioned, with noticeable improvement in the case of model misspecification.
Journal of the American Statistical Association, Jun 5, 2018
Ordinary Differential Equations (ODEs) are routinely calibrated on real data for estimating unknown parameters or for reverse-engineering. Nevertheless, standard statistical techniques can give disappointing results because of the complex relationship between parameters and states, which makes the corresponding estimation problem ill-posed. Moreover, ODEs are mechanistic models that are prone to modeling errors, whose influence on inference is often neglected during statistical analysis. We propose a regularized estimation framework, called Tracking, that consists of adding a perturbation (an L2 function) to the original ODE. This perturbation facilitates data fitting and also represents possible model misspecifications, so that parameter estimation is done by solving a trade-off between data fidelity and model fidelity. We show that the underlying optimization problem is an optimal control problem that can be solved by the Pontryagin Maximum Principle for general nonlinear and partially observed ODEs. The same methodology can be used for the joint estimation of finite and time-varying parameters. We show, in the case of a well-specified parametric model, that our estimator is consistent and reaches the root-n rate. In addition, numerical experiments considering various sources of model misspecification show that Tracking still furnishes accurate estimates. Finally, we consider semiparametric estimation on both simulated data and a real data example. Supplementary Materials.
Journal of the American Statistical Association, Jan 2, 2014
Differential equations are commonly used to model dynamical deterministic systems in applications. When statistical parameter estimation is required to calibrate theoretical models to data, classical statistical estimators are often confronted with complex and potentially ill-posed optimization problems. As a consequence, alternatives to classical parametric estimators are needed for obtaining reliable estimates. We propose a gradient matching approach for the estimation of parametric Ordinary Differential Equations observed with noise. Starting from a nonparametric proxy of a true solution of the ODE, we build a parametric estimator based on a variational characterization of the solution. As a Generalized Moment Estimator, our estimator must satisfy a set of orthogonal conditions that are solved in the least squares sense. Despite the use of a nonparametric estimator, we prove the root-n consistency and asymptotic normality of the Orthogonal Conditions (OC) estimator. We can derive confidence sets thanks to a closed-form expression for the asymptotic variance. Finally, the OC estimator is compared to classical estimators in several (simulated and real) experiments and ODE models in order to show its versatility and relevance with respect to classical Gradient Matching and Nonlinear Least Squares estimators. In particular, we show on a real dataset of influenza infection that the approach gives reliable estimates. Moreover, we show that our approach can deal directly with more elaborate models such as Delay Differential Equations (DDEs).
HAL (Le Centre pour la Communication Scientifique Directe), Oct 23, 2014
We recall some notations introduced in the core paper. The solution of the ODE $\dot{x}(t) = A_\theta(t)x(t) + r_\theta(t)$ (1.1) [...] $E_\theta(T) = -Q_1$ (1.5). Using linear quadratic theory, we know that the matrix (1.4) is associated with the minimal cost $C(u, z_0, \theta, \lambda) = z_0^T Q_1 z_0 + \int_0^T Z_{z_0,u}(t)^T W_1(t) Z_{z_0,u}(t)\,dt + \lambda \int_0^T \|u(t)\|_2^2\,dt$ (1.6) and the ODE $\dot{Z}_{z_0,u} = A_1(t) Z_{z_0,u} + B_1 u(t)$, $Z_{z_0,u}(0) = z_0$.
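To make the cost (1.6) concrete, the following is a minimal numerical sketch that evaluates it for a given control u by brute-force discretization. It is an illustration only: the function name `lq_cost`, the explicit Euler scheme and the left Riemann sum are assumptions made here, and this is not the Riccati/Kalman-based solution used in the paper.

```python
import numpy as np

def lq_cost(A1, B1, W1, Q1, u, z0, lam, T, n_steps=1000):
    """Approximate C(u, z0, theta, lambda) from (1.6) on a uniform time grid.

    A1, W1 : callables t -> matrix (hypothetical interface)
    B1, Q1 : constant matrices
    u      : callable t -> control vector
    """
    dt = T / n_steps
    z = np.asarray(z0, dtype=float)
    # Quadratic term in the initial state: z0^T Q1 z0.
    cost = float(z @ Q1 @ z)
    for i in range(n_steps):
        t = i * dt
        ut = np.atleast_1d(np.asarray(u(t), dtype=float))
        # Running cost: Z^T W1 Z + lambda * ||u||_2^2 (left Riemann sum).
        cost += float(z @ W1(t) @ z + lam * (ut @ ut)) * dt
        # Explicit Euler step for dZ/dt = A1(t) Z + B1 u(t).
        z = z + dt * (A1(t) @ z + B1 @ ut)
    return cost
```

With A1 = 0 and u = 0 the state stays at z0, so the cost reduces to z0^T Q1 z0 plus T times z0^T W1 z0, which gives a quick sanity check of the discretization.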
Motivation: Statistical inference of biological networks such as gene regulatory networks, signaling pathways and metabolic networks can help build a picture of the complex interactions that take place in the cell. However, biological systems considered as dynamical, non-linear and generally partially observed processes may be difficult to estimate even if the structure of interactions is given. Results: Using the same approach as Sitz et al. proposed in another context, we derive non-linear state-space models from ODEs describing biological networks. In this framework, we apply Unscented Kalman Filtering (UKF) to the estimation of both parameters and hidden variables of non-linear state-space models. We instantiate the method on a transcriptional regulatory model based on Hill kinetics and a signaling pathway model based on mass action kinetics. We successfully use synthetic data and experimental data to test our approach. Conclusion: This approach covers a large set of biological network models and gives rise to simple and fast estimation algorithms. Moreover, the Bayesian tool used here directly provides uncertainty estimates on parameters and hidden states. Let us also emphasize that it can be coupled with structure inference methods used in Graphical Probabilistic Models.
Shapley Values (SV) are widely used in explainable AI, but their estimation and interpretation can be challenging, leading to inaccurate inferences and explanations. As a starting point, we recall an invariance principle for SV and derive the correct approach for computing the SV of categorical variables, which are particularly sensitive to the encoding used. In the case of tree-based models, we introduce two estimators of Shapley Values that exploit the tree structure efficiently and are more accurate than state-of-the-art methods. Simulations and comparisons are performed with state-of-the-art algorithms and show the practical gain of our approach. Finally, we discuss the limitations of Shapley Values as a local explanation. These methods are available as a Python package.
The analysis of curves has been routinely dealt with using tools from functional data analysis. However, its extension to multi-dimensional curves poses a new challenge due to inherent geometric features that are difficult to capture with the classical approaches that rely on linear approximations. We develop an alternative characterization of a mean that reflects shape variation of the curves. Based on a geometric representation of the curves through the Frenet-Serret ordinary differential equations, we introduce a new definition of mean curvature and mean torsion, as well as mean shape through the notion of mean vector field. This new formulation of the mean for multi-dimensional curves allows us to integrate the parameters for the shape features into the unified functional data modelling framework. We formulate the estimation problem of the functional parameters as a penalized regression and develop an efficient algorithm. We demonstrate our approach with both simulated data and real data examples.
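For reference, the classical Frenet-Serret equations for a regular curve in 3D parametrized by arc length s, with tangent T, normal N, binormal B, curvature κ and torsion τ, are given below. This is the standard textbook notation, which is not necessarily the notation used in the paper.

```latex
\begin{aligned}
T'(s) &= \kappa(s)\,N(s),\\
N'(s) &= -\kappa(s)\,T(s) + \tau(s)\,B(s),\\
B'(s) &= -\tau(s)\,N(s).
\end{aligned}
```

The pair (κ, τ) determines the shape of the curve up to a rigid motion, which is why curvature and torsion are natural functional parameters for describing shape.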
Although Shapley Values (SV) are widely used in explainable AI, they can be poorly understood and estimated, which implies that their analysis may lead to spurious inferences and explanations. As a starting point, we recall an invariance principle for SV and derive the correct approach for computing the SV of categorical variables, which are particularly sensitive to the encoding used. In the case of tree-based models, we introduce two estimators of Shapley Values that exploit the tree structure efficiently and are more accurate than state-of-the-art methods. For interpreting additive explanations, we recommend filtering out the non-influential variables and computing the Shapley Values only for groups of influential variables. For this purpose, we use the concept of "Same Decision Probability" (SDP), which evaluates the robustness of a prediction when some variables are missing. This prior selection procedure produces sparse additive explanations that are easier to visualize and analyse....
Ordinary Differential Equations are widespread tools for modeling chemical, physical and biological processes, but they usually rely on parameters that are of critical importance for the dynamics and need to be estimated directly from the data. Classical statistical approaches (nonlinear least squares, maximum likelihood estimator) can give unsatisfactory results because of computational difficulties and the ill-posedness of the statistical problem. New estimation methods that use some nonparametric devices have been proposed to circumvent these issues. We present a new estimator that shares properties with Two-Step estimators and Generalized Smoothing (introduced by Ramsay et al. [34]). We introduce a perturbed model and we use optimal control theory for constructing a criterion that aims at minimizing the discrepancy with the data and with the model. Here, we focus on the case of linear Ordinary Differential Equations, as our criterion has a closed-form expression that permits a detailed analysis. Our approach avoids the use of a nonparametric estimator of the derivative, which is one of the main causes of inaccuracy in Two-Step estimators. Moreover, we take into account model discrepancy, and our estimator is more robust to model misspecification than classical methods. The discrepancy with the parametric ODE model corresponds to the minimum perturbation (or control) to apply to the initial model. Its qualitative analysis can be informative for misspecification diagnosis. In the case of a well-specified model, we show the consistency of our estimator and that we reach the parametric √n-rate when regression splines are used in the first step.
The key to tip growth in eukaryotes is the polarized distribution on the plasma membrane of a particle named ROP1. This distribution is the result of a positive feedback loop, whose mechanism can be described by a Differential Equation parametrized by two meaningful parameters, kpf and knf. We introduce a mechanistic Integro-Differential Equation (IDE) derived from a spatiotemporal model of cell polarity and we show how this model can be fitted to real data, i.e., ROP1 intensities measured on pollen tubes. First, we provide an existence and uniqueness result for the solution of our IDE model under certain conditions. Interestingly, this analysis gives a tractable expression for the likelihood, and our approach can be seen as the estimation of a constrained nonlinear model. Moreover, we introduce population variability through a constrained nonlinear mixed model. We then propose a constrained Least Squares method to fit the model for the single pollen tube case, and two methods, constrai...
Electronic Journal of Applied Statistical Analysis, 2018
In the same way that most robots and advanced mobile machines are designed to optimize their energy consumption or the smoothness of their motions, it has been demonstrated that competitive runners tend to exhibit smoother strides than recreational runners during running and fast walking. Here, we describe the statistical mechanics of humans trying to self-pace a constant acceleration, by studying the statistical properties of the accelerations of the runner's center of mass. Furthermore, it has been checked that this can be achieved even in a state of fatigue during 3 exhaustive self-paced ramp runs. For that purpose, we analyse a small sample of 3 male and 2 female middle-aged recreational runners who ran, in random order, three exhaustive self-paced acceleration trials (SAT) perceived to be "soft", "medium" or "hard". A statistical analysis shows that humans are able to self-pace a constant acceleration in some exhaustive runs, by continuo...
We use a mixture of percentile functions to model credit spread evolution, which yields a flexible description of credit indices and their components at the same time. We show regularity results in order to extend the mixture of percentile functions to the dynamic case. We characterise the stochastic differential equation of the flow of the cumulative distribution function and we link it with the ordered list of the components of the credit index. The main application is to introduce a functional version of Bollinger bands. The crossing of the bands by the spread is associated with a trading signal. Finally, we show the richness of the signals produced by functional Bollinger bands compared with standard ones on a practical example.
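As a point of comparison for the functional version mentioned above, here is a minimal sketch of standard (scalar) Bollinger bands and the associated band-crossing signal. The function names, the default window of 20, the width k = 2 and the use of the population standard deviation are assumptions made for illustration; this is not the functional construction of the paper.

```python
import numpy as np

def bollinger_bands(series, window=20, k=2.0):
    """Return (lower, middle, upper) bands, one value per full rolling window."""
    x = np.asarray(series, dtype=float)
    if window < 1 or window > x.size:
        raise ValueError("window must be between 1 and len(series)")
    # Each row of `windows` is one rolling window of length `window`.
    windows = np.lib.stride_tricks.sliding_window_view(x, window)
    middle = windows.mean(axis=1)
    spread = windows.std(axis=1)  # population standard deviation
    return middle - k * spread, middle, middle + k * spread

def crossing_signal(series, window=20, k=2.0):
    """+1 where the value is above the upper band, -1 below the lower band, else 0."""
    x = np.asarray(series, dtype=float)
    lower, _, upper = bollinger_bands(x, window, k)
    last = x[window - 1:]  # value at the end of each rolling window
    return np.where(last > upper, 1, np.where(last < lower, -1, 0))
```

A flat series never crosses its bands, while a sudden jump at the end of a flat stretch lands above the upper band and produces a +1 signal.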
Physica A: Statistical Mechanics and its Applications, 2018
Although it has been experimentally reported that varying speed is the optimal way for a runner to pace a given distance in minimal time, we still do not know what the optimal speed variations (i.e., accelerations) are. First, we have to check the hypothesis that humans are able to accurately self-pace their acceleration, even in a state of fatigue during exhaustive self-paced ramp runs. For that purpose, 3 male and 2 female middle-aged recreational runners ran, in random order, three exhaustive acceleration trials. We instructed the five runners to perform three self-paced acceleration trials based on three acceleration intensity levels: "soft", "medium" and "hard". We chose a descriptive modeling approach to analyse the behaviour of the runners. Once we knew that
Ordinary Differential Equations are widespread tools for modeling chemical, physical and biological processes, but they usually rely on parameters that are of critical importance for the dynamics and need to be estimated directly from the data. Classical statistical approaches (nonlinear least squares, maximum likelihood estimator) can give unsatisfactory results because of computational difficulties and the ill-posedness of the statistical problem. New estimation methods that use some nonparametric devices have been proposed to circumvent these issues. We present a new estimator that shares properties with Two-Step estimators and Generalized Smoothing (introduced by Ramsay et al. [37]). Our estimation method relies on a relaxation and penalization scheme to regularize the inverse problem. We introduce a perturbed model and we use optimal control theory for constructing a criterion that aims at minimizing the discrepancy between the data and the original model. Here, we focus on the case of linear Ordinary Differential Equations, as our criterion has a closed-form expression that permits a detailed analysis. Our approach avoids the use of a nonparametric estimator of the derivative, which is one of the main causes of inaccuracy in Two-Step estimators. Regarding the theoretical asymptotic behavior of our estimator, we show its consistency and that we reach the parametric √n-rate when regression splines are used in the first step. We consider the estimation of two models possessing sloppy parameters, which usually make the estimation of ODE models an ill-posed problem in applications [20, 41], and show the efficiency of the Tracking estimator. Quite interestingly, our relaxation scheme makes the estimator robust to some kinds of model misspecification, as shown in simulations.
To explain the decision of any regression or classification model, we extend the notion of probabilistic sufficient explanations (P-SE). For each instance, this approach selects the minimal subset of features that is sufficient to yield the same prediction with high probability, while removing other features. The crux of P-SE is to compute the conditional probability of maintaining the same prediction. Therefore, we introduce an accurate and fast estimator of this probability via Random Forests for any data (X, Y) and show its efficiency through a theoretical analysis of its consistency. As a consequence, we extend the P-SE to regression problems. In addition, we deal with non-discrete features, without learning the distribution of X or having access to the model for making predictions. Finally, we introduce local rule-based explanations for regression/classification based on the P-SE and compare our approaches with other explainable AI methods. These methods are available as a Python package.
Variations of curves and trajectories in 1D can be analysed efficiently with functional data analysis tools. The main sources of variation in 1D curves have been identified as amplitude and phase variations. Dealing with the latter gives rise to curve alignment and registration problems. It has been recognised that it is important to incorporate geometric features of the curves in developing statistical approaches to address such problems. Extending these techniques to multidimensional curves is not obvious, as the notion of multidimensional amplitude can be defined in multiple ways. We propose a framework to deal with curve alignment for multidimensional curves as 3D objects. In particular, we propose a new distance between curves that utilises their geometric information through the Frenet-Serret representation. This can be viewed as a generalisation of elastic shape analysis based on the square root velocity framework. We develop an efficient computational algorithm to find an optimal alignment based on the proposed distance using dynamic programming.
While Shapley Values (SV) are one of the gold standards for interpreting machine learning models, we show that they are still poorly understood, in particular in the presence of categorical variables or of variables of low importance. For instance, we show that the popular practice of summing the SV of dummy variables is incorrect, as it provides wrong estimates of all the SV in the model and implies spurious interpretations. Based on the identification of null and active coalitions, and a coalitional version of the SV, we provide a correct computation and inference of important variables. Moreover, a Python library that reliably computes conditional expectations and SV for tree-based models is implemented and compared with state-of-the-art algorithms on toy models and real data sets.
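The gap between summing the SV of dummy variables and treating the categorical variable as a single player can be seen on a toy cooperative game. The sketch below computes exact Shapley values by enumerating coalitions; the game, the player names ("x", "d1", "d2", "cat") and the helper functions are hypothetical and chosen only for illustration, and this is not the tree-based estimator described above.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by enumeration over all coalitions (small games only)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

# Toy game: the payoff is 1 only when x and both dummies d1, d2 are known.
def value_dummies(coalition):
    return 1.0 if coalition >= {"x", "d1", "d2"} else 0.0

# Same game with the categorical variable "cat" treated as one player.
def value_grouped(coalition):
    expanded = set()
    for player in coalition:
        expanded |= {"d1", "d2"} if player == "cat" else {player}
    return value_dummies(frozenset(expanded))

phi_dummies = shapley_values(["x", "d1", "d2"], value_dummies)
phi_grouped = shapley_values(["x", "cat"], value_grouped)
sum_of_dummies = phi_dummies["d1"] + phi_dummies["d2"]
```

In this toy game the summed dummy values give 2/3 for the categorical variable, whereas treating it as one player gives 1/2, and the value attributed to x changes as well (1/3 versus 1/2).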
We propose a multivariate signal model with temporal and spectral dependence, well suited to the modeling of Radar signals. The proposed model is a Hidden Markov Chain for which observations are Spherically Invariant Random Vectors (SIRV), and temporal correlation is described by a copula. It is still possible to estimate the parameters of the SIRV, and we study the robustness of the estimation under different kinds of copulas and various strengths of dependence. Finally, we explore the influence of an omitted dependence on the statistical segmentation of Radar signals based on Hidden Markov Chains.
Variations of curves and trajectories in 1D can be analysed efficiently with functional data analysis tools. In particular, the main sources of variation in 1D curves have been identified as amplitude and phase variations. Dealing with the latter gives rise to curve alignment and registration problems. It has been recognised that it is important to incorporate geometric features of the curves in developing statistical approaches to address such problems. Extending these techniques to multidimensional curves is not obvious, as the notion of multidimensional amplitude can be defined in multiple ways. We propose a framework to deal with curve alignment for multidimensional curves as 3D objects. We propose a new distance between curves that utilises their geometric information through the Frenet-Serret representation. This can be viewed as a generalisation of elastic shape analysis based on the square root velocity framework. We develop an efficient computational algorithm to find an optimal alignment based on the proposed distance using dynamic programming.
Journal of Statistical Planning and Inference, Mar 1, 2019
We address the problem of parameter estimation for partially observed linear Ordinary Differentia... more We address the problem of parameter estimation for partially observed linear Ordinary Differential Equations. Estimation from time series with standard estimators can give misleading results because estimation is often ill-posed, or the models are misspecified. The addition of a forcing function u, that represents uncertainties in the original ODE, can overcome these problems as shown in [Clairon and Brunel, 2017]. A general regularized estimation procedure is derived, that corresponds to an Optimal Control Problem (OCP) solved by the Pontryagin Maximum Principle for nonlinear ODEs. Here, we focus on the linear case and solve the OCP with a computationally fast deterministic Kalman filter which allows weakening of conditions needed for √ n−consistency. A significant improvement is the avoidance of the estimation of initial conditions thanks to a profiling step. Consequently, we can deal with more elaborated penalties and also provide a profiled semiparametric estimation procedure in the case of timevarying parameters. Simulations and real data examples show that our approach is generally more accurate and more reliable than reference methods when the Fisher information matrix is badly-conditioned, with noticeable improvement in the case of model misspecification.
Journal of the American Statistical Association, Jun 5, 2018
Ordinary Differential Equations (ODE) are routinely calibrated on real data for estimating unknow... more Ordinary Differential Equations (ODE) are routinely calibrated on real data for estimating unknown parameters or for reverse-engineering. Nevertheless, standard statistical techniques can give disappointing results because of the complex relationship between parameters and states, that makes the corresponding estimation problem ill-posed. Moreover, ODE are mechanistic models that are prone to modeling errors, whose influences on inference are often neglected during statistical analysis. We propose a regularized estimation framework, called Tracking, that consists in adding a perturbation (L 2 function) to the original ODE. This perturbation facilitates data fitting and represents also possible model misspecifications, so that parameter estimation is done by solving a trade-off between data fidelity and model fidelity. We show that the underlying optimization problem is an optimal control problem, that can be solved by the Pontryagin Maximum Principle for general nonlinear and partially observed ODE. The same methodology can be used for the joint estimation of finite and time-varying parameters. We show, in the case of a well-specified parametric model, that our estimator is consistent and reaches the root-n rate. In addition, numerical experiments considering various sources of model misspecifications shows that Tracking still furnishes accurate estimates. Finally, we consider semiparametric estimation on both simulated data and on a real data example. Supplementary Materials.
Journal of the American Statistical Association, Jan 2, 2014
Dierential equations are commonly used to model dynamical deterministic systems in applications. ... more Dierential equations are commonly used to model dynamical deterministic systems in applications. When statistical parameter estimation is required to calibrate theoretical models to data, classical statistical estimators are often confronted to complex and potentially ill-posed optimization problem. As a consequence, alternative estimators to classical parametric estimators are needed for obtaining reliable estimates. We propose a gradient matching approach for the estimation of parametric Ordinary Dierential Equations observed with noise. Starting from a nonparametric proxy of a true solution of the ODE, we build a parametric estimator based on a variational characterization of the solution. As a Generalized Moment Estimator, our estimator must satisfy a set of orthogonal conditions that are solved in the least squares sense. Despite the use of a nonparametric estimator, we prove the root-n consistency and asymptotic normality of the Orthogonal Conditions estimator. We can derive condence sets thanks to a closed-form expression for the asymptotic variance. Finally, the OC estimator is compared to classical estimators in several (simulated and real) experiments and ODE models in order to show its versatility and relevance with respect to classical Gradient Matching and Nonlinear Least Squares estimators. In particular, we show on a real dataset of inuenza infection that the approach gives reliable estimates. Moreover, we show that our approach can deal directly with more elaborated models such as Delay Dierential Equation (DDE).
HAL (Le Centre pour la Communication Scientifique Directe), Oct 23, 2014
We recall some notations introduced in the core paper. The solution of the ODĖ x(t) = A θ (t)x(t)... more We recall some notations introduced in the core paper. The solution of the ODĖ x(t) = A θ (t)x(t) + r θ (t) (1.1) θ (t) E g θ (T) = −Q 1 (1.5) Using linear quadratic theory, we know that the matrix (1.4) is associated to the minimal cost C(u, z 0 , θ, λ) = z T 0 Q 1 z 0 +ˆT 0 Z z0,u (t) T W 1 (t)Z z0,u (t)dt + λˆT 0 u(t) 2 2 dt (1.6) and the ODE:Ż z0,u = A 1 (t)Z z0,u + B 1 u(t) Z z0,u (0) = z 0
Motivation: Statistical inference of biological networks such as gene regulatory networks, signal... more Motivation: Statistical inference of biological networks such as gene regulatory networks, signaling pathways and metabolic networks can contribute to build a picture of complex interactions that take place in the cell. However, biological systems considered as dynamical, non-linear and generally partially observed processes may be difficult to estimate even if the structure of interactions is given. Results: Using the same approach as Sitz et al. proposed in another context, we derive non-linear state-space models from ODEs describing biological networks. In this framework, we apply Unscented Kalman Filtering (UKF) to the estimation of both parameters and hidden variables of non-linear state-space models. We instantiate the method on a transcriptional regulatory model based on Hill kinetics and a signaling pathway model based on mass action kinetics. We successfully use synthetic data and experimental data to test our approach. Conclusion: This approach covers a large set of biological networks models and gives rise to simple and fast estimation algorithms. Moreover, the Bayesian tool used here directly provides uncertainty estimates on parameters and hidden states. Let us also emphasize that it can be coupled with structure inference methods used in Graphical Probabilistic Models.
Shapley Values (SV) are widely used in explainable AI, but their estimation and interpretation ca... more Shapley Values (SV) are widely used in explainable AI, but their estimation and interpretation can be challenging, leading to inaccurate inferences and explanations. As a starting point, we remind an invariance principle for SV and derive the correct approach for computing the SV of categorical variables that are particularly sensitive to the encoding used. In the case of tree-based models, we introduce two estimators of Shapley Values that exploit the tree structure efficiently and are more accurate than state-of-the-art methods. Simulations and comparisons are performed with state-of-the-art algorithms and show the practical gain of our approach. Finally, we discuss the limitations of Shapley Values as a local explanation. These methods are available as a Python package.
The analysis of curves has been routinely dealt with using tools from functional data analysis. H... more The analysis of curves has been routinely dealt with using tools from functional data analysis. However its extension to multi-dimensional curves poses a new challenge due to its inherent geometric features that are difficult to capture with the classical approaches that rely on linear approximations. We develop an alternative characterization of a mean that reflects shape variation of the curves. Based on a geometric representation of the curves through the Frenet-Serret ordinary differential equations, we introduce a new definition of mean curvature and mean torsion, as well as mean shape through the notion of mean vector field. This new formulation of the mean for multi-dimensional curves allows us to integrate the parameters for the shape features into the unified functional data modelling framework. We formulate the estimation problem of the functional parameters in a penalized regression and develop an efficient algorithm. We demonstrate our approach with both simulated data and real data examples.
Although Shapley Values (SV) are widely used in explainable AI, they can be poorly understood and... more Although Shapley Values (SV) are widely used in explainable AI, they can be poorly understood and estimated, which implies that their analysis may lead to spurious inferences and explanations. As a starting point, we remind an invariance principle for SV and derive the correct approach for computing the SV of categorical variables that are particularly sensitive to the encoding used. In the case of tree-based models, we introduce two estimators of Shapley Values that exploit efficiently the tree structure and are more accurate than state-of-the-art methods. For interpreting additive explanations, we recommend to filter the non-influential variables and to compute the Shapley Values only for groups of influential variables. For this purpose, we use the concept of "Same Decision Probability" (SDP) that evaluates the robustness of a prediction when some variables are missing. This prior selection procedure produces sparse additive explanations easier to visualize and analyse....
Ordinary Differential Equations are widespread tools to model chemical, physical, and biological processes, but they usually rely on parameters which are of critical importance in terms of dynamics and need to be estimated directly from the data. Classical statistical approaches (nonlinear least squares, maximum likelihood estimator) can give unsatisfactory results because of computational difficulties and the ill-posedness of the statistical problem. New estimation methods that use some nonparametric devices have been proposed to circumvent these issues. We present a new estimator that shares properties with Two-Step estimators and Generalized Smoothing (introduced by Ramsay et al. [34]). We introduce a perturbed model and we use optimal control theory for constructing a criterion that aims at minimizing the discrepancy with data and the model. Here, we focus on the case of linear Ordinary Differential Equations, as our criterion has a closed-form expression that permits a detailed analysis. Our approach avoids the use of a nonparametric estimator of the derivative, which is one of the main causes of inaccuracy in Two-Step estimators. Moreover, we take into account model discrepancy, and our estimator is more robust to model misspecification than classical methods. The discrepancy with the parametric ODE model corresponds to the minimum perturbation (or control) to apply to the initial model. Its qualitative analysis can be informative for misspecification diagnosis. In the case of a well-specified model, we show the consistency of our estimator and that we reach the parametric √n-rate when regression splines are used in the first step.
The key to tip growth in eukaryotes is the polarized distribution on the plasma membrane of a particle named ROP1. This distribution is the result of a positive feedback loop, whose mechanism can be described by a Differential Equation parametrized by two meaningful parameters, kpf and knf. We introduce a mechanistic Integro-Differential Equation (IDE) derived from a spatiotemporal model of cell polarity and we show how this model can be fitted to real data, i.e., ROP1 intensities measured on pollen tubes. First, we provide an existence and uniqueness result for the solution of our IDE model under certain conditions. Interestingly, this analysis gives a tractable expression for the likelihood, and our approach can be seen as the estimation of a constrained nonlinear model. Moreover, we introduce population variability through a constrained nonlinear mixed model. We then propose a constrained Least Squares method to fit the model for the single pollen tube case, and two methods, constrai...
Electronic Journal of Applied Statistical Analysis, 2018
In the same way that most robots and advanced mobile machines are designed to optimize their energy consumption or the smoothness of their motions, it has been demonstrated that competitive runners tend to exhibit smoother strides than recreational runners during running and fast walking. Here, we describe the statistical mechanics of humans trying to self-pace a constant acceleration, by studying the statistical properties of the accelerations of the runner's center of mass. Furthermore, it has been checked that this can be achieved even in a state of fatigue during 3 exhaustive self-paced ramp runs. For that purpose, we analyse a small sample of 3 male and 2 female middle-aged recreational runners who ran, in random order, three exhaustive self-paced acceleration trials (SAT) perceived to be "soft", "medium" or "hard". A statistical analysis shows that humans are able to self-pace constant acceleration in some exhaustive runs, by continuo...
We use a mixture of percentile functions to model credit spread evolution, which yields a flexible description of credit indices and their components at the same time. We show regularity results in order to extend the mixture of percentile functions to the dynamic case. We characterise the stochastic differential equation of the flow of the cumulative distribution function and we link it with the ordered list of the components of the credit index. The main application is to introduce a functional version of Bollinger bands. The crossing of bands by the spread is associated with a trading signal. Finally, we show the richness of the signals produced by functional Bollinger bands compared with the standard ones in a practical example.
Physica A: Statistical Mechanics and its Applications, 2018
Although it has been experimentally reported that varying speed is the optimal way for a runner to pace a given distance in minimal time, we still do not know what the optimal speed variations (i.e., accelerations) are. First, we have to check the hypothesis that humans are able to accurately self-pace their acceleration, even in a state of fatigue during exhaustive self-paced ramp runs. For that purpose, 3 male and 2 female middle-aged recreational runners ran, in random order, three exhaustive acceleration trials. We instructed the five runners to perform three self-paced acceleration trials based on three acceleration intensity levels: "soft", "medium" and "hard". We chose a descriptive modeling approach to analyse the behaviour of the runners. Once we knew that
Ordinary Differential Equations are widespread tools to model chemical, physical, and biological processes, but they usually rely on parameters which are of critical importance in terms of dynamics and need to be estimated directly from the data. Classical statistical approaches (nonlinear least squares, maximum likelihood estimator) can give unsatisfactory results because of computational difficulties and the ill-posedness of the statistical problem. New estimation methods that use some nonparametric devices have been proposed to circumvent these issues. We present a new estimator that shares properties with Two-Step estimators and Generalized Smoothing (introduced by Ramsay et al. [37]). Our estimation method relies on a relaxation and penalization scheme to regularize the inverse problem. We introduce a perturbed model and we use optimal control theory for constructing a criterion that aims at minimizing the discrepancy between data and the original model. Here, we focus on the case of linear Ordinary Differential Equations, as our criterion has a closed-form expression that permits a detailed analysis. Our approach avoids the use of a nonparametric estimator of the derivative, which is one of the main causes of inaccuracy in Two-Step estimators. Regarding the theoretical asymptotic behavior of our estimator, we show its consistency and that we reach the parametric √n-rate when regression splines are used in the first step. We consider the estimation of two models possessing sloppy parameters, which usually make the estimation of ODE models an ill-posed problem in applications [20, 41], and show the efficiency of the Tracking estimator. Quite interestingly, our relaxation scheme makes the estimator robust to some kinds of model misspecification, as shown in simulations.
To explain the decision of any regression and classification model, we extend the notion of probabilistic sufficient explanations (P-SE). For each instance, this approach selects the minimal subset of features that is sufficient to yield the same prediction with high probability, while removing other features. The crux of P-SE is to compute the conditional probability of maintaining the same prediction. Therefore, we introduce an accurate and fast estimator of this probability via Random Forests for any data (X, Y) and show its efficiency through a theoretical analysis of its consistency. As a consequence, we extend the P-SE to regression problems. In addition, we deal with non-discrete features, without learning the distribution of X or having the model for making predictions. Finally, we introduce local rule-based explanations for regression/classification based on the P-SE and compare our approaches with other explainable AI methods. These methods are available as a Python package.
Variations of curves and trajectories in 1D can be analysed efficiently with functional data analysis tools. The main sources of variation in 1D curves have been identified as amplitude and phase variations. Dealing with the latter gives rise to the problems of curve alignment and registration. It has been recognised that it is important to incorporate geometric features of the curves in developing statistical approaches to address such problems. Extending these techniques to multidimensional curves is not obvious, as the notion of multidimensional amplitude can be defined in multiple ways. We propose a framework to deal with curve alignment for multidimensional curves viewed as 3D objects. In particular, we propose a new distance between curves that utilises their geometric information through the Frenet-Serret representation. This can be viewed as a generalisation of the elastic shape analysis based on the square root velocity framework. We develop an efficient computational algorithm to find an optimal alignment based on the proposed distance using dynamic programming.
While Shapley Values (SV) are one of the gold standards for interpreting machine learning models, we show that they are still poorly understood, in particular in the presence of categorical variables or of variables of low importance. For instance, we show that the popular practice of summing the SV of dummy variables is incorrect, as it provides wrong estimates of all the SV in the model and implies spurious interpretations. Based on the identification of null and active coalitions, and a coalitional version of the SV, we provide a correct computation and inference of important variables. Moreover, a Python library that reliably computes conditional expectations and SV for tree-based models is implemented and compared with state-of-the-art algorithms on toy models and real data sets.
We propose a multivariate signal model with temporal and spectral dependence, well suited to the modeling of Radar signals. The proposed model is a Hidden Markov Chain for which observations are Spherically Invariant Random Vectors (SIRV), and temporal correlation is described by a copula. It is still possible to estimate the parameters of the SIRV, and we study the robustness of the estimation under different kinds of copulas and various strengths of dependence. Finally, we explore the influence of an omitted dependence on the statistical segmentation of Radar signals based on Hidden Markov Chains.
Variations of curves and trajectories in 1D can be analysed efficiently with functional data analysis tools. In particular, the main sources of variation in 1D curves have been identified as amplitude and phase variations. Dealing with the latter gives rise to the problems of curve alignment and registration. It has been recognised that it is important to incorporate geometric features of the curves in developing statistical approaches to address such problems. Extending these techniques to multidimensional curves is not obvious, as the notion of multidimensional amplitude can be defined in multiple ways. We propose a framework to deal with curve alignment for multidimensional curves viewed as 3D objects. We propose a new distance between curves that utilises their geometric information through the Frenet-Serret representation. This can be viewed as a generalisation of the elastic shape analysis based on the square root velocity framework. We develop an efficient computational algorithm to find an optimal alignment based on the proposed distance using dynamic programming.
Papers by Nicolas Brunel