Papers by Maurizio Carpita
Stochastic environmental research and risk assessment, Jun 25, 2024
Time series of traffic flows, extracted from mobile phone origin-destination data, are employed for monitoring people crowding and mobility in areas subject to flooding risk. By applying a vector autoregressive model with exogenous covariates combined with dynamic harmonic regression to such time series, we detected the presence of many extreme events in the residuals, which exhibit a heavy-tailed distribution. For this reason, we propose a time series clustering procedure based on tail dependence which is suitable for data characterized by a spatial dimension, since the geographical proximity of the objects is taken into account. The final aim is to obtain clusters of areas that share a common tendency to experience extreme events, which in this case study are extremely high incoming traffic flows. The proposed method is applied to the Mandolossa, a strongly urbanized area located on the western outskirts of Brescia (northern Italy) which is subject to frequent flooding.
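As a rough illustration of what "clustering based on tail dependence" relies on, the sketch below computes an empirical upper tail dependence coefficient between two residual series from their rank-based pseudo-observations. The abstract does not state which estimator or dissimilarity the authors use, so the function names, the threshold `q = 0.9`, and the dissimilarity `1 - lambda` are assumptions made here for illustration only.

```python
def pseudo_observations(xs):
    """Map a series to rank-based pseudo-observations in (0, 1)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    ranks = [0] * n
    for position, index in enumerate(order):
        ranks[index] = position + 1
    return [r / (n + 1) for r in ranks]


def upper_tail_dependence(x, y, q=0.9):
    """Empirical estimate of P(V > q | U > q) for two equally long series.

    Hypothetical helper: a simple nonparametric estimator, not necessarily
    the one used in the paper.
    """
    u = pseudo_observations(x)
    v = pseudo_observations(y)
    marginal = sum(1 for a in u if a > q)
    joint = sum(1 for a, b in zip(u, v) if a > q and b > q)
    return joint / marginal if marginal else 0.0


def tail_dissimilarity(x, y, q=0.9):
    """Assumed dissimilarity: small when the two series share extremes."""
    return 1.0 - upper_tail_dependence(x, y, q)
```

A matrix of such pairwise dissimilarities between areas could then be passed to any standard clustering routine; accounting for geographical proximity, as the abstract describes, would require an additional spatial term that is not sketched here.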
Springer proceedings in mathematics & statistics, 2022
In 2015, the Province of Trento launched a multi-annual programme to assess the competence in English and German of the students of the provincial schools, entrusting the implementation to IPRASE, the Provincial Institute for Research and Educational Experimentation. This article presents the main results of the TLT survey – Trentino Language Testing 2018. After a first part dedicated to the characteristics of the tests, the design modalities and the sampling plans are introduced. The third part of the paper focuses on the psychometric analysis and on the presentation of standardized test results obtained by Rasch model estimation.
Computational Statistics, Oct 20, 2022
Advances in data analysis and classification, Feb 17, 2016
We extend the simple linear measurement error model through the inclusion of a composite indicator by using the generalized maximum entropy estimator. A Monte Carlo simulation study is proposed for comparing the performance of the proposed estimator to that of its counterpart, the ordinary least squares estimator “adjusted for attenuation”. The two estimators are compared in terms of correlation with the true latent variable, standard error and root mean squared error. Two illustrative case studies are reported in order to discuss the results obtained on real data sets and relate them to the conclusions drawn from the simulation study.
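To make the comparator concrete, here is a minimal simulation sketch of ordinary least squares "adjusted for attenuation" in a simple linear measurement error model. It assumes the measurement error variance is known, uses standard normal latent scores, and picks a true slope of 2; all of these choices, and the function names, are illustrative assumptions rather than the settings of the paper's Monte Carlo study. The generalized maximum entropy estimator itself is not sketched.

```python
import random


def ols_slope(x, y):
    """Return the OLS slope of y on x and the sample variance of x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx, sxx / n


def simulate(n=50000, beta=2.0, error_var=1.0, seed=0):
    """Simulate y = beta * latent + noise, observing x = latent + error."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [value + rng.gauss(0.0, error_var ** 0.5) for value in latent]
    y = [beta * value + rng.gauss(0.0, 1.0) for value in latent]
    naive, var_x = ols_slope(x, y)
    # Reliability ratio: share of the observed variance due to the latent variable.
    reliability = (var_x - error_var) / var_x
    adjusted = naive / reliability
    return naive, adjusted


naive_slope, adjusted_slope = simulate()
print(f"naive OLS slope: {naive_slope:.3f}, adjusted slope: {adjusted_slope:.3f}")
```

With a reliability of about 0.5 in this setup, the naive slope is pulled toward roughly half the true value, while dividing by the reliability ratio moves it back toward the true slope.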
Statistical Analysis and Data Mining, May 18, 2023
Maps of flooding risk and exposure generally assume that the density of people and vehicles is constant over time, although this is not the case in the real world, as crowding is a highly dynamic process in urban areas. Monitoring and forecasting people's mobility is a relevant aspect for metropolitan areas subject to a high risk of flooding. Information and communication technologies (ICT), along with big data, are massively used, e.g., to support the optimization of traffic flows and the study of urban systems. In particular, mobile phone network data are well suited to producing dynamic information on people's movements that can be used to develop dynamic maps of exposure to flood risk for areas with hydrogeological criticality, as done by Balistrocchi et al. (2020).
In this work we propose a time series modelling strategy to obtain “real time” predictions of traffic flows. To do so we use mobile phone origin-destination signals on the flow of Telecom Italia Mobile (TIM) users among different census areas (ACE of ISTAT, the Italian National Statistical Institute), collected for the MoSoRe Project 2020-2022 and recorded on an hourly basis from September 2020 to August 2021.
A Harmonic Dynamic Regression (HDR) model (Hyndman, Athanasopoulos, 2021) of the following form is proposed:
$$\text{Flow} = \alpha + \text{Fourier.day}(K_d) + \text{Fourier.week}(K_w) + \text{Month} + \varepsilon_{\text{ARIMA}(p,d,q)} \qquad (1)$$
where multiple seasonal periods are modelled with a properly selected number of Fourier bases, Month is a dummy variable accounting for different levels of flows by month, and the error component is structured as an ARIMA process.
The HDR model suits our purposes because of the strong daily and weekly patterns in traffic flows, as also confirmed by preliminary results on prediction accuracy based on a cross-validation strategy.
In future developments, the model in equation (1) may be improved by adding suitable explanatory variables to increase prediction accuracy, such as the presence of people in the census area of origin and in the census area of destination of the flow, or precipitation data.
People's and vehicles' exposure obtained from mobile phone data and processed with the above stochastic model is then combined with flooding hazard maps, estimated for different storm return periods in an urbanized area close to Brescia, to estimate dynamic flood risk maps.
References
Balistrocchi, M., Metulini, R., Carpita, M., Ranzi, R.: Dynamic maps of human exposure to floods based on mobile phone data. Natural Hazards and Earth System Sciences, 20: 3485-3500 (2020).
Hyndman, R. J., Athanasopoulos, G.: Forecasting: principles and practice. 3rd edition, OTexts: Melbourne, Australia. OTexts.com/fpp3 (2021).
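The Fourier part of the harmonic regression in equation (1) can be sketched as follows. This is a minimal illustration assuming hourly data, so that the daily period is 24 and the weekly period is 168; the function names are hypothetical, and the Month dummies and the ARIMA error structure, which would normally be fitted with a dedicated time series library, are left out.

```python
import math


def fourier_terms(t, period, n_harmonics):
    """Return [sin, cos] pairs for harmonics 1..n_harmonics at time index t."""
    terms = []
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * math.pi * k * t / period
        terms.append(math.sin(angle))
        terms.append(math.cos(angle))
    return terms


def design_row(t, k_day, k_week, day_period=24, week_period=168):
    """Seasonal regressors for hour t: K_d daily and K_w weekly harmonics."""
    return fourier_terms(t, day_period, k_day) + fourier_terms(t, week_period, k_week)


# Example: regressors for the first three hours with K_d = 3 and K_w = 2.
rows = [design_row(t, k_day=3, k_week=2) for t in range(3)]
print(len(rows[0]))  # number of seasonal columns: 2 * (3 + 2)
```

Each harmonic contributes one sine and one cosine column, so choosing K_d and K_w trades off the smoothness of the seasonal pattern against the number of parameters to estimate.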
Does news about European economic conditions affect citizens’ opinions of this economy? The study aims at measuring this relation, defining a theoretical framework based on available data from the European Official Statistics, and implementing a Higher Order Multiple Indicators Multiple Causes (HO-MIMIC) Model with parameters estimated using the Partial Least Squares (PLS) method.
This series aims to ensure that selected papers from conferences in which EMES has been involved will be accessible to a larger community interested in the third sector and social enterprise. EMES Conferences Selected Papers have not undergone any editing process. All the papers of the series are available for download at www.emes.net.
Annals of Operations Research, Feb 6, 2023
In sports, studying player performances is a key issue since it provides a guideline for strategic choices and helps teams in the complex procedure of buying and selling players. In this paper we investigate the ability of various composite indicators to define a measurement structure for global soccer performance. We rely on data provided by the EA Sports experts, who are the ultimate authority on soccer performance measurement: they periodically produce a set of players' attributes that make up the broader, theoretical performance dimensions. Considering the potential of clustering techniques to confirm or disconfirm the experts' assumptions in terms of aggregations between indicators, 29 players' performance attributes or variables (from the FIFA19 version of the videogame, that is, sofifa) have been considered and processed with three different techniques: the Cluster of variables around latent variables (CLV), the Principal covariates regression (PCovR) and Bayesian model-based clustering (B-MBC). The three procedures yielded clusters that differed from the experts' classification. In order to identify the most appropriate measurement structure, the resulting clusters have been embedded into Structural equation models with partial least squares (PLS-SEMs) with a Higher-Order Component (that is, the overall soccer performance). The statistically derived composite indicators have been compared with those of the experts' classification. Results support the concurrent validity of composite indicators derived through the statistical methods: overall, they show that, in the absence of expert judgement, composite indicators, as well as the resulting PLS-SEM models, are a viable alternative given their greater correlation with players' economic value and salary.
Keywords Soccer performance • Cluster of variables around latent variables • Principal covariates regression • Bayesian model-based clustering • Structural equation model • Partial least squares • Higher order component
Statistical Methods & Applications
The use of new sources of big data collected at a high-frequency rate in conjunction with administrative data is critical to developing indicators of the exposure to risks of small urban areas. Correctly accounting for the crowding of people and for their movements is crucial to mitigate the effect of natural disasters, while guaranteeing the quality of life in a “smart city” approach. We use two different types of mobile phone data to estimate people crowding and traffic intensity. We analyze the temporal dynamics of crowding and traffic using a Model-Based Functional Cluster Analysis, and their spatial dynamics using the T-mode Principal Component Analysis. Then, we propose five indicators useful for risk management in small urban areas: two composite indicators based on cutting-edge mobile phone dynamic data and three indicators based on open-source street map static data. A case study for the flood-prone area of the Mandolossa (the western outskirts of the city of Brescia, Italy...
Despite growing interest in the cooperative enterprise, knowledge of the real economic, employment and social dimensions of the cooperative movement, both internationally and for individual countries, remains scarce and fragmentary. In Italy, bodies and institutions (Istat, Confcooperative, Legacoop, Unioncamere) have long been studying the size of the cooperative movement. In most cases, however, the studies produced refer to specific contexts, particular types, or limited samples of enterprises. Up-to-date analyses of the overall size of the phenomenon, able to quantify with reasonable precision its capacity to create income and employment and to follow its evolution, are still lacking. Starting from these premises, the present study aims to offer a reliable and disaggregated picture of cooperation in Italy in 2008, highlighting the economic and employment dimensions of the phenomenon. This analysis...
Annals of Operations Research
Floods are among the natural disasters that cause the worst human, social and economic impacts, to the detriment of both the public and private sectors. Today, public decision-makers can take advantage of data-driven systems that make it possible to monitor hydrogeological risk areas and that can be used for predictive purposes to deal with future emergency situations. Flooding risk exposure maps traditionally assume that the number of people present is constant over time, although crowding is a highly dynamic process in metropolitan areas. Real-time monitoring and forecasting of people’s presences and mobility is thus a relevant aspect for metropolitan areas subject to flooding risk. In this respect, mobile phone network data have been used with the aim of obtaining a dynamic measure of the exposure risk in areas with hydrogeological criticality. In this work, we use mobile phone origin-destination signals on traffic flows by Telecom Italia Mobile (TIM) users with the aim of forecasting the ...
In the data science landscape, sports data offer ample room for building indicators as well as for predictive modeling. Match outcome lends itself to the application of statistical learning models, while players’ performance is a topic of particular interest for decision making and best choices in the competitive framework. The European Soccer database, available on Kaggle (KES database), incorporates players’ and teams’ data for about 20,000 soccer matches for seasons 2009-2015 in 10 European countries (Carpita et al., 2019b-c). Experts of the EA Sports FIFA videogame (see the website sofifa.com) state that the performance of a soccer player is made up of 7 broad dimensions (power, mentality, skill, movement, attacking, defending and goalkeeping), each of which incorporates, in turn, more specific skills to be developed and mastered by players on the pitch. Relying on the experts’ suggestion, Carpita et al. (2019c) modified the original sofifa indicators by incorporating the four player roles (forward, midfielder, defender, goalkeeper): results showed that performance skills might play a different role according to where players are located on the pitch. However, no statistical inquiry has been carried out on the sofifa experts’ performance indicators. Correlations among them revealed an unclear dimensional structure, making their statistical structure worth examining in detail. As a first development, Carpita et al. (2019a) used an unsupervised clustering technique for multivariate data which, however, did not consistently improve the prediction of match results. For this reason, it is worth examining the KES database with clustering techniques that also encompass prediction objectives.
Principal Covariates Regression (PCovR) fits this purpose: it simultaneously reduces the predictors to a few components and regresses the criterion on these components (De Jong and Kiers, 1992). The predictive performance of PCovR components is compared with experts’ sofifa indicators via Skellam Model, a regression variation that best fits the distribution of home and team goal differences (Karlis and Ntzoufras, 2008).
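For readers unfamiliar with the Skellam distribution mentioned above, the sketch below evaluates its probability mass function as the distribution of the difference between two independent Poisson goal counts. The scoring rates 1.5 and 1.1, the truncation at 60 goals, and the function names are illustrative assumptions; the regression structure that links these rates to PCovR components or sofifa indicators is not shown.

```python
import math


def poisson_pmf(k, rate):
    """Probability of observing exactly k events for a Poisson(rate) count."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def skellam_pmf(diff, rate_home, rate_away, max_goals=60):
    """P(home goals - away goals = diff) for independent Poisson counts.

    The infinite sum over away goals is truncated at max_goals, which is
    more than enough for realistic soccer scoring rates.
    """
    total = 0.0
    for away in range(max_goals + 1):
        home = away + diff
        if home >= 0:
            total += poisson_pmf(home, rate_home) * poisson_pmf(away, rate_away)
    return total


# Probabilities of a home win, a draw and an away win for assumed rates.
home_win = sum(skellam_pmf(d, 1.5, 1.1) for d in range(1, 31))
draw = skellam_pmf(0, 1.5, 1.1)
away_win = sum(skellam_pmf(d, 1.5, 1.1) for d in range(-30, 0))
print(round(home_win + draw + away_win, 6))
```

In a regression setting the two rates would depend on covariates, which is how predictors such as composite performance indicators can enter the model.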
of people using mobile phones (Carpita, Simonetto, 2014) are a source of very large data. Telecom Italia Mobile (TIM), which is currently the largest operator in Italy in this sector, thanks to a research agreement with the Statistical Office of the Municipality of Brescia, provided us with about two years (April 2014 to June 2016, n ≈ 700) of Daily Mobile Phone Density Profiles (DMPDPs) for the Province of Brescia, in the form of a regular grid of 923 x 607 cells recorded every 15 minutes.
In order to find regularities and detect anomalies in the flow of people’s presences, this work aims to cluster similar DMPDPs, where each DMPDP is characterized by both a 2-D spatial component (i.e. 923 x 607 dimensions, one for each cell of the grid) and a temporal component (i.e. each cell has repeated values in time, for a total of 96 daily dimensions per cell). So, while each DMPDP accounts for p ≈ 50 million (923 x 607 x 96) space-time dimensions, time and economic constraints prevent us from having a longer time series of DMPDPs. In these terms, grouping DMPDPs is a High Dimensional Low Sample Size (HDLSS) problem, since p is much larger than n.
We propose a mixed-approach procedure that we apply to the city of Brescia. First, borrowing the Histogram of Oriented Gradients (HOG) method from the image clustering discipline (Tomasi, 2012), we reduce the dimensionality of the DMPDPs by extracting their features. In doing so, we tune the HOG parameters in order to reduce the dimensionality of the DMPDPs as much as possible while preserving as much as possible of the information contained in the extracted features. With this approach we preserve both the spatial and the temporal components of the DMPDPs. Then, using the extracted HOG features, we group DMPDPs by applying, and testing the feasibility of, different clustering approaches for large data. We finally represent each cluster's DMPDP in terms of tensor decomposition.
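The core step of a Histogram of Oriented Gradients can be sketched on a single small cell of a density grid. This is a simplified illustration only: it uses central differences, 9 unsigned orientation bins and magnitude-weighted votes, and omits the block normalization of a full HOG descriptor. The cell size, the number of bins and the function name are assumptions, not the tuned parameters of the study.

```python
import math


def orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `cell` is a list of rows of numeric density values. Border pixels are
    skipped so that central differences are always defined.
    """
    n_rows, n_cols = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    histogram = [0.0] * n_bins
    for r in range(1, n_rows - 1):
        for c in range(1, n_cols - 1):
            grad_x = cell[r][c + 1] - cell[r][c - 1]
            grad_y = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(grad_x, grad_y)
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(grad_y, grad_x)) % 180.0
            histogram[int(angle // bin_width) % n_bins] += magnitude
    return histogram


# A 6 x 6 cell whose density increases from left to right.
ramp = [[float(c) for c in range(6)] for _ in range(6)]
features = orientation_histogram(ramp)
print(len(ramp) * len(ramp[0]), "raw values ->", len(features), "features")
```

Here 36 raw values are summarized by 9 numbers, which shows in miniature how such features shrink a profile before clustering; the raw count per profile in the text is 923 x 607 x 96 = 53,785,056.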
geo-localization of people by mobile phone, by quantifying the number of people present at a given moment in time, enriches the information available for “smart city” evaluations. However, using Telecom Italia Mobile (TIM) data, we are only able to characterize the spatio-temporal dynamics of the presences of TIM users in the city. A strategy to estimate total presences is therefore needed. In this paper we propose a strategy to extrapolate the total number of people using TIM data only. To do so, we apply a spatial record linkage of mobile phone data with administrative archives, using the number of residents at the level of the “sezione di censimento” (census section).