This paper investigated the Internet experience of Ghanaian users. The motivation was to compare current Internet trends and experience in Ghana, a West African country, with those in the rest of the world. The research philosophy was based on the subjectivist view, using the positivism philosophy. A survey method with an online questionnaire was used; the respondents were undergraduate students of the University of Education, Winneba, Ghana. The data collected were analysed and conclusions drawn. From the data analysis, the following findings emerged: there were more male online users than female users. Google, Facebook, Yahoo, Microsoft, and Wikipedia were still popular in Ghana, whereas Apple and Amazon were less popular. Facebook was the most popular photo-sharing platform in Ghana, followed by Instagram, Flickr, and Snapchat. The online video portal YouTube was popular and used to a large extent in Ghana. Facebook was the number one socia...
The CORA (COmmon Reference Architecture) project is an ESSnet financed by Eurostat under the 2009 Statistical Programme. The principal result of the project is the definition of an architecture to be assumed as a reference by NSIs. This architecture is articulated along three distinct dimensions: organizational, IT and business. The organizational dimension of CORA is based on a survey of the commercial and legal foundations for the exchange of software between NSIs. The IT and business dimensions are structured according to a layered approach, in which lower layers offer services to upper ones. This approach has the advantage of providing clear contracts between the components at each layer, which in this way have precise duties and rights. The technical architecture develops along the GSBPM (Generic Statistical Business Process Model) and along a construction dimension determined by the way services make use of one another to deliver their res...
CORE (COmmon Reference Environment) is an environment supporting the definition of statistical processes and their automated execution. CORE processes are designed in a standard way, starting from available services; specifically, a process definition is expressed in terms of abstract statistical services that can be mapped to specific IT tools. CORE thus fosters the sharing of tools among NSIs: a tool developed by one NSI can be wrapped according to CORE principles and easily integrated within a statistical process of another NSI. Moreover, having a single environment for the execution of entire statistical processes provides a high level of automation and complete reproducibility of process execution.
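The abstract does not include a reference implementation; as a minimal sketch of the central idea (a process described only by abstract service names, bound to concrete tool wrappers at execution time), something like the following could illustrate it. All service names, tools and data structures here are hypothetical examples, not part of the actual CORE specification.

```python
# Minimal sketch of the CORE idea: a statistical process is defined in terms of
# abstract services, which are bound to concrete tool wrappers only at run time.
# Every name below is an illustrative assumption, not the real CORE API.

from typing import Callable, Dict, List

# Concrete "tools" (stand-ins for software an NSI might wrap as CORE services).
def collect_from_csv(data: dict) -> dict:
    data["records"] = [{"value": 10}, {"value": 250}, {"value": 12}]
    return data

def edit_outliers(data: dict) -> dict:
    data["records"] = [r for r in data["records"] if r["value"] < 100]
    return data

def disseminate(data: dict) -> dict:
    data["published"] = len(data["records"])
    return data

# Mapping from abstract service names to the concrete tools implementing them.
SERVICE_REGISTRY: Dict[str, Callable[[dict], dict]] = {
    "Collect": collect_from_csv,
    "EditAndImpute": edit_outliers,
    "Disseminate": disseminate,
}

def run_process(process: List[str]) -> dict:
    """Execute a process described only by abstract service names."""
    data: dict = {}
    for service_name in process:
        data = SERVICE_REGISTRY[service_name](data)
    return data

if __name__ == "__main__":
    # The process definition itself never references a specific IT tool.
    print(run_process(["Collect", "EditAndImpute", "Disseminate"]))
```

Swapping a tool developed by another NSI would then amount to registering a different wrapper under the same abstract service name, leaving the process definition unchanged.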
The CORA (COmmon Reference Architecture) project is a research network financed by Eurostat under the 2009 Statistical Programme. The principal result of the project has been the definition of an architecture to be assumed as a reference by National Statistical Institutes (NSIs). This architecture is articulated along three distinct dimensions: technical, organizational and business. The organizational dimension of CORA is based on a survey of the commercial and legal foundations for the exchange of software between NSIs. The technical and business dimensions of the CORA architecture are structured according to a layered approach, in which lower layers offer services to upper ones. The first dimension of the technical architecture develops alongside the GSBPM (Generic Statistical Business Process Model). The second dimension, called the construction dimension, is determined by the way services make use of one another to deliver their respective produc...
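As a small illustration of the layered approach, in which lower layers expose explicit contracts to upper ones, the sketch below shows an upper-layer component depending only on the contract of the layer beneath it. The layer names and methods are hypothetical and are not taken from the CORA deliverables.

```python
# Illustrative sketch of layering with explicit contracts between layers.
# Layer names and methods are invented examples, not the CORA specification.

from abc import ABC, abstractmethod
from typing import List


class DataLayer(ABC):
    """Contract offered by the lower layer: retrieve microdata by dataset name."""

    @abstractmethod
    def read(self, dataset: str) -> List[float]: ...


class ProcessingLayer:
    """Upper layer: depends only on the DataLayer contract, not on its implementation."""

    def __init__(self, data_layer: DataLayer) -> None:
        self.data_layer = data_layer

    def mean(self, dataset: str) -> float:
        values = self.data_layer.read(dataset)
        return sum(values) / len(values)


class InMemoryDataLayer(DataLayer):
    """One possible implementation of the lower-layer contract."""

    def read(self, dataset: str) -> List[float]:
        return {"turnover": [10.0, 12.5, 11.0]}[dataset]


if __name__ == "__main__":
    processing = ProcessingLayer(InMemoryDataLayer())
    print(processing.mean("turnover"))
```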
This article covers three different aspects of Internet usage in Europe. The first theme examines the Open Data phenomenon and the use of Public Sector Information (PSI) in the interest of citizens; in this part we list the studies that in recent years have attempted to quantify the PSI market. In the second part we list the actions taken by the European Commission to develop the PSI market and to use “openness” to foster economic growth in Europe. In the final section an overview of Italian law relating to the use of the Internet is given, connecting it with the latest developments in European and UN law on the usefulness of new technologies for the European digital citizen. The paper thus provides an overall look at the studies, actions and European laws regarding the use of the Internet and public data, and at the resulting benefits for citizens.
This paper describes one of the actions implemented in the framework of the twinning project "Modernisation de l'appareil statistique tunisien", namely the introduction of a standard IT architecture for statistical processes and its application to the external trade pilot statistical domain. The architecture covers all the Generic Statistical Business Process Model (GSBPM) macro-phases, offering an opportunity to introduce methodological improvements in the INS external trade Metadata and Quality Management phases. The new integrated IT architecture was designed with INS experts and focuses on the "core" production process, with the aim of standardizing and streamlining the data production phases by (i) enhancing the adoption of standardized metadata in the collection, processing and dissemination phases; (ii) introducing a new methodological approach to selective data editing and automatic imputation based on robust statistical methods; (iii) minimizing the need for manual intervention in data editing; and (iv) developing new IT procedures for outlier selection and imputation that are fully scalable to other statistical domains. The suggested methodological and architectural solutions comply with the standards adopted in the context of official statistics and are scalable to different domains.
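The abstract does not spell out which robust methods were adopted at INS; as one common example of what "selective data editing and automatic imputation based on robust statistical methods" can look like, the sketch below flags suspicious values with a median/MAD score and imputes them with the median. The threshold, rescaling constant and data are illustrative assumptions only.

```python
# Hedged sketch of robust outlier flagging and imputation (median / MAD rule).
# Not the INS implementation: threshold and example data are illustrative.

import statistics

def robust_edit_and_impute(values, threshold=3.5):
    """Flag values whose robust z-score exceeds the threshold and impute the median."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values) or 1e-9  # avoid division by zero
    edited, flags = [], []
    for v in values:
        score = 0.6745 * (v - med) / mad  # 0.6745 rescales the MAD to a normal std. dev.
        is_outlier = abs(score) > threshold
        flags.append(is_outlier)
        edited.append(med if is_outlier else v)
    return edited, flags

if __name__ == "__main__":
    trade_values = [120.0, 118.5, 121.2, 119.8, 950.0, 122.3]  # one suspicious record
    edited, flags = robust_edit_and_impute(trade_values)
    print(edited)
    print(flags)
```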
METHODOLOGICAL ISSUES. Big Data and Official Statistics. Shyam Upadhyaya, United Nations Industrial Development Organization (UNIDO), Vienna. Big data are one of the components of the fourth industrial revolution. The deep penetration of digital technology into the economy has made information an integral element of the production process. Big data are generated by the operation of machines, by human-machine interaction and by interaction between people. The article addresses, in turn, the questions arising from its title. First, it clarifies the meaning of the term "big data", noting that they comprise not only figures in the traditional sense but also text, audio and video recordings on social networks, photographs, satellite images, e-mails, programs, applications and much more. The author distinguishes between unstructured and structured data, noting that the latter are mainly quantitative data held in a database with a predefined model for their storage, processing and dissemination. Analysing such a new and undoubtedly revolutionary source of information as big data, the author assesses it against the main criteria and basic principles of data quality, such as reliability, comparability, accuracy and soundness, and the correct use of methodology. The topic of big data arouses great interest among statisticians, who regard it as an additional source of information amid the rapid development of information technology. At the same time, some users overestimate its potential and often interpret big data as a forthcoming replacement for official statistics. In the author's view, however, such a conclusion is premature, and big data must be used with a certain caution. The author makes two important points. First, for a large number of users only a part of big data is of interest, namely structured data, which considerably reduces the relevant volume. Second, both science and practice have shown that reliable results can be obtained by observing a small number of units selected through random sampling. The article also gives a critical assessment of big data against the other fundamental principles adopted by the UN to ensure the quality of statistical data, with particular attention to the problems of monitoring progress towards the Sustainable Development Goals (SDGs). The author notes that national statistical offices (NSOs) are accountable for the data provided by official statistics; in the absence of any institutional accountability, the reliability of big data may be called into question. In conclusion, it is emphasized that the suitability of big data depends on the validity of the assumptions made when an unstructured mass of information is transformed into some quantitative measurement. Otherwise, in the author's opinion, a large flow of non-statistical quantitative information may enter the information field, misinforming society and leading government and business to wrong decisions.
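The article's sampling argument, that a small random sample is usually enough to estimate a population quantity reliably, can be illustrated with a toy simulation such as the one below. The population, sample size and distribution are invented for illustration and are not taken from the article.

```python
# Toy illustration of the sampling argument: a simple random sample of 1,000 units
# out of a 1,000,000-unit "population" already estimates the mean quite closely.
# All numbers here are illustrative assumptions.

import random

random.seed(42)

# Hypothetical population with a skewed value distribution.
population = [random.lognormvariate(3.0, 0.8) for _ in range(1_000_000)]
true_mean = sum(population) / len(population)

sample = random.sample(population, 1_000)  # only 0.1% of the population
sample_mean = sum(sample) / len(sample)

print(f"population mean: {true_mean:.2f}")
print(f"sample mean    : {sample_mean:.2f}")
```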
The exponential growth of web technologies makes Big Data a field of great interest for textual analysis. Among social media, Twitter best suits the analysis of ideas and contents because of its "openness" and "horizontality". However, extracting a textual corpus from Twitter is not a trivial task. The Big Data Sandbox project, promoted as part of the High-Level Group at UNECE, aims to assess the possibility of using Big Data in official statistics. The project, started in 2014 with about twenty national and international statistical organizations participating, focused in 2015 on the analysis of four different sources of Big Data; one group in particular focused on the collection of geo-located tweets. The public interface provided by Twitter is used to extract tweets generated within defined geographic coordinates. Within this project, all tweets generated in the territory of Rome from November 2015 onwards have been stored, in order to monitor activities related to the Jubilee. The dramatic events of 13 November in Paris quickly attracted the attention of users in Rome: in the context of the global terrorist threat, the attack on a European city deeply affected the imagination of Twitter users, also in view of the forthcoming Jubilee, which had increased the worldwide media exposure of the city. This suggested the opportunity to investigate the connections between the Jubilee and terrorism, to understand whether the global threat of terrorism could affect the way Twitter users narrate the Jubilee. The aim of this work is to apply some techniques of textual analysis to a corpus extracted from Twitter, to describe its contents and to investigate possible ties between technologies for Big Data analysis and Text Mining. Although the selected corpus shows a weak connection between the two phenomena in the period of analysis, the analysis revealed interesting possibilities.
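The abstract does not detail the text-mining pipeline used on the corpus; as a minimal sketch of the kind of analysis described (measuring how often two themes co-occur in a tweet corpus), the code below counts tweets mentioning Jubilee-related terms, terrorism-related terms, or both. The keyword lists and sample tweets are illustrative assumptions, not the ones used in the study.

```python
# Hedged sketch of theme co-occurrence counting in a tweet corpus.
# Keyword lists and example tweets are invented for illustration.

import re
from collections import Counter

JUBILEE_TERMS = {"giubileo", "jubilee", "papa", "pope"}
TERROR_TERMS = {"terrorismo", "terrorism", "isis", "attentato"}

def tokenize(text: str) -> set:
    """Lowercase the text and return the set of word-like tokens."""
    return set(re.findall(r"[a-zàèéìòù#@']+", text.lower()))

def theme_counts(tweets):
    """Count tweets mentioning only one theme, the other, or both."""
    counts = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        has_jubilee = bool(tokens & JUBILEE_TERMS)
        has_terror = bool(tokens & TERROR_TERMS)
        if has_jubilee and has_terror:
            counts["both"] += 1
        elif has_jubilee:
            counts["jubilee_only"] += 1
        elif has_terror:
            counts["terror_only"] += 1
    return counts

if __name__ == "__main__":
    sample_tweets = [
        "Roma si prepara al Giubileo con grande attesa",
        "Sicurezza rafforzata per il giubileo dopo l'attentato di Parigi",
        "Solidarietà a Parigi contro il terrorismo",
    ]
    print(theme_counts(sample_tweets))
```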
The explosion in the amount of data, known as the “data deluge”, is forcing a redefinition of many scientific and technological fields, with Big Data emerging in every environment as a potential data source.
Official statistics institutions began some years ago to open up to external data sources such as administrative data. The advent of Big Data introduces important innovations: the availability of additional external data sources, of previously unknown dimensions and of questionable consistency, poses new challenges to official statistics institutions, imposing a general rethinking that involves tools, software, methodologies and organizations.
The relative newness of the field of study on Big Data first of all requires an introductory phase, to address the problems of definition and to delimit the areas of technology involved and the possible fields of application.
The challenges that the use of Big Data poses to institutions dealing with official statistics are then presented in detail, after a brief discussion of the relationship between the new "data science" and statistics.
Although at an early stage, there is already a limited but growing body of practical experience in the use of Big Data as a data source for statistics by public (and private) institutions. The review of these experiences can serve as a stimulus to address, in a more conscious and organized way, the challenges that the use of this data source poses to all producers of official statistics.
The worldwide spread of data sources (web, e-commerce, sensors) has also prompted the statistical community to take joint action to tackle a complex set of methodological, technical and legal problems. Many national statistical institutes, together with the most prestigious international organizations, have therefore initiated joint projects that will develop in the coming years to address the complex issues raised by Big Data for statistical methodology and computer technology.