2021, ArXiv
Time series data are fundamental for a variety of applications, ranging from financial markets to energy systems. Due to their importance, the number and complexity of tools and methods used for time series analysis is constantly increasing. However, due to unclear APIs and a lack of documentation, researchers struggle to integrate them into their research projects and replicate results. Additionally, in time series analysis there exist many repetitive tasks, which are often re-implemented for each project, unnecessarily costing time. To solve these problems we present pyWATTS, an open-source Python-based package that is a non-sequential workflow automation tool for the analysis of time series data. pyWATTS includes modules with clearly defined interfaces to enable seamless integration of new or existing methods, subpipelining to easily reproduce repetitive tasks, load and save functionality to simply replicate results, and native support for key Python machine learning libraries su...
Proceedings of the 15th Python in Science Conference
Inference on time series data is a common requirement in many scientific disciplines and internet of things (IoT) applications, yet there are few resources available to domain scientists to easily, robustly, and repeatably build such complex inference workflows: traditional statistical models of time series are often too rigid to explain complex time domain behavior, while popular machine learning packages require already-featurized dataset inputs. Moreover, the software engineering tasks required to instantiate the computational platform are daunting. cesium is an end-to-end time series analysis framework, consisting of a Python library as well as a web front-end interface, that allows researchers to featurize raw data and apply modern machine learning techniques in a simple, reproducible, and extensible way. Users can apply out-of-the-box feature engineering workflows as well as save and replay their own analyses. Any steps taken in the front end can also be exported to a Jupyter notebook, so users can iterate between possible models within the front end and then fine-tune their analysis using the additional capabilities of the back-end library. The open-source packages make use of many modern Python toolkits, including xarray, dask, Celery, Flask, and scikit-learn.
Journal of Open Source Software
Two fundamental tasks in time series analysis are identifying anomalous events ("discords") and repeated patterns ("motifs"). Successfully accomplishing these tasks is of the utmost importance across many disciplines, and can lead to powerful technological advancements, prevention of catastrophic failures and the generation of significant economic gain. Dozens of algorithms have been developed to solve these problems, including AR(I)MA regression (Däubener, Schmitt, Wang, Bäck, & Krause, 2019), Hierarchical Temporal Memory (Ahmad & Purdy, 2016), Extreme Studentized Deviate (Däubener et al., 2019) and Artificial Neural Networks (Bishop, 2006). Unfortunately, these approaches are hampered by a combination of steep methodological learning curves, numerous parameters that require tuning and the inability to scale across large datasets (Yeh et al., 2016). The explosive growth of the data science community provides an additional hurdle for traditional time series analysis methods, as many practitioners lack experience in advanced mathematical and statistical principles. In this paper we present MPA (the Matrix Profile API) as a solution to all of these challenges. MPA is a cross-language platform in Python (matrixprofile), R (tsmp) and Golang (go-matrixprofile) that leverages a novel data transformation known as the Matrix Profile (Yeh et al., 2016) to rapidly identify motifs and discords. Perhaps most importantly, MPA is an easy-to-use API that's relevant for time series novices and experts alike.
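To make the motif and discord idea concrete, here is a minimal brute-force sketch of a matrix profile in Python with NumPy. It is not the MPA implementation and does not use the matrixprofile package; the function names `znorm` and `naive_matrix_profile`, and the choice of half the window length as the exclusion zone, are assumptions made for illustration. The sketch is quadratic in the series length, so it is only suitable for small inputs.

```python
import numpy as np


def znorm(x):
    """Z-normalize a window; a constant window maps to all zeros."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)


def naive_matrix_profile(ts, m):
    """Brute-force matrix profile (illustrative, O(n^2) distance computations).

    For every length-m subsequence, returns the z-normalized Euclidean
    distance to its nearest non-trivial neighbour and that neighbour's index.
    """
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    subs = np.array([znorm(ts[i:i + m]) for i in range(n)])
    excl = max(1, m // 2)  # assumed exclusion zone to skip trivial self-matches
    profile = np.full(n, np.inf)
    index = np.full(n, -1)
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        lo, hi = max(0, i - excl), min(n, i + excl + 1)
        d[lo:hi] = np.inf  # ignore the subsequence itself and close overlaps
        j = int(np.argmin(d))
        profile[i], index[i] = d[j], j
    return profile, index


# Usage: a periodic signal with a short flat segment injected as an anomaly.
t = np.arange(128)
signal = np.sin(2 * np.pi * t / 16)
signal[60:68] = 0.0
profile, index = naive_matrix_profile(signal, m=16)
motif_start = int(np.argmin(profile))    # lowest value: a repeated pattern
discord_start = int(np.argmax(profile))  # highest value: the most unusual window
```

Low profile values mark subsequences that recur elsewhere in the series (motifs), while the largest value marks the subsequence least similar to anything else (a discord).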
2021
We introduce Merlion, an open-source machine learning library for time series. It features a unified interface for many commonly used models and datasets for anomaly detection and forecasting on both univariate and multivariate time series, along with standard pre/postprocessing layers. It has several modules to improve ease-of-use, including visualization, anomaly score calibration to improve interpretability, AutoML for hyperparameter tuning and model selection, and model ensembling. Merlion also provides a unique evaluation framework that simulates the live deployment and re-training of a model in production. This library aims to provide engineers and researchers a one-stop solution to rapidly develop models for their specific time series needs and benchmark them across multiple time series datasets. In this technical report, we highlight Merlion’s architecture and major functionalities, and we report benchmark numbers across different baseline models and ensembles.
2020
Users that work with time series data typically disaggregate time series problems into various isolated tasks and use specific libraries, packages, tools, and services that deal with each individual task. However, the tools used are often fragmented. Analysts have to load different packages for common tasks such as data preprocessing, clustering, feature extraction, forecasting, hierarchical reconciliation, evaluation, and visualization. This disclosure describes a reliable, scalable infrastructure to meet various needs of time series practitioners without adding
ACM SIGMOD Record
The analysis of time-series data associated with modern-day industrial operations and scientific experiments is now pushing both computational power and resources to their limits. In order to analyze the existing and (more importantly) future very large time series collections, new technologies and the development of more efficient and smarter algorithms are required. The two editions of the Interdisciplinary Time Series Analysis Workshop brought together data analysts from the fields of computer science, astrophysics, neuroscience, engineering, electricity networks, and music. The focus of these workshops was on the requirements of different applications in the various domains, and also on the advances in both academia and industry, in the areas of time-series management and analysis. In this paper, we summarize the experiences presented in and the results obtained from the two workshops, highlighting the relevant state-of-the-art techniques and open research problems.
arXiv (Cornell University), 2022
In modeling time series data, we often need to augment the existing data records to increase the modeling accuracy. In this work, we describe a number of techniques to extract dynamic information about the current state of a large scientific workflow, which could be generalized to other types of applications. The specific task to be modeled is the time needed for transferring a file from an experimental facility to a data center. The key idea of our approach is to find recent past data transfer events that match the current event in some ways. Tests showed that we could identify recent events matching some recorded properties and reduce the prediction error by about 12% compared to the similar models with only static features. We additionally explored an application-specific technique to extract information about the data production process, and were able to reduce the average prediction error by 44%.
Syme, G., Hatton MacDonald, D., Fulton, B. and Piantadosi, J. (eds) MODSIM2017, 22nd International Congress on Modelling and Simulation., 2017
Models of physical systems are the foundation of many scientific and decision support systems. These models rely heavily on observational data, typically collected from sensors. Increasingly this data comes from a wide range of sources. For example, agricultural models often require data from climate observations, soil conditions, on-farm equipment, seasonal forecasts, among others. Integration of these data with models is very time-consuming and often is repetitious across different models. Furthermore, automation of model runs is difficult due to the complexity of managing data dependencies. We have developed a distributed system, Senaps, to support automation of sensor data retrieval and coupling with model execution in a scalable way. It has been developed over many years across scientific disciplines, including water management, agriculture, aquaculture, and related Information, Communication and Technologies areas. It has been used, and is in use, by a diverse range of projects, resulting in a flexible system that is not tied to a specific domain. Senaps includes a publish-subscribe subsystem that handles ingestion of disparate time-series data. It supports stream processing, such as quality assurance and data checking, and automates data ingestion with monitoring and recovery. The storage and access subsystem is a scalable time-series backend with an Application Programming Interface (API) to allow third party developers to build on. It has a range of features including dynamic temporal aggregations; fine-grained access control to support data privacy and sharing (users can elect to share data between organisations); metadata for sensor data management; and controlled vocabularies. The focus of this paper is the model integration subsystem, which provides the model integration and automation features. 
This system builds on developments in cloud and container-based computing to isolate a user submitted model's runtime environment and provide access to the data backend. APIs are provided to handle environment images (e.g. Linux with R), model definition, workflows (instances of a model), and running of model jobs. We have successfully used this system to automate model runs and provide continuous results from a number of parameterised models. We have hosted a number of models on the platform, including a timber drying model and two agricultural prediction models. Being tied to a robust sensor-data backend ensures models are run on the most recent data and removes the need for model developers to continuously manage model execution. Results from the model are automatically available and can be easily shared between users and organisations. In this paper, we detail the technical challenges in implementation, provide example results from a running model, and describe our next research steps.
SoftwareX, 2022
Time series processing and feature extraction are crucial and time-intensive steps in conventional machine learning pipelines. Existing packages are limited in their applicability, as they cannot cope with irregularly-sampled or asynchronous data and make strong assumptions about the data format. Moreover, these packages do not focus on execution speed and memory efficiency, resulting in considerable overhead. We present tsflex, a Python toolkit for time series processing and feature extraction, that focuses on performance and flexibility, enabling broad applicability. This toolkit leverages window-stride arguments of the same data type as the sequence-index, and maintains the sequence-index through all operations. tsflex is flexible as it supports (1) multivariate time series, (2) multiple window-stride configurations, and (3) integrates with processing and feature functions from other packages, while (4) making no assumptions about the data sampling regularity, series alignment, and data type. Other functionalities include multiprocessing, detailed execution logging, chunking sequences, and serialization. Benchmarks show that tsflex is faster and more memory-efficient compared to similar packages, while being more permissive and flexible in its utilization.
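The window-stride idea described above can be sketched with plain pandas. This is not the tsflex API; the helper name `strided_features` and its parameters are hypothetical, chosen only to show how a window and stride of the same type as the index (here `pd.Timedelta` for a datetime index) can be applied to irregularly sampled data without resampling.

```python
import numpy as np
import pandas as pd


def strided_features(series, window, stride, funcs):
    """Apply each function in `funcs` to strided windows over `series`.

    `window` and `stride` must be addable to the index values, for example
    pd.Timedelta for a DatetimeIndex or a plain number for a numeric index.
    Each output row is labelled with the end of its window [start, start + window).
    """
    records, ends = [], []
    start, last = series.index[0], series.index[-1]
    while start + window <= last:
        mask = (series.index >= start) & (series.index < start + window)
        values = series[mask].to_numpy()
        # An empty window (a gap in the data) yields NaN rather than an error.
        records.append({name: (f(values) if len(values) else np.nan)
                        for name, f in funcs.items()})
        ends.append(start + window)
        start = start + stride
    return pd.DataFrame(records, index=pd.Index(ends))


# Usage: irregularly spaced samples, 4-second windows every 2 seconds.
idx = pd.Timestamp("2022-01-01") + pd.to_timedelta([0, 1, 2.5, 4, 7, 9, 10], unit="s")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], index=idx)
feats = strided_features(s, pd.Timedelta(seconds=4), pd.Timedelta(seconds=2),
                         {"mean": np.mean, "max": np.max})
```

Because windows are selected by index value rather than by sample count, the number of points per window varies with the sampling density, which is the behaviour needed for irregular or asynchronous data.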
ITISE 2022
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY
Performance Evaluation, 1995
TEStool is a visual interactive software environment for modeling autocorrelated time series, using a versatile class of stochastic processes called TES (Transform-Expand-Sample). The novel feature of the TES modeling approach is that it strives to fit a model to empirical records by simultaneously capturing both the empirical distribution and the leading empirical autocorrelations. Thus, TES models can have a high degree of fidelity, since