DOLAP 2011-Analytics Over Large Scale MD Data
Il-Yeol Song
Drexel University
Philadelphia, PA, USA
Karen C. Davis
University of Cincinnati
Cincinnati, OH, USA
ABSTRACT
In this paper, we provide an overview of state-of-the-art research
issues and achievements in the field of analytics over big data, and
we extend the discussion to analytics over big multidimensional
data by highlighting open problems and current research
trends. We complete our contribution with several
novel research directions arising in this field, which plays a leading
role in next-generation Data Warehousing and OLAP research.
General Terms
Algorithms, Design, Management, Performance, Theory
Keywords
Analytics over Big Data, Analytics over Big Multidimensional Data,
Data Warehousing, OLAP
1. INTRODUCTION
Big Data refers to enormous amounts of unstructured data
produced by high-performance applications belonging to a wide and
heterogeneous family of application scenarios: from scientific
computing applications to social networks, from e-government
applications to medical information systems, and so forth. Data
stored in the underlying layer of all these application scenarios share
some specific characteristics, among which we recall: (i)
large-scale data, which refers to the size and the distribution of data
repositories; (ii) scalability issues, which refer to the capability of
applications running on large-scale, enormous data repositories (i.e.,
big data, for short) to scale rapidly over inputs that keep growing in size; (iii)
support for advanced Extraction-Transformation-Loading (ETL)
processes from low-level, raw data to somewhat structured
information; (iv) the design and development of easy-to-use, interpretable
analytics over big data repositories in order to derive intelligence
and extract useful knowledge from them.
these, notable ones are the following: (i) moving towards more
expressive, complex aggregations, e.g., OLAP-like rather than SQL-like, hence promoting the User Defined Function (UDF) and the
User Defined Aggregate Function (UDAF) [5] paradigms; (ii)
covering advanced SQL statements such as nested queries and
order-by predicates; (iii) incorporating data compression paradigms
in order to achieve higher performance; (iv) devising novel cost-based optimizations, e.g., based on table or column statistics; (v)
integrating with third-party BI tools.
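To make item (i) more concrete, the following minimal Python sketch shows one way a user-defined aggregate with mergeable partial state could drive an OLAP-like aggregation over every subset of a set of dimensions. It is an illustration under stated assumptions, not the API of any specific system: the class `MeanUDAF`, the function `cube`, and the dictionary-based row format are all hypothetical names introduced here.

```python
from collections import defaultdict
from itertools import combinations


class MeanUDAF:
    """Hypothetical user-defined aggregate: arithmetic mean with mergeable state."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def iterate(self, value):
        # Fold one input value into the partial state.
        self.total += value
        self.count += 1

    def merge(self, other):
        # Combine partial states computed on different data partitions.
        self.total += other.total
        self.count += other.count

    def terminate(self):
        # Produce the final result; None for an empty group.
        return self.total / self.count if self.count else None


def cube(rows, dims, measure, udaf_factory):
    """Apply the aggregate to every subset of `dims` (an OLAP-style CUBE)."""
    groups = defaultdict(udaf_factory)
    for row in rows:
        for k in range(len(dims) + 1):
            for subset in combinations(dims, k):
                # A group is identified by its (dimension, value) pairs;
                # the empty tuple is the grand total.
                key = tuple((d, row[d]) for d in subset)
                groups[key].iterate(row[measure])
    return {key: agg.terminate() for key, agg in groups.items()}
```

Keeping the running sum and count as separate state, rather than a running mean, is what lets `merge` combine partial results from independent partitions; a plain SQL-like `AVG` per single grouping would not expose this structure to the user.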
5. CONCLUSIONS
Starting from state-of-the-art research issues and achievements in
analytics over big data, in this paper we have provided a critical
discussion of the open research issues and achievements arising in
this scientific field, and we have extended the discussion to the
emerging context of analytics over big multidimensional data. Open
problems and current research trends have been highlighted, and
novel research directions have been proposed.
6. REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]