IJTHI2009 Prado Freitas Sbrici1
IJTHI2009 Prado Freitas Sbrici1
IJTHI2009 Prado Freitas Sbrici1
net/publication/220672995
CITATIONS READS
5 3,051
3 authors, including:
All content following this page was uploaded by Alysson Prado on 08 August 2014.
managers; it is time for the HR to encompass this technology in its favor. This
containing actual Human Resource data. We provide the data models and
show their use through OLAP software. We also discuss the progress and
Introduction
Since the late 1980's and the beginning of the 1990's, Business Intelligence (BI) tools have
been proposed as valuable tools for companies (Inmon, 2005; Kimball et al, 1998), helping
with decision support. Inside and outside the Human Resources department there is need
(Guiderdony, 2007). On Line Analytical Processing (OLAP) is one of the BI proposals for
production data, for instance, the sales amount. According to Ngai & Wat (2004), the use of
Information Systems in the HR to help make more precise decisions is only the tenth in the
In this paper we present the experience gathered during the course of a project that aimed to
apply OLAP tools for HRM, targeting on employees demography and absenteeism. This
text represents the point of view of the IT staff regarding the project progress and sequels,
intending to provide an example of how such powerful technology can help the Human
The projects began in 2005 at the Human Resources department of a public Brazilian
Institutes and Schools, one academic medical center, 23 research centers and an
administrative area. The institution employs teachers and researchers, technical and
This paper is organized as follows: in the next Sections there are, respectively, an overview
of the adopted Business Intelligence theory, the description of implementations of two Data
Marts regarding Human Resource information and how their content can be provided to
final users. In the remaining Sections we discuss the challenges faced during the project,
the drawbacks, lessons learned, project follow-up and present our final remarks.
Business Intelligence overview
The concept of Business Intelligence (BI) refers to the abilities of the corporations to
retrieve information related to their operation processes and area of activity, in a flexible
and dynamic way, allowing the analysis, detailing and understanding their work and
providing means for decision support. The term has been popularized since the late 1980’s
The data managed by Business Intelligence systems have certain specific characteristics,
reflecting on the way they are gathered, stored and retrieved, which will be briefly
The process of obtaining and modifying the data for feeding a Business Intelligence
database is called ETL, in respect to the three steps it involves: Extract, Transform and
Load.
In the Extract step data are typically queried from other systems of the company, the so-
organization operations. Spreadsheets and plain text files can also be used as data sources
for Extraction.
In the Transform step the data are handled aiming to fit in the view the users of the decision
support system have of the process and of the facts they represent. This means unity
conversions, codes standardization and data filtering, categorization and so forth.
In the Load step the data produced by the prior steps are stored in a special database
The Data Warehouse (DW) is a large data repository (Inmon, 2005), obtained from all the
relevant sections of the organization. The Data Warehouse contains the raw material for the
When the Data Warehouse is updated from ETL, no data is deleted or overwritten. Instead,
the data are accumulated, constructing the history of the data involved in the company
operations.
The data structure of a DW often does not follow the common database systems techniques
that use normalization to ensure data integrity and less storage space. Instead, the data are
de-normalized and arranged in such a way that helps to query for reports and analysis.
From the organization Data Warehouse, the information related to each area of interest is
separated, forming Data Marts. Each Data Mart (DM) is a subset of the whole DW focused
Data on a DM is arranged according to the star schema: each relevant data from the
organization, known as fact, is stored in a structure called cube, which can be viewed or
the cube. The dimensions are some typical meaningful attributes of the fact, such as time,
The dimensions can have several hierarchies for classifying the facts according to many
Retrieving data from a DM requires specialized software that can deal with cubes and
dimensions. Besides, it must be able to provide a single entry point for the managers of the
organization to search for the information needed to their decision making process. This
An OLAP system provides means for accessing a Data Mart, listing its cubes and related
dimensions, constructing reports, regardless of the particular data it contains. This way, the
user does not need to learn how to operate different systems, and also programmers do not
need to develop new programs and reports for each subject addressed by the Data Marts.
Through an OLAP, a user can build reports and charts that summarize the facts retrieved
from the DM according to the selected dimensions. This way, it is easy to create cross-
It’s also easy to change the granularity chosen by a certain dimension, in a process called
drill-up or drill-down, when the user navigates trough the hierarchic levels of a dimension.
For each of these modifications on the table structure, the data on the cross-table cells are
re-summarized on-line, providing immediate answers for the user and helping decision-
making.
Along with the use of data warehouses, companies sometimes faced situations in which the
information needed to support a decision was not fully present in the BI systems, neither
available nor, at least, difficult to retrieve from the operational systems and databases. To
bridge this gap, Inmon et al. (1996) proposed the Operational Data Store – ODS.
The Operational Data Store can act as an intermediate repository for data coming from the
operational and legacy systems, and going to the Data Warehouse and Data Marts. The data
ODS content is fine-grained, current-valued or at least often updated, and subject-oriented.
The ODS is fed with operational data from several sources, which are put together to serve
as a single source of information for reports and another Decision Support Systems, and for
Sometimes managers need to see data with different levels of granularity in the same
data triggers a lookup in the BI data in order to discover a trend, prevent some future failure
or explain any apparent outlier information. Having both the current and the historical data
side by side is helpful to place the suspicious data in the historical scenario of the
Nowadays, there are many available technologies that enterprises can adopt to bear their IT
exclusively for that organization. On the other hand, the growing presence of free software
that provides solutions with the same level of quality is undeniable. From the first set, we
can cite the Microsoft .Net Framework used by many organizations to support their
software; from the last one, the Java and particularly the JEE platform has a broad spread
use.
Many literature sources (Few, 2009; Casati et al., 2004; Malik, 2005; Niven, 2008) refer to
a need of an “enterprise dashboard” in which BI data can be promptly seen and understood.
This artifact can be achieved by the use of a report generator such as Jasper Reports
Java applications, both for web and desktop environments, and has both free and
There is a whole package of tools from the Jasper repository intended to be used with
Suite. Its main component is the report generator, but there is also support for interactive
With the report generator, components such as the Pivot Table can be mixed with pie charts
in order to obtain the desired visual information. The IT team responsible for the
development and maintenance of the operational Human Resource Information System can
confidential data from the employees, the BI Suite provides single sign-on integration with
external security systems such as LDAP and Windows Active Directory Services, with
granular access control by row, column, cell or report. It also enables an audit trail logging,
with which one can know who accessed which report and when.
As a result, when someone from the HR staff uses the operational software in his or her
daily activities – let us say, interview a candidate for a job or a promotion – he or she can
see the necessary updated or outdated data from the person, with the background data of the
To help with the decision-making of the Human Resource area of the University we
modeled the Absenteeism Data Mart and Employees Data Mart. Thus, we met the demand
for information referring to workers counts and absenteeism rates distinctly. The following
Resource activities. According to Burke and Ng (2006), some of the up-to-date subjects in
HR are related to demographic changes of workers, such as aging and cultural diversity. In
the studies presented at the United States Worker Health Chartbook (NIOSH, 2004) the
major variables considered were age, sex, ethnicity and occupation. These variables can be
applied both inside and outside an organization. But specifically inside the enterprise, more
The Human Resource Management Systems (OLTP) used at the organization stores the
workers’ enrollment data in several distinct tables. Our first challenge was to choose a set
requirements from some Human Resource departments and other business areas of the
University.
Besides, we decided to elect some extra dimensions to put in the model, trying to avoid a
new modeling caused by new requirements in a short period of time. As the fact for the
This cube was planned to be updated monthly, so the Time dimension contains month and
year for every snapshot of an employee’s situation, taken on the last day of the month. The
Workplace means the current position in the physical and organizational structure,
(institutes, departments, and so on): the Journey dimension represents the nominal workday
length and Frequency Situation reflects the employee availability at the workplace (still
working, retired, on vacation, absent and so on). The other dimensions are self-explanatory.
worker’s faults or absences in the enterprise. Quick and Lapertosa (1982) suggest the
accidents;
Legal Absenteeism: when the absences are predicted and accepted by laws;
Compulsory Absenteeism: situations beyond the worker’s will, for instance, any
penalty or arrest.
Free Will Absenteeism: generated by personal affairs and situations without legal
subsidy.
Couto (1998) emphasizes that it is very important to know how to measure and understand
absenteeism rates in a corporation, based not on a single point of view, but instead
Human Resources: build projections to foresee the future needs of the working
force.
Occupational Safety and Health: aims to detect and prevent diseases produced by
work situation, resulting in programs to deal with hazards and enhance life quality.
Based on these approaches, authors define several different absenteeism rate formulas, with
slight variations in the input data, but all having the general rationale of quantifying the
ratio between absent and expected workforces. Couto (1998) defines his rates based on the
average number of employees working and the number of days that are important for each
analysis: number of missed days, number of days on medical leaves, and so forth.
calculation in order to make it comparable in terms of time evolution and distribution in the
company geography. This convergence can be obtained by a single system that provides the
data to the whole organization, however respecting particular needs, by using tools the as
From an OLTP system used in the company to store and process data related to the daily
attendance of the employees, we define a star schema that uses this information in a
consolidated and complete way. The information structure is modeled in agreement with
To build the Cube, in the beginning of the project, we considered the use of the number of
days of absence (Couto, 1998) as the fact for the cube. But due to characteristics of the
organization, such as the variety of shift durations, and the availability of data on the HR
operational system, we changed it to use the number of hours to populate the cube.
The cube thus sums the amount of hours every employee has for each frequency situation
of the organization. The number of hours is obtained through the verification of the
employee frequency, which is registered in a database that stores the daily amount of hours
To use as Dimensions, besides some historical dimensions of enrollment data like the
Absenteeism, which contains the items Integral Actuation, Partial Actuation and Long
We considered as Integral Actuation on the company all employees that worked, or were
absent in accordance with the Brazilian Consolidation of the Labor Law (CLT, 2007, art.
Absences that were shorter than thirty days in a row or that were not legal were defined as
Partial Actuation on the company. Absences exceeding thirty days in a row were regarded
itself, without estimates. The number of hours is split among Integral Actuation (I), Partial
Besides, we can apply a function to translate the raw number of hours into a generic
absenteeism rate, r (x). To get the Integral Actuation rate in the company, we divide the
amount of hours of the Integral Actuation by the total number of hours of Actuation,
The absenteeism rate of the Partial Actuation is obtained from the division of the amount of
hours of the Partial Actuation by the sum of hours of the Integral Actuation plus the hours
To get the Long Term Absence rate, just do the division of the amount of hours in the Long
Term Absence by the sum of hours of the Actuation Integral plus the hours of the Actuation
Spreading knowledge
The data marts were finalized, though the information is still obscure for the end-user, then
we concluded that a layer have to be created between the data mart and the end-user,
because there must be means to access the data to get valuable information for analysis.
Rasmussen et al. (2002) describe five different scenarios for integrating BI tools with the
data source:
b) Adopt more than one BI tool to access the OLTP data source;
c) Integrate some BI tools to access the OLTP data source, by using a single web
d) Use BI tools that come with the OLTP information systems by accessing its data
source (e.g.: functions of the HRIS that might generate analytical reports and
querying.);
We knew that the end-user has, not often been able to manipulate any kind of BI tool, thus
some reports have been made available on a web browser. On the other hand, some users
were experts at generating reports, graphics and analytical functions using a BI tool, for
that reason, having an OLAP tool available was essential. Therefore, maintaining a portal
where analytical reports could be accessible on the web, and disposing a BI tool for some
warehouse and data marts structured, consequently, alternatives a), b) and d) were
disregarded, but c) did not, because use a web portal for some reports was fact, then we
joined options c) and d), thus, our setting was build a web portal where analytical reports
could be available, and in the same interface give access to the BI tool adopted to access the
data marts.
Absenteeism Data Mart through a BI software tool. The use of the same tool for accessing
By accessing the Data Mart the managers can create themselves several spreadsheets and
charts, or access some pre-built ones. Figure 5 below shows a simple but typical table,
where we can perceive the star-schema structure: dimensions on rows and columns, and a
The dimension Time was used for columns on the Month hierarchy level, and the
dimension Absenteeism was placed row titles, using both levels Type and Subtype. The
fact presented is the number of hours for each situation, summarized by a Sum function.
The data were filtered for showing only the months 01/2006, 02/2006 and 03/2006.
A common OLAP tool allows us to apply user-defined functions and calculations to the
raw facts. For instance, to the same data used for Figure 5 we can apply the formulas
described on the previous section for the absenteeism rate on the organization, producing
Since the users’ needs for this system concern mainly the Partial Actuation index, we can
plot a chart to view its behavior during all the months of 2006, as shows Figure 7, simply
changing the filter of the Time dimension and applying the fact data to the Y axis of the
chart.
Figure 7: Chart for the Partial Actuation rate during the year of 2006
If the user becomes curious about whether this behavior repeats for every year, he or she
can drill-up the Time dimension, accessing the Year hierarchy level, and producing another
report and chart (Figure 8). Managers now can get informed about possible workforce
Figure 8: Drilling-up the Time dimension and showing the last 3 years
Other dimensions, as Workplace for instance, can be added to the sheet by simply dragging
and dropping onto the rows or columns, and then the same procedure of drilling up and
down can be applied for finding departments or offices of the organization where the
absenteeism rates require more attention from the Human Resources department.
HR-BI Portal
Set the portal as the primary point of interaction was our purpose, and then Human
Resource – Business Intelligence Portal was the name granted by the web interface that
made available the access to the BI tool and the analytical reports created.
The end-user community has access to this web portal according to three security levels:
1. Public: human resource quantitative reports, created in the BI software, that were
2. Private: human resource costs and payroll reports, created in the BI software, that
only human resource area, managers and authorized users can access;
3. BI tool restrict: qualified users able to generate analytical reports and graphics
Lessons learned
When our first cubes were released to users there were a great acceptance and signals of a
promising future to the project (Prado et al., 2007). But during the following months, we
perceived a gradual disinterest from users to the system. Despite of many available
needed to investigate the real causes and try to correct them. The following topics
A common requirement after the implantation was to allow users to see which people were
summarized on each cell of the fact table. This requirement shows us that data in a human
resource system has a singular feature when compared with other systems, as his raw data
This behavior affords us to consider the use of an Operational Data Store in addition to the
Data Warehouse, as a data source for Decision Support Systems for Human Resource. The
data store has a fine-grained amount of data, where the information about the people is
preserved, while in the data warehouse we could focus on quantified data to manager
analysis. It is important to note that we must try to build an OLAP that establish
communication between the two bases, starting from the DW but going to the ODS
whenever it is needed.
User interaction deserves more attention
The most common user interfaces for exploring multidimensional data use the table-based
approaches (Siefer, 2006) in which the user displace dimensions and facts on a
bidimensional structure and electively applies filters. The user constructs nested or
compound tables and may have navigational control through subsets of the data available in
the cubes. The table paradigm fits well with users’ capabilities, since they are familiarized
Nonetheless, the data selection and filtering prior to the table viewing revealed to be a
problem. Sometimes the users did not applied correct dimensions combination, leading to
inappropriate results, lack of reliability in the tool, and they abandoned its use. The
purchased OLAP tool did not provide some necessary data constraints, leaving the user
Another drawback was the absence of a reach-trough capability in the tool, enabling the
user to scrutinize the raw data (maybe in an ODS, as proposed in 4.1). Issuing a query to
retrieve the underlying data became an IT task, via SQL, and the final users evaded the
direct interaction with the OLAP software. Front-line employees cannot wait for IT queries,
and as a result, take action based on experience and intuition rather than in facts (Ahlberg,
2007).
Frequently changing requirements
After the data warehouse implantation, it was very common to arise new requirements from
some clients that were not asked in the previous phase. A quick solution could be a data
warehouse refactoring, resulting in some ETL process tuning. But, just after the new
solution, the clients asked more dimensions or facts that were not served by the data
warehouse. BI reporting has grown and changed in dramatic ways (Ahlberg, 2007), and
Faced with this reality we believed that, inspired in the software development process
evolution, which assumed a requirement changing as a part of the development process and
not as the big evil that we need to avoid, the development of a BI system should also
evolution must be the premise, or rather, we need to understand that the clients and
business needs will not be stable, and could be based on agile methodologies that were
already concerned about it. Agile development strives to deliver small units of functionality
In face of the requirement change addressed by the users we observed that the variability of
have this concerned, up to dated data will need low granularity while old data will not.
Considering that, we suggest that the premise in a BI project identifies not only which data
are relevant, but also which data are relevant in a timeline. For example, we observed that
for the last absenteeism data, the managers need to know which people are qualified to act
directly with each person. By the other way, in an older data analysis (older than 12
months) we observe needs for statistic data that shows evolutive institutional snapshots,
Managing data and business context together in real time and ensuring the data structures
are aligned to handle different levels of granularity and completeness of data is a key
final product has become little helpful in making decisions related to human resources, after
The employees data mart has supported the statistical yearbook report, nowadays, the main
usefulness of the OLAP tool. Dynamic reports were built by team in 2006 and it has been
used since then. Unfortunately this task has been executed for IT area. IT area has used
OLAP tool to extract data, after it converts into spreadsheets, format spreadsheets and
finally send data to final user. The final user hasn’t extracted data because the OLAP tool is
considered very difficult to use and the IT area hasn’t had top management’s support to
The absenteeism data mart was abandoned, nobody extracted data directly on OLAP tool.
The final users always need others dimensions and always need the names of employees.
The IT team has extracted data from data store with SQL consults. Other problem is that a
lot of absent data are edited after the ETL’s day. In the future is very important IT team
The data store has supported many others reports that are important in a Human Resource
Area. These reports have extracted from data store with SQL consults. The team’s
expectation was to implement improvements and develop more data marts, but the urgent
We believe that the main factor to minimize problems is to form a team focused on
Bussiness Intelligence. This team will have primary responsibility for developing the
environment of business intelligence based on the urgent demands and difficulties of users.
This team would provide rapid solutions to promote quality assurance of data and promote
Final words
As highlighted by Zeleny (2007), there is a growing need for information and knowledge in
the Human Resource area. Business Intelligence has been providing this information for
other areas in the organization. It is time for HR to encompass this technology in its favor.
In this paper we provided two sample models for OLAP cubes containing Human Resource
specific data and showed its use through software. We also discussed the progress of the
the literature examples and practices, but must instead define its own methods and models,
We believe that the future of this project relies on making an effort to address the points
covered in Section 4. We hope this could alert other projects regarding the use of BI in
References
Ahlberg, C. (2007). Disrupting the status quo: next-generation BI for the real world.
management.com/infodirect/2007_50/10000412-1.html.
Burke, R. J., & Ng, E. S. W. (2006). The changing nature of work and organizations:
Casati, F., Castellanos, M., & Shan, M. (2004). Enterprise cockpit for business operation
Springer Berlin/Heidelberg.
Chiavenato, I. (2002). Recursos humanos: edição compacta, (7a ed.), Sao Paulo:Atlas.
www.dmreview.com/dmdirect/20050422/1025869-1.html
Decreto Lei Nº 5.452 - Consolidação das Leis do Trabalho, May,01,1943. Retrieved Feb,
Administracao de Pessoal.
Few, S. (2009). Information dashboard design: the effective visual communication of data.
en.wikipedia.org/wiki/ICD
Imhoff, C. (2005). Business intelligence project pitfalls. Retrieved Dec, 23, 2007, from
www.b-eye-network.cn/view/1519
Inmon, W. H. (2005). Building the data warehouse. Wiley Publishers, (4th ed)
Inmon, W., Imhoff, C., & Battas, G. (1996). Building the operational data store. New
Kimball, R., Reeves, L., Ross, M., & Thornthwaite, W. (1998). The data warehouse
Malik, S. (2005). Enterprise dashboards: design and best practices for IT. Wiley Editors.
Ngai, E.W.T., & Wat, F.K.T. (2006). Human resource informatin systems: a review and
empirical analysis. Personnel Review, vol. 35, n. 3., pp. 297-314, Emerald Group
Publishing Limited.
National Institute for Occupational Safety and Health. (2004). Worker health chartbook,
Niven, P. (2008). Balanced scorecard: step-by-step for government and nonprofit agencies.
Wiley Editors.
Power, D. J. (2002). Decision support systems: concepts and resources for managers.
Quorum Books.
Prado, A. B., Freitas, C. E. F, & Sbrici, T. R.(2007). Business intelligence as a HRIS for
Rasmussen, Nils, Goldy, Paul S. & Solli Per O. (2002). Financial business intelligence.
Rogalski, S. (2007). What BI 2.0 means for data management. Retrieved Jan, 05, 2008,
from www.dmreview.com/issues/2007_41/10000378-1.html
Siefer (2006). User interfaces for the exploration of hierarchical multi-dimensional data.