Big Data Analytics in Healthcare Industry

Amjad Ali
Szabist University Karachi
Karachi, Pakistan

Dr. Ifran Khan Tanoli
Szabist University Karachi
Karachi, Pakistan

Abstract— Big Data has revolutionized all industries because of the sheer size and enormous amount of data available for analytics. The healthcare industry is no different. Big Data Analytics promises a world of opportunities in healthcare. It can assist in early detection, prevention, and prediction, and thereby improve the quality of life. Many scientists and researchers are working on the use of Big Data technologies for a better impact on health in the future. The health industry is one of the most significant industries, yet it has remained under-focused since the existence of man. Various tools and techniques are being used to collect, process, analyze, and manage large amounts of data in both structured and unstructured form. In this paper, we provide an overview of Big Data, its applicability in the field of healthcare, some of the work in progress, how it can improve the quality of life, the concept of federated learning, and a future outlook.
Keywords—analytics, prevention, detection, big data, federated learning, structured, unstructured

I. INTRODUCTION

Digitization of information systems and the later induction of Big Data technologies have become buzzwords nowadays, whose meaning and use are sometimes misunderstood. Digitization is the process of converting continuous, analog, hard-copy data into discrete, digital, machine-readable information, and it has reached broad popularity with mass digitization projects. A defining feature of the data generated today is its increasing variety in type. Data can be characterized as structured data (traditional text/numeric information), unstructured data (audio, video, images, text, and human language), and semi-structured data, such as XML and RSS feeds. The diversity of data types poses big challenges to organizations seeking to extract value from the extensive informational assets available.

Big data is exactly what the name suggests: a “big” amount of data. Big Data means a data set that is large in volume and more complex. Because of this large volume and higher complexity, traditional data processing software cannot handle it. The size of the dataset and the complexity of the operations needed to process it entail stringent memory storage and computational performance requirements. A Big Data set can be characterized along five dimensions: variety, velocity, volume, value, and veracity.

Big data processing can be done in four layers: data collection, data storage, data processing, and report and data analysis. The main challenge is to collect huge amounts of data from different sources in different formats. As unstructured data increases day by day, it becomes difficult for traditional database management systems to extract knowledge from it. Big Data is the solution to such problems, as it may help in extracting knowledge from structured, unstructured, and semi-structured data.

Industry analyst reports show that Big Data and analytics can generate significant financial value in many vertical markets, including healthcare, finance, retail, environmental research, genomics, and biological and life science research.

Big Data analytics on a single institution's data is vulnerable to institutional data bias, and privacy becomes an issue when data from multiple institutions is brought into a central data lake. One way to solve this privacy problem is federated learning.

Digitization in hospital systems for the provision of quality of life has brought a major change to the medical industry and changed the way we manage and use healthcare data. One of the most notable areas where data analytics is making big changes is healthcare. Big data in healthcare refers to large and complex healthcare datasets that are difficult to manage with common traditional methods, tools, or software. Big data in healthcare is generated by healthcare records (such as patient records, disease surveillance, hospital, medicine, health management, doctor, clinical decision support, or patient feedback) and clinical data (such as imaging, personal and financial records, genetic and pharmaceutical data, Electronic Medical Records, etc.). The generation and management of these enormous healthcare records is considered very complex; thus, big data analytics has been introduced. With the rise of technological innovation and personalised medicine, big data analytics has the potential to make a huge impact on our lives by helping to predict, prevent, manage, treat, and cure disease. Furthermore, it helps government agencies, policy makers, and hospitals to manage resources, improving medical research, planning preventative methods, and managing

In fact, healthcare analytics has the potential to analyze and predict diseases, reduce treatment costs, avoid preventable diseases, and thereby improve the quality of life. The average human lifespan is increasing across the world population, which poses new challenges to today’s treatment delivery methods. Health professionals, just like business entrepreneurs, are capable of collecting massive amounts of data and looking for the best strategies to use these numbers. Healthcare data are collected not only from clinical records, tele-monitoring, and medical tests but also from a large number of healthcare apps. Figure 1 below depicts the ecosystem of healthcare assisted by big data and cloud computing approaches.

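
As a rough illustration of the four processing layers named in the Introduction (data collection, data storage, data processing, and report and data analysis), the following Python sketch pushes three tiny inputs, one structured, one semi-structured, and one unstructured, through each layer in turn. The sample records, the function names, and the keyword matching are all assumptions made for this example; they are not taken from any system described in the paper.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical sample inputs in three formats (made up for illustration).
CSV_SOURCE = "patient_id,diagnosis\n1,asthma\n2,influenza\n"
JSON_SOURCE = '[{"patient_id": 3, "diagnosis": "Asthma"}]'
NOTE_SOURCE = "Patient 4 presents with influenza symptoms."


def collect():
    """Layer 1: gather raw payloads, each tagged with its format."""
    return [("csv", CSV_SOURCE), ("json", JSON_SOURCE), ("text", NOTE_SOURCE)]


def store(raw_items):
    """Layer 2: keep raw payloads unchanged (a stand-in for a data lake)."""
    return list(raw_items)


def process(stored):
    """Layer 3: convert every format into uniform (patient_id, diagnosis) rows."""
    rows = []
    for fmt, payload in stored:
        if fmt == "csv":
            for rec in csv.DictReader(io.StringIO(payload)):
                rows.append((int(rec["patient_id"]), rec["diagnosis"].lower()))
        elif fmt == "json":
            for rec in json.loads(payload):
                rows.append((int(rec["patient_id"]), rec["diagnosis"].lower()))
        else:
            # Unstructured text: a naive keyword match, purely illustrative.
            words = payload.lower().replace(".", "").split()
            patient_id = int(words[1])
            for term in ("asthma", "influenza"):
                if term in words:
                    rows.append((patient_id, term))
    return rows


def report(rows):
    """Layer 4: summarize diagnoses across all sources."""
    return Counter(diagnosis for _, diagnosis in rows)


summary = report(process(store(collect())))
```

Keeping the raw payloads untouched in the storage layer and normalizing only in the processing layer is one possible design choice; it lets new formats be added without rewriting what has already been collected.
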
Figure 1: Healthcare assisted by big data

As per ESOMAR, Vantage Market Research, the market size of AI in healthcare is projected to reach $95.65 billion by 2028.

III. 5 V’S IN HEALTHCARE

A. Volume
Volume refers to the medical records of personal data, clinical data, radiology images, genetics, and population information. The exponential increase in diseases and medications further contributes to the growth of data in the healthcare industry. To handle this data effectively, modern techniques such as advances in data management, cloud computing, and visualization play a vital role in healthcare systems. The healthcare sector generates more than 19 terabytes of clinical data alone each year, and that sum doesn’t even begin to consider other forms of healthcare data. Estimations suggest that the volume of big data in healthcare will exceed 100 zettabytes by 2025, highlighting the immense growth and importance of managing and utilizing such data effectively.

B. Variety
The variety of healthcare records is immense and includes patient information, doctor's notes, prescriptions, official medical records, as well as images from MRI, CT scans, and radiography films.

C. Velocity
Another important characteristic of healthcare data is velocity, which refers to the speed at which data is generated or changes. Healthcare data can be encountered at different velocities, either at rest or in motion. There are situations where high velocity is crucial, often becoming a matter of life or death. In these cases, real-time data monitoring is required. Examples of high-velocity healthcare data include monitoring the internal functioning of the heart, and anesthesia and trauma monitoring for blood pressure fluctuations.

D. Value
The value of data in the healthcare ecosystem refers to the degree to which it is beneficial and useful. For example, raw data like paper prescriptions, official records, or patient information are less valuable than diagnostic records, medicines, and laboratory instrument reading records.

E. Veracity
Veracity refers to the reliability and understandability of healthcare records that capture diagnoses, procedures, treatments, etc.

HEALTHCARE?

Analyzing big data can help healthcare stakeholders deliver efficient procedures and gain insights into patients and their health. Numerous benefits can be obtained with big data analytics. The main sources of healthcare data are Electronic Health Records (EHR), Laboratory Information Management Systems, pharmacy, monitoring and diagnostic instruments, finance (insurance claims and billing), and hospital resources. With the advancement of data acquisition devices and analytics techniques, data sources are being enriched with newer forms of data; for example, hospitals have started to collect genetic information in EHRs as well. Within this vast variety of patient data lie valuable insights for both patients and organizations, which, when applied judiciously, can bring wonderful results. Potential benefits include advanced patient care:

A. Quality of care
EHRs help in assembling demographic and medical data such as clinical data, lab tests, diagnoses, and medical conditions. Discovering associations and patterns within this data helps healthcare practitioners provide quality care, save lives, and lower costs. Analysis of healthcare big data also contributes to greater insight into patient cohorts that are at greater risk of illness, thereby permitting a proactive approach to prevention. Big data analytics can also be used to educate, inform, and motivate patients to take responsibility for their own wellness.

B. Disease prevention and Cost Reduction
Spending more on health does not guarantee health system efficiency. Investment in prevention can help reduce costs as well as improve health quality and efficiency. Health systems face considerable challenges in promoting and protecting health at a time when the burden on finances and resources is substantial in many countries. The early detection and prevention of disease play a very important role in reducing deaths as well as healthcare costs.

C. Error Minimization and Precise Treatment
Prescription errors are a serious problem in healthcare organizations. Because humans will always make the occasional error, patients sometimes end up with the wrong prescription, which could cause harm or even death. Big data can help reduce those error rates dramatically by analyzing the patient’s records with all prescribed treatments and flagging anything that seems out of place.

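
The idea of checking a new prescription against the patient's record and flagging anything that seems out of place can be sketched in a few lines of Python. This is a minimal, rule-based illustration under stated assumptions: the `PatientRecord` fields, the placeholder drug names, the made-up dose history, and the `ratio` tolerance are all hypothetical, and none of them represents a clinical threshold or a method the paper prescribes. A production system would more plausibly learn what counts as unusual from large volumes of historical records.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass
class PatientRecord:
    """Hypothetical, heavily simplified view of a patient's record."""
    allergies: set = field(default_factory=set)
    active_drugs: set = field(default_factory=set)


def flag_prescription(record, drug, dose_mg, historical_doses_mg, ratio=3.0):
    """Return a list of warnings for one new prescription.

    historical_doses_mg holds doses previously prescribed for this drug
    across many records; ratio is an assumed tolerance, not a clinical one.
    """
    warnings = []
    if drug in record.allergies:
        warnings.append(f"{drug}: patient has a recorded allergy")
    if drug in record.active_drugs:
        warnings.append(f"{drug}: duplicate of an active prescription")
    if historical_doses_mg:
        typical = median(historical_doses_mg)
        # Flag doses far above or far below what is usually prescribed.
        if dose_mg > typical * ratio or dose_mg < typical / ratio:
            warnings.append(
                f"{drug}: dose {dose_mg} mg is far from the typical {typical} mg"
            )
    return warnings


record = PatientRecord(allergies={"drug_a"}, active_drugs={"drug_b"})
history = [50, 50, 100, 50, 75]  # made-up doses for illustration

ok = flag_prescription(record, "drug_c", 50, history)
bad = flag_prescription(record, "drug_a", 500, history)
```

Here `ok` comes back empty, while `bad` carries two warnings: one for the recorded allergy and one for the dose that is far from the historical median.
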
D. Advancement in the System
Being able to access numerous data points in just a few seconds would, of course, enable the more rapid discovery of effective solutions. Conditions could be more easily treatable, and personalized solutions for less common health problems might also be obtained. Any advancements in the healthcare sector are worth considering, and currently things seem to be on the right track with the further development of tech tools.

E. Strategic Planning
The use of big data in healthcare allows for strategic planning thanks to better insights into people’s motivations. Care managers can analyze check-up results among people in different demographic groups and identify the factors that discourage people from taking up treatment.

F. Reduce Fraud and Enhance Security
Personal data is extremely valuable, and any breach would have dramatic consequences. With that in mind, many organizations have started to use analytics to help prevent security threats by identifying changes in network traffic or any other behavior that reflects a cyber-attack.

V. HEALTHCARE ANALYTICS MARKET PLAYERS
In 2023, financial analytics is estimated to account for the largest share of the healthcare analytics market. The key players in the healthcare analytics market include:
• IBM (US)
• Optum (US)
• Cerner (US)
• SAS Institute (US)
• Allscripts (US)
• McKesson (US)
• MedeAnalytics (US)
• Inovalon (US)
• Oracle (US)
• Health Catalyst (US)
• Cotiviti (formerly Verscend Technologies) (US)
• CitiusTech (US)
• Wipro (India)
• VitreosHealth (US)

In June 2021, IBM announced a partnership with the City of Hope medical center to use its Watson Health Imaging AI platform to help clinicians better analyze medical images and improve patient care.

In August 2021, Cerner Corporation, a leading health information technology company, acquired Kantar Health, a data analytics and consulting firm, to strengthen its ability to deliver data-driven insights to healthcare organizations.

In September 2021, Optum, a subsidiary of UnitedHealth Group, announced the acquisition of Change Healthcare, a healthcare technology company that specializes in data analytics and revenue cycle management. The acquisition is expected to enhance Optum's analytics capabilities and expand its customer base.

Also in September 2021, Amazon Web Services (AWS) launched a new healthcare analytics platform called Amazon HealthLake, which is designed to help healthcare organizations aggregate and analyze large amounts of patient data.

A. North America
One of the biggest markets for healthcare analytics is North America, which is mainly driven by the United States. Among the factors propelling the growth of the healthcare analytics market in North America are the existence of a well-established healthcare industry, rising demand for high-quality healthcare, and a growing emphasis on value-based care.

B. Asia Pacific
The Asia-Pacific healthcare analytics market is anticipated to expand rapidly, largely due to the influence of nations like China, India, and Japan. Some of the factors propelling the development of the healthcare analytics market in the Asia-Pacific region include the rising demand for high-quality healthcare, the rising adoption of digital technologies, and the rising investments in healthcare.

A. Asthmapolis
Asthmapolis utilizes a GPS-enabled tracker on inhalers to monitor usage and improve asthma treatment. The device records when and where the inhaler is used, transmitting the data to a website. Combined with information on asthma triggers and pollen counts, the system provides personalized insights for patients and helps physicians create targeted treatment plans. This data-driven approach enhances awareness and allows for dosage adjustments based on higher-risk periods of the

B. Battling the Flu
The CDC (Centers for Disease Control and Prevention) utilizes big data analytics to combat influenza, a disease that causes millions of deaths annually. With over 700,000 flu reports received weekly, the CDC analyzes this data to create FluView, an application that provides real-time insights on the spread of the disease. FluView offers precise patient locations, information on flu strains, recommended vaccines, and effective antiviral medications to aid healthcare professionals in treatment decisions and improve patient recovery.

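
To make the flu surveillance case more concrete, the sketch below shows one simple way a stream of individual reports could be rolled up into a per-week, per-region summary with a dominant strain. It is only an illustration of the aggregation idea: the report layout, region labels, and counts are invented for the example, and nothing here reflects the CDC's actual pipeline or any FluView interface.

```python
from collections import Counter, defaultdict

# Hypothetical report tuples: (ISO week, region, strain). Made-up data.
reports = [
    ("2023-W01", "Region 1", "A(H3N2)"),
    ("2023-W01", "Region 1", "A(H3N2)"),
    ("2023-W01", "Region 1", "B"),
    ("2023-W01", "Region 2", "A(H1N1)"),
    ("2023-W02", "Region 1", "A(H3N2)"),
]


def weekly_summary(rows):
    """Count reports per (week, region) and find the most reported strain."""
    groups = defaultdict(Counter)
    for week, region, strain in rows:
        groups[(week, region)][strain] += 1

    summary = {}
    for key, strains in groups.items():
        dominant, _count = strains.most_common(1)[0]
        summary[key] = {"total": sum(strains.values()), "dominant": dominant}
    return summary


summary = weekly_summary(reports)
```

With the sample data, `summary[("2023-W01", "Region 1")]` holds a total of 3 reports with `A(H3N2)` as the dominant strain. At hundreds of thousands of reports per week, the same grouping would normally run on a distributed engine rather than in a single in-memory loop.
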
C. MD Anderson’s Moon Shots Program to Fight Cancer
The Moon Shots Program aims to improve survival for several types of cancer. A collaborative effort between industry and cancer researchers, the program incorporates Big Data and Massive Data Analytics. A Big Data platform facilitates the centralization, integration, and secure access of patient and research data, along with analytical results. The program also utilizes a Massive Data Analytics infrastructure to enable complex analytics and clinical decision support systems. Integrated patient information, including clinical and research data, is leveraged to enhance cancer research and clinical practices within this initiative.

D. NIH Big Data to Knowledge (BD2K)
The NIH Big Data to Knowledge (BD2K) Initiative aims to assist biomedical scientists in effectively utilizing the vast amounts of Big Data generated by research communities.

VIII. RECENT DEVELOPMENT

A. Snowflake
Snowflake announced Snowpipe Streaming at this year’s summit. The company has refactored its Kafka connector and made it so that when data lands in Snowflake it is

Integrating data from multiple sources, tools, and technologies enables informative extrapolations of Big Data in healthcare, leading to innovative solutions. Big data, characterized by dense and complex datasets, has been a longstanding concept. What is noteworthy is the evolving ability of data engineers, scientists, and analysts to effectively manage, experiment with, and analyze this vast resource of raw business insights. Ongoing developments in these areas are now enabling the healthcare industry to carry out more comprehensive and valuable analysis, leading to deeper insights and thereby a better quality of life.

As data continues to grow exponentially, the question remains whether our analytical capabilities can scale fast enough to provide valuable insights in a timely manner. The days of exporting data weekly or monthly and then sitting down to analyze it are long gone. In the future, big data analytics will increasingly focus on data freshness, with the ultimate goal of real-time analysis of data, especially in the case of chronic diseases.

The rise of social media and mobility has empowered patients with greater access to information, making them more knowledgeable about their healthcare options. This is facilitated by tools and technologies that provide access to data and reports, which are supported by powerful Big Data platforms performing extensive searches, aggregations, and pattern recognition. In the near future, we can expect the emergence of new data sources and analytics technologies that will revolutionize the practice of medicine. Future research will focus more on real-time analytics and the privacy of user data (which can be achieved through federated learning); both aspects are critical in the healthcare industry.
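
Federated learning, which the paper points to as a way of protecting user data, can be illustrated with federated averaging (FedAvg), one common scheme; the paper itself does not say which method it has in mind. In the minimal sketch below, each hypothetical hospital fits a one-parameter model on its own data and shares only the resulting weight, which a coordinator averages in proportion to local sample counts. The hospital names, the data, the learning rate, and the round counts are all assumptions chosen for illustration.

```python
def local_update(w, data, lr=0.01, epochs=20):
    """Run gradient descent on one institution's private (x, y) pairs.

    Model: y = w * x with squared-error loss. Only the updated weight
    leaves the institution; the raw pairs never do.
    """
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(w_global, institutions, rounds=30):
    """Average locally trained weights, weighted by local sample count."""
    total = sum(len(data) for data in institutions)
    for _ in range(rounds):
        local = [(local_update(w_global, data), len(data)) for data in institutions]
        w_global = sum(w * n for w, n in local) / total
    return w_global


# Made-up data from three hypothetical hospitals, all following y = 2x.
hospital_a = [(1.0, 2.0), (2.0, 4.0)]
hospital_b = [(3.0, 6.0)]
hospital_c = [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)]

w = federated_average(0.0, [hospital_a, hospital_b, hospital_c])
```

The shared weight converges toward 2 even though no hospital ever sees another's records, which is the property that makes the approach attractive for the privacy concern raised in the Introduction. Real deployments typically add protections such as secure aggregation, which this sketch omits.
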

