IMRAD
I. INTRODUCTION
Text mining technology has emerged as a potential solution for bridging the
knowledge gap between free-text and structured representations of cancer
information. It employs many computational techniques, such as machine
learning, natural language processing and deep learning, to uncover new
findings hidden in unstructured cancer-related articles. Applications of text
mining to cancer-related articles include identifying mentions of biomedical
entities related to malignant tumors, finding relationships among biomedical
entities (e.g., protein-protein interactions and gene-disease networks),
extracting knowledge from text and generating hypotheses. Cancer is a complex
disease associated with a large number of genes, and biomedical researchers
are interested in mining cancer-related articles to study cancer diagnosis,
treatment and prevention. A number of text mining applications focus
specifically on extracting cancer-related information; Spasić et al. provide a
comprehensive review of these. The review highlights a strong bias towards
symbolic techniques, i.e., the use of pattern matching for cancer-related
entity extraction, which delivers good performance. Pletscher-Frankild et al.
present a system that uses text mining to extract disease-gene associations
from biomedical abstracts. The system consists of a highly efficient
dictionary-based tagger for named entity recognition of genes and diseases,
combined with a scoring scheme that takes into account co-occurrences both
within and between sentences. Gonzalez et al. present an overview of the
fundamental methods for text and data mining, as well as recent advances and
emerging applications toward precision medicine. Baker et al. provide an
extensive Hallmarks of Cancer taxonomy and develop an automatic text mining
methodology for classifying cancer-related articles from PubMed into the
taxonomy. This approach has great potential for organizing and correctly
classifying cancer-related articles.
In this study, we explore the conceptual content of cancer research based on
the abstracts of articles retrieved from PubMed. Findings from this study may
be used to identify gaps in cancer research. In addition, we apply topic
modeling and GloVe word embeddings as the main text mining tasks, so that the
broader scientific community may gain ideas and insights from existing cancer
studies for the development of their own primary studies.
II. METHODS
A. Data Source
Data were retrieved from PubMed (http://www.ncbi.nlm.nih.gov/pubmed/), a
website that provides free access to biomedical journal citations and
abstracts, mainly indexed by MEDLINE [7]. The service is administered by the
National Center for Biotechnology Information (NCBI) of the United States
National Library of Medicine. The search strategy used the query
("neoplasms"[MeSH Terms] OR "neoplasms"[All] OR "cancer"[All]) AND
("2011/01":"2016/12"[DP]), which limited the publication date to the years
2011 to 2016. The search was conducted on July 30, 2016, and a total of
925,648 articles were retrieved.
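For readers who want to rerun a comparable search, the sketch below builds a request to NCBI's public E-utilities ESearch service. It is an illustration, not the authors' retrieval code: the query string is our transcription of the strategy above using PubMed's full field names ([All Fields] for [All]), and the helper names `build_esearch_url` and `count_hits` are hypothetical. Because PubMed keeps growing, a search run today will not return exactly 925,648 records.

```python
import json
import urllib.parse
import urllib.request

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Our transcription of the search strategy, using PubMed's full field names.
QUERY = ('("neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All Fields]) '
         'AND ("2011/01"[DP] : "2016/12"[DP])')


def build_esearch_url(term: str, retmax: int = 0) -> str:
    """Return an ESearch URL asking PubMed for the records matching `term` as JSON.

    retmax=0 requests only the hit count, not the list of PMIDs.
    """
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def count_hits(term: str) -> int:
    """Send the query to PubMed (network access required) and return the hit count."""
    with urllib.request.urlopen(build_esearch_url(term), timeout=30) as response:
        payload = json.load(response)
    return int(payload["esearchresult"]["count"])
```

Running `print(count_hits(QUERY))` sends a live request; NCBI throttles clients that do not supply an API key, so batch scripts should pause between calls.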
B. Word Clouds
Word clouds are visualizations that display the words that frequently occur
in a text. These visualizations are particularly useful when one has no
preconceived idea of which concepts should occur in a text. In word clouds,
words that appear more frequently in a text are printed in larger fonts than
words that occur less often. The advantage of word clouds is that this
visualization is not biased by the use of a predefined set of concepts or an
ontology, but is driven by the raw content of the text. As such, they can
provide new ideas and insights on a particular concept and can function as a
starting point for more specific searches.
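The word-cloud idea reduces to counting terms and scaling font size with frequency. The sketch below assumes simple regular-expression tokenization, a small hypothetical stop-word list and linear scaling between two font sizes; the `suppress_top` parameter mimics hiding the most frequent terms so that the rest remain legible. The function names are illustrative, and dedicated word-cloud libraries additionally handle word placement.

```python
import re
from collections import Counter

# Small illustrative stop-word list (hypothetical); real analyses use fuller lists.
STOPWORDS = {"the", "of", "and", "in", "a", "to", "with", "was", "were", "is", "for"}


def term_frequencies(texts):
    """Count lower-cased word tokens across all texts, ignoring stop words."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts


def font_sizes(counts, suppress_top=0, max_words=50, min_size=10, max_size=60):
    """Map terms to font sizes that grow linearly with frequency.

    The `suppress_top` most frequent terms are dropped first, so that very
    common words do not dominate the display.
    """
    ranked = counts.most_common()[suppress_top:suppress_top + max_words]
    if not ranked:
        return {}
    hi, lo = ranked[0][1], ranked[-1][1]
    span = max(hi - lo, 1)  # avoid division by zero when all counts are equal
    return {word: min_size + (max_size - min_size) * (count - lo) / span
            for word, count in ranked}
```

Linear scaling is the simplest choice; some tools scale by the square root of frequency instead so that a few dominant terms do not shrink everything else.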
C. Topic Model
Previous text mining studies have relied mainly on quantitative measures and
lack content analysis. To incorporate content analysis of cancer-related
articles, we apply text mining techniques. Topic-modeling techniques are
commonly adopted to identify the topics of a subject area and to analyze that
area in greater depth. Latent Dirichlet Allocation (LDA) is the most widely
adopted topic-modeling technique. LDA is a probabilistic generative model that
assumes each document is a mixture of latent topics, where each topic is a
probability distribution over all words in the vocabulary. More precisely, LDA
is a three-level hierarchical Bayesian model in which each item of a
collection is modeled as a finite mixture over an underlying set of topics: it
treats each document as a mixture of topics and each topic as a mixture of
words. LDA assumes the following generative process for each document W in a
corpus D:
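1. Choose a multinomial distribution $\theta_d$ for document $d$,
$\theta_d \sim \mathrm{Dir}(\alpha)$, where $d \in \{1,\dots,M\}$ and
$\mathrm{Dir}(\alpha)$ is a Dirichlet distribution with a symmetric parameter
$\alpha$, which is typically sparse ($\alpha < 1$).
2. Choose a multinomial distribution $\varphi_t$ for topic $t$,
$\varphi_t \sim \mathrm{Dir}(\beta)$, where $t \in \{1,\dots,K\}$ and $\beta$
is a Dirichlet distribution parameter.
3. For each word position $(d, j)$, where $d \in \{1,\dots,M\}$ and
$j \in \{1,\dots,N_d\}$:
(a) choose a topic $z_{d,j} \sim \mathrm{Multinomial}(\theta_d)$;
(b) choose a word $w_{d,j} \sim \mathrm{Multinomial}(\varphi_{z_{d,j}})$.
Here $M$ is the number of documents in D, $K$ is the number of topics and
$N_d$ is the number of words in document $d$.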
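To make the three steps concrete, here is a small NumPy sketch that samples a synthetic corpus following the generative process above. The function name `generate_corpus` and the default values of α and β are illustrative choices, not settings from this study.

```python
import numpy as np


def generate_corpus(num_docs, num_topics, vocab_size, doc_lengths,
                    alpha=0.5, beta=0.1, seed=0):
    """Sample a synthetic corpus by following the LDA generative process.

    Returns theta (docs x topics), phi (topics x words), the sampled topic
    indices z for every word position, and the sampled word ids w.
    """
    rng = np.random.default_rng(seed)
    # Step 1: theta_d ~ Dir(alpha) for every document d (symmetric alpha).
    theta = rng.dirichlet(np.full(num_topics, alpha), size=num_docs)
    # Step 2: phi_t ~ Dir(beta) for every topic t.
    phi = rng.dirichlet(np.full(vocab_size, beta), size=num_topics)
    topics, docs = [], []
    for d in range(num_docs):
        # Step 3(a): z_{d,j} ~ Multinomial(theta_d) for each position j.
        z = rng.choice(num_topics, size=doc_lengths[d], p=theta[d])
        # Step 3(b): w_{d,j} ~ Multinomial(phi_{z_{d,j}}).
        w = np.array([rng.choice(vocab_size, p=phi[t]) for t in z], dtype=int)
        topics.append(z)
        docs.append(w)
    return theta, phi, topics, docs
```

For example, `generate_corpus(3, 2, 5, [4, 6, 8])` yields three documents of 4, 6 and 8 word ids drawn from a five-word vocabulary. Fitting LDA to data generated this way is a quick check that an estimator recovers known θ and φ.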
In this study, we use Gibbs sampling [11] to estimate the LDA parameters.
Gibbs sampling is a Markov chain Monte Carlo (MCMC) algorithm that is powerful
in statistical inference; it generates samples from a joint distribution when
only the conditional distribution of each variable can be computed
efficiently. Gibbs sampling has been widely used to estimate LDA parameters
[12].
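A common way to apply Gibbs sampling to LDA is the collapsed sampler, which integrates out θ and φ and resamples each word's topic from $p(z_{d,j}=k \mid \text{rest}) \propto (n_{d,k}+\alpha)(n_{k,w}+\beta)/(n_k+V\beta)$. Here $n_{d,k}$ counts words in document $d$ assigned to topic $k$, $n_{k,w}$ counts assignments of word $w$ to topic $k$, $n_k$ is the total count for topic $k$ (all excluding the current word), and $V$ is the vocabulary size. The paper does not say which Gibbs variant it used, so the hypothetical function `lda_gibbs` below is a minimal illustration rather than the study's implementation.

```python
import numpy as np


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.5, beta=0.1,
              iterations=200, seed=0):
    """Estimate LDA's theta and phi with collapsed Gibbs sampling.

    `docs` is a list of integer word-id sequences. Returns (theta, phi).
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), num_topics))   # words in doc d assigned to topic k
    nkw = np.zeros((num_topics, vocab_size))  # times word w is assigned to topic k
    nk = np.zeros(num_topics)                 # total words assigned to topic k
    # Random initial topic for every word position.
    z = [rng.integers(num_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for j, w in enumerate(doc):
                k = z[d][j]
                # Remove the current assignment from the counts.
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Full conditional of this word's topic given all other assignments.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(num_topics, p=p / p.sum())
                # Record the new assignment.
                z[d][j] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    # Point estimates from the final state of the chain.
    theta = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + num_topics * alpha)
    phi = (nkw + beta) / (nk[:, None] + vocab_size * beta)
    return theta, phi
```

This version uses only the final sample for the estimates; production implementations usually discard burn-in iterations and may average over several samples.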
The data comprise 925,648 cancer-related publications published between 2011
and 2016. Table 1 lists the journals that published cancer-related studies
between 2011 and 2016. The top 5 journals by number of cancer-related research
articles are PLoS ONE, Oncotarget, Asian Pac. J. Cancer Prev., Tumour Biol.
and J. Clin. Oncol., with 22,543, 9,390, 6,616, 6,364 and 5,235 articles,
respectively. The five-year impact factors of the journals ranged from 2.86 to
16.80, with a mean of 9.83. The top 5 most studied cancer types are breast
cancer (23.82), lung cancer (10.54), prostate cancer (9.90), rectal cancer
(8.44) and ovarian cancer (4.44) (Table 2).
Table 1. Top 10 journals that published cancer-related articles between 2011
and 2016
Title (five-year Bioxbio journal impact factor*) | Frequency | Normalized (N = 925,648)
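The "Normalized" column of Table 1 is not defined in the text. Assuming it expresses each journal's count relative to N = 925,648 (here, per 100 articles), the tally could be produced as below; the function name `journal_table` and the percentage scaling are assumptions. Under that reading, PLoS ONE's 22,543 articles correspond to about 2.44% of N.

```python
from collections import Counter

N_ARTICLES = 925_648  # total retrieved articles, as reported in Data Source


def journal_table(journals, total=N_ARTICLES, top_n=10):
    """Rank journals by article count and express each count per 100 articles.

    Assumption: "Normalized" is taken to mean count / total * 100; the paper
    does not define its normalization, so this is illustrative only.
    """
    return [(name, count, 100.0 * count / total)
            for name, count in Counter(journals).most_common(top_n)]
```

The input is one journal name per retrieved record, for example the journal field of each PubMed citation.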
A. Word Cloud
The term frequencies in the abstracts of the 925,648 articles were visualized
as word clouds, in which a larger word size represents a higher frequency of
appearance among the articles, as shown in Figure 2. The top 5 terms with the
highest frequencies, both within the individual periods and over the entire
period from 2011 to 2016, are patients, cancer, cell, tumor and study. These
five terms are suppressed in the word cloud display to allow better
visualization of the remaining terms. The words ranked sixth to tenth in
frequency are expression, treatment, survival, associated and clinical.
Keyword | Frequency | Secondary keywords with the highest correlation with the keyword (correlation coefficient)
patients | 1,445,688 | median (0.36), months (0.31), treated (0.27), rate (0.23), retrospectively (0.23), age (0.22), surgery (0.22), respectively (0.21), chemotherapy (0.21), retrospective (0.21)
cell | 676,924 | induced (0.21), inhibited (0.21), arrest (0.21), migration (0.21), inhibition (0.21), vitro (0.20), viability (0.20), squamous (0.19), carcinoma (0.18), protein (0.18), stem (0.17), death (0.16), role (0.16)
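The paper does not specify how the keyword correlations were computed. One common approach, sketched below under that assumption, is the Pearson correlation between the keyword's count column and every other term's count column in a document-term matrix (documents as rows, terms as columns). The function name `term_correlations` and the toy matrix in the test are hypothetical.

```python
import numpy as np


def term_correlations(dtm, vocab, keyword, top_n=10):
    """Pearson correlation between one term's column and every other column
    of a document-term count matrix `dtm` (documents x terms)."""
    dtm = np.asarray(dtm, dtype=float)
    target = dtm[:, vocab.index(keyword)]
    scores = []
    for j, term in enumerate(vocab):
        col = dtm[:, j]
        # Skip the keyword itself and constant columns (correlation undefined).
        if term == keyword or col.std() == 0 or target.std() == 0:
            continue
        scores.append((term, float(np.corrcoef(target, col)[0, 1])))
    # Highest positive correlations first.
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_n]
```

On a corpus of this size a sparse matrix and vectorized column statistics would be needed; the loop is kept only for readability.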
C. Topic Model
We employ topic modeling to explore topics, represented as groups of words, in
the abstracts of cancer-related articles. The results of the topic model
analysis of the cancer-related articles are presented in Table 4. As the table
shows, the words cancer and patient dominate these texts; in addition, the
word groups differ meaningfully, covering cancer studies, patient treatment,
survival of patients, risk of patients, cancer types, and studies of cancer
cells and the genome. Topic 1 concerns cancer studies, with the words cancer,
patient, study and data. Topic 2 concerns patient treatment, with the words
patients, cancer, surviving and treatment. Topic 3 concerns genomics and cell
biology. Topic 4 contains the words women, cancer and breast, which is not
surprising because breast cancer is a common cancer in women. Topic 5 concerns
the risk of patients, and Topic 6 concerns the survival of patients. The topic
modeling process has identified groupings of terms that human readers can
interpret.
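Topic descriptions like those above are typically read off the estimated topic-word matrix φ by listing each topic's highest-probability words, and a document's dominant topic is read off θ. The helpers `top_words` and `dominant_topics` and the toy matrices below are hypothetical illustrations, not the study's code.

```python
import numpy as np


def top_words(phi, vocab, n=4):
    """Return the n highest-probability words of each topic (one row of phi per topic)."""
    phi = np.asarray(phi)
    return [[vocab[i] for i in np.argsort(row)[::-1][:n]] for row in phi]


def dominant_topics(theta):
    """Return, for each document (row of theta), the index of its most probable topic."""
    return np.asarray(theta).argmax(axis=1)
```

For instance, with `phi = [[0.5, 0.3, 0.1, 0.1], [0.05, 0.05, 0.6, 0.3]]` and `vocab = ["cancer", "patient", "women", "breast"]`, `top_words(phi, vocab, n=2)` returns `[["cancer", "patient"], ["women", "breast"]]`.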
Table 5. The 15 most frequent keywords and their most similar words according to GloVe
Keyword (top 15 by frequency) | Most similar words
patients | receiving, cases, treated, enrolled, diagnosed, surgery, undergoing, received, advanced, primary
cancer | breast, colorectal, lung, prostate, ovarian, gastric, cervical, bladder, pancreatic, carcinoma
cell | proliferation, migration, lines, apoptosis, growth, human, viability, stem, vitro, differentiation
tumor | metastasis, metastatic, invasion, growth, size, tissue, malignant, lesion, cell, differentiation
study | conducted, present, retrospective, prospective, carried, examined, evaluated, investigate, aim, review
expression | mRNA, overexpression, downregulation, upregulation, protein, miRNA, levels, Furthermore, gene, correlated
treatment | therapy, chemotherapy, adjuvant, regimens, therapies, neoadjuvant, combination, options, radiotherapy, effective
survival | OS (overall survival), overall, disease-free, DFS (disease-free survival), PFS (progression-free survival), progression-free
associated | related, factors, risk, worse, significantly, correlated, increased, poor, significant, occurrence
clinical | evaluation, outcome, regarding, prognosis, practice, data, review, implications, preclinical, diagnostic
risk | mortality, incidence, associated, occurrence, prevalence, odds, among, predictors, adjusted, disease
breast | endometrial, prostate, ovarian, lung, bladder, colorectal, thyroid, melanoma, metastatic, cervical
analysis | multivariate, Multivariate, Cox, univariate, regression, univariate, logistic, Kaplan Meier, statistical, determined
disease | failure, relapse, chronic, recurrence, malignancy, progression, metastases, diagnosis, risk, diabetes
significantly | significantly, decreased, reduced, markedly, increased, whereas, higher, correlated, compared, lower
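The "most similar words" in Table 5 are presumably nearest neighbours in the GloVe embedding space. The text does not name the similarity measure, so this sketch assumes cosine similarity, the usual choice for GloVe vectors. `load_glove_text` parses the plain-text format in which GloVe vectors are distributed (a word followed by its vector components on each line); both function names and the toy vectors in the test are hypothetical.

```python
import numpy as np


def load_glove_text(lines):
    """Parse GloVe's plain-text vector format: a word followed by its components."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) > 1:
            embeddings[parts[0]] = np.array([float(x) for x in parts[1:]])
    return embeddings


def most_similar(embeddings, keyword, top_n=10):
    """Rank the other vocabulary words by cosine similarity to `keyword`."""
    words = [w for w in embeddings if w != keyword]
    matrix = np.array([embeddings[w] for w in words], dtype=float)
    query = np.asarray(embeddings[keyword], dtype=float)
    # Cosine similarity: dot product divided by the product of the vector norms.
    sims = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    order = np.argsort(-sims)[:top_n]
    return [(words[i], float(sims[i])) for i in order]
```

A trained model could be loaded with `load_glove_text(open("vectors.txt", encoding="utf-8"))`, where the file name is a placeholder; `most_similar(embeddings, "patients")` would then list the ten closest words, analogous to one row of Table 5.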
IV. CONCLUSION
Charles C.N. Wang (1,*), I-Seng Chang (1), Phillip C.Y. Sheu (2) and Jeffrey J. P. Tsai (1)
(1) Department of Biomedical Informatics, Asia University, 500, Lioufeng Rd., Wufeng, Taichung 41354, Taiwan
(2) Department of Electrical Engineering and Computer Science, University of California, Irvine, 5200 Engineering Hall, Irvine, CA 92697
* Corresponding author
IMRAD FORM
Reflection
People we spoke to whose cancer had been cured or was in remission talked about the extent to which
they felt able to put the experience behind them and about what had helped them to overcome the
illness. While some said their cancer was always in the back of their mind, or that they were sometimes
reminded of it, others said they rarely thought about it and their life had moved on (see also ‘Facing the
future’). It was common for the illness to be referred to as merely 'an episode' in their life, a bad time, a
blip, or a page in the book of life, and many felt it was important to live their life and not let the illness
ruin the rest of it. Some said it was hard to believe now that it had happened to them and that time was a great
healer. People who were still attending hospital check-ups said they felt odd waiting among patients
who were still having cancer treatment.
I often say that there’s no single right way through cancer. What do I mean by that?
Some people aggressively treat their cancers with surgery and chemotherapy long past the time that
others would have switched to comfort measures.
Some people keep their cancer diagnosis a secret from nearly everyone while others make it a point to
tell strangers on the street.
Some people join support groups while others cringe at the idea.
Some people, when finished with treatment, try hard to not think about cancer ever again, while others
become engaged in cancer organizations and activism.
Some people want to hear only positive stories about cancer, while others want to hear the good, the
bad, and the ugly.
No single approach to cancer is right or wrong. What’s important is that you follow the approach that
helps you.
This is why I don’t like most books written about cancer. They tell you what you should do as if everyone
is the same. They’re written like cookbooks providing recipes.
We’re all different with varying personalities, living situations, and belief systems. In addition, our
cancers are all different. Your breast cancer is unlikely to be exactly like my breast cancer.
And remember that you can and will change your mind over time. A support group might not work for
you when you’re first diagnosed, but be open to the possibility when your treatment is ending.
Finally, no one really knows when they will choose to stop active treatment until they’re in that exact
situation. I can speculate what I would do, but I don’t know for sure. Neither do you.
As for the rest of us, we can support each person with cancer by listening with kindness and without
judgement.
To those who do not have cancer: please do everything you can to prevent and avoid it, because cancer
is a “killing machine” that is often difficult to cure. Research on cancer continues, but we should not
wait until we experience cancer ourselves. Many people with cancer say they have regrets now that they
have experienced this illness. So, as a citizen who has not experienced cancer and hopefully never will,
I urge all of us to listen to and apply what people with cancer suggest. We should keep our immune
systems healthy, eat healthy foods and avoid an unhealthy lifestyle.