Apache cTAKES

This is an old revision of this page, as edited by Kiwi128 at 05:51, 12 June 2017. It may differ significantly from the current revision.

Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) is an open-source natural language processing (NLP) system for extracting information from the clinical free text of electronic health records. It processes clinical notes and identifies several types of clinical named entities: drugs, diseases/disorders, signs/symptoms, anatomical sites, and procedures. Each named entity has attributes for its text span, its mapping code in an ontology, its context (family history of, current, or unrelated to the patient), and whether it is negated.
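The attributes listed above can be pictured as a small record attached to each mention. The following Java sketch is a hypothetical illustration only: the names `ClinicalMention`, `EntityType` and `Context` are invented here and do not reproduce cTAKES's actual UIMA type system, and any ontology code passed in is assumed to be a placeholder.

```java
// Hypothetical model of the per-mention attributes described above.
// These names are illustrative and are NOT the cTAKES UIMA type system.
enum EntityType { DRUG, DISEASE_DISORDER, SIGN_SYMPTOM, ANATOMICAL_SITE, PROCEDURE }

// The three contexts named in the text.
enum Context { CURRENT, FAMILY_HISTORY, UNRELATED_TO_PATIENT }

record ClinicalMention(
        int begin,            // start offset of the text span (inclusive)
        int end,              // end offset of the text span (exclusive)
        EntityType type,      // which kind of clinical named entity this is
        String ontologyCode,  // code the mention maps to in an ontology
        Context context,      // family history, current, or unrelated to patient
        boolean negated) {    // true if the note states the entity is absent

    // Compact constructor: reject spans that cannot be valid offsets.
    ClinicalMention {
        if (begin < 0 || end < begin) {
            throw new IllegalArgumentException("invalid span: " + begin + ".." + end);
        }
    }

    // Returns the text of the note covered by this mention's span.
    String coveredText(String note) {
        return note.substring(begin, end);
    }
}
```

For example, in the note `Patient denies fever.` a mention built with span 15 to 20, type `SIGN_SYMPTOM`, context `CURRENT` and `negated = true` covers the text `fever` and records that the symptom is absent.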

cTAKES
Developer(s): Apache Software Foundation
Repository:
Written in: Java
Operating system: Cross-platform
Type: Natural language processing, bioinformatics, text mining, information extraction
License: Apache License 2.0
Website: ctakes.apache.org

cTAKES is built on the Unstructured Information Management Architecture (UIMA) framework and the OpenNLP natural language processing toolkit. Its components are trained specifically for the clinical domain and produce rich linguistic and semantic annotations that can be used by clinical decision support systems and in clinical research.

These components include:

  • Named section identifier
  • Sentence boundary detector
  • Rule-based tokenizer
  • Formatted list identifier
  • Normalizer
  • Context dependent tokenizer
  • Part-of-speech tagger
  • Phrasal chunker
  • Dictionary lookup annotator
  • Context annotator
  • Negation detector
  • Uncertainty detector
  • Subject detector
  • Dependency parser
  • Patient smoking status identifier
  • Drug mention annotator
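In UIMA, each component runs as an analysis engine that reads, and adds annotations to, a shared analysis structure (the CAS), so later components can build on the output of earlier ones. The Java sketch below imitates that idea with three toy stand-ins for the tokenizer, dictionary lookup annotator and negation detector. Every class in it (`Doc`, `Annotator`, `Pipeline` and so on) is hypothetical and is not part of the cTAKES or UIMA APIs; the negation rule is a simplified trigger-word window in the spirit of NegEx, not cTAKES's own negation algorithm, and the dictionary codes are assumed placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Shared document state that every component reads and extends
// (a toy stand-in for a UIMA CAS; hypothetical, not a real API).
final class Doc {
    final String text;
    final List<int[]> tokens = new ArrayList<>();     // [begin, end) offsets of each token
    final List<Mention> mentions = new ArrayList<>(); // dictionary hits, in text order
    Doc(String text) { this.text = text; }

    // Lower-cased text of token i.
    String token(int i) {
        int[] t = tokens.get(i);
        return text.substring(t[0], t[1]).toLowerCase();
    }
}

final class Mention {
    final int tokenIndex;   // position of the matched token
    final String code;      // placeholder ontology code from the dictionary
    boolean negated;        // set later by the negation step
    Mention(int tokenIndex, String code) { this.tokenIndex = tokenIndex; this.code = code; }
}

// One pipeline component; each adds or updates annotations on the Doc.
interface Annotator { void process(Doc doc); }

// Rule-based tokenizer: each run of letters or digits becomes one token.
final class Tokenizer implements Annotator {
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]+");
    public void process(Doc doc) {
        Matcher m = WORD.matcher(doc.text);
        while (m.find()) doc.tokens.add(new int[] {m.start(), m.end()});
    }
}

// Dictionary lookup: case-insensitive, single-token matches only.
final class DictionaryLookup implements Annotator {
    private final Map<String, String> dictionary;
    DictionaryLookup(Map<String, String> dictionary) { this.dictionary = dictionary; }
    public void process(Doc doc) {
        for (int i = 0; i < doc.tokens.size(); i++) {
            String code = dictionary.get(doc.token(i));
            if (code != null) doc.mentions.add(new Mention(i, code));
        }
    }
}

// NegEx-style rule: scan backwards up to WINDOW tokens for a trigger word,
// stopping early at a scope terminator such as "but".
final class NegationDetector implements Annotator {
    private static final Set<String> TRIGGERS = Set.of("no", "denies", "without");
    private static final Set<String> TERMINATORS = Set.of("but", "however");
    private static final int WINDOW = 5;
    public void process(Doc doc) {
        for (Mention mention : doc.mentions) {
            int stop = Math.max(0, mention.tokenIndex - WINDOW);
            for (int i = mention.tokenIndex - 1; i >= stop; i--) {
                String word = doc.token(i);
                if (TERMINATORS.contains(word)) break;
                if (TRIGGERS.contains(word)) { mention.negated = true; break; }
            }
        }
    }
}

final class Pipeline {
    // Runs components in order; later components see earlier annotations.
    static Doc run(String text, List<Annotator> components) {
        Doc doc = new Doc(text);
        for (Annotator component : components) component.process(doc);
        return doc;
    }
}
```

Run on `Patient denies fever but reports cough.` with `fever` and `cough` in the dictionary, this sketch marks `fever` as negated and leaves `cough` affirmed, because the scope terminator `but` stops the backward search for a trigger. Keeping each step behind one small interface mirrors the UIMA idea of components that can be swapped or reordered independently.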

History

Development of cTAKES began in 2006 at the Mayo Clinic, carried out by a team of physicians, computer scientists, and software engineers led by Dr. Guergana Savova and Dr. Christopher Chute. The system was deployed at Mayo, where it is an integral part of the clinical data management infrastructure and has processed more than 80 million clinical notes.

The core development team is now based at Boston Children's Hospital, following Dr. Savova's move there in early 2010. Collaborations with external groups, including the University of Colorado, Brandeis University, the University of Pittsburgh, and the University of California, San Diego, continue to extend cTAKES into areas such as temporal reasoning, clinical question answering, and coreference resolution for the clinical domain.

In 2010, cTAKES was adopted by the i2b2 program, and it is a central component of SHARPn Area 4.

In 2013, cTAKES published its first release as an Apache Incubator project, cTAKES 3.0.

In March 2013, cTAKES graduated to an Apache Top-Level Project (TLP).[1]

See also