{{Short description|Natural language processing system}}
{{Infobox software
| name = Apache cTAKES
| logo = [[File:Apache Ctakes logo.jpg|250px|Apache cTAKES Logo]]
| screenshot =
| caption =
| collapsible =
| developer = [[Apache Software Foundation]]
| latest release version =
| latest release date = {{
| latest preview version =
| latest preview date =
| operating system = [[Cross-platform]]
| repo = {{URL|https://github.com/apache/ctakes|cTakes Repository}}
| programming language = [[Java (programming language)|Java]], [[Scala (programming language)|Scala]], [[Python (programming language)|Python]]
| genre = [[Natural language processing]], [[Bioinformatics]], [[Text mining]], [[Information Extraction]]
| license = [[Apache License 2.0]]
| website = {{
}}
'''Apache cTAKES: clinical Text Analysis and Knowledge Extraction System''' is an open-source [[natural language processing]] (NLP) system that extracts clinical information from the [[unstructured data|unstructured text]] of [[electronic health record]]s. It processes clinical notes and identifies several types of clinical named entities: drugs, diseases/disorders, signs/symptoms, anatomical sites, and procedures. Each named entity carries attributes for its text span, its ontology mapping code, its context (family history, current, or unrelated to the patient), and whether it is negated.<ref>{{Cite book|chapter-url={{google books|yVp4CgAAQBAJ|plainurl=yes}}|title=Health Web Science: Social Media Data for Healthcare|last=Denecke|first=Kerstin|date=2015-08-31|publisher=Springer|isbn=978-3-319-20582-3 |chapter=Tools and Resources for Information Extraction |page=[{{google books|yVp4CgAAQBAJ|page=67|plainurl=yes}} 67] |via=Google Books }}</ref>
cTAKES was built using the [[UIMA|Unstructured Information Management Architecture]] (UIMA) framework and the [[OpenNLP]] natural language processing toolkit.<ref>{{Cite journal|last=Khalifa|first=Abdulrahman|last2=Meystre|first2=Stéphane|date=2015-12-01|title=Adapting existing natural language processing resources for cardiovascular risk factors identification in clinical notes
== Components ==
Components of cTAKES are specifically trained for the clinical domain, and create rich linguistic and semantic annotations that can be utilized by clinical decision support systems and clinical research.<ref>{{Cite journal|last=Savova|first=Guergana K|last2=Masanz|first2=James J|last3=Ogren|first3=Philip V|last4=Zheng|first4=Jiaping|last5=Sohn|first5=Sunghwan|last6=Kipper-Schuler|first6=Karin C|last7=Chute|first7=Christopher G|date=2010|title=Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications|journal=Journal of the American Medical Informatics Association
These components include:
When Dr. Savova moved to [[Boston Children's Hospital]] in early 2010, the core development team grew to include members there. Further external collaborations include:<ref name=cTAKES_history/>
* [[University of Colorado]]
* [[Brandeis University]]
* [[University of Pittsburgh]]
* [[University of California, San Diego]]
Such collaborations have extended cTAKES' capabilities into other areas, such as temporal reasoning, clinical question answering, and coreference resolution for the clinical domain.<ref name=cTAKES_history/>
In 2010, cTAKES was adopted by the [http://www.i2b2.org i2b2] program and is a central component of the [https://web.archive.org/web/20170430025922/https://www.healthit.gov/policy-researchers-implementers/secondary-use-ehr-data SHARP Area 4].<ref name=cTAKES_history/>
In 2013, cTAKES issued its first release as an [[Apache Software Foundation]] incubator project: [http://incubator.apache.org/ctakes/ cTAKES 3.0].{{citation needed|date=July 2020}}
In March 2013, cTAKES became an [[Apache Software Foundation]] Top Level Project (TLP).<ref name=cTAKES_history/>
== See also ==
== References ==
{{Reflist}}
== External links ==
* [
* [https://projects.apache.org/project.html?ctakes Apache cTAKES Project Information page] from [[Apache Software Foundation|ASF]]
* [http://jamia.bmj.com/content/17/5/507.abstract Abstract (JAMIA)]
* [http://code.google.com/p/cleartk/ Computational Language and Education Research toolkit (cleartk)] (''no longer maintained''): developed at the University of Colorado at Boulder, it provides a framework for developing statistical NLP components in Java and is built on top of [[UIMA|Apache UIMA]].
* [http://code.google.com/p/negex/ NegEx]: a tool developed at the University of Pittsburgh to detect negated terms in clinical text. The system uses trigger terms to identify likely negation scenarios within a sentence.
* [https://web.archive.org/web/20111204211529/http://www.dbmi.pitt.edu/blulab/ConText.html ConText]: an extension to NegEx, also developed at the University of Pittsburgh. ConText extends NegEx to detect not only negated concepts but also temporality (recent, historical, or hypothetical scenarios) and the subject of the experience (the patient or someone else).
* [http://metamap.nlm.nih.gov/ MetaMap] (by the [[United States National Library of Medicine]]): a comprehensive concept tagging system built on top of the [[Unified Medical Language System]]. Its use requires an active ''UMLS Metathesaurus License Agreement'' and account.
* [
* [https://web.archive.org/web/20150626172745/http://knowledgemap.mc.vanderbilt.edu/research/content/sectag-tagging-clinical-note-section-headers SecTag] (section tagging hierarchy): recognizes note section headers using NLP, Bayesian, spelling correction, and scoring techniques. Use is free with either a UMLS or LOINC license.
* [http://nlp.stanford.edu/software/CRF-NER.shtml Stanford Named Entity Recognizer (NER)]: a conditional random field sequence model, together with well-engineered features, for named entity recognition in English and German.
* [http://nlp.stanford.edu/software/corenlp.shtml Stanford CoreNLP]: an integrated suite of natural language processing tools for English, written in Java, including [[Lexical analysis#Tokenization|tokenization]], part-of-speech tagging, named entity recognition, parsing, and coreference resolution.
{{Health software}}
[[Category:Apache Software Foundation projects|cTAKES]]
[[Category:Electronic health record software]]
[[Category:Natural language processing software]]
[[Category:Free
[[Category: