Drug Information Retrieval & Storage
[email protected]
Vrije Universiteit Brussel
Pleinlaan 2, B-1050 Brussels, Belgium
-Interruptions
-Questions
-Remarks
-Discussions
are welcome
Text information
storage and retrieval systems
Database systems:
definition
Software for
information storage and retrieval
(ISR software)
Text(-oriented) database management systems
(Text-DBMS)
Text information management systems
(TIMS)
Document retrieval systems
Document management systems
Text-information management
systems: characteristics and definition
Text-information management:
from free-form to structure
Functions of
database management software
+
Export / Import
Display formats
Format for output to display, printer, or file
Format for exchange/export purposes
Book (printed index)
Index_term_1 page x1, y1, z1,...
Index_term_2 page x2, y2, z2,...
...
Database (invisible index)
Index_term_1 record nr. x1 / field type nr. x1 / field occurrence x1 / position x1
             record nr. y1 / field type nr. y1 / field occurrence y1 / position y1
             ...
Index_term_2 record nr. x2 / field type nr. x2 / field occurrence x2 / position x2
             record nr. y2 / field type nr. y2 / field occurrence y2 / position y2
             ...
...
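The database index described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records, the field names `title` and `keyword`, and the function name `build_index` are hypothetical, and a field name stands in for the "field type nr." of the slide. Each index term points to a list of (record nr., field, field occurrence, position) entries.

```python
from collections import defaultdict

# Hypothetical sample records: each record has named fields,
# and a field may occur more than once (a list of occurrences).
records = {
    1: {"title": ["Aspirin and fever"], "keyword": ["aspirin", "fever"]},
    2: {"title": ["Fever in children"], "keyword": ["fever"]},
}

def build_index(records):
    """Map each word to (record nr., field, field occurrence, position) entries."""
    index = defaultdict(list)
    for rec_nr, fields in records.items():
        for field, occurrences in fields.items():
            for occ_nr, text in enumerate(occurrences, start=1):
                for pos, word in enumerate(text.lower().split(), start=1):
                    index[word].append((rec_nr, field, occ_nr, pos))
    return index

index = build_index(records)
for entry in index["fever"]:
    print(entry)
```

Looking up a term in such an index avoids reading every record, which is the same service the printed index of a book gives its reader.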
to video display
to printer
to computer file
(printing to a file)
Hierarchy
in the use of a database
Database
structure
Input / Editing
Searching / Output
Information retrieval:
via a database to the user
Information content
Database: linear file + inverted file
Search engine
Information retrieval:
the basic processes in search systems
Information problem    Text documents
Representation         Representation
Result of a search
Layered structure
of a database
Database
(File)
Records
Fields
Characters
+ in many systems: relations / links between records
in the case of multi-linguality:
cross-language information retrieval;
that is when more than 1 language is used
in the contents of the searched database(s)
and/or in the subject descriptors of the searched
database(s) OR
in the search terms used in a query
even when only 1 language is applied
throughout the system
Problem:
A word or phrase or term is not the same as a concept or
subject or topic.
Word
Concept
Word
synonyms!
(such as:
Latin names of species in biology besides the common
names,
scientific names besides common names of substances in
chemistry)
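One common way to cope with synonyms is to expand a search term with its known variants before searching. The sketch below assumes a small hand-made synonym table; the table, the sample documents and the function names `expand_query` and `search` are hypothetical and serve only as illustration.

```python
# Hypothetical synonym table: common name <-> scientific name.
SYNONYMS = {
    "aspirin": {"acetylsalicylic acid"},
    "acetylsalicylic acid": {"aspirin"},
    "honey bee": {"apis mellifera"},
    "apis mellifera": {"honey bee"},
}

def expand_query(term):
    """Return the term together with its known synonyms (lower case)."""
    term = term.lower()
    return {term} | SYNONYMS.get(term, set())

def search(term, documents, use_synonyms=True):
    """Return the positions of documents that contain the term or a synonym."""
    variants = expand_query(term) if use_synonyms else {term.lower()}
    return [i for i, doc in enumerate(documents)
            if any(v in doc.lower() for v in variants)]

documents = [
    "Acetylsalicylic acid reduces fever.",
    "Aspirin is widely used.",
    "Paracetamol is an alternative.",
]

print(search("aspirin", documents, use_synonyms=False))
print(search("aspirin", documents))
```

Without expansion, the document that uses only the scientific name is missed; with expansion, both documents on the same substance are found.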
?? Question ??
Which problems in text retrieval
are illustrated by the following sentences?
OK!
Problem:
A word or phrase can have more than 1 meaning.
Ambiguity of the meaning of a word is a problem for
retrieval.
This decreases the precision of many searches.
The meaning can depend on the context.
The meaning may depend on the region where the term is
used.
Example of a word:
Pascal the philosopher
Pascal the computer language
Example of sentences:
The banks of New Zealand flooded our mailboxes with
free account proposals.
The banks of New Zealand flooded with heavy rains
account for the economic loss.
Problem:
Ambiguity of meaning
may be the cause of low precision.
Relevant concept
Word
Irrelevant concept
NOT wanted
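Precision is usually computed as the fraction of retrieved items that are relevant. The small sketch below shows how an ambiguous word lowers it; the document numbers are invented for illustration, and the function name `precision` is an assumption, not part of any particular retrieval system.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)

# Hypothetical search for "Pascal" by a user interested in the computer language.
# Documents 1 and 3 are about the language, 2 and 4 about the philosopher.
retrieved = [1, 2, 3, 4]
relevant = [1, 3]

print(precision(retrieved, relevant))  # 0.5
```

Half of the retrieved documents match the word but not the wanted concept, so precision drops to 0.5.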
ambiguity of meaning
Word1 Concept1
Word2 Concept2
Word3 Concept3
some problems due to language
Representation Representation
Knowledge organisation:
classifications and thesaurus systems
Knowledge organisation:
introduction
To organise knowledge / documents / books / reports /
information / data / records / things / items / materials
for more efficient storage and retrieval, some related,
similar tools / systems / methods / approaches are used.
Often but not yet always, this process is assisted by a
computer system.
Good systems are expanded and updated when the need
arises.
The organisation system applied should ideally be clearly
and immediately visible to the user of the materials,
or even searchable on computer.
Knowledge organisation:
relations between tools
Controlled
vocabularies
Thesauri
Classification systems:
introduction
Classification systems
present the subjects in a
logical order, usually going
from the more general to the
more specific.
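The ordering from general to specific can be illustrated with a decimal-style notation, in which each extra digit narrows the subject. The scheme below is a simplified sketch loosely modelled on a decimal classification; the labels are illustrative rather than authoritative entries of any real system, and the names `SCHEME` and `path_to` are hypothetical.

```python
# Simplified, illustrative scheme: a longer notation means a more specific subject.
SCHEME = {
    "6": "Applied sciences",
    "61": "Medical sciences",
    "615": "Pharmacology",
    "616": "Pathology",
}

def path_to(notation):
    """List the subjects from the most general class down to the given notation."""
    return [SCHEME[notation[:i]]
            for i in range(1, len(notation) + 1)
            if notation[:i] in SCHEME]

print(path_to("615"))

# Sorting by notation keeps documents on related subjects next to each other.
shelf = sorted(["616", "6", "615", "61"])
print(shelf)
```

Because related subjects share a common prefix, sorting by notation places them side by side, which is why such systems suit the placement of documents.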
Classification systems:
examples of universal systems
Universal means here: covering all subjects
Not just one but several competing systems exist.
Examples
Universal Decimal Classification = UDC
used mainly outside U.S.A.
Dewey Decimal Classification = DDC
used mainly in U.S.A.
Library of Congress Classification
used mainly in U.S.A.
...
Thesaurus:
description
Thesaurus (contents) =
system to control a vocabulary
(= words and phrases + their relations)
+ the contents of this vocabulary
Thesaurus program =
program to create, manage, modify and/or search a
thesaurus using a computer
Thesaurus
relations
Term(s) with broader meaning
BT (= Broader Term)
NT (= Narrower Term)
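The BT / NT relations can be stored as a simple table and used to broaden or narrow a search. The mini-thesaurus below is hypothetical, as are the names `NARROWER`, `broader_terms` and `with_narrower_terms`; it is a sketch of the idea, not the format of any existing thesaurus program.

```python
# Hypothetical mini-thesaurus: each term lists its narrower terms (NT).
NARROWER = {
    "drugs": ["analgesics", "antibiotics"],
    "analgesics": ["aspirin", "paracetamol"],
}

def broader_terms(term):
    """Return the broader terms (BT) of a term, derived from the NT table."""
    return [bt for bt, nts in NARROWER.items() if term in nts]

def with_narrower_terms(term):
    """Return the term plus all of its narrower terms, at every level."""
    result = {term}
    for nt in NARROWER.get(term, []):
        result |= with_narrower_terms(nt)
    return result

print(broader_terms("aspirin"))
print(sorted(with_narrower_terms("analgesics")))
```

A search on "analgesics" that also covers its narrower terms will find documents indexed only under "aspirin" or "paracetamol".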
Thesaurus applications
related to information searching (1)
Thesaurus applications
related to information searching (2)
Thesaurus applications
related to information searching (3)
Thesaurus systems
that cover all subjects
General systems
Universal systems
Covering all subjects
Broad and shallow systems
Horizontal systems
Thesaurus systems
that cover all subjects: examples (1)
Thesaurus systems
that cover all subjects: examples (2)
Knowledge organization:
classifications versus thesauri
Classification
Good for placement of documents (because documents on
many related subjects can be kept together)
Not well suited for computer searching (too complicated)
Thesaurus
Not suited for placement of documents
(because documents with related subjects would NOT be
kept together)
Well suited for computer searching
(relatively simple alphabetic listing of keywords)
Pictures on computers
Graphics formats:
bitmaps and vector graphics
Bitmap/raster graphics
used in programs for painting
Vector-based graphics
used in programs for drawing
Resolution adapted to output medium: - (bitmap), + (vector)
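The difference in how resolution adapts to the output medium can be shown with a toy example. In the sketch below a circle is kept as a vector description and rendered afresh at any size, while an existing bitmap can only be enlarged by repeating its pixels. The shape, the text-based "pixels" and the function names `rasterize_circle` and `upscale_bitmap` are assumptions made for illustration.

```python
def rasterize_circle(size):
    """Render a vector circle (centre 0.5, 0.5; radius 0.4) as a size x size bitmap."""
    rows = []
    for y in range(size):
        row = ""
        for x in range(size):
            # Centre of the pixel in unit coordinates.
            cx, cy = (x + 0.5) / size, (y + 0.5) / size
            inside = (cx - 0.5) ** 2 + (cy - 0.5) ** 2 <= 0.4 ** 2
            row += "#" if inside else "."
        rows.append(row)
    return rows

def upscale_bitmap(bitmap, factor):
    """Enlarge a bitmap by repeating each pixel; no new detail is added."""
    return ["".join(ch * factor for ch in row)
            for row in bitmap
            for _ in range(factor)]

small = rasterize_circle(4)
print("\n".join(upscale_bitmap(small, 4)))  # blocky: enlarged bitmap
print()
print("\n".join(rasterize_circle(16)))      # smoother: vector rendered at 16 x 16
```

Both outputs are 16 x 16, but only the one rendered from the vector description gains detail at the higher resolution.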
Graphics formats
for bitmaps only
File name extension Origin
Graphics formats
for vector graphics (+ bitmaps)
Pictures on computers
Bitmap pictures
Disks
Disks:
overview of various types
Hard disks
Disks:
comparison of types
Floppy Hard Optical Other
Failure rate $ High Low Low
Disk capacity $ Low High High High
Storage cost per bit $ High $ High Low Low
Speed of data access and transfer $ Lowest High $ Low
Exchangeability + $- + $-
Transportability + $- + +
Risk of disk crashes Low $ High Low
Disks:
decreasing prices: 1970-1995
Source cited in 1997:
http://community.bellcore.com/lesk/ages/ages.html
Disks:
decreasing prices: 1995-
Disks:
formats
Physical format Hardware
Disks:
data transfer rate
Transfer rate
= the speed at which the computer reads data from a disk
once the data is found (kB/s)
Compared for: floppy disk, CD-ROM, hard disk
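The definition of transfer rate translates directly into a small calculation: the time to read a file is its size divided by the rate. The sketch below deliberately ignores the time needed to find the data, in line with the definition. The 150 kB/s figure is the nominal rate of a single-speed CD-ROM drive; the file size and the function name `transfer_time_seconds` are made up for illustration.

```python
def transfer_time_seconds(file_size_kb, rate_kb_per_s):
    """Time to read a file once it has been found, at a given transfer rate."""
    if rate_kb_per_s <= 0:
        raise ValueError("transfer rate must be positive")
    return file_size_kb / rate_kb_per_s

# A hypothetical 1500 kB file read from a single-speed CD-ROM drive (150 kB/s).
print(transfer_time_seconds(1500, 150))  # 10.0
```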
Compact disc
Stable since 1985: no increase in capacity
High compatibility with CD and DVD drives
Tape drives
Removable drives
Recordable CD (not rewritable)
Recordable and rewritable CD
Extra hard disk
DVD-RAM
Network drives
Tape drives
+Pros:
inexpensive hardware
low media cost
large capacity
-Cons:
slow
serial storage; no fast random access
Recordable CD (not rewritable)
Named CD-R
The files are not erasable; not rewritable
+Pros:
inexpensive; low media cost
random access storage
disks can be read by most CD-ROM drives
-Cons:
limited to 700 MB
Not erasable / rewritable / reusable
Recordable and rewritable CD
Named CD-RW
The files are erasable; rewritable
+Pros:
inexpensive; low media cost
(but more expensive than CD-R)
random access storage
-Cons:
limited to 700 MB
disks can NOT be read directly by most CD-ROM drives
Extra hard disk
+Pros:
fast
random access storage
-Cons:
expensive, but prices are coming down
not removable / cannot be kept off-site
DVD-RAM
+Pros:
random access storage
large capacity
-Cons:
drives are expensive
not many drives are available
+Pros:
random access storage
large capacity
inexpensive
-Cons:
fragile disks
low reliability
Compact Discs
Media based on
optical discs
Compact Discs:
reading the data
Compact Discs:
storage capacity (Part 1)
Compact Discs:
storage capacity (Part 2)
DVD:
description
DVD-ROM:
comparison of design with CD-ROM
DVD-ROM:
comparison of performance with CD-ROM
Questions?
Suggestions?
Topics for discussion?