SearchingForInformation Lanning2014
There are many, many electronic resources available for use. In this chapter, we examine database basics and how search works. These concepts apply broadly across databases. The focus of this chapter is on vendor-supplied databases, the commercial resources you buy on behalf of your students. Many of these resources are purchased on your behalf by the state, county, or school district for use in the school library. This chapter will help you instruct your students and teachers in the effective use of your library’s electronic resources.
WHAT IS A DATABASE?
Electronic resources are databases of information. Databases store information and provide a means to retrieve it. The standard unit for storing information in a database is the record. A record consists of fields, and each field is a container for a specific type of data. Here is a sample list of fields:
Author
Title of the article
Name of the journal
Publication information
Subject headings
Abstract
Each individual field has little value without the others. Taken together, the data in the fields provide information about one item in the database. The fields listed here make up the standard information found in a database of magazine
106 Reference and Instructional Services
and journal articles. With the exception of the author field, these fields represent required information: you cannot have a record that has only the title of an article and nothing else. A database may include other fields, each containing a specific kind of information. This is how databases impose a structure on the information they collect and how they enable searching.
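To make records and fields concrete, here is a minimal Python sketch of one record in a hypothetical article database. The class name `ArticleRecord`, its field names, and the `search_field` helper are illustrative assumptions, not any vendor’s actual schema; the sketch only shows that each field holds one kind of data and that a search can target a single field.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArticleRecord:
    """One record in a hypothetical article database; each attribute is a field."""
    title: str
    journal: str
    publication_info: str
    subject_headings: List[str] = field(default_factory=list)
    abstract: str = ""
    author: Optional[str] = None  # the one field the text says may be missing


def search_field(records, field_name, term):
    """Return the records whose chosen field contains the term (case-insensitive)."""
    term = term.lower()
    hits = []
    for record in records:
        value = getattr(record, field_name)
        # Subject headings hold a list of terms; the other fields hold one string.
        text = " ".join(value) if isinstance(value, list) else (value or "")
        if term in text.lower():
            hits.append(record)
    return hits


records = [
    ArticleRecord(
        title="Teens and Social Media",
        journal="Example Journal of Youth Studies",
        publication_info="Vol. 1, No. 2",
        subject_headings=["Facebook", "Anxiety"],
        author="A. Writer",
    ),
    ArticleRecord(
        title="Reading Habits of Middle Schoolers",
        journal="Example School Library Review",
        publication_info="Vol. 3, No. 1",
        subject_headings=["Reading"],
    ),
]

print([r.title for r in search_field(records, "subject_headings", "anxiety")])
# ['Teens and Social Media']
```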
Can a printed reference resource be a database? For example, look closely at a
dictionary. It is easy to see the structure used to organize the information into a
record. A record is all the information associated with a word. The fields are the
headword (the word being defined), part of speech, pronunciation, definition,
and so on. All the records together make up the database that is the dictionary.
SEARCH MECHANICS
Search mechanics are the commands the search engine software interprets and the way it interprets them to execute a search. They are the language of the search engine, as opposed to the search terms you enter. They have their own vocabulary and grammar. Whatever you enter into a search box is the
Searching for Information 107
search statement. It is what the search engine processes and parses to execute a search. The search engine looks for command words, such as the Boolean operators, that tell it what to do:
• AND
• OR
• NOT
The operators do not have to be in upper case, but we will write them this way
to make them easier to see in the text of this book.
Venn diagrams are used to illustrate how Boolean operators work. They were developed by the English mathematician John Venn (1834–1923) (Gillispie 1970b). Venn diagrams consist of overlapping circles with the appropriate areas shaded to represent the application of a Boolean operator to search terms and search sets. Figure 12.1 shows examples.
AND
AND is the most important and useful of the Boolean operators. It is the operator of intersection: it finds where search terms overlap, and it narrows search results. If you are searching a database and want to find all the articles that mention both “Facebook” and “anxiety,” you use AND. The search statement is:
Facebook AND anxiety
In the Venn diagram for this search, the two circles overlap, and only the area they share is shaded.
Numbers may help explain how AND works. Suppose you search a database and find 1,000 articles about Facebook. You then do a new search and find 500 articles about anxiety. When you search for “Facebook AND anxiety,” you find 40 articles. These 40 articles are the only articles in your database that mention both of your search terms. This is the intersection of your ideas.
You may use multiple ANDs in your search. Each AND can only keep or reduce the number of articles you find; it never increases it. This is a great way to refine your search and to narrow the focus of your topic. For example, adding a third search term, students, changes our search to:
Facebook AND anxiety AND students
The Venn diagram now becomes more complicated. Figure 12.2 shows the
Venn diagram for two ANDs.
Instead of finding 40 articles, this time the search retrieves only 12. These 12
articles may better reflect the intent of your search and be more relevant.
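Because AND is set intersection, the arithmetic in this example can be reproduced with Python sets. The article ID numbers below are invented so that the counts match the chapter’s example (1,000, 500, 40, and 12); they do not come from a real database.

```python
# Hypothetical article IDs chosen to reproduce the example's counts.
facebook = set(range(0, 1000))     # 1,000 Facebook articles: IDs 0-999
anxiety = set(range(960, 1460))    # 500 anxiety articles: IDs 960-1459
students = set(range(988, 1300))   # an arbitrary, made-up set of "students" articles

# AND is set intersection: keep only the items found in every set.
facebook_and_anxiety = facebook & anxiety
facebook_and_anxiety_and_students = facebook & anxiety & students

print(len(facebook), len(anxiety))              # 1000 500
print(len(facebook_and_anxiety))                # 40
print(len(facebook_and_anxiety_and_students))   # 12
```

Each extra `&` can only shrink the result or leave it unchanged, which is why adding ANDs narrows a search.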
Relevance versus retrieval is a basic concern when searching and a problem for search engine designers. In general, the more items you retrieve, the less relevant they tend to be, and the more relevant the items you retrieve, the fewer of them there are. For example, suppose you are interested in the effect that Facebook use has on the anxiety levels of high school students, and your search statement is simply Facebook.
This search would result in very high retrieval and very low relevance. “False hits” is the term used for results that are not relevant to the search. Using our first example search statement, “Facebook AND anxiety,” would lower retrieval significantly while also significantly increasing the relevance of the retrieved articles. However, as relevance rises and retrieval falls, the likelihood that relevant items will be eliminated from the results also increases.
For example, if we had searched for “Facebook AND anxiety AND depression AND boys AND high school,” the results, if there are any, would be highly relevant. However, we may have cut out the perfect articles: one article studied both boys and girls using the term “students,” another used the term “teenagers” without ever mentioning high school, and a third discussed anxiety disorders but did not directly say “depression.” In this case, we have “missed hits,” relevant items that have been eliminated by a narrow search.
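False hits and missed hits can be expressed as set differences. In this sketch, the set of truly relevant articles is a made-up assumption (in practice you rarely know it in advance), and `evaluate` is a hypothetical helper; it only illustrates the trade-off between a broad search and a narrow one.

```python
def evaluate(retrieved, relevant):
    """Compare what a search retrieved with what was actually relevant."""
    good_hits = retrieved & relevant     # retrieved and relevant
    false_hits = retrieved - relevant    # retrieved but not relevant
    missed_hits = relevant - retrieved   # relevant but not retrieved
    return good_hits, false_hits, missed_hits


# Hypothetical: articles 1-10 are the ones that truly answer the question.
relevant = set(range(1, 11))

broad = set(range(1, 201))   # like searching "Facebook": 200 items, all 10 relevant ones included
narrow = {1, 2, 3}           # like a five-term AND search: 3 items, all relevant

for name, retrieved in [("broad", broad), ("narrow", narrow)]:
    good, false_hits, missed = evaluate(retrieved, relevant)
    print(name, len(good), len(false_hits), len(missed))
# broad 10 190 0
# narrow 3 0 7
```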
Search engine designers look for ways of increasing relevance without adversely affecting retrieval. Some of these methods are examined later in this chapter. As a searcher, you need to be aware of this trade-off and seek a balance between relevance and retrieval in your searches. This concept may be too abstract for many students, but you can present the idea
as narrowing and broadening search results based on how many items the search retrieved. One hundred items are too many; you will need to narrow your search. Four items are too few; you will need to broaden your search.
OR
NOT
For example, you know that there are 1,000 articles about Facebook in your database and 500 articles about anxiety. A search for “Facebook NOT anxiety” will find 960 articles. You know this because your AND search for those terms found 40 articles that overlapped. NOT removes every article that contains the second term, so it also removes the overlap between the two terms.
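NOT corresponds to set difference. Using invented article IDs that reproduce this example’s counts (1,000 Facebook articles, 500 anxiety articles, 40 overlapping), a Python sketch looks like this:

```python
facebook = set(range(0, 1000))    # 1,000 hypothetical Facebook article IDs
anxiety = set(range(960, 1460))   # 500 anxiety IDs; 40 of them (960-999) overlap

# NOT is set difference: start with the first set, remove everything in the second.
facebook_not_anxiety = facebook - anxiety
print(len(facebook_not_anxiety))  # 960, i.e. 1,000 minus the 40-article overlap
```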
NOT is the hardest, and the most specialized, of the Boolean operators to use effectively. It is used to improve the relevance of your search by eliminating ideas you do not want in your results. Suppose you want to search for students but do not want to include “college students” in your search results. You can search for “students NOT college” to eliminate that idea from your search and improve your search’s relevance. Use it with care, though: this search also drops any article that mentions college at all, such as an article about high school students preparing for college.
Searches that use only Boolean operators can be quite effective and efficient. Even so, false hits are commonplace, especially when searching large databases or searching the full text. Many databases contain the full text of the information sources. For example, an article database’s content may be 60 percent full text, with the other 40 percent of the content having only an abstract. A newspaper database’s content may be 100 percent full text, which means that every item in the database represents the full article from the newspaper. While many databases consist largely of full-text items, most databases do not default to a full-text search. The search engine searches only a select set of fields, such as author, title of the article, name of the journal, subject terms, and abstract. If you want to search the entire content of the items in the database, you have to choose a full-text search.
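The difference between a default field search and a full-text search can be sketched as follows. The default field list, the dictionary-based record, and the `matches` function are assumptions for illustration; each vendor decides which fields its default search covers.

```python
# Hypothetical default search fields; real vendors choose their own.
DEFAULT_FIELDS = ("author", "title", "journal", "subjects", "abstract")


def matches(record, term, full_text=False):
    """Return True if the term appears in the searched parts of the record.

    By default only DEFAULT_FIELDS are searched; full_text=True also
    searches the body of the item, when the record has one.
    """
    fields = list(DEFAULT_FIELDS)
    if full_text:
        fields.append("full_text")
    term = term.lower()
    return any(term in record.get(name, "").lower() for name in fields)


article = {
    "title": "Screen Time and Sleep",
    "journal": "Example Health Journal",
    "subjects": "Adolescents; Sleep",
    "abstract": "A survey of high school students' evening device use.",
    "full_text": "... Several respondents mentioned Facebook by name ...",
}

print(matches(article, "facebook"))                  # False: not in the default fields
print(matches(article, "facebook", full_text=True))  # True: found in the body text
```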
Proximity and phrase searching improve the relevance of your search and are extremely useful search tools. They are also extremely limiting. In a proximity search, a word is searched for by its distance from another word. Proximity operators are not used that often anymore. One problem with them is that they are not standardized across vendor platforms: each vendor has its own implementation of proximity. You will need to check the database’s help screen to find how proximity is implemented in the database you are searching.
Here is an example of what a proximity search may look like:
bears N/5 Utah
This search looks for the word “bears” within five words of the word “Utah.” The assumption with proximity searching is that
close proximity implies a strong relationship. Because “bears” was found close to “Utah,” the articles retrieved should be focused on Utah bears, as opposed to Wyoming or Montana bears, though those have not been excluded from the search. The number of words allowed between the search terms can be changed by changing the number in the operator: N/8 would allow up to eight words, while N/2 would allow only two words.
Proximity operators can also require that the search terms occur in a specific
order. In the earlier example, “bears” could show up on either side of the word
“Utah.” If we change the operator to a W/5, then we are telling the search engine
to find “bears” within five words before “Utah.”
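Because each vendor implements proximity differently, the following Python sketch is only one possible reading of N/5 and W/5, assuming (as described above) that the number counts the words allowed between the two terms. The functions `near` and `within` are hypothetical names, and real search engines work from a prebuilt index rather than scanning text this way.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words (a deliberately simple rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def positions(words, term):
    """All word positions at which the term occurs."""
    return [i for i, w in enumerate(words) if w == term]


def near(text, a, b, n):
    """Sketch of N/n: a and b in either order, with at most n words between them."""
    words = tokenize(text)
    return any(
        i != j and abs(i - j) - 1 <= n
        for i in positions(words, a)
        for j in positions(words, b)
    )


def within(text, a, b, n):
    """Sketch of W/n: a must come before b, with at most n words between them."""
    words = tokenize(text)
    return any(
        0 <= j - i - 1 <= n
        for i in positions(words, a)
        for j in positions(words, b)
    )


doc1 = "Black bears are common across much of Utah"  # five words between the terms
doc2 = "Utah is home to bears"                         # terms in reverse order

print(near(doc1, "bears", "utah", 5), near(doc1, "bears", "utah", 4))    # True False
print(near(doc2, "bears", "utah", 5), within(doc2, "bears", "utah", 5))  # True False
```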
There are many proximity operators, and the search engine you are using will determine which ones are available to you. Some require the search terms to be in the same field, within the same sentence, or within the same paragraph. There may be two sets of these operators: one that requires the terms to appear in a specific order and one that does not.
Phrase searching is much less complicated than proximity search, and the
phrase operator is largely the same across databases and even Web search
engines. The phrase operator is the double quotation mark. Anything enclosed in
quotes is searched for exactly as typed. Assume you have a student who wants to
find information about a specific breed of dog. You could construct the search as:
fox AND terriers
However, this search will produce a number of false hits, such as articles that mention a fox in one sentence and terriers in another. You can use proximity to improve your search:
fox W/2 terriers
But a phrase search is the most appropriate and effective:
“fox terriers”
That search will find only articles that mention “fox terriers.” Phrase searching is an ideal way to search for multiword phrases that convey one idea. For example, “Bryce Canyon National Park” is the phrase that represents one specific place, and “information behavior” represents an idea. A phrase search is essentially a proximity search with order and no words in between:
fox W/0 terriers
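A phrase search can be sketched as a check that the phrase’s words appear consecutively and in order, which is what W/0 means when the number counts the words allowed between terms. The helper names and sample sentences below are hypothetical.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def and_match(text, *terms):
    """Plain AND: every term appears somewhere in the text."""
    words = set(tokenize(text))
    return all(term in words for term in terms)


def phrase_match(text, phrase):
    """Phrase search: the phrase's words appear next to each other, in order."""
    words = tokenize(text)
    target = tokenize(phrase)
    k = len(target)
    return any(words[i:i + k] == target for i in range(len(words) - k + 1))


hit = "Fox terriers were bred to chase foxes."
false_hit = "A fox crossed the yard while two terriers slept."

print(and_match(hit, "fox", "terriers"), and_match(false_hit, "fox", "terriers"))  # True True
print(phrase_match(hit, "fox terriers"), phrase_match(false_hit, "fox terriers"))  # True False
```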
Proximity and phrase searching are powerful tools, with phrase searching being the preferred method. These search operators are much more limiting than AND, so care needs to be taken when using them. The longer the phrase, the fewer items will be retrieved, and the more missed hits there will be. A phrase should be the single standard way of expressing an idea. For example, you do not want to search for the phrase “California laws,” since that idea can also be expressed as “laws of California,” “California state laws,” or even “California code.”
Web search engines do a wonderful job looking for alternate spellings and
plurals of the search terms we enter. Vendor search engines were not this
sophisticated in the past, and you needed to use special operators to enable this
kind of searching. Now these search engines also automatically search for plurals and alternate spellings. They are just not yet as good at it as the Web search engines.
Truncation is the process of finding the word endings that can follow a truncated term. The asterisk (*) is widely accepted as the truncation symbol in vendor databases. For example, you can search for change*. The asterisk matches any word in the database that begins with “change,” including “change” itself with no ending. This search would yield change, changes, changed, changeling, or changeable. As you can see, the asterisk performs an OR search. If your truncation looked like this, chang*, the search would also yield “changing.” Truncation does not find synonyms. Our search for change* will not find “alter” or “modify.” It finds only word endings.
Truncation is also known as right-hand truncation because you truncate on the right side of the word. Left-hand truncation is very rare; it is simply not necessary in most searches. However, it is implemented in the electronic version of the Oxford English Dictionary. Left-hand truncation allows you to search for words that end in “gry,” for instance, which is a big help to librarians who are asked to find such words.
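Both kinds of truncation can be modeled as matching against the list of words in a database’s index: prefix matching for right-hand truncation and suffix matching for left-hand truncation, with the matches ORed together. The short word list and the `truncate` function are hypothetical stand-ins for a real index.

```python
# Hypothetical word list standing in for a database's index of words.
INDEX_WORDS = ["alter", "angry", "change", "changeable", "changed",
               "changeling", "changes", "changing", "chant", "hungry", "modify"]


def truncate(pattern, words=INDEX_WORDS):
    """Expand a truncated term into the index words it matches.

    'stem*' is right-hand truncation (any ending, including none);
    '*ending' is left-hand truncation (any beginning).
    """
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return [w for w in words if w.startswith(stem)]
    if pattern.startswith("*"):
        ending = pattern[1:]
        return [w for w in words if w.endswith(ending)]
    return [w for w in words if w == pattern]


# Truncation behaves like an OR search across every matching word.
print(" OR ".join(truncate("change*")))
# change OR changeable OR changed OR changeling OR changes
print(truncate("chang*"))  # the same words plus 'changing'
print(truncate("*gry"))    # ['angry', 'hungry']
```

Note that "alter" and "modify" never match: truncation works on spelling, not meaning.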
Wildcards are used to find alternate spellings of a word. A classic example is colo?r. This search is the equivalent of color OR colour. The question mark is the wildcard symbol. It tells the search engine to match either zero or one additional letter in that position. The only letter that produces a real word here is “u,” which gives us the British spelling “colour.” Vendor databases now handle this alternate spelling without the need for the wildcard operator. However, a less common variant, such as navajo/navaho, can stump the search engine and require a wildcard, as in nava?o. In most cases, wildcards are no longer necessary.
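A wildcard can be modeled by turning the pattern into a regular expression in which ? matches zero or one character, as this chapter describes it. The word list and the `wildcard` function are assumptions for illustration; real databases may use different symbols and rules.

```python
import re

# Hypothetical word list standing in for a database's index of words.
INDEX_WORDS = ["color", "colors", "colour", "navaho", "navajo", "navy"]


def wildcard(pattern, words=INDEX_WORDS):
    """Match index words where each '?' stands for zero or one character."""
    regex = "".join(".?" if ch == "?" else re.escape(ch) for ch in pattern)
    return [w for w in words if re.fullmatch(regex, w)]


print(wildcard("colo?r"))  # ['color', 'colour']
print(wildcard("nava?o"))  # ['navaho', 'navajo']
```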
Truncation and wildcard operators are not standardized. You would need to
consult the help screen from the database vendor to see which characters it
uses for these operators. The need for these operators is dwindling, but they
may still be necessary to improve a search. You need to be aware of them, but
their value to students is very limited.
Order of execution refers to the order in which search engines interpret commands. It may seem like this should not matter, but it does, and it makes a big difference in the search results you retrieve. For example, a student is working with this research question: What impact do genetically modified foods (GMOs) have on insects and birds? The search statement for this research question could look like this:
insects OR birds AND “genetically modified foods” OR GMOs
We understand the intent of the search statement. The search engine does not. It follows rules for interpreting search statements, including which commands to execute in what order. In general, search engines will execute AND statements first and then OR. Figure 12.5 shows how the search engine interprets our search.
The results of this search are not what we expected. While “birds AND genetically modified foods” is good, getting everything in the database about
insects and everything about GMOs is not. Our search is a mess with the
relevant material buried within a large number of irrelevant hits. The Venn
diagram for this search is in Figure 12.6.
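Python happens to give set intersection (`&`) higher precedence than set union (`|`), so it reads an unparenthesized expression the same way a search engine that executes AND before OR would. The document ID numbers are invented for the example.

```python
# Hypothetical document IDs for each search term.
insects = {1, 2, 3}
birds = {3, 4, 5}
gm_foods = {2, 5, 6}   # "genetically modified foods"
gmos = {1, 6, 7}

# & binds more tightly than |, so this is read as
# insects | (birds & gm_foods) | gmos, just like AND-before-OR.
unnested = insects | birds & gm_foods | gmos
print(sorted(unnested))  # [1, 2, 3, 5, 6, 7]
```

Document 3 (insects and birds, no GMOs) and document 7 (GMOs only) are the kind of irrelevant hits the text describes.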
To fix this problem, we need to control the order of execution. One way to do this is to use parentheses, which tell the database to execute the commands found within them first: (insects OR birds) AND (“genetically modified foods” OR GMOs). This search technique is called nesting. You can see in Figure 12.7 that nesting the operators fixed the problem we had with the order of execution in our search. These results are what we wanted. The Venn diagram in Figure 12.8 further illustrates the success of this search.
This is the Venn diagram for our nested search.
Nesting is an essential concept to understand in order to search effectively. In
our example, the parenthetical material in each nest forms a set, and then those
two sets are AND’ed together. That is why the Venn diagram for our search looks
exactly like the Venn diagram for any two-term search. One circle is used for
each set.
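Parentheses work the same way in Python set expressions as they do in a nested search statement. Using invented document IDs, each nest becomes one set, and the two sets are then ANDed:

```python
# Hypothetical document IDs for each search term.
insects = {1, 2, 3}
birds = {3, 4, 5}
gm_foods = {2, 5, 6}   # "genetically modified foods"
gmos = {1, 6, 7}

animals = insects | birds     # first nest: (insects OR birds)
gm_topic = gm_foods | gmos    # second nest: ("genetically modified foods" OR GMOs)
nested = animals & gm_topic   # the two sets ANDed together
print(sorted(nested))         # [1, 2, 5]
```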
Set Logic
The Venn diagram in Figure 12.8 is an example of set logic. The advanced
search screens on vendor-supplied databases use set logic. We will talk about
that more in the next chapter. Using advanced search screens or crafting long
searches with parentheses is one way to use set logic. The other method is often
buried in the search engine under such a heading as “search history.”
Take the previous nesting example, but instead of searching for all the information at once, search only for the idea contained in the first nest, or set. Each set should represent one idea and its synonyms. Next, search for the second idea contained in the second set, then a third idea, and so on. Now you have the results from multiple searches, but the results are not yet related to one another. Finally, you combine the sets using the AND operator and the database’s designation for the sets. It may look like this:
S1 AND S2 AND S3
Using set logic in this way controls the order of execution and allows you to restructure your search easily. You can combine, recombine, and add sets to your search until you get the results that work best for your information need. This technique works well for searches that combine many ideas. If the search retrieves too few hits, a set can be dropped from the search to find more hits. If it finds too many items, a set can be added to retrieve fewer but more relevant hits.
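The search-history approach can be sketched as a small class that saves each search as a numbered set and ANDs saved sets on request. The `SearchHistory` class, its method names, the toy index, and the extra “pesticides” idea are hypothetical; real databases label and combine sets in their own ways.

```python
class SearchHistory:
    """A hypothetical search history that saves each search as S1, S2, S3, ..."""

    def __init__(self, index):
        self.index = index  # term -> set of document IDs
        self.sets = {}      # "S1" -> set of document IDs

    def search(self, *synonyms):
        """Search one idea (its synonyms ORed together) and save it as a new set."""
        results = set()
        for term in synonyms:
            results |= self.index.get(term, set())
        label = f"S{len(self.sets) + 1}"
        self.sets[label] = results
        return label

    def combine(self, *labels):
        """AND saved sets together: combine("S1", "S2") means S1 AND S2."""
        combined = set(self.sets[labels[0]])
        for label in labels[1:]:
            combined &= self.sets[label]
        return combined


index = {
    "insects": {1, 2, 3},
    "birds": {3, 4, 5},
    "genetically modified foods": {2, 5, 6},
    "gmos": {1, 6, 7},
    "pesticides": {2, 5, 8},
}
history = SearchHistory(index)
history.search("insects", "birds")                    # S1
history.search("genetically modified foods", "gmos")  # S2
history.search("pesticides")                          # S3

print(sorted(history.combine("S1", "S2", "S3")))  # [2, 5]
print(sorted(history.combine("S1", "S2")))        # [1, 2, 5]: dropping S3 broadens
```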
Since this search option is usually difficult to locate in a search engine, it is
better to use the advanced search screen and teach students how to use each
line to create a search set that represents one idea.
SEARCH TERMS
In this chapter, we have so far discussed search mechanics, or how search engines work. Now we will turn our attention to the ideas we wish to find and the words we need to use to find them. Search terms are the words we enter into the search engine. Choosing the wrong words can lead to a failed search,
whereas choosing the right words can lead to success. How do we choose the terms for a search?
Keyword
A keyword is a significant word found in the title and frequently used in the
text of a document that represents the content of the whole document
(“Keyword [linguistics]” 2013; “Keyword, N.” 2013; Merriam-Webster, Inc.
2013c). In Web 2.0 terminology, a keyword is a tag. As a search term, a keyword
is a term that, you hope, meets that definition and returns the results you are after. It is a single word or short phrase that embodies one of the concepts or ideas you are searching for.
Choosing the right keywords can be difficult for anyone. Expect it to be especially difficult for students, who have less experience researching and less knowledge of the subjects they are researching. To help students find keywords, start with the research question. A student wants to find information on junk food and its contribution to weight problems in kids.

Pick out the most important concepts, the biggest ideas that describe this question in the fewest words. You should be left with “junk food,” “weight problems,” and “kids.” These are your keywords. Keywords are often nouns. “Contribution” is not a keyword, because it does not further clarify the topic. It has a verb form (“contribute”), and it could be replaced with words like “effect” or “impact” without changing the meaning of the research question. It describes the relationship between the keywords, and that relationship will be captured by the Boolean AND.
Think of synonyms and alternate ways to express these ideas. “Junk food” is a specific concept. “Fast food” may be considered a subset of “junk food” and makes a useful synonym for a search. “Snack food” is too broad and has less of a negative implication; snack food can also be healthy. “Weight problems” could be “obesity,” and “kids” could be “children” or “teens.” Together, “weight problems” and “kids” could be “childhood obesity.” Now you link the ideas together using AND, and your search statement is:
“junk food” AND “childhood obesity”
Now, the evaluation process takes over. Did this search work well? If not, is
there at least one good item on the results list? What keywords did that item
use to describe the topic? Does the search need to be refined and tried again?
Keyword searching can be difficult, because the best keywords for a topic may
not be known. It is important to be flexible, to explore, to evaluate results, and
to refine search statements.
One way to improve the relevance of your search results is to use controlled vocabulary terms as your search terms. A controlled vocabulary is a list of prescribed subject terms that are used to describe items. These lists of terms correspond to a database and were published in book form in the past. The Thesaurus of ERIC Descriptors and the Library of Congress Subject Headings are two examples.
Both of these resources list the prescribed subject terms that the indexers at
ERIC and catalogers across the country are required to use to describe the
Stop Words
Stop words are words that you cannot find with a search. Some stop words, like “and,” “or,” “not,” “with,” and “in,” are the commands used to tell the search engine what to do. Others, like “the,” “it,” “a,” and “an,” are short, common words that would retrieve too many items if you could search for them. Stop words can be thought of as a subset of the controlled vocabulary: a list of prescribed terms that cannot be searched.
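Identifying which words in a query are left to look up as terms can be sketched in a few lines. The stop list below contains only the examples named above, which is an assumption for illustration; real databases maintain their own, usually longer, lists, and they interpret operators such as AND as commands rather than simply discarding them.

```python
# Only the stop words named in this chapter; real lists are longer.
OPERATORS = {"and", "or", "not", "with", "in"}   # command words
COMMON_WORDS = {"the", "it", "a", "an"}          # too common to be useful
STOP_WORDS = OPERATORS | COMMON_WORDS


def searchable_terms(query):
    """Return the words of a query that would be looked up as search terms."""
    return [w for w in query.lower().split() if w not in STOP_WORDS]


print(searchable_terms("The history of a bridge in Utah"))
# ['history', 'of', 'bridge', 'utah']
```

"of" survives only because it is not on this short hypothetical list.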
FEDERATED SEARCH
Federated searching is fading away as Web-scale discovery systems move into the technology spotlight. A federated search allows customers to search
multiple databases across a number of vendors at the same time. There are
advantages to this kind of searching. Customers do not have to pick which
database is the best for their topic. Good resources that are not first choices will
receive more searches and more exposure. Articles, books, eBooks, and media
resources can all be searched from a Google-like search box.
The simplified model given in Figure 12.10 shows that a search using a federated search engine is sent to the vendors of the various databases being searched, including the library catalog. The results are returned from the databases to the federated search software, which processes the results, removes duplicate information, and displays the results to the customer. The results can be displayed in different ways: the results screen can show the results from each individual database searched, or they can be integrated into one list.
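The flow in Figure 12.10 can be sketched in Python: send the query to each source, keep the per-database results, and then merge them into one list while removing duplicates. The source functions are hypothetical stand-ins for remote vendor searches (they ignore the query and return fixed results), and matching duplicates on title alone is a deliberately crude assumption.

```python
def search_catalog(query):
    # Hypothetical stand-in for a search sent to the library catalog.
    return [{"title": "Facebook and Your Teen"}]


def search_magazines(query):
    # Hypothetical stand-in for a search sent to a vendor's magazine database.
    return [{"title": "Teens, Facebook, and Anxiety"}, {"title": "Screen Time at School"}]


def search_journals(query):
    # Hypothetical stand-in for a second vendor that indexes the same article.
    return [{"title": "teens, facebook, and anxiety "}]


def send_to_sources(query, sources):
    """Send the same query to every source and keep the results per database."""
    return {name: search(query) for name, search in sources.items()}


def merge_results(per_source):
    """Integrate per-database results into one list, dropping duplicate titles."""
    seen, merged = set(), []
    for name, results in per_source.items():
        for item in results:
            key = item["title"].strip().lower()  # crude duplicate test: title only
            if key not in seen:
                seen.add(key)
                merged.append({**item, "source": name})
    return merged


sources = {
    "Library catalog": search_catalog,
    "Magazine database": search_magazines,
    "Journal database": search_journals,
}
per_source = send_to_sources("facebook anxiety", sources)  # per-database view
for item in merge_results(per_source):                     # integrated view
    print(item["source"], "|", item["title"])
```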
Federated search can be configured to search subsets of your databases. For
example, you can set up one federated search for your electronic reference
materials, another to search your journal databases, and a third to search all
the library catalogs in the area. Another advantage federated searching offers
is the ability to give customers results from databases they might not have considered searching or did not even know they had. It is a fast and simple way to search through large amounts of information.
Federated searching has its disadvantages. It introduces errors into your usage statistics, as little-used databases show high numbers of searches that may not reflect actual use of the material in the database. If you have databases that limit the number of customers who can use them at the same time, federated searching will take up one of those seats. This can give customers false zero results when too many customers are searching the database at one time. Because federated searching is simplified to work across many resources, the unique search abilities of a particular database can be lost. Finally, some argue that federated searching does not teach the good search skills that students will need in the future and even encourages the “good enough” attitude that students learn from using resources like Google (Rethlefsen 2008).
Web-scale discovery systems are also called discovery layers and discovery
search. Like federated search, discovery search provides one search box to
search all your databases and your catalog. Your library’s special collections
can even be added into the index and kept private for your own searching or
shared with all the other libraries on that vendor’s system. The major difference
is the unified index.
The vendor takes the indexing information from every database it can, including your library catalog and special collections, and puts all this information in one large index, the unified index. The system resides in the cloud: the vendor’s servers host the index and the search software. A wider variety of searching is possible with a discovery search than with a federated search, because the software is no longer limited to the lowest common denominator of the databases being searched.
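A unified index can be sketched as one inverted index (word to records) built in advance from every collection, so a search consults a single index instead of querying each database at search time. The collection names, record IDs, and functions below are hypothetical; real discovery systems also handle ranking, fields, and much more.

```python
from collections import defaultdict


def build_unified_index(collections):
    """Build one inverted index: word -> set of (collection, record ID)."""
    index = defaultdict(set)
    for collection, records in collections.items():
        for record_id, text in records.items():
            for word in text.lower().split():
                index[word].add((collection, record_id))
    return index


def discovery_search(index, *words):
    """AND the words together using only the single, prebuilt index."""
    postings = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*postings) if postings else set()


collections = {
    "catalog": {"b1": "Utah bears field guide"},
    "magazines": {"m7": "Bears return to Utah canyons", "m9": "Montana bears"},
    "special collections": {"s2": "Utah pioneer diaries"},
}
unified = build_unified_index(collections)
print(sorted(discovery_search(unified, "Utah", "bears")))
# [('catalog', 'b1'), ('magazines', 'm7')]
```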
There are economies of scale for the vendor. Creating the unified index is a big expense, but adding libraries to the system is an easier task, and each of these libraries uses the same unified index. This creates more value for the vendor, leading to lower prices as its systems mature and gain clients.
For your library, a discovery search system should require less work on your part than a federated system. With more search features and the ability to search everything in and owned by your library or by all the libraries in your district, discovery search is being hailed as a revolution in libraries. In a webinar presented by SerialsSolutions, Dr. Michael Eisenberg, one of the creators of the Big6, said that teaching students how to search could be de-emphasized because of the effectiveness of discovery search (Eisenberg 2012).
Discovery search offers your students a Google-like search experience while providing access to all of your resources, increasing their value and usage. It provides a useful starting point and a familiar interface for students, offers many options for limiting and refining search results, and does not require students to know which databases to use for their information need.
Challenges still remain. We have not yet reached the level of effectiveness that Eisenberg described. Information overload is a growing problem as more results are returned. A major issue is confusion about what the results list presents, how it is ranked, and how to find a known item (an item you know you own and can access) that did not show up at the top of the list. Students still need to be taught how to use discovery search to get the best and most relevant materials from it or to find specific types of information, like your audiobook collection. One study compared college students’ search results using two different discovery services, Google Scholar, and a standard library resource. One of its conclusions was that discovery search does not eliminate the need for information literacy instruction but may make instruction easier because it can focus on one interface (Asher, Duke, and Wilson 2013).
Discovery search is still a young product, and it should continue to improve
with time.
SEARCH WORKSHEET
Figure 12.12 shows a sample worksheet to aid in the construction of
searches. It will help with the proper use of Boolean ANDs and ORs. It should
help with the use of keywords and synonyms. It also mimics the advanced
search screens that you and your customers will see in various databases and
builds familiarity with this important search option.
Each row represents one concept. As you work from left to right filling in synonyms, you add an OR between each term. When the search is executed, the terms on the first concept line are OR’ed together, then the terms on the second concept line, then the third, and so on. Finally, the lines are AND’ed together. You can add more columns for more synonyms or more rows for more concepts, but the more you add, the more confusing the form and the search become.
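The worksheet’s logic, OR within a row and AND between rows, can be sketched as a function that turns the rows into a nested search statement. The function names are hypothetical, and the sample rows reuse the junk food example from earlier in this chapter.

```python
def format_term(term):
    """Quote multiword terms so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def worksheet_to_query(rows):
    """Each row is one concept: OR its synonyms, then AND the rows together."""
    groups = []
    for row in rows:
        terms = [format_term(t) for t in row if t.strip()]  # skip blank cells
        if not terms:
            continue
        joined = " OR ".join(terms)
        groups.append(f"({joined})" if len(terms) > 1 else joined)
    return " AND ".join(groups)


worksheet = [
    ["junk food", "fast food"],
    ["obesity", "weight problems"],
    ["kids", "children", "teens"],
]
print(worksheet_to_query(worksheet))
# ("junk food" OR "fast food") AND (obesity OR "weight problems") AND (kids OR children OR teens)
```

Wrapping each row in parentheses is what protects the search from the AND-before-OR order of execution.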
Figure 12.13 shows an example of how to use the worksheet.
Vocabulary
and
Boolean operators
controlled vocabulary
database
discovery search
federated search
field
keyword
not
or
order of execution
phrase
record
relevance
retrieval
search engine
synonym
Venn diagram
Questions
Can you draw the parallels between an electronic database and a print reference source or a novel?
Which operator(s) work best to narrow the focus of a search, and why?
Is it better to use truncation or the OR operator to find synonyms, and why?
How is discovery search different from using Google?
Assignment
Create a Venn diagram for the search statement shown in Figure 12.14.
Pick a vendor-supplied database and practice your searching. Pick one term
from each column, search them separately, and then combine them using
Boolean operators. Add a third term of your choosing and combine and recom-
bine them until you feel you understand what the search is doing.
Use the blank search worksheet from Figure 12.12 to write a search statement for the following research question: “What effect does interacting with dogs and cats have on the health of elderly patients?” Run your search in a vendor-supplied database and record your results. Was this an effective search?
Column 1      Column 2
Birds         Environment
Elephants     Global warming
Frogs         Migration
Rabbits       Pollution
Whales        Predators