NLP Unit I
Natural Language Processing (NLP) is a field of research which determines the way computers can be used
to understand and manage natural language text or speech to do useful things.
Natural Language Processing (NLP) is a subfield of Artificial Intelligence used to narrow
the communication gap between computers and humans. It originated from the idea of Machine
Translation (MT), which came into existence during the 1950s. The primary aim was to translate one human
language into another, for example Russian into English, using computers; later, the idea of
converting human language into computer language and vice versa emerged, making communication with
machines easier.
NLP is a process in which input provided in a human language is converted into a useful form of
representation. The field of NLP is primarily concerned with getting computers to perform interesting and
useful tasks with human languages, and secondarily with helping us come to a better understanding of
human language.
As stated above, the idea emerged from the need for Machine Translation in the 1950s, when the
original languages were English and Russian. Other languages, such as Chinese, also came into
the picture in the early 1960s.
In the 1960s, NLP got a new life when the idea and need of Artificial Intelligence emerged.
In 1978, LUNAR was developed by W. A. Woods; it could analyze, compare and evaluate the chemical data on
lunar rock and soil composition that was accumulating as a result of the Apollo moon missions, and could
answer related questions.
In the 1980s, computational grammar became a very active field of research, linked
with the science of reasoning about meaning and considering the user's beliefs and intentions.
In the 1990s, the pace of growth of NLP increased. Grammars, tools and practical resources
related to NLP became available, along with parsers. Probabilistic and data-driven models had become quite
standard by then.
By 2000, engineers had a large amount of spoken and textual data available for building systems. Today, a
large amount of work is being done in the field of NLP using Machine Learning and Deep Neural Networks,
with which we can build state-of-the-art models for text classification, question answering, sentiment
classification, and so on.
A variety of outputs can be generated by the system. The output from a system that incorporates NLP
might be an answer from a database, a command to change some data in a database, a spoken response,
the semantics, part of speech or morphology of a word, or some other action on the part of the system.
Remember that these are the outputs of the system as a whole, not the outputs of the NLP component of the
system.
Figure 1.2.1 shows a generic NLP system and its input-output variety. Figure 1.2.2 shows a typical view of
what might be inside the NLP box of Figure 1.2.1. Each of the boxes in Figure 1.2.2 represents one of the
types of processing that make up an NLP analysis.
Figure 1.2.1 Generic NLP system.
NLU vs NLG:
• NLU explains the meaning behind written text or speech in natural language; NLG generates natural
language using machines.
• NLU understands human language and converts it into data; NLG uses structured data and generates
meaningful narratives out of it.
The NLP can broadly be divided into various levels as shown in Figure 1.3.1.
Figure 1.3.1 Levels of NLP
1. Phonology:
It is concerned with the interpretation of speech sounds within and across words.
2. Morphology:
It deals with how words are constructed from more basic meaning units called morphemes. A morpheme is
the primitive unit of meaning in a language. For example, “truth+ful+ness”.
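Morpheme segmentation can be sketched with a simple suffix-stripping routine. This is an illustrative toy, not a real morphological analyzer; the suffix list below is an assumption added here for demonstration:

```python
# Illustrative sketch: greedily strip known suffixes to approximate
# morpheme segmentation. The suffix list is a toy assumption, not a
# complete inventory of English morphology.
SUFFIXES = ["ness", "ful", "ly", "ing", "ed", "s"]

def segment(word):
    """Peel recognized suffixes off the end of a word, innermost last."""
    morphemes = []
    changed = True
    while changed:
        changed = False
        for suf in SUFFIXES:
            # Require a reasonably long remaining stem before stripping.
            if word.endswith(suf) and len(word) > len(suf) + 2:
                morphemes.insert(0, suf)
                word = word[: -len(suf)]
                changed = True
                break
    return [word] + morphemes

print(segment("truthfulness"))  # ['truth', 'ful', 'ness']
```

Real systems must also handle spelling changes at morpheme boundaries (e.g. "happy" + "-ness" → "happiness"), which this sketch ignores.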
3. Syntax:
It concerns how words can be put together to form correct sentences and determines what structural role
each word plays in the sentence and what phrases are subparts of other phrases.
For example, “the dog ate my homework”
4. Semantics:
It is the study of the meaning of words and how these meanings combine in sentences to form sentence
meanings. It is the study of context-independent meaning. For example, "plant" may denote an industrial
plant or a living organism.
Pragmatics concerns how sentences are used in different situations and how use affects the interpretation
of the sentence. Discourse context deals with how the immediately preceding sentences affect the
interpretation of the next sentence, for example when interpreting pronouns and the temporal aspects
of the information.
5. Reasoning:
To produce an answer to a question which is not explicitly stored in a database, a Natural Language Interface
to Database (NLIDB) carries out reasoning based on the data stored in the database. For example, consider a
database that holds student academic information, and a user poses a query such as: "Which student is
likely to fail in the Science subject?" To answer the query, the NLIDB needs a domain expert to narrow
down the reasoning process.
1.4 The Study of Language:
Language is studied in several different academic disciplines. Each discipline defines its own set of
problems and has its own methods for addressing them. The linguist, for instance, studies the structure of
language itself, considering questions such as why certain combinations of words form sentences but others
do not, and why a sentence can have some meanings but not others. The psycholinguist, on the other hand,
studies the processes of human language production and comprehension, considering questions such as how
people identify the appropriate structure of a sentence and when they decide on the appropriate meaning for
words. The philosopher considers how words can mean anything at all and how they identify objects in the
world. Philosophers also consider what it means to have beliefs, goals, and intentions, and how these
cognitive capabilities relate to language. The goal of the computational linguist is to develop a
computational theory of language, using the notions of algorithms and data structures from computer science. Of
course, to build a computational model, you must take advantage of what is known from all the other
disciplines.
Table 1.1 summarizes these different approaches to studying language.
a) Machine Translation
Machine translation is used to convert text or speech from one natural language to another. It is an
integral part of Natural Language Processing in which translation is performed from a source language to a
target language while preserving the meaning of the sentence. Example: Google Translate.
b)Information Retrieval
It refers to the human-computer interaction (HCI) that happens when we use a machine to search a body of
information for information objects (content) that match our search query. A person's query is matched
against a set of documents to find a subset of 'relevant' documents. Examples: Google, Yahoo, AltaVista, etc.
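The matching of a query against a document set can be sketched as a toy ranked-retrieval function that scores each document by how many query terms it shares; real engines use far more sophisticated weighting (e.g. TF-IDF), so this is only an illustration:

```python
# Minimal sketch of query-document matching: score each document by the
# number of query terms it contains, then rank by that score.
def retrieve(query, documents):
    q_terms = set(query.lower().split())
    scored = []
    for doc in documents:
        d_terms = set(doc.lower().split())
        score = len(q_terms & d_terms)   # shared terms between query and doc
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]

docs = ["the cat sat on the mat",
        "dogs chase cats in the park",
        "stock prices rose today"]
print(retrieve("cat on mat", docs))  # ['the cat sat on the mat']
```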
c) Text Categorization
Text categorization (also known as text classification or topic spotting) is the task of automatically sorting a
set of documents into predefined categories.
Uses of Text Categorization
• Filtering of content
• Spam filtering
• Identification of document content
• Survey coding
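As a minimal sketch of the idea (a toy rule, not a trained classifier), a document can be assigned to the category whose keyword list it overlaps most; the categories and keyword lists below are illustrative assumptions:

```python
# Toy keyword-based text categorizer: pick the category whose keyword set
# overlaps the document's words the most. Purely illustrative.
CATEGORIES = {
    "sports": {"match", "goal", "team", "score"},
    "finance": {"stock", "market", "price", "profit"},
}

def categorize(text):
    words = set(text.lower().split())
    best, best_overlap = None, 0
    for label, keywords in CATEGORIES.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best, best_overlap = label, overlap
    return best

print(categorize("the team scored a late goal to win the match"))  # sports
```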
d) Information Extraction
Information extraction identifies specific pieces of information in unstructured or semi-structured textual
documents, and transforms unstructured information in a corpus of documents or web pages into a
structured database.
Applied to different types of text:
• Newspaper articles
• Web pages
• Scientific articles
• Newsgroup messages
• Classified ads
• Medical notes
e) Grammar Correction
In word-processor software like MS Word, NLP techniques are widely used for spelling correction and
grammar checking.
f) Sentiment Analysis
Sentiment analysis is also known as opinion mining. It is mainly used on the web to analyse the behaviour,
attitude, and emotional state of the sender. This application is implemented through a combination of NLP
and statistics: values (neutral, positive or negative) are assigned to the text to identify the mood of the
context (sad, happy, angry, etc.).
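The value-assignment idea can be sketched with a tiny lexicon-based scorer; the positive and negative word lists below are toy assumptions, far smaller than the lexicons used in practice:

```python
# Minimal lexicon-based sentiment scorer: sum +1/-1 for words found in
# small positive/negative lists, then map the total to a label.
POSITIVE = {"happy", "good", "great", "love"}
NEGATIVE = {"sad", "bad", "angry", "hate"}

def sentiment(text):
    score = 0
    for word in text.lower().split():
        if word in POSITIVE:
            score += 1
        elif word in NEGATIVE:
            score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great movie"))  # positive
```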
g) Question-Answering Systems
Question answering focuses on building systems that automatically answer questions asked by humans in a
natural language. Such a system presents only the requested information, instead of searching full
documents as a search engine does. The basic idea behind a QA system is that the user just has to ask a
question and the system will retrieve the most appropriate and correct answer for that question.
E.g.
Q. "What is the birthplace of Shree Krishna?"
A. Mathura
h) Spam Detection
Spam detection is used to detect unwanted e-mails before they reach a user's inbox.
i) Chatbot
The chatbot is one of the most important applications of NLP. It is used by many companies to provide
chat-based customer services.
j) Speech Recognition
Speech recognition is used for converting spoken words into text. It is used in applications such as mobile
devices, home automation, video recovery, dictating to Microsoft Word, voice biometrics, voice user
interfaces, and so on.
k) Text Summarization
This task aims to create short summaries of longer documents while retaining the core content and
preserving the overall meaning of the text.
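One simple approach (an assumed illustration, not the only method) is extractive summarization: score each sentence by the frequency of its words across the document and keep the top-scoring sentences. Splitting sentences on '.' is a simplifying assumption here:

```python
# Sketch of frequency-based extractive summarization: rank sentences by the
# document-wide frequency of the words they contain.
from collections import Counter

def summarize(text, n_sentences=1):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Count how often each word appears across the whole document.
    freq = Counter(word for s in sentences for word in s.lower().split())
    def score(sentence):
        return sum(freq[w] for w in sentence.lower().split())
    ranked = sorted(sentences, key=score, reverse=True)
    return ". ".join(ranked[:n_sentences]) + "."

text = "NLP studies language. NLP studies computers and language. The weather is nice."
print(summarize(text))  # NLP studies computers and language.
```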
NLU enables human-computer interaction. It is the comprehension of human language such as English,
Hindi, Telugu, Spanish and French, for example, that allows computers to understand commands without
the formalized syntax of computer languages. NLU also enables computers to communicate back to humans
in their own languages.
The main purpose of NLU is to create chat- and voice-enabled bots that can interact with the public without
supervision. Many major IT companies, such as Amazon, Apple, Google and Microsoft, and startups have
NLU projects underway.
NLU analyzes data to determine its meaning by using algorithms to reduce human speech into a structured
ontology -- a data model consisting of semantics and pragmatics definitions. Two fundamental concepts of
NLU are intent and entity recognition.
Intent recognition is the process of identifying the user's sentiment in input text and determining their
objective. It is the first and most important part of NLU because it establishes the meaning of the text.
Entity recognition is a specific type of NLU that focuses on identifying the entities in a message, then
extracting the most important information about those entities. There are two types of entities: named
entities and numeric entities. Named entities are grouped into categories -- such as people, companies and
locations. Numeric entities are recognized as numbers, currencies and percentages.
For example, a request for an island camping trip on Vancouver Island on Aug. 18 might be broken down
like this: ferry tickets [intent] / need: camping lot reservation [intent] / Vancouver Island [location] / Aug.
18 [date].
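The breakdown above can be sketched with toy pattern rules; the intent keywords and entity regexes below are hypothetical stand-ins for the trained models that real NLU systems use:

```python
# Toy illustration of intent and entity recognition via hand-written rules.
import re

# Hypothetical intent lexicon: any overlap with these words triggers the intent.
INTENT_KEYWORDS = {
    "book_trip": {"camping", "trip", "reserve", "tickets"},
}

def parse_request(text):
    words = set(text.lower().replace(",", "").split())
    intents = [i for i, kw in INTENT_KEYWORDS.items() if words & kw]
    # Crude entity patterns: a capitalized island name and a month-day date.
    location = re.search(r"on ([A-Z][a-z]+ Island)", text)
    date = re.search(r"(Aug\.? \d{1,2})", text)
    return {
        "intents": intents,
        "location": location.group(1) if location else None,
        "date": date.group(1) if date else None,
    }

print(parse_request("Book a camping trip on Vancouver Island on Aug. 18"))
```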
IVR and message routing. Interactive Voice Response (IVR) is used for self-service and call routing.
Early iterations were strictly touchtone and did not involve AI. However, as IVR technology advanced,
features such as NLP and NLU have broadened its capabilities and users can interact with the phone system
via voice. The system processes the user's voice, converts the words to text, and then parses the
grammatical structure of the sentence to determine the probable intent of the caller.
Customer support and service through intelligent personal assistants. NLU is the technology
behind chatbots, computer programs that converse with a human in natural language via text or
voice. Chatbots follow a script and can only answer questions in that script. These intelligent personal
assistants can be a useful addition to customer service. For example, chatbots are used to provide answers to
frequently asked questions. Accomplishing this involves layers of different processes in NLU technology,
such as feature extraction and classification, entity linking and knowledge management.
Machine translation. Machine learning (ML) is a branch of AI that enables computers to learn and change
behavior based on training data. Machine learning algorithms are also used to generate natural language text
from scratch. In the case of translation, a machine learning algorithm analyzes millions of pages of text --
say, contracts or financial documents -- to learn how to translate them into another language. The more
documents it analyzes, the more accurate the translation. For example, if a user is translating data with an
automatic language tool such as a dictionary, it will perform a word-for-word substitution. However, when
using machine translation, it will look up the words in context, which helps return a more accurate
translation.
Data capture. Data capture is the process of gathering and recording information about an object, person or
event. For example, if an e-commerce company used NLU, it could ask customers to enter their shipping
and billing information verbally. The software would understand what the customer meant and enter the
information automatically.
Conversational interfaces. Many voice-activated devices -- including Amazon Alexa and Google Home --
allow users to speak naturally. By using NLU, conversational interfaces can understand and respond to
human language by segmenting words and sentences, recognizing grammar, and using semantic knowledge
to infer intent.
1.7 The Different Levels of Language Analysis
A natural language-system must use considerable knowledge about the structure of the language
itself, including what the words are, how words combine to form sentences, what the words mean, how
word meanings contribute to sentence meanings, and so on. However, we cannot completely account for
linguistic behavior without also taking into account another aspect of what makes humans intelligent —
their general world knowledge and their reasoning abilities. For example, to answer questions or to
participate in a conversation, a person not only must know a lot about the structure of the language being
used, but also must know about the world in general and the conversational setting in particular.
The following are some of the different forms of knowledge relevant for natural language understanding:
Phonetic and phonological knowledge - concerns how words are related to the sounds that realize them.
Such knowledge is crucial for speech-based systems.
Morphological knowledge - concerns how words are constructed from more basic meaning units called
morphemes. A morpheme is the primitive unit of meaning in a language (for example, the meaning of the
word "friendly" is derivable from the meaning of the noun "friend" and the suffix "-ly", which transforms a
noun into an adjective).
Syntactic knowledge - concerns how words can be put together to form correct sentences and determines
what structural role each word plays in the sentence and what phrases are subparts of what other phrases.
Semantic knowledge - concerns what words mean and how these meanings -combine in sentences to form
sentence meanings. This is the study of context-independent meaning - the meaning a sentence has
regardless of the context in which it is used.
Pragmatic knowledge - concerns how sentences are used in different situations and how use affects the
interpretation of the sentence.
Discourse knowledge-concerns how the immediately preceding sentences affect the interpretation of the
next sentence. This information is especially important for interpreting pronouns and for interpreting the
temporal aspects of the information conveyed.
World knowledge - includes the general knowledge about the structure of the world that language users
must have in order to, for example, maintain a conversation. It includes what each language user must know
about the other user’s beliefs and goals.
Ambiguity can occur at all NLP levels. It is a property of linguistic expressions. If an expression
(word/phrase/sentence) has more than one meaning, we refer to it as ambiguous. To represent meaning, we
must have a more precise language. The tools to do this come from mathematics and logic and involve the
use of formally specified representation languages. Formal languages are specified from very simple
building blocks. The most fundamental is the notion of an atomic symbol which is distinguishable from any
other atomic symbol simply based on how it is written. Useful representation languages have the following
two properties:
● The representation must be precise and unambiguous. You should be able to express every distinct
reading of a sentence as a distinct formula in the representation.
● The representation should capture the intuitive structure of the natural language sentences that it
represents. For example, sentences that appear to be structurally similar should have similar structural
representations, and the meanings of two sentences that are paraphrases of each other should be closely
related to each other.
These sentences share certain structural properties. In each, the noun phrases are "John", "Mary", and "the
book", and the act described is some selling action. In other respects, these sentences are significantly
different. For instance, even though both sentences are always either true or false in the exact same
situations, you could only give sentence 1 as an answer to the question "What did John do for Mary?"
Sentence 2 is a much better continuation of a sentence beginning with the phrase "After it fell in the river",
as sentences 3 and 4 show. Following the standard convention in linguistics, this book will use an asterisk
(*) before any example of an ill-formed or questionable sentence.
Most syntactic representations of language are based on the notion of context-free grammars, which
represent sentence structure in terms of what phrases are subparts of other phrases. This information is often
presented in a tree form, such as the one shown in Figure 1.4, which shows two different structures for the
sentence "Rice flies like sand". In the first reading, the sentence is formed from a noun phrase (NP)
describing a type of fly, rice flies, and a verb phrase (VP) that asserts that these flies like sand. In the
second structure, the sentence is formed from a noun phrase describing a type of substance, rice, and a verb
phrase stating that this substance flies like sand (say, if you throw it). The two structures also give further
details on the structure of the noun phrase and verb phrase and identify the part of speech for each word. In
particular, the word "like" is a verb (V) in the first reading and a preposition (P) in the second.
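The two readings can be written down as bracketed trees. The sketch below simply encodes both structures as nested tuples and confirms they cover the same word string; no parsing is performed:

```python
# The two structures for "Rice flies like sand" as bracketed trees.
reading_1 = ("S",
             ("NP", ("N", "Rice"), ("N", "flies")),   # "rice flies" as a noun phrase
             ("VP", ("V", "like"), ("NP", ("N", "sand"))))
reading_2 = ("S",
             ("NP", ("N", "Rice")),
             ("VP", ("V", "flies"),
              ("PP", ("P", "like"), ("NP", ("N", "sand")))))

def leaves(tree):
    """Collect the words at the fringe of a bracketed tree."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

# Both structures yield the same word string: the sentence is ambiguous.
print(leaves(reading_1) == leaves(reading_2))  # True
```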
4. After it fell in the river, the book was sold to Mary by John.
Many other structural properties can be revealed by considering sentences that are not well-formed.
Sentence 5 is ill-formed because the subject and the verb do not agree in number (the subject is singular and
the verb is plural), while 6 is ill-formed because the verb put requires some modifier that describes where
John put the object.
2) Syntactic Analysis:
Syntactic analysis consists of analysing the words in a sentence for grammar and ordering the words in a way
that shows the relationships among them. For example, a sentence such as "The school goes to boy" is
rejected by the English syntactic analyzer.
3) Semantic Analysis:
Semantic analysis assigns meanings to the structures created by the syntactic analyzer. This component
transfers linear sequences of words into structures and shows how the words are associated with each other.
Semantics focuses only on the literal meaning of words, phrases, and sentences; it draws only the
dictionary meaning, the real meaning, from the given text. The structures assigned by the syntactic
analyzer do not always have meaning.
The text is checked for meaningfulness by mapping the syntactic structures to objects in the task
domain. E.g. "colorless green idea" would be rejected by the semantic analysis, as "colorless green" does
not make any sense.
4) Discourse Integration:
The meaning of any sentence depends upon the meaning of the sentence just before it, and it may also
influence the meaning of the immediately succeeding sentence. For example, in the sentence "He wanted
that", the word "that" depends upon the prior discourse context.
5) Pragmatic Analysis:
Pragmatic analysis is concerned with the overall communicative and social content and its effect on
interpretation. It means abstracting or deriving the meaningful use of language in situations. In this analysis,
what was said is reinterpreted as what was truly meant. It involves deriving those aspects of language which
require real-world knowledge.
E.g., "Close the window" should be interpreted as a request instead of an order.
Morphological Analysis:
Consider an English interface to an operating system where the following sentence is typed: "I want
to print Bill's .init file".
Morphological analysis must do the following things:
• Pull apart the word "Bill's" into the proper noun "Bill" and the possessive suffix "'s".
• Recognize the sequence “.init” as a file extension that is functioning as an adjective in the sentence.
This process will usually assign syntactic categories to all the words in the sentence. Consider the word
“prints”. This word is either a plural noun or a third person singular verb (he prints).
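These morphological steps can be sketched as a toy analyzer; the lexicon entry for "prints" is an assumption added here for illustration:

```python
# Toy morphological analyzer for the steps described above: split a
# possessive, flag a file-extension token, and report lexical ambiguity.
AMBIGUOUS = {"prints": ["plural-noun", "verb-3sg"]}  # illustrative lexicon

def analyze(token):
    if token.endswith("'s"):
        # Separate the root from the possessive suffix.
        return {"root": token[:-2], "suffix": "'s", "role": "possessive"}
    if token.startswith("."):
        # ".init" functions as an adjective modifying "file".
        return {"root": token, "role": "file-extension (adjective use)"}
    if token in AMBIGUOUS:
        return {"root": token, "role": AMBIGUOUS[token]}
    return {"root": token, "role": "unknown"}

print(analyze("Bill's"))
print(analyze(".init"))
print(analyze("prints"))
```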
Syntactic analysis:
This method examines the structure of a sentence and performs a detailed analysis of the sentence and the
semantics of the statement. In order to perform this, the system is expected to have thorough knowledge of
the grammar of the language. The basic unit of any language is the sentence, made up of a group of words
that have their own meanings and are linked together to present an idea or thought. Apart from having
meanings, words fall under categories called parts of speech. In English, there are eight different parts of
speech: nouns, pronouns, adjectives, verbs, adverbs, prepositions, conjunctions and interjections.
In English language, a sentence S is made up of a noun phrase (NP) and a verb phrase (VP), i.e.
S=NP+VP
The noun phrase (NP) normally has an article or determiner (D), an adjective (ADJ) and the
noun (N), i.e.
NP=D+ADJ+N
A noun phrase may also contain a prepositional phrase (PP), which has a preposition (P), a determiner (D)
and the noun (N), i.e.
PP=P+D+N
The verb phrase (VP) has a verb (V) and the object of the verb. The object of the verb may be a noun (N)
with its determiner (D), i.e.
VP=V+D+N
These are some of the rules of English grammar that help one to construct a small parser for NLP.
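A minimal recursive-descent recognizer for a toy version of such rules (S → NP VP, NP → D (ADJ) N, VP → V NP) might look like the sketch below; the small lexicon is an assumption added for illustration:

```python
# Toy recursive-descent recognizer for: S -> NP VP, NP -> D (ADJ) N, VP -> V NP.
LEXICON = {
    "the": "D", "a": "D",
    "big": "ADJ", "hungry": "ADJ",
    "dog": "N", "bone": "N",
    "ate": "V", "chased": "V",
}

def parse_np(tokens, i):
    """NP -> D (ADJ) N. Return the index after the NP, or None on failure."""
    if i < len(tokens) and LEXICON.get(tokens[i]) == "D":
        i += 1
        if i < len(tokens) and LEXICON.get(tokens[i]) == "ADJ":
            i += 1
        if i < len(tokens) and LEXICON.get(tokens[i]) == "N":
            return i + 1
    return None

def parse_vp(tokens, i):
    """VP -> V NP."""
    if i < len(tokens) and LEXICON.get(tokens[i]) == "V":
        return parse_np(tokens, i + 1)
    return None

def parse_s(sentence):
    """S -> NP VP; accept only if every token is consumed."""
    tokens = sentence.lower().split()
    after_np = parse_np(tokens, 0)
    if after_np is None:
        return False
    return parse_vp(tokens, after_np) == len(tokens)

print(parse_s("the hungry dog ate a bone"))  # True
print(parse_s("dog ate bone"))               # False
```

A full parser would build the tree structure rather than just accept or reject, but the control flow mirrors the grammar rules directly.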
Syntactic analysis must exploit the results of morphological analysis to build a structural description of the
sentence. The goal of this process, called parsing, is to convert the flat list of words that forms the sentence
into a structure that defines the units that are represented by that flat list. The important thing here is that a
flat sentence has been converted into a hierarchical structure and that the structures correspond to meaning
units when semantic analysis is performed. Reference markers are shown in the parenthesis in the parse
tree. Each one corresponds to some entity that has been mentioned in the sentence.
Semantic Analysis:
The structures created by the syntactic analyzer are assigned meanings, and a mapping is made between the
syntactic structures and objects in the task domain. Structures for which no such mapping is possible may
be rejected.
Semantic analysis must do two important things: it must map individual words into appropriate objects in
the knowledge base or database, and it must create the correct structures to correspond to the way the
meanings of the individual words combine with each other.
Discourse Integration:
The meaning of an individual sentence may depend on the sentences that precede it, and it may also
influence the meanings of the sentences that follow it. In the example above, we do not know whom the
pronoun "I" or the proper noun "Bill" refers to. Pinning down these references requires an appeal to a model
of the current discourse context, from which we can learn that the current user is USER068 and that the only
person named "Bill" about whom we could be talking is USER073. Once the correct referent for Bill is
known, we can also determine exactly which file is being referred to.
Pragmatic Analysis
The structure representing what was said is reinterpreted to determine what was actually meant. The
final step toward effective understanding is to decide what to do as a result. One possible thing to do is to
record what was said as a fact and be done with it. For some sentences, whose intended effect is clearly
declarative, that is precisely the correct thing to do. But for other sentences, including this one, the intended
effect is different. We can discover this intended effect by applying a set of rules that characterize
cooperative dialogues. The final step in pragmatic processing is to translate from the knowledge-based
representation to a command to be executed by the system.
Results of each of the main processes combine to form a natural language system.
The result of the understanding process is lpr /ali/stuff.init. All of the processes are important in a
complete natural language understanding system. Not all programs are written with exactly these
components; sometimes two or more of them are collapsed. Doing that usually results in a system that is
easier to build for restricted subsets of English but harder to extend to wider coverage.
Words
The Elements of Simple Noun Phrases
Verb Phrases and Simple Sentences
Noun Phrases Revisited
Adjective Phrases
Adverbial Phrases
1. Words
At first glance the most basic unit of linguistic structure appears to be the word. The word, though, is
far from the fundamental element of study in linguistics; it is already the result of a complex set of
more primitive parts. The study of morphology concerns the construction of words from more basic
components corresponding roughly to meaning units. There are two basic ways that new words are
formed, traditionally classified as inflectional forms and derivational forms. Inflectional forms use a
root form of a word and typically add a suffix so that the word appears in the appropriate form given
the sentence. Verbs are the best examples of this in English. Each verb has a basic form that then is
typically changed depending on the subject and the tense of the sentence. For example, the verb sigh
will take suffixes such as -s, -ing, and -ed to create the verb forms sighs, sighing, and sighed,
respectively. These new words are all verbs and share the same basic meaning. Derivational
morphology involves the derivation of new words from other forms. The new words may be in
completely different categories from their subparts. For example, the noun friend is made into the
adjective friendly by adding the suffix -ly. A more complex derivation would allow you to derive the
noun friendliness from the adjective form. There are many interesting issues concerned with how
words are derived and how the choice of word form is affected by the syntactic structure of the
sentence that constrains it.
Traditionally, linguists classify words into different categories based on their uses. Two related areas
of evidence are used to divide words into categories. The first area concerns the word's contribution to
the meaning of the phrase that contains it, and the second area concerns the actual syntactic structures
in which the word may play a role. For example, you might posit the class noun as those words that
can be used to identify the basic type of object, concept, or place being discussed, and adjective
as those words that further qualify the object, concept, or place. Thus green would be an adjective
and book a noun, as shown in the phrases the green book and green books. But things are not so
simple: green might play the role of a noun, as in That green is lighter than the other, and book
might play the role of a modifier, as in the book worm. In fact, most nouns seem to be able to be used
as a modifier in some situations. Perhaps the classes should be combined, since they overlap a great
deal. But other forms of evidence exist. Consider what words could complete the sentence It’s so . . . .
You might say It’s so green, It’s so hot, It’s so true, and so on. Note that although book can be a
modifier in the book worm, you cannot say *It’s so book about anything. Thus there are two classes of
modifiers: adjective modifiers and noun modifiers.
Consider again the case where adjectives can be used as nouns, as in the green. Not all adjectives
can be used in such a way. For example, the noun phrase the hot can be used, given a context
where there are hot and cold plates, in a sentence such as The hot are on the table. But this refers to the
hot plates; it cannot refer to hotness in the way the phrase the green refers to green. With this evidence
you could subdivide adjectives into two subclasses—those that can also be used to describe a concept
or quality directly, and those that cannot. Alternatively, however, you can simply say that green is
ambiguous between being an adjective or a noun and, therefore, falls in both classes. Since green can
behave like any other noun, the second solution seems the most direct.
Using similar arguments, we can identify four main classes of words in English that contribute to
the meaning of sentences. These classes are nouns, adjectives, verbs, and adverbs. Sentences are built
out of phrases centered on these four word classes. Of course, there are many other classes of words
that are necessary to form sentences, such as articles, pronouns, prepositions, particles, quantifiers,
conjunctions, and so on. But these classes are fixed in the sense that new words in these classes are
rarely introduced into the language. New nouns, verbs, adjectives and adverbs, on the other hand, are
regularly introduced into the language as it evolves. As a result, these classes are called the open class
words, and the others are called the closed class words.
A word in any of the four open classes may be used to form the basis of a phrase. This word is
called the head of the phrase and indicates the type of thing, activity, or quality that the phrase
describes. For example, with noun phrases, the head word indicates the general classes of objects
being described. The phrases
the dog
the mangy dog
the mangy dog at the pound
are all noun phrases that describe an object in the class of dogs. The first describes a member from
the class of all dogs, the second an object from the class of mangy dogs, and the third an object
from the class of mangy dogs that are at the pound. The word dog is the head of each of these phrases.
Figure 2.1 Examples of heads and complements (noun phrases, verb phrases, and phrases such as happy that he’d won the prize, intermittently throughout the day, and inside the house).
Excluding pronouns and proper names, the head of a noun phrase is usually a common noun.
Nouns divide into two main classes:
count nouns—nouns that describe specific objects or sets of objects.
mass nouns—nouns that describe composites or substances.
Count nouns acquired their name because they can be counted. There may be one dog or many dogs,
one book or several books, one crowd or several crowds. If a single count noun is used to describe a
whole class of objects, it must be in its plural form. Thus you can say Dogs are friendly but not *Dog
is friendly.
Mass nouns cannot be counted. There may be some water, some wheat, or some sand. If you try
to count with a mass noun, you change the meaning. For example, some wheat refers to a portion of
some quantity of wheat, whereas one wheat is a single type of wheat rather than a single grain of
wheat. A mass noun can be used to describe a whole class of material without using a plural form.
Thus you say Water is necessary for life, not *Waters are necessary for life.
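The count/mass distinction can be encoded as a lexical feature. The following Python sketch (the word lists and the `generic_statement` helper are illustrative, not from the text) shows how the generic form of a noun phrase depends on that feature:

```python
# Small illustrative lexicon: count nouns vs. mass nouns.
COUNT = {"dog", "book", "crowd"}
MASS = {"water", "wheat", "sand"}

def generic_statement(noun):
    """A whole class is named by the plural of a count noun
    (Dogs are friendly) but by the bare singular of a mass noun
    (Water is necessary for life)."""
    if noun in COUNT:
        return noun + "s"   # simplistic pluralization for this sample
    if noun in MASS:
        return noun
    raise KeyError(noun)

print(generic_statement("dog"))    # -> dogs   (Dogs are friendly.)
print(generic_statement("water"))  # -> water  (Water is necessary for life.)
```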
In addition to a head, a noun phrase may contain specifiers and qualifiers preceding the head.
The qualifiers further describe the general class of objects identified by the head, while the specifiers
indicate how many such objects are being described, as well as how the objects being described relate
to the speaker and hearer. Specifiers are constructed out of ordinals (such as first and second),
cardinals (such as one and two), and determiners. Determiners can be subdivided into the following
general classes:
articles—the words the, a, and an.
possessives—noun phrases followed by the suffix ’s, such as John’s and the fat man’s, as well as
possessive pronouns, such as her, my, and whose.
quantifying determiners—words such as some, every, most, no, any, both, and half.
Number      First Person    Second Person    Third Person
singular    I               you              he (masculine)
                                             she (feminine)
                                             it (neuter)
plural      we              you              they
Figure 2.2 The subject pronouns
A simple noun phrase may have at most one determiner, one ordinal, and one cardinal. It is
possible to have all three, as in the first three contestants. An exception to this rule exists with a few
quantifying determiners such as many, few, several, and little. These words can be preceded by an
article, yielding noun phrases such as the few songs we knew. Using this evidence, you could
subcategorize the quantifying determiners into those that allow this and those that don't, but the present
coarse categorization is fine for our purposes at this time.
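The ordering constraint on specifiers (at most one determiner, then at most one ordinal, then at most one cardinal) can be checked mechanically. Below is a minimal Python sketch; the word lists are tiny illustrative samples and the function name `valid_specifiers` is my own:

```python
# Illustrative samples of each specifier category.
DETERMINERS = {"the", "a", "an", "some", "every", "no"}
ORDINALS = {"first", "second", "third", "last"}
CARDINALS = {"one", "two", "three", "four"}

def valid_specifiers(words):
    """True if words follow the pattern: at most one determiner,
    then at most one ordinal, then at most one cardinal."""
    stages = [DETERMINERS, ORDINALS, CARDINALS]
    stage = 0
    for w in words:
        # Skip past categories that cannot (or can no longer) match this word.
        while stage < len(stages) and w not in stages[stage]:
            stage += 1
        if stage == len(stages):
            return False  # word out of order, or its category already used
        stage += 1        # each category may appear at most once
    return True

print(valid_specifiers(["the", "first", "three"]))  # True: the first three contestants
print(valid_specifiers(["first", "the"]))           # False: *first the
print(valid_specifiers(["the", "the"]))             # False: *the the
```

Quantifiers such as many and few, which can follow an article (the few songs we knew), would need a further category in a fuller treatment.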
The qualifiers in a noun phrase occur after the specifiers (if any) and before the head. They consist
of adjectives and nouns being used as modifiers. The following are more precise definitions:
adjectives—words that attribute qualities to objects yet do not refer to the qualities themselves (for
example, angry is an adjective that attributes the quality of anger to something).
noun modifiers—mass or count nouns used to modify another noun, as in the cook book or the ceiling
paint can.
Before moving on to other structures, consider the different inflectional forms that nouns take and
how they are realized in English. Two forms of nouns—the singular and plural forms—have already
been mentioned. Pronouns take forms based on person (first, second, and third) and gender
(masculine, feminine, and neuter). Each of these distinctions reflects a systematic analysis that is
almost wholly explicit in some languages, such as Latin, while implicit in others. In French, for
example, nouns are classified by their gender. In English many of these distinctions are not explicitly
marked except in a few cases. The pronouns provide the best example of this. They distinguish
number, person, gender, and case (that is, whether they are used as possessive, subject, or object), as
shown in Figures 2.2 through 2.4.
Number      First Person    Second Person    Third Person
singular    me              you              him (masculine)
                                             her (feminine)
                                             it (neuter)
plural      us              you              them
Figure 2.3 The object pronouns
Form                 Examples                        Example uses
base                 hit, cry, go, be                I want to go.
simple present       hit, cries, go, am              The dog cries every day. I am thirsty.
simple past          hit, cried, went, was           I was thirsty. I went to the store.
present participle   hitting, crying, going, being   I'm going to the store. Being the last in line aggravates me.
past participle      hit, cried, gone, been          I've been there before. The cake was gone.
Figure 2.6 The five verb forms
A verb phrase consists of auxiliaries and modifiers followed by the head verb and its complements.
Every verb must appear in one of the five possible forms shown in Figure 2.6.
Verbs can be divided into several different classes: the auxiliary verbs, such as be, do, and have;
the modal verbs, such as will, can, and could; and the main verbs, such as eat, run, and believe. The
auxiliary and modal verbs usually take a verb phrase as a complement, which produces a sequence
of verbs, each the head of its own verb phrase. These sequences are used to form sentences with
different tenses.
The tense system identifies when the proposition described in the sentence is said to be true. The
tense system is complex; only the basic forms are outlined in Figure 2.7. In addition, verbs may be in
the progressive tense. Corresponding to the tenses listed in Figure 2.7 are the progressive tenses shown
in Figure 2.8. Each progressive tense is formed by the normal tense construction of the verb be
followed by a present participle.
Verb groups also encode person and number information in the first word in the verb group. The
person and number must agree with the noun phrase that is the subject of the verb phrase. Some verbs
distinguish nearly all the possibilities, but most verbs distinguish only the third person singular (by
adding an -s suffix). Some examples are shown in Figure 2.9.

            First               Second               Third
singular    I am / I walk       you are / you walk   he is / she walks
plural      we are / we walk    you are / you walk   they are / they walk
Figure 2.9 Person and number agreement
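This agreement pattern is easy to state in code. The sketch below (forms and function names are illustrative, covering only the simplified paradigm shown above) conjugates be irregularly and marks only the third person singular for other verbs:

```python
# Irregular paradigm for "be", keyed by (person, number).
BE = {("1", "sg"): "am", ("2", "sg"): "are", ("3", "sg"): "is",
      ("1", "pl"): "are", ("2", "pl"): "are", ("3", "pl"): "are"}

def conjugate(verb, person, number):
    """Pick the agreeing present-tense form: "be" is irregular,
    while most verbs mark only third person singular with -s."""
    if verb == "be":
        return BE[(person, number)]
    if person == "3" and number == "sg":
        return verb + "s"
    return verb

print(conjugate("be", "3", "sg"))    # -> is
print(conjugate("walk", "3", "sg"))  # -> walks
print(conjugate("walk", "1", "pl"))  # -> walk
```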
Transitive verbs generally have corresponding passive forms, in which the object of the active
sentence appears as the subject:
Jack saw the ball.        The ball was seen by Jack.
I will find the clue.     The clue will be found by me.
Jack hit me.              I was hit by Jack.
Some verbs allow two noun phrases to follow them in a sentence; for example, Jack gave Sue a
book or Jack found me a key. In such sentences the second NP corresponds to the object NP outlined
earlier and is sometimes called the direct object. The other NP is called the indirect object.
Generally, such sentences have an equivalent sentence where the indirect object appears with a
preposition, as in Jack gave a book to Sue or Jack found a key for me.
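The alternation between the double-NP form and the prepositional form can be sketched as a simple rewriting step. Whether the verb takes to or for is treated here as a lexical property, encoded in a small illustrative table (`PREP` and the function name are my own):

```python
# Which preposition introduces the indirect object is a fact about the verb.
PREP = {"gave": "to", "found": "for"}

def to_prepositional(verb, indirect, direct):
    """Rewrite 'V indirect direct' as 'V direct to/for indirect'."""
    return f"{verb} {direct} {PREP[verb]} {indirect}"

print(to_prepositional("gave", "Sue", "a book"))  # -> gave a book to Sue
print(to_prepositional("found", "me", "a key"))   # -> found a key for me
```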
Particles
Some verb forms are constructed from a verb and an additional word called a particle. Particles
generally overlap with the class of prepositions considered in the next section. Some examples are up,
out, over, and in. With verbs such as look, take, or put, you can construct many different verbs by
combining the verb with a particle (for example, look up, look out, look over, and so on). In some
sentences the difference between a particle and a preposition results in two different readings for the
same sentence. For example, look over the paper would mean reading the paper, if you consider over a
particle (the verb is look over). In contrast, the same sentence would mean looking at something else
behind or above the paper, if you consider over a preposition (the verb is look).
You can make a sharp distinction between particles and prepositions when the object of the verb
is a pronoun. With a verb-particle sentence, the pronoun must precede the particle, as in I looked it up.
With the prepositional reading, the pronoun follows the preposition, as in I looked up it. Particles
also may follow the object NP. Thus you can say I gave up the game to Mary or I gave the game up to
Mary. This is not allowed with prepositions; for example, you cannot say *I climbed the ladder up.
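The pronoun-position diagnostic can be expressed as a small word-order rule. In this Python sketch (the function name and pronoun list are illustrative) a pronoun object is placed before a particle but after a preposition:

```python
# A few object pronouns, for the diagnostic.
PRONOUNS = {"it", "them", "him", "her", "me", "us", "you"}

def order_object(verb, word, obj, is_particle):
    """Return the grammatical ordering of verb, particle/preposition, and object:
    a pronoun object must precede a particle (I looked it up) but
    follows a preposition (I looked up it)."""
    if is_particle and obj in PRONOUNS:
        return f"{verb} {obj} {word}"
    return f"{verb} {word} {obj}"

print(order_object("looked", "up", "it", is_particle=True))          # -> looked it up
print(order_object("looked", "up", "it", is_particle=False))         # -> looked up it
print(order_object("looked", "up", "the number", is_particle=True))  # -> looked up the number
```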
Clausal Complements
Many verbs allow clauses as complements. Clauses share most of the same properties of sentences
and may have a subject, indicate tense, and occur in passivized forms. One common clause form
consists of a sentence form preceded by the complementizer that, as in that Jack ate the pizza. This
clause will be identified by the expression S[that], indicating a special subclass of S structures. This
clause may appear as the complement of the verb know, as in Sam knows
that Jack ate the pizza. The passive is possible, as in Sam knows that the pizza was eaten by Jack.
Another clause type involves the infinitive form of the verb. The VP[inf] clause is simply a VP
starting in the infinitive form, as in the complement of the verb wish in Jack wishes to eat the pizza.
An infinitive sentence S[inf] form is also possible where the subject is indicated by a for phrase, as in
Jack wishes for Sam to eat the pizza.
Another important class of clauses are sentences with complementizers that are wh-words, such as
who, what, where, why, whether, and how many. These question clauses, S[WH], can be used as a
complement of verbs such as know, as in Sam knows whether we went to the party and The police
know who committed the crime.
modifies book.)
In contrast, a verb like put can take any PP that describes a location, as in Jack put the book in the box.
Jack put the book inside the box. Jack put the book by the door.
To account for this, we allow complement specifications that indicate prepositional phrases with
particular prepositions. Thus the verb give would have a complement of the form NP+PP[to].
Similarly the verb decide would have a complement form NP+PP[about], and the verb blame would
have a complement form NP+PP[on], as in Jack blamed the accident on the police.
Verbs such as put, which take any phrase that can describe a location (complement
NP+Location), are also common in English. While locations are typically prepositional phrases, they
also can be noun phrases, such as home, or particles, such as back or here. A distinction can be made
between phrases that describe locations and phrases that describe a path of motion, although many
location phrases can be interpreted either way. The distinction can be made in some cases, though. For
instance, prepositional phrases beginning with to generally indicate a path of motion. Thus they cannot
be used with a verb such as put that requires a location (for example, *I put the ball to the box). This
distinction will be explored further in Chapter 4.
Figure 2.11 summarizes many of the verb complement structures found in English. A full list
would contain over 40 different forms. Note that while the examples typically use a different verb for
each form, most verbs will allow several different complement structures.

Verb     Complement Structure     Example
laugh    Empty (intransitive)     Jack laughed.
find     NP (transitive)          Jack found a key.
give     NP+NP (bitransitive)     Jack gave Sue the paper.
give     NP+PP[to]                Jack gave the book to the library.
reside   Location phrase          Jack resides in Rochester.
put      NP+Location phrase       Jack put the book inside.
speak    PP[with]+PP[about]       Jack spoke with Sue about the book.
try      VP[to]                   Jack tried to apologize.
tell     NP+VP[to]                Jack told the man to go.
wish     S[to]                    Jack wished for the man to go.
keep     VP[ing]                  Jack keeps hoping for the best.
catch    NP+VP[ing]               Jack caught Sam looking in his desk.
watch    NP+VP[base]              Jack watched Sam eat the pizza.
regret   S[that]                  Jack regretted that he'd eaten the whole thing.
tell     NP+S[that]               Jack told Sue that he was sorry.
seem     ADJP                     Jack seems unhappy in his new job.
think    NP+ADJP                  Jack thinks Sue is happy in her job.
know     S[WH]                    Jack knows where the money is.
Figure 2.11 Some common verb complement structures in English
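A parser can represent such a table directly as a subcategorization lexicon mapping each verb to the set of complement structures it allows. The following sketch uses a few entries in the spirit of Figure 2.11 (the dictionary layout and function name are illustrative choices, not a prescribed format):

```python
# A tiny subcategorization lexicon: verb -> allowed complement structures.
SUBCAT = {
    "laugh": {"Empty"},
    "find": {"NP"},
    "give": {"NP+NP", "NP+PP[to]"},
    "put": {"NP+Location"},
    "know": {"S[WH]", "S[that]"},
}

def allows(verb, frame):
    """True if the lexicon licenses this complement structure for the verb."""
    return frame in SUBCAT.get(verb, set())

print(allows("give", "NP+PP[to]"))  # True:  Jack gave the book to the library.
print(allows("laugh", "NP"))        # False: *Jack laughed a key.
```

A parser consulting this table would reject an analysis that pairs a verb with a complement structure it does not license.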
Many nouns, such as desire, reluctance, and research, take an infinitive VP form as a
complement, as in the noun phrases his desire to release the guinea pig, a reluctance to open the
case again, and the doctor’s research to find a cure for cancer. These nouns, in fact, can also take the
S[inf] form, as in my hope for John to open the case again.
Noun phrases can also be built out of clauses, which were introduced in the last section as the
complements for verbs. For example, a that clause (S[that]) can be used as the subject of a sentence,
as in the sentence That George had the
ring was surprising. Infinitive forms of verb phrases (VP[inf]) and sentences (S[inf]) can also function
as noun phrases, as in the sentences To own a car would be delightful and For us to complete a
project on time would be unprecedented. In addition, the gerundive forms (VP[ing] and S[ing]) can
also function as noun phrases, as in the sentences Giving up the game was unfortunate and John’s
giving up the game caused a riot.
Relative clauses involve sentence forms used as modifiers in noun phrases. These clauses are
often introduced by relative pronouns such as who, which, that, and so on, as in
The man who gave Bill the money . . . The rug that George gave to Ernest . . .
The man whom George gave the money to . . .
In each of these relative clauses, the embedded sentence is the same structure as a regular sentence
except that one noun phrase is missing. If this missing NP is filled in with the NP that the sentence
modifies, the result is a complete sentence that captures the same meaning as what was conveyed by
the relative clause. The missing NPs in the preceding three sentences occur in the subject position, in
the object position, and as object to a preposition, respectively. Deleting the relative pronoun and
filling in the missing NP in each produces the following:
The man gave Bill the money. George gave the rug to Ernest. George gave the money to the man.
As was true earlier, relative clauses can be modified in the same ways as regular sentences. In
particular, passive forms of the preceding sentences would be as follows:
Bill was given the money by the man. The rug was given to Ernest by George.
The money was given to the man by George.
Correspondingly, these sentences could have relative clauses in the passive form as follows:
The man Bill was given the money by . . .
Notice that some relative clauses need not be introduced by a relative pronoun. Often the relative
pronoun can be omitted, producing what is called a base relative clause, as in the NP the man
George gave the money to. Yet another form deletes the relative pronoun and an auxiliary be form,
creating a reduced relative clause, as in the NP the man given the money, which means the same as
the NP the man who was given the money.
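The "missing NP" analysis can be simulated at the string level: inserting the modified noun phrase at the gap position recovers the complete sentence. A minimal sketch follows (the gap position is supplied by hand here, where a real parser would have to find it):

```python
def fill_gap(clause_words, np_words, gap_index):
    """Insert the modified NP at the position of the missing NP
    to reconstruct the complete underlying sentence."""
    return " ".join(clause_words[:gap_index] + np_words + clause_words[gap_index:])

# "The man who gave Bill the money": the gap is in subject position.
print(fill_gap(["gave", "Bill", "the", "money"], ["the", "man"], 0))
# -> the man gave Bill the money

# "The rug that George gave to Ernest": the gap is in object position.
print(fill_gap(["George", "gave", "to", "Ernest"], ["the", "rug"], 2))
# -> George gave the rug to Ernest
```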
2.4 Adjective Phrases
You have already seen simple adjective phrases (ADJPs) consisting of a single adjective in several
examples. More complex adjective phrases are also possible, as adjectives may take many of the
same complement forms that occur with verbs. This includes specific prepositional phrases, as with
the adjective pleased, which takes the complement form PP[with] (for example, Jack was pleased with
the prize), or angry with the complement form PP[at] (for example, Jack was angry at the committee).
Angry also may take an S[that] complement form, as in Jack was angry that he was left behind. Other
adjectives take infinitive forms, such as the adjective willing with the complement form VP[inf], as in
Jack seemed willing to lead the chorus.
These more complex adjective phrases are most commonly found as the complements of verbs
such as be or seem, or following the head in a noun phrase. They generally cannot be used as modifiers
preceding the heads of noun phrases (for example, consider *the angry at the committee man vs. the
angry man vs. the man angry at the committee).
Adjective phrases may also take a degree modifier preceding the head, as in the adjective phrase
very angry or somewhat fond of Mary. More complex degree modifications are possible, as in far too
heavy and much more desperate. Finally, certain constructs have degree modifiers that involve their
own complement forms, as in too stupid to come in out of the rain, so boring that everyone fell asleep,
and as slow as a dead horse.
2.5 Adverbial Phrases
Adverbs may occur in several different positions in sentences: in the sentence initial position (for
example, Then, Jack will open the drawer), in the verb sequence (for example, Jack then will open the
drawer, Jack will then open the drawer), and in the sentence final position (for example, Jack opened
the drawer then). The exact restrictions on what adverb can go where, however, is quite idiosyncratic
to the particular adverb.
In addition to these adverbs, adverbial modifiers can be constructed out of a wide range of
constructs, such as prepositional phrases indicating, among other things, location (for example, in the
box) or manner (for example, in great haste); noun phrases indicating, among other things, frequency
(for example, every day); or clauses indicating, among other things, the time (for example, when the
bomb exploded). Such adverbial phrases, however, usually cannot occur except in the sentence initial
or sentence final position. For example, we can say Every day
John opens his drawer or John opens his drawer every day, but not *John every day opens his drawer.
Because of the wide range of forms, it generally is more useful to consider adverbial phrases
(ADVPs) by function rather than syntactic form. Thus we can consider manner, temporal, duration,
location, degree, and frequency adverbial phrases each as its own form. We considered the location
and degree forms earlier, so here we will consider some of the others.
Temporal adverbials occur in a wide range of forms: adverbial particles (for example, now),
noun phrases (for example, today, yesterday), prepositional phrases (for example, at noon, during the
fight), and clauses (for example, when the clock struck noon, before the fight started).
Frequency adverbials also can occur in a wide range of forms: particles (for example, often), noun
phrases (for example, every day), prepositional phrases (for example, at every party), and clauses
(for example, every time that John comes for a visit).
Duration adverbials appear most commonly as prepositional phrases (for example, for three
hours, about 20 feet) and clauses (for example, until the moon turns blue).
Manner adverbials occur in a wide range of forms, including particles (for example, slowly),
noun phrases (for example, this way), prepositional phrases (for example, in great haste), and
clauses (for example, by holding the embers at the end of a stick).
In the analyses that follow, adverbials will most commonly occur as modifiers of the action or
state described in a sentence. As such, an issue arises as to how to distinguish verb complements from
adverbials. One distinction is that adverbial phrases are always optional. Thus you should be able to
delete the adverbial and still have a sentence with approximately the same meaning (missing,
obviously, the contribution of the adverbial). Consider the sentences
Jack put the box by the door. Jack ate the pizza by the door.
In the first sentence the prepositional phrase is clearly a complement, since deleting it to produce *Jack
put the box results in a nonsensical utterance. On the other hand, deleting the phrase from the second
sentence has only a minor effect: Jack ate the pizza is just a less general assertion about the same
situation described by Jack ate the pizza by the door.
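This deletion test can be phrased as a check against the verb's required complements. The sketch below (the `REQUIRED` table and function name are illustrative) marks a phrase as an adverbial exactly when the sentence remains licensed after deleting it:

```python
# Required complements for a couple of verbs: "put" needs both an object NP
# and a location phrase, while "eat" needs only an object NP.
REQUIRED = {"put": ["NP", "Location"], "eat": ["NP"]}

def still_grammatical(verb, remaining_complements):
    """True if the complements left after deleting a phrase
    still satisfy the verb's requirements."""
    return remaining_complements == REQUIRED.get(verb, [])

# Jack put the box by the door -> delete the PP: *Jack put the box
print(still_grammatical("put", ["NP"]))  # False: the PP was a complement
# Jack ate the pizza by the door -> delete the PP: Jack ate the pizza
print(still_grammatical("eat", ["NP"]))  # True: the PP was an adverbial
```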