
Prosodic persistence in music performance and speech production



PROSODIC PERSISTENCE IN MUSIC PERFORMANCE AND SPEECH PRODUCTION

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Melissa Kay Jungers, M.A.

The Ohio State University, 2003

Dissertation Committee: Dr. Caroline Palmer (Adviser), Dr. Neal Johnson, Dr. Mark Pitt, Dr. Shari Speer

ABSTRACT

Prosodic cues play a similar role in speech and music; they can differentiate interpretations of ambiguous sentences or melodies and they can aid memory. Prosodic cues can be related to the syntactic structure of an intended production, but they can also exist separately from the structural features. Four experiments examined the persistence of structurally-related and structurally-unrelated prosodic cues in production. In Experiment 1, pianists listened to melodies containing intensity and articulation cues and performed similar metrically ambiguous melodies. Pianists' performances persisted only in the metrically-unrelated articulation cues, but in a metrically related manner. In Experiment 2, speakers heard sentences containing prosodic phrase breaks and tonal patterns and produced similar syntactically ambiguous sentences. Their productions reflected syntactically-related prosodic phrase breaks and some evidence of syntactically-unrelated tonal (pitch) accents. Experiments 3 and 4 examined listeners' ability to judge the meter or syntax from the productions collected in Experiments 1 and 2. Listeners correctly identified the meter and syntax of the productions. These results suggest that prosodic persistence may be beneficial in a conversational or ensemble context.

Dedicated to my wonderful family

ACKNOWLEDGMENTS

I wish to thank Caroline Palmer for her encouragement and support over the past five years. Merci! I also wish to thank Shari Speer for her enthusiastic introduction to the world of psycholinguistics. I thank Neal Johnson and Mark Pitt, who offered their insight and suggestions. I wish to thank Laurie Maynell for her help and advice in the creation of stimuli. I am also grateful to Beth Mechlin, Lindsay Barber, Chris Hanson, Jenna Johnson, and Michael Keida, who ran subjects and entered data. I thank the Cognitive Science Center at Ohio State University for supporting my research and connecting me with two great collaborators.

VITA

1998: B.S. Psychology, Bowling Green State University
2000: M.A. Psychology, Ohio State University
1998-present: Graduate Fellow and Research Assistant, The Ohio State University

PUBLICATIONS

1. Jungers, M.K., Palmer, C., & Speer, S.R. (2002). Time after time: The coordinating influence of tempo in music and speech. Cognitive Processing, 2, 21-35.
2. Palmer, C., & Jungers, M.K. (2001). Music cognition. In R. Goldstone (Ed.), Encyclopedia of Cognitive Science. London: Macmillan.
3. Palmer, C., Jungers, M.K., & Jusczyk, P. (2001). Episodic memory for musical prosody. Journal of Memory and Language, 45, 526-545.
FIELDS OF STUDY

Major Field: Psychology

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:
1. Introduction
2. Control Experiment 1: Music stimuli
3. Control Experiment 2: Speech stimuli
4. Experiment 1: Music production
5. Experiment 2: Speech production
6. Experiment 3: Music perception
7. Experiment 4: Speech perception
8. General Discussion

List of References

LIST OF TABLES

3.1 Control Experiment 2 & Experiment 2: Four types of syntactically ambiguous sentences with interpretations
5.1 Experiment 2: ToBI transcription for prime utterances
5.2 Experiment 2: Percentage of target utterances by participants following Early or Late prime sentences that contain pitch accents or phrase breaks at specified locations
5.3 Experiment 2: Chi-square table for prime and target utterances

LIST OF FIGURES

2.1 Control Experiment 1 & Experiment 1: Sample metrically ambiguous melody
4.1 Experiment 1: Mean ISI/IOI for pianists' target performances following legato and staccato prime articulation
4.2 Experiment 1: Mean ISI/IOI for each event in pianists' target performances following binary and ternary prime melodies
4.3 Experiment 1: Mean percent 'yes' response for melody recognition task by cue condition
5.1 Experiment 2: Mean percent 'yes' response for sentence recognition task by cue condition
5.2 Experiment 2: Mean percent 'yes' response for same or different prosodic phrase breaks
6.1 Experiment 3: Mean percent 'ternary' response in musical perception task with intensity or articulation cue

CHAPTER 1

INTRODUCTION

When two people have a conversation, does the way in which one person produces sentences influence the way in which the second person speaks? When two musicians perform together, does the style of one musician influence the style of the other? Although these questions come from different domains, the underlying issue is the same. What prosodic dimensions persist among producers and listeners in speech and music? The current study examines the persistence of prosodic cues in both speech and music.

In order to persist in the prosody of the utterance or performance just produced, the listener must remember the prosody. How well do listeners retain acoustic features in memory? If listeners do not retain the acoustic details from a production, then they would not be expected to produce these details in subsequent productions.

Why might speakers and performers persist in the acoustic cues of what they have just heard? One possibility is that persistence aids communication in a conversational speech context or a small-ensemble musical context. Perhaps the two (or more) parties in a conversation mutually influence each other's prosody. This may make communicating a message easier because the parties focus on the message and are not distracted by the acoustic variation.

Prosody

What is prosody?
Prosody has been described as a structure that organizes sound as well as the suprasegmental features of speech such as pitch, timing, and loudness (Cutler, Dahan, & van Donselaar, 1997). Thus, prosody refers to both an abstract hierarchical structure and the fine acoustic details in speech, such as the "variation in fundamental frequency, spectral information, amplitude, and the relative duration of sound and silence" (Speer, Crowder, & Thomas, 1993). Prosody is also described as the "stress, rhythm, and intonation in spoken sentences" (Kjelgaard & Speer, 1999). Informally, prosody refers to the way in which something is spoken, rather than just the words.

Prosodic dimensions such as word duration, timing, and intonation influence listeners' interpretation of sentence meaning. The specific acoustic details associated with a speaker's voice can aid sentence interpretation (Nygaard & Pisoni, 1998). In one study, listeners were familiarized with ten speakers' productions of isolated words (Nygaard & Pisoni, 1998). At test, listeners showed better intelligibility for the familiar voices than for unfamiliar voices for isolated novel words in noise. Also, sentences produced by a familiar speaker were more intelligible to listeners than sentences produced by a new speaker (Nygaard & Pisoni, 1998).

Word durations can disambiguate the meaning of ambiguous sentences (Lehiste, 1973; Lehiste, Olive, & Streeter, 1976). Listeners used acoustic features to determine the intended meaning of different versions of syntactically ambiguous sentences, and analysis of the acoustic properties of those sentences suggested that timing and intonation were useful features for disambiguation (Lehiste, 1973). The placement and duration of pauses provide another perceptual cue to sentence meaning; speakers' pause patterns tend to correlate with the syntactic structure of a sentence, with longer pauses near important structural boundaries (Lehiste et al., 1976). Prosodic emphasis also influences the interpretation of sentence meaning (Speer et al., 1993). Listeners heard syntactically ambiguous sentences that contained different prosodic realizations of a single word, such as the sentences "They are FRYING chickens" and "They are frying CHICKENS". Listeners' paraphrasings of the sentences showed that the interpretation depended on the prosodic emphasis. Thus, a diverse set of prosodic features, including duration, timing, and intonation, influences the interpretation of sentences.

In order to examine and discuss prosody in speech, a system is needed so that the speech signal can be consistently described. Beckman and Pierrehumbert (1986) designed this type of transcription system for describing the prosodic aspects of a spoken sentence. Their system is called ToBI, which stands for Tones and Break Indices. This system has four parallel tiers: orthographic, tone, break index, and miscellaneous (Pierrehumbert & Hirschberg, 1990). Certain tones are stressed and are known as pitch accents; English includes high (H) and low (L) tones. Every utterance contains an intonational phrase that is delimited on the right edge by a boundary tone (L% or H%). Each intonational phrase (Iph) is made up of at least one phonological (Pph) or intermediate phrase (ip), which is delimited on the right edge by a phrase accent (H- or L-). Each phonological phrase contains at least one pitch accent. Prosody is hierarchical, but this hierarchy is shallower than that of syntax (Cutler et al., 1997).
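As a rough illustration of this layered annotation, the two prosodic realizations of the "frying chickens" sentence above might be represented as follows. This is a toy sketch only: the H* pitch-accent label and its placement, and the final L-L% edge tones, are assumptions for illustration rather than transcriptions from the cited studies.

# Toy ToBI-style representation of two prosodic realizations of one sentence.
# Tone labels and their placement are illustrative assumptions.
frying_chickens = {
    "FRYING accented": {
        "words": ["They", "are", "frying", "chickens"],
        "tones": [None, None, "H*", None],   # pitch accent on "frying"
        "edge": "L-L%",                      # phrase accent + boundary tone at the right edge
    },
    "CHICKENS accented": {
        "words": ["They", "are", "frying", "chickens"],
        "tones": [None, None, None, "H*"],   # pitch accent on "chickens"
        "edge": "L-L%",
    },
}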
How do prosody and syntax relate?

Syntax refers to the rules native speakers use to put words together into sentences. There is not a one-to-one mapping between prosody and syntax; however, prosodic boundaries often coincide with syntactic boundaries. For example, segmentation of a sentence relied on both syntactic structure and the acoustic pattern in a listening task (Wingfield & Klein, 1971). Subjects heard spliced sentences in which a complete phrase was put into a sentence that matched or did not match the intonation pattern of the phrase. The task was to determine the point at which the recorded sentence switched from one ear to the other. Both the syntactic form and the prosodic pronunciation influenced the determination of the switch point. The authors concluded that segmentation is determined primarily by syntactic structure, but the acoustic pattern helps to mark the syntax (Wingfield & Klein, 1971).

Prosody also helps to disambiguate syntactically ambiguous sentences. In another early experiment, Lehiste (1973) asked four speakers to read grammatically ambiguous sentences. Listeners heard the sentences and had to guess the speakers' intended meaning. Listeners were better than chance for 10 of the 15 sentences (Lehiste, 1973). Analysis of acoustic properties from the sentences revealed that timing and intonation were successful strategies for disambiguation (Lehiste, 1973). The sentences that were difficult for subjects to interpret were those in which the difference in meaning did not correlate with a different surface constituent structure, for example, "John doesn't know how good meat tastes" (Lehiste, 1973).

Past research indicated that listeners use prosody to determine the meaning of an ambiguous sentence (Lehiste, 1973; Lehiste et al., 1976), but several more recent papers questioned whether this effect is partly a result of using trained speakers who produced intonation and timing patterns that clarify ambiguities for listeners. Do people produce and perceive prosodic cues to resolve syntactic ambiguity in normal conversations? In one study (Albritton, McKoon, & Ratcliff, 1996), trained and untrained speakers read syntactically ambiguous sentences that were embedded in two different passages that clarified the intended meaning, and two judges rated the speakers' intended meaning. When the untrained speakers were unaware of the ambiguity, they read the passages without disambiguating the embedded sentences, according to the two judges' ratings of the intended meaning. Likewise, trained speakers who were unaware of the ambiguity did not disambiguate the meaning. Only trained and informed speakers were judged, both by independent raters and by naive listeners, to have disambiguated the meaning when reading the sentences (Albritton et al., 1996). The authors concluded that although it is possible to use prosody to disambiguate syntax, prosody may be a relatively minimal cue and its use may not translate to conversational speech outside the laboratory (Albritton et al., 1996).

In another study examining natural speech, speakers memorized and then produced short passages containing an embedded syntactically ambiguous sentence (Fox Tree & Meijer, 2000). The researchers pitted prosody against context by inserting either the original middle sentence or an incongruent middle sentence between two sentences that indicated a semantic context. The incongruent middle sentence contained the same words as the original middle sentence, but came from a production of the sentence within another context.
When listeners chose one of two intended meanings of the embedded sentence, they made their decision based on the context and not on the prosody of the embedded sentence (Fox Tree & Meijer, 2000). The authors concluded that prosodic cues are not consistent enough to use for syntactic disambiguation in everyday conversation. From this research, it may appear that prosodic cues are not generally useful for syntactic disambiguation in a conversational context. However, several problems in this study make this conclusion less clear. For example, the stimuli were created by naive speakers who read the passages silently and then delivered them from memory. This method of production is more natural than reading, but speakers may have been more concerned with correctly saying the words from the three sentences than with communicating the idea of the passage. Also, the listeners' choice of intended sentence meaning could be made without even hearing the embedded sentence, since the semantics of the context sentences made the answer clear. In addition, listeners were not told to use prosody (or even the middle sentence) to make their decision, so it is not surprising that they used the context.

Other research has demonstrated the use of prosody by untrained speakers in a natural context. In a game task, naive speakers produced prosodic cues to disambiguate syntax (Schafer et al., 2000). The set of possible utterances was limited, so the speakers learned to produce them without making errors or referring to the text. The speakers disambiguated the syntax with prosodic cues, even when the game situation did not require disambiguation (Schafer et al., 2000).

The debate about the use of prosody to interpret ambiguous syntax in natural contexts continues, but there is certainly evidence to suggest a link between prosody and syntax (Wingfield & Klein, 1971; Lehiste, 1973; Schafer et al., 2000). The relationship between these speech elements is not isomorphic (Cutler et al., 1997). In fact, Beckman (1996) argues that prosody has its own structure that is parsed.

Musical prosody

What is the musical equivalent to prosody?

In music, prosody corresponds to expressiveness: the acoustic features that performers add beyond what composers specify in notation. Such features are referred to as performance "expression," and they can differentiate two performances of the same music (Palmer, 1997).

What is musical syntax and how does prosody relate to it?

Western tonal music contains style-specific syntactic properties, such as meter and grouping (Cooper & Meyer, 1960; Lerdahl & Jackendoff, 1983). Meter refers to the alternation of strong and weak beats. For example, the beats in a march (2/4, 4/4) alternate between strong and weak, while the beats of a waltz (3/4, 6/8) follow a strong, weak, weak pattern. Grouping is based on pitch relationships or rhythmic patterns (Lerdahl & Jackendoff, 1983; Cooper & Meyer, 1960). Both meter and grouping are hierarchically arranged, with sequences divided into smaller sequences of pitches or rhythms.

The acoustic features that contribute to performance expression are related in a rule-based way to the printed score. For example, differences between categories of length or pitch are exaggerated, such that short notes are shortened and long notes are lengthened (Sundberg, 1999). Also, small pauses are inserted between pitch leaps and musical phrases, emphasizing the grouping structure of the music (Sundberg, 1999).
Decreased tempo and dynamics are expected at the ends of phrases (Windsor & Clarke, 1997; Henderson, 1936). This phrase-final lengthening also indicates the hierarchical importance of the phrase (Lerdahl & Jackendoff, 1983; Palmer, 1996a; Palmer, 1997). Meter is also expressed through acoustic features. Events that align with metrically strong beats are performed with increased duration, louder accents, and a more legato articulation than weak beats (Sloboda, 1983, 1985). One study examined whether accents associated with different musical structures (meter, rhythmic grouping, and melodic accent) influence performance expression independently or interactively (Drake & Palmer, 1993). Meter and rhythmic grouping influenced performance expression independently, but the influence of melodic accent on expression depended on the context (Drake & Palmer, 1993).

The relationship between performance expression and structure also influences the way listeners perceive music. When listeners heard performances that were altered to contain one or more of the acoustic cues, they used the articulation cues to choose the intended meter (Sloboda, 1985). Loudness was also used to determine meter, but not all performers differentiated meter with loudness (Sloboda, 1985). Listeners were less likely to detect a computer-lengthened event before a long duration in a simple rhythmic pattern (Drake, 1993), the same location at which performers often lengthen events (Drake & Palmer, 1993). Also, listeners were less likely to detect a lengthened event in a computer-generated performance when it occurred at a structurally-expected location (Repp, 1992). Listeners' judgments of the goodness of fit of a probe beat inserted in a metrical context reflected knowledge of metrical accent structure (Palmer & Krumhansl, 1990). Thus, listeners use prosodic features to determine the structure of a production, and they also use the structure to guide their perception.

Memory for prosody

Human listeners can understand speech that is produced by men, women, and children, with different vocal pitch ranges. They can understand speakers with foreign accents and even speakers with colds. For this reason, many studies on speech perception focused on listeners' ability to ignore the prosodic details in order to understand the message. This approach of normalization assumes that the listener transforms the physical speech signal into a standard representation of the sentence devoid of prosodic details (Pisoni, 1997). It is this bland representation that is stored in memory. According to this approach, the acoustic details of a production will not be retained.

Several more recent studies suggest that acoustic features of speech are incorporated in memory for language. Sentences are recognized more accurately when they are presented with the same prosody at learning and test, and prosodic cues aid memory for syntactically ambiguous sentences (Speer et al., 1993). Listeners can use extralinguistic information, including talker identity and talker's rate, to accurately identify previously presented words (Bradlow, Nygaard, & Pisoni, 1999). The rate of presentation affects listeners' abilities to recall items produced by different speakers. Listeners show better recall for items presented at the same rate in both familiarization and test than for items presented at different rates from familiarization to test (Nygaard, Sommers, & Pisoni, 1995). These findings suggest that prosodic cues influence memory for speech contents.
When you walk away from a concert humming, do you remember only the melody or do you remember the way in which the song was performed? Some early research in music perception focused on listeners' ability to recognize a tune, even when performed at a different tempo or with different instrumentation. This approach of perceptual constancy, in which the stimulus sounds the same although it is physically different, assumed that some of the acoustic details were removed in order to recognize the underlying similarity of a musical excerpt (Dowling & Harwood, 1986; Large, Palmer, & Pollack, 1995). Through a normalization process similar to the one proposed for speech, acoustic details were filtered out and only the underlying representation was stored in memory (Large et al., 1995).

Another reason the prosodic cues may not be retained in memory is that listeners may not have fine-grained memories for music performances (Raffman, 1993). Raffman (1993) suggests that prosodic cues are used to form categories of pitch and rhythm, but the specific acoustic details are not retained. There is evidence that even trained musicians do not accurately identify small within-category pitch differences (Siegel & Siegel, 1977). On this view, memory for music will be limited to larger pitch and rhythm categories.

Some studies demonstrate that listeners remember particular acoustic features from performances. Palmer, Jungers, and Jusczyk (2001) examined memory for acoustic details in music performance. Listeners with and without music training were familiarized with one of two performances of the same short musical excerpt. The performances differed in articulation, intensity, and interonset interval cues. At test, the listeners heard the original performances from familiarization as well as different performances of the same melodies (the same notated pitches and durations, but different intensities, articulations, and interonset intervals). Listeners were asked to identify which of the performances had been present at familiarization. Listeners could recognize the performances of the melodies they had heard during familiarization, even though the categorical pitches and durations in the two versions were identical (Palmer et al., 2001).

The adult listeners in Palmer et al. (2001) had many years of exposure to music. To address whether this musical acculturation is necessary for memory for musical features, Palmer et al. (2001) tested 10-month-old infants' memory for performances of the same melodies, using a head-turn preference procedure (Kemler Nelson et al., 1995). After being familiarized with one performance of each melody, infants oriented longer to the familiar performances during test than to other performances of the same melodies. Thus, even infants (with little exposure to music) can use the acoustic cues that differentiate performances to form a memory for short melodies (Palmer et al., 2001).

Although this study indicated that listeners are sensitive to subtle performance differences and can retain them in memory, it does not indicate which prosodic cues are most salient in perception and memory. In another study, musician listeners were tested for their ability to discriminate and remember music performances that differed in only one or two acoustic cues (Jungers & Palmer, 2000). In a discrimination task, listeners could accurately distinguish same from different pairs of performances of the same melody when articulation or articulation with intensity cues were present.
In a memory task, musician listeners were familiarized with performances that varied in articulation, intensity, or articulation with intensity cues, and later heard these performances as well as novel performances of the same melody. Listeners could more accurately identify performances they had heard before and were most accurate at identifying those performances that varied in articulation cues (Jungers & Palmer, 2000). Thus, listeners were particularly sensitive to the articulation cues in music performances; listeners discriminated musical sequences based on the timing between pitch events within the sequence.

Both musician and non-musician listeners can remember particular performance tempi over prolonged time periods. Musicians can reproduce performances of long musical pieces, such as an entire movement of a symphony, at the same tempo with very low variability (Clynes & Walker, 1986; Collier & Collier, 1994). Similarly, nonmusicians can reproduce popular songs from memory at tempi very close to the original tempo (Levitin & Cook, 1996). Furthermore, when people sang familiar songs as fast or as slow as possible, songs that lacked a tempo standard in the original recordings were produced with larger variability in tempo; this counters arguments that memory for the tempo of remembered songs was solely a function of articulatory constraints (Levitin & Cook, 1996).

In sum, listeners demonstrate the ability to remember prosodic details in both speech and music. Listeners recognize sentences more accurately when the same prosodic cues are present at learning and test (Speer et al., 1993). Listeners can use information such as talker identity and rate to identify previously heard words (Bradlow et al., 1999). Adults and infants remember the specific acoustic details of a performance and can distinguish previously heard performances from different performances of the same categorical pitches and durations (Palmer et al., 2001). Also, nonmusicians reproduce popular songs at the tempo they have heard before (Levitin & Cook, 1996). Prosodic details of language and music become part of the memory representation. Do these prosodic cues influence future performances? Several studies point to both acoustic and structural features that persist in speech and music.

Prosodic persistence

One aspect of speech that may persist is rate. Kosslyn and Matt (1977) played a recording of two male speakers for listeners: one speaking at a fast rate and one at a slow rate. Then the subjects read a passage they were told was written by one of the speakers. The subjects imitated the rate of the speaker who supposedly wrote the passage, although they were not explicitly instructed to do so (Kosslyn & Matt, 1977). In that study, it is possible that subjects associated each written passage with a particular speaker and felt an expectation to reproduce the rate of that speaker.

Another aspect of speech that persists is syntax. When listeners were asked to repeat a sentence they had heard and then produce a description of a picture, they tended to use the same syntactic form as in the former sentence to describe the scene (Bock, 1986). For example, when subjects heard and repeated the sentence "The referee was punched by one of the fans," they were more likely to describe a picture with a church and a lightning bolt as "The church is being struck by lightning," with both sentences in the passive form (Bock, 1986).

A few studies suggest that the tempo of music performances persists across sequences.
Cathcart and Dawson (1928) instructed pianists to perform one melody at a particular tempo and another melody at a faster or slower tempo. When pianists attempted to perform the first melody again at the original tempo, their tempo drifted in the direction of the second melody. More recently, Warren (1985) reviewed studies of tasks that varied from color judgments to lifting weights. Each domain displayed a perceptual homeostasis, which Warren (1985) termed the "criterion shift rule": the criterion for perceptual judgments shifts in the direction of stimuli to which a person has been exposed. Warren (1985) suggested that a criterion shift serves to calibrate perceptual systems so that behavior will be appropriate for environmental conditions.

Jungers, Palmer, and Speer (2002) found evidence for temporal persistence in language and music. In a speech experiment, subjects read two sentences aloud as a measure of their preferred speech rate. Then, on each trial, they heard a prime sentence followed by a written target sentence matched for number of syllables, lexical stress pattern, and syntactic structure. The primes were recorded by a naive female speaker at slow (750 ms, or 80 bpm, per accent) and fast (375 ms, or 160 bpm, per accent) rates. Subjects read the target sentences aloud. They were instructed to attend carefully to the sentences for a later recognition task. The subjects' rates were influenced by both their preferred rate and the prime rate. The music task was similar in design to the speech task, but the subjects were experienced pianists. Prime and target melodies were matched for meter and length. As in the language version, both the prime rate and the preferred rate predicted the performance rate of the target melody.

The goal of the current studies is to examine whether people persist in the prosody of language and music. The prosodic persistence study by Jungers et al. (2002) suggests that people do persist in the global prosodic dimension of tempo. This study seeks to determine whether people persist in prosodic dimensions that do or do not relate to the syntactic or rule-based elements of speech and music. A series of control experiments that test listeners' syntactic and metrical interpretations serve to give base rate information for a set of sentences and melodies that listeners can clearly disambiguate, metrically and syntactically, based on prosodic cues of intensity and phrase breaks. Four experiments are reported, two of which address prosodic cues in performance and two of which address prosodic cues in perception.

The first experiment examines whether pianists persist in the intensity or the articulation of what they have just heard. The intensity patterns of strong and weak beats are tied in a meaningful, rule-based way to either a binary (2/4 or 4/4) or ternary (3/4 or 6/8) meter. Articulation is an acoustic variable that is varied across performances but not tied to a particular meter. (Articulation is defined here as the offset of one event minus the onset of the next event, so that negative values represent staccato, or separated, events and positive values represent legato, or overlapping, events.) The melodies heard by pianists contained either staccato or legato articulation across all events in both the binary and ternary intensity patterns. After pianists heard these melodies, they then performed melodies that were similar in number of events and musical structure.
The experiment asked whether pianists persist in the rule-based (intensity-meter) or the non-rule-based (articulation) acoustic cues from what they have heard before when they perform similar melodies. Pianists were told to concentrate on the melodies for a later recognition task. The melody recognition task addressed whether intensity or articulation cues influence later recognition memory for these melodies. If pianists form a representation that includes only categorical information (Raffman, 1993), then they would not be expected to produce or remember either the articulation cues or the intensity patterns; these cues would be used to form the representation of the melody, but then be lost. Another possibility is that pianists will only focus on and persist in the intensity cues identifying meter, since this is part of the musical structure, while the articulation cues will not be retained.

The second experiment, a language production study, examines whether speakers persist in the phrase break or the tonal pattern of what they have just heard. The phrase break location (placed early or late in the sentence by a phonetically trained speaker instructed to use particular breaks and tonal patterns) is correlated with the syntactic interpretation of the sentence. The tonal pattern (H-L% or L-H%, heard at the phrase break), another prosodic dimension, is varied across utterances but is not tied to a particular syntactic interpretation. Listeners heard a sentence and then produced a sentence similar in number of syllables and grammatical structure. Experiment 2 asked whether speakers persist in the syntactic (phrase break) or the non-syntactic (pitch pattern) acoustic cues. Listeners were also asked to remember the sentences for a later recognition task. The sentence recognition task addressed whether these syntactically-related or syntactically-unrelated cues influence recognition memory for the sentences.

If listeners form a reduced representation of each sentence based solely on the words of a sentence, then neither prosodic phrase break nor tonal pattern cues may persist. If listeners form a representation that includes sentence meaning, without acoustic details, they may still persist in the phrase break since it relates to the meaning, but they will not persist in the pitch pattern. There is evidence that speakers persist in the syntax of a sentence (Bock, 1986); since the phrase break prosodic cue relates to the syntax, participants may persist in the phrase breaks only. If listeners form a representation that includes all the acoustic details of a sentence, then they are more likely to persist in both the tonal pattern and the prosodic phrase break. This result is possible since there is evidence that speakers do remember non-syntactic prosodic details (Speer et al., 1993). It should be noted that even if listeners perceive and remember the acoustic details, this does not mean the listeners will then produce these acoustic features in a subsequent utterance.

The last two experiments test the possibility that a second group of listeners can detect the syntax or meter that persisted in the musicians' and speakers' productions. The third experiment, a music perception experiment, examines whether listeners can correctly interpret the intended meter of productions from the performance experiment. The productions of four pianists were included. Two of the pianists used intensity cues to indicate meter and two of the pianists used articulation cues.
Will listeners be able to identify the meter of the performances using the intensity or articulation cues? If the listeners in the music perception experiment are able to use the pianists' target performances to identify the original meter, then there is evidence that prosody can persist through more than one listener. The fourth experiment, a speech perception experiment, examines whether listeners can interpret the syntax of the productions from four of the speakers in the speech production study. Listeners were asked to identify the intended meaning of each produced sentence and choose one of two interpretations. If listeners can accurately interpret the syntax of the productions that match the original prime sentence syntax, it suggests that persistence could be useful in a conversational context.

CHAPTER 2

CONTROL EXPERIMENT 1: MUSIC STIMULI

The goal of this experiment was to identify a set of melodies whose meter can be determined by listeners through intensity cues in staccato and legato computer-controlled performances. In order to make claims about persistence, it was important to begin with stimuli that contained salient acoustic cues that listeners clearly perceived. Thus, if a pianist's performance contained particular acoustic cues, the claim that these cues related to the original heard performance was possible. In this experiment, listeners heard metrically ambiguous melodies, performed with intensity cues indicating either a binary or ternary interpretation, as well as a control version of the melodies that contained no intensity cues (all events were the same intensity). Listeners chose "binary" or "ternary" and rated their confidence in their decision. The set of stimuli for which the "binary" or "ternary" decisions agreed with the intensity cue pattern in both the staccato and legato versions was to be used later in the music performance study.

Method

Participants. Twenty-six musically trained listeners participated in the study. Twenty-five participants had formal training on a musical instrument (mean years of private lessons = 6.78, range = 2 to 12 years). One participant did not have any private lesson experience, but had performed an instrument in a band for 8 years and was included in the study. Participants received course credit in an introductory psychology course. None of the subjects reported having any hearing problems.

Apparatus. The musical stimuli were heard over speakers with a piano timbre generated by a Roland RD-600 keyboard and amplified through a Superscope stereo amplifier.

Materials. The stimuli consisted of 30 short, isochronous melodies that were 13 quarter-note events long. Each melody was composed to be metrically ambiguous, so that the melodic contour did not clearly indicate either a binary (2/4 or 4/4) or a ternary (3/4 or 6/8) meter. See Figure 2.1 for a sample metrically ambiguous melody. The interonset interval (IOI, measured from onset to onset) for each quarter-note event was 500 ms. Two articulation versions of each melody were created: staccato and legato. In the staccato version, there were 350 ms between the offset of one event and the onset of the next event; the duration of each tone was 150 ms. In the legato version, there were 10 ms between the offset of one event and the onset of the next event; the duration of each tone was 490 ms. Musically, staccato is described as a detached style and legato as a connected style of performance.
For each articulation version, three intensity patterns were created on the computer: control, binary, and ternary. In the control version, all of the note events were the same intensity. In the binary version, the event intensities alternated between strong and weak. In the ternary version, the event intensity pattern was strong, weak, weak. Each of the 30 melodies thus had 6 versions, based on articulation and intensity pattern: staccato-control, staccato-binary, staccato-ternary, legato-control, legato-binary, and legato-ternary.

Figure 2.1. Sample metrically ambiguous melody.

An amateur pianist performed a subset (melodies 1-5) of the stimuli on a Roland RD-600 keyboard to determine appropriate values for the intensity levels and staccato/legato articulation values. The performer had 12 years of private piano lessons. The pianist first performed each of the 5 melodies in a binary and a ternary style to a metronome set to 500 ms. Intensity was measured in MIDI units that are correlated with keystroke velocity. The average note intensity for metrically strong beats across all performances in both meters was 85. For the binary performances, the odd beats were metrically strong; for the ternary performances, every third beat, beginning with the first beat, was metrically strong. The thirteenth (final) event was not included in the analysis. The average note intensity value for metrically weak beats across both meters was 52, and the average intensity value across all events in both meters was 66. These numbers became the intensity values for the accented and non-accented stimulus events. The control condition consisted of all events at the intensity level of 66.

The same pianist performed these melodies on a separate occasion to determine the appropriate duration (ISI, measured from onset to offset) for the staccato and legato versions of the experiment. The pianist was instructed to perform each of the five melodies in a staccato and a legato style. On this occasion, the pianist did not play to a metronome, but the average IOI was 394 ms per quarter-note event. The average duration (onset to offset) of the notes was 392.8 ms in the legato version (approximately 100% of the IOI) and 125 ms in the staccato version (approximately 30% of the IOI). Based on these values, the staccato duration for the experimental stimuli was set to 150 ms (30% of the 500-ms IOI) and the legato duration was set to 490 ms (98% of the 500-ms IOI).

Design. There were 6 versions of each of the 30 melodies, for a total of 180 performances created on the computer. These 180 performances were divided by articulation type into 2 sets of 90 performances: a staccato set and a legato set. Within each set, the same melody (13-note sequence) was heard 3 times, once each for the control, binary, and ternary intensity patterns. Each participant was presented with one of the two sets, so that the participant heard either the staccato or the legato version of the three intensity patterns. The first independent variable, articulation, had two levels (staccato and legato) and was a between-subject factor. The second independent variable, intensity pattern, had three levels (control, binary, ternary) and was a within-subject factor. The sets were arranged so that the same melody (13-note sequence) would not be heard with different intensity patterns on adjacent trials and so that no intensity pattern (control, binary, ternary) would be heard more than three times consecutively.
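The following sketch shows how one stimulus version could be assembled from the values reported above (13 events at a 500-ms IOI; 150-ms staccato and 490-ms legato tone durations; MIDI velocities of 85, 52, and 66 for strong, weak, and control events). It is an illustration only; the function and variable names are assumed, and the original stimuli were generated with the lab's own tools.

IOI_MS = 500                                    # onset-to-onset interval per quarter note
DURATION_MS = {"staccato": 150, "legato": 490}  # tone durations reported above
VELOCITY = {"strong": 85, "weak": 52, "control": 66}

def build_version(intensity, articulation, n_events=13):
    """Return (onset_ms, duration_ms, velocity) triples for one melody version."""
    events = []
    for i in range(n_events):
        if intensity == "binary":               # strong-weak alternation; odd-numbered beats strong
            level = "strong" if i % 2 == 0 else "weak"
        elif intensity == "ternary":            # strong-weak-weak, beginning with the first beat
            level = "strong" if i % 3 == 0 else "weak"
        else:                                   # control: all events at the same intensity
            level = "control"
        events.append((i * IOI_MS, DURATION_MS[articulation], VELOCITY[level]))
    return events

# Example: the legato-ternary version of one 13-event melody
legato_ternary = build_version("ternary", "legato")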
Procedure. Listeners heard a melody over speakers, repeated twice on each trial with 1 second of silence between repetitions, and they were instructed to judge whether the melody was performed in a binary (2/4 or 4/4) or ternary (3/4 or 6/8) meter. The participants circled the word "binary" or "ternary" and also circled their confidence in their decision on a scale of 1 to 3, with 1 = not confident and 3 = confident.

Results

A one-way analysis of variance (ANOVA) on percent ternary responses by intensity (control, binary, ternary), collapsed across the articulation conditions (staccato/legato), revealed significant differences among the three intensity patterns, F(2, 50) = 179.37, p < .05. The mean response differed in the three conditions, with participants responding correctly in the binary and ternary conditions: control = .43, binary cues = .11, ternary cues = .91. The "binary" and "ternary" choices were combined with the confidence scale to create a scale with 1 = confident binary and 6 = confident ternary, with middle values showing less confidence. An analysis of variance on the rating scale showed the same pattern of results as the binary/ternary decision, with significant differences among the intensity versions, F(2, 50) = 157.54, p < .05. The scale means showed the same pattern: control = 3.26, binary cues = 1.84, ternary cues = 5.35. An ANOVA on percent ternary responses by the articulation and intensity variables showed a main effect of intensity, F(2, 48) = 178, p < .05, but there was no main effect of articulation and no interaction of articulation and intensity.

In addition, the individual melodies were examined to ensure that they each followed the expected pattern of metrical interpretation. Four melodies were not clearly perceived by subjects as binary or ternary, and they were not included in the music performance experiment. Also, two melodies showed a strong binary bias in the control condition and were not included in the music performance experiment. Thus, 24 melodies remained for which unambiguous metrical judgments were given in the presence of intensity cues.

Discussion

This experiment was conducted to find a set of metrically ambiguous melodies that listeners would perceive as binary or ternary when given intensity cues that denote that meter. In order to make claims about persistence, the initial melodies must be shown to be clearly identifiable as "binary" or "ternary" by listeners. Participants successfully used the intensity patterns to make judgments about meter, but found the melodies to be metrically ambiguous in the absence of intensity cues. As expected, the articulation pattern did not influence their decision, because the articulation cues were not correlated with the metrical structure.

CHAPTER 3

CONTROL EXPERIMENT 2: SPEECH STIMULI

In order to make claims about speech persistence, it was necessary to identify linguistic utterances that contain salient acoustic cues that listeners can clearly perceive. Thus, if a speaker's utterances contain particular acoustic cues, the claim that these cues relate to the original heard sentence is possible. In this experiment, listeners heard syntactically ambiguous sentences, produced with either early or late prosodic phrase breaks. For example, the sentence "She spoke to the child with an accent" was heard with an early break after "spoke" or a late break after "child." In addition, a version of each sentence was presented that contained either a H-L% or a L-H% intonational pitch pattern at the phrase break.
The L-H% pattern was marked by a pitch drop followed by a small pitch rise. The H-L% pattern was marked by a steady pitch located in the higher part of the speaker's pitch range.

Method

Participants. Sixteen adult listeners participated in the study. Most listeners had little or no formal musical training (mean = 1.5 years of private lessons). Participants received course credit in an introductory psychology course. None of the subjects reported having any specific hearing problems.

Apparatus. The speech stimuli were presented by computer and were heard over AKG K270 headphones at a comfortable listening level.

Materials. The stimuli consisted of 54 syntactically ambiguous sentences. There were four types of ambiguous sentences. One sentence type included an adjective that ambiguously modified either a conjoined noun phrase or the first of the two nouns in this phrase, as in the sentences "The boy bought black shoes and socks" and "The old cat and dog were the last to be adopted." The second sentence type included an ambiguous prepositional phrase attachment, as in "She spoke to the child with an accent." Another sentence type involved conjunctions, as in "Pat and Shelly's father said it was done for now" or "Either Brett or Mike and Kay will come to babysit." The final sentence type contained an ambiguous pronoun, as in "Jay called Eric and he yelled at Jamie."

Four variations of each sentence were recorded by a female phonetician who was familiar with the ToBI transcription system. The speaker was instructed to use an early prosodic break or a late prosodic break, and either a H-L% or a L-H% intonational phrase boundary tone at the break. Thus, there were 4 spoken versions of each sentence: early break with H-L%, early break with L-H%, late break with H-L%, and late break with L-H%. Previous work has shown that listeners expect a sentence to continue following the H-L% or L-H% intonational phrase boundary tones (Beckman & Pierrehumbert, 1986). Thus, the H-L% and L-H% boundary tones can indicate similar sentence interpretations, unlike a H-H% boundary tone, which previous work has shown listeners usually perceive as indicating a yes-no question. All sentences ended in a L-L% phrase accent and boundary tone sequence. Unlike past studies (Albritton et al., 1996; Fox Tree & Meijer, 2000), in which the speaker read a sentence in a meaningful context or was told to clearly disambiguate, this speaker was instructed only about the type of phrase break and pitch contour. The remaining words and breaks were not specified, but the speaker produced similar sentence types with the same prosodic pattern.

The phrase breaks were syntactically-related cues, like the intensity cues in the music, since the location of a phrase break was predicted to determine the syntactic interpretation. The specific tonal sequence of phrase accent and boundary tone was like the articulation in the music, because the pitch pattern that occurs at an intonational phrase break in English is not predictable from a sentence's syntactic form. There are at least six possible phrase-final tonal sequences in English, including H-L%, !H-L%, H-H%, !H-H%, and L-L%, so the presence of a phrase break does not specify the type of pitch pattern. One difference between the articulation cue in the music and the pitch pattern in the speech is the scope of the cue. The articulation (staccato or legato) was a global prosodic cue, appearing on each event. The pitch pattern (H-L%, L-H%) was a local prosodic cue, appearing at a phrase break location.
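As a minimal sketch of the 2 x 2 stimulus design just described (early vs. late prosodic phrase break crossed with H-L% vs. L-H% boundary tone), the four spoken versions of one example sentence could be enumerated as follows. The words, break positions, and the "/" break notation come from the example sentence and Table 3.1; the function and variable names are illustrative assumptions.

from itertools import product

WORDS = ["She", "spoke", "to", "the", "child", "with", "an", "accent"]
BREAK_AFTER = {"early": "spoke", "late": "child"}   # word that precedes the phrase break

def render(break_location, boundary_tone):
    """Return the sentence with '/' marking the phrase break and the
    phrase accent + boundary tone produced at that break."""
    out = []
    for word in WORDS:
        out.append(word)
        if word == BREAK_AFTER[break_location]:
            out.append("/ (" + boundary_tone + ")")
    return " ".join(out)

# The four spoken versions of this sentence:
for brk, tone in product(["early", "late"], ["H-L%", "L-H%"]):
    print(brk, tone, "->", render(brk, tone))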
The syntactic interpretation of the sentence was related to the phrase break location, but the particular phrase break locations and possible sentence interpretations depended on the type of sentence. Table 3.1 lists the four sentence types and gives an example of each interpretation.

1. Adjective that modifies either a conjoined noun phrase or the first of the two nouns
Early prosodic phrase break: The boy bought black shoes / and socks.
Interpretation: The shoes are black; the socks may or may not be black.
No prosodic phrase break: The boy bought black shoes and socks.
Interpretation: Both the socks and shoes are black.

2. Prepositional phrase attachment
Early prosodic phrase break: She startled / the man with the gun.
Interpretation: The man had the gun.
Late prosodic phrase break: She startled the man / with the gun.
Interpretation: She had the gun.

3. Conjunction
Early prosodic phrase break: Either Brett / or Mike and Kay will come to babysit.
Interpretation: Brett will come alone, or Mike and Kay together will come.
Late prosodic phrase break: Either Brett or Mike / and Kay will come to babysit.
Interpretation: Kay will come with one of the two men (Brett or Mike).

4. Ambiguous pronoun
Pitch accent on "and": Ruth hit Kate AND she hit Jason.
Interpretation: Ruth hit Jason.
Pitch accent on pronoun: Ruth hit Kate and SHE hit Jason.
Interpretation: Kate hit Jason.

/ = prosodic phrase break; CAPITAL letters = pitch accent

Table 3.1. Control Experiment 2 and Experiment 2: Four types of syntactically ambiguous sentences.

The sentence "Either Brett or Mike and Kay will come to babysit" was produced with a phrase break after "Brett" (early break) or after "Mike" (late break). The early break, "Brett / or Mike and Kay," suggested that Brett alone or Mike and Kay together will come to babysit. The late break, "Brett or Mike / and Kay," suggested that Kay will come with one of the two men (either Brett or Mike). Sentences with an ambiguous prepositional phrase, such as "She spoke to the child with an accent," contained either an early break after "spoke" or a late break after "child." For these sentences, a break after "spoke" suggested that the child had the accent, a low syntactic attachment interpretation. A late break after "child" suggested that the main subject, "she," had the accent, a high syntactic attachment interpretation. The ambiguous adjective sentences did not have a true "early" and "late" break. Instead, there was an early break after the second noun, as in "He bought black shoes / and socks." The "late break" version contained no breaks within the sentence. Unlike the music stimuli, in which a computer-generated controlled intensity version is possible, it is difficult to create a neutral or control sentence. Instead, the linguistic control consisted of the same sentences presented visually (in text) on a computer screen to assess readers' interpretation without any auditory stimulus.

Design. There were 4 spoken versions and 1 text version of each of the 54 sentences, for a total of 216 spoken utterances and 54 text sentences. The 216 spoken utterances were divided by intonational phrase boundary tone into 2 sets of 108 utterances: a H-L% set and a L-H% set. Within each set, the same sentence was heard twice, once with an early phrase break and once with a late phrase break. Each participant was presented with one of the two sets, so that a given participant heard either the H-L% or the L-H% versions of the sentences.
Every participant saw the same 54 text sentences. The first independent variable, boundary tone, had two levels (H-L% and L-H%) and was a between-subject factor. The second independent variable, phrase break, had three levels (control (text), early, late) and was a within-subject factor.

Procedure. In the first block of trials, the 54 control sentences were presented in text on the computer screen. On each trial, participants read the sentence silently, and three seconds later, two possible written interpretations appeared, one on each side of the computer screen. For example, participants read, "She threatened the man with a gun." The two interpretations were "A) She had the gun." and "B) The man had the gun." Participants circled A or B on their answer sheet as well as their confidence in their decision on a scale of 1 to 3. Participants then filled out a questionnaire on their language background. This questionnaire was inserted at this point so that participants would be able to focus on the auditory stimuli and would be less focused on the written sentences they had just seen. In the second block of trials, the early and late phrase break spoken versions of each sentence were presented in random order. On each trial, participants heard a spoken sentence over headphones. Three seconds later, two written interpretations appeared on the screen and the spoken sentence was repeated. Subjects chose interpretation A or B and rated their confidence in their decision. The experiment lasted approximately 50 minutes.

Results

A one-way analysis of variance on the mean percentage of late-break interpretations by phrase break (early, late) showed a significant difference between the phrase break conditions, F(1, 15) = 206.0, p < .05, seen in the means: early break = 17.8% and late break = 85.9%. The interpretation choices were combined with the confidence scale to create a scale with 1 = confident early break and 6 = confident late break, with middle values showing less confidence. This scale showed the same pattern of results as the basic early/late decision. A two-way ANOVA on responses by phrase break and boundary tone showed no significant difference in responses to sentences with H-L% and L-H% boundary tones. The individual sentences were examined to ensure that they each followed the expected pattern of syntactic interpretation. Based on this control experiment, twenty-four utterances were chosen that listeners correctly interpreted based on the intended phrase break cues in both the H-L% and L-H% versions.

Discussion

This experiment was conducted to identify a set of syntactically ambiguous sentences that listeners interpreted as either high or low syntactic attachment when they heard a phrase break that signified that syntactic interpretation. Listeners were influenced by the phrase break in identifying the syntactic interpretation, regardless of boundary tone (H-L% or L-H%) version.

CHAPTER 4

EXPERIMENT 1: MUSIC PRODUCTION

The goal of the music performance experiment was to determine whether pianists persist in the prosody of a performance they have just heard. More specifically, do pianists persist in the metrically-related (intensity) or the metrically-unrelated (articulation) acoustic cues? Pianists heard either a strong-weak or a strong-weak-weak pattern of intensity, which related to a binary or ternary interpretation. Pianists also heard staccato and legato performances; these articulation variables do not relate to any particular meter.
On each trial, pianists heard a melody and then were asked to perform a different melody. The melodies were blocked by the intensity cue pattern. The pianists were not told to imitate what they had heard. Instead, the focus of the experiment was on the concluding memory test. Pianists were instructed that they would be tested on their memory for the melodies at the end of the experiment. Their performances were examined for intensity, IOI, and articulation.

Method

Participants. Sixteen adult pianists who had taken formal piano lessons (mean years of lessons = 9.375, range = 5-13) participated in the study. Participants received course credit in an introductory psychology course or a nominal fee for their participation. None of the subjects reported having any hearing problems.

Apparatus. Participants heard musical stimuli with a "concert grand" piano timbre generated by a Roland RD-600 keyboard over AKG headphones, and they performed on a Roland RD-600 digital piano. The pitch events, timing, and keystroke velocity were recorded using FTAP (Finney, 2001).

Materials. Twenty-four melodies taken from the control experiment served as stimuli. These melodies were perceived as metrically ambiguous when heard without intensity changes, but perceived as binary or ternary when the intensity pattern of strong and weak beats indicated a meter. Only the intensity-marked versions (binary, ternary) of these melodies were included. These melodies were paired as prime and target melodies. The melodies in each prime/target pair were both major or both minor, but they were in different musical keys. The musical notation for the target melodies did not include time signature (meter) or dynamic (intensity) indications.

Design. The independent variables in this within-subject design were intensity pattern (binary, ternary) and articulation (staccato, legato). The intensity pattern was blocked so participants heard binary trials followed by ternary trials or vice versa. The intensity pattern was also counterbalanced with meter within a stimulus so that a given melody was heard in its binary intensity pattern for half of the subjects and in its ternary intensity pattern for the other half of the subjects. The articulation was randomized across stimuli so that half of the stimuli were staccato and half were legato, but fixed for each stimulus; an individual prime melody was either staccato or legato for all participants. The member of the stimulus pair (prime, target) and the order (first half, second half) were counterbalancing variables. The prime and target melodies alternated so a given melody was heard as a prime melody for half of the pianists and performed as a target melody for the other half. Pianists alternated listening to and performing melodies on each of ten trials, with a block of five trials in which the prime was binary and a block of five trials in which the prime was ternary. Thus, each pianist heard a total of ten prime melodies and performed ten target melodies. In addition, each pianist began with a practice trial. The prime melody in the practice trial contained the same intensity cues as the prime in the first experimental trial. The dependent variables were intensity, interonset interval (IOI, onset-to-onset), interstimulus interval (ISI, onset-to-offset), and adjusted articulation (ISI/IOI) across all events and on the expected metrically strong and weak beats.
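These timing measures can be computed directly from the recorded note onsets, offsets, and key velocities. The following is a minimal sketch of that computation, not the analysis code actually used; the event format is a hypothetical stand-in for the FTAP recording, and the 500-ms IOI with 150-ms sounded durations corresponds to the staccato prime melodies (ISI/IOI = .30).

# Illustrative sketch, not the original FTAP analysis code: computing IOI,
# ISI, and adjusted articulation (ISI/IOI) from recorded note events.
# The event format (onset and offset in ms, MIDI key velocity) is a
# hypothetical stand-in for the recorded data.

def timing_measures(events):
    """Per-event IOI, ISI, and ISI/IOI; the final event is excluded because
    its IOI (and hence its adjusted articulation) is undefined."""
    measures = []
    for current, nxt in zip(events, events[1:]):
        ioi = nxt["onset"] - current["onset"]       # onset-to-onset
        isi = current["offset"] - current["onset"]  # onset-to-offset (sounded duration)
        measures.append({"ioi": ioi, "isi": isi,
                         "adjusted_articulation": isi / ioi,
                         "intensity": current["velocity"]})
    return measures

# A staccato melody at a 500-ms IOI with 150-ms sounded durations (ISI/IOI = .30).
melody = [{"onset": i * 500, "offset": i * 500 + 150, "velocity": 70}
          for i in range(13)]
print(timing_measures(melody)[0])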
A fifteen-item memory test, which included prime melodies only, followed the prime/target portion of the experiment and included four types of items: three "same" (identical to prime melodies), six "new" (not included previously in the experiment), three "same intensity-different articulation," and three "different intensity-same articulation." The items in the memory test were identical to or related to those items that had been primes.

Procedure. The pianists first sight-read two melodies notated without meter indication to assess their preferred intensity and timing. Next, pianists listened to a prime melody and then performed a notated target melody on each of 11 trials. The first of these trials was a practice trial. The experimental trials were blocked in two groups of five prime-target pairs that contained the same intensity cues. Between the two blocks, participants filled out a questionnaire about their music experience. The pianists were instructed to pay careful attention to the melodies because they would be asked to recognize them later. Following the prime and target melodies, pianists listened to a 15-item memory test. On each trial, the melody was repeated two times. The pianists answered "yes" or "no" to whether they had heard this melody before. They also circled their confidence in their decision on a scale of 1 to 5, with 1 = not very confident and 5 = very confident.

Results

The offset time, onset time, and intensity levels for each note event were measured. The 13th event was the final event in each melody. Because the timing for the final event is unspecified, this event's intensity and offset were not included in the analysis. The IOI (interonset interval) was measured from the beginning of one event to the beginning of the next event. The ISI was calculated by subtracting an event's onset from its offset. The adjusted articulation was calculated by dividing the ISI by the IOI, to show the percent of the IOI during which the event sounded. An analysis of variance on IOI by articulation priming condition (staccato/legato) showed no differences between articulation priming conditions, F(1,15) = 2.19, n.s. An ANOVA on IOI by intensity (binary/ternary) and beat (strong/weak) showed no significant effects and no interaction. This is not surprising, since IOI is a representation of the tempo of the performance and the priming melodies were heard with the same IOI throughout. An ANOVA on ISI by articulation condition showed a significant difference in ISI following a staccato or legato prime, F(1,15) = 6.045, p < .05. Pianists played with a more separated style following the staccato priming melodies than following the legato priming melodies. However, this analysis does not take the tempo into account, and the difference may simply be due to faster performances following the staccato prime. To control for this, the ISI was divided by the IOI, forming an adjusted articulation, to show the percent of the IOI during which the event was sounded. A legato event would have an adjusted articulation of 1 or greater, and a staccato event would have a value less than 1. There was a significant difference between the adjusted articulation following a staccato prime vs. a legato prime: F(1,15) = 5.15, p < .05. Even when controlling for tempo variations, the pianists played in a more separated style following the staccato than following the legato prime melodies. The average adjusted articulation in the preferred melodies was 1.00, showing that pianists performed in a legato style when the prime was not present.
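Because the articulation factor has only two within-subject levels (staccato vs. legato prime), each of these F tests is equivalent to a paired comparison of the pianists' condition means, with F equal to the squared paired-t value. A minimal sketch of such a comparison, using simulated placeholder values rather than the observed data:

# Illustrative sketch, not the original analysis: with two within-subject
# levels, a repeated-measures ANOVA reduces to a paired t-test (F = t**2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated per-pianist mean adjusted articulation (16 pianists per condition);
# these values are placeholders, not the observed data.
after_legato = rng.normal(0.95, 0.04, size=16)
after_staccato = after_legato - rng.normal(0.05, 0.02, size=16)

t, p = stats.ttest_rel(after_staccato, after_legato)
print(f"F(1,15) = {t**2:.2f}, p = {p:.4f}")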
Figure 4.1 shows the mean ISI/IOI for pianists' target performances following legato and staccato prime melodies.

Figure 4.1. Experiment 1: Mean ISI/IOI (%) for pianists' target performances following legato and staccato prime articulation

An analysis of performed intensity levels by meter of prime and expected strong and weak beats showed no significant differences between expected strong and weak beats, although there was a trend for the expected strong beats to be more intense than the expected weak beats. However, this does not mean that the pianists did not reproduce the meter of the prime melodies. An ANOVA on the adjusted articulation by expected strong and weak beats revealed a significant difference, F(1,15) = 9.105, p < .05, with expected strong beats showing longer (more legato) values than expected weak beats. See Figure 4.2 for the mean adjusted articulation (ISI/IOI) value for each event by prime binary and ternary melody conditions.

Memory results. A one-way ANOVA on percent "yes" response by type of memory stimuli (same meter/same articulation, new, same intensity/different articulation, same articulation/different intensity) was borderline significant, with listeners most often responding "no" to new melodies, which were not heard during the performance part of the experiment: F(3,45) = 2.73, p = .055. The yes/no response was combined with the participants' confidence in the decision to form a combined scale with 1 = confident no and 10 = confident yes. An ANOVA on the combined scale by the type of memory stimuli was significant, with participants responding "no" to new melodies more often than to previously heard melodies: F(3,45) = 4.65, p < .05. Post-hoc tests did not show differences between the three types of old items (same meter/same articulation, same meter/different articulation, different meter/same articulation). Figure 4.3 shows the percent "yes" response for melodies with same or different prosodic cues.

Figure 4.2. Experiment 1: Mean ISI/IOI for each event in pianists' target performances following binary and ternary prime melodies

Figure 4.3. Experiment 1: Mean percent "yes" response for melody recognition task by cue condition

Discussion

Pianists showed persistence of the metrically-unrelated cue of articulation (staccato/legato). Pianists did not perform with the same intensity pattern they had heard, even though this metrically-related cue was blocked. However, pianists did not ignore the meter. In fact, they did persist in the meter, but they used articulation cues rather than intensity cues to produce a binary or ternary metrical interpretation. Pianists played metrically strong beats with longer ISIs than metrically weak beats. This metrical beat difference held even when tempo was controlled, as in the adjusted articulation. This means pianists did perceive the meter and persisted in the meter, but they did not simply imitate the acoustic cues they had heard. Instead, they formed a representation of meter based on the intensity pattern and performed a following melody in the same meter using ISI differences. Thus, pianists' performances revealed persistence of metrically-related and metrically-unrelated prosodic dimensions.
Although listeners were able to discriminate new from previously heard melodies, the memory test did not show differentiation between the categories of melodies listeners had heard before. This may be due partly to a floor effect. The melodies were very difficult to remember. Each melody consisted of thirteen isochronous events with no rhythmic or tempo variations to differentiate the melodies. Pianists reported finding the memory task very difficult. 42 CHAPTER 5: EXPERIMENT 2: SPEECH PRODUCTION The goal of the language production experiment was to determine if speakers persist in the prosody of sentences they have just heard. More specifically, do speakers persist in the syntactically-related or the syntactically-unrelated acoustic cues? The syntactically-related cue was the location of the phrase break (either early or late). The syntactically-unrelated cue was the tonal pattern (either H-L% or L-H%), which did not relate to a particular syntactic interpretation in the sentences from the control experiment. On each of twelve trials, participants heard a sentence and then read aloud a second sentence that was similar in number of syllables and structure. The sentences were blocked by the location of the phrase break. The participants were not told specifically that they were to imitate what they had heard. Listeners were instructed that they would be “asked to listen to and to read short sentences.” They were also instructed to “pay careful attention to these sentences because you will be asked to recognize them later.” Method Participants. Thirty-two adults participated in the study. Participants had between 0 and 8 years of private lessons on a musical instrument (mean = 1.95). Participants 43 received course credit in an introductory psychology course. Thirty-one of the subjects reported having no hearing problems and one subject reported a slight problem, but did not specify the problem and was included. Apparatus. Participants heard language stimuli over AKG K270 headphones and their voices were recorded to DAT using a head-mounted AKG C420 microphone. The auditory stimuli as well as the text for spoken sentences were presented on a personal computer. Materials. The experiment consisted of twenty-six heard and produced sentences, referred to as prime and target sentences. These sentences were chosen from the language control experiment because the ambiguous sentence meaning was interpretable based on the location (or existence) of a break and the pitch pattern (L-H%, H-L%) did not correlate with the syntactic interpretation. The prime and target sentences were paired, so that each pair was made of the same type of sentence ambiguity (prepositional phrase, pronoun, conjunction, modifying adjective) with a similar number of syllables. The text for the target sentences did not include punctuation such as commas to mark expected phrasing. See Table 3.1 for an example of each of the sentence types. Design. The independent, within-subject variables were phrase break (early break, late break) and intonational pitch pattern (H-L%, L-H%). The phrase break was blocked so that participants heard early break sentences followed by late break sentences or vice versa. The phrase break was also counterbalanced so that a given sentence was heard with an early break for half of the participants and a late break for half of the participants. The intonational pitch pattern was randomized within the phrase break 44 location so that half of the sentences were H-L% and half were L-H%. 
An individual prime sentence was either H-L% or L-H% for all participants. The counterbalancing variables were member of stimulus pair (prime, target), and order (first half, second half). The prime and target sentences alternated so a given sentence was heard as a prime sentence for half of the participants and produced as a target sentence for the other half. The dependent variables were IOI of each key word, pitch pattern, and phrase break. The IOI of key words were measured from the onset of the word to the onset of the next word. Thus, both the word duration and the following pause were included. A key word was defined as the word before a possible phrase break. If participants break after this word, the word should have a longer IOI than if the participants do not break at this point. The pitch patterns and phrase breaks were determined through a ToBI analysis. A twenty-item memory test on the prime sentences included five types of items: four “same” (identical to prime sentences), eight “new” (sentences not included in the experiment), four “same phrase break (syntax)-different tonal pattern (prosody),” four “different phrase break (syntax)-same tonal pattern (prosody),” and four “different phrase break (syntax)-different tonal pattern (prosody).” The new items contained some of the same words used in the original sentences so that participants could not use a single word or phrase to remember a sentence. Procedure. The participants first read aloud three sentences presented in text on the computer screen to assess their preferred prosodic production. Next, participants listened to a prime sentence and then produced a target sentence on each of thirteen trials. 45 The first trial served as a practice trial. Twelve experimental trials allowed for six early break and six late break trials. Between blocks, participants completed a questionnaire on their music and language background. Only ten experimental trials were used in the music performance experiment because ambiguous melodies were more difficult to create than ambiguous sentences. The participants were instructed to pay careful attention to the sentences because they would be asked to recognize them later. Following the prime and target sentences, participants listened to a twenty-item memory test. On each trial, the sentence was heard over headphones two times. The participants answered “yes” or “no” to whether the words in the sentence were exactly the same as the words from a sentence they had heard before. They also circled their confidence in their decision on a scale of 1 to 5, with 1 = not very confident and 5 = very confident. Results The IOI of the word proceeding the possible phrase break in each sentence was measured from the onset of the word to the onset of the next word, so that pauses following the word were included. If participants persisted in the prosody of the sentences they just heard, they should produce the same phrase breaks as in the prime. This analysis was based on the assumption that the word prior to a break will be longer than the same word that is not followed by a phrase break (Cooper & Paccia-Cooper, 1980). This IOI analysis did not include two sentences that had an ambiguous pronoun because these sentences were hypothesized to rely on accent and not duration for syntactic interpretation. Table 3.1 lists the sentence types and gives an example of the possible prosodic phrase break locations. 
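A minimal sketch of this key-word IOI measure follows; the word onset times and the example alignment are hypothetical, not the measurements actually taken.

# Illustrative sketch, not the original measurement procedure: the key-word IOI
# runs from the key word's onset to the next word's onset, so any pause after
# the key word is included. Onset times (ms) are hypothetical.
words = ["she", "startled", "the", "man", "with", "the", "gun"]
onsets = [0, 180, 650, 780, 1230, 1390, 1480]  # made-up alignment with a pause after "man"

def key_word_ioi(words, onsets, index):
    """IOI of the word at the given position (word onset to next word's onset)."""
    return onsets[index + 1] - onsets[index]

# A longer IOI for "man" here than in an early-break rendition of the same
# sentence would be taken as evidence of a prosodic phrase break after "man".
print(key_word_ioi(words, onsets, words.index("man")))  # 450 ms, pause included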
In the ambiguous adjective sentences, such as 46 “She served the cold Sprite and tea,” the duration of the noun following the adjective was measured. In the ambiguous grouping sentences, such as “Mike and Meg or Bob will come to..,” the duration of the first noun was measured, and in the ambiguous prepositional phrase sentences, such as “She yelled at the kid using the intercom,” the first noun following the verb was measured. The average IOI for the words produced at an expected break was longer than the IOI for the words produced at a location without an expected break. The expected phrase break mean was 388 ms and the no phrase break mean was 373 ms: a 4% difference in IOI. However, an ANOVA on word IOI by break category (expected, not expected), was not significant. A ToBI analysis was conducted on half of the utterances to determine the participants’ prosodic phrase breaks and tonal patterns (Beckman & Pierrehumbert, 1986). The utterances of sixteen participants (four participants from each of the four orders) were included. As mentioned in the introduction, ToBI is a prosodic transcription system that allows a transcriber to mark prosodic components, including tones and breaks. The tonal component is made of a series of several nested tonal events. A boundary tone (either H% (high) or L% (low)) is located at the end of an intonational phrase. Each intonational phrase is made up of one or more intermediate phrases, marked by a H- or L- phrase accent. Each intermediate phrase must contain at least one pitch accent, such as H* (high pitch accent), L* (low pitch accent), or L+H* (rising from low to high). The break index describes the amount of disjuncture between words and ranges from 0 (words grouped together) to 4 (full intonational phrase). 47 A ToBI analysis on the recordings of the original primes showed that the speaker produced the same phrase break and pitch accent patterns for each early and late break sentence within each type of sentence ambiguity. Thus, the speaker used one pattern for all early prosodic phrase break sentences and another pattern for all late prosodic phrase break sentences within each of the four sentence types. See Table 5.1 for the ToBI transcription of an early and late prosodic phrase break example from each sentence type by the speaker. ToBI analysis on the participants’ utterances in response to the prime sentences showed some phrase break patterns and pitch accents that were expected with persistence. Although the results for this ToBI analysis were not statistically significant with a chi-squared test on expected break of prime by produced break of speaker, the target sentence patterns seen for the four sentence types agreed with the expected phrase breaks and pitch patterns from the prime sentences. See Table 5.2 for the percent of target utterances that contained the pitch accent or phrase break at the locations specified by the prime utterances. 48 Sample speaker’s production for phrase break and no phrase break for sentence type #1: The boy bought black shoes and socks. Early Phrase Break, H-L% Pitch Pattern: H* H* H* ( (black shoes H-) Pph L%) Iph and socks L-)Pph L%)Iph No Phrase Break within Sentence, H-L%% Pitch Pattern H* H* H* black (shoes and socks L-)Pph L%)Iph Sample speaker’s production for early and late phrase break for sentence type #2: She startled the man with the gun. 
Early Phrase Break L-H% Pitch Pattern: H* H* ((startled L-)Pph H%)Iph the man Late Phrase Break L-H% Pitch Pattern: *H H* startled the man L-)Pph H%) Iph Sample speaker’s production for early and late phrase break for sentence type #3: Either Brett or Mike and Kay will come to babysit. Early Phrase Break, L-H% Pitch Pattern: H* H* H* ((Brett L-) Pph H%) Iph or Mike and Kay Late Phrase Break, L-H% Pitch Pattern: H* H* H* ((Brett or Mike L-) Pph H%) Iph and Kay Sample speaker’s production for accented “and” or pronoun for sentence type #4: Ruth hit Kate and she hit Jason. Accent on “and”: H* H* Kate H-) Pph L%) Iph and L-) Pph she Accent on pronoun: H* H* Kate H-)Pph L%) Iph and she L-) Pph H%) Iph Table 5.1. Experiment 2: ToBI transcription for prime utterances 49 A ToBI analysis on phrase breaks or pitch accents only captures part of the prosodic information in a sentence. Another analysis examined the entire ambiguous phrase for each sentence, as shown in the speaker’s transcription in Table 5.1. For this analysis, the speaker’s prime data was compared to the participants’ target data in terms of pitch accents, intermediate phrase breaks, and intonational phrase breaks (as in Pitrelli, et al, 1994). If the speaker produced a pitch accent and the participant also produced a pitch accent, this counted as one match. Also, if the speaker produced a phrase break and the participant also produced a phrase break, this counted as a match. (Likewise, a match occurred if the speaker did not produce a pitch accent and the listener also did not produce a pitch accent at that location). If the listener also produced the same type (H*, L*, etc.) of pitch accent, this counted as an additional match. A total of 2514 items (pitch accents, intermediate phrase breaks, intonational phrase breaks, and the tonal information at each location) were judged for matches in the 16 participants’ utterances. Of the 2514 items, listeners’ targets matched the speaker’s primes for 1667 items, or 66.3%. 50 1. The boy bought black shoes / and socks. Pph after noun (expected following early prime sentence) Early No break 35% 33% 2. She startled / the man / with the gun. Early Late Pph after Verb (expected early) Pph after Noun (expected late) 46% 42% 50% 58% 3. Either Brett / or Mike / and Kay will come to babysit. Early Late Pph after N1(expected early) Pph after N2 (expected late) 58% 50% 41% 58% 4. Ruth hit Kate and* she* hit Jason. Early Late And (expected early)* Pronoun (expected late)* 33% 25% 53% 75% Table 5.2. Experiment 2: Percent of target utterances by participants following Early or Late prime sentences that contain pitch accent or phrase break at specified locations 51 An analysis was conducted to examine matches between speaker and listener sentences defined as the matching presence/absence of a pitch accent, matching presence/absence of an intermediate phrase break, or matching presence/absence of an intonational phrase break. The tones (H, L) at these pitch accent or phrase break locations were not analyzed. In this analysis, 1048 of 1365 items matched for the listener and speaker, 76.8%. 
The items were grouped into four categories for each sentence: "matched presence" (both prime and target contained a pitch accent or phrase break at this location), "mismatched presence" (the prime had an accent/phrase break but the target did not have an accent/phrase break at the same location), "matched absence" (neither prime nor target contained an accent/break), and "mismatched absence" (the prime did not have an accent/phrase break but the target did have an accent/phrase break). A chi-squared analysis showed a significant interaction of the speaker's and listeners' productions (chi-square (1) = 267.6, p < 0.05). See Table 5.3 for a summary of speakers' and listeners' productions. The listeners were more likely to produce a phrase break or pitch accent at the same location where the speaker produced a break than at another location.

Speaker Prime Productions           Listeners' Target Productions at same location:
(pitch accent/phrase break)         Present:        Absent:
Present:                            609             725
Absent:                             280             1308

Table 5.3. Experiment 2: Chi-square table for prime and target utterances

A ToBI analysis comparing the prime and target tones revealed similarities at pitch accent and intermediate phrase break locations. When the prime and target sentences contained a pitch accent or phrase break at the same location, 82.4% of the locations also matched for H or L tone. Phrase break location was related to the syntactic interpretation, and 89.6% of the matching phrase break locations contained the same tone. Although pitch accent location was not related to syntax, 78.6% of the prime and target tones matched at pitch accent locations. Participants did not persist in the full L-H% or H-L% intonational phrase pitch pattern from the prime sentences when they produced target sentences. Participants rarely produced a full intonational phrase during the experiment.

Memory recognition task. An ANOVA on percent "yes" response by type of memory item, F(4,124) = 36.56, p < .05, showed significant differences, with subjects most often responding "no" to new items. As shown in Figure 5.1, listeners best remembered those sentences that contained the same phrase breaks (syntax) and the same tonal pattern (prosody) as in the experiment. This yes/no response was combined with the confidence rating to create a 10-point scale with 1 = very confident "no" and 10 = very confident "yes." An ANOVA on the combined confidence scale by memory item type was also significant, F(4,124) = 45.39, p < .05. These significant differences did not rely on the "new" items. An ANOVA on yes/no response by intonational pitch pattern (H-L%, L-H%) was not significant, F(1,31) = 0.09, n.s. A 2x2 analysis of variance on percent "yes" responses, grouping together items with same breaks (with same or different pitches) and different breaks (with same or different pitches), showed a significant difference between responses to the two break categories: F(1,31) = 4.98, p < .05. As shown in Figure 5.2, participants responded "yes" most often when the same break was present in the memory test: same break = .79, different break = .66. This effect also held for the confidence rating scale: F(1,31) = 6.11, p < .05. Post-hoc tests revealed that the "new" items were significantly different from the other four conditions. Also, the "same syntax-same prosody" condition was significantly different from the other four conditions.
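The chi-squared statistic reported above can be checked directly from the cell counts in Table 5.3. The following is an illustrative sketch rather than the original analysis; note that the scipy routine applies Yates' continuity correction to 2x2 tables by default, so its value is close to, but not identical to, the statistic reported above.

# Illustrative check of the Table 5.3 chi-square, not the original analysis code.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: speaker prime produced an accent/break at the location (present, absent);
# columns: listener's target production at the same location (present, absent).
table = np.array([[609, 725],
                  [280, 1308]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square({dof}) = {chi2:.1f}, p = {p:.3g}")
# Listeners produced an accent/break at 609/(609+725) = 46% of locations where the
# speaker did, versus 280/(280+1308) = 18% of locations where the speaker did not.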
Figure 5.1. Experiment 2: Percent "yes" response for sentence recognition task by cue condition

Figure 5.2. Experiment 2: Percent "yes" response for same or different prosodic phrase breaks

Discussion

The ToBI analysis showed significant persistence effects for syntactically-related and syntactically-unrelated acoustic cues. Listeners produced the same phrasal events in their target utterances as they had heard in the prime. The analysis of the IOI of words preceding phrase breaks revealed some evidence of persistence in the lengthening of words before expected intermediate phrase breaks. Also, listeners' target utterances matched the speaker's prime utterances for the presence/absence of prosodic phrase breaks and pitch accents at the same event location. In addition, when the prime and target sentences contained a pitch accent or phrase break at the same location, the tones usually matched as well. The memory test revealed differences between syntactically-related and syntactically-unrelated cues. Listeners were more likely to say "yes" to having heard a sentence before if the current sentence contained the same phrase break as the earlier sentence. Note that the instructions in the language memory experiment were very specific. Listeners were told to respond "yes" if the sentence contained the "same words as a sentence you heard before." Thus, listeners should respond "yes" to the twelve trials in which the sentence had the same words and "no" to the eight trials in which the sentence contained new words. Even with these specific instructions, listeners responded "yes" more often and gave higher combined confidence ratings when the phrase break was the same as before. This suggests that listeners incorporated the syntactically-related cue of phrase break more strongly in memory than the syntactically-unrelated cue of pitch pattern.

CHAPTER 6

EXPERIMENT 3: MUSIC PERCEPTION

The goal of the music perception study was to examine listeners' ability to identify the metrical interpretation of performances that contained intensity or articulation cues to mark the meter. On average, pianists in the performance study persisted in the metrically-unrelated articulation of the original performances, but not in the metrically-related intensity. Instead, most performers used adjusted articulation cues (ISI/IOI) to mark the meter. In this experiment, participants listened to performances generated by four pianists from the music performance experiment: two pianists who consistently persisted in adjusted articulation cues to mark meter in their target productions and two pianists who consistently persisted in intensity cues to mark meter in their target productions. Listeners heard each performance repeated two times and indicated the meter they thought the performer intended.

Method

Participants. Sixteen adult musicians (mean age = 19.125) who had taken formal lessons on an instrument (mean years of lessons = 6.125, range = 3-11) participated in the study. Participants received course credit in an introductory psychology course for their participation. All participants were native speakers of English. None of the subjects reported having any hearing problems.

Apparatus.
Participants heard stimuli over AKG K270 headphones while seated at a computer. FTAP was used to control the stimulus presentation (Finney, 2001).

Materials. A subset of the performances of pianists in the music performance experiment was included. Experiment 1 showed that pianists used articulation differences more often than intensity differences to mark meter. In this experiment, four pianists' performances were included: two pianists who used intensity to mark metrical downbeats and two pianists who used articulation to mark metrical downbeats. The performances of two pianists who showed stronger intensities on the expected strong beats (mean = 70.58) than on the expected weak beats (mean = 52.45) were included as the intensity performances. These pianists' performances did not show large differences in adjusted articulation on strong beats and weak beats. The performances of two other pianists who showed longer adjusted articulation on strong beats (mean = .97) than weak beats (mean = .94) but little intensity difference were included as the articulation performances.

Design. Thirty-three error-free performances by the four pianists were arranged so that no more than 3 binary or ternary performances were in a row and no two performances by the same pianist occurred in a row. There were 17 binary trials (10 articulation / 7 intensity) and 16 ternary trials (7 articulation / 9 intensity). Seven of the 40 experimental target trial performances contained errors and were not included. Listeners heard one performance repeated twice on each of 33 trials in this within-subject design. The independent variables were cue type (intensity/articulation) and meter (binary/ternary). The dependent variables were metrical response (binary/ternary) and confidence rating.

Procedure. Listeners sat in front of a computer and listened to the melodies over AKG K270 headphones. They were instructed to choose the meter they thought the performer intended by circling "binary" for 2/4 or 4/4 melodies and "ternary" for 3/4 or 6/8 melodies. Each melody was heard two times. Listeners also circled their confidence in their decision, with 1 = not very confident and 5 = very confident. Listeners were asked informally if they were familiar with the distinction between the meters (2/4, 4/4, 3/4, 6/8). All subjects knew the distinction, although some subjects had not used the terms "binary" or "ternary" before participating in the experiment.

Results

A two-way ANOVA on percent ternary response by primed meter and cue type (intensity/articulation) showed no main effect of meter (binary/ternary). However, there was an interaction of primed meter and cue type: F(1,15) = 15.5, p < .05. As shown in Figure 6.1, intensity cues led to more accurate identification of the primed meter, while articulation cues led to inaccurate identification of the primed meter. The response (binary/ternary) was combined with the confidence rating to form a 10-point confidence scale. A two-way ANOVA on this combined scale for response and confidence by primed meter and cue type showed the same interaction between cue type and primed meter: F(1,15) = 17.9, p < .05.

Figure 6.1. Experiment 3: Percent "ternary" response in music perception task with intensity or articulation cues

In addition, the confidence scale showed a main effect of cue type, with higher values for intensity than articulation: F(1,15) = 5.6, p < .05.
This increased scale response can be attributed partly to the overall increased confidence on intensity trials, regardless of the meter. An ANOVA on confidence alone showed higher confidence for intensity (mean = 3.68) than articulation (mean = 3.42) trials: F(1,15) = 6.38, p < .05, and higher confidence for ternary (mean = 3.63) than binary (mean = 3.47) trials: F(1,15) = 5.25, p < .05. Discussion Listeners were more accurate at identifying the primed meter when the performance contained metrically-related intensity cues than when the performance contained metrically-related articulation cues. Listeners identified the meter of the target melody through intensity cues that persisted from the prime melody. A single experiment alone is not sufficient to draw conclusions about persistence in an ensemble context. This experiment included only some of the performances from the music performance study. However, it demonstrates that it is possible for listeners to interpret the meter when the target performance contains the same acoustic details as the prime performance. One concern is the perceptual salience of the two cues, intensity and articulation. Although the performances were chosen so that a distinction between articulation and intensity cues could be drawn, the intensity cues may have more strongly indicated the meter than the articulation cues in this set of performances. 62 A more natural context, in which instrumentalists are trading a melody back and forth may reveal stronger effects of this continuing persistence. The listeners in the laboratory often tapped their feet or moved their bodies to the performances in order to determine the meter. In a natural setting, this bodily incorporation of the meter would be more pronounced. Also, it is possible that listeners could use both the acoustic cues as well as visual cues when determining and persisting in the meter of a live performance. At the very least, this experiment suggests that the theme of prosodic persistence in an ensemble setting may be worth pursuing. 63 CHAPTER 7 EXPERIMENT 4: SPEECH PERCEPTION The goal of the speech perception study was to examine the phenomenon of persistence in a situation that is closer to natural spoken conversation. Listeners judged the interpretation of sentence that contained prosodic phrase break acoustic cues. If listeners in the speech production study persisted in the acoustic cues of the sentences they heard, could other listeners hear their utterances and identify the syntactic interpretation of the original sentences? Listeners heard the productions of four speakers from the speech production experiment and judged the speakers’ intended syntactic interpretation. Method Participants. The same sixteen adults who participated in the music perception study also participated in this experiment. All participants were native speakers of English. Apparatus. Participants heard stimuli over AKG K270 headphones while seated at computer. Materials. A subset of the productions from the language production experiment was included. Forty-four error-free productions of four speakers were included. (4 of the 48 experimental utterances contained errors such as unnatural pauses or mispronounced 64 words). The particular speakers were chosen because their ISI for the word before a prosodic phrase break matched the syntactically-related phrase break cues from the prime for six or more of their ten utterances. Design. 
Forty-four error-free productions by the four speakers were arranged so that no more than 3 early or late primed productions were in a row and no two productions by the same speaker occurred in a row. There were four types of sentences: 15 with an adjective that modifies either a conjoined noun phrase or the first of the two nouns, 11 with prepositional phrase attachment, 11 with a conjunction, and 7 with an ambiguous pronoun. Listeners heard one performance repeated twice on each of forty-four trials. The independent variable was primed break condition (early break/late break). The dependent variables were response (early/late break interpretation) and confidence rating.

Procedure. Participants were instructed to choose the sentence interpretation they thought the speaker intended. On each trial, they heard an utterance, saw two possible sentence interpretations on the computer screen, and then heard the sentence again. The sentence interpretations were labeled A and B, and participants circled A or B on an answer sheet. Participants also circled their confidence in their decision, with 1 = not very confident and 5 = very confident. The 'A' interpretation appeared on the left side of the screen and the 'B' interpretation appeared on the right side. For half of the participants, 'A' was the early phrase break interpretation, and for half of the participants, 'B' was the early phrase break interpretation.

Results

An ANOVA on the percentage of late break interpretation responses by phrase break (early/late prime) showed significant differences between the expected early and late interpretations: F(1,15) = 17.35, p < .01. Listeners responded with the late interpretation on 72.4% of primed late break trials and with the early interpretation on 40.9% of primed early break trials. An analysis by item showed participants interpreted 61.4% of the stimuli in the expected (primed) direction, which was significantly different from chance (t(42) = 5.04, p < .05).

Discussion

Listeners responded differently to the early and late phrase break sentence trials. Although overall accuracy was not high, listeners interpreted the late break sentences in the direction of the original syntactically-related cues. Thus, the persistence effects seen in Experiment 2 were sufficient to influence other listeners' syntactic interpretations. Listeners had a bias to respond with the late break interpretation. They responded with the late prosodic phrase break interpretation 65.8% of the time. Despite this bias, there was still a difference between listeners' responses to the early and late prosodic phrase break trials. Note that the labels for early and late prosodic phrase break refer to the prime sentences and not to the acoustic properties of the target sentences heard by the listeners. The assumption is that if the speakers in the sentence production experiment persisted in the syntactically-related prosodic phrase breaks, then the listeners in this speech perception study would be able to determine the phrase breaks from the target sentences. As in the music perception experiment, this single experiment does not explain persistence in a conversational context. However, this experiment does suggest that further study in a more realistic conversation context is needed.

CHAPTER 8

GENERAL DISCUSSION

Several experiments demonstrated persistence effects for syntactically-related and syntactically-unrelated prosodic variations in the domains of music and language.
In the music performance study, pianists listened to prime melodies and produced similar target melodies. The prime melodies contained metrically-related intensity cues (supports binary or ternary meter) as well as metrically-unrelated articulation cues (staccato or legato). On each trial, pianists heard a melody and then were asked to perform a different melody. The melodies were blocked by the intensity cue pattern. The pianists were not told to imitate what they had heard. Instead, the focus of the experiment was on the concluding memory test. Pianists were instructed that they would be tested on their memory for the melodies from the experiment. Pianists performed the musical pieces with the same articulation (staccato or legato) as the performances they had just heard. These articulation cues altered from trial to trial (with each piece), so this demonstrated pianists’ ability to persist in structurallyunrelated musical cues. However, pianists did not directly recreate the metrically-related intensity pattern of the prime pieces in their performances. Pianists persisted in the strong and weak beats of the prime meter, but they used timing variation instead of intensity variation. The primes contained louder intensities on the strong beats than the 68 weak beats and pianists performed with longer adjusted articulations (ISI/IOI) on the strong beats than the weak beats. This suggests that pianists used the intensity cues to form a representation of the meter. When the pianists performed subsequent pieces, they used the same meter as the performance they just heard, but they instantiated it with articulation cues instead of intensity cues. Finally, pianists better remembered melodies they had heard earlier than new melodies in a melody recognition task. Pianists listened to each melody and judged whether they had heard it before. There was no difference in the pianists’ ability to recognize performances with different metrically-related or metrically-unrelated prosodic cues from the original performances they heard during the experiment. The inability to differentiate performances was likely due to the pianists’ overall difficulty in remembering the metrically ambiguous, isochronous melodies. In the speech production experiment, listeners heard prime sentences and produced similar target sentences. The prime sentences contained syntactically-related prosodic phrase breaks (early or late) as well as syntactically-unrelated tonal pattern cues (H-L% or L-H%). On each trial, listeners heard a sentence and then read aloud a different sentence. The sentences were blocked by prosodic phrase break. The participants were instructed to attend to the sentences for a later sentence recognition test. Speakers persisted in the syntactically-related phrase breaks and the syntacticallyunrelated tones from the primes. Speakers were more likely to produce pitch accents and prosodic phrase breaks in the same location of their target sentences as where the pitch accents and prosodic phrase breaks occurred in the prime sentences. Also, words before 69 a prime phrase break were longer in IOI than the same words without a prime phrase break. Also, listeners and speakers produced the same tone (H, L) at matching phrase break or pitch accent locations. Finally, listeners better remembered sentences they had heard before than new sentences in a sentence recognition task. 
In addition, listeners better remembered sentences that contained the same prosodic phrase breaks they had heard before than sentences that contained the same tonal patterns they had heard before. Thus, listeners better remembered the sentences when the syntax (indicated by prosodic phrase break) stayed the same from the experiment to the memory recognition task. The third and fourth experiments examined listeners’ ability to determine the meter or syntax of performances or utterances from the production studies. Musicians who listened to target melody performances from the music production experiment could recognize the meter when metrically-related intensity cues (but not articulation cues) persisted in the performance. Similarly, listeners who heard target sentences from the speech production experiment were better than chance at choosing the phrase structure interpretation that matched the original prime sentence. These perception experiments suggest that it is possible for listeners to determine the meter and syntax from prosodic cues that persisted in the productions. One limitation of the current study is the naturalness of the task, particularly for the speakers. Listeners heard sentences with strong prosodic patterns (both phrase break and tonal patterns) and produced speech by reading sentences. More natural productions and more prosodic variability may be possible if participants do not read the sentences, 70 but instead, produce sentences from memory. More natural context situations, such as conversation during a game, show greater prosodic variation than read speech (Schafer, Speer, Warren, & White, 2000). Future studies of prosodic persistence should involve more natural contexts, so that the speech contains the full range of prosodic variability. Also, the number of stimuli in the speech study was relatively limited. More stimuli or a prolonged exposure to a break pattern could lead to greater persistence. Prosodic persistence The experiments suggest that prosodic persistence is the continuation of acoustic variations (both structurally-related and structurally-unrelated) from perception to production. In addition, this persistence may be involved in conversational speech or ensemble performances, as demonstrated by listeners’ ability to hear target productions and recognize the primed syntax or meter. In both the speech and music production experiments, participants incorporated prime acoustic details into their own target productions. This suggests that listeners formed representations of the melodies or sentences that included more than just the notes or words. Did listeners persist in information that was not categorical? According to Raffman (1993), listeners may forget the prosodic cues once they are used to form categories of pitch or duration. This approach does not explain pianists’ ability to perform with different articulation cues following the staccato and legato primes. These articulation cues should not be remembered because they are sub-categorical. If these articulation cues are not remembered, there is no reason for them to influence and persist 71 in future performances. Thus, the articulation cue persistence suggests pianists do persist in sub-categorical acoustic details. Also, this approach does not explain pianists’ ability to persist in the meter, which they heard instantiated with intensity cues. 
Prosodic persistence is not simple imitation: the pianists persisted in the prime meter, but they marked the meter with adjusted articulation cues in their performances instead of the primed intensity cues. Even though pianists heard strong beats marked with higher intensities than weak beats, they did not reproduce the meter with intensity cues. Pianists performed with shorter adjusted articulations across events after hearing staccato primes than after hearing legato primes. However, the pianists never performed with the same event duration from the prime melodies. In the prime staccato melodies, the duration of each note event was 150 ms, 30% of the IOI. In the target melodies following the staccato prime, the event durations were 87.6% of the IOI. In the speech production experiment, the speakers produced sentences with pitch accents and prosodic phrase breaks in the same locations as where the pitch accents and breaks were located in the prime sentences, but they often used different tones. Finally, prosodic persistence does not exist solely to support syntactic persistence. Bock (1996) showed evidence for syntactic persistence in which participants described a new scene using the same syntax they had just heard. In the current study, listeners persisted in the prosodic cues, using acoustic cues from what they have heard. For music, the pianists’ performances included both metrically-related and metrically-unrelated acoustic cues. If the pianists were to only persist in metrically-related cues, than they would not have persisted in the prime articulation cues. In speech, speakers persisted in 72 the pitch accent and prosodic phrase break locations from the prime sentences. In addition, they matched the tone at these locations. Although phrase break location and phrasal tone related to syntax, the matching pitch accent location and pitch accent tone related to prosody. An alternative explanation for the persistence effect is that the first sentence implicitly primes the syntactic structure that is then regenerated in the produced target sentence (Potter & Lombardi, 1998). This explanation assumes the prime representation does not include surface structure features (Potter & Lombardi, 1998). This explanation is plausible for the syntactically-related persistence, but it does not make sense for the syntactically-unrelated persistence. If the representation of the prime does not contain surface level features, then acoustic details unrelated to syntax should not persist because they will not be regenerated with the syntax. The musicians and speakers persisted in the articulation (staccato/legato) and the prosodic tones (H/L) from the prime, suggesting that the representation included both syntactic and prosodic features. Speech/music differences There is evidence for syntactically-related and syntactically-unrelated persistence for both speech and music, but there were differences in degree in the two domains. The musicians persisted in the structurally-unrelated articulation (staccato/legato) and used these same cues to produce meter, but speakers showed strong syntactically-related persistence and did not produce full intonational phrase tones. This difference may be related to the different goals in the two domains. In speech, the goal is to communicate 73 an idea and in music, the goal is the expression of the piece. 
A change of syntax in speech changes the meaning of the utterance, but in music, a change of either meter or articulation changes the expression of the musical performance. Further evidence of the importance of syntactically-related cues was demonstrated in the speech memory test. Listeners better recognized sentences they had heard before if the phrase break (a syntactically-related cue) was the same than if it was different, but listeners did not differentiate performances that contained a different tonal pattern from performances with the same pattern they heard during the experiment. This result is surprising since past research showed evidence for memory for non-structural prosodic cues. For example, listeners can use information such as talker identity or speech rate to recognize previously heard words (Bradlow, et al., 1999, Nygaard, et al., 1995) and listeners’ memory for talkers’ voices aids identification of novel words in noise (Nygaard & Pisoni, 1998). The pianists did not differentiate different cue conditions in the memory test. However, the difficulty of the task suggests that pianists’ results may be due partly to floor effects. Another possible reason for different outcomes in the speech and music experiments is that the salience of the structurally-unrelated prosodic cues may have differed in the two domains. It is possible that the tonal patterns in the speech were less salient details than the syntactic structure. The sentences were blocked by syntactic structure. Perhaps if the sentences were blocked by prosodic tones, these tones would persist to a stronger degree. Different syntactically-unrelated cues, such as style of 74 speaking (ex. clearly enunciate speech vs. normal speech) or location of utterance within pitch range (high or low), may persist. Also, the use of sentences with contrastive stress may reveal stronger non-syntactic persistence. Why were cues different for production and perception in music? In music, pianists persisted in the articulation cues across events and played strong metrical beats with longer articulation than weak metrical beats. Although there was a trend, there were no significant differences in intensities for strong and weak beats. However, in the music perception task, intensity cues better indicated the meter for listeners than articulation cues. This difference between production and perception is likely due to the salience of the intensity and articulation cues in the four performances including in the perception study. By including performances with metrically-related intensity cues but not metrically-related articulation cues (and vice versa), those performances with both types of metrically-related cues were eliminated. Articulation and intensity cues may interact in performance to indicate meter. This result is slightly different than the results of Sloboda (1985), in which listeners could use articulation to determine the metrical context of a performance. Intensity was also a useful cue to meter, but only some performers used this cue (Sloboda, 1985). Thus, the ability to use prosodic cues to determine meter may depend largely on the specific performance. The pianists in the Sloboda (1985) study were more experienced than the pianists in the present performance study and they may have been more consistent in their performance expression of articulation. 75 Why would prosodic persistence be useful? Prosodic variations in music, whether they relate to the meter or not, are part of the expression of music. 
Musicians in an ensemble such as a band or orchestra coordinate their performances so that they play with the same timing and the same expression. Prosodic variations in speech help speakers disambiguate the syntax in ambiguous sentences. Perhaps persisting in the same prosody also helps speakers in a conversation communicate more clearly and receive the message more easily because they eliminate some acoustic variability when they persist in each others’ prosody. The perception studies showed evidence for both speech and music; a second generation of listeners identified the structural interpretation of the original primed source through the target production. This hints at the importance of prosody for conversational and musical exchanges and suggests that further exploration of mutual priming of prosodic cues in conversational and ensemble contexts may be fruitful. 76 LIST OF REFERENCES Allbritton, D.W., McKoon, G., & Ratcliff, R. (1996). Reliability of prosodic cues for resolving syntactic ambiguity. Journal of Experimental Psychology: Learning, Memory, & Cognition, 22, 714-735. Beckman, M.E. (1996). The parsing of prosody. Language and Cognitive Processses, 11, 17-67. Beckman, M.E. & Elam, G.A. (1997). Guidelines for ToBI labeling. (Version 3). Columbus, OH: Ohio State University. Beckman, M.E. & Pierrehumbert, J.B. (1986). Intonational structure in English and Japanese. Phonology Yearbook, 3, 255-310. Bock, K. (1986). Syntactic persistence in language production. Cognitive Psychology, 18, 355-387. Bock, K. (2002). Persistent structural priming: Transient activation or implicit learning? Paper presented at CUNY Sentence Processing Conference, New York. Bock, K. & Griffin, Z.M. (2000). The persistence of structural priming: Transient activation or implicit learning? Journal of Experimental Psychology: General, 129, 177192. Boltz, M.G. (1998). Tempo discrimination of musical patterns: Effects due to pitch and rhythmic structure. Perception & Psychophysics, 60, 1357-1373. Bradlow, A.R., Nygaard, L.C., & Pisoni, D.B. (1999). Effects of talker, rate, and amplitude variation on recognition memory for spoken words. Perception & Psychophysics, 61, 206-219. Cathcart, E.P. & Dawson, S. (1928). Persistence: A characteristic of remembering. British Journal of Psychology, 18, 262-275. Clark, H.H. (1996). Using language. NY: Cambridge University Press. Clark, H.H. (2002). Speaking in time. Speech Communication, 36, 5-13. 77 Clynes, M., & Walker, J. (1986). Music as time's measure. Music Perception, 4, 85-119. Collier, G.L, & Collier, J.L. (1994). An exploration of the use of tempo in jazz. Music Perception, 11, 219-242. Cooper, A.M. & Whalen, D.H., & Fowler, C.A. (1986). P-centers are unaffected by phonetic categorization. Perception and Psychophysics, 39, 187-196. Cooper, G., & Meyer, L.B. (1960). The rhythmic structure of music. Chicago: University of Chicago Press. Cooper, W.E., & Eady, S.J. (1986). Metrical phonology in speech production. Journal of Memory and Language, 25, 369-384. Cooper, W. & Paccia-Cooper, J. (1980). Syntax and Speech. Harvard University Press: Cambridge, Massachusetts. Cutler, A., Dahan, D., & van Donselaar, W. (1997). Prosody in the comprehension of spoken language: A literature review. Language and Speech, 40, 141201. Dowling, W.J. & Harwood, D.L. (1986). Music cognition. Orlando: Academic Press. Drake, C. (1993). Perceptual and performed accents in musical sequences. Bulletin of the Psychonomic Society, 31, 107-110. Drake, C., & Botte, M.C. (1993). 
Drake, C., & Botte, M.C. (1993). Tempo sensitivity in auditory sequences: Evidence for a multiple-look model. Perception & Psychophysics, 54, 277-286.
Drake, C., Jones, M.R., & Baruch, C. (2000). The development of rhythmic attending in auditory sequences: Attunement, referent period, focal attending. Cognition, 77, 251-288.
Drake, C., & Palmer, C. (1993). Accent structures in music performance. Music Perception, 10, 343-378.
Ellis, M.C. (1991). Thresholds for detecting tempo change. Psychology of Music, 19, 164-169.
Essens, P.J., & Povel, D.-J. (1985). Metrical and nonmetrical representations of temporal patterns. Perception & Psychophysics, 37, 1-7.
Finney, S. (2001). FTAP: A Linux-based program for tapping and music experiments. Behavior Research Methods, Instruments, & Computers, 33, 63-72.
Fox Tree, J.E. (2000). Coordinating spontaneous talk. In L. Wheeldon (Ed.), Aspects of language production (pp. 375-406). Philadelphia: Psychology Press.
Fox Tree, J.E., & Meijer, P. (2000). Untrained speakers' use of prosody in syntactic disambiguation and listeners' interpretations. Psychological Research, 63, 1-13.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The psychology of music (pp. 149-180). New York: Academic Press.
Gabrielsson, A. (1987). Once again: The theme from Mozart's Piano Sonata in A Major (K. 331): A comparison of five performances. In A. Gabrielsson (Ed.), Action and perception in rhythm and music (pp. 81-104). Stockholm: Royal Swedish Academy of Music.
Gee, J.P., & Grosjean, F.H. (1983). Performance structures: A psycholinguistic and linguistic appraisal. Cognitive Psychology, 15, 411-458.
Grosjean, F.H., Grosjean, L., & Lane, H. (1979). The patterns of silence: Performance structures in sentence production. Cognitive Psychology, 11, 58-81.
Henderson, M.T. (1936). Rhythmic organization in artistic piano performance. University of Iowa Studies in the Psychology of Music, 4, 281-305.
Jones, M.R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory. Psychological Review, 83, 323-355.
Jones, M.R. (1987). Perspectives on musical time. In A. Gabrielsson (Ed.), Action and perception in rhythm and music (pp. 153-176). Stockholm: Royal Swedish Academy of Music.
Jungers, M.K., & Palmer, C. (2000). Episodic memory for music performance. Abstracts of the Psychonomic Society, 5, 105.
Jungers, M.K., Palmer, C., & Speer, S.R. (2002). Time after time: The coordinating influence of tempo in music and speech. Cognitive Processing, 2, 21-35.
Kemler Nelson, D.G., Jusczyk, P.W., Mandel, D.R., Myers, J., Turk, A., & Gerken, L.A. (1995). The head-turn preference procedure for testing auditory perception. Infant Behavior and Development, 18, 111-116.
Kjelgaard, M.M., & Speer, S.R. (1999). Prosodic facilitation and interference in the resolution of temporary syntactic closure ambiguity. Journal of Memory and Language, 40, 153-194.
Kosslyn, S.M., & Matt, A.M. (1977). If you speak slowly, do people read your prose slowly? Person-particular speech recoding during reading. Bulletin of the Psychonomic Society, 9, 250-252.
Large, E.W., & Jones, M.R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review, 106, 119-159.
Large, E.W., & Palmer, C. (2001). Perceiving temporal regularity in music. Cognitive Science, 26, 1-37.
Large, E.W., Palmer, C., & Pollack, J.B. (1995). Reduced memory representations for music. Cognitive Science, 19, 53-96.
LeBlanc, A., Colman, J., McCrary, J., Sherrill, C., & Malin, S. (1988). Tempo preferences of different age music listeners. Journal of Research in Music Education, 36, 156-168.
Lehiste, I. (1973). Phonetic disambiguation of syntactic ambiguity. Glossa, 7, 106-122.
Lehiste, I. (1977). Isochrony reconsidered. Journal of Phonetics, 5, 253-263.
Lehiste, I., Olive, J.P., & Streeter, L. (1976). Role of duration in disambiguating syntactically ambiguous sentences. Journal of the Acoustical Society of America, 60, 1199-1202.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT Press.
Levelt, W.J.M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Levitin, D.J., & Cook, P.R. (1996). Memory for musical tempo: Additional evidence that auditory memory is absolute. Perception & Psychophysics, 58, 927-935.
Martin, J.G. (1970). Rhythm-induced judgments of word stress in sentences. Journal of Verbal Learning and Verbal Behavior, 9, 627-633.
Merker, B. (2000). Synchronous chorusing and human origins. In N.L. Wallin, B. Merker, & S. Brown (Eds.), The origins of music (pp. 315-327). Cambridge, MA: MIT Press.
Miller, J.L., Grosjean, F., & Lomanto, C. (1984). Articulation rate and its variability in spontaneous speech: A reanalysis and some implications. Phonetica, 41, 215-225.
Munhall, K., Fowler, C.A., Hawkins, S., & Saltzman, E. (1992). "Compensatory shortening" in monosyllables of spoken English. Journal of Phonetics, 20, 225-239.
Nygaard, L.C., & Pisoni, D.B. (1998). Talker-specific learning in speech perception. Perception & Psychophysics, 60, 355-376.
Nygaard, L.C., Sommers, M.S., & Pisoni, D.B. (1995). Effects of stimulus variability on perception and representation of spoken words in memory. Perception & Psychophysics, 57, 989-1001.
Palmer, C. (1989). Mapping musical thought to musical performance. Journal of Experimental Psychology: Human Perception and Performance, 15, 331-346.
Palmer, C. (1996a). Anatomy of a performance: Sources of musical expression. Music Perception, 13, 433-454.
Palmer, C. (1996b). On the assignment of structure in music performance. Music Perception, 14, 21-54.
Palmer, C. (1997). Music performance. Annual Review of Psychology, 48, 115-138.
Palmer, C., Jungers, M.K., & Jusczyk, P.W. (2001). Episodic memory for musical prosody. Journal of Memory and Language, 45, 526-545.
Palmer, C., & Krumhansl, C.L. (1990). Mental representations for musical meter. Journal of Experimental Psychology: Human Perception and Performance, 16, 728-741.
Pierrehumbert, J., & Hirschberg, J. (1990). The meaning of intonation contours in the interpretation of discourse. In P.R. Cohen, J. Morgan, & M.E. Pollack (Eds.), Intentions in communication (pp. 271-311). Cambridge, MA: MIT Press.
Pisoni, D.B. (1997). Some thoughts on "normalization" in speech perception. In K. Johnson & J.W. Mullennix (Eds.), Talker variability in speech processing (pp. 9-32). San Diego: Academic Press.
Pitrelli, J., Beckman, M.E., & Hirschberg, J. (1994). Evaluation of prosodic transcription labeling reliability in the ToBI framework. In Proceedings of the 1994 International Conference on Spoken Language Processing (pp. 123-126). Yokohama, Japan.
Potter, M.C., & Lombardi, L. (1998). Syntactic priming in immediate recall of sentences. Journal of Memory and Language, 38, 265-282.
Povel, D.-J. (1981). Internal representation of simple temporal patterns. Journal of Experimental Psychology: Human Perception and Performance, 7, 3-18.
Price, P., Ostendorf, M., Shattuck-Hufnagel, S., & Fong, C. (1991). The use of prosody in syntactic disambiguation. Journal of the Acoustical Society of America, 90, 2956-2970.
Raffman, D. (1993). Language, music, and mind. Cambridge, MA: MIT Press.
Repp, B.H. (1992). Probing the cognitive representation of musical time: Structural constraints on the perception of timing perturbations. Cognition, 44, 241-281.
Repp, B.H. (1994). On determining the basic tempo of an expressive music performance. Psychology of Music, 22, 157-167.
Schafer, A.J., Speer, S.R., Warren, P., & White, D.S. (2000). Intonational disambiguation in sentence production and comprehension. Journal of Psycholinguistic Research, 29, 169-182.
Seashore, C.E. (Ed.). (1936). Objective analysis of musical performance (Vol. 4). Iowa City: University of Iowa Press.
Siegel, J.A., & Siegel, W. (1977). Categorical perception of tonal intervals: Musicians can't tell sharp from flat. Perception & Psychophysics, 21, 399-407.
Sloboda, J.A. (1983). The communication of music metre in piano performance. Quarterly Journal of Experimental Psychology, 35, 377-395.
Sloboda, J.A. (1985). Expressive skill in two pianists: Metrical communication in real and simulated performances. Canadian Journal of Psychology, 39, 273-293.
Speer, S.R., Crowder, R.G., & Thomas, L.M. (1993). Prosodic structure and sentence recognition. Journal of Memory and Language, 32, 336-358.
Stein, E. (1989). Form and performance. New York: Limelight.
Streeter, L. (1978). Acoustic determinants of phrase boundary perception. Journal of the Acoustical Society of America, 64, 1582-1592.
Sundberg, J. (1993). How can music be expressive? Speech Communication, 13, 239-253.
Volaitis, L.E., & Miller, J.L. (1992). Phonetic prototypes: Influence of place of articulation and speaking rate on the internal structure of voicing categories. Journal of the Acoustical Society of America, 92, 723-735.
Wales, R., & Toner, J. (1979). Intonation and ambiguity. In W.E. Cooper & E.C.T. Walker (Eds.), Sentence processing: Psycholinguistic studies presented to Merrill Garrett. Hillsdale, NJ: Erlbaum.
Warren, P. (1999). Prosody and sentence processing. In S. Garrod & M. Pickering (Eds.), Language processing (pp. 155-188). Hove: Psychology Press.
Warren, R.M. (1985). Criterion shift rule and perceptual homeostasis. Psychological Review, 92, 574-584.
Wayland, S.C., Miller, J.L., & Volaitis, L.E. (1994). The influence of sentential speaking rate on the internal structure of phonetic categories. Journal of the Acoustical Society of America, 95, 2694-2701.
Windsor, W.L., & Clarke, E.F. (1997). Expressive timing and dynamics in real and artificial music performances: Using an algorithm as an analytical tool. Music Perception, 15, 127-152.
Wingfield, A., & Klein, J.F. (1971). Syntactic structure and acoustic pattern in speech perception. Perception & Psychophysics, 9, 23-25.