Nuance Vocalizer 3.0
Developer's Guide
Copyright 2004 Nuance Communications, Inc. All rights reserved.
1005 Hamilton Avenue, Menlo Park, California 94025 U.S.A.
Printed in the United States of America.
Last updated July 2004.
Information in this document is subject to change without notice and does not represent a
commitment on the part of Nuance Communications, Inc. The software described in this document is
furnished under a license agreement or nondisclosure agreement. The software may be used or copied
only in accordance with the terms of the agreement. You may not copy, use, modify, or distribute the
software except as specifically allowed in the license or nondisclosure agreement. No part of this
document may be reproduced or transmitted in any form or by any means, electronic or mechanical,
including photocopying and recording, for any purpose, without the express written permission of
Nuance Communications, Inc.
Nuance Vocalizer uses the Regex++ library distributed by Boost.org. Copyright 1998-2001 Dr. John
Maddock.
Nuance and Nuance Communications are registered trademarks of Nuance Communications, Inc.
SpeechChannel, SpeechObjects, Nuance Verifier, and Nuance Voice Web Server are trademarks of
Nuance Communications, Inc. Any other trademarks belong to their respective owners.
Contents
About this guide (p. ix)
  Audience (p. ix)
  Organization (p. ix)
  Related documentation (p. x)
  Typographical conventions (p. xi)
  Where to get help (p. xii)
Chapter 1. Introducing Nuance Vocalizer (p. 1)
  Overview (p. 1)
Chapter 2. Installing Nuance Vocalizer (p. 3)
  Supported platforms and system requirements (p. 3)
  Memory requirements (p. 4)
  Third-party software requirements (p. 4)
  Installing Nuance Vocalizer (p. 5)
  Installing Vocalizer on Windows (p. 5)
  Installing Vocalizer on Solaris (p. 6)
  Environment variables (p. 7)
  Nuance Vocalizer licensing (p. 7)
  Testing your installation (p. 7)
  Uninstalling Vocalizer (p. 7)
  Installing voice packs (p. 8)
  Installing voice packs on Windows (p. 8)
  Installing voice packs on Solaris (p. 8)
Chapter 3. Configuring Vocalizer with the Nuance System (p. 9)
  Including Vocalizer in a Nuance configuration (p. 9)
  Starting a Vocalizer TTS server (p. 11)
  Command-line options (p. 11)
  Nuance parameter settings (p. 16)
  Connecting Vocalizer to a resource manager (p. 17)
  Running Vocalizer without a resource manager (p. 19)
  Using the Nuance Watcher to start Vocalizer (p. 20)
  Starting Vocalizer automatically with a Watcher startup file (p. 21)
  Communicating with Vocalizer through the Watcher (p. 22)
  Enabling Vocalizer logs (p. 23)
  Directing requests to a specific TTS server (p. 24)
Chapter 4. Playing TTS prompts from your application (p. 27)
  Using Nuance prompt playback functions (p. 27)
  SpeechObjects (p. 28)
  NuanceSpeechChannel (p. 28)
  RCEngine (p. 29)
  Using TTS in a VoiceXML application (p. 29)
  VoiceXML 1.0 elements (p. 30)
  SSML 1.0 elements (p. 32)
  SSML sample text (p. 37)
  Resolving XML parsing errors with predefined entities (p. 39)
  Encoding (p. 39)
  URI formats for the SSML <audio> element (p. 40)
Chapter 5. Starting Vocalizer through the Launcher (p. 43)
  Using the Vocalizer Launcher (p. 43)
  Configuration editor (p. 44)
  Status window (p. 46)
Chapter 6. Preprocessing text input (p. 47)
  Passing email messages to Vocalizer (p. 47)
  Features of the Email Reader Configuration Tool (p. 47)
  Using the Email Reader Configuration Tool (p. 48)
  Distributing changes (p. 51)
  Creating custom filters for text replacement (p. 52)
  Applying filters (p. 52)
  Using the Text Replacement Filter Editor (p. 53)
  Distributing changes (p. 54)
Chapter 7. Generating audio files from text input (p. 55)
  Using the Offline Audio Generator (p. 55)
  Validating multi-server configurations (p. 57)
Chapter 8. Customizing your application's dictionary (p. 59)
  Features of the Dictionary Editor (p. 59)
  Audio feedback (p. 62)
  Word details (p. 63)
  Phoneme set (p. 63)
  Audio scratchpad (p. 64)
  Using the Dictionary Editor (p. 64)
  Enabling logging (p. 65)
  Adding new entries (p. 65)
  Modifying dictionary entries (p. 66)
  Testing your entries in Scratchpad (p. 67)
  Deleting entries (p. 67)
  Distributing dictionary changes (p. 68)
Chapter 9. Techniques for enhancing audio output (p. 69)
  Phrasing and punctuation (p. 69)
  Periods (p. 71)
  Hyphens and dashes (p. 71)
  Parentheses and double quotes (p. 72)
  Apostrophes and single quotes (p. 72)
  Abbreviations (p. 73)
  Single-letter abbreviations (p. 73)
  Ambiguous abbreviations (p. 74)
  Homographic abbreviations (p. 74)
  Measurement and numeric abbreviations (p. 75)
  Acronyms and initials (p. 76)
  Note on capitalization (p. 76)
  Numbers (p. 77)
  Cardinals and ordinals (p. 77)
  Digit sequences (p. 77)
  Fractions (p. 79)
  Decimal fractions (p. 79)
  Percentages (p. 80)
  Equations (p. 80)
  Account and social security numbers (p. 80)
  Combining letters and numbers (p. 80)
  Currency (p. 81)
  Telephone numbers (p. 82)
  Addresses (p. 83)
  Zip codes (p. 83)
  Dates (p. 84)
  Times (p. 85)
  Email and web addresses (p. 86)
  Handling 8-bit characters (p. 86)
Appendix A. Text processing for Canadian French (p. 87)
  Text encoding (p. 88)
  Phrasing and punctuation (p. 88)
  Periods (p. 88)
  Hyphens and dashes (p. 89)
  Parentheses and double quotes (p. 89)
  Abbreviations (p. 89)
  Single-letter abbreviations (p. 90)
  Homographic abbreviations (p. 90)
  Measurement and numeric abbreviations (p. 91)
  Acronyms and initials (p. 91)
  Capitalization (p. 91)
  Directions (p. 92)
  Numbers (p. 92)
  Cardinals and ordinals (p. 92)
  Digit sequences (p. 93)
  Fractions (p. 94)
  Decimal fractions (p. 94)
  Percentages (p. 95)
  Account numbers (p. 95)
  Combining letters and numbers (p. 96)
  Currency (p. 97)
  Telephone numbers (p. 98)
  Addresses (p. 98)
  Dates (p. 99)
  Times (p. 100)
  Email and web addresses (p. 100)
Appendix B. Text processing for American Spanish (p. 103)
  Text encoding (p. 104)
  Regionalism (p. 104)
  Phrasing and punctuation (p. 104)
  Periods (p. 105)
  Parentheses and double quotes (p. 105)
  Abbreviations (p. 106)
  Single-letter abbreviations (p. 106)
  Homographic abbreviations (p. 107)
  Measurement and numeric abbreviations (p. 107)
  Acronyms and initials (p. 108)
  Capitalization (p. 108)
  Numbers (p. 108)
  Cardinals and decimal fractions (p. 108)
  Ordinals (p. 109)
  Digit sequences (p. 110)
  Fractions (p. 111)
  Percentages (p. 111)
  Account numbers (p. 112)
  Combining letters and numbers (p. 112)
  Currency (p. 112)
  Telephone numbers (p. 113)
  Dates (p. 114)
  Times (p. 115)
  Addresses (p. 116)
  Email and web addresses (p. 118)
About this guide
Nuance Vocalizer provides text-to-speech (TTS) services integrated with the
Nuance System's distributed architecture.
This guide describes how to use Nuance Vocalizer to incorporate text-to-speech
in a Nuance speech application, including how to invoke TTS from your
application code and how to set up your runtime configuration to most
efficiently use Nuance Vocalizer services.
Audience
This document is for developers creating speech applications based on the
Nuance Speech Recognition System, including applications built with
Foundation SpeechObjects or applications built with VoiceXML running on
the Nuance Voice Web Server.
Organization
The guide is organized as follows:
Chapter 1 introduces Nuance Vocalizer's features.
Chapter 2 covers the installation process and system requirements for running
Nuance Vocalizer.
Chapter 3 describes how to start and configure Nuance Vocalizer with the
Nuance System.
Chapter 4 provides information on invoking TTS from your application code and
describes Nuance Vocalizer's support for speech synthesis markup tags.
Chapter 5 explains how to start and modify configuration settings for Nuance
Vocalizer using the Launcher, a graphical tool included with your installation.
Chapter 6 outlines the procedures for preprocessing text input for email reading
and creating text replacement filters.
Chapter 7 explains how to generate audio files from text input through the
Offline Audio Generator tool.
Chapter 8 describes how to use the Dictionary Editor tool to customize the
pronunciation dictionary for your applications.
Chapter 9 describes how English text input is handled by the TTS engine.
Appendix A provides information on how Canadian French text input is
processed.
Appendix B provides information on how American Spanish text input is
processed.
Related documentation
The Nuance documentation contains a set of developer guides as well as
comprehensive online API reference documentation.
In addition to this guide, the documentation set includes:
Nuance Verifier Developer's Guide, which describes how to use the Nuance
Verifier to add security features to speech recognition and IVR
applications.
Nuance System Glossary, which defines terms and acronyms used in the
Nuance documentation set.
Compliance with VoiceXML 1.0 and the World Wide Web Consortium's (W3C)
April 5, 2002 Working Draft for Speech Synthesis Markup Language
Specification
Customized text preprocessing for email reading and text replacement filters
Generation of audio files in various formats from text files or direct user
input
Ability to run voice data from disk, thus reducing system memory
requirements
For 50 channels, 1 GB
Note: These recommendations are approximate figures, as actual memory
requirements depend on the maximum prompt duration used.
Third-party software requirements
The GUI utilities included with Nuance Vocalizer require the Java 2 Runtime
Environment (JRE) version 1.3 (or higher).
DictionaryEditor.bat
EmailReaderConfigTool.bat
OfflineAudioGenerator.bat
VocalizerLauncher.bat
ReplacementFilterEditor.bat
b In each of the files, change the path to the javaw executable. For example,
if your path is:
"C:\Program Files\JavaSoft\JRE\1.3\bin\javaw"
change this to the location of the latest version, for example:
"C:\Program Files\JavaSoft\JRE\1.4\bin\javaw"
On Solaris, you must download and install the JRE if you do not have it
already. See the Sun website (www.sun.com) for download information.
Installing Nuance Vocalizer
The Nuance Vocalizer installation package is available for download from the
Nuance Technical Support Online website (support.nuance.com). After
downloading the appropriate installation package, follow the installation
procedures described next.
Installing Vocalizer on Windows
To install Vocalizer on Windows:
1 Double-click the downloaded executable file to run InstallShield.
a The opening dialogue window offers the option of including these GUI
tools and the JRE version 1.3 in your installation:
Dictionary Editor
Vocalizer Launcher
If you do not plan to use the GUI tools on this machine, you are not
required to install them or the JRE
If you already have the JRE version 1.3 (or higher) installed, do not
include the JRE in your installation
With bash
bash-$] cd $NUANCE
bash-$] . SETUP sh
bash-$] cd $VOCALIZER/bin/sparc-solaris
bash-$] . SETUP sh
With tcsh
tcsh-$] cd $NUANCE
tcsh-$] source SETUP
tcsh-$] cd $VOCALIZER/bin/sparc-solaris
tcsh-$] source SETUP
Nuance recommends that this environment variable be set in a persistent
place, such as in a .profile (bash) or .cshrc (tcsh) file.
Environment variables
On Windows, the Vocalizer installation process modifies your system's PATH
variable to include necessary libraries. Make sure the PATH variable includes
%NUANCE%\bin\win32 and %VOCALIZER%\bin\win32.
On Unix, $VOCALIZER/bin/sparc-solaris is added to the PATH and
LD_LIBRARY_PATH environment variables when you source the SETUP script, as
described in the installation procedure for Solaris (step 8 on page 6).
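The PATH requirement above can be checked with a small program. The following is a minimal sketch in Java, not a tool shipped with Vocalizer: the class and method names are hypothetical, and the case-insensitive comparison assumes Windows path semantics.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: reports which required directories are absent from a
// PATH-style value. Not part of the Vocalizer distribution.
final class PathCheckSketch {

    // Returns the entries of 'required' that do not appear in 'pathValue'.
    // Comparison ignores case, which is an assumption suited to Windows.
    static List<String> missingEntries(String pathValue, String separator,
                                       List<String> required) {
        List<String> present = new ArrayList<>();
        for (String entry : pathValue.split(Pattern.quote(separator))) {
            present.add(entry.trim().toLowerCase());
        }
        List<String> missing = new ArrayList<>();
        for (String dir : required) {
            if (!present.contains(dir.toLowerCase())) {
                missing.add(dir);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String nuance = System.getenv("NUANCE");
        String vocalizer = System.getenv("VOCALIZER");
        String path = System.getenv("PATH");
        if (nuance == null || vocalizer == null || path == null) {
            System.out.println("NUANCE, VOCALIZER, or PATH is not set");
            return;
        }
        // Directories named in the Windows paragraph above.
        List<String> required = Arrays.asList(
                nuance + "\\bin\\win32", vocalizer + "\\bin\\win32");
        System.out.println("Missing from PATH: "
                + missingEntries(path, File.pathSeparator, required));
    }
}
```

An empty list in the output means both directories were found. On Solaris, the same method could be called with $VOCALIZER/bin/sparc-solaris as the required entry.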
Nuance Vocalizer licensing
Nuance provides a license key that controls the number of concurrent channels
that can be opened to the TTS server. You can run this license file using an
instance of the Nuance License Manager (nlm), as described in "Starting a
Vocalizer TTS server" on page 11.
Note: The default two-port license shipped with Vocalizer uses port 8471. When
you buy your Vocalizer license, this port number may change, and you must set
the lm.Addresses parameter accordingly.
Testing your installation
You can test your installation by sending text to Vocalizer to be synthesized
through the Offline Audio Generator, a graphical tool included with your
installation. See Chapter 7 for information on using this tool.
Uninstalling Vocalizer
When you uninstall Vocalizer, all user data, including custom dictionary files,
email configuration files, voice configuration files, and custom filters, is backed
up into a temporary folder, vocalizer_files-<date>-<time>, in either the %TEMP%
or %SYSTEMROOT% directory.
Installing voice packs
The voice packs are available for download from the Nuance Technical Support
Online website (support.nuance.com) and do not require separate licensing.
Vocalizer servers only support one voice at a time. When you start your
Vocalizer server, you can select which voice to use through the -voice
command-line option. See "Selecting voices" on page 12.
Installing voice packs on Windows
To install a voice pack on Windows:
1 Double-click the voice pack executable file to run InstallShield.
2 You will be prompted for the installation path.
By default, InstallShield detects your Vocalizer installation and installs the
voice pack files in the appropriate directory.
Installing voice packs on Solaris
To install a voice pack on Solaris:
1 Log on as the root user.
2 Copy the voice pack file into a temporary directory:
cp VoiceName_pack.tar.gz tmp
3 From that directory, uncompress the file:
gunzip VoiceName_pack.tar.gz
4 Untar the file:
tar xvf VoiceName_pack.tar
5 Run the install script:
./install-VoiceName.sh
The script detects your Vocalizer installation and installs the voice pack files
in the appropriate directory.
Chapter 3. Configuring Vocalizer with the Nuance System
This chapter describes how to start the processes that let you use Nuance
Vocalizer text-to-speech services from a Nuance speech application. You can
start these processes directly from the command line, or via the Nuance Watcher
tool.
Note: This chapter describes the runtime configuration for supporting
text-to-speech. For information on Nuance programming interfaces for
incorporating text-to-speech in your application code, see Chapter 4.
Including Vocalizer in a Nuance configuration
Vocalizer runs as an integrated part of a Nuance configuration, using
functionality provided by the recognition client to issue TTS requests to
Vocalizer and play the synthesized speech back to the caller.
The following diagram shows the general flow of data between the application
and the TTS engine:
Vocalizer runs as a server process similar to other Nuance processes such as the
recognition and compilation servers. You can set up a runtime configuration
including Vocalizer TTS servers in one of two ways:
Getting general runtime information on the Vocalizer TTS server such as the
process ID or the time the process was started
To start a Vocalizer TTS server through the Watcher Telnet interface, you could
send the following command:
> startvp vocalizer watcher.RestartOnFailure=TRUE \
config.LogFileRootDir=d:/data/logs/tts
The Watcher lists Vocalizer TTS server processes as vocalizer and process
parameters.
Note: For more information on what is available through the Watcher, see
the Nuance Application Developer's Guide.
Enabling Vocalizer logs
The Vocalizer TTS server process can generate logs similar to other Nuance
runtime processes. If you are using the Watcher, you should enable logging, as
standard program output is otherwise lost.
To enable logging:
1 Create a folder to store the log files.
2 On the vocalizer command line, set the parameter config.LogFileRootDir
to the location at which you want log files to be written. This enables logging
and sets both the location to which the files are written and the root name
for the files. For example:
> vocalizer config.LogFileRootDir=d:/data/logs/vocalizer/tts
causes Vocalizer to write log files named tts001, tts002, and so on to the
directory d:\data\logs\vocalizer.
Note: To set this parameter in a Watcher startup file, use forward slashes in
your path name instead of backslashes. On a Windows DOS command line,
you can use either format.
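The way a config.LogFileRootDir value determines both the log directory and the root file name can be sketched as follows. This is a minimal illustration in Java, not Vocalizer code: the class and method names are hypothetical, and the zero-padded, three-digit counter is an assumption inferred from the tts001, tts002 example above.

```java
// Hypothetical helper illustrating how a LogFileRootDir value splits into a
// directory and a root file name. Not part of the Vocalizer distribution.
final class LogFileRootDirSketch {

    // Directory portion: everything before the last forward slash.
    static String directoryOf(String rootDir) {
        int slash = rootDir.lastIndexOf('/');
        return slash < 0 ? "." : rootDir.substring(0, slash);
    }

    // Root name portion: everything after the last forward slash.
    static String rootNameOf(String rootDir) {
        return rootDir.substring(rootDir.lastIndexOf('/') + 1);
    }

    // Assumed naming scheme: root name plus a zero-padded, three-digit counter.
    static String logFileName(String rootDir, int index) {
        return String.format("%s%03d", rootNameOf(rootDir), index);
    }

    public static void main(String[] args) {
        String rootDir = "d:/data/logs/vocalizer/tts";
        System.out.println(directoryOf(rootDir));    // d:/data/logs/vocalizer
        System.out.println(logFileName(rootDir, 1)); // tts001
        System.out.println(logFileName(rootDir, 2)); // tts002
    }
}
```

The sketch expects forward slashes, the form that works both on the command line and in a Watcher startup file.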
You can further control the behavior of the logging mechanism by setting these
parameters on the vocalizer command line:
SpeechObjects
NuanceSpeechChannel
RCEngine
SpeechObjects
The Foundation SpeechObjects, Nuance's Java-based speech application
development framework, defines a class that lets you create TTS prompt
objects from a simple text string. This class, TTSPrompt, implements the
Playable interface, which allows it to be appended to a prompt queue and
played directly.
To use the TTSPrompt class, just set the text string you want to be synthesized.
You can then append the prompt to the prompt queue of your
SpeechChannel. For example:
TTSPrompt tts = new TTSPrompt("Hello world");
tts.appendTo(sc);
Note: In this example, sc is the SpeechChannel object being used by your
application.
NuanceSpeechChannel
If you are programming with the NuanceSpeechChannel API directly instead of
using the Foundation SpeechObjects, use the appendTTS method defined in the
CorePromptPlayer interface to create TTS prompts. This method creates the
prompt and appends it to the prompt queue.
The NuanceSpeechChannel class includes a default implementation of the
CorePromptPlayer interface. For example, if sc is your NuanceSpeechChannel
object, the following appends and plays a TTS prompt:
CorePromptPlayer player = sc.getPromptPlayer();
player.appendTTS("Hello world");
player.play(false);
For more information about the NuanceSpeechChannel API, see the online
documentation shipped with the Nuance System.
[Class diagram: TTSPrompt, with methods java.lang.String getText() and
void setText(java.lang.String text), implements the
vcommerce.util.prompt.Playable interface, which declares
void appendTo(PromptPlayer player).]
Note: The TTSPrompt class described in "SpeechObjects" on page 28 does not
work with the CorePromptPlayer class. You must use the SpeechChannel
implementation provided by the SpeechObjects framework to use TTSPrompt.
RCEngine
The RCEngine is a C++ recognition client-level API targeted for use by
developers who are integrating Nuance speech technology with an existing
telephony platform. To play TTS prompts through a Nuance RCEngine, you use
the same RCEngine PlayPrompts function you use to play a prerecorded
prompt. Instead of specifying a set of audio file names, you can specify a TTS
string by using the prefix "-tts_text:". This indicates that the following text is
the transcription for a TTS prompt, and not the name of a prerecorded audio file.
For example:
char const * prompts[2] = {"-tts_text:Hello world", NULL};
unsigned play_id = rce->GetUniqueID();
rce->PlayPrompts(prompts, play_id);
Note that the argument passed to PlayPrompts is actually a null-terminated
array of strings representing prompts to be played. This means you can include
multiple TTS prompt specifications, or mix TTS prompts with prerecorded
prompts. You can also add a pause to the array of prompts by using the prefix
"-pause:" followed by a number of milliseconds. For example:
char const * prompts[4] = {"welcome.wav", "-pause:500",
"-tts_text:Mister Jones", NULL};
unsigned play_id = rce->GetUniqueID();
rce->PlayPrompts(prompts, play_id);
See the Nuance Platform Integrator's Guide shipped with the Nuance System for
more information on the RCEngine API.
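The prefix convention for prompt specifications is plain string formatting, so it can be illustrated independently of the C++ RCEngine API. The following Java sketch is hypothetical: the class and its methods are not part of any Nuance library, and only the "-tts_text:" and "-pause:" prefixes come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder for prompt specification strings. Only the two
// prefixes are taken from the guide; everything else is illustrative.
final class PromptSpecSketch {
    private final List<String> specs = new ArrayList<>();

    // A prerecorded prompt is specified by its file name, with no prefix.
    PromptSpecSketch audioFile(String fileName) {
        specs.add(fileName);
        return this;
    }

    // A TTS prompt is the text to synthesize, prefixed with "-tts_text:".
    PromptSpecSketch ttsText(String text) {
        specs.add("-tts_text:" + text);
        return this;
    }

    // A pause is a number of milliseconds, prefixed with "-pause:".
    PromptSpecSketch pauseMillis(int millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("pause must be non-negative");
        }
        specs.add("-pause:" + millis);
        return this;
    }

    // Returns a copy of the specifications in the order they were added.
    List<String> build() {
        return new ArrayList<>(specs);
    }

    public static void main(String[] args) {
        List<String> prompts = new PromptSpecSketch()
                .audioFile("welcome.wav")
                .pauseMillis(500)
                .ttsText("Mister Jones")
                .build();
        for (String spec : prompts) {
            System.out.println(spec);
        }
    }
}
```

The strings produced here match the entries in the second C++ example; a C++ caller would still need to place them in a null-terminated array before passing them to PlayPrompts.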
Using TTS in a VoiceXML application
You can use Vocalizer with the Nuance Voice Web Server to generate TTS
prompts from VoiceXML applications.
Vocalizer supports the following markup languages:
SSML 1.0, based on the W3C's April 5, 2002 Working Draft for Speech
Synthesis Markup Language Specification
(www.w3.org/TR/2002/WD-speech-synthesis-20020405)
Text input with markup elements must be well-formed in order to be recognized
as XML markup rather than plain text. For example, the following text is valid
SSML:
<speak> Hello <break/> World</speak>
However, this text:
Hello <break/> World
would be synthesized as:
Hello angle bracket break slash angle bracket world.
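The sketch below shows one way an application could prepare input so that it is well-formed before sending it as SSML. EscapeXmlText and WrapInSpeak are hypothetical helper names, and the sketch assumes that Vocalizer's XML handling honors the standard predefined entities, which this guide does not state.

```cpp
#include <cassert>
#include <string>

// Escapes characters that are special in XML character data, so that literal
// text is not mistaken for markup.
std::string EscapeXmlText(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default:  out += c; break;
        }
    }
    return out;
}

// Wraps SSML content in a <speak> root element.
std::string WrapInSpeak(const std::string& ssml_body) {
    // The caller passes only the body; a second root would not be well-formed.
    assert(ssml_body.find("<speak") == std::string::npos);
    return "<speak>" + ssml_body + "</speak>";
}
```

Markup such as <break/> is passed to WrapInSpeak unchanged, while literal user text would go through EscapeXmlText first.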
Note that attributes specified as required in VoiceXML 1.0 and SSML 1.0 are not
enforced by Vocalizer. If a required attribute is missing from a tag, Vocalizer still
attempts to parse and handle that tag in a best-effort manner. Both VoiceXML 1.0
and SSML 1.0 translators treat all numeric attribute values as integers.
VoiceXML 1.0 elements
When using VoiceXML 1.0 elements, make sure Vocalizer is running in vxml
mode:
> vocalizer lm.Addresses=hostname:8471 -text_type vxml
The primary VoiceXML element for specifying a TTS prompt is the <prompt>
element. Any character data within a <prompt></prompt> block that is not
nested in another child element is treated as text to be synthesized by the TTS
server. For example:
<prompt>
Welcome to the Acme Travel Company.
</prompt>
Note: Text sent to Vocalizer must be well-formed XML in order to be recognized
as XML markup rather than plain text.
VoiceXML defines a number of child elements you can nest within a <prompt>
element to control the characteristics of the synthesized text, including those
listed in Table 1.
Table 1: Nested elements
Element   Description
<break>   Inserts a pause.
<emp>     Increases or reduces the emphasis put on specific words.
<div>     Allows you to specify either a sentence or a paragraph.
<audio>   Plays an audio file within a prompt. Supports the following encoding:
          For WAV files:
Table 2: Supported SSML 1.0 elements (continued)
Element    Attribute  Value               Notes
                                          13:30:10 is converted to
                                          one thirty and ten seconds p.m.
                                          12:00:00 is converted to
                                          twelve o'clock noon
                                          00:00:00 is converted to
                                          twelve o'clock midnight
           type       currency            Supports values which can be interpreted as dollars
                                          and cents only. International currency indicators,
                                          such as USD or CAD, are not supported.
           type       name                Indicates contained text is a proper name.
           type       net:email           Indicates contained text is an internet identifier.
                      net:uri
           type       telephone           Supports North American style phone numbers of 7,
                                          10, and 11 digits. All punctuation is stripped and
                                          ignored.
<say-as>   type       voicexml:date       Correspond to the VoiceXML 2.0 built-in grammar
                      voicexml:time       types.
                      voicexml:boolean    For details on the required input formats for these
                      voicexml:phone      types, see the VoiceXML 2.0 specification at
                      voicexml:currency   www.w3.org/TR/2001/WD-voicexml20-20011023/#dml2.3.1.1
                      voicexml:digits
                      voicexml:number
<prosody>                                 Specifies prosodic information for the enclosed text.
                                          For all <prosody> attribute tags, if user-specified
                                          values are above or below the supported range of
                                          values, the maximum or minimum value is
                                          substituted for the user-specified value.
pitch Supported values:
Descriptive:
high (190)
medium (155)
low (140)
Descriptive:
high (100)
medium (50)
low (10)
Descriptive:
fast (80)
medium (100)
slow (130)
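The range substitution described for <prosody> attributes amounts to clamping a value. The sketch below illustrates that rule only; ClampProsodyValue is a hypothetical name, and the caller must supply the limits because this guide does not list the supported minimum and maximum for each attribute.

```cpp
#include <algorithm>
#include <cassert>

// Returns `requested` limited to the range [min_supported, max_supported].
// Values are integers, matching the note that numeric attribute values are
// treated as integers.
int ClampProsodyValue(int requested, int min_supported, int max_supported) {
    assert(min_supported <= max_supported);  // the range must be valid
    return std::max(min_supported, std::min(requested, max_supported));
}
```

For example, with made-up limits of 50 and 400, a requested value of 500 would be replaced by 400 and a requested value of 10 by 50.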
For AU files:
utf-8
utf-16
iso-8859-1
us-ascii
When using the <phoneme> element in SSML, if you select IPA as the alphabet,
the IPA characters can be represented by multi-byte values only, and the entire
document passed to Vocalizer must therefore be encoded using either UTF-8 or
UTF-16.
URI formats for the SSML <audio> element
Vocalizer handles a specific set of http: and file: URI formats (per RFCs 2616 and
2396) for the VoiceXML <audio> element. For example, this table shows illegal
formats:
URI                  Error
C:/foo.wav           Invalid format per RFC 2396
file://C:/foo.wav    Uses only two forward slashes (/) instead of three
                     between the colon (:) and the C.
                     The correct form is: file:///C:/foo.wav
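An application could catch the two-slash mistake before handing a URI to Vocalizer. The following is a minimal sketch under that assumption; HasDriveLetterInHostPosition and AddMissingSlash are hypothetical helpers, not Vocalizer functions, and they check only this one error.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True for URIs such as file://C:/foo.wav, where a drive letter directly
// follows the two slashes.
bool HasDriveLetterInHostPosition(const std::string& uri) {
    const std::string prefix = "file://";
    if (uri.compare(0, prefix.size(), prefix) != 0) return false;
    if (uri.size() < prefix.size() + 2) return false;
    const unsigned char first = static_cast<unsigned char>(uri[prefix.size()]);
    return std::isalpha(first) != 0 && uri[prefix.size() + 1] == ':';
}

// Rewrites file://C:/... as file:///C:/...
std::string AddMissingSlash(const std::string& uri) {
    assert(HasDriveLetterInHostPosition(uri));  // only valid for the faulty form
    return "file:///" + uri.substr(7);          // 7 == length of "file://"
}
```

The check deliberately leaves forms such as file://localhost/C:/foo/bar.wav and file:///C:/foo/bar.wav alone, because the character after the two slashes is not a drive letter followed by a colon.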
Table 3 and Table 4 demonstrate the correct formats for absolute and relative
URIs, and show how these URI formats are handled by Vocalizer.
Some notes:
If the path portion of the URI is relative, the file is searched for in the
%VOCALIZER%/data/audioFiles/ directory
Table 3: Absolute URIs
URI Reference
file://localhost/C:/foo/bar.wav C:\foo\bar.wav
file://localhost/foo/bar.wav /foo/bar.wav (Unix)
file:///C:/foo/bar.wav C:\foo\bar.wav
file:///foo/bar.wav /foo/bar.wav (Unix)
file:/C:/foo/bar.wav C:\foo\bar.wav
file:/foo/bar.wav /foo/bar.wav (Unix)
http://localhost/bar.wav
http://audio-server/audiofiles/bar.wav
http://audio-server:8080/audiofiles/bar.wav
Table 4: Relative URIs (file: format only)
URI Reference
//localhost/C:/foo/bar.wav C:\foo\bar.wav
//localhost/foo/bar.wav /foo/bar.wav (Unix)
///C:/foo/bar.wav C:\foo\bar.wav
///foo/bar.wav /foo/bar.wav (Unix)
C:/foo/bar.wav / C:\foo\bar.wav
/foo/bar.wav /foo/bar.wav (Unix)
foo/bar.wav %VOCALIZER%\data\audioFiles\foo\bar.wav
./foo/bar.wav %VOCALIZER%\data\audioFiles\foo\bar.wav
../foo/bar.wav %VOCALIZER%\foo\bar.wav
bar.wav %VOCALIZER%\data\audioFiles\bar.wav
./bar.wav %VOCALIZER%\data\audioFiles\bar.wav
../bar.wav %VOCALIZER%\bar.wav
Chapter 5
Starting Vocalizer through
the Launcher
The Vocalizer Launcher is a graphical tool that allows you to create, edit, and save
start-up configurations for running Nuance Vocalizer.
Using the Vocalizer Launcher
From the Vocalizer Launcher interface, you can configure and start vocalizer,
along with the nlm process. You can also start any of the other GUI tools
provided with Vocalizer.
To use the Launcher:
1 From the Windows Start menu, go to Programs > Nuance > Nuance Vocalizer
3.0 > Executables > Vocalizer Launcher.
The Configurations window lists the voice packs and configurations
available on your system.
2 After selecting a configuration, choose either:
Start: Opens the Status window and starts vocalizer with the default
options for the selected configuration.
Vocalizer configuration
Vocalizer port: Allows you to change the TTS port setting for audio
generation
License manager
Resource manager
Voice configuration
Persona: Lists the voice packs available from which you can make a
selection
Filter: Allows you to select the text filter (or filters) you want to use and
specify the order in which they are applied
On Windows, from the Programs group in your Windows Start menu, select
Nuance > Nuance Vocalizer 3.0 > Executables > Email Reader
Configuration Tool.
On Solaris, enter:
> EmailReaderConfigTool.sh
Chapter 6 Preprocessing text input
Passing email messages to Vocalizer
49
The Text In box displays a sample RFC 822 email. The Text Out box allows you
to see the results of configuration choices, displaying how the email messages
will be read. Use the tabs to choose which field you want to edit. You can see
your updates in the Text Out box as you make your changes.
Specifying the fields to read
Under each tab is a checkbox labeled Read field_name field. Check the Enabled
box to specify whether or not you want information from that field included in
the synthesized output. For example, under the Date tab in the previous
graphic, Read Date field is enabled.
Specifying order
The Field Read Order list box lists all the email fields in the order they will be
read. To change the order, select the field you want to move, then click the Up or
Down button.
Specifying text output
Each field of the email can be prefaced or followed by user-defined comments.
To edit this text, select the tab for the particular field, either Date, From, To,
Copied, Subject, or Body. Enter your comments in the appropriate text box.
The To and Copied fields often include several names. You can decide if you
want Vocalizer to tell you the total number of names listed, and if you want
Vocalizer to read each name.
To avoid having a long list of names synthesized, you can specify a maximum
number, X, so that Vocalizer only reads the list if there are fewer than X names.
For example, if you set the If there are less than field to 5, the application reads
all the names that are on the list only if there are fewer than five. If there are five
or more names, nothing is read.
If you want Vocalizer to tell you the total number of names listed, specify a
minimum number, Y, so that Vocalizer only tells you the total when there are
more than Y names. For example, if you set the If there are more than field to
1, the application tells you the total number of names only if there is more than
one name. If there is only one or zero names, nothing is said.
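The two thresholds can be summarized as a pair of comparisons. The sketch below restates the rules above; the struct and function names are hypothetical and do not correspond to any Vocalizer API or configuration file format.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical container for the two settings described above.
struct NameListSettings {
    std::size_t read_names_if_fewer_than;     // "If there are less than" field (X)
    std::size_t announce_total_if_more_than;  // "If there are more than" field (Y)
};

// The individual names are read only when the list has fewer than X entries.
bool ShouldReadNames(std::size_t name_count, const NameListSettings& settings) {
    return name_count < settings.read_names_if_fewer_than;
}

// The total is announced only when the list has more than Y entries.
bool ShouldAnnounceTotal(std::size_t name_count, const NameListSettings& settings) {
    return name_count > settings.announce_total_if_more_than;
}
```

With X set to 5 and Y set to 1, a list of four names is both counted and read, while a list of five names is counted but not read.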
Address and signature handling
From the Miscellaneous tab, you can enable the settings for normalizing email
addresses and signature handling.
Normalizing addresses uses white space to replace underscore characters (_)
and to separate digits and characters within a string. For example:
Address input Normalized result
[email protected] a [email protected]
[email protected] ted [email protected]
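The normalization rule can be approximated in a few lines. This is a sketch of the described behavior, not Vocalizer's implementation: NormalizeForSpeech is a hypothetical name, and the sketch assumes that "characters" means ASCII letters and that the rule applies uniformly across the string.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Replaces underscores with spaces and inserts a space wherever a letter is
// directly followed by a digit or a digit by a letter.
std::string NormalizeForSpeech(const std::string& input) {
    std::string out;
    out.reserve(input.size() + 8);
    for (char ch : input) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c == '_') {
            out += ' ';
            continue;
        }
        if (!out.empty()) {
            const unsigned char prev = static_cast<unsigned char>(out.back());
            const bool letter_then_digit = std::isalpha(prev) && std::isdigit(c);
            const bool digit_then_letter = std::isdigit(prev) && std::isalpha(c);
            if (letter_then_digit || digit_then_letter) {
                out += ' ';
            }
        }
        out += ch;
    }
    return out;
}
```

Under these assumptions, an input such as ted_smith42 becomes ted smith 42.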
Some email signatures include a separator line. Check the Remove non
alpha/numeric characters? box to remove lines that do not contain alphabetic or
numeric characters.
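The effect of this setting can be sketched as a line filter. RemoveNonAlphanumericLines is a hypothetical function written for illustration. Note that this sketch also drops blank lines, because they contain no alphabetic or numeric characters; whether the tool does the same is not stated in this guide.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Keeps only the lines that contain at least one letter or digit.
// Each kept line is written back followed by a newline.
std::string RemoveNonAlphanumericLines(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    std::string out;
    while (std::getline(in, line)) {
        const bool has_alnum = std::any_of(
            line.begin(), line.end(),
            [](unsigned char c) { return std::isalnum(c) != 0; });
        if (has_alnum) {
            out += line;
            out += '\n';
        }
    }
    return out;
}
```

A signature block consisting of "Regards,", a row of hyphens, and "Ted" would keep only the first and last lines.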
Distributing changes
You can edit the email configuration files for all your Vocalizer servers on a
single development machine. When you save changes in the Email Reader
Configuration Tool, the configuration file is compiled and put in the
%VOCALIZER%\data\locale\email directory. You can distribute this file to
your other TTS server machines by copying the contents of the
%VOCALIZER%\data\locale\email directory to the same path on the
machine(s) where you want to distribute the changes.
Creating custom filters for text replacement
The Text Replacement Filter Editor allows you to create custom filters to modify
how specific text input is rendered by the TTS engine.
Text replacement filters allow you to specify a set of related translations,
mapping text input to a replacement value. For example, the emoticon filter
includes several replacements, such as:
Input   Replacement text
:-)     smiling face
;-)     winking face
:-(     sad face
Each filter is stored in a single file. There is no limit to the number of filters you
can create and use.
To enable this feature, add the -filter option on the vocalizer command line:
lm.Addresses=localhost:8471 -filter your_custom_filter
Applying filters
You can specify multiple filters and choose the order in which they are applied.
Enter the filter names using a comma-separated list with no spaces. For
example, if you have a filter named medical, which expands medical acronyms,
you can include it with the emoticon filter like this:
lm.Addresses=localhost:8471 -filter emoticon,medical
Filters are applied in the order in which they appear. The order specified can
affect the output if replacement text generated by one filter matches a transform
for a following filter. For example, if you input this text:
My computer is broken. :-(
the emoticon filter outputs:
My computer is broken. sad face
If your medical filter recognizes sad as an acronym to be expanded, then it
would output:
My computer is broken. Seasonal Affective Disorder face
Reversing the order of the filters yields the following output:
My computer is broken. sad face
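The ordering effect can be reproduced with a small model of sequential replacement. This is an illustrative sketch, not Vocalizer's filter engine: the types and function names are hypothetical, and matching here is a case-sensitive substring search, whereas the real filters may match differently.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One filter: an ordered list of (input text, replacement text) pairs.
using ReplacementFilter = std::vector<std::pair<std::string, std::string>>;

// Replaces every occurrence of each input string with its replacement.
std::string ApplyFilter(std::string text, const ReplacementFilter& filter) {
    for (const auto& rule : filter) {
        assert(!rule.first.empty());  // an empty search string would loop forever
        std::string::size_type pos = 0;
        while ((pos = text.find(rule.first, pos)) != std::string::npos) {
            text.replace(pos, rule.first.size(), rule.second);
            pos += rule.second.size();  // continue after the inserted text
        }
    }
    return text;
}

// Applies the filters one after another, in the order given.
std::string ApplyFilters(std::string text,
                         const std::vector<ReplacementFilter>& filters) {
    for (const ReplacementFilter& filter : filters) {
        text = ApplyFilter(std::move(text), filter);
    }
    return text;
}
```

With an emoticon filter that maps ":-(" to "sad face" and a medical filter that maps "sad" to "Seasonal Affective Disorder", applying emoticon first expands the acronym in the result, while applying medical first leaves "sad face" untouched.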
Using the Text Replacement Filter Editor
To use the Text Replacement Filter Editor:
1 Start the tool:
On Solaris, enter:
> ReplacementFilterEditor.sh
2 In the Replace text box, enter the text you want recognized. You can also
choose to:
Match case
By default, Vocalizer ignores the case for text entered. Selecting Match
Case allows a case-sensitive search for the specified string.
3 In the With text box, enter the text you want Vocalizer to output.
4 Click Add Replacement.
When you're done, save the list of replacements as a file in the
%VOCALIZER%\data\replace-filter directory. You can use the Text Replacement
Filter Editor to open and edit any of your text replacement files.
Distributing changes
You can create the filter files for all your Vocalizer servers on a single
development machine. After saving your changes in the Text Replacement Filter
Editor, the filter files are stored in the %VOCALIZER%\data\replace-filter
directory. You can distribute these files to your other TTS server machines by
copying the contents of the %VOCALIZER%\data\replace-filter directory to the
same path on the machine(s) where you want to distribute the changes.
Chapter 7
Generating audio files from
text input
Vocalizer can generate audio streams directly from user input or from text files.
The resulting audio stream can be saved to files for playback. This chapter
explains how to do this using the Offline Audio Generator.
Using the Offline Audio Generator
The Offline Audio Generator is a GUI tool which accepts text input either
directly from users or from text files.
The text input is sent to Vocalizer, which returns an audio stream. The audio is
played back to the user and can be written to a file in one of these formats:
Headerless mulaw
Headerless Linear
On Solaris, enter:
> OfflineAudioGenerator.sh
The GUI will appear.
3 Enter your IP address and the TTS port number. (The default TTS port is
32323).
4 Specify the text to be rendered by selecting either:
Use text from file: Select a text file through the Browse button
5 Choose whether or not to save audio output to a file. If you select Save
output to file:
a Specify the file to which audio output should be written, by either
entering the filename in the output file field or selecting an existing audio
file to overwrite.
b Specify the audio format for saving output from the drop-down menu.
By default, audio files are generated as .wav files. Choose the Sphere-headered
format for use with Nuance applications, or a headerless format for use
with other audio editors.
6 Click Play.
You can stop playback at any time by clicking the Stop button. The Exit
button also stops playback before quitting.
Validating multi-server configurations
If you want to test the Vocalizer installation on other servers in your
configuration, enter the IP address of the appropriate machine. The text is sent
to the Vocalizer server on that machine.
Chapter 8
Customizing your
application's dictionary
Vocalizer includes the Dictionary Editor, a graphical tool that allows you to
dynamically modify the dictionary files used by the TTS engine.
This chapter describes how to use the Dictionary Editor to add words and
acronyms to your dictionary, or modify the pronunciation of existing entries. It
also explains how to distribute changes made on one machine to other Vocalizer
servers.
Features of the Dictionary Editor
The Dictionary Editor is a Java application built on Java 2. The Java Runtime
Environment (JRE) 1.3 installed with Vocalizer allows you to run this utility. See
Third-party software requirements on page 4.
The Dictionary Editor's main menu includes File and Options:
Modify the port settings for Vocalizer audio generation. The default TTS
port is 32323, and the default port for phonetic generation is 22552.
U.K. English
Word Editor: Lets you view the current dictionary entries, add new words,
modify the way existing entries are pronounced, or remove entries
Say As (in the Acronym Editor only): Displays the text expansion for
acronyms
On Solaris, enter:
> DictionaryEditor.sh
Note: Changes are not saved automatically when you close the editing tool. After
following the procedures for adding or modifying dictionary entries, make sure
to save your changes before closing the Dictionary Editor.
Word Pronunciation encoding
emoticon i ' m o - t I . k A n
laden ' l A - d * n
Enabling logging
You can edit the Dictionary.bat file to enable debug information to be sent to a log
file (dictionaryEditor.log) in the %VOCALIZER%\bin\win32 directory:
1 From the %VOCALIZER%\bin\win32 directory, open Dictionary.bat as a text
file.
2 Before -D"INSTALL_PATH", add the parameter setting -D"debug=true". For
example:
"C:\Program Files\JavaSoft\JRE\1.3\bin\java" -jar -cp
"C:\Program Files\Nuance\Vocalizer3.0\java"
-D"debug=true"
-D"INSTALL_PATH=C:\Program Files\Nuance\Vocalizer3.0"
"C:\Program Files\Nuance\Vocalizer3.0\java\
DictionaryEditor.jar"
Adding new entries
To add a new word or acronym to the dictionary:
1 In Word Options, click New to open the Word Editor or Acronym Editor.
2 In the Spelling field, enter the new word or acronym.
Make sure to enter the correct spelling for text transcriptions sent to the TTS
engine; dictionary entries are not case-sensitive.
If you are entering a new word, the User Pronunciation field will
automatically display a phonetic translation based on the text in the Spelling
field.
3 From the Part of Speech drop-down menu, make the appropriate selection,
for example, noun, proper or adverb.
4 If you are entering a new acronym, in the Say As field, enter the expansion.
For example, with the acronym TTS you would enter text-to-speech.
The User Pronunciation field automatically displays a phonetic translation
based on the text in the Say As field.
5 In the User Pronunciation field, build or modify the phonetic pronunciation for
your word:
a This field is filled automatically, but you can modify the phonetic
translation using the Phoneme Set. See Phoneme set on page 63.
b Add Primary or Secondary Stress symbols before the syllable to be
accented; see Inserting stress markers on page 63.
6 Press the Play button next to User Pronunciation or use the Scratchpad to
test your phonetic translation; see Testing your entries in Scratchpad on
page 67.
7 When you are happy with the pronunciation, click OK to add the word to the
dictionary.
8 Save your changes. From the File menu, select either:
From the Acronym Editor, you can modify acronym expansion and
pronunciation
2 In Word Options, click Edit to open up the Word Editor (or Acronym Editor).
3 Modify the appropriate Word Details, as described in Adding new entries
on page 65.
4 Click OK when you are done.
5 Save your changes. From the File menu, select either:
Select User to play your text string using the modified pronunciation; if
you've changed the Say As field for an acronym, Scratchpad plays the
modified expansion.
Select System to play your text string using the system's version of that
entry's pronunciation.
Deleting entries
To delete from the Dictionary Entries:
1 In Dictionary Entries, select the appropriate word or acronym.
2 In Word Options, select Delete.
3 Save your changes. From the File menu, select either:
Comma (,)
Semicolon (;)
Colon (:)
End of phrase
Either a:
Period (.)
Ellipsis (...)
Followed by a single space
End of sentence
Period (.), not immediately following an abbreviation or
a single character, followed by a single space
See Hyphens and dashes on page 71
End of sentence
Either a:
Period (.)
Ellipsis (...)
Followed by a:
Single quote (')
Double quote (")
Closing parenthesis )
and either a single space, a new line, or the end of input
End of sentence
Period (.) preceded by a non-alphabetic character End of sentence
Two or more consecutive new lines End of paragraph
End of a sentence followed by a new line and either
another new line, a tab, or a space
End of paragraph
End of input End of paragraph
Chapter 9 Techniques for enhancing audio output
Phrasing and punctuation
71
Note: The last character in a sentence does not necessarily have to be a sentence
terminator. In some cases, sentences are enclosed inside quoted or
parenthesized sections, for example:
Then he said: "Sorry, I can't do that."
Periods
Periods can be used as sentence terminators, abbreviation markers, decimal
points, IP address dividers, and date dividers. Vocalizer determines the correct
interpretation from the context:
A closing single quote if the character is the last character of a word whose last
letter is not s
An apostrophe if the character is the last character of a word whose last letter
is s and a quoted section has not been opened previously
A string of letters including one or more uppercase letters (not including the
first letter)
Euro symbol (€)
Text encoding
Abbreviations
Capitalization
Numbers
Currency
Telephone numbers
Addresses
Dates
Times
Periods
A trait d'union if located between letters; each side is treated as part
of the same word if that word is in the dictionary.
A string of letters including one or more uppercase letters (not including the
first letter)
Cent sign (¢)
Euro symbol (€)
Yen symbol (¥)
Vocalizer supports the official French convention of currency symbols following
the digits and will also accept currency symbols preceding the digits, as in
English, which allows an abbreviated suffix for millions or billions to be
added.
Input text        Output
50$               cinquante dollars
12$ CAD           douze dollars canadien
12 $USD           douze dollars américain
$25.50            vingt-cinq dollars et cinquante cennes
10,40 $ CAD       dix dollars canadiens et quarante cennes
$0,99             quatre-vingt-dix-neuf cennes
65,66USD          soixante-cinq virgule soixante-six cennes américain
$50M              cinquante millions de dollars
10.20             dix livres et vingt penny
12,34             douze virgule trente-quatre euros
30,75FF           trente francs et soixante-quinze centimes
12,34 DM          douze virgule trente-quatre deutsche mark
100               cent yen
Telephone numbers
Vocalizer recognizes local and national telephone number formats used in North
America, including area codes.
The digits are grouped as they are presented, in order that pauses may be
introduced:
Input text        Output
256 9866          deux cinq six (pause) neuf huit six six
514-732-4619      cinq un quatre (pause) sept trois deux (pause)
                  quatre six un neuf
(819) 623-4455    huit un neuf (pause) six deux trois (pause)
                  quatre quatre cinq cinq
1 800 256 9866    un (pause) huit cents (pause) deux cinq six (pause)
                  neuf huit six six
1-514-732-4619    un (pause) cinq un quatre (pause) sept trois deux
                  (pause) quatre six un neuf
Addresses
Vocalizer recognizes zip and postal codes and address abbreviations, such as
ave or st. Addresses are handled as follows:
Input text            Output
Alexandre LeGrand     Alexandre LeGrand cent et un troisieme avenue (pause)
101 3e ave.           Montréal (pause) Québec h cinq t deux v deux
Mtl, Qc
H5T 2v2
111 Duke, Mtl, Qc,    cent onze Duke (pause) Montréal (pause) Québec (pause)
Canada, H8T2V7        Canada (pause) h huit t deux v sept
CA 22333              Californie deux deux trois trois trois
Appendix A Text processing for Canadian French
Dates
99
Input text        Output
CO. 44666-1234    Colorado quatre quatre six six six trait d'union un deux trois
                  quatre
Note: Pauses are added when there is punctuation between address fields.
Dates
Vocalizer recognizes dates specified in both text (9 décembre, 2003) and digit
(2003/12/09) formats, and reads them accordingly. The digit format can be
delimited with forward slashes (/), periods (.), or hyphens (-). Normally,
the digit order is year first, followed by month then day, but the reverse order is
also supported (day/month/year) when no ambiguity is possible with the main
convention. Standard abbreviations for months are supported.
Input text     Output
1999/1/5       le cinq janvier mille neuf cent quatre-vingt-dix-neuf
13-12-2003     le treize décembre deux mille trois
99/03/04       le quatre mars mille neuf cent quatre-vingt-dix-neuf
01/02/03       le trois février deux mille un
02/4/5         le cinq avril deux mille deux
6 jan, 2002    le six janvier deux mille deux
4 fev          le quatre février
2003-04-02     le quatre février deux mille trois
14/14/2000     un quatre barre oblique un quatre barre oblique deux zéro
               zéro zéro
               Note: This example cannot be a valid date so it is
               interpreted as an account number. See Account numbers
               on page 95.
Times
Vocalizer recognizes time-of-day phrases using "h" and "min" to delimit the
hour and minute fields. It also recognizes time phrases that are succeeded by the
abbreviations "am", "pm", "AM", or "PM" with a space between the number
and the abbreviation.
Input text      Output
5h30min30sec    cinq heures trente minutes et trente secondes
13h45           treize heures quarante-cinq minutes
4 h 20 min      quatre heures vingt minutes
20h             vingt heures
20h00           vingt heures
0h30            zéro heures trente minutes
6h pm           six heures p m
3h 30min am     trois heures trente minutes a m
9h AM           neuf heures a m
Email and web addresses
Vocalizer reads email and web addresses naturally, including symbols such as
"@" (expanded to "a commercial") and "." (expanded to "point"). Vocalizer
determines when address segments can be read as words or need to be spelled
out. For example:
Input text                   Output
[email protected]            courrier a commercial nuance point com
[email protected]            a b c a commercial nuance point com
www.nuance.com               triple w point nuance point com
http://www.nuance.com        h t t p deux points barre oblique barre oblique triple w
                             point nuance point com
http://support.nuance.com/developers/index.html
                             h t t p deux points barre oblique barre oblique support
                             point nuance point com barre oblique developers barre
                             oblique index point h t m l
Appendix B
Text processing for
American Spanish
This appendix outlines how Nuance Vocalizer processes text input for
American Spanish.
Topics include:
Text encoding
Regionalism
Abbreviations
Capitalization
Numbers
Currency
Telephone numbers
Dates
Times
Addresses
Periods
A string of letters including one or more uppercase letters (not including the
first letter)
Cent sign (¢)