NLS


Ascential DataStage

NLS Guide

Version 7.5 June 2004 Part No. 00D-0007DS75

Published by Ascential Software Corporation. © 2004 Ascential Software Corporation. All rights reserved. Ascential, DataStage, QualityStage, AuditStage, ProfileStage, and MetaStage are trademarks of Ascential Software Corporation or its affiliates and may be registered in the United States or other jurisdictions. Windows is a trademark of Microsoft Corporation. Unix is a registered trademark of The Open Group. Adobe and Acrobat are registered trademarks of Adobe Systems Incorporated. Other marks are the property of the owners of those marks. This product may contain or utilize third party components subject to the user documentation previously provided by Ascential Software Corporation or contained herein.

Documentation Team: Mandy deBelin

Table of Contents

How to Use this Guide
    Organization of This Manual
    Documentation Conventions

Chapter 1. What Is NLS?
    NLS Mode
    How NLS Mode Works

Chapter 2. Server Jobs and NLS
    Maps and Locales in DataStage Jobs
    Using Maps in Server Jobs
    Using Locales in Server Jobs
    Creating New Maps
    How Locales Work
    Creating New Locales

Chapter 3. Parallel Jobs and NLS
    Maps and Locales in DataStage Parallel Jobs
    Using Maps in Parallel Jobs
    Using Locales in Parallel Jobs
    Defining Date/Time and Number Formats
    Creating New Maps
    Overriding Collate Conventions

Appendix A. NLS and Server Jobs - Supplementary Information
    The NLS Administration Tool
    The NLS Database

Appendix B. Maps and Locales Supplied with DataStage
    Server Job Character Set Maps
    Server Job Locales
    Parallel Job Character Set Maps
    Parallel Job Locales

How to Use this Guide


This guide is for users, programmers, and administrators who are familiar with DataStage and want to use and manage its National Language Support (NLS) facilities.

To find particular topics you can:
• Use the Guide's contents list (at the beginning of the Guide).
• Use the Guide's index (at the end of the Guide).
• Use the Adobe Acrobat Reader bookmarks.
• Use the Adobe Acrobat Reader search facility (select Edit ➤ Search).

The guide contains links both to other topics within the guide, and to other guides in the DataStage manual set. The links are shown in blue. Note that, if you follow a link to another manual, you will jump to that manual and lose your place in this manual. Such links are shown in italics.

Organization of This Manual


This manual contains the following:
• Chapter 1 gives an overview of how NLS works, and describes the NLS features that are included in DataStage.
• Chapter 2 gives details about NLS in DataStage server jobs.
• Chapter 3 gives details about NLS in DataStage parallel jobs.
• Appendix A contains reference information about NLS and server jobs.
• Appendix B lists the maps and locales that are supplied with DataStage.
• The Glossary defines the NLS terms that are used in this manual.


Documentation Conventions
This manual uses the following conventions:

Convention      Usage

Bold            In syntax, bold indicates commands, function names, and options. In text, bold indicates keys to press, function names, menu selections, and MS-DOS commands.

UPPERCASE       In syntax, uppercase indicates DataStage commands, keywords, and options; BASIC statements and functions; and SQL statements and keywords. In text, uppercase also indicates DataStage identifiers such as filenames, account names, schema names, and Windows NT filenames and pathnames.

Italic          In syntax, italic indicates information that you supply. In text, italic also indicates UNIX commands and options, filenames, and pathnames.

Courier         Courier indicates examples of source code and system output.

Courier Bold    In examples, courier bold indicates characters that the user types or keys the user presses (for example, <Return>).

[ ]             Brackets enclose optional items. Do not type the brackets unless indicated.

{ }             Braces enclose nonoptional items from which you must select at least one. Do not type the braces.

itemA | itemB   A vertical bar separating items indicates that you can choose only one item. Do not type the vertical bar.

...             Three periods indicate that more of the same type of item can optionally follow.

➤               A right arrow between menu options indicates you should choose each option in sequence. For example, Choose File ➤ Exit means you should choose File from the menu bar, then choose Exit from the File pull-down menu.

I               Item mark. For example, the item mark ( I ) in the following string delimits elements 1 and 2, and elements 3 and 4:
                1I2F3I4V5

F               Field mark. For example, the field mark ( F ) in the following string delimits elements FLD1 and VAL1:
                FLD1FVAL1VSUBV1SSUBV2

V               Value mark. For example, the value mark ( V ) in the following string delimits elements VAL1 and SUBV1:
                FLD1FVAL1VSUBV1SSUBV2

S               Subvalue mark. For example, the subvalue mark ( S ) in the following string delimits elements SUBV1 and SUBV2:
                FLD1FVAL1VSUBV1SSUBV2

T               Text mark. For example, the text mark ( T ) in the following string delimits elements 4 and 5:
                1F2S3V4T5

The following conventions are also used:
• Syntax definitions and examples are indented for ease in reading.
• All punctuation marks included in the syntax (for example, commas, parentheses, or quotation marks) are required unless otherwise indicated.
• Syntax lines that do not fit on one line in this manual are continued on subsequent lines. The continuation lines are indented. When entering syntax, type the entire syntax entry, including the continuation lines, on the same input line.

DataStage Documentation
DataStage documentation includes the following:
• DataStage Install and Upgrade Guide: This guide contains instructions for installing DataStage on Windows and UNIX platforms, and for upgrading existing installations of DataStage.
• DataStage Administrator Guide: This guide describes DataStage setup, routine housekeeping, and administration.
• DataStage Designer Guide: This guide describes the DataStage Designer, and gives a general description of how to create, design, and develop a DataStage application.
• DataStage Manager Guide: This guide describes the DataStage Manager and describes how to use and maintain the DataStage Repository.
• DataStage Server: Server Job Developer's Guide: This guide describes the tools that are used in building a server job, and it supplies programmer's reference information.
• DataStage Enterprise Edition: Parallel Job Developer's Guide: This guide describes the tools that are used in building a parallel job, and it supplies programmer's reference information.
• DataStage Enterprise Edition: Parallel Job Advanced Developer's Guide: This guide gives more specialized information about parallel job design.
• DataStage Enterprise MVS Edition: Ascential DataStage Mainframe Job Developer's Guide: This guide describes the tools that are used in building a mainframe job, and it supplies programmer's reference information.
• DataStage Director Guide: This guide describes the DataStage Director and how to validate, schedule, run, and monitor DataStage server jobs.

These guides are also available online in PDF format. You can read them using the Adobe Acrobat Reader supplied with DataStage. See Install and Upgrade Guide for details on installing the manuals and the Adobe Acrobat Reader.

You can use the Acrobat search facilities to search the whole DataStage document set. To use this feature, select Edit ➤ Search, then choose the All PDF documents in option and specify the DataStage docs directory (by default this is C:\Program Files\Ascential\DataStage\Docs).

Extensive online help is also supplied. This is particularly useful when you have become familiar with DataStage, and need to look up specific information.


Chapter 1. What Is NLS?
NLS Mode
When you install DataStage with NLS mode enabled, you can use DataStage in various languages and countries. You can do the following:
• Use DataStage in various languages. This includes languages that use multi-byte characters, such as Japanese.
• Read and write data in multi-byte character sets and process the data within DataStage. This is regardless of the language of DataStage itself. For example, you can process Japanese data in an English version of DataStage, or process English data in a Japanese version of DataStage.
• Use locales to change things like collating sequence, monetary conventions, and date/time format from outside a job design.

You must enable NLS when you install DataStage. If you choose to install a non-English language version of DataStage, NLS is enabled automatically. If you choose to install an English version of DataStage, you specify separately whether NLS is enabled or not.

How NLS Mode Works


NLS mode works by using two types of character set:
• The NLS internal character set
• External character sets that cover the world's different languages

In NLS mode, DataStage maps between the two character sets when needed.

The mechanism for handling NLS differs for parallel and server jobs. They each use a different internal character set, so each uses a different set of maps for converting data. Note that only certain types of string (that is, character) data need mapping; purely numeric data types never require it. Parallel and server jobs also use different locales.

Internal Character Sets


The internal character set can represent at least 64,000 characters. Each character in the internal character set has a unique code point, a number that by convention is written in hexadecimal. You can use this number to represent the character in programs. This allows DataStage to store text in many languages.

The NLS internal character sets conform to the Unicode standard. The Unicode Consortium specifies a number of ways to represent code points, called Unicode Transformation Formats (UTF). Server jobs use UTF-8; parallel jobs use UTF-16. Because the two types of job use different internal character sets, a different set of maps is provided for conversion to and from each one (although equivalents to commonly used server job maps are provided for parallel jobs).

For more information about Unicode, see the Unicode Consortium's World Wide Web page at http://www.unicode.org.
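As a rough illustration of the difference between a code point and its UTF representations, the Python sketch below prints the code point of a few characters together with their UTF-8 and UTF-16 byte sequences. It uses only Python's built-in string handling and is not DataStage code; the function name describe and the sample characters are chosen purely for illustration.

```python
def describe(ch: str) -> dict:
    """Return the code point and the UTF-8 / UTF-16 byte sequences (as hex) of one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8": ch.encode("utf-8").hex(),
        # Big-endian UTF-16 without a byte order mark, to keep the output readable.
        "utf16": ch.encode("utf-16-be").hex(),
    }


for ch in ("A", "£", "日"):
    print(ch, describe(ch))
```

The same code point yields byte sequences of different lengths in the two formats, which is one reason a map built for one internal character set cannot simply be reused for the other.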

Mapping
When you need to transform or transfer data, NLS maps the data to or from the external character set you want to use. NLS includes map tables for many of the character sets used in the world (see the list in Appendix B).

You can specify mapping at different levels within DataStage:
• A project-wide default. In the DataStage Administrator client you specify a default map for all server jobs in a project, and a default map for all parallel jobs in a project.
• A job default. In the DataStage Designer, you can specify a default map used by a particular job that overrides the project default.
• A stage map. Certain parallel and server stages allow you to specify that they use a particular map. This overrides both the project default and the job default.
• A column map. Certain parallel and server stages support per-column mapping. This allows you to specify a separate map for particular data columns. This overrides the project default, job default, and stage maps.

Note: If your files contain only ASCII 7-bit characters, they need not be mapped.
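The override order described above (column over stage, stage over job, job over project) can be sketched in a few lines of Python. This is only a model of the precedence rule, not DataStage's implementation; the function effective_map and the map names used in the example are hypothetical placeholders.

```python
from typing import Optional


def effective_map(project: str,
                  job: Optional[str] = None,
                  stage: Optional[str] = None,
                  column: Optional[str] = None) -> str:
    """Return the most specific map that has been set; fall back to the project default."""
    # Check the levels from most specific to least specific.
    for candidate in (column, stage, job):
        if candidate is not None:
            return candidate
    return project


# Placeholder map names, for illustration only.
print(effective_map("PROJECT.MAP"))
print(effective_map("PROJECT.MAP", job="JOB.MAP"))
print(effective_map("PROJECT.MAP", job="JOB.MAP", stage="STAGE.MAP", column="COLUMN.MAP"))
```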

Locales
Strictly speaking, a DataStage NLS locale is a set of national conventions. A locale is viewed as a separate entity from a character set. You need to consider the language, character set, and conventions for data formatting that one or more groups of people use. You define the character set independently, although for national conventions to work correctly, you must also use the appropriate character sets. For example, Venezuela and Ecuador both use Spanish as their language, but have different data formatting conventions.

Locales do not respect national boundaries. One country may use several locales; for example, Canada uses two and Belgium uses three. Several countries may use one locale; for example, a multinational business could define a worldwide locale to use in all its offices. Appendix B lists all the locales that are supplied with DataStage and the territories and languages associated with them.

Server jobs allow you to choose locales separately for several different aspects of national conventions:
• The format for times and dates
• The format for displaying numbers
• How to display monetary values
• Whether a character is alphabetic, numeric, nonprinting, and so on
• The order in which characters should be sorted (collation)

You can mix locales if required; for example, you could specify times and dates in one locale and monetary conventions in another.

Parallel jobs allow you to choose locales separately for:
• The order in which characters should be sorted (collation)


You can specify locales at different levels within DataStage:
• A project-wide default. In the DataStage Administrator client you specify default locales for all server jobs in a project, and a default locale for all parallel jobs in a project.
• A job default. In the DataStage Designer, you can specify default locales used by a particular job that override the project default.
• A stage locale. Certain parallel stages allow you to specify that they use a particular locale. This overrides both the project default and the job default.

Note: This manual uses the term territory rather than country to describe an area that uses a locale.

Time and Date. Most territories have a preferred style for presenting times and dates. For times, this is usually a choice between a 12-hour or 24-hour clock. For dates, there are more variations. Here are some examples of formats used by different locales to express 9.30 at night on the first day of April in 1990:

Territory    Time         Date      DataStage Locale
France       21h30        1.4.90    FR-FRENCH
U.S.         9:30 p.m.    4/1/90    US-ENGLISH
Japan        21:30        90.4.1    JP-JAPANESE
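To make the time and date examples concrete, the following Python sketch formats the same moment according to the three conventions listed above. The formatter functions are hand-written for illustration and only use the DataStage locale names as dictionary keys; they do not read any DataStage or operating system locale definition.

```python
from datetime import datetime


def french(d: datetime) -> tuple:
    """24-hour clock with 'h' separator; day.month.year."""
    return f"{d.hour}h{d.minute:02d}", f"{d.day}.{d.month}.{d.year % 100:02d}"


def us(d: datetime) -> tuple:
    """12-hour clock with a.m./p.m.; month/day/year."""
    hour12 = d.hour % 12 or 12
    suffix = "a.m." if d.hour < 12 else "p.m."
    return f"{hour12}:{d.minute:02d} {suffix}", f"{d.month}/{d.day}/{d.year % 100:02d}"


def japanese(d: datetime) -> tuple:
    """24-hour clock with ':' separator; year.month.day."""
    return f"{d.hour}:{d.minute:02d}", f"{d.year % 100:02d}.{d.month}.{d.day}"


# Keys reuse the locale names from the text; the functions themselves are assumptions.
FORMATTERS = {"FR-FRENCH": french, "US-ENGLISH": us, "JP-JAPANESE": japanese}

when = datetime(1990, 4, 1, 21, 30)
for locale_name, formatter in FORMATTERS.items():
    print(locale_name, *formatter(when))
```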

Numeric. This convention defines how numbers are displayed, including:
• The character used as the decimal separator (the radix character)
• The character used as a thousands separator
• Whether leading zeros should be used for numbers -1 through 1

For example, the following numbers can all mean one thousand, depending on the locale you use:

Territory      Number    DataStage Locale
Ireland        1,000     IE-ENGLISH
Netherlands    1.000     NL-DUTCH
France         1 000     FR-FRENCH
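The effect of the thousands separator can be sketched in Python as follows. The separator table simply restates the three examples above, and format_integer is a hypothetical helper rather than a DataStage function.

```python
def format_integer(value: int, thousands_sep: str) -> str:
    """Group the digits of value in threes using the given thousands separator."""
    # Python groups with "," first; the comma is then swapped for the locale's separator.
    return f"{value:,}".replace(",", thousands_sep)


# Thousands separators taken from the examples in the text.
SEPARATORS = {"IE-ENGLISH": ",", "NL-DUTCH": ".", "FR-FRENCH": " "}

for locale_name, sep in SEPARATORS.items():
    print(locale_name, format_integer(1000, sep))
```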


Monetary. This convention defines how monetary values are displayed, including:
• The character used as the decimal separator. This may differ from the decimal separator used in numeric formats.
• The character used as a thousands separator. This may differ from the thousands separator used in numeric formats.
• The local currency symbol for the territory, for example, $, £, or ¥.
• The string used as the international currency symbol, for example, USD (US Dollars), NOK (Norwegian Kroner), JPY (Japanese Yen).
• The number of decimal places used in local monetary values.
• The number of decimal places used in international monetary values.
• The sign used to indicate positive monetary values.
• The sign used to indicate negative monetary values.
• The relative positions of the currency symbol and any positive or negative signs in monetary values.

Here are examples of monetary formats different locales use:

Currency        Format        DataStage Locale
U.S. Dollars    $123.45       US-ENGLISH
UK Pounds       £37,000.00    GB-ENGLISH
German Marks    DM123,45      DE-GERMAN
German Euros    €123,45       DE-GERMAN-EURO
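A minimal Python sketch of how a few of these monetary settings combine is given below. It handles only positive amounts with the currency symbol placed first, and the helper format_money and its parameters are assumptions made for illustration; the use of "." as the German thousands separator is likewise an assumption, since the examples above do not show one.

```python
from decimal import Decimal


def format_money(amount: Decimal, symbol: str, decimal_sep: str,
                 thousands_sep: str, places: int = 2) -> str:
    """Format a positive amount as <symbol><grouped whole part><decimal_sep><fraction>."""
    plain = f"{amount:,.{places}f}"              # for example "37,000.00"
    whole, _, fraction = plain.partition(".")
    whole = whole.replace(",", thousands_sep)    # apply the locale's grouping character
    if not fraction:                             # places == 0: no decimal part to show
        return f"{symbol}{whole}"
    return f"{symbol}{whole}{decimal_sep}{fraction}"


print(format_money(Decimal("123.45"), "$", ".", ","))
print(format_money(Decimal("37000"), "£", ".", ","))
print(format_money(Decimal("123.45"), "DM", ",", "."))
```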

Character Type. This convention defines whether a character is alphabetic, numeric, nonprinting, and so on. This convention also defines any casing rules, for example, some letters take an accent in lowercase but not in uppercase.

Collation. This convention defines the order in which characters are collated, that is, sorted. There can be many variations in collation order within a single character set. For example, the character Ä follows A in Germany, but follows Z in Sweden.
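The collation example can be illustrated with a small Python sketch that sorts the same words under two orderings. The alphabets below are deliberately simplified (only A to Z plus Ä, with Ä placed as the text describes) and are assumptions for illustration, not the collation tables shipped with DataStage.

```python
# Simplified alphabets: Ä directly after A (Germany) versus Ä after Z (Sweden).
GERMAN_ORDER = "AÄBCDEFGHIJKLMNOPQRSTUVWXYZ"
SWEDISH_ORDER = "ABCDEFGHIJKLMNOPQRSTUVWXYZÄ"


def collate(words, alphabet):
    """Sort words by the position of each (uppercased) letter in the given alphabet."""
    rank = {ch: position for position, ch in enumerate(alphabet)}
    return sorted(words, key=lambda word: [rank[ch] for ch in word.upper()])


words = ["Zebra", "Äpfel", "Apfel", "Banane"]
print(collate(words, GERMAN_ORDER))
print(collate(words, SWEDISH_ORDER))
```

The same input produces two different orderings, which is why the collation locale has to be chosen to match the expectations of the people reading the sorted output.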


Chapter 2. Server Jobs and NLS
This chapter gives details about NLS in DataStage server jobs. It covers:
• Maps and locales available in server jobs
• Loading maps and loading locales
• Considerations about character data in server jobs
• How to use maps and locales in server jobs
• Creating new maps for server jobs
• Creating new locales for server jobs

Maps and Locales in DataStage Jobs


A large number of maps and locales are installed when you install DataStage with NLS enabled. DataStage makes a distinction between available maps and locales and loaded maps and locales. Depending on the language you specify when you install DataStage, a set of maps and locales is compiled and loaded, ready for use when designing and running DataStage server jobs. Available maps and locales are those that DataStage can compile and load; you can specify them when designing jobs, but they must be loaded before you run a job that uses them.

You can view which maps and locales are currently loaded, and which are available, from the DataStage Administrator:
1. Open the DataStage Administrator client.
2. Click the Projects tab to go to the Projects page.
3. Select a project and click the NLS button to open the Project NLS Settings dialog box for that project. By default this shows all the maps currently loaded for server jobs. Choose the Show all maps option to see a list of maps available for loading.
4. To view loaded locales, click the Server Locales tab. Click the down arrow next to each locale category to see a drop-down list of loaded locales. Select the Show all locales option to have the drop-down lists show all the locales available for loading.

Loading Maps
To load one of the available maps so that it can be used by jobs at run time:
1. In the Server Maps page, click the Install >> button. The page expands to show lists of available and loaded maps.
2. Select the map you want to load from the Available list on the left and click the Add> button. A dialog box asks you to confirm the action. Click Yes. When the map has been compiled it is added to the Installed list on the right. You need to stop and restart the DataStage engine before it is actually loaded, so initially there is no tick beside it.
3. Stop and restart the DataStage engine either by rebooting the machine or by stopping and starting the DataStage services (see DataStage Administrator Guide for instructions on how to do this). The map is then available for jobs at run time.

Loading Locales
To load one of the available locales so that it can be used by jobs at run time:
1. In the Server Locales page, click the Install >> button. The page expands to show lists of available and loaded locales.
2. Select the locale you want to load from the Available list on the left and click the Add> button. A dialog box asks you to confirm the action. Click Yes. When the locale has been compiled it is added to the Installed list on the right. You need to stop and restart the DataStage engine before it is actually loaded, so initially there is no tick beside it.
3. Stop and restart the DataStage engine either by rebooting the machine or by stopping and starting the DataStage services (see DataStage Administrator Guide for instructions on how to do this). The locale is then available for jobs at run time.

Using Maps in Server Jobs


You need to use a map whenever you read character data (other than 7-bit ASCII) into DataStage or write character data out of DataStage. The map tells DataStage how to convert between the external character set and the internal Unicode character set.

You do not need to map data if you are:
• Handling purely numeric data.
• Reading from or writing to a stage representing the internal storage provided by DataStage (that is, a Hashed File stage or UniVerse stage).
• Reading from or writing to an external UniVerse database with NLS enabled.
• Reading or writing 7-bit ASCII data.

DataStage allows you to specify the map to use at various points in a job design:
• You can specify the default map for a project. This is used by all stages in all jobs in a project unless specifically overridden in the job design.
• You can specify the default map for a job. This is used by all stages in a job (replacing the project default) unless overridden in the job design.
• You can specify a map for a particular stage in your job. This overrides both the project default and the job default.
• For certain stages you can specify a map for individual columns. This overrides the project, job, and stage default maps.
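Conceptually, a map plays the same role as a character encoding name in a general-purpose language: it says how external bytes correspond to Unicode characters. The Python sketch below shows that round trip, and also why 7-bit ASCII data needs no mapping for a UTF-8 internal form. The codec name shift_jis is a Python codec name, not a DataStage map name, and the helper functions are hypothetical.

```python
def read_external(raw: bytes, charset: str) -> str:
    """Convert bytes in an external character set to a Unicode string (on input)."""
    return raw.decode(charset)


def write_external(text: str, charset: str) -> bytes:
    """Convert a Unicode string back to bytes in an external character set (on output)."""
    return text.encode(charset)


# Japanese text held in an external, multi-byte character set.
raw = "日本語".encode("shift_jis")
text = read_external(raw, "shift_jis")
print(text, write_external(text, "shift_jis") == raw)

# 7-bit ASCII bytes are already valid UTF-8, so converting them changes nothing.
ascii_bytes = b"DataStage"
print(ascii_bytes.decode("ascii").encode("utf-8") == ascii_bytes)
```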

Character Data in Server Jobs


You only need to specify a character set map where your job is processing character data. DataStage has a number of character types which can be specified as the SQL type of a column:
• Char
• VarChar
• LongVarChar
• NChar
• NVarChar
• NLongVarChar

All of the above denote string columns, which need to be mapped to DataStage's internal Unicode character set.

Specifying a Project Default Map


You specify the default map for a project in the DataStage Administrator client:
1. Open the DataStage Administrator client.
2. Click the Projects tab to go to the Projects page.
3. Select the project for which you want to set a default map and click the NLS button to open the Project NLS Settings dialog box for that project. By default this shows all the maps currently loaded for server jobs.
4. Choose the map you want from the Default map name list. You can select the Show all maps option and choose a map that is not yet loaded, but note that you will have to load the map (see Loading Maps on page 2-3) before any jobs that use the map are run.
5. Click OK. The selected map is now the default one for that project and is used by all the jobs in that project.

Specifying a Job Default Map


You specify a default map for a particular job in the DataStage Designer, using the Job Properties dialog:
1. Open the job for which you want to set the map in the DataStage Designer.
2. Open the Job Properties dialog box for that job (choose Edit ➤ Job Properties).
3. Click the NLS tab to go to the NLS page.
4. Choose the map you want from the Default map for stages list. You can select the Show all maps option and choose a map that is not yet loaded, but note that you will have to load the map (see Loading Maps on page 2-3) before the job is actually run.
5. Click OK. The selected map is now the default one for that job and is used by all the stages in that job.

Specifying a Stage Map


You specify a map for a particular stage to use in the stage editor dialog in the DataStage Designer. You can specify maps for all types of stage except:
• Active stages such as the Aggregator and Transformer. These deal with data that has already been input to DataStage and so has already been mapped.
• Stages that use the internal storage offered by DataStage, that is, Hashed File and UniVerse stages. These handle data in the Unicode character set, so require no mapping.

To specify a map for a stage:
1. Open the stage editor in the job in the DataStage Designer. Select the NLS tab on the Stage page.
2. Do one of the following:
   • Choose the map you want from the Map name for use with stage list. You can select the Show all maps option and choose a map that is not yet loaded, but note that you will have to load the map (see Loading Maps on page 2-3) before the job containing this stage is actually run.
   • Click the Use Job Parameter button. This allows you to select an existing job parameter or specify a new one. When the job is run, DataStage will use the value of that parameter for the name of the map to use.
3. Click OK. The selected map or job parameter is used by the stage.

Specifying a Column Map


Certain types of server job stage allow you to specify a map that is used for a particular column in the data handled by that stage. The following stages permit per-column mapping:
• ODBC stage
• Sequential File stage

To specify a per-column map:
1. Open the stage editor in the job. Click the NLS tab on the Stage page.
2. Select the Allow per-column mapping option. Then go to the Inputs or Outputs page (depending on whether you are writing or reading data) and select the Columns tab.
3. The columns grid now has an extra field called NLS Map. Choose the map you want for a particular column from the drop-down list.
4. Click OK.

Using Locales in Server Jobs


Locales allow you to specify that data is handled in accordance with the conventions of a certain territory. There is not always a direct relationship between locale and language; for example, the French locale is different from the French Canadian one.

Server jobs allow you to choose locales separately for several different aspects of national conventions:
• The format for times and dates
• The format for displaying numbers
• How to display monetary values
• Whether a character is alphabetic, numeric, nonprinting, and so on
• The order in which characters should be sorted (collation)

You can mix locales if required; for example, you could specify times and dates in one locale and monetary conventions in another. Descriptions of each type of convention are given in Locales on page 1-3.

In server jobs you can set a default locale for a project or for an individual job.

Specifying a Project Default Locale


You specify the default locale for a project in the DataStage Administrator client:
1. Open the DataStage Administrator client.
2. Click the Projects tab to go to the Projects page.
3. Select the project for which you want to set a default locale and click the NLS button to open the Project NLS Settings dialog box for that project. Click the Server Locales tab to go to the Server Locales page.
4. Click the arrow next to the category for which you want to set a locale, and choose a locale from the drop-down list. You can select the Show all locales option and choose a locale that is not yet loaded, but note that you will have to load the locale (see Loading Locales on page 2-4) before you run jobs that use it.
5. Click OK. The selected locale is now the default one for that category in the project and is used by all the jobs in that project.

Specifying a Job Default Locale


You specify a default locale for a particular job in the DataStage Designer, using the Job Properties dialog:
1. Open the job for which you want to set the locale in the DataStage Designer.
2. Open the Job Properties dialog box for that job (choose Edit ➤ Job Properties).
3. Click the NLS tab to go to the NLS page.
4. Click the arrow next to the category for which you want to set a locale, and choose a locale from the drop-down list. You can select the Show all locales option and choose a locale that is not yet loaded, but note that you will have to load the locale (see Loading Locales on page 2-4) before the job is actually run.
5. Click OK. The selected locale is now the default one for that category in the job and is used by all the stages in that job.

Creating New Maps


If the maps supplied with DataStage do not meet your needs, you can create new ones and use these in your jobs. You are most likely to want to produce a variant of an existing map rather than add an entirely new one. DataStage allows you to base a new map on an existing one and just add or alter the required mappings. You do this by creating a table and adding it to a map to make a new map.

A map is defined by a Description, which in turn calls upon a Table to define the actual mappings. To create a new map, you need to define a Description and a Table.

CAUTION: When you want to produce a variant of an existing map it is important that you create a new map based on the existing one. Under no circumstances should you edit one of the maps supplied with DataStage.

Maps are created using the NLS administration tool. This is run in a DS engine shell as follows. You need to have DataStage Administrator status in order to be able to run this.

Running NLS Administration Tool on a Windows Server


On a Windows server:
1. Start a telnet session and connect to your DataStage server. The Welcome to DataStage Telnet Server message appears and you are prompted for a login name and password.
2. Enter your DataStage user name and password. You are then prompted for an account name or path.
3. Enter uv as the account name. You are now connected to the DS engine.
4. At the prompt type NLS.ADMIN (note that case is important). The NLS Administration window appears.

Running NLS Administration Tool on a UNIX Server


On a UNIX server:
1. Start a telnet session and connect to your DataStage server.
2. CD to the DataStage engine directory ($DSHOME/DSEngine).
3. Type bin/uvsh.
4. At the prompt type NLS.ADMIN (note that case is important). The NLS Administration window appears.

Base Maps
A map can be based on another map and this map can be based on yet another map. To understand the complete map you must follow the chain of base maps. For more information about the construction of a map, choose Mappings Descriptions Xref and Mappings Tables Xref from the NLS Administration menu. Choose the map or table whose lineage you want to see. For example, the map C0-CONTROLS is a single-byte character set map using the C0-CONTROLS table. It maps the set of 7-bit control characters.


The description report will tell you that just about every other map has C0-CONTROLS in its lineage, while it is the base map for C1-CONTROLS and ASCII.
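The idea of following a chain of base maps can be sketched in a few lines of Python. This is an illustration only and is not part of DataStage: the dictionary is a hypothetical stand-in for the map descriptions, and only the relationships stated above (C0-CONTROLS as the base map for C1-CONTROLS and ASCII) are taken from the text.

```python
# Hypothetical stand-in for map descriptions: map name -> base map name.
# None marks a map that has no base map.
BASE_MAP = {
    "C0-CONTROLS": None,
    "C1-CONTROLS": "C0-CONTROLS",
    "ASCII": "C0-CONTROLS",
}


def lineage(map_name, base_map=BASE_MAP):
    """Return the chain of maps from map_name down to its root base map."""
    chain = []
    seen = set()
    current = map_name
    while current is not None:
        if current in seen:
            # Guard against a description that (wrongly) refers back to itself.
            raise ValueError(f"circular base map reference at {current}")
        seen.add(current)
        chain.append(current)
        current = base_map[current]
    return chain


print(lineage("ASCII"))
```

Reading the returned list from left to right gives the same information as the Xref report: each map is followed by the map it is based on.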

Creating a New Map


When you need to create new maps, follow these steps:
1. Find an existing map that most closely matches the required map.
2. Identify the characters that need to be mapped differently in the new map.
3. Create a new table that contains only these new mappings.
4. Create the new map by adding a new description based on the existing map but adding the new table.


The following example creates a map called MY.ASCII. This map is identical to the existing ASCII map, except that the input character 0x23 is mapped to the UK pound sign (£) instead of the number symbol (#). Your first action is to create a table called MY.POUND that performs this mapping:
1. In the NLS administration tool, choose Mappings > Tables > Create.
2. Specify MY.POUND as the table name.
3. The NLS Administrator editor opens. Enter I to insert new lines and add lines 1 and 2 as shown below. At line 3, just press Return to exit insert mode.
4. Type FILE to write the file and leave the table editor.

Next you need to create a description.
1. In the NLS administration tool, choose Mappings > Descriptions > Create.
2. Specify MY.ASCII as the description name.
3. The NLS Administration tool asks you if you want to base the new description on an existing one. As we only require a short description, it is easier just to enter it directly, so type Q.
4. As the administration tool prompts for each field, enter the information as shown.
5. The NLS administration tool shows you the description and gives you the opportunity to change any fields you're not happy with.

The following table shows the fields of a map description:

Field | Name | Description
0 | Map ID | The name used to specify the map in commands and programs.
1 | Map Description | A description of the map.
2 | Base Map ID | The name of a map to base this one on. This value must be the record ID of another description.
3 | Map type | The value of this field must be either SBCS for a single-byte character set, or DBCS for a double-byte or multibyte character set. The default value is SBCS.
4 | Table ID | The record ID of the map table that this map description refers to. You do not need to specify a value if the map table has the same ID as the map description.
5 | Display length | The display length of all characters in the mapping table specified in field 4. Most double-byte character sets have some characters that print as two display positions on a screen (for example, Hangul characters or CJK ideographs). However, the same map will usually require that ASCII characters are printed as one display position. This field does not pick up a value from any base map description. The default value is 1.
6 | Unknown char seq. | This field specifies the character sequence to substitute for unknown characters that do not form part of the character set. The value, which is a byte sequence in the external character set, should be a hexadecimal number from one to four bytes. The default value is 3F, the ASCII question mark character. The default is used if neither this map nor any underlying base map has a value in this field.
7 | Compose seq. | This field contains the character sequence to compose hexadecimal Unicode values from one to four bytes. If DataStage detects the sequence on input, the next four bytes entered are checked to see if they are hexadecimal values. If so, the Unicode character with that value is entered directly. If neither this map nor any base map has a value in this field, you cannot input Unicode characters by this means. A value of NONE overrides a compose sequence set by an underlying map.
8 | Input Table ID | The name of a map table to be used for inputting deadkey sequences.
9 | Prefix string | A string in hexadecimal numbers to be prefixed to all external character mappings in the table referenced by field 4. Used mainly for mapping Japanese character sets.
10 | Offset value | A value in hexadecimal numbers to be added to each external mapping in the table referenced by field 4. If prefixed by a minus sign, the value is subtracted. Used mainly for mapping Japanese character sets.
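The different defaulting behavior of fields 5 and 6 is easy to misread, so here is a minimal Python sketch of it. It is not DataStage code: the map names (DEMO.BASE, DEMO.CHILD, DEMO.PLAIN), the dictionary layout, and the key names are all hypothetical. It only models the two rules stated in the table: the unknown character sequence is inherited through base maps and falls back to 3F, whereas the display length is never inherited and falls back to 1.

```python
DEFAULT_UNKNOWN_SEQ = "3F"  # ASCII question mark, the documented default

# Hypothetical descriptions keyed by Map ID; None means the field is empty.
DESCRIPTIONS = {
    "DEMO.BASE": {"base": None, "unknown_seq": "2A", "display_length": 2},
    "DEMO.CHILD": {"base": "DEMO.BASE", "unknown_seq": None, "display_length": None},
    "DEMO.PLAIN": {"base": None, "unknown_seq": None, "display_length": None},
}


def unknown_char_seq(map_id, descriptions=DESCRIPTIONS):
    """Field 6: use this map's value, else the nearest base map's, else 3F."""
    current = map_id
    while current is not None:
        record = descriptions[current]
        if record["unknown_seq"]:
            return record["unknown_seq"]
        current = record["base"]
    return DEFAULT_UNKNOWN_SEQ


def display_length(map_id, descriptions=DESCRIPTIONS):
    """Field 5: never inherited from a base map; defaults to 1."""
    value = descriptions[map_id]["display_length"]
    return value if value is not None else 1


print(unknown_char_seq("DEMO.CHILD"), display_length("DEMO.CHILD"))
```

In this sketch DEMO.CHILD picks up the unknown character sequence of its base map but not its display length.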

Now that you've defined your new map, you can use the DataStage Administrator to make it available within your projects. Follow the instructions given in Loading Maps on page 2-3.

How Locales Work


Before you attempt to create new locales, you need to know a bit more about how DataStage defines locales. It is important to distinguish between a locale, a category, and a convention:
• A locale comprises a set of categories.
• A category comprises a set of conventions.
• A convention is a rule describing how data values are input or displayed.

In NLS each locale comprises five categories:
• Time
• Numeric
• Monetary
• Ctype
• Collate

Each category comprises various conventions specific to the type of data in each category. For example, conventions in the Time category include the names of the days of the week, the strings used to indicate AM or PM, the character that separates the hours, minutes, and seconds, and so forth.

You examine the conventions defined for a locale using the NLS Administration tool. This is run in a DS engine shell as described in Running NLS Administration Tool on a Windows Server on page 2-16 and Running NLS Administration Tool on a UNIX Server on page 2-17. You need to have DataStage Administrator status in order to be able to run this. When you have started the NLS Administration tool:
1. Choose Locales > View.
2. When prompted for a Locale ID, enter one of the Locale IDs (as listed in the DataStage Administrator).

You can also examine the categories from which locales are built:
1. Choose Categories > category_type > List all, where category_type is the type of category you want to examine. This gives a list of all the categories defined for this type.
2. Choose Categories > category_type > View, where category_type is the type of category you want to examine.
3. When prompted for a Category ID, enter one of the Category IDs (as listed by the List all command).

The following example shows the record for the US-ENGLISH locale as displayed by the NLS Administration tool:

Locale name..... USA
Description..... Territory=USA, Language=English
Time/Date....... US-ENGLISH
Numeric......... DEFAULT
Monetary........ USA
Ctype........... DEFAULT
Collate......... DEFAULT
. . .

A locale can be built from existing conventions without duplication. Different locales can share conventions, and one convention can be based on another. For example, Canada uses the locales CA-FRENCH and CA-ENGLISH. The two locales are not completely different; they share the same Monetary convention. The records for the CA-FRENCH and CA-ENGLISH locales look like this:
Locale name..... CA-FRENCH
Description..... Country=Canada, Language=French
Time/Date....... CA-FRENCH
Numeric......... CA-FRENCH
Monetary........ CANADA
Ctype........... DEFAULT
Collate......... DEFAULT+ACCENT+CASE
. . .

Locale name..... CA-ENGLISH
Description..... Country=Canada, Language=English
Time/Date....... CA-ENGLISH
Numeric......... CA-ENGLISH
Monetary........ CANADA
Ctype........... DEFAULT
Collate......... DEFAULT
. . .

Notice that for both locales the Monetary field points to a monetary convention called CANADA. The other fields contain the appropriate value for the language concerned.
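The way locales reference category records, rather than containing them, can be illustrated with a small Python sketch. The field values are copied from the two Canadian records above; the dictionary layout and the function name are assumptions made for illustration and do not reflect how DataStage stores locales internally.

```python
CATEGORIES = ("Time/Date", "Numeric", "Monetary", "Ctype", "Collate")

# Each locale record holds the ID of a category record, not the conventions themselves.
LOCALES = {
    "CA-FRENCH": {
        "Time/Date": "CA-FRENCH",
        "Numeric": "CA-FRENCH",
        "Monetary": "CANADA",
        "Ctype": "DEFAULT",
        "Collate": "DEFAULT+ACCENT+CASE",
    },
    "CA-ENGLISH": {
        "Time/Date": "CA-ENGLISH",
        "Numeric": "CA-ENGLISH",
        "Monetary": "CANADA",
        "Ctype": "DEFAULT",
        "Collate": "DEFAULT",
    },
}


def shared_categories(first, second, locales=LOCALES):
    """List the category types for which two locales point at the same record."""
    return [c for c in CATEGORIES if locales[first][c] == locales[second][c]]


print(shared_categories("CA-FRENCH", "CA-ENGLISH"))
```

For these two records the shared entries are Monetary (CANADA) and Ctype (DEFAULT), so a change to the CANADA monetary category would affect both locales.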


A detailed description of the format of the conventions in each category is given in Appendix A.

Creating New Locales


If the locales supplied with DataStage do not meet your needs, you can create new ones and use these in your jobs. You are most likely to want to produce a variant of an existing locale rather than add an entirely new one, so DataStage allows you to base a new locale on an existing one and just add or alter the required details.

CAUTION: When you want to produce a variant of an existing locale, it is important that you create a new locale based on the existing one. Under no circumstances should you edit one of the locales supplied with DataStage.

Locales are created using the NLS administration tool. This is run in a DS engine shell as described in Running NLS Administration Tool on a Windows Server on page 2-16 and Running NLS Administration Tool on a UNIX Server on page 2-17. You need to have DataStage Administrator status in order to be able to run this.

The instructions take you through an example which creates a new locale called GB-ENGLISH-EURO. Such a locale will be needed if and when the UK joins the Euro zone. It is a copy of the GB-ENGLISH locale except that it uses a different monetary category which gives a Euro sign rather than a pound sign (for completeness we will also show you how to create the Euro monetary category). We will be following these steps:
1. Create a new monetary category (based on an existing one) with a Euro sign as the money symbol.
2. Create a new locale, based on the GB-ENGLISH one, that uses the Euro monetary category.

Creating a New Convention


We are going to assume that the UK will keep its existing monetary conventions, that is, a decimal separator of . (full stop) and a thousands separator of , (comma). We are therefore going to base the UK-EURO category on the existing UK category:
1. Choose Categories > Monetary > Create.
2. When prompted, enter UK-EURO as the record ID for the new category.
3. When prompted, enter UK as the existing record you want to copy.
4. The NLS Administration tool displays the current UK category and allows you to edit it. Type the number of the line you want to change. DataStage displays the convention heading and you can type in the new data. For the UK-EURO category, we are changing the Currency Symbol and International currency string conventions.

Creating a New Locale


We are going to create the GB-ENGLISH-EURO locale based on the GB-ENGLISH locale. The only difference is that it uses the UK-EURO monetary category.
1. Choose Locales > Create.
2. When prompted, enter GB-ENGLISH-EURO as the ID of the record to create.
3. When prompted, enter GB-ENGLISH as the ID of the record you are going to base the new locale on.
4. The NLS Administration tool displays the current GB-ENGLISH locale and allows you to edit it. Type the number of the line you want to change. DataStage displays the line heading and you can type in the new data. For the GB-ENGLISH-EURO locale, change the Monetary category to UK-EURO.
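The copy-then-override pattern used for both the UK-EURO category and the GB-ENGLISH-EURO locale can be sketched as follows. This is a conceptual model in Python, not something the NLS Administration tool exposes. Only the Monetary values (UK and UK-EURO) come from the text; the other field values in GB_ENGLISH are placeholders, because the actual GB-ENGLISH record is not shown in this guide.

```python
def derive_locale(base_record, **overrides):
    """Copy an existing record and replace only the named fields."""
    unknown = set(overrides) - set(base_record)
    if unknown:
        raise KeyError(f"no such field(s): {sorted(unknown)}")
    new_record = dict(base_record)  # work on a copy; the supplied record is untouched
    new_record.update(overrides)
    return new_record


# Placeholder record: only the Monetary value "UK" is taken from the text.
GB_ENGLISH = {
    "Time": "PLACEHOLDER-TIME",
    "Numeric": "PLACEHOLDER-NUMERIC",
    "Monetary": "UK",
    "Ctype": "PLACEHOLDER-CTYPE",
    "Collate": "PLACEHOLDER-COLLATE",
}

GB_ENGLISH_EURO = derive_locale(GB_ENGLISH, Monetary="UK-EURO")
print(GB_ENGLISH_EURO["Monetary"], GB_ENGLISH["Monetary"])
```

Working on a copy mirrors the caution given earlier: the supplied locale stays exactly as shipped, and only the new record differs.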

Now that you've defined your new locale, you can use the DataStage Administrator to make it available within your projects. Follow the instructions given in Loading Locales on page 2-4.


3
Parallel Jobs and NLS
This chapter gives details about NLS in DataStage parallel jobs. It covers:
• Maps and locales available in parallel jobs
• Considerations about character data in parallel jobs
• How to use maps and locales in parallel jobs
• Creating new maps for parallel jobs
• Creating new locales for parallel jobs

Note: You must be connected to a UNIX server in order to work with parallel job maps and locales. Although you can develop parallel jobs on a Windows system, you do not have access to the maps and locales.

Maps and Locales in DataStage Parallel Jobs


A large number of maps and locales are installed when you install DataStage with NLS enabled. You can view which maps and locales are currently loaded and which ones are available from the DataStage Administrator:
1. Open the DataStage Administrator client.
2. Click the Projects tab to go to the Projects page.
3. Select a project and click the NLS button to open the Project NLS Settings dialog box for that project. By default this shows all the maps currently loaded for server jobs. Click the Parallel Maps tab to view the available parallel job maps. Map names beginning with ASCL are the parallel versions of the maps available in server jobs.
4. To view loaded locales, click the Parallel Locales tab. Click the down arrow next to each locale category to see a drop-down list of loaded locales. Select the Show all locales option to have the drop-down lists show all the locales available for loading.

Using Maps in Parallel Jobs


You need to use a map whenever you read certain types of character data into DataStage or write it out of DataStage. The map tells DataStage how to convert between the external character set and the internal Unicode character set. You do not need to map data if you are:
• Handling purely numeric data.
• Reading or writing 7-bit ASCII data.

DataStage allows you to specify the map to use at various points in a job design:
• You can specify the default map for a project. This is used by all stages in all jobs in a project unless specifically overridden in the job design.
• You can specify the default map for a job. This is used by all stages in a job (replacing the project default) unless overridden in the job design.
• You can specify a map for a particular stage in your job (depending on stage type). This overrides both the project default and the job default.
• For certain stages you can specify a map for individual columns. This overrides the project, job, and stage default maps.
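The override order described above (column over stage over job over project) amounts to picking the most specific setting that is present. The following Python sketch shows that rule in isolation; the function name and its arguments are hypothetical, and the map names in the example call are used purely as sample values.

```python
def effective_map(project_default, job_default=None, stage_map=None, column_map=None):
    """Return the map that applies, preferring the most specific setting."""
    # Check from most specific (column) to least specific (project).
    for candidate in (column_map, stage_map, job_default, project_default):
        if candidate:
            return candidate
    raise ValueError("no map is set at any level")


# Sample values only: the job default replaces the project default for this job.
print(effective_map("ASCL_ASCII", job_default="MY_ASCII"))
```

A setting left empty at one level simply falls through to the next level up.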

Character Data in Parallel Jobs


You only need to specify a character set map where your job is processing character data. DataStage has a number of character types which can be specified as the SQL type of a column:
• Char
• VarChar
• LongVarChar
• NChar
• NVarChar
• LongNVarChar

DataStage parallel jobs store character data as string (one byte per character) or ustring (Unicode string). The Char, VarChar, and LongVarChar types relate to underlying string types where each character is 8 bits and does not require mapping because it represents an ASCII character. You can, however, specify that these data types are extended, in which case they are taken as ustrings and do require mapping. They are specified as such by selecting the Extended check box for the column in the Edit Meta Data dialog box (opened for that column by selecting Edit Row from the columns grid shortcut menu). An Extended field appears in the columns grid, and extended Char, VarChar, or LongVarChar columns have Unicode in this field. The NChar, NVarChar, and LongNVarChar types relate to underlying ustring types, so they do not need to be explicitly extended.

If you have selected Allow per-column mapping for this table (on the NLS page of the Table Definition dialog box or the NLS Map tab of a stage editor), you can select a character set map in the NLS Map field; otherwise the default map is used.
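To see why ustring data needs a map while 7-bit ASCII data does not, consider the following Python sketch. It uses Python's built-in codecs as a stand-in for DataStage maps, which is an assumption made purely for illustration: the same external byte decodes to different Unicode characters depending on the character set chosen, whereas bytes in the 7-bit ASCII range decode identically.

```python
def to_unicode(raw_bytes, codec_name):
    """Decode external bytes to a Unicode string using the named character set."""
    return raw_bytes.decode(codec_name)


external = b"caf\xe9"  # the last byte is outside the 7-bit ASCII range

as_latin1 = to_unicode(external, "latin-1")
as_cp437 = to_unicode(external, "cp437")

print(as_latin1 == as_cp437)  # the choice of character set changes the result

# Pure 7-bit ASCII data decodes the same way under both character sets.
print(to_unicode(b"cafe", "latin-1") == to_unicode(b"cafe", "cp437"))
```

Choosing the wrong map therefore does not usually fail outright; it silently produces the wrong characters for any byte above 0x7F.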

Specifying a Project Default Map


You specify the default map for a project in the DataStage Administrator client:
1. Open the DataStage Administrator client.
2. Click the Projects tab to go to the Projects page.
3. Select the project for which you want to set a default map and click the NLS button to open the Project NLS Settings dialog box for that project. By default this shows all the maps currently loaded for server jobs. Click the Parallel Maps tab.
4. Choose the map you want from the Default map name list. Click OK. The selected map is now the default one for that project and is used by all the jobs in that project.

Specifying a Job Default Map


You specify a default map for a particular job in the DataStage Designer, using the Job Properties dialog box:
1. Open the job for which you want to set the map in the DataStage Designer.
2. Open the Job Properties dialog box for that job (choose Edit > Job Properties).
3. Click the NLS tab to go to the NLS page.
4. Choose the map you want from the Default map for stages list.
5. Click OK. The selected map is now the default one for that job and is used by all the stages in that job.

Specifying a Stage Map


You specify a map for a particular stage in the stage editor dialog box in the DataStage Designer. You can specify maps for all types of stage that read data from or write data to an external data source. Processing, Restructure, and Development/Debug stages deal with data that has already been input to DataStage and so has already been mapped. Certain File stages, for example Data Set and Lookup File Set, represent data held by DataStage and so do not require mapping.

To specify a map for a stage:
1. Open the stage editor in the job in the DataStage Designer. Select the NLS Map tab on the Stage page.
2. Do one of the following:
   • Choose the map you want from the Map name for use with stage list.
   • Click the arrow button next to the map name. This allows you to select an existing job parameter or specify a new one. When the job is run, DataStage will use the value of that parameter for the name of the map to use.
3. Click OK. The selected map or job parameter is used by the stage.

Specifying a Column Map


Certain types of parallel job stage allow you to specify a map that is used for a particular column in the data handled by that stage. All the stages that require mapping allow per-column mapping except for the Database stages.

To specify a per-column map:
1. Open the stage editor in the job. Click the NLS Map tab on the Stage page.
2. Select the Allow per-column mapping option. Then go to the Inputs or Outputs page (depending on whether you are writing or reading data) and select the Columns tab.
3. The columns grid now has an extra field called NLS Map. Choose the map you want for a particular column from the drop-down list.
4. Click OK.

Using Locales in Parallel Jobs


Locales allow you to specify that data is sorted in accordance with the conventions of a certain territory. Note that there is not always a direct relationship between locale and language. In parallel jobs you can set a default locale for a project, for an individual job, or for a particular stage. The default is for data to be sorted in accordance with the Unicode Collation Algorithm (UCA/14651). If you select a specific locale, you are effectively overriding certain features of the UCA collation base.

Note: Although you cannot specify date and time formats or decimal separators using the locale mechanism, there are ways to set these in parallel jobs. See Defining Date/Time and Number Formats on page 3-15 for details.

Specifying a Project Default Locale


You specify the default locale for a project in the DataStage Administrator client:
1. Open the DataStage Administrator client.
2. Click the Projects tab to go to the Projects page.
3. Select the project for which you want to set a default locale and click the NLS button to open the Project NLS Settings dialog box for that project. Click the Parallel Locales tab to go to the Parallel Locales page.
4. Click the arrow next to the Collate category and choose a locale from the drop-down list. The setting OFF indicates that sorting will be carried out according to the base UCA rules.
5. Click OK. The selected locale is now the default one for that category in the project and is used by all the jobs in that project.

Specifying a Job Default Locale


You specify a default locale for a particular job in the DataStage Designer, using the Job Properties dialog box:
1. Open the job for which you want to set the locale in the DataStage Designer.
2. Open the Job Properties dialog box for that job (choose Edit > Job Properties).
3. Click the NLS tab to go to the NLS page.
4. Choose a locale from the Default collation locale for stages list. The setting OFF indicates that sorting will be carried out according to the base UCA rules.
5. Click OK. The selected locale is now the default one for the job and is used by all the stages in that job.

Specifying a Stage Locale


Stages that involve sorting of data allow you to specify a locale, overriding the project and job default. You can also specify a sort on the Partitioning tab of most stages, depending on the partition method chosen. This sort is performed before the incoming data is processed by the stage. You can specify a locale for this sort that overrides the project and job default.

To specify a locale for stages that explicitly sort:
1. Open the stage editor and go to the NLS Locale tab of the Stage page.
2. Choose the required locale from the list and click OK. The stage will sort according to the conventions specified by that locale. The setting OFF indicates that sorting will be carried out according to the base UCA rules.

To specify a locale for a stage using the pre-sort facility on the Partitioning tab:
1. Open the stage editor and go to the Partitioning tab on the Inputs page.
2. Click the properties button in the Sorting area. The Sort Properties dialog box opens.
3. Select the required locale from the list. This will specify the conventions according to which the data is sorted before being processed by this stage. The setting OFF indicates that sorting will be carried out according to the base UCA rules.

Defining Date/Time and Number Formats


Although you cannot set new formats for dates and times or numbers using the locales mechanism, there are other ways of doing this in parallel jobs. You can do this at project level, at job level, for certain types of individual stage, and at column level.

Specifying Formats at Project Level


You can specify date/time and number formats for a project in the DataStage Administrator client:
1. Open the DataStage Administrator client.
2. Click the Projects tab to go to the Projects page.
3. Select the project for which you want to set default formats and click the Properties button to open the Project Properties dialog box for that project. Click the Parallel tab to go to the Parallel page.
4. The page shows the current defaults for date, time, timestamp, and decimal separator. To change a default, clear the corresponding System default check box, then either select a new format from the drop-down list or type in a new format.
5. Click OK to set the new formats as defaults for the project.

Specifying Formats at Job Level


You specify date/time and number formats for a particular job in the DataStage Designer, using the Job Properties dialog box:
1. Open the job for which you want to set the formats in the DataStage Designer.
2. Open the Job Properties dialog box for that job (choose Edit > Job Properties).
3. Click the Defaults tab to go to the Defaults page.
4. The page shows the current defaults for date, time, timestamp, and decimal separator. To change a default, clear the corresponding Project default check box, then either select a new format from the drop-down list or type in a new format.
5. Click OK to set the new formats as defaults for the job.

Specifying Formats at Stage Level


Stages that have a Format tab on their editor allow you to override the project and job defaults for date and time and number formats. These stages are:
• Sequential File stage
• File Set stage
• External Source stage
• External Target stage
• Column Import stage
• Column Export stage

To set new formats in a stage editor:
1. Open the stage editor for the stage you want to change and go to the Format tab on either the Input or Output page (as appropriate).
2. To change the decimal separator, select the Decimal category under the Type defaults category in the Properties tree, then click Decimal separator in the Available properties to add list. You can then choose a new value in the Decimal separator box that appears in the top right of the dialog box.
3. To change the date format, select the Date category under the Type defaults category in the Properties tree, then click Format string in the Available properties to add list. You can then specify a new format in the Format string box that appears in the top right of the dialog box.
4. To change the time format, select the Time category under the Type defaults category in the Properties tree, then click Format string in the Available properties to add list. You can then specify a new format in the Format string box that appears in the top right of the dialog box.
5. To change the timestamp format, select the Timestamp category under the Type defaults category in the Properties tree, then click Format string in the Available properties to add list. You can then specify a new format in the Format string box that appears in the top right of the dialog box.

Specifying Formats at Column Level


You can specify date/time and number formats at column level either from the Columns tabs of stage editors, or from the Columns page of a Table Definition dialog box:
1. In the columns grid, select the column for which you want to specify a format, right-click and select Edit Row from the shortcut menu. The Edit Column Meta Data dialog box appears.
2. The information shown in the Parallel tab varies according to the type of the column you are editing. In the example it is a date column. To change the format of the date, select the Date type category in the Properties tree, then click Format string in the Available properties to add list. You can then specify a new format in the Format string box that appears in the top right of the dialog box.
3. Click Apply to implement the change, then click Close.

The methods for changing the time, timestamp, and decimal separator formats are similar. When you select a column of the time, timestamp, numeric, or decimal type, the available properties allow you to specify a new format for that column.

Creating New Maps


If the maps supplied with DataStage do not meet your needs, you can create new ones and use these in your jobs. You are most likely to want to produce a variant of an existing map rather than add an entirely new one. The system will not allow you to overwrite an existing map, so any maps you create must have a unique name. Note that map names are case insensitive and ignore underscores, dashes, and spaces, so the map name cso_iso_latin_1 would be taken as identical to CSOISOLATIN1.

Ascential provides the source files for all the ASCL_ maps (that is, the parallel job equivalents of most of the server job maps). You can copy these files and base new ones on them, but you should not edit the original ASCL_ files.

The procedure for setting up a new map is:
1. Configure your environment to allow map building.
2. Produce a new map source file.
3. Use the supplied tool to build the map.
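Because of the name-matching rule described above, a new map name can clash with an existing one even when the two look different. The following Python sketch models that comparison so you can check a candidate name before building. The function names are hypothetical and the normalization is an approximation based only on the rule stated in this section (ignore case, underscores, dashes, and spaces); it is not the routine DataStage itself uses.

```python
import re


def normalize_map_name(name):
    """Fold a map name the way the text describes: drop _, -, and spaces; ignore case."""
    return re.sub(r"[_\-\s]", "", name).upper()


def name_is_taken(candidate, existing_names):
    """Return True if candidate would be treated as identical to an existing map name."""
    target = normalize_map_name(candidate)
    return any(normalize_map_name(existing) == target for existing in existing_names)


print(normalize_map_name("cso_iso_latin_1"))
print(name_is_taken("my-ascii", ["ASCL_ASCII", "MY_ASCII"]))
```

Under this rule my-ascii and MY_ASCII count as the same name, so the second would have to be renamed before it could be built alongside the first.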


Setting the Environment


You need to ensure you have the correct environment settings before you create and build new maps.

Solaris
Typical settings for a Solaris system are:
APT_ORCHHOME=/export/home/dsadm/Ascential/DataStage/PXEngine ; export APT_ORCHHOME
PATH=$PATH:$APT_ORCHHOME/bin:$APT_ORCHHOME/etc ; export PATH
LD_LIBRARY_PATH=$APT_ORCHHOME/lib ; export LD_LIBRARY_PATH
APT_CONFIG_FILE=/export/home/dsadm/Ascential/DataStage/Configurations/default.apt ; export APT_CONFIG_FILE
ICU_DATA=$APT_ORCHHOME/nls/charmaps

HP-UX
Typical settings for an HP-UX system are:
APT_ORCHHOME=/export/home/dsadm/Ascential/DataStage/PXEngine ; export APT_ORCHHOME
PATH=$PATH:$APT_ORCHHOME/bin:$APT_ORCHHOME/etc ; export PATH
SHLIB_PATH=$APT_ORCHHOME/lib ; export SHLIB_PATH
APT_CONFIG_FILE=/export/home/dsadm/Ascential/DataStage/Configurations/default.apt ; export APT_CONFIG_FILE
ICU_DATA=$APT_ORCHHOME/nls/charmaps

AIX
Typical settings for an AIX system are:
APT_ORCHHOME=/export/home/dsadm/Ascential/DataStage/PXEngine ; export APT_ORCHHOME
PATH=$PATH:$APT_ORCHHOME/bin:$APT_ORCHHOME/etc ; export PATH
LIBPATH=$APT_ORCHHOME/lib ; export LIBPATH
APT_CONFIG_FILE=/export/home/dsadm/Ascential/DataStage/Configurations/default.apt ; export APT_CONFIG_FILE
ICU_DATA=$APT_ORCHHOME/nls/charmaps


Compaq Tru64
Typical settings for a Compaq Tru64 system are:
APT_ORCHHOME=/export/home/dsadm/Ascential/DataStage/PXEngine ; export APT_ORCHHOME
PATH=$PATH:$APT_ORCHHOME/bin:$APT_ORCHHOME/etc ; export PATH
LD_LIBRARY_PATH=$APT_ORCHHOME/lib ; export LD_LIBRARY_PATH
APT_CONFIG_FILE=/export/home/dsadm/Ascential/DataStage/Configurations/default.apt ; export APT_CONFIG_FILE
ICU_DATA=$APT_ORCHHOME/nls/charmaps

LINUX
Typical settings for a LINUX system are:
APT_ORCHHOME=/export/home/dsadm/Ascential/DataStage/PXEngine ; export APT_ORCHHOME
PATH=$PATH:$APT_ORCHHOME/bin:$APT_ORCHHOME/etc ; export PATH
LD_LIBRARY_PATH=$APT_ORCHHOME/lib ; export LD_LIBRARY_PATH
APT_CONFIG_FILE=/export/home/dsadm/Ascential/DataStage/Configurations/default.apt ; export APT_CONFIG_FILE
ICU_DATA=$APT_ORCHHOME/nls/charmaps

Map Source Files


Map source files end in .ucm. They are located in $APT_ORCHHOME/nls/charmaps and must be built from this location.

As an example, we will create a new map called MY_ASCII which is based on the ASCL_ASCII map, except that the input character 0x23 is mapped to the UK pound sign (£) instead of the number symbol (#). To create this new map:
1. In the $APT_ORCHHOME/nls/charmaps directory, copy ASCL_ASCII.ucm to MY_ASCII.ucm.
2. Edit the MY_ASCII.ucm file. The header information identifies the character set. The map itself is described between CHARMAP and END CHARMAP. The string <UNNNN> gives the Unicode character in hexadecimal. The string \xNN gives the map character in hexadecimal. See http://oss.software.ibm.com/icu/userguide/conversion-data.html for a full description of the file format.
3. Write the file. It is now ready to be built.
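The edit in step 2 can also be scripted. The following Python sketch rewrites the mapping for one byte between CHARMAP and END CHARMAP. It rests on several assumptions that you should check against the real file: the sample text is a made-up fragment, not the actual contents of ASCL_ASCII.ucm; each mapping line is assumed to have the form <UNNNN> \xNN followed by an optional trailing field; and remap_byte is a hypothetical helper, not a tool supplied with DataStage.

```python
import re

# Assumed mapping-line layout: <UNNNN>, whitespace, one or more \xNN bytes, optional rest.
MAPPING_LINE = re.compile(r"^<U([0-9A-Fa-f]{4,6})>(\s+)((?:\\x[0-9A-Fa-f]{2})+)(.*)$")


def remap_byte(ucm_text, byte_seq, new_code_point):
    """Return ucm_text with the line for byte_seq pointing at new_code_point."""
    out = []
    in_charmap = False
    for line in ucm_text.splitlines():
        stripped = line.strip()
        if stripped == "CHARMAP":
            in_charmap = True
        elif stripped == "END CHARMAP":
            in_charmap = False
        elif in_charmap:
            match = MAPPING_LINE.match(stripped)
            if match and match.group(3).lower() == byte_seq.lower():
                # Keep the byte and any trailing field; change only the Unicode value.
                line = f"<U{new_code_point:04X}>{match.group(2)}{match.group(3)}{match.group(4)}"
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up fragment for illustration; a real .ucm file has a longer header and table.
SAMPLE = r"""<code_set_name> "MY_ASCII"
CHARMAP
<U0022> \x22 |0
<U0023> \x23 |0
<U0024> \x24 |0
END CHARMAP
"""

result = remap_byte(SAMPLE, r"\x23", 0x00A3)  # U+00A3 is the pound sign
print(result)
```

Run against a copy of the file, never the original ASCL_ source, and review the result by eye before building the map.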

Building a New Map


The example map is built in the $APT_ORCHHOME/nls/charmaps directory using the following command:
addCustomMaps.sh MY_ASCII.ucm
Once the build is complete, the map is visible in your parallel jobs and ready to use.


Deleting a Custom Map


If you subsequently want to delete a custom map:
1. Edit the file $APT_ORCHHOME/nls/charmaps/convrtrs.txt.
2. Go to the last section in the file, headed User added custom map, and delete the name of the map you want to remove.
3. From the $APT_ORCHHOME/nls/charmaps directory, execute the following command:
gencnval convrtrs.txt
The character set map is removed.

Overriding Collate Conventions


DataStage allows you to tailor existing collate conventions by adding rules to them. The rules that you add override what is set by the current locale. You specify the new rules in a text file which you can reference at project, job, or stage level.

Text File Basic Format


The text file comprises a set of one or more rules, each on a separate line. Each rule contains a string of ordered characters that starts with an anchor point. This is an absolute point that determines the order of other characters. It has the format &character. For example, &a means the character a is the anchor point, and all other rules on that line are relative to that letter. The following table gives the other symbols you can use:

Symbol | Example | Description
< | a<b | Identifies a primary (base letter) difference between a and b
<< | a<<ä | Signifies a secondary (accent) difference between a and ä
<<< | a<<<A | Identifies a tertiary difference between a and A
= | x=y | Signifies no difference between x and y

For example, the rule &a < g has the following sorting consequences:

Without Rule    With Rule
apple           apple
Abernathy       Abernathy
bird            green
Boston          bird
green           Boston
Graham          Graham

Add the rule &A<<<G and the sorting would be as follows:

With Additional Rule
apple
Abernathy
green
Graham
bird
Boston

There are also options that you can specify in the file, and more advanced syntactical elements that you can use. These are described in full at:
http://oss.software.ibm.com/icu/userguide/Collate_Customization.html

For details of the UCA rules see:


http://www.unicode.org/unicode/reports/tr10/

Using an Override File


Once you have set up an override file you can reference it at project level, job level or stage level.

Using an Override File at Project Level


1. Open the DataStage Administrator.
2. Click the Projects tab to go to the Projects page.
3. Select the project for which you want to set a default map and click the NLS button to open the Project NLS Settings dialog box for that project. Click the Parallel Locales tab to go to the Parallel Locales page.
4. Click the browse button next to the Collate list box.
5. Browse for the file containing the override rules.

Using an Override File at Job Level


1. Open the job for which you want to set the locale in the DataStage Designer.
2. Open the Job Properties dialog box for that job (choose Edit > Job Properties).
3. Click the NLS tab to go to the NLS page.
4. Click the browse button next to the Default collation locale for stages list box.
5. Browse for the file containing the override rules.

Using an Override File at Stage Level


1. Open the stage editor and go to the NLS Locale tab of the Stage page.
2. Click the arrow button next to the Collate list box and choose Browse for file from the shortcut menu.
3. Browse for the file containing the override rules.


To specify a locale for a stage using the pre-sort facility on the Partitioning tab:

1. Open the stage editor and go to the Partitioning tab on the Inputs page.
2. Click the properties button in the Sorting area. The Sort Properties dialog box opens.
3. Click the arrow button next to the Collate list box and choose Browse for file from the shortcut menu.
4. Browse for the file containing the override rules.


A
NLS and Server Jobs Supplementary Information
This Appendix gives supplementary information about NLS and server jobs.

The NLS Administration Tool


This section gives a complete description of the NLS Administration tool menus. You must be a DataStage Administrator in the DataStage server engine account (UV) to use the menus. To display the main NLS Administration menu, use the NLS.ADMIN command. The NLS Administration menu has the following options:

Unicode. This option lets you examine the Unicode character set using various search criteria.
Mappings. This option lets you view, create, or modify map descriptions or map tables.
Locales. This option lets you view, create, or modify locale definitions.
Categories. This option lets you view, create, or modify category files and weight tables.
Installation. This option lets you install maps into shared memory or edit the uvconfig file.

The options lead to further menus that are described in the following sections.

Unicode Menu
Use the Unicode menu to examine the Unicode character set. The following options are available:

Characters. This option leads to a further menu containing the following options:
    List All descriptions. Provides a very long listing of all the Unicode characters.
    by Value. Prompts you to enter a Unicode 4-digit hexadecimal value, then returns its description.
    by Char description. Prompts you to enter a partial description of a character, then returns possible matches.
    by block Number. Lists all characters in a given Unicode block in Unicode order.
    by Block descriptions. Lists the Unicode block numbers, the official description of what each block contains, the start and end points in the Unicode set, and the number of characters in the block.
    Ideograph xref. The start of further levels of menu, which are of interest to multibyte users only. These let you do the following:
        Display a listing of how the Unicode ideographic area maps to Chinese, Japanese, and Korean standards
        Search for a character in Unicode, given its external character set reference number
        Convert between external encodings and standard reference numbers, for example, convert shift-JIS to row and column format
    Mnemonic search. Looks up entries in the MNEMONICS input map by description.
Alphabetics. This option lists the NLS.CS.ALPHAS file. This file contains records that define ranges of code points within which characters are considered to be alphabetic. Use the Ctype category to modify these ranges.
Digits. This option lists the NLS.CS.TYPES file. This file contains records that describe code points normally considered to represent the digits 0 through 9 in different scripts. Use the Numeric category to modify these ranges.
Non-printing. This option lists the NLS.CS.TYPES file. This file contains records that describe code points normally considered to be nonprinting characters. Use the Ctype category to modify these ranges.
case Rules. This option lists the NLS.CS.CASES file. This file describes the normal rules for converting uppercase to lowercase and lowercase to uppercase for all code points in Unicode. Use the Ctype category to modify these ranges.
Exit.

Mappings Menu
Use the Mappings menu to examine, create, and edit map description and map table records, and to compile maps. The following options are available:

View. Displays a listing of all map description records.
Descriptions. Leads to a submenu for manipulating map descriptions, that is, records in the NLS.MAP.DESCS file. The Xref option produces a cross-reference listing that lets you see which maps and tables are being used as the basis for others.
Tables. Leads to a submenu for manipulating map tables, that is, records in the NLS.MAP.TABLES file. From the submenu you can list, create, edit, delete, and cross-reference map tables.
Clients. Administers the NLS.CLIENT.MAPS file, which provides synonyms between map names on a client and the DataStage NLS maps on the server. You can list, create, edit, and delete records using this option.
Build. Compiles a single map.


Locales Menu
Use the Locales menu to examine, create, and edit locale definitions. The following options are available:

List All. Lists all the locales that are available in DataStage, that is, all the records in the NLS.LC.ALL file. You may need to build the locales in order to install them into shared memory.
View. Prompts you for the name of a locale, then lists the record for that locale.
Create. Creates a new locale record.
Edit. Edits an existing locale record.
Delete. Deletes a locale record.
Xref. Cross-references a locale. This lets you see the relationship between various locale definitions.
Clients. Administers the NLS.CLIENT.LCS file, which provides synonyms between locale names on a client and the DataStage NLS locales on the server. You can list, create, edit, and delete records using this option.
Report. Lets you produce a report on records in locale categories. You can choose from All, Time/date, Numeric, Monetary, Ctype, and Collate.
Build. Builds a locale.

Categories Menu
From the Categories menu you can administer the NLS category files for different types of convention. The following options are available:

Time/date
Numeric
Monetary
Ctype
Collate
Weight tables
Language info

The first five options call submenus that let you list, view, create, edit, delete, and cross-reference records in the specific category. The final two options have differences as described below.

Weight tables. This option has two additional suboptions as follows:
    Accent weights. This option lists all the records in the NLS.WT.LOOKUP file that refer to accents.
    Case weights. This option lists all the records in the NLS.WT.LOOKUP file that refer to casing.
Language info. This option administers the NLS.LANG.INFO file and lets you list, view, create, edit, delete, and cross-reference records in the file.

Installation Menu
Use the Installation menu to edit the system configuration file or to install maps in shared memory. The following options are available:

Edit uvconfig. This option lets you edit the configurable parameters in the uvconfig file. You can edit all the parameters, or just those referring to NLS, maps, locales, or clients.
Maps. This option leads to a further menu with the following options:
    Configure. Runs the NLS map configuration program.
    All binaries. Lists all the built maps that are available to be installed into shared memory.
    In memory. Lists the names of all maps currently installed in shared memory and available for use within DataStage.
    (re-)Build. Compiles a single map in the same way as the Build option on the Mappings menu.
    Delete binary. Removes a binary map. This takes effect when DataStage is restarted.
Locales. This option leads to a further menu with the following options:
    Configure. Runs the NLS locale configuration program.
    All binaries. Lists all the built locales that are available to be installed into shared memory.
    In memory. Lists the names of all locales currently installed in shared memory and available for use within DataStage. Use this option if the SET.LOCALE command fails with the error locale not loaded. This option lets you identify locales that are built but not loaded.
    (re-)Build. Compiles a single locale.
    Delete binary. Removes a binary locale. This takes effect when DataStage is restarted.
By language. This option lets you configure NLS by specifying a particular language. The configuration program selects the appropriate locales and maps to be built and an appropriate configuration for the uvconfig file.

The NLS Database


This section describes the files in the NLS database. We recommend that you use the NLS.ADMIN command to perform all NLS administration, but you can list and edit these tables directly if you are familiar with TCL.

The NLS database is in the nls subdirectory of the server engine directory. The nls directory contains the subdirectories charset, locales, and maps. Each subdirectory of the nls directory contains further subdirectories, such as the listing and install subdirectories. listing contains listing information generated when building maps and locales (if the user selects this option). install contains the binary files that are loaded into memory.

The VOC names for NLS files start with the prefix NLS (this prefix is absent if you view the files from the operating system). The second part of the filename indicates the logical group that the file belongs to. The logical groups are as follows:

These letters  Indicate this file group
CLIENT         Data received from client programs
CS             Information about Unicode character sets
LANG           Languages
LC             Locales
MAP            Character set maps
WT             Weight tables

The third part of the filename indicates the contents of the file. For example, the file called NLS.LC.COLLATE is an NLS file belonging to the locales group that contains information about collating sequences. Table A-1 lists all the files in the NLS database.

Table A-1. NLS Database Files

File              Description
NLS.CLIENT.LCS    Defines the locales to be used by client programs connecting to DataStage.
NLS.CLIENT.MAPS   Defines the character set used by client programs.
NLS.CS.ALPHAS     Defines which characters are defined as alphabetic in the Unicode standard. Each record ID is a hexadecimal code point value that indicates the start of a range of characters. The record itself specifies the last character in the range. These default values can be overridden by a national convention. You should not modify this file; it is for information only.
NLS.CS.BLOCKS     Defines the blocks of consecutive code point values for characters that are normally used together as a set for one or more languages. The record IDs are block numbers. This file is cross-referenced by the NLS.CS.DESCS file. You should not modify this file; it is for information only.
NLS.CS.CASES      Defines those characters that have an uppercase and lowercase version, and how they map between the two, according to the Unicode standard. These default values can be overridden by a national convention. Each record ID is the hexadecimal code point value for a character. You should not modify this file; it is for information only.
NLS.CS.DESCS      Contains descriptions of every character supported by DataStage NLS. Each character has its own record, using its hexadecimal code point value as the record ID. The descriptions are based on those used by the Unicode standard. You should not modify this file; it is for information only.
NLS.CS.TYPES      Defines which characters are numbers, nonprintable characters, and so on, according to the Unicode standard. These default values can be overridden by a national convention. Each record ID is the hexadecimal code point value for a character. You should not modify this file; it is for information only.
NLS.LANG.INFO     Contains information about languages. Provides possible mappings between language, locale and character set map. It is used for installing NLS and reporting on locales, and should not be modified.
NLS.LC.ALL        Holds records for all the locales known to DataStage. The record IDs are the locale names. The fields of each record are the IDs of records in other locale files. These files contain data about the categories that make up a locale (Time, Numeric, and so on). For a description of the record format for this file, see Creating New Locales on page 2-24.
NLS.LC.COLLATE    Each record in this file defines a collating sequence used by a locale. The collating sequences are defined according to how they differ from the default collating sequence. For a description of the record format for this file, see Format of Convention Records on page A-9.
NLS.LC.CTYPE      Each record in this file holds character typing information used in a locale, that is, which characters are alphabetic, numeric, lowercase, uppercase, nonprinting, and so on. The character types are defined according to how they differ from the default character typing. For a description of the record format for this file, see Format of Convention Records on page A-9.
NLS.LC.MONETARY   Each record in this file holds the monetary formatting convention used in a locale. For a description of the record format for this file, see Format of Convention Records on page A-9.
NLS.LC.NUMERIC    Each record in this file holds the numeric formatting convention used in a locale. For a description of the record format for this file, see Format of Convention Records on page A-9.
NLS.LC.TIME       Each record in this file holds the time and date formatting convention for a locale. For a description of the record format for this file, see Format of Convention Records on page A-9.
NLS.MAP.DESCS     Contains descriptions of every map known to DataStage. The record ID of each map is the map name used in DataStage commands or BASIC programs. The record IDs must comprise ASCII-7 characters only. For a description of the record format for this file, see Creating a New Map on page 2-18.
NLS.MAP.TABLES    A type 19 file that contains the map tables for mapping an external character set to the DataStage internal character set. For more information about the structure of this file, see Creating a New Map on page 2-18.
NLS.WT.LOOKUP     Contains weightings given to characters during a sort, based on the Unicode standard. This file should not be modified.
NLS.WT.TABLES     Contains specific weight information about characters used in a locale. For more information about the structure of this file, see Editing Weight Tables on page A-30.
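The three-part naming scheme for NLS files can be illustrated with a short Python sketch. It is not a DataStage utility; the function name describe_nls_file is hypothetical, and the group descriptions are copied from the list of logical groups given earlier in this section.

```python
# Logical groups, keyed by the second part of the VOC file name.
FILE_GROUPS = {
    "CLIENT": "Data received from client programs",
    "CS": "Information about Unicode character sets",
    "LANG": "Languages",
    "LC": "Locales",
    "MAP": "Character set maps",
    "WT": "Weight tables",
}

def describe_nls_file(voc_name):
    """Split a VOC name such as NLS.LC.COLLATE into (group description, contents)."""
    parts = voc_name.split(".", 2)
    if len(parts) != 3 or parts[0] != "NLS" or parts[1] not in FILE_GROUPS:
        raise ValueError("not an NLS database file name: " + voc_name)
    return FILE_GROUPS[parts[1]], parts[2]

print(describe_nls_file("NLS.LC.COLLATE"))
print(describe_nls_file("NLS.WT.LOOKUP"))
```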

Format of Convention Records


Locales are organized in categories, which are in turn made up of a set of conventions. The following sections describe the fields in convention records in the five categories:

Time
Numeric
Monetary
Ctype
Collate


Time Records
The following table shows each field number, its display name, and a description for time and date information:

Field  Name                      Description
0      Category Name             The name of the convention.
1      Description               A description of the convention. It usually includes the territory that the convention applies to and the language it is used with.
2      Based on                  The name of another convention record that this convention is based on.
3      TIMEDATE format           A format for combined time and date used by the BASIC TIMEDATE function and the TIME command. The value should consist of an MT or TI time conversion code, and a D or DI date conversion code. The two codes can be in any order. They should be separated by a tab character, or a text or subvalue mark.
4      Full DATE format          The full combined date and time format used by the TIME command. The value should consist of an MT or TI time conversion code, and a D or DI date conversion code. The two codes can be in any order. They should be separated by a tab character, or a text or subvalue mark.
5      Date D format             The default date format for the D conversion code. The value should be any D or DI conversion code.
6      Date DI format            The default date format for the DI conversion code. The value should be a D conversion code. The order is specified by the DMY order (field 23). The separator is specified by the date separator (field 24).
7      Time MT format            The default time format for the MT conversion code. The value should be an MT conversion code. In most cases, use the value TI.
8      Time TI format            The format for the TI conversion code. The value should be an MT conversion code that specifies separators. The default separator is a colon (:) as specified by the time separator (field 25).
9      Days of the week          A multivalued list of the full names of the days of the week. For example, Monday, Tuesday. Fields 9 and 10 are associated multivalued fields; the same number of values must exist in each field.
10     Abbreviated               A multivalued list of abbreviated names of the days of the week. For example, Mon, Tue. See field 9.
11     Month names               A multivalued list of the full names of the months of the year. For example, January, February. Fields 11 and 12 are associated multivalued fields; the same number of values must exist in each field.
12     Abbreviated               A multivalued list of abbreviated names of the months of the year. For example, Jan, Feb. See field 11.
13     Chinese years             A multivalued list of Chinese year names (Monkey to Sheep).
14     AM string                 A string used to denote times before noon in 12-hour formats.
15     PM string                 A string used to denote times after noon in 12-hour formats.
16     BC string                 A string to be added to dates before the date 01 Jan 0001 in the Gregorian calendar. This corresponds to -718432, the DataStage internal date.
17     Era name                  A multivalued list of names of eras and their start dates, beginning with the most recent, for example, Japanese Imperial Era Heisei. This field can be used for any locale that uses a calendar with several year zeros. For example, the Thai Buddhist Era commencing 1/1/543 BC. See Defining Era Names on page A-12.
18     Start date                Corresponding era start dates for the era names specified in DataStage internal date format.
19     HEADING/FOOTING D format  A D or DI conversion code used in HEADING and FOOTING statements.
20     HEADING/FOOTING T format  An MT or TI conversion code used in HEADING and FOOTING statements.
21     Gregorian calendar day 1  The date at which the calendar changes from Julian to Gregorian, expressed as a DataStage internal date. The default is -140607, corresponding to 11 January 1583.
22     Number of days skipped    The number of days to skip when the calendar changes from Julian to Gregorian. The default is 10.
23     Default DMY order         The order of day, month, and year, for example, DMY.
24     Default date separator    The separator used between day, month, and year. The default is the slash (/).
25     Default time separator    The separator used between hours, minutes, and seconds. The default is the colon (:).
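Several of the fields above hold dates in DataStage internal date format. The following Python sketch shows how such a value relates to a calendar date. It rests on an assumption that is not stated in this guide: that internal date 0 is 31 December 1967 and that each unit is one day. The function names are hypothetical. Python's date type uses the Gregorian calendar throughout, so the sketch is only meaningful for dates on or after the Gregorian calendar day 1 (field 21).

```python
from datetime import date, timedelta

# Assumption: internal date 0 corresponds to 31 December 1967.
DAY_ZERO = date(1967, 12, 31)

def internal_to_date(internal):
    """Convert an internal day count to a (Gregorian) calendar date."""
    return DAY_ZERO + timedelta(days=internal)

def date_to_internal(day):
    """Convert a (Gregorian) calendar date to an internal day count."""
    return (day - DAY_ZERO).days

# The default for field 21, which the table gives as 11 January 1583.
print(internal_to_date(-140607))
# An era start date of the kind stored in field 18.
print(date_to_internal(date(1989, 1, 8)))
```

Under this assumption the default value of field 21 converts to 11 January 1583, which agrees with the table.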

Defining Era Names. The values in the ERA_NAMES field can contain the format code:

Name [ %n ] [ string ]

Name is the era name. %n is a digit from 1 through 9, or one of the characters +, -, or Y. string is any text string. The %n syntax allows era year numbers to be included in the era name and indicates how the era year numbers are to be calculated. If %n is omitted, %1 is assumed. The rules for the %n syntax are as follows:

%1 - %9: The number following the % is the number to be used for the first year n of this era. This is effectively an offset which is added to the era year number. This will usually be 1 or 2.
%+: The era year numbers count backward relative to year numbers; that is, if era year number 1 corresponds to Julian year Y, year 2 corresponds to Y-1, year 3 to Y-2, etc.
%-: The same as for %+, but uses negative era year numbers; that is, first year Y is -1, Y-1 is -2, Y-2 is -3, and so forth.
%Y: Uses the Julian year numbers for the era year numbers. The year number will be displayed as a 4-digit year number.

The %+, %-, and %Y syntax should only be used in the last era name in the list of era names, that is, the first era, since the list of era names must be in descending date order. string allows any text string to be appended to the era name. It is frequently the case that the first year or part-year of an era is followed by some qualifying characters. Therefore, the actual era is divided into two values, each with the same era name, but one terminated by %1string and the other by %2. You must define the era names accordingly.

Example. This example shows the contents of the records named DEFAULT and US-ENGLISH. The US-ENGLISH record is based on the ENGLISH.NAMES record. An empty field specifies that its definition is derived from any category on which it is based. If there is no base category, the default category is used.
Time/Date Conventions for Locale DEFAULT
Category name............ DEFAULT
Description.............. System defaults
Based on.................
TIMEDATE format.......... MTS . D4
Full DATE format......... D4WAMADY[", ", " ", ", "] . MT
Date 'D' format.......... D4 DMBY
Date 'DI' format......... D2-YMD
Time 'MT' format......... TI
Time 'TI' format......... MTS:
Days of the week................... Abbreviated.........
Sunday                              Sun
Monday                              Mon
Tuesday                             Tue
Wednesday                           Wed
Thursday                            Thu
Friday                              Fri
Saturday                            Sat
Month names........................ Abbreviated........
January                             Jan
February                            Feb
March                               Mar
April                               Apr
May                                 May
June                                Jun
July                                Jul
August                              Aug
September                           Sep
October                             Oct
November                            Nov
December                            Dec
Chinese years............ MONKEY . COCK . DOG . BOAR . RAT . OX . TIGER . RABBIT . DRAGON . SNAKE . HORSE . SHEEP
AM string................ am
PM string................ pm
BC string................ BC
Era name................................ Start date....
Heisei                                   08 JAN 1989
Showa                                    25 DEC 1926
Taisho                                   30 JUL 1912
Meiji                                    08 SEP 1868
HEADING/FOOTING D format. D2-
HEADING/FOOTING T format. MTS . D2-
Gregorian calendar day 1. 11 JAN 1583
Number of days skipped... 10
Default DMY order........
Default date separator...
Default time separator...

Time/Date Conventions for US-ENGLISH
Category name............ US-ENGLISH
Description.............. Territory=USA,Language=English
Based on................. ENGLISH.NAMES
TIMEDATE format..........
Full DATE format.........
Date 'D' format..........
Date 'DI' format......... D2/MDY
Time 'MT' format.........
Time 'TI' format......... MTHS:
Days of the week................... Abbreviated.........
Month names........................ Abbreviated.........
Chinese years............
AM string................
PM string................
BC string................
Era name................................ Start date....
HEADING/FOOTING D format.
HEADING/FOOTING T format.
Gregorian calendar day 1.
Number of days skipped...
Default DMY order........ MDY
Default date separator...
Default time separator...
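Returning to the %n codes described under Defining Era Names, the following Python sketch shows one reading of how an era year number could be derived from a calendar year. It is an interpretation of the rules as written, not the server's implementation. The function name era_year and its arguments are hypothetical, and start_year stands for the year in which the era entry begins.

```python
def era_year(code, start_year, year):
    """Return the era year number for a calendar year under a %n code."""
    elapsed = year - start_year
    if code == "%Y":
        # Uses the calendar year number itself.
        return year
    if code in ("%+", "%-"):
        # Year numbers count backward: start_year is 1, the year before it is 2.
        number = 1 - elapsed
        return number if code == "%+" else -number
    if len(code) == 2 and code[0] == "%" and code[1] in "123456789":
        # The digit is the number given to the first year of the era.
        return elapsed + int(code[1])
    raise ValueError("unsupported era format code: " + code)

# An era starting in 1989 with %1: 1989 is year 1, 1990 is year 2.
print(era_year("%1", 1989, 1989), era_year("%1", 1989, 1990))
# A backward-counting first era: two years before the start year is year 3.
print(era_year("%+", 100, 98), era_year("%-", 100, 98))
```

The %2 case corresponds to the second of the two values mentioned above, where the first year or part-year of the era has its own entry terminated by %1string.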

Numeric Records
The following table shows each field number, its display name, and a description:

Field  Name                          Description
0      Category Name                 The name of the convention.
1      Description                   A description of the convention. It usually includes the territory that the convention applies to and the language it is used with.
2      Based on                      The name of another convention record that this convention is based on.
3      Decimal separator             The character used as a decimal separator (radix character). The value can be expressed as either a single character or the hexadecimal Unicode value of a character.
4      Thousands separator           The character used as a thousands separator. The value can be expressed as either a single character or the hexadecimal Unicode value of a character. Use the value NONE to indicate that no separator is needed.
5      Suppress leading zero         Defines whether leading zeros should be suppressed for numbers in the range -1 through 1. A value of 0 or N means insert a zero; any other value suppresses the zero.
6      Alternative digits (0 first)  A multivalued field containing 10 values that can be used as alternatives to the corresponding ASCII digits 0 through 9.

This example shows the contents of the DEFAULT and DEC.COMMA+DOT records in the NLS.LC.NUMERIC file. DEC.COMMA+DOT is used by the DE-GERMAN locale, and its conventions are based on DEFAULT.

Numeric Conventions for DEFAULT
Category name......... DEFAULT
Description........... System defaults: Decimal separator = dot, thousands = comma
Based on..............
Decimal separator..... . - FULL STOP
Thousands separator... , - COMMA
Suppress leading zero. 0
Alternative digits (0 first).

Numeric Conventions for DEC.COMMA+DOT
Category name......... DEC.COMMA+DOT
Description........... Decimal separator = comma, thousands = dot
Based on.............. DEFAULT
Decimal separator..... , - COMMA
Thousands separator... . - FULL STOP
Suppress leading zero.
Alternative digits (0 first).
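The way an empty field falls back to the record named in Based on can be sketched in Python as follows. The dictionary keys, the function names resolve and format_number, and the two-decimal default are hypothetical. The record values are taken from the DEFAULT and DEC.COMMA+DOT listings above. The sketch follows only the Based on chain and does not model any other fallback.

```python
NUMERIC = {
    "DEFAULT": {"based_on": "", "decimal": ".", "thousands": ",", "suppress_zero": "0"},
    "DEC.COMMA+DOT": {"based_on": "DEFAULT", "decimal": ",", "thousands": ".", "suppress_zero": ""},
}

def resolve(records, name, field):
    """Return the first non-empty value of field along the Based on chain."""
    seen = set()
    while name and name not in seen:  # the seen set guards against circular chains
        seen.add(name)
        record = records[name]
        if record.get(field, "") != "":
            return record[field]
        name = record.get("based_on", "")
    return ""

def format_number(value, records, name, places=2):
    """Format value with the separators of the named numeric convention."""
    thousands = resolve(records, name, "thousands")
    if thousands == "NONE":  # NONE means no thousands separator is needed
        thousands = ""
    decimal = resolve(records, name, "decimal")
    text = f"{value:,.{places}f}"  # for example 1,234,567.89
    # translate swaps both separators in one pass, so they cannot clobber each other.
    return text.translate({ord(","): thousands, ord("."): decimal})

print(format_number(1234567.891, NUMERIC, "DEFAULT"))
print(format_number(1234567.891, NUMERIC, "DEC.COMMA+DOT"))
print(resolve(NUMERIC, "DEC.COMMA+DOT", "suppress_zero"))
```

In this sketch the empty Suppress leading zero field of DEC.COMMA+DOT resolves to the value 0 held by DEFAULT.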

Monetary Records
Convention records in the Monetary category are stored in the NLS.LC.MONETARY file. The following table shows each field number, its display name, and a description:

Field  Name                           Description
0      Category Name                  The name of the convention.
1      Description                    A description of the convention. It usually includes the territory that the convention applies to and the language it is used with.
2      Based on                       The name of another convention record that this category is based on.
3      Monetary decimal separator     The character used as a decimal separator (radix character). You do not need to specify a value if this character is the same as the one in the decimal separator field in the corresponding numeric convention.
4      Monetary thousands separator   The character used as a thousands separator. You do not need to specify a value if this character is the same as the one in the thousands separator field in the corresponding numeric convention.
5      Local currency symbol          A character or string used as the local currency symbol, for example, $ or £. Leading or trailing spaces are not included.
6      International currency symbol  The international currency symbol. The value should consist of three uppercase ASCII characters as specified in the ISO 4217 standard. For example, USD. Trailing spaces are included. This symbol always precedes the amount it refers to.
7      Decimal places                 The number of decimal places in monetary amounts when the local currency symbol is used.
8      International decimal places   The number of decimal places in monetary amounts when used with the international currency symbol (field 6).
9      Positive sign                  The sign used to indicate positive monetary amounts. If the value consists of two characters, these are used to parenthesize positive monetary amounts (one used at either end of the monetary format). Use the value NONE to omit a positive sign.
10     Negative sign                  The sign used to indicate negative monetary amounts. If the value consists of two characters, these are used to parenthesize negative monetary amounts (one used at either end of the monetary format). Use the value NONE to omit a negative sign.
11     Positive currency format       The format for positive monetary amounts. This is expressed using a combination of the characters $ S + 1 and a space. The $ or S represents the local currency symbol. 1 represents the monetary amount. + represents the positive sign. If the positive sign (field 9) contains two characters, the + sign is ignored. For example, the value $1 in a US locale results in the format $1,234.56. The value 1 $ in a GERMAN locale results in the format 1.234,56 DM.
12     Negative currency format       The format for negative monetary amounts. This is expressed using a combination of the characters $ S - 1 and a space. The $ or S represents the local currency symbol. 1 represents the monetary amount. - represents the negative sign. If the negative sign (field 10) contains two characters, the - sign is ignored. For example, the value $1 in a PORTUGUESE locale results in the format 1,234$56. The value $ 1 in a DUTCH locale results in the format Fl 1.234,56.

This example shows the contents of the record named DEFAULT followed by records for NETHERLANDS, ITALY, NORWAY and PORTUGAL, which show different combinations of fields:
Numeric Conventions for DEFAULT Category name................. Description................... Based on...................... Monetary decimal separator.... Monetary thousands separator.. Local currency symbol......... International currency symbol. Decimal places................ International decimal places.. Positive sign................. Negative sign................. Positive currency format...... Negative currency format...... DEFAULT System defaults . , $ USD<SP> 2 2 NONE S1 S-1 FULL STOP COMMA DOLLAR SIGN

HYPHEN-MINUS

Monetary Conventions for NETHERLANDS Category name................. Description................... Based on...................... Monetary decimal separator.... Monetary thousands separator.. Local currency symbol......... International currency symbol. NETHERLANDS Territory=Netherlands , . Fl NLG<SP> COMMA FULL STOP

NLS and Server Jobs - Supplementary Information

A-19

Decimal places................ International decimal places.. Positive sign................. Negative sign................. Positive currency format...... Negative currency format......

2 2 NONE S 1 S 1-

HYPHEN-MINUS

Monetary Conventions for ITALY Category name................. Description................... Based on...................... Monetary decimal separator.... Monetary thousands separator.. Local currency symbol......... International currency symbol. Decimal places................ International decimal places.. Positive sign................. Negative sign................. Positive currency format...... Negative currency format...... ITALY Territory=Italy , . L. ITL. 0 2 NONE S1 -S1 COMMA FULL STOP

HYPHEN-MINUS

Monetary Conventions for NORWAY
Category name................. NORWAY
Description................... Territory=Norway
Based on......................
Monetary decimal separator.... ,        COMMA
Monetary thousands separator.. .        FULL STOP
Local currency symbol......... kr
International currency symbol. NOK<SP>
Decimal places................ 2
International decimal places.. 2
Positive sign................. NONE
Negative sign................. -        HYPHEN-MINUS
Positive currency format...... S1
Negative currency format...... S1-

Monetary Conventions for PORTUGAL
Category name................. PORTUGAL
Description................... Territory=Portugal
Based on......................
Monetary decimal separator.... $        DOLLAR SIGN
Monetary thousands separator.. .        FULL STOP
Local currency symbol......... NONE
International currency symbol. PTE<SP>
Decimal places................ 2
International decimal places.. 2
Positive sign................. NONE
Negative sign................. -        HYPHEN-MINUS
Positive currency format...... 1 S
Negative currency format...... -1 S

The following table shows how the data in the previous records affect monetary formats:

Locale Name        Positive Format   Negative Format   International Format
DEFAULT            $1,234.56         $-1,234.56        USD 1,234.56
NETHERLANDS        Fl 1.234,56       Fl 1.234,56-      NLG 1.234,56
ITALY (see Note)   L.1.234           -L.1.234          ITL.1.234
NORWAY             kr1.234,56        kr1.234,56-       NOK 1.234,56
PORTUGAL           1.234$56          -1.234$56         PTE 1,234$56

Note: Italian lire are usually quoted in whole numbers only. Your programs must detect that the DEC_PLACES and INTL_DEC_PLACES fields contain zero in this case and not hard code an MD2 conversion. An MM conversion handles the scaling automatically.
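The way the record fields combine into a formatted amount can be sketched in Python. This is an illustration only, not DataStage code: the class and function names are hypothetical, and it assumes that in a currency format string S stands for the currency symbol, 1 for the number, and any other character is copied literally. The amount is passed as a scaled integer, so the number of decimal places comes from the record rather than being hard-coded.

```python
from dataclasses import dataclass


@dataclass
class MonetaryConvention:
    """Hypothetical holder for a few fields of a Monetary record."""
    dec_sep: str       # monetary decimal separator
    thou_sep: str      # monetary thousands separator
    symbol: str        # local currency symbol ("" when the record says NONE)
    dec_places: int    # decimal places
    pos_format: str    # positive currency format, for example "S1"
    neg_format: str    # negative currency format, for example "S-1"


def group_thousands(digits: str, sep: str) -> str:
    """Insert sep between groups of three digits, counting from the right."""
    groups = []
    while len(digits) > 3:
        groups.insert(0, digits[-3:])
        digits = digits[:-3]
    groups.insert(0, digits)
    return sep.join(groups)


def format_money(scaled: int, conv: MonetaryConvention) -> str:
    """Format a scaled integer amount according to the convention."""
    digits = str(abs(scaled))
    if conv.dec_places > 0:
        # Pad so that there is at least one digit before the separator.
        digits = digits.rjust(conv.dec_places + 1, "0")
        whole = digits[:-conv.dec_places]
        frac = digits[-conv.dec_places:]
        number = group_thousands(whole, conv.thou_sep) + conv.dec_sep + frac
    else:
        number = group_thousands(digits, conv.thou_sep)
    template = conv.neg_format if scaled < 0 else conv.pos_format
    # Assumed template syntax: S = symbol, 1 = number, anything else literal.
    parts = []
    for ch in template:
        if ch == "S":
            parts.append(conv.symbol)
        elif ch == "1":
            parts.append(number)
        else:
            parts.append(ch)
    return "".join(parts).strip()


DEFAULT = MonetaryConvention(".", ",", "$", 2, "S1", "S-1")
NETHERLANDS = MonetaryConvention(",", ".", "Fl", 2, "S 1", "S 1-")
ITALY = MonetaryConvention(",", ".", "L.", 0, "S1", "-S1")

print(format_money(123456, DEFAULT))
print(format_money(123456, NETHERLANDS))
print(format_money(1234, ITALY))
```

Because the scaling is read from dec_places, the Italian amount is formatted as a whole number without any special case in the calling code.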

Ctype Records
The following table shows each field number, its display name, and a description for fields in the Ctype record. Many of the defaults are based directly on Unicode settings. These can be viewed by choosing the appropriate item from the Unicode menu in the NLS Administration tool.

Note: For fields 3 onward, you can enter the values as characters or as Unicode values. You can specify a range of values separated by a dash (-).

Field  Name             Description
0      Category Name    The name of the convention.
1      Description      A description of the convention. It usually includes the territory that the convention applies to and the language it is used with.
2      Based on         The name of another convention record that this convention is based on.
3      Lowercase        A multivalued list of lowercase values whose associated uppercase values differ from the Unicode defaults.
4      ->Upper          A multivalued list of the equivalent uppercase values for the characters listed in field 3.
5      Uppercase        A multivalued list of uppercase values whose associated lowercase values differ from the Unicode defaults.
6      ->Lower          A multivalued list of the equivalent lowercase values for the characters listed in field 5.
7      Alphabetics      A multivalued list of characters that are alphabetic but are not described as such under the Unicode defaults. You can specify this value as a Unicode block value using the format BLOCK=nn, where nn is the Unicode block number.
8      Non-Alphabetics  A multivalued list of characters that are not alphabetic but are described as such under the Unicode defaults. You can specify this value as a Unicode block value using the format BLOCK=nn, where nn is the Unicode block number.
9      Numerics         A multivalued list of characters that should be considered as numeric but are not described as such under the Unicode defaults.
10     Non-Numerics     A multivalued list of characters that are not considered to be numeric but are described as such under the Unicode defaults.
11     Printables       A multivalued list of characters that are considered to be printable but are not described as such under the Unicode defaults.
12     Non-Printables   A multivalued list of characters that are not considered to be printable but are described as such under the Unicode defaults.
13     Trimmables       A multivalued list of characters that are to be removed by TRIM functions in addition to spaces and tab characters.

In Spanish, accented characters other than ñ drop their accents when converted to uppercase. In French, all accented characters drop their accents in uppercase. This example shows a convention called NOACCENT.UPCASE (based on DEFAULT), which the locale FR-FRENCH uses, and a convention called SPANISH, which is based on it.

Note: In this example, the only characters affected are those in general use in French and Spanish. There are many other accented characters in Unicode. The example uses mnemonics such as <N?>, which come from the MNEMONICS map, so that you can enter non-ASCII characters easily instead of typing their Unicode values.
Character Type Conventions for ACCENTLESS.UPPERCASE
Category name. NOACCENT.UPCASE
Description... ISO8859-1 lowercase accented chars lose accents in uppercase
Based on...... DEFAULT
Lowercase..............................         -> Uppercase...........................
00E0 - LATIN SMALL LETTER A WITH GRAVE          0041 - LATIN CAPITAL LETTER A
00E1 - LATIN SMALL LETTER A WITH ACUTE          0041 - LATIN CAPITAL LETTER A
00E2 - LATIN SMALL LETTER A WITH CIRCUMFLEX     0041 - LATIN CAPITAL LETTER A
00E3 - LATIN SMALL LETTER A WITH TILDE          0041 - LATIN CAPITAL LETTER A
00E4 - LATIN SMALL LETTER A WITH DIAERESIS      0041 - LATIN CAPITAL LETTER A
00E5 - LATIN SMALL LETTER A WITH RING ABOVE     0041 - LATIN CAPITAL LETTER A
00E7 - LATIN SMALL LETTER C WITH CEDILLA        0043 - LATIN CAPITAL LETTER C
00E8 - LATIN SMALL LETTER E WITH GRAVE          0045 - LATIN CAPITAL LETTER E
00E9 - LATIN SMALL LETTER E WITH ACUTE          0045 - LATIN CAPITAL LETTER E
00EA - LATIN SMALL LETTER E WITH CIRCUMFLEX     0045 - LATIN CAPITAL LETTER E
00EB - LATIN SMALL LETTER E WITH DIAERESIS      0045 - LATIN CAPITAL LETTER E
00EC - LATIN SMALL LETTER I WITH GRAVE          0049 - LATIN CAPITAL LETTER I
00ED - LATIN SMALL LETTER I WITH ACUTE          0049 - LATIN CAPITAL LETTER I
00EE - LATIN SMALL LETTER I WITH CIRCUMFLEX     0049 - LATIN CAPITAL LETTER I
00EF - LATIN SMALL LETTER I WITH DIAERESIS      0049 - LATIN CAPITAL LETTER I
00F1 - LATIN SMALL LETTER N WITH TILDE          004E - LATIN CAPITAL LETTER N
00F2 - LATIN SMALL LETTER O WITH GRAVE          004F - LATIN CAPITAL LETTER O
00F3 - LATIN SMALL LETTER O WITH ACUTE          004F - LATIN CAPITAL LETTER O
00F4 - LATIN SMALL LETTER O WITH CIRCUMFLEX     004F - LATIN CAPITAL LETTER O
00F5 - LATIN SMALL LETTER O WITH TILDE          004F - LATIN CAPITAL LETTER O
00F6 - LATIN SMALL LETTER O WITH DIAERESIS      004F - LATIN CAPITAL LETTER O
00F8 - LATIN SMALL LETTER O WITH STROKE         004F - LATIN CAPITAL LETTER O
00F9 - LATIN SMALL LETTER U WITH GRAVE          0055 - LATIN CAPITAL LETTER U
00FA - LATIN SMALL LETTER U WITH ACUTE          0055 - LATIN CAPITAL LETTER U
00FB - LATIN SMALL LETTER U WITH CIRCUMFLEX     0055 - LATIN CAPITAL LETTER U
00FC - LATIN SMALL LETTER U WITH DIAERESIS      0055 - LATIN CAPITAL LETTER U
00FD - LATIN SMALL LETTER Y WITH ACUTE          0059 - LATIN CAPITAL LETTER Y
00FF - LATIN SMALL LETTER Y WITH DIAERESIS      0059 - LATIN CAPITAL LETTER Y
Uppercase..............................         -> Lowercase................
Alphabetics.....
Non-Alphabetics.
Numerics........
Non-Numerics....
Printables......
Non-Printables..
Trimmables......

Character Type Conventions for SPANISH
Category name. SPANISH
Description... Language=Spanish - SMALL N WITH TILDE keeps tilde on uppercasing
Based on...... NOACCENT.UPCASE
Lowercase..............................         -> Uppercase...........................
<n?> - LATIN SMALL LETTER N WITH TILDE          <N?> - LATIN CAPITAL LETTER N WITH TILDE
Uppercase..............................         -> Lowercase...........................
Alphabetics.....
Non-Alphabetics.
Numerics........
Non-Numerics....
Printables......
Non-Printables..
Trimmables......
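The layering of Ctype conventions can be illustrated with a short Python sketch. It is not how DataStage implements the lookup; the class name and structure are hypothetical. It assumes that a convention consults its own lowercase-to-uppercase overrides first, then those of the convention it is based on, and finally falls back to the Unicode default.

```python
class CtypeConvention:
    """Hypothetical model of fields 2 to 4 of a Ctype record."""

    def __init__(self, name, upper_overrides, based_on=None):
        self.name = name
        self.upper_overrides = upper_overrides  # lowercase char -> uppercase char
        self.based_on = based_on                # parent convention, or None

    def to_upper(self, ch):
        """Uppercase one character, searching this record and its parents."""
        conv = self
        while conv is not None:
            if ch in conv.upper_overrides:
                return conv.upper_overrides[ch]
            conv = conv.based_on
        return ch.upper()  # Unicode default

    def upcase(self, text):
        return "".join(self.to_upper(ch) for ch in text)


# Overrides taken from the NOACCENT.UPCASE record: each accented lowercase
# letter maps to the unaccented capital letter.
GROUPS = {
    "A": "\u00e0\u00e1\u00e2\u00e3\u00e4\u00e5",
    "C": "\u00e7",
    "E": "\u00e8\u00e9\u00ea\u00eb",
    "I": "\u00ec\u00ed\u00ee\u00ef",
    "N": "\u00f1",
    "O": "\u00f2\u00f3\u00f4\u00f5\u00f6\u00f8",
    "U": "\u00f9\u00fa\u00fb\u00fc",
    "Y": "\u00fd\u00ff",
}
noaccent_map = {low: cap for cap, lows in GROUPS.items() for low in lows}

DEFAULT = CtypeConvention("DEFAULT", {})
NOACCENT = CtypeConvention("NOACCENT.UPCASE", noaccent_map, based_on=DEFAULT)
# SPANISH overrides a single entry: small n with tilde keeps its tilde.
SPANISH = CtypeConvention("SPANISH", {"\u00f1": "\u00d1"}, based_on=NOACCENT)

print(NOACCENT.upcase("\u00e9l\u00e8ve"))   # French word with two accents
print(SPANISH.upcase("se\u00f1or"))         # tilde is kept
print(SPANISH.upcase("cami\u00f3n"))        # acute accent is dropped
```

Only the one differing entry is stored in the SPANISH record; everything else is inherited from NOACCENT.UPCASE, which mirrors the Based on field.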

Collate Records
The following table shows each field number, its display name, and a description for Collate category records. Many of the fields are Boolean. An empty field or a value of 0 or N indicates false; any other value indicates true.

Field  Name              Description
0      Category Name     The name of the convention.
1      Description       A description of the convention. It usually includes the territory that the convention applies to and the language it is used with.
2      Based on          The name of another convention record that this convention is based on.
3      Accented Sort?    This field determines how accents on characters affect the collate order. A false value indicates that accents are not collated separately. A true value indicates that accents are used as tie breakers in the sort. See Collating on page A-28.
4      In reverse?       If field 3 indicates an accented collation, this field determines the direction of that collation. A false value indicates forward collation. A true value indicates reverse collation.
5      Cased Sort?       This field determines whether the case of a character is considered during collation. A false value indicates that case is not considered. A true value indicates that case is used as a tie breaker in the collation.
6      Lowercase first?  If field 5 indicates a cased collation, this field determines which case is collated first. A false value indicates that lowercase is collated first. A true value indicates that uppercase is collated first.
7      Expand            A multivalued field containing Unicode values of characters that are expanded before collation. See Contractions and Expansions on page A-30.
8      Expanded          A multivalued field associated with field 7 that supplies the values the characters expand to. Each value may be one or more Unicode values separated by tab characters or spaces. To override an expansion inherited from a based convention named in field 2, enter the same multivalue in fields 7 and 8. (For another method, see the description of field 10.)
9      Before?           A multivalued field associated with fields 7 and 8 that determines how expanded characters collate. A false value indicates that a character is collated after expansion; a true value indicates that a character is collated before expansion.
10     Contract          A multivalued field containing a list of pairs of Unicode values of characters after contraction. The values should be separated by tab characters or spaces. To override an expansion inherited from a based convention named in field 2, enter a value in this field and a corresponding empty value in field 11. See Contractions and Expansions on page A-30.
11     Before            A multivalued field associated with field 10. It gives the Unicode value of the character that a contracted pair precedes in the collation order.
12     Weight Tables     A multivalued field supplying the weight information for characters in this locale. The values should be record IDs in the NLS.WT.TABLES file. The default is the name of the locale. The weight information is processed in the order supplied in this field.

This example shows the Collate records named DEFAULT, GERMAN, and SPANISH:
• DEFAULT uses no expansion or contraction, but does collate in a sequence other than the Unicode value.
• GERMAN uses the DEFAULT collating sequence, but introduces an expansion.
• SPANISH is also based on DEFAULT, but introduces eight contractions.
Collating Sequence Conventions for DEFAULT
Category name.... DEFAULT
Description...... System defaults
Based on.........
Accented Sort?... N
In reverse?...... N
Cased Sort?...... N
Lowercase first?. N
Expand -------------------->..... Before? Expanded.. ..........................
Contract... ----------------------->..... Before ..............................
Weight Tables.... LATIN1-DEFAULT
.                 LATINX-DEFAULT
.                 LATINX2-DEFAULT
.                 LATINX3-DEFAULT
.                 GREEK-DEFAULT
.                 CYRILLIC-DEFAULT

Collating Sequence Conventions for GERMAN
Category name.... GERMAN
Description...... Language=German
Based on......... DEFAULT
Accented Sort?... Y
In reverse?...... N
Cased Sort?...... Y
Lowercase first?. N
Expand -------------------->..... Before? Expanded.. ..........................
<ss> LATIN SMALL LETTER SHARP S   N       S S        LATIN CAPITAL LETTER S LATIN CAPITAL LETTER S
Contract... ----------------------->..... Before ..............................
Weight Tables....

Collating Sequence Conventions for SPANISH
Category name.... SPANISH
Description...... Language=Spanish
Based on......... DEFAULT
Accented Sort?... Y
In reverse?...... N
Cased Sort?...... Y
Lowercase first?. N
Expand -------------------->..... Before? Expanded.. ..........................
Contract... ----------------------->.....                  Before ..............................
C H LATIN CAPITAL LETTER C LATIN CAPITAL LETTER H          D LATIN CAPITAL LETTER D
C h LATIN CAPITAL LETTER C LATIN SMALL LETTER H            D LATIN CAPITAL LETTER D
c h LATIN SMALL LETTER C LATIN SMALL LETTER H              d LATIN SMALL LETTER D
c H LATIN SMALL LETTER C LATIN CAPITAL LETTER H            d LATIN SMALL LETTER D
L L LATIN CAPITAL LETTER L LATIN CAPITAL LETTER L          M LATIN CAPITAL LETTER M
L l LATIN CAPITAL LETTER L LATIN SMALL LETTER L            M LATIN CAPITAL LETTER M
l l LATIN SMALL LETTER L LATIN SMALL LETTER L              m LATIN SMALL LETTER M
l L LATIN SMALL LETTER L LATIN CAPITAL LETTER L            m LATIN SMALL LETTER M
Weight Tables.... LATIN-SPANISH

Collating
Collating is a complex issue for many languages. It is not sufficient to collate a character set in numerical order of its Unicode values. Locales that share a character set often have different collating rules. For example, these are the main issues that affect collating in Western European languages:
• Accented characters. Should accented characters come before or after their unaccented equivalents? Or should accents only be examined if two strings being compared would otherwise be identical (that is, as a tie breaker)?
• Expanding characters. Some languages treat certain single characters as two separate characters for collating purposes.
• Contracting characters. Some languages have pairs of characters that collate as though they were a single character.
• Should case be considered? Should case be used as a tie breaker for otherwise identical strings? If so, which comes first, uppercase or lowercase?
• Should hyphens or other punctuation be considered as tie breakers?

How DataStage Collates


To overcome these collating problems, DataStage allows each Unicode character to be assigned up to three weights. A weight is a numeric value that is used instead of the character during collation. The three weights are as follows:
• Shared weight. All characters that are essentially the same have the same shared weight, even though they may differ in accent or case.
• Accent weight. This weight shows the order of precedence for accented characters. The Collate convention determines the direction of the collation.
• Case weight. This weight differentiates between uppercase and lowercase characters. The Collate convention determines which case has precedence.

Before collation begins, DataStage expands or contracts any characters as defined in the Collate convention. The collation works as follows:
1. The characters are compared by shared weight.
2. If two characters have the same shared weight, they are compared by accent weight.
3. If the accent weight is the same, they are compared by case weight.
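The three comparison steps can be sketched as a Python sort key. This is an approximation for illustration, not the DataStage algorithm: the weights are derived from Unicode decomposition instead of being read from weight tables, accents and case act as tie breakers across the whole string, and lowercase is arbitrarily placed first. The function names are hypothetical.

```python
import unicodedata


def weights(ch):
    """Return stand-in (shared, accent, case) weights for one character."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0]
    marks = decomposed[1:]
    shared = ord(base.lower())             # same for a, A, and accented forms
    accent = ord(marks[0]) if marks else 0  # 0 means unaccented
    case = 1 if base.isupper() else 0       # lowercase first (an assumption)
    return shared, accent, case


def collation_key(text):
    """Compare by shared weights, then accent weights, then case weights."""
    triples = [weights(ch) for ch in text]
    return (
        tuple(t[0] for t in triples),
        tuple(t[1] for t in triples),
        tuple(t[2] for t in triples),
    )


words = ["r\u00e9sum\u00e9", "Resume", "rest", "resume"]
ordered = sorted(words, key=collation_key)
print(ordered)
```

Because Python compares tuples element by element, the accent and case tuples are consulted only when every earlier level is equal, which is the tie-breaking behavior described in the steps above.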

Example of Accented Collation


This table compares how four French words that differ only in their accents are collated in two different ways, depending on how the weight tables have been configured:

Order   Accented Collation   Unaccented Collation
1       cote                 cote
2       côte                 coté
3       coté                 côte
4       côté                 côté

In the accented collation, the words are in the order they would be found in a French dictionary. (It is actually a reverse accented collation.) Each accented character has the same shared weight as it would have without the accent. The order is decided by referring to the accent weight. In the unaccented collation, each accented character has a different shared weight unrelated to its unaccented equivalent. The order is decided by the shared weight alone.
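The two orderings of these four words can be reproduced with a small Python sketch. It makes simplifying assumptions that are not DataStage behavior: the shared weight is taken to be the code point of the base letter, the accent weight the code point of the combining mark, and the unaccented collation is modeled as a plain code point sort. The function names are hypothetical.

```python
import unicodedata


def split_weights(word):
    """Return (shared weights, accent weights) for a word."""
    shared, accents = [], []
    for ch in word:
        decomposed = unicodedata.normalize("NFD", ch)
        shared.append(ord(decomposed[0]))
        accents.append(ord(decomposed[1]) if len(decomposed) > 1 else 0)
    return tuple(shared), tuple(accents)


def reverse_accent_key(word):
    """Shared weights first, then accent weights read from the end of the word."""
    shared, accents = split_weights(word)
    return shared, accents[::-1]


def code_point_key(word):
    """Every character, accented or not, has its own unrelated weight."""
    return tuple(ord(ch) for ch in word)


WORDS = ["c\u00f4t\u00e9", "cot\u00e9", "c\u00f4te", "cote"]
accented_order = sorted(WORDS, key=reverse_accent_key)
unaccented_order = sorted(WORDS, key=code_point_key)
print(accented_order)
print(unaccented_order)
```

Reversing the accent tuple means that the last accent difference in the word decides the order, which is why the word with the accent on the final letter sorts after the one with the accent on the second letter.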


Example of Cased Collation


The three words Aaron, Aardvark, and aardvark show how case affects collation:

Order   Cased Collation   Uncased Collation
1       Aardvark          Aardvark
2       aardvark          Aaron
3       Aaron             aardvark

In the cased collation, Aaron follows aardvark because the characters A and a have the same shared weight. The case weight is only considered for the two strings that are otherwise identical, that is, Aardvark and aardvark. In the uncased collation, Aaron precedes aardvark because the characters A and a have different shared weights.

Shared Weights and Blocks


Unicode is divided into blocks of related characters. For example, Cyrillic characters form one block, while Hebrew characters form another. In most circumstances, it is unlikely that you need to collate characters from more than one block at a time. Shared weights are assigned so that characters collate correctly within each Unicode block.

Contractions and Expansions


Some languages have pairs of characters that collate as though they were a single character. Other languages treat certain single characters as two separate characters for collating. These contractions and expansions are done before DataStage begins a collation.

For example, in Spanish, the character pairs CH and LL (in any combination of case) are treated as a single, separate character. CH comes between C and D in the collating sequence, and LL comes between L and M. DataStage identifies these character pairs before collation begins. In German, the character ß is expanded to SS before collation begins.
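A minimal Python sketch of the two preprocessing steps follows. It is illustrative only: the helper names are hypothetical, case is ignored, and the collating sequence is reduced to the 26 unaccented letters plus the two Spanish pairs.

```python
import string

# Expansion: one character becomes two before collation (German sharp s).
EXPANSIONS = {"\u00df": "SS"}

# Contraction: a pair is treated as one unit placed after C or L.
CONTRACTIONS = {"CH", "LL"}

ORDER = []
for letter in string.ascii_uppercase:
    ORDER.append(letter)
    if letter == "C":
        ORDER.append("CH")   # CH collates between C and D
    if letter == "L":
        ORDER.append("LL")   # LL collates between L and M
RANK = {unit: position for position, unit in enumerate(ORDER)}


def expand(text):
    """Replace each expanding character by its expansion."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in text)


def contract(text):
    """Split text into collating units, joining contracted pairs."""
    units = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2].upper()
        if pair in CONTRACTIONS:
            units.append(pair)
            i += 2
        else:
            units.append(text[i].upper())
            i += 1
    return units


def spanish_key(word):
    return [RANK[unit] for unit in contract(word)]


words = ["mano", "llama", "dedo", "chico", "luz", "cuna"]
ordered = sorted(words, key=spanish_key)
print(ordered)
print(expand("Stra\u00dfe"))
```

Both steps run before any weights are compared, so the comparison itself never needs to know that a unit originally consisted of one or two characters.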

Editing Weight Tables


Collating character sets in different languages is a complex issue. Each character has an assigned weight value used for numeric comparisons in sorting, but you can change these weight values to sort in a different way when you want to customize your locale. You can edit the weight table for a locale by choosing Categories -> Weight Tables -> Edit from the NLS Administration menu. Any change you make to the weight assigned to a character overrides the default weight derived from its Unicode value.

The weights are held in the NLS.WT.TABLES file, which is a type 19 file. Each record in the file can contain:
• Comment lines, introduced by a # or *
• A set of weight values for a Unicode code point

Each weight value line has the following fields, separated by at least one ASCII space or tab character:

character [block.weight / ] shared.weight accent.weight case.weight [comments]

character is a Unicode character value. This should be four hexadecimal digits, zero-filled as necessary.

The block.weight / shared.weight value is one or two decimal integers, separated by a slash ( / ) if necessary. block.weight can be 1 through 127; shared.weight 1 through 32767. If block.weight is omitted, it is taken as the value of the Unicode block number to which character belongs. shared.weight may be given as a hyphen, in which case it is taken as the value of the most recent weight value line without a hyphen for shared.weight. Characters that should sort together if accents and case are disregarded should have the same block.weight / shared.weight value.

accent.weight is a decimal integer 1 through 63. It may be given as a hyphen, in which case it is taken as the value of the most recent weight value line without a hyphen for accent.weight. Characters that are distinguished only by accent should have the same block.weight / shared.weight value and differ in their accent.weight value. A list of conventional values to assign to this field can be found by listing records starting with AW in the NLS.WT.LOOKUP file.

case.weight is a decimal integer 1 through 7, or the letter U or L to indicate uppercase and lowercase. case.weight can be given as a hyphen, in which case it is taken as the value of the most recent weight value line without a hyphen for case.weight. Characters that are distinguished only by case should have the same block.weight / shared.weight value and accent.weight value and differ only in their case.weight value. A list of conventional values to assign to this field can be found by listing records starting with CW in the NLS.WT.LOOKUP file.

comments can contain any characters.

Calculating the Overall Weight


The overall weight assigned to character is calculated using the following formula:

( block.weight x 2^24 ) + ( shared.weight x 2^9 ) + ( accent.weight x 2^3 ) + case.weight

If character is not mentioned in a table, the default weight is calculated as follows:

( BW x 2^24 ) + ( SW x 2^9 )

BW is the character's Unicode block number. SW depends on its position within the block: the first character has a SW of 1, the second a SW of 2, and so on.
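The calculation packs the four weights into one integer: the block weight is multiplied by 2 to the power 24, the shared weight by 2 to the power 9, and the accent weight by 2 to the power 3. The Python sketch below shows the arithmetic with bit shifts. The function names are hypothetical, and allowing 0 for an accent or case weight that is not set is an assumption made for the example.

```python
def overall_weight(block, shared, accent, case):
    """Combine the four weights into a single sortable integer."""
    if not 1 <= block <= 127:
        raise ValueError("block.weight must be 1 through 127")
    if not 1 <= shared <= 32767:
        raise ValueError("shared.weight must be 1 through 32767")
    if not 0 <= accent <= 63:       # 0 = not set (assumption)
        raise ValueError("accent.weight must be at most 63")
    if not 0 <= case <= 7:          # 0 = not set (assumption)
        raise ValueError("case.weight must be at most 7")
    return (block << 24) + (shared << 9) + (accent << 3) + case


def default_weight(block_number, position):
    """Weight of a character that no table mentions."""
    return (block_number << 24) + (position << 9)


print(overall_weight(4, 1092, 5, 1))
print(default_weight(4, 1))
```

The multipliers leave 3 bits for the case weight, 6 bits for the accent weight, and 15 bits for the shared weight, which matches the permitted ranges of 7, 63, and 32767. A difference in shared weight therefore always outranks any difference in accent or case.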

Example of a Weight Table


This example shows a weight table for collating Turkish characters:
* Sorting weight table for TURKISH characters (from ISO8859/9)
* in order on top of LATIN1/LATINX tables. These characters are:
*
*   Between G and H: G BREVE
*   Between H and J: I WITH DOT ABOVE (uppercase version of SMALL I 0069)
*                    DOTLESS I (lowercase version of CAPITAL I 0049)
*   (Note: the sequence is H, dotless I, I dot + accented versions, J, ...)
*   Between S and T: S CEDILLA
*
* SYNTAX:
* Each non-comment line gives one or more weights for a character, as
* follows (character value in hex, weights in decimal):
*   Field 1 = Unicode character value
*   Field 2 = Shared weight (characters that sort together if
*             accents and case were to be disregarded should
*             have the same SW)
*             Or, Block Weight/Shared Weight. This form allows
*             characters in different Unicode blocks to have
*             equal SWs. If BW is omitted, only SWs for characters in
*             the same block are equal.
*   Field 3 = Accent weight, or '-' to omit or copy from previous.
*             Please use values as defined in the file NLS.WT.LOOKUP.
*   Field 4 = Case weight, or 'U' for upper and 'L' for lower case chars.
*
**************************************************************
* HEX  (BW/)SW  AW  CW
* After G:
011E   4/1092   5   U   * G WITH BREVE
011F   -        5   L
* I, dotted and undotted:
* (Note we do not use AWs here, but use SWs to differentiate
* these characters from the unaccented versions.)
0049   4/1109   -   U   * I
0131   -        -   L   * DOTLESS I
0130   4/1110   -   U   * I WITH DOT ABOVE
0069   -        -   L   * I
* S cedilla
015E   4/1232   40  U   * S WITH CEDILLA
015F   -        40  L
*
* END
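Reading a weight table of this shape can be sketched in Python as follows. This is a hypothetical parser written for illustration, not a DataStage utility. It assumes every weight value line has four whitespace-separated fields, that a hyphen copies the value from the most recent line that supplied one, and that a hyphen in the shared weight field copies the block weight as well. U and L are kept as letters because their numeric values are defined in NLS.WT.LOOKUP and are not given here.

```python
def parse_weight_table(text):
    """Return {code point: (block, shared, accent, case)} for a weight table."""
    entries = {}
    block = shared = accent = case = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines introduced by # or *.
        if not line or line.startswith(("#", "*")):
            continue
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"expected at least 4 fields: {raw!r}")
        code, shared_f, accent_f, case_f = fields[:4]  # the rest is a comment
        if shared_f != "-":
            if "/" in shared_f:
                block_s, shared_s = shared_f.split("/")
                block, shared = int(block_s), int(shared_s)
            else:
                # Block omitted: left as None, since the Unicode block
                # number would have to be looked up elsewhere.
                block, shared = None, int(shared_f)
        if accent_f != "-":
            accent = int(accent_f)
        if case_f != "-":
            case = case_f if case_f in ("U", "L") else int(case_f)
        entries[int(code, 16)] = (block, shared, accent, case)
    return entries


SAMPLE = """\
* After G:
011E 4/1092 5 U * G WITH BREVE
011F - 5 L
* S cedilla
015E 4/1232 40 U * S WITH CEDILLA
015F - 40 L
"""

table = parse_weight_table(SAMPLE)
for code_point, entry in sorted(table.items()):
    print(f"{code_point:04X}", entry)
```

The two characters in each pair end up with the same block and shared weight and differ only in case, so they sort together unless case is used as a tie breaker.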


B
Maps and Locales Supplied with DataStage
This appendix provides lists of the character set maps and locales that are supplied with DataStage.

Server Job Character Set Maps


The following list shows all the maps for major character sets used worldwide that are supplied with DataStage for use with server jobs. The left column contains the name of the map, the middle column contains the name of the map table used by the map (in NLS.MAP.TABLES), and the right column contains a description of the map.

Character Set     Table Name        Description
ASCII             ASCII             Standard ASCII 7-bit set
ASCII+C1          ASCII             ASCII 7-bit + C1 control chars
ASCII+MARKS       UV-MARKS          Std ASCII 7-bit set for type 1 & 19 files w/ marks
BIG5              BIG5              TAIWAN: "Big 5" standard
C0-CONTROLS       C0-CONTROLS       Standard ISO2022 C0 control set, chars 00-1F+7F
C1-CONTROLS       C1-CONTROLS       Standard 8-bit ISO control set, 80-9F
EBCDIC            EBCDIC            IBM EBCDIC as implemented by standard uniVerse - control chars only
EBCDIC-037        EBCDIC-037        IBM EBCDIC variant 037
EBCDIC-1026       EBCDIC-1026       IBM EBCDIC variant 1026 (Turkish)
EBCDIC-500V1      EBCDIC-500V1      IBM EBCDIC variant 500V1
EBCDIC-875        EBCDIC-875        IBM EBCDIC variant 875 (Greek)
EBCDIC-CTRLS      EBCDIC-CTRLS      IBM EBCDIC as implemented by standard uniVerse - control chars only
GB2312            GB2312-80         CHINESE: EUC as described by GB 2312
ISO8859-1         ISO8859-1         Standard ISO8859 part 1: Latin-1
ISO8859-1+MARKS   ISO8859-1+MARKS   Standard ISO8859 part 1: Latin-1 for type 1 & 19 files with marks
ISO8859-10        ISO8859-10        Standard ISO8859 part 10: Latin-6
ISO8859-2         ISO8859-2         Standard ISO8859 part 2: Latin-2
ISO8859-3         ISO8859-3         Standard ISO8859 part 3: Latin-3
ISO8859-4         ISO8859-4         Standard ISO8859 part 4: Latin-4
ISO8859-5         ISO8859-5         Standard ISO8859 part 5: Latin-Cyrillic
ISO8859-6         ISO8859-6         Standard ISO8859 part 6: Latin-Arabic
ISO8859-7         ISO8859-7         Standard ISO8859 part 7: Latin-Greek
ISO8859-8         ISO8859-8         Standard ISO8859 part 8: Latin-Hebrew
ISO8859-9         ISO8859-9         Standard ISO8859 part 9: Latin-5
JIS-EUC           JISX0208          JAPANESE: EUC excluding JIS X 0212 Kanji
JIS-EUC+          JISX0212          JAPANESE: EUC including JIS X 0212 Kanji
JIS-EUC-HWK       JISX0201-K        JAPANESE: 1/2 width katakana for JIS-EUC
JIS-EUC2          JISX0208          JAPANESE: EUC fixed width excluding JIS X 0212 kanji
JIS-EUC2-C0       C0-CONTROLS       JAPANESE: EUC2 fixed width C0 control chars
JIS-EUC2-C1       C1-CONTROLS       JAPANESE: EUC fixed width C1 control chars
JIS-EUC2-HWK      JISX0201-K        JAPANESE: EUC fixed width representation of 1/2 width katakana
JIS-EUC2-MARKS    JIS-EUC2-MARKS    JAPANESE: EUC2 fixed width mark characters (external form)
JIS-EUC2-ROMAN    JISX0201-A        JAPANESE: Variant of 7-bit ASCII
JISX0201          JISX0201-K        JAPANESE: Single-byte set, 1/2 width katakana + ASCII
KOI8-R            KOI8-R            KOI8-R Russian/Cyrillic set
KSC5601           KSC5601           KOREAN: Wansung code as described by KS C 5601-1987
MAC-GREEK         MAC-GREEK         Apple Macintosh Greek Repertoire (like ISO8859-7)
MAC-GREEK2        MAC-GREEK2        Apple Macintosh Greek Repertoire based on APPLE II
MAC-ROMAN         MAC-ROMAN         Apple Macintosh Roman character set, based on ASCII
MNEMONICS                           ASCII mnemonics for many Unicodes, based on UTF8
MNEMONICS-1       ISO8859-1         As for MNEMONICS, but ISO8859-1 capable
MS1250            MS1250            MS Windows code page 1250 (Latin 2)
MS1251            MS1251            MS Windows code page 1251 (Cyrillic)
MS1252            MS1252            MS Windows code page 1252 (Latin 1)
MS1253            MS1253            MS Windows code page 1253 (Greek)
MS1254            MS1254            MS Windows code page 1254 (Turkish)
MS1255            MS1255            MS Windows code page 1255 (Hebrew)
MS1256            MS1256            MS Windows code page 1256 (Arabic)
PC1040            PC1040            PC DOS code page 1040 (Korean)
PC1041            PC1041            PC DOS code page 1041 (Japanese)
PC437             PC437             PC DOS code page 437 (US)
PC850             PC850             PC DOS code page 850 (Latin 1)
PC852             PC852             PC DOS code page 852 (Latin 2)
PC855             PC855             PC DOS code page 855 (Cyrillic)
PC857             PC857             PC DOS code page 857 (Turkish)
PC860             PC860             PC DOS code page 860 (Portuguese)
PC861             PC861             PC DOS code page 861 (Icelandic)
PC863             PC863             PC DOS code page 863 (Canada-Fr)
PC864             PC864             PC DOS code page 864 (Arabic)
PC865             PC865             PC DOS code page 865 (Nordic)
PC866             PC866             PC DOS code page 866 (Cyrillic)
PC869             PC869             PC DOS code page 869 (Greek)
PIECS             PIECS             PI and PI/open Extended Character Set
PRIME-SHIFT-JIS   PJISX0208         JAPANESE: Shift-JIS main map (Prime variant)
SHIFT-JIS         SJISX0208         JAPANESE: Shift-JIS main map
TAU-SHIFT-JIS     TJISX0208         JAPANESE: Shift-JIS main map (Tau variant)
TIS620            TIS620-A          THAI: standard TIS 620 ("Thai ASCII")
TIS620-B          TIS620-B          Non-spacing characters part of TIS620 (Thai)

Server Job Locales


The following list shows the locales supplied with DataStage for use with server jobs, the territory that uses each locale, and the relevant language:

Locale           Description
AR-SPANISH       Territory=Argentina, Language=Spanish
AT-GERMAN        Territory=Austria, Language=German
AU-ENGLISH       Territory=Australia, Language=English
BE-DUTCH         Territory=Belgium, Language=Dutch
BE-FRENCH        Territory=Belgium, Language=French
BE-GERMAN        Territory=Belgium, Language=German
BG-BULGARIAN     Territory=Bulgaria, Language=Bulgarian
BO-SPANISH       Territory=Bolivia, Language=Spanish
BR-PORTUGUESE    Territory=Brazil, Language=Portuguese
CA-ENGLISH       Territory=Canada, Language=English
CA-FRENCH        Territory=Canada, Language=French
CH-FRENCH        Territory=Switzerland, Language=French
CH-GERMAN        Territory=Switzerland, Language=German
CH-ITALIAN       Territory=Switzerland, Language=Italian
CL-SPANISH       Territory=Chile, Language=Spanish
CN-CHINESE       Territory=China (PRC), Language=Chinese
CO-SPANISH       Territory=Colombia, Language=Spanish
CR-SPANISH       Territory=Costa Rica, Language=Spanish
CZ-CZECH         Territory=Czech Republic, Language=Czech
DE-GERMAN        Territory=Germany, Language=German
DK-DANISH        Territory=Denmark, Language=Danish
DO-SPANISH       Territory=Dominican Republic, Language=Spanish
EC-SPANISH       Territory=Ecuador, Language=Spanish
EV-SPANISH       Territory=El Salvador, Language=Spanish
FI-FINNISH       Territory=Finland, Language=Finnish
FO-FAEROESE      Territory=Faeroe Islands, Language=Faeroese
FR-FRENCH        Territory=France, Language=French
GB-ENGLISH       Territory=UK, Language=English
GL-GREENLANDIC   Territory=Greenland, Language=Greenlandic
GR-GREEK         Territory=Greece, Language=Greek
GT-SPANISH       Territory=Guatemala, Language=Spanish
HN-SPANISH       Territory=Honduras, Language=Spanish
HR-CROATIAN      Territory=Croatia, Language=Croatian
HU-HUNGARIAN     Territory=Hungary, Language=Hungarian
IE-ENGLISH       Territory=Ireland, Language=English
IL-ENGLISH       Territory=Israel, Language=English
IL-HEBREW        Territory=Israel, Language=Hebrew
IS-ICELANDIC     Territory=Iceland, Language=Icelandic
IT-ITALIAN       Territory=Italy, Language=Italian
JP-JAPANESE      Territory=Japan, Language=Japanese
KP-KOREAN        Territory=Democratic People's Republic of Korea (NORTH), Language=Korean
KR-KOREAN        Territory=Republic of Korea (SOUTH), Language=Korean
LT-LITHUANIAN    Territory=Lithuania, Language=Lithuanian
LV-LATVIAN       Territory=Latvia, Language=Latvian
MX-SPANISH       Territory=Mexico, Language=Spanish
NL-DUTCH         Territory=Netherlands, Language=Dutch
NO-NORWEGIAN     Territory=Norway, Language=Norwegian
NZ-ENGLISH       Territory=New Zealand, Language=English
PA-SPANISH       Territory=Panama, Language=Spanish
PE-SPANISH       Territory=Peru, Language=Spanish
PL-POLISH        Territory=Poland, Language=Polish
PT-PORTUGUESE    Territory=Portugal, Language=Portuguese
RO-ROMANIAN      Territory=Romania, Language=Romanian
RU-RUSSIAN       Territory=Russia, Language=Russian
SE-SWEDISH       Territory=Sweden, Language=Swedish
SI-SLOVENIAN     Territory=Slovenia, Language=Slovenian
TR-TURKISH       Territory=Turkey, Language=Turkish
TW-CHINESE       Territory=Taiwan, Language=Chinese
US-ENGLISH       Territory=USA, Language=English
UY-SPANISH       Territory=Uruguay, Language=Spanish
VE-SPANISH       Territory=Venezuela, Language=Spanish
ZA-ENGLISH       Territory=South Africa, Language=English

Parallel Job Character Set Maps


The following table lists the character set maps available for parallel maps. The maps whose names start with ASCL_ are the equivalents of the server job maps see Server Job Character Set Maps onpage B-1. (Parallel job versions of most of

Maps and Locales Supplied with DataStage

B-7

the server job maps are supplied).

Character Set Big5 BOCU-1 CESU-8 EUC-KR Extended_UNIX_ Code_Packed_Format _for_Japanese ebcdic-xml-us GB_2312-80 GBK gb18030 HZ-GB-2312 hp-roman8 IBM00858 IBM01140 IBM01141 IBM01142 IBM01143 IBM01144 IBM01145 IBM01146 IBM01147 IBM01148 IBM01149 IBM037 IBM1026 IBM273

Description Chinese for Taiwan Multi-byte set Compressed UTF-8 (http://www.unicode.org/notes/tn6) 8-bit Compatibility Encoding Scheme for UTF-16 (http://www.unicode.org/unicode/reports/tr26) Korean for Internet messages Extended UNIX Code Packed Format for Japanese

EBCDIC for XML (US) Chinese (1980) Chinese (1995) Chinese (2000) Chinese (HZ) http://www.faqs.org/rfcs/rfc1345.html IBM codepage 850 (multilingual) with Euro symbol EBCDIC US with Euro symbol EBCDIC German with Euro symbol EBCDIC Danish/Norwegian with Euro symbol EBCDIC Finnish/Swedish with Euro symbol EBCDIC Italian with Euro symbol EBCDIC Spanish with Euro symbol EBCDIC GB with Euro symbol EBCDIC French with Euro symbol EBCDIC international with Euro symbol EBCDIC Icelandic with Euro symbol EPCDIC CP US EBCDIC Latin-5 Turkey EBCDIC Austria, Germany

B-8

Ascential DataStage NLS Guide

IBM277                  EBCDIC Denmark, Norway
IBM278                  EBCDIC Sweden, Finland
IBM280                  EBCDIC Italy
IBM284                  EBCDIC Spanish
IBM285                  EBCDIC GB
IBM290                  EBCDIC Japanese (kana)
IBM297                  EBCDIC France
IBM367                  ASCII
IBM420                  EBCDIC Arabic
IBM424                  EBCDIC Hebrew
IBM500                  EBCDIC International
IBM850                  MS-DOS Latin-1
IBM851                  MS-DOS Greek
IBM852                  MS-DOS Latin-2
IBM852                  MS-DOS Latin-1 with Euro symbol
IBM855                  EBCDIC Cyrillic
IBM857                  EBCDIC Turkey
IBM860                  MS-DOS Portuguese
IBM861                  MS-DOS Icelandic
IBM862                  PC Hebrew
IBM863                  MS-DOS Canadian French
IBM864                  PC Arabic
IBM865                  MS-DOS Nordic
IBM868                  MS-DOS Pakistan
IBM869                  EBCDIC Modern Greek
IBM870                  EBCDIC Multilingual Latin-2
IBM871                  EBCDIC Iceland
IBM918                  EBCDIC Pakistan (Urdu)
ISCII, Version 1        Indian Standard Code for Information Interchange, version 1

ISCII, Version 2        Indian Standard Code for Information Interchange, version 2
ISCII, Version 3        Indian Standard Code for Information Interchange, version 3
ISCII, Version 4        Indian Standard Code for Information Interchange, version 4
ISCII, Version 5        Indian Standard Code for Information Interchange, version 5
ISCII, Version 6        Indian Standard Code for Information Interchange, version 6
ISCII, Version 7        Indian Standard Code for Information Interchange, version 7
ISCII, Version 8        Indian Standard Code for Information Interchange, version 8
ISO-2022-CN             Chinese
ISO-2022-CN-EXT         Chinese extended
ISO-2022-JP             Japanese (JIS)
ISO-2022-JP-2           Japanese (JIS) extension
ISO-2022-KR             Korean
ISO-2022
ISO-2022, locale=ja,version=3
ISO-2022, locale=ja,version=4
ISO-2022, locale=ko,version=1
ISO-8859-1:1987         Latin alphabet No. 1
ISO-8859-2:1987         Latin alphabet No. 2
ISO-8859-3:1988         Latin alphabet No. 3
ISO-8859-4:1988         Latin alphabet No. 4
ISO-8859-5:1988         Latin/Cyrillic alphabet
ISO-8859-6:1987         Latin/Arabic alphabet

ISO-8859-7:1987         Latin/Greek alphabet
ISO-8859-8:1988         Latin/Hebrew alphabet
ISO-8859-9:1989         Latin alphabet No. 5
ibm-1006_P100-2000      ISO Urdu
ibm-1006_X100-2000      ISO Urdu
ibm-1025_P100-2000      EBCDIC Cyrillic
ibm-1047                EBCDIC Open Edition
ibm-1047-s390           EBCDIC Open Edition
ibm-1097_P100-2000      EBCDIC Farsi
ibm-1097_X100-2000      EBCDIC Farsi
ibm-1098_P100-2000      ISO Farsi
ibm-1098_X100-2000      ISO Farsi
ibm-1112_P100-2000      EBCDIC Baltic
ibm-1122_P100-2000      EBCDIC Estonia
ibm-1123                EBCDIC Ukraine
ibm-1124_P100-2000      PC Ukraine
ibm-1125_P100-2000      PC Cyrillic Ukraine
ibm-1129_P100-2000      ISO Vietnamese
ibm-1130_P100-2000      EBCDIC Vietnamese
ibm-1131_P100-2000      PC Cyrillic Belarus
ibm-1132_P100-2000      EBCDIC Lao
ibm-1133_P100-2000      ISO Lao
ibm-1137_P100-2000      EBCDIC Devanagari with LF/NL swapped
ibm-1140-s390           EBCDIC United States with LF/NL swapped
ibm-1142-s390           EBCDIC Denmark, Norway with LF/NL swapped
ibm-1143-s390           EBCDIC Finland, Sweden with LF/NL swapped
ibm-1144-s390           EBCDIC Italy with LF/NL swapped
ibm-1145-s390           EBCDIC Spain with LF/NL swapped
ibm-1146-s390           EBCDIC UK, Ireland with LF/NL swapped
ibm-1147-s390           EBCDIC France with LF/NL swapped

ibm-1148-s390           EBCDIC Multilingual with LF/NL swapped
ibm-1149-s390           EBCDIC Iceland with LF/NL swapped
ibm-1153                EBCDIC Latin 2
ibm-1153-s390           As ibm-1153 with LF/NL swapped
ibm-1154                EBCDIC Cyrillic Multilingual
ibm-1155                EBCDIC Turkey
ibm-1156                EBCDIC Baltic Multilingual
ibm-1157                EBCDIC Estonia
ibm-1158                EBCDIC Cyrillic Ukraine
ibm-1159
ibm-1160                EBCDIC Thailand
ibm-1164                EBCDIC Vietnam
ibm-1250                Windows Latin 2
ibm-1251                Windows Cyrillic
ibm-1252                Windows Latin 1
ibm-1253                Windows Greek
ibm-1254                Windows Latin 5 (Turkey)
ibm-1255                Windows Hebrew
ibm-1256                Windows Arabic
ibm-1257                Windows Latin 4 (Baltic)
ibm-1258                Windows Vietnamese
ibm-12712               EBCDIC Hebrew
ibm-12712-s390          EBCDIC Hebrew with LF/NL swapped
ibm-1277                Adobe Latin1 Encoding
ibm-1280                Macintosh Greek
ibm-1281                Macintosh Turkish
ibm-1282                Macintosh Central European
ibm-1283                Macintosh Cyrillic
ibm-1363_P110-2000      PC Korea KS extended
ibm-1363_P11B-2000      PC Korea KS extended

ibm-1364_P110-2000      EBCDIC Korea KS extended
ibm-1371                EBCDIC Taiwan (euro)
ibm-1381_P110-2000      PC China GB
ibm-1388_P103-2001      EBCDIC China GBK
ibm-1390                EBCDIC Japan Katakana (euro)
ibm-1399                EBCDIC Japan Latin (euro)
ibm-16684               DBCS Jis + Roman Jis Host
ibm-16804               EBCDIC Arabic
ibm-17248               PC Arabic
ibm-33722_P120-2000     EUC Japan
ibm-37-s390             EBCDIC United States
ibm-437                 PC United States
ibm-4899                Old EBCDIC Hebrew
ibm-4971                EBCDIC Greek
ibm-5104                8-bit Arabic
ibm-5123                Host Roman Jis
ibm-808                 PC Russian (euro)
ibm-813                 ISO Greek
ibm-848                 host SBCS (Katakana)
ibm-8482                host SBCS (Katakana)
ibm-849                 PC Belarus
ibm-856                 PC Hebrew (old)
ibm-859                 PC Latin 9
ibm-866                 PC Russia
ibm-867                 PC Israel
ibm-872                 PC Cyrillic
ibm-874                 PC Thai
ibm-875_P100-2000       EBCDIC Greek
ibm-901                 PC Baltic
ibm-902                 PC Estonian

ibm-9027                DBCS T-Ch Host with Euro
ibm-9030_P100-2000
ibm-918_X100-2000       EBCDIC Urdu
ibm-921                 PC Baltic
ibm-922                 PC Estonian
ibm-9238                PC Arabic Extended
ibm-930                 EBCDIC Japan DBCS
ibm-933                 EBCDIC Korea DBCS
ibm-935                 EBCDIC China DBCS
ibm-937                 EBCDIC Taiwan DBCS
ibm-939                 EBCDIC Japan Extended DBCS
ibm-942_P120-2000       PC Japan SJIS-78 syntax
ibm-942_P12A-2000       PC Japan SJIS-78 syntax
ibm-943_P130-2000       PC Japan SJIS-90
ibm-949_P110-2000       PC DBCS-only Taiwan
ibm-950                 PC Taiwan
ibm-964_P110-2000       EUC Taiwan
iso-8859-15             ISO Latin 1
JIS_Encoding
KOI8-R                  Russia Internet
KS-C-5601-1987          Korean
LMBCS-1                 Lotus multi-byte character set Latin 1
LMBCS-11                Lotus multi-byte character set Thai
LMBCS-16                Lotus multi-byte character set Japanese
LMBCS-17                Lotus multi-byte character set Korean
LMBCS-18                Lotus multi-byte character set Traditional Chinese
LMBCS-19                Lotus multi-byte character set Simplified Chinese
LMBCS-2                 Lotus multi-byte character set Greek
LMBCS-3                 Lotus multi-byte character set Hebrew
LMBCS-4                 Lotus multi-byte character set Arabic

LMBCS-5                 Lotus multi-byte character set Cyrillic
LMBCS-6                 Lotus multi-byte character set Latin 2
LMBCS-8                 Lotus multi-byte character set Turkish
macintosh               Macintosh
SCSU                    http://www.iana.org/assignments/charset-reg/SCSU
Shift_JIS               Shift-JIS, Japanese
TIS_620                 TIS-620, Thai
UTF-16                  UTF-16 Unicode
UTF-16BE                UTF-16 Unicode Big Endian
UTF-16LE                UTF-16 Unicode Little Endian
UTF-32                  UTF-32 Unicode
UTF-32BE                UTF-32 Unicode Big Endian
UTF-32LE                UTF-32 Unicode Little Endian
UTF-7                   UTF-7 Unicode
UTF-8                   UTF-8 Unicode
UTF16OppositeEndian     UTF-16 Unicode Opposite Endian
UTF16PlatformEndian     UTF-16 Unicode Platform Endian
UTF32OppositeEndian     UTF-32 Unicode Opposite Endian
UTF32PlatformEndian     UTF-32 Unicode Platform Endian
windows-1250            Windows Latin 2
windows-1251            Windows Cyrillic
windows-1252            Windows Latin 1
windows-1253            Windows Greek
windows-1254            Windows Latin 5 (Turkey)
windows-1255            Windows Hebrew
windows-1256            Windows Arabic
windows-1257            Windows Latin 4 (Baltic)
windows-1258            Windows Vietnamese
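
As an illustration of what a character set map does, the following minimal Python sketch encodes one string under several encodings and prints the resulting bytes. It is not DataStage code: the Python codec names (cp037, cp1252, and so on) are assumed stand-ins for the similarly named maps in the table, and the sample string and the helper encode_with_each are hypothetical.

```python
# Python codec names are stand-ins here; they are not DataStage map names.
SAMPLE = "Año 2004"

# (label taken from the table, Python codec assumed to be equivalent)
CODECS = [
    ("IBM037", "cp037"),
    ("windows-1252", "cp1252"),
    ("UTF-8", "utf-8"),
    ("UTF-16BE", "utf-16-be"),
]


def encode_with_each(text):
    """Return {label: bytes} for every codec listed in CODECS."""
    return {label: text.encode(codec) for label, codec in CODECS}


encoded = encode_with_each(SAMPLE)
for label, data in encoded.items():
    print(f"{label:<14}{len(data):>3} bytes  {data.hex(' ')}")

# The same byte value stands for different characters under different maps:
# 0xC1 is 'A' in the EBCDIC codec but 'Á' in the Windows Latin 1 codec.
ebcdic_char = bytes([0xC1]).decode("cp037")
latin1_char = bytes([0xC1]).decode("cp1252")
print(ebcdic_char, latin1_char)
```

The byte counts differ between the single-byte and Unicode encodings, which is why the map chosen for a stage has to match the encoding of the external data.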

Parallel Job Locales


The following list shows the locales supplied with DataStage for use with parallel jobs for collation purposes, the territory that uses each locale, and the relevant language:

Locale          Description
af              Language=Afrikaans
af_ZA           Language=Afrikaans, Territory=South Africa
am              Language=Amharic
am_ET           Language=Amharic, Territory=Ethiopia
ar              Language=Arabic
ar_AE           Language=Arabic, Territory=United Arab Emirates
ar_BH           Language=Arabic, Territory=Bahrain
ar_DZ           Language=Arabic, Territory=Algeria
ar_EG           Language=Arabic, Territory=Egypt
ar_IN           Language=Arabic, Territory=India
ar_IQ           Language=Arabic, Territory=Iraq
ar_JO           Language=Arabic, Territory=Jordan
ar_KW           Language=Arabic, Territory=Kuwait
ar_LB           Language=Arabic, Territory=Lebanon
ar_LY           Language=Arabic, Territory=Libya
ar_MA           Language=Arabic, Territory=Morocco
ar_OM           Language=Arabic, Territory=Oman
ar_QA           Language=Arabic, Territory=Qatar
ar_SA           Language=Arabic, Territory=Saudi Arabia
ar_SD           Language=Arabic, Territory=Sudan
ar_SY           Language=Arabic, Territory=Syria
ar_TN           Language=Arabic, Territory=Tunisia

ar_YE           Language=Arabic, Territory=Yemen
be              Language=Belarusian
be_BY           Language=Belarusian, Territory=Belarus
bg              Language=Bulgarian
bg_BG           Language=Bulgarian, Territory=Bulgaria
bn              Language=Bengali
bn_IN           Language=Bengali, Territory=India
ca              Language=Catalan
ca_ES           Language=Catalan, Territory=Spain
ca_ES_PREEURO   Language=Catalan, Territory=
cs              Language=Czech
cs_CZ           Language=Czech, Territory=
da              Language=Danish
da_DK           Language=Danish, Territory=Denmark
de              Language=German
de_PHONEBOOK    Language=German, Territory=Phonebook order
de_AT           Language=German, Territory=Austria
de_AT_PREEURO   Language=German, Territory=Austria
de_BE           Language=German, Territory=Belgium
de_CH           Language=German, Territory=Switzerland
de_DE           Language=German, Territory=Germany
de_DE_PREEURO   Language=German, Territory=Germany
de_LU           Language=German, Territory=Luxembourg
de_LU_PREEURO   Language=German, Territory=Luxembourg
el              Language=Greek
el_GR           Language=Greek, Territory=Greece
el_GR_PREEURO   Language=Greek, Territory=Greece
en              Language=English
en_AU           Language=English, Territory=Australia
en_BE           Language=English, Territory=Belgium

en_BE_PREEURO   Language=English, Territory=Belgium
en_BW           Language=English, Territory=Botswana
en_CA           Language=English, Territory=Canada
en_GB           Language=English, Territory=Great Britain
en_GB_EURO      Language=English, Territory=Great Britain
en_HK           Language=English, Territory=Hong Kong
en_IE           Language=English, Territory=Ireland
en_IE_PREEURO   Language=English, Territory=Ireland
en_IN           Language=English, Territory=India
en_MT           Language=English, Territory=Malta
en_NZ           Language=English, Territory=New Zealand
en_PH           Language=English, Territory=Philippines
en_SG           Language=English, Territory=Singapore
en_US           Language=English, Territory=United States
en_US_POSIX     Language=English, Territory=United States
en_VI           Language=English, Territory=U.S. Virgin Islands
en_ZA           Language=English, Territory=South Africa
en_ZW           Language=English, Territory=Zimbabwe
eo              Language=Esperanto
es              Language=Spanish
es_TRADITIONAL  Language=Spanish
es_AR           Language=Spanish, Territory=Argentina
es_BO           Language=Spanish, Territory=Bolivia
es_CL           Language=Spanish, Territory=Chile
es_CO           Language=Spanish, Territory=Colombia
es_CR           Language=Spanish, Territory=Costa Rica
es_DO           Language=Spanish, Territory=Dominican Republic
es_EC           Language=Spanish, Territory=Ecuador
es_ES           Language=Spanish, Territory=Spain
es_ES_PREEURO   Language=Spanish, Territory=Spain

es_GT           Language=Spanish, Territory=Guatemala
es_HN           Language=Spanish, Territory=Honduras
es_MX           Language=Spanish, Territory=Mexico
es_NI           Language=Spanish, Territory=Nicaragua
es_PA           Language=Spanish, Territory=Panama
es_PE           Language=Spanish, Territory=Peru
es_PR           Language=Spanish, Territory=Puerto Rico
es_PY           Language=Spanish, Territory=Paraguay
es_SV           Language=Spanish, Territory=El Salvador
es_US           Language=Spanish, Territory=United States
es_UY           Language=Spanish, Territory=Uruguay
es_VE           Language=Spanish, Territory=Venezuela
et              Language=Estonian
et_EE           Language=Estonian, Territory=Estonia
eu              Language=Basque
eu_ES           Language=Basque, Territory=Spain
eu_ES_PREEURO   Language=Basque, Territory=Spain
fa              Language=Persian
fa_IN           Language=Persian, Territory=India
fa_IR           Language=Persian, Territory=Iran
fi              Language=Finnish
fi_FI           Language=Finnish, Territory=Finland
fi_FI_PREEURO   Language=Finnish, Territory=Finland
fo              Language=Faroese
fo_FO           Language=Faroese, Territory=Faroe Islands
fr              Language=French
fr_BE           Language=French, Territory=Belgium
fr_BE_PREEURO   Language=French, Territory=Belgium
fr_CA           Language=French, Territory=Canada
fr_CH           Language=French, Territory=Switzerland

fr_FR           Language=French, Territory=France
fr_FR_PREEURO   Language=French, Territory=France
fr_LU           Language=French, Territory=Luxembourg
fr_LU_PREEURO   Language=French, Territory=Luxembourg
ga              Language=Irish
ga_IE           Language=Irish, Territory=Ireland
ga_IE_PREEURO   Language=Irish, Territory=Ireland
gl              Language=Gallegan
gl_ES           Language=Gallegan, Territory=Spain
gl_ES_PREEURO   Language=Gallegan, Territory=Spain
gu              Language=Gujarati
gu_IN           Language=Gujarati, Territory=India
gv              Language=Manx
gv_GB           Language=Manx, Territory=Great Britain
he              Language=Hebrew
he_IL           Language=Hebrew, Territory=Israel
hi              Language=Hindi
hi_DIRECT       Language=Hindi
hi_IN           Language=Hindi, Territory=India
hr              Language=Croatian
hr_HR           Language=Croatian, Territory=Croatia
hu              Language=Hungarian
hu_HU           Language=Hungarian, Territory=Hungary
hy              Language=Armenian
hy_AM           Language=Armenian, Territory=Armenia
hy_AM_REVISED   Language=Armenian, Territory=Armenia
id              Language=Indonesian
id_ID           Language=Indonesian, Territory=Indonesia
is              Language=Icelandic
is_IS           Language=Icelandic, Territory=Iceland

it              Language=Italian
it_CH           Language=Italian, Territory=Switzerland
it_IT           Language=Italian, Territory=Italy
it_IT_PREEURO   Language=Italian, Territory=Italy
ja              Language=Japanese
ja_JP           Language=Japanese, Territory=Japan
kl              Language=Kalaallisut
kl_GL           Language=Kalaallisut, Territory=Greenland
kn              Language=Kannada
kn_IN           Language=Kannada, Territory=India
ko              Language=Korean
ko_KR           Language=Korean, Territory=South Korea
kok             Language=Konkani
kok_IN          Language=Konkani, Territory=India
kw              Language=Cornish
kw_GB           Language=Cornish, Territory=Great Britain
lt              Language=Lithuanian
lt_LT           Language=Lithuanian, Territory=Lithuania
lv              Language=Latvian
lv_LV           Language=Latvian, Territory=Latvia
mk              Language=Macedonian
mk_MK           Language=Macedonian, Territory=Macedonia
mr              Language=Marathi
mr_IN           Language=Marathi, Territory=India
mt              Language=Maltese
mt_MT           Language=Maltese, Territory=Malta
nb              Language=Norwegian Bokmål
nb_NO           Language=Norwegian Bokmål, Territory=Norway
nl              Language=Dutch

nl_BE           Language=Dutch, Territory=Belgium
nl_BE_PREEURO   Language=Dutch, Territory=Belgium
nl_NL           Language=Dutch, Territory=Netherlands
nl_NL_PREEURO   Language=Dutch, Territory=Netherlands
nn              Language=Norwegian Nynorsk
nn_NO           Language=Norwegian Nynorsk, Territory=Norway
om              Language=Oromo
om_ET           Language=Oromo, Territory=Ethiopia
om_KE           Language=Oromo, Territory=Kenya
pl              Language=Polish
pl_PL           Language=Polish, Territory=Poland
pt              Language=Portuguese
pt_BR           Language=Portuguese, Territory=Brazil
pt_PT           Language=Portuguese, Territory=Portugal
pt_PT_PREEURO   Language=Portuguese, Territory=Portugal
ro              Language=Romanian, Territory=
ro_RO           Language=Romanian, Territory=Romania
ru              Language=Russian
ru_RU           Language=Russian, Territory=Russia
ru_UA           Language=Russian, Territory=Ukraine
sh              Language=Serbo-Croatian
sh_YU           Language=Serbo-Croatian, Territory=Yugoslavia
sk              Language=Slovak
sk_SK           Language=Slovak, Territory=Slovakia
sl              Language=Slovenian
sl_SI           Language=Slovenian, Territory=Slovenia
so              Language=Somali
so_DJ           Language=Somali, Territory=Djibouti
so_ET           Language=Somali, Territory=Ethiopia
so_KE           Language=Somali, Territory=Kenya

so_SO           Language=Somali, Territory=Somalia
sq              Language=Albanian
sq_AL           Language=Albanian, Territory=Albania
sr              Language=Serbian
sr_YU           Language=Serbian, Territory=Yugoslavia
sv              Language=Swedish, Territory=
sv_FI           Language=Swedish, Territory=Finland
sv_SE           Language=Swedish, Territory=Sweden
sw              Language=Swahili
sw_KE           Language=Swahili, Territory=Kenya
sw_TZ           Language=Swahili, Territory=Tanzania
ta              Language=Tamil
ta_IN           Language=Tamil, Territory=India
te              Language=Telugu
te_IN           Language=Telugu, Territory=India
th              Language=Thai
th_TH           Language=Thai, Territory=Thailand
ti              Language=Tigrinya
ti_ER           Language=Tigrinya, Territory=Eritrea
ti_ET           Language=Tigrinya, Territory=Ethiopia
tr              Language=Turkish
tr_TR           Language=Turkish, Territory=Turkey
uk              Language=Ukrainian
uk_UA           Language=Ukrainian, Territory=Ukraine
vi              Language=Vietnamese
vi_VN           Language=Vietnamese, Territory=Vietnam
zh              Language=Chinese
zh_PINYIN       Language=Chinese
zh_CN           Language=Chinese, Territory=China
zh_HK           Language=Chinese, Territory=Hong Kong

zh_MO           Language=Chinese, Territory=Macao S.A.R. China
zh_SG           Language=Chinese, Territory=Singapore
zh_TW           Language=Chinese, Territory=Taiwan
zh_TW_STROKE    Language=Chinese, Territory=Taiwan
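
The locale names in the list follow a language, territory, variant pattern (for example de_AT_PREEURO). The following Python sketch shows one way to split a name into those parts. The helper split_locale is hypothetical and not part of DataStage, and it rests on an assumption read off the list: a two-letter upper-case second component is a territory, and anything else (PHONEBOOK, PINYIN, PREEURO, and so on) is a variant.

```python
def split_locale(name):
    """Split a locale name such as 'de_AT_PREEURO' into its parts.

    Assumption: a two-letter upper-case second component is a territory;
    any remaining components form the variant.
    """
    parts = name.split("_")
    language = parts[0]
    territory = None
    rest = parts[1:]
    if rest and len(rest[0]) == 2 and rest[0].isupper():
        territory = rest[0]
        rest = rest[1:]
    variant = "_".join(rest) or None
    return {"language": language, "territory": territory, "variant": variant}


for name in ("af", "ar_SA", "de_AT_PREEURO", "de_PHONEBOOK", "zh_TW_STROKE"):
    print(name, split_locale(name))
```

Under this reading, de_PHONEBOOK has a variant but no territory, which matches its use as an alternative collation order rather than a separate region.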


Glossary
base map
A character set map upon which another map is based. For example, most character sets use an ASCII map as their base map with additional sets of characters building on the ASCII map.

category
One of the five national conventions: Time, Numeric, Monetary, Collate, or Ctype.

character set
A fixed association between the characters used by a language, or group of languages, and the values, or code points, that represent them. For example, the KSC5601 character set fixes code points for the Hangul characters used in the Korean language.

code point
A number that is used in a program to represent a character. Note that in different character sets the same code point may be used to represent different characters.

deadkey characters
Characters that do not have a dedicated key on the keyboard, but are generated using a sequence of key strokes.

deadkey table
See input map table.

double-byte character set
A character set where the code points are either one or two bytes long. The two-byte code points usually represent characters belonging to Asian languages, such as Chinese or Kanji. See also single-byte character set.

EBCDIK character set
A variant of the EBCDIC character set. EBCDIK replaces lowercase Latin characters with Japanese Katakana characters.

external character set
The character set used to input data on a keyboard, display data on a screen, print reports, and so on. Appendix B lists the external character sets supported by DataStage. See also internal character set and Unicode.

JEF character set
A Fujitsu proprietary encoding of several thousand characters. It includes the single-byte EBCDIK and double-byte JIS character sets. The JEF character set differs from all other character sets that DataStage NLS supports, in that it uses a pair of shift characters to toggle between single-byte and double-byte encoding.

input map table
Mapping tables used to define byte sequences that are valid only on input. They are used to define deadkey characters.

internal character set
The character set that DataStage uses to store and manipulate data. See also external character set and Unicode.

locale
The language, character set, and data formatting conventions used by a group of people. In DataStage, a locale comprises a set of conventions in specific categories (Time, Numeric, Monetary, Ctype, and Collate). See also territory.

main map table
The main table that defines how a character set is mapped between the internal and external character sets.

national conventions
A standard set of rules that defines how certain data types such as numbers and dates are used in a territory.

National Language Support (NLS)
See NLS.

NLS
A program's ability to use any languages, data formatting rules, or character sets that are required by its users all over the world. Also referred to as internationalization.

single-byte character set
A character set whose code points have values 0 through 255, and can therefore be represented by a single byte. Single-byte character sets are suitable for some European, American, and Middle Eastern languages. See also double-byte character set.

territory
The area or region where a locale is used. This may correspond to a geographical location, such as a country, or to something less easy to define in geographical terms, such as a multinational organization.

Unicode
A 16-bit character set that aims to provide unique code points for all characters in every standard character set (with room for some nonstandard characters too). Unicode forms part of ISO 10646 and is a trademark of Unicode, Inc.

Unicode blocks
Groups of logically related characters in the Unicode character set that correspond to the scripts used for different families of languages.

Unicode replacement character
The character value xFFFD, which is used to replace an unmappable character read from the external character set.

unknown character
The character that is used as a substitute for an unmappable character. Each map contains a definition of an unknown character.

unmappable character
A character that cannot be mapped to the external character set using the current map table. DataStage substitutes the current map's unknown character, usually a question mark (?), for any unmappable character.

UTF8
UTF8 is a standard for the use of Unicode character data in 8-bit UNIX environments. In DataStage UTF8 is enhanced to map the DataStage system delimiters to the Private Use area of Unicode. Other UTF8-compatible software can understand the DataStage UTF8 representation.
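
The entries for unmappable character, unknown character, Unicode replacement character, and double-byte character set can be illustrated with a short Python sketch. It uses Python's own codecs rather than DataStage maps, so it only shows the general behavior, on the assumption that a map's unknown character plays the same role as the '?' that Python substitutes with errors="replace". The sample strings are arbitrary.

```python
# Writing out: a character with no slot in the target character set is
# unmappable, so the codec substitutes '?' (the "unknown character").
text = "Łódź"
ascii_bytes = text.encode("ascii", errors="replace")
print(ascii_bytes)  # 'd' is ASCII and survives; the other three do not

# Reading in: a byte sequence that is not valid in the source character set
# is replaced by U+FFFD, the Unicode replacement character.
bad_utf8 = b"caf\xe9"  # 0xE9 is Latin-1 'é', not valid UTF-8 in this position
decoded = bad_utf8.decode("utf-8", errors="replace")
print(decoded, hex(ord(decoded[-1])))

# Double-byte character set: code points are one or two bytes long.
sjis = "Aあ".encode("shift_jis")  # 'A' takes one byte, 'あ' takes two
print(len(sjis), sjis.hex(" "))
```

Because the substitution is lossy in both directions, a stream of question marks or U+FFFD characters in output is usually a sign that the wrong map was selected.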


