Zen and the Art of PowerPlay Cube Building


Performance and Scalability in PowerPlay Applications

This document outlines several known facts and theoretical aspects of PowerCube creation in Transformer. It also discusses scalability in PowerPlay Enterprise Server.

Table of Contents

Part I  ZEN and the Art of PowerPlay Cube Building  7

CHAPTER 1  Choosing Your Hardware
  How Transformer Uses Memory  10
    Setting the PowerPlay DataServer Cache Size  11
    Setting the Memory Available for Data Sort  12
    Operating System Memory Settings  13
    Using Multiple CPUs  14
  How Transformer Uses Disk Space  16
    Configuring Work Locations for Transformer  16
    Handling Very Large Work Files  17
    Disk Configurations  18

CHAPTER 2  PowerPlay Partitioning  19
  When To Focus On Partitioning  20
  Environmental Constraints  21
    Network Bandwidth  21
    Cube Production Window  22
    Acceptable Query Response Time  22
    Server And End User Hardware  22
    After Environmental Constraints  22
  What's New In PowerPlay 6.0 Partitioning  24
    Terminology Refresh  24
    The New Partitioning Scheme  25
    Other Tips  28

CHAPTER 3  Using Server Transformer  31
  The Basics  32
    Installation Checklist  32
    Step-wise Testing  35
    Common Pitfalls  37
  Creating an Effective Working Environment  43
    Directory Structures  43
    Configuration Files  43
    Automation Scripts  47

CHAPTER 4  Cube Build Samples  53
  Cube Build Results  54
    Environmental Settings and Methodology  54
    Row Dominant Model  55
    Category Dominant Model  56
    Row and Category Dominant Model  57

Part II  Zen and the Art of PowerPlay Enterprise Server Configuration  59

CHAPTER 5  PowerPlay Web Stress Testing  61
  Test Tools and Environment  62
  Test Results (Windows NT Server)  63
  Enterprise Server Concurrency Test Results  66
  Test Results (Solaris)  67
  PowerPlay Web Sizing Guidelines (NT)  68
    Basic Server Specifications  69
    Memory Requirements  71
    Deciding on the Number of Processes  72

CHAPTER 6  Enterprise Server Tips and Techniques  75
  Restricting Query Processing Time  76
  Setting the Query Processor Threshold  77
  Registering Multiple Cubes by Directory Name  78
  Reducing the Amount of HTML Passed to the Web Browser  80
  Passing Security Information to PowerPlay Web from a Web Front End  81
  File Descriptors for PowerPlay Enterprise Server  83

Part I

ZEN and the Art of PowerPlay Cube Building

Nothing improves build or runtime performance in a PowerPlay application more than taking the time to design your OLAP model and cubes well. However, some practices can improve performance for nearly any PowerPlay application. This section describes the things you can do to optimize your PowerPlay cube builds, and it also describes in detail how to use Transformer successfully in a UNIX environment.

CHAPTER 1

Choosing Your Hardware

Choosing the right hardware to build PowerPlay cubes is dependent on the following:

1. Memory. This is probably the single most important aspect of a cube building machine.
2. CPUs. Transformer can take advantage of two CPUs during the data read phase, thereby achieving faster builds.
3. Disk Space and Configuration. Transformer is I/O intensive as it reads source data, builds temporary work files, and creates PowerCubes. Tuning your environment to provide Transformer the best possible I/O performance will speed up your cube builds.
4. Networking Considerations. Ensuring fast access between Transformer and its data sources will provide optimal cube build times.
5. Partitioning. PowerPlay versions 6.0 and 6.5 include a new auto-partitioning algorithm that typically builds cubes faster than earlier versions.

This chapter discusses these issues.


How Transformer Uses Memory


One of the keys to building big cubes is to understand how Transformer uses memory during the various phases of cube building. There are three major phases:

- Category/Work File Generation. Includes reading the data source and processing the rows to create categories and compressed work files.
- Meta-Data Update. Deals with specifying the structure of the cube. Cube structure is some subset of the dimensions, categories, and measures available in the model, and it reflects what end users see in PowerPlay.
- Data Update. Normally the dominant stage of cube processing in terms of time. This stage includes consolidation, partitioning, and updating the cube with the set of records that apply to the cube.

The last two phases (Meta-Data Update and Data Update) most often represent the bulk of cube build time. The main variables for controlling how Transformer uses memory are as follows:
UNIX:
- PPDS_WRITE_MEMORY environment variable (default: 16MB)
- SORTMEMORY environment variable

Windows NT:
- WriteCacheSize entry in the [PowerPlay DataServer] section of cognos.ini (default: 4MB)
- SortMemory entry in the [Services] section of the cognos.ini file
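For example, on UNIX these are ordinary environment variables, set in the shell that launches Transformer (such as rsserver.sh or the user's .profile). A minimal sketch, assuming the value is expressed in megabytes, as the 16MB default suggests:

# Raise the PPDS write cache from the 16MB default to 32MB before
# launching Transformer (units assumed to be MB; confirm for your release).
PPDS_WRITE_MEMORY=32; export PPDS_WRITE_MEMORY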


Setting the PowerPlay DataServer Cache Size


The value set for PowerPlay DataServer Write Cache can affect cube build time in a negative or positive way, depending on how much memory is available. In order to better understand this, you have to be aware of what processes are using memory during a cube build. Assuming that the server on which cubes are being built is dedicated to Transformer, the following processes compete for memory:

- the Operating System
- the Disk Cache, which is maintained by the Operating System
- Transformer, and the Transformer Object Repository (Transformer RP)
- Cognos Data Management Services (DMS)
- the PowerPlay Data Server (PPDS)

The Meta-Data Update and Cube Update phases most often represent the bulk of processing time for Transformer. During these phases, PPDS, the Disk Cache, and to some extent Transformer RP compete for memory. The amount of physical memory available during these times is critical to better build times.

As a rule of thumb, the best performance is achieved when enough physical memory is available that the Disk Cache can grow to be as large as the final size of the PowerCube (MDC). Building an MDC results in many inserts, updates, and deletes by PPDS; when the PowerCube fits entirely in memory (Disk Cache), cube build performance is optimal because PPDS never needs to access information on the hard disk.

Assigning more memory to PPDS (via the PPDS_WRITE_MEMORY or WriteCacheSize entries) can benefit build time, since it reduces the number of disk I/Os required by PPDS. However, this is beneficial only if it does not reduce the memory available to the Disk Cache. On systems with plenty of memory, it is worth increasing the PPDS Write Cache from the default setting; increasing it to 32MB or 64MB on large systems can provide performance improvements. Increasing it indiscriminately, however, can actually cause problems: a very large setting (hundreds of megabytes) is not generally advised.


A recent cube build for a 65MB cube on Windows NT, on a PC with 512MB of RAM, yielded a build time improvement of approximately 15%:

WriteCacheSize    Build Time
4MB               162 minutes
32MB              137 minutes

Setting the Memory Available for Data Sort


The SORTMEMORY variable sets the amount of physical memory that's available to DMS when it's sorting data. Transformer sorts data for consolidation, and sorting is required for auto-partitioning. The number you specify represents the number of 2K blocks used when sorting data. For example, setting SORTMEMORY=5120 provides 5120 x 2K = 10MB of memory for data sorting. The default setting for SORTMEMORY is 512, or 1MB.

As with the PPDS Write Cache, you should only increase the setting for SORTMEMORY if your system's overall physical memory allows it, and if your source data is large.

When sorting data, the data services component of Transformer creates a temporary sort file. You will obtain the best performance (and possibly avoid running out of disk space during cube builds) if you control the location of this temporary file. See the section entitled Configuring Work Locations for Transformer on page 16.
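The block arithmetic above is easy to get wrong, so here is a small sketch that simply restates it; it assumes nothing beyond a standard Bourne shell:

# SORTMEMORY counts 2K blocks, so N MB of sort memory needs N x 512 blocks.
# Grant DMS 10MB for sorting:
SORTMEMORY=`expr 10 \* 512`; export SORTMEMORY    # 5120 blocks x 2K = 10MB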


Operating System Memory Settings


Most UNIX systems are configured to share system resources as evenly as possible between competing processes. This is not optimal for Transformer, which requires as much physical memory as possible. For example, an HP/UX server might have 2GB of physical memory, yet be configured so that no single process can ever obtain more than 67MB. In such cases, Transformer will never obtain as much memory as it needs to perform large cube builds efficiently. To ensure that Transformer can obtain the memory it requires, make sure that your UNIX server is configured to grant unlimited resources to the rsserver process. For HP/UX, you can increase the MAXDSIZ kernel setting. For Solaris and IBM/AIX, consult your system administrator or your operating system documentation to determine how to tune kernel settings for Transformer.
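Before changing kernel parameters, it can be worth checking the limits the shell imposes on processes it launches. A minimal sketch, assuming an sh- or ksh-compatible login for the user that runs rsserver:

# Show current per-process limits; the data segment size is the one
# that most directly caps Transformer's memory.
ulimit -a
ulimit -d    # data segment limit (reported in KB on most systems)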


Using Multiple CPUs


Transformer versions 6.0 and 6.5 have an option to use multiple CPUs during the data read phase of a cube build. For large data sources, this can significantly improve the overall performance of your cube builds. The feature is called Enable Multi-Processing, and it is set on a query-by-query basis in the Query Properties dialog box.

The Enable Multi-Processing feature allows Transformer to use multiple processors (if they're available) during Transformer's data read phase.

Note: The Enable Multi-Processing feature is supported only for the following query types:

- Impromptu Query Definition (IQD)
- Delimited Field Text
- Delimited Field Text with Column Titles


Running Parallel Instances of Transformer


A second case in which a multi-CPU system can be used to advantage with Transformer is running multiple instances in parallel. This can be required when a large number of cubes must be built within a specific time period. When running multiple Transformer instances, keep the following in mind:

- Each distinct Transformer process should have its own dedicated CPU. If you're using the Enable Multi-Processing feature, then each instance of Transformer should have two dedicated CPUs.
- Each Transformer instance will use system resources independently of all other instances. Ensure that you have sufficient memory, disk space, and I/O bandwidth to support all instances.
- Each Transformer instance will require its own set of configuration files. Do not share directories for temporary files between Transformer instances.


How Transformer Uses Disk Space


Transformer creates several temporary files as it builds cubes. Most of these are documented in the PowerPlay Administrator's Guide, where you can find detailed information about temporary file size estimates. The one exception is the DMS temporary sort file. This file is created whenever Transformer has to perform a sort operation, which happens by default when you build cubes. The entry that controls the location of the DMS temporary sort file is:

UNIX:
- TEMPFILEDIRS environment variable

Windows NT:
- Temporary entry in the [Services] section of the cognos.ini file

If you do not set this entry, then Transformer uses whatever location is set up as the temporary file location in either UNIX or NT, most often /usr/tmp (for UNIX) or c:\temp (for NT). If you have not explicitly set up the location for the temporary sort file, it's possible to run out of disk space during a cube build. Set the TEMPFILEDIRS (UNIX) or Temporary (NT) entry so that it points to a disk with space equal to or greater than the size of the cube. You can specify multiple directories separated by semicolons.

Note: You cannot set this entry for Windows NT using the Transformer UI; you can only set it by editing the cognos.ini file manually.

Configuring Work Locations for Transformer


A common mistake is to point all of your file locations to the same physical folder. For example, you might set DataWorkDirectory, ModelWorkDirectory, CubeSaveDirectory, and so on all to /usr/cognos/work, or something similar. The problem with this approach is that it introduces contention for the same physical drive among multiple processes: all phases of the cube build read data from and write data to the same physical device, which is not an optimal use of system resources. A better approach is to specify different locations for each of the major areas in which Transformer creates files, preferably in physical locations associated with different disk controllers.


For example, assume that on a Solaris system there are four physical drives, each with several GB of space, and each controlled by a separate disk controller (Controller1 through Controller4). A good approach might be to define the following locations:

- ModelWorkDirectory: /cognos/model/work, on Controller1
- DataWorkDirectory: /cognos/data/work, on Controller2
- TEMPFILEDIRS: /usr/cognos/temp, on Controller3
- All other file locations (CubeSave, ModelSave): various folders on Controller4
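A sketch of the corresponding environment settings for this layout, using the directory variables shown elsewhere in this document (paths and controller assignments are illustrative):

# Spread Transformer's work areas across separate disk controllers.
ModelWorkDirectory=/cognos/model/work; export ModelWorkDirectory   # Controller1
DataWorkDirectory=/cognos/data/work; export DataWorkDirectory      # Controller2
TEMPFILEDIRS=/usr/cognos/temp; export TEMPFILEDIRS                 # Controller3
CubeSaveDirectory=/cognos/cubes; export CubeSaveDirectory          # Controller4
ModelSaveDirectory=/cognos/models; export ModelSaveDirectory       # Controller4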

Handling Very Large Work Files


In the case of very large cubes (millions of rows of input data and hundreds of thousands of categories), it's possible for Transformer to exceed the maximum allowable size for a file when building its temporary work file. It's also possible to completely fill a volume or disk. In such cases, you can specify multiple locations for your temporary work files. For NT, you specify these in the Data Temporary Files box on the Directories tab of the Preferences dialog box. For UNIX, you specify multiple paths for the DataWorkDirectory setting, as in
DataWorkDirectory=<path1;path2;...>

For example, you might specify the following:


DataWorkDirectory=/opt/cognos/work1;/opt/cognos/work2

Using multiple drives removes file size limitations imposed by the operating system. As Transformer creates cubes, it writes temporary files to the specified drives or directories in the order that you listed them, moving to the next location when the current one is full. These files are concatenated into one logical file, regardless of which drive each resides on.


Disk Configurations
Transformer is I/O intensive. For a large cube build, Transformer can perform thousands of reads and writes to and from its temporary work file, the sort file, and the PowerCube (MDC file). To ensure that your cubes are built as quickly as possible, it's important that your environment be optimized for fast reads and writes. Consider the following when configuring your environment for optimal Transformer cube builds:

- The speed of the physical devices available. When deciding on hardware for use in building large cubes, you will realize the best performance if you have very fast physical disk access times.
- The RAID level of your physical storage devices. Which RAID level you choose will have a big effect on throughput for Transformer. The following table describes some (but not all) widely used RAID levels:

RAID Level                 Description
0 (striping)               Optimizes performance at the expense of data redundancy. Data is distributed among disks for performance, with no provision for redundancy. As a result, a disk crash can cause data to be lost across several disks.
1 (mirroring)              Emphasizes data redundancy at the expense of performance. Mirroring maintains multiple copies of data to ensure that, in the event of a disk failure, no data is lost.
5 (striping with parity)   Attempts to balance performance and data redundancy. Data is distributed across disks along with parity information about the data. In the event of a disk crash, the parity information allows for the reconstruction of data. As a result, system-wide risk is lowered, as no individual disk can cause widespread loss of the data.

Although RAID Level 5 is generally faster than mirroring for reads, it can significantly slow down writes, because parity information must be computed and written along with the data.

Because PowerPlay cubes should not be used as the primary storage place for data, RAID Level 0 (striping) is often most appropriate for large cube builds, and provides optimal build times. In the event of a disk failure during a cube build, the cube can be rebuilt from the source data. If data redundancy is a requirement once the cube is built, the cube can be copied to another location.


CHAPTER 2

PowerPlay Partitioning

The goal of partitioning has always been to achieve satisfactory end-user query response time without exceeding the cube production window. Partitioning is a strategy that improves query response time at the expense of cube build time. In PowerPlay 6.x, partitioning occurs automatically, using a best-guess approach that can be fine-tuned by the user. Choosing the right partitioning strategy is, however, subject to environmental constraints, including hardware power and the nature of the end-user queries performed. This means the default auto-partitioning cannot always pick the right partitioning strategy for every situation.


When To Focus On Partitioning


PowerPlay 6.x partitioning is effective only if the proper OLAP architecture and supporting hardware are in place; no amount of partitioning can correct deficiencies in these areas. The final tuning of query response time then comes down to what you can do with partitioning strategies. The auto-partitioning capability in PowerPlay 6.x does most of the work for you, provided that you supply some simple information concerning partition size. Partitioning is a big-cube technique: it is applicable only if you are dealing with millions of rows of source data, and it is of little value for small cubes.


Environmental Constraints
Environmental Constraints are limitations in the BI architecture that presently cannot be changed. These constraints impact the range of partition sizes that can be employed and, in general, apply to all cubes. Environmental constraints include:

1. Network bandwidth
2. Cube production window
3. Acceptable query response time
4. Server and end user hardware

Network Bandwidth
As a rule of thumb, avoid using large partitions when the PowerCube does not reside on the same machine as the PowerPlay server. The impact of network bandwidth constraints comes from the new architecture of PowerPlay's Enterprise Server: communication between the PowerCube and the end user can involve multiple networks and hosts. In PowerPlay 6.5, it is also possible to network PowerPlay servers to distribute requests to the right OLAP data source.

The effect of partitioning on network traffic is most notable at the PowerPlay server closest to the cube. Between this server and the cube, the volume of network traffic is proportional to the partition size and the number of requests, so a smaller partition size will support a larger number of queries. Another example is end users who employ PowerPlay Personal Server to access a cube shared on the LAN. Query processing is performed by each individual user's Personal Server, and here again the volume of network traffic on the LAN is proportional to the partition size and number of requests.

Where the cube and PowerPlay server reside on the same host, network constraints are exchanged for the higher volume limits associated with hardware buses and back-planes, which permit greater partition sizes. These higher limits may shift the constraint to the capability of the CPU to perform query calculations within the required response time.


Cube Production Window


When considering cube production windows, remember that decreasing partition size generally improves query response time by reducing the volume of data required for query calculation, but at the expense of increased cube build time. The build must still complete within the production window, which limits how small your partitions can practically be. Average query response time reflects the response time needed to keep users happy and the throughput required to service the user community.

Acceptable Query Response Time


Acceptable query response time is normally expressed as an average, since it varies both with distance from the cube and with the complexity of the reports being executed against the cube. Other factors that can affect query response time include the number of simultaneous requests being processed and network capacity. It is important to understand fluctuations in workload, such as variances in user behavior during the course of the day, in order to verify that the right OLAP architecture and hardware have been employed. For information about how to gauge the average response time at your site, see Determining Average Query Time on page 70.

Server And End User Hardware


If an OLAP architecture is to support a given number of users, it must be able to execute as many requests per second as it receives. Estimating the volume of incoming requests is where capacity planning should start, so that systems can be designed with a combination of hardware capacity and cube structure that supports the request load. It follows that large partitions are practical only on servers capable of performing a large number of calculations per second. For information about sizing your server, see Basic Server Specifications on page 69.

After Environmental Constraints


Having defined the environmental constraints and determined that the average required response time is still outside acceptable criteria, several steps can be taken to refine cube partitioning strategies.


There are two critical strategies that guide successful cube partitioning:

1. Answer the majority of queries from the first or upper partitions (called summary partitions).
2. Try to ensure that the information sought by a query can be found within a single partition.


What's New In PowerPlay 6.0 Partitioning


Consider the following differences when you partition models in PowerPlay 6.x:

- 5.x: Autopartitioning was not optimized to balance access time and cube build time. 6.x: Autopartitioning is quicker and will usually produce a smaller, faster cube than 5.x.
- 5.x: No default partitioning. 6.x: Partitioning is ON by default; the new autopartitioning feature determines the way to partition any given cube.
- 5.x: Manual partitioning only. 6.x: Supports both manual and automatic partitioning.
- 5.x: The only partitioning controls were partition numbers. 6.x: Partition numbers are assigned by a user only when partitioning manually, and this facility will be less commonly used. Instead, an extra tab on the Cube Properties dialog box provides controls that guide the auto-partitioning algorithm.
- 5.x: Consolidation of input records was off by default. 6.x: Consolidation with Sort is ON by default.
- 5.x: Only 2 partition levels were recommended. 6.x: Up to 5 levels are commonly used.

Terminology Refresh
It is important to remember that a partition level is not the same as a level in a dimension. A partition level is a set of categories that receive the same partition number. These categories can and probably will come from more than one level in a dimension when using the 6.x auto-partitioning algorithm.


The New Partitioning Scheme


PowerPlay version 6.x partitioning (both manual and automatic) requires the new cube build algorithm. If you use a feature that forces Transformer to fall back to the older cube optimization, you will not get the advantages of 6.x partitioning. The features that prevent use of the new cube build algorithm are:

- Cube Optimization not set to Auto-Partition
- Consolidation set to NO, or to YES with Presort
- Externally rolled-up measures
- Before-rollup calculated measures (use calculated columns instead)

Auto-Partitioning
There are two controls used by the auto-partitioning algorithm. In order of importance they are:

- The desired partition size (based on the estimated number of consolidated records)
- The number of partitioning passes

The desired partition size is set by a slide control that is designed to help users select sensible values. It makes the trade-off between cube build time and query response time very clear. However, it does not show the effect that a change in partition size has on the partitioning strategy employed. The Desired Partition Size is the main control in determining the partitioning strategy.

Dragging the slider to the right (towards Faster Cube Access) decreases the Desired Partition Size. Dragging the slider to the left increases the Desired Partition Size. Similarly, editing the Estimated Number of Consolidated Records changes the Desired Partition Size accordingly.

Transformer selects groups of categories that satisfy the partition size criteria you specify, and it does so in a way that optimizes ad hoc query performance for that partition size. This means you should try several different partition sizes that satisfy your environmental constraints. You will arrive at a suitable partition strategy quickest if you start with larger partition sizes and work toward smaller ones (rather than the other way around).

The number of passes can be adjusted at the end to find the best build time/query performance trade-off. It becomes a fine-tuning tool that safeguards against over-partitioning. Transformer doesn't necessarily use all the passes that you allow it, but it will not use more. Transformer requires one pass for every partition level created. The smaller the partition size, the greater the number of partition levels created (and the longer Transformer will generally take to build the cube). This is why it is recommended that users start with larger partition sizes and work toward smaller ones. During this phase it is important to allow Transformer to use a large number of passes.


Cutting back the number of passes will not prevent Transformer from producing a partitioned cube; it simply controls the amount of effort Transformer will expend on a given partition size. Transformer always stops partitioning when the number of records in the summary partition drops below the specified partition size. The key point is that later passes may not produce significant improvements in query response time, and this is where the number of passes can be refined to get the right query performance/cube build time trade-off.

A useful tip: the last pass that Transformer performs is summary partition consolidation. If an additional pass does not yield a reduction in summary partition size, that pass can be skipped. (If both the summary partition and the level-one partition finish with the same number of records, it is likely that Transformer would use more passes if you allowed it; it shows that the summary partition has not been consolidated.)

PowerPlay 6.x Manual Partitioning


There are reasons why you might wish to use manual partitioning in 6.x. Some of these are:

- Deciding whether to adopt 6.x auto-partitioning over your 5.x partitioning scheme
- Trying to improve on the auto-partitioning strategy
- Dealing with large cubes or unusually structured cubes
- Tuning specifically for the top 10 reports

Let's first consider reasons for using auto-partitioning instead of a 5.x partitioning scheme. If 6.x auto-partitioned cubes build faster and answer queries at least as quickly as 5.x cubes, there is no need to compare the actual partitioning strategies. Some time should be spent with auto-partitioning under 6.x, since it generally performs very well. If necessary, your 5.x partitioning scheme can be loaded into 6.x, but you must open the model in 6.x and change the default optimization setting in the cube properties to auto-partitioning. In addition, you must remove all the manual partition information from your old model; otherwise the auto-partitioning algorithm will not be used. Also, if you want optimal performance, ensure that you don't use any features that may prevent Transformer from using the new cube build engine. See The New Partitioning Scheme on page 25.

Experience has shown that it is possible to improve on auto-partitioning in some situations. This might be achieved by starting with the partition points already selected by the auto-partitioning feature.


Note that the moment partition level numbers are assigned manually, auto-partitioning is disabled.

Most models have dimensions whose lowest-level details are frequently accessed (for example, product codes). In these dimensions, it is important to manage categories with a high parent-to-child ratio and partition them accordingly. Normally auto-partitioning will do a good job, provided you pick a partition size that is big enough. Too many partition levels will adversely affect lowest-level detail reports. This is another reason why you should first auto-partition with a large partition size and work down in size to get acceptable query response.

After dealing with large detail dimensions, consider dimensions that are frequently summarized. These may be less obvious than the detail dimensions. If 80 percent of your user community creates summary reports with categories from a given dimension, consider partitioning at a high level in that dimension. Auto-partitioning will do this automatically so long as the partition size is small enough for Transformer to consider the dimension as a candidate for partitioning. The partition status information will show the partitions chosen by Transformer.

Using a small partition size may go against the advice offered earlier. This is true if you have both frequently summarized dimensions and dimensions containing large category counts used primarily to report lowest-level details. Unfortunately, it is not possible to use different partition sizes on different parts of the cube. The best one can do is to favor either summary reports or lowest-level detail reports.

Excluding Dimensions
The auto-partitioning algorithm does permit the exclusion of dimensions. Dimensions might be considered for exclusion if they are used only for lowest level detail reports. By excluding detail dimensions (which are often large) it will be possible to set the partition size smaller. Cube build time will be faster and summary queries will be serviced quickly. If the same model included partitioning on detail dimensions, lowest level queries would be slower since they are reporting across multiple partitions.

Other Tips

- The moment any partition level numbers are assigned to either categories or levels, auto-partitioning is turned off, and only the categories and levels specified manually are partitioned.
- When manually defining partitions in 6.x, the number of auto-partition passes must be set to the number of partition levels assigned manually, or greater. Whether in automatic or manual mode, Transformer 6.x requires one pass for each partition level defined.
- In 5.x, we recommended that dimensions with alternate drill-downs were poor choices for partitioning, because the categories in alternate paths will likely come from multiple partitions, causing slow drill-down response times. In 6.x, auto-partitioning is more equitable in choosing candidate categories for partitioning; as a result, it will frequently partition categories from dimensions that have alternate drill paths without detrimentally affecting performance. Also, only the primary drill path is used for partitioning.
- Manually defined 6.x partition level numbers don't necessarily define the order in which partitions are placed in the cube. Partition level numbers may be reordered by Transformer.
- When partitioning manually, rules similar to 5.x partitioning apply. If performance is not adequate after partitioning, repeat the process using a level (in some other dimension) that adequately divides the number of records in the summary partition. For each new level of partitioning, increase the partition number by 1. Try to select categories across the entire dimension and assign these categories the same partition number. Once complete, you should not be able to traverse a path from the root category to any leaf without encountering a category with an assigned partition number.


CHAPTER 3

Using Server Transformer

In production environments, there is often a desire to leverage the CPU power and resources of a UNIX server to update PowerPlay PowerCubes. Processing production PowerCubes in this fashion lets you centralize administration and cube storage, and allows PowerCube updates to become part of the existing data warehouse update process: after the warehouse updates, the PowerCubes are updated. Also, the UNIX server can often access the database faster because of location or network priority.

PowerPlay Server enables you to use Transformer's graphical user interface on the PC to model your two-dimensional data into multidimensional PowerCubes. However, the Administrator Server Edition also enables you to offload the creation of models and PowerCubes to the rsserver program (PowerPlay Server Transformer), a version of Transformer that runs on supported UNIX server platforms. This chapter addresses some common issues faced when using PowerPlay Server Transformer, and has two major sections:

- Part I covers the basics: an installation checklist, step-wise testing, and some common pitfalls.
- Part II discusses how to create an effective working environment by configuring your working areas, customizing where files go, and automating cube builds.


The Basics
Often, users are confused about what the components of PowerPlay Server are and how they interact. Understanding what the pieces do is vital to successfully using the product. The main components of PowerPlay Server Transformer are:

PowerGrid. PowerGrid provides the basic networking services between client and server Transformer. The PowerGrid network daemon (netd) must be started and left running to respond to periodic requests from the client. PowerGrid uses the script rsserver.sh, which sets the necessary environment variables and starts rsserver, to launch Server Transformer. An additional utility, called NetInfo, is installed on the client machine and can be used to test your defined client/server connections. NetInfo verifies that messages are being sent to netd, and that netd is responding to them.

Server Transformer (rsserver). The rsserver program is the executable for PowerPlay Server Transformer that runs on the UNIX server. It provides the engine to create PowerCubes on any of the supported server platforms, so that the processing of data during category generation and cube creation takes place on the server. All PowerCube prototyping is still done on the client; however, you must install the PowerPlay Administration Server edition on the client in order to have the options available to process cubes on the server.

Installation Checklist
Before installing and working with the product, you should read the Installation Guide and the Transformer for UNIX Guide. The latter focuses wholly on client/server Transformer and explains all the environment variables that can be used to customize your working area. Reading these guides will also familiarize you with where specific information is located, so you can reference it later.


Ensure you have the required hardware and software. The supported operating systems and databases are:

Oracle 7.3 (base) and 8 (current):
  HP-UX 10.20 (base) and 11.x (current)
  Sun Solaris 2.5 (base) and 7 (2.7) (current)
  IBM AIX 4.2 (base) and 4.3 (current)
  Digital UNIX 4.0 (base) and 4.0E (current)
  SINIX 5.43 (current)

Sybase SQL Server 11.2 (base), Adaptive Server 11.5 (current), and Sybase Adaptive Server 11.9.2:
  HP-UX 10.20 (base) and 11.x (current)
  Sun Solaris 2.5 (base) and 7 (2.7) (current)
  IBM AIX 4.2 (base) and 4.3 (current)
  Digital UNIX 4.0 (base) and 4.0E (current)

Informix Dynamic Server 7.2 (base) and 7.3 (current):
  HP-UX 10.20 (base) and 11.x (current)
  Sun Solaris 2.5 (base) and 7 (2.7) (current)
  IBM AIX 4.2 (base) and 4.3 (current)
  Digital UNIX 4.0 (base) and 4.0E (current)
  SINIX 5.43 (current)

Informix XPS 8.21 (current):
  HP-UX 10.20 (base) and 11.x (current)
  Sun Solaris 2.5 (base) and 7 (2.7) (current)
  IBM AIX 4.2 (base) and 4.3 (current)
  Digital UNIX 4.0 (base) and 4.0E (current)
  SINIX 5.43 (current)

IBM DB2 Common Server 2.1 (base) and its successor, Universal Database 5.x (current):
  HP-UX 10.20 (base) and 11.x (current)
  Sun Solaris 2.5 (base) and 7 (2.7) (current)
  IBM AIX 4.2 (base) and 4.3 (current)

Consult the README file accompanying PowerPlay for any changes to the list of supported databases. Also, before beginning your installation, ensure you have everything on the following checklist:

- UNIX root password
- UNIX login and password for the user who will store server models and cubes
- PowerPlay Server CD-ROM
- Host name or IP address of the UNIX server
- UNIX shell (i.e., csh, sh, ksh); echo $SHELL to determine this
- The ability to successfully TELNET from the PC to the UNIX server
- If possible, a DBA available for questions, especially if you are planning to store cubes in the database. Many DBAs do not like to give out create and drop privileges; they want to be involved in this step.

- If possible, a UNIX system administrator available
- Double-check the version of UNIX you are using (the command uname -a at the UNIX command line should tell you this) and ensure the version is supported (see the sketch following this list)
- Make sure you know what database(s) are used and that the version is supported
- Your Cognos customer number. This will enable you to get to the Customer Support area and Knowledge Base on the Cognos Web site for common solutions to configuration issues.
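The shell and operating system checks on this list take only a moment; a minimal sketch (the host name is a placeholder):

# Confirm the login shell, the exact UNIX version, and PC-to-server
# connectivity before installing anything.
echo $SHELL          # which shell the install user runs (csh, sh, ksh)
uname -a             # OS name and version; check against the support list
telnet myunixhost    # run from the PC; substitute your server's host name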

For Oracle
Ensure that:

- the environment variables ORACLE_HOME, TNS_ADMIN, and ORACLE_SID are set up and pointing to the correct locations. For example:

ORACLE_HOME=/train/oracle7.3/products
TNS_ADMIN=$ORACLE_HOME/network/admin
ORACLE_SID=ORAUX73

- the bin directory under Oracle is in the path.
- the Oracle login for the schema where you will place your database PowerCubes has create table, create index, and drop table privileges.
- OCI via SQL*Net (Oracle 7.3 and 8) or Net8 (Oracle 8) is installed on the PC.
- the same TNSNAME (case sensitive) is defined in the client and server TNSNAMES.ORA files.
- you can successfully ping the Oracle instance using TNSPING from both the client PC and the server (a quick check follows this list).
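A quick way to exercise the last two checks, reusing the instance name from the example above:

# Verify the Oracle environment and name resolution; run on both the
# client PC and the UNIX server.
echo $ORACLE_HOME $TNS_ADMIN $ORACLE_SID
tnsping ORAUX73    # substitute your own TNSNAME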

For Sybase
Ensure that:

- the environment variables SYBASE and DSQUERY are set up and pointing to the correct locations. For example:

SYBASE=/train/sybase11/server
DSQUERY=SYBTRAIN    (the Sybase server name is listed here)

- the bin directory under Sybase is in the path.


- the Sybase login has create database privileges.
- CT-Lib (Sybase 11 Client Library) is installed on the client.
- the same server name (case sensitive) is defined in the server INTERFACES file (stored in $SYBASE) and on the client PC (stored in the c:\sybase\ini\sql.ini file).
- you can successfully ping the Sybase SQL Server using WSYPING or DSEDIT.

For Informix
Ensure that:

- the environment variables INFORMIXDIR and INFORMIXSERVER are set up and pointing to the correct locations. For example:

INFORMIXDIR=/train/informix
INFORMIXSERVER=INFSOL72    (the Informix server name is listed here)

- the Informix login has create privileges.
- ESQL/C 7.23 for Win32 (Informix Net 7.2 or higher) is installed on the client.
- the same server name is defined on both the client and the server.
- you can successfully access Informix using Ilogin.

Step-wise Testing
As discussed earlier, several components make up the PowerPlay Administrator Server environment. With step-wise testing, you ensure that each piece is working before you start using the tool. This way, if an environment setting or configuration step was left out, you can find it right away, and you have a better chance of isolating the proper area if a problem occurs. The following steps are provided as guidelines to make sure each piece works individually, building up to using all of the pieces together:
1. Install the server software and make sure PowerGrid is working.
   - Install the software on the UNIX server and the PC.
   - Configure the environment settings on the server. For client/server operations, set the database environment variables and source the setpya.sh script in the rsserver.sh file that resides in powergrid/tcpip/netbin. For command-line operations, make the same modifications to the .profile or .login.
   - Verify that the cognos.ini on the PC is set up correctly to run Transformer Server by ensuring that the Transformer Server service is defined. The entry looks like this:

     [Service - Transformer Server]
     NETWORK=rsserver.sh

   - Configure and start the PowerGrid service (netd) on the UNIX server. You may set up logging for netd (see Automation Scripts on page 47 for information about how to do this). Do not forget to assign a port number to PowerGrid in the /etc/services file on the server.
   - Define a PowerGrid connection and test that PowerGrid is working correctly via netinfo.exe from the PC. See Chapter 2 of the Transformer for UNIX Guide for information about PowerGrid and NetInfo.
2. Make the environment cleaner and nicer to work in (see Creating an Effective Working Environment on page 43):
   - Create directory structures on the UNIX side for cubes, models, logs, and work directories.
   - Modify the Transformer Server startup script (rsserver.sh) to add preferences for logging, and so on.
   - Create a server preferences file (.trnsfrmr.rc).

3. Create a server PowerCube using Transformer Server with an ASCII file.
   - Create a simple model with an ASCII file (the national.asc sample) and set it up to work in client/server mode.
   - Move the model file to the server using client Transformer (preferred) or FTP.
   - Build the cube on the server.
   - Copy the PowerCube to the PC and open it with PowerPlay client.

4. Create a server PowerCube using Transformer Server with .IQD files.
   - Take a model that uses .IQD files and set it up to work in client/server mode.
   - Move the model file to the server using client Transformer or FTP.
   - Build the cube on the server.
   - Copy the PowerCube to the PC and open it with PowerPlay client.

5. Optional (only if you are storing cubes in your database): create database cubes from the server.
   - Take a model and build it as a standard cube first (or use the cube from step 4).
   - Create the tables in the database and allocate space, using double the size of the standard cube as a rough guideline.
   - Modify the model to build the cube in the database.
   - Move the model file to the server. Optionally, you can build the model from the client first to make sure it works, then move it to the server.
   - Recreate the cube as a database cube.

Common Pitfalls
There are some common pitfalls that users encounter when using PowerPlay Server. Most of these concern environment and configuration settings. The sections below discuss different areas and what to look out for in each.

Environment settings are everything in UNIX


- When using .IQD files to build cubes, you must make sure the client and server database information is in sync. For example, the same TNSNAME (case sensitive) must be defined in the client and server TNSNAMES.ORA files for Oracle. Similarly, the same server name (case sensitive) must be defined in the client PC sql.ini file and the server INTERFACES file for Sybase.
- PowerGrid uses rsserver.sh when it launches Server Transformer. The rsserver.sh file sets up environment variables to control preferences for the Server Transformer process, and it sets up the necessary environment variables to run Server Transformer. If you are using .IQD files to build your cubes, you will need the appropriate database environment variables set in this file as well. Set these database variables up before the line that calls the setpya.sh script (a minimal fragment follows this list). See rsserver.sh on page 45.
- setpya.sh (or setpya.csh if using csh) is used to set up the PowerPlay environment variables and should be sourced in both the user's .profile and rsserver.sh. This should be done after all the other environment variables are set, as it may depend on specific variables being set first (for example, database-specific environment variables).
- Make sure the TNSNAMES.ORA, INTERFACES, or other database-specific files contain connection information for all .IQD data sources. If you build cubes with new data sources, the connection information for these new sources needs to be added to these server files.
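The ordering rule above is the one most often missed. A minimal rsserver.sh fragment showing it (the Oracle paths are illustrative; the full script appears later in this chapter):

# Database variables must come first...
ORACLE_HOME=/opt/oracle/products; export ORACLE_HOME
TNS_ADMIN=$ORACLE_HOME/network/admin; export TNS_ADMIN
# ...then source the PowerPlay environment, which may depend on them.
PYA_USR=${PYA_USR:=/opt/cognos/pya65_278}
. $PYA_USR/setpya.sh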


A few words about Root


PowerPlay Server Transformer installation files need to be owned by root. PowerGrid not only needs to be owned by root; the process must also be started as root. If client/server processes such as model synchronization fail suddenly, make sure that the owner of the rsserver.sh file has not been changed from root to another user after the install.

PowerGrid Needs its Own Port


Make sure there is an entry for PowerGrid in /etc/services or, if an NIS protocol is used in your system, define the communications service port for the PowerGrid netd in the services database. Verify that the port number used in the connection between client and server matches. The default port number for PowerGrid is 1526; if this is not the port number configured for PowerGrid in the /etc/services file on your UNIX server, you must change your local port setting to match. See Chapter 2 of the Transformer for UNIX Guide.

When communicating between client and server, PowerGrid must be running. As a rule of thumb, you should modify the system reboot script so that the PowerGrid netd is started after each reboot. This way it is certain to always be running.
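For illustration, an /etc/services entry of the kind described above might look like the following; the service name here is hypothetical, and only the port (1526 by default) must match the client setting:

powergrid    1526/tcp    # PowerPlay PowerGrid network daemon (netd)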

PowerPlay Administrator Server PC Requirements


You must install the PowerPlay Administration Server client piece on any PC that needs to access the server; this edition enables the options to build cubes on the server. The cognos.ini file should have the Transformer Server service defined (NETWORK=rsserver.sh); the install should add this line, but double-check it if you are having problems. The 32-bit database connection software is needed to run Transformer 6.x and Impromptu.


Passwords
When working with models, it is often a good idea to save the model as an .MDL file as well as a .PYx file. Passwords are embedded in .PYx models, which are binary files; they are not embedded in .MDL models, which are text files. This is correct behavior of the product and is documented in the PowerPlay Administrator's Guide. Therefore, if you have embedded your password into your model, saved it as both .PYx and .MDL, closed the tool, and then decided to work with the .MDL model, Transformer will not save the password information in this version of the model. Instead, it will default to prompting for the password at sign-on. If you try to restore this model from the client to the server and run it on the server, the process will fail because the server cannot prompt for the password. To avoid this problem, either use the -k option (see below) on the UNIX command line to build the cube, or add the password information to the model, save it as a .PYx file, and use this format to synchronize the server cubes.

One method for moving models from the client to the server and building on the server is to FTP the .MDL version of the model, process the .MDL model on the server into a .PYx, and then build from the UNIX command line. When using this method, since passwords are not embedded in the .MDL file, you should use the -k option of rsserver to explicitly set the password. All rsserver command-line options are documented in Chapter 5 of the Transformer for UNIX Guide.

If a model suddenly stops working, make sure the passwords used are still valid, since they may periodically change.

Before any communication between the client and server can take place, Transformer will prompt you to connect to the UNIX server with a user ID and password. By default, security is enforced by netd, and the user ID and password from login are checked in the /etc/passwd file OR another password file for TRUSTED systems. Examples of password files for trusted systems are:

AIX: /etc/security/passwd
HP-UX (10 and higher): /tcb/files/auth/
Sun Solaris: /etc/shadow

Therefore, in order for netd to allow you to connect to the server, the user ID and password information must be located in /etc/passwd or one of the files above for authentication; otherwise you will not be allowed to connect.


Some Server file information


When Transformer Server builds a cube, it creates several temporary files. One is a .QYx file, a temporary work file; another is a .LCK file, which guards the model from being used by other processes at the same time. After work with a model is finished, these files are removed. However, if the Transformer process is killed or otherwise fails abruptly, these files may not be cleaned up. If a .LCK file exists for a model, Transformer sees the model as locked and does not allow you to use it. If this occurs and no processes are using the model, you can remove the .LCK file to unlock it and continue.

Note the supported data sources. Server-processed cubes cannot use Impromptu HotFiles or PC-format files, such as dBase files. Supported data sources are:

- .IQD files, to get data from databases that can be accessed from the UNIX server
- delimited-field text files
- fixed-field text files

There needs to be enough temporary space on the server to build PowerCubes. You can set DataWorkDirectory and ModelWorkDirectory to places that are large enough to force Transformer to use these as work areas. This space should be larger than the expected size of the cube. See Configuring Work Locations for Transformer on page 16.

Special needs of Database Cubes


If you are storing your cube in a database, but your database option is not available in Transformer, update the Database Support section in cognos.ini by putting 1 after the = for the database in which you would like to store the cube. The option should then be available in Transformer.

While trying to update a database cube, Transformer may return the following error:
(TR1102) Transformer Server detected a model error at line 2 in file (TR1901) PowerPlay Data Server is already in use in an incompatible mode

This error might indicate that another user is connected to the database cube. Have the other user disconnect from the database cube until the cube update is complete. Transformer attempts to apply an exclusive lock to the database when performing a database cube operation; if someone else is already using that database, the lock cannot be applied and an error is returned.
Space Requirements

If you are building cubes in an RDBMS container, the space requirements are different from those required to build the equivalent Cognos standard cube. This section describes the results of an internal test performed using Oracle. It is based on a specific model; actual sizes will vary depending on the specific structure of the client model and the size of the data source:

- As a general rule, an Oracle PowerCube will occupy approximately twice the space of its equivalent standard PowerCube. For example, if you build a standard cube that produces an 80MB MDC file, you would need approximately 160MB of Oracle tablespace to build the equivalent RDBMS cube, including indexes. Of the 160MB, Transformer would use approximately 10% for indexes and the rest for data; the end result would be a 144MB data tablespace and a 16MB index tablespace.
- There are 17 tables used to store a PowerPlay 6.0 cube in an Oracle database. Only a few of these tables store significant amounts of data, with the XXXX_BITMAP table occupying the bulk of the space. The following are suggested extent sizes for PowerPlay 6.x:
Table Name            Initial Extent    Next Extent
XXXX_ACCESS           5K                5K
XXXX_ALLOCATION       5K                5K
XXXX_APPLICATION      5K                5K
XXXX_APPLICATION60    5K                5K
XXXX_BITMAP           1M                1M
XXXX_CAT_INDEX        1M                1M
XXXX_CATEGORY         1M                1M
XXXX_CELL             5K                5K
XXXX_CLASS            5K                5K
XXXX_CURRENCY         5K                5K
XXXX_DIMENSION        5K                5K
XXXX_DRILL            1M                1M
XXXX_EXCHANGE         5K                5K
XXXX_LEVEL            5K                5K
XXXX_MEASURE          5K                5K
XXXX_PARTITION        5K                5K
XXXX_TEST             1M                1M

With very large cubes, it's advisable to make the INITIAL and NEXT extents for the XXXX_BITMAP table much larger, depending on the number of times the table can extend. Also, the XXXX_CELL table is used only if crosstab caching is enabled in the PowerCube property sheet (Processing tab); if that option is selected, change its extent sizes to 1MB. The default table creation scripts supplied with Transformer do not create the tables and indexes in separate tablespaces. You can modify these scripts to use different tablespaces for the indexes and tables if necessary or desired.

RDBMS Tunable Parameters

When creating cubes in an RDBMS, there are certain parameters you can set that may speed up Transformer's performance.

Oracle:
  The Fetch Number of Rows entry in the [Misc] section of cogdmor.ini controls how many rows are retrieved during a fetch operation during data read. The default setting is 10; try bumping it up to 50 or 100. For larger cubes, try creating your indexes and tables in separate tablespaces. Certain of Oracle's default configuration settings are in conflict with Transformer's requirements.

Sybase:
  The NUMOFROWS setting in the [Misc] section of cogdmct.ini controls how many rows the server returns to Client Library for each client fetch request. Bumping it up (for example, to 50 or 100) can have a positive impact on some systems.

DB/2:
  In the cognos.ini file, the following entry controls the level of locking for DB/2 during cube updates:

    [PowerPlay Server - DB2]
    security=0/1

  The default setting is 1 (security enabled). When you disable security, you will not be able to tell if someone is using the cube, and a cube build will overwrite an existing cube. This won't necessarily be apparent to the client, but data could change during a client session. You should ensure that no users are connected when you build cubes.
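As a sketch, raising the Oracle fetch size means editing the [Misc] section of cogdmor.ini as described above; the value 100 is simply the upper figure suggested in the text:

[Misc]
Fetch Number of Rows=100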


Creating an Effective Working Environment


Directory Structures
For the purposes of discussion, assume that PowerPlay Server Transformer is installed in /opt/cognos. This directory contains all of the files installed with the product. Also assume that the corresponding working directory is /home/cognos; this directory contains all of the models, log files, cubes, scripts, and temporary workspace used by PowerPlay Server Transformer. The working area could live in the same directory where the product is installed, but most UNIX applications are kept separate from work areas, so it's best to follow this convention. The sub-directories that you could create are:
Location                    Description
/home/cognos/cubes          The directory where the cubes are saved.
/home/cognos/models         The directory where the models are saved.
/home/cognos/log            The directory where log files are saved.
/home/cognos/scripts        The directory where all scripts are saved.
/home/cognos/scripts/bin    The directory where script utilities (such as buildscript) are kept.
/home/cognos/work           The directory used for work or temporary space.
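If you are starting from a clean system, a single command can create this layout (a sketch; adjust the base directory to suit your site):

# Create the working directory tree for Server Transformer
mkdir -p /home/cognos/cubes /home/cognos/models /home/cognos/log \
         /home/cognos/scripts/bin /home/cognos/work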

Configuration Files
The following is a listing of files used by PowerPlay Server Transformer that have either been created or edited after installation, along with explanations. The only addition that must be made to ensure proper usage is to the ".profile" file. The changes to the other files are not strictly necessary to run PowerPlay Server Transformer, but they either make the product easier to use or add more logging capabilities. In the listings that follow, the added or edited items are called out in the accompanying descriptions.
.trnsfrmr.rc

/home/cognos/.trnsfrmr.rc: This is a resource file that was created after installation. It contains default settings for PowerPlay Server Transformer to use when executed on the UNIX command line (not when executed via PowerGrid from the PC).
FilenameVariables=true
ModelWorkDirectory=/home/cognos/work
DataWorkDirectory=/home/cognos/work
DataSourceDirectory=/home/cognos/work
CubeSaveDirectory=/home/cognos/cubes
ModelSaveDirectory=/home/cognos/models
LogFileDirectory=/home/cognos/log
LogDetailLevel=6

.profile (or .login)


/home/cognos/.profile: The UNIX system administrator created this sample login file when the user "cognos" was added to the system. The environment-variable settings shown below were added to ensure the proper environment is in place for PowerPlay Server Transformer when the user logs into the system. The directories /home/cognos/scripts and /home/cognos/scripts/bin are added to the path; these contain the scripts created to automate the building process. Also, the setpya.sh file is called to set up the PowerPlay environment variables. It is usually safest to add this as the last line of the file, since it may depend on other environment variables being set first (for example, database-specific environment variables).
TERM=vt100; export TERM
ORACLE_TERM=vt100; export ORACLE_TERM
ORACLE_HOME=/fin030601/oramtmo/7.3.2.2; export ORACLE_HOME
ORACLE_SID=sin3dev1; export ORACLE_SID
. /usr/local/bin/oraenv
#PATH=$PATH:.
PATH=/usr/local/bin:/usr/bin:/fin030601/oramtmo/7.3.2.2/bin:/home/cognos/scripts/bin:/home/cognos/scripts
EDITOR=vi; export EDITOR
# Set up the shell environment:
# Set -u
trap "echo logout" 0
stty erase "^H" kill "^U" intr "^C" eof "^D"
. /opt/cognos/pya65_278/setpya.sh


Creating an Effective Working Environment

rsserver.sh
/opt/cognos/powergrid/tcpip/netbin/rsserver.sh: This file is installed with PowerPlay Server Transformer and is used by PowerGrid to set the necessary environment variables and start the rsserver process. It is only used when executing tasks from the PC (via PowerGrid) on the server. The settings near the bottom of the listing supply defaults for PowerPlay Server Transformer to use when executing tasks from the PC. Also note that the necessary database environment variables are set near the top of the file.
#!/bin/sh
# $Header: /High Sierra/UNIX/setup_ux/RSSERVER.SH 4 Taos $
# $Author: Taos $
# rsserver.sh - Bourne-Shell script to launch Transformer via PowerGrid   3/11/97 11:01a
#
# set environment variable for Oracle
ORACLE_HOME=/fin030601/oramtmo/7.3.2.2 ; export ORACLE_HOME
. /usr/local/bin/oraenv
# set environment variables required to run rsserver
PYA_USR=${PYA_USR:=/opt/cognos/pya65_278}
. $PYA_USR/setpya.sh
if [ x"$DSQUERY" = x ] ; then
    DSQUERY=SYBHPUX491 ; export DSQUERY
fi
if [ x"$PLANG" = x ] ; then
    PLANG=english ; export PLANG
fi
# You may modify values and remove the comment indicator # to use
# these environment variables to modify Transformer's behaviour.
# (Preference values established here override any set by setpya.sh above.)
# The documentation describes these and many additional controls.
DataSourceDirectory=/home/cognos/work ; export DataSourceDirectory
LogFileName=/home/cognos/log/$$.log ; export LogFileName
ModelSaveDirectory=/home/cognos/models ; export ModelSaveDirectory
CubeSaveDirectory=/home/cognos/cubes ; export CubeSaveDirectory
DataWorkDirectory=/home/cognos/work ; export DataWorkDirectory
# launch Transformer in PowerGrid mode...
exec $PYA_USR/bin/rsserver -w


Using Server Transformer

netpgrc
/opt/cognos/powergrid/tcpip/network/netpgrc: This file is installed with PowerPlay Server Transformer and is used to set the necessary environment variables and start the netd daemon process (PowerGrid). The added line "env > $NPPATH/netd_env.log" redirects the output of the env command (which displays the current environment variables) to the log file "netd_env.log" in the netbin directory. The other addition, the -d -f flags on the netd command line, tells PowerGrid to log detailed actions to a log file called "netd.log" in the netbin directory. This is especially useful when troubleshooting PowerGrid. Use this file to start the PowerGrid netd process, because it ensures the proper environment variables are set and enables logging.
#!/bin/sh
##############################################################################
# COGNOS INC. PowerGrid Network Daemon rc script for all UNIX platforms
##############################################################################
COGNOS=/opt/cognos
export COGNOS
PG_VER_DISPLAY=2.0.6
#
# Sourcing all Cognos client-server products Environment variables
#
if [ "`echo /opt/cognos/powergrid/tcpip/netbin/net??rc`" != /opt/cognos/powergrid/tcpip/netbin/net??rc ] ; then
    for script in $COGNOS/powergrid/tcpip/netbin/net??rc
    do
        . $script
        echo "sourced Cognos Products environment through $script"
    done
fi
#
NPPATH=$COGNOS/powergrid/tcpip/netbin
export NPPATH
#
# The following sample lines are used by NETD for a non-default comm. port.
# If the env. variable NPNETD is set up here, NETD will NOT pick up
# any port number from /etc/services. Please remove the "#" from both lines
# and change yyyy below to the actual port number before re-starting NETD.
#
#NPNETD=ISyyyy
#export NPNETD
#
LANG=C
export LANG
SRVCMSGS=$COGNOS/powergrid/tcpip/network/srvcmsgs.cat
export SRVCMSGS
#
COGNLSTAB=$COGNOS/powergrid/tcpip/network/coglangtab
export COGNLSTAB
#
env > $NPPATH/netd_env.log
if [ `uname -m` = i386 ]; then
    echo "SCO PowerGrid Network Daemon Version $PG_VER_DISPLAY"
    su root -c "$COGNOS/powergrid/tcpip/network/netd &"
else
    echo "UNIX PowerGrid Network Daemon Version $PG_VER_DISPLAY"
    $COGNOS/powergrid/tcpip/network/netd -d -f &
fi

Automation Scripts
The following are examples of scripts that can be used to automate the scheduling of PowerPlay Transformer cube builds. These scripts assume the same installation and work directories mentioned earlier; the work directories must be in place for the scripts to work. File 1, buildscript, is the shell program that can be called to build a script to run any model. To use buildscript:

1. At the UNIX command prompt, type "buildscript".
2. When prompted, type the name of the model you are building the script for.

An example:
fin_hp3:/home/cognos-> buildscript
Please enter the name of the model to create the run script for.
ddbcube
Building /home/cognos/scripts/run_ddbcube.sh
------ Finished ------
fin_hp3:/home/cognos->

In the above case, the script "run_ddbcube.sh" can now be added to a crontab and scheduled to run automatically. File 2, generic, is the template that buildscript works from. File 3, run_ddbcube.sh, is a sample script created using buildscript. File 4, crontab, is a sample of a crontab that can be used to schedule tasks in UNIX. Run the "crontab -e" command in UNIX to create this file. Note: when more than one task needs to be run, add the tasks to the same crontab using the "crontab -e" command. Refer to your UNIX documentation for more information on cron.

File Locations
For the automation scripts to work, they need to be kept in the directory locations they currently reside in:

/home/cognos/scripts/bin/buildscript
/home/cognos/scripts/bin/generic

All scripts built should reside in /home/cognos/scripts. To run the scripts, the user should log into UNIX with a user ID set up to work with Server Transformer.
File 1: /home/cognos/scripts/bin/buildscript

This file uses /home/cognos/scripts/bin/generic as a template and builds scripts to run models with PowerPlay Server Transformer. Before running it, you must apply execute rights to the file.
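For example, assuming the installation layout above:

chmod +x /home/cognos/scripts/bin/buildscript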
#! /bin/sh
# Edit "workdir" to point to the directory where the sub-directories for
# models, cubes, logs, etc. are located
workdir=/home/cognos
echo "Please enter the name of the model to create the run script for. "
read modelname
echo "Building $workdir/scripts/run_$modelname.sh"
# Replace all occurrences of "model_name_here" with the input name
sed "s/model_name_here/$modelname/g" $workdir/scripts/bin/generic > $workdir/scripts/run_$modelname.sh
# Make the shell script executable
chmod +x $workdir/scripts/run_$modelname.sh
echo "------ Finished ------"

File 2: /home/cognos/scripts/bin/generic

This file is used by /home/cognos/scripts/bin/buildscript as a template for building scripts that run models with PowerPlay Server Transformer.
#! /bin/sh
# This script runs the given model file.
# Edit workdir to point to the directory where the sub-directories for
# models, cubes, logs, etc. are located
# Edit installdir to point to the appropriate pya* directory where the
# install files are located
workdir=/home/cognos
installdir=/opt/cognos/pya65_278
. $HOME/.profile
. $installdir/setpya.sh
# Save the previous log file (if it exists)
if [ -e $workdir/log/model_name_here.log ]; then
    mv $workdir/log/model_name_here.log $workdir/log/model_name_here.bak
fi
# Run model_name_here and output to a log file
rsserver -c -i $workdir/models/model_name_here.pye > $workdir/log/model_name_here.log 2>&1 &
ret_stat=$?
exit $ret_stat

File 3: /home/cognos/scripts/run_ddbcube.sh

This is a sample of a script created by /home/cognos/scripts/bin/buildscript. In this example, the name of the model is ddbcube.
#! /bin/sh
# This script runs the given model file.
# Edit workdir to point to the directory where the sub-directories for
# models, cubes, logs, etc. are located
# Edit installdir to point to the appropriate pya* directory where the
# install files are located
workdir=/home/cognos
installdir=/opt/cognos/pya65_278
. $HOME/.profile
. $installdir/setpya.sh
# Save the previous log file (if it exists)
if [ -e $workdir/log/ddbcube.log ]; then
    mv $workdir/log/ddbcube.log $workdir/log/ddbcube.bak
fi
# Run ddbcube and output to a log file
rsserver -c -i $workdir/models/ddbcube.pye > $workdir/log/ddbcube.log 2>&1 &
ret_stat=$?
exit $ret_stat

File 4: Sample crontab with entry to run script: run_ddbcube.sh

cron is the UNIX scheduling utility; it uses a crontab (like the one below) to list and track the scheduled tasks it needs to run.
# ===================================================================
# crontab input file to run cognos
# ===================================================================
# flags (, between mult values, - for range, * for all)
# minute       = 0 to 59
# hour         = 0 to 23
# day of month = 1 to 31
# month        = 1 to 12
# day of week  = 0 to 6 (0=Sun, 1=Mon, 2=Tue, 3=Wed, 4=Thu, 5=Fri, 6=Sat)
# ===================================================================
# ------- ddbcube --------------
# This will run daily at 4:00 am
00 4 * * * /home/cognos/scripts/run_ddbcube.sh
# ==== end of crontab input file ====


CHAPTER 4

Cube Build Samples

This section describes the results of a series of Cognos internal tests conducted using PowerPlay version 6.5. All results were obtained on the following server:
Digital Prioris Model ZX-109
Dual 200MHz Pentium Pro processors
512 MB memory
Digital Storage Array (DS-SWXRA-HA) configured with 130GB of storage (RAID-5)
Drives in the array are DS-RZ1DB-VW (9.1 GB Ultra SCSI)


Cube Build Results


The following test results outline Transformer cube build times for three different types and sizes of cube (Row Dominant, Category Dominant, Row & Category Dominant):

                         Row Dominant   Category Dominant   Row & Category Dominant
Cube Build Time (Min.)   41             31                  336
Cube Size (MB)           60             170                 410

Two of the three models used in the benchmark tests originated from actual clients (Row Dominant and Category Dominant). The Row and Category Dominant model was manufactured internally but its design was based on actual client models.

Environmental Settings and Methodology


For all Transformer runs, the following environmental settings were used:

- In the [PowerPlay Server] section of cognos.ini, the value for WriteCacheSize was set to 32000.
- In the [Services] section of cognos.ini, the value for Temporary was set to a directory that had a large amount of available disk space.
- In the [Services] section of cognos.ini, the value for SortMemory was set to 51200, the equivalent of 100MB.
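Concretely, the cognos.ini entries for these runs would look like the following (the temporary directory shown is an illustrative path, not the tested value):

[PowerPlay Server]
WriteCacheSize=32000

[Services]
Temporary=/bigdisk/tmp
SortMemory=51200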

The following steps were repeated 4 times for each of the Transformer models:

1. All unneeded processes on the test machine were halted.
2. An MDL script was then used to run the following sequence:
   - Open the test model.
   - Change the name of the PowerCube (using a numbered sequence).
   - Build the PowerCube.
   - Save the populated model to PY.

Stated results are an average of the 4 test run timings.


Row Dominant Model


The row dominant model was used to create a Cognos Standard cube with the following settings:

- auto-partitioning with 5 passes maximum and a maximum partition size of 500,000 records
- crosstab caching enabled

Dimension Map and Model Attributes

Model Attribute                       Description
Number of Categories                  32,000
Number of Dimensions                  5 (measures not counted as a dimension)
Number of Measures                    4 (one calculated): 2 x 64-bit, 1 x 32-bit, 1 calculated after rollup
Source Data Format                    ASCII (comma-delimited)
Number of source files                14 (11 provide structure, 3 are transactional)
Number of transaction input records   10 million
Size (in MB) of all source files      570

Enable Multi Processing was set for the 4 largest queries.


Category Dominant Model


The category dominant model was used to create a Cognos Standard cube with the following settings:

- auto-partitioning with 5 passes maximum and a maximum partition size of 50,000 records
- crosstab caching enabled

Dimension Map and Model Attributes

Model Attribute                       Description
Number of Categories                  146,238
Number of Dimensions                  5 (measures not counted as a dimension)
Number of Measures                    15, all 64-bit
Source Data Format                    ASCII (comma and ~ delimited)
Number of source files                6 (5 provide structure, 1 is transactional)
Number of transaction input records   1 million
Size (in MB) of all source files      224

Enable Multi Processing was enabled for all but the smallest query.


Row and Category Dominant Model


The row and category dominant model was used to create a Cognos Standard cube with the following settings:

- auto-partitioning with 5 passes maximum and a maximum partition size of 500,000 records
- crosstab caching enabled

Dimension Map and Model Attributes

Model Attribute                       Description
Number of Categories                  492,152
Number of Dimensions                  5 (measures not counted as a dimension)
Number of Measures                    5 (two calculated): 3 x 64-bit, 2 calculated after rollup
Source Data Format                    ASCII (~-delimited)
Number of source files                9 (6 provide structure, 3 are transactional)
Number of transaction input records   50 million
Size (in MB) of all source files      2,280 (2.28 GB)

Enable Multi Processing was set for the 4 largest queries.


Part II

Zen and the Art of PowerPlay Enterprise Server Configuration

The most common question we receive with respect to PowerPlay Enterprise Server is: How many servers of type A do I require to service requests for n users against m cubes? This question is difficult to answer without some understanding of the following:

- How well are the target cubes designed? This will impact how long the average query takes to complete, and how much CPU will be eaten up in servicing requests.
- How large and complex are the average queries? Is the average query a 50 row report, or a 10,000 row report with nested levels, all measures, and zero suppression?
- How many concurrent users (peak) will be accessing the cube(s)?


Note: At this time, sizing guidelines are for NT Server only. A later version of this paper will include information on UNIX servers.


CHAPTER 5

PowerPlay Web Stress Testing

This chapter describes results from a set of internal tests run against PowerPlay Enterprise Server. Based on the results, it makes recommendations for server sizing.


Test Tools and Environment


This section outlines results to date using an internally developed stress-testing tool for PowerPlay Web. This tool occupies the same space in the PowerPlay architecture as the cgi gateway. The tool (called CAbuse, for Cube Abuser) submits random requests to a given cube on a given PowerPlay Server, one after the other, with no pauses whatsoever. Several instances of CAbuse running against a particular cube (or mapped cube) on a server simulate a very large number of concurrent requests, and hence a large number of concurrent users. A couple of things to note:

- These test results do not include the overhead of sending HTML query results from a request back to the cgi-gateway, through the web server, and ultimately back to the web browser. The results are based strictly on observed times for the query processor to handle requests. Under normal circumstances, this should be the bulk of processing time. In high volume client installations, similar throughput will be achieved on the PowerPlay Enterprise Server machine, provided that the cgi-gateway and web server are on a different machine. The machine that runs the web server must be large enough to handle the volume of HTML files coming from PowerPlay Enterprise Server.
- Because CAbuse generates random queries very quickly, it simulates about the worst possible load imaginable. The randomness of the queries ensures that very little advantage is derived from the server's PPDS cache. Also, because it's truly random in its choice of links to follow, CAbuse periodically generates a phenomenally large query -- a hundred thousand rows nested by several columns, displayed as bar charts, for example. The tests, therefore, do include some very large queries (as well as, admittedly, some very small ones).
- We have not (to date) run many tests against UNIX servers. The results in this document are primarily for NT Server. UNIX tests are planned and will be documented as soon as they're available.


Test Results (Windows NT Server)


The following table outlines test results derived from running multiple instances of the Cube Abuser against the Row Dominant cube (see the Row Dominant Model section in Chapter 4) on two different NT servers:

Maximum Processes   Server1 (Requests/Sec)   Server2 (Requests/Sec)
1                   5.74                     6.55
2                   7.80                     11.3
3                   7.11                     12.42

Server1: Compaq DeskPro 6300, 1 x 300MHz CPU, 320MB RAM.
Server2: IBM Netfinity Server, 2 x 350MHz CPU, 512MB RAM.

The above results are average times over several repeated tests, with varying levels of incoming queries. The tests were run with PPDS Read Cache settings ranging between 32MB and 48MB -- no significant difference was discerned between these settings. However, tests run with a PPDS Read Cache setting of only 8MB yielded far lower overall throughput. For example, with an 8MB PPDS Read Cache setting, the dual-processor NT server (Server2) achieved a maximum throughput of about 7 or 8 requests per second, far lower than was achieved with a higher PPDS Read Cache setting, indicating that the cache does have a positive effect.

The tests indicate that a single PPDSWEB process will handle slightly better than 50 concurrent users (5 to 7 requests per second), and that, on a dual processor PC, two processes will handle more than 100 concurrent users (10 to 12 requests per second). The results confirm that multiple CPU machines can dramatically increase the throughput possible with PowerPlay Web, up to the maximum number of fully loaded PPDSWEB processes.


The following charts show output from NT Performance Monitor (%CPU usage for each CPU) and average requests completed over the duration of one instance of the above test on the IBM Netfinity Server (Server2):

[Chart: Percent CPU Usage]
[Chart: Average Request Time (ms)]


The following PowerPlay report, derived from the NT Event Log, shows the number of requests completed per minute during a test run on Server2:

[Chart: Average Queries/Minute -- queries completed in each one-minute period, with mean]

The following graph shows the average request time in milliseconds for the same test run:

[Chart: Average Query Time (ms) -- average query time in each one-minute period, with mean]
As you can see, the NetFinity server handled an average of nearly 600 requests per minute, with an average query time of under half a second. If we assume a concurrent user level of 10% (of licensed users), and concurrent request level of 10% (of concurrent users), that implies this server would handle 100 concurrent users, or 1000 licensed users, with very fast average query times, provided users were executing average queries.


Detailed logging was enabled for the duration of the test run. With logging disabled, slightly greater throughput should be achievable.

Enterprise Server Concurrency Test Results


To validate that the PowerPlay Enterprise Server does scale to multiple users, a set of tests was run with 40 PowerPlay clients connected to the PowerPlay Enterprise Server (NT). During this period, a tester submitted a series of benchmark queries against a very large cube, including drill down, drill up, nest rows and columns, and suppress zeros. While the PowerPlay Enterprise Server was under load, the impact on performance was minimal. The following table outlines the impact on performance:
                 Non-Load (sec)   Load (sec)   Impact %
Open Cube        2.37             2.97         25%
Drill Down       5.31             24.85        368%
Drill Up         7.34             8.67         18%
Nest Rows        2.47             2.00         -19%
Nest Columns     4.59             2.00         -56%
Suppress Zeros   7.50             19.32        158%

The decrease in time under load for Nesting is due to the effect of a populated cache on the PowerPlay Enterprise Server.


Test Results (Solaris)


Server characteristics:

SUN Enterprise 3500 Server
4 x 360 MHz CPUs
1GB RAM
4MB external cache

Cube characteristics:

Input Records: 10 million
Dimensions: 5
Categories: 32,000
Measures: 4
Cube Size: 62MB

A test run against the same cube on a quad processor Solaris server yielded sustained throughput of 800 to 900 requests per minute, with peaks of over 1000 requests per minute. For the test, 10 processes were set up (as Startup Processes) for the cube, and 64MB of PPDS Read Cache was allocated. For the duration of the run, CPU utilization rested at or above 90%, with no significant processes other than ppserver and the 10 ppdsweb instances on the machine. On average, the server handled approximately 15 requests per second. If we assume a concurrent user level of 10% (of licensed users), and concurrent request level of 10% (of concurrent users), that implies this server would handle 150 concurrent users, or 1500 licensed users, with very fast average query times, provided users were executing average queries.


PowerPlay Web Sizing Guidelines (NT)


Given the results in the preceding sections, it's reasonable to assume that each CPU in a single- or dual-CPU machine will handle about 5 requests per second for average queries against an average cube. For larger servers (quad, for example), assume about 4 requests per second per processor. To ensure that you achieve an acceptable level of throughput, follow these guidelines when setting up your system:
1. For large installations, strive for an average query response time that is short -- a few seconds at most -- with the server not loaded. A single user should see results back very quickly for nearly all queries. If this is not possible, then setting client expectations becomes critical. One thing to ask clients who insist on generating reports with a hundred thousand rows and 10 columns is: "When was the last time you opened a table like that on a regular HTML page?" You cannot expect blindingly fast performance if your average report contains a million cells with zeros suppressed.

2. For multi-processor systems on NT, the scalability from one processor to two has been found to be fairly linear for PowerPlay Web requests. That is, if a single-CPU P300 server will handle n concurrent requests per second (on average), then a dual-CPU P300 server should handle 2 x n concurrent requests of a similar nature, provided you set up 2 or more processes for the cube. This indicates that PowerPlay Web is relatively CPU-bound: the more CPUs the better, for large installations. Note: although performance improvements have been found to be linear from 1 to 2 processors, they are unlikely to be linear from 2 to 4 processors, because larger servers expend additional effort in managing system resources. Further tests in this area are pending.

3. Tests to date for load-balanced servers are inconclusive. However, preliminary tests indicate that load balancing across 2 identical servers does not yield a doubling of throughput; rather, each additional server adds approximately 50% additional capacity. For example, suppose Server A will handle n requests/second. A mapped cube that is mapped onto physical cubes on Server A and an identical Server B will handle approximately 1.5 x n requests/second.

4. The network can become the bottleneck in load-balanced environments. If two back-end PowerPlay servers can normally handle a set number of requests per minute apiece, and load balancing requests across them provides no performance gain, it's possible that either the network is too slow, or the Network Interface Card on the central dispatcher machine is handling as many requests as it can. A good indication of a network bottleneck is a low level of CPU usage on the machines where the PowerPlay Query Processors are installed, even though the number of incoming requests on the central dispatcher is very high.

Basic Server Specifications


The following table outlines possible sizings for PowerPlay Web installations using dual and quad processor NT servers. The table assumes relatively fast queries, and does not take into account other factors, such as network bandwidth or processes other than PowerPlay Enterprise Server using up machine resources:

Factors:
Number of Licenses        500    1000    5000    10000
% Users on System         10%    10%     10%     10%
% Users Executing Query   10%    10%     10%     10%
Avg. Query Time (secs)    1      1       1       1

Calculations:
Requests per Second       5      10      50      100
Number of CPUs            1      2       10      20

Servers Required:
Dual Processor Servers    1      1       5       10
Quad Processor Servers    1      1       3       5

Note: Requests per Second in this table represents the time required for the Query Processor to service a request (the times you see in the detailed log for the cube). It does not represent the overall time to submit the query from a browser, have it dispatched, processed, and returned to the browser via the web server. In large installations, the actual processing of the request will be the bulk of the query time, as the web server will be on a different machine than the PowerPlay server.
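The arithmetic behind the table can be sketched in a few lines of shell. This is only an illustration of the rule-of-thumb calculation above (about 5 requests per second per CPU, 10% of licensed users on the system, 10% of those submitting a query at any instant), not a supported sizing tool:

#!/bin/sh
# Rough PowerPlay Web sizing sketch (integer arithmetic, rounded up)
licenses=5000
avg_query_secs=1
# concurrent requests/sec = licenses x 10% x 10% / avg query time
req_sec=`expr $licenses / 100 / $avg_query_secs`
# assume ~5 requests/sec per CPU, rounding up
cpus=`expr \( $req_sec + 4 \) / 5`
dual=`expr \( $cpus + 1 \) / 2`
quad=`expr \( $cpus + 3 \) / 4`
echo "$req_sec req/sec -> $cpus CPUs -> $dual dual or $quad quad servers"

For 5000 licenses this prints 50 req/sec, 10 CPUs, and 5 dual or 3 quad servers, matching the third column of the table.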


You can use this table as a guideline to derive an estimate of the number of servers required to service requests for client installations by adjusting the figures. The key things to consider are the complexity of the requests being submitted and the percentage of users active at any given time; if either goes up significantly, the number of servers required to keep query response time in an acceptable range goes up accordingly. It's important to note that the above table provides guidelines for average queries against medium sized cubes (50 to 100 MB, relatively row dominant). For installations with lots of smaller cubes, it should be possible to reduce the number of servers required. For example, a site with 50 cubes, all of which are less than 500K in size, with low category and row counts, will not require as large a server as outlined above. Queries against such small cubes should be very fast (200 to 300 milliseconds), and hence you should see higher overall throughput. For installations with large category-dominant models and complex requests, the throughput may be lower than depicted in the above table.

Determining Average Query Time


In order to determine your site's average query time, you can submit a set of typical queries against your cube(s) with only a single user connected. Devise a set of queries that comprises a good representative set of end user queries, and include at least 20 or more query timings. If you're submitting requests from the web client, you can enable detailed logging to gather query timing results. The query timing results are stored in the file ppsrv.log (for UNIX) or in the Event Log (Windows NT). The values stored in these logs represent the number of milliseconds required for the PowerPlay Query Processor to complete submitted requests. Alternatively, you can time the queries manually. If you're submitting PowerPlay client requests, you must time the length of the request within the PowerPlay client. The values for PowerPlay client will be slightly higher than those for the web, as they include the time to submit the query via the dispatcher, as well as the time to return the results to PowerPlay. At present, there is no way to derive the query processor time for a PowerPlay client request independent of communication timings.
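Once you have collected the individual timings, averaging them is trivial. The sketch below assumes you have copied the per-request millisecond values from the log into a plain file, one value per line (the file name and format are our assumption, not a documented log layout):

# Average a file of per-request timings (milliseconds, one per line)
awk '{ sum += $1; n++ } END { if (n > 0) printf "average: %.0f ms over %d requests\n", sum / n, n }' query_times.txt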


Memory Requirements
Determining memory requirements for PowerPlay Enterprise Server depends on:

- the number of cubes registered on the server
- the maximum number of processes that you define for each cube
- the number of startup processes that you define for each cube

Each process that's active for a cube will use up to the amount of memory specified for the server's PPDS Read Cache size. A practical setting for the PPDS Read Cache is about 30% of the average cube size. An absolute ceiling for the PPDS Read Cache is currently 64MB; above this limit, the memory management for the cache can become more expensive than resubmitting the query. The maximum number of processes for a cube is truly a maximum. If you have a popular cube that incurs many hits periodically (monthly, for example), there's no penalty in setting a high number of maximum processes permanently. Unless you specify startup processes for the cube, the Enterprise Server initiates new ppdsweb processes only as they're required by new incoming requests. So, if you specify a maximum of 10 processes, but on average (except for month-end) only 2 users ever access the cube per day, then only 2 processes will consume resources on an ongoing basis. If the amount of resources being used is a concern, you can minimize the length of time that a process remains active after the most recent request. Use the PowerPlay Enterprise Server Administrator to set the process timeout:

Process Timeout sets how long a query processor remains in memory before it shuts down due to inactivity.

Setting a shorter timeout causes idle processes to terminate sooner, freeing memory for other processes. However, it can also reduce the effectiveness of caching.
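As a back-of-the-envelope check of the rules above, the worst case for cache memory is every registered cube running its maximum number of processes, each holding a full read cache. The cube names, process counts, and cache sizes below are hypothetical:

#!/bin/sh
# Worst-case PPDS read-cache memory across registered cubes.
# Each entry is "cube:max_processes:cache_MB" (illustrative values).
total=0
for entry in "sales:4:20" "finance:2:30" "inventory:2:10"; do
    procs=`echo $entry | cut -d: -f2`
    cache=`echo $entry | cut -d: -f3`
    total=`expr $total + $procs \* $cache`
done
echo "Worst-case cache usage: ${total}MB, plus OS and server overhead"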


Deciding on the Number of Processes


There is a tendency to add more processes for a cube that is experiencing less than optimal throughput. Adding processes (the Maximum Processes cube property) makes sense only if one of the following is true:

- The server on which PowerPlay Enterprise Server is running has free CPU cycles.
- If there are no free CPU cycles, the current average request time is sufficiently fast that an increase in query processing time will still meet user expectations.

Deciding how many processes to run becomes clearer with the following analogy, which outlines two methods of achieving the same throughput. In a cafeteria, there are 10 users with lunches to heat, and only one microwave oven. You, as the cafeteria coordinator, have the following options:

- Let cafeteria users line up one after the other to heat their lunches in the microwave.
- Put all 10 lunches in the microwave at once, and have all users wait until they're all heated.

The first option ensures that some people get fast service -- those in the queue early will have their lunches heated quickly. Those at the back of the queue will have to wait until others have finished, but once their turns arrive, their lunches will heat as quickly as anyone else's. The second option ensures that everyone waits for pretty much the same period of time. But because the microwave contains 10x the volume of a single lunch, the time to get it all heated is relatively high. The result is that nobody gets an optimal heating time, but everyone waits for the same amount of time. Those who would have been at the end of the queue get faster service than they would have, while those at the front of the queue are slowed down.

Now think of the people with lunches to heat as incoming PowerPlay requests, and the number of lunches in the microwave as the number of processes that you're running for a cube. If the server (microwave) has free cycles (additional watts for heating), then adding processes (lunches) will provide better throughput. At the point where the server is running at or near 100% CPU, however, adding processes has a similar effect to placing several lunches in a microwave at the same time: no query has to wait for any other query to finish, but all queries take longer. The number of queries processed is the same, but the effect on end users can be very different. The best approach obviously varies from one situation to the next. You must balance the server's resources against both the number of processes that server can handle and the expectations of users. As a rule of thumb, if a server is running at or near 100% of CPU capacity, don't add more processes for busy cubes.


CHAPTER 6

Enterprise Server Tips and Techniques

There are some undocumented features in PowerPlay Enterprise Server that you can use to your advantage. These are:

- restricting the maximum query processing time for a cube
- setting the Query Processor threshold to balance requests across busy processes (PowerPlay client requests only)
- registering multiple cubes by directory name
- reducing the amount of HTML passed to the web browser
- passing security information to PowerPlay Enterprise Server from a client web application


Restricting Query Processing Time


For any cube, you can set an absolute amount of query processing time that PowerPlay Enterprise Server will allow for any submitted query. You set this limit in the ppwebconfig.dat configuration file:
<Control>
Connect Timeout=IN,15
Enabled=IN,1
Logging=IN,0
MaxProcess=IN,1
Process Timeout=IN,15
QPThreshold=IN,5
RequestTimeoutSec=IN,900
Startup=IN,0
</Control>

By default, RequestTimeoutSec is set to 900 seconds (15 minutes). You can lower this number to avert the possibility of users submitting runaway queries that consume up to 15 minutes of query processor time. It's important to note that once set for a cube, the limit applies to all queries submitted by all users; setting it too low may result in a high number of query failures.


Setting the Query Processor Threshold


For any cube, you can set a threshold level to control the load balancing for incoming PowerPlay client requests. This allows you to choose how new incoming requests are distributed, based on how busy each query processor is. Note: this setting applies only to queries submitted by PowerPlay or PowerPlay for Excel; it does not apply to PowerPlay Web requests. You set this limit in the ppwebconfig.dat file. For each cube, you'll find an entry called QPThreshold in the <Control> section:
<Control>
Connect Timeout=IN,15
Enabled=IN,1
Logging=IN,0
MaxProcess=IN,1
Process Timeout=IN,15
QPThreshold=IN,5
RequestTimeoutSec=IN,900
Startup=IN,0
</Control>

This figure represents a ratio of delay time for incoming requests to active query processing time. As a new request is received, there may be a delay period before it is picked up by a query processor (if the query processor is busy). Setting the QPThreshold value works like this:

- If QPThreshold is set to 0, each new request is submitted to the first available query processor. This is the same way that incoming PowerPlay Web requests are handled.
- If QPThreshold is set to 10, each new request is submitted to the next query processor in round-robin fashion, regardless of the load on that query processor.
- For any value between 0 and 10, each new request is submitted to the next query processor whose ratio of delay time to processing time is lower than the specified threshold. So, for a value of 5 (the default), an incoming request is submitted to the first query processor found that has less than a 50% ratio of delay time to processing time.
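The selection rule for intermediate values can be sketched as follows. The per-processor delay and processing figures are invented for illustration, and the comparison (delay/processing < threshold/10) is our reading of the description above, not documented internals:

#!/bin/sh
# Pick the first query processor whose delay:processing ratio is
# below QPThreshold/10. Entries are "name:delay_ms:processing_ms".
threshold=5
for qp in "qp1:600:1000" "qp2:300:1000" "qp3:100:1000"; do
    name=`echo $qp | cut -d: -f1`
    delay=`echo $qp | cut -d: -f2`
    work=`echo $qp | cut -d: -f3`
    # delay/work < threshold/10  <=>  delay*10 < work*threshold
    if [ `expr $delay \* 10` -lt `expr $work \* $threshold` ]; then
        echo "route request to $name"
        break
    fi
done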


Registering Multiple Cubes by Directory Name


A common administration requirement for the PowerPlay Enterprise Server administrator is to add cubes to, and remove cubes from, an enterprise server. While the command-line driven ADMTOOL makes this easier than in previous versions of PowerPlay Web, it can still be time consuming. In PowerPlay 6.0 (post build 19) and in PowerPlay 6.5, you can enable cubes based on the directory in which they're located. You can then access any cube in that directory by referring to it with an MDC parameter on the PowerPlay Enterprise Server URL. For example, on a server named SRV001, suppose you have 30 cubes (named CUBE 001 through CUBE 030) in the directory D:\OLAPCUBES\COGNOS. You could add each of the cubes to the Enterprise Server manually, or you could use the ADMTOOL to write a script that adds all of them. Alternatively, you can add the directory D:\OLAPCUBES\COGNOS\ as an entry named ALL CUBES, using either the PowerPlay Enterprise Administrator or the ADMTOOL. In the PowerPlay Enterprise Administrator, the entry looks like this:

[Screenshot: an entry named ALL CUBES with location D:\OLAPCUBES\COGNOS\]

Note the trailing backslash, which indicates that this is a directory entry.


Once this is done, you can then access any of the cubes within that directory using the following URL:
HTTP://SRV001/cgi-bin/ppdscgi.exe?XT=ALL+CUBES&MDC=cubename

where you replace cubename with the name of the cube that you want to access. Notes:

- For cubes that you enable in this way, ensure that you disable the Publish to a Table of Contents (Web) option on the Control tab. The directory entry is meaningless in the table of contents.
- You can maintain cubes in this way using the batch administration tool.


Reducing the amount of HTML Passed to the Web Browser


In high-volume installations, the amount of HTML generated by PowerPlay Enterprise Server can be very large. In such situations, it's advisable to change the default settings for the Page Size Default for your cubes in the PowerPlay Enterprise Server Administrator:
Lowering these numbers reduces the volume of HTML passed through the web server for each request.

By default, PowerPlay Enterprise Server returns 200 rows and 50 columns per page of generated report result. Setting this to a lower number reduces the amount of HTML passed back to the web server and on to PowerPlay Web users. Try setting the number of rows to a lower value, such as 50, and the number of columns to 10. This can have a big impact on the performance of your system if the web server is having trouble handling all of the query results being passed to it.


Passing Security Information to PowerPlay Web from a Web Front End


A common requirement for many clients is to prompt users for a user ID and password up front, when they log on to their web server, and then to authenticate those users in PowerPlay Web using the same login IDs. This requires the following:

- capturing the login information at the time of login, so that the user's security information can be passed in as parameters on the PowerPlay Enterprise Server URL
- optionally, to encrypt the login information, a Cognos-provided encryption routine

At run time, it's possible to pass security parameters to the Enterprise Server as part of the URL. The parameters you can use include:

Parameter              Description
U=<username>           The user name, as defined in the associated Cognos security file.
A=<password>           The user password, as defined in the associated Cognos security file.
L=<class>              The user class, as defined in the associated Cognos security file.
B=<db user id>         For cubes stored in an RDBMS or third-party OLAP container, the database user ID, as defined in the associated Cognos security file.
W=<db user password>   For cubes stored in an RDBMS or third-party OLAP container, the database user password, as defined in the associated Cognos security file.

So, the following URL entry would open cube SALESDATA, with User ID manager and User Password qwerty:
http://www.srv001.com/cgi-bin/ppdscgi.exe?XT=SALESDATA&U=manager&A=qwerty

Passing security information to the PowerPlay Enterprise Server in an unencrypted format as part of the URL is an unacceptable solution in many cases. If this is true, there is an unsupported Cognos library that you can use to encrypt the security information before you pass it to PowerPlay Enterprise Server. The .h file (with which you create a small program) associated with this library contains source code for a function named ppwbEncrypt. The ppwbEncrypt function takes as arguments whatever security parameters you wish to pass, and encrypts them along with a timestamp that expires after 15 minutes.
char * ppwbEncrypt (
    const char *dbUser,
    const char *dbPassword,
    const char *user,
    const char *password,
    const char *userClass);

The ppwbEncrypt function returns a time-stamped (valid only for 15 minutes), encrypted string that looks like this:
E5C7EA3AA7B3EFF1389A82E62BCE573C801A5EE3A124F52D7C4AEA4057B23F15A87F30EDC9DF068B74A39D7BFA

You can pass this encrypted string to PowerPlay Enterprise Server using the UX= parameter:
http://www.srv001.com/cgi-bin/ppdscgi.exe?XT=SALESDATA&UX=E5C7EA3AA7B3EFF1389A82E62BCE573C801A5EE3A124F52D7C4AEA4057B23F15A87F30EDC9DF068B74A39D7BFA

At present, the ppwbEncrypt library is available for NT Server and Solaris only. If you need this library, contact Cognos Field Support.


File Descriptors for PowerPlay Enterprise Server


For sites with large numbers of users, it may be necessary to increase the maximum number of file descriptors available per process for the PowerPlay Enterprise Server. This can be required because instances of the PowerPlay Query Processor program (ppdsweb.exe) open multiple sockets, each of which the operating system treats as a file. On many systems, the maximum number of file descriptors per process defaults to 64. To increase the maximum, use the ulimit command. Consult your UNIX documentation for more information.
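For example, in the shell that launches the server (syntax and limits vary by platform and shell; the value here is illustrative):

# Show the current per-process file descriptor limit, then raise it
# for this shell and the processes it starts.
ulimit -n
ulimit -n 256
# Start the PowerPlay Enterprise Server from this same shell so the
# new limit is inherited.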
