Zen and the Art of PowerPlay Cube Building
This document outlines several known facts and theoretical aspects of PowerCube creation in Transformer. It also discusses scalability in PowerPlay Enterprise Server.
Table of Contents

Part I: Zen and the Art of PowerPlay Cube Building
  Chapter 1: Choosing Your Hardware
    How Transformer Uses Memory
    Setting the PowerPlay DataServer Cache Size
    Setting the Memory Available for Data Sort
    Operating System Memory Settings
    Using Multiple CPUs
    Configuring Work Locations for Transformer
    Handling Very Large Work Files
    Disk Configurations
  Chapter 2: PowerPlay Partitioning
    When To Focus On Partitioning
    Environmental Constraints
      Network Bandwidth
      Cube Production Window
      Acceptable Query Response Time
      Server And End User Hardware
      After Environmental Constraints
    The New Partitioning Scheme
  Chapter 3
  Chapter 4
    Environmental Settings and Methodology
    Row Dominant Model
    Category Dominant Model
    Row and Category Dominant Model

Part II: Zen and the Art of PowerPlay Enterprise Server Configuration
  Chapter 5: PowerPlay Web Stress Testing
    Test Tools and Environment
    Test Results (Windows NT Server)
  Chapter 6
Part I
Nothing improves either build performance or runtime performance in a PowerPlay application more than taking the time to design your OLAP model and cubes well. However, there are some things that can improve performance for nearly any PowerPlay application. This part describes what you can do to optimize your PowerPlay cube builds. In addition, it describes in detail how to use Transformer successfully in a UNIX environment.
CHAPTER 1
Choosing Your Hardware
Choosing the right hardware to build PowerPlay cubes depends on the following:

1. Memory. This is probably the single most important aspect of a cube building machine.
2. CPUs. Transformer can take advantage of two CPUs during the data read phase, thereby achieving faster builds.
3. Disk Space and Configuration. Transformer is I/O intensive as it reads source data, builds temporary work files, and creates PowerCubes. Tuning your environment to give Transformer the best possible I/O performance will speed up your cube builds.
4. Networking Considerations. Ensuring fast access between Transformer and its data sources will provide optimal cube build times.
5. Partitioning. PowerPlay versions 6.0 and 6.5 include a new auto-partitioning algorithm that typically builds cubes faster than earlier versions.
How Transformer Uses Memory

A cube build passes through three phases:

1. Category/Work File Generation. Includes reading the data source and processing the rows to create categories and compressed work files.
2. Metadata Update. Deals with specifying the structure of the cube. Cube structure is some subset of the dimensions, categories, and measures available in the model, and it reflects what end users see in PowerPlay.
3. Data Update. Normally the dominant stage of cube processing in terms of time. This stage includes consolidation, partitioning, and updating the cube with the set of records that apply to the cube.
The last two phases (Metadata Update and Data Update) most often represent the bulk of cube build time. The main variables for controlling how Transformer uses memory are as follows:
On UNIX, the controls are:
- the PPDS_WRITE_MEMORY environment variable (default: 16MB)
- the SORTMEMORY environment variable

On Windows NT, the equivalent controls are the WriteCacheSize and SortMemory entries in cognos.ini (discussed below).
During a build, the physical memory on the machine is shared among:
- the Operating System
- the Disk Cache, which is maintained by the Operating System
- Transformer and the Transformer Object Repository (Transformer RP)
- Cognos Data Management Services (DMS)
- the PowerPlay Data Server (PPDS)
The Metadata Update and Data Update phases most often represent the bulk of processing time for Transformer. During these phases, PPDS, the Disk Cache, and to some extent the Transformer RP compete for memory, so the amount of physical memory available during these phases is critical to better build times.

As a rule of thumb, the best performance is achieved when enough physical memory is available for the Disk Cache to grow as large as the final size of the PowerCube (MDC). Building an MDC results in many inserts, updates, and deletes by PPDS; when the PowerCube fits entirely in memory (the Disk Cache), cube build performance is optimal because PPDS never needs to access information on the hard disk.

Assigning more memory to PPDS (via the PPDS_WRITE_MEMORY or WriteCacheSize entries) can also benefit build time, since it reduces the number of disk I/Os required by PPDS. However, this is beneficial only if it does not reduce the memory available to the Disk Cache. On systems with lots of memory, there is no reason not to consider increasing the PPDS write cache from the default setting, but increasing it indiscriminately can actually cause problems. Increasing the cache size to 32MB or 64MB on large systems can provide performance improvements; increasing it to a very large setting (hundreds of megabytes) is not generally advised.
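As a sketch of what this looks like in practice on UNIX (the 32MB value is illustrative, and the assumption that the variable is expressed in kilobytes should be verified against your Transformer documentation):

# Illustrative only: raise the PPDS write cache from the 16MB default
# in the shell that launches Transformer. The value assumes the variable
# is expressed in kilobytes; confirm the unit for your release.
PPDS_WRITE_MEMORY=32768 ; export PPDS_WRITE_MEMORY

On Windows NT, the equivalent change is made through the WriteCacheSize entry in cognos.ini.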
For example, in one recent test, building a 65MB cube on Windows NT on a PC with 512MB of RAM, raising WriteCacheSize from 4MB to 32MB improved build time by approximately 15%.
Using Multiple CPUs

The Enable multi-processing feature allows Transformer to use multiple processors (if they're available) during Transformer's data read phase.
Note: The Enable multi-processing feature is supported only for the following query types:
- Impromptu Query Definition (IQD)
- Delimited Field Text
- Delimited Field Text with Column Titles
Each distinct Transformer process should have its own dedicated CPU, and if you're using the Enable multi-processing feature, each instance of Transformer should have two dedicated CPUs. Each Transformer instance uses system resources independently of all other instances, so ensure that you have sufficient memory, disk space, and I/O bandwidth to support all instances. Each Transformer instance also requires its own set of configuration files; do not share directories for temporary files between Transformer instances.
Configuring Work Locations for Transformer

If you do not set this entry, Transformer uses whatever location is set up as the temporary file location in either UNIX or NT; most often this is /usr/tmp (for UNIX) or c:\temp (for NT). If you have not explicitly set the location for the temporary sort file, it's possible to run out of disk space during a cube build. Set the TEMPFILEDIRS (UNIX) or Temporary (NT) entry so that it points to a disk with space equal to or greater than the size of the cube. You can specify multiple directories separated by semicolons.

Note: You cannot set this entry for Windows NT using the Transformer UI; you can only set it by editing the cognos.ini file manually.
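For example, a minimal sketch (the directory names are illustrative) that points the sort files at two large volumes on UNIX:

# Illustrative paths: point Transformer's temporary sort files at two
# large volumes; multiple directories are separated by semicolons.
TEMPFILEDIRS="/vol1/cognos/temp;/vol2/cognos/temp"
export TEMPFILEDIRS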
Where possible, place Transformer's work locations on different disk controllers. For example, assume that on a Solaris system there are four physical drives, each with several GB of space, and each controlled by a separate disk controller (Controller1 through Controller4). A good approach might be to define the following locations (a sketch follows the list):
- ModelWorkDirectory: /cognos/model/work, on Controller1
- DataWorkDirectory: /cognos/data/work, on Controller2
- TEMPFILEDIRS: /usr/cognos/temp, on Controller3
- All other file locations (CubeSave, ModelSave): various folders on Controller4
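In rsserver.sh (or the shell that launches Transformer), that layout might be expressed as follows. This is a sketch; the variable names mirror the settings shown in the configuration files later in this document, and the paths are the illustrative ones above:

# Sketch: one work location per controller, as described above.
ModelWorkDirectory=/cognos/model/work ; export ModelWorkDirectory   # Controller1
DataWorkDirectory=/cognos/data/work ; export DataWorkDirectory      # Controller2
TEMPFILEDIRS=/usr/cognos/temp ; export TEMPFILEDIRS                 # Controller3
CubeSaveDirectory=/cognos/cubes ; export CubeSaveDirectory          # Controller4
ModelSaveDirectory=/cognos/models ; export ModelSaveDirectory       # Controller4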
Handling Very Large Work Files

Being able to use multiple drives also eliminates file size limitations set by the operating system. As Transformer creates cubes, it writes temporary files to the specified drives or directories in the order that you listed them, moving to the next location when the current one is full. These files are concatenated into one logical file, regardless of which drive they are on.
Disk Configurations
Transformer is I/O intensive. For a large cube build, Transformer can perform thousands of reads and writes to and from its temporary work file, the sort file, and the PowerCube (MDC file). To ensure that your cubes are built as quickly as possible, it's important that your environment be optimized for fast reads and writes. Consider the following when configuring your environment for optimal Transformer cube builds:
The speed of the physical devices available. When deciding on hardware for use in building large cubes, you will realize the best performance with very fast physical disk access times. The RAID level of your physical storage devices also has a significant effect on overall cube build time and on the throughput available to Transformer. The following table describes the effects of some (but not all) widely used RAID levels:
RAID Level                 Description
0 (striping)               Optimizes performance at the expense of data redundancy. Data is distributed among disks for performance, with no provision for redundancy. As a result, a disk crash can cause data to be lost across several disks.
1 (mirroring)              Emphasizes data redundancy at the expense of performance. Mirroring maintains multiple copies of data to ensure that, in the event of a disk failure, no data is lost.
5 (striping with parity)   Attempts to balance performance and data redundancy. Data is distributed across disks along with parity information about the data. In the event of a disk crash, the parity information allows for the reconstruction of data, so no individual disk can cause widespread loss of data.

Although RAID Level 5 is generally faster than mirroring, it can significantly slow down I/O relative to plain striping, as it must deal with parity information.
Because PowerPlay cubes should not be used as a primary storage place for data, RAID Level 0 (striping) is often the most appropriate choice for large cube builds, and will provide the fastest builds. In the event of a disk failure during a cube build, the cube can be rebuilt from the source data. If data redundancy is a requirement once the cube is built, the cube can be copied to another location.
CHAPTER 2
PowerPlay Partitioning
The goal of partitioning has always been to achieve a satisfactory end user query response time without exceeding the cube production window. Partitioning is a strategy that improves query response time at the expense of cube build time. In PowerPlay 6.x, partitioning occurs automatically using a best-guess approach that can be fine-tuned by the user. Choosing the right partitioning strategy is, however, subject to environmental constraints, including hardware power and the nature of the end user queries performed. This means the default auto-partitioning cannot always pick the right partitioning strategy for every situation.
Environmental Constraints
Environmental Constraints are limitations in the BI architecture that presently cannot be changed. These constraints impact the range of Partition Sizes that can be employed and, in general, apply to all cubes. Environmental constraints include:
1. Network bandwidth
2. Cube production window
3. Acceptable query response time
4. Server and end user hardware
Network Bandwidth
As a rule of thumb, avoid using large partitions when the PowerCube does not reside on the same machine as the PowerPlay Server. The impact of network bandwidth constraints comes from the new architecture of PowerPlay's Enterprise Server: communication of information between the PowerCube and the end user can involve multiple networks and hosts. In PowerPlay 6.5, it is also possible to network PowerPlay servers to distribute requests to the right OLAP data source.

The effect of partitioning on network traffic is most notable at the PowerPlay Server closest to the cube. Between this server and the cube, the volume of network traffic is proportional to the partition size and the number of requests, so using a smaller partition size will support a larger number of queries. Another example is end users who employ PowerPlay Personal Server to access a cube shared on the LAN. Query processing is performed by each individual user's Personal Server, and here again the volume of network traffic on the LAN is proportional to the partition size and the number of requests.

Where the cube and PowerPlay Server are resident on the same host, network constraints are exchanged for the higher volume limits associated with hardware buses and back-planes, which permit greater partition sizes. These higher limits may shift the constraint to the capability of the CPU to perform query calculations within the required response time.
Within these constraints, two critical strategies guide successful cube partitioning:
1. Answer the majority of queries from the first, or upper, partitions (called Summary Partitions).
2. Try to ensure that the information sought by a query can be found within a single partition.
The New Partitioning Scheme

Auto-partitioning is quicker and will usually produce a smaller, faster cube than 5.x manual partitioning. The differences between 5.x and 6.x partitioning are:

- Defaults: 5.x had no default partitioning; in 6.x, partitioning is ON by default, and the new auto-partitioning feature determines the way to partition any given cube.
- Modes: 5.x supported manual partitioning only; 6.x supports both manual and automatic partitioning.
- Controls: in 5.x, the only partitioning controls were partition numbers; in 6.x, partition numbers are assigned by a user only when partitioning manually, and this facility will be less commonly used. Instead, an extra tab on the Cube Properties dialog box provides controls that guide the auto-partitioning algorithm.
- Consolidation: in 5.x, consolidation of input records was off by default; in 6.x, Consolidation with Sort is ON by default.
- Partition levels: in 5.x, only 2 partition levels were recommended; in 6.x, up to 5 levels are commonly used.
Terminology Refresh
It is important to remember that a partition level is not the same as a level in a dimension. A partition level is a set of categories that receive the same partition number. These categories can and probably will come from more than one level in a dimension when using the 6.x auto-partitioning algorithm.
The following settings prevent Transformer from using the new cube build engine:

- Cube Optimization is not set to Auto-Partition
- Consolidation is set to either NO or YES with Presort
- Using externally rolled-up measures
- Using before-rollup calculated measures (use calculated columns instead)
Auto-Partitioning
There are two controls used by the auto-partitioning algorithm. In order of importance they are:
- The desired partition size (based on the estimated number of consolidated records)
- The number of partitioning passes
The desired partition size is set by a slide control that is designed to help users select sensible values. It makes the trade-off between cube build time and query response time very clear. However, it does not show the effect that a change in partition size has on the partitioning strategy employed. The Desired Partition Size is the main control in determining the partitioning strategy.
Dragging the slider to the right (towards Faster Cube Access) decreases the Desired Partition Size. Dragging the slider to the left increases the Desired Partition Size. Similarly, editing the Estimated Number of Consolidated Records changes the Desired Partition Size accordingly.
Transformer selects groups of categories that satisfy the partition size criteria you specify, and it does this in a way that optimizes ad hoc query performance for that partition size. This means that users should try several different partition sizes that satisfy their environmental constraints. Users will arrive at a suitable partition strategy quickest if they start with larger partition sizes and work toward smaller ones, rather than the other way around.

The number of passes can be adjusted at the end to find the best build time/query performance trade-off; it becomes a fine-tuning tool that safeguards against over-partitioning. Transformer doesn't necessarily use all the passes that you allow, but it will not use more. Transformer requires one pass for every partition level created. The smaller the partition size, the greater the number of partition levels created (and the longer Transformer will generally take to build the cube). This is why it is recommended that users start with larger partition sizes and work toward smaller ones. During this phase it is important to allow Transformer to use a large number of passes.
Cutting back the number of passes will not prevent Transformer from producing a partitioned cube; it simply controls the amount of effort Transformer will expend on a given partition size. Transformer always stops partitioning when the number of records in the summary partition drops below the specified partition size. The key point is that later passes may not produce significant improvements in query response time, and this is where the number of passes may be refined to get the right query performance/cube build time trade-off.

A useful tip: the last pass that Transformer performs is summary partition consolidation. If an additional pass does not yield a reduction in summary partition size, that pass can be skipped. (If both the summary partition and the level one partition finish with the same number of records, it is likely that Transformer would use more passes if you allowed it; it shows that the summary partition has not been consolidated.)
The rest of this chapter considers four common situations:

- Deciding whether to adopt 6.x auto-partitioning over your 5.x partitioning scheme
- Trying to improve on the auto-partitioning strategy
- Dealing with large cubes or unusually structured cubes
- Tuning specifically for the top 10 reports
Let's consider reasons for using auto-partitioning instead of a 5.x partitioning scheme. If the 6.x auto-partitioned cubes build faster and answer queries at least as quickly as 5.x cubes, there is no need to compare the actual partitioning strategies. Some time should be spent with auto-partitioning under 6.x, since it generally performs very well. If necessary, your 5.x partitioning scheme can be loaded into 6.x, but you must open the model in 6.x and change the default optimization setting in cube properties to auto-partitioning. In addition, you must remove all the manual partition information from your old model, otherwise the auto-partitioning algorithm will not be used. Also, if you want optimal performance, ensure that you don't use any features that may prevent Transformer from using the new cube build engine; see "The New Partitioning Scheme," above. Experience has shown that it is possible to improve on auto-partitioning in some situations. This might be achieved by starting with the partition points already
selected by the auto-partitioning feature. Note that the moment partitioning level numbers are assigned manually, auto-partitioning is disabled.

Most models have dimensions whose lowest level details are frequently accessed (for example, product codes). In these dimensions, it is important to manage categories with a high parent-to-child ratio and partition them accordingly. Normally auto-partitioning will do a good job provided you pick a partition size that is big enough. Too many partition levels will adversely affect lowest level detail reports. This is another reason why you should first auto-partition with a large partition size and work down in size to get acceptable query response.

After dealing with large detail dimensions, consider dimensions that are frequently summarized. These dimensions may be less obvious than the detail dimensions. If 80 percent of your user community creates summary reports with categories from a given dimension, consider partitioning at a high level in that dimension. Auto-partitioning will do this automatically so long as the partition size is small enough for Transformer to consider the dimension as a candidate for partitioning. The partition status information will show the partitions chosen by Transformer.

Using a small partition size may go against advice offered earlier. This is true if you have both frequently summarized dimensions and dimensions containing large category counts used primarily to report lowest level details. Unfortunately it is not possible to use different partition sizes on different parts of the cube; the best one can do is to favor either summary reports or lowest level detail reports.
Excluding Dimensions
The auto-partitioning algorithm does permit the exclusion of dimensions. Dimensions might be considered for exclusion if they are used only for lowest level detail reports. By excluding detail dimensions (which are often large) it will be possible to set the partition size smaller. Cube build time will be faster and summary queries will be serviced quickly. If the same model included partitioning on detail dimensions, lowest level queries would be slower since they are reporting across multiple partitions.
Other Tips
The moment any partitioning level numbers are assigned to either categories or levels, auto-partitioning is turned off and only the categories and levels specified manually are partitioned. When manually defining partitions in 6.x, the number of auto-partition passes must be set to the number of partition levels assigned manually, or greater. Whether in automatic or manual mode, Transformer 6.x requires one pass for each partition level defined.

In 5.x, we recommended that dimensions with alternate drill-downs were poor choices for partitioning, because the categories in alternate paths will likely come from multiple partitions, causing slow drill-down response times. In 6.x, auto-partitioning is more equitable in choosing candidate categories for partitioning; as a result, it will frequently partition categories from dimensions that have alternate drill paths without detrimentally affecting performance. Also, only the primary drill path is used for partitioning.

Manually defined 6.x partition level numbers don't necessarily define the order in which partitions are placed in the cube; partition level numbers may be reordered by Transformer.

When partitioning manually, rules similar to 5.x partitioning apply. If performance is not adequate after partitioning, repeat the process using a level (in some other dimension) that adequately divides the number of records in the summary partition. For each new level of partitioning, increase the partition number by 1. Try to select categories across the entire dimension and assign these categories the same partition number. Once complete, you should not be able to traverse a path from the root category to any leaf without encountering a category with an assigned partition number.
CHAPTER 3
In production environments, there is often a desire to leverage the CPU power and resources of a UNIX server to update PowerPlay PowerCubes. Processing production PowerCubes in this fashion enables you to centralize administration and cube storage, and allows PowerCube updates to become part of the existing data warehouse update process: after the warehouse updates, the PowerCubes are updated. Also, the UNIX server can often access the database faster because of location or network priority.

PowerPlay Server enables you to use Transformer's graphical user interface on the PC to model your two-dimensional data into multidimensional PowerCubes. However, the Administrator Server Edition also enables you to offload the creation of models and PowerCubes to the rsserver program (PowerPlay Server Transformer), a version of Transformer that runs on supported UNIX server platforms. This chapter addresses some common issues faced when using PowerPlay Server Transformer, and has two major sections:
Part I covers the basics: an installation checklist, step-wise testing, and some common pitfalls. Part II discusses how to create an effective working environment by configuring your working areas, customizing where files go, and automating cube builds.
The Basics
Often, users are confused about what the components of PowerPlay Server are and how they interact. Understanding what the pieces do is vital to successfully using the product. The main components of PowerPlay Server Transformer are:

PowerGrid. PowerGrid provides the basic networking services between client and server Transformer. The PowerGrid network daemon (netd) must be started and left running to respond to periodic requests from the client. PowerGrid uses the script rsserver.sh, which sets the necessary environment variables and starts rsserver, to launch Server Transformer. An additional utility, called NetInfo, is installed on the client machine and can be used to test your defined client/server connections. NetInfo verifies that messages are being sent to netd, and that netd is responding to them.

Server Transformer (rsserver). The rsserver program is the executable for PowerPlay Server Transformer that runs on the UNIX server. It provides the engine to create PowerCubes on any of the supported server platforms, so that the processing of data during category generation and cube creation takes place on the server. All PowerCube prototyping is still done on the client; however, you must install the PowerPlay Administration Server edition on the client in order to have the options available to process cubes on the server.
Installation Checklist
Before installing and working with the product, you should read the Installation Guide and the Transformer for UNIX Guide. The latter focuses wholly on Client/Server Transformer and contains explanations of all the environment variables that can be used to customize your working area. Reading these guides will also familiarize you with where specific information is located, so you can reference it later.
Ensure you have the required hardware and software. The supported operating systems and databases are:

Database Product and Version                      Platforms and Operating Systems
Oracle 7.3 (base) and 8 (current)                 HP-UX 10.20 (base) and 11.x (current)
                                                  Sun Solaris 2.5 (base) and 7 (2.7) (current)
                                                  IBM AIX 4.2 (base) and 4.3 (current)
                                                  Digital UNIX 4.0 (base) and 4.0E (current)
                                                  SINIX 5.43 (current)

Sybase SQL Server 11.2 (base), Adaptive           HP-UX 10.20 (base) and 11.x (current)
Server 11.5 (current), and Sybase                 Sun Solaris 2.5 (base) and 7 (2.7) (current)
Adaptive Server 11.9.2                            IBM AIX 4.2 (base) and 4.3 (current)
                                                  Digital UNIX 4.0 (base) and 4.0E (current)
                                                  SINIX 5.43 (current)

IBM DB2 Common Server 2.1 (base) and its          HP-UX 10.20 (base) and 11.x (current)
successor, Universal Database 5.x (current)       Sun Solaris 2.5 (base) and 7 (2.7) (current)
                                                  IBM AIX 4.2 (base) and 4.3 (current)

Informix is also supported; see "For Informix" later in this chapter.
Consult the README help file accompanying PowerPlay for any changes to the list of supported databases. Also, before beginning your installation, you should ensure you have everything on the following checklist:
- UNIX root password
- UNIX login and password for the user who will store server models and cubes
- PowerPlay Server CD-ROM
- Host name or IP address of the UNIX server
- UNIX shell (csh, sh, ksh, and so on); echo $SHELL to determine this
- You should be able to successfully TELNET from the PC to the UNIX server
- If possible, make sure a DBA will be available for questions, especially if you are planning to store cubes in the database. Many DBAs do not like to give out create and drop privileges; they want to be involved in this step.
- If possible, have a UNIX System Administrator available.
- Double-check the version of UNIX you are using (the command uname -a at the UNIX command line should tell you this) and ensure the version is supported.
- Make sure you know what database(s) are used and that the versions are supported.
- Your Cognos customer number. This will give you access to the Customer Support area and Knowledge Base on the Cognos web site for common solutions to configuration issues.
For Oracle
Ensure that:
- the environment variables ORACLE_HOME, TNS_ADMIN, and ORACLE_SID are set up and pointing to the correct locations. For example:
ORACLE_HOME=/train/oracle7.3/products
TNS_ADMIN=$ORACLE_HOME/network/admin
ORACLE_SID=ORAUX73
- the bin directory under Oracle is in the path.
- the Oracle login for the schema where you will place your database PowerCubes has create table, create index, and drop table privileges.
- OCI via SQL*Net (Oracle 7.3 and 8) or Net8 (Oracle 8) is installed on the PC.
- the same TNSNAME (case sensitive) is defined in both the client and server TNSNAMES.ORA files.
- you can successfully ping the Oracle instance using TNSPING from the client PC and from the server (see the check below).
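A quick way to exercise the last item, using the illustrative TNS name from the example above:

# Verify that the TNS name resolves and the Oracle instance responds.
# Run this on the UNIX server, and the equivalent from the client PC.
tnsping ORAUX73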
For Sybase
Ensure that:
- the environment variables SYBASE and DSQUERY are set up and pointing to the correct locations. For example:
SYBASE=/train/sybase11/server
DSQUERY=SYBTRAIN (the Sybase server name is listed here)
- the Sybase login has create database privileges.
- CT-Lib (Sybase 11 Client Library) is installed on the client.
- the same Server Name (case sensitive) is defined in the server INTERFACES file (stored in $SYBASE) and on the client PC (stored in the c:\sybase\ini\sql.ini file).
- you can successfully ping the Sybase SQL Server using WSYPING or DSEDIT (a quick server-side check follows).
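On the server side, a minimal sanity check (the server name is the illustrative one from the example above):

# Confirm the variables are visible to the shell that will run rsserver,
# and that the server name appears in the INTERFACES file.
echo $SYBASE $DSQUERY
grep SYBTRAIN $SYBASE/interfaces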
For Informix
Ensure that:
- the environment variables INFORMIXDIR and INFORMIXSERVER are set up and pointing to the correct locations. For example:
INFORMIXDIR=/train/informix
INFORMIXSERVER=INFSOL72 (the Informix server name is listed here)
- the Informix login has create privileges.
- ESQL/C 7.23 for Win32 (Informix Net 7.2 or higher) is installed on the client.
- the same Server Name is defined on both the client and the server.
- you can successfully access Informix using Ilogin.
Step-wise Testing
As discussed earlier, several components make up the PowerPlay Administrator Server environment. With step-wise testing, you ensure that each piece is working before you start using the tool. This way, if an environment setting or configuration step was left out, you can find it right away, and you have a better chance of isolating the problem area if something goes wrong. The following steps are provided as guidelines to make sure each piece works individually, building up to using all of the pieces together:
1. Install the server software and make sure PowerGrid is working.
   - Install the software on the UNIX server and the PC.
   - Configure the environment settings on the server. For client/server operations, set the database environment variables and source the setpya.sh script in the rsserver.sh file that resides in powergrid/tcpip/netbin. For command line operations, make the same modifications to the .profile or the .login.
   - Verify that the cognos.ini on the PC is set up correctly to run Transformer Server by ensuring that the Transformer Server service is defined. The entry looks like this:
[Service - Transformer Server]
NETWORK=rsserver.sh
   - Configure and start the PowerGrid service (netd) on the UNIX server. You may set up logging for netd (see "Automation Scripts" later in this chapter for information about how to do this). Do not forget to assign a port number to PowerGrid in the /etc/services file on the server.
   - Define a PowerGrid connection and test that PowerGrid is working correctly via netinfo.exe from the PC. See Chapter 2 of the Transformer for UNIX Guide for information about PowerGrid and NetInfo.
2. Make the environment cleaner and nicer to work in (see "Creating an Effective Working Environment"):
   - Create directory structures on the UNIX side for cubes, models, logs, and work directories.
   - Modify the Transformer Server startup script (rsserver.sh) to add preferences for logging, and so on.
   - Create a server preferences file (.trnsfrmr.rc).
3. Create a server PowerCube using Transformer Server with an ASCII file:
   - Create a simple model with an ASCII file (the national.asc sample) and set it up to work in client/server mode.
   - Move the model file to the server using client Transformer (preferred) or FTP.
   - Build the cube on the server.
   - Copy the PowerCube to the PC and open it with PowerPlay Client.
4. Create a server PowerCube using Transformer Server with .IQD files:
   - Take a model that uses .IQD files and set it up to work in client/server mode.
   - Move the model file to the server using client Transformer or FTP.
   - Build the cube on the server.
   - Copy the PowerCube to the PC and open it with PowerPlay Client.
5. Optional (only if you are storing cubes in your database): create database cubes from the server.
   - Take a model and build it as a standard cube first (or use the cube from step 4).
   - Create the tables in the database and allocate space, using double the size of the standard cube as a rough guideline.
   - Modify the model to build the cube in the database.
   - Move the model file to the server. Optionally, build the model from the client first to make sure it works, then move it to the server.
   - Recreate the cube as a database cube.
Common Pitfalls
There are some common pitfalls that users encounter when using PowerPlay Server. Most of these concern environment and configuration settings. The sections below discuss different areas and what to look out for in each.
Passwords
When working with models, it is often a good idea to save the model as an .MDL file as well as a .PYx file. Passwords are embedded in .PYx models, which are binary files; they are not embedded in .MDL models, which are text files. This is correct behavior of the product and is documented in the PowerPlay Administrator's Guide. Therefore, if you have embedded your password into your model, saved it as both .PYx and .MDL, closed the tool, and then decided to work with the .MDL model, Transformer will not save the password information in this version of the model. Instead, it will default to prompting for the password at sign-on. If you try to restore this model from the client to the server and run it on the server, the process will fail, because the server cannot prompt for the password. To avoid this problem, either use the -k option (see below) on the UNIX command line to build the cube, or add the password information to the model, save it as a .PYx file, and use this format to synchronize the server cubes.

One method for moving models from the client to the server and building on the server is to FTP the .MDL version of the model, process the .MDL model on the server to a .PYx, and then build from the UNIX command line. When using this method, since passwords are not embedded in the .MDL file, you should use the -k option of rsserver to explicitly set the password. All rsserver command line options are documented in Chapter 5 of the Transformer for UNIX Guide.

If a model suddenly stops working, make sure the passwords used are still valid, since they may periodically change.

Before any communication between the client and server can take place, Transformer will prompt you to connect to the UNIX server with a user ID and password. By default, security is enforced by netd, and the user ID and password from login are checked in the /etc/passwd file, or in another password file on TRUSTED systems. Examples of password files for trusted systems are:
AIX: /etc/security/passwd
HP-UX (10 and higher): /tcb/files/auth/
Sun Solaris: /etc/shadow
Therefore, in order for netd to allow you to connect to the server, the user ID and password information must be located in either /etc/passwd or one of the files above; otherwise you will not be allowed to connect.
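To make the password pitfall above concrete, here is a hedged sketch of a command-line build that supplies the password explicitly. The exact -k syntax and the model path are illustrative; check Chapter 5 of the Transformer for UNIX Guide for the precise option format:

# Illustrative: build from the UNIX command line, passing the database
# password with -k because an .MDL model cannot carry it.
rsserver -c -k mypassword -i /home/cognos/models/ddbcube.mdl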
Server Transformer can read the following data sources:

- .IQD files, to get data from databases that can be accessed from the UNIX server
- delimited-field text files
- fixed-field text files
There needs to be enough temporary space on the server to build PowerCubes. You can set DataWorkDirectory and ModelWorkDirectory to places that are large enough, to force Transformer to use these as work areas. This space should be larger than the expected size of the cube. See "Configuring Work Locations for Transformer" in Chapter 1.
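Before a large build, it is worth confirming that the work areas actually have the room. A minimal check (the paths are the illustrative ones used elsewhere in this chapter):

# The free space reported for the work and temp areas should exceed the
# expected size of the cube.
df -k /home/cognos/work /usr/tmp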
A locking error during a database cube update might indicate that another user is connected to the database cube. Have the other user disconnect from the database cube until the cube update is complete. Transformer attempts to apply an exclusive lock to the database when performing a database cube operation; if someone else is already using that database, the lock cannot be applied and an error is returned.
Space Requirements
If you are building cubes in an RDBMS container, the space requirements are different from those required to build the equivalent Cognos standard cube. This section describes the results of an internal test performed using Oracle. The test is based on a specific model; actual sizes will vary depending on the specific structure of the client model and the size of the data source.
As a general rule, an Oracle PowerCube will occupy approximately twice the space of its equivalent Standard PowerCube. For example, if you build a standard cube that produces an 80MB MDC file you would need approximately 160MB of Oracle tablespace to build the equivalent RDBMS cube, including indexes. Of the 160MB, Transformer would use approximately 10% for indexes and the rest for data. The end result would be a 144MB data tablespace and a 16MB index tablespace. There are 17 tables used to store a PowerPlay 6.0 cube in an Oracle database. Only a few of these tables store significant amounts of data, with the XXXX_BITMAP table occupying the bulk of the space. The following are suggested extent sizes for PowerPlay 6.x:
Table Name            Initial Extent    Next Extent
XXXX_ACCESS           5K                5K
XXXX_ALLOCATION       5K                5K
XXXX_APPLICATION      5K                5K
XXXX_APPLICATION60    5K                5K
XXXX_BITMAP           1M                1M
XXXX_CAT_INDEX        1M                1M
XXXX_CATEGORY         1M                1M
XXXX_CELL             5K                5K
XXXX_CLASS            5K                5K
XXXX_CURRENCY         5K                5K
XXXX_DIMENSION        5K                5K
XXXX_DRILL            1M                1M
XXXX_EXCHANGE         5K                5K
XXXX_LEVEL            5K                5K
XXXX_MEASURE          5K                5K
XXXX_PARTITION        5K                5K
With very large cubes, it's advisable to make the INITIAL and NEXT extents for the XXXX_BITMAP table much larger, depending on the number of times the table can extend. Also, the XXXX_CELL table is only used if crosstab caching is enabled in the PowerCube property sheet (Processing tab); if this option is selected, you should change its extent sizes to 1MB. The default table creation scripts supplied with Transformer do not create the tables and indexes in separate tablespaces; you can modify these scripts to use different tablespaces for the indexes and tables if necessary or desired.
When creating cubes in an RDBMS, there are certain parameters you can set that may speed up Transformer's performance.
Oracle
The Fetch Number of Rows entry in the [Misc] section of cogdmor.ini controls how many rows are retrieved during a fetch operation during the data read. The default setting is 10; try bumping it up to 50 or 100. For larger cubes, also try creating your indexes and tables in separate tablespaces. Note that certain of Oracle's default configuration settings are in conflict with Transformer's requirements.

Sybase
The NUMOFROWS setting in the [Misc] section of cogdmct.ini controls how many rows the server returns to Client Library for each client fetch request. Bumping it up (for example, to 50 or 100) can have a positive impact on some systems.

DB/2
In the cognos.ini file, the following entry controls the level of locking for DB/2 during cube updates:

[PowerPlay Server - DB2]
security=0/1

The default setting is 1 (security enabled). When you disable security, you will not be able to tell if someone is using the cube, and a cube build will overwrite an existing cube. This won't necessarily be apparent to the client, but data could change during a client session. You should ensure that no users are connected when you build cubes.
Configuration Files
The following is a listing of the files used by PowerPlay Server Transformer that have either been created or edited after installation, along with explanations. The only change that must be made to ensure proper usage is to the .profile file. The changes to the other files are not absolutely necessary to run PowerPlay Server Transformer, but they either make the product easier to use or add more logging capabilities.
.trnsfrmr.rc
/home/cognos/.trnsfrmr.rc: This is a resource file that was created after installation. This file contains default settings for PowerPlay Server Transformer to use when
executed on the UNIX command line (not when executed via PowerGrid from the PC).
FilenameVariables=true
ModelWorkDirectory=/home/cognos/work
DataWorkDirectory=/home/cognos/work
DataSourceDirectory=/home/cognos/work
CubeSaveDirectory=/home/cognos/cubes
ModelSaveDirectory=/home/cognos/models
LogFileDirectory=/home/cognos/log
LogDetailLevel=6
rsserver.sh
/opt/cognos/powergrid/tcpip/netbin/rsserver.sh: This file is installed with PowerPlay Server Transformer and is used by PowerGrid to set the necessary environment variables and start the rsserver process. It is only used to start the rsserver process when executing tasks from the PC (via PowerGrid) on the server. The preference lines near the bottom contain default settings for PowerPlay Server Transformer to use when executing tasks from the PC. Also note that the necessary database environment variables are set near the top of the file.
#!/bin/sh
# $Header: /High Sierra/UNIX/setup_ux/RSSERVER.SH 4 Taos $
# $Author: Taos $
# rsserver.sh PowerGrid 3/11/97 11:01a

# set environment variable for Oracle
ORACLE_HOME=/fin030601/oramtmo/7.3.2.2 ; export ORACLE_HOME
. /usr/local/bin/oraenv

# set environment variables required to run rsserver
PYA_USR=${PYA_USR:=/opt/cognos/pya65_278}
. $PYA_USR/setpya.sh

if [ x"$DSQUERY" = x ] ; then
    DSQUERY=SYBHPUX491 ; export DSQUERY
fi
if [ x"$PLANG" = x ] ; then
    PLANG=english ; export PLANG
fi

# You may modify values and remove the comment indicator "#" to use
# these environment variables to modify Transformer's behaviour.
# (Preference values established here override any set by setpya.sh above.)
# The documentation describes these and many additional controls.
DataSourceDirectory=/home/cognos/work ; export DataSourceDirectory
LogFileName=/home/cognos/log/$$.log ; export LogFileName
ModelSaveDirectory=/home/cognos/models ; export ModelSaveDirectory
CubeSaveDirectory=/home/cognos/cubes ; export CubeSaveDirectory
DataWorkDirectory=/home/cognos/work ; export DataWorkDirectory

# launch Transformer in PowerGrid mode...
exec $PYA_USR/bin/rsserver -w
netpgrc
/opt/cognos/powergrid/tcpip/network/netpgrc: This file is installed with PowerPlay Server Transformer and is used to set the necessary environment variables and start the netd daemon process (PowerGrid). The line "env > $NPPATH/netd_env.log" redirects the output of the env command (which displays the current environment variables) to a log file named "netd_env.log" in the netbin directory. The final change tells PowerGrid to log detailed actions to a log file called "netd.log" in the netbin directory, which is especially useful when troubleshooting PowerGrid. This file should be used to start the PowerGrid netd process because it ensures the proper environment variables are set and enables logging.
#!/bin/sh
##############################################################################
# COGNOS INC. PowerGrid Network Daemon rc script for all UNIX platforms
##############################################################################
COGNOS=/opt/cognos
export COGNOS
PG_VER_DISPLAY=2.0.6
#
# Sourcing all Cognos client-server products Environment variables
#
if [ "`echo /opt/cognos/powergrid/tcpip/netbin/net??rc`" != /opt/cognos/powergrid/tcpip/netbin/net??rc ] ; then
    for script in $COGNOS/powergrid/tcpip/netbin/net??rc
    do
        . $script
        echo "sourced Cognos Products environment through $script"
    done
fi
#
NPPATH=$COGNOS/powergrid/tcpip/netbin
export NPPATH
#
# The following sample lines are used by NETD for a non-default comm. port.
# If the env. variable NPNETD is set up here, NETD will NOT pick up
# any port number from /etc/services. Please take out "#" on both lines
# and change yyyy below to the actual port number before re-starting NETD.
#
#NPNETD=ISyyyy
#export NPNETD
#
LANG=C
export LANG
SRVCMSGS=$COGNOS/powergrid/tcpip/network/srvcmsgs.cat
export SRVCMSGS
#
COGNLSTAB=$COGNOS/powergrid/tcpip/network/coglangtab
export COGNLSTAB
#
env > $NPPATH/netd_env.log
if [ `uname -m` = i386 ]; then
    echo "SCO PowerGrid Network Daemon Version $PG_VER_DISPLAY"
    su root -c "$COGNOS/powergrid/tcpip/network/netd &"
else
    echo "UNIX PowerGrid Network Daemon Version $PG_VER_DISPLAY"
    $COGNOS/powergrid/tcpip/network/netd -d -f &
fi
Automation Scripts
The following are examples of scripts that can be used to automate the scheduling of PowerPlay Transformer cube builds. These scripts assume the same installation and work directories mentioned earlier; the work directories must be in place for these scripts to work.

File 1, buildscript, is the shell program that can be called to build a script to run any model. To use buildscript, type "buildscript" at the UNIX command prompt, then, when prompted, type the name of the model you are building the script for. An example:
fin_hp3:/home/cognos-> buildscript
Please enter the name of the model to create the run script for.
ddbcube
Building /home/cognos/scripts/run_ddbcube.sh
------ Finished ------
In the above case, the script run_ddbcube.sh can now be added to a crontab to be scheduled to run automatically. File 2, generic, is used as a template for buildscript to work. File 3, run_ddbcube.sh, is a sample script created using buildscript. File 4, crontab, is a sample of a crontab that can be used to schedule tasks in UNIX. The "crontab -e" command needs to be run in UNIX to create this file. Note that when more than one task needs to be run, you add the tasks to the same crontab using the "crontab -e" command. Refer to your UNIX documentation for more information on cron.
File Locations
For the automation scripts to work, they need to be kept in the directory locations they currently reside in:

/home/cognos/scripts/bin/buildscript
/home/cognos/scripts/bin/generic

All scripts built should reside in /home/cognos/scripts. To run the scripts, the user should log in to UNIX with a user ID set up to work with Server Transformer.
File 1: /home/cognos/scripts/bin/buildscript
This file uses /home/cognos/scripts/bin/generic as a template and builds scripts to run models with PowerPlay Server Transformer. Before running this, you must apply execute rights to the file.
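For example:

# Grant execute rights to buildscript before the first run.
chmod +x /home/cognos/scripts/bin/buildscript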
#! /bin/sh
# Edit "workdir" to point to the directory where the sub-directories for
# models, cubes, logs, etc are located
workdir=/home/cognos
echo "Please enter the name of the model to create the run script for. "
read modelname
echo "Building $workdir/scripts/run_$modelname.sh"
# Replace all occurrences of "model_name_here" with the input name
sed "s/model_name_here/$modelname/g" $workdir/scripts/bin/generic > $workdir/scripts/run_$modelname.sh
# Make the shell script executable
chmod +x $workdir/scripts/run_$modelname.sh
echo "------ Finished ------"
File 2: /home/cognos/scripts/bin/generic
This file is used by /home/cognos/scripts/bin/buildscript as a template for the scripts that run models with PowerPlay Server Transformer.
#! /bin/sh
# This script runs the given model file.
# Edit workdir to point to the directory where the sub-directories for
# models, cubes, logs, etc are located
# Edit installdir to point to the appropriate pya* directory where the
# install files are located
workdir=/home/cognos
installdir=/opt/cognos/pya65_278
. $HOME/.profile
. $installdir/setpya.sh
# Save the previous log file (if it exists)
if [ -e $workdir/log/model_name_here.log ]; then
    mv $workdir/log/model_name_here.log $workdir/log/model_name_here.bak
fi
# Run model_name_here and output to a log file
rsserver -c -i $workdir/models/model_name_here.pye > $workdir/log/model_name_here.log 2>&1 &
File 3: /home/cognos/scripts/run_ddbcube.sh
This is a SAMPLE of a script created by /home/cognos/scripts/bin/buildscript. In this example, the name of the model is ddbcube.
#! /bin/sh
# This script runs the given model file
# Edit workdir to point to the directory where the sub-directories for
# models, cubes, logs, etc are located
# Edit installdir to point to the appropriate pya* directory where the
# install files are located
workdir=/home/cognos
installdir=/opt/cognos/pya65_278
. $HOME/.profile
. $installdir/setpya.sh
# Save the previous log file (if it exists)
if [ -e $workdir/log/ddbcube.log ]; then
    mv $workdir/log/ddbcube.log $workdir/log/ddbcube.bak
fi
# Run ddbcube and output to a log file
rsserver -c -i $workdir/models/ddbcube.pye > $workdir/log/ddbcube.log 2>&1 &
ret_stat=$?
exit $ret_stat
File 4: crontab

cron is the UNIX scheduling utility; it uses a crontab (like the one below) to list and track the scheduled tasks it needs to run.
# ===================================================================
# crontab input file to run cognos
# ===================================================================
# flags (, between mult values, - for range, * for all)
# minute       = 0 to 59
# hour         = 0 to 23
# day of month = 1 to 31
# month        = 1 to 12
# day of week  = 0 to 6 (0=Sun, 1=Mon, 2=Tue, 3=Wed, 4=Thu, 5=Fri, 6=Sat)
# ===================================================================
# -------- ddbcube --------------
# This will run daily at 4:00 am
00 4 * * * /home/cognos/scripts/run_ddbcube.sh
CHAPTER 4
This section describes the results of a series of Cognos internal tests conducted using PowerPlay version 6.5. All results were obtained on the following server:
- Digital Prioris Model ZX-109
- Dual 200MHz Pentium Pro processors
- 512MB memory
- Digital Storage Array (DS-SWXRA-HA) configured with 130GB of storage (RAID-5); the drives in the array are DS-RZ1DB-VW (9.1GB Ultra SCSI)
Two of the three models used in the benchmark tests originated from actual clients (Row Dominant and Category Dominant). The Row and Category Dominant model was manufactured internally but its design was based on actual client models.
Environmental Settings and Methodology

- In the [PowerPlay Server] section of cognos.ini, WriteCacheSize was set to 32000.
- In the [Services] section of cognos.ini, Temporary was set to a directory that had a large amount of available disk space.
- In the [Services] section of cognos.ini, SortMemory was set to 51200, the equivalent of 100MB.
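Combined, the corresponding cognos.ini fragment would have looked something like this (the Temporary path is illustrative; the actual directory used in the tests was not recorded):

[PowerPlay Server]
WriteCacheSize=32000

[Services]
Temporary=d:\cubetemp
SortMemory=51200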
The following steps were repeated 4 times for each of the Transformer models:
1. All unneeded processes on the test machine were halted.
2. An MDL script was then used to run the following sequence:
   - Open the test model.
   - Change the name of the PowerCube (using a numbered sequence).
   - Build the PowerCube.
   - Save the populated model to PY.
Row Dominant Model

The cube was built with:
- auto-partitioning with 5 passes maximum and a maximum partition size of 500,000 records
- crosstab caching enabled
Model Attribute                        Description
Number of Measures                     2 x 64-bit, 1 x 32-bit, 1 after-rollup calculated
Source Data Format                     ASCII (comma-delimited)
Number of source files                 14 (11 provide structure, 3 are transactional)
Number of transaction input records    10 million
Size (in MB) of all source files       570

Enable Multi-Processing was set for the 4 largest queries.
Category Dominant Model

The cube was built with:
- auto-partitioning with 5 passes maximum and a maximum partition size of 50,000 records
- crosstab caching enabled
Model Attribute                        Description
Number of Categories                   146,238
Number of Dimensions                   5 (measures not counted as a dimension)
Number of Measures                     15, all 64-bit
Source Data Format                     ASCII (comma and ~ delimited)
Number of source files                 6 (5 provide structure, 1 is transactional)
Number of transaction input records    1 million
Size (in MB) of all source files       224

Enable Multi-Processing was enabled for all but the smallest query.
Row and Category Dominant Model

The cube was built with:
- auto-partitioning with 5 passes maximum and a maximum partition size of 500,000 records
- crosstab caching enabled
Model Attribute                        Description
Number of Categories                   492,152
Number of Dimensions                   5 (measures not counted as a dimension)
Number of Measures                     5 (two calculated): 3 x 64-bit, 2 after-rollup calculated
Source Data Format                     ASCII (~-delimited)
Number of source files                 9 (6 provide structure, 3 are transactional)
Number of transaction input records    50 million
Size (in MB) of all source files       2,280 (2.28GB)

Enable Multi-Processing was set for the 4 largest queries.
Part II: Zen and the Art of PowerPlay Enterprise Server Configuration
The most common question we receive with respect to PowerPlay Enterprise Server is: "How many servers of type A do I require to service requests for n users against m cubes?" This question is difficult to answer without some understanding of the following:
- How well are the target cubes designed? This will impact how long the average query takes to complete, and how much CPU will be eaten up in servicing requests.
- How large and complex are the average queries? Is the average query a 50-row report, or a 10,000-row report with nested levels, all measures, and zero suppression?
CHAPTER 5
PowerPlay Web Stress Testing
This chapter describes results from a set of internal tests run against PowerPlay Enterprise Server. Based on the results, it makes recommendations for server sizing.
- These test results do not include the overhead of sending HTML query results from a request back to the CGI gateway, through the web server, and ultimately back to the web browser. The results are based strictly on observed times for the query processor to handle requests. Under normal circumstances, this should be the bulk of processing time. In high-volume client installations, similar throughput will be achieved on the PowerPlay Enterprise Server machine, provided that the CGI gateway and web server are on a different machine. The machine that runs the web server must be large enough to handle the volume of HTML files coming from PowerPlay Enterprise Server.
- Because CAbuse generates random queries very quickly, it simulates about the worst possible load imaginable. The randomness of the queries ensures that very little advantage is derived from the server's PPDS cache. Also, because it is truly random in its choice of links to follow, CAbuse periodically generates a phenomenally large query (a hundred thousand rows nested by several columns, displayed as bar charts, for example). The tests therefore do include some very large queries, as well as, admittedly, some very small ones.
- We have not (to date) run many tests against UNIX servers. The results in this document are primarily for NT Server only. UNIX tests are planned and will be documented as soon as they're available.
The test servers were:

a. Compaq DeskPro 6300: 1 x 300MHz CPU, 320MB RAM.
b. IBM Netfinity Server: 2 x 350MHz CPU, 512MB RAM.
The above results are average times over several repeated tests, with varying levels of incoming queries. The tests were run with PPDS Read Cache settings ranging between 32MB and 48MB; no significant difference was discerned between these settings. However, tests run with a PPDS Read Cache setting of only 8MB yielded far lower overall throughput. For example, with an 8MB PPDS Read Cache setting, the dual-processor NT server (Server 2) achieved a maximum throughput of about 7 or 8 requests per second, far lower than was achieved with a higher PPDS Read Cache setting, indicating that the cache does have a positive effect.

The tests indicate that a single PPDSWEB process will handle slightly more than 50 concurrent users (5 to 7 requests per second), and that, on a dual-processor PC, two processes will handle more than 100 concurrent users (10 to 12 requests per second). The results confirm that multiple-CPU machines can dramatically increase the throughput possible with PowerPlay Web, up to the maximum number of fully loaded PPDSWEB processes.
(Charts: NT Performance Monitor output showing %CPU usage for each CPU, and the average requests completed, over the duration of one instance of the above test on the IBM Netfinity Server, Server 2.)
The following PowerPlay report, derived from the NT Event Log, shows the number of requests completed per minute during a test run on Server 2:
(Chart: Queries per Time Period — average queries per minute, Minutes 01 through 12.)
The following graph shows the average request time in milliseconds for the same test run:
(Chart: mean request time in milliseconds, Minutes 01 through 12.)
As you can see, the Netfinity server handled an average of nearly 600 requests per minute, with an average query time of under half a second. If we assume a concurrent user level of 10% (of licensed users), and a concurrent request level of 10% (of concurrent users), that implies this server would handle 100 concurrent users, or 1000 licensed users, with very fast average query times, provided users were executing average queries.
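The arithmetic behind that sizing claim, as a back-of-envelope sketch under the stated 10%/10% assumptions:

# ~600 requests/minute observed => ~10 requests/second sustained
REQ_PER_SEC=10
# concurrent requests are 10% of concurrent users
CONCURRENT_USERS=`expr $REQ_PER_SEC \* 10`
# concurrent users are 10% of licensed users
LICENSED_USERS=`expr $CONCURRENT_USERS \* 10`
echo "$CONCURRENT_USERS concurrent users, $LICENSED_USERS licensed users"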
Detailed logging was enabled for the duration of the test run. With logging disabled, slightly greater throughput should be achievable.
The operations exercised in the tests included:

- Open Cube
- Drill Down
- Drill Up
- Nest Rows
- Nest Columns
- Suppress Zeros
The decrease in time under load for Nesting is due to the effect of a populated cache on the PowerPlay Enterprise Server.
Cube Characteristics:
Input Records: 10 million
Dimensions: 5
Categories: 32,000
Measures: 4
Cube Size: 62MB
A test run against the same cube on a quad-processor Solaris server yielded sustained throughput of 800 to 900 requests per minute, with peaks of over 1,000 requests per minute. For the test, 10 processes were set up (as Startup Processes) for the cube, and 64MB of PPDS Read Cache was allocated. For the duration of the run, CPU utilization rested at or above 90%, with no significant processes other than ppserver and the 10 ppdsweb instances on the machine. On average, the server handled approximately 15 requests per second. If we assume a concurrent user level of 10% (of licensed users), and a concurrent request level of 10% (of concurrent users), that implies this server would handle 150 concurrent users, or 1,500 licensed users, with very fast average query times, provided users were executing average queries.
1. For large installations, strive for an average query response time that is short -- a few seconds at most -- with the server unloaded. A single user should see results back very quickly for nearly all queries. If this is not possible, then setting client expectations becomes critical. One thing to ask clients who insist on generating reports with a hundred thousand rows and 10 columns is: "When was the last time you opened a table like that on a regular HTML page?" You cannot expect blindingly fast performance if your average report contains a million cells with zeros suppressed.
2. For multi-processor systems on NT, the scalability from one processor to two has been found to be fairly linear for PowerPlay Web requests. That is, if a single-CPU P300 server will handle n concurrent requests per second (on average), then a dual-CPU P300 server should handle 2 x n concurrent requests of a similar nature, provided you set up 2 or more processes for the cube. This indicates that PowerPlay Web is relatively CPU-bound: the more CPUs the better, for large installations. Note: Although performance improvements have been found to be linear from 1 to 2 processors, scaling is unlikely to be linear from 2 to 4 processors, because larger servers expend additional effort managing system resources. Further tests in this area are pending.
3. Tests to date for load-balanced servers are inconclusive. However, preliminary tests indicate that load balancing across 2 identical servers does not yield a doubling of throughput. Rather, each additional server adds approximately 50% additional capacity. For example, suppose Server A alone handles n requests/second. A mapped cube on Server A that is mapped onto physical cubes on Server A and an identical Server B will handle approximately 1.5 x n requests/second (see the sketch following this list).
4. The network can become the bottleneck in load-balanced environments. If two back-end PowerPlay servers can normally handle a set number of requests per minute apiece, and load balancing requests across them provides no performance gain, it's possible that either the network is too slow, or the Network Interface Card on the central dispatcher machine is handling as many requests as it can. A good indication of a network bottleneck is a low level of CPU usage on the machines where the PowerPlay Query Processors are installed, even though the number of incoming requests on the central dispatcher is very high.
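The load-balancing observation in item 3 can be turned into a rough capacity formula. Note the assumption: the tests above cover only the two-server case, so extending the ~50% increment to more servers is an extrapolation, not something the tests establish:

#include <stdio.h>

/* Extrapolation of the load-balancing observation above: two identical
 * servers yielded roughly 1.5x the throughput of one. ASSUMPTION: the
 * ~50% increment per additional server is extended here beyond the
 * two-server case that was actually tested; treat results for more
 * than 2 servers as a rough planning figure only. */
static double balanced_throughput(double single_server_rps, int servers)
{
    return single_server_rps * (1.0 + 0.5 * (servers - 1));
}

int main(void)
{
    double n = 10.0; /* requests/second for a single server */
    int k;

    for (k = 1; k <= 4; k++)
        printf("%d server(s): ~%.1f requests/second\n",
               k, balanced_throughput(n, k));
    return 0;
}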
Note: Requests per Second in this table is based on the time required for the Query Processor to service a request (the times you see in the detailed log for the cube). It does not represent the overall time to submit the query from a browser, have it dispatched, processed, and returned to the browser via the web server. In large installations, the actual processing of the request will be the bulk of the query time, as the web server will be on a different machine than the PowerPlay server.
You can use this table as a guideline to estimate the number of servers required to service requests for client installations, adjusting the figures as needed. The key things to consider are the complexity of the requests being submitted and the percentage of users active at any given time. If either of these goes up significantly, the number of servers required to keep query response time in an acceptable range goes up accordingly (a simple proportional model is sketched below).
It's important to note that the table provides guidelines for average queries against medium-sized cubes (50 to 100 MB, relatively row-dominant). For installations with many smaller cubes, it should be possible to reduce the number of servers required. For example, a site with 50 cubes, all of which are less than 500K in size, with low category and row counts, will not require as large a server as outlined above. Queries against such small cubes should be very fast (200 to 300 milliseconds), and hence you should see higher overall throughput. For installations with large category-dominant models and complex requests, the throughput may be lower than depicted in the table.
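One simple way to adjust the guideline figures is a proportional model. The formula below is our assumption -- the source table gives only guideline numbers -- so treat the output as a first-order estimate:

#include <stdio.h>

/* A simple proportional sizing model for the guideline above.
 * ASSUMPTION: server count scales linearly with both average request
 * complexity and the fraction of users active at once; this is a
 * first-order planning estimate, not a vendor formula. */
static double servers_needed(double baseline_servers,
                             double complexity_factor,  /* 1.0 = average */
                             double active_fraction,    /* e.g. 0.15     */
                             double baseline_active)    /* e.g. 0.10     */
{
    return baseline_servers * complexity_factor *
           (active_fraction / baseline_active);
}

int main(void)
{
    /* If the guideline calls for 2 servers at average complexity with
     * 10% of users active, twice-as-complex requests and 15% activity
     * suggest roughly 2 * 2.0 * 1.5 = 6 servers. */
    printf("Estimated servers: %.0f\n",
           servers_needed(2.0, 2.0, 0.15, 0.10));
    return 0;
}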
Memory Requirements
Memory requirements for PowerPlay Enterprise Server depend on:
the number of cubes registered on the server
the maximum number of processes that you define for each cube
the number of startup processes that you define for each cube.
Each process that's active for a cube will use up to the amount of memory specified for the server's PPDS Read Cache size. A practical setting for the PPDS Read Cache is about 30% of the average cube size. The absolute ceiling for the PPDS Read Cache is currently 64MB; above this limit, the memory management for the cache can become more expensive than resubmitting the query.
The maximum number of processes for a cube is truly a maximum. If you have a popular cube that incurs many hits periodically (monthly, for example), there's no penalty in permanently setting a high maximum number of processes. Unless you specify startup processes for the cube, the Enterprise Server initiates new ppdsweb processes only as they're required by new incoming requests. So, if you specify a maximum of 10 processes, but on average (except for month-end) only 2 users ever access the cube per day, then only 2 processes will consume resources on an ongoing basis.
If resource consumption is a concern, you can reduce the length of time that a process remains active after its most recent request. Use the PowerPlay Enterprise Server Administrator to set the process timeout:
Process Timeout sets how long a query processor remains in memory before it shuts down due to inactivity.
Because a shorter timeout causes started processes to terminate sooner, memory is freed up for other processes more quickly. However, it will also potentially reduce the effectiveness of caching.
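For planning purposes, the cache sizing rule above (about 30% of average cube size, with a 64MB ceiling) and the per-process memory cost can be combined into a rough estimate. The sketch below is a planning aid only; the function name and example figures are ours, not part of the product:

#include <stdio.h>

/* Sketch of the PPDS Read Cache sizing rule described above:
 * roughly 30% of the average cube size, capped at 64MB. The 0.30
 * ratio and the 64MB ceiling come from this section's guidelines;
 * the function itself is illustrative, not a product API. */
static double ppds_read_cache_mb(double avg_cube_size_mb)
{
    double cache = 0.30 * avg_cube_size_mb;
    return (cache > 64.0) ? 64.0 : cache;
}

int main(void)
{
    double cache = ppds_read_cache_mb(62.0); /* e.g., the 62MB test cube */
    int max_processes = 10;                  /* maximum processes for a cube */

    printf("Suggested PPDS Read Cache: about %.0f MB\n", cache);
    /* Each active process can use up to the full cache size, so the
     * worst case is every allowed process running at once. */
    printf("Worst-case memory if all %d processes are active: about %.0f MB\n",
           max_processes, cache * max_processes);
    return 0;
}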
Adding processes for a cube makes sense only when one of the following is true:
The server on which PowerPlay Enterprise Server is running has free CPU cycles.
There are no free CPU cycles, but the current average request time is sufficiently fast that an increase in query processing time will still meet user expectations.
Deciding how many processes to run becomes clearer with the following analogy, which outlines two methods of achieving the same throughput. In a cafeteria, there are 10 users with lunches to heat, and only one microwave oven. You, as the cafeteria coordinator, have the following options:
Let cafeteria users line up one after the other to heat their lunches in the microwave.
Put all 10 lunches in the microwave at once, and have all users wait until they're all heated.
The first option ensures that some people get fast service -- those in the queue early will have their lunches heated quickly. Those at the back of the queue will have to wait until others have finished, but once their turns arrive, their lunches will heat as quickly as anyone else's. The second option ensures that everyone waits for pretty much the same period of time. But because the microwave contains 10 times the volume of a single lunch, the time to get it all heated is relatively high. The result is that nobody gets an optimal heating time, but everyone waits for the same amount of time. Those who would have been at the end of the queue get faster service than they would have, while those at the front of the queue are slowed down.
Now think of the people with lunches to heat as incoming PowerPlay requests, and the number of lunches in the microwave as the number of processes that you're running for a cube. If the server (microwave) has free cycles (additional watts for heating), then adding processes (lunches) will provide better throughput. At the point where the server is running at or near 100% CPU, however, adding processes has a similar effect to placing several lunches in a microwave at the same time: no query has to wait for any other query to finish, but all queries take longer. The number of queries processed is the same, but the effect on end users can be very different.
The best approach obviously varies from one situation to the next. You must balance the server's resources against both the number of processes that the server can handle and the expectations of users. As a rule of thumb, if a server is running at or near 100% of CPU capacity, don't add more processes for busy cubes.
CHAPTER 6
There are some undocumented features in PowerPlay Enterprise Server that you can use to your advantage. These are:
restricting the maximum query processing time for a cube
setting the Query Processor threshold to balance requests across busy processes (PowerPlay client requests only)
registering multiple cubes by directory name
reducing the amount of HTML passed to the Web browser
passing security information to PowerPlay Enterprise Server from a client web application
By default, RequestTimeoutSec is set to 900 seconds (15 minutes). You can lower this value to prevent users from submitting runaway queries that consume up to 15 minutes of query processor time. It's important to note that once set for a cube, the timeout applies to all queries submitted by all users; setting it too low may result in a high number of query failures.
This figure represents a ratio of delay time for incoming requests to active query processing time. As a new request is received, there may be a delay period before that request gets processed by a query processor (if the query processor is busy). Setting the QPThreshold value works like this (see the sketch after this list):
If QPThreshold is set to 0, each new request is submitted to the first available query processor. This is the same way that incoming PowerPlay Web requests are handled.
If QPThreshold is set to 10, each new request is submitted to the next query processor in round-robin fashion, regardless of the load on that query processor.
For any value between 0 and 10, each new request is submitted to the next query processor whose ratio of delay time to processing time is lower than the specified threshold. So, for a value of 5 (the default), an incoming request is submitted to the first query processor found that has less than a 50% ratio of delay time to processing time.
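The dispatch rule can be sketched as follows. This is an illustrative reconstruction of the behavior just described, not PowerPlay Enterprise Server source; the type and function names are our own:

#include <stddef.h>

/* Illustrative sketch of the QPThreshold dispatch rule described
 * above. Field and function names are assumptions for illustration. */
typedef struct {
    double delay_time;      /* time requests have waited at this QP */
    double processing_time; /* time this QP has spent doing work    */
    int    busy;            /* nonzero if currently serving a query */
} QueryProcessor;

/* Returns the index of the query processor to receive the next
 * request; *next tracks the round-robin scan position. */
static size_t pick_processor(QueryProcessor *qp, size_t n,
                             int threshold, size_t *next)
{
    size_t i, idx;

    if (threshold == 0) {              /* first available processor */
        for (i = 0; i < n; i++)
            if (!qp[i].busy)
                return i;
        return 0;                      /* all busy: fall back to 0  */
    }
    for (i = 0; i < n; i++) {          /* scan in round-robin order */
        idx = (*next + i) % n;
        /* A threshold of 10 means "always take the next one";
         * otherwise require delay:processing below threshold/10
         * (e.g., 5 means a delay under 50% of processing time). */
        if (threshold == 10 ||
            qp[idx].delay_time <
                (threshold / 10.0) * qp[idx].processing_time) {
            *next = (idx + 1) % n;
            return idx;
        }
    }
    idx = *next;                       /* none qualified: round-robin */
    *next = (idx + 1) % n;
    return idx;
}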
Once this is done, you can then access any of the cubes within that directory using the following URL:
HTTP://SRV001/cgi-bin/ppdscgi.exe?XT=ALL+CUBES&MDC=cubename
where you replace cubename with the name of the cube that you want to access. Notes:
For cubes that you enable in this way, ensure that you disable the Publish to a Table of Contents (Web) option on the Control tab. The directory entry is meaningless in the table of contents.
You can maintain cubes in this way using the batch administration tool.
By default, PowerPlay Enterprise Server returns 200 rows and 50 columns per page of generated report results. Setting these to lower numbers reduces the amount of HTML passed back to the web server and on to PowerPlay Web users. Try setting the number of rows to a lower value, such as 50, and the number of columns to 10. This can have a big impact on the performance of your system if the web server is having trouble handling all of the query results being passed to it.
the capturing of the login information at the time of login, so that the user's security information can be passed in as parameters on the PowerPlay Enterprise Server URL
optionally, a Cognos-provided encryption routine to encrypt the login information.
At run time, it's possible to pass security parameters to the Enterprise Server as part of the URL. The parameters you can use include:
Parameter          Description
U=<username>       Specifies the User Name, as defined in the associated Cognos security file.
A=<password>       Specifies the user password, as defined in the associated Cognos security file.
L=<class>          Specifies the user class, as defined in the associated Cognos security file.
B=<db user id>     For cubes stored in an RDBMS or third-party OLAP container, specifies the database user ID, as defined in the associated Cognos security file.
[parameter lost]   For cubes stored in an RDBMS or third-party OLAP container, specifies the database user password, as defined in the associated Cognos security file.
So, the following URL entry would open cube SALESDATA, with User ID manager and User Password qwerty:
http://www.srv001.com/cgi-bin/ppdscgi.exe?XT=SALESDATA&U=manager&A=qwerty
Passing security information to the PowerPlay Enterprise Server in an unencrypted format as part of the URL is unacceptable in many cases. If so, there is an unsupported Cognos library that you can use to encrypt the security information before you pass it to PowerPlay Enterprise Server. The header (.h) file associated with this library, with which you create a small program, declares a function named ppWebEncrypt. The ppWebEncrypt function takes as arguments whatever security parameters you wish to pass, and encrypts them along with a timestamp that expires after 15 minutes.
char *ppWebEncrypt(const char *dbUser,
                   const char *dbPassword,
                   const char *user,
                   const char *password,
                   const char *userClass);
The ppWebEncrypt function returns a time-stamped (valid only for 15 minutes), encrypted string that looks like this:
E5C7EA3AA7B3EFF1389A82E62BCE573C801A5EE3A124F52D7C4AEA4057B23F15A87F3 0EDC9DF068B74A39D7BFA
You can pass this encrypted string to PowerPlay Enterprise Server using the UX= parameter:
http://www.srv001.com/cgi-bin/ppdscgi.exe?XT=SALESDATA&UX=E5C7EA3AA7B3EFF1389A82E62BCE573C801A5EE3A124F52D7C4AEA4057B23F15A87F30EDC9DF068B74A39D7BFA
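As a minimal sketch, a helper program might call ppWebEncrypt and assemble the URL like this. The declaration is taken from the prototype above; whether empty strings are acceptable for unused parameters, and the ownership of the returned string, are assumptions:

#include <stdio.h>

/* Declaration as given in the Cognos-supplied header. The header file
 * name is not documented here, so the function is declared directly;
 * link against the Cognos-provided library. */
char *ppWebEncrypt(const char *dbUser, const char *dbPassword,
                   const char *user, const char *password,
                   const char *userClass);

int main(void)
{
    /* Encrypt the credentials from the earlier example. Empty strings
     * stand in for parameters we are not passing (an assumption). */
    char *ux = ppWebEncrypt("", "", "manager", "qwerty", "");
    char url[1024];

    /* Build the Enterprise Server URL using the UX= parameter. The
     * returned string's ownership is not documented, so it is not
     * freed here. */
    snprintf(url, sizeof(url),
             "http://www.srv001.com/cgi-bin/ppdscgi.exe?XT=SALESDATA&UX=%s",
             ux);
    printf("%s\n", url);
    return 0;
}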
At present, the ppWebEncrypt library is available for NT Server and Solaris only. If you need this library, contact FieldSupport.