Getting Started With Pentaho
Getting Started With Pentaho
Getting Started With Pentaho
This document is copyright 2011 Pentaho Corporation. No part may be reprinted without written permission from Pentaho Corporation. All trademarks are the property of their respective owners.
Trademarks
Pentaho (TM) and the Pentaho logo are registered trademarks of Pentaho Corporation. All other trademarks are the property of their respective owners. Trademarked names may appear throughout this document. Rather than list the names and entities that own the trademarks or insert a trademark symbol with each mention of the trademarked name, Pentaho states that it is using the names for editorial purposes only and to the benefit of the trademark owner, with no intention of infringing upon that trademark.
Company Information
Pentaho Corporation Citadel International, Suite 340 5950 Hazeltine National Drive Orlando, FL 32822 Phone: +1 407 812-OPEN (6736) Fax: +1 407 517-4575 http://www.pentaho.com E-mail: [email protected] Sales Inquiries: [email protected] Documentation Suggestions: [email protected] Sign-up for our newsletter: http://community.pentaho.com/newsletter/
Contents
Welcome....................................................................................................................................4
Finding More Resources...............................................................................................................................4
| TOC | 3
Welcome
This guide provides you with a brief overview of the client tools and server applications that make up the Pentaho BI Suite. Also included, are step-by-step instructions for installing the Pentaho BI Suite on Windows devices. Post installation walk-throughs include a quick introduction to Sample reports and exercises for creating a simple Interactive Report and a Pentaho Analyzer report. You will also learn how to add the reports you created to a dashboard and how to define a data source. Once you complete the installation and exercises, you will be able to start taking advantage of many of the features that are included in this release of the Pentaho BI Suite.
Important: The scope of this document is limited to the topics described in the paragraph above. If you need further information deploying the Pentaho BI Suite or have custom configuration requirements, contact your Pentaho Sales Representative or send an email to Support.
Successful implementations start with training and support. Pentaho trainers are experts in the industry and offer a wide range of affordable classroom, online, and on-site training services to meet your knowledge requirements. Our support team will help you become productive quickly.
Design tool
Pentaho Analyzer
Description WYSIWYG (What You See Is What You Get) report design and editing Template-based report creation for immediate productivity Pentaho Analyzer provides intuitive, interactive analytical reporting letting non-technical business users quickly understand business information against Pentaho Analysis data sources. Feature highlights: Accesses Pentaho Analysis data sources Web-based, drag-and-drop report creation Advanced sorting and filtering Customized totals and user-defined calculations Chart visualizations Dashboard Designer empowers business users to create rich, interactive dashboards with little or no training. Feature highlights: Creating a dashboard is as simple as selecting your layout, theme, and the content you want to display. Include any type of Pentaho reports, external web pages, or dashboard internal elements (charts and data grid) that access Pentaho Metadata data sources. Add dynamic filter controls making it easy for end users to filter the entire contents of their dashboard using a simple pick list Drive content within a dashboard using other dashboard content
Design Tools
The design tools are desktop applications that allow you to create a rich reporting environment: Design tool Report Designer Description Report Designer is a visual design environment that makes it easy for report authors to quickly create sophisticated pixel perfect reports against a wide range of data sources addressing the demands of operational, financial and production reporting. These reports can be executed as a standalone report within the User Console or used within a Pentaho Dashboard. In addition, it creates detailed charts and Pentaho Interactive Reporting and Report Wizard templates. It is geared towards experienced and power users, who are familiar with the report design concepts and the data sources used. Data Integration is an intuitive, graphical, drag and drop design environment that provides powerful Extraction, Transformation and Loading (ETL) features. In addition, Pentaho Data Integration provides you with Agile BI capabilities that collapse the multi-step and lengthy BI project cycles into a single integrated design, modeling, and visualization process. Agile BI drives close collaboration between BI application developers and endusers. Metadata Editor builds Pentaho metadata data sources, a data model representation of a relational database where business users can create queries without having to know SQL. Pentaho Dashboards and Pentaho Interactive
Design tool
Description Reporting are primary tools to access this data source. Additionally, a data model designer can tag useful attributes to secure or format columns in a data model which can be inherited in a report (prpt) and interactive report (prpti). Schema Workbench builds Pentaho Analysis data sources, (ROLAP cubes) facilitating data exploration and analysis for business users without having to know MDX. Pentaho Aggregation Designer is a graphical environment used to increase query performance of a Pentaho Analysis (Mondrian OLAP) schema through the creation of aggregate tables. Design Studio is used to create Action Sequences (.xaction). Action Sequences define lightweight, success oriented process flows within the Pentaho BI Server. It enables full customization and integration between all components within the Pentaho BI Suite.
Server Applications
The BI Server functions as a Web-based report management system, application integration server, and lightweight workflow engine. The BI Server supports the Pentaho User Console, primarily; it serves users with reporting, analysis, and dashboard content and includes the services for security, scheduling, Pentaho Metadata and Pentaho Analysis. The Enterprise Console Server supports the Enterprise Console and includes services for security, scheduling and repository management and the configuration of the BI Server. The Data Intergration Server (DI Server) is a dedicated, enterprise class server for ETL and Data Integration. It is used to execute Data Integration jobs and transformation and provides services such as scheduling and content management (including revision history and security integration).
System Requirements
Below are the system requirements for each Pentaho component included in this release:
Software Requirements
Note: The system requirements listed below apply to the BI Suite. The BI Suite graphical installation utility, however, will only work on Windows or Linux. In terms of operating systems, Windows (XP SP2, 2008, 7) modern Linux distributions (SUSE Linux Enterprise Desktop and Server 10 and Red Hat Enterprise Linux 5 are officially supported, but most others should work), Solaris 10, and Mac OS X 10.5 (and newer) are all officially supported. The BI Suite installer will provide you with a Sun Java Runtime Environment (JRE) version 1.6 (sometimes referenced as version 6.0) installed. If you have other versions installed, they will be ignored. Workstations will need to have reasonably modern Web browsers to access Pentaho's Web interface. Internet Explorer 8 or higher; Firefox 4.1 or higher; and Safari 5 or higher will all work. Note: The Pentaho User Console may have rendering issues in Firefox 3.5 and Safari 4.x.
Your environment can be either 32-bit or 64-bit as long as it meets the above requirements. The aforementioned configurations are officially supported by Pentaho. Other operating systems such as FreeBSD and OpenBSD; other Java virtual machines like Blackdown; other application servers such as Liferay and Websphere; and other Web browsers like Opera may work without any problems. However, the Pentaho support team will probably not be able to help you if you have trouble installing or using the BI Suite under these conditions. Note: Some Pentaho client tools, such as Metadata Editor, Aggregation Designer, Pentaho Data Integration, and Design Studio, require that the Eclipse SWT JAR be in your Java classpath. This can be an issue in Pentaho BI Suite Official Documentation | Before You Begin | 7
scenarios where standalone client tools are installed onto a machine that does not also have the BI Server installed, particularly on platforms other than Windows and Linux. Note: The Pentaho Reporting engine requires a graphical environment in order to create charts. If you are installing the BI Server onto a headless Linux, BSD, or Solaris server and do not have X11R6 on it, you should install the Xvfb package on your server to satisfy the charting dependency.
Hardware Requirements
The Pentaho BI Suite does not have strict limits on computer or network hardware. As long as you meet the minimum software requirements (note that your operating system will have its own minimum hardware requirements), Pentaho is hardware agnostic. There is, however, a recommended set of minimum system specifications: Server: RAM: at least 2GB Hard drive space: at least 2GB for the software, and more for solution and content files Processor: dual-core AMD64 or Intel EM64T
Workstation: RAM: at least 1GB Hard drive space: at least 1GB for the software, and more for solution and content files Processor: dual-core AMD64 or Intel EM64T
It's possible to use less capable machines, but in most realistic scenarios, the too-limited system resources will result in an undesirable level of performance. Your environment does not have to be 64-bit, even if your processor architecture supports it; while all modern desktop, workstation, and server machines have 64-bit processors, they often ship by default with 32-bit operating systems. If you want to run the Pentaho BI Suite in a pure 64-bit environment, you will have to install a 64-bit operating system, ensure that your solution database and Java Runtime Environment are 64-bit, and install the BI Suite via the 64-bit graphical installer, or through the archive-based or manual deployment methods. Note: A 32-bit JRE has a hard memory limit of 2GB (1.5GB on Windows), so if you have 2GB or more of RAM, you must use a 64-bit JRE on a 64-bit operating system to take full advantage of it. This means that, through architectural limitations, a 32-bit environment will likely be under-powered for a production BI Server deployment.
1. Go to Start -> Programs -> Pentaho Enterprise Edition -> User Console Login, or open a browser and go to http://localhost:8080/ or or the hostname, IP address, or domain name of the machine you installed the BI Server to. 2. Enter the credentials,joe/password (all lowercase letters), to open the Pentaho User Console launch page. Joe is a sample user with administrative privileges to the Pentaho User Console. The Pentaho User Console launch page appears.
Build dashboard interactive flash-based charts (bar, line, pie, area, dial).
4. Double-click on Product Scale, (under Products). 5. Double-click on Quantity Ordered, (under Orders). 6. Double-click Total, (under Orders). Your report should now have four columns: Product Name, Product Scale, Quantity Ordered, and Total. 7. Click and drag Territory, (under Customers), above the column headers in the report. This will group the Products by Territory. 8. Click and drag Product Vendor, (under Products), above the column headers in the report. This will group the Products by Territory and then by Product Vendor. Your new interactive report shows sales by territory, and the volume of sales and quantities sold for each product by vendor.
5. Click the Total column header and drag the entire column next to the Product Name column.
You can also move a column by clicking on the small arrow next to the column name and selecting Move -> Right (or Left) from the menu. 6. Double-click the Total column header to display an input text box. 7. Type Sales in the text box and press <Enter>. 8. Double-click the report name, (Untitled), and type Vendor Sales Report in the text box, then press <Enter>. 9. Click the small arrow next to the Sales column header to display the context menu. 10.Select Summary -> Sum so that the report displays the total sales for each vendor. 11.Click the small arrow next to the Sales column header to display the context menu. Select Sort -> Descending to display sales figures from the highest to the lowest. 12.Double-click the group footer, and rename it Total . You now have a more complex interactive report.
3. Select NA (North America) from the list in the left pane, then click
A small number one (1) appears on the Filter icon in the toolbar indicating that the report contains a single filter. A filter icon also appears next to the Territory field name under Data in the right pane. 4. Save your report. You have successfully added a filter to a report.
5. Click and drag the Line field and drop it next to Territory. 6. Right-click the Territory column and select Show Subtotals. 7. Right-click the Sales column and select Conditional Formatting -> Data Bar - Green.
8. Right-click the Sales column and select User Defined Number -> % of Rank, Running Sum.., then in the dialog box select % of Sales. Click Next. 9. Enable Each Territory Column/Row Subtotal (Subtotal is 100%). Click Done.
10.Click (Filter) in the toolbar to create a filter for your report. When the filter area appears at the top of the report, click and drag the Territory field (under Available Fields on the right) into the filter area. 11.In the Filter on Territory dialog box, select APAC from the list and click the small green arrow to move it to the box on the right. 12.Enable Parameter Name and type region (lowercase) as your parameter name in the text box. Click OK.
The report updates and displays sales data for APAC exclusively. Click the line that separates each column to adjust it for better viewing. Move the line right or left as needed.
13. At this point you have a functioning report and you can view your data in chart form. Click (Switch to Chart Format) to examine your report data in a chart format. The default display is a bar chart but if you click (Choose Another Chart Type) you can select a different chart type to display your data.
Click 14.
Save your report before continuing the exercise. In the Pentaho User Console, click (Save As). When the Save As dialog box appears, save your report as Territory Line - Sales under /Steel Wheels/Analysis and click Save.
4. 5.
6. 7.
a) Type Product Vendor, under Title, in the lower portion of your dashboard and click Apply. Product Vendor is the title of the dashboard panel. Click Analysis in your samples repository (Steel Wheels) to display the list of available reports. Click-and-drag the Territory Line - Sales Analyzer report into the upper right dashboard panel. a) Type Territory Line Sales, under Title, in the lower portion of your dashboard and click Apply. Territory Line Sales is the title of the dashboard panel. Click General Settings in the lower portion of the page and select the 2 Column template. The content of your dashboard displays in the new template. Click General Settings and select Filter. a) Enable Show Filter Toolbar. A Filter area appears at the top of the dashboard. b) Click to add a filter.
The Filter dialog box appears. c) In the Name text box, type Region. d) Under Data Type, select Metadata List. e) Under Connections, click Select, then choose the Orders data source and click OK. The Query Editor opens. 8. In the Query Editor, expand the Customer category on the left and select Territory from the list. Click the small arrow to move Territory under Selected Columns.
a) Click the small arrow in the center of the Query Editor to add Territory to Conditions. b) Under Comparison select is not null.
c) Click OK to exit the Query Editor. 9. Click OK to exit the Filter dialog box and return to the dashboard. 10.Select the dashboard panel that contains your Territory Line Sales Analyzer report. a) Under the Parameters tab, select Region as the a Source value for your region parameter and click Apply.
11.Select the dashboard panel that contains Product Vendor report and change its parameter APAC to Region. a) Test the filter to make sure that the content in each dashboard is being filtered on Region.
2. In the File Name text box, type My First Dashboard. 3. Save your file in the .../steel-wheels/dashboards directory. You can do this by double clicking the Steel Wheels folder and the Dashboards folder. 4. Click Save. Note: To learn more about Dashboard features, see the Pentaho User Console Guide. Click Help -> Documentation in the Pentaho User Console.
11.
Click
in the Pentaho User Console toolbar to exit edit mode, then save your dashboard.
As you select a different territory, the slices in the pie chart update.
You must be logged on to the Pentaho Enterprise Console to follow the instructions below. Go to Start -> Programs -> Pentaho Enterprise Edition -> Enterprise Console Login. 1. 2. 3. 4. 5. In the Pentaho Enterprise Console go to Administration > Database Connections. Click the General icon to display basic configuration options. Click the plus sign (+) (add) if you cannot find your data source in the default list. The Add Database Connection dialog box appears. Type an easy to remember connection name This is the name you give the connection (sometimes referred to as a Pentaho Database Connection or named connection. It is also the name you use when creating a connection on a client tool, like Report Designer. 6. Type or select the Driver Class from the list. The database driver name you select depends on the type of database you are accessing. For example, org.hsqldb.jdbcDriver is a sample driver name for a HSQLDB (previously Hypersonic SQL or hsql) database. If your JDBC driver is not available, see Adding a JDBC Driver. 7. Type the User Name and Password required to access your database. 8. Type or select the URL from the list. This is the URL of your database; for example, jdbc:hsqldb:hsql:// localhost/sampledata. JDBC establishes a connection to a SQL-based database and sends and processes SQL statements. 9. Click Test. A success message appears if the connection is established. 10.Click OK to save your entries
you have the JAR, follow the instructions below to copy it to the driver directories for all of the BI Suite components that need to connect to this data source. Note: Microsoft SQL Server users frequently use an alternative, non-vendor-supported driver called JTDS. If you are adding an MSSQL data source, ensure that you are installing the correct driver. Backing up old drivers You must also ensure that there are no other versions of the same vendor's JDBC driver installed in these directories. If there are, you may have to back them up and remove them to avoid confusion and potential class loading problems. This is of particular concern when you are installing a driver JAR for a data source that is the same database type as your Pentaho solution repository. If you have any doubts as to how to proceed, contact your Pentaho support representative for guidance. Installing JDBC drivers Copy the driver JAR file to the following directories, depending on which servers and client tools you are using (Dashboard Designer, ad hoc reporting, and Analyzer are all part of the BI Server): Note: For the DI Server: before copying a new JDBC driver, ensure that there is not a different version of the same JAR in the destination directory. If there is, you must remove the old JAR to avoid version conflicts. BI Server: /pentaho/server/biserver-ee/tomcat/lib/ Enterprise Console: /pentaho/server/enterprise-console/jdbc/ Data Integration Server: /pentaho/server/data-integration-server/tomcat/webapps/pentaho-di/ WEB-INF/lib/ Data Integration client: /pentaho/design-tools/data-integration/libext/JDBC/ Report Designer: /pentaho/design-tools/report-designer/lib/jdbc/ Schema Workbench: /pentaho/design-tools/schema-workbench/drivers/ Aggregation Designer: /pentaho/design-tools/agg-designer/drivers/ Metadata Editor: /pentaho/design-tools/metadata-editor/libext/JDBC/ Note: To establish a data source in the Pentaho Enterprise Console, you must install the driver in both the Enterprise Console and the BI Server or Data Integration Server. If you are just adding a data source through the Pentaho User Console, you do not need to install the driver to Enterprise Console. Restarting Once the driver JAR is in place, you must restart the server or client tool that you added it to. Connecting to a Microsoft SQL Server using Integrated or Windows Authentication The JDBC driver supports Type 2 integrated authentication on Windows operating systems through the integratedSecurity connection string property. To use integrated authentication, copy the sqljdbc_auth.dll file to all the directories to which you copied the JDBC files. The sqljdbc_auth.dll files are installed in the following location: <installation directory>\sqljdbc_<version>\<language>\auth\ Note: Use the sqljdbc_auth.dll file, in the x86 folder, if you are running a 32-bit Java Virtual Machine (JVM) even if the operating system is version x64. Use the sqljdbc_auth.dll file in the x64 folder, if you are running a 64-bit JVM on a x64 processor. Use the sqljdbc_auth.dll file in the IA64 folder, you are running a 64-bit JVM on an Itanium processor.
SQL Query
The exercises in this section will walk you through creating a Database Table(s) Reporting and Analysis data source.
The Data Source Wizard opens. 2. Under Data Source Name type, Orders Report and Analysis KT. 3. Under Source Type, select Database Table(s) as your data source type.
4. In the lower portion of the wizard page, under Create data source for enable Reporting and Analysis (Requires Star Schema. 5. In the center of the page, under Connection:, click (Add) to define a connection to the database that contains the content you want to access. The Database Connection dialog box appears.
6. In the Connection Name text box, type SampleData. A Connection Name must be easy to remember and must identify the data you are accessing. 7. Under Database Type, select Hypersonic. Text fields for required settings associated with your connection type appear under Settings on the right. 8. Enter the appropriate connection information for your database type: Option Host Name Database Name Port Number User Name Password Description localhost sampledata 9001 (automatically available, by default) pentaho_user password
9. In the Database Connection dialog box, click Test to test your connection and click OK. A success message appears. The connection name appears in the list under Connections in the Data Source Wizard.
To create a Database Table(s) data source for Reporting and Analysis, a star schema is required. In a database, a star schema consists of one or more fact tables that reference any number of dimension tables. A fact table contains business metrics, (numeric data that can be measured), such as Total, Quantity, Budget, and Inventory. In a star schema, the Fact table is the center of the star. A Dimension table may contain Customer Names, Product Names, Employee records, and other miscellaneous data. Dimension tables allow users to browse through specific categories, to summarize data, and to perform drill-down searches.
Follow the instructions below to create the data source. 1. In the Data Source Wizard, click Next. A list of available database tables appears. 2. Press <CTRL+CLICK> to select the ORDERFACT CUSTOMER_W_TER, DIM_TIME, and PRODUCTS tables, then click the small green arrow to move the tables under Selected Tables.
Note: The Fact Table must be selected before you can proceed. In a production environment, a database administrator knows how to identify Fact tables. 4. Define how the tables you selected join to each other. All of your Dimension tables must link to your Fact table. You will create a join relationship with each Dimension table on the right to your Fact table (ORDERFACT) on the left. a) Select the CUSTOMERNUMBER field from and click Create Join. The join relationship between CUSTOMER_W_TER (Dimension Table) and ORDERFACT (Fact Table) is created. Both tables share the key field, CUSTOMERNUMBER. b) Select the DIM_TIME table from the list under Right Table, then select the TIME_ID field from the list of key fields on the left. Click Create Join. The join relationship between DIM_TIME (Dimension Table) and ORDERFACT (Fact Table) is created. Both tables share the key field, TIME_ID.. c) Select the PRODUCTS table from the list under Right Table, then select the PRODUCTCODE field from the list of key fields on the left. Click Create Join. The join relationship between the PRODUCTS table and ORDERFACT table is created. Both tables share the key field, PRODUCTCODE.
5. Click Finish to create your data source. 6. In the Data Source Created dialog box, enable Customize model now. See Customizing the Data Source for more information.
The Pentaho Modeler is a metadata modeling tool that makes it possible for you to customize the models without having to understand the underlying database structure or having to understand SQL and MDX. If you are not familiar with metadata modeling, think of it this way metadata is simply "data about data." For example, a database contains fields, such as customer name, address, city, and state. Among other things, metadata provides the naming conventions for the fields, describes their size, and specifies what is allowed in the field, such as numeric values only. If you were to hand a database file to a user, without providing the associated metadata, the file would appear as a long string of characters. Metadata defines how data is extracted into fields and records. Click OK in the Data Source Model dialog box to keep your default model. The new data source is added to the list of available data sources that users select when creating a new Interactive Report, a dashboard, or Analyzer report.
You can edit ( ) your data source by clicking Manage Existing in the Pentaho User Console launch page. For more information about customizing your data source, see Customizing a Reporting Data Source and Customizing an Analysis Data Source in the Pentaho User Console Guide.
Making sure the BI Server and MySQL have started If you selected to launch the User Console or Enterprise Console, the installer automatically starts the BI Server, Solution Repository, and Enterprise Console. You can check to see if the Pentaho BI Server and the MySQL database that contains the Pentaho Solution Repository have started by launching Services from the Control Panel. Go to Start -> Control Panel -> Administrative Tools -> Services. Alternatively, you can type services.msc in the Run command on the Start menu. Under local Services scroll down to Pentaho BI Server. In the image below, the Pentaho BI Server is started on Tomcat. The Pentaho Solution Repository (MySQL) is also started.
Note: Right-click on the BI Server or Solution Repository in the Services window to start or stop them. Alternatively, you can start the BI Server by navigating to Start -> Programs -> Pentaho Enterprise Edition > Server Management -> Start BI Server. The MySQL database starts automatically when you log on to your device.
Follow the instructions below before you call support to resolve an issue. The support staff must know what components have been installed. 1. Navigate to the folder that contains Pentaho installation; for example, C:\Program Files\pentaho. 2. Locate and open the following file: installation-summary.txt. 3. Copy the contents of the file and send it, via email, to the support technician as instructed.