Biwa2013 Rittman Odi
Biwa2013 Rittman Odi
Biwa2013 Rittman Odi
Mark Rittman, Technical Director, Rittman Mead BIWA Summit 2013, San Francisco, January 2013
T : +44 (0) 8446 697 995 E : [email protected] W: www.rittmanmead.com
Saturday, 5 January 13
Mark Rittman, Co-Founder of Rittman Mead Oracle ACE Director, specialising in Oracle BI&DW 14 Years Experience with Oracle Technology Regular columnist for Oracle Magazine Author of two Oracle Press Oracle BI books Oracle Business Intelligence Developers Guide Oracle Exalytics Revealed Writer for Rittman Mead Blog : http://www.rittmanmead.com/blog Email : [email protected] Twitter : @markrittman
Saturday, 5 January 13
Oracle BI and DW platinum partner World leading specialist partner for technical excellence, solutions delivery and innovation in Oracle BI Approximately 50 consultants worldwide All expert in Oracle BI and DW Offices in US (Atlanta), Europe, Australia and India Skills in broad range of supporting Oracle tools:
OBIEE OBIA ODIEE Essbase, Oracle OLAP GoldenGate Exadata Endeca
Saturday, 5 January 13
Oracles strategic data integration tool, originally came via the Sunopsis acquisition (2006) Java architecture, part of the wider Oracle Data Integration Suite, and Oracle Fusion Middleware Heterogenous database and source/target support Long-term successor to OWB, most common ETL tool on new projects now Commonly used alongside OBIEE, Essbase
and Oracle RDBMS for BI/DW projects
Saturday, 5 January 13
Same philosophy as OWB and Oracle RDBMS DB as the ETL engine Declarative design - separates logic from implementation
Business rules define what goes where, and using which transformation rules Technical implementation defines how data is moved Built for SOA environments Support for Web Services, EII etc Supports batch, event-based and real-time integration Extensible through Knowledge Modules Change Data Capture Slowly Changing Dimensions Bulk load Java client application with server elements (agents)
Saturday, 5 January 13
Oracle Data Integrator for large-scale data integration across heterogenous sources and targets Oracle GoldenGate for heterogeneous data replication and changed data capture Oracle Enterprise Data Quality for data profiling and cleansing Oracle Data Services Integrator
for SOA message-based data federation
Saturday, 5 January 13
Oracle complete set of middleware servers and technologies Based around Java, SOA, Oracle WebLogic Server and non-Java technologies Foundation for Oracles applications and platforms such as Oracle Fusion Applications
Saturday, 5 January 13
As ODI becomes more mainstream, and data integration more mission-critical, ODI needed to evolve Data warehousing and BI projects dont just access (Oracle) relational sources and targets any more Data quality requires more thought than just ad-hoc corrections and filtering ODI needs to participate in modern software development techniques such as continuous integration Its no longer acceptable for ODI jobs to fail, and be unavailable all day or weekend The stakes are raised - can ODI deliver?
Saturday, 5 January 13
Loading More than a Data Warehouse, Accessing More than Oracle RDBMS
Most of us know ODI through its ability to load Oracle data warehouses Data typically sourced from Oracle databases, files, maybe the odd non-Oracle RBDMS source Enterprises now work with many and varied data sources and applications, such as
Multidimensional servers such as Oracle Essbase, and associated EPM apps XML sources, and JMS queues SOA environments, using messaging and service buses, typically in real-time More recently - big data sources such as Hadoop clusters, NoSQL databases
Saturday, 5 January 13
ODI11g is the strategic, long-term DI tool for Essbase and associated EPM applications IKMs and LKMs for loading, and extracting from, Essbase databases and EPM metadata stores Data models for Essbase databases represented as tables, columns, the same as with other data sourcres Data loads via rules files, Essbase / Planning / HFM APIs However ... not really Essbase native, learning curve for admins sources of ODI + Essbase/EPM Suite information: Good http://john-goodwin.blogspot.co.uk
Cameron Lackpour OOW2012 Presentation Slay the Bad Data in Essbase with ODI http://tinyurl.com/lackpour-odi Rittman Mead Blog
Saturday, 5 January 13
ODI has technology adapters and features for many SOA, queue and messaging-type technologies JMS Queue, JMS Topic (plain message or XML), SOAP messages via Web Services etc Main role for ODI in SOA enviroments is bulk-data movement, invoked by web service calls Regular inter-service messaging for low volume, switching to ODI for high-volume Web services provided by runtime agents
Start, monitor, stop and restart scenarios Start, monitor, stop and restart load pans Public introspection web service List contexts List scenarios Requires deployment in Java EE container Call from BPEL or any other standard process
Saturday, 5 January 13
1. Large file arrives, detected by BPEL file 2. Execution starts (BPEL / ESB) - and a step for transforming a large document payload occurs 3. Pass XML payload, by reference, to ODI 4. ODI loads payload
5. 6. 7. 8.
ODI transforms payload ODI sends payload wherever instructed ODI notifies BPEL / ESB that job has completed Core BPEL / ESB processing completes
Saturday, 5 January 13
ODI is essentially a batch-orientated DI tool, though batches can be micro-batches (and event-driven) ODI moves and transforms data, loading it into a central, integrated location In some cases though, you may wish to take a different approach
Data federation vs. integration - read and transform data in-place Data read and integrated on-demand, as a service Approach could be preferable for many reasons Security rules dont allow data to be replicated Consumer Development is dynamic, sources frequently added or changed ODSI Data volumes dont warrant a full ETL solution Data format is inherently nested and does not easily map onto relational model
Source A Source B Consumer Consumer ODSI
Consolidated Hub Consolidated Hub
ODI/OGG
Source A
Source B
Federation only
Consolidation only
Saturday, 5 January 13
Saturday, 5 January 13
Big data is the hot topic in BI, DW and Analytics circles The ability to harness vast datasets, at a highly-granular level, by harnessing massively-parallel computing Crunching loosely-structured and modelled datasets using simple algorithms: Map (project) + Reduce (agg) Largely based around open-source projects, non-relational technologies
Apache Hadoop MapReduce Hadoop Distributed File System Apache Hive, Sqoop, HBase etc Emerging commercial vendors Cloudera Hortonworks etc Can be used standalone, or linked to an enterprise DW/BI architecture
Saturday, 5 January 13
ODI is the data integration tool for extracting data from Hadoop/MapReduce, and loading into Oracle Big Data Appliance, Oracle Exadata and Oracle Exalytics Oracle Application Adaptor for Hadoop provides required data adapters
Load data into Hadoop from local filesystem, or HDFS (Hadoop clustered FS) Read data from Hadoop/MapReduce using Apache Hive (JDBC) and HiveQL, load into Oracle RDBMS using Oracle Loader for Hadoop Supported by Oracles Engineered Systems Exadata Exalytics Big Data Appliance (w/Cloudera Hadoop Distrib)
Saturday, 5 January 13
Hive Server
HiveQL
Saturday, 5 January 13
Data is extracted and loaded using regular interfaces LKMs and IKMs generate HiveQL queries Functionally identical to RDBMS access/loading
Saturday, 5 January 13
Oracle technology for accessing Hadoop data, and loading it into an Oracle database Pushes data transformation, heavy lifting to the Hadoop cluster, using MapReduce Direct-path loads into Oracle Database, partitioned and non-partitioned Online and offline loads Key technology for fast load of
Hadoop results into Oracle DB
Saturday, 5 January 13
ODI has built-in capabilities for defining data rules, data firewalls Static controls, Flow controls, constraints etc But what if you dont know what issues your data actually has? What if you need to profile, deduplicate, merge or otherwise manage your data? This is almost a topic in itself...
Saturday, 5 January 13
Data profiling, auditing and cleansing based on the industry-leading Datanomic platform Integration with Oracle Data Integrator for a complex data management solution
Saturday, 5 January 13
Ability to profile data from many sources (file, RDBMS, JDNI, XML, MS Office) Create data quality cases, track and assign to owner Cleanse, transform, parse and match incoming data via a palette of operators (processors) Batch or real-time operation All-Java architecture, thin-client and runs in WebLogic Server Replaces previous Trillium-based OEM solution
(but extra-cost option, as was Trillium solution)
Saturday, 5 January 13
Limited integration at present, but Datanomic only just acquired Can run in same WLS domain, environment EDQ result schema can be on same DB as ODI staging area EDQ processes can be executed from ODI package or load plan
using EDQ Open Tool
Saturday, 5 January 13
As ODI and data integration becomes more integral to enterprises, expectations rise ODI project elements, and executable code, needs to go into source control Build systems need to be able to include ODI functionality in their releases Development Operations (DevOps) systems need to be able to spin-up ODI environments automatically Ideas such as continuous integration and smoke testing can also apply to ODI projects ODI topologies need to be flexible enough to
deal with DEV/PROD network & responsibility separations
Saturday, 5 January 13
Typical enterprise customers deploy all non-PROD environments on their own network, isolated from the main production systems This stops you having a single master repository for all ODI work repositories Good practice is to have all non-DEV environments use
execution work repositories Only allows load plans and scenarios to be imported Can only run existing code, not alter or change code Challenge is how you deploy code without DEV assistance Requires command-line tools Requires scripting Requires an API?
Saturday, 5 January 13
Run from the command-line, from an ODI procedure, or other methods Scriptable using the startcmd.bat|sh utility Run from the agent home directory, connects to master and work repositories The key to automating the deployment and administration of ODI projects and environments
Saturday, 5 January 13
For complex, multi-developer projects, continuous integration is a good practice Continously taking shipped code and testing it in a smoke test environment
Identifies changes that break the build early Use a suite of regression tests that run the code with optimal coverage, end-to-end ETL runs Gives you confidence that a release shipped into test will actually compile, deploy and pass functional tests Enables more agile development, through having a robust build and regression testing process that welcomes change
Saturday, 5 January 13
Jenkins is an open-source build automation and continuous integration tool Supports a range of build tools including Ant, Maven, Subversion, Git etc Use to detect new ODI export files in a given directory, and then
automatically deploy them to the CI / Smoke-Test environment Or monitor a source-control system for new check-ins Deploy ODI code through ODI Tools (OdiImportScen, OdiImportObject)
Security Topology
Versioning
Execution
Saturday, 5 January 13
Download Jenkins from http://jenkins-ci.org Set up a new build job, optionally integrate with SVN etc Run ODI tools through Execute a Batch File function Or take it further using Maven, Ant etc Run the build process manually, to a schedule, or on check-in of new code to the source control system Report on stability of build, see last failure, reason for fail
Saturday, 5 January 13
For other automation tasks, the ODI SDK can be used to perform all functions available in ODI Studio Java-based API analogous to OMB+ within Warehouse Builder Script the creation of repositories & interfaces, updating of models, registering of data sources and topologies etc Used either within Java applications (compiled),
or interpreted using Groovy (editor now shipped with ODI)
import oracle.odi.domain.project.OdiProject; import oracle.odi.core.persistence.transaction.support. DefaultTransactionDefinition; txnDef = new DefaultTransactionDefinition(); tm = odiInstance.getTransactionManager() txnStatus = tm.getTransaction(txnDef) project = new OdiProject("Project For Demo", "PROJECT_DEMO") odiInstance.getTransactionalEntityManager().persist(project) tm.commit(txnStatus)
Saturday, 5 January 13
ODI routines when deployed in the enterprise, need to be resilient, fail gracefully, be restartable They are often considered mission critical You need to code defensively, and anticipate #fail
.. or this. Make your ETL routines like this...
Saturday, 5 January 13
ODI ETL processes typically fail for one of two main reasons
Reason #1 : An error in your code, unexpected data, run out of disk space etc - the process fails Reason #2 : An agent crashes, ODI repositories goes down etc - the infrastructure fails Most modern databases (Oracle 11g+ etc) have capabilities to recover from DB process issues Can we make use of these within ODI packages, KMs etc?
Saturday, 5 January 13
When enabled, suspends INSERT operations when out of disk space, rather than fail load Datafiles can then be extended, or new ones added Can be incorporated into ODI KM to enable more load operations to complete
Saturday, 5 January 13
Insert process becomes suspended, ODI Operator shows step as still running Suspended operation can be detected using DBA_RESUMABLE, USER_RESUMABLE Once more disk space added, step will resume, operation can complete
select name from dba_resumable; NAME ------------------------------LOAD Sales Fact Table alter database datafile 'sales_ts.dbf' resize 200m; database datafile 'SALES_TS.DBF' altered.
Saturday, 5 January 13
Recoverability is another enterprise ETL requirement - graceful failure and ability to restart process Can be as simple as re-running the job, but some failures may be catastrophic - how to you unwind? Oracle RDBMS has several flashback technologies that can help
Flashback database, to a given SCN or restore point Flashback table, etc Example : An ETL process performs an UPDATE, then and INSERT - if the INSERT fails, the UPDATE stays present. Can we use FLASHBACK TABLE to restore the table back to original state, so the process can be restarted safely?
Saturday, 5 January 13
Alternative to packages for sequencing interfaces and other steps Helps organize an optimal execution schedule for a batch Advanced sequencing capabilities
Parallel or Serial, Conditional branching Exception handling Complements Scenarios and Packages, does not replace them Exception handling feature could be very useful in restart / graceful failure scenarios Run ODI procedure, package, to correct errors Run commands to roll-back/flashback the database or tables Lets use one for our example...
Saturday, 5 January 13
Flashback table requires an SCN (System Change Number) to flashback-to Record the current SCN before performing thre integration in a project variable
Requires SELECT privilege on V$_DATABASE
Saturday, 5 January 13
Load plan will define an exception, to be raised if the final INSERT operation fails Exception will call an ODI Procedure that runs the FLASHBACK TABLE command, using the saved SCN
Saturday, 5 January 13
Now, when the INSERT step fails due to an error, the UPDATE
is rolled-back as well through the FLASHBACK TABLE feature Table restored to state at original recorded SCN
Saturday, 5 January 13
Enterprises typically deploy ODI using standalone agents, in a parent/child load-balancing configuration Repository database has regular backups, or ideally uses DataGuard / log-shipping Scheduled jobs assigned to the parent, master runtime agent Jobs then delegated to the child agents, that then do the work based on load factor, availability But what if the parent agent goes down?
What about the schedule?
Saturday, 5 January 13
OPMN (Oracle Process Manager and Notification Server) can be installed to manage standalone agents Not part of the base install or license, but you probably have it somewhere Standalone agents then run, stopped, restarted and monitored using OPMN server Ensures that failed agents are restarted, including
the parent agent for load balancing
Saturday, 5 January 13
Runtime agents can now be deployed in WebLogic Server managed servers (requires WebLogic Server license) Benefit from WebLogic clustering, Enterprise Manager (+ODI Console), more resilient JVM Better for high-availability protects the scheduler how?
Saturday, 5 January 13
How JEE Agents, WebLogic and Coherence Protect Against Agent Failure
Hardware load balancer provides the load-balancing Agents are all equal - one elects to be the scheduler on cluster start, another takes over if that one crashes Oracle Coherence cache grid holds details of the schedule, available to all nodes in the cluster WebLogic Server clustering restarts failed managed servers, and Java processes (JEE runtime agents) However ... more complex setup, extra license cost, and
may not be necessary if external scheduler used instead Still benefits from running agents in production JVM though And you get Enterprise Manager, ODI Console etc
Saturday, 5 January 13
Five-part series on the Rittman Mead Blog: ODI 11g in the Enterprise
Part 1: Beyond Data Warehouse Table Loading Part 2 : Data Integration using Essbase, Messaging, and Big Data Sources and Targets Part 3: Data Quality and Data Profiling using Oracle EDQ Part 4: Build Automation and Devops using the ODI SDK, Groovy and ODI Tools Part 5: ETL Resilience and High-Availability
http://www.rittmanmead.com/2012/12/
odi11g-in-the-enterprise-part-1-beyond-datawarehouse-table-loading/
Saturday, 5 January 13
Saturday, 5 January 13
Saturday, 5 January 13