Talend Components
2022-03-17

1. Access
1. Access components
2. Access scenario
2. Amazon Aurora
1. Amazon Aurora components
2. Amazon Aurora scenario
3. Amazon DynamoDB
1. Amazon DynamoDB components
2. Amazon DynamoDB scenario
4. Amazon EMR
1. Amazon EMR components
2. Amazon EMR scenario
5. Amazon EMR distribution
1. Amazon EMR distribution scenario
6. Amazon MySQL
1. Amazon MySQL components
7. Amazon Oracle
1. Amazon Oracle components
8. Amazon Redshift
1. Amazon Redshift components
2. Amazon Redshift scenarios
9. Amazon S3
1. Amazon S3 components
2. Amazon S3 scenarios
10. Amazon SQS
1. Amazon SQS components
2. Amazon SQS scenarios
11. Apache log
1. Apache log component
2. Apache log scenario
12. Archive/Unarchive
1. Archive/Unarchive components
2. Archive/Unarchive scenarios
13. ARFF
1. ARFF components
2. ARFF scenario
14. AS400
1. AS400 components
2. AS400 scenario
15. Avro
1. Avro components
2. Avro scenario
16. Azure Data Lake Store
1. Azure Data Lake Store components
2. Azure Data Lake Store scenarios
17. Azure SQL Data Warehouse
1. Azure SQL Data Warehouse components
18. Azure Storage Blob
1. Azure Storage Blob components
2. Azure Storage Blob scenarios
19. Azure Storage Queue
1. Azure Storage Queue components
20. Azure Storage Table
1. Azure Storage Table components
2. Azure Storage Table scenario
21. Bonita
1. Bonita components
2. Bonita scenarios
22. Box
1. Box components
2. Box scenario
23. Buffer
1. Buffer components
2. Buffer scenarios
24. Business rules
1. Business rules components
2. Business rules scenarios
25. Cassandra
1. Cassandra components
2. Cassandra scenario
26. Change Data Capture
1. Change Data Capture components
2. Change Data Capture scenarios
27. Chart
1. Chart components
2. Chart scenarios
28. Cloud
1. Cloud components
29. CombinedSQL
1. CombinedSQL components
2. CombinedSQL scenario
30. Context
1. Context components
2. Context scenario
31. CosmosDB
1. CosmosDB components
32. Couchbase
1. Couchbase components
2. Couchbase scenario
33. CyberArk
1. CyberArk component
2. CyberArk scenario
34. Data mapping
1. Data mapping components
2. Data mapping scenarios
35. Data Preparation
1. Data Preparation components
2. Data Preparation scenarios
36. Data Quality
1. Address standardization
1. Address standardization components
2. Address standardization scenarios
2. Continuous matching
1. Continuous matching components
2. Continuous matching scenarios
3. Data extraction
1. Data extraction components
2. Data extraction scenarios
4. Data matching
1. Data matching components
2. Data matching scenarios
5. Data privacy
1. Data privacy components
2. Data privacy scenarios
6. Deduplication
1. Deduplication components
2. Deduplication scenarios
7. Email validation
1. Email validation component
2. Email validation scenario
8. Formatting
1. Formatting component
2. Formatting scenario
9. Fuzzy matching
1. Fuzzy matching components
2. Fuzzy matching scenarios
10. Google address standardization
1. Google address standardization components
2. Google address standardization scenarios
11. Identification
1. Identification components
2. Identification scenarios
12. Loqate address standardization
1. Loqate address standardization component
2. Loqate address standardization scenario
13. Matching with machine learning
1. Matching with machine learning components
2. Matching with machine learning scenarios
14. Melissa Data address standardization
1. Melissa Data address standardization components
2. Melissa Data address standardization scenarios
15. Microsoft SQL Server validation
1. Microsoft SQL Server validation components
16. MySQL validation
1. MySQL validation components
2. MySQL validation scenarios
17. Name standardization
1. Name standardization component
2. Name standardization scenario
18. Oracle validation
1. Oracle validation components
19. Pattern validation
1. Pattern validation components
2. Pattern validation scenarios
20. Phone number standardization
1. Phone number standardization component
2. Phone number standardization scenario
21. PostgreSQL validation
1. PostgreSQL validation components
22. QAS address standardization
1. QAS address standardization components
2. QAS address standardization scenarios
23. Reporting
1. Reporting components
2. Reporting scenarios
24. Sampling
1. Sampling component
2. Sampling scenario
25. Standardization
1. Standardization components
2. Standardization scenarios
26. Synonym index
1. Synonym index components
2. Synonym index scenarios
27. Text standardization
1. Text standardization components
2. Text standardization scenarios
28. Uniserv
1. Uniserv components
2. Uniserv scenarios
29. Validation
1. Validation component
2. Validation scenario
37. Data Stewardship
1. Data Stewardship components
2. Data Stewardship scenarios
38. Database utility
1. Database utility components
2. Database utility scenarios
39. Databricks
1. Databricks components
2. Databricks scenarios
40. DB Generic
1. DB Generic components
41. DB2
1. DB2 components
42. DBFS
1. DBFS components
43. Defining Context Groups
1. Defining Context Groups scenarios
44. Delimited
1. Delimited components
2. Delimited scenarios
45. Delta Lake
1. Delta Lake components
2. Delta Lake scenario
46. DotNET
1. DotNET components
2. DotNET scenarios
47. Dropbox
1. Dropbox components
2. Dropbox scenario
48. Dynamic Schema
1. Dynamic Schema component
2. Dynamic Schema scenarios
49. ElasticSearch
1. ElasticSearch components
50. ELT Greenplum
1. ELT Greenplum components
2. ELT Greenplum scenarios
51. ELT Hive
1. ELT Hive components
2. ELT Hive scenarios
52. ELT JDBC
1. ELT JDBC components
2. ELT JDBC scenarios
53. ELT MSSql
1. ELT MSSql components
2. ELT MSSql scenarios
54. ELT MySQL
1. ELT MySQL components
2. ELT MySQL scenarios
55. ELT Netezza
1. ELT Netezza components
2. ELT Netezza scenarios
56. ELT Oracle
1. ELT Oracle components
2. ELT Oracle scenarios
57. ELT PostgreSQL
1. ELT PostgreSQL components
2. ELT PostgreSQL scenarios
58. ELT Sybase
1. ELT Sybase components
2. ELT Sybase scenarios
59. ELT Teradata
1. ELT Teradata components
2. ELT Teradata scenarios
60. ELT Vertica
1. ELT Vertica components
2. ELT Vertica scenarios
61. ESB REST
1. ESB REST components
2. ESB REST scenarios
62. ESB SOAP
1. ESB SOAP components
2. ESB SOAP scenarios
63. EXASolution
1. EXASolution components
2. EXASolution scenario
64. Excel
1. Excel components
2. Excel scenario
65. EXist
1. EXist components
2. EXist scenario
66. Firebird
1. Firebird components
67. Flume
1. Flume components
68. FTP
1. FTP components
2. FTP scenarios
69. FullRow
1. FullRow components
2. FullRow scenario
70. Global variable
1. Global variable components
2. Global variable scenarios
71. Google BigQuery
1. Google BigQuery components
2. Google BigQuery scenarios
72. Google Dataproc
1. Google Dataproc component
73. Google Drive
1. Google Drive components
2. Google Drive scenario
74. Google PubSub
1. Google PubSub components
75. GPG
1. GPG component
2. GPG scenario
76. Greenplum
1. Greenplum components
77. Groovy
1. Groovy components
2. Groovy scenario
78. GS
1. GS components
2. GS scenario
79. HBase
1. HBase components
2. HBase scenario
80. HCatalog
1. HCatalog components
2. HCatalog scenario
81. HDFS
1. HDFS components
2. HDFS scenarios
82. Hive
1. Hive components
2. Hive scenarios
83. HSQLDB
1. HSQLDB components
84. HTTP
1. HTTP component
2. HTTP scenarios
85. Impala
1. Impala components
86. Informix
1. Informix components
87. Ingres
1. Ingres components
2. Ingres scenario
88. Interbase
1. Interbase components
89. Internet (Integration)
1. Internet (Integration) component
2. Internet (Integration) scenarios
90. Jasper
1. Jasper components
2. Jasper scenario
91. Java custom code for Map Reduce
1. Java custom code for Map Reduce component
2. Java custom code for Map Reduce scenario
92. Java custom code for Storm
1. Java custom code for Storm component
2. Java custom code for Storm scenario
93. Java custom code
1. Java custom code components
2. Java custom code scenarios
94. JavaDB
1. JavaDB components
95. JBoss ESB
1. JBoss ESB components
96. JDBC
1. JDBC components
97. JIRA
1. JIRA components
2. JIRA scenarios
98. JMS
1. JMS components
2. JMS scenario
99. JSON
1. JSON components
2. JSON scenarios
100. Kafka
1. Kafka components
2. Kafka scenarios
101. Kerberos
1. Kerberos component
102. Keystore
1. Keystore component
2. Keystore scenario
103. Kinesis
1. Kinesis components
2. Kinesis scenario
104. Kudu
1. Kudu components
2. Kudu scenario
105. LDAP
1. LDAP components
2. LDAP scenarios
106. LDIF
1. LDIF components
2. LDIF scenario
107. Library import
1. Library import component
2. Library import scenario
108. Logs and errors (Integration)
1. Logs and errors (Integration) components
2. Logs and errors (Integration) scenarios
109. Machine Learning
1. Machine Learning components
2. Machine Learning scenarios
110. Mail
1. Mail components
2. Mail scenarios
111. MapRDB
1. MapRDB components
2. MapRDB scenario
112. MapRStreams
1. MapRStreams components
113. Marketo
1. Marketo components
2. Marketo scenarios
114. MarkLogic
1. MarkLogic components
115. MaxDB
1. MaxDB components
116. MDM (Master Data Management)
1. MDM connection and transaction
1. MDM connection and transaction components
2. MDM data processing
1. MDM data processing components
2. MDM data processing scenarios
3. MDM event processing
1. MDM event processing components
2. MDM event processing scenarios
117. MemSQL
1. MemSQL components
2. MemSQL scenario
118. Microsoft CRM
1. Microsoft CRM components
2. Microsoft CRM scenario
119. Microsoft MQ
1. Microsoft MQ components
2. Microsoft MQ scenario
120. MOM
1. MOM components
2. MOM scenarios
121. Mondrian
1. Mondrian component
2. Mondrian scenario
122. MongoDB
1. MongoDB components
2. MongoDB scenarios
123. MQTT
1. MQTT components
124. MS Delimited
1. MS Delimited components
2. MS Delimited scenario
125. MS Positional
1. MS Positional components
2. MS Positional scenario
126. MS XML connectors
1. MS XML connectors components
2. MS XML connectors scenario
127. MSSql
1. MSSql components
2. MSSql scenarios
128. MySQL
1. MySQL components
2. MySQL scenarios
129. NamedPipe
1. NamedPipe components
2. NamedPipe scenario
130. Natural Language Processing
1. Natural Language Processing components
2. Natural Language Processing scenarios
131. Neo4j
1. Neo4j components
2. Neo4j scenarios
132. Netezza
1. Netezza components
133. Netsuite
1. Netsuite components
2. Netsuite scenario
134. Openbravo ERP
1. Openbravo ERP components
135. Oracle
1. Oracle components
2. Oracle scenarios
136. ORC
1. ORC components
137. Orchestration (Integration)
1. Orchestration (Integration) components
2. Orchestration (Integration) scenarios
138. Palo
1. Palo components
2. Palo scenarios
139. ParAccel
1. ParAccel components
140. Parquet
1. Parquet components
141. Petals
1. Petals components
142. POP
1. POP component
2. POP scenario
143. Positional
1. Positional components
2. Positional scenarios
144. PostgresPlus
1. PostgresPlus components
145. PostgreSQL
1. PostgreSQL components
146. Processing (Integration)
1. Processing (Integration) components
2. Processing (Integration) scenarios
147. Properties
1. Properties components
2. Properties scenario
148. Proxy
1. Proxy component
149. RabbitMQ
1. RabbitMQ components
150. Raw
1. Raw components
151. Regex
1. Regex components
2. Regex scenario
152. REST
1. REST component
2. REST scenario
153. Riak
1. Riak components
2. Riak scenario
154. Route
1. Route components
2. Route scenarios
155. RSS
1. RSS components
2. RSS scenarios
156. Salesforce
1. Salesforce components
2. Salesforce scenarios
157. SAP
1. SAP components
2. SAP scenarios
158. SCD
1. SCD components
2. SCD scenario
159. SCDELT
1. SCDELT components
2. SCDELT scenarios
160. SCP
1. SCP components
2. SCP scenario
161. ServiceNow
1. ServiceNow components
162. SingleStore
1. SingleStore components
163. Snowflake
1. Snowflake components
2. Snowflake scenarios
164. SOAP
1. SOAP component
2. SOAP scenarios
165. Socket
1. Socket components
2. Socket scenario
166. Splunk
1. Splunk component
167. SQLite
1. SQLite components
2. SQLite scenarios
168. SQLTemplate
1. SQLTemplate components
2. SQLTemplate scenarios
169. Sqoop
1. Sqoop components
2. Sqoop scenarios
170. SVNLog
1. SVNLog component
2. SVNLog scenario
171. Sybase
1. Sybase components
2. Sybase scenario
172. System
1. System components
2. System scenarios
173. Tachyon
1. Tachyon component
174. tAddLocationFromIP
1. tAddLocationFromIP component
2. tAddLocationFromIP scenario
175. Talend Cloud
1. Talend Cloud components
176. tChangeFileEncoding
1. tChangeFileEncoding component
2. tChangeFileEncoding scenario
177. tCreateTemporaryFile
1. tCreateTemporaryFile component
2. tCreateTemporaryFile scenario
178. Technical
1. Technical components
2. Technical scenarios
179. Teradata
1. Teradata components
2. Teradata scenarios
180. tFileCompare
1. tFileCompare component
2. tFileCompare scenario
181. tFileCopy
1. tFileCopy component
2. tFileCopy scenario
182. tFileDelete
1. tFileDelete component
2. tFileDelete scenario
183. tFileExist
1. tFileExist component
2. tFileExist scenario
184. tFileList
1. tFileList component
2. tFileList scenarios
185. tFileProperties
1. tFileProperties component
2. tFileProperties scenario
186. tFileRowCount
1. tFileRowCount component
2. tFileRowCount scenario
187. tFileTouch
1. tFileTouch component
188. tFixedFlowInput
1. tFixedFlowInput component
2. tFixedFlowInput scenario
189. tMap
1. tMap component
2. tMap scenarios
190. tMemorizeRows
1. tMemorizeRows component
2. tMemorizeRows scenario
191. tMsgBox
1. tMsgBox component
2. tMsgBox scenario
192. tRowGenerator
1. tRowGenerator component
2. tRowGenerator scenario
193. tServerAlive
1. tServerAlive component
2. tServerAlive scenario
194. tSocketTextStreamInput
1. tSocketTextStreamInput component
195. tXMLMap
1. tXMLMap component
2. tXMLMap scenarios
196. VectorWise
1. VectorWise components
197. Vertica
1. Vertica components
198. VtigerCRM
1. VtigerCRM components
199. Webservice
1. Webservice components
2. Webservice scenarios
200. Workday
1. Workday component
201. XML
1. XML components
2. XML scenarios
202. XML connectors
1. XML connectors components
2. XML connectors scenarios
203. XML validation
1. XML validation components
2. XML validation scenarios
204. XMLRPC
1. XMLRPC component
2. XMLRPC scenario

Access
Access components

tAccessBulkExec Offers gains in performance when carrying out Insert operations in an Access database.

tAccessClose Closes an active connection to the Access database so as to release occupied resources.

tAccessCommit Commits a global transaction in one go, instead of committing on every row or every batch, using a unique connection to improve performance.

tAccessConnection Opens a connection to the specified database that can then be reused in
the subsequent subJob or subJobs.

tAccessInput Reads a database and extracts fields based on a query.

tAccessOutput Writes, updates, modifies, or deletes entries in a database.

tAccessOutputBulk Prepares the file which contains the data used to feed the Access database.

tAccessOutputBulkExec Executes an Insert action on the data provided, in an Access database.

tAccessRollback Cancels the transaction commit in the connected database and avoids committing part of a transaction involuntarily.

tAccessRow Executes the stated SQL query on the specified database.

Access scenario
Inserting data in parent/child tables
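
For readers who want to relate these components to plain Java, the connection/commit/rollback trio maps onto the standard JDBC transaction pattern. The sketch below illustrates that pattern only, not Talend's generated code; the UCanAccess JDBC URL, table, and columns are assumptions made for the example.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class AccessTransactionSketch {
        public static void main(String[] args) throws SQLException {
            // Hypothetical JDBC URL; the actual driver/URL depends on your Access setup.
            String url = "jdbc:ucanaccess://C:/data/sample.accdb";
            try (Connection conn = DriverManager.getConnection(url)) {
                conn.setAutoCommit(false);           // like tAccessConnection shared by a subJob
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO customers (id, name) VALUES (?, ?)")) {
                    ps.setInt(1, 1);
                    ps.setString(2, "Alice");
                    ps.executeUpdate();              // like tAccessOutput writing rows
                    conn.commit();                   // like tAccessCommit: one commit for the whole transaction
                } catch (SQLException e) {
                    conn.rollback();                 // like tAccessRollback: avoid a partial commit
                    throw e;
                }
            }
        }
    }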

Amazon Aurora
Amazon Aurora components

tAmazonAuroraInvalidRows Checks Amazon Aurora database rows against specific Data Quality patterns (regular expression) or Data Quality rules (business rule) and extracts the rows that do not match them. Only MySQL is supported.

tAmazonAuroraValidRows Checks Amazon Aurora database rows against specific Data Quality patterns (regular expression) or Data Quality rules (business rule) and extracts the rows that match them. Only MySQL is supported.

tAmazonAuroraClose Closes an active connection to an Amazon Aurora database instance to release the occupied resources.

tAmazonAuroraCommit Commits a global transaction in one go, instead of committing on every row or every batch, using a unique connection to improve performance.

tAmazonAuroraConnection Opens a connection to an Amazon Aurora database instance that can then be reused by other Amazon Aurora components.

tAmazonAuroraInput Reads an Amazon Aurora database and extracts fields based on a query.

tAmazonAuroraOutput Writes, updates, modifies, or deletes entries in an Amazon Aurora database.

tAmazonAuroraRollback Rolls back any changes made in the Amazon Aurora database to prevent partial
transaction commit if an error occurs.

tAmazonAuroraRow Executes query statements on a specified Amazon Aurora database table.

Amazon Aurora scenario


Handling data with Amazon Aurora

Amazon DynamoDB

Amazon DynamoDB components

tDynamoDBConfiguration Stores connection information and credentials to be reused by other DynamoDB components.

tDynamoDBLookupInput Executes a database query with a strictly defined order which must
correspond to the schema definition.

tDynamoDBInput Retrieves data from an Amazon DynamoDB table and sends them to the
component that follows for transformation.

tDynamoDBOutput Creates, updates or deletes data in an Amazon DynamoDB table.

Amazon DynamoDB scenario


Writing and extracting JSON documents from DynamoDB
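
As a rough, non-authoritative illustration of what tDynamoDBOutput and tDynamoDBInput do per row, here is a minimal AWS SDK for Java (v1) sketch; the "customers" table name and "id" key attribute are hypothetical.

    import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
    import com.amazonaws.services.dynamodbv2.model.AttributeValue;
    import java.util.HashMap;
    import java.util.Map;

    public class DynamoDBSketch {
        public static void main(String[] args) {
            AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

            // Write one item, roughly what tDynamoDBOutput does per incoming row.
            Map<String, AttributeValue> item = new HashMap<>();
            item.put("id", new AttributeValue().withS("42"));      // hypothetical key attribute
            item.put("name", new AttributeValue().withS("Alice"));
            client.putItem("customers", item);                     // hypothetical table name

            // Read it back by key, roughly what tDynamoDBInput does when retrieving data.
            Map<String, AttributeValue> key = new HashMap<>();
            key.put("id", new AttributeValue().withS("42"));
            System.out.println(client.getItem("customers", key).getItem());
        }
    }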

Amazon EMR
Amazon EMR components

tAmazonEMRListInstances Lists the details about the instance groups in a cluster on Amazon EMR
(Elastic MapReduce).

tAmazonEMRManage Launches or terminates a cluster on Amazon EMR (Elastic MapReduce).

tAmazonEMRResize Adds or resizes a task instance group in a cluster on Amazon EMR (Elastic
MapReduce).

Amazon EMR scenario


Managing an Amazon EMR cluster

Amazon EMR distribution

Amazon EMR distribution scenario


Writing server-side KMS encrypted data on EMR

Amazon MySQL

Amazon MySQL components

tAmazonMysqlClose Closes an active connection to the database to release the occupied resources.

tAmazonMysqlCommit Commits a global transaction in one go, instead of committing on every row or every batch, using a unique connection to improve performance.

tAmazonMysqlConnection Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs.

tAmazonMysqlInput Reads a database and extracts fields based on a query.

tAmazonMysqlOutput Writes, updates, modifies, or deletes entries in a database.

tAmazonMysqlRollback Cancels the transaction commit in the connected database and avoids committing part of a transaction involuntarily.

tAmazonMysqlRow Executes the stated SQL query on the specified database.

Amazon Oracle
Amazon Oracle components

tAmazonOracleClose Closes an active connection to the database to release the occupied resources.

tAmazonOracleCommit Commits a global transaction in one go, instead of committing on every row or every batch, using a unique connection to improve performance.

tAmazonOracleConnection Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs.

tAmazonOracleInput Reads a database and extracts fields based on a query.

tAmazonOracleOutput Writes, updates, modifies, or deletes entries in a database.

tAmazonOracleRollback Cancels the transaction commit in the connected database and avoids committing part of a transaction involuntarily.

tAmazonOracleRow Executes the stated SQL query on the specified database.

Amazon Redshift
Amazon Redshift components

tRedshiftConfiguration Reuses the connection configuration to a Redshift database in the same Job.

tRedshiftLookupInput Reads a Redshift database and extracts fields based on a query.

tAmazonRedshiftManage Manages Amazon Redshift clusters and snapshots.

tRedshiftBulkExec Loads data into Amazon Redshift from Amazon S3, Amazon EMR cluster,
Amazon DynamoDB, or remote hosts.

tRedshiftClose Closes an active connection to the database to release the occupied resources.

tRedshiftCommit Commits a global transaction in one go, instead of committing on every row or every batch, providing a gain in performance.

tRedshiftConnection Opens a connection to the specified database that can then be reused in
the subsequent subJob or subJobs.

tRedshiftInput Reads data from a database and extracts fields based on a query so that
you may apply changes to the extracted data.

tRedshiftOutput Writes, updates, modifies or deletes the data in a database.

tRedshiftOutputBulk Prepares a delimited/CSV file that can be used by tRedshiftBulkExec to feed Amazon Redshift.

tRedshiftOutputBulkExec Executes the Insert action on the data provided.

tRedshiftRollback Cancels the transaction commit in the Redshift database to avoid committing part of a transaction involuntarily.

tRedshiftRow Acts on the actual DB structure or on the data (although without handling
data), depending on the nature of the query and the database.

tRedshiftUnload Unloads data on Amazon Redshift to files on Amazon S3.

Amazon Redshift scenarios


Handling data with Redshift
Loading/unloading data to/from Amazon S3
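
tRedshiftBulkExec's bulk-load path corresponds to Redshift's COPY statement. A minimal JDBC sketch of issuing such a COPY follows; the cluster endpoint, credentials, table, bucket, and IAM role are all placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class RedshiftCopySketch {
        public static void main(String[] args) throws SQLException {
            // All connection details and the bucket/role below are placeholders.
            String url = "jdbc:redshift://example.cluster.redshift.amazonaws.com:5439/dev";
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 Statement stmt = conn.createStatement()) {
                // COPY is the bulk path the component exposes: load a CSV staged on S3.
                stmt.execute(
                    "COPY sales FROM 's3://my-bucket/sales.csv' " +
                    "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load' CSV");
            }
        }
    }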

Amazon S3
Amazon S3 components

tS3Configuration Reuses the connection configuration to S3 in the same Job. The Spark
cluster to be used reads this configuration to eventually connect to S3.

tS3Input Reads data from a given S3N system (S3 Native Filesystem).

tS3Output Writes data into a given S3 filesystem.

tS3BucketCreate Creates a bucket on Amazon S3.

tS3BucketDelete Deletes an empty bucket from Amazon S3.

tS3BucketExist Verifies if the specified bucket exists on Amazon S3.

tS3BucketList Lists all the buckets on Amazon S3.


tS3Close Shuts down a connection to Amazon S3, thus releasing the network
resources.

tS3Connection Establishes a connection to Amazon S3 to store and retrieve data.

tS3Copy Copies an Amazon S3 object from a source bucket to a destination bucket.

tS3Delete Deletes a file from Amazon S3.

tS3Get Retrieves a file from Amazon S3.

tS3List Lists the files on Amazon S3 based on the bucket/file prefix settings.

tS3Put Uploads data onto Amazon S3 from a local file or from cache memory via
the streaming mode.

Amazon S3 scenarios
Writing and reading data from S3 (Databricks on AWS)
Writing server-side KMS encrypted data on EMR
Copying an S3 object from one bucket to another
Exchange files with Amazon S3
Listing files with the same prefix from a bucket
Retrieving data from an S3 object in Studio
Tagging S3 objects
Verifying the absence of a bucket, creating it and listing all the S3 buckets
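
For orientation, the sketch below shows, with the AWS SDK for Java (v1), the raw S3 calls that roughly correspond to tS3BucketExist, tS3BucketCreate, tS3Put, tS3List, and tS3Get. The bucket name and keys are hypothetical, and this is an illustration rather than the code the components generate.

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.GetObjectRequest;
    import com.amazonaws.services.s3.model.S3ObjectSummary;
    import java.io.File;

    public class S3Sketch {
        public static void main(String[] args) {
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
            String bucket = "my-example-bucket";          // hypothetical bucket name

            if (!s3.doesBucketExistV2(bucket)) {          // what tS3BucketExist checks
                s3.createBucket(bucket);                  // what tS3BucketCreate does
            }
            s3.putObject(bucket, "in/data.csv", new File("data.csv"));   // cf. tS3Put
            for (S3ObjectSummary o : s3.listObjects(bucket, "in/").getObjectSummaries()) {
                System.out.println(o.getKey());           // cf. tS3List with a key prefix
            }
            s3.getObject(new GetObjectRequest(bucket, "in/data.csv"),
                         new File("copy.csv"));           // cf. tS3Get
        }
    }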

Amazon SQS

Amazon SQS components

tSQSConnection Opens a connection to Amazon Simple Queue Service that can then be reused by
other SQS components.

tSQSInput Retrieves one or more messages, with a maximum limit of ten messages, from an
Amazon SQS (Simple Queue Service) queue.

tSQSMessageChangeVisibility Changes the visibility timeout of a specified message in an Amazon SQS (Simple
Queue Service) queue.

tSQSMessageDelete Deletes a specified message from an Amazon SQS (Simple Queue Service) queue.

tSQSOutput Delivers one or more messages to an Amazon SQS (Simple Queue Service) queue.

tSQSQueueAttributes Gets attributes for a specified Amazon SQS (Simple Queue Service) queue.

tSQSQueueCreate Creates a new Amazon SQS (Simple Queue Service) queue.

tSQSQueueDelete Deletes an Amazon SQS (Simple Queue Service) queue.

tSQSQueueList Iterates over and lists the URLs of the Amazon SQS (Simple Queue Service) queues in a specified region.

tSQSQueuePurge Purges messages in an Amazon SQS (Simple Queue Service) queue.

Amazon SQS scenarios


Delivering messages to an Amazon SQS queue
Listing Amazon SQS queues in an AWS region
Retrieving messages from an Amazon SQS queue
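
These scenarios can be pictured against the raw SQS API. Below is a minimal AWS SDK for Java (v1) sketch, with a hypothetical queue name, of the send/receive/delete cycle that tSQSOutput, tSQSInput, and tSQSMessageDelete wrap; note the ten-message maximum per receive call, matching the tSQSInput limit.

    import com.amazonaws.services.sqs.AmazonSQS;
    import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
    import com.amazonaws.services.sqs.model.Message;
    import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

    public class SqsSketch {
        public static void main(String[] args) {
            AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
            // "demo-queue" is a hypothetical queue name.
            String queueUrl = sqs.createQueue("demo-queue").getQueueUrl();   // cf. tSQSQueueCreate

            sqs.sendMessage(queueUrl, "hello");                              // cf. tSQSOutput

            // cf. tSQSInput: at most ten messages per call, the SQS API maximum.
            ReceiveMessageRequest req =
                new ReceiveMessageRequest(queueUrl).withMaxNumberOfMessages(10);
            for (Message m : sqs.receiveMessage(req).getMessages()) {
                System.out.println(m.getBody());
                sqs.deleteMessage(queueUrl, m.getReceiptHandle());           // cf. tSQSMessageDelete
            }
        }
    }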

Apache log
Apache log component

tApacheLogInput Reads the access-log file for an Apache HTTP server.

Apache log scenario


Reading an Apache access-log file

Archive/Unarchive

Archive/Unarchive components

tFileArchive Creates a new zip, gzip, or tar.gz archive file from one or more files or
folders.

tFileUnarchive Decompresses an archive file for further processing, in one of the following formats: *.tar.gz, *.tgz, *.tar, *.gz, and *.zip.

Archive/Unarchive scenarios
Comparing unzipped files
Zipping files using a tFileArchive
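
As a point of reference, creating a zip archive the way tFileArchive does can be pictured with java.util.zip; the file names below are hypothetical and the snippet is only a sketch of the underlying mechanism.

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    public class ZipSketch {
        public static void main(String[] args) throws IOException {
            // Zip a single file, as tFileArchive would for a zip target.
            try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream("out.zip"));
                 FileInputStream in = new FileInputStream("report.txt")) {
                zos.putNextEntry(new ZipEntry("report.txt"));
                in.transferTo(zos);       // Java 9+; copies the file stream into the entry
                zos.closeEntry();
            }
        }
    }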

ARFF

ARFF components

tFileInputARFF Reads an ARFF file row by row, splits the rows into fields, and then sends the fields as defined in the schema to the next component.

tFileOutputARFF Writes an ARFF file that holds data organized according to the defined
schema.

ARFF scenario
Displaying the content of an ARFF file

AS400

AS400 components
tAS400Close Closes an active connection to the database to release the occupied resources.

tAS400Commit Commits a global transaction in one go, instead of committing on every row or every batch, using a unique connection to improve performance.

tAS400Connection Opens a connection to the specified database that can then be reused in the subsequent subJob or subJobs.

tAS400Input Reads a database and extracts fields based on a query.

tAS400LastInsertId Obtains the primary key value of the record that was last inserted in an AS/400 table.

tAS400Output Writes, updates, modifies, or deletes entries in a database.

tAS400Rollback Cancels the transaction commit in the connected database and avoids committing part of a transaction involuntarily.

tAS400Row Executes the stated SQL query on the specified database.

AS400 scenario
Handling data with AS/400

Avro
Avro components

tAvroInput Extracts records from any given Avro format files for other components to
process the records.

tAvroOutput Receives data flows from the processing component placed ahead of it and
writes the data into Avro format files in a given distributed file system.

tAvroStreamInput Listens on a given directory, reads data from Avro files once they are
created and sends this data to the component that follows.

Avro scenario
Filtering Avro format employee data
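
For readers unfamiliar with the format, here is a minimal sketch of writing an Avro file with the Apache Avro Java API, the kind of file tAvroOutput produces; the Employee schema is invented for the example.

    import java.io.File;
    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    public class AvroWriteSketch {
        public static void main(String[] args) throws IOException {
            // A hypothetical two-field record schema.
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Employee\",\"fields\":["
                + "{\"name\":\"name\",\"type\":\"string\"},"
                + "{\"name\":\"age\",\"type\":\"int\"}]}");

            GenericRecord rec = new GenericData.Record(schema);
            rec.put("name", "Alice");
            rec.put("age", 30);

            // Write one record into an Avro container file.
            try (DataFileWriter<GenericRecord> writer =
                     new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
                writer.create(schema, new File("employees.avro"));
                writer.append(rec);
            }
        }
    }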

Azure Data Lake Store

Azure Data Lake Store components

tAzureAdlsGen2Input Retrieves data from an ADLS Gen2 file system of an Azure storage account and passes the data to the subsequent component connected to it through a Row > Main link.

tAzureAdlsGen2Output Uploads incoming data to an ADLS Gen2 file system of an Azure storage account in the specified format.

tAzureFSConfiguration Provides authentication information for Spark to connect to a given Azure file system.

Azure Data Lake Store scenarios


Accessing Azure ADLS Gen2 storage
Writing and reading data from Azure Data Lake Storage using Spark (Azure Databricks)

Azure SQL Data Warehouse

Azure SQL Data Warehouse components

tAzureSynapseBulkExec Loads data into an Azure SQL Data Warehouse table from either Azure Blob
Storage or Azure Data Lake Storage.

tAzureSynapseClose Closes an active connection to an Azure SQL Data Warehouse database.

tAzureSynapseCommit Commits a global transaction in one go, instead of committing on every row or every batch, and thus provides a gain in performance.

tAzureSynapseConnection Opens a connection to an Azure SQL Data Warehouse database.

tAzureSynapseInput Reads data and extracts fields based on a query from an Azure SQL Data
Warehouse database.

tAzureSynapseOutput Writes, updates, modifies, or deletes entries in an Azure SQL Data Warehouse database.

tAzureSynapseRollback Cancels the transaction commit in the connected Azure SQL Data Warehouse
database to prevent partial transaction commit if an error occurs.

tAzureSynapseRow Executes an SQL query stated on an Azure SQL Data Warehouse database.

Azure Storage Blob

Azure Storage Blob components

tAzureFSConfiguration Provides authentication information for Spark to connect to a given Azure file system.

tAzureStorageConnection Uses authentication and the protocol information to create a connection to the
Microsoft Azure Storage system that can then be reused by other Azure Storage
components.

tAzureStorageContainerCreate Creates a new storage container used to hold Azure blobs (Binary Large Object) for a
given Azure storage account.

tAzureStorageContainerDelete Automates the removal of a given blob container from the space of a specific storage
account.

tAzureStorageContainerExist Automates the verification of whether a given blob container exists within a storage account.

tAzureStorageContainerList Lists all containers in a given Azure storage account.

tAzureStorageDelete Deletes blobs from a given container for an Azure storage account according to the
specified blob filters.

tAzureStorageGet Retrieves blobs from a given container for an Azure storage account according to the specified filters applied on the virtual hierarchy of the blobs, and then writes the selected blobs to a local folder.

tAzureStorageList Lists blobs in a given container according to the specified blob filters.

tAzureStoragePut Uploads local files into a given container for an Azure storage account.

Azure Storage Blob scenarios


Creating a container in Azure Storage
Retrieving files from an Azure Storage container

Azure Storage Queue


Azure Storage Queue components

tAzureStorageConnection Uses authentication and the protocol information to create a connection to the Microsoft
Azure Storage system that can then be reused by other Azure Storage components.

tAzureStorageQueueCreate Creates a new queue under a given Azure storage account.

tAzureStorageQueueDelete Deletes a specified queue permanently under a given Azure storage account.

tAzureStorageQueueInput Retrieves one or more messages from the front of an Azure queue.

tAzureStorageQueueInputLoop Runs an endless loop to retrieve messages from the front of an Azure queue.

tAzureStorageQueueList Returns all queues associated with the given Azure storage account.

tAzureStorageQueueOutput Adds messages to the back of an Azure queue.

tAzureStorageQueuePurge Purges messages in an Azure queue.

Azure Storage Table


Azure Storage Table components

tAzureStorageConnection Uses authentication and the protocol information to create a connection to the Microsoft Azure Storage system that can then be reused by other Azure Storage components.

tAzureStorageInputTable Retrieves a set of entities that satisfy the specified filter criteria from an Azure storage table.

tAzureStorageOutputTable Performs the defined action on a given Azure storage table and inserts, replaces, merges or deletes entities in the table based on the incoming data from the preceding component.

Azure Storage Table scenario


Handling data with Microsoft Azure Table storage

Bonita

Bonita components

tBonitaDeploy Deploys a specific Bonita process to a Bonita Runtime.

tBonitaInstantiateProcess Starts an instance for a specific process deployed in a Bonita Runtime engine.

Bonita scenarios
Executing a Bonita process via a Talend Job
Outputting the process instance UUID over the Row > Main link

Box

Box components

tBoxConnection Creates a Box connection that the other Box components can
reuse.

tBoxCopy Copies or moves a given folder or file from Box.

tBoxDelete Removes a given folder or file from Box.

tBoxGet Downloads a selected file from a Box account.

tBoxList Lists the files stored in a specified directory in Box.

tBoxPut Uploads files to a Box account.

Box scenario
Uploading and downloading files from Box

Buffer

Buffer components

tBufferInput Retrieves data buffered via a tBufferOutput component, for example, to process it in another subJob.

tBufferOutput Collects data in a buffer in order to access it later, for example via a webservice.

Buffer scenarios
Buffering data to be used as a source system
Buffering data
Buffering output data on the webapp server
Calling a Job exported as Webservice in another Job
Calling a Job with context variables from a browser
Retrieving bufferized data
Returning a value from a child Job to the parent Job
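
Conceptually, tBufferOutput fills an in-memory buffer that tBufferInput (or a calling parent Job) later drains. Below is a toy Java sketch of that idea using a shared static list; it is an analogy, not Talend's actual buffering code.

    import java.util.ArrayList;
    import java.util.List;

    public class BufferSketch {
        // A shared in-memory buffer, analogous in spirit to what tBufferOutput fills
        // and tBufferInput drains; an illustration only.
        private static final List<String[]> BUFFER = new ArrayList<>();

        static void bufferOutput(String[] row) { BUFFER.add(row); }   // subJob 1 collects rows

        public static void main(String[] args) {
            bufferOutput(new String[] {"1", "Alice"});
            bufferOutput(new String[] {"2", "Bob"});

            // subJob 2 (or a parent Job calling this one) reads the buffered rows back.
            for (String[] row : BUFFER) {
                System.out.println(String.join(";", row));
            }
        }
    }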

Business rules

Business rules components

tBRMS Applies Drools business rules to an incoming flow and writes the output
data to an XML file.

tRules Uses business rules defined in a Drools file of .xls or .drl format in order to
filter data.

Business rules scenarios


Applying business rules to an Excel file to filter data
Extracting client data according to business rules stored in an external file
Extracting zip codes using DRL rules you create from the Studio

Cassandra

Cassandra components

tCassandraConfiguration Enables the reuse of the connection configuration to a Cassandra server in the
same Job.

tCassandraLookupInput Extracts the desired data from a standard or super column family of a Cassandra
keyspace so as to apply changes to the data.

tCassandraBulkExec Improves performance during Insert operations to a Cassandra column family.

tCassandraClose Closes a connection to a Cassandra server so as to release occupied resources.

tCassandraConnection Enables the reuse of the connection it creates to a Cassandra server.

tCassandraInput Extracts the desired data from a standard or super column family of a Cassandra
keyspace so as to apply changes to the data.

tCassandraOutput Writes data into or deletes data from a column family of a Cassandra keyspace.

tCassandraOutputBulk Prepares an SSTable of large size and processes it according to your needs
before loading this SSTable into a column family of a Cassandra keyspace.

tCassandraOutputBulkExec Improves performance during Insert operations to a column family of a Cassandra keyspace.

tCassandraRow Acts on the actual DB structure or on the data, depending on the nature of the query and the database.

Cassandra scenario
Handling data with Cassandra

Change Data Capture

Change Data Capture components

tAS400CDC Addresses data extraction and transportation needs.

tDB2CDC Extracts the changes done to the source operational data and makes them
available to the target system(s) using database CDC views.

tInformixCDC Extracts the data from a source system that has changed since the last extraction and transports it to other systems.

tIngresCDC (deprecated) Extracts source system data that has changed since the last extraction and transports it to other systems.

tMSSqlCDC Extracts the changes made to the source operational data and makes them available to the target system(s) using database CDC views.

tMysqlCDC Extracts only the changes made to the source operational data and makes them available to the target system(s) using database CDC views.

tOracleCDC Extracts source system data that has changed since the last extraction and transports it to other systems.

tOracleCDCOutput Synchronizes data changes in the Oracle XStream CDC mode.

tPostgresqlCDC Addresses data extraction and transportation needs; it extracts only the changes made to the source operational data and makes them available to the target system(s) using database CDC views.

tSybaseCDC Extracts source system data that has changed since the last extraction and transports it to other systems.

tTeradataCDC Extracts source system data that has changed since the last extraction and transports it to other systems using the CDC Trigger mode.

Change Data Capture scenarios


Extracting and synchronizing data changes using XStream mode
Populating a MySQL data warehouse
Populating an Oracle data warehouse
Retrieving modified data using CDC
Retrieving modified data using Oracle CDC Redo log mode

Chart
Chart components

tBarChart Generates a bar chart from the input data to ease technical analysis.

tLineChart Reads data from an input flow and transforms the data into a line chart in a
PNG image file to ease technical analysis.

Chart scenarios
Creating a bar chart from the input data
Creating a line chart to ease trend analysis

Cloud

Cloud components

tCloudStart Starts instances on Amazon EC2 (Amazon Elastic Compute Cloud).

tCloudStop Changes the status of a launched instance on Amazon EC2 (Amazon Elastic
Compute Cloud).

CombinedSQL

CombinedSQL components

tCombinedSQLAggregate Provides a set of aggregate metrics based on values or calculations.

tCombinedSQLFilter Filters data by reorganizing, deleting or adding columns based on the source table, and filters the given data source using the filter conditions.

tCombinedSQLInput Extracts fields from a database table based on its schema definition.

tCombinedSQLOutput Inserts records from the incoming flow to an existing database table.

CombinedSQL scenario
Filtering and aggregating table columns directly on the DBMS

Context
Context components

tContextDump Copies the context setup of the current Job to a flat file, a database table,
etc., which can then be used by tContextLoad.

tContextLoad Loads a context from a flow.

Context scenario
Reading data from different MySQL databases using dynamically loaded connection parameters
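
Since a context is essentially a set of key/value pairs, the dump/load cycle can be pictured with java.util.Properties. The sketch below is only an analogy for what tContextLoad does with a key=value flow; the context.properties file and its db_host/db_port keys are invented for the example.

    import java.io.FileReader;
    import java.io.IOException;
    import java.util.Properties;

    public class ContextLoadSketch {
        public static void main(String[] args) throws IOException {
            // context.properties is a hypothetical key=value file, e.g. produced by tContextDump:
            //   db_host=localhost
            //   db_port=3306
            Properties context = new Properties();
            try (FileReader reader = new FileReader("context.properties")) {
                context.load(reader);
            }
            // A Job would then resolve connection parameters from the loaded context.
            String url = "jdbc:mysql://" + context.getProperty("db_host")
                       + ":" + context.getProperty("db_port") + "/mydb";
            System.out.println(url);
        }
    }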

CosmosDB
CosmosDB components

tCosmosDBSQLAPIInput Retrieves data from a Cosmos database collection through SQL API.

tCosmosDBSQLAPIOutput Inserts, updates, upserts or deletes documents in a Cosmos database collection, based on the incoming flow from the preceding component, through SQL API.

tCosmosDBBulkLoad Imports data files in different formats (CSV, TSV or JSON) into the specified Cosmos database so that the data can be further processed.

tCosmosDBConnection Creates a connection to a CosmosDB database and reuses that connection in other components.

tCosmosDBInput Retrieves certain documents from a Cosmos database collection by supplying a query document containing the fields the desired documents should match.

tCosmosDBOutput Inserts, updates, upserts or deletes documents in a Cosmos database collection based on the incoming flow from the preceding component in the Job.

tCosmosDBRow Executes commands on the Cosmos database.

Couchbase

Couchbase components

tCouchbaseDCPInput Queries the documents from the Couchbase database, under the Database
Change Protocol (DCP), a streaming protocol.

tCouchbaseDCPOutput Upserts documents in the Couchbase database based on the incoming flat
data from preceding components, under the Database Change Protocol
(DCP), a streaming protocol.

tCouchbaseInput Queries the documents from the Couchbase database.

tCouchbaseOutput Upserts documents in the Couchbase database based on the incoming flat
data from preceding components.

Couchbase scenario
Querying JSON documents from a Couchbase database with a N1QL query

CyberArk

CyberArk component
tCyberarkInput Retrieves the content of a secret object (usually, a password) stored in a CyberArk vault at runtime. The retrieved content is stored in the after variable SECRET, which can be referenced by any subsequent component in the Job. The content can also be passed to the subsequent component in a column named secret through a Row > Main connection.

CyberArk scenario
Accessing a password-protected file

Data mapping

Data mapping components

tHConvertFile Uses Talend Data Mapper structures to perform a conversion from one
representation to another, as a Spark Batch execution.

tHMap Executes transformations (called maps) between different sources and destinations by harnessing the capabilities of Talend Data Mapper, available in the Mapping perspective.

tHMapFile Runs a Talend Data Mapper map where input and output structures may
differ, as a Spark batch execution.

tHMapInput Runs a Talend Data Mapper map where input and output structures may
differ, as a Spark batch execution, and sends the data for use by a
downstream component.

tHMapRecord Runs a Talend Data Mapper map where input and output structures may
differ, as a Spark streaming execution.

Data mapping scenarios


Connecting tHMapRecord to multiple outputs
Generating the Output Using tHMap with Multiple Schema Inputs
Generating the Output using tHMap with Multiple Payload Inputs
Handling errors
Transforming data in a Spark environment
Transforming from a Data Integration schema to a complex content schema
Using Talend Data Integration metadata
Using Talend Data Mapper metadata

Data Preparation

Data Preparation components

tDataprepRun Applies a preparation made using Talend Data Preparation in a standard Data Integration Job.

tDatasetInput Creates a flow with data from a Talend Data Preparation dataset.

tDatasetOutput Creates a dataset in Talend Data Preparation.


Data Preparation scenarios
Applying a preparation to a data sample in an Apache Spark Batch Job
Applying a preparation to a data sample in an Apache Spark Streaming Job
Creating a dataset from a Job
Dynamically selecting a preparation at runtime according to the input
Preparing data from a database in a Talend Job
Promoting a Job leveraging a preparation across environments

Data Quality

Address standardization
Address standardization components

tAddressRowCloud Verifies and formats international addresses in the Cloud by using online
services.

tBatchAddressRowCloud Uses batch processing to parse address data and get formatted addresses
quickly, accurately and without installing any software.

Address standardization scenarios

Editing the mapping of the verification codes from address validation providers to Talend verification levels
Parsing addresses against reference data in the Cloud
Parsing addresses against reference data in the Cloud using batch processing

Continuous matching
Continuous matching components

tMatchIndex Indexes a clean and deduplicated data set in ElasticSearch for continuous
matching purposes.

tMatchIndexPredict Compares a new data set with a lookup data set stored in ElasticSearch,
using tMatchIndex. tMatchIndexPredict outputs unique records and
suspect duplicates in separate files.

Continuous matching scenarios

Doing continuous matching using tMatchIndexPredict


Indexing a reference data set in Elasticsearch

Data extraction
Data extraction components

tExtractRegexFields Extracts data and generates multiple columns from a formatted string
using regex matching.

tPatternExtract Outputs all data that match a given pattern. You can then implement any
required operation on the extracted data.

Data extraction scenarios


Extracting name, domain and TLD from e-mail addresses
Extracting only the data that corresponds to a defined pattern from a delimited file
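
The extraction that tExtractRegexFields performs boils down to Java regular-expression capture groups. A minimal sketch, using the e-mail example from the first scenario above (the pattern is one possible choice, not the scenario's exact one):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexExtractSketch {
        public static void main(String[] args) {
            // Split an e-mail address into name, domain, and TLD columns.
            Pattern p = Pattern.compile("(\\w+)@([\\w.]+)\\.(\\w+)$");
            Matcher m = p.matcher("alice@example.com");
            if (m.find()) {
                System.out.println("name=" + m.group(1));    // alice
                System.out.println("domain=" + m.group(2));  // example
                System.out.println("tld=" + m.group(3));     // com
            }
        }
    }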

Data matching
Data matching components

tMatchGroup Creates groups of similar data records in any source data including large
volumes of data by using one or several match rules.

tRecordMatching Ensures the data quality of any source data against a reference data source.

Data matching scenarios

Grouping output data in separate flows according to the minimal distance computed in each record
Matching customer data through multiple passes
Matching data through multiple passes using Map/Reduce components
Matching entries using the Q-grams and Levenshtein algorithms
Using a custom matching algorithm to match entries
Using survivorship functions to merge two records and create a master record

Data privacy
Data privacy components

tDataDecrypt Decrypts data encrypted with the tDataEncrypt component.

tDataEncrypt Protects data by transforming it into unreadable cipher text.

tDataMasking Hides original data with random characters or figures to protect the actual
data while having a functional substitute for occasions when it is not
advisable to show sensitive real data.

tDataShuffling Shuffles the data in an input table to protect the actual data while keeping a functional data set. The data remains usable for purposes such as testing and training.

tDataUnmasking Unmasks data masked with the tDataMasking component to retrieve the
original data.

tDuplicateRow Creates duplicates with meaningful data for data quality functional testing
purposes.

tPatternMasking Masks data that follows a specific pattern and can transform the original data in a consistent manner, if needed.

tPatternUnmasking Unmasks data masked with the tPatternMasking component to retrieve the
original data.

Data privacy scenarios

Altering data values to restrict the use of actual sensitive data


Encrypting and decrypting back sensitive data
Generating duplicate data from an input flow
Masking Australian phone numbers
Masking Medicare beneficiary identifiers
Shuffling data values to restrict the use of actual sensitive data
Unmasking Australian phone numbers
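
To make the encrypt/decrypt pairing concrete, below is a generic AES-GCM round trip in plain Java. It only illustrates symmetric encryption and decryption in general; tDataEncrypt's actual algorithm choices and password-protected key handling are documented with the component.

    import java.nio.charset.StandardCharsets;
    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;

    public class EncryptSketch {
        public static void main(String[] args) throws Exception {
            // Generate a 128-bit AES key for the demo.
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            SecretKey key = kg.generateKey();
            byte[] iv = new byte[12];        // fixed IV only for this demo; use a random one per message
            GCMParameterSpec spec = new GCMParameterSpec(128, iv);

            Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
            enc.init(Cipher.ENCRYPT_MODE, key, spec);
            byte[] cipherText = enc.doFinal("sensitive value".getBytes(StandardCharsets.UTF_8));

            Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
            dec.init(Cipher.DECRYPT_MODE, key, spec);
            System.out.println(new String(dec.doFinal(cipherText), StandardCharsets.UTF_8));
        }
    }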

Deduplication
Deduplication components

tRuleSurvivorship Creates the single representation of an entity according to business rules and can create a master copy of data for Master Data Management.

tSurviveFields Centralizes data from various and heterogeneous sources to create a master copy of data for MDM.

tUniqRow Ensures data quality of input or output flow in a Job.

Deduplication scenarios

Converting the Standard Job to a Spark Batch Job


Creating a clean data set from the suspect pairs labeled by tMatchPredict and the unique rows computed by tMatchPairing
Deduplicating entries based on dynamic schema
Deduplicating entries using Map/Reduce components
Merging the content of several rows using different columns as rank values
Modifying the rule file manually to code the conditions you want to use to create a survivor
Selecting the best-of-breed data from a group of duplicates to create a survivor
Deduplicating entries

Email validation
Email validation component

tVerifyEmail Verifies if email addresses comply with specific rules and corrects
addresses that do not match the rules by using the content from specific
columns.

Email validation scenario

Verifying email addresses against column content and domain names
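
As a baseline for what such components check, a simple syntactic validation can be done with Apache Commons Validator; tVerifyEmail layers rule-based correction and column-content/domain checks on top of this kind of test. A minimal sketch with invented addresses:

    import org.apache.commons.validator.routines.EmailValidator;

    public class EmailCheckSketch {
        public static void main(String[] args) {
            EmailValidator validator = EmailValidator.getInstance();
            for (String address : new String[] {"alice@example.com", "not-an-email"}) {
                // Prints true for the well-formed address, false otherwise.
                System.out.println(address + " -> " + validator.isValid(address));
            }
        }
    }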

Formatting
Formatting component

tChangeFileEncoding Transforms the character encoding of a given file and generates a new file
with the transformed character encoding.

Formatting scenario

Transforming the character encoding of a file

Fuzzy matching
Fuzzy matching components
tBlockedFuzzyJoin (deprecated) Helps ensure the data quality of any source data against a reference data source.

tFuzzyJoin (deprecated) Joins two tables by doing a fuzzy match on several columns, comparing columns from the
main flow with reference columns from the lookup flow and outputting the main flow data
and/or the rejected data.

tFuzzyMatch Compares a column from the main flow with a reference column from the lookup flow and
outputs the main flow data displaying the distance.

tFuzzyUniqRow Compares columns in the input flow by using a defined matching method and collects the
encountered duplicates.

Fuzzy matching scenarios

Checking the Levenshtein distance of 0 in first names


Checking the Levenshtein distance of 1 or 2 in first names
Checking the Metaphonic distance in first name
Comparing four columns using different matching methods and collecting encountered duplicates
Doing a fuzzy match on two columns and outputting the main and rejected data (deprecated)
Doing a fuzzy match on two columns and outputting the match, possible match and non match values (deprecated)
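
Several of these scenarios rely on the Levenshtein edit distance. For reference, here is the classic dynamic-programming formulation in Java (a generic implementation, not the component's internal one):

    public class LevenshteinSketch {
        // Edit distance: minimum number of insertions, deletions, and
        // substitutions needed to turn string a into string b.
        static int distance(String a, String b) {
            int[][] d = new int[a.length() + 1][b.length() + 1];
            for (int i = 0; i <= a.length(); i++) d[i][0] = i;
            for (int j = 0; j <= b.length(); j++) d[0][j] = j;
            for (int i = 1; i <= a.length(); i++) {
                for (int j = 1; j <= b.length(); j++) {
                    int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                    d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                       d[i - 1][j - 1] + cost);
                }
            }
            return d[a.length()][b.length()];
        }

        public static void main(String[] args) {
            System.out.println(distance("Johnathan", "Jonathan"));  // 1: one deletion
        }
    }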

Google address standardization


Google address standardization components

tGoogleAddressRow Converts human-readable addresses into geographic coordinates and other geographic information.

tGoogleGeocoder Converts human-readable addresses into geographic coordinates.

tGoogleMapLookup Obtains detailed geographic information using geographic coordinates and address information.

Google address standardization scenarios

Obtaining detailed geographic information using address and geographic coordinates


Obtaining detailed geographic information using address information
Obtaining geographic coordinates using address information

Identification
Identification components

tGenKey Generates a functional key from the input columns, by applying different
types of algorithms on each column and grouping the computed results in
one key, then outputs this key with the input columns.

tAddCRCRow Provides a unique ID that helps improve the quality of processed data. CRC stands for Cyclical Redundancy Checking.

Identification scenarios

Comparing columns and grouping in the output flow duplicate records that have the same functional key
Generating functional keys in the output flow
Adding a surrogate key to a file
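
The surrogate ID that tAddCRCRow appends is a CRC checksum over row content. A minimal sketch with java.util.zip.CRC32; the concatenated columns and separator below are invented, since the columns actually included are configurable in the component:

    import java.nio.charset.StandardCharsets;
    import java.util.zip.CRC32;

    public class CrcSketch {
        public static void main(String[] args) {
            // Compute a CRC over the concatenated key columns of a row.
            CRC32 crc = new CRC32();
            crc.update("Alice;Smith;1985-04-12".getBytes(StandardCharsets.UTF_8));
            System.out.println(crc.getValue());   // a long, stable for identical input rows
        }
    }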

Loqate address standardization


Loqate address standardization component

tLoqateAddressRow Parses, verifies, cleanses, standardizes, transliterates, and formats international addresses.

Loqate address standardization scenario

Parsing addresses against Loqate data

Matching with machine learning


Matching with machine learning components

tMatchModel Generates the matching model that is used by the tMatchPredict component to automatically predict the labels for the suspect pairs and groups records which match the label(s) set in the component properties.

tMatchPairing Enables you to compute pairs of suspect duplicates from any source data
including large volumes in the context of machine learning on Spark.

tMatchPredict Labels suspect records automatically and groups suspect records which
match the label(s) set in the component properties.

Matching with machine learning scenarios

Computing suspect pairs and suspect sample from source data


Computing suspect pairs and writing a sample in Talend Data Stewardship
Generating a matching model
Generating a matching model from a Grouping campaign
Labeling suspect pairs with assigned labels

Melissa Data address standardization


Melissa Data address standardization components

tMelissaDataAddress Verifies if an address is properly formatted and corrects any formatting or spelling errors in each row.

tPersonator Ensures the quality of a US and Canadian contact database by checking, verifying, moving and appending contact data.

Melissa Data address standardization scenarios

Editing addresses against a Melissa Data data file


Verifying and enriching a database

Microsoft SQL Server validation


Microsoft SQL Server validation components
tMSSqlInvalidRows Extracts DB rows that do not match a given data quality business rule. You can then implement any required correction.

tMSSqlValidRows Extracts DB rows that match a given data quality business rule.

MySQL validation
MySQL validation components

tMySQLInvalidRows Checks MySQL database rows against specific Data Quality patterns
(regular expression) or Data Quality rules (business rule).

tMySQLValidRows Checks MySQL database rows against Data Quality patterns (regular
expression).

MySQL validation scenarios

Checking customer table against a given DQ rule to select customer records


Reading email addresses from a DB table and retrieving specific data

Name standardization
Name standardization component

tFirstnameMatch Matches first names against a reference index in order to standardize data.

Name standardization scenario

Matching first names with a reference index

Oracle validation
Oracle validation components

tOracleInvalidRows Checks Oracle database rows against specific Data Quality patterns
(regular expression) or Data Quality rules (business rule).

tOracleValidRows Checks Oracle database rows against Data Quality patterns (regular
expression).

Pattern validation
Pattern validation components

tFindRegexlibExpressions Returns a dataset holding information about all of the regular expressions
that match the request sent to the web server.

tLastRegexlibExpressions Returns a dataset holding information about the N most recent regular expressions added to the library that match the query at http://regexlib.com.

tMultiPatternCheck Checks all existing data in multiple columns against a given Java regular expression.

tPatternCheck Gives two output flows: Matching Data and Non-Matching Data. The first
collects all data that match a given pattern, and the second collects all data
that do not match a given pattern. You can then implement any required
corrections.

Pattern validation scenarios

Checking the data in multiple columns against patterns


Connecting to a web service and returning a list of regular expressions

Phone number standardization


Phone number standardization component

tStandardizePhoneNumber Standardizes phone numbers according to given formats.

Phone number standardization scenario

Standardizing French phone numbers
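
Phone-number standardization of this kind is commonly done with Google's libphonenumber library. A minimal sketch of parsing and reformatting a French number with that library, as an illustration of the technique rather than of the component's internals:

    import com.google.i18n.phonenumbers.NumberParseException;
    import com.google.i18n.phonenumbers.PhoneNumberUtil;
    import com.google.i18n.phonenumbers.PhoneNumberUtil.PhoneNumberFormat;
    import com.google.i18n.phonenumbers.Phonenumber.PhoneNumber;

    public class PhoneSketch {
        public static void main(String[] args) throws NumberParseException {
            // Standardize a French number to the international format.
            PhoneNumberUtil util = PhoneNumberUtil.getInstance();
            PhoneNumber number = util.parse("01 47 55 43 25", "FR");
            System.out.println(util.format(number, PhoneNumberFormat.INTERNATIONAL));
            // -> +33 1 47 55 43 25
        }
    }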

PostgreSQL validation
PostgreSQL validation components

tPostgresqlInvalidRows Extracts DB rows that do not match a given data quality pattern. You can then implement any required correction.

tPostgresqlValidRows Extracts DB rows that match a given data quality pattern.

QAS address standardization


QAS address standardization components

tQASAddressIncomplete (deprecated) Gives two output flows: Incomplete and Reject.

tQASAddressRow Corrects any formatting or spelling errors and gives the verification status for each row.

tQASAddressUnknown (deprecated) Gives one output flow: Unknown, which collects all addresses that do not match deliverable results in the QuickAddress data.

tQASAddressVerified (deprecated) Gives three output flows: Verified, Interaction required, and Reject.

tQASBatchAddressRow Corrects any formatting or spelling errors, adds missing data and gives the verification status for each row.

QAS address standardization scenarios

Editing addresses against QAS files and giving the verification status
Editing addresses and giving the verification status

Reporting
Reporting components

tDqReportRun Launches the analyses listed in a report and saves the results in the data quality data mart.

tThresholdViolationAlert Alerts you to any violations of the thresholds set on indicators in different quality analyses created in the Studio.

Reporting scenarios

Launching a profiling report from Talend Cloud Management Console


Launching a profiling report from Talend Administration Center Web application

Sampling
Sampling component

tReservoirSampling Extracts random sample data from a big data set.

Sampling scenario

Extracting sample data from an input data set

Standardization
Standardization components

tStandardizeRow Normalizes the incoming data in a separate XML or JSON data flow to
separate or standardize the rule-compliant data from the non-compliant
data.

tIntervalMatch Returns a value based on a Join relation.

tReplaceList Cleanses all files before further processing.

Standardization scenarios

Extracting exact match by using Index rules


Normalizing data using rules of basic types
Standardizing addresses from unstructured data
Using two parsing levels to extract information from unstructured data
Identifying server locations based on their IP addresses
Replacing state names with their two-letter codes

Synonym index
Synonym index components

tSynonymOutput Creates a Lucene index and feeds it with the entries and related synonyms it receives.

tSynonymSearch Searches a given index for the reference entries matching the data you input.

Synonym index scenarios

Creating a synonym index for city names


Creating a synonym index for people names using tMap
Searching a given index for matched reference entries
Searching for matched reference entries for two input columns

Text standardization
Text standardization components

tJapaneseNumberNormalize Normalizes Japanese numbers (kansūji) to regular Arabic numbers.

tJapaneseTokenize Splits Japanese text into tokens.

tJapaneseTransliterate Converts textual data in Japanese to kana and Latin scripts.

tStem Enables you to standardize data in columns before matching this data.

tTransliterate Converts strings from many languages of the world to a standard set of characters
(Universal Coded Character Set, UCS).

Text standardization scenarios

Converting Japanese numbers to Arabic numbers


Converting words from different languages to a standard set of characters
Extracting the stems of English words from a specific DB column
Generating stems for a list of English words
Tokenizing Japanese text
Transliterating Japanese text
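
Transliteration to Latin script of the kind tTransliterate performs can be illustrated with ICU4J's Transliterator, shown here as a generic technique rather than the component's implementation:

    import com.ibm.icu.text.Transliterator;

    public class TransliterateSketch {
        public static void main(String[] args) {
            // Convert any script to Latin, then strip accents down to ASCII.
            Transliterator t = Transliterator.getInstance("Any-Latin; Latin-ASCII");
            System.out.println(t.transliterate("こんにちは"));   // prints a romanized form of the greeting
            System.out.println(t.transliterate("Αθήνα"));        // prints the Latinized city name
        }
    }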

Uniserv
Uniserv components

tUniservBTGeneric (deprecated) Executes a process created with the Uniserv product DQ Batch Suite.

tUniservRTConvertName (deprecated) Analyzes the name elements in an address.

tUniservRTMailBulk (deprecated) Creates the index pool for duplicate search.

tUniservRTMailOutput (deprecated) Synchronizes the index pool that is used for duplicate search.

tUniservRTMailSearch (deprecated) Searches for duplicate values based on a given input record and adds additional data to each record.

tUniservRTPost (deprecated) Improves address quality, which is extremely important for CRM and e-business as it is directly related to postage and advertising costs.

Uniserv scenarios

Adding contacts to the mailRetrieval index pool


Analyzing a person's name and assigning a salutation
Checking and correcting the postal code, city and street
Checking and correcting the postal code, city and street, as well as rejecting the unfeasible
Creating an index pool
Execution of a Job in the Data Quality Service Hub Studio

Validation
Validation component

tSchemaComplianceCheck Ensures the data quality of any source data against a reference data source.

Validation scenario

Validating data against schema

Data Stewardship
Data Stewardship components

tDataStewardshipTaskDelete Connects to Talend Data Stewardship and deletes the data stored in campaigns in the
form of tasks.

tDataStewardshipTaskInput Connects to Talend Data Stewardship and retrieves the data stored in campaigns in
the form of tasks.

tDataStewardshipTaskOutput Connects to Talend Data Stewardship and loads data into campaigns in the form of
tasks. The tasks must have the same schema defined in the campaign.

Data Stewardship scenarios


Assigning tasks dynamically in Talend Data Stewardship
Deleting tasks from Talend Data Stewardship
Populating campaigns dynamically using campaign IDs
Populating tasks into the same campaign on different Talend Data Stewardship instances
Retrieving tasks from Talend Data Stewardship
Writing tasks in a Merging campaign
Writing tasks in Talend Data Stewardship campaigns

Database utility

Database utility components

tCreateTable Creates a table for a specific type of database.


tDBSQLRow Acts on the actual DB structure or on the data (although without handling
data) depending on the nature of the query and the database. The
SQLBuilder tool helps you write your SQL statements easily.

Database utility scenarios


Creating a new table in a MySQL database
Resetting a DB auto-increment
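
By way of illustration, here is a minimal JDBC sketch of the kind of statements these components issue; the MySQL URL, credentials and table definition are placeholder assumptions, and the JDBC driver is assumed to be on the classpath:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CreateTableSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder URL and credentials; adjust for your database type.
            String url = "jdbc:mysql://localhost:3306/demo";
            try (Connection conn = DriverManager.getConnection(url, "user", "secret");
                 Statement stmt = conn.createStatement()) {
                // The kind of DDL tCreateTable generates for the chosen database type.
                stmt.execute("CREATE TABLE customers ("
                        + "id INT AUTO_INCREMENT PRIMARY KEY, "
                        + "name VARCHAR(100) NOT NULL)");
                // tDBSQLRow similarly executes arbitrary SQL acting on structure or data,
                // such as resetting an auto-increment counter.
                stmt.execute("ALTER TABLE customers AUTO_INCREMENT = 1");
            }
        }
    }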

Databricks

Databricks components

tDBFSConnection Connects to a given DBFS (Databricks Filesystem) system so that the other
DBFS components can reuse the connection it creates to communicate
with this DBFS.

tDBFSGet Copies files from a given DBFS (Databricks Filesystem) system, pastes them
in a user-defined directory and, if need be, renames them.

tDBFSPut Connects to a given DBFS (Databricks Filesystem) system, copies files from
a user-defined directory, pastes them in this system and, if need be,
renames these files.

Databricks scenarios
Writing and reading data from Azure Data Lake Storage using Spark (Azure Databricks)
Writing and reading data from S3 (Databricks on AWS)

DB Generic

DB Generic components

tDBCDC Extracts only the changes made to the source operational data and makes
them available to the target system(s) using database CDC views.

tDBCDCOutput Synchronizes data changes in the database of the selected database type in
CDC mode.

tDBInvalidRows Checks database rows against specific Data Quality patterns (regular
expression) or Data Quality rules (business rule).

tDBValidRows Checks database rows against Data Quality patterns (regular expression).

tDBBulkExec Offers gains in performance while executing the Insert operations on a
database.

tDBClose Closes the transaction committed in a connected database.

tDBColumnList Iterates on all columns of a given database table and lists column names.

tDBCommit Validates the data processed through the Job into the connected database.
tDBConnection Opens a connection to a database to be reused in the subsequent subJob
or subJobs.

tDBInput Extracts data from a database.

tDBLastInsertId Obtains the primary key value of the record that was last inserted in a
database table by a user.

tDBOutput Writes, updates, makes changes or suppresses entries in a database.

tDBOutputBulk Writes a file with columns based on the defined delimiter and the
standards of the selected database type.

tDBOutputBulkExec Executes the Insert action in a database.

tDBRollback Cancels the transaction commit in a connected database to avoid
committing part of a transaction involuntarily.

tDBRow Executes the stated SQL query on a database.

tDBSCD Reflects and tracks changes in a dedicated database SCD table.

tDBSCDELT Reflects and tracks changes in a dedicated SCD table through SQL queries.

tDBSP Calls a database stored procedure.

tDBTableList Lists the names of specified database tables using a SELECT statement
based on a WHERE clause.

tParseRecordSet Parses a recordset rather than individual records from a table.
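
The connection, commit and rollback components above map onto the standard JDBC transaction pattern. A minimal sketch with placeholder connection details and a hypothetical customers table; reading the generated keys corresponds to what tDBLastInsertId does:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class TransactionSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/demo", "user", "secret")) {
                conn.setAutoCommit(false);             // tDBConnection with auto-commit off
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO customers (name) VALUES (?)",
                        Statement.RETURN_GENERATED_KEYS)) {
                    ps.setString(1, "Alice");
                    ps.executeUpdate();
                    try (ResultSet keys = ps.getGeneratedKeys()) {  // tDBLastInsertId
                        if (keys.next()) {
                            System.out.println("last insert id = " + keys.getLong(1));
                        }
                    }
                    conn.commit();                      // tDBCommit
                } catch (Exception e) {
                    conn.rollback();                    // tDBRollback
                    throw e;
                }
            }
        }
    }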

DB2
DB2 components

tDB2BulkExec Executes the Insert action on the provided data and gains in performance
during Insert operations to a DB2 database.

tDB2Close Closes a transaction committed in the connected DB.

tDB2Commit Commits a global transaction in one go, instead of committing every row
or batch separately, and thus improves performance.

tDB2Connection Opens a connection to the specified database that can then be reused in
the subsequent subJob or subJobs.

tDB2Input Executes a DB query with a strictly defined order which must correspond to
the schema definition. Then tDB2Input passes on the field list to the next
component via a Row > Main link.

tDB2Output Executes the action defined on the table and/or on the data contained in
the table, based on the flow incoming from the preceding component in
the Job.
tDB2Rollback Avoids committing part of a transaction involuntarily.

tDB2Row Acts on the actual DB structure or on the data (although without handling
data) depending on the nature of the query and the database. The
SQLBuilder tool helps you write your SQL statements easily.

tDB2SP Offers a convenient way to call the database stored procedures.

DBFS

DBFS components

tDBFSConnection Connects to a given DBFS (Databricks Filesystem) system so that the other
DBFS components can reuse the connection it creates to communicate
with this DBFS.

tDBFSGet Copies files from a given DBFS (Databricks Filesystem) system, pastes them
in a user-defined directory and, if need be, renames them.

tDBFSPut Connects to a given DBFS (Databricks Filesystem) system, copies files from
a user-defined directory, pastes them in this system and, if need be,
renames these files.

Defining Context Groups

Defining Context Groups scenarios


Reading data from databases through context-based dynamic connections
Using context parameters when reading a table from a database

Delimited

Delimited components

tFileStreamInputDelimited Reads data continuously, row by row, to split it into fields, then sends fields
defined in its schema to the next Job component, via a Row > Main link.

tFileInputDelimited Reads a delimited file row by row, splitting each row into fields, and then
sends the fields as defined in the schema to the next component.

tFileOutputDelimited Outputs the input data to a delimited file according to the defined schema.

tPivotToColumnsDelimited Fine-tunes the selection of data to output.

Delimited scenarios
Reading data from a delimited file and displaying the output
Reading data from a remote file in streaming mode
Using a pivot column to aggregate data
Utilizing Output Stream to save filtered data to a local file
Writing data in a delimited file
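
A minimal plain-Java sketch of the row-by-row read-and-split behavior described for tFileInputDelimited; the file name, the ";" separator and the two-column layout are assumptions:

    import java.io.BufferedReader;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class DelimitedReadSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical input file with ";"-separated fields.
            try (BufferedReader reader = Files.newBufferedReader(Paths.get("customers.csv"))) {
                String line;
                while ((line = reader.readLine()) != null) {   // row by row
                    String[] fields = line.split(";", -1);      // split into schema fields
                    if (fields.length >= 2) {
                        System.out.println("id=" + fields[0] + ", name=" + fields[1]);
                    }
                }
            }
        }
    }
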
Delta Lake

Delta Lake components

tDeltaLakeClose Closes an active DeltaLake connection to release the occupied resources.

tDeltaLakeConnection Opens a connection to the specified database that can then be reused in
the subsequent subJob or subJobs.

tDeltaLakeInput Extracts the latest version or a given snapshot of records from the Delta
Lake layer of your Data Lake system and sends the data to the next
component for further processing.

tDeltaLakeOutput Writes records in the Delta Lake layer of your Data Lake system in the
Parquet format.

tDeltaLakeRow Acts on the actual DB structure or on the data (although without handling
data), using the SQLBuilder tool to write your SQL statements easily.

Delta Lake scenario


Computing the day-over-day evolution of US flights using a related Delta Lake dataset

DotNET

DotNET components

tDotNETInstantiate Invokes the constructor of a .NET object that is intended for later
reuse.

tDotNETRow Facilitates data transformation by utilizing custom or built-in .NET classes.

DotNET scenarios
Integrating .Net into Talend Studio: Introduction
Utilizing .NET in Talend

Dropbox

Dropbox components

tDropboxConnection Creates a Dropbox connection to a given account that the other Dropbox
components can reuse.

tDropboxDelete Removes a given folder or file from Dropbox.

tDropboxGet Downloads a selected file from a Dropbox account to a specified local
directory.

tDropboxList Lists the files stored in a specified directory on Dropbox.

tDropboxPut Uploads data to Dropbox from either a local file or a given data flow.

Dropbox scenario
Uploading files to Dropbox

Dynamic Schema

Dynamic Schema component

tSetDynamicSchema Sets a dynamic schema that can be reused by components in the
subsequent subJob or subJobs to retrieve data from unknown columns.

Dynamic Schema scenarios


Handling a positional file based on a dynamic schema
Writing dynamic columns from a database to an output file

ElasticSearch

ElasticSearch components

tElasticSearchConfiguration Enables the reuse of the connection configuration to ElasticSearch in the same
Job.

tElasticSearchInput Reads documents from a given Elasticsearch system based on a user-defined
query.

tElasticSearchLookupInput Executes an Elasticsearch query with a strictly defined order which must
correspond to the schema definition.

tElasticSearchOutput Writes datasets into a given Elasticsearch system.

ELT Greenplum

ELT Greenplum components

tELTGreenplumInput Adds as many Input tables as required for the most complicated Insert
statement.

tELTGreenplumMap Uses the tables provided as input to feed the parameter in the built
statement. The statement can include inner or outer joins to be
implemented between tables or between one table and its aliases.

tELTGreenplumOutput Executes the SQL Insert, Update and Delete statements on the Greenplum
database.

ELT Greenplum scenarios


Aggregating Snowflake data using context variables as table and connection names
Aggregating table columns and filtering
Mapping data using a simple implicit join
Mapping data using a subquery
Mapping data using an Alias table
ELT Hive

ELT Hive components

tELTHiveInput Replicates the schema of the input Hive table, which the tELTHiveMap
component that follows will use.

tELTHiveMap Builds the Hive QL statement graphically in order to transform data.

tELTHiveOutput Works alongside tELTHiveMap to write data into the Hive table.

ELT Hive scenarios


Aggregating Snowflake data using context variables as table and connection names
Joining table columns and writing them into Hive
Mapping data using a subquery

ELT JDBC

ELT JDBC components

tELTInput Adds as many Input tables as required for the SQL statement to be
executed.

tELTMap Uses the tables provided as input to feed the parameter in the built SQL
statement. The statement can include inner or outer joins to be
implemented between tables or between one table and its aliases.

tELTOutput Carries out the action on the table specified and inserts the data according
to the output schema defined in the ELT Mapper.

ELT JDBC scenarios


Aggregating Snowflake data using context variables as table and connection names
Aggregating table columns and filtering
Mapping data using a simple implicit join
Mapping data using a subquery
Mapping data using an Alias table

ELT MSSql
ELT MSSql components

tELTMSSqlInput Adds as many Input tables as required for the most complicated Insert
statement.

tELTMSSqlMap Uses the tables provided as input to feed the parameter in the built
statement. The statement can include inner or outer joins to be
implemented between tables or between one table and its aliases.

tELTMSSqlOutput Executes the SQL Insert, Update and Delete statements on the MSSql
database.

ELT MSSql scenarios
Aggregating Snowflake data using context variables as table and connection names
Aggregating table columns and filtering
Mapping data using a simple implicit join
Mapping data using a subquery
Mapping data using an Alias table

ELT MySQL

ELT MySQL components

tELTMysqlInput Adds as many Input tables as required for the most complicated Insert
statement.

tELTMysqlMap Uses the tables provided as input to feed the parameter in the built
statement. The statement can include inner or outer joins to be
implemented between tables or between one table and its aliases.

tELTMysqlOutput Executes the SQL Insert, Update and Delete statements on the Mysql
database.

ELT MySQL scenarios


Aggregating Snowflake data using context variables as table and connection names
Aggregating table columns and filtering
Mapping data using a simple implicit join
Mapping data using a subquery
Mapping data using an Alias table

ELT Netezza
ELT Netezza components

tELTNetezzaInput Allows you to add as many Input tables as required for the most
complicated Insert statement.

tELTNetezzaMap Uses the tables provided as input to feed the parameter in the built
statement. The statement can include inner or outer joins to be
implemented between tables or between one table and its aliases.

tELTNetezzaOutput Performs the action (insert, update or delete) on data in the specified
Netezza table through the SQL statement generated by the
tELTNetezzaMap component.

ELT Netezza scenarios


Aggregating Snowflake data using context variables as table and connection names
Aggregating table columns and filtering
Mapping data using a simple implicit join
Mapping data using a subquery
Mapping data using an Alias table
ELT Oracle

ELT Oracle components

tELTOracleInput Provides the Oracle table schema that will be used by the tELTOracleMap
component to generate the SQL SELECT statement.

tELTOracleMap Builds the SQL SELECT statement using the table schema(s) provided by
one or more tELTOracleInput components.

tELTOracleOutput Performs the action (insert, update, delete, or merge) on data in the
specified Oracle table through the SQL statement generated by the
tELTOracleMap component.

ELT Oracle scenarios


Aggregating Snowflake data using context variables as table and connection names
Aggregating table columns and filtering
Mapping data using a simple implicit join
Mapping data using a subquery
Mapping data using an Alias table
Updating Oracle database entries
Managing data using the Oracle MERGE function

ELT PostgreSQL

ELT PostgreSQL components

tELTPostgresqlInput Provides the Postgresql table schema that will be used by the
tELTPostgresqlMap component to generate the SQL SELECT statement.

tELTPostgresqlMap Builds the SQL SELECT statement using the table schema(s) provided by
one or more tELTPostgresqlInput components.

tELTPostgresqlOutput Performs the action (insert, update or delete) on data in the specified
Postgresql table through the SQL statement generated by the
tELTPostgresqlMap component.

ELT PostgreSQL scenarios


Aggregating Snowflake data using context variables as table and connection names
Aggregating table columns and filtering
Mapping data using a simple implicit join
Mapping data using a subquery
Mapping data using an Alias table

ELT Sybase

ELT Sybase components

tELTSybaseInput Provides the Sybase table schema that will be used by the tELTSybaseMap
component to generate the SQL SELECT statement.
tELTSybaseMap Builds the SQL SELECT statement using the table schema(s) provided by
one or more tELTSybaseInput components.

tELTSybaseOutput Performs the action (insert, update or delete) on data in the specified
Sybase table through the SQL statement generated by the tELTSybaseMap
component.

ELT Sybase scenarios


Aggregating Snowflake data using context variables as table and connection names
Aggregating table columns and filtering
Mapping data using a simple implicit join
Mapping data using an Alias table

ELT Teradata

ELT Teradata components

tELTTeradataInput Provides the Teradata table schema that will be used by the
tELTTeradataMap component to generate the SQL SELECT statement.

tELTTeradataMap Builds the SQL SELECT statement using the table schema(s) provided by
one or more tELTTeradataInput components.

tELTTeradataOutput Performs the action (insert, update or delete) on data in the specified
Teradata table through the SQL statement generated by the
tELTTeradataMap component.

ELT Teradata scenarios


Aggregating Snowflake data using context variables as table and connection names
Aggregating table columns and filtering
Mapping data using a simple implicit join
Mapping data using a subquery
Mapping data using an Alias table

ELT Vertica
ELT Vertica components

tELTVerticaInput Provides the Vertica table schema that will be used by the tELTVerticaMap
component to generate the SQL SELECT statement.

tELTVerticaMap Builds the SQL SELECT statement using the table schema(s) provided by
one or more tELTVerticaInput components.

tELTVerticaOutput Performs the action (insert, update or delete) on data in the specified
Vertica table through the SQL statement generated by the tELTVerticaMap
component.

ELT Vertica scenarios


Aggregating Snowflake data using context variables as table and connection names
Aggregating table columns and filtering
Mapping data using a simple implicit join
Mapping data using a subquery
Mapping data using an Alias table

ESB REST

ESB REST components

tRESTClient Interacts with RESTful Web service providers by sending HTTP and HTTPS
requests using CXF (JAX-RS), and gets the corresponding responses.

tRESTRequest Receives GET/POST/PUT/PATCH/DELETE requests from the clients on the
server end.

tRESTResponse Returns a specific HTTP status code to the client end as a response to the
HTTP and/or HTTPS requests.

ESB REST scenarios


Building a JSON document with tXMLMap to call a REST service
Getting user information by interacting with a RESTful service
Using a REST service to accept HTTP POST requests
Using a REST service to accept HTTP POST requests and send responses
Using a REST service to accept HTTP POST requests in an HTML form
Updating user information by interacting with a RESTful service
Using URI Query parameters to explore the data of a database
Using a REST service to accept HTTP GET requests and send responses
Using context variables in REST endpoint URLs in Data Services
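
Since tRESTClient is described as sending requests through CXF (JAX-RS), a minimal JAX-RS 2.x client sketch gives the flavor; the endpoint URL is hypothetical and a JAX-RS implementation such as CXF is assumed to be on the classpath:

    import javax.ws.rs.client.Client;
    import javax.ws.rs.client.ClientBuilder;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.Response;

    public class RestClientSketch {
        public static void main(String[] args) {
            Client client = ClientBuilder.newClient();
            try {
                // What a GET call through tRESTClient boils down to.
                Response response = client.target("http://localhost:8088/services/users")
                        .path("42")
                        .request(MediaType.APPLICATION_JSON)
                        .get();
                System.out.println("status = " + response.getStatus());
                System.out.println("body = " + response.readEntity(String.class));
            } finally {
                client.close();
            }
        }
    }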

ESB SOAP

ESB SOAP components

tESBConsumer Calls the defined method from the invoked Web service and returns the
class as defined, based on the given parameters.

tESBProviderFault Serves a Talend Job cycle result as a Fault message of the Web service in
case of a request response communication style.

tESBProviderRequest Wraps a Talend Job as a web service.

tESBProviderResponse Serves a Talend Job cycle result as a response message.

ESB SOAP scenarios


Requesting airport names based on country codes
Returning Hello world response
Sending a message without expecting a response
Using tESBConsumer to retrieve the valid email
Using tESBConsumer with custom SOAP Headers
EXASolution

EXASolution components

tExasolBulkExec Quickly imports data into an EXASolution database table using the IMPORT
command provided by the EXASolution database.

tExasolClose Closes an active connection to an EXASolution database instance to release
the occupied resources.

tExasolCommit Validates the data processed through the Job into the connected
EXASolution database.

tExasolConnection Opens a connection to an EXASolution database instance that can then be
reused by other EXASolution components.

tExasolInput Retrieves data from an EXASolution database based on a query with a
strictly defined order which corresponds to the schema definition, and
passes the data to the next component.

tExasolOutput Writes, updates, modifies or deletes data in an EXASolution database by
executing the action defined on the table and/or on the data in the table,
based on the flow incoming from the preceding component.

tExasolRollback Cancels the transaction commit in the connected EXASolution database.

tExasolRow Executes SQL queries on an EXASolution database.

EXASolution scenario
Importing data into an EXASolution database table from a local CSV file

Excel

Excel components

tFileInputExcel Reads an Excel file row by row, splitting each row into fields using regular
expressions, and then sends the fields as defined in the schema to the next
component.

tFileOutputExcel Writes an MS Excel file with separated data values according to a defined
schema.

Excel scenario
Extracting data from specific Excel cells
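
A minimal sketch of reading cell values with Apache POI, a common Java library for this task (an assumption about tooling, not necessarily what the component uses internally); the workbook name is a placeholder:

    import java.io.File;

    import org.apache.poi.ss.usermodel.Cell;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.ss.usermodel.Workbook;
    import org.apache.poi.ss.usermodel.WorkbookFactory;

    public class ExcelReadSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical workbook; reads the first sheet row by row.
            try (Workbook workbook = WorkbookFactory.create(new File("report.xlsx"))) {
                Sheet sheet = workbook.getSheetAt(0);
                for (Row row : sheet) {
                    Cell cell = row.getCell(0);   // first column of each row
                    if (cell != null) {
                        System.out.println(cell.toString());
                    }
                }
            }
        }
    }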

EXist
EXist components

tEXistConnection (deprecated) Opens a connection to an eXist database in order that a transaction may be carried out.
tEXistDelete (deprecated) Deletes specified resources from a remote eXist database.

tEXistGet (deprecated) Retrieves selected resources from a remote eXist database to a defined local directory.

tEXistList (deprecated) Lists the resources stored on a remote eXist database.

tEXistPut (deprecated) Uploads specified files from a defined local directory to a remote eXist database.

tEXistXQuery (deprecated) Queries XML files located on remote databases using local files containing XPath queries
and outputs the results to an XML file stored locally.

tEXistXUpdate (deprecated) Processes XML file records and updates the existing records on the database server.

EXist scenario
Retrieving resources from a remote eXist DB server

Firebird

Firebird components

tFirebirdClose Closes a transaction with a Firebird database.

tFirebirdCommit Commits a global transaction instead of doing so on every row or every
batch, thus providing a gain in performance.

tFirebirdConnection Opens a connection to the specified database that can then be reused in
the subsequent subJob or subJobs.

tFirebirdInput Executes a database query on a Firebird database with a strictly defined
order which must correspond to the schema definition then passes on the
field list to the next component via a Main row link.

tFirebirdOutput Executes the action defined on the table in a Firebird database and/or on
the data contained in the table, based on the flow incoming from the
preceding component in the Job.

tFirebirdRollback Cancels the transaction committed in the connected Firebird database.

tFirebirdRow Executes the stated SQL query on the specified Firebird database.

Flume

Flume components

tFlumeInput Acts as an interface to integrate Flume and the Spark Streaming Job
developed with the Studio to continuously read data from a given Flume
agent.

tFlumeOutput Acts as an interface to integrate Flume and the Spark Streaming Job
developed with the Studio to continuously send data to a given Flume
agent.
FTP

FTP components

tFTPClose Closes an active FTP connection to release the occupied resources.

tFTPConnection Opens an FTP connection to transfer files in a single transaction.

tFTPDelete Deletes files or folders in a specified directory on an FTP server.

tFTPFileExist Checks if a file or a directory exists on an FTP server.

tFTPFileList Lists all files and folders directly under a specified directory based on a
filemask pattern.

tFTPFileProperties Retrieves the properties of a specified file on an FTP server.

tFTPGet Downloads files to a local directory from an FTP directory.

tFTPPut Uploads files from a local directory to an FTP directory.

tFTPRename Renames files in an FTP directory.

tFTPTruncate Truncates files in an FTP directory.

FTP scenarios
Listing and getting files/folders on an FTP directory
Putting files onto an FTP server
Renaming a file located on an FTP server
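
A minimal sketch of the connect/get/close sequence using Apache Commons Net, one common Java FTP client (an assumption, not necessarily the components' internal library); server details and paths are placeholders:

    import java.io.FileOutputStream;
    import java.io.OutputStream;

    import org.apache.commons.net.ftp.FTP;
    import org.apache.commons.net.ftp.FTPClient;

    public class FtpGetSketch {
        public static void main(String[] args) throws Exception {
            FTPClient ftp = new FTPClient();
            ftp.connect("ftp.example.com", 21);          // tFTPConnection
            try {
                ftp.login("user", "secret");
                ftp.setFileType(FTP.BINARY_FILE_TYPE);
                ftp.enterLocalPassiveMode();
                try (OutputStream out = new FileOutputStream("local-copy.txt")) {
                    ftp.retrieveFile("/pub/remote.txt", out);   // tFTPGet
                }
                ftp.logout();
            } finally {
                ftp.disconnect();                         // tFTPClose
            }
        }
    }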

FullRow

FullRow components

tFileStreamInputFullRow Reads data in a newly-created file row by row and sends the entire rows
within one single field to the next Job component, via a Row > Main link.

tFileInputFullRow Reads a file row by row and sends complete rows of data as defined in the
schema to the next component via a Row link.

FullRow scenario
Reading full rows in a delimited file

Global variable
Global variable components

tGlobalVarLoad Sets variables using the incoming data so that the data can be dynamically
reused by other subJobs.
tSetGlobalVar Facilitates the process of defining global variables.

Global variable scenarios


Selecting the salary records above the average using a Map/Reduce Job
Printing out the content of a global variable
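
In generated Job code, Talend exposes global variables through a shared map. A minimal stand-alone sketch of the put/get pattern; the globalMap field here is a stand-in for the map Talend injects, and the key name is arbitrary:

    import java.util.HashMap;
    import java.util.Map;

    public class GlobalVarSketch {
        // Stand-in for the globalMap that Talend injects into generated Job code.
        private static final Map<String, Object> globalMap = new HashMap<>();

        public static void main(String[] args) {
            // What tSetGlobalVar does: store a value under a key...
            globalMap.put("nb_line", 42);

            // ...and what a later subJob does to read it back.
            Integer nbLine = (Integer) globalMap.get("nb_line");
            System.out.println("rows processed: " + nbLine);
        }
    }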

Google BigQuery
Google BigQuery components

tBigQueryConfiguration Provides the connection configuration to Google BigQuery and Google
Cloud Storage for a Spark Job.

tBigQueryBulkExec Transfers given data to Google BigQuery.

tBigQueryInput Performs the queries supported by Google BigQuery.

tBigQueryOutput Transfers the data provided by its preceding component to Google
BigQuery.

tBigQueryOutputBulk Creates a .txt or .csv file for large data sets so that you can process the
data according to your needs before transferring it to Google BigQuery.

tBigQuerySQLRow Connects to Google BigQuery and performs queries to select data from
tables row by row or create or delete tables in Google BigQuery.

Google BigQuery scenarios


Performing a query in Google BigQuery
Writing data in Google BigQuery
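
A minimal sketch of a query against Google BigQuery using the google-cloud-bigquery Java client (an assumption about tooling, not the components' internals); the dataset and table names are hypothetical and application-default credentials are assumed:

    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.FieldValueList;
    import com.google.cloud.bigquery.QueryJobConfiguration;

    public class BigQuerySketch {
        public static void main(String[] args) throws Exception {
            BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
            // The kind of query tBigQueryInput runs.
            QueryJobConfiguration query = QueryJobConfiguration.newBuilder(
                    "SELECT name, total FROM `my_dataset.orders` LIMIT 10").build();
            for (FieldValueList row : bigquery.query(query).iterateAll()) {
                System.out.println(row.get("name").getStringValue()
                        + " -> " + row.get("total").getStringValue());
            }
        }
    }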

Google Dataproc

Google Dataproc component

tGoogleDataprocManage Creates or deletes a Dataproc cluster in the Global region on Google Cloud
Platform.

Google Drive

Google Drive components

tGoogleDriveConnection Opens a Google Drive connection that can be reused by other Google Drive
components.

tGoogleDriveCopy Creates a copy of a file/folder in Google Drive.

tGoogleDriveCreate Creates a new folder in Google Drive.

tGoogleDriveDelete Deletes a file/folder in Google Drive.


tGoogleDriveGet Gets a file's content and downloads the file to a local directory.

tGoogleDriveList Lists all files, or folders, or both files and folders in a specified Google Drive
folder, in the domain, including both Shared Drive and My Drive, and all
shared drives.

tGoogleDrivePut Uploads data from a data flow or a local file to Google Drive.

Google Drive scenario


Managing files with Google Drive

Google PubSub

Google PubSub components

tPubSubInput Connects to the Google Cloud PubSub service that transmits messages to
the components that run transformations over these messages.

tPubSubInputAvro Connects to Google Cloud Pub/Sub to receive messages in the Avro format
for the components that run transformations over these messages.

tPubSubOutput Receives messages serialized into byte arrays by its preceding component
and issues these messages into a given PubSub service.

GPG
GPG component

tGPGDecrypt Calls the gpg -d command to decrypt a GnuPG-encrypted file and saves
the decrypted file in the specified directory.

GPG scenario
Decrypting a GnuPG-encrypted file and displaying its content
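
A minimal sketch of the gpg -d call named in the description, driven from Java via ProcessBuilder; the file paths are placeholders, and gpg is assumed to be installed and able to obtain the passphrase (for example from the agent):

    import java.io.IOException;

    public class GpgDecryptSketch {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Hypothetical paths; mirrors the gpg -d call the component description mentions.
            ProcessBuilder pb = new ProcessBuilder(
                    "gpg", "-o", "/tmp/decrypted.txt", "-d", "/tmp/secret.txt.gpg");
            pb.inheritIO();   // show gpg's prompts and messages on the console
            int exitCode = pb.start().waitFor();
            System.out.println("gpg exited with " + exitCode);
        }
    }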

Greenplum

Greenplum components

tGreenplumBulkExec Improves performance when loading data in a Greenplum database.

tGreenplumClose Closes a connection to the Greenplum database.

tGreenplumCommit Commits a global transaction in one go, instead of repeating the operation for every
row or every batch, and thus improves performance.

tGreenplumConnection Opens a connection to the specified database that can then be reused in the
subsequent subJob or subJobs.
tGreenplumGPLoad Bulk loads data into a Greenplum table either from an existing data file, an input
flow, or directly from a data flow in streaming mode through a named-pipe.

tGreenplumInput Reads a database and extracts fields based on a query.

tGreenplumOutput Executes the action defined on the table and/or on the data of a table, according
to the input flow from the previous component.

tGreenplumOutputBulk Prepares the file to be used as parameter in the INSERT query to feed the
Greenplum database.

tGreenplumOutputBulkExec Provides performance gains during Insert operations to a Greenplum database.

tGreenplumRollback Avoids committing part of a transaction involuntarily.

tGreenplumRow Acts on the actual DB structure or on the data (although without handling data),
depending on the nature of the query and the database.

Groovy
Groovy components

tGroovy Broadens the functionality of the Job, using the Groovy language,
which is a simplified Java syntax.

tGroovyFile Broadens the functionality of Jobs using the Groovy language which is a
simplified Java syntax.

Groovy scenario
Calling a file which contains Groovy code

GS

GS components

tGoogleCloudConfiguration Provides the connection configuration to Google Cloud Platform for a Spark Job.

tGSConfiguration Provides the connection configuration to Google Cloud Storage for a Spark Job.

tGSBucketCreate Creates a new bucket which you can use to organize data and control access to
data in Google Cloud Storage.

tGSBucketDelete Deletes an empty bucket in Google Cloud Storage so as to release occupied
resources.

tGSBucketExist Checks the existence of a bucket in Google Cloud Storage so as to perform
further operations.

tGSBucketList Retrieves a list of buckets from all projects or one specific project in Google
Cloud Storage.
tGSClose Closes an active connection to Google Cloud Storage in order to release the
occupied resources.

tGSConnection Provides the authentication information for making requests to the Google
Cloud Storage system and enables the reuse of the connection it creates to
Google Cloud Storage.

tGSCopy Copies or moves objects within a bucket or between buckets in Google Cloud
Storage.

tGSDelete Deletes the objects which match the specified criteria in Google Cloud Storage so
as to release the occupied resources.

tGSGet Retrieves objects which match the specified criteria from Google Cloud Storage
and outputs them to a local directory.

tGSList Retrieves a list of objects from Google Cloud Storage one by one.

tGSPut Uploads files from a local directory to Google Cloud Storage so that you can
manage them with Google Cloud Storage.

GS scenario
Managing files with Google Cloud Storage

HBase
HBase components

tHBaseConfiguration Enables the reuse of the connection configuration to HBase in the same
Job.

tHBaseLookupInput Provides lookup data to the main flow of a streaming Job.

tHBaseClose Closes an HBase connection you have established in your Job.

tHBaseConnection Establishes an HBase connection to be reused by other HBase components
in your Job.

tHBaseInput Reads data from a given HBase database and extracts columns of selection.

tHBaseOutput Writes columns of data into a given HBase database.

HBase scenario
Exchanging customer data with HBase
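
A minimal sketch of a single write using the HBase Java client API, roughly what tHBaseConnection plus tHBaseOutput boil down to; the ZooKeeper quorum, table, row key and column are assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBasePutSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "localhost");   // placeholder quorum
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("customers"))) {
                Put put = new Put(Bytes.toBytes("row-001"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                        Bytes.toBytes("Alice"));
                table.put(put);   // writes one column of data into the HBase table
            }
        }
    }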

HCatalog

HCatalog components
tHCatalogInput Reads data from an HCatalog managed Hive database and sends data to the
component that follows.

tHCatalogLoad Reads data directly from HDFS and writes this data into an established
HCatalog managed table.

tHCatalogOperation Prepares the HCatalog managed database/table/partition to be processed.

tHCatalogOutput Receives data from its incoming flow and writes this data into an HCatalog
managed table.

HCatalog scenario
Managing HCatalog tables on Hortonworks Data Platform

HDFS
HDFS components

tHDFSConfiguration Enables the reuse of the connection configuration to HDFS in the same
Job.

tHDFSCompare Compares two files in HDFS and based on the read-only schema, generates
a row flow that presents the comparison information.

tHDFSConnection Connects to a given HDFS so that the other Hadoop components can reuse
the connection it creates to communicate with this HDFS.

tHDFSCopy Copies a source file or folder into a target directory in HDFS and removes
this source if required.

tHDFSDelete Deletes a file located on a given Hadoop distributed file system (HDFS).

tHDFSExist Checks whether a file exists in a specific directory in HDFS.

tHDFSGet Copies files from the Hadoop distributed file system (HDFS), pastes them in a
user-defined directory and, if need be, renames them.

tHDFSInput Extracts the data in an HDFS file for other components to process it.

tHDFSList Retrieves a list of files or folders based on a filemask pattern and
iterates on each unity.

tHDFSOutput Writes data flows it receives into a given Hadoop distributed file system
(HDFS).

tHDFSOutputRaw Transfers data of different formats such as hierarchical data in the form of a
single column into a given HDFS file system.

tHDFSProperties Creates a single row flow that displays the properties of a file processed in
HDFS.
tHDFSPut Connects to Hadoop distributed file system to load large-scale files into it
with optimized performance.

tHDFSRename Renames the selected files or specified directory on HDFS.

tHDFSRowCount Reads a file in HDFS row by row in order to determine the number of rows
this file contains.

HDFS scenarios
Checking the existence of a file in HDFS
Computing data with Hadoop distributed file system
Using HDFS components to work with Azure Data Lake Storage (ADLS)
Iterating on an HDFS directory
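
A minimal sketch of the put/exists/get operations using the Hadoop FileSystem API; the NameNode URI and paths are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsPutGetSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:8020");   // placeholder NameNode URI
            try (FileSystem fs = FileSystem.get(conf)) {
                Path remote = new Path("/user/talend/in.csv");
                fs.copyFromLocalFile(new Path("/tmp/in.csv"), remote);  // tHDFSPut
                System.out.println("exists: " + fs.exists(remote));      // tHDFSExist
                fs.copyToLocalFile(remote, new Path("/tmp/copy.csv"));   // tHDFSGet
            }
        }
    }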

Hive

Hive components

tHiveClose Closes connection to a Hive database.

tHiveConfiguration Enables the reuse of the connection configuration to Hive in the same Job.

tHiveConnection Establishes a Hive connection to be reused by other Hive components in your Job.

tHiveCreateTable Creates Hive tables that fit a wide range of Hive data formats.

tHiveInput Extracts data from Hive and sends the data to the component that follows.

tHiveLoad Writes data of different formats into a given Hive table or exports data from a Hive
table to a directory.

tHiveOutput Connects to a given Hive database and writes the data it receives into a given Hive table
or a directory in HDFS.

tHiveRow Acts on the actual DB structure or on the data without handling data itself, depending
on the nature of the query and the database.

tHiveWarehouseConfiguration Enables the reuse of the Hive Warehouse Connector connection configuration to Hive in
the same Job.

tHiveWarehouseInput Extracts data from Hive and sends the data to the component that follows using Hive
Warehouse Connector.

tHiveWarehouseOutput Connects to a given Hive database and writes the received data into a given Hive table
or a directory in HDFS using Hive Warehouse Connector.

Hive scenarios
Creating a JDBC Connection to Azure HDInsight Hive
Creating a partitioned Hive table

HSQLDB
HSQLDB components

tHSQLDbInput Executes a DB query with a strictly defined order which must correspond to
the schema definition and then it passes on the field list to the next
component via a Main row link.

tHSQLDbOutput Executes the action defined on the table and/or on the data contained in
the table, based on the flow incoming from the preceding component in
the Job.

tHSQLDbRow Acts on the actual DB structure or on the data (although without handling
data), depending on the nature of the query and the database.

HTTP

HTTP component

tHttpRequest Sends an HTTP request to the server and outputs the response information
locally.

HTTP scenarios
Sending an HTTP request to the server and saving the response information to a local file
Sending a POST request from a local JSON file
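
A minimal sketch of an HTTP POST using the standard java.net.http client (Java 11+); the endpoint and body are hypothetical:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class HttpPostSketch {
        public static void main(String[] args) throws Exception {
            // Mirrors tHttpRequest sending a POST and reading back the response.
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8080/api/orders"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString("{\"id\": 1}"))
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body());
        }
    }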

Impala

Impala components

tImpalaClose Closes connection to an Impala database.

tImpalaConnection Establishes an Impala connection to be reused by other Impala
components in your Job.

tImpalaCreateTable Creates Impala tables that fit a wide range of Impala data formats.

tImpalaInput Executes the select queries to extract the corresponding data and sends
the data to the component that follows.

tImpalaLoad Writes data of different formats into a given Impala table or exports data
from an Impala table to a directory.

tImpalaOutput Executes the action defined on the data contained in the table, based on
the flow incoming from the preceding component in the Job.

tImpalaRow Acts on the actual DB structure or on the data (although without handling
data).

Informix
Informix components
tInformixBulkExec Executes Insert operations in Informix databases.

tInformixClose Closes connection to Informix databases.

tInformixCommit Makes a global commit just once instead of committing every row or batch
of rows separately.

tInformixConnection Opens a connection to the specified database that can then be reused in
the subsequent subJob or subJobs.

tInformixInput Reads a database and extracts fields based on a query.

tInformixOutput Executes the action defined on the table and/or on the data contained in
the table, based on the flow incoming from the preceding component in
the Job.

tInformixOutputBulk Prepares the file to be used as a parameter in the INSERT query used to
feed Informix databases.

tInformixOutputBulkExec Carries out Insert operations in Informix databases using the data
provided.

tInformixRollback Prevents involuntary transaction commits by canceling transactions in
connected databases.

tInformixRow Acts on the actual DB structure or on the data (although without handling
data) thanks to the SQLBuilder that helps you write your SQL statements
easily.

tInformixSP Centralizes and calls multiple and complex queries in a database.

Ingres
Ingres components

tIngresBulkExec (deprecated) Inserts data in bulk to a table in the Ingres DBMS for performance gain.

tIngresClose (deprecated) Closes the transaction committed in the connected Ingres database.

tIngresCommit (deprecated) Commits a global transaction in one go, using a unique connection, instead of committing
every row or batch, and thus improves performance.

tIngresConnection (deprecated) Opens a connection to the specified database that can then be reused in the subsequent
subJob or subJobs.

tIngresInput (deprecated) Reads an Ingres database and extracts fields based on a query.

tIngresOutput (deprecated) Executes the action defined on the table and/or on the data contained in the table, based on
the flow incoming from the preceding component in the Job.

tIngresOutputBulk (deprecated) Prepares the file whose data is inserted in bulk to the Ingres DBMS for performance gain.
tIngresOutputBulkExec (deprecated) Inserts data in bulk to a table in the Ingres DBMS for performance gain.

tIngresRollback (deprecated) Avoids committing part of a transaction involuntarily by canceling the transaction committed
in the connected database.

tIngresRow (deprecated) Acts on the actual DB structure or on the data (although without handling data) using the
SQLBuilder tool to write your SQL statements easily.

Ingres scenario
Loading data to a table in the Ingres DBMS

Interbase

Interbase components

tInterbaseClose (deprecated) Closes the transaction committed in the connected Interbase database.

tInterbaseCommit (deprecated) Commits a global transaction in one go instead of committing every row or batch,
and thus improves performance.

tInterbaseConnection (deprecated) Opens a connection to the specified database that can then be reused in the subsequent
subJob or subJobs.

tInterbaseInput (deprecated) Reads an Interbase database and extracts fields based on a query.

tInterbaseOutput (deprecated) Executes the action defined on the table and/or on the data contained in the table, based on
the flow incoming from the preceding component in the Job.

tInterbaseRollback (deprecated) Avoids committing part of a transaction involuntarily by canceling the transaction
committed in the connected Interbase database.

tInterbaseRow (deprecated) Acts on the actual database structure or on the data (although without handling data) using
the SQLBuilder tool to write your SQL statements easily.

Internet (Integration)

Internet (Integration) component

tFileFetch Retrieves a file through the given protocol (HTTP, HTTPS, FTP, or
SMB).

Internet (Integration) scenarios


Fetching data through HTTP
Reusing stored cookie to fetch files through HTTP

Jasper
Jasper components
tJasperOutput Creates a report in rich formats using Jaspersoft's iReport.

tJasperOutputExec Creates a report in rich formats using Jaspersoft's iReport and offers a
performance gain as it functions as a combination of an input component
and a tJasperOutput component.

Jasper scenario
Generating a report against a .jrxml template

Java custom code for Map Reduce

Java custom code for Map Reduce component

tJavaMR Provides an editor that enables you to enter personalized MapReduce code
in order to integrate it into the Talend program.

Java custom code for Map Reduce scenario


Counting words using custom map and reduce code (deprecated)

Java custom code for Storm


Java custom code for Storm component

tJavaStorm (deprecated) Provides a Java code editor that lets you enter the custom Storm code you
want to use in the Storm topology you are designing.

Java custom code for Storm scenario


Analyzing people's activities using a Storm topology (deprecated)

Java custom code

Java custom code components

tJava Extends the functionalities of a Talend Job using custom Java commands.

tJavaFlex Provides a Java code editor that lets you enter personalized code in order
to integrate it into the Talend program.

tJavaRow Provides a code editor that lets you enter the Java code to be applied to
each row of the flow.

Java custom code scenarios


Using tJavaFlex to display file content based on a dynamic schema
Using tJavaRow to handle file content based on a dynamic schema
Checking the format of an e-mail address
Generating data flow
Printing out a variable content
Processing rows of data with tJavaFlex
Redirecting the standard output to a file for the entire Job
Transforming data line by line using tJavaRow
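
For orientation, the kind of code typed into the tJavaRow editor looks like the fragment below (not a stand-alone class). The input_row and output_row structures are generated by Talend from the component's input and output schemas; the name and city columns are assumptions for this sketch:

    // Applied to each row of the flow: normalize one column, pass another through.
    output_row.name = input_row.name == null ? "" : input_row.name.trim().toUpperCase();
    output_row.city = input_row.city;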

JavaDB
JavaDB components

tJavaDBInput Reads a database and extracts fields based on a query.

tJavaDBOutput Executes the action defined on the table and/or on the data contained in
the table, based on the flow incoming from the preceding component in
the Job.

tJavaDBRow Acts on the actual database structure or on the data (although without
handling data) using the SQLBuilder tool to write your SQL statements
easily.

JBoss ESB

JBoss ESB components

tJBossESBInput Retrieves a message from a JBossESB server to process it as a flow that can
be used in a Talend Job.

tJBossESBOutput Transforms the data used in a Talend Job into a JBossESB message.

JDBC

JDBC components

tJDBCConfiguration Stores connection information and credentials to be reused by other JDBC
components.

tJDBCLookupInput Reads a database and extracts fields based on a query.

tJDBCClose Closes an active JDBC connection to release the occupied resources.

tJDBCColumnList Lists all column names of a given JDBC table.

tJDBCCommit Commits a global transaction in one go instead of committing every row
or batch, and thus improves performance.

tJDBCConnection Opens a connection to the specified database that can then be reused in
the subsequent subJob or subJobs.

tJDBCInput Reads any database using a JDBC API connection and extracts fields based
on a query.

tJDBCOutput Executes the action defined on the data contained in the table, based on
the flow incoming from the preceding component in the Job.
tJDBCRollback Avoids committing part of a transaction accidentally by canceling the
transaction committed in the connected database.

tJDBCRow Acts on the actual DB structure or on the data (although without handling
data) using the SQLBuilder tool to write your SQL statements easily.

tJDBCSP Centralizes multiple or complex queries in a database in order to call them
easily.

tJDBCTableList Lists the names of a given set of JDBC tables using a select statement
based on a Where clause.

JIRA

JIRA components

tJIRAInput Retrieves the issue information based on a JQL query or retrieves the
project information based on a specified project ID from JIRA.

tJIRAOutput Inserts, updates, or deletes the issue or project information in JIRA.

JIRA scenarios
Creating an issue in JIRA application
Retrieving the project information from JIRA application
Updating an issue in JIRA application

JMS

JMS components

tJMSInput Creates an interface between a Java application and a Message-Oriented
Middleware system.

tJMSOutput Creates an interface between a Java application and a Message-Oriented
Middleware system.

JMS scenario
Enqueuing/dequeuing a message on the ActiveMQ server
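
A minimal javax.jms sketch of enqueuing a text message, in the spirit of the ActiveMQ scenario above; the broker URL and queue name are placeholders, and the ActiveMQ client library is assumed to be on the classpath:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.Destination;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    import org.apache.activemq.ActiveMQConnectionFactory;

    public class JmsEnqueueSketch {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory =
                    new ActiveMQConnectionFactory("tcp://localhost:61616");
            Connection connection = factory.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Destination queue = session.createQueue("orders");
                MessageProducer producer = session.createProducer(queue);
                TextMessage message = session.createTextMessage("hello from the Job");
                producer.send(message);   // what tJMSOutput does when it enqueues a message
            } finally {
                connection.close();
            }
        }
    }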

JSON
JSON components

tFileStreamInputJSON Extracts JSON data from a file, then transfers the data to, for instance, a file
or a database table.

tFileInputJSON Extracts JSON data from a file and transfers the data to a file, a database
table, etc.
tFileOutputJSON Receives data and rewrites it in a JSON structured data block in an output
file.

JSON scenarios
Extracting JSON data from a URL
Extracting JSON data from a file using JSONPath
Extracting JSON data from a file using JSONPath without setting a loop node
Extracting JSON data from a file using XPath
Writing a JSON structured file
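
A minimal sketch of the JSONPath extraction the scenarios above describe, using the Jayway json-path library (an assumption about tooling); the sample document is inlined instead of read from a file:

    import java.util.List;

    import com.jayway.jsonpath.JsonPath;

    public class JsonPathSketch {
        public static void main(String[] args) {
            String json = "{\"store\":{\"book\":[{\"author\":\"A\"},{\"author\":\"B\"}]}}";
            // A JSONPath loop query of the kind used to extract JSON data.
            List<String> authors = JsonPath.read(json, "$.store.book[*].author");
            System.out.println(authors);   // [A, B]
        }
    }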

Kafka

Kafka components

tKafkaInputAvro Transmits Avro-formatted messages you need to process to its following
component in the Job you are designing.

tKafkaCommit Saves the current state of the tKafkaInput to which it is connected.

tKafkaConnection Opens a reusable Kafka connection.

tKafkaCreateTopic Creates a Kafka topic that the other Kafka components can use.

tKafkaInput Transmits messages you need to process to the components that follow in
the Job you are designing.

tKafkaOutput Publishes messages into a Kafka system.

Kafka scenarios
Analyzing a Twitter flow in near real-time
Analyzing people's activities using a Storm topology (deprecated)
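
A minimal sketch of publishing one message with the Apache Kafka Java client, which is essentially what tKafkaOutput does per incoming row; the broker address, topic and message are placeholders:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KafkaPublishSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Publish one row as a message into the topic.
                producer.send(new ProducerRecord<>("events", "key-1", "hello"));
            }
        }
    }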

Kerberos

Kerberos component

tSetKerberosConfiguration Sets the relevant information for Kerberos authentication.

Keystore
Keystore component

tSetKeystore Sets the authentication data type between PKCS 12 and JKS.

Keystore scenario
Extracting customer information from a private WSDL file

Kinesis
Kinesis components

tKinesisInput Acts as a consumer of an Amazon Kinesis stream to pull messages from this
Kinesis stream.

tKinesisInputAvro Acts as a consumer of an Amazon Kinesis stream to pull messages from this
Kinesis stream.

tKinesisOutput Acts as a data producer to put data to an Amazon Kinesis stream for
real-time ingestion.

Kinesis scenario
Working with Amazon Kinesis and Big Data Streaming Jobs

Kudu
Kudu components

tKuduConfiguration Enables the reuse of the connection configuration to Cloudera Kudu in the
same Job.

tKuduInput Retrieves data from a Cloudera Kudu table and sends them to the
component that follows for transformation.

tKuduOutput Creates, updates or deletes data in a Cloudera Kudu table.

Kudu scenario
Writing and reading data from Cloudera Kudu using a Spark Batch Job

LDAP

LDAP components

tLDAPAttributesInput Analyzes each object found via the LDAP query and lists a collection of
attributes associated with the object.

tLDAPClose Disconnects one connection to the LDAP Directory server so as to release
occupied resources.

tLDAPConnection Creates a connection to an LDAP Directory server.

tLDAPInput Executes an LDAP query based on the given filter and corresponding to the
schema definition. Then it passes on the field list to the next component
via a Row > Main link.

tLDAPOutput Executes an LDAP query based on the given filter and corresponding to the
schema definition. Then it passes on the field list to the next component
via a Row > Main link.

tLDAPRenameEntry Renames one or more entries in a specific LDAP directory.


LDAP scenarios
Displaying LDAP directory's filtered content
Editing data in an LDAP directory

LDIF

LDIF components

tFileInputLDIF Reads an LDIF file row by row, splitting each row into fields, and sends the
fields as defined in the schema to the next component using a Row
connection.

tFileOutputLDIF Writes or modifies an LDIF file with data separated in respective entries
based on the schema defined, or else deletes content from an LDIF file.

LDIF scenario
Writing data from a database table into an LDIF file

Library import
Library import component

tLibraryLoad Loads usable Java libraries in a Job.

Library import scenario


Checking the format of an e-mail address

Logs and errors (Integration)

Logs and errors (Integration) components

tAssert Generates a boolean evaluation of the Job execution status and provides
the Job status messages to tAssertCatcher.

tAssertCatcher Generates a data flow consolidating the status information of a Job
execution and transfers the data into defined output files.

tChronometerStart Operates as a chronometer device that starts calculating the processing
time of one or more subJobs in the main Job, or that starts calculating the
processing time of part of your subJob.

tChronometerStop Operates as a chronometer device that stops calculating the processing
time of one or more subJobs in the main Job, or that stops calculating the
processing time of part of your subJob. tChronometerStop displays the
total execution time.

tDie Triggers the tLogCatcher component for exhaustive log before killing the
Job.
tFlowMeter Counts the number of rows processed in the defined flow, so this number
can be caught by the tFlowMeterCatcher component for logging purposes.

tFlowMeterCatcher Operates as a log function triggered by the use of a tFlowMeter component
in the Job.

tLogCatcher Operates as a log function triggered by one of the three: Java exception,
tDie or tWarn, to collect and transfer log data.

tLogRow Displays data or results in the Run console to monitor data processed.

tStatCatcher Gathers the Job processing metadata at the Job level and at the
component level and transfers the log data to the subsequent component
for display or storage.

tWarn Triggers a warning often caught by the tLogCatcher component for
exhaustive log.

Logs and errors (Integration) scenarios


Catching flow metrics from a Job
Catching messages triggered by a tWarn component
Catching the message triggered by a tDie component
Displaying the statistics log of Job execution
Measuring the processing time of a subJob and part of a subJob
Setting up the assertive condition for a Job execution
Viewing product orders status (on a daily basis) against a benchmark number

Machine Learning

Machine Learning components

tALSModel Generates a user-ranking-product associated matrix, based on given
user-product interactive data.

tClassify Predicts which class an element belongs to, based on the classifier model generated by a
model training component.

tClassifySVM Predicts which class an element belongs to, based on the classifier model generated by
tSVMModel.

tDecisionTreeModel Analyzes feature vectors usually prepared and provided by tModelEncoder to generate a
classifier model that is used by tPredict to classify given elements.

tGradientBoostedTreeModel Analyzes feature vectors usually prepared and provided by tModelEncoder to generate a
classifier model that is used by tPredict to classify given elements.

tKMeansModel Analyzes incoming datasets based on applying the K-Means algorithm.

tKMeansStrModel Analyzes incoming datasets in near real-time, based on applying the K-Means algorithm.

tLinearRegressionModel Builds a linear regression model using a training dataset.


tLogisticRegressionModel Analyzes feature vectors usually pre-processed by tModelEncoder to generate a classifier
model that is used by tPredict to classify given elements.

tMahoutClustering (deprecated) Groups unlabeled numerical data into clusters that can reveal interesting patterns or help
identify abnormal data items in the data set.

tModelEncoder Performs featurization operations to transform data into the format expected by the model
training components such as tLogisticRegressionModel or tKMeansModel.

tNaiveBayesModel Generates a classifier model that is used by tPredict to classify given elements.

tPredict Predicts the situation of an element.

tPredictCluster Predicts the cluster of an element.

tRandomForestModel Analyzes feature vectors.

tRecommend Recommends products to users known to this model, based on the user-product
recommender model generated by tALSModel.

tSVMModel Generates an SVM-based classifier model that can be used by tPredict to classify given
elements.

Machine Learning scenarios


Creating a classification model to filter spam
Grouping customer numerical data into clusters on HDFS (deprecated)
Modeling the accident-prone areas in a city
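
A minimal Spark ML sketch of training and applying a K-Means model, the kind of processing tKMeansModel and tPredictCluster wrap in a Spark Job; the libsvm input file and the value of k are assumptions, and tModelEncoder plays the featurization role in a real Job:

    import org.apache.spark.ml.clustering.KMeans;
    import org.apache.spark.ml.clustering.KMeansModel;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class KMeansSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("kmeans-sketch").master("local[*]").getOrCreate();
            // Hypothetical libsvm-formatted feature file.
            Dataset<Row> features = spark.read().format("libsvm").load("features.txt");
            KMeans kmeans = new KMeans().setK(3).setSeed(1L);
            KMeansModel model = kmeans.fit(features);   // what tKMeansModel trains
            model.transform(features).show();           // cluster assignment per element
            spark.stop();
        }
    }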

Mail
Mail components

tFileInputMail Reads the standard key data of a given MIME or MSG email file.

tSendMail Notifies recipients about a particular state of a Job or possible
errors.

Mail scenarios
Extracting key fields from an email
Retrieving emails and extracting data from email files
Sending an email on error
Sending an email with attachment in HTML format

MapRDB

MapRDB components

tMapRDBConfiguration Stores connection information and credentials to be reused by other
MapRDB components.

tMapRDBLookupInput Provides lookup data to the main flow of a streaming Job.

tMapRDBClose Closes a MapRDB connection you have established in the same Job.

tMapRDBConnection Establishes a MapRDB connection to be reused by other MapRDB
components in the same Job.

tMapRDBInput Reads data from a given MapRDB database and extracts columns of
selection.

tMapRDBOutput Writes columns of data into a given MapRDB database.

tMapROjaiInput Reads documents from a MapR-DB database to load the data in a given
Job.

tMapROjaiOutput Inserts, replaces or deletes documents in a MapR-DB database to be used
as a document database, based on the incoming flow from the preceding
component in the Job.

MapRDB scenario
Writing candidate data in a MapR-DB OJAI database

MapRStreams

MapRStreams components

tMapRStreamsInputAvro Transmits messages in the Avro format to the Job that runs transformations over
these messages. Only MapR V5.2 onwards is supported by this component.

tMapRStreamsCommit Connects to a given tMapRStreamsInput to perform a consumer offset commit.

tMapRStreamsConnection Opens a reusable connection to a given MapR Streams cluster so that the other
MapR Streams components can reuse this connection.

tMapRStreamsCreateStream Creates a MapR Streams stream or topic that the other MapR Streams components
can use.

tMapRStreamsInput Transmits messages to the Job that runs transformations over these messages.
Only MapR V5.2 onwards is supported by this component.

tMapRStreamsOutput Publishes messages into a MapR Streams system. Only MapR V5.2 onwards is
supported by this component.

Marketo
Marketo components

tMarketoBulkExec Imports leads or custom objects into Marketo from a local file in the REST
API mode.
tMarketoCampaign Retrieves campaign records, activity and campaign changes related data
from Marketo.

tMarketoConnection Opens a connection to Marketo that can then be reused by other Marketo
components.

tMarketoInput Retrieves lead records, activity history, lead changes, and custom object
related data from Marketo.

tMarketoListOperation Adds/removes one or more leads to/from a list in Marketo. Also, it helps
you verify the existence of one or more leads in a list in Marketo.

tMarketoOutput Writes lead records or custom object records from the incoming data flow
into Marketo.

Marketo scenarios
Adding a lead record to a Marketo list using SOAP API
Transmitting data with Marketo using REST API

MarkLogic
MarkLogic components

tMarkLogicBulkLoad Imports local files into a MarkLogic server database in bulk mode using the
MarkLogic Content Pump (MLCP) tool.

tMarkLogicClose Closes an active connection to a MarkLogic database to release the
occupied resources.

tMarkLogicConnection Opens a connection to a MarkLogic database that can then be reused by
other MarkLogic components.

tMarkLogicInput Searches document content in a MarkLogic database based on a string
query.

tMarkLogicOutput Creates, updates or deletes document content in a MarkLogic database.

MaxDB

MaxDB components

tMaxDBInput Reads a database and extracts fields based on a query.

tMaxDBOutput Writes, updates, makes changes or suppresses entries in a database.

tMaxDBRow Acts on the actual DB structure or on the data (although without handling
data), depending on the nature of the query and the database.

MDM (Master Data Management)

MDM connection and transaction


MDM connection and transaction components

tMDMClose Terminates an open MDM server connection after the execution of the
preceding subJob.

tMDMCommit Commits all changes to the database made within the scope of a
transaction in MDM.

tMDMConnection Opens an MDM server connection for convenient reuse in the current Job
or transaction.

tMDMRollback Rolls back any changes made in the database rather than definitively
committing them, for example to prevent partial commits if an error
occurs.

MDM data processing


MDM data processing components

tMDMBulkLoad Uses bulk mode to write XML structured master data into the MDM server.

tMDMDelete Deletes master data records from specific entities in the MDM Hub.

tMDMInput Reads data in an MDM Hub and thus makes it possible to process this data.

tMDMOutput Writes data into or removes data from the MDM server.

tMDMRestInput Reads data through the REST API from the MDM Hub for further processing.

tMDMSP Offers a convenient way to centralize multiple or complex queries in an
MDM Hub and calls the stored procedure easily.

tMDMViewSearch Retrieves the MDM records from an MDM hub by applying filtering criteria
you have created in a specific view and puts out results in XML structure.

MDM data processing scenarios
Deleting master data from an MDM Hub
Executing a stored procedure using tMDMSP
Loading records into a business entity
Reading data from an MDM hub through the REST API
Reading master data from an MDM hub
Reading staging data from MDM
Removing master data partially from the MDM hub
Retrieving records from an MDM hub via an existing view
Writing master data in an MDM hub
Writing staging data into MDM

MDM event processing

MDM event processing components
tMDMReceive Decodes a context parameter holding MDM XML data and transforms it into
a flat schema.

tMDMRouteRecord Helps Event Manager to identify the changes you have made on your data
so that correlative actions can be triggered.

tMDMTriggerInput Reads the XML message (Document type) sent by MDM and passes the
information to the component that follows.

tMDMTriggerOutput Receives an XML flow (Document type) from the preceding component in
the Job.

MDM event processing scenarios
Exchanging the event information about an MDM record
Extracting information from an MDM record in XML
Routing an update report record to Event Manager

MemSQL
MemSQL components

tMemSQLClose (deprecated) Closes the transaction committed in the MemSQL database.

tMemSQLConnection (deprecated) Opens a connection to the specified database that can then be reused in the
subsequent subJob or subJobs.

tMemSQLInput (deprecated) Executes a DB query with a strictly defined order which must correspond to the schema
definition.

tMemSQLOutput (deprecated) Reads data incoming from the preceding component in the Job and executes the action
defined on a given MemSQL table and/or on the data contained in the table.

tMemSQLRow (deprecated) Acts on the actual database structure or on the data (although without handling data).

MemSQL scenario
Writing data to and reading data from a MemSQL database table

Microsoft CRM

Microsoft CRM components

tMicrosoftCrmInput Extracts data from a Microsoft CRM database based on conditions set on
specific columns.

tMicrosoftCrmOutput Writes data into a Microsoft CRM database.

Microsoft CRM scenario
Writing data in a Microsoft CRM database and putting conditions on columns to extract specified rows

Microsoft MQ
Microsoft MQ components

tMicrosoftMQInput Retrieves the first message in a given Microsoft message queue (only the
String type is supported).

tMicrosoftMQOutput Writes a defined column of the given inflow data to a Microsoft message queue
(only the String type is supported).

Microsoft MQ scenario
Writing and fetching queuing messages from Microsoft message queue

MOM

MOM components

tMomCommit Commits data on the MQ Server.

tMomConnection Opens a connection to the MQ Server for communication.

tMomInput Fetches a message from a queue on a Message-Oriented Middleware
(MOM) system and passes it on to the next component.

tMomMessageIdList Fetches a message ID list from a queue on a Message-Oriented Middleware
system and passes it to the next component.

tMomOutput Adds a message to a Message-Oriented Middleware system queue in order
for it to be fetched asynchronously.

tMomRollback Cancels the transaction committed in the MQ Server.

MOM scenarios
Asynchronous communication via a MOM server
Transmitting XML files via a MOM server

Mondrian
Mondrian component

tMondrianInput (deprecated) Executes a multi-dimensional expression (MDX) query corresponding to the dataset
structure and schema definition.

Mondrian scenario
Extracting multi-dimensional datasets from a MySQL database (Cross-join tables)

MongoDB

MongoDB components
tMongoDBConfiguration Stores connection information and credentials to be reused by other MongoDB
components.

tMongoDBLookupInput Executes a database query with a strictly defined order which must correspond
to the schema definition.

tMongoDBBulkLoad Imports data files in different formats (CSV, TSV or JSON) into the specified
MongoDB database so that the data can be further processed.

tMongoDBClose Closes a connection to the MongoDB database.

tMongoDBConnection Creates a connection to a MongoDB database and reuse that connection in other
components.

tMongoDBGridFSDelete Automates the delete action over specific files in MongoDB GridFS.

tMongoDBGridFSGet Connects to a MongoDB GridFS system to copy files from it.

tMongoDBGridFSList Retrieves a list of files based on a query.

tMongoDBGridFSProperties Obtains information about the properties of given files selected based on a
query.

tMongoDBGridFSPut Connects to a MongoDB GridFS system to load files into it.

tMongoDBInput Retrieves records from a collection in the MongoDB database and transfers them
to the following component for display or storage (see the sketch after this list).

tMongoDBOutput Executes the action defined on the collection in the MongoDB database.

tMongoDBRow Executes the commands and functions of the MongoDB database.
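
For orientation, the read path that tMongoDBConnection and tMongoDBInput implement
corresponds roughly to the following sketch with the MongoDB Java driver. The URI,
database, collection and filter values are illustrative assumptions, not component
defaults.

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Filters;
    import org.bson.Document;

    public class MongoReadSketch {
        public static void main(String[] args) {
            // Hypothetical connection URI; tMongoDBConnection builds the
            // equivalent from its host/port/credentials settings.
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> customers =
                        client.getDatabase("demo").getCollection("customers");
                // Roughly what tMongoDBInput does: query a collection and hand
                // each record to the next component in the flow.
                for (Document doc : customers.find(Filters.eq("country", "FR"))) {
                    System.out.println(doc.toJson());
                }
            }
        }
    }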

MongoDB scenarios
Reading and writing data in MongoDB using a Spark Streaming Job
Writing and reading data from MongoDB using a Spark Batch Job
Creating a collection and writing data to it
Importing data into MongoDB database
Managing files using MongoDB GridFS
Retrieving data from a collection by advanced queries
Upserting records in a collection
Using MongoDB functions to create a collection and write data to it

MQTT

MQTT components

tMQTTInput Acts as a consumer of an MQTT topic to stream messages from this topic.

tMQTTOutput Acts as a publisher to an MQTT topic to stream messages to this topic in real
time.
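
As an illustration of the publish side, a minimal sketch with the Eclipse Paho
client; the broker URL, client ID, topic and payload are assumptions for the
example, not tMQTTOutput defaults.

    import org.eclipse.paho.client.mqttv3.MqttClient;
    import org.eclipse.paho.client.mqttv3.MqttException;
    import org.eclipse.paho.client.mqttv3.MqttMessage;

    public class MqttPublishSketch {
        public static void main(String[] args) throws MqttException {
            // Hypothetical broker and client ID.
            MqttClient client = new MqttClient("tcp://localhost:1883", "talend-sketch");
            client.connect();
            MqttMessage message = new MqttMessage("hello".getBytes());
            message.setQos(1); // at-least-once delivery
            client.publish("demo/topic", message);
            client.disconnect();
        }
    }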

MS Delimited
MS Delimited components

tFileInputMSDelimited Reads the data structures (schemas) of a multi-structured delimited file
and sends the fields as defined in the different schemas to the next
components using Row connections.

tFileOutputMSDelimited Creates a complex multi-structured delimited file, using data structures
(schemas) coming from several incoming Row flows.

MS Delimited scenario
Reading a multi-structured delimited file

MS Positional
MS Positional components

tFileInputMSPositional Reads the data structures (schemas) of a multi-structured positional file
and sends the fields as defined in the different schemas to the next
components using Row connections.

tFileOutputMSPositional Creates a complex multi-structured file, using data structures (schemas)
coming from several incoming Row flows.

MS Positional scenario
Reading data from a positional file

MS XML connectors

MS XML connectors components

tFileInputMSXML Reads the data structures (schemas) of a multi-structured XML file and
sends the fields as defined in the different schemas to the next
components using Row connections.

tFileOutputMSXML Creates a complex multi-structured XML file, using data structures
(schemas) coming from several incoming Row flows.

MS XML connectors scenario
Reading a multi-structure XML file

MSSql
MSSql components

tMSSqlBulkExec Offers gains in performance while executing the Insert operations to a
Microsoft SQL Server database.

tMSSqlClose Closes a transaction in the MSSql database.

tMSSqlColumnList Lists all column names of a given MSSql table.

tMSSqlCommit Commits in one go, using a unique connection, a global transaction instead
of doing that on every row or every batch and thus provides gain in
performance.

tMSSqlConnection Opens a connection to the specified database that can then be reused in
the subsequent subJob or subJobs.

tMSSqlInput Executes a DB query with a strictly defined order which must correspond to
the schema definition.

tMSSqlLastInsertId Retrieves the last primary keys added by a user to a MSSql table.

tMSSqlOutput Executes the action defined on the table and/or on the data contained in
the table, based on the flow incoming from the preceding component in
the Job.

tMSSqlOutputBulk Prepares the file to be used as parameter in the INSERT query to feed the
MSSql database.

tMSSqlOutputBulkExec Gains in performance during Insert operations to a Microsoft SQL Server
database.

tMSSqlRollback Cancels the transaction commit in the MSSql database and thus avoids
committing part of a transaction involuntarily.

tMSSqlRow Acts on the actual DB structure or on the data (although without handling
data).

tMSSqlSP Offers a convenient way to centralize multiple or complex queries in a
database and call them easily.

tMSSqlTableList Lists the names of a given set of MSSql tables using a select statement
based on a Where clause.

MSSql scenarios
Inserting data into a database table and extracting useful information from it
Retrieving personal information using a stored procedure

MySQL
MySQL components

tMysqlConfiguration Stores connection information and credentials to be reused by other
MySQL components.

tMySQLInvalidRows Checks MySQL database rows against specific Data Quality patterns
(regular expression) or Data Quality rules (business rule).

tMysqlLookupInput Reads a MySQL database and extracts fields based on a query.


tMySQLValidRows Checks MySQL database rows against Data Quality patterns (regular
expression).

tMysqlBulkExec Offers gains in performance while executing the Insert operations on a
MySQL or Aurora database.

tMysqlClose Closes the transaction committed in a Mysql database.

tMysqlColumnList Iterates on all columns of a given Mysql table and lists column names.

tMysqlCommit Commits in one go, using a unique connection, a global transaction instead
of doing that on every row or every batch and thus provides gain in
performance.

tMysqlConnection Opens a connection to the specified MySQL database for reuse in the
subsequent subJob or subJobs.

tMysqlInput Executes a DB query with a strictly defined order which must correspond to
the schema definition.

tMysqlLastInsertId Obtains the primary key value of the record that was last inserted in a
Mysql table by a user (see the sketch after this list).

tMysqlOutput Writes, updates, makes changes or suppresses entries in a database.

tMysqlOutputBulk Writes a file with columns based on the defined delimiter and the MySQL or
Aurora standards.

tMysqlOutputBulkExec Executes the Insert action in the specified MySQL or Aurora database.

tMysqlRollback Cancels the transaction commit in the connected MySQL database to avoid
committing part of a transaction involuntarily.

tMysqlRow Executes the stated SQL query on the specified MySQL database.

tMysqlSP Calls a MySQL database stored procedure.

tMysqlTableList Lists the names of a given set of Mysql tables using a select statement
based on a Where clause.
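
A minimal JDBC sketch of the mechanism behind tMysqlLastInsertId: insert a row,
then read back the auto-generated key. The connection settings, table and column
names are invented for the example.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class LastInsertIdSketch {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/demo", "user", "secret");
                 PreparedStatement ps = conn.prepareStatement(
                         "INSERT INTO customers (name) VALUES (?)",
                         Statement.RETURN_GENERATED_KEYS)) {
                ps.setString(1, "Alice");
                ps.executeUpdate();
                // The auto-generated primary key of the row just inserted,
                // which is what tMysqlLastInsertId exposes to the flow.
                try (ResultSet keys = ps.getGeneratedKeys()) {
                    if (keys.next()) {
                        System.out.println("last insert id = " + keys.getLong(1));
                    }
                }
            }
        }
    }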

MySQL scenarios
Checking customer table against a given DQ rule to select customer records
Controlling the data definition language via tMysqlOutput when creating a table
Reading email addresses from a DB table and retrieving specific data
Updating a database table using tMysqlOutput in a Big Data Streaming Job
Writing dynamic columns from a source file to a database
Combining two flows for selective output
Getting the ID for the last inserted record with tMysqlLastInsertId
Inserting a column and altering data using tMysqlOutput
Inserting data in bulk in MySQL database
Inserting data in mother/daughter tables
Inserting transformed data in MySQL database
Iterating on DB tables and deleting their content using a user-defined SQL template
Iterating on a DB table and listing its column names
Removing and regenerating a MySQL table index
Retrieving data in error with a Reject link
Sharing a database connection between a parent Job and child Job
Updating data using tMysqlOutput
Using PreparedStatement objects to query data
Using tMysqlSP to find a State Label using a stored procedure
Writing columns from a MySQL database to an output file using tMysqlInput

NamedPipe
NamedPipe components

tNamedPipeClose Closes a named-pipe at the end of a process.

tNamedPipeOpen Opens a named-pipe for writing data into it.

tNamedPipeOutput Writes data into an existing open named-pipe.

NamedPipe scenario
Writing and loading data through a named-pipe

Natural Language Processing

Natural Language Processing components

tCompareColumns Compares two columns to design useful features for generating a
classification model.

tNLPModel Uses an input in CoNLL format and automatically generates token-level
features to create a model for classification tasks like Named Entity
Recognition (NER).

tNLPPredict Uses a classifier model generated by tNLPModel to predict and label the
input text.

tNLPPreprocessing Prepares a text sample and divides it into tokens, which can be words,
numbers or punctuation marks.

Natural Language Processing scenarios
Extracting named entities using a classification model
Generating a classification model
Preparing a text sample to be used for learning a model

Neo4j
Neo4j components

tNeo4jv4Close Closes a connection to a Neo4j version 4.x database.

tNeo4jv4Connection Establishes a connection to a Neo4j version 4.x database for later use.

tNeo4jv4Input Reads data from Neo4j version 4.x and sends data in the output flow.

tNeo4jv4Output Receives data from the preceding component and writes the data into a Neo4j version 4.x
database.

tNeo4jv4Row Executes the stated Cypher query onto the specified Neo4j version 4.x database.

tNeo4jBatchOutput Receives data from the preceding component and writes the data into a local Neo4j
database.

tNeo4jBatchOutputRelationship Receives data from the preceding component and writes relationships in bulk into a local
Neo4j database.

tNeo4jBatchSchema Defines the schema of a local Neo4j database.

tNeo4jClose Closes an active connection to a Neo4j database in embedded mode.

tNeo4jConnection Opens a connection to a Neo4j database to be reused by other Neo4j components.

tNeo4jImportTool Uses Neo4j Import Tool to create a Neo4j database and import large amounts of data in bulk
from CSV files to this database.

tNeo4jInput Reads data from Neo4j and sends data in the output flow.

tNeo4jOutput Receives data from the preceding component and writes the data into Neo4j.

tNeo4jOutputRelationship Receives data from the preceding component and writes relationships into Neo4j.

tNeo4jRow Executes the stated Cypher query onto the specified Neo4j database.
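
For comparison, running a Cypher statement the way tNeo4jRow does can be sketched
with the official Neo4j Java driver; the Bolt URI, credentials and the query
itself are assumptions.

    import org.neo4j.driver.AuthTokens;
    import org.neo4j.driver.Driver;
    import org.neo4j.driver.GraphDatabase;
    import org.neo4j.driver.Session;

    import static org.neo4j.driver.Values.parameters;

    public class CypherRowSketch {
        public static void main(String[] args) {
            // Hypothetical Bolt endpoint and credentials.
            try (Driver driver = GraphDatabase.driver(
                    "bolt://localhost:7687", AuthTokens.basic("neo4j", "secret"));
                 Session session = driver.session()) {
                // A parameterized Cypher statement, much as tNeo4jRow would run it.
                session.run("CREATE (p:Person {name: $name})",
                        parameters("name", "Alice"));
            }
        }
    }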

Neo4j scenarios
Creating nodes with a label using a Cypher query
Importing data from a CSV file to Neo4j and creating relationships using a single Cypher query
Importing data from a CSV file to Neo4j using a Cypher query
Writing information of actors and movies to Neo4j with hierarchical relationship using Neo4j Batch components
Writing data to a Neo4j database and reading specific data from it
Writing family information to Neo4j and creating relationships
Writing information of actors and movies to Neo4j with hierarchical relationship

Netezza
Netezza components

tNetezzaBulkExec Offers gains in performance while carrying out the Insert operations to a
Netezza database.

tNetezzaClose Closes the transaction committed in the connected Netezza database.

tNetezzaCommit Validates the data processed through the Job into the connected Netezza
database.
tNetezzaConnection Opens a connection to a Netezza database to be reused in the subsequent
subJob or subJobs.

tNetezzaInput Reads a database and extracts fields from a Netezza database based on a
query.

tNetezzaNzLoad Inserts data into a Netezza database table using Netezza's nzload utility.

tNetezzaOutput Writes, updates, makes changes or suppresses entries in a Netezza
database.

tNetezzaRollback Cancels the transaction committed in the connected Netezza database to
avoid committing part of a transaction involuntarily.

tNetezzaRow Executes the SQL query stated onto the specified Netezza database.

Netsuite

Netsuite components

tNetSuiteV2019Connection Creates a connection to a NetSuite SOAP server by leveraging NetSuite v2019 features so
that other NetSuite V2019 components in the Job can reuse the connection.

tNetSuiteV2019Input Invokes the NetSuite SOAP service and retrieves data according to the conditions you
specify by leveraging NetSuite v2019 features.

tNetSuiteV2019Output Invokes the NetSuite SOAP service and inserts, updates, or removes data on the NetSuite
SOAP server by leveraging NetSuite v2019 features.

tNetsuiteConnection (deprecated) Creates a connection to the NetSuite SOAP server so that other NetSuite
components in the Job can reuse the connection.

tNetsuiteInput (deprecated) Invokes the NetSuite SOAP service and retrieves data according to the conditions you
specify.

tNetsuiteOutput (deprecated) Invokes the NetSuite SOAP service and inserts, updates, or removes data on the NetSuite
SOAP server.

Netsuite scenario
Handling data with NetSuite

Openbravo ERP

Openbravo ERP components

tOpenbravoERPInput (deprecated) Extracts data from an OpenbravoERP database according to the conditions
defined in specific columns.

tOpenbravoERPOutput (deprecated) Writes data in an OpenbravoERP database.

Oracle

Oracle components

tOracleConfiguration Stores connection information and credentials to be reused by other
Oracle components.

tOracleInvalidRows Checks Oracle database rows against specific Data Quality patterns
(regular expression) or Data Quality rules (business rule).

tOracleLookupInput Reads a database and extracts fields based on a query.

tOracleValidRows Checks Oracle database rows against Data Quality patterns (regular
expression).

tOracleBulkExec Offers gains in performance during operations performed on data of an
Oracle database.

tOracleClose Closes the transaction committed in the connected Oracle database.

tOracleCommit Validates the data processed through the Job into the connected Oracle
database.

tOracleConnection Opens a connection to the specified Oracle database for reuse in the
subsequent subJob or subJobs.

tOracleInput Reads an Oracle database and extracts fields based on a query.

tOracleOutput Writes, updates, makes changes or suppresses entries in an Oracle
database.

tOracleOutputBulk Writes a file with columns based on the defined delimiter and the Oracle
standards.

tOracleOutputBulkExec Executes the Insert action in the specified Oracle database.

tOracleRollback Cancels the transaction commit in the connected Oracle database to avoid
committing part of a transaction involuntarily.

tOracleRow Executes the stated SQL query on the specified Oracle database.

tOracleSP Calls an Oracle database stored procedure.

tOracleTableList Lists the names of specified Oracle tables using a SELECT statement based
on a WHERE clause.

Oracle scenarios
Checking number format using a stored procedure
Truncating and inserting file data into an Oracle database
Using context parameters when reading a table from an Oracle database

ORC
ORC components

tFileInputORC Extracts records from a given ORC format file and sends the data to the
next component for further processing.

tFileOutputORC Receives records from the processing component placed ahead of it and
writes the records into ORC format files.

Orchestration (Integration)

Orchestration (Integration) components

tCollector Feeds the parallel execution processes with the threads generated by
tPartitioner.

tDepartitioner Assembles the outputs of the parallel execution processes so that
tRecollector can capture those outputs.

tParallelize Manages complex Job systems. It executes several subJobs simultaneously
and synchronizes the execution of a subJob with other subJobs within the
main Job.

tPartitioner Partitions the input data before tCollector can transfer them to the parallel
execution processes.

tRecollector Outputs the parallel execution results, depending on tDepartitioner.

tFlowToIterate Reads data line by line from the input flow and stores the data entries in
iterative global variables.

tForeach Creates a loop on a list for an iterate link.

tInfiniteLoop Executes a task or a Job automatically, based on a loop.

tIterateToFlow Transforms non-processable data into a processable flow.

tLoop Executes a task or a Job automatically, based on a loop.

tPostjob Triggers a task required after the execution of a Job.

tPrejob Triggers a task required for the execution of a Job.

tReplicate Duplicates the incoming schema into two identical output flows.

tRunJob Manages complex Job systems which need to execute one Job after
another.

tSleep Identifies possible bottlenecks using a time break in the Job for testing or
tracking purposes.

tUnite Centralizes data from various and heterogeneous sources.


tWaitForFile Iterates on a directory and triggers the next component when the defined
condition is met.

tWaitForSocket Triggers a Job based on a defined condition.

tWaitForSqlData Iterates on a given connection for insertion or deletion of rows and triggers
a subJob when a condition linked to SQL data presence is met.

Orchestration (Integration) scenarios
Parallelizing/synchronizing subJobs execution
Sorting the customer data of large size in parallel
Calling a Job and passing the parameter needed to the called Job
Executing a Job multiple times using a loop
Handling files before and after the execution of a data Job
Iterating on a list and retrieving the values
Iterating on files and merging the content
Passing a value from a parent Job to a child Job
Propagating the buffered output data from the child Job to the parent Job
Replicating a flow and sorting two identical flows respectively
Running a list of child Jobs dynamically
Transforming a list of files as data flow
Transforming data flow to a list
Waiting for a file to be created and continuing the iteration loop after a message is triggered
Waiting for a file to be created and stopping the iteration loop after a message is triggered
Waiting for insertion of rows in a table

Palo
Palo components

tPaloCheckElements (deprecated) Checks whether the elements present in an incoming data flow exist in a given cube.

tPaloClose (deprecated) Closes an active connection to a Palo Server.

tPaloConnection (deprecated) Opens a connection to a Palo Server and allows other components involved in a process to
share the connection for the duration of the process.

tPaloCube (deprecated) Performs operations on a given Palo cube.

tPaloCubeList (deprecated) Retrieves a list of cube details from the given Palo database.

tPaloDatabase (deprecated) Manages the databases inside a Palo server.

tPaloDatabaseList (deprecated) Lists database names, database types, number of cubes, number of dimensions, database
status and database id from a given Palo server.

tPaloDimension (deprecated) Manages Palo dimensions and their elements inside a database.

tPaloDimensionList (deprecated) Retrieves a list of dimension details from the given Palo database.

tPaloInputMulti (deprecated) Retrieves the stored or calculated values in combination with the element records out of a
cube.

tPaloOutput (deprecated) Takes the input stream and writes it to a given Palo cube.

tPaloOutputMulti (deprecated) Takes the input stream and writes it to a given Palo cube.

tPaloRule (deprecated) Manages rules in a given cube.

tPaloRuleList (deprecated) Lists all rules, formulas, comments, activation status, external IDs from a given cube.

Palo scenarios
Creating a cube in an existing database
Creating a database
Creating a dimension with elements
Creating a rule in a given cube
Rejecting inflow data when the elements to be written do not exist in a given cube
Retrieving detailed cube information from a given database
Retrieving detailed database information from a given Palo server
Retrieving detailed dimension information from a given database
Retrieving detailed rule information from a given cube
Retrieving dimension elements from a given cube
Writing data into a given cube

ParAccel
ParAccel components

tParAccelBulkExec (deprecated) Improves performance when loading data in ParAccel database.

tParAccelClose (deprecated) Closes a transaction.

tParAccelCommit (deprecated) Commits in one go a global transaction, using a unique connection, instead of doing that on
every row or every batch and thus provides gain in performance.

tParAccelConnection (deprecated) Opens a connection to the specified database that can then be reused in the
subsequent subJob or subJobs.

tParAccelInput (deprecated) Reads a database and extracts fields based on a query.

tParAccelOutput (deprecated) Executes the action defined on the table and/or on the data of a table, according to the
input flow from the previous component.

tParAccelOutputBulk (deprecated) Prepares the file to be used as parameter in the INSERT query to feed the ParAccel
database.

tParAccelOutputBulkExec (deprecated) Improves performance when loading data in a ParAccel database.

tParAccelRollback (deprecated) Avoids committing part of a transaction involuntarily.


tParAccelRow (deprecated) Acts on the actual DB structure or on the data (although without handling data), depending
on the nature of the query and the database. The SQLBuilder tool helps you write your
SQL statements easily.

Parquet
Parquet components

tFileInputParquet Extracts records from a given Parquet format file and sends the data to the
next component for further processing.

tFileOutputParquet Receives records from the processing component placed ahead of it and
writes the records into Parquet format files.

tFileStreamInputParquet Extracts records from a given Parquet format file for other components to
process the records.

Petals

Petals components

tPetalsInput (deprecated) Passes Petals' data to a Talend Job.

tPetalsOutput (deprecated) Transfers the data in a Talend Job to Petals ESB.

POP

POP component

tPOP Fetches one or more email messages from a server using the POP3 or IMAP
protocol.
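
The retrieval tPOP performs is essentially the following JavaMail pattern; the
host, credentials and the choice of POP3 over SSL are assumptions for the
example.

    import java.util.Properties;
    import javax.mail.Folder;
    import javax.mail.Message;
    import javax.mail.Session;
    import javax.mail.Store;

    public class PopFetchSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical POP3-over-SSL server settings.
            Properties props = new Properties();
            props.setProperty("mail.store.protocol", "pop3s");
            Session session = Session.getInstance(props);
            Store store = session.getStore();
            store.connect("mail.example.com", "user", "secret");
            Folder inbox = store.getFolder("INBOX");
            inbox.open(Folder.READ_ONLY);
            for (Message message : inbox.getMessages()) {
                System.out.println(message.getSubject());
            }
            inbox.close(false);
            store.close();
        }
    }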

POP scenario
Retrieving a selection of email messages from an email server

Positional
Positional components

tFileStreamInputPositional Listens on a given directory for new files, reads data from them row by row and
extracts fields based on a specific pattern.

tFileInputPositional Reads a positional file row by row, splits each row into fields based on a given
pattern, and then sends the fields as defined in the schema to the next
component.

tFileOutputPositional Writes a file row by row according to the length and the format of the fields or
columns in a row.

Positional scenarios
Handling a positional file based on a dynamic schema
Reading a Positional file and saving filtered results to XML

PostgresPlus

PostgresPlus components

tPostgresPlusBulkExec Improves performance during Insert operations to a PostgresPlus database.

tPostgresPlusClose Closes the transaction committed in the connected PostgresPlus database.

tPostgresPlusCommit Commits in one go a global transaction, using a unique connection, instead of doing
that on every row or every batch and thus improves performance.

tPostgresPlusConnection Opens a connection to the specified database that can then be reused in the
subsequent subJob or subJobs.

tPostgresPlusInput Executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row
link.

tPostgresPlusOutput Executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the job.

tPostgresPlusOutputBulk Prepares the file to be used as parameter in the INSERT query to feed the PostgresPlus
database.

tPostgresPlusOutputBulkExec Improves performance during Insert operations to a PostgresPlus database.

tPostgresPlusRollback Avoids committing part of a transaction involuntarily.

tPostgresPlusRow Acts on the actual DB structure or on the data (although without handling data),
depending on the nature of the query and the database. The SQLBuilder tool helps you
write your SQL statements easily.

PostgreSQL

PostgreSQL components

tPostgresqlInvalidRows Extracts DB rows that do not match a given data quality pattern, so that you
can then implement any required correction.

tPostgresqlValidRows Extracts DB rows that match a given data quality pattern.

tPostgresqlBulkExec Improves performance while carrying out the Insert operations to a Postgresql
database.

tPostgresqlClose Closes the transaction committed in the connected Postgresql database.

tPostgresqlCommit Commits in one go a global transaction, using a unique connection, instead of
doing that on every row or every batch and thus improves performance.

tPostgresqlConnection Opens a connection to the specified database that can then be reused in the
subsequent subJob or subJobs.

tPostgresqlInput Executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a
Main row link.

tPostgresqlOutput Executes the action defined on the table and/or on the data contained in the
table, based on the flow incoming from the preceding component in the job.

tPostgresqlOutputBulk Prepares the file to be used as parameters in the INSERT query to feed the
Postgresql database.

tPostgresqlOutputBulkExec Improves performance during Insert operations to a Postgresql database.

tPostgresqlRollback Avoids committing part of a transaction involuntarily.

tPostgresqlRow Acts on the actual DB structure or on the data (although without handling data),
depending on the nature of the query and the database. The SQLBuilder tool
helps you write your SQL statements easily.

Processing (Integration)
Processing (Integration) components

tCacheIn Offers faster access to the persistent data.

tCacheOut Persists the input RDDs depending on the specific storage level you define
in order to offer faster access to these datasets later.

tExtractDynamicFields Parses a Dynamic column to create standard output columns.

tExtractEDIField Reads the EDI structured data from an EDIFACT message file, generates an
XML according to the EDIFACT family and the EDIFACT type, extracts data
by parsing the generated XML using the XPath queries manually defined or
coming from the Repository wizard, and finally sends the data to the next
component via a Row connection.

tExtractRegexFields Extracts data and generates multiple columns from a formatted string
using regex matching (see the sketch after this list).

tSample Returns a sample subset of the data being processed.

tSqlRow Performs SQL queries over input datasets.

tTop Sorts data and outputs a given number of rows, starting from the first row.

tTopBy Groups and sorts data and outputs a given number of rows, starting from the
first row of each group.

tWindow Applies a given Spark window on the incoming RDDs and sends the
window-based RDDs to its following component.
tWriteAvroFields Transforms the incoming data into Avro files.

tWriteDelimitedFields Converts records into byte arrays.

tWriteDynamicFields Creates a dynamic schema from input columns in the component.

tWritePositionalFields Converts records into byte arrays.

tWriteXMLFields Converts records into byte arrays.

tAggregateRow Receives a flow and aggregates it based on one or more columns.

tAggregateSortedRow Aggregates the sorted input data based on a set of operations. Each output
column is configured with as many rows as required, the operations to be
carried out, and the input column from which the data will be taken, for
better data aggregation.

tConvertType Converts one Talend java type to another automatically, and thus avoids
compilation errors.

tDenormalize Denormalizes the input flow based on one column.

tDenormalizeSortedRow Synthesizes sorted input flow to save memory.

tExternalSortRow Sorts input data based on one or several columns, by sort type and order,
using an external sort application.

tExtractDelimitedFields Generates multiple columns from a delimited string column.

tExtractJSONFields Extracts the desired data from JSON fields based on the JSONPath or XPath
query.

tExtractPositionalFields Extracts data and generates multiple columns from a formatted string
using positional fields.

tExtractXMLField Reads the XML structured data from an XML field and sends the data as
defined in the schema to the following component.

tFilterColumns Homogenizes schemas either by ordering the columns, removing
unwanted columns or adding new columns.

tFilterRow Filters input rows by setting one or more conditions on the selected
columns.

tJoin Performs inner or outer joins between the main data flow and the lookup
flow.

tNormalize Normalizes the input flow following the SQL standard to help improve data
quality and thus ease data updates.

tPartition Allows you to visually define how an input dataset is partitioned.

tReplace Cleanses all files before further processing.


tReplicate Duplicates the incoming schema into two identical output flows.

tSampleRow Selects rows according to a list of single lines and/or a list of groups of
lines.

tSortRow Helps create metrics and classification tables.

tSplitRow Splits one input row into several output rows.

tUniqRow Ensures data quality of input or output flow in a Job.

tUnite Centralizes data from various and heterogeneous sources.

tWriteJSONField Transforms the incoming data into JSON fields and transfers them to a file,
a database table, etc.
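
To make the regex-based extraction concrete, a plain-Java sketch of what
tExtractRegexFields (see above) does with capture groups; the pattern and the
sample line are invented for the example.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexFieldsSketch {
        public static void main(String[] args) {
            // Hypothetical input: "id;name;email" packed into one string field.
            String line = "42;Alice;alice@example.com";
            // Each capture group becomes one output column.
            Pattern pattern = Pattern.compile("(\\d+);([^;]+);([^;]+)");
            Matcher matcher = pattern.matcher(line);
            if (matcher.matches()) {
                System.out.println("id    = " + matcher.group(1));
                System.out.println("name  = " + matcher.group(2));
                System.out.println("email = " + matcher.group(3));
            }
        }
    }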

Processing (Integration) scenarios
Aggregating values based on dynamic schema
Converting java types using Map/Reduce components
Creating a dynamic column and extracting its content
Deduplicating entries based on dynamic schema
Deduplicating entries using Map/Reduce components
Extracting data from an EDIFACT message
Extracting name, domain and TLD from e-mail addresses
Extracting the contents of a dynamic column via tJavaRow
Matching input data against a reference file based on a dynamic column
Normalizing data using Map/Reduce components
Performing download analysis using a Spark Batch Job
Replacing values and filtering columns using Map/Reduce components
Sorting entries based on dynamic schema
Aggregating values and sorting data
Cleaning up and filtering a CSV file
Collecting data from your favorite online social network
Converting java types
Deduplicating entries
Denormalizing on multiple columns
Denormalizing on one column
Doing an exact match on two columns and outputting the main and rejected data
Extracting XML data from a field in a database table
Extracting a delimited string column of a database table
Extracting correct and erroneous data from an XML field in a delimited file
Filtering a list of names through different logical operations
Filtering a list of names using simple conditions
Filtering rows and groups of rows
Iterating on files and merging the content
Normalizing data
Regrouping sorted rows
Replicating a flow and sorting two identical flows respectively
Retrieving error messages while extracting data from JSON fields
Sorting and aggregating the input data
Sorting entries
Splitting one row into two rows
Writing flat data into JSON fields

Properties

Properties components

tFileInputProperties Reads a text file row by row and separates the fields according to the model
key = value.

tFileOutputProperties Writes a configuration file, of the type .ini or .properties, containing text
data organized according to the model key = value.
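
Because both components rely on the key = value model, a java.util.Properties
round trip sketches their behavior; the file name, key and comment are
assumptions.

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.Properties;

    public class PropertiesSketch {
        public static void main(String[] args) throws IOException {
            // Reading, as tFileInputProperties does: one key = value pair per row.
            Properties props = new Properties();
            try (FileInputStream in = new FileInputStream("config.properties")) {
                props.load(in);
            }
            props.setProperty("last.run", "2022-03-17");
            // Writing, as tFileOutputProperties does.
            try (FileOutputStream out = new FileOutputStream("config.properties")) {
                props.store(out, "updated by sketch");
            }
        }
    }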

Properties scenario
Reading and matching the keys and the values of different .properties files and outputting the results in a glossary

Proxy

Proxy component

tSetProxy Sets the relevant information for proxy setup.

RabbitMQ

RabbitMQ components

tRabbitMQClose Closes a connection to a message queue.

tRabbitMQConnection Establishes a connection to a message queue for later use.

tRabbitMQInput Reads messages from a message queue and passes the messages in the
output flow.

tRabbitMQOutput Receives data from the preceding component as messages and adds the
messages to queues in the specified way.
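
A minimal sketch of the publish path with the RabbitMQ Java client; the host,
queue name and message body are assumptions for the example.

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public class RabbitPublishSketch {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost"); // hypothetical broker host
            try (Connection connection = factory.newConnection();
                 Channel channel = connection.createChannel()) {
                // Declare the queue (idempotent) and publish one message,
                // roughly what tRabbitMQOutput does per incoming row.
                channel.queueDeclare("demo-queue", false, false, false, null);
                channel.basicPublish("", "demo-queue", null, "hello".getBytes());
            }
        }
    }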

Raw

Raw components

tFileInputRaw Reads all data in a raw file and sends it to a single output column for
subsequent processing by another component.

tFileOutputRaw Provides data coming from another component, in the form of a single
column of output data.

Regex
Regex components
tFileStreamInputRegex Listens on a given directory for new files, then reads data from these files,
row by row, in order to split the data into fields using regular expressions.

tFileInputRegex Reads a file row by row to split them up into fields using regular
expressions and sends the fields as defined in the schema to the next
component.

Regex scenario
Reading data using a Regex and outputting the result to Positional file

REST

REST component

tREST Serves as a REST Web service client.

REST scenario
Creating and retrieving data by invoking REST Web service

Riak
Riak components

tRiakBucketList (deprecated) Retrieves a list of buckets from a Riak cluster and iterates on it.

tRiakClose (deprecated) Closes an active connection to a Riak cluster so as to release occupied resources.

tRiakConnection (deprecated) Opens a connection to a Riak cluster and allows the reuse of that connection.

tRiakInput (deprecated) Extracts the desired data from a bucket in a Riak node so as to store or apply changes to
the data.

tRiakKeyList (deprecated) Retrieves a list of keys and iterates on it within a Riak bucket for analysis or
development purposes.

tRiakOutput (deprecated) Receives data from the preceding component and writes data into or deletes data from
a bucket in a Riak cluster.

Riak scenario
Exporting data from a Riak bucket to a local file

Route

Route components

tRouteFault Sends messages from a Data Integration Job to a Mediation Route and
marks the messages as faults.

tRouteInput Accepts messages in a Data Integration Job from a Mediation Route.

tRouteOutput Sends messages from a Data Integration Job to a Mediation Route.

Route scenarios
Using shared Data Sources with DB components in Jobs with tRouteInput
Exchanging messages between a Job and a Route

RSS

RSS components

tRSSInput Reads RSS-Feeds using URLs.

tRSSOutput Creates and writes XML files that hold RSS or Atom
feeds.

RSS scenarios
Creating an ATOM feed XML file
Creating an RSS flow and storing files on an FTP server
Creating an RSS flow that contains metadata
Fetching frequently updated blog entries

Salesforce

Salesforce components

tSalesforceBulkExec Bulk-loads data in a given file into a Salesforce object.

tSalesforceConnection Opens a connection to Salesforce.

tSalesforceEinsteinBulkExec Loads data into Salesforce Analytics Cloud from a local file.

tSalesforceEinsteinOutputBulkExec Gains in performance during data operations to the Salesforce Analytics Cloud.

tSalesforceGetDeleted Collects data deleted during a specific period of time from a Salesforce object.

tSalesforceGetServerTimestamp Retrieves the current date of the Salesforce server presented in a timestamp format.

tSalesforceGetUpdated Collects data updated during a specific period of time from a Salesforce object.

tSalesforceInput Retrieves data from a Salesforce object based on a query.

tSalesforceOutput Inserts, updates, upserts, or deletes data in a Salesforce object.

tSalesforceOutputBulk Generates the file to be processed by the tSalesforceBulkExec component for bulk
processing.

tSalesforceOutputBulkExec Bulk-loads data in a given file into a Salesforce object.


Salesforce scenarios
Connecting to Salesforce using OAuth implicit flow to authenticate the user (deprecated)
Inserting bulk data into Salesforce
Recovering deleted data from Salesforce
Upserting Salesforce data based on external IDs

SAP

SAP components

tELTSAPInput Provides the SAP table schema that will be used by the tELTSAPMap component
to generate the SQL SELECT statement.

tELTSAPMap Builds the SQL SELECT statement using the table schema(s) provided by one or
more tELTSAPInput components.

tSAPADSOInput Retrieves data of an active ADSO (Advanced Data Store Object) from an SAP BW
system on an SAP HANA database.

tSAPBapi Extracts data from or loads data to an SAP server using multiple input/output
parameters or the document type parameter.

tSAPBWInput Executes an SQL query with a strictly defined order which must correspond to
your schema definition.

tSAPCommit Commits a global transaction in one go, using a unique connection, instead of
doing that on every row or every batch and thus provides gain in performance.

tSAPConnection Commits a whole Job's data in one go to the SAP system as one transaction.

tSAPDataSourceOutput Writes Data Source objects into an SAP BW Data Source system.

tSAPDataSourceReceiver Retrieves data requests stored on Talend SAP RFC server and related to a specific
Data Source system.

tSAPDSOInput Retrieves DSO data from an SAP BW system.

tSAPDSOOutput Creates or updates DSO data in an SAP BW table.

tSAPHanaBulkExec Improves performance while carrying out the Insert operations to an SAP HANA
database.

tSAPHanaInvalidRows Checks SAP HANA database rows against specific Data Quality patterns (regular
expression) or Data Quality rules (business rule).

tSAPHanaUnload Offloads massive data from the SAP HANA database to a third party system.

tSAPHanaValidRows Checks SAP HANA database rows against specific Data Quality patterns (regular
expression) or Data Quality rules (business rule).

tSAPIDocInput (deprecated) Extracts the IDoc data set that is used for asynchronous transactions between SAP
systems or between an SAP system and another application.

tSAPIDocOutput Uploads an IDoc data set in XML format to an SAP system.

tSAPIDocReceiver Extracts data from SAP IDocs stored on an SAP server.

tSAPInfoCubeInput Retrieves InfoCube data from an SAP BW system.

tSAPInfoObjectInput Retrieves InfoObject data from an SAP BW system.

tSAPInfoObjectOutput Writes InfoObject data into an SAP BW system.

tSAPODPInput Extracts business data from the ERP part of SAP (SAP Business application, SAP on
HANA, SAP R/3, and S4/HANA) through ODP (Operational Data Provisioning).

tSAPRollback Cancels the transaction commit in the connected SAP system.

tSAPTableInput Reads data from an SAP table on an SAP server.

tSAPHanaClose Closes a connection to a SAP HANA database.

tSAPHanaCommit Commits in one go, using a unique connection, a global transaction instead of
doing that on every row or every batch and thus provides gain in performance.

tSAPHanaConnection Establishes an SAP HANA connection to be reused by other SAP HANA components
in your Job.

tSAPHanaInput Executes a database query with a defined command which must correspond to
the schema definition.

tSAPHanaOutput Executes the action defined on the table and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.

tSAPHanaRollback Avoids committing part of a transaction involuntarily.

tSAPHanaRow Acts on the actual database structure or on the data (although without handling
data).

SAP scenarios
Connecting to a given SAP R/3 system to listen for the creation of IDoc files (deprecated)
Consuming Data Source objects using SSL Transport
Consuming IDocs for processing by tHMap
Exporting data using tSAPHanaUnload
Extracting Data using tSAPInfoCubeInput
Reading data from SAP BW database
Retrieving ADSO data from SAP BW
Retrieving data from SAP through ODP
Retrieving data from an SAP system by calling a BAPI function using document type parameters
Retrieving data from an SAP system by calling a BAPI function using multiple input/output parameters
Aggregating and filtering data in multiple SAP tables

SCD

SCD components
tDB2SCD Addresses Slowly Changing Dimension needs, regularly reading a source of
data and logging the changes into a dedicated SCD table.

tGreenplumSCD Addresses Slowly Changing Dimension needs, regularly reading a source of
data and logging the changes into a dedicated SCD table.

tInformixSCD Tracks and shows changes which have been made to Informix SCD dedicated
tables.

tIngresSCD (deprecated) Reflects and tracks changes in a dedicated Ingres SCD table.

tMSSqlSCD Tracks and reflects changes in a dedicated SCD table in a Microsoft SQL Server
or Azure SQL database.

tMysqlSCD Reflects and tracks changes in a dedicated MySQL SCD table.

tNetezzaSCD Reflects and tracks changes in a dedicated Netezza SCD table.

tOracleSCD Reflects and tracks changes in a dedicated Oracle SCD table.

tParAccelSCD (deprecated) Addresses Slowly Changing Dimension needs, regularly reading a source of
data and logging the changes into a dedicated SCD table.

tPostgresPlusSCD Addresses Slowly Changing Dimension needs, regularly reading a source of
data and logging the changes into a dedicated SCD table.

tPostgresqlSCD Addresses Slowly Changing Dimension needs, regularly reading a source of
data and logging the changes into a dedicated SCD table.

tSybaseSCD Addresses Slowly Changing Dimension needs, regularly reading a source of
data and logging the changes into a dedicated SCD table.

tTeradataSCD Addresses Slowly Changing Dimension needs, regularly reading a source of
data and logging the changes into a dedicated SCD table.

tVerticaSCD Tracks and reflects data changes in a dedicated Vertica SCD table.

SCD scenario
Tracking data changes using Slowly Changing Dimensions (type 0 through type 3)

SCDELT
SCDELT components

tDB2SCDELT Addresses Slowly Changing Dimension needs through SQL queries (server-
side processing mode), and logs the changes into a dedicated DB2 SCD
table.

tJDBCSCDELT Tracks data changes in a source database table using the SCD (Slowly
Changing Dimensions) Type 1 method and/or Type 2 method and writes
both the current and historical data into a specified SCD dimension table
(see the sketch after this list).

tMysqlSCDELT Reflects and tracks changes in a dedicated MySQL SCD table through SQL
queries.

tOracleSCDELT Reflects and tracks changes in a dedicated Oracle SCD table through SQL
queries.

tPostgresPlusSCDELT Addresses Slowly Changing Dimension needs through SQL queries (server-
side processing mode), and logs the changes into a dedicated PostgresPlus
SCD table.

tPostgresqlSCDELT Addresses Slowly Changing Dimension needs through SQL queries (server-
side processing mode), and logs the changes into a dedicated Postgresql
SCD table.

tSybaseSCDELT Addresses Slowly Changing Dimension needs through SQL queries (server-
side processing mode), and logs the changes into a dedicated Sybase SCD
table.

tTeradataSCDELT Addresses Slowly Changing Dimension needs through SQL queries (server-
side processing mode), and logs the changes into a dedicated Teradata
SCD table.
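
As a rough illustration of the server-side Type 2 logic these components
generate, the JDBC sketch below closes the current version of a changed record
and inserts the new one. The dim_customer table with valid_from/valid_to columns
and the connection settings are assumptions for the example, not the exact SQL
Talend emits.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class ScdType2Sketch {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/dwh", "user", "secret")) {
                conn.setAutoCommit(false);
                // Step 1: close the currently valid row for the changed key.
                try (PreparedStatement close = conn.prepareStatement(
                        "UPDATE dim_customer SET valid_to = CURRENT_DATE "
                      + "WHERE customer_id = ? AND valid_to IS NULL")) {
                    close.setInt(1, 42);
                    close.executeUpdate();
                }
                // Step 2: insert the new version as the current row.
                try (PreparedStatement insert = conn.prepareStatement(
                        "INSERT INTO dim_customer "
                      + "(customer_id, city, valid_from, valid_to) "
                      + "VALUES (?, ?, CURRENT_DATE, NULL)")) {
                    insert.setInt(1, 42);
                    insert.setString(2, "Paris");
                    insert.executeUpdate();
                }
                conn.commit();
            }
        }
    }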

SCDELT scenarios
Tracking data changes in a PostgreSQL table using the tPostgresqlSCDELT component
Tracking data changes in a Snowflake table using the tJDBCSCDELT component
Tracking data changes using Slowly Changing Dimensions (type 0 through type 3)

SCP

SCP components

tSCPClose Closes an active SCP connection.

tSCPConnection Opens an SCP connection to transfer files in one transaction.

tSCPDelete Removes a file from the defined SCP server.

tSCPFileExists Verifies the existence of a file on the defined SCP server.

tSCPFileList Lists files from the defined SCP server.

tSCPGet Copies files from the defined SCP server.

tSCPPut Copies files to the defined SCP server.

tSCPRename Renames file(s) on the defined SCP server.

tSCPTruncate Removes data from file(s) on the defined SCP server via an SCP
connection.

SCP scenario
Handling a file using SCP

ServiceNow

ServiceNow components

tServiceNowConnection Opens a connection to a ServiceNow instance that can then be reused by
other ServiceNow components.

tServiceNowInput Accesses ServiceNow and retrieves data from it.

tServiceNowOutput Performs the defined action on the data on ServiceNow.

SingleStore
SingleStore components

tSingleStoreBulkExec Loads data from a file into a table of a database connected through JDBC API.

tSingleStoreClose Closes an active SingleStore connection to release the occupied resources.

tSingleStoreCommit Commits in one go a global transaction instead of doing that on every row or every
batch and thus provides gain in performance.

tSingleStoreConnection Opens a connection to the specified database that can then be reused in the
subsequent subJob or subJobs.

tSingleStoreInput Reads any database using a JDBC API connection and extracts fields based on a
query.

tSingleStoreOutput Executes the action defined on the data contained in the table, based on the flow
incoming from the preceding component in the Job.

tSingleStoreOutputBulk Prepares the bulk file to be used as a parameter to feed the database connected.

tSingleStoreOutputBulkExec Provides performance gain when loading data from a file into a table of a database
connected through JDBC API.

tSingleStoreRollback Avoids committing part of a transaction accidentally by canceling the transaction
committed in the connected database.

tSingleStoreRow Acts on the actual DB structure or on the data (although without handling data)
using the SQLBuilder tool to write your SQL statements easily.

tSingleStoreSP Centralizes multiple or complex queries in a database in order to call them easily.

Snowflake
Snowflake components

tSnowflakeConfiguration Stores connection information and credentials to be reused by other Snowflake
components in the Apache Spark Batch framework.

tSnowflakeBulkExec Loads data from files in a folder into a Snowflake table. The folder can be in an
internal Snowflake stage, an Amazon Simple Storage Service (Amazon S3)
bucket, or an Azure container.

tSnowflakeClose Closes an active Snowflake connection to release the occupied resources.

tSnowflakeCommit Commits a global transaction in one go, using a unique connection, and thus
provides gain in performance.

tSnowflakeConnection Opens a connection to Snowflake that can then be reused by other Snowflake
components.

tSnowflakeInput Reads data from a Snowflake table into the data flow of your Job based on an
SQL query.

tSnowflakeOutput Uses the data incoming from its preceding component to insert, update, upsert
or delete data in a Snowflake table.

tSnowflakeOutputBulk Writes incoming data to files generated in a folder. The folder can be in an
internal Snowflake stage, an Amazon Simple Storage Service (Amazon S3)
bucket, or an Azure container.

tSnowflakeOutputBulkExec Writes incoming data to files generated in a folder and then loads the data into a
Snowflake database table. The folder can be in an internal Snowflake stage, an
Amazon Simple Storage Service (Amazon S3) bucket, or an Azure container.

tSnowflakeRollback Cancels the transaction commit in the Snowflake database to avoid committing
part of a transaction involuntarily.

tSnowflakeRow Executes the SQL command stated onto a specified Snowflake database.
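
For instance, the kind of statement tSnowflakeRow can run (here a COPY INTO load
from a stage) looks like this over the Snowflake JDBC driver; the account URL,
credentials, stage and table names are assumptions.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.Properties;

    public class SnowflakeCopySketch {
        public static void main(String[] args) throws SQLException {
            Properties props = new Properties();
            props.put("user", "demo_user"); // hypothetical credentials
            props.put("password", "secret");
            props.put("db", "DEMO_DB");
            props.put("schema", "PUBLIC");
            props.put("warehouse", "DEMO_WH");
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:snowflake://myaccount.snowflakecomputing.com/", props);
                 Statement stmt = conn.createStatement()) {
                // Load staged CSV files into a table, as a tSnowflakeRow query might.
                stmt.execute("COPY INTO customers FROM @my_stage "
                        + "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)");
            }
        }
    }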

Snowflake scenarios
Aggregating Snowflake data using context variables as table and connection names
Loading Data Using COPY Command
Loading data in a Snowflake table using custom stage path
Querying data in a cloud file through a materialized view and a Snowflake external table
Writing data into and reading data from a Snowflake table

SOAP

SOAP component

tSOAP Calls a method via a Web service in order to retrieve the values of the
parameters defined in the component editor.

SOAP scenarios
Fetching the country name information using a Web service
Using a SOAP message from an XML file to get country name information and saving the information to an XML file

Socket

Socket components
tSocketInput Opens the socket port and listens for the incoming data.

tSocketOutput Sends out the data from the incoming flow to a listening socket
port.
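
In plain Java the pair corresponds to a listening ServerSocket and a sending
Socket; the port and payload are assumptions for the example.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class SocketPairSketch {
        public static void main(String[] args) throws IOException {
            // Listening side, as tSocketInput does (hypothetical port 3333).
            try (ServerSocket server = new ServerSocket(3333)) {
                // Sending side, as tSocketOutput does.
                try (Socket sender = new Socket("localhost", 3333);
                     PrintWriter out = new PrintWriter(sender.getOutputStream(), true)) {
                    out.println("one row of data");
                    // The pending connection is queued, so accept() returns it here.
                    try (Socket accepted = server.accept();
                         BufferedReader in = new BufferedReader(
                                 new InputStreamReader(accepted.getInputStream()))) {
                        System.out.println("received: " + in.readLine());
                    }
                }
            }
        }
    }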

Socket scenario
Passing on data to the listening port

Splunk

Splunk component

tSplunkEventCollector Sends the event data to Splunk through Splunk HTTP Event
Collector.

SQLite

SQLite components

tSQLiteClose Closes a transaction committed in the connected DB.

tSQLiteCommit Commits in one go, using a unique connection, a global transaction instead
of doing that on every row or every batch and thus provides gain in
performance.

tSQLiteConnection Opens a connection to the database for a current transaction.

tSQLiteInput Executes a DB query with a defined command which must correspond to
the schema definition. It passes on rows to the next component via a Main
row link.

tSQLiteOutput Executes the action defined on the table and/or on the data contained in
the table, based on the flow incoming from the preceding component in
the job.

tSQLiteRollback Cancels the transaction committed in the SQLite database.

tSQLiteRow Executes the defined query onto the specified database and uses the
parameters bound with the column.

SQLite scenarios
Filtering SQLite data
Updating SQLite rows

SQLTemplate

SQLTemplate components
tSQLTemplate Executes the common database actions or customized SQL statement templates,
for example to drop/create a table.

tSQLTemplateAggregate Provides a set of metrics based on values or calculations.

tSQLTemplateCommit Commits a global action in one go using a single connection, instead of doing so
for every row or every batch of rows separately. This provides a gain in
performance.

tSQLTemplateFilterColumns Homogenizes schemas by reorganizing, deleting or adding new columns.

tSQLTemplateFilterRows Sets row filters for any given data source, based on a WHERE clause.

tSQLTemplateMerge Merges data into a database table directly on the DBMS by creating and executing
a MERGE statement (see the sketch after this list).

tSQLTemplateRollback Cancels the transaction committed in the SQLTemplate database.
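
To make the MERGE behavior concrete, a JDBC sketch that executes an ANSI-style
MERGE statement directly on the DBMS; the connection URL, table and column names
are assumptions, not the template Talend generates.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class MergeSketch {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//localhost:1521/demo", "user", "secret");
                 Statement stmt = conn.createStatement()) {
                // Update matching rows, insert the rest -- all server-side,
                // which is the point of tSQLTemplateMerge.
                stmt.executeUpdate(
                        "MERGE INTO customers t "
                      + "USING staging_customers s ON (t.id = s.id) "
                      + "WHEN MATCHED THEN UPDATE SET t.city = s.city "
                      + "WHEN NOT MATCHED THEN INSERT (id, city) VALUES (s.id, s.city)");
            }
        }
    }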

SQLTemplate scenarios
Filtering and aggregating table columns directly on the DBMS
Merging data directly on the DBMS

Sqoop
Sqoop components

tSqoopExport Defines the arguments required by Sqoop for transferring data to a RDBMS.

tSqoopImport Defines the arguments required by Sqoop for writing the data of your
interest into HDFS.

tSqoopImportAllTables Defines the arguments required by Sqoop for writing all of the tables of a
database into HDFS.

tSqoopMerge Performs an incremental import that updates an older dataset with newer
records. The file types of the newer and the older datasets must be the
same.

Sqoop scenarios
Importing a MySQL table to HDFS
Merging two datasets in HDFS

SVNLog
SVNLog component

tSVNLogInput Retrieves the information of a specified revision or range of revisions from
an SVN repository.

SVNLog scenario
Retrieving a log message from an SVN repository

Sybase

Sybase components

tSybaseBulkExec Gains in performance during Insert operations to a Sybase database.

tSybaseClose Closes a transaction committed in the connected database.

tSybaseCommit Commits in one go, using a unique connection, a global transaction instead
of doing that on every row or every batch and thus provides gain in
performance.

tSybaseConnection Opens a connection to the database for a current transaction.

tSybaseInput Executes a DB query with a strictly defined order which must correspond to
the schema definition.

tSybaseIQBulkExec Loads data into a Sybase IQ database table from a flat file or other database
table.

tSybaseIQOutputBulkExec Gains in performance during Insert operations to a Sybase IQ database.

tSybaseOutput Executes the action defined on the table and/or on the data contained in the
table, based on the flow incoming from the preceding component in the job.

tSybaseOutputBulk Prepares the file to be used as parameter in the INSERT query to feed the
Sybase database.

tSybaseOutputBulkExec Gains in performance during Insert operations to a Sybase database.

tSybaseRollback Cancels the transaction committed in the Sybase database.

tSybaseRow Acts on the actual DB structure or on the data (although without handling
data).

tSybaseSP Calls a Sybase database stored procedure.

Sybase scenario
Bulk-loading data to a Sybase IQ 12 database

System
System components

tRunJob Manages complex Job systems which need to execute one Job after
another.

tSetEnv Adds variables temporarily to the system environment during the execution of
a Job.

tSSH Establishes communication with a distant server and securely returns
sensitive information.

tSystem Calls other system processing commands, already up and running in a
larger Job.
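
What tSystem does maps naturally to java.lang.ProcessBuilder, and tSetEnv to its
per-process environment map; the command and variable below are assumptions.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    public class SystemCallSketch {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Hypothetical command; tSystem runs whatever command you configure.
            ProcessBuilder pb = new ProcessBuilder("ls", "-l");
            // A variable added temporarily for this process only,
            // similar in spirit to tSetEnv.
            pb.environment().put("JOB_CONTEXT", "demo");
            pb.redirectErrorStream(true);
            Process process = pb.start();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
            System.out.println("exit code: " + process.waitFor());
        }
    }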

System scenarios
Calling a Job and passing the parameter needed to the called Job
Displaying remote system information via SSH
Echoing 'Hello World!'
Modifying a variable during a Job execution
Passing a value from a parent Job to a child Job
Propagating the buffered output data from the child Job to the parent Job
Running a list of child Jobs dynamically

Tachyon

Tachyon component

tTachyonConfiguration Defines a connection to Tachyon storage system and enables the reuse of
the configuration in the same Job.

tAddLocationFromIP
tAddLocationFromIP component

tAddLocationFromIP Replaces IP addresses with geographical locations.

tAddLocationFromIP scenario
Identifying a real-world geographic location of an IP

Talend Cloud

Talend Cloud components

tJobFailure Throws an exception and prompts a message when an error occurs.

tJobLog Collects and shows exception data during the execution of the Job in
Talend Studio or the task in Talend Cloud Management Console.

tJobReject Receives data rejected after task processing.

tChangeFileEncoding
tChangeFileEncoding component

tChangeFileEncoding Transforms the character encoding of a given file and generates a new file
with the transformed character encoding.

tChangeFileEncoding scenario
Transforming the character encoding of a file
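
A minimal sketch of the transformation tChangeFileEncoding performs, assuming a hypothetical ISO-8859-1 source file converted to UTF-8:

    import java.io.IOException;
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ChangeEncodingSketch {
        public static void main(String[] args) throws IOException {
            // Hypothetical paths; the source file is assumed to be ISO-8859-1.
            Path in  = Paths.get("input-latin1.txt");
            Path out = Paths.get("output-utf8.txt");
            // Read with the old charset, write with the new one.
            String content = new String(Files.readAllBytes(in), Charset.forName("ISO-8859-1"));
            Files.write(out, content.getBytes(StandardCharsets.UTF_8));
        }
    }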

tCreateTemporaryFile
tCreateTemporaryFile component

tCreateTemporaryFile Creates a temporary file in a specified directory. This component allows
you to either keep the temporary file or delete it after the Job execution.

tCreateTemporaryFile scenario
Creating a temporary file and writing data into it
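
The same behavior can be sketched with the standard java.nio API; the directory and file prefix below are invented for illustration:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class TempFileSketch {
        public static void main(String[] args) throws IOException {
            // Hypothetical target directory for the temporary file.
            Path dir = Paths.get("/tmp/talend-work");
            Files.createDirectories(dir);
            Path tmp = Files.createTempFile(dir, "job-", ".tmp");
            Files.write(tmp, "some intermediate data".getBytes());
            // Keep or delete after the run, as tCreateTemporaryFile lets you choose:
            tmp.toFile().deleteOnExit();
            System.out.println("Created " + tmp);
        }
    }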

Technical

Technical components

tBoundedStreamInput Provides a data stream for the component to be tested and is suitable for
use in a test case only.

tCollectAndCheck Shows and validates the result of a component test.

tHashInput Reads from the cache memory data loaded by tHashOutput to offer high-speed
data feed, facilitating transactions involving a large amount of data.

tHashOutput Loads data to the cache memory to offer high-speed access, facilitating
transactions involving a large amount of data.

Technical scenarios
Clearing the memory before loading data to it in case an iterator exists in the same subJob
Reading data from the cache memory for high-speed data access

Teradata

Teradata components

tTeradataConfiguration Defines a connection to Teradata and enables the reuse of the connection
configuration in the same Job.

tTeradataLookupInput Executes a database query with a strictly defined order which must
correspond to the schema definition.

tTeradataClose Closes the transaction committed in the connected DB.

tTeradataCommit Commits a global transaction in one go, using a unique connection, instead of
committing on every row or batch, and thus improves performance.

tTeradataConnection Opens a connection to the specified database that can then be reused in
the subsequent subJob or subJobs.

tTeradataFastExport Exports data batches from a Teradata table to a customer system or to a
smaller database.

tTeradataFastLoad Executes a database query according to a strict order which must be the
same as the one in the schema.

tTeradataFastLoadUtility Executes a database query according to a strict order which must be the
same as the one in the schema.

tTeradataInput Executes a DB query with a strictly defined order which must correspond to
the schema definition.

tTeradataMultiLoad Executes a database query according to a strict order which must be the
same as the one in the schema.

tTeradataOutput Executes the action defined on the table and/or on the data contained in
the table, based on the flow incoming from the preceding component in
the job.

tTeradataRollback Cancels the transaction commit in the Teradata database.

tTeradataRow Acts on the actual DB structure or on the data (although without handling
data).

tTeradataTPTExec Offers high performance in inserting data from an existing file to a table in
a Teradata database.

tTeradataTPTUtility Writes the incoming data to a file and then loads the data from the file to a
Teradata database.

tTeradataTPump Inserts, updates, or deletes data in the Teradata database with the TPump
loading utility, which enables near-real-time data in the data
warehouse.

Teradata scenarios
Inserting data into a Teradata database table
Loading data into a Teradata database
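
As a hedged illustration of the bulk-loading utilities above: the Teradata JDBC driver can be asked to use the FastLoad protocol for large batch inserts via the TYPE=FASTLOAD connection parameter. The host, database, table, and row values below are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class TeradataFastLoadSketch {
        public static void main(String[] args) throws Exception {
            // TYPE=FASTLOAD asks the driver to use the FastLoad protocol
            // for batch inserts; all names here are hypothetical.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:teradata://tdhost/DATABASE=sales,TYPE=FASTLOAD",
                    "user", "password")) {
                conn.setAutoCommit(false);      // FastLoad batches are committed manually
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO customers (id, name) VALUES (?, ?)")) {
                    for (int i = 1; i <= 1000; i++) {
                        ps.setInt(1, i);
                        ps.setString(2, "name-" + i);
                        ps.addBatch();
                    }
                    ps.executeBatch();
                }
                conn.commit();
            }
        }
    }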

tFileCompare

tFileCompare component

tFileCompare Compares two files and provides comparison data based on a read-only
schema.

tFileCompare scenario
Comparing unzipped files
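
A minimal sketch of a byte-level comparison like the one tFileCompare performs, using Files.mismatch (Java 12 or later); both file names are hypothetical:

    import java.nio.file.Files;
    import java.nio.file.Path;

    public class FileCompareSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical paths to the two files to compare.
            Path a = Path.of("expected.csv");
            Path b = Path.of("actual.csv");
            long firstDiff = Files.mismatch(a, b);   // -1 means the files are identical
            System.out.println(firstDiff == -1
                ? "Files match"
                : "Files differ at byte " + firstDiff);
        }
    }
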
tFileCopy

tFileCopy component

tFileCopy Copies a source file or folder into a target directory.

tFileCopy scenario
Moving/copying/renaming files in batch

tFileDelete

tFileDelete component

tFileDelete Deletes files from a given directory.

tFileDelete scenario
Deleting files

tFileExist
tFileExist component

tFileExist Checks if a file exists or not.

tFileExist scenario
Checking for the presence of a file and creating it if it does not exist

tFileList

tFileList component

tFileList Iterates a set of files or folders in a given directory based on a filemask
pattern.

tFileList scenarios
Finding duplicate files between two folders
Iterating on a file directory
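
A minimal sketch of a filemask-driven iteration like tFileList's, using a glob pattern with java.nio; the directory and mask are placeholders:

    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class FileListSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical directory; "*.csv" plays the role of the filemask.
            Path dir = Paths.get("/data/incoming");
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.csv")) {
                for (Path file : stream) {
                    System.out.println(file.getFileName()); // one iteration per match
                }
            }
        }
    }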

tFileProperties

tFileProperties component

tFileProperties Creates a single row flow that displays the main properties of the
processed file.

tFileProperties scenario
Displaying the properties of a processed file

tFileRowCount

tFileRowCount component

tFileRowCount Opens a file and reads it row by row in order to determine the number of
rows inside.

tFileRowCount scenario
Writing a file to MySQL if the number of its records matches a reference value
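
Counting rows the way tFileRowCount does can be sketched in a few lines of standard Java; the file name is a placeholder:

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    public class RowCountSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical file; each line counts as one row.
            try (Stream<String> lines = Files.lines(Paths.get("orders.csv"))) {
                long rows = lines.count();
                System.out.println("rows = " + rows);
            }
        }
    }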

tFileTouch
tFileTouch component

tFileTouch Creates an empty file or, if the specified file already exists, updates its date
of modification and of last access while keeping the contents unchanged.

tFixedFlowInput

tFixedFlowInput component

tFixedFlowInput Generates a fixed flow from internal variables.

tFixedFlowInput scenario
Buffering output data on the webapp server

tMap
tMap component

tMap Transforms and routes data from single or multiple sources to single or
multiple destinations.

tMap scenarios
Advanced mapping with lookup reload at each row
Converting a UNIX timestamp to a readable date
Mapping data using a filter and a simple explicit join
Mapping with join output tables
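
tMap itself is configured graphically, but the lookup join and filter it describes boil down to logic of the following shape; the flows, field names, and filter condition are invented for illustration.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class MapJoinSketch {
        public static void main(String[] args) {
            // Hypothetical lookup flow: customer id -> country.
            Map<Integer, String> lookup = new HashMap<>();
            lookup.put(1, "France");
            lookup.put(2, "Germany");

            // Hypothetical main flow: "id;amount" rows.
            List<String> mainFlow = List.of("1;250", "2;80", "3;410");

            for (String row : mainFlow) {
                String[] fields = row.split(";");
                int id = Integer.parseInt(fields[0]);
                int amount = Integer.parseInt(fields[1]);
                String country = lookup.get(id);       // the join
                if (country != null && amount > 100) { // inner join + filter
                    System.out.println(id + ";" + country + ";" + amount);
                }
            }
        }
    }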

tMemorizeRows

tMemorizeRows component

tMemorizeRows Memorizes a sequence of rows that passes through and allows the
following component(s) to perform operations of your choice on the
memorized rows.

tMemorizeRows scenario
Retrieving the different ages and lowest age data

tMsgBox
tMsgBox component

tMsgBox Opens a dialog box with an OK button requiring action from the
user.

tMsgBox scenario
'Hello world!' type test

tRowGenerator

tRowGenerator component

tRowGenerator Creates an input flow in a Job for testing purposes, in particular for
boundary test sets.

tRowGenerator scenario
Generating random Java data

tServerAlive
tServerAlive component

tServerAlive Validates the status of the connection to a specified host.

tServerAlive scenario
Validating the status of the connection to a remote host
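
A minimal sketch of the check tServerAlive performs: attempt a TCP connection with a timeout and report the outcome. The host and port are placeholders.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class ServerAliveSketch {
        public static void main(String[] args) {
            // Hypothetical host and port; a completed TCP connect counts as "alive".
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress("dbhost.example.com", 1521), 3000);
                System.out.println("Server is reachable");
            } catch (IOException e) {
                System.out.println("Server is not reachable: " + e.getMessage());
            }
        }
    }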

tSocketTextStreamInput

tSocketTextStreamInput component

tSocketTextStreamInput Creates a textual input stream by connecting to a network.

tXMLMap

tXMLMap component

tXMLMap Transforms and routes data from single or multiple sources to single or
multiple destinations.

tXMLMap scenarios
Mapping and transforming XML data
Restructuring products data using multiple loop elements

VectorWise
VectorWise components

tVectorWiseCommit (deprecated) Commits a global transaction in one go using a single connection instead of
doing so on every row or every batch. This provides a gain in performance.

tVectorWiseConnection (deprecated) Opens a connection to the specified database that can then be reused in the
subsequent subJob or subJobs.

tVectorWiseInput (deprecated) Executes a DB query with a strictly defined order which must correspond to the
schema definition.

tVectorWiseOutput (deprecated) Executes the action defined on the table and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.

tVectorWiseRollback (deprecated) Cancels transactions committed to the VectorWise database.

tVectorWiseRow (deprecated) Acts on the actual DB structure or on the data (although without handling data).

Vertica

Vertica components

tVerticaBulkExec Loads data into a Vertica database table from a local file using the Vertica
COPY SQL statement.

tVerticaClose Closes an active connection to a Vertica database.

tVerticaCommit Commits a global transaction in one go, using a unique connection, instead of
committing on every row or batch, and thus improves performance.

tVerticaConnection Opens a connection to the specified database that can then be reused in
the subsequent subJob or subJobs.

tVerticaInput Retrieves data from a Vertica database table based on a SQL query.

tVerticaOutput Inserts, updates, deletes, or copies data from an incoming flow into a
Vertica database table.

tVerticaOutputBulk Prepares a file to be used by the tVerticaBulkExec component to feed a
Vertica database.

tVerticaOutputBulkExec Receives data from a preceding component, writes data into a local file,
and loads data into a Vertica database from the file using the Vertica COPY
SQL statement.

tVerticaRollback Cancels the transaction commit in the Vertica database.

tVerticaRow Executes a Vertica SQL statement against a database table.

VtigerCRM

VtigerCRM components

tVtigerCRMInput (deprecated) Extracts data from a module of a VtigerCRM database.

tVtigerCRMOutput (deprecated) Writes data into a module of a VtigerCRM database.

Webservice
Webservice components

tRestWebServiceLookupInput Retrieves messages from a REpresentational State Transfer (REST) Web service
provider and gets responses accordingly.

tRestWebServiceOutput Serves as a REpresentational State Transfer (REST) Web service client that
continuously sends HTTP requests to a REST Web service provider in real time and
gets the responses.

tWebService Calls a method via a Web service in order to retrieve the values of the parameters
defined in the component editor.

tWebServiceInput Invokes a method through a Web service.

Webservice scenarios
Getting the sum of two numbers using tWebServiceInput
Getting country names using tWebService

Workday
Workday component

tWorkdayInput Retrieves data of a Workday client based on a query or the Workday client
report.

XML

XML components

tFileStreamInputXML Opens a structured XML file and reads it row by row to split the data into
fields, then sends these fields as defined in the Schema to the next
component.

tEDIFACTtoXML Transforms an EDIFACT message file into the XML format for better
readability to users and compatibility with processing tools.

tExtractXMLField Reads the XML structured data from an XML field and sends the data as
defined in the schema to the following component.

tWriteXMLField Reads an input XML file and extracts the structure to insert it in defined
fields of the output XML file.

tXSLT Transforms one data structure into another using an XSL stylesheet.

XML scenarios
Extracting XML data from a field in a database table
Extracting correct and erroneous data from an XML field in a delimited file
Extracting the structure of an XML file and inserting it into the fields of a database table
Reading an EDIFACT message file and saving it to XML
Transforming XML into HTML using an XSL stylesheet
Transforming stream into HTML using an XSL stylesheet
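
The XSLT transformations in the scenarios above can be reproduced with the standard JAXP API; the three file names below are hypothetical:

    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public class XsltSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical file names: products.xml styled by products.xsl.
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource("products.xsl"));
            transformer.transform(new StreamSource("products.xml"),
                                  new StreamResult("products.html"));
        }
    }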

XML connectors

XML connectors components

tAdvancedFileOutputXML Writes an XML file with separated data values according to an XML tree
structure.

tFileInputXML Reads an XML structured file row by row to split it up into fields and
sends the fields as defined in the schema to the next component.

tFileOutputXML Writes an XML file with separated data values according to a defined
schema.

XML connectors scenarios
The Append the source XML file feature
Creating an XML file using a loop
Extracting erroneous XML data via a reject flow
Reading and extracting data from an XML structure

XML validation

XML validation components

tDTDValidator Helps control the data and structure quality of the file to be processed.

tXSDValidator Helps control the data and structure quality of the file or flow to be
processed.

XML validation scenarios
Validating XML files
Validating data flows against an XSD file
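
A minimal sketch of XSD validation as tXSDValidator performs it, using the standard javax.xml.validation API; the XML and XSD file names are placeholders:

    import javax.xml.XMLConstants;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;
    import org.xml.sax.SAXException;

    public class XsdValidateSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical file names: invoice.xml checked against invoice.xsd.
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource("invoice.xsd"));
            Validator validator = schema.newValidator();
            try {
                validator.validate(new StreamSource("invoice.xml"));
                System.out.println("Valid");
            } catch (SAXException e) {
                System.out.println("Invalid: " + e.getMessage());
            }
        }
    }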

XMLRPC
XMLRPC component

tXMLRPCInput Invokes a method through a Web service for the described purpose.

XMLRPC scenario
Guessing the State name from an XMLRPC
