Google Vision PDF/TIFF Text Extraction: User Guide


Google Vision

PDF/TIFF Text Extraction


USER GUIDE
Version: 6.4
Document Revision: 1.0

For more information please contact:


[email protected] | UK: +44 (0) 870 879 3000 | US: +1 888 757 7476
www.blueprism.com
Contents
1. Introduction
2. Solution Overview and Configuration
2.1. Limitations
3. Pre-Requisites and Environment Configuration
3.1. Google Cloud Services Prerequisites
3.2. Blue Prism Configuration
4. Using the Skill
4.1. Common Parameters
4.2. PDF Document Text Detection
4.3. TIFF Text Detection
4.4. Operation Status
5. Support
6. Functional Tests
7. Troubleshooting Guidelines
8. Frequently Asked Questions

The information contained in this document is the proprietary and confidential information of Blue Prism Limited and should not be
disclosed to a third party without the written consent of an authorised Blue Prism representative. No part of this document may be
reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying without the written
permission of Blue Prism Limited.
© Blue Prism Limited, 2001 – 2018
®Blue Prism is a registered trademark of Blue Prism Limited
All trademarks are hereby acknowledged and are used to the benefit of their respective owners.
Blue Prism is not responsible for the content of external websites referenced by this document.
Blue Prism Limited, Centrix House, Crow Lane East, Newton-le-Willows, WA12 9UY, United Kingdom
Registered in England: Reg. No. 4260035. Tel: +44 870 879 3000. Web: www.blueprism.com

Commercial in Confidence Page 2 of 8


1. Introduction
As the market for RPA grows, so does interest in what RPA can do and how easily it can integrate with other
ecosystems. With the arrival of Artificial Intelligence in the marketplace, interest has grown in capabilities
that integrate with pre-trained AI services in the cloud.

This document focuses on the design of the integration between Blue Prism and Google’s Vision Cognitive Service.
Google provides this service in the form of web services, which are consumed via RESTful APIs. The pros and cons of
this package are outside the scope of this document.

2. Solution Overview and Configuration


The basic design of the Google Vision Skill is to encapsulate the different AI Cognitive services offered by Google.
These integrations can be used as an easy bridge to connect the client’s processes to the different AI services
developed by Google.

Blue Prism’s Google Vision Skill interacts with the Google Cognitive Services by using Blue Prism to construct a
REST call. The response is then handled by Blue Prism and converted into easy-to-use outputs, such as Text,
Numbers, or Collections.

All of Google’s services require a service account, which is given to each party as part of their contract with Google’s
authentication server. When registering with Google’s Cloud Platform, you can create service accounts which are
restricted based upon API services. These service accounts are part of the OAuth 2.0 authentication layer, which
allows you to call the APIs seamlessly; Blue Prism handles the whole flow for you. Once a service account
has been saved inside Blue Prism, the basic data flow of an API call is as follows:

2.1. Limitations
The following limitations should be understood before attempting to use these integrations:

• The customer or partner is responsible for the configuration and maintenance of the relevant cloud
subscriptions and services. Blue Prism cannot provide any support on the configuration of the cloud
environment itself.
• Use of the APIs may incur additional costs, depending on usage.
• There is always a possibility with external services that the APIs will change. This Skill is provided as-is
without warranties, and support is provided by Blue Prism on a best endeavours basis and is not subject to
formal SLAs.
• The Vision API accepts PDF/TIFF files up to 2000 pages. Larger files will return an error.
• API keys are not supported for asyncBatchAnnotate requests.

• The account used for authentication must have access to the Cloud Storage bucket that you specify for the
output (roles/editor or roles/storage.objectCreator or above).

3. Pre-Requisites and Environment Configuration


This section outlines the pre-requisites that are required to use the integrations. Note that Blue Prism is not able to
provide any support in configuring the Google Cloud Services themselves.

3.1. Google Cloud Services Prerequisites


To implement the Google Cognitive Services integration, the following are required:
• A subscription to Google Cloud Platform
• The Vision API enabled
• A service account with access to the Vision API
• The ability to make POST requests to the Vision API, which is how PDF/TIFF document text detection is performed

3.2. Blue Prism Configuration


Before importing the Skill, which is downloaded from the Digital Exchange, the following information must be
obtained:

1. Service Account with access to the Vision API

The outlined requirements are explained in the next subsection.

If any conflict or overwrite messages appear during import, then please refer to the Release Manager section in the
Product Help.

3.2.1. Credentials
An individual credential, defined and stored in Credential Manager, will hold the Service Account information
needed to form an OAuth 2.0 request which responds with a bearer token. Each action has a common parameter
named “OAuth 2 (JWT Bearer Token) Authentication Credential Name” required to authenticate against the Google
Vision API.
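
The exchange described above follows Google's OAuth 2.0 service-account (JWT bearer) flow. The following Python sketch is illustrative only and is not Blue Prism's implementation: it shows the claim set and the token request such a flow typically uses. The scope value and the helper names are assumptions; the RS256 signing step, which needs the private key and a cryptography library, is deliberately left out.

```python
import base64
import json

# Google's OAuth 2.0 token endpoint, also used as the JWT audience.
TOKEN_URL = "https://oauth2.googleapis.com/token"
# Assumed scope; the scope configured by the Skill is not stated in this guide.
SCOPE = "https://www.googleapis.com/auth/cloud-vision"


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_segment(segment: str) -> dict:
    """Decode one unpadded base64url JWT segment back into a dict."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def build_signing_input(issuer: str, now: int, lifetime: int = 3600) -> str:
    """Return '<header>.<claims>', the text that must then be signed with RS256."""
    header = {"alg": "RS256", "typ": "JWT"}
    claims = {
        "iss": issuer,          # service account email (the credential's issuer)
        "scope": SCOPE,
        "aud": TOKEN_URL,
        "iat": now,             # issued-at, seconds since the epoch
        "exp": now + lifetime,  # expiry, at most one hour after iat
    }
    encode = lambda obj: b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))
    return encode(header) + "." + encode(claims)


def build_token_request(signed_jwt: str) -> dict:
    """Form fields POSTed to TOKEN_URL; the reply carries the bearer token."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion": signed_jwt,
    }
```

The signed JWT is the signing input plus a third segment holding the base64url-encoded signature. Blue Prism performs all of these steps internally once the credential is configured.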

This section describes how to create one of these credentials, which is then used to authenticate with Google on
each call. The example sets up the credential for the Google Vision Skill. To start, navigate to the
Security – Credentials tab within the System menu of Blue Prism. Then click the “New” button to the right of the
view. A new window will open (see figure 6.2.1.A below). Make sure you select the “Type” as “OAuth 2.0 (JWT
Bearer Token)”.

The name of the credential can be anything you wish, but labelling it with respect to the API is recommended, for
example “Google Vision API Credential”. You could also restrict the credential by robot, but that side of the
configuration is down to you. The issuer is the email address listed under Service Accounts in the IAM section of
Google Cloud Platform. It is also listed in the .json file that was downloaded to your local machine when you
originally created the Service Account. Finally, the private key is the private key listed in that same .json file.
When copying and pasting the private key, you must include the following markers:
• -----BEGIN PRIVATE KEY-----\n
• \n-----END PRIVATE KEY-----\n

Save the credential and make note of the name, as this will need to be used as a parameter for each action listed in
this document. The Google Vision Skill has now been correctly configured.
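
As a convenience, the two values can be read from the downloaded key file with a short script. This is a minimal Python sketch, not part of the Skill. It relies on the "client_email" and "private_key" fields that Google service account key files contain; the function name and the returned dictionary keys are hypothetical.

```python
import json

PEM_HEADER = "-----BEGIN PRIVATE KEY-----"
PEM_FOOTER = "-----END PRIVATE KEY-----"


def credential_fields(key_file_text: str) -> dict:
    """Extract the issuer and private key from service account key file text."""
    data = json.loads(key_file_text)
    issuer = data["client_email"]
    private_key = data["private_key"]

    # Guard against a truncated copy: both PEM markers must be present.
    stripped = private_key.strip()
    if not (stripped.startswith(PEM_HEADER) and stripped.endswith(PEM_FOOTER)):
        raise ValueError("private key is missing its BEGIN/END PRIVATE KEY markers")

    return {
        "issuer": issuer,
        # The key with real line breaks, as parsed from the JSON.
        "private_key": private_key,
        # The key as it appears inside the file, with literal \n sequences.
        "private_key_escaped": json.dumps(private_key)[1:-1],
    }
```

The escaped form matches the markers listed above, which include the literal \n sequences found in the file.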

4. Using the Skill
The following section outlines the individual configuration and usage of the action in the Google Vision Skill.
1. PDF/TIFF Document Text Detection

4.1. Common Authentication Parameter


4.1.1. Inputs
Parameter | Type | Description
OAuth 2 (JWT Bearer Token) Authentication Credential Name | Text | The name of the credential which has the OAuth 2.0 information used for authentication with Google

4.2. PDF Document Text Detection


4.2.1. Request
Parameter | Direction | Type | Description
GCS PDF File Path | In | Text | File name and location (bucket name) in Google Cloud Storage
GCS Output Folder | In | Text | Output folder in Google Cloud Storage
Batch Size | In | Number | Specifies how many pages of output should be included in each output JSON file
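
For orientation, the three inputs correspond to fields of the public Vision API files:asyncBatchAnnotate request. The following Python sketch builds such a request body. It is an illustration of the public API shape, not the request the Skill constructs internally; the function name is hypothetical, and full gs:// URIs are assumed for both paths.

```python
# Public Vision API endpoint for asynchronous PDF/TIFF annotation.
ENDPOINT = "https://vision.googleapis.com/v1/files:asyncBatchAnnotate"


def build_request(gcs_file_path: str, gcs_output_folder: str, batch_size: int,
                  mime_type: str = "application/pdf") -> dict:
    """Build the JSON body that is POSTed to ENDPOINT."""
    return {
        "requests": [
            {
                # "GCS PDF File Path": the source document in Cloud Storage.
                "inputConfig": {
                    "gcsSource": {"uri": gcs_file_path},
                    "mimeType": mime_type,  # "image/tiff" for TIFF input
                },
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                # "GCS Output Folder" and "Batch Size".
                "outputConfig": {
                    "gcsDestination": {"uri": gcs_output_folder},
                    "batchSize": batch_size,
                },
            }
        ]
    }
```

For example, build_request("gs://my-bucket/in/doc.pdf", "gs://my-bucket/out/", 2) describes a job that writes two pages of results per output file; the bucket and file names here are placeholders.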

4.2.2. Response
Parameter | Direction | Type | Description
Response Content | Out | Text | Contains the operation code, which can be used to query the status of the operation
HTTP Code Status | Out | Text | Not expected to be required except for debugging
Response Headers | Out | Collection | Not expected to be required except for debugging
Operation Code | Out | Text | Operation code retrieved from the response to the PDF document text extraction request. This value is used to call the GCS API to retrieve the status of the operation
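
The public Vision API answers an asynchronous request with a small JSON object whose "name" field identifies the long-running operation. The Python sketch below shows one way to pull an identifier out of such a response. It assumes that the Skill's Operation Code corresponds to the last path segment of "name", which this guide does not confirm; the function name is hypothetical.

```python
import json


def operation_code(response_content: str) -> str:
    """Return the last path segment of the operation name in the response."""
    name = json.loads(response_content)["name"]
    # Names look like "operations/<id>" or "projects/<project>/operations/<id>".
    return name.rsplit("/", 1)[-1]
```

In normal use the Skill returns the Operation Code directly, so parsing Response Content is only needed for debugging.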

4.3. TIFF Text Detection
4.3.1. Request
Parameter | Direction | Type | Description
GCS TIFF File Path | In | Text | File name and location (bucket name) in Google Cloud Storage
GCS Output Folder | In | Text | Output folder in Google Cloud Storage
Batch Size | In | Number | Specifies how many pages of output should be included in each output JSON file
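
The effect of Batch Size can be worked out with simple arithmetic: a document of N pages with batch size B produces ceil(N / B) output files. The Python sketch below lists the page range each output file would cover, assuming pages are grouped in order starting from page 1; the function name is hypothetical.

```python
def output_page_ranges(total_pages: int, batch_size: int) -> list:
    """Return (first_page, last_page) for each output JSON file."""
    if total_pages < 1 or batch_size < 1:
        raise ValueError("total_pages and batch_size must both be at least 1")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]
```

For a 5-page TIFF and a Batch Size of 2, this gives three output files covering pages 1 to 2, 3 to 4, and 5.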

4.3.2. Response
Parameter | Direction | Type | Description
Response Content | Out | Text | Contains the operation code, which can be used to query the status of the operation
HTTP Code Status | Out | Text | Not expected to be required except for debugging
Response Headers | Out | Collection | Not expected to be required except for debugging
Operation Code | Out | Text | Operation code retrieved from the response to the TIFF document text extraction request. This value is used to call the GCS API to retrieve the status of the operation

4.4. Operation Status


4.4.1. Request
Parameter | Direction | Type | Description
Operation Code | In | Text | Operation code retrieved from the PDF/TIFF document text extraction request

4.4.2. Response
Parameter | Direction | Type | Description
Response Content | Out | Text | This JSON output contains the state of the API operation. State includes text like “RUNNING”, “DONE”, etc. Query this JSON output to retrieve the state value
HTTP Code Status | Out | Text | Not expected to be required except for debugging
Response Headers | Out | Collection | Not expected to be required except for debugging
State | Out | Text | Returns the operation status (“RUNNING”, “DONE”) of the document submitted for processing
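
Because processing is asynchronous, a process typically calls Operation Status repeatedly until the state is “DONE”. The following Python sketch illustrates that polling pattern. It assumes the operation JSON carries the state under "metadata" and "state", as the public Vision API does; fetch_status is a hypothetical stand-in for whatever returns the Response Content text, and the interval and attempt limit are arbitrary.

```python
import json
import time
from typing import Callable


def read_state(response_content: str) -> str:
    """Read the state value from the operation JSON, or "UNKNOWN" if absent."""
    operation = json.loads(response_content)
    return operation.get("metadata", {}).get("state", "UNKNOWN")


def wait_until_done(fetch_status: Callable[[], str],
                    interval_seconds: float = 5.0,
                    max_attempts: int = 60,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the state is "DONE"; give up after max_attempts polls."""
    for _ in range(max_attempts):
        state = read_state(fetch_status())
        if state == "DONE":
            return state
        sleep(interval_seconds)  # wait before asking again
    raise TimeoutError("operation did not reach DONE within the attempt limit")
```

Bounding the number of attempts keeps a stuck or failed operation from holding the process indefinitely.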

5. Support
Support for these skills is provided by Blue Prism on a best endeavours basis and is not subject to formal SLAs. Full
details of how to obtain support are provided at:

https://portal.blueprism.com/customer-support/how-get-help-customer-support

The preferred channel of support is to create a support ticket on the Customer Portal. If this is not suitable,
Blue Prism can be contacted through the following channels:
• E-mail: [email protected]
• Phone: +44(0)330 321 0055 (UK, Europe, Middle East and Africa)
+1 844 321 0055 (North America)
+61 (2) 807 42915 (Asia Pacific)

6. Functional Tests
A test process is available from the Blue Prism Digital Exchange. However, a valid subscription is required for this to
run; no universal test account is available.

7. Troubleshooting Guidelines
In the event that unexpected behaviour is observed when using this skill, it is recommended that you investigate the
potential cause and solution using the error message and response content. Details of error messages and resolutions
are available from the Google Cloud Portal (https://cloud.google.com/vision/docs/pdf).

8. Frequently Asked Questions


There are no frequently asked questions at this stage.
