All Questions
Tagged with cloud-document-ai python
41 questions
0
votes
0
answers
2k
views
Document AI - Processor location issue [duplicate]
I'm using a Mac and I have created a simple Document AI processor on the Google Cloud Platform (PDF splitter). This processor was trained, tested and deployed.
I'm now desperately trying to make use ...
0
votes
0
answers
66
views
DocumentAI OCR Error: Invalid Document Content
I am calling DocumentAI OCR batch processing from Workflows, generally quite successfully. However, I occasionally get the following error:
{
"caughtError": {
"message": "...
0
votes
1
answer
586
views
How to Batch Process Long Documents Exceeding the Google Document AI Page Limit?
I'm working with Google Document AI to process long documents, where the number of pages exceeds the processor limit (~8k pages). The current documented page limit for Enterprise OCR is 500 pages for ...
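One common way to stay under a per-request page limit is to split the work into page ranges first. Below is a minimal sketch of just the range arithmetic, assuming the 500-page limit quoted in the excerpt. `page_chunks` is a hypothetical helper, not part of the Document AI client, and physically splitting the PDF would still need a separate PDF library.

```python
def page_chunks(total_pages: int, limit: int = 500) -> list[tuple[int, int]]:
    """Return 1-indexed, inclusive (first, last) page ranges of at most `limit` pages.

    `limit` defaults to 500 only because that is the figure quoted in the
    question; check the current quota for the processor actually in use.
    """
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [
        (first, min(first + limit - 1, total_pages))
        for first in range(1, total_pages + 1, limit)
    ]
```

Each range could then be written out as its own file and submitted as a separate batch input.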
0
votes
0
answers
40
views
How can I run more than two threads to parse multiple documents with DocumentProcessorServiceAsyncClient - python
As it stands, the code works, but only with two threads; if I add another one, the process stops and then times out. I don't know if DocumentProcessorServiceAsyncClient will have a limit of two ...
0
votes
0
answers
41
views
Google Document AI directly processes CV2 object
I know we can upload an image file to be processed by Google DocumentAI; I am building an app that leverages the DocumentAI API in Python. Is there a way for DocumentAI to process an image held in a NumPy array?
...
0
votes
1
answer
81
views
GCP API for AI Documents
I'm having issues with the API: there is no response whatsoever. I have created the service account with the corresponding API key with its JSON file; however, I cannot seem to get any response when ...
1
vote
1
answer
485
views
Document AI "400 No valid schema provided for processing" with Cloud Function
I’ve been experiencing an issue with the Google Cloud Document AI API in my Firebase Cloud Function that handles documents uploaded to Google Cloud Storage. The function triggers correctly upon PDF ...
0
votes
0
answers
73
views
Document AI: 400 Request contains an invalid argument The return of the beast
This code raises the 400 error without details:
from google.api_core.client_options import ClientOptions
from google.cloud import documentai
from PIL import Image
from io import BytesIO
from base64 ...
1
vote
1
answer
142
views
Using Batch Processing Document AI inside the google cloud function
I have a scenario where I am uploading a local file to a Cloud Storage bucket, triggering a Cloud Function (xyz). Within this Cloud Function, I am performing a batch processing task using Google Cloud ...
0
votes
1
answer
263
views
google.api_core.exceptions.InvalidArgument: 400 The resource projects/{my-proj-id}/locations/eu is not located in us
I am trying to use the Google DocAI Warehouse sample Python code, and it looks like the location parameter is always ignored and the code just assumes the 'us' location.
My prototype project has 'eu' as ...
0
votes
1
answer
168
views
How do I iterate through JSON files stored in a GCP bucket in different folders? Example: Bucket/Dict/Folder2/file.json, Bucket/Dict/Folder1/file.json
I have dumped JSON files from DOCAI to GCP, but each file is stored in an individual folder, although they are all in the same bucket on Cloud Storage. I am not able to iterate through the JSON files stored ...
0
votes
1
answer
1k
views
How to locally process a batch of files using Document AI with the Python client?
I'm trying to use the Python console to use the Document OCR processor to locally process a large amount of pdf documents (native and scanned) to extract the text and some metadata. The documents are ...
0
votes
1
answer
4k
views
Extract Table with structure maintained from PDF for feeding into LLMs
I am trying to feed context from a PDF into an LLM, more specifically Vertex AI from Google. Generally, GCP Document AI can do OCR to get text from the PDF, and I pass that text on to the LLM as context ...
2
votes
1
answer
117
views
Is there a solution to select the first and the last character of certain regex patterns?
There is a very long text in xml format like:
><span class='ocrx_word' id='word_1_21_0_1_0' title='bbox 409 912 417 927'><</span><span class='ocrx_word' id='word_1_21_0_1_1' title=...
0
votes
1
answer
286
views
Original File Name - GCP - Document AI
I'm using Document AI to perform OCR on some thousands of pdf documents with their python client.
I'm uploading them into a bucket, batch processing them and a .json output is generated in another ...
1
vote
1
answer
420
views
How can I extract information from "google.cloud.documentai_v1.types.evaluation.Evaluation"?
I am new to the cloud world and I am trying to use Google Document AI, but I am stuck on how to extract information like precision, accuracy and others from a training evaluation. Here is ...
0
votes
1
answer
632
views
Document AI form parsing on documents with different format
We have a client that wishes to automatically extract information in different PDF files to fill their form. Those documents are all different in their format, for example, sometimes to extract the ...
0
votes
2
answers
2k
views
How to process a single GCS-stored file on Document AI with the Python client?
I have been testing out the Google Document AI Python client, but I couldn't get the process_document() function working when trying to process one single document stored on Google Cloud Storage.
What ...
0
votes
1
answer
1k
views
Google Document AI Python Query Throws "ValueError: Unknown field for ProcessRequest: document_type"... base64 encoding throws another error
I'm running the sample query for Python using an OCR Google Document AI processor. The only difference between my query and this sample query:
process_document_sample(
project_id="99999FAKE&...
1
vote
1
answer
507
views
Document AI - Converting the normalized_vertices to the original scale of the document
I am using Google Cloud - Document AI service. I have custom built some processors for "form data extraction" using the "Custom Entity Extractor" which processes PDF documents.
I ...
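Normalized vertices are fractions of the page size in the range 0 to 1, so scaling them back is a multiplication by the page width and height. Below is a minimal sketch using plain `(x, y)` tuples. `to_page_scale` is a hypothetical helper, and reading the values from a real response (for example from each vertex's `x` and `y` and from the page's `dimension.width` and `dimension.height`) is assumed rather than shown.

```python
def to_page_scale(normalized_vertices, page_width, page_height):
    """Scale (x, y) pairs given as fractions of the page to page units.

    The result is in whatever unit `page_width` and `page_height` use
    (typically pixels for an image page).
    """
    return [(x * page_width, y * page_height) for x, y in normalized_vertices]


# Example: a box covering the top-left quarter of a 1000 x 800 page.
corners = to_page_scale(
    [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (0.0, 0.5)],
    page_width=1000,
    page_height=800,
)
```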
0
votes
1
answer
881
views
How to convert the json to a document object for DocumentAI
Using a general form parser, I want to fetch the entities and append those to the document object. (For a general form parser there are no properties called "entities", so I need to create ...
-3
votes
1
answer
146
views
Custom document extractor Batch Processing Request
How to send batch processing request using Custom document extractor?
I tried using Jupyter Notebook by creating a cluster but the Python code didn't work. It's not showing any output whenever I run ...
0
votes
2
answers
350
views
.proto file for DocumentAI Document object
I am using DocumentAI API and want to serialize/deserialize the Document object
https://cloud.google.com/python/docs/reference/documentai-toolbox/latest/google.cloud.documentai_toolbox.wrappers....
0
votes
1
answer
91
views
Can I use form parser to only perform table detection, and not table content extraction?
I have a form parser processor set up, and I only need the bounding box of the detected page in my image; I don't need it to do the table text extraction as well. Is there any way I can do this (if yes, ...
0
votes
0
answers
156
views
When training a GCP Document AI Custom Processor, how do I get it to only grab characters after/before a symbol (e.g. '-' or '/')?
I am training a GCP Document AI custom processor to extract data from PDF patent forms. One line in particular is troublesome. On the forms, the Application No./Patent No. is presented as follows: ...
0
votes
1
answer
902
views
Document.AI python client does not return tables
I want to use Document.ai to extract data from tables in my pdf.
I was following this code snippet https://cloud.google.com/document-ai/docs/handle-response#code_samples_2
But my table array is always ...
0
votes
1
answer
1k
views
Using Document AI with python from google and code from google codelabs returns wrong or empty result
I tried the following code from codelabs.developers.google.com:
import pandas as pd
from google.cloud import documentai_v1 as documentai
def online_process(
    project_id: str,
    location: str,
...
2
votes
1
answer
2k
views
Google DocumentAI -> ValueError: Protocol message Document has no "file" field
In my script, I have the following:
response = requests.get(list_url[0], allow_redirects=True)
s = io.BytesIO()
s.write(response.content)
s.seek(0)
mimetype="application/octet-stream"
...
1
vote
2
answers
2k
views
Document AI process document fails with invalid argument when processing docs from GCS
I am getting an error very similar to the below, but I am not in EU:
Document AI: google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument
When I use the raw_document and ...
3
votes
2
answers
3k
views
Document AI - Improving batch process time for a single document?
I'm working on a GCP Document AI project. First, let me say this - the OCR works fine :-). I'm curious to know whether any improvement is possible.
What happens now
I have a python module ...
2
votes
0
answers
694
views
Google Document AI api authentication error
I am trying to use the Invoice parser from the Document AI API that Google provides. I keep getting the below error even though I have followed all the required steps in their documentation. I have ...
5
votes
1
answer
1k
views
Google Document Ai giving different outputs for the same file
I was using Document OCR API to extract text from a pdf file, but part of it is not accurate. I found that the reason may be due to the existence of some Chinese characters.
The following is a made-up ...
0
votes
1
answer
688
views
Running Google Cloud DocumentAI sample code on Python returned the error 503
I am trying the example from the Google repo:
https://github.com/googleapis/python-documentai/blob/HEAD/samples/snippets/quickstart_sample.py
I have an error:
metadata=[('x-goog-request-params', 'name=...
4
votes
1
answer
3k
views
How can I convert "google.cloud.documentai_v1.types.document" object to json
I am using Google Cloud Document AI's Invoice Parser. API response is google.cloud.documentai_v1.types.Document object. I tried to write below approaches for converting it to JSON but nothing works:
...
1
vote
1
answer
4k
views
how to serialize/deserialize a protobuf response from google documentai API?
I'm working with a google API to process documents from upload. What I'm trying to achieve is saving the protobuf in the response as a .proto file so I could work with it later.
I can do response._pb....
0
votes
3
answers
1k
views
How can I split a PDF in Google cloud storage?
I have a single PDF and I would like to create a separate PDF for each of its pages. How would I be able to do so without downloading anything locally? I know that Document AI has a file splitting ...
1
vote
2
answers
7k
views
google.api_core.exceptions.InternalServerError: 500 Failed to process all the documents
I am getting this error when trying to implement the Document OCR from google cloud in python as explained here: https://cloud.google.com/document-ai/docs/ocr#documentai_process_document-python.
When ...
8
votes
1
answer
25k
views
Document AI: google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument
I am getting this error when trying to implement the Document OCR from google cloud in python as explained here: https://cloud.google.com/document-ai/docs/ocr
When I run
result = client....
0
votes
1
answer
263
views
Permission denied when invoking Document AI v1beta3 from Cloud Function
I'm trying to call DocumentAI v1beta3 from a Cloud Function with the following code snippet:
client = documentai.DocumentProcessorServiceClient()
input_doc = documentai.types.Document(
    content=...
0
votes
1
answer
174
views
rowSpan and colSpan of cell are always 1, by Google Document AI processor
import json
ifp = open('log.json')
response = json.load(ifp)
for bodyRow in response['document']['pages'][0]['tables'][1]['bodyRows']:
    for cell in bodyRow['cells']:
        print(f'rowSpan is {...
0
votes
1
answer
693
views
Is there a way to pass credentials programmatically for using Google documentAI without reading from a disk?
I am trying to run the demo code given for PDF parsing with GCP Document AI. Running the code after exporting the Google credentials on the command line works fine. The problem comes when the code needs to run in ...