All Questions

0 votes
0 answers

Document AI - Processor location issue [duplicate]

I'm using a Mac and I have created a simple Document AI processor on the Google Cloud Platform (PDF splitter). This processor was trained, tested and deployed. I'm now desperately trying to make use ...
AlexCT (reputation 35)
0 votes
0 answers

DocumentAI OCR Error: Invalid Document Content

I am calling DocumentAI OCR batch processing from Workflows, generally quite successfully; however, I occasionally get the following error: { "caughtError": { "message": "...
Leo Glowacki
0 votes
1 answer

How to Batch Process Long Documents Exceeding the Google Document AI Page Limit?

I'm working with Google Document AI to process long documents, where the number of pages exceeds the processor limit (~8k pages). The current documented page limit for Enterprise OCR is 500 pages for ...
Leo Glowacki
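A common workaround is to split the PDF into chunks that each stay under the processor's page limit, then process each chunk separately. This is a minimal sketch under assumptions: the 500 in the example is just the Enterprise OCR figure quoted in the excerpt (use whatever limit applies to your processor), and `split_pdf` relies on the third-party `pypdf` package, which is not part of Document AI.

```python
import io


def page_chunks(total_pages: int, max_pages: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) page ranges, each at most max_pages long."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    return [(start, min(start + max_pages, total_pages))
            for start in range(0, total_pages, max_pages)]


def split_pdf(pdf_bytes: bytes, max_pages: int) -> list[bytes]:
    """Split an in-memory PDF into smaller PDFs (assumes pypdf is installed)."""
    from pypdf import PdfReader, PdfWriter  # third-party dependency, imported lazily

    reader = PdfReader(io.BytesIO(pdf_bytes))
    parts = []
    for start, end in page_chunks(len(reader.pages), max_pages):
        writer = PdfWriter()
        for index in range(start, end):
            writer.add_page(reader.pages[index])
        buffer = io.BytesIO()
        writer.write(buffer)
        parts.append(buffer.getvalue())
    return parts


# A 1,234-page document with a 500-page limit becomes three chunks.
print(page_chunks(1234, 500))  # [(0, 500), (500, 1000), (1000, 1234)]
```

Each chunk can then be uploaded and sent as its own request; results need to be stitched back together using the chunk's starting page as an offset.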
0 votes
0 answers

How can I run more than two threads to parse multiple documents with DocumentProcessorServiceAsyncClient - Python?

As written, the code works, but only with two threads; if I add another one, the process stops and then times out. I don't know whether DocumentProcessorServiceAsyncClient has a limit of two ...
Jeison Jose Bolano Pabon
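Rather than managing threads, one option is to stay inside a single event loop and bound the number of in-flight requests with `asyncio.Semaphore`. This is a generic sketch, not a diagnosis of the two-thread stall: `process_one` stands in for a hypothetical coroutine that would await something like `client.process_document(request=...)` on a `DocumentProcessorServiceAsyncClient`, and the limit of 5 is arbitrary (keep it within your project's quota).

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    items: Iterable[T],
    process_one: Callable[[T], Awaitable[R]],
    limit: int = 5,
) -> list[R]:
    """Run process_one over items concurrently, with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run(item: T) -> R:
        async with semaphore:
            return await process_one(item)

    # gather returns results in the same order as the input items
    return await asyncio.gather(*(run(item) for item in items))


async def fake_process(path: str) -> str:
    # Hypothetical placeholder for: await client.process_document(request=...)
    await asyncio.sleep(0.01)
    return f"processed {path}"


if __name__ == "__main__":
    results = asyncio.run(
        gather_bounded([f"doc{i}.pdf" for i in range(8)], fake_process, limit=3))
    print(results[0])  # processed doc0.pdf
```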
0 votes
0 answers

Can Google Document AI directly process a CV2 object?

I know we can upload an image file to be processed by Google DocumentAI; I am building an app that leverages the DocumentAI API in Python. Is there a way for DocumentAI to process an image held in a NumPy array? ...
skw1990 (reputation 63)
0 votes
1 answer

GCP API for AI Documents

I'm having issues with the API: there is no response whatsoever. I have created the service account with the corresponding API key and its JSON file; however, I cannot seem to get any response when ...
Keagan Gilmore
1 vote
1 answer

Document AI "400 No valid schema provided for processing" with Cloud Function

I’ve been experiencing an issue with the Google Cloud Document AI API in my Firebase Cloud Function that handles documents uploaded to Google Cloud Storage. The function triggers correctly upon PDF ...
HaZeust (reputation 13)
0 votes
0 answers

Document AI: 400 Request contains an invalid argument The return of the beast

This code raises the 400 error without details: from google.api_core.client_options import ClientOptions from import documentai from PIL import Image from io import BytesIO from base64 ...
Garito (reputation 19)
1 vote
1 answer

Using Batch Processing Document AI inside the google cloud function

I have a scenario where I am uploading a local file to a Cloud Storage bucket, triggering a Cloud Function (xyz). Within this Cloud Function, I am performing a batch processing task using Google Cloud ...
Manish gupta
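A batch request can be passed to the client as a plain dict with the same shape as `BatchProcessRequest`. The sketch below builds one for the single file that triggered the function; the processor name and bucket paths are hypothetical placeholders. As a general consideration (not something stated in the question), a Cloud Function may be better off not blocking on `operation.result()`, since a batch job can outlast the function's timeout.

```python
def build_batch_request(processor_name: str, input_uri: str, output_prefix: str,
                        mime_type: str = "application/pdf") -> dict:
    """Return a BatchProcessRequest-shaped dict for one Cloud Storage file."""
    return {
        "name": processor_name,
        "input_documents": {
            "gcs_documents": {
                "documents": [{"gcs_uri": input_uri, "mime_type": mime_type}],
            }
        },
        "document_output_config": {
            "gcs_output_config": {"gcs_uri": output_prefix},
        },
    }


# Inside the function (client construction omitted):
#   operation = client.batch_process_documents(request=request)
#   # return here if the job may take longer than the function timeout
request = build_batch_request(
    "projects/my-project/locations/us/processors/my-processor",  # hypothetical
    "gs://my-input-bucket/uploads/file.pdf",                      # hypothetical
    "gs://my-output-bucket/docai-output/",                        # hypothetical
)
print(request["input_documents"]["gcs_documents"]["documents"][0]["gcs_uri"])
# gs://my-input-bucket/uploads/file.pdf
```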
0 votes
1 answer

google.api_core.exceptions.InvalidArgument: 400 The resource projects/{my-proj-id}/locations/eu is not located in us

I am trying to use the Google DocAI Warehouse sample Python code, and it looks like the location parameter is always ignored and the 'us' location is simply assumed. My prototype project has 'eu' as ...
caoimhinmacg
0 votes
1 answer

How do I iterate through JSON files stored in different folders of a GCP bucket? Example: Bucket/Dict/Folder2/file.json, Bucket/Dict/Folder1/file.json

I have dumped JSON files from DocAI to GCP, but each file is stored in an individual folder, although they are all in the same Cloud Storage bucket. I am not able to iterate through the JSON files stored ...
Vedant Patil
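In Cloud Storage, "folders" are just name prefixes, and listing with a prefix but no delimiter returns objects at every nesting level, so one `list_blobs` call can reach JSON files in all subfolders. A sketch assuming the `google-cloud-storage` client; the bucket name and prefix are placeholders, and the client is passed in so the function can be tried with a stand-in.

```python
import json
from typing import Any, Iterator


def iter_json_documents(storage_client: Any, bucket_name: str,
                        prefix: str = "") -> Iterator[tuple[str, dict]]:
    """Yield (object name, parsed JSON) for every .json object under prefix."""
    # Without a delimiter, list_blobs walks all nested "folders" under prefix.
    for blob in storage_client.list_blobs(bucket_name, prefix=prefix):
        if blob.name.endswith(".json"):
            yield blob.name, json.loads(blob.download_as_bytes())


# Usage (assumes google-cloud-storage is installed and credentials are configured):
#   from google.cloud import storage
#   for name, doc in iter_json_documents(storage.Client(), "my-bucket", "Dict/"):
#       print(name, len(doc.get("text", "")))
```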
0 votes
1 answer

How to locally process a batch of files using Document AI with the Python client?

I'm trying to use the Python console to run the Document OCR processor locally over a large number of PDF documents (native and scanned) to extract the text and some metadata. The documents are ...
Vojta Partík
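Online (synchronous) processing accepts the file bytes directly as a `raw_document`, so a local folder can be handled with a simple loop. A sketch only: the client and processor name are passed in, the accepted suffixes are an assumption, and MIME types come from Python's `mimetypes` module rather than from Document AI. Online requests have their own page and size limits, so very large PDFs may still need batch processing.

```python
import mimetypes
from pathlib import Path
from typing import Any, Iterator

SUFFIXES = (".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff")  # assumed input types


def iter_local_files(folder: str) -> Iterator[tuple[Path, bytes, str]]:
    """Yield (path, bytes, MIME type) for supported files in folder, sorted by name."""
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in SUFFIXES:
            mime_type, _ = mimetypes.guess_type(path.name)
            yield path, path.read_bytes(), mime_type or "application/octet-stream"


def process_folder(client: Any, processor_name: str, folder: str) -> dict[str, str]:
    """Send each file to process_document and collect the extracted text by file name."""
    texts = {}
    for path, content, mime_type in iter_local_files(folder):
        result = client.process_document(request={
            "name": processor_name,
            "raw_document": {"content": content, "mime_type": mime_type},
        })
        texts[path.name] = result.document.text
    return texts
```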
0 votes
1 answer

Extract a table from a PDF with its structure maintained, for feeding into LLMs

I am trying to feed context from a PDF into an LLM, more specifically Vertex AI from Google. Generally, GCP Document AI can do OCR to get the text from the PDF, and I pass that text on to the LLM as context ...
Sarthak Pan
2 votes
1 answer

Is there a solution to select the first and the last character of certain regex patterns?

There is a very long text in xml format like: ><span class='ocrx_word' id='word_1_21_0_1_0' title='bbox 409 912 417 927'><</span><span class='ocrx_word' id='word_1_21_0_1_1' title=...
kang (reputation 23)
0 votes
1 answer

Original File Name - GCP - Document AI

I'm using Document AI, through its Python client, to perform OCR on several thousand PDF documents. I'm uploading them into a bucket and batch processing them, and a .json output is generated in another ...
Camillo
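The batch operation's metadata records, for each input file, where its output was written, which is one way to tie the generated `.json` files back to the original PDF names. A sketch assuming the v1 field names `individual_process_statuses`, `input_gcs_source` and `output_gcs_destination`, with the metadata obtained after the operation completes, e.g. `documentai.BatchProcessMetadata(operation.metadata)`. The function only walks attributes, so it can be tried with a stand-in object.

```python
from typing import Any


def input_to_output_map(metadata: Any) -> dict[str, str]:
    """Map each input gs:// URI to the output location Document AI wrote for it.

    `metadata` is expected to look like documentai.BatchProcessMetadata, e.g.
        metadata = documentai.BatchProcessMetadata(operation.metadata)
    """
    return {
        status.input_gcs_source: status.output_gcs_destination
        for status in metadata.individual_process_statuses
    }
```

Storing this map alongside the output (or renaming the output files with it) keeps the original file name available after the operation is gone.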
1 vote
1 answer

How can I extract information from ""

I am new to the cloud world and am trying to use Document AI from Google, but I am stuck on how to extract information such as precision, accuracy and other metrics from a training evaluation. Here is ...
Atilio (reputation 115)
0 votes
1 answer

Document AI form parsing on documents with different formats

We have a client that wishes to automatically extract information from different PDF files to fill in their form. Those documents all differ in format; for example, sometimes to extract the ...
prime (reputation 25)
0 votes
2 answers

How to process a single GCS-stored file on Document AI with the Python client?

I have been testing out the Google Document AI Python client, but I couldn't get the process_document() function working when trying to process one single document stored on Google Cloud Storage. What ...
mimocha (reputation 1,111)
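Recent versions of the v1 API accept a `gcs_document` source in `ProcessRequest`, so a file already in Cloud Storage can be processed online without downloading it first. A sketch assuming the `google-cloud-documentai` v1 client and placeholder names; if the installed library predates that field, downloading the bytes and sending them as `raw_document` is the fallback.

```python
def build_gcs_process_request(processor_name: str, gcs_uri: str,
                              mime_type: str = "application/pdf") -> dict:
    """Return a ProcessRequest-shaped dict that points at one Cloud Storage object."""
    return {
        "name": processor_name,
        "gcs_document": {"gcs_uri": gcs_uri, "mime_type": mime_type},
    }


# Usage (assumes google-cloud-documentai is installed; names are placeholders):
#   from google.cloud import documentai
#   client = documentai.DocumentProcessorServiceClient()
#   result = client.process_document(
#       request=build_gcs_process_request(processor_name, "gs://my-bucket/invoice.pdf"))
#   print(result.document.text)
```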
0 votes
1 answer

Google Document AI Python Query Throws "ValueError: Unknown field for ProcessRequest: document_type"... base64 encoding throws another error

I'm running the sample query for Python using an OCR Google Document AI processor. The only difference between my query and this sample query: process_document_sample( project_id="99999FAKE&...
Hack-R (reputation 23.2k)
1 vote
1 answer

Document AI - Converting the normalized_vertices to the original scale of the document

I am using Google Cloud - Document AI service. I have custom built some processors for "form data extraction" using the "Custom Entity Extractor" which processes PDF documents. I ...
Rajiv2806 (reputation 120)
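Normalized vertices are fractions of the page size (0 to 1), so scaling them back means multiplying by the page's `dimension` width and height. A sketch using the JSON field names of Document AI output (`dimension`, `boundingPoly`, `normalizedVertices`); the numbers are made up. The result is in whatever unit the page's `dimension` uses, and since proto3 JSON omits zero values, a missing `x` or `y` is read as 0. For an entity, the `boundingPoly` sits in each entry of `pageAnchor.pageRefs`.

```python
def to_page_coordinates(page: dict, layout: dict) -> list[tuple[float, float]]:
    """Convert a layout's normalized vertices to the page's own coordinate space."""
    width = page["dimension"]["width"]
    height = page["dimension"]["height"]
    vertices = layout["boundingPoly"]["normalizedVertices"]
    # Proto3 JSON omits zero-valued fields, so a missing x or y means 0.
    return [(v.get("x", 0.0) * width, v.get("y", 0.0) * height) for v in vertices]


page = {"dimension": {"width": 1700.0, "height": 2200.0}}
layout = {"boundingPoly": {"normalizedVertices": [
    {"x": 0.25, "y": 0.5}, {"x": 0.5, "y": 0.5}, {"x": 0.5, "y": 0.75}, {"y": 0.75}]}}
print(to_page_coordinates(page, layout)[2])  # (850.0, 1650.0)
```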
0 votes
1 answer

How to convert the json to a document object for DocumentAI

Using a general form parser, I want to fetch the entities and append them to the document object. (For a general form parser there are no properties called "entities", so I need to create ...
Asit Panda
-3 votes
1 answer

Custom document extractor Batch Processing Request

How do I send a batch processing request using the Custom Document Extractor? I tried using a Jupyter Notebook by creating a cluster, but the Python code didn't work. It doesn't show any output whenever I run ...
Akshita Dewadwal
0 votes
2 answers

.proto file for DocumentAI Document object

I am using the DocumentAI API and want to serialize/deserialize the Document object.
anonaka (reputation 95)
0 votes
1 answer

Can I use the form parser to only perform table detection, and not table content extraction?

I have a form parser processor set up, and I only need the bounding box of the detected page in my image; I don't need it to do the table text extraction as well. Is there any way I can do this (if yes, ...
Tarun Narayanan
0 votes
0 answers

When training a GCP Document AI Custom Processor, how do I get it to only grab characters after/before a symbol (e.g. '-' or '/')?

I am training a GCP Document AI custom processor to extract data from PDF patent forms. One line in particular is troublesome. On the forms, the Application No./Patent No. is presented as follows: ...
imihailov
0 votes
1 answer

Document.AI python client does not return tables

I want to use to extract data from tables in my pdf. I was following this code snippet But my table array is always ...
Michał Bogusz
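Table cells do not carry their text directly: each cell's `layout.text_anchor.text_segments` holds offsets into `document.text`. A sketch of that lookup, written with attribute access so it works on the client's `Document` objects; it assumes the processor actually returns tables (Form Parser does, while a plain OCR processor may leave `page.tables` empty, which would explain an always-empty array).

```python
from typing import Any


def layout_to_text(layout: Any, text: str) -> str:
    """Concatenate the slices of document.text referenced by a layout's text anchor."""
    # start_index reads as 0 when unset, so slicing from it is safe.
    return "".join(
        text[int(segment.start_index):int(segment.end_index)]
        for segment in layout.text_anchor.text_segments
    )


def extract_tables(document: Any) -> list[list[list[str]]]:
    """Return every table as rows of cell strings (header rows first)."""
    tables = []
    for page in document.pages:
        for table in page.tables:
            rows = list(table.header_rows) + list(table.body_rows)
            tables.append([
                [layout_to_text(cell.layout, document.text).strip() for cell in row.cells]
                for row in rows
            ])
    return tables
```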
0 votes
1 answer

Using Document AI with python from google and code from google codelabs returns wrong or empty result

I tried the following code from import pandas as pd from import documentai_v1 as documentai def online_process( project_id: str, location: str, ...
mj1261829 (reputation 1,309)
2 votes
1 answer

Google DocumentAI -> ValueError: Protocol message Document has no "file" field

In my script, I have the following: response = requests.get(list_url[0], allow_redirects=True) s = io.BytesIO() s.write(response.content) mimetype="application/octet-stream" ...
An old man in the sea.
1 vote
2 answers

Document AI process document fails with invalid argument when processing docs from GCS

I am getting an error very similar to the below, but I am not in EU: Document AI: google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument When I use the raw_document and ...
sacoder (reputation 189)
3 votes
2 answers

Document AI - Improving batch process time for a single document?

I'm working on a GCP Document AI project. First, let me say this: the OCR works fine :-). I'm curious whether any improvements are possible. What happens now: I have a python module ...
Kris (reputation 8,858)
2 votes
0 answers

Google Document AI api authentication error

I am trying to use the Invoice parser from the Document AI API that Google provides. I keep getting the error below even though I have followed all the required steps in their documentation. I have ...
Vlad Tanase
5 votes
1 answer

Google Document Ai giving different outputs for the same file

I was using Document OCR API to extract text from a pdf file, but part of it is not accurate. I found that the reason may be due to the existence of some Chinese characters. The following is a made-up ...
iter07 (reputation 61)
0 votes
1 answer

Running Google Cloud DocumentAI sample code on Python returned the error 503

I am trying the example from the Google repo: I have an error: metadata=[('x-goog-request-params', 'name=...
mommomonthewind
4 votes
1 answer

How can I convert "" object to json

I am using Google Cloud Document AI's Invoice Parser. API response is object. I tried the approaches below for converting it to JSON, but nothing works: ...
kushagra (reputation 151)
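The Python client's response types are proto-plus messages, and proto-plus message classes provide class-level `to_json` and `from_json` helpers, so a `Document` can be converted without hand-written traversal. A sketch assuming `google-cloud-documentai`; the wrappers take the message (or its class) as a parameter only so they can be tried without the library installed.

```python
import json
from typing import Any


def document_to_dict(document: Any) -> dict:
    """Convert a proto-plus message (e.g. documentai.Document) to a plain dict via JSON."""
    # to_json lives on the message class (proto-plus metaclass), not on the instance.
    return json.loads(type(document).to_json(document))


def document_from_json(document_cls: Any, payload: str) -> Any:
    """Rebuild a message from JSON, tolerating fields the installed library doesn't know."""
    return document_cls.from_json(payload, ignore_unknown_fields=True)


# Usage with the real client:
#   from google.cloud import documentai
#   data = document_to_dict(result.document)      # plain dict, json.dump-able
#   doc = document_from_json(documentai.Document, json.dumps(data))
```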
1 vote
1 answer

how to serialize/deserialize a protobuf response from google documentai API?

I'm working with a google API to process documents from upload. What I'm trying to achieve is saving the protobuf in the response as a .proto file so I could work with it later. I can do response._pb....
guiparpinelli
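To keep a response for later without writing a `.proto` file, the message can be stored in protobuf binary form and read back with the same class. A sketch assuming proto-plus's class-level `serialize` and `deserialize` (roughly what `response._pb.SerializeToString()` gives, without touching private attributes); the file path is hypothetical, and the class is passed in so the round trip can be tested with a stand-in.

```python
from pathlib import Path
from typing import Any


def save_message(message: Any, path: str) -> None:
    """Write a proto-plus message (e.g. documentai.Document) as protobuf bytes."""
    Path(path).write_bytes(type(message).serialize(message))


def load_message(message_cls: Any, path: str) -> Any:
    """Read protobuf bytes back into an instance of message_cls."""
    return message_cls.deserialize(Path(path).read_bytes())


# Usage with the real client (path is hypothetical):
#   save_message(result.document, "invoice.docai.bin")
#   document = load_message(documentai.Document, "invoice.docai.bin")
```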
0 votes
3 answers

How can I split a PDF in Google cloud storage?

I have a single PDF that I would like to split into separate PDFs, one per page. How would I be able to do so without downloading anything locally? I know that Document AI has a file splitting ...
saladass4254
1 vote
2 answers

google.api_core.exceptions.InternalServerError: 500 Failed to process all the documents

I am getting this error when trying to implement the Document OCR from google cloud in python as explained here: When ...
MegaSpeed45
8 votes
1 answer

Document AI: google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument

I am getting this error when trying to implement the Document OCR from google cloud in python as explained here: When I run result = client....
MegaSpeed45
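One frequently reported cause of this 400 error (not necessarily the one in this question) is a processor created in a non-`us` location such as `eu` while the client still talks to the default endpoint. A sketch of matching the API endpoint to the processor's location; the project and processor IDs are placeholders, and the client's own `processor_path` helper builds the same resource name.

```python
def processor_name(project_id: str, location: str, processor_id: str) -> str:
    """Full resource name expected in ProcessRequest.name."""
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def client_options_for(location: str) -> dict:
    """Regional API endpoint matching the processor's location (e.g. 'us' or 'eu')."""
    return {"api_endpoint": f"{location}-documentai.googleapis.com"}


# Usage (assumes google-cloud-documentai is installed):
#   from google.cloud import documentai
#   client = documentai.DocumentProcessorServiceClient(
#       client_options=client_options_for("eu"))
#   result = client.process_document(request={
#       "name": processor_name("my-project", "eu", "my-processor-id"),
#       "raw_document": {"content": pdf_bytes, "mime_type": "application/pdf"}})
print(client_options_for("eu"))  # {'api_endpoint': 'eu-documentai.googleapis.com'}
```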
0 votes
1 answer

Permission denied when invoking Document AI v1beta3 from Cloud Function

I'm trying to call to DocumentAI v1beta3 from Cloud Function with the code snippet as follow: client = documentai.DocumentProcessorServiceClient() input_doc = documentai.types.Document( content=...
imationyj (reputation 145)
0 votes
1 answer

rowSpan and colSpan of cell are always 1, by Google Document AI processor

import json ifp = open('log.json') response = json.load(ifp) for bodyRow in response['document']['pages'][0]['tables'][1]['bodyRows']: for cell in bodyRow['cells']: print(f'rowSpan is {...
dio lee (reputation 11)
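A tidied version of the loop in the excerpt, sketched against the JSON field names it already uses (`document`, `pages`, `tables`, `bodyRows`, `cells`), that also prints each cell's text resolved through `textAnchor`. It only reads what the processor reported: if the service returns `rowSpan` and `colSpan` of 1 for merged cells, no client-side code changes that. In this JSON form, 64-bit offsets arrive as strings and zero values may be omitted, so the sketch converts and defaults them.

```python
import json


def cell_text(cell: dict, text: str) -> str:
    """Resolve a cell's textAnchor against document.text (JSON form)."""
    segments = cell.get("layout", {}).get("textAnchor", {}).get("textSegments", [])
    # int64 fields are JSON strings; a zero startIndex may be omitted entirely.
    return "".join(
        text[int(s.get("startIndex", 0)):int(s.get("endIndex", 0))] for s in segments
    ).strip()


def describe_body_rows(response: dict, page: int = 0, table: int = 0) -> list[tuple[str, int, int]]:
    """Return (text, rowSpan, colSpan) for every body cell of one table."""
    document = response["document"]
    rows = document["pages"][page]["tables"][table].get("bodyRows", [])
    return [
        (cell_text(cell, document["text"]), cell.get("rowSpan", 0), cell.get("colSpan", 0))
        for row in rows
        for cell in row.get("cells", [])
    ]


# Usage with the file from the question:
#   with open("log.json") as ifp:
#       for text, row_span, col_span in describe_body_rows(json.load(ifp), table=1):
#           print(f"{text!r}: rowSpan={row_span}, colSpan={col_span}")
```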
0 votes
1 answer

Is there a way to pass credentials programmatically for using Google documentAI without reading from a disk?

I am trying to run the demo code given in PDF parsing of GCP document AI. To run the code, exporting Google credentials from the command line works fine. The problem comes when the code needs to run in ...
sentinel
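Credentials can be built from an in-memory dict with `google.oauth2.service_account.Credentials.from_service_account_info` and passed to the client's `credentials` argument, so nothing has to be read from disk. A sketch under assumptions: the environment variable name `DOCAI_SERVICE_ACCOUNT_JSON` is hypothetical (a secret manager would work just as well), and the Google imports are deferred so the parsing step can be tried on its own.

```python
import json
import os
from typing import Any, Mapping


def service_account_info_from_env(var_name: str = "DOCAI_SERVICE_ACCOUNT_JSON",
                                  environ: Mapping[str, str] = os.environ) -> dict:
    """Parse service-account JSON stored in an environment variable (name is hypothetical)."""
    raw = environ.get(var_name)
    if not raw:
        raise KeyError(f"{var_name} is not set")
    return json.loads(raw)


def make_documentai_client(info: dict, location: str = "us") -> Any:
    """Create a Document AI client from in-memory credentials (needs google-cloud-documentai)."""
    from google.cloud import documentai
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_info(info)
    return documentai.DocumentProcessorServiceClient(
        credentials=credentials,
        client_options={"api_endpoint": f"{location}-documentai.googleapis.com"},
    )
```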