Newest 'cloud-document-ai+python' Questions

0 votes

0 answers

2k views

Document AI - Processor location issue [duplicate]

I'm using a Mac and I have created a simple Document AI processor on the Google Cloud Platform (PDF splitter). This processor was trained, tested and deployed. I'm now desperately trying to make use ...

AlexCT

35

asked Jul 26 at 22:29

0 votes

0 answers

66 views

DocumentAI OCR Error: Invalid Document Content

I am calling DocumentAI OCR batch processing from Workflows generally quite successfully, however, I occasionally get the following error: { "caughtError": { "message": "...

Leo Glowacki

101

asked Jul 23 at 19:47

0 votes

1 answer

586 views

How to Batch Process Long Documents Exceeding the Google Document AI Page Limit?

I'm working with Google Document AI to process long documents, where the number of pages exceeds the processor limit (~8k pages). The current documented page limit for Enterprise OCR is 500 pages for ...

Leo Glowacki

101

asked Jun 18 at 12:41

0 votes

0 answers

40 views

How can I run more than two threads to parse multiple documents with DocumentProcessorServiceAsyncClient - python

The code works, but only with two threads; if I add another one, the process stalls and then times out. I don't know whether DocumentProcessorServiceAsyncClient has a limit of two ...

Jeison Jose Bolano Pabon

1

asked May 21 at 16:39

0 votes

0 answers

41 views

Google Document AI directly processes CV2 object

I know we can upload an image file to be processed by Google DocumentAI; I am building an app that uses the DocumentAI API in Python. Is there a way for DocumentAI to process an image held in a numpy array? ...

skw1990

63

asked Apr 23 at 10:55

0 votes

1 answer

81 views

GCP API for AI Documents

I'm having issues with the API: there is no response whatsoever. I have created the service account with the corresponding API key and its JSON file; however, I cannot seem to get any response when ...

Keagan Gilmore

1

asked Feb 22 at 10:15

1 vote

1 answer

485 views

Document AI "400 No valid schema provided for processing" with Cloud Function

I’ve been experiencing an issue with the Google Cloud Document AI API in my Firebase Cloud Function that handles documents uploaded to Google Cloud Storage. The function triggers correctly upon PDF ...

HaZeust

13

asked Jan 31 at 17:52

0 votes

0 answers

73 views

Document AI: 400 Request contains an invalid argument The return of the beast

This code raises the 400 error without details: from google.api_core.client_options import ClientOptions from google.cloud import documentai from PIL import Image from io import BytesIO from base64 ...

Garito

19

asked Jan 12 at 14:46

1 vote

1 answer

142 views

Using Batch Processing Document AI inside the google cloud function

I have a scenario where I am uploading a local file to a Cloud Storage bucket, triggering a Cloud Function (xyz). Within this Cloud Function, I am performing a batch processing task using Google Cloud ...

Manish gupta

11

asked Jan 4 at 17:52

0 votes

1 answer

263 views

google.api_core.exceptions.InvalidArgument: 400 The resource projects/{my-proj-id}/locations/eu is not located in us

I am trying to use the Google DocAI Warehouse sample Python code, and it looks like the location parameter is always ignored and the 'us' location is assumed. My prototype project has 'eu' as ...

caoimhinmacg

11

asked Dec 8, 2023 at 3:02

0 votes

1 answer

168 views

How do I iterate through JSON files stored in different folders of a GCP bucket? Example: Bucket/Dict/Folder2/file.json, Bucket/Dict/Folder1/file.json

I have dumped JSON files from DOCAI to GCP, but each file is stored in its own folder, although they are all in the same bucket on Cloud Storage. I am not able to iterate through the JSON files stored ...

Vedant Patil

1

asked Dec 5, 2023 at 17:41

0 votes

1 answer

1k views

How to locally process a batch of files using Document AI with the Python client?

I'm trying to use the Document OCR processor from the Python console to locally process a large number of PDF documents (native and scanned) and extract the text and some metadata. The documents are ...

Vojta Partík

1

asked Oct 29, 2023 at 23:17

0 votes

1 answer

4k views

Extract Table with structure maintained from PDF for feeding into LLMs

I am trying to give an LLM, specifically Vertex AI from Google, context taken from a PDF. Generally, GCP Document AI can do OCR to get text from the PDF, and I pass that text on to the LLM as context ...

Sarthak Pan

1

asked Sep 28, 2023 at 10:24

2 votes

1 answer

117 views

Is there a solution to select the first and the last character of certain regex patterns?

There is a very long text in xml format like: ><span class='ocrx_word' id='word_1_21_0_1_0' title='bbox 409 912 417 927'><</span><span class='ocrx_word' id='word_1_21_0_1_1' title=...

kang

23

asked Sep 24, 2023 at 15:57

0 votes

1 answer

286 views

Original File Name - GCP - Document AI

I'm using Document AI to perform OCR on several thousand PDF documents with its Python client. I'm uploading them into a bucket, batch processing them, and a .json output is generated in another ...

Camillo

1

asked Sep 13, 2023 at 13:03

1 vote

1 answer

420 views

How can I extract information from "google.cloud.documentai_v1.types.evaluation.Evaluation"

I am new to the world of cloud and I am trying to use Document AI from Google, but I am stuck on how to extract information like precision, accuracy and others from a training evaluation. Here is ...

Atilio

115

asked Sep 7, 2023 at 18:33

0 votes

1 answer

632 views

Document AI form parsing on documents with different format

We have a client that wishes to automatically extract information in different PDF files to fill their form. Those documents are all different in their format, for example, sometimes to extract the ...

prime

25

asked Jul 28, 2023 at 13:38

0 votes

2 answers

2k views

How to process a single GCS-stored file on Document AI with the Python client?

I have been testing out the Google Document AI Python client, but I couldn't get the process_document() function working when trying to process one single document stored on Google Cloud Storage. What ...

mimocha

1,111

asked Jul 9, 2023 at 8:00

0 votes

1 answer

1k views

Google Document AI Python Query Throws "ValueError: Unknown field for ProcessRequest: document_type"... base64 encoding throws another error

I'm running the sample query for Python using an OCR Google Document AI processor. The only difference between my query and this sample query: process_document_sample( project_id="99999FAKE&...

Hack-R

23.2k

asked Jun 28, 2023 at 0:20

1 vote

1 answer

507 views

Document AI - Converting the normalized_vertices to the original scale of the document

I am using Google Cloud - Document AI service. I have custom built some processors for "form data extraction" using the "Custom Entity Extractor" which processes PDF documents. I ...

Rajiv2806

120

asked Jun 8, 2023 at 18:56
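
For the normalized_vertices question above, a minimal sketch of the usual conversion, assuming each normalized vertex is an (x, y) pair in the range 0 to 1 relative to the page and that the page width and height are known in the target unit (for example pixels). The function name `to_page_scale` and the plain-tuple input are hypothetical choices for illustration; they are not part of the Document AI client, whose response objects expose the coordinates as attributes instead.

```python
from typing import Iterable, List, Tuple


def to_page_scale(
    normalized_vertices: Iterable[Tuple[float, float]],
    page_width: float,
    page_height: float,
) -> List[Tuple[float, float]]:
    """Scale (x, y) pairs from the 0..1 range to page units.

    Hypothetical helper: x is multiplied by the page width and
    y by the page height, which is the assumed meaning of
    "normalized" here.
    """
    return [(x * page_width, y * page_height) for x, y in normalized_vertices]


# Example: a box covering the lower-right quarter of an 800 x 1000 page.
box = [(0.5, 0.5), (1.0, 0.5), (1.0, 1.0), (0.5, 1.0)]
scaled = to_page_scale(box, page_width=800, page_height=1000)
```

With a real response, the pairs would be built from each vertex's x and y attributes and the dimensions taken from the page that the bounding polygon belongs to.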

0 votes

1 answer

881 views

How to convert the json to a document object for DocumentAI

Using a general form parser, I want to fetch the entities and append them to the document object. (For a general form parser there are no properties called "entities", so I need to create ...

Asit Panda

1

asked Apr 11, 2023 at 9:33

-3 votes

1 answer

146 views

Custom document extractor Batch Processing Request

How to send batch processing request using Custom document extractor? I tried using Jupyter Notebook by creating a cluster but the Python code didn't work. It's not showing any output whenever I run ...

Akshita Dewadwal

9

asked Mar 20, 2023 at 10:14

0 votes

2 answers

350 views

.proto file for DocumentAI Document object

I am using DocumentAI API and want to serialize/deserialize the Document object https://cloud.google.com/python/docs/reference/documentai-toolbox/latest/google.cloud.documentai_toolbox.wrappers....

anonaka

95

asked Mar 19, 2023 at 2:26

0 votes

1 answer

91 views

Can I use form parser to only perform table detection, and not table content extraction?

I have a form parser processor set up, and I only need the bounding box of the detected page in my image; I don't need it to do the table text extraction as well. Is there any way I can do this (if yes, ...

Tarun Narayanan

1

asked Dec 7, 2022 at 22:02

0 votes

0 answers

156 views

When training a GCP Document AI Custom Processor, how do I get it to only grab characters after/before a symbol (e.g. '-' or '/')?

I am training a GCP Document AI custom processor to extract data from PDF patent forms. One line in particular is troublesome. On the forms, the Application No./Patent No. is presented as follows: ...

imihailov

13

asked Nov 23, 2022 at 10:03

0 votes

1 answer

902 views

Document.AI python client does not return tables

I want to use Document.ai to extract data from tables in my pdf. I was following this code snippet https://cloud.google.com/document-ai/docs/handle-response#code_samples_2 But my table array is always ...

Michał Bogusz

404

asked Aug 16, 2022 at 13:00

0 votes

1 answer

1k views

Using Document AI with python from google and code from google codelabs returns wrong or empty result

I tried the following code from codelabs.developers.google.com: import pandas as pd from google.cloud import documentai_v1 as documentai def online_process( project_id: str, location: str, ...

mj1261829

1,309

asked May 14, 2022 at 10:49

2 votes

1 answer

2k views

Google DocumentAI -> ValueError: Protocol message Document has no "file" field

In my script, I have the following: response = requests.get(list_url[0], allow_redirects=True) s = io.BytesIO() s.write(response.content) s.seek(0) mimetype="application/octet-stream" ...

An old man in the sea.

1,426

asked Mar 31, 2022 at 9:58

1 vote

2 answers

2k views

Document AI process document fails with invalid argument when processing docs from GCS

I am getting an error very similar to the below, but I am not in EU: Document AI: google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument When I use the raw_document and ...

sacoder

189

asked Mar 14, 2022 at 22:29

3 votes

2 answers

3k views

Document AI - Improving batch process time for a single document?

I'm working on a GCP Document AI project. First, let me say this - the OCR works fine :-). I'm curious to know whether it can be improved. What happens now: I have a Python module ...

Kris

8,858

asked Feb 21, 2022 at 10:46

2 votes

0 answers

694 views

Google Document AI api authentication error

I am trying to use the Invoice parser from the Document AI API that Google provides. I keep getting the error below even though I have followed all the required steps in their documentation. I have ...

Vlad Tanase

77

asked Nov 3, 2021 at 9:45

5 votes

1 answer

1k views

Google Document Ai giving different outputs for the same file

I was using Document OCR API to extract text from a pdf file, but part of it is not accurate. I found that the reason may be due to the existence of some Chinese characters. The following is a made-up ...

iter07

61

asked Aug 13, 2021 at 9:33

0 votes

1 answer

688 views

Running Google Cloud DocumentAI sample code on Python returned the error 503

I am trying the example from the Google repo: https://github.com/googleapis/python-documentai/blob/HEAD/samples/snippets/quickstart_sample.py I have an error: metadata=[('x-goog-request-params', 'name=...

mommomonthewind

4,640

asked Aug 2, 2021 at 6:58

4 votes

1 answer

3k views

How can I convert "google.cloud.documentai_v1.types.document" object to json

I am using Google Cloud Document AI's Invoice Parser. The API response is a google.cloud.documentai_v1.types.Document object. I tried the approaches below to convert it to JSON, but nothing works: ...

kushagra

151

asked Jun 22, 2021 at 17:27

1 vote

1 answer

4k views

how to serialize/deserialize a protobuf response from google documentai API?

I'm working with a google API to process documents from upload. What I'm trying to achieve is saving the protobuf in the response as a .proto file so I could work with it later. I can do response._pb....

guiparpinelli

11

asked Jun 15, 2021 at 10:34

0 votes

3 answers

1k views

How can I split a PDF in Google cloud storage?

I have a single PDF, and I would like to create a separate PDF for each of its pages. How would I be able to do so without downloading anything locally? I know that Document AI has a file splitting ...

saladass4254

75

asked May 14, 2021 at 2:32

1 vote

2 answers

7k views

google.api_core.exceptions.InternalServerError: 500 Failed to process all the documents

I am getting this error when trying to implement the Document OCR from google cloud in python as explained here: https://cloud.google.com/document-ai/docs/ocr#documentai_process_document-python. When ...

MegaSpeed45

95

asked Mar 4, 2021 at 14:06

8 votes

1 answer

25k views

Document AI: google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument

I am getting this error when trying to implement the Document OCR from google cloud in python as explained here: https://cloud.google.com/document-ai/docs/ocr When I run result = client....

MegaSpeed45

95

asked Mar 3, 2021 at 11:52

0 votes

1 answer

263 views

Permission denied when invoking Document AI v1beta3 from Cloud Function

I'm trying to call DocumentAI v1beta3 from a Cloud Function with the following code snippet: client = documentai.DocumentProcessorServiceClient() input_doc = documentai.types.Document( content=...

imationyj

145

asked Jan 12, 2021 at 12:14

0 votes

1 answer

174 views

rowSpan and colSpan of cell are always 1, by Google Document AI processor

import json ifp = open('log.json') response = json.load(ifp) for bodyRow in response['document']['pages'][0]['tables'][1]['bodyRows']: for cell in bodyRow['cells']: print(f'rowSpan is {...

dio lee

11

asked Dec 10, 2020 at 2:46

0 votes

1 answer

693 views

Is there a way to pass credentials programmatically for using Google documentAI without reading from a disk?

I am trying to run the demo code given in PDF parsing of GCP document AI. To run the code, exporting Google credentials as a command line works fine. The problem comes when the code needs to run in ...

sentinel

1

asked Jul 2, 2020 at 14:42
