Newest 'cloud-document-ai' Questions

0 votes

0 answers

30 views

Doesn't contain any ground-truth entity defined in the Schema

While training a Google Document Classifier, the training failed with the errors below. On inspection of the errors, I noticed that all failed documents belong to classification labels which have been ...

Stefan Walther

59

asked Nov 18 at 7:09

2 votes

1 answer

48 views

With Custom Extractor, Python API view of schema does not provide access to EntityTypes; it should according to docs

The API documentation shows that the DocumentSchema has EntityType children which should contain details of all fields in a Custom Extractor. I am able to obtain the DocumentSchema as expected. ...

stu2

89

asked Nov 17 at 9:21

0 votes

0 answers

27 views

Error code 13 training with document processor

Processor type: Custom extractor. I'm getting the following error when trying to train a Document AI processor: { "code": 3, "message": "Invalid document.", "...

Pedro Henrique

1

asked Nov 8 at 14:35

0 votes

1 answer

48 views

How to use DocumentAI to extract data and bring the results to BQ using BQML?

I built a custom extractor in Document AI. Deployed version : pretrained-foundation-model-v1.3-2024-08-31 # Create a remote model to register your Doc AI processor in BigQuery. CREATE OR REPLACE ...

Avantika Banerjee

316

asked Oct 21 at 14:08

0 votes

0 answers

23 views

Custom Classifier Failed to refresh dataset stats

I'm training a Custom Classifier in Document AI. Worked fine and I had a Dataset with about 4000 documents. I trained multiple versions and they are running well. But now I'm not able to see these ...

Jannik Schneider

1

asked Sep 16 at 12:01

0 votes

1 answer

27 views

ValueError: Protocol message OcrConfig has no "premium_features" field when using DocumentAI

I'm using Font-style detection in Google's DocumentAI using Python: "premium_features": { "compute_style_info": True }, But it gives the following ...

jonah_w

1,032

asked Aug 9 at 13:20

0 votes

0 answers

2k views

Document AI - Processor location issue [duplicate]

I'm using a Mac and I have created a simple Document AI processor on the Google Cloud Platform (PDF splitter). This processor was trained, tested and deployed. I'm now desperately trying to make use ...

AlexCT

35

asked Jul 26 at 22:29

0 votes

0 answers

161 views

Google Document AI Fine Tuning is taking forever

I am using a foundational model "pretrained-foundation-model-v1.1-2024-03-12" to train a custom extractor on Google Document AI. I've set the epochs to 300 and Learning Rate to 1 (range is ...

Sri Ram M S

13

asked Jul 24 at 15:14

0 votes

0 answers

66 views

DocumentAI OCR Error: Invalid Document Content

I am calling DocumentAI OCR batch processing from Workflows generally quite successfully, however, I occasionally get the following error: { "caughtError": { "message": "...

Leo Glowacki

101

asked Jul 23 at 19:47

0 votes

0 answers

127 views

How to improve checkbox detection in GCP Document AI?

We're using Google OCR to read PDFs or images that are Loan Estimates. We're defining multiple fields such as loanTerm, loanPurpose, but we're also labeling multiple checkboxes that can be ...

enyesate

1

asked Jul 17 at 13:49

0 votes

0 answers

53 views

Can't Extract Table from Image Using Google API

I am working to digitize these tables using google's Form Parser, but have been struggling to get accurate replication. I've tried to read the table from an image into a csv, but it is still missing ...

Lillian Yang

1

asked Jul 15 at 20:54

0 votes

0 answers

92 views

Using FieldMask for DocumentAI inner fields in Google Java SDK

I'm looking at the Google DocumentAI SDK and I'm trying to filter out an "inner" field, specifically the BoundingPoly.vertices field from all objects. The bounding poly is part of the Layout ...

PentaKon

4,626

asked Jul 10 at 16:15

0 votes

0 answers

61 views

ProcessDocument API Errors - No remaining quota for ParseDocument

As part of our workflow we invoke the DocumentAI ProcessDocument API (v1) from our back end, and the code has been in place for over 6 months and running without any errors. In the past week we ...

Charles

1

asked Jul 3 at 16:47

0 votes

0 answers

49 views

Google Document AI BatchImportDocuments Error

While trying to import documents from a Google storage bucket using this function, we are getting an HTTP 500 error. We are trying to push the documents into the train queue of the specified processor ...

Deepika Majji

1

asked Jun 24 at 10:09

0 votes

1 answer

585 views

How to Batch Process Long Documents Exceeding the Google Document AI Page Limit?

I'm working with Google Document AI to process long documents, where the number of pages exceeds the processor limit (~8k pages). The current documented page limit for Enterprise OCR is 500 pages for ...

Leo Glowacki

101

asked Jun 18 at 12:41
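
One way to think about the question above is to cut the long document into consecutive page ranges that each stay within the limit before submitting them. The following is only a minimal sketch of that bookkeeping step: `page_ranges` is a hypothetical helper, not part of any Document AI SDK, the 500-page figure is the limit quoted in the question, and actually splitting the PDF and sending each part to the processor is left out.

```python
def page_ranges(total_pages: int, limit: int) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges of at most `limit` pages.

    Hypothetical helper for illustration; it only computes the ranges and
    does not touch the PDF or call any API.
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [
        (start, min(start + limit - 1, total_pages))
        for start in range(1, total_pages + 1, limit)
    ]


if __name__ == "__main__":
    # A 1200-page document with the 500-page limit mentioned in the question.
    print(page_ranges(1200, 500))  # [(1, 500), (501, 1000), (1001, 1200)]
```

Each range could then be written out as its own file and processed separately, with the results merged back in page order afterwards.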

0 votes

0 answers

96 views

Not able to get bounding box and other fields from the Layout Parser service

I am trying to extract text from a PDF file using the Layout Parser Python SDK. I have copied and used the sample code from the docs. However I've noticed that the output does not contain all the ...

honeybees

41

asked Jun 13 at 20:43

0 votes

0 answers

17 views

Is it possible to see how many documents were used to create a Custom Extractor model?

I can see how many testing documents the model was evaluated against, but can't see how many documents were used to train the model. Do I just assume that it's all documents, or is there a way to view ...

Dane Padley

1

asked Jun 12 at 13:38

0 votes

0 answers

92 views

Is there a way to split a Google Document AI Document into its pages?

I've labeled 4 different documents for Google's Document AI, each with 15-30 pages. This means I've labeled about 100 pages on which I wanted to train a custom extractor. Now the extractor won't let ...

DadaMlatic

1

asked Jun 10 at 15:15

0 votes

1 answer

55 views

Filtering Google API responses using FieldMasks with Java SDK

I'm using Google's Java SDK to call into DocumentAI service. The response happens to contain the image for each page, in base64, and I'd like to filter that out. During the request building for doing ...

PentaKon

4,626

asked Jun 10 at 14:32

0 votes

1 answer

237 views

DocumentAI: 400 Request contains an invalid argument

If I run this code locally it works. On Cloud Run, I get "400 Request contains an invalid argument". As side notes: The input file path is a temporary file obtained using the get_file_path ...

Alessandro Ceccarelli

1,935

asked May 24 at 10:45

0 votes

0 answers

40 views

How can I run more than two threads to parse multiple documents with DocumentProcessorServiceAsyncClient - python

As such, the code works, but only with two threads. If I add another one, the process stops and then times out. I don't know if DocumentProcessorServiceAsyncClient will have a limit of two ...

Jeison Jose Bolano Pabon

1

asked May 21 at 16:39

0 votes

0 answers

103 views

How to create a GCP Document AI custom extractor with a generative AI model and update its schema from a .NET app

From a .NET application, I need to create a GCP Document AI custom extractor that is configured to use a generative AI model, and I need to update its schema with proper labels. I tried to achieve this with ...

mr100

4,418

asked May 20 at 18:28

0 votes

0 answers

100 views

Fetch all highlighted text from a PDF using Document AI

How can I get all highlighted words / text from a PDF file using Google Document AI? I tried Document OCR, Form Parser, and other Document AI processors. I also tried Custom Extractor and Custom Splitter. I ...

Nikhil Patel

19

asked May 17 at 6:21

1 vote

0 answers

32 views

custom classifier/splitter dataset test limit

I am currently working on a project that utilizes the docai custom classifier. I have a question regarding the test dataset size limitations. As I understand, the current limit for the test dataset ...

Al Monteagudo

11

asked May 16 at 2:10

0 votes

0 answers

51 views

Can I export the trained model or Docker from Google Document AI Custom Extractor?

I've trained a custom form model using Google Document AI Custom Extractor. According to the official Document AI documentation and the Google Cloud platform interface, it seems that the only way to ...

Mindy Wu

1

asked May 15 at 7:37

0 votes

0 answers

24 views

Converting document object to dataframe csv with document ai toolbox

This code sample from google cloud docs is supposed to produce output as csv, html or markdown files, but all that is in the output is 'Tables in Document', when run in a Google Colab notebook: # ...

OfficeSupplySA

1

asked Apr 30 at 21:05

0 votes

0 answers

81 views

Google Document AI API Returning Worse Results Than Console

I am trying to use the Google Document AI API. I have created a custom processor and defined a custom schema. When I upload a document through the console, the processor highlights almost all of the ...

Dhruv Luthra

31

asked Apr 27 at 3:52

0 votes

0 answers

41 views

Google Document AI directly processes CV2 object

I know we can upload the image file to be processed in Google DocumentAI; I am building an app that leverages DocumentAI API in Python. Is there a way for DocumentAI to process image in numpy array? ...

skw1990

63

asked Apr 23 at 10:55

0 votes

1 answer

569 views

Getting an error when I am trying to use pre-built contract model on AI Document Intelligence Studio. Error code in the body

I was trying to analyze a contract using Microsoft's Document Intelligence Studio. All the pre-built models are working except for the contract pre-built model. I am getting error code: "...

Harsh Khewal

5

asked Apr 19 at 9:09

0 votes

0 answers

56 views

Issue with spacing not being detected by custom extractor

I've created and trained a custom document extractor via GCP's Document AI and have noticed that it doesn't always notice the space between two sets of numbers and ends up putting them together. An ...

pl8nt

49

asked Apr 15 at 19:38

0 votes

1 answer

370 views

Document AI - Multi-page files performance effect

I’ve noticed that it’s possible to upload multi-page files to Document AI, such that all pages are connected to each other by being associated to the same file. My use case is invoice files that I ...

Yaniv Ben-Malka

47

asked Mar 26 at 15:52

0 votes

1 answer

297 views

Auto-Labeling in Document AI with Custom Extractor: Schema Requirement Issue

I am using Document AI with a Custom Extractor. When I create a new Custom Extractor, it offers to manage my dataset. I expect that doing so will automatically create label names for the documents I ...

tmighty

11.3k

asked Mar 23 at 2:57

0 votes

1 answer

231 views

Google Document AI create labeling instruction

https://cloud.google.com/document-ai/docs/workbench/label-documents#labeling For Google Document AI, what is a labeling instruction exactly? Is it a PDF where every label is annotated using a box? If ...

Max

1

asked Mar 1 at 10:42

0 votes

1 answer

99 views

Document AI adding folders

I'm using Document AI to parse PDF files from one bucket and then save them as JSON in another bucket in GCS. However, Document AI creates a folder with a subfolder in my bucket. I've read a lot and I ...

c0nfusion

1

asked Feb 29 at 12:00

0 votes

0 answers

44 views

Book Digitization: Is Google Document AI Necessary?

I have a question about Google Document AI: I intend to create a digitization service dedicated to Libraries. My goal is to digitize old books (no manuscripts). The result must be a PDF with the ...

Pier

1

asked Feb 28 at 16:23

0 votes

1 answer

72 views

Does the `Number` type in Google Document AI include decimals?

I've been using the document AI tool for a while and have quite a few documents labeled and just thought of a question: does the Number field type allow for decimals (ex: 0.3456) or does it only allow ...

pl8nt

49

asked Feb 28 at 14:40

0 votes

1 answer

81 views

GCP API for AI Documents

I'm having issues with the API, there is no response whatsoever. I have created the service account with the corresponding API key with its JSON file, however, I cannot seem to get any response when ...

Keagan Gilmore

1

asked Feb 22 at 10:15

0 votes

0 answers

63 views

fail to train document extractor

I tried to train a custom document extractor, using the minimum set (3 training + 3 test) for template-based training. I've retried 3 times, and all attempts failed with ... { "name": "...

Samuel Fung

1

asked Feb 22 at 1:44

0 votes

0 answers

106 views

How can I tell Google Document AI Enterprise OCR to always assume one column?

How can I tell Google Document AI Enterprise OCR to always assume one column? My text (scans of old books) are always one column. However, due to layout, (lots of) whitespace, and inline figures, ...

SRobertJames

9,139

asked Feb 21 at 1:18

1 vote

1 answer

266 views

How can I use Google Document AI OCR to find the non-text images in a text document?

How can I use Google Document AI OCR to find the non-text images in a text document? I'm using Google Document AI Enterprise OCR to OCR images (scans of old books), and it works well. The books have ...

SRobertJames

9,139

asked Feb 20 at 23:24

0 votes

0 answers

125 views

Will adjusting the value acquired from bounding box annotation train the model to be able to make inferences?

This may be a silly question but I've been annotating quite a few documents with the Google Document AI tool and have had this worry in the back of my mind. My task is to use Doc AI to extract ...

pl8nt

49

asked Feb 20 at 19:31

0 votes

0 answers

77 views

Line Ordering Issue with Arabic PDF Text Using Google Cloud Document AI

I have an app that uses Document AI to process PDFs and extract text from them. I use the stable version, but it is still not accurate. The processed text seems to have its lines mixed up, not ...

Khaled Saleh

148

asked Feb 18 at 1:51

0 votes

1 answer

224 views

Response from Document AI stored in Google Cloud Storage

I am using a GCP workflow and eventarc trigger connected to cloud storage to have a document evaluated by Document AI when the cloud storage bucket receives it. The issue I'm encountering is, whenever ...

Lofton Gentry

293

asked Feb 17 at 18:00

1 vote

1 answer

132 views

Reskewing GCP Document AI Result

GCP's Document AI is pre-processing images to remove things like skew. The bounding boxes it produces correspond to the pre-processed image, not the image sent to the API. I need to reskew them so ...

user19213041

11

asked Feb 16 at 1:24
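
The geometry behind the question above can be sketched as follows, assuming the deskew was a pure rotation about a known center by a known angle and that the image size did not change. Both assumptions, and the helper names `rotate_about` and `undo_deskew`, are hypothetical; this is not a Document AI API, and obtaining the actual angle or transform from the response is not covered.

```python
import math

Point = tuple[float, float]


def rotate_about(point: Point, center: Point, angle_deg: float) -> Point:
    """Rotate `point` around `center` by `angle_deg` using the standard 2D rotation.

    In image coordinates (y grows downward) a positive angle appears clockwise.
    """
    theta = math.radians(angle_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return (
        center[0] + dx * cos_t - dy * sin_t,
        center[1] + dx * sin_t + dy * cos_t,
    )


def undo_deskew(vertices: list[Point], center: Point, deskew_angle_deg: float) -> list[Point]:
    """Map vertices from the deskewed image back onto the original image.

    Assumes (hypothetically) that deskewing rotated the page by
    `deskew_angle_deg` about `center`, so the inverse rotation is applied.
    """
    return [rotate_about(v, center, -deskew_angle_deg) for v in vertices]


if __name__ == "__main__":
    center = (50.0, 50.0)
    original = [(10.0, 20.0), (90.0, 20.0), (90.0, 40.0), (10.0, 40.0)]
    deskewed = [rotate_about(p, center, 3.0) for p in original]
    print(undo_deskew(deskewed, center, 3.0))  # close to `original`, up to rounding
```

The sketch works in pixel coordinates; normalized vertices would first need to be scaled by the page width and height so that the rotation does not distort the box.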

0 votes

0 answers

132 views

Configured Google Document AI to enable "computeStyleInfo", but not receiving any textStyles in the response

The textStyles array from the Document AI response object is empty, despite having set everything up following google's docAI documentation. I enabled document AI's font-style detection following ...

Ryan Hartwig

1

asked Feb 13 at 23:02

0 votes

1 answer

315 views

Document AI batch processing timeout using Java

I am trying to batch process a set of documents using Document AI and its Java SDK. My code is derived from the batch processing example for Java (seen here), but I have modified it to add more than ...

Filip Östermark

435

asked Feb 6 at 17:36

0 votes

0 answers

63 views

Impact of Using PDF Training Data and JPG Test Data on Document AI Model Performance

I'm currently working on a document AI project (with Custom Extractor) and have encountered a scenario that I'm unsure how to navigate. My training dataset of Shipping instruction documents consists ...

lht_18018

64

asked Feb 1 at 7:58

1 vote

1 answer

485 views

Document AI "400 No valid schema provided for processing" with Cloud Function

I’ve been experiencing an issue with the Google Cloud Document AI API in my Firebase Cloud Function that handles documents uploaded to Google Cloud Storage. The function triggers correctly upon PDF ...

HaZeust

13

asked Jan 31 at 17:52

1 vote

0 answers

220 views

(Terraform) BigQuery Job misses IAM permissions, which have been granted

I read this blog post about the recently published Document AI - BigQuery Integration. I want to configure this setup completely using Terraform. An important step in the blog post is the configuration ...

Brian

95

asked Jan 26 at 10:48

0 votes

1 answer

477 views

Improve Document AI generative AI accuracy?

I am creating a Document AI Custom Processor on Google Cloud Platform. I have been using the pre-trained foundation model to auto-label documents as I import them. However, it is not clear to me if ...

Filip Östermark

435

asked Jan 16 at 10:24
