0 votes
0 answers
30 views

Doesn't contain any ground-truth entity defined in the Schema

While training a Google Document Classifier, the training failed with the errors below. On inspecting the errors, I noticed that all failed documents belong to classification labels which have been ...
Stefan Walther
2 votes
1 answer
48 views

With Custom Extractor, Python API view of schema does not provide access to EntityTypes; it should according to docs

The API documentation shows that the DocumentSchema has EntityType children which should contain details of all fields in a Custom Extractor. I am able to obtain the DocumentSchema as expected. ...
stu2
  • 89
0 votes
0 answers
27 views

Error code 13 training with document processor

Processor type: Custom extractor. I'm getting the following error when trying to train a Document AI processor: { "code": 3, "message": "Invalid document.", "...
Pedro Henrique
0 votes
1 answer
48 views

How to use DocumentAI to extract data and bring the results to BQ using BQML?

I built a custom extractor in Document AI. Deployed version : pretrained-foundation-model-v1.3-2024-08-31 # Create a remote model to register your Doc AI processor in BigQuery. CREATE OR REPLACE ...
Avantika Banerjee
0 votes
0 answers
23 views

Custom Classifier Failed to refresh dataset stats

I'm training a Custom Classifier in Document AI. Worked fine and I had a Dataset with about 4000 documents. I trained multiple versions and they are running well. But now I'm not able to see these ...
Jannik Schneider
0 votes
1 answer
27 views

ValueError: Protocol message OcrConfig has no "premium_features" field when using DocumentAI

I'm using Font-style detection in Google's DocumentAI using Python: "premium_features": { "compute_style_info": True }, But it gives the following ...
jonah_w
  • 1,032
0 votes
0 answers
2k views

Document AI - Processor location issue [duplicate]

I'm using a Mac and I have created a simple Document AI processor on the Google Cloud Platform (PDF splitter). This processor was trained, tested and deployed. I'm now desperately trying to make use ...
AlexCT
  • 35
0 votes
0 answers
161 views

Google Document AI Fine Tuning is taking forever

I am using a foundational model "pretrained-foundation-model-v1.1-2024-03-12" to train a custom extractor on Google Document AI. I've set the epochs to 300 and Learning Rate to 1 (range is ...
Sri Ram M S
0 votes
0 answers
66 views

DocumentAI OCR Error: Invalid Document Content

I am calling DocumentAI OCR batch processing from Workflows generally quite successfully, however, I occasionally get the following error: { "caughtError": { "message": "...
Leo Glowacki
0 votes
0 answers
127 views

How to improve checkbox detection in GCP Document AI?

We're using Google OCR to read PDFs or images of Loan Estimates. We're defining multiple fields such as loanTerm and loanPurpose, but we're also labeling multiple checkboxes that can be ...
enyesate
0 votes
0 answers
53 views

Can't Extract Table from Image Using Google API

I am working to digitize these tables using Google's Form Parser, but have been struggling to get an accurate replication. I've tried to read the table from an image into a CSV, but it is still missing ...
Lillian Yang
0 votes
0 answers
92 views

Using FieldMask for DocumentAI inner fields in Google Java SDK

I'm looking at the Google DocumentAI SDK and I'm trying to filter out an "inner" field, specifically the BoundingPoly.vertices field from all objects. The bounding poly is part of the Layout ...
PentaKon
  • 4,626
0 votes
0 answers
61 views

ProcessDocument API Errors - No remaining quota for ParseDocument

As part of our workflow, we invoke the DocumentAI ProcessDocument API (v1) from our back end. The code has been in place for over 6 months and running without any errors. In the past week we ...
Charles
0 votes
0 answers
49 views

Google Document AI BatchImportDocuments Error

While trying to import documents from a Google storage bucket using this function, we are getting an HTTP 500 error. We are trying to push the documents into the train queue of the specified processor ...
Deepika Majji
0 votes
1 answer
585 views

How to Batch Process Long Documents Exceeding the Google Document AI Page Limit?

I'm working with Google Document AI to process long documents, where the number of pages exceeds the processor limit (~8k pages). The current documented page limit for Enterprise OCR is 500 pages for ...
Leo Glowacki
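
One possible approach to the page-limit question above is to split the work into page ranges that each stay under the limit and submit each range as its own job. The sketch below only computes those ranges. The function name `page_ranges` is hypothetical, the 500-page figure is the Enterprise OCR limit quoted in the question, and splitting the PDF itself and calling Document AI are deliberately left out.

```python
from typing import List, Tuple


def page_ranges(total_pages: int, max_pages: int) -> List[Tuple[int, int]]:
    """Return inclusive, 1-based (start, end) page ranges.

    Each range covers at most `max_pages` pages, and together the
    ranges cover pages 1..total_pages exactly once.
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if max_pages <= 0:
        raise ValueError("max_pages must be positive")
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]


if __name__ == "__main__":
    # Assumed figures: a 1200-page document and a 500-page limit.
    for start, end in page_ranges(1200, 500):
        print(f"submit pages {start}-{end} as a separate batch input")
```

Each range could then be written out as its own PDF with whatever PDF library is already in use, and the per-range results merged afterwards, keeping in mind that page numbers in each result restart at 1.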
0 votes
0 answers
96 views

Not able to get bounding box and other fields from the Layout Parser service

I am trying to extract text from a PDF file using the Layout Parser Python SDK. I have copied and used the sample code from the docs. However I've noticed that the output does not contain all the ...
honeybees
0 votes
0 answers
17 views

Is it possible to see how many documents were used to create a Custom Extractor model?

I can see how many testing documents the model was evaluated against, but can't see how many documents were used to train the model. Do I just assume that it's all documents? Is there a way to view ...
Dane Padley
0 votes
0 answers
92 views

Is there a way to split a Google Document AI Document into its pages?

I've labeled 4 different documents for Google's Document AI, each with 15-30 pages. This means I've labeled about 100 pages on which I wanted to train a custom extractor. Now the extractor won't let ...
DadaMlatic
0 votes
1 answer
55 views

Filtering Google API responses using FieldMasks with Java SDK

I'm using Google's Java SDK to call into DocumentAI service. The response happens to contain the image for each page, in base64, and I'd like to filter that out. During the request building for doing ...
PentaKon
  • 4,626
0 votes
1 answer
237 views

DocumentAI: 400 Request contains an invalid argument

If I run this code locally it works. On Cloud Run, I get "400 Request contains an invalid argument". As side notes: The input file path is a temporary file obtained using the get_file_path ...
Alessandro Ceccarelli
0 votes
0 answers
40 views

How can I run more than two threads to parse multiple documents with DocumentProcessorServiceAsyncClient - Python

As such, the code works, but only with two threads; if I add another one, the process stops and then times out. I don't know if DocumentProcessorServiceAsyncClient has a limit of two ...
Jeison Jose Bolano Pabon
0 votes
0 answers
103 views

How to create a GCP Document AI custom extractor with a generative AI model and update its schema from a .NET app

From a .NET application, I need to create a GCP Document AI custom extractor that is configured to use a generative AI model, and I need to update its schema with proper labels. I tried to achieve this with ...
mr100
  • 4,418
0 votes
0 answers
100 views

Fetch All Highlighted text from PDF using Document AI

How can I get all highlighted words / text from a PDF file using Google Document AI? I tried the Document OCR, Form Parser, and other Document AI processors. I also tried Custom Extractor and Custom Splitter. I ...
Nikhil Patel
1 vote
0 answers
32 views

custom classifier/splitter dataset test limit

I am currently working on a project that utilizes the docai custom classifier. I have a question regarding the test dataset size limitations. As I understand, the current limit for the test dataset ...
Al Monteagudo
0 votes
0 answers
51 views

Can I export the trained model or Docker from Google Document AI Custom Extractor?

I've trained a custom form model using Google Document AI Custom Extractor. According to the official Document AI documentation and the Google Cloud platform interface, it seems that the only way to ...
Mindy Wu
0 votes
0 answers
24 views

Converting document object to dataframe csv with document ai toolbox

This code sample from google cloud docs is supposed to produce output as csv, html or markdown files, but all that is in the output is 'Tables in Document', when run in a Google Colab notebook: # ...
OfficeSupplySA
0 votes
0 answers
81 views

Google Document AI API Returning Worse Results Than Console

I am trying to use the Google Document AI API. I have created a custom processor and defined a custom schema. When I upload a document through the console, the processor highlights almost all of the ...
Dhruv Luthra
0 votes
0 answers
41 views

Google Document AI directly processes CV2 object

I know we can upload an image file to be processed in Google DocumentAI; I am building an app that leverages the DocumentAI API in Python. Is there a way for DocumentAI to process an image held in a NumPy array? ...
skw1990
  • 63
0 votes
1 answer
569 views

Getting an error when I am trying to use pre-built contract model on AI Document Intelligence Studio. Error code in the body

I was trying to analyze a contract using Microsoft's Document Intelligence Studio. All the pre-built models are working except for the contract pre-built model. I am getting error code: "...
Harsh Khewal
0 votes
0 answers
56 views

Issue with spacing not being detected by custom extractor

I've created and trained a custom document extractor via GCP's Document AI and have noticed that it doesn't always notice the space between two sets of numbers and ends up putting them together. An ...
pl8nt
  • 49
0 votes
1 answer
370 views

Document AI - Multi-page files performance effect

I’ve noticed that it’s possible to upload multi-page files to Document AI, such that all pages are connected to each other by being associated to the same file. My use case is invoice files that I ...
Yaniv Ben-Malka
0 votes
1 answer
297 views

Auto-Labeling in Document AI with Custom Extractor: Schema Requirement Issue

I am using Document AI with a Custom Extractor. When I create a new Custom Extractor, it offers to manage my dataset. I expect that doing so will automatically create label names for the documents I ...
tmighty
  • 11.3k
0 votes
1 answer
231 views

Google Document AI create labeling instruction

https://cloud.google.com/document-ai/docs/workbench/label-documents#labeling For Google Document AI, what is a labeling instruction exactly? Is it a PDF where every label is annotated using a box? If ...
Max
  • 1
0 votes
1 answer
99 views

Document AI adding folders

I'm using Document AI to parse PDF files from one bucket and then save them as JSON in another bucket in GCS. However, Document AI creates a folder with a subfolder in my bucket. I've read a lot and I ...
c0nfusion
0 votes
0 answers
44 views

Book Digitization: Is Google Document AI Necessary?

I have a question about Google Document AI: I intend to create a digitization service dedicated to Libraries. My goal is to digitize old books (no manuscripts). The result must be a PDF with the ...
Pier
  • 1
0 votes
1 answer
72 views

Does the `Number` type in Google Document AI include decimals?

I've been using the document AI tool for a while and have quite a few documents labeled and just thought of a question: does the Number field type allow for decimals (ex: 0.3456) or does it only allow ...
pl8nt
  • 49
0 votes
1 answer
81 views

GCP API for AI Documents

I'm having issues with the API; there is no response whatsoever. I have created the service account with the corresponding API key with its JSON file; however, I cannot seem to get any response when ...
Keagan Gilmore
0 votes
0 answers
63 views

Failed to train document extractor

I tried to train a custom document extractor, using the minimum set (3 training + 3 test) for template-based training. I've retried 3 times, and all attempts failed with ... { "name": "...
Samuel Fung
0 votes
0 answers
106 views

How can I tell Google Document AI Enterprise OCR to always assume one column?

How can I tell Google Document AI Enterprise OCR to always assume one column? My texts (scans of old books) are always one column. However, due to layout, (lots of) whitespace, and inline figures, ...
SRobertJames
  • 9,139
1 vote
1 answer
266 views

How can I use Google Document AI OCR to find the non-text images in a text document?

How can I use Google Document AI OCR to find the non-text images in a text document? I'm using Google Document AI Enterprise OCR to OCR images (scans of old books), and it works well. The books have ...
SRobertJames
  • 9,139
0 votes
0 answers
125 views

Will adjusting the value acquired from bounding box annotation train the model to be able to make inferences?

This may be a silly question but I've been annotating quite a few documents with the Google Document AI tool and have had this worry in the back of my mind. My task is to use Doc AI to extract ...
pl8nt
  • 49
0 votes
0 answers
77 views

Line Ordering Issue with Arabic PDF Text Using Google Cloud Document AI

I have an app that uses Document AI to process PDFs and extract text from them. I use the stable version, but it is still not accurate. The processed text seems to have its lines mixed up, not ...
Khaled Saleh
0 votes
1 answer
224 views

Response from Document AI stored in Google Cloud Storage

I am using a GCP workflow and an Eventarc trigger connected to Cloud Storage to have a document evaluated by Document AI when the Cloud Storage bucket receives it. The issue I'm encountering is, whenever ...
Lofton Gentry
1 vote
1 answer
132 views

Reskewing GCP Document AI Result

GCP's Document AI is pre-processing images to remove things like skew. The bounding boxes it produces correspond to the pre-processed image, not the image sent to the API. I need to reskew them so ...
user19213041
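
The reskewing question above comes down to undoing a geometric transform on each bounding-box vertex. The sketch below is only an illustration of that math, under the assumption that the deskew applied to the image is known as a 2x3 affine matrix `(a, b, tx, c, d, ty)`, meaning `x' = a*x + b*y + tx` and `y' = c*x + d*y + ty`. The helper names `invert_affine` and `apply_affine` are hypothetical, and how the matrix is obtained from the API response is not covered here.

```python
from typing import List, Sequence, Tuple

Affine = Tuple[float, float, float, float, float, float]
Point = Tuple[float, float]


def invert_affine(m: Affine) -> Affine:
    """Invert a 2x3 affine transform given as (a, b, tx, c, d, ty)."""
    a, b, tx, c, d, ty = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("transform is not invertible")
    # Inverse of the 2x2 linear part.
    ia, ib = d / det, -b / det
    ic, id_ = -c / det, a / det
    # Inverse translation: -(A^-1 @ t).
    itx = -(ia * tx + ib * ty)
    ity = -(ic * tx + id_ * ty)
    return (ia, ib, itx, ic, id_, ity)


def apply_affine(m: Affine, points: Sequence[Point]) -> List[Point]:
    """Apply the affine transform to each (x, y) point."""
    a, b, tx, c, d, ty = m
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


if __name__ == "__main__":
    # Assumed example: a pure translation used as the "deskew" transform.
    deskew: Affine = (1.0, 0.0, 10.0, 0.0, 1.0, -5.0)
    box_in_processed_image = [(110.0, 45.0), (210.0, 45.0), (210.0, 95.0), (110.0, 95.0)]
    # Map the box back into the coordinates of the image that was sent.
    print(apply_affine(invert_affine(deskew), box_in_processed_image))
```

If the vertices are normalized to the 0..1 range, they would need to be scaled to pixel coordinates before applying the inverse and scaled back afterwards.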
0 votes
0 answers
132 views

Configured Google Document AI to enable "computeStyleInfo", but not receiving any textStyles in the response

The textStyles array from the Document AI response object is empty, despite having set everything up following Google's Document AI documentation. I enabled Document AI's font-style detection following ...
Ryan Hartwig
0 votes
1 answer
315 views

Document AI batch processing timeout using Java

I am trying to batch process a set of documents using Document AI and its Java SDK. My code is derived from the batch processing example for Java (seen here), but I have modified it to add more than ...
Filip Östermark
0 votes
0 answers
63 views

Impact of Using PDF Training Data and JPG Test Data on Document AI Model Performance

I'm currently working on a document AI project (with Custom Extractor) and have encountered a scenario that I'm unsure how to navigate. My training dataset of Shipping instruction documents consists ...
lht_18018
1 vote
1 answer
485 views

Document AI "400 No valid schema provided for processing" with Cloud Function

I’ve been experiencing an issue with the Google Cloud Document AI API in my Firebase Cloud Function that handles documents uploaded to Google Cloud Storage. The function triggers correctly upon PDF ...
HaZeust
  • 13
1 vote
0 answers
220 views

(Terraform) BigQuery Job misses IAM permissions, which have been granted

I read this blog post about the recently published Document AI - BigQuery Integration. I want to configure this setup completely using Terraform. An important step in the blog post is the configuration ...
Brian
  • 95
0 votes
1 answer
477 views

Improve Document AI generative AI accuracy?

I am creating a Document AI Custom Processor on Google Cloud Platform. I have been using the pre-trained foundation model to auto-label documents as I import them. However, it is not clear to me if ...
Filip Östermark
