283 questions
0
votes
0
answers
30
views
Doesn't contain any ground-truth entity defined in the Schema
While training a Google Document Classifier, the training failed with the errors below.
On inspection of the errors, I noticed that all failed documents belong to classification labels which have been ...
2
votes
1
answer
48
views
With Custom Extractor, Python API view of schema does not provide access to EntityTypes; it should according to docs
The API documentation shows that the DocumentSchema has EntityType children which should contain details of all fields in a Custom Extractor. I am able to obtain the DocumentSchema as expected. ...
0
votes
0
answers
27
views
Error code 13 training with document processor
Processor type: Custom extractor
I'm getting the following error when trying to train a Document AI processor:
{
"code": 3,
"message": "Invalid document.",
"...
0
votes
1
answer
48
views
How to use DocumentAI to extract data and bring the results to BQ using BQML?
I built a custom extractor in Document AI.
Deployed version : pretrained-foundation-model-v1.3-2024-08-31
# Create a remote model to register your Doc AI processor in BigQuery.
CREATE OR REPLACE ...
0
votes
0
answers
23
views
Custom Classifier Failed to refresh dataset stats
I'm training a Custom Classifier in Document AI. It worked fine, and I had a dataset with about 4000 documents. I trained multiple versions and they are running well.
But now I'm not able to see these ...
0
votes
1
answer
27
views
ValueError: Protocol message OcrConfig has no "premium_features" field when using DocumentAI
I'm using Font-style detection in Google's DocumentAI using Python:
"premium_features": {
"compute_style_info": True
},
But it gives the following ...
0
votes
0
answers
2k
views
Document AI - Processor location issue [duplicate]
I'm using a Mac and I have created a simple Document AI processor on the Google Cloud Platform (PDF splitter). This processor was trained, tested and deployed.
I'm now desperately trying to make use ...
0
votes
0
answers
161
views
Google Document AI Fine Tuning is taking forever
I am using a foundational model "pretrained-foundation-model-v1.1-2024-03-12" to train a custom extractor on Google Document AI. I've set the epochs to 300 and Learning Rate to 1 (range is ...
0
votes
0
answers
66
views
DocumentAI OCR Error: Invalid Document Content
I am calling DocumentAI OCR batch processing from Workflows, generally quite successfully; however, I occasionally get the following error:
{
"caughtError": {
"message": "...
0
votes
0
answers
127
views
How to improve checkbox detection in GCP Document AI?
We're using Google OCR to read PDFs or images that are Loan Estimates.
We're defining multiple fields such as loanTerm and loanPurpose,
but we're also labeling multiple checkboxes that can be ...
0
votes
0
answers
53
views
Can't Extract Table from Image Using Google API
I am working to digitize these tables using Google's Form Parser, but have been struggling to get an accurate replication. I've tried to read the table from an image into a CSV, but it is still missing ...
0
votes
0
answers
92
views
Using FieldMask for DocumentAI inner fields in Google Java SDK
I'm looking at the Google DocumentAI SDK and I'm trying to filter out an "inner" field, specifically the BoundingPoly.vertices field from all objects.
The bounding poly is part of the Layout ...
0
votes
0
answers
61
views
ProcessDocument API Errors - No remaining quota for ParseDocument
As part of our workflow, we invoke the DocumentAI ProcessDocument API (v1) from our back end. The code has been in place for over 6 months and has been running without any errors. In the past week we ...
0
votes
0
answers
49
views
Google Document AI BatchImportDocuments Error
While trying to import documents from a Google Cloud Storage bucket using this function, we get an HTTP 500 error. We are trying to push the documents into the train queue of the specified processor
...
0
votes
1
answer
585
views
How to Batch Process Long Documents Exceeding the Google Document AI Page Limit?
I'm working with Google Document AI to process long documents, where the number of pages exceeds the processor limit (~8k pages). The current documented page limit for Enterprise OCR is 500 pages for ...
0
votes
0
answers
96
views
Not able to get bounding box and other fields from the Layout Parser service
I am trying to extract text from a PDF file using the Layout Parser Python SDK. I have copied and used the sample code from the docs. However I've noticed that the output does not contain all the ...
0
votes
0
answers
17
views
Is it possible to see how many documents were used to create a Custom Extractor model?
I can see how many testing documents the model was evaluated against, but can't see how many documents were used to train the model. Do I just assume that it's all documents, or is there a way to view ...
0
votes
0
answers
92
views
Is there a way to split a Google Document AI Document into its pages?
I've labeled 4 different documents for Google's Document AI, each with 15-30 pages. This means I've labeled about 100 pages on which I wanted to train a custom extractor. Now the extractor won't let ...
0
votes
1
answer
55
views
Filtering Google API responses using FieldMasks with Java SDK
I'm using Google's Java SDK to call into DocumentAI service. The response happens to contain the image for each page, in base64, and I'd like to filter that out. During the request building for doing ...
0
votes
1
answer
237
views
DocumentAI: 400 Request contains an invalid argument
If I run this code locally it works. On Cloud Run, I get "400 Request contains an invalid argument".
As side notes:
The input file path is a temporary file obtained using the get_file_path ...
0
votes
0
answers
40
views
How can I run more than two threads to parse multiple documents with DocumentProcessorServiceAsyncClient - python
As it stands, the code works, but only with two threads; if I add another one, the process stops and then times out. I don't know if DocumentProcessorServiceAsyncClient has a limit of two ...
0
votes
0
answers
103
views
How to create a GCP Document AI custom extractor with a generative AI model and update its schema from a .NET app
From a .NET application, I need to create a GCP Document AI custom extractor that is configured to use a generative AI model, and I need to update its schema with proper labels. I tried to achieve this with ...
0
votes
0
answers
100
views
Fetch All Highlighted text from PDF using Document AI
How can I get all highlighted words/text from a PDF file using Google Document AI?
I tried the Document OCR, Form Parser, and other Document AI processors.
I also tried Custom Extractor and Custom Splitter.
I ...
1
vote
0
answers
32
views
custom classifier/splitter dataset test limit
I am currently working on a project that utilizes the docai custom classifier. I have a question regarding the test dataset size limitations.
As I understand, the current limit for the test dataset ...
0
votes
0
answers
51
views
Can I export the trained model or Docker from Google Document AI Custom Extractor?
I've trained a custom form model using Google Document AI Custom Extractor. According to the official Document AI documentation and the Google Cloud platform interface, it seems that the only way to ...
0
votes
0
answers
24
views
Converting document object to dataframe csv with document ai toolbox
This code sample from google cloud docs is supposed to produce output as csv, html or markdown files, but all that is in the output is 'Tables in Document', when run in a Google Colab notebook:
# ...
0
votes
0
answers
81
views
Google Document AI API Returning Worse Results Than Console
I am trying to use the Google Document AI API.
I have created a custom processor and defined a custom schema. When I upload a document through the console, the processor highlights almost all of the ...
0
votes
0
answers
41
views
Google Document AI directly processes CV2 object
I know we can upload an image file to be processed in Google DocumentAI; I am building an app that leverages the DocumentAI API in Python. Is there a way for DocumentAI to process an image held in a NumPy array?
...
0
votes
1
answer
569
views
Getting an error when I am trying to use pre-built contract model on AI Document Intelligence Studio. Error code in the body
I was trying to analyze a contract using Microsoft's Document Intelligence Studio. All the pre-built models are working except for the contract pre-built model. I am getting error code:
"...
0
votes
0
answers
56
views
Issue with spacing not being detected by custom extractor
I've created and trained a custom document extractor via GCP's Document AI and have noticed that it doesn't always notice the space between two sets of numbers and ends up putting them together.
An ...
0
votes
1
answer
370
views
Document AI - Multi-page files performance effect
I’ve noticed that it’s possible to upload multi-page files to Document AI, such that all pages are connected to each other by being associated with the same file.
My use case is invoice files that I ...
0
votes
1
answer
297
views
Auto-Labeling in Document AI with Custom Extractor: Schema Requirement Issue
I am using Document AI with a Custom Extractor. When I create a new Custom Extractor, it offers to manage my dataset.
I expect that doing so will automatically create label names for the documents I ...
0
votes
1
answer
231
views
Google Document AI create labeling instruction
https://cloud.google.com/document-ai/docs/workbench/label-documents#labeling
For Google Document AI, what exactly is a labeling instruction? Is it a PDF where every label is annotated using a box? If ...
0
votes
1
answer
99
views
Document AI adding folders
I'm using Document AI to parse PDF files from one bucket and then save them as JSON in another bucket in GCS. However, Document AI creates a folder with a subfolder in my bucket.
I've read a lot and I ...
0
votes
0
answers
44
views
Book Digitization: Is Google Document AI Necessary?
I have a question about Google Document AI:
I intend to create a digitization service dedicated to Libraries.
My goal is to digitize old books (no manuscripts). The result must be a PDF with the ...
0
votes
1
answer
72
views
Does the `Number` type in Google Document AI include decimals?
I've been using the Document AI tool for a while and have quite a few documents labeled, and just thought of a question: does the Number field type allow for decimals (e.g., 0.3456) or does it only allow ...
0
votes
1
answer
81
views
GCP API for AI Documents
I'm having issues with the API: there is no response whatsoever. I have created the service account with the corresponding API key and its JSON file; however, I cannot seem to get any response when ...
0
votes
0
answers
63
views
Failing to train document extractor
I tried to train a custom document extractor, using the minimum set (3 training + 3 test) for template-based training. I've retried 3 times, and all attempts failed with ...
{
"name": "...
0
votes
0
answers
106
views
How can I tell Google Document AI Enterprise OCR to always assume one column?
How can I tell Google Document AI Enterprise OCR to always assume one column?
My texts (scans of old books) are always one column. However, due to layout, (lots of) whitespace, and inline figures, ...
1
vote
1
answer
266
views
How can I use Google Document AI OCR to find the non-text images in a text document?
How can I use Google Document AI OCR to find the non-text images in a text document?
I'm using Google Document AI Enterprise OCR to OCR images (scans of old books), and it works well. The books have ...
0
votes
0
answers
125
views
Will adjusting the value acquired from bounding box annotation train the model to be able to make inferences?
This may be a silly question but I've been annotating quite a few documents with the Google Document AI tool and have had this worry in the back of my mind. My task is to use Doc AI to extract ...
0
votes
0
answers
77
views
Line Ordering Issue with Arabic PDF Text Using Google Cloud Document AI
I have an app that uses Document AI to process PDFs and extract text from them. I use the stable version, but it is still not accurate. The processed text seems to have its lines mixed up, not ...
0
votes
1
answer
224
views
Response from Document AI stored in Google Cloud Storage
I am using a GCP workflow and an Eventarc trigger connected to Cloud Storage to have a document evaluated by Document AI when the Cloud Storage bucket receives it. The issue I'm encountering is, whenever ...
1
vote
1
answer
132
views
Reskewing GCP Document AI Result
GCP's Document AI is pre-processing images to remove things like skew. The bounding boxes it produces correspond to the pre-processed image, not the image sent to the API. I need to reskew them so ...
0
votes
0
answers
132
views
Configured Google Document AI to enable "computeStyleInfo", but not receiving any textStyles in the response
The textStyles array from the Document AI response object is empty, despite having set everything up following Google's Document AI documentation.
I enabled Document AI's font-style detection following ...
0
votes
1
answer
315
views
Document AI batch processing timeout using Java
I am trying to batch process a set of documents using Document AI and its Java SDK. My code is derived from the batch processing example for Java (seen here), but I have modified it to add more than ...
0
votes
0
answers
63
views
Impact of Using PDF Training Data and JPG Test Data on Document AI Model Performance
I'm currently working on a document AI project (with Custom Extractor) and have encountered a scenario that I'm unsure how to navigate. My training dataset of Shipping instruction documents consists ...
1
vote
1
answer
485
views
Document AI "400 No valid schema provided for processing" with Cloud Function
I’ve been experiencing an issue with the Google Cloud Document AI API in my Firebase Cloud Function that handles documents uploaded to Google Cloud Storage. The function triggers correctly upon PDF ...
1
vote
0
answers
220
views
(Terraform) BigQuery Job misses IAM permissions, which have been granted
I read this blog post about the recently published Document AI - BigQuery Integration. I want to configure this setup completely using Terraform.
An important step in the blog post is the configuration ...
0
votes
1
answer
477
views
Improve Document AI generative AI accuracy?
I am creating a Document AI Custom Processor on Google Cloud Platform. I have been using the pre-trained foundation model to auto-label documents as I import them. However, it is not clear to me if ...