
From https://cloud.google.com/document-ai/docs/process-forms, I can see some examples of processing single files. But in most cases, companies have buckets of documents. How do you scale Document AI processing in that case? Do you use Document AI in conjunction with Spark, or is there another way?

2 Answers


I could only find the following: batch_process_documents processes many documents asynchronously and saves the results to Cloud Storage.

From there, I think we can parametrise the job by passing a bucket prefix as the input path, and distribute the work over several machines.

All of that could be orchestrated via Airflow for example.
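The fan-out described above could be sketched like this. Note that `submit_batch_job` is a hypothetical stand-in for a real `batch_process_documents` call (plus waiting on the returned operation), so the sketch stays self-contained; the bucket paths are placeholders:

```python
# Sketch: one batch job per bucket prefix, run several at a time.
# An orchestrator (e.g. one Airflow task per prefix) would do the same fan-out.
from concurrent.futures import ThreadPoolExecutor

def submit_batch_job(input_prefix: str, output_root: str) -> str:
    # Placeholder for client.batch_process_documents(...) followed by
    # operation.result(); returns where that job's results were written.
    subdir = input_prefix.rstrip("/").rsplit("/", 1)[-1]
    return f"{output_root}{subdir}/"

def process_bucket(prefixes, output_root, max_workers=4):
    """Run one async batch job per prefix, max_workers at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: submit_batch_job(p, output_root), prefixes))

results = process_bucket(
    ["gs://my-bucket/invoices/", "gs://my-bucket/receipts/"],
    "gs://my-bucket/output/",
)
```

Each prefix becomes an independent unit of work, so retries and parallelism can be managed per prefix by the orchestrator rather than inside the job.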

0
Answer recommended by Google Cloud Collective

You will need to use Batch Processing to handle multiple documents at once with Document AI.

This page in the Cloud Documentation shows how to make Batch Processing requests with REST and the Client Libraries.

https://cloud.google.com/document-ai/docs/send-request#batch-process
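For reference, the batchProcess request body has roughly this shape. Here it is built as a plain Python dict so the structure is visible; the field names follow the v1 REST reference linked above, and the bucket URIs are placeholders:

```python
# Sketch of the v1 REST request body for `processors/{id}:batchProcess`.
import json

def batch_process_body(input_prefix: str, output_uri: str) -> str:
    body = {
        "inputDocuments": {
            # Every supported file under this prefix is processed.
            "gcsPrefix": {"gcsUriPrefix": input_prefix}
        },
        "documentOutputConfig": {
            # The async results land here as JSON files.
            "gcsOutputConfig": {"gcsUri": output_uri}
        },
    }
    return json.dumps(body)

print(batch_process_body("gs://my-bucket/input/", "gs://my-bucket/output/"))
```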

This codelab also illustrates how to do this in Python with the OCR processor: https://codelabs.developers.google.com/codelabs/docai-ocr-python
