DocumentAI: 400 Request contains an invalid argument

Question

If I run this code locally, it works. On Cloud Run, I get "400 Request contains an invalid argument".

Some side notes:

- The input file path is a temporary file obtained with the get_file_path function (e.g. /tmp/20240524_1011_470558_393480997312_ME7001b9a160047278515b83ff123c7545.jpg).
- The environment variables are the same locally and on Cloud Run.
- _assign_mime_type(file_path) correctly returns "image/jpeg".

How do I fix it?

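import logging
import os
import tempfile
from typing import Optional

from google.api_core.client_options import ClientOptions
from google.cloud import documentai
from werkzeug.utils import secure_filename  # assumed source of secure_filename (Flask/Werkzeug)

import config  # the application's settings module (not shown in the question)

logger = logging.getLogger(__name__)


def get_file_path(filename):
    """Create a secure version of the filename
    and return a full path to temporary directory"""
    file_name = secure_filename(filename)
    return os.path.join(tempfile.gettempdir(), file_name)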

def process_document_sample(
    project_id: str,
    location: str,
    processor_id: str,
    file_path: str,
    mime_type: str,
    field_mask: Optional[str] = None,
    processor_version_id: Optional[str] = None,
) -> documentai.Document:

    # You must set the `api_endpoint` if you use a location other than "us".
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")

    client = documentai.DocumentProcessorServiceClient.from_service_account_json(
        config.Config.OCR_GCP_KEY_PATH, client_options=opts
    )

    logger.debug("Document AI Client correctly launched.")

    if processor_version_id:
        # The full resource name of the processor version, e.g.:
        # `projects/{project_id}/locations/{location}/processors/{processor_id}/processorVersions/{processor_version_id}`
        name = client.processor_version_path(
            project_id, location, processor_id, processor_version_id
        )
    else:
        # The full resource name of the processor, e.g.:
        # `projects/{project_id}/locations/{location}/processors/{processor_id}`
        name = client.processor_path(project_id, location, processor_id)

    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()

    logger.debug("Local image correctly loaded.")

    # Load binary data
    raw_document = documentai.RawDocument(content=image_content, mime_type=mime_type)

    # For more information: https://cloud.google.com/document-ai/docs/reference/rest/v1/ProcessOptions
    # Optional: Additional configurations for processing.
    process_options = documentai.ProcessOptions(
        # Process only specific pages
        individual_page_selector=documentai.ProcessOptions.IndividualPageSelector(
            pages=[1]
        )
    )

    # Configure the process request
    request = documentai.ProcessRequest(
        name=name,
        raw_document=raw_document,
        field_mask=field_mask,
        process_options=process_options,
    )

    result = client.process_document(request=request)

    # For a full list of `Document` object attributes, reference this page:
    # https://cloud.google.com/document-ai/docs/reference/rest/v1/Document
    document = result.document

    # Read the text recognition output from the processor
    logger.debug("The document contains the following text:")
    logger.debug(document.text)

    return document


# file_path and _assign_mime_type come from the rest of the application (not shown).
document = process_document_sample(
    project_id=os.environ["GCP_PROJECT_ID"],
    location=os.environ["PROCESSOR_LOCATION"],
    processor_id=os.environ["PROCESSOR_ID"],
    file_path=file_path,
    mime_type=_assign_mime_type(file_path),
)
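Besides the payload itself, a processor name whose location does not match the regional endpoint is another thing worth ruling out when an error like this appears in only one environment. The sketch below rebuilds the endpoint and the resource name as plain strings, following the formats quoted in the code comments, so they can be logged and compared across environments. documentai_endpoint and processor_name are hypothetical helpers, not part of the client library (which provides processor_path and processor_version_path), and the IDs are placeholders.

```python
from typing import Optional


def documentai_endpoint(location: str) -> str:
    # Regional endpoint, built the same way as the ClientOptions in the question.
    return f"{location}-documentai.googleapis.com"


def processor_name(
    project_id: str,
    location: str,
    processor_id: str,
    processor_version_id: Optional[str] = None,
) -> str:
    # projects/{project_id}/locations/{location}/processors/{processor_id}
    name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
    if processor_version_id:
        # .../processorVersions/{processor_version_id}
        name += f"/processorVersions/{processor_version_id}"
    return name


# Log both values on Cloud Run and locally, then compare them.
print(documentai_endpoint("eu"))  # eu-documentai.googleapis.com
print(processor_name("my-project", "eu", "abc123"))
```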

Are you deploying this as a Cloud Run job? It won't work as a Cloud Run service because it does not appear to serve HTTP. — DazWilkin, Commented May 24 at 19:04
You should try to not use from_service_account_json and instead use Application Default Credentials. It's not your biggest issue but it would make your code safer and more flexible. — DazWilkin, Commented May 24 at 19:06
I'm invoking it from Cloud Run. The error is the same across platforms though (e.g. Digital Ocean) — Alessandro Ceccarelli, Commented May 27 at 15:53
Your code isn't a minimal repro so it's challenging to comment but ... content=image_content appears incorrect. This value should be base64-encoded (see RawDocument) — DazWilkin, Commented May 27 at 16:31
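The base64 requirement applies when calling the REST API directly: in the JSON body of processors.process, rawDocument.content is a base64-encoded string. The Python client library used in the question takes raw bytes in RawDocument(content=...) and handles the encoding itself, as Google's own sample does by passing image.read() straight through. For comparison, a stdlib-only sketch of building the REST body; build_process_body is a hypothetical helper.

```python
import base64
import json


def build_process_body(content: bytes, mime_type: str) -> str:
    """Hypothetical helper: JSON body for a REST processors.process call."""
    body = {
        "rawDocument": {
            # In JSON, bytes fields are carried as base64 text.
            "content": base64.b64encode(content).decode("ascii"),
            "mimeType": mime_type,
        }
    }
    return json.dumps(body)


# The first three bytes of a JPEG file encode to "/9j/".
print(build_process_body(b"\xff\xd8\xff", "image/jpeg"))
# {"rawDocument": {"content": "/9j/", "mimeType": "image/jpeg"}}
```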

Alessandro Ceccarelli · Accepted Answer · 2024-05-28 17:01:37Z


The downloaded file sent to Document AI was empty because of an issue with the external API it came from, and that empty payload caused the error.

Fixing the file download solved the problem.
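Since the root cause was an empty file, a cheap check before the API call turns the opaque 400 into a clear local error. A minimal sketch; read_nonempty_file is a hypothetical helper that could replace the open(...).read() step inside process_document_sample.

```python
import os
import tempfile


def read_nonempty_file(file_path: str) -> bytes:
    """Hypothetical helper: read a file and refuse to return empty content."""
    with open(file_path, "rb") as f:
        content = f.read()
    if not content:
        # An empty payload is what led to the 400 here; fail early with a
        # message that points at the download step instead.
        raise ValueError(f"{file_path} is empty; check the download step")
    return content


# Usage: an empty temp file is rejected before any Document AI call is made.
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    empty_path = tmp.name
try:
    read_nonempty_file(empty_path)
except ValueError as exc:
    print(exc)
finally:
    os.remove(empty_path)
```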
