Parsing multiple files failed #1

Open
alabulei1 opened this issue Nov 27, 2024 · 1 comment

alabulei1 commented Nov 27, 2024

I uploaded three files of different types, but only the PDF was parsed.

  • one pdf file
  • one txt file
  • one doc file

Error log:

127.0.0.1 - - [27/Nov/2024 16:47:39] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [27/Nov/2024 16:47:39] "GET /static/script.js HTTP/1.1" 200 -
127.0.0.1 - - [27/Nov/2024 16:47:39] "GET /favicon.ico HTTP/1.1" 404 -
output_folder
9i4nqB20241127164858
Started parsing the file under job_id 957d4387-317b-431d-95f8-88906df4703e
File content saved to: 9i4nqB20241127164858/HighStakes-2.1.pdf.md
HighStakes-2.1.pdf is a pdf
127.0.0.1 - - [27/Nov/2024 16:49:23] "POST /upload HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/textract/parsers/utils.py", line 87, in run
    pipe = subprocess.Popen(
           ^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/Users/alabulei/miniconda3/lib/python3.11/subprocess.py", line 1950, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'antiword'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/flask/app.py", line 1536, in __call__
    return self.wsgi_app(environ, start_response)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/flask/app.py", line 1514, in wsgi_app
    response = self.handle_exception(e)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/flask/app.py", line 1511, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/flask/app.py", line 919, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/flask/app.py", line 917, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/flask/app.py", line 902, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/gaiabase/webserver.py", line 113, in upload
    prase_doc(file_path, output_folder)
  File "/Users/alabulei/gaiabase/webserver.py", line 64, in prase_doc
    content = textract.process(input_file).decode("utf-8")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/textract/parsers/__init__.py", line 79, in process
    return parser.process(filename, input_encoding, output_encoding, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/textract/parsers/utils.py", line 46, in process
    byte_string = self.extract(filename, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/textract/parsers/doc_parser.py", line 9, in extract
    stdout, stderr = self.run(['antiword', filename])
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/textract/parsers/utils.py", line 95, in run
    raise exceptions.ShellError(
textract.exceptions.ShellError: The command `antiword uploads/EssexUni_RaoFU_Resume_coda.doc` failed because the executable
`antiword` is not installed on your system. Please make
sure the appropriate dependencies are installed before using
textract:

    http://textract.readthedocs.org/en/latest/installation.html
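
The traceback shows textract's .doc parser shelling out to an `antiword` command that was not found on the PATH. One way to surface this before a request fails is a preflight check at server startup. The following is only a minimal sketch: the helper name `missing_executables` is hypothetical and is not part of textract or of this project.

```python
import shutil


def missing_executables(names):
    """Return the command names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


# 'antiword' is the command that textract's doc parser runs in the traceback above.
missing = missing_executables(["antiword"])
if missing:
    print("Missing executables:", ", ".join(missing))
else:
    print("All required executables were found")
```

Running a check like this once at startup would turn the 500 response into an explicit message naming the missing tool.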

alabulei1 commented

After running pip install antiword, I got the following error:

Traceback (most recent call last):
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/flask/app.py", line 1536, in __call__
    return self.wsgi_app(environ, start_response)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/flask/app.py", line 1514, in wsgi_app
    response = self.handle_exception(e)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/flask/app.py", line 1511, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/flask/app.py", line 919, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/flask/app.py", line 917, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/flask/app.py", line 902, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/gaiabase/webserver.py", line 113, in upload
    prase_doc(file_path, output_folder)
  File "/Users/alabulei/gaiabase/webserver.py", line 64, in prase_doc
    content = textract.process(input_file).decode("utf-8")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/textract/parsers/__init__.py", line 79, in process
    return parser.process(filename, input_encoding, output_encoding, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/textract/parsers/utils.py", line 46, in process
    byte_string = self.extract(filename, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/textract/parsers/doc_parser.py", line 9, in extract
    stdout, stderr = self.run(['antiword', filename])
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/textract/parsers/utils.py", line 106, in run
    raise exceptions.ShellError(
textract.exceptions.ShellError: The command `antiword uploads/1.-.doc` failed with exit code 1
------------- stdout -------------
b''------------- stderr -------------
b'Traceback (most recent call last):\n  File "/Users/alabulei/miniconda3/bin/antiword", line 8, in <module>\n    sys.exit(main())\n             ^^^^^^\n  File "/Users/alabulei/miniconda3/lib/python3.11/site-packages/antiword.py", line 21, in main\n    r = run(cmd)\n        ^^^^^^^^\n  File "/Users/alabulei/miniconda3/lib/python3.11/subprocess.py", line 548, in run\n    with Popen(*popenargs, **kwargs) as process:\n         ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/Users/alabulei/miniconda3/lib/python3.11/subprocess.py", line 1026, in __init__\n    self._execute_child(args, executable, preexec_fn, close_fds,\n  File "/Users/alabulei/miniconda3/lib/python3.11/subprocess.py", line 1950, in _execute_child\n    raise child_exception_type(errno_num, err_msg, err_filename)\nFileNotFoundError: [Errno 2] No such file or directory: \'libreoffice\'\n'
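
The stderr above suggests that the pip-installed `antiword` is a Python script (`site-packages/antiword.py`) that itself tries to launch `libreoffice`, which is also missing. In both tracebacks the exception propagates out of the upload handler, so one unparseable file turns the whole request into a 500. A possible mitigation, sketched below under assumptions, is to parse each file independently and collect failures. The names `parse_all` and `parse_one` are hypothetical; in the real server `parse_one` would presumably wrap the existing `textract.process(...)` call.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def parse_all(
    paths: Iterable[str],
    parse_one: Callable[[str], str],
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Parse every path, keeping successes and failures separate.

    A failure on one file (for example textract.exceptions.ShellError)
    is recorded instead of aborting the remaining files.
    """
    parsed: Dict[str, str] = {}
    failed: List[Tuple[str, str]] = []
    for path in paths:
        try:
            parsed[path] = parse_one(path)
        except Exception as exc:  # broad on purpose: any parser error is reported, not raised
            failed.append((path, str(exc)))
    return parsed, failed


def fake_parser(path: str) -> str:
    """Stand-in parser for illustration only: fails on .doc like the logs above."""
    if path.endswith(".doc"):
        raise RuntimeError("antiword is not installed")
    return "text of " + path


parsed, failed = parse_all(["a.pdf", "b.txt", "c.doc"], fake_parser)
print(sorted(parsed))
print(failed)
```

With this shape the handler could return the parsed PDF and TXT results along with a list of files that failed and why, rather than failing the whole upload.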
