File protocol not supported #13669
Comments
Thanks for opening the issue @teaguesterling! I think it would be nice if DuckDB would add support for
I would really appreciate it if you could add support for file:/ or file:/// too, as this is the format used by our Spark engine. This is just for compatibility reasons.
FYI, DuckDB's It would be great to support the
I see support for the
I can confirm this issue as well, in my case for the
It may be worth adding support for Here's my understanding:
I used pyarrow to test out the URLs above. This might get tricky to implement since
Thanks for testing this! It seems the first two that I wrote are not what was expected. There may be an "easy" short circuit we can add in the glob code since the logic is a bit simpler. It will always need to begin with
I have the same issue.
@samansmink any chance that this gets merged before 1.2, please? :)
Takes a stab at #13669

@djouallah

This PR adds support for [`file://` URLs](https://en.wikipedia.org/wiki/File_URI_scheme) to the LocalFileSystem. It currently supports URLs in 3 different formats:

- `file:/some/path` (host omitted completely)
- `file:///some/path` (empty host)
- `file://localhost/some/path` (localhost as host)

Note that the following formats are not supported because they are non-standard (and actually forbidden by the spec):

- relative paths (`file:some/relative/path`)
- double-slash paths (`file://some/path`)

Additionally, we also don't support:

- non-localhost hosts (`file://somehostsomewhere/some/path`)

For the non-standard formats we could consider implementing them anyway if they show up a lot.
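
As a rough illustration of the accepted and rejected forms listed in the PR description, here is a minimal Python sketch. The helper name `file_url_to_path` is hypothetical and this is not DuckDB's actual implementation (which lives in its C++ LocalFileSystem). It assumes POSIX-style paths and does no percent-decoding.

```python
def file_url_to_path(url: str) -> str:
    """Hypothetical helper: map a file: URL to a local POSIX path.

    Accepts file:/path, file:///path and file://localhost/path.
    Rejects relative paths, double-slash paths and non-localhost hosts.
    """
    prefix = "file:"
    if not url.startswith(prefix):
        # Not a file URL at all: assume it is already a plain path.
        return url

    rest = url[len(prefix):]

    if rest.startswith("//"):
        # Authority form: file://<host>/<path>
        host, slash, path = rest[2:].partition("/")
        if host not in ("", "localhost"):
            # Covers file://some/path and file://somehostsomewhere/some/path,
            # which are syntactically the same thing.
            raise ValueError(f"unsupported host in file URL: {url!r}")
        if not slash:
            raise ValueError(f"file URL has no path: {url!r}")
        return "/" + path

    if rest.startswith("/"):
        # Host omitted completely: file:/some/path
        return rest

    # file:some/relative/path
    raise ValueError(f"relative file URLs are not supported: {url!r}")


if __name__ == "__main__":
    for candidate in (
        "file:/some/path",
        "file:///some/path",
        "file://localhost/some/path",
    ):
        print(candidate, "->", file_url_to_path(candidate))
```

In this sketch all three accepted forms resolve to `/some/path`. The double-slash form and the non-localhost form fail on the same check, because `file://some/path` reads as a URL whose host is `some`.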
What happens?
DuckDB cannot load local files when the `file://` prefix is provided.

In developing the duckdb_iceberg extension with test data created by pyiceberg, @ramonvermeulen observed that files were not loading. He traced this to the file paths in metadata being prefixed by `file://`, and removing the protocol resolved the issue.

In further testing to try and address the issue, I noticed that there is no fs subsystem for handling the file protocol and that a fix to this should probably be moved outside of the duckdb_iceberg extension.
To Reproduce
IO Error: No files found that match the pattern "file://data/iceberg/generated_spec1_0_001/pyspark_iceberg_table/data/00000-5-bd694195-a731-4121-be17-0a6b13d4e9fb-00001.parquet"
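
For illustration only, the sketch below runs the path from the error message through Python's standard `urllib.parse.urlsplit`. It is not how DuckDB parses paths. It only shows how a generic URL parser reads a `file://` prefix followed by a relative path: the first path segment ends up in the host position.

```python
from urllib.parse import urlsplit

# Path copied from the error message above.
url = (
    "file://data/iceberg/generated_spec1_0_001/pyspark_iceberg_table/data/"
    "00000-5-bd694195-a731-4121-be17-0a6b13d4e9fb-00001.parquet"
)

parts = urlsplit(url)

# A generic URL parser treats whatever follows "//" up to the next "/"
# as the host, so the leading directory "data" is read as a host name.
print("scheme:", parts.scheme)
print("host:  ", parts.netloc)
print("path:  ", parts.path)
```

Here `parts.netloc` is `data` and `parts.path` starts at `/iceberg/...`, so simply stripping the scheme and host would not recover the intended relative path.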
This is tested with data from the https://github.com/duckdb/duckdb_iceberg/ repository
OS:
PopOS 22.04
DuckDB Version:
1.0.0
DuckDB Client:
CLI
Full Name:
Teague Sterling
Affiliation:
23andMe
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a source build
Did you include all relevant data sets for reproducing the issue?
No - Other reason (please specify in the issue body)
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?