Add support for HuggingFace to httpfs #11831

Merged on May 13, 2024 (16 commits)
allow spaces urls, add tests, fix ci
samansmink committed May 2, 2024
commit db624d6cbf5d7a379d26d9657fd623f160e4e73d
2 changes: 1 addition & 1 deletion extension/httpfs/hffs.cpp
@@ -341,7 +341,7 @@ ParsedHFUrl HuggingFaceFileSystem::HFUrlParse(const string &url) {
 		ThrowParseError(url);
 	}
 	result.repo_type = url.substr(last_delim, curr_delim - last_delim);
-	if (result.repo_type != "datasets") {
+	if (result.repo_type != "datasets" && result.repo_type != "spaces") {
 		throw IOException("Failed to parse: '%s'. Currently DuckDB only supports querying datasets, so the url should "
 		                  "start with 'hf://datasets'",
 		                  url);
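
For readers following the hunk above, the repo-type check can be sketched in isolation as follows. This is a minimal standalone illustration, not the DuckDB implementation: `ExtractRepoType` is a hypothetical helper, it uses `std::invalid_argument` in place of DuckDB's `IOException`, and it assumes the repo type is the first path segment after `hf://`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of DuckDB): returns the repo type, i.e. the
// first path segment after "hf://", and rejects anything other than
// "datasets" or "spaces", mirroring the condition changed in this commit.
std::string ExtractRepoType(const std::string &url) {
    const std::string prefix = "hf://";
    if (url.compare(0, prefix.size(), prefix) != 0) {
        throw std::invalid_argument("URL must start with 'hf://': " + url);
    }
    // Same idea as last_delim / curr_delim in HFUrlParse: the repo type is
    // the text between the end of the prefix and the next '/'.
    const std::string::size_type last_delim = prefix.size();
    const std::string::size_type curr_delim = url.find('/', last_delim);
    if (curr_delim == std::string::npos) {
        throw std::invalid_argument("Missing repository path in: " + url);
    }
    const std::string repo_type = url.substr(last_delim, curr_delim - last_delim);
    if (repo_type != "datasets" && repo_type != "spaces") {
        throw std::invalid_argument("Unsupported repo type '" + repo_type + "' in: " + url);
    }
    return repo_type;
}
```

Under these assumptions, `hf://spaces/samansmink/duckdb_ci_tests/README.md` yields `spaces`, while a URL with any other first segment is rejected.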
1 change: 1 addition & 0 deletions src/include/duckdb/main/extension_entries.hpp
@@ -252,6 +252,7 @@ static constexpr ExtensionEntry EXTENSION_SETTINGS[] = {
     {"calendar", "icu"},
     {"enable_server_cert_verification", "httpfs"},
     {"force_download", "httpfs"},
+    {"hf_max_per_page", "httpfs"},
     {"http_keep_alive", "httpfs"},
     {"http_retries", "httpfs"},
     {"http_retry_backoff", "httpfs"},
6 changes: 6 additions & 0 deletions test/sql/httpfs/hffs.test_slow
@@ -88,6 +88,12 @@ FROM parquet_scan('hf://datasets/samansmink/duckdb_ci_private/hive_data/**/*.par
 ----
 401

+# Ensure spaces work too
+query I
+select size from read_text('hf://spaces/samansmink/duckdb_ci_tests/README.md');
+----
+199
+
 # FIXME: push auth key into CI for this to ensure it is tested in CI properly
 require-env HUGGING_FACE_TOKEN