Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

custom_extension_repository over https:// #9818

Closed
wants to merge 13 commits into from

Conversation

carlopi
Copy link
Contributor

@carlopi carlopi commented Nov 27, 2023

This PR allows to make extension over https protocol work.

Fixes #9818

Demo is like:
git checkout 0fd6fb9198 && GEN=ninja make && wget https://carlopi.github.io/extension-repository/0fd6fb9198/osx_arm64/httpfs.duckdb_extension
To build the duckdb binary + download httpfs extension locally (otherwise also building it it's fine).

Then:

carlo@ScroogeMcDuck duckdb2 % ./build/release/duckdb -unsigned      
v0.9.3-dev897 0fd6fb9198
Enter ".help" for usage hints.
D SET extension_directory = 'empty-extension-folder';
D SET custom_extension_repository='https://carlopi.github.io/extension-repository';
Error: Invalid Input Error: httpfs must be already locally installed before changing to a custom_extension_repo that requires httpfs to be accessed
D INSTALL 'httpfs.duckdb_extension';
D SET custom_extension_repository='https://carlopi.github.io/extension-repository';
D FORCE INSTALL icu;
Error: IO Error: Failed to download extension "icu" at either of:
     "https://carlopi.github.io/extension-repository/847c783e05/osx_arm64/icu.duckdb_extension.gz" (gzip compressed version) or
     "https://carlopi.github.io/extension-repository/847c783e05/osx_arm64/icu.duckdb_extension" (no encoding)

'LOAD httpfs' might help.
D LOAD httpfs;
D FORCE INSTALL icu; LOAD icu;
D FORCE INSTALL tpch; LOAD tpch;
D FORCE INSTALL tpcds; LOAD tpcds;
D FORCE INSTALL zzz;
Error: IO Error: Failed to download extension "zzz" at either of:
     "https://carlopi.github.io/extension-repository/0fd6fb9198/osx_arm64/zzz.duckdb_extension.gz" (gzip compressed version) or
     "https://carlopi.github.io/extension-repository/0fd6fb9198/osx_arm64/zzz.duckdb_extension" (no encoding)

'LOAD httpfs' might help.

All but the httpfs extension (that is loaded from the locally build folder) are loaded over https from the GitHub Page hosted at https://carlopi.github.io/extension-repository, that can only accessed over https, backed by the repository with this content: https://github.com/carlopi/extension-repository/tree/main/0fd6fb9198/osx_arm64.

Extension are first looked up in the gzip-compressed version, and if not present in the non-compressed one.
This holds both for when custom_extension_repository is a remote OR local directory.

See for example:

carlo@ScroogeMcDuck duckdb2 % ./build/release/duckdb -unsigned
v0.9.3-dev897 0fd6fb9198
Enter ".help" for usage hints.
D SET custom_extension_repository= 'extension-repository';
D FORCE INSTALL tpch; LOAD tpch;
D FORCE INSTALL tpcds; LOAD tpcds;
D FORCE INSTALL icu; LOAD icu;
D FORCE INSTALL zzz;
Error: IO Error: Failed to copy local extension "zzz" at either of:
     "extension-repository/0fd6fb9198/osx_arm64/zzz.duckdb_extension.gz" (gzip compressed version) or
     "extension-repository/0fd6fb9198/osx_arm64/zzz.duckdb_extension" (no encoding)

Where extensions works if at least one of the files name.duckdb_extension.gz or name.duckdb_extension are present at the expected path.

To be done, but not part of this PR:

  • Enabling autoloading for httpfs extension (currently there is some missing context, might need to rework interface a bit, to be reviewed)
  • Allow also over https to send user-agent string (currently blocked on exposing arbitrary headers on httpfs filesystem
  • Allowing streamlined publishing of extension-template based repositories over httpfs.

This is still allowing to host repositories on https://, with the only caveat that httpfs extension has to be either already present on the system (being installed before hand from the official extension repository).

This also allows to simplify building and using of local extensions, given that once passed SET custom_extension_repository='build/release', (and with the infrastructure discussed in #9800), it would be possible to make autoloading and autoinstall work in a similar way to what users will experience. This allows simplifications of infrastructure.

…ing.org/'

To be allowed to set CustomExtensionRepository to a remote extension repository requiring https or s3 protocol
httpfs to be available, so for this to succeed httpfs needs to be:
      1. already loaded (potentially built-in)
      2. available to be installed at the local_extension_path

The problem is that allowing to change the custom_extension_repository to something needing httpfs allows users
  to corner themselves out.
Potentially this can be relaxed, but given until serving extensions over https:// was not really working, we can as
  well be strict and relax this later.
There are, as of now, no tests for this, but for the fact that the rest
of the system keeps working
@carlopi carlopi added the Needs Documentation Use for issues or PRs that require changes in the documentation label Nov 27, 2023
@carlopi carlopi requested a review from samansmink November 27, 2023 16:09
@github-actions github-actions bot marked this pull request as draft November 27, 2023 16:28
Copy link
Contributor

@samansmink samansmink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! one veeery minor nitpick

src/main/extension/extension_install.cpp Outdated Show resolved Hide resolved
@carlopi carlopi marked this pull request as ready for review November 27, 2023 16:50
@github-actions github-actions bot marked this pull request as draft November 28, 2023 08:26
@carlopi carlopi marked this pull request as ready for review November 28, 2023 08:26
carlopi added a commit to carlopi/duckdb-web that referenced this pull request Nov 28, 2023
carlopi added a commit to carlopi/duckdb-web that referenced this pull request Nov 28, 2023
@carlopi
Copy link
Contributor Author

carlopi commented Nov 28, 2023

I think this can be merged, can be polished a bit by allowing autoloading of httpfs, but that's also OK in a separate step since in any case this expand what was previously possible, fixing some corner cases in the process.

While writing the docs I also realised that we are not erroring out nicely on:

LOAD 'https://carlopi.github.io/extension-repository/0fd6fb9198/osx_arm64/json.duckdb_extension';

szarnyasg added a commit to duckdb/duckdb-web that referenced this pull request Nov 28, 2023
@github-actions github-actions bot marked this pull request as draft November 29, 2023 14:02
@carlopi carlopi marked this pull request as ready for review November 29, 2023 14:05
@carlopi
Copy link
Contributor Author

carlopi commented Nov 29, 2023

Autoloading of httpfs (or any other extension required by file system), has been handled, even though the used API for doing this might need to be reviewed.

@carlopi carlopi added Draft and removed Needs Documentation Use for issues or PRs that require changes in the documentation Ready To Merge labels Nov 29, 2023
@carlopi
Copy link
Contributor Author

carlopi commented Nov 29, 2023

There is a funny corner case to be handled, fixing reentrancy.

@github-actions github-actions bot marked this pull request as draft November 30, 2023 11:10
@carlopi carlopi marked this pull request as ready for review November 30, 2023 11:11
Idea is that it should not be possible to ever go back again in the InstallExtension
codepath.
@github-actions github-actions bot marked this pull request as draft November 30, 2023 16:11
@carlopi carlopi marked this pull request as ready for review November 30, 2023 16:11
@carlopi carlopi removed the Draft label Nov 30, 2023
@github-actions github-actions bot marked this pull request as draft December 1, 2023 08:32
@carlopi carlopi marked this pull request as ready for review December 1, 2023 08:33
@carlopi carlopi added the Draft label Dec 1, 2023
Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Comments below

void ExtensionHelper::InstallExtension(DBConfig &config, FileSystem &fs, const string &extension, bool force_install,
const string &repository) {
ReentrancyLock reentrant_lock;
if (!reentrant_lock.LockIsValid()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can disable auto-loading here entirely (like it is currently) so that we don't have to deal with re-entrancy either?

// Either extension OR repository might require some extensions to be loaded
if (!StringUtil::StartsWith(extension, "http://")) {
// Passing false as last parameter to avoid reentrancy issues
TryAutoLoadExtension(context, FileSystem::LookupExtensionForPattern(extension), false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove the auto-loading here as this is introducing a lot of complexity - let's instead fail with a nicer error message if httpfs is not loaded

}
}
auto read_handle = fs.OpenFile(file, FileFlags::FILE_FLAGS_READ);
auto test_data = std::unique_ptr<unsigned char[]> {new unsigned char[read_handle->GetFileSize()]};
read_handle->Read(test_data.get(), read_handle->GetFileSize());
WriteExtensionFileToDisk(fs, temp_path, (void *)test_data.get(), read_handle->GetFileSize());

if (StringUtil::EndsWith(file, ".gz")) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be unified with the code that does this below?

// Potentially this can be relaxed, but given until serving extensions over https:// was not really working, we can
// as
// well be strict and relax this later.
if (!context.db->ExtensionIsLoaded("httpfs")) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better for this to fail during the installation, and not when setting the option

Copy link

github-actions bot commented Apr 2, 2024

This pull request is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale label Apr 2, 2024
@carlopi
Copy link
Contributor Author

carlopi commented May 24, 2024

Solved/superseeded by #11677

@carlopi carlopi closed this May 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants