PEP: 517
Title: A build-system independent format for source trees
Version:
While distutils / setuptools have taken us a long way, they
suffer from three serious problems: (a) they're missing important
features like usable build-time dependency declaration,
autoconfiguration, and even basic ergonomic niceties like DRY-compliant
version number management, and (b) extending them is difficult, so
while there do exist various solutions to the above problems, they're
often quirky, fragile, and expensive to maintain, and yet (c) it's
very difficult to use anything else, because distutils/setuptools
provide the standard interface for installing packages expected by
both users and installation tools like pip.
Previous efforts (e.g. distutils2 or setuptools itself) have attempted to solve problems (a) and/or (b). This proposal aims to solve (c).
The goal of this PEP is to get distutils-sig out of the business of being a gatekeeper for Python build systems. If you want to use distutils, great; if you want to use something else, then that should be easy to do using standardized methods. The difficulty of interfacing with distutils means that there aren't many such systems right now, but to give a sense of what we're thinking about, see flit or bento. Fortunately, wheels have now solved many of the hard problems here -- e.g. it's no longer necessary that a build system also know about every possible installation configuration -- so pretty much all we really need from a build system is that it have some way to spit out standard-compliant wheels and sdists.
We therefore propose a new, relatively minimal interface for
installation tools like pip to interact with package source trees
and source distributions.

A source tree is something like a VCS checkout. We need a standard
interface for installing from this format, to support usages like
pip install some-directory/.
A source distribution is a static snapshot representing a particular
release of some source code, like lxml-3.4.4.zip. Source
distributions serve many purposes: they form an archival record of
releases, they provide a stupid-simple de facto standard for tools
that want to ingest and process large corpora of code, possibly
written in many languages (e.g. code search), they act as the input to
downstream packaging systems like Debian/Fedora/Conda/..., and so
forth. In the Python ecosystem they additionally have a particularly
important role to play, because packaging tools like pip are able
to use source distributions to fulfill binary dependencies, e.g. if
there is a distribution foo.whl which declares a dependency on
bar, then we need to support the case where pip install bar or
pip install foo automatically locates the sdist for bar,
downloads it, builds it, and installs the resulting package. Source
distributions are also known as sdists for short.
A build frontend is a tool that users might run that takes arbitrary
source trees or source distributions and builds wheels from them. The
actual building is done by each source tree's build backend. In a
command like pip wheel some-directory/, pip is acting as a build
frontend.
An integration frontend is a tool that users might run that takes a
set of package requirements (e.g. a requirements.txt file) and
attempts to update a working environment to satisfy those
requirements. This may require locating, building, and installing a
combination of wheels and sdists. In a command like
pip install lxml==2.4.0, pip is acting as an integration frontend.
There is an existing, legacy source tree format involving
setup.py. We don't try to specify it further; its de facto
specification is encoded in the source code and documentation of
distutils, setuptools, pip, and other tools. We'll refer
to it as the setup.py-style.
Here we define a new pypackage.json-style source tree. This
consists of any directory which contains a file named
pypackage.json. (If a tree contains both pypackage.json and
setup.py then it is a pypackage.json-style source tree, and
pypackage.json-aware tools should ignore the setup.py; this
allows packages to include a setup.py for compatibility with old
build frontends, while using the new system with new build frontends.)
This file has the following schema. Extra keys are ignored.
- schema: The version of the schema. This PEP defines version "1". Defaults to "1" when absent. All tools reading the file MUST error on an unrecognised schema version.
- bootstrap_requires: Optional list of PEP 508 dependency specifications that the build frontend must ensure are available before invoking the build backend. For instance, if using flit, then the requirements might be:
"bootstrap_requires": ["flit"]
- build_backend: A mandatory string naming a Python object that will be used to perform the build (see below for details). This is formatted following the same module:object syntax as a setuptools entry point. For instance, if using flit, then the build backend might be specified as:

"build_backend": "flit.api:main"

and this object would be looked up by executing the equivalent of:

import flit.api
backend = flit.api.main

It's also legal to leave out the :object part, e.g.

"build_backend": "flit.api"

which acts like:

import flit.api
backend = flit.api
Formally, the string should satisfy this grammar:

identifier = (letter | '_') (letter | '_' | digit)*
module_path = identifier ('.' identifier)*
object_path = identifier ('.' identifier)*
entry_point = module_path (':' object_path)?
And we import module_path and then look up
module_path.object_path (or just module_path if object_path
is missing).
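To illustrate the lookup rule, here is a minimal, non-normative sketch of how a build frontend might resolve this string into the backend object (the helper name load_build_backend is invented for this example):

import importlib

def load_build_backend(entry_point):
    # Hypothetical helper: split "module_path:object_path" and resolve it.
    module_path, _, object_path = entry_point.partition(":")
    backend = importlib.import_module(module_path)
    for attr in filter(None, object_path.split(".")):
        backend = getattr(backend, attr)
    return backend

# e.g. load_build_backend("flit.api:main") behaves like
# "import flit.api; backend = flit.api.main"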
The build backend object is expected to have attributes which provide
some or all of the following hooks. The common config_settings
argument is described after the individual hooks:
def get_build_requires(config_settings): ...
This hook MUST return an additional list of strings containing PEP 508
dependency specifications, above and beyond those specified in the
pypackage.json file. Example:

def get_build_requires(config_settings):
    return ["wheel >= 0.25", "setuptools"]
Optional. If not defined, the default implementation is equivalent to
return [].
def get_wheel_metadata(metadata_directory, config_settings): ...
Must create a .dist-info directory containing wheel metadata
inside the specified metadata_directory (i.e., creates a directory
like {metadata_directory}/{package}-{version}.dist-info/). This
directory MUST be a valid .dist-info directory as defined in the
wheel specification, except that it need not contain RECORD or
signatures. The hook MAY also create other files inside this
directory, and a build frontend MUST ignore such files; the intention
here is that in cases where the metadata depends on build-time
decisions, the build backend may need to record these decisions in
some convenient format for re-use by the actual wheel-building step.

Return value is ignored.
Optional. If a build frontend needs this information and the method is
not defined, it should call build_wheel
and look at the resulting
metadata directly.
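As a non-normative illustration of the sort of thing a backend might do here, the following sketch writes out a minimal dist-info directory by hand (the package name, version, and file contents are invented for the example):

import os

def get_wheel_metadata(metadata_directory, config_settings):
    # Hypothetical backend: emit a bare-bones dist-info directory.
    dist_info = os.path.join(metadata_directory, "mypackage-0.1.dist-info")
    os.makedirs(dist_info)
    with open(os.path.join(dist_info, "METADATA"), "w") as f:
        f.write("Metadata-Version: 1.2\n"
                "Name: mypackage\n"
                "Version: 0.1\n")
    with open(os.path.join(dist_info, "WHEEL"), "w") as f:
        f.write("Wheel-Version: 1.0\n"
                "Generator: mypackage_custom_build_backend\n"
                "Root-Is-Purelib: true\n")
    # The return value is ignored by the build frontend.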
def build_wheel(wheel_directory, config_settings, metadata_directory=None): ...
Must build a .whl file, and place it in the specified
wheel_directory.
If the build frontend has previously called get_wheel_metadata and
depends on the wheel resulting from this call to have metadata
matching this earlier call, then it should provide the path to the
previous metadata_directory as an argument. If this argument is
provided, then build_wheel MUST produce a wheel with identical
metadata. The directory passed in by the build frontend MUST be
identical to the directory created by get_wheel_metadata,
including any unrecognized files it created.
Mandatory.
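For concreteness, here is a non-normative sketch of how a frontend might chain the two hooks, reusing the metadata directory from the earlier call (the helper name resolve_then_build is invented for this example, and it assumes the backend defines both hooks):

import tempfile

def resolve_then_build(backend, wheel_directory, config_settings):
    # Hypothetical frontend helper.
    metadata_directory = tempfile.mkdtemp()
    backend.get_wheel_metadata(metadata_directory, config_settings)
    # ... inspect the .dist-info directory, resolve dependencies, etc. ...
    # Hand the same directory back so the backend can reuse any build-time
    # decisions it recorded there.
    backend.build_wheel(wheel_directory, config_settings,
                        metadata_directory=metadata_directory)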
def install_editable(prefix, config_settings, metadata_directory=None): ...
Must perform whatever actions are necessary to install the current
project into the Python installation at prefix in an
"editable" fashion. This is intentionally underspecified, because it's
included as a stopgap to avoid regressing compared to the current
equally underspecified setuptools develop
command; hopefully a
future PEP will replace this hook with something that works better and
is better specified. (Unfortunately, cleaning up editable installs to
actually work well and be well-specified turns out to be a large and
difficult job, so we prefer not to do a half-way job here.)
For the meaning and requirements of the metadata_directory
argument, see build_wheel
above.
[XX UNRESOLVED: it isn't entirely clear whether prefix
alone is
enough to support all needed configurations -- in particular,
@takluyver has suggested that contra to the distutils docs, --user
on Windows is not expressible in terms of a regular prefix install.]
Optional. If not defined, then this build backend does not support editable builds.
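Since the hook is underspecified, any example is necessarily speculative; but as a rough sketch, a pure-Python backend might implement it by dropping a .pth file pointing at the source tree into the target installation, which is roughly what setup.py develop achieves (the package name and source layout below are invented for the example):

import os
import sysconfig

def install_editable(prefix, config_settings, metadata_directory=None):
    # Hypothetical, simplistic sketch: add the source directory to the
    # target installation's sys.path via a .pth file.
    site_packages = sysconfig.get_paths(
        vars={"base": prefix, "platbase": prefix})["purelib"]
    os.makedirs(site_packages, exist_ok=True)
    with open(os.path.join(site_packages, "mypackage.pth"), "w") as f:
        f.write(os.path.abspath("src") + "\n")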
config_settings
This argument, which is passed to all hooks, is an arbitrary
dictionary provided as an "escape hatch" for users to pass ad-hoc
configuration into individual package builds. Build backends MAY
assign any semantics they like to this dictionary. Build frontends
SHOULD provide some mechanism for users to specify arbitrary
string-key/string-value pairs to be placed in this dictionary. For
example, they might support some syntax like --package-config
CC=gcc. Build frontends MAY also provide arbitrary other mechanisms
for users to place entries in this dictionary. For example, pip
might choose to map a mix of modern and legacy command line arguments
like:
pip install \
    --package-config CC=gcc \
    --global-option="--some-global-option" \
    --build-option="--build-option1" \
    --build-option="--build-option2"
into a config_settings
dictionary like:
{ "CC": "gcc", "--global-option": ["--some-global-option"], "--build-option": ["--build-option1", "--build-option2"], }
Of course, it's up to users to make sure that they pass options which make sense for the particular build backend and package that they are building.
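And on the backend side, a hedged sketch of how a backend might choose to consume such a dictionary (the "CC" and "--build-option" keys are just the conventions from the example above, not something this PEP defines):

def build_wheel(wheel_directory, config_settings, metadata_directory=None):
    # Hypothetical backend that honours a couple of ad-hoc settings.
    settings = config_settings or {}
    compiler = settings.get("CC", "cc")
    extra_flags = settings.get("--build-option", [])
    # ... invoke `compiler` with `extra_flags`, then assemble the wheel ...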
All hooks are run with working directory set to the root of the source tree, and MAY print arbitrary informational text on stdout and stderr. They MUST NOT read from stdin, and the build frontend MAY close stdin before invoking the hooks.
If a hook raises an exception, or causes the process to terminate, then this indicates an error.
One of the responsibilities of a build frontend is to set up the Python environment in which the build backend will run.
We do not require that any particular "virtual environment" mechanism be used; a build frontend might use virtualenv, or venv, or no special mechanism at all. But whatever mechanism is used MUST meet the following criteria:
All requirements specified by the project's build-requirements must be available for import from Python. In particular:
- The get_build_requires hook is executed in an environment which contains the bootstrap requirements specified in the pypackage.json file.
- All other hooks are executed in an environment which contains both the bootstrap requirements specified in the pypackage.json file and those specified by the get_build_requires hook.
This must remain true even for new Python subprocesses spawned by the build environment, e.g. code like:
import sys, subprocess
subprocess.check_call([sys.executable, ...])
must spawn a Python process which has access to all the project's
build-requirements. This is necessary e.g. for build backends that
want to run legacy setup.py scripts in a subprocess.

All command-line scripts provided by the build-required packages must be present in the build environment's PATH. For example, if a project declares a build-requirement on flit, then the following must work as a mechanism for running the flit command-line tool:
import subprocess
subprocess.check_call(["flit", ...])
A build backend MUST be prepared to function in any environment which meets the above criteria. In particular, it MUST NOT assume that it has access to any packages except those that are present in the stdlib, or that are explicitly declared as build-requirements.
A build frontend MAY use any mechanism for setting up a build environment that meets the above criteria. For example, simply installing all build-requirements into the global environment would be sufficient to build any compliant package -- but this would be sub-optimal for a number of reasons. This section contains non-normative advice to frontend implementors.
A build frontend SHOULD, by default, create an isolated environment for each build, containing only the standard library and any explicitly requested build-dependencies. This has two benefits:
- It allows for a single installation run to build multiple packages that have contradictory build-requirements. E.g. if package1 build-requires pbr==1.8.1, and package2 build-requires pbr==1.7.2, then these cannot both be installed simultaneously into the global environment -- which is a problem when the user requests pip install package1 package2. Or if the user already has pbr==1.8.1 installed in their global environment, and a package build-requires pbr==1.7.2, then downgrading the user's version would be rather rude.
- It acts as a kind of public health measure to maximize the number of packages that actually do declare accurate build-dependencies. We can write all the strongly worded admonitions to package authors we want, but if build frontends don't enforce isolation by default, then we'll inevitably end up with lots of packages on PyPI that build fine on the original author's machine and nowhere else, which is a headache that no-one needs.
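As a purely illustrative sketch of one way a frontend could implement this recommendation, using the standard venv module and pip (the helper name is invented for the example):

import os
import subprocess
import sys
import tempfile
import venv

def make_build_environment(bootstrap_requires):
    # Hypothetical helper: a throwaway environment containing only the
    # stdlib plus the declared build requirements.
    env_dir = tempfile.mkdtemp()
    venv.create(env_dir, with_pip=True)
    if sys.platform == "win32":
        python = os.path.join(env_dir, "Scripts", "python.exe")
    else:
        python = os.path.join(env_dir, "bin", "python")
    if bootstrap_requires:
        subprocess.check_call([python, "-m", "pip", "install"]
                              + list(bootstrap_requires))
    return env_dir, python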
However, there will also be situations where build-requirements are
problematic in various ways. For example, a package author might
accidentally leave off some crucial requirement despite our best
efforts; or, a package might declare a build-requirement on foo >=
1.0 which worked great when 1.0 was the latest version, but now 1.1
is out and it has a showstopper bug; or, the user might decide to
build a package against numpy==1.7 -- overriding the package's
preferred numpy==1.8 -- to guarantee that the resulting build will be
compatible at the C ABI level with an older version of numpy (even if
this means the resulting build is unsupported upstream). Therefore,
build frontends SHOULD provide some mechanism for users to override
the above defaults. For example, a build frontend could have a
--build-with-system-site-packages
option that causes the
--system-site-packages
option to be passed to
virtualenv-or-equivalent when creating build environments, or a
--build-requirements-override=my-requirements.txt
option that
overrides the project's normal build-requirements.
The general principle here is that we want to enforce hygiene on package authors, while still allowing end-users to open up the hood and apply duct tape when necessary.
For now, we continue with the legacy sdist format which is mostly
undefined, but basically comes down to: a file named
{NAME}-{VERSION}.{EXT}, which unpacks into a buildable source tree
called {NAME}-{VERSION}/. Traditionally these have always
contained setup.py-style source trees; we now allow them to also
contain pypackage.json-style source trees.
Integration frontends require that an sdist named
{NAME}-{VERSION}.{EXT} will generate a wheel named
{NAME}-{VERSION}-{COMPAT-INFO}.whl.
The primary difference between this and competing proposals is that our build backend is defined via a Python hook-based interface rather than a command-line based interface.
We do not expect that this will, by itself, intrinsically reduce the
complexity of calling into the backend, because build frontends will
in any case want to run hooks inside a child process -- this is important to
isolate the build frontend itself from the backend code and to better
control the build backend's execution environment. So under both
proposals, there will need to be some code in pip to spawn a
subprocess and talk to some kind of command-line/IPC interface, and
there will need to be some code in the subprocess that knows how to
parse these command line arguments and call the actual build backend
implementation. So this diagram applies to all proposals equally:
+-----------+          +---------------+           +----------------+
| frontend  | -spawn-> | child cmdline | -Python-> |    backend     |
|   (pip)   |          |   interface   |           | implementation |
+-----------+          +---------------+           +----------------+
The key difference between the two approaches is how these interface boundaries map onto project structure:
                           .-= This PEP =-.

+-----------+          +---------------+     |     +----------------+
| frontend  | -spawn-> | child cmdline | -Python-> |    backend     |
|   (pip)   |          |   interface   |     |     | implementation |
+-----------+          +---------------+     |     +----------------+
                                              |
 |____________________________________________|
      Owned by pip, updated in lockstep       |
                                              |
                                              |
                    PEP-defined interface boundary
                  Changes here require distutils-sig


       .-= Alternative =-.

+-----------+    |     +---------------+          +----------------+
| frontend  | -spawn-> | child cmdline | -Python-> |    backend     |
|   (pip)   |    |     |   interface   |          | implementation |
+-----------+    |     +---------------+          +----------------+
                 |
                 |     |____________________________________________|
                 |      Owned by build backend, updated in lockstep
                 |
  PEP-defined interface boundary
Changes here require distutils-sig
By moving the PEP-defined interface boundary into Python code, we gain three key advantages.
First, because there will likely be only a small number of build
frontends (pip, and... maybe a few others?), while there will
likely be a long tail of custom build backends (since these are chosen
separately by each package to match their particular build
requirements), the actual diagrams probably look more like:
              .-= This PEP =-.

+-----------+          +---------------+           +----------------+
| frontend  | -spawn-> | child cmdline | -Python+> |    backend     |
|   (pip)   |          |   interface   |        |  | implementation |
+-----------+          +---------------+        |  +----------------+
                                                 |
                                                 |  +----------------+
                                                 +> |    backend     |
                                                 |  | implementation |
                                                 |  +----------------+
                                                 :
                                                 :

             .-= Alternative =-.

+-----------+          +---------------+           +----------------+
| frontend  | -spawn+> | child cmdline | -Python-> |    backend     |
|   (pip)   |       |  |   interface   |           | implementation |
+-----------+       |  +---------------+           +----------------+
                    |
                    |  +---------------+           +----------------+
                    +> | child cmdline | -Python-> |    backend     |
                    |  |   interface   |           | implementation |
                    |  +---------------+           +----------------+
                    :
                    :
That is, this PEP leads to less total code in the overall ecosystem. And in particular, it reduces the barrier to entry of making a new build system. For example, this is a complete, working build backend:
# mypackage_custom_build_backend.py
import os.path

def get_build_requires(config_settings):
    return ["wheel"]

def build_wheel(wheel_directory, config_settings, metadata_directory=None):
    from wheel.archive import archive_wheelfile
    path = os.path.join(wheel_directory, "mypackage-0.1-py2.py3-none-any")
    archive_wheelfile(path, "src/")
Of course, this is a terrible build backend: it requires the user to
have manually set up the wheel metadata in
src/mypackage-0.1.dist-info/
; when the version number changes it
must be manually updated in multiple places; it doesn't implement the
metadata or develop hooks, ... but it works, and these features can be
added incrementally. Much experience suggests that large successful
projects often originate as quick hacks (e.g., Linux -- "just a hobby,
won't be big and professional"; IPython/Jupyter -- a grad
student's $PYTHONSTARTUP file, described at
http://blog.fperez.org/2012/01/ipython-notebook-historical.html),
so if our goal is to encourage the growth of a vibrant ecosystem of
good build tools, it's important to minimize the barrier to entry.
Second, because Python provides a simpler yet richer structure for
describing interfaces, we remove unnecessary complexity from the
specification -- and specifications are the worst place for
complexity, because changing specifications requires painful
consensus-building across many stakeholders. In the command-line
interface approach, we have to come up with ad hoc ways to map
multiple different kinds of inputs into a single linear command line
(e.g. how do we avoid collisions between user-specified configuration
arguments and PEP-defined arguments? how do we specify optional
arguments? when working with a Python interface these questions have
simple, obvious answers). When spawning and managing subprocesses,
there are many fiddly details that must be gotten right, subtle
cross-platform differences, and some of the most obvious approaches --
e.g., using stdout to return data for the build_requires
operation
-- can create unexpected pitfalls (e.g., what happens when computing
the build requirements requires spawning some child processes, and
these children occasionally print an error message to stdout?
obviously a careful build backend author can avoid this problem, but
the most obvious way of defining a Python interface removes this
possibility entirely, because the hook return value is clearly
demarcated).
In general, the need to isolate build backends into their own process means that we can't remove IPC complexity entirely -- but by placing both sides of the IPC channel under the control of a single project, we make it much much cheaper to fix bugs in the IPC interface than if fixing bugs requires coordinated agreement and coordinated changes across the ecosystem.
Third, and most crucially, the Python hook approach gives us much more powerful options for evolving this specification in the future.
For concreteness, imagine that next year we add a new
install_editable2
hook, which replaces the current
install_editable
hook with something better specified. In order to
manage the transition, we want it to be possible for build frontends
to transparently use install_editable2
when available and fall
back onto install_editable
otherwise; and we want it to be
possible for build backends to define both methods, for compatibility
with both old and new build frontends.
Furthermore, our mechanism should also fulfill two more goals: (a) If
new versions of e.g. pip
and flit
are both updated to support
the new interface, then this should be sufficient for it to be used;
in particular, it should not be necessary for every project that
uses flit
to update its individual pypackage.json
file. (b)
We do not want to have to spawn extra processes just to perform this
negotiation, because process spawns can easily become a bottleneck when
deploying large multi-package stacks on some platforms (Windows).
In the interface described here, all of these goals are easy to
achieve. Because pip
controls the code that runs inside the child
process, it can easily write it to do something like:
command, backend, args = parse_command_line_args(...)
if command == "do_editable_install":
    if hasattr(backend, "install_editable2"):
        backend.install_editable2(...)
    elif hasattr(backend, "install_editable"):
        backend.install_editable(...)
    else:
        # error handling
In the alternative where the public interface boundary is placed at
the subprocess call, this is not possible -- either we need to spawn
an extra process just to query what interfaces are supported (as was
included in an earlier version of this alternative PEP), or
else we give up on autonegotiation entirely (as in the current version
of that PEP), meaning that any changes in the interface will require
N individual packages to update their pypackage.json
files before
any change can go live, and that any changes will necessarily be
restricted to new releases.
One specific consequence of this is that in this PEP, we're able to
make the get_wheel_metadata
command optional. In our design, this
can easily be worked around by a tool like pip
, which can put code
in its subprocess runner like:
def get_wheel_metadata(output_dir, config_settings):
    if hasattr(backend, "get_wheel_metadata"):
        backend.get_wheel_metadata(output_dir, config_settings)
    else:
        backend.build_wheel(output_dir, config_settings)
        touch(output_dir / "PIP_ALREADY_BUILT_WHEELS")
        unzip_metadata(output_dir/*.whl)

def build_wheel(output_dir, config_settings, metadata_dir):
    if os.path.exists(metadata_dir / "PIP_ALREADY_BUILT_WHEELS"):
        copy(metadata_dir / *.whl, output_dir)
    else:
        backend.build_wheel(output_dir, config_settings, metadata_dir)
and thus expose a totally uniform interface to the rest of pip,
with no extra subprocess calls, no duplicated builds, etc. But
obviously this is the kind of code that you only want to write as part
of a private, within-project interface.
(And, of course, making the metadata
command optional is one piece
of lowering the barrier to entry, as discussed above.)
Besides the key command line versus Python hook difference described above, there are a few other differences in this proposal:
- Metadata command is optional (as described above).
- We return metadata as a directory, rather than a single METADATA file. This aligns better with the way that in practice wheel metadata is distributed across multiple files (e.g. entry points), and gives us more options in the future. (For example, instead of following the PEP 426 proposal of switching the format of METADATA to JSON, we might decide to keep the existing METADATA the way it is for backcompat, while adding new extensions as JSON "sidecar" files inside the same directory. Or maybe not; the point is it keeps our options more open.)
- We provide a mechanism for passing information between the metadata step and the wheel building step. I guess everyone probably will agree this is a good idea?
- We call our config file pypackage.json instead of pypa.json. This is because it describes a package, rather than describing a packaging authority. But really, who cares.
- We provide more detailed recommendations about the build environment, but these aren't normative anyway.
A goal here is to make it as simple as possible to convert old-style sdists to new-style sdists. (E.g., this is one motivation for supporting dynamic build requirements.) The ideal would be that there would be a single static pypackage.json that could be dropped into any "version 0" VCS checkout to convert it to the new shiny. This is probably not 100% possible, but we can get close, and it's important to keep track of how close we are... hence this section.
A rough plan would be: Create a build system package
(setuptools_pypackage or whatever) that knows how to speak
whatever hook language we come up with, and convert those hooks into calls to
setup.py. This will probably require some sort of hooking or
. This will probably require some sort of hooking or
monkeypatching to setuptools to provide a way to extract the
setup_requires=
argument when needed, and to provide a new version
of the sdist command that generates the new-style format. This all
seems doable and sufficient for a large proportion of packages (though
obviously we'll want to prototype such a system before we finalize
anything here). (Alternatively, these changes could be made to
setuptools itself rather than going into a separate package.)
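To make the shape of such a shim concrete, here is a rough, non-normative sketch of what the hook side might look like (the module name and option handling are invented for the example; a real implementation would also need the setuptools hooking described above):

# setuptools_pypackage.py (hypothetical shim backend)
import subprocess
import sys

def get_build_requires(config_settings):
    # A real shim would extract setup_requires= from setup.py (e.g. by
    # hooking setuptools); this placeholder declares nothing extra.
    return []

def build_wheel(wheel_directory, config_settings, metadata_directory=None):
    # Delegate to the legacy setup.py, asking it to drop the wheel into
    # the directory the frontend requested.
    subprocess.check_call([sys.executable, "setup.py", "bdist_wheel",
                           "--dist-dir", wheel_directory])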
But there remain two obstacles that mean we probably won't be able to automatically upgrade packages to the new format:
- There currently exist packages which insist on particular packages being available in their environment before setup.py is executed. This means that if we decide to execute build scripts in an isolated virtualenv-like environment, then projects will need to check whether they do this, and if so then when upgrading to the new system they will have to start explicitly declaring these dependencies (either via setup_requires= or via static declaration in pypackage.json).
- There currently exist packages which do not declare consistent metadata (e.g. egg_info and bdist_wheel might get different install_requires=). When upgrading to the new system, projects will have to evaluate whether this applies to them, and if so they will need to stop doing that.
This document has been placed in the public domain.