PEP 324 – subprocess - New process module
- Author:
- Peter Astrand <astrand at lysator.liu.se>
- Status:
- Final
- Type:
- Standards Track
- Created:
- 19-Nov-2003
- Python-Version:
- 2.4
- Post-History:
Abstract
This PEP describes a new module for starting and communicating with processes.
Motivation
Starting new processes is a common task in any programming language, and very common in a high-level language like Python. Good support for this task is needed, because:
- Inappropriate functions for starting processes could mean a security risk: If the program is started through the shell, and the arguments contain shell meta characters, the result can be disastrous. [1]
- It makes Python an even better replacement language for over-complicated shell scripts.
Currently, Python has a large number of different functions for process creation. This makes it hard for developers to choose.
The subprocess module provides the following enhancements over previous functions:
- One “unified” module provides all functionality from previous functions.
- Cross-process exceptions: Exceptions happening in the child before the new process has started to execute are re-raised in the parent. This means that it’s easy to handle exec() failures, for example. With popen2, by contrast, it’s impossible to detect if the execution failed.
- A hook for executing custom code between fork and exec. This can be used, for example, to change the uid.
- No implicit call of /bin/sh. This means that there is no need for escaping dangerous shell meta characters.
- All combinations of file descriptor redirection are possible. For example, the “python-dialog” [2] needs to spawn a process and redirect stderr, but not stdout. This is not possible with current functions, without using temporary files.
- With the subprocess module, it’s possible to control if all open file descriptors should be closed before the new program is executed.
- Support for connecting several subprocesses (shell “pipe”).
- Universal newline support.
- A communicate() method, which makes it easy to send stdin data and read stdout and stderr data, without risking deadlocks. Most people are aware of the flow control issues involved with child process communication, but not all have the patience or skills to write a fully correct and deadlock-free select loop. This means that many Python applications contain race conditions. A communicate() method in the standard library solves this problem.
Rationale
The following points summarize the design:
- subprocess was based on popen2, which is tried-and-tested.
- The factory functions in popen2 have been removed, because I consider the class constructor equally easy to work with.
- popen2 contains several factory functions and classes for different combinations of redirection. subprocess, however, contains one single class. Since the subprocess module supports 12 different combinations of redirection, providing a class or function for each of them would be cumbersome and not very intuitive. Even with popen2, this is a readability problem. For example, many people cannot tell the difference between popen2.popen2 and popen2.popen4 without using the documentation.
- One small utility function is provided: subprocess.call(). It aims to be an enhancement over os.system(), while still very easy to use:
  - It does not use the Standard C function system(), which has limitations.
  - It does not call the shell implicitly.
  - No need for quoting; using an argument list.
  - The return value is easier to work with.
  The call() utility function accepts an ‘args’ argument, just like the Popen class constructor. It waits for the command to complete, then returns the returncode attribute. The implementation is very simple:
      def call(*args, **kwargs):
          return Popen(*args, **kwargs).wait()
  The motivation behind the call() function is simple: Starting a process and waiting for it to finish is a common task.
  While Popen supports a wide range of options, many users have simple needs. Many people are using os.system() today, mainly because it provides a simple interface. Consider this example:
      os.system("stty sane -F " + device)
  With subprocess.call(), this would look like:
      subprocess.call(["stty", "sane", "-F", device])
  or, if executing through the shell:
      subprocess.call("stty sane -F " + device, shell=True)
- The “preexec” functionality makes it possible to run arbitrary code between fork and exec. One might ask why there are special arguments for setting the environment and current directory, but not, for example, for setting the uid. The answer is:
  - Changing environment and working directory is considered fairly common.
  - Old functions like spawn() have support for an “env”-argument.
  - env and cwd are considered quite cross-platform: They make sense even on Windows.
- On POSIX platforms, no extension module is required: the module uses os.fork(), os.execvp() etc.
- On Windows platforms, the module requires either Mark Hammond’s Windows extensions [5], or a small extension module called _subprocess.
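As a rough illustration of the preexec hook discussed above, the following sketch changes the child's umask between fork and exec while leaving the parent untouched. It is written for current Python 3 on a POSIX system rather than the Python 2.4 syntax used elsewhere in this PEP; the helper name run_with_umask is hypothetical, and sys.executable merely stands in for an arbitrary program.

```python
import os
import subprocess
import sys


def run_with_umask(mask):
    """Start a child with the given umask and return the umask it observed.

    Hypothetical helper for illustration only; POSIX only.
    """
    def set_umask():
        # Runs in the child, after fork() and before exec().
        os.umask(mask)

    # os.umask(0) returns the previous mask, i.e. the one set by the hook.
    child_code = "import os; print(os.umask(0))"
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdout=subprocess.PIPE,
        preexec_fn=set_umask,
    )
    out, _ = proc.communicate()
    return int(out.decode().strip())


if __name__ == "__main__":
    print(oct(run_with_umask(0o027)))
```

The same pattern would apply to other per-child settings that have no dedicated argument, with the usual caveat that the hook should do as little as possible.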
Specification
This module defines one class called Popen:
class Popen(args, bufsize=0, executable=None,
stdin=None, stdout=None, stderr=None,
preexec_fn=None, close_fds=False, shell=False,
cwd=None, env=None, universal_newlines=False,
startupinfo=None, creationflags=0):
Arguments are:
- args should be a string, or a sequence of program arguments. The program to execute is normally the first item in the args sequence or string, but can be explicitly set by using the executable argument.
  On UNIX, with shell=False (default): In this case, the Popen class uses os.execvp() to execute the child program. args should normally be a sequence. A string will be treated as a sequence with the string as the only item (the program to execute).
  On UNIX, with shell=True: If args is a string, it specifies the command string to execute through the shell. If args is a sequence, the first item specifies the command string, and any additional items will be treated as additional shell arguments.
  On Windows: the Popen class uses CreateProcess() to execute the child program, which operates on strings. If args is a sequence, it will be converted to a string using the list2cmdline method. Please note that not all MS Windows applications interpret the command line the same way: The list2cmdline is designed for applications using the same rules as the MS C runtime.
- bufsize, if given, has the same meaning as the corresponding argument to the built-in open() function: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size. A negative bufsize means to use the system default, which usually means fully buffered. The default value for bufsize is 0 (unbuffered).
- stdin, stdout and stderr specify the executed programs’ standard input, standard output and standard error file handles, respectively. Valid values are PIPE, an existing file descriptor (a positive integer), an existing file object, and None. PIPE indicates that a new pipe to the child should be created. With None, no redirection will occur; the child’s file handles will be inherited from the parent. Additionally, stderr can be STDOUT, which indicates that the stderr data from the applications should be captured into the same file handle as for stdout.
- If preexec_fn is set to a callable object, this object will be called in the child process just before the child is executed.
- If close_fds is true, all file descriptors except 0, 1 and 2 will be closed before the child process is executed.
- If shell is true, the specified command will be executed through the shell.
- If cwd is not None, the current directory will be changed to cwd before the child is executed.
- If env is not None, it defines the environment variables for the new process.
- If universal_newlines is true, the file objects stdout and stderr are opened as a text file, but lines may be terminated by any of \n, the Unix end-of-line convention, \r, the Macintosh convention or \r\n, the Windows convention. All of these external representations are seen as \n by the Python program. Note: This feature is only available if Python is built with universal newline support (the default). Also, the newlines attribute of the file objects stdout, stdin and stderr are not updated by the communicate() method.
- The startupinfo and creationflags, if given, will be passed to the underlying CreateProcess() function. They can specify things such as appearance of the main window and priority for the new process. (Windows only)
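To make a few of these arguments concrete, here is a minimal sketch that combines stdout=PIPE, stderr=STDOUT, cwd, env and universal_newlines in one call. It targets current Python 3 rather than Python 2.4; the function name run_demo and the variable DEMO_VAR are invented for the example, and a small Python child stands in for a real program.

```python
import os
import subprocess
import sys
import tempfile

# Child program: report its working directory and one environment variable
# on stdout, flush, then write a line to stderr.
CHILD = (
    "import os, sys; "
    "print(os.path.basename(os.getcwd())); "
    "print(os.environ.get('DEMO_VAR')); "
    "sys.stdout.flush(); "
    "sys.stderr.write('warning\\n')"
)


def run_demo(workdir):
    """Run the child in workdir and return (returncode, output lines)."""
    env = dict(os.environ)       # start from the parent's environment
    env["DEMO_VAR"] = "hello"    # DEMO_VAR is a made-up name for this sketch
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,    # capture stderr in the stdout pipe
        cwd=workdir,
        env=env,
        universal_newlines=True,     # text mode, lines end in "\n"
    )
    output, _ = proc.communicate()
    return proc.returncode, output.splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_demo(tmp))
```

Copying os.environ before adding a variable is a deliberate choice here: passing a nearly empty env can prevent the child from starting on some platforms.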
This module also defines one shortcut function:
call(*args, **kwargs)
- Run command with arguments. Wait for command to complete, then return the returncode attribute.
  The arguments are the same as for the Popen constructor. Example:
  retcode = call(["ls", "-l"])
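A portable variant of the call() example, sketched for current Python 3, might look like the following. sys.executable is used only so that the snippet does not depend on a particular external program being installed.

```python
import subprocess
import sys

# Run a child that exits with status 3 and collect its return code.
# No shell is involved; the command is given as an argument list.
retcode = subprocess.call([sys.executable, "-c", "import sys; sys.exit(3)"])
print("child returned", retcode)
```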
Exceptions
Exceptions raised in the child process, before the new program has started to execute, will be re-raised in the parent. Additionally, the exception object will have one extra attribute called ‘child_traceback’, which is a string containing traceback information from the child’s point of view.
The most common exception raised is OSError. This occurs, for example, when trying to execute a non-existent file. Applications should prepare for OSErrors.
A ValueError will be raised if Popen is called with invalid arguments.
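The following sketch shows one way an application might prepare for OSError, assuming current Python 3 (where a missing program surfaces as FileNotFoundError, a subclass of OSError). The helper name try_run and the path used below are hypothetical.

```python
import subprocess


def try_run(argv):
    """Return the child's return code, or None if it could not be started."""
    try:
        return subprocess.call(argv)
    except OSError as exc:
        # Raised in the parent when the program cannot be executed,
        # for example because the file does not exist.
        print("Execution failed:", exc)
        return None


if __name__ == "__main__":
    # A path that is assumed not to exist on the system running the example.
    print(try_run(["/nonexistent/pep324-demo-program"]))
```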
Security
Unlike some other popen functions, this implementation will never call /bin/sh implicitly. This means that all characters, including shell meta-characters, can safely be passed to child processes.
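As an illustration of this point, the sketch below (current Python 3) passes a string full of shell metacharacters as a single argument and checks that the child receives it unchanged. The string and the small Python child are made up for the example.

```python
import subprocess
import sys

# Metacharacters that a shell would interpret; with an argument list the
# string reaches the child as one literal argument.
tricky = "a; echo injected | cat $HOME `id` > out && true"

proc = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])", tricky],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
echoed, _ = proc.communicate()
print(echoed == tricky)
```

With shell=True the same string would instead be handed to the shell for interpretation, which is exactly the situation the default avoids.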
Popen objects
Instances of the Popen class have the following methods:
poll()
- Check if child process has terminated. Returns returncode attribute.
wait()
- Wait for child process to terminate. Returns returncode attribute.
communicate(input=None)
- Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached. Wait for process to terminate. The optional input argument should be a string to be sent to the child process, or None, if no data should be sent to the child.
  communicate() returns a tuple (stdout, stderr).
  Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
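A minimal sketch of communicate() in use, written for current Python 3, is shown below. The helper name shout is hypothetical and the child is a small Python program that upper-cases its input; the payload is deliberately larger than a typical pipe buffer, which is the situation where hand-written read and write loops tend to deadlock.

```python
import subprocess
import sys

# Child program: read all of stdin, write an upper-cased copy to stdout
# and a character count to stderr.
CHILD = (
    "import sys; data = sys.stdin.read(); "
    "sys.stdout.write(data.upper()); "
    "sys.stderr.write(str(len(data)))"
)


def shout(text):
    """Send text to the child and return (stdout, stderr, returncode)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # exchange str rather than bytes
    )
    # communicate() writes the input and drains both output pipes, so the
    # parent never blocks on one pipe while the child is blocked on another.
    out, err = proc.communicate(text)
    return out, err, proc.returncode


if __name__ == "__main__":
    out, err, rc = shout("spam\n" * 50000)
    print(len(out), err, rc)
```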
The following attributes are also available:
stdin
- If the stdin argument is PIPE, this attribute is a file object that provides input to the child process. Otherwise, it is None.
stdout
- If the stdout argument is PIPE, this attribute is a file object that provides output from the child process. Otherwise, it is None.
stderr
- If the stderr argument is PIPE, this attribute is a file object that provides error output from the child process. Otherwise, it is None.
pid
- The process ID of the child process.
returncode
- The child return code. A None value indicates that the process hasn’t terminated yet. A negative value -N indicates that the child was terminated by signal N (UNIX only).
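The relationship between poll(), wait() and returncode can be sketched as follows, again assuming current Python 3. The child blocks until its stdin is closed, so poll() is guaranteed to find it still running at the point where it is called.

```python
import subprocess
import sys

# Start a child that blocks until its stdin reaches end-of-file,
# then exits with status 7.
proc = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdin.read(); sys.exit(7)"],
    stdin=subprocess.PIPE,
)
still_running = proc.poll()  # None while the child has not terminated
proc.stdin.close()           # end-of-file on stdin lets the child finish
final = proc.wait()          # blocks, then returns the return code
print(proc.pid, still_running, final, proc.returncode)
```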
Replacing older functions with the subprocess module
In this section, “a ==> b” means that b can be used as a replacement for a.
Note: All functions in this section fail (more or less) silently if the executed program cannot be found; this module raises an OSError exception.
In the following examples, we assume that the subprocess module is imported with from subprocess import *.
Replacing /bin/sh shell backquote
output=`mycmd myarg`
==>
output = Popen(["mycmd", "myarg"], stdout=PIPE).communicate()[0]
Replacing shell pipe line
output=`dmesg | grep hda`
==>
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]
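A self-contained version of this pipeline, sketched for current Python 3, is given below. Two small Python children stand in for dmesg and grep so that the snippet runs anywhere; the sample lines are invented. Closing the parent's copy of p1.stdout is an addition not shown in the example above: it lets the first process notice a broken pipe if the second one exits early.

```python
import subprocess
import sys

# Stand-ins for "dmesg" and "grep hda".
producer = "print('hda: disk'); print('eth0: link up'); print('hda1: partition')"
consumer = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if 'hda' in line:\n"
    "        sys.stdout.write(line)\n"
)

p1 = subprocess.Popen([sys.executable, "-c", producer], stdout=subprocess.PIPE)
p2 = subprocess.Popen(
    [sys.executable, "-c", consumer],
    stdin=p1.stdout,             # connect p1's output to p2's input
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
p1.stdout.close()                # drop the parent's copy of the pipe
output = p2.communicate()[0]
p1.wait()
print(output)
```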
Replacing os.system()
sts = os.system("mycmd" + " myarg")
==>
p = Popen("mycmd" + " myarg", shell=True)
sts = os.waitpid(p.pid, 0)[1]
Note:
- Calling the program through the shell is usually not required.
- It’s easier to look at the returncode attribute than the exit status.
A more real-world example would look like this:
try:
    retcode = call("mycmd" + " myarg", shell=True)
    if retcode < 0:
        print >>sys.stderr, "Child was terminated by signal", -retcode
    else:
        print >>sys.stderr, "Child returned", retcode
except OSError, e:
    print >>sys.stderr, "Execution failed:", e
Replacing os.spawn*
P_NOWAIT example:
pid = os.spawnlp(os.P_NOWAIT, "/bin/mycmd", "mycmd", "myarg")
==>
pid = Popen(["/bin/mycmd", "myarg"]).pid
P_WAIT example:
retcode = os.spawnlp(os.P_WAIT, "/bin/mycmd", "mycmd", "myarg")
==>
retcode = call(["/bin/mycmd", "myarg"])
Vector example:
os.spawnvp(os.P_NOWAIT, path, args)
==>
Popen([path] + args[1:])
Environment example:
os.spawnlpe(os.P_NOWAIT, "/bin/mycmd", "mycmd", "myarg", env)
==>
Popen(["/bin/mycmd", "myarg"], env={"PATH": "/usr/bin"})
Replacing os.popen*
pipe = os.popen(cmd, 'r', bufsize)
==>
pipe = Popen(cmd, shell=True, bufsize=bufsize, stdout=PIPE).stdout
pipe = os.popen(cmd, 'w', bufsize)
==>
pipe = Popen(cmd, shell=True, bufsize=bufsize, stdin=PIPE).stdin
(child_stdin, child_stdout) = os.popen2(cmd, mode, bufsize)
==>
p = Popen(cmd, shell=True, bufsize=bufsize,
          stdin=PIPE, stdout=PIPE, close_fds=True)
(child_stdin, child_stdout) = (p.stdin, p.stdout)
(child_stdin,
 child_stdout,
 child_stderr) = os.popen3(cmd, mode, bufsize)
==>
p = Popen(cmd, shell=True, bufsize=bufsize,
          stdin=PIPE, stdout=PIPE, stderr=PIPE, close_fds=True)
(child_stdin,
 child_stdout,
 child_stderr) = (p.stdin, p.stdout, p.stderr)
(child_stdin, child_stdout_and_stderr) = os.popen4(cmd, mode, bufsize)
==>
p = Popen(cmd, shell=True, bufsize=bufsize,
          stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
(child_stdin, child_stdout_and_stderr) = (p.stdin, p.stdout)
Replacing popen2.*
Note: If the cmd argument to popen2
functions is a string, the
command is executed through /bin/sh. If it is a list, the command
is directly executed.
(child_stdout, child_stdin) = popen2.popen2("somestring", bufsize, mode)
==>
p = Popen(["somestring"], shell=True, bufsize=bufsize,
          stdin=PIPE, stdout=PIPE, close_fds=True)
(child_stdout, child_stdin) = (p.stdout, p.stdin)
(child_stdout, child_stdin) = popen2.popen2(["mycmd", "myarg"], bufsize, mode)
==>
p = Popen(["mycmd", "myarg"], bufsize=bufsize,
          stdin=PIPE, stdout=PIPE, close_fds=True)
(child_stdout, child_stdin) = (p.stdout, p.stdin)
The popen2.Popen3 and popen2.Popen4 classes basically work as subprocess.Popen, except that:
- subprocess.Popen raises an exception if the execution fails.
- the capturestderr argument is replaced with the stderr argument.
- stdin=PIPE and stdout=PIPE must be specified.
- popen2 closes all file descriptors by default, but you have to specify close_fds=True with subprocess.Popen.
Open Issues
Some features have been requested but are not yet implemented. These include:
- Support for managing a whole flock of subprocesses
- Support for managing “daemon” processes
- Built-in method for killing subprocesses
While these are useful features, it’s expected that these can be added later without problems.
- expect-like functionality, including pty support.
pty support is highly platform-dependent, which is a problem. Also, there are already other modules that provide this kind of functionality [6].
Backwards Compatibility
Since this is a new module, no major backward compatibility issues are expected. The module name “subprocess” might collide with other, previous modules [3] with the same name, but the name “subprocess” seems to be the best suggested name so far. The first name of this module was “popen5”, but this name was considered too unintuitive. For a while, the module was called “process”, but this name is already used by Trent Mick’s module [4].
The functions and modules that this new module is trying to replace (os.system, os.spawn*, os.popen*, popen2.*, commands.*) are expected to be available in future Python versions for a long time, to preserve backwards compatibility.
Reference Implementation
A reference implementation is available from http://www.lysator.liu.se/~astrand/popen5/.
References
Copyright
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0324.rst
Last modified: 2023-09-09 17:39:29 GMT