Spawning subprocesses smartly and securely - 白红宇

Published: 2019-05-11

As part of your code, you may be inclined to call a command to do something. But is it always a good idea? How to do it safely? What happens behind the scenes?

This article is written from a general perspective, with a Unix/C bias and a very slight Python bias. The problems mentioned apply to all languages in most environments, including Windows.

By calling another process, you introduce a third-party dependency. That dependency isn’t controlled by your code, and your code becomes more fragile. The problems include:

- the program is not installed, or even available, for the user’s OS of choice
- the program is not in the $PATH your process gets
- the hard-coded path is not correct on the end user’s system
- the program is in a different version (e.g. GNU vs. BSD, updates/patches), which means different option names or other behaviors
- the program’s output is not what you expected due to user config (including locale)
- error reporting is based on numeric exit codes, and the meaning of those differs between programs (if they have meaning besides 0/1 in the first place)
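
If a subprocess is unavoidable, some of these problems can at least be detected early. The following is a minimal sketch rather than a complete solution: `run_tool` is a hypothetical helper name, forcing `LC_ALL=C` is an assumption about what the called program honors, and version differences between implementations still have to be handled per program.

```python
import os
import shutil
import subprocess
import sys


def run_tool(name, *args):
    """Run an external tool defensively and return its standard output."""
    # Resolve the program through $PATH first, so a missing dependency is
    # reported clearly instead of surfacing as an obscure spawn failure.
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"required program not found in $PATH: {name}")

    # Force the "C" locale so messages and number formats do not depend on
    # the user's language settings (assumes the tool honors LC_ALL).
    env = dict(os.environ, LC_ALL="C")

    # check=True turns a non-zero exit code into CalledProcessError.
    result = subprocess.run(
        [path, *args], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    print(run_tool(sys.executable, "--version").strip())
```

The exit code is only checked for being non-zero here; interpreting specific codes remains specific to each program.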

On the other hand, if your code uses a lot of subprocesses, perhaps you should stay with Bash. You can do the harder parts with Python, Ruby, or some other language by calling them from within your Bash script.

Spawning a subprocess always incurs a (minor) performance hit compared to the alternatives. With that in mind, and the resiliency issues listed above, you should always try to find an alternative for the external command.

The simplest ones are the basic Unix utilities. Replace grep, sed and awk with string operations and regular expressions. Filesystem utilities will have equivalents — for Python, in os or shutil. Your language of choice can also handle things like networking (don’t call curl), file compression, working with date/time…
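
As a rough illustration of that advice, here is a sketch of in-process stand-ins for a few common utilities. The function names are made up for this example, and Python's regular expressions are not identical to the dialects used by grep or sed, so treat the equivalences as approximate.

```python
import re
import shutil


def grep(pattern, text):
    """Return the lines of text that match pattern (roughly `grep -E`)."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


def substitute(pattern, replacement, text):
    """Replace matches line by line (roughly `sed -E 's/pattern/replacement/g'`)."""
    # re.MULTILINE makes ^ and $ match at every line, as sed would.
    return re.sub(pattern, replacement, text, flags=re.MULTILINE)


def copy_tree(source, destination):
    """Copy a directory recursively (roughly `cp -R source destination`)."""
    return shutil.copytree(source, destination)


log = "ok: started\nerror: disk full\nok: stopped\n"
matches = grep(r"^error", log)
cleaned = substitute(r"^ok: ", "", log)
print(matches)
print(cleaned, end="")
```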

Similarly, you should check if there are packages available that already do what you want: library bindings or re-implementations. And if there aren’t any, perhaps you could help the world by writing one of those and sharing it?

One more important thing: if the program uses the same language as your code, then you should try to import the code and run it from the same process instead of spawning a process, if this is feasible.

We come to the most important part of this article: how to spawn subprocesses without compromising your system. When you spawn a subprocess on a typical Unix system, fork() is called, and your process is copied. Many modern Unix systems have a copy-on-write implementation of that syscall, meaning that the operation does not result in copying all the memory of the host process over. Forking is (almost) immediately followed by calling execve() (or a helper function from the exec family) in the child process; that function transforms the calling process into a new process. This technique is called fork-exec and is the typical way to spawn a new process on Unix.

There are two ways to access this API, from the C perspective:

- directly, by calling fork() and exec*() (or posix_spawn()), and providing an array of arguments passed to the process, or
- through the shell (sh), usually by calling system(). As Linux’s manpage for system(3) puts it:

> The system() library function uses fork(2) to create a child process that executes the shell command specified in command using execl(3) as follows:
```
execl("/bin/sh", "sh", "-c", command, (char *) 0);
```
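
The same two routes can be sketched in Python through the thin wrappers in the os module. This is an illustration for Unix only, not how you would normally spawn processes from Python; `fork_exec` and `via_shell` are names invented for this example, and `via_shell` only approximates system() (it passes "/bin/sh" rather than "sh" as argv[0]).

```python
import os
import sys


def fork_exec(argv):
    """Spawn argv with fork-exec and return the child's exit status (Unix only)."""
    pid = os.fork()  # the calling process is duplicated here
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)  # searches $PATH, inherits the environment
        finally:
            # Only reached if exec failed; never fall back into the parent's code.
            os._exit(127)
    # Parent: wait for the child and decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


def via_shell(command):
    """Roughly what system() does: hand one string to /bin/sh -c."""
    return fork_exec(["/bin/sh", "-c", command])


if __name__ == "__main__":
    print(fork_exec([sys.executable, "-c", "raise SystemExit(7)"]))
    print(via_shell("exit 3"))
```

Note the difference in inputs: `fork_exec` takes a list of separate arguments, while `via_shell` takes a single string that the shell will parse.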

If you go through the shell, you pass one string argument, whereas exec*() requires you to specify arguments separately. Let’s write a sample program to print all the arguments it receives. I’ll do it in Python to get a more readable output.
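
The listing for the sample program does not appear in this copy of the article. Judging by the output shown below, it is probably little more than the following reconstruction, assuming the file is saved as argv.py and marked executable:

```python
#!/usr/bin/env python3
import sys

# Print the argument list exactly as this process received it.
print(sys.argv)
```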

Let’s see what appears:

    $ ./argv.py foo bar
    ['./argv.py', 'foo', 'bar']
    $ ./argv.py 'foo bar'
    ['./argv.py', 'foo bar']
    $ ./argv.py foo\ bar baz
    ['./argv.py', 'foo bar', 'baz']
    $ ./argv.py $(date)
    ['./argv.py', 'Sat', 'Sep', '2', '16:54:52', 'CEST', '2017']
    $ ./argv.py "$(date)"
    ['./argv.py', 'Sat Sep  2 16:54:52 CEST 2017']
    $ ./argv.py /usr/*
    ['./argv.py', '/usr/X11', '/usr/X11R6', '/usr/bin', '/usr/include', '/usr/lib', '/usr/libexec', '/usr/local', '/usr/sbin', '/usr/share', '/usr/standalone']
    $ ./argv.py "/usr/*"
    ['./argv.py', '/usr/*']
    $ ./argv.py $EDITOR
    ['./argv.py', 'nvim']
    $ $PWD/argv.py foo bar
    ['/Users/kwpolska/Desktop/blog/subprocess/argv.py', 'foo', 'bar']
    $ ./argv.py a{b,c}d
    ['./argv.py', 'abd', 'acd']
    $ python argv.py foo bar | cat
    ['argv.py', 'foo', 'bar']
    $ python argv.py foo bar > foo.txt
    $ cat foo.txt
    ['argv.py', 'foo', 'bar']
    $ ./argv.py foo; ls /usr
    ['./argv.py', 'foo']
    X11@        X11R6@      bin/        include/    lib/        libexec/    local/      sbin/       share/      standalone/

As you can see, the following things are handled by the shell (the process is unaware of this occurring):

- quotes and escapes
- expanding expressions in braces
- expanding variables
- wildcards (glob, *)
- redirections and pipes (> >> |)
- command substitution (backticks or $(…))
- running multiple commands on the same line (; && || &)

The list is full of potential vulnerabilities. If end users are in control of the arguments passed, and you go through the shell, they can execute arbitrary commands or even get full shell access. Even in other cases, you’ll have to depend on the shell’s parsing, which introduces an unnecessary indirection.
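
A small demonstration of the risk, assuming a POSIX system where both /bin/sh and an echo program are available. The value of `user_input` is a hypothetical piece of untrusted input; the command it smuggles in is harmless here, but it could be anything.

```python
import subprocess

# Hypothetical untrusted input: the user was only supposed to supply a word.
user_input = "hello; echo INJECTED"

# Unsafe: the string is parsed by /bin/sh, so ';' starts a second command.
unsafe = subprocess.run(
    "echo " + user_input, shell=True, capture_output=True, text=True
).stdout

# Safe: the list is passed to the program directly, so the input stays
# a single argument and the ';' has no special meaning.
safe = subprocess.run(
    ["echo", user_input], capture_output=True, text=True
).stdout

print(repr(unsafe))
print(repr(safe))
```

In the first call the shell runs two commands; in the second, echo simply prints the whole string, semicolon included.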

To spawn subprocesses securely, do not put the shell in between. If you need any of the operations listed above as part of your command (wildcards, pipes, etc.), you will need to take care of them in your code; most languages have those features built in.
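
For instance, wildcards and pipes can be handled in Python without a shell. This is a sketch under the assumption that a plain two-process pipeline is all that is needed; `expand` and `pipe` are names invented for this example, and glob results are sorted explicitly because glob.glob() does not promise any order.

```python
import glob
import subprocess
import sys


def expand(pattern):
    """Expand a wildcard in-process, as the shell would for an unquoted glob."""
    return sorted(glob.glob(pattern))


def pipe(producer_args, consumer_args):
    """Connect two processes like `producer | consumer`, without a shell."""
    producer = subprocess.Popen(producer_args, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_args, stdin=producer.stdout, stdout=subprocess.PIPE, text=True
    )
    # Close our copy of the pipe so the producer sees a broken pipe
    # if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output


# Two small Python one-liners stand in for real programs here.
produce = [sys.executable, "-c", "print('b'); print('a')"]
sort_lines = [sys.executable, "-c",
              "import sys; sys.stdout.writelines(sorted(sys.stdin))"]
sorted_output = pipe(produce, sort_lines)
print(sorted_output, end="")
```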

In C (Unix)

Perform fork-exec yourself or use posix_spawn(). This also lets you communicate with the process if you open a pipe and make it stdout of the child process. Never use system().

In Python

Use the subprocess module. Always pass shell=False and give it a list of arguments. With asyncio, use asyncio.create_subprocess_exec (and not _shell), but note it takes *args and not a list. Never use os.system and os.popen.
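
A short sketch of both Python variants. The helper names `run_sync` and `run_async` are invented for this example, and the child command is just the Python interpreter echoing its arguments, so that nothing outside the standard library is assumed.

```python
import asyncio
import subprocess
import sys


def run_sync(args):
    """Run a command from a list of arguments, without a shell."""
    result = subprocess.run(
        args, shell=False, capture_output=True, text=True, check=True
    )
    return result.stdout


async def run_async(args):
    """Same idea with asyncio; note the list is unpacked into *args."""
    proc = await asyncio.create_subprocess_exec(
        *args, stdout=asyncio.subprocess.PIPE
    )
    stdout, _ = await proc.communicate()
    return stdout.decode()


# 'foo bar' stays one argument and '*' is not expanded, since no shell is involved.
cmd = [sys.executable, "-c", "import sys; print(sys.argv[1:])", "foo bar", "*"]
sync_output = run_sync(cmd)
async_output = asyncio.run(run_async(cmd))
print(sync_output, end="")
```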

In Ruby

Use IO.popen. Pass multiple arguments to system() (system(["ls", "ls"]) or system("ls", "-l")). Never use %x{command} or backticks.

In Java

Use Runtime.exec. Pass multiple arguments or a list to ProcessBuilder.

In PHP

All the standard methods go through the shell. Try escapeshellcmd(), escapeshellarg(), or better, switch to Python. Or anything, really. (Since PHP 7.4, proc_open() also accepts an array of arguments and then skips the shell.)

In Go

os/exec and os.StartProcess are safe.

In Node.js

Use child_process.execFile or child_process.spawn with shell set to false.

Elsewhere

You should be able to specify multiple strings (using variadic arguments, arrays, or otherwise standard data structures of your language of choice) as the command line. Otherwise, you might be running into something shell-related.

On Windows, argument lists are always passed to processes as strings (Python joins them semi-intelligently if it gets a list). Redirections and variables work in shell mode, but globs (asterisks) are always left for the called process to handle.
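
The joining step can be observed on any platform through subprocess.list2cmdline. As far as I can tell this helper is not part of the documented public API, so treat the sketch below as a way to look at the quoting rules rather than something to build on.

```python
import subprocess

# list2cmdline is the helper subprocess uses on Windows to turn an
# argument list into the single command-line string the OS expects.
command_line = subprocess.list2cmdline(["argv.py", "foo bar", "baz"])
print(command_line)  # argv.py "foo bar" baz
```

Arguments containing spaces get wrapped in double quotes, and it is then up to the called program's runtime to split the string back into arguments.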

Some useful functions are implemented as shell built-ins; in that case, you need to call them via the shell.

Internals: There is no fork() on Windows. Instead, CreateProcess(), ShellExecute(), or lower-level spawn*() functions are used. cmd.exe /c is called in shell calls.

The performance hit of spawning a subprocess is minor unless your operating system does not implement copy-on-write forking; in that case, you might even run out of memory if your process uses too much of it.

execve() takes an exact path, an array of arguments, and environment variables as input. Other variants can also perform a $PATH search, take argv as variadic arguments, and inherit the environment from the current process. execl() does the last two.