Fork PHP! (and speed up your scripts)
Posted Apr 08 in Linux, Programming, Tutorials

I just came across a forum post discussing what the author calls “multithreaded PHP,” and thought I’d clear a few things up about concurrency in PHP. First of all, this is not multithreading. It’s not even “sort of multithreading,” as the author implies (no offense, it’s just not). What the author is actually doing is creating multiple processes. A process is not the same thing as a thread: each process gets its own copy of the program’s memory, while the threads inside one process share a single copy. In fact, PHP itself does not support multiple threads at all, and doesn’t plan to do so anytime soon.
Threads and processes both increase the concurrency of an application, meaning it can do more things at the same time. A web server is a good example of a program that benefits from multiple threads or processes. A single process or thread can only handle one I/O operation at a time (that’s not completely accurate, since non-blocking I/O lets one thread juggle several operations, but it keeps things simple). I/O operations are s - l - o - w (in computer terms, anyway), so a process spends much of its time just waiting on them. If we fork a second process, or create a new thread, we can handle two I/O operations simultaneously! This simple model is how many traditional network servers work:
while (1)
    Process 1: listen for a connection
    Process 1: connection made, fork Process 2
end
Process 2: handle the request, terminate
That way, nobody is turned away just because the process that is supposed to be listening for incoming connections is busy handling a request: Process 1 hands each connection off to a child and goes straight back to listening.
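To make the model concrete, here is a minimal sketch of a forking server in PHP, assuming the pcntl extension is available. The function names serve() and handle_request() are hypothetical, and the one-line “echo” exchange stands in for real request handling. The $maxConnections parameter exists only so the demo terminates; a real server would loop forever.

```php
<?php
// Hypothetical request handler: reads one line and echoes it back.
function handle_request($conn): void
{
    $line = fgets($conn);
    if ($line !== false) {
        fwrite($conn, 'echo: ' . $line);
    }
}

// Process 1: accept connections and fork a child (Process 2) for each one.
// Stops after $maxConnections so the example terminates.
function serve($server, int $maxConnections): void
{
    $children = [];
    while (count($children) < $maxConnections) {
        // Listen for a connection; false means the accept timed out, so retry.
        $conn = @stream_socket_accept($server);
        if ($conn === false) {
            continue;
        }
        $pid = pcntl_fork();
        if ($pid == -1) {
            die('could not fork');
        } elseif ($pid == 0) {
            // Process 2: handle the request, then terminate.
            handle_request($conn);
            fclose($conn);
            exit(0);
        }
        // Parent: close its copy of the connection and go back to listening.
        fclose($conn);
        $children[] = $pid;
    }
    // Reap the children so they don't linger as zombie processes.
    foreach ($children as $pid) {
        pcntl_waitpid($pid, $status);
    }
}
```

For example, serve(stream_socket_server('tcp://127.0.0.1:8000'), 10) would answer ten connections, each in its own child. A long-running server would reap finished children as it goes (for instance with pcntl_waitpid(-1, $status, WNOHANG) or a SIGCHLD handler) rather than waiting until the end.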
Concurrency can come in handy for client-side programs too. Take the PHP script I wrote to download YouTube videos, for example. The script connects to YouTube, finds the URLs of the relevant FLV files, and pushes each URL onto a stack. Then it pops the URLs off the stack one by one and downloads each FLV file. But YouTube throttles each download to about 60 KB/s. That’s fine when you’re streaming the video in your browser, since the video’s bitrate is much lower than that. But it’s not cool when you’re on a box with a 100 Mbit/s connection! So what can we do about it?
Well, we can’t change the fact that downloads from YouTube are capped at 60 KB/s, but we can download more than one video at once by forking additional processes. And it turns out PHP ships with an easy way to do it: the pcntl_fork() function, which provides an interface to the underlying fork syscall. It belongs to the pcntl extension, which is only available on *nix platforms (sorry, Windows users) and has to be enabled when PHP is compiled (--enable-pcntl). The PHP manual also warns against using process control inside a web server, so stick to command-line scripts.
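A script that should still run where pcntl is missing can check for it before forking. This is a small sketch; can_fork() is a hypothetical helper, and the PHP_SAPI check reflects the manual’s advice to keep process control out of web-server environments.

```php
<?php
// Hypothetical helper: true when it is reasonable to call pcntl_fork() here.
// pcntl_fork() only exists if PHP was built with --enable-pcntl (Unix-like
// systems only), and process control is intended for command-line scripts.
function can_fork(): bool
{
    return function_exists('pcntl_fork') && PHP_SAPI === 'cli';
}
```

A downloader could call can_fork() once at startup and fall back to its original one-file-at-a-time loop when it returns false.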
The trick to the fork syscall is that it returns twice: once in the parent, and once in the child. It’s confusing at first, but it makes sense once you realize the child starts life as a copy of the parent, paused at the same point in the program. In the parent, the return value is the new child’s process ID; in the child, it’s zero. (If the fork fails, no child is created and the parent gets -1.) Thus, we can use a simple if statement to determine whether we’re in the parent or the child process, which usually determines the execution path we follow.
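To see the “returns twice” behavior directly, the sketch below (fork_pids() is a hypothetical name) has the child report what pcntl_fork() returned to it, along with its own PID, back to the parent over a socket pair. The parent should see the child’s PID, and the child should see zero.

```php
<?php
// Forks once and returns what pcntl_fork() returned in each process,
// plus the child's real PID as reported by the child itself.
function fork_pids(): array
{
    // A connected pair of sockets lets the child send a message to the parent.
    $pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
    $pid = pcntl_fork();
    if ($pid == -1) {
        throw new RuntimeException('could not fork');
    }
    if ($pid == 0) {
        // Child: pcntl_fork() returned 0 here. Report it and our own PID.
        fclose($pair[0]);
        fwrite($pair[1], "$pid " . getmypid());
        fclose($pair[1]);
        exit(0);
    }
    // Parent: pcntl_fork() returned the child's PID.
    fclose($pair[1]);
    $report = stream_get_contents($pair[0]); // reads until the child closes its end
    fclose($pair[0]);
    pcntl_waitpid($pid, $status);
    [$childSaw, $childPid] = array_map('intval', explode(' ', $report));
    return ['parent_saw' => $pid, 'child_saw' => $childSaw, 'child_pid' => $childPid];
}
```

Calling fork_pids() should give an array where child_saw is 0 and parent_saw equals child_pid.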
The changes I had to make to the YouTube script were almost trivial. The general pattern looks like this:
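```php
<?php
$pid = pcntl_fork();
if ($pid == -1) {
    die('could not fork');
} elseif ($pid) {
    // Positive value: we're in the parent, and $pid is the child's process ID.
    // Do whatever parents do...

    // Then wait for the child to finish by calling pcntl_wait() or a variant
    // such as pcntl_waitpid(), so it doesn't linger as a zombie process.
    pcntl_waitpid($pid, $status);
} else {
    // Zero: we're in the child.
    // Do whatever children do (e.g. download the files)...

    // Exit explicitly; otherwise the child would carry on running
    // whatever code follows this block, just like the parent.
    exit(0);
}
```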
Check out the source code here (plaintext version), and compare it to the previous version (plaintext version). The differences are minimal, but because YouTube’s cap applies to each download rather than to your whole connection, the forked version can finish a batch of several files substantially faster on a fast connection.
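The downloader’s own code isn’t reproduced here, so this is a sketch under assumptions: fork_each() and reap_one() are hypothetical helpers, and the $work callable stands in for whatever downloads a single file. It forks one child per item but caps how many run at once, so a long list of URLs doesn’t spawn dozens of processes at the same moment.

```php
<?php
// Blocks until any one child exits; returns true if it exited with status 0.
function reap_one(): bool
{
    $pid = pcntl_wait($status);
    return $pid > 0 && pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0;
}

// Runs $work($item) for each item in its own child process, with at most
// $maxChildren children alive at once. $work should return true on success.
// Returns the number of children that failed.
function fork_each(array $items, callable $work, int $maxChildren): int
{
    $running = 0;
    $failures = 0;
    foreach ($items as $item) {
        // At the cap: wait for a child to finish before forking another.
        if ($running >= $maxChildren) {
            $failures += reap_one() ? 0 : 1;
            $running--;
        }
        $pid = pcntl_fork();
        if ($pid == -1) {
            die('could not fork');
        }
        if ($pid == 0) {
            // Child: do one unit of work, then exit so it never re-enters the loop.
            exit($work($item) ? 0 : 1);
        }
        $running++;
    }
    // Parent: wait for the remaining children.
    while ($running > 0) {
        $failures += reap_one() ? 0 : 1;
        $running--;
    }
    return $failures;
}
```

With the downloader, the items would be the FLV URLs popped off the stack and $work a function that fetches one of them, e.g. fork_each($urls, 'download_flv', 4), where download_flv is hypothetical. Since each child is throttled separately, total throughput should grow with the cap until the connection itself becomes the bottleneck.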