Fork PHP! (and speed up your scripts)
Linux, Programming, Tutorials April 8th, 2007
I just came across a forum post discussing what the author calls “multithreaded PHP,” and thought I’d clear a few things up about concurrency in PHP. First of all, this is not multithreading. It’s not even “sort of multithreading,” as the author implies (no offense, it’s just not). What the author is actually doing is creating multiple processes. A process is not the same thing as a thread. In fact, PHP does not support multiple threads at all, and its developers don’t plan to add them anytime soon.
Threads and processes both increase the concurrency of an application, meaning it can do more things at the same time. A good example of a program that benefits from multiple threads or processes is a web server. A single process or thread can only handle one I/O operation at a time (that’s not completely accurate, but it simplifies things so whatever… you can read more here if you want to know the truth). I/O operations are s - l - o - w (in computer terms, anyway). If we fork a second process, or create a new thread, we can handle two I/O operations simultaneously! This simple model is how many network servers work:
while (1)
    Process 1: Listen for connection
    Process 1: Connection made, fork Process 2
end
Process 2: Handle request, terminate.
That way nobody is turned away by our server just because the process that is supposed to be listening for incoming connections is busy handling a request.
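As a rough sketch of that loop in PHP: this assumes the pcntl extension and the command-line interpreter on a *nix box, and it uses pcntl_fork(), PHP's wrapper around the fork syscall. The serve() function and the greeting it sends are made up for illustration, and the loop is bounded (instead of while (1)) so the example terminates.

```php
<?php
// Accept up to $maxConnections clients, forking one child per connection.
// $server is a listening socket created with stream_socket_server().
function serve($server, int $maxConnections): void
{
    for ($i = 0; $i < $maxConnections; $i++) {
        // Process 1: listen for a connection (-1 = wait indefinitely).
        $conn = stream_socket_accept($server, -1);
        if ($conn === false) {
            continue;
        }

        // Process 1: connection made, fork Process 2.
        $pid = pcntl_fork();
        if ($pid === -1) {
            die("could not fork\n");
        }
        if ($pid === 0) {
            // Process 2: handle the request, then terminate.
            fwrite($conn, "hello from child " . getpid() . "\n");
            fclose($conn);
            exit(0);
        }

        // Parent: the child owns the connection now, so close our copy
        // and go straight back to listening.
        fclose($conn);
    }

    // Reap the children so they don't linger as zombies.
    while (pcntl_wait($status) > 0) {
        // nothing to do; just collecting exit statuses
    }
}
```

A real server would loop forever and reap children as it goes; the point here is only that the parent returns to accepting connections while the child does the slow part.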
Concurrency can come in handy for client-side programs too from time to time. Take the PHP script I wrote to download YouTube videos, for example. The script connects to YouTube, finds the URLs of the relevant FLV files, and pushes each URL onto a stack. Then it pops each URL off the stack one by one and downloads the FLV file. But YouTube throttles the downloads to about 60K/s. That’s fine when you’re streaming the video in your browser, since the bitrate is much lower than that. But it’s not cool when you’re on a box with a 100 Mbit connection! So what can we do about it?
Well, we can’t change the fact that downloads from YouTube are capped at 60K/s, but we can download more than one movie at once by forking additional processes. And it turns out there is an easy way to do it built right into PHP. The solution is the pcntl_fork() function, which provides an interface to the underlying fork syscall (only available on *nix platforms, sorry Windows users).
The trick to the fork syscall is that it returns twice. Once in the parent, and once in the child. It’s confusing at first, but if you think about it, it makes sense. In the parent the return value is the new child’s process ID. In the child it’s zero. Thus, we can use a simple if statement to determine whether we’re in the parent or the child process, which will usually determine the execution path we follow.
The changes that I had to make to the YouTube script were almost trivial. The general pattern looks like this:
$pid = pcntl_fork();
if ($pid == -1) {
    // fork failed; no child was created
    die('could not fork');
} else if ($pid) {
    // positive value means we're in the parent.
    // do whatever parents do
    // ...
    // wait for children to complete by calling
    // pcntl_wait() or a variant
} else {
    // zero value means we're in the child.
    // do whatever children do
    // (e.g. download the files, then exit)
}
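To make the pattern concrete, here is a minimal sketch that forks one child per URL and then waits for all of them. It is not the actual YouTube script: fake_download() is a hypothetical stand-in that just sleeps, and download_in_parallel() is a name invented for this example. It assumes the pcntl extension and the command-line interpreter.

```php
<?php
// Hypothetical stand-in for a slow download; a real script would fetch $url here.
function fake_download(string $url): void
{
    usleep(200000); // pretend the transfer takes 0.2 seconds
}

// Fork one child per URL, then wait for every child to finish.
// Returns a map of child PID => exit status.
function download_in_parallel(array $urls): array
{
    $children = [];
    foreach ($urls as $url) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            die("could not fork\n");
        }
        if ($pid === 0) {
            // Child: do one unit of work, then exit so it never
            // falls back into the parent's loop.
            fake_download($url);
            exit(0);
        }
        // Parent: remember the child and keep forking.
        $children[$pid] = $url;
    }

    // Parent: collect each child's exit status.
    $statuses = [];
    foreach (array_keys($children) as $pid) {
        pcntl_waitpid($pid, $status);
        $statuses[$pid] = pcntl_wexitstatus($status);
    }
    return $statuses;
}
```

Because the children sleep at the same time, three "downloads" should take roughly as long as one. Forking one process per URL is fine for a handful of files; for a long list you would probably want to cap the number of children running at once.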
Check out the source code here (plaintext version), and compare it to the previous version (plaintext version). The differences are minimal, but the forked version can substantially increase the execution speed if you’re downloading several files on a fast connection.
May 29th, 2007 at 7:13 am
Hello,
I’ve found your explanation about multithreading/concurrency very interesting.
I was searching for a way to make an asynchronous call to a web service from a PHP script (invoke a WS and go on with the PHP script, without worrying about the WS result) and, after thinking about how to deal with it on the web service server side, I thought it might be possible to solve the problem on the PHP side (looking for some PHP threading capabilities). However, maybe I could do it with the fork approach, making the WS call from the child process. What do you think about it? Any suggestion would be greatly appreciated.
Regards,
Adrian
May 29th, 2007 at 11:04 am
You typically don’t want to fork a PHP script that is being run by a PHP module in Apache, or some other web server. It can lead to weird problems.
It sounds like you need some sort of work queue. What I would do is create a simple database table (call it ‘queue’ for example) that holds whatever arguments you need to send to the WS. Your web script can then push items onto this ‘queue’ (INSERT INTO queue … ) and return almost instantly. In the background you’ll then run a second script, either using cron or by putting a script in an infinite loop and running it in the background (maybe making it sleep() after each cycle if there’s no work to be done). The second script will pop items off the ‘queue’ (SELECT … FROM queue … LIMIT 1), perform the WS call, then delete the item from the table.
Does that make sense? If not I might have to write a blog post with an example implementation :).
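For what it's worth, the queue idea could look something like the sketch below. Everything specific in it is an assumption made for illustration: the table layout (id, args), the JSON encoding of the arguments, the function names, and the use of PDO. It also assumes a single background worker, so there is no locking between the SELECT and the DELETE.

```php
<?php
// Create the hypothetical 'queue' table (SQLite syntax).
function create_queue_table(PDO $db): void
{
    $db->exec(
        'CREATE TABLE IF NOT EXISTS queue (
            id   INTEGER PRIMARY KEY AUTOINCREMENT,
            args TEXT NOT NULL
        )'
    );
}

// Web-facing side: push the WS arguments and return almost instantly.
function enqueue(PDO $db, array $args): void
{
    $stmt = $db->prepare('INSERT INTO queue (args) VALUES (?)');
    $stmt->execute([json_encode($args)]);
}

// Background side: pop the oldest item, perform the call, delete the row.
// Returns false when the queue is empty.
function process_next(PDO $db, callable $callWebService): bool
{
    $row = $db->query('SELECT id, args FROM queue ORDER BY id LIMIT 1')
              ->fetch(PDO::FETCH_ASSOC);
    if ($row === false) {
        return false; // nothing to do
    }

    $callWebService(json_decode($row['args'], true));

    $db->prepare('DELETE FROM queue WHERE id = ?')->execute([$row['id']]);
    return true;
}
```

The background script would then call process_next() in a loop, sleeping for a moment whenever it returns false, or run once from cron and drain the table.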
May 29th, 2007 at 12:24 pm
Wow, what a fast answer! Thank you.
The first solution I had in mind was the same one you’ve just proposed, but implementing this ‘queue’ in a separate application instead of having another PHP script.
Due to security issues it would be hard for me to have a script running all the time. The PHP scripts that I can manipulate are part of the Moodle platform installed in the faculty I study in, and there are restrictive rules about what can and cannot be done. I think that Moodle’s management staff will not allow scripts to run the whole time.
However, the whole system is not only composed of a web service server (in this case Axis2) and the Apache server, and I was trying to avoid adding more software components so as to make the overall management easier. That is why I was looking for an asynchronous way to invoke the web service from the PHP script. I think that Axis2 has a way to do this kind of invocation within its engine but I still don’t know how. If I can’t achieve my purpose by configuring Axis2 or something similar, I will try the mentioned ‘queue’ approach.
Thank you for your help. I beg your pardon for my poor English but I am from Spain, lol.
May 29th, 2007 at 1:47 pm
I’m not familiar with Moodle, but you may be able to get something working with flush() and/or register_shutdown_function() so that the script continues to execute after content has been sent to the web browser. Doing this will tie up a lot of resources, however, since you’ll be executing a slow remote procedure call from a process that probably has a fairly substantial memory footprint (i.e. your web server process).
Good luck, and let me know if you have any other questions, or if you come up with a solution!
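A very rough sketch of the register_shutdown_function() idea, with the caveat that respond_then_call() is an invented name, the slow call is passed in as a placeholder callback, and whether the browser actually stops waiting after flush() depends on the web server, the SAPI, and any output buffering in between:

```php
<?php
// Send the response first, then let the slow call run at script shutdown.
// $slowCall is a placeholder for whatever performs the web service request.
function respond_then_call(array $args, callable $slowCall): void
{
    // Keep running even if the client disconnects early.
    ignore_user_abort(true);

    // Queue the slow call to run after the main script has finished.
    register_shutdown_function($slowCall, $args);

    // Push the response towards the client right away.
    echo "Request accepted.\n";
    flush();
}
```

The web server process stays busy until the shutdown function returns, which is the resource cost mentioned above.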
August 3rd, 2007 at 5:51 am
Actually, on Linux, a process and a thread ARE the same thing. However, they are usually created and used differently: threads with pthreads and processes with fork/exec.
August 4th, 2007 at 5:11 pm
[…] Many modern applications take advantage of multithreading. In particular, applications that perform a lot of I/O — like web servers and databases — can drastically improve performance by implementing a multithreaded execution model. On a multiprocessor system, multiple threads of execution can even run simultaneously within a single process. Unfortunately, the threading abstractions in modern operating systems can be hard to understand, and are unavailable in certain programming languages. […]
November 27th, 2007 at 1:54 pm
What about sharing a DB connection? E.g., a MySQL connection handle made by the parent won’t be inherited by the child process. Is there any workaround to make the handle visible in child processes?
January 16th, 2008 at 6:22 pm
I would like to know the answer to the question posed by Prashant.
February 22nd, 2008 at 6:53 pm
@James & Prashant: You can do that by using shared memory. Once a process has a connection to the DB, it puts it into shared memory, and the other process just pulls that resource out of shared memory. Have a look at the shmop functions (e.g. shmop_open()).