SCO Doesn’t Own UNIX

Linux 1 Comment »

Following nearly five years of FUD and barratry, Judge Dale Kimball has issued a 102-page ruling [via Groklaw] concluding that “Novell is the owner of the UNIX and UnixWare copyrights.” Furthermore, the court found that SCO owes Novell quite a bit of money: “[B]ecause a portion of SCO’s 2003 Sun and Microsoft Agreements indisputably licenses SVRX products… SCO is obligated… to account for and pass through to Novell the appropriate portion relating to the license of SVRX products.”

Password free remote login and other SSH tips

Linux, Security 6 Comments »

I typically have four or five terminal windows open, and I’m almost always logged in to at least three servers (my dev box, production box, and database server). It’s a huge pain to log back into all these sessions whenever my connection is dropped. To keep myself sane, I use a couple of tricks to keep timeouts from occurring, and to streamline the login process when they do.

Threads vs. Processes: They’re not the same thing!

Linux, Programming 7 Comments »

I read a lot of tech-related blogs and other tech news, and I’ve caught a number of very talented programmers and intelligent technologists using the terms thread and process interchangeably. Forgive me for being pedantic, but they’re not the same thing! It’s true that threads and processes are very similar: they’re both methods of parallelizing an application. But the similarities pretty much stop there.

What is the Completely Fair Scheduler?

Linux, Programming 25 Comments »

If you’ve been following Linux kernel news then you’ve probably heard about the new Completely Fair Scheduler that has been merged into the upcoming 2.6.23 kernel release. It’s been a while since I’ve done much Linux kernel hacking, so the initial announcement was mostly over my head. After reading about the new scheduler in several places, I decided to do a bit of research into how the current Linux scheduler works, and what makes the new scheduling algorithm so interesting. Here’s what I learned.

What exactly is a load average?

Linux 12 Comments »

If you’ve spent some time on a Unix or Unix-like machine (e.g., Linux, OS X, Solaris, etc.) then you’re probably at least vaguely familiar with the concept of a load average. A system’s load average can be easily determined from the Unix shell by running the uptime command:

mmalone@www:~$ uptime
 15:37:38 up 133 days,  3:37,  3 users,
    load average: 0.37, 0.37, 0.41

The load average is also displayed by the w and top commands, and by pretty much every system monitoring package on the planet. But what the heck is a load average, exactly?

Grepping your web logs

Linux, Web Development 13 Comments »

If you’re anything like me, you spend far too much time checking your web server stats, and not enough time actually creating content and coding. Thankfully, my logs are always close at hand since I work almost exclusively on the command line. With the help of a few common Unix filters, you can quickly gauge how things are going on your site. These commands work with Apache or Apache-compatible log files, and can probably be tweaked to work with other log file formats pretty easily.

Lightweight Web Servers: 40 Alternatives to Apache

Linux, Software, Web Development No Comments »

IBM developerWorks just posted a new article discussing a variety of “lightweight” Web servers. They analyze a number of servers across a variety of dimensions including performance, scalability, security, flexibility, and manageability. The article explains that “while it’s reasonable to assume the market leaders have been carefully optimized to be effectively unbeatable in performance (for example), many tiny competitors are faster for simple service of static Web pages.” This is in line with the results I found when I ran a comparison between longtime stalwart Apache and lightweight newcomer lighttpd.

Top 5 tops: keep tabs on your system

Linux, Lists 4 Comments »

Anyone who has spent any time at the command line has probably encountered the venerable top command. It’s an excellent system administration tool that makes efficient use of the limited UI facilities available for command line applications. If you love top as much as I do, you may be interested in these other top-like tools that you can use to monitor other vital system statistics.

Top 5 unix network monitoring utilities

Linux, Lists, Networking 10 Comments »

I do a lot of web development work, which usually doesn’t require a lot of knowledge of low-level networking details. But from time to time it becomes necessary to work below HTTP, to debug a broken remote procedure call, or reverse engineer a third-party Ajax app. These tools make many low-level networking tasks a breeze. These are all command line utilities, by the way, since that’s where I’m most comfortable.

Fork PHP! (and speed up your scripts)

Linux, Programming, Tutorials 10 Comments »

I just came across a forum post discussing what the author calls “multithreaded PHP,” and thought I’d clear a few things up about concurrency in PHP. First of all, this is not multithreading. It’s not even “sort of multithreading,” as the author implies (no offense, it’s just not). What the author is actually doing is creating multiple processes. A process is not the same thing as a thread. In fact, PHP does not support multiple threads at all, and doesn’t plan to do so anytime soon.

Threads and processes both increase the concurrency of an application, meaning you can do more things at the same time. A good example of a program that benefits from multiple threads or processes is a web server. A single process or thread can only handle one I/O operation at a time (that’s not completely accurate, but it simplifies things so whatever… you can read more here if you want to know the truth). I/O operations are s-l-o-w (in computer terms, anyway). If we fork a second process, or create a new thread, we can handle two I/O operations simultaneously! This simple model is how a great many network servers work:

While( 1 )
  Process 1: Listen for connection
  Process 1: Connection made, fork Process 2
End

Process 2: Handle request, terminate.

That way nobody is turned away by our server because the process that is supposed to be listening for incoming connections is handling a request.
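Here’s a rough sketch of that accept-then-fork model in PHP, assuming the pcntl extension is available (CLI on *nix). The names serve() and handle_request() are hypothetical, and the loop is capped at a fixed number of connections so the demo terminates; a real server would loop forever. To keep the example self-contained, the script forks the server into a child and then plays the client itself.

```php
<?php
// Sketch only: serve() and handle_request() are hypothetical helper names.

// Runs in a forked child: answer one connection, then let the child exit.
function handle_request($conn): void
{
    fwrite($conn, 'hello from worker ' . getmypid() . "\n");
    fclose($conn);
}

// The listener: accept a connection, fork a child to handle it, repeat.
// A real server would loop forever; the cap lets the demo finish.
function serve($server, int $maxConnections): void
{
    for ($i = 0; $i < $maxConnections; $i++) {
        $conn = stream_socket_accept($server, 5.0);
        if ($conn === false) {
            continue; // timed out waiting for a client
        }

        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('could not fork');
        }
        if ($pid === 0) {
            fclose($server);       // the child doesn't listen
            handle_request($conn);
            exit(0);
        }
        fclose($conn);             // the child owns this connection now
    }

    // Reap the finished handlers so they don't linger as zombies.
    while (pcntl_wait($status) > 0) {
    }
}

// Demo: listen on a random free loopback port.
$server = stream_socket_server('tcp://127.0.0.1:0', $errno, $errstr);
if ($server === false) {
    throw new RuntimeException("could not listen: $errstr");
}
$address = stream_socket_get_name($server, false);

$serverPid = pcntl_fork();
if ($serverPid === -1) {
    throw new RuntimeException('could not fork');
}
if ($serverPid === 0) {
    serve($server, 2);
    exit(0);
}
fclose($server); // only the server child needs the listening socket

// The original process acts as two clients, one after the other.
$replies = [];
for ($i = 0; $i < 2; $i++) {
    $client = stream_socket_client("tcp://$address", $errno, $errstr, 5);
    $replies[] = fgets($client);
    fclose($client);
}
pcntl_waitpid($serverPid, $status);

echo implode('', $replies);
```

Each reply comes from a different worker process, while the listener goes straight back to accepting as soon as it has forked.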

Concurrency can come in handy for client-side programs too from time to time. Take the PHP script I wrote to download YouTube videos, for example. The script connects to YouTube, finds the URLs of the relevant FLV files, and pushes each URL onto a stack. Then it pops each URL off the stack one by one and downloads the FLV file. But YouTube throttles the downloads to about 60 KB/s. That’s fine when you’re streaming the video in your browser since the bitrate is much lower than that. But it’s not cool when you’re on a box with a 100 Mbit connection! So what can we do about it?

Well, we can’t change the fact that downloads from YouTube are capped at 60 KB/s, but we can download more than one movie at once by forking additional processes. And it turns out there is an easy way to do it right from PHP, provided your PHP build includes the pcntl extension. The solution is the pcntl_fork() function, which provides an interface to the underlying fork syscall (only available on *nix platforms, sorry Windows users).

The trick to the fork syscall is that it returns twice: once in the parent, and once in the child. It’s confusing at first, but if you think about it, it makes sense. In the parent the return value is the new child’s process ID. In the child it’s zero. (If the fork fails, the parent gets -1 and no child is created.) Thus, we can use a simple if statement to determine whether we’re in the parent or the child process, which will usually determine the execution path we follow.
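To see the two return values in action, here’s a minimal sketch, again assuming the pcntl extension. The helper name fork_and_wait() is made up for this example: it runs a callback in the child and hands the child’s exit status back to the parent.

```php
<?php
// Sketch only: fork_and_wait() is a hypothetical helper name.

// Fork once, run $childWork in the child, and return the child's
// exit status to the parent (or -1 if it didn't exit normally).
function fork_and_wait(callable $childWork): int
{
    $pid = pcntl_fork();

    if ($pid === -1) {
        throw new RuntimeException('could not fork');
    }

    if ($pid === 0) {
        // Child: pcntl_fork() returned zero here.
        exit($childWork());
    }

    // Parent: $pid holds the new child's process ID.
    pcntl_waitpid($pid, $status);

    return pcntl_wifexited($status) ? pcntl_wexitstatus($status) : -1;
}

$code = fork_and_wait(function (): int {
    // Runs only in the child; the return value becomes its exit code.
    return 7;
});

echo "child exited with status $code\n";
```

The callback never runs in the parent, and the echo never runs in the child, even though both processes start from the same pcntl_fork() call.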

The changes that I had to make to the YouTube script were almost trivial. The general pattern looks like this:

$pid = pcntl_fork();

if ($pid == -1) {
  // fork failed; no child was created
  die('could not fork');
} elseif ($pid) {
  // positive value means we're in the parent.
  // do whatever parents do
  // ...
  // wait for children to complete by calling
  // pcntl_wait() or a variant
} else {
  // zero value means we're in the child.
  // do whatever children do
  // (e.g. download the files, then exit)
}

Check out the source code here (plaintext version), and compare it to the previous version (plaintext version). The differences are minimal, but the forked version can substantially increase the execution speed if you’re downloading several files on a fast connection.
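If you just want the shape of the forked download loop without reading the whole script, here’s a hedged sketch. It is not the actual script: fetch_video() is a hypothetical stand-in that sleeps instead of downloading, download_all() is a made-up name, and the URLs are placeholders. It forks one child per URL, then collects the results with pcntl_wait().

```php
<?php
// Sketch only: fetch_video() and download_all() are hypothetical names,
// and the "download" is simulated so the example needs no network.

function fetch_video(string $url): bool
{
    usleep(100000); // pretend the throttled transfer takes 0.1 seconds
    return true;
}

// Fork one child per URL, then wait for all of them.
// Returns an array mapping each URL to true (success) or false.
function download_all(array $urls): array
{
    $children = [];

    foreach ($urls as $url) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('could not fork');
        }
        if ($pid === 0) {
            // Child: download one file, report the result via exit code.
            exit(fetch_video($url) ? 0 : 1);
        }
        // Parent: remember which child is handling which URL.
        $children[$pid] = $url;
    }

    $results = [];
    while ($children) {
        $pid = pcntl_wait($status);
        if ($pid <= 0) {
            break; // no children left to wait for
        }
        if (!isset($children[$pid])) {
            continue; // not one of ours
        }
        $results[$children[$pid]] =
            pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0;
        unset($children[$pid]);
    }

    return $results;
}

$urls = [
    'http://example.com/one.flv',
    'http://example.com/two.flv',
    'http://example.com/three.flv',
];

$results = download_all($urls);

foreach ($results as $url => $ok) {
    echo ($ok ? 'done:   ' : 'failed: ') . $url . "\n";
}
```

Because the children run side by side, the three simulated transfers overlap instead of queuing up behind each other. One child per URL is fine for a handful of files; for a long list you’d probably want to cap the number of children running at once.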

Copyright © 2007 - Mike Malone / Icons by N.Design Studio