Design Patterns: The Singleton

Design Patterns, Programming, Software Patterns 7 Comments »

Global variables tie classes to context and create unnatural interdependencies in an application. A Singleton ensures that a class only has one instance, and provides a global point of access to it. If a system only needs one instance of a class, and that instance is used in different parts of the system, you can control instantiation and access by making the class a singleton. Read the rest of this entry »
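To make the idea concrete, here is a minimal sketch of a singleton in PHP. The class name Registry and its set()/get() methods are made up for illustration; the point is only the private constructor and the static accessor.

```php
<?php
// Hypothetical example class; the name Registry is an assumption.
final class Registry
{
    private static ?Registry $instance = null;
    private array $values = [];

    // Private constructor: code outside the class cannot call `new Registry()`.
    private function __construct() {}

    // Cloning would create a second instance, so block it as well.
    private function __clone() {}

    // The single global point of access; creates the instance on first use.
    public static function getInstance(): Registry
    {
        if (self::$instance === null) {
            self::$instance = new Registry();
        }
        return self::$instance;
    }

    public function set(string $key, $value): void
    {
        $this->values[$key] = $value;
    }

    public function get(string $key)
    {
        return $this->values[$key] ?? null;
    }
}

Registry::getInstance()->set('greeting', 'hello');
echo Registry::getInstance()->get('greeting'), "\n"; // prints "hello"
```

Every call to Registry::getInstance() returns the same object, so state set in one part of the application is visible everywhere else.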

Grokking software patterns

Programming, Software Patterns 10 Comments »

If you’ve been programming for any period of time then you’ve probably heard about these things called patterns. But unless you’re a professional you’re probably not quite sure what they are, or what they’re good for. I’ve decided to start cataloging some of the software patterns I use most frequently in my projects. But before I do that, I figured I should try explaining what a software pattern is. Read the rest of this entry »

How to write a simple PHP template engine

Programming, Tutorials 8 Comments »

Templates are a great way to separate logic from presentation in an application. There’s no shortage of template engines available for PHP, so why would you want to write another one? Well, because sometimes you don’t need a full-fledged template language like Smarty, and writing your own simple engine is as easy as it is useful. Read the rest of this entry »

Making popularity contest play nice with WP-Cache

Blogging, Programming 28 Comments »

In this post I’m going to describe how I got Alex King’s popularity contest Wordpress plugin to work with WP-Cache2. If you’re not interested in how it works, you can skip reading the article and just download my modified version of popularity-contest.php [pretty-printed version] that you can put in your plugins/popularity-contest directory replacing the original file. Make sure you delete your old cache files after installing it to get things working right away.

Update: Make sure you check that your feeds still work after the plugin is enabled. The PHP engine may try to interpret the XML declaration at the top of the file, causing a scripting error. If this happens you can tell WP-Cache not to cache your feed by adding /feed (or something else that matches your feed’s URL) to the do-not-cache list under the WP-Cache options. If you really need your feeds cached, you could add some logic to popularity-contest.php to fix the problem.

Read the rest of this entry »

Digg Widget 2.0

Blogging, Programming 2 Comments »

DiggFriends 0.02
A couple of weeks ago I wrote a post on the Vino2Vino development blog about a digg widget that displayed your friends’ digg activity on your blog. I wrote the widget in response to a request by another digg user. If you didn’t read the original post, the widget displays a list of articles recently dugg by your friends. As I said in the original post, the code was a simple proof of concept, but it drew a lot of attention and a lot of criticism.

The complaints were all pretty accurate (no built-in caching, scraping data instead of using RSS, poor styling, etc.), and I promised to address them in a future release of the widget. So here it is — DiggFriends 0.02 Beta. You can see a working example of the new version in the sidebar here on my blog.

Read the rest of this entry »

Impress your friends with your blog stats

Blogging, Programming 78 Comments »

I was chatting with my buddy Ryan earlier about PaulStamatiou.com and noticed the little word/comment counter he has in the header of his blog. I thought it was cool, so I wrote a WordPress plugin to generate the same stats for my blog! Now I’m making it available to you. I’m calling the plugin Impress, and you can download it here (pretty-printed version).

Here’s how you install it:

  1. Save the file to your WordPress plugins directory (wp-content/plugins) as impress.php
  2. Activate the plugin under the plugins tab on the WordPress admin panel (it should be the one called Impress)
  3. Place a call to the impress() function in your header.php, footer.php or some other template file - it should look like <?php impress(<format>); ?>

That’s all!

The impress function takes one argument: a specially formatted string that determines what the output will be. There are eight special keywords that will be replaced with your statistics; the rest of the string can be anything (it will be displayed along with the stats). The keywords are :users, :posts, :pages, :comments, :categories, :post_wordcount, :page_wordcount, and :comment_wordcount. They’re pretty self-explanatory, so I’ll let you figure out what they mean.

Here’s an example from my blog (see the lower right-hand corner):

<p><?php impress("So far I've written :post_wordcount words
in :posts posts. :comments comments have been posted,
with a total of :comment_wordcount words."); ?></p>
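If you're curious how this kind of keyword substitution can work, here is a minimal sketch. It is not the plugin's actual code: the function name impress_format() and the numbers in the usage line are made up, and the real plugin pulls its statistics from the WordPress database rather than from an array.

```php
<?php
// Hypothetical helper, not the plugin's implementation: replaces each
// :keyword placeholder in $format with the matching number from $stats.
function impress_format(string $format, array $stats): string
{
    $replacements = [];
    foreach ($stats as $keyword => $value) {
        // number_format() adds thousands separators, e.g. 1234 becomes "1,234".
        $replacements[':' . $keyword] = number_format($value);
    }
    // strtr() replaces every keyword in a single pass, so text that has
    // already been substituted is never scanned again.
    return strtr($format, $replacements);
}

// Made-up statistics, for illustration only.
echo impress_format(
    ':post_wordcount words in :posts posts.',
    ['posts' => 12, 'post_wordcount' => 3400]
), "\n"; // prints "3,400 words in 12 posts."
```

Anything in the format string that isn't a known keyword passes through untouched.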

Let me know if you have any thoughts, comments, suggestions, etc. Otherwise, enjoy!

Update: It’s fast, too! I’m being dugg right now and the server’s not breaking a sweat.

How not to optimize a MySQL query

Database, MySQL, Programming, SQL 32 Comments »

I just read a blog post discussing MySQL query optimization and thought I’d put in my two cents.

The post suggests using a number of MySQL-specific statements (e.g. SQL_SMALL_RESULT, HIGH/LOW_PRIORITY, and INSERT DELAYED. STRAIGHT_JOIN was conspicuously missing). Unless absolutely necessary, this is usually A Bad Idea for at least two reasons. First, they are specific to MySQL, which makes your database code less portable. This might or might not be a problem. Second, and perhaps more importantly, giving the query optimizer this sort of hint can lead to decreased performance in the future when your data or the optimizer changes. Telling the optimizer to anticipate a small result set (with SQL_SMALL_RESULT) might seem like a good idea, but could lead to problems when your table grows and the result becomes large! Basically, use these keywords with caution, and only when you really need them. And when you do use them, take special care to document where and why they’re in use.

The truth is there is no silver bullet that is going to make MySQL (or any dbms) run a poorly written query lightning fast. But here are some tips that the post somehow neglected to mention.

Properly index your tables

If you do a lot of lookups using a particular column of a table, or if you join on a column, that column should be indexed. Moreover, if all of the data that you are retrieving is available in the index (e.g. you’re using a multi-column index) then MySQL can avoid looking at the table altogether and execute your query using just the index.

Avoid superfluous queries

Don’t do this:

$result = query_db('select * from table1');

// one extra query for every row of table1
$array = array();
foreach ($result as $row) {
  $array[] = query_db('select * from table2 where column = '.$row['id']);
}

Do this:

$result = query_db('select table2.* from '
       .'table1, table2 where table1.id=table2.column');

Look for bottlenecks

Don’t waste time optimizing queries that aren’t bottlenecks in your application. Find the low hanging fruit and correct those problems first.

Learn SQL

This is the most important tip. SQL optimization really has to be done on a case by case basis, and you can’t do it unless you have a good understanding of the language and how you can use it to your advantage. You need to understand things like subqueries, grouping, left joins vs. right joins vs. full joins, etc. There is no free lunch.

If you’re interested in learning more, I highly recommend Stephane Faroult’s book The Art of SQL.

Fork PHP! (and speed up your scripts)

Linux, Programming, Tutorials 10 Comments »

I just came across a forum post discussing what the author calls “multithreaded PHP,” and thought I’d clear a few things up about concurrency in PHP. First of all, this is not multithreading. It’s not even “sort of multithreading,” as the author implies (no offense, it’s just not). What the author is actually doing is creating multiple processes. A process is not the same thing as a thread. In fact, PHP does not support multiple threads at all, and doesn’t plan to do so anytime soon.

Threads and processes both increase the concurrency of an application, meaning you can do more things at the same time. A good example of a program that would benefit from multithreading/multiple processes is a web server. A single process or thread can only handle one I/O operation at a time (that’s not completely accurate, but it simplifies things so whatever… you can read more here if you want to know the truth). I/O operations are s - l - o - w (in computer-terms, anyways). If we fork a second process, or create a new thread, we can handle two I/O operations simultaneously! This simple model is the way 90% of network servers work:

While( 1 )
  Process 1: Listen for connection
  Process 1: Connection made, fork Process 2
End

Process 2: Handle request, terminate.

That way nobody is turned away by our server because the process that is supposed to be listening for incoming connections is handling a request.

Concurrency can come in handy for client-side programs too from time to time. Take the PHP script I wrote to download YouTube videos, for example. The script connects to YouTube, finds the URLs of the relevant FLV files, and pushes each URL onto a stack. Then it pops each URL off the stack one by one and downloads the FLV file. But YouTube throttles the downloads to about 60K/s. That’s fine when you’re streaming the video in your browser since the bitrate is much lower than that. But it’s not cool when you’re on a box with a 100mbit connection! So what can we do about it?

Well, we can’t change the fact that downloads from YouTube are capped at 60K/s, but we can download more than one movie at once by forking additional processes. And it turns out there is an easy way to do it built right in to PHP. The solution is the pcntl_fork() function, which provides an interface to the underlying fork syscall (only available on *nix platforms, sorry Windows users).

The trick to the fork syscall is that it returns twice. Once in the parent, and once in the child. It’s confusing at first, but if you think about it, it makes sense. In the parent the return value is the new child’s process ID. In the child it’s zero. Thus, we can use a simple if statement to determine whether we’re in the parent or the child process, which will usually determine the execution path we follow.

The changes that I had to make to the youtube script were almost trivial. The general pattern looks like this:

$pid = pcntl_fork();

if ($pid == -1) {
  // -1 means the fork failed and no child was created
  die('could not fork');
}
else if ($pid) {
  // positive value means we're in the parent.
  // do whatever parents do
  // ....
  // wait for children to complete by calling
  // pcntl_wait() or a variant
} else {
  // zero value means we're in the child.
  // do whatever children do
  // (e.g. download the files, then exit)
}

Check out the source code here (plaintext version), and compare it to the previous version (plaintext version). The differences are minimal, but the forked version can substantially increase the execution speed if you’re downloading several files on a fast connection.
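For a more complete picture, here is a minimal sketch of the same pattern applied to a list of jobs. This is not the code from my YouTube script: run_in_children() and download_file() are names made up for this example, the URLs are placeholders, and the "download" is just a short sleep. It assumes the pcntl extension is available (PHP CLI on a *nix box).

```php
<?php
// Stand-in for the real work; a real script would fetch $url here.
function download_file(string $url): void
{
    usleep(100000); // pretend the download takes 0.1 seconds
}

// Hypothetical helper: forks one child per job, waits for all of them,
// and returns how many children exited cleanly.
function run_in_children(array $jobs, callable $work): int
{
    $children = [];
    foreach ($jobs as $job) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            die('could not fork');
        } elseif ($pid) {
            // Parent: remember the child's PID and keep looping.
            $children[] = $pid;
        } else {
            // Child: do one job, then exit so it never continues the loop.
            $work($job);
            exit(0);
        }
    }

    // Parent: reap every child so none is left behind as a zombie.
    $finished = 0;
    foreach ($children as $pid) {
        pcntl_waitpid($pid, $status);
        if (pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0) {
            $finished++;
        }
    }
    return $finished;
}

$urls = ['http://example.com/a.flv', 'http://example.com/b.flv'];
echo run_in_children($urls, 'download_file'), " downloads finished\n";
```

Note that this forks one child per job with no upper limit, which is fine for a handful of files. For a long list you would want to cap the number of children running at once.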

YouTube Video Ripper in PHP

Hacks, Programming 24 Comments »

I found an interesting shell script today while browsing Digg that allows you to download all the YouTube videos for a particular user that match a certain pattern. This is a great example of the power of regular expressions, by the way… I wrote a quick PHP port (plaintext version) that you can use if you’re running Windows. It doesn’t have a progress bar (since it’s not using wget), but it gets the job done.

Update: if you don’t have the PHP interpreter installed either, you can go to this page to generate direct links to the videos that would have been downloaded by the script automatically. I just threw it together in 5 minutes, so it’s not pretty. But again, it works. Just right click and “save as.”


Hacking Google Spell Checker for Fun and Profit

Google, Hacks, Programming 35 Comments »

Try it out!

 


A few days ago I was researching ways to integrate spell checking into the search engine for a project I’m working on, much the way Google does. I figured Google, being Google, must have some legitimate mechanism for accessing their spell checker (this is Web 2.0, after all).

After scouring the Internet for some time all I could find was a deprecated SOAP web service that used to be available as part of their SOAP search API. Unfortunately they stopped issuing API keys for the SOAP Search API on December 5, 2006. The ajax search API that replaced it doesn’t seem to provide spelling corrections. Bummer.

Just as I was about to give up I stumbled across an interesting blog post that describes a publicly available (but undocumented and apparently not very widely known) RPC endpoint that Google uses to provide spelling corrections for the Google Toolbar. The URL is https://www.google.com/tbproxy/spell.

Neat. After a few minutes of tinkering I put together a small class in PHP that provides easy access to the service. The class requires SimpleXML and CURL. It defines two static methods, SpellChecker::Check() (which returns true if the query you pass as an argument is spelled correctly) and SpellChecker::Correct() (which returns Google’s suggested spelling). You can download the source here (plaintext version), or try it out with the AJAX spell checker I threw together (up top).

Here’s a quick replay of a typical request/response (I wrapped the XML, but in theory it shouldn’t matter):

POST /tbproxy/spell?lang=en&hl=en HTTP/1.0
MIME-Version: 1.0
Content-type: application/PTI26
Content-length: 125
Content-transfer-encoding: text
Request-number: 1
Document-type: Request
Interface-Version: Test 1.4
Connection: close 

<spellrequest
  textalreadyclipped="0"
  ignoredups="1"
  ignoredigits="1"
  ignoreallcaps="0">
    <text>gogle spel</text>
</spellrequest>

HTTP/1.0 200 OK
Content-Type: text/xml
Server: DocumentSpellcheck
Cache-Control: private, x-gzip-ok=""
Date: Sat, 07 Apr 2007 14:11:57 GMT
Connection: Close

<?xml version="1.0"?>
<spellresult
  error="0"
  clipped="0"
  charschecked="10">
    <c o="0" l="5" s="1">google    Google  goggle  giggle  Gogol</c>
    <c o="6" l="4" s="1">spell       spiel   spelt   spew    Opel</c>
</spellresult>

The suggestions are tab-delimited. The ‘o’ attribute is an offset from the start of your query to the misspelled word. ‘l’ is the length of the misspelled word. ‘s’ is the confidence of Google’s suggestion (presumably higher is better, but I’ve only gotten 0 or 1).
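To show how those attributes fit together, here is a minimal sketch that takes a response like the one above and swaps the first suggestion in for each misspelled word. It is not the SpellChecker class itself: apply_first_suggestions() is a name made up for this example, it works on a hard-coded response rather than calling the service, and it assumes the offsets can be treated as byte offsets (true for plain ASCII queries).

```php
<?php
// Hypothetical helper, not part of the SpellChecker class described above.
function apply_first_suggestions(string $query, string $responseXml): string
{
    $result = simplexml_load_string($responseXml);
    if ($result === false) {
        return $query; // unparseable response: leave the query unchanged
    }

    // Collect [offset, length, first suggestion] for every <c> element.
    $corrections = [];
    foreach ($result->c as $c) {
        $suggestions = explode("\t", (string) $c);
        $corrections[] = [(int) $c['o'], (int) $c['l'], $suggestions[0]];
    }

    // Apply corrections right to left so earlier offsets stay valid
    // even when a replacement changes the length of the string.
    usort($corrections, fn($a, $b) => $b[0] <=> $a[0]);
    foreach ($corrections as [$offset, $length, $replacement]) {
        $query = substr_replace($query, $replacement, $offset, $length);
    }
    return $query;
}

// Shortened copy of the response above, with real tab characters.
$xml = '<?xml version="1.0"?>'
     . '<spellresult error="0" clipped="0" charschecked="10">'
     . "<c o=\"0\" l=\"5\" s=\"1\">google\tGoogle\tgoggle</c>"
     . "<c o=\"6\" l=\"4\" s=\"1\">spell\tspiel\tspelt</c>"
     . '</spellresult>';

echo apply_first_suggestions('gogle spel', $xml), "\n"; // prints "google spell"
```

A response with no c elements means nothing was flagged, and the query comes back unchanged.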

Copyright © 2007 - Mike Malone / Icons by N.Design Studio