YouTube Video Ripper in PHP

Hacks, Programming

While browsing Digg today I found an interesting shell script that downloads all of a particular user’s YouTube videos that match a certain pattern. It’s a great example of the power of regular expressions, by the way. I wrote a quick PHP port (plaintext version) that you can use if you’re running Windows. It doesn’t have a progress bar (since it doesn’t use wget), but it gets the job done.

Update: if you don’t have the PHP interpreter installed either, you can use this page to generate direct links to the videos the script would have downloaded automatically. I threw it together in five minutes, so it’s not pretty, but it works. Just right-click and “Save As.”


Hacking Google Spell Checker for Fun and Profit

Google, Hacks, Programming

A few days ago I was researching ways to integrate Google-style spell checking into the search engine for a project I’m working on. I figured Google, being Google, must have some legitimate mechanism for accessing their spell checker (this is Web 2.0, after all).

After scouring the Internet for some time, all I could find was a deprecated SOAP web service that used to be available as part of their SOAP Search API. Unfortunately, they stopped issuing API keys for the SOAP Search API on December 5, 2006, and the AJAX Search API that replaced it doesn’t seem to provide spelling corrections. Bummer.

Just as I was about to give up, I stumbled across an interesting blog post that describes a publicly available (but undocumented and apparently not very widely known) RPC endpoint that Google uses to provide spelling corrections for the Google Toolbar. The URL is https://www.google.com/tbproxy/spell.

Neat. After a few minutes of tinkering I put together a small PHP class that provides easy access to the service. The class requires SimpleXML and cURL. It defines two static methods: SpellChecker::Check(), which returns true if the query you pass as an argument is spelled correctly, and SpellChecker::Correct(), which returns Google’s suggested spelling. You can download the source here (plaintext version), or try it out with the AJAX spell checker I threw together (up top).
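The PHP class itself isn’t reproduced here, but the request it has to send is simple enough to sketch. The following is a minimal Python sketch, not the author’s code, that assembles a `<spellrequest>` body and a POST request mirroring the replay below. The function names are hypothetical, and reusing `lang` for the `hl` parameter is an assumption. The headers are copied from the replay; whether the endpoint actually requires all of them is untested.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Endpoint from the post; it is undocumented and may no longer respond.
SPELL_URL = "https://www.google.com/tbproxy/spell"


def build_spell_request_body(text: str) -> bytes:
    """Serialize a <spellrequest> document with the same flags as the replay."""
    root = ET.Element(
        "spellrequest",
        {
            "textalreadyclipped": "0",
            "ignoredups": "1",
            "ignoredigits": "1",
            "ignoreallcaps": "0",
        },
    )
    # ElementTree escapes characters such as & and < in the query for us.
    ET.SubElement(root, "text").text = text
    return ET.tostring(root, encoding="utf-8")


def build_spell_request(text: str, lang: str = "en") -> urllib.request.Request:
    """Build (but do not send) the POST request. Hypothetical helper."""
    # Assumption: the interface language (hl) matches the query language.
    query = urllib.parse.urlencode({"lang": lang, "hl": lang})
    headers = {
        # Copied from the captured request; necessity of each is unverified.
        "MIME-Version": "1.0",
        "Content-Type": "application/PTI26",
        "Content-Transfer-Encoding": "text",
        "Request-Number": "1",
        "Document-Type": "Request",
        "Interface-Version": "Test 1.4",
    }
    return urllib.request.Request(
        f"{SPELL_URL}?{query}",
        data=build_spell_request_body(text),
        headers=headers,
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen()` would send it; urllib fills in Content-Length and Connection itself, which is why they are omitted from the header dictionary.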

Here’s a quick replay of a typical request/response (I wrapped the XML for readability, but in theory the extra whitespace shouldn’t matter):

POST /tbproxy/spell?lang=en&hl=en HTTP/1.0
MIME-Version: 1.0
Content-type: application/PTI26
Content-length: 125
Content-transfer-encoding: text
Request-number: 1
Document-type: Request
Interface-Version: Test 1.4
Connection: close 

<spellrequest
  textalreadyclipped="0"
  ignoredups="1"
  ignoredigits="1"
  ignoreallcaps="0">
    <text>gogle spel</text>
</spellrequest>

HTTP/1.0 200 OK
Content-Type: text/xml
Server: DocumentSpellcheck
Cache-Control: private, x-gzip-ok=""
Date: Sat, 07 Apr 2007 14:11:57 GMT
Connection: Close

<?xml version="1.0"?>
<spellresult
  error="0"
  clipped="0"
  charschecked="10">
    <c o="0" l="5" s="1">google    Google  goggle  giggle  Gogol</c>
    <c o="6" l="4" s="1">spell       spiel   spelt   spew    Opel</c>
</spellresult>

The suggestions are tab-delimited. The ‘o’ attribute is the offset of the misspelled word from the start of your query, ‘l’ is the length of the misspelled word, and ‘s’ is Google’s confidence in its suggestion (presumably higher is better, but I’ve only ever seen 0 or 1).
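Those three attributes are all you need to turn a response into a corrected query. Here is a minimal Python sketch of that step, loosely mirroring what Check() and Correct() are described as doing. It is not the PHP class. The helper names are hypothetical, and two points are assumptions: that the first tab-delimited suggestion is the best one, and that ‘o’ and ‘l’ count characters (for ASCII queries, characters and bytes coincide).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Correction:
    offset: int             # 'o': start of the misspelled word in the query
    length: int             # 'l': length of the misspelled word
    confidence: int         # 's': Google's confidence in the suggestion
    suggestions: List[str]  # tab-delimited alternatives (first assumed best)


def parse_spell_result(xml_text: str) -> List[Correction]:
    """Turn each <c> element of a <spellresult> into a Correction."""
    root = ET.fromstring(xml_text)
    corrections = []
    for c in root.findall("c"):
        words = [w for w in (c.text or "").split("\t") if w]
        corrections.append(
            Correction(int(c.get("o")), int(c.get("l")), int(c.get("s")), words)
        )
    return corrections


def check(xml_text: str) -> bool:
    """True when the response flags no misspelled words."""
    return not parse_spell_result(xml_text)


def correct(query: str, xml_text: str) -> str:
    """Replace every flagged word with its first suggestion."""
    result = query
    # Work right-to-left so a replacement of a different length
    # does not shift the offsets of words that come earlier.
    for c in sorted(parse_spell_result(xml_text), key=lambda c: c.offset, reverse=True):
        if c.suggestions:
            result = result[:c.offset] + c.suggestions[0] + result[c.offset + c.length:]
    return result
```

Fed the response from the replay above (with real tabs between suggestions), `correct("gogle spel", response)` returns "google spell", and `check(response)` returns False.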

Copyright © 2007 - Mike Malone / Icons by N.Design Studio