Try it out!

A few days ago I was researching ways to integrate spell checking with the search engine for a project I’m working on, similar to the way Google does it. I figured Google, being Google, must have some legitimate mechanism for accessing their spell checker (this is Web 2.0, after all).

After scouring the Internet for some time, all I could find was a deprecated SOAP web service that used to be available as part of their SOAP Search API. Unfortunately, they stopped issuing API keys for the SOAP Search API on December 5, 2006. The AJAX Search API that replaced it doesn’t seem to provide spelling corrections. Bummer.

Just as I was about to give up, I stumbled across an interesting blog post that describes a publicly available (but undocumented and apparently not very widely known) RPC endpoint that Google uses to provide spelling corrections for the Google Toolbar. The URL is https://www.google.com/tbproxy/spell.

Neat. After a few minutes of tinkering I put together a small class in PHP that provides easy access to the service. The class requires the SimpleXML and cURL extensions. It defines two static methods: SpellChecker::Check(), which returns true if the query you pass as an argument is spelled correctly, and SpellChecker::Correct(), which returns Google’s suggested spelling. You can download the source here (plaintext version), or try it out with the AJAX spell checker I threw together (up top).
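
For anyone not using PHP, here is a rough Python sketch of how a request like the one replayed below could be assembled. This is not the class linked above: the name build_spell_request is made up for illustration, the attribute values and Content-Type are simply copied from the replay, and the function only builds the request without sending it.

```python
import urllib.request
from xml.sax.saxutils import escape

# Endpoint and query string as they appear in the replayed request.
SPELL_URL = "https://www.google.com/tbproxy/spell?lang=en&hl=en"


def build_spell_request(text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a <spellrequest> body.

    Hypothetical helper: the attribute values mirror the replayed request
    and are not taken from any official documentation.
    """
    body = (
        '<spellrequest textalreadyclipped="0" ignoredups="1" '
        'ignoredigits="1" ignoreallcaps="0">'
        # Escape &, < and > so the query cannot break the XML.
        f"<text>{escape(text)}</text>"
        "</spellrequest>"
    ).encode("utf-8")
    return urllib.request.Request(
        SPELL_URL,
        data=body,
        headers={"Content-Type": "application/PTI26"},
        method="POST",
    )


req = build_spell_request("gogle spel")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen() would actually send it. urllib fills in Content-Length on its own, so there is no need to count bytes by hand.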

Here’s a quick replay of a typical request/response (I wrapped the XML, but in theory it shouldn’t matter):

POST /tbproxy/spell?lang=en&hl=en HTTP/1.0
MIME-Version: 1.0
Content-type: application/PTI26
Content-length: 125
Content-transfer-encoding: text
Request-number: 1
Document-type: Request
Interface-Version: Test 1.4
Connection: close 

<spellrequest
  textalreadyclipped="0"
  ignoredups="1"
  ignoredigits="1"
  ignoreallcaps="0">
    <text>gogle spel</text>
</spellrequest>

HTTP/1.0 200 OK
Content-Type: text/xml
Server: DocumentSpellcheck
Cache-Control: private, x-gzip-ok=""
Date: Sat, 07 Apr 2007 14:11:57 GMT
Connection: Close

<?xml version="1.0"?>
<spellresult
  error="0"
  clipped="0"
  charschecked="10">
    <c o="0" l="5" s="1">google    Google  goggle  giggle  Gogol</c>
    <c o="6" l="4" s="1">spell       spiel   spelt   spew    Opel</c>
</spellresult>

The suggestions are tab-delimited. The ‘o’ attribute is the character offset from the start of your query to the misspelled word, and ‘l’ is the length of that word; in the example above, o="0" l="5" covers “gogle” and o="6" l="4" covers “spel”. ‘s’ is the confidence of Google’s suggestion (presumably higher is better, but I’ve only ever gotten 0 or 1).
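
To make the response format concrete, here is a small Python sketch that parses a response like the one above and applies the first suggestion for each flagged word. The function names are hypothetical, loosely modeled on Check() and Correct() rather than copied from the PHP class. It rests on two assumptions of mine: that a correctly spelled query comes back with no c elements at all, and that the offsets count characters the same way Python string indices do.

```python
import xml.etree.ElementTree as ET

# The sample response from the replay, with suggestions separated by tabs.
SAMPLE_RESPONSE = (
    '<?xml version="1.0"?>'
    '<spellresult error="0" clipped="0" charschecked="10">'
    '<c o="0" l="5" s="1">google\tGoogle\tgoggle\tgiggle\tGogol</c>'
    '<c o="6" l="4" s="1">spell\tspiel\tspelt\tspew\tOpel</c>'
    "</spellresult>"
)


def parse_corrections(xml_text):
    """Return a list of (offset, length, suggestions) tuples, one per <c>."""
    root = ET.fromstring(xml_text)
    corrections = []
    for c in root.findall("c"):
        suggestions = [s for s in (c.text or "").split("\t") if s]
        corrections.append((int(c.get("o")), int(c.get("l")), suggestions))
    return corrections


def is_spelled_correctly(xml_text):
    """Assumption: no <c> elements means nothing was flagged."""
    return not parse_corrections(xml_text)


def correct(query, xml_text):
    """Replace each flagged word with its first suggestion."""
    # Work from right to left so earlier offsets stay valid after a
    # replacement changes the length of the string.
    for offset, length, suggestions in sorted(
        parse_corrections(xml_text), reverse=True
    ):
        if suggestions:
            query = query[:offset] + suggestions[0] + query[offset + length:]
    return query


print(correct("gogle spel", SAMPLE_RESPONSE))  # google spell
```

Applying the replacements from the end of the string backwards is the one design choice worth noting: “google” is one character longer than “gogle”, so fixing the first word first would shift the offset of the second.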