Async Google Spellcheck API Adaptor for TinyMCE

I recently added TinyMCE to a project in order to provide a stripped-down rich text editor with bold, italic and underline capability. I discovered that the spell check functionality required either a client-side plugin for IE or a server-side JSON-RPC implementation called by TinyMCE via Ajax. Unfortunately, the only server-side implementations provided by the TinyMCE project are in PHP, and my project is in ASP.NET MVC 4.

Looking at the PHP implementations, one option is to adapt the Google Spellcheck API — which I didn’t even know existed. Basically this API allows you to post an XML document that contains a list of space-delimited words and get back a document which defines the substrings that are misspelled.
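To make the offset-and-length idea concrete, here is a minimal sketch (in JavaScript, to match the client-side code later in this post) of slicing the misspelled substrings back out of the text that was sent. The function name misspelledWords is made up for illustration, and it finds the c elements with a regular expression rather than a real XML parser, so treat it as a sketch under those assumptions. The sample text and response are the ones shown in the comment at the top of the C# class.

```javascript
// Hypothetical helper: returns the substrings of `text` that the response flags.
// Assumes each <c> element carries o (offset) then l (length) attributes.
function misspelledWords(text, responseXml) {
  const pattern = /<c\s+o="(\d+)"\s+l="(\d+)"[^>]*>/g;
  const words = [];
  let match;
  while ((match = pattern.exec(responseXml)) !== null) {
    const offset = Number(match[1]);
    const length = Number(match[2]);
    // Skip ranges that fall outside the text we sent.
    if (offset + length <= text.length) {
      words.push(text.slice(offset, offset + length));
    }
  }
  return words;
}

const sampleText = "Ths is a tst";
const sampleResponse =
  '<?xml version="1.0" encoding="UTF-8"?>' +
  '<spellresult error="0" clipped="0" charschecked="12">' +
  '<c o="0" l="3" s="1">This Th\'s Thus Th HS</c>' +
  '<c o="9" l="3" s="1">test tat ST St st</c>' +
  "</spellresult>";

console.log(misspelledWords(sampleText, sampleResponse));
```

For the sample above, the two flagged ranges correspond to "Ths" and "tst".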

Using some examples of how the API works on the Google side, I was able to throw together a Google Spellcheck API client class that uses the new async/await pattern in C# so that it doesn’t block while waiting for its result.

using System;
using System.IO;
using System.Text;
using System.Net;
using System.Xml;
using System.Threading.Tasks;
using System.Collections.Generic;
using System.Diagnostics;

namespace WolfeReiter.Web.Utility
{
/*
 * http post to http://www.google.com/tbproxy/spell?lang=en&hl=en
 * 
 * Google spellcheck API request looks like this.
 * 
 * <?xml version="1.0" encoding="utf-8" ?> 
 * <spellrequest textalreadyclipped="0" ignoredups="0" ignoredigits="1" ignoreallcaps="1">
 * <text>Ths is a tst</text>
 * </spellrequest>
 * 
 * The response looks like ...
 * 
 * <?xml version="1.0" encoding="UTF-8"?>
 * <spellresult error="0" clipped="0" charschecked="12">
 * <c o="0" l="3" s="1">This Th's Thus Th HS</c>
 * <c o="9" l="3" s="1">test tat ST St st</c>
 * </spellresult>
 */

    public class GoogleSpell
    {
        const string GOOGLE_REQUEST_TEMPLATE = "<?xml version=\"1.0\" encoding=\"utf-8\" ?><spellrequest textalreadyclipped=\"0\" ignoredups=\"0\" ignoredigits=\"1\" ignoreallcaps=\"1\"><text>{0}</text></spellrequest>";

        public async Task<IEnumerable<string>> SpellcheckAsync(string lang, IEnumerable<string> wordList)
        {
            //convert list of words to space-delimited string.
            var words = string.Join(" ", wordList);
            var result = (await QueryGoogleAsync(lang, words));

            var doc = new XmlDocument();
            doc.LoadXml(result);

            // Build misspelled word list
            var misspelledWords = new List<string>();
            foreach (var node in doc.SelectNodes("//c"))
            {
                var cElm = (XmlElement)node;
                //google sends back bad word positions to slice out of original data we sent.
                try
                {
                    var badword = words.Substring(Convert.ToInt32(cElm.GetAttribute("o")), Convert.ToInt32(cElm.GetAttribute("l")));
                    misspelledWords.Add(badword);
                }
                catch( ArgumentOutOfRangeException e)
                {
                    Trace.WriteLine(e);
                    Debug.WriteLine(e);
                }
            }
            return misspelledWords;
        }

        public async Task<IEnumerable<string>> SuggestionsAsync(string lang, string word)
        {
            var result = (await QueryGoogleAsync(lang, word));

            // Parse XML result
            var doc = new XmlDocument();
            doc.LoadXml(result);

            // Build suggestion list
            var suggestions = new List<string>();
            foreach (XmlNode node in doc.SelectNodes("//c"))
            {
                var element = (XmlElement)node;
                if(!string.IsNullOrWhiteSpace(element.InnerText))
                {
                    foreach (var suggestion in element.InnerText.Split('\t'))
                    {
                        if (!string.IsNullOrEmpty(suggestion)) { suggestions.Add(suggestion); }
                    }
                }
            }

            return suggestions;
        }

        async Task<string> QueryGoogleAsync(string lang, string data)
        {
            var scheme     = "https";
            var server     = "www.google.com";
            var port       = 443;
            var path       = "/tbproxy/spell";
            // both lang and hl take the language code (see the example URL above), not the text being checked
            var query      = string.Format("?lang={0}&hl={0}", lang);
            var uriBuilder = new UriBuilder(scheme, server, port, path, query);
            string xml     = string.Format(GOOGLE_REQUEST_TEMPLATE, EncodeUnicodeToASCII(data));

            var request           = WebRequest.CreateHttp(uriBuilder.Uri);
            request.Method        = "POST";
            request.KeepAlive     = false;
            request.ContentType   = "application/PTI26";
            request.ContentLength = xml.Length;

            // Google-specific headers
            var headers = request.Headers;
            headers.Add("MIME-Version: 1.0");
            headers.Add("Request-number: 1");
            headers.Add("Document-type: Request");
            headers.Add("Interface-Version: Test 1.4");

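            var xmlData = Encoding.ASCII.GetBytes(xml);
            // Write the body and close the request stream before asking for the response.
            using (var requestStream = (await request.GetRequestStreamAsync()))
            {
                await requestStream.WriteAsync(xmlData, 0, xmlData.Length);
            }

            using (var response = (await request.GetResponseAsync()))
            using (var responseStream = new StreamReader(response.GetResponseStream()))
            {
                // Read asynchronously so the thread is not blocked while the body downloads.
                return await responseStream.ReadToEndAsync();
            }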
        }

        string EncodeUnicodeToASCII(string s)
        {
            var builder = new StringBuilder();
            foreach(var c in s.ToCharArray())
            {
                //encode Unicode characters that can't be represented as ASCII
                if (c > 127) { builder.AppendFormat( "&#{0};", (int)c); }
                else { builder.Append(c); }
            }
            return builder.ToString();
        }

    }
}

The GoogleSpell class above exposes two methods: SpellcheckAsync and SuggestionsAsync.
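One thing to watch in the EncodeUnicodeToASCII helper: it walks the string one UTF-16 char at a time, so a character outside the Basic Multilingual Plane would come out as two surrogate values, and it leaves markup characters such as & and < untouched. If that matters for your text, a variant along the following lines might help. This is only a sketch in JavaScript; the name encodeForAsciiXml is hypothetical and is not part of the class above.

```javascript
// Hypothetical helper: escape text for embedding in an ASCII-encoded XML request body.
function encodeForAsciiXml(s) {
  let out = "";
  // for...of iterates by code point, so surrogate pairs stay together.
  for (const ch of s) {
    const codePoint = ch.codePointAt(0);
    if (ch === "&") out += "&amp;";
    else if (ch === "<") out += "&lt;";
    else if (ch === ">") out += "&gt;";
    else if (codePoint > 127) out += "&#" + codePoint + ";";
    else out += ch;
  }
  return out;
}

console.log(encodeForAsciiXml("caf\u00e9 & cr\u00e8me"));
```

Plain ASCII passes through unchanged, so ordinary English word lists are encoded exactly as before.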

My MVC Controller class exposes this functionality to TinyMCE by translating JSON back and forth to the GoogleSpell class.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Web;
using System.Web.Mvc;
using WolfeReiter.Web.Utility;

namespace MvcProject.Controllers
{
    public class TinyMCESpellcheckGatewayController : AsyncController
    {
        [HttpPost]
        public async Task<JsonResult> Index(SpellcheckRequest model)
        {
            var spellService = new GoogleSpell();
            IEnumerable<string> result = null;
            if(string.Equals(model.method, "getSuggestions", StringComparison.InvariantCultureIgnoreCase))
            {
                result = (await spellService.SuggestionsAsync(model.@params.First().Single(), model.@params.Skip(1).First().Single()));
            }
            else //assume checkWords
            {
                result = (await spellService.SpellcheckAsync(model.@params.First().Single(), model.@params.Skip(1).First()));
            }
            string error = null;
            return Json( new { result, id = model.id, error } );
        }

        //class models the JSON posted by TinyMCE so that MVC Model Binding "just works"
        public class SpellcheckRequest
        {
            public SpellcheckRequest()
            {
                @params = new List<IEnumerable<string>>();
            }
            public string method { get; set; }
            public string id { get; set; }
            public IEnumerable<IEnumerable<string>> @params { get; set; }
        }
    }
}
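The translation the controller performs can be sketched outside of MVC as well. The JavaScript below is a rough illustration, not the plugin's actual code: fakeSpellService is a hypothetical stand-in for GoogleSpell that returns canned data, the id values are made up, and the request shapes are assumed from how the controller reads its params (first the language, then either the word list or a single word). It is synchronous for brevity, unlike the async controller.

```javascript
// Hypothetical stand-in for the GoogleSpell class; returns canned data instead of calling Google.
const fakeSpellService = {
  spellcheck: (lang, words) => words.filter((w) => w === "Ths" || w === "tst"),
  suggestions: (lang, word) => (word === "Ths" ? ["This", "Thus"] : []),
};

// Mirrors the controller's dispatch: params[0] is the language, params[1] the payload.
function handleSpellcheckRpc(request, spellService) {
  const [lang, payload] = request.params;
  const wantsSuggestions =
    String(request.method).toLowerCase() === "getsuggestions";
  const result = wantsSuggestions
    ? spellService.suggestions(lang, [].concat(payload)[0])
    : spellService.spellcheck(lang, [].concat(payload)); // anything else is treated as checkWords
  // Same envelope the controller returns: result, the echoed id, and a null error.
  return { result, id: request.id, error: null };
}

// Illustrative request bodies (assumed shapes).
const checkRequest = { id: "c0", method: "checkWords", params: ["en", ["Ths", "is", "a", "tst"]] };
const suggestRequest = { id: "c1", method: "getSuggestions", params: ["en", "Ths"] };

console.log(JSON.stringify(handleSpellcheckRpc(checkRequest, fakeSpellService)));
console.log(JSON.stringify(handleSpellcheckRpc(suggestRequest, fakeSpellService)));
```

The point of echoing id back unchanged is that the caller can match each response to the request that produced it.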

Integrating the above controller with TinyMCE is straightforward. All that needs to happen is to include the “spellchecker” plugin, add the “spellchecker” toolbar button and set spellchecker_rpc_url to point to the controller.

/*global $, jQuery, tinyMCE, tinymce */
/// <reference path="jquery-1.8.3.js" />
/// <reference path="jquery-ui-1.8.24.js" />
/// <reference path="modernizr-2.6.2.js" />
/// <reference path="tinymce/tinymce.jquery.js" />
/// <reference path="tinymce/tiny_mce_jquery.js" />
(function () {
    "use strict";

    $(document).ready(function () {

        $('textarea.rich-text').tinymce({
            mode: "exact",
            theme: "advanced",
            plugins: "safari,spellchecker,paste",
            gecko_spellcheck: true,
            theme_advanced_buttons1: "bold,italic,underline,|,undo,redo,|,spellchecker,code",
            theme_advanced_statusbar_location: "none",
            spellchecker_rpc_url: "/TinyMCESpellcheckGateway", //<-- point TinyMCE to GoogleSpell adaptor controller
            /*strip pasted microsoft office styles*/
            paste_strip_class_attributes: "mso"
        });
       
    });
}());

That’s all there is to it. Here’s how TinyMCE renders on a <textarea class="rich-text"></textarea>.

[Screenshot: TinyMCE spellcheck]


The Daily Flashback: The String in, String out API

I just ran across a recent Daily WTF story “XML’d XML” in my news feed. The gist is that some web service call which inherently has XML as return data returned a “string” as its payload. And it was a string, but a string of XML.

I instantly flashed back on an integration project from a few years back that we did for a Fortune 500 company. The company had contracted with a 3rd party service provider that provided a SOAP API for all of its message passing and we thought this would be much better than something out of the 1990s like passing PGP-encrypted CSV files over FTP. We were working with an—even at the time—ancient Sun ONE 6.0 server platform but it was able to support SOAP web services using Apache Axis.

I became concerned when I realized that every method took a single string “input” argument and returned a string.

I just pulled a sample from our source repository for old times’ sake.

/**
 * SubscriberAPISoap_PortType.java
 *
 * This file was auto-generated from WSDL
 * by the Apache Axis 1.4 Apr 22, 2006 (06:55:48 PDT) WSDL2Java emitter.
 */

package com.shallRemainNameless;

import javax.xml.soap.SOAPException;

public interface SubscriberAPISoap extends java.rmi.Remote, SoapAuthentication {
    public java.lang.String createSubscriber(java.lang.String input) throws java.rmi.RemoteException;
    public java.lang.String updateSubscriber(java.lang.String input) throws java.rmi.RemoteException;
    public java.lang.String addCategory(java.lang.String input) throws java.rmi.RemoteException;
    public java.lang.String removeCategory(java.lang.String input) throws java.rmi.RemoteException;
    public java.lang.String getSubscriber(java.lang.String input) throws java.rmi.RemoteException;
    public java.lang.String getSubscriberData(java.lang.String input) throws java.rmi.RemoteException;
    public java.lang.String authenticateSubscriber(java.lang.String input) throws java.rmi.RemoteException;
    public java.lang.String getSubscribers(java.lang.String input) throws java.rmi.RemoteException;
    public java.lang.String getSubscribersByField(java.lang.String input) throws java.rmi.RemoteException;
    public java.lang.String deactivateSubscriber(java.lang.String input) throws java.rmi.RemoteException;
    public java.lang.String activateSubscriber(java.lang.String input) throws java.rmi.RemoteException;
    public java.lang.String sendOptInMessage(java.lang.String input) throws java.rmi.RemoteException;
    public java.lang.String getRegistrationMetaData(java.lang.String input) throws java.rmi.RemoteException;
}

Every one of those input and return strings was actually XML. I remember when I realized it was XML all the way down, I sent an internal email with the subject “String in, String out == [expletive-deleted]”.

We created a wrapper API to hide the insanity, but the actual XML API wasn’t much better than the String version. They had a unified ApiResultDocument XML schema that encoded any possible result and an ApiRequestDocument schema which encoded any possible combination of arguments across all web methods.

/*
 * SubscriberAPISoapDocument.java
 *
 * Created on December 4, 2006, 2:59 PM
 *
 */

package com.shallRemainNameless.service;

/**
 *
 * @author breiter
 */

import com.shallRemainNameless.service.schemas.apiResult.ApiResultDocument;
import java.rmi.RemoteException;
import javax.xml.soap.SOAPException;
import org.apache.xmlbeans.XmlException;

public interface SubscriberAPISoapStrong extends java.rmi.Remote, SoapAuthentication 
{
    public ApiResultDocument createSubscriber(com.shallRemainNameless.service.schemas.subscriberCreateAndUpdate.ApiRequestDocument xmldoc) throws RemoteException, XmlException;
    public ApiResultDocument updateSubscriber(com.shallRemainNameless.service.schemas.subscriberCreateAndUpdate.ApiRequestDocument xmldoc) throws RemoteException, XmlException;
    public ApiResultDocument addCategory(com.shallRemainNameless.service.schemas.addRemoveCategory.ApiRequestDocument xmldoc) throws RemoteException, XmlException;
    public ApiResultDocument removeCategory(com.shallRemainNameless.service.schemas.addRemoveCategory.ApiRequestDocument xmldoc) throws RemoteException, XmlException;
    public ApiResultDocument getSubscriber(com.shallRemainNameless.service.schemas.getSubscriber.ApiRequestDocument xmldoc) throws RemoteException, XmlException;
    public ApiResultDocument getSubscriberData(com.shallRemainNameless.service.schemas.getSubscriber.ApiRequestDocument xmldoc) throws RemoteException, XmlException;
    public ApiResultDocument authenticateSubscriber(com.shallRemainNameless.service.schemas.authenticateSubscriber.ApiRequestDocument xmldoc) throws RemoteException, XmlException;
    public ApiResultDocument getSubscribers(com.shallRemainNameless.service.schemas.getSubscriber.ApiRequestDocument xmldoc) throws RemoteException, XmlException;
    public ApiResultDocument getSubscribersByField(com.shallRemainNameless.service.schemas.getSubscribersByField.ApiRequestDocument xmldoc) throws RemoteException, XmlException;
    public ApiResultDocument deactivateSubscriber(com.shallRemainNameless.service.schemas.deactivateSubscriber.ApiRequestDocument xmldoc) throws RemoteException, XmlException;
    public ApiResultDocument activateSubscriber(com.shallRemainNameless.service.schemas.activateSubscriber.ApiRequestDocument xmldoc) throws RemoteException, XmlException;
    public ApiResultDocument sendOptInMessage(com.shallRemainNameless.service.schemas.sendOptInMessage.ApiRequestDocument xmldoc) throws RemoteException, XmlException;
    public ApiResultDocument getRegistrationMetaData(com.shallRemainNameless.service.schemas.getSubscribers.ApiRequestDocument xmldoc) throws RemoteException, XmlException;
}

I remember that things got interesting when we created unit tests to confirm that the SOAP API did what it was supposed to do. There are 15 versions of unit tests checked in as we established that the documentation provided was not, in fact, based on the actual behavior of the system.

At least the designer of this API didn’t realize s/he could have just added an <ApiMethodRequest /> element to the ApiRequestDocument and used a single ApiResultDocument doApiRequest( ApiRequestDocument ) method. Oh wait, it would have been String doApiRequest( String ). That would have been epic.

Good times.
