Async Google Spellcheck API Adaptor for TinyMCE

I recently added TinyMCE to a project in order to provide a stripped-down rich text editor with bold, italic and underline capability to a project. I discovered that the spell check functionality either required a client-side plugin for IE or a server-side implementation JSON RPC implementation called by TinyMCE via Ajax. Unfortunately, the only implementations for the server side provided by the TinyMCE project are in PHP and my project is in ASP.Net MVC 4.

Looking at the PHP implementations, one option is to adapt the Google Spellcheck API — which I didn’t even know existed. Basically this API allows you to post an XML document that contains a list of space-delimited words and get back a document which defines the substrings that are misspelled.

Using some examples of how the API works on the Google side, I was able to throw together a class that invokes it using the new async/await pattern in C# to create a Google Spellcheck API client that doesn’t block while wanting for its result.

using System;
using System.IO;
using System.Text;
using System.Net;
using System.Xml;
using System.Threading.Tasks;
using System.Collections.Generic;
using System.Diagnostics;

namespace WolfeReiter.Web.Utility
{
/*
 * http post to http://www.google.com/tbproxy/spell?lang=en&hl=en
 * 
 * Google spellcheck API request looks like this.
 * 
 * <?xml version="1.0" encoding="utf-8" ?> 
 * <spellrequest textalreadyclipped="0" ignoredups="0" ignoredigits="1" ignoreallcaps="1">
 * <text>Ths is a tst</text>
 * </spellrequest>
 * 
 * The response look like ...
 * 
 * <?xml version="1.0" encoding="UTF-8"?>
 * <spellresult error="0" clipped="0" charschecked="12">
 * <c o="0" l="3" s="1">This Th's Thus Th HS</c>
 * <c o="9" l="3" s="1">test tat ST St st</c>
 * </spellresult>
 */

    public class GoogleSpell
    {
        const string GOOGLE_REQUEST_TEMPLATE = "<?xml version=\"1.0\" encoding=\"utf-8\" ?><spellrequest textalreadyclipped=\"0\" ignoredups=\"0\" ignoredigits=\"1\" ignoreallcaps=\"1\"><text>{0}</text></spellrequest>";

        public async Task<IEnumerable<string>> SpellcheckAsync(string lang, IEnumerable<string> wordList)
        {
            //convert list of words to space-delimited string.
            var words = string.Join(" ", wordList);
            var result = (await QueryGoogleAsync(lang, words));

            var doc = new XmlDocument();
            doc.LoadXml(result);

            // Build misspelled word list
            var misspelledWords = new List<string>();
            foreach (var node in doc.SelectNodes("//c"))
            {
                var cElm = (XmlElement)node;
                //google sends back bad word positions to slice out of original data we sent.
                try
                {
                    var badword = words.Substring(Convert.ToInt32(cElm.GetAttribute("o")), Convert.ToInt32(cElm.GetAttribute("l")));
                    misspelledWords.Add(badword);
                }
                catch( ArgumentOutOfRangeException e)
                {
                    Trace.WriteLine(e);
                    Debug.WriteLine(e);
                }
            }
            return misspelledWords;
        }

        public async Task<IEnumerable<string>> SuggestionsAsync(string lang, string word)
        {
            var result = (await QueryGoogleAsync(lang, word));

            // Parse XML result
            var doc = new XmlDocument();
            doc.LoadXml(result);

            // Build misspelled word list
            var suggestions = new List<string>();
            foreach (XmlNode node in doc.SelectNodes("//c"))
            {
                var element = (XmlElement)node;
                if(!string.IsNullOrWhiteSpace(element.InnerText))
                {
                    foreach (var suggestion in element.InnerText.Split('\t'))
                    {
                        if (!string.IsNullOrEmpty(suggestion)) { suggestions.Add(suggestion); }
                    }
                }
            }

            return suggestions;
        }

        async Task<string> QueryGoogleAsync(string lang, string data)
        {
            var scheme     = "https";
            var server     = "www.google.com";
            var port       = 443;
            var path       = "/tbproxy/spell";
            var query      = string.Format("?lang={0}&hl={1}", lang, data);
            var uriBuilder = new UriBuilder(scheme, server, port, path, query);
            string xml     = string.Format(GOOGLE_REQUEST_TEMPLATE, EncodeUnicodeToASCII(data));

            var request           = WebRequest.CreateHttp(uriBuilder.Uri);
            request.Method        = "POST";
            request.KeepAlive     = false;
            request.ContentType   = "application/PTI26";
            request.ContentLength = xml.Length;

            // Google-specific headers
            var headers = request.Headers;
            headers.Add("MIME-Version: 1.0");
            headers.Add("Request-number: 1");
            headers.Add("Document-type: Request");
            headers.Add("Interface-Version: Test 1.4");

            using (var requestStream = (await request.GetRequestStreamAsync()))
            {
                var xmlData = Encoding.ASCII.GetBytes(xml);
                requestStream.Write(xmlData, 0, xmlData.Length);

                var response = (await request.GetResponseAsync());
                using (var responseStream = new StreamReader(response.GetResponseStream()))
                {
                    return responseStream.ReadToEnd();
                }
            }
        }

        string EncodeUnicodeToASCII(string s)
        {
            var builder = new StringBuilder();
            foreach(var c in s.ToCharArray())
            {
                //encode Unicode characters that can't be represented as ASCII
                if (c > 127) { builder.AppendFormat( "&#{0};", (int)c); }
                else { builder.Append(c); }
            }
            return builder.ToString();
        }

    }
}

The GoogleSpellChecker class below exposes two methods: SpellcheckAsync and SuggestionsAsync.

My MVC Controller class exposes this functionality to the TinyMCE by translating JSON back and forth to the GoogleSpell class.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Web;
using System.Web.Mvc;
using WolfeReiter.Web.Utility;

namespace MvcProject.Controllers
{
    public class TinyMCESpellcheckGatewayController : AsyncController
    {
        [HttpPost]
        public async Task<JsonResult> Index(SpellcheckRequest model)
        {
            var spellService = new GoogleSpell();
            IEnumerable<string> result = null;
            if(string.Equals(model.method, "getSuggestions", StringComparison.InvariantCultureIgnoreCase))
            {
                result = (await spellService.SuggestionsAsync(model.@params.First().Single(), model.@params.Skip(1).First().Single()));
            }
            else //assume checkWords
            {
                result = (await spellService.SpellcheckAsync(model.@params.First().Single(), model.@params.Skip(1).First()));
            }
            string error = null;
            return Json( new { result, id = model.id, error } );
        }

        //class models JSON posted by TinyMCE allows MVC Model Binding to "just work"
        public class SpellcheckRequest
        {
            public SpellcheckRequest()
            {
                @params = new List<IEnumerable<string>>();
            }
            public string method { get; set; }
            public string id { get; set; }
            public IEnumerable<IEnumerable<string>> @params { get; set; }
        }
    }
}

Integrating the above controller with TinyMCE is straightforward. All that needs to happen is include the “spellchecker” plugin, the “spellchecker” toolbar button and set the spellchecker_rpc_url to point to the controller.

/*global $, jQuery, tinyMCE, tinymce */
/// <reference path="jquery-1.8.3.js" />
/// <reference path="jquery-ui-1.8.24.js" />
/// <reference path="modernizr-2.6.2.js" />
/// <reference path="tinymce/tinymce.jquery.js" />
/// <reference path="tinymce/tiny_mce_jquery.js" />
(function () {
    "use strict";

    $(document).ready(function () {

        $('textarea.rich-text').tinymce({
            mode: "exact",
            theme: "advanced",
            plugins: "safari,spellchecker,paste",
            gecko_spellcheck: true,
            theme_advanced_buttons1: "bold,italic,underline,|,undo,redo,|,spellchecker,code",
            theme_advanced_statusbar_location: "none",
            spellchecker_rpc_url: "/TinyMCESpellcheckGateway", //<-- point TinyMCE to GoolgeSpell adaptor controller
            /*strip pasted microsoft office styles*/
            paste_strip_class_attributes: "mso"
        });
       
    });
}());

That’s all there is to it. Here’s how TinyMCE renders on a <textarea class=”rich-text-“></textarea>.

Tinymce spellcheck

Advertisement

JSLint.VS2010: Automatic JavaScript Static Analysis

Recently, I was working on a project that used some asynchronous JSON-to-HTML databiding. Testers sent a bug that the site was working in Chrome, Firefox, Safari and IE9 but nothing was happening in IE8. The IE8 javascript debugger didn’t report any errors, it just didn’t do anything.

This sort of thing is no fun to debug and I spent an increasingly desperate evening staring at code that couldn’t possibly be wrong  failing to see the problem. Eventually, in desperation, I pasted the code into Douglas Crockford’s jslint and it quickly pointed out a stray comma which made all the difference.

I later discovered that there is a jslint extension available for Visual Studio 2010.

jslint-options

The really cool thing about the JSLint Visual Studio extension is that it reports its warnings and errors into the Visual Studio Error List window where they work just like other code errors. Even better, it has the option to cancel the build on error. That means if jslint finds an error, the project build dies. In my world, this is fantastic. It means I don’t even try to test something if jslint finds something wrong with it. I consider this to be a huge productivity win.skip-on-build

The following JSLint global options were key for me to use jslint as a build-time checker:

  • Output: Errors – causes Visual Studio to interpret JSLint output as errors
  • Run JSLint on build: true – causes JSLint to test all .JS files during build
  • Cancel build on error true– causes JSLint errors to cancel the build, just like C# compiler errors
  • Assume a browser: true
  • Predefined Vars: jQuery, $
  • Strict whitespace: false
  • Maximum line length: 999
    One issue with cancelling the build on error is that jslint is pedantic and some of its warnings are not strictly errors which means that standard libraries like jquery aren’t going to pass jslint. Fortunately, you can exclude individual .js files from build-time testing by right-clicking on the files you want to exclude in the Solution Explorer window. (Note that jslint doesn’t plug its menus into the Solution Navigator window from the “Productivity Power Tools” extension. You have to use the Solution Explorer.)
    Happy linting.

Submitting an MVC Ajax.BeginForm Using JavaScript and jQuery

The Ajax.BeginForm() helper method in ASP.Net MVC generates a bit of inline JavaScript attached to the onsubmit event of the form. When the form is submitted in the usual manner with a submit button the request is sent using XMLHttpRequest and the result is returned into the <DIV /> of your choice. However, if you want try to submit the form using JavaScript things are less tidy.

For my example, imagine a user account management tool that displays a grid of accounts and has a number batch of operations that you can perform against the grid data in an Ajax-y manner.account-mgmt

The idea here is that you could, for example, check a number of accounts and then click disable and those accounts would be disabled. The result of your operation gets written into a notification <DIV />.

Here’s the naïve implementation for submitting an MVC Ajax form.

jQuery().ready(function () {
	//other stuff including jqGrid initialization here...

	//'grid' is the id of the jqGrid table element
	//'disableKeys' is the id of the Ajax form we are submitting.
	$('form#disableAccounts').find('a.submit-link').click(function () {
	    //work to set the state of the form before sending it to the server here...
	    $('form#disableAccounts').submit();
	});
)};

Unfortunately, this doesn’t do at all what we want. What you end up with is the text that was intended for the notification <DIV /> replacing the entire window contents. In other words, a classic POST. But wait, there’s more.

debug-double-post

What actually happens is that the request is submitted twice! The first version is Ajax and the second is classic POST, interrupting the Ajax response.

The First Solution

My initial approach to solving this double-submit problem hinged on leveraging the fact that Ajax.BeginForm() generates a <FORM /> tag with some JavaScript attached to the onsubmit event to handle the Ajax behavior. Why not just trigger the onsubmit event directly?

jQuery().ready(function () {
	//other stuff including jqGrid initialization here...

	//'grid' is the id of the jqGrid table element
	//'disableKeys' is the id of the Ajax form we are submitting.
	$('form#disableAccounts').find('a.submit-link').click(function () {
	    //work to set the state of the form before sending it to the server here...
	    $('form#disableAccounts').trigger('onsubmit');
	});
)};

This works great except in Safari where nothing happens at all.

The Final Solution

I tried a number of different techniques to suppress the default form POST behavior. The one that worked in IE, Firefox, Chrome and Safari is to attach a new event handler to the submit event that always returns false.

jQuery().ready(function () {
	//other stuff including jqGrid initialization here...

	//prevent the form from submitting in a non-Ajax manner for all browsers
	$('form#disableAccounts').submit(function (event) { eval($(this).attr("submit")); return false; });

	//'grid' is the id of the jqGrid table element
	//'disableKeys' is the id of the Ajax form we are submitting.
	$('form#disableAccounts').find('a.submit-link').click(function () {
	    //work to set the state of the form before sending it to the server here...
	    $('form#disableAccounts').submit();
	});
)};

This works to suppress the second standard POST and allow the ASP.Net Ajax behavior to complete as expected.

My First Chrome Extension, part 2

I was fixated on how to get Chrome to pass a feed URL to my registered feed reader, Outlook 2010. After I figured out how to pass the message, it was nearly a one-liner to modify an existing extension. How many things could be wrong with one line of code that seems to be working?

url = url.replace( "%f", feedUrl.replace( "http:", "feed:" ) );

There are two bugs that I see in here.

1: URLs can legitimately include %f

From RFC 1738:

In addition, octets may be encoded by a character triplet consisting of the character "%" followed by the two hexadecimal digits (from "0123456789ABCDEF") which forming the hexadecimal value of the octet. (The characters "abcdef" may also be used in hexadecimal encodings.)

In practice this would affect URLs that include these characters:

  • ð -> %F0
  • ñ –> %F1
  • ò –> %F2
  • ó –> %F3
  • ô –> %F4
  • õ –> %F5
  • ö –> %F6
  • ÷ –> %F7
  • ø –> %F8
  • ù –> %F9
  • ú –> %FA
  • û –> %FB
  • ü –> %FC
  • ý –> %FD
  • þ –> %FE
  • ÿ –> %FF

The simple solution is to avoid using a macro character sequence that includes valid hexadecimal values. Sticking with a single ASCII character it could be anything from ‘g’ trhough ‘z’ except ‘s’, which is already being used.

2: A feed URL could have “http:” anywhere

The scheme for a URI is always at the beginning, but it is common for one URI to encode another one inside it. That is exactly how you pass a feed URL to Google Reader.

Imagine a feed URL like “http://feedify.exmaple/q=http://mysite.somewhere/page”. In order to pass this to the feed scheme handler I need to replace the http: scheme with feed: but my original code would also replace the second http:, which would break the URL.

The solution is to use a regular expression so that I only replace the http: scheme rather than every occurrence of the pattern “http:”. In Regular Expression syntax, the ^ character means that the pattern has to start with the beginning of the string.

^http:

I JavaScript, the shorthand for a Regular Expression object is paired / characters:

/^http:/

3:URI schemes are not case sensitive

The replace() method of the JavaScript String object is case sensitive. My code would not work on a URL like “HTTP://mysite.example/stuff”.

Fortunately, the Regular Expression object in JavaScript has a case-insensitive match mode. You set this with the ‘i’ option:

/^http:/i

3 Bugs in 1 Line: Fixed

url = url.replace( "%g", feedUrl.replace( /^http:/i, "feed:" ) );

%d bloggers like this: