Friday, 24 December 2010

JavaScript: validating UTF-8 string lengths in the browser

Let's take a JavaScript string: "€100". This is going to be sent from a browser input box and stored in a web server's database. The database is using the UTF-8 encoding and the constraint on the column is CHAR(4). Spot the problem?
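The arithmetic behind that question can be sketched quickly. The post itself is about JavaScript; the snippet below is only a Java illustration of the same count, and the class and method names are made up for the example. It assumes the column limit is measured in UTF-8 bytes, which depends on the database and how the column is defined.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Length {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String value = "\u20AC100"; // the euro sign followed by "100"
        System.out.println(value.length());    // 4 UTF-16 code units
        System.out.println(utf8Length(value)); // 6 bytes: the euro sign takes 3
    }
}
```

If the limit is counted in bytes, a four-character string like this one would not fit, even though a naive length check in the browser reports 4.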

Sunday, 19 December 2010

JSP: what all the encoding declarations mean

When you see a JSP document, you might wonder why it specifies the UTF-8 encoding three or four times. This is a post about what those declarations mean.

Sunday, 21 November 2010

Comments policy

Comments are moderated and will not appear until I approve them.

  • I don't live on the blog, so it may take me some time to see and respond to your comment.
  • I won't publish comments with e-mail addresses in them.
  • If you post a question and I don't respond, I may simply not know the answer off the top of my head, and may not feel like putting in the research to answer it. You'll have more luck on a dedicated Q&A site.
  • Comments that say little more than "Thanks!" are appreciated, but don't add much value for other readers. Don't expect them to show up.
  • Spam gets deleted.

Corrections and constructive criticism are welcome.

Sunday, 19 September 2010

Java: System.console(), IDEs and testing

The method System.console() can return null if there is no console device present. This comes as a surprise to people when they run their code in an IDE. This post is about overcoming such problems.
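One possible way to cope with a missing console, shown here as a minimal sketch rather than the approach the post takes, is to fall back to standard input when System.console() returns null. The ConsoleFallback class and its readLine helper are hypothetical names for this example.

```java
import java.io.Console;
import java.util.Scanner;

public class ConsoleFallback {
    // Reads one line from the console if there is one, otherwise from the fallback scanner.
    static String readLine(Console console, Scanner fallback, String prompt) {
        if (console != null) {
            return console.readLine("%s", prompt);
        }
        // No console device (typical inside an IDE): prompt and read via the fallback.
        System.out.print(prompt);
        System.out.flush();
        return fallback.nextLine();
    }

    public static void main(String[] args) {
        Scanner stdin = new Scanner(System.in);
        String name = readLine(System.console(), stdin, "Name: ");
        System.out.println("Hello, " + name);
    }
}
```

Passing the console and the fallback in as parameters also makes the helper easy to exercise in a test, since a Scanner can be built over a plain string.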

Thursday, 16 September 2010

Java: "Content is not allowed in prolog" - causes of this XML processing error

"Content is not allowed in prolog" is an error generally emitted by the Java XML parsers when data is encountered before the <?xml... declaration. You may inspect the document in a text editor and think nothing is wrong, but you need to go down to the byte level to understand the problem. You probably have a character encoding bug.
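One common trigger, offered here as an assumption since this summary does not list the causes, is a byte order mark that has been decoded into the character U+FEFF and now sits in front of the declaration in a String or Reader. A minimal sketch of stripping it before parsing follows; PrologFix and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PrologFix {
    // Drops a single leading U+FEFF character, if present.
    static String stripBom(String xml) {
        return xml.startsWith("\uFEFF") ? xml.substring(1) : xml;
    }

    // Parses XML held in a String after removing any leading U+FEFF.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(stripBom(xml))));
    }

    public static void main(String[] args) throws Exception {
        String xml = "\uFEFF<?xml version=\"1.0\"?><root/>";
        System.out.println(parse(xml).getDocumentElement().getNodeName());
    }
}
```

Stripping the character treats the symptom; if the data was decoded with the wrong encoding in the first place, that is the bug worth fixing.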

Sunday, 1 August 2010

Java: a fluent I/O API (4/4)

This is the fourth post about my experiments with a fluent I/O API. This post covers conclusions and limitations of the implementation. You can find downloads and source repository details further down the page.

Java: a fluent I/O API (3/4)

This is the third post about my experiments with a fluent I/O API. This post covers how the API enhances exception handling.

Java: a fluent I/O API (2/4)

This is the second post about my experiments with a fluent I/O API. This post covers how to extend the API.

Java: a fluent I/O API (1/4)

I've been experimenting with fluent API design. You can find the sources in part 4.

I've often been frustrated with the verbosity of Java I/O. Handling close with decorators got better with the introduction of the Closeable interface, but there's still a bit of boilerplate. This post describes a new fluent API that wraps the existing I/O API.

Saturday, 17 April 2010

I18N: comparing character encoding in C, C#, Java, Python and Ruby

Don't assume that the character handling conventions you've learnt in one language/platform will automatically apply in others. I've selected a cross-section of popular languages to contrast the different ways character encoding is handled.

Tuesday, 12 January 2010

Scala: implementing a "did you mean..?" spelling corrector

I was looking at Scala again and decided to implement Peter Norvig's algorithm for suggesting spelling corrections. I recommend reading How to Write a Spelling Corrector for the clever stuff.

This implementation is limited to the English alphabet. You'll need the big.txt file or a similar set of training data.