Sunday, 19 December 2010

JSP: what all the encoding declarations mean

When you see a JSP document, you might wonder why it specifies the UTF-8 encoding three or four times. This is a post about what those declarations mean.

A sample JSP document

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<%@ page language="java" contentType="text/html; charset=UTF-8"
  pageEncoding="UTF-8"%>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Sample JSP page in standard syntax</title>
</head>
<body>
TODO: page content
</body>
</html>

What it means

It is easiest to see what is going on by looking at the raw HTTP response:

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=UTF-8
Content-Length: 295
Date: Sun, 19 Dec 2010 16:40:22 GMT

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Sample JSP page in standard syntax</title>
</head>
<body>
TODO: page content
</body>
</html>

The HTTP response Content-Type header carries the MIME type and its charset parameter (text/html;charset=UTF-8). This is set by the contentType attribute in the page directive and tells the user agent (browser) how to decode the bytes of the response body into characters.

The MIME type in the meta element echoes this information. The HTML specification covers why this is necessary.
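To make the role of the charset parameter concrete, here is a minimal sketch of how a client might pull the charset out of a Content-Type value, whether it comes from the HTTP header or from the meta element. The class name ContentTypeSketch and the method charsetOf are made up for illustration, and real browsers follow considerably more elaborate rules.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeSketch {

    // Hypothetical helper: returns the charset named in a Content-Type value,
    // or the fallback when no charset parameter is present.
    // Charset.forName throws if the name is illegal or unsupported.
    static Charset charsetOf(String contentType, Charset fallback) {
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // The value may be quoted, e.g. charset="UTF-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                return Charset.forName(name);
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // The header form and the meta element form from the sample page
        System.out.println(charsetOf("text/html;charset=UTF-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/html; charset=UTF-8", StandardCharsets.ISO_8859_1));
        // No charset parameter: the caller's fallback is used
        System.out.println(charsetOf("text/html", StandardCharsets.ISO_8859_1));
    }
}
```

The fallback argument is a choice made for this sketch; what a real user agent assumes when no charset is declared depends on the user agent.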

The page declaration also contains the pageEncoding attribute. The JSP is a text file and this attribute defines the encoding of that file. So, it is possible to write the JSP in one encoding and have it emit data in another.
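As a rough illustration of the two encodings being independent, the sketch below decodes source bytes using one charset and encodes the resulting characters using another. It is not what the JSP container literally does; the class TranscodeSketch and its transcode method are invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeSketch {

    // Decode the source bytes with the page encoding, then encode the
    // resulting characters with the response charset.
    static byte[] transcode(byte[] source, Charset pageEncoding, Charset responseCharset) {
        String text = new String(source, pageEncoding);
        return text.getBytes(responseCharset);
    }

    public static void main(String[] args) {
        // The character U+00E9 saved in an ISO-8859-1 file is the single byte 0xE9.
        byte[] source = { (byte) 0xE9 };
        byte[] out = transcode(source, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        for (byte b : out) {
            System.out.printf("%02X ", b & 0xFF); // prints the UTF-8 bytes C3 A9
        }
        System.out.println();
    }
}
```

Here ISO-8859-1 plays the part of pageEncoding and UTF-8 the part of the charset in contentType.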

Exactly how the JSP compiler determines the encoding of a JSP source file is more complicated than this. It can involve byte-order marks, page-encoding elements in web.xml and the contentType attribute, and is generally tied up with the page syntax discovery mechanisms. There are three pages of rules in the appendices of the JSP specification if you're interested.
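To give a flavour of just one of those inputs, here is a simplified sketch of byte-order-mark sniffing. It is not the algorithm from the JSP specification: it only recognises the UTF-8 and UTF-16 marks, and the names BomSketch and detect are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSketch {

    // Hypothetical helper: inspects the first bytes of a file and returns the
    // charset implied by a UTF-8 or UTF-16 byte-order mark, or the fallback
    // when none of those marks is present.
    static Charset detect(byte[] head, Charset fallback) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return fallback;
    }

    public static void main(String[] args) {
        byte[] withBom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', '%' };
        byte[] withoutBom = { '<', '%', '@' };
        System.out.println(detect(withBom, StandardCharsets.ISO_8859_1));
        System.out.println(detect(withoutBom, StandardCharsets.ISO_8859_1));
    }
}
```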

In Java

If it helps, here's what Tomcat generates during the JSP translation phase for this file:

  public void _jspService(HttpServletRequest request, HttpServletResponse response)
        throws java.io.IOException, ServletException {

    PageContext pageContext = null;
    HttpSession session = null;
    ServletContext application = null;
    ServletConfig config = null;
    JspWriter out = null;
    Object page = this;
    JspWriter _jspx_out = null;
    PageContext _jspx_page_context = null;


    try {
      response.setContentType("text/html; charset=UTF-8");
      pageContext = _jspxFactory.getPageContext(this, request, response,
            null, true, 8192, true);
      _jspx_page_context = pageContext;
      application = pageContext.getServletContext();
      config = pageContext.getServletConfig();
      session = pageContext.getSession();
      out = pageContext.getOut();
      _jspx_out = out;

      out.write("<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">\r\n");
      out.write("\r\n");
      out.write("<html>\r\n");
      out.write("<head>\r\n");
      out.write("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\r\n");
      out.write("<title>Sample JSP page in standard syntax</title>\r\n");
      out.write("</head>\r\n");
      out.write("<body>\r\n");
      out.write("TODO: page content\r\n");
      out.write("</body>\r\n");
      out.write("</html>");
    } catch (Throwable t) {
      if (!(t instanceof SkipPageException)){
        out = _jspx_out;
        if (out != null && out.getBufferSize() != 0)
          try { out.clearBuffer(); } catch (java.io.IOException e) {}
        if (_jspx_page_context != null) _jspx_page_context.handlePageException(t);
      }
    } finally {
      _jspxFactory.releasePageContext(_jspx_page_context);
    }
  }

Note: calling setContentType in this way, before the writer is obtained, determines how the writer encodes the character data to the byte stream.
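The effect of the charset on the writer can be seen without a servlet container. The sketch below uses a plain OutputStreamWriter over a byte array as a stand-in for the response writer; the class WriterEncodingSketch and its encode method are invented for illustration and are not part of Tomcat or the Servlet API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterEncodingSketch {

    // Hypothetical helper: writes text through a Writer bound to the given
    // charset and returns the bytes that reach the underlying stream.
    static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9"; // four characters, the last one non-ASCII
        System.out.println(encode(text, StandardCharsets.UTF_8).length);      // prints 5
        System.out.println(encode(text, StandardCharsets.ISO_8859_1).length); // prints 4
    }
}
```

The same four characters produce different byte sequences depending on the charset, which is why the declared charset and the writer's charset need to agree.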
