When you see a JSP document, you might wonder why it specifies the UTF-8 encoding three or four times. This is a post about what those declarations mean.
A sample JSP document
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Sample JSP page in standard syntax</title>
</head>
<body>
TODO: page content
</body>
</html>
What it means
It is easiest to see what is going on by looking at the raw HTTP response:
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=UTF-8
Content-Length: 295
Date: Sun, 19 Dec 2010 16:40:22 GMT

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Sample JSP page in standard syntax</title>
</head>
<body>
TODO: page content
</body>
</html>
The HTTP response Content-Type header carries the MIME type (text/html;charset=UTF-8). This is set by the contentType attribute in the page directive and tells the user agent (browser) how to decode the character data.
The MIME type in the meta element echoes this information. The HTML specification covers why this is necessary.
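To make the "how to decode" part concrete, here is a minimal sketch of what a client might do with that header value. The class ContentTypeCharset and the helper charsetOf are hypothetical names invented for this illustration, and the parsing is deliberately naive; real user agents follow much more elaborate rules.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    // Hypothetical helper: pulls the charset parameter out of a Content-Type
    // value such as "text/html;charset=UTF-8". Returns the fallback when the
    // header carries no charset parameter.
    static Charset charsetOf(String contentType, Charset fallback) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // The parameter value may be quoted; strip the quotes if so.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                return Charset.forName(name);
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // The bytes of "café" as a UTF-8 response body would carry them.
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        Charset declared = charsetOf("text/html;charset=UTF-8", StandardCharsets.ISO_8859_1);

        // Decoding with the declared charset recovers the original text;
        // decoding with the wrong one garbles the accented character.
        System.out.println(new String(body, declared));
        System.out.println(new String(body, StandardCharsets.ISO_8859_1));
    }
}
```

The second println shows the classic failure: the same bytes, read with the wrong charset, come out as two junk characters instead of one accented letter.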
The page directive also contains the pageEncoding attribute. The JSP is a text file and this attribute defines the encoding of that file. So, it is possible to write the JSP in one encoding and have it emit data in another.
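The two attributes amount to a decode step followed by an encode step. The sketch below illustrates that idea with plain JDK calls. It is not what a JSP container actually runs, and the class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeSketch {

    // Hypothetical illustration: read source bytes using one charset
    // (the role of pageEncoding) and emit them using another
    // (the role of the charset in contentType).
    static byte[] transcode(byte[] source, Charset pageEncoding, Charset responseCharset) {
        String text = new String(source, pageEncoding); // bytes -> characters
        return text.getBytes(responseCharset);          // characters -> bytes
    }

    public static void main(String[] args) {
        // "é" is the single byte 0xE9 in ISO-8859-1.
        byte[] latin1Source = { (byte) 0xE9 };

        byte[] utf8Output = transcode(latin1Source,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        // In UTF-8 the same character takes two bytes: 0xC3 0xA9.
        for (byte b : utf8Output) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```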
Exactly how the JSP compiler determines the encoding of a JSP source file is a bit more complicated than this. It can involve byte-order-marks, page-encoding elements in web.xml and the contentType attribute, and it is generally tied up with the page syntax discovery mechanisms. There are three pages of rules in the appendices of the JSP specification if you're interested.
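As a taste of just one of those inputs, here is a rough sketch of byte-order-mark sniffing. It is an assumption-laden simplification rather than the algorithm from the specification: the names BomSniffer and charsetFromBom are invented, it only knows the UTF-8 and UTF-16 marks, and it ignores web.xml and the page directive entirely.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class BomSniffer {

    // Hypothetical helper: inspects the first bytes of a file and reports
    // the charset implied by a byte-order-mark, if one is present.
    // Simplification: UTF-32 marks are not handled, so a UTF-32LE file
    // (which starts FF FE 00 00) would be misreported as UTF-16LE.
    static Optional<Charset> charsetFromBom(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty(); // no mark: other rules would have to decide
    }

    public static void main(String[] args) {
        byte[] withBom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', '%', '@' };
        byte[] withoutBom = { '<', '%', '@' };

        System.out.println(charsetFromBom(withBom));
        System.out.println(charsetFromBom(withoutBom));
    }
}
```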
In Java
If it helps, here's what Tomcat generates during the JSP translation phase for this file:
public void _jspService(HttpServletRequest request, HttpServletResponse response)
    throws java.io.IOException, ServletException {

  PageContext pageContext = null;
  HttpSession session = null;
  ServletContext application = null;
  ServletConfig config = null;
  JspWriter out = null;
  Object page = this;
  JspWriter _jspx_out = null;
  PageContext _jspx_page_context = null;

  try {
    response.setContentType("text/html; charset=UTF-8");
    pageContext = _jspxFactory.getPageContext(this, request, response,
        null, true, 8192, true);
    _jspx_page_context = pageContext;
    application = pageContext.getServletContext();
    config = pageContext.getServletConfig();
    session = pageContext.getSession();
    out = pageContext.getOut();
    _jspx_out = out;

    out.write("<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">\r\n");
    out.write("\r\n");
    out.write("<html>\r\n");
    out.write("<head>\r\n");
    out.write("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\r\n");
    out.write("<title>Sample JSP page in standard syntax</title>\r\n");
    out.write("</head>\r\n");
    out.write("<body>\r\n");
    out.write("TODO: page content\r\n");
    out.write("</body>\r\n");
    out.write("</html>");
  } catch (Throwable t) {
    if (!(t instanceof SkipPageException)){
      out = _jspx_out;
      if (out != null && out.getBufferSize() != 0)
        try { out.clearBuffer(); } catch (java.io.IOException e) {}
      if (_jspx_page_context != null) _jspx_page_context.handlePageException(t);
    }
  } finally {
    _jspxFactory.releasePageContext(_jspx_page_context);
  }
}
Note: calling setContentType in this way determines how the writer encodes the character data to the byte stream.
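The effect on the writer can be imitated outside a servlet container. The sketch below is an analogy built on java.io, not Tomcat's code: an OutputStreamWriter stands in for the response writer, and the charset argument plays the part of the charset passed to setContentType. The class and method names are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterEncodingSketch {

    // Hypothetical stand-in for the response: characters go into a Writer,
    // and the chosen charset decides which bytes reach the stream.
    static byte[] write(String text, Charset responseCharset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, responseCharset)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "price: \u20ac5"; // contains the euro sign

        // UTF-8 needs three bytes for the euro sign.
        System.out.println(write(text, StandardCharsets.UTF_8).length);      // prints 11

        // ISO-8859-1 cannot represent it, so the writer substitutes '?'.
        System.out.println(write(text, StandardCharsets.ISO_8859_1).length); // prints 9
    }
}
```

The same characters produce different bytes depending on the charset, and a charset that cannot represent a character silently loses it. That is the practical reason to pick UTF-8 for the response.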