"Content is not allowed in prolog" is an error generally emitted by
the Java XML parsers when data is encountered before the <?xml...
declaration. You may inspect the document in a text editor and see
nothing wrong, but you need to go down to the byte level to understand
the problem. You probably have a character encoding bug.
This code reproduces the problem:
import java.io.*; |
The output:
Content is not allowed in prolog. actual:UTF-8 xml:<?xml version='1.0' encoding='UTF-16'?><x/>
HIDDEN ERROR! actual:UTF-8 <?xml version='1.0' encoding='ISO-8859-1'?><x/>
Content is not allowed in prolog. actual:UTF-16 xml:<?xml version='1.0' encoding='UTF-8'?><x/>
Content is not allowed in prolog. actual:UTF-16 xml:<?xml version='1.0' encoding='ISO-8859-1'?><x/>
HIDDEN ERROR! actual:ISO-8859-1 <?xml version='1.0' encoding='UTF-8'?><x/>
Content is not allowed in prolog. actual:ISO-8859-1 xml:<?xml version='1.0' encoding='UTF-16'?><x/>
This code also highlights another, more insidious character encoding issue: we can accidentally encode a document with one encoding while believing it is another, and everything seems to work (the HIDDEN ERROR! cases).
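The mismatch between the bytes actually written and the encoding the declaration claims can be sketched in a few lines. This is a minimal illustration and not the original listing: the class name EncodingMismatchSketch and the helper tryParse are hypothetical, and the exact error text depends on the parser bundled with your JVM.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical sketch, not the post's original code.
final class EncodingMismatchSketch {

    // Encodes a tiny document using `actual` while its declaration claims
    // `declared`, then parses the raw bytes.
    // Returns "OK" on success, otherwise the parser's error message.
    static String tryParse(String actual, String declared) {
        String xml = "<?xml version='1.0' encoding='" + declared + "'?><x/>";
        try {
            byte[] bytes = xml.getBytes(actual);
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return "OK";
        } catch (Exception e) {
            // Covers unsupported charset names, I/O errors and parse errors.
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        String[] encodings = {"UTF-8", "UTF-16", "ISO-8859-1"};
        for (String actual : encodings) {
            for (String declared : encodings) {
                System.out.println("actual:" + actual + " declared:" + declared
                        + " -> " + tryParse(actual, declared));
            }
        }
    }
}
```

For this ASCII-only document, UTF-8 and ISO-8859-1 produce identical bytes, so swapping one for the other parses without complaint. The bug only surfaces once non-ASCII characters appear in the data.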
When you inspect the data in a hex editor, the problems become more apparent.
A valid UTF-16 form:
FF FE 3C 00 3F 00 78 00 6D 00 6C 00 20 00 76 00  __<_?_x_m_l_ _v_
65 00 72 00 73 00 69 00 6F 00 6E 00 3D 00 27 00  e_r_s_i_o_n_=_'_
31 00 2E 00 30 00 27 00 20 00 65 00 6E 00 63 00  1_._0_'_ _e_n_c_
6F 00 64 00 69 00 6E 00 67 00 3D 00 27 00 55 00  o_d_i_n_g_=_'_U_
54 00 46 00 2D 00 31 00 36 00 27 00 3F 00 3E 00  T_F_-_1_6_'_?_>_
3C 00 78 00 2F 00 3E 00                          <_x_/_>_
Note: exact UTF-16 byte forms vary - big-endian, little-endian, with or without a byte-order-mark. This one is little-endian with a BOM.
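Java's own encoders illustrate this variation. The sketch below assumes Java 7 or later (for StandardCharsets); the class Utf16Forms and its hex helper are hypothetical names used only for illustration. It prints the bytes each UTF-16 flavour produces for the same short string.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
final class Utf16Forms {

    // Formats bytes as space-separated, two-digit upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "<x/>";
        // "UTF-16" encodes big-endian and writes a BOM:
        // FE FF 00 3C 00 78 00 2F 00 3E
        System.out.println("UTF-16   " + hex(s.getBytes(StandardCharsets.UTF_16)));
        // "UTF-16BE" encodes big-endian without a BOM:
        // 00 3C 00 78 00 2F 00 3E
        System.out.println("UTF-16BE " + hex(s.getBytes(StandardCharsets.UTF_16BE)));
        // "UTF-16LE" encodes little-endian without a BOM:
        // 3C 00 78 00 2F 00 3E 00
        System.out.println("UTF-16LE " + hex(s.getBytes(StandardCharsets.UTF_16LE)));
    }
}
```

None of these three encoders emits the little-endian-with-BOM form shown in the dump above, so the byte layout you get depends on which tool wrote the file.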
An XML document that declares itself as UTF-16 but is really UTF-8:
EF BB BF 3C 3F 78 6D 6C 20 76 65 72 73 69 6F 6E  ___<?xml version
3D 27 31 2E 30 27 20 65 6E 63 6F 64 69 6E 67 3D  ='1.0' encoding=
27 55 54 46 2D 31 36 27 3F 3E 3C 78 2F 3E        'UTF-16'?><x/>
Note: UTF-8 XML documents can come with or without a byte-order-mark. This one includes a BOM.
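When you suspect a mismatch, checking the first few bytes for a byte-order mark is a quick first test. The following is a minimal sketch with hypothetical names (BomSniffer, detectBom); it recognises only the UTF-8 and UTF-16 marks seen in the dumps above and deliberately ignores UTF-32.

```java
// Hypothetical helper for illustration only.
final class BomSniffer {

    // Returns the encoding implied by a leading byte-order mark,
    // or null if the data does not start with one.
    // Assumption: UTF-32 is not handled (FF FE 00 00 would be reported as UTF-16LE).
    static String detectBom(byte[] b) {
        if (b.length >= 3
                && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    public static void main(String[] args) {
        // First bytes of the two hex dumps shown in this post.
        byte[] utf16Dump = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00};
        byte[] utf8Dump = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x3C};
        System.out.println(detectBom(utf16Dump)); // UTF-16LE
        System.out.println(detectBom(utf8Dump));  // UTF-8
    }
}
```

If the mark disagrees with the encoding named in the XML declaration, as in the second dump, the document was written with a different encoding than it claims.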
XML, Java and Encodings
- XML 1.0: Autodetection of Character Encodings (Non-Normative)
- Java: a rough guide to character encoding
The code was written and tested against Sun's win32 Java 1.6.0_17, which uses a version of the Apache Xerces parser internally.