The Unicode spec says that a file 'may' start with a BOM (U+FEFF). The
reader of the bytes can then look at how the BOM is encoded and pick
the correct encoding (UTF-8, UTF-16 (LE/BE), or UTF-32 (LE/BE)). If the
file does start with a BOM, it must be removed.
A BOM anywhere else in the data stream is left alone.
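
To make the detection step concrete, here is a minimal sketch of picking an encoding from the leading bytes. The class and method names (BomSniffer, detect) are made up for illustration and are not from any existing code; the byte signatures are simply U+FEFF as encoded in each form.

```java
final class BomSniffer {
    /** Returns the encoding implied by a leading BOM, or null if none is present. */
    static String detect(byte[] data) {
        // Check UTF-32 before UTF-16: FF FE 00 00 also begins with the
        // UTF-16LE signature FF FE. (A UTF-16LE file whose first character
        // after the BOM is U+0000 would be misread; that case is ignored here.)
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    // Compares the first bytes of data against the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detect(sample));
    }
}
```

Returning null for "no BOM" leaves the choice of a default encoding to the caller.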
However, lovely Java doesn't do this correctly. Its UTF-8 decoder does
*not* remove the BOM; only the others do. The bug about this is at
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058

I'm sending this to the list because UTF-8 is the only sensible
encoding to use nowadays, and this might crop up here. I don't really
have a fix yet.
I'm going to have to deal with this in webslinger, so I'll develop a
change there, and then apply the same kind of logic to the OFBiz code.
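
For anyone who hits this in the meantime, one possible shape for a workaround is sketched below. It is not the webslinger change, just an illustration: it peeks at the first three bytes, drops them if they are the UTF-8 BOM, and pushes them back otherwise. The names Utf8BomSkipper, open, and decode are hypothetical, and the sketch assumes Java 8 or later (for StandardCharsets and UncheckedIOException).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class Utf8BomSkipper {
    /** Wraps a UTF-8 byte stream in a Reader, dropping a leading BOM if present. */
    static Reader open(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 or EOF.
        while (n < 3) {
            int r = pb.read(head, n, 3 - n);
            if (r < 0) break;
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        // Not a BOM: put back whatever was read so no data is lost.
        if (!bom && n > 0) pb.unread(head, 0, n);
        return new InputStreamReader(pb, StandardCharsets.UTF_8);
    }

    /** Convenience helper for small inputs; decodes a whole byte array. */
    static String decode(byte[] bytes) {
        try (Reader r = open(new ByteArrayInputStream(bytes))) {
            StringBuilder sb = new StringBuilder();
            for (int c; (c = r.read()) != -1; ) sb.append((char) c);
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(decode(withBom));
    }
}
```

Only the first three bytes are inspected, so a BOM later in the stream still comes through as U+FEFF, matching the rule above.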