OFBiz › OFBiz - Dev

UTF-8 encoding and BOM java bug

Adam Heath-2

The Unicode spec says that a file 'may' start with a BOM (U+FEFF). The
reader of the bytes can then look at how the BOM is encoded and pick
the correct encoding: UTF-8, UTF-16 (LE/BE), or UTF-32 (LE/BE). If the
file does start with a BOM, the reader must remove it from the decoded
text.

A BOM anywhere else in the data stream is left alone.
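
As a rough illustration of the detection step described above, here is a
minimal sketch that maps the leading bytes of a file to an encoding name.
The class and method names (BomSniffer, detect) are made up for this
example and are not part of OFBiz or the JDK. Note one assumption: the
bytes FF FE 00 00 are treated as a UTF-32LE BOM, although they could also
be a UTF-16LE BOM followed by U+0000.

```java
public class BomSniffer {
    /**
     * Returns the encoding implied by a leading BOM, or null if the data
     * does not start with one. Hypothetical helper, for illustration only.
     */
    public static String detect(byte[] data) {
        // UTF-32 is checked before UTF-16 because the UTF-32LE BOM
        // (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    // Compares the first bytes of data (as unsigned values) with prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller would read the first four bytes of the file, pass them to
detect(), and fall back to a default encoding when it returns null.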

However, lovely Java doesn't do this correctly. Its UTF-8 decoder does
*not* remove the BOM, so a leading U+FEFF ends up in the decoded text.
Only the others do. The bug about this is at
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058

I'm sending this to the list, because UTF-8 is the only sensible
encoding to use nowadays, and this might crop up here. I don't really
have a fix yet.
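
For illustration only, one common way to work around this kind of problem
is to peek at the first three bytes of the stream and drop them if they
are the UTF-8 BOM (EF BB BF) before handing the stream to the decoder.
The sketch below assumes the input is already known to be UTF-8; the
names Utf8BomSkipper, open and decode are hypothetical, and this is not
the change that was made in webslinger or OFBiz.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8BomSkipper {
    /** Wraps in with a UTF-8 reader, skipping a leading BOM if present. */
    public static Reader open(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than asked for, so loop until we
        // have three bytes or reach end of stream.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) break;
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        // Not a BOM: push the bytes back so the decoder sees them.
        if (!bom && n > 0) pin.unread(head, 0, n);
        return new InputStreamReader(pin, StandardCharsets.UTF_8);
    }

    /** Convenience wrapper: decodes a whole byte array via open(). */
    public static String decode(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = open(new ByteArrayInputStream(data))) {
            char[] buf = new char[1024];
            int k;
            while ((k = r.read(buf)) != -1) sb.append(buf, 0, k);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Only the first three bytes are inspected, so a U+FEFF that appears later
in the stream passes through untouched, which matches the rule that a
BOM anywhere else is left alone.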

I'm going to have to deal with this in webslinger, so I'll develop a
change there, and then apply the same kind of logic to the OFBiz code.