Unfortunately, the standard library doesn't include a facility to do this automatically. Instead, copy and paste this code to convert any raw InputStream into a Reader that uses the encoding indicated by the stream's byte order mark (BOM).
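import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public Reader inputStreamToReader(InputStream in) throws IOException {
    // mark/reset is optional for InputStream; buffer the stream if it is unsupported.
    if (!in.markSupported()) {
        in = new BufferedInputStream(in);
    }
    in.mark(3);
    int byte1 = in.read();
    int byte2 = in.read();
    if (byte1 == 0xFF && byte2 == 0xFE) {
        // FF FE: UTF-16 little-endian BOM
        return new InputStreamReader(in, "UTF-16LE");
    } else if (byte1 == 0xFE && byte2 == 0xFF) {
        // FE FF: UTF-16 big-endian BOM
        return new InputStreamReader(in, "UTF-16BE");
    } else {
        int byte3 = in.read();
        if (byte1 == 0xEF && byte2 == 0xBB && byte3 == 0xBF) {
            // EF BB BF: UTF-8 BOM
            return new InputStreamReader(in, "UTF-8");
        } else {
            // No BOM: rewind and fall back to the platform default charset.
            in.reset();
            return new InputStreamReader(in);
        }
    }
}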
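
As an alternative to mark/reset, the leading bytes can be read into a small buffer and any bytes that are not part of a BOM can be prepended to the stream again with a SequenceInputStream. The following is only a minimal sketch of that idea, not a definitive implementation: the class name BomReaders and the method name openReader are hypothetical, and it assumes that only the UTF-8 and UTF-16 BOMs need to be recognized.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.SequenceInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomReaders {
    // Hypothetical helper: detects a UTF-8 or UTF-16 BOM without using mark/reset.
    public static Reader openReader(InputStream in) throws IOException {
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }

        // Assumption: with no BOM, fall back to the platform default charset.
        Charset charset = Charset.defaultCharset();
        int bomLength = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            charset = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            charset = StandardCharsets.UTF_16LE;
            bomLength = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            charset = StandardCharsets.UTF_16BE;
            bomLength = 2;
        }

        // Prepend the bytes that were read but do not belong to a BOM.
        InputStream rest = new SequenceInputStream(
                new ByteArrayInputStream(head, bomLength, n - bomLength), in);
        return new InputStreamReader(rest, charset);
    }

    public static void main(String[] args) throws IOException {
        // UTF-16BE BOM (FE FF) followed by "Hi" encoded as UTF-16BE.
        byte[] data = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x48, 0x00, 0x69};
        Reader reader = openReader(new ByteArrayInputStream(data));
        StringBuilder text = new StringBuilder();
        for (int c; (c = reader.read()) != -1; ) {
            text.append((char) c);
        }
        System.out.println(text); // prints "Hi"
    }
}
```

Because nothing is rewound, this variant should also work on streams where markSupported() returns false, at the cost of one extra wrapper stream.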

4 comments:
Are there any plans to include this in Guava?
It might be worthwhile to ensure that if InputStream.markSupported() returns false, you'd return something like a SequenceInputStream that prepends the bytes read, and then the remaining InputStream.
The UTF-16LE BOM is FF FE, not FF FF.
Thanks for the useful example!
I notice that the UTF-16BE case is wrong, though: it checks both bytes for 0xFF, but byte1 should be checked for 0xFE.
If the stream doesn't support mark (I had this with a GZIPInputStream), you could wrap the InputStream in a BufferedInputStream:
if (!in.markSupported()) {
    in = new BufferedInputStream(in);
}
in.mark(3);
Thanks, this is great, but there's a mistake in UTF-16BE. It should be FE FF.
See http://illegalargumentexception.blogspot.com/2009/05/java-rough-guide-to-character-encoding.html#javaencoding_boms for more.