The DataInput interface provides for reading bytes from a binary stream and reconstructing from them data in any of the Java primitive types. There is also a facility for reconstructing a String from data in modified UTF-8 format.
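The round trip below is a minimal sketch of that idea. It assumes `java.io.DataOutputStream` produces the binary stream and `java.io.DataInputStream` serves as the DataInput implementation, with an in-memory byte array standing in for a file or socket. The class and method names (`PrimitiveRoundTrip`, `writeSample`, `readSample`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class PrimitiveRoundTrip {

    // Hypothetical helper: builds a binary stream of primitives and a string with DataOutputStream.
    static byte[] writeSample() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(42);           // 4 bytes, big-endian
            out.writeDouble(3.5);       // 8 bytes
            out.writeBoolean(true);     // 1 byte
            out.writeUTF("h\u00e9llo"); // 2-byte length, then 6 bytes of modified UTF-8
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Hypothetical helper: reconstructs the values through the DataInput interface,
    // in exactly the order they were written.
    static String readSample(byte[] data) {
        DataInput in = new DataInputStream(new ByteArrayInputStream(data));
        try {
            int i = in.readInt();
            double d = in.readDouble();
            boolean b = in.readBoolean();
            String s = in.readUTF();
            return i + " " + d + " " + b + " " + s;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = writeSample();
        System.out.println(data.length);      // 21 = 4 + 8 + 1 + (2 + 6)
        System.out.println(readSample(data)); // 42 3.5 true, followed by the string read back
    }
}
```

The stream carries no type tags, so the reader must call the read methods in the same order and with the same types the writer used.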
In general, for every reading routine in this interface, if end of file is reached before the desired number of bytes has been read, an EOFException (a subclass of IOException) is thrown. If a byte cannot be read for any reason other than end of file, an IOException other than EOFException is thrown. In particular, an IOException may be thrown if the input stream has been closed.
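A minimal sketch of the two cases, assuming `DataInputStream` over a byte array as the DataInput implementation; `EofDemo`, `tryReadInt`, and `readAllShorts` are hypothetical names. Because EOFException is a subclass of IOException, it must be caught first to tell a short stream apart from other read failures.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class EofDemo {

    // Tries to read one int (4 bytes); reports "EOF" if the stream ends first.
    static String tryReadInt(byte[] data) {
        DataInput in = new DataInputStream(new ByteArrayInputStream(data));
        try {
            return "int " + in.readInt();
        } catch (EOFException e) {
            // Fewer than 4 bytes were available: end of file reached mid-value.
            return "EOF";
        } catch (IOException e) {
            // Any other failure surfaces as a plain IOException.
            throw new UncheckedIOException(e);
        }
    }

    // Reads shorts until the stream is exhausted, treating EOFException as the end marker.
    // A trailing odd byte is consumed by the failed readShort and is silently dropped.
    static List<Short> readAllShorts(byte[] data) {
        DataInput in = new DataInputStream(new ByteArrayInputStream(data));
        List<Short> values = new ArrayList<>();
        try {
            while (true) {
                values.add(in.readShort());
            }
        } catch (EOFException e) {
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryReadInt(new byte[] {0, 0, 1, 0}));      // int 256
        System.out.println(tryReadInt(new byte[] {0, 1}));            // EOF
        System.out.println(readAllShorts(new byte[] {0, 1, 0, 2, 0})); // [1, 2]
    }
}
```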
Implementations of the DataInput and DataOutput interfaces represent Unicode strings in a format that is a slight modification of UTF-8. (For information regarding the standard UTF-8 format, see section 3.9, Unicode Encoding Forms, of The Unicode Standard, Version 4.0.)
- Characters in the range '\u0001' to '\u007F' are represented by a single byte.
- The null character '\u0000' and characters in the range '\u0080' to '\u07FF' are represented by a pair of bytes.
- Characters in the range '\u0800' to '\uFFFF' are represented by three bytes.
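The three ranges translate directly into a per-character byte count. The sketch below is illustrative only; `ModifiedUtf8Length` and its methods are hypothetical names. It checks the count against `DataOutputStream.writeUTF`, which writes the encoded length as an unsigned 2-byte value before the string bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ModifiedUtf8Length {

    // Bytes needed for one UTF-16 char, following the three ranges.
    static int encodedLength(char c) {
        if (c >= 0x0001 && c <= 0x007F) {
            return 1; // U+0001..U+007F
        } else if (c <= 0x07FF) {
            return 2; // U+0000 and U+0080..U+07FF
        } else {
            return 3; // U+0800..U+FFFF
        }
    }

    // Sum of the per-char lengths for a whole string.
    static int encodedLength(String s) {
        int total = 0;
        for (int i = 0; i < s.length(); i++) {
            total += encodedLength(s.charAt(i));
        }
        return total;
    }

    // Reads back the unsigned 16-bit length prefix that writeUTF emits.
    static int lengthPrefixFromWriteUTF(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] b = bytes.toByteArray();
        return ((b[0] & 0xFF) << 8) | (b[1] & 0xFF);
    }

    public static void main(String[] args) {
        String s = "A\0\u00e9\u20ac"; // 'A', NUL, e-acute, euro sign
        System.out.println(encodedLength(s));            // 8 = 1 + 2 + 2 + 3
        System.out.println(lengthPrefixFromWriteUTF(s)); // 8
    }
}
```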
| Value | Byte | Bit values (bit 7 down to bit 0) |
|---|---|---|
| `\u0001` to `\u007F` | 1 | `0`, then bits 6-0 |
| `\u0000`, `\u0080` to `\u07FF` | 1 | `110`, then bits 10-6 |
| | 2 | `10`, then bits 5-0 |
| `\u0800` to `\uFFFF` | 1 | `1110`, then bits 15-12 |
| | 2 | `10`, then bits 11-6 |
| | 3 | `10`, then bits 5-0 |
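Reading the table from the decoding side, the leading bits of each byte say how many bytes belong to the character. The decoder below is a minimal sketch under that assumption, not the JDK's implementation; `ModifiedUtf8Decoder`, `decode`, and `viaReadUTF` are hypothetical names. It throws `IllegalArgumentException` on malformed input (`readUTF` throws `UTFDataFormatException` instead), and it does not reject a raw `0x00` byte or overlong forms. The result is compared with `DataInputStream.readUTF` on the same bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ModifiedUtf8Decoder {

    // Decodes modified UTF-8 bytes (without the 2-byte length prefix) per the table.
    static String decode(byte[] b) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < b.length) {
            int b1 = b[i] & 0xFF;
            if ((b1 & 0x80) == 0) {              // 0xxxxxxx
                sb.append((char) b1);
                i += 1;
            } else if ((b1 & 0xE0) == 0xC0) {    // 110xxxxx 10xxxxxx
                int b2 = continuation(b, i + 1);
                sb.append((char) (((b1 & 0x1F) << 6) | b2));
                i += 2;
            } else if ((b1 & 0xF0) == 0xE0) {    // 1110xxxx 10xxxxxx 10xxxxxx
                int b2 = continuation(b, i + 1);
                int b3 = continuation(b, i + 2);
                sb.append((char) (((b1 & 0x0F) << 12) | (b2 << 6) | b3));
                i += 3;
            } else {
                throw new IllegalArgumentException("bad leading byte at " + i);
            }
        }
        return sb.toString();
    }

    // Returns the 6 payload bits of a 10xxxxxx continuation byte.
    private static int continuation(byte[] b, int i) {
        if (i >= b.length || (b[i] & 0xC0) != 0x80) {
            throw new IllegalArgumentException("bad continuation byte at " + i);
        }
        return b[i] & 0x3F;
    }

    // Reference result: prepend the length prefix and let readUTF decode the same bytes.
    static String viaReadUTF(byte[] b) {
        byte[] framed = new byte[b.length + 2];
        framed[0] = (byte) (b.length >>> 8);
        framed[1] = (byte) b.length;
        System.arraycopy(b, 0, framed, 2, b.length);
        try {
            return new DataInputStream(new ByteArrayInputStream(framed)).readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 'A' = 41, U+0000 = C0 80, U+00E9 = C3 A9, U+20AC = E2 82 AC
        byte[] bytes = {0x41, (byte) 0xC0, (byte) 0x80, (byte) 0xC3, (byte) 0xA9,
                        (byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        System.out.println(decode(bytes).equals(viaReadUTF(bytes))); // true
        System.out.println(decode(bytes).length());                  // 4
    }
}
```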
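The differences between this format and the standard UTF-8 format are the following:

- The null character '\u0000' is encoded in 2-byte format rather than 1-byte, so that the encoded strings never have embedded nulls.
- Only the 1-byte, 2-byte, and 3-byte formats are used.
- Supplementary characters are represented in the form of surrogate pairs.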
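The contrast with standard UTF-8 shows up clearly when both encoders run on the same strings. This sketch assumes `writeUTF` output (minus its 2-byte length prefix) as the modified form and `String.getBytes(StandardCharsets.UTF_8)` as the standard form; `ModifiedVsStandard` and its methods are hypothetical. Besides the null character, it uses U+1F600, a supplementary character that Java stores as the surrogate pair U+D83D U+DE00, each surrogate getting its own 3-byte sequence.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedVsStandard {

    // Modified UTF-8 bytes from writeUTF, with the 2-byte length prefix removed.
    static byte[] modified(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Standard UTF-8 bytes from the JDK charset.
    static byte[] standard(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated uppercase hex.
    static String hex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", x & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nul = "\0";
        System.out.println(hex(standard(nul))); // 00
        System.out.println(hex(modified(nul))); // C0 80, so no zero byte appears

        String emoji = new String(Character.toChars(0x1F600)); // one code point, two chars
        System.out.println(hex(standard(emoji))); // F0 9F 98 80
        System.out.println(hex(modified(emoji))); // ED A0 BD ED B8 80
    }
}
```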
See also: java.io.DataInputStream, java.io.DataOutput