
I need to convert a stream of chars into a stream of bytes, i.e. I need an adapter from a java.io.Writer interface to a java.io.OutputStream, supporting any valid Charset, which I will have as a configuration parameter.

However, the java.io.OutputStreamWriter class has a secret: the sun.nio.cs.StreamEncoder object it delegates to underneath creates an 8192-byte (8 KB) buffer, even if you don't ask it to.

The problem is that at the OutputStream end I have inserted a wrapper that counts the number of bytes being written, so that it can immediately stop execution of the source system once a specific number of bytes has been output. If OutputStreamWriter keeps an 8K buffer, I am notified of the number of bytes generated too late, because they only reach my counter when the buffer is flushed (so there may already be more than 8,000 generated bytes waiting in the OutputStreamWriter buffer).
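To make the setup concrete, here is a minimal sketch of the kind of counting wrapper described above. The class name LimitedOutputStream, its getCount() accessor and the choice of throwing an IOException are assumptions for illustration only, not the actual wrapper in use. The main method wires it behind a plain OutputStreamWriter to show where the late notification comes from.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical wrapper: counts bytes and fails once a limit is exceeded. */
class LimitedOutputStream extends OutputStream {
    private final OutputStream delegate;
    private final long limit;
    private long count;

    LimitedOutputStream(OutputStream delegate, long limit) {
        this.delegate = delegate;
        this.limit = limit;
    }

    /** Number of bytes that have reached this stream so far. */
    long getCount() {
        return count;
    }

    @Override
    public void write(int b) throws IOException {
        check(1);
        delegate.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        check(len);
        delegate.write(b, off, len);
    }

    // Updates the counter and aborts before forwarding once the limit is passed.
    private void check(int n) throws IOException {
        count += n;
        if (count > limit) {
            throw new IOException("Output limit of " + limit + " bytes exceeded");
        }
    }

    @Override
    public void flush() throws IOException {
        delegate.flush();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }

    public static void main(String[] args) throws IOException {
        LimitedOutputStream limited =
                new LimitedOutputStream(new ByteArrayOutputStream(), 10);
        Writer writer = new OutputStreamWriter(limited, StandardCharsets.UTF_8);

        char[] data = new char[100];
        Arrays.fill(data, 'a');
        writer.write(data); // encoded bytes may still sit in the writer's buffer
        System.out.println("bytes counted after write: " + limited.getCount());

        try {
            writer.flush(); // the counter only sees the bytes at this point
        } catch (IOException e) {
            System.out.println("limit noticed on flush: " + e.getMessage());
        }
    }
}
```

Because the wrapper can only react to bytes that actually reach it, any buffering in the Writer -> OutputStream bridge delays the check by up to the size of that buffer.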

So the question is, is there anywhere in the Java runtime a Writer -> OutputStream bridge that can run unbuffered?

I would really, really hate to have to write this myself :(...

NOTE: calling flush() on the OutputStreamWriter after each write is not a valid alternative. That brings a large performance penalty (there's a synchronized block involved in the StreamEncoder).

NOTE 2: I understand it might be necessary to keep a small char overflow at the bridge in order to handle surrogate pairs. It's not that I need to stop the execution of the source system at the very moment it generates the n-th byte (that would not be possible, given that bytes can reach me in the form of a larger byte[] in a write call). But I need to stop it asap, and waiting for an 8K, 2K or even 200-byte buffer to flush would simply be too late.

  • Arguably it can't be fully unbuffered - if you call write with just the first half of a surrogate pair, for most encodings the writer would have to store that and wait for the second character before writing anything.
    – Jon Skeet
    Commented Apr 23, 2016 at 10:57
  • Well yes, of course, I understand that. But there's a difference between a couple of buffered chars needed to compute a surrogate pair and an 8K buffer...
    Commented Apr 23, 2016 at 10:58
  • I don't think there's any "of course" there - I suspect many readers will assume you mean no buffering at all, and that you might not be aware of surrogate pairs. I suggest you edit the question to clarify that. (Would it be "too late" if the first call to write didn't stop execution, for example?)
    – Jon Skeet
    Commented Apr 23, 2016 at 11:00

1 Answer


As you have already discovered, the StreamEncoder used by OutputStreamWriter has a buffer size of 8 KB, and there is no interface to change that size.

But the following snippet gives you a way to obtain a Writer for an OutputStream which internally also uses a StreamEncoder, but now with a user-defined buffer size:

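import java.io.OutputStream;
import java.io.Writer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

String charSetName = ...
CharsetEncoder encoder = Charset.forName(charSetName).newEncoder();

OutputStream out = ...
int bufferSize = ...  // minimum byte-buffer capacity requested from the writer

WritableByteChannel channel = Channels.newChannel(out);
Writer writer = Channels.newWriter(channel, encoder, bufferSize);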
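For completeness, here is a self-contained sketch of the same approach that can be run as-is. The class SmallBufferWriterDemo, the nested CountingOutputStream helper and the writeAscii method are hypothetical names introduced only for this illustration, and the values (100 ASCII chars, UTF-8, a 16-byte buffer) are arbitrary. Note that the API documents the third argument as a minimum capacity, so how many bytes stay pending is implementation-specific. The sketch assumes an OpenJDK-style StreamEncoder that uses the value as the actual buffer size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;
import java.nio.channels.Channels;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SmallBufferWriterDemo {

    /** Hypothetical helper: counts bytes as they reach the underlying stream. */
    static final class CountingOutputStream extends OutputStream {
        private final OutputStream delegate;
        private long count;

        CountingOutputStream(OutputStream delegate) {
            this.delegate = delegate;
        }

        long getCount() {
            return count;
        }

        @Override
        public void write(int b) throws IOException {
            delegate.write(b);
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            delegate.write(b, off, len);
            count += len;
        }

        @Override
        public void flush() throws IOException {
            delegate.flush();
        }

        @Override
        public void close() throws IOException {
            delegate.close();
        }
    }

    /**
     * Writes charCount ASCII chars through a writer created with the given
     * buffer size. Returns {bytes counted before close, bytes counted after close}.
     */
    static long[] writeAscii(int charCount, int bufferSize) throws IOException {
        CountingOutputStream counter =
                new CountingOutputStream(new ByteArrayOutputStream());
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        Writer writer =
                Channels.newWriter(Channels.newChannel(counter), encoder, bufferSize);

        char[] data = new char[charCount];
        Arrays.fill(data, 'a');
        writer.write(data); // no explicit flush here

        long beforeClose = counter.getCount();
        writer.close();     // pushes out whatever is still pending
        return new long[] {beforeClose, counter.getCount()};
    }

    public static void main(String[] args) throws IOException {
        long[] counts = writeAscii(100, 16);
        System.out.println("bytes counted before close: " + counts[0]);
        System.out.println("bytes counted after close:  " + counts[1]);
    }
}
```

With a small buffer, the counter should lag the writer by no more than roughly the buffer size, which is the property the question asks for.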
  • I cannot give you enough points for this. It works like a charm, thanks so much. The only thing I worry a bit about is that the spec defines that bufferSize as the minimum buffer size. But I see the standard implementation at sun.nio.cs.StreamEncoder simply uses this as a fixed size, so that should do. Thanks again.
    Commented Apr 23, 2016 at 11:34
