especially for platforms where unaligned access is not allowed. Make
it possible to override the small buffer size.
A simple continuous read string test using libusb showed CPU usage
drop from roughly 10% to less than 1% on a dual-core GHz CPU when
the malloc() operation was skipped for small buffers.
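
For illustration only, the idea described above can be sketched roughly
as follows. This is not the committed code: SMALL_BUF_SIZE, union
small_buf, struct req_buf, req_buf_get() and req_buf_put() are
hypothetical names, and the 256 byte default is an assumed value. The
union gives the byte array pointer and 64-bit alignment, and the #ifndef
guard lets the size be overridden at build time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tunable; override with e.g. -DSMALL_BUF_SIZE=512. */
#ifndef SMALL_BUF_SIZE
#define SMALL_BUF_SIZE 256
#endif

/*
 * The extra members are never used for storage. They only force the
 * byte array to be aligned for pointer and 64-bit accesses, which
 * matters on platforms that do not allow unaligned access.
 */
union small_buf {
	void *align_ptr;
	uint64_t align_u64;
	uint8_t data[SMALL_BUF_SIZE];
};

struct req_buf {
	union small_buf small;	/* inline storage for small requests */
	void *ptr;		/* buffer currently in use, or NULL */
	int on_heap;		/* non-zero if ptr came from malloc() */
};

/* Return a buffer of at least len bytes, skipping malloc() if small. */
static void *
req_buf_get(struct req_buf *rb, size_t len)
{
	assert(rb != NULL);
	if (len <= sizeof(rb->small.data)) {
		rb->ptr = rb->small.data;
		rb->on_heap = 0;
	} else {
		rb->ptr = malloc(len);
		rb->on_heap = (rb->ptr != NULL);
	}
	return (rb->ptr);
}

/* Release the buffer; only heap allocations need free(). */
static void
req_buf_put(struct req_buf *rb)
{
	assert(rb != NULL);
	if (rb->on_heap)
		free(rb->ptr);
	rb->ptr = NULL;
	rb->on_heap = 0;
}
```

A caller would typically place a struct req_buf on its stack, call
req_buf_get() with the request length, and call req_buf_put() when the
request completes.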
MFC after: 2 weeks