Issue
Currently, I am investigating an existing application that encodes and decodes files of up to 100 MB to and from Base64. I noticed a huge performance difference between encoding and decoding with Java's Base64 class: decoding can be over 100 times slower than encoding. From my understanding, there is no added complexity when decoding the same content from Base64, besides the slightly (~33%) larger input file.
I timed the encoding/decoding for three file sizes and compared it to the pre-installed base64 command in my Ubuntu terminal. This showed that there is not really a difference between encoding and decoding in the terminal. Java's Base64 decoding, however, is clearly out of line.
My measurements:
File Size | 100 KB | 1 MB | 61 MB |
---|---|---|---|
Java Base64 Encode | 5 ms | 36 ms | 325 ms |
Java Base64 Decode | 151 ms | 1 522 ms | 83 134 ms |
Terminal Base64 Encode | 8 ms | 21 ms | 257 ms |
Terminal Base64 Decode | 10 ms | 46 ms | 385 ms |
Is there a specific reason for Java's decode method to be this slow? I am not sure what could make Java's decode method more complex than the terminal's base64 decode. I think that encoding and decoding of "large" files is not the typical use case, but I would still expect reasonable performance.
I would appreciate any insight or suggestions to fix my problem!
Java Reproducer:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;

public class Base64Test {
    public static void main(String[] args) throws IOException {
        String inputFileName = "input.txt";
        String outputFileName = "encoded.txt";
        String outputFileName2 = "decoded.txt";

        // Encode: stream the input file through an encoding wrapper around the output.
        // The wrapper goes inside try-with-resources so closing it emits any final padding.
        long time1 = System.nanoTime();
        try (FileInputStream in = new FileInputStream(inputFileName);
             OutputStream out = Base64.getEncoder().wrap(new FileOutputStream(outputFileName))) {
            in.transferTo(out);
        }
        System.out.println("Base64 encoding: " + outputFileName + " Time: " + (System.nanoTime() - time1) / 1_000_000);

        // Decode: stream the encoded file through a decoding wrapper around the input.
        long time2 = System.nanoTime();
        try (InputStream in = Base64.getDecoder().wrap(new FileInputStream(outputFileName));
             FileOutputStream out = new FileOutputStream(outputFileName2)) {
            in.transferTo(out);
        }
        System.out.println("Base64 decoding: " + outputFileName2 + " Time: " + (System.nanoTime() - time2) / 1_000_000);
    }
}
Terminal commands:
$ time base64 input.txt > encoded.txt
$ time base64 --decode encoded.txt > decoded.txt
System:
- Ubuntu 22.04
- 16GB memory
- OpenJDK 17.0.9
- IntelliJ 2023.2.5
Input files consisted of random characters I typed on my keyboard and then copy-pasted until the target file size was reached. I also tried a different Ubuntu notebook with JDK 17 and got the same results.
EDIT: Results with buffering:
File Size | 100 KB | 1 MB | 61 MB | 100 MB |
---|---|---|---|---|
Java Base64 Encode | 5 ms | 30 ms | 311 ms | 423 ms |
Java Base64 Decode | 12 ms | 46 ms | 1684 ms | 2875 ms |
Solution
Multiple issues here.
Buffers
A `.read()` call (as in: read a single byte) can be anywhere from somewhat to incredibly inefficient, depending on the underlying resource. It might require asking the OS for one byte, which involves some infrastructure that takes a tiny bit of time but really adds up if you do it for every byte. It takes a ton of time if the underlying resource is block-based, because `.read()` then reads in an entire block and ditches everything except the one byte you want. If you do this:

for (int i = 0; i < 1024; i++) in.read();

and `in` represents a resource that works in 1024-byte blocks, every call reads 1024 bytes, tosses away all but the one you asked for, and returns it. Rinse and repeat: it reads 1024*1024 bytes in total. That is, literally, 1024x slower than `byte[] b = new byte[1024]; in.read(b);`.
If you find it simpler to write code with `in.read()` instead of `in.read(buffer)`, no problem! Wrap the resource in `new BufferedInputStream(...)`, and BIS will take care of maintaining a small buffer. A `.read()` on BIS triggers one `read(sizableBuffer)` on the underlying resource; further `.read()` calls on BIS are then an all-memory, all-Java affair, with BIS simply handing you bytes from its buffer until it is drained, and only then will the next `.read()` call reach into the underlying resource again.
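In code, the wrapping is a one-liner (a sketch; `raw` stands for whatever unbuffered stream you have):

InputStream in = new BufferedInputStream(raw); // default buffer is 8192 bytes
for (int i = 0; i < 1024; i++) in.read();      // one real read fills the buffer; the rest come from memory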
`transferTo` does not require buffers; it is itself 'smart' and transfers in batches. However, digging into the source of `Base64.getDecoder().wrap(someInputStream)`, even its `.read(sizableByteArray)` implementation calls `in.read()` on the underlying resource, one byte at a time. Weirdly, the encoder does buffer a lot more (about 8192 bytes), which really takes the edge off and is more than enough for most block-based systems.
Simply wrap them in `new BufferedInputStream` and `new BufferedOutputStream` and this problem disappears; that alone gives you a 100x to 1000x speedup.
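Applied to the reproducer above, the decode step could look like this (a sketch; the class name is made up, the file names come from the reproducer, and note the buffer goes under the decoder wrap, since that is where the single-byte reads happen):

import java.io.*;
import java.util.Base64;

public class BufferedDecode {
    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        // BufferedInputStream sits between the decoder and the file, so the
        // decoder's single-byte reads hit memory instead of the OS.
        try (InputStream in = Base64.getDecoder().wrap(
                new BufferedInputStream(new FileInputStream("encoded.txt")));
             OutputStream out = new BufferedOutputStream(new FileOutputStream("decoded.txt"))) {
            in.transferTo(out);
        }
        System.out.println("Buffered decode: " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}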
Microbenchmarking
Those simple explanations of how computers work are utter and total lies. They don't work like that at all. CPUs do pipelining and predictive, ahead-of-the-pointer work, and the JVM initially runs everything slowly (even slower than you think: it does some bookkeeping, for example adding code that counts how often a certain `if` is taken vs. skipped). That's because the vast majority of apps literally spend 99% of their time in 1% of the code, so running most code slowly makes absolutely no difference whatsoever. Except in that 1%, where it makes a ton of difference. The bookkeeping lets the JVM figure out what that 1% is, and the branch statistics mean the JVM can generate incredibly efficient machine code for it, which it will, and then just run that. This is called JIT (just-in-time) compilation / HotSpot.
The thing is, on boot, the JVM doesn't know yet. So, the performance (oversimplified) of the 'hot path' looks like:
- iterations 1-1000: Take 50ms per iteration.
- iteration 1001: Takes 800ms as the JVM compiles up a machine code version informed by all that bookkeeping.
- iteration 1002+: Take 4ms per iteration.
You want to iterate enough to make the impact of iterations 1-1001 get lost in the noise. But how do you know you've gotten past the 'JIT warmup phase'? This is tricky. And that's just one of a ton of reasons why performance measuring is vastly more complicated than just 'start timer, run code, stop timer'.
Unless you spend 5 years studying JVMs and CPU design, you're going to be hopeless at this and can't do the job. Fortunately, you don't have to: you use existing tools that do the performance testing for you. The most commonly used microbenchmarking tool is JMH; follow a JMH tutorial to set it up.
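A minimal sketch of what that could look like for the decode path (the class name and input size are made up for illustration; JMH generates the actual harness from the annotations):

import java.util.Base64;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5)        // burn through the JIT warmup before measuring
@Measurement(iterations = 10)
@Fork(1)
@State(Scope.Thread)
public class Base64DecodeBench {
    byte[] encoded;

    @Setup
    public void setup() {
        byte[] raw = new byte[1_000_000]; // 1 MB of random input
        new Random(42).nextBytes(raw);
        encoded = Base64.getEncoder().encode(raw);
    }

    @Benchmark
    public byte[] decode() {
        // Returning the result stops the JIT from eliminating the work as dead code.
        return Base64.getDecoder().decode(encoded);
    }
}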
The nature of Base64 and disks
Encoding Base64 turns every block of 3 bytes into 4 bytes, which means you write 33% more than you read (read 3 MB, write 4 MB).
Decoding turns every block of 4 bytes into 3 bytes, which means you read 33% more than you write (read 4 MB, write 3 MB).
If the underlying sources have different performance characteristics for read vs. write, one of them will naturally be slower than the other one.
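That ratio is easy to verify in memory with plain java.util.Base64 (3,000,000 is divisible by 3, so there is no padding to muddy the numbers):

import java.util.Base64;

public class RatioDemo {
    public static void main(String[] args) {
        byte[] raw = new byte[3_000_000];                     // "3 MB" of input
        byte[] encoded = Base64.getEncoder().encode(raw);     // 3 bytes -> 4 bytes
        byte[] decoded = Base64.getDecoder().decode(encoded); // 4 bytes -> 3 bytes
        System.out.println(raw.length + " -> " + encoded.length);     // 3000000 -> 4000000
        System.out.println(encoded.length + " -> " + decoded.length); // 4000000 -> 3000000
    }
}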
But... that's a big gap!
I'd fix all the above (add buffering, hang all this code into a JMH harness), but a 10x+ factor vs. the command-line `base64` is still quite large. Here are the next steps I would take:
- Run this on a freshly downloaded OpenJDK 21 installation.
- Use Guava's Base64 decoder instead of the built-in one (see the sketch below).
If the difference disappears, you can now safely 'blame' a shoddy implementation.
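For the Guava experiment, a sketch along these lines should work, assuming a current Guava release on the classpath; the relevant class is com.google.common.io.BaseEncoding, and the file names again come from the reproducer:

import java.io.*;
import java.nio.charset.StandardCharsets;
import com.google.common.io.BaseEncoding;

public class GuavaDecode {
    public static void main(String[] args) throws IOException {
        // decodingStream takes a Reader, so present the encoded file as ASCII text.
        // Note: the reproducer's encoded.txt has no line breaks; for wrapped input
        // (e.g. MIME-style 76-char lines) you'd need withSeparator("\r\n", 76).
        try (Reader reader = new BufferedReader(new InputStreamReader(
                     new FileInputStream("encoded.txt"), StandardCharsets.US_ASCII));
             InputStream in = BaseEncoding.base64().decodingStream(reader);
             OutputStream out = new BufferedOutputStream(new FileOutputStream("decoded.txt"))) {
            in.transferTo(out);
        }
    }
}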
Answered By - rzwitserloot