Recently, I worked on a .NET project to synchronize data between two APIs. One of the data types I had to handle was binary data used to store file-based content such as images and documents. The particularity here is that all binary data was in base64 format.
“Base64 is a group of binary-to-text encoding schemes that represent binary data (more specifically, a sequence of 8-bit bytes) in an ASCII string format by translating the data into a radix-64 representation.” (Wikipedia)
So I had to calculate the size of the file represented by the base64 string and apply some business logic.
When decoding Base64 text, four characters are typically converted back to three bytes. This short sentence is enough to calculate the corresponding file size from any base64 string.
The only exceptions are when padding characters exist. A single `=` indicates that the last four characters will decode to only two bytes, while `==` indicates that the last four characters will decode to only a single byte.
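Written as a formula, assuming a padded string (so its length n is a multiple of 4) and writing p for the number of trailing `=` characters, the rule reads:

$$
\text{size} = \frac{3n}{4} - p, \qquad p \in \{0, 1, 2\}
$$

For a single group such as `IQ==` (n = 4, p = 2), this gives 3 - 2 = 1 byte.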
I created a file named `base64.txt` (size: 13 bytes) containing the text `Hello world !` and converted it to base64 using the base64 guru website. Here is the output: `SGVsbG8gd29ybGQgIQ==`.
Below is how we can calculate the size of our file from base64.
| Base64 chunk | SGVs | bG8g | d29y | bGQg | IQ== |
|---|---|---|---|---|---|
| Decoded size | 3 bytes | 3 bytes | 3 bytes | 3 bytes | 1 byte (due to the padding characters) |

Total size: 3 + 3 + 3 + 3 + 1 = 13 bytes.
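To double-check the table, the string can simply be decoded with .NET's built-in `Convert.FromBase64String` and the resulting bytes counted. This is only a verification sketch; the size calculation in this article deliberately avoids decoding, which matters for large files:

```csharp
using System;
using System.Text;

var encoded = "SGVsbG8gd29ybGQgIQ==";

// Decode the whole string and count the resulting bytes
byte[] bytes = Convert.FromBase64String(encoded);

Console.WriteLine(bytes.Length);                    // 13
Console.WriteLine(Encoding.ASCII.GetString(bytes)); // Hello world !
```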
The padding character is not essential for decoding, since the number of missing bytes can be inferred from the length of the encoded text. Some implementations require the padding character, while others do not use it at all. One case where padding characters are required is when multiple Base64-encoded files have been concatenated.
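A minimal sketch of that inference for unpadded input, using a hypothetical helper named `UnpaddedSize`: every full group of 4 characters is 3 bytes, a leftover of 2 characters is 1 byte and a leftover of 3 characters is 2 bytes, which is exactly what integer division `n * 3 / 4` produces. It assumes valid base64, where a leftover of a single character cannot occur.

```csharp
using System;

// Hypothetical helper: decoded size of an unpadded base64 string of the given length.
// 4 chars -> 3 bytes; leftover 2 chars -> 1 byte; leftover 3 chars -> 2 bytes.
static long UnpaddedSize(long length) => length * 3 / 4;

// "SGVsbG8gd29ybGQgIQ==" without its padding is 18 characters long
Console.WriteLine(UnpaddedSize("SGVsbG8gd29ybGQgIQ".Length)); // 13
```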
Sometimes, base64 is stored with its MIME type, as in a data URI; in that case, you should remove that prefix before calculating the size. Using the same example with the MIME type added, we get: `data:text/plain;base64,SGVsbG8gd29ybGQgIQ==`
If the MIME-type prefix is not removed first, the total size will be completely wrong: we get 31 bytes instead of 13.
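A minimal sketch of that difference, assuming the data URI format shown above (everything up to the first comma is metadata). The hypothetical helper `EstimateSize` applies the 4-to-3 rule and the padding subtraction, once to the full string and once to the part after the comma:

```csharp
using System;

// Hypothetical helper: 4 chars -> 3 bytes, minus 1 byte per trailing '=' (at most 2)
static long EstimateSize(string s)
{
    long size = (s.Length + 3) / 4 * 3; // same as Math.Ceiling(length / 4.0) * 3
    if (s.EndsWith("==", StringComparison.Ordinal)) size -= 2;
    else if (s.EndsWith("=", StringComparison.Ordinal)) size -= 1;
    return size;
}

var dataUri = "data:text/plain;base64,SGVsbG8gd29ybGQgIQ==";
var payload = dataUri[(dataUri.IndexOf(',') + 1)..]; // strip "data:text/plain;base64,"

Console.WriteLine(EstimateSize(dataUri)); // 31 (wrong: the prefix is counted)
Console.WriteLine(EstimateSize(payload)); // 13
```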
As you can see, calculating the file size from base64 is very simple, and you can now implement the algorithm in your favorite programming language. I am going to use C#/.NET in this article.
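```csharp
// Skip the MIME-type prefix, if any: IndexOf returns -1 when there is no comma, so the slice starts at 0
var length = base64String.AsSpan().Slice(base64String.IndexOf(',') + 1).Length;
// Every group of 4 base64 characters decodes to 3 bytes
var fileSizeInByte = Math.Ceiling((double)length / 4) * 3;
```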
We need to subtract 2 from the `fileSizeInByte` value calculated above if the last 2 characters of the base64 string are padding characters, or subtract 1 if only the last character is a padding character.
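```csharp
// Get the last 2 characters
var paddings = base64String[^2..];
// "==" means the last group holds 1 byte (subtract 2); a single "=" means 2 bytes (subtract 1)
fileSizeInByte = paddings.Equals("==") ? fileSizeInByte - 2 : paddings[1].Equals('=') ? fileSizeInByte - 1 : fileSizeInByte;
```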
Here is the complete code:
```csharp
using System;

var base64String = "The-base64-string";
var applyPaddingsRules = true;

// Remove the MIME-type prefix from the base64 string, if it exists
var length = base64String.AsSpan().Slice(base64String.IndexOf(',') + 1).Length;
// Every group of 4 characters decodes to 3 bytes
var fileSizeInByte = Math.Ceiling((double)length / 4) * 3;

// Subtract the bytes represented by trailing padding characters
if (applyPaddingsRules && length >= 2)
{
    var paddings = base64String[^2..];
    fileSizeInByte = paddings.Equals("==") ? fileSizeInByte - 2 : paddings[1].Equals('=') ? fileSizeInByte - 1 : fileSizeInByte;
}

Console.WriteLine($"File size: {fileSizeInByte} bytes");
```
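For reuse, the same steps can be wrapped in a single method. This is a hedged variant rather than a drop-in copy of the code above: the name `GetFileSizeFromBase64` is hypothetical, and it uses integer arithmetic `n * 3 / 4` (floor) instead of `Math.Ceiling`, so that unpadded strings are also sized correctly.

```csharp
using System;

// Hypothetical helper: estimates the decoded size of a base64 string without decoding it.
// Handles padded and unpadded input and ignores an optional data-URI prefix.
static long GetFileSizeFromBase64(string base64String)
{
    // Skip the MIME-type prefix, if any (IndexOf returns -1 when there is no comma)
    ReadOnlySpan<char> payload = base64String.AsSpan(base64String.IndexOf(',') + 1);

    // Count trailing padding characters: "=" -> 1, "==" -> 2
    int padding = 0;
    if (payload.Length >= 1 && payload[^1] == '=')
    {
        padding = payload.Length >= 2 && payload[^2] == '=' ? 2 : 1;
    }

    // floor(3n / 4) is exact for unpadded input; for padded input (n % 4 == 0)
    // it equals 3n / 4, from which the padding bytes are subtracted
    return (long)payload.Length * 3 / 4 - padding;
}

Console.WriteLine(GetFileSizeFromBase64("data:text/plain;base64,SGVsbG8gd29ybGQgIQ==")); // 13
Console.WriteLine(GetFileSizeFromBase64("SGVsbG8gd29ybGQgIQ"));                          // 13
```

The floor-based formula matters only for unpadded input: with `Math.Ceiling`, the unpadded string `IQ` (length 2) would be counted as 3 bytes instead of 1.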
I created a small class library project that implements this algorithm. Check it out on GitHub: https://github.com/lioncoding-oss/FileSizeFromBase64.NET.