Compressing Sony camera raw images for storage

Sony, please compress your ARWs losslessly

I'm sure ARW file sizes have been a headache for many Sony A7* camera owners. They certainly have been for me. Sony's compressed RAW format is lossy (it seems to use some sort of delta encoding) and I hate using it. Even if the result is often not visibly different from an uncompressed ARW, the knowledge that dynamic range is lost on some pixels due to compression made me turn on uncompressed ARWs and leave it at that. In fact, in some scenes the compression artefacts are visible (he said without bringing up any examples -- but I'm too lazy to dig them out right now).

Now anybody with an A7RIV is reportedly looking at 150MB per image. That's ludicrous. It's just irresponsible of Sony: not only is backing the files up an issue, but SD cards need to be bigger too, and the sheer size of each .ARW file becomes an I/O bottleneck very quickly unless one loads up their PC with NVMe drives. Which is expensive.

Backing up those huge files takes up a lot of space. Shooting 5000 images on a 24MP A7III on holidays and wanting to back them up will take up 250GB of space (about 85% of that if the files are zipped up as is). Oh man this is going to cost you if you back up on the cloud. Of course one may think that backing up jpgs is alright but I like to have the originals backed up and so that's what I'm doing. In their uncompressed form. The size issue is kind of obvious, and I don't know why Sony engineers didn't bother to implement better lossless compression.

So, because I like messing with things, I was wondering what can be done to improve the backup story for those images.

What's inside the ARW

It's basically a TIFF file with one section containing the data read from the sensor. For an A7III, an uncompressed ARW is 6048 pixels wide and 4024 tall, with 16 bits per pixel in a Bayer pattern. (If you thought you really get 24MP of RGB pixels from the raw readout then I hate to disappoint you -- we get 12MP green, 6MP red and 6MP blue pixels in a mosaic pattern.) A tool like RawTherapee will show you the pattern if you want to see it.

Original photograph (JPEG).

Cropped corner for illustration, demosaicing done.

Bayer pattern before processing. Note that individual pixels are not RGB -- they are green, blue or red.
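To make the 12MP/6MP/6MP split concrete, here is a small sketch that counts the photosites of each colour for the A7III dimensions quoted above. The 2x2 tile layout (G R on even rows, B G on odd rows) is an assumption taken from the row order described later in this post; the function name is mine.

```python
WIDTH, HEIGHT = 6048, 4024  # A7III raw dimensions quoted above

# Assumed 2x2 Bayer tile: even rows are G R G R ..., odd rows are B G B G ...
TILE = [["G", "R"],
        ["B", "G"]]

def bayer_site_counts(width: int, height: int) -> dict:
    """Count the photosites of each colour in a width x height mosaic.

    Assumes both dimensions are even, so the sensor is covered by whole tiles.
    """
    tiles = (width // 2) * (height // 2)  # number of 2x2 tiles on the sensor
    counts = {}
    for row in TILE:
        for colour in row:
            # Each position in the tile occurs exactly once per tile.
            counts[colour] = counts.get(colour, 0) + tiles
    return counts

counts = bayer_site_counts(WIDTH, HEIGHT)
for colour, n in counts.items():
    print(f"{colour}: {n / 1e6:.1f} MP")
```

Half of the sites are green and a quarter each are red and blue, which is where the rough 12/6/6 figures come from.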

How compressible is it?

If we zip the two types of raw file (zip at compression level 9) we get this:

| File             | Size before zip | Size with zip level 9 |
| Uncompressed.arw | 50MB            | 40MB                  |
| Compressed.arw   | 24MB            | 23MB                  |
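If you want to repeat this kind of measurement without creating zip archives, a rough stand-in is Python's `zlib` module, which implements the same DEFLATE algorithm that zip normally uses. This is only a sketch: the helper name is mine, and the numbers will differ slightly from a real zip file because of container overhead.

```python
import zlib

def deflated_size(data: bytes, level: int = 9) -> int:
    """Return the size in bytes of `data` after DEFLATE at the given level."""
    return len(zlib.compress(data, level))

def report(name: str, data: bytes) -> None:
    """Print the before/after sizes and the saving as a percentage."""
    before = len(data)
    after = deflated_size(data)
    saving = 100.0 * (before - after) / before if before else 0.0
    print(f"{name}: {before} -> {after} bytes ({saving:.1f}% smaller)")

# Tiny synthetic demo: a buffer of zeros is about as compressible as data gets.
report("zeros", bytes(100_000))
```

For a real file you would pass the file contents instead, for example `report("Uncompressed.arw", open("Uncompressed.arw", "rb").read())`, where the path is just a placeholder.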

Hmm. 40MB is still not very nice. In theory the sensor has [probably] 12 bits of precision (TODO: check), so 4 bits out of every 16-bit pixel value carry no information. That would make 25% of the file zero bits, so zip's 20% reduction is barely mopping up the obvious squishiness in the data.
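One way to tick off that TODO is to OR all the sample values together and look at the highest bit that is ever set. The sketch below assumes the sensor data has already been loaded into a NumPy `uint16` array (locating it inside the ARW is not shown) and uses synthetic 12-bit values for the demo.

```python
import numpy as np

def bits_in_use(samples: np.ndarray) -> int:
    """Return the position of the highest bit set in any sample (0 if none)."""
    combined = int(np.bitwise_or.reduce(samples.ravel()))
    return combined.bit_length()

# Synthetic stand-in for sensor data: values limited to 12 bits.
rng = np.random.default_rng(0)
demo = rng.integers(0, 1 << 12, size=(8, 8), dtype=np.uint16)
demo[0, 0] = 0x0FFF  # make sure the top of the 12-bit range is present

used = bits_in_use(demo)
padding = (16 - used) / 16
print(f"{used} bits in use, {padding:.0%} of each 16-bit value is padding")
```

If a real file reports 14 rather than 12, the always-zero share drops from 25% to 12.5%, and the estimate above needs adjusting accordingly.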

The Sony lossy-compressed file already uses delta compression, so its entropy is much higher to begin with and further zip compression doesn't gain much.

Considering how "squishy" the uncompressed RAW files are, zip clearly isn't compressing them enough. A common way to make a file more compressible is to rearrange the input data before zip sees it: the information stays exactly the same, but similar values end up next to each other, so the stream looks more predictable (lower entropy) to the compressor.

First we'll try a channel deinterleave. The idea is that a green pixel is more likely to be close in value to nearby green pixels than to, say, the neighbouring red ones. Currently zip sees a stream like G R G R G R for one row, then B G B G B G for the next, and we want to change it to GGGG RRRR BBBB so that neighbouring values are more correlated. This definitely helps.

Deinterleaved image. Each colour channel is in its own separate stream.

Compressed size: 37MB. That's 7.5% smaller than the 40MB we got from compressing the unmodified file.
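A minimal sketch of the deinterleave step, assuming the mosaic is held in a 2-D NumPy array with even dimensions and the G R / B G row order described above. This is an illustration of the idea, not the actual tool; the function name is mine.

```python
import numpy as np

def deinterleave_channels(mosaic: np.ndarray) -> np.ndarray:
    """Reorder a G R / B G mosaic into one stream: all G, then all R, then all B."""
    green_even = mosaic[0::2, 0::2]  # G sites on even rows
    red = mosaic[0::2, 1::2]         # R sites on even rows
    blue = mosaic[1::2, 0::2]        # B sites on odd rows
    green_odd = mosaic[1::2, 1::2]   # G sites on odd rows
    return np.concatenate([
        green_even.ravel(), green_odd.ravel(),  # GGGG...
        red.ravel(),                            # RRRR...
        blue.ravel(),                           # BBBB...
    ])

# 4x4 demo where each value is simply its position in the original stream.
demo = np.arange(16, dtype=np.uint16).reshape(4, 4)
planar = deinterleave_channels(demo)
print(planar)
```

The output has exactly as many samples as the input; only their order changes, which is why the step is lossless.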

Second, we split each 16-bit value into two bytes. The bottom byte is very noisy, but the top byte changes slowly, as it should. Introducing some notation, G16 = G_L + (G_H << 8), we want the stream to look like G_L G_L G_L G_L followed by G_H G_H G_H G_H, then R_L R_L R_L R_L followed by R_H R_H R_H R_H, and the same for the blue channel.

Deinterleaved channels and low/high bytes. Note how the high bytes vary very slowly and therefore will compress much better.

Compressed size: 27MB. That's 32.5% smaller than the zip of the unmodified original file, and a 46% reduction from the original file size.
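The byte split can be sketched like this, using the G16 = G_L + (G_H << 8) notation from above. As before, this assumes a 2-D `uint16` NumPy mosaic with even dimensions in G R / B G order; `split_low_high` and `swizzle` are names I made up for the illustration.

```python
import numpy as np

def split_low_high(stream) -> bytes:
    """Emit all low bytes of a 16-bit stream, followed by all high bytes."""
    stream = np.asarray(stream, dtype=np.uint16)
    low = (stream & 0xFF).astype(np.uint8)  # X_L: the noisy bottom byte
    high = (stream >> 8).astype(np.uint8)   # X_H: the slowly varying top byte
    return low.tobytes() + high.tobytes()

def swizzle(mosaic: np.ndarray) -> bytes:
    """Deinterleave the channels, then split each channel into low/high bytes."""
    green = np.concatenate([mosaic[0::2, 0::2].ravel(),
                            mosaic[1::2, 1::2].ravel()])
    red = mosaic[0::2, 1::2].ravel()
    blue = mosaic[1::2, 0::2].ravel()
    # Layout: G_L.. G_H.. R_L.. R_H.. B_L.. B_H..
    return b"".join(split_low_high(channel) for channel in (green, red, blue))

# One 2x2 tile:  G=0x0102 R=0x0304 / B=0x0506 G=0x0708
tile = np.array([[0x0102, 0x0304],
                 [0x0506, 0x0708]], dtype=np.uint16)
print(swizzle(tile).hex(" "))
```

The output is the same length as the input, two bytes per sample, so nothing is added or dropped.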

| File                      | Size before zip | Size with zip level 9 |
| Swizzled_Uncompressed.arw | 50MB            | 27MB                  |
| Compressed.arw            | 24MB            | 23MB                  |


Now we're talking. We haven't changed the number of bytes stored or lost any data whatsoever; we just rearranged it, and now zip can compress so much better. In fact the zipped file is now almost the same size as the file produced by Sony's lossy delta compression, while retaining all of the data losslessly! For backup purposes it makes a huge difference.
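Since the whole point is that nothing is lost, it is worth checking the round trip. The sketch below swizzles a synthetic mosaic, compresses it with `zlib` at level 9 (a stand-in for zip), then decompresses and restores it, and verifies the result is bit-identical. It assumes even dimensions and the G R / B G layout; the function names and the synthetic test image are mine, not the real tool or a real ARW.

```python
import zlib
import numpy as np

def swizzle(mosaic: np.ndarray) -> bytes:
    """G_L.. G_H.. R_L.. R_H.. B_L.. B_H.. for a G R / B G uint16 mosaic."""
    green = np.concatenate([mosaic[0::2, 0::2].ravel(), mosaic[1::2, 1::2].ravel()])
    channels = (green, mosaic[0::2, 1::2].ravel(), mosaic[1::2, 0::2].ravel())
    parts = []
    for channel in channels:
        parts.append((channel & 0xFF).astype(np.uint8).tobytes())  # low bytes
        parts.append((channel >> 8).astype(np.uint8).tobytes())    # high bytes
    return b"".join(parts)

def _join(raw: np.ndarray, start: int, count: int) -> np.ndarray:
    """Rebuild `count` 16-bit values from low bytes followed by high bytes."""
    low = raw[start:start + count].astype(np.uint16)
    high = raw[start + count:start + 2 * count].astype(np.uint16)
    return low | (high << 8)

def unswizzle(data: bytes, height: int, width: int) -> np.ndarray:
    """Inverse of swizzle(): put every sample back at its mosaic position."""
    raw = np.frombuffer(data, dtype=np.uint8)
    n = (height // 2) * (width // 2)       # samples per red or blue plane
    plane = (height // 2, width // 2)
    green = _join(raw, 0, 2 * n)           # green has two sites per tile
    red = _join(raw, 4 * n, n)
    blue = _join(raw, 6 * n, n)
    mosaic = np.empty((height, width), dtype=np.uint16)
    mosaic[0::2, 0::2] = green[:n].reshape(plane)
    mosaic[1::2, 1::2] = green[n:].reshape(plane)
    mosaic[0::2, 1::2] = red.reshape(plane)
    mosaic[1::2, 0::2] = blue.reshape(plane)
    return mosaic

# Synthetic mosaic: a smooth horizontal gradient plus a little noise.
rng = np.random.default_rng(0)
height, width = 64, 96
gradient = np.linspace(0, 4000, width)[np.newaxis, :]
noise = rng.integers(0, 16, size=(height, width))
mosaic = (gradient + noise).astype(np.uint16)

packed = zlib.compress(swizzle(mosaic), 9)
restored = unswizzle(zlib.decompress(packed), height, width)
print("round trip identical:", np.array_equal(mosaic, restored))
print("plain:", len(zlib.compress(mosaic.tobytes(), 9)), "swizzled:", len(packed))
```

The sizes printed for synthetic data say nothing about real photographs; the equality check is the part that matters here.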

Conclusion
For me now, backing files up means running a tool that I wrote and zipping up the result afterwards. It is not convenient for retrieval, but it does save money on cloud storage. It is still possible to preview the thumbnail image inside the ARW file, but before a RAW processing tool can work on the image, the internal data has to be restored from its de-interleaved form.

But ultimately, Sony should pay more attention to their raw file formats and implement something that doesn't just dump the sensor data into the file, which is the laziest thing anybody can do.

Please Sony, make an effort in that area! It's not rocket science. I would start by deinterleaving the RGB channels, then separating the K noisy lower bits from the N smooth high bits, RLE-ing the high bits, and seeing what happens.
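For what it's worth, here is one possible reading of that suggestion as a sketch: split each sample of a single deinterleaved channel at bit K, then run-length encode the high part. Nothing here is measured on real files or taken from any Sony format; the value of K, the function names, and the sample values are all made up for illustration.

```python
import numpy as np

def split_bits(stream: np.ndarray, k: int):
    """Split each sample into its k low bits and the remaining high bits."""
    low = stream & ((1 << k) - 1)  # the K noisy lower bits
    high = stream >> k             # the N smooth high bits
    return low, high

def run_length_encode(values: np.ndarray):
    """Return (run_values, run_lengths) for a 1-D array."""
    if values.size == 0:
        return values[:0], np.zeros(0, dtype=np.int64)
    # Indices where a value differs from its predecessor start a new run.
    changes = np.flatnonzero(values[1:] != values[:-1]) + 1
    starts = np.concatenate(([0], changes))
    lengths = np.diff(np.concatenate((starts, [values.size])))
    return values[starts], lengths

# Four neighbouring samples from one channel (hypothetical values), K = 4.
channel = np.array([0x0123, 0x0127, 0x012F, 0x0210], dtype=np.uint16)
low, high = split_bits(channel, 4)
run_values, run_lengths = run_length_encode(high)
print("low bits:", low.tolist())
print("high runs:", list(zip(run_values.tolist(), run_lengths.tolist())))
```

The low bits would still need to be stored as they are (or handed to a general-purpose compressor), so whether this beats the byte split plus zip is exactly the "see what happens" part.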
