Identifying/decomposing raw formats

I don't have any prior experience with LibRaw, but after quickly skimming the documentation and header files, I didn't find anything that looks like the kind of feature I'm looking for. Maybe I'm wrong, though!

I'd like to be able to do the following:

- Identify that a file is a raw file, as quickly as possible.
- If it is a raw file, figure out if it is using a compressed sample representation or not.
- If it is not using a compressed representation, determine where (i.e. at which byte offset) in the raw file the pixel data starts, where it ends, and what the format of the pixels is (e.g. 16-bit little-endian integers, no padding, RGGB or something similar). I'm naively assuming that most raw formats store the pixel data in a contiguous block.
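The three requirements above map onto a small descriptor plus a sanity check. This is a minimal sketch, assuming an uncompressed, unpadded, contiguous pixel block; `RawPixelLayout`, `expected_block_size` and `is_contiguous_block` are hypothetical names for illustration, not part of the LibRaw API.

```cpp
#include <cstdint>
#include <string>

// Hypothetical descriptor for the information asked for in the list above.
// Nothing here comes from LibRaw; it only illustrates the shape of the answer.
struct RawPixelLayout {
    bool is_raw = false;              // file was recognized as a raw file
    bool compressed = false;          // samples use a compressed representation
    std::uint64_t data_offset = 0;    // byte offset where the pixel data starts
    std::uint64_t data_size = 0;      // number of bytes of pixel data
    unsigned bits_per_sample = 16;    // storage width of one sample
    bool little_endian = true;        // byte order of multi-byte samples
    std::string cfa_pattern = "RGGB"; // colour filter array layout
};

// Expected size of an unpadded block of width x height samples, assuming each
// sample is stored in a whole number of bytes (e.g. 12- or 14-bit values in
// 16-bit containers). Bit-packed formats would need a different formula.
std::uint64_t expected_block_size(std::uint64_t width, std::uint64_t height,
                                  unsigned bits_per_sample) {
    const std::uint64_t bytes_per_sample = (bits_per_sample + 7) / 8;
    return width * height * bytes_per_sample;
}

// The pixel data can only be handled as one contiguous block if it is
// uncompressed, non-empty and lies entirely inside the file.
bool is_contiguous_block(const RawPixelLayout& l, std::uint64_t file_size) {
    return l.is_raw && !l.compressed && l.data_size > 0 &&
           l.data_offset <= file_size &&
           l.data_size <= file_size - l.data_offset;
}
```

Comparing `data_size` against `expected_block_size()` for the image dimensions is one cheap way to catch a wrong assumption about padding or bit depth before treating the block as plain samples.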

I'm pretty certain that all of this information is available somewhere inside LibRaw. The question is whether there is already an interface that could provide it and, if not, whether adding one would be worthwhile without turning it into a maintenance nightmare.

The reason I'm asking is that I'd like to add transparent compression/decompression support for uncompressed raw files to my compressed read-only file system [1]. That is, while the data is compressed in the file system image, each file will be byte-for-byte identical to the original raw file when the file system is mounted. I've implemented this type of transparent compression/decompression for raw audio formats as well as for the FITS image format that is popular in astrophotography. The compression ratio is usually around 50%, and compression/decompression is so fast that, when accessing a mounted image over a 1 Gbps network, you can achieve effective read speeds of up to 2 Gbps. I'm already using this for archiving my astrophotography images, and I would like to use it to archive my uncompressed raw images as well. I've done a few basic tests by converting raw images to FITS, and they seem to compress well with the algorithm I'm using.
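The byte-for-byte guarantee is easy to reason about if each file is treated as three segments: everything before the pixel block, the pixel block itself, and everything after it. Only the middle segment would go through the sample-aware compressor; the other two would be stored verbatim. A minimal sketch, assuming the offset and size are already known; `split_raw_file` and `join_raw_file` are hypothetical helpers, not DwarFS or LibRaw functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

struct RawSegments {
    Bytes header;   // bytes before the pixel data (stored verbatim)
    Bytes pixels;   // pixel block (candidate for sample-aware compression)
    Bytes trailer;  // bytes after the pixel data (stored verbatim)
};

// Split a file image into three segments around [offset, offset + size).
RawSegments split_raw_file(const Bytes& file, std::size_t offset, std::size_t size) {
    if (offset > file.size() || size > file.size() - offset) {
        throw std::out_of_range("pixel block lies outside the file");
    }
    const auto first = file.begin() + static_cast<std::ptrdiff_t>(offset);
    const auto last = first + static_cast<std::ptrdiff_t>(size);
    return RawSegments{Bytes(file.begin(), first), Bytes(first, last),
                       Bytes(last, file.end())};
}

// Concatenate the segments again; the result must equal the original file.
Bytes join_raw_file(const RawSegments& s) {
    Bytes out;
    out.reserve(s.header.size() + s.pixels.size() + s.trailer.size());
    out.insert(out.end(), s.header.begin(), s.header.end());
    out.insert(out.end(), s.pixels.begin(), s.pixels.end());
    out.insert(out.end(), s.trailer.begin(), s.trailer.end());
    return out;
}
```

Because the header and trailer never pass through a lossy or format-specific step, correctness reduces to the pixel codec round-tripping its own input exactly.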

Any feedback would be really appreciated!

[1] https://github.com/mhx/dwarfs



Unfortunately, the information you are asking about, in the form you are asking for, is not directly available in the library.

You can query the name of the decoder (function) used by calling LibRaw::get_decoder_info() and build your own table, indexed by decoder name, that contains the properties you need.
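That suggestion can be sketched as a lookup table keyed by the decoder name that `LibRaw::get_decoder_info()` stores in the `decoder_name` field of its `libraw_decoder_info_t` argument. The entries below are illustrative assumptions only: the exact name strings, and whether a given decoder really reads plain uncompressed samples, must be checked against the LibRaw version in use. `DecoderTraits` and `lookup_decoder` are hypothetical names.

```cpp
#include <map>
#include <optional>
#include <string>

// Properties of interest for a given LibRaw decoder (hypothetical).
struct DecoderTraits {
    bool compressed;  // samples are entropy-coded / compressed
    bool bit_packed;  // uncompressed, but samples are packed across byte boundaries
};

// Illustrative entries only; verify every name and property against the
// LibRaw sources before relying on them.
const std::map<std::string, DecoderTraits>& decoder_table() {
    static const std::map<std::string, DecoderTraits> table = {
        {"unpacked_load_raw()", {false, false}},
        {"packed_load_raw()", {false, true}},
        {"lossless_jpeg_load_raw()", {true, false}},
    };
    return table;
}

// Returns the traits for a decoder name, or std::nullopt for unknown decoders
// (such files would simply be stored without raw-specific handling).
std::optional<DecoderTraits> lookup_decoder(const std::string& name) {
    const auto& table = decoder_table();
    const auto it = table.find(name);
    if (it == table.end()) return std::nullopt;
    return it->second;
}
```

Falling back to "unknown" rather than guessing keeps the file system safe: an unrecognized decoder just means the file is compressed as opaque bytes.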

In practice, there are very few uncompressed formats left: photographers prefer to save space on flash cards, and camera manufacturers follow this wish. Such uncompressed files may still be present in archives, but it is hard to find an uncompressed file produced by a modern camera (even if the camera supports it, the photographer will most likely switch to a compressed format).

If you still want to handle uncompressed formats separately, note that the data can be tiled or striped, so you'll need to add your own code that interprets this layout correctly.

For uncompressed, non-tiled data, the data starts at libraw_internal_data.unpacker_data.data_offset and its size is libraw_internal_data.unpacker_data.data_size.
These fields are protected (in C++ terms), so you may need to subclass LibRaw to access them.

-- Alex Tutubalin @LibRaw LLC

Hi Alex,

Thank you so much for the quick reply and the helpful information!

I still shoot uncompressed raw on my old Sony cameras as the compressed raw format isn't lossless. I do convert everything to DNG eventually as the format I'm working with, but I do like to archive the original out-of-camera files, probably out of paranoia. :)

I'll take a look at the fields you've mentioned and see if I can build something on top of that!

Cheers,
Marcus