Two Paths Leading Nowhere
Over the last 10 to 15 years, digital photography has pushed film out of nearly every domain. End users have bought hundreds of millions of digital cameras, not counting the cameras built into mobile phones. An industry this large cannot exist without standards, and such standards do appear to exist. They cover the storage media (various flash cards) and the image format, which happens to be JPEG. JPEG is currently the most widely used image format, and its image quality and file size satisfy the overwhelming majority of users.
However, JPEG is not always what professionals want. By professionals we do not mean just professional photographers: the list includes designers, pre-press staff, archivists, photo banks and many others. Advanced amateurs often find JPEG less than adequate too. That is why nearly all digital cameras that manufacturers position as professional or semi-professional models (as well as all current dSLR cameras) offer an alternative format, the so-called RAW. To a casual onlooker it may appear that RAW is also some kind of standard format, one that delivers better quality for pros.
This short article aims to show that the matter is much more complicated. At present the situation with the RAW format is not just bad but truly dreadful, and it continues to deteriorate rapidly. This mostly affects professionals, while less demanding amateurs simply enjoy the progress of digital photography.
RAW and JPEG: what is the difference?
When a photo is saved as JPEG, all the processing is done in the camera before the image file is written to the storage media. By contrast, when we shoot in RAW mode, the recorded data is pretty much the unchanged data read from the camera sensor.
In the terms of classical photography, JPEG is pretty much a finished product, like an exposed Polaroid print, while RAW is more like the latent, hidden image on a film before it is developed. Such development of RAW files is often called RAW conversion, and the programs that perform it are often referred to as RAW converters. However, there is a huge difference between undeveloped film and RAW data. Film can be processed only once. RAW data can be processed as many times as necessary, using different RAW converters and experimenting with settings until the desired result is achieved. Of course, in order to record an image in JPEG format the camera itself performs the necessary RAW conversion; the RAW converter is part of the camera's built-in intelligence. If the in-camera conversion turns out to be less than satisfactory because of poor contrast, bad colour, plugged shadows, blown-out highlights, moiré or any other reason, it is very difficult and often impossible to recover the image. This is because the image, once converted to an 8-bit bitmap suitable for viewing, has already been stripped of much of the image data that the camera originally captured in RAW. That information is lost beyond recovery. On top of that, saving the image in a lossy format such as JPEG further complicates adjusting it during post-processing.
When data is recorded in RAW format, normally no digital processing is applied. The conversion of RAW data for viewing and printing is performed after the shoot, on a powerful computer with a decent monitor. This makes it possible to employ much more complex RAW conversion algorithms, to monitor the conversion process, and to verify the results visually. As a rule, this yields substantially better image quality; moreover, the conversion parameters can be tuned over a wide range without running into posterization, excessive noise and other artefacts, that is, without destroying the image.
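The loss of editing headroom can be illustrated with a toy calculation. The sketch below assumes 12-bit linear sensor values and a plain 2.2 gamma for the 8-bit rendering (both are illustrative assumptions, not the behaviour of any particular camera) and ignores JPEG compression entirely. It counts how many distinct shadow tones survive a two-stop brightening applied either to the original data or to the already converted 8-bit image.

```python
GAMMA = 2.2      # assumed display gamma, for illustration only
GAIN = 4.0       # two-stop brightening of the shadows
MAX_12BIT = 4095


def to_8bit(linear):
    """Map a linear value in [0.0, 1.0] to a gamma-encoded 8-bit level."""
    clipped = min(max(linear, 0.0), 1.0)
    return round(255 * clipped ** (1 / GAMMA))


def from_8bit(level):
    """Map a gamma-encoded 8-bit level back to a linear value."""
    return (level / 255) ** GAMMA


# The darkest 256 codes of a hypothetical 12-bit sensor, as linear values.
shadows = [code / MAX_12BIT for code in range(256)]

# RAW-style workflow: brighten the original data, then quantize to 8 bits.
raw_path = {to_8bit(value * GAIN) for value in shadows}

# In-camera-style workflow: quantize to 8 bits first, brighten afterwards.
baked_path = {to_8bit(from_8bit(to_8bit(value)) * GAIN) for value in shadows}

raw_levels = len(raw_path)
baked_levels = len(baked_path)
print("distinct tones, brightened before 8-bit conversion:", raw_levels)
print("distinct tones, brightened after 8-bit conversion: ", baked_levels)
```

The second count comes out lower: tones that were merged by the 8-bit conversion cannot be separated again, which is what appears as posterization when shadows are lifted in an already converted image.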
Processing RAW files demands certain additional skills and habits, as well as additional time. This is why the RAW format is used by a minority of photographers, mostly advanced amateurs and professionals, and mostly when the quality of the result is more important than the speed of getting it, or when the shooting conditions rule out JPEG because of its limited ability to preserve dynamic range compared to RAW. Another important case for RAW is when the customer explicitly demands it. Quite often, RAW is used as a kind of safety net. It should be mentioned, however, that the minority we refer to amounts to millions of users around the world.
The benefits of the RAW format are largely overshadowed by the absence of well-thought-out standards for it. Because of this we have to deal with tens (or even hundreds, depending on how you count) of varieties of this format.
Data formats and compatibility
Photography is one of those industries where much depends on transparent, unambiguous one-to-one data exchange between the players. Sometimes these expectations are met, especially when well-established formats such as JPEG, TIFF, EPS and others are used in a consistent workflow. But that is definitely not the case with RAW. Take, for instance, a photographer who has completed a shoot in RAW, browsed through the results and selected the keepers. Obviously this photographer uses the RAW converter of his or her choice to evaluate the images. Unfortunately, the selected RAW files cannot be blindly submitted to the customer or to the pre-press bureau. The other party may be using a different RAW converter, or a different version of the same converter; hence the results of their conversion may differ to the point of being unacceptable, even if the photographer submits the conversion parameters along with the RAW file.
As of today there is no RAW standard recognized by all the parties involved. Historically, the RAW format is defined by the camera's manufacturer. In most cases some extension of the TIFF format is used, and every camera manufacturer adds its own extensions. Some manufacturers use several incompatible RAW formats. Sometimes it goes so far that a single camera can record data in several incompatible formats.
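What "an extension of TIFF" means in practice can be sketched in a few lines. The example below builds a tiny little-endian TIFF directory in memory and walks it. The tags 256 and 257 (ImageWidth and ImageLength) are standard TIFF tags; tag 65000 and its four bytes are invented here purely to stand in for an undocumented vendor field, and the sample is not a valid image file.

```python
import struct

# Standard TIFF tags this toy reader knows how to interpret.
KNOWN_TAGS = {256: "ImageWidth", 257: "ImageLength"}

TYPE_SHORT = 3      # TIFF field type: 16-bit unsigned integer
TYPE_UNDEFINED = 7  # TIFF field type: opaque bytes


def build_sample_tiff():
    """Build a header plus one directory; tag 65000 is a made-up vendor tag."""
    entries = [
        (256, TYPE_SHORT, 1, struct.pack("<HH", 4000, 0)),
        (257, TYPE_SHORT, 1, struct.pack("<HH", 3000, 0)),
        (65000, TYPE_UNDEFINED, 4, b"\x12\x34\x56\x78"),
    ]
    header = struct.pack("<2sHI", b"II", 42, 8)  # byte order, magic, IFD offset
    ifd = struct.pack("<H", len(entries))
    for tag, field_type, count, value in entries:
        ifd += struct.pack("<HHI4s", tag, field_type, count, value)
    ifd += struct.pack("<I", 0)  # offset of next directory: none
    return header + ifd


def read_ifd(data):
    """Return (tag, name, value) for each entry of the first directory."""
    order, magic, offset = struct.unpack_from("<2sHI", data, 0)
    if order != b"II" or magic != 42:
        raise ValueError("not a little-endian TIFF")
    (entry_count,) = struct.unpack_from("<H", data, offset)
    result = []
    for index in range(entry_count):
        tag, field_type, count, raw = struct.unpack_from(
            "<HHI4s", data, offset + 2 + 12 * index
        )
        name = KNOWN_TAGS.get(tag)
        if name is not None and field_type == TYPE_SHORT and count == 1:
            value = struct.unpack("<H", raw[:2])[0]
        else:
            value = raw  # readable bytes, but their meaning is unknown
        result.append((tag, name, value))
    return result


entries = read_ifd(build_sample_tiff())
for tag, name, value in entries:
    print(tag, name or "(undocumented)", value)
```

The container itself is easy to read; the difficulty lies entirely in the entries for which no public definition exists.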
Conveniently for the manufacturers, the formats are not documented publicly. Some manufacturers take additional measures to protect data; for example, Nikon encrypts some data fields, and this is not the only case of data concealment.
The manufacturers do explain why they prefer not to disclose their RAW formats. Usually their explanations boil down to the following simple reasons:
- Reserved and new data fields can reveal trade secrets to competitors.
- Quite often data fields are added "just in case": if the camera design makes some information available, it is preserved because it may later be used to improve RAW conversion. Such fields include, for example, diagnostic and service fields. To document all these fields means to take responsibility for their contents and to maintain their presence in future firmware releases and even in future cameras.
- To open a format means to trigger unnecessary public discussion. Let's say users can access the field that records the focusing distance. The immediate reaction of certain users will be to grab a ruler, check the focusing accuracy, and use the result to back their claims that the camera doesn't focus correctly. In the recent case of the out-of-focus problems with the 1D MkIII, the reputational damage and financial losses to Canon could have been much worse had the respective field in the RAW data been officially documented. And of course such claims would not be limited to Canon.
- Some camera manufacturers try to earn additional money from RAW converters. Not long ago native RAW converters had no competition at all and effectively monopolized the market. Naturally, the manufacturers prefer to keep competition to a minimum. It is not unusual to hear from camera manufacturers that only their own converters do justice to their cameras, while third-party converters compromise them by decreasing image quality, distorting colours, sacrificing resolution and, on top of that, adding noise.
Manufacturers sometimes claim that the encoding of data fields is done in the best interest of the users, and that only such encoding makes it possible to ensure data integrity and to prove the authenticity and authorship of the original shot. Those of our readers who are familiar with the modern state of cryptography will surely smile here. It looks like the manufacturers are not overly concerned with providing us with convincing and satisfactory arguments.
All parties other than the manufacturers are genuinely interested in open formats, as well as in reducing the current multitude of formats.
- Photographers want to process images taken with different cameras through only one or two standard workflows (as was the case with film). Photographers benefit from competition between the developers of image processing programs: with competition there is hope for better processing quality and lower prices. Another important consideration for photographers is interoperability between programs, which would allow them to use the best features of each.
- Photo labs today can't accept RAW files for batch processing and printing at all. In the absence of standard processing routines, printing from RAW can be done only manually, with an absolute minimum of automation.
- Program developers also suffer from the multitude of formats and the lack of documentation. They spend an awful lot of time studying alien formats and decoding the meanings of the fields. Such a waste of time and labour could easily be avoided with a little help and good will on the part of the manufacturers. In the absence of such support from the camera makers, new cameras remain unsupported by third-party products for many months.
- Archivists in the broad sense of the word, which includes photo banks, advertising agencies and individual photographers, can't sleep well. The pandemonium of formats gives them nightmares. They need to store not only the RAW images but also the programs that can open those files, the manuals for those programs, their own notes on using the programs, and the sequences of user actions needed to set the processing parameters that render the necessary result. Today we know of at least one case where compatibility between versions was lost: processing parameters set in an older version of the program are ignored by a newer version, and sometimes setting those parameters in the older version can even crash the newer one. This is because the RAW format was changed between the two versions. We mean Nikon Capture here. Maybe this is not the only case; we did not run a special investigation of this problem, because even one case is more than enough.
The speed of development of digital cameras doesn't allow the creation of a universal data format that would last forever. However, the chaos that exists today is not inevitable. One of the reasons for the current situation is the absence of clear, broadly acceptable attempts to introduce standards. The photo industry has always enjoyed diversity, sometimes even too much of it. New film formats emerged and vanished (828, 110, 126, APS, disk film); various film processes fell into decay or sank into oblivion (Polacolor, C-22, K-14, E-4). Many are not aware of the reasons for such excessive diversity (which, by the way, was caused not only by economic or technological factors: to a certain extent its raison d'être was simply an attempt to lock the consumer in and to earn additional revenue from selling the rights to use the format or the process to other manufacturers). But pretty much everybody knows the results: archives of images made in ill-fated formats can be maintained only by professional archivists, not by individuals or small companies. It would be horrible if today's archives of digital photos followed the same pattern, all the more so given that the number of photos captured during the last 10 years nearly equals the number captured during the previous 30 years.
Data, metadata and meanings
One of the most important achievements of modern photography is storing not only the image data but metadata as well.
- Data is the image captured by the camera, that is, information about the level of brightness for each sensel. If the image is recorded in RAW format, very little is done to this data (usually only normalization and some noise reduction are applied). If the image is recorded as JPEG, the data is processed through multiple colour and tone corrections, and is also sharpened.
- Metadata is information about the image. It contains the exposure parameters, the time the shot was taken, possibly the geographical location of the shot, information on lighting conditions (white balance), the make, model and serial number of the camera, information about the lens, and so on. The description of the data format (parametadata) should not only include details of the method used to store the image data (bit width, compression scheme, etc.) but also define the metadata in full detail.
Parametadata (that is, the data format of a RAW file) is exactly what gives meaning to the stored bit sequences ("this field contains the focal distance expressed in tenths of an inch"). Since manufacturers do not document their formats, the quest for the meaning of the bits is left to hackers (in the good sense of the word), who use various methods to work out their own definitions of the formats (we will talk about this below).
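A small illustration of why the format description matters: the same four bytes yield entirely different values under two different field definitions. Both definitions below are hypothetical and invented for this example; neither describes a real camera format.

```python
import struct

# Four bytes as they might sit in an undocumented metadata field.
raw_field = bytes([0x2C, 0x01, 0x00, 0x00])

# Hypothetical definition A: little-endian 16-bit integer holding a distance
# in tenths of an inch, followed by two unused bytes.
tenths_of_inch = struct.unpack("<H2x", raw_field)[0]
distance_a_m = tenths_of_inch / 10 * 0.0254  # inches to metres

# Hypothetical definition B: big-endian 32-bit integer holding millimetres.
millimetres = struct.unpack(">I", raw_field)[0]
distance_b_m = millimetres / 1000

print("definition A:", distance_a_m, "m")
print("definition B:", distance_b_m, "m")
```

Without the definition, nothing in the bytes themselves says which reading is right; a hacker can only guess and check the guess against many sample files.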
For the sake of the discussion that follows, let's divide data and metadata into the following groups:
- Those that are necessary to get a good-quality image from a RAW file: the manufacturer, the camera model, the light sensitivity used for the shot, the image size, white balance data, whether flash was used, some other critical parameters and, of course, the image data itself, that is, the brightness map registered by the sensor.
- Those that might be used while processing a RAW file: the camera's presets (contrast, saturation, tone curve, sharpening, colour space), optics, and focusing parameters.
- Those that are not necessary for processing but are useful for presentation, cataloguing and search, such as date and time, GPS coordinates, author, description of the photo, etc.
One cannot say that no standard for metadata exists: there is the EXIF standard, and most camera manufacturers follow it. But EXIF, which was originally established to support finished, viewable images, describes the fields needed for cataloguing (the third group in our classification) and provides no help to the developers of RAW processing software.
Nor is it true to say that data and metadata are completely undocumented. They are documented, but in the spirit of the grim joke that 'the FreeBSD kernel is very well documented; unfortunately, it is all in C'.
- Data and part of the metadata are documented in the well-known program dcraw by Dave Coffin, which currently supports (that is, is able to unpack) the formats of 312 digital cameras.
- Metadata is documented in the ExifTool program by Phil Harvey. This program deals with a much broader spectrum of information than just EXIF. It also recognizes and deciphers a number of utility fields, among them those that some converters add to a RAW file when the file is not straight out of the camera but has been modified and saved in the RAW converter.
It is interesting to note that the source code of ExifTool is nearly a whole order of magnitude larger than that of dcraw (75 thousand lines against 8 thousand). This proportion quite adequately reflects the ratio of the effort required to decipher metadata to the effort required to decipher data: metadata is much more diverse.
Of course this documentation is not enough. Despite all the hackers' efforts, mistakes happen and the descriptions are far from complete. Sometimes the correct deciphering of a data field of some camera becomes possible only after that camera has been discontinued. As a result, even developers of RAW processing software can't declare with any degree of certainty that they do everything correctly. A funny consequence is that any attempt to compare the quality of RAW processors based on one or two examples is completely meaningless.
Revolutionary situation in digital photography
Given all of the above, a revolutionary situation is emerging in the digital photography industry, in full accordance with the definition given by none other than Vladimir Lenin himself:
Photographers (and the whole industry that uses the results of their work) cannot live in the old way: the variety of undocumented formats suits nobody, especially given that the number of new format modifications is growing nearly exponentially.
Camera manufacturers cannot rule in the old way: despite all their efforts, including (and mainly) their attempts to conceal information, converters produced by independent developers prevail in number of users and sometimes even deliver higher-quality results than native converters.
It is known that the development of a revolutionary situation into a revolution depends on the existence of a party that is ready and able to lead the struggle.
And here a reasonable question arises: how does anything work at all in such chaos?
Developers mainly use two approaches, each of which reduces the entropy a little:
- Some programs support only a very limited number of data formats, thereby dramatically simplifying the problem.
- If the author of a program declares that it supports the majority of data formats, it most probably means that he is using the dcraw source code, either as a ready-made solution or as documentation. Among those who use this approach is such a major player as Adobe. It is nothing short of amazing that such a huge industry largely depends on just one person and the 8 thousand lines of code he has written.
It is easy to see that neither method has any prospects, especially from a strategic point of view.
Adobe DNG
The DNG format was presented by Adobe in September 2004 as a universal "digital negative" format intended for eternal archival storage of data. The DNG 1.0 specification was poorly thought out, and within half a year Adobe presented the DNG 1.1 specification. The DNG SDK was released together with the description of the format. Unfortunately, this DNG SDK cannot be considered anything but a "run-around": easy-to-read documentation, useful examples and program templates are virtually absent.
Before we move on, let's check Adobe's claims: archival properties and universality.
Is it really archival?
Let's make a very simple experiment: we will try to imitate a situation that could have taken place 3 years ago. To do so, we take a source RAW file from a fairly old camera (Canon Powershot G6) and convert it into DNG with an old version of the Adobe converter. To check the archival properties, we convert both files, the source RAW and its derivative DNG, into a bitmapped RGB format with the same presets, using the current version of Adobe Camera Raw, v.5. Let's have a look at the difference between the results (photo 1). The visual difference between the conversions of the two files is small, and it is quite possible that it won't be noticeable when printed in a magazine. But straightforward subtraction shows that the difference exists.
It is rather difficult to consider a format archival when it does not yield a result identical to that of the source, given that the archival file and the source have been processed in the same way.
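The subtraction test itself is easy to reproduce on any pair of renderings. The following is a minimal sketch, assuming both conversions have already been decoded into rows of 8-bit (r, g, b) tuples; the four pixels used here are made-up toy values, not data from the experiment.

```python
def difference_stats(image_a, image_b):
    """Compare two same-sized RGB renderings given as rows of (r, g, b) tuples."""
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("images must have the same dimensions")
    max_diff = 0
    changed = 0
    total = 0
    for row_a, row_b in zip(image_a, image_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            # Largest per-channel deviation for this pixel.
            diff = max(abs(ca - cb) for ca, cb in zip(pixel_a, pixel_b))
            max_diff = max(max_diff, diff)
            if diff > 0:
                changed += 1
            total += 1
    fraction = changed / total if total else 0.0
    return {"max_diff": max_diff, "changed_fraction": fraction}


# Toy 2x2 renderings standing in for "converted from RAW" and "converted from DNG".
from_raw = [[(10, 20, 30), (200, 180, 160)], [(0, 0, 0), (255, 255, 255)]]
from_dng = [[(10, 21, 30), (198, 180, 160)], [(0, 0, 0), (255, 255, 255)]]

stats = difference_stats(from_raw, from_dng)
print(stats)
```

For a truly archival format both numbers would be zero; any non-zero result means the derivative file is not equivalent to the source under identical processing.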
Is it universal?
To check universality, let's perform the inverse operation: we take a shot with a current camera (Canon 1D Mark III), convert it into DNG using the current version of the DNG converter, and try to feed it to an old version of Adobe Camera Raw (ACR) that does not know this camera. This experiment is quite topical, and here is why: support for new cameras in Adobe Photoshop CS2 has been discontinued, but not everybody is ready to pay for an upgrade to Adobe Photoshop CS3 or CS4, given that these versions may not offer any distinct advantages to a particular user.
It turned out that version 2.4 of Camera Raw does not open the file at all, while versions 3.x do open it, but the results of converting the RAW and the derived DNG into RGB (photo 2) differ even more than in the previous experiment.
Shortcomings and evolution of DNG
The reason for both of the failures described above is that the DNG specification does not define enough metadata. At every stage of DNG's evolution, Adobe unpacks and standardizes only the metadata necessary to support its current conversion methods. All other metadata, even if stored, remains in its initial (undocumented) form. The level of metadata standardization in DNG is not sufficient for the stated goal (a universal archival format). Adobe is gradually modifying DNG, working out the meaning of more metadata and adding it to the specification, but that metadata was ignored by previous versions of the converters. If a file was converted by one of those previous versions, some data may have been lost forever (as was shown above).
The DNG 1.2 specification, released several months ago, contains some additional metadata fields for colour data, but since they are intended mainly to support Adobe products, they have been added in the form in which they are used by Camera Raw and Lightroom. This data has no relation to the source RAW formats and hence is artificial. Thus DNG is increasingly becoming the internal format of the company that developed it.
The DNG format does not help developers to support non-standard sensors (such as Foveon, or the Fuji Super CCD SR, which records 2 different images in one shot). Of course, it is not difficult to invent a way to store non-standard data, but such data in turn requires non-standard algorithms. Unfortunately, those are not accounted for by DNG.
At the same time, some manufacturers (Panasonic, Leica, Samsung) have started to use DNG as the output format of their cameras. This does not prevent them from recording undocumented metadata, since the DNG specification reserves a special place for such undocumented tags.
One can easily see DNG as just one more RAW format. In this sense DNG is a little better than all the others, since some fields are at least somehow documented. But it is absolutely impossible to use DNG as a "universal archival format", as the simple experiments described here show. Moreover, accepting DNG in its current state as the standard would implicitly impose Adobe's method of conversion as well.
OpenRaw
In 2005 the OpenRaw initiative emerged. In essence it boiled down to a call to camera manufacturers to publish the specifications of their respective RAW formats. This call was ignored altogether, despite the fact that well-respected people held polite and unhurried negotiations, in accordance with all the rules of Japanese etiquette, with very influential managers of the leading manufacturers of digital photo equipment.
However, suppose these manufacturers turned out to be kind enough to publish all their internal RAW recipes, even if only to the extent already known. Would it be of any help to the developers of RAW converters? In our opinion, not much, for (as is traditional) a couple of reasons:
- Programming the processing of all the data formats is a lot of work. Dave Coffin has been at it for more than 10 years, Phil Harvey for about 5 years. Of course, with the descriptions available there would be no need to hack any more, which would reduce the amount of work, but even the reduced volume is still exorbitantly high.
- In fact, what one needs is not a description of each and every bit of a format, but descriptions of the fields plus instructions on what to do with them. Unfortunately, the founders of OpenRaw did not even think of asking for such instructions.
There would be no reason to mention the OpenRaw initiative if not for the intense PR campaign held 3 years ago. The campaign was successful, and now many think (unfortunately, mistakenly) that there is such a format as OpenRaw.
Who is to blame and what is to be done?
All the problems of the photo industry mentioned above can be attributed, among other reasons, to the fact that the list of requirements for RAW data has never been seriously and openly discussed. As a result, in every particular case the subset of metadata interpreted by a converter reflects its developers' opinion on the correct way to process the data. One need not look far for an example: one of the authors of this article is guilty of exactly this. Even though he knows how to decipher and handle the camera tone curve that Nikon records in the metadata, he still thinks those curves are not very useful and consequently ignores them in his converter. The DNG format follows the same tendency: the data tags added in version 1.2 are intended primarily for use by Adobe programs.
The information industry has faced these problems many times already and has solved them by adopting standards for data formats. Before adoption, those standards were widely discussed by all parties, and when necessary they are revised, again in accordance with a standard procedure. For RAW formats, adopting a standard for metadata is a reasonable solution to the present situation. The standard should describe required and recommended metadata. Moreover, the encryption of required fields, that is, those fields that are critical for baseline raw conversion, ought to be explicitly prohibited. Yes, we need to agree upon what that "baseline" is, too.
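To make the proposal concrete, here is one way a conformance check against such a standard might look. This is only a sketch: no such standard exists, and the field names and the split into required and recommended sets are our own placeholders, loosely based on the grouping of metadata given earlier in this article.

```python
# Hypothetical field lists; a real standard would have to be agreed upon.
REQUIRED = {"make", "model", "iso", "image_width", "image_height", "white_balance"}
RECOMMENDED = {"tone_curve", "lens", "focus_distance"}


def check_metadata(fields):
    """Check a file's metadata, given as name -> {"value": ..., "encrypted": bool}.

    Returns (problems, warnings): problems break baseline conversion,
    warnings only note missing recommended fields.
    """
    problems = []
    for name in sorted(REQUIRED):
        entry = fields.get(name)
        if entry is None:
            problems.append(f"missing required field: {name}")
        elif entry.get("encrypted", False):
            problems.append(f"required field is encrypted: {name}")
    warnings = [
        f"missing recommended field: {name}"
        for name in sorted(RECOMMENDED)
        if name not in fields
    ]
    return problems, warnings


# Made-up sample: white balance is present but encrypted, ISO is absent.
sample = {
    "make": {"value": "ExampleCam", "encrypted": False},
    "model": {"value": "X1", "encrypted": False},
    "image_width": {"value": 4000, "encrypted": False},
    "image_height": {"value": 3000, "encrypted": False},
    "white_balance": {"value": b"\x00\x01", "encrypted": True},
    "lens": {"value": "50mm", "encrypted": False},
}

problems, warnings = check_metadata(sample)
for line in problems + warnings:
    print(line)
```

The point of the split is that a converter could refuse or flag a file with problems while still accepting one that only lacks recommended fields.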
We can discuss which tags should be included in the metadata, as well as the structure of the fields (that is where a standardizing committee comes in useful), though some of them are pretty obvious. One of these obvious fields, whose absence considerably slows down the work and leads to a significant waste of time and resources, is the description of the spectral characteristics of the camera sensor. These characteristics are well known to manufacturers and have sometimes even been published in sensor spec sheets, unfortunately in the form of curves of a rather general type instead of tables with exact values.
The absence of open information on the colour characteristics of a camera means that, to implement support for it, one needs either to perform a number of tiresome, expensive and sometimes very approximate tests, or to choose the colour visually, or to resort to data extracted from DNG. This takes time, and in the meantime the only converter available for a newly released camera is the one offered by the camera's manufacturer (sometimes as a separate purchase). This converter does not always (to put it mildly) suit users, sometimes because it does not provide the necessary quality and sometimes because it does not fit into the workflow adopted by a user or a big corporation.
For example, many design bureaus, photographic studios and pre-press bureaus use only Adobe suites. Support for new cameras in Adobe software may be delayed for months. As mentioned above, the push for DNG is to a considerable extent an attempt to impose Adobe's standard on the industry as the only one, and thus to remove the problem of delayed support for new cameras, a problem that extremely irritates Adobe users.
Meanwhile, DNG is not hopeless and is quite capable of becoming the basis for the standard. To achieve this, it would be sufficient to expand the list of metadata fields, to make optional some of the fields currently required only for Adobe conversion software to work, and to prohibit data encryption. The problem is that camera and software developers are extremely well aware of the history of the TIFF format and consequently are very cautious about Adobe initiatives. But it is still possible that authorizing some standards committee to bring DNG to an acceptable state might solve this problem.
As for the two approaches that exist today, insisting on opening all the tags of tens (or even hundreds) of data formats without describing the meaning of the tags (as suggested by OpenRaw), or imposing a single scanty format (DNG), neither serves the purpose of resolving the industry's problems. Both are in fact paths to nowhere.