
TI/A and Compression

Basically, we propose that TI/A recommend no compression. The reason is simplicity: uncompressed images are easier to render, and they are in wide use.

Based on the KOST survey (see below), we can identify that the following compression schemes are not in wide use:

2: CCITT 1D
3: Fax Group 3
6: old JPEG
7: JPEG
8: Adobe Deflate
9: JBIG bw
10: JBIG color
32773: PackBits
>32766: others, proprietary

There is a risk that files using these compression schemes will not be rendered in the future.

Compression 4 (Fax Group 4) is in wide use. Or at least, there are so many such files around that there will be readers around that can render them.

So while the first case (no compression) is not problematic, and the last (Fax Group 4) is not problematic because of its wide use, the others in between should be converted sooner or later.
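The triage above could be automated with a small script that reads the Compression tag (259) from a TIFF file and sorts the value into the groups named in the proposal. This is a minimal sketch under stated assumptions: it handles only classic (non-BigTIFF) files, looks only at the first image (IFD), and reports values the proposal does not mention (for example 5, LZW) as "unclassified". The names ACCEPTABLE, CONVERT, read_compression, classify and minimal_tiff are hypothetical helpers, not part of any existing tool.

```python
import struct

# Compression tag values as grouped in the proposal (hypothetical names).
ACCEPTABLE = {1: "none", 4: "Fax Group 4"}
CONVERT = {2: "CCITT 1D", 3: "Fax Group 3", 6: "old JPEG", 7: "JPEG",
           8: "Adobe Deflate", 9: "JBIG bw", 10: "JBIG color",
           32773: "PackBits"}
COMPRESSION_TAG = 259


def read_compression(data: bytes) -> int:
    """Return the Compression value of the first IFD of a classic TIFF."""
    if data[:2] == b"II":
        endian = "<"  # little-endian byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, _type, _count = struct.unpack(endian + "HHI", data[entry:entry + 8])
        if tag == COMPRESSION_TAG:
            # A single SHORT value sits in the first 2 bytes of the value field.
            (value,) = struct.unpack(endian + "H", data[entry + 8:entry + 10])
            return value
    return 1  # TIFF 6.0 default when the tag is absent: no compression


def classify(value: int) -> str:
    if value in ACCEPTABLE:
        return "ok"
    if value in CONVERT or value > 32766:
        return "convert"
    return "unclassified"  # e.g. 5 (LZW), not covered by the proposal


def minimal_tiff(compression: int, endian: str = "<") -> bytes:
    """Header plus a one-entry IFD: enough for read_compression, not a real image."""
    mark = b"II" if endian == "<" else b"MM"
    header = mark + struct.pack(endian + "HI", 42, 8)
    entry = struct.pack(endian + "HHIHH", COMPRESSION_TAG, 3, 1, compression, 0)
    return header + struct.pack(endian + "H", 1) + entry + struct.pack(endian + "I", 0)


for value in (1, 4, 5, 7, 32773):
    print(value, classify(read_compression(minimal_tiff(value))))
```

Run over a real collection, the same two functions would give a per-file tally comparable to the KOST numbers.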

[Figure: compressed-images-swiss (KOST survey of TIFF compression types)]

Comments

#1

To compress or not to compress at the file format level is a somewhat disputed area within digital preservation.

A fundamental assumption in digital preservation is to keep things simple, in the sense that added (format-dependent) complexity makes it harder to maintain (re-)usability or to create access derivatives.

In the case of images:
(_Lossless_) compression per se is (IMHO) not a problem for the digital archive. Bit-stream preservation is a strict requirement, but compression can still be realised by compressing the AIP as a unit, independently of the chosen archival formats (although the compression ratio might not be as good).
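Compressing the AIP as a unit, rather than inside each file, might look like the following minimal sketch. The comment does not name a container or compressor, so a gzip-compressed tar is assumed here purely as a stand-in, with SHA-256 checksums assumed as the fixity check; pack_aip, unpack_aip and the file names are hypothetical.

```python
from __future__ import annotations

import hashlib
import io
import tarfile


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def pack_aip(files: dict[str, bytes]) -> bytes:
    """Bundle all files of a package into one gzip-compressed tar, in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def unpack_aip(blob: bytes) -> dict[str, bytes]:
    """Restore the files; the archival formats inside are never modified."""
    restored = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                restored[member.name] = tar.extractfile(member).read()
    return restored


# Hypothetical package: an uncompressed TIFF-like payload plus metadata.
files = {"images/page_0001.tif": b"II*\x00" + bytes(4096),
         "metadata/mets.xml": b"<mets/>"}
manifest = {name: sha256(payload) for name, payload in files.items()}

blob = pack_aip(files)
restored = unpack_aip(blob)
# Bit-stream preservation check: every file's checksum must match exactly.
assert {name: sha256(p) for name, p in restored.items()} == manifest
print(len(blob), "bytes stored for", sum(len(p) for p in files.values()), "bytes of content")
```

The files inside stay uncompressed and simple; only the storage container is compressed, so a future migration can drop or swap the compressor without touching the archival formats.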

For storage-intensive material like moving images, one might choose a more efficient compressor, provided it is risk-assessed and documented in policy.

Marco Klindt, Zuse-Institute Berlin (ZIB)

#2

Excuse a lurker from outside the "document archival" world for chiming in; my experience is primarily in computer graphics and film, and I am the maintainer of the OpenImageIO project. I deal with TIFF files all the time, and I see Deflate compression used in the overwhelming majority of TIFF files I come across. Very rarely do I encounter an uncompressed TIFF file, and when I do, it's almost always the result of user error.

So my questions for you to ponder are:

1. Is the KOST dataset perhaps not representative of the overall landscape of TIFF use, leading you to underestimate the prevalence of (and preference for) compression?

2. Even if it is representative of document archival -- and assuming that's the primary or sole concern/constituency of this group -- what are the pros/cons of settling on uncompressed-only when TIFF usage, software, and expectations might be radically different in other fields that are also concerned with image archival?

To spell it out a bit more explicitly, although I am totally sympathetic to the rationale for avoiding compression (simplicity, limiting the damage from storage errors, etc.), I think you should carefully weigh that against the ongoing headache you will encounter living in an overall ecosystem where the majority of files, software, and user expectations will not be TI/A compliant.

Larry Gritz (OpenImageIO)

#4

Hi Larry,

thanks a lot for your input! I think it really depends on the type of archive. As I understand it, you're coming more from the world of computer graphics, CGI, etc. There I can imagine that compression is a) more effective and b) much more widespread.

"Our" archives are primarily archives that preserve cultural heritage in the form of still photographs of artifacts (such as paintings, photographs themselves, pages of medieval manuscripts, historic documents or maps, etc.). There, lossy compression was and still is regarded as a "sacrilege". For this type of archive I think the KOST sample is very representative, although I agree with you that in other important domains (which are more your area) compression is viewed completely differently (and with good reason).

#5

In film production ("live action" as well as "CGI"), we are similarly concerned that photographs, artwork, and films are preserved and represented with the full original fidelity with which they were captured or created.

We care about image quality and we would never, NEVER use lossy compression. But lossless compression is both essential and ubiquitous.
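The distinction drawn here is that a lossless round trip returns every sample bit for bit, while a lossy one does not. A minimal sketch, assuming Python's zlib as the Deflate implementation (the algorithm behind TIFF compression 8) and using simple bit-depth quantization as a deliberately crude stand-in for a lossy codec; it is not JPEG, and the function names and test image are hypothetical.

```python
import zlib


def deflate_roundtrip(samples: bytes) -> bytes:
    """Lossless: compress with Deflate and decompress again."""
    return zlib.decompress(zlib.compress(samples, 9))


def quantize_roundtrip(samples: bytes, drop_bits: int = 3) -> bytes:
    """Stand-in for a lossy codec: zero the lowest bits of every 8-bit sample."""
    mask = 0xFF & ~((1 << drop_bits) - 1)
    return bytes(s & mask for s in samples)


# A synthetic 64 x 64 8-bit grayscale "image".
image = bytes((x * 7 + y * 3) % 256 for y in range(64) for x in range(64))

lossless = deflate_roundtrip(image)
lossy = quantize_roundtrip(image)
max_error = max(abs(a - b) for a, b in zip(image, lossy))
print(lossless == image, lossy == image, max_error)
```

Only the lossless path can be checked by a plain byte comparison or checksum, which is why it fits bit-stream preservation while the lossy path does not.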

What I'm saying is that the world is full of software that writes (often by default) losslessly compressed TIFF files. So if TI/A settles on a standard that is an uncompressed-only TIFF subset (although that totally makes sense in an academic-principles sense), I fear it will be a source of unending future aggravation: a never-ending battle of explaining to users that useful applications A, B, and C are to be avoided because they so easily make non-compliant files, and a constant battle with TI/A-compliant software that annoys users by being unable to open most TIFF files in some archives.

The subtlety is that some of the compression methods that TIFF allows truly are rarely used; nobody will miss them, and we're better off throwing them out to make an archival subset that is simpler and safer. Maybe somewhere there is a community of people for whom PackBits or JPEG-in-TIFF is actually useful, but so far (as a maintainer of a popular open source package for reading images) I've never encountered either of those in the wild, aside from TIFF compliance test suites. Those kinds of compression should be abandoned, for sure. And certainly any lossy compression methods should be eschewed on the philosophical grounds that they are inappropriate for any archival purpose (whether it's a scan of an ancient manuscript or frames from a superhero film). But the lossless compression methods ZIP (Deflate) and LZW are in fact very widely used, and are easily, commonly, and sometimes exclusively produced by a good deal of software, even though the KOST survey makes them look virtually unused.

I don't really have a dog in this TI/A fight, as they say. I'm not a partisan on either side. I'm just here to say that I see a lot of TIFF files in my field and others, and the KOST data doesn't reflect what I see. If the justification for dropping compression is partly based on "nobody uses that anyway", it is flawed.

Larry Gritz (OpenImageIO)

#6

Hi Larry,

I totally agree with you! LZW (like other lossless compression schemes) should be allowed. Some archives concerned solely with still images didn't use LZW, since its efficiency (compression ratio) is not very good on grayscale or color images, especially if these images contain some inherent noise (e.g. film grain). In some cases the file was larger after compression than before ;-) But sometimes it reduces the size considerably. And as we all know, size matters! The archiving cost still depends on the amount of data you have to migrate in each migration cycle. If it's possible to reduce the size with some "mild" compression (even if lossy), that is better than having perfect images but no money to preserve them.
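Why LZW can make noisy images larger can be seen with a simplified LZW encoder. This is a minimal sketch, not the TIFF LZW codec: real TIFF LZW writes variable-width codes (9 to 12 bits), emits a Clear code and resets when the table fills, and is often combined with a horizontal-differencing predictor, whereas here codes are counted at a fixed 12 bits and the table is simply frozen when full. The two inputs are extremes (a repeating ramp, and uniformly random bytes standing in for heavy noise), not real scans; lzw_code_count and size_ratio are hypothetical names.

```python
import random


def lzw_code_count(data: bytes, max_codes: int = 4096) -> int:
    """Count LZW output codes; the string table is frozen once it is full."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 258  # TIFF LZW reserves 256 (Clear) and 257 (EndOfInformation)
    phrase = b""
    count = 0
    for byte in data:
        candidate = phrase + bytes([byte])
        if candidate in table:
            phrase = candidate  # keep extending the current match
        else:
            count += 1  # emit the code for `phrase`
            if next_code < max_codes:
                table[candidate] = next_code
                next_code += 1
            phrase = bytes([byte])
    if phrase:
        count += 1  # flush the final phrase
    return count


def size_ratio(data: bytes, code_bits: int = 12) -> float:
    """Estimated compressed size / original size, assuming fixed-width codes."""
    return lzw_code_count(data) * code_bits / (len(data) * 8)


rng = random.Random(0)
smooth = bytes(range(256)) * 40  # a repeating ramp
noisy = bytes(rng.randrange(256) for _ in range(len(smooth)))  # pure noise
print(f"smooth: {size_ratio(smooth):.2f}, noisy: {size_ratio(noisy):.2f}")
```

On the noise input almost every byte becomes its own 12-bit code, so the output exceeds the 8-bit input, matching the "larger after compression" experience; on the ramp the table learns ever longer phrases and the size drops sharply.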

Thus I totally agree with you that we have to be pragmatic!

P.S.: I looked at OpenImageIO: very impressive! What library are you using for JPEG 2000? Or did you write your own JPEG 2000 handler?

#7

Working at an imaging-toolkit vendor, I've run into plenty of JPEG-compressed TIFFs over the years, both new- and old-style. Microsoft, bless their tiny little heads, had code in their products for *years* that wrote only old-style JPEG-in-TIFF, and wrote it badly, though far from the worst.
So I have to point out a consequence, and I apologize if this has been discussed to death already:
if lossy compression is forbidden, then the only way to archive lossily-compressed images will be to decompress them into an uncompressed and/or losslessly-compressed form and then archive that. The result will be images stored in an unambiguously lossless format, that take up the space of lossless images, but show clear signs of damage and loss from compression. All the metadata associated with that damage, such as DCT tables, will be lost.
Probably also any tags describing the original writer?
In this case, there is no saving in space, nor is there gain in quality or protection against damage - the damage has already been done, and its origins are simply concealed.

-spike _/\_
Spike McLarty