A Last Minute Proposal

  • Make TI/A a new file format while retaining compatibility with TIFF 6.0 (including technical notes)
  • Make ".tia" the standard filename extension, also allow ".tif" for compatibility reasons
  • Create/require a tag to indicate TI/A 1.0 conformity
  • Simplify radically by removing everything from TIFF 6.0 that is not needed now or in the foreseeable future
  • Do not keep features only because they were used in the past; we have TIFF for that
  • Allow lossless compression, but only the best one or two compression schemes, forbid all others
  • Require fixity (embedded checksums) as in FLAC and FFV1
  • As web browsers are essential today in archival and other workflows, make TI/A as web friendly as possible to encourage browser support

Comments

#1

Hi Marcus, thanks for your proposal. I am only able to answer after having published the final draft, so there may be some redundancies or overlaps.

I will try to understand what you propose and answer as well as I can.

1. You suggest a new file format based on the TI/A recommendation. This includes all requirements of TIFF 6.0 Baseline plus the extensions, excluding "old JPEG" and including "new JPEG".

Question: Do you regard the options in the final draft as included? Assuming yes, I see no problem so far, except for the words "new file format" (see last paragraph).

2. and 3. You propose a new filename extension ".tia" and allow ".tif" for compatibility reasons. There will be a new tag, let's say "tiaConformity", with a flag for yes or no.

As a consequence, existing or new TIFF readers/writers will have to be extended with a feature that checks for conformity with the new file format and, if a file conforms to TI/A, sets the tiaConformity flag to yes and writes a new file with the extension ".tia".

This raises several questions: Will software vendors integrate the new features? How many applications will accept/recognize the new file ending and the tiaConformity tag? And since most of the TIFF files that reside in archives today already conform to TI/A in our understanding (at least that is what our evaluation has shown so far), is it necessary to rewrite billions of TIFF files? If, as our survey indicates, less than one percent of the files really need to be converted, why make an effort that would probably be very expensive?

And if we did as you propose, what would this mean in the real world? Let's figure this out. Assume an average image archive with 2 million TIFF files, consuming about 40 terabytes. How does the conversion work? Either your application writes a new .tia file and deletes the original, or it writes a new .tia file and keeps the original, which means you will need 40 terabytes of extra space. The second procedure is more secure, because if something goes wrong you most likely still have the original file. But the first procedure, if you want to be on the safe side, still requires you to make an (additional) backup copy of all of your files.

Either way this will take considerable effort, if you take into account that just the evaluation of roughly 4 million files in three Swiss archives took several weeks. If you really rewrite files, it will take much longer. Compared to converting only problematic files, IMHO this approach seems inefficient.

4. To simplify radically was also our first approach. We have a lot of sympathy for it, but we had to realize that it is not feasible, maybe also not necessary, and maybe even dangerous.

It is not feasible because budgets in the cultural and heritage departments and institutions are very tight, and nobody would be willing to spend the money needed for work that would make sense to nobody but a few specialists. It is not as shiny as a new exhibition. Nobody sees the result, so no politician will speak up for such a project.

Aiming at an ideal world in archives means raising the cost disproportionately. One example: Fax Group 4 compression is lossless and well documented. In Swiss archives, 46 % of images use Fax Group 4 compression. We cannot find a single justified reason to ask for a conversion of 1.8 million files. Nobody would pay for it because it is a waste of money. But maybe you would accept this compression here, as you indicate in your point 6.

It is not necessary because most of the files are already in a state that conforms to TI/A. Many of the applied technologies are in wide use and well documented. Even many compression algorithms are well documented, and although we do not recommend them, they are still acceptable. The same applies to color models. It remains to be said that in many cases color image files are already uncompressed and in RGB.

It is even dangerous if you take into account that the mass conversion would be processed without sufficient financial means. Keep in mind that file format conversion is not as simple as bitstream copying. If not done by experts, many things can go wrong (sometimes they even go wrong with experts). So in a mass conversion environment things can easily turn into a never ending story. IMHO it is better not to touch files if not absolutely necessary.

5. See above and below.

6. You proposed to allow only lossless compression and forbid all others. We learned to abstain from forbidding features and tags because it is unnecessary and counterproductive. Instead we recommend or do not recommend the use of certain features, or we declare a few things as really problematic. The reasons for that are either deprecated implementations (old JPEG) or badly documented or undocumented features (proprietary features or features related to raw files).

7. I have no opinion on this. It seems to be, one way or another, part of bitstream preservation anyway.

8. This seems to be a new discussion. It also seems a bit contradictory to 4., because it would add features like interlacing and lossy compression, which you wanted to forbid.

To summarize: what you propose here would mean a great deal of effort for many archives.

There is one general statement I would like to repeat here. There will not be a new file format, for the reasons I explained in my post "Timeline of TI/A and the surrounding conditions". Briefly summarized: we want the TI/A recommendation to become part of the PDF/A ISO standard. Therefore we need the support of Adobe, and Adobe doesn't want a new file format.

Hopefully this explains why we do not think the creation of a new file format is a good idea.

Best regards, Erwin

#2

Hi Erwin,

thanks for your remarks. As I am quite busy these days, my answers may take about a week.

Meanwhile allow me to ask you two questions:

Why do you care about what Adobe wants or thinks?

Why do you want to tie the TI/A file format to the totally unrelated PDF/A standard?

Best regards, Marcus

#3

Dear Marcus

The first point is that Adobe acquired TIFF together with Aldus in 1994 and was very much involved in its development. Adobe also owns TIFF as a trade mark. If we touch TIFF, we have to care what Adobe thinks.

The second point is that ISO standards are industry standards, and Adobe is one of the biggest players in this market. If we want the TI/A recommendations to become an ISO standard or part of one, it is reasonable to cooperate with the industry partners that are influential in the ISO process.

Establishing TI/A as part of PDF/A is partly because of the A for archive in the name and the intention of the format, and partly because it is the easiest way to establish it as an ISO standard.

I hope this explains.

Best regards, Erwin

#4

Dear Erwin,

We do owe first Aldus and then Adobe gratitude for developing the TIFF file format and then maintaining it for many years.

But now, it has become a legacy file format for them.

It is however still in use by archives, libraries and many others, even for new files. And it is highly trusted.

Since Adobe obviously has other plans, someone else has to maintain the file format, otherwise it will slowly die.

If Adobe owns TIFF as a trade mark and we don't get permission to use it, we need to fork it under a new name, like TI/A or TIA.

As there might be other rights involved, only a specialised lawyer will know, if we need Adobe's consent to do that.

By the way, PDF/A is (among other things) a competing image file format. There may be other options than joining them.

More next week.

Best regards, Marcus

#5

Dear Erwin,

Sorry for the delayed response.

1. My proposal is a call for a new enhanced yet compatible file format based on TIFF under a new name. (I would however rather call it TIFF 7.0 and have Adobe join the project, if that were possible.)

2. and 3. While TIA 1.0 files would also be valid TIFF 6.0 files, no existing applications could create them, until TIA support would be added to libraries and applications. Until then, a TIFF to TIA converter would be needed.

Applications existing now would recognize TIA 1.0 files as TIFF 6.0 files and would be able to open and modify them without knowing anything about the proposed TIA file format. So there would be no risk for early adopters.

A new file format would force no institution to convert existing TIFF files. If, when and how is entirely up to them and their adopted policies.

4. As TIFF 6.0 support in existing applications is not likely to vanish anytime soon, the TIA 1.0 file format could (and should) safely drop any unneeded and unwanted features. Obsolete compression schemes are obvious candidates.

6. As (following my proposal) all TIA 1.0 files were valid TIFF files, while no now existing TIFF files were valid TIA files, there would be no basis for this kind of concerns anyway.

7. I regard embedded fixity as a major enhancement in addition to existing bitstream preservation measures or where no such measures are in place.

8. I cannot see any contradiction. Web friendly does not necessarily mean lossy.
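To make point 7 more concrete: embedded fixity means storing a checksum next to the data it protects, so that damage can be detected (and located) without any external database. What follows is only a rough sketch under assumptions that are not part of the proposal: SHA-256 as the hash, one digest per strip, and hypothetical function names. Where the digests would be stored (for example in a new tag) is left open.

```python
import hashlib


def strip_digests(data: bytes, offsets: list[int], byte_counts: list[int]) -> list[str]:
    """Return one SHA-256 hex digest per strip of image data."""
    if len(offsets) != len(byte_counts):
        raise ValueError("StripOffsets and StripByteCounts differ in length")
    digests = []
    for offset, count in zip(offsets, byte_counts):
        # Refuse strips that point outside the file instead of hashing garbage.
        if offset < 0 or count < 0 or offset + count > len(data):
            raise ValueError(f"strip at {offset} (+{count}) lies outside the file")
        digests.append(hashlib.sha256(data[offset:offset + count]).hexdigest())
    return digests


def damaged_strips(data, offsets, byte_counts, stored_digests):
    """Return the indices of strips whose current digest differs from the stored one."""
    current = strip_digests(data, offsets, byte_counts)
    return [i for i, (now, then) in enumerate(zip(current, stored_digests)) if now != then]
```

With one digest per strip, a single flipped bit is reported as damage to one strip rather than to the whole file, which may help when deciding what to restore from a backup.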

Regarding having a new file format or not: By standardizing what is a mere validation policy, you are going to miss the chance for certain improvements. So I ask you to consider having both, a new TIFF compatible file format and recommendations regarding existing TIFF files.

Regarding the archive I work for: We have no plans to stop using TIFF (compatible) files in the foreseeable future and yes, that includes new files also.

Does this answer your questions?

Best regards, Marcus

#7

Hi Marcus, hi all

It is obvious that there is a need for a stable archive format. Your last minute proposal post was quite clear. I did some more research about TIFF and libtiff, and the deeper I dig, the more doubts arise about TIFF as the appropriate file format.

In the mid-1980s the TIFF file format came into existence as an effort to agree on a common image file format that should replace a multitude of proprietary formats. In the beginning TIFF only covered binary or bilevel images. A pixel was either black or white, which was enough to serve desktop scanners. Later on TIFF grew to cover grayscale images, then color images.

A first version of the TIFF specification was published by Aldus Corporation in 1986, after two minor draft releases; thus it can be labeled as Revision 3.0. With Revision 5.0 in 1988, support for palette color images and LZW compression was added.

Meanwhile Sam Leffler, working for Pixar, wrote libtiff, a C library for reading and writing images in the TIFF format. Leffler's intent was to give this piece of software away for free, no matter what people did with it, even if they turned it into a product. Libtiff is now available in version 4.0.7.

Through the years TIFF has become a flexible and adaptive file format that handles many different color systems, from plain RGB to YCbCr and CIE L*a*b*, and both uncompressed and compressed data, including PackBits, RLE, LZW and even JPEG compression schemes. TIFF images can be written as strips or as tiles, and they can contain layers and multiple images.

In 2004 the original libtiff website (libtiff.org) was hijacked after it disappeared in 2003 due to ISP problems. The site under libtiff.org contains outdated software, and the information and links provided there are outdated or incorrect. Until recently the official site could be reached under www.remotesensing.org/libtiff. Since September 2016 the site has been hosted at www.simplesystems.org/libtiff. A download site is provided by osgeo.org. Libtiff at present is maintained by Frank Warmerdam, Andrey Kiselev, Bob Friesenhahn, Joris Van Damme, Lee Howard and Even Rouault.

Until 2007 TIFF was strictly restricted to 32-bit offsets, which means file sizes beyond 4 GB were not possible. Thanks to people like Frank Warmerdam, Joris Van Damme and others there is now a 64-bit version of TIFF (see www.awaresystems.be) that is mainly used for geo imaging. This version of TIFF, which usually cannot be opened by photography-oriented TIFF readers (Photoshop and the like), is called BigTIFF.

A drawback of TIFF is the implementation of (proprietary) tags and offsets. Incorrect offsets can lead to security leaks. There have also been exploits with UTF8 data written in the document name tag, which led to buffer overflows and gave a malevolent hacker the possibility to take over a computer system.

How does this fit into the needs of memory institutions and archives that hold large amounts of images which are usually in the TIFF format, or at least have a file ending of .tif, which is definitely the same?

Is TIFF the right decision for sustainable long-term archiving of digital images in the future? For existing TIFF data, the answer might well be yes, because nobody will perform and pay for a preventive format migration. But what happens to image data in the future? A new format? Which one will it be? The discussion is open!

Best, Erwin

#8

Hi Erwin,

Let me try to correct some myths:

* BigTIFF is a TIFF: This is not true. The file format uses a structure *like* TIFF, in the same way as DNG does.

* Proprietary tags and offsets can lead to security leaks: This is not true. A baseline TIFF reader can still ignore these tags (or a tool could remove them), therefore no security problems are triggered. Offsets could be used in a creative way (looping), but this could easily be fixed in TIFF libraries by tracking used offset addresses. Also, UTF8 is no problem at all, because the TIFF spec says: 7-bit ASCII. If you use UTF8 in ASCII fields, it is a spec violation, too.

* The hosting site has been changed: This is actually a problem, because old information can be found on previous sites. But it is quite common for projects (libTIFF) of this age to have their URLs changed from time to time.
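The idea of tracking used offset addresses can be sketched in a few lines. The example below is an illustration only, not how libtiff does it: the function name is hypothetical, and it handles only classic TIFF (magic number 42, 32-bit offsets). It follows the chain of image file directories (IFDs) and stops as soon as an offset is visited twice or points outside the file.

```python
import struct


def walk_ifd_offsets(data: bytes) -> list[int]:
    """Return the IFD offsets of a classic TIFF, refusing loops and bad offsets."""
    if len(data) < 8:
        raise ValueError("too short for a TIFF header")
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("unknown byte order mark")
    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (magic number is not 42)")

    seen = set()
    chain = []
    while offset != 0:  # a next-IFD offset of 0 ends the chain
        if offset in seen:
            raise ValueError(f"IFD loop detected at offset {offset}")
        if offset + 2 > len(data):
            raise ValueError(f"IFD offset {offset} lies outside the file")
        seen.add(offset)
        chain.append(offset)
        (entry_count,) = struct.unpack_from(endian + "H", data, offset)
        # Each IFD entry is 12 bytes; the next-IFD offset follows the entries.
        next_pos = offset + 2 + 12 * entry_count
        if next_pos + 4 > len(data):
            raise ValueError(f"IFD at {offset} is truncated")
        (offset,) = struct.unpack_from(endian + "I", data, next_pos)
    return chain
```

The same bookkeeping could be extended to the offsets stored inside tags (StripOffsets, for instance), so that two structures claiming the same bytes are also detected.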

Anyway, the TIFF format is very elegant and fits the needs of cultural heritage in general. Still, some features for digital preservation are missing. In my opinion, for a TIFF 7.0 (or TI/A) we should add protection against offset errors, especially for StripOffsets (see http://kulturreste.blogspot.de/2016/11/some-thoughts-about-risks-in-tiff... for details). This could be easily solved by defining TI/A as a sharpened and restricted subset of the TIFF 6.0 spec, I think.

However, in summary the TIFF format is old, but it was well defined and is easy to understand. It can be very robust if you forbid compression and add offset protection.

#9

Hi Andreas

Well I totally agree, but I also totally disagree with you.

BigTIFF is not TIFF, because it has a different header and the magic number 43 instead of 42. But why the heck does it have a file ending of .tif? Should I call it a good way to create myths?
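Since the file ending does not help, the two can only be told apart by looking at the header. A minimal sketch, with a hypothetical function name, that reads the byte order mark and the magic number from the first four bytes:

```python
import struct


def tiff_flavor(header: bytes) -> str:
    """Classify a file by its first bytes: 'TIFF', 'BigTIFF' or 'unknown'."""
    if len(header) < 4:
        return "unknown"
    if header[:2] == b"II":
        endian = "<"  # little-endian
    elif header[:2] == b"MM":
        endian = ">"  # big-endian
    else:
        return "unknown"
    (magic,) = struct.unpack(endian + "H", header[2:4])
    if magic == 42:
        return "TIFF"
    if magic == 43:
        return "BigTIFF"
    return "unknown"
```

In practice one would pass in the first four bytes read from a file opened in binary mode. This only identifies the container. It says nothing about whether the rest of the file is valid.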

Using UTF8 in the DocumentName is clearly a violation of the specs. But libtiff doesn't prevent you from doing so. The vulnerability was there from libtiff 3.4 up to 3.8.2. So it was there for several years, which means there are many images out there that contain the bug (see: https://vuldb.com/de/?id.2303). So converting TIFF to PDF is a risk, as you can see in the aforementioned link.
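A validator could catch such fields before they reach a converter. The following sketch (the function name is hypothetical) checks the raw bytes of an ASCII-typed field against what TIFF 6.0 requires: 7-bit codes only, with a NUL byte at the end.

```python
def is_valid_tiff_ascii(value: bytes) -> bool:
    """Check a raw ASCII field value: 7-bit bytes only, terminated by NUL."""
    if not value or value[-1] != 0:
        return False
    # Any byte with the high bit set (as in UTF8 multi-byte sequences) is a violation.
    return all(byte < 0x80 for byte in value)
```

A name such as "Zürich.tif" encoded as UTF8 would fail this check because of the two bytes used for "ü".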

Furthermore, libtiff.org still points to outdated source files up to 3.8.0. The cvsroot is also outdated, and I do not think that this is just an error. The world isn't the same as in the 1980s, when there were only a few enthusiasts on the internet.

Unfortunately this is a programmers' discussion. For end users this is just confusing. But it should also raise awareness that TIFF, although a very simple format at first sight, is in fact very complex.

Somehow we have to deal with these old TIFF files in the archives, and your blog post is a huge contribution to this task, but my point of view is that TIFF is no solution for the future.