KOST-Val

KOST-Val (link to official website) is an open source validator for several file formats (TIFF, SIARD, PDF/A, JP2, JPEG) and for Submission Information Packages (SIPs).

Functional Principle

KOST-Val complies with the following requirements.

  • TIFF validation: KOST-Val reads a TIFF file and uses JHOVE to validate the structure and the content, and ExifTool to validate key properties such as compression, colour space, and multipage. The accepted values of these properties can be configured.
  • SIARD validation: KOST-Val reads a SIARD (eCH-0165 v1) file and validates the structure and the content.
  • PDF/A validation: KOST-Val reads a PDF or PDF/A file (ISO 19005-1 and 19005-2) and uses 3-Heights™ PDF/A Validator by PDF-Tools or PDF/A Manager by PDFTron to validate the structure and the content of the PDF file. KOST-Val organises the different error messages into main categories such as fonts, graphics, and metadata. KOST-Val supplies only a limited version of 3-Heights™ PDF/A Validator by PDF-Tools. Depending on the configuration, Module J extracts (with iText) and validates the JPEG and JP2 images contained in the PDF file. It is also possible to configure whether JBIG2 compression is accepted.
  • JP2 validation: KOST-Val reads a JP2 file (ISO 15444) and uses Jpylyzer to validate the structure and the content.
  • JPEG validation: KOST-Val reads a JPEG file (ISO 10918-1) and uses Bad Peggy to validate the structure and the content.
  • SIP validation: KOST-Val reads an SIP (eCH-0160 v1 as well as Swiss Federal Archives SFA v1 and v4) and validates the mandatory requirements of the SIP specification. The validated requirements are organised into groups such as folder structure, schema validation, and checksum validation. At the outset, a file format validation is performed.
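
The checksum validation group mentioned for SIPs can be pictured with a minimal Python sketch. It is not KOST-Val's code: the function names, the `declared` mapping (relative file name to expected hex digest), and the choice of SHA-256 are assumptions made for illustration, and the way an actual SIP declares its checksums is not modelled here.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(root: Path, declared: Dict[str, str],
                     algorithm: str = "sha256") -> List[str]:
    """Compare declared digests (hypothetical mapping) with computed ones.

    All problems are collected instead of stopping at the first one.
    """
    problems = []
    for relative_name, expected in declared.items():
        candidate = root / relative_name
        if not candidate.is_file():
            problems.append(f"{relative_name}: file is missing")
        elif file_digest(candidate, algorithm) != expected.lower():
            problems.append(f"{relative_name}: checksum mismatch")
    return problems
```

An empty list from `verify_checksums` means every declared file was found and matched its digest.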

The results (including information on inconsistencies and errors) are output for every step and written into a validation log. The validation steps are executed sequentially. Whenever possible the validation shall continue after an error has been detected in order to reduce the number of correction cycles.
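
The "continue after an error" behaviour described above can be sketched as follows. This is a hedged illustration in Python, not the KOST-Val implementation: `StepResult`, `run_steps`, and the two example checks are hypothetical names, and the checks only look at the file name so that the sketch stays self-contained.

```python
from typing import Callable, List, NamedTuple, Tuple


class StepResult(NamedTuple):
    step: str
    ok: bool
    message: str


def run_steps(name: str,
              steps: List[Tuple[str, Callable[[str], None]]]) -> List[StepResult]:
    """Run the steps in order; a failed step is logged and the run continues."""
    log = []
    for label, check in steps:
        try:
            check(name)
        except ValueError as error:
            log.append(StepResult(label, False, str(error)))
        else:
            log.append(StepResult(label, True, "valid"))
    return log


def check_extension(name: str) -> None:
    # Hypothetical step: accept only TIFF file extensions.
    if not name.lower().endswith((".tif", ".tiff")):
        raise ValueError("unexpected file extension")


def check_name_length(name: str) -> None:
    # Hypothetical step: reject file names longer than 20 characters.
    if len(name) > 20:
        raise ValueError("file name too long")


STEPS = [("A: extension", check_extension), ("B: name length", check_name_length)]

if __name__ == "__main__":
    for result in run_steps("scan_of_a_very_long_document.pdf", STEPS):
        print(result)
```

Because each failure is recorded rather than raised, a single run reports both problems of the example file, which is the point of reducing correction cycles. A real validator would also need a way to stop when a later step cannot sensibly run after an earlier failure.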

KOST-Val functional principle

Third-party applications

KOST-Val uses unmodified components from other manufacturers by embedding them directly into the source code. Users of KOST-Val are requested to adhere to the licence terms of these components.

  • The TIFF validation module uses JHOVE and ExifTool and evaluates their output further.
  • For the PDF/A validation module, PDF/A Manager or 3-Heights™ PDF/A Validator is used.
  • The JP2 validation module uses Jpylyzer and translates the failed tests into appropriate error messages (DE/FR/EN).
  • The JPEG validation module uses Bad Peggy and evaluates the error message "Not a JPEG file" further.
  • To extract the JPEG and JP2 images from PDF/A the iText library is used.
  • For the file format identification, DROID is used. For performance and granularity reasons, a custom signature file is used instead of the official PRONOM registry.
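
Signature-based identification with a reduced signature list can be illustrated with a short Python sketch. It does not use DROID or its signature file format; the `SIGNATURES` table and the `identify` function are assumptions for illustration, and only the leading magic bytes of a few formats are checked.

```python
from typing import List, Optional, Tuple

# Leading byte sequences (magic numbers) of some formats named above.
# A small custom table like this is quicker to scan than a full registry.
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"II*\x00", "TIFF"),                        # little-endian TIFF
    (b"MM\x00*", "TIFF"),                        # big-endian TIFF
    (b"%PDF-", "PDF"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JP2"),  # JPEG 2000 signature box
    (b"\xff\xd8\xff", "JPEG"),
]


def identify(header: bytes) -> Optional[str]:
    """Return the format whose signature matches the start of the data."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def identify_file(path: str) -> Optional[str]:
    """Read only the first bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

A match on magic bytes only suggests the format; whether the file is actually valid is left to the validation modules.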

Comments

#1

We asked the PREFORMA consortium to include KOST-Val in its list of existing tools for validating file formats. You can see it at http://www.preforma-project.eu/kost-val.html

#2

Does the file format validation tool mentioned here and on PREFORMA use the principles of "language security"? (*)

- Does the language allow formal verification of the tool (e.g. as Verilog does)?

- Is the computing power required to verify a file known?

langsec.org: (...) for complex input languages the problem of full recognition of valid or expected inputs may be UNDECIDABLE, in which case no amount of input-checking code or testing will suffice to secure the program (like, ingest). Many popular protocols and formats fell into this trap, the empirical fact with which security practitioners are all too familiar.