HDF5 documents and links
Introduction to HDF5
HDF5 User's Guide |
HDF5 Reference Manual
HDF5 Application Developer's Guide
HDF5 Release 1.8.0 represents a major update to the HDF5 Library, utilities, and file format. The HDF5 development team has attempted to provide new capabilities and improve performance while retaining compatibility with previous releases.
The new features are briefly described below, but first a few words regarding the compatibility solutions.
When new features and optimizations are introduced, as is certainly the case in this release, there is always the risk of creating compatibility problems. These problems can arise either with an application that must be ported to the new release (or cannot be ported, for any of a number of reasons), with applications based on a prior release that must read files created by the new release, or with files created by an older release that must work with an application based on the new release. The HDF5 team has made a concerted effort to provide a full range of compatibility solutions, hopefully addressing all of the situations a user or application is likely to encounter.
Interface — Backward and Forward API Compatibility:
This release contains many new features and related API routines, but at the same time attempts to provide stability for applications by continuing to make existing API routines available and by operating in a backwardly compatible manner, whenever possible.
API Compatibility Macros in HDF5 discusses the specifics of API compatibility and configuration options with respect to new features.
Format — Backward and Forward Format Compatibility:
The HDF5 Library Release 1.8.0 reads all existing HDF5 files, from this or any prior release. Although this release contains features that require additions and/or changes to the HDF5 file format, by default this release will write out files that conform to a “maximum compatibility” principle. That is, files are written with the earliest version of the file format that describes the information, rather than always using the latest version possible. This provides the best forward compatibility by allowing the maximum number of older versions of the library to read files produced with this release.
If library features are used that require new file format features, or if the application requests that the library write out only the latest version of the file format, the files produced with this version of the library may not be readable by older versions of the HDF5 library.
New Features in HDF5 Release 1.8.0 and Backward/Forward Format Compatibility Issues discusses the new features in the release from the point of view of their impact on format compatibility.
New features are briefly described in this section. Instructional example codes are also provided for several of these features. The new features are covered in the HDF5 Reference Manual, but there has not yet been time to describe all of them in the HDF5 User’s Guide.
Tunable properties enable the creation of files selectively compatible with older HDF5 applications and libraries. This feature enables the library, and thus an application, to create files that can be read by specific older HDF5 libraries and tools and by applications that use those same libraries.
This is accomplished with the function H5Pset_libver_bounds, which sets the lower and upper bounds on allowable formats. The lower bound is determined by specifying the earliest library whose format may be used for an object; the upper bound is determined by specifying the latest library whose format may be used for an object.
The function H5Pget_libver_bounds can be used to retrieve the current settings.
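The interaction between the “maximum compatibility” principle and the lower and upper bounds can be sketched with a toy model. This is only an illustration, not the library's implementation: the function name choose_format_version and the integer version numbers are hypothetical, and in a real application the bounds are passed to H5Pset_libver_bounds on a file access property list.

```c
#include <assert.h>

/* Hypothetical format version numbers, for illustration only;
 * they are not the library's actual constants. */
enum { FMT_EARLIEST = 0, FMT_LATEST = 2 };

/* Pick the format version used to encode one object.
 *   required: earliest version able to describe the object
 *   low/high: bounds requested by the application
 * Returns the chosen version, or -1 if the object cannot be
 * written within the requested bounds. */
int choose_format_version(int required, int low, int high)
{
    assert(low <= high);
    /* Earliest version that both describes the object and respects the lower bound. */
    int chosen = required > low ? required : low;
    return chosen <= high ? chosen : -1;
}
```

With bounds of FMT_EARLIEST and FMT_LATEST, an object that needs only version 0 is written as version 0, which keeps it readable by the largest number of older libraries. Raising the lower bound to FMT_LATEST forces the newest format for every object.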
Compact link storage allows groups containing only a few links to take up much less space in the file.
On the other hand, an improved implementation of indexed link storage provides a faster and more scalable method for storing and working with large groups containing many links.
The threshold for switching between the compact and indexed storage formats is configurable according to an application’s or a user community’s expected use cases using the function H5Pset_link_phase_change.
The function H5Pget_link_phase_change can be used to retrieve the current settings.
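The switch between the two storage formats can be pictured as a small state machine with two thresholds, which keeps a group from flipping back and forth when its link count hovers near the boundary. The sketch below is a toy model under that assumption. The type and function names are hypothetical and do not come from the library, and the values 8 and 6 in the usage note are example settings only.

```c
#include <assert.h>

typedef enum { STORAGE_COMPACT, STORAGE_INDEXED } storage_t;

typedef struct {
    unsigned max_compact;  /* more links than this: switch to indexed storage */
    unsigned min_indexed;  /* fewer links than this: switch back to compact */
    unsigned nlinks;       /* current number of links in the group */
    storage_t storage;     /* current storage format */
} group_model_t;

/* A new group starts empty and therefore compact. */
group_model_t group_model_init(unsigned max_compact, unsigned min_indexed)
{
    assert(min_indexed <= max_compact);
    group_model_t g = { max_compact, min_indexed, 0, STORAGE_COMPACT };
    return g;
}

/* Record a new link count and apply the phase-change rule. */
void group_model_set_links(group_model_t *g, unsigned nlinks)
{
    g->nlinks = nlinks;
    if (g->storage == STORAGE_COMPACT && nlinks > g->max_compact)
        g->storage = STORAGE_INDEXED;
    else if (g->storage == STORAGE_INDEXED && nlinks < g->min_indexed)
        g->storage = STORAGE_COMPACT;
}
```

With thresholds of 8 and 6, a group stays compact at 8 links, becomes indexed at 9, remains indexed at 7, and returns to compact storage only once it drops to 5.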
External links allow a group to include objects in another HDF5 file and enable the library to access those objects as if they were in the current file. In this manner, a group may appear to directly contain datasets, named datatypes, and even groups that are actually in a different file. This feature is implemented via a suite of functions that create and manage the links, define and retrieve paths to external objects, and interpret link names.
The user-defined link feature enables the definition of customized types of links that meet specific community or application needs. This feature is implemented via a suite of functions that define, create, register and unregister the link types.
Links in a group can now be explicitly tracked and definitively indexed by the order in which they are created, enabling systematic iteration and lookup of links by creation order. This complements the already-existing alphanumeric-by-name capability.
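The difference between the two orderings is easy to see in a toy model in which every link records the index at which it was created. The struct and function names below are hypothetical and only illustrate the idea. The library keeps its own indexes, so no sorting by the application is implied.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *name;
    unsigned long creation_index;  /* position in creation sequence */
} link_model_t;

/* Compare two links alphanumerically by name. */
static int by_name(const void *a, const void *b)
{
    return strcmp(((const link_model_t *)a)->name,
                  ((const link_model_t *)b)->name);
}

/* Compare two links by the order in which they were created. */
static int by_creation_order(const void *a, const void *b)
{
    unsigned long x = ((const link_model_t *)a)->creation_index;
    unsigned long y = ((const link_model_t *)b)->creation_index;
    return (x > y) - (x < y);
}

/* Arrange links for iteration in the requested order. */
void sort_links(link_model_t *links, size_t n, int use_creation_order)
{
    assert(links != NULL || n == 0);
    qsort(links, n, sizeof links[0],
          use_creation_order ? by_creation_order : by_name);
}
```

For links created in the sequence temperature, pressure, humidity, name order visits humidity first, while creation order visits temperature first.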
New link APIs enable greater flexibility in the creation and management of links in an HDF5 file. The H5L routines allow links to be managed and manipulated more like objects in the HDF5 data model and provide detailed control of linking behavior.
The Attribute interface (H5A) includes several new functions for attribute management. When large numbers of attributes are attached to a single object, new functionality enables faster access and allows those attributes to be stored in much less space in the file.
Attributes can now be tracked and indexed on the order in which they are created, enabling iteration and lookup of attributes by creation order as well as alphanumeric order by name.
To conserve space in an HDF5 file, large header messages that are used repeatedly in the file can be designated as shared.
A shared object header message (SOHM) is written only once in a file; a pointer is then inserted in place of the message itself on each object to which the header message would otherwise be attached. This can be particularly valuable when, for instance, an identical attribute is applied to tens of thousands of objects. (Note that there will be no advantage if the attribute itself is smaller than the pointer would be.)
This feature is implemented via a suite of functions that set up SOHM tracking and indexing and manage the thresholds for switching between shared and non-shared messages.
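The space trade-off described above comes down to simple arithmetic. The following sketch is a rough model that ignores any index or bookkeeping overhead. The function names are hypothetical, and the byte counts in the usage note are example values rather than sizes taken from the file format.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes used when every object stores its own copy of the message. */
size_t unshared_bytes(size_t msg_size, size_t nobjects)
{
    return msg_size * nobjects;
}

/* Bytes used when the message is stored once and each object holds a pointer. */
size_t shared_bytes(size_t msg_size, size_t ptr_size, size_t nobjects)
{
    return msg_size + ptr_size * nobjects;
}

/* Sharing pays off only when it uses strictly less space. */
int sharing_saves_space(size_t msg_size, size_t ptr_size, size_t nobjects)
{
    assert(ptr_size > 0);
    return shared_bytes(msg_size, ptr_size, nobjects)
         < unshared_bytes(msg_size, nobjects);
}
```

Assuming a 200-byte attribute, a 16-byte pointer, and 10,000 objects, sharing needs 160,200 bytes instead of 2,000,000. With an 8-byte attribute and the same pointer size, sharing would cost more than storing the copies, which is the case the note above warns about.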
UTF-8 Unicode encoding is supported for strings in datasets, the names of links, and the names of attributes.
Metadata caching enhancements boost performance with certain types of files and enable configurable metadata cache management and monitoring.
A suite of functions is provided to set and review the metadata cache configurations, to review and reset hit rate statistics, and to retrieve the current cache size.
See “Metadata Caching in HDF5” in the HDF5 User’s Guide for further information.
Rather than having to step through a hierarchy creating groups one at a time, intermediate groups that do not yet exist can now be created when creating or copying an object in a file.
See Creating Missing Groups (PDF) for further information.
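To make the idea concrete, the sketch below enumerates the intermediate groups implied by an object path, which are the groups that would have to exist (or be created) before the object itself. It is a toy helper with a hypothetical name. It assumes an absolute path with no repeated or trailing slashes and does not call the library.

```c
#include <assert.h>
#include <string.h>

/* Copy the k-th intermediate group path of `path` into `out` (k is 0-based).
 * For "/a/b/c": k=0 gives "/a", k=1 gives "/a/b". The final component
 * ("c") is the object itself, not an intermediate group.
 * Returns 1 on success, 0 if there is no such group or `out` is too small. */
int intermediate_group(const char *path, size_t k, char *out, size_t outlen)
{
    assert(path != NULL && out != NULL);
    if (path[0] == '\0')
        return 0;
    size_t seen = 0;  /* separators found after the leading slash */
    for (size_t i = 1; path[i] != '\0'; i++) {
        if (path[i] != '/')
            continue;
        if (seen == k) {
            if (i + 1 > outlen)
                return 0;  /* no room for prefix plus terminator */
            memcpy(out, path, i);
            out[i] = '\0';
            return 1;
        }
        seen++;
    }
    return 0;
}
```

Calling the helper with k = 0, 1, 2, ... until it returns 0 yields the groups in the order in which they would need to be created.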
With this feature, an object in an HDF5 file can easily be copied to a new location within the current file or to a specified location in another HDF5 file. This is accomplished at a low level in the HDF5 file, allowing entire group hierarchies to be copied quickly and compressed datasets to be copied without going through a decompression/compression cycle.
A suite of functions is provided to manage copy properties and to perform the copying operation.
A command-line tool, h5copy, is also provided to enable copying objects without having to create an application. (Note: h5copy is not yet documented, but entering 'h5copy --help' on the command line will provide basic information.)
Three new functions have been added to enhance the object information that can be retrieved.
In each case, the function returns object information in a customized struct. For example, H5Lget_info returns the link type while H5Gget_info returns the number of links in the group.
Anonymous object creation enables the creation and management of objects in a file independently of the links that integrate those objects into the file structure.
The above routines are used in conjunction with the Link and Object interfaces discussed elsewhere (H5L and H5O, respectively).
A new object API enables greater flexibility in the creation and linking of objects in an HDF5 file.
• User-defined Datatype Conversion Callback Functions: Revised Datatype Conversion Exception Handling – It is now possible for an application to have greater control over exceptional circumstances (range errors, etc.) during datatype conversion.
• Integer-to-Floating-point Conversion Support – It is now possible for the HDF5 library to convert between integer and floating-point datatypes.
See H5Tconvert in the HDF5 Reference Manual.
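The idea of handing exceptional values to an application callback during conversion can be sketched in plain C. This is a simplified model, not the library's interface: the callback type, the enum, and the function names are hypothetical, and the actual callback signature is defined in the HDF5 Reference Manual.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef enum { CONV_RANGE_HI, CONV_RANGE_LOW } conv_except_t;

/* Hypothetical callback: receives the kind of range error and the source
 * value, and returns the value to store in the destination instead. */
typedef unsigned char (*conv_except_cb)(conv_except_t kind, int src);

/* Convert int to unsigned char, delegating out-of-range values to cb. */
void convert_int_to_uchar(const int *src, unsigned char *dst, size_t n,
                          conv_except_cb cb)
{
    assert(cb != NULL);
    for (size_t i = 0; i < n; i++) {
        if (src[i] > UCHAR_MAX)
            dst[i] = cb(CONV_RANGE_HI, src[i]);
        else if (src[i] < 0)
            dst[i] = cb(CONV_RANGE_LOW, src[i]);
        else
            dst[i] = (unsigned char)src[i];
    }
}

/* Example policy: clamp to the nearest end of the destination range. */
unsigned char clamp_policy(conv_except_t kind, int src)
{
    (void)src;  /* this policy does not need the original value */
    return kind == CONV_RANGE_HI ? UCHAR_MAX : 0;
}
```

A different policy could substitute a fill value or record the offending element. The point of the design is that the application, not the conversion routine, decides.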
• Datatype and Dataspace Serial Conversion – Routines have been implemented to serialize/deserialize HDF5 datatypes and dataspaces. These routines allow datatype and dataspace information to be transmitted between processes or stored in non-HDF5 files.
• Two-way Conversion Between Datatype and Text Description of Datatype – This feature enables the creation of a datatype from a text definition of that datatype and the creation of a formal text definition from a datatype. The text definition is in DDL format; DDL definitions of HDF5 datatypes can be found in the “DDL in BNF for HDF5.”
H5LTtext_to_dtype creates an HDF5 datatype based on the text description and returns the datatype identifier. Given a datatype identifier, H5LTdtype_to_text creates a DDL description of the datatype.
Also see “Conversion Between Text and Datatype.”
• New Compression Filters – These new I/O filters allow better compression of certain types of data:
    o N-Bit Filter – This filter compresses data which uses N-bit datatypes. See H5Pset_nbit in the HDF5 Reference Manual and the section “Using Filters / N-bit” in the “Datasets” chapter of the HDF5 User’s Guide.
    o Scale+Offset Filter – This filter compresses scalar (integer and floating-point) datatypes which stay within a range. See H5Pset_scaleoffset in the HDF5 Reference Manual and the section “Using Filters / Scale-Offset” in the “Datasets” chapter of the HDF5 User’s Guide.
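Why data that stays within a range compresses well can be shown with a little arithmetic: once the minimum value is subtracted as an offset, each element needs only enough bits to cover the span between minimum and maximum. The sketch below illustrates that reasoning for integers. The function names are hypothetical, and the actual filter may choose a different bit count (for example to reserve room for a fill value).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bits needed to represent every value in [0, span]. */
unsigned bits_for_span(uint32_t span)
{
    unsigned bits = 0;
    while (span != 0) {
        bits++;
        span >>= 1;
    }
    return bits;
}

/* Find the minimum of the data (the offset) and return how many bits
 * each value needs once that offset has been subtracted. */
unsigned scale_offset_bits(const int32_t *data, size_t n, int32_t *offset)
{
    assert(data != NULL && offset != NULL && n > 0);
    int32_t lo = data[0], hi = data[0];
    for (size_t i = 1; i < n; i++) {
        if (data[i] < lo) lo = data[i];
        if (data[i] > hi) hi = data[i];
    }
    *offset = lo;
    /* Unsigned subtraction avoids signed overflow for extreme ranges. */
    return bits_for_span((uint32_t)hi - (uint32_t)lo);
}
```

For the values 1000, 1003, 1001, and 1007, the offset is 1000 and the remaining span of 7 fits in 3 bits per element instead of 32.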
• Collective Chunk I/O in Parallel – The library now attempts to use MPI collective mode when performing I/O on chunked datasets when using the parallel I/O file driver.
• Arithmetic Data Transform on I/O – This feature allows arithmetic operations (add/subtract/multiply/divide) to be performed on data elements as they are being written to/read from a file. See H5Pset_data_transform in the HDF5 Reference Manual.
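The effect of such a transform can be pictured as an element-wise function applied between the application's buffer and the file. The sketch below hard-codes a linear form, y = scale * x + shift, as one example of the arithmetic operations mentioned above. The struct and function names are hypothetical. With the library itself the transform is supplied through H5Pset_data_transform instead, as described in the Reference Manual.

```c
#include <assert.h>
#include <stddef.h>

/* A linear transform y = scale * x + shift. */
typedef struct {
    double scale;
    double shift;
} linear_transform_t;

/* Apply the transform to each element on its way between memory and file.
 * `in` and `out` may point to the same buffer. */
void apply_transform(const double *in, double *out, size_t n,
                     linear_transform_t t)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = t.scale * in[i] + t.shift;
}
```

With scale 1.5 and shift 32, the inputs 0, 10, and -4 become 32, 47, and 26. In this model, a read that should undo a write-time transform needs the inverse transform supplied separately.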
• C++ API Wrapper Improvements – Several improvements were made to the C++ build infrastructure, and support was added for previously missing and new API routines.
• FORTRAN API Wrapper Improvements – Several improvements were made to the FORTRAN build infrastructure, and support was added for previously missing and new API routines.
• New Packet Table and Dimension Scale High-Level APIs have been added to the high-level C interfaces.
The Packet Table API (H5PT) is designed to allow variable-length records to be added to tables easily.
The Dimension Scale API (H5DS) allows dimension scales to be created in HDF5 and attached to HDF5 datasets. Also see “HDF5 Dimension Scale Specification and Design Notes” (PDF).
• Tool Improvements – Three new tools have been added, and existing tools were enhanced:
    o h5mkgrp is a new command-line tool that creates a new group in an HDF5 file. It is described in the next section, with other features that may change.
    o h5stat (PDF) enables the analysis of an HDF5 file in various ways to determine useful statistics regarding the objects in the file, such as the numbers of objects per group, the sizes of datasets, the amount of free space in the file, etc.
    o h5copy makes a complete copy of an object in an HDF5 file as a new object in that HDF5 file or as a new object in a different HDF5 file. (The h5copy tool is not yet documented, but entering 'h5copy --help' on the command line provides basic information.)
    o Improved speed of h5dump – Performance improvements have been made to h5dump to speed it up when dealing with files that have large numbers of objects.
• “NULL” Dataspace – A new type of dataspace that allows datasets and attributes without any elements to be described.
See H5Screate in the HDF5 Reference Manual.
• Extendible Identifier API – A new set of identifier management routines has been added, which allow an application to use the HDF5 identifier-to-object mapping routines.
• Enhanced Error Handling – A new set of error API routines has been added, which allow an application to integrate its error reporting with the HDF5 library error stack.
In the HDF5 Reference Manual, see the error stack APIs. Also see the supporting document “Unified Error Reporting for HDF5 and Client Libraries.”
• Better UNIX/Linux Portability – This release now uses the latest GNU “auto” tools (autoconf, automake, and libtool) to provide much better portability between many machine and OS configurations. Building the HDF5 distribution can now be performed in parallel (with the gmake “-j” flag), speeding up the process of building, testing and installing the HDF5 distribution. Many other improvements have gone into the build infrastructure as well.