2D data table source files

In the dataset source folder, a subfolder 2D_datatables should be present. This is the root for a set of folders, each describing an individual 2D data table, with the folder name serving as the table's identifier.

In each 2D data table folder, a file data.hdf5 should be present, containing the arrays of properties (example file).

In addition, a YAML settings file should be present in the 2D data table folder (see 2D Datatable settings).

HDF5 source file structure

The source file data.hdf5 should follow the HDF5 standard and may contain the following arrays, all of which must be located in the root of the HDF5 file:

Properties arrays
One or more arrays specifying properties of the 2D data table. These arrays can be 3D, but the first two dimensions should be row and column, in that order.
Column index 1D array
A 1D array listing the identifiers of all columns, in the order they are used in the properties arrays.
Row index 1D array
A 1D array listing the identifiers of all rows, in the order they are used in the properties arrays.
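
The layout described above can be sketched as follows. This is a minimal illustration, not the required naming: the dataset names row_index, col_index and quality, the identifier values, and the temporary output path are all assumptions made for the example.

```python
import os
import tempfile

import h5py
import numpy as np

n_rows, n_cols = 4, 3

# Hypothetical identifiers: one per row and one per column.
row_ids = np.arange(100, 100 + n_rows, dtype="i4")
col_ids = np.array([b"S1", b"S2", b"S3"], dtype="S2")  # fixed-length byte strings

# Placeholder properties array: the first two dimensions are (row, column).
quality = np.zeros((n_rows, n_cols), dtype="i4")

path = os.path.join(tempfile.mkdtemp(), "data.hdf5")
with h5py.File(path, "w") as outfile:
    # All arrays are created directly in the root of the file.
    outfile.create_dataset("quality", data=quality)
    outfile.create_dataset("row_index", data=row_ids)  # assumed name
    outfile.create_dataset("col_index", data=col_ids)  # assumed name

# Read the file back to confirm the index lengths match the properties array.
with h5py.File(path, "r") as infile:
    shape = infile["quality"].shape
    index_lengths = (infile["row_index"].shape[0], infile["col_index"].shape[0])
    stored_cols = infile["col_index"][:].tolist()
```

The index arrays have the same lengths as the first two dimensions of the properties array, so position i in row_index labels row i of every properties array, and likewise for columns.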

Only scalar built-in dtypes (i.e. not structured with fields, and not user-defined) or strings are currently permitted for HDF5 arrays.
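
One way to check an array against this rule before writing it is sketched below. The helper is_permitted_dtype is hypothetical, and the set of accepted NumPy kinds (boolean, integer, float, byte string, unicode string) is one reading of the rule, not a statement of what the importer actually accepts.

```python
import numpy as np


def is_permitted_dtype(dtype):
    """Return True for plain scalar or string dtypes (assumed reading of the rule)."""
    dt = np.dtype(dtype)
    if dt.names is not None:
        # Structured dtype with named fields.
        return False
    # b = boolean, i/u = signed/unsigned integer, f = float,
    # S = byte string, U = unicode string.
    # Sub-array and other compound dtypes have kind 'V' and are rejected here.
    return dt.kind in "biufSU"
```

For example, is_permitted_dtype("i1") and is_permitted_dtype("S10") return True, while a structured dtype such as [("a", "i4"), ("b", "f8")] returns False.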

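Example Python code for creating the HDF5 file:

import h5py
import numpy as np

filename = 'data.hdf5'
# Placeholder data; replace with your own arrays of matching shape and dtype.
my_array_of_calls = np.zeros((1000, 10, 2), dtype='i1')
my_array_depth = np.zeros((1000, 10, 3), dtype='i2')
my_array_of_quality = np.zeros((1000, 10), dtype='i4')

# The context manager closes the file even if a write fails.
with h5py.File(filename, 'w', libver='latest') as outfile:
    # Dimensions are (row, column[, extra]); all datasets sit in the file root.
    call = outfile.create_dataset("call", (1000, 10, 2), dtype='i1')
    call[:, :, :] = my_array_of_calls
    allele_depth = outfile.create_dataset("allele_depth", (1000, 10, 3), dtype='i2')
    allele_depth[:, :, :] = my_array_depth
    quality = outfile.create_dataset("quality", (1000, 10), dtype='i4')
    quality[:, :] = my_array_of_quality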
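
After writing a file, it can be useful to confirm that every properties array agrees with the index arrays. The following is a rough sketch of such a check: the function check_table, the dataset names row_index and col_index, and the small test file are all assumptions made for illustration.

```python
import os
import tempfile

import h5py
import numpy as np


def check_table(path, row_name, col_name):
    """Return a list of problems found; an empty list means the shapes agree."""
    problems = []
    with h5py.File(path, "r") as f:
        n_rows = f[row_name].shape[0]
        n_cols = f[col_name].shape[0]
        for name, item in f.items():
            if name in (row_name, col_name):
                continue
            if not isinstance(item, h5py.Dataset):
                # Groups hold nested content, which is not in the file root.
                problems.append(f"{name}: not an array in the file root")
            elif item.shape[:2] != (n_rows, n_cols):
                problems.append(
                    f"{name}: shape {item.shape} does not start with {(n_rows, n_cols)}"
                )
    return problems


# Build a small file with one deliberately mis-shaped array to exercise the check.
path = os.path.join(tempfile.mkdtemp(), "data.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("row_index", data=np.arange(5, dtype="i4"))
    f.create_dataset("col_index", data=np.arange(2, dtype="i4"))
    f.create_dataset("call", data=np.zeros((5, 2, 2), dtype="i1"))
    f.create_dataset("quality", data=np.zeros((4, 2), dtype="i4"))  # wrong row count

problems = check_table(path, "row_index", "col_index")
```

Here only the quality array is reported, because its first dimension (4) does not match the length of the row index (5).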

We recommend using VCFNP for converting from VCF. See the VCF example for details of how to do this.