BinaryCIF and CIFTools: Lightweight, efficient and extensible macromolecular data management


Authors

SEHNAL David, BITTRICH Sebastian, VELANKAR Sameer, KOČA Jaroslav, SVOBODOVÁ Radka, BURLEY Stephen K., ROSE Alexander S.

Year of publication 2020
Type Article in Periodical
Magazine / Source PLoS Computational Biology
MU Faculty or unit

Central European Institute of Technology

Citation
Web https://doi.org/10.1371/journal.pcbi.1008247
Doi http://dx.doi.org/10.1371/journal.pcbi.1008247
Keywords Structural Biology; Molecular Graphics; Data Curation
Description 3D macromolecular structural data is growing ever more complex and plentiful in the wake of substantive advances in experimental and computational structure determination methods, including macromolecular crystallography, cryo-electron microscopy, and integrative methods. Efficient means of working with 3D macromolecular structural data for archiving, analysis, and visualization are central to facilitating interoperability and reusability in compliance with the FAIR Principles. We address two challenges posed by growth in data size and complexity. First, data size is reduced by bespoke compression techniques. Second, complexity is managed through improved software tooling and by fully leveraging available data dictionary schemas. To this end, we introduce BinaryCIF, a serialization of Crystallographic Information File (CIF) format files that maintains full compatibility with related data schemas, such as PDBx/mmCIF, while reducing file sizes by more than a factor of two versus gzip-compressed CIF files. Moreover, for the largest structures, BinaryCIF provides even better compression: a factor of ten versus CIF files and a factor of four versus gzipped CIF files. Herein, we describe CIFTools, a set of libraries in Java and TypeScript for generic and typed handling of CIF and BinaryCIF files. Together, BinaryCIF and CIFTools enable lightweight, efficient, and extensible handling of 3D macromolecular structural data.