Dimensions
178 x 233 x 8 mm
With the rise of the Python-NumPy stack for analysis, one area that remains under-documented is the storage of large scientific datasets. When the topic is discussed at all, it is usually in the context of the native data-archiving features of specific Python packages, such as pandas.
While such packages may use open formats on the back end, no in-depth work currently covers the nuts and bolts, best practices, and pitfalls of dealing with gigabyte-to-terabyte-sized datasets from Python. This book aims to fill that gap by providing practical coverage of using HDF5 to archive and share binary data in Python.