Bringing Blosc2 to Heel
There are many array libraries in the scientific and data ecosystem that provide native array types (NumPy, PyTorch, Zarr, h5py, PyTables, JAX, Blosc2), and an even longer list of libraries that "consume" these array types (scikit-learn, Parcels, Dask, Pillow, scikit-image). Moreover, the division between the two groups is not clear-cut. PyTorch, for example, provides its own tensor type but can also ingest NumPy arrays (torch.from_numpy shares their memory), and thus straddles the boundary between array provider and consumer.
Such a high degree of interdependency makes it crucial that array objects are portable between libraries. This means not only that the array objects themselves must be standardised, but also that every library must offer a minimal set of functions, with the same names and signatures across the ecosystem, that know how to ingest, produce and process these arrays. Ideally, one could write code that works with arrays like this:
import array_lib as xp

# Do array things with the library, e.g.
x = xp.ones((3, 3))
col_sums = xp.sum(x, axis=0)
and then simply swap in any array library for array_lib and have the code run.
From this set of concerns has sprung an open-source effort to develop the array API standard, along with an extensive associated test suite, to drive the array ecosystem towards this holy grail of interoperability.
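The standard makes the "swap in any library" idea concrete by specifying that a compliant array exposes an __array_namespace__() method, which returns the namespace of the library that produced it. Below is a minimal sketch of how generic code can use this. The names standardize, ToyArray and toy_xp are hypothetical, and the toy namespace merely stands in for a real array library so that the example runs with the Python standard library alone.

```python
import math
from types import SimpleNamespace


def standardize(x):
    """Return (x - mean) / std, using only functions from x's own namespace."""
    # A compliant array exposes __array_namespace__(), which returns the
    # namespace (module) of the library that produced it.
    xp = x.__array_namespace__()
    mu = xp.mean(x)
    sigma = xp.std(x)  # population std, matching the standard's default correction=0
    return (x - mu) / sigma


# --- Hypothetical toy library, for illustration only ---
class ToyArray:
    """A deliberately tiny 1-D 'array' supporting just enough for the demo."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def __array_namespace__(self, /, *, api_version=None):
        # Hand back the toy namespace defined below.
        return toy_xp

    def __sub__(self, other):
        # Elementwise subtraction of a scalar.
        return ToyArray(v - other for v in self.values)

    def __truediv__(self, other):
        # Elementwise division by a scalar.
        return ToyArray(v / other for v in self.values)


def _mean(x):
    return sum(x.values) / len(x.values)


def _std(x):
    m = _mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x.values) / len(x.values))


toy_xp = SimpleNamespace(mean=_mean, std=_std)

print(standardize(ToyArray([1, 2, 3])).values)
```

Because standardize never imports a specific library, it should also accept a NumPy array unchanged: since NumPy 2.0, numpy.ndarray implements __array_namespace__(), and numpy.std defaults to the same population estimate (ddof=0).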
Blosc2 and the array API
Blosc2 has been developed with the array API in mind from an early stage, but only now has ironArray been able to dedicate development time to integration efforts. While the standard and its test suite are still evolving (the latest version of the standard was released in December 2024), they are sufficiently stable to form the basis for ironArray's work.