Btune: Making Compression Better
There is no one-size-fits-all solution for data compression. Depending on the type of data (e.g. numeric, text), its structure, intended use, acquisition context, and other features, the optimal combination of compression parameters will differ substantially, and choosing a suboptimal combination could render an application unusable. Worse, owing to phenomena such as concept drift, which are ubiquitous in data science, the optimal compression parameters may change over time.
All this causes quite a headache for data handlers, since searching the parameter space is time-consuming. There is a clear need to automate the process, and also establish quantifiable measures to determine what constitutes the best compression in a given context. Fortunately, ironArray has developed a powerful tool to respond to this need: Btune.
What Is Btune?
Btune is a Blosc2 plugin that uses machine- and deep-learning techniques to find the optimal compression parameters for your datasets. We offer three tiers of the tool to our clients:
- Btune Community: Free for personal use. Uses a genetic algorithm to test parameter combinations and find the best settings for your dataset.
- Btune Models: A commercial license for workgroups: using your sample data, we train and deploy a neural network model optimized for it. Best for workgroups with limited data variety.
- Btune Studio: A commercial license that includes our training software, giving you full control to create your own models on-site for unlimited datasets. Best for organizations that need to optimize a wide range of data.
Why Btune?
The main trade-off axis for data compression is between compression ratio and speed. For example, high-speed data acquisition prioritizes fast compression, while frequently accessed datasets benefit from fast decompression. Btune helps you optimize for what matters most.
The following figures illustrate these trade-offs for different codecs and filters using chunks of weather data:
And here, the different codecs and filters are compared in terms of compression ratio:
With Btune, you can find the optimal combination of compression parameters (in the Pareto sense) for your datasets, allowing you to achieve the best possible compression ratio and speed for your specific needs.
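The ratio-versus-speed trade-off that Btune navigates can be sketched with a toy parameter sweep. The snippet below uses the standard-library zlib codec purely as an illustration (Btune itself searches the much larger Blosc2 space of codecs, filters, and levels, per chunk):

```python
# Illustrative only: sweep a tiny parameter space (zlib levels) to show
# the compression-ratio vs. speed trade-off that Btune automates for
# the much larger Blosc2 codec/filter space.
import time
import zlib

# Repetitive "weather-like" bytes stand in for a real data chunk.
data = b"temperature,humidity,pressure\n" * 20000

results = []
for level in (1, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    results.append((level, len(data) / len(compressed), elapsed))

for level, ratio, elapsed in results:
    print(f"level={level}  ratio={ratio:.1f}x  time={elapsed * 1000:.2f} ms")
```

Higher levels generally compress better but take longer; which point on that curve is "best" depends on your workload, which is exactly the judgment Btune automates.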
How To Use
Ready to optimize your compression? Getting started with Btune is simple. Install the plugin directly from PyPI:
pip install blosc2-btune
This single plugin supports both Btune Community and Btune Models. For detailed instructions, check out the Btune README, or contact us. To use Btune Studio, you will need additional software for on-site model training; please contact us to get set up.
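In practice, the plugin is driven by environment variables that Blosc2 reads when Btune is active. The names below follow the blosc2-btune README; treat the exact values as illustrative assumptions, and see the README for the full list:

```python
# Configure Btune through environment variables before compressing.
# (Variable names per the blosc2-btune README; values are examples.)
import os

# 0.0 favors speed, 1.0 favors compression ratio.
os.environ["BTUNE_TRADEOFF"] = "0.5"
# Optimize for compression (COMP), decompression (DECOMP), or BALANCED.
os.environ["BTUNE_PERF_MODE"] = "DECOMP"
# For Btune Models: directory containing the trained model files.
os.environ["BTUNE_MODELS_DIR"] = "./models/"

# With the variables set, importing blosc2_btune registers the tuner,
# and subsequent Blosc2 compression calls pick parameters automatically:
#   import blosc2
#   import blosc2_btune
#   arr = blosc2.asarray(my_numpy_array, urlpath="data.b2nd")
```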
Currently, the Btune plugin is available for Linux and macOS on Intel architectures, with support for more platforms coming soon.
Explore our hands-on tutorials to see Btune in action.
NB: to complete the Studio tutorial, you will need to contact us to obtain additional software for model training.
What's in a Model?
Neural networks have proven to be very effective parametrized models for learning from a wide variety of data. During "training", the neural network is repeatedly fed examples of paired inputs/outputs (e.g. compression parameters and data features/compression ratio) and adjusts automatically to better predict outputs from inputs in succeeding training steps. Once trained, it can quickly predict outputs on new, unseen input data.
The product of this training process is a "model", saved as a set of small files (in JSON and TensorFlow formats). You can place these files anywhere on your system for Btune to use. Btune leverages the model to instantly predict the best compression parameters for each chunk of your data, based on its characteristics. This rapid prediction is ideal for optimizing compression while handling large data streams.
A Starry Example
The figures below illustrate Btune's optimization for decompression speed on a 7.3 TB subset of the Gaia dataset. The first image shows the predicted optimal codec and filter combinations for this task, as a function of the desired trade-off between decompression speed and degree of compression.
The following image plots the I/O speed (in GB/s) achieved when accessing multiple multidimensional slices of the Gaia dataset along different axes when using these combinations (higher values are better).
The results show that the codec/compression level combinations BloscLZ (level 5) and Zstd (level 9) are fastest. Since their performance is not heavily dependent on the number of threads, they perform well even on machines with fewer CPU cores.
Finally, the last figure compares the resulting file sizes (in GB); lower values are better.
In this case, the trained model recommends Zstd (level 9) for a good balance between file size and decompression speed. While adding the BitShuffle filter achieves the highest compression ratio, it is not recommended for general use.
For more details, see our paper for SciPy 2023 (slides). The data and scripts are also available on GitHub.
Pricing
Visit our pricing page for more information on the different licensing options available for Btune.
Testimonials
Blosc2 and Btune are fantastic tools that allow us to efficiently compress and load large volumes of data for the development of AI algorithms for clinical applications. In particular, the new NDarray structure became immensely useful when dealing with large spectral video sequences.
-- Leonardo Ayala, Div. Intelligent Medical Systems, German Cancer Research Center (DKFZ)
Btune is a simple and highly effective tool. We tried this out with @LEAPSinitiative data and found some super useful spots in the parameter space of Blosc2 compression arguments! Awesome work, @Blosc and @ironArray teams!
-- Peter Steinbach, Helmholtz AI Consultants Team Lead for Matter Research @HZDR_Dresden
Contact
If you are interested in Btune and have any further questions, please contact us at contact@ironarray.io.