Dataset class

A dataset is a Blosc2-encoded file in a root repository (and therefore a File) that represents either a flat string of bytes or an n-dimensional array.

class caterva2.Dataset(root, path)

Bases: File

Attributes:
blocks

The blockshape of the compressed dataset.

chunks

The chunkshape of the compressed dataset.

dtype

The data type of the dataset.

shape

The shape of the dataset.

vlmeta

Returns a mapping of metalayer names to their respective values.

Methods:

append(data)

Appends data to the dataset.

concatenate(srcs, dst, axis)

Concatenate the file with srcs along axis to a new location dst.

copy(dst)

Copies the file to a new location.

download([localpath])

Downloads the file to local storage.

get_download_url()

Retrieves the download URL for the file.

move(dst)

Moves the file to a new location.

remove()

Removes the file from the remote repository.

slice(key[, as_blosc2])

Get a slice of a File/Dataset.

stack(srcs, dst, axis)

Stack the file with srcs along a new axis to a new location dst.

unfold()

Unfolds the file into a remote directory.

Special Methods:

__init__(root, path)

Represents a dataset within a Blosc2 container.

__getitem__(item)

Retrieves a slice of the dataset.

Constructor

__init__(root, path)

Represents a dataset within a Blosc2 container.

This class is not intended to be instantiated directly; it should be accessed through a Root instance.

Parameters:
  • root (Root) – The root repository.

  • path (str) – The path of the dataset.

Examples

>>> import caterva2 as cat2
>>> client = cat2.Client('https://demo.caterva2.net')
>>> root = client.get('example')
>>> ds = root['ds-1d.b2nd']
>>> ds.dtype
'int64'
>>> ds.shape
(1000,)
>>> ds.chunks
(100,)
>>> ds.blocks
(10,)
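
The chunk and block shapes above describe Blosc2's two-level partitioning: the array is split into chunks, and each chunk is split into blocks. As a rough illustration in plain Python (grid and count are hypothetical helpers, not caterva2 API), the number of partitions along each dimension is a ceiling division, because partitions at the edges may be only partly filled:

```python
import math


def grid(shape, partition):
    """Number of partitions along each dimension (integer ceiling division)."""
    return tuple((s + p - 1) // p for s, p in zip(shape, partition))


def count(shape, partition):
    """Total number of partitions needed to cover `shape`."""
    return math.prod(grid(shape, partition))


# Values reported above for ds-1d.b2nd
shape, chunks, blocks = (1000,), (100,), (10,)

print(count(shape, chunks))   # 10 chunks cover the whole array
print(count(chunks, blocks))  # 10 blocks fit in each chunk

# Hypothetical 2-D case where the shape is not a multiple of the chunkshape
print(grid((1000, 250), (100, 100)))  # (10, 3): the last column of chunks is partial
```

For ds-1d.b2nd this gives 10 chunks of 10 blocks each; when a dimension is not a multiple of the chunkshape, the last chunk along that axis covers only the remainder.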

Utility Methods

__getitem__(item)

Retrieves a slice of the dataset.

Parameters:

item (int, slice, tuple of ints and slices, or None) – Specifies the slice to fetch.

Returns:

The requested slice of the dataset.

Return type:

numpy.ndarray

Examples

>>> import caterva2 as cat2
>>> client = cat2.Client('https://demo.caterva2.net')
>>> root = client.get('example')
>>> ds = root['ds-1d.b2nd']
>>> ds[1]
array(1)
>>> ds[:1]
array([0])
>>> ds[0:10]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

append(data)

Appends data to the dataset.

Parameters:

data (blosc2.NDArray, numpy.ndarray, sequence) – The data to append to the dataset.

Returns:

The new shape of the dataset.

Return type:

tuple

Examples

>>> import caterva2 as cat2
>>> # To append data to a dataset you need to be a registered user
>>> client = cat2.Client("https://cat2.cloud/demo", ("joedoe@example.com", "foobar"))
>>> path = client.copy('@public/examples/ds-1d.b2nd', '@personal/ds-1d.b2nd')
>>> dataset = client.get('@personal')['ds-1d.b2nd']
>>> dataset.append([1, 2, 3])
(1003,)
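
The returned shape follows from simple arithmetic. The sketch below (a hypothetical helper in plain Python) reproduces the (1003,) result, assuming that data is appended along the first axis and that any trailing dimensions must match those of the dataset; this page only shows the 1-D case, so treat the n-dimensional rule as an assumption.

```python
def shape_after_append(shape, data_shape):
    """Expected dataset shape after appending data along the first axis.

    Hypothetical helper: assumes appends extend axis 0 and that the
    trailing dimensions of `data_shape` match those of `shape`.
    """
    shape, data_shape = tuple(shape), tuple(data_shape)
    if data_shape[1:] != shape[1:]:
        raise ValueError("trailing dimensions of the data must match the dataset")
    return (shape[0] + data_shape[0],) + shape[1:]


# ds-1d.b2nd has shape (1000,); appending [1, 2, 3] adds 3 elements
print(shape_after_append((1000,), (len([1, 2, 3]),)))  # (1003,)
```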

concatenate(srcs, dst, axis)

Concatenate the file with srcs along axis to a new location dst.

Parameters:
  • srcs (list of Paths) – Source files to be concatenated with the current file.

  • dst (Path) – The destination path for the concatenated file.

  • axis (int) – Axis along which to concatenate.

Returns:

The new path of the concatenated file.

Return type:

Path

Examples

>>> import caterva2 as cat2
>>> # For concatenating a file you need to be a registered user
>>> client = cat2.Client("https://cat2.cloud/demo", ("joedoe@example.com", "foobar"))
>>> root = client.get('@personal')
>>> root.upload('root-example/dir2/ds-4d.b2nd', "a.b2nd")
<Dataset: @personal/a.b2nd>
>>> root.upload('root-example/dir2/ds-4d.b2nd', "b.b2nd")
<Dataset: @personal/b.b2nd>
>>> file = root['a.b2nd']
>>> file.concatenate('@personal/b.b2nd', '@personal/c.b2nd', axis=0)
PurePosixPath('@personal/c.b2nd')
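
The shape of the concatenated result can be worked out in advance. This sketch assumes File.concatenate follows numpy.concatenate semantics (the inputs must agree on every dimension except axis, and the sizes along axis are summed). The helper name and the 4-D shape (2, 3, 4, 5) are hypothetical, since this page does not state the shape of ds-4d.b2nd.

```python
def concatenated_shape(shapes, axis):
    """Shape obtained by joining arrays of the given shapes along `axis`.

    Hypothetical helper mirroring numpy.concatenate shape rules.
    """
    shapes = [tuple(s) for s in shapes]
    first = shapes[0]
    ndim = len(first)
    if not -ndim <= axis < ndim:
        raise ValueError("axis out of range")
    axis %= ndim  # normalize negative axes
    for s in shapes[1:]:
        # Every dimension except `axis` must agree
        if len(s) != ndim or s[:axis] + s[axis + 1:] != first[:axis] + first[axis + 1:]:
            raise ValueError("shapes differ outside the concatenation axis")
    total = sum(s[axis] for s in shapes)
    return first[:axis] + (total,) + first[axis + 1:]


# Two copies of a hypothetical 4-D dataset, as in the example above
print(concatenated_shape([(2, 3, 4, 5), (2, 3, 4, 5)], axis=0))   # (4, 3, 4, 5)
print(concatenated_shape([(2, 3, 4, 5), (2, 3, 4, 5)], axis=-1))  # (2, 3, 4, 10)
```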

copy(dst)

Copies the file to a new location.

Parameters:

dst (Path) – The destination path for the file.

Returns:

The new path of the copied file.

Return type:

Path

Examples

>>> import caterva2 as cat2
>>> # For copying a file you need to be a registered user
>>> client = cat2.Client("https://cat2.cloud/demo", ("joedoe@example.com", "foobar"))
>>> root = client.get('@personal')
>>> root.upload('root-example/dir2/ds-4d.b2nd')
<Dataset: @personal/root-example/dir2/ds-4d.b2nd>
>>> file = root['root-example/dir2/ds-4d.b2nd']
>>> file.copy('@personal/root-example/dir2/ds-4d-copy.b2nd')
PurePosixPath('@personal/root-example/dir2/ds-4d-copy.b2nd')
>>> 'root-example/dir2/ds-4d.b2nd' in root
True
>>> 'root-example/dir2/ds-4d-copy.b2nd' in root
True

download(localpath=None)

Downloads the file to local storage.

Parameters:

localpath (Path, optional) – The destination path for the downloaded file. If not specified, the file is saved under the current working directory at a relative path that mirrors its remote path (for example, example/ds-1d.b2nd).

Returns:

The path to the downloaded file.

Return type:

Path

Examples

>>> import caterva2 as cat2
>>> client = cat2.Client('https://demo.caterva2.net')
>>> root = client.get('example')
>>> file = root['ds-1d.b2nd']
>>> file.download()
PosixPath('example/ds-1d.b2nd')
>>> file.download('mydir/myarray.b2nd')
PosixPath('mydir/myarray.b2nd')

get_download_url()

Retrieves the download URL for the file.

Returns:

The file’s download URL.

Return type:

str

Examples

>>> import caterva2 as cat2
>>> client = cat2.Client('https://demo.caterva2.net')
>>> root = client.get('example')
>>> file = root['ds-1d.b2nd']
>>> file.get_download_url()
'https://demo.caterva2.net/api/fetch/example/ds-1d.b2nd'

move(dst)

Moves the file to a new location.

Parameters:

dst (Path) – The destination path for the file.

Returns:

The new path of the file after the move.

Return type:

Path

Examples

>>> import caterva2 as cat2
>>> # For moving a file you need to be a registered user
>>> client = cat2.Client("https://cat2.cloud/demo", ("joedoe@example.com", "foobar"))
>>> root = client.get('@personal')
>>> root.upload('root-example/dir2/ds-4d.b2nd')
<Dataset: @personal/root-example/dir2/ds-4d.b2nd>
>>> file = root['root-example/dir2/ds-4d.b2nd']
>>> file.move('@personal/root-example/dir1/ds-4d-moved.b2nd')
PurePosixPath('@personal/root-example/dir1/ds-4d-moved.b2nd')
>>> 'root-example/dir2/ds-4d.b2nd' in root
False
>>> 'root-example/dir1/ds-4d-moved.b2nd' in root
True

remove()

Removes the file from the remote repository.

Returns:

The path of the removed file.

Return type:

str

Examples

>>> import caterva2 as cat2
>>> # To remove a file you need to be a registered user
>>> client = cat2.Client('https://cat2.cloud/demo', ("joedoe@example.com", "foobar"))
>>> root = client.get('@personal')
>>> path = 'root-example/dir2/ds-4d.b2nd'
>>> root.upload(path)
<Dataset: @personal/root-example/dir2/ds-4d.b2nd>
>>> file = root[path]
>>> file.remove()
'@personal/root-example/dir2/ds-4d.b2nd'
>>> path in root
False

slice(key: int | slice | Sequence[slice], as_blosc2: bool = True) → NDArray | SChunk | ndarray

Get a slice of a File/Dataset.

Parameters:
  • key (int, slice, or sequence of slices) – The slice to retrieve. If a single slice is provided, it will be applied to the first dimension. If a sequence of slices is provided, each slice will be applied to the corresponding dimension.

  • as_blosc2 (bool) – If True (default), the result will be returned as a Blosc2 object (either a SChunk or NDArray). If False, it will be returned as a NumPy array (equivalent to self[key]).
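
The documented key semantics (an int or a single slice applies to the first dimension; a sequence applies one entry per dimension) can be sketched as a small normalization step. normalize_key is a hypothetical helper written for illustration, not part of caterva2, and it assumes that dimensions not covered by key are taken in full, as in NumPy.

```python
def normalize_key(key, ndim):
    """Expand `key` into a tuple with one entry per dimension.

    Hypothetical helper: an int or a single slice applies to the first
    dimension; a sequence applies its items to consecutive dimensions.
    Dimensions not covered by `key` are taken in full (an assumption).
    """
    if isinstance(key, (int, slice)):
        key = (key,)
    key = tuple(key)
    if len(key) > ndim:
        raise IndexError(f"too many indices: got {len(key)}, dataset has {ndim} dimensions")
    # Pad the remaining dimensions with full slices
    return key + (slice(None),) * (ndim - len(key))


print(normalize_key(slice(0, 10), 3))
# (slice(0, 10, None), slice(None, None, None), slice(None, None, None))
print(normalize_key([slice(0, 2), slice(5, 8)], 2))
# (slice(0, 2, None), slice(5, 8, None))
```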

Returns:

The requested slice: a new Blosc2 object (NDArray or SChunk) if as_blosc2 is True, or a NumPy array otherwise.

Return type:

NDArray or SChunk or numpy.ndarray

Examples

>>> import caterva2 as cat2
>>> client = cat2.Client('https://demo.caterva2.net')
>>> root = client.get('example')
>>> ds = root['ds-1d.b2nd']
>>> ds.slice(1)
<blosc2.ndarray.NDArray object at 0x10747efd0>
>>> ds.slice(1)[()]
array(1)
>>> ds.slice(slice(0, 10))[:]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

stack(srcs, dst, axis)

Stack the file with srcs along a new axis to a new location dst.

Parameters:
  • srcs (list of Paths) – Source files to be stacked with the current file.

  • dst (Path) – The destination path for the stacked file.

  • axis (int) – Axis along which to stack.

Returns:

The new path of the stacked file.

Return type:

Path

Examples

>>> import caterva2 as cat2
>>> # For stacking a file you need to be a registered user
>>> client = cat2.Client("https://cat2.cloud/demo", ("joedoe@example.com", "foobar"))
>>> root = client.get('@personal')
>>> root.upload('root-example/dir2/ds-4d.b2nd', "a.b2nd")
<Dataset: @personal/a.b2nd>
>>> root.upload('root-example/dir2/ds-4d.b2nd', "b.b2nd")
<Dataset: @personal/b.b2nd>
>>> file = root['a.b2nd']
>>> file.stack('@personal/b.b2nd', '@personal/c.b2nd', axis=0)
PurePosixPath('@personal/c.b2nd')
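
Unlike concatenate, stack adds a dimension. Assuming File.stack follows numpy.stack semantics (all inputs share one shape, and axis is the position of the new dimension in the result), the resulting shape can be sketched as follows; the helper and the 4-D shape (2, 3, 4, 5) are hypothetical.

```python
def stacked_shape(shapes, axis):
    """Shape obtained by stacking equally shaped arrays along a new axis.

    Hypothetical helper mirroring numpy.stack shape rules.
    """
    shapes = [tuple(s) for s in shapes]
    first = shapes[0]
    if any(s != first for s in shapes[1:]):
        raise ValueError("all inputs must have the same shape")
    ndim = len(first) + 1  # the result has one extra dimension
    if not -ndim <= axis < ndim:
        raise ValueError("axis out of range")
    axis %= ndim  # normalize negative axes
    # Insert the number of inputs as the new dimension at position `axis`
    return first[:axis] + (len(shapes),) + first[axis:]


# Two copies of a hypothetical 4-D dataset, as in the example above
print(stacked_shape([(2, 3, 4, 5), (2, 3, 4, 5)], axis=0))   # (2, 2, 3, 4, 5)
print(stacked_shape([(2, 3, 4, 5), (2, 3, 4, 5)], axis=-1))  # (2, 3, 4, 5, 2)
```

Compared with concatenating the same two inputs along axis 0, which would give (4, 3, 4, 5), stacking keeps each input intact as one slice of the new leading dimension.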

unfold()

Unfolds the file into a remote directory.

Returns:

The path to the unfolded directory.

Return type:

Path

Examples

>>> import caterva2 as cat2
>>> client = cat2.Client('https://demo.caterva2.net')
>>> root = client.get('example')
>>> file = root['ds-1d.h5']
>>> file.unfold()
PurePosixPath('example/ds-1d.h5')
property blocks

The blockshape of the compressed dataset.

property chunks

The chunkshape of the compressed dataset.

property dtype

The data type of the dataset.

property shape

The shape of the dataset.

property vlmeta

Returns a mapping of metalayer names to their respective values.

This is used to access variable-length metalayers (user attributes) associated with the file.

>>> import caterva2 as cat2
>>> client = cat2.Client('https://demo.caterva2.net')
>>> root = client.get('example')
>>> file = root['ds-sc-attr.b2nd']
>>> file.vlmeta
{'a': 1, 'b': 'foo', 'c': 123.456}