Dataset class
A dataset is a Blosc2-encoded file in a root repository (and therefore a File) that represents either a flat string of bytes or an n-dimensional array.
- class caterva2.Dataset(root, path)
Bases: File, Operand
- Attributes:
blocks: The blockshape of the compressed dataset.
chunks: The chunkshape of the compressed dataset.
device: Hardware device the array data resides on.
dtype: The data type of the dataset.
info: Get information about the Operand.
ndim: Get the number of dimensions of the Operand.
shape: The shape of the dataset.
vlmeta: Returns a mapping of metalayer names to their respective values.
- Methods:
all([axis, keepdims]): Test whether all array elements along a given axis evaluate to True.
any([axis, keepdims]): Test whether any array element along a given axis evaluates to True.
append(data): Appends data to the dataset.
argmax([axis, keepdims]): Returns the indices of the maximum values along a specified axis.
argmin([axis, keepdims]): Returns the indices of the minimum values along a specified axis.
copy(dst): Copies the file to a new location.
download([localpath]): Downloads the file to storage.
get_download_url(): Retrieves the download URL for the file.
item(): Copy an element of an array to a standard Python scalar and return it.
max([axis, keepdims]): Return the maximum along a given axis.
mean([axis, dtype, keepdims]): Return the arithmetic mean along the specified axis.
min([axis, keepdims]): Return the minimum along a given axis.
move(dst): Moves the file to a new location.
prod([axis, dtype, keepdims]): Return the product of array elements over a given axis.
remove(): Removes the file from the remote repository.
slice(key[, as_blosc2]): Get a slice of a File/Dataset.
std([axis, dtype, ddof, keepdims]): Return the standard deviation along the specified axis.
sum([axis, dtype, keepdims]): Return the sum of array elements over a given axis.
to_device(device): Copy the array from the device on which it currently resides to the specified device.
unfold(): Unfolds the file in a remote directory.
var([axis, dtype, ddof, keepdims]): Return the variance along the specified axis.
where([value1, value2]): Select value1 or value2 values based on True/False for self.
- Special Methods:
__init__(root, path): Represents a dataset within a Blosc2 container.
__getitem__(item): Retrieves a slice of the dataset.
Constructor
- __init__(root, path)
Represents a dataset within a Blosc2 container.
This class is not intended to be instantiated directly; it should be accessed through a Root instance.
Examples
>>> import caterva2 as cat2
>>> client = cat2.Client('https://demo.caterva2.net')
>>> root = client.get('example')
>>> ds = root['ds-1d.b2nd']
>>> ds.dtype
'int64'
>>> ds.shape
(1000,)
>>> ds.chunks
(100,)
>>> ds.blocks
(10,)
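The chunks and blocks properties describe a two-level partitioning. The array is split into chunks, and each chunk is split into blocks. The following pure-Python sketch uses a hypothetical helper, grid_counts (it is not part of caterva2), to turn those shapes into counts. Integer ceiling division ensures that a partially filled chunk at the edge of a dimension is still counted. The 2-D layout at the end is made up for illustration.

```python
import math


def grid_counts(shape, chunks, blocks):
    """Return (chunks per dimension, blocks per chunk per dimension).

    Integer ceiling division is used so that a partially filled chunk or
    block at the end of a dimension is still counted.
    """
    chunks_per_dim = tuple(-(-s // c) for s, c in zip(shape, chunks))
    blocks_per_chunk = tuple(-(-c // b) for c, b in zip(chunks, blocks))
    return chunks_per_dim, blocks_per_chunk


# Values reported for ds-1d.b2nd in the example above.
print(grid_counts((1000,), (100,), (10,)))  # ((10,), (10,))

# A hypothetical 2-D layout whose shape is not a multiple of the chunkshape.
per_dim, per_chunk = grid_counts((250, 90), (100, 50), (25, 10))
print(per_dim, per_chunk)  # (3, 2) (4, 5)
print(math.prod(per_dim))  # 6 chunks in total
```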
Utility Methods
- __getitem__(item)
Retrieves a slice of the dataset.
- Parameters:
item (int, slice, tuple of ints and slices, or None) – Specifies the slice to fetch.
- Returns:
The requested slice of the dataset.
- Return type:
numpy.ndarray
Examples
>>> import caterva2 as cat2
>>> client = cat2.Client('https://demo.caterva2.net')
>>> root = client.get('example')
>>> ds = root['ds-1d.b2nd']
>>> ds[1]
array(1)
>>> ds[:1]
array([0])
>>> ds[0:10]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
- all(axis=None, keepdims=False, **kwargs)
Test whether all array elements along a given axis evaluate to True.
The parameters are documented in min().
- Returns:
all_along_axis – The result of the evaluation along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import numpy as np
>>> import blosc2
>>> data = np.array([True, True, False, True, True, True])
>>> ndarray = blosc2.asarray(data)
>>> # Test if all elements are True along the default axis (flattened array)
>>> result_flat = blosc2.all(ndarray)
>>> print("All elements are True (flattened):", result_flat)
All elements are True (flattened): False
- any(axis=None, keepdims=False, **kwargs)
Test whether any array element along a given axis evaluates to True.
The parameters are documented in min().
- Returns:
any_along_axis – The result of the evaluation along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import blosc2
>>> import numpy as np
>>> data = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 0]])
>>> # Convert the NumPy array to a Blosc2 NDArray
>>> ndarray = blosc2.asarray(data)
>>> print("NDArray data:", ndarray[:])
NDArray data: [[1 0 0]
 [0 1 0]
 [0 0 0]]
>>> any_along_axis_0 = blosc2.any(ndarray, axis=0)
>>> print("Any along axis 0:", any_along_axis_0)
Any along axis 0: [True True False]
>>> any_flattened = blosc2.any(ndarray)
>>> print("Any in the flattened array:", any_flattened)
Any in the flattened array: True
- append(data)
Appends data to the dataset.
- Parameters:
data (blosc2.NDArray, numpy.ndarray, sequence) – The data to append to the dataset.
- Returns:
The new shape of the dataset.
- Return type:
tuple
Examples
>>> import caterva2 as cat2
>>> import numpy as np
>>> # To append data to a dataset you need to be a registered user
>>> client = cat2.Client("https://cat2.cloud/demo", ("joedoe@example.com", "foobar"))
>>> data = client.copy('@public/examples/ds-1d.b2nd', '@personal/ds-1d.b2nd')
>>> dataset = client.get('@personal')['ds-1d.b2nd']
>>> dataset.append([1, 2, 3])
(1003,)
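In the example the shape grows from (1000,) to (1003,). This is consistent with appended data being added along the first axis. Assuming that behaviour holds for n-dimensional datasets too, the NumPy sketch below predicts the returned shape locally without contacting a server. The helper expected_shape_after_append is hypothetical and is not part of caterva2.

```python
import numpy as np


def expected_shape_after_append(shape, data):
    """Predict the dataset shape after appending `data`.

    Assumption (not confirmed by the caterva2 docs for n-D data): append()
    concatenates along axis 0, so every trailing dimension of `data` must
    match the dataset's trailing dimensions.
    """
    data = np.asarray(data)
    if data.shape[1:] != tuple(shape[1:]):
        raise ValueError(
            f"trailing dimensions {data.shape[1:]} do not match {tuple(shape[1:])}"
        )
    return (shape[0] + data.shape[0],) + tuple(shape[1:])


print(expected_shape_after_append((1000,), [1, 2, 3]))        # (1003,)
print(expected_shape_after_append((5, 4), np.zeros((2, 4))))  # (7, 4)
```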
- argmax(axis=None, keepdims=False, **kwargs)
Returns the indices of the maximum values along a specified axis.
When the maximum value occurs multiple times, only the indices corresponding to the first occurrence are returned.
- Parameters:
x (blosc2.Array) – Input array (the dataset itself when called as a method). Should have a real-valued data type.
axis (int | None) – Axis along which to search. If None, return the index of the maximum value of the flattened array. Default: None.
keepdims (bool) – If True, the reduced axis is included in the result as a singleton dimension; otherwise, it is not included. Default: False.
- Returns:
out – If axis is None, a zero-dimensional array containing the index of the first occurrence of the maximum value; otherwise, a non-zero-dimensional array containing the indices of the maximum values.
- Return type:
blosc2.Array
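The first-occurrence rule and the keepdims flag follow the same conventions as NumPy's numpy.argmax. The sketch below uses a local NumPy array rather than a caterva2 Dataset, purely to illustrate the semantics. The keepdims argument of numpy.argmax requires NumPy 1.22 or later.

```python
import numpy as np

a = np.array([[3, 7, 7],
              [9, 1, 9]])

# Flattened search (axis=None): the maximum 9 sits at flat positions 3 and 5,
# and only the first occurrence is reported.
flat_idx = np.argmax(a)
print(flat_idx)  # 3

# Convert the flat index back to (row, column) coordinates.
row, col = np.unravel_index(flat_idx, a.shape)
print(int(row), int(col))  # 1 0

# Along axis 1, row 0 ties at columns 1 and 2, so column 1 is returned.
print(np.argmax(a, axis=1))  # [1 0]

# keepdims=True keeps the reduced axis as a singleton dimension.
print(np.argmax(a, axis=1, keepdims=True).shape)  # (2, 1)
```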
- argmin(axis=None, keepdims=False, **kwargs)
Returns the indices of the minimum values along a specified axis.
When the minimum value occurs multiple times, only the indices corresponding to the first occurrence are returned.
- Parameters:
x (blosc2.Array) – Input array (the dataset itself when called as a method). Should have a real-valued data type.
axis (int | None) – Axis along which to search. If None, return the index of the minimum value of the flattened array. Default: None.
keepdims (bool) – If True, the reduced axis is included in the result as a singleton dimension; otherwise, it is not included. Default: False.
- Returns:
out – If axis is None, a zero-dimensional array containing the index of the first occurrence of the minimum value; otherwise, a non-zero-dimensional array containing the indices of the minimum values.
- Return type:
blosc2.Array
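A practical use of keepdims=True is that the index array keeps the input's number of dimensions. It can therefore be passed straight to numpy.take_along_axis to recover the minimum values. The NumPy sketch below runs on a local array, not on a Dataset, and needs NumPy 1.22 or later for the keepdims argument.

```python
import numpy as np

a = np.array([[4, 2, 6],
              [1, 2, 8],
              [1, 5, 0]])

# Column-wise minima. Ties (two 1s in column 0, two 2s in column 1)
# resolve to the first row in which the minimum appears.
idx = np.argmin(a, axis=0, keepdims=True)
print(idx)  # [[1 0 2]]

# Because the reduced axis is kept, the indices have the same ndim as `a`
# and can be used directly to gather the minimum values.
mins = np.take_along_axis(a, idx, axis=0)
print(mins)  # [[1 2 0]]
```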
- copy(dst)
Copies the file to a new location.
- Parameters:
dst (Path) – The destination path for the file.
- Returns:
The new path of the copied file.
- Return type:
Path
Examples
>>> import caterva2 as cat2
>>> import numpy as np
>>> # For copying a file you need to be a registered user
>>> client = cat2.Client("https://cat2.cloud/demo", ("joedoe@example.com", "foobar"))
>>> root = client.get('@personal')
>>> root.upload('root-example/dir2/ds-4d.b2nd')
<Dataset: @personal/root-example/dir2/ds-4d.b2nd>
>>> file = root['root-example/dir2/ds-4d.b2nd']
>>> file.copy('@personal/root-example/dir2/ds-4d-copy.b2nd')
PurePosixPath('@personal/root-example/dir2/ds-4d-copy.b2nd')
>>> 'root-example/dir2/ds-4d.b2nd' in root
True
>>> 'root-example/dir2/ds-4d-copy.b2nd' in root
True
- download(localpath=None)
Downloads the file to storage.
- Parameters:
localpath (Path, optional) – The destination path for the downloaded file. If not specified, the file is downloaded under the current working directory, keeping its path within the root (for example, example/ds-1d.b2nd).
- Returns:
The path to the downloaded file.
- Return type:
Path
Examples
>>> import caterva2 as cat2
>>> client = cat2.Client('https://demo.caterva2.net')
>>> root = client.get('example')
>>> file = root['ds-1d.b2nd']
>>> file.download()
PosixPath('example/ds-1d.b2nd')
>>> file.download('mydir/myarray.b2nd')
PosixPath('mydir/myarray.b2nd')
- get_download_url()
Retrieves the download URL for the file.
- Returns:
The file’s download URL.
- Return type:
str
Examples
>>> import caterva2 as cat2
>>> client = cat2.Client('https://demo.caterva2.net')
>>> root = client.get('example')
>>> file = root['ds-1d.b2nd']
>>> file.get_download_url()
'https://demo.caterva2.net/api/fetch/example/ds-1d.b2nd'
- item() -> float | bool | complex | int
Copy an element of an array to a standard Python scalar and return it.
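This description matches NumPy's numpy.ndarray.item(), which turns a one-element array into a built-in Python scalar. The NumPy sketch below illustrates the difference between a NumPy scalar and the plain Python value that item() returns. It uses a local array rather than a Dataset. The size-1 restriction shown at the end is NumPy's behaviour and is not something this page documents for caterva2.

```python
import numpy as np

arr = np.array([42], dtype=np.int64)

np_scalar = arr[0]      # numpy.int64, still a NumPy type
py_scalar = arr.item()  # plain Python int

print(type(np_scalar).__name__)  # int64
print(type(py_scalar).__name__)  # int

# In NumPy, item() without arguments raises ValueError when the array
# holds more than one element.
try:
    np.array([1, 2]).item()
except ValueError as exc:
    print("ValueError:", exc)
```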
- max(axis=None, keepdims=False, **kwargs)
Return the maximum along a given axis.
The parameters are documented in min().
- Returns:
max_along_axis – The maximum of the elements along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import blosc2
>>> import numpy as np
>>> data = np.array([[11, 2, 36, 24, 5, 69], [73, 81, 49, 6, 73, 0]])
>>> ndarray = blosc2.asarray(data)
>>> print("NDArray data:", ndarray[:])
NDArray data: [[11  2 36 24  5 69]
 [73 81 49  6 73  0]]
>>> # Compute the maximum along axis 0 and 1
>>> max_along_axis_0 = blosc2.max(ndarray, axis=0)
>>> print("Maximum along axis 0:", max_along_axis_0)
Maximum along axis 0: [73 81 49 24 73 69]
>>> max_along_axis_1 = blosc2.max(ndarray, axis=1)
>>> print("Maximum along axis 1:", max_along_axis_1)
Maximum along axis 1: [69 81]
>>> max_flattened = blosc2.max(ndarray)
>>> print("Maximum of the flattened array:", max_flattened)
Maximum of the flattened array: 81
- mean(axis=None, dtype=None, keepdims=False, **kwargs)
Return the arithmetic mean along the specified axis.
The parameters are documented in sum().
- Returns:
mean_along_axis – The mean of the elements along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import numpy as np
>>> import blosc2
>>> # Example array
>>> array = np.array([[1, 2, 3], [4, 5, 6]])
>>> nd_array = blosc2.asarray(array)
>>> # Compute the mean of all elements in the array (axis=None)
>>> overall_mean = blosc2.mean(nd_array)
>>> print("Mean of all elements:", overall_mean)
Mean of all elements: 3.5
- min(axis=None, keepdims=False, **kwargs)
Return the minimum along a given axis.
- Parameters:
ndarr (NDArray or NDField or C2Array or LazyExpr) – The input array or expression.
axis (int or tuple of ints, optional) – Axis or axes along which to operate. By default, flattened input is used.
keepdims (bool, optional) – If set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
kwargs (dict, optional) – Keyword arguments that are supported by the empty() constructor.
- Returns:
min_along_axis – The minimum of the elements along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import numpy as np
>>> import blosc2
>>> array = np.array([1, 3, 7, 8, 9, 31])
>>> nd_array = blosc2.asarray(array)
>>> min_all = blosc2.min(nd_array)
>>> print("Minimum of all elements in the array:", min_all)
Minimum of all elements in the array: 1
>>> # Compute the minimum along axis 0 with keepdims=True
>>> min_keepdims = blosc2.min(nd_array, axis=0, keepdims=True)
>>> print("Minimum along axis 0 with keepdims=True:", min_keepdims)
Minimum along axis 0 with keepdims=True: [1]
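The keepdims description says the result "will broadcast correctly against the input array". The NumPy sketch below shows what that buys on a 2-D array, using NumPy's own min as a local stand-in. Without keepdims the row minima would have shape (2,), which does not broadcast against a (2, 3) array. With keepdims=True they have shape (2, 1) and can be subtracted row by row.

```python
import numpy as np

a = np.array([[5, 9, 7],
              [3, 8, 4]])

# Row minima with keepdims=True have shape (2, 1) instead of (2,) ...
row_min = a.min(axis=1, keepdims=True)
print(row_min.shape)  # (2, 1)

# ... so they broadcast against the (2, 3) input, subtracting each row's
# own minimum from that row.
print(a - row_min)
# [[0 4 2]
#  [0 5 1]]
```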
- move(dst)
Moves the file to a new location.
- Parameters:
dst (Path) – The destination path for the file.
- Returns:
The new path of the file after the move.
- Return type:
Path
Examples
>>> import caterva2 as cat2
>>> # For moving a file you need to be a registered user
>>> client = cat2.Client("https://cat2.cloud/demo", ("joedoe@example.com", "foobar"))
>>> root = client.get('@personal')
>>> root.upload('root-example/dir2/ds-4d.b2nd')
<Dataset: @personal/root-example/dir2/ds-4d.b2nd>
>>> file = root['root-example/dir2/ds-4d.b2nd']
>>> file.move('@personal/root-example/dir1/ds-4d-moved.b2nd')
PurePosixPath('@personal/root-example/dir1/ds-4d-moved.b2nd')
>>> 'root-example/dir2/ds-4d.b2nd' in root
False
>>> 'root-example/dir1/ds-4d-moved.b2nd' in root
True
- prod(axis=None, dtype=None, keepdims=False, **kwargs)
Return the product of array elements over a given axis.
The parameters are documented in sum().
- Returns:
product_along_axis – The product of the elements along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import numpy as np
>>> import blosc2
>>> # Create an instance of NDArray with some data
>>> array = np.array([[11, 22, 33], [4, 15, 36]])
>>> nd_array = blosc2.asarray(array)
>>> # Compute the product of all elements in the array
>>> prod_all = blosc2.prod(nd_array)
>>> print("Product of all elements in the array:", prod_all)
Product of all elements in the array: 17249760
>>> # Compute the product along axis 1 (rows)
>>> prod_axis1 = blosc2.prod(nd_array, axis=1)
>>> print("Product along axis 1:", prod_axis1)
Product along axis 1: [7986 2160]
- remove()
Removes the file from the remote repository.
- Returns:
The path of the removed file.
- Return type:
str
Examples
>>> import caterva2 as cat2
>>> import numpy as np
>>> # To remove a file you need to be a registered user
>>> client = cat2.Client('https://cat2.cloud/demo', ("joedoe@example.com", "foobar"))
>>> root = client.get('@personal')
>>> path = 'root-example/dir2/ds-4d.b2nd'
>>> root.upload(path)
<Dataset: @personal/root-example/dir2/ds-4d.b2nd>
>>> file = root[path]
>>> file.remove()
'@personal/root-example/dir2/ds-4d.b2nd'
>>> path in root
False
- slice(key: int | slice | Sequence[slice], as_blosc2: bool = True) -> NDArray | SChunk | ndarray
Get a slice of a File/Dataset.
- Parameters:
key (int, slice, or sequence of slices) – The slice to retrieve. If a single slice is provided, it is applied to the first dimension. If a sequence of slices is provided, each slice is applied to the corresponding dimension.
as_blosc2 (bool) – If True (default), the result is returned as a Blosc2 object (either an SChunk or an NDArray). If False, it is returned as a NumPy array (equivalent to self[key]).
- Returns:
The requested slice, as a Blosc2 object if as_blosc2 is True, or as a NumPy array otherwise.
- Return type:
NDArray or SChunk or numpy.ndarray
Examples
>>> import caterva2 as cat2
>>> client = cat2.Client('https://demo.caterva2.net')
>>> root = client.get('example')
>>> ds = root['ds-1d.b2nd']
>>> ds.slice(1)
<blosc2.ndarray.NDArray object at 0x10747efd0>
>>> ds.slice(1)[()]
array(1)
>>> ds.slice(slice(0, 10))[:]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
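The key rules above match ordinary NumPy basic indexing: a single slice applies to the first dimension, and a sequence of slices applies one slice per dimension. The sketch below uses a local NumPy array as a stand-in for a 2-D dataset to show that correspondence. It is an assumption that a server-side slice() call treats every key form identically, so check against your caterva2 version. Note that NumPy itself requires the sequence of slices to be a tuple.

```python
import numpy as np

# Local stand-in for a 2-D dataset; no server is involved.
a = np.arange(20).reshape(4, 5)

# A single slice applies to the first dimension (rows 1 and 2, all columns).
print(a[slice(1, 3)].shape)  # (2, 5)

# A sequence of slices applies one slice per dimension.
# NumPy needs this sequence to be a tuple, not a list.
key = (slice(1, 3), slice(0, 2))
print(a[key])
# [[ 5  6]
#  [10 11]]
```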
- std(axis=None, dtype=None, ddof=0, keepdims=False, **kwargs)
Return the standard deviation along the specified axis.
- Parameters:
ndarr (NDArray or NDField or C2Array or LazyExpr) – The input array or expression.
axis (int or tuple of ints, optional) – Axis or axes along which the standard deviation is computed. By default, axis=None computes the standard deviation of the flattened array.
dtype (np.dtype or list str, optional) – Type to use in computing the standard deviation. For integer inputs, the default is float32; for floating point inputs, it is the same as the input dtype.
ddof (int, optional) – Delta Degrees of Freedom. The divisor used in the calculation is N - ddof, where N is the number of elements. By default, ddof is zero.
keepdims (bool, optional) – If set to True, the reduced axes are left in the result as dimensions with size one. This ensures that the result will broadcast correctly against the input array.
kwargs (dict, optional) – Additional keyword arguments that are supported by the empty() constructor.
- Returns:
std_along_axis – The standard deviation of the elements along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import numpy as np
>>> import blosc2
>>> # Create an instance of NDArray with some data
>>> array = np.array([[1, 2, 3], [4, 5, 6]])
>>> nd_array = blosc2.asarray(array)
>>> # Compute the standard deviation of the entire array
>>> std_all = blosc2.std(nd_array)
>>> print("Standard deviation of the entire array:", std_all)
Standard deviation of the entire array: 1.707825127659933
>>> # Compute the standard deviation along axis 0 (columns)
>>> std_axis0 = blosc2.std(nd_array, axis=0)
>>> print("Standard deviation along axis 0:", std_axis0)
Standard deviation along axis 0: [1.5 1.5 1.5]
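The ddof parameter only changes the divisor, N - ddof. The pure-Python sketch below writes out that formula for a flat sequence, using a hypothetical function named std_ddof that is not part of caterva2 or blosc2. With ddof=0 it reproduces the flattened result printed above. With ddof=1 it gives the sample standard deviation.

```python
import math


def std_ddof(values, ddof=0):
    """Standard deviation of a flat sequence with divisor N - ddof."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("ddof must be smaller than the number of elements")
    mean = sum(values) / n
    squared_dev = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_dev / (n - ddof))


data = [1, 2, 3, 4, 5, 6]  # the example array above, flattened
print(std_ddof(data))          # ~1.7078, population (ddof=0), as above
print(std_ddof(data, ddof=1))  # ~1.8708, sample (ddof=1)
```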
- sum(axis=None, dtype=None, keepdims=False, **kwargs)
Return the sum of array elements over a given axis.
- Parameters:
ndarr (NDArray or NDField or C2Array or LazyExpr) – The input array or expression.
axis (int or tuple of ints, optional) – Axis or axes along which a sum is performed. By default (axis=None), all the elements of the input array are summed. If axis is negative, it counts from the last to the first axis.
dtype (np.dtype or list str, optional) – The type of the returned array and of the accumulator in which the elements are summed. The dtype of ndarr is used by default unless it has an integer dtype of less precision than the default platform integer.
keepdims (bool, optional) – If set to True, the reduced axes are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
kwargs (dict, optional) – Additional keyword arguments supported by the empty() constructor.
- Returns:
sum_along_axis – The sum of the elements along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import numpy as np
>>> import blosc2
>>> # Example array
>>> array = np.array([[1, 2, 3], [4, 5, 6]])
>>> nd_array = blosc2.asarray(array)
>>> # Sum all elements in the array (axis=None)
>>> total_sum = blosc2.sum(nd_array)
>>> print("Sum of all elements:", total_sum)
Sum of all elements: 21
>>> # Sum along axis 0 (columns)
>>> sum_axis_0 = blosc2.sum(nd_array, axis=0)
>>> print("Sum along axis 0 (columns):", sum_axis_0)
Sum along axis 0 (columns): [5 7 9]
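The default-accumulator rule for dtype matters for small integer types. The wording matches NumPy's numpy.sum, so the NumPy sketch below illustrates it on a local array rather than a Dataset. By default a uint8 input is summed in a wider unsigned integer. Forcing dtype=np.uint8 makes the accumulator itself uint8, so it wraps around modulo 256.

```python
import numpy as np

small = np.array([200, 100], dtype=np.uint8)

# Default: the uint8 input is accumulated in the wider platform unsigned
# integer, so the true total is preserved.
print(np.sum(small))  # 300

# Explicit dtype=np.uint8: the accumulator is uint8 and wraps around,
# giving 300 - 256.
print(np.sum(small, dtype=np.uint8))  # 44
```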
- to_device(device: str)
Copy the array from the device on which it currently resides to the specified device.
- unfold()
Unfolds the file in a remote directory.
- Returns:
The path to the unfolded directory.
- Return type:
Path
Examples
>>> import caterva2 as cat2
>>> client = cat2.Client('https://demo.caterva2.net')
>>> root = client.get('example')
>>> file = root['ds-1d.h5']
>>> file.unfold()
PurePosixPath('example/ds-1d.h5')
- var(axis=None, dtype=None, ddof=0, keepdims=False, **kwargs)
Return the variance along the specified axis.
The parameters are documented in std().
- Returns:
var_along_axis – The variance of the elements along the axis.
- Return type:
np.ndarray or NDArray or scalar
Examples
>>> import numpy as np
>>> import blosc2
>>> # Create an instance of NDArray with some data
>>> array = np.array([[1, 2, 3], [4, 5, 6]])
>>> nd_array = blosc2.asarray(array)
>>> # Compute the variance of the entire array
>>> var_all = blosc2.var(nd_array)
>>> print("Variance of the entire array:", var_all)
Variance of the entire array: 2.9166666666666665
>>> # Compute the variance along axis 0 (columns)
>>> var_axis0 = blosc2.var(nd_array, axis=0)
>>> print("Variance along axis 0:", var_axis0)
Variance along axis 0: [2.25 2.25 2.25]
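var shares its parameters with std, so ddof shifts the divisor in the same way, and the standard deviation is the square root of the variance for the same ddof. The NumPy sketch below checks both points on the example array, using numpy.var and numpy.std as local stand-ins for the blosc2 calls above.

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6]])

# Column-wise variance: population (ddof=0) versus sample (ddof=1).
# Each column has two values 1.5 away from its mean, so the squared
# deviations sum to 4.5; dividing by 2 or by 1 gives the results below.
print(np.var(array, axis=0))          # [2.25 2.25 2.25]
print(np.var(array, axis=0, ddof=1))  # [4.5 4.5 4.5]

# The standard deviation is the square root of the variance (same ddof).
print(np.allclose(np.std(array, ddof=1), np.sqrt(np.var(array, ddof=1))))  # True
```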
- where(value1=None, value2=None)
Select value1 or value2 values based on True/False for self: elements of value1 are taken where self is True, and elements of value2 where it is False.
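This selection reads like NumPy's numpy.where(condition, x, y), with self acting as the boolean condition. The NumPy sketch below illustrates that element-wise selection on local arrays. Treat the correspondence as an assumption: this page does not document what happens when value1 or value2 is left at its default of None.

```python
import numpy as np

cond = np.array([True, False, True, False])  # plays the role of `self`
value1 = np.array([10, 20, 30, 40])          # taken where cond is True
value2 = np.array([-1, -2, -3, -4])          # taken where cond is False

print(np.where(cond, value1, value2))  # [10 -2 30 -4]

# Scalars broadcast too: replace non-positive entries with zero.
x = np.array([3, -5, 0, -1])
print(np.where(x > 0, x, 0))  # [3 0 0 0]
```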
- property blocks
The blockshape of the compressed dataset.
- property chunks
The chunkshape of the compressed dataset.
- property device
Hardware device the array data resides on. Always equal to ‘cpu’.
- property dtype
The data type of the dataset.
- abstract property info: InfoReporter
Get information about the Operand.
- Returns:
out – A printable class with information about the Operand.
- Return type:
InfoReporter
- abstract property ndim: int
Get the number of dimensions of the Operand.
- Returns:
out – The number of dimensions of the Operand.
- Return type:
int
- property shape
The shape of the dataset.
- property vlmeta
Returns a mapping of metalayer names to their respective values.
This is used to access variable-length metalayers (user attributes) associated with the file.
>>> import caterva2 as cat2
>>> client = cat2.Client('https://demo.caterva2.net')
>>> root = client.get('example')
>>> file = root['ds-sc-attr.b2nd']
>>> file.vlmeta
{'a': 1, 'b': 'foo', 'c': 123.456}