Using the client APIs

To follow these instructions, make sure that you have started test Caterva2 services (see Launching Caterva2 services).

The top level client API

Let’s try Caterva2’s top level client API (fully described in Top level API) against the default subscriber at http://localhost:8002/ (you may specify a different one via the urlbase argument). Run your Python interpreter and enter:

import caterva2

roots = caterva2.get_roots()

Note: If the subscriber requires user authentication (and you get a 401 Unauthorized error), you may first get an authorization cookie with caterva2.api_utils.get_auth_cookie(), then pass the returned cookie to API functions as the auth_cookie keyword argument. For instance:

cookie = caterva2.api_utils.get_auth_cookie(
    'http://localhost:8002/',
    {'username': 'user@example.com', 'password': 'foobar'})
roots = caterva2.get_roots(auth_cookie=cookie)

We just connected to the subscriber and asked about all roots known by the broker. If you print roots you’ll see a dictionary with a foo entry:

{'foo': {'name': 'foo', 'http': 'localhost:8001', 'subscribed': None}}

Besides its name, the entry contains the address of the publisher providing the root, and an indication that we’re not subscribed to it. Until we subscribe, getting a list of datasets in that root with caterva2.get_list('foo') fails with 404 Not Found. So let’s subscribe to it first and then list its datasets:

caterva2.subscribe('foo')
datasets = caterva2.get_list('foo')

If you print datasets you’ll see a list of datasets in the foo root:

['ds-1d.b2nd', 'ds-hello.b2frame', 'ds-1d-b.b2nd', 'README.md',
 'dir1/ds-3d.b2nd', 'dir1/ds-2d.b2nd', 'dir2/ds-4d.b2nd']

(If you repeat the call to caterva2.get_roots() you’ll see that foo has subscribed=True now.)
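Since a root must be subscribed before its datasets can be listed, it can help to check which roots still need a subscription. The following is a minimal sketch that works on a plain dictionary with the same shape as the roots output shown above; the helper name unsubscribed is hypothetical and not part of Caterva2:

```python
# Roots dictionary with the same shape as the output shown above.
roots = {
    "foo": {"name": "foo", "http": "localhost:8001", "subscribed": None},
}


def unsubscribed(roots):
    """Return the sorted names of roots whose 'subscribed' flag is not truthy.

    Hypothetical helper for illustration; it only inspects the dictionary.
    """
    return sorted(name for name, info in roots.items()
                  if not info.get("subscribed"))


pending = unsubscribed(roots)
print(pending)
```

Each name in pending could then be passed to caterva2.subscribe() as shown earlier.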

We can get some information about a dataset without downloading it:

metadata = caterva2.get_info('foo/dir1/ds-2d.b2nd')

Note how we identify the dataset by using a slash / to concatenate the root name with the dataset name in that root (which may contain slashes itself). The metadata dictionary contains assorted dataset attributes:

{'dtype': 'uint16',
 'ndim': 2,
 'shape': [10, 20],
 # ...
 'schunk': {# ...
            'cparams': {'codec': 5, # ...
                       },
            # ...
           },
 # ...
 'size': 400}
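As a quick sanity check on these attributes, the sketch below derives the number of elements from shape and compares the resulting byte count with size. It assumes that size is the uncompressed size in bytes, which the figures above are consistent with but which is not stated here; the ITEMSIZE table is a hypothetical stand-in for a proper dtype lookup:

```python
import math

# Subset of the metadata shown above; the elided keys are left out.
metadata = {"dtype": "uint16", "ndim": 2, "shape": [10, 20], "size": 400}

# Hypothetical lookup table: bytes per element for a few dtype names.
ITEMSIZE = {"uint8": 1, "uint16": 2, "uint32": 4, "uint64": 8}

n_items = math.prod(metadata["shape"])            # 10 * 20 elements
n_bytes = n_items * ITEMSIZE[metadata["dtype"]]   # 2 bytes per element
print(n_items, n_bytes, n_bytes == metadata["size"])
```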

So foo/dir1/ds-2d.b2nd is a 10x20 dataset of 16-bit unsigned integers. With caterva2.fetch() we can get the whole dataset, or just a part of it, as a NumPy array. To select a part, pass as the slice_ argument a string representation of the slice that we would write between brackets:

caterva2.fetch('foo/dir1/ds-2d.b2nd', slice_='0:2, 4:8')

This returns just the requested slice:

array([[ 4,  5,  6,  7],
       [24, 25, 26, 27]], dtype=uint16)
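To make the slice_ string notation concrete, here is a minimal sketch that turns such a string into Python slice objects and applies them to a nested list. This only illustrates the notation; it is not how Caterva2 parses the argument. The parse_slice helper is hypothetical, and the stand-in data (values 0 to 199 laid out in 10 rows of 20) is an assumption chosen to match the output above:

```python
def parse_slice(text):
    """Turn a string such as '0:2, 4:8' into a tuple of slice/int objects.

    Hypothetical helper for illustration only.
    """
    items = []
    for part in text.split(","):
        part = part.strip()
        if ":" in part:
            # Empty bounds (as in ':' or '2:') become None.
            bounds = [int(b) if b.strip() else None for b in part.split(":")]
            items.append(slice(*bounds))
        else:
            items.append(int(part))
    return tuple(items)


# Stand-in for the 10x20 dataset: row r, column c holds r * 20 + c.
rows = [[r * 20 + c for c in range(20)] for r in range(10)]

row_sel, col_sel = parse_slice("0:2, 4:8")
result = [row[col_sel] for row in rows[row_sel]]
print(result)
```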

Finally, you may want to save the whole dataset locally:

caterva2.download('foo/dir1/ds-2d.b2nd')

The call downloads the dataset as a file and returns its local path, PosixPath('foo/dir1/ds-2d.b2nd'), which should be similar to the dataset name.
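The dataset paths used so far join the root name and the dataset name with a slash, and the dataset name may contain further slashes. A minimal sketch of splitting such a path back into its two parts follows; the split_dataset_path helper is hypothetical, and building the local path this way is only an illustration of the naming shown above, not a guarantee about where Caterva2 writes files:

```python
from pathlib import PurePosixPath


def split_dataset_path(path):
    """Split 'root/name/in/root' at the first slash only.

    Hypothetical helper for illustration; not part of Caterva2.
    """
    root, _, name = path.partition("/")
    return root, name


root, name = split_dataset_path("foo/dir1/ds-2d.b2nd")
local = PurePosixPath(root) / name   # mirrors the path shown above
print(root, name, local)
```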

Evaluating expressions

The Caterva2 subscriber also allows you to create so-called “lazy expressions” (lazyexprs) whose operands are the array datasets accessible via the subscriber. These expressions get stored in the user’s own scratch space (an always-subscribed pseudo-root named @scratch), so working with them requires user authentication.

Lazy expressions are very cheap to create, as that operation only requires knowing the metadata of the involved operands. The resulting data is not computed on creation; computation only takes place at the subscriber when you request access to the data itself (e.g. via fetch or download operations).
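The idea of deferring computation until the data is requested can be illustrated with a toy class. This is a conceptual sketch only, assuming nothing about how the subscriber implements lazy expressions; ToyLazyExpr and its attributes are hypothetical names:

```python
class ToyLazyExpr:
    """Toy stand-in for a lazy expression: stores the recipe, not the result."""

    def __init__(self, func, operands):
        # Creation is cheap: only the function and operand references are kept.
        self.func = func
        self.operands = operands
        self.evaluations = 0

    def fetch(self):
        # The actual work happens only when the data is requested.
        self.evaluations += 1
        return [self.func(x) for x in self.operands["x"]]


expr = ToyLazyExpr(lambda x: x + 1, {"x": [4, 5, 6, 7]})
created_without_computing = expr.evaluations == 0
result = expr.fetch()
print(created_without_computing, result, expr.evaluations)
```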

This code creates a lazyexpr named plusone from the 2D dataset used above (check the note further above on how to get the auth_cookie):

caterva2.lazyexpr('plusone', 'x + 1', {'x': 'foo/dir1/ds-2d.b2nd'},
                  auth_cookie=...)

The path of the new dataset is returned: @scratch/plusone.b2nd. Now you can access it as a normal dataset, e.g.:

caterva2.fetch('@scratch/plusone.b2nd', slice_='0:2, 4:8',
               auth_cookie=...)

The object-oriented client API

The top level client API is simple but not very Pythonic. Fortunately, Caterva2 also provides a light and concise object-oriented client API (fully described in Root class, File class and Dataset class), similar to that of h5py.

First, let’s create a caterva2.Root instance for the foo root (using the default subscriber – remember to start your Caterva2 services first):

foo = caterva2.Root('foo')

Note: If the subscriber requires user authentication, you may provide credentials to the Root constructor with the user_auth keyword argument, to get authorization for further access. For instance:

foo = caterva2.Root(
    'foo',
    user_auth={'username': 'user@example.com', 'password': 'foobar'})

This also takes care of subscribing to foo if it hasn’t been done yet. To get the list of datasets in the root, just access foo.node_list:

['ds-1d.b2nd', 'ds-hello.b2frame', 'ds-1d-b.b2nd', 'README.md',
 'dir1/ds-3d.b2nd', 'dir1/ds-2d.b2nd', 'dir2/ds-4d.b2nd']
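Because dataset names may contain slashes, a flat list like this one can be grouped by directory with the standard library alone. The sketch below works on a plain copy of the list shown above rather than on a live Root instance:

```python
from collections import defaultdict
import posixpath

# Plain copy of the node list shown above.
node_list = ['ds-1d.b2nd', 'ds-hello.b2frame', 'ds-1d-b.b2nd', 'README.md',
             'dir1/ds-3d.b2nd', 'dir1/ds-2d.b2nd', 'dir2/ds-4d.b2nd']

by_dir = defaultdict(list)
for node in node_list:
    # Top-level entries get an empty string as their directory.
    directory, filename = posixpath.split(node)
    by_dir[directory].append(filename)

for directory in sorted(by_dir):
    print(repr(directory), by_dir[directory])
```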

Indexing the caterva2.Root instance with the name of the dataset results in a caterva2.Dataset instance (or caterva2.File, as we’ll see below). The instance offers easy access to its metadata via the meta attribute:

ds2d = foo['dir1/ds-2d.b2nd']
ds2d.meta

We get the dataset metadata:

{'dtype': 'uint16',
 'ndim': 2,
 'shape': [10, 20],
 # ...
 'size': 400}

Getting data from the dataset is very concise, as caterva2.Dataset instances support slicing notation, so this expression:

ds2d[0:2, 4:8]

results in the same slice as the (much more verbose) caterva2.fetch() call in the previous section:

array([[ 4,  5,  6,  7],
       [24, 25, 26, 27]], dtype=uint16)

Slicing like this automatically uses Blosc2 for the transfer when available. Finally, you may download the whole dataset like this, which also returns the path of the resulting local file:

ds2d.download()  # -> PosixPath('foo/dir1/ds-2d.b2nd')

On datasets and files

The type of instance that you get from indexing a caterva2.Root instance depends on the kind of the named dataset: for datasets whose name ends in .b2nd (n-dimensional Blosc2 array) or .b2frame (byte string in a Blosc2 frame) you’ll get a caterva2.Dataset, while otherwise you’ll get a caterva2.File (non-Blosc2 data). Both classes support the same operations, but slicing behaves differently for Blosc2 frames and other files: it only supports one dimension and always returns a byte string:

type(ds2d[0:2, 4:8])  # -> <class 'numpy.ndarray'>
type(foo['ds-hello.b2frame'][:10])  # -> <class 'bytes'>
type(foo['README.md'][:10])  # -> <class 'bytes'>
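The naming rule described in this section can be sketched as a small function that predicts, from the name alone, which class indexing a root would give. It only restates the rule above and does not contact any service; expected_class is a hypothetical helper and returns class names as plain strings:

```python
def expected_class(name):
    """Predict the class name from the suffix rule described above.

    Hypothetical helper for illustration; it does not query Caterva2.
    """
    if name.endswith((".b2nd", ".b2frame")):
        return "Dataset"
    return "File"


names = ["dir1/ds-2d.b2nd", "ds-hello.b2frame", "README.md"]
kinds = {name: expected_class(name) for name in names}
print(kinds)
```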