Running independent Caterva2 services#
The services that we have used until now are enough for testing, but not for a real deployment. For instance, they only listen to local connections, and they use example data and fixed directories.
In this section we’ll set up a more realistic deployment for a fictional organization:
A broker at host broker.example.org.
Two publishers at host pub.lab.example.org at a data collection laboratory, each serving a different root.
A subscriber at host sub.edu.example.org at a research & education branch.
A custom API client in a workstation at the latter branch.
The broker, publisher and subscriber hosts need a Caterva2 installation with the services extra:
python -m pip install caterva2[services]
The workstation should be fine with a plain installation, but we’ll also install the clients extra to perform quick tests with cat2cli:
python -m pip install caterva2[clients]
(If you’re going to try this tutorial on a single machine, just install caterva2[services,clients].)
Broker#
Our example broker shall listen on port 3104 of host broker.example.org. At that host, it may be run like this:
cat2bro --http *:3104
The broker will create a _caterva2/bro directory for its state files and listen on all network interfaces. Let’s restrict that to just the public interface, and set the directory to cat2-bro. Stop the broker with Ctrl+C and run this (using the host name of your machine or localhost):
cat2bro --http broker.example.org:3104 --statedir ./cat2-bro
(The ./ is not needed, but it shows that the --statedir option allows both relative and absolute paths, not necessarily under the current directory.)
Let’s put those options in the caterva2.toml configuration file:
[broker]
http = "broker.example.org:3104"
statedir = "./cat2-bro"
You may now stop the broker and run it with just:
cat2bro
Publishers#
Here we will set up two publishers at the pub.lab.example.org host, each serving one of the roots, which we shall name foo and bar. We’ll create their respective Caterva2 directories with the (arbitrary but meaningful) names foo-root and bar-root, with simple text files inside:
mkdir foo-root
echo "This is the foo root." > foo-root/readme.txt
mkdir bar-root
echo "This is the bar root." > bar-root/readme.txt
Here we want to run both publishers from the same directory to keep things at hand. To be able to share a common configuration file, we shall give different identifiers to the publishers (foo and bar for simplicity). With that, we may have a caterva2.toml file like this:
[publisher.foo]
http = "pub.lab.example.org:3115"
statedir = "./cat2-pub.foo"
name = "foo"
root = "./foo-root"
[publisher.bar]
http = "pub.lab.example.org:3116"
statedir = "./cat2-pub.bar"
name = "bar"
root = "./bar-root"
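Since both publishers share one configuration file, a quick sanity check before starting them can catch copy-and-paste mistakes. The following is a minimal sketch, not part of Caterva2: the helper name find_publisher_clashes is hypothetical, and it assumes that listening addresses, state directories and root names should not be shared between publishers.

```python
try:
    import tomllib  # Python >= 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # backport for older Python versions

# Same content as the example publisher configuration in this tutorial.
EXAMPLE_CONF = """
[publisher.foo]
http = "pub.lab.example.org:3115"
statedir = "./cat2-pub.foo"
name = "foo"
root = "./foo-root"

[publisher.bar]
http = "pub.lab.example.org:3116"
statedir = "./cat2-pub.bar"
name = "bar"
root = "./bar-root"
"""


def find_publisher_clashes(conf):
    """Return (key, value, ids) for each setting shared by several publishers.

    Hypothetical helper: it only inspects the parsed TOML mapping.
    """
    clashes = []
    publishers = conf.get("publisher", {})
    for key in ("http", "statedir", "name"):
        seen = {}  # setting value -> identifiers of publishers using it
        for pub_id, settings in publishers.items():
            seen.setdefault(settings.get(key), []).append(pub_id)
        for value, ids in seen.items():
            if value is not None and len(ids) > 1:
                clashes.append((key, value, ids))
    return clashes


conf = tomllib.loads(EXAMPLE_CONF)
print(find_publisher_clashes(conf))  # an empty list means no clashes
```

In a real setup you would load caterva2.toml from disk instead of the embedded string.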
We also chose arbitrary ports and state directories like those we used with the broker. Now we can run the publishers like this (in different shells, both from the directory where caterva2.toml is):
cat2pub --id foo
cat2pub --id bar
However, they’ll fail to connect to the broker (with a “Connection refused” error). You need to specify the correct broker address, either with the --broker option or an http setting in the [broker] section of caterva2.toml. Let’s add this in there and run both publishers again:
[broker]
http = "broker.example.org:3104"
The publishers will now work and register their respective roots at the broker.
(Yes, if you’re running the broker and publishers from the same directory of the same machine, the publishers will get the broker’s address from the broker’s own [broker] section in caterva2.toml.)
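If a publisher still reports “Connection refused”, it may help to check from the publisher host whether the broker address accepts TCP connections at all. The sketch below uses only the Python standard library; the helper names split_hostport and can_connect are hypothetical, and it assumes addresses in the plain host:port form used by the http settings in this tutorial.

```python
import socket


def split_hostport(address):
    """Split a "host:port" string into a (host, port) tuple."""
    host, _, port = address.rpartition(":")
    return host, int(port)


def can_connect(address, timeout=3.0):
    """Return True if a TCP connection to "host:port" can be opened."""
    host, port = split_hostport(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False
```

For instance, can_connect("broker.example.org:3104") returning False would suggest that the broker is not running, is listening on a different interface or port, or is blocked by a firewall. A successful connection only shows that something is listening there, not that it is a Caterva2 broker.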
Subscriber#
The subscriber at host sub.edu.example.org shall cache data from remote publishers for fast access from the research & education local network.
Subscribers also support arbitrary identifiers, but our setup won’t use them as there will only be one subscriber at the host. Use this configuration in the caterva2.toml file at the subscriber host:
[subscriber]
http = "sub.edu.example.org:3126"
statedir = "./cat2-sub"
[broker]
http = "broker.example.org:3104"
By now, everything should look familiar to you (including the custom port and state directory, and the broker address). Please note that subscribers are configured with a broker address instead of publishers’: a subscriber gets publisher addresses from their common broker as needed.
To start the subscriber, just run:
cat2sub
User authentication#
If the subscriber is to support user authentication (to restrict access, allow computing expressions or uploading files), it will need a CATERVA2_SECRET environment variable to be defined with its own secret token. That token should be persisted somewhere so as to use the same one every time the subscriber runs. You may start the subscriber like this:
env CATERVA2_SECRET=c2sikrit cat2sub
Then users will need to register via the Web client.
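One possible way to persist the secret token is to generate it once and keep it in a file that only the service account can read. The following is a minimal sketch using the Python standard library; the helper names and the file location are assumptions of this example, not something that Caterva2 provides or requires.

```python
import os
import secrets
from pathlib import Path


def load_or_create_secret(path):
    """Return the token stored at `path`, creating it on first use."""
    path = Path(path)
    if path.exists():
        return path.read_text().strip()
    token = secrets.token_urlsafe(32)
    # Create the file with owner-only permissions; fail if it already exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as secret_file:
        secret_file.write(token + "\n")
    return token


def subscriber_env(secret_path):
    """Return a copy of the current environment with CATERVA2_SECRET set."""
    return dict(os.environ, CATERVA2_SECRET=load_or_create_secret(secret_path))
```

A small launcher script could then start the subscriber with something like subprocess.run(["cat2sub"], env=subscriber_env("cat2-sub.secret")), where cat2-sub.secret is a hypothetical file name.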
Of course, use of HTTPS is strongly encouraged in this scenario, e.g. by placing the subscriber behind a reverse proxy, with a configuration like this:
[subscriber]
http = "localhost:8002" # reverse proxy target
urlbase = "https://sub.edu.example.org:3126" # reverse proxy address
Client setup#
Clients at the example workstation need to know the address of the subscriber that they will use.
The command-line client cat2cli provides the --subscriber option for that. Running this at the workstation:
cat2cli --subscriber http://sub.edu.example.org:3126 roots
will retrieve the list of known roots from the subscriber that we set up above. Should authentication be needed, --username and --password options may also be used.
Since cat2cli also supports caterva2.toml, this configuration in the current directory:
[subscriber]
urlbase = "http://sub.edu.example.org:3126" # "https://..." if needed
#[client] # uncomment section if needed
#username = "user@example.com"
#password = "foobar11"
should allow you to run the previous command just like this:
cat2cli roots
When using the programmatic API, you need to provide the subscriber address explicitly:
roots = caterva2.get_roots(urlbase='http://sub.edu.example.org:3126')
foo = caterva2.Root('foo', urlbase='http://sub.edu.example.org:3126')
Since parsing TOML is very easy with Python, your API client may just access the needed configuration like this:
import caterva2
from tomllib import load as toml_load  # "from tomli" on Python < 3.11

with open('caterva2.toml', 'rb') as conf_file:
    conf = toml_load(conf_file)

# Uncomment the user_auth lines if the [client] section is enabled.
#user_auth = dict(username=conf['client']['username'],
#                 password=conf['client']['password'])
foo = caterva2.Root('foo',
                    urlbase=conf['subscriber']['urlbase'],
                    #user_auth=user_auth,
                    )
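To avoid commenting lines in and out depending on whether the [client] section is present, the parsed configuration could be turned into keyword arguments by a small helper. This is only a sketch: client_settings is a hypothetical name, and it assumes the urlbase, username and password keys shown in the example configuration above.

```python
try:
    import tomllib  # Python >= 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # backport for older Python versions

# Same content as the example client configuration, without [client].
EXAMPLE_CONF = """
[subscriber]
urlbase = "http://sub.edu.example.org:3126"
"""


def client_settings(conf):
    """Build keyword arguments from a parsed caterva2.toml mapping."""
    kwargs = {"urlbase": conf["subscriber"]["urlbase"]}
    client = conf.get("client", {})
    # Only pass credentials when both are present in the configuration.
    if "username" in client and "password" in client:
        kwargs["user_auth"] = dict(username=client["username"],
                                   password=client["password"])
    return kwargs


settings = client_settings(tomllib.loads(EXAMPLE_CONF))
print(settings)
```

The result could then be passed along as caterva2.Root('foo', **settings).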