HAF ZFS Snapshots

What's in the snapshot?

These files are a snapshot of the ZFS datasets of a Hive & HAF API node that was fully in sync as of the snapshot date. Specifically, it has up-to-date data for the blockchain, hived's shared memory file, and the HAF PostgreSQL database (the datasets are listed below).

The snapshot is provided as a quick way to get a HAF API node up and running. On a fast machine, it takes about three days to bring a node in sync from scratch. By using these snapshots, you only need to process the blocks produced after the snapshot was taken. On a fast machine, it will take about an hour for your node to sync up if you start with a one-day-old snapshot; most of this time is spent in Hivemind. This doesn't scale linearly: a slower server was able to catch up three days' worth of blocks in a little over two hours.

At the time of writing, an incremental snapshot covering one day is around 110GB. However, this doesn't scale linearly: an incremental snapshot covering four days of data is around 155GB. In other words, much of the data changed in a single day will also be changed the next day, so sending two days' worth of changes won't take twice as long as sending one day's worth.

Snapshot Organization

Snapshots will be organized into chains, consisting of full snapshots followed by incremental snapshots that build off the previous ones. Normally, snapshots will continue the latest chain. If there is a change in HAF that requires a full replay, we won't be able to generate a small incremental snapshot, so we will start a new chain with a new full snapshot and then switch back to generating incremental snapshots.

Since the blockchain itself won't change between chains of snapshots, it will be provided in a separate file from the rest of the datasets. When we start a new chain of snapshots, it will start with the blockchain snapshot from the previous snapshot chain, so you won't have to redownload most of the blockchain itself.

Full snapshots will be generated less frequently (maybe every month?), and incremental snapshots will be added every few days. To get started, you will need to download the latest full snapshot and the incremental snapshots generated after it.

20240308T1858Z-full-without-blockchain-v1.27.5rc7-15200616425965031960.zfs
20240308T1858Z-full-blockchain-only-v1.27.5rc7-15200616425965031960.zfs
20240309T1900Z-incremental-v1.27.5rc7-15200616425965031960-1841882368233268207.zfs
20240310T1855Z-incremental-v1.27.5rc8-1841882368233268207-9302265023450484906.zfs

[timestamp]-full-without-blockchain-[version]-[guid].zfs
[timestamp]-full-blockchain-only-[version]-[guid].zfs
[timestamp]-incremental-[version]-[from-guid]-[to-guid].zfs
	  
From the filename, you can see that the first is a full snapshot that was taken at 18:58 UTC on 2024-03-08. It was running v1.27.5rc7 of the haf_api_node stack at the time, so if you restore this snapshot you should launch the same version by setting HIVE_API_NODE_VERSION=v1.27.5rc7 in your .env file. When you restore this snapshot, it will create the necessary datasets, using the timestamp as the snapshot name.
haf-pool/haf-datadir@20240308T1858Z
haf-pool/haf-datadir/blockchain@20240308T1858Z
haf-pool/haf-datadir/shared_memory@20240308T1858Z
haf-pool/haf-datadir/haf_db_store@20240308T1858Z
haf-pool/haf-datadir/haf_db_store/pgdata@20240308T1858Z
haf-pool/haf-datadir/haf_db_store/pgdata/pg_wal@20240308T1858Z
haf-pool/haf-datadir/haf_db_store/tablespace@20240308T1858Z
haf-pool/haf-datadir/logs@20240308T1858Z
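As mentioned above, you'll also want to pin the matching stack version before bringing the node up. A minimal sketch of the relevant line in your haf_api_node .env file (the rest of your .env stays as it is):

# excerpt from the haf_api_node .env file
HIVE_API_NODE_VERSION=v1.27.5rc7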
	  
The big number at the end is the GUID of the snapshot on the top-level dataset, haf-pool/haf-datadir. ZFS assigns a GUID to each snapshot and uses it to determine whether an incremental snapshot can be applied to your existing dataset. Incremental snapshot filenames carry the same information, except they have two GUIDs, telling you which snapshot they must be applied on top of and which snapshot they will produce.
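If you want to see these GUIDs on your own system, ZFS can report them directly; a quick sketch, using the snapshot name from the example above:

# show the GUID of the snapshot on the top-level dataset
zfs get -H -o value guid haf-pool/haf-datadir@20240308T1858Z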

System Requirements

Before starting, you'll need to have your zpool created. We recommend striping the data across at least two reasonably fast NVMe drives. The dataset will be larger than 2TB, so a zpool combining two 2TB drives would be a good choice.

For most NVMe drives, a sector size of 8k is a good choice. You can set this with ashift (8k = 2^13):

zpool create -o ashift=13 haf-pool /dev/nvme0n1 /dev/nvme1n1
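
To confirm the pool picked up the expected ashift, you can query the property afterwards; a quick check:

# verify the pool's ashift
zpool get ashift haf-pool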
          

Our instructions and example files assume your zpool will be named haf-pool and the top-level dataset will be named haf-datadir. You're welcome to use other names; just make the appropriate substitutions.

If you have an existing ZFS dataset that isn't part of this snapshot chain, you'll need to destroy it before importing this snapshot. (zfs destroy -r haf-pool/haf-datadir)
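If you're not sure what already exists under the pool, you can list its datasets and snapshots first; a quick sketch:

# list everything currently under the pool
zfs list -r -t all haf-pool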

Downloading the datasets

You can download using either BitTorrent or a regular HTTP download. We recommend downloading the snapshots using BitTorrent whenever possible. The torrent has the regular HTTP download site configured as a web seed, so the torrent should always download at least as fast as the straight HTTP download, even if nobody is seeding.

BitTorrent

HTTP

All snapshot files can be downloaded directly from this link

Notes

The initial dataset is over 2TB, and incremental datasets will be tens of GB. If you have the space, you should download the files to a local disk before importing the dataset. In other words, don't stream the data directly into zfs recv, because if the transfer is interrupted you will have to restart the download from the beginning. A magnetic disk will be fine for storing the downloads.

wget --continue --wait=10 http://snapshots.hive.blog/snapshots/latest/20240309T1900Z-incremental-v1.27.5rc7-15200616425965031960-1841882368233268207.zfs
# or
aria2c --continue http://snapshots.hive.blog/snapshots/latest/20240309T1900Z-incremental-v1.27.5rc7-15200616425965031960-1841882368233268207.zfs
          

How to Use

Once your zpool is set up as described under System Requirements, and any existing dataset that isn't part of this snapshot chain has been destroyed (zfs destroy -r haf-pool/haf-datadir), import the full snapshot:

sudo zfs recv -d -v haf-pool < 20240308T1858Z-full-without-blockchain-v1.27.5rc7-15200616425965031960.zfs
# or, to see progress
pv 20240308T1858Z-full-without-blockchain-v1.27.5rc7-15200616425965031960.zfs | sudo zfs recv -d haf-pool
          
This will create a dataset named haf-pool/haf-datadir and several sub-datasets. Run the same command to import the blockchain-only dataset, as shown below.
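For example, using the blockchain-only file from the listing above:

sudo zfs recv -d -v haf-pool < 20240308T1858Z-full-blockchain-only-v1.27.5rc7-15200616425965031960.zfs
# or, to see progress
pv 20240308T1858Z-full-blockchain-only-v1.27.5rc7-15200616425965031960.zfs | sudo zfs recv -d haf-pool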

After you've loaded the full or incremental snapshot, you can delete the .zfs file if you need to free up the disk space.

To apply an incremental snapshot:

sudo zfs recv -F -v haf-pool/haf-datadir < 20240309T1900Z-incremental-v1.27.5rc7-15200616425965031960-1841882368233268207.zfs
# or, to see progress
pv 20240309T1900Z-incremental-v1.27.5rc7-15200616425965031960-1841882368233268207.zfs | sudo zfs recv -F haf-pool/haf-datadir 
          
Note that you don't need to pass -d to zfs recv here, and you give the name of the top-level dataset instead of the name of the zpool. Passing the -F flag tells zfs recv to discard any data written since the previous snapshot before applying the new one (if you have just imported the full dataset or the previous incremental, this shouldn't be necessary, but it doesn't hurt to add it).
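If an incremental refuses to apply, it's worth checking that the most recent snapshot on your dataset matches the from-GUID in the incremental's filename; a quick sketch:

# list the snapshots on the top-level dataset with their GUIDs, oldest first;
# the newest one should match the first GUID in the incremental's filename
zfs list -t snapshot -d 1 -o name,guid -s creation haf-pool/haf-datadir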