Using FSMap with Zarr v2 for Remote Zip Stores

Overview

This guide explains how to open Zarr archives stored as zip files in remote object storage (MinIO/S3) using fsspec's FSMap with Zarr v2.

The Problem

Zarr v2's native ZipStore only accepts local file paths, not remote files or file-like objects. To access a .zip file stored in MinIO/S3, we need a different approach that bridges the gap between remote storage and Zarr's storage interface.

The Solution Architecture

The solution uses a chain of abstractions:

  1. S3FileSystem: Provides filesystem-like access to MinIO/S3 buckets
  2. ZipFileSystem: Wraps a zip file (local or remote) as a virtual filesystem
  3. FSMap: Translates filesystem operations into a key-value mapping that Zarr understands

Chain of Components

MinIO/S3 Storage
    ↓ (accessed via)
S3FileSystem (treats bucket as filesystem)
    ↓ (opens file)
S3File object (file-like interface to remote zip)
    ↓ (wrapped by)
ZipFileSystem (treats zip contents as filesystem)
    ↓ (mapped to)
FSMap (key-value store interface)
    ↓ (consumed by)
Zarr (reads arrays and metadata)

How It Works

1. S3FileSystem

S3FileSystem from the s3fs library provides a Python filesystem interface (fsspec) to S3/MinIO:

import s3fs
 
fs = s3fs.S3FileSystem(
    client_kwargs={
        'endpoint_url': 'https://minio.example.com',
        'verify': '/path/to/ca.crt'
    },
    key='access_key',
    secret='secret_key',
    use_ssl=True
)

This object lets you interact with MinIO buckets using familiar filesystem operations like fs.open(), fs.ls(), etc.
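
The snippet above hard-codes credentials for brevity. A minimal sketch of building the same constructor arguments from environment variables follows. The helper name s3_options_from_env and the variable names (STORAGE_ACCESS_KEY, STORAGE_SECRET_KEY, ENDPOINT_FULL, CA_CERT_PATH) are assumptions chosen for illustration; adapt them to your deployment.

```python
import os


def s3_options_from_env(env=os.environ):
    """Build keyword arguments for s3fs.S3FileSystem from environment variables.

    All variable names here are illustrative assumptions, not an s3fs convention.
    """
    required = ("STORAGE_ACCESS_KEY", "STORAGE_SECRET_KEY", "ENDPOINT_FULL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")

    client_kwargs = {"endpoint_url": env["ENDPOINT_FULL"]}
    # CA_CERT_PATH is a hypothetical, optional variable pointing at a custom CA bundle.
    if env.get("CA_CERT_PATH"):
        client_kwargs["verify"] = env["CA_CERT_PATH"]

    return {
        "key": env["STORAGE_ACCESS_KEY"],
        "secret": env["STORAGE_SECRET_KEY"],
        "client_kwargs": client_kwargs,
        # Enable SSL only when the endpoint URL is HTTPS.
        "use_ssl": env["ENDPOINT_FULL"].startswith("https://"),
    }
```

With s3fs installed, the result can be passed straight through: fs = s3fs.S3FileSystem(**s3_options_from_env()).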

2. ZipFileSystem

ZipFileSystem from fsspec.implementations.zip takes a file object (which can be remote) and exposes the zip archive's internal structure as a filesystem:

from fsspec.implementations.zip import ZipFileSystem
 
# Open the remote zip file
remote_file = fs.open('bucket/path/archive.zip', 'rb')
 
# Create a filesystem view into the zip
zip_fs = ZipFileSystem(fo=remote_file)

The fo parameter accepts any file-like object, including remote files from S3FileSystem. Now zip_fs treats the zip's contents as if they were a directory tree.

3. FSMap

FSMap from fsspec.mapping implements Python's MutableMapping interface (dict-like behavior) on top of any fsspec filesystem:

from fsspec.mapping import FSMap
 
# Create a mapping store
store = FSMap(root='', fs=zip_fs)

The root='' parameter means “start at the zip's root directory”. FSMap now translates dictionary-style access (store[key]) into filesystem reads (zip_fs.cat(key), which opens the file and returns its bytes).
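
The mapping behaviour can be tried without any remote storage. The sketch below assumes fsspec is installed and uses an in-memory io.BytesIO buffer as a stand-in for the remote S3File; the archive contents are placeholders, not a real Zarr array.

```python
import io
import json
import zipfile

from fsspec.implementations.zip import ZipFileSystem
from fsspec.mapping import FSMap

# Build a tiny zip in memory; the buffer stands in for the remote S3File.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr(".zgroup", json.dumps({"zarr_format": 2}))
    zf.writestr("array_name/0.0.0", b"\x00" * 16)  # placeholder bytes, not a real chunk
buffer.seek(0)

# Same two steps as with the remote file: zip view, then key-value view.
zip_fs = ZipFileSystem(fo=buffer)
store = FSMap(root="", fs=zip_fs)

# Dictionary-style access returns the raw bytes of the zip member.
group_meta = json.loads(store[".zgroup"])
print(group_meta)
print("array_name/0.0.0" in store)
print(len(store["array_name/0.0.0"]))
```

Swapping the buffer for fs.open('bucket/path/archive.zip', 'rb') gives the remote variant described in this guide.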

4. Zarr Integration

Zarr v2 expects stores to implement the MutableMapping interface, which FSMap provides. When you open a Zarr group:

import zarr
 
root = zarr.open(store, mode='r')

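Zarr performs key lookups such as store['.zgroup'] (group metadata), store['array_name/.zarray'] (array metadata), and store['array_name/0.0.0'] (chunk data).

Each of these translates through the chain:

  FSMap → ZipFileSystem → S3File → MinIO/S3 HTTP request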

Complete Example

import s3fs
import zarr
from fsspec.implementations.zip import ZipFileSystem
from fsspec.mapping import FSMap
 
# 1. Configure S3/MinIO access
fs = s3fs.S3FileSystem(
    client_kwargs={
        'endpoint_url': 'https://minio.example.com',
        'verify': '/path/to/ca.crt'
    },
    key='your_access_key',
    secret='your_secret_key',
    use_ssl=True
)
 
# 2. Open the remote zip file as a filesystem
s3_path = 'my-bucket/data/experiment.zarr.zip'
zip_fs = ZipFileSystem(fo=fs.open(s3_path, 'rb'))
 
# 3. Create a mapping store for Zarr
store = FSMap(root='', fs=zip_fs)
 
# 4. Open with Zarr
root = zarr.open(store, mode='r')
 
# 5. Use the Zarr group normally
print(root.tree())
array = root['my_array'][:]

Key-Value Mapping Internals

Under the hood, a Zarr zip archive contains files like:

.zgroup                  # Root group metadata
array_name/.zarray       # Array metadata
array_name/0.0.0         # Chunk at position (0,0,0)
array_name/0.0.1         # Chunk at position (0,0,1)
subgroup/.zgroup         # Nested group metadata
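
The chunk file names in this listing follow Zarr v2's default key scheme: the chunk's position in the chunk grid, joined by the dimension separator ('.' by default). A minimal sketch of that mapping is shown here; chunk_key is a hypothetical helper written for illustration, not a Zarr function.

```python
def chunk_key(array_path, element_index, chunk_shape, separator="."):
    """Return the store key of the chunk that holds the element at element_index."""
    if len(element_index) != len(chunk_shape):
        raise ValueError("element_index and chunk_shape must have the same length")
    # Integer division maps an element index to its position in the chunk grid.
    grid_position = [i // c for i, c in zip(element_index, chunk_shape)]
    name = separator.join(str(p) for p in grid_position)
    return f"{array_path}/{name}" if array_path else name


# Element (5, 20, 150) of an array chunked as (100, 100, 100):
print(chunk_key("array_name", (5, 20, 150), (100, 100, 100)))  # array_name/0.0.1
```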

When Zarr does store['array_name/0.0.0']:

  1. FSMap translates to zip_fs.open('array_name/0.0.0', 'rb').read()
  2. ZipFileSystem locates this file in the zip's central directory
  3. ZipFileSystem reads the compressed data from the underlying S3File
  4. S3File makes an HTTP range request to MinIO
  5. The decompressed chunk bytes are returned to Zarr

This happens lazily - only when Zarr actually accesses specific data.
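
The lazy access pattern can be observed with the standard library alone. The sketch below is an analogy, not the fsspec code path: CountingReader is a hypothetical in-memory file that counts the bytes read from it, standing in for the S3File whose reads would become range requests.

```python
import io
import os
import zipfile


class CountingReader(io.BytesIO):
    """In-memory file that records how many bytes are read from it."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        data = super().read(size)
        self.bytes_read += len(data)
        return data


# Two stored (uncompressed) members of 100 kB each.
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("array_name/0.0.0", os.urandom(100_000))
    zf.writestr("array_name/0.0.1", os.urandom(100_000))
archive = raw.getvalue()

reader = CountingReader(archive)
with zipfile.ZipFile(reader) as zf:
    after_directory = reader.bytes_read   # bytes read to load the central directory
    chunk = zf.read("array_name/0.0.0")   # read a single member
    after_chunk = reader.bytes_read

print(len(archive), after_directory, after_chunk)
```

Opening the archive reads only the central directory at the end of the file, and fetching one member reads roughly that member's size, so the second 100 kB member is never touched.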

Mode Considerations

For read-only access (mode='r'), this approach works seamlessly.

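For write operations, limitations apply: the remote file is opened in 'rb' mode and ZipFileSystem defaults to read mode, so the resulting store cannot be written to.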

Recommendation: Use this approach for read-only access to pre-created Zarr zip archives.

Performance Notes

Troubleshooting

"TypeError: expected str, bytes or os.PathLike object"

You tried to pass a file object directly to zarr.ZipStore. Use ZipFileSystem + FSMap instead.

"SSL Certificate Verify Failed"

Add the CA certificate path to client_kwargs={'verify': '/path/to/cert.pem'}.

Store appears empty

Check that root='' in FSMap is correct. If the Zarr hierarchy sits inside a subdirectory of the zip rather than at its top level, use root='subdirectory/path'.
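
To find the right root value, it can help to list where the Zarr v2 metadata files sit inside the archive. The following is a minimal sketch using only the standard library; find_zarr_roots is a hypothetical helper, and it assumes the archive uses the Zarr v2 marker files .zgroup and .zarray.

```python
import io
import posixpath
import zipfile

ZARR_V2_MARKERS = {".zgroup", ".zarray"}


def find_zarr_roots(zip_file):
    """Return directories in the zip that hold Zarr v2 metadata, shallowest first.

    zip_file may be a path or any seekable binary file object.
    An empty string in the result means the zip's top level.
    """
    with zipfile.ZipFile(zip_file) as zf:
        dirs = {
            posixpath.dirname(name)
            for name in zf.namelist()
            if posixpath.basename(name) in ZARR_V2_MARKERS
        }
    # Sort by nesting depth so the outermost candidate comes first.
    return sorted(dirs, key=lambda d: (d.count("/") if d else -1, d))


# Example archive whose hierarchy is nested under "data.zarr/".
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("data.zarr/.zgroup", "{}")
    zf.writestr("data.zarr/arr/.zarray", "{}")
    zf.writestr("data.zarr/arr/0.0", b"x")
demo.seek(0)

print(find_zarr_roots(demo))  # ['data.zarr', 'data.zarr/arr']
```

The first entry is the candidate to pass as root. The same helper should accept the remote file object, for example find_zarr_roots(fs.open(s3_path, 'rb')).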

Alternative: Direct FSStore

You can also use zarr.storage.FSStore instead of FSMap:

store = zarr.storage.FSStore(url='', fs=zip_fs)
root = zarr.open(store, mode='r')

Both FSStore and FSMap provide the same MutableMapping interface. FSMap is more lightweight and part of core fsspec.

RKNS (Zarr v2) from MinIO ZIP

Execute the following with 'uv run'; the dependencies are resolved automatically. This assumes you have our internal PyPI registry set up with uv.

# /// script
# requires-python = ">=3.8"
# dependencies = [
#     "boto3>=1.40.49",
#     "python-dotenv>=0.9.9",
#     "packaging>=25.0",
#     "rkns==0.6.2",
#     "s3fs[boto3]>=2023.12.0",
#     "typing-extensions>=4.15.0",
# ]
# ///
import os
from pathlib import Path
from dotenv import load_dotenv
from fsspec.implementations.zip import ZipFileSystem
from fsspec.mapping import FSMap
import s3fs
import zarr
import rkns
 
# load credentials from .env file
load_dotenv()
access_key_id = os.getenv("STORAGE_ACCESS_KEY")
secret_access_key = os.getenv("STORAGE_SECRET_KEY")
endpoint_url = os.getenv("ENDPOINT")
endpoint_url_full = os.getenv("ENDPOINT_FULL")
 
 
# Specify the path to your custom CA certificate
ca_cert_path = "ca.crt.cer"
assert Path(ca_cert_path).is_file()
 
 
# Create s3fs filesystem with custom cert
fs = s3fs.S3FileSystem(
    client_kwargs={"endpoint_url": endpoint_url_full, "verify": str(ca_cert_path)},
    key=access_key_id,
    secret=secret_access_key,
    use_ssl=True,
)
 
s3_path = "rekonas-dataset-shhs-rkns/sub-shhs200001_ses-01_task-sleep_eeg.rkns"
 
zip_fs = ZipFileSystem(fo=fs.open(s3_path, "rb"))
store = zarr.storage.FSStore(url='', fs=zip_fs)
rkns_obj = rkns.from_RKNS(store)
print(rkns_obj.tree)