====== Using FSMap with Zarr v2 for Remote Zip Stores ======
===== Overview =====
This guide explains how to open Zarr archives stored as zip files in remote object storage (MinIO/S3) using ''fsspec'''s ''FSMap'' with Zarr v2.
===== The Problem =====
Zarr v2's native ''ZipStore'' only accepts local file paths, not remote files or file-like objects. To access a ''.zip'' file stored in MinIO/S3, we need a different approach that bridges the gap between remote storage and Zarr's storage interface.
===== The Solution Architecture =====
The solution uses a chain of abstractions:
- **S3FileSystem**: Provides filesystem-like access to MinIO/S3 buckets
- **ZipFileSystem**: Wraps a zip file (local or remote) as a virtual filesystem
- **FSMap**: Translates filesystem operations into a key-value mapping that Zarr understands
==== Chain of Components ====
MinIO/S3 Storage
↓ (accessed via)
S3FileSystem (treats bucket as filesystem)
↓ (opens file)
S3File object (file-like interface to remote zip)
↓ (wrapped by)
ZipFileSystem (treats zip contents as filesystem)
↓ (mapped to)
FSMap (key-value store interface)
↓ (consumed by)
Zarr (reads arrays and metadata)
===== How It Works =====
==== 1. S3FileSystem ====
''S3FileSystem'' from the ''s3fs'' library provides a Python filesystem interface (''fsspec'') to S3/MinIO:
import s3fs
fs = s3fs.S3FileSystem(
    client_kwargs={
        'endpoint_url': 'https://minio.example.com',
        'verify': '/path/to/ca.crt'
    },
    key='access_key',
    secret='secret_key',
    use_ssl=True
)
This object lets you interact with MinIO buckets using familiar filesystem operations like ''fs.open()'', ''fs.ls()'', etc.
==== 2. ZipFileSystem ====
''ZipFileSystem'' from ''fsspec.implementations.zip'' takes a file object (which can be remote) and exposes the zip archive's internal structure as a filesystem:
from fsspec.implementations.zip import ZipFileSystem
# Open the remote zip file
remote_file = fs.open('bucket/path/archive.zip', 'rb')
# Create a filesystem view into the zip
zip_fs = ZipFileSystem(fo=remote_file)
The ''fo'' parameter accepts any seekable, binary file-like object, including remote files opened through ''S3FileSystem''. Now ''zip_fs'' treats the zip's contents as if they were a directory tree.
==== 3. FSMap ====
''FSMap'' from ''fsspec.mapping'' implements Python's ''MutableMapping'' interface (dict-like behavior) on top of any ''fsspec'' filesystem:
from fsspec.mapping import FSMap
# Create a mapping store
store = FSMap(root='', fs=zip_fs)
The ''root=%%''%%'' parameter means "start at the zip's root directory". ''FSMap'' now translates dictionary-style access (''store[key]'') into filesystem reads (''zip_fs.cat(key)'', which returns the file's contents as bytes).
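To make the key-to-file translation concrete, here is a minimal sketch that uses only the standard library. ''ZipMapping'' is a hypothetical, read-only stand-in written for illustration. It is not how ''FSMap'' is implemented, and the in-memory archive merely imitates a Zarr v2 layout.

```python
import io
import zipfile
from collections.abc import Mapping


class ZipMapping(Mapping):
    """Hypothetical read-only mapping over a zip archive (illustration only)."""

    def __init__(self, fileobj, root=""):
        self._zip = zipfile.ZipFile(fileobj, mode="r")
        # Normalise the root so that "" means the top of the archive.
        root = root.strip("/")
        self._prefix = (root + "/") if root else ""

    def __getitem__(self, key):
        try:
            # A key is just a path relative to the root inside the archive.
            return self._zip.read(self._prefix + key)
        except KeyError:
            # Re-raise with the caller's key so it behaves like a dict.
            raise KeyError(key) from None

    def __iter__(self):
        for name in self._zip.namelist():
            if name.startswith(self._prefix) and not name.endswith("/"):
                yield name[len(self._prefix):]

    def __len__(self):
        return sum(1 for _ in self)


# Build a tiny in-memory archive that mimics a Zarr v2 layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr(".zgroup", '{"zarr_format": 2}')
    archive.writestr("array_name/.zarray", "{}")

store = ZipMapping(buffer)
print(store[".zgroup"])  # b'{"zarr_format": 2}'
print(sorted(store))     # ['.zgroup', 'array_name/.zarray']
```

A mapping like this is all that Zarr v2 needs for reading: it asks for keys and gets bytes back, without knowing where they come from.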
==== 4. Zarr Integration ====
Zarr v2 expects stores to implement the ''MutableMapping'' interface, which ''FSMap'' provides. When you open a Zarr group:
import zarr
root = zarr.open(store, mode='r')
Zarr performs operations like:
* ''store['.zgroup']'' → reads the root metadata
* ''store['array_name/.zarray']'' → reads array metadata
* ''store['array_name/0.0.0']'' → reads a specific chunk
Each of these translates through the chain:
- ''FSMap'' → ''ZipFileSystem'' → ''S3File'' → MinIO/S3 HTTP request
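The chunk keys shown above follow from integer division of an element index by the chunk shape. The helper below is a hypothetical sketch of that arithmetic. It assumes the default Zarr v2 ''.'' dimension separator; arrays created with ''/'' as the separator use nested keys instead.

```python
def chunk_key(array_path, element_index, chunk_shape, separator="."):
    """Return the store key of the chunk that holds element_index."""
    if len(element_index) != len(chunk_shape):
        raise ValueError("element_index and chunk_shape must have equal length")
    # Integer-divide each coordinate by the chunk size along that axis.
    chunk_index = [i // c for i, c in zip(element_index, chunk_shape)]
    name = separator.join(str(i) for i in chunk_index)
    return f"{array_path}/{name}" if array_path else name


print(chunk_key("array_name", (0, 0, 0), (10, 10, 10)))   # array_name/0.0.0
print(chunk_key("array_name", (3, 7, 12), (10, 10, 10)))  # array_name/0.0.1
```

Reading a single element therefore touches one metadata key and one chunk key, which is why partial reads from a remote zip stay cheap.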
===== Complete Example =====
import s3fs
import zarr
from fsspec.implementations.zip import ZipFileSystem
from fsspec.mapping import FSMap
# 1. Configure S3/MinIO access
fs = s3fs.S3FileSystem(
    client_kwargs={
        'endpoint_url': 'https://minio.example.com',
        'verify': '/path/to/ca.crt'
    },
    key='your_access_key',
    secret='your_secret_key',
    use_ssl=True
)
# 2. Open the remote zip file as a filesystem
s3_path = 'my-bucket/data/experiment.zarr.zip'
zip_fs = ZipFileSystem(fo=fs.open(s3_path, 'rb'))
# 3. Create a mapping store for Zarr
store = FSMap(root='', fs=zip_fs)
# 4. Open with Zarr
root = zarr.open(store, mode='r')
# 5. Use the Zarr group normally
print(root.tree())
array = root['my_array'][:]
===== Key-Value Mapping Internals =====
Under the hood, a Zarr zip archive contains files like:
.zgroup # Root group metadata
array_name/.zarray # Array metadata
array_name/0.0.0 # Chunk at position (0,0,0)
array_name/0.0.1 # Chunk at position (0,0,1)
subgroup/.zgroup # Nested group metadata
When Zarr does ''store['array_name/0.0.0']'':
- **FSMap** calls ''zip_fs.cat('array_name/0.0.0')'', which has the same effect as ''zip_fs.open('array_name/0.0.0', 'rb').read()''
- **ZipFileSystem** locates this file in the zip's central directory
- **ZipFileSystem** reads the compressed data from the underlying ''S3File''
- **S3File** makes an HTTP range request to MinIO
- The decompressed chunk bytes are returned to Zarr
This happens **lazily**: data is fetched only when Zarr actually accesses it.
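The lazy behaviour can be observed locally with the standard library alone. In the sketch below, ''CountingReader'' is a hypothetical in-memory stand-in for the remote ''S3File'' that records every read. The archive is assumed to be written uncompressed (''ZIP_STORED''). With a real ''S3File'' the reads become range requests whose sizes depend on the ''s3fs'' cache settings, so actual byte counts will differ.

```python
import io
import zipfile


class CountingReader(io.BytesIO):
    """In-memory stand-in for a remote file that records every read."""

    def __init__(self, data):
        super().__init__(data)
        self.reads = []  # (offset, length) of each read call

    def read(self, size=-1):
        offset = self.tell()
        data = super().read(size)
        self.reads.append((offset, len(data)))
        return data


# Build an archive with two 100 kB "chunks", stored without compression.
raw = io.BytesIO()
with zipfile.ZipFile(raw, mode="w", compression=zipfile.ZIP_STORED) as archive:
    archive.writestr("array_name/0.0.0", b"a" * 100_000)
    archive.writestr("array_name/0.0.1", b"b" * 100_000)
archive_bytes = raw.getvalue()

# Opening the archive only reads the central directory at the end of the file.
remote = CountingReader(archive_bytes)
zf = zipfile.ZipFile(remote, mode="r")
bytes_for_directory = sum(length for _, length in remote.reads)

# Reading one member only touches that member's byte range.
chunk = zf.read("array_name/0.0.1")
bytes_total = sum(length for _, length in remote.reads)

print("archive size:      ", len(archive_bytes))
print("read for directory:", bytes_for_directory)
print("read in total:     ", bytes_total)
```

The total stays well below the archive size because the first chunk is never read.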
===== Mode Considerations =====
For read-only access (''mode='r'''), this approach works seamlessly.
For write operations, limitations apply:
* Zip files are **not designed for random write access**
* ''ZipFileSystem'' in write mode requires recreating the entire zip
* For remote storage, writing is impractical due to the need to download/reupload
**Recommendation**: Use this approach for **read-only** access to pre-created Zarr zip archives.
===== Performance Notes =====
* **First access**: May be slower due to reading zip central directory
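* **Chunk reads**: Each chunk access can trigger a network request (unless the bytes are already in the ''s3fs'' read cache)
* **Optimization**: ''s3fs'' buffers reads for each open file; tune this with the ''cache_type'' and ''block_size'' arguments of ''fs.open()'', or with ''default_cache_type'' and ''default_block_size'' on ''S3FileSystem''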
* **Best for**: Datasets where you don't need to read all chunks (sparse access patterns)
===== Troubleshooting =====
==== "TypeError: expected str, bytes or os.PathLike object" ====
You tried to pass a file object directly to ''zarr.ZipStore''. Use ''ZipFileSystem'' + ''FSMap'' instead.
==== "SSL Certificate Verify Failed" ====
Add the CA certificate path to ''client_kwargs={'verify': '/path/to/cert.pem'}''.
==== Store appears empty ====
Check that ''root=%%''%%'' in ''FSMap'' is correct. If the zip has a subdirectory, use ''root='subdirectory/path'''.
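To find the right ''root'' value, it can help to list where the Zarr metadata files sit inside the archive. The helper below is a hypothetical sketch using only the standard library. It is demonstrated on an in-memory archive, but it should accept any seekable binary file object, such as the one returned by ''fs.open(s3_path, 'rb')''.

```python
import io
import posixpath
import zipfile


def find_zarr_roots(fileobj):
    """Return directories in the zip that hold Zarr v2 metadata, shallowest first."""
    with zipfile.ZipFile(fileobj, mode="r") as archive:
        names = archive.namelist()
    roots = {
        posixpath.dirname(name)
        for name in names
        if posixpath.basename(name) in (".zgroup", ".zarray")
    }
    # An empty string means the metadata sits at the top of the archive.
    return sorted(roots, key=lambda p: (p.count("/") if p else -1, p))


# Example archive whose Zarr hierarchy is nested one directory down.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("experiment.zarr/.zgroup", '{"zarr_format": 2}')
    archive.writestr("experiment.zarr/my_array/.zarray", "{}")
    archive.writestr("experiment.zarr/my_array/0.0", b"\x00")

print(find_zarr_roots(buffer))
# ['experiment.zarr', 'experiment.zarr/my_array']
```

The first entry is the candidate for ''root''. In this example the store would be created with ''FSMap(root='experiment.zarr', fs=zip_fs)''.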
===== Alternative: Direct FSStore =====
You can also use ''zarr.storage.FSStore'' instead of ''FSMap'':
store = zarr.storage.FSStore(url='', fs=zip_fs)
root = zarr.open(store, mode='r')
Both ''FSStore'' and ''FSMap'' provide the same ''MutableMapping'' interface. ''FSMap'' is more lightweight and part of core ''fsspec''.
====== RKNS (Zarr V2) from Minio ZIP ======
Run the following script with ''uv run''; the dependencies are resolved automatically from the inline script metadata.
This assumes you have our internal PyPI registry set up with uv.
# /// script
# requires-python = ">=3.8"
# dependencies = [
# "boto3>=1.40.49",
# "python-dotenv>=0.9.9",
# "packaging>=25.0",
# "rkns==0.6.2",
# "s3fs[boto3]>=2023.12.0",
# "typing-extensions>=4.15.0",
# ]
# ///
import os
from pathlib import Path
from dotenv import load_dotenv
from fsspec.implementations.zip import ZipFileSystem
from fsspec.mapping import FSMap
import s3fs
import zarr
import rkns
# load credentials from .env file
load_dotenv()
access_key_id = os.getenv("STORAGE_ACCESS_KEY")
secret_access_key = os.getenv("STORAGE_SECRET_KEY")
endpoint_url = os.getenv("ENDPOINT")
endpoint_url_full = os.getenv("ENDPOINT_FULL")
# Specify the path to your custom CA certificate
ca_cert_path = "ca.crt.cer"
assert Path(ca_cert_path).is_file()
# Create s3fs filesystem with custom cert
fs = s3fs.S3FileSystem(
    client_kwargs={"endpoint_url": endpoint_url_full, "verify": str(ca_cert_path)},
    key=access_key_id,
    secret=secret_access_key,
    use_ssl=True,
)
s3_path = "rekonas-dataset-shhs-rkns/sub-shhs200001_ses-01_task-sleep_eeg.rkns"
zip_fs = ZipFileSystem(fo=fs.open(s3_path, "rb"))
store = zarr.storage.FSStore(url='', fs=zip_fs)
rkns_obj = rkns.from_RKNS(store)
print(rkns_obj.tree)