Using FSMap with Zarr v2 for Remote Zip Stores
Overview
This guide explains how to open Zarr archives stored as zip files in remote object storage (MinIO/S3) using fsspec's FSMap with Zarr v2.
The Problem
Zarr v2's native ZipStore only accepts local file paths, not remote files or file-like objects. To access a .zip file stored in MinIO/S3, we need a different approach that bridges the gap between remote storage and Zarr's storage interface.
The Solution Architecture
The solution uses a chain of abstractions:
- S3FileSystem: Provides filesystem-like access to MinIO/S3 buckets
- ZipFileSystem: Wraps a zip file (local or remote) as a virtual filesystem
- FSMap: Translates filesystem operations into a key-value mapping that Zarr understands
Chain of Components
MinIO/S3 Storage
↓ (accessed via)
S3FileSystem (treats bucket as filesystem)
↓ (opens file)
S3File object (file-like interface to remote zip)
↓ (wrapped by)
ZipFileSystem (treats zip contents as filesystem)
↓ (mapped to)
FSMap (key-value store interface)
↓ (consumed by)
Zarr (reads arrays and metadata)
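The layering can be reproduced offline with the standard library, which helps when reasoning about code that expects a mapping store. This is a simplified sketch, not fsspec's implementation. io.BytesIO stands in for the S3File, zipfile.ZipFile stands in for ZipFileSystem, and a hypothetical ZipMapping class stands in for FSMap.

```python
import io
import json
import zipfile
from collections.abc import Mapping
from typing import Iterator


class ZipMapping(Mapping):
    """Read-only key-value view of a zip archive (stand-in for FSMap + ZipFileSystem)."""

    def __init__(self, fileobj) -> None:
        # zipfile only needs a seekable, readable file-like object,
        # which is also what an S3File provides.
        self._zip = zipfile.ZipFile(fileobj, mode="r")

    def __getitem__(self, key: str) -> bytes:
        # ZipFile.read raises KeyError for unknown names, which is what
        # Mapping semantics (and Zarr) expect for a missing key.
        return self._zip.read(key)

    def __iter__(self) -> Iterator[str]:
        # Explicit directory entries end with "/" and are not keys.
        return (n for n in self._zip.namelist() if not n.endswith("/"))

    def __len__(self) -> int:
        return sum(1 for _ in self)


# A tiny in-memory archive plays the role of the remote object.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(".zgroup", json.dumps({"zarr_format": 2}))

store = ZipMapping(buf)
print(json.loads(store[".zgroup"]))  # {'zarr_format': 2}
print(".zattrs" in store)            # False
```

Because ZipMapping needs nothing beyond read/seek/tell on the file object, the same property is what lets the real chain hand an S3File to ZipFileSystem.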
How It Works
1. S3FileSystem
S3FileSystem from the s3fs library provides a Python filesystem interface (fsspec) to S3/MinIO:
```python
import s3fs

fs = s3fs.S3FileSystem(
    client_kwargs={
        'endpoint_url': 'https://minio.example.com',
        'verify': '/path/to/ca.crt',
    },
    key='access_key',
    secret='secret_key',
    use_ssl=True,
)
```
This object lets you interact with MinIO buckets using familiar filesystem operations like fs.open(), fs.ls(), etc.
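Rather than hard-coding credentials, the keyword arguments can be assembled from environment variables. This is a minimal sketch. The helper name s3_options_from_env is hypothetical. The variable names STORAGE_ACCESS_KEY, STORAGE_SECRET_KEY and ENDPOINT_FULL match the ones the RKNS script later in this guide reads from its .env file.

```python
import os
from typing import Any, Dict, Mapping, Optional


def s3_options_from_env(
    env: Mapping[str, str] = os.environ,
    ca_cert_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for s3fs.S3FileSystem (hypothetical helper).

    Raises KeyError if a required variable is missing, so misconfiguration
    fails before any network call is made.
    """
    client_kwargs: Dict[str, Any] = {"endpoint_url": env["ENDPOINT_FULL"]}
    if ca_cert_path is not None:
        # botocore accepts a path to a CA bundle for "verify".
        client_kwargs["verify"] = ca_cert_path
    return {
        "key": env["STORAGE_ACCESS_KEY"],
        "secret": env["STORAGE_SECRET_KEY"],
        "use_ssl": True,
        "client_kwargs": client_kwargs,
    }


# Usage with a fake environment (no network access needed):
opts = s3_options_from_env(
    {
        "ENDPOINT_FULL": "https://minio.example.com",
        "STORAGE_ACCESS_KEY": "access_key",
        "STORAGE_SECRET_KEY": "secret_key",
    },
    ca_cert_path="/path/to/ca.crt",
)
# With s3fs installed: fs = s3fs.S3FileSystem(**opts)
```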
2. ZipFileSystem
ZipFileSystem from fsspec.implementations.zip takes a file object (which can be remote) and exposes the zip archive's internal structure as a filesystem:
```python
from fsspec.implementations.zip import ZipFileSystem

# Open the remote zip file
remote_file = fs.open('bucket/path/archive.zip', 'rb')

# Create a filesystem view into the zip
zip_fs = ZipFileSystem(fo=remote_file)
```
The fo parameter accepts any file-like object, including remote files from S3FileSystem. Now zip_fs treats the zip's contents as if they were a directory tree.
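A zip archive stores a flat list of member names, and the directory tree is implied by the "/" separators in those names. The stdlib sketch below shows how a directory listing can be derived from the names, which is conceptually what listing zip_fs gives you. list_children is a hypothetical helper, not part of fsspec.

```python
import io
import zipfile
from typing import Iterable, List


def list_children(names: Iterable[str], directory: str = "") -> List[str]:
    """Immediate children of `directory` implied by flat zip member names.

    Subdirectories are returned with a trailing "/" to tell them apart from files.
    """
    stripped = directory.strip("/")
    prefix = stripped + "/" if stripped else ""
    children = set()
    for name in names:
        if not name.startswith(prefix) or name == prefix:
            continue
        head, sep, _ = name[len(prefix):].partition("/")
        children.add(head + sep)  # "sub/" for directories, "file" for files
    return sorted(children)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name in [".zgroup", "array_name/.zarray", "array_name/0.0.0", "subgroup/.zgroup"]:
        zf.writestr(name, b"")

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()

print(list_children(names))                # ['.zgroup', 'array_name/', 'subgroup/']
print(list_children(names, "array_name"))  # ['.zarray', '0.0.0']
```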
3. FSMap
FSMap from fsspec.mapping implements Python's MutableMapping interface (dict-like behavior) on top of any fsspec filesystem:
```python
from fsspec.mapping import FSMap

# Create a mapping store
store = FSMap(root='', fs=zip_fs)
```
The root='' parameter means “start at the zip's root directory”. FSMap now translates dictionary-style access (store[key]) into filesystem operations (zip_fs.open(key)).
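Two details of this translation matter to Zarr. First, keys are resolved relative to root. Second, a missing file must surface as KeyError, because Zarr v2 treats a missing chunk key as "use the fill value". FSMap converts a FileNotFoundError into KeyError for this reason. The sketch below illustrates both points with a hypothetical PrefixedStore over a dict-backed FakeFS standing in for the filesystem. It mirrors FSMap's behaviour in spirit but is not its implementation.

```python
from collections.abc import Mapping
from typing import Dict, Iterator, List


class FakeFS:
    """Hypothetical stand-in for a filesystem: missing paths raise FileNotFoundError."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files

    def cat(self, path: str) -> bytes:
        try:
            return self.files[path]
        except KeyError:
            raise FileNotFoundError(path) from None

    def find(self, root: str) -> List[str]:
        prefix = root + "/" if root else ""
        return sorted(p for p in self.files if p.startswith(prefix))


class PrefixedStore(Mapping):
    """Read-only mapping that resolves keys under `root` (FSMap-like sketch)."""

    def __init__(self, fs: FakeFS, root: str = "") -> None:
        self.fs = fs
        self.root = root.strip("/")

    def _path(self, key: str) -> str:
        # root="" means keys are used as-is; otherwise they live under root/.
        return f"{self.root}/{key}" if self.root else key

    def __getitem__(self, key: str) -> bytes:
        try:
            return self.fs.cat(self._path(key))
        except FileNotFoundError:
            raise KeyError(key) from None  # what Zarr expects for a missing key

    def __iter__(self) -> Iterator[str]:
        cut = len(self.root) + 1 if self.root else 0
        return (p[cut:] for p in self.fs.find(self.root))

    def __len__(self) -> int:
        return sum(1 for _ in self)


fs = FakeFS({"data.zarr/.zgroup": b'{"zarr_format": 2}'})
print(".zgroup" in PrefixedStore(fs))                    # False: wrong root
print(".zgroup" in PrefixedStore(fs, root="data.zarr"))  # True
```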
4. Zarr Integration
Zarr v2 expects stores to implement the MutableMapping interface, which FSMap provides. When you open a Zarr group:
```python
import zarr

root = zarr.open(store, mode='r')
```
Zarr performs operations like:
- store['.zgroup'] → reads the root metadata
- store['array_name/.zarray'] → reads array metadata
- store['array_name/0.0.0'] → reads a specific chunk
Each of these translates through the chain:
FSMap→ZipFileSystem→S3File→ MinIO/S3 HTTP request
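Which chunk keys Zarr requests follows directly from the array metadata. In Zarr v2 a chunk key is the chunk's grid index joined by the dimension separator, which is "." unless .zarray sets dimension_separator to "/". The key is then prefixed by the array's path. The worked sketch below uses hypothetical helper names and assumes arrays with at least one dimension.

```python
import itertools
import json
import math
from typing import List, Sequence


def _key(array_path: str, sep: str, index: Sequence[int]) -> str:
    prefix = f"{array_path}/" if array_path else ""
    return prefix + sep.join(str(i) for i in index)


def chunk_keys(array_path: str, zarray: dict) -> List[str]:
    """All chunk keys of an array, in C order of the chunk grid."""
    sep = zarray.get("dimension_separator") or "."
    # Chunks per dimension; the last chunk along an axis may be partial.
    grid = [math.ceil(s / c) for s, c in zip(zarray["shape"], zarray["chunks"])]
    return [_key(array_path, sep, idx) for idx in itertools.product(*(range(n) for n in grid))]


def chunk_key_for_element(array_path: str, zarray: dict, element: Sequence[int]) -> str:
    """Key of the chunk that contains the element at `element`."""
    sep = zarray.get("dimension_separator") or "."
    return _key(array_path, sep, [e // c for e, c in zip(element, zarray["chunks"])])


# Abbreviated .zarray: only shape and chunks matter here.
zarray = json.loads('{"zarr_format": 2, "shape": [10, 10], "chunks": [5, 5], "dtype": "<f8"}')
print(chunk_keys("array_name", zarray))
# ['array_name/0.0', 'array_name/0.1', 'array_name/1.0', 'array_name/1.1']
print(chunk_key_for_element("array_name", zarray, (7, 2)))  # array_name/1.0
```

Only chunks that were actually written exist in the archive. For any other key the store raises KeyError and Zarr substitutes the array's fill value.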
Complete Example
```python
import s3fs
import zarr
from fsspec.implementations.zip import ZipFileSystem
from fsspec.mapping import FSMap

# 1. Configure S3/MinIO access
fs = s3fs.S3FileSystem(
    client_kwargs={
        'endpoint_url': 'https://minio.example.com',
        'verify': '/path/to/ca.crt',
    },
    key='your_access_key',
    secret='your_secret_key',
    use_ssl=True,
)

# 2. Open the remote zip file as a filesystem
s3_path = 'my-bucket/data/experiment.zarr.zip'
zip_fs = ZipFileSystem(fo=fs.open(s3_path, 'rb'))

# 3. Create a mapping store for Zarr
store = FSMap(root='', fs=zip_fs)

# 4. Open with Zarr (read-only)
root = zarr.open(store, mode='r')

# 5. Use the Zarr group normally
print(root.tree())
array = root['my_array'][:]
```
Key-Value Mapping Internals
Under the hood, a Zarr zip archive contains files like:
```
.zgroup              # Root group metadata
array_name/.zarray   # Array metadata
array_name/0.0.0     # Chunk at position (0,0,0)
array_name/0.0.1     # Chunk at position (0,0,1)
subgroup/.zgroup     # Nested group metadata
```
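For offline testing it can help to build a tiny archive with this layout by hand. The sketch below uses only the standard library. It writes uncompressed chunks (compressor set to null) so the raw bytes are easy to inspect. Archives produced by Zarr itself normally contain compressed chunks, since Zarr v2 defaults to Blosc. The array name and values are illustrative. Treat the result as a test fixture, not as a replacement for Zarr's own writer.

```python
import io
import json
import struct
import zipfile

ZARRAY = {
    "zarr_format": 2,
    "shape": [4],
    "chunks": [2],
    "dtype": "<i4",        # little-endian 32-bit integers
    "compressor": None,    # raw, uncompressed chunk bytes
    "fill_value": 0,
    "order": "C",
    "filters": None,
}
values = [10, 20, 30, 40]

buf = io.BytesIO()
# ZIP_STORED: no zip-level compression either.
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr(".zgroup", json.dumps({"zarr_format": 2}))
    zf.writestr("array_name/.zarray", json.dumps(ZARRAY))
    # Two elements per chunk, so 1-D chunk keys "array_name/0" and "array_name/1".
    for i in range(2):
        chunk = values[i * 2:(i + 1) * 2]
        zf.writestr(f"array_name/{i}", struct.pack("<2i", *chunk))

with zipfile.ZipFile(buf) as zf:
    print(zf.namelist())
    # ['.zgroup', 'array_name/.zarray', 'array_name/0', 'array_name/1']
    print(struct.unpack("<2i", zf.read("array_name/1")))  # (30, 40)
```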
When Zarr does store['array_name/0.0.0']:
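- FSMap translates the key into zip_fs.open('array_name/0.0.0', 'rb').read()
- ZipFileSystem looks up the member in the zip's central directory, which was parsed once when the archive was opened
- ZipFileSystem reads the member's bytes from the underlying S3File, decompressing them only if the zip entry itself is compressed (Zarr's ZipStore writes entries uncompressed by default)
- S3File makes an HTTP range request to MinIO for any bytes it does not already have cached
- The raw chunk bytes are returned to Zarr, which decodes them with the array's compressor and filters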
This happens lazily - only when Zarr actually accesses specific data.
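To see which byte ranges this path touches, you can wrap an in-memory file and log every read. This stdlib-only sketch assumes that an S3File would turn each such read into a range request. In practice, s3fs's caching may merge or enlarge those requests. LoggingBytesIO is a hypothetical helper.

```python
import io
import zipfile
from typing import List, Tuple


class LoggingBytesIO(io.BytesIO):
    """In-memory file that records (offset, length) for every read() call."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.reads: List[Tuple[int, int]] = []

    def read(self, size=-1):
        start = self.tell()
        data = super().read(size)
        self.reads.append((start, len(data)))
        return data


# An archive with a small metadata file followed by a ~200 KB "chunk".
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr(".zgroup", '{"zarr_format": 2}')
    zf.writestr("array_name/0", b"\x00" * 200_000)

f = LoggingBytesIO(raw.getvalue())
zf = zipfile.ZipFile(f)   # opening parses the central directory
opened = list(f.reads)
f.reads.clear()

zf.read(".zgroup")        # reading one member touches only its own bytes
member = list(f.reads)
zf.close()

print("size:", len(raw.getvalue()))
print("reads while opening:", opened)
print("reads for .zgroup:", member)
```

The reads made while opening fall in the tail of the file, where the end-of-central-directory record and the central directory live. Reading a member only touches that member's local header and data. Over S3, this is why opening the archive costs a few requests up front, and each chunk read afterwards costs roughly one more.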
Mode Considerations
For read-only access (mode='r'), this approach works seamlessly.
For write operations, limitations apply:
- Zip files are not designed for random write access
- Changing data inside an existing archive with ZipFileSystem effectively means rewriting the entire zip
- For remote storage, writing is impractical because the archive would have to be downloaded and re-uploaded
Recommendation: Use this approach for read-only access to pre-created Zarr zip archives.
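The underlying limitation is easy to reproduce with the standard library. The zip format has no in-place update, so writing a member that already exists appends a second entry with the same name. Python's zipfile warns about the duplicate, and the old bytes remain in the archive.

```python
import io
import warnings
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("array_name/0", b"old")

# "Update" the chunk by appending a new entry with the same name.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with zipfile.ZipFile(buf, "a") as zf:
        zf.writestr("array_name/0", b"new")

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    latest = zf.read("array_name/0")

print(names)   # ['array_name/0', 'array_name/0']: the old entry is still there
print(latest)  # b'new'
print([str(w.message) for w in caught])
```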
Performance Notes
- First access: May be slower, because the zip's central directory (stored at the end of the file) must be read before any member can be located
- Chunk reads: Each chunk access can trigger a network request unless the bytes are already in s3fs's file cache
- Optimization: s3fs has built-in read caching; configure it with the cache_type (and block_size) arguments of fs.open()
- Best for: Datasets where you don't need to read all chunks (sparse access patterns)
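A simplified block cache shows why caching matters. Reads are served from fixed-size blocks fetched on demand, so several small reads that fall in the same block cost a single request. This is a hypothetical illustration of the idea, not s3fs's or fsspec's implementation. In the example, fetch_range stands in for an HTTP range request.

```python
from typing import Callable, Dict


class BlockCache:
    """Serve byte ranges from fixed-size blocks fetched on demand (sketch)."""

    def __init__(self, fetch: Callable[[int, int], bytes], size: int, block_size: int) -> None:
        self.fetch = fetch            # fetch(start, end) -> bytes; one "request"
        self.size = size              # total file size in bytes
        self.block_size = block_size
        self.blocks: Dict[int, bytes] = {}
        self.requests = 0

    def _block(self, i: int) -> bytes:
        if i not in self.blocks:
            start = i * self.block_size
            end = min(start + self.block_size, self.size)
            self.blocks[i] = self.fetch(start, end)
            self.requests += 1
        return self.blocks[i]

    def read_range(self, start: int, end: int) -> bytes:
        """Return bytes [start, end), fetching only blocks not yet cached."""
        end = min(end, self.size)
        if start >= end:
            return b""
        first, last = start // self.block_size, (end - 1) // self.block_size
        joined = b"".join(self._block(i) for i in range(first, last + 1))
        offset = first * self.block_size
        return joined[start - offset:end - offset]


# A 10 KiB "remote file".
remote = bytes(range(256)) * 40


def fetch_range(start: int, end: int) -> bytes:
    return remote[start:end]


cache = BlockCache(fetch_range, size=len(remote), block_size=1024)
cache.read_range(10, 20)      # fetches block 0
cache.read_range(15, 30)      # served from the cache
cache.read_range(1000, 1100)  # also needs block 1
print(cache.requests)         # 2
```

Larger blocks mean fewer requests but more bytes transferred per miss. That trade-off is what the cache and block-size settings on the remote file control.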
Troubleshooting
"TypeError: expected str, bytes or os.PathLike object"
You tried to pass a file object directly to zarr.ZipStore. Use ZipFileSystem + FSMap instead.
"SSL Certificate Verify Failed"
Add the CA certificate path to client_kwargs={'verify': '/path/to/cert.pem'}.
Store appears empty
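Check that root='' in FSMap is correct. If the Zarr metadata (.zgroup or .zarray) sits inside a top-level folder of the zip rather than at its root, which is common when the .zarr directory itself was zipped, set root to that folder, e.g. root='subdirectory/path'.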
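The right root can be found from the archive's member names by locating the shallowest directory that holds Zarr v2 metadata. This is a minimal sketch, and find_zarr_root is a hypothetical helper. The names could come from zipfile.ZipFile(...).namelist() on a local copy, or from a recursive listing of zip_fs (for example zip_fs.find('')).

```python
from typing import Iterable, Optional

METADATA_FILES = (".zgroup", ".zarray")


def find_zarr_root(names: Iterable[str]) -> Optional[str]:
    """Shallowest directory containing Zarr v2 metadata, or None if there is none.

    "" means the hierarchy starts at the top of the archive (root='').
    """
    candidates = []
    for name in names:
        directory, _, filename = name.rpartition("/")
        if filename in METADATA_FILES:
            candidates.append(directory)
    if not candidates:
        return None
    # Depth: "" -> 0, "a" -> 1, "a/b" -> 2; ties broken alphabetically.
    return min(candidates, key=lambda d: (d.count("/") + (1 if d else 0), d))


print(repr(find_zarr_root([".zgroup", "array_name/.zarray"])))       # ''
print(find_zarr_root(["data.zarr/.zgroup", "data.zarr/a/.zarray"]))  # data.zarr
```

The returned value is what you would pass as FSMap(root=...).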
Alternative: Direct FSStore
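You can also use zarr.storage.FSStore instead of FSMap. FSStore defaults to mode='w', so pass mode='r' explicitly for read-only access:
```python
import zarr

store = zarr.storage.FSStore(url='', fs=zip_fs, mode='r')
root = zarr.open(store, mode='r')
```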
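Both FSStore and FSMap expose a MutableMapping interface. FSStore is Zarr's own fsspec-backed store and adds Zarr-specific options such as key normalization and dimension-separator handling. FSMap is the generic, lighter-weight mapping that ships with core fsspec.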
RKNS (Zarr V2) from Minio ZIP
Run the following script with uv run. uv resolves the dependencies declared in the inline script metadata block automatically. This assumes you have our internal PyPI registry configured in uv.
```python
# /// script
# requires-python = ">=3.8"
# dependencies = [
#     "boto3>=1.40.49",
#     "python-dotenv>=0.9.9",
#     "packaging>=25.0",
#     "rkns==0.6.2",
#     "s3fs[boto3]>=2023.12.0",
#     "typing-extensions>=4.15.0",
# ]
# ///
import os
from pathlib import Path

from dotenv import load_dotenv
from fsspec.implementations.zip import ZipFileSystem
from fsspec.mapping import FSMap
import s3fs
import zarr
import rkns

# Load credentials from the .env file
load_dotenv()
access_key_id = os.getenv("STORAGE_ACCESS_KEY")
secret_access_key = os.getenv("STORAGE_SECRET_KEY")
endpoint_url = os.getenv("ENDPOINT")
endpoint_url_full = os.getenv("ENDPOINT_FULL")

# Specify the path to your custom CA certificate
ca_cert_path = "ca.crt.cer"
assert Path(ca_cert_path).is_file()

# Create the s3fs filesystem with the custom certificate
fs = s3fs.S3FileSystem(
    client_kwargs={"endpoint_url": endpoint_url_full, "verify": str(ca_cert_path)},
    key=access_key_id,
    secret=secret_access_key,
    use_ssl=True,
)

# Open the remote .rkns file (a zipped Zarr v2 store) and wrap it for Zarr
s3_path = "rekonas-dataset-shhs-rkns/sub-shhs200001_ses-01_task-sleep_eeg.rkns"
zip_fs = ZipFileSystem(fo=fs.open(s3_path, "rb"))
store = zarr.storage.FSStore(url='', fs=zip_fs)

rkns_obj = rkns.from_RKNS(store)
print(rkns_obj.tree)
```