====== Turning Datasets into RKNS Format ======

This guide describes how to convert a structured dataset (e.g., BIDS) into the **RKNS format** using the [[https://

----
====== 1. Transform Logic (main.py) ======

The core conversion logic lives in ''main.py''.

and outputs a validated ''
==== CLI Interface ====

It is recommended to keep the CLI interface unchanged if possible; it is designed for easier Airflow usage:

The ''
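As a rough sketch of the kind of flat, Airflow-friendly interface this implies, here is a hypothetical ''argparse'' skeleton; the flag names (''--edf'', ''--annotations'', ''--metadata'', ''--output'') are illustrative assumptions, not the template's actual options (check ''main.py''):

<code python>
# Hypothetical CLI skeleton; the flag names below are assumptions for
# illustration, not the template's real interface -- check main.py.
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Convert one recording to RKNS")
    parser.add_argument("--edf", required=True, help="input EDF file")
    parser.add_argument("--annotations", required=True, help="events TSV file")
    parser.add_argument("--metadata", help="participant metadata file")
    parser.add_argument("--output", required=True, help="path of the RKNS output")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(args)  # placeholder: hand the paths to the transform logic
</code>

Flat, explicit flags like these are easy to template from an Airflow task, which is why keeping the interface stable pays off.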
==== 1.1 Load and Standardize Signals ====

<code python>
rkns_obj = rkns.from_external_format(
</code>

Uses regex mappings in ''
==== 1.2 Add Annotations ====

Converts the TSV file to BIDS-compatible format and adds events:

<code python>
</code>
==== 1.3 Extract and Categorize Metadata ====

- Matches participant ID (e.g., "
- Looks up the participant in ''
- Groups metadata by category and adds each to the RKNS object.
==== 1.4 Finalize and Export ====

<code python>
rkns_obj.populate()
</code>
==== 1.5 Validate ====

<code python>
validate_rkns_checksum(output_file)
</code>

----
====== 2. Preprocessing: ======

RKNS requires events in a strict tab-separated (TSV) format with three columns: ''
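A minimal preprocessing sketch, assuming the three columns are named ''onset'', ''duration'', and ''description'' and that the vendor export is a CSV with ''start_sec''/''length_sec''/''label'' columns; all of these names are assumptions to adapt to your dataset:

<code python>
# Sketch: reshape a vendor annotation export into a strict 3-column TSV.
# ASSUMPTIONS: the required columns are onset/duration/description and the
# vendor file is a CSV with start_sec/length_sec/label columns.
import pandas as pd

raw = pd.read_csv("annotations_vendor.csv")  # hypothetical vendor export
events = pd.DataFrame({
    "onset": raw["start_sec"].astype(float),      # seconds from recording start
    "duration": raw["length_sec"].astype(float),  # event length in seconds
    "description": raw["label"].astype(str),      # free-text event label
})
events.to_csv("events.tsv", sep="\t", index=False)
</code>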
----
====== 3. Standardizing Names via Regex Mappings ======

Once you have a compliant TSV, normalize channel and event names non-destructively using regex mappings defined in two JSON files.

===== Channel Names → assets/ =====

EDF channel labels vary wildly (e.g., "EEG C3-A2",

Keys are regex patterns (applied sequentially),
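A few lines of Python reproduce the sequential-application semantics; the mapping file name used here (''assets/channel_mappings.json'') is an assumed placeholder for the actual JSON file:

<code python>
# Sketch: apply the JSON regex mappings in order, each rule operating on
# the result of the previous one. The file name below is an assumption.
import json
import re

with open("assets/channel_mappings.json") as f:
    mappings = json.load(f)  # Python dicts preserve the JSON key order

def normalize(name: str) -> str:
    for pattern, replacement in mappings.items():
        name = re.sub(pattern, replacement, name)
    return name

print(normalize("EEG C3-A2"))  # result depends on your rules
</code>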
==== Validation ====

Use ''test_replace_channels.py'' to check the mappings:

1. Extract unique channel names from your EDF files:
<code bash>
find /data -name "
</code>

2. Run the test:
<code bash>
python test_replace_channels.py
</code>

3. Inspect the output:
<code bash>
cat out/
</code>
==== AI-Assisted Mapping Generation ====

If you have a list of unique channel names, use this prompt with your LLM to accelerate mapping creation:

<code>
I will provide you with a list of physiological signal channels (e.g.,
I require an output in JSON format that maps each raw channel

Key requirements:
1. **Include early normalization rules** to handle common delimiters (e.g., `_`, spaces, `.`, `/`, parentheses) by converting them to hyphens (`-`), collapsing multiple hyphens, and trimming leading/trailing hyphens.
2. All patterns must be **case-insensitive** (use `(?i)`).
3. Use **physiologically meaningful, NSRR/
   - `EEG-C3-M2` (not `EEG-C3_A2` or ambiguous forms)
   - `EMG-LLEG` / `EMG-RLEG` for leg EMG (not `LAT`/`RAT` as position)
   - `RESP-AIRFLOW-THERM` or `RESP-AIRFLOW-PRES` (not generic `RESP-NASAL`)
   - `EOG-LOC` / `EOG-ROC` for eye channels
   - `EMG-CHIN` for chin EMG
   - `PULSE` for heart rate or pulse signals (unless raw ECG → `ECG`)
4. **Do not include a final catch-all rule** (e.g., `^(.+)$ → MISC-\1`) unless explicitly requested; most channels in the input list should be known and mapped specifically.
5. Replacements are applied **in order**, with each rule operating on the result of the previous one.

Example reference snippet:
```json
{
  "
  "
  "
  "
  "
  "
}
```

Now, generate a similar JSON mapping for the following list of channel names:
```
[INSERT YOUR UNIQUE CHANNEL LIST HERE]
```

Provide the output **within a JSON code block only**, with no explanations.
</code>
===== Event Descriptions → assets/ =====

Normalize inconsistent event labels (e.g., "

The last pattern acts as a catch-all: unknown events are prefixed with ''
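In sequential-regex terms, the catch-all is simply the last rule, guarded so it skips labels that earlier rules already normalized. In this sketch the ''UNKNOWN-'' prefix and the canonical names are illustrative placeholders, not the mapping file's actual values:

<code python>
# Sketch: event mapping whose last rule prefixes anything still unmatched.
# The canonical names and the UNKNOWN- prefix are illustrative only.
import re

event_mappings = {
    r"(?i)^obstructive apnea$": "APNEA-OBSTRUCTIVE",
    r"(?i)^hypopnea$": "HYPOPNEA",
    # Catch-all: skip labels already mapped above, prefix the rest.
    r"(?i)^(?!APNEA-|HYPOPNEA)(.+)$": r"UNKNOWN-\1",
}

def normalize_event(label: str) -> str:
    for pattern, replacement in event_mappings.items():
        label = re.sub(pattern, replacement, label)
    return label

print(normalize_event("Obstructive Apnea"))  # -> APNEA-OBSTRUCTIVE
print(normalize_event("Body Position"))      # -> UNKNOWN-Body Position
</code>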
==== Validation ====

Use ''

</code>
==== AI-Assisted Mapping Generation ====

Use this prompt with your LLM to accelerate mapping creation:

----
====== 4. Metadata Handling ======

RKNS groups metadata by high-level categories (e.g., ''

</code>
==== Category Mapping ====

The script uses the ''

* ''
==== How It Works ====

1. The script extracts the participant ID from the EDF filename (e.g., "
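A sketch of that flow, in which the filename pattern, the metadata file, and the category labels are all hypothetical placeholders:

<code python>
# Sketch: pull the participant ID out of an EDF filename and group that
# participant's metadata by category. All names here are hypothetical.
import re
import pandas as pd

CATEGORY_MAP = {"age": "demographics", "sex": "demographics", "ahi": "clinical"}

edf_name = "sub-001_task-sleep.edf"       # hypothetical filename
match = re.match(r"(sub-\d+)", edf_name)  # assumed ID pattern
participant_id = match.group(1)

table = pd.read_csv("participants.tsv", sep="\t")  # assumed metadata file
row = table.set_index("participant_id").loc[participant_id]

grouped = {}
for column, value in row.items():
    grouped.setdefault(CATEGORY_MAP.get(column, "misc"), {})[column] = value
print(grouped)  # e.g. {"demographics": {...}, "clinical": {...}, "misc": {...}}
</code>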
----
====== 5. CLI: Python & Docker ======

==== Testing the Development CLI ====

Once you've implemented your conversion logic in ''main.py'', test it locally:

5. Output a ''
==== Building and Testing the Docker Image ====

Build the Docker image:

**Security Note:** Build args are visible in intermediate layer history. Avoid storing secrets (API keys, credentials) as build args; use runtime environment variables or mount secrets instead.
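To make the distinction concrete (the image and variable names are placeholders):

<code bash>
# Build args end up in the image's layer metadata:
docker build --build-arg API_KEY=dont-do-this -t my-rkns-converter .
docker history my-rkns-converter   # the arg value can be recovered here

# Pass secrets only at runtime instead:
docker run --rm -e API_KEY="$API_KEY" my-rkns-converter
</code>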
==== Testing the Docker-CLI ====

Build with ''

----
====== 6. Orchestration: ======

The provided ''

==== DAG Overview ====

The DAG:
4. Collects results and validates completion.
==== Required Adaptations ====

=== a) Input Dataset Path ===

Update ''

</code>
=== b) Annotation File Naming Logic ===

The DAG assumes each EDF file has a corresponding annotation file. The default logic replaces ''

</code>
=== c) Metadata File Paths ===

The DAG uses global ''

* Validate that files exist before passing to the Docker task.
=== d) Docker Image Name ===

Update the image parameter to match your built container:

</code>
=== e) Output Directory ===

The output path specifies where ''

* Sufficient disk space is available.
=== f) Volume Mounts ===

The DAG binds host directories into the container. Update both the host path and container mount point:

</code>
=== g) Example: Full Adaptation ===

Here's a complete example for a BIDS dataset stored at ''/

----
====== 7. Project Structure ======

----
====== 8. Workflow Summary ======

==== End-to-End Process ====

1. **Prepare Annotations**