workflow_dataset_to_rkns [2025/10/20 14:03] (current) – fabricio
====== Turning Datasets into RKNS Format ======

This guide describes how to convert a structured dataset (e.g., BIDS) into the **RKNS format** using the [[https://

----
====== 1. Transform Logic (main.py) ======

The core conversion logic lives in ''
and outputs a validated ''

==== CLI Interface ====

It is recommended to keep the CLI interface unchanged if possible—it is designed for easier Airflow usage:
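As a rough illustration of such a CLI, a minimal argparse entry point might look like the sketch below. Every flag name here is a hypothetical placeholder, not the template's actual interface; keep the real one unchanged so the Airflow DAG can keep calling it.

```python
import argparse

def build_parser():
    # Hypothetical flags for illustration only; the template's real CLI
    # (and its exact flag names) should be kept as-is.
    parser = argparse.ArgumentParser(description="Convert one recording to RKNS.")
    parser.add_argument("--edf-file", required=True)
    parser.add_argument("--annotations-file", required=True)
    parser.add_argument("--output-file", required=True)
    return parser

args = build_parser().parse_args(
    ["--edf-file", "rec.edf", "--annotations-file", "rec.tsv", "--output-file", "rec.rkns"]
)
print(args.output_file)  # rec.rkns
```

A stable flag-based interface like this is what makes the converter easy to invoke from a DockerOperator task.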
The ''

==== 1.1 Load and Standardize Signals ====

<code python>
rkns_obj = rkns.from_external_format(
</code>

Uses regex mappings in ''

==== 1.2 Add Annotations ====

Converts the TSV file to BIDS-compatible format and adds events:

<code python>
</code>

==== 1.3 Extract and Categorize Metadata ====

- Matches participant ID (e.g., "
- Looks up the participant in ''
- Groups metadata by category and adds each to the RKNS object.

==== 1.4 Finalize and Export ====

<code python>
rkns_obj.populate()
</code>

==== 1.5 Validate ====

<code python>
validate_rkns_checksum(output_file)
</code>

----
====== 2. Preprocessing: ======

RKNS requires events in a strict tab-separated (TSV) format with three columns: ''
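A minimal sketch of producing that three-column shape with Python's ''csv'' module; the ''onset''/''duration''/''event'' column names are as described on this page, and the event label is a made-up example:

```python
import csv
import io

def write_events_tsv(events, out):
    # One header row plus one row per event: onset, duration, event.
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["onset", "duration", "event"])
    writer.writerows(events)

buf = io.StringIO()
write_events_tsv([(0.0, 30.0, "SLEEP-STAGE-W")], buf)
print(buf.getvalue())
```

In practice ''out'' would be an opened file handle for the annotations TSV sitting next to the EDF recording.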
----

====== 3. Standardizing Names via Regex Mappings ======

Once you have a compliant TSV, normalize channel and event names non-destructively using regex mappings defined in two JSON files.

===== Channel Names → assets/ =====

EDF channel labels vary wildly (e.g., "EEG C3-A2",
Keys are regex patterns (applied sequentially),
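Sequential application can be sketched as follows; the mapping entries here are hypothetical stand-ins for the real entries in the assets JSON:

```python
import re

# Hypothetical mapping in the style of the assets JSON: keys are regex
# patterns, values are replacements, applied top to bottom.
CHANNEL_MAPPINGS = {
    r"[ _./]+": "-",                      # normalize delimiters to hyphens
    r"-{2,}": "-",                        # collapse repeated hyphens
    r"^EEG-?C3-?(A2|M2)$": "EEG-C3-M2",   # canonical C3 referential label
}

def standardize(name, mappings):
    # Each rule operates on the result of the previous one.
    for pattern, replacement in mappings.items():
        name = re.sub(pattern, replacement, name, flags=re.IGNORECASE)
    return name

print(standardize("EEG C3-A2", CHANNEL_MAPPINGS))  # EEG-C3-M2
```

Because rules chain, early normalization rules (delimiters, duplicate hyphens) let the later, more specific patterns stay simple.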
==== Validation ====

Use ''
1. Extract unique channel names from your EDF files:
<code bash>
find /data -name "
</code>
2. Run the test:
<code bash>
python test_replace_channels.py
</code>
3. Inspect the output:
<code bash>
cat out/
</code>
==== AI-Assisted Mapping Generation ====

If you have a list of unique channel names, use this prompt with your LLM to accelerate mapping creation:
<code>
I will provide you a list of physiological signal channels (e.g.,
I require an output in JSON format that maps each raw channel

Key requirements:
1. **Include early normalization rules** to handle common delimiters (e.g., `_`, spaces, `.`, `/`, parentheses) by converting them to hyphens (`-`), collapsing multiple hyphens, and trimming leading/
2. All patterns must be **case-insensitive** (use `(?i)`).
3. Use **physiologically meaningful, NSRR/
  - `EEG-C3-M2` (not `EEG-C3_A2` or ambiguous forms)
  - `EMG-LLEG` / `EMG-RLEG` for leg EMG (not `LAT`/`RAT` as position)
  - `RESP-AIRFLOW-THERM` or `RESP-AIRFLOW-PRES` (not generic `RESP-NASAL`)
  - `EOG-LOC` / `EOG-ROC` for eye channels
  - `EMG-CHIN` for chin EMG
  - `PULSE` for heart rate or pulse signals (unless raw ECG → `ECG`)
4. **Do not include a final catch-all rule** (e.g., `^(.+)$ → MISC-\1`) unless explicitly requested—most channels in the input list should be known and mapped specifically.
5. Replacements are applied **in order**, with each rule operating on the result of the previous one.

Example reference snippet:
```json
{
  "
  "
  "
  "
  "
  "
}
```

Now, generate a similar JSON mapping for the following list of channel names:
```
[INSERT YOUR UNIQUE CHANNEL LIST HERE]
```

Provide the output **within a JSON code block only**—no explanations.
</code>
===== Event Descriptions → assets/ =====

Normalize inconsistent event labels (e.g., "
The last pattern acts as a catch-all: unknown events are prefixed with ''
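One way to reconcile a catch-all with sequential rule application is a negative lookahead, so that labels already standardized by earlier rules are not prefixed a second time. The rules and the ''UNKNOWN-'' prefix below are hypothetical illustrations, not the actual assets JSON:

```python
import re

EVENT_MAPPINGS = {
    r"(?i)^obstructive\s+apnea$": "APNEA-OBSTRUCTIVE",
    # Catch-all: prefix anything not already standardized above.
    r"(?i)^(?!APNEA-)(.+)$": r"UNKNOWN-\1",
}

def standardize_event(label):
    for pattern, replacement in EVENT_MAPPINGS.items():
        label = re.sub(pattern, replacement, label)
    return label

print(standardize_event("Obstructive Apnea"))  # APNEA-OBSTRUCTIVE
print(standardize_event("Body Movement"))      # UNKNOWN-Body Movement
```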
==== Validation ====

Use ''
</code>

==== AI-Assisted Mapping Generation ====

Use this prompt with your LLM to accelerate mapping creation:

----
====== 4. Metadata Handling ======

RKNS groups metadata by high-level categories (e.g., ''
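The grouping step can be sketched under an assumed column-to-category codebook; all names below are hypothetical, the real codebook comes from ''participants.json'':

```python
from collections import defaultdict

# Hypothetical codebook: column name -> category folder.
COLUMN_CATEGORIES = {
    "age": "demographics",
    "sex": "demographics",
    "ahi": "clinical",
}

def group_metadata(row, categories, default="misc"):
    # Bucket each column's value under its category; unknown columns
    # fall back to a default bucket.
    groups = defaultdict(dict)
    for column, value in row.items():
        groups[categories.get(column, default)][column] = value
    return dict(groups)

print(group_metadata({"age": 54, "ahi": 12.3}, COLUMN_CATEGORIES))
```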
| Line 258: | Line 273: | ||
| </ | </ | ||
| - | ====== Category Mapping | + | ==== Category Mapping ==== |
| The script uses the '' | The script uses the '' | ||
| Line 273: | Line 288: | ||
| * '' | * '' | ||
==== How It Works ====

1. The script extracts the participant ID from the EDF filename (e.g., "
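The extraction step can be sketched like this; the BIDS-style ''sub-'' pattern is an assumption for illustration, the real regex lives in ''main.py'' and must match your dataset's filenames:

```python
import re

def participant_id(edf_filename, pattern=r"(sub-[A-Za-z0-9]+)"):
    # Return the first match of the ID pattern, or None if absent.
    match = re.search(pattern, edf_filename)
    return match.group(1) if match else None

print(participant_id("sub-001_task-sleep_eeg.edf"))  # sub-001
```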
----

====== 5. CLI: Python & Docker ======

==== Testing the Development CLI ====

Once you've implemented your conversion logic in ''
5. Output a ''

==== Building and Testing the Docker Image ====

Build the Docker image:

**Security Note:** Build args are visible in intermediate layer history. Avoid storing secrets (API keys, credentials) as build args; use runtime environment variables or mount secrets instead.

==== Testing the Docker-CLI ====

Build with ''

----
====== 6. Orchestration: ======

The provided ''

==== DAG Overview ====

The DAG:

4. Collects results and validates completion.
==== Required Adaptations ====

=== a) Input Dataset Path ===

Update ''
</code>
=== b) Annotation File Naming Logic ===

The DAG assumes each EDF file has a corresponding annotation file. The default logic replaces ''
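That suffix-replacement idea can be sketched as follows; the ''_events.tsv'' suffix here is a hypothetical default, not necessarily what the DAG actually substitutes:

```python
from pathlib import PurePosixPath

def annotation_path(edf_path, suffix="_events.tsv"):
    # Hypothetical default: drop ".edf" and append an annotations suffix
    # in the same directory; adapt to your dataset's naming convention.
    p = PurePosixPath(edf_path)
    return p.with_name(p.stem + suffix)

print(annotation_path("/data/sub-001_eeg.edf"))  # /data/sub-001_eeg_events.tsv
```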
</code>

=== c) Metadata File Paths ===

The DAG uses global ''
* Validate that files exist before passing to the Docker task.

=== d) Docker Image Name ===

Update the image parameter to match your built container:
</code>

=== e) Output Directory ===

The output path specifies where ''
* Sufficient disk space is available.

=== f) Volume Mounts ===

The DAG binds host directories into the container. Update both the host path and container mount point:
</code>

=== g) Example: Full Adaptation ===

Here's a complete example for a BIDS dataset stored at ''/

----
====== 7. Project Structure ======

<

----

====== 8. Workflow Summary ======

==== End-to-End Process ====

1. **Prepare Annotations**

----

====== 9. Troubleshooting ======

==== EDF File Not Found ====

- Verify the path in ''
- If using Docker, ensure the path is relative to the container mount point (not the host).
==== Annotations File Not Found ====

- Check that the annotation file naming logic matches your dataset.
- Ensure the TSV is in RKNS-compatible format (three columns: onset, duration, event).
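A quick sanity check for that format can be sketched as a header comparison:

```python
import csv
import io

def has_rkns_header(tsv_text):
    # RKNS expects exactly these three tab-separated columns.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return next(reader, None) == ["onset", "duration", "event"]

print(has_rkns_header("onset\tduration\tevent\n0\t30\tA\n"))  # True
print(has_rkns_header("start\tstop\tlabel\n"))                # False
```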
==== Participant Not Found in participants.tsv ====

- The participant ID extraction regex may not match your filename pattern.
- Update ''
</code>

==== Metadata Columns Not Recognized ====

- Verify that ''
- Check that the ''
- Add missing mappings to ''

==== Channel or Event Names Not Mapped ====

- Extract raw names and add them to the regex mapping JSON.
- Test with ''
- Use LLM prompts to generate regex patterns if needed.

==== Permission Denied on Output File ====

- Ensure the output directory is writable by the process (or Airflow worker).
- Use ''

==== Docker Build Fails ====

- Check that all dependencies in ''
- Verify the ''

----
====== 10. Key Files Reference ======

==== main.py ====

- **''
- **''
- **''

==== assets/ ====

- Regex patterns (keys) → standardized channel names (values).
- Applied sequentially in order.
- Examples: "

==== assets/ ====

- Regex patterns (keys) → standardized event labels (values).
- Applied sequentially in order.
- Catch-all pattern: unknown events prefixed with ''

==== participants.json ====

- Column codebook with ''
- Folder values are mapped to categories by ''

==== participants.tsv ====

- Subject-level metadata table (BIDS-compatible).
- Rows = subjects, columns = variables (must match ''

----

====== 11. Contributing & Customization ======

This template is intentionally configurable. The main customization points are:

For questions or issues, refer to the troubleshooting section or the inline code comments in ''