====== Turning Datasets into RKNS Format ======
  
This guide describes how to convert a structured dataset (e.g., BIDS) into the **RKNS format** using the [[https://code.rekonas.com/rekonas/airflow-template-dataset-to-rkns|airflow-template-dataset-to-rkns]] template repository. The process is divided into three main phases:
  
  - **Transform Logic** – Implement the conversion from raw data to RKNS (via ''main.py'').
  - **Containerization** – Package the logic into a portable Docker image (''build.sh'' + ''Dockerfile'').
  - **Orchestration** – Run the transformation at scale using Apache Airflow.

Below, we detail each phase with practical guidance.

----
  
====== 1. Transform Logic (main.py) ======
  
The core conversion logic lives in ''main.py''. It reads:

  * the raw EDF recording,
  * an RKNS-compatible annotations TSV (''onset'', ''duration'', ''event''),
  * subject-level metadata from ''participants.tsv'' and ''participants.json'',

and outputs a validated ''*.rkns'' file (a Zarr zip store).
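
Because the output is a plain Zarr zip store, a quick sanity check is possible with standard tooling. A minimal sketch (the output filename below is hypothetical):

<code python>
import zipfile

# Any *.rkns produced by the template is a Zarr zip store, so ordinary zip
# tooling can confirm the archive is readable. The path is hypothetical.
with zipfile.ZipFile("out/sub-0001.rkns") as zf:
    assert zf.testzip() is None, "corrupt member detected"
    print(f"{len(zf.namelist())} members in the Zarr zip store")
</code>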
  
==== CLI Interface ====
  
It is recommended to keep the CLI interface unchanged if possible, since it is designed for easier Airflow usage:

The ''edf_to_rkns()'' function executes the following steps:
  
==== 1.1 Load and Standardize Signals ====
<code python>
rkns_obj = rkns.from_external_format(
</code>
Uses regex mappings in ''assets/replace_channels.json'' to rename EDF channels to standardized names (e.g., "EEG(sec)" → "EEG-C3-A2").
  
==== 1.2 Add Annotations ====
Converts the TSV file to a BIDS-compatible format and adds events:
<code python>
</code>
  
==== 1.3 Extract and Categorize Metadata ====
  - Matches the participant ID (e.g., "sub-0001") from the EDF filename.
  - Looks up the participant in ''participants.tsv''.
  - Assigns each metadata column to a category via the ''folder'' keys in ''participants.json'' and ''category_mapping''.
  - Groups metadata by category and adds each to the RKNS object (see the sketch below).
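
A minimal sketch of these steps using plain ''pandas''/''re''; the function name, the ''participant_id'' column, and the ''misc'' fallback are assumptions, not the actual ''main.py'' implementation:

<code python>
import json
import re

import pandas as pd

def extract_participant_metadata(edf_name, participants_tsv, participants_json, category_mapping):
    """Illustrative only; the real logic lives in main.py and may differ."""
    # 1. Match the participant ID (e.g., "sub-0001") in the EDF filename.
    sub_id = re.search(r"sub-\d+", edf_name).group(0)

    # 2. Look up that participant's row in participants.tsv
    #    (column name follows the BIDS convention).
    participants = pd.read_csv(participants_tsv, sep="\t")
    row = participants.loc[participants["participant_id"] == sub_id].iloc[0]

    # 3. Group columns by category via the "folder" key in participants.json
    #    and the category_mapping dictionary ("misc" fallback is hypothetical).
    with open(participants_json) as f:
        codebook = json.load(f)
    grouped = {}
    for column, value in row.items():
        folder = codebook.get(column, {}).get("folder")
        category = category_mapping.get(folder, "misc")
        grouped.setdefault(category, {})[column] = value
    return grouped
</code>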
  
==== 1.4 Finalize and Export ====
<code python>
rkns_obj.populate()                     # Build internal structure
</code>
  
==== 1.5 Validate ====
<code python>
validate_rkns_checksum(output_file)     # Verify checksums on all data
</code>

----
  
====== 2. Preprocessing: Generate RKNS-Compatible Event Annotations ======
  
RKNS requires events in a strict tab-separated (TSV) format with three columns: ''onset'', ''duration'', and ''event''.
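
How you produce that TSV depends on your source annotations. As a rough illustration, assuming the raw events can already be loaded into a ''pandas'' DataFrame (the file and column names here are hypothetical):

<code python>
import pandas as pd

# Hypothetical raw export with differently named columns.
raw = pd.read_csv("raw_annotations.csv")

events = pd.DataFrame({
    "onset": raw["start_seconds"],        # event start, in seconds
    "duration": raw["length_seconds"],    # event length, in seconds
    "event": raw["label"],                # free-text event label
})

# RKNS expects exactly these three columns, tab-separated.
events.to_csv("sub-0001_events.tsv", sep="\t", index=False)
</code>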

----
  
====== 3. Standardizing Names via Regex Mappings ======
  
Once you have a compliant TSV, normalize channel and event names non-destructively using regex mappings defined in two JSON files.
  
===== Channel Names → assets/replace_channels.json =====
  
EDF channel labels vary wildly (e.g., "EEG C3-A2", "EEG(sec)"). Map them to standardized RKNS names:
Keys are regex patterns (applied sequentially), and values are replacement strings. Groups in the pattern can be referenced in the replacement (e.g., ''\1'').
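
A minimal sketch of how such a mapping can be applied, assuming the JSON is read in file order (Python dictionaries preserve insertion order); the authoritative implementation lives in the template itself:

<code python>
import json
import re

with open("assets/replace_channels.json") as f:
    mapping = json.load(f)          # insertion order == order in the file

def standardize(name: str) -> str:
    # Each pattern is applied to the result of the previous replacement.
    for pattern, replacement in mapping.items():
        name = re.sub(pattern, replacement, name)
    return name

print(standardize("EEG(sec)"))      # e.g. -> "EEG-C3-A2", given a matching rule
</code>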
  
==== Validation ====
  
Use ''test_replace_channels.py'' to validate your mappings:
  
1. Extract unique channel names from your EDF files:
<code bash>
find /data -name "*.edf" -exec edf-peek {} \; | grep "signal_labels" | sort | uniq > assets/extracted_channels.txt
</code>
2. Run the test:
<code bash>
python test_replace_channels.py
</code>
3. Inspect the output:
<code bash>
cat out/renamed_channels.csv
</code>
  
==== AI-Assisted Mapping Generation ====
  
If you have a list of unique channel names, use this prompt with your LLM to accelerate mapping creation:
  
<code>
I will provide you a list of physiological signal channels (e.g., EEG, EOG, EMG, respiratory, cardiac) extracted from original EDF files.
I require an output in JSON format that maps each raw channel name to a standardized name using **sequential, grep-style regex replacements**.

Key requirements:
1. **Include early normalization rules** to handle common delimiters (e.g., `_`, spaces, `.`, `/`, parentheses) by converting them to hyphens (`-`), collapsing multiple hyphens, and trimming leading/trailing hyphens.
2. All patterns must be **case-insensitive** (use `(?i)`).
3. Use **physiologically meaningful, NSRR/AASM-aligned names**, such as:
   - `EEG-C3-M2` (not `EEG-C3_A2` or ambiguous forms)
   - `EMG-LLEG` / `EMG-RLEG` for leg EMG (not `LAT`/`RAT` as position)
   - `RESP-AIRFLOW-THERM` or `RESP-AIRFLOW-PRES` (not generic `RESP-NASAL`)
   - `EOG-LOC` / `EOG-ROC` for eye channels
   - `EMG-CHIN` for chin EMG
   - `PULSE` for heart rate or pulse signals (unless raw ECG → `ECG`)
4. **Do not include a final catch-all rule** (e.g., `^(.+)$ → MISC-\1`) unless explicitly requested—most channels in the input list should be known and mapped specifically.
5. Replacements are applied **in order**, with each rule operating on the result of the previous one.

Example reference snippet:
```json
{
  "(?i)[\\s_\\./\\(\\),]+": "-",
  "-+": "-",
  "^-|-$": "",
  "(?i)^abdomen$": "RESP-ABD",
  "(?i)^c3_m2$": "EEG-C3-M2",
  "(?i)^lat$": "EMG-LLEG"
}
```

Now, generate a similar JSON mapping for the following list of channel names:
```
[INSERT YOUR UNIQUE CHANNEL LIST HERE]
```

Provide the output **within a JSON code block only**—no explanations.
</code>

===== Event Descriptions → assets/event_description_mapping.json =====
  
Normalize inconsistent event labels (e.g., "stage_AASM_e1_W" → "sleep_stage_wake"):

The last pattern acts as a catch-all: unknown events are prefixed with ''[comment]'' to prevent validation errors.
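
For example, with the catch-all pattern from the reference mapping shown in the AI prompt below, recognized labels pass through untouched while unknown ones become comments. A quick sketch:

<code python>
import re

# Catch-all pattern from the reference event mapping further below.
catch_all = r'^\s*(?!sleep_stage_|arousal|apnea|hypopnea|desaturation|artifact|body_position)"?([^"]+)"?\s*$'

print(re.sub(catch_all, r"[comment] \1", "Lights off"))        # -> "[comment] Lights off"
print(re.sub(catch_all, r"[comment] \1", "sleep_stage_wake"))  # unchanged: no match
</code>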
  
==== Validation ====
  
Use ''test_replace_events.py'' to validate your mappings:
  
1. Extract unique event names from your annotations:
<code bash>
tail -n +2 /data/**/*_events.tsv | cut -f3 | sort | uniq > assets/extracted_annotation_events.txt
</code>
2. Run the test:
<code bash>
python test_replace_events.py
</code>
3. Inspect the output:
<code bash>
cat out/renamed_events.csv
</code>
  
==== AI-Assisted Mapping Generation ====
  
Use this prompt with your LLM to accelerate mapping creation:
  
<code>
I will provide you a list of sleep events extracted from original annotation files.
I require an output in JSON format that maps event names to standardized names using grep-style regex replacements.

For example, this is a reference mapping. Note that replacements are applied in sequential order:
```json
{
  "(?i)^stage_AASM_e1_W$": "sleep_stage_wake",
  "(?i)^stage_AASM_e1_R$": "sleep_stage_rem",
  "(?i)^arousals_e1_\\('arousal_standard',\\s*'EEG_C3'\\)$": "arousal_eeg_c3",
  "^\\s*(?!sleep_stage_|arousal|apnea|hypopnea|desaturation|artifact|body_position)\"?([^\"]+)\"?\\s*$": "[comment] \\1"
}
```

The keys are regex patterns (case-insensitive), and the values are replacement strings. Groups in the pattern can be referenced in the replacement (e.g., ''\1'').

Based on the above example, generate a similar JSON mapping for the following list of event names:
```
[INSERT YOUR UNIQUE EVENT LIST HERE]
```

Provide the output within a JSON code block.
</code>

----
  
====== 4. Metadata Handling ======
  
RKNS groups metadata by high-level categories (e.g., ''demographics'', ''clinical'', ''questionnaires'') to organize heterogeneous data sources into a consistent internal structure.
  
==== Category Mapping ====
  
The script uses the ''category_mapping'' dictionary (defined at the top of ''main.py'') to automatically map folder paths to one of these standardized categories (a sketch of such a dictionary follows the list):

  * ''treatment'' – CPAP, therapy, adherence
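
A minimal illustrative sketch of such a dictionary; the folder paths below are hypothetical and the real ''category_mapping'' in ''main.py'' will differ:

<code python>
# Illustrative only: maps "folder" values from participants.json to RKNS categories.
category_mapping = {
    "phenotype/demographics": "demographics",
    "phenotype/medical_history": "clinical",
    "phenotype/questionnaires": "questionnaires",
    "phenotype/cpap": "treatment",
}
</code>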
  
==== How It Works ====

  1. The script extracts the participant ID from the EDF filename (e.g., "sub-0001" from "sub-0001_task-sleep_eeg.edf").

----
  
====== 5. CLI: Python & Docker ======
  
==== Testing the Development CLI ====
  
Once you've implemented your conversion logic in ''main.py'', test it end-to-end with your actual data:

  5. Output a ''.rkns'' file with 777 permissions.
  
==== Building and Testing the Docker Image ====
  
Build the Docker image:

**Security Note:** Build args are visible in intermediate layer history. Avoid storing secrets (API keys, credentials) as build args; use runtime environment variables or mount secrets instead.
  
==== Testing the Docker-CLI ====
  
Build with ''./build.sh'' (credentials stay in the builder stage) and try running it on the example data with the same arguments as ''run_example.sh''.

----
  
====== 6. Orchestration: Airflow DAG ======
  
The provided ''edf_migration_dag.py'' serves as a template for processing all EDF files in your dataset at scale using Apache Airflow.
  
==== DAG Overview ====
  
The DAG:

  4. Collects results and validates completion (a condensed sketch of such a DAG follows).
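
A condensed, hypothetical sketch of this pattern, assuming Airflow 2.x with the Docker provider installed; the dataset root, image tag, and the ''--input-annotations'' flag name are placeholders, and the real ''edf_migration_dag.py'' is more complete:

<code python>
from datetime import datetime
from pathlib import Path

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator
from docker.types import Mount

base_path = Path("/mnt/bids")                 # hypothetical dataset root on the host

with DAG(
    dag_id="edf_migration_sketch",
    start_date=datetime(2025, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Discover EDF files at parse time and create one containerized task per file.
    for idx, edf in enumerate(sorted(base_path.rglob("*.edf"))):
        # Default annotation naming logic: replace ".edf" with "_events.tsv".
        events_tsv = edf.with_name(edf.name.replace(".edf", "_events.tsv"))
        DockerOperator(
            task_id=f"convert_edf_{idx}",
            image="dataset-to-rkns:latest",   # hypothetical image name
            command=[
                "--input-edf", f"/data/{edf.relative_to(base_path)}",
                "--input-annotations", f"/data/{events_tsv.relative_to(base_path)}",  # flag name hypothetical
                "--create-dirs",
                # ... plus the metadata and output arguments expected by the template's CLI.
            ],
            mounts=[Mount(source=str(base_path), target="/data", type="bind")],
        )
</code>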
  
==== Required Adaptations ====
  
=== a) Input Dataset Path ===
  
Update ''base_path'' to point to your root BIDS-like directory:
  
=== b) Annotation File Naming Logic ===
  
The DAG assumes each EDF file has a corresponding annotation file. The default logic replaces ''.edf'' with ''_events.tsv'':
  
=== c) Metadata File Paths ===
  
The DAG uses global ''participants.tsv'' and ''participants.json'' files. Ensure they exist at the repository root:

  * Validate that files exist before passing to the Docker task.
  
=== d) Docker Image Name ===
  
Update the image parameter to match your built container:
  
=== e) Output Directory ===
  
The output path specifies where ''.rkns'' files are written:

  * Sufficient disk space is available.
  
=== f) Volume Mounts ===
  
The DAG binds host directories into the container. Update both the host path and the container mount point:
  
=== g) Example: Full Adaptation ===
  
Here's a complete example for a BIDS dataset stored at ''/mnt/bids'':

----
  
====== 7. Project Structure ======
  
<code>
</code>

----
  
====== 8. Workflow Summary ======
  
==== End-to-End Process ====
  
  1. **Prepare Annotations**

----
  
====== 9. Troubleshooting ======
  
==== EDF File Not Found ====
  - Verify the path in ''--input-edf'' is correct and exists.
  - If using Docker, ensure the path is relative to the container mount point (not the host).
  
==== Annotations File Not Found ====
  - Check that the annotation file naming logic matches your dataset.
  - Ensure the TSV is in RKNS-compatible format (three columns: onset, duration, event).
  
==== Participant Not Found in participants.tsv ====
  - The participant ID extraction regex may not match your filename pattern.
  - Update ''extract_sub_part()'' in ''main.py'' to match your naming convention (a rough sketch follows):
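
A rough sketch of such a helper, illustrative only; check the actual signature and pattern in ''main.py'':

<code python>
import re

def extract_sub_part(filename: str) -> str | None:
    """Return the participant ID embedded in an EDF filename, or None."""
    # Adjust this pattern to your own naming scheme, e.g. "patient_0001_night1.edf".
    match = re.search(r"sub-\d+", filename)
    return match.group(0) if match else None

# extract_sub_part("sub-0001_task-sleep_eeg.edf") -> "sub-0001"
</code>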
  
==== Metadata Columns Not Recognized ====
  - Verify that ''participants.json'' contains a ''"folder"'' key for each column.
  - Check that the ''folder'' values are in ''category_mapping''.
  - Add missing mappings to ''category_mapping'' as needed.
  
==== Channel or Event Names Not Mapped ====
  - Extract raw names and add them to the regex mapping JSON.
  - Test with ''test_replace_channels.py'' or ''test_replace_events.py''.
  - Use the LLM prompts to generate regex patterns if needed.
  
==== Permission Denied on Output File ====
  - Ensure the output directory is writable by the process (or Airflow worker).
  - Use ''--create-dirs'' to auto-create the directory with correct permissions.
  
==== Docker Build Fails ====
  - Check that all dependencies in ''pyproject.toml'' are compatible.
  - Verify the ''Dockerfile'' references the correct base image and Python version.

----
  
====== 10. Key Files Reference ======
  
==== main.py ====
  - **''edf_to_rkns()''** – Main conversion function. Orchestrates all steps.
  - **''validate_rkns_checksum()''** – Validates data integrity by reading all signal blocks.
  - **''parse_args()''** – CLI argument parser. Do not modify.
  
==== assets/replace_channels.json ====
  - Regex patterns (keys) → standardized channel names (values).
  - Applied sequentially in order.
  - Examples: "EEG(sec)" → "EEG-C3-A2", "SaO2" → "SPO2".
  
==== assets/event_description_mapping.json ====
  - Regex patterns (keys) → standardized event labels (values).
  - Applied sequentially in order.
  - Catch-all pattern: unknown events prefixed with ''[comment]''.
  
==== participants.json ====
  - Column codebook with ''Description'' and **required** ''folder'' keys.
  - Folder values are mapped to categories by ''category_mapping'' in ''main.py''.
  
==== participants.tsv ====
  - Subject-level metadata table (BIDS-compatible).
  - Rows = subjects, columns = variables (must match ''participants.json'').

----
  
====== 11. Contributing & Customization ======
  
This template is intentionally configurable. The main customization points are:
  
  - **''extract_sub_part()'' in main.py** – Update the regex if your participant ID format differs.
  - **''category_mapping'' in main.py** – Add or modify mappings if your dataset uses different folder paths.
  - **''assets/replace_channels.json''** – Adapt channel mappings to your EDF sources.
  - **''assets/event_description_mapping.json''** – Adapt event mappings to your annotation sources.
  - **''edf_migration_dag.py''** – Adjust discovery logic, paths, and Docker config for your environment.
  
For questions or issues, refer to the troubleshooting section or the inline code comments in ''main.py''.