workflow_dataset_to_rkns

Differences

This shows you the differences between two versions of the page.

workflow_dataset_to_rkns [2025/10/20 12:45] fabricio
workflow_dataset_to_rkns [2025/10/20 14:03] (current) – [AI-Assisted Mapping Generation] fabricio
Line 37:
 </code>
 
-==== Conversion Workflow ====
+===== Conversion Workflow =====
 
 The ''edf_to_rkns()'' function executes the following steps:
 
-====== 1.1 Load and Standardize Signals ======
+==== 1.1 Load and Standardize Signals ====
 <code python>
 rkns_obj = rkns.from_external_format(
Line 51:
 Uses regex mappings in ''assets/replace_channels.json'' to rename EDF channels to standardized names (e.g., "EEG(sec)" → "EEG-C3-A2").
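
A minimal sketch of how ordered regex mappings of this kind can be applied, assuming the JSON file is a flat object of pattern → replacement pairs evaluated from top to bottom. ''apply_channel_mapping'' is a hypothetical helper, not part of the RKNS API, and the inline mapping is illustrative rather than the actual content of ''assets/replace_channels.json''.

```python
import json
import re

# Illustrative mapping only (assumption); the real rules live in
# assets/replace_channels.json.
MAPPING_JSON = r'''
{
  "(?i)^EEG\\(sec\\)\\s*$": "EEG-C3-A2",
  "_": "-"
}
'''


def apply_channel_mapping(name, mapping):
    """Apply every pattern in order; each rule sees the previous rule's output."""
    for pattern, replacement in mapping.items():
        name = re.sub(pattern, replacement, name)
    return name


# json.loads returns a dict, which preserves key order (Python 3.7+),
# so the rules run in the order they appear in the file.
mapping = json.loads(MAPPING_JSON)
renamed = {raw: apply_channel_mapping(raw, mapping) for raw in ["EEG(sec)", "EMG_chin"]}
print(renamed)
```

Because the original labels are only looked up and rewritten in a separate dictionary, the raw names stay available for auditing.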
  
-===== 1.2 Add Annotations =====
+==== 1.2 Add Annotations ====
 Converts the TSV file to BIDS-compatible format and adds events:
 <code python>
Line 61:
 </code>
 
-===== 1.3 Extract and Categorize Metadata =====
+==== 1.3 Extract and Categorize Metadata ====
   - Matches participant ID (e.g., "sub-0001") from the EDF filename.
   - Looks up the participant in ''participants.tsv''.
Line 67:
   - Groups metadata by category and adds each to the RKNS object.
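
The metadata steps above can be sketched roughly as follows. This assumes ''participants.tsv'' is tab-separated with a ''participant_id'' column (the BIDS convention) and that a column-to-category table exists. ''COLUMN_CATEGORIES'' and the three helper functions are hypothetical names for illustration, the sample values are made up, and the final step of attaching each group to the RKNS object is omitted because its API is not shown here.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical column -> category table; the real categorization rules
# are not shown in this page.
COLUMN_CATEGORIES = {"age": "demographics", "sex": "demographics", "ahi": "clinical"}


def extract_participant_id(edf_filename):
    """Return the first 'sub-XXXX' token in the filename, or None if absent."""
    match = re.search(r"sub-[A-Za-z0-9]+", edf_filename)
    return match.group(0) if match else None


def lookup_participant(tsv_file, participant_id):
    """Return the row whose participant_id column equals the given ID."""
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        if row.get("participant_id") == participant_id:
            return row
    return None


def group_by_category(row):
    """Bucket each known column of a participant row under its category."""
    grouped = defaultdict(dict)
    for column, value in row.items():
        category = COLUMN_CATEGORIES.get(column)
        if category is not None:
            grouped[category][column] = value
    return dict(grouped)


# In-memory stand-in for participants.tsv with made-up values.
participants = io.StringIO("participant_id\tage\tsex\tahi\nsub-0001\t54\tF\t12.3\n")
pid = extract_participant_id("sub-0001_task-sleep_eeg.edf")
metadata = group_by_category(lookup_participant(participants, pid))
print(pid, metadata)
```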
  
-===== 1.4 Finalize and Export =====
+==== 1.4 Finalize and Export ====
 <code python>
 rkns_obj.populate()                     # Build internal structure
Line 73:
 </code>
 
-===== 1.5 Validate =====
+==== 1.5 Validate ====
 <code python>
 validate_rkns_checksum(output_file)     # Verify checksums on all data
Line 110:
 ----
 
-===== 3. Standardizing Names via Regex Mappings =====
+====== 3. Standardizing Names via Regex Mappings ======
 
 Once you have a compliant TSV, normalize channel and event names non-destructively using regex mappings defined in two JSON files.
 
-====== Channel Names → assets/replace_channels.json ======
+===== Channel Names → assets/replace_channels.json =====
 
 EDF channel labels vary wildly (e.g., "EEG C3-A2", "EEG(sec)"). Map them to standardized RKNS names:
Line 134:
 Use ''test_replace_channels.py'' to validate your mappings:
 
-  1. Extract unique channel names from your EDF files:
+1. Extract unique channel names from your EDF files:
 <code bash>
 find /data -name "*.edf" -exec edf-peek {} \; | grep "signal_labels" | sort | uniq > assets/extracted_channels.txt
 </code>
-  2. Run the test:
+2. Run the test:
 <code bash>
 python test_replace_channels.py
 </code>
-  3. Inspect the output:
+3. Inspect the output:
 <code bash>
 cat out/renamed_channels.csv
Line 151:
 If you have a list of unique channel names, use this prompt with your LLM to accelerate mapping creation:
  
-> I will provide you a list of EEG channels extracted from the original EDFs.
-I require an output in JSON format that maps channel names to new standardized names using grep-style regex replacements.
->
-> For example, this is a reference mapping. Note that replacements are applied in sequential order:
-> <code json>
-> {
->   "(?i)^ABDO\\sRES\\s*$": "RESP-ABD",
->   "(?i)^THOR\\sRES\\s*$": "RESP-CHEST",
->   "(?i)^EEG\\(sec\\)\\s*$": "EEG-C3-A2",
->   "_": "-"
-> }
-> </code>
->
-> The keys are regex patterns (case-insensitive), and the values are replacement strings. Groups in the pattern can be referenced in the replacement (e.g., ''\1'').
->
-> Based on the above example, generate a similar JSON mapping for the following list of channel names:
-> ```
-> [INSERT YOUR UNIQUE CHANNEL LIST HERE]
-> ```
->
-> Provide the output within a JSON code block.
+<code>
+I will provide you a list of physiological signal channels (e.g., EEG, EOG, EMG, respiratory, cardiac) extracted from original EDF files.
+I require an output in JSON format that maps each raw channel name to a standardized name using **sequential, grep-style regex replacements**.
 
+Key requirements:
+1. **Include early normalization rules** to handle common delimiters (e.g., `_`, spaces, `.`, `/`, parentheses) by converting them to hyphens (`-`), collapsing multiple hyphens, and trimming leading/trailing hyphens.
+2. All patterns must be **case-insensitive** (use `(?i)`).
+3. Use **physiologically meaningful, NSRR/AASM-aligned names**, such as:
+   - `EEG-C3-M2` (not `EEG-C3_A2` or ambiguous forms)
+   - `EMG-LLEG` / `EMG-RLEG` for leg EMG (not `LAT`/`RAT` as position)
+   - `RESP-AIRFLOW-THERM` or `RESP-AIRFLOW-PRES` (not generic `RESP-NASAL`)
+   - `EOG-LOC` / `EOG-ROC` for eye channels
+   - `EMG-CHIN` for chin EMG
+   - `PULSE` for heart rate or pulse signals (unless raw ECG → `ECG`)
+4. **Do not include a final catch-all rule** (e.g., `^(.+)$ → MISC-\1`) unless explicitly requested; most channels in the input list should be known and mapped specifically.
+5. Replacements are applied **in order**, with each rule operating on the result of the previous one.
+
+Example reference snippet (the name rules match hyphenated forms because the normalization rules run first):
+```json
+{
+  "(?i)[\\s_\\./\\(\\),]+": "-",
+  "-+": "-",
+  "^-|-$": "",
+  "(?i)^abdomen$": "RESP-ABD",
+  "(?i)^c3-m2$": "EEG-C3-M2",
+  "(?i)^lat$": "EMG-LLEG"
+}
+```
+
+Now, generate a similar JSON mapping for the following list of channel names:
+```
+[INSERT YOUR UNIQUE CHANNEL LIST HERE]
+```
+
+Provide the output **within a JSON code block only**, with no explanations.
+</code>
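
Before adopting an LLM-generated mapping, it may be worth checking it mechanically against the key requirements in the prompt. The sketch below is one possible approach, not part of the RKNS tooling: ''lint_mapping'' and ''PROBE'' are hypothetical names, the letter check is a rough heuristic, and the sample mapping is deliberately flawed to show what gets reported.

```python
import json
import re

# Arbitrary string that no channel-specific rule should match in full (assumption).
PROBE = "zz9-unlikely-channel-xq"


def lint_mapping(mapping):
    """Return a list of human-readable problems found in a pattern -> name mapping."""
    problems = []
    for pattern in mapping:
        try:
            compiled = re.compile(pattern)
        except re.error as exc:
            problems.append(f"{pattern!r}: invalid regex ({exc})")
            continue
        # Heuristic: drop escape sequences such as \s, then look for literal letters.
        literal_part = re.sub(r"\\.", "", pattern)
        has_letters = re.search(r"[A-Za-z]", literal_part) is not None
        if has_letters and not (compiled.flags & re.IGNORECASE):
            problems.append(f"{pattern!r}: matches letters but is case-sensitive")
        # A rule that fully matches an unrelated string behaves like a catch-all.
        if compiled.fullmatch(PROBE):
            problems.append(f"{pattern!r}: looks like a catch-all rule")
    return problems


# Deliberately flawed sample: one case-sensitive rule and one catch-all.
generated = json.loads(r'''
{
  "(?i)[\\s_\\./\\(\\),]+": "-",
  "-+": "-",
  "^-|-$": "",
  "^abdomen$": "RESP-ABD",
  "^(.+)$": "MISC-\\1"
}
''')
issues = lint_mapping(generated)
for issue in issues:
    print(issue)
```

A check like this only covers the structural requirements; whether the target names are physiologically appropriate still needs human review.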
 ===== Event Descriptions → assets/event_description_mapping.json =====
  
  • Last modified: 2025/10/20 12:45
  • by fabricio