--- workflow_dataset_to_rkns [2025/10/20 12:45] – fabricio
+++ workflow_dataset_to_rkns [2025/10/20 14:03] (current) – [AI-Assisted Mapping Generation] fabricio
@@ Line 37: / Line 37: @@
 </code>
-==== Conversion Workflow ====
+===== Conversion Workflow =====
 The ''edf_to_rkns()'' function executes the following steps:
-====== 1.1 Load and Standardize Signals ======
+==== 1.1 Load and Standardize Signals ====
 <code python>
 rkns_obj = rkns.from_external_format(
@@ Line 51: / Line 51: @@
 Uses regex mappings in ''assets/replace_channels.json'' to rename EDF channels to standardized names (e.g., "EEG(sec)" → "EEG-C3-A2").
-===== 1.2 Add Annotations =====
+==== 1.2 Add Annotations ====
 Converts the TSV file to BIDS-compatible format and adds events:
 <code python>
@@ Line 61: / Line 61: @@
 </code>
-===== 1.3 Extract and Categorize Metadata =====
+==== 1.3 Extract and Categorize Metadata ====
   - Matches participant ID (e.g., "sub-0001") from the EDF filename.
   - Looks up the participant in ''participants.tsv''.
@@ Line 67: / Line 67: @@
   - Groups metadata by category and adds each to the RKNS object.
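The first two steps of the list above (matching the participant ID and looking it up) can be sketched as follows. This is a minimal illustration, not the actual ''edf_to_rkns()'' code: the helper names ''find_participant_id'' and ''lookup_participant'' are hypothetical, the ''participant_id'' column name is assumed from the BIDS convention for ''participants.tsv'', and the category grouping step is not covered because its rules are not shown here.

```python
import csv
import io
import re
from pathlib import Path


def find_participant_id(edf_path):
    """Return the first 'sub-<label>' token in the file name, or None."""
    match = re.search(r"sub-[A-Za-z0-9]+", Path(edf_path).name)
    return match.group(0) if match else None


def lookup_participant(participants_tsv, participant_id):
    """Return the row for participant_id from an open TSV file, or None.

    Assumes a 'participant_id' column (BIDS convention, not confirmed here).
    """
    reader = csv.DictReader(participants_tsv, delimiter="\t")
    for row in reader:
        if row.get("participant_id") == participant_id:
            return row
    return None


# Illustrative data only; a real run would open participants.tsv from disk.
sample_tsv = "participant_id\tage\tsex\nsub-0001\t34\tF\nsub-0002\t51\tM\n"
pid = find_participant_id("/data/sub-0001_task-sleep_eeg.edf")
row = lookup_participant(io.StringIO(sample_tsv), pid)
```

Returning ''None'' for a missing ID or row leaves it to the caller to decide whether a file without participant metadata is skipped or treated as an error.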
-===== 1.4 Finalize and Export =====
+==== 1.4 Finalize and Export ====
 <code python>
 rkns_obj.populate()                     # Build internal structure
@@ Line 73: / Line 73: @@
 </code>
-===== 1.5 Validate =====
+==== 1.5 Validate ====
 <code python>
 validate_rkns_checksum(output_file)     # Verify checksums on all data
@@ Line 110: / Line 110: @@
 ----
-===== 3. Standardizing Names via Regex Mappings =====
+====== 3. Standardizing Names via Regex Mappings ======
 Once you have a compliant TSV, normalize channel and event names non-destructively using regex mappings defined in two JSON files.
-====== Channel Names → assets/replace_channels.json ======
+===== Channel Names → assets/replace_channels.json =====
 EDF channel labels vary wildly (e.g., "EEG C3-A2", "EEG(sec)"). Map them to standardized RKNS names:
@@ Line 134: / Line 134: @@
 Use ''test_replace_channels.py'' to validate your mappings:
 . Extract unique channel names from your EDF files:
 <code bash>
 find /data -name "*.edf" -exec edf-peek {} \; | grep "signal_labels" | sort | uniq > assets/extracted_channels.txt
 </code>
 . Run the test:
 <code bash>
 python test_replace_channels.py
 </code>
 . Inspect the output:
 <code bash>
 cat out/renamed_channels.csv
@@ Line 151: / Line 151: @@
 If you have a list of unique channel names, use this prompt with your LLM to accelerate mapping creation:
-> I will provide you a list of EEG channels extracted from the original EDFs.
+<code>
-> I require an output in JSON format that maps channel names to new standardized names using grep-style regex replacements.
+I will provide you a list of physiological signal channels (e.g., EEG, EOG, EMG, respiratory, cardiac) extracted from original EDF files.
->
+I require an output in JSON format that maps each raw channel name to a standardized name using **sequential, grep-style regex replacements**.
-> For example, this is a reference mapping. Note that replacements are applied in sequential order:
-> <code json>
-> {
->   "(?i)^ABDO\\sRES\\s*$": "RESP-ABD",
->   "(?i)^THOR\\sRES\\s*$": "RESP-CHEST",
->   "(?i)^EEG\\(sec\\)\\s*$": "EEG-C3-A2",
->   "_": "-"
-> }
-> </code>
->
-> The keys are regex patterns (case-insensitive), and the values are replacement strings. Groups in the pattern can be referenced in the replacement (e.g., ''\1'').
->
-> Based on the above example, generate a similar JSON mapping for the following list of channel names:
-> ```
-> [INSERT YOUR UNIQUE CHANNEL LIST HERE]
-> ```
->
-> Provide the output within a JSON code block.
+Key requirements:
+. **Include early normalization rules** to handle common delimiters (e.g., `_`, spaces, `.`, `/`, parentheses) by converting them to hyphens (`-`), collapsing multiple hyphens, and trimming leading/trailing hyphens.
+. All patterns must be **case-insensitive** (use `(?i)`).
+. Use **physiologically meaningful, NSRR/AASM-aligned names**, such as:
+   - `EEG-C3-M2` (not `EEG-C3_A2` or ambiguous forms)
+   - `EMG-LLEG` / `EMG-RLEG` for leg EMG (not `LAT`/`RAT` as position)
+   - `RESP-AIRFLOW-THERM` or `RESP-AIRFLOW-PRES` (not generic `RESP-NASAL`)
+   - `EOG-LOC` / `EOG-ROC` for eye channels
+   - `EMG-CHIN` for chin EMG
+   - `PULSE` for heart rate or pulse signals (unless raw ECG → `ECG`)
+. **Do not include a final catch-all rule** (e.g., `^(.+)$ → MISC-\1`) unless explicitly requested—most channels in the input list should be known and mapped specifically.
+. Replacements are applied **in order**, with each rule operating on the result of the previous one.
+Example reference snippet:
+```json
+{
+  "(?i)[\\s_\\./\\(\\),]+": "-",
+  "-+": "-",
+  "^-|-$": "",
+  "(?i)^abdomen$": "RESP-ABD",
+  "(?i)^c3-m2$": "EEG-C3-M2",
+  "(?i)^lat$": "EMG-LLEG"
+}
+```
+Now, generate a similar JSON mapping for the following list of channel names:
+```
+[INSERT YOUR UNIQUE CHANNEL LIST HERE]
+```
+Provide the output **within a JSON code block only**—no explanations.
+</code>
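To check a generated mapping before committing it, the sequential behaviour the prompt describes can be reproduced in a few lines. The sketch below is an assumption about how such a mapping might be applied, not the project's own implementation: ''apply_mapping'' is a hypothetical helper, and the inline mapping is a small sample in the style of the reference snippet. Because the delimiter rules run first, the later patterns must match the hyphenated form of each name.

```python
import json
import re

# Sample mapping for illustration; in practice it would be read from
# assets/replace_channels.json with json.load().
MAPPING_JSON = r'''
{
  "(?i)[\\s_\\./\\(\\),]+": "-",
  "-+": "-",
  "^-|-$": "",
  "(?i)^abdomen$": "RESP-ABD",
  "(?i)^c3-m2$": "EEG-C3-M2",
  "(?i)^lat$": "EMG-LLEG"
}
'''


def apply_mapping(name, mapping):
    """Apply every rule in order; each rule sees the previous rule's output."""
    for pattern, replacement in mapping.items():
        name = re.sub(pattern, replacement, name)
    return name


# json.loads keeps the key order of the file, which defines the rule order.
mapping = json.loads(MAPPING_JSON)
renamed = {raw: apply_mapping(raw, mapping) for raw in ["C3_M2", "Abdomen ", "LAT"]}
```

A name that no specific rule matches (for example "EEG(sec)" with this sample mapping) comes out only delimiter-normalized, which makes unmapped channels easy to spot in the output.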
 ===== Event Descriptions → assets/event_description_mapping.json =====