# Audio Annotation Pipeline
Steps for processing participants' free-recall audio from the Movie 24 task.
## Data Structure
### BOX/Vwani_Movie
> [Download Box Drive](https://www.box.com/resources/downloads) to access the data through Finder. The pipeline code contains hard-coded paths for Box as mounted through Finder, so you need to change them in your local copies of the code files.
## Convert to .wav
1. convert the microphone audio to `.wav` (from `.nsx`, `.ncs`, etc.)
   - for `.m4a` files, use: `ffmpeg -i <recording>.m4a <recording>.wav`
2. keep the file naming convention (e.g. `563_exp_10_preSleep_movie_24_audio.wav`)
3. copy the `.wav` to `/Box/Vwani_Movie/raw_patient_audio/24 movie audio`
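If you have many recordings, the ffmpeg step can be scripted. Below is a minimal Python sketch, assuming `ffmpeg` is on your PATH. The helper names (`wav_name`, `ffmpeg_command`, `convert_to_wav`) are hypothetical, and the naming pattern is inferred from the single example file name above, so check it against existing files before relying on it.

```python
import subprocess
from pathlib import Path

# Destination folder from step 3; adjust to where Box is mounted on your machine.
RAW_AUDIO_DIR = Path("/Box/Vwani_Movie/raw_patient_audio/24 movie audio")


def wav_name(patient: str, exp: int, phase: str) -> str:
    """Build a name like 563_exp_10_preSleep_movie_24_audio.wav.

    The pattern is inferred from one example file name (an assumption).
    """
    return f"{patient}_exp_{exp}_{phase}_movie_24_audio.wav"


def ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Return the ffmpeg call that converts src (e.g. .m4a) to dst (.wav)."""
    # ffmpeg infers the output format from the .wav suffix of dst.
    return ["ffmpeg", "-i", str(src), str(dst)]


def convert_to_wav(src: Path, patient: str, exp: int, phase: str,
                   out_dir: Path = RAW_AUDIO_DIR) -> Path:
    """Convert src to .wav under out_dir using the naming pattern above."""
    dst = out_dir / wav_name(patient, exp, phase)
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(ffmpeg_command(src, dst), check=True)
    return dst
```

For example, `convert_to_wav(Path("recording.m4a"), "563", 10, "preSleep")` would write `563_exp_10_preSleep_movie_24_audio.wav` into the destination folder. ffmpeg asks before overwriting an existing output file.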
## Fried Lab Movie task notes
1. open the Google Sheet *Fried Lab Movie task notes* (ask Soraya or John for the link)
2. go to the `ExpInfo` sheet and enter all the info you can
3. go to the `24_S06E01` sheet and enter the info
4. open the `.wav` and listen for the exact start and stop times of the free recall (format: `h:mm:ss`)
5. export the `24_S06E01` sheet as `.csv` to `/Box/Vwani_Movie/tiro`, replacing the existing file
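Before exporting, it can help to sanity-check the times entered in step 4. A minimal sketch, assuming the start and stop times sit in columns named `recall_start` and `recall_stop`; these column names are hypothetical, so pass the headers your sheet actually uses.

```python
import csv
import io
from datetime import timedelta


def parse_hms(stamp: str) -> timedelta:
    """Parse an h:mm:ss string such as '0:05:12' into a timedelta."""
    hours, minutes, seconds = (int(part) for part in stamp.strip().split(":"))
    if hours < 0 or not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"not a valid h:mm:ss time: {stamp!r}")
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def check_recall_times(csv_text: str, start_col: str = "recall_start",
                       stop_col: str = "recall_stop") -> list[str]:
    """Return a list of problems found in the exported sheet.

    The default column names are hypothetical; use the headers in your export.
    """
    problems = []
    # start=2: row 1 of the file is the header, so the first data row is row 2.
    for row_num, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        try:
            start = parse_hms(row[start_col])
            stop = parse_hms(row[stop_col])
        except (KeyError, ValueError, AttributeError) as err:
            # KeyError: missing column; ValueError: bad format; AttributeError: empty cell.
            problems.append(f"row {row_num}: {err}")
            continue
        if stop <= start:
            problems.append(
                f"row {row_num}: stop {row[stop_col]} is not after start {row[start_col]}"
            )
    return problems
```

To use it, pass the text of the exported file, e.g. `check_recall_times(Path(...).read_text())`, and fix any rows it reports before re-exporting.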
## Tiro
### Install
1. copy `/Box/Vwani_Movie/tiro` to your computer and set it up by following its `README.md`
#### **tiro.py**
1. open `tiro.ipynb` and change the paths for your computer:
   - set the local tiro path: `tiro = '/Users/ajrangel/projects/tiro/tiro'`
   - set the Box path: `box = '/Users/ajrangel/Library/CloudStorage/Box-Box/Vwani_Movie/'`
2. run `tiro.ipynb`
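Instead of editing the hard-coded paths in every copy, a notebook cell can read them from the environment and check that they exist. This is a sketch only: the `TIRO_DIR` and `BOX_DIR` environment variable names are hypothetical and are not read by tiro itself.

```python
import os
from pathlib import Path

# Example values from the notebook; replace them with your own paths, or set the
# TIRO_DIR / BOX_DIR environment variables (hypothetical names, not read by tiro itself).
DEFAULT_TIRO = "/Users/ajrangel/projects/tiro/tiro"
DEFAULT_BOX = "/Users/ajrangel/Library/CloudStorage/Box-Box/Vwani_Movie/"


def resolve_paths(env=None) -> dict[str, Path]:
    """Return the tiro and Box paths, preferring environment variables when set."""
    env = os.environ if env is None else env
    return {
        "tiro": Path(env.get("TIRO_DIR", DEFAULT_TIRO)).expanduser(),
        "box": Path(env.get("BOX_DIR", DEFAULT_BOX)).expanduser(),
    }


def missing_paths(paths: dict[str, Path]) -> list[str]:
    """Names of configured paths that do not exist on this machine."""
    return [name for name, path in paths.items() if not path.exists()]
```

In the notebook you could then set `tiro = str(paths["tiro"])` and `box = str(paths["box"]) + "/"`. The trailing slash is re-added because `Path` drops it, and the notebook may build paths by string concatenation.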
## Speech-to-Text
### Whisper (OpenAI)
#### Install
1. install openai-whisper:
   ```
   pip install -U openai-whisper
   ```
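To spot-check a single file before running the batch script, openai-whisper can also be called from Python with `whisper.load_model` and `model.transcribe`. Whisper needs the `ffmpeg` command-line tool on your PATH to read audio. The `"base"` model below is an assumption; it is not stated which model `whisper.sh` uses.

```python
def transcribe(wav_path: str, model_name: str = "base") -> dict:
    """Transcribe one file with the openai-whisper Python API."""
    # Imported here so segments_to_rows stays usable without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    return model.transcribe(wav_path)


def segments_to_rows(result: dict) -> list[tuple[float, float, str]]:
    """Flatten Whisper's segments into (start_s, end_s, text) rows."""
    return [(seg["start"], seg["end"], seg["text"].strip())
            for seg in result["segments"]]
```

`result["text"]` holds the full transcript, and each entry in `result["segments"]` carries start and end times in seconds, which is convenient when checking a transcript against the recall times from the task notes.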
#### **whisper.sh**
1. go to `Box/Vwani_Movie/audio_annotation/whisper/code`
2. copy `whisper.sh` to a local directory
3. open `whisper.sh` and edit the path variables `audio_dir`, `whisper`, and `out_dir`:
   ```
   audio_dir=/Users/ajrangel/Library/CloudStorage/Box-Box/Vwani_Movie/audio_annotation/Free_recall_sound_files
   whisper=/Users/ajrangel/anaconda3/envs/tiro_env/bin/whisper
   out_dir=/Users/ajrangel/Library/CloudStorage/Box-Box/Vwani_Movie/audio_annotation/whisper/transcripts
   ```
4. run `whisper.sh`:
   - in Terminal, type the path to your local `whisper.sh` and press Enter (if you get "permission denied", run `chmod +x whisper.sh` first)
   - enter the patient number(s) when prompted
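The contents of `whisper.sh` are not reproduced here. The following is a hypothetical Python equivalent of such a batch run, assuming the script transcribes every `.wav` in `audio_dir` whose name starts with `<patient>_`. It uses the whisper CLI's `--model` and `--output_dir` options; the `"base"` model and the helper names are assumptions.

```python
import subprocess
from pathlib import Path

# Mirrors the three variables edited in whisper.sh (example values; change them).
AUDIO_DIR = Path("/Users/ajrangel/Library/CloudStorage/Box-Box/Vwani_Movie/audio_annotation/Free_recall_sound_files")
WHISPER = "/Users/ajrangel/anaconda3/envs/tiro_env/bin/whisper"
OUT_DIR = Path("/Users/ajrangel/Library/CloudStorage/Box-Box/Vwani_Movie/audio_annotation/whisper/transcripts")


def select_wavs(paths, patients) -> list[Path]:
    """Keep .wav files whose name starts with '<patient>_' for a requested patient."""
    # The trailing underscore stops "563" from also matching "5630_...".
    prefixes = tuple(f"{pid}_" for pid in patients)
    return sorted(path for path in map(Path, paths)
                  if path.suffix == ".wav" and path.name.startswith(prefixes))


def whisper_command(wav: Path, whisper: str = WHISPER, out_dir: Path = OUT_DIR,
                    model: str = "base") -> list[str]:
    """Build one whisper CLI call for a single file."""
    return [whisper, str(wav), "--model", model, "--output_dir", str(out_dir)]


def run_batch(patients) -> list[Path]:
    """Transcribe every matching file in AUDIO_DIR, stopping at the first failure."""
    wavs = select_wavs(AUDIO_DIR.glob("*.wav"), patients)
    for wav in wavs:
        subprocess.run(whisper_command(wav), check=True)
    return wavs
```

An interactive run similar to step 4 would be `run_batch(input("Patient number(s): ").split())`.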
## Wordpools
> Complete a human check of the automated transcription before generating the wordpool.
#### **wordpool.ipynb**
1. go to `Box/Vwani_Movie/audio_annotation/Wordpools/code`
2. run `wordpool.ipynb`
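What `wordpool.ipynb` computes is not described here. As a rough illustration only, assuming a wordpool is the alphabetical set of distinct words in the human-checked transcripts (for example Whisper's `.txt` output), it could be built like this. The regex, the lowercasing, and the helper names are all assumptions.

```python
import re
from collections import Counter
from pathlib import Path

# Lowercase words, optionally with one apostrophe part (e.g. "didn't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def word_counts(text: str) -> Counter:
    """Count word tokens in a transcript, case-insensitively."""
    # Normalize curly apostrophes so "didn\u2019t" and "didn't" count as one word.
    normalized = text.lower().replace("\u2019", "'")
    return Counter(WORD_RE.findall(normalized))


def build_wordpool(transcript_paths) -> list[str]:
    """Alphabetical list of distinct words across the checked transcripts."""
    counts = Counter()
    for path in transcript_paths:
        counts.update(word_counts(Path(path).read_text(encoding="utf-8")))
    return sorted(counts)
```

This regex drops digits and accented letters, so numbers spoken in the recall would need separate handling if they belong in the wordpool.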
## Hoffman Post-Processing
### Automatic Annotations
> Download [h2jupynb](https://www.hoffman2.idre.ucla.edu/Using-H2/Connecting/Connecting.html#connecting-via-jupyter-notebook-lab) and use it to connect to Hoffman via JupyterLab.
#### **automatic_annotations.py**
1. run `automatic_annotation.py`