CNL Wiki

Docs: Video start time and drift estimation

Updated on June 12, 2026

During movie-watching experiments, we record audio (either with a room mic or a direct-from-laptop connection) using our neural data acquisition device so that we can accurately align our behavioral data (e.g., events in the movie) to the neural data. This pipeline estimates two measures:

  1. the movie start time
  2. a drift rate (clocks on different devices run at slightly different rates, so the time difference between a point in the movie audio we record and where it should fall at the fixed frame rate of the video grows steadily over the recording)
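
As a rough illustration of how the two measures relate, the sketch below fits a straight line to a handful of matched time points. It assumes each match is a pair of times in seconds (time in the video, time in the recorded .wav), that drift is linear, and that the drift rate is expressed as seconds of drift per second of video. The function name and the example numbers are made up for illustration; this is not the pipeline's actual code.

```python
def estimate_start_and_drift(video_times, wav_times):
    """Least-squares fit of offset = start + drift * video_time.

    offset is (wav_time - video_time) for each matched point.
    Returns (start, drift): the intercept is the movie start time in the
    recording, the slope is the drift rate (assumed unit: s per s of video).
    """
    if len(video_times) != len(wav_times) or len(video_times) < 2:
        raise ValueError("need at least two matched points of equal length")

    offsets = [w - v for v, w in zip(video_times, wav_times)]
    n = len(video_times)
    mean_x = sum(video_times) / n
    mean_y = sum(offsets) / n

    # Ordinary least squares for a straight line.
    sxx = sum((x - mean_x) ** 2 for x in video_times)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(video_times, offsets))
    drift = sxy / sxx
    start = mean_y - drift * mean_x
    return start, drift


# Hypothetical matches: the movie starts 12 s into the recording and the
# offset grows by 0.05 s for every 500 s of video.
video_times = [0.0, 500.0, 1000.0, 1500.0, 2000.0]
wav_times = [12.0, 512.05, 1012.10, 1512.15, 2012.20]
start, drift = estimate_start_and_drift(video_times, wav_times)
print(f"movie start ~ {start:.3f} s, drift ~ {drift:.6f} s/s")
```

With clean synthetic points like these, the fit recovers a start of about 12 s and a drift of about 0.0001 s per second of video. Real matches are noisier, which is why outliers are removed before the drift plot is accepted.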

Requirements #

  • Audacity and the ffmpeg library.
  • Hoffman account and h2jupynb 
  • Video drift code (which includes instructions in the README on how to set up the conda environment needed)
  • Access to the MovieAudio Box folder
  • Access to the Fried lab movie Google Sheet

Drift correction pipeline #

Each experiment has two audio streams:

  1. “video” refers to the audio track of the video shown to the participant.
  2. “.wav” is the audio recorded with our neural recording device – either a direct connection from the laptop speakers or an external microphone recording of the room.

You will be told which patient/video/wav file to process each time.

1. Listening to the audio in Audacity #

Open both the movie video and .wav files in Audacity. Visually compare both streams and look for:

  1. Matching features between the two streams
  2. Pauses in the .wav stream
  3. Audio quality/quiet sections of the video audio that may impact matching.

Then, listen to the start and end of the video audio. Match these sections of audio to the .wav audio and note down the times. (To zoom in on a portion of the audio, highlight the portion of the waveform and click the enlarge button (⌘E).)

  • The time elapsed before the start of the audio in the .wav stream will be used to set the expected initial offset in the alignment notebook.
  • Not all videos are played fully from start to finish. Occasionally, the video presentation is started part way through or stopped before the end.
    • If you cannot find either the start/end of the video in the .wav stream, listen to the .wav stream for the earliest/latest video content you can hear and locate these points in the video stream.
    • Note the time and content in the Google Slides output.
  • Be as accurate as possible (within 0.5 to 1 second accuracy), but always set times in the video stream to within the bounds of what is clearly heard on the .wav stream. (You may want to click the time dropdown and change the selection to “seconds + milliseconds.”)
  • Check whether the duration between the start and end in the .wav stream is roughly what you would expect from the video duration. If the .wav duration is longer, there might be a pause.
    • Be aware of the audio quality across the two streams, as this may impact the results and alignment parameter selection.
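
The duration check above is simple arithmetic, sketched here as a small helper. The function name and the 2-second tolerance are assumptions chosen for illustration (the tolerance allows for roughly 0.5 to 1 second of error at each end); they are not part of the pipeline.

```python
def estimate_pause(wav_start, wav_end, video_start, video_end, tolerance=2.0):
    """Compare the span heard in the .wav stream with the span of video shown.

    All times are in seconds, as noted down in Audacity. Returns the extra
    time in the .wav span if it exceeds `tolerance` (a likely pause),
    otherwise 0.0.
    """
    wav_span = wav_end - wav_start
    video_span = video_end - video_start
    extra = wav_span - video_span
    return extra if extra > tolerance else 0.0


# Hypothetical session: the .wav span is 60 s longer than the video span,
# which suggests about one minute of pausing somewhere in the recording.
print(estimate_pause(35.0, 2095.0, 0.0, 2000.0))  # 60.0
```

A non-zero result only says that a pause probably exists. Where it occurs still has to be found by listening, or from the TTL output if it is available.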

 

2. Running the TTL Extraction Notebook #

Check if the session has TTL information by running the TTL extraction notebook.

  1. Enter the pID
  2. Enter the mov_name
    • If it runs, check that the output roughly matches what you found by manually matching the .wav file to the video file in Audacity.
    • Note the movie start time and the times of any pauses.
      • If there are pauses, the output will show multiple playbacks (i.e., Playback 1, Playback 2); these can be used to guide the offset inputs to the alignment code.

Note: TTLs are sometimes not sent correctly (e.g., if the movie is started before the neural recording). Treat the audio as the most accurate source of information, and use the TTLs as a guide where they are available.

3. Running the Alignment Notebook #

  • Set pID, wav_name, and vid_name.
  • Load the audio
    • this downsamples the streams to a common sample rate
    • it prints out info about the video and audio files (e.g., video length, which is used to guide parameter selection below)
  • skip_quiet_chunks can typically be left at 1. If the code is struggling to find matches, set it to 0 to increase the pool of potential clips.
    • when set to 1, a clip extracted from the video is used as a reference clip for matching only if its volume is above a certain threshold for a certain proportion of the clip. The rationale is that if there is no audio in that part of the video, we won’t be able to match it well in the wav stream.
  • Set the run parameters
    • note: this may be an iterative process, as we may have to try different settings to get a decent output (e.g., due to audio quality)
    • Set run_number as 1.
      • If there are pauses, set the run_number for the subsequent sections as 2, 3, etc.
    • Set expected_offset as the expected initial offset minus 2 seconds.
      • For sections after a pause, expected_offset is the original offset plus the length of the pause, and movie_start_time is the video time after the pause.
    • Set movie_start_time based on what you heard in Audacity: if the movie was shown from the beginning, set it to 0. If it started later (e.g., when resuming after a pause), set it to the video time of the content that can be heard in the wav audio.
    • Set movie_length based on video length above (unless told otherwise about how long the video was shown)
    • Set movie_skip_time based on the movie length to get around 15-20 points for the plot (e.g., if the length is 2000 seconds, the skip time could be around 100).
  • Refinement parameters
    • these can be left at their defaults unless the code is struggling to find a match; in that case, modify these values to make the matching more or less strict until you get the best match possible
      • coincidences_thresh (default 4), range: 2 – 5 (higher is stricter)
      • tolerance (default 0.02), range: 0.02 – 0.05 (smaller is stricter)
  • Run the alignment and plot the results.
    • Remove any outliers by inserting point numbers into remove_inds.
    • If too few points are plotted, lower the skip time or relax the strictness of the refinement parameters.
    • Check that the x range is what you expect from the length of the video
      • missing end sections may be due to a pause/weird skipping behavior that we need to account for
      • try modifying parameters to see if you can get close to the full range plotted.
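
Two of the steps above can be sketched in a few lines: the quiet-clip test behind skip_quiet_chunks, and the arithmetic for the first run's parameters. Both functions are hypothetical helpers written for illustration. The threshold values, the use of absolute sample amplitude as "volume", and the default of 20 plot points are assumptions, and the notebook's own implementation may differ.

```python
def clip_is_usable(samples, volume_thresh=0.01, min_fraction=0.5):
    """Return True if enough of the clip is loud enough to be matched.

    samples: audio samples scaled to the range -1..1 (assumed).
    A clip passes if at least `min_fraction` of its samples exceed
    `volume_thresh` in absolute amplitude.
    """
    if not samples:
        return False
    loud = sum(1 for s in samples if abs(s) > volume_thresh)
    return loud / len(samples) >= min_fraction


def suggest_run_params(initial_offset, movie_length,
                       movie_start_time=0.0, target_points=20):
    """Derive first-run parameters from the times noted in Audacity.

    initial_offset: seconds elapsed in the .wav before the movie audio starts.
    movie_length: video length in seconds, as printed when loading the audio.
    target_points: how many points to aim for on the drift plot (15-20).
    """
    return {
        "run_number": 1,
        "expected_offset": initial_offset - 2.0,  # offset minus 2 seconds
        "movie_start_time": movie_start_time,
        "movie_length": movie_length,
        "movie_skip_time": movie_length / target_points,
    }


params = suggest_run_params(initial_offset=35.0, movie_length=2000.0)
print(params["expected_offset"], params["movie_skip_time"])  # 33.0 100.0
```

The sketch covers only the first run. Sections after a pause need their own run_number, expected_offset, and movie_start_time, set by hand from the times noted in Audacity or shown in the TTL output.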

 

4. Outputs #

Once you are happy with the drift plot:

  1. Record plot, movieStart (s), Drift rate, and movieSegment in the Google Sheet.
  2. Screenshot the plot and parameters and add them to the Google Slides record.
  3. Save and download the .json file for each run to Box or the Hoffman mount.

There are some cells at the bottom of the notebook where you can plot the video/wav audio with your estimated offsets to check that they match.
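
For a quick spot check outside the notebook, the estimates can also be used to predict where a given moment of the video should appear in the .wav stream, and that prediction can then be compared by ear in Audacity. The helper below is a minimal sketch. It assumes a linear drift model in which the drift rate is seconds of drift per second of video, which may not match the exact convention used in the notebook.

```python
def video_to_wav_time(video_time, movie_start, drift_rate):
    """Predict the .wav time (s) at which a video time (s) should be heard.

    movie_start: estimated movie start time in the recording (s).
    drift_rate: assumed seconds of drift per second of video.
    """
    return movie_start + video_time * (1.0 + drift_rate)


# Hypothetical estimates: start at 12 s, drift of 0.0001 s per second.
for t in (0.0, 1000.0, 2000.0):
    print(f"video {t:7.1f} s -> wav {video_to_wav_time(t, 12.0, 1e-4):9.3f} s")
```

If the predicted times land within the matching accuracy of what is heard in the recording at the start, middle, and end of the movie, the estimated offsets are probably reasonable.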