
Updated on July 26, 2023

The Patient Database Manager

A Software Suite for Processing and Organizing Data

(c) Emily Mankin, 2015

Additional code provided by Natalia Tchemodanov, Uri Maoz, and Ali Titiz (and from the Matlab File Exchange; licenses are included in source code).

This software is designed to manage information flow from data collection through processing into a format in which the relevant information is segmented into individual trials. It was written for use by the Cognitive Neurophysiology Lab (Itzhak Fried) at UCLA; however, with minor modifications it could be extended to more general use.

To date, this code has been tested on a MacBook Pro Retina running OS X Yosemite with Matlab R2014b. Efforts have been made to keep it compatible with Matlab R2014a and to make sure it runs on both Windows and Mac, though it is still in the beta-test phase. Many features are still being implemented. Bug reports and feature requests may be submitted to Emily Mankin: emankin@g.ucla.edu.

Table of Contents

Getting Started

To Enter A New Patient into the Database

To Enter a New Experimental Session into the Database

Entering Session MetaData

Processing Experimental Data

Extracting Data From the Database

Further Information Regarding Signal Processing

Further Information Regarding Behavioral Analysis

Getting Started

The Patient Database Manager (PDM) has three main purposes.

  • First, it is intended to organize information regarding experiments run in the lab. Information from each patient or research subject should be maintained along with information about each experiment run.
  • Second, it is intended to simplify the data processing workflow by providing a user-friendly GUI for each step in the workflow. Using the PDM GUI ensures that the output of each processing step is maintained in a unified format.
  • Finally, the structures of the PDM serve as a queryable database, allowing one to select data of interest precisely and have it returned in a predictable format.

The first time you run patientDataManager, you will have to tell it about your directory structure. The information you provide will be saved in the same location as the patientDataManager code, so if different users have different paths to the same data files, it is important that each user keep a local copy of the patientDataManager code.

Before starting, please create a folder where the patient database will be created. Keep it separate from other folders, as this is where the files created by the PDM software will be stored. Don't modify the contents of this folder directly; modify them only through the PDM software.

In our lab, we keep our raw data (as output by the recording system) in a separate location from our “unpacked data” (which has been converted to be Matlab-readable). Thus, you will be prompted for both a raw data folder and an unpacked data folder. If you keep these files together, it is fine to link to the same location.

You will be prompted for the following directories:

  • pathToHoffmanDataFolder: the base folder in which your data is generally stored. We have several subdirectories under this folder; it is the starting point from which other subdirectories will be searched.
  • rawDataPath: the location where raw data is expected to be stored.
  • processedPath: the location where processed data should be stored.
  • databasePath: the location where the patient database is stored.
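
As an illustration, the four settings might point to locations like these (the paths below are hypothetical; the actual values are collected interactively the first time you run patientDataManager and saved alongside its code):

pathToHoffmanDataFolder = '/Volumes/CNLData';      % hypothetical base data folder
rawDataPath   = fullfile(pathToHoffmanDataFolder, 'RawData');       % raw recordings
processedPath = fullfile(pathToHoffmanDataFolder, 'UnpackedData');  % Matlab-readable data
databasePath  = '/Users/yourname/PatientDatabase'; % keep separate; managed by the PDM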

To Enter A New Patient into the Database

Information about each research subject must be entered before any experiment info can be entered for that subject.

To add a patient to the database, run:

>> patientDataManager

From the pop-up menu, select “Enter or update patient information,” then fill in appropriate information, including electrode location. Currently, this is set up such that electrode locations cannot change between experiments. This may be set as a modifiable preference in a future release.

To Enter a New Experimental Session into the Database

For each experiment run, please add a session. Sessions/Experiments are expected to be numbered. Keep track of each experiment you do and assign it a unique number, as this number will be used for indexing the database.

To enter a new session, you may run

>> patientDataManager

and select “Add or update session for existing patient” from the menu, or you may run the following code from the command line:

>> patientDataManager_updateOrAddSession(patientNumber, sessionNumber)

The following window will appear:

This window is the primary tool for getting data through the processing pipeline and keeping it organized.

On the left-hand side of the updateOrAddSession panel, metadata about the session should be entered. Except for the session notes, this information can be entered even before the experiment is run. On the right-hand side, each button performs an analysis step. It is generally best to work top to bottom through the left column, then the right column, and then click the wide buttons at the bottom, though some flexibility is possible.

The function of each button is described below.

Entering Session MetaData

The diagram below shows the typical order for entering metadata. Descriptions of the buttons and fields follow.

[Figure: update_left.png, the metadata (left-hand) side of the updateOrAddSession panel]

  1. Click on the ‘select’ button to enter the date of the session and choose the date from the linked calendar.
  2. Choose the experiment type from the drop-down menu. If your experiment type is not listed, you may enter it manually, but the identical string must be added to the experimentMasterList file in order to process the data later.
  3. If a lab manager assisted with the experiment, enter the lab manager’s name.
  4. Enter the name of the primary experimenter.
  5. For each electrode/electrode bank, enter the electrode identifier.
  6. For each electrode/electrode bank, enter the localization of the electrode.
  7. Click Submit. This data shouldn't change as further processing is done, so when you click Submit, the fields described above are disabled to avoid accidental changes.
  8. If changes must be made, click on Allow Changes and then confirm your desire to make changes. The fields will be re-enabled. Click submit again when you’re done.
  9. After the experiment is over, enter any notes taken by the lab manager.
  10. After the experiment is over, enter any notes taken by the experimenter.

Processing Experimental Data

The diagram below shows the typical order for processing experimental data. Descriptions of the buttons and fields follow.

[Figure: update_right.png, the processing (right-hand) side of the updateOrAddSession panel]

  1. Link to Raw Neural Data: Select the location of the raw file that is associated with this session.
  2. Unpack Raw Data: This will call code that converts the raw data into data that is readable by Matlab, including continuous recordings from each channel (CSC) and spikes. It will also launch automated clustering of spikes for use with wave_clus spike sorting. Running this is generally a long process, and you may prefer to run it from a detachable Unix screen. To do so, launch Matlab without the JVM and run the following line from the command line, where sourcePath is the path to the raw neural data and destinationPath is the folder where you would like the resulting files saved:

>> batchProcessAll(sourcePath, destinationPath);

Afterward, click on “Unpack Raw Data” to link to the processed files. For further detail, please see STEP 1 of the “Further Information Regarding Signal Processing” section below.
  3. Launch Spike Sorting: This gives you a choice between post-hoc spike sorting and windowed spike sorting. If you select post-hoc, this will launch wave_clus to allow you to do supervised spike sorting. This is appropriate for offline spike sorting when you want your spikes to be carefully clustered (for more information on wave_clus, please see the documentation at http://www2.le.ac.uk/departments/engineering/research/bioengineering/neuroengineering-lab/spike-sorting). If instead you want to emulate online spike sorting, you may choose “windowed” spike sorting. In this case, all spikes that were detected online will be shown, and you can draw windows by clicking opposite corners of a rectangle. Any spike that passes through all the windows you draw will be assigned to a cluster. This type of spike sorting is expected to be less accurate than post-hoc sorting, but it reflects the methods available online.
  4. Verify Spike Sorting: Click on this button after you have done spike sorting. If you don't want to sort all channels before continuing, you can uncheck electrodes on the left-hand side of the panel before selecting this button, and it will only show you spikes from the checked channels. This allows you to double-check your spike sorting, classify each unit sorted in wave_clus as a single unit or multi-unit, and keep track of which channels have been verified. No spiking data from wave_clus will be included in analysis unless it has been verified through this step. This step is not available for windowed spike sorting, as it would not be available in real time.
  5. Filter LFP Data: As with Verify Spike Sorting, this button filters only the channels that are checked on the left-hand side of the panel. This function takes the raw CSC data and down-samples it to 1 kHz. FIR filters are then applied to the down-sampled data to generate filtered data for bands that have been shown to have physiological significance. These include:
  • lfp, 1-300 Hz
  • theta, 3-8 Hz
  • alpha, 8.5-15 Hz
  • beta, 16-30 Hz
  • low_gamma, 30-45 Hz
  • high_gamma, 75-125 Hz


Additionally, a noise-rejection algorithm finds time points on each channel where the LFP should be rejected. The algorithm works by finding potential outliers on each single channel. Outliers are samples more than 8 iqrstds removed from the median (where iqrstd is the STD approximation derived from the IQR). Actual artifacts are those that occur within 0.3 s on at least half of the electrodes. Then, for each remaining artifact, the interval from its peak until the signal returns to within median ± iqrstd is marked as artifact.
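
For concreteness, here is a minimal sketch of that detection logic, assuming data is an nChannels-by-nSamples matrix of down-sampled LFP at fs = 1000 Hz (variable names are illustrative, and this is not the PDM's actual implementation; iqr requires the Statistics Toolbox):

% Potential outliers per channel: more than 8 iqrstds from the median
iqrstd = iqr(data, 2) / 1.349;                  % STD approximation from the IQR
dev    = abs(bsxfun(@minus, data, median(data, 2)));
outliers = bsxfun(@gt, dev, 8 * iqrstd);        % logical nChannels-by-nSamples
% Actual artifacts: outliers within 0.3 s on at least half of the electrodes
win  = round(0.3 * fs);
near = conv2(double(outliers), ones(1, win), 'same') > 0;  % dilate in time
isArtifact = sum(near, 1) >= size(data, 1) / 2;
% Each artifact would then be widened from its peak until the signal returns
% to within median +/- iqrstd, as described above.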

For further detail, please see STEPS 2 and 3 of the “Further Information Regarding Signal Processing” section below.

  6. Paradigm-Specific Parameters: Based on the task name selected on the left-hand side of the panel, this will ask questions that are necessary for parsing data from that paradigm. (Currently, the paradigm-specific parameter questions have been implemented for ObjectRecognition, VerbalMemory, and Unity.)
  7. Stimulation Parameters: Click on this button to enter stimulation parameters, even if no stimulation was performed. You may choose “none” as the stimulation type and leave the remainder of the questions blank. Otherwise, please enter all parameters as appropriate. If any parameters change during an experiment, you may select the appropriate checkbox and leave them blank, but you must have something in your TTL file (and TTL parser) that can fill in the stimulation parameters on a trial-by-trial basis. For stimulation onset time, please enter a number relative to stimulus onset time. For example, if stimulation begins 3 seconds before picture onset, enter -3.
  8. Process and Verify TTLs: This will open the event file that corresponds to the raw data file and process it according to the paradigm name selected on the left-hand side of the panel. (Currently, TTL parsers have been implemented for ObjectRecognition, VerbalMemory, and Unity.) For further detail, please see the “Further Information Regarding Behavioral Analysis” section below.
  9. Select Trials to Exclude: If you have a reason to exclude certain trials or an entire experiment (e.g., because it was a practice experiment or because the patient fell asleep during the last 3 trials), you may enter that information on this panel. Please click on this button even if you have no trials to exclude. (If you had many trials in an experiment, the text may be quite small. The “submit” button is at the bottom right.)
  10. Segment LFP into Trials for Selected Channels: Prior to clicking this button, be sure that the “Filter LFP Data” button is green. That indicates that all channels that are currently checked have been filtered. All buttons in the far-right column related to trial parsing must also be green. Now click this button, and it will split LFP data from the checked channels into a trial-by-trial data structure.
  11. Segment Spiking into Trials for Selected Channels: Similarly, make sure that the “Verify Spike Sorting” button is green, and then click this button to segment spiking data into a trial-by-trial data structure. (Or, if you have done windowed spike sorting, the verify button doesn’t have to be green, but then you can only segment windowed spikes.) All buttons should be green now; you’ve finished processing data for this session.
  12. Generally, once something has been analyzed, it will not be re-done. If you need to overwrite your filtering or your segmented LFP data, check this box before clicking on Filter or Segment LFP. This hasn’t been implemented for spikes yet, though you can choose to overwrite previously verified spikes when you click on the Verify Spike Sorting button.
  13. You can move to the next or previous trial with the < and > buttons.
  14. If you would like to go to a folder where data from this experiment is located, select the folder from this drop-down menu. You can also choose “Send Info to Workspace” to see the inner workings of everything the database knows about this experiment.

Extracting Data From the Database

No GUI has been created yet; however, data can be extracted using the function getTrialByTrialData. Here is the syntax:

>> [lfp, units, trialMetaData, channelMetaData, unitMetaData] = ...
       getTrialByTrialData(patient, paradigm, ...
           trialCharacteristics, channelCharacteristics, ...
           unitCharacteristics, timestampsAlignedTo, ...
           lfpBandsToLoad, includeExcludedTrials)

patient should be a single numeric value: the number of the patient whose data you want to extract.

paradigm should be one of the following:

  • a single string, the name of the paradigm you want to analyze. This must correspond to the strings available on the drop-down menu of patientDataManager_updateOrAddSession.
  • a list of experiment numbers (they should all be of the same paradigm type; if not, it may work but no support for this feature is provided), or
  • a cell array of fieldname-criterion pairs. Please see the description of trialCharacteristics for more info. You can check the field names and values that are available by sending experiment info to the workspace from patientDataManager_updateOrAddSession (see 14 above).

trialCharacteristics selects which subset of trials should be returned. It should be a cell array of fieldname-value or fieldname-criterion pairs. That is, the cell array should have an even number of elements. The odd entries are names of fields and the even entries are criteria for those fields. If you enter a single value as a criterion, trials whose value in ‘fieldname’ equals the value entered will be returned.

For example, most experiments have both an encoding phase and a recall phase. During the parseTTLs stage of data processing, each trial will have been assigned many characteristics, including “experimentPhase,” which could be either “encoding” or “memoryTest.” If you are only interested in analyzing trials that occurred during the encoding phase, your trialCharacteristics cell might be

{'experimentPhase','encoding'}

Or if you wanted multiple characteristics, you could include them:

{'experimentPhase','encoding','behavioralResult','correct'}

For more sophisticated queries, you may enter an anonymous function of a single variable that evaluates to a logical. If running the value of ‘fieldname’ through the anonymous function returns true, that trial will be included. If you input multiple pairs, all criteria must evaluate to true for a trial to be included. For example, if you were interested in order effects, you might want to return only trials whose trial number is less than 5. This could be achieved with the following set of trialCharacteristics:

{'experimentPhase','encoding','trialNumber',@(x) x<5}

Note: the available fieldnames differ depending on the paradigm run. To find out what your choices are (and what values they may take), you may run:

>> [fieldnamechoices, possiblevalues] = ...
       getFieldNameChoices(patientNumber, paradigmName)

channelCharacteristics selects which channels’ data should be returned. The format is the same as for trialCharacteristics. Currently the fieldnames available include ‘channel’, ‘electrodeNumber’, and ‘brainRegion’, though more fieldnames are expected soon.

unitCharacteristics selects which units’ data should be returned. The format is the same as for trialCharacteristics. Available fieldnames include ‘channel’, ‘unit’, ‘electrodeNumber’, ‘brainRegion’, ‘classification’, and ‘notes’. Unit classification is either ‘Multi Unit’ or ‘Single Unit’.

timestampsAlignedTo allows you to select a point within each trial to set as time 0. Choices are dependent on the fieldnames from trialCharacteristics. Typically ‘trialStartTime’, ’trialEndTime’, ’stimulusOnsetTime’, ’stimulusOffsetTime’, ‘stimulationOnsetTime’, and ‘stimulationOffsetTime’ are available. You may also select ‘none’ if you want actual time stamps.

lfpBandsToLoad is optional. List the names of the LFP bands you want. If you don’t enter anything, the bands returned by default are lfp (which is filtered to 1-300 Hz) and theta (which is filtered to 3-8 Hz). If you would like to use the masks created by the artifact-rejection algorithm, you may request ‘mask’, which returns NaN wherever an artifact was detected; or, for any of the filtered bands, you may request, e.g., ‘theta_mask’, which returns the same thing as ‘mask’ except that the regions of NaNs have been widened commensurately with the width of the filter that generated that band.

includeExcludedTrials is optional. Its default value is 0. By default, trials that were excluded in the excludeTrials step will never be returned. If you have a reason why you want to see excluded trials, you may set this to 1.

The output arguments will be structures whose field names are experiment numbers corresponding to experiments from the paradigm requested. Within each experiment, there will be a struct whose field names are the relevant variables. For lfp and units, the size of these fields will be nTrials-by-nChannels or nTrials-by-nUnits, respectively. The MetaData outputs will have corresponding sizes (nTrials-by-one, one-by-nChannels, one-by-nUnits).
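
Putting this together, a hypothetical call might look like the following (the patient number, paradigm, and characteristic values are illustrative; use getFieldNameChoices to see the options for your own data):

>> [lfp, units, trialMetaData, channelMetaData, unitMetaData] = ...
       getTrialByTrialData(42, 'ObjectRecognition', ...
           {'experimentPhase','encoding','behavioralResult','correct'}, ...
           {'brainRegion','hippocampus'}, ...
           {'classification','Single Unit'}, ...
           'stimulusOnsetTime', {'theta','theta_mask'}, 0);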

Further Information Regarding Signal Processing

STEP 1: Data conversion, Clustering, Signal Quality Analysis

Data conversion, Signal Quality Analysis: In the initial data-conversion step, raw neural data from the broadband continuous recordings are converted to a Matlab-readable format and assessed for signal quality via a per-channel signal-quality analysis algorithm, which results in a list of electrophysiologically informative channels. In effect, the frequency spectrum is analyzed, and frequencies where the power is higher or lower than would be expected by chance are found. If no frequencies are found to have power above chance, the channel is discarded from further analysis (integration into the pipeline is in progress).

Clustering: Single-unit spikes are detected via an unsupervised spike detection and sorting algorithm using wavelets and superparamagnetic clustering (wave_clus; Quian Quiroga et al., 2004). Subsequently, a manual cluster-validation step is performed in which the clusters detected by the sorting algorithm are checked and corrected by two researchers.

STEP 2: Motion Artifact analysis

Each set of 8 channels corresponding to a specific brain region is analyzed together for the presence of high-amplitude artifact due to the patient's movement. Due to low trial counts, we retain trials where such signal is present, but generate a noise-masking vector that allows those bad regions to be eliminated during analysis. Any sample that is more than 6 interquartile ranges removed from the median is designated a potential artifact. Then, potential artifacts that are simultaneously detected on at least half of the leads are attributed to movement and recorded as artifacts. The temporal locations of these artifacts are stored in a separate masking vector, which contains ‘islands’ of masking values (NaNs) whose indices correspond to the locations of artifacts in the continuous data signal. This masking vector can be multiplied with the continuous signal in order to remove noisy regions.
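
For example, applying the mask might look like this (a one-line sketch with illustrative variable names):

% noiseMask is 1 where the signal is clean and NaN inside artifact islands,
% so element-wise multiplication NaNs out the noisy regions:
cscMasked = csc .* noiseMask;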

STEP 3: Filtering and de-trending

Continuous data are filtered with a ‘coarse’ low-pass (0-300 Hz) least-squares finite impulse response filter consisting of 300 taps and decimated to a sampling rate of 1 kHz. This coarsely filtered, down-sampled signal (coarse_down) is further high-passed at 1 Hz to remove any DC drift, generating a ‘clean’ lfp signal. A filter bank of band-pass signals at relevant EEG bands is generated by filtering the coarse_down signal through a series of least-squares FIR filters, for which the filter order is computed as three times the rounded ratio between the sampling rate and the band's lower cutoff frequency. EEG bands include Theta (3-8 Hz), Alpha (9-15 Hz), Beta (16-29 Hz), low Gamma (30-45 Hz), and high Gamma (75-125 Hz). An instance of the motion-artifact mask, adjusted for filter width, is created for each EEG band. This is done to account for the filtering step, in which convolution of the coarse signal with a filter is expected to widen the noisy regions by the number of taps in the filter. To avoid introducing phase distortion, filtering is performed in a bidirectional manner via the filtfilt Matlab function. All power spectra and filter responses are generated and saved for further analysis. The filter implementation is similar to the eegfilt.m function (EEGLAB toolbox).
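
As an illustration, one band of such a filter bank might be constructed as follows (a minimal sketch in the style of eegfilt, not the PDM's actual code; firls and filtfilt are from the Signal Processing Toolbox, and the transition-band fractions are assumed):

fs    = 1000;                        % sampling rate after decimation, Hz
band  = [3 8];                       % e.g., the theta band
order = 3 * fix(fs / band(1));       % eegfilt-style filter order
% Desired response: 0 in the stop bands, 1 in the pass band, with small
% transition bands around the cutoffs (15% fractions are assumed here):
f = [0, band(1)*0.85, band(1), band(2), band(2)*1.15, fs/2] / (fs/2);
a = [0, 0, 1, 1, 0, 0];
b = firls(order, f, a);              % least-squares FIR design
% coarse_down: samples-by-channels matrix; filtfilt filters each column
theta = filtfilt(b, 1, coarse_down); % bidirectional filtering => zero phase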

Further Information Regarding Behavioral Analysis