Preparing datasets for Simitar

Get the mock dataset

The tutorial relies on these files, which you should download now:
  1. NIfTI file containing the dataset (4D, 160 time points)
  2. NIfTI file containing a "brain mask" of voxels to include in the analysis (3D); it happens to also have a positive ROI ID in each voxel corresponding to the ROI it belongs to, but this is not strictly necessary (a brain mask would be deemed a single ROI)
  3. Regressors for each condition (# time points x #conditions)
  4. Regressors for each run (#time points x #conditions)
We chose to have them in a format that should be close to what is found on a typical dataset, or which can be produced easily.

Get the data and labels into MATLAB (Octave)

In order to get the data into MATLAB (Octave) you will have to use one of the following routes
  1. The FSL MATLAB functions (if you have FSL installed)

    From your terminal type

    	    echo $FSLDIR
    	  
    to find out what the FSL path is, and then
    	    addpath('[whatever the FSL path is]/etc/matlab');
    	  
    from MATLAB. To load the data and mask files
    	    [data4D] = read_avw('dataset_small.nii.gz');
    	    [mask3D] = read_avw('dataset_small_atlas.nii.gz');
    	  
    Note of caution:

    Please note that these functions do not read or write orientation information in the NIfTI files (i.e. qform and sform) and so can swap left and right for the specific NIfTI files that you are using if you are not careful; you should be alert to this, and examine the starting files (with fslhd and fslview) and/or plot a few slices from the mask3D where there should be asymmetries to see if it has occurred. You can deal with this within MATLAB and there are FSL functions you can use to modify the NIfTI files a priori (fslcpgeom/fslorient); for more on the latter, please read Fslutils and Orientation Explained.

  2. The niftimatlib package by John Ashburner

    You can get it from

    	    http://sourceforge.net/projects/niftilib/files/
    	  
    or simply use the version bundled with this toolbox. After unpacking you should
    	    addpath('[whatever path to unpacked folder]/matlab');
    	  
    from MATLAB. To load the data and mask files
    	    system('gunzip dataset_small.nii.gz');
    	    tmp = nifti('dataset_small.nii');
    	    data4D = tmp.dat(:,:,:,:);
    	    system('gzip dataset_small.nii');
    	    
    	    system('gunzip dataset_small_atlas.nii.gz');
    	    tmp = nifti('dataset_small_atlas.nii');
    	    mask3D = tmp.dat(:,:,:);
    	    system('gzip dataset_small_atlas.nii');
    	  
    The calls to gzip and gunzip are necessary because the package does not handle gzippped NIfTI files (if your data files end in .nii these calls will not be necessary).

  3. A number of other packages, e.g. which we know other people have used successfully. We cannot document usage for these but, as long as you manage to get them to output a 4D volume with data and a 3D volume with a mask/atlas, they should work as well as the other options.
Either way, you will now have two matrices, one containing the data (4D) and the other the mask (3D). Type
      size(data4D)
      size(mask3D)
    
to see their dimensions. The assumption for this tutorial is that the 160 volumes in the data4D matrix correspond to 160 trials, e.g. each volume might be the average image around the peak of the hemodynamic response in a slow event related design, or a volume of beta coefficients deconvolved from a faster design.

The next step is to make labels for each of those volumes. We can load regressor files directly

      regressorsCondition = load('dataset_small_regressors_condition.txt');
      regressorsRun       = load('dataset_small_regressors_run.txt');
    
(which are both 160x4, as there are 4 conditions and 4 runs), and convert them into 160 entry vectors
      labels    = sum(repmat([1 2 3 4],160,1) .* regressorsCondition,2);
      labelsRun = sum(repmat([1 2 3 4],160,1) .* regressorsRun,2);
    

Create ancillary structures and examples

The Simitar functions are fast because they rely on precomputing the searchlight for each voxel (the neighbourhood of each voxel in a certain radius in 3 dimensions). We compute this and other useful information for all the voxels in the 3D mask
      meta = createMetaFromMask(mask3D,'radius',1);
    
using a radius of 1 (corresponding to a 3x3x3 searchlight), and the result is a structure containing that meta-information (described in the next section if you haven't used it before).

We then use the meta structure to generate a set of examples, each of which is a vector containing all the information in the brain mask voxels only for a particular volume. This saves a lot of space and also helps speed up calculations.

      examples = createExamplesFrom4D(data4D,meta);
      size(examples)
    
Note that examples is a 160x896 array, as there are only 896 voxels in the brain mask used.

A brief tutorial about the meta structure

The goal of this section is to acquaint you with the "meta" structure which we generate for a dataset. This is not necessary to use Simitar, but it will be useful if you ever want to do any custom programs that need to find the neighours of a given voxel (for instance).

By now you've seen data formats such as nifti that store entire 3D volumes over time. This is somewhat wasteful, as we are often interested in a fraction of those voxels (e.g. those in a mask covering cortex). The CMU format stores data as an array with #examples x #voxels, including only those mask voxels.

The best way of introducing this is with an example. Let's say we a 4D dataset with a 3D volume for each of 10 time points, and the 3D volume is 4x4x4

      dataset = randn(4,4,4,10);
    
Even though these are little volumes, we would like to look at the tiny 2x2x2 brain inside, as defined by this mask
      mask = zeros(4,4,4);
      mask(2:3,2:3,2:3) = 1;
    
If you want to see what it looks like, you can disp(mask) or
      clf;
      for i = 1:4
      subplot(1,4,i); imagesc(mask(:,:,i),[0 1]); axis square;
      end
    
The "meta" structure is produced for a mask over the 3D volume that we would like to use, using
      tinymeta = createMetaFromMask(mask);
    
If you type
      tinymeta
    
you'll see various fields. For now, please look at
      dimensions     - the dimensions of the imaging volume
      dimx,dimy,dimz - same, if you quickly want to get each one

      indicesIn3D - m element vector

      these are just the linear indices into a [dimx] x [dimy] x [dimz] matrix
      of the value 1 voxels in the mask.
    
If you type
      mask(tinymeta.indicesIn3D)
    
you should thus get a vector with all 1s. This can also be used to pick the values of mask 1-voxels in any volume in the dataset, e.g.
      volume = dataset(:,:,:,1);
      values = volume(tinymeta.indicesIn3D);
    

exercise 1

At this point, I would like you to create tinyexamples, a matrix of examples containing only the mask voxels in our small dataset. This is #volumes x #voxels in the mask. As a sanity check, you should look at
      tinyexamples(1,:)
    
and make sure the appropriate voxels in dataset(:,:,:,1) are picked up.
      EXERCISE 1 SOLUTION:

      nvoxels  = length(tinymeta.indicesIn3D);
      tinyexamples = zeros(10,nvoxels);
      for i = 1:10
        volume = dataset(:,:,:,i);
        tinyexamples(i,:) = volume(tinymeta.indicesIn3D);
      end
    

Now let's go over the other fields in meta, which are used to relate a voxel's position in 3D to the corresponding column number in the examples matrix
  
      colToCoord - #voxels x 3 matrix 
      
      vector = coordToCol(i,:) places the 3D coordinates of voxel i in vector
      
      coordToCol -  x  x  matrix

	    i = colToCoord(x,y,z) gives the column of voxel with 3D coordinates x, y and z
	    If i = 0 then the voxel is not in the data matrix.
    

exercise 2

Now do the following
      EXERCISE 2 SOLUTION

      voxel = tinymeta.coordToCol(2,2,2);
      values = tinyexamples(:,voxel)
           
      coordinates = tinymeta.colToCoord(8,:)
    

Finally, the two remaining fields; these are meant to speed up the process of determining which voxels are immediately adjacent to a given voxel (and thus in its "searchlight"). It's possible to create meta structures for neighbourhoods with more than just the adjacent voxels, but that won't be necessary today.
      numberOfNeighbours - #voxels x 1

	numberOfNeighbours(i) is the number of neighbours a voxel has

      voxelsToNeighbours - #voxels x (2*radius+1)^3

	neighbours = voxelsToNeighbours(i,1:numberOfNeighbours(i));

	are the neighbours of voxel i (not including i).
	Note that row i in voxelsToNeighbours will be filled with numbers only
	if voxel i has neighbours in every possible direction. If not, the entries
	above numberOfNeighbours(i) are just NaN.
    

exercise 3

Now I would like you to find out, using these fields
      EXERCISE 3 SOLUTION

      voxel  = tinymeta.coordToCol(2,2,2);
      number = tinymeta.numberOfNeighbours(voxel);
      neighbours = tinymeta.voxelsToNeighbours(voxel,1:number)
    

exercise 4

We can also consider having a meta structure with larger searchlights. This would be created like this
      tinymeta2 = createMetaFromMask(mask,'radius',2);
    
This is very similar to tinymeta, except for one of the fields. Which one?
      EXERCISE 4 SOLUTION

      size(tinymeta2.voxelsToNeighbours)

      will show you that, in a searchlight with radius 2 (5x5x5 cube) you
      have up to 124 neighbours.
    

meta and ROIs

Now that you are familiar with the fields in meta, let us go over the meta for the tutorial dataset
      load('dataset_small.mat');
      meta
    
You will notice that, in addition to the fields discussed above, this meta has 3 more fields
      nrois
      roiIDs
      roiColumns
    

This was obtained by giving createMetaFromMask a mask volume containing not just 1s where the brain was present and 0s everywhere else but, instead, a numeric ROI ID in each voxel corresponding to the ROI that voxel belonged to. In the tutorial dataset there are 4 ROIs, with IDs stored in meta.roiIDs, and meta.roiColumns contains, for each of those 4, the column numbers in examples containing the voxels belonging to that ROI. We can use these numbers to find their respective 3D coordinates, by using them as indices into meta.colToCoord. Absent ROI IDs in a 3D mask, meta will contain a single ROI.