CIFTI Analysis Example
Introduction
We’ve gone through a general tutorial about how to import and work with the cifti format in R, so this document will give a short example of how to calculate average thickness within FreeSurfer’s aparc parcellation labels for three participants. These labels are automatically transferred to fs_LR space by HCP’s Structural Preprocessing Pipeline and can be exported by fmriprep as well. We will be using data from the fs_LR_32k atlas in cifti space. The same general process can be done for gifti as well.
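As a minimal sketch of the gifti side, assuming the CRAN gifti package (an extra dependency beyond what this example needs) and HCP-style per-hemisphere shape files, reading the raw values might look like the lines below. The file path is hypothetical and the exact layout of the returned object may vary by package version.

# read per-vertex thickness from a single-hemisphere gifti file (hypothetical path)
library(gifti)
lh_gii <- readgii('data/102109/fsaverage_LR32k/102109.L.thickness.32k_fs_LR.shape.gii')
lh_thick <- lh_gii$data[[1]] # first data array: one thickness value per vertex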
Libraries
For this, we will only need the cifti and tidyverse packages.

library(cifti)
library(tidyverse)
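If either package isn’t installed yet, both are available from CRAN:

# one-time install from CRAN
install.packages(c('cifti', 'tidyverse'))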
Creating a Function
We can create a custom function and apply it using the purrr package to wrangle the same data for three different participants. This function will read in a participant’s thickness and aparc cifti files, organize those data into data frames, and then join them together.
import_ciftis <- function(subj){

  # read in aparc cifti and get aparc table of key-area pairs
  aparc_cii <- read_cifti(file.path('data', subj, 'fsaverage_LR32k', paste0(subj, '.aparc.32k_fs_LR.dlabel.nii')))

  aparc_table <- tibble(key = aparc_cii$NamedMap$look_up_table[[1]]$Key,
                        area = aparc_cii$NamedMap$look_up_table[[1]]$Label)

  # convert aparc data to tibble with hemisphere, vertex number, and assigned area key
  aparc_lhvert <- length(aparc_cii$BrainModel[[1]]) # number of left hemisphere vertices in aparc
  aparc_rhvert <- length(aparc_cii$BrainModel[[2]]) # number of right hemisphere vertices in aparc

  aparc_data <- tibble(hemi = rep(c('LH','RH'), times = c(aparc_lhvert, aparc_rhvert)),
                       vertex = c(aparc_cii$BrainModel[[1]], aparc_cii$BrainModel[[2]]),
                       key = aparc_cii$data[1:(aparc_lhvert + aparc_rhvert)])

  # join the table and the data together to get the assigned area for each vertex, then drop the key
  aparc <- left_join(aparc_data, aparc_table, by = 'key') %>%
    select(-key)

  # read in thickness data and convert to a tibble with hemisphere, vertex number, and thickness
  thick_cii <- read_cifti(file.path('data', subj, 'fsaverage_LR32k', paste0(subj, '.thickness.32k_fs_LR.dscalar.nii')))

  thick_lhvert <- length(thick_cii$BrainModel[[1]]) # number of left hemisphere vertices in thickness cifti
  thick_rhvert <- length(thick_cii$BrainModel[[2]]) # number of right hemisphere vertices in thickness cifti

  thick <- tibble(hemi = rep(c('LH','RH'), times = c(thick_lhvert, thick_rhvert)),
                  vertex = c(thick_cii$BrainModel[[1]],
                             thick_cii$BrainModel[[2]]), # concatenate the vertex numbers
                  thickness = thick_cii$data[1:(thick_lhvert + thick_rhvert)]) # get data for all vertices

  # join aparc and thickness together by hemisphere and vertex number, then drop vertices with "???" area
  combined <- right_join(aparc, thick, by = c('hemi','vertex')) %>%
    filter(area != "???")

  # add the subject name to the final dataframe and move it to the beginning
  combined <- combined %>%
    mutate(PID = subj) %>%
    relocate(PID, .before = 'hemi')
}
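Before mapping over every participant, it can help to spot-check the function on a single subject. The ID below is one of the folder names in data, taken from the output later in this document:

# quick spot check on one participant
one_sub <- import_ciftis('102109')
glimpse(one_sub)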
A couple of notes about this function:
- Paths and file names are hard-coded because the only difference in file paths across subjects is the subject name. For less organized data, full file paths may need to be given instead of assumed in the function.
- Because the values in the aparc data field are just the key values, we needed to first make a table that gives the area names and their associated key values to merge with the data. Integer keys by themselves are meaningless without the associated area names.
- The left join at the end of the aparc step replicates the area names each time the associated key is found. This gives an output tibble with size equal to the aparc data size.
- We calculated the vertex numbers for aparc and thick individually instead of assuming that every vertex with an aparc value also had a thickness value and vice versa. In this case the assumption would have held, but with a partial parcellation, or with individual labels you have created that don’t cover the entire brain, calculating a single vertex count and applying it across all maps would have given an incorrect data frame at the end. In general, if you don’t know that the sets of vertices with assigned data in your files are identical, calculating vertex counts for each type of file is suggested (see the sanity check after this list).
- If there is other cleaning that needs to be done to your data after import and it is the same across participants, it’s best to include it in the function instead of doing it after all of the data has been concatenated. R is much faster when altering smaller dataframes as opposed to larger ones. This is especially true for timeseries data, as the concatenated data frame can be extremely large.
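As a small illustration of that vertex check (a sketch, assuming the aparc and thick tibbles built inside the function above), dplyr’s anti_join() returns the rows of one table that have no match in the other, so both calls below should come back empty when the two files cover identical vertices:

# vertices labeled in aparc but missing from thickness, and vice versa;
# both should return zero rows if the files cover the same vertices
anti_join(aparc, thick, by = c('hemi', 'vertex'))
anti_join(thick, aparc, by = c('hemi', 'vertex'))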
Applying the Function
As mentioned before, we can apply the function to each subject in our data folder using map_df from purrr. If you haven’t used the map functions before, they operate similarly to the family of apply functions from base R. In their basic form, you pass a vector of values to map as well as a function name. The function is then run on each element of the vector. In essence, it’s a simplified for loop. map can be customized to return a specific data type at the end of processing by adding a suffix, in our case _df.
# subject names are listed as folder names in 'data'. turn off recursive search and full names
subs <- list.dirs('data', recursive = FALSE, full.names = FALSE)

# apply import_ciftis to each element of subs. The .x is a replacement variable for the subject name
full_thick <- map_df(subs, ~import_ciftis(.x))
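For comparison, the same result can be sketched in base-R style, looping with lapply() and stacking the pieces with dplyr’s bind_rows() (the variable name here is just for illustration):

# equivalent without purrr: apply the function to each subject, then stack the results
full_thick_loop <- bind_rows(lapply(subs, import_ciftis))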
glimpse(full_thick)
Rows: 175,982
Columns: 5
$ PID <chr> "102109", "102109", "102109", "102109", "102109", "102109", …
$ hemi <chr> "LH", "LH", "LH", "LH", "LH", "LH", "LH", "LH", "LH", "LH", …
$ vertex <dbl> 0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
$ area <chr> "L_isthmuscingulate", "L_postcentral", "L_precentral", "L_su…
$ thickness <dbl> 3.162518, 2.270701, 3.053933, 3.499338, 2.217890, 2.389992, …
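If you want to confirm what each participant contributes after the "???" (unassigned/medial wall) vertices are dropped, a quick dplyr count works:

# number of labeled vertices contributed by each participant
count(full_thick, PID)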
Calculate Stats and Plot
From here, we can do whatever stats we want for our analysis. Here, we will just calculate the average thickness in each aparc region. Because the area names already carry hemisphere tags, we don’t need to group by hemisphere.
thick_stats <- full_thick %>%
  group_by(PID, area) %>%
  summarize(mean_thick_subject = mean(thickness)) %>%
  ungroup()
glimpse(thick_stats)
Rows: 204
Columns: 3
$ PID <chr> "102109", "102109", "102109", "102109", "102109", "…
$ area <chr> "L_bankssts", "L_caudalanteriorcingulate", "L_cauda…
$ mean_thick_subject <dbl> 2.695727, 2.692136, 2.631260, 1.973687, 3.136468, 2…
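If you also want a sense of the spread and coverage within each region, the same summarize() call extends naturally. The extra column names below are made up for illustration:

# per-subject mean, spread, and vertex count for each region
thick_stats_ext <- full_thick %>%
  group_by(PID, area) %>%
  summarize(mean_thick_subject = mean(thickness),
            sd_thick_subject = sd(thickness),
            n_vertices = n(),
            .groups = 'drop') # equivalent to ungrouping after summarizing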
We can also create a simple bar graph of the 5 thickest and thinnest areas on average across participants.
# calculate mean thickness across participants
group_stats <- thick_stats %>%
  group_by(area) %>%
  summarize(mean_thick = mean(mean_thick_subject))

# get 5 thickest and thinnest areas, then start plotting
group_stats %>%
  arrange(mean_thick) %>%
  filter(row_number() %in% c(1:5, (n()-4):n())) %>%
  ggplot(aes(x = mean_thick, y = reorder(area, mean_thick))) + # reorder will plot in order of mean_thick instead of abc
  geom_bar(stat = 'identity') + # plot actual thickness value instead of count
  theme_minimal() +
  labs(y = '', x = "Mean Cortical Thickness (mm)", title = "5 Thickest and Thinnest Areas") +
  theme(plot.title = element_text(hjust = 0.5))
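To keep the figure for later, ggplot2’s ggsave() writes the most recently displayed plot to disk; the file name below is just an example:

# save the last plot to a file
ggsave('thickest_thinnest_areas.png', width = 7, height = 5, dpi = 300)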