Chapter 4 Add isoform counts to Seurat object
4.1 Create a Seurat object with isoform expression data
Now that we have 309 high-quality cells for our downstream analysis, let’s incorporate isoform-level information into our Seurat object. The first step is to read in the outputs from FLAMES, which provides isoform-level counts from Oarfish. We’ll create a new Seurat object for this data, specifically for the isoform counts, and add it as a new assay to our existing object. This setup will allow us to explore both gene-level and isoform-level counts within a single Seurat object, giving us a more comprehensive view of our data.
Code
#This function reads in Oarfish count files and creates a csv file of count data. The function also appends the gene symbol to the ENSTID
<- function(flames_output_folder, sample_name, output_dir) {
process_oarfish_files_to_counts_matrix
# Read in the resource table (transcript_id, gene_id, gene_symbol)
# Define the file paths based on the sample name
<- file.path(flames_output_folder, paste0(sample_name, ".count.mtx"))
count_matrix_path <- file.path(flames_output_folder, paste0(sample_name, ".barcodes.txt"))
barcodes_path <- file.path(flames_output_folder, paste0(sample_name, ".features.txt"))
features_path
# Read the data
<- readMM(count_matrix_path)
counts <- readLines(barcodes_path)
barcodes <- read.delim(features_path, header = FALSE)
features
# Transpose the matrix for Seurat compatibility
<- t(counts)
counts
# Set row and column names
rownames(counts) <- features$V1
colnames(counts) <- barcodes
# Convert to a data frame
<- as.data.frame(counts)
counts_df
# Add transcript_id as the first column
$transcript_id <- rownames(counts_df)
counts_df<- counts_df[, c(ncol(counts_df), 1:(ncol(counts_df)-1))]
counts_df
# Merge with the resource table to add gene symbols
<- counts_df %>%
df_genesymbol left_join(isoform_gene_dict, by = "transcript_id")
# Remove the gene_id column and reorder the columns
$gene_id <- NULL
df_genesymbol<- df_genesymbol[, c(ncol(df_genesymbol), 1:(ncol(df_genesymbol)-1))]
df_genesymbol
# Update row names to include gene symbol instead of transcript_id
rownames(df_genesymbol) <- paste0(df_genesymbol$transcript_id, "_", df_genesymbol$gene_symbol)
$transcript_id <- NULL
df_genesymbol$gene_symbol <- NULL
df_genesymbol
# Write the output to a CSV file
<- file.path(output_dir, paste0("gene_symbol_", sample_name, "_counts.csv"))
output_path write.csv(df_genesymbol, output_path)
cat("Processed sample:", sample_name, "\nOutput saved to:", output_path, "\n")
return(df_genesymbol)
}
<- process_oarfish_files_to_counts_matrix(
oarfish_counts flames_output_folder = "/data/scratch/users/yairp/FLAMES_Day55/outs/",
sample_name = "oarfish",
output_dir = "./output_files/counts/"
)
## Processed sample: oarfish
## Output saved to: ./output_files/counts//gene_symbol_oarfish_counts.csv
Code
# Create the Seurat object with iso counts
<- CreateSeuratObject(counts = oarfish_counts, project = sample_id)
iso_seurat_obj
VlnPlot(iso_seurat_obj, features = c("nFeature_RNA", "nCount_RNA"), ncol = 2) + plot_annotation(title = "QC plots (isoform level) BEFORE Filtering")
4.2 Filter the new Seurat object based on gene level information
We have now created a Seurat object with isoform-level count data. Users can take this object and follow similar processing steps as at the gene level 3, which might include filtering out low-quality cells and removing ambient RNA counts. In our case, we evaluated these steps and found they don’t add much value as low quality cells are already removed at the gene level and we can leverage this information rather than processing the isoform counts in the same way. Additionally, tools like DeconX and SoupX haven’t been tested on isoform count data, so we decided to exclude this analysis. Instead, we believe the most valuable approach is to append the isoform data to our gene-level Seurat object, resulting in two assays: one for gene-level and one for isoform-level data. If users have short-read data, they could similarly add it as a third assay.
Since we’ve already filtered out low-quality cells based on our gene-level data, we’ll use these high-quality cells to filter our isoform data.
Code
#### isofrom assay
## filter the data so iso and gene cells match
<- subset(iso_seurat_obj, cells =filt_seurat_object@graphs[["RNA_nn"]]@Dimnames[[1]])
oarfish_iso_matched_gene
#Rejoin data sets after integration
<- JoinLayers(oarfish_iso_matched_gene)
oarfish_iso_matched_gene <- oarfish_iso_matched_gene[["RNA"]]$counts
counts_table_iso
as.data.frame(counts_table_iso) -> df_iso
# Remove rows where the sum is 0
<- df_iso[rowSums(df_iso) != 0, ]
df_iso
"iso"]] <- CreateAssay5Object(counts = df_iso)
filt_seurat_object[[
# Normalize the new assay data
<- NormalizeData(filt_seurat_object, assay = "iso")
filt_seurat_object <- FindVariableFeatures(filt_seurat_object, assay = "iso")
filt_seurat_object <- ScaleData(filt_seurat_object, assay = "iso")
filt_seurat_object <- RunPCA(filt_seurat_object, assay = "iso", reduction.name = "pca_iso")
filt_seurat_object
#Run UMAP
<- RunUMAP(filt_seurat_object, reduction = "pca_iso", dims = 1:15,
filt_seurat_object assay = "iso", reduction.name = "umap_iso")
#check to see that we have two assays
filt_seurat_object
## An object of class Seurat
## 93557 features across 309 samples within 3 assays
## Active assay: RNA (14906 features, 2000 variable features)
## 3 layers present: counts, data, scale.data
## 2 other assays present: joined, iso
## 4 dimensional reductions calculated: pca, umap, pca_iso, umap_iso
Code
# visualize the UMAP for Gene and isoform
DimPlot(filt_seurat_object, label = TRUE, reduction = "umap") + ggtitle("UMAP gene level clustering") |
DimPlot(filt_seurat_object, label = TRUE, reduction = "umap_iso") + ggtitle("UMAP isoform level clustering")
Code
<- filt_seurat_object
seu_obj saveRDS(seu_obj, file = "./output_files/seu_objects/Day55_tutorial_gene_and_isoform_seurat.rds")
4.3 Add the isoform assay to the Seurat object
Great! We now have an object containing both assays, so we can start by plotting some of our favorite genes and isoforms. This setup gives us the flexibility to visualize gene expression on isoform UMAPs and vice versa, allowing us to integrate and explore the expression of both gene and isoform expression within single cells on the same dimensional reduction.
Let’s begin by plotting VIM and TBR1. VIM is a marker of progenitor cells, while TBR1 is a marker of deep layer neurons.
Code
####### test plotting on iso assays ##
<- rownames(filt_seurat_object@assays$iso@features)
features
<- "VIM"
gene <- grep(paste0("(^|-|\\b)", gene, "($|\\b)"), features, value = TRUE)
plot_features_list
#We can now plot our favorite gene and all its corresponding isoforms on either the gene or isoform UMAPs. To switch between them, simply change the reduction argument to the desired UMAP (either the gene or isoform reduction)
FeaturePlot(filt_seurat_object, features = plot_features_list, reduction = "umap_iso")
Code
FeaturePlot(filt_seurat_object, features = gene, reduction = "umap")
Code
#lets also plot the isoform data on the gene level clsuters.
FeaturePlot(filt_seurat_object, features = plot_features_list, reduction = "umap") +
plot_annotation(
title = 'VIM isoform expresstion plotted on gene level clusters',
#caption = 'made with patchwork',
theme = theme(plot.title = element_text(size = 20))
)
Code
<- "TBR1"
gene <- grep(paste0("(^|-|\\b)", gene, "($|\\b)"), features, value = TRUE)
plot_features_list
#We can now plot our favorite gene and all its corresponding isoforms on either the gene or isoform UMAPs. To switch between them, simply change the reduction argument to the desired UMAP (either the gene or isoform reduction)
FeaturePlot(filt_seurat_object, features = plot_features_list, reduction = "umap_iso")
Code
FeaturePlot(filt_seurat_object, features = gene, reduction = "umap")
Code
#lets also plot the isoform data on the gene level clsuters.
FeaturePlot(filt_seurat_object, features = plot_features_list, reduction = "umap") +
plot_annotation(
title = 'TBR1 isoform expresstion plotted on gene level clusters',
#caption = 'made with patchwork',
theme = theme(plot.title = element_text(size = 20))
)
We observe that the expression of VIM and TBR1 are is localized to two distinct clusters, which aligns with our expectations. This suggests that some cells remain in the progenitor/radial glial stage, while others have already begun differentiating into neurons. Furthermore, our data reveals isoform-level expression for each of our genes, including TBR1, which shows distinct isoform expression patterns across different cell populations. To plot genes and expressed isoforms from that gene simply change the ‘gene’ value to your favorite gene and run the code chunk above.
Now we have a count matrix with two assays, gene and isoform. There are many avenues for analysis. We will begin with finding DE genes and isoforms. Then we identify cell types and perform trajectory analysis. finally we look into interesting isoforms unique to each cluster