Chapter 1 Introduction


Welcome to the FLAMES analysis tutorial!

In this tutorial, we demonstrate how to process and analyze long-read single-cell RNA sequencing data using outputs from the FLAMES package (Tian et al., 2021). FLAMES enables the identification and quantification of isoform-level expression in single cells, providing a unique opportunity to uncover transcriptomic complexity that is often undetectable in short-read data.

We will demonstrate how to load and explore FLAMES outputs in Seurat and other popular single-cell analysis tools. By following this workflow, you’ll learn how to:

  • Preprocess long-read single-cell data

  • Visualize isoform expression patterns and isoform structure

  • Identify differentially expressed isoforms across cell types

  • Detect novel isoforms with potential functional impact

If you’re familiar with short-read data processing, much of the preprocessing workflow will feel intuitive. However, long-read single-cell sequencing also provides isoform-level information, which lets you explore isoform dynamics in individual cells. This can be useful for studying complex developmental systems or disease pathogenesis.

1.1 Prerequisites

This tutorial assumes you have already processed your long-read single-cell data with FLAMES, using either sc_long_pipeline or sc_long_multisample_pipeline. Please ensure that the following parameters in your configuration file are set to true to enable isoform identification with Bambu (Chen et al., 2023) and quantification with Oarfish (Jousheghani & Patro, 2024):

  • "bambu_isoform_identification": [true]

  • "oarfish_quantification": [true]
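Before starting, it can help to confirm that both flags are enabled in the configuration file you ran FLAMES with. The following is a minimal Python sketch, not part of FLAMES: the helper names (find_key, is_enabled, check_flames_config) are hypothetical, and the "pipeline_parameters" nesting in the sample is an assumption about the file layout. The search is recursive so that it does not depend on where the keys actually sit, and it accepts both true and the single-element list form [true] shown above.

```python
import json

# The two flags this tutorial relies on.
REQUIRED_FLAGS = ("bambu_isoform_identification", "oarfish_quantification")


def find_key(node, key):
    """Recursively search nested dicts/lists for `key`; return its value or None."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def is_enabled(value):
    """Accept both `true` and the single-element list form `[true]`."""
    if isinstance(value, list) and len(value) == 1:
        value = value[0]
    return value is True


def check_flames_config(config):
    """Map each required flag to True/False for an already parsed config."""
    return {flag: is_enabled(find_key(config, flag)) for flag in REQUIRED_FLAGS}


def check_flames_config_file(path):
    """Parse a JSON config file from disk and check the required flags."""
    with open(path) as handle:
        return check_flames_config(json.load(handle))


# Sample config; the "pipeline_parameters" nesting is assumed for illustration.
example = json.loads(
    '{"pipeline_parameters": {"bambu_isoform_identification": [true], '
    '"oarfish_quantification": [false]}}'
)
print(check_flames_config(example))
# → {'bambu_isoform_identification': True, 'oarfish_quantification': False}
```

To check a real file, call check_flames_config_file with the path to your own configuration file. Any flag reported as False means the corresponding step was not enabled.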

Although FLAMES is optimized for specific isoform discovery and quantification tools, much of this workflow can be adapted to the other tools that FLAMES supports. We recommend Bambu and Oarfish because they have been validated for the type of analysis demonstrated here.

Additionally, we provide an optional step for users interested in removing empty droplets and ambient RNA contamination. If you plan to use this feature, ensure that you have already calculated the ambient RNA profile. Detailed instructions for this step can be found in Section 10.1.

1.2 Getting Started with the Data

To follow along with this tutorial, you can use the data provided in the ‘data’ folder of the GitHub repository. Download it using the following command:

wget https://github.com/Sefi196/FLAMESv2_LR_sc_tutorial/data

Unzip all files to begin. If you prefer to run the tutorial on your own FLAMES output, there is no need to unzip your files. However, be sure to use the correct GTF file: the one used during FLAMES processing must also be used for downstream analyses. The version used in this tutorial (GENCODE release 47) can be downloaded with the following command:

wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_47/gencode.v47.annotation.gtf.gz  
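Because the GTF used downstream must match the one FLAMES was run with, one way to check this is to compare the contents of the two files. The following is a minimal Python sketch under stated assumptions: the helper names (open_maybe_gzip, gtf_digest, same_annotation) are hypothetical, and the file paths in the usage comment are placeholders. Files are hashed after decompression, so a gzipped copy and a plain-text copy of the same annotation compare as equal.

```python
import gzip
import hashlib


def open_maybe_gzip(path):
    """Open a file for binary reading, decompressing it if it is gzipped."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rb")
    return open(path, "rb")


def gtf_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the (decompressed) file contents."""
    digest = hashlib.sha256()
    with open_maybe_gzip(path) as handle:
        # Read in chunks so large annotation files are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_annotation(path_a, path_b):
    """True if both files hold byte-identical content once decompressed."""
    return gtf_digest(path_a) == gtf_digest(path_b)


# Usage (placeholder paths, replace with your own):
# same_annotation("gencode.v47.annotation.gtf.gz", "path/to/gtf_used_by_flames.gtf")
```

This is a strict byte-level comparison: any difference, including reordered lines or edited headers, makes the files compare as different.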

1.3 Dataset Information

This tutorial uses data generated by the Clark Lab at the University of Melbourne, consisting of a small dataset of approximately 400 cells. The cells were collected at Day 55 of an excitatory neural differentiation protocol. More details on the dataset, sequencing methodology, and the differentiation protocol can be found in the following publication: (You et al., 2023) [PLACE HOLDER - FLAMESv2 paper]

1.4 Citation

If you find this tutorial useful, please cite our work: [PLACE HOLDER - FLAMESv2 paper]

1.5 Contact

For questions or suggestions, please feel free to email us at sefi.prawer@unimelb.edu.au or leave a comment on our GitHub page: https://github.com/Sefi196/FLAMESv2_LR_sc_tutorial.