RNA-Seq allows researchers to rapidly and comprehensively sequence the vast array of RNAs in cells. The result is an avalanche of raw sequencing data that requires complex bioinformatics pipelines to deconvolute, analyze, and uncover novel insights about the transcriptional state of cells. In the past few years, new sequencing techniques, such as single-cell RNA-Seq, have come on the scene, offering a more detailed, unprecedented view of the heterogeneity present in tissues, collections of cells, and cell and gene therapies. While remarkable, these techniques bring new analytical challenges and call for new methods to make sense of increasingly rich data [1].
Communicating with Effective Data Visualization
The saying goes, “Science isn’t complete until it’s communicated.” In computational life sciences (and biological research generally), data visualization (also called DataViz) is a key method of communication, bringing order and sense to a chaotic mess of “big data.”
At its worst, data visualization can lead to misunderstandings and the dissemination of incorrect information.
But at its best, particularly with RNA-Seq data, it can reveal patterns, trends, and connections in transcriptomics data that often escape the written or spoken word, and convey them to colleagues and the broader scientific community [2].
With the growing number of complex sequencing and proteomics workflows, how best to represent the resulting data remains a challenge. Current difficulties include [3]:
- Deconvoluting the 3-D structure of chromosomes in a cell using genomics and high-resolution imaging
- Visualizing highly specific yet multidimensional single-cell RNA-Seq data to understand heterogeneity in cell therapies
- Extracting insight from high-throughput mass spectrometry data for understanding the post-translational modifications of proteins
- Understanding the 3-D distribution of the transcriptome using spatial transcriptomics
Recommended DataViz Programs and Websites
The increased emphasis on communicating results and on data visualization has given rise to web-based applications and software suites for data analysis that offer out-of-the-box, customizable data visualizations.
Here are a handful of examples:
- cBioPortal: A cancer genomics platform for visualizing large-scale cancer data sets, including results from the MSK-IMPACT study, which compiled molecular, clinical, and pathological data from a large cohort of cancer patients [4]
- Azimuth: A web application that uses annotated reference datasets to automate the processing, analysis, and interpretation of a new single-cell RNA-Seq experiment
- JBrowse: An interactive, open-source genome browser for visualizing structural variants and comparative genomics [5]
7 Powerful Ways to Visualize RNA-Seq Data
While there’s no one-size-fits-all pipeline for analyzing RNA-Seq data, some foundational visualization types can drive cell and gene therapy discovery and illuminate meaningful gene expression patterns.
Let’s look at some practical and insightful ways to visualize your RNA-Seq data.
Gene Expression Plot
Gene expression plots enable you to visualize trends in gene expression across biological replicates or experimental conditions. You can express the data as a bar plot for single or multiple genes.
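As a rough illustration, the sketch below computes the per-condition mean and standard deviation that a bar plot with error bars would show for one gene. The gene name, condition labels, and values are made up for the example, and the plotting helper assumes matplotlib is installed.

```python
from statistics import mean, stdev

# Hypothetical normalized expression values for one gene,
# three biological replicates per condition (illustrative numbers only).
expression = {
    "control": [10.0, 12.0, 11.0],
    "treated": [20.0, 24.0, 22.0],
}


def summarize(values_by_condition):
    """Return {condition: (mean, sample standard deviation)}."""
    return {
        condition: (mean(values), stdev(values))
        for condition, values in values_by_condition.items()
    }


def plot_bars(summary, gene_name="GENE_X"):
    """Draw one bar per condition with error bars (requires matplotlib)."""
    import matplotlib.pyplot as plt

    conditions = list(summary)
    means = [summary[c][0] for c in conditions]
    sds = [summary[c][1] for c in conditions]
    fig, ax = plt.subplots()
    ax.bar(conditions, means, yerr=sds, capsize=4)
    ax.set_ylabel("Normalized expression")
    ax.set_title(gene_name)  # GENE_X is a placeholder name
    return fig


summary = summarize(expression)
```

Calling `plot_bars(summary)` returns a figure that can be saved with `fig.savefig(...)`. For several genes, the same summary can be computed per gene and drawn as grouped bars.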
Heat Map
Heat maps are a common way of visualizing gene expression data from individual or groups of genes (as in gene set enrichment analysis) across all of your biological samples. Each row in the heat map represents a gene, and each column is a sample. Color differences and intensity are used to describe relative changes in gene expression.
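One common way to make colors reflect relative change is to scale each gene's row to z-scores before plotting. The sketch below assumes that approach (the article does not prescribe a scaling method) and uses a made-up two-gene matrix. The plotting helper assumes matplotlib is available.

```python
from statistics import mean, pstdev

# Hypothetical expression matrix: rows are genes, columns are samples.
genes = ["GENE_A", "GENE_B"]
samples = ["ctrl_1", "ctrl_2", "trt_1", "trt_2"]
matrix = [
    [1.0, 1.0, 3.0, 3.0],
    [8.0, 6.0, 4.0, 2.0],
]


def row_zscores(rows):
    """Center and scale each gene (row) so colors show relative change."""
    scaled = []
    for row in rows:
        mu, sigma = mean(row), pstdev(row)
        # A gene with constant expression has no relative change to show.
        scaled.append([(x - mu) / sigma if sigma > 0 else 0.0 for x in row])
    return scaled


def plot_heatmap(z, row_labels, col_labels):
    """Draw genes as rows and samples as columns (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    image = ax.imshow(z, cmap="RdBu_r", aspect="auto")
    ax.set_xticks(range(len(col_labels)))
    ax.set_xticklabels(col_labels, rotation=90)
    ax.set_yticks(range(len(row_labels)))
    ax.set_yticklabels(row_labels)
    fig.colorbar(image, ax=ax, label="Row z-score")
    return fig


z = row_zscores(matrix)
```

Row scaling puts highly and lowly expressed genes on the same color scale, at the cost of hiding absolute expression levels.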
Network Map
Network maps are a way to analyze and predict gene-gene associations. Nodes represent genes, and connections between nodes (i.e., edges) represent associations. The network's topology can be used to predict gene regulation or function.
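To make the node-and-edge idea concrete, here is a minimal sketch that builds edges from pairwise Pearson correlation of expression profiles. Correlation with a fixed cutoff is only one simple association measure, chosen here as an assumption. The gene names, profiles, and the 0.9 threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical expression profiles across four samples.
profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_D": [1.0, 3.0, 2.0, 3.0],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sxx * syy)


def build_edges(profiles_by_gene, threshold=0.9):
    """Connect gene pairs whose absolute correlation meets the threshold."""
    edges = []
    for g1, g2 in combinations(profiles_by_gene, 2):
        r = pearson(profiles_by_gene[g1], profiles_by_gene[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


edges = build_edges(profiles)

# Node degree (number of connections) is a basic topology summary.
degree = {gene: 0 for gene in profiles}
for g1, g2, _ in edges:
    degree[g1] += 1
    degree[g2] += 1
```

The resulting edge list can be handed to a graph-drawing library for layout. The sign of `r` distinguishes positive from negative associations.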
Principal Component Analysis
Principal Component Analysis, or PCA, allows you to identify patterns in complex data sets by taking the expression data for all of your genes and condensing them into a small number of principal components, typically the first two for plotting. Each data point usually represents one biological sample in your experiment, so the clustering of points can be easily visualized and used to compare biological replicates or identify batch effects.
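The sketch below shows one way to compute the two plotted coordinates per sample, using a singular value decomposition of the centered matrix with NumPy. It assumes log-scale expression with samples as rows. The simulated data, group labels, and function names are illustrative only, and the plotting helper assumes matplotlib is installed.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """Project samples (rows) onto the top principal components.

    Returns the sample coordinates and the fraction of total variance
    explained by each returned component.
    """
    x = np.asarray(expr, dtype=float)
    centered = x - x.mean(axis=0)  # center each gene (column)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


def plot_pca(scores, explained, labels):
    """Scatter PC1 against PC2, one color per group (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for group in sorted(set(labels)):
        idx = [i for i, name in enumerate(labels) if name == group]
        ax.scatter(scores[idx, 0], scores[idx, 1], label=group)
    ax.set_xlabel(f"PC1 ({explained[0]:.0%} of variance)")
    ax.set_ylabel(f"PC2 ({explained[1]:.0%} of variance)")
    ax.legend()
    return fig


# Simulated data: 3 control and 3 treated samples, 50 genes,
# with 10 genes shifted upward in the treated group.
rng = np.random.default_rng(0)
control = rng.normal(5.0, 0.2, size=(3, 50))
treated = rng.normal(5.0, 0.2, size=(3, 50))
treated[:, :10] += 3.0
expr = np.vstack([control, treated])
labels = ["control"] * 3 + ["treated"] * 3

scores, explained = pca_scores(expr)
```

In this simulated case the two groups separate along PC1. Samples that instead cluster by processing date or sequencing run would point to a batch effect.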
Scatterplot
Scatterplots are a common way to visualize the information from a single-cell RNA-Seq dataset. Multiple dimensions can be represented for each cell analyzed, and each data point on a scatterplot represents a single cell. This type of analysis can be used for examining vector copy number, on- or off-target editing, zygosity, and more for cell and gene therapies.
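Volcano Plot
Volcano plots show statistical significance against the size of the change in gene expression: the negative logarithm of the p-value (usually base 10) on the y-axis and the logarithm of the fold change (usually base 2) on the x-axis. Each data point on the plot represents an individual gene, which can be color-coded based on mean expression differences.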
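A minimal sketch of the two coordinates follows, assuming a log2 fold change on the x-axis and a negative log10 p-value on the y-axis. The results table is invented, and the cutoffs and the up/down/not-significant coloring are assumptions for illustration rather than the article's scheme. In practice, adjusted p-values are usually used.

```python
from math import log2, log10

# Hypothetical differential expression results (illustrative values only).
results = [
    {"gene": "GENE_A", "fold_change": 4.0, "p_value": 0.001},
    {"gene": "GENE_B", "fold_change": 0.25, "p_value": 0.0001},
    {"gene": "GENE_C", "fold_change": 1.1, "p_value": 0.5},
    {"gene": "GENE_D", "fold_change": 8.0, "p_value": 0.2},
]


def volcano_points(rows, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return (gene, log2 fold change, -log10 p, label) for each gene."""
    points = []
    for row in rows:
        x = log2(row["fold_change"])
        y = -log10(row["p_value"])
        if row["p_value"] < p_cutoff and x >= lfc_cutoff:
            label = "up"
        elif row["p_value"] < p_cutoff and x <= -lfc_cutoff:
            label = "down"
        else:
            label = "ns"  # not significant under these cutoffs
        points.append((row["gene"], x, y, label))
    return points


def plot_volcano(points):
    """Scatter the points, colored by label (requires matplotlib)."""
    import matplotlib.pyplot as plt

    colors = {"up": "tab:red", "down": "tab:blue", "ns": "lightgray"}
    fig, ax = plt.subplots()
    ax.scatter(
        [p[1] for p in points],
        [p[2] for p in points],
        c=[colors[p[3]] for p in points],
    )
    ax.set_xlabel("log2 fold change")
    ax.set_ylabel("-log10 p-value")
    return fig


points = volcano_points(results)
```

Here GENE_D has a large fold change but a weak p-value, so it sits low on the plot and is labeled not significant. Separating such genes from the strong candidates is the main use of the plot.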
Sample Sparsity
A sample sparsity plot shows, for each gene, how concentrated its counts are in a single sample: the largest count in any one sample divided by the gene's total count across all samples. This analysis is a valuable diagnostic for datasets that might not fit a negative binomial assumption, because genes with many zeros and a few very large counts are difficult to model with the negative binomial distribution.
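The quantity described above can be sketched in a few lines. This is a rough, library-free illustration of the idea (similar in spirit to the sparsity diagnostic in the DESeq2 R package), not a reimplementation of any tool. The count table and function name are hypothetical.

```python
# Hypothetical raw count table: one row of per-sample counts per gene.
counts = {
    "GENE_A": [10, 12, 9, 11],  # counts spread evenly across samples
    "GENE_B": [0, 0, 500, 0],   # one sample carries all of the counts
    "GENE_C": [0, 0, 0, 0],     # never detected, so it is skipped
    "GENE_D": [0, 30, 0, 10],
}


def sparsity_points(counts_by_gene):
    """Return {gene: (total count, largest single-sample share)}."""
    points = {}
    for gene, row in counts_by_gene.items():
        total = sum(row)
        if total == 0:
            continue  # the share is undefined for all-zero genes
        points[gene] = (total, max(row) / total)
    return points


points = sparsity_points(counts)
```

Plotting the total on a log-scaled x-axis against the share on the y-axis gives the diagnostic view. Genes near a share of 1.0 with large totals, like GENE_B here, are the ones that fit the negative binomial poorly.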
Interested in the science behind effective data visualization for RNA-Seq? Check out Form Bio’s infographic.
1. Holmes, C et al. Single-cell gene expression analysis reveals genetic associations masked in whole-tissue experiments. Nat Biotechnol 31(8), 748–752 (2013).
2. Why scientists need to be better at data visualization. Knowable Magazine. Published November 12, 2019. Accessed August 11, 2022.
3. O’Donoghue, S.I. Grand Challenges in Bioinformatics Data Visualization. Front Bioinform (2021).
4. Zehir, A., Benayed, R., Shah, R.H., et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat Med 23(6), 703–713 (2017).
5. Buels, R., Yao, E., Diesh, C.M., et al. JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biol 17:66 (2016).