In basic and translational biological research, the era of big data has arrived. The emergence of high-throughput sequencing and imaging technologies has massively increased the amount of biological data available for analysis and learning. Many scientists and researchers are now expected to wrangle and analyze their own data as part of the research process, that is, to run their own computational analyses, because extracting meaningful insights from this tangled mass of data is too challenging and time-consuming to do manually.
Enter Computational Life Sciences, which gives scientists and researchers like you the tools to navigate big data and analyze the mountainous volume of scientific information now available, so you can focus more on the science and less on the computational side of things. By combining the principles and theories of the life sciences with the power of computation, computational life sciences can tackle the most complex and challenging problems in biology, generating new insights and knowledge that can be applied in a variety of research, translational, and clinical settings.
But learning about a new field, particularly such a broad and interdisciplinary one, can be difficult and confusing. Where do you even start when so many different resources are available?
In this introduction, we’ve collected the information you need if you’re just starting to look into Computational Life Sciences, and we answer many of the questions you may have.
Let’s start with the basics.
The Definition of Computational Life Sciences
So, what is Computational Life Sciences? Computational Life Sciences is an interdisciplinary field that applies computational methods and techniques to the study of biological systems and processes, with the goal of developing new diagnostics and treatments for diseases. Leveraging advanced computational techniques to better understand biological systems and processes accelerates studies of complex biological networks and can bring more precise therapeutics to patients faster.
Interdisciplinarity is one of the key characteristics of Computational Life Sciences. The principles and theories of its constituent fields, including biology, bioinformatics, computer science, AI/ML, engineering, and high-performance computing, are combined to acquire, store, and analyze complex, large-scale biological research and clinical data, turning those data into meaningful insights and, in some cases, predictions. In the next section, we’ll take a quick look at the numerous fields that come together to create this emerging field.
The Broad Impact of Computational Life Sciences
Enabling technologies, such as next-generation sequencing (NGS) and AI-powered image analysis, combined with computational tools that process and analyze the resulting data, have given rise to many computationally driven scientific fields.
Here is an overview of these fields, a list that continues to grow in scope and utility:
- Bioimage informatics uses computerized image processing and analysis, including pattern recognition, to extract information from images [1].
- Bioinformatics is a discipline that combines biology, mathematics, and computer science to acquire, store, analyze, and disseminate complex biological data [2].
- Computational chemistry uses computational methods and simulation to help solve chemical problems, for example, engineering enzymes or other proteins to work better [3].
- Computational microscopy combines microscopy with computational methods, such as artificial neural networks, to capture and reconstruct microscopic images of objects [4].
- Genetics is the study of heredity and the variation of inherited characteristics [5].
- Genomics is the branch of molecular biology concerned with the structure, function, evolution, and mapping of genomes [6].
- Health informatics uses digital patient data to inform and improve aspects of the healthcare system, from clinical trial recruitment, individual patient care, and drug treatments to population-level health [7].
- Metabolomics is the study of the biochemistry of metabolism and metabolites [8].
- Molecular modeling is the modeling of molecular structures by way of computational chemistry [10].
- Pharmacogenomics studies the way in which an individual's genetic attributes affect their likely response to medicines [9].
- Phylogenetics is the study of the evolutionary relationships among groups of organisms [11].
- Proteomics is the study of proteomes and their functions [12].
- Structural bioinformatics is the analysis and prediction of the structure and function of macromolecules such as DNA, RNA, and proteins [14].
- Systems biology studies biological systems as a whole through mathematical modeling and the analysis of large datasets [15].
- Transcriptomics is the analysis of the complete RNA transcriptome [13].
Computational Life Sciences Solves Challenging Research Problems
Computational Life Sciences can address complex, multifaceted questions in biology, all of which can improve our understanding of biological systems. Below are examples of the types of life sciences analyses that Computational Life Sciences includes.
The development of Sanger, next-generation, and third-generation sequencing technologies has led to the submission of trillions of base pairs of nucleotide sequence to GenBank alone [16]. Analyzing these sequences manually would be impossible. Computational Life Sciences is used extensively to analyze primary DNA, RNA, and protein sequences to identify protein-coding regions, RNA genes, regulatory sequences, structural motifs, and repetitive sequences. Sequences can be compared between organisms to determine evolutionary history (e.g., by building phylogenetic trees) or to make functional predictions.
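To make one of these tasks concrete, here is a minimal Python sketch of finding candidate protein-coding regions as open reading frames (ORFs): stretches that start with ATG and run to the first in-frame stop codon (TAA, TAG, or TGA). The function name `find_orfs` and the `min_codons` cutoff are illustrative assumptions. The sketch scans only the forward strand and reports only the first ATG of each ORF; real gene finders also search the reverse complement and weigh additional signals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, orf_sequence) for forward-strand ORFs.

    Hypothetical helper for illustration. An ORF here starts at ATG and ends
    at the first in-frame stop codon (inclusive); min_codons is an assumed
    cutoff on the number of codons, counting the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop codon is found
                for j in range(i, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, seq[i:j + 3]))
                        i = j  # resume scanning after this ORF's stop codon
                        break
                else:
                    # No stop codon before the end: no complete ORF remains in this frame
                    break
            i += 3
    return orfs
```

For example, `find_orfs("CCATGAAATTTTAGCC")` returns `[(2, 14, "ATGAAATTTTAG")]`, a four-codon ORF in the third reading frame.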
Computational Life Sciences can also help analyze and interpret genome and transcriptome sequencing data. Starting from raw sequence data, you can use tools and software to assemble, map, and annotate whole genomes, identify new transcripts, track genome changes across evolution, find genomic markers for diseases, and much more.
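Of the steps listed above, read mapping is the easiest to show in miniature. The sketch below illustrates the seed-and-extend idea used by many aligners: index every k-mer of a reference sequence, look up a read's first k-mer, then check the full read at each candidate position. It assumes error-free reads and exact matches, and the names `build_kmer_index` and `map_read` are hypothetical; production aligners tolerate mismatches and indels and use far more compact indexes.

```python
from collections import defaultdict

def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Hypothetical helper: map every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def map_read(read: str, reference: str, index: dict[str, list[int]], k: int) -> list[int]:
    """Hypothetical helper: return reference positions where the read matches exactly.

    The read's first k-mer is the seed; each seed hit is then extended by
    comparing the whole read against the reference at that position.
    """
    if len(read) < k:
        return []
    hits = []
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits
```

With `reference = "ACGTACGTTAGC"` and `k = 4`, the read `"ACGTT"` seeds at positions 0 and 4 but only extends successfully at position 4, so `map_read` returns `[4]`.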
Global gene expression can be measured by RNA sequencing (RNA-Seq), which has become widely used for whole-transcriptome analysis. Computational Life Sciences methods process and analyze these data by performing quality control on raw reads, removing noise, aligning reads, annotating transcripts, identifying differentially expressed genes, and performing statistical analyses.
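As a small illustration of the differential expression step, the sketch below normalizes raw read counts to counts per million (CPM) to correct for sequencing depth and then computes a log2 fold change per gene. It assumes both samples report the same set of genes, and the pseudocount of 1 is an arbitrary choice to avoid taking the log of zero. It performs no statistical testing; dedicated tools such as DESeq2 and edgeR model replicate counts with negative binomial distributions to decide which changes are significant.

```python
import math

def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts per million: scale each gene's raw count by the library size."""
    total = sum(counts.values())
    return {gene: c / total * 1_000_000 for gene, c in counts.items()}

def log2_fold_changes(control: dict[str, int], treated: dict[str, int],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """Hypothetical helper: log2(treated CPM / control CPM) per gene.

    Assumes both samples contain the same genes; the pseudocount avoids log(0).
    """
    c, t = cpm(control), cpm(treated)
    return {gene: math.log2((t[gene] + pseudocount) / (c[gene] + pseudocount))
            for gene in control}
```

For example, with `control = {"geneA": 100, "geneB": 900}` and `treated = {"geneA": 400, "geneB": 1600}`, geneA's log2 fold change is about 1.0: its raw count quadruples, but the treated library was sequenced twice as deeply, so its relative abundance only doubles.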
Computational methods are also used to process and analyze the large amounts of data produced by other gene expression techniques, such as RNA and protein microarrays or mass spectrometry-based proteomics. By looking at patterns in gene expression data, you can infer regulatory elements or the conditions under which genes are expressed.
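One common way to look for such patterns is co-expression analysis: genes whose expression rises and falls together across conditions are candidates for shared regulation. The sketch below computes the Pearson correlation for every gene pair and keeps pairs above a threshold. The 0.9 cutoff and the function names are illustrative assumptions. Correlation alone does not establish co-regulation, and a gene with constant expression would need special handling, since its standard deviation is zero.

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpressed_pairs(profiles: dict[str, list[float]],
                      threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Hypothetical helper: gene pairs whose profiles correlate at or above threshold."""
    genes = sorted(profiles)
    pairs = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(profiles[g1], profiles[g2])
            if r >= threshold:
                pairs.append((g1, g2, round(r, 3)))
    return pairs
```

`coexpressed_pairs({"geneA": [1, 2, 3, 4], "geneB": [2, 4, 6, 8], "geneC": [4, 3, 2, 1]})` returns `[("geneA", "geneB", 1.0)]`; geneC is perfectly anticorrelated with both other genes and falls below the threshold.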
Image analysis is used widely in biology to study cell morphology, organelle location, protein localization and trafficking, and nuclear organization. Computational Life Sciences helps interpret images and look for patterns and associations with other data of interest (e.g., transcriptomic or genomic signatures).
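A basic building block of this kind of image analysis is segmentation: separating objects such as nuclei from the background so they can be counted and measured. The sketch below applies a fixed intensity threshold and then labels 4-connected groups of foreground pixels with a breadth-first search. The threshold and the `count_objects` name are assumptions for illustration; real pipelines typically rely on libraries such as scikit-image, with adaptive thresholds and extra steps to split touching objects.

```python
from collections import deque

def count_objects(image: list[list[float]], threshold: float) -> list[int]:
    """Hypothetical helper: segment pixels brighter than threshold and return
    the pixel count of each 4-connected object, in row-major discovery order."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > threshold and not seen[r][c]:
                # New object: flood-fill all connected foreground pixels
                size = 0
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] > threshold and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                sizes.append(size)
    return sizes
```

On the toy image `[[0, 9, 9, 0, 0], [0, 9, 0, 0, 8], [0, 0, 0, 0, 8], [7, 0, 0, 0, 0]]` with a threshold of 5, the function finds three objects of sizes `[3, 2, 1]`.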
Primary sequence analysis can help identify functional motifs and some secondary structural elements, but what about full 3-D tertiary and quaternary structures of proteins and protein complexes?
Computational Life Sciences methods, especially machine learning (ML)-based ones, are critical for modeling structure and for better understanding complex biological functions, such as ligand-receptor binding or how mutations affect a protein's structure.
This structural information is essential to understanding biomolecular function. One significant area of Computational Life Sciences research is building computer models that predict the tertiary structure of a protein from its primary sequence. ML techniques have advanced this field through new methods for predicting molecular structures, including AlphaFold, a deep learning program for protein structure prediction.
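Before any 3-D modeling, primary sequence analysis can be as simple as scanning for a known motif. The sketch below searches a protein sequence for the PROSITE N-glycosylation pattern N-{P}-[ST]-{P} (Asn, then any residue except Pro, then Ser or Thr, then any residue except Pro) using a regular expression; a lookahead lets overlapping sites be reported. This is sequence-level pattern matching, not structure prediction, and a match marks only a potential site. The function name is hypothetical.

```python
import re

# PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P}, written as a regex.
# The lookahead (?=...) allows overlapping occurrences to be found.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")

def find_motif_sites(protein: str, pattern: re.Pattern = N_GLYC) -> list[tuple[int, str]]:
    """Hypothetical helper: (1-based position, matched residues) for each motif hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]
```

`find_motif_sites("MNGSANPTLNVTW")` returns `[(2, "NGSA"), (10, "NVTW")]`; the asparagine at position 6 is skipped because proline follows it.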
Network analysis looks at the global interactions among biological macromolecules, cells, tissues, and organisms. Computational Life Sciences uses data on protein-protein interactions, gene expression, and other biochemical measurements to identify patterns and predict networks. Network predictions can integrate many data types to provide insights into chromatin remodeling, regulatory regions, and transcription factor binding. The microbiome field also uses network analysis to determine how organisms interact, which resources they might share, and which organisms might be necessary in a given environment.
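As a minimal sketch of the first steps in network analysis, the code below builds an undirected interaction network from a list of pairwise interactions, finds highly connected "hub" nodes, and splits the network into connected components (groups of nodes linked by chains of interactions). The protein names, the degree cutoff, and the function names are hypothetical; dedicated tools such as NetworkX or Cytoscape offer far richer analyses.

```python
from collections import defaultdict

def build_network(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Hypothetical helper: undirected adjacency sets from pairwise interactions."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def hubs(graph: dict[str, set[str]], min_degree: int) -> list[str]:
    """Nodes with at least min_degree interaction partners, sorted by name."""
    return sorted(n for n, nbrs in graph.items() if len(nbrs) >= min_degree)

def components(graph: dict[str, set[str]]) -> list[set[str]]:
    """Connected components found by depth-first search from each unvisited node."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(graph[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps
```

For the edges `[("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P5", "P6")]`, `hubs(graph, 3)` returns `["P1"]`, and `components(graph)` returns two groups: `{"P1", "P2", "P3", "P4"}` and `{"P5", "P6"}`.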
Computational Life Sciences Solves Challenging Clinical Problems
Computational Life Sciences helps bring meaning to all kinds of biological data, which you can use to make basic research discoveries faster, diagnose a patient with a rare condition, track and monitor infectious organisms as they move through a population, or identify the best treatment for a patient with cancer, among many other applications.
Here is a short list of fields where Computational Life Sciences is used in real-world clinical applications.
Cell and Gene Therapy Development
There are over 1,000 ongoing clinical trials evaluating the safety and efficacy of cell and gene therapies in a broad array of therapeutic areas. While only a handful of these new therapies have been FDA-approved, biopharmaceutical investors are banking heavily on cell and gene therapy companies and their potential to unleash powerful cures for rare genetic diseases and many cancers.
However, the infrastructure to support the entire journey from pre-clinical research to approval and commercialization is in its infancy. Most cell and gene therapies being evaluated are in phase I/II trials. They have yet to navigate the scale-up process, and with the current manufacturing capabilities, some may be destined for disappointment. Even therapeutics that have successfully navigated approval face challenges due to excessive pricing and lack of clarity around reimbursement.
Computational life sciences can make research, development, manufacturing, and commercialization more efficient. Artificial intelligence (AI) algorithms and deep learning offer ways to improve these processes, increasing the quantity and quality of manufactured products and the efficiency of production, and moving cell and gene therapy into a new era of innovation [23].
Drug Discovery
Drug discovery focuses on identifying drugs that reduce the symptoms associated with a disease without causing adverse side effects, long-term damage to a patient, or negative impacts on society or the environment [17]. In drug discovery, Computational Life Sciences uses data generated by high-throughput genomics, transcriptomics, and proteomics methods to compare patients who have disease-related symptoms with normal controls. By applying this approach to the drug discovery process, you can [17]:
- Connect mutations, epigenetic modifications, and gene expression patterns to disease symptoms
- Identify potential drug targets that can restore normal physiological functioning
- Design drugs that act upon a drug target to achieve the desired therapeutic result and minimize side effects
- Predict environmental health impacts and the potential for drug resistance
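For the drug design step, one widely used first-pass screen, offered here as an illustration rather than as a method from the sources above, is Lipinski's rule of five. It flags compounds likely to have poor oral absorption: molecular weight above 500 Da, logP above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors. The sketch assumes the molecular properties have already been computed (for example, by a cheminformatics toolkit) and uses the common convention of allowing at most one violation; the compound names and property values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    mol_weight: float   # daltons
    log_p: float        # octanol-water partition coefficient (log scale)
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors

def lipinski_violations(c: Compound) -> int:
    """Count how many of the four rule-of-five criteria a compound breaks."""
    return sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])

def drug_like(compounds: list[Compound], max_violations: int = 1) -> list[str]:
    """Hypothetical helper: names of compounds with at most max_violations violations."""
    return [c.name for c in compounds if lipinski_violations(c) <= max_violations]
```

`drug_like([Compound("cmpd_1", 350.0, 2.1, 2, 5), Compound("cmpd_2", 720.0, 6.3, 4, 12)])` returns `["cmpd_1"]`, since cmpd_2 breaks three of the four criteria.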
Vaccine Development
As the COVID-19 pandemic highlighted, fast and efficient vaccine development is critical for combating emerging pathogens. Computational Life Sciences methods can significantly reduce the time and cost of vaccine design and development. Several approaches for incorporating computational methods into traditional vaccine development pathways have been developed [18]:
- Reverse vaccinology: By sequencing the genomes of bacteria, viruses, parasites, or cancer cells, it is possible to predict all of the proteins that may be expressed and identify which are likely to be the most antigenic or to have promising physicochemical properties.
- Immunoinformatics: Uses mathematical and computational approaches to process and analyze immunological data and to predict T- and B-cell responses to specific antigens [19].
- Structural vaccinology: Uses the 3-D structures of proteins to identify which would make the best antigens.
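To illustrate the immunoinformatics idea, the sketch below enumerates every overlapping 9-residue peptide in a protein (9-mers are a typical length for peptides presented by MHC class I molecules) and ranks them with a deliberately crude score based on the anchor residues of the HLA-A*02:01 binding motif: leucine or methionine at position 2 and valine or leucine at the final position. This scoring rule is a simplified stand-in assumed for illustration only; real epitope predictors such as NetMHCpan use models trained on measured binding data.

```python
def candidate_epitopes(protein: str, length: int = 9) -> list[tuple[int, str, int]]:
    """Hypothetical helper: enumerate overlapping peptides and rank them.

    Crude anchor-residue score (an assumption, not a real predictor):
    +1 for Leu or Met at position 2, +1 for Val or Leu at the last position.
    Returns (1-based start, peptide, score), highest score first.
    """
    protein = protein.upper()
    scored = []
    for start in range(len(protein) - length + 1):
        pep = protein[start:start + length]
        score = int(pep[1] in "LM") + int(pep[-1] in "VL")
        scored.append((start + 1, pep, score))
    # Sort by score (descending), then by position for a stable, readable ranking
    return sorted(scored, key=lambda t: (-t[2], t[0]))
```

For example, `candidate_epitopes("SLYNTVATLK")` returns `[(1, "SLYNTVATL", 2), (2, "LYNTVATLK", 0)]`.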
Biomarker Identification and Precision Medicine
Biomarkers play a significant role in our understanding of the biology of individuals. For instance, human genetic variation can result in differences in metabolism, disease progression, and response to therapy [22]. Computational Life Sciences can play several roles in this area by facilitating:
- Comparative genomics: Identifying evolutionary history and the associated genetic variation (e.g., chromosomal rearrangements, gene duplications, gene deletions) to understand the function of genes and genomes and their potential roles in disease
- Single nucleotide polymorphism (SNP) identification: Identifying associations of specific SNPs with the risk of developing a particular disease or with the response to a specific treatment
- Alternative splice variant identification: Identifying disease-associated mRNA splice variants, since genes associated with different diseases can produce varying splice variants (positional differences in exon-intron and intron-exon junctions)
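SNP association testing can be sketched with a standard allelic test: count the alternative and reference alleles in cases and controls, then compute the odds ratio and a 1-degree-of-freedom Pearson chi-square statistic from the resulting 2×2 table. The function name and the counts in the example are hypothetical, and the sketch assumes no zero cells. Genome-wide studies test many SNPs at once and must also correct for multiple testing and account for population structure.

```python
import math

def allelic_association(case_alt: int, case_ref: int,
                        ctrl_alt: int, ctrl_ref: int) -> tuple[float, float, float]:
    """Hypothetical helper: allelic odds ratio, Pearson chi-square (1 df,
    no continuity correction), and p-value from case/control allele counts."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    # Odds of the alternative allele in cases relative to controls
    odds_ratio = (a * d) / (b * c)
    # Shortcut formula for the chi-square statistic of a 2x2 table
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square with 1 df equals erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value
```

For 60 alternative and 140 reference alleles in cases versus 40 and 160 in controls, this gives an odds ratio of about 1.71, a chi-square of about 5.33, and a p-value of about 0.021.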
We’ve covered a lot in this post. You should now have a better sense of the field of Computational Life Sciences, the types of questions it can help you answer, and how it helps solve real research and clinical problems. This is a fast-moving field, so we’ll be updating this blog with new information regularly.
References
1. Wren J, et al. Bioimage informatics: a new category in Bioinformatics. Bioinformatics. 28(8):1057 (2012).
2. Hogeweg P, Hesper B. Interactive instruction on population interactions. Comput Biol Med. 8(4):319-327 (1978).
3. Nature Portfolio. Computational chemistry.
4. Waller L, Tian L. Machine learning for 3D microscopy. Nature. 523:416-417 (2015).
5. National Institute of General Medical Sciences.
6. National Human Genome Research Institute. A Brief Guide to Genomics.
7. US FDA. Real-World Evidence. Accessed Dec 12, 2022.
8. Clish CB. Metabolomics: an emerging but powerful tool for precision medicine. Cold Spring Harbor Laboratory Press (2015).
9. Athey B. Deep learning in pharmacogenomics: from gene regulation to patient stratification. Pharmacogenomics. 19(7) (2018).
10. Nature Portfolio. Molecular modelling. Accessed Dec 12, 2022.
11. Phylogenetic Inference. Stanford Encyclopedia of Philosophy. First published Dec 8, 2021. Accessed Dec 12, 2022.
12. Weir MP, Blackstock WP. Proteomics: quantitative and physical mapping of cellular proteins. Trends Biotechnol. 17(3):121-127 (1999).
13. National Cancer Institute. Transcriptomics. Accessed Aug 31, 2022.
14. EMBL-EBI Training. Structural bioinformatics. Accessed Aug 31, 2022.
15. The NIH Catalyst. Systems Biology as Defined by NIH. Accessed Aug 31, 2022.
16. Sayers EW, et al. GenBank. Nucleic Acids Res. 48(D1):D84-D86 (2020).
17. Xia X. Bioinformatics and Drug Discovery. Curr Top Med Chem. 17(15):1709-1726 (2017).
18. Ribas-Aparicio RM, Castelán-Vega JA, Jiménez-Alberto A. The impact of bioinformatics on vaccine design and development. Vaccines. 7:123-147 (2017).
19. Ishack S, Lipner SR. Bioinformatics and immunoinformatics to support COVID-19 vaccine development. J Med Virol. 93(9):5209-5211 (2021).
20. Spinozzi G, Calabria A, Brasca S, et al. VISPA2: a scalable pipeline for high-throughput identification and annotation of vector integration sites. BMC Bioinformatics. 18(1):520 (2017).
21. Hwang B, Lee JH, Bang D. Single-cell RNA sequencing technologies and bioinformatics pipelines. Exp Mol Med. 50(8):1-14 (2018). Published correction appears in Exp Mol Med. 53(5):1005 (2021).
22. Mount DW, Pandey R. Using bioinformatics and genome analysis for new therapeutic interventions. Mol Cancer Ther. 4(10):1636-1643 (2005).
23. Developing Machine Learning Powered Solutions for Cell and Gene Therapy Candidate Validation. Form Bio Resource Center. Published Nov 2022. Accessed Dec 12, 2022.