In gene therapy basic and translational research, the era of big data has arrived. The emergence of high-throughput sequencing and imaging technologies has massively increased the biological data available for analysis and for learning how best to develop safe and efficacious gene therapy products. Currently, as part of the research, development, and biomanufacturing process, many scientists and technicians are expected to wrangle and analyze their own data, that is, run their own computational analyses, since extracting meaningful insights from this tangled mass of data is too challenging and time-consuming to do manually.
Enter Artificial Intelligence (AI) and Computational Life Sciences, which provide gene therapy developers and biomanufacturers with the tools to navigate big data efficiently and analyze the mountainous volume of scientific information that’s now available. This ability lets you focus more on science and less on building a computational skillset from scratch. By combining the principles and theories of the life sciences with the power of computation and AI, our industry is tackling the most complex and challenging problems in biology, bringing about unparalleled advancements in multiomics analysis to develop more effective yet less expensive gene therapy treatment options for diseases with unmet needs.
This introduction provides a high-level overview of AI and Computational Life Sciences. It aims to equip you with a working knowledge of the field and then walks through common basic and translational gene therapy applications. By engaging with technology-focused colleagues, gene therapy development teams can collaborate towards creating safe gene therapies that reach the market faster and at reduced cost, leading to affordable, life-saving treatments for patients.
The Evolution of Artificial Intelligence and Large Language Models
AI was born from the computational efforts of John von Neumann and Alan Turing in 1950, who pioneered cybernetics, a toolbox that enables researchers to communicate with machines21. Not until 1956 was AI first conceptualized as “the construction of computer programs that engage in tasks that require high-level mental processes” such as learning, memory, and reasoning. From the 1960s to the early 2000s, the focus of AI development was on generating rule-based systems that could operate on pre-defined principles and instructions22.
It wasn't until around 2010 that the field of AI experienced a significant breakthrough with the emergence of modern neural networks. Inspired by the complexity of the human brain, these networks learn through exposure to examples and experiences, much like a human would. This revolutionary development ushered in a new era of machine learning (ML), enabling AI to advance in previously unimaginable ways.
One type of neural network that has become very popular in recent years is the transformer architecture, which is used in Large Language Models (LLMs). Over the course of several months of training, LLMs process vast volumes of online content and learn to generate responses to questions based on what they've learned, akin to a human but at a vastly larger scale. LLMs are the core technology underlying OpenAI’s ChatGPT and Google’s Bard, which offer a more user-friendly experience via chatbots. With these chatbots, you ask a question and the chatbot processes the request, generating a response like a human would. This differs significantly from search engines, which crawl and index websites and display the pages most relevant to your keywords.
Beyond chatbots, LLMs are unlocking new scientific possibilities such as understanding the structure and function of DNA23 or predicting the functional impact of mutations24. To learn from biological sequence data, LLMs treat amino acids and stretches of DNA as "words" and make predictions based on what they have learned25. The use of LLMs in answering complex biological questions is in its infancy but moving extremely fast. Integrated into wet lab work, these systems will allow scientists to take a holistic and streamlined approach to answering complex biological questions.
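To make the idea of DNA "words" concrete, here is a minimal sketch of one common way to turn a sequence into tokens: splitting it into overlapping k-mers and assigning each distinct k-mer an integer ID. The function names, the choice of k=3, and the example sequence are illustrative assumptions, not the tokenizer of any particular published model; real genomic language models each define their own tokenization scheme.

```python
from collections import Counter


def kmer_tokenize(sequence: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a DNA sequence into overlapping k-mer "words" (illustrative helper)."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Map each distinct k-mer to an integer ID, most frequent k-mer first."""
    counts = Counter(tokens)
    return {kmer: idx for idx, (kmer, _) in enumerate(counts.most_common())}


# Hypothetical example sequence, chosen only for illustration.
tokens = kmer_tokenize("ATGCGATG", k=3)
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]

print(tokens)     # the k-mer "words"
print(token_ids)  # the integer IDs a model would actually consume
```

A model never sees the letters themselves, only the sequence of integer IDs, which is why the same machinery that works on text can be pointed at DNA or protein sequences.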
As AI has become more sophisticated, the concept of “compute” has become more important as well. Compute refers to the computational resources, such as central processing units (CPUs), graphics processing units (GPUs), or other specialized hardware, used to perform computations. In the context of AI, compute is particularly relevant to the recent development of large language models, as these algorithms often require extensive computational resources to process large datasets, train complex models, and make predictions or decisions based on the learned patterns.
The Broad Impact of AI in Gene Therapy Development and Biomanufacturing
Applying computational methods to identify actionable insights in biological data is powerful, leading to novel ways to deliver gene therapeutics and to novel gene therapy treatments for diseases. More specifically, leveraging advanced AI and computational techniques to shape more efficient, higher-quality gene therapy discovery, development, and production will provide safer and more efficacious therapeutics and direct improvement in the human condition.
For instance, in a recently published pre-print, researchers explored using a large language model (LLM) on DNA sequences to effectively capture key regulatory genomic elements, notably enhancers and promoters23. This kind of predictive algorithm could benefit gene therapy developers designing viral vectors for transgene expression. Titrating expression can often be a trial-and-error process requiring extensive validation in vitro and in vivo. By deploying precise in silico techniques, gene therapy developers could improve efficiency and cut costs while supporting compliance with the FDA's rigorous approval process. Such techniques could also help assure a pure gene therapy product, free of empty or partially filled capsids, that passes critical regulatory checkpoints more smoothly.
AI integration into gene therapy development, biomanufacturing, and other biological disciplines has led to a new field known as Computational Life Sciences. What distinguishes this field from more traditional studies is its interdisciplinary thinking, cross-functional integration, and incorporation of basic biological concepts, bioinformatics, AI/ML, engineering, and high-performance computing principles and theories.
Enabling technologies, like next-generation sequencing (NGS) and AI-powered image analysis, combined with computational tools that process and analyze the resulting data, have given rise to this field.
Here is a brief overview of different disciplines within Computational Life Sciences, which continues to grow in scope and utility:
- Bioimage informatics uses automated image processing and analysis, including pattern recognition, to extract information from images.1
- Bioinformatics is a discipline that combines biology, mathematics, and computer science to acquire, store, analyze and disseminate complex biological data.2
- Computational chemistry uses computational methods and simulation to help solve chemical problems, e.g., engineering enzymes or other proteins to work better.3
- Computational microscopy combines artificial neural networks with microscopy to capture microscopic images of objects.4
- Genetics is the study of heredity and the variation of inherited characteristics.5
- Genomics is the branch of molecular biology concerned with the structure, function, evolution, and mapping of genomes.6
- Health informatics uses digital patient data to inform and improve aspects of the healthcare system, from clinical trial recruitment, individual patient care, and drug treatments to population-level health.7
- Metabolomics is the study of the biochemistry of metabolism and metabolites.8
- Molecular modeling is the modeling of molecular structures by way of computational chemistry.9
- Multi-omics is an analysis that defines relationships among genome, epigenome, transcriptome, proteome, and/or metabolome from bulk cells to unravel biological networks regulating transitions from health to disease.10
- Pharmacogenomics researches the way in which an individual's genetic attributes affect the likely response to medicines.11
- Phylogenetics is the study of the evolutionary relationships among organisms or groups of organisms.12
- Proteomics is the study of proteomes and their functions.13
- Single-cell analysis allows researchers to investigate the heterogeneity and diversity within a population of cells by examining individual cells separately using fluorescence-activated cell sorting (FACS), genomics, transcriptomics, and more.14
- Spatial transcriptomics uses a variety of methods to quantify gene expression in intact tissue samples26.
- Structural bioinformatics is the prediction of the structure and function of macromolecules such as DNA, RNA, and proteins.15
- Systems biology is the mathematical modeling and analysis of large datasets to understand how biological systems behave as a whole.16
- Synthetic biology is an interdisciplinary field that combines principles of biology, engineering, and computer science to design and construct new biological parts, devices, and systems or to modify existing biological systems for useful purposes.17
- Transcriptomics is the analysis of the complete RNA transcriptome.18
AI Enables Next-Gen Gene Therapy Development and Biomanufacturing
AI and Computational Life Sciences will address complex and multi-faceted questions in biology, all of which can improve our understanding of biological systems and drive the development of safe and efficacious gene therapy products. Below are examples of the types of analyses that AI and Computational Life Sciences include.
The development of Sanger, next-generation, and third-generation sequencing technologies has led to the submission of trillions of nucleotide bases to GenBank alone.19 To analyze these sequences manually would be impossible.
AI and Computational Life Sciences are used extensively to analyze primary DNA, RNA, and protein sequences to identify protein-coding regions, RNA genes, regulatory sequences, structural motifs, and repetitive sequences. Sequences can be compared between organisms to determine evolutionary history (e.g., by building phylogenetic trees) or make functional predictions.
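As a rough illustration of what "identifying protein-coding regions" can mean at its simplest, the sketch below scans the forward strand of a DNA sequence for open reading frames (ORFs): stretches that begin with a start codon (ATG) and run to the first in-frame stop codon. The function name, the `min_codons` parameter, and the example sequence are assumptions made for illustration. Production gene finders go much further, scanning both strands, handling introns, and scoring candidates with statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for forward-strand ORFs.

    An ORF here starts at ATG and ends at the first in-frame stop codon.
    min_codons counts the codons before the stop, including the ATG.
    Coordinates are 0-based, with an exclusive end.
    """
    seq = dna.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None  # close it and keep scanning this frame
    return orfs


# Hypothetical toy sequence: ATG GCT TAA sits in the third reading frame.
example_orfs = find_orfs("CCATGGCTTAAGG")
print(example_orfs)
```

Raising `min_codons` filters out the short ORFs that appear by chance in any long sequence, which is usually the first tuning knob in this kind of scan.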
AI and Computational Life Sciences can also help analyze and understand genome and transcriptome sequencing data. Using tools and software with raw sequence data, you can assemble, map, and annotate whole genomes, identify new transcripts, track genome changes across evolution, find genomic markers for diseases, and much more.
Global gene expression can be measured by RNA sequencing (RNA-Seq), which has become widely used for whole transcriptome analysis. AI and Computational Life Sciences process and analyze this data by performing quality control on raw data, removing noise, aligning reads, annotating transcripts, identifying differentially expressed genes, and performing statistical analysis.
Other techniques for measuring gene expression, such as RNA and protein microarrays or mass spectrometry-based proteomics, also generate large amounts of data that computational methods are used to process and analyze. By looking at patterns in gene expression data, you can infer regulatory elements or conditions under which gene expression occurs.
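To give a feel for the "identifying differentially expressed genes" step, here is a minimal sketch that normalizes raw read counts to counts per million (CPM) and then computes a log2 fold change between two samples. The gene labels and counts are made up for illustration, and the pseudocount of 1 is an assumed convention to avoid dividing by zero. A real analysis would use biological replicates and a dedicated statistical package rather than a single ratio.

```python
import math


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so each sample sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def log2_fold_change(
    control: dict[str, int],
    treated: dict[str, int],
    pseudocount: float = 1.0,
) -> dict[str, float]:
    """log2(treated / control) on CPM values; positive means up in treated."""
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2(
            (cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount)
        )
        for gene in control
    }


# Hypothetical read counts for three genes in two samples.
control = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600}
treated = {"GENE_A": 400, "GENE_B": 300, "GENE_C": 300}

lfc = log2_fold_change(control, treated)
for gene, value in lfc.items():
    print(f"{gene}: {value:+.2f}")
```

Normalizing before comparing matters because two sequencing runs rarely produce the same total number of reads; without it, a deeper run would make every gene look upregulated.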
Image analysis is used widely in biology to analyze cell morphology, organelle location, protein localization and trafficking, and nucleus organization. AI and Computational Life Sciences help interpret images or look for patterns and associations with data of interest (e.g., transcriptomic or genomic signatures).
Primary sequence analysis can help identify functional motifs and some secondary structural elements, but what about full 3-D tertiary and quaternary structures of proteins and protein complexes?
AI and Computational Life Sciences are critical for modeling structure and better understanding complex biological functions like ligand-receptor binding or how mutagenesis impacts a protein's structure.
This structural information is critical to understanding biomolecular function. One significant area of AI and Computational Life Sciences research is generating computer models that predict the tertiary structure of a protein from its primary sequence. ML techniques have advanced this field by providing new methods for predicting molecular structures, including AlphaFold, a program for protein structure prediction.
Network analysis examines the global interactions among biological macromolecules, cells, tissues, and organisms. Computational Life Sciences uses data measuring protein-protein interactions, gene expression, or other biochemical data to identify patterns and form network predictions. Network predictions integrate many data types to gain insights into chromatin remodeling, regulatory regions, and transcription factor binding. The microbiome field also uses network analysis to determine how organisms interact, what resources might be shared, or what organisms might be necessary in an environment.
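One of the simplest network analyses is to build a graph from a list of pairwise interactions and rank the nodes by how many partners they have, which highlights candidate "hubs." The sketch below does this in plain Python. The protein labels P1 through P5 and their interactions are hypothetical, and degree is only one of many possible measures of a node's importance.

```python
from collections import defaultdict


def build_network(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected adjacency map from pairwise interactions."""
    network: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions in this simple sketch
            network[a].add(b)
            network[b].add(a)
    return dict(network)


def rank_hubs(network: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Sort nodes by number of partners (degree), highest first, ties by name."""
    degrees = ((node, len(partners)) for node, partners in network.items())
    return sorted(degrees, key=lambda item: (-item[1], item[0]))


# Hypothetical protein-protein interactions.
interactions = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P1", "P4"),
    ("P2", "P3"),
    ("P4", "P5"),
]

network = build_network(interactions)
hubs = rank_hubs(network)
print(hubs)
```

The same adjacency-map structure works whether the nodes are proteins, genes, or microbial species; only the meaning of an edge changes.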
AI Solves Complex Gene Therapy Clinical Development and Biomanufacturing Problems
AI and Computational Life Sciences help to bring meaning to all kinds of biological data, which you can use to make basic research discoveries faster, diagnose a patient with a rare condition, track and monitor infectious organisms as they move through a population, identify the best treatment for a patient with cancer, and much more.
There are over 1,000 ongoing clinical trials evaluating the safety and efficacy of gene therapies in a broad array of therapeutic areas. While only a handful of these new therapies have been FDA-approved, biopharmaceutical investors are banking heavily on gene therapy companies and their potential to unleash powerful cures for rare genetic diseases and many cancers.
However, the infrastructure to support the journey from pre-clinical research to approval and commercialization is in its infancy. Most cell and gene therapies being evaluated are in phase I/II trials. They have yet to navigate the scale-up process, and with the current manufacturing capabilities, some may be destined for disappointment. Even therapeutics that have successfully navigated approval face challenges due to excessive pricing and lack of clarity around reimbursement.
AI and Computational Life Sciences make research, development, manufacturing, and commercialization more efficient. Leveraging artificial intelligence (AI) algorithms and deep learning offers approaches to positively impact these processes, improving the quantity and quality of manufactured products and production efficiency early in the development process and advancing them into a new era of innovation.20 Form Bio's machine learning and bioinformatic expertise and software solutions are making the application of AI to these problems even more streamlined, accelerating cell and gene therapy development and biomanufacturing processes early in clinical development.
We’ve covered a lot in this post, and you should now have a better sense of how Computational Life Sciences applies AI across the many disciplines that make up this emerging field, how it answers complex biological questions, and how it can be applied to solve real-world research and translational problems. This is a rapidly evolving discipline that integrates science and technology, so we’ll be sharing new information regularly.
AI Disclosure: Feature image was generated by the AI image tool MidJourney.