
From Observational to Predictive: The Pivotal Role of AI in Modern Biological Experimentation

Explore the transformative journey from traditional biological lab techniques to AI-augmented scientific methods. Discover how AI is set to reshape biological lab experimentation in gene therapy.

Amicia Elliott, PhD


January 23, 2024


Traditionally, life sciences R&D has relied on biological experimentation in a laboratory setting, or ‘biological lab’. Over the last 50 years, that lab has evolved from extremely manual procedures, sometimes dangerous equipment and chemicals, and a variety of animal models to high-throughput machines, simplified chemical kits, and regulated, standardized model organisms.1 The biological lab is where hypotheses are tested and where protocols for analytical and process development are produced and vetted.

In the last 20 years, with the rise of genomics and other data-generating engines, computer systems for data collection, storage, analysis, and visualization have become essential in the life sciences. This has also shifted the ‘dry lab’ side, traditionally composed of theory and mathematics, to incorporate increasingly powerful computational tools that evolve new hypotheses for the biological lab more rapidly. Until recently, the biological and dry labs had a happy and mutually beneficial relationship when they had to interact, and some fields leaned into both biological and computational tools earlier than others. Now, with the recent emergence of artificial intelligence (AI), the workflows, equipment, and operations associated with biological experimentation are poised to undergo another major shift, with a much closer relationship to the computational dry lab.

In this blog, we weigh the pros and cons and explore how AI can be integrated into biological lab processes to complement ongoing research.

An Example from Gene Therapy Drug Discovery 

First, let’s frame the discussion with an example from the drug discovery process.  In the gene therapy space, the process of pre-clinical drug discovery happens roughly like this: 

  • A biological target gene associated with a specific disease is identified through literature and data review, a clinical presentation, or animal model experimentation. 
  • Once ID’d, the target is validated as crucial to a disease state using in vitro and in vivo genetics and molecular biology techniques, the cornerstones of lab work.
  • If the target is validated, a drug discovery team will design a high-throughput screen using viral vector candidates that introduce or modify the target gene.2
  • Characterization of the most promising leads using in vitro or ex vivo assays is done to ensure reproducibility, functionality, and safety/toxicity for animal studies.
  • In vivo animal studies are conducted to characterize a lead viral vector construct's potential efficacy and safety effects.

Sounds straightforward, right? 

This process led to the FDA's approval of the first gene therapy for adults with hemophilia A and the cell-based gene therapies that followed.3,4 However, for all the successes, there have been many costly and time-consuming misses: many promising pre-clinical drug candidates fail to deliver clinical efficacy, a problem known as the “translation gap.” With roughly 90% of clinical drug development programs failing, this gap has long been a problem, and many scientists have focused their laboratory expertise on developing more clinically relevant in vitro and in vivo models to help the entire life science industry zipline over the growing translation chasm.5

What if we could reduce this particular chasm using AI-powered tools? There are many examples where AI is beginning to flex in life sciences, from target ID to prediction of preclinical endpoints.6 One of the first drugs with an AI-discovered target and an AI-generated drug design, INS018_055, recently entered the clinic, a big step forward for champions of AI.7 It must be said that AI-driven drug discovery has yet to traverse the entire regulatory approval process and produce the big wins that the inevitable hype cycle has promised. However, achievements in the medical device industry already showcase AI's significance in healthcare and suggest that AI's involvement will advance therapeutics to a higher standard. This improvement encompasses expediting the discovery of new medicines, reducing diagnosis time, and assisting surgeons in analyzing procedure results, ultimately leading to enhanced clinical outcomes.8,9

So, with AI's place in healthcare solidifying and so many private and open-source solutions out there, the next questions are, “How do we choose effective models?” and “How and where does AI slip into the upstream process of biological experimentation ahead of translation to the clinic?”

Biological Data Collection (for AI)

Following the scientific method, experimentation is done by varying one factor at a time in a given biological system, keeping everything else constant, and measuring how one or more outputs of the system change. This often requires many iterations to arrive at the correct solutions, pH, concentrations, cell lines, etc., to optimize the given biological system for the hypothesis being tested and that one factor’s role. Then, researchers perform technical and experimental replicates to determine how reproducible and statistically significant the outputs are, what conclusions can be drawn against the starting hypothesis, and whether the outcome can be interpreted as a function of that one factor.
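To make the one-factor-at-a-time idea concrete, here is a minimal Python sketch. The baseline condition, the factor names, and the triplicate readouts are all made up for illustration; they are not taken from any real protocol. The sketch only shows how a set of conditions that differ in a single factor can be enumerated, and how replicates can be summarized with a mean, a sample standard deviation, and a coefficient of variation.

```python
from statistics import mean, stdev

# Hypothetical baseline condition for an illustrative cell-based assay.
BASELINE = {"pH": 7.4, "concentration_uM": 10.0, "cell_line": "HEK293"}


def ofat_conditions(baseline, factor, levels):
    """Return one condition per level, changing only `factor`."""
    if factor not in baseline:
        raise KeyError(f"unknown factor: {factor}")
    return [{**baseline, factor: level} for level in levels]


def summarize_replicates(values):
    """Mean, sample standard deviation, and coefficient of variation (%)."""
    m = mean(values)
    sd = stdev(values)
    return {"mean": m, "sd": sd, "cv_percent": 100 * sd / m}


# Vary pH only; concentration and cell line stay at their baseline values.
conditions = ofat_conditions(BASELINE, "pH", [6.8, 7.0, 7.4, 7.8])
for condition in conditions:
    print(condition)

# Made-up triplicate readouts for a single condition.
summary = summarize_replicates([0.98, 1.02, 1.00])
print(summary)
```

Every extra factor or level multiplies the number of conditions and replicates to run, which is one reason this style of optimization takes so many iterations at the bench.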

While we still follow this same process today in the lab, high-throughput and automated instrumentation (including ‘omics technologies) and standardized protocols for many biological systems have decreased the time and effort in data generation and thus increased the amount of biological data being produced. This huge increase in the available data is a double-edged sword when it comes to training and testing new AI algorithms. On the one hand, AI is very data-hungry, and modern experimentation is providing more data than ever, increasingly in multiple modalities. On the other hand, AI requires well-annotated, structured data that is relevant to the AI task being developed. There remains a lot of methodological variability, intrinsic biological variability in a living experimental system, and huge variation in the data structures deposited in public databases, which makes a lot of that data unsuitable for AI as-is. This complication underscores the importance of proper experimental design and of collecting “the right data” for training and testing AI systems.
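As a rough illustration of what “well-annotated, structured” can mean in practice, the Python sketch below defines a hypothetical record type for a single assay measurement and a small check for missing metadata. The field names and the list of required fields are assumptions made for this example, not a community standard or a Form Bio schema.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class AssayRecord:
    """One measurement plus the metadata needed to interpret it (hypothetical schema)."""
    sample_id: str
    assay: str
    value: float
    unit: str
    organism: str
    protocol_version: str
    replicate: int
    annotations: dict = field(default_factory=dict)


# Text fields that this sketch treats as mandatory (an assumption).
REQUIRED_TEXT_FIELDS = ("sample_id", "assay", "unit", "organism", "protocol_version")


def missing_metadata(record):
    """List required text fields that are empty or only whitespace."""
    data = asdict(record)
    return [name for name in REQUIRED_TEXT_FIELDS if not str(data[name]).strip()]


complete = AssayRecord("S-001", "growth_inhibition", 0.42, "OD600", "E. coli", "v1.2", 1)
incomplete = AssayRecord("S-002", "growth_inhibition", 0.40, "", "E. coli", "v1.2", 1)

print(missing_metadata(complete))    # no missing fields
print(missing_metadata(incomplete))  # the unit is missing
```

A measurement without its unit, organism, or protocol version may still be usable by the person who generated it, but it is hard to pool with other datasets for model training.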

Where Can AI Augment Biological Experimentation?

Generally, AI is well positioned to advance and complement, but not replace, how biological lab experiments are done. An academic example by Stokes et al. in the drug discovery space for novel antibiotics demonstrates the utility of AI + biological lab work upstream of clinical development.10

The group performed E. coli inhibition assays with 2,335 FDA-approved and natural bioactive compounds and used the resulting data to train a deep-learning algorithm.10 The algorithm was then used to predict growth inhibitory activity from a chemical compound library and identified 99 promising compounds, 51 of which had growth inhibitory activity. In contrast, of the 63 least promising molecules, only 2 showed growth inhibitory activity, validating the model's predictive capabilities.  
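The counts above can be turned into hit rates with a few lines of Python. This is simple arithmetic on the numbers reported in the text, not a reproduction of the authors' analysis; the function and variable names are ours.

```python
def hit_rate(hits: int, tested: int) -> float:
    """Fraction of tested compounds that showed growth inhibition."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return hits / tested


# Counts as reported in the text above (Stokes et al.).
top_ranked = hit_rate(hits=51, tested=99)
bottom_ranked = hit_rate(hits=2, tested=63)
enrichment = top_ranked / bottom_ranked

print(f"Top-ranked hit rate:    {top_ranked:.1%}")
print(f"Bottom-ranked hit rate: {bottom_ranked:.1%}")
print(f"Fold enrichment:        {enrichment:.1f}x")
```

That works out to a hit rate of about 51.5% among the model's top picks versus about 3.2% among its least promising picks, roughly a 16-fold difference.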

This example provides a clear success in drug discovery lead identification with available data in an academic setting, where AI can be applied to more exploratory, higher-risk questions. That said, since budgets are limited by external funding sources, the scope and quality of data may be more limited.

By contrast, industry labs will typically focus on shorter-term ROI and practical applications, regulatory compliance, and access to private, high-quality datasets for training. An industry example of clinically successful collaboration between AI and biological lab work comes from Evaxion, which uses its biological data in its AI algorithms to predict cellular interactions and identify the right target to stimulate a relevant, potentially lifesaving, response. They now have an ongoing melanoma DNA vaccine in Phase 2 clinical trials. Moreover, Ginkgo Bioworks has recently announced a collaboration with Google to co-develop AI tools and biological data in tandem for biological engineering applications.

Here are some of the ways AI may be used in the biology lab:

AI-Driven Experimental Design and Automation

AI can participate in experimental design by suggesting new and simpler experimental paths based on data from previous versions of the same experiments.11  With these types of tools, scientists can enhance the personalization and scalability of their experiments as well as improve resource management, data analysis, and quality control. AI-driven data analysis augments human-observed insights, while scalable robotics and automation systems allow for large-scale experiments. Additionally, AI facilitates collaboration and documentation, streamlining research processes and accelerating scientific progress in biological lab settings.

Data Analysis, Interpretation, and Predictive Modeling

AI-driven tools can efficiently process experimental data, identifying subtle patterns and relationships that may be challenging to discern manually. This assists researchers in data analysis and provides interpretations and insights to predict the results of future experiments. This area is where current models are most prolific. A review of AI-based approaches to analyzing multi-omics data in cancer shows a wide range of examples where modeling is being integrated into downstream data science for hypothesis refinement and generation.12

Data Validation

Researchers can use AI to provide statistics related to repeatability, accuracy, and scalability.13 AI tools can learn from historical data, identifying patterns of data validation issues and improving the accuracy of error detection over time. They can compare collected data against predefined criteria and known standards to detect anomalies or inconsistencies, flagging potential errors or inaccuracies in real-time. 
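As a very simplified stand-in for this kind of check, the Python sketch below flags readings that fall outside a predefined acceptance range or that sit far from the rest of the batch. It uses a plain rule-based test and a robust “modified z-score” built on the median and the median absolute deviation (MAD), not a trained AI model, and the readings, limits, and the 3.5 cutoff are illustrative assumptions.

```python
from statistics import median


def validate_readings(readings, lower, upper, mad_threshold=3.5):
    """Flag values outside [lower, upper] or far from the batch median.

    Returns a list of (index, value, reasons) tuples for flagged readings.
    """
    med = median(readings)
    mad = median([abs(x - med) for x in readings])
    flags = []
    for i, x in enumerate(readings):
        reasons = []
        if not lower <= x <= upper:
            reasons.append("out_of_range")
        # The modified z-score is undefined when MAD is zero, so skip it then.
        if mad > 0 and 0.6745 * abs(x - med) / mad > mad_threshold:
            reasons.append("outlier")
        if reasons:
            flags.append((i, x, reasons))
    return flags


# Made-up plate readings with one suspicious well.
batch = [1.0, 1.1, 0.9, 1.05, 0.95, 5.0]
flagged = validate_readings(batch, lower=0.0, upper=10.0)
print(flagged)
```

A learned model could replace the fixed thresholds here, for example by estimating expected ranges from historical runs, but the overall shape of the workflow (compare, flag, review) stays the same.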

Data Visualization

Machine learning algorithms can identify relevant patterns and trends in the data, helping researchers create dynamic and interactive visualizations that simplify data interpretation. These visualizations can encompass various formats, such as charts, graphs, heatmaps, and 3D models, making it easier for scientists to understand and communicate their findings effectively. AI-driven data visualization not only aids in data exploration but also supports decision-making, hypothesis testing, and dissemination of research results within the scientific community.

An Example from Gene Therapy Drug Discovery Leveraging AI

Let’s circle back to our starting framework for gene therapy, where Form Bio is currently focused. With AI-driven technologies inserted into the traditional biological science framework, we see the process of drug discovery evolving to look more like this:

  • A biological target gene associated with a specific disease is identified by AI agents that mine the literature and review existing data, clinical presentations, or animal model experimentation.
  • Once ID’d, the target is validated as crucial to a disease state using in vitro and in vivo genetics and molecular biology techniques in an AI-augmented automated lab for faster experiments.
  • If the target is validated, drug discovery AI agent(s) will design a high-throughput screen using viral vector candidates that introduce or modify the target gene.
  • Characterization of the most promising leads is done using ML-augmented data science tools alongside small, targeted in vitro or ex vivo assays to ensure reproducibility, functionality, and safety/toxicity for animal studies.
  • Digital animal models are used for initial predictions (models fine-tuned on secondary analysis data from in vitro or ex vivo assays) of a lead viral vector construct's potential efficacy and safety effects ahead of a limited in vivo animal study.

But There Are Still Challenges

As we’ve discussed, AI in biological labs has already been successful in some arenas and holds a lot of potential for future expansion. However, its implementation needs to be thoughtful and controlled, and there are still many challenges to overcome.

Balancing the Reliance on AI with Critical Scientific Thinking

We are not at a point where AI can replace scientists, nor is that the ultimate goal. While AI can process data, recognize patterns, and offer insights, it should be viewed as a tool to augment, not replace, human expertise. Maintaining a critical mindset ensures that researchers remain vigilant in questioning assumptions, designing experiments, and interpreting results, ultimately fostering the integrity and depth of scientific inquiry. The harmonious integration of AI and critical thinking empowers scientists to harness technology's potential while preserving the essence of scientific curiosity and rigor.

Ethical and Transparency Concerns

As AI shapes biological lab experimentation, maintaining integrity and reproducibility is essential. Researchers must grapple with questions regarding the responsible use of AI algorithms and the potential biases inherent in the data on which these systems are trained. Ensuring transparency in how AI is employed, including disclosing its role in experimental design and data analysis, is crucial to maintaining the trust of the scientific community and the public. Additionally, safeguarding against potential unintended consequences or misuse of AI technology in biological labs underscores the need for robust ethical frameworks and ongoing oversight to uphold research integrity.

Data Privacy, Storage, and Access

The increased volume and complexity of data generated by AI-driven experiments require robust data storage solutions, often demanding significant resources and infrastructure. Additionally, ensuring strict data privacy and access controls becomes critical, as AI algorithms may uncover sensitive information, necessitating the implementation of secure and compliant data management practices to maintain the confidentiality and integrity of research data.

Future Trends of AI Use in Gene Therapy Development

AI-facilitated analysis of biological data has largely focused on the small molecule industry from a drug development perspective and has encountered challenges in clinical outcomes. As AI tools are expanding to other areas of the healthcare space, it is important to establish expectations for the contribution of these tools that are tempered by lessons learned in similar applications. The right model, the right data, and the right validations are critical for success. 

Here at Form Bio, we are taking a pragmatic approach by actively navigating known and new challenges with strategic collaborations to strike a balance between the power of AI, biological data, and critical scientific thinking. Specifically, we are using our AI tools to optimize viral constructs for AAV gene therapy development, with an emphasis on enhancing performance in manufacturing. This will substantially decrease R&D costs associated with gene therapy, potentially decreasing the price of these drugs, which currently cost more than $2M per dose.

Our AI models have demonstrated excellence in predicting the outcomes of packaging therapeutic AAV vectors, which allows us to rapidly iterate on designs and reduce reliance on the much slower bench process. While our biological validation experiments are currently focused on our predictive AI technology, this year will emphasize generating data and validating our vector optimization models. This feedback loop between biological experiments and AI tooling (both predictive and optimization models) represents a significant stride forward in responsibly harnessing AI for heightened manufacturing efficiency within gene therapy programs.

AI Disclosure: The feature image was generated by the AI image tool Midjourney.



  1. Mechanism of DNA Chain Growth, II. Accumulation of Newly Synthesized Short Chains in E. coli Infected With Ligase-Defective T4 Phages. Proc Natl Acad Sci U S A. 1968;60(4):1356-1362.
  2. Viral Vector Construct & Design. Published October 3, 2023. Accessed October 9, 2023.
  3. FDA, Office of the Commissioner. FDA Approves First Gene Therapy for Adults with Severe Hemophilia A. Published June 30, 2023. Accessed October 9, 2023.
  4. FDA. FDA Approves First Gene Therapies to Treat Patients with Sickle Cell Disease. Published December 8, 2023. Accessed January 2, 2024.
  5. Sun D, Gao W, Hu H, Zhou S. Why 90% of clinical drug development fails and how to improve it? Acta Pharm Sin B. 2022;12(7):3049-3062.
  6. Qureshi R, Irfan M, Gondal TM, et al. AI in drug discovery and its clinical relevance. Heliyon. 2023;9(7):e17575.
  7. Insilico’s AI Drug Enters Phase II Clinical Trial. Published June 27, 2023. Accessed December 18, 2023.
  8. FDA, Center for Devices and Radiological Health. Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices. Published October 5, 2022. Accessed October 9, 2023.
  9. Moor M, Banerjee O, Abad ZSH, et al. Foundation models for generalist medical artificial intelligence. Nature. 2023;616(7956):259-265.
  10. Stokes JM, Yang K, Swanson K, et al. A Deep Learning Approach to Antibiotic Discovery. Cell. 2020;180(4):688-702.e13.
  11. Boiko DA, MacKnight R, Kline B, et al. Autonomous chemical research with large language models. Nature. 2023;624:570-578.
  12. Biswas N, Chakrabarti S. Artificial Intelligence (AI)-Based Systems Biology Approaches in Multi-Omics Data Analysis of Cancer. Front Oncol. 2020;10. Published October 2020.
  13. AI Harnessing Potential of Wet Lab Data. Published August 9, 2023. Accessed October 9, 2023.
