How Computational Life Sciences Enables Metagenomics Studies

Metagenomics studies and sophisticated computational tools advance disciplines from basic research to healthcare. Find out more in our blog.

Jill Roughan, PhD

December 13, 2022

Big datasets are a challenge to analyze. While all the “omics” fields crank out data by the terabyte, few datasets are bigger or messier than those from metagenomics. Today, researchers increasingly rely on computational tools so they can spend less time on data handling and more time on the science.

Let’s take a deeper dive into the fascinating world of metagenomics.

What is Metagenomics?

Metagenomics, the study of genetic material recovered directly from environmental samples, is a young discipline. The term was coined in 1998 by a group of researchers exploring soil microbes in the search for “natural products from previously uncultured soil microorganisms”.1,2

Metagenomics as defined today is broader than soil microbes alone. It includes other environmental samples, e.g. water, as well as microbiomics, the study of the communities of organisms living inside humans, animals and even plants.3

What makes metagenomics difficult and distinct from standard genomics is that many of the organisms in environmental samples can’t be cultured in the lab. The established process of isolating an organism, growing identical cells in culture and then sequencing the DNA does not work for these samples. Instead, a mixture of partially fragmented nucleic acids from different organisms needs to be sequenced and analyzed.

An interesting fact emerged early on in metagenomics studies on rRNA that highlights just how important it is to study these organisms. It is estimated that >99% of microorganisms observable in nature typically are not cultivated by using standard techniques.1

Metagenomics, which is also referred to as ecogenomics, environmental genomics or community genomics, is critical for scientists to better understand complex ecological systems that impact not just our health but also the health of entire ecosystems.

Types of Metagenomics Sequencing

Two different techniques are used to sequence DNA recovered from environmental samples: shotgun sequencing and targeted sequencing.

Here is how they work and when they are used.

Metagenomic Shotgun Sequencing

Shotgun sequencing is not new or specific to metagenomics studies. In fact, the first human genome was sequenced using shotgun approaches.

In shotgun sequencing, a DNA sequence is determined by randomly breaking up the genome into smaller pieces, sequencing these pieces and then piecing the complete sequence back together with the help of overlapping fragments.
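The idea of rebuilding a sequence from overlapping fragments can be illustrated with a toy example. The sketch below is a naive greedy merger, not what production assemblers do (those typically use graph-based methods and must cope with sequencing errors and repeats). The function names and the `min_len` parameter are hypothetical and chosen only for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the longest overlap.

    Returns the remaining contigs once no overlap of at least
    `min_len` bases is left.
    """
    unique = set(fragments)
    # Drop fragments that are fully contained in another fragment.
    contigs = sorted(
        f for f in unique if not any(f != g and f in g for g in unique)
    )
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                k = overlap(a, b, min_len)
                if k > best_len:
                    best_len, best_pair = k, (i, j)
        if best_pair is None:
            return contigs
        i, j = best_pair
        # Join the pair, keeping the overlapping bases only once.
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)]
        contigs.append(merged)


# Three made-up fragments of the made-up sequence ATGGCGTACGTTAGC.
fragments = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]
contigs = greedy_assemble(fragments)
```

With these three fragments the merger recovers a single contig. Real metagenomic data contain millions of reads from many genomes, which is why assembly is one of the most demanding steps.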

This established process has been adjusted for metagenomic studies. Environmental samples already contain short DNA fragments that can be sequenced. Innovations in next-generation sequencing technologies have made it possible to directly sequence these fragments without time-consuming cloning or amplification steps.

Metagenomic shotgun sequencing reveals not just which organisms are present and how abundant they are in the sample. It also supports functional profiling, gene prediction and the analysis of microbial interactions across the whole community.

Typical applications include:

  1. Evaluation of diversity and abundances of microbial species in a sample
  2. Study of unculturable microbes that are otherwise difficult or impossible to analyze
  3. Study of very low abundance microbes that might otherwise go undetected
  4. Identification of microbes or microbial communities that are relevant to human health, agriculture, animal husbandry or environmental monitoring and bioremediation.7
  5. Drug discovery, e.g. the identification of novel molecules such as antibiotics, or the study of the microbe-host relationship.9
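To make the first application concrete, here is a minimal sketch of how diversity and abundance might be summarized once reads have been assigned to taxa. It computes relative abundances and the Shannon diversity index, one common diversity measure. The taxon names and counts are invented for illustration, and the function names are hypothetical.

```python
import math


def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw read counts per taxon into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads were counted")
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(counts: dict[str, int]) -> float:
    """Shannon diversity H = -sum(p * ln p) over taxa with p > 0."""
    fractions = relative_abundance(counts).values()
    return -sum(p * math.log(p) for p in fractions if p > 0)


# Made-up read counts for two hypothetical samples.
even_sample = {"taxon_A": 25, "taxon_B": 25, "taxon_C": 25, "taxon_D": 25}
skewed_sample = {"taxon_A": 97, "taxon_B": 1, "taxon_C": 1, "taxon_D": 1}

even_h = shannon_index(even_sample)
skewed_h = shannon_index(skewed_sample)
```

The evenly distributed sample scores higher than the sample dominated by one taxon, even though both contain the same four taxa.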

Targeted Metagenomics Sequencing

While shotgun sequencing aims to sequence whatever nucleic acid is present in a sample, targeted approaches selectively sequence specific regions of the genome that are shared across multiple organisms and samples.

The major difference from shotgun sequencing is that genes of interest are amplified using PCR. The most commonly analyzed gene is the one encoding 16S rRNA, a relatively short sequence that differs between species but is highly conserved within a species. After sample collection, DNA extraction, amplification and sequencing, the 16S rRNA data can provide information not just about which organisms are present but also about the community structure and functional roles of microbial communities.10

Targeted metagenomics is less expensive, and its data analysis is less challenging. It is therefore commonly used for studies such as the assessment of the gut microbiota.
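As a rough illustration of why targeted data are easier to analyze, the sketch below assigns each amplicon read to the most similar reference sequence by comparing shared k-mers, then tallies the assignments. This is a toy stand-in for real 16S classifiers and reference databases. The reference sequences, taxon names, thresholds and function names are all made up for the example.

```python
from collections import Counter


def kmers(seq: str, k: int = 4) -> set[str]:
    """All distinct substrings of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by all k-mers seen in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(read: str, references: dict[str, str],
             k: int = 4, min_similarity: float = 0.5) -> str:
    """Return the best-matching taxon, or 'unclassified' if none is close."""
    read_kmers = kmers(read, k)
    best_taxon, best_score = "unclassified", 0.0
    for taxon, ref_seq in references.items():
        score = jaccard(read_kmers, kmers(ref_seq, k))
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon if best_score >= min_similarity else "unclassified"


def profile(reads: list[str], references: dict[str, str]) -> Counter:
    """Count how many reads were assigned to each taxon."""
    return Counter(classify(read, references) for read in reads)


# Hypothetical marker-gene fragments; not real 16S sequences.
references = {
    "taxon_A": "ACGTACGTGGCCTTAA",
    "taxon_B": "TTGACCAGTTGACAGG",
}
reads = [
    "ACGTACGTGGCCTTAA",
    "ACGTACGTGGCCTTAA",
    "TTGACCAGTTGACAGG",
    "GGGGGGGGGGGGGGGG",  # matches neither reference
]
community = profile(reads, references)
```

Because every read comes from the same gene, each one only has to be compared against a table of known marker sequences. No assembly step is needed.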

How Metagenomics Relies on Computational Life Sciences

A typical soil sample or a sample of the human gut microbiome can contain thousands of different bacterial species plus viruses and protists. When sequenced, such a sample can yield hundreds of gigabases of sequence data. Extracting useful information from these data requires advanced computational methods and sophisticated bioinformatics pipelines.

Shotgun sequencing in particular is highly challenging because it creates a vast number of short reads that need to be assembled. Standard bioinformatics analysis pipelines include read curation, assembly, binning (obtaining single genomes from a metagenome), gene prediction, functional and taxonomic annotation as well as taxonomic diversity analysis and visualization.11
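The first pipeline step, read curation, can be sketched in a few lines. The example below assumes reads come with FASTQ-style quality strings in the common Phred+33 encoding and keeps only reads that are long enough and have a sufficient mean quality. The thresholds and function names are hypothetical. Real pipelines use dedicated trimming and filtering tools with many more checks.

```python
def phred_scores(quality: str) -> list[int]:
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality]


def passes_filter(seq: str, quality: str,
                  min_mean_quality: float = 20.0,
                  min_length: int = 10) -> bool:
    """Keep a read only if it is long enough and of good average quality."""
    if len(seq) < min_length or len(seq) != len(quality):
        return False
    scores = phred_scores(quality)
    return sum(scores) / len(scores) >= min_mean_quality


def curate(reads: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the (sequence, quality) pairs that pass the filter."""
    return [(seq, qual) for seq, qual in reads if passes_filter(seq, qual)]


# Made-up reads: 'I' encodes quality 40, '#' encodes quality 2.
raw_reads = [
    ("ACGTACGTACGT", "IIIIIIIIIIII"),  # high quality, kept
    ("ACGTACGTACGT", "############"),  # low quality, dropped
    ("ACGT", "IIII"),                  # too short, dropped
]
clean_reads = curate(raw_reads)
```

Only the first read survives. Discarding poor reads early keeps errors from propagating into the assembly, binning and annotation steps that follow.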

Applications of Metagenomics

The applications of metagenomics are broad and span a number of industries. Here is a selection of interesting applications:

  1. Healthcare – microbiome studies using targeted sequencing enable the identification of gut microbial species and their abundance, and allow monitoring of human gut health.
  2. Prevention and control of environmental pollution – identifying environmental microorganisms that can be used for bioremediation, wastewater treatment or the degradation of contaminants such as pesticides, petroleum hydrocarbons or plastics.
  3. Diagnosis – the microbiome has been shown to be associated with a growing number of diseases, from various cancers to autoimmune disease. Metagenomics can be used to detect drug-resistance genes in pathogens and to monitor outbreaks of infectious diseases in hospitals and communities.13
  4. Drug discovery – novel antibiotics and enzymes are among the molecules discovered using metagenomics. An example is the discovery of turbomycin A and turbomycin B, broad-spectrum antibiotics isolated from a soil metagenomic library.14
  5. Agriculture and animal husbandry – metagenomics can help reveal interactions between microorganisms, soil and plants, which is essential to improving crop yields, and can help identify novel pathogens that affect farm animals.
  6. Biotechnology – identifying bioactive compounds, e.g. novel enzymes or bioactive ingredients that can be used in detergents.


Metagenomics is an exciting, emerging field that has benefitted from ever faster and less expensive sequencing and amplification techniques. Given that metagenomics datasets are huge and messy, the availability of sophisticated computational life sciences tools is especially critical. Recent years have seen constant progress as academic and industry researchers alike develop individual tools as well as complete pipelines.

Many different industries are already utilizing metagenomics. More applications will be added as the field becomes more established, as data can be generated faster and more cheaply, and as analysis becomes standardized.


What is the purpose of metagenomics?

Metagenomics studies the structure and function of the nucleotide sequences isolated from all the organisms in a complex sample, e.g. in samples from the environment or the human gut or skin. Metagenomics allows researchers to study microbial communities in their natural environment, including organisms that cannot be cultured in the lab.

What are the types of metagenomics?

There are two basic approaches. Shotgun sequencing gives researchers an overview of all the different species present in a sample and also lets them analyze a broad variety of genes that provide clues about how these microbial communities interact. Targeted sequencing looks for specific sequences only and is the tool of choice for researchers whose main goal is to quickly establish the type and number of microbes present in a sample.

How do scientists use metagenomics?

Scientists use metagenomics for many varied applications. Metagenomics studies can identify novel biological compounds for use as drugs or industrial biomolecules. Metagenomics can help identify new pathogens and support bioremediation efforts. In addition, many studies have shown a close link between microbial communities, e.g. in the human gut or mouth, and diseases such as cancer. Studying these microbial communities helps researchers understand these connections better and develop treatments.

How is metagenomics different from genomics?

Genomics studies the genome of a single organism while metagenomics studies the genomes of all the different organisms in a sample.


  1. Handelsman J. Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products. Chemistry & Biology. 5 (10): R245-9. (1998).
  2. Hugenholtz P. Impact of Culture-Independent Studies on the Emerging Phylogenetic View of Bacterial Diversity. J Bacteriol. 180(18): 4765–4774 (1998).
  3. Cheng M. Microbiome Big-Data Mining and Applications Using Single-Cell Technologies and Metagenomics Approaches Toward Precision Medicine. Front. Genet. (2019).
  4. Borchmann S. An atlas of the tissue and blood metagenome in cancer reveals novel links between bacteria, viruses and cancer. Microbiome. 9 (94). (2021).
  5. Blessing C.N. Metagenomics: A Tool for Exploring Key Microbiome With the Potentials for Improving Sustainable Agriculture. Front. Sustain. Food Syst., 17. (2022).
  6. Kwok K.T.T. Virus Metagenomics in Farm Animals: A Systematic Review. Viruses. 12(1):107. (2020).
  7. Techtmann S.M. Metagenomic applications in environmental monitoring and bioremediation. J Ind Microbiol Biotechnol. 43(10):1345-54. (2016).
  8. Pereira F. Chapter 28 - Metagenomics: A gateway to drug discovery. Advances in Biological Science Research A Practical Approach: 453-468. (2019).
  9. Kalantar K. Host-Microbe Metagenomics: a Lens To Refocus Our Perspective on Infectious and Inflammatory Diseases. mSystems 6(4). (2021).
  10. Mallawaarachchi V. Metagenomics — Who is there and what are they doing?. Towards Data Science. Published March 2, 2020. Accessed September 13, 2022.
  11. Tamames J. SqueezeMeta, A Highly Portable, Fully Automatic Metagenomic Analysis Pipeline. Front. Microbiol. (2019).
  12. Siegwald L. Targeted metagenomic sequencing data of human gut microbiota associated with Blastocystis colonization. Sci Data 4. (2017).
  13. Zhang L. Advances in Metagenomics and Its Application in Environmental Microorganisms. Front. Microbiol. (2021).
  14. Gillespie D. Isolation of Antibiotics Turbomycin A and B from a Metagenomic Library of Soil Microbial DNA. Appl. Environ. Microbiol. 68(9). (2002).