Evo 2: Advancements in AI-powered Genome Analysis Expanding Beyond Bacteria

In a report from late 2025, we highlighted a groundbreaking AI system named Evo, which was trained on a vast number of bacterial genomes. Evo could predict the next gene in a sequence or suggest entirely new proteins, an ability that relied on the natural clustering of related genes in bacterial genomes. That reliance was also a limitation: organisms with complex cells have far more intricate genomes, and our coverage noted that it was unclear whether the approach would work for them.

The Evo team took on that challenge, and today it unveiled Evo 2, an open-source AI trained on genomes from all three domains of life: bacteria, archaea, and eukaryotes. Trained on trillions of DNA base pairs, Evo 2 has developed internal representations of key features of complex genomes, including regulatory DNA and splice sites, that are often hard for humans to identify.

Genome Features

Bacterial genomes are organized relatively simply. Each gene encoding a protein or RNA is a contiguous, uninterrupted sequence, and genes involved in similar functions, like sugar metabolism or amino acid production, tend to cluster together under a single, compact regulatory system. The arrangement is straightforward and efficient.

Eukaryotic genomes, in contrast, have coding sequences that are interrupted by introns, which contribute no coding information. Regulation in these organisms is managed by sequences that may be dispersed across hundreds of thousands of base pairs. Furthermore, the sequences that identify introns or regulatory protein binding sites are not well defined: a few bases are critical, but at many positions one base is only somewhat more likely than the others (e.g., ‘45 percent of the time, it’s a T’). In most eukaryotic genomes, this complexity is compounded by a large amount of so-called 'junk' DNA, including inactive viruses and irreparably damaged genes.
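To make the "45 percent of the time, it's a T" idea concrete, here is a minimal sketch of how such fuzzy signals are commonly scored, using a position weight matrix. It is not how Evo 2 works internally; the motif, its probabilities, and the function names `log_odds` and `best_hit` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 4-base motif (made-up numbers, for illustration only).
# Each entry gives the probability of seeing each base at that position.
MOTIF = [
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},  # a critical, near-invariant G
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},  # a critical, near-invariant T
    {"A": 0.25, "C": 0.15, "G": 0.15, "T": 0.45},  # "45 percent of the time, it's a T"
    {"A": 0.40, "C": 0.20, "G": 0.20, "T": 0.20},  # weak preference for A
]

# Assumed background: all four bases equally likely.
BACKGROUND = 0.25


def log_odds(site):
    """Score a candidate site: sum of log2(motif probability / background)."""
    if len(site) != len(MOTIF):
        raise ValueError("site must be as long as the motif")
    return sum(math.log2(column[base] / BACKGROUND)
               for column, base in zip(MOTIF, site))


def best_hit(sequence):
    """Slide the motif along the sequence; return (best score, start index)."""
    width = len(MOTIF)
    scores = [(log_odds(sequence[i:i + width]), i)
              for i in range(len(sequence) - width + 1)]
    return max(scores)


if __name__ == "__main__":
    for site in ("GTTA", "GTAA", "ACGT"):
        print(site, round(log_odds(site), 2))
    print(best_hit("CCGTTACC"))
```

A site that matches the weakly preferred bases scores a little higher than one that does not, while a mismatch at a critical position drags the score below zero. No single position settles the question, which is part of why these signals are hard to spot by eye.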
