FLAG: Find, Label, Annotate Genomes, a fully automated tool for genome gene structural and functional annotation of highly fragmented non-model species

Recent advances in long-read sequencing technologies and the efforts of projects aimed at increasing the universe of sequenced reference genomes, such as the Earth BioGenome Project, have led to a growth in the number of whole genome sequencing projects for non-model organisms. Still, the vast majority (6,880) of the 28,727 unique publicly available eukaryotic genomes lack gene structure annotations. Annotating genes in a genome of interest is a multistep process, and while there are many tools available for each step of the annotation, there are few end-to-end workflows that (i) run on multiple computing environments, (ii) run automatically without initial data training, and (iii) perform accurately on fragmented genomes. Here we present “Find, Label, Annotate Genomes” (FLAG), a fully automated genome annotation workflow that does not require species-specific extrinsic evidence to create highly complete and accurate gene annotations of even highly fragmented genomes. Using FLAG, various eukaryotic genome assemblies, including 1 plant, 5 living animals, and 1 extinct animal, were annotated. Across all living species, FLAG annotations provided an average 18% increase in complete BUSCO scores compared to BRAKER2. Furthermore, FLAG more accurately predicted the number of protein-coding genes, with an average error rate 15x lower than that of BRAKER2.
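
To make the gene-count comparison concrete, the following is a minimal sketch of one plausible way such an error rate could be computed. It assumes the error for a species is the absolute difference between the predicted and reference numbers of protein-coding genes, divided by the reference number, and that errors are averaged across species. The abstract does not state the exact definition, so this formula, the function names, and all gene counts below are hypothetical illustrations and not results from FLAG or BRAKER2.

```python
from statistics import mean


def relative_error(predicted: int, reference: int) -> float:
    """Relative error of a predicted gene count against a reference count.

    Assumed definition (not taken from the paper): |predicted - reference| / reference.
    """
    if reference <= 0:
        raise ValueError("reference gene count must be positive")
    return abs(predicted - reference) / reference


def mean_relative_error(pairs: list[tuple[int, int]]) -> float:
    """Average the relative error over (predicted, reference) pairs, one per species."""
    return mean(relative_error(predicted, reference) for predicted, reference in pairs)


# Hypothetical (predicted, reference) protein-coding gene counts for two species.
# These numbers are invented for illustration only.
tool_a = [(21_000, 20_000), (14_700, 15_000)]
tool_b = [(30_000, 20_000), (18_000, 15_000)]

err_a = mean_relative_error(tool_a)
err_b = mean_relative_error(tool_b)

# Fold difference in average error between the two hypothetical tools.
fold = err_b / err_a
print(f"tool A error: {err_a:.3f}, tool B error: {err_b:.3f}, fold: {fold:.1f}")
```

A statement such as "an error rate 15x lower" would correspond to `fold` being about 15 under this assumed definition; a different definition (for example, signed error or a median across species) would give a different number.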