Publications

2020-2023 (selected)

  1. P. Amaral, S. Carbonell-Sala, F.M. De La Vega, T. Faial, A. Frankish, T. Gingeras, R. Guigo, J.L. Harrow, A.G. Hatzigeorgiou, R. Johnson, T.D. Murphy, M. Pertea, K.D. Pruitt, S. Pujar, H. Takahashi, I. Ulitsky, A. Varabyou, C.A. Wells, M. Yandell, P. Carninci, and S.L. Salzberg. The status of the human gene catalogue. Nature 622, 41-47 (2023).
  2. A. Varabyou, M.J. Sommer, B. Erdogdu, I. Shinder, I. Minkin, K.-H. Chao, S. Park, J. Heinz, C. Pockrandt, A. Shumate, N. Rincon, D. Puiu, M. Steinegger, S.L. Salzberg, and M. Pertea. CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure. Genome Biology 24:249 (2023).
  3. A. Gihawi, Y. Ge, J. Lu, D. Puiu, A. Xu, C.S. Cooper, D.S. Brewer, M. Pertea, and S.L. Salzberg. Major data analysis errors invalidate cancer microbiome findings. mBio 14:5. https://doi.org/10.1128/mbio.01607-23 (2023).
  4. A. Rhie, S. Nurk, …[71 authors including S. Salzberg and lab members Alaina Shumate and Jakob Heinz]…, M.C. Schatz, K.H. Miga, K.D. Makova, and A.M. Phillippy. The complete sequence of a human Y chromosome. Nature 621, 344-354 (2023).
  5. A. Varabyou, B. Erdogdu, S.L. Salzberg, and M. Pertea. Investigating open reading frames in known and novel transcripts using ORFanage. Nature Computational Science 3: 700-708 (2023). https://doi.org/10.1038/s43588-023-00496-1
  6. A. Guo, S.L. Salzberg, and A.V. Zimin. JASPER: a fast polishing tool that improves accuracy of genome assemblies. PLoS Computational Biology 19(3): e1011032. https://doi.org/10.1371/journal.pcbi.1011032 (2023).
  7. K.-H. Chao, A.V. Zimin, M. Pertea, and S.L. Salzberg. The first gapless, reference-quality, fully annotated genome from a Southern Han Chinese individual. G3: Genes, Genomes, Genetics, jkac321 (2023).
  8. M.J. Sommer, S. Cha, A. Varabyou, N. Rincon, S. Park, I. Minkin, M. Pertea, M. Steinegger, and S.L. Salzberg. Structure-guided isoform identification for the human transcriptome. eLife 11:e82556 (2022). https://doi.org/10.7554/eLife.82556.
  9. S. Nurk, S. Koren, A. Rhie, et al. (100+ authors including S. Salzberg and lab member Alaina Shumate). The complete sequence of a human genome. Science 376(6588), 44-53 (2022).
  10. P.J. Simner and S.L. Salzberg. The human “contaminome” and understanding infectious disease. New England Journal of Medicine, 387(10), 943-946 (2022).
  11. J. Lu, N. Rincon, D.E. Wood, F. Breitwieser, C. Pockrandt, B. Langmead, S.L. Salzberg, and M. Steinegger. Metagenome analysis using the Kraken software suite. Nature Protocols (2022), publ. online 28 Sept 2022. dx.doi.org/10.1038/s41596-022-00738-y
  12. C.R. Wensel, J.L. Pluznick, S.L. Salzberg, and C.L. Sears. Next-generation sequencing: insights to advance clinical investigations of the microbiome. Journal of Clinical Investigation 132(7):e154944, 2022.
  13. E.D. Jarvis, [et al.], and the Human Pangenome Reference Consortium. Semi-automated assembly of high-quality diploid human reference genomes. Nature 611, 519-531 (2022).
  14. A.V. Zimin and S.L. Salzberg. The SAMBA tool uses long reads to improve the contiguity of genome assemblies. PLoS Computational Biology, 18(2): e1009860. doi.org/10.1371/journal.pcbi.1009860 (2022).
  15. V.L. Sork, S.J. Cokus, S.T. Fitz-Gibbon, A.V. Zimin, D. Puiu, J.A. Garcia, P.F. Gugger, C.L. Henriquez, K.E. Lohmueller, M. Pellegrini, and S.L. Salzberg. High-quality genome and methylomes illustrate features underlying evolutionary success of oaks. Nature Communications 13:2047, doi: 10.1038/s41467-022-29584-y (2022).
  16. A.V. Zimin, A. Shumate, I. Shinder, J. Heinz, D. Puiu, M. Pertea, and S.L. Salzberg. A reference-quality, fully annotated genome from a Puerto Rican individual. Genetics, 220(2) iyab227 (2022). https://doi.org/10.1093/genetics/iyab227
  17. C. Pockrandt, M. Steinegger, and S.L. Salzberg. PhyloCSF++: A fast and user-friendly implementation of PhyloCSF with annotation tools. Bioinformatics 38:5, 1440-1442 (2022).
  18. A. Shumate and S.L. Salzberg. LiftoffTools: a toolkit for comparing gene annotations mapped between genome assemblies. F1000 Research, 2022, 11:1230. doi.org/10.12688/f1000research.124059.1
  20. D.B. Neale, A.V. Zimin, S. Zaman, A.D. Scott, B. Shrestha, R.E. Workman, D. Puiu, B.J. Allen, Z.J. Moore, M.K. Sekhwal, A.R. De La Torre, P.E. McGuire, E. Burns, W. Timp, J.L. Wegrzyn, and S.L. Salzberg. Assembled and annotated 26.5 Gbp coast redwood genome: a resource for estimating evolutionary adaptive potential and investigating hexaploid origin. G3: Genes, Genomes, Genetics, 12(1), jkab380, 2022. https://doi.org/10.1093/g3journal/jkab380.
  21. S.L. Salzberg and D.E. Wood. Releasing the Kraken. Frontiers in Bioinformatics 1:808003, doi: 10.3389/fbinf.2021.808003 (2021).
  22. A. Varabyou, C. Pockrandt, S.L. Salzberg, and M. Pertea. Rapid detection of inter-clade recombination in SARS-CoV-2 with Bolotie. Genetics (2021). https://doi.org/10.1093/genetics/iyab074
  23. M.J. Sommer and S.L. Salzberg. Balrog: A universal protein model for prokaryotic gene prediction. PLoS Computational Biology, 17(2): e1008727 (2021).
  24. A. Varabyou, S.L. Salzberg, and M. Pertea. Effects of transcriptional noise on estimates of gene and transcript expression in RNA sequencing experiments. Genome Research 31:301-308 (2021).
  25. A. Shumate and S.L. Salzberg. Liftoff: accurate mapping of gene annotations. Bioinformatics, advance online publication 15 December 2020, https://doi.org/10.1093/bioinformatics/btaa1016.
  26. J. Lu and S.L. Salzberg. Ultrafast and accurate 16S rRNA microbial community analysis using Kraken 2. Microbiome 8, Article number: 124 (2020).
  27. A.D. Scott, A.V. Zimin, D. Puiu, R. Workman, M. Britton, S. Zaman, M. Caballero, A.C. Read, A.J. Bogdanove, E. Burns, J. Wegrzyn, W. Timp, S.L. Salzberg, and D.B. Neale. A reference genome sequence for giant sequoia. G3: Genes, Genomes, Genetics. 2020 November 5;10(11):3907-3919.
  28. M. Alonge, A. Shumate, D. Puiu, A. Zimin, and S.L. Salzberg. Chromosome-scale assembly of the bread wheat genome reveals thousands of additional gene copies. Genetics 2020 October;216(2):599-608.
  29. J. Lu and S.L. Salzberg. SkewIT: The Skew Index Test for large-scale GC Skew analysis of bacterial genomes. PLoS Computational Biology 16(12): e1008439 (2020).
  30. Alaina Shumate, Aleksey V. Zimin, Rachel M. Sherman, Daniela Puiu, Justin M. Wagner, Nathan D. Olson, Mihaela Pertea, Marc L. Salit, Justin M. Zook & Steven L. Salzberg. Assembly and annotation of an Ashkenazi human reference genome. Genome Biology 21:129 (2020).
  31. M. Steinegger and S.L. Salzberg. Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. Genome Biology 21:115 (2020).
  32. R.M. Sherman and S.L. Salzberg. Pan-genomics in the human genome era. Nature Reviews Genetics, 21, 243–254 (2020).
  33. A.C. Read, M.J. Moscou, A.V. Zimin, G. Pertea, R.S. Meyer, M.D. Purugganan, J.E. Leach, L.R. Triplett, S.L. Salzberg, and A.J. Bogdanove. Genome assembly and characterization of a complex zfBED-NLR gene-containing disease resistance locus in Carolina Gold Select rice with Nanopore sequencing. PLoS Genetics, 16(1):e1008571 (2020).
  34. F.P. Breitwieser and S.L. Salzberg. Pavian: Interactive analysis of metagenomics data for microbiome studies and pathogen identification. Bioinformatics 36:4 (2020), 1303-1304.

2017-2019 (selected)

  1. S. Kovaka, A.V. Zimin, G.M. Pertea, R. Razaghi, S.L. Salzberg, and M. Pertea. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biology 20:278 (2019).
  2. D. Kim, J.M. Paggi, C. Park, C. Bennett, and S.L. Salzberg. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37:907-915 (2019).
  3. F.P. Breitwieser, M. Pertea, A.V. Zimin, and S.L. Salzberg. Human contamination in bacterial genomes has created thousands of spurious proteins. Genome Research 29:954-960 (2019).
  4. E.T. Saitta, R. Liang, M.C.Y. Lau, C.M. Brown, N.R. Longrich, T.G. Kaye, B.J. Novak, S.L. Salzberg, M.A. Norell, G.D. Abbott, M.R. Dickinson, J. Vinther, I.D. Bull, R.A. Brooker, P. Partin, P. Donohoe, T.D.J. Knowles, K.E.H. Penkman, and T. Onstott. Cretaceous dinosaur bone contains recent organic material and provides an environment conducive to microbial communities. eLife 2019;8:e46205.
  5. S.L. Salzberg. Next-generation genome annotation: we still struggle to get it right. Genome Biology 20:92 (2019), doi.org/10.1186/s13059-019-1715-2.
  6. R.M. Sherman, J. Forman, V. Antonescu, D. Puiu, […], K.C. Barnes, and S.L. Salzberg. Assembly of a pan-genome from deep sequencing of 910 humans of African descent. Nature Genetics 51, 30-35 (2019).
  7. R. Wilton, S.J. Wheelan, A.S. Szalay, and S.L. Salzberg. The Terabase Search Engine: a large-scale relational database of short-read sequences. Bioinformatics, 35:4, 665-70 (2019).
  8. M. Pertea, A. Shumate, G. Pertea, A. Varabyou, F.P. Breitwieser, Y.-C. Chang, A.K. Madugundu, A. Pandey, and S.L. Salzberg. CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise. Genome Biology 19:208 (2018).
  9. F.P. Breitwieser, D.N. Baker, and S.L. Salzberg. KrakenUniq: Confident and fast metagenomics classification using unique k-mer counts. Genome Biology 19:198 (2018).
  10. J. Lu and S.L. Salzberg. Removing contaminants from databases of draft genomes. PLoS Computational Biology, 14(6): e1006277 (2018).
  11. S.L. Salzberg. Open questions: How many genes do we have? BMC Biology 16:94 (2018) https://rdcu.be/4TrD.
  12. K.A. Stevens, K. Woeste, S. Chakraborty, M.W. Crepeau, C.A. Leslie, P.J. Martínez-García, D. Puiu, J. Romero-Severson, M. Coggeshall, A.M. Dandekar, D. Klupfel, D.B. Neale, S.L. Salzberg, and C.H. Langley. Genomic variation among and within six Juglans species. G3: Genes, Genomes, Genetics, 8:7 (2018), 2153-2165.
  13. G. Marçais, A.L. Delcher, A.M. Phillippy, R. Coston, S.L. Salzberg, and A. Zimin. MUMmer4: A fast and versatile genome alignment system. PLoS Computational Biology, 14(1): e1005944 (2018).
  14. Z. Li, F.P. Breitwieser, J. Lu, A.S. Jun, L. Asnaghi, S.L. Salzberg, and C.G. Eberhart. Identifying corneal infections in formalin fixed specimens using next generation sequencing. Investigative Ophthalmology & Visual Science 59:1 (2018), 280-288.
  15. A.V. Zimin, D. Puiu, R. Hall, S. Kingan, B.J. Clavijo, and S.L. Salzberg. The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum. GigaScience, 6:11 (2017) 1-7.
  16. M.-C. Luo, Y.Q. Gu, D. Puiu, […], A.V. Zimin, G. Pertea, […], S.L. Salzberg*, K.M. Devos*, and Jan Dvořák*. Genome sequence of the progenitor of the wheat D-genome Aegilops tauschii. Nature 551:7681 (2017), 498-502. (*Co-corresponding authors)
  17. S.L. Salzberg. Horizontal gene transfer is not a hallmark of the human genome. Genome Biology 18:85 (2017).
  18. R. Luo, A. Zimin, R. Workman, Y. Fan, G. Pertea, N. Grossman, M.P. Wear, B. Jia, H. Miller, A. Casadevall, W. Timp, S.X. Zhang, and S.L. Salzberg. First draft genome sequence of the pathogenic fungus Lomentospora prolificans (formerly Scedosporium prolificans). G3: Genes, Genomes, Genetics 7:11 (2017), 3831-3836.
  19. F. Breitwieser, J. Lu, and S.L. Salzberg. A review of methods and databases for metagenomic classification and assembly. Briefings in Bioinformatics 20:4 (2019), 1125-39. doi.org/10.1093/bib/bbx120.
  20. D.B. Neale, P.E. McGuire, N.C. Wheeler, K.A. Stevens, M.W. Crepeau, C. Cardeno, A.V. Zimin, D. Puiu, G.M. Pertea, U.U. Sezen, C. Casola, T. Koralewski, R. Paul, D. Gonzalez-Ibeas, S. Zaman, R. Cronn, M. Yandell, C. Holt, C.H. Langley, J.A. Yorke, S.L. Salzberg, and J.L. Wegrzyn. The Douglas-fir genome sequence reveals specialization of the photosynthetic apparatus in Pinaceae. G3: Genes, Genomes, Genetics 7:9 (2017), 3157-3167.
  21. R. Luo, M.C. Schatz, and S.L. Salzberg. 16GT: a fast and sensitive variant caller using a 16-genotype probabilistic model. GigaScience (2017) 6(7):1-4.
  22. A.V. Zimin, K.A. Stevens, M.W. Crepeau, D. Puiu, J.L. Wegrzyn, J.A. Yorke, C.H. Langley, D.B. Neale, and S.L. Salzberg. An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing. GigaScience 6(1):1-4 (2017).
  23. A.V. Zimin, D. Puiu, M.-C. Luo, T. Zhu, S. Koren, G. Marçais, J.A. Yorke, J. Dvorak, and S.L. Salzberg. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the mega-reads algorithm. Genome Research 27: 787-792 (2017).
  24. J. Lu, F.P. Breitwieser, P. Thielen, and S.L. Salzberg. Bracken: estimating species abundance in metagenomics data. PeerJ Computer Science 3:e104 (2017).

2015-2016 (selected)

  1. D. Kim, L. Song, F.P. Breitwieser, and S.L. Salzberg. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Research 26:12, 1721-29 (2016).
  2. M. Pertea, D. Kim, G. Pertea, J.T. Leek, and S.L. Salzberg. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie, and Ballgown. Nature Protocols 11, 1650–1667 (2016).
  3. S.L. Salzberg, F.P. Breitwieser, A. Kumar, H. Hao, P. Burger, F.J. Rodriguez, M. Lim, A. Quiñones-Hinojosa, G.L. Gallia, J.A. Tornheim, M.T. Melia, C.L. Sears and C.A. Pardo. Next-generation sequencing in neuropathological diagnosis of infections of the nervous system. Neurology: Neuroimmunology & Neuroinflammation, 3:4 (2016), e251.
  4. V.L. Sork, S.T. Fitz-Gibbon, D. Puiu, M. Crepeau, P.F. Gugger, R. Sherman, K. Stevens, C.H. Langley, M. Pellegrini, and S.L. Salzberg. First draft assembly and annotation of the genome of a California endemic oak, Quercus lobata Née (Fagaceae). G3: Genes, Genomes, Genetics 6 (2016), 3485–3495.
  5. P.J. Martínez-García, et al. The walnut (Juglans regia) genome sequence reveals diversity in genes coding for the biosynthesis of nonstructural polyphenols. The Plant Journal (2016), DOI: 10.1111/tpj.13207.
  6. M. Pertea, G.M. Pertea, C.M. Antonescu, T.C. Chang, J.T. Mendell, and S.L. Salzberg (2015). StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology 33 (3), 290-295. Publ. online 18 February 2015.
  7. D. Kim, B. Langmead, and S.L. Salzberg. HISAT: a fast spliced aligner with low memory requirements. Nature Methods 12, 357-360 (2015), publ. online 9 March 2015.
  8. A.C. Frazee, G. Pertea, A.E. Jaffe, B. Langmead, S.L. Salzberg, and J.T. Leek. Ballgown bridges the gap between transcriptome assembly and expression analysis. Nature Biotechnology 33, 243-246 (2015).
  9. K.M. Kapheim et al. Genomic signatures of evolutionary transitions from solitary to group living. Science 348:6239 (2015), 1139-43.
  10. F.P. Breitwieser, C.A. Pardo, and S.L. Salzberg. Re-analysis of metagenomic sequences from acute flaccid myelitis patients reveals alternatives to enterovirus D68 infection. F1000 Research, 2015, 4:180.
  11. T.-C. Chang, M. Pertea, S. Lee, S.L. Salzberg, and J.T. Mendell. Genome-wide annotation of microRNA primary transcript structures reveals novel regulatory mechanisms. Genome Research 25, 1401-9 (2015).
  12. M. Pop and S.L. Salzberg. Use and mis-use of supplementary material in science publications. BMC Bioinformatics (2015), 16:237.
  13. B.M. Sadd et al. The genomes of two key bumblebee species with primitive eusocial organization. Genome Biology 16:76 (2015).
  14. R. Wilton, T. Budavari, B. Langmead, S. Wheelan, S.L. Salzberg, and A.S. Szalay. Arioc: high-throughput read alignment with GPU-accelerated exploration of the seed-and-extend search space. PeerJ (2015) 3:e808.
  15. K. Deng, M. Pertea, et al. Broad CTL response is required to clear latent HIV-1 due to dominance of escape mutations. Nature 517(7534), 381-385 (2015).
  16. N.J. Booher et al. Single molecule real-time sequencing of Xanthomonas oryzae genomes reveals a dynamic structure and complex TAL (transcription activator-like) effector gene relationships. Microbial Genomics 1:4 (2015).

2013-2014 (selected)

  1. Kraken: ultrafast metagenomic sequence classification using exact alignments. D.E. Wood and S.L. Salzberg. Genome Biology 2014, 15:R46.
  2. A new rhesus macaque assembly and annotation for next-generation sequencing analyses. A.V. Zimin et al. Biology Direct 9:20 (2014).
  3. Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies. D.B. Neale et al. Genome Biology 2014, 15:R59.
  4. S. Merchant, D. Wood and S.L. Salzberg. Unexpected cross-species contamination in genome sequencing projects. PeerJ 2:e675 (2014).
  5. Sequencing and assembly of the 22-Gb loblolly pine genome. A. Zimin et al. Genetics (2014) 196:3, 875-890.
  6. DIAMUND: Direct comparison of genomes to detect mutations. S.L. Salzberg, M. Pertea, J.A. Fahrner, and N. Sobreira. Human Mutation 35:3 (2014), 283-288.
  7. V.G. Martinson, T. Magoc, H. Koch, S.L. Salzberg, and N.A. Moran. Genomic features of a bumble bee symbiont reflect its host environment. Applied and Environmental Microbiology 80:13 (2014), 3793-3803.
  8. GAGE-B: An evaluation of genome assemblers for bacterial organisms.  T. Magoc, S. Pabinger, S. Canzar, X. Liu, Q. Su, D. Puiu, L.J. Tallon, and S.L. Salzberg. Bioinformatics 29:14 (2013), 1718-1725.
  9. The MaSuRCA genome assembler. A.V. Zimin et al. Bioinformatics 29(21), 2669-2677 (2013).
  10. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions.  D. Kim, G. Pertea, C. Trapnell, H. Pimentel, R. Kelley, and S.L. Salzberg.  Genome Biology 2013, 14: R36.
  11. NIH funding: It does support innovators.  S.L. Salzberg.  Nature (2013) 493, 26.
  12. Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies. M.C. Schatz et al., Briefings in Bioinformatics 14(2), 213-224 (2013).
  13. Genome-guided transcriptome assembly in the age of next-generation sequencing. L.D. Florea and S.L. Salzberg. IEEE/ACM Trans. on Comp. Biology and Bioinf. 10:5 (2013), 1234-1240.
  14. Thousands of exon skipping events differentiate among splicing patterns in sixteen human tissues. L. Florea, L. Song, and S.L. Salzberg. F1000 Research 2013, 2:188.
  15. EDGE-pro: Estimated Degree of Gene Expression in prokaryotic genomes.  T. Magoc, D. Wood, and S.L. Salzberg. Evolutionary Bioinformatics (2013), 9:127.

2011-2012 (selected)

  1. Fast gapped-read alignment with Bowtie 2.  B. Langmead and S.L. Salzberg.  Nature Methods 9 (2012), 357-359.
  2. The perils of gene patents.  S.L. Salzberg. Clinical Pharmacology & Therapeutics 91:6 (2012), 969-971.
  3. Butterfly genome reveals promiscuous exchange of mimicry adaptations among species.  The Heliconius Genome Consortium. Nature 487 (2012), 94-98.
  4. GAGE: A critical evaluation of genome assemblies and assembly algorithms. S.L. Salzberg, A.M. Phillippy, A.V. Zimin, D. Puiu, T. Magoc, S. Koren, T. Treangen, M.C. Schatz, A.L. Delcher, M. Roberts, G. Marcais, M. Pop, and J.A. Yorke. Genome Research 22:3 (2012), 557-567.
  5. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. C. Trapnell, A. Roberts, L. Goff, G. Pertea, D. Kim, D.R. Kelley, H. Pimentel, S.L. Salzberg, J.L. Rinn, and L. Pachter.  Nature Protocols 7:3 (2012), 562-578.
  6. Repetitive DNA and next-generation sequencing: computational challenges and solutions. T.J. Treangen and S.L. Salzberg. Nature Reviews Genetics 13 (2012), 36-46.
  7. Mis-assembled segmental duplications in two versions of the Bos taurus genome. A.V. Zimin, D.R. Kelley, M. Roberts, S.L. Salzberg, and J.A. Yorke. PLoS ONE 7(8): e42680 (2012).
  8. Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering. D.R. Kelley, B. Liu, A.L. Delcher, M. Pop, and S.L. Salzberg.  Nucleic Acids Research (2012) 40 (1): e9.
  9. TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. D. Kim and S.L. Salzberg. Genome Biology 2011, 12:R72.
  10. FLASH: Fast length adjustment of short reads to improve genome assemblies. T. Magoc and S.L. Salzberg. Bioinformatics 27:21 (2011), 2957-63.
  11. PhymmBL expanded: confidence scores, custom databases, parallelization and more. A. Brady and S. Salzberg. Nature Methods 8, 367 (2011).
  12. Two new complete genome sequences offer insight into host and tissue specificity of plant pathogenic Xanthomonas spp. A.J. Bogdanove, R. Koebnik, H. Lu, …, and S.L. Salzberg. J. Bacteriology 193:19 (2011), 5450-64.
  13. Complete Columbian mammoth mitogenome suggests interbreeding with woolly mammoths.  J. Enk et al., Genome Biology 2011, 12:R51.
  14. Detection of lineage-specific evolutionary changes among primate species.  M. Pertea, G.M. Pertea, and S.L. Salzberg.  BMC Bioinformatics 2011, 12:274.
  15. Mugsy: Fast multiple alignment of closely related whole genomes.  S. V. Angiuoli and S.L. Salzberg.  Bioinformatics (2011), 27(3), 334-342.
  16. Bacillus anthracis comparative genome analysis in support of the Amerithrax investigation.  D. Rasko et al.  Proc. National Acad. Sci. 108:12 (2011), 5027-32.
  17. The genome of woodland strawberry (Fragaria vesca). V. Shulaev et al.  Nature Genetics 43 (2011), 109-116.

2009-2010 (selected)

  1. Do-it-yourself genetic testing. S.L. Salzberg and M. Pertea. Genome Biology 2010, 11:404.
  2. Cloud computing and the DNA data race.  M.C. Schatz, B. Langmead, and S.L. Salzberg.  Nature Biotechnology 28, 691-693 (2010).
  3. Quake: quality-aware detection and correction of sequencing errors.  D.R. Kelley, M.C. Schatz, and S.L. Salzberg. Genome Biology 2010, 11:R116.
  4. Between a chicken and a grape: estimating the number of human genes. M. Pertea and S.L. Salzberg. Genome Biology 2010, 11:206.
  5. Assembly of large genomes using second-generation sequencing.  M.C. Schatz, A.L. Delcher, and S.L. Salzberg.  Genome Research 20 (2010), 1165-1173.
  6. [the Cufflinks paper] Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. C. Trapnell, B.A. Williams, G. Pertea, A. Mortazavi, G. Kwan, M.J. van Baren, S.L. Salzberg, B.J. Wold, and L. Pachter. Nature Biotechnology 28, 511-515 (2010).
  7. Recent advances in RNA sequence analysis.  S.L. Salzberg. F1000 Biology Reports 2010, 2:64.
  8. Mind the gaps.  S.L. Salzberg, Nature Methods 7:2 (2010), 105-6.
  9. [the turkey genome] Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. R.A. Dalloul et al. PLoS Biology 8(9): e1000475 (2010).
  10. Searching for SNPs with cloud computing. B. Langmead, M.C. Schatz, J. Lin, M. Pop, and S.L. Salzberg. Genome Biology 2009, 10:R134.
  11. Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. A. Brady and S.L. Salzberg. Nature Methods, 6:673-676, 2009.
  12. [the Bowtie paper] Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.  B. Langmead, C. Trapnell, M. Pop, and S.L. Salzberg. Genome Biology 2009, 10:R25. doi:10.1186/gb-2009-10-3-r25.
  13. [the TopHat paper] TopHat: discovering splice junctions with RNA-Seq.  C. Trapnell, L. Pachter, and S.L. Salzberg.  Bioinformatics, 25:1105-11, 2009.
  14. A whole-genome assembly of the domestic cow, Bos taurus. A.V. Zimin, A.L. Delcher, L. Florea, D.R. Kelley, M.C. Schatz, D. Puiu, F. Hanrahan, G. Pertea, C.P. Van Tassell, T.S. Sonstegard, G. Marcais, M. Roberts, P. Subramanian, J.A. Yorke, and S.L. Salzberg. Genome Biology 2009, 10:R42.
  15. How to map billions of short reads onto genomes.  C. Trapnell and S.L. Salzberg.  Nature Biotechnology 27:5 (2009), 455-7.
  16. OperonDB: a comprehensive database of predicted operons in microbial genomes.  M. Pertea, K. Ayanbule, M. Smedinghoff, and S.L. Salzberg. Nucleic Acids Research 2009, 37:D479-482.

2007-2008 (selected)

  1. Re-assembly of the genome of Francisella tularensis subsp. holarctica OSU18.  D. Puiu and S.L. Salzberg, PLoS ONE 3:10 (2008): e3427.
  2. The complete genome sequence of Bacillus anthracis Ames “Ancestor.”  J. Ravel et al. J. Bacteriology 191:1 (2009), 445-446.
  3. Comparative genomics of the neglected human malaria parasite Plasmodium vivax.  J.M. Carlton et al. Nature 455 (2008), 757-763.
  4. Genome sequence and rapid evolution of the rice pathogen Xanthomonas oryzae pv. oryzae PXO99A.  S.L. Salzberg et al., BMC Genomics 9:204 (2008).
  5. Gene-Boosted Assembly of a Novel Bacterial Genome from Very Short Reads.  S.L. Salzberg, D.D. Sommer, D. Puiu, and V.T. Lee. PLoS Computational Biology 4:9 (2008): e1000186.
  6. Bioinformatics challenges of new sequencing technology.  M. Pop and S.L. Salzberg. Trends in Genetics 24:3 (2008), 142-149.
  7. The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus).  R. Ming et al. Nature 452 (2008), 991-6.
  8. Automated eukaryotic gene structure annotation using EVidenceModeler.  B.J. Haas, S.L. Salzberg, et al. Genome Biology 2008, 9:R7.
  9. Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake. C.L. Kingsford, K. Ayanbule, and S.L. Salzberg.  Genome Biology 2007, 8(2):R22.
  10. Genome analysis linking recent European and African influenza (H5N1) viruses.  Steven L. Salzberg, Carl Kingsford, Giovanni Cattoli, David J. Spiro, Daniel A. Janies, Mona Mehrez Aly, Ian H. Brown, Emmanuel Couacy-Hymann, Gian Mario De Mia, Do Huu Dung, Annalisa Guercio, Tony Joannis, Ali Safar Maken Ali, Azizullah Osmani, Iolanda Padalino, Magdi D. Saad, Vladimir Savić, Naomi A. Sengamalay, Samuel Yingst, Jennifer Zaborsky, Olga Zorman-Rojs, Elodie Ghedin, and Ilaria Capua. Emerging Infectious Diseases 13:5 (2007).
  11. A unified model explaining the offsets of overlapping and near-overlapping prokaryotic genes.  C. Kingsford, A.L. Delcher, and S.L. Salzberg.  Molec. Biol. and Evol. 24:9 (2007), 2091-98.
  12. Identifying bacterial genes and endosymbiont DNA with Glimmer. A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Bioinformatics 2007 Mar 15;23(6):673-9. This is the Glimmer 3 paper.
  13. Draft Genome of the Filarial Nematode Parasite Brugia malayi. E. Ghedin et al., Science 317:5845 (2007), 1756-60.
  14. Comprehensive DNA signature discovery and validation.  A.M. Phillippy, J.A. Mason, K. Ayanbule, D.D. Sommer, E. Taviani, A. Huq, R.R. Colwell, I.T. Knight, and S.L. Salzberg.  PLoS Computational Biology 3:5 (2007), e98.
  15. Hawkeye: an interactive visual analytics tool for genome assemblies.  M. Schatz, A.M. Phillippy, B. Shneiderman, and S.L. Salzberg.  Genome Biology 2007 Mar 9;8(3):R34.
  16. Minimus: a fast, lightweight genome assembler.  D.D. Sommer, A.L. Delcher, S.L. Salzberg, and M. Pop.  BMC Bioinformatics 2007 Feb 26;8:64.
  17. Draft Genome Sequence of the Sexually Transmitted Pathogen Trichomonas vaginalis. J.M. Carlton, et al., Science 315 (2007), 207-212.

2006 and earlier (selected)

  1. A phylogenetic generalized hidden Markov model for predicting alternatively spliced exons. J.E. Allen and S.L. Salzberg. Algorithms for Molecular Biology 1:14 (2006).
  2. It is time to end the patenting of software. J. Quackenbush and S.L. Salzberg. Bioinformatics 22:12 (2006), 1416-7.
  3. JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions. J.E. Allen, W.H. Majoros, M. Pertea, and S.L. Salzberg. Genome Biology 2006, 7(Suppl):S9.
  4. Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote. J.A. Eisen, et al. PLoS Biology 4:9 (2006): e286.
  5. Physiogenomic resources for rat models of heart, lung and blood disorders. R.L. Malek et al. Nature Genetics 38 (2006), 234-239.
  6. Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution. E. Ghedin, N.A. Sengamalay, M. Shumway, J. Zaborsky, T. Feldblyum, V. Subbu, D.J. Spiro, J. Sitz, H. Koo, P. Bolotov, D. Dernovoy, T. Tatusova, Y. Bao, K. St George, J. Taylor, D.J. Lipman, C.M. Fraser, J.K. Taubenberger, and S.L. Salzberg. Nature 437 (2005), 1162-1166.
  7. Whole-genome analysis of human influenza A virus reveals multiple persistent lineages and reassortment among recent H3N2 viruses. E.C. Holmes, E. Ghedin, N. Miller, J. Taylor, Y. Bao, K. St. George, B.T. Grenfell, S.L. Salzberg, C.M. Fraser, D.J. Lipman, and J.K. Taubenberger. PLoS Biology 3:9 (2005), e300.
  8. Efficient implementation of a generalized pair hidden Markov model for comparative gene finding. W.H. Majoros, M. Pertea, and S.L. Salzberg. Bioinformatics 21:9 (2005), 1782-88.
  9. Efficient decoding algorithms for generalized hidden Markov model gene finders. W.H. Majoros, M. Pertea, A.L. Delcher, and S.L. Salzberg. BMC Bioinformatics 6 (2005), 16.
  10. Comparative genomics of Trypanosomatid parasitic protozoa. N.M. El-Sayed et al. Science 309 (2005), 404-409.
  11. The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans. B.J. Loftus et al. Science 307 (Feb 25 2005), 1321-4.
  12. Serendipitous discovery of Wolbachia genomes in multiple Drosophila species.  S.L. Salzberg, J.C. Dunning Hotopp, A.L. Delcher, M. Pop, D.R. Smith, M.B. Eisen, and W.C. Nelson. Genome Biology 2005, 6:R23.
  13. The genome assembly archive: a new public resource. S.L. Salzberg, D. Church, M. DiCuccio, E. Yaschenko, and J. Ostell. PLoS Biology 2:9 (2004), 1273-1275.
  14. Comparative genome assembly. M. Pop, A. Phillippy, A.L. Delcher, and S.L. Salzberg. Briefings in Bioinformatics 5:3 (2004), 237-248.
  15. Automated correction of genome sequence errors. P. Gajer, M. Schatz, and S.L. Salzberg. Nucleic Acids Research 32:2 (2004), 562-569.
  16. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. W.H. Majoros, M. Pertea, and S.L. Salzberg. Bioinformatics 20:16 (2004), 2878-79.
  17. Genomic insights into methanotrophy: the complete genome sequence of Methylococcus capsulatus (Bath). N. Ward, et al., PLoS Biology 2:10 (2004), e303.
  18. An empirical analysis of training protocols for probabilistic gene finders. W.H. Majoros and S.L. Salzberg. BMC Bioinformatics 5 (2004), 206.
  19. Versatile and open software for comparing large genomes. S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg. Genome Biology 5:R12 (2004). This is the MUMmer3 paper.
  20. DAGChainer: A tool for mining segmental genome duplications and synteny. B.J. Haas, A.L. Delcher, J.R. Wortman, and S.L. Salzberg. Bioinformatics 20:18 (2004), 3643-6.
  21. Hierarchical scaffolding with Bambus. M. Pop, D. Kosack, and S.L. Salzberg. Genome Research 14 (2004), 149-159.
  22. Computational gene prediction using multiple sources of evidence. J.E. Allen, M. Pertea, and S.L. Salzberg. Genome Research 14 (2004), 142-148.
  23. Yeast rises again. S.L. Salzberg, Nature 423 (2003), 233-234.
  24. Comparative genome sequencing for discovery of novel polymorphisms in Bacillus anthracis. T.D. Read, S.L. Salzberg, M. Pop, M. Shumway, L. Umayam, L. Jiang, E. Holtzapple, J. Busch, K.L. Smith, J.M. Schupp, D. Solomon, P. Keim, and C.M. Fraser. Science 296 (2002), 2028-2033.
  25. Genome sequence of the human malaria parasite Plasmodium falciparum. M.J. Gardner et al., Nature 419 (2002), 498-511.
  26. The genome sequence of the malaria mosquito Anopheles gambiae. R.A. Holt et al., Science 298 (2002), 129-149.
  27. Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. E.M. Zdobnov et al., Science 298 (2002), 149-159.
  28. Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. J.M. Carlton et al., Nature 419 (2002), 512-519.
  29. A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome. R.J. Mural et al. (176 authors). Science 296 (2002), 1661-1671.
  30. Fast algorithms for large-scale genome alignment and comparison. A.L. Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg. Nucleic Acids Research 30:11 (2002), 2478-2483. This is the MUMmer 2 paper.
  31. Full-length messenger RNA sequences greatly improve genome annotation. B.J. Haas, N. Volfovsky, C.D. Town, M. Troukhan, N. Alexandrov, K.A. Feldmann, R.B. Flavell, O. White, and S.L. Salzberg. Genome Biology 3:6 (2002), research0029.1-12.
  32. Microbial Genes in the Human Genome: Lateral Transfer or Gene Loss? S.L. Salzberg, O. White, J. Peterson, and J.A. Eisen. Science 292 (2001), 1903-1906. See also the Perspective in Science. An annotated version of this paper, designed to help students and teachers of science, was developed by the SCOPE project and the editors of Science.
  33. The Sequence of the Human Genome. J. Craig Venter et al. (274 authors), Science 291 (2001), 1304-1351. Get the figures showing genome-scale duplications in PDF format here: [Page 1] [Page 2]
  34. GeneSplicer: a new computational method for splice site prediction. M. Pertea, X. Lin, and S.L. Salzberg. Nucleic Acids Research 29:5 (2001), 1185-1190.
  35. A probabilistic method for identifying start codons in bacterial genomes. B.E. Suzek, M.D. Ermolaeva, M. Schreiber, and S.L. Salzberg. Bioinformatics 17:12 (2001), 1123-1130.
  36. Prediction of operons in microbial genomes. M.D. Ermolaeva, O. White and S.L. Salzberg. Nucleic Acids Research 29:5 (2001), 1216-1221.
  37. A clustering method for repeat analysis in DNA sequences. N. Volfovsky, B.J. Haas, and S.L. Salzberg. Genome Biology 2:8 (2001), research0027:1-11.
  38. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. The Arabidopsis Genome Initiative (143 authors), Nature 408 (2000), 796-815. (Also contains links to our papers on chromosomes 1, 2, and 3 of Arabidopsis.)
  39. Evidence for symmetric chromosomal inversions around the replication origin in bacteria. J.A. Eisen, J.F. Heidelberg, O. White, and S.L. Salzberg. Genome Biology 1:6 (2000), 1-9.
  40. Microbial genome sequencing. C.M. Fraser, J.A. Eisen, and S.L. Salzberg. Nature 406 (2000), 799-803.
  41. Finding genes in Plasmodium falciparum chromosome 3. M. Pertea, S.L. Salzberg, and M.J. Gardner. Nature 404 (2000), 34.
  42. DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. J.F. Heidelberg et al., Nature 406 (2000), 477-483.
  43. Genome sequences of Chlamydia trachomatis MoPn and C. pneumoniae AR39. Timothy D. Read et al., Nucleic Acids Research 28:6 (2000), 1397-1406.
  44. Gene Index analysis of the human genome estimates approximately 120,000* genes. F. Liang, I.E. Holt, G. Pertea, S. Karamycheva, S.L. Salzberg, and J. Quackenbush. Nature Genetics 25:2 (2000), 239-240. *Estimate corrected to 56,000 genes; Nature Genetics 26:4 (2000), 501.
  45. An optimized protocol for analysis of EST sequences. F. Liang, I.E. Holt, G. Pertea, S. Karamycheva, S.L. Salzberg, and J. Quackenbush. Nucleic Acids Research 28:18 (2000), 3657-3665.
  46. Prediction of transcription terminators in bacterial genomes. M.D. Ermolaeva, H. Khalak, O. White, H.O. Smith, and S.L. Salzberg. J. Molecular Biology 301 (2000), 27-33.
  47. Sequence and analysis of chromosome 2 of Arabidopsis thaliana. Xiaoying Lin et al., Nature 402 (1999), 761-768.
  48. Complete genome sequence of Neisseria meningitidis serogroup B strain MC58. H. Tettelin et al. Science 287 (2000), 1809-1815.
  49. Improved microbial gene identification with GLIMMER. A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg. Nucleic Acids Research, 27:23 (1999), 4636-4641. This is the Glimmer2 paper.
  50. Interpolated Markov models for eukaryotic gene finding. S.L. Salzberg, M. Pertea, A.L. Delcher, M.J. Gardner, and H. Tettelin. Genomics, 59 (1999), 24-31.
  51. Alignment of whole genomes. A.L. Delcher, S. Kasif, R.D. Fleischmann, J. Peterson, O. White, and S.L. Salzberg. Nucleic Acids Research, 27:11 (1999), 2369-2376. Note that Figure 6 was meant to be in color but was mistakenly printed in black and white; a color version of the figure is available. This is the original MUMmer paper.
  52. Optimized multiplex PCR: Efficiently closing a whole-genome shotgun sequencing project. H. Tettelin, D. Radune, S. Kasif, H. Khouri, and S.L. Salzberg. Genomics 62 (1999), 500-507.
  53. Genome Sequence of the Radioresistant Bacterium Deinococcus radiodurans R1. O. White et al. Science 286 (1999), 1571-1577.
  54. DNA uptake signal sequences in naturally transformable bacteria. H.O. Smith, M.L. Gwinn, and S.L. Salzberg. Research in Microbiology, 150 (1999), 603-616.
  55. Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima. K.E. Nelson et al., Nature 399 (1999), 323-329.
  56. Book: Computational Methods in Molecular Biology (1998; in paperback since 1999) edited by S.L. Salzberg, D.B. Searls, and S. Kasif.
  57. Chromosome 2 sequence of the human malaria parasite Plasmodium falciparum. M.J. Gardner et al., Science 282 (1998), 1126-1132.
  58. Microbial gene identification using interpolated Markov models. S.L. Salzberg, A.L. Delcher, S. Kasif, and O. White. Nucleic Acids Research, 26:2 (1998), 544-548. This is the original Glimmer paper.
  59. A decision tree system for finding genes in DNA. S.L. Salzberg, A.L. Delcher, K. Fasman, and J. Henderson. Journal of Computational Biology 5:4 (1998), 667-680.
  60. Skewed oligomers and origins of replication. S.L. Salzberg, A.J. Salzberg, A.R. Kerlavage, and J.-F. Tomb. Gene 217:1-2 (1998), 57-67.
  61. Complete Genomic Sequence of Treponema pallidum, the Syphilis Spirochete. C.M. Fraser et al. Science 281 (1998), 375-388.
  62. Finding Genes in Human DNA with a Hidden Markov Model. J. Henderson, S.L. Salzberg, and K. Fasman. Journal of Computational Biology 4:2 (1997), 127-141.
  63. A Method for Identifying Splice Sites and Translational Start Sites in Eukaryotic mRNA. S.L. Salzberg. Computer Applications in the Biosciences (CABIOS) 13:4 (1997), 365-376.
  64. Genomic Sequence of a Lyme Disease Spirochaete, Borrelia burgdorferi. C.M. Fraser et al., Nature 390 (1997), 580-586.
  65. Locating Protein Coding Regions in Human DNA using a Decision Tree Algorithm. S.L. Salzberg. Journal of Computational Biology, 2:3 (1995), 473-485.