A Python-based programming language for high-performance computational genomics

To the Editor: The vast growth of next-generation sequencing data has given us a new understanding of many biological phenomena. As sequencing technologies evolve, each new sequencing data type (such as standard Illumina short reads, PacBio long reads or 10x Genomics barcoded reads) typically requires new implementations of the corresponding computational analysis techniques. This calls for software that is not only computationally efficient but also quick to develop and easy to maintain, so that it can be adapted rapidly to new kinds of data.

Fig. 1: The Seq programming language.

Author information

Author notes

  1. These authors contributed equally: Ariya Shajii, Ibrahim Numanagić.

Affiliations

  1. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA

    Ariya Shajii, Ibrahim Numanagić, Alexander T. Leighton, Saman Amarasinghe & Bonnie Berger

  2. Department of Computer Science, University of Victoria, Victoria, British Columbia, Canada

    Ibrahim Numanagić & Haley Greenyer

  3. Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA

    Alexander T. Leighton & Bonnie Berger

Corresponding authors

Correspondence to Saman Amarasinghe or Bonnie Berger.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Biotechnology thanks Ivan Costa and Judith Zaugg for their contribution to the peer review of this work.

About this article

Cite this article

Shajii, A., Numanagić, I., Leighton, A.T. et al. A Python-based programming language for high-performance computational genomics. Nat Biotechnol (2021). https://doi.org/10.1038/s41587-021-00985-6

  • DOI: https://doi.org/10.1038/s41587-021-00985-6
