Cotton pan-genome retrieves the lost sequences and genes during domestication and selection

Li, J.; Yuan, D.; Wang, P.; Wang, Q.; Sun, M.; Liu, Z.; Si, H.; Xu, Z.; Ma, Y.; Zhang, B.; Pei, L.; Tu, L.; Zhu, L.; Chen, L.-L.; Lindsey, K.; Zhang, X.; Jin, S.; Wang, M.




Background: Millennia of directional human selection has reshaped the genomic architecture of cultivated cotton relative to wild counterparts, but we have limited understanding of the selective retention and fractionation of genomic components.

Results: We construct a comprehensive genomic variome based on 1961 cottons and identify 456 Mb and 357 Mb of sequence with domestication and improvement selection signals and 162 loci, 84 of which are novel, including 47 loci associated with 16 agronomic traits. Using pan-genome analyses, we identify 32,569 and 8851 non-reference genes lost from Gossypium hirsutum and Gossypium barbadense reference genomes respectively, of which 38.2% (39,278) and 14.2% (11,359) of genes exhibit presence/absence variation (PAV). We document the landscape of PAV selection accompanied by asymmetric gene gain and loss and identify 124 PAVs linked to favorable fiber quality and yield loci.

Conclusions: This variation repertoire points to genomic divergence during cotton domestication and improvement, which informs the characterization of favorable gene alleles for improved breeding practice using a pan-genome-based approach.


Li, J., Yuan, D., Wang, P., Wang, Q., Sun, M., Liu, Z., …Wang, M. (2021). Cotton pan-genome retrieves the lost sequences and genes during domestication and selection. Genome Biology, 22, Article 119.

Journal Article Type: Article
Acceptance Date: Apr 14, 2021
Online Publication Date: Apr 23, 2021
Publication Date: 2021
Deposit Date: Apr 28, 2021
Publicly Available Date: Apr 30, 2021
Journal: Genome Biology
Print ISSN: 1474-760X
Publisher: BioMed Central
Peer Reviewed: Not Peer Reviewed
Volume: 22
Article Number: 119



Copyright Statement
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.
