
Questions & Answers

What is the article about?

The article presents a high-quality reference genome sequence of the bread wheat variety Chinese Spring. This is essentially a highly detailed map of the wheat genome DNA ordered along the 21 wheat chromosomes. This genome sequence is the most complete and best ordered sequence that has been produced for wheat, the most widely grown crop in the world.

The article also reports the precise location of 107,891 genes and more than 4 million molecular markers along the 21 chromosomes. The genes and markers are identified “in context”, meaning that they have been positioned on their specific sub-genomes and the sequence information in between the genes and markers is also described, providing a comprehensive view of the organization of the genes and the regions important for their regulation.

Furthermore, the article presents information on where and when the nearly 108,000 genes are active (or expressed) under different conditions, including different growth stages of the wheat plant and certain stress conditions.

Who participated?

The work is the result of a collaborative interdisciplinary project involving teams from 73 research institutes and private companies in 20 countries.

The article itself has 202 authors, with the International Wheat Genome Sequencing Consortium (IWGSC) being the main author. All authors have contributed directly to the generation, assembly and/or analysis of the data.

Why is it important to sequence the wheat genome?

Wheat is essential for global food security. It is the most widely grown crop in the world, cultivated on every continent except Antarctica, and it adapts to a wide range of climate and cultivation conditions. It is the staple food for one third of the global human population and contributes more to the daily calorie and protein intake than any other human food source.

The world is currently facing enormous challenges with a human population projected to rise to over 9.6 billion by 2050. The FAO predicts that food production will need to increase by over 60% to meet demand. The increase in production also must be achieved sustainably, without expanding land use, with minimal use of fertilizers, water and pest treatments, and in the context of climate change.

To produce sufficient wheat for the human population in the future, there is an urgent need to develop new wheat varieties with higher yield, better resistance to diseases and pests, and tolerance to abiotic stresses such as drought, high salinity or high mineral content of the soils.

With the reference genome sequence, breeders have at their disposal tools to identify genes and regulatory elements underlying complex traits and to accelerate improvement through genomics-assisted breeding and biotechnology. Using the information provided in the genome sequence, breeders will be able to produce new wheat varieties more rapidly, with higher yields and improved sustainability, to meet the demands of a growing world population in a changing environment.

What is a reference sequence?

A reference sequence is the full DNA sequence information of one representative of a species, in this case Triticum aestivum cv. Chinese Spring. All subsequent wheat genome-related research will refer to this resource as if it were a dictionary.

It is considered to be a reference because it meets high quality standards, representing over 90% of the genome in sequence blocks that are organized along chromosomes and are highly representative of the wheat DNA. Previous versions of the wheat genome sequence were considered to be drafts because they did not present the information in the full linear order context of the 21 chromosomes. The cultivar Chinese Spring was chosen for the reference because it had been used to develop many genetic resources widely used by wheat scientists.

The wheat reference sequence published here represents 94% of the wheat genome assigned to the 21 wheat chromosomes and presents the location and order of 107,891 genes, as well as more than 4.7 million molecular markers on the 21 chromosomes.

What does high-quality mean and why is it important?

Building a genome sequence is like cartography. The IWGSC genome sequence can be likened to a roadmap with several layers of detail: major highways, smaller roads, little paths, rivers, landmarks and houses. The more details there are in a map, the higher its quality. This is essentially the same for a genome sequence.

The IWGSC genome sequence maps out all the little roads, contains more than 4 million landmarks (markers) and gives the precise addresses of more than 100,000 houses (genes). Having this detailed information about the location of markers and genes is absolutely critical to researchers and breeders using the sequence to develop improved wheat varieties.

High-quality also means that a very high proportion of the sequence information between the 100,000+ genes is known. It is crucial to have this information because 80% of the wheat genome is composed of long stretches of repeated DNA sequences that are not genes. It is not yet fully understood what the role of this non-gene information is, but it is suspected that it plays a role in controlling when and how a gene is expressed.

What will be the impact for scientists?

Bread wheat is a very good model for studying complex genomes. The reference sequence provides a unique resource for studying and understanding the biology of the wheat genome, in particular understanding how wheat evolved, why some parts of the genome were conserved over time, and how genes are regulated, to name only a few areas of research.

Scientists will also use the reference genome tool to study wheat diversity, i.e. the differences between the genomes of different wheat varieties that are associated with specific characteristics, such as resistance to pests and diseases, or adaptation to drought or climate extremes. These differences can be studied at the level of single genes, sub-genomes, or whole genomes and new tools will be developed to speed up the process of screening for markers of interest.

For example: even though wheat is not a major crop in Japan in terms of acreage cultivated, the varieties that are grown there are very well adapted to high humidity and temperatures. Studies of the differences between the sequence of those varieties and the reference genome sequence could help identify genes responsible for those desirable characteristics. Once identified, these genes could be introduced in commercial varieties that could be grown in hot and humid climates, such as in Southeast Asia, therefore contributing to food security in the region.
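The kind of comparison described above can be pictured with a minimal sketch. It assumes two short, already aligned sequences of equal length; the sequences and the function name `find_differences` are invented for illustration and do not come from the IWGSC data or tools. Real comparisons involve alignment of billions of bases and also handle insertions and deletions.

```python
def find_differences(reference: str, variety: str):
    """List (position, reference_base, variety_base) for two aligned sequences.

    Positions are 1-based. Both sequences must have the same length; this toy
    function does not handle insertions or deletions.
    """
    if len(reference) != len(variety):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref_base, var_base)
        for i, (ref_base, var_base) in enumerate(zip(reference, variety))
        if ref_base != var_base
    ]


# Hypothetical sequences, made up for this example.
reference_segment = "ATGGCGTACCTGA"
variety_segment = "ATGGCATACCTGG"

differences = find_differences(reference_segment, variety_segment)
for position, ref_base, var_base in differences:
    print(f"position {position}: reference {ref_base}, variety {var_base}")
```

Each reported position is a candidate marker that could then be tested for association with a characteristic of interest.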

What will be the impact for breeders?

The reference sequence provides breeding companies and public breeders with a tool to speed up the development of new improved varieties.

They will be able to use the information provided in the reference genome sequence to more rapidly identify and locate genes, or markers close to genes, responsible for agronomic characteristics (called “traits”) of interest, such as high yield, stress tolerance, quality, and disease resistance. They can then isolate those genes, study how they function and introduce them into commercial varieties. In the past, the time between “finding a gene” and commercialization of a variety containing that gene was about 12 to 15 years. Now, with a high-quality reference sequence available, this could be reduced to about a third of that, to between 3 and 5 years.

What will be the impact for farmers?

Farmers will benefit from the new varieties developed by breeders that will be better adapted to specific field conditions and agronomic practices. For example, the new varieties could be more resistant to drought, need less nitrogen input, or be resistant to diseases, hence requiring less fertilizer or fewer fungicide applications.

Farmers will be able to produce better quality seeds, with less impact on the environment, leading to more sustainable production.

What will be the impact for consumers?

Ultimately, consumers will benefit by having access to higher quality food that meets more stringent requirements for agricultural sustainability and impact on the environment.

Also, breeders will be able to develop refined varieties containing characteristics to meet specific markets. This could, for example, be wheat varieties with higher protein content or less gluten to address gluten intolerance.

Why did it take so long?

The whole project took 13 years.

From the start, obtaining a high-quality reference sequence of the genome of bread wheat has been a scientific challenge because of the size and structure of the wheat genome. The wheat genome is huge – more than five times larger than the human genome – and comprises 21 chromosomes originating from three highly similar sub-genomes (A, B and D). Each sub-genome is larger than the human genome and contains 7 chromosomes.

Also, over 80% of the wheat genome is made of repeated elements that are grouped in long stretches that are nested within each other (like a Russian doll).

These two features present problems for assembling the genome sequence from the short pieces that are the product of sequencing machines, and for assigning the sequences to the correct chromosome and sub-genome.
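Why repeats make assembly hard can be shown with a toy example. This is a minimal sketch under invented assumptions: the "genome" below is a made-up string of 52 letters, and `count_placements` is a hypothetical helper, not part of any IWGSC software. It counts how many places a short read could have come from.

```python
def count_placements(genome: str, read: str) -> int:
    """Count every position where `read` matches `genome` exactly (overlaps allowed)."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count


# Toy "genome": one unique, gene-like segment surrounded by copies of a repeat.
REPEAT = "ACGTTGCA"
UNIQUE = "GGATCCTTAGAC"
toy_genome = REPEAT * 3 + UNIQUE + REPEAT * 2

unique_read = UNIQUE[:8]  # read taken from the unique segment
repeat_read = REPEAT      # read that falls entirely inside a repeat

print(count_placements(toy_genome, unique_read))  # one possible origin
print(count_placements(toy_genome, repeat_read))  # several equally good origins
```

The read from the unique segment can be placed in exactly one location, while the read from the repeat fits equally well in five. With most of the wheat genome made of repeats, and three similar sub-genomes on top of that, a large share of short reads are ambiguous in this way.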

Because of these issues, the IWGSC decided that the only approach that would deliver a high-quality reference was to reduce the complexity and follow a strategy similar to that used for other high-quality reference genomes – such as human, mouse, zebrafish, Arabidopsis and rice – namely, sequencing each chromosome separately. This took a long time, but the end result is of high quality and can directly be used by breeders to improve wheat.

How did they do it?

To overcome the size and complexity problems of the wheat genome, the IWGSC set out a roadmap in 2005 to produce physical maps of individual chromosomes. These maps assigned bacterial artificial chromosome (BAC) clones, each around 200 kilobases long, and genetic markers to specific chromosomes.

The BACs were then used as a substrate for sequencing (this method was also used to generate the reference human genome sequence and for wheat chromosome 3B), or to produce sequence tags that could be combined with sequences assembled from a whole genome sequencing approach to position the sequence pieces on the correct chromosomes.

A whole genome sequence was assembled from short sequence fragments (average 150 bases) using algorithms that the company NRGene developed specifically to handle wheat. Chromosome-specific resources (physical maps, chromosome and BAC sequence tags, genetic markers, and chromosome conformation capture (Hi-C) data) were then used to assign and position the sequence fragments on the chromosomes.

The ultimate result of combining all of these approaches is a high-quality reference genome sequence of bread wheat.
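The idea of using chromosome-specific resources to place assembled pieces can be sketched as a simple vote. This is only an illustration under stated assumptions, not the IWGSC or NRGene pipeline: the scaffold names, the marker lists, the 80% threshold and the function `assign_chromosome` are all hypothetical. Each assembled piece carries markers whose chromosome is already known, and the piece is assigned only when the markers largely agree.

```python
from collections import Counter


def assign_chromosome(marker_hits, min_share=0.8):
    """Return the chromosome backed by at least `min_share` of the markers, else None.

    `marker_hits` lists the known chromosome of each marker found on one
    assembled piece. The threshold is an arbitrary value chosen for this sketch.
    """
    if not marker_hits:
        return None  # no evidence, leave the piece unassigned
    chromosome, votes = Counter(marker_hits).most_common(1)[0]
    return chromosome if votes / len(marker_hits) >= min_share else None


# Hypothetical assembled pieces and the chromosomes of the markers they contain.
scaffold_markers = {
    "scaffold_1": ["3B", "3B", "3B", "3B"],
    "scaffold_2": ["1A", "1A", "1A", "1A", "1A", "1D"],
    "scaffold_3": ["2A", "2B", "2D"],
    "scaffold_4": [],
}

assignments = {
    name: assign_chromosome(hits) for name, hits in scaffold_markers.items()
}
for name, chromosome in assignments.items():
    print(name, "->", chromosome or "unassigned")
```

In this toy data, the third piece stays unassigned because its markers point to three related chromosomes of different sub-genomes, which is the sort of conflict that multiple independent resources help to resolve.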

Are the data publicly available?


All IWGSC data are available at a central IWGSC repository in France, hosted by the Unité de Recherche Génomique-Info (URGI) at the Institut National de la Recherche Agronomique (INRA). The repository provides public access to wheat sequence data and other IWGSC resources, such as physical maps and marker data. All data related to the reference genome were released to the scientific community in January 2017, as soon as they became available.

While the IWGSC team continued to work on the analysis of the genome and prepared the present publication, other scientists have had access to the information so that they could advance their work more rapidly. Consequently, more than 100 scientific articles referencing the IWGSC reference genome have already been published.

The wheat reference sequence data are also available in other international databases, such as Ensembl Plants, GrainGenes and NCBI, and have been deposited in the European Nucleotide Archive (ENA).

Wasn’t the wheat genome already sequenced?

Several draft versions of the wheat genome have been generated in recent years by different groups using different approaches. What sets them apart is the extent and the quality of the information provided.

From the start, the IWGSC goal was to produce resources to accelerate wheat improvement, i.e., to produce a reference sequence that could be used by breeders. Unlike genome sequences that are used mostly for scientific research, this requires a very precise map of the genome with information on markers and genes ordered along the 21 wheat chromosomes.

The IWGSC reference genome is the best quality genome sequence produced to date for wheat. It not only presents the genetic code of wheat, it also provides the precise location and sequence of more than 100,000 genes, as well as more than 4 million markers, on the 21 wheat chromosomes.

What is the next step?

The IWGSC will now focus on producing a genome-sequence based toolbox for breeders and scientists to use for wheat improvement.

It will involve several projects, such as maintaining and improving the current reference genome to ultimately produce a “Gold Standard” reference genome sequence that is manually and functionally annotated; sequencing other varieties of wheat in order to represent the worldwide diversity of wheat; and continuing to develop a database for the wheat community to access all these genomic resources.