


Glossary of terms used frequently in genome sequencing.

This glossary was compiled using the following sources; please refer to them for additional information:


Annotation

The process of identifying regions of a genome sequence that are associated with specific functions and adding pertinent biological information to these sequences; for example, the specific gene to which a sequence corresponds.


Assembly

The process of taking fragments of DNA sequences and putting them together by matching overlapping sequences to create a representation of the original DNA that was sequenced.
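The overlap matching described above can be illustrated with a minimal Python sketch. It joins only two fragments and requires an exact match between the end of one and the start of the other; the function name, the `min_overlap` parameter, and the example fragments are hypothetical, and real assemblers also handle sequencing errors, repeats, and millions of fragments.

```python
from typing import Optional


def merge_by_overlap(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two fragments when a suffix of `left` equals a prefix of `right`.

    Returns the merged sequence, or None if no exact overlap of at least
    `min_overlap` bases exists. Illustrative only.
    """
    longest_possible = min(len(left), len(right))
    # Try the longest candidate overlap first, then shorter ones.
    for size in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None


# Hypothetical fragments sharing the four-base overlap "TGCA".
print(merge_by_overlap("ACGTTGCA", "TGCAGGTC"))  # ACGTTGCAGGTC
```

Trying the longest overlap first is a simple design choice: longer matches are less likely to occur by chance than short ones.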


Base

The molecules that make up DNA, also called nucleotides, known by their abbreviations: A (adenine), T (thymine), C (cytosine) and G (guanine).

Bases can form bonds with each other: A bonds only with T and C only with G, linking the two strands in the helical structure of DNA.
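Because of this pairing rule, the sequence of one strand determines the sequence of the other. The following Python sketch shows the idea; the function names and the example sequence are hypothetical, and the code assumes the input contains only the letters A, T, C and G.

```python
# Pairing rule from the definition above: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand: str) -> str:
    """Return the base that pairs with each base, in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the opposite strand written in its own reading direction.

    The two strands of DNA run in opposite directions, so the partner
    strand is conventionally written reversed.
    """
    return complement_strand(strand)[::-1]


print(complement_strand("ACGTTG"))   # TGCAAC
print(reverse_complement("ACGTTG"))  # CAACGT
```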

Base Pair

Unit of DNA comprising two paired bases on opposite strands, commonly used to measure the size of genomes. The wheat genome has 16-17 billion base pairs, or pairs of DNA “letters” (A, T, C, and G).


Bioinformatics

The science of managing and analyzing biological data using advanced computing techniques.

BAC (Bacterial Artificial Chromosome)

An engineered DNA molecule used to clone DNA sequences in bacterial cells (for example, Escherichia coli). Segments of an organism's DNA, ranging from 100,000 to about 300,000 base pairs, can be inserted into BACs. The BACs, with their inserted DNA, are then taken up by bacterial cells. As the bacterial cells grow and divide, they amplify the BAC DNA, which can then be isolated and used in sequencing DNA.

BACs have proved very useful for producing physical maps and sequencing of large genomes, such as the human, rice, mouse and bread wheat genomes.

BAC Library

Because large genomes are difficult to sequence as a whole, the DNA is fragmented into smaller segments that are inserted into BACs and amplified. A BAC library is a collection of all the BACs produced in the process, representing the entire genome of an organism.


BLAST

Basic Local Alignment Search Tool. A computer program used to perform sequence comparisons.


Cell

The smallest unit of life that can exist independently. All organisms are made up of one or more cells.


Chromosome

A piece of DNA that is formed into a compact structure by folding and association with specific proteins. Each species has a characteristic number of chromosomes. Bread wheat has 42 chromosomes: three sets of 7 pairs of chromosomes that are derived from ancestral diploid species.

Comparative Genomics

The science of comparing the genome sequences of different species to discover similarities and differences in biology. For instance, genome scientists and breeders might compare the genomes of cultivated wheat varieties with those of wild species to understand evolution or to increase the diversity of cultivated varieties through crossing with wild species.


Contig

Short for “contiguous sequence”. A piece of DNA sequence that has been assembled from overlapping sequence fragments.


Diploid

A cell or organism that contains two copies of each chromosome.

DNA (deoxyribonucleic acid)

A molecule found in all living organisms that carries the genetic information.

The DNA molecule consists of two strands, or chains, of nucleotides joined together by bonds, forming a shape known as a double helix.

DNA Sequence

The order of genetic “letters,” or nucleotides, in a piece of DNA. For instance: ACGTACGTACGT

Draft sequence

A sequence that has been assembled into contigs, but in which a proportion of the sequence is missing (i.e., there are gaps) and the complete order and orientation of the fragments are unknown.

Functional genomics

The study of how genomes function, including the identification and regulation of genes, their resulting proteins, and the role played by the proteins in biochemical processes.


Gene

A gene is the basic physical and functional unit of heredity (i.e., of the properties of an organism that are passed from one generation to the next). Genes are made up of nucleic acid: they are linear molecules consisting of a string of four kinds of nucleotides (in DNA, A, T, G, C), and they provide all or part of the instructions necessary to make molecules called proteins. In genomics, a gene is an ordered sequence of nucleotides located in a particular position on a chromosome.

Genetic Marker

An easily identifiable piece of genetic material, e.g., a gene or a portion of DNA, with a known location on a chromosome that can be tracked from one generation to the next.


Genome

All the genetic material in the chromosomes of a particular organism.

A genome contains the biological information for building, running, and maintaining an organism—and for passing life on to the next generation. Nearly every cell of an organism contains a complete copy of its genome.

Genome map

A map of the relative positions of landmarks within a genome, their chromosomal position, and the distances between them. Landmarks might include short DNA sequences, regulatory sites that turn genes on or off, and genes.

Genetic map

A map of the relative positions of genes, genetic markers, and other features within a chromosome or genome determined on the basis of recombination frequency between markers.
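As a small worked example of recombination frequency, the Python sketch below computes the fraction of offspring in which two markers were separated by recombination. The function name and the progeny counts are hypothetical. Over short distances, a recombination frequency of 1% is conventionally treated as roughly one map unit (one centimorgan); over longer distances the relationship is no longer linear, so mapping functions are used instead.

```python
def recombination_frequency(recombinant_offspring: int, total_offspring: int) -> float:
    """Fraction of offspring showing recombination between two markers."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinant_offspring <= total_offspring:
        raise ValueError("recombinant_offspring must lie between 0 and total_offspring")
    return recombinant_offspring / total_offspring


# Hypothetical cross: 12 recombinant plants among 200 scored.
frequency = recombination_frequency(12, 200)
print(f"{frequency:.1%}")  # 6.0%
```

Markers that recombine rarely are placed close together on the genetic map, and markers that recombine often are placed further apart.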


Genomics

The study of the structure and organization of genomes, their individual elements (e.g., genes), how they function, and how they are regulated.


Gap

In genomics, a region of the genome that is not represented in a map or by sequence.


Hexaploid

Containing six sets of chromosomes in each cell.

The bread wheat genome is hexaploid, containing three sets of 7 pairs of chromosomes.

High-throughput sequencing

A rapid method of determining the order of the DNA bases of a genome. With this method, some small genomes can be sequenced in just a few days.

Kilobase (kb)

Unit of length for DNA fragments, equal to 1,000 nucleotides.

Minimum tiling path (MTP)

MTPs are used when a chromosome or genome is sequenced by dividing it into BACs, then sequencing and assembling them. The MTP refers to an ordered list or “map” of the minimum set of overlapping BACs necessary to provide complete coverage of the whole chromosome or genome.
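Choosing a minimum set of overlapping BACs can be sketched as an interval-covering problem. The Python example below is a simplified illustration, assuming each BAC already has known start and end coordinates on the chromosome; the clone names, coordinates, and function name are hypothetical. In practice, overlaps between BACs are inferred from physical map data rather than given as exact positions.

```python
from typing import List, Tuple

# (name, start, end) in base pairs along the region; hypothetical layout.
Clone = Tuple[str, int, int]


def minimum_tiling_path(clones: List[Clone], region_start: int, region_end: int) -> List[Clone]:
    """Greedy selection of the fewest clones covering [region_start, region_end].

    At each step, among the clones that start at or before the point covered
    so far, pick the one that extends furthest to the right.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path: List[Clone] = []
    covered_to = region_start
    index = 0
    while covered_to < region_end:
        best = None
        while index < len(ordered) and ordered[index][1] <= covered_to:
            if best is None or ordered[index][2] > best[2]:
                best = ordered[index]
            index += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"no clone extends coverage beyond position {covered_to}")
        path.append(best)
        covered_to = best[2]
    return path


library = [
    ("BAC1", 0, 120),
    ("BAC2", 50, 150),
    ("BAC3", 100, 260),
    ("BAC4", 140, 250),
    ("BAC5", 240, 400),
]
print([name for name, _, _ in minimum_tiling_path(library, 0, 400)])
# ['BAC1', 'BAC3', 'BAC5']
```

Here BAC2 and BAC4 are left out because the regions they cover are already covered by their neighbours, so sequencing them would add cost without adding coverage.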

Non-coding DNA

DNA in the genome that is not directly involved in making proteins or other molecules.

About 98% of the wheat genome consists of non-coding DNA. The functions of most non-coding fragments are not yet known; recent evidence suggests that they are involved in controlling the activity of genes.


Nucleotide

One of the four chemical subunits of the DNA molecule, also called bases, known by their abbreviations A, T, C, and G.


Phenotype

The set of observable characteristics of an organism.

These characteristics can be controlled by genetics, by the environment, or a combination of both.

Positional cloning

A technique used to identify and isolate genes, usually those that are associated with a specific trait, based on their physical location on a chromosome. Traits are usually positioned first on the basis of proximity to genetic markers associated with chromosomal regions. Then, if a physical map covering the region is available, they are positioned relative to BACs across the region and subsequently to genes annotated in the BAC sequences.

Physical map

A map of the locations of identifiable landmarks on a chromosome or genome. Physical maps are an alignment of sequences (BACs) with distance between markers measured in base pairs. A physical map often refers to a map of overlapping BAC clones from a library that shows the relative positions of the clones along chromosomes. High resolution physical maps serve as a scaffold for genome sequence assembly.


Pseudomolecule

A representation of the entire sequence of a chromosome that is assembled from smaller sequence contigs. In most cases, the pseudomolecule is ordered using physical and genetic map information.

Quantitative trait locus (QTL)

Stretch of DNA containing or linked to genes that underlie a trait.


Recombination

The exchange of DNA sequence between homologous chromosomes (more precisely, between their non-sister chromatids) during meiosis.

Reference sequence

The formally recognized, verified genome sequence of an organism that is used as a representative example of the genome for a particular species. A reference sequence is useful for assembling and comparing individual genomes of the same species (e.g., comparing elite varieties of wheat with the reference sequence for the purpose of understanding the inherited basis of key traits).


Sequence

The sequential order of nucleotides (genetic “letters”) in a piece of DNA. A short DNA sequence might be: ACGTACGTACGT


Sequencing

The determination of the sequential order of nucleotides in a piece of DNA or an entire genome.

Single Nucleotide Polymorphism (SNP)

A variation in a single base (A, T, C or G) found when comparing the same DNA sequence from two different individuals in the same species.
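As an illustration, the Python sketch below lists the positions at which two sequences differ by a single base. It assumes the two sequences are already aligned and of equal length; the function name and the second example sequence are hypothetical, and real SNP detection works from many sequencing reads with quality filtering.

```python
from typing import List, Tuple


def find_snps(sequence_a: str, sequence_b: str) -> List[Tuple[int, str, str]]:
    """Return (position, base_in_a, base_in_b) for every mismatch.

    Positions are 1-based. Both sequences must already be aligned and
    have the same length.
    """
    if len(sequence_a) != len(sequence_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, base_a, base_b)
        for position, (base_a, base_b) in enumerate(zip(sequence_a, sequence_b), start=1)
        if base_a != base_b
    ]


# Same stretch of DNA from two hypothetical individuals.
print(find_snps("ACGTACGTACGT", "ACGTACCTACGT"))  # [(7, 'G', 'C')]
```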

Shotgun Sequencing (Also called Whole-Genome Shotgun Sequencing)

A laboratory technique for determining the DNA sequence of an organism's genome. The method involves breaking the genome into a collection of small DNA fragments (typically 600 bp to 50 kb in size, depending on sequencing technology) that are sequenced individually. A computer program looks for overlaps in the DNA sequences and uses them to place the individual fragments in their correct order to reconstitute the genome.


Trait

A physical or agronomic characteristic, such as high yield, resistance to pathogens, or resistance to a stress.

Whole Genome Assembly

A whole genome assembly is the process of taking fragments of DNA sequences from an entire (whole) genome and, using high throughput technology, joining them by matching overlapping sequences to create a representation of the original DNA that was sequenced. This contrasts with sequence assemblies of individual chromosomes/chromosome arms.