Genome mapping - The Anatomy of the Human Genome

Two main methods of genome mapping

Genetic mapping:
  • Establishes the relative positions of genes.
  • Measures the tendency of genes to segregate together through meiosis.
  • Performed in family studies.

Physical mapping:
  • Assigns genes to particular positions along a DNA strand (along a chromosome).
  • Studies somatic cells and their genetic material.
  • Involves physical measurements and procedures.

Genetic mapping


Locus (pl. loci) - the position of a gene along a chromosome.


An allele is one form of a gene - most genes have different forms with different phenotypic effects.

  • The basis of genetic mapping was established early in the 20th century by Morgan.
  • Allows identification of genes that are detectable only by their phenotypic effect.

A gene can be mapped without knowing either its function or its sequence.

Meiosis and cross-over - basis for linkage

Recombination can be detected indirectly

Recombination is a molecular event, so it cannot be observed directly. However, analysis of the progeny phenotypes can tell whether or not a recombination has occurred.

Genetic linkage

  • Assuming that the cross-over is a random process, the frequency of recombination between two loci would reflect the distance between two gene loci.
  • Genetic linkage is the tendency of two genes not to recombine (in other words, to pass together) through meiosis.
  • Measuring this tendency makes it possible to estimate the genetic distance (or closeness) between two loci.
  • Important: two genes on the same chromosome (syntenic) are not necessarily linked.

Units of linkage - ‘Morgans’

  • The unit of genetic linkage, the Morgan, is the genetic length of a chromosome over which, on average, one recombination event is observed per meiosis
  • On average, one crossover per meiosis is therefore expected over such a distance (the observed recombination frequency between two loci never exceeds 50%)
  • 1 centiMorgan (cM) corresponds to a 1% chance of recombination per meiosis

Strategies for genetic mapping

  • Linkage analysis in organisms such as the fruit fly or mouse, which can be subjected to experimental test crosses
  • Linkage analysis in humans, who cannot be test crossed, so family pedigrees are analyzed instead
  • Linkage analysis in bacteria, which do not go through meiosis

Pedigrees for linkage analysis

  • A family in which we intend to map a gene must fulfill the following criteria:
    • It must be informative - that is, the parent who carries the disease allele is heterozygous at both the disease locus and the marker locus
    • The phase must be known - that is, which disease-locus allele lies on the same chromosome as which marker allele

Example of pedigree analysis in humans
Possible interpretations of the pedigree

Mother’s chromosomes:

             Hypothesis I    Hypothesis II
Homolog 1    disease | M1    health | M1
Homolog 2    health | M2     disease | M2

Child   Inherited haplotype   Hypothesis I   Hypothesis II
1       disease | M1          parental       recombined
2       health | M2           parental       recombined
3       disease | M1          parental       recombined
4       disease | M1          parental       recombined
5       health | M2           parental       recombined
6       disease | M2          recombined     parental

Recombination frequency:      1/6 = 16.7%    5/6 = 83.3%
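
The counting in the table above can be sketched in a few lines of Python. This is only an illustration: the data layout and the function name recombination_fraction are assumptions made for the example, and the six haplotypes simply mirror the six children listed above. One recombinant among six children gives 1/6, or about 16.7%.

```python
# Each child is described by the maternal haplotype it inherited:
# (allele at the disease locus, allele at the marker locus).
children = [
    ("disease", "M1"),
    ("health", "M2"),
    ("disease", "M1"),
    ("disease", "M1"),
    ("health", "M2"),
    ("disease", "M2"),
]


def recombination_fraction(children, parental_haplotypes):
    """Fraction of children whose haplotype is not one of the parental ones."""
    recombinants = sum(1 for hap in children if hap not in parental_haplotypes)
    return recombinants / len(children)


# Hypothesis I: the disease allele travels with M1, the healthy allele with M2.
phase_1 = {("disease", "M1"), ("health", "M2")}
# Hypothesis II: the opposite phase.
phase_2 = {("health", "M1"), ("disease", "M2")}

rf_1 = recombination_fraction(children, phase_1)
rf_2 = recombination_fraction(children, phase_2)
print(f"Hypothesis I:  {rf_1:.1%}")
print(f"Hypothesis II: {rf_2:.1%}")
```

A recombination frequency above 50% is not biologically meaningful, which is why the phase (Hypothesis I or II) must be known before the pedigree can be interpreted.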

Nail patella syndrome and ABO blood group

Nail-patella syndrome is an autosomal dominant (AD) syndrome that always shows some expression and is closely linked to the ABO blood group locus (10 cM)

Multiple dysplasias of osseous and other mesenchymal tissues (hypoplastic and split nails, hypoplastic to absent patella, dark cloverleaf pigmentation at the inner margin of the iris)

LOD score

To estimate if two loci are linked we need to:

  • Calculate a series of likelihoods that the two loci are linked at various values of θ (theta), from θ = 0.00 (no recombination) up to θ = 0.50 (random assortment)
  • Calculate the likelihood that the loci are not linked (θ = 0.50)

The base-10 logarithm of the ratio of these two likelihoods gives the LOD (logarithm of odds) score (Z)

Maximum likelihood estimation

LOD >= 3 (equivalent to odds of at least 1000:1 in favor of linkage) is considered proof of linkage between two loci (at the given θ)

The θ at which the LOD score is greatest is the maximum likelihood estimate of the recombination fraction, and therefore of the genetic distance, between the two loci
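
As a rough illustration of the procedure, the sketch below computes Z(θ) for the simplest case: phase-known meioses, where the likelihood of the data at a given θ is θ^k (1 - θ)^(n - k) for k recombinants among n meioses. The function name lod_score, the 1% grid of θ values and the example counts (1 recombinant in 6 meioses) are assumptions chosen for the example, not part of the original material.

```python
import math


def lod_score(theta, recombinants, meioses):
    """LOD score Z(theta) for phase-known meioses."""
    non_recombinants = meioses - recombinants
    # Likelihood of the data if the loci are linked at recombination fraction theta.
    likelihood_linked = theta**recombinants * (1 - theta) ** non_recombinants
    # Likelihood of the same data if the loci are not linked (theta = 0.5).
    likelihood_unlinked = 0.5**meioses
    return math.log10(likelihood_linked / likelihood_unlinked)


# Hypothetical data: 1 recombinant among 6 informative meioses.
# The grid starts at 0.01 because theta = 0 gives zero likelihood
# as soon as a single recombinant has been observed.
thetas = [i / 100 for i in range(1, 51)]
scores = {theta: lod_score(theta, 1, 6) for theta in thetas}

best_theta = max(scores, key=scores.get)
print(f"theta with the highest LOD: {best_theta:.2f}")
print(f"maximum LOD score: {scores[best_theta]:.3f}")
```

With so few meioses the maximum LOD stays far below 3. In practice, LOD scores from independent families are added at each θ until the threshold is reached.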

Genetic distance

  • The human genome is about 3000 cM long and consists of 3 billion base pairs (bps).
  • On average, 1 million bp corresponds to roughly 1 cM.
  • Chromosomes are about 100 - 300 cM long, which means 1 - 3 crossovers per chromosome per meiosis.
  • Average recombination rate increases as the length of the chromosome arm decreases.
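
The arithmetic behind these figures can be checked with a short sketch. The constants are the rounded values quoted above; the function names are illustrative assumptions. The conversion is only a genome-wide average, since local recombination rates vary.

```python
GENOME_LENGTH_CM = 3000           # genetic length quoted above
GENOME_LENGTH_BP = 3_000_000_000  # physical length quoted above

# Genome-wide average number of base pairs per centiMorgan.
BP_PER_CM = GENOME_LENGTH_BP / GENOME_LENGTH_CM


def expected_crossovers(length_cm):
    """Average crossovers per meiosis: 100 cM (1 Morgan) means one crossover."""
    return length_cm / 100


def approximate_physical_distance_bp(distance_cm):
    """Rough physical distance for a genetic distance, using the average rate."""
    return distance_cm * BP_PER_CM


print(f"{BP_PER_CM:,.0f} bp per cM on average")
for chromosome_cm in (100, 300):
    print(f"{chromosome_cm} cM chromosome: {expected_crossovers(chromosome_cm):.0f} crossover(s)")
print(f"10 cM is roughly {approximate_physical_distance_bp(10):,.0f} bp")
```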

Some remarks on genetic distance

  • Genetic distances are approximately additive.
  • Genetic distance and physical distance are not the same!
  • The frequency of crossover during oogenesis is roughly twice that during spermatogenesis - genetic distances on “female” chromosomes are longer than on “male” ones.

Rate of recombination and length of chromosome arm
For large chromosomes, the average recombination rates are very similar, but as chromosome arm length decreases, average recombination rates rise markedly.

Physical mapping

  • Uses molecular biology techniques to establish position of characteristic sequences in DNA molecules
  • The ultimate goal of physical mapping is the complete sequence of a genome – this corresponds to mapping with 1 base pair resolution

Physical mapping - road map

FISH method of physical mapping

Methods of genome sequencing

  • Hierarchical shotgun method - sequencing of overlapping large-insert clones spanning the genome - applied by the Human Genome Project, an international, publicly funded effort
  • Whole genome shotgun sequencing - applied by Celera Genomics of Rockville, Maryland
  • The hierarchical shotgun sequencing strategy

Genome fragmentation


It is impossible to sequence a whole genome in one piece, so it must first be broken into fragments

Genome fragmentation is an initial stage in preparing a library of clones

Clones in a library?

After genome fragmentation, the fragments are separated and inserted into vectors that allow manipulation and cloning in host cells (E. coli). BACs (bacterial artificial chromosomes) serve as a large-insert cloning system.

A clone library is a set of vectors with inserts.
Genome-wide physical map of clones

Clones in a given library have to be aligned in proper order

Several methods can be used for this, among others:

STS mapping

Restriction enzyme fingerprinting

These methods rely on finding unique sequences in a clone

STS mapping

STSs (sequence-tagged sites) are short DNA regions, each unique in the whole genome

This means that two clones containing the same STS must overlap

An STS is detected by PCR using a pair of primers specific for that region
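
The overlap rule can be sketched as follows. The clone names, STS names and PCR results are hypothetical, and the grouping into contigs (sets of clones connected by chains of overlaps) is a simplified illustration rather than the procedure used in any particular mapping project.

```python
from itertools import combinations

# Hypothetical PCR results: which STS markers were amplified from each clone.
sts_hits = {
    "cloneA": {"STS1", "STS2"},
    "cloneB": {"STS2", "STS3"},
    "cloneC": {"STS3", "STS4"},
    "cloneD": {"STS7"},
}


def overlapping_pairs(hits):
    """Pairs of clones that share at least one STS and therefore must overlap."""
    return [
        (a, b)
        for a, b in combinations(sorted(hits), 2)
        if hits[a] & hits[b]
    ]


def contigs(hits):
    """Group clones into contigs: sets connected by chains of overlaps."""
    groups = []
    for clone in sorted(hits):
        # Existing groups containing a clone that shares an STS with this one.
        linked = [g for g in groups if any(hits[clone] & hits[c] for c in g)]
        merged = {clone}.union(*linked)
        groups = [g for g in groups if g not in linked] + [merged]
    return groups


print(overlapping_pairs(sts_hits))
print(contigs(sts_hits))
```

In this example cloneA and cloneC share no STS, yet they end up in the same contig because both overlap cloneB.
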
Restriction enzyme fingerprinting

DNA from each clone is digested with a restriction enzyme

Sizes of the resulting fragments are measured by agarose gel electrophoresis

Banding patterns are compared
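
A minimal sketch of how two banding patterns might be compared is given below. The fragment sizes are invented, and both the 2% size tolerance and the scoring rule (shared bands divided by the size of the smaller fingerprint) are assumptions made for illustration only.

```python
def shared_bands(sizes_a, sizes_b, tolerance=0.02):
    """Count bands of clone A that match a not-yet-used band of clone B.

    Two bands match when their sizes differ by at most `tolerance`,
    expressed as a fraction of the larger size, to allow for gel
    measurement error.
    """
    remaining = sorted(sizes_b)
    matches = 0
    for size in sorted(sizes_a):
        for candidate in remaining:
            if abs(size - candidate) <= tolerance * max(size, candidate):
                remaining.remove(candidate)  # each band may be used only once
                matches += 1
                break
    return matches


def overlap_score(sizes_a, sizes_b, tolerance=0.02):
    """Shared bands as a fraction of the smaller fingerprint."""
    smaller = min(len(sizes_a), len(sizes_b))
    return shared_bands(sizes_a, sizes_b, tolerance) / smaller


# Hypothetical fragment sizes in base pairs.
clone_1 = [1200, 2500, 4300, 7800, 9100]
clone_2 = [2510, 4290, 7800, 11000, 15000]
clone_3 = [600, 1800, 3300, 5200]

print(overlap_score(clone_1, clone_2))  # many shared bands: likely overlap
print(overlap_score(clone_1, clone_3))  # no shared bands: no evidence of overlap
```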

The maximum length of a single sequencing read is ~750 bp
Finding genes

Comparison of functional cloning and positional cloning
Positional cloning

Applications of human gene mapping

  • Allows mapping and cloning of disease genes,
  • Testing hypotheses about genetic background of diseases
  • Diagnostic information in genetic counseling

Human Genome Project (HGP)

Began in the USA in 1990 when the National Institutes of Health and the Department of Energy joined forces
HGP scientists:

  • Mapped & sequenced the genomes of important experimental organisms
  • Completed working draft covering 90% of the genome in 2000 (published February 2001)
  • Completed in 2003, the Human Genome Project (HGP) was a 13-year project coordinated by the U.S. Department of Energy and the National Institutes of Health. During the early years of the HGP, the Wellcome Trust (U.K.) became a major partner; additional contributions came from Japan, France, Germany, China, and others.

Project goals were to

  • identify all the approximately 20,000-25,000 genes in human DNA,
  • determine the sequences of the 3 billion chemical base pairs that make up human DNA,
  • store this information in databases,
  • improve tools for data analysis,
  • map and sequence the genomes of important model organisms,
  • transfer related technologies to the private sector,
  • address the ethical, legal, and social issues (ELSI) that may arise from the project.

HGP goal - map and sequence the human genome:

  • To construct detailed genetic and physical maps of the human genome;
  • To determine the complete nucleotide sequence of human DNA to greater than 99.99% accuracy;
  • Map all the human genes
  • Chart variations in DNA spelling among human beings

The ethical, legal, and social implications (ELSI)

Three to five percent of the HGP budget funded research on the ethical, legal, and social implications (ELSI) of having so much new genetic information about our species

The mutation rate is about twice as high in males as in females

Draft and finished genome sequence

Generating a sequence of the human genome involved three steps:

  • Selecting the BAC clones to be sequenced,
  • Sequencing them,
  • And assembling the individual sequenced clones into an overall genome sequence.

For the draft sequence, 4-fold average sequence coverage was required, with no clone below 3-fold (corresponding to 99% accuracy).
For the finished sequence, about 9-fold coverage is required (99.99% accuracy).
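
To give a feel for what these coverage levels mean in practice, the sketch below estimates how many reads are needed for a single clone. Fold coverage is taken as (number of reads x read length) / clone length. The 150 kb BAC insert size is an assumed typical value, the ~750 bp read length is the maximum quoted earlier in these notes (average reads are shorter), and the function name reads_needed is illustrative.

```python
import math


def reads_needed(clone_length_bp, fold_coverage, read_length_bp=750):
    """Number of reads needed to reach a given average fold coverage."""
    return math.ceil(clone_length_bp * fold_coverage / read_length_bp)


BAC_INSERT_BP = 150_000  # assumed typical BAC insert size

draft_reads = reads_needed(BAC_INSERT_BP, 4)     # draft: 4-fold coverage
finished_reads = reads_needed(BAC_INSERT_BP, 9)  # finished: about 9-fold coverage
print(f"draft: {draft_reads} reads, finished: {finished_reads} reads")
```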

Draft human genome sequence
Published in February 2001 by the HGP in Nature (Feb. 15, 2001) and by Celera Genomics in Science (Feb. 16, 2001)
Both are freely accessible on the Internet:
Human Genome on Nature web pages - free
The Sequence of the Human Genome, Venter et al.

Conclusions from draft sequence of human genome

  • The human genome contains 3,164.7 million bases
  • The average gene consists of 3,000 bases
  • 30,000-40,000 protein-coding genes, more than 50% of unknown function
  • The full set of proteins (the ‘proteome’) encoded by human genome is more complex than those of invertebrates

The wheat from the chaff

  • Only 2% of the genome encodes proteins
  • Hundreds of human genes appear likely to have resulted from horizontal transfer from bacteria
  • About half of the human genome derives from transposable elements (but in the human genome most of them are inactive)
  • Segmental duplications are much more frequent in humans than in yeast, fly or worm (the pericentromeric and subtelomeric regions are filled with them)

HGP and medicine

  • Will help reveal which genes contribute to the risks for common diseases.
  • Will bring to light the molecular processes that normally maintain the human body in good working order.
  • Will allow the prediction of individuals' responsiveness to particular drugs (pharmacogenomics).

Single nucleotide polymorphism (SNP)

  • Single nucleotide polymorphism is a polymorphism caused by the change of a single nucleotide.
  • Most genetic variation between individual humans is believed to be due to SNPs.
  • Over 1.42 million SNPs have been identified.
  • The average density is one SNP every 1.9 kb.
  • The order of almost all (99.9%) nucleotide bases is exactly the same in all people

Human Genome Organization (HUGO)

  • Organization of scientists involved in the HGP
  • Fosters the exchange of data and biochemicals
  • Encourages the spreading and sharing of technologies
  • Provides information on aspects of human genome projects
  • Serves as an interface between the research community and funding agencies
Unless otherwise stated, the content of this page is licensed under the Creative Commons Attribution-ShareAlike 3.0 License