Why SNPs?
Strengths
- The most abundant class of DNA polymorphisms.
- SNPs are the basis for a variety of ultra-high-throughput and massively parallel genotyping technologies.
- SNP markers are locus specific.
- SNP markers are an excellent long-term investment.
- SNP markers can be used to pinpoint functional polymorphisms.
- SNP assays require very small amounts of DNA (typically 25 to 50 ng per individual).
Number of SNPs
How many SNPs are there in the human genome? This is the same as asking how many of the 3.2 billion sites in the genome have variant forms at frequencies above the mutation rate. There is good information on the proportion of sites that differ between two randomly chosen homologous chromosomes. This proportion is called the nucleotide diversity; it is useful for comparing the amount of variability among chromosome regions or among populations, and it takes into account the number of chromosomes examined (2). Many SNPs were discovered in the overlapping ends of the bacterial artificial chromosome (BAC) clones used to assemble the human genome, wherever the overlapping clones came from different individuals or from different chromosomes of the same individual. The number of differences between two chromosomes averaged 1 per 1,331 sites of DNA sequence (3). Since people have two copies of every chromosome (except the sex chromosomes in males), any one individual is heterozygous at about

3.2 billion bases × (1 difference / 1,331 bases) ≈ 2.4 million sites

across all chromosomes.
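The nucleotide diversity described above is usually estimated as the number of differences between each pair of sampled chromosomes, averaged over all pairs and divided by the number of sites compared; averaging over pairs is how the estimate accounts for the number of chromosomes examined. The sketch below assumes aligned, equal-length, gap-free sequences. The toy sequences and the function names `nucleotide_diversity` and `expected_heterozygous_sites` are hypothetical and only illustrate the arithmetic, including the 2.4 million figure.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average proportion of differing sites over all pairs of sequences.

    Assumes the sequences are already aligned, equal in length and gap-free
    (a simplifying assumption for this sketch).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of chromosomes.
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def expected_heterozygous_sites(genome_size: float, pi: float) -> float:
    """Expected number of heterozygous sites in one diploid individual."""
    return genome_size * pi


# Hypothetical toy alignment of four 10-base chromosome segments.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAT", "ACGTACGTAT"]
print(f"toy pi = {nucleotide_diversity(toy):.4f}")  # 7 differences / (6 pairs x 10 sites) -> 0.1167

# Figures from the text: 3.2 billion bases, 1 difference per 1,331 sites.
het = expected_heterozygous_sites(3.2e9, 1 / 1331)
print(f"heterozygous sites ~ {het / 1e6:.1f} million")  # -> 2.4 million
```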
When two chromosomes are compared, they may have the same base at a DNA site even though that site is polymorphic in the population, so the number of sites that vary in a population cannot be estimated simply by counting the sites that differ between two chromosomes. The number of sites seen to have variants will rise as more individuals are examined; the exact number depends on the frequency distribution of the SNP alleles, but many SNPs will be missed. For example, samples of 10 chromosomes have a 97% chance of including both SNP alleles when the minor allele frequency is at least 20% in the population, but only a 59% chance when the minor allele frequency is at least 1% (4). Thus small samples will miss many SNPs with common alleles as well as most SNPs with rare alleles, and even larger samples will miss many SNPs with rare alleles. Based on neutral theory and the observed rate of 1 difference per 1,331 sites between two chromosomes, the estimated number of human SNPs with minor allele frequencies above 1% is 11 million (4). However, this estimate misses SNPs that are rare overall but more common in some populations. At present there is too little information about how rare allele frequencies vary among populations, and about deviations from the assumptions of the neutral model, to make a good estimate of the number of SNPs (5). A rough guess is that there are about 10-30 million SNPs in the human genome, or on average one every 100-300 bases. Eventually the number of SNPs will be determined empirically, as many individuals are genotyped across the genome.
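The 97%, 59% and 11 million figures attributed to (4) can be reproduced approximately under the standard neutral-model assumption that the number of polymorphic sites whose allele has population frequency x is proportional to θ/x, with θ set to the observed diversity of 1/1,331. The sketch below is a reconstruction under that assumption, not necessarily the calculation used in (4), and all function names are hypothetical. It (a) averages the chance that a sample of n chromosomes contains both alleles over all SNPs at or above a minor allele frequency threshold q, and (b) counts those SNPs as θ × L × ln((1 − q)/q), where L is the genome size.

```python
import math

GENOME_SIZE = 3.2e9  # bases, as in the text
THETA = 1 / 1331     # assumption: observed diversity used as the neutral theta


def detection_probability(maf: float, n: int) -> float:
    """Chance that n sampled chromosomes carry both alleles of a SNP with this minor allele frequency."""
    return 1.0 - maf ** n - (1.0 - maf) ** n


def neutral_density(maf: float) -> float:
    """Unnormalised folded neutral spectrum for the minor allele: 1/q + 1/(1 - q)."""
    return 1.0 / (maf * (1.0 - maf))


def simpson(f, a: float, b: float, steps: int = 10_000) -> float:
    """Composite Simpson's rule on [a, b]."""
    if steps % 2:
        raise ValueError("steps must be even")
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def mean_detection(min_maf: float, n: int) -> float:
    """Detection probability averaged over neutral SNPs with minor allele frequency in [min_maf, 0.5]."""
    weighted = simpson(lambda q: detection_probability(q, n) * neutral_density(q), min_maf, 0.5)
    return weighted / simpson(neutral_density, min_maf, 0.5)


def snps_above_maf(min_maf: float) -> float:
    """Expected number of SNPs with minor allele frequency >= min_maf under the neutral model."""
    return THETA * GENOME_SIZE * math.log((1 - min_maf) / min_maf)


for q in (0.20, 0.01):
    print(f"MAF >= {q:.0%}: P(both alleles in 10 chromosomes) ~ {mean_detection(q, 10):.2f}")
print(f"SNPs with MAF >= 1%: ~{snps_above_maf(0.01) / 1e6:.0f} million")
```

Note that a SNP with a minor allele frequency of exactly 20% is detected in 10 chromosomes only about 89% of the time; the higher 97% figure comes from averaging over all SNPs at or above the threshold, most of which are more frequent.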
The Pattern of Human SNP Variation
Humans arose about 100,000-200,000 years ago in Africa and spread from there to the rest of the world (6). The original population was polymorphic, so populations around the world share most polymorphisms, inherited from our common ancestors. For example, all populations are variable at the gene for the ABO blood group. About 85-90% of human variation is found within populations (7). Thus any two people chosen at random from one population are almost as different from each other as any two people chosen at random from the whole world. New mutations have arisen since humans spread around the world, so some variation is largely confined to particular populations. Rare variants are likely to have arisen recently, and they are more likely than common variants to be found in some populations but not others (8,9). Common variants are usually common in all populations, and only a small proportion of variants are common in one population and rare in another. Differences among populations are usually modest: a variant might have a frequency of 20% in one population and 30% in another.
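One standard way to quantify how much of the variation at a single locus lies between rather than within populations is Wright's F_ST, computed here in Nei's heterozygosity form, F_ST = (H_T − H_S) / H_T. The sketch below applies it to the typical difference quoted above (20% versus 30%). Equal weighting of the two populations and the function names are assumptions for illustration, and this single-locus calculation is not the multi-locus method used in (7).

```python
def expected_heterozygosity(p: float) -> float:
    """Chance that two alleles drawn at random differ, for a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(freqs: list[float]) -> float:
    """Nei-style F_ST for one biallelic locus across equally weighted populations."""
    # H_S: mean heterozygosity within each population.
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    # H_T: heterozygosity if all populations were pooled into one.
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0.0:
        return 0.0  # monomorphic overall: no variation to apportion
    return (h_t - h_s) / h_t


f = fst([0.20, 0.30])
print(f"F_ST = {f:.3f}; share of diversity within populations = {1 - f:.1%}")
# -> F_ST = 0.013; share of diversity within populations = 98.7%
```

A frequency difference of this size leaves almost all of the diversity at that locus within populations, which is why differences of this kind contribute little to the variation between populations.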
References:
1. Patil, N., et al. (2001) Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294, 1719-1723.
2. Hartl, D. L. and Clark, A. G. (1997) Principles of Population Genetics, 3rd ed. Sinauer, Sunderland, MA.
3. The International SNP Map Working Group (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928-933.
4. Kruglyak, L. and Nickerson, D. A. (2001) Variation is the spice of life. Nat. Genet. 27, 234-236.
5. Przeworski, M., et al. (2000) Adjusting the focus on human variation. Trends Genet. 16, 296-302.
6. Tishkoff, S., et al. (1996) Global patterns of linkage disequilibrium at the CD4 locus and modern human origins. Science 271, 1380-1387.
7. Barbujani, G., et al. (1997) An apportionment of human DNA diversity. Proc. Natl. Acad. Sci. USA 94, 4516-4519.
8. Rieder, M. J., et al. (1999) Sequence variation in the human angiotensin converting enzyme. Nat. Genet. 22, 59-62.
9. Nickerson, D. A., et al. (1998) DNA sequence diversity in a 9.7-kb region of the human lipoprotein lipase gene. Nat. Genet. 19, 233-240.