Purpose of the Study. To determine the genetic sequence of the entire human genome.
Study Population. Five normal volunteers: 1 African American, 1 Asian-Chinese, 1 Hispanic-Mexican, and 2 Caucasians.
Methods. A 2.91-billion base pair (bp) consensus sequence of the human genome was generated. Two assembly strategies—a whole-genome assembly and a regional chromosome assembly—were used, each combining sequence data from Celera and the publicly funded genome effort.
Results. The 2 assembly strategies yielded very similar results that largely agree with independent mapping data. Analysis of the genome sequence revealed 26 588 genes for which there was strong corroborating evidence and an additional ∼12 000 likely genes based on weaker evidence. Only 1.1% of the genome is spanned by exons (coding regions), whereas 24% is in introns (noncoding sequences within a gene that are spliced out and not translated into protein), and 75% of the genome is intergenic DNA. Comparisons between the consensus sequence and publicly funded genome data identified the locations of 2.1 million single-nucleotide polymorphisms (SNPs), and fewer than 1% of all SNPs resulted in variation in proteins.
Conclusions. The “shotgun” sequencing approach rapidly and accurately generated a nearly complete (∼95%) sequence of the human genome. A major surprise was the relatively low number of genes, estimated at 26 000 to 38 000, compared with previous estimates of up to 140 000. The polymorphisms identified so far represent only a small fraction of the genetic variation in the human population, and they will be useful both for tracing human origins and for identifying genetic variants associated with specific diseases. Analysis of the sequence data underscores previous findings that genetic differences between humans arise from only about 0.1% of the total sequence. Now that this “blueprint” of the human genome has been constructed, the next steps will be to identify genes and control elements, their functions, sequence variation among the human population, and the relationship between sequence variation and gene function.
Reviewer’s Comments. What a way to kick off the new millennium! Companion papers were published in Science and Nature (International Human Genome Sequencing Consortium, Nature 2001;409:860) to describe the efforts and results of the genome sequencing projects led by the private group at Celera and the public Human Genome Project, respectively, and this entire issue of Science is devoted to the related scientific, ethical, and societal issues. There is no doubt that this is one of the landmarks of human achievement, but as the authors point out, this is only the beginning. The following quote by Eric Lander, the lead author of the Nature paper, helps to put things in perspective: “We’ve called the human genome the blueprint, the Holy Grail, all sorts of things. It’s a parts list. If I gave you the parts list for the Boeing 777 and it has 100 000 parts, I don’t think you could screw it together, and you certainly wouldn’t understand why it flew.” Given the apparent complexity of the genetics of allergy and asthma, it is likely that we will know which parts are broken well before we truly understand the functional consequences.
- Venter JC, and teams of scientists from Celera Genomics (Rockville, MD), GenetixXpress (Sydney, Australia), University of California-Berkeley, Penn State University, Case Western Reserve University, Johns Hopkins University, Rockefeller University, New England Biolabs (Beverly, MA), California Institute of Technology, Yale University, Applied Biosystems (Foster City, CA), The Center for Genome Research (Rockville, MD), Bar Ilan University (Ramat-Gan, Israel), and Universitat Pompeu Fabra (Barcelona, Spain). Science. 2001;291:1304–1351
- Copyright © 2002 by the American Academy of Pediatrics