Detecting Lehi's Genetic Signature: Possible, Probable, or Not?
David A. McClellan
The influence genetics and genetic information have had on the overall body of scientific knowledge cannot be overestimated. Genetic research has substantively enhanced our ability to treat medical conditions ranging from inherited genetic disorders to worldwide viral epidemics. It has revolutionized the way we think about and study the natural world, from cells to organisms, from species to ecosystems. It factors into pharmaceutical discovery and vaccine design, plant and animal domestication, and wildlife conservation. Needless to say, we now know much more about genetic concepts and applications than in even the recent past. In fact, our body of knowledge has grown so vast that mastery of all aspects of genetic research by a single researcher is now virtually impossible. For this very reason, minor misunderstandings abound, both among the lay public and within the scientific community.
One such misunderstanding is the current controversy over DNA evidence and its bearing on the veracity of the Book of Mormon. On the one hand, statements by the Prophet Joseph Smith indicate that Native Americans are descended from the Lamanites. On the other, recent scientific studies have evaluated the current genetic compositions of selected worldwide human populations, and several of these have concluded that the principal genetic origin of the sampled Native American peoples is Asiatic, likely due to the constant documented flow of humans back and forth across the Bering Strait.1 The real issue, however, is not whether Native Americans are the inheritors of Asian genetic material; it is whether this evidence refutes the story line of the Book of Mormon and the claims of Joseph Smith relative to Native Americans.
The question of whether the Americas were populated prior to the arrival of the Lehites and Mulekites is addressed elsewhere in this number, as are the implications of the messages of the Book of Mormon and the statements of Joseph Smith.2 Both are important components of this complex challenge. What remains to be addressed is whether we are to infer from recent scientific evidence that the Book of Mormon and associated Latter-day Saint doctrine are false.
First, however, I feel compelled by my faith to state that the only reliable way to test the veracity of the Book of Mormon or statements by modern prophets such as Joseph Smith is to put Moroni's promise to the test on a personal level:
Behold, I would exhort you that when ye shall read these things, if it be wisdom in God that ye should read them, that ye would remember how merciful the Lord hath been unto the children of men, from the creation of Adam even down until the time that ye shall receive these things, and ponder it in your hearts.
And when ye shall receive these things, I would exhort you that ye would ask God, the Eternal Father, in the name of Christ, if these things are not true; and if ye shall ask with a sincere heart, with real intent, having faith in Christ, he will manifest the truth of it unto you, by the power of the Holy Ghost.
And by the power of the Holy Ghost ye may know the truth of all things. (Moroni 10:3–5)
Attempting to settle the matter solely upon the merits of empirical data will always leave one wanting.
That stated, the purpose of this essay constrains me to deal exclusively with those aspects, concepts, and principles of science that may contribute to a complete—or as complete as possible—understanding of the essential question at hand. Within this essay, therefore, I intend to present the basic biological principles that are, in my opinion, relevant to whether it is possible to identify the genetic signature of Lehi or Mulek; address the question using the powerful tools of scientific method and population genetic theory; and briefly review the current status of human population genetics in the context of these principles and concepts, outlining some of the limits under which genetic data may be interpreted.
The background information presented herein is meant as a supplement for the nonscientist. Explanations about what a chromosome is or how genetic information is used in population studies may not be directly pertinent to the essential question of this essay, but they are meant to serve as a primer for the uninitiated. Some of these informational reviews may seem burdensome to readers who have substantial backgrounds in biology. I would suggest that such readers skip directly to the conclusions section.
As outlined above, the central question of this essay is whether acceptance of current genetic data necessitates the wholesale rejection of the Book of Mormon story line and the claim that Native Americans are descended from Lamanitish ancestors. On the surface, given certain characteristics of the data, it may appear that it does. This may seem threatening to the Latter-day Saint layperson, who may therefore be tempted to discount the science surrounding the matter rather than sacrifice belief in the Book of Mormon. Before either of these alternatives becomes a "logical" conclusion for anyone, though, let us redefine the issue in terms of an essential question that may be scrutinized directly by scientific evaluation philosophically, theoretically, and empirically.
In my opinion, the most plausible essential question having to do with human genetic data may be something like: Is it possible to recover a genetic signature from a small migrating family from 2,600 years in the past? To answer this question in a coherent manner, let me first present a few basic concepts by which all genetic hypotheses are tested; these will empower nonbiologists to judge for themselves the accuracy of the conclusions presented herein. I am confident that the conclusions of this essay, emergent from the accepted principles of biology, will illustrate the complete harmony between scientific thought and the fundamentals of Latter-day Saint belief.
At the very heart of the question posed above are the basic principles of genetics and evolution as they have unfolded over the past 150 years and especially in the past 50 years. The discoveries over this period of time have been numerous—too numerous to describe in any detail. Our knowledge, however, remains far from complete—constant controversies arise within the scientific community over minute theoretical details, and much remains to be discovered. Nevertheless, there is little controversy over the basic principles of the science; these have been verified in many different ways and have survived the test of time and effort: 150 years of scientific method seeking to displace previously held ideas with more general explanations.
Most cells that constitute the human body contain a more or less complete copy of the human genetic complement. This genetic complement comes in two varieties, each with a unique function and a unique genetic language, or code. First, the nuclear genome, the genetic complement that resides in the nucleus of each cell, comprises by far the greatest portion of cellular genetic material. It is governed by the universal genetic code, the standard genetic language used to create the vast majority of cellular proteins produced naturally within the bodies of most currently living species of organisms. In human beings, it encodes proteins from insulin to hemoglobin. Second, we possess another genome that, in most cells, resides in tiny intracellular structures known as mitochondria, the powerhouses of the cell. The few proteins produced by this mitochondrial genome work in conjunction with nuclear proteins to manufacture the energy needed for cells to function. Cells that need more energy, such as muscle cells, have more mitochondria, each of which contains a complete mitochondrial genome. The genetic code that governs man's mitochondrial genome—and is shared by the mitochondrial genomes of all vertebrate organisms, including fish, amphibians, reptiles, birds, and mammals—differs from the universal code in only a few ways, but those few differences can have significant effects on the long-term molecular evolution of intracellular metabolism.3
Nuclear genomes. The genetic material of every genome, human or otherwise, is composed of deoxyribonucleic acid, or DNA. In man and in all plants, animals, and fungi, DNA is organized into discrete packages called chromosomes. The basic unit of the chromosome is the nucleosome, a structure that is composed of several proteins around which is twice wrapped a strand of DNA that is held in place by another protein, much like you might place your finger on a ribbon when helping someone tie a bow on a gift box. Nucleosomes connected by DNA are coiled into a fiber called chromatin, which is looped and coiled to form the arms of a chromosome (see fig. 1). The human nuclear genome contains 46 chromosomes that come in 23 homologous pairs—that is, they correspond in structure and in the sequence of genes. Each chromosome in a pair was inherited from a parent, one being maternal in origin and the other paternal. The sex chromosomes (referred to as X and Y) are inherited this same way, but the Y chromosome is always paternally inherited; females inherit one X chromosome from each parent, while males always inherit an X chromosome from their mother and a Y chromosome from their father.
Along each chromosome lie several regions that encode either a protein or a ribonucleic acid (RNA) molecule. The precise number of human coding regions, or genes, remains to be determined but is currently in the process of being resolved. Estimates from the year 2000 placed the range of this number from around 35,000 to 120,000 protein-coding genes,4 while estimates from the year 2001 derived from the results of the Human Genome Project confirmed the lower portion of this range, around 23,000 to 39,000 genes (26,383 genes have now been confirmed by multiple lines of evidence).5 There are also regions that do not encode genes but may have a distinct genetic history nonetheless. The diversity among noncoding regions is truly amazing, and many are even viral in origin and are thus parasitic to our genome. In several genetic studies, coding regions are used to estimate genetic diversity and identity, but many noncoding regions are also used as diagnostic genetic markers.
Just as the basic unit of the chromosome is the nucleosome, the basic unit of DNA itself is the nucleotide. The entire human nuclear genome is approximately 3.175 billion nucleotides in length,6 2.91 billion of which appear to contain active DNA.7 Nucleotides come in four types, with their names and classifications being based on their chemical structure: there are two pyrimidines, referred to as cytosine and thymine, and two purines, adenine and guanine. These nucleotides bind together in triplet sets, or codons, which form the basic unit of the genetic code. Each possible combination of three nucleotides either directly encodes an amino acid, the basic unit of proteins (in the universal code, this accounts for 61 of the 64 possible codons), or encodes what is known as a termination signal that basically tells the cellular protein-construction mechanism, the ribosome, to stop making a particular protein.
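The codon arithmetic above can be checked with a short sketch. It assumes the standard ("universal") genetic code, whose three termination codons, written as DNA, are TAA, TAG, and TGA; the variable names are illustrative only.

```python
from itertools import product

NUCLEOTIDES = "ACGT"  # adenine, cytosine, guanine, thymine

# Termination (stop) codons of the standard genetic code, written as DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Every ordered triplet of the four nucleotides: 4 * 4 * 4 = 64 codons.
all_codons = ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]

# Codons that specify an amino acid rather than a termination signal.
coding_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

print(len(all_codons))     # 64
print(len(coding_codons))  # 61
```

The three codons left over after the 61 amino-acid codons are the termination signals described above.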
Mitochondrial genomes. The mitochondrial genome is composed of a single, circular piece of DNA that is not very unlike the genomes of some bacteria. It is not packaged like the chromosomes of the nuclear genome, most probably because it is small enough that such complex organization is unnecessary. One unusual characteristic of the mitochondrial genome is that it is maternally inherited: every individual's mitochondrial genome is inherited from his or her mother. However, current evidence suggests that mitochondrial inheritance may not be exclusively maternal.8 The mitochondrial genome of every man most likely hits an abrupt dead end; he cannot pass it on to his children. However, if a man has sisters with children, his mitochondrial genome will live on in his nephews and nieces and in his nieces' children.
The human mitochondrial genome bears 13 protein-coding genes, 2 ribosomal RNA genes (to build the mechanism that interprets the genetic code), and 22 transfer RNA genes (that act as vehicles by which amino acids are guided into place in a growing protein). There is very little nonfunctional DNA within the mitochondrial genome, but a noncoding control or regulatory region called the D-loop figures prominently among DNA sequences used to reconstruct species relationships.9
Since the mitochondrial genome is inherited as a single unit, all the genes contained in it are linked. But unlike the nuclear genome, in which genetic information is routinely exchanged between homologous pairs—a process termed recombination, which will be discussed in more detail below—mitochondrial genomes have no opportunity to exchange information. This is a primary reason why they are often used to track lineages; a particular mitochondrial genetic variant (including all 37 coding regions and the D-loop) represents a single lineage and must be completely replaced in order to be unrecoverable or to become so obscure that it is very unlikely to be found by a scientist looking for it. This, initially, is one reason why the lack of a Middle Eastern genetic signature was so "troubling" to those anticipating it.10
DNAs encode, but proteins adapt. DNA is relatively protected from the demands and influences of the environment surrounding the cell because it is the task of proteins to interact with their surroundings and carry out functions; the primary responsibility of genes is to encode, whereas proteins must function properly to ensure the survival and reproduction of the organism. Thus, DNA is always at least one step removed from any influence that the environment may have on the organism. A change in DNA, referred to as a mutation, may or may not result in a change in the primary structure of the associated protein that interacts directly with the demands of the environment. If a given mutation in the DNA results in an amino acid change, however, the whole organism may pay the price by contracting a life-threatening disease. Examples include those rare cases of mutation in which people spontaneously develop cystic fibrosis11 or spinal muscular atrophy12 without having inherited the disease from either of their parents. The environment directly affects these unlucky recipients of a disease-causing mutation by making them less likely to survive to bear children and thus contribute to the gene pool. The unforgiving truth of the matter is that the great majority of possible mutations that occur in those regions of the genome responsible for the adaptation of the organism are deleterious in some way and are often fatal. More will be said below about the role of mutations in molecular evolution.
As mentioned above, nuclear chromosomes occur naturally in pairs, one inherited from each parent. The rules that govern inheritance of chromosomes were first discovered by Gregor Mendel (1822–1884), an Austrian monk who published his findings on the genetics of pea plants in 1865.13 The genetic rules enunciated by Mendel can be boiled down to two fundamental principles: segregation and independent assortment. These principles of inheritance, which will be described in more detail below, have since been confirmed as the processes that chromosomes go through prior to the creation of the specialized reproductive cells known as gametes (sperm and eggs). The processes of segregation and independent assortment of chromosomes can now be seen under a microscope just prior to the cell divisions that create gametes, but Mendel discovered these principles without knowledge of chromosomes. He was able to infer these truths by observing the frequency with which pea plants expressed different trait variants, such as height, coloration, and texture.
Mitosis and meiosis in nuclear genomes. Since the time of Mendel, biologists have determined that there are two different types of cell division in the human body. The most common, which takes place at one time or another in all somatic (or nongerminal tissue) cells, involves a process called mitosis, in which each of the 46 chromosomes, unpaired at this point, laterally splits to form two chromatids, each of which is composed of two arms—one on top and one on bottom—instead of the four illustrated in figure 1. These chromatids then migrate to the forming nucleus of a different daughter cell. At this time, each daughter cell will generally start to produce proteins and then undergo a synthesis phase that restores each chromosome to the form it had prior to mitosis. Mitotic cell division thus results in two daughter cells that are complete and exact copies of the mother cell. Mitosis takes place most rapidly during gestation, while the embryo is quickly developing. After birth, the rate of cell division slows dramatically, with some cell lines, such as in muscle and nerve tissue, coming to a complete stop.
The second type of cell division produces gametes—called gametogenesis—and occurs exclusively in specific places in the male and female gonads. Gametogenesis implements a process called meiosis, in which two successive cell divisions break down the genome so that, instead of having 23 pairs of chromosomes, the four daughter cells have 23 single chromosomes. Meiosis separates the homologous pairs in the first cell division and then laterally splits each chromosome into two chromatids in the second cell division. The first meiotic division is the point at which segregation and independent assortment physically take place. The second division is quite similar to the process seen in mitosis except that there are half the number of chromosomes.
At the beginning of the first meiotic cell division is a stage referred to as the pachytene stage, in which homologous chromosomes come very close together to form a structure called a tetrad, because each structure looks like it has four arms, two on top and two on bottom (see fig. 1). Because of the close proximity of homologous pairs, regions of chromosomes that encode the same type of genes are naturally attracted to one another. Quite often, there is an exchange of information between homologous chromosomes when large chunks of genetic material are swapped. This process, called recombination, is a very important mechanism for creating the genetic diversity that makes each of us unique. Most of the time these chunks are of roughly equal size, but sometimes they are not, creating redundancy in the genetic sequence of some chromosomes but eliminating potentially vital genes in others. Recombination, also referred to as crossing-over, is error prone, but these errors actually enhance the long-term survival of a species at the expense of a few individuals who end up without their full genetic complement. Unequal crossing-over is the principal genetic mechanism that gives rise to gene families via gene duplication. It allows for evolutionary specialization relative to different demands, such as those required by distinct stages of embryological development or the production of dissimilar cellular tissues such as muscle and bone. The genetic redundancy generated by unequal crossing-over does not produce additional body structures or superhuman qualities, but it does allow babies to produce proteins that are uniquely suited for proper maturation; the adult versions of the same proteins may not be appropriate for the distinctive changes a baby's body must go through to develop properly. It also allows the body to produce trypsin, which helps us digest protein in the digestive tract, and haptoglobin, which binds free hemoglobin in the bloodstream. Although these proteins now have very different functions, they have retained similar structures, suggesting that they originated from the same generalized ancestral gene by unequal crossing-over.14 Truly novel protein structure is produced only rarely, so the creation of redundancy with the possibility of modification presents a wonderful opportunity for molecular adaptation to respond to constantly changing environmental conditions, changes both within the organism and from external surroundings.
Since linked genes (genes on the same chromosome) are inherited as a single unit more often than genes of different chromosomes, they will assort nonindependently—as discrete units—in the absence of recombination. Generally speaking, genes that are physically closer to one another on a chromosome assort nonindependently more often than genes that are farther apart. Inferring information about how frequently linked genes assort nonindependently is the basis upon which gene mapping is founded.
Segregation and independent assortment. As mentioned, the first stage of meiosis is the time at which the processes of segregation and independent assortment are likely to occur. Segregation, in modern terms, means that an individual's chromosome pairs are not likely to end up in the same gamete; instead, each gamete receives one chromosome from each pair. In accordance with this principle, human gametes do not have 46 chromosomes organized into 23 homologous pairs but have 23 single chromosomes, one from each homologous pair of the parent cell. Violations of this rule have serious genetic repercussions; they may result in spontaneous miscarriage of a poorly developed embryo or in developmental retardation of living offspring, as is the case with Down syndrome children.15
In terms of chromosomes, the concept of independent assortment is that as each chromosome pair segregates, either chromosome may go to either daughter cell without being influenced by what is happening in the segregation of the other pairs around it. As a result, a given gamete will generally carry an assortment of maternal and paternal chromosomes. This randomization of chromosomal assortment results in an enormous variety of possible genetic combinations that offspring may inherit from their parents. In humans, the number of possible combinations totals over 70 trillion (2^23 for each parent) for every set of parents, without considering mutation or recombination.
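The figure of over 70 trillion follows from simple counting: each of the 23 chromosome pairs contributes one of two chromosomes to a gamete, so each parent can produce 2 to the 23rd power chromosomal combinations, and a couple can produce the square of that number. A minimal sketch of the arithmetic (the variable names are illustrative, and mutation and recombination are ignored, as in the text):

```python
HAPLOID_CHROMOSOMES = 23  # one chromosome drawn from each homologous pair

# Each pair offers two choices, independently of every other pair.
gametes_per_parent = 2 ** HAPLOID_CHROMOSOMES

# Any gamete from one parent can combine with any gamete from the other.
combinations_per_couple = gametes_per_parent ** 2

print(f"{gametes_per_parent:,}")       # 8,388,608
print(f"{combinations_per_couple:,}")  # 70,368,744,177,664
```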
The processes of segregation and independent assortment apply to nuclear genetic material, which provides the greatest portion by far of an individual's genetic inheritance. Mitochondrial genes, on the other hand, do not follow the basic rules of segregation and independent assortment because mitochondrial genomes do not segregate at all. They are all inherited as a single unit, or linkage group, and always from one's mother. The reproduction of the mitochondrial genome is inherently asexual, each descendant genome being nearly an exact clone of its progenitor. Instead of millions of combinations that may be produced by segregation and independent assortment among nuclear chromosomes, the mitochondrial genome may only produce one kind of genetic offspring.
Individuals are genetically unique. With the exception of identical twins, segregation and independent assortment guarantee that every individual has a unique genetic complement. Coupling these genetic mechanisms with recombination and mutation, we can accurately conclude that every individual is genetically unique. This characteristic of genomic evolution, however, also leaves open the possibility that offspring may have genetic problems that their parents did not pass on to them. For example, one of the most studied genes in the human genome is the one responsible for cystic fibrosis, CFTR (cystic fibrosis transmembrane conductance regulator). A normal copy of this gene enables cells in the lining of the lungs to kill the bacterium Pseudomonas aeruginosa. It is estimated that 2 out of about 30,000 cystic fibrosis patients experience the onset of the disease because of new mutations.16 As can be seen in this example, however, mutation as a genetic mechanism is generally considered a weak evolutionary force, although it is constant and unforgiving. Mutation generally plays a much bigger role when considering genetic change over much longer periods of time, in terms of thousands of generations, especially if any of those changes are significantly affected by selection acting on the functional constraints of gene products.
According to neutral theory, which will be discussed below, most persistent changes, including most crossing-over events, are selectively neutral17 or nearly so.18 Thus, most changes that become diagnostic (like those that result in a unique genetic signature) do not have a significant effect on the reproductive success of any given individual. There are some changes, although rare, that may be adaptive in nature, and these also have distinct opportunities of becoming perpetuated in a genetic signature. Adaptive and neutral changes, therefore, allow unique diagnostic genetic signatures to develop over long periods of time—again, in the order of thousands of generations.
Genetic mutations may occur in a variety of forms, including single nucleotide-level point mutations, insertions or deletions of various sizes, gene duplications, chromosomal inversions, complete genome duplications (polyploidy), and so on. Most of these are relatively infrequent and probably have not contributed significantly to the evolution of the human genome within recorded history.19 The overall rate of mutation among humans, including all the types listed above, has been estimated to occur, on average, at a rate of 1.6 mutations per genome per generation,20 or about 5 × 10^-10 mutations per nucleotide site per generation. Most of these mutations take the form of nucleotide-level point mutations, small insertions, or small deletions, especially within noncoding DNA regions that are largely free from functional and structural constraints. It is clear that noncoding DNA, such as that which appears within the numerous chromosomal microsatellite regions, may evolve several orders of magnitude faster, creating new short-tandem repeats (in which every repeat is only a few nucleotides in length but may exist as hundreds of copies, one right after the other) at a rate of one new repeat approximately every 833 generations.21 Regardless of which estimate one accepts, the mitochondrial genome evolves much faster (about 10 times faster22) than the nuclear genome, probably because mitochondrial DNA is maternally inherited and does not recombine, although there is considerable heterogeneity in both genomes.23 The exception is the Y chromosome, which is incredibly conservative in its rate of genetic change, probably due to what is known as a selective sweep, whereby a single, positively selected mutation pulls all other mutations with it to fixation (to a relative frequency within a population of 100 percent), resulting in very little genetic diversity within that particular linkage group.
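The per-site rate quoted above can be recovered from the per-genome rate by dividing by the genome length given earlier in this essay (approximately 3.175 billion nucleotides). The sketch below does that division and, as a rough comparison only, sets it beside the microsatellite figure of one new repeat every 833 generations. The two rates are measured in different units (per nucleotide site versus per repeat region), so the ratio is meant only to illustrate "several orders of magnitude"; the variable names are illustrative.

```python
MUTATIONS_PER_GENOME_PER_GENERATION = 1.6
GENOME_LENGTH_NUCLEOTIDES = 3.175e9  # approximate human nuclear genome length

# Average mutations per nucleotide site per generation.
per_site_rate = MUTATIONS_PER_GENOME_PER_GENERATION / GENOME_LENGTH_NUCLEOTIDES
print(f"{per_site_rate:.2e}")  # 5.04e-10

# One new short-tandem repeat roughly every 833 generations.
microsatellite_rate = 1 / 833

# Rough ratio only: the two rates are not measured per the same unit.
rough_ratio = microsatellite_rate / per_site_rate
print(rough_ratio > 1_000)  # True
```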
Molecular-clock hypothesis and neutral theory. One implication of the relatively constant rate of genomic mutation is that mutation may be clocklike, or approximately constant, over extremely long periods of time.24 This led naturally to the idea that if the accumulation of mutations is clocklike, then the vast majority of persistent mutations are probably neutral—neither advantageous nor detrimental—or nearly so.25 This natural extension of the molecular-clock hypothesis has since become known as the neutral theory, or, more recently, as the nearly neutral theory.
These hypotheses form the conceptual backbone of subsequent studies that explore the mechanisms governing the accumulation of genetic variation in populations. They offer a convenient framework within which to implement scientific method for studying mutation rates and their implications. The conclusions resulting from such studies are equally informative whether the hypotheses are ultimately accepted or rejected. Additionally, the implications of acceptance or rejection of these hypotheses are extremely well explored in the theoretical literature. Thus, using them as a framework for research endows the researcher with the power to interpret experimental results easily. Despite the fact that they are often rejected, they have persisted as scientific tools that allow researchers the freedom to set up a predefined set of conditions, the rejection of which is often more interesting than acceptance would be.
Genetic drift and the probability that a mutant allele will become fixed. When a mutation takes place in a gene at a particular locus (the physical location of the gene on its respective chromosome), a new genetic variant, or allele, is born. Initially, a new allele exists at a very low frequency in a population; there is only one copy of it among all of the chromosomes of all of the individuals in the population. If that new allele is to eventually be "successful" and become the standard version of the gene in the population, it must displace all alternative alleles and reach a frequency of 100 percent; that is, it must become fixed. If, however, the allele is not "successful," it will eventually go completely extinct. This latter case is much more likely because of the low frequency at which the new allele starts out. It is possible, though, for the frequency of the allele in the population to remain constant under certain circumstances in a relatively isolated population that exists at a constantly large effective size.
Genetic drift is the idea that within a small effective population—that is, the number of individuals who are responsible for parenting children—random error causes successive generations to have slightly different allele frequencies due to the chance association of gametes, resulting in greater fluctuations in allele frequencies than if an effective population were very large. In large populations, new mutations have very little chance of becoming fixed or of even perpetuating for very long. If the effective population size is small, however, mutant alleles may become fixed much more easily because of the increased effect of genetic drift.
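The dependence on population size can be made concrete with a standard result of population genetic theory that is not stated explicitly above: in an idealized diploid population of effective size N, a brand-new neutral allele starts as 1 copy among 2N and is eventually fixed with probability equal to that starting frequency, 1/(2N). The sketch below assumes that idealization; the function name is introduced here for illustration.

```python
def neutral_fixation_probability(effective_size: int) -> float:
    """Chance that a single new neutral allele is eventually fixed.

    Assumes an idealized diploid population, so the allele begins as
    1 copy out of 2 * effective_size gene copies.
    """
    if effective_size <= 0:
        raise ValueError("effective_size must be a positive integer")
    return 1.0 / (2 * effective_size)


for size in (10, 1_000, 100_000):
    print(size, neutral_fixation_probability(size))
```

Under these assumptions the new allele in a population of 10 breeding individuals has a 1-in-20 chance of fixation, whereas in a population of 100,000 the chance falls to 1 in 200,000.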
A real-world example governed by the same principle upon which genetic drift is based is a coin flip. Each possible result (heads or tails) may have a 50 percent chance of occurring, but in practice what actually happens depends on how many times the coin is flipped. Flip it 10 times and you may get, purely by chance, 4 heads and 6 tails—40 percent to 60 percent—which is not very close to the 50-50 split you predicted, even though the actual number of heads and tails tallied is only 1 off the prediction. Flip the coin 100 times and you may get 45 heads and 55 tails—45 percent to 55 percent—which is closer to your prediction, even though the actual number of heads and tails tallied is now 5 off the prediction. Now flip it 1,000 times, and you may get 490 heads and 510 tails—49 percent to 51 percent. Each time you increase the sample size an order of magnitude, you get closer to the predicted ratio of heads to tails. If you were to flip the coin an infinite number of times (which is not realistic, but for the sake of this example let's allow this extreme situation), you will most likely flip almost exactly 50 percent heads and 50 percent tails.
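The shrinking gap between observed and predicted percentages can be quantified. Assuming a fair coin and independent flips, the typical deviation (standard deviation) of the observed fraction of heads is the square root of p(1 - p)/n, where n is the number of flips. The helper below is a minimal sketch of that formula; its name is illustrative.

```python
import math


def expected_spread(flips: int, p_heads: float = 0.5) -> float:
    """Standard deviation of the observed fraction of heads in `flips` tosses."""
    return math.sqrt(p_heads * (1.0 - p_heads) / flips)


for flips in (10, 100, 1_000):
    print(flips, round(expected_spread(flips), 3))
# 10 0.158
# 100 0.05
# 1000 0.016
```

Each hundredfold increase in the number of flips cuts the typical deviation by a factor of ten, which is why the 10-flip tally in the example strays furthest from 50-50.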
To make this example more similar to genetic drift, let's pretend that when you flip the coin the first 10 times, the results you tally actually determine the ratio of probabilities governing the next 10 flips. The first 10 times you flip the coin, you tally 4 heads and 6 tails. That result dictates that the probability of getting a head is now 40 percent and that of getting a tail 60 percent for the next set of 10 flips. With the probability of flipping a tail now increased, chances are good (50-50, to be precise) that the next set of 10 flips will weight the ratio even more in favor of tails. If this pattern continues, it will not take many sets of flips for the probability of flipping a tail to become 100 percent. If you were to increase the number of flips per set to 100, however, it would take longer for this to happen because each set of flips would most likely be closer to the predicted ratio. In fact, each time you increase the number of flips per set an order of magnitude, you would decrease the probability that random error would have a significant effect on the actual long-term results. This is exactly what makes allele frequencies drift in small populations. Each time there is a random error that makes the allele frequencies of a generation different from those of the one that precedes it, the probability of transmitting that allele to a subsequent generation changes in proportion. In this way, molecular evolution can take place even if no one allele has a distinct advantage or disadvantage.
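The modified coin game can be simulated directly. The sketch below is an illustration under stated assumptions, not a model of any real population: each set of flips uses the heads fraction tallied in the previous set as its probability of heads, and play stops once that probability reaches 0 or 100 percent. The function name, the seeds, and the cap on the number of sets are all choices made here for the example.

```python
import random


def sets_until_fixation(flips_per_set, start_p_heads=0.5, seed=0,
                        max_sets=1_000_000):
    """Play the feedback coin game until heads or tails becomes certain.

    Returns (number of sets played, final probability of heads).
    """
    rng = random.Random(seed)  # seeded so a run can be repeated
    p_heads = start_p_heads
    sets_played = 0
    while 0.0 < p_heads < 1.0 and sets_played < max_sets:
        # Tally one set of flips at the current probability of heads.
        heads = sum(rng.random() < p_heads for _ in range(flips_per_set))
        # The tally becomes the probability governing the next set.
        p_heads = heads / flips_per_set
        sets_played += 1
    return sets_played, p_heads


def average_sets(flips_per_set, trials=50):
    """Average number of sets to fixation over several seeded games."""
    total = sum(sets_until_fixation(flips_per_set, seed=s)[0]
                for s in range(trials))
    return total / trials


print(average_sets(10))   # small sets: one side takes over quickly
print(average_sets(100))  # larger sets: it takes many more sets
```

Every game ends with heads at either 0 or 100 percent even though neither side was ever favored, which is the point of the analogy: chance alone is enough to fix one outcome, and it does so sooner when each set is small.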
The effect of selection on mutations in populations. Mutations must achieve a relative frequency of 100 percent in a population (that is, they must become fixed) to have a lasting evolutionary effect. However, most new alleles must travel a bumpy road to get to that point. According to neutral theory, most mutations are at least somewhat deleterious and are not perpetuated very long because the detrimental effects of deleterious mutations often result in decreased fitness, meaning that the organism possessing the mutation usually has fewer offspring than organisms of the same species that do not possess the mutation. The relative frequency of the mutant allele therefore decreases in the population from generation to generation. This decrease in frequency is said to be the effect of natural selection, the idea that nature will determine how advantageous or disadvantageous a genetic variant is, just as a farmer may determine which domesticated animals he or she will breed based on desirable physical characteristics. In both cases, desirable variants are perpetuated, one by a discerning farmer and the other by nature itself.
If the environment in which an organism lives changes, however, the fitness of the organism may also change. One example of the differential influence of environmental conditions on fitness might be that of a woman with diabetes. If she is not under the care of a physician, she may have serious problems and not be able to bear children without drastically reducing her probability of survival. If, however, she is introduced to an expert endocrinologist specializing in diabetic care and has access to synthetically produced human insulin, she can lead a very normal life. In the first case the woman would have a reduced fitness, while in the second her fitness could be relatively normal. Although this is probably an oversimplified example, it illustrates how a change in environmental conditions may bring about a change in fitness. Another example might be a person who has sickle-cell anemia. In most places in the world, sickle-cell anemia results in a dangerous condition, especially during pregnancy, which can exacerbate the sickle-cell condition. It has been found, however, that people who are carriers of the sickle-cell trait are somewhat resistant to malaria. This may not have a significant effect in the United States, where malaria has been eradicated; but in Africa, where malaria is common and causes 2.7 million deaths per year,26 it may make a big difference. Not coincidentally, the highest incidence of sickle-cell anemia corresponds to those areas in which malaria is endemic and widespread.27 This associated trait of increased resistance to malaria may be why sickle-cell anemia persists in the world despite its extremely detrimental side effects.
Unlike the sickle-cell allele, which bestows a benefit in certain places of the world when it is possessed by a carrier, most detrimental alleles will not be maintained in a population. Generally speaking, if a mutation is deleterious, it most probably will not become fixed in a population because deleterious alleles are more likely to result in a decrease in the number of offspring than are advantageous and neutral alleles. Due to genetic drift, however, a slightly deleterious allele may have a much greater chance of becoming fixed in a small effective population because the influence of genetic drift becomes stronger as population size decreases. Because of this, alleles that may be deemed detrimental in large populations and gradually disappear due to natural selection are said to be "effectively neutral" in smaller populations28 because they do not disappear, despite detrimental effects.
If a mutation is advantageous, almost the opposite is true. The recipient of an advantageous allele will, on average, bear more children, resulting in a faster increase in allele frequency than if it had not been advantageous. Advantageous alleles thus generally become fixed in a population relatively quickly. However, mutations resulting in new advantageous alleles are extremely rare according to neutral theory, so the accumulation of advantageous alleles is an inherently slow process, taking literally thousands of generations. Unlike detrimental alleles, advantageous alleles have less chance of becoming fixed in small populations. It may seem peculiar for genetic drift to have opposite effects on advantageous and deleterious alleles, but this serves a useful purpose in acting as a leveling influence in the evolutionary processes within small populations; increasing the probability of fixation among deleterious alleles while decreasing the probability of fixation among advantageous alleles results in both extremes behaving more nearly neutrally over time.
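The leveling influence of drift can be illustrated with a standard result that is not taken from this essay: Kimura's diffusion approximation for the probability that a single new mutation eventually becomes fixed. The sketch below assumes a diploid population whose effective size equals its census size, genic selection with coefficient s, and a hypothetical value of s chosen only for illustration. It reports each probability relative to that of a neutral allele, 1/(2N), because that ratio is where the leveling shows up.

```python
import math

def fixation_probability(pop_size, s):
    """Kimura's approximation for one new mutant copy among 2N gene copies.

    pop_size: number of diploid individuals N (assumed equal to the
    effective size). s: selection coefficient (positive = advantageous,
    negative = deleterious, 0 = neutral).
    """
    if s == 0:
        return 1.0 / (2 * pop_size)
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(-4.0 * pop_size * s))

# Hypothetical selection coefficient of 1 percent, for illustration only.
# (Very large N * |s| would overflow math.exp, so the sizes are kept modest.)
for n in (10, 100, 1000):
    neutral = fixation_probability(n, 0.0)
    adv = fixation_probability(n, 0.01) / neutral
    dele = fixation_probability(n, -0.01) / neutral
    print(f"N={n:>5}: advantageous {adv:.3g}x neutral, deleterious {dele:.3g}x neutral")
```

Under these assumptions, in the smallest population both ratios sit close to 1, so the advantageous and the deleterious allele both behave almost like a neutral one. As N grows, the advantageous allele pulls far ahead of the neutral expectation and the deleterious allele falls far below it.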
Genetic drift also acts on allelic variants originating in uniparental (or haploid) DNA—the maternally inherited mitochondrial genomes and paternally inherited Y chromosomes. Generally speaking, however, the random error associated with haploid alleles is roughly twice that associated with biparentally inherited (or diploid) alleles,29 meaning that the effect of genetic drift is amplified among mitochondrial and Y-chromosome alleles because they are inherited from only one parent. There are exceptions to this rule of thumb owing to the variety of ways in which homologous alleles interact in biparentally inherited DNA (such as dominance, codominance, and recessiveness), but in each case haploid alleles should theoretically experience more random error than diploid counterparts, resulting in selection having even less of an overall effect.
These are some of the most basic of the scientific principles that influence the dynamics of genetic variation in populations or factor into the question of human genetic ancestry. Although I have not yet addressed the probability of recovering a genetic signature from a single family migrating 2,600 years ago, I have presented all the pertinent scientific concepts that will assist me in doing so. What follows is a scientific approach to estimating this probability, be it high, low, or somewhere in between.
One of the most basic claims made by critics of the Book of Mormon based on human population genetic data is that the Book of Mormon story line presents a testable hypothesis. The fundamental assumption of this claim is that it is possible to recover the genetic signature of a small migrating family 2,600 years in the past. They further claim that the fact that no Middle Eastern genetic signature has been recovered indicates that the Book of Mormon is fictitious. These claims and associated assumptions have not been critically evaluated in light of scientific method and population genetic theory, the most basic scientific principles connected with the analysis of human population genetic data. In this section of the essay I will carry out the thought exercises necessary to evaluate the claim that the Book of Mormon story line is a testable hypothesis and the assumption that it is possible to recover the genetic signature of Lehi or Mulek.
The foundational philosophical assumption of scientific method must first be emphasized and, indeed, cannot be overemphasized: Nothing in science can be proven; hypotheses can only be rejected. In fact, rejectability is the central criterion of a hypothesis. If an idea is not rejectable, it is not a hypothesis nor can it be tested. Therefore, in the context of the present discussion we must clearly define the central essential question, identify alternative testable hypotheses for this question, and characterize the implications of each.30
The essential question as identified at the beginning of this review is as follows: Is it possible to detect an ancient genetic signature of a small migrating family, such as the family of Lehi or Mulek? Competing hypotheses relative to this essential question include the null hypothesis (the hypothesis that, upon rejection, would leave only one other alternative possibility such that interpretation of results is unambiguous), which might be phrased as follows: Based on the currently understood principles of science, it is possible to recover such a genetic signature. If the null hypothesis is rejected upon the analysis of available data, however, we are forced to accept the alternative hypothesis: It is not possible to recover such a genetic signature. These hypotheses may be more formally written thus:
H0: It is possible to recover the ancient genetic signature of small migrating families.
Ha: It is not possible to recover the ancient genetic signature of small migrating families.
If we fail to reject H0, implications may include the following:
Alternatively, if we do reject H0, we are forced to accept Ha, that it is not possible to recover the genetic signature. If that were the case, the following would be true:
Therefore, although on the surface it would appear that the lack of genetic evidence to support the Book of Mormon story line implies that it is false, the fact remains that, regardless of whether or not it is possible to recover the ancient genetic signature of a small migrating family, we cannot discount the truthfulness of the Book of Mormon based on the implications of its story line using the scientific method. The validity of the Book of Mormon story line is not testable because it does not present a rejectable hypothesis. Genetic data can never be used to invalidate these claims; its only possible use would be to support them.
This thought exercise has not yet approached the question of whether it is possible to recover the genetic signature of Lehi or Mulek, but it has presented logic suggesting that it really does not matter. Detractors have no basis for their claims that current human genetic data calls into question the story line of the Book of Mormon. Current genetic data cannot, nor will any future data ever, falsify the Book of Mormon story line. The claim that Lehi left Jerusalem and settled in the Americas cannot be rejected based on the philosophy of scientific method, the most powerful secular tool the people of the world have ever had for generating knowledge.
Population Genetics Theory
The thought exercise presented above illustrates the need for and use of testable hypotheses. The fundamental principles of population genetics have been framed and mathematically explored such that truly testable hypotheses concerning the genetics of populations may be generated if an adequate sampling of global variation is available. Unlike some other branches of biology that may only be evaluated qualitatively, population genetics has historically been dominated by mathematicians and statisticians, resulting in its natural resemblance to "hard sciences" like physics and chemistry. The theory behind population genetics constitutes a convenient conceptual framework from which other quantitative fields of biology have emerged, entirely or in part, such as phylogenetic systematics (the science of reconstructing genetic relationships, or gene genealogies, based on genetic variation), molecular evolution (the science of inferring patterns of molecular change from extant data), and more recently, bioinformatics (the science of using computational methods to analyze complex data structures and reveal biologically relevant information). The null hypotheses generated from the basic concepts of population genetics represent a set of default predictions by which the characteristics of empirical data may be ascertained. By rejecting null hypotheses, researchers can easily establish what has not occurred and, by default, what most likely did occur. The use of null hypotheses therefore presents a powerful strategy by which important information may be revealed.
As discussed above, the segregation of chromosomes during meiosis results in any given autosomal allele (alleles found on chromosomes other than the X or Y chromosomes) having a random chance of being maternal or paternal in origin within gametes. This is not true for the inheritance of the mitochondrial genome, which is entirely maternal in origin, or for the Y chromosome, which is entirely paternal in origin. Thus, the human genome—and that of any other species with sexually dimorphic chromosomes (such as X and Y)—possesses both double-copy biparental genetic information (a diploid component) that has possibly undergone recombination prior to inheritance, and single-copy uniparental genetic information (a haploid component) that is basically composed of a clone of the parental copy. The Y chromosome, however, still has a random chance of being inherited by any given offspring (depending on the ratio of X- and Y-chromosomal sperm in the population of male gametes), whereas the mitochondrial genome is maternally inherited by all offspring.
Both uniparental and biparental alleles become fixed in a population in the same way: the chromosomal lineage of the individual from which an allele originated must grow in numbers until all other lineages are extinct and no other alleles exist at that locus in any member of the population. When new adaptive alleles arise through mutation, they can spread by means of natural selection throughout the population regardless of its size, given enough time and flow of genetic information.31 New alleles, however, may also spread quickly by genetic drift when historical populations are extremely small, whether the allele is adaptive or not. Although the two homogenizing principles of natural selection and genetic drift have the same result, it is statistically possible to differentiate them from one another and from other historical phenomena using complex yet elegant statistical approaches.32 This science of teasing apart genetic information to reveal complex dynamics has seen many recent advances33 and has become a powerful diagnostic tool for reconstructing the historical events from which present-day genetic variation originated.
The Hardy-Weinberg equilibrium principle. When Mendel's research was rediscovered in the early 1900s, there was an initial sentiment that Mendelism was fundamentally at odds with Darwinism because Charles Darwin (1809—1882) had proposed a different mechanism of inheritance. However, a small portion of the scientific community sought to harmonize the discoveries of Darwin and Mendel. Due in part to the early work of Reginald Crundall Punnett (1875—1967) to explain and illustrate Mendelian concepts using what has since become known as a Punnett square, it became much easier for the scientific community to reconcile these two principles that now codominate biological thought. Punnett was convinced that under specific circumstances, multiple alleles at a single locus within a population could exist at equilibrium frequencies with no eventual fixation. Others had tried to describe this system but were unable to succeed with satisfactory results.34 Punnett took his ideas to a prominent mathematician, Godfrey H. Hardy (1877—1947), who in 1908 published the first equations to accurately describe allelic frequency equilibria.35 Wilhelm Weinberg (1862—1937) published similar findings that same year,36 so the description became known as the Hardy-Weinberg equilibrium principle. An allele system that is able to remain in equilibrium, they predicted, would have a specific set of characteristics, now known as the Hardy-Weinberg assumptions. These assumptions include:
Although the Hardy-Weinberg assumptions appear ridiculously impractical and incapable of being met by a natural population, it is truly amazing how often alleles in ordinary populations are found to be in equilibrium. In reality, the requisite primary criterion is that there must not be significant violations of the assumptions. Obvious violations, however, will always result in deviations from expected allele frequencies.
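As a concrete illustration of what "in equilibrium" means for a locus with two alleles, the textbook Hardy-Weinberg expectation is that genotypes occur in the proportions p^2, 2pq, and q^2, where p and q are the allele frequencies. The sketch below compares observed genotype counts with that expectation using a chi-square statistic. The function name and the genotype counts are hypothetical, invented for illustration.

```python
def hardy_weinberg_check(n_AA, n_Aa, n_aa):
    """Compare observed genotype counts with Hardy-Weinberg expectations.

    Returns the frequency p of allele A, the expected counts for
    (AA, Aa, aa), and the chi-square statistic (1 degree of freedom).
    """
    total = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * total)  # each AA carries two A copies
    q = 1.0 - p
    expected = (p * p * total, 2 * p * q * total, q * q * total)
    observed = (n_AA, n_Aa, n_aa)
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, expected, chi_square

# Hypothetical samples of 100 individuals with the same allele frequency.
for counts in ((36, 48, 16), (50, 20, 30)):
    p, expected, chi_square = hardy_weinberg_check(*counts)
    print(counts, f"p={p:.2f}", f"chi-square={chi_square:.2f}")
```

A chi-square value above roughly 3.84 (the 5 percent critical value for one degree of freedom) would conventionally be read as a significant departure from equilibrium. The first sample matches the expectation, while the second, with too few heterozygotes, does not.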
Violations of the Hardy-Weinberg assumptions. The Hardy-Weinberg assumptions must hold if genetic signatures are to be maintained relative to autosomal alleles, sex-chromosome alleles, and mitochondrial alleles. Violations of the Hardy-Weinberg assumptions will result in changes in allele frequency, with the more blatant violations resulting in greater changes. However, not all alleles are created equal. Violations of these assumptions will have a greater effect on X-chromosome alleles than on autosomal alleles and a greater effect on mitochondrial and Y-chromosome alleles than on X-chromosome alleles. This phenomenon is based on chromosomal population size. There are two copies of each autosomal locus, one on each homologous chromosome in a pair; in other words, autosomal loci are diploid. There are also two copies of each X-chromosome locus in women because women have two X chromosomes, but only one in men because they have only one X chromosome. Finally, there is always just one copy of each mitochondrial and Y-chromosome locus because these linkage groups do not possess homologs. These differences in relative population sizes mean that random error has different influences among these linkage groups. As discussed above, the smaller the population size is, the greater the influence of genetic drift will be. Genetic drift results from a violation of the population-size assumption. Violations of the other assumptions are also dependent on population size: the smaller the population size is, the greater the effect of the violation will be. Therefore, effects of violations of the Hardy-Weinberg assumptions will generally be amplified among mitochondrial and Y-chromosome loci. The lone exception to this is the violation of the assumption of random mate choice, because mitochondrial and Y-chromosome loci are present in only one copy and so cannot form the two-allele genotypes that mate choice affects.
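The effect of chromosomal population size can be sketched with a standard result for an idealized population: with M transmitted gene copies, expected genetic variation (heterozygosity) shrinks by a factor of (1 - 1/M) each generation through drift alone. The copy numbers per couple follow the counts given above. The assumptions are an equal sex ratio, a count of mitochondrial copies that includes only those passed on by mothers, a hypothetical population of 25 couples, and a generation time of 25 years over 2,600 years.

```python
# Transmitted gene copies per breeding couple (one woman, one man).
COPIES_PER_COUPLE = {
    "autosomal": 4,      # two copies in each parent
    "X chromosome": 3,   # two in the mother, one in the father
    "Y chromosome": 1,   # father only
    "mitochondrial": 1,  # only the mother's copy is passed on
}

def variation_retained(couples, copies_per_couple, generations):
    """Expected fraction of initial variation left after drift alone."""
    gene_copies = couples * copies_per_couple
    return (1.0 - 1.0 / gene_copies) ** generations

couples = 25                 # hypothetical population size
generations = 2600 // 25     # assumes 25 years per generation

for locus, copies in COPIES_PER_COUPLE.items():
    kept = variation_retained(couples, copies, generations)
    print(f"{locus:>14}: about {100 * kept:.1f}% of initial variation expected to remain")
```

The absolute percentages depend entirely on the assumed population size and generation time. The point of the sketch is the ordering: autosomal loci retain the most variation, X-chromosome loci somewhat less, and Y-chromosome and mitochondrial loci the least.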
The violation of each Hardy-Weinberg assumption has been shown to have a specific dynamic effect in a population; these effects have been demonstrated over and over, both algebraically and empirically. The following are the predicted results of violations of these assumptions:
If, however, there is differential reproductive success among individuals in the population, the assumption of neutrality is violated and natural selection has a significant influence. If possession of an allelic variant results in an increase in reproductive success—that is, if the allele is positively selected—the likelihood that the allele will eventually become fixed goes up and the path toward fixation becomes less stochastic and more direct. The greater the reproductive success, the faster the increase in relative frequency. Conversely, if possession of an allelic variant results in a decrease in reproductive success—if the allele is negatively selected—the likelihood that the allele will eventually become fixed decreases. The greater the decrease in reproductive success, the faster the allele will go extinct.
Mutation is by itself a very weak evolutionary force. However, when it is coupled with another of the violations of the Hardy-Weinberg equilibrium, like selection or a change in population size, the result is often a very potent combination of evolutionary forces that can change the genetic signature of a population in a relatively short period of time. There is also evidence to suggest that an increase in mutation rate is often favored upon colonization of a new environment where adaptation is required.37
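The statement above that greater differences in reproductive success produce faster changes in allele frequency can be illustrated with the simplest textbook selection model. It treats a single-copy (haploid) locus in a population large enough that drift can be ignored, and the allele's frequency p changes each generation to p(1 + s)/(1 + sp). The function name, the starting frequencies, and the values of s are hypothetical, chosen only for illustration.

```python
def generations_to_reach(target, start_freq, s, max_generations=1_000_000):
    """Generations for an allele to rise (s > 0) or fall (s < 0) to target.

    Deterministic haploid selection: carriers leave (1 + s) offspring for
    every 1 left by non-carriers. Returns None if s is 0 or the target is
    not reached within max_generations.
    """
    if s == 0:
        return None  # without selection this model never moves the frequency
    freq = start_freq
    for generation in range(max_generations + 1):
        if (s > 0 and freq >= target) or (s < 0 and freq <= target):
            return generation
        freq = freq * (1.0 + s) / (1.0 + s * freq)
    return None

# A favored allele rising from 1 percent to 99 percent.
for s in (0.01, 0.1):
    print(f"s=+{s}: {generations_to_reach(0.99, 0.01, s)} generations")

# A disfavored allele falling from 50 percent to 1 percent.
for s in (-0.01, -0.1):
    print(f"s={s}: {generations_to_reach(0.01, 0.5, s)} generations")
```

In this model a tenfold stronger advantage or disadvantage shortens the journey roughly tenfold. Real populations add drift on top of this, so the actual path is less direct, especially when the population is small.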
Generally speaking, these violations of the Hardy-Weinberg assumptions all result in the genetic signature of the population in question changing relative to what it had historically been. These evolutionary forces cause changes in allele frequencies that, given certain conditions, may change the fundamental genetic characteristics of the lineage. Nevertheless, some equilibrium violations are more likely to result in substantive change than others.
When evolutionary forces are combined, greater change becomes more likely and even expected. The primary caveat of the study of population genetics is that there are always situations in which it is impossible to reconstruct the characteristics of past evolutionary events. Violations of the Hardy-Weinberg assumptions are generally assumed not to have occurred unless there is extrinsic evidence available that indicates to the contrary. This is the primary reason why the results of population studies must be loosely interpreted.
Did the people of Lehi or Mulek violate Hardy-Weinberg assumptions? Generally speaking, the Book of Mormon peoples violated most of the Hardy-Weinberg assumptions presented above. Clearly, they violated the assumptions of no migration and constant, large population size. These violations included: (1) Lehi (1 Nephi 18:8–23) and Mulek (Helaman 6:10; 8:21) migrating to the Americas in small groups; (2) multiple accounts of groups that left the central population to colonize other lands, like the initial split of the Nephites and the Lamanites (2 Nephi 5:5–6) or the story of Hagoth building a ship and launching into the west sea (Alma 63:5–8); (3) constant wars that killed thousands of people and may have resulted in population bottlenecks (for example, Omni 1:3, 10, 24 through Mormon 6:10–14); (4) the catastrophes prior to the coming of Christ to the Americas in which thousands of people lost their lives (3 Nephi 8:5–18); (5) groups that dissented and separated themselves from the main body of Nephites (such as the Zoramites in Alma 31:8); (6) partitioning of major populations into cultural tribes and subdivisions (referred to as "-ites" as in 4 Nephi 1:17, 36–37); (7) secondary contact between Nephite dissenters and Lamanites resulting in gene flow (e.g., Alma 21:2–3; 25:4); and (8) secondary contact between the Nephites and the Anti-Nephi-Lehies, who converted and left the Lamanites to live among the Nephites (Alma 23:17–18; 27:25–27).
The assumption of no selection may also have been violated when the people journeyed through the wilderness in the Old World (see 1 Nephi 16:20, 35; 17:1—2 [a direct reference to bearing children amid hardship], 21) and the New World (see Omni 1:27—30) and experienced hardships due to expansion (as in Alma 63:5—8; Helaman 3:3—4, 7, 9). They inhabited a new land that may have been very different from the habitat endemic to Jerusalem and the rest of Israel. These new environmental factors may have meant that alleles that were neutral in the old environment became selectively advantageous, while formerly advantageous alleles may have become neutral or even detrimental. Alleles that proved to be advantageous would have enjoyed a newfound reproductive success and spread throughout the population, accumulating over successive generations. Although selection is definitely a possible violation of Hardy-Weinberg assumptions, it remains largely unclear as to whether it had a significant influence or what that influence may have been, based on the Book of Mormon story line.
Another potential violation of a Hardy-Weinberg assumption may have been nonrandom mating. Although Lehi's family brought with them the family of Ishmael, all the mate choices from within the founding population's first generation following the initial colonization would have been exclusively first cousins, and most would have been double first cousins—that is, their fathers were brothers and their mothers were sisters. Possible exceptions to this pattern would have been the children of Zoram; their mother was a daughter of Ishmael (1 Nephi 16:7) and therefore a sibling of either the husband or wife of the other Lehite couples, but their father was probably genetically unrelated to the rest of the party. It is also possible that some of the children of Laman, Lemuel, and the sons of Ishmael, once their parents became separated from the other colonists (2 Nephi 5:5—6), may have produced offspring with partners originating from native populations, thus not allowing an Israelitish mitochondrial genome to be passed on among those lineages.38
There is, however, no reason to suspect that the mutation rate changed, although a small population produces fewer new allelic variants than a large one simply because it contains fewer individuals in which mutations can occur. Mutation, as explained above, is a very weak evolutionary force, so it probably would not have had a great effect by itself anyway. It is true that higher rates of mutation may be favored upon colonizing novel environments, but there is no direct Book of Mormon evidence that this was the case.
Human Genetics and Genealogical Inference
If genetic change is constant, we should be able to accurately trace racial and lineal ancestry, right? As discussed above, there is a specific set of circumstances under which this would be true, but in reality these circumstances generally have not been met within the recorded history of humankind. Implicit assumptions that must be invoked in tracing ancestry using genetic information include the following: (1) the sample population has had a large and relatively constant effective size; (2) the population has been largely reproductively isolated from other populations; and (3) the majority of the genetic variations used to trace the population's ancestry and infer historical relationships have become fixed in the sample populations and, in effect, represent diagnostic markers. In most organisms, these are pretty fair assumptions; but humans have deviated considerably from this model. There has been recent exponential population growth among human beings in most areas of the world, and our capacity and propensity for movement have always been such that, even thousands of years ago, most populations were far from genetically isolated.39 As a result, there has been a continuous historical flow of genetic information among most of the world's populations.40 These violations of the most basic of assumptions have resulted in the human gene pool being "profoundly different" from that of other higher primates, such as chimpanzees,41 within which genetic variation is more diverse in a single social group than in the entire human race!42 Researchers studying historical human genetic variation must therefore be very careful with their experimental design; they must try to sample only those populations that they have reason to believe have been relatively stable and isolated through the relevant period of history.
Analytical concerns. Alan Templeton, a world-famous researcher and expert on the analysis of population genetic information working out of Washington University in St. Louis, and others, including Keith Crandall, a professor of integrative biology, microbiology, and molecular biology at Brigham Young University, have outlined a research protocol that may help avoid these problems.43 When Templeton applied this new technique to the analysis of human genetic population structure, one of his primary conclusions was that human populations have experienced ubiquitous genetic interchange throughout their history.44 He underscored the idea that although a population may have a strong genetic signature originating from a particular geographic location, there is nearly always some genetic variation that cannot be explained by the predominant hypothesis. Rather than discounting this unexplained variation, he maintained that it is an indication that variation from other sources may have a significant influence, even though the source of the information may not be ascertainable.
Templeton also found that different types of DNA varied in their ability to resolve questions of range expansion, long-distance dispersal, and isolation by distance factors, largely owing to the ways in which the particular type of DNA recombines or does not recombine. Mitochondrial DNA does not recombine at all, and Y chromosomes may recombine with X chromosomes in some regions but not in others. X chromosomes and autosomal chromosomes (chromosome pairs 1—22), however, recombine among homologs relatively frequently. Implementation of a given type of DNA in population-based studies may require a unique experimental design because recombination blurs analytical results, making interpretation of the data ambiguous. For example, it has been demonstrated that the mitochondrial genome and the nonrecombining portion of the Y chromosome are subject to a large degree of stochastic error because they do not recombine, meaning that any calculations of timing of divergences resulting from analysis of these molecules should be seen as uncertain estimates.45 One study based on a marker on the Y chromosome concluded that the common ancestor of all living males lived 270,000 years ago, but the 95 percent confidence interval placed on this value means effectively that this common ancestor may have lived at any time between yesterday and 800,000 years in the past.46 When considering uniparental, nonrecombining DNA, uncertainty is the rule of thumb, and results must be considered gross estimates, the exact value of which is completely dependent on influential factors such as natural selection, effective population size, and the degree of gene flow.
Most surviving mutations in the mitochondrial genome have been shown to be selectively neutral, but this is not necessarily true in the nuclear genome. When the effective female population is small (that is, when only limited numbers of the females in the population do all of the childbearing), population genetics theory predicts that mutations may become fixed more quickly in mitochondrial genomes, resulting in overestimates of the timing of coalescence (the approximate date when an ancestor may have lived from which an extant variation originated).47 Likewise, when gene flow between populations is prevalent, populations evolve much more slowly and as if they are much larger; but if gene flow is sparse, populations will evolve independently and much more quickly. It is clear that techniques used to resolve interspecies relationships (which are generally not at the population level but at higher taxonomic levels, where considering the effects of these phenomena is not as important) should not be applied indiscriminately to studies of populations within species.48 Even population-level genetic relationships should not be equated with lineal genealogies. Thus, careful experimental design, biologically appropriate methods, and conservative interpretation of results are essential.
Conclusions from empirical studies. A recent article addressing the subject of historical Amerind (Native American) population genetics underscores the perspective that conclusions resulting from the analysis of human genetic markers must be interpreted conservatively:
Human geneticists might be well advised to only modestly suggest that their suggestions with regard to the identification of population waves for archaeological consideration are simply exercises in speculation that have little precision. Our research continues to document the unique composition of genomes in space and time, but interpretations of the exact process by which genetic diversity has accumulated should be stated with greater caution, if it is to have credibility among a broader range of disciplines. . . . The difficulties that attend the appropriate incorporation of information from biparentally inherited loci into the effort to reconstruct population history—an effort that is the ultimate goal of most anthropological geneticists—can be only broadly imagined on the basis of this example [the case of the Amerinds presented in the article].49
Thus, recovering a specific genetic signature, even one that may have been of major historical importance, may not be possible. Furthermore, if a genetic novelty is recovered and it is suspected that it may correspond to a historical event, it may not be advisable to suggest the correlation unless there are multiple lines of evidence. It would be extremely inadvisable for any scientist to claim to have found Lehi's genetic signature, even if the claim was merely to have recovered the remnant of a limited Middle Eastern migration. If my research yielded such results, I would simply claim that other variants exist that are not easily explained but that there may be some historical relationship or similarity to Old World genetic lineages with possible descendants in present-day Middle Eastern communities. Any conclusions that go beyond the presentation of demonstrable data would invite the scrutiny and criticism of the scientific community, and rightly so. Conservatism in one's conclusions should always be the rule, never the exception.
Ancient DNA. The use of ancient DNA for studying human evolutionary relationships has experienced a moderate level of success. For example, DNA was extracted from a Neanderthal (Homo neanderthalensis) fossil that was collected nearly 150 years ago from western Germany. Results indicated that Neanderthals and modern humans are four times more distantly related than the most divergent of human lineages50 and confirmed that no extant human is even partially descended from a Neanderthal lineage.51 Ancient DNA obtained from museum specimens has also been useful when inferring species relationships among extinct organisms such as the quagga, a zebra relative.52 Therefore, the use of DNA from preserved skeletal material and mummies may be very useful in studying human origins and diversity. However, studies incorporating ancient DNA must be interpreted with more than usual care due to the high probability of spontaneous DNA degradation and possible violations of the assumptions used to estimate genetic relationships (for instance, the possibility that the specimens do not originate from the same time frame or temporal context). Results must be interpreted with a conservative eye to avoid conclusions that go beyond what is appropriate considering the nature of the data and the accepted governing scientific principles.
A haplotype (also termed a multilocus genotype) is a distinct variant of a group of linked loci. Strictly speaking, a haplotype may be isolated for comparison by cutting homologous DNA sequences with restriction enzymes to identify restriction fragment length polymorphisms (RFLPs), amplifying length variants in satellite DNA using the polymerase chain reaction (PCR), sequencing a distinct region of DNA to reveal nucleotide variation, or any number of different techniques that distinguish derived genetic characters within a single linkage group. Groups of haplotypes that share prominent features are considered monophyletic (of a single origin) and are referred to as haplogroups.
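As a rough illustration of the definition above, a haplotype can be treated as the ordered combination of character states observed at a set of linked sites. The sketch below uses made-up individuals and site states (all hypothetical, not drawn from any study cited here) to tally the distinct haplotypes in a small sample and to count the sites at which two haplotypes differ.

```python
from collections import Counter

# Hypothetical data: each individual is scored at the same four linked sites,
# e.g. presence ("1") or absence ("0") of a restriction site.
samples = {
    "ind1": "1010",
    "ind2": "1010",
    "ind3": "1110",
    "ind4": "0001",
    "ind5": "1110",
}

def haplotype_counts(scored):
    """Count how many individuals carry each distinct multilocus variant."""
    return Counter(scored.values())

def differences(hap_a, hap_b):
    """Number of sites at which two equal-length haplotypes differ."""
    return sum(a != b for a, b in zip(hap_a, hap_b))

counts = haplotype_counts(samples)
for hap, n in counts.most_common():
    print(hap, n)
```

Haplotypes separated by few differences (here "1010" and "1110") would be candidates for the same haplogroup; real studies make that judgment from shared derived characters, not from raw difference counts alone.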
Relative to human population studies, haplotype information has been gathered from many potential sources, including mitochondrial genomes, Y chromosomes, and autosomal chromosomes. Several correlations have been made between the molecular evolution of these genetic markers and the development of regional linguistics.53 In fact, cross-referencing genetic and linguistic studies provides a rich context by which genetic information may be interpreted. However, certain assumptions must be taken into account when considering such a correlation, including the following: (1) once language families diverge, they never again exchange migrants—an idea that is not supported by genetic evidence54—and (2) genetic lineages diverge quickly in small populations and slowly in large populations such that a molecular clock cannot be invoked.55 Not surprisingly, definite conclusions that explain all the observed genetic variations are few.56 Characterizing the dynamics of human population genetics is a highly complex research pursuit and must be approached with a certain degree of conservatism and skepticism.57
Mitochondrial haplotypes. One of the first very important human population studies was performed in 1984 by a research group at the University of California at Berkeley using 12 restriction enzymes that produced polymorphisms relative to 441 cleavage sites in the human mitochondrial genomes of 112 people from 4 continents. Of these sites, 163 were polymorphic for cleavage, most likely due to a single-base mutation that was most probably under very little functional constraint. Although very few inferences regarding historical contact or migrations were drawn from these data, the enormous amount of genetic variation among humans, especially within the mitochondrial genome, was an obvious conclusion of the study. It also revealed a type of coevolution between the mitochondrial cytochrome oxidase subunit 2 and the nuclear cytochrome c genes, both of which are involved in cellular energy production (as part of the electron transport chain) and evolve roughly five times faster in primates (including humans) than in rodents or ungulates. This study represented the most comprehensive comparative study for closely related, complete mitochondrial genomes of that period, but—of importance to the topic of this essay—this study did not include any Native American samples.58
The group at Berkeley followed up the 1984 study with a paper published in the internationally prestigious scientific journal Nature. This paper, entitled "Mitochondrial DNA and Human Evolution," has since become the foundation for the study of human population genetics. It drew upon restriction-map data from 147 people from 5 geographic populations, once again not including Native Americans. The main conclusion of this study was that the common female ancestor of these sampled individuals lived about 200,000 years ago59—an individual who has since become known as "mitochondrial Eve." This controversial study has since been confirmed multiple times, although the exact time frame and other details relative to our most recent common female ancestor remain unclear.60 Other questions persist—most notably, To what extent does the history of a locus represent the history of a population?61
Some resolution has been achieved by correlating the results of population genetics, archaeology, and linguistics. For example, it has been suggested that one of the major routes of humans from Africa to Eurasia (the combined European and Asian continents) may have been across Saudi Arabia, through Iraq and Iran, dispersing to Pakistan and along the coasts of the Indian subcontinent to East Asia, and then on to the islands of Oceania, including Australia and New Guinea. Archaeological evidence suggests that Australia has experienced continuous human occupation for about the past 60,000 years, and it is clear that people have inhabited New Guinea for at least 45,000 years.62 These approximate dates may be used to calibrate the molecular clock emergent from genetic studies such that the timing of each event along the route of migration may be inferred.63 This, however, is the approximate limit of the technique; only mass migrations may be inferred, and only with a degree of uncertainty, and only if there is corroborating evidence. Details of historical human migration may be inferred without correlating these three lines of support, but only at the cost of uncertain absolute dates and a reliance on unsubstantiated assumptions.
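The calibration step described above can be sketched as simple arithmetic. The example assumes a strict molecular clock (a constant substitution rate along every lineage), and the divergence values are hypothetical numbers chosen only to make the arithmetic easy to follow; only the 60,000-year calibration date comes from the text.

```python
def calibrate_rate(divergence, calibration_years):
    """Substitutions per site per year along one lineage.

    divergence: average pairwise differences per site between two lineages
    assumed to have split calibration_years ago. The factor of 2 reflects
    change accumulating independently along both lineages.
    """
    return divergence / (2.0 * calibration_years)

def estimate_age(divergence, rate):
    """Estimated years since two lineages split, under a strict clock."""
    return divergence / (2.0 * rate)

# Hypothetical divergence values (not taken from any cited study).
rate = calibrate_rate(divergence=0.012, calibration_years=60_000)
age = estimate_age(divergence=0.004, rate=rate)
print(f"rate = {rate:.2e} per site per year, estimated age = {age:,.0f} years")
```

Any error in the calibration date or any departure from a constant rate carries straight through to the estimated age, which is one reason such dates are best read as approximate.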
The historical population structure of Native Americans may be characterized by the four major haplogroups A, B, C, and D.64 All have been associated with an Asian origin. There are also rarer haplotypes that do not appear to be part of haplogroups A–D. These "other" haplotypes65 form a monophyletic haplogroup66 that is curiously similar to the uncommon European and Druze (Israel) haplogroup X.67 This haplogroup is currently endemic to Native American groups in North America (including the Ojibwa, Nuu-Chah-Nulth [Nootka], Sioux, Navajo, and Yakima68) and has also been identified among the Yanomami of the northern Amazon.69 Accumulated fixed differences between the "other" haplotypes of Native Americans and the European/Druze haplogroup X indicate that they may have had a common ancestor between 12,000 and 36,000 years ago,70 representing a fifth founding lineage of Native Americans.71 However, this may be an overestimate if the original founding population was very small. As discussed above, population size and the probability of fixation are inversely related, so a small historical population may appear older than it is when a constant, large population size is assumed for lack of evidence to the contrary. The recent discovery of a 9,300-year-old Caucasoid human skeleton buried near Kennewick, Washington (the so-called Kennewick man72), may provide an independent confirmation of molecular findings surrounding haplogroup X or, at the very least, allow for the possibility of Caucasoid habitation in the Americas.73
Subsequent research has identified haplogroup X among the Altaian people of south Siberia,74 and some have suggested that this invalidates previous speculation of a Caucasoid ancestry for haplogroup X;75 but this suggestion is based on the speculation that haplogroup X must originally have come from Asia because haplogroups A–D also originate in Asia.76 This explanation, however, does not account for the fact that haplogroup X is found to be more widespread in Europe than in Asia, while haplogroups A–D are not found in Europe. Far from determining that there was a single place of origin for Native Americans, these new data underscore the possibility that X and A–D may be parts of completely separate lineages. In general, without a proper outgroup (DNA sequences that have a sister relationship to the study group DNAs) to polarize the relationships of the population network, it is nearly impossible to determine the point of origin.
Several possible conclusions may be consistent with these data, including the following: (1) as presented by Derenko et al., that Altaians represent the origin of the haplogroup77 (which does not explain why Europeans and Israelis also possess it); (2) that haplogroup X originated in Europe and migrated independently to south Siberia and North America; (3) that haplogroup X originated in Europe and migrated to Israel, south Siberia, and then on to North America;78 or even (4) that haplogroup X originated somewhere central to Europe and Asia (perhaps near Israel) and migrated in several directions at the same time, arriving in North America as part of the same dispersal (a scenario not unlike the diaspora). Given that fluctuations in population sizes may affect the rate at which variants become fixed in populations,79 none of these hypotheses, nor any of a host of other hypotheses that may or may not be testable, can be verified. It is very possible that migrating populations originally represented only small subpopulations of a much bigger parent population; genetic drift may thus have had a great effect among founders, generating more fixed differences while at the same time ridding the population of a greater percentage of its within-population variation than is expected by chance alone.
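The effect of founder population size on drift can be illustrated with a small simulation. This is a minimal sketch of a neutral, haploid Wright-Fisher model, a standard textbook idealization and not a method used in the studies discussed here; the population sizes, starting frequency, replicate count, and seed are all arbitrary assumptions.

```python
import random

def generations_until_fixed_or_lost(pop_size, start_freq, rng, max_gen=100_000):
    """Follow one neutral variant until it is fixed or lost.

    Each generation, every individual independently inherits the variant
    with probability equal to its current frequency. Returns
    (generations, fixed), where fixed is True if the variant reached 100%.
    """
    count = round(pop_size * start_freq)
    for gen in range(1, max_gen + 1):
        freq = count / pop_size
        count = sum(rng.random() < freq for _ in range(pop_size))
        if count == 0 or count == pop_size:
            return gen, count == pop_size
    return max_gen, False

def mean_time(pop_size, start_freq=0.5, replicates=100, seed=1):
    """Average number of generations until fixation or loss."""
    rng = random.Random(seed)
    times = [generations_until_fixed_or_lost(pop_size, start_freq, rng)[0]
             for _ in range(replicates)]
    return sum(times) / len(times)

small = mean_time(20)    # hypothetical small founding group
large = mean_time(100)   # hypothetical larger population
print(f"mean generations to fixation or loss: N=20 -> {small:.1f}, N=100 -> {large:.1f}")
```

Under these assumptions the small population resolves its variation in far fewer generations, which is the sense in which a small founding group can accumulate fixed differences quickly and so appear older than it is.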
Another haplotype, C10,80 is found only among the Cayapa people of Ecuador, who possess it in relatively high frequencies (30 percent). C10 does not appear to be closely related to any other extant human haplotype, although it appears that it may be loosely related to haplogroup C to the exclusion of haplogroups B and A. At best, haplotype C10 represents a lineage that has a questionable origin.
Mitochondrial studies have also been performed with the remains of ancient Maya from the Postclassic period of A.D. 900–1521, just prior to European colonization.81 Findings include the identification of a single individual (1 out of 16) whose mitochondrial haplotype failed to correspond to any of the known extant haplogroups (A–D). Although another unidentified haplotype was isolated among contemporary Maya, it was discounted as the product of modern European admixture.82 However, the presence of a similarly unidentified haplotype in ancient Maya may call this conclusion into question.
Although the preponderance of mitochondrial genome data supports the hypothesis that the Americas were originally peopled by humans from eastern Asia, the exact location of the source population and the number of migration waves remain controversial,83 despite claims to the contrary.84 The presence of haplogroup X, haplotype C10, and the "unknown" Maya haplotypes (both ancient and modern), however, emphasizes the fact that much that has been discovered is yet to be explained. A hypothesis for the diversity of Native American mitochondrial genome haplotypes that relies exclusively on an out-of-Asia origin falls short of a complete explanation.
Y-chromosome haplotypes. Parallel to human studies of the matrilineal mitochondrial genome are studies of the Y chromosome, its patrilineal counterpart. However, unlike the mitochondrial genome, or even autosomal chromosomes, the Y chromosome exhibits very little polymorphism85 yet is subject to a large measure of stochastic error.86 The lack of genetic variation may be the result of episodic selective sweeps, but the exact mechanism for this evolutionary constraint remains unclear.87 Nevertheless, great effort has been exerted to discover fixed differences that may act as diagnostic haplotypes that allow for the identification of human founder events. To date, these fixed differences have been found within several genes and noncoding regions such that the construction of compound haplotypes has been possible.88 A positive correlation between Y-chromosome haplotypes and linguistic patterns has also been deduced.89
Since Y-chromosome markers lack much of the genetic diversity that mitochondrial genomes exhibit, the ambiguity arising in the data is somewhat compounded. It is very difficult to differentiate true ancient relationships from relatively recent and extensive European admixture resulting from colonization after the time of Columbus. One example of this problem is a recent study that examined Native American Y-chromosomal haplotypes and concluded that there may have been two separate lineages of populations migrating to the Americas,90 a conclusion that has been confirmed by independent evaluation.91 Of the five Native American haplotypes, four (haplotypes 1, 10, 20, and 31) exhibited only 1–2 mutational differences among them, while the fifth haplotype (23) clustered tightly with other haplotypes to the exclusion of the first four. The fifth haplotype was more closely allied with Central East Asian, Evenki, and Mongolian haplotypes (7, 24, and 28); the first four were similar to these, as well as to Altai, Ket, Indian, and European haplotypes (4, 6, 13, and 32). When the data were analyzed using a different optimality criterion, however, the results converged on a single lineage emerging from Asia, largely discounting the strong relationship with European haplotypes (4 and 6 were exclusively European) and the presence of a single haplotype (31) that did not appear in any sample population outside the Americas.
Although I do not necessarily disagree with this study's conclusion that Native American Y-chromosome lineages originate largely from Asian source populations,92 I do find that it fails to explain many aspects of the resulting data. For example, when the haplotypes shared by Europeans and either Native Americans or Siberians were excluded from the analysis, it did not appreciably change the ancestral relationships inferred from the data, indicating that modern European admixture is not a plausible explanation. Yet the most common European haplotype (1) also appears in Native Americans, suggesting that there has been modern admixture. The authors of the study then refer to studies involving Kennewick man93 and haplogroup X94 as evidence of a Native American-European connection, only to turn right around and explicitly state that a recent European admixture is likely. Needless to say, conclusions are far from definite.
Differing results from mitochondrial DNA and Y-chromosome analysis. The previous example points out the problem scientists have with ambiguity, especially the uncertainty emerging from human Y-chromosome data. One issue that can create ambiguity is the inherent difficulty of interpretation presented by inferring population dynamics from gene-based markers. The problem was defined clearly in a recent paper on New World Y-chromosome haplotypes:
Gene trees [relationships inferred from gene variation] such as our Y-chromosome scaled coalescent tree . . ., the numerous mtDNA trees in the literature (Cann et al. 1987), and the recent global β-globin-analysis tree based on autosomal sequence data (Harding et al. 1997) are not equivalent to population trees [the true relationships of populations]. Inferences about population relationships derived from gene trees must be made very cautiously, especially since each gene has its own evolutionary history (Harpending et al. 1998).95
This difficulty is compounded when polymorphism levels are low, as is the case with much of the Y-chromosome data. Although many researchers acknowledge this to be the case,96 some continue to use relationship-reconstruction techniques that ignore the problem, yet they freely draw seemingly unambiguous conclusions from their inferences.97 This problem is further amplified with regard to the question of ancient colonization of the New World by the fact of extensive and prolonged gene flow from Asia,98 which serves to confound the ability of scientists to reconstruct the historical population structure of Native Americans.99
Ambiguity notwithstanding, some authors of studies with multiple interpretations relative to possible recent European admixture in the Americas point out that the estimated dates of dispersal generally correspond to the estimated age of Kennewick man.100 This acknowledgment suggests that at least some researchers have reason to be skeptical of the global acceptance of the prevailing "out-of-Asia" paradigm. As a recent commentary put it, "Genetic evidence derived from contemporary populations can only study lineages that survived. It is impossible to estimate the number of nonsurviving lineages"101—meaning that if a population is currently extinct due to war or some kind of natural disaster, we could never infer their existence from DNA data because they would have no descendants. Furthermore, this would be true independently for each genomic linkage group, which is the primary reason why mitochondrial DNA and Y-chromosome data may yield different analytical results.102
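The point about nonsurviving lineages can be made concrete with a simple branching-process sketch. It assumes a stable population in which each woman carrying a given mitochondrial lineage leaves 0, 1, or 2 daughters (mean 1); that offspring distribution, the generation counts, the trial count, and the seed are illustrative assumptions, not estimates for any real population.

```python
import random

def lineage_survives(generations, rng):
    """Follow one founder's maternal lineage; True if any carrier remains."""
    carriers = 1
    for _ in range(generations):
        # Each carrier leaves 0, 1, or 2 daughters (two fair coin flips).
        carriers = sum((rng.random() < 0.5) + (rng.random() < 0.5)
                       for _ in range(carriers))
        if carriers == 0:
            return False
    return True

def survival_fraction(generations, trials=2000, seed=7):
    """Fraction of simulated founders whose lineage is still present."""
    rng = random.Random(seed)
    return sum(lineage_survives(generations, rng) for _ in range(trials)) / trials

after_10 = survival_fraction(10)
after_100 = survival_fraction(100)
print(f"lineages surviving: 10 generations -> {after_10:.3f}, "
      f"100 generations -> {after_100:.3f}")
```

In this idealized model most single-founder lineages vanish by chance alone, with no war or disaster required, and a sample of living people carries no record of them.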
Differing results from mitochondrial DNA and Y-chromosome analysis. One factor that may produce conflicting conclusions from different human genetic data sets is the differing regional dispersal patterns of males and females. A good example of this is a recent study entitled "Mitochondrial and Nuclear Genetic Relationships among Pacific Island and Asian Populations." Among 745 samples collected throughout eastern Asia and major islands of the Pacific Ocean, mitochondrial data (190 bp) correlate closely with linguistic data, suggesting that peoples of remote Pacific islands originated from human populations of Southeast Asia. Nuclear data (17 short tandem-repeat [STR] loci) from these samples, on the other hand, fail to correlate with linguistic data but underscore a relationship between peoples of larger western islands and smaller eastern islands.103 On the surface, these data appear to be in conflict, even to the point of supporting conflicting hypotheses for human dispersal in the islands of Melanesia, referred to as the "express train" and "entangled bank" hypotheses.104 These differing results, however, may reflect different dispersal patterns among males and females, with females dispersing from southern China to the remote islands via primary expansion (the "express train"). In contrast, males probably dispersed secondarily without exterminating the local female population, whether by completely displacing the local males or by extrapair copulations while engaged in fishing or merchant ventures (thus resulting in an "entangled bank").105 Although this is just one interpretation of these data and others may be possible, given additional data from other genetic loci, this article stresses the importance of considering multiple points of view in an effort to characterize a scenario that is consistent with all of the data, not just those that fit one's a priori assumptions.
As noted above, mitochondrial DNA and Y-chromosome data may have independent natural histories, resulting in inferential discrepancies. Recent findings confirm previous conclusions106 that these discrepancies have a cultural basis.107 The differing conclusions resulting from the analysis of these linkage groups are largely the product of either men remaining near their birthplace while women migrate to be near them (termed patrilocality)108 or women remaining near their birthplace while men migrate (termed matrilocality).109 Each scenario results in a different discrepancy among analytical results. Patrilocality would naturally produce a high rate of mitochondrial change and a low rate of Y-chromosome change, while matrilocality would naturally produce the opposite result. This is exactly what was found.110 However, patrilocality prevails in the majority of peoples sampled to date,111 resulting in Y-chromosome data that are less robust than mitochondrial data, thus yielding different inferences.112
This review has produced several biologically meaningful conclusions relative to the question of whether it is possible to recover an ancient genetic signature of a small migrating group that lived 2,600 years ago—namely, the parties of Lehi and Mulek, who, the Book of Mormon claims, migrated to the Americas from Jerusalem just prior to the occupation of Judah by the Babylonians. Each of these conclusions is open to interpretation because each necessitates the application of scientific concepts and assumptions, which is largely a subjective endeavor. One of the most common misconceptions of science, especially among the lay public (and new biology students), is that it is a completely deterministic process. If experiments are performed correctly, they reason, the results will have no ambiguity. In reality, not only are the results highly ambiguous, but it is often difficult to come up with an appropriate experimental design when little is known of a topic. In practice, a lot of experimentation is exploratory in nature. If the dynamics of a system are unknown, experiments are designed that will allow the researcher to gain an intuition for how the components are related and interact. Thus, initial experimentation is largely for the purpose of probing a system such that a preliminary understanding of the applicable parameters may be ascertained.
Some of the students I train in laboratory research express frustration with my inability to answer their questions with confidence. Quite often I tell them that one conclusion would be best supported under one set of circumstances, while another would be supported under another set of circumstances. Furthermore, I add, the set of assumptions (both explicitly stated and implicitly supposed) limits the conclusions that are possible given the data. These assumptions are frequently difficult to reveal or even understand unless the researcher has a great deal of experience with the system in question. Put plainly, the more complex the system, the harder it is to interpret the data appropriately.
Such is the case with those who have attempted to draw conclusions regarding the validity of the Book of Mormon based on the current body of human genetic data.113 They reveal their ignorance of scientific principles by drawing conclusions that are inappropriate. They ignore pertinent information because they do not know that it may be important, or they fail to probe the primary literature, opting instead to use summaries or popular scientific literature exclusively because they have a difficult time interpreting much of the data for themselves. They simply trust the speculative suggestions of scientists, when all the scientists were doing was offering a possible interpretive alternative—a hypothesis that may or may not be testable—rather than stating a definite conclusion that is emergent from the facts because such a conclusion may not be possible given the data.
This review first concluded that, regardless of the answer to the essential question under consideration, it is not possible to conclude logically that the Book of Mormon is not true based on its story line. Nothing can be proven in science; hypotheses can only be rejected. Thus, if it is not possible to recover such a signature, it also is not possible to disprove the Book of Mormon based on genetic data. Conversely, if it is possible to recover a genetic signature like Lehi's or Mulek's, the mere fact that it has not been recovered means nothing with regard to the truthfulness of the Book of Mormon. Either way, the Book of Mormon does not present a testable hypothesis in terms of human population genetics.
Putting the philosophical ramifications of scientific method aside, I then attempted to test the hypothesis that it is possible to recover the ancient genetic signature of Lehi or Mulek. The story line of the Book of Mormon presents a great deal of information bearing on the conditions known to preserve genetic signatures (which would include the preservation of a suite of genetic alleles over evolutionary time):
Thus, almost all the assumptions of Hardy-Weinberg equilibrium were violated by the Book of Mormon peoples. According to the specifics of the Book of Mormon story line, it may not be possible to recover the genetic signature of Lehi or Mulek. Too many influences would have resulted in too many violations of equilibrium-preserving conditions. In light of this information, a population geneticist would not even bother designing an experiment to test the hypothesis because there would be no reason to expect a successful result. Furthermore, if it were possible to recover the genetic signature, there would be no way to verify its source. One would expect that if Lehi's or Mulek's genetic signature was found, it would be categorized as "unknown" or "other" or "unrelated." Based on this information, and if I were forced to design an experiment that would produce evidence in support of the Book of Mormon, I would look for haplotypes that are not closely related to any extant ethnic group, but appear to be older—perhaps much older—than 2,600 years. Curiously, documentation of such haplotypes is exactly what is emerging in the literature (haplogroup X, haplotype C10, the "other" haplotypes from ancient and modern Maya, the unexplained Y-chromosome haplotypes, and so forth), but interpretation of these data is largely avoided in the individual studies because they do not correspond well to the current scientific paradigm. However, I will stop short of interpreting these "other" data as belonging to the Book of Mormon peoples because it is completely unverifiable. As indicated, one cannot prove anything; one can only reject hypotheses.
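For readers unfamiliar with the equilibrium referred to above, the sketch below shows what Hardy-Weinberg proportions predict for a single locus with two alleles and how far a sample departs from them. The genotype counts are hypothetical, and the chi-square statistic is only the textbook goodness-of-fit measure, not an analysis performed in this essay.

```python
def hardy_weinberg_expected(n_AA, n_Aa, n_aa):
    """Expected genotype counts (AA, Aa, aa) under Hardy-Weinberg proportions."""
    total = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * total)  # frequency of allele A
    q = 1 - p                            # frequency of allele a
    return (p * p * total, 2 * p * q * total, q * q * total)

def chi_square(observed, expected):
    """Goodness-of-fit statistic; assumes every expected count is nonzero."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical sample of 100 individuals with a deficit of heterozygotes.
observed = (50, 20, 30)
expected = hardy_weinberg_expected(*observed)
statistic = chi_square(observed, expected)
print("expected counts:", [round(e, 1) for e in expected])
print("chi-square:", round(statistic, 2))
```

A large statistic signals that at least one equilibrium assumption (random mating, large population size, no migration, no selection, no mutation) does not hold for the sample, though it cannot say which one.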
My next point builds on this: current human population genetic data produce many ambiguous results that are hard to interpret, so they must be interpreted conservatively. Individual studies also present more data than fit the general conclusions of the paper, and those data must eventually be dealt with. If we read a human population genetics study that purports to have definite, ironclad conclusions drawn from data of questionable interpretation, we should feel fairly confident that the authors are going beyond what the data will realistically allow them to conclude. The leading experts in the field are currently urging their colleagues to avoid definite conclusions because of the lack of precision produced by conflicting data.114 This professional skepticism, however, rarely makes its way into popular media or literature reviews because there are no definite conclusions to report. Those who question the truth of the Book of Mormon based on genetic data would be well advised to avoid these publications like the plague because they present only part of the story. They generally omit the part that tends to be the most pertinent to the critics' essential question: the ambiguous results.
The general conclusion of this essay, therefore, is that although it may be possible to recover the genetic signature of a small migrating family from 2,600 years ago, it is not probable. But either way, it would not allow the story line of the Book of Mormon to be rejected because the absence of a genetic signature means absolutely nothing.
That said, I feel compelled to voice my professional confidence in those who are actively researching human population genetics. I have read a large body of primary literature while compiling this review, and I have found the methods and interpretation of results to be consistent with scientific principles and current thought. I am convinced that there has been constant gene flow between Asia and the Americas, but I am also convinced that there has been a trickle of migrants from other source populations. Though far from verifying or proving the Book of Mormon, this observation allows for the plausibility of the Book of Mormon story line. It is very possible that a group or groups of people from the Middle East found their way to the New World in 600 B.C. Others had made the trip from somewhere other than Asia at much earlier dates. Thus, a statement that the Book of Mormon account is absolutely impossible would be at the very least naïve, but most probably quite foolish. It would reveal the overall absence of scientific training, as well as an underlying agenda.