Nicholas Wade has a problem. Although his new book, “A Troublesome Inheritance: Genes, Race and Human History”, appears to be selling well, he’s not encountering the praise that he expected from biologists for “courageously” freeing them from the “intimidating social scientists” on the subject of race.
What is he arguing? I go over this briefly in my recent piece on the Huffington Post, and in much greater detail here on this blog, but essentially Wade is using patterns of human variation in populations as a justification for claiming that race is a valid, biological taxonomic category. He goes on to speculate (and that’s really the only word for it, since his claims are unsupported by the preponderance of scientific evidence) that these racial differences determine behavioral differences and thereby explain why some civilizations have historically been more successful economically and politically than others. (You can guess which races he’s talking about; his speculation happens to coincide neatly with traditional stereotypes.)
Wade claims that all critics of this viewpoint are motivated by political concerns and ignore data showing that races are genetically distinct enough to be meaningful taxonomic categories of humans. His book relies particularly upon one genomics study to support this point. In his words (emphasis mine):
Raff and Marks take issue with one of these surveys, Rosenberg et al. 2002, which used a computer program to analyze the clusters of genetic variation. The program doesn’t know how many clusters there should be; it just groups its data into whatever target number of clusters it is given. When the assigned number of clusters is either greater or less than five, the results made no genetic or geographical sense. But when asked for five clusters, the program showed that everyone was assigned to their continent of origin. Raff and Marks seem to think that the preference for this result was wholly arbitrary and that any other number of clusters could have been favored just as logically. But the grouping of human genetic variation into five continent-based clusters is the most reasonable and is consistent with previous findings. As the senior author told me at the time, the Rosenberg study essentially confirmed the popular notion of race.
It’s not a question of logic, but rather of what the data show. The Rosenberg et al. (2002) paper did not analyze or identify just 5 clusters; it considered values of K from 1 to 20. What Wade is omitting from his paragraph above (and also from his book) is that Rosenberg and colleagues never presented any statistical justification for the choice of 5 clusters over any other number.
Here are the specifics of my criticism, which I posted in response to a commenter on my blog. (If you’re not interested in the statistical refutation of Wade’s argument, feel free to skip this paragraph. I hope Wade takes the time to read it, though).
Structure starts with the assumption that there are K populations (where K is assigned by the user). It assigns individual genotypes to those populations using a Markov chain Monte Carlo method that minimizes deviations from Hardy-Weinberg equilibrium and also minimizes linkage disequilibrium. Structure will provide an estimate of the probability of the data for a given K, Pr(X|K), which it reports on a log scale as LnP(D). Using this estimate to choose between values of K is not without some controversy: the authors of structure caution that it “merely provides an approximation” and “biological interpretation of K may not be straightforward.” Evanno et al. (2005) observed that the log probability of the data was often not maximized at the correct value of K, and they recommended ∆K, a measure based on the second-order rate of change of the likelihood function with respect to K, as a better estimate of which K most accurately describes the data. Many authors follow this practice, although it has been argued against (Waples and Gaggiotti, 2006).
Neither the Rosenberg et al. (2002) paper nor the (2005) paper reports the LnP(D) (or ∆K) for any value of K, so we have no way of determining which is the “best” value of K. (I use “best” in quotes because LnP(D) is a somewhat disputed metric, as I discussed above, but at least it’s some way of evaluating between different Ks). Bolnick (2008) reports information about the unpublished LnP(D) values (provided by personal communication from N. Rosenberg): “…no single value of K clearly maximized the probability of the observed data. Probabilities increased sharply from K=1 to K=4 but were fairly similar for values of K ranging from 4 to 20. The probability of the observed data was higher for K=6 than for smaller values of K, but not as high for some replicates of larger values of K. The highest Pr (X|K) was associated with a particular replicate of K=16, but that value of K was also associated with very low probabilities when the individuals were grouped into 16 clusters in other ways. Consequently it is uncertain which number of genetic clusters best fits this data set, but there is no clear evidence that K=6 is the best estimate.” (p77)
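For readers who want to see what evaluating different values of K involves, here is a minimal sketch of a ∆K-style calculation in Python. It assumes a simplified form of the statistic: the absolute second-order difference of the mean LnP(D) across replicate runs, divided by the standard deviation of LnP(D) at that K. The function name `delta_k` and all of the numbers are hypothetical, made up purely for illustration. They are not values from Rosenberg et al. or from any other study.

```python
from statistics import mean, stdev


def delta_k(lnpd_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnpd_by_k maps each K to a list of LnP(D) values, one per replicate run.
    """
    means = {k: mean(values) for k, values in lnpd_by_k.items()}
    result = {}
    for k in sorted(lnpd_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # undefined at the ends of the tested range of K
        # Absolute second-order difference of the mean log probability
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        spread = stdev(lnpd_by_k[k])
        if spread == 0:
            continue  # identical replicates: the ratio is undefined
        result[k] = second_diff / spread
    return result


# Hypothetical LnP(D) values for three replicate runs at each K (NOT real data)
example_lnpd = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4503.0, -4497.0],
    3: [-4200.0, -4204.0, -4196.0],
    4: [-4150.0, -4190.0, -4110.0],
    5: [-4140.0, -4200.0, -4080.0],
}

scores = delta_k(example_lnpd)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Two things stand out even in this toy case. The calculation needs the per-replicate LnP(D) values for every K, which is exactly the information that was not reported. And because the statistic needs a neighbour on each side, it cannot evaluate the smallest or largest K tested, so it can never return K=1.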
Structure is not designed to be applied to populations that experience isolation by distance (IBD), as most human populations do. The authors of the program explicitly warn against this, which unfortunately hasn’t stopped people from doing it. Guillot (2009) discusses this further:
“Another confounding factor of the clustering algorithms is IBD. All the models make sense fully only at a scale that is small enough to ignore its effect. At larger spatial scales, any species is affected by IBD and assuming within-cluster panmixia becomes inappropriate.” and “The general effect reported is that the presence of clinal variations tends to be interpreted as the presence of clusters and a number of clusters larger than one is generally inferred, even though no barrier to gene flow was present.”
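A toy illustration of the general problem, under loudly stated assumptions: the sketch below does not use structure or its model at all. It swaps in plain one-dimensional k-means as a stand-in for "a method that is told how many clusters to find", and applies it to a hypothetical, perfectly smooth gradient with no gaps anywhere. The function `kmeans_1d` and the data are invented for this example.

```python
def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd's algorithm in one dimension, with evenly spaced starting centres."""
    lo, hi = min(values), max(values)
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centre
        labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        # Move each centre to the mean of its current members
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centres[j] = sum(members) / len(members)
    return labels


# Hypothetical cline: 100 individuals spaced evenly along a gradient (no barriers)
cline = [i / 99 for i in range(100)]
largest_gap = max(b - a for a, b in zip(cline, cline[1:]))

# Ask for 2, 3, 4, 5 and 6 clusters and count how many non-empty clusters come back
clusters_found = {k: len(set(kmeans_1d(cline, k))) for k in range(2, 7)}
print(largest_gap, clusters_found)
```

Whatever number of clusters is requested, that number comes back, even though the data contain no discontinuity at all. This is only an analogy for the behavior Guillot describes, not a reproduction of it, but it shows why the existence of K tidy clusters in the output is not, by itself, evidence of K discrete groups in the data.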
Structure can absolutely be a useful tool for inferring individual ancestry, but only with (1) an understanding of the assumptions inherent in the clustering algorithms, and (2) cautious interpretation of the results. Because of these caveats, careful and rigorous scientists generally view the “best” clustering scheme as a starting point for generating testable hypotheses about ancestry and population history, NOT as the basis for slicing the species into a small number of groups or races.
To summarize all the above: despite Wade’s claim that structure groups “human genetic variation into five continent-based clusters”, there is no statistical support for choosing 5 population clusters (which Wade thinks is most “reasonable,” “practical,” and “simple”) over any other number of clusters. Wade is certainly allowed to argue that there are 5 races in humans, but we have to recognize that this is not a scientifically based or genetic argument. Rather, it’s based on his subjective opinion of what is reasonable or practical, which is a very different thing. Wade deliberately omits discussion of this lack of statistical support, as well as the fact that the authors of structure caution against using it on populations that exhibit genetic variation due to isolation by distance (as humans do).
These points are all discussed more extensively in my original critique of Wade’s book. But instead of taking my criticisms seriously, Wade attacked my credentials (and those of his other critics, biological anthropologists Agustin Fuentes and Jon Marks).
My critiques of Wade’s misrepresentation of structure (and the Rosenberg et al. 2002 and 2005 papers) are inconvenient for his arguments, so instead of addressing them, he dismisses me as merely “a postdoctoral student.” (Postdocs are not students—a fact which I would have expected a science reporter who has been writing “for years in a major newspaper” to be familiar with. I finished my dual PhD in genetics and biological anthropology several years ago and have been conducting full time research on human population genetics since then.)
The fact that I am trained in statistics and specialize in human genomics research is inconvenient for Wade’s attempts to frame this issue as a case of “biologists” vs. “social scientists”, so he simply ignores my genetics expertise (saying that I have no “standing in statistical genetics, the relevant discipline”) and characterizes me as a politically motivated social scientist. He seems to be banking on the belief that people who read his response won’t have read my blog or be familiar with my credentials and publications.
Wade is highly critical of Fuentes and Marks in the same fashion, attacking them as people who “do little primary research” (an accusation which is both patently false and also problematic coming from someone who himself does NO scientific research). You can read their responses to him here and here.
I have no formal training in journalism, but it seems to me that this kind of argument from authority is a troubling approach for any science reporter to take.
I have no doubt that Wade put a lot of effort into constructing his arguments, and I can understand how he might be dismayed that the majority of credible biologists aren’t coming to the defense of his book. Rather, criticisms are piling up. Two excellent ones that were published after my last piece came out are from population geneticist Jeremy Yoder and biological anthropologist Alan Goodman, and there are many others in the bibliography of my last piece.
What Wade fails to understand is that scientists, by and large, are more convinced by data than rhetoric. He would do well perhaps to go back and read more of the recent literature emerging from anthropological genetics, human population genetics, and biological anthropology, in which human variation and recent evolutionary histories are being productively studied without depending upon culturally constructed partitioning and politically motivated labeling.
Bolnick DA. Individual ancestry inference and the reification of race as a biological phenomenon. In: Koenig BA, Lee SS-J, Richardson SS, editors. Revisiting Race in a Genomic Age. New Brunswick, NJ: Rutgers University Press; 2008. pp. 77–85.
Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol 14:2611–2620
Pritchard JK, Wen X, Falush D (2007) Documentation for structure software: version 2.2. University of Chicago, Chicago, pp 1–36
Waples and Gaggiotti, 2006 http://www.ncbi.nlm.nih.gov/pubmed/16629801