A simple guide to how science works.

“If it disagrees with experiment it’s wrong.
In that simple statement is the key to science. It doesn’t make a difference how beautiful your guess is. It doesn’t matter how smart you are, who made the guess, or what his name is;
If it disagrees with experiment, it’s wrong”
–Richard Feynman

Here’s what it’s like to be a scientist:

You fall in love with a subject. You immerse yourself in your chosen subject, spending a few years reading all of the research that has ever been done in it. You go to conferences where you hear about what other people in your field have been doing, and you talk to them about what you’re doing. You share ideas.

You begin to observe some patterns and develop some ideas of your own about how your subject works. Questions about your subject that have never been answered occur to you. You think hard about whether they’re good questions, and ask advice from more experienced people in your field. You come up with testable ideas (called hypotheses), figure out experiments to test them, and work out what kind of data you need in order to show whether they’re wrong.

Then you do the experiments. You think hard about the results you get. You repeat the experiments if you can. You do other experiments to see if you can disprove the results from the first experiments. You take your work to a conference and see what other experts think about it. Based on their feedback, you decide to publish your results and how you interpret them. You write up the whole process: questions, hypotheses, experimental design, results, and your interpretations of the results. You send it to a journal in your field.

The journal editor decides whether the paper is appropriate for the journal, and then contacts experts who specialize in the same field. These experts anonymously review the paper, decide whether the study is well designed, the results are valid, and the interpretation of the results is reasonable. If they decide that it’s acceptable, the paper is published. If not, the paper is rejected.

If your paper is published, your data are available to the rest of the scientific community. They may do the same experiments as you did, trying to replicate your findings. If they can’t replicate your data, or find problems with your interpretation, new ideas may take the place of yours. If they are able to replicate your results and they stand the test of time, they may be used as a starting point for a new level of questions.

This basic process, repeated in thousands of laboratories around the world for a thousand years, is what lies behind the simplified form of the scientific method which you learned in school as:

Observation → Hypothesis → Prediction → Experiment → Analysis → Interpretation → Publication → Replication

But in practice, it’s really more dynamic, like this:

[Image: diagram showing the more dynamic version of the scientific method]

Here is a much more detailed (and very useful) description of how research is done.

Working in science requires that you are comfortable with uncertainty and doubt. Because the process of science is about disproving things, your own hypotheses can be disproven at any time. But this is why the scientific method is so powerful: it’s a type of inquiry that builds upon itself while being self-critical at the same time. It’s not perfect by any means, but it is, most importantly, effective. It works.

This is not to say that other ways of knowing about things (religiously, traditionally, intuitively) are wrong in themselves. But by definition they are not effective or useful for knowing things about the natural world, which operates according to physical principles. It’s fine if a person has other sources of knowledge, but it’s not fine to take advantage of the respect accorded to scientific knowledge and mislead people by trying to pass off non-scientific ideas as scientific. Fortunately, it’s fairly easy to recognize when this is happening if you know what to look for. I’ll write about this in my next post.

Science papers for non-scientists: Where do Europeans come from?

Reading and understanding scientific literature can be incredibly frustrating for most people. You may want to understand some cutting-edge finding, but find you can’t wade through the technical jargon and abstruse figures, so you give up and read some crappy summary in the news. This doesn’t mean you’re not smart! I want to assure you that this is a learned skill–we actually have to explicitly teach our students how to do it.

I feel very strongly about making science accessible to everyone. One of the ways I’m going to do it here is to walk people through recent and exciting scientific papers. Here’s my first attempt. Please feel free to give me feedback!

Summary of Brotherton et al: Neolithic mitochondrial haplogroup H genomes and the genetic origins of Europeans. 2013. Nature Communications 4:1764.

I talked recently about how you can use genetics to test the idea that cultural changes in the past were the result of migration. A few days ago, this study was published, doing just that. I want to go through their findings, because they’re exciting and important.

Europe has a very complex prehistory, characterized by lots of migrations of different ethnic groups. Understanding this prehistory genetically is a tricky endeavor, requiring the sequencing of genetic lineages of both modern and ancient populations in order to try to link them in time and space. Remember how I said that the majority of ancient DNA research targets the mitochondrial (maternal) genome? By comparing the frequency of different groups of closely related lineages (called haplogroups) in different populations, we can see how closely they are related. More distantly related populations will have different proportions of haplogroups. This is pretty intuitive when you think about the story behind the science; women living in these populations were passing down their mitochondrial lineages through their daughters and grand-daughters. When a woman moved into a new place, she would have brought her lineage with her. Populations that shared greater proportions of related women would have similar haplogroup frequencies, and would differ from more distant populations.
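If it helps to see the idea of "comparing haplogroup frequencies" in concrete terms, here is a minimal Python sketch. To be clear, this is my own toy illustration, not the method used in any particular study: the three populations and their haplogroup counts are invented, and the distance measure (half the sum of the absolute frequency differences) is just one simple choice among many.

```python
from collections import Counter

def haplogroup_frequencies(assignments):
    """Return {haplogroup: proportion} for a list of per-individual haplogroup labels."""
    counts = Counter(assignments)
    total = len(assignments)
    return {hg: n / total for hg, n in counts.items()}

def frequency_distance(freq_a, freq_b):
    """Half the sum of absolute frequency differences.

    0 means identical haplogroup proportions; 1 means no haplogroups shared.
    """
    haplogroups = set(freq_a) | set(freq_b)
    return 0.5 * sum(
        abs(freq_a.get(hg, 0.0) - freq_b.get(hg, 0.0)) for hg in haplogroups
    )

# Hypothetical populations, invented purely for illustration.
population_a = ["H"] * 4 + ["U"] * 3 + ["K"] * 3
population_b = ["H"] * 4 + ["U"] * 4 + ["K"] * 2
population_c = ["H"] * 1 + ["U"] * 8 + ["T"] * 1

freq_a = haplogroup_frequencies(population_a)
freq_b = haplogroup_frequencies(population_b)
freq_c = haplogroup_frequencies(population_c)

print("A vs B:", round(frequency_distance(freq_a, freq_b), 3))
print("A vs C:", round(frequency_distance(freq_a, freq_c), 3))
```

In this made-up example, A and B have similar proportions of the same haplogroups and so come out closer to each other than A and C do, which is the pattern you would expect from populations that shared more maternal ancestry.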

In modern European populations the most common haplogroup is H; it comprises something like 40% of the population. In fact, my own mitochondrial genome belongs to H, reflecting my mother’s family’s Celtic origins*. It’s therefore crucial to understand how different lineages of haplogroup H are related to each other, or what their phylogeny is. Think of a phylogeny as being analogous to a family tree, with individual mitochondrial lineages being sisters, cousins, second cousins, etc. differing by the mutations they possess. You need to work out how they’re related to each other in order to start understanding their shared histories.

Now, the phylogeny of ancient haplogroup H lineages was worked out previously, but that was done using only the hypervariable regions of the mitochondrial genome. (Again, see this post for an explanation of what the hypervariable regions are, and why they’re the targets of ancient DNA research). It turns out that there’s a whole bunch of genetic variation in the rest of the genome, and without incorporating it, the phylogeny is inaccurate.

The mitochondrial genome, showing the hypervariable control region

The control region, containing two hypervariable segments, makes up only a small proportion of the mitochondrial genome, but is the most frequent target of ancient DNA research. (Image modified from an original source which I’ve unfortunately lost.)

So we (finally!) get to the paper itself! What Brotherton et al. (the authors**) did was first observe that haplogroup H was much less frequent among ancient populations than in modern Europeans; Early Neolithic (~5450 BC) farmers had only a 19% frequency of H, and the older Mesolithic hunter-gatherers basically didn’t have any H. The authors decided to completely sequence the mitochondrial genomes of a sampling of ancient people who were already (through previous research) known to belong to haplogroup H. By expanding sequencing past the hypervariable regions to get at the entire genome, they would be able to “capture” all of the genetic variation, and create higher-resolution phylogenies. This would lead to a better understanding of how individual maternal lineages within H moved into the region.

They chose to sequence DNA from 37 skeletons that spanned ~ 3,500 years of the European Neolithic period (roughly 5450-1575 BC) in the Mittelelbe-Saale region of Saxony-Anhalt (Germany). Without going into the chemistry details, trust me when I say that this is a technically impressive feat!

So what did they find? I’m going to focus only on one of their main results. I’ve excerpted Figure 1A from their paper to show you:

Modified Figure 1a from Brotherton et al., 2013

I realize this looks like something created by a demented spider. Bear with me, and I’ll explain.

This picture is a network diagram, showing the phylogenetic relationships of all the lineages they obtained from the ancient individuals. The circles are the individuals themselves, colored to represent the different cultures they come from (see the key at top left). The lines are the mutational steps between them, with longer lines indicating more mutations (and thus greater genetic distances). The mutations are listed alongside the lines. Unfilled circles are lineages which aren’t actually present at the sites, but are known about from other places. For fun, I’ve indicated with a purple arrow where I fit in on this network. (Have you ever had your mitochondrial DNA sequenced by one of the commercial genome services? If you belong to haplogroup H, see if you can find yourself on this network, too!)

How do the authors interpret this phylogeny? First, look at the position of the red circles. These are the oldest samples in the study, dating to 5450-4775 BC. Do you see how they’re on shorter lines, closer to the central node? That means they have fewer mutations away from the “basal” H type, and are therefore the oldest lineages! (Remember that lineages accumulate mutations over time, so younger, “more derived” lineages are going to have more mutations). And indeed, we see that the youngest lineages (the ones with the most mutations) tend to correspond to the more recent archaeological sites. It’s a cool pattern that reinforces the validity of this approach.
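Here is a tiny Python sketch of what "counting mutations away from the basal type" means. The sequences are invented and only 12 bases long (a real mitochondrial genome is roughly 16,500 bases), and the function name is my own. It also assumes the sequences are already aligned, which in real life is a step of its own.

```python
def mutations_from(reference, sequence):
    """List (position, reference_base, observed_base) wherever an aligned
    sequence differs from the reference. Positions are 1-based."""
    if len(reference) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref_base, obs_base)
        for i, (ref_base, obs_base) in enumerate(zip(reference, sequence))
        if ref_base != obs_base
    ]

# Toy sequences, invented for illustration only.
basal_h = "ACGTACGTACGT"
lineages = {
    "early_sample": "ACGTACGTACGA",
    "later_sample": "ATGTACCTACGA",
}

for name, seq in lineages.items():
    diffs = mutations_from(basal_h, seq)
    print(name, "has", len(diffs), "mutation(s) from the basal type:", diffs)
```

The fewer differences a lineage has from the basal sequence, the shorter its line would be on a network diagram like the one in the figure.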

This also shows something more subtle, but very important. We’re looking at genetic lineages present throughout time within a single region, remember? So…if that region was continuously occupied by the same group of people and their descendants, we would expect to find the oldest lineages on the same branches as the later lineages. Specifically, we’d expect the Early Neolithic individuals (red, orange, yellow, green) to be on the same lines (but closer to the central H node) as the Late Neolithic (light blue and blue), and the Bronze and Iron Age (brown and black) individuals. Instead, they’re all on different lines. This means they’re distinct lineages (not-very-closely-related female ancestors).
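One simplified way to think about "being on the same branch" is this: if a later lineage descends directly from an earlier one, it should carry all of the earlier lineage's mutations, plus some new ones of its own. The sketch below checks exactly that. The mutation labels ("m1", "m2", ...) and the sample names are placeholders I made up, and the check ignores complications like mutations reverting, so treat it as a cartoon of the logic rather than how the network in the paper was actually built.

```python
def on_same_branch(earlier_mutations, later_mutations):
    """True if the later lineage carries every mutation found in the earlier one,
    which is what we'd expect if it descends from that earlier lineage."""
    return set(earlier_mutations) <= set(later_mutations)

# Placeholder mutation labels, invented for illustration.
early_neolithic = {"m1"}
descendant_lineage = {"m1", "m2", "m3"}   # has m1 plus newer mutations
unrelated_lineage = {"m4", "m5"}          # shares nothing with the early lineage

print(on_same_branch(early_neolithic, descendant_lineage))
print(on_same_branch(early_neolithic, unrelated_lineage))
```

Continuous occupation by the same maternal lines would look like the first case. What the authors report looks much more like the second.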

And this means that, most likely, there was considerable migration of women (and probably men, though we can’t tell from these data) into Central Europe over time, beginning around 4000 BC. The authors suggest (for various reasons which I won’t get into here) that they were likely immigrants from the West, who interacted with the early Neolithic farmers, and ultimately “superseded” their genetic diversity to shape the patterns of genetic diversity seen in present-day Europeans (including myself!). How cool is that?

Does this explanation make sense? Do you have any questions? Let me know in the comments!

————————————
*Specifically, H5
** We have a convention for referring to a study as “So-and-so et al.” that recognizes the first author (who did most of the work). “Et al.” is short for “et alii” which means “and the others”. It’s a cool/pretentious bit of science tradition that reflects the discipline’s historic usage of Latin.

Doing archaeology with DNA

Imagine you’re an archaeologist. (I know you wanted to be one when you were younger, so let’s pretend you never got sidetracked.) You’re digging at a cool site somewhere and you find two completely different types of pottery*. The older type is black with a swirly design and was the only pottery used at this site during that time period. The younger type is red with no decoration. When the red pottery appears at the site, the old black pottery suddenly disappears and is never again made. How do you interpret this?

a) The people at the site suddenly decided that they hated black-colored pottery with a swirly design and only wanted plain, red-colored pottery. So they either invented it for themselves, or perhaps they learned how to make it from some other group.
-or-
b) The people at the site (who used black-colored pottery with a swirly design) were invaded by people who only used red-colored pottery. Soon, there was nobody left (or willing) to make the older type of pottery.
-or-
c) A few people living in another region who used red-colored pottery married people at this site, and brought their special pottery with them. Soon everyone adopted this kind and abandoned the old style of pottery.

I’m sure you can think of other possibilities as well. This is a simplified example, but in fact all three of these scenarios (with different technologies, of course) have actually happened in human prehistory.

How can we choose the correct interpretation between these possibilities? This is one of the biggest questions that anthropology grapples with: When we see cultural change in the archaeological record for a region, is it the result of new ideas/technologies/language being adopted by the inhabitants, or is it the result of people moving into a region and bringing the culture with them? Is it the movement of just ideas (diffusion) or a movement of people (migration)? Or something else?

There are many ways to try to figure this out, depending on what type of data you have available from a site. Maybe, in addition to the pottery, you have the skeletons of the people who lived there, and so you can compare skeletal traits of the people in the two time periods and see if they look very different from each other. Or you can do an analysis of the isotopes in their skeletons and see if they had very different diets (a suggestion that they came from different places). Or, maybe you can get DNA from their bones and see if they come from genetically distinct populations. This last approach is what I and other anthropological geneticists do, and in recent years it’s really revolutionized our understanding of human prehistory. We can directly test hypotheses of human migration by looking at patterns of genetic variation in both present-day populations, and their ancient ancestors. In many cases, DNA can reveal subtle details about the past that archaeological or osteological approaches alone can’t.**

So, we choose skeletons from both pottery-containing layers, and after getting the appropriate permissions, we isolate, amplify, and sequence mitochondrial DNA from them.

Why mitochondrial DNA? It’s a genome that exists separately from our nuclear genome (which is what’s in your chromosomes). Non-coding parts of the mitochondrial genome, called ‘hypervariable regions’, accumulate mutations faster than the rest of the genome, and studying them allows us to ‘see’ much more recent evolutionary events than would otherwise be possible. In addition, mitochondrial DNA is maternally inherited, so it provides a way to trace individual maternal lineages through time and space. Finally, the mitochondrial genome exists in many copies per cell (vs. two copies of the nuclear genome), so it’s much more likely that we’ll be able to get mitochondrial DNA from ancient bones where preservation is poor.

DNA in ancient bones is degraded and prone to contamination from modern DNA. So it has to be extracted and handled in a special isolation laboratory, and only by researchers who a) know what they’re doing, b) are willing to dress ridiculously, and c) are willing to cope with a very high failure rate, since ancient DNA is extremely difficult to work with and often isn’t preserved at all.
Shadowboxing in the ancient DNA lab

Venting frustration in the ancient DNA lab when samples don’t work.

Let’s assume we were able to get DNA from a reasonable number of individuals buried in both layers. We sequence it, and figure out which maternal lineages are present in each temporal ‘population’. Using statistical tests, we determine that the two populations are genetically significantly different from each other. So, this is a population replacement, right?!
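What might "using statistical tests" look like? Here is one possible sketch in Python: a permutation test. The idea is to measure how different the haplogroup frequencies are between the two layers, then shuffle the individuals between layers at random many times and see how often chance alone produces a difference at least that big. The haplogroup counts are invented for our imaginary site, the function names are mine, and real studies use a variety of other tests, so this is only meant to show the logic.

```python
import random
from collections import Counter

def frequency_distance(group_a, group_b):
    """Half the sum of absolute differences in haplogroup proportions."""
    counts_a, counts_b = Counter(group_a), Counter(group_b)
    labels = set(counts_a) | set(counts_b)
    return 0.5 * sum(
        abs(counts_a[label] / len(group_a) - counts_b[label] / len(group_b))
        for label in labels
    )

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate how often random relabelling gives a distance at least as
    large as the observed one."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = frequency_distance(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_a = pooled[:len(group_a)]
        shuffled_b = pooled[len(group_a):]
        # Small tolerance guards against floating-point rounding in ties.
        if frequency_distance(shuffled_a, shuffled_b) >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical haplogroup assignments for the two pottery layers.
black_pottery_layer = ["H"] * 2 + ["U"] * 14 + ["K"] * 4
red_pottery_layer = ["H"] * 12 + ["U"] * 3 + ["K"] * 5

p_value = permutation_p_value(black_pottery_layer, red_pottery_layer)
print("Estimated p-value:", p_value)
```

A small p-value says that a difference this large would rarely arise if the two layers were really drawn from the same maternal gene pool. It doesn't, by itself, tell you why they differ.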

Maybe. Remember that because we’re looking at mitochondrial DNA, we are only assessing maternal lineage history. There are men at this site as well, and there’s a male history that we simply aren’t getting at with mitochondrial DNA. In order to provide the most complete answer to that question, we’d need to look at Y-chromosome DNA too, which is (of course) exclusively paternally inherited. But since Y-chromosome DNA is in the nuclear genome, it’s much less likely to be preserved in ancient human remains. So quite often the only information we can get about the human prehistoric past is limited to female lineages.

Even bearing in mind these limitations, however, by finding significant genetic differences between the black-pottery-using individuals and the red-pottery-using individuals at this site, we’ve just been able to confirm a hypothesis that cultural changes were associated with the immigration of people into this region, and not simply the sharing of ideas. In my next post, I’ll go over a specific study that employs this approach to test similar hypotheses (on a larger scale) about the genetic prehistory of Europe.
————————————————————————

*Sorry, you’re not Indiana Jones. You’re a good archaeologist, who excavates carefully and gets excited about scraps of pottery.

**But without these other types of data, drawing conclusions just from genetics alone can be very problematic. The best approach, in my opinion, is to integrate archaeological, osteological, linguistic, and cultural data, if possible. But since I’m an anthropological geneticist, I’m going to talk mostly about that perspective on the past.