Over the last few decades, the sequencing of the human genome has revealed how the mechanisms of inheritance contribute to disease. Whether it’s a particular health problem, such as obesity, or a propensity for a disease, such as cancer, the DNA we inherit from our ancestors has more impact on our health outcomes than we may think.
But the human genome contains roughly three billion base pairs made up of four nucleotides, the units typically scrutinized in genetic studies, and at that scale the enormity of the data sets in a genomic project becomes apparent. “We’re starting to get data sets with information about the genomes of hundreds of thousands of people,” says Amy L. Williams, Biological Statistics and Computational Biology, who works on a wide range of human genetic studies. “That scale of data creates a huge computational burden on algorithms that were developed even a couple of years ago.”
Williams has combined her computer science background with a love for characterizing human genetic ancestry and applied both to developing new computational methods that address the needs of modern genetics studies. “I’m interested in developing methods that enable medically relevant studies and that also tell us something about human ancestry and human history,” Williams says. “I also do analyses that help characterize the means by which genetic variation arises, namely mutation and recombination. These research directions are interrelated. To make medical genetic inference, we need models that account for the genetic histories of populations and peoples. We can’t adequately test for association with disease unless we can distinguish which signals flag biologically meaningful genetic variants from those that are only a marker of ancestry. When we find an association, we have to be certain that it is not just a benign difference between these people.”
Ancestry and Disease: A Breast Cancer Study
In one study, Williams joined Laura Fejerman and Elad Ziv (University of California, San Francisco School of Medicine), Christopher Haiman (University of Southern California), and other researchers to examine the question of ancestry and disease in a group of Latinas diagnosed with breast cancer. Specifically, the researchers wanted to ascertain the population of origin of each position in the women’s genomes. People of Latino background are admixed individuals, carrying genes from multiple populations that had previously evolved in isolation from each other for thousands of years. In the case of the Latinas in the study, those groups were of European, Native American, and African origin.
“We asked whether there were positions in these women’s genomes that have more ancestry from one population than from another, averaged across all the women together,” Williams says. To do that, Williams helped create a method to compare the women’s ancestry at distinct positions, called genetic loci, along their chromosomes. At positions where there was no effect on breast cancer, the average ancestry of all contributing ancestral populations was the same as elsewhere in the genome. But at risk loci, positions that do influence whether a woman will develop breast cancer, the program found either higher or lower ancestry from one of the contributing populations.
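The comparison described above can be illustrated with a toy calculation. The sketch below is not the method used in the study. It assumes a hypothetical table in which each row is one woman and each column is a locus, holding the fraction of her ancestry at that locus that comes from one population. It then flags loci whose average across all women sits far from the genome-wide average. The function name, the toy data, and the threshold of two standard deviations are all illustrative assumptions.

```python
from statistics import mean, pstdev

def flag_ancestry_deviations(local_ancestry, threshold=2.0):
    """Return indices of loci whose mean ancestry deviates from the genome-wide mean.

    local_ancestry: one row per individual; each row holds, per locus, the
    fraction (0.0 to 1.0) of ancestry from a single population.
    threshold: hypothetical cutoff, in standard deviations of the locus means.
    """
    n_loci = len(local_ancestry[0])
    # Average ancestry at each locus across all individuals.
    locus_means = [mean([row[j] for row in local_ancestry]) for j in range(n_loci)]
    genome_mean = mean(locus_means)
    spread = pstdev(locus_means)
    if spread == 0:
        return []  # every locus looks the same, so nothing stands out
    return [j for j, m in enumerate(locus_means)
            if abs(m - genome_mean) / spread > threshold]

# Toy data (invented): 4 individuals, 10 loci, with locus 6 enriched.
baseline = [0.0, 0.5, 0.5, 1.0]   # averages to 0.5
enriched = [0.5, 1.0, 1.0, 1.0]   # averages to 0.875
toy = [[enriched[i] if j == 6 else baseline[i] for j in range(10)]
       for i in range(4)]
print(flag_ancestry_deviations(toy))  # prints [6]
```

A real analysis would first have to infer local ancestry from genotype data and would need a proper statistical test that accounts for sample size; the sketch only shows the shape of the comparison.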
“We found that these Latinas had increased European ancestry at a certain position on chromosome 6,” says Williams. “That tells us that European ancestry at that particular position on that chromosome is a risk factor for developing breast cancer. But it also tells us that there are genetic differences between European and Native American population groups that impact susceptibility to breast cancer.”
Transmitting DNA from Parents to Children
In another study, Williams joined Molly Przeworski (Columbia University), John Blangero (Texas Biomedical Research Center), and David Reich (Harvard Medical School) to characterize the way DNA is transmitted from parents to children through meiosis. During meiosis, a type of cell division central to sexual reproduction that creates cells with half the chromosomes of the parent cell, the two copies of a parent’s genome are broken apart and reassembled in a controlled process. The reassembled genome is a single copy, which the child will inherit from that parent. The other copy of the child’s genome will come from the child’s other parent.
Meiosis has long been known to include crossovers, large-scale switches in which millions of base pairs in a chromosome are exchanged for other base pairs. In this study, however, the researchers looked at another type of switching between chromosomes called a non-crossover, which is comparatively small, only a hundred or a thousand base pairs long. “We found there’s actually a very strong bias component to which strand gets transmitted to the child,” says Williams. In fact, two of the four nucleotides in DNA, cytosine and guanine, are more likely to be transmitted. “On an evolutionary scale it’s very clear that this bias has a significant impact on the composition of genomes,” Williams says. “Cytosine and guanine are transmitted at a rate of 68 percent at positions that were subject to non-crossovers, or 18 percent more often than we would expect if non-crossovers were random.”
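The percentages in the quote are easier to compare once the baseline is made explicit. The short calculation below assumes that an unbiased process would transmit either strand half the time, which is what the quoted figures imply; the variable names are illustrative.

```python
observed = 0.68  # G/C transmission rate at non-crossover positions (from the article)
baseline = 0.50  # assumed rate if either strand were equally likely

# Difference expressed in percentage points and as a relative increase.
excess_points = round((observed - baseline) * 100, 1)
relative_excess = round((observed - baseline) / baseline * 100, 1)

print(f"{excess_points} percentage points above the baseline")
print(f"{relative_excess} percent higher in relative terms")
```

On this reading, the 18 in the quote is a gap of 18 percentage points over an even split rather than a relative increase.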
In that same study, Williams and her fellow researchers used DNA samples from multiple generations of the same families to discover how likely non-crossovers were to occur at any particular position on a chromosome. They found the incidence of non-crossover at any particular position was roughly six out of a million.
“You might say this is small,” says Williams, “but at an evolutionary scale, it has a non-trivial effect on genetic variation due to this bias in transmissions of nucleotides across generations.” The bias affects the sequence composition of genomes, and it may also influence how many copies of disease-affecting mutations are segregating in humans. More specifically, a cytosine or guanine mutation that affects disease susceptibility can be pushed to higher frequency by non-crossovers.
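To see how a bias this small can still add up, consider a deliberately simplified model. It is an illustration only, not the analysis from the study. It assumes an infinitely large, randomly mating population with no selection and no random drift, and it assumes that outside a non-crossover either allele is passed on half the time. The rate of six per million and the 68 percent figure come from the article; the function names and the starting frequency of 1 percent are hypothetical.

```python
NON_CROSSOVER_RATE = 6e-6  # per position, per transmission (from the article)
GC_TRANSMISSION = 0.68     # G/C transmission rate within a non-crossover (from the article)

# Assumption: with no non-crossover at the position, transmission is 50/50.
# t is the overall chance that a parent carrying one copy passes on the G/C allele.
t = (1 - NON_CROSSOVER_RATE) * 0.5 + NON_CROSSOVER_RATE * GC_TRANSMISSION

def next_frequency(p, t):
    """Allele frequency after one generation of random mating.

    Carriers of two copies always transmit the allele; carriers of one
    copy transmit it with probability t (t = 0.5 means no bias).
    """
    return p * p + 2 * p * (1 - p) * t

def frequency_after(p0, generations, t):
    """Apply next_frequency repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, t)
    return p

start = 0.01  # hypothetical starting frequency of a G/C mutation
for gens in (1_000, 100_000, 1_000_000):
    print(gens, round(frequency_after(start, gens, t), 4))
```

Under these assumptions the per-generation shift is tiny, so the frequency barely moves over a thousand generations, but the shifts compound over much longer spans. Real populations are finite and subject to selection, so the numbers this prints should not be read as predictions.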
These findings can impact the results of all kinds of genetics studies dealing with disease as well as analyses of worldwide genetic variation, Williams explains. “We’ve known about non-crossover in the past, but this study leads to an understanding of several new parameters that will help us build our models and correctly characterize genetic data from now on.”