
The genetic architecture of facial variation

The following is a non-technical summary of our work on facial feature genetics, together with some musings on future possibilities, including some genealogical ideas. It is reproduced here from the Lost Cousins newsletter, to which I contributed it in 2017.

Additional research has since been published by other groups, which I may comment on later.


Most of us at some point will have met a pair of identical twins and been astonished by how similar they are in most respects. This is especially true of their physical appearance, including height, weight and the pigmentation of their hair, eyes and skin, and it is because they share 100% of their DNA sequence. Perhaps most noticeable of all is the similarity in the structure of their facial features, which tells us that differences in facial appearance between individuals must be overwhelmingly genetic. In other words, they are due to DNA differences, rather than being a result of one’s upbringing or some effect of the external environment.

Genes are also shared disproportionately between close (lower-degree) relatives and, accordingly, it is widely understood that facial similarity is on the whole lower between first cousins than between siblings, lower still between second cousins, and so on. Degree of likeness seems to manifest as the number of shared or extremely similar facial features, and these often appear to be inherited from particular ancestors, for example when someone is described as having ‘their mother’s eyes’. This contrasts with other traits such as height, for which people appear to conform roughly to the average of their two parents, after correcting for their sexes and for any year-on-year increase in average height due to improved population health.
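To put rough numbers on how sharing falls with the degree of relationship, the sketch below uses the standard path-counting rule for the expected fraction of DNA two relatives share: each common ancestor contributes (1/2) raised to the number of parent-child steps linking the two relatives through that ancestor. It assumes no inbreeding and ignores the random scatter of real individuals around these averages; the relationship table is just illustrative.

```python
from fractions import Fraction


def expected_sharing(steps_per_common_ancestor):
    """Expected fraction of DNA shared (coefficient of relationship).

    Each entry is the number of parent-child steps on the path that links
    the two relatives through one common ancestor; each path contributes
    (1/2) ** steps. Assumes no inbreeding in the family.
    """
    return sum(Fraction(1, 2) ** steps for steps in steps_per_common_ancestor)


# Full siblings have two common ancestors (their parents), two steps away each.
# First cousins share two grandparents (four steps via each), and second
# cousins share two great-grandparents (six steps via each).
RELATIONSHIPS = {
    "full siblings": [2, 2],
    "first cousins": [4, 4],
    "second cousins": [6, 6],
}

for name, paths in RELATIONSHIPS.items():
    share = expected_sharing(paths)
    print(f"{name}: {share} of their DNA on average ({float(share):.2%})")
```

Each step further out in the family tree divides the expected sharing by four, which fits the steep drop in likeness from siblings to cousins described above.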

This is probably due to the particular genetic mechanisms at work. A person’s height is the product of a large number of genes acting in concert, each with a small influence. As we inherit, on average, 25% of our genes from each grandparent, roughly 25% of one’s height-influencing genes are also likely to have been inherited from each of them. On the other hand, I propose that a facial feature is likely to be under the strong influence of a single gene variant. As each individual carries two copies of each gene (termed alleles), one from each parent, these must have descended from just two of their four grandparents. This model, based on the strong effects of small numbers of genes, can explain the inheritance pattern of facial features that we tend to observe in families, as it implies that individuals will tend to take after a limited number of relatives with whom they share at least one of their two alleles.
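The contrast between many small-effect genes and a single strong-effect gene can be illustrated with a small simulation that tracks which grandparent each allele came from. It is a sketch under a simplifying assumption, namely that loci are unlinked, so at every locus each parent passes on their own father’s or mother’s allele with equal probability. In reality linkage makes individual grandparental shares vary more widely, although the 25% average still holds.

```python
import random
from collections import Counter

GRANDPARENTS = ("paternal grandfather", "paternal grandmother",
                "maternal grandfather", "maternal grandmother")


def grandparental_origins(n_loci, rng):
    """Return (allele via father, allele via mother) origins for each locus.

    Simplifying assumption: loci are unlinked, so each is an independent
    coin toss between the two grandparents on each side of the family.
    """
    origins = []
    for _ in range(n_loci):
        via_father = rng.choice(GRANDPARENTS[:2])
        via_mother = rng.choice(GRANDPARENTS[2:])
        origins.append((via_father, via_mother))
    return origins


def share_by_grandparent(origins):
    """Fraction of all alleles (two per locus) traced to each grandparent."""
    counts = Counter(gp for pair in origins for gp in pair)
    total = 2 * len(origins)
    return {gp: counts[gp] / total for gp in GRANDPARENTS}


rng = random.Random(1)

# A height-like trait: many loci of small effect.
many = grandparental_origins(10_000, rng)
print(share_by_grandparent(many))  # each share should be close to 0.25

# A trait under the control of one strong-effect locus.
one = grandparental_origins(1, rng)
print(one)  # both alleles trace back to just two of the four grandparents
```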

In theory, then, it should be possible to locate particular genes that have strong influences on facial appearance, and eventually to understand their biological functions. Only recently has this become feasible due to advances in a) our ability to establish people’s DNA sequence information (or ‘genotypes’) from blood or saliva samples on a large scale, and b) the camera technology that allows one to obtain accurate 3D images of faces. In a new publication, our research group describes work that has resulted in the discovery and verification of 3 genetic variants that have strong effects on facial features, influencing the spacing between the eyes, the protrusion of the face and prominence of the chin. This represents one of the first steps towards uncovering the overall genetic architecture of the human face, which one has to presume remains largely mysterious due to the huge amount of facial variation that exists between people.
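For readers curious how such variants can be found, a common starting point (not necessarily the exact method used in our publication) is to test, one variant at a time, whether the number of copies of an allele a person carries (0, 1 or 2) predicts a facial measurement. The sketch below simulates hypothetical data in which one variant shifts an invented "eye spacing" measurement and another has no effect, then estimates each variant’s effect by least-squares regression. The allele frequency, effect size and noise level are all made up for illustration.

```python
import random


def allele_counts(n_people, allele_freq, rng):
    """Copies (0, 1 or 2) of the effect allele per person, assuming
    Hardy-Weinberg proportions (each of the two alleles drawn independently)."""
    return [sum(rng.random() < allele_freq for _ in range(2))
            for _ in range(n_people)]


def ols_slope(x, y):
    """Least-squares slope of y on x: estimated change in the measurement
    per extra copy of the allele."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(42)
n = 5_000
causal = allele_counts(n, 0.3, rng)   # hypothetical strong-effect variant
neutral = allele_counts(n, 0.3, rng)  # hypothetical variant with no effect

# Hypothetical measurement (mm): baseline + 1.0 mm per causal allele + noise.
eye_spacing = [62.0 + 1.0 * g + rng.gauss(0, 3) for g in causal]

print(f"causal variant:  {ols_slope(causal, eye_spacing):+.2f} mm per allele")
print(f"neutral variant: {ols_slope(neutral, eye_spacing):+.2f} mm per allele")
```

A real study would also attach a significance level to each estimate and account for the very large number of variants tested.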

Understanding these and other genes’ influences on appearance serves a number of purposes, the most obvious being in forensic science: for example, producing e-fit images of suspects from DNA samples they have left at crime scenes. But there are also medical applications. Treatment of congenital conditions accompanied by dysmorphic facial features currently relies on plastic surgeons making what are, inevitably, fairly subjective decisions about the desired appearance for the patient, largely based on the average facial characteristics within the appropriate ethnic background. It would be more desirable to estimate, quantitatively, what the patient would have looked like if they did not have their particular condition, giving a more accurate objective for the surgical outcome. In theory this can be achieved by interrogating their DNA sequence, provided that a reasonable number of the genetic causes of appearance have been established.

There has long been an interest in reconstructing the outward physical appearance of people based on their skeletal remains; often for forensic purposes, but also in archaeology. Presently this is done by remodelling soft tissue structure, either by hand, over a cast of the underlying skull, or by using 3D computer artistry. These techniques suffer from being relatively subjective, as the distribution of soft tissue can only be estimated approximately. It is now possible to extract accurate DNA profiles from skeletons that are thousands of years old, and this suggests the intriguing possibility of using information on face-influencing genes to build up a picture of what particular individuals from the past would have looked like. From a genealogical perspective, one could in theory even reconstruct the genomes of ancestors, computationally, by piecing together DNA segments shared between living descendants. Facial appearances of these individuals from the past, for whom no biological samples are available, could then be predicted using the remnants of their genomes carried by those living in the present day.
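The piecing-together step can be pictured as an interval-union problem. If several descendants each carry segments known (hypothetically) to descend from the same ancestor, the union of those segments is the part of that ancestor’s genome that could be recovered. The sketch below works on a single chromosome with positions in megabases; the segment coordinates and chromosome length are invented, it ignores the fact that the ancestor carried two copies of each chromosome, and it glosses over the hard part, which is working out which ancestor each segment came from.

```python
def merge_segments(segments):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def recovered_fraction(segments, chromosome_length):
    """Fraction of the chromosome covered by at least one segment."""
    covered = sum(end - start for start, end in merge_segments(segments))
    return covered / chromosome_length


# Hypothetical segments (Mb) inherited from one ancestor by three descendants.
descendant_segments = {
    "descendant 1": [(0, 40), (120, 150)],
    "descendant 2": [(30, 70)],
    "descendant 3": [(140, 200)],
}
all_segments = [seg for segs in descendant_segments.values() for seg in segs]

print(merge_segments(all_segments))
print(f"{recovered_fraction(all_segments, 250):.0%} of the chromosome recovered")
```

Adding more descendants can only grow the recovered fraction, which is why reconstructions of this kind would benefit from large numbers of tested living relatives.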


The evolution of sexual reproduction

What is the purpose of reproducing sexually, from a Darwinian perspective? A sexual organism has to discard half of his or her genetic material when forming each offspring, so surely it is more advantageous to clone oneself. Genes for cloning would be guaranteed a copy of themselves in the following generation for each offspring produced, rather than facing a 50/50 coin toss determining their fate. I’m sometimes surprised by how unmysterious this seems to many biologists. Perhaps it appears to be a contingent fact: something that occurred by chance in the ancestor of all higher organisms and that we are now lumbered with. But a few species are capable of facultative parthenogenesis, switching between sexual and asexual reproduction, which strongly implies that they find different utilities in each. Only about 0.1% of animal species reproduce entirely asexually, and these tend not to persist for long before becoming extinct (Vrijenhoek, 1998).

There is another way of framing the problem: why bother with males? Males cannot produce their own offspring, and often contribute relatively few resources to their offspring’s survival other than their genetic material. In parthenogenetic species, females can ‘choose’ to reproduce without the assistance of males, proving that males are not strictly required. Doesn’t a female then benefit more from investing her resources entirely in female offspring, each of which could go on to produce offspring by herself? The great preponderance of sexual reproduction implies that mating with males must, in fact, allow females to produce more descendants than they could by asexual means, but quite how this happens remains an open question.
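The arithmetic behind this “two-fold cost” is simple. The sketch below assumes, hypothetically, that sexual and asexual females produce the same number of offspring, that all of them survive, and that males contribute nothing but genes. Under those assumptions the asexual lineage doubles its lead over the sexual one every generation.

```python
def females_after(generations, start_females, offspring_per_female, fraction_female):
    """Number of females after some generations, assuming every female has
    the same number of offspring and all offspring survive to breed."""
    females = start_females
    for _ in range(generations):
        females *= offspring_per_female * fraction_female
    return females


for t in range(1, 6):
    asexual = females_after(t, 100, 4, 1.0)  # all offspring are daughters
    sexual = females_after(t, 100, 4, 0.5)   # half the offspring are sons
    print(f"generation {t}: asexual/sexual = {asexual / sexual:.0f}")
```

The ratio is 2 raised to the number of generations, so any benefit of sex must be large enough to outweigh this compounding handicap.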

This “two-fold cost” of sexual reproduction (Figure 1) poses a deep and challenging problem in evolutionary theory. The most compelling explanations are centred on the role of genetic recombination. Reproducing sexually facilitates this process, in which the pieces of the genome (chromosomes) inherited from the father and mother are cut up and re-spliced together in a random fashion.
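Recombination itself can be sketched in a few lines. This toy version splices the start of one parental chromosome onto the end of the other at a single random crossover point, with alleles labelled by parent of origin (M or F) so that the splice is visible. It is a deliberate simplification: real meiosis can involve several crossovers, and either parental chromosome may supply the first stretch.

```python
import random


def recombine(maternal, paternal, rng):
    """Produce one recombinant chromosome with a single crossover.

    Toy model: chromosomes are equal-length sequences of alleles, and exactly
    one crossover point is drawn uniformly, so the child always carries at
    least one allele from each parent.
    """
    if len(maternal) != len(paternal):
        raise ValueError("chromosomes must be the same length")
    point = rng.randint(1, len(maternal) - 1)
    return maternal[:point] + paternal[point:]


rng = random.Random(3)
mother = "MMMMMMMMMM"
father = "FFFFFFFFFF"
for _ in range(3):
    print(recombine(mother, father, rng))
```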

It is desirable for the population to have all its harmful genetic variants (those that cause the individuals carrying them to have lower survival and reproduction rates) kept together on the same chromosomes, and all the beneficial variants together on other chromosomes. We refer to the disproportionate presence of beneficial (or harmful) variants together on the same chromosome as ‘coupling linkage disequilibrium’, or ‘coupling LD’ for short (Figure 2). Coupling LD speeds up evolution because harmful variants can be purged rapidly by natural selection. And in the long term, if the fittest possible individuals are to prevail, there must be at least one chromosome consisting entirely of beneficial variants, which is unlikely unless there is a great deal of coupling LD. However, evolution without recombination will tend to lead to a build-up of harmful and beneficial variants on the same chromosomes: repulsion LD. This makes individuals more equal in their numbers of offspring, thus weakening the strength of natural selection.
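Coupling and repulsion LD can be made concrete with the standard two-locus measure D = p(AB) - p(A)p(B). In the sketch below, alleles are coded 1 for beneficial and 0 for harmful, so positive D means coupling and negative D means repulsion; the two populations and the multiplicative 10% fitness benefit per beneficial allele are hypothetical. The same allele frequencies give very different variation in fitness, and it is that variation that selection acts on.

```python
from statistics import pvariance


def linkage_disequilibrium(haplotypes):
    """D = p(AB) - p(A) * p(B) for two loci.

    Each haplotype is a pair (locus1, locus2) with 1 = beneficial allele and
    0 = harmful allele, so D > 0 means coupling LD and D < 0 repulsion LD.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(h[0] * h[1] for h in haplotypes) / n
    return p_ab - p_a * p_b


def fitness(haplotype, s=0.1):
    """Hypothetical multiplicative fitness: each beneficial allele adds 10%."""
    return (1 + s) ** sum(haplotype)


coupling = [(1, 1), (0, 0)] * 50   # beneficial alleles kept together
repulsion = [(1, 0), (0, 1)] * 50  # each chromosome mixes good and bad

for name, pop in [("coupling", coupling), ("repulsion", repulsion)]:
    d = linkage_disequilibrium(pop)
    var_w = pvariance([fitness(h) for h in pop])
    print(f"{name}: D = {d:+.2f}, variance in fitness = {var_w:.4f}")
```

In the repulsion population every chromosome has the same fitness, so selection has nothing to distinguish between, even though both populations carry the same beneficial alleles at the same frequencies.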

Sexual reproduction, leading to random recombination between parental chromosomes, allows harmful variants to be hived off from beneficial variants, eliminating repulsion LD and improving the efficiency of natural selection. But is it not also possible that many existing coupling LDs could be split up? In fact, recombination leads to a mixture of repulsion and coupling LD, and a successful theory needs to show why this is a healthier Darwinian situation than the one that exists in its absence, where repulsion LD prevails.
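The erosion of existing LD by recombination can be sketched with the textbook result D(t+1) = (1 - r) D(t), where r is the fraction of recombinant gametes. That result assumes an infinitely large, randomly mating population with no selection, drift or mutation, so the sketch shows only how recombination dissolves LD of either sign; it deliberately leaves out the processes that, in the theories discussed here, generate new LD. The starting value of D and the recombination fractions are hypothetical.

```python
def ld_after(d0, r, generations):
    """Linkage disequilibrium after some generations of random mating.

    Idealised result (infinite population, no selection, drift or mutation):
    each generation a fraction r of gametes are recombinant, so D shrinks by
    a factor (1 - r).
    """
    return d0 * (1 - r) ** generations


# Start from pure repulsion: only (beneficial, harmful) and (harmful, beneficial)
# chromosomes, each at frequency 0.5, giving D = -0.25.
d0 = -0.25
for r in (0.5, 0.1, 0.0):  # 0.5 = unlinked loci, 0.0 = no recombination
    trajectory = [ld_after(d0, r, t) for t in range(5)]
    print(f"r = {r}: " + ", ".join(f"{d:+.4f}" for d in trajectory))
```

With r = 0, as in an asexual population, the repulsion LD never decays, whereas freely recombining loci halve it every generation.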

There are three major candidates, each proposing a different process that generates repulsion LD:
(a) the parasite resistance theory,
(b) the negative epistasis theory, and
(c) the genetic drift theory.

Put extremely simply, (a) parasites, evolving more quickly than host organisms, can adapt to particular combinations of genotypes present in the hosts, and (b) certain types of epistasis (that is, interactions between genes within an individual’s genome) can cause individuals with large numbers of beneficial variants to gain no great advantage over those with fewer. Processes (a) and (b) both amount to the production of repulsion LD. These theories are limited by relatively strict constraints on the form of the parasite-host relationship in the case of (a) and, in the case of (b), by a lack of experimental evidence for the necessary epistatic gene interactions (Otto, 2009), although this has recently been challenged (Sohail et al., 2017).
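The kind of epistasis that theory (b) needs can be quantified by comparing the fitness of a genotype carrying two beneficial alleles with what the alleles achieve separately. The sketch below uses one common definition on the log-fitness scale, with invented fitness values. It illustrates the concept only and does not describe any measured gene interaction.

```python
import math


def epistasis(w00, w10, w01, w11):
    """Two-locus epistasis on the log-fitness scale.

    w00: fitness with no beneficial alleles; w10 and w01: one beneficial
    allele at either locus; w11: both. Zero means the alleles combine
    multiplicatively; a negative value means the second beneficial allele
    adds less than expected (negative epistasis).
    """
    return math.log(w11) - math.log(w10) - math.log(w01) + math.log(w00)


# Hypothetical fitness schemes with a 10% benefit for a single allele.
multiplicative = epistasis(1.0, 1.1, 1.1, 1.1 * 1.1)
diminishing = epistasis(1.0, 1.1, 1.1, 1.15)  # second allele adds only ~4.5%

print(f"multiplicative: {multiplicative:+.4f}")
print(f"diminishing returns: {diminishing:+.4f}")
```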

Theory (c) has had, arguably, the greatest overall impact. It proposes that random events occurring in ‘small’ populations cause beneficial and harmful mutations to be allocated together on chromosomes entirely at random (quite how ‘small’ the populations need to be is an important question). So far, so good, as there is an equal mix of repulsion and coupling LD, making for a fairly healthy evolutionary situation. However, because repulsion LD is cleared away less efficiently by evolution than coupling LD, this creates an asymmetry, whereby the random allocation of mutations on chromosomes, and the random reproductive success of individuals, eventually lead to a predominance of repulsion LD (Hill and Robertson, 1966; Felsenstein, 2017).
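The first step of theory (c), the chance allocation of alleles to chromosomes, can be illustrated by sampling. The sketch below draws chromosomes at random from a population with no LD at all and measures D = p(AB) - p(A)p(B) in each sample, with beneficial alleles coded 1. It shows only that small samples generate coupling and repulsion LD by chance in roughly equal measure, and that this chance LD shrinks as the population grows. It does not model the subsequent selection that produces the Hill-Robertson asymmetry. The population sizes, allele frequency and number of replicates are arbitrary.

```python
import random


def sample_ld(n_chromosomes, rng, freq=0.5):
    """D in a random sample of chromosomes from a population with no LD.

    Alleles at the two loci are drawn independently (True = beneficial,
    False = harmful), so any LD in the sample arises purely by chance.
    """
    haps = [(rng.random() < freq, rng.random() < freq)
            for _ in range(n_chromosomes)]
    p_a = sum(a for a, _ in haps) / n_chromosomes
    p_b = sum(b for _, b in haps) / n_chromosomes
    p_ab = sum(a and b for a, b in haps) / n_chromosomes
    return p_ab - p_a * p_b


rng = random.Random(2017)
for n in (20, 200, 2000):
    ds = [sample_ld(n, rng) for _ in range(200)]
    coupling = sum(d > 0 for d in ds)
    repulsion = sum(d < 0 for d in ds)
    mean_abs = sum(abs(d) for d in ds) / len(ds)
    print(f"N = {n}: coupling {coupling}, repulsion {repulsion}, "
          f"mean |D| = {mean_abs:.4f}")
```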

A shortcoming of this classic result (the Hill-Robertson effect) is that it focuses on the long-term consequences of repulsion LD (and its removal) for population fitness. Natural selection is not usually ‘thinking’ about what happens in the long term: if a genetic variant, say one conferring recombination, could have a benefit in 100 generations’ time, selection does not have the ‘foresight’ to maintain it in the population against a competing variant that is more advantageous to its bearers in the short term. And if that short-term advantage is strong enough, the carriers of the long-term variant may be wiped out before exercising their advantage. To examine how the generation-to-generation increase in fitness is affected by theory (c), I rearranged a mathematical model of evolutionary change to obtain the following equation (Crouch, 2017):
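A hypothetical example makes the point about foresight concrete. Suppose a variant would only pay off after 100 generations, while its competitor enjoys a constant 10% fitness advantage in the meantime. The sketch below assumes simple deterministic haploid selection with invented numbers. Under these assumptions the long-term variant has become so rare by generation 100 that, in a population of 1,000, it would very likely already have been lost.

```python
def frequency_after(p0, s, generations):
    """Frequency of a variant whose competitor has a constant relative
    fitness advantage s, under deterministic haploid selection."""
    competitor_weight = (1 - p0) * (1 + s) ** generations
    return p0 / (p0 + competitor_weight)


p = frequency_after(0.5, 0.1, 100)
population_size = 1_000
print(f"frequency after 100 generations: {p:.2e}")
print(f"expected carriers in a population of {population_size}: "
      f"{p * population_size:.3f}")
```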

which, in English, is:

The green factor is the key measure of evolutionary success. If the expected change in average fitness is high, the rate at which the population grows is expected to increase. After further work, it can be shown that the “Covariance in average fitness between generations” term decreases in size (i.e. contributes a less negative value) when recombination is active. The covariance between two variables is essentially a measure of how correlated they are. This makes some intuitive sense: sex mixes up the genotypes randomly in the offspring generation, so the generations are less correlated with one another in their average fitnesses. Where randomness is high, as in small populations, chance changes in fitness among the parents are inherited by the offspring generation, generating the covariance and decreasing the power of natural selection. The red factor, mean fitness in the parental generation, is always positive and, over the timescale considered by my model, unaffected by recombination. Therefore, reducing the size of the covariance term causes the key green factor to increase. Computer simulations support the conclusion that Hill-Robertson effects cause significant decreases in the average fitness of asexual populations over short timescales (Hickey and Golding, 2018).

In an infinitely large population, the covariance term vanishes and the Hill-Robertson effect is absent. The only term remaining on the right-hand side is the variation between individuals that we would find in a population approaching infinite size, i.e. where the kind of randomness we are interested in has been averaged out. In real populations that are so large as to be effectively infinite, sex most likely provides an advantage only via the mechanisms proposed by the parasite resistance or negative epistasis theories.

My personal view is that the advantage of sex comes from some mixture of theories (a), (b) and (c), but that a more general synthesis can be achieved, perhaps via a similar statistical approach to that captured in the equation above. This, in turn, may assist in the identification of novel phenomena that lead to sexual advantages over asex, and also perhaps to non-sexual features of organisms that share evolutionary properties with sex.


CROUCH, D. J. M. 2017. Statistical aspects of evolution under natural selection, with implications for the advantage of sexual reproduction. J Theor Biol, 431, 79-86.

FELSENSTEIN, J. 2017. Theoretical Evolutionary Genetics.

HICKEY, D. & GOLDING, G. 2018. The advantage of recombination when selection is acting at many genetic loci. J Theor Biol.

HILL, W. G. & ROBERTSON, A. 1966. The effect of linkage on limits to artificial selection. Genet Res, 8, 269-94.

OTTO, S. P. 2009. The Evolutionary Enigma of Sex. American Naturalist, 174, S1-S14.

SOHAIL, M., VAKHRUSHEVA, O. A., SUL, J. H., PULIT, S. L., FRANCIOLI, L. C., GENOME OF THE NETHERLANDS CONSORTIUM, ALZHEIMER’S DISEASE NEUROIMAGING INITIATIVE, VAN DEN BERG, L. H., VELDINK, J. H., DE BAKKER, P. I. W., BAZYKIN, G. A., KONDRASHOV, A. S. & SUNYAEV, S. R. 2017. Negative selection in humans and fruit flies involves synergistic epistasis. Science, 356, 539-542.

VRIJENHOEK, R. C. 1998. Animal clones and diversity. Bioscience, 48, 617-628.