Zero Sum
How wonderful that we have met with a paradox. Now we have some hope of making progress. - Niels Bohr
Posted Sep 3rd, 2008 by ravi

To begin at the beginning (the latest beginning), the issue of women’s innate abilities at doing mathematics or science was offered as a possible reason for their under-representation in tenured positions at top universities in these fields, by Larry Summers, economics advisor to President Clinton and erstwhile president of Harvard University, in a speech (which he presciently deemed “provocative”) at the National Bureau of Economic Research. Listing other areas of under-representation (example: white men in the NBA), Summers urges:

These are all phenomena in which one observes underrepresentation, and I think it’s important to try to think systematically and clinically about the reasons for underrepresentation.

With this in hand, he proceeds:

There are three broad hypotheses about the sources of the very substantial disparities that this conference’s papers document and have been documented before with respect to the presence of women in high-end scientific professions. One is what I would call the-I’ll explain each of these in a few moments and comment on how important I think they are-the first is what I call the high-powered job hypothesis. The second is what I would call different availability of aptitude at the high end, and the third is what I would call different socialization and patterns of discrimination in a search. And in my own view, their importance probably ranks in exactly the order that I just described.

In other words, Summers ranks “aptitude availability” (the lack of it, that is) above “patterns of discrimination” in importance for explaining the under-representation of women. It is worthwhile to stop at this point and emphasise that Summers is not only a liberal Democrat, but can be seen to represent the elite view in that segment. He suggests:

Another way to put the point is to say, what fraction of young women in their mid-twenties make a decision that they don’t want to have a job that they think about eighty hours a week. What fraction of young men make a decision that they’re unwilling to have a job that they think about eighty hours a week, and to observe what the difference is. And that has got to be a large part of what is observed. Now that begs entirely the normative questions-which I’ll get to a little later-of, is our society right to expect that level of effort from people who hold the most prominent jobs? Is our society right to have familial arrangements in which women are asked to make that choice and asked more to make that choice than men? Is our society right to ask of anybody to have a prominent job at this level of intensity, and I think those are all questions that I want to come back to.

before ignoring the normative issues and moving on to the second most important contributor to female under-representation:

It does appear that on many, many different human attributes-height, weight, propensity for criminality, overall IQ, mathematical ability, scientific ability-there is relatively clear evidence that whatever the difference in means-which can be debated-there is a difference in the standard deviation, and variability of a male and a female population. And that is true with respect to attributes that are and are not plausibly, culturally determined.

Summers then goes on to offer various back-of-the-envelope standard deviation calculations based on the top 5% of twelfth graders, and concludes:

So my sense is that the unfortunate truth-I would far prefer to believe something else, because it would be easier to address what is surely a serious social problem if something else were true-is that the combination of the high-powered job hypothesis and the differing variances probably explains a fair amount of this problem.

And in case you are left unconvinced by his nod to present objections (while not seriously addressing them):

Now, it’s pointed out by one of the papers at this conference that these tests are not a very good measure and are not highly predictive with respect to people’s ability to do that. And that’s absolutely right. But I don’t think that resolves the issue at all.

He offers a clincher:

So, I think, while I would prefer to believe otherwise, I guess my experience with my two and a half year old twin daughters who were not given dolls and who were given trucks, and found themselves saying to each other, look, daddy truck is carrying the baby truck, tells me something. And I think it’s just something that you probably have to recognize.

Even as he sets out to discuss his least important cause (socialisation and discrimination) he dismisses its possible significance via a modified “nature vs nurture” argument:

One is socialization. Somehow little girls are all socialized towards nursing and little boys are socialized towards building bridges. No doubt there is some truth in that. I would be hesitant about assigning too much weight to that hypothesis for two reasons. First, most of what we’ve learned from empirical psychology in the last fifteen years has been that people naturally attribute things to socialization that are in fact not attributable to socialization.

This is arguable at two levels. First, the claim that “people naturally attribute things to socialization” seems unlikely in the face of the instinctive, primitive biologism that most people employ in explaining human behaviour and outcomes, as attested by various folk adages and maxims. Entire caste systems have been justified using exactly that sort of “innate ability” argument. Second, the claim that the last fifteen years (including results from twin studies) have cast doubt on environmental factors in human outcomes is highly specious given the lack of any references.

Summers valiantly hopes:

I would like nothing better than to be proved wrong, because I would like nothing better than for these problems to be addressable simply by everybody understanding what they are, and working very hard to address them.

However, he seems unclear about where the burden of proof rests here: on those favouring a parsimonious assumption that differences in innate abilities pale in comparison to environmental and other (normative) factors, or on Summers, who is admittedly putting forth a “provocative” and not greatly substantiated proposal.

The questions raised in response offer various criticisms of Summers’ dismal summation of female limitations:

Q: […] So I think that there is a real great need on both sides to begin to talk about whether or not we can predict. I hate to use a sports metaphor, but I will. This is drawn basically from an example from Claude Steele, where he says, he starts by using free throws as a way of actually determining, who should-you’ve got to field a basketball team, and you clearly want the people who make ten out of ten, and you say, “Well, I may not want the people who make zero out of ten,” but what about the people who make four out of ten. If you use that as the measure, Shaq will be left on the sidelines.

And regarding female representation in other locales:

Q: What about the rest of the world. Are we keeping up? Physics, France, very high powered women in science in top positions. Same nature, same hormones, same ambitions we have to assume. Different cultural, given.

LHS: Good question. Good question. I don’t know much about it.

And the nature of the conclusions we can draw from measurement (emphasis below is mine):

Q: I would like to make an on observation and then make a suggestion. The observation is that of the three. There is a contradiction in your three major observations that is the high-powered intensive need of scientific work-that’s the first-and then the ability, and then the socialization, the social process. Would it be possible the first two result from the last one and that math ability could be a result of education, parenting, a lot of things. We only observe what happens, we don’t know the reason for why there’s a variance. I’ll give you another thing, a suggestion. The suggestion is that one way to read your remarks is to say maybe those are not the things we can solve immediately. Especially as leaders of higher education because they are just so wide, so deep, and involves all aspects of society, institution, education, a lot of things, parenting, marriages are institutions, for example. We could have changed the institution of those things a lot of things we cannot change. Rather, it’s not nature and nurture, it is really pre-college versus post-college. From your college point of view maybe those are things too late and too little you can do but a lot of things which are determined by sources outside the college you’re in.

Some hard data

A study by researchers at the University of California, Berkeley and the University of Wisconsin, Madison looked into test score data for high school students to ascertain differences in mathematics performance.

Their findings are summarised as:

Funded by a grant from the National Science Foundation, the researchers reached their conclusions after sifting through mountains of data, including math scores from 7 million students who were tested in accordance with the federal No Child Left Behind Act (NCLB). The team compared not only the average performance of all students on these tests, but also the scores of just the most gifted children, as well as the ability of children to solve complex math problems. In all cases, girls measured up to boys.

A criticism that has persisted is that variance is greater among boys and hence (since the averages for boys and girls are the same) there is a greater share of “genius” level performers among boys, i.e., boys are over-represented in the 99th percentile.

Some critics argue, however, that even when average performance is equal, gender discrepancies may still exist at the highest levels of mathematical ability. To account for this possibility, researchers compared the variability in boys’ and girls’ math scores, the idea being that if more boys fell into the top scoring percentiles than girls, the variance in their scores would be greater.

Again, the team found little difference, as did a comparison of how well boys and girls did on questions requiring complex problem solving.

While this suggests that the variance was similar, some have offered a critique on the basis of older data:

Among the studies avoiding these pitfalls is the Project Talent study of 1960. It remains one of the best assessments of cognitive sex differences in a complete age group ever made. The sample was designed to be representative of all 15 year-olds in the US. It included more than 73,000 15 year-olds, both students and nonstudents. They were given an all-day battery of 23 cognitive tests. The mathematics results revealed a mean (male-female) difference of 0.12 standard deviations and a 1.20 (male/female) variance ratio.

The National Post writes that the very data presented by the Berkeley/Wisconsin researchers vindicate Summers since:

Appealing to the data that existed in 2005, Mr. Summers described a concrete example of this phenomenon. He noted that the male-female ratio in the top 5% of Grade 12 math students appeared to be about two to one, suggesting that the variance in male test scores was probably about 20% higher than that of female ones. On average, in other words, women tend to be more average.

And that’s exactly what Ms. Hyde’s team found: The test data for boys were spread out more in every state, and in every single grade, by between 11% and 21%. That may not sound like a big difference. But such differences can create tremendous disparities in the relative proportion of men and women meeting a certain criterion.
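How a modest variance gap turns into a larger gap in the tails can be checked with a rough sketch. The Python below assumes two things the sources do not establish: that scores are normally distributed, and that boys and girls are equally numerous. The function name `tail_ratio` and its parameters are illustrative only. It reports the boy-to-girl ratio above the cutoff that defines the top fraction of the pooled population.

```python
import math
from statistics import NormalDist


def tail_ratio(top_fraction, variance_ratio, mean_gap=0.0):
    """Boy-to-girl ratio above the cutoff marking the top `top_fraction`
    of a pooled, half-and-half population.

    Assumptions (not from the sources): normal scores, equal group sizes.
    Girls are N(0, 1); boys are N(mean_gap, sqrt(variance_ratio)).
    """
    if not 0.0 < top_fraction < 1.0:
        raise ValueError("top_fraction must be strictly between 0 and 1")

    girls = NormalDist(0.0, 1.0)
    boys = NormalDist(mean_gap, math.sqrt(variance_ratio))

    def pooled_tail(cutoff):
        # Share of the pooled population scoring above the cutoff.
        return 0.5 * (1.0 - girls.cdf(cutoff)) + 0.5 * (1.0 - boys.cdf(cutoff))

    # Bisection: pooled_tail decreases as the cutoff rises.
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if pooled_tail(mid) > top_fraction:
            lo = mid  # cutoff too low, too many people above it
        else:
            hi = mid
    cutoff = (lo + hi) / 2.0

    return (1.0 - boys.cdf(cutoff)) / (1.0 - girls.cdf(cutoff))


if __name__ == "__main__":
    for fraction in (0.05, 0.01):
        ratio = tail_ratio(fraction, variance_ratio=1.2)
        print(f"top {fraction:.0%}: boys/girls = {ratio:.2f}")
```

Under these assumptions the ratio grows as the cutoff moves further into the tail, and a nonzero mean gap (such as the 0.12 standard deviations reported by Project Talent) raises it further. The sketch says nothing about why the variances differ, which is the question at issue.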

Indeed, the research team’s data do show that the ratio of boys to girls at the 99th percentile is approximately 2:1. Does this vindicate Summers? The National Post not only believes so, but goes on to claim victory in the underlying struggle (albeit in a tone known in the blogosphere as “concern troll”):

On the other hand, if there are relevant innate differences between the sexes of the sort that Mr. Summers brought up, the quest to stamp out discrimination and achieve pure equality will, at some point, become a waste of effort.

Given all the effort that has gone into sex-based affirmative action in recent decades — both in Canada and the United States — we must ask ourselves: Have we reached this point already?

This deep concern on the part of the National Post, that we may be wasting our effort in trying to provide equal opportunity to girls, would be legitimate (as also their claim regarding Summers) if the question were one of fact. However, as one of Summers’ questioners pointed out:

We only observe what happens, we don’t know the reason for why there’s a variance.

The question, in other words, remains begged. And it is a surprising bit of reasoning to suggest that a few decades of sex-based corrective action can reverse centuries of discrimination, or even erase existing inequalities.

The criticism of Summers is on two fronts:

  • His haste to make pronouncements while admitting the lack of hard data or reasoning on his part.
  • The ordering of causes offered by him with little reasoning offered to defend such an ordering.

These criticisms remain legitimate and will do so until they can be refuted by pointing to contrary content in his speech. The question of what the data implies is answered implicitly (if entirely unsatisfactorily) in the National Post’s reference to “affirmative action in recent decades”. That the National Post chose to dangle the duration (“recent decades”) admits to the point that the measurements in themselves cannot settle the question. For if they could, we could well have measured scores the day after the institution of affirmative action and pronounced it a waste of time.

The question that remains unaddressed is the one that the National Post raises and summarily dismisses (via a clever interrogative) towards the end: when, if ever, can the measurements tell us what part of the differences is innate and what part is environmental? One way to gain a partial answer is to study the 99th percentile performance data over time. I do not have access to such data, but findings in a similar debate may be informative:

The IQ debate

Murray and Herrnstein’s book “The Bell Curve” echoes the claim that IQ tests measure an innate, fixed ability in human beings, and that differences in IQ between groups are therefore something we may have to live with. Data are offered by the authors, and others, showing that African Americans score, on average, lower than White Americans (who in turn score below Asian Americans, but that is not a matter for discussion!) on IQ tests. To reiterate, this consistent under-performance is held to be innate and fixed since IQ measures an innate and fixed capacity (more question begging!).

There is however a small problem with this reasoning, and it is called the Flynn Effect. From Wikipedia:

The Flynn effect is the rise of average Intelligence Quotient (IQ) test scores over the generations, an effect seen in most parts of the world, although at greatly varying rates. It is named after James R. Flynn, who did much to document it and promote awareness of its implications. This increase has been continuous and roughly linear from the earliest days of testing to the present. “Test scores are certainly going up all over the world, but whether intelligence itself has risen remains controversial,” psychologist Ulric Neisser wrote in an article in 1997 in The American Scientist.

If IQ were measuring innate intelligence, which by the current consensus in evolutionary biology would show no noticeable evolutionary change in a matter of decades, then the average IQ of children 60 years ago should match the average of a group of similar age today. However, the Flynn Effect finds otherwise:

Ulric Neisser, who in 1995 headed an American Psychological Association task force writing a consensus statement on the state of intelligence research, estimates that if American children of 1932 could take an IQ test normed in 1997 their average IQ would have been only about 80.[1] In other words, half of the children in 1932 would be classified as having borderline mental retardation or worse in 1997.
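The “half” in that last sentence is simple arithmetic, sketched below. Only the 1932 average of 80 comes from the quoted passage. The sketch assumes normally distributed scores, the conventional IQ standard deviation of 15, a 1997 norm centred on 100, and a classification cutoff of 80 as implied by the passage; the variable names are illustrative.

```python
from statistics import NormalDist

SD = 15.0      # assumption: conventional IQ standard deviation
CUTOFF = 80.0  # assumption: threshold implied by the quoted passage

# 1932 children scored against 1997 norms (average of 80, per Neisser's estimate).
children_1932 = NormalDist(mu=80.0, sigma=SD)
# 1997 children scored against their own norms (average of 100 by construction).
children_1997 = NormalDist(mu=100.0, sigma=SD)

# Share of each cohort falling at or below the cutoff.
share_1932 = children_1932.cdf(CUTOFF)
share_1997 = children_1997.cdf(CUTOFF)

print(f"1932 cohort at or below {CUTOFF:.0f}: {share_1932:.1%}")
print(f"1997 cohort at or below {CUTOFF:.0f}: {share_1997:.1%}")
```

A cutoff that sits at a cohort's own average always splits that cohort in half, which is all the quoted claim requires; the comparison with the 1997 cohort simply restates the size of the shift.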

The Environment Factor

While Summers passes lightly over environmental contributors, offering anecdotes about his children to dismiss them, the parsimonious explanation(s), in light of the measured improvement (in girls’ mathematics performance and in IQ), is that the environment plays a critical role in both test performance and in career outcomes.

Even if tests measure innate abilities of individuals (itself a questionable proposition as seen above), they are not performed via electrodes attached to the subject’s brain, but mediated through his or her attitudes, interests and external influences. One example of this is “stereotype threat”:

Telling girls that boys are better than they are at mathematics can irritate them so much that it negatively impacts their performance, according to a U.S. study.

Researchers from three U.S. universities found that the threat of stereotypes could create worries that undermined the women’s short-term memory system needed for problem solving.

“The women start worrying about screwing up which uses up important short term or working memory which could otherwise be used performing the task,” said Sian Beilock, assistant professor in psychology at the University of Chicago and lead investigator in the study.

Interestingly, white students underperform when explicitly competing against Asian American students. Wikipedia notes the real consequences of stereotype threat:

Stereotype threat can result in physiological responses, since the pressure and fear caused by negative stereotypes is so great. For example, a study by Blascovich J, Spencer SJ, Quinn D and Steele C. found that African Americans under stereotype threat exhibited larger increases in arterial blood pressure during an academic test, and performed more poorly on difficult test items. Some researchers feel this may explain the higher death rates from hypertension-related disorders among African Americans.[16] A study by Toni Schmader and Michael Johns found that stereotype threat can effectively reduce working memory capacity, another factor in poor test performance.[17] Stereotype threat may undermine intellectual performance by triggering a disruptive mental load. Studies have found increased heart rates for test subject operating under stereotype threat.

This threat is the crux of the criticism of Larry Summers, a powerful member of the opinion-making class, who indulges in idle speculation on the capabilities of a discriminated group, while masquerading as an enlightened liberal. The damage wreaked by his endorsement of the stereotype is to be reaped in the ensuing “decades”.

4 Responses

  • Bappa says:

    Good eye-opening article. I categorize myself as a liberal and am generally skeptical about broad generalizations. But even I had pretty much accepted these studies when they were splashed across the newspapers a few years back, partly because of the people who were making the claims and also because of anecdotal evidence about the other half of the claim in the newspaper articles. They had mentioned then that girls inherently have better linguistic skills while boys had better analytical/mathematical skills. And at least looking around in my immediate circle, baby girls do start talking earlier and with more clarity than baby boys. Of course, it's anecdotal evidence and moreover doesn't address the mathematical skills part at all. But I guess your article taught me to be even more vigilant and skeptical than I have been all along.

    As you have pointed out, the dangers of such studies (and conclusions) are real, since I have two daughters. So, maybe my subconscious preconceptions would have affected them in some way.

  • ravi says:


    I think the issue remains unresolved: the results from the latest research demonstrate a 2:1 male:female ratio at the 99th percentile. Perhaps, as the conservatives claim, this is due to innate abilities. But given the weight of the evidence of discrimination, I think the burden of proving that rests heavily on the “innate” theorists. Especially since the ratio has (I believe) decreased. Such a decrease validates the notion of environmental contribution. Conservatives are left arguing that we have now nullified those contributions and the residue is entirely due to innate differences. This is a tall claim, indeed!

    My problem with Larry Summers is that someone of his prominence, as a self-proclaimed liberal, would feed into the stereotype threat so blatantly, especially by explicitly ranking innate over environment (as he did in his speech), when he has (or offers) no justification for such a ranking.

    Then there is also the value of 99th percentile genius and to what extent the output of a person in this class is entirely individual. As the mathematician and women’s studies scholar Moon Duchin says:

    “Lots of people think this is a non-social field—would math come out differently in a society with a different social organization?” While she’s not trying to debunk the existence of genius (“there really are people you meet in math and you learn about who just synthesize things in ways that other people don’t have access to with any investment of time”), the Great Man theory “definitely stilts the narrative. A real intellectual history is harder to do but it illuminates the math very differently.”

  • ravi says:

    Another thing that bothers me is that Summers starts off by saying that we should think “systematically and clinically” but then that is exactly what he proceeds not to do, speculating idly, generalising from anecdote, and so on. What he really means is to lecture fellow liberals thus: just because we are committed to women’s equality we should not overlook what the data tells us. That’s the defence (of Summers) offered by Steve Pinker, also. This is at best patronising. The criticism of Summers’ comments is exactly that the data and analysis do not compel us to his conclusion/conjecture. If anything, it is his conjecture that is driven by psychological factors (“we must not look ideological”).

  • PSri says:

    Agree with you, Ravi…exactly the problem I had with Pinker’s ‘The Blank Slate’: the smugness of the writer who says ‘hey I’m liberal and not judgmental in the least, but the numbers don’t lie.’ They do, and so does the writer.
    Let's substitute ‘women’ with…I don't know, Arkansans? Polynesians? Buddhists? Children of unwed mothers? People with olive eyes and brown hair? People who work in the Indian software industry? People who drive Fords? People who can wiggle their ears?
    Each of these is a segment in the population. Most of these segments are not likely to be proportionally represented in the upper echelons of the scientific hierarchy, any more than they are likely to be proportionally represented in the CEO community of Corporate America, or in Major League Baseball. Corporate, sporting or scientific achievers are not random sample subsets. They are spikes. Spikes are special cases: cases of highly driven individuals willing to trade in a lot of things in order to monomaniacally achieve a single goal, combined with favorable environmental factors. If a large percentage of the spikes are from a single segment (over a significant amount of time), we need to see why that segment should see more spikes than other segments. The fact that more spikes arise in one segment than others is a statistical fact about the SPIKES, not about the SEGMENTS. Most human beings don't want to think about work 80 hours a week. A large majority of us are discriminated against and belittled and told every day that we are inferior. A VAST majority of us have other interests than sitting in a room and churning out abstract papers. An overwhelming majority of us are not, ever, going to occupy the stratospheres of scientific, artistic, sporting, commercial or political leadership. Therefore, Summers could have made the same remark, without loss of generality, about Indians, Chinese, Africans and people from Houston, Texas.