Tuesday, January 4, 2011

Around the web: cognitive sex differences

The "Around the Web" series highlights informative websites, and also targeted blog posts and news articles, relevant to the courses I teach. Last semester I taught Anth 143: Biology of Human Behavior, an introductory-level course that covers the basics of evolution, behavioral biology, and the interaction of biology and culture. My hope is that these posts are useful not only for my current students, but other people hoping to gain background or insight into these topics.

Ah, cognitive sex differences. Here we often find a mix of explanations for why we don't need to try to achieve equity in the sciences, or for why women are simply less interested in the sciences. There are plenty of examples trotted out of men's superiority in spatial ability, and the few where women are sometimes found to be superior put women on a pedestal without gaining them any real power or advantage in society (look at lovely woman, so able to verbally communicate that it makes her a good mommy and wife!).

This year has been a good year to critically evaluate cognitive sex differences, thanks to Cordelia Fine's book Delusions of Gender and the many spaces online that have reviewed her book. I have yet to read it and it didn't turn up under the Christmas tree, so I'll be buying it for myself. The reviews have me very excited.

So, I'll start there, then work my way through the other cool stuff that's been covered this year.

Delusions of Gender

Slate reviews the book and interviews Fine. Here is one of my favorite quotes from her:
We look around in our society, and we want to explain whatever state of sex inequality we have. It's more comfortable to attribute it to some internal difference between men and women than the idea that there must be something very unjust about our society. As long as there has been brain science there have been misguided explanations and justification for sex and inequality — that women's skulls are the wrong shape, that their brain is too small, that their head is too unspecialized. It was once very cutting-edge to put a brain on a scale, and now we have cutting-edge research that is genuinely sophisticated and exciting, but we're still very much at the beginning of our journey of understanding of how our brain creates the mind.
New Scientist also has a review in CultureLab. This article also reviews Jordan-Young's Brainstorm, which looks like a similarly excellent book on the topic of sex differences. It is published with Harvard University Press rather than a press that tends to attract a wider audience, so maybe that's why Fine's book has received more attention.

Katherine Bouton reviews the book in the New York Times. The last line was my favorite: "It’s really not just a few steps from looking longer at moving objects to aptitude in math, from gazing at faces to mind reading."

This Language Log post refers to the Bouton one and draws some interesting parallels between the Connellan et al (2001) article Fine dismantles and the Hauser misconduct case. I love teaching the Connellan et al (2001) article, and have done so for many years -- it's such a great example of reductionist wording, flawed methodology, and incorrect conclusions drawn from the authors' own evidence. I have used it in particular in introductory writing courses, as a way to show students they can be critical thinkers, since they quickly pick up on most of the paper's errors.

The Language Log post already dismantled the flawed methodology. I just want to briefly mention the flawed conclusions the authors draw from their results. Remember, Connellan et al are using Connellan's face, and a mobile composed of a broken-up photo of her face, as the two objects the infants are gazing at. Staring at Connellan is taken to imply a preference for faces and eventual social superiority, whereas preference for the mobile is taken to imply a preference for physical-mechanical objects.

Below, I've reproduced Tables 1 and 2.

Table 1. Number (and percent) of neonates falling into each perference [sic] category
                    Face preference   Mobile preference   No preference
Males (n = 44)      11 (25.0%)        19 (43.2%)          14 (31.8%)
Females (n = 58)    21 (36.2%)        10 (17.2%)          27 (46.6%)

Table 2. Mean percent looking times (and standard deviation) for each stimulus
                    Face              Mobile
Males (n = 44)      45.6 (23.5)       51.9 (23.3)
Females (n = 58)    49.4 (20.8)       40.6 (25.0)

Let's pretend for a minute that there were not significant methodological concerns and just look at the data. What I notice are a few things. First, females primarily exhibit NO preference, not facial preference. If half my subjects exhibited no preference, I'd probably have to say the methods and stimuli were flawed. Males might have a slight mobile preference, but even if that were statistically significant, I'm not sure there is a lot of biological meaning to 19 vs 11 individuals' preferences. Further, they mention that their statistical significance derives entirely from the greater male preference for the mobile (not a greater female preference for the face), yet their conclusions indicate female superiority in social cognition skills.
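For readers who want to check the Table 1 counts themselves, here is a minimal sketch of a chi-square test of independence on the 2 x 3 table of sex by preference category. It assumes the counts are exactly as reproduced above; the function name is hypothetical, and the p-value shortcut relies on the fact that a chi-square distribution with 2 degrees of freedom has survival function exp(-x/2). It is a rough check, not the analysis the paper's authors ran.

```python
import math

# Counts copied from Table 1, in the order: face, mobile, no preference.
observed = {
    "males": [11, 19, 14],
    "females": [21, 10, 27],
}


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)

    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            # Expected count if sex and preference category were independent.
            expected = row_total * col_total / grand_total
            stat += (obs - expected) ** 2 / expected

    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof


stat, dof = chi_square_independence(observed)

# With exactly 2 degrees of freedom, P(X > x) = exp(-x / 2).
p_value = math.exp(-stat / 2) if dof == 2 else None

print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p_value:.3f}")
```

With these counts the statistic comes out at roughly 8.3 on 2 degrees of freedom, for a p-value of about 0.016. That says the two distributions of preference categories differ by more than chance alone would suggest; it says nothing about which category drives the difference or whether the difference is biologically meaningful.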

Table 2 is perhaps more damning. First, percent looking time is not really different among the four groups (male/face, male/mobile; female/face, female/mobile). This becomes more obvious when you consider the standard deviations. Again, it is important to place statistical significance in the context of biological usefulness. Do these few seconds' difference in looking time tell us something, or not? My bet is on the latter.
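One way to put the Table 2 means on the scale of their standard deviations is a standardized mean difference (Cohen's d) between the sexes for each stimulus. The sketch below is only a rough check under several assumptions: that the two columns of Table 2 are face then mobile, that males and females can be treated as independent groups, and that a pooled standard deviation is appropriate. The function and variable names are hypothetical.

```python
import math

# (mean percent looking time, standard deviation, n) from Table 2.
# Column order (face, then mobile) is an assumption.
table2 = {
    "face": {"males": (45.6, 23.5, 44), "females": (49.4, 20.8, 58)},
    "mobile": {"males": (51.9, 23.3, 44), "females": (40.6, 25.0, 58)},
}


def cohens_d(group_a, group_b):
    """Standardized mean difference (a minus b) using a pooled standard deviation."""
    mean_a, sd_a, n_a = group_a
    mean_b, sd_b, n_b = group_b
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Positive values mean males looked longer than females at that stimulus.
d_values = {
    stimulus: cohens_d(groups["males"], groups["females"])
    for stimulus, groups in table2.items()
}

for stimulus, d in d_values.items():
    print(f"{stimulus}: d = {d:.2f}")
```

Under these assumptions the male-female gap is roughly 0.17 of a standard deviation for the face (females higher) and roughly 0.47 for the mobile (males higher), so whatever sex difference there is sits mostly in looking time at the mobile.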

Other delightful bits

Coverage of Fine's book wasn't the only time I got to read about cognitive sex differences, prejudice, and social conditioning. Most of the posts and articles I link to in this section should provide very strong evidence for social conditioning playing a primary role in cognitive and behavioral sex differences. I am quite sure there are some genetic and/or biological differences between the sexes; however, I am unconvinced that they would amount to much of anything if we didn't seize upon them and nurture them from birth. Further, meta-analyses of cognitive sex difference studies have found very small effect sizes, which means that overall, even when differences are found in empirical studies, those differences are tiny (Hyde 2005).

Check out Greg Laden's great post: Why do women shop and men hunt? He does a nice job criticizing the idea of some sort of universal Pleistocene environment of evolutionary adaptedness (EEA), which already does a lot to undermine arguments that humans have evolved certain sex-specific behaviors over the last few million years due to foraging in the savannah. He also discusses the huge amount of variation in social structure among modern humans, which helps us understand why this idea that there is essential male and female behavior is flawed.

Here's a neat Time Magazine article on pink toys. It discusses the Pink Stinks campaign, which I follow on Twitter.

This article discusses the damage that can be done to a woman's cognitive ability when she is objectified. I know I have trouble thinking when I receive comments on my physical appearance in my student evaluations, and the few times this has been done to me professionally by colleagues.

Related to this, Communicate Science discusses a study that had male and female actors give scripted 10-minute physics lectures and then had real physics students give evaluations (the students thought the actors were real lecturers). The males received higher evaluations overall -- when broken down by student gender, the female students gave slightly higher evals to the female lecturers, but the male students gave MUCH higher evals to the male lecturers. This is the sort of study that keeps me up at night, thinking about going up for tenure as a female scientist.

More on physics teaching: Ed Yong writes about a writing exercise that helps reinforce students' values and their sense of self, which then appears to close the gender gap in physics assessment. I had my students do this assignment on the last day of class as a way to help them with their finals (though we only did it for about 2 minutes -- I encouraged them to do more at home). A really neat piece!

Pharyngula is a blog I read often, and was one of the first science blogs I ever read, but I don't think PZ's work has ever made it into one of my Around the Web posts. However, this post, "Attention, perversely assertive women! You are abnormal!" really resonated with me. He covers a recent news story about using dexamethasone to pre-treat normal girl fetuses (and those with the legitimate genetic disorder CAH) to prevent masculine preferences and behaviors.

Next, an article in the New York Times Business Section on why more women aren't the boss. There are some interesting thoughts shared on mentorship and risk-taking behaviors.

The always-brilliant Jennifer Ouellette discusses the idea that "boyz will be boyz" in her post that dismantles the idea that female science teachers are feminizing science classes and increasing the dropout rate for boys.

Finally, I don't know how to introduce this piece, "The Rise of Enlightened Sexism" by Susan Douglas up at On the Issues, except to say: read it. Read it now.

Random interesting tidbits

I had intended to finish this post in time for the end of 2010. I had wanted to send you in the direction of some pretty pictures as a way to close out the year, so let this be some eye candy to start you off well for 2011. Myrmecos (who I feel privileged to know in person through his fantabulous wife) offers up "The Best of Myrmecos 2010." I will be honest here and say that, before this blog, I had close to zero appreciation for insects and mostly thought of ways to keep them out of my house and office, or kill them if they came in. I pay a lot more attention to them now, and wish I knew more.

And, Jerry Coyne put together some images from National Geographic that I liked from the 2010 contest.

Happy new year to all!


Connellan, J. (2000). Sex differences in human neonatal social perception. Infant Behavior and Development, 23 (1), 113-118. DOI: 10.1016/S0163-6383(00)00032-1

Hyde, J. (2005). The Gender Similarities Hypothesis. American Psychologist, 60 (6), 581-592. DOI: 10.1037/0003-066X.60.6.581


  1. Oh so many comments to make! On the Fine quote about unjust behavior in our society in regard to the sexes and the Miller-McCune article, it seems to make sense to me since colleagues are studying weathering and embodiment of the effects of racism on health that sexism could be creating the same sorts of issues for women or others of the non-dominant gender.
    The comment you made about students commenting on physical appearance brings up two of my own experiences. Once a student wrote that he would like to see "more sundresses" on a question that asked something about professor performance, and my immediate thought was "but I never even wore a sundress to class and what the hell does that have to do with anything!" Recently I saw a former student and he commented that it was the first time he had seen me in pants instead of a skirt. I knew this was a fallacy b/c I always wear jeans and a t-shirt on test days in an effort to send psychological signals that students should be less stressed during testing. It was interesting for me to note that when his mind needed to be on the test, it wasn't on what I was wearing...

  2. I haven't read the Connellan study and have no horse in the race. However, I think your discussion of the Connellan methodology is perhaps missing some context. I don't know about anthropology, but in most psych studies, it is not assumed that a real effect can be detected in an individual subject. It depends on your signal-to-noise ratio. In neonate studies, signal-to-noise is terrible. You have to look at groups. That's doubly true for preferential-looking studies. So if you have a problem with this, you have a problem with 95% of psych studies ever run. Which may be the case, but then it isn't really about Connellan.

    I'm also unsure what to make of this focus on the "fact" that it's the boys who drive the effect. Leaving aside the fact that absolute numbers are never interpretable in preferential-looking studies (you can't control stimulus size, total luminance, contrast, etc., all at the same time, and these all have effects), let's say that what we're seeing is *less* of a mobile-preference in girls. How are the implications different from saying girls have more of a face preference?

  3. Erica, thanks for commenting! I think the work on the impact of racism on health is powerful and important, and I agree that similar work on sexism could yield interesting results. I also have lots of stories to share about students' perceptions and comments about my clothing and other elements of my physical appearance. Sometimes flattering (I get at least two students a year who ask me about my Frye boots), but mostly awkward, embarrassing or disturbing.

    GamesWithWords, I wonder if I just didn't explain myself well enough, because I'm not sure I understand your point either. I did not go into reading the Connellan et al with the idea that a real effect can be detected in an individual (nor did I assume that's where they started).

    For your second point, the main conclusions of the paper were that girls had more of a face preference. This is not what they found, by my reading of the evidence. The only thing they found is that, within boys, the largest group was mobile preference, followed by no preference, followed by face preference. But if the primary preference for girls was NO preference -- fully 46% of the female subjects had no preference -- I don't get where they get their conclusion from. Do you see what I'm saying? The within-sex variation suggests, to me, that girls don't prefer faces. Further, the standard deviations of the difference in amount of looking time at the face versus mobile were so great as to completely drown out the tiny differences found between the within-sex averages. That makes the differences effectively zero.

  4. @KBHC. Two quotes from your comment:

    "I did not go into reading the Connellan et al with the idea that a real effect can be detected in an individual."
    "Fully 46% of the female subjects had no preference."

    I think these two statements are contradictory. If you don't expect to see effects in individual subjects, then it doesn't matter how many subjects showed effects. Honestly, I don't know why they provided those numbers, because they don't really matter.

    What matters is the comparison of the two populations (male and female), and there the effects are, if not huge, respectable. To use numbers, the Cohen's d effect size is around .5 in one direction for the males and .5 in the other direction for the females. 0.5 is standardly considered a mid-sized effect.

    As far as whether the females had a face preference ... I think that you are using "preference" as an absolute term, but preferential looking only gives us relative information. Face preference and Mobile preference are two sides of the same spectrum. Saying that the boys had a stronger mobile preference is identical to saying that the girls had a stronger face preference.

    Let me put this another way. How much people look at something is a complex interaction of relevant features (whatever it is that makes a face look different from a mobile) and irrelevant features (size, luminosity, etc.). What is relevant and what is irrelevant is itself an empirical question, but let's imagine for the moment we know. Ideally, when designing a preferential looking experiment, you would equate the two objects in terms of their irrelevant features. That way, you know that any looking time differences are due to the relevant features and not the irrelevant features. Obviously, that can't be done with faces and mobiles.

    It's possible that, based on the luminance, size, and contrast, mobiles were 1000x more attractive than faces. But since everyone likes faces a whole lot more than mobiles, you end up with babies looking at each about half the time. But girls have a stronger face preference, so they end up looking a bit more than half the time, and boys have a weaker face preference, so they look less than half the time. Or maybe faces are 1000x more attractive (based on luminance, etc.), but everyone really likes mobiles. Who knows.

    Maybe the authors understood this and were using "face preference" as a sloppy short-hand. Maybe they didn't really appreciate the issues, either. I don't think this is relevant to the conclusions, though, which is that girls looked at the faces more than the boys did. This preference over the long term could have consequences for development.

    Incidentally, these same kinds of studies have been done with Autism. Babies with Autism look at faces less than babies who don't (I forget if the foil was a mobile in those studies). What I'm saying is, you're looking at a pretty run-of-the-mill study analyzed in a run-of-the-mill fashion. Given the high false-positive rate in science (particularly on sensational studies), I'd want to see it replicated. But nothing you've said so far makes it look any different than thousands of other preferential looking studies.

  5. GamesWithWords, if you look at the Language Log post that I link to, which, as I said, describes serious methodological concerns that I didn't reiterate, you'll see that it's not run-of-the-mill. If I remember correctly, Liz Spelke has also written some pretty interesting criticisms of this article, and Baron-Cohen's work more generally (he is second author on that paper).

    And while I'd like to move on, from where I sit -- which is certainly not in psychology, so there are likely disciplinary differences here -- if I performed a study with two stimuli where I was looking for a sex difference in their preference in the two stimuli, and one sex had no preference half the time, I would redo the study. Maybe that means I misunderstand psychology literature to you, but in anthro I don't think that would fly.

  6. @KBHC -- It's very rare in psychology to find significant effects in individual subjects. We just don't have that kind of signal-to-noise (except those damnable vision folk). You might choose to rerun the study if you got those results, but you'd rerun every study you ever conducted (unless, again, you're one of those damnable vision folk). And if such results couldn't be reported in anthro, that limits what kinds of things anthropologists allow themselves to study.

    This discussion sounds to me a little like saying to a particle physicist -- wait, you study things that can't actually be seen with the naked eye? That wouldn't fly in my field! This is just the nature of the beast. I found this post interesting -- and commented on it -- because these kinds of misunderstandings seem typical when someone from one field actually sits down and reads through the methods used in another field.

    In saying this was a run-of-the-mill study, I was focusing only on what you reported above. As I said, I haven't read the study. The Language Log post focused on the fact that the experimenters weren't blind to condition. That was the similarity to Hauser's work. To the best of my knowledge, BTW, blind coding wasn't what got Hauser in trouble; it was common knowledge that studies were not run blind in that lab. I'm sorry to say that these are not the only labs that don't do blind coding, and I agree it is a problem. But that has nothing to do with what you wrote about above.

    As far as Liz Spelke, I don't remember what her issue with this study was. Since she invented the preferential looking method -- and since preferential looking studies *never* show significant results for each kid -- I doubt she'd have a problem with that aspect of the results. There may be other issues with the study (the fact that the experimenter was not blind may or may not actually be a problem; you'd have to run an experiment to find out), but those results that worry you would be considered a very respectable effect.

    I already mentioned the respectable Cohen's d of 0.5 based on Table 2. I just checked, and if you stick your numbers from Table 1 into a chi square test, you get a significant p-value on that alone, again demonstrating that the boys and girls were likely sampled from different underlying distributions. Do you not use chi square in anthropology?

  7. sorry, I meant "blind experimenting" not "blind coding".

  8. Hmmm, my post prior to the correction right above doesn't appear to have posted. I'll try to reconstruct.

    The reason I found your post interesting and commented is that it's interesting to see what happens when someone unfamiliar with a field tries to actually engage with the methodology and data used in that field. Sometimes it seems like we, as scientists, don't actually learn how to do anything, but the truth is we gain a great deal of cultural knowledge, and that becomes quickly apparent when anyone from outside the culture tries to interact with our cultural products -- that is, our papers.

    I suspect I would find an arbitrary anthro paper just as suspicious. Do you use self-report measures such as questionnaires where you ask people to report on their own behavior, reasons for doing things, etc? Cognitive psychologists *hate* that kind of methodology. This leads to a lot of heated arguments with social psychologists, who use self-report questionnaires a lot. I think it's worth erring on the side of caution: a field probably adopted its current methodology for a reason.

    So just to reiterate a few points about psychology. It's very rare to find an effect that's significant in most -- or even *any* -- of your participants (unless you're one of those damnable vision folk). If you want to rerun every study that doesn't have significant findings for most of the individual participants, you're going to have to rerun them all (except those damnable vision studies). The signal-to-noise ratio is simply too small.

    In any case, as I mentioned previously, focusing on whether the girls showed a significant effect is focusing on the wrong thing, because absolute numbers are uninterpretable here. What matters is the difference between boys and girls, which appears robust. As I mentioned, looking at your Table 2, the effect sizes are pretty decent (Cohen's ds around .5). Even if you look at Table 1 and feed those numbers into a chi square, the result is significant, which is pretty strong evidence that the girls and boys differ, at least as a population.

  9. The Language Log post was concerned with the fact that experimenters were not blind to condition. That was an issue in many Hauser lab studies as well, though I don't think that's what got him in trouble (it was common knowledge he didn't do blind testing). Whether that's actually a problem for the study is an empirical question; you'd want to run the experiment both ways to find out (or, alternatively, just run it blind to begin with, which is what they should have done).

    I don't know what Liz Spelke's issue with the study is. Since she pioneered preferential looking studies, I seriously doubt she's worried about the issues you discuss above. As I said before, that's just what preferential looking study data looks like. You might not like it, but then you also don't like any of Liz Spelke's work, either. If I were to lay odds, I'm guessing she was concerned about the experimenters not being blind to condition; her lab is very careful about such things.

    It should be pointed out that being blind to condition is itself a cultural issue. In the kinds of studies I run, experimenters are rarely if ever blind to condition, and it's generally assumed not to matter. Certainly, it's never caused any of my studies to give me the results I expected! With this kind of preferential looking study, it's more likely an issue, but again that's an empirical question.

  10. I think the comment was temporarily hung by Blogger because you have commented so many times on this post :). I think I just released everything from moderation.