Wednesday, March 7, 2018

Going on the record via preregistration

A public commitment to update my own beliefs in response to a planned analysis I haven’t seen yet (Part 1)

Update, 3/26/18: Unfortunately, my request for the data behind this recently published JPSP paper (McNulty, Meltzer, Makhanova, and Maner, 2018 online publication) was unsuccessful. Dr. McNulty has informed me that there will be a “regrettable delay” of unknown duration in sharing these now published data until his team writes up and successfully publishes a second manuscript on these same data columns. Part 2 of this blog post is here, along with our email exchange about the data sharing question. 

In my previous post, I talked about how essential it is that we, as scientists, remain open to the possibility of having our intuitions disconfirmed.

Now let’s see if I can put my money where my mouth is.

If I take my own admonishment seriously, I need to be willing to have my own intuitions and beliefs disconfirmed—even when those beliefs have developed through years of researching a particular topic.

Here’s one of my own findings in which I have a high degree of confidence. In a meta-analysis I conducted about five years ago, we examined whether a partner’s attractiveness was more romantically appealing to men than to women. We acquired a large collection of published and unpublished datasets (k = 97, N = 29,780) that spanned a variety of paradigms in which men and women reported on partners they had (at a minimum) met face-to-face. Overall, we found that the sex difference in the appeal of attractiveness was not significantly different from zero, and it did not matter whether the study examined initial attraction (e.g., speed-dating, confederate designs) or established relationships (e.g., dating couples, married couples).

Here is a hypothetical illustration of this finding: If a man’s satisfaction in a given relationship is predicted by his female partner’s attractiveness at r = .08, we might find that a woman’s satisfaction is predicted by her male partner’s attractiveness at about r = .03. Meta-analytically, the sex difference is about this size: r(difference) = .05 or smaller. You can interpret this r(difference) like you would interpret r = .05 in any other context—really small, hard to detect, and probably not practically different from zero.
However you slice the meta-analytic data, it is hard to find a sex difference in the appeal of attractiveness in paradigms where participants have met partners face-to-face. (p refers to the p value of the sex difference test statistic Qsex.) From here.

Interestingly, the sex difference in attractiveness is much larger when you ask men and women to use a rating scale to indicate how much they think they like physical attractiveness in a partner. The size of this “stated preference” sex difference is about r = .25 (see Table 1 in this paper). [1]

In other words, an r = .25 effect when people make judgments about what they think they like drops to r = .05 when people are responding to partners who they have actually met in real life. 

I find this “effect size drop” deeply fascinating. It opens two interesting questions that have guided much of my research:

1. If men and women truly differ in the extent to which they believe attractiveness to be important in a partner, what factors interfere with the application of these ideals when they evaluate partners in real life?

2. If there is essentially no difference between men and women in how much they actually prefer attractiveness in a real life partner, what sorts of social-cognitive biases might produce the sex difference in how much people think they prefer attractiveness in a partner?

I have spent considerable time and effort in the last decade examining these two questions in my research. We’ve found some answers, and yet there’s still a long way to go in this topic area.

All effect sizes are coded so positive values mean that attractiveness receives higher ratings/is a larger predictor for men than for women. I am prepared to update the table after I examine the new McNulty et al. (in press) data according to my preregistered analysis plan.

But back to my belief that I am putting on the line in this blog post: I believe that the sex difference is about r = .05 (or smaller) when people evaluate real-life partners. I feel pretty confident about this belief, given all the evidence I have seen. But there are other scholars who believe something entirely different.


Since we published the meta-analysis, two empirical articles have taken a strong stance against our conclusion that the sex difference in the appeal of attractiveness is small or nonexistent. I discussed one of them (Li et al., 2013) in an earlier post; given the tiny effective sample size of that study, I won’t discuss it further here. Instead, let’s talk about the second one: Meltzer, McNulty, Jackson, & Karney (2014).

This paper found the expected sex difference in a sample of N = 458 married couples. In brief, they found that women’s attractiveness predicted men’s satisfaction at r = .10, whereas men’s attractiveness predicted women’s satisfaction at r = -.05. That’s an r(difference) of .15, which is still pretty small, but not zero (p = .046).
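To make the r(difference) arithmetic concrete, here is a minimal Python sketch. It is only an illustration, not the analysis from either paper: the function name `r_difference` is hypothetical, and it assumes the difference is taken on the Fisher z scale and then converted back to the r metric. It ignores the covariates and the fact that husbands and wives in a couple are not independent, so it says nothing about the p value.

```python
import math

def r_difference(r_men: float, r_women: float) -> float:
    """Difference between two correlations, taken on the Fisher z scale
    and converted back to the r metric (an assumed convention)."""
    z_diff = math.atanh(r_men) - math.atanh(r_women)
    return math.tanh(z_diff)

# Hypothetical illustration from earlier in the post: r = .08 (men) vs. r = .03 (women)
print(round(r_difference(0.08, 0.03), 2))

# Values described for Meltzer et al. (2014): r = .10 (men) vs. r = -.05 (women)
print(round(r_difference(0.10, -0.05), 2))
```

With correlations this small, the result is nearly identical to simply subtracting one r from the other.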

One unusual element of this paper is that the authors only present this sex difference in one analysis, and it included a large number of covariates. Twenty-eight of them, to be exact. Another element worth noting is that there were actually two ways that the sex difference could have emerged—on the intercept of satisfaction or the slope of satisfaction. The effect that the authors focused on was the intercept; slope effects did not differ for men and women, r(difference) = .02.

Personally, I don’t believe that this analysis provides an accurate depiction of the sex difference. It’s hard for me to buy into the idea that you need twenty-eight covariates in this analysis, and even then, the sex difference only emerges in one place and not the other. In fact, we conducted an identical analysis on some of our own data that had the same variables, and we didn’t find a hint of the sex difference (if anything, the slope effect trended in the opposite direction).

Nevertheless, for the past five years, this debate has been distilled to “Team X says no sex difference, but Team Y says yes.” If someone wants to cite evidence for the absence of the sex difference, they have it; if someone wants to cite evidence for the presence of the sex difference, they can do that, too. This does not seem to be a good scientific recipe for getting closer to the truth.

I’m pretty confident in my belief that the sex difference here is tiny or nonexistent. But you know what? Maybe I’m wrong. If I want to call myself a scientist, I have to be open to that possibility. I have to be willing to say: Here are the data that would convince me to change my belief.

So here it is: I will update my belief if a preregistered test, using the same 28-covariate analysis in a new dataset, replicates the sex difference on the intercept found in Meltzer et al. (2014).

You may be thinking, it’s easy for me to say that, so long as no dataset of the kind exists. But in fact, just the other day, I saw this new published paper (McNulty, Meltzer, Makhanova, & Maner, in press). It primarily examines a different (and totally fascinating!) research question, and it uses a new sample of N = 233 couples. But buried in the descriptions of the covariates in that paper are all of the key variables and all but one of the covariates required to directly replicate the earlier sex difference analysis reported in Meltzer et al. (2014).

Here is what I am committing to, publicly, right now: I have written up a preregistered analysis plan that provides the test I outline above. I will email Jim McNulty for the data they used in this new published manuscript, which I am confident that he will share with me. I will run the preregistered analysis on these data, and I will describe the results as a “Part 2” of this blog post. If the key finding from Meltzer et al. (2014) replicates—that is, if the sex difference on the intercept is significant—then I need to seriously consider the possibility that I am wrong, and I need to update my beliefs accordingly. If it is not, I hope that those scholars who believe in this particular sex difference will be willing to update their beliefs and/or conduct a highly powered test of their prediction.  

Either way, we’ll be getting closer to the truth rather than being stuck in an endless circle around it.

[1] When people talk about the “robust literature” showing that attractiveness matters more to men than to women, they could be talking about one of two things. First, they could be talking about this stated preference sex difference. Second, they might be talking about findings showing that, in hypothetical settings (e.g., viewing photographs), attractiveness tends to matter more to men than to women. In fact, we preregistered a study examining this context and found the sex difference! As I described in this earlier post, the size of the sex difference that we found in a very highly powered design was r = .13. 


  1. This analysis seems like a promising opportunity to advance this debate and the underlying questions surrounding it. It also shows a real commitment to intellectual honesty and the pursuit of scientific truth.

    One note, though. Meltzer et al. and replications of it speak to evaluations of face-to-face partners in long-term relationships during young adulthood. If research consistently shows a sex difference in that context, it would certainly be at odds with your current beliefs and the conclusions of the Psychological Bulletin article. However, the r difference might still be .05 in unspecified and short-term relationships and among older populations. Li and Meltzer et al. (2014) both say that sex differences are valid, but emerge when young people evaluate long-term partners. That might also be significant because sex-related variation in photographic evaluations seems to span ages and contexts, and the same may be true of stated preferences.

    1. Hello Nadia, good to hear from you again! I agree - if the Meltzer et al. (2014) procedure reliably produces a sex difference (but meta-analytic approaches do not), then that last row of my table could perhaps be broken into subgroups of one type or another.

      For what it is worth, we did not have much luck finding evidence for the specific Li & Meltzer, 2015, moderator-combination prediction in one paper that I describe here:

  2. Hi: Interesting post. I look forward to part 2.

    On a small point: I would prefer that you specified and wrote about the sex difference in units other than untransformed r, because the value of the subtraction depends directly on the size of the original r's. A sex difference of r=.08 isn't much when comparing r=.04 and r=-.04, but it's a LOT when it's r=.90 vs r=.98.

    If you're transforming without telling us (which doesn't appear to be the case), please do tell us!

    1. Hi Chris - yes, you're absolutely right. The r(differences) I'm talking about often happen to round off to the same values after you Fisher-transform them, but that's mainly because the correlations in question tend to be pretty small.

      But to make sure I am as accurate as possible: In footnote 1, those photograph correlations were r = .41 for men, r = .28 for women. Technically, that's an r(difference) of .15 if you do the proper transformation (not .13, as I stated there).
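For anyone who wants to check these numbers, a minimal Python sketch follows. The helper names are hypothetical, and it assumes the transformed difference is reported back on the r scale. It simply contrasts plain subtraction with the Fisher-transformed difference for the pairs of correlations mentioned in this thread.

```python
import math

def raw_difference(r1: float, r2: float) -> float:
    """Plain subtraction of two correlations."""
    return r1 - r2

def fisher_difference(r1: float, r2: float) -> float:
    """Difference on the Fisher z scale, converted back to r
    (reporting on the r scale is an assumption of this sketch)."""
    return math.tanh(math.atanh(r1) - math.atanh(r2))

# Pairs taken from the comments above: tiny r's, the photograph study, and large r's
pairs = [(0.04, -0.04), (0.41, 0.28), (0.98, 0.90)]
for r1, r2 in pairs:
    raw = raw_difference(r1, r2)
    fisher = fisher_difference(r1, r2)
    print(f"{r1:.2f} vs {r2:.2f}: raw = {raw:.2f}, Fisher = {fisher:.2f}")
```

The two versions agree closely when both correlations are near zero and drift apart as the correlations get larger, which is the point Chris raises.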