Monday, March 26, 2018

Testing the replicability of claims about a sex difference: A regrettable delay

A public commitment to update my own beliefs in response to a planned analysis I haven’t seen yet (Part 2)

In Part 1 of this series, I tried to make some headway in the debate over sex differences in the appeal of attractiveness in established relationships by putting my own beliefs on the line, pre-registering an analysis plan to see if a prior result would replicate, and publicly committing to update my beliefs regardless of how the results turned out. Unfortunately, this test will have to wait.

Although I assumed that it would be easy to obtain the data from a just-published manuscript, I was incorrect: Dr. McNulty has informed me that there will be a “regrettable delay” of unknown duration in sharing the data underlying the published manuscript until his team finishes working on and successfully publishes a second manuscript analyzing the same columns of data. Once the second manuscript is successfully published, he will be happy to share the data associated with the first manuscript, but he has no guess about how long that might take. Our full email exchange is included below, with Dr. McNulty’s permission.

I think it is fair to say that he and I are reading the APA ethical principle on data-sharing differently. In light of the field’s growing appreciation of the importance of openly and transparently sharing the data used in published manuscripts, I wonder if the language in the APA principle needs to be clarified or updated to reflect current standards in the field. (Indeed, to me, the most surprising element of our whole exchange was Dr. McNulty noting that one of his colleagues had advised him against ever sharing the data associated with his published manuscript. Clearly, scholars have very different views about whether and when the data behind published papers should be shared with other researchers, and it seems crucial that our societies and journals provide clear guidance to authors going forward.)

In light of the indefinite and regrettable delay, any claims that this particular sex difference is robust seem premature. I have posted below the results of the Meltzer et al. (2014) 28-covariate analysis, as well as the Eastwick et al. (2014) unsuccessful replication attempt, so that readers can get a sense of the existing evidence for this sex difference. I have also left a blank space for the eventual inclusion of a direct replication from the new McNulty et al. (2018 online publication) dataset. I will fill it in once the data from those N = 233 couples are shared with me and I can conduct the preregistered analyses. 

I’ll close with an exhortation to other scholars: Future tests of this idea should examine it in a confirmatory way (i.e., with a detailed analysis plan that is written ahead of time, before seeing the data). My post did not end the debate, but I do hope that this approach will set a standard that helps researchers come together to address this question with strong methods going forward. 

Results of the 28-covariate analysis proposed by Meltzer et al. (2014) and the one direct replication to date (Eastwick et al., 2014). Meltzer et al. (2014) concluded that the association of coder-rated attractiveness with relationship satisfaction is stronger for men than for women (see first Intercept test). I will update the figure when the data for McNulty et al. (2018 online publication) are made available.
Bars indicate 95% CIs. Y axis is effect size q (interpretable like r).

My preferred approach to testing this sex difference is a random-effects meta-analysis examining the effect of coder-rated attractiveness on relationship evaluations (e.g., satisfaction) in established (i.e., dating and/or married) relationships. That meta-analytic effect (k = 11, N = 2,976), which includes both the Meltzer et al. (2014) and Eastwick et al. (2014) data analyzed above, is shown here:

Bar indicates 95% CI. Y axis is effect size q (interpretable like r).
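For readers curious about the mechanics, here is a minimal sketch of a random-effects meta-analysis using the DerSimonian-Laird estimator of between-study variance. It is not the code or the data behind the figure above. The function name, the four effect sizes, and the variance formula 2 / (n - 3) (which treats men and women as independent samples and so ignores the dependence within couples) are all illustrative assumptions.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate via the DerSimonian-Laird tau^2 estimator.

    effects: per-study effect sizes (e.g., q for the sex difference)
    variances: per-study sampling variances of those effect sizes
    Returns (pooled estimate, (lower, upper) 95% CI, tau^2).
    """
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q_stat = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q_stat - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


# Hypothetical inputs for illustration only (NOT the k = 11 studies in the figure).
qs = [0.15, -0.04, 0.02, 0.06]
vs = [2 / (n - 3) for n in (458, 200, 150, 300)]
est, ci, tau2 = dersimonian_laird(qs, vs)
print(f"pooled q = {est:.3f}, 95% CI [{ci[0]:.3f}, {ci[1]:.3f}], tau^2 = {tau2:.4f}")
```

The random-effects weights add tau^2 to every study's variance, so when studies disagree more than sampling error predicts, small and large studies end up weighted more equally than in a fixed-effect analysis.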

Emails reprinted here, with permission:

March 7, 2018

Hello Jim,

I hope you enjoyed SPSP this year – it was good to run into you briefly. I am writing to request the data associated with your new paper, which looks really interesting:

In addition to the covariates in Table 2 and income (mentioned on p. 4), I would be very appreciative if you would also include extraversion if you have it. But I also recognize that, technically speaking, you are under no obligation to share extraversion given that it wasn’t mentioned in the published article.

My intention is simply to conduct this preregistered analysis plan. If you are curious, I also have written a blog post about the relevant interpretive issues – if you and/or Andrea would like to comment on the second part (once I write it), I would be happy to include your response on the blog.




March 8, 2018

Hi, Paul.

I enjoyed SPSP and it was good to run into you. It was astute of you to realize we have some more data to address our debate. I would be happy to share them with you eventually, but one of Andrea’s doctoral students is currently working on a manuscript that addresses this exact effect. They have been working on it off and on for some time now, but, as is typical, other priorities keep interfering. I fear it could undermine her project to share these valuable data with you and the world right now. That said, I do appreciate complete transparency, as well as your attempts to shed more light on this issue, and I would be happy to share all the data with you once her project is complete. Does that sound okay? I wish I had a good guess as to when that would be, but for some reason I still haven’t figured out how to predict how reviewers will feel about a particular paper. Haha.



March 9, 2018

Hi Jim,

I totally understand wanting to make sure that your student will be able to publish his/her paper. And I realize that my email might not have been clear: I was only suggesting that I would report the results on the blog, not a journal article. You should of course be able to carve up the remaining dataset for journal articles as you see fit – I’m only requesting the data that were used in the in press publication (plus extraversion if you had it and were willing to share it -- but of course, I understand that you are under no obligation to do so since it’s not in the published article). I wouldn’t anticipate that a blog post on this particular analysis would interfere with your student’s ability to report and build off of it in a future article.




March 16, 2018

Hi Jim,

I just wanted to follow up with you on the message I sent last week requesting the data from your in press JPSP. I’m still excited to take a look, and I want to reiterate that my plan is only to share the results of the preregistered analyses on a blog (i.e., not a journal publication). In case it helps mitigate the concerns you articulated about wanting to publish analyses based on these data in a separate article, I had an idea: What if I only post the effect sizes and confidence intervals associated with the three sex difference tests that I preregistered (i.e., no other statistical information or detailed descriptives)?

I really hope that we can navigate these data sharing complexities ourselves in a friendly way – I am committed to making some progress on the sex difference question by conducting and reporting the analyses I preregistered on my blog however they turn out, and you of course should be able to publish additional analyses in the future off of these published data. I do think it’s important to keep in mind that the data I am requesting are now published, and that this means that ethically, they must be made available to “other competent professionals” (APA, 8.14, 2010). But I’d much rather do this in a friendly and informal way over email rather than going through the journal or APA or something.

If I don’t hear from you by next Friday (the 23rd), I’ll go ahead and update my blog to indicate that you declined to share the data, and we’ll go from there.




March 20, 2018


I understand that you do not plan to pursue publication of the data you requested. And I believe you are probably correct that a blog will not interfere with a future publication. However, I must admit that the blogosphere is extremely foreign to me and I perceive that it seems to have some traction. I also have no idea what the future holds. I see no reason to risk even an unlikely negative outcome for one of our students. I’m not sure I was clear in my original email, but the student is not simply working with these data; she is working on a manuscript describing the sex difference in the association between partner attractiveness and marital satisfaction—the precise effect in question. I have received advice from two colleagues who are unattached to this debate and they tell me not to share the data yet (one says don’t share it at all).

Regarding any ethical obligation to share the data with you, my read of the APA ethics statement on this issue is that I am only obligated to share with “other competent professionals” who intend to replicate the result in question. APA Ethical Principles specify that "after research results are published, psychologists do not withhold the data on which their conclusions are based from other competent professionals who seek to verify the substantive claims through reanalysis and who intend to use such data only for that purpose, provided that the confidentiality of the participants can be protected and unless legal rights concerning proprietary data preclude their release" (Standard 8.14). You left out of your email the critical qualifier that I bolded above. It is quite clear from your email, and from the fact that you preregistered a completely unrelated analysis of my covariates, that you have no intentions to verify our substantive claims but instead want to capitalize on our covariates to address your own research goals.

To be honest with you, Paul, what is frustrating to me about your latest email that threatens to post on your blog that I declined your request and potentially take up this issue with APA is that I did not decline your request. As I said in my original email, I will give you the data after the student working on this exact effect is finished, even though I do not believe I am obligated to do so, because I too am committed to science and understanding this sex difference. If you post anything on your blog about this other than the fact that there will be a regrettable delay in getting the data from us, please also post this entire string of emails so people can decide for themselves if I am being unethical.



March 22, 2018

Hi Jim,

Thanks for your reply. It seems like we have different interpretations of the APA data-sharing principle (at least as it applies in this case). I thought it was self-evident that my proposed analysis was addressing a “substantive claim” of your published manuscript: You tested and reported a sex difference in the partner attractiveness-infidelity association, and concluded the following on pp. 15-16: “This latter sex difference is consistent with evidence that partner attractiveness is more important to men than it is to women (Li et al., 2013; McNulty, Neff, & Karney, 2008; Meltzer et al., 2014a, 2014b), and thereby challenges the idea that the importance of partner attractiveness is equivalent across men and women (see Eastwick & Finkel, 2008).” You had the opportunity to conduct the same analysis that you and your colleagues have argued is the best test of this sex difference (Meltzer et al., 2014a; this is the analysis I proposed in my blog post) to see if the Meltzer et al. (2014a) findings would replicate in this new dataset. Although you did not report this analysis, you claimed in the Discussion of your paper to have supported those findings anyway.

In my blog post, I proposed to reanalyze the data from your published paper in order to test the claim that “partner attractiveness is more important to men than it is to women” (p. 16). To me, it seems like the APA data-sharing principle (as well as the field’s current norms about the importance of openness and transparency) applies here. Nevertheless, I agree that multiple interpretations of the APA principle are possible and I appreciate your willingness to engage with me on this issue.

I’m disappointed that there will be a regrettable delay (as you note) in your sharing of these data. I’m also sad to hear that, in this day and age, your colleagues are advising you to delay or avoid sharing the data behind a published paper. I appreciate your willingness to allow me to post our email exchange, and I apologize if you worried that I would misrepresent you – that was definitely not my intention, and I agree with you that it is important to post the exchange for transparency’s sake.



PS: Despite all this, I really do think the new paper is cool. One of the questions it addresses had come up a few days beforehand in my grad class.

Wednesday, March 7, 2018

Going on the record via preregistration

A public commitment to update my own beliefs in response to a planned analysis I haven’t seen yet (Part 1)

Update, 3/26/18: Unfortunately, my request for the data behind this recently published JPSP paper (McNulty, Meltzer, Makhanova, and Maner, 2018 online publication) was unsuccessful. Dr. McNulty has informed me that there will be a “regrettable delay” of unknown duration in sharing these now published data until his team writes up and successfully publishes a second manuscript on these same data columns. Part 2 of this blog post is here, along with our email exchange about the data sharing question. 

In my previous post, I talked about how essential it is that we, as scientists, remain open to the possibility of having our intuitions disconfirmed.

Now let’s see if I can put my money where my mouth is.

If I take my own admonishment seriously, I need to be willing to have my own intuitions and beliefs disconfirmed—even when those beliefs have developed through years of researching a particular topic.

Here’s one of my own findings in which I have a high degree of confidence. In a meta-analysis my colleagues and I conducted about five years ago, we examined whether a partner’s attractiveness was more romantically appealing to men than to women. We acquired a large collection of published and unpublished datasets (k = 97, N = 29,780) that spanned a variety of paradigms in which men and women reported on partners they had (at a minimum) met face-to-face. Overall, we found that the sex difference in the appeal of attractiveness was not significantly different from zero, and it did not matter whether the study examined initial attraction (e.g., speed-dating, confederate designs) or established relationships (e.g., dating couples, married couples).

Here is a hypothetical illustration of this finding: If a man’s satisfaction in a given relationship is predicted by his female partner’s attractiveness at r = .08, we might find that a woman’s satisfaction is predicted by her male partner’s attractiveness at about r = .03. Meta-analytically, the sex difference is about this size: r(difference) = .05 or smaller. You can interpret this r(difference) like you would interpret r = .05 in any other context—really small, hard to detect, and probably not practically different from zero.
However you slice the meta-analytic data, it is hard to find a sex difference in the appeal of attractiveness in paradigms where participants have met partners face-to-face. (p refers to the p value of the sex difference test statistic Qsex.) From here.
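To make the arithmetic in the hypothetical r = .08 versus r = .03 illustration concrete, here is a small sketch. It computes Cohen's q, the difference between two Fisher z-transformed correlations (I am assuming this is the q plotted in the figures; for small correlations it is nearly identical to the raw r difference), and a rough sample size needed to detect it. The helper names are hypothetical, and the power formula assumes two independent, equal-sized samples tested two-tailed at alpha = .05 with 80% power. Couple data are dyadic, so treat the result as a ballpark only.

```python
import math
from statistics import NormalDist


def cohens_q(r1, r2):
    """Cohen's q: difference between two Fisher z-transformed correlations."""
    return math.atanh(r1) - math.atanh(r2)


def n_per_group(q, alpha=0.05, power=0.80):
    """Rough n per group needed to detect q with a two-tailed z test.

    Assumes two independent, equal-sized samples, so Var(q) = 2 / (n - 3).
    Real couple data are dyadic (non-independent), so this is only a ballpark.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / abs(q)) ** 2 + 3)


# Hypothetical illustration from the text: r = .08 for men, r = .03 for women.
q = cohens_q(0.08, 0.03)
print(f"q = {q:.4f}")  # very close to the raw r difference of .05
print(f"n per sex ~ {n_per_group(q):,}")
```

Under those assumptions, the required sample comes out at more than 6,000 men and 6,000 women, which is one way to see why an r(difference) of .05 is so hard to detect.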

Interestingly, the sex difference in the appeal of attractiveness is much larger when you ask men and women to use a rating scale to indicate how much they think they like physical attractiveness in a partner. The size of this “stated preference” sex difference is about r = .25 (see Table 1 in this paper). [1]

In other words, an r = .25 effect when people make judgments about what they think they like drops to r = .05 when people are responding to partners who they have actually met in real life. 

I find this “effect size drop” deeply fascinating. It opens two interesting questions that have guided much of my research:

1. If men and women truly differ in the extent to which they believe attractiveness to be important in a partner, what factors interfere with the application of these ideals when they evaluate partners in real life?

2. If there is essentially no difference between men and women in how much they actually prefer attractiveness in a real life partner, what sorts of social-cognitive biases might produce the sex difference in how much people think they prefer attractiveness in a partner?

I have spent considerable time and effort in the last decade examining these two questions in my research. We’ve found some answers, and yet there’s still a long way to go in this topic area.

All effect sizes are coded so positive values mean that attractiveness receives higher ratings/is a larger predictor for men than for women. I am prepared to update the table after I examine the new McNulty et al. (in press) data according to my preregistered analysis plan.
But back to my belief that I am putting on the line in this blog post: I believe that the sex difference is about r = .05 (or smaller) when people evaluate real-life partners. I feel pretty confident about this belief, given all the evidence I have seen. But there are other scholars who believe something entirely different.


Since we published the meta-analysis, two empirical articles have taken a strong stance against our conclusion that the sex difference in the appeal of attractiveness is small or nonexistent. I discussed one of them (Li et al., 2013) in an earlier post; given the tiny effective sample size of that study, I won’t discuss it further here. Instead, let’s talk about the second one: Meltzer, McNulty, Jackson, & Karney (2014).

This paper found the expected sex difference in a sample of N = 458 married couples. In brief, they found that women’s attractiveness predicted men’s satisfaction at r = .10, whereas men’s attractiveness predicted women’s satisfaction at r = -.05. That’s an r(difference) of .15: still pretty small, but not zero (p = .046).

One unusual element of this paper is that the authors present this sex difference in only one analysis, and that analysis included a large number of covariates. Twenty-eight of them, to be exact. Another element worth noting is that there were actually two ways that the sex difference could have emerged: on the intercept of satisfaction (roughly, its starting level) or on the slope of satisfaction (its change over time). The effect that the authors focused on was the intercept; slope effects did not differ for men and women, r(difference) = .02.

Personally, I don’t believe that this analysis provides an accurate depiction of the sex difference. It’s hard for me to buy into the idea that you need twenty-eight covariates in this analysis, and even then, the sex difference only emerges in one place and not the other. In fact, we conducted an identical analysis on some of our own data that had the same variables, and we didn’t find a hint of the sex difference (if anything, the slope effect trended in the opposite direction).

Nevertheless, for the past five years, this debate has been distilled to “Team X says no sex difference, but Team Y says yes.” If someone wants to cite evidence for the absence of the sex difference, they have it; if someone wants to cite evidence for the presence of the sex difference, they can do that, too. This does not seem to be a good scientific recipe for getting closer to the truth.

I’m pretty confident in my belief that the sex difference here is tiny or nonexistent. But you know what? Maybe I’m wrong. If I want to call myself a scientist, I have to be open to that possibility. I have to be willing to say: Here are the data that would convince me to change my belief.

So here it is: I will update my belief if a preregistered test, using the same 28-covariate analysis in a new dataset, replicates the sex difference on the intercept found in Meltzer et al. (2014).

You may be thinking: It’s easy for me to say that, as long as no dataset of this kind exists. But in fact, just the other day, I saw this new published paper (McNulty, Meltzer, Makhanova, & Maner, in press). It primarily examines a different (and totally fascinating!) research question, and it uses a new sample of N = 233 couples. But buried in the descriptions of the covariates in that paper are all of the key variables and all but one of the covariates required to directly replicate the earlier sex difference analysis reported in Meltzer et al. (2014).

Here is what I am committing to, publicly, right now: I have written up a preregistered analysis plan that provides the test I outline above. I will email Jim McNulty for the data they used in this new published manuscript, which I am confident that he will share with me. I will run the preregistered analysis on these data, and I will describe the results as a “Part 2” of this blog post. If the key finding from Meltzer et al. (2014) replicates—that is, if the sex difference on the intercept is significant—then I need to seriously consider the possibility that I am wrong, and I need to update my beliefs accordingly. If it is not, I hope that those scholars who believe in this particular sex difference will be willing to update their beliefs and/or conduct a highly powered test of their prediction.  
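The decision rule is simple enough to write down. Here is a hedged sketch, assuming the preregistered test yields a point estimate and standard error for the intercept sex difference, and assuming that "replicates" means a 95% CI that excludes zero in the original direction (a larger association for men than for women). The function name and the inputs are hypothetical; the preregistered plan itself specifies the actual analysis.

```python
from statistics import NormalDist


def replicates(estimate, se, alpha=0.05):
    """Hypothetical decision rule for the preregistered intercept test.

    Returns (replicated, (lower, upper)). 'Replicated' is taken here to mean the
    two-tailed (1 - alpha) CI excludes zero in the original direction, i.e., the
    association is larger for men than for women (positive estimate).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = estimate - z * se, estimate + z * se
    return lower > 0, (lower, upper)


# Illustrative numbers only; no real result is implied.
ok, (lo, hi) = replicates(0.15, 0.05)
print(f"95% CI [{lo:.3f}, {hi:.3f}] -> replicated: {ok}")
```

Writing the rule down this explicitly, before seeing the data, is what keeps either side from reinterpreting the result after the fact.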

Either way, we’ll be getting closer to the truth rather than being stuck in an endless circle around it.

[1] When people talk about the “robust literature” showing that attractiveness matters more to men than to women, they could be talking about one of two things. First, they could be talking about this stated preference sex difference. Second, they might be talking about findings showing that, in hypothetical settings (e.g., viewing photographs), attractiveness tends to matter more to men than to women. In fact, we preregistered a study examining this context and found the sex difference! As I described in this earlier post, the size of the sex difference that we found in a very highly powered design was r = .13.