Thursday, May 16, 2019

Testing the replicability of claims about a sex difference: A brief update

A public commitment to update my own beliefs in response to a planned analysis I haven’t seen yet (part 3)

Over the last year or so, several people have asked me if I had any updates about the data sharing quandary that I covered in part 1 and part 2 of this series. Now, I do.

To recap: As part of an effort to make sure I was willing to update my beliefs in response to empirical data (even if the data disconfirmed those beliefs), I preregistered an analysis plan in March of 2018 to test the replicability of a sex difference in the effect of attractiveness on marital satisfaction. The relevant data would have been the published data columns from a recent JPSP article (which reported a different sex difference and only tangentially mentioned the one I would be testing in the Discussion). The owner of the data, Dr. McNulty, felt that my analysis was too close to a graduate student’s paper that was in the works and asked that I wait until the student’s paper was published, and so that’s what we did.

In February 2019, the graduate student’s paper was published online in PSPB. I wrote to the authors, and after some back and forth, they sent me the data on April 24, 2019.

Importantly, they sent me the data on the condition that I would only use the data to verify the exact analysis published in the PSPB article. Dr. McNulty explained that he wants to add additional data to the PSPB dataset before reporting the sex difference in the effect of attractiveness on marital satisfaction.

So, I am permitted to report that I have verified the published analysis that is reported in the PSPB (a three-way interaction that moderates the two-way interaction that I am interested in). But I am not permitted to report the underlying two-way interaction (i.e., the analyses that could assess the replicability of the sex difference tested in Meltzer, McNulty, Jackson, & Karney, 2014, and in Eastwick, Luchies, Finkel, & Hunt, 2014).*


This has all been a strange foray into the complexities of data sharing. These issues are thorny, people have strong opinions on both sides, and I don’t want to spend my time right now trying to push things further with PSPB or any other relevant governing body.

Nevertheless, I do want to reiterate that this whole episode wasn’t originally about data sharing. It was actually about preregistration and being willing to update a strongly held scientific belief in light of new data that could have gone either way. It was about the usefulness of declaring publicly what results would be persuasive, and what results would not be persuasive. It was about specifying an analysis plan as a means of improving one’s ability to differentiate signal from noise.

When it comes to increasing our understanding of sex differences in the appeal of attractiveness in marriages, a persuasive contribution would have been enabled by a preregistered analysis plan that constrained the many researcher degrees of freedom (e.g., stopping rules for data collection, planned covariates) that happen to characterize this research area. It’s a missed opportunity that these hard-to-collect data won’t be able to do that.

* In the 3-way interaction analysis in the PSPB, the covariates are not quite the same as what is reported in the original Meltzer et al. (2014) article. I would need to remove some covariates and shift around some others to reproduce the Meltzer et al. (2014) analysis; Dr. McNulty has asked me not to do this.


  1. Paul, you say in your post that you were allowed to look at maximization*partner attractiveness*sex. Did they let you look at the slope on that 3-way interaction, or only the intercept?

    In Part 2, you said, "Future tests of this idea should examine it in a confirmatory way (i.e., with a detailed analysis plan that is written ahead of time, before seeing the data)." It seems like that's not how things unfolded with this dataset. Yet, confirmatory studies with constrained researcher degrees of freedom would be very helpful, maybe even more so.

    You discussed some interpretive issues in post 1. The French and Meltzer paper would offer some answers as to why different paradigms show such different effects, if their results are reproducible. Schmitt predicted that individual variation would moderate sex differences (p. 669) in his commentary on your Psychological Bulletin article. That said, I am not entirely convinced by their results. PSPB was right to publish the paper, because it uses an intensive study design with longitudinal reports. Nonetheless, the key results were marginally significant, with no preregistered hypotheses. If their account (Meltzer, McNulty, and colleagues) is correct, then the effect size is relatively small, and testing it will take large samples of several hundred couples. Making reliable and rigorous judgments about moderators will require considerable statistical power. I don't think the French and Meltzer arguments are baseless, just that a bigger sample is needed to demonstrate that maximization is a moderator.

  2. Hi Nadia,

    I definitely think the individual difference moderation idea is intriguing, and I think Maximization could be a good candidate. That being said, you're right about the power issues. To put a fine point on it: If the sex difference is as large as they think it is (q = .15), detecting that (intercept) effect with 80% power requires N = 1400, which means a "knockout interaction" would require N = 5600. If it's as small as I think it is (q = .05), detecting that effect requires N = 12,500, and the interaction would be N = 50,000. So it can probably only be done convincingly by compiling across many different datasets (e.g., a meta-analysis).
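    For concreteness, these figures follow from the standard power calculation for detecting a difference q between two independent Fisher-z-transformed correlations (Cohen's q); the sketch below is my reconstruction of that arithmetic, not an analysis from the papers discussed. The 4x multiplier for the "knockout interaction" is the usual heuristic that an interaction produced by halving an effect requires four times the sample.

    ```python
    from math import ceil
    from statistics import NormalDist

    def total_n_for_q(q, alpha=0.05, power=0.80):
        """Approximate total N (two equal groups, e.g., men and women) needed
        to detect a difference q between two Fisher-z-transformed correlations
        with a two-tailed test.  Test statistic: (z1 - z2) / sqrt(2/(n - 3))."""
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed criterion
        z_pow = NormalDist().inv_cdf(power)           # quantile for desired power
        n_per_group = 2 * ((z_crit + z_pow) / q) ** 2 + 3
        return 2 * ceil(n_per_group)

    print(total_n_for_q(0.15))      # ~1,400: intercept effect if q = .15
    print(total_n_for_q(0.05))      # ~12,500: intercept effect if q = .05
    # A "knockout" interaction (effect present in one condition, absent in the
    # other) halves the detectable effect, quadrupling the required N:
    print(4 * total_n_for_q(0.15))  # ~5,600
    print(4 * total_n_for_q(0.05))  # ~50,000
    ```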

    And no to your first question: I did not have permission to look beyond the published analyses.