Monday, May 27, 2019

The Big Thing Relationships Researchers Don’t Study

You might think that relationship researchers would investigate how relationships form. But we don’t—largely because relationship formation is surprisingly difficult to study.

When Eli Finkel and I conducted our speed-dating studies over a decade ago, we were hopeful that we would see our participants go on to form actual romantic relationships. That is, we thought we would be able to follow participants from their very first impressions of each other through the formation of a dating relationship.

About a third of our speed-daters went on to have something like a coffee date with someone they met at speed-dating. But in the weeks and months following the initial event, only about 5% of participants reported having a casual or serious dating relationship with one of their fellow speed-daters.

Was this low percentage something weird about speed-dating? Maybe Northwestern University undergrads have terrible social skills? Well, in 2008, Eli and I were part of a co-ed kickball league in Chicago, and ~150 twenty- and thirty-somethings from this league got together on a weekly basis to compete, eat, and imbibe a few alcoholic beverages.  We administered a survey to try to get a sense of how often people were forming relationships across this league.

The period of time between the moment two people meet and the formation of a committed relationship is empirically hazy.

Was relationship formation more common in this group? Yes… a whopping 7%. That is, 7% of single people who took part in the kickball league formed a relationship with someone else in the league over the course of a 2-3 month season.

In the years since, there haven’t been many more attempts to capture relationship formation as it happens; I could probably count these studies on one hand. The path from strangers to relationship partners is extremely hard to study. And, in my view, it remains one of the greatest untapped reservoirs of interesting psychological phenomena.


Close relationship scholars are pretty good at studying initial attraction between strangers, and we are really good at studying people who agree that they are currently romantic partners. But what about the time period between initial attraction and "real relationship" status? It’s more or less missing entirely from our literature. 

As Eli Finkel, Jeff Simpson, and I argue in this recent (open access) Psych Inquiry article, this gap in the literature is a big problem.  Why? Three reasons:

Ethan Hawke might have been able to pick up a stranger on a train, but most people form romantic relationships with acquaintances and friends, not strangers.
1. This period of time is not short. The average is about a year. There are exceptions, of course, but people typically form romantic relationships by drawing from their network of preferred-sex friends and acquaintances. Successfully chatting up a stranger a la Before Sunrise is not the norm. So we are missing out on about a year's worth of presumably important psychological processes. 

2. There are many studies that examine whether individual differences predict relationship outcomes. But very rarely do these studies get measures of individual differences that are uncontaminated by a current relationship (i.e., measured before the current relationship had the chance to shape them).  Sure, some studies recruit participants right as they start dating each other, but even these studies are not capturing the true beginning of the relationship. If you just started dating someone who has been your friend for the past year, she could have been boosting your self-esteem or exacerbating your attachment anxiety for that entire time. This means that even though we think we’re studying the effect of individual differences on relationship processes, we may actually be studying the effect of relationship processes on relationship processes.

3. The fields of evolutionary psychology and close relationships both inspire a lot of work on romantic relationships, yet remain surprisingly disconnected given this shared focus. The mystery period between initial attraction and acknowledged romantic relationship might be hindering integration across these two fields: Most evolutionary psychological studies resemble studies of initial attraction (e.g., participants evaluate a stranger depicted in a photograph), whereas close relationships studies often focus on existing relationships (e.g., participants report on a dating partner over time). Sometimes, scholars suggest that studies of initial attraction capture short-term mating whereas studies of established relationships capture long-term mating, but this suggestion imbues a methodological distinction with intense theoretical weight (i.e., are you capturing two theoretically distinct mating strategies or simply measuring two points along a normative arc?). By filling in the missing time period between initial attraction and relationship formation, we may be able to shed better light on the true distinctions between short-term and long-term relationships, since it very well might take weeks or months after an initial interaction before people figure out whether someone is friend material, hookup material, or bring-home-to-meet-grandma material.

Our article offers a meta-theoretical framework for thinking about time across the entirety of a romantic relationship, from the moment two people actually meet. You can also read several very thoughtful commentaries on our article from close relationships, sexuality, and evolutionary psychological scholars who are deeply committed to studying these issues as well (see here, here, here, and here, and see here for our reply).


Thursday, May 16, 2019

Testing the replicability of claims about a sex difference: A brief update

A public commitment to update my own beliefs in response to a planned analysis I haven’t seen yet (part 3)

Over the last year or so, several people have asked me if I had any updates about the data sharing quandary that I covered in part 1 and part 2 of this series. Now, I do.

To recap: As part of an effort to make sure I was willing to update my beliefs in response to empirical data (even if the data disconfirmed those beliefs), I preregistered an analysis plan in March of 2018 to test the replicability of a sex difference in the effect of attractiveness on marital satisfaction. The relevant data would have been the published data columns from a recent JPSP article  (which reported a different sex difference and only tangentially mentioned the one I would be testing in the Discussion). The owner of the data, Dr. McNulty, felt that my analysis was too close to a graduate student’s paper that was in the works and asked that I wait until the student’s paper was published, and so that’s what we did.

In February 2019, the graduate student’s paper was published online in PSPB. I wrote to the authors, and after some back and forth, they sent me the data on April 24, 2019.

Importantly, they sent me the data on the condition that I would only use the data to verify the exact analysis published in the PSPB article. Dr. McNulty explained that he wants to add additional data to the PSPB dataset before reporting the sex difference in the effect of attractiveness on marital satisfaction.

So, I am permitted to report that I have verified the published analysis that is reported in the PSPB (a three-way interaction that moderates the two-way interaction that I am interested in). But I am not permitted to report the underlying two-way interaction (i.e., the analyses that could assess the replicability of the sex difference tested in Meltzer, McNulty, Jackson, & Karney, 2014, and in Eastwick, Luchies, Finkel, & Hunt, 2014).*


This has all been a strange foray into the complexities of data sharing. These issues are thorny, people have strong opinions on both sides, and I don’t want to spend my time right now trying to push things further with PSPB or any other relevant governing body.

Nevertheless, I do want to reiterate that this whole episode wasn’t originally about data sharing. It was actually about preregistration and being willing to update a strongly held scientific belief in light of new data that could have gone either way. It was about the usefulness of declaring publicly what results would be persuasive, and what results would not be persuasive. It was about specifying an analysis plan as a means of improving one’s ability to differentiate signal from noise, and vice versa.

When it comes to increasing our understanding of sex differences in the appeal of attractiveness in marriages, a persuasive contribution would have been enabled by a preregistered analysis plan that constrained the many researcher degrees of freedom (e.g., stopping rules for data collection, planned covariates) that happen to characterize this research area. It’s a missed opportunity that these hard-to-collect data won’t be able to do that.

* In the 3-way interaction analysis in the PSPB, the covariates are not quite the same as what is reported in the original Meltzer et al. (2014) article. I would need to remove some covariates and shift around some others to reproduce the Meltzer et al. (2014) analysis; Dr. McNulty has asked me not to do this.

Tuesday, August 7, 2018

What is a Mate Preference?

If you study nonhuman animals, there is one answer to this question. But if you study humans, there are two.

Top: Interior decorating skills - a highly desirable attribute in male bowerbirds.
Bottom: One operationalization of the preference for "nest fanciness" in bowerbirds, r = .54 (Borgia, 1995). x-axis = # of decorations; y-axis = # of copulations.
Here is a male bowerbird. Like other members of his species, he is a natural interior decorator: He uses pieces of colorful plastic and glass to make his nest look as fancy as possible.

Now, imagine you wish to test the hypothesis that females have a preference for males who are able to construct fancier nests. You might examine the extent to which (a) the nest fanciness of several male bowerbirds predicts (b) the extent to which female bowerbirds want to mate with them. In fact, this association is fairly strong (see graph at right), suggesting that bowerbird females have a strong preference for the attribute “nest fanciness” in their mates.

Meanwhile, if you study humans rather than bowerbirds, you might want to test the hypothesis that women have a parallel mate preference for fanciness of abode. You could examine the extent to which (a) the apartment fanciness of several men predicts (b) the extent to which women want to date each of them.

I could be wrong, but I don’t think anyone has conducted this study with humans. And that might be because there is a second, completely different way of studying preferences for attributes when working with human participants: You just ask them. After all, asking is much easier: Humans (unlike bowerbirds) can simply rate the appeal of the attribute “fancy apartment” on a 1-9 scale. [1]

But here’s the problem: Asking humans about their mate preferences assesses their ideas about the attributes they like and dislike, rather than the extent to which an attribute actually drives their mate preferences in real life. In other words, these two mate preferences are not the same construct. Rather than providing a quick-and-easy measurement shortcut to the human analogue of bowerbird mate preferences, rating scales provide a tool for measuring a different—and perhaps uniquely human—type of mate preference.


As this new paper discusses, these two types of preferences are distinct enough that they deserve different names. As a nod to the animal behavior literature, we call the first one—the association between (a) the level of an attribute in each of a series of targets and (b) liking for each of those targets—a functional preference. [2] We call the second one a summarized preference because it reflects a person’s evaluative summary of the attribute as an overall concept. A functional preference for an attribute is the extent to which the attribute drives liking for a set of targets (e.g., the extent to which the intelligence of a potential partner drives the extent to which you like them). A summarized preference for an attribute is the extent to which a person likes the attribute itself, as a concept (e.g., your evaluation of the attribute “intelligence in a romantic partner”).

The paper linked above reviews how functional and summarized preferences fundamentally differ in a number of ways. For one, they have different evolutionary origins. Functional preferences should exist in any species that possesses an evaluative mechanism (e.g., these food sources are good, and these are bad). But a summarized preference requires an organism to be able to evaluate an attribute as an abstraction—an evolutionarily much more recent ability—and it seems plausible that only humans can do this. What’s more, functional and summarized preferences tend to be biased by different sources of information, and they may have different downstream consequences.

In the existing mate preferences literature, the summarized preference is the construct you will see most often. That’s okay if you intend to study people’s ideas about the attributes they like. But it’s not a shortcut to studying functional preferences: Depending on the context, the correlation between a summarized and functional preference for the same attribute ranges from r = ~.00 (if people are evaluating live interaction partners) to r = ~.20 (if people are evaluating photographs). In other words, although both constructs are interesting, they’re not the same thing.

Seven studies in the mating domain examining the correlation between summarized and functional preferences for the same attribute. Correlations approximate r = .20 when people rate partners depicted in photographs and r = .00 when people rate partners face-to-face.

Adapted from Ledgerwood, Eastwick, & Smith (in press, PSPR)

And if we want to understand mate preferences in humans, we have to stop conflating summarized and functional preferences; overlooking this distinction creates a construct validity nightmare. For example, many scholars take inspiration from the animal literature on mating to generate new predictions about human mating. But they then test these predictions with summarized preferences—preferences that have no conceptual parallel in nonhuman animals and would not have been subject to the same evolutionary pressures. If you are generating predictions about human mate preferences with evolutionary relevance, your construct—nine times out of ten—is the functional preference.

So if you work with humans, you can choose to study how strongly attributes predict evaluative outcomes (i.e., functional preferences), or you can study people’s ideas about the attributes they like (i.e., summarized preferences), or you can study both and the relationship between them. Just be mindful of and clear about which construct(s) you are studying. If you work with nonhuman animals, you are almost surely studying functional preferences...unless you have figured out how to get bowerbirds to fill out a pen-and-paper survey.

Borgia, G. (1995). Complex male display and female choice in the spotted bowerbird: Specialized functions for different bower decorations. Animal Behaviour, 49, 1291-1301.

Fletcher, G. J., Simpson, J. A., Thomas, G., & Giles, L. (1999). Ideals in intimate relationships. Journal of Personality and Social Psychology, 76, 72-89.

Ledgerwood, A., Eastwick, P. W., & Smith, L. K. (in press). Toward an integrative framework for studying human evaluation: Attitudes towards objects and attributes. Personality and Social Psychology Review.

Wood, D., & Brumbaugh, C. C. (2009). Using revealed mate preferences to evaluate market force and differential preference explanations for mate selection. Journal of Personality and Social Psychology, 96, 1226-1244.

[1] “Nice house or apartment” is, in fact, one item on a very popular mate preference scale (Fletcher et al., 1999) – a scale that I myself use all the time.

[2] Wood and Brumbaugh (2009) offer the most comprehensive prior treatment of this construct, which they called a “revealed preference.” We shied away from the “revealed preference” label primarily because behavioral economists use this term to refer to an observable behavior. We mean something far more specific (i.e., the association between an attribute and liking within a set of targets).

Monday, June 18, 2018

How Critical Are We?

One perennial issue in the best-practices discussion is whether our discipline is overly critical or not critical enough. When we evaluate other people’s research, should we be increasing our focus on the positive aspects or the negative aspects?

My current view is “yes, both,” and the moderator [1] is whether we are talking about criticism that takes place pre- or post-publication. Pre-publication, I think we need to dial up the positivity; post-publication, I think we need to dial up the criticism.


Pre-publication peer-review: We can afford to emphasize the positives

Lopsided criticism in peer-review
Here are two ways of envisioning the reviewer’s job. One way: The reviewer is the firewall that protects the world from weak manuscripts by pointing out all their flaws. A second way: The reviewer is a knowledgeable colleague who has been asked to offer input on ways to strengthen the manuscript.

As an editor, I find that—nine times out of ten—the latter approach ultimately makes for a better published literature. Here are two specific reviewer tactics that help tip the pre-publication peer review balance in a more constructive direction:

1. Reviews are especially helpful when they explain what a particular subfield can gain from the manuscript. Given that the manuscripts I handle are rarely (if ever) examining a topic that I myself study, I need reviewers to tell me about the value that the manuscript will have for them and their colleagues. Does the manuscript help to define or organize a problem? Will the findings be useful for other scholars when planning their own studies? Does the manuscript properly situate the findings in the literature they are trying to inform? It is extraordinarily informative when a reviewer says “Wow, my subfield really needs an article that does what this manuscript is trying to do.”

2. Reviews are especially helpful when they avoid getting hung up on small imperfections and inconsistencies that make a story less pretty and glossy. These small imperfections (e.g., a simple main effect was not significant on one of the five tests, different meta-analytic publication bias analyses reveal different conclusions) are very real parts of science, and all articles have some. Indeed, I would argue that the picture-perfect (but impossible) articles of the past emerged because authors and reviewers pushed each other to scrub away the imperfections. [2] As an editor, I prefer reviews that focus in depth on a few big picture concerns, if they exist (e.g., missing a large segment of the literature in a review, using the incorrect statistical test, drawing a conclusion that is not supported by the data). And if there are no big picture concerns, the reviews should say so.

Post-publication peer-review: We can afford to be more critical

I think we have a natural tendency to assume that findings enter an “official canon” when they are accepted for publication. Canon is for fiction, like Star Wars and Marvel. As scientists, we must fight the urge to canonize.

Fictional scientist Bruce Banner fights the urge to transform into the Hulk in two eponymous movies, but only the second one is canon and "counts."

Real scientists have to fight the urge to canonize articles just because they have crossed the threshold from unpublished to published.
After all, good science should spark debates. I study the psychology of mating and relationships because I think much of this literature is debatable and debate-worthy. I criticize others’ approaches; others criticize mine. And I have served as a reviewer of productive back-and-forth debates between other scholars. These experiences were sometimes stressful, but in my view, these criticisms all served to advance the science.

I find it bewildering that some journals and editors are reluctant to devote page space to debates and criticism of previously published work. I have heard people express the opinion that criticism belongs only in the review process; if an article survives this “due process,” it earns a shield against any further published criticism. This attitude has a perverse effect: It prevents debates from moving forward openly for all to evaluate and confines them to a closed review process. I would posit that blogs, Facebook, and Twitter have become popular means of scientific criticism and debate in part because journals do not commonly offer opportunities for the ongoing, post-publication peer-review that is an essential part of science.

I would love to see journals embrace post-publication criticism—especially the thoughtful and productive kind of criticism that could even merit publication on its own. Indeed, I would love to see all journals operate like Behavioral and Brain Sciences or PNAS, where post-publication criticism is encouraged (or even solicited) shortly after the initial release of an article. If we create additional avenues for post-publication peer-review, I think we will see a much needed shift in the balance of criticism in our field.

[1] Hidden?
[2] I have always liked this piece about how changes in our scientific practices require changes on the part of reviewers, too. 

Wednesday, May 16, 2018

Improvements in Research Practices: A Personal Power Ranking

Science is about shifting consensus (Grene, 1985; Longino, 1990). At a given point in time, scientists in a field might believe one thing, and later, they believe something else. For this reason, persuasion is the fuel that powers the scientific enterprise.  

The conversation about best practices in our field over the last decade is no different. It is a persuasion process: Scientists who believe that direct replications or statistical power or preregistration will improve the quality of our science attempt to convince scientists who do not hold this belief to change their views. When confronted with strong evidence, argumentation, and logic, skeptics should be willing to change their beliefs (or else they aren’t really practicing science).

I have been persuaded about many things. Sometimes I was persuaded when I simply learned more about a topic. Other times, I was persuaded because I learned that my previous views were incorrect in some way.

In celebration of scientific persuasion, I thought I would offer my own personal Top-5 power ranking. Relative to ten years ago, I have been persuaded about the value of all of these practices. What follows  is a list of the top 5 improvements in research practices—ranked in the order that I have found them valuable for my own research. [1]


Improving my own Research Practices: Top 5 Power Ranking

5. Use social media for scientific conversation: It is remarkable that scholars of all ranks can turn to social media to learn about research practices, share their knowledge, and debate scientific issues. When I was in graduate school, debates and critical discussions were largely confined to conferences and took place once or twice a year. Now, these conversations happen multiple times a day, with contributions from a diverse set of voices. In this way, social media has made civil scientific critique and debate a normal, everyday activity.

Why not higher? I still think that editors serve an extremely important role in curating scientific criticism and keeping the debate focused on the substantive issues. For reasons I can’t quite fathom, some journals are reluctant to give page space to debates about previously published work, so naturally social media stepped in to fill this void. Nevertheless, I would love to see journals play a larger role in post-publication peer review, perhaps by offering something like the PNAS “letters” format.

4. Conduct direct replications: I now routinely build direct replications into my work. For example, if we want to see whether an effect of Study 1 is moderated in Study 2, I might ensure that the effect in the Study 2 control condition ALSO functions as a direct replication of Study 1. I continue to conduct conceptual replications, of course, but I have certainly shifted my emphasis over the past few years. I now routinely assess the direct replicability of my findings before building on them, especially when I’m doing something new, and I no longer assume that other findings are directly replicable if they have only been demonstrated once.

Why not higher? If we were to over-prioritize direct replications, we could be at risk for enshrining particular operationalizations in lieu of the conceptual variables we really care about. For example, in my home topic area, many findings in the literature on stated mate preferences for traits are directly replicable, but they have ambiguous connections to the conceptual variables of interest: It’s very easy to replicate the finding that men and women say they want different things in a partner, but it’s not clear the extent to which what people SAY they want maps onto what they ACTUALLY want when interacting with real potential partners (see this earlier post). We should not become so focused on direct replications that we forget to care about what our variables are actually measuring.

3. Focus on effect sizes (rather than significance): In graduate school, my programs of research often lived and died by p < .05. I am overjoyed that this trend is shifting; when I focus on effect sizes and confidence intervals rather than an arbitrary black-and-white decision rule, I learn much more from my data. This is especially true when comparing across studies: We used to think “This study was significant but this one was not…what happened?” When we focus on effect sizes, these comparisons take place on a continuum and do not rely on arbitrary cut-offs, and our attention shifts instead to the extent to which effect size estimates are consistent across studies.

Why not higher? I am receptive to the argument that, in many experimental contexts, the effect size “doesn’t really matter” in the sense that the manipulation is not intended for use in an applied context. Nevertheless, even when I run experiments, I still find it extremely useful to compare effect sizes across similar operationalizations, so that I can develop a sense of how confident I should be in a set of results (more confident if the effect sizes are similar across experiments using similar manipulations; less confident if the effect sizes seem to be all over the place).
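The shift from binary significance decisions to effect sizes with confidence intervals can be sketched in a few lines. This is a minimal illustration using the standard Fisher z transformation for a correlation; `r_confidence_interval` is a hypothetical helper name, and the two example studies are invented.

```python
from math import atanh, tanh, sqrt

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation via the Fisher z transformation."""
    z = atanh(r)                      # Fisher's z = artanh(r)
    se = 1 / sqrt(n - 3)              # standard error of z
    lo, hi = z - z_crit * se, z + z_crit * se
    return tanh(lo), tanh(hi)         # back-transform to the r metric

# Two hypothetical studies with the same point estimate carry very
# different information once the intervals are compared.
print(r_confidence_interval(0.30, 50))    # small study: wide interval
print(r_confidence_interval(0.30, 500))   # large study: narrow interval
```

Comparing intervals like these across studies keeps the conversation on a continuum, rather than on which side of p = .05 each study happened to land.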

2. Promote and participate in registered reports: As I noted in a prior post, I am a big fan of registered reports. I love how they function to get both reviewers and authors alike to agree that the results of a particular study will be informative however they turn out. I now think that our studies are generally stronger when we design them with this kind of informative potential from the beginning. I have largely stopped conducting the “shoot the moon” studies that are counterintuitive and cool if they “work” but wouldn’t really change my mind if they don’t.

Why not higher? If registered reports became the norm, what would happen to large pre-existing datasets that are not eligible? Would people stop investing in large-scale efforts going forward? I hope that we develop a registered report format that can make use of pre-existing data (e.g., perhaps in combination with meta-analytic approaches).

1. Improve power: My studies are more highly powered than they once were. And as a result, I feel as though I have been going on fewer wild goose chases: If I see a medium effect size with several hundred participants in Study 1, I would bet money that I am going to see it again in my direct replication in Study 2. In cases where I do make the decision to chase a small effect, that decision is now conscious and careful (i.e., I will decide if it’s really worth it to invest the resources to have adequate power to detect the effect if it is there), and if I decide I do want to chase it with a highly powered study, I learn something from my data no matter what happens.

Even though I ranked this #1, I still see a potential downside. For example, I am still running labor-intensive designs (e.g., confederate studies involving one participant at a time), but they take much longer, and so I am running fewer of them. But I have considered this tradeoff, and my assessment is that I am better off running a few highly powered versions of these studies than many underpowered ones.
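The cost of chasing a small effect can be made concrete with a back-of-the-envelope calculation. This sketch uses the Fisher z normal approximation for detecting a correlation; `n_for_correlation` is a hypothetical helper, not a substitute for proper power-analysis software.

```python
from math import atanh, ceil
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a correlation of size r
    (two-tailed test), via the Fisher z normal approximation."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_beta = nd.inv_cdf(power)            # quantile for desired power
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)

# Chasing a small effect takes far more participants than a medium one:
print(n_for_correlation(0.30))  # -> 85
print(n_for_correlation(0.10))  # -> 783
```

Seeing the required N jump by nearly an order of magnitude is exactly the kind of information that makes the decision to chase a small effect conscious and careful.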

Will this be my top 5 power ranking forever?  Probably not. [2] I look forward to future research practice improvements, and to having my mind changed yet again.

Grene, M. (1985). Perception, interpretation, and the sciences: toward a new philosophy of science. In Evolution at a crossroads: The new biology and the new philosophy of science.

Longino, H. E. (1990). Science as social knowledge: Values and objectivity in scientific inquiry. Princeton University Press.

[1] Note that this is not a Top-5 list of what developments convinced me that the field as it existed circa 2010 “had a problem” or “was in crisis.” I have been persuaded on that front, too, but that would be a different list.

[2] If you’re curious, here were four honorable mentions that did not quite make the top 5 for me, in no particular order: Preregistered analysis plans, transparency in reporting methods, selection methods for assessing publication bias, open data.

Monday, March 26, 2018

Testing the replicability of claims about a sex difference: A regrettable delay

A public commitment to update my own beliefs in response to a planned analysis I haven’t seen yet (Part 2)

In Part 1 of this series, I tried to make some headway in the debate over sex differences in the appeal of attractiveness in established relationships by putting my own beliefs on the line, pre-registering an analysis plan to see if a prior result would replicate, and publicly committing to update my beliefs regardless of how the results turned out. Unfortunately, this test will have to wait.

Although I assumed that it would be easy to obtain the data from a just-published manuscript, I was incorrect: Dr. McNulty has informed me that there will be a “regrettable delay” of unknown duration in sharing the data underlying the published manuscript until his team finishes working on and successfully publishes a second manuscript analyzing the same columns of data. Once the second manuscript is successfully published, he will be happy to share the data associated with the first manuscript, but he has no guess about how long that might take. Our full email exchange is included below, with Dr. McNulty’s permission.

I think it is fair to say that he and I are reading the APA ethical principle on data-sharing differently. In light of the field’s growing appreciation of the importance of openly and transparently sharing the data that is used in published manuscripts, I wonder if the language in the APA principle needs to be clarified or updated to reflect current standards in the field. (Indeed, the most surprising element to me of our whole exchange was Dr. McNulty noting that one of his colleagues had advised him against ever sharing the data associated with his published manuscript. Clearly, scholars have very different views about whether and when the data behind published papers should be shared with other researchers, and it seems crucial that our societies and journals provide clear guidance to authors going forward.)

In light of the indefinite and regrettable delay, any claims that this particular sex difference is robust seem premature. I have posted below the results of the Meltzer et al. (2014) 28-covariate analysis, as well as the Eastwick et al. (2014) unsuccessful replication attempt, so that readers can get a sense of the existing evidence for this sex difference. I have also left a blank space for the eventual inclusion of a direct replication from the new McNulty et al. (2018 online publication) dataset. I will fill it in once the data from those N = 233 couples are shared with me and I can conduct the preregistered analyses. 

I’ll close with an exhortation to other scholars: Future tests of this idea should examine it in a confirmatory way (i.e., with a detailed analysis plan that is written ahead of time, before seeing the data). My post did not end the debate, but I do hope that this approach will set a standard that helps researchers come together to address this question with strong methods going forward. 

Results of the 28-covariate analysis proposed by Meltzer et al. (2014) and the one direct replication to date (Eastwick et al., 2014). Meltzer et al. (2014) concluded that the association of coder-rated attractiveness with relationship satisfaction is stronger for men than for women (see first Intercept test). I will update the figure when the data for McNulty et al. (2018 online publication) are made available.
Bars indicate 95% CIs. Y axis is effect size q (interpretable like r).

My preferred approach to testing this sex difference is as follows: a random effects meta-analysis examining the effect of coder-rated attractiveness on relationship evaluations (e.g., satisfaction) in established (i.e., dating and/or married) relationships. That meta-analytic effect (k = 11, N = 2,976), which includes both the Meltzer et al. (2014) and Eastwick et al. (2014) data analyzed above, is shown here:

Bar indicates 95% CI. Y axis is effect size q (interpretable like r).
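For readers curious about the mechanics of the pooling step: a random-effects meta-analysis of correlations typically works on the Fisher-z scale (which is also the scale of effect size q), estimates the between-study variance tau², and then inverse-variance-weights the studies. This is a minimal sketch of the standard DerSimonian-Laird procedure, not the actual analysis code behind the figure, and the study-level values in the usage line are made up for illustration:

```python
import math

def random_effects_meta(effects, ns):
    """DerSimonian-Laird random-effects pooling of Fisher-z effect sizes.

    effects: per-study effect sizes on the Fisher-z (q) scale
    ns: per-study sample sizes; the variance of a Fisher z is approx 1/(n - 3)
    Returns the pooled estimate and a 95% confidence interval.
    """
    k = len(effects)
    v = [1.0 / (n - 3) for n in ns]                 # within-study variances
    w = [1.0 / vi for vi in v]                      # fixed-effect weights
    mean_fe = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Cochran's Q heterogeneity statistic and the DL estimate of tau^2
    Q = sum(wi * (e - mean_fe) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)
    # Random-effects weights incorporate tau^2, then pool as usual
    w_re = [1.0 / (vi + tau2) for vi in v]
    mean_re = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return mean_re, (mean_re - 1.96 * se, mean_re + 1.96 * se)

# Illustrative (made-up) study-level q values and sample sizes:
est, ci = random_effects_meta([0.15, -0.02, 0.05, 0.08], [458, 311, 120, 207])
```

When the studies are homogeneous, tau² shrinks to zero and the estimate reduces to the familiar fixed-effect inverse-variance average.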

Emails reprinted here, with permission:

March 7, 2018

Hello Jim,

I hope you enjoyed SPSP this year – it was good to run into you briefly. I am writing to request the data associated with your new paper, which looks really interesting:

In addition to the covariates in Table 2 and income (mentioned on p. 4), I would be very appreciative if you would also include extraversion if you have it. But I also recognize that, technically speaking, you are under no obligation to share extraversion given that it wasn’t mentioned in the published article.

My intention is simply to conduct this preregistered analysis plan. If you are curious, I also have written a blog post about the relevant interpretive issues – if you and/or Andrea would like to comment on the second part (once I write it), I would be happy to include your response on the blog.




March 8, 2018

Hi, Paul.

I enjoyed SPSP and it was good to run into you. It was astute of you to realize we have some more data to address our debate. I would be happy to share them with you eventually, but one of Andrea’s doctoral students is currently working on a manuscript that addresses this exact effect. They have been working on it off and on for some time now, but, as is typical, other priorities keep interfering. I fear it could undermine her project to share these valuable data with you and the world right now. That said, I do appreciate complete transparency, as well as your attempts to shed more light on this issue, and I would be happy to share all the data with you once her project is complete. Does that sound okay? I wish I had a good guess as to when that would be, but for some reason I still haven’t figured out how to predict how reviewers will feel about a particular paper. Haha.



March 9, 2018

Hi Jim,

I totally understand wanting to make sure that your student will be able to publish his/her paper. And I realize that my email might not have been clear: I was only suggesting that I would report the results on the blog, not a journal article. You should of course be able to carve up the remaining dataset for journal articles as you see fit – I’m only requesting the data that were used in the in press publication (plus extraversion if you had it and were willing to share it -- but of course, I understand that you are under no obligation to do so since it’s not in the published article). I wouldn’t anticipate that a blog post on this particular analysis would interfere with your student’s ability to report and build off of it in a future article.




March 16, 2018

Hi Jim,

I just wanted to follow up with you on the message I sent last week requesting the data from your in press JPSP. I’m still excited to take a look, and I want to reiterate that my plan is only to share the results of the preregistered analyses on a blog (i.e., not a journal publication). In case it helps mitigate the concerns you articulated about wanting to publish analyses based on these data in a separate article, I had an idea: What if I only post the effect sizes and confidence intervals associated with the three sex difference tests that I preregistered (i.e., no other statistical information or detailed descriptives)?

I really hope that we can navigate these data sharing complexities ourselves in a friendly way – I am committed to making some progress on the sex difference question by conducting and reporting the analyses I preregistered on my blog however they turn out, and you of course should be able to publish additional analyses in the future off of these published data. I do think it’s important to keep in mind that the data I am requesting are now published, and that this means that ethically, they must be made available to “other competent professionals” (APA, 8.14, 2010). But I’d much rather do this in a friendly and informal way over email rather than going through the journal or APA or something.

If I don’t hear from you by next Friday (the 23rd), I’ll go ahead and update my blog to indicate that you declined to share the data, and we’ll go from there.




March 20, 2018


I understand that you do not plan to pursue publication of the data you requested. And I believe you are probably correct that a blog will not interfere with a future publication. However, I must admit that the blogosphere is extremely foreign to me and I perceive that it seems to have some traction. I also have no idea what the future holds. I see no reason to risk even an unlikely negative outcome for one of our students. I’m not sure I was clear in my original email, but the student is not simply working with these data; she is working on a manuscript describing the sex difference in the association between partner attractiveness and marital satisfaction—the precise effect in question. I have received advice from two colleagues who are unattached to this debate and they tell me not to share the data yet (one says don’t share it at all).

Regarding any ethical obligation to share the data with you, my read of the APA ethics statement on this issue is that I am only obligated to share with “other competent professionals” who intend to replicate the result in question. APA Ethical Principles specify that "after research results are published, psychologists do not withhold the data on which their conclusions are based from other competent professionals who seek to verify the substantive claims through reanalysis and who intend to use such data only for that purpose, provided that the confidentiality of the participants can be protected and unless legal rights concerning proprietary data preclude their release" (Standard 8.14). You left out of your email the critical qualifier that I bolded above. It is quite clear from your email, and from the fact that you preregistered a completely unrelated analysis of my covariates, that you have no intentions to verify our substantive claims but instead want to capitalize on our covariates to address your own research goals.

To be honest with you, Paul, what is frustrating to me about your latest email that threatens to post on your blog that I declined your request and potentially take up this issue with APA is that I did not decline your request. As I said in my original email, I will give you the data after the student working on this exact effect is finished, even though I do not believe I am obligated to do so, because I too am committed to science and understanding this sex difference. If you post anything on your blog about this other than the fact that there will be a regrettable delay in getting the data from us, please also post this entire string of emails so people can decide for themselves if I am being unethical.



March 22, 2018

Hi Jim,

Thanks for your reply. It seems like we have different interpretations of the APA data-sharing principle (at least as it applies in this case). I thought it was self-evident that my proposed analysis was addressing a “substantive claim” of your published manuscript: You tested and reported a sex difference in the partner attractiveness-infidelity association, and concluded the following on pp. 15-16: “This latter sex difference is consistent with evidence that partner attractiveness is more important to men than it is to women (Li et al., 2013; McNulty, Neff, & Karney, 2008; Meltzer et al., 2014a, 2014b), and thereby challenges the idea that the importance of partner attractiveness is equivalent across men and women (see Eastwick & Finkel, 2008).” You had the opportunity to conduct the same analysis that you and your colleagues have argued is the best test of this sex difference (Meltzer et al., 2014a; this is the analysis I proposed in my blog post) to see if the Meltzer et al. (2014a) findings would replicate in this new dataset. Although you did not report this analysis, you claimed in the Discussion of your paper to have supported those findings anyway.

In my blog post, I proposed to reanalyze the data from your published paper in order to test the claim that “partner attractiveness is more important to men than it is to women” (p. 16). To me, it seems like the APA data-sharing principle (as well as the field’s current norms about the importance of openness and transparency) applies here. Nevertheless, I agree that multiple interpretations of the APA principle are possible and I appreciate your willingness to engage with me on this issue.

I’m disappointed that there will be a regrettable delay (as you note) in your sharing of these data. I’m also sad to hear that, in this day and age, your colleagues are advising you to delay or avoid sharing the data behind a published paper. I appreciate your willingness to allow me to post our email exchange, and I apologize if you worried that I would misrepresent you – that was definitely not my intention, and I agree with you that it is important to post the exchange for transparency’s sake.



PS: Despite all this, I really do think the new paper is cool. One of the questions it addresses had come up a few days beforehand in my grad class.

Wednesday, March 7, 2018

Going on the record via preregistration

A public commitment to update my own beliefs in response to a planned analysis I haven’t seen yet (Part 1)

Update, 3/26/18: Unfortunately, my request for the data behind this recently published JPSP paper (McNulty, Meltzer, Makhanova, and Maner, 2018 online publication) was unsuccessful. Dr. McNulty has informed me that there will be a “regrettable delay” of unknown duration in sharing these now published data until his team writes up and successfully publishes a second manuscript on these same data columns. Part 2 of this blog post is here, along with our email exchange about the data sharing question. 

In my previous post, I talked about how essential it is that we, as scientists, remain open to the possibility of having our intuitions disconfirmed.

Now let’s see if I can put my money where my mouth is.

If I take my own admonishment seriously, I need to be willing to have my own intuitions and beliefs disconfirmed—even when those beliefs have developed through years of researching a particular topic.

Here’s one of my own findings in which I have a high degree of confidence. In a meta-analysis I conducted about five years ago, we examined whether a partner’s attractiveness was more romantically appealing to men than to women. We acquired a large collection of published and unpublished datasets (k = 97, N = 29,780) that spanned a variety of paradigms in which men and women reported on partners they had (at a minimum) met face-to-face. Overall, we found that the sex difference in the appeal of attractiveness was not significantly different from zero, and it did not matter whether the study examined initial attraction (e.g., speed-dating, confederate designs) or established relationships (e.g., dating couples, married couples).

Here is a hypothetical illustration of this finding: If a man’s satisfaction in a given relationship is predicted by his female partner’s attractiveness at r = .08, we might find that a woman’s satisfaction is predicted by her male partner’s attractiveness at about r = .03. Meta-analytically, the sex difference is about this size: r(difference) = .05 or smaller. You can interpret this r(difference) like you would interpret r = .05 in any other context—really small, hard to detect, and probably not practically different from zero.
However you slice the meta-analytic data, it is hard to find a sex difference in the appeal of attractiveness in paradigms where participants have met partners face-to-face. (p refers to the p value of the sex difference test statistic Qsex.) From here.
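The difference between two correlations, like the hypothetical r = .08 versus r = .03 above, is conventionally expressed as Cohen's q: the difference between the Fisher-z-transformed correlations. A minimal sketch (the r values are the hypothetical ones from the text, not real estimates):

```python
import math

def cohens_q(r1: float, r2: float) -> float:
    """Cohen's q: difference between Fisher z-transformed correlations."""
    return math.atanh(r1) - math.atanh(r2)

# Hypothetical values from the text: men r = .08, women r = .03
q = cohens_q(0.08, 0.03)
print(round(q, 3))  # prints 0.05
```

For correlations this small, the Fisher transformation is nearly the identity, which is why q can be read "like r" and why the meta-analytic r(difference) of about .05 and the q of about .05 are effectively interchangeable.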

Interestingly, the sex difference in the appeal of attractiveness is much larger when you ask men and women to use a rating scale to indicate how much they think they like physical attractiveness in a partner. The size of this “stated preference” sex difference is about r = .25 (see Table 1 in this paper). [1]

In other words, an r = .25 effect when people make judgments about what they think they like drops to r = .05 when people are responding to partners who they have actually met in real life. 

I find this “effect size drop” deeply fascinating. It opens two interesting questions that have guided much of my research:

1. If men and women truly differ in the extent to which they believe attractiveness to be important in a partner, what factors interfere with the application of these ideals when they evaluate partners in real life?

2. If there is essentially no difference between men and women in how much they actually prefer attractiveness in a real life partner, what sorts of social-cognitive biases might produce the sex difference in how much people think they prefer attractiveness in a partner?

I have spent considerable time and effort in the last decade examining these two questions in my research. We’ve found some answers, and yet there’s still a long way to go in this topic area.

All effect sizes are coded so positive values mean that attractiveness receives higher ratings/is a larger predictor for men than for women. I am prepared to update the table after I examine the new McNulty et al. (in press) data according to my preregistered analysis plan.
But back to the belief I am putting on the line in this blog post: I believe that the sex difference is about r = .05 (or smaller) when people evaluate real-life partners. I feel pretty confident about this belief, given all the evidence I have seen. But there are other scholars who believe something entirely different.


Since we published the meta-analysis, two empirical articles have taken a strong stance against our conclusion that the sex difference in the appeal of attractiveness is small or nonexistent. I discussed one of them (Li et al., 2013) in an earlier post; given the tiny effective sample size of that study, I won’t discuss it further here. Instead, let’s talk about the second one: Meltzer, McNulty, Jackson, & Karney (2014).

This paper found the expected sex difference in a sample of N = 458 married couples. In brief, they found that women’s attractiveness predicted men’s satisfaction at r = .10, whereas men’s attractiveness predicted women’s satisfaction at r = -.05. That’s an r(difference) of .15—still pretty small, but not zero (p = .046).

One unusual element of this paper is that the authors only present this sex difference in one analysis, and it included a large number of covariates. Twenty-eight of them, to be exact. Another element worth noting is that there were actually two ways that the sex difference could have emerged—on the intercept of satisfaction or the slope of satisfaction. The effect that the authors focused on was the intercept; slope effects did not differ for men and women, r(difference) = .02.

Personally, I don’t believe that this analysis provides an accurate depiction of the sex difference. It’s hard for me to buy into the idea that you need twenty-eight covariates in this analysis, and even then, the sex difference only emerges in one place and not the other. In fact, we conducted an identical analysis on some of our own data that had the same variables, and we didn’t find a hint of the sex difference (if anything, the slope effect trended in the opposite direction).

Nevertheless, for the past five years, this debate has been distilled to “Team X says no sex difference, but Team Y says yes.” If someone wants to cite evidence for the absence of the sex difference, they have it; if someone wants to cite evidence for the presence of the sex difference, they can do that, too. This does not seem to be a good scientific recipe for getting closer to the truth.

I’m pretty confident in my belief that the sex difference here is tiny or nonexistent. But you know what? Maybe I’m wrong. If I want to call myself a scientist, I have to be open to that possibility. I have to be willing to say: Here are the data that would convince me to change my belief.

So here it is: I will update my belief if a preregistered test, using the same 28-covariate analysis in a new dataset, replicates the sex difference on the intercept found in Meltzer et al. (2014).

You may be thinking, it’s easy for me to say that, so long as no dataset of the kind exists. But in fact, just the other day, I saw this new published paper (McNulty, Meltzer, Makhanova, & Maner, in press). It primarily examines a different (and totally fascinating!) research question, and it uses a new sample of N = 233 couples. But buried in the descriptions of the covariates in that paper are all of the key variables and all but one of the covariates required to directly replicate the earlier sex difference analysis reported in Meltzer et al. (2014).

Here is what I am committing to, publicly, right now: I have written up a preregistered analysis plan that provides the test I outline above. I will email Jim McNulty for the data they used in this new published manuscript, which I am confident that he will share with me. I will run the preregistered analysis on these data, and I will describe the results as a “Part 2” of this blog post. If the key finding from Meltzer et al. (2014) replicates—that is, if the sex difference on the intercept is significant—then I need to seriously consider the possibility that I am wrong, and I need to update my beliefs accordingly. If it is not, I hope that those scholars who believe in this particular sex difference will be willing to update their beliefs and/or conduct a highly powered test of their prediction.  

Either way, we’ll be getting closer to the truth rather than being stuck in an endless circle around it.

[1] When people talk about the “robust literature” showing that attractiveness matters more to men than to women, they could be talking about one of two things. First, they could be talking about this stated preference sex difference. Second, they might be talking about findings showing that, in hypothetical settings (e.g., viewing photographs), attractiveness tends to matter more to men than to women. In fact, we preregistered a study examining this context and found the sex difference! As I described in this earlier post, the size of the sex difference that we found in a very highly powered design was r = .13.