Science is about shifting consensus (Grene, 1985; Longino,
1990). At a given point in time, scientists in a field might believe one
thing, and later, they believe something else. For this reason, persuasion is
the fuel that powers the scientific enterprise.
The conversation about best practices in our field over the
last decade is no different. It is a persuasion process: Scientists who believe
that direct replications or statistical power or preregistration will improve
the quality of our science attempt to convince scientists who do not hold this
belief to change their views. When confronted with strong evidence,
argumentation, and logic, skeptics should be willing to change their beliefs
(or else they aren’t really practicing science).
I have been persuaded about many things. Sometimes I was
persuaded when I simply learned more about a topic. Other times, I was
persuaded because I learned that my previous views were incorrect in some way.
In celebration of scientific persuasion, I thought I would
offer my own personal Top-5 power ranking. Relative to ten years ago, I have
been persuaded about the value of all of these practices. What follows is a list of the top 5 improvements in
research practices, ranked by how valuable I have found them for my
own research. [1]
==================
Improving My Own Research Practices: Top 5 Power Ranking

Why not higher? I still think that editors serve
an extremely important role in curating scientific criticism and keeping the
debate focused on the substantive issues. For reasons I can’t quite fathom,
some journals are reluctant to give page space to debates about previously
published work, so naturally social media stepped in to fill this void. Nevertheless,
I would love to see journals play a larger role in post-publication peer review,
perhaps by offering something like the PNAS “letters” format.
4. Conduct
direct replications: I now
routinely build direct replications into my work. For example, if we want to
see whether an effect from Study 1 is moderated in Study 2, I might ensure that
the effect in the Study 2 control condition ALSO functions as a direct
replication of Study 1. I continue to conduct conceptual replications, of
course, but I have certainly shifted my emphasis over the past few years. I now
routinely assess the direct replicability of my findings before building on
them, especially when I’m doing something new, and I no longer assume that
other findings are directly replicable if they have only been demonstrated
once.
Why not higher? If we were to over-prioritize direct replications, we would risk enshrining particular operationalizations at the expense of the conceptual variables we really care about. For example, in my
home topic area, many findings in the literature on stated mate preferences for
traits are directly replicable, but they have ambiguous connections to the
conceptual variables of interest: It’s very easy to replicate the finding that
men and women say they want different things in a partner, but it’s not clear
the extent to which what people SAY they want maps onto what they ACTUALLY want
when interacting with real potential partners (see this
earlier post). We should not become so focused on direct replications that
we forget to care about what our variables are actually measuring.

Why not higher? I am receptive to the argument
that, in many experimental contexts, the effect size “doesn’t really matter” in
the sense that the manipulation is not intended for use in an applied context.
Nevertheless, even when I run experiments, I still find it extremely useful to
compare effect sizes across similar operationalizations, so that I can develop
a sense of how confident I should be in a set of results (more confident if the
effect sizes are similar across experiments using similar manipulations; less
confident if the effect sizes seem to be all over the place).
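To make that comparison concrete, here is a minimal Python sketch of one way to do this kind of effect-size bookkeeping: computing a standardized effect (Cohen's d) from summary statistics for several studies that used similar manipulations. The study summaries are made-up numbers purely for illustration, not results from any actual studies.

```python
# Minimal sketch: compare standardized effect sizes (Cohen's d) across
# experiments that used similar manipulations. All numbers are hypothetical.

def cohens_d(m_treat, m_ctrl, sd_treat, sd_ctrl, n_treat, n_ctrl):
    """Cohen's d from summary statistics, using the pooled standard deviation."""
    pooled_var = (((n_treat - 1) * sd_treat ** 2 + (n_ctrl - 1) * sd_ctrl ** 2)
                  / (n_treat + n_ctrl - 2))
    return (m_treat - m_ctrl) / pooled_var ** 0.5

# Hypothetical summaries from three studies with similar operationalizations
studies = {
    "Study 1": dict(m_treat=5.2, m_ctrl=4.8, sd_treat=1.1, sd_ctrl=1.0, n_treat=80,  n_ctrl=80),
    "Study 2": dict(m_treat=5.0, m_ctrl=4.7, sd_treat=1.2, sd_ctrl=1.1, n_treat=120, n_ctrl=120),
    "Study 3": dict(m_treat=5.3, m_ctrl=4.9, sd_treat=1.0, sd_ctrl=1.1, n_treat=95,  n_ctrl=95),
}

for name, s in studies.items():
    print(f"{name}: d = {cohens_d(**s):.2f}")
# Similar d values across similar manipulations -> more confidence in the set
# of results; d values that are all over the place -> less confidence.
```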
2. Promote and
participate in registered reports: As I noted in a prior
post, I am a big fan of registered reports. I love how they get reviewers and authors alike to agree that the results of a particular study will be informative however they turn out. I now think that our studies
are generally stronger when we design them with this kind of informative
potential from the beginning. I have largely stopped conducting the “shoot the
moon” studies that are counterintuitive and cool if they “work” but wouldn’t
really change my mind if they don’t.
Why not higher? If registered reports became the
norm, what would happen to large pre-existing datasets that are not eligible?
Would people stop investing in large-scale efforts going forward? I hope that
we develop a registered report format that can make use of pre-existing data
(e.g., perhaps in combination with meta-analytic approaches).
Even though I ranked this #1, I still see a potential downside.
For example, I am still running labor-intensive designs (e.g., confederate studies involving one participant at a time), but powering them adequately takes much longer, and so I am running fewer of them. But I have considered this tradeoff, and my
assessment is that I am better off running a few highly powered versions of
these studies than many underpowered ones.
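For readers who want a feel for the arithmetic behind that tradeoff, here is a minimal sketch using the power routines in statsmodels. The effect size (d = 0.3) and the 80% power target are illustrative assumptions, not numbers from any particular study.

```python
# Minimal sketch of the "few highly powered studies vs. many underpowered ones"
# tradeoff. The effect size and power target are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

power_analysis = TTestIndPower()

# Participants needed per condition to detect a smallish effect (d = 0.3)
# in a two-group design with 80% power at alpha = .05:
n_per_cell = power_analysis.solve_power(effect_size=0.3, alpha=0.05, power=0.80)
print(f"n per condition for 80% power: {n_per_cell:.0f}")  # roughly 175

# Power achieved by a more typical underpowered cell size (n = 40 per condition):
achieved = power_analysis.solve_power(effect_size=0.3, alpha=0.05, nobs1=40)
print(f"power with n = 40 per condition: {achieved:.2f}")  # roughly 0.26
```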
Will this be my top 5 power ranking forever? Probably not. [2] I look
forward to future research practice improvements, and to having my mind changed
yet again.
Grene, M. (1985). Perception, interpretation, and the sciences: Toward a new philosophy of science. In Evolution at a crossroads: The new biology and the new philosophy of science.
Longino, H. E. (1990). Science as social knowledge: Values and objectivity in scientific inquiry. Princeton University Press.
[1] Note that this is not a Top-5 list of the developments that convinced me that the field as it existed circa 2010 “had a problem” or “was in crisis.” I have been persuaded on that front, too, but that would be a different list.
[2] If you’re curious, here are four honorable mentions that did not quite make the top 5 for me, in no particular order: preregistered analysis plans, transparency in reporting methods, selection methods for assessing publication bias, and open data.