Science is about shifting consensus (Grene, 1985; Longino,
1990). At a given point in time, scientists in a field might believe one
thing, and later, they believe something else. For this reason, persuasion is
the fuel that powers the scientific enterprise.
The conversation about best practices in our field over the
last decade is no different. It is a persuasion process: Scientists who believe
that direct replications or statistical power or preregistration will improve
the quality of our science attempt to convince scientists who do not hold this
belief to change their views. When confronted with strong evidence,
argumentation, and logic, skeptics should be willing to change their beliefs
(or else they aren’t really practicing science).
I have been persuaded about many things. Sometimes I was
persuaded when I simply learned more about a topic. Other times, I was
persuaded because I learned that my previous views were incorrect in some way.
In celebration of scientific persuasion, I thought I would
offer my own personal Top-5 power ranking. Relative to ten years ago, I have
been persuaded about the value of all of these practices. What follows is a list of the top 5 improvements in
research practices—ranked in the order that I have found them valuable for my
own research. [1]
==================
Improving My Own Research Practices: Top 5 Power Ranking
5. Use social
media for scientific conversation: It is remarkable that scholars of
all ranks can turn to social media to learn about research practices, share
their knowledge, and debate scientific issues. When I was in graduate school, debates
and critical discussions were largely confined to conferences and took place
once or twice a year. Now, these conversations happen multiple times a day,
with contributions from a diverse set of voices. In this way, social media has made
civil scientific critique and debate a normal, everyday activity.
Why not higher? I still think that editors serve
an extremely important role in curating scientific criticism and keeping the
debate focused on the substantive issues. For reasons I can’t quite fathom,
some journals are reluctant to give page space to debates about previously
published work, so naturally social media stepped in to fill this void. Nevertheless,
I would love to see journals play a larger role in post-publication peer review,
perhaps by offering something like the PNAS “letters” format.
4. Conduct
direct replications: I now
routinely build direct replications into my work. For example, if we want to
see whether an effect from Study 1 is moderated in Study 2, I might ensure that
the effect in the Study 2 control condition ALSO functions as a direct
replication of Study 1. I continue to conduct conceptual replications, of
course, but I have certainly shifted my emphasis over the past few years. I now
routinely assess the direct replicability of my findings before building on
them, especially when I’m doing something new, and I no longer assume that
other findings are directly replicable if they have only been demonstrated
once.
Why not higher? If we were to over-prioritize
direct replications, we would risk enshrining particular operationalizations
at the expense of the conceptual variables we really care about. For example, in my
home topic area, many findings in the literature on stated mate preferences for
traits are directly replicable, but they have ambiguous connections to the
conceptual variables of interest: It’s very easy to replicate the finding that
men and women say they want different things in a partner, but it’s not clear
the extent to which what people SAY they want maps onto what they ACTUALLY want
when interacting with real potential partners (see this
earlier post). We should not become so focused on direct replications that
we forget to care about what our variables are actually measuring.
3. Focus on
effect sizes (rather than significance): In graduate school, my programs
of research often lived and died by p
< .05. I am overjoyed that this trend is shifting; when I focus on effect
sizes and confidence intervals rather than an arbitrary black-and-white
decision rule, I learn much more from my data. This is especially true when
comparing across studies: We used to think “This study was significant but this
one was not…what happened?” When we focus on effect sizes, these comparisons
take place on a continuum and do not rely on arbitrary cut-offs, and our
attention shifts instead to the extent to which effect size estimates are
consistent across studies.
Why not higher? I am receptive to the argument
that, in many experimental contexts, the effect size “doesn’t really matter” in
the sense that the manipulation is not intended for use in an applied context.
Nevertheless, even when I run experiments, I still find it extremely useful to
compare effect sizes across similar operationalizations, so that I can develop
a sense of how confident I should be in a set of results (more confident if the
effect sizes are similar across experiments using similar manipulations; less
confident if the effect sizes seem to be all over the place).
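To make that comparison concrete, here is a minimal sketch in Python of what I mean by comparing studies on a continuum of effect sizes rather than by their p-values. The summary statistics are made up, and the confidence intervals use a standard large-sample approximation; the point is only to show the shape of the comparison.

    import math

    def cohens_d_ci(mean1, mean2, sd1, sd2, n1, n2, z=1.96):
        # Cohen's d from group summary statistics, with an approximate 95% CI
        # based on the standard large-sample standard error formula.
        pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
        d = (mean1 - mean2) / pooled_sd
        se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
        return d, d - z * se, d + z * se

    # Hypothetical summary statistics for two studies with similar manipulations:
    # (mean_treatment, mean_control, sd_treatment, sd_control, n_treatment, n_control)
    studies = {"Study 1": (5.4, 4.9, 1.5, 1.6, 120, 120),
               "Study 2": (5.3, 5.0, 1.5, 1.5, 200, 200)}

    for label, summary in studies.items():
        d, lo, hi = cohens_d_ci(*summary)
        print(f"{label}: d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")

When the point estimates are similar and the intervals overlap substantially across studies with similar manipulations, that consistency is what now drives my confidence in a set of results.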
2. Promote and
participate in registered reports: As I noted in a prior
post, I am a big fan of registered reports. I love how they function to get
both reviewers and authors alike to agree that the results of a particular
study will be informative however they turn out. I now think that our studies
are generally stronger when we design them with this kind of informative
potential from the beginning. I have largely stopped conducting the “shoot the
moon” studies that are counterintuitive and cool if they “work” but wouldn’t
really change my mind if they don’t.
Why not higher? If registered reports became the
norm, what would happen to large pre-existing datasets that are not eligible?
Would people stop investing in large-scale efforts going forward? I hope that
we develop a registered report format that can make use of pre-existing data
(e.g., perhaps in combination with meta-analytic approaches).
1. Improve power:
My studies are more highly powered than they once were. And as a result, I
feel as though I have been going on fewer wild goose chases: If I see a medium
effect size with several hundred participants in Study 1, I would bet money
that I am going to see it again in my direct replication in Study 2. In cases
where I do make the decision to chase a small effect, that decision is now
conscious and careful (i.e., I will decide if it’s really worth it to invest
the resources to have adequate power to detect the effect if it is there), and
if I decide I do want to chase it with a highly powered study, I learn
something from my data no matter what happens.
Even though I ranked this #1, I still see a potential downside.
For example, I am still running labor-intensive designs (e.g., confederate
studies involving one participant at a time), but at these larger sample sizes
each study takes much longer, and so I am running fewer of them. I have
considered this tradeoff, though, and my
assessment is that I am better off running a few highly powered versions of
these studies than many underpowered ones.
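For readers who want to see the arithmetic behind these decisions, here is a minimal sketch of an a priori power calculation in Python using the statsmodels library. The effect sizes are purely illustrative, not taken from my own studies.

    from statsmodels.stats.power import TTestIndPower

    power_analysis = TTestIndPower()

    # Per-group sample size needed to detect a medium effect (d = 0.5)
    # with 80% power at alpha = .05 in a two-group design.
    n_medium = power_analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)

    # The same question for a small effect (d = 0.2): the "is this chase
    # really worth the resources?" calculation.
    n_small = power_analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.80)

    print(f"Medium effect (d = 0.5): ~{n_medium:.0f} participants per group")
    print(f"Small effect (d = 0.2): ~{n_small:.0f} participants per group")

The jump from roughly 64 to roughly 394 participants per group is exactly the kind of resource question I now weigh before deciding whether a small effect is worth chasing.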
Will this be my top 5 power ranking forever? Probably not. [2] I look
forward to future research practice improvements, and to having my mind changed
yet again.
Grene, M. (1985). Perception, interpretation, and the sciences: Toward a new philosophy of science. In Evolution at a crossroads: The new biology and the new philosophy of science.
Longino, H. E. (1990). Science as
social knowledge: Values and objectivity in scientific inquiry. Princeton
University Press.
[1] Note that this is not a Top-5 list of the developments that convinced me that the field as it existed circa 2010 “had a problem” or “was in crisis.” I have been persuaded on that front, too, but that would be a different list.
[2] If you’re curious, here are four honorable mentions that did not quite make the top 5 for me, in no particular order: preregistered analysis plans, transparency in reporting methods, selection methods for assessing publication bias, and open data.