Wednesday, May 16, 2018

Improvements in Research Practices: A Personal Power Ranking

Science is about shifting consensus (Grene, 1985; Longino, 1990). At a given point in time, scientists in a field might believe one thing, and later, they believe something else. For this reason, persuasion is the fuel that powers the scientific enterprise.  

The conversation about best practices in our field over the last decade is no different. It is a persuasion process: Scientists who believe that direct replications or statistical power or preregistration will improve the quality of our science attempt to convince scientists who do not hold this belief to change their views. When confronted with strong evidence, argumentation, and logic, skeptics should be willing to change their beliefs (or else they aren’t really practicing science).

I have been persuaded about many things. Sometimes I was persuaded when I simply learned more about a topic. Other times, I was persuaded because I learned that my previous views were incorrect in some way.

In celebration of scientific persuasion, I thought I would offer my own personal Top-5 power ranking. Relative to ten years ago, I have been persuaded about the value of all of these practices. What follows is a list of the top 5 improvements in research practices, ranked in the order that I have found them valuable for my own research. [1]


Improving my own Research Practices: Top 5 Power Ranking

5. Use social media for scientific conversation: It is remarkable that scholars of all ranks can turn to social media to learn about research practices, share their knowledge, and debate scientific issues. When I was in graduate school, debates and critical discussions were largely confined to conferences and took place once or twice a year. Now, these conversations happen multiple times a day, with contributions from a diverse set of voices. In this way, social media has made civil scientific critique and debate a normal, everyday activity.

Why not higher? I still think that editors serve an extremely important role in curating scientific criticism and keeping the debate focused on the substantive issues. For reasons I can’t quite fathom, some journals are reluctant to give page space to debates about previously published work, so naturally social media stepped in to fill this void. Nevertheless, I would love to see journals play a larger role in post-publication peer review, perhaps by offering something like the PNAS “letters” format.

4. Conduct direct replications: I now routinely build direct replications into my work. For example, if we want to see whether an effect of Study 1 is moderated in Study 2, I might ensure that the effect in the Study 2 control condition ALSO functions as a direct replication of Study 1. I continue to conduct conceptual replications, of course, but I have certainly shifted my emphasis over the past few years. I now routinely assess the direct replicability of my findings before building on them, especially when I’m doing something new, and I no longer assume that other findings are directly replicable if they have only been demonstrated once.

Why not higher? If we were to over-prioritize direct replications, we could be at risk of enshrining particular operationalizations in lieu of the conceptual variables we really care about. For example, in my home topic area, many findings in the literature on stated mate preferences for traits are directly replicable, but they have ambiguous connections to the conceptual variables of interest: It’s very easy to replicate the finding that men and women say they want different things in a partner, but it’s not clear to what extent what people SAY they want maps onto what they ACTUALLY want when interacting with real potential partners (see this earlier post). We should not become so focused on direct replications that we forget to care about what our variables are actually measuring.

3. Focus on effect sizes (rather than significance): In graduate school, my programs of research often lived and died by p < .05. I am overjoyed that this trend is shifting; when I focus on effect sizes and confidence intervals rather than an arbitrary black-and-white decision rule, I learn much more from my data. This is especially true when comparing across studies: We used to think “This study was significant but this one was not…what happened?” When we focus on effect sizes, these comparisons take place on a continuum and do not rely on arbitrary cut-offs, and our attention shifts instead to the extent to which effect size estimates are consistent across studies.

Why not higher? I am receptive to the argument that, in many experimental contexts, the effect size “doesn’t really matter” in the sense that the manipulation is not intended for use in an applied context. Nevertheless, even when I run experiments, I still find it extremely useful to compare effect sizes across similar operationalizations, so that I can develop a sense of how confident I should be in a set of results (more confident if the effect sizes are similar across experiments using similar manipulations; less confident if the effect sizes seem to be all over the place).
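To make the "compare effect sizes across studies" idea concrete, here is a minimal sketch in Python. It assumes a simple two-group between-subjects design, uses Cohen's d as the effect size, and relies on a common large-sample approximation for the confidence interval. The function names and the toy scores are hypothetical and purely illustrative; they are not drawn from any of my studies.

```python
import math
from statistics import NormalDist, mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def d_confidence_interval(d, n_a, n_b, level=0.95):
    """Approximate CI for d (large-sample normal approximation; an assumption)."""
    se = math.sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se


# Hypothetical toy data for two studies using a similar manipulation.
study_1 = ([5.1, 6.0, 5.5, 6.2, 5.8, 6.4], [4.9, 5.2, 5.0, 5.6, 5.3, 5.1])
study_2 = ([5.4, 5.9, 6.1, 5.7, 6.3, 5.6], [5.0, 5.5, 5.2, 5.4, 5.8, 5.1])

for label, (treatment, control) in (("Study 1", study_1), ("Study 2", study_2)):
    d = cohens_d(treatment, control)
    low, high = d_confidence_interval(d, len(treatment), len(control))
    print(f"{label}: d = {d:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

With samples this small the intervals will be very wide, which is itself informative: the question becomes whether the two estimates are plausibly consistent with each other, not whether each one happened to cross p < .05.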

2. Promote and participate in registered reports: As I noted in a prior post, I am a big fan of registered reports. I love how they function to get both reviewers and authors alike to agree that the results of a particular study will be informative however they turn out. I now think that our studies are generally stronger when we design them with this kind of informative potential from the beginning. I have largely stopped conducting the “shoot the moon” studies that are counterintuitive and cool if they “work” but wouldn’t really change my mind if they don’t.

Why not higher? If registered reports became the norm, what would happen to large pre-existing datasets that are not eligible? Would people stop investing in large-scale efforts going forward? I hope that we develop a registered report format that can make use of pre-existing data (e.g., perhaps in combination with meta-analytic approaches).

1. Improve power: My studies are more highly powered than they once were. And as a result, I feel as though I have been going on fewer wild goose chases: If I see a medium effect size with several hundred participants in Study 1, I would bet money that I am going to see it again in my direct replication in Study 2. In cases where I do make the decision to chase a small effect, that decision is now conscious and careful (i.e., I will decide if it’s really worth it to invest the resources to have adequate power to detect the effect if it is there), and if I decide I do want to chase it with a highly powered study, I learn something from my data no matter what happens.

Even though I ranked this #1, I still see a potential downside. For example, I am still running labor-intensive designs (e.g., confederate studies involving one participant at a time), but they take much longer, and so I am running fewer of them. But I have considered this tradeoff, and my assessment is that I am better off running a few highly powered versions of these studies than many underpowered ones.
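For readers who want a feel for the numbers behind the "is it worth chasing a small effect?" decision, here is a rough sketch in Python. It assumes a two-group between-subjects comparison, a two-sided alpha of .05, and a normal approximation to the t-test, so the sample sizes come out slightly lower than an exact power analysis would give. The function names are hypothetical, and the d values of 0.5 ("medium") and 0.2 ("small") follow Cohen's conventions rather than anything specific to my own studies.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect effect size d."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


def approx_power(d, n, alpha=0.05):
    """Approximate power with n participants per group (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(d * math.sqrt(n / 2) - z_alpha)


for d in (0.5, 0.2):
    print(f"d = {d}: about {n_per_group(d)} participants per group for 80% power")

print(f"Power for d = 0.5 with 20 per group: {approx_power(0.5, 20):.2f}")
```

Under these assumptions, a medium effect needs roughly 63 participants per group for 80% power, while a small effect needs roughly 393 per group. That gap is the resource tradeoff I have in mind, and it is especially steep for designs that run one participant at a time.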

Will this be my top 5 power ranking forever? Probably not. [2] I look forward to future research practice improvements, and to having my mind changed yet again.

Grene, M. (1985). Perception, interpretation, and the sciences: toward a new philosophy of science. In Evolution at a crossroads: The new biology and the new philosophy of science.

Longino, H. E. (1990). Science as social knowledge: Values and objectivity in scientific inquiry. Princeton University Press.

[1] Note that this is not a Top-5 list of what developments convinced me that the field as it existed circa 2010 “had a problem” or “was in crisis.” I have been persuaded on that front, too, but that would be a different list.

[2] If you’re curious, here were four honorable mentions that did not quite make the top 5 for me, in no particular order: Preregistered analysis plans, transparency in reporting methods, selection methods for assessing publication bias, open data.