
Brian Nosek – Shifting incentives from getting it published to getting it right

February 14, 2020


I too would like to acknowledge the Traditional
Owners of the land on which we gather and meet, and to pay our respects to Elders past,
present, and emerging. Professor Brian Nosek, co-founder and director,
Center for Open Science, members of the university senior management group, UQ colleagues, distinguished
guests, ladies and gentlemen, I’m extremely pleased to be able to welcome a person of
Professor Nosek’s experience to UQ for today’s public seminar. Professor Nosek is, as many of you may know,
the co-founder and director of the Center for Open Science. The Center’s overriding
mission is to increase transparency, integrity, and reproducibility of our scholarly research.
Famously, in 2011, our guest and his collaborators embarked on an ambitious project. Brian and
269 other psychologists set about reproducing 100 psychological experiments. They published
their findings 4 years later in 2015 and demonstrated that only 36 of those 100 experiments showed
any statistically significant results. This compared with 97 of the original 100 experiments
that claimed significant results. Effectively, some 61 of those 100 studies were called into
question, but even more serious than that, some of the replicated experiments suggested
effects in the opposite direction to the original published research. Now there are many reasons that a particular
experimental result may not have been reproduced; however, it does give pause for consideration
of the biases that both researchers and journals have towards producing and reporting positive
findings. Brian is no doubt going to expand upon this particular point later. Humans highly value truth and trust, and these
are attributes the community demands of us as academics and scientists. We have a duty to
the end users of our research to maintain the confidence of both the public and the
research community in our peer-reviewed research. F-A-I-R, fair data is an acronym which stands
for findable, accessible, interoperable, and reusable. I’d like to take this opportunity
to highlight that some of the approaches UQ has taken recently and continues to undertake
are consistent with the FAIR agenda. UQ’s policy identifies that research data is a
valuable resource that can facilitate innovation and creation of new knowledge, but only if
there is appropriate management of the data. That means consideration of planning, storing,
analyzing, curating, describing, preserving, and sharing the data through its full lifecycle.
The more we abide by FAIR data principles, the greater chances of reproducibility, reliability,
and, most importantly, public confidence. Within UQ's strategic plan there is an important statement
that this institution aims to positively influence society through the creation and preservation
of knowledge. UQ is committed to the values of accountability and integrity to ensure
responsible stewardship of these university resources. The university abides by the Australian
Code for the Responsible Conduct of Research to provide facilities for the safe and secure
storage of research data and records. Researchers are required by
the same code to manage research data they collect in line with best practice. So, in effect, research data is recognized
by UQ as a strategic asset. It is a strategic asset which needs to be managed to maximize
value, not only for the researchers who produced it and for this university, but predominantly
for broader society. This university is committed to adopting and pioneering the high standards
of research excellence in data management. It is a matter of some pride that the European
Commission report Turning FAIR into Reality cites UQ’s research data management system
as an exemplar of great practice. The system is designed to facilitate sharing but also
safeguards against unwanted open publication of research datasets. This approach ensures
quality data sharing while recognizing that premature sharing may compromise the study itself, the
interests of the researcher, and/or the interests of the university. When the reproducibility project conducted
by Brian and his collaborators concluded, they made an important observation. They commented
and I quote, “Any temptation to interpret these results as a defeat for psychology,
or science more generally, must contend with the fact that this project demonstrates science
behaving as it should: in a self-correcting manner.” Science is about being constantly
skeptical of previous claims and always striving for improvement. The fact that a large number
of scientists volunteered their time for what was essentially a form of self-awareness and
self-correction is extremely heartening. This project reminds us not only not to be afraid
to admit our mistakes, but also that it’s okay to identify our possible shortcomings. There’s
something in that for all of us as individual researchers. And on that note, there is no better time
to welcome Brian to the stage. Please give him a warm round of applause. Thank you for that introduction. Thank you
to Jason and the team for coordinating my visit, and thanks for coming this evening.
I’m delighted to be able to present to you tonight. My laboratory’s research interest is in
the gap between values and practices: what we think we should do, what we’re trying to
do, what we want to do in our everyday behavior, versus what we actually do and what accounts
for the gap when sometimes we do things that are counter to our intentions or values or
goals. Most of the work that we have done in sort of trying to understand this in ordinary
human behavior is how we may have biases that occur outside of our awareness or control
that may shape our everyday behavior in ways that are counter to our intentions and how
the culture around us provides some constraints, some demands, some ways that might shape our
behavior that are different than what we might value or what we might hope for. So what I want to talk about tonight is a
practical application of that research interest, which is the gap between scientific values
and practices. I don’t know if there’s an echo here—we might try
to lower this microphone. And so that’s really what the Center for Open
Science is founded on—is trying to maximize the extent to which we can identify where
there are gaps between our values and our everyday practices, and how do we address
those gaps with different tools and resources to shift the culture towards one that embraces
greater openness and reproducibility. Let me start with talking about: what are
the values of science? A way that we can consider that first is through Robert Merton’s norms
of science. Robert Merton was a sociologist who identified what he thought are the core
ways in which science is unique or important as a way of knowing or understanding how the
world works. He identified what he considered the four
norms of science. One: communality, the open sharing of information.
When I make a scientific claim, it doesn’t gain credibility because I say, “Please
just trust me.” It gains credibility based on your ability to review the evidence that’s
the basis of that claim. What’s the methodology that I used to generate some data? What data
did get generated? How did I draw inference from that data to arrive at the claims that
I now say you should take seriously? As opposed to the counter norm: secrecy. “You just
got to trust me. Believe me, I figured this out. Just trust me.” A second norm is universalism. Research is
evaluated based on its own merit, right? The data itself, the process itself is the basis
for deciding whether it’s a credible claim or not. Versus the counter norm: “That’s
a famous person. All right, let’s trust it because they must know what they’re talking
about.” The third is disinterestedness. Researchers
are motivated by knowledge and discovery, just trying to figure it out. Versus the counter
norm of self-interestedness: “Really I’m here to get ahead of my colleagues
and try to advance my career more than they are able to advance theirs.” Fourth norm of organized skepticism: a researcher
considers all the new evidence, even evidence that’s against their prior work or claims, versus
organized dogmatism: “What I do is I get dissertation findings, and then I spend the
rest of my career combating all of the attackers to try to preserve the integrity of those
original claims.” While Merton didn’t talk about it, a number
of observers of science have talked about the norm of quality over the norm of quantity. So we might recognize this as a list, and
there’s other related ones that other philosophers of science have identified as ways in which
science might try to accumulate knowledge, but an obvious question is whether researchers
themselves endorse it. Do they think that we should behave according to these norms
of science, or do they endorse the counter norms? Anderson and her colleagues decided to ask,
so they did a survey of about 3,300 NIH awardees. In this case early career refers to researchers
that were in postdoctoral awards from NIH, and mid-career are researchers who had their
first R01 Award, so the average age in that sample was around 40. And they just asked: which one
do you endorse, the norm or the counter norm? What you’re seeing here is the cumulative
plot of proportion of people in gray who endorsed the norms over the counter norms on average.
About 90 percent of participants did so. In black are those that endorsed the counter
norms over the norms. They said, “We should treat it as a competition. I do want to keep
it all secret.” Very, very few people endorse the counter norms over the norms. Then in
the gray hatches are those that said, “Well, both of these I endorse.” About equal weight
for how science should operate. So then they said, “Great. Thank you. That’s
what you endorse. Don’t tell me what you endorse, what you think should happen. Tell me how
you do behave in your research?” And it looks like this. So about 60 percent of people
say, “I behave according to the norms of science,” but a much larger proportion of
people are acknowledging that those counter norms have some weight on their behavior,
and still very, very few are saying that they behave according to the counter norms over
the norms. So then they said, “Okay, that’s great.
That’s great. Thank you. Now don’t tell me what you do. Tell me what everybody else in
your research area, what do they do?” And it looks like this. So this is remarkable
because everyone themselves is endorsing the norms of science. We are individually largely
feeling like we behave according to those norms, but we do not perceive the rest of
us, the very same people who just answered the survey, to be behaving by those norms. This illustrates
a massive gap between what it is we value individually and collectively, and what we
perceive the culture of science to value and reward. Really, what we have are principles
that we aspire to and realities of how it is we advance in our career. As a consequence, we put individual researchers
in a very difficult position. How is it that I advance my career, and I have a choice to
make according to these perceptions, which is I could live in science according to my
values—open sharing, communicate everything, be skeptical of my own work—at the potential
cost of advancing my career; or I could say I need a career, so I’m going to do what it
takes. Those counter norms would advance me even at the cost of living according to my
values. And that isn’t a situation that is a healthy
situation for any culture. If our values are misaligned with what we’re actually incentivizing
and rewarding in practice, then we may interfere with the whole goal of science in the first
place: accelerating discovery, figuring out knowledge, solutions, and cures. So what is the core challenge? Well, we think
that there are lots of challenges, but it boils down to this as the central challenge,
which is the incentives for my success, advancing my career, are focused on me getting it published,
not on me getting it right. If I publish a lot and I publish in the most prestigious
outlets that I can, I am more likely to get the next reward, the next job, tenure in my
job, good career prospects, good grants, and then more of the rewards because of that. Of course, I want to get it right. I didn’t
get into science because I like writing papers and writing grants. I got into science because
I’m curious. I like figuring things out. I like the problems that I’m studying. I want
to contribute some knowledge to the world. Very conceptually aspirational goals. And
yet there are realities, concrete realities, of what it takes to be rewarded and advanced
in scholarship. And we know that not everything that we do
gets published, so I might work very hard and produce lots of evidence, but some kinds
of evidence are more publishable than other kinds of evidence. I’m more likely to get published by finding
a positive result—this treatment is effective on this outcome; these two things are related
to one another—than a negative result: nothing to see here. I am more likely to get published by finding
something novel that hasn’t been claimed previously, as compared to: here’s a replication of
something that somebody else has claimed. I’m adding some precision, but it’s the same
thing as before. I’m more likely to get published if I have
evidence that sort of makes for a neat and tidy story where it all fits together, and
the claim is clear, and the evidence supporting the claim is all consistent, as opposed to
things that have exceptions, things that don’t quite fit, uncertainties, an explanation that
doesn’t quite account for all of it. So a positive, novel, tidy story is the best
kind of story in science because it is the best kind of story. If you’re able to find
out something new, explain it clearly, and have all the evidence fit together, you have
made an amazing contribution to science and sent it in new directions. The problem of course is that reality doesn’t
look like that very often. When we’re in the lab doing research, there’s lots of exceptions,
lots of false starts, lots of things that don’t make sense, things that don’t fit together,
lots of negative findings. That’s because we’re studying things we don’t understand.
That’s why we study them. So the emergence of understanding, the emergence of that clean
and tidy story that’s well backed by evidence, is a slow emergence. It happens over many
research projects, over lots of time trying to figure out what the right narrative understanding
is for this thing that we are studying. But, of course, the incentive system doesn’t work
like that. It’s that every paper needs to be positive, novel, and tidy in order to maximize our rewards. With that comes a conflict of interest.
I need certain kinds of outcomes in order to advance my career, but to get those outcomes,
I might need to do things that maximize their publishability at the potential cost of the
credibility of those claims. And we know that because we have lots of discretion in how
it is we do our research, that there is lots of opportunity for me, whether I intend to
or not, to exploit that flexibility and improve publishability at the cost of
credibility. And that occurs at every stage of the research process, where we might exploit
that flexibility in order to improve our chances for publication. For example, once I have
data and I’ve been looking at it multiple ways, trying to make sense of that data, some
of those ways might make the finding more publishable than other ways of analyzing that
data. And it’s possible that, without recognizing it, I might rationalize that in fact the way that
looks better for publication is in fact the right way to analyze that data, as opposed
to the alternative way that’s less publishable. Likewise, we do lots of experiments in the
lab. We don’t publish all of them. What is the selection process to decide that this
one belongs in the paper and that one does not? Is there any selection bias where if
we get a negative result, we’re more likely to say, “Well, we screwed up the methodology
on that one. That doesn’t actually account for the thing that we were trying to study,
so we can put that one aside. And the one that found the positive result, well, that’s
the one that we should be including in the paper as the next figure.” All of those
potential decisions may maximize publishability, but because of the selective
reporting, the selective choices that we make, we may cost ourselves credibility. There’s lots of examples that we could describe.
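Before those examples, here is a minimal simulation sketch of that selective-reporting problem. This is my own illustration rather than anything from the talk, and the setup is hypothetical: a two-group study with five outcome measures, no true effect anywhere, and a "finding" reported whenever any single outcome crosses p < .05.

```python
# Hypothetical sketch: selective reporting across several "reasonable" analyses
# inflates the false-positive rate even when there is no true effect at all.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def one_flexible_study(n=30, n_outcomes=5):
    """Two groups, several outcome measures, no true effect; report a positive
    result if any single outcome reaches p < .05."""
    control = rng.normal(size=(n, n_outcomes))
    treatment = rng.normal(size=(n, n_outcomes))
    p_values = [stats.ttest_ind(treatment[:, j], control[:, j]).pvalue
                for j in range(n_outcomes)]
    return min(p_values) < 0.05

rate = np.mean([one_flexible_study() for _ in range(2000)])
print(f"Nominal alpha: 0.05; rate of 'positive' studies with flexible reporting: {rate:.2f}")
# With five independent outcomes this approaches 1 - 0.95**5, roughly 0.23.
```

The same arithmetic applies to choosing among covariate sets, exclusion rules, or which of several experiments "belongs" in the paper.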
I’m just going to give a couple of illustrations just to show how it is these might manifest
in practice in different ways. This is a project that we did where we recruited
29 different teams to study the same question, and the question in this case was: are players
with darker skin tone more likely to get red cards in soccer than players with lighter
skin tone? What you’re seeing in the circles are each team’s effect size estimate of what
they found in their research. The box around each is a 95 percent confidence interval, so if
that confidence interval overlaps with the dark line, we would consider that a null result:
no relationship between these variables could be detected using standard
null hypothesis significance testing. What you see is that about a third of the teams,
those in gray, got null results by that standard, whereas about two-thirds of the teams found
a positive result, a relationship between darker skin tone and a greater likelihood of getting a
red card in soccer. There’s substantial variability across the teams. Now we see variability like this in research.
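As a rough illustration of how such analytic variability can arise, here is a sketch with simulated data; it is not the actual red-card dataset and not any of the 29 teams' pipelines, and the variable names and effect sizes are invented. Two defensible specifications of the same logistic regression, fit to the same data, can return noticeably different effect estimates and confidence intervals.

```python
# Sketch with simulated data (not the actual red-card dataset): two defensible
# analytic choices applied to the same data estimate the same effect differently.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
skin_tone = rng.uniform(0, 1, n)      # hypothetical rating, 0 = lighter, 1 = darker
league = rng.integers(0, 4, n)        # hypothetical covariate
position = rng.integers(0, 3, n)      # hypothetical covariate
true_logit = -3 + 0.3 * skin_tone + 0.2 * league - 0.15 * position
red_card = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

# Analyst 1: unadjusted logistic regression of red cards on skin tone.
m1 = sm.Logit(red_card, sm.add_constant(skin_tone)).fit(disp=0)
# Analyst 2: the same question, but adjusting for league and position.
X2 = sm.add_constant(np.column_stack([skin_tone, league, position]))
m2 = sm.Logit(red_card, X2).fit(disp=0)

for label, model in [("unadjusted", m1), ("adjusted", m2)]:
    b = model.params[1]               # coefficient on skin tone
    lo, hi = model.conf_int()[1]      # its 95 percent confidence interval
    print(f"{label}: odds ratio {np.exp(b):.2f} [{np.exp(lo):.2f}, {np.exp(hi):.2f}]")
```

Depending on the draw, the point estimates and even the significance calls can differ, which is the kind of unseen contingency this project made visible.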
Lots of teams study problems in different ways, and we would typically consider the
meta-analytic result, the combination of these, to be the most reliable estimate of the true
effect to the extent there is a true effect. But the twist in this study was that they
all used the same data set. So we gave them the research question, then we handed them
the data to analyze that question, and this is the variation that they came up with of
testing the same question with the same data. But it’s more than that. This is after a round
of internal peer review. In the first phase, they figured out their analysis plan, got
their results. We removed the results and then shared the analysis strategies across
the teams for them to give each other feedback because there are complex analysis choices to
make. So they gave each other feedback. People could incorporate that feedback however they
wanted, and then they submitted their final analysis plan and outcomes, and this is the
variation, still after the chances to get some convergence in their results. The point for our purposes is that when we
read a paper, the easy way to read it is: that’s what the data show, but of course that
isn’t correct. That’s what the data show contingent on the choices that I made in that analytic
pipeline for how it is I drew inference from that data, and we don’t represent the variation
that may occur in how those choices may have impact on the outcomes that we observe. So
we don’t even know necessarily how much the findings we have are contingent
on the choices that we make, and that flexibility is unseen, so it’s possible we may unwittingly
exploit it, depending on what stakes we have and whether we want to observe
a particular outcome. Okay, that’s one example. The second example comes from work by Ernest O’Boyle
and his colleagues from the management literature. They were interested in trying to figure out
how is it that papers change over time, or findings change over time, over the life course
of the research project. This is a very hard thing to track because you can’t see the things
prior to the paper, but they came up with an ingenious solution to try to see how findings
might shift over time. What they did was they found 142 dissertations
in the management literature where they could get access to the original dissertation that
also had a publication of the same research, so the same project and two versions of it:
what was reported in the dissertation and what was reported in the final paper, and
they just compared them. What they compared was: what were the outcomes that were reported?
In the dissertations, 45 percent of the reported hypotheses were significant.
They found positive results. In the papers of the same research, 66 percent of the hypotheses
were significant results. So something changed in the same research from what was reported
in the dissertation to what appeared in the paper. They unpacked that further by looking first
at what tests were added to the paper that weren’t in the dissertation or were removed
from the dissertation that then didn’t appear in the paper. They found across all of the
projects that 333 hypotheses were added to the publication that hadn’t been in the dissertation,
and 70 percent of those were significant. Then they also observed that there were 1,333
hypotheses removed from the dissertation for publication, and 38 percent of those were
significant. There’s lots of reasons that things change
between dissertations and papers. Dissertations are usually too long—stuff has to come out—and
there are choices to make about how it is we tell what we found in the paper versus
what the origins of the work were. Also, editors and reviewers contribute to saying, “Well,
I’m not so interested in that. What about this? What about that?” So we don’t know
from this work how it is these changed. All we know is that there was some systematicity
in that change. Findings that were significant were more likely to be added. Findings that
were not significant were more likely to be removed. But they also looked at the same tests that
were in the dissertation and the publication, same thing. There were 272 unsupported hypotheses
in both, but 20 percent of them became significant in the publication when they weren’t significant
in the original research. On the flip side, there were 373 hypotheses that were supported
in the dissertation, and less than 5 percent of those became unsupported in the final publication.
So there’s also a bias toward effects becoming significant as they get into the publication
when they weren’t, because of some changes—we don’t know where they came from—in
the reporting process from dissertation to publication. The upshot of this is that the final paper
looks more significant, looks more positive, looks like they found more things than what
was actually in the data because of some selection process that brought out the positive results
and suppressed the negative results, potentially exaggerating the evidence of those findings. Okay, one more example, and this comes from
work by Bob Kaplan and Veronica Irvin who were interested in what’s the impact of getting
people to commit on what their actual primary outcomes are, on what they observe on those
outcomes? What they did was they looked at some large clinical trials from the National
Heart, Lung, and Blood Institute, one of the NIH institutes in the US, and they compared
studies where you had to pre-register in advance. Pre-registration became necessary by law for
these kinds of trials in 2000. So what happened with the trials that were done prior to 2000?
They didn’t have to say in advance what the primary outcome was. Compared to those that
were done after 2000, where you had to commit, before you’ve seen the outcomes: what
is our primary outcome? Their upshot finding is this: the positive result rate dropped
from 57 percent positive results of the studies prior to committing to what the primary outcome
was, to 8 percent after you had to commit in advance: which one is the primary outcome? We have to be cautious with this for a couple
of reasons. One is it’s a small-N study itself, so it needs to be replicated. They’re
conducting, as I understand it, a larger scale replication of this. Then the other is that
it’s not actually an experiment. People didn’t get randomly assigned to have to commit in
advance before or after. It was over time. It’s possible that after 2000, we ran out
of things to discover in this area of research, and so that’s why there are no more, almost
no more, positive results. We can’t rule that out as a possibility, but the other possibility
is that if you have to say in advance, make a commitment, “This is my primary outcome,”
then that leaves you no more flexibility. You don’t have that discretion anymore for when you’re
confronted with the data for deciding, “Well, actually, this one is our primary outcome,
not that one. This is really the important finding,” and then amplify that in the paper,
thus potentially exaggerating the likelihood that that’s actually there, taking advantage
of noise. Okay, so there are many other possibilities
that we could review for what are things that happen in practice that are challenges for
the credibility of the literature, but across all of this evidence, what we have observed
is that there are many opportunities for discretion; there are many opportunities where we may
not realize that we’re applying reasoning biases to how we treat the evidence that
we’re accruing, and when we try to reproduce the findings in the literature, they are harder
to reproduce than we might expect. If it is the case that we are exploiting this flexibility
and exaggerating the apparent quality of the findings, the apparent significance, the apparent
effect sizes of those findings in the literature, then when we try to replicate them, only a
small portion replicate, and the effect sizes drop by half on average in most
of these systematic studies. What can we do to improve that? How is it
that we can improve the credibility of the evidence if we’ve created a culture where
we’re valuing reporting of significant findings over reporting credible findings? Ultimately,
we think the change that’s needed is a simple change, or changes in behavior that are fundamentally
things that we know already, we’ve known for a long time in our schooling of how it is
we can improve the credibility of our findings. They are just these two things: show your
work and share. What we mean by show your work is that when
I make a scientific claim to you, I show you the process of how I got there, not just show
you the summary paper at the end, what David Donoho calls the advertising for the research,
but actually how it is I generated the data: what decision process and different
things I might have tried along the way; what other things did I do that didn’t make it
into the paper but are relevant for this research domain? If you can see that entire process,
then it’s easier for you to evaluate the credibility of the claims at the end. If I know that you’re
going to see that process and the choices that I make, then I have an occasion to take
some perspective. How would someone else think about the decisions that I make in here? Would
they see these as damaging the credibility of the claims, and if they would, then maybe
I ought to take a different strategy in how it is I pursue that evidence? Then by share, we mean make available the
data, the materials, the code, all of the things that were done in that research, so
that someone else can more easily replicate it, if they want to replicate it. They can
easily reproduce the findings if they want to reanalyze it, or they can scrutinize the
choices that were made and see how robust those findings are to other scenarios and
other choices, and then extend those findings in other applications that may not have been
tested yet. By making those commitments as clearly and
as transparently as possible, what we do, rather than telling people how they should
do their research and make their claims, is enable them to do the research the way that
they think is the most effective, and then allow what is the great promise of science
to actually occur, which is self-correction. We conceive of science as self-correcting.
We know that lots of our claims are not going to turn out to be true because we’re at the
boundaries of knowledge, pushing them as far out as we can. For that to work, the system,
when we are all debating these ideas in the marketplace, the system needs the information
of how is it that those claims were made. If we don’t have that information, if we only
have what I think is important to represent about my claims in the paper, then you can’t
effectively self-correct because you don’t have the information that you need in order
for that process to work. Transparency is facilitating that self-correction process
by making it possible for researchers to critique, to extend, to challenge whatever it is that
that marketplace of ideas will exchange. All of the other practices that we might talk
about or associate with open science are just mechanisms of methodology: how is it that
we might improve our research practices when we make them transparent so that others find
those claims to be more credible and that they are actually more credible and robust. Okay, so what I want to talk about for the
rest of the time is how is it that we can get there? How is it we can shift a culture
so that showing our work and sharing is more likely to occur and becomes a pervasive practice
across the scientific community? When we’re thinking about changing a research
culture, we can refer back to some of this classic work by Rogers about how technologies
become adopted in communities. The Diffusion of Innovations is the classic book by Rogers.
You might recognize these terms very well. There are innovators who are at the frontlines,
the first people that see the possibility of this new technology and so adopt it immediately
just to try it out, see what happens. Those early adopters are convinced about the promise
of that, and so they are willing to get on board and suffer through the early days of
technologies. Then, as the technology shows that it’s effective and people
see that others are doing it, you get an early majority, then the late majority, and then of course
there’s always the laggards who are quite happy with the world as it is and, “Please
don’t ask me to change anything that I do because I like how I do it right now, and
you don’t need to show me that new technology because I’m not going to use it,” so they
bring up the rear. Okay, okay so we think about adoption of research
practices in the same way, and we think about the big-picture strategy of changing a research
culture with this pyramid. It’s a pyramid because the effectiveness of the higher levels
depends on effectiveness at the lower levels. At the base is infrastructure. We need tools.
If we want people to adopt new behaviors, we need tools that make it possible for those
behaviors to occur. If I can’t do those behaviors anywhere, there’s no way that I will do them.
But it’s not sufficient for most to just make it possible to do it, especially with researchers
that are already busy, already have a lot on their plate of things to do. If we provide
them solutions that require more things for them to do, appended onto the end of the research
process, then they’re very unlikely to do them because they’ve already got plenty to do,
so effective solutions have to integrate with the life cycle that the researchers already
have, the workflows of how it is they get their work done and how it is they pursue
their solutions. Even with those technologies available and
easy to use, people look to others for how it is that I’m supposed to behave, how
it is that people do their work in my field, and so communities play a very important role
of making things normative, of making it explicit, making it visible that other people are doing
these behaviors that might be behaviors that I should do as well. We set norms through
our local communities in science of this is what people in my subdiscipline do, and how
it is they do it, and how they do it well. For some, we can add to that: instead of just
saying this is what people do because it’s the right thing to do, it’s in fact in my
interest to do these things because the rewards that I get from my institution, from my granting
agency, from the publisher are aligned with the behaviors that we are trying to get people
to do: be more open, share your work, show your work. Then finally, for many of these things, you
need a policy. This is what you do now. If you want the money, these are the requirements
to get the money, so do those things, but a key part of policy change is that if these
other things aren’t present, then policies are more likely to be perceived as bureaucratic
burdens, rather than part of promoting good practice. If there isn’t a good technology,
if I don’t see others doing it, if I don’t see this as something that’s relevant for
the quality of my research, then I’m going to treat that policy as just the next form
I have to fill out, rather than facilitating the work that I’m trying to do and the success
that I’m trying to have. These are highly interdependent
parts of a change process. I want you to think back to that adoption curve, from the innovators
all the way through the laggards, and we can rotate it and impose it right back on to this
pyramid, like this. For innovators, because they have vision and they
see potential in new technologies, all you need to do is make it possible. They’re willing
to suffer through the idiosyncrasies of new technologies because they’re focused on the
behavior itself. They value openness for openness’ sake, and they’re willing to do it regardless
because they see that as a way that the world can change. Likewise, early adopters. Once the technology
becomes relatively easy to use, they can integrate it into their workflows. That’s enough. They
don’t need the rest of the community to do it because they see value in those behaviors
themselves. They’re not looking for what others are doing or what’s valued for them to try
it out and to do those new behaviors. As you have that initial adoption of some
minority of the research community, if you make that visible, if you can see that others
in my community are doing this behavior that I didn’t know other people do but is aligned
with my values, those values are already present for transparency in the community, if I see
that others are doing those transparency behaviors, then those norms start to shift for that early
majority. Many more people will say, “Oh, I can see that this is a behavior the people
are doing. I am going to do it as well.” To really get that large majority in the middle
of the bell curve to adopt, then make it rewarding. Find ways to integrate those behaviors into
the key mechanisms of reward across the research community. Finally, those laggards, well, they’ll do
it if you say you have to do it to get the money. Then they’ll do it, but they may not
adopt it otherwise. This is how we think about the adoption process,
and I want to give a couple of examples of interventions at each of these levels of analysis
that are working together interdependently to try to shift how it is the culture does
these behaviors. The first is at the two base levels, the technology
levels, and most of what our organization does is technology. We have a staff of about
50, and about two-thirds of that staff are the software development team who are building
the Open Science Framework, or OSF, and the goal of the OSF in terms of thinking about this
culture change strategy is to integrate with the researchers’ workflow and provide support
across the entire research lifecycle. The OSF does not demand that researchers make
everything open. What it does is it tries to solve a problem that researchers have today,
that they can make use of the OSF to help solve. That problem occurs for my lab as it
does, I know, for many others, which is that I lose my own stuff all of the time.
Our data, our materials, our old experiments—we do not keep track of it well. A computer explodes,
a lot of information explodes with it. A postdoc explodes, a lot of information goes with it.
As a consequence, we can’t keep track of our own work, especially across many different
collaborations and the ad hoc ways that each person organizes their materials and workflow. The OSF tries to solve that problem. We should
no longer lose anything we have for our own use because it’s a cloud-based system that can
archive everything that we do and help with collaborative management of teams that are
distributed across places. We all just add to our private projects. There the collaborators,
all the data, all the materials, all the process get posted. We can register the designs
if we want to commit in advance what it is we’re planning for the research to distinguish
it from what we discover as we do the work, and it helps facilitate that process just
among the collaborative team. Then what we do is with each project and each
component of each project, there’s a little button in the corner that says, “Do you
want to make it public?” If you click it, it says, “Are you really sure you want to
make this public?” If you say yes, then that part is public. We integrate that private
workflow for collaborative management with the public workflow for making these things
available to others so that you can make as much of your process as transparent as you
want, or as much as you can given IP, sensitivity of data, and other constraints, and you
can have controlled access, so you can say, “I can’t share this data with anybody. I
can’t make it public because it’s sensitive data, but I’ll make it so that other people
can request it,” and so you make it available as a controlled access and then other people
can say, “Here’s my IRB, that I can work with such data,” and then you can navigate
the data exchange. The purpose is to make openness part of the
research process, not an extra thing, and support the researcher lifecycle as it happens.
You can go check it out. It’s free to use. You can just go and use it, but I won’t say
anything more about that part. Oh, I guess I will say one more thing, which is to show
adoption. Just by making tools like this available, there is enough interest in the community
already that that adoption is accelerating at nonlinear rates. We released the OSF in
2012. This is one of those graphs, showing the number of registrations, the number of times
that researchers created a project, and then created a frozen version usually right before
they start data collection or analysis to say this is what my commitments are in advance
of observing the outcomes. Any other figure that we show, whether it’s users or number
of files or amount of data, they all look like this, doubling essentially every year
since the service was launched. Ours is not the only service that provides such things.
Others are showing good growth rates as well, showing that those values are already present,
that sometimes it’s just a matter of making the tools available and providing other reasons
to get involved so that people will start to adopt these behaviors. Okay, I also want to point out that we had
a pre-registration challenge to get people to start registrations and try this out, and more
than 500 universities participated, and you can see the number one university out of all of those
in the pre-reg challenge. The University of Queensland beat Oxford, and Stanford, and
UCL, and Pennsylvania, Toronto, Duke. This is just how many research teams entered a
pre-registration for a new project into this challenge and tried out some of these new
behaviors to promote transparency. There’s a lot happening here among your colleagues,
a lot more than there is anywhere else, so feel good about that. That’s all I got to
say, really, about that. Okay, another intervention. How can we think
about the existing reward system and nudge how it is that that system works to be more
values-aligned so that we are promoting some of the rigor and transparency behaviors and
commitments to minimize some of the discretion that we have at different phases? Registered Reports is one of these solutions.
Here is the cartoon version of how publishing happens. We design a study. We collect and
analyze data. We write the report, and then we publish it. Of course, it doesn’t quite
happen like that because there’s this big barrier after the report called peer review
that is in my way of getting what I need, that publication. In this context, all of
the incentives are on that report: I’ll make it as beautiful as possible so that at the
seventh or eighth or ninth journal that I submit the paper to, they eventually say,
“Okay, fine. Okay, just let it go. It can be published.” Registered Reports makes one fundamental change
to this process, and that is to move peer review to right after the design phase at the journal.
Now to the journal, if I submit a Registered Report, what I submit is: here’s my research
question; here’s justification for why that question is important; here’s the methodology
that I’m using to test that question; and maybe here are a few preliminary experiments
or exploratory studies that I did to sort of help to show that there’s something viable
here, something worth testing. Then the reviewers evaluate those critical tests, asking,
“Is that an important question? Is that methodology a good test of that question?”
And if it passes those reviews, then you get in-principle acceptance. Whatever the outcome,
we will publish it as long as you follow through with the methodology that we’ve now agreed
upon, showing that in fact you carried it out in an effective way, and so here are some
outcome independent criteria to show that you did the things that you said you were
going to do and did them effectively, but regardless of outcome, we will publish it. Just this change fundamentally shifts some
of the incentives for researchers in the publishing process. I still need the publication—we’re
not going to change those rewards very soon—but now the kinds of ways that I get publications
with Registered Reports is more dependent, in fact exclusively dependent, on asking
important questions and proposing effective methodological tests of those questions. I
don’t know what the results are; the reviewers don’t know what the results are, so we can’t
be biased by picking only positive results, picking only sexy results. Well, what we can
be biased by is: is the question important, and do we need to know the answer, which is
presumably what a lot of the research process is about: we need to find this out. We’re
not supposed to control the results. We control the designs and the methodologies and our
ingenuity to test things. With this shift, now my primary goal is to test the most important
things, to provide good evidence on what’s worth testing, and to provide a compelling rationale and
methodology for testing it. Reviewers’ incentives also shift. When I’m
a reviewer, when I see the entire report and all the outcomes, after checking how many
times I was cited, the second thing I check is: are the findings consistent with my prior
claims? If they’re consistent, then it’s a great paper. We should publish this. It reinforces
my point of view. If it’s inconsistent with my prior claims, then I’m pretty sure I’ll
find some errors in the methodology that justify rejecting it. I have skin in the game as a
reviewer in how it is I might engage with this research, even if I’m trying to be objective,
even if I’m willing to entertain possibilities that are against my prior claims. But in this context with Registered Reports,
I don’t know what the results are. I can’t be biased by them, and so even if I have someone
that has a completely different point of view than me in this particular research domain,
both of us may come to agreement on the quality of this design and the quality of the methodology.
One of us may be particularly disappointed with the outcomes, but we don’t know what
the outcomes are, and so we can’t be biased in that decision-making. There’s a lot more that we can describe about
this, but adoption of Registered Reports has likewise been showing this nonlinear growth.
As of yesterday, I think there are 170 journals that have adopted Registered Reports as an
option for submissions. The first adoption, you can see, was in 2013, when we published a special
issue of the journal Social Psychology to demonstrate the viability as proof of concept
for this, and there are now, I think, 150 or so published Registered Reports that have finished
and the final publication is out. We have some initial things that we’ve learned
about the process and about the outcomes that we can look at for what Registered Reports
are doing, and so far the evidence is consistent with what we would expect this kind of commitment
to achieve. The first is that it seems to be addressing
publication bias quite effectively. You might know that if you look at the literature as
it exists today, almost all of it is positive results. Depending on the subfield, between
80 percent and 95 percent of the results are positive results. We’re just always really
good at confirming our hypotheses. With Registered Reports, whether they are for novel studies,
testing new questions, or for replication studies, testing claims already made, the
negative result rate is about 60 percent on average. More than half of the findings we’re
publishing with Registered Reports are negative results, suggesting that there is substantial
bias in what we end up publishing and what gets left out in the standard model. Now that’s important because just seeing that
is evidence for many editors to say, “And that’s why I will never publish Registered
Reports because if I publish a lot of negative results in my journal, then no one will read
my journal. No one will cite the things in my journal, and I’ll be the one that destroyed
the impact factor of this journal. That’s not a legacy that I want for my journal.”
Whether we agree with that logic or not, it’s a reasonable
question to ask: what’s the impact on, at minimum, citations for the work that’s
published as Registered Reports compared to others? So far, the evidence suggests that
there is no citation reduction for Registered Reports compared to the other articles in
those journals that published at the same time. If anything, there might be an improvement,
an increase in citations for those. We don’t know how robust this will be or if
it will remain this way, but my speculative interpretation is that when peer review is
done, before you know the results, it has a big benefit for the importance and the quality
of that research—the quality, because we critique the methods very closely. We’re attending
to that because that’s all we have to attend to. Then the importance of the results, because
what we’ve agreed before knowing what the results are is that we really need to know
these results because this is important for advancing research, whether they are positive
or negative. We’re evaluating it accepting the possibility that they won’t be, that this relationship
won’t exist. As a consequence, those papers get more attention and interest because they
are very high quality and they are testing outcomes that we just wanted to know what
the answer was. That still is speculative because we don’t know. We have some more evaluation
projects underway to try to see what is the impact of adopting Registered Reports on the
quality of evidence and whether it has other unintended consequences that we haven’t yet
anticipated. All right. So that is example two. I’ll give
one more example, and then we’ll transition to closing and discussion. And that is: how
is it that we can address this challenge of normative change? When there are people doing
open science behaviors now, it’s not obvious that others would observe that they’re doing
open-science behaviors. Based on that pyramid, we really need evidence for ourselves that
there are others in my community that do this thing. One of the big barriers to adoption
of a new behavior is that nobody else does it. It must be too hard, and so why would I spend
time doing it? One of the solutions for that is obvious:
make it visible when there are others, however many there are,
doing that behavior, so that I can see that in fact other people are doing it, that it’s
not impossible to do, that it might be worth my while if I think it’s of value. An easy
way to do that is to signal that behavior when it occurs with things like badges. A
community of researchers developed these badges and their specifications. What does it mean
to have open data for an article, open materials, or pre-registration? A journal can adopt those
badges, and when your article is accepted at the journal, the journal can ask, “We
offer these badges for good practices. If you would like to get an open-data badge,
you just need to meet these criteria, and if you can meet them and you want to, you
get the badge and then a link to the data in the article.” We might say that that’s silly, badges. This
is science. This isn’t Girl Scouts. We don’t care about badges. Badges are a stupid thing.
But the obvious question to ask is—well, first the obvious point is that it’s not about
the badges. It’s about the behavior that the badges signal, and if those behaviors are
desirable behaviors, if we see them as valued behaviors, then perhaps a simple act of signaling
that I did that is enough to encourage me to do it. Even more importantly, my signaling
that I did it might be important for others in my subdiscipline to say, “Oh, people
do that. Maybe I’ll do it, too, because it is something that I value. I don’t have to
convince people of the value. I just have to potentially convince them that in fact
this is something that people do.” Of course, we need to evaluate whether that’s
effective. The first journal that adopted badges was Psychological Science, and they
did so on January 1st, 2014. What I’m showing you here on the y-axis is the percent of articles
in Psych Science in black and for comparison journals in grey prior to adoption of badges
by Psych Science, so in six-month increments. Psych Science had about 3 percent of their
articles with open data prior to adoption of badges. They adopt badges on January 1st,
2014, and the graph goes like this. Within 18 months, 39 percent of the articles in Psych
Science had open data. No change in the comparison journals. This was first half of 2015. This
year, 90 percent of the articles in Psych Science have open data. We have gone from
near zero to almost all of them within a five-year span, and that’s not because the badges are
super-important, right? All the badges are doing is providing signals that some people
are doing this, and that’s enough for a few more people to do it, which makes it enough
for a few more people to do it, which makes everybody else realize this is an emerging
norm, and that pretty soon it becomes costly not to do it because this is something that
we value, and if everybody’s doing it, then the question that gets asked is why are you
not doing it, rather than are you going to do that? So with valued behaviors, it’s not so hard
to change that behavior if you can make it visible when people are doing it, and you
can make it easy to do it, and you can provide the solutions to make it possible, and, even
better, if you can provide some reward for doing it. Okay, so that’s that model. Back to the floor. Okay, now I’ve emphasized—oh, I have
one more illustration of just how these norms are changing. Betsy Paluck and Ted Miguel
and their colleagues did a survey of social behavioral sciences—so econ, political science,
sociology and psychology—to see when people report retrospectively, have you adopted any
of these behaviors—open data, open materials, pre-registration—when did you start to do
that behavior. They did a very good sampling effort on this. What you see here is simply
the cumulative plot of people saying that they’ve done at least one of those behaviors
in black. By 2017, 84 percent of their survey respondents across these four fields have
said that they have done at least one of these behaviors, and you can see the individual
behaviors below that. There’s also variation across these fields
in which of these behaviors have become more popular and when they became more popular.
For example, in economics, most of the leading journals now, by routine, require data sharing
and sharing code. In economics, you see much earlier in this decade a rapid increase in
code sharing and data sharing behaviors but not the same kind of increase in psychology
and sociology, in which journals have not yet generated strong policies for those practices.
Whereas, on the other hand, most of that rise in pre-registration is occurring within psychology
compared to the other fields because this is a particular behavior that’s become popular
mostly within psychology, with rapid onset. Among the psychologists surveyed, 40 percent
of them, by 2017, had pre-registered a study. Now given the rates, it might be past 50 percent
by now. These changes are happening, and they’re happening
quickly, but to close, all of the examples that I’ve provided so far focus on a single
point of entry of influence for shaping the culture, and that’s with journals, in shifting
the behaviors of individual researchers, but, as we know, the ecosystem around researchers
has many different stakeholders that shape the behaviors, the culture, the incentives
that drive that individual’s behavior. Each one of us has a unique combination of institutions,
publishers that we publish in, societies that we’re members of, and funders that fund
our work. If all of those are not working in concert, then we can’t effectively shape
the entire culture. If one of them doesn’t reinforce the right messages, then we provide
some inhibition in allowing the values that we have to be expressed in daily practice. The big challenge for culture change is the
coordination problem. All of these are independent actors, and within them there’s lots and lots
of independent actors, and we need them all to sort of nudge their incentives at the same
time in order to reinforce each other’s incentives, in order to make it in our interests to live
according to our values for our own career success. There’s lots of different activities
that we can discuss in Q and A about how each of these can play a role, and are playing roles,
in advancing transparency and reproducibility of research. For now, I will just stop and if you would
like these slides, just take a picture. There’s a link to them at the bottom. If you’d like
more information about any of the things that I mentioned, there are links there to
get you more. Thank you very much for your time and attention.
I really appreciate it all. Thanks, Brian. Again, if anyone has any questions,
please come up to the microphones and ask them. To get started, I think I’ll ask one.
This is great. You’ve spent the last week in Australia—this is his first trip to Australia—and
touring around and interviewing, chatting with students, chatting with academics, but
also funders, policymakers, and the Chief Scientist of Australia, and so on. What’s
your impression? I mean, how are we doing within—you showed us a bit of evidence about
how we’re tracking as an institution in terms of pre-registrations and so on—but what’s
your impression of how we’re going in Australia, more generally? Good question. It’s hard to know the extent
to which I get a biased sample because if people invite me, they may be inviting me
because they’re already invested in these issues, but because I’ve had a chance to go
to a number of institutions—this is my fourth institution on this trip and one more still
in Sydney—but across the board, there has been broad enthusiasm from every stakeholder
across every level for adoption of these practices and new behaviors of shifting this culture. That is very encouraging because the shared
enthusiasm for that is the key initial place to start, and then the question is: what are
people willing to do in order to actually adopt these behaviors and shift the internal
cultures? The hardest change is within institutions because there’s so many different disciplines
and so many different interests across those that it’s very hard at an administrative level
to have those practices get implemented and extended across the disciplinary boundaries
because so much of our norms and practices are within our disciplinary domains. The conversations
can be promoted at an institutional level from the top, but a lot of the change is local
within individual departments. To the extent that those changes are happening
within these groups, it’s because of groundswell work: people coming together saying I’m interested
in this; we’re going to start a little course; we’re going to do a little training together;
we’re going to all work on this particular problem; we’re going to change our honors
requirements for students. Whatever it is, those kinds of individual behaviors have a
rapid cumulative effect for making it easier for institutional change at the top. Across places that I’ve been just on this
trip, that’s universally occurring. As opposed to some prior trips in other places, I haven’t had people just say, no, we like our metrics; the h-index is the way to go, or whatever, you know. They didn’t exactly say that, but they effectively said that. This trip has not been like that at all. Thanks. Yes, please. Is it working? Thank you for your talk. It was really great. I just had a similar cultural-change question.
I’m almost at the end of my PhD, and lots of people here are PhD students as well. I
was just wondering how we might go about in the future reconciling this quality over quantity
issue because a lot of us here will be starting to do pre-registrations and also Registered
Reports, and some of my friends have done Registered Reports, and it takes a very long
time, and it’s not necessarily the same kind of quick timeframe and turnaround that
we would normally have to get to a postdoc where people might want 10, 12 publications
or whatever kind of extravagant number that they would want, so I’m just wondering what
we should be emphasizing, this kind of quality science, in our CVs and things like that? Yes, okay, great question. In case you couldn’t hear that in the back,
the question was fundamentally about quality versus quantity, and how do we navigate that
with the incentive systems as they currently exist as early-career researchers, particularly
because some of these practices may be more work and may result in slower science compared
to the other ways that we might do things that might allow us to get quantity but might
not be as robust. There’s a few different answers to that, and
so I’ll sort of do it piecemeal. The first was specifically things like Registered
Reports. Well, Registered Reports is going to take longer because we have to go through
this whole peer review upfront, and so that might slow us down in producing publishable
outcomes. That might be true, but I think the early evidence suggests it’s actually
the opposite. The advantage of Registered Reports for speed is that you know very early
whether your paper is going to be accepted or not. You can even put it on your CV—this
is in principle accepted—before you’ve collected the data, which is pretty amazing. The yield
is also higher than the traditional review process. As we do regular research—we do both kinds
of research in my lab. The regular research at the outset has very uncertain yield. We
might do lots of different experiments. Some of them end up being publishable. Some of
them not. That takes time. Then, also, the report itself has uncertain yield and takes lots of time: we send it to Journal A, then Journal A minus, then Journal B plus, then Journal B, until it finally, finally gets in somewhere. That takes a lot of time. At journals that offer Registered Reports, the average publication rate for regular-style reports is somewhere in the neighborhood of 15 to 20 percent of articles getting accepted. It’s
very common to have to go serially through many journals, which adds a lot of burden
and time. With Registered Reports, so far, the publication rate in those same journals
is about 70 percent, so far. If that holds, that higher yield means a lot less time having
to go serially through multiple publications. This is an extremely pragmatic argument on
this point. The reason I think that it’s a much higher
yield is because in the regular model, when you identify, as a reviewer, fatal problems
in my design, there’s nothing I can do about it. You just identified the reason for a rejection,
which I hope the next reviewers won’t notice so that it’ll get through, whatever it is,
but there’s nothing that can be done. When you identify a fatal flaw in my Registered
Report design, you suggest a fix, and then I change the design to address the concern,
and then it’s still a viable project. There ends up being a much more collaborative exchange
between reviewers and editors and the authors for trying to make these into viable interesting
projects. Most of our ideas are interesting in some way, and the reviewers as experts
in that domain can help build an interesting case. It might be more time just for that
individual review process, but I think there’s actually higher yield, total, and perhaps
higher quality yield, which is the promise of Registered Reports. That’s a super sort of at-the-base pragmatic
answer. Another answer at the totally other end is:
what are people hiring actually looking for? It may be true that there are people that
are looking for quantity. The perception is there that that’s what matters—that we need
to produce quantity. When I went up for tenure—I told this story
earlier—when I went up for tenure and then again up for full professor just a few summers
ago, the administrator in my department said, okay, please print out all your papers and
submit them for the committee to review. I had been—you know, going up for full professor
in the US is going up at 10 years after PhD or 10 years after starting a faculty position.
I had a hundred papers, so printing them out, first of all, that’s stupid, but printing
them all out, that’s really stupid. What have they communicated to me? They’ve communicated
to me that volume matters, that they tenure by weight. They will put them on a scale,
say, “Yes, you got enough,” and then I get tenure. Of course, they don’t mean that. I have observed
the promotion process in my department. My spouse was at the dean’s level committee
and talks about the rigor that that committee goes through as it goes through the stages
for advancement. In those committees, we’re not weighing the papers. It’s a very rigorous
kind of evaluation, but what they’re communicating is the wrong message about how it
is I get advanced. I think, and this is now speculative, I think that the reality of evaluation
is much more about quality than what we perceive it as. We just don’t have the evidence, and
the communication is off target. What we could improve is how it is we communicate
what is the actual basis of evaluation and then show the evidence that that is in fact
the basis of evaluation, that if you do want to just count by numbers, then say it, “This
is a postdoc where the person with the most publications wins.” No one is going to put
that ad up, but they might be influenced unknowingly by numbers, and so we should be studying whether
that occurs. There’s a simple messaging change that might help. If my department had instead said, “When you come up for tenure six years from now, you’re going to submit three papers. You pick what they are, and the committee is going to read those three papers and evaluate the quality of your work based on those three papers.” If that was what the message was to
me at the outset, it would have totally changed my mindset about what it means to succeed
in this environment. It’s not about volume because I’m not going to give them everything.
I’m going to give them three, so I’m going to make sure they have three great ones. Of
course, I’m going to try to do more things, and I know that not everything will hit, so
I will do more work, but I need to have some quality work. I think there’s a messaging problem, a lack
of data for what actually is rewarded, and more potential for us to communicate our values
in how it is we reward and incentivize so that we don’t feel that conflict, that, really,
what you’re trying to do every day early in your career and late is do the best work you
can, and that’s it. That should be where it stops. Thanks for that question. Yes. Nice job. I’ve had a bit of a rocky start to my PhD
with the publication process where I’ve ultimately had my paper rejected four times
and the conclusion has been the same every time, where it’s—they like the research
question; they love the method, but the results are a little bit messier than they want, so
it’s been rejected. It’s ultimately come to that conclusion four times over. I guess
what my question is: with the open science movement, do you see it shifting towards holding
reviewers and editors and journals accountable for those sorts of decisions or just relying
on a general social change, an environmental change, where reviewers just base their decisions on different criteria? Yes, it’s a great question, and it’s a challenge
that will be the longest-running challenge, I think, for how publication decisions get made, because there are good reasons to like it when it all fits together and it’s all beautiful, and some of our submissions still look like that. So reporting the reality as it is—you know, the messiness—is dissatisfying, because we want certainty; we want cleanliness. Those things feel good in the abstract, regardless of what the reality of doing the science provides. That is a reasoning barrier that we’re not easily going to escape among reviewers and editors. But even in the standard reporting process, there is opportunity to address some
of the points that you make. Registered Reports would be great if you hadn’t done the research already: you submit before you know what the results are going to be—and hey, exceptions, fine, that happens. A sister model is results-blind review. There
are a few journals, not a lot but there are a few, that have adopted the option to submit
your methodology results-blind: this is how we’re going to report it, but you just leave
out the actual findings. From what you described, your paper’s a
perfect candidate for that. Yes, the results do have exceptions and things that don’t quite
fit, but if everyone agrees the methodology is good, the question is important, then people
may be much more likely to agree, “Yes, let’s take it.” Then the results are all
messy and they say, “I don’t understand what’s happening.” Yes, welcome to science.
That’s how it works. Those, I think, are the only mechanisms we currently have available
for that. The other strategy that has emerged in some journals that try to be rigorous on this is to evaluate based on the quality of the methods, not on the importance of the outcomes; PLOS One has this as a publishing model.
Even there, on the accountability part, as you say, the reviewers and the editors may not
subscribe to that or may forget that that’s the principle or may just so much want that
thing to make sense and for it to all fit together that they still say, “I’m not excited
because of the messiness and the methods,” or, “Please just leave out those messy parts.
Just show me the good parts,” which is an even harder situation. You want the publication,
you’re going to drop that stuff. That’s a hard one. No easy solutions there, but I
think we are making progress over time in trying to address it with other solutions. Thanks for that question. Hi, Brian. Thanks for the talk. Just with your kindergartener’s guide, the
“show your work” part of it, I’m assuming it’s intended to be, I guess, a reassurance
that actually, you know, it’s not bad; it’s not that difficult. But as your own example
showed with like the soccer team stuff, for example, it’s actually quite complicated
and there are lots and lots and lots of researcher degrees of freedom, things like that, that
people don’t even realize they are doing when they are making decisions, that they
might not be reporting and things like that. So do you find that that kind of approach
of likening it to kindergarteners or whatever sometimes may backfire as coming across as
sort of condescending and not willing to engage with the reality of how difficult research
is? Yes, great point. It is definitely an onboarding
strategy to say, look, we understand these principles as very simple principles, and
the principles are very simple, but the reality of implementation as you described is far
from that, particularly with complex research studies and particularly with the actual process:
how it is we make our decisions and translate that into making it evident in our behavior. One of the key things that I think is a real
barrier to adoption of open science practices is presenting it as all-or-none. You’re either
in or you’re out. Are you an open scientist? And besides that, treating it as an identity
rather than as good methodology. That, I think, is, as you described it, that’s a potential
barrier. If people see this as, “It’s supposed to be simple, but it doesn’t feel simple; I don’t know where to start; I don’t know how to start; I’m not going to do it right now,” that’s a good way to create barriers to entry. The key that we try to communicate, and perhaps
the kindergartner example doesn’t help meet that, is incrementalism. How is it that you
can do something a little bit better today in your methodology than you did yesterday?
We always know that we can do better in our methodology. We know that there are research
design things that we don’t have sufficient training on. We know that there are statistical
practices that we can do better than what we understood before. Open science behaviors really fit into that.
They’re methods. They’re about how to do good methodology. It’s not identity. For that,
then we can get people to onboard in simple ways. For example, my lab started pre-registering
in 2011 because everybody had to because I said so. What we started with was we’ve never
done this before; we don’t know how to do it well; what should we do to start pre-registering?
Because we want to start now. We’re ready. What we did first was, in that final planning
meeting where we’re having the lab meeting or discussion, where we’re deciding what’s
the design; what’s the question; what’s the test—we take notes. Let’s register those notes. That’s
it. Let’s just start with that. We wrote that down. That’s easy. We’ll put that up. Then we did that, and then the next time,
we were analyzing the data, we said we never decided what the exclusion rules are. We should
do that next time. That iterativeness, the recognition that putting those notes up is
better than what we did before, which is not have any notes up at all, no commitments,
okay, we’re doing a little bit better than we did before. Okay, now we add a little bit
more. Now we add a little bit more. It is not an all-or-none doing this or not, an identity
or non-identity. It is good methodology, just like trying to improve statistical practice
and everything else. To the extent that we can help with onboarding, it’s to provide
those mechanisms for incrementalism, that it’s really just about step-by-step improvement. I appreciate the comment on that. Thanks very
much for the question. Yes? I just have two questions, I guess, issues
that I wonder if you have ideas about how to address, or whether they are being addressed in ways that you didn’t have time to talk about. The first one is that it’s true that a lot of the researcher degrees of freedom come into play in the choice of how you analyze
your data, but there’s also a lot in terms of which data you analyze. You’ve got several
outcome variables. You pick the one that’s favorable. When I, on occasion, actually tried
to go download the data associated with papers, it’s pretty clear that what is posted is just
barely enough to replicate what is reported in the paper, as opposed to fully including
everything that’s relevant to the question. I don’t know how you deal with that or how
much that’s undermining the value of the data sharing. How do you deal with this? Is
there anything from the ground up that can be done? Yes, okay, good question. Let me take the second one first because I
have it in mind, and then you’ll remind me of the first question. On adoption of Registered Reports, the impact
factor or prestige of the journal still matters. Whether it’s reality or not, that still matters
in how people make decisions. With these innovations like Registered Reports and otherwise, we
have made a particular effort to try to make inroads into some of the more prominent journals
within different disciplines to see if we can get adoption. Another option for economics
is Nature Human Behaviour, which has adopted Registered Reports and is for all social and behavioral sciences. That doesn’t solve the top five; we haven’t gotten into the top five yet. The other thing that we’re doing is grassroots
campaigns to have people within a subdiscipline send notes, talk to editors in the particular
journals that they think would help with adoption within their subfield. We’ve had a good deal
of success in different subfields with that. If that’s of interest to you or anybody, you can just email me or go to the Registered Reports website, and there is a community of practice that has developed templates of letters to send to editors, and then sharing
of information of what’s working, what’s not working, who’s tried this journal, who seems
most open on the editorial board to go with next? There’s a lot of work happening to try
to get adoption in areas where then it becomes more viable over time. That’s the answer to
that. The first question was about incompleteness of
data sharing and other degrees of freedom—I remembered it myself; this is great—and
how that might undermine the effectiveness of some of these. A couple of important points that you make
in that: one is that the behaviors themselves are not done perfectly. Related to the incrementalism
point, there is a sense of, “Well, they got a badge, but they really didn’t do what
I would expect that would make that badge meaningful. Yes, they shared the data, but
the code book’s terrible and I can’t parse it,” or there might be variables that are missing but are relevant for reproducing the findings. Like incrementalism
on adopting behaviors, our overall strategy is recognition that a lot of these behaviors
are skills. Pre-registering well is a skill. It took us a long time to get good at it and
then develop templates to try to help others get good at it faster. Because of that, there
is going to be a lot of slop in the system initially, but our first goal is to get people
to do the behavior, then get them to do it well, because if we set the bar so high that
in order for you to get this acknowledgment of having open data, you have to meet all
of these crazy criteria that take months to do, then no one’s going to adopt it. There
is a pragmatism of let’s get people on board with the behaviors and then incrementally
incentivize and improve those behaviors as people learn how to do it better. What it raises is exactly the issue that
you mentioned, which is it doesn’t then guarantee the reliability of the findings. If I still
took advantage of a lot of degrees of freedom in what I’ve analyzed and I only share the
subset of data that allows you to reproduce the way I analyzed it, then that doesn’t effectively
address all of the challenges for the confirmatory nature of those tests. Yes, that’s how it
is. We aren’t going to get to certainty for a very, very long time, and we’re never going
to get to certainty on any individual claim associated with any individual study. As we
improve, what we will see is the deficiency in how we’ve improved, and as long as we
adopt a mindset of incrementalism, of constant improvement, then part of the critique of
papers will be: it’s great that you shared those. I’m able to reproduce your findings,
but I’m not able to reproduce your process. How did you select those outcomes as opposed
to other outcomes? That’s what I need to see from you next time in order to find your claims
even more credible. The critique process will mature, but now we’ll be at a shared basis
of, yes, we should be sharing this, and that’s a great place to start from in order to make
those additional arguments more credible over time. These are really challenging things, and the point is that the work will never end. There isn’t a moment of, okay, now we’re open science and we’re done. This is just like every other part of science: now we’re doing a little better than before, and we still have more to do to get even better, and we’ll always be on that path. Thanks for that question and comment.
