Imagine this situation: in groups, you are asked to design and set up an electrophysiological brain recording experiment in which your participants should exhibit a specific type of waveform: gamma patterns. After performing an extensive literature review, you are sure that this waveform is best visible during tasks that require the manipulation of information in working memory. For some reason, though, gamma waves are not visible in your participants’ recordings. What do you do?
Psychology students are used to non-significant results like these – in the social sciences, finding valid population-level phenomena is difficult. Yet they are very aware that finding a true effect depends not only on its existence in reality but also on the methodological steps the researcher takes during data collection and analysis. If a phenomenon is well established in previous literature, a psychology student is trained to take a critical look at the sample size and study design, and to discuss the changes that could be made in order to yield different results.
This class, however, was not organized by the Department of Psychology, but by one of the life sciences departments. Despite being equally well trained in their own field, students from this department tended to blatantly remove non-significant findings from the analysis and present them as clear outliers that were not reflective of the true population. More importantly, no one seemed to question these outlier-removal practices. When asked about the motives behind the outlier correction, the explanation was simple: this was a confirmatory analysis in which we had to replicate findings from the existing literature. That these students knew exactly what to do when the p-value failed to reach the traditional threshold of significance, yet had never heard of terms such as p-hacking and cherry-picking, begs the question of where this philosophy stems from.
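To make the statistical problem concrete, consider the following minimal sketch (written in Python purely for illustration; it is not taken from the course itself, and the sample sizes and trimming rule are arbitrary assumptions). It simulates two groups drawn from the very same population and shows how "correcting" the data points that work against the expected effect can manufacture significance out of pure noise.

```python
# A minimal sketch (illustrative only): two groups drawn from the same
# distribution, so any "effect" is pure noise. Compare an honest t-test
# with one run after post-hoc "outlier" removal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_experiments, n_per_group = 1000, 20
false_pos_honest = false_pos_hacked = 0

for _ in range(n_experiments):
    a = rng.normal(0, 1, n_per_group)
    b = rng.normal(0, 1, n_per_group)

    # Honest analysis: test the data exactly as collected.
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_pos_honest += 1

    # "Outlier correction" after seeing the result: quietly drop the two
    # points in each group that most work against the expected difference.
    a_trim = np.sort(a)[2:]    # drop the lowest values of the "high" group
    b_trim = np.sort(b)[:-2]   # drop the highest values of the "low" group
    if stats.ttest_ind(a_trim, b_trim).pvalue < 0.05:
        false_pos_hacked += 1

print(f"False positive rate, honest analysis: {false_pos_honest / n_experiments:.2f}")
print(f"False positive rate, after trimming:  {false_pos_hacked / n_experiments:.2f}")
```

Even though no true effect exists, the trimmed analysis crosses the p < .05 threshold far more often than the nominal five percent – which is precisely what makes post-hoc outlier removal a form of p-hacking rather than a defensible confirmatory strategy.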
It could be argued that in this specific setting, the unwillingness to modify one’s own design was due to time constraints – with only five days to set up, conduct, analyze, and present the findings of the experiment in this course, major changes to the design after data collection could have eaten up precious time. However, as the project evaluation was not based on whether one could replicate the findings but instead on the general approach that the team had taken, it seemed that these interdepartmental differences in general statistical philosophy originated from somewhere else.
Could it be that there merely exists a fundamental difference between the focus of a natural scientist and that of a social one? Biological mechanisms (e.g., systemic blood circulation) are well established, whereas cognitive processes such as the use of mental imagery during problem-solving tasks are difficult to measure. Although psychologists try hard to find general laws of human behaviour, it is widely accepted that we can only try to predict it by using the knowledge we possess about pathological brains and, ultimately, statistics.
“More aware of the grand-scheme effects of reporting false results, and of the costs of conducting research that is fundamentally ungeneralizable, future researchers could bring advancements in the field nearer.”
It is highly unlikely that the natural science students performing questionable outlier correction procedures did so aware of their problematic nature. Indeed, a look into the curricula of their degree programs reveals that this lack of awareness might, in reality, be a lack of knowledge. During the 2021–2022 academic year, the number of study credits from mandatory research design, reasoning, and statistics courses was 33 ECs in the Psychology Department, whereas in Biomedical Sciences and Biology it was only 8 and 12 ECs, respectively. The extensive 9 EC course unique to the psychology program, Scientific and Statistical Reasoning, coordinated by Dr. Roeland Voskens, in particular seems to give psychologists graduating from our university a great advantage. Diving deep into the pitfalls of frequentist statistics and the biased nature of human reasoning, it provides students with extensive skills in interpreting scientific findings and the argumentation behind them. This raises the question of whether the lack of a similar course in natural science programs results in the unintentional transfer of these faulty lines of reasoning to actual research laboratories.
The replication crisis within the field of psychology has been well established for some time now. Out of necessity, it has cornered psychological researchers and forced them to put their own work, as well as the prior literature, under careful inspection. As awareness of the crisis has spread, numerous initiatives to reproduce classical studies have sprouted. Within medicine, though, the discussion seems to lag behind, or to drive change only in sub-fields such as oncology. In fact, it could be argued that issues with biomedical and pharmacological research have been brought under sufficient scrutiny only in light of the COVID-19 pandemic. As studies on the virus’s nature have a more direct effect on the population’s well-being than research on, say, the effects of listening to classical music before eating, this is certainly understandable. As an example, analyzing 29 research publications of concern about the diagnosis and treatment of COVID-19, Besançon and colleagues (2021) found that six of these papers had been retracted because the methodology or the data analysis was flawed. Although not all of these six papers underwent the peer-review process, it is worth noting that even at this level of research, scientists still make errors in choosing the appropriate analysis methods.
Could it be that these mistakes were made due to a general lack of understanding of design and statistical strategies? Perhaps. By increasing the number of mandatory undergraduate- and graduate-level statistical reasoning courses, future researchers could at least be made aware of the importance of choosing the correct methodological approach. More aware of the grand-scheme effects of reporting false results, and of the costs of conducting research that is fundamentally ungeneralizable, future researchers could bring advancements in the field nearer. Another explanation for these mistakes could be that they were made deliberately. Although this is not a valid approach in any shape or form – it impedes researchers from making proper inferences from their findings – we know that, depending on the sample and research question, different statistical tests may yield very different results in terms of significance. If that were the case with any of these six publications, the responsibility for publishing these results as respectable findings has undeniably shifted to the hands of the scientific journal and the surrounding academic machinery.
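As a hedged illustration of that last point (again a sketch in Python, with arbitrary sample sizes and distributions rather than anything drawn from the retracted papers), the following simulation runs a parametric and a non-parametric test on the same small, skewed samples and counts how often the two end up on opposite sides of the significance threshold.

```python
# A minimal sketch (illustrative only): the same small, skewed samples
# analysed with a parametric and a non-parametric test, counting how often
# the two tests land on opposite sides of the p < .05 threshold.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_sims, n = 1000, 15
disagreements = 0

for _ in range(n_sims):
    # Small samples from skewed (log-normal) distributions with a modest true shift.
    a = rng.lognormal(mean=0.0, sigma=1.0, size=n)
    b = rng.lognormal(mean=0.5, sigma=1.0, size=n)

    p_t = stats.ttest_ind(a, b, equal_var=False).pvalue             # Welch's t-test
    p_u = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue  # Mann-Whitney U

    if (p_t < 0.05) != (p_u < 0.05):
        disagreements += 1

print(f"Tests disagreed about 'significance' in {disagreements / n_sims:.0%} of samples")
```

When two defensible tests can disagree on the very same data, the choice between them – and the obligation to justify that choice before seeing the results – matters a great deal.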
“It is clear that to counter this issue, a multi-layered approach should be taken.”
Altogether, without systematic reviews of the statistical methodology behind a study’s findings, and without properly investigating potential conflicts of interest between researcher and reviewer, a problem of distrust in published results arises. To counter this issue and to detect flaws early, the general principles of open science – pre-registration and open review processes – could be employed. Making the review process more transparent might pose a challenge, though, as the general attitude toward reviewing is not too positive: the process can and should be time-consuming, but it unfortunately reduces the amount of time academics can spend on conducting and publishing their own research.
It is clear that to counter this issue, a multi-layered approach should be taken. At the lowest levels, it is the institutions’ responsibility to train their future researchers to adopt and favour the open science framework. This can be done by simply including more material on critical and scientific thinking in the curriculum of every field that has a research component. Moreover, institutions should support the adoption of this framework by funding extensive and archived peer-review systems. Instead of treating peer review as a mandatory loss of time, researchers should be encouraged to engage in such projects and to keep track of the process by archiving all information in a systematic manner. This, though, is a slippery slope that could easily lead to an increase in journal subscription fees and article processing charges. At the highest levels, journals and policymakers should admit their accountability in the process and enforce these changes in their organizational systems. However, it should be emphasized that an acknowledgment of this magnitude would require a large-scale change in academic culture, and for the time being, that might be a little too idealistic.
Although merely a practical laboratory course to introduce undergraduate neurobiology students to the world of electrophysiology, this particular session revealed much about the interdepartmental differences in knowledge about good research practices. In subtle ways, p-hacking creeps its way into classrooms, and if we are not careful enough, it attaches itself to future researchers’ coats and travels to actual publishing laboratories.
References
Bachelor’s Biology Programme. (2021, October 21). University of Amsterdam. Retrieved 10 February 2022, from https://www.uva.nl/en/shared content/programmas/nl/bachelors/biologie/studieprogramma/studieprogramma.html#Jaar-3
Bachelor’s Biomedical Sciences Programme. (2021, June 30). UvA Studiegids 2021–2022. Retrieved 10 February 2022, from https://www.uva.nl/programmas/bachelors/bio-medischewetenschappen/studieprogramma/studieprogramma.html
Bachelor’s Psychology Programme. (2021). UvA Studiegids 2021–2022. Retrieved 10 February 2022, from https://studiegids.uva.nl/xmlpages/page/2021-2022-en/search-programme/programme/6563
Besançon, L., Peiffer-Smadja, N., Segalas, C., Jiang, H., Masuzzo, P., Smout, C., Billy, E., Deforet, M., & Leyrat, C. (2021). Open science saves lives: Lessons from the COVID-19 pandemic. BMC Medical Research Methodology, 21(1). https://doi.org/10.1186/s12874-021-01304-y
Chopik, W. J., Bremner, R. H., Defever, A. M., & Keller, V. N. (2018). How (and whether) to teach undergraduates about the replication crisis in psychological science. Teaching of Psychology, 45(2), 158–163. https://doi.org/10.1177/0098628318762900