Headlines about health are meant to draw our attention. A recent headline touted dark chocolate’s effects on weight loss. Sounds great, but there are some fundamental flaws in this research that probably make it useless.
These flaws can also be a benefit, though: they give us a chance to analyze the study and learn a great deal about research. Don’t worry, this article won’t be too painful to read, and there might indeed be some weight-loss chocolate if you make it to the end.
“Scientists say” isn’t always a guarantee of truth.
What the Research Says
Johannes Bohannon published a study in the International Archives of Medicine comparing three groups of dieters:
- Control group
- Low-carbohydrate group
- Low-carbohydrate-plus-dark-chocolate group
The participants adhered to their prescribed diets for 21 days. The researchers reported that the low-carbohydrate group and the low-carbohydrate-plus-chocolate group lost more weight than the control group.
But before you start eating chocolate, we need to take a deeper bite into the study and order up a small side of statistics.
What Is the P-Value?
Many people tend to gloss over the results sections of research articles as they are riddled with statistical terminology, such as “t (67) = 2.1, p < .05.” But there’s an important thing in that statistical statement – that last little part with the p, which is called the p-value.
The p-value is the probability of seeing a result at least as extreme as ours if there were really no effect. If this value is small (say, less than 5%), we are saying we found something. More accurately, if not grammatically, we are saying, “We probably don’t have nothing.”
Scientists are often very proud of their tiny p-values, and you will see statements such as “p = .00001.” It might seem a little backward to say we probably don’t have nothing, but that is the way this type of testing works. It is similar to saying, “Innocent until proven guilty,” except that we declare guilt when evidence this strong would turn up 5% of the time or less if the defendant were innocent.
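To make this concrete, here is a small Python sketch built on a made-up coin-flip example (the coin and the numbers are my own illustration, not data from the chocolate study). It treats a fair coin as the “nothing is going on” scenario and asks how often we would see a result at least as lopsided as ours.

```python
from math import comb


def p_value_at_least(heads: int, flips: int) -> float:
    """One-sided p-value: chance of `heads` or more heads in `flips` fair coin flips."""
    # Count every outcome that is at least as extreme as the one we observed.
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    # Each of the 2**flips sequences is equally likely when the coin is fair.
    return favorable / 2 ** flips


# Hypothetical experiment: 9 heads out of 10 flips.
p = p_value_at_least(9, 10)
print(f"p = {p:.4f}")
print("significant at 5%" if p < 0.05 else "not significant at 5%")
```

With a fair coin, nine or more heads in ten flips happens only about 1% of the time, so here we would say we probably don’t have nothing.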
But the more we test, the more likely we are to get a low p-value, or a significant effect. It is like holding a lottery ticket. If I have only one, I have a very low chance of winning. If you have sixty lottery tickets, your odds of winning are much better than mine. When we test for many different outcomes, it is like holding many lottery tickets. There is a chance we will turn up a significant finding that isn’t real (a false positive). It is one of the limitations of this type of testing: we are making a guess based on probability.
The author of the chocolate study tested for an enormous number of effects (at many time points). That is, he was holding hundreds of lottery tickets and was bound to find something significant. To put it in terms of the p-value, if I run a hundred tests, on average five will come out significant even when nothing real is going on (that is where the 5% comes from). Therefore, the three or four effects mentioned in the article could be false positives.
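The lottery-ticket arithmetic can be sketched in a few lines of Python. This is a rough illustration, assuming every test is independent and that no real effect exists in any of them; the function names and the test counts in the loop are my own choices, not figures from the study.

```python
def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Average number of 'significant' results when no real effect exists."""
    return num_tests * alpha


def chance_of_any_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of the independent tests comes out significant by luck."""
    # Each test stays clean with probability (1 - alpha); all must stay clean to avoid a false alarm.
    return 1.0 - (1.0 - alpha) ** num_tests


# Hypothetical numbers of tests, chosen only to show the trend.
for n in (1, 20, 100):
    print(
        f"{n:>3} tests: expect {expected_false_positives(n):.2f} false positives, "
        f"chance of at least one = {chance_of_any_false_positive(n):.1%}"
    )
```

With a single test the chance of a false alarm is 5%. With a hundred tests it is above 99%, which is why a handful of significant results out of many tests is weak evidence on its own.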
Number of Participants
One of the earliest things that young research methods students catch is when a study has a small sample. Small samples are not inherently bad. However, they are not as precise as large samples.
In a small sample, we might find the odd person who is quite different from everyone else. That one person can throw off the results (think about the person at your gym who excels at everything right away). In a large sample, you have a greater chance of balancing out extreme people. Thus, your estimates would be more precise.
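Here is a minimal Python sketch of how one extreme person moves the average. The weight-loss figures are invented for illustration (everyone loses 1 kg except one person who loses 11 kg) and are not taken from the chocolate study.

```python
def mean_with_outlier(n_typical: int, typical: float, outlier: float) -> float:
    """Average of a group made of `n_typical` ordinary people plus one extreme person."""
    values = [typical] * n_typical + [outlier]
    return sum(values) / len(values)


# Hypothetical weight loss in kg: typical dieters lose 1.0, the outlier loses 11.0.
small_group = mean_with_outlier(4, 1.0, 11.0)    # 5 people in total
large_group = mean_with_outlier(99, 1.0, 11.0)   # 100 people in total

print(f"small group average: {small_group:.2f} kg")
print(f"large group average: {large_group:.2f} kg")
```

In the group of five, the one extreme person triples the average, from 1 kg to 3 kg. In the group of a hundred, the same person nudges it to only 1.1 kg.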
The chocolate study has many statistical and methodological issues that are too dense for this article. One of them is that the authors don’t mention their sample size anywhere in the paper. We can reverse engineer the sample size by looking at the p-values and the type of test they used. By my estimate, there were no more than twenty people in this study (depending on how they were divided among the three groups).
Thus, one person might have been driving the results when the chocolate group got better (again, imagine the person at the gym who excels no matter what exercise they do). If, on the other hand, these results are true, then the effect of chocolate will be seen when another researcher does another study.
Autism and Vaccinations
We can see similar science in the 1998 report in The Lancet claiming that MMR vaccinations were related to the incidence of autism. The study had only twelve participants, and the correlation could not be replicated in larger studies (furthermore, the original paper was retracted because the lead researcher falsified data). However, this false positive still leads many parents to avoid vaccinating their children.
Unfortunately, the recent study didn’t really make a sound case for daily chocolate consumption.
Pay-to-Play Journals
Publishing research is the currency that pays the bills in academia. Having more publications leads to a researcher being higher up on the food chain in an academic department. Thus, many “journals” run as a business where people pay to publish.
These journals are often used by fringe online universities attempting to gain credibility. Contrast this with top-tier journals, where there is often a 99% rejection rate, and it can take months (or years) to receive an answer on a submitted manuscript as it goes through multiple reviewers. Upon examination, we can see that the chocolate study was published in an open-access journal that charges per manuscript (I attempted to send in a manuscript and was offered the prepay method for 600€).
I do not fault journals for having a business model. However, with this type of business model, they often forgo the rigors of more selective scientific journals. All of my manuscripts have gone through a tough revision process by reviewers and editors. This process made each article much more scientifically sound.
Editors and reviewers are there to be the wardens guarding science. If we do not have these safeguards, then many more fringe scientific articles can be released. The chocolate study is missing many key descriptions, which I would not have let pass if I were a reviewer. I doubt that the article was even reviewed before being published.
Collect All the Puzzle Pieces
The chocolate and weight-loss study has already been retracted. The findings of this study are suspect for the reasons mentioned above, but of course the retraction adds to the suspicion (J. Bohannon is a journalist who has written extensively on media bias in science reporting, and the study might have been pulled for that reason).
The pictures in this article, taken from major fitness magazines, are examples of how these published “research” articles can be taken and disseminated without a critical eye. The chocolate article is only one example, but without looking too much, we could find many more that make outrageous claims that turn out to not be so spectacular when the original research is reviewed. My suggestion is to take each headline with skepticism, then go and check the original research.
At times it is easy to throw your hands in the air and give up when faced with all the contradictory headlines. I take each headline with a bit of skepticism. I think of them as only one piece of the puzzle. I need to collect many more before I can make an informed decision. If I collect multiple puzzle pieces (research studies) that converge on one item, then I feel more confident in the results.
I do not want to discount all the results in this chocolate study. However, I would want to see similar studies showing similar results. Furthermore, we need to see how these pieces fit in with other pieces. Does the chocolate only work in combination with a low-carbohydrate diet or does it work with other combinations? We need to see many more puzzle pieces before making a choice.
References:
1. Bohannon J, et al. “Chocolate with high cocoa content as a weight-loss accelerator.” International Archives of Medicine 8 (55). 2015.
2. Wakefield A, Murch S, Anthony A, et al. “Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children.” Lancet 351 (9103): 637–41. 1998.
Photos 4 courtesy of Shutterstock.