Author: Marshall Schott
About the time Ray was collecting data for the WLP029 vs. WY2565 yeast comparison xBmt, I was contacted by a nice dude named Justin Angevaare who offered some advice on our general statistical approach. I receive emails of this sort somewhat frequently, but Justin’s background and experience enticed me to engage him a bit more than I usually do, as he shared he is currently working toward his Ph.D. in statistics. After commending our general methodology, Justin explained the statistical procedure we were using to determine the p-value for the triangle tests, called a two-tailed binomial proportions test, was not the most appropriate for our purposes and that we should be using a one-tailed version of the test instead.
As you know, the process of determining whether there is a discernible difference between the beers in our xBmts is based on the outcome of a series of triangle tests. The more xBmt participants who are able to correctly identify the odd-beer-out in the triangle test, the more evidence we have that there is a perceivable difference between the beers. We use a statistical test to determine if the evidence we have collected is sufficient to make such a conclusion. The two-tailed statistical test that we had been using was sensitive both to evidence of more participants identifying the odd-beer-out than would be expected under random chance, and also, needlessly, to evidence of fewer participants identifying the odd-beer-out than would be expected under random chance. By switching to the one-tailed version of this test, we narrow the evidence to which the test is sensitive. The result of this in terms of our xBmts is that slightly fewer participants are required to be correct in order to conclude that there is a perceivable difference in the beers at our specified significance level of 5%.
The evidence collected during our triangle tests is distilled to a single value, a test statistic. A larger test statistic indicates greater evidence against the null hypothesis, which is that the 2 beers are indiscernible. By switching to a one-tailed version of the test, a less extreme test statistic is required to reject this null hypothesis. If you’re a numbers person, the one-sided test requires a statistic greater than 1.645 to make this conclusion at the 5% significance level, while the two-sided test requires a statistic of 1.96 or greater.
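For anyone who wants to see the arithmetic, here is a minimal sketch of how such a test statistic and its p-values could be computed. It assumes a normal-approximation test of one proportion against the 1-in-3 chance of guessing the odd beer out, with no continuity correction. That assumption is inferred from the 1.645 and 1.96 critical values quoted above and is not a description of the exact software we used. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Assumption: a taster who is purely guessing picks the odd beer 1 time in 3.
CHANCE = 1 / 3


def z_statistic(correct: int, tasters: int, p0: float = CHANCE) -> float:
    """Standardized distance between the observed proportion and chance."""
    observed = correct / tasters
    standard_error = sqrt(p0 * (1 - p0) / tasters)
    return (observed - p0) / standard_error


def p_values(correct: int, tasters: int) -> tuple[float, float]:
    """Return (one_tailed, two_tailed) p-values from the normal approximation."""
    z = z_statistic(correct, tasters)
    # One-tailed: only "more correct than chance" counts as evidence.
    one_tailed = 1 - NormalDist().cdf(z)
    # Two-tailed: deviations in either direction count as evidence.
    two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return one_tailed, two_tailed


if __name__ == "__main__":
    # 8 correct out of 14 tasters, the whole cone vs. pellet result in this post.
    z = z_statistic(8, 14)
    one, two = p_values(8, 14)
    print(f"z = {z:.3f}, one-tailed p = {one:.3f}, two-tailed p = {two:.3f}")
```

With 8 correct out of 14, this sketch gives a one-tailed p-value of about 0.029, and the two-tailed p-value is exactly double that, which is why the same result can miss the 5% cutoff under one test and clear it under the other.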
Skeptical as always, I did some research and discovered Justin’s suggestion was consistent with the recommendation provided in the book Sensory Evaluation of Food, which was validating. And so we officially changed the methodology we use such that future xBmt results that would have been really close to significant before may end up achieving significance, meaning we’ll be able to more accurately identify those variables that have an impact on the distinguishability of beer. Rad, but…
What about all of the previous xBmt results?!
Curious how this change might have impacted prior xBmts, particularly whether the one-tailed test might have nudged some closer results into the territory of significance, I went back and re-ran the stats for every single one in which the triangle test was used. Perhaps unsurprisingly, the outcomes of most past xBmts remained unchanged, but there were a few that didn’t, the articles of which I updated to reflect the changes. Here’s a brief outline of those that were affected by the change:
| Dry Hopping: Whole Cone vs. Pellet Hops In An American IPA |
Greg performed this xBmt back in February 2015 and ultimately concluded that a beer dry hopped with whole cone hops was not reliably distinguishable from the same beer dry hopped with pellets. The participant pool for this xBmt consisted of 14 people, 8 of whom accurately identified the odd-beer-out in the triangle test, just 1 shy of significance using the old two-tailed test yet achieving significance with the new one-tailed test (p=0.029). With this, we can conclude that tasters were in fact capable of reliably distinguishing a beer dry hopped with whole cone hops from another hopped with pellets. This makes the comparative evaluation a bit more meaningful, and I recommend anyone interested check the full article out!
| Impact Fining With Irish Moss Has On An American Red Ale |
Back in March 2015, I made 2 separate batches of the same beer and hit one with Irish Moss while the other was left alone. Irish Moss is an ingredient purported to improve beer clarity, so I certainly didn’t expect it to have a noticeable impact on other characteristics in these beers, and the data at the time backed this hunch up… barely. Using the old two-tailed test, 9 of the 15 participants would have had to accurately identify the unique beer in the triangle test, but only 8 were able to do so. Using the one-tailed test, 8 is the exact number needed to reach the cutoff commonly used to imply statistical significance: p=0.05. Since the p-value was not less than (<) 0.05, many would contend significance still was not achieved. I was on the fence about this one and Justin assured me it was not significant, hence I remain hesitant to say with any amount of certainty that Irish Moss has a considerable impact on the flavor and aroma of beer. This opinion is admittedly influenced by my own experience with these beers as being indistinguishable. However, I thought it prudent to share this data given how close the results were to being significant.
| Yeast Pitch Rate: Ale vs. Lager in a Kölsch |
Malcolm previously reported his findings from the 3rd yeast pitch rate xBmt where he split a batch of Kölsch and evaluated the differences between ale and lager pitch rates. This is the one where we ran into a bit of a conundrum due to the post-it notes Malcolm used during one tasting session emitting an odor that almost certainly impeded the participants’ ability to accurately perceive any differences between the samples during the triangle test. As such, we reported 2 sets of results: the first included all participants and failed to reach significance; however, when we controlled for the participants who sampled from the post-it note cups, significance was reached. Utilizing the new one-tailed test, significance was achieved even when the post-it note participants were included (p=0.026), allowing us to conclude that tasters were capable of reliably distinguishing a Kölsch fermented with the amount of yeast commonly used for ale from one fermented at lager pitch rates.
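The "how many tasters need to be correct" thresholds mentioned in the summaries above can be checked with a short sketch. It makes the same assumptions as the critical values quoted earlier: a normal-approximation test against the 1-in-3 chance of guessing, with significance requiring a statistic strictly greater than the critical value. The function name `min_correct` is hypothetical, and only the 14- and 15-taster panels reported above are shown.

```python
from math import sqrt

# Assumption: a taster who is purely guessing picks the odd beer 1 time in 3.
CHANCE = 1 / 3
ONE_TAILED_CRITICAL_Z = 1.645  # 5% significance level, one-tailed
TWO_TAILED_CRITICAL_Z = 1.96   # 5% significance level, two-tailed


def min_correct(tasters: int, critical_z: float, p0: float = CHANCE) -> int:
    """Smallest number of correct tasters whose z statistic exceeds critical_z."""
    standard_error = sqrt(p0 * (1 - p0) / tasters)
    for correct in range(tasters + 1):
        z = (correct / tasters - p0) / standard_error
        if z > critical_z:
            return correct
    raise ValueError("no count of correct tasters reaches the critical value")


if __name__ == "__main__":
    for tasters in (14, 15):
        one = min_correct(tasters, ONE_TAILED_CRITICAL_Z)
        two = min_correct(tasters, TWO_TAILED_CRITICAL_Z)
        print(f"{tasters} tasters: one-tailed needs {one}, two-tailed needs {two}")
```

Under these assumptions, a panel of 14 needs 8 correct with the one-tailed test versus 9 with the two-tailed test, while a panel of 15 needs 9 either way, because 8 of 15 produces a statistic of roughly 1.643, just under 1.645. That is consistent with treating the Irish Moss result as not significant.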
We strongly value transparency and honesty when it comes to sharing the data we collect, and hopefully this clears some things up. We’ll do our best to keep errors such as this to a minimum in the future. If you have any questions, concerns, or comments, please share them below.
3 thoughts on “A Statistician’s Perspective On The Triangle Test”
Still though the .05 is such an arbitrary number. Statistically it may be significant but we are humans and it is going to take a big effect to make me really care. 1 in 20 people noticing a difference (regardless of preference) is a ridiculous standard for me to not use the convenience of pellet hops. Still I appreciate the precision and care you use in your xBmts. Makes for nerdy reading. I want to see if you can brew a two hour beer that 75% of people think is pretty tasty. 20 minute mash. 15 minute super-concentrated boil and top up with cold water. If I could cut my brew day in half for a ‘pretty good’ keg to feed the masses, I’m in!
It’s on the list 🙂
Nice one!
Transparency and honesty is ofc paramount when it comes to sharing data. The thing to remember though is that all other exBeeriments, still valid under the previous method, got validated a second time.