The exBEERiment series began less than 2 years ago and since then we’ve presented results on over 60 variables, of which only a few have reached statistical significance. I have to believe it’s because of this that Brülosophy has developed a reputation as the “mythbusters” of brewing, a moniker that seems to presume some sort of intent to prove convention and those who have promoted it wrong. Is this true? Are we actually busting myths? And what about the fact so few xBmts have produced significant results? Have we been misled?!?
For this Brü’s Views, I asked the guys to share their unfettered opinions on the exBEERiments so far, my vagueness an intentional attempt to extract from each those aspects they view as most important. As will be the case with all Brü’s Views, each contributor was asked to keep their response private in order to elicit the most honest and unbiased version of their thoughts.
When the idea of starting this series initially came up, Ray suggested it might be fun to have a different guest respond every time, someone whose contributions to the brewing world are relevant to the issue at hand. Who better to share their opinions on the impact of the xBmts than the person who has likely inspired more homebrewers than anyone on Earth? A huge thanks to homebrewing legend and friend of Brülosophy, John Palmer, for taking the time to share his thoughts!
The contributors were not made aware of the guest respondent prior to publication.
On The exBEERiments So Far
| JOHN PALMER |
There are rules, that are meant to enforce the guidelines, that are derived from the principles, which suffice until you really understand what is going on.
~ John Palmer ~
Your theory is crazy, but it’s not crazy enough to be true.
~ Niels Bohr ~
Any time scientists disagree, it’s because we have insufficient data. Then we can agree on what kind of data to get; we get the data; and the data solves the problem. Either I’m right, or you’re right, or we’re both wrong. And we move on. That kind of conflict resolution does not exist in politics or religion.
~ Neil deGrasse Tyson ~
I discourage passive skepticism, which is the armchair variety where people sit back and criticize without ever subjecting their theories or themselves to real field testing.
~ Tim Ferriss ~
I am often asked why I got into brewing and why I started writing about it. The answer is pretty straightforward: I am an engineer; brewing is a process; engineers think about improving processes. Documentation just naturally follows. A scientist asks, “Why?” An engineer asks, “How?”
The history of brewing is the history of the human race. While it did not directly lead to spaceflight, it is a fact that once we achieved a stable presence in space, beer was brewed there. But I digress.
I think I can state with confidence that we are in the golden age of brewing. There have never been more breweries in the world than there are now, and there have never been more brewers than there are now. And I don’t just mean academics publishing in scientific journals, although they/we are certainly part of it. All over the world, brewers are asking questions as they brew. They are not content to simply brew the same beer that their mothers did; they are going to brew different beers, with different ingredients, different methods, different equipment.
Ingredients have changed over the centuries and recent decades. Why decoction mash? Because malts in the 19th Century were significantly less modified than today and benefited from multi-rest mashes. Yes, but why decoction? Because it was an elegant solution to two issues: better starch conversion and raising mash temperature. Is it required today for lager beer? No. However, it still adds specific melanoidin content that is desirable for some styles. How did we determine this? Because brewers challenged the paradigms and asked Why and How.
You may have seen “swan necks” at commercial breweries on the lauter tuns– a series of laboratory-type faucets all in a row over the lauter grant. What are those for? We don’t use them as homebrewers; we rarely ever use lauter grants either. Should we be using them? Professional brewers do; if they think those are a good idea, then we should probably do it too, right? Not necessarily. Here is where the difference in scale between 5 gallons and 50 barrels creates issues that require engineering solutions. A false bottom in a 10 gallon kettle has very uniform flow and will efficiently lauter 20 pounds of grain. But multiply that area by 100 and you are going to have variation in lautering rates across that false bottom. That’s what the swan necks are for; they allow the brewer to regulate the drawing of wort from different areas of the false bottom to achieve uniform lautering. It is not a problem or solution that homebrewers need to address, but unless you ask the questions and run the experiments to answer them, you aren’t going to know that.
Engineers are always qualifying their answers– trying to make sure that what they are saying is always true and can’t be misconstrued and put something or someone at risk. Therefore, we tend to speak conservatively. We issue documents that describe best practices, even if that best practice is only occasionally necessary. Things such as wearing full personal protective equipment (PPE) when handling acids, or always rehydrating dry yeast. Are such things Myths that need to be Busted? Occasionally, but more often it is simply a playing field or work space that needs to be better explored and better defined so that more people can make better use of it. The more experiments we run, the more data we have, and the better our understanding will be. I applaud all explorers in brewing. Cheers!
John Palmer
| GREG |
The data resulting from the exBEERiments have tended not to achieve statistical significance (at the p<0.05 level). That’s okay. The objective is not to prove anything. If it were, it would be easy to manipulate the results, but there’s no secret beer agenda here. I just want data. I only want to know if slight changes in a given variable have a noticeable impact.
To that extent, it shouldn’t matter whether or not a result is statistically significant. “Lacking significance” certainly does not mean “lacking relevance, importance, and/or value,” though it is easily interpreted this way. We’re talking about probability, which is not binary, but continuous. It’s a sliding scale. The data goes into a formula, and if your X is greater than some arbitrary Y, the formula tells you that there is probably a relationship. It’s never enough to claim certainty.
Now, to contradict myself, I would rather my data be statistically significant than not. The reason is a significant result provides me with slightly more comprehensive information about the variable. To help explain this, consider the following:
When I’m performing a split batch single variable xBmt, there are three things I want to learn about the variable:
1. Does the variable produce a perceptible difference between the two beers?
2. Is the difference perceptible to a significant number of blind tasters?
3. How big of a difference does this variable make?
Let’s first consider an xBmt where the triangle test results are statistically significant. Those results would support (1) that the variable likely caused a perceptible difference and (2) that a significant portion of blind tasters were able to reliably notice that difference. Furthermore, the p-value might provide us with a hint at (3) how big of a difference the variable makes.
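For readers curious about the arithmetic behind "statistically significant" in a triangle test, here is a minimal sketch. It assumes the textbook approach, a one-sided exact binomial test in which a taster who perceives nothing picks the odd beer out by chance 1 time in 3. The post does not say which formula is actually used for the xBmts, and the function names below are hypothetical, chosen only for illustration.

```python
from math import comb


def triangle_p_value(correct: int, tasters: int, chance: float = 1 / 3) -> float:
    """Probability of seeing at least `correct` right answers out of `tasters`
    if every taster were purely guessing (assumed guess rate: 1 in 3)."""
    return sum(
        comb(tasters, k) * chance**k * (1 - chance) ** (tasters - k)
        for k in range(correct, tasters + 1)
    )


def min_correct_for_significance(tasters: int, alpha: float = 0.05,
                                 chance: float = 1 / 3):
    """Smallest number of correct picks whose p-value falls below `alpha`,
    or None if even a perfect panel cannot get there."""
    for k in range(tasters + 1):
        if triangle_p_value(k, tasters, chance) < alpha:
            return k
    return None


if __name__ == "__main__":
    # Illustrative panel size only; not taken from any particular xBmt.
    print(min_correct_for_significance(20))   # 11
    print(round(triangle_p_value(11, 20), 3))  # 0.038
```

Under these assumptions, a panel of 20 needs 11 correct picks to get below p<0.05, while 10 correct picks gives a p-value of roughly 0.09. That one-taster gap is a reminder of how thin the line between "significant" and "not significant" can be.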
Now consider an xBmt where the results didn’t reach statistical significance. Can we conclude the opposite, that (1) the variable caused no perceptible difference? Not really. Not only have there been multiple xBmts where the Brülosophy crew members claim to be able to discern a difference even though a statistically significant number of tasters were unable to, but that’s just not how inferential statistics work– the failure to demonstrate a difference in a single trial does not mean the variable has no impact.
But surely, we can conclude (2) is false, that a group of blind tasters cannot tell the difference, right? Well, maybe. In the case where 0.5>p>0.05, one could argue the data supports the possibility a perceivable difference exists, just not to the threshold required to label it significant. Additionally, just because one particular test group is unable to achieve statistical significance doesn’t mean the same is true for all test groups. These xBmts are purposely designed to err on the side of false negatives, and there is a multitude of reasons why the impact caused by a particular variable might not be reliably perceived in a given test including too few participants, small beer samples, olfactory and palate fatigue, etc. Unfortunately, non-significant results can only be interpreted very narrowly, that is, an isolated group of testers in a single experiment under certain conditions were unable to reliably detect a difference.
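The "too few participants" point can be made concrete with a rough power calculation. This is only a sketch under stated assumptions: a one-sided exact binomial test with a 1-in-3 guess rate, plus a simple model in which some fraction of tasters (`discriminators`) always spot the odd beer and everyone else guesses. Neither the model nor the function names come from the xBmts themselves.

```python
from math import comb


def upper_tail(k_min: int, n: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def critical_count(n: int, alpha: float = 0.05, chance: float = 1 / 3):
    """Smallest number of correct picks that is significant at `alpha`
    when guessing succeeds with probability `chance`; None if unreachable."""
    for k in range(n + 1):
        if upper_tail(k, n, chance) < alpha:
            return k
    return None


def power(n: int, discriminators: float, alpha: float = 0.05,
          chance: float = 1 / 3) -> float:
    """Chance that a panel of n reaches significance, assuming a fraction
    `discriminators` of tasters truly perceive the difference (hypothetical model)."""
    # True perceivers are always right; the rest are right by luck.
    p_correct = discriminators + (1 - discriminators) * chance
    k = critical_count(n, alpha, chance)
    if k is None:
        return 0.0
    return upper_tail(k, n, p_correct)


if __name__ == "__main__":
    # Compare a few illustrative panel sizes with 25% true perceivers.
    for n in (20, 40, 60):
        print(n, round(power(n, 0.25), 2))
```

In this model, a difference that only a quarter of drinkers can genuinely perceive is missed more often than it is caught with a panel of 20, and the odds of catching it improve as the panel grows. That is one way a real but subtle effect can end up as a non-significant result.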
Put simply, proving a positive is a lot easier than proving a negative. If I’m here to learn everything there is to know about beer, then all of these non-significant results slow me down because they necessitate far more retesting in order to come to any conclusions. On the bright side, this means I’m just going to have to brew even more beer. I can live with that.
| RAY |
There have been quite a few exBEERiments so far that have not yielded significant results, despite my expectations that they would. My first article for Brülosophy, Dogma Is A Funny Thing, captured exactly this sentiment by exploring how shaken I was upon learning how yeast may not be quite as sensitive to temperature as we generally accept. Through three trials with different strains, we’ve seen results that either didn’t reach statistical significance (1, 3) or were just barely over the threshold (2). Having had the opportunity to personally sample two of the three in question, my experience with the beers mirrored the statistics with the marginally significant one being ever-so-slightly different and the non-significant being totally impossible to identify.
As I’ve continued to perform xBmts, I’ve heard all manner of criticisms: the triangle test is too blunt of a tool, human perception is too unreliable to use, tasters are not of high-enough quality, tasters are not prepped to focus on a single aspect, etc… and frankly, I don’t buy any of it. Since we primarily care about how humans experience our beer, since that’s who it’s made to be consumed by, the blind triangle test and the human palate have exactly the appropriate level of fidelity, in my opinion.
I have come to a few conclusions.
First, results that are not statistically significant do not mean, “this variable doesn’t matter,” but simply, “we could not identify an effect with this test.” We can infer further applicability of findings, but that’s all it is, an inference, albeit one with perhaps some evidentiary support. Any time we generalize further than the specific evidence supports, we’re going out on a limb. For example, a vitality starter may perform similarly in a general sense to a conventional starter with a higher cell count, an inference I’ve made for myself that has influenced my practice, but I know I’ve stepped beyond the specific support of the evidence to do so.
Second, I have taken to seeing significance vs. non-significance more as an indication of how to put a given process into practice, how powerful a given adjustment knob is on beer design. For example, when I view the non-significant results of the mash temp xBmt together with the consistently significant results of the water chemistry xBmts, I notice a potential insight: whereas previously, if I wanted a given recipe to turn out crisper/drier, I might have mashed slightly cooler to encourage better attenuation, I’m now more inclined to consider upping the SO4 levels by adding gypsum, which the evidence has shown creates the impression of dryness, suggesting it’s an adjustment knob with more of an impact.
But this whole process has given me a new question to ponder, an overarching, big question I don’t have an answer to:
What causes bad beer?
We’ve mistreated a lot of batches in a lot of different ways, yet we’ve failed to produce a truly bad beer, at least by our standards. What gives?
As it is, I still treat many of these things as a sort of Pascal’s wager— I still boil my Pils malt based worts for 90 minutes, aerate my wort before pitching, and make efforts to target a specific mash temperature. I don’t necessarily view the conventional wisdom as being wrong, aside from the occasional post hoc ergo propter hoc fallacies, but I have come to accept that it probably isn’t necessarily right either.
| MALCOLM |
My opinion about the fact so many exBEERiments have not reached significance has changed and it happened rather quickly. It’s not unrealistic or unexpected for people to have some sense of ownership of the ideas and processes they subscribe to, and as such, they may be defensive if another proposes those beliefs are incorrect. Further, if some beer-famous individual has espoused something that is dearly held to be “the way,” then they’re not only defending the idea but also one of their industry idols, the person who they took the idea from. This brings to mind one of my favorite quotes:
It ain’t ignorance causes so much trouble; it’s folks knowing so much that ain’t so.
~ Josh Billings (with variants) ~
I must admit that during the first xBmt I performed, for the reasons stated above, I truly hoped significance would be reached, as I’d spent so much time studying and trying to understand the nuances of water chemistry, it would’ve seemed fruitless had it proved to be inconsequential with respect to sensory perception. I’d observed real differences as measured by various devices, but what does that matter if making such adjustments ended up having no flavor impact? Lo and behold the results were significant. Rejoice! My convictions were upheld and obsession justified! But then, after some introspection, I thought about why I actually cared about the results. Had some of my predisposed beliefs become a religion? How would you classify holding onto concepts that may be proven, more or less, to be complete bollocks? Malcolm, you just need to have faith, adding pumpkin matters, just keep doing it.
During data collection sessions, it’s common for a few people to express concern or frustration with their ability to decipher which beer is different in the triangle test. My reply typically comes in two parts:
- Why do you care? You can either perceive it or you cannot so it’s not worth expelling any energy on it. This time around you might detect a particular variable, but maybe not the next time and so it goes for other people. We have varying abilities and sensitivities to countless variables so if you do not know – guess.
- Then I tell them it doesn’t matter. There is no correct response. You might accurately identify the different sample, but who says what the right response is? Not being able to tell if a beer is boiled for 30 vs. 60 minutes has what impact? Does it affect your belief system?
It’s always been my nature to question what is presented before me, much to the chagrin of some of my teachers and supervisors, so I seize the xBmt results as an opportunity to learn, challenge my beliefs, and discover even more questions. When choosing which xBmts I wish to perform, I don’t think of it as “myth busting,” and since we don’t consider the results from one trial to be the end of debate, I don’t deem the outcome to be the truth either. Treating the potential findings thusly allows me to constantly be in the mode of discovery. Sure, a few xBmts continue to point to repeatable results, such as water chemistry, but perhaps the next challenge is to discover when, or how, water won’t produce a beer that is distinguishable by a statistically significant proportion of participants. Challenge accepted!
| MARSHALL |
I overthink everything. For me it’s normal, it’s how I work. With this comes the annoying tendency for me to refuse to accept anything as fact without at least some evidence beyond anecdote. I am a skeptic to the core. I’ve gone through phases where I attempted to shelve this aspect of my being, but every time it only ended up leading to deeper consideration and usually abandonment of my faith in the concept. From the meaningful to the mundane, if it can be questioned, it will be, often without pretense, as it’s the process of uncovering something novel and gaining new knowledge I’m most interested in. Not proving myself or anyone else right or wrong.
As in life, so it is with brewing.
From my first extract batch through my transition to all grain, I’ve always wondered about the specificities of homebrewing where the seemingly meaningless is purported to have a monumental impact. As I delved into the popularly available books, blogs, forums, and podcasts, I realized most of what I was basing my processes off of had been gleaned from professional brewing practices and often shared by folks who hadn’t performed firsthand experiments on the stuff they were talking about. So, I started to put some of the commonly accepted brewing practices to the test, mostly to satiate my curiosity, partially to see if I could simplify my brewing, and not at all to minimize the contributions of those who came before me. I fully expected to use my results to support the methods I’d adopted, to confirm my strongly held biases.
This hasn’t exactly been the case.
Trust me, I’m as surprised and confused as anyone else. There are a small number of people who accept the xBmt data as gospel, relying on it to justify changes in their process, something we regularly caution against. Then there’s an equally small number who seem to do everything they can to avoid accepting even the possibility the results might be speaking to some truth, believing our data is influenced by something other than the variable being investigated. Comments from the latter camp often consist of efforts to poke holes using 2 common contentions:
Those who invoke what I call the shitty palates argument contend the participants were unable to distinguish a difference because they’re inept at tasting beer. This comes across as a bit self-righteous to me, as it presumes the person commenting has better tasting abilities, which is easy to do when so far removed. To avoid getting too defensive about the wonderful people who regularly contribute their palates to the cause, many of whom are BJCP judges, suffice to say the entire purpose of the xBmts is to test what a typical craft beer drinker perceives, not a so-called “super taster.”
The second argument is one I actually agree with to a large degree and usually involves comments about the weaknesses of the triangle test and, in particular, using a p-value to determine whether distinguishability is statistically significant. It’s true, the p-value has its issues and should not be used to say with certainty whether some variable actually has an impact. But it’s the best we’ve got and can be used to provide data with arguably more validity than a single person’s anecdotal report. I love hearing about how others subjectively experience stuff, especially with beer, but if I’ve learned anything in my time as a beer drinker and experimenter, it’s that individual perceptions are not generalizable or anywhere near absolute. They’re really just opinions.
As much as I love Jamie and Adam, as much as I enjoy actual myths being busted, their mentality in respect to the xBmts is not one I share. I don’t view the homebrewing variables we test as myths, because they’re not. Amazing people, from ancient Sumerians and medieval German brewers to geniuses like John Palmer and Gordon Strong, worked their asses off to bring our understanding of the brewing process to where it is today. Ultimately, I like to think of the xBmts as our tiny way of taking the baton and doing our part to advance what we know about brewing and beer. It’s kind of cheesy sounding, but I feel a certain sense of responsibility to keep this going and not give in to the pressure of the doubters, those who would prefer we close up shop in order to preserve tradition. Imagine where we’d be today if we’d stopped testing new methods out way back when.
I’ll end by responding to one of the most common questions I receive:
What have you changed as a result of the xBmts?
Not much, actually! I’m no longer such a stickler about mash times and temps, I do intentionally rack some trub to my carboys, and there are a few styles I’m fine boiling for less than an hour. But for the most part, my brewing hasn’t changed much as a result of the xBmts, with one major exception– water chemistry. This is the only variable that has consistently produced significant results, which to me indicates perhaps it’s more important than we’ve been taught. Who knows though, I could be wrong.
14 thoughts on “Brü’s Views w/ John Palmer | On The exBEERiments So Far”
Merry Christmas Marshall! Thanks for all the great reads this year.
Merry Christmas to you! Thank you so much for the support.
Maybe your XBmnts haven’t had statistical results but more importantly you’ve gotten me, and probably many others, to think about the conventional wisdom of home brewing that’s out there. Merry Christmas and Happy New Year!
As always, this was a great read. I think it’s an excellent idea to reflect on the impact of the Brulosophy experiments.
The timing of your experimentation overlapped completely with the growth in my own brewing. It has caused me to really look closely at my own process. Early on, I had spent a good deal of time constructing a fermentation chamber. I read with interest many of the exbeeriments that failed to show a significant difference with respect to fermentation temperature. However, I still control that variable, and I think for me the main reason is to remove it as a concern. Before the chamber, I was constantly checking temps and worrying, “is this too high, too low, etc.” Occasionally I would have a vigorous fermentation that would run the temp up beyond even what I normally saw. Controlling this variable allows me to stop worrying, and probably also contributes to consistency batch to batch.
The main thing I have changed as a result of the Brulosophy exbeeriments (other than not racking to a secondary or worrying about trub) has been to change my water. I used to use my tap water, which tastes fine to me, but I didn’t love my beer. I have switched to building from RO, and my beer has become consistently good.
I also have learned a lot from reading about the processes that you use to brew. While that is not usually what you are testing, I love thinking about how I can do it better and more efficiently.
Thank you for devoting so much time and energy to advancing our knowledge of brewing. I agree that you are not myth-busting. For the most part, this is not urban legend, but rather principles that have served a purpose. You are refining our knowledge of those processes and their applicability in modern brewing. John Palmer is right, it really is the golden age of brewing!
Regards,
Ben
Thank you for such great feedback. Cheers!
Agreed!! Cheers for all the entertainment this year and happy holidays to you guys!
Awesome. Just awesome. Love brulosophy and love your xbmts. Always look forward to your writeups. My beer has never tasted better, and brewing has never been more enjoyable. Merry Christmas!
Great site.
I’ve always been a little worried that your experiments might be underpowered to detect small effect sizes.
Marshall,
Great write up. I too really enjoy reading your entries. So humbly well written. I should have taken English class a little more serious…
Your blog has allowed me to really relax and enjoy the brew day and brewing process. Out of all the research I do and have done, Brulosophy has had the biggest impact on how I look at beer and how I make it. Beer is forgiving, beer is good and straight up fun!
Thank you and your contributors
Merry Christmas…
Cheers
Fuzz
Merry Christmas! I love the new XBMT page, it’s great seeing a mini summary of the results. I think there’s some interesting correlation made between experiments, things like fermentation temperature don’t seem to matter at lower temperatures, but it was recognizable with the English ale yeast (even if most recognized a slight difference) fermented at higher temps. Whatever the results find, it’s a good guide to look at my own process and determine what I did play around with.
Have you ever considered doing an xBmt with a large number of changes between samples? I know that the scientific process says to change one variable at a time (and is the basis of this blog) but it would be interesting to do just one xBmt where you change all kinds of things that previously showed “no significance” and see if it still holds true. For example brew an unforgiving style like a helles, and make batch 1 based on homebrewing dogma and on batch 2 use a different malt, shorten the boil, use a similar but different strain, ferment higher, etc. Then sit some really good palates down like those found at the NHBC. It might shed light on the effect of interaction between variables, although any single effect would be difficult to pin down.
Considered it? It’s already planned 🙂
I think what your xbmnts prove conclusively is that Charlie Papazian had it right all along; Relax, don’t worry, have a homebrew!
I really enjoy the Xbeerments and the ideas behind them. I suspect the infrequency of statistical significance comes from Mr. Palmer’s observation that “Beer wants to be made” and that’s why it’s been done for thousands of years.
The engineer in me wants to see some of these delved into more. Like the Neil Degrasse Tyson quote above about “more data” I sometimes think what the Xbeerment really does is highlight the design of the experiment. Is there some aspect of the test chosen that masks the area where the variable in question comes into play? With so many beers and styles there are lots of corners where some of these practices come in to play (as John Palmer says in his commentary). If the recipe tested doesn’t get into one of those corners then that is perhaps a reason why we don’t find many tests triggering a difference.
We all like different styles of beer, but I have to say “huh?” when trying to taste the difference in yeast in a beer with 80 IBU. To me that recipe masked the perhaps subtle variable being tried.
There are often hints in the reactions to the testing that might suggest a second test where the variable being tested might become apparent. As Neil says, when we can’t agree we need more data.