If you read audio websites often, you’ve surely seen discussion about whether or not measurements are important in audio reviews. Unfortunately, few of the people writing about this topic have experience in audio measurement, and their comments rarely amount to anything more than excuses for why they don’t do measurements. Because measurement is such a big part of SoundStage!’s group of websites, and SoundStage! Solo in particular, I thought it important to explain why we do measurements, and what conclusions you should draw -- and not draw -- from them.
The reason we do measurements is that a subjective audio review cannot present a comprehensive, unbiased evaluation of an audio product. It’s one writer’s opinion, almost always formed after casual, sighted listening sessions. A subjective review reflects not only the sounds that reached the writer’s eardrums, but also the writer’s presuppositions about the product category, the brand the product wears, and the technology the product uses; the writer’s relationship with the manufacturer and/or the public relations person; and the writer’s concerns about what readers, other writers, and other manufacturers will think of the review. It can also be affected by the writer’s mood, the music chosen for listening, even the time of day during which the writer performed the evaluations and wrote the review. How much do these factors affect the review? We don’t know -- and neither does the reviewer, unless he possesses a depth of self-knowledge that the Buddha would envy.
These problems could be eliminated by blind testing, but for reasons I’ve discussed elsewhere, almost no audio writers do blind testing. So what we have are mostly subjective reviews that include only the writer’s reactions to a product. These reviews can be entertaining to read as a sort of audio travelogue, but because there’s no attempt to correlate the writer’s judgment with anyone else’s judgment, or with any objective standards, these reviews provide, as famed audio researcher Floyd Toole says in the new 3rd edition of Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms, “just stylish prose and opinion.”
A better method is to include performance measurements of the product being tested. Measurements offer a practical way to get beyond a writer’s opinion and provide a more comprehensive and less biased evaluation of a product.
Many audio reviewers say they reject measurements because music is about emotion, and measurements can’t gauge emotion. The audio writer, they suggest, can gauge the emotion of a certain piece of music played through a certain piece of audio equipment, and the presumption is that the reader will share his emotional reaction to this experience. But our emotional reactions to music incorporate all sorts of influences, many of which I cite above, and it’s hubristic for any audio writer to assume that your emotional reaction to a certain piece of music played over a certain system at a certain moment will correlate with his. I find it insulting when audio writers -- few of whom have demonstrated deep knowledge of audio engineering, scientific research, physics, or music -- presume that their emotional reaction to a piece of music played through a certain piece of gear will be the same as mine.
Contrary to the beliefs of many audio reviewers, measurements tell us much more about how well a component conveys the emotion of a piece of music than their opinions can. What the critics of measurement fail to realize is that the key measurements of speakers and headphones are interpreted according to how they relate to the preferences of real listeners, established through extensive blind testing. Measurements allow us to gauge a product against the opinions of dozens or hundreds of listeners, formed in conditions where bias is minimized or eliminated. This is vastly more useful than gauging a product against one reviewer’s opinion, formed in uncontrolled, casual testing with no attempt to eliminate bias.
Research in correlating measured performance with listener responses dates back at least to the 1980s. Here’s how the process generally works. The researcher brings in numerous listeners -- with a preference for trained listeners experienced at evaluating audio products -- to listen to samples of a wide variety of audio products in a particular category and pick their favorites. The researcher then measures the products to see which measurements predict the listeners’ impressions and which don’t. A target response is created based on the listeners’ comments and the measured responses of their favorite products, and then the target response is tested against listener perceptions to confirm its validity.
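The target-response step can be sketched in miniature. To be clear, this is not any lab’s actual method -- it’s a toy illustration, with all the data and the weighting scheme invented for the example, in which the target curve emerges as a preference-weighted average of the measured responses:

```python
import numpy as np

def derive_target_response(responses, ratings):
    """Toy sketch: build a target curve as the preference-weighted
    average of measured frequency responses (all data hypothetical).

    responses: (n_products, n_freq_bins) array of levels in dB
    ratings:   (n_products,) array of mean listener preference scores
    """
    responses = np.asarray(responses, dtype=float)
    weights = np.asarray(ratings, dtype=float)
    weights = weights - weights.min()      # emphasize higher-rated products
    if weights.sum() == 0:
        weights = np.ones_like(weights)    # all tied: fall back to plain mean
    weights = weights / weights.sum()
    return weights @ responses             # weighted mean at each frequency
```

Real research involves far more sophisticated weighting, smoothing, and validation; the point is only that the target comes from many listeners’ data, not from one writer’s taste.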
In these studies, researchers are typically able to develop measurements that predict listener preferences with impressive accuracy. For example, in their 2017 paper “A Statistical Model That Predicts Listeners’ Preference Ratings of In-Ear Headphones: Part 2 -- Development and Validation of the Model,” researchers Sean Olive, Todd Welti, and Omid Khonsaripour report a correlation of 0.91, with 1.0 being perfect correlation. What’s the correlation between subjective reviews and listener preferences? To my knowledge, no magazine or website has tested this, or published the resulting data.
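For readers unfamiliar with correlation coefficients, here’s a minimal sketch of how a figure like that 0.91 is computed. The ratings below are invented purely for illustration -- they are not taken from the paper:

```python
import numpy as np

# Hypothetical model-predicted vs. actual listener preference ratings
# for six earphones (invented numbers, purely to show the computation).
predicted = np.array([7.1, 5.4, 6.2, 3.9, 8.0, 4.6])
listener = np.array([6.8, 5.0, 6.5, 4.2, 7.7, 5.1])

# Pearson correlation: 1.0 = perfect agreement, 0 = no relationship.
r = np.corrcoef(predicted, listener)[0, 1]
print(f"correlation = {r:.2f}")
```

A correlation above 0.9 means the model’s predictions track the panel’s actual preferences very closely.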
Note that I’m mostly talking about frequency response measurements of loudspeakers and headphones. I’ve also found excellent correlation between my headphone isolation measurements and listeners’ perceptions of how much outside noise leaks into headphones and earphones. For those tests, my subjects were staffers of Wirecutter (a website that tests headphones and many other products), and the noise source was a recording of airplane cabin noise played through my surround-sound system.
The correlation between other measurements and listener perception is not as well established. Distortion measurements predict listener perception only in fairly extreme cases. Spectral decay, or waterfall, measurements have yet to be well correlated with listener perceptions, but they are interesting to look at and they often correspond with frequency response measurements, so I include them. Impedance and sensitivity measurements tell you little or nothing about the sound quality of headphones or speakers, but they are important for ensuring that a set of headphones can deliver optimum performance with the amplifier or source device you use.
You may be wondering why I haven’t mentioned measurements of audio electronics, such as amplifiers, preamps, and DACs. That’s because the numerous papers on the subject from the Audio Engineering Society’s E-Library show at best a tenuous and slight correlation between measurements of electronics and the results of blind listening tests. Listeners are only rarely able to consistently distinguish between these products in blind tests, and even when they can, the preferences among multiple listeners are usually too varied and mild to be meaningful. Without reasonable consistency in listener preferences, there’s nothing with which the measurements -- or the impressions of a subjective reviewer -- can be correlated.
However, listeners can distinguish among these devices when they exhibit significant flaws, such as high levels of distortion or large deviations in frequency response, and measurements can easily and reliably detect these flaws. Some of these products also have idiosyncrasies, such as high output impedance or low maximum output, that affect how well they’ll work with the other products in your system. Thus, it’s important to measure these products to see if they have any flaws, characteristics, or limitations that might affect your experience with them.
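To give one concrete example of such an idiosyncrasy: an amplifier’s output impedance forms a voltage divider with the headphone’s impedance, and because headphone impedance typically varies with frequency, a high output impedance can tilt the frequency response. The impedance values below are hypothetical, chosen only to show the effect:

```python
import math

def level_drop_db(z_headphone, z_out):
    """Level at the headphone vs. an ideal (zero-ohm) source,
    from the voltage divider formed by the two impedances."""
    return 20 * math.log10(z_headphone / (z_headphone + z_out))

# A headphone whose impedance swings from 20 ohms (midrange) to 60 ohms
# (bass resonance), driven by a 10-ohm output vs. a near-ideal 0.5-ohm one:
for z_out in (0.5, 10.0):
    tilt = level_drop_db(60, z_out) - level_drop_db(20, z_out)
    print(f"Zout = {z_out} ohms -> bass boosted {tilt:+.2f} dB vs. mids")
```

With the 10-ohm source, the bass region in this example ends up roughly 2dB hotter than the midrange; with the 0.5-ohm source, the error shrinks to near zero. That’s why output impedance matters even though it says nothing about the amplifier’s intrinsic “sound.”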
I certainly understand why most audio publications avoid measurements. I think most audio engineers would agree that it takes at least a couple of years’ experience to become proficient in any one type of measurement, plus incalculable hours to actually run the measurements and analyze the results. It’s also costly: while there are a few good, affordable audio measurement systems, most cost somewhere between $3000 and $30,000. And of course, audio measurement demands more commitment, passion, and effort than most people would prefer to devote to such a dense and challenging subject. It’s much easier to pour yourself another glass of scotch and deride the measurement guys as “enemies of poetry, love, and humanistic culture.” But if that’s all a writer is willing to do, he won’t be able to provide information that can predict how the reader -- as opposed to just the reviewer -- will like the product in question.
. . . Brent Butterworth