
If you read audio websites often, you’ve surely seen discussion about whether or not measurements are important in audio reviews. Unfortunately, few of the people writing about this topic have experience in audio measurement, and their comments rarely amount to anything more than excuses for why they don’t do measurements. Because measurement is such a big part of SoundStage!’s group of websites, and SoundStage! Solo in particular, I thought it important to explain why we do measurements, and what conclusions you should draw -- and not draw -- from them.

The reason we do measurements is that a subjective audio review cannot present a comprehensive, unbiased evaluation of an audio product. It’s one writer’s opinion, almost always formed after casual, sighted listening sessions. A subjective review reflects not only the sounds that reached the writer’s eardrums, but also the writer’s presuppositions about the product category, the brand the product wears, and the technology the product uses; the writer’s relationship with the manufacturer and/or the public relations person; and the writer’s concerns about what readers, other writers, and other manufacturers will think of the review. It can also be affected by the writer’s mood, the music chosen for listening, even the time of day during which the writer performed the evaluations and wrote the review. How much do these factors affect the review? We don’t know -- and neither does the reviewer, unless he possesses a depth of self-knowledge that the Buddha would envy.


These problems could be eliminated by blind testing, but for reasons I’ve discussed elsewhere, almost no audio writers do blind testing. So what we have are mostly subjective reviews that include only the writer’s reactions to a product. These reviews can be entertaining to read as a sort of audio travelogue, but because there’s no attempt to correlate the writer’s judgment with anyone else’s judgment, or with any objective standards, these reviews provide, as famed audio researcher Floyd Toole says in the new 3rd edition of Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms, “just stylish prose and opinion.”

A better method is to include performance measurements of the product being tested. Measurements offer a practical way to get beyond one writer’s opinion and deliver a more comprehensive, less biased evaluation of a product.

Many audio reviewers say they reject measurements because music is about emotion, and measurements can’t gauge emotion. The audio writer, they suggest, can gauge the emotion of a certain piece of music played through a certain piece of audio equipment, and the presumption is that the reader will share his emotional reaction to this experience. But our emotional reactions to music incorporate all sorts of influences, many of which I cite above, and it’s hubristic for any audio writer to assume that your emotional reaction to a certain piece of music played over a certain system at a certain moment will correlate with his. I find it insulting when audio writers -- few of whom have demonstrated deep knowledge of audio engineering, scientific research, physics, or music -- presume that their emotional reaction to a piece of music played through a certain piece of gear will be the same as mine.

Contrary to the beliefs of many audio reviewers, measurements tell us much more about how well a component conveys the emotion of a piece of music than their opinions can. What the critics of measurement fail to realize is that the key measurements of speakers and headphones are interpreted according to how they relate to the preferences of real listeners -- preferences established through extensive blind testing. Measurements allow us to gauge a product against the opinions of dozens or hundreds of listeners, formed in conditions where bias is minimized or eliminated. This is vastly more useful than gauging a product against one reviewer’s opinion, formed in uncontrolled, casual testing with no attempt to eliminate bias.

Research in correlating measured performance with listener responses dates back at least to the 1980s. Here’s how the process generally works. The researcher brings in numerous listeners -- with a preference for trained listeners experienced at evaluating audio products -- to listen to samples of a wide variety of audio products in a particular category and pick their favorites. The researcher then measures the products to see which measurements predict the listener impressions and which ones don’t. A target response is created based on the listeners’ comments and the measured responses of the listeners’ favorite products, and that target is then tested against listener perceptions to confirm its validity.
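To make that last step a little more concrete, here’s a minimal sketch in Python -- not any lab’s actual procedure -- of how a candidate target could be derived by averaging the measured responses of the best-liked products. The product names, ratings, rating threshold, and response values are all hypothetical.

```python
import numpy as np

# Hypothetical data: measured frequency responses (dB, on a shared frequency
# grid) for three products, plus each product's mean blind-listening rating.
freqs_hz = np.array([100, 1000, 3000, 10000])
measured_db = {
    "A": np.array([88.0, 85.0, 89.0, 80.0]),
    "B": np.array([87.0, 85.0, 88.5, 79.0]),
    "C": np.array([92.0, 85.0, 83.0, 86.0]),
}
mean_rating = {"A": 7.8, "B": 7.1, "C": 4.2}

# Average the responses of the highly rated products (the 7.0 cutoff is
# arbitrary) to form a candidate target response, which would then be
# validated in further blind listening tests.
favorites = [name for name, r in mean_rating.items() if r >= 7.0]
candidate_target_db = np.mean([measured_db[n] for n in favorites], axis=0)
for f, t in zip(freqs_hz, candidate_target_db):
    print(f"{f:>5} Hz: target {t:.1f} dB")
```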

In these studies, researchers are typically able to develop measurements that predict listener preferences with impressive accuracy. For example, in their 2017 paper “A Statistical Model That Predicts Listeners’ Preference Ratings of In-Ear Headphones: Part 2 -- Development and Validation of the Model,” researchers Sean Olive, Todd Welti, and Omid Khonsaripour report a correlation of 0.91, with 1.0 being perfect correlation. What’s the correlation between subjective reviews and listener preferences? To my knowledge, no magazine or website has tested this, or published the resulting data.
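To give a sense of what a correlation figure like that means, here’s a minimal sketch, with entirely made-up numbers, of how a model’s predicted preference scores can be compared with the mean ratings listeners gave in blind tests.

```python
import numpy as np

# Hypothetical numbers: a model's predicted preference scores for six sets of
# headphones vs. the mean ratings those headphones earned in blind tests.
predicted     = np.array([6.9, 5.2, 7.8, 4.1, 6.0, 3.5])
listener_mean = np.array([7.1, 5.0, 7.5, 4.4, 6.3, 3.2])

# Pearson correlation coefficient: 1.0 would mean the model predicts the
# listener ratings perfectly. (These toy numbers are not from the
# Olive/Welti/Khonsaripour paper, which reports 0.91 for its in-ear model.)
r = np.corrcoef(predicted, listener_mean)[0, 1]
print(f"correlation r = {r:.2f}")
```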

Note that I’m mostly talking about frequency response measurements of loudspeakers and headphones. I’ve also found excellent correlation between my headphone isolation measurements and listener perceptions of how much outside noise leaks into headphones and earphones. For those tests, I used listeners from the staff of Wirecutter (a website that tests headphones and many other products) as my subjects, and played a recording of airplane cabin noise through my surround-sound system.
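The arithmetic behind an isolation measurement is simple -- it’s essentially the difference between the noise level measured at the ear with and without the headphone in place. Here’s a minimal sketch with hypothetical band levels; it is not my actual test procedure.

```python
import numpy as np

# Hypothetical octave-band levels (dB) measured at an ear simulator: once with
# no headphone (the reference noise) and once with the headphone in place.
bands_hz   = np.array([125, 250, 500, 1000, 2000, 4000, 8000])
open_db    = np.array([78.0, 77.0, 75.0, 74.0, 72.0, 70.0, 66.0])
blocked_db = np.array([76.0, 74.0, 68.0, 60.0, 52.0, 45.0, 42.0])

# Isolation is simply how much quieter the outside noise is with the
# headphone on.
isolation_db = open_db - blocked_db
for f, iso in zip(bands_hz, isolation_db):
    print(f"{f:>5} Hz: {iso:4.1f} dB of isolation")
```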

The correlation between other measurements and listener perception is not as well established. Distortion measurements predict listener perception only in fairly extreme cases. Spectral decay, or waterfall, measurements have yet to be well correlated with listener perceptions, but they are interesting to look at and they often correspond with frequency response measurements, so I include them. Impedance and sensitivity measurements tell you little or nothing about the sound quality of headphones or speakers, but they are important for ensuring that a set of headphones can deliver optimum performance with the amplifier or source device you use.
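As an example of that last point, here’s a rough sketch of how sensitivity and impedance combine to set the maximum level a given source can produce from a set of headphones. The figures are hypothetical, and it ignores real-world limits such as the source’s current capability and output impedance.

```python
import math

def max_spl_db(source_vrms_max, impedance_ohms, sensitivity_db_mw):
    """Rough estimate of the loudest level a source can drive a headphone to,
    assuming the source can deliver its rated voltage into this load."""
    power_mw = 1000.0 * source_vrms_max ** 2 / impedance_ohms
    return sensitivity_db_mw + 10.0 * math.log10(power_mw)

# Example: a 1.0 Vrms phone output driving 300-ohm headphones rated 97 dB/mW
# produces roughly 102 dB SPL at full output.
print(f"{max_spl_db(1.0, 300.0, 97.0):.1f} dB SPL")
```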

You may be wondering why I haven’t mentioned measurements of audio electronics, such as amplifiers, preamps, and DACs. That’s because the numerous papers on the subject from the Audio Engineering Society’s E-Library show at best a tenuous and slight correlation between measurements of electronics and the results of blind listening tests. Listeners are only rarely able to consistently distinguish between these products in blind tests, and even when they can, the preferences among multiple listeners are usually too varied and mild to be meaningful. Without reasonable consistency in listener preferences, there’s nothing with which the measurements -- or the impressions of a subjective reviewer -- can be correlated.

However, listeners can distinguish among these devices when they exhibit significant flaws, such as high levels of distortion or large deviations in frequency response, and measurements can easily and reliably detect these flaws. Some of these products also have idiosyncrasies, such as high output impedance or low maximum output, that affect how well they’ll work with the other products in your system. Thus, it’s important to measure these products to see if they have any flaws, characteristics, or limitations that might affect your experience with them.
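For instance, a source with a high output impedance forms a simple voltage divider with the headphones it drives, so a set of headphones whose impedance varies with frequency will have its frequency response tilted. Here’s a minimal sketch of that calculation, with hypothetical impedance values.

```python
import math

def level_change_db(output_impedance_ohms, headphone_impedance_ohms):
    """Level delivered to the headphone, relative to an ideal 0-ohm source,
    from the voltage divider formed by the two impedances."""
    ratio = headphone_impedance_ohms / (headphone_impedance_ohms + output_impedance_ohms)
    return 20.0 * math.log10(ratio)

# Hypothetical earphone whose impedance swings from 8 ohms in the bass to
# 40 ohms in the midrange, driven by a 10-ohm output vs. a 0.5-ohm output.
for z_out in (0.5, 10.0):
    tilt = level_change_db(z_out, 40.0) - level_change_db(z_out, 8.0)
    print(f"Zout = {z_out:>4} ohms: response tilt of {tilt:.1f} dB")
```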


I certainly understand why most audio publications avoid measurements. I think most audio engineers would agree that it takes at least a couple of years’ experience to become proficient in any one measurement, plus incalculable hours to actually run the measurements and analyze the results. It’s also costly: while there are a few good, affordable audio measurement systems, most cost somewhere between $3000 and $30,000. And of course, audio measurement demands more commitment, passion, and effort than most people would prefer to devote to such a dense and challenging subject. It’s much easier to pour yourself another glass of scotch and deride the measurement guys as “enemies of poetry, love, and humanistic culture.” But if that’s all the writer is willing to do, they won’t be able to provide information that can predict how the reader -- as opposed to just the reviewer -- will like the product in question.

. . . Brent Butterworth

Comments

  • headphoneryan · 2 years ago
    @Brent Butterworth you wrote something? Checking
  • Brent Butterworth · 2 years ago
    @todd Thanks, Todd!

    On your first question, yes, it applies to Class D amplification, too. I've searched the Internet and the AES e-library and I can't find any research where controlled testing showed an audible difference between Class AB and Class D amps. There are anecdotal reports, but none TMK where the testing was blind. That said, I don't see the need for Class D in applications where power demand is low and power consumption/heat dissipation is not a problem.

    On your second question, an external DAC/headphone amp doesn't hurt, and many are very affordable and will have lower output impedance and higher max output power than the DAC/headphone amps built into computers. I assisted on an extensive test of those a while back: https://thewirecutter.com/reviews/best-portable-headphone-amp-with-built-in-dac/
  • todd · 2 years ago
    RE: "...at best a tenuous and slight correlation between measurements of electronics and the results of blind listening tests...."

    1. would this apply to Class D amplification as well, or do you recommend sticking with Class AB amplifier topology?
    2. would this apply to the DACs in low-end computers, or would you recommend an external DAC for computer audio?

    Love your work, Brent -- thanks for the measurements.


    P.S. Awesome to have the likes of Floyd Toole making comments.

    todd
  • headphoneryan · 2 years ago
    @Brent Butterworth thank you! I am looking forward to it. FR is the most relevant for me. What's right? I do not care about isolation, etc. Ryan
  • Brent Butterworth · 2 years ago
    @headphoneryan That's a great idea. Look for it on March 1.
  • headphoneryan · 2 years ago
    A suggestion for you: please write something about how to read the measurements you take. I try, but I do not understand most of what I see. Even FR is hard.

    Ryan
  • Brent Butterworth · 2 years ago
    @Joseph Yes, they have target curves for earphones and over-ear headphones now. My guess is that they wouldn't call them "finished," but their validity has been established through blind testing.
  • Joseph · 2 years ago
    "A target response is created based on the listeners’ comments and the responses of the listeners’ favorite products"

    Is this so-called target finished? I read the InnerFidelity article about Harman. Is the work completed?

    Joseph Tan
  • Ian Colquhoun · 2 years ago
    @Brent Butterworth It has been so refreshing to read this article, simply because it needs to be said. We use an entire suite of measurements taken in our anechoic chamber to create a Listening Window and a Sound Power curve, using our algorithm to average them. Ultimately, the final test a speaker must pass is the double-blind listening test, and the results of that testing will generally mean adjustments to the response, because of the high audibility of very low-Q artifacts that are difficult to identify visually. But I can say with certainty that the correlation between the results derived from the “Spinorama” and the results from the double-blind listening tests is very real, and large deviations from those results will produce a guaranteed loser in the double-blind listening tests. It is worth noting that other factors, like power capabilities and overall bandwidth, play a large role in overall product performance and can affect the results of a double-blind listening test. But those measurements are not the discussion here, and they tend not to be haunted by a disbelief that they need to be measured.

    Ian Colquhoun
    Axiom Audio
