You can’t get far into an audio forum or the comments sections of audio websites without encountering the statement “Some products that measure well sound bad, and some products that measure poorly sound good.” Depending on who said it, it’s at best uninformed and at worst a lie. And it’s a lie that sometimes sticks listeners with underperforming audio gear.

The inaccuracy of the first half of that sentence can be shown in scientific papers and in the absence of documented examples. The second half could be true if the words “. . . to me” were added, but to the best of my recollection, I’ve always seen it presented as a universal statement, in which case it’s false. This platitude reflects not wisdom, but a rejection of science by people who, as far as I can tell, haven’t bothered to look into the science and have no measurement experience.

The Biggest Lie

One of the most glaring examples of this sentiment appeared just this month, in a review of the Tannoy Revolution XT 6 speaker by Herb Reichert in the July 2020 issue of Stereophile. The first sentence of the review reads, “I’ve been wrestling with my elders about new ways to measure loudspeakers, lobbying for methods that might collaborate [sic] more directly with a listener’s experience.” In another article, the same writer states his opinion more directly: “As a tool for evaluation, or as a predictor of user satisfaction, today’s measuring procedures are almost useless.” As we’ll see, this review clearly shows why measurements are so essential in the evaluation of audio products.

Both of the author’s statements reflect ignorance of the subject. In the case of speakers, measurement methods that have been shown to predict user satisfaction with 86% correlation were established more than 30 years ago. They were developed largely through extensive research led by Dr. Floyd Toole, conducted at Canada’s National Research Council (NRC) in Ottawa, and continued at Harman International. Countless speaker companies now use these methods as a design guideline. That’s because they know that speakers that measure well according to these principles will sound good to most listeners.

Some might point out that the model fails 14% of the time, but it’s unlikely that the 14% of speakers that measure well but didn’t win universal love from the listening panel sound “bad,” unless they have, say, high distortion -- which a different set of measurements could easily detect. Regardless, it’s absurd to proclaim an 86% success rate “almost useless.”

More recently, scientific research has produced headphone and earphone measurements that predict user satisfaction about as accurately. For example, in AES paper 9878, “A Statistical Model that Predicts Listeners’ Preference Ratings of In-Ear Headphones: Part 2 -- Development and Validation of the Model,” a Harman International research team of Dr. Sean Olive, Todd Welti, and Omid Khonsaripour reports a 91% correlation between measurements and listener preferences in an evaluation of 30 earphones using 71 listeners.
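
To make the word "correlation" in these results concrete, here is a minimal Python sketch of the Pearson correlation coefficient between a model's predicted preference ratings and the ratings a listening panel actually gave. It is not the model from the NRC or Harman research; the function name `pearson_r` and every rating value below are hypothetical, chosen only for illustration.

```python
from math import sqrt


def pearson_r(predicted, observed):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sum of products of deviations from each mean (unnormalized covariance).
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_p * var_o)


# Hypothetical data: ratings predicted from measurements vs. mean ratings
# from a blind listening panel, one pair per product (not real results).
predicted = [2.0, 3.5, 4.0, 5.5, 6.0, 7.5]
observed = [2.5, 3.0, 4.5, 5.0, 6.5, 7.0]

r = pearson_r(predicted, observed)
print(f"correlation between predicted and observed ratings: {r:.2f}")
```

A value near 1.0 means the predictions rise and fall with the listeners' ratings; a value near 0 means the measurements tell you almost nothing about what the panel preferred.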


I’ll agree that measurements don’t predict which amps, DACs, and other electronics people will like. But that’s not because of flaws in the measurements -- it’s because listeners rarely agree about which audio electronics they like. Blind tests seldom show clear differences between, or preferences for, certain models, brands, or types of amplifiers, for instance. Reviews of these products do not indicate preference trends among reviewers; they tend to rave about all sorts of amps and DACs. If a statistically significant number of participants in controlled listening tests don’t express affection for some audio electronics and disdain for others, there’s no way measurements or subjective reviews can predict listener preference.

What about the idea that “some products that measure poorly sound good”? A solid argument against this notion came from Stereophile technical editor (and former editor-in-chief) John Atkinson, who, in a summary of his 1997 AES presentation, stated, “. . . once the response flatness deviates above a certain level -- a frequency-weighted standard deviation between 170Hz and 17kHz of approximately 3.5dB, for example -- it’s unlikely the speaker will either sound good or be recommended.” And he’s talking here about the speakers recommended by Stereophile writers. Research shows that a panel of multiple listeners in blind tests would likely be even less forgiving of speakers that measure poorly.
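
The flatness figure Atkinson describes can be roughly sketched in Python as the standard deviation of a speaker's response, in dB, across the 170Hz-to-17kHz band. This is an illustration under stated assumptions, not his method: the quote does not specify the frequency weighting he used, so this sketch leaves weighting out entirely, and the function name `flatness_std_db` and the sample response data are hypothetical.

```python
from statistics import pstdev


def flatness_std_db(freqs_hz, levels_db, low_hz=170.0, high_hz=17000.0):
    """Unweighted standard deviation (dB) of a response within a band.

    ASSUMPTION: no frequency weighting is applied; the weighting used in
    the quoted figure is not described in the article.
    """
    in_band = [
        level
        for freq, level in zip(freqs_hz, levels_db)
        if low_hz <= freq <= high_hz
    ]
    if len(in_band) < 2:
        raise ValueError("need at least two measurement points in the band")
    return pstdev(in_band)


# Hypothetical on-axis response: frequency (Hz) and level (dB, relative).
freqs = [100, 200, 500, 1000, 2000, 5000, 10000, 20000]
levels = [-3.0, 0.5, -0.5, 0.0, 1.0, 4.0, 4.5, 2.0]

deviation = flatness_std_db(freqs, levels)
print(f"response deviation, 170Hz-17kHz: {deviation:.2f}dB")
```

A perfectly flat response in the band gives 0dB; the larger the number, the further the response strays from flat.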


Of course, even a clearly flawed audio product might sound good to somebody. To find an example, look no further than the very same Tannoy review. Atkinson’s measurements show that, as he puts it, “. . . the tweeter appears to be balanced between 3dB and 5dB too high in level,” which creates an “excess of energy in the presence region, which I could hear with the MLSSA pseudorandom noise signal when I was performing the measurements.”

To get a rough idea of what this sounds like, turn the treble knob on an audio system up by 4dB. It’s far from subtle, and it’s not pleasant. I can’t look at that measurement without thinking the factory used the wrong tweeter resistor. In a blind test with multiple listeners, such as the evaluations conducted by the NRC or Harman, this speaker would almost certainly score poorly.
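
For a sense of scale, a level change in decibels converts to a plain ratio with the standard formulas: amplitude ratio = 10^(dB/20) and power ratio = 10^(dB/10). The short Python sketch below applies them to the 4dB example; the function names are hypothetical.

```python
def db_to_amplitude_ratio(db):
    """Convert a level change in dB to a voltage/amplitude ratio."""
    return 10 ** (db / 20)


def db_to_power_ratio(db):
    """Convert a level change in dB to a power ratio."""
    return 10 ** (db / 10)


boost_db = 4.0
print(f"{boost_db}dB boost -> amplitude x{db_to_amplitude_ratio(boost_db):.2f}")
print(f"{boost_db}dB boost -> power x{db_to_power_ratio(boost_db):.2f}")
```

A 4dB boost works out to roughly 1.58 times the signal amplitude, or about 2.5 times the power, across the affected range.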

Yet I find no mention of this flaw in the subjective review. In fact, the reviewer describes the speaker’s sound as “slightly soft,” and concludes with the words “Highly recommended.” Based on this review, at least, it seems likely that if a measurement technique could be found that reliably predicts which speakers this reviewer likes, most listeners won’t like those same speakers.

Fortunately, those who read the measurements got the real story. Those who ignored the measurements because they’ve been told they’re “almost useless” may end up buying a speaker with an obvious tonal-balance error.

Don’t get me wrong -- I don’t mind if someone raves about an audio product with a huge, demonstrable flaw, just as I’d hope no one minds if I occasionally enjoy listening to Kiss’s Alive! album. I’ve read many such reviews, and rarely felt inspired to comment on them. But dismissing decades of work by some of the world’s most talented audio scientists just because it doesn’t fit your narrative is as frivolous as claiming that Gene Simmons is the greatest bass player of all time.

I would hope that audio writers would be curious about their avocation and want to learn everything they can about it, but a huge percentage of them have shut themselves off from any new information that might cast some of their beliefs in doubt. In their rejection of science, they’ve mired their readers and their industry in nonsense -- and in many cases, they’ve stuck their readers in the infinite loop of buying underperforming products and then selling those to buy other flawed products, instead of simply learning key facts about audio so they can buy good gear the first time.


I’m encouraged, though, because the headphone community isn’t burdened with an anti-science attitude. On the contrary, headphone enthusiasts are putting together measurement rigs, reading the research, and working to understand how their headphones and amps work and interact. Yet they understand that science provides only guidelines, and that they ultimately have to listen for themselves and trust their ears to make the final judgment. Most important, they are getting better reproduction of, and more enjoyment from, their music. I think and hope that this is the future of audio.

. . . Brent Butterworth

  • Doug Schneider · 3 months ago
    @Klaus Christensen Hard to address everything in your post, but the information about diffraction jumps out. I think it is audible because of the magnitude of the diffraction effects, at least at times. On a really poor baffle design, I've seen it spike many decibels -- like a big ricochet that typically winds up with a big spike in the response. Now, can you hear that? *Maybe* not one, depending on frequency, but several? But more important is that it's not hard to fix all that up, so why not? I can't see the harm.

    Re: multiple tweeters. The most surprising speaker for me in years was the Aurelia Cerica, which comes from Finland. Three tweeters in a waveguide. The way it disperses sound, and how that translates to some of the most precise imaging I've ever heard, is intriguing.

  • Klaus Christensen · 3 months ago
    @Brent Butterworth Hi Brent and Doug.

    Thanks for the effort and the extent to which you are interacting with your readers. I have reread everything and I think we are all in agreement. But allow me to pass on a few of my favorite pet theories, which might be right or wrong. The point here is that I have never seen scientific proof of whether they are right or wrong.
    1: We have, since Roy Allison’s papers of the 1970s, been aware of the strong interaction between a loudspeaker and the closest room boundaries (normally the floor or the wall behind the speaker). Swedish audiophiles knew it 20 years earlier because of the work done by Stig Carlsson, from the Swedish technical university and later founder of Sonab. This often thins out the frequency response between 200Hz and 400Hz by up to 9 dB. This is undebatable, but some claim it’s desirable; I claim it destroys any aspiration to being called high fidelity. At the last high-fidelity show I attended before COVID kicked in, Audio Note had brought in a cellist so one could compare the setup they had with the live sound of the cello. The only thing I could hear was the massive lack of lower midrange from their corner speaker setup. I was completely unable to come to any conclusion as to how the speakers sounded at other frequencies. My brain could only focus on the lack of lower midrange and its destructive effect on the listening experience. Robert E. Green, one of the very few audio writers I respect from the subjective audio press, writes about this in his recent review of the Graham Audio Ls 5/95f, obviously with a much better grasp of the English language than I have. The problem is, neither he nor I can prove the psychoacoustic importance of this because (to the best of my knowledge) no research papers on its audibility have come out since Allison’s original work, which focused on the effect and its solution rather than its audibility (something Allison probably took for granted). Hardly any modern speaker design takes note of this; the Revel Salon partially compensates, and it’s a lovely speaker, so.....
    2: The next thing I would like to see is research on the audibility of diffraction. Personally I don’t believe it’s a major problem, and if this is the case, then it opens up the window for using two or more small tweeters arranged on a hemispherical baffle. The question here is: What is most important, a diffraction-free treble with limited dispersion above 8 kHz (roughly where a one-inch tweeter starts to become highly directional) or an array of small tweeters with perfect sound dispersion up to the limit of audibility but with high levels of diffraction? I don’t know! Does anyone know? I would love it if someone researched this and presented the results. If it happens, I am sure Brent and Doug will write about it.
    Let me end by stating: objectivism, research, and psychoacoustics are the way forward toward better products! Let’s hope that these will be so good that even Herb Reichert will have to take notice and start believing in measurements.

    Greetings from Denmark

  • Brent Butterworth · 3 months ago
    @Klaus Christensen We have fully arrived when it comes to predicting what speakers and headphones most people will like. We will never fully arrive when it comes to predicting with 100% accuracy what speakers and headphones a particular person likes. That will vary with their hearing, age and taste, but even more importantly, with their personal biases. Some audiophiles prefer certain products not because they sound better, but because of what those products say about them. For example, they may prefer an Audio Note speaker over a Polk even if the Polk beats the Audio Note speaker in a blind test. While industry insiders can certainly intuit which products will get a rave review from which writers (and most manufacturers I've worked with make this calculation every time they roll out a new model), there's no way an audio analyzer can gauge a person's personal biases. Maybe if combined with an algorithm that scans their social media posts and determines their preferences and prejudices. ;)
  • Doug Schneider · 3 months ago
    @Klaus Christensen "My point is (and it is the only point!) that we haven’t fully arrived."

    I agree with that. And people who say we HAVE fully arrived aren't telling the truth. If we'd fully arrived we wouldn't have to listen at all -- but, of course, in Brent's reviews, he measures and he listens, and the measurements don't always tell the whole story. But they're rarely, if ever, THAT far off. They tell a lot of the story.

    I think my (and SoundStage!'s) biggest issue with what Herb wrote is he gave the impression that measurements haven't gotten us anywhere and this, of course, is completely wrong.

  • Klaus Christensen · 3 months ago
    @Doug Schneider Basically we are in agreement!
    The subjectivism versus objectivism debate has been going on for decades, and since the 1950s we have gotten a substantially better psychoacoustic understanding of what matters in sound reproduction. This obviously allows us a much better grasp of which measurements to focus on. My point is (and it is the only point!) that we haven’t fully arrived.
    Seen from here the main problem of a purely subjective approach is that, buyers
  • Doug Schneider · 3 months ago
    @Klaus Christensen "So for me Herb Reichert is right to some extent."

    The real problem is that Herb flat out dismissed measurements entirely, and with that he was completely wrong, as Brent said. In various articles and on forums I've seen Herb make comments about certain measurements that have me questioning whether he understands the measurements themselves or what work on them has been done.

    And that's really a big part of the problem -- many of the writers who dismiss measurements have little to no understanding about them.

    Doug Schneider
  • Brent Butterworth · 3 months ago
    @Klaus Christensen Hi, Klaus. I have to say right off, Herb Reichert is flat-out wrong on this subject. He stated, “As a tool for evaluation, or as a predictor of user satisfaction, today’s measuring procedures are almost useless,” yet speaker measurements have been shown to predict user satisfaction with 86% accuracy, and headphone measurements do even better. Yes, there are still many things to explore in speaker testing, but that doesn't negate the accomplishments of prior research.

    On the boundary interaction, Dr. Toole devotes Chapter 9 of his book to that subject. He cites Allison's research and specifically examines Allison's designs in section 9.4.

    Dr. Toole strongly recommends crossing over the speakers to the subwoofer at 80 Hz (there is no room for debate on that one, it's simple physics). However, he might have recommended that systems need to go down to 20 Hz. (I can't find that in the book, but it's a big book.) You're right that few recordings contain much energy below 30 Hz (or even 40 Hz), but my blind listening tests have shown that subs whose output remains strong down to 20 Hz tend to be preferred (probably because their distortion is low) although listeners in my tests seem to place at least as much importance on the sound character of the sub. Plenty of room for a good AES paper on this one!

    I don't know if there's an ideal baffle size, but speakers of different baffle sizes can be designed to have response that conforms to the standards emerging from the NRC/Harman research. If the designer doesn't properly compensate for the baffle size, that'll show up in the measurements. That's sort of like asking, what's the ideal midrange driver size? I've encountered speakers with midrange drivers as large as 6.5" and as small as 3" that work very well with proper crossover design and a suitable tweeter and woofer, so there's probably no "ideal" size.

  • Klaus Christensen · 3 months ago
    Hi Brent.

    I am a great believer in the work of Floyd Toole and company, and have been since I came to know about it in my early twenties. However, I don’t think Harman or anyone else has discovered all there is to discover about loudspeaker performance, so yes, I believe there are frontiers not yet fully understood. Allow me to give a few examples.
    1: Roy Allison did great work on boundary and speaker interaction in the 1970s, but to this day I have not seen a study where audibility and preference have been evaluated.
    2: Floyd Toole claims (undoubtedly correctly) that there is a preference for speakers that reach down to 20 Hz. Why is this so, when most acoustic instruments only go down to the mid-30 Hz range, and that only for the very lowest note they can play? So why is there a preference for truly full-range speakers? Is it low-frequency group delay? Do we know?
    3: What is the ideal baffle size? The size of the front baffle obviously determines the frequency where the speaker goes from being an omnidirectional to a hemispherically dispersing device. But what is the ideal frequency for this to happen? To me this has never been researched properly. Maybe there is a preference for this transition to never happen, leaving us with cardioid, figure-of-eight, and omnidirectional speakers. Who knows? Do we know?

    So for me Herb Reichert is right to some extent. We know a lot, but in the future we will hopefully know more and be better able to understand and evaluate measurements.

    Greetings from Denmark

    Klaus H. Christensen

  • Brent Butterworth · 9 months ago
    @Jim Farrell You raise some good points. One could argue that the writer who rejects (and probably doesn't even read) the science on measurement because it challenges his established beliefs and his identity as an audiophile is not lying in this case. But we are talking about a publication with a technical editor who confirms and supports measurement science in practically every issue, and an editor-in-chief with a master's degree in physics who claims to be conversant with audio measurement science. Yet they are publishing writers who breezily deny the science without making a case against it. I'm comfortable in calling that a lie. They can't have it both ways. One of those statements is false and the editor and technical editor know it.

    You're 100% right that cultural biases, brand identification, etc., can play a big role in one's preference for an audio component. In fact, we can celebrate that. But I think my readers deserve, as much as possible, an unbiased assessment of a component's performance. They already know if they are drawn to some particular brand or technology.

    Our current range of audio measurement tools is quite remarkable, in my opinion; I've owned many of them and used almost all of them. As noted, using these tools we can accurately predict user preference in speakers and headphones, and it's impossible for them to predict user preference in electronics when no such preferences can be discerned in listening tests. A lot of brilliant people worked years to develop those tools, and if you're going to call them "pathetic," then back up your statement with specific criticisms that demonstrate your qualifications to make such a statement.

    Yes, audio is a hobby, but evaluating audio components is my profession, and has been since the early 1990s. I take my gig very seriously and I want to do the best job I can at it. That's why I include measurements.
  • Jim Farrell · 9 months ago
    I agree with your premise up to a point. Rejecting measurements entirely is clearly a scientifically indefensible position to take. Calling it a lie, however, suggests that those who promote it know it is wrong, but say it anyway for some unspecified purpose or personal gain. That is somewhat harsh and requires backing up. And yes, I know you said "at best uninformed," but the title of your article is "The Biggest Lie in Audio," so you don't get to roll back from that in the text.

    Also, if there is an 86% correlation between good measurements and the perception of good sound, that is indeed highly significant, but it means that 14% of people did not hear good sound from well-measuring components. So for some people at some time under some circumstances, good-measuring components can indeed sound bad, and it may well be that the opposite is also true.

    The problem lies at the interface between what we measure and what we hear. We only measure a limited number of parameters, and these take little cognizance of biological variations between individuals' hearing, psychological biases around what sounds good and what doesn't, expectations regarding brands and sources and price, cultural bias involving musical taste and what we grew up with, and also, crucially, what we are used to in our own rooms in our own houses. It goes on and on, and our currently rather pathetic range of measuring tools cannot even scratch the surface.

    Frankly both absolutist camps are wrong. Absolutists generally are.

    If you want to rely on science, then double-blind listening setups with a big enough sample size to account for the psychoacoustical issues laid out above are probably the way to go, but it's a hobby for goodness' sake, not curing cancer or solving the mysteries of the universe, so I doubt that is going to happen. In the meantime everyone in both camps and on the fence can continue to enjoy arguing with each other. Like this.
  • Brent Butterworth · 9 months ago
    @Todd fetterman I can't tell anything from those measurements, unfortunately. If they were averaged over, say, 6 positions, which largely eliminates the effects of room modes, I'd put more faith in them. Those measurements show a +5dB bump at 1.2kHz, which is something I have not seen in any other Revel speaker. It is possible to do useful in-room measurements -- John Atkinson does them for Stereophile, and I often do them as backup for my quasi-anechoic measurements -- but real analysis requires anechoic or quasi-anechoic measurement. If you visit this page and go down to "Speaker Measurements 101," I cover this topic in depth. http://www.brentbutterworth.com/writing.html
  • Todd fetterman · 9 months ago
    @Brent Butterworth Yes, that's right.
  • Brent Butterworth · 9 months ago
    @todd Are you talking about Tom Norton's review of the Revel PerformaBe system in Sound & Vision?
  • todd · 9 months ago
    A well-known publication just published a review of a higher-end Revel speaker system and published measurements taken in the reviewer's listening space.

    If one measures that way, are they not measuring the room rather than the speaker, invalidating the measurements?

  • SECA_Alan · 10 months ago
    This article advances a fascinating issue, one that's poorly served by tribalism and is full of nuance. There's certainly no problem enjoying a technically 'less resolved' design and discovering emotional connection with music through it. However that cannot be a basis for comparison. The 'Spinorama' and associated preference ratings are a great way to understand certain design choices, engineering competency and a PIR that accounts for many of the basic building blocks of good sound.

    I thoroughly enjoy the intense discussion, the context and knowledge shared teaches the avid reader a lot.
  • Jeanette · 10 months ago
    Great article Brent! Just one of many areas where people are encouraging us to ignore the science to our peril...
  • Brent Butterworth · 10 months ago
    @Dustin Now that I think of it, I've heard a few well-regarded brands used as whipping boys in blind tests at Harman. I think that is one of them. (mm)
  • Dustin · 10 months ago
    @Brent Butterworth B&W?
  • Brent Butterworth · 10 months ago
    @Kevin Voecks 1) I so wish we could expose more people to blind testing. It'd be a great thing to do at an audio show.
    2) I bet $1 I can name the mystery brand.
  • Kevin Voecks · 10 months ago
    Bravo for telling the truth Brent! I should add one observation: People who "like," or even own a particular loudspeaker may not feel the same way about it in a properly conducted double-blind test. I once had three well known audio people participate in a double-blind test in which they did not know the identities of any of the speakers they were evaluating. All three of them owned one of the speakers in the test from a well-known brand, which they all regarded as a true reference. They all rated the very speaker they owned very poorly, disparaging it. It goes to show the power of human prejudices.
