  • Some Troubling Information About Consumer Reports’ Product Testing

    May 23rd, 2018

    AppleInsider got the motherlode. After several years of back-and-forth debates about its testing procedures, Consumer Reports magazine invited the online publication to tour its facilities in New York. On the surface, you’d think the editorial staff would be putting on their best face to get favorable coverage.

    And maybe they will. AppleInsider has only published the first part of the story, and there are apt to be far more revelations about CR’s test facilities and their potential shortcomings in the next part.

    Now we all know about the concerns: CR finds problems, or potential problems, with Apple gear. Sometimes the story never changes; sometimes it does. But the entire test process may be a matter of concern.

    Let’s take the recent review that pits Apple’s HomePod against Google’s high-end Home Max, which sells for $400, and the Sonos One. In this comparison, “Overall the sound of the HomePod was a bit muddy compared with what the Sonos One and Google Home Max delivered.”

    All right, CR is entitled to its preferences and its test procedures, but let’s take a brief look at what AppleInsider reveals about them.

    So we all know CR claims to have a test panel that listens to speakers set up in a special room that, from the front at least, comes across as a crowded audio dealer’s showroom, with loads of gear stacked up one against another. Is that the ideal setup for a speaker system that’s designed to adapt itself to a listening room?

    Well, it appears that the vaunted CR tests are little better than what an ordinary subjective high-end audio magazine does, despite the pretensions. The listening room, for example, is small with a couch, and no indication of any special setup in terms of carpeting or wall treatment. Or is it meant to represent a typical listening room? Unfortunately, the article isn’t specific enough about such matters.

    What is clear is that the speakers, the ones being tested and those used for reference, are placed in the open adjacent to one another. There’s no attempt to isolate the speakers to prevent unwanted reflections or vibrations.

    Worse, no attempt is made to perform a blind test, so that a speaker’s brand name, appearance or other factors don’t influence a listener’s subjective opinion. For example, a large speaker may seem to sound better than a small one, but not necessarily because of its sonic character. The possibility of prejudice, even unconscious, against one speaker or another, is not considered.
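
    To picture what a blind comparison could look like, here is a minimal sketch, in Python and purely for illustration (it is not anything CR describes), of how trial order might be randomized so a listener votes on what they hear rather than on a badge or a cabinet. The speaker names are placeholders:

        import random

        # A bare-bones blind preference trial (illustrative only, not CR's procedure).
        # Each trial presents the two systems in a random, hidden order; the listener
        # reports only "1" or "2", never a brand name.
        SPEAKERS = ["HomePod", "Sonos One"]  # hypothetical pairing for illustration

        def run_trials(n=10):
            votes = {name: 0 for name in SPEAKERS}
            for trial in range(1, n + 1):
                order = random.sample(SPEAKERS, 2)  # the listener never sees this
                print(f"Trial {trial}: the operator plays presentation 1, then presentation 2.")
                pick = input("Which presentation did you prefer, 1 or 2? ").strip()
                if pick in ("1", "2"):
                    votes[order[int(pick) - 1]] += 1
            return votes

        if __name__ == "__main__":
            print(run_trials())

    Over enough trials, a preference that survives a setup like that tells you something; a preference formed while staring at the cabinets may not.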

    But what about the listening panel? Are there dozens of people taking turns to give the speakers thorough tests? Not quite. The setup involves a chief speaker tester, one Elias Arias, and one other tester. In other words, the panel consists of just two people, supposedly trained as skilled listeners in some unspecified manner, with a third brought in in the event of a tie. But no amount of training can compensate for the lack of blind testing.

    Wouldn’t it be illuminating to learn whether the winning speaker still won when you couldn’t identify it? More likely, the results might be very different. But CR often appears to live in a bubble.

    Speakers are measured in a soundproof room (an anechoic chamber). The results reveal a speaker’s raw potential, but they don’t provide data on how it behaves in a normal listening room, where reflections will impact the sound that you hear. Experienced audio testers may also perform the same measurements in the actual listening location, so you can see how a real-world set of numbers compares to what the listener actually hears.

    That comparison with the figures from the anechoic chamber might also provide an indication of how the listening area impacts those measurements.
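
    As a toy illustration of why that comparison matters, here is a short Python fragment with entirely made-up SPL figures (my own invention, not CR’s data or anyone’s measurements) showing how an in-room reading can drift from the anechoic one:

        # Illustrative only: compare invented anechoic figures with invented in-room
        # figures to see how much the listening space shifts each band, in dB.
        anechoic = {"100 Hz": 86.0, "1 kHz": 88.0, "10 kHz": 87.0}   # hypothetical SPL readings
        in_room  = {"100 Hz": 92.5, "1 kHz": 87.0, "10 kHz": 84.5}   # hypothetical SPL readings

        for band in anechoic:
            delta = in_room[band] - anechoic[band]
            print(f"{band}: the room shifts the measurement by {delta:+.1f} dB")

    A bass hump and a treble rolloff of that sort are exactly the kinds of room effects the chamber alone will never show you.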

    Now none of this means that the HomePod would have seemed less “muddy” if the tests were done blind, or if the systems were isolated from one another to avoid sympathetic vibrations and other side effects. It might have sounded worse, the same, or the results might have been reversed. I also wonder if CR ever bothered to consult with actual loudspeaker designers, such as my old friend Bob Carver, to determine the most accurate testing methods.

    It sure seems that CR comes up with peculiar ways to evaluate products. Consider tests of notebook computers, where they run web sites from a server in the default browser with cache off to test battery life. How does that approach possibly represent how people will use these notebooks in the real world?
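
    For what it’s worth, a page-cycling battery rundown generally amounts to something along these lines. This is my own rough approximation in Python, not CR’s actual harness; the server address and page names are invented, and the throwaway query string merely stands in for what a no-cache browser setting accomplishes:

        import time
        import urllib.request

        # Rough approximation of a battery rundown loop (not CR's actual harness).
        # It cycles through a fixed set of pages on a local test server, appending a
        # throwaway query string so every request goes over the network, much as a
        # browser with caching disabled would behave.
        BASE_URL = "http://localhost:8000"   # hypothetical local test server
        PAGES = ["/news.html", "/photos.html", "/video.html"]

        def rundown_loop(delay_seconds=5):
            cycle = 0
            while True:                      # keeps going until the notebook shuts down
                for page in PAGES:
                    url = f"{BASE_URL}{page}?nocache={time.time()}"
                    try:
                        with urllib.request.urlopen(url, timeout=10) as response:
                            response.read()
                    except OSError:
                        pass                 # keep cycling even if one fetch fails
                    time.sleep(delay_seconds)
                cycle += 1
                print(f"Completed cycle {cycle}")

        if __name__ == "__main__":
            rundown_loop()

    Whether a loop like that looks anything like a real day of browsing is precisely the question.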

    At least CR claims to stay in touch with manufacturers during the test process, so they can be consulted in the event of a problem. That approach succeeded when a preliminary review of the 2016 MacBook Pro revealed inconsistent battery results, which were strictly the result of that outrageous test process.

    So turning off caching in Safari’s usually hidden Develop menu revealed a subtle bug that Apple fixed with a software update. Suddenly a bad review became a very positive review.

    Now I am not going to turn this article into a blanket condemnation of Consumer Reports. I hope there will be more details about testing schemes in the next part, so the flaws — and the potential benefits — will be revealed.

    In passing, I do hope CR’s lapses are mostly in the tech arena. But I also know that their review of my low-end VW claimed the front bucket seats had poor side bolstering. That turned out to be totally untrue.

    CR’s review of the VIZIO M55-E0 “home theater display” mislabeled the names of the setup menu’s features in its recommendations for optimal picture settings. It also claimed that no printed manual was supplied with the set; this is half true. You do receive two Quick Start Guides in multiple languages. In its favor, most of the picture settings actually deliver decent results.




    20 Responses to “Some Troubling Information About Consumer Reports’ Product Testing”

    1. Total says:

      no attempt is made to perform a blind test, so that a speaker’s brand name, appearance or other factors don’t influence a listener’s subjective opinion. For example, a large speaker may seem to sound better than a small one, but not necessarily because of its sonic character. The possibility of prejudice, even unconscious, against one speaker or another, is not considered.

      Wait, you mean they’re treating them like ordinary consumers might? God, if only there was something, maybe right there in the name of the publication, that suggested that they were interested in the ordinary consumer experience!

      • Well, then they are not experts, are they?

        Peace,
        Gene

        • Total says:

          They’re experts duplicating the consumer experience. That strikes me as useful, not something to criticize.

          • gene says:

            They are supposed to provide useful information, with scientific testing, to help you make decisions on what to buy. Performing a careless subjective test of a loudspeaker system, with unconfirmed and unscientific results, doesn’t help you know what to buy.

            Peace,
            Gene

    2. Total says:

      I’d rather they provided insight on how products perform in a situation that typical consumers might encounter.

      • gene says:

        Well you don’t listen to speakers in a small room surrounded by lots of other speakers.

        Peace,
        Gene

        • Total says:

          First, I assume they’re not listening to all the speakers at the same time, so the fact that they’re in the same room is irrelevant. But also, yes, actually, I do have multiple speakers (including an Echo, a Google Home, and computer speakers) in the same space. Second, I do listen to them in a room that doesn’t have special carpeting or wall treatment, and the furniture hasn’t been placed to optimize the audio quality. I also don’t blank out the names or the sizes of the speakers.

          Consider tests of notebook computers, where they run web sites from a server in the default browser with cache off to test battery life

          Er…they did that so that the computers wouldn’t cache what was — after all — the same site being accessed over and over and over again, and thus mimicking a more typical consumer who goes to multiple sites during the course of the day. That seems perfectly reasonable.

          • gene says:

            You are not testing audio products and rating sound quality for several million readers.

            How you listen is not how an experienced audio tester listens.

            When it comes to notebook battery ratings, people vary on the sort of sites they visit. Sometimes they revisit the same cached sites, sometimes they don’t. You have to attempt to approximate the usage patterns of regular users, and this oddball scheme, using features people do not ordinarily use, does not accomplish that purpose. The results do not match what consumers should expect.

            Peace,
            Gene

    3. Total says:

      You are not testing audio products and rating sound quality for several million readers.

      Yes, I know. I’m one of those several million readers that CR is testing for, and their setup is much closer to mine than some artificially created sound room.

      How you listen is not how an experienced audio tester listens.

      Absolutely correct! How I listen is as a consumer, which is, after all, who CR is trying to test for. If they were testing for experienced audio testers, then they should do it the way you suggest.

      When it comes to notebook battery ratings, people vary on the sort of sites they visit. Sometimes they revisit the same cached sites, sometimes they don’t.

      I’m extremely skeptical that any consumer goes to exactly the same site (and no other) thousands upon thousands of times in the same day, which was what CR was doing.

      Now you can argue that they should have had the computers access lots of different sites on the web, but that brings in other factors (like how those websites were doing on the day that the computer was tested).

      The results do not match what consumers should expect

      Really? So the large scale complaints about battery life that came both before and after CR’s tests were just figments of my imagination? (And Apple’s apparently — disabling the “Time Remaining” in the battery life menu is a pretty telling indicator).

      • gene says:

        Let’s get this clarified: After Apple fixed a problem with Safari when caching was disabled, CR reported that it had excellent battery life and gave it a positive rating.

        Your comments, about user complaints of battery life, are not unusual for Apple gear. In most tests, reviewers rated battery life as satisfactory, but that also depends on how people use the product. The item about “large scale complaints” has nothing to do with CR, since no such complaint was made by the magazine in the final test.

        Peace,
        Gene

    4. Total says:

      After Apple fixed a problem with Safari when caching was disabled, CR reported that it had excellent battery life and gave it a positive rating.

      Sure — after Apple fixed what was causing the problem, CR retested and changed its conclusions. That seems to me a good thing.

      The item about “large scale complaints” has nothing to do with CR, since no such complaint was made by the magazine in the final test.

      Sure it does — it suggests that consumers were running into battery life issues, just as CR reported. Now, were they because the consumers had disabled the cache? Probably not — but they were highly likely to be situations where the cache was not being used heavily (because the consumers were going to lots of different sites) and the bug was getting invoked.

      • gene says:

        Not correct.

        In its final review, CR concludes that the 2016 MacBook Pro “Has a very long battery life.”

        There is one comment from someone complaining about poor battery life, but that’s not something one can evaluate without lots of similar comments and an analysis of their usage patterns.

        Peace,
        Gene

    5. Total says:

      Not correct

      Uh, yes correct. Once Apple fixed the problem, CR retested (using the same protocol) and found the battery life was fine. That doesn’t invalidate their earlier test, it just means Apple fixed what was causing the problem.

      There is one comment from someone complaining about poor battery life

      And lots of other comments on other sites complaining about battery life, indicating that CR’s conclusions didn’t come out of nowhere.

      • gene says:

        One more time: CR’s final conclusions were favorable. There were no mentions about poor battery life. The original issue only occurred when the test was run with caching off in Safari, due to a bug that was quickly fixed. That fix eliminated the problem. And that problem was not poor battery life, but inconsistent battery life.

        Did you actually read the CR report? I have.

        Peace,
        Gene

    6. Total says:

      The original issue only occurred when the test was run with caching off in Safari, due to a bug that was quickly fixed. That fix eliminated the problem.

      Yes, when Apple fixed the problem, the battery life got better. I’m not sure why you see that as an issue with CR. They didn’t make up the issue — they mimicked consumer usage and found a problem.

      There were no mentions about poor battery life

      Because Apple fixed it. Why would they mention something that no longer applied? And it’s not like CR was shying away from it — they put out multiple articles explaining why they were now recommending the MBP.

      (And yes, I’ve read the report)

      • gene says:

        Consumers do not run Safari on their MacBook Pros with the Develop menu on. That’s only the province of developers, which is why the matter was not apparent. It was fixed over a Christmas holiday, and has little or nothing to do with any ongoing complaints about battery life. Most people would have installed the update by that time.

        Peace,
        Gene

    7. Total says:

      Consumers do not run Safari on their MacBook Pros with the Develop menu on

      Consumers do, however, run Safari in such a way that the cache is not used that heavily, which is essentially the same thing. Absent Apple specifically saying that the bug could not be invoked unless the Develop menu route was used, which I don’t believe they ever did, I tend to think that it was something that could easily have affected consumers.

      The Apple statement was very careful to say turning the cache off “does not reflect real-world usage” which is true, but says nothing about whether real-world usage could invoke the bug.

      • gene says:

        This borders on trolling.

        Apple issued a fix for a problem that only exhibited itself with the caching turned off.

        CR retested the computer after this fix was made and found the battery life inconsistencies they encountered had been fixed.

        End of story.

        Peace,
        Gene

    8. Total says:

      It’s not trolling, it’s disagreement.

      Apple issued a fix for a problem that only exhibited itself with the caching turned off.

      You don’t know that, and Apple — as far as I know — has been very careful not to claim that. If they did, I’m sure you can cite it?
