
Movie Review Scores Are Fundamentally Flawed


Rotten Tomatoes and Metacritic have become our first stop in determining how good a movie is. Until recently, I had no idea how each site arrived at their review scores. Once I found out, I realized I’d been reading them all wrong.

Where Rotten Tomatoes and Metacritic Ratings Come From

Rotten Tomatoes and Metacritic ratings are embedded in everything from movie listing apps like Flixster to Google search results. You’ve probably seen the rating next to a movie title. Experienced users might even know that each site actually has two scores: one for critics and one for regular viewers. What you may not realize is that each site calculates those numbers very differently.

To get the critics’ ratings, Rotten Tomatoes collects critic reviews from a variety of sources, usually a couple hundred or so, depending on how high-profile the movie is. Each review is then categorized as either Fresh (positive) or Rotten (negative). The score you see is the percentage of the total reviews that are considered “Fresh.” So, for example, with the recent superhero clash-up Batman v Superman, the site collected 327 reviews, 90 of which fell into the positive category. Since 90 is roughly 28% of 327, that becomes the movie’s score.
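
The arithmetic is about as simple as it sounds. Here’s a minimal sketch of that math in Python, using the Batman v Superman numbers above (the function name is just for illustration):

```python
def tomatometer(fresh_reviews: int, total_reviews: int) -> int:
    """Percentage of collected reviews categorized as 'Fresh', rounded to a whole number."""
    return round(100 * fresh_reviews / total_reviews)

# Batman v Superman figures cited above: 90 positive reviews out of 327 collected.
print(tomatometer(90, 327))  # -> 28
```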

Metacritic, on the other hand, uses a bit more nuance in its system. The company collects reviews from around the web and assigns each one a score from 0 to 100. In cases where a site uses a measurable metric (like a numerical rating or a letter grade), Metacritic fills in the number it believes most closely represents that figure. The site then takes a weighted average of all the reviews. The company doesn’t reveal how much weight it assigns to individual reviewers, but it does explain that certain reviewers are given more significance in the overall score based on their “stature.” This approach lets more of each review’s nuance show through. In the case of Batman v Superman, Metacritic gave the movie a 44, which is considerably higher than the 28% Rotten Tomatoes gave it.
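
Since Metacritic doesn’t disclose its weights, the sketch below only illustrates the general shape of the calculation: a weighted average over converted 0–100 scores. The reviews and weights here are entirely made up.

```python
def metascore(reviews):
    """Weighted average of (score, weight) pairs on a 0-100 scale.

    The weights are hypothetical; Metacritic doesn't reveal how much
    'stature' any individual critic actually carries.
    """
    total_weight = sum(weight for _, weight in reviews)
    weighted_sum = sum(score * weight for score, weight in reviews)
    return round(weighted_sum / total_weight)

# Made-up example: three outlets, with the second given extra weight.
print(metascore([(60, 1.0), (40, 1.5), (30, 1.0)]))  # -> 43
```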

It’s worth pointing out that Rotten Tomatoes and Metacritic (as well as IMDb) also have separate user scores. These work more or less consistently across all three sites: users rate a movie on a scale from one to ten (technically, Rotten Tomatoes uses a five-star rating, but half-stars are allowed, which makes the math functionally identical). Each site then weights those ratings in its own way to arrive at its final user score.
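
The half-star detail is just a scale conversion; a two-line sketch makes the equivalence obvious:

```python
def stars_to_ten_point(stars: float) -> float:
    """Convert a five-star rating with half-star steps to its ten-point equivalent."""
    return stars * 2

print(stars_to_ten_point(3.5))  # -> 7.0, the same rating expressed out of ten
```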

Rotten Tomatoes Drags Scores Towards the Extremes

The problem with Rotten Tomatoes’ method is that by boiling an entire review down to “good” or “bad,” it gives critical reviews the nuance of a coin flip. This dramatically sways review scores in polarizing directions. While Rotten Tomatoes doesn’t draw attention to it, you can find an “average rating” for every film directly below the Tomatometer score on the website. This figure averages reviewers’ scores after each has been assigned a value on a ten-point scale. If we look at that Batman v Superman example again, we see that its average rating is actually 4.9, the equivalent of a 49 out of 100 and higher than the 44 Metacritic gave the same movie. However, since Rotten Tomatoes treats a reviewer who thought the movie was okay but had some problems the same way it treats a reviewer who thought the movie was total crap, that slightly-below-average 4.9 gets dragged down to an abysmal 28% score.

This effect isn’t just negative, though. We can look at the other big summer superhero clash to see it in reverse. Captain America: Civil War pulls in a respectable average rating of 7.9 on Rotten Tomatoes right now, but the Tomatometer score is considerably higher at 92% (with 126 “Fresh” reviews out of 137). Once again, Metacritic’s method gives Civil War a score of 77, which is much closer to Rotten Tomatoes’ average rating. Appropriately, this effect makes the Tomatometer a bit like Captain America’s super soldier serum: Good becomes great. Bad becomes worse.
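
To see how one batch of reviews can produce two very different numbers, here’s a sketch that runs both calculations over invented ten-point scores. Only the overall pattern mirrors the two films above; the individual scores and the 6.0 “Fresh” cutoff are assumptions for illustration, since in reality each review is labeled Fresh or Rotten individually.

```python
def average_rating(scores):
    """Mean of reviewer scores on a ten-point scale."""
    return round(sum(scores) / len(scores), 1)

def tomatometer(scores, fresh_cutoff=6.0):
    """Percentage of scores at or above the cutoff (a stand-in for 'Fresh')."""
    fresh = sum(1 for score in scores if score >= fresh_cutoff)
    return round(100 * fresh / len(scores))

# Invented scores: a divisive movie full of middling reviews...
divisive = [5.5, 5.0, 4.5, 6.5, 4.0, 5.0, 4.5, 5.0, 6.0, 3.0]
# ...and a broadly liked one where almost every review clears the bar.
crowd_pleaser = [8.0, 7.5, 8.5, 7.0, 5.5, 8.0, 9.0, 7.5, 8.0, 9.0]

print(average_rating(divisive), tomatometer(divisive))            # 4.9 20
print(average_rating(crowd_pleaser), tomatometer(crowd_pleaser))  # 7.8 90
```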

The same effect applies to Rotten Tomatoes’ user scores, though it’s a bit less pronounced. Any rating of 3.5 stars or higher (7 out of 10) counts as positive, or “Fresh”; anything lower counts as negative, or “Rotten.” The user score is the percentage of positive ratings. While this is still simplistic, the source data leaves more room for a middle ground than a subjective “good” or “bad” call, and there’s a much bigger data set to pull from.
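
In code, that user-score rule is just a threshold check over the star ratings (the ratings below are made up; only the 3.5-star cutoff comes from the rule above):

```python
def audience_score(star_ratings, positive_cutoff=3.5):
    """Percentage of user ratings at or above 3.5 stars, per the rule above."""
    positive = sum(1 for stars in star_ratings if stars >= positive_cutoff)
    return round(100 * positive / len(star_ratings))

# Made-up ratings on a five-star scale with half-star steps.
print(audience_score([4.0, 3.5, 2.5, 5.0, 3.0, 4.5, 3.5, 2.0, 4.0, 3.5]))  # -> 70
```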

Metacritic Is More Nuanced, But Also Might Be More Biased

Rotten Tomatoes’ biggest problem may be that it avoids nuance, but there’s an understandable reason why it might want to. While Metacritic embraces nuance, it’s also sometimes criticized for getting it “wrong.” As we established earlier, Metacritic assigns a numeric value to each review before averaging them all together. Picking those numbers, however, can be a subjective ordeal.

For example, many review sites attach letter grades to their reviews on an A-through-F scale. In the case of an F, Metacritic would assign that review a score of 0, while a B- might receive a 67. Some reviewers disagree with how those numbers are assigned, believing that an F should be closer to a 50, or that a B should be closer to an 80. The lack of standardization across letter grades notwithstanding, this highlights a key problem with Metacritic: how do you put a numerical value on an opinion?
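
Metacritic’s full conversion table isn’t spelled out here, so the sketch below just spreads twelve letter grades evenly across the 0–100 range. Notably, even that “natural” even spacing turns a B- into a 64 rather than the 67 mentioned above, which is exactly the kind of judgment call reviewers push back on.

```python
# Hypothetical conversion: twelve letter grades spread evenly across 0-100.
# Only F -> 0 and B- -> 67 are mentioned in the text; everything else here
# is an assumption meant to show how arbitrary any such mapping is.
GRADES = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A"]

def grade_to_score(grade: str) -> int:
    """Map a letter grade to a 0-100 score by spacing the grades evenly."""
    position = GRADES.index(grade)              # F is 0, A is 11
    return round(position * 100 / (len(GRADES) - 1))

print(grade_to_score("F"))   # -> 0
print(grade_to_score("B-"))  # -> 64, not the 67 mentioned above
print(grade_to_score("A"))   # -> 100
```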

Paradoxically, Metacritic gives reviewers both more and less control over their scores. A reviewer’s rankings and opinions are represented more faithfully by a numerical score than by a boolean good/bad value. On the other hand, there’s more wiggle room, which might result in a reviewer’s opinion being represented in a way they disagree with. That can be a huge problem if an industry starts relying on review scores. Of course, if Metacritic only allowed each reviewer to choose either a 100 or a 0 (which, mathematically speaking, is exactly what Rotten Tomatoes does), there would probably be a lot more disagreement.

What Really Matters In a Review Score

No matter how “objective” we try to get when it comes to review scores, we’re still trying to convert opinions into numbers. That’s a bit like trying to turn love into a fossil fuel. The conversion doesn’t make sense on its face. However, review scores are still useful. There are a lot of movies out there, and most of us don’t have enough time or money to watch them all for ourselves. Reviewers help us determine which films are worth spending our time on. Handy review scores make it even easier, turning the decision into a simple, two-digit number. In my experience (also an opinion!), here are the best ways to use each metric:

  • Rotten Tomatoes is a basic yes/no recommendation engine. If you want a simple answer to the question “Should I see this movie?” Rotten Tomatoes probably answers it pretty well. The score isn’t necessarily reflective of how good the movie is, but it measures enthusiasm for a film pretty well. Just keep in mind that it tends to drag films to the extremes.

  • Metacritic tries to measure the value of a film based on reviewers’ opinions. Opinions are never objective, but Metacritic’s score will probably track the actual quality of a film more closely than Rotten Tomatoes’ does. The flip side is that the site may also inadvertently inject opinions of its own.

  • User reviews on all sites are generally consistent representations of the public’s opinion. There are minor variances between Rotten Tomatoes, Metacritic, and IMDb user ratings, but since they’re all open to the public, you can use any user rating to get a decent glimpse into what the average moviegoing audience thinks. Just keep in mind that it’s exactly that: the average moviegoing audience. If your tastes differ from the mainstream, you might not agree with user ratings.

Most importantly, remember that your opinions are still your own. Reviewers, no matter how well-intentioned, come from different backgrounds than you and might enjoy some things you don’t. Moviegoers like to follow review scores like they’re a competitive sport. While that’s fun and all, it’s important to keep in mind that no score will ever be truly objective as long as it’s measuring opinions. Use the metrics that are most helpful to you to decide what you’ll spend your time on, but don’t let a number tell you what to like or dislike.