IMDb Movie Ratings Over the Years
It’s time for a random dose of statistics courtesy of The Internet Movie Database. Let’s consider all movies that have been released theatrically over the last 60 years and see whether there is a trend in their perceived quality over time. That is, do new movies generally receive higher or lower scores on IMDb than old movies?
Before looking at the numbers though, we need some rules to clarify what types of movies we are considering:
- We only consider theatrically-released films — no straight-to-video movies or TV movies.
- Short films that were released theatrically (such as Pixar’s Presto) are included.
- We only consider movies that have received 1000 or more votes. This restriction is to prevent movies with only a handful of votes from skewing the results too much.
- The theatrical release date of the movie must be 1950 or later.
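For anyone who wants to reproduce the filtering, the rules above could be sketched roughly as follows. The record layout (title, year, kind, rating, votes), the "theatrical" label and the sample values are all made up for illustration; they are not the column layout of the IMDb files.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records: (title, year, kind, rating, votes). All values are made up.
SAMPLE = [
    ("Film A", 1950, "theatrical", 7.9, 15000),
    ("Film B", 1950, "theatrical", 6.3, 1200),
    ("Film C", 1989, "tv", 5.0, 4000),            # excluded: TV movie
    ("Film D", 1989, "theatrical", 5.5, 900),     # excluded: under 1000 votes
    ("Film E", 1989, "theatrical", 6.1, 2500),
    ("Film F", 1949, "theatrical", 8.0, 30000),   # excluded: released before 1950
]

def keep(record, min_votes=1000, first_year=1950):
    """Apply the three filters: theatrical release, enough votes, recent enough."""
    _title, year, kind, _rating, votes = record
    return kind == "theatrical" and votes >= min_votes and year >= first_year

def scores_by_year(records):
    """Group the ratings of the surviving movies by release year."""
    by_year = defaultdict(list)
    for rec in filter(keep, records):
        by_year[rec[1]].append(rec[3])
    return dict(by_year)

by_year = scores_by_year(SAMPLE)
yearly_mean = {year: mean(scores) for year, scores in sorted(by_year.items())}
yearly_median = {year: median(scores) for year, scores in sorted(by_year.items())}
```

Short films need no special handling here, since they pass as long as they were released theatrically.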
IMDb contains 10034 movies that satisfy the above criteria. The average score (on a scale of 1 to 10) of those movies is 6.38 and the median score is 6.6. The average score per release year is given by the following graph:
As you can see, older movies (1950 – 1975) have abnormally high scores, as do very recent movies (2000 – 2009). These differences are indeed statistically significant. For example, the p-value associated with the test that the mean score in 1950 is the same as the mean score in 1989 is less than 10^-19. The p-value associated with the test that the mean score in 2008 is the same as the mean score in 1989 is about 0.0021. Other nearby years give similar p-values.
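A comparison of two yearly means along these lines could be sketched as below. This is only one possible choice: a Welch-style statistic with a normal approximation for the p-value, which is reasonable when each year has many movies. It is an assumption for illustration, not necessarily the test used to get the numbers above, and the function name is made up.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def welch_z_test(a, b):
    """Return (statistic, approximate two-sided p-value) for equal means.

    Uses unequal-variance standard errors and a normal approximation,
    so the p-value is only trustworthy for reasonably large samples.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    stat = (mean(a) - mean(b)) / se
    # Two-sided tail area under the standard normal curve.
    p = 2 * NormalDist().cdf(-abs(stat))
    return stat, p
```

Calling welch_z_test(scores_1950, scores_1989) with two lists of ratings returns the statistic and the approximate p-value. With samples as small as a few dozen movies, a proper t distribution would be the safer choice.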
So this tells us that, in general, particularly old movies receive the highest scores, followed by newly-released movies, followed by “semi-old” movies from the 1980’s and 1990’s. So why the differences? Were movies from the 1980’s really just that bad? Possibly, but the more likely explanation is that movies from the 1950’s through 1970’s have artificially higher scores because people don’t generally go back and watch the crummy movies of the last generation, so they get forgotten and do not have 1000 votes on IMDb. Will people be watching Disaster Movie in forty years? I sure hope not.
On the other hand, particularly recent movies tend to draw a fair amount of hype and fanboyism. Remember when The Dark Knight had a score of 9.8 and was at #1 on the IMDb top 250? Now, one year later, it has a score of 8.9 and is located at #9 on the top 250. It will likely dwindle a little further down over the coming years as well.
The Best and Worst of Each Year
While we’re looking at ratings of movies over the years, I suppose I might as well provide a list of the best and worst movie of each year (based on the votes of IMDb users), since such a list is not available on the IMDb website itself to my knowledge. Keep in mind that, as before, only movies with 1000 or more votes are considered. Enjoy!
Downloads:
- IMDb rating data [.zip of an Excel spreadsheet — 341KB]
- IMDb rating data [tab-delimited plaintext — 0.97MB]
I believe the 1000 vote cutoff has a strong influence on the older movie data. The list of what people have seen is skewed to the classics; most of the bad old stuff has fallen into such obscurity few have bothered to put in a rating.
One other possibility is the number of movies per year. If the number of movies released in a given year has increased with time, then the decrease could be caused by the tail. For example the number of movies with a >7-star rating in 1950 may be the same as in 2007, but there could be many more movies in 2007 with a much lower rating which causes the lower mean/median. It might be better to investigate the histogram of the number per rating per year instead.
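The per-year histogram suggested here could be computed with something like the sketch below. The (year, rating) pairs and the whole-star binning are assumptions chosen for illustration, and the sample values are made up.

```python
from collections import Counter, defaultdict
from math import floor

def rating_histograms(records):
    """Map year -> Counter of whole-star bins (bin 6 covers 6.0 to 6.9)."""
    hist = defaultdict(Counter)
    for year, rating in records:
        hist[year][floor(rating)] += 1
    return hist

def count_above(records, threshold=7.0):
    """Count, per year, the movies rated strictly above the threshold."""
    counts = Counter()
    for year, rating in records:
        if rating > threshold:
            counts[year] += 1
    return counts

# Made-up sample: both years have two movies above 7, but 2007 has a long tail.
sample = [(1950, 7.9), (1950, 7.2),
          (2007, 7.5), (2007, 8.1), (2007, 4.2), (2007, 5.0), (2007, 3.9)]
hist = rating_histograms(sample)
high = count_above(sample)
```

In the made-up sample both years have the same number of highly rated movies, yet the 2007 mean is lower because of the extra low-rated titles, which is exactly the effect described.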
@Jason Dyer – Absolutely. Nothing illustrates that point better than the fact that the “worst” movie of 1950 with 1000 or more votes is Destination Moon, which has a rating of 6.3
@batz – It’s hard to get reliable data for older movies (since, as has been said, so few people watch and vote on the old crummy movies), so I’m not sure if that’s a viable approach.
Modifying the graph so that all movies (regardless of vote count) are considered (and weighed equally) surely wouldn’t lead to very useful data since there are a handful of movies with a dozen or so votes for every “real” movie with a couple thousand votes. Similarly, if we consider all movies but weigh the “influence” of each movie proportionally to the number of votes that movie has received, we’ll end up with a graph that looks almost identical to the one provided above.
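To make the difference between the two weighting schemes concrete, here is a minimal sketch with made-up (rating, votes) pairs: one "real" movie with thousands of votes and two obscure titles with about a dozen votes each.

```python
from statistics import mean

def weighted_mean(pairs):
    """Average rating where each movie counts in proportion to its vote total."""
    total_votes = sum(votes for _rating, votes in pairs)
    return sum(rating * votes for rating, votes in pairs) / total_votes

# Made-up sample: (rating, votes)
year_sample = [(8.0, 3000), (4.0, 12), (3.0, 8)]

equal_weight = mean(rating for rating, _votes in year_sample)
vote_weight = weighted_mean(year_sample)
```

With equal weights the obscure titles drag the average down to 5.0, while the vote-weighted average stays close to the rating of the widely seen movie.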
While the average appears to be going down, don’t forget how many more movies there are now than then. There are actually more really great movies out every year than there were in most decades in the past.
Thanks for the interesting stats. Couldn’t resist pointing out that many of the worst movies you’ve listed have been royally butchered by Mystery Science Theater 3000, and it says a lot about the talent of that show that they were able to select these movies before such statistics were available. In many cases these movies went on to be the most popular MST3K episodes too. Hope this doesn’t count as thread hijacking.
Up is overtaking Inglourious Basterds on the top 250 list http://www.imdb.com/chart/top?tt0361748
Tarantino is the fuckin MAN. Cool graph too.
How did you get access to the IMDB database? I’m interested in doing a similar study on another site.
If you look, “Up” by Pixar is 2 spots higher than Inglorious Basterds. It was also made in 2009. Please check your sources. I’m so adamant on this because I just absolutely love Pixar to death. Please give Pixar the recognition it deserves. Thank you.
Is it possible to list IMDb movies by their rating:
8.0, 8.1, 8.2, etc etc.
dickie_foy@hotmail.co.uk
@Shukri
You’re assuming people voted those down because they saw the movie and thought it was bad, and did not vote them down because they are fans of MST3K and voted it down. They would certainly be low on the list, but MST3K may drop the rating.
Hi Nathaniel,
Interesting study. I wonder about the bias of the movie ranking system due to the ability (necessary, or you wouldn’t select movies for viewing) to see the ranked score of a movie. In other words, I suspect that most people will give a movie a score similar or equivalent to its existing ranking.
What about the average rating of the top ten movies per year? Or choosing the top N movies as a function of the number of movies created that year? It might give you a better understanding of how movies evolve over time, since most people only watch a few of the best movies every year anyways.
Reminds me of
http://freakonomics.blogs.nytimes.com/2009/04/24/another-way-to-look-at-free-throw-percentage/
@CapnCrunch91 – IMDb makes their database public and available for download: http://www.imdb.com/interfaces
@You’re wrong. and tabris chen – IMDb uses two different scores for a movie. I was using the raw score, not the adjusted top 250 score. Both Up and Inglourious Basterds have an 8.6 raw score (not 8.5), but I.B. won out a few decimal places down the line. I’m a big Pixar fan too (WallE is one of my favourite movies), but I don’t have an agenda here — I’m just going by the numbers.
@Scott – I’m curious about that too. Of course, on the flip side, there are many people who give a rank of 1 to a movie even if they only mildly disliked it if it’s ranked higher than they think it should be. It’s too bad that so many people vote to adjust the ranking rather than vote to give their opinion.
@MWF – That’s actually a really good idea, as it gets us (mostly) around the problem of the huge bias in the 1950’s. There’s enough data to include the best 40 or so titles from each year, so that might be fun to look at.
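The top-N idea could be sketched like this. The (year, rating) input and the function name are assumptions for illustration; years with fewer than N qualifying titles simply use every title they have.

```python
from collections import defaultdict
from heapq import nlargest
from statistics import mean

def top_n_mean_by_year(records, n=40):
    """Average rating of the n best-rated movies in each release year."""
    by_year = defaultdict(list)
    for year, rating in records:
        by_year[year].append(rating)
    return {year: mean(nlargest(n, scores))
            for year, scores in sorted(by_year.items())}

# Made-up sample with n=2: the long tail in 2007 no longer affects the result.
sample = [(1950, 8.0), (1950, 7.0),
          (2007, 9.0), (2007, 8.0), (2007, 3.0), (2007, 2.0)]
top2 = top_n_mean_by_year(sample, n=2)
```

Making N a function of the number of movies released that year, as suggested, would only require computing n per year before the nlargest call.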
bear in mind IMDB is not the opinion of the general public but just the pretty hardcore computer geeks who would log in and vote – see The Dark Knight for proof of this
Thank you for very interesting statistics! And clear & cool graphs.
@Badman – I don’t know anyone who doesn’t use IMDB for movie information. It’s not geared towards geeks… it’s geared toward people who enjoy movies.
Have you noticed how many of the “worst” column were on Mystery Science Theater 3000?
Yeah, so did I (:
I think the reason that we have an increase in ratings over the last 10 years is simply to do with kids/younger generation now having regular access to IMDb. I don’t think it’s unfair to say that most under 18s are going to be less critical (on average) than adults.
Wow! You are so cool! 😉
Hi Nathaniel – great study! thanks for sharing.
Have you ever looked at the relation between the ratings of films and their box office returns? I would be curious to see a study like that.
I was under the impression “Kiwi!” was just a run of the mill youtube video. From the description – “My Master’s Thesis Animation, which I completed while I was at The School of Visual Arts, MFA Computer Art, in New York City.”
It is good to see this data and analysis. When I downloaded the data file, I looked at the variance and noticed the variance for past year data was smaller, but with more variations. The variance for more recent years is larger with less variations. I would suggest that this too is a matter of the sampling for older movies. If the data is statistically significant for the older movies, one explanation could be that more movie aficionados are rating older movies, while a more general population is evaluating newer movies. Another factor could be the vote limit rule, which would select for those films which have larger audiences. This would make older, poorer films show up less in the data, and furthermore would be selective of the better films with higher ratings with less rating variance.
It’s funny how Kubrick has 2 of the best movies, The Killing and Dr Strangelove, as well as 1 worst movie, The Flying Padre, a documentary about an airplane-flying minister. This “movie” is definitely not the worst movie of the year, but the 1000 votes rule makes it the only mediocre movie to be voted for over 1000 times, as the only people who will have seen this since IMDB was set up will be Kubrick obsessives like myself.
You should also note that there was a concerted effort by the Mystery Science Theater 3000 (MST3K) fanclub (the Usenet group rec.arts.tv.mst3k.misc; check the Google Groups archive and use rec.arts.tv.mst3k.misc and “imdb” + “vote” to find relevant discussion threads) to “bad list” every original movie that got the MST3K delightful value-improving mockery treatment (to diminish costs of using the original movie’s licensing rights) back in 1996, and the campaign to “bad list” by voting waves kept slowly going up to 2001.
It kind of worked given that the named movies did hit the “Worst 100 Films” list, but did not drop the costs of the licensing rights for the movies. Realistically the campaign should’ve been using the “Your movies are lapsing into welcome obscurity, use MST3K to act as a joy-creating sales platform to keep your flicks in circulation.” Alas, this was not clued upon until much later in the rerelease marketing cycle.
This list is fatally flawed because it is wildly different from the IMDB top 250 list…I was going to “fix” it but after a dozen corrections, I gave up. I can’t understand for the life of me why you would put together a *new* cut on the data that IMDB has already decided to use a statistically sound Bayesian estimate for. Their result gives the just right weight to movies like Taxi Driver, Deer Hunter and Star Wars so that they can be recognized in their correct place for the given year. Please consider reworking the list using the IMDB formula. Thanks
@Jim Vierling – The point of this post was never to make a list of the best movies of all time or to “improve” upon IMDb’s top 250 list. The list at the end of the post is nothing more than a “by the way, if you’re curious” sort of thing and is beside the point.
The reason why I used raw numbers rather than those that are tweaked by IMDb’s algorithm is simply that this is the info that IMDb provided. As far as I’m aware, they don’t (or at least, they didn’t in 2009 when this post was written) provide a publicly-accessible database of all their tweaked scores, only raw scores. And I don’t have time to sift through 10000+ IMDb pages to extract that information one at a time.
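For the curious, the Bayesian estimate mentioned above is commonly cited in the form WR = (v / (v + m)) * R + (m / (v + m)) * C, where R is a movie's raw mean rating, v its vote count, m a minimum-votes parameter and C the mean rating across all movies. A rough sketch follows. Treat the formula as an assumption about how the top 250 scores are built rather than something confirmed here, and note that the values of m and C below are made up.

```python
def weighted_rating(R, v, m, C):
    """Shrink a raw rating R (from v votes) toward the overall mean C.

    m controls how many votes it takes before R dominates the result.
    """
    return (v / (v + m)) * R + (m / (v + m)) * C

# Made-up inputs: a 9.0-rated movie with as many votes as the threshold m
# lands halfway between its raw rating and the overall mean.
example = weighted_rating(R=9.0, v=1000, m=1000, C=6.0)
```

A movie with far more votes than m ends up very close to its raw rating, which is why heavily voted titles are barely affected by the adjustment.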