Analyzing Netflix Movie Trends: Are Movies Getting Shorter?

I recently conducted a data analysis project on Netflix movies, and I’m excited to share some intriguing insights with you! In a world where binge-watching has become the norm, I set out to investigate whether movies on Netflix are getting shorter over time. The answer? Well, it’s not that straightforward.

🍿 Movie Duration Trends: While some movies have indeed become more concise, it’s essential to note that not all genres are following this trend.

Here’s what I discovered:

📉 Decline in Length: Genres like Children’s and Documentaries have shown a decline in movie length. It seems that audiences in these categories are favoring shorter, more focused content.

📈 Consistency: Interestingly, genres like Stand-up comedy have remained relatively consistent in their movie durations. This suggests that these genres are maintaining a stable viewing experience.

Further Exploration:

1. Loading your friend’s data into a dictionary

Netflix! What started in 1997 as a DVD rental service has since exploded into one of the largest entertainment/media companies by market capitalization, boasting over 200 million subscribers as of January 2021.

Given the large number of movies and series available on the platform, it is a perfect opportunity to flex our data manipulation skills and dive into the entertainment industry. Our friend has also been brushing up on their Python skills and has taken a first crack at a CSV file containing Netflix data. For their first order of business, they have been performing some analyses, and they believe that the average duration of movies has been declining.

As evidence of this, they have provided us with the following information. For the years from 2011 to 2020, the average movie durations are 103, 101, 99, 100, 100, 95, 95, 96, 93, and 90, respectively.

If we’re going to be working with this data, a good place to start is pandas. But first we’ll need to create a DataFrame from scratch. Let’s begin by creating a Python object covered in Intermediate Python: a dictionary!
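As a minimal sketch of this step, the years and averages quoted above can be stored in two lists and combined into a dictionary. The name movie_dict is the one used in the next step; the keys "years" and "durations" are our own choice for illustration.

```python
# Years 2011 through 2020, matching the range our friend reported
years = list(range(2011, 2021))

# Average movie durations in minutes, in the same order as the years
durations = [103, 101, 99, 100, 100, 95, 95, 96, 93, 90]

# Combine the two lists into a dictionary with one key per future column
movie_dict = {"years": years, "durations": durations}

print(movie_dict)
```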

2. Creating a DataFrame from a dictionary

To convert our dictionary movie_dict to a pandas DataFrame, we will first need to import the library under its usual alias. We’ll also want to inspect our DataFrame to ensure it was created correctly. Let’s perform these steps now.
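One way to do this is sketched below. The dictionary is rebuilt here so the snippet runs on its own; the DataFrame name durations_df and the column names are assumptions made for illustration.

```python
import pandas as pd

# Rebuild the dictionary from the previous step (key names are our own choice)
movie_dict = {
    "years": list(range(2011, 2021)),
    "durations": [103, 101, 99, 100, 100, 95, 95, 96, 93, 90],
}

# Each dictionary key becomes a column, each list element a row
durations_df = pd.DataFrame(movie_dict)

# Print the DataFrame to check that it was created correctly
print(durations_df)
```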

3. A visual inspection of our data

Alright, we now have a pandas DataFrame, the most common way to work with tabular data in Python. Now back to the task at hand. We want to follow up on our friend’s assertion that movie lengths have been decreasing over time. A great place to start will be a visualization of the data.

Given that the data is continuous, a line plot would be a good choice, with the years represented along the x-axis and the average length in minutes along the y-axis. This will allow us to easily spot any trends in movie durations. There are many ways to visualize data in Python, but matplotlib.pyplot is one of the most common packages to do so.

Note: For the plot to be tested correctly, you will need to initialize a matplotlib.pyplot Figure object before plotting. You can then create your plot as you have learned in Intermediate Python.
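A line plot along these lines might look like the sketch below. The DataFrame name durations_df, its column names, and the title text are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Our friend's yearly averages (DataFrame and column names are our own choice)
durations_df = pd.DataFrame({
    "years": list(range(2011, 2021)),
    "durations": [103, 101, 99, 100, 100, 95, 95, 96, 93, 90],
})

# Initialize a Figure object, then draw years against average durations
fig = plt.figure()
plt.plot(durations_df["years"], durations_df["durations"])

# Label the plot so the axes are unambiguous
plt.title("Netflix Movie Durations 2011-2020")
plt.xlabel("Year")
plt.ylabel("Average duration (min)")

# In a script, call plt.show() here to display the figure
```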

4. Loading the rest of the data from a CSV

Well, it looks like there is something to the idea that movie lengths have decreased over the past ten years! But equipped only with our friend’s aggregations, we’re limited in the further explorations we can perform. There are a few questions about this trend that we are currently unable to answer, including:

  1. What does this trend look like over a longer period of time?
  2. Is this explainable by something like the genre of entertainment?

Upon asking our friend for the original CSV they used to perform their analyses, they gladly oblige and send it. We now have access to the CSV file, available at the path "datasets/netflix_data.csv". Let’s create another DataFrame, this time with all of the data. Given the length of our friend’s data, printing the whole DataFrame is probably not a good idea, so we will inspect it by printing only the first five rows.
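The loading step can be sketched as follows. Because the real file is not bundled with this article, the snippet reads from a small in-memory stand-in whose rows are hypothetical placeholders, not real Netflix records. The column names are the ones mentioned in this article. With the actual file, you would pass the path "datasets/netflix_data.csv" to pd.read_csv() instead.

```python
import io

import pandas as pd

# Hypothetical placeholder rows standing in for datasets/netflix_data.csv
sample_csv = io.StringIO(
    "type,title,country,genre,release_year,duration\n"
    "Movie,Example Movie A,United States,Dramas,2015,110\n"
    "TV Show,Example Show B,United Kingdom,International TV,2018,2\n"
    "Movie,Example Movie C,India,Children,2019,45\n"
    "Movie,Example Movie D,Canada,Documentaries,2012,58\n"
    "TV Show,Example Show E,Japan,Anime Series,2020,1\n"
    "Movie,Example Movie F,United States,Stand-Up,2017,62\n"
)

# With the real data: netflix_df = pd.read_csv("datasets/netflix_data.csv")
netflix_df = pd.read_csv(sample_csv)

# Inspect only the first five rows rather than the whole DataFrame
print(netflix_df.head())
```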

5. Filtering for movies!

Okay, we have our data! Now we can dive in and start looking at movie lengths.

Or can we? Looking at the first five rows of our new DataFrame, we notice a column named type. Scanning the column, it’s clear there are also TV shows in the dataset! Moreover, the duration column we planned to use seems to represent different values depending on whether the row is a movie or a show (perhaps the number of minutes versus the number of seasons).

Fortunately, a DataFrame allows us to filter data quickly, and we can select rows where type is Movie. While we’re at it, we don’t need information from all of the columns, so let’s create a new DataFrame netflix_movies containing only title, country, genre, release_year, and duration.

Let’s put our data subsetting skills to work!
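A minimal sketch of the filtering and column selection is shown below. The rows are hypothetical placeholders so the snippet runs on its own, and the intermediate name netflix_df_movies_only is our own choice.

```python
import pandas as pd

# Hypothetical placeholder rows with the columns named in this article
netflix_df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show", "Movie"],
    "title": ["Example Movie A", "Example Show B", "Example Movie C",
              "Example Movie D", "Example Show E", "Example Movie F"],
    "country": ["United States", "United Kingdom", "India",
                "Canada", "Japan", "United States"],
    "genre": ["Dramas", "International TV", "Children",
              "Documentaries", "Anime Series", "Stand-Up"],
    "release_year": [2015, 2018, 2019, 2012, 2020, 2017],
    "duration": [110, 2, 45, 58, 1, 62],
})

# Keep only the rows whose type is "Movie"
netflix_df_movies_only = netflix_df[netflix_df["type"] == "Movie"]

# Keep only the columns of interest
netflix_movies = netflix_df_movies_only[
    ["title", "country", "genre", "release_year", "duration"]
]

print(netflix_movies.head())
```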

6. Creating a scatter plot

Okay, now we’re getting somewhere. We’ve read in the raw data, selected rows of movies, and have limited our DataFrame to our columns of interest. Let’s try visualizing the data again to inspect the data over a longer range of time.

This time, we are no longer working with aggregates but instead with individual movies. A line plot is no longer a good choice for our data, so let’s try a scatter plot instead. We will again plot the year of release on the x-axis and the movie duration on the y-axis.
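The scatter plot could be drawn roughly as follows. The handful of rows here are hypothetical placeholders standing in for the filtered netflix_movies DataFrame, and the figure size and title are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical placeholder rows standing in for the filtered movie data
netflix_movies = pd.DataFrame({
    "title": ["Example Movie A", "Example Movie C",
              "Example Movie D", "Example Movie F"],
    "release_year": [2015, 2019, 2012, 2017],
    "duration": [110, 45, 58, 62],
})

# One point per movie: release year on the x-axis, duration on the y-axis
fig = plt.figure(figsize=(12, 8))
plt.scatter(netflix_movies["release_year"], netflix_movies["duration"])
plt.title("Movie Duration by Year of Release")

# In a script, call plt.show() here to display the figure
```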

7. Digging deeper

This is already much more informative than the simple plot we created when our friend first gave us some data. We can also see that, while newer movies are overrepresented on the platform, many short movies have been released in the past two decades.

Upon further inspection, something else is going on. Some of these films are under an hour long! Let’s filter our DataFrame for movies with a duration under 60 minutes and look at the genres. This might give us some insight into what is dragging down the average.
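The filter described here can be sketched with a single boolean condition. The rows below are hypothetical placeholders, and the name short_movies is our own choice.

```python
import pandas as pd

# Hypothetical placeholder rows standing in for the filtered movie data
netflix_movies = pd.DataFrame({
    "title": ["Example Movie A", "Example Movie C",
              "Example Movie D", "Example Movie F"],
    "genre": ["Dramas", "Children", "Documentaries", "Stand-Up"],
    "release_year": [2015, 2019, 2012, 2017],
    "duration": [110, 45, 58, 62],
})

# Keep only movies shorter than 60 minutes
short_movies = netflix_movies[netflix_movies["duration"] < 60]

# Count how often each genre appears among the short movies
print(short_movies["genre"].value_counts())
```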

8. Marking non-feature films

Interesting! It looks as though many of the films that are under 60 minutes fall into genres such as “Children”, “Stand-Up”, and “Documentaries”. This is a logical result, as these types of films are probably often shorter than a 90-minute Hollywood blockbuster.

We could eliminate these rows from our DataFrame and plot the values again. But another interesting way to explore the effect of these genres on our data would be to plot them, but mark them with a different color.

In Python, there are many ways to do this, but one fun way might be to use a loop to generate a list of colors based on the contents of the genre column. Much as we did in Intermediate Python, we can then pass this list to our plotting function in a later step to color all non-typical genres in a different color!
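One possible version of such a loop is sketched below. Which genre gets which color is an assumption made for illustration, and the rows are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical placeholder rows standing in for the filtered movie data
netflix_movies = pd.DataFrame({
    "title": ["Example Movie A", "Example Movie C",
              "Example Movie D", "Example Movie F"],
    "genre": ["Dramas", "Children", "Documentaries", "Stand-Up"],
})

# Build one color per row, in row order
colors = []
for _, row in netflix_movies.iterrows():
    if row["genre"] == "Children":
        colors.append("red")
    elif row["genre"] == "Documentaries":
        colors.append("blue")
    elif row["genre"] == "Stand-Up":
        colors.append("green")
    else:
        # Every other genre is treated as a typical feature film
        colors.append("black")

print(colors)
```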

Note: Although we are using the basic colors of red, blue, green, and black, matplotlib has many named colors you can use when creating plots. For more information, you can refer to the matplotlib documentation on named colors.

9. Plotting with color!

Lovely looping! We now have a colors list that we can pass to our scatter plot, which should allow us to visually inspect whether these genres might be responsible for the decline in the average duration of movies.

This time, we’ll also spruce up our plot with some additional axis labels and a new theme with plt.style.use(). The latter isn’t taught in Intermediate Python, but can be a fun way to add some visual flair to a basic matplotlib plot. You can find more information on customizing the style of your plot in the matplotlib documentation on style sheets.
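Putting the pieces together might look like the sketch below. The rows are hypothetical placeholders, the colors list is rebuilt with a dictionary lookup only to keep the snippet short, and the "fivethirtyeight" style and the label text are our own choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical placeholder rows standing in for the filtered movie data
netflix_movies = pd.DataFrame({
    "genre": ["Dramas", "Children", "Documentaries", "Stand-Up"],
    "release_year": [2015, 2019, 2012, 2017],
    "duration": [110, 45, 58, 62],
})

# Assumed genre-to-color mapping; any other genre falls back to black
genre_colors = {"Children": "red", "Documentaries": "blue", "Stand-Up": "green"}
colors = [genre_colors.get(genre, "black") for genre in netflix_movies["genre"]]

# Apply one of matplotlib's built-in style sheets before creating the figure
plt.style.use("fivethirtyeight")
fig = plt.figure(figsize=(12, 8))

# Pass the colors list through the c argument, one color per point
plt.scatter(netflix_movies["release_year"], netflix_movies["duration"], c=colors)

plt.title("Movie duration by year of release")
plt.xlabel("Release year")
plt.ylabel("Duration (min)")

# In a script, call plt.show() here to display the figure
```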

10. What next?

Well, as we suspected, non-typical genres such as children’s movies and documentaries are all clustered around the bottom half of the plot. But we can’t know for certain until we perform additional analyses.

Congratulations, you’ve performed an exploratory analysis of some entertainment data, and there are lots of fun ways to develop your skills as a Pythonic data scientist. These include learning how to analyze data further with statistics, creating more advanced visualizations, and perhaps most importantly, learning more advanced ways of working with data in pandas. This latter skill is covered in our fantastic course Data Manipulation with pandas.

We hope you enjoyed this application of the skills learned in Intermediate Python, and wish you all the best on the rest of your journey!

-Bishesh Giri
