Interview Question for a Netflix Analytics Engineer: SQL

Based on how many hours they watched over the previous month, Netflix hopes to identify its VIP subscribers. Finding the top ten users with the most overall watch hours is the aim.



We utilise two datasets to solve this:

1. Users Table: Holds user information.
user_id: The user's unique identification number.
sign_up_date: The user's sign-up date.
subscription_type: Subscription type (standard, premium, basic, etc.).
 
2. Watching Activity Table: Monitors what users are streaming.
activity_id: A special number assigned to every activity.
user_id: An identifier that connects a user to an activity.
date_time: The activity's time stamp.
show_id: The show's identification.
hours_watched: The user's viewing hours during this session.

Examples of Data:

Using a SQL query to find VIP users
The SQL query will:

  1. Filter the most recent month's records (in this example, October 2024).
  2. Add up all of the users' watch hours.
  3. Users are sorted in descending order by the total number of hours they have viewed.
  4. Keep the results limited to the top ten users.
This is the querry:
WITH last_month_watching_activity AS (
    SELECT
        user_id,
        SUM(hours_watched) AS total_hours_watched
    FROM
        watching_activity
    WHERE
        date_time >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
        AND date_time < DATE_TRUNC('month', CURRENT_DATE)
    GROUP BY
        user_id
)
SELECT
    u.user_id,
    u.sign_up_date,
    u.subscription_type,
    lm.total_hours_watched
FROM
    last_month_watching_activity lm
JOIN
    users u
ON
    lm.user_id = u.user_id
ORDER BY
    lm.total_hours_watched DESC
LIMIT 10;
Detailed Description 
1. Using the WITH Clause to Create a Common Table Expression (CTE)
WITH last_month_watching_activity AS (
    SELECT
        user_id,
        SUM(hours_watched) AS total_hours_watched
    FROM
        watching_activity
    WHERE
        date_time >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
        AND date_time < DATE_TRUNC('month', CURRENT_DATE)
    GROUP BY
        user_id )
Purpose: In this section, a temporary result set named last_month_watching_activity is created, which determines how many hours each user has watched during the past month.
 
Breaking Down Each Line:
  • WITH: initiates a temporary table called a Common Table Expression (CTE), which is intended to improve readability and reuse.
  • SUM(hours_watched): sums up how many hours a user has spent watching. This indicates the total number of hours that each user viewed within the specified period.
  • date_time >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month': restricts the data to activities that began at the beginning of the previous month.
  • DATE_TRUNC('month', CURRENT_DATE): provides the current month's first day.
  • It is moved to the first day of the preceding month by subtracting INTERVAL '1 month'.
  • date_time < DATE_TRUNC('month', CURRENT_DATE): makes sure we don't include anything from the current month.
  • GROUP BY user_id: Each user receives a row in the result after the data is aggregated by user_id.
The CTE's outcome:



2. Main Query
SELECT
    u.user_id,
    u.sign_up_date,
    u.subscription_type,
    lm.total_hours_watched
FROM
    last_month_watching_activity lm
JOIN
    users u
ON
    lm.user_id = u.user_id
ORDER BY
    lm.total_hours_watched DESC
LIMIT 10;

Purpose: This is the primary query that retrieves the final results by adding user information to the CTE and sorting it to identify the top ten users.

Breaking Down Each Line:
  • SELECT u.user_id, u.sign_up_date, u.subscription_type, lm.total_hours_watched
  • Selects user_idsign_up_dateand subscription_type from the users table.
  • Adds total_hours_watched from the last_month_watching_activity CTE.
  • FROM last_month_watching_activity lm: Refers to the CTE created earlier, which has aggregated watch hours for each user.
  • JOIN users u ON lm.user_id = u.user_id
  • Matches the user IDs in the last_month_watching_activity CTE with the corresponding IDs in the users table.
  • adds user information, including sign-up date and membership type, to the aggregated data.
  • ORDER BY lm.total_hours_watched DESC: places the most active users at the top of the results by sorting them in descending order of total hours watched.
  • LIMIT 10: Limits the list of results to the top ten users.
Top 10 VIP Users as the output
The results derived from the sample data will be:

Important Points:
  • With 7.5 hours of viewing last month, User 1 is the most active user.
  • Every user is arranged according to the hours they watch.
  • The top ten results are the only ones shown.
Conclusion: Important Things to Remember
In technical interviews, knowing how to answer SQL questions like this one is crucial, particularly for positions involving data. Here's why concentrating on this query will help you:

  1. Real-World Relevance: This scenario illustrates your capacity to resolve issues that have a direct bearing on decision-making by reflecting a real-world business situation.
  2. Gaining Proficiency in Fundamental SQL Skills:
  • using date ranges to filter (time-based analysis).
  • combining information (SUMGROUP BY).
  • Results are sorted and limited for top-N analysis.
  • Interviewers frequently evaluate essential SQL operations, such as joining tables to augment data.

  1. Demonstrating Knowledge of Business: Writing SQL queries is only one aspect of it; another is analysing data to derive insightful conclusions. You can demonstrate your business savvy by emphasising the "why" behind your question.
  2. Managing Complexity: Interviewers are interested in your approach to challenging issues. You exhibit structured thinking by rationally outlining your processes, which include filtering, aggregating, enriching, and sorting.
  3. Communication Skills: Explaining your ideas well is essential during an interview. These kinds of questions allow you to demonstrate both your technical proficiency and your capacity for straightforward explanation.
  4. Adaptability: Although Netflix is the subject of this example, the ideas can be applied to any sector. Stress how these SQL concepts can be applied to other industries, such as healthcare, finance, or e-commerce.
You can establish yourself as a candidate who not only knows how to write SQL but also comprehends its strategic applications in resolving real-world issues by practicing queries like this one. This is a crucial skill for data professionals.
Hi There, I'm Yahya, and I enjoy sharing knowledge and experiences.