Day 2 — High Level System Design Series — How does YouTube stream millions of videos daily?

System design guide with handwritten math calculations

Varsha Das
Javarevisited


As we all know, YouTube is a free school and the ultimate hub for entertainment, learning, staying informed, and beyond.

With billions of users, an extensive collection of videos covering diverse topics, and a massive influx of uploads, comments, likes, and shares pouring in every second, it is no easy feat to give all these users and creators a seamless experience.

But have you ever wondered what goes on behind the scenes to make this magic happen?

Isn’t it fascinating to learn how to build a digital infrastructure capable of seamlessly managing the enormous amount of data flooding in from billions of users worldwide?

In this series on system design, we will take a deep dive into the inner workings of YouTube’s architecture from the ground up.

This is the second article in the series.

Previous article — URL Shortening System Design

In this series on system design, our aim is to break down the functionality, architecture, challenges, scalability, system APIs, performance, and resource estimation associated with each system. We will provide the essential insights required to tackle similar projects in real-world scenarios. Just as coding involves more than just writing lines of code, design also means building intuition and a structured thought process. Because you cannot really write good code without having a clear picture of your design.

If this is not enough, read through till the end of the article for a special surprise announcement that may supercharge your system design skills!

Through this series, we hope to improve your design intuition and help you become a skilled, curious engineer capable of building large-scale systems.

What will be your unique learning proposition?

If you are not very system design-savvy, you may get to learn some of the cool things below:

  • Blob Storage
  • CDN
  • Transcoder, encoder
  • Bigtable

If you have zero idea about these concepts, or are hearing about them for the first time, and want to grasp the fundamentals of System Design building blocks first, we have got you covered. Before diving into the rest of the article, you can check out the video below first:

What we will cover:

We have divided the design into 6 modules:

1. Functional and non-functional requirements

2. Building blocks in-depth - why and how

3. Resource estimation — Servers, bandwidth, storage

4. SYSTEM APIs and Database Design

5. Components, workflow diagram, video uploading + streaming flow

6. Non-functional requirements mapping

MODULE 1: REQUIREMENTS

Functional requirements

The actual functionality of this system will cover the points below. You cannot really start a design without knowing the requirements, right?

  • Stream videos: Our system should be capable of efficiently delivering high-quality content (videos) to users worldwide.
  • Upload videos: Users can upload videos of various formats and sizes, ensuring smooth processing and storage.
  • Search videos: Users can easily search videos by entering keywords matching titles.
  • Like and dislike videos: Users can like or dislike videos, according to their preference.
  • Add comments: Users can comment on the videos, fostering community engagement and feedback.
  • View thumbnails: Our system should be able to display thumbnail previews for videos to offer users a visual glimpse into the content

Non-Functional requirements

These criteria specify how the system should behave rather than what it should do.

  • High availability: Ensure that YouTube services are accessible and operational with minimal downtime to users.
  • Scalability: System should be able to handle increasing user load and content growth efficiently, allowing for seamless expansion as demand grows.
  • Good performance: System should deliver fast response times for user interactions such as video playback, search queries, and content loading.
  • Reliability: Ensure that the YouTube platform operates reliably under various conditions, minimizing the risk of system failures and data loss to provide a consistent user experience.

MODULE 2: Building Blocks we will use:

  1. Databases: To store the metadata of videos, thumbnails, comments, and user-related information.
  2. Blob Storage: For storing all videos on the platform.

Blob Storage: It is used to store large amounts of unstructured data, like images and videos.

3. CDN: To effectively deliver content to end users, reducing latency and the load on origin servers.

CDN: A Content Delivery Network (CDN) is a network of servers strategically placed around the world to deliver content quickly to users, reducing latency and ensuring smooth playback of videos on YouTube.

4. Load balancer: To distribute millions of incoming client requests among the pool of available servers.

5. Servers: To store, manage, and deliver vast amounts of video content efficiently to users worldwide.

6. Encoders and transcoders: To compress videos and transform them into different formats and qualities, so that a wide range of devices can be supported according to their screen resolution and bandwidth.

When content creators upload videos to YouTube, the encoding process converts the raw video data into a compressed format that balances video quality and file size. YouTube typically uses advanced video codecs like VP9 or AV1 for encoding, which offer high compression efficiency while maintaining good video quality.

Transcoding occurs when YouTube needs to deliver videos to users in different formats and resolutions based on their device capabilities, screen sizes, and network conditions. For example, a video uploaded in high resolution may need to be transcoded into multiple lower-resolution versions to accommodate users with slower internet connections or devices that cannot handle high-definition playback. This adaptive streaming approach minimizes buffering and playback interruptions.
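
To make this concrete, here is a minimal sketch (in Python) of how a transcoding worker might produce several renditions by shelling out to ffmpeg. It assumes ffmpeg is installed with VP9 (libvpx-vp9) and Opus (libopus) support; the resolution and bitrate ladder is purely illustrative, not YouTube’s actual configuration.

```python
import subprocess

# Illustrative resolution/bitrate ladder (not YouTube's real settings).
RENDITIONS = [
    (1080, "4M"),
    (720, "2M"),
    (480, "1M"),
    (360, "700k"),
    (240, "400k"),
]

def transcode(source: str) -> None:
    """Transcode the uploaded file into several VP9/Opus renditions."""
    for height, video_bitrate in RENDITIONS:
        out = f"{source.rsplit('.', 1)[0]}_{height}p.webm"
        subprocess.run(
            [
                "ffmpeg", "-y", "-i", source,
                "-vf", f"scale=-2:{height}",   # keep aspect ratio, even width
                "-c:v", "libvpx-vp9", "-b:v", video_bitrate,
                "-c:a", "libopus",
                out,
            ],
            check=True,
        )

if __name__ == "__main__":
    transcode("uploaded_video.mp4")  # hypothetical input file
```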

MODULE 3: ESTIMATION

In this part, we’ll do some math to figure out how many resources we will need for our system. It’s important because when we’re building something, we need to know how big it could get and plan ahead for things like storage, bandwidth, servers, and so on.

Assumptions:

  • We assume that the total number of YouTube users is 2 billion.
  • Active daily users (who watch or upload videos): 600 million.
  • The average length of a video: 10 minutes.
  • Size of an average (10-minute-long) video before processing (compression, format changes, etc.): 600 MB.
  • The size of an average video after encoding and compression: 50 MB.

Let’s break it down!

  1. Total number of YouTube users are 2 billion. This means there are 2 billion users who have their accounts on YouTube.
  2. Out of the total 2 billion users, 600 million users are active on a daily basis. These users engage with the platform by watching videos or uploading content regularly.
  3. The average length of a video on YouTube is 10 minutes. This is the typical duration of videos that users upload to the platform.
  4. Before undergoing any processing, such as compression or format changes, the average size of a 10-minute video is 600 MB. This is the initial file size of a video uploaded by a user.
  5. After the video undergoes encoding, which includes compression and format changes, its size is reduced to 50 MB. This smaller file size is what users ultimately view and interact with on the platform.

Storage Estimation:

We will first estimate the total number of videos and the length of each video uploaded to YouTube per minute.

Keep in mind that this storage estimate is for the compressed versions of videos we intend to retain, not the original uncompressed files. Obviously, the original videos would occupy more storage space. Our goal is to store shorter, compressed videos to optimize storage utilization.

Let us assume that 500 hours worth of content is uploaded to YouTube in one minute.

Since a compressed 10-minute video is 50 MB, a one-minute-long video will require 5 MB of storage.

Therefore, if 500 hours of video are uploaded per minute, that is 500 × 60 = 30,000 minutes of video, requiring 30,000 × 5 MB = 150 GB of storage per minute.

Hence, for one day we will require 150 GB × 60 × 24 = 216 TB of storage.

For one year, we will require 216 TB × 365 ≈ 79 PB of storage.

As you might have noticed, YouTube videos come in different qualities (360p, 720p, 1080p, etc.). If we consider 5 different video qualities, the overall storage estimation for one minute becomes:

For 1 minute of uploads, a total of 150 GB × 5 = 750 GB of storage would be needed.

So, the estimate of 79 PB covers just one format of each video; we need roughly five times that storage to keep videos in 5 different qualities.
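
These numbers can be reproduced with a few lines of Python, using decimal units (1 GB = 1000 MB) as the estimates above do:

```python
# Storage estimation from the assumptions above (decimal units: 1 GB = 1000 MB).
MB_PER_COMPRESSED_MINUTE = 5           # 50 MB per 10-minute compressed video
HOURS_UPLOADED_PER_MINUTE = 500
QUALITIES = 5                          # 360p, 480p, 720p, 1080p, ...

video_minutes_per_minute = HOURS_UPLOADED_PER_MINUTE * 60            # 30,000
gb_per_minute = video_minutes_per_minute * MB_PER_COMPRESSED_MINUTE / 1000
tb_per_day = gb_per_minute * 60 * 24 / 1000
pb_per_year = tb_per_day * 365 / 1000

print(f"{gb_per_minute:.0f} GB per minute")                          # 150 GB
print(f"{tb_per_day:.0f} TB per day")                                # 216 TB
print(f"{pb_per_year:.0f} PB per year")                              # ~79 PB
print(f"{gb_per_minute * QUALITIES:.0f} GB per minute for 5 qualities")  # 750 GB
```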

Bandwidth Estimation:

Let us assume that the upload : view ratio is 1:100.

This means that for every video uploaded, there are 100 views.

When a video is uploaded, it arrives in its original, uncompressed format, whereas when it is viewed, it is served in one of the 5 compressed and encoded formats.

So the bandwidth for uploading video needs to be calculated based on the original video size.

As we know, bandwidth can be defined as:

Bandwidth = Traffic × Average response size

Therefore, in this case, we can calculate the bandwidth as follows:

Total bandwidth = Total content transferred × Size required per minute

Bandwidth required for uploading:

For uploading videos, we need about 14 Tbps of bandwidth.

It means that the system requires a bandwidth capacity of 14 Terabits per second (Tbps) to handle the process of uploading videos to the platform. This bandwidth capacity is necessary to accommodate the transfer of data from users uploading videos to the servers where they are stored and processed.

Bandwidth for streaming videos:

Let us assume that each minute of compressed video streamed consumes 5 MB.

Recall that when we stream or view a video, it is served in one of the compressed and encoded formats, which is why the 5 MB-per-minute figure applies here.

Let me explain the calculation —

500 hours of content × 60 mins/hour × 5 MB (size of 1 compressed minute) × 100 (view ratio) × 8 bits, spread over the 60 seconds of that minute

This gives us roughly 2 Tbps of streaming bandwidth.

A streaming bandwidth of 2 Tbps means that the system can handle the simultaneous transfer of data at a rate of 2 terabits per second. This bandwidth capacity allows the platform to efficiently deliver video content to users in real-time without experiencing significant delays or buffering.

For example, if there are multiple users streaming videos concurrently on the platform, the total data transfer rate needed to serve all these users would ideally not exceed 2 terabits per second. This capacity ensures that users can enjoy smooth playback experiences, even during peak usage periods when there is high demand for streaming content.
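
Here is the same streaming-bandwidth arithmetic as a small Python sketch, spreading one minute’s worth of views over 60 seconds:

```python
# Streaming-bandwidth estimate from the assumptions above.
HOURS_UPLOADED_PER_MINUTE = 500
MB_PER_COMPRESSED_MINUTE = 5
VIEWS_PER_UPLOAD = 100                 # upload : view ratio of 1:100
BITS_PER_MB = 8                        # MB -> Mb (megabits)

minutes_of_video = HOURS_UPLOADED_PER_MINUTE * 60                   # 30,000
megabits_per_minute = (minutes_of_video * MB_PER_COMPRESSED_MINUTE
                       * VIEWS_PER_UPLOAD * BITS_PER_MB)
tbps = megabits_per_minute / 60 / 1_000_000                         # per second, in Tb
print(f"Streaming bandwidth ≈ {tbps:.1f} Tbps")                     # ≈ 2 Tbps
```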

Servers Estimation:

Let us assume that one server can handle 5,000 requests per second (an arbitrary but reasonable number; a typical server can handle roughly 5,000–10,000 requests per second).

So, if each of our 2 billion users generates roughly one request per second at peak, the total number of servers required to handle these requests will be:

2,000,000,000 requests per second ÷ 5,000 requests per second per server = 400,000 servers

Therefore, our system will require about 400,000 servers to handle 2 billion users.
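
As a quick sanity check, here is the same division in Python:

```python
# Server-count estimate: assume every user sends ~1 request per second at peak.
TOTAL_USERS = 2_000_000_000
REQUESTS_PER_SERVER_PER_SECOND = 5_000

servers_needed = TOTAL_USERS // REQUESTS_PER_SERVER_PER_SECOND
print(f"Servers needed: {servers_needed:,}")   # 400,000
```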

Time for the announcement:

If you are liking what you are reading, you are going to love what I have in store for you on my YouTube channel.

We are providing a range of exclusive perks, such as virtual video collaborations, member-only polls, premium content only for our members, and so much more.

In the context of this article, our exclusive videos will guide you through all the essential aspects of system design, deep-dive into the components, offer extra insights into math concepts, share valuable tips, and provide actionable strategies that can strengthen your System design skills.

Who am I?

Across 7 years, I have led projects on AWS, Java, Spring Boot, Kafka, the ELK stack, Splunk, Apache Mesos, and more, optimizing systems for cost-effectiveness, achieving savings of up to 30%, reducing latency by 50% through performance tuning, and enabling systems to handle a higher volume of transactions.

So, I am quite certain you can rely on my expertise.

In addition to that, we would also have exclusive videos dedicated to common interview questions about Java threads and concurrency, detailed plans for Spring Boot projects, and plenty of project ideas. All this special content is just for you!

By supporting our channel, you join our expanding community and get access to content you most likely won’t find anywhere else.

To become a valued member right now, click this link.

MODULE 4: SYSTEM APIs and Database Design

We’ll create a separate API for each functionality of the system. Let’s have a look at them:

  1. Upload videos
  2. Stream videos
  3. Search videos
  4. View thumbnails
  5. Like or dislike videos
  6. Comment on videos

Upload Video:

The uploadVideo() API sends a POST request containing the video data.

Stream Video:

The streamVideo() API sends a GET request to fetch the video data.

Search Video:

The searchVideo() API sends a GET request to fetch the requested list of videos.

View thumbnails:

The viewThumbnails() API sends a GET request to fetch the thumbnails of a video.

Like or Dislike videos:

The likeDislike() API sends a PUT request to update the count of likes or dislikes in the video.

Comment on videos:

The commentVideo() API sends a POST request containing the comment data on a specified video.
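
Putting the six endpoints together, a hypothetical set of API signatures might look like the sketch below; the parameter names, paths, and return values are illustrative assumptions, not YouTube’s actual API.

```python
# Hypothetical API signatures (illustrative only; names and parameters are assumptions).

def upload_video(user_id: str, video_data: bytes, title: str,
                 description: str, tags: list[str]) -> str:
    """POST /videos — uploads raw video data; returns the new video_id."""
    ...

def stream_video(user_id: str, video_id: str, resolution: str,
                 offset_seconds: int = 0) -> bytes:
    """GET /videos/{video_id}/stream — returns a chunk of encoded video."""
    ...

def search_video(user_id: str, keywords: str, max_results: int = 20) -> list[str]:
    """GET /videos/search — returns video_ids whose titles match the keywords."""
    ...

def view_thumbnails(video_id: str) -> list[bytes]:
    """GET /videos/{video_id}/thumbnails — returns the video's thumbnails."""
    ...

def like_dislike(user_id: str, video_id: str, like: bool) -> None:
    """PUT /videos/{video_id}/reaction — updates the like or dislike count."""
    ...

def comment_video(user_id: str, video_id: str, comment_text: str) -> str:
    """POST /videos/{video_id}/comments — adds a comment; returns comment_id."""
    ...
```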

Database Design:

  1. User Table:
  • user_id: A unique identifier for each user. Serves as the primary key for the table.
  • username: The chosen username of the user for identification purposes.
  • email: The email address of the user for communication and authentication.
  • password: The user's password for secure storage and authentication.

2. Video Table:

  • video_id: A unique identifier for each video. Serves as the primary key for the table.
  • user_id: A reference to the user who uploaded the video. Foreign key which references the User Table.
  • title: The title or name of the video.
  • description: A brief summary of the video content.
  • upload_date: The timestamp indicating when the video was uploaded.
  • duration: The duration of the video in seconds or another appropriate unit.
  • views: The number of views the video has received.
  • likes: The number of likes the video has received.
  • dislikes: The number of dislikes the video has received.
  • location: URL of the video file.
  • thumbnail_location: URL of the thumbnail image associated with the video.

3. Comment Table:

  • comment_id: A unique identifier for each comment. This field serves as the primary key for the table.
  • video_id: Video id to which the comment belongs. Foreign key which references the Video Table.
  • user_id: User who posted the comment. Foreign key which references the User Table.
  • comment_text: The text content of the comment.
  • comment_date: The timestamp indicating when the comment was posted.

4. Channel Table:

  • channel_id: A unique identifier for each channel. This field serves as the primary key for the table.
  • user_id: User who owns the channel. Foreign key which references the User Table.
  • channel_name: The name of the channel.
  • description: A short summary of the channel content.
  • created_at: The timestamp indicating when the channel was created.
  • subscribers: The number of subscribers to the channel.
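
For illustration only, here is the schema above expressed as SQL DDL executed through Python’s built-in sqlite3 module; a real deployment would of course use a distributed, replicated database rather than SQLite.

```python
import sqlite3

# Minimal sketch of the relational schema described above, using in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id   INTEGER PRIMARY KEY,
    username  TEXT NOT NULL,
    email     TEXT NOT NULL,
    password  TEXT NOT NULL            -- store a salted hash, never plaintext
);
CREATE TABLE videos (
    video_id           INTEGER PRIMARY KEY,
    user_id            INTEGER REFERENCES users(user_id),
    title              TEXT,
    description        TEXT,
    upload_date        TEXT,
    duration           INTEGER,        -- in seconds
    views              INTEGER DEFAULT 0,
    likes              INTEGER DEFAULT 0,
    dislikes           INTEGER DEFAULT 0,
    location           TEXT,           -- URL of the video file in blob storage
    thumbnail_location TEXT            -- URL of the thumbnail image
);
CREATE TABLE comments (
    comment_id   INTEGER PRIMARY KEY,
    video_id     INTEGER REFERENCES videos(video_id),
    user_id      INTEGER REFERENCES users(user_id),
    comment_text TEXT,
    comment_date TEXT
);
CREATE TABLE channels (
    channel_id   INTEGER PRIMARY KEY,
    user_id      INTEGER REFERENCES users(user_id),
    channel_name TEXT,
    description  TEXT,
    created_at   TEXT,
    subscribers  INTEGER DEFAULT 0
);
""")
```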

MODULE 5: Detailed Workflow

On a high level, below are the steps that happen:

  1. User Uploads a Video
  2. Encoding and Transcoding
  3. Metadata Update
  4. Thumbnail Generation
  5. Content Delivery and Caching
Now let’s walk through these steps in more detail.

1. When a user uploads a video:

  • The server stores information about the video and the user in a database.
  • The video is sent to an encoder for compression and transformation.
  • While the video is being uploaded to storage, we also send another request in parallel to update the video metadata, such as the title, description, and tags.

2. The encoder, along with a transcoder, compresses the video and creates multiple resolutions.

3. The compressed videos are stored in blob storage.

As part of the encoding process, thumbnails for the video are also generated; these are stored in Bigtable, a distributed NoSQL database optimized for handling massive amounts of structured data, which makes it ideal for storing multiple thumbnails per video.

Blob storage, short for Binary Large Object storage, is a type of storage solution designed to store and manage large amounts of unstructured data, such as documents, images, videos, audio files, backups, and log files. It is particularly well-suited for handling data that doesn’t fit neatly into traditional relational databases.

4. Popular videos may be selectively forwarded to a Content Delivery Network (CDN) for caching, reducing latency and enhancing the user experience by storing frequently accessed content closer to end users.

5. The CDN reduces waiting time for users by storing videos closer to them. However, the CDN is not the sole method for delivering videos to users.

Here’s a breakdown of the purpose of each component used in our system mentioned earlier:

  • Load balancers: Necessary to distribute user requests efficiently among web servers.
  • Web servers: Serve as the interface for user requests, connecting with API servers.
  • Application servers: House application and business logic, generating data for web servers to address user queries.
  • User and metadata storage: To manage the extensive user and video data, storing metadata and user-related content in separate storage clusters allows for scalability and decoupling of unrelated data.
  • Bigtable: We’ll have to store multiple thumbnails for each video. Bigtable’s high throughput and scalability for key-value data make it a good fit for this workload (see the sketch after this list).

Bigtable is a distributed, scalable, and highly available NoSQL database service designed by Google for handling massive amounts of structured data across thousands of commodity servers. Bigtable is optimal for storing a large number of data items each below 10 MB. Therefore, it is the ideal choice for YouTube’s thumbnails.

  • Upload storage: Temporarily stores user-uploaded videos.
  • Encoders: Handle video compression, transcoding into various formats, and thumbnail generation.
  • CDN and colocation sites: Store popular and moderately popular content closer to users for faster access, with colocation centers serving as alternatives to dedicated data centers when necessary for business reasons.
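
To illustrate the thumbnail storage model without a real Bigtable cluster, here is a toy sketch of the wide-column layout such a store might use: one row per video, one column per thumbnail. A plain dictionary stands in for Bigtable, and the row-key scheme is an assumption for illustration.

```python
from collections import defaultdict

# Toy stand-in for a wide-column store such as Bigtable:
# row key -> {column qualifier -> value}. One row per video, one column per
# thumbnail (each value stays well under Bigtable's ~10 MB per-item guidance).
thumbnail_store: dict[str, dict[str, bytes]] = defaultdict(dict)

def put_thumbnail(video_id: str, thumb_id: str, image_bytes: bytes) -> None:
    thumbnail_store[f"video#{video_id}"][f"thumb:{thumb_id}"] = image_bytes

def get_thumbnails(video_id: str) -> dict[str, bytes]:
    return thumbnail_store[f"video#{video_id}"]

put_thumbnail("abc123", "default", b"...jpeg bytes...")
put_thumbnail("abc123", "hq", b"...jpeg bytes...")
print(list(get_thumbnails("abc123")))   # ['thumb:default', 'thumb:hq']
```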

Workflow:


Let’s summarise the video uploading and video streaming flow.

Video uploading:

  1. The user selects the video file they want to upload through the YouTube website or app interface and initiates the upload process.
  2. Information about the video and the user, including metadata like title, description, tags, and user details, is stored in the database for future retrieval and management.
  3. Once validated, the video file is dispatched to an encoder for compression, transcoding, and thumbnail generation. This step prepares the video for efficient storage and streaming.
  4. While the video is being uploaded and processed, another request is sent in parallel to update the video’s metadata, such as title, description, and tags (see the sketch after this list).
  5. The encoder, working in tandem with a transcoder, compresses the video and generates multiple resolutions, as discussed earlier.
  6. The compressed videos are securely stored in blob storage, a specialized solution designed to manage large volumes of unstructured data efficiently.
  7. Thumbnails for the video are generated as part of the encoding process and stored in a suitable storage solution like Bigtable, optimized for handling massive structured data.
  8. Once the upload, compression, and metadata updates are complete, the server notifies the user of successful upload completion, providing a link to the uploaded video.
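
To illustrate step 4 above, here is a minimal sketch of how a backend might push the video file to blob storage and write the metadata record in parallel. The helpers store_in_blob_storage and save_metadata are hypothetical placeholders, not real APIs.

```python
from concurrent.futures import ThreadPoolExecutor

def store_in_blob_storage(video_bytes: bytes) -> str:
    """Hypothetical placeholder: upload the file and return its blob URL."""
    return "https://blob.example.com/videos/abc123"

def save_metadata(title: str, description: str, tags: list[str]) -> None:
    """Hypothetical placeholder: write the metadata row to the database."""

def handle_upload(video_bytes: bytes, title: str,
                  description: str, tags: list[str]) -> str:
    # Run the (slow) blob upload and the metadata write concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        blob_future = pool.submit(store_in_blob_storage, video_bytes)
        meta_future = pool.submit(save_metadata, title, description, tags)
        meta_future.result()
        return blob_future.result()
```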

Video Streaming:

  1. When a user requests to watch a video on YouTube, the client sends a request to YouTube’s servers for the requested video content.
  2. Load balancers distribute user requests efficiently among web servers, ensuring optimal performance and scalability.
  3. The server retrieves the requested video content from storage and dynamically transcodes it into different formats and resolutions to ensure compatibility and optimal playback quality across various devices and network conditions.
  4. The server streams the video content to the user’s device in real time using HTTP-based adaptive streaming protocols (already mentioned above) such as MPEG-DASH or HLS, adjusting video bitrate and resolution as needed to provide smooth playback (a simplified bitrate-selection sketch follows this list).
  5. Popular videos may be cached in a Content Delivery Network (CDN) to reduce latency and enhance user experience by storing content closer to end-users.
  6. The user interacts with the video player interface, controlling playback options such as play, pause, seek, volume, and quality settings.
  7. Users can engage with the video content by liking, commenting, sharing, subscribing, or accessing related content recommended by YouTube’s algorithms.
  8. The server continues streaming the video content to the user’s device until playback is complete or the user chooses to stop watching.
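
Adaptive streaming (step 4 above) boils down to the player repeatedly picking the highest rendition its measured bandwidth can sustain. Here is a simplified sketch with an illustrative bitrate ladder:

```python
# Simplified client-side adaptive-bitrate selection (illustrative ladder, in kbps).
RENDITIONS_KBPS = {"240p": 400, "360p": 700, "480p": 1000, "720p": 2000, "1080p": 4000}

def pick_rendition(measured_kbps: float, safety_factor: float = 0.8) -> str:
    """Pick the highest rendition whose bitrate fits within the measured bandwidth."""
    budget = measured_kbps * safety_factor
    affordable = [(kbps, name) for name, kbps in RENDITIONS_KBPS.items() if kbps <= budget]
    if not affordable:
        return "240p"                    # fall back to the lowest quality
    return max(affordable)[1]

print(pick_rendition(3000))              # '720p' with the default safety factor
```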

MODULE 6: Reviewing non-functional requirements

So, based on what we have designed so far, we understand that we are able to meet the functional requirements.

But how are we meeting the non-functional requirements?

Let’s discuss and wrap-up.

Low Latency/Smooth Streaming:

  • Utilizing appropriate storage systems based on data type (e.g., Bigtable for thumbnails, blob storage for videos).
  • Leveraging content delivery networks (CDNs) with caching to serve videos from memory, deployed close to end users for low-latency services.

Scalability:

  • Horizontal scalability of web and application servers to accommodate user growth.

Availability:

  • Redundancy is achieved through data replication across multiple servers to avoid single points of failure.
  • Replicating data across data centers for high availability, even in the event of entire data center failures.
  • Local load balancers to exclude dead servers, and global load balancers to redirect traffic to different regions as needed.

Reliability:

  • Use of redundant hardware and software components for fault tolerance.
  • Monitoring server health using the heartbeat protocol to remove faulty servers.
  • Utilizing a variant of consistent hashing for seamless addition or removal of servers, reducing load imbalances.
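
To illustrate the last point, here is a minimal consistent-hash ring with virtual nodes, so that adding or removing a server remaps only a small fraction of keys. This is a generic sketch, not YouTube’s actual implementation.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, replicas: int = 100):
        self.replicas = replicas          # virtual nodes per server
        self.ring = []                    # sorted hash positions
        self.nodes = {}                   # hash position -> server name

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_server(self, server: str) -> None:
        for i in range(self.replicas):
            h = self._hash(f"{server}#{i}")
            bisect.insort(self.ring, h)
            self.nodes[h] = server

    def remove_server(self, server: str) -> None:
        for i in range(self.replicas):
            h = self._hash(f"{server}#{i}")
            self.ring.remove(h)
            del self.nodes[h]

    def get_server(self, key: str) -> str:
        h = self._hash(key)
        idx = bisect.bisect(self.ring, h) % len(self.ring)
        return self.nodes[self.ring[idx]]

ring = ConsistentHashRing()
for s in ("server-1", "server-2", "server-3"):
    ring.add_server(s)
print(ring.get_server("video-42"))   # only ~1/N of keys move when a server leaves
```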

That’s all for this article.

Thanks for reading.

If you liked this article, please click the “clap” button 👏 a few times.

It gives me enough motivation to put out more content like this. Please share it with a friend who you think this article might help.

Connect with me — Varsha Das | LinkedIn

If you’re seeking personalized guidance in software engineering, career development, core Java, Systems design, or interview preparation, let’s connect here.

Rest assured, I’m not just committed; I pour my heart and soul into every interaction. I have a genuine passion for decoding complex problems, offering customised solutions, and connecting with individuals from diverse backgrounds.

Follow my YouTube channel — Code With Ease — By Varsha, where we discuss Java, Data Structures & Algorithms, and so much more.

Subscribe here to receive alerts whenever I publish an article.

Happy learning and growing together.
