Eliminating Downtime and Enabling 100K Concurrent Users for PhysicsWallah - OpsTree Global

Eliminating Downtime and Enabling 100K Concurrent Users for PhysicsWallah

The client is a leading Ed-Tech unicorn, providing affordable, high-quality educational content for over 6 million students. Focused on competitive exams like JEE, NEET, and UPSC, the client is committed to making education accessible.

The Problem Statement

The client aimed to achieve seamless live class streaming for up to 100K concurrent users while maintaining infrastructure stability and data security. Their existing system, housed under a single AWS account, needed improvements in scalability, security, and operational efficiency to meet the growing demand.

Challenges

The infrastructure couldn’t support 100K concurrent users, leading to risks of downtime and poor performance.

Both production and non-production environments were in the same AWS account, making resource management and isolation difficult, which increased the risk of disruptions.

Security evaluation and traceability were lacking, leaving the system vulnerable to breaches and making it hard to track incidents.

Reliable data backups were missing, putting the platform at risk of data loss during unexpected events.

There was no alerting for incidents or cost overruns, so responses to issues were delayed and rising costs went unnoticed.

Solutions

Scaling live class streaming required the client to re-evaluate their AWS architecture: improving resource management, strengthening security controls, and establishing effective monitoring. The goal was to support a rapidly expanding user base while keeping operations seamless and data protected.

Set up multiple AWS accounts with clear segregation of controls, applying preventive and detective controls to each account through AWS Control Tower and defining the account hierarchy with AWS Organizations.

Deployed Amazon GuardDuty and AWS Security Hub to provide centralized findings and reporting for all accounts, enhancing security visibility and response capabilities.

Created customized Grafana dashboards to visualize critical endpoint SLAs, application-specific metrics, and their dependencies, improving operational monitoring.

Built observability and business intelligence dashboards in Amazon QuickSight, enabling informed decision-making through data visualization.

Established cross-region, cross-account RDS backups using Terraform to ensure reliable data protection and recovery across the infrastructure.

Integrated Kubecost into the Kubernetes environment for in-depth insight into the cost allocation and utilization of cluster components, optimizing resource management.

Deployed Cloud Intelligence Dashboards (CID) in AWS to monitor cost utilization across accounts and services, facilitating better financial oversight and resource allocation.
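
To make the idea of a preventive control on member accounts more concrete, the following is a minimal sketch of a service control policy (SCP) of the kind that can be attached through AWS Organizations. It is not the client's actual policy: the list of protected actions, the function names, and the optional exempt role pattern are all assumptions chosen for illustration. The script only builds and prints the policy document; attaching it to an organizational unit is left out.

```python
import json

# Hypothetical set of actions that member accounts should not be able to perform.
# These are real IAM action names, but the selection is an assumption for this sketch.
PROTECTED_ACTIONS = [
    "organizations:LeaveOrganization",
    "cloudtrail:StopLogging",
    "cloudtrail:DeleteTrail",
    "guardduty:DeleteDetector",
]


def build_guardrail_scp(actions, exempt_role_arn_pattern=None):
    """Return an SCP document (as a dict) that denies the given actions.

    exempt_role_arn_pattern is an optional, hypothetical ARN pattern for a
    platform/admin role that should remain able to perform these actions.
    """
    statement = {
        "Sid": "DenyTamperingWithSecurityBaseline",
        "Effect": "Deny",
        "Action": sorted(actions),
        "Resource": "*",
    }
    if exempt_role_arn_pattern:
        # Apply the deny to every principal except those matching the pattern.
        statement["Condition"] = {
            "ArnNotLike": {"aws:PrincipalArn": exempt_role_arn_pattern}
        }
    return {"Version": "2012-10-17", "Statement": [statement]}


def render_policy(policy):
    """Serialize the policy dict to pretty-printed JSON."""
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(render_policy(build_guardrail_scp(PROTECTED_ACTIONS)))
```

A deny statement like this blocks the listed actions in every account it applies to, regardless of the IAM permissions granted inside that account, which is what makes it preventive. Detective controls, by contrast, report on a change after it has happened.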

Outcomes

Created a unified framework for scalable account management with centralized access controls.

Streamlined cost allocation and enhanced resource tracking and security monitoring.

Increased platform resilience to support approximately 100K concurrent users. 

Established a robust disaster recovery plan to mitigate data loss during regional outages.
