Case Study

A few intriguing success stories from some great brands.

24/7 Uninterrupted User Experience With Cloud Platform Resilience & Efficient Scaling

About Customer

The leading Indian e-commerce company specializes in multi-beauty and personal care products. It houses 1,500+ brands and over 1.8 million products, catering to diverse consumer journeys across its platform.

Problem Statement
  • Traffic

    With very high incoming traffic, it was becoming difficult to balance platform scaling against cost optimization.

  • Managed DevSecOps support

    Needed extensive Managed DevSecOps support to reduce the cognitive load on their engineering teams.

  • Wanted to cut the time spent self-managing MongoDB, which was cumbersome to operate and usually resulted in downtime during software upgrades.

  • Critical data

    With so many diverse services in play, alerts were being raised by every single component, which increased the mean time to resolution (MTTR).

  • Redis

    Required fine-tuning of Redis to reduce response times during sales, as the high volume of data being sent overwhelmed Redis and caused congestion.

  • During peak sales, they faced issues with EFK and Kafka, which made it difficult to view logs when an application crashed.

Solution Offered!
  • Managed multiple AWS services, from rightsizing them to ensuring their optimum performance by analyzing the associated metrics via CloudWatch.
  • Reduced noisy alerts with intelligent threshold values and managed their precedence during the sale period. Analyzed the root cause of repeated alerts to reduce the MTTR.
  • Set up parallel sending of logs to Amazon S3 alongside Kafka, ensuring no loss of logs.
  • Set up dashboards and alarms for all services in Grafana and CloudWatch.
  • Categorized alerts by criticality, source, and service to prioritize efficiently and enable quick action on genuine incidents.
  • Improved monitoring for automation alerts by attaching screenshots and INC numbers for false alarms.
  • Established several cost-optimization processes:

    – Deleted old production and non-production snapshots of RDS, AMI, etc., saving around $6,000 per month

    – Created a Lambda function to stop Pre-Prod RDS during non-working hours for cost optimization

    – Right-sized the services as per the environment and usage patterns to ensure optimum utilization at reduced costs

  • Managed their ad tech platform end to end, from resource planning for sales to log handling, onboarding of new services, and streamlining of daily operations.
  • Migrated MongoDB to DocumentDB to eliminate the operational overhead of managing underlying infrastructure scaling, updates, etc.
  • Fine-tuned Redis performance by adding shards to the Redis cluster, which optimized network bandwidth, eliminated congestion, and scaled performance effectively during sales.
  • Implemented AWS IAM policies for fine-grained access control over workloads.
  • Recommended a custom Status Check Page, which gave a central view of service connectivity with AWS resources, thereby reducing the Dev dependency on DevOps.
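
The Lambda function that stops Pre-Prod RDS outside working hours could look roughly like the sketch below. It is an illustration only, not the team's actual code: the working hours (09:00 to 19:00 IST, weekdays), the `Environment=pre-prod` tag, and the helper names are all assumptions. The boto3 calls used (`describe_db_instances`, `stop_db_instance`) are standard RDS client methods.

```python
from datetime import datetime, time, timedelta, timezone

# Assumptions for illustration (not stated in the case study):
IST = timezone(timedelta(hours=5, minutes=30))
WORK_START, WORK_END = time(9, 0), time(19, 0)
ENV_TAG = ("Environment", "pre-prod")  # hypothetical tag marking Pre-Prod databases


def is_non_working_hours(now: datetime) -> bool:
    """Return True on weekends or outside WORK_START..WORK_END (IST)."""
    local = now.astimezone(IST)
    if local.weekday() >= 5:  # Saturday is 5, Sunday is 6
        return True
    return not (WORK_START <= local.time() < WORK_END)


def stop_preprod_instances(rds, now: datetime) -> list:
    """Stop every running RDS instance tagged ENV_TAG; return the stopped IDs."""
    if not is_non_working_hours(now):
        return []
    stopped = []
    # Pagination is omitted for brevity; a large account would need a paginator.
    for db in rds.describe_db_instances()["DBInstances"]:
        tags = {(t["Key"], t["Value"]) for t in db.get("TagList", [])}
        if ENV_TAG in tags and db["DBInstanceStatus"] == "available":
            rds.stop_db_instance(DBInstanceIdentifier=db["DBInstanceIdentifier"])
            stopped.append(db["DBInstanceIdentifier"])
    return stopped


def lambda_handler(event, context):
    # boto3 is imported lazily so the helpers above can be tested without AWS.
    import boto3

    now = datetime.now(timezone.utc)
    return {"stopped": stop_preprod_instances(boto3.client("rds"), now)}
```

A function like this would typically be triggered on a schedule, with a matching function to start the databases again before the working day. Note that RDS automatically restarts an instance that has been stopped for seven days, so a scheduled stop needs to keep running to remain effective.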

Final Outcomes
  • Dashboard

    Real-time dashboard creation for newly created services through automation scripts.

  • Automation

    Suppressed noisy alerts by priority, alert source, and tag, and enabled auto-resolution of non-critical alerts.

  • Automated remediation

    Assessed their business lines for gaps and introduced automated remediation and 360-degree visibility across all services.

  • Security

    Fortified defenses with SOC implementation to maintain a robust security posture under high-volume user requests.

  • User

    Prompt incident response and robust governance enabled an uninterrupted user experience.

  • Dedicated support during sales ensured seamless scaling of the platform and maintained high availability.

  • Cost

    Standardized, simplified, and rationalized the usage of services to create a cost-optimized tech platform.
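
The alert handling described above (suppression by priority, source, and tag, plus auto-resolution of non-critical alerts) can be pictured as a small triage rule. The sketch below is a hypothetical illustration, not the tooling actually used: the priority labels, the suppressed sources and tags, and the 30-minute auto-resolve window are all assumed values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    service: str
    source: str
    priority: str  # assumed scale: "P1" (most critical) to "P4"
    raised_at: datetime
    tags: frozenset = frozenset()


# Hypothetical rules, chosen only for this example.
CRITICAL_PRIORITIES = {"P1", "P2"}
SUPPRESSED_SOURCES = {"synthetic-check"}
SUPPRESSED_TAGS = {"maintenance"}
AUTO_RESOLVE_AFTER = timedelta(minutes=30)


def triage(alert: Alert, now: datetime) -> str:
    """Return one of 'page', 'suppress', 'auto-resolve', or 'queue'."""
    if alert.priority in CRITICAL_PRIORITIES:
        return "page"  # critical alerts are never suppressed
    if alert.source in SUPPRESSED_SOURCES or alert.tags & SUPPRESSED_TAGS:
        return "suppress"  # known-noisy source or tag
    if now - alert.raised_at >= AUTO_RESOLVE_AFTER:
        return "auto-resolve"  # stale non-critical alert
    return "queue"  # non-critical, still fresh: handled in normal order
```

Checking priority first means a critical alert always reaches the on-call engineer, even when it comes from a source or carries a tag that would otherwise be suppressed.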
