What is SRE (Site Reliability Engineer)

Before diving deep into the SRE world, let’s talk about where SRE comes from. The concept of SRE was originated in 2003 by Ben Treynor Sloss at Google. In 2003, when the cloud wasn’t a thing, Google was one of the most prominent web companies, with a massive, distributed infrastructure. It faced several challenges simultaneously: maintaining the trust and reputation of its services, providing a smooth user experience with minimal downtime and latency, managing dozens of sprawling data centers, and so on. It needed to rely heavily on automation and therefore formulated strategies that led to large-scale automation. Small companies at that time could bear the loss of a few hours of downtime, but giants like Google could not afford it, as they were at the forefront of user experience. Building a team to help ensure the availability and reliability of its applications was therefore a natural outcome.

A Detailed Guide to Key Metrics of MongoDB Monitoring

In the current era, organizations demand high-quality working data and data management systems that scale, deploy quickly, and remain robust, highly available, and secure against unfortunate incidents. Traditionally, applications used relational databases as their primary data stores, but to meet today’s need for data-driven applications, developers lean towards alternative databases such as NoSQL (Not Only Structured Query Language).

NoSQL databases enable speed, flexibility, and scalability in this era of growing cloud development. Moreover, many NoSQL databases support JSON-like documents, a format commonly used to share data in modern web applications.

Prometheus-Alertmanager integration with MS-teams

As we know, monitoring is one of the critical components of infrastructure management: it ensures the proper functioning of our applications and infrastructure. But monitoring is of little use if we do not receive notifications for alarms and threats in our system. As a good practice, routing all notifications to a common workspace makes it much easier for the team to track the status and performance of the infrastructure.
