At least, it is if you’re a company that runs 24x7 production infrastructure. If you can’t see at a glance what’s going on with your systems, it’s next to impossible to run them with high availability. And as your systems grow in size and complexity, this quickly goes from “challenging” to “downright impossible” if you don’t have the right tools at hand.

At Salesforce, this is something we think about a lot. We have hundreds of software stacks running in data centers all over the world, serving mission-critical applications for the world’s most successful companies. Keeping all of that humming along while serving billions of transactions every day is no small feat.

In this post, we’re excited to share Refocus, a new tool we’ve created that helps our Site Reliability Engineers do just that. We use it at Salesforce, and we’re releasing it as open source so you can use it, too.
Check out our blog on Medium!