Monitoring Blind Spots in the Cloud and What to Do About Them

Discover the 3 primary monitoring blind spots and what you and your team can do to catch performance problems that would otherwise stay hidden.

Cloud adoption is growing because it comes with many advantages, such as easy provisioning of new resources when demand rises. It also tends to save money, at least in the short term.

The cloud is more than just SaaS. Many of the third-party services you depend on, including DNS, CDNs, and APIs, also run in the cloud.

This means that there’s more to monitor than ever, and in this post, we’re going to cover the best ways to avoid the biggest blind spots that come with complex infrastructure.

The 3 blind spots we cover are:

What third parties control
What users control
What you control

1. What Third Parties Control

With so much now running in the cloud, you no longer control much of the network you rely on. Your providers may or may not be monitoring their own infrastructure. Your providers are also using cloud solutions and other third parties, so there are lots of interconnections that you and your customers are depending on.

With SaaS, you don’t write any of the code. Instead, you just pull up a browser window and log in to the product. You may be able to customize bits and pieces, but the control is ultimately in the hands of the provider.

What can you do about it?

Choose the Right Third Parties

You’re going to need third parties, so your best first step is to choose the right ones.

Make sure they understand your business needs, that you have a mutually agreed SLA in place, and that they adhere to it.
If you’re migrating to the cloud, you’ve got to have a solid plan in place for both pre- and post-migration. A monitoring system will help you benchmark performance before and after you’ve migrated so that you can make improvements where needed.
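
The before-and-after benchmark described above can be sketched roughly as follows. This is a minimal illustration, not a prescribed method: the function names and the sample response times are hypothetical, and it assumes your monitoring tool can export page-load samples in milliseconds.

```python
from statistics import median, quantiles

def summarize(samples_ms):
    """Return the median and 95th percentile of response times in milliseconds."""
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    p95 = quantiles(samples_ms, n=20, method="inclusive")[-1]
    return {"median": median(samples_ms), "p95": p95}

def compare(before_ms, after_ms):
    """Change in each statistic after migration (negative means faster)."""
    before, after = summarize(before_ms), summarize(after_ms)
    return {name: after[name] - before[name] for name in before}

# Hypothetical samples exported from a monitoring tool.
before = [420, 450, 480, 510, 530, 600, 640, 700, 910, 1200]
after = [380, 400, 410, 430, 470, 520, 560, 610, 800, 950]
print(compare(before, after))
```

Comparing a tail percentile as well as the median helps show whether the slowest requests improved, not just the typical ones.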

Build a Redundancy Plan

Once you’ve selected the right third parties to suit your business needs, it’s time to build a redundancy plan.

Have backups for all of your third parties because you can’t control third-party outages; you can only control your preparation for them.
Have a plan in place for what happens when an outage occurs. Who’s responsible for communicating with that third party? How will you communicate with your internal team?
In addition to backup CDNs, servers, and applications, make a plan with your team for communicating with your users, and have a backup site ready to go that is more than just an error page.
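
One way to picture the technical side of a redundancy plan is a prioritized list of providers with a health check for each. The sketch below is only an illustration under that assumption: the provider names are hypothetical, and the health check is passed in as a function so that you can plug in whatever probe suits your setup.

```python
def first_healthy(providers, is_healthy):
    """Return the first provider that passes its health check, or None."""
    for provider in providers:
        try:
            if is_healthy(provider):
                return provider
        except OSError:
            # A network error during the check counts as unhealthy.
            continue
    return None

# Hypothetical primary and backup CDN hostnames, in order of preference.
providers = ["cdn-primary.example", "cdn-backup.example"]
print(first_healthy(providers, lambda name: name == "cdn-backup.example"))
```

If the function returns None, every provider is down, which is the point where the backup site and the communication plan come in.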

Manage Third-Party Tags

You should use a reputable tag manager. This will help you wrangle issues fast, often before they’ve affected your customers.

Make sure you know where your tags are and which third parties they belong to.
Keep tags as lean as you can during high-traffic events. Eliminate any ad tags you don’t need. If you’ve got to have the ads there, make sure they aren’t delivering Flash, video, or large image files.
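
To find out where your tags are and which third parties they belong to, you could start from a simple inventory of the external hosts a page references. The following is a rough sketch using only the Python standard library. The class name, function name, and hostnames are hypothetical, and it only sees tags present in the static HTML, not ones injected later by a tag manager or by other scripts.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urlparse

class TagInventory(HTMLParser):
    """Collect script, img, and iframe tags that load from other hosts."""

    def __init__(self, first_party_host):
        super().__init__()
        self.first_party_host = first_party_host
        self.by_host = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "img", "iframe"):
            return
        src = dict(attrs).get("src")
        if not src:
            return
        host = urlparse(src).hostname  # None for relative URLs
        if host and host != self.first_party_host:
            self.by_host[host].append(tag)

def third_party_tags(html, first_party_host):
    """Map each third-party host to the tag types it serves on the page."""
    parser = TagInventory(first_party_host)
    parser.feed(html)
    return dict(parser.by_host)
```

Running this against a saved copy of a key page gives a list of hosts to match against your vendor list.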

2. What Your Users Control

It’s not always the fault of something on your site. It could be a browser issue, device issue, or a geographic one—all of which are determined by your user.

We once detected slow loading times for users on Internet Explorer. It turned out to be a problem with JavaScript and an iframe, but nothing could be done to improve speed as long as users were on Internet Explorer instead of another browser.

You can’t control your user’s browser, but here’s what you can do:

You don’t choose their device. So, make sure your website is responsive.
You don’t choose their location. So, deploy multiple CDNs for faster delivery of content at your major points of presence (PoPs).

3. What You Control

Believe it or not, many common blind spots are under your control. They come from not monitoring all the pieces of your infrastructure. Let’s take a look at some examples.

MQTT

MQTT is a machine-to-machine (M2M) protocol that powers much of the Internet of Things. Monitoring MQTT means you can spot disruptions occurring between your devices or those of your users. Pinpointing MQTT issues will help your team improve mean time to resolve (MTTR).
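
As a rough idea of what a basic MQTT availability check involves, the sketch below opens a TCP connection to a broker, sends a CONNECT packet, and reads the broker’s CONNACK reply. It assumes MQTT 3.1.1 over plain TCP with no credentials; the function names and the client ID are hypothetical. A production monitor would more likely use an established MQTT client library and also test publish and subscribe.

```python
import socket
import struct

def build_connect_packet(client_id, keepalive=60):
    """Build an MQTT 3.1.1 CONNECT packet (clean session, no credentials)."""
    client = client_id.encode("utf-8")
    variable_header = (
        struct.pack("!H", 4) + b"MQTT"    # length-prefixed protocol name
        + bytes([4])                      # protocol level 4 = MQTT 3.1.1
        + bytes([0x02])                   # connect flags: clean session
        + struct.pack("!H", keepalive)    # keepalive in seconds
    )
    payload = struct.pack("!H", len(client)) + client
    remaining = len(variable_header) + len(payload)
    if remaining > 127:
        # This sketch only encodes a single-byte remaining length.
        raise ValueError("client_id too long for this sketch")
    return bytes([0x10, remaining]) + variable_header + payload

def parse_connack(data):
    """Return the CONNACK return code (0 means the connection was accepted)."""
    if len(data) < 4 or data[0] != 0x20 or data[1] != 0x02:
        raise ValueError("not a CONNACK packet")
    return data[3]

def probe_broker(host, port=1883, timeout=5.0):
    """Connect to a broker and return its CONNACK return code."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_connect_packet("monitor-probe"))
        return parse_connack(sock.recv(4))
```

A timeout or connection error from `probe_broker` points to a network or broker outage, while a non-zero return code points to a protocol or authorization problem.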

API

Monitoring APIs will help you pinpoint poor execution and detect which API or location is causing a particular problem—whether it’s an internal or external API. This is key to improving business-critical transactions, like your checkout process.
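
A single API check can be as simple as timing a request and recording the status code. The sketch below is one hedged way to do that with the standard library. The function names, the 1000 ms threshold, and the example URL are assumptions made for illustration, and the `fetch` parameter exists so the check can be exercised without a network.

```python
import time
import urllib.error
import urllib.request

def _http_status(url, timeout=10):
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx and 5xx responses still carry a status code

def check_api(url, fetch=None, slow_ms=1000.0):
    """Time one call and report whether it was both successful and fast enough."""
    fetch = fetch or _http_status
    start = time.perf_counter()
    try:
        status, error = fetch(url), None
    except OSError as exc:  # DNS failures, refused connections, timeouts
        status, error = None, str(exc)
    elapsed_ms = (time.perf_counter() - start) * 1000
    ok = status is not None and 200 <= status < 300 and elapsed_ms <= slow_ms
    return {"url": url, "status": status, "elapsed_ms": elapsed_ms,
            "ok": ok, "error": error}

# Example with a stubbed fetch; the URL is hypothetical.
print(check_api("https://api.example/checkout", fetch=lambda url: 200)["ok"])
```

Running the same check from several locations, and against both internal and external APIs, is what lets you tell which API or location is at fault.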

DNS

If you’re not monitoring your DNS, you’re missing a critical point in your customer’s journey: the very beginning. It is a point that can make or break their loyalty to your brand. If they can’t get to your site, you need to know about it.
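
A very small DNS check might time how long a hostname takes to resolve and record any failure. The sketch below uses `socket.getaddrinfo`, which goes through the operating system’s resolver, so results may be served from a local cache and do not test your authoritative name servers directly. The function name is hypothetical, and the `resolver` parameter is there so the logic can be tested offline.

```python
import socket
import time

def time_dns_lookup(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname and report the addresses, elapsed time, and any error."""
    start = time.perf_counter()
    try:
        results = resolver(hostname, None)
        # Each result is (family, type, proto, canonname, sockaddr).
        addresses = sorted({info[4][0] for info in results})
        error = None
    except socket.gaierror as exc:
        addresses, error = [], str(exc)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {"hostname": hostname, "addresses": addresses,
            "elapsed_ms": elapsed_ms, "error": error}
```

An empty address list with an error set is the signal that users may not be able to reach your site at all.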

SMTP

If you monitor your SMTP server, you can improve application availability and quickly detect outages and protocol failures. You’ll be able to determine whether an outage is due to a connection failure or to SSL/TLS not being supported by one side of the connection.
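
The distinction between a connection failure and missing encryption support can be sketched with Python’s `smtplib`. This is an illustrative outline only: the function name and the state labels are hypothetical, and it assumes the server is expected to offer STARTTLS on the port being checked.

```python
import smtplib

def check_smtp(host, port=25, timeout=10, smtp_factory=smtplib.SMTP):
    """Classify an SMTP check as ok, connection_failed, tls_not_offered,
    or protocol_failure."""
    try:
        # smtplib.SMTP connects immediately when a host is given.
        server = smtp_factory(host, port, timeout=timeout)
    except OSError as exc:  # also covers smtplib.SMTPException
        return {"state": "connection_failed", "detail": str(exc)}
    try:
        server.ehlo()
        if not server.has_extn("starttls"):
            return {"state": "tls_not_offered", "detail": ""}
        server.starttls()
        return {"state": "ok", "detail": ""}
    except OSError as exc:  # protocol errors and TLS handshake errors
        return {"state": "protocol_failure", "detail": str(exc)}
    finally:
        server.close()
```

The `smtp_factory` parameter defaults to the real `smtplib.SMTP` class and can be swapped for a stub when testing the classification logic.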

Monitoring Blind Spots

Despite not being in control of third parties, cloud, or user behavior, you can still deploy a few monitoring best practices to give yourself a leg-up on detecting issues quickly and improve your mean time to resolve (MTTR).

These monitoring practices will make sure your third parties are meeting your requirements, and they’ll also help you determine whether an issue is yours, a vendor’s, or a SaaS provider’s.
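
To track whether MTTR is actually improving, you need a consistent way to compute it. The sketch below averages the time from detection to resolution across incidents. Treating detection as the starting point is an assumption (some teams measure from when the incident began), and the function name and timestamps are hypothetical.

```python
from datetime import datetime

def mean_time_to_resolve(incidents):
    """Average minutes between detection and resolution.

    incidents is a list of (detected_at, resolved_at) datetime pairs.
    """
    if not incidents:
        raise ValueError("no incidents to average")
    minutes = [
        (resolved - detected).total_seconds() / 60
        for detected, resolved in incidents
    ]
    return sum(minutes) / len(minutes)

# Hypothetical incidents lasting 30 and 90 minutes.
incidents = [
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 9, 30)),
    (datetime(2024, 1, 12, 14, 0), datetime(2024, 1, 12, 15, 30)),
]
print(mean_time_to_resolve(incidents))
```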

The Cloud

Applications in the cloud should perform at least as well as they did before migration. To test whether or not your cloud providers meet your performance requirements, you should use a combination of synthetic and real user monitoring (see below).

Once you’re completely migrated to the cloud, you can continue with both RUM and synthetic to make sure your third parties are adhering to SLAs.
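
Checking SLA adherence usually comes down to comparing measured availability with the agreed target. Here is a minimal sketch of that arithmetic, assuming one pass/fail check per minute and a hypothetical 99.9% target; your own SLA may define availability differently.

```python
def availability(checks):
    """Fraction of checks that succeeded; checks is a list of booleans."""
    if not checks:
        raise ValueError("no checks recorded")
    return sum(checks) / len(checks)

def meets_sla(checks, target=0.999):
    """True if measured availability is at or above the target."""
    return availability(checks) >= target

# One hypothetical day of per-minute checks with two failures.
day = [True] * 1438 + [False] * 2
print(round(availability(day), 5), meets_sla(day))
```

In this example, two failed minutes in a day already put availability just under a 99.9% target, which shows how little downtime such a target allows.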

Synthetic monitoring

You can’t nip issues in the bud without synthetic monitoring. Synthetic monitoring means you can automate typical user behavior with your SaaS applications and third parties. You can monitor page load, response, and transaction time.
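
A synthetic transaction can be thought of as a scripted list of steps, each timed separately, that stops at the first failure. The sketch below shows that idea in outline. The function name and step names are hypothetical, and the steps here are stubs standing in for real browser or HTTP actions.

```python
import time

def run_transaction(steps):
    """Run (name, action) steps in order and time each one.

    Stops at the first step that raises, since later steps depend on it.
    """
    results = []
    for name, action in steps:
        start = time.perf_counter()
        try:
            action()
            ok, error = True, None
        except Exception as exc:
            ok, error = False, str(exc)
        elapsed_ms = (time.perf_counter() - start) * 1000
        results.append({"step": name, "ok": ok,
                        "elapsed_ms": elapsed_ms, "error": error})
        if not ok:
            break
    return results

def failing_step():
    raise RuntimeError("cart service returned an error")

# Stub steps standing in for real user actions.
steps = [("login", lambda: None),
         ("add_to_cart", failing_step),
         ("checkout", lambda: None)]
for result in run_transaction(steps):
    print(result["step"], result["ok"])
```

Per-step timings make it easier to see which part of a transaction slowed down or broke, rather than only knowing that the whole flow failed.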

Real User Monitoring

You can’t rely only on synthetic because your users might be experiencing something different—you need to know exactly what their experience is.

Combine RUM and Synthetic

Ramp up synthetic users to test new features or prep for high-traffic events. Look into RUM at different PoPs; for example, make sure you’re not observing an application hosted in AWS only from within AWS.
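
One way to combine the two data sources is to compare them location by location and flag places where real users are noticeably slower than the synthetic checks suggest. The sketch below assumes you can export load times in milliseconds per location from both tools. The function name, location names, and the 200 ms tolerance are hypothetical.

```python
from statistics import median

def compare_by_location(synthetic, rum, tolerance_ms=200):
    """Return locations where median RUM time exceeds median synthetic
    time by more than tolerance_ms, with the size of the gap."""
    flagged = {}
    for location in sorted(set(synthetic) & set(rum)):
        gap = median(rum[location]) - median(synthetic[location])
        if gap > tolerance_ms:
            flagged[location] = gap
    return flagged

# Hypothetical load times in milliseconds.
synthetic = {"us-east": [300, 320, 310], "ap-south": [400, 420, 410]}
rum = {"us-east": [350, 330, 340], "ap-south": [900, 950, 1000]}
print(compare_by_location(synthetic, rum))
```

A large gap in one location suggests that something outside the synthetic path, such as a local ISP, device mix, or CDN PoP, is affecting real users there.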

Where Your Users Are

You need to measure performance from wherever your users are: your PoPs. A monitoring solution should measure the quality of the internet services used to deliver the SaaS application from internet backbones around the world, including DNS and acceleration services that support application delivery as well as internal network services.

3 Key Takeaways

When it comes to blind spots, there’s really no better way to deal with them than to prepare.

Make sure you choose the right vendors.
Have backups ready for each piece of your infrastructure (including a backup site).
Monitor each piece of your infrastructure so that you can determine where the problem lies and improve your MTTR.
