If you own a car, you already know the concept of preventive maintenance: change the oil and filters regularly and the likelihood of engine failure decreases dramatically. Your engine is still fine, there aren’t any weird noises and it operates as it should, but you still plan some maintenance because replacing a broken engine or gearbox costs far more. And that’s not all: try to remember a time your car broke down when you did NOT need it most. Without preventive maintenance, you increase your chances of losing money and time, and maybe gaining a couple of new grey hairs.
Data centers, as you can imagine, require preventive maintenance as well, but the stakes are much higher: not only does downtime cost you money for repairs, you also lose customers and damage your brand.
So, on a schedule built around equipment age or workload importance, be sure to include the following in your checklist:
- Visual inspection to check that all components are clean and function within their designed specifications
- Environmental inspection to check your data center room conditions: temperature, airflow, dust, etc.
- Disk scans to ensure that you don’t have any unexpected data corruption. Add capacity planning to this to ensure you don’t run out of space
- Event log inspection to see the trends, security threats or any unusual activity
- Test and then install patches and updates
- Schedule the next maintenance
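The capacity planning item in the checklist can be turned into a simple calculation. The sketch below is only an illustration: it assumes you record how much space is used on a volume at each maintenance visit and that usage grows roughly in a straight line. The function name `days_until_full` and the sample figures are hypothetical.

```python
from datetime import date


def days_until_full(samples, capacity_gb):
    """Estimate days until a volume fills, using a straight line through
    the first and last usage samples.

    samples: list of (date, used_gb) tuples, oldest first.
    Returns None if usage is flat or shrinking.
    """
    (first_day, first_used), (last_day, last_used) = samples[0], samples[-1]
    elapsed_days = (last_day - first_day).days
    if elapsed_days <= 0:
        raise ValueError("need samples from at least two different days")
    growth_per_day = (last_used - first_used) / elapsed_days
    if growth_per_day <= 0:
        return None
    return (capacity_gb - last_used) / growth_per_day


# Hypothetical readings for a 1,000 GB repository volume.
usage = [
    (date(2024, 1, 1), 600.0),
    (date(2024, 1, 31), 660.0),
]
remaining = days_until_full(usage, capacity_gb=1000.0)
print(f"Estimated days until full: {remaining:.0f}")
```

If the estimate is shorter than the gap to your next scheduled maintenance, that is a signal to add capacity now rather than later.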
What are the benefits of preventive maintenance?
If you stick to your schedule and carefully investigate and fix any problems you find in your data center, here’s what you gain:
- Enhanced efficiency of your equipment. Data centers consume a massive amount of energy, so if you want to lower your operational expenses, make sure your equipment works efficiently.
- Extended server lifespan. Keep in mind that server components running outside of their requirements will degrade faster, so don’t let them overheat, pay attention to electricity spikes, dust contamination, etc.
- Environmental sustainability. Today, this is one of the priorities you should keep in mind. Regular environmental maintenance helps your data center run more efficiently.
How can preventive maintenance reduce business risks?
Unexpected downtime will not only affect your hardware, it will also cost you money. According to the Availability Report, one hour of downtime of a “high-priority” application is estimated to cost around $67,000. You will also lose clients who are not satisfied with your RTOs and RPOs.
These failures not only affect your customers, they also put additional psychological strain on your employees: extra shifts, stress and time pressure.
By performing preventive maintenance, you can take care of your equipment, protect your brand reputation and keep employees happy.
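To put the hourly figure quoted above in perspective, here is a rough back-of-the-envelope calculation. It assumes the cost scales linearly with outage duration, which is a simplification of my own and not something the report states. The function name `downtime_cost` is hypothetical.

```python
HOURLY_COST_USD = 67_000  # figure quoted above for a "high-priority" application


def downtime_cost(minutes, hourly_cost=HOURLY_COST_USD):
    """Return the estimated cost of an outage lasting `minutes`,
    assuming cost grows linearly with time (a simplification)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return hourly_cost * minutes / 60


for outage_minutes in (15, 60, 240):
    print(f"{outage_minutes:>4} min -> ${downtime_cost(outage_minutes):,.0f}")
```

Under that assumption, even a 15-minute outage of one such application lands in five figures.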
What are the differences between preventive and reactive maintenance?
This is easy: maintenance can be planned or unplanned. Preventive maintenance is planned and carried out to prevent breakdowns. Reactive maintenance is unplanned: a response to a breakdown that has already happened.
My grandma used to say, “Don’t go solving problems that didn’t happen yet.” While that might be good advice in daily life and an easy concept to follow, it’s a bad strategy for a data center. Repairing or replacing broken equipment might seem like an easy plan, but it’s very costly and doesn’t meet modern RTO and RPO requirements. So, let’s agree that we will use some form of preventive maintenance.
What are the different types of preventive maintenance?
There are a few types of preventive maintenance, but they have quite a lot in common and you can certainly combine them in your data center maintenance strategy.
Planned maintenance
This strategy means that you assess your equipment at fixed time intervals. The goal here is to increase the reliability of your servers by regularly checking and cleaning the equipment and fixing any issues. The downside of this type of maintenance is that it’s time-consuming and doesn’t take asset wear into account.
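As a minimal sketch of interval-based scheduling, the snippet below picks the next maintenance date from workload importance and equipment age. The interval lengths and the five-year age threshold are made-up values for illustration, and `next_maintenance` is a hypothetical helper; choose numbers that fit your own environment.

```python
from datetime import date, timedelta

# Hypothetical intervals in days; tune these to your own equipment and workloads.
BASE_INTERVAL_DAYS = {"critical": 30, "standard": 90, "low": 180}


def next_maintenance(last_done, importance, age_years):
    """Return the next maintenance date for one asset.

    Assets older than five years (an assumed threshold) get their
    interval halved so they are inspected more often.
    """
    interval = BASE_INTERVAL_DAYS[importance]
    if age_years > 5:
        interval //= 2
    return last_done + timedelta(days=interval)


print(next_maintenance(date(2024, 3, 1), "standard", age_years=2))
print(next_maintenance(date(2024, 3, 1), "critical", age_years=7))
```

Note that the schedule depends only on the calendar and the asset's age, not on its measured condition, which is exactly the limitation described above.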
Predictive maintenance
This plan requires purchasing some additional condition-monitoring equipment. Special sensors measure asset wear based on vibrations, infrared heatmaps, ultrasonic scans, etc. Based on this data, you’ll be able to predict when a piece of equipment is about to fail and fix or replace it in advance. However, this is costly and requires skilled personnel to interpret the sensor data.
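To give a feel for what interpreting sensor data can involve, here is a deliberately simple sketch: it flags an asset when the average of its latest readings rises well above a baseline taken from earlier readings. Real condition-monitoring systems use far more sophisticated models. The function name `is_drifting`, the window sizes, the multiplier `k` and the vibration values are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def is_drifting(readings, baseline_size=10, recent_size=3, k=3.0):
    """Return True when the average of the latest readings sits more than
    k standard deviations above the baseline average."""
    if len(readings) < baseline_size + recent_size:
        raise ValueError("not enough readings")
    baseline = readings[:baseline_size]
    recent = readings[-recent_size:]
    threshold = mean(baseline) + k * stdev(baseline)
    return mean(recent) > threshold


# Hypothetical fan vibration readings in mm/s.
healthy = [2.0, 2.1, 1.9, 2.0, 2.2, 2.0, 1.9, 2.1, 2.0, 2.0, 2.1, 2.0, 1.9]
wearing = healthy[:10] + [2.9, 3.1, 3.3]

print("healthy fan drifting:", is_drifting(healthy))
print("wearing fan drifting:", is_drifting(wearing))
```

Averaging several recent readings rather than reacting to a single spike is one way to reduce false alarms, at the price of a slightly slower warning.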
Proactive maintenance
This is a more modern type of predictive maintenance that involves figuring out the reason for failures. This type of maintenance happens only when necessary. You gather data about your equipment, and only when you detect a possible failure of a certain component do you apply the required fixes. The main goal of this type of maintenance is to extend your equipment’s lifespan.
Veeam’s approach to preventive maintenance
Veeam has a number of tools that help you identify problems with your data before an emergency strikes, so you don’t restore a backup only to find out that it’s been contaminated with a virus. We offer Secure Restore to check your backups for malware, and SureBackup and SureReplica to test that your data is recoverable before you need it. But one of the most recent and impressive features is Veeam’s Intelligent Diagnostics.
Veeam’s Intelligent Diagnostics has a lot in common with the preventive maintenance I mentioned above. Your Veeam ONE has a large pool of VID signatures that describe various possible causes of failures in your Veeam environment: Veeam Backup & Replication servers, proxies, repositories, etc. These signatures are based on data that our customer support gathers and updates on a regular basis. If there’s something in your environment that can cause a failure, there’s a good chance someone somewhere has already encountered the same problem and we already know the cause and the remediation. If the problem has been identified, you simply agree to the suggested remediation actions and the problem is gone. You can learn more about this here.
Using this feature helps prevent surprises in an emergency, so you can rest assured that your data is safe and available.
If you want to see Veeam’s Intelligent Diagnostics in action, try our 30-day trial for free.