In theory, hot spots should not exist, because most data centers have cooling capacity that exceeds the needs of the data center. However, 10% of racks run hotter than recommended reliability guidelines, according to a study done by the Uptime Institute – and the percentage is trending even higher.
From a business standpoint, ignoring the growing problem of hot spots can result in poor server reliability and performance, or even server damage. Data center operators are always seeking solutions to address the risks, but an incorrect fix can actually create more problems and threaten IT performance. Hot spots can also have a negative impact on hardware manufacturer warranties and maintenance agreements.
Properly eliminating hot spots requires an understanding of what they really are, their root causes, and how to identify them, according to the white paper "How to Fix Hot Spots in the Data Center."
Understanding hot spots
So what exactly is a hot spot? By definition, it’s a location at the intake of IT equipment where the measured temperature is greater than the expected value as recommended by ASHRAE TC 9.9.
To clarify some common misconceptions, it’s important to realize that a hot spot is not:
- a random hot temperature inside of a data center
- a product of insufficient cooling capacity or excessive heat load.
Instead, a hot spot arises from the inadequate use of cooling capacity. In other words, ineffective airflow management prevents the delivery of cooling to where it’s needed. A cooling capacity usage assessment can help calculate percentages of airflow in a data center.
Where to find hot spots
Hot spots can occur in different places throughout a data center, but they are most often found toward the top of a rack. The earlier a data center operator can isolate a hot spot, the better the chances of preventing equipment from overheating and malfunctioning.
Methods for detecting a hot spot include:
- Feel it by walking around: Checking in front of racks while walking around the data center offers the lowest cost but least accurate method.
- Manual measurements: Tools such as plastic temperature strips, a temperature gun, and forward-looking infrared (FLIR) cameras provide better accuracy and quantify the temperature. This is a low-cost yet effective method of locating a hot spot.
- Automatic monitoring: The best method, automatic monitoring, can display live data to illustrate the thermal conditions of the server or data center. Together with data center infrastructure management (DCIM) software such as StruxureWare, specific personnel can receive real-time alerts via email or text should temperatures reach an undesirable threshold.
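The threshold check behind automatic monitoring can be illustrated with a minimal Python sketch. It is not how StruxureWare or any particular DCIM product works; the `InletReading` type, the `find_hot_spots` function, and the sample readings are hypothetical, and the 27 °C limit is an assumption based on the upper end of the ASHRAE TC 9.9 recommended inlet range, which a site may set differently.

```python
from dataclasses import dataclass

# Assumption: upper end of the ASHRAE TC 9.9 recommended inlet range, in Celsius.
RECOMMENDED_MAX_INLET_C = 27.0


@dataclass
class InletReading:
    """One temperature sample taken at the air intake of a rack (hypothetical type)."""
    rack_id: str
    position: str  # e.g. "top", "middle", "bottom"
    temp_c: float


def find_hot_spots(readings, threshold_c=RECOMMENDED_MAX_INLET_C):
    """Return readings whose inlet temperature exceeds the threshold, hottest first."""
    hot = [r for r in readings if r.temp_c > threshold_c]
    return sorted(hot, key=lambda r: r.temp_c, reverse=True)


# Illustrative sample data, not measurements from the white paper.
readings = [
    InletReading("A01", "bottom", 21.5),
    InletReading("A01", "top", 29.0),
    InletReading("A02", "top", 31.2),
    InletReading("A02", "middle", 24.0),
]

for r in find_hot_spots(readings):
    print(f"ALERT rack {r.rack_id} ({r.position}): {r.temp_c:.1f} C")
```

In a real deployment the print statement would be replaced by whatever email or text alerting the monitoring system provides.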
Typical actions taken by data center managers after identifying hot spots include placing perforated tiles in hot aisles, positioning racks and perforated tiles close to cooling units, and blowing air across ice into cold aisles. These practices have proven to be ineffective.
Other actions, such as lowering the cooling unit set point, placing pedestal fans in front of problem racks, and rolling in portable cooling units, are only temporary solutions. All of these practices are common but not recommended, as they do not address the main causes of hot spots: bypass and recirculation.
Examples of best practices
A good practice for fixing hot spots and saving energy involves identifying the problem load and relocating it to a lower-density rack. Another best practice is rack airflow management, which can eliminate hot spots through the use of blanking panels, side air distribution units, or rack-mounted, fully ducted air supply units.
Allowing DCIM to control the airflow of cooling units is an additional best practice, according to the white paper, but it should only be done once adequate air management practices are in place.
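As a rough illustration of the relocation step, the Python sketch below picks the least-loaded rack that can absorb a problem load without exceeding a per-rack power limit. The function name `pick_relocation_target`, the rack names, the kW figures, and the 6 kW limit are all hypothetical; the white paper does not prescribe this selection rule.

```python
def pick_relocation_target(rack_load_kw, source_rack, load_kw, max_rack_kw):
    """Pick the least-loaded other rack that can take load_kw within max_rack_kw.

    Returns the rack name, or None if no rack has enough headroom.
    """
    candidates = {
        rack: kw
        for rack, kw in rack_load_kw.items()
        if rack != source_rack and kw + load_kw <= max_rack_kw
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Illustrative per-rack loads in kW (assumed values).
rack_load_kw = {"A01": 7.5, "A02": 3.0, "A03": 4.5}

# Move a 2 kW load out of the hot rack A01, assuming a 6 kW per-rack limit.
target = pick_relocation_target(rack_load_kw, "A01", load_kw=2.0, max_rack_kw=6.0)
print(target)  # A02
```

Power density is only one input to such a decision; available rack space, network connectivity, and airflow at the destination rack would also need to be checked.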