Planning, creating, and building a data centre can be one of the most expensive tasks an IT director faces. To maximize cost-effectiveness and achieve optimum performance, reliability is key.
Data centres range in size from a single room in an office to an entire building, but in every case some basic requirements must be met to ensure system reliability. Careful planning at the design stage is therefore essential: a number of areas must be addressed to ensure a dependable, efficient system that is capable of continued operation.
Understand the potential causes of failure
The most commonly cited causes of data centre failure are:
- Environmental problems
- Software failure, for example memory leaks
- Hardware failure, such as storage or processing problems
- Operator or procedural error
- Poor network reliability
- Security breaches, for example a hacker attack
Environmental considerations
When planning a data centre, there are a number of physical and architectural design features which must be implemented to ensure reliability:
- Adequate air supply: temperature must be maintained between 20 and 25 °C and humidity between 40 and 60 %. Too much humidity can cause water to condense on internal components; if the air is too dry, static electricity can build up and discharge. Malfunction is likely if these ranges are not maintained, and this is one of the prime causes of data centre failure. Adequate air conditioning, and an architectural design that allows air to circulate between units, are vital. Particular care must be taken to prevent hotspots from occurring.
- Safeguard against power loss: external environmental factors such as hurricanes or snowstorms can cause power blackouts. It is vital to have a generator to ensure continued operation, as well as an uninterruptible power supply (UPS) for emergency power. Both should be large enough to power the cooling systems as well as the IT equipment.
- Fire protection systems: the simplest form of fire protection is smoke detectors, which give early warning of a fire. It is also vital to contain a fire so that it cannot spread to the entire data centre, for example with contained sprinkler systems or gaseous fire suppression.
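The recommended ranges above translate directly into a simple monitoring rule. The following Python sketch checks a single temperature and humidity reading against the 20 to 25 °C and 40 to 60 % ranges given in this section. The function name `check_environment` and the sensor labels are hypothetical; a real deployment would feed it from whatever environmental monitoring system is installed.

```python
# Recommended operating ranges, taken from the guidance above.
TEMP_RANGE_C = (20.0, 25.0)        # degrees Celsius
HUMIDITY_RANGE_PCT = (40.0, 60.0)  # relative humidity, percent


def check_environment(temp_c: float, humidity_pct: float) -> list:
    """Return a list of warnings for one sensor reading (hypothetical helper).

    An empty list means the reading is inside both recommended ranges;
    the range boundaries themselves count as acceptable.
    """
    warnings = []
    low_t, high_t = TEMP_RANGE_C
    low_h, high_h = HUMIDITY_RANGE_PCT

    if temp_c < low_t:
        warnings.append(f"temperature {temp_c} C below {low_t} C")
    elif temp_c > high_t:
        warnings.append(f"temperature {temp_c} C above {high_t} C: possible hotspot")

    if humidity_pct > high_h:
        warnings.append(f"humidity {humidity_pct}% above {high_h}%: condensation risk")
    elif humidity_pct < low_h:
        warnings.append(f"humidity {humidity_pct}% below {low_h}%: static discharge risk")

    return warnings


if __name__ == "__main__":
    # Hypothetical sensor names and readings, for illustration only.
    readings = {"rack-a1": (22.5, 45.0), "rack-b4": (27.0, 35.0)}
    for sensor, (t, h) in readings.items():
        problems = check_environment(t, h)
        print(sensor, "OK" if not problems else "; ".join(problems))
```

Checking each rack separately, rather than a single room average, is what makes it possible to spot the local hotspots the text warns about.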
Software, hardware or network failure
Tested, quality-assured hardware and software from reputable brands can help increase reliability. A malfunction in one component can quickly lead to failure in another: a failed internal fan, for instance, can cause storage discs to overheat. Network performance and reliability also have a huge impact on the performance of the data system as a whole.
Operational procedures
It is impossible to rule out human error and operational issues completely. However, devising operating procedures that not only maximize performance but also track reliability and malfunctions is key. Conduct regular back-ups of each production server so that files can be restored quickly if they are damaged. Provide adequate operator training to enforce protocol and avoid the most basic errors, such as leaving discs in drives, which would prevent an automatic reboot in the event of system failure.
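A minimal sketch of the regular back-up step, assuming Python is available on each production server. It archives a directory into a timestamped .zip file with the standard library's `shutil.make_archive` and restores it with `shutil.unpack_archive`. The function names `backup_directory` and `restore_backup`, and any paths used with them, are hypothetical; a production scheme would normally also copy archives off-site and prune old ones.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_directory(source: Path, dest_dir: Path) -> Path:
    """Write a timestamped .zip archive of `source` into `dest_dir`.

    `dest_dir` should live outside `source` (ideally on separate storage),
    otherwise the archive could end up containing part of itself.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    # UTC timestamp so archives from different servers sort consistently.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    base_name = dest_dir / f"{source.name}-{stamp}"
    # make_archive appends ".zip" itself and returns the full archive path.
    archive = shutil.make_archive(str(base_name), "zip", root_dir=source)
    return Path(archive)


def restore_backup(archive: Path, target: Path) -> None:
    """Unpack a previously written archive into `target` for quick file repair."""
    target.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive), str(target))
```

How often this runs (for example nightly from cron) is an operational decision; restoring to a scratch directory from time to time is a cheap way to confirm the back-ups actually work.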
Data security
Adequate physical security is particularly important in large data centres holding sensitive information. Corporations may consider outsourcing their data centre to an off-site managed hosting service with 24-hour security guards and video surveillance. System security also requires keeping security and anti-virus software up to date.
Avoid single points of failure
One final key consideration is to avoid having a single point of failure. Test the system before it goes into operation and ensure that, if one component fails, there is sufficient backup for the data centre to keep functioning. Regular back-ups also greatly reduce the risk of important data being lost.
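The benefit of removing single points of failure can be estimated with a standard availability calculation. Assuming failures are independent (a simplification), components that must all work multiply their availabilities, while n redundant copies of a component, each with availability a, give 1 - (1 - a)^n, because the group is down only when every copy is down. The 0.99 figures below are hypothetical, chosen only to illustrate the effect.

```python
from math import prod


def series_availability(availabilities: list) -> float:
    """Availability of components that must ALL work.

    Each component in the chain is a single point of failure, so the
    individual availabilities multiply (independent failures assumed).
    """
    return prod(availabilities)


def parallel_availability(a: float, n: int) -> float:
    """Availability of n redundant copies of one component.

    Assumes independent failures: the group fails only if all n copies
    are down at the same time, which happens with probability (1 - a) ** n.
    """
    if not 0.0 <= a <= 1.0 or n < 1:
        raise ValueError("a must be in [0, 1] and n must be at least 1")
    return 1 - (1 - a) ** n


if __name__ == "__main__":
    # Hypothetical chain: power, cooling and network, each 99% available.
    single = series_availability([0.99, 0.99, 0.99])
    redundant = series_availability([parallel_availability(0.99, 2)] * 3)
    print(f"no redundancy:       {single:.4f}")
    print(f"duplicated each one: {redundant:.4f}")
```

Under these assumptions, duplicating each of the three components raises availability from about 0.970 to about 0.9997. In practice a shared cause, such as both units drawing from the same power feed, breaks the independence assumption, which is one reason to test the system before it goes into operation.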