Managing data

Information Lifecycle Management (ILM) is seen as the way forward in controlling the data mountain — promising faster access and lower costs

By Peter Branton | Published January 29, 2006

[Image caption: Tougher transparency demands and huge data-generating applications such as ERP and CRM have all added to the data mountain.]

Forget all the talk in Europe about butter mountains; IT managers need only worry about the data mountain: the ever-growing mass of data that they need to store and look after. Unfortunately for them, it is a mountain that just keeps growing, rising ever higher with each new demand made on the over-worked IT department.

Sounds alarmist? Just consider some statistics. The Radicati Group, a firm of independent analysts that tracks the e-mail market, says that by 2007 the number of corporate e-mail users will grow to 773 million — resulting in a staggering 10.3 Pbytes of global e-mail traffic each day. Meanwhile, storage vendor StorageTek estimates that a bank which processes a million cheques a day and stores them as low-resolution JPEG files adds 55 GB of data every day. In seven years — the period for which a bank must store such information — it will accumulate a massive 137 Tbytes.

Tougher transparency demands from regulators, ever-increasing use of applications such as e-mail and the growing deployment of huge data-generating applications such as enterprise resource planning (ERP) and customer relationship management (CRM) all combine to add to the data mountain. The worldwide market for disk storage systems grew 13% in revenue terms in 2005 to touch US$6 billion, according to analyst firm IDC, which estimated that storage volume shot up by 58%.

But while the bill for storing data is headed upwards, companies are being pointed to the results of rather damning research. A report from the University of California, Berkeley, suggests that roughly 80% of data is replicated, redundant or both. And Robin Bloor of Bloor Research, a European firm tracking the IT industry, claims that 90% of data older than 90 days is never accessed.
In short, firms are storing more information than they need to, and they are wasting their storage resources. Research firm IDC estimates that average server utilisation rates are a paltry 15%. Put another way, enterprises buy several servers when one could have done the job.

Samir Achour, technology director, EMC Middle East, blames companies for applying static approaches to managing information. “This is the application, this is the server, this is the network, and so on. The outcome of this is a constant mismatch between investments in managing the information and the value of the information itself,” says Achour. A lack of consolidation of storage resources, coupled with little or no dynamic movement of data across storage devices, leads to poor utilisation of those resources.

But the trouble with managing data is that there is often no way of knowing why it was created in the first place. It is relatively easy to figure out when a piece of data was created and when it was last accessed; determining its relevance or importance is much harder. Dismissing data merely on the basis of age therefore doesn’t help.

Storage vendors claim they have an answer: Information Lifecycle Management (ILM). ILM creates a framework that enterprises can use to assign processes and policies to information, aligning its business value with an appropriate, cost-effective storage infrastructure — from the time the information is conceived to the time it is disposed of.

Companies store information for a variety of reasons. Some of it needs to be online all the time, as it is needed for the day-to-day functioning of the company. Information needed by auditors and regulators, on the other hand, can often be archived and stacked away on offline media such as tape. The value of data also changes with time.
Different types of data have different shelf-lives: an e-mail might be of little use after 30 days, but customer records, say for an insurance company, may need to be kept alive for years. Whatever is needed at a particular moment, or needs to be accessed frequently, should be kept online; the rest of the data can be tucked away on tapes at a remote location. By executing this repetitive task automatically, the process of moving data across various media types can run more efficiently. This, at its simplest, is ILM.

However, ILM is not the first attempt to bring sanity to information management. In 1974, IBM released a hardware-based solution to help manage storage more economically: hierarchical storage management, or HSM. It was designed to move increasingly inactive data down a hierarchy of progressively less accessible media. This provided a way to automate critical processes relating to stored data and the media it was stored on, and helped cut both the total pool of storage media and the administrative costs of managing storage.

HSM was good for mainframe applications, but as data complexity grew its limitations became more apparent. “HSM would move old data automatically from faster disks to slower disks or tape for performance reasons,” says Mohamed Alojaimi, technology-marketing manager, Oracle MEA. “The problem with this solution is that it is inflexible. It needs lots of explicit checks and controls from the application people. It is also very expensive and not everyone could afford it,” he claims.

While HSM based its decisions to move data to a cheaper medium on access times, ILM uses more comprehensive policies that can move data around based on its intrinsic value. IT managers can classify data and define its entire lifecycle: initial storage, migration to cheaper storage as the data becomes less important, archival strategies for the data, and disaster-recovery policies.
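The policy-driven movement ILM adds over HSM can be sketched as a small rule table that maps a class of data to a storage tier once it passes a given age. The sketch below is a minimal illustration of the idea, not any vendor's product; the data classes, thresholds and tier names are all assumed for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical lifecycle rule: once data of a given class exceeds an age
# threshold, it should move to the named tier. Thresholds are illustrative.
@dataclass
class LifecycleRule:
    data_class: str      # e.g. "email", "customer_record"
    max_age_days: int    # age beyond which the rule applies
    target_tier: str     # e.g. "online", "nearline", "tape_archive"

RULES = [
    LifecycleRule("email", 30, "tape_archive"),        # 30-day e-mail shelf-life
    LifecycleRule("customer_record", 365 * 7, "nearline"),  # keep 7 years
]

def tier_for(data_class: str, created: datetime, now: datetime) -> str:
    """Return the tier a piece of data should live on under the rules."""
    age = now - created
    for rule in RULES:
        if rule.data_class == data_class and age > timedelta(days=rule.max_age_days):
            return rule.target_tier
    return "online"  # default: keep current data on fast storage
```

Unlike HSM's access-time trigger, the rule here keys on what the data *is*, which is the distinction the vendors are drawing.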
The ability to provide multi-dimensional classification of data is ILM’s biggest strength. In a hospital environment, for example, tools that form part of an ILM solution can assign multiple criteria — age, value, patient information, reference data such as name and address — to decide how a record is stored, archived and backed up at a given point in time.

That means ILM is not a simple plug-and-play solution. Deploying the ILM philosophy is an enterprise-wide effort rather than an IT initiative. So how does a company go about rolling out ILM? A firm need not unleash an ILM initiative across the whole organisation in one go. Depending on criticality, or even commitment to the ILM strategy, classification can start with business processes, departments, projects or customers. But to get the most out of ILM, it will have to be deployed on a company-wide basis.

The key to getting started with ILM is the classification of data. Which filters are used varies from business to business: it could be the age of data, or the frequency with which a piece of information needs to be accessed. Transactional data — for instance, records of all the transactions that a bank’s customer has made in the past three months — will get higher priority, as the customer might want to retrieve this information from an ATM at any time.

But it is not a simple case of saying ‘put all the data generated in the last three months online and park the rest on tapes at some remote site’. A lot of the e-mails sitting on the corporate network, or documents that employees might have downloaded for reading during the same period, might be of little value. They may be less than three months old but could still need to be moved off expensive online disk storage. Some of this information might have no value at all and is therefore best got rid of.
On the other hand, some older information — a customer’s buying patterns, for example — might be very useful to the marketing department. Another way to ascertain which information is more, or less, valuable to the company, then, is to look at the applications that house their own databases: a CRM database might hold a lot of material that remains precious for a very long time. Assigning a value to data is a key concept underlying the ILM approach. There are several ways of doing this, including giving higher priority to data generated by certain applications. However, it is usually not an either/or situation: very often, a company has to use several filters and criteria to get the classification right.

Classifying data

[Image caption: Bosco Moraes of Hewlett-Packard Middle East.]

“The foundation of an ILM implementation is the taxonomy established to classify data,” says David Beck, storage sales manager, Sun MEA. Oracle’s Alojaimi agrees: “Information lifecycle management is all about understanding and classifying your data. It’s all about putting down criteria as to how you are going to move the data, which data should be stored where and why.”

Classification is done using storage resource management (SRM) tools. These provide complete visibility into both physical storage resources — such as redundant array of independent (or inexpensive) disks (RAID) systems, tape libraries and storage area network (SAN) switches — and logical storage objects such as volumes, files, users, database tables and I/O. Essentially, they tell IT managers what type of data is stored on what type of media, how it is growing and what can be done to utilise storage resources more efficiently. They also help managers plan proactively for the future. SRM tools can sort and classify data by various criteria: by file type (document, image and so on), by age, or even by the application generating the data.
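Combining several filters into one classification, as described above, amounts to scoring each file against age, source application and file type at once. The function below is a toy sketch of that idea; the criteria, weights and class names are invented for illustration, and a real SRM product would expose far richer policy controls.

```python
# Toy multi-criteria classifier: age, source application and file type
# each contribute to a score that maps to a storage decision.
# All weights and thresholds here are assumptions, not from any SRM tool.
def classify(age_days: int, app: str, file_type: str) -> str:
    score = 0
    if age_days <= 90:
        score += 2                 # recently created data
    if app in ("crm", "erp"):
        score += 2                 # data from business-critical applications
    if file_type in ("image", "download"):
        score -= 1                 # bulky content that is rarely re-read
    if score >= 3:
        return "keep_online"
    if score >= 1:
        return "archive"
    return "deletion_candidate"
```

Note how a recent download from a non-critical source still falls short of the online tier — exactly the 'less than three months old but still low value' case the article raises.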
But SRM tools can do more than just help with classification. Policies can be defined so that they automatically delete files: e-mails older than an assigned period, image files or duplicate files can all be deleted or archived by SRM software based on policy.

Once the data classification has been done, at least partially, users and IT staff can start negotiating their service level agreements (SLAs). IT and the business should define and agree on the service levels that need to be met for the different business applications and departments in the organisation, covering the security, accessibility, performance and availability policies for each application. “Best results are achieved by corporations, while implementing ILM, when ILM is understood as a concept providing value to both — business and IT,” says Bosco Moraes, storage business manager, Hewlett-Packard Middle East. This means that ILM requires internal selling, plus a big push from top management.

All of this is part of what EMC’s Achour describes as the “tiering” phase of ILM deployment. But to be completely ready for this phase, an organisation must go through a storage resource consolidation exercise. The IT department has to map all the storage resources within the firm and rationalise them. This needs to happen at two levels. The first is physical — which is what most IT departments will need to tackle first. It involves deciding which storage devices to keep and which to get rid of, and which resources to plug into the network and which to keep off it.

The second level is logical. To achieve it, an enterprise has to virtualise its storage resources. Put simply, virtualisation means treating several entities as one. This can mean treating several different storage devices as one giant pool, or grouping different virtual partitions across multiple servers.
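The logical consolidation step can be pictured as presenting many physical devices as one pool of capacity. The class below is a toy model of that idea only; the names are assumptions, and no real virtualisation layer works this simply.

```python
# Toy model of storage virtualisation: several physical devices are
# presented to servers as a single logical pool of capacity.
class StoragePool:
    def __init__(self):
        self.devices = []          # list of (name, capacity_gb) tuples

    def add_device(self, name, capacity_gb):
        self.devices.append((name, capacity_gb))

    def total_capacity_gb(self):
        # Consumers see one aggregate figure, not the individual arrays.
        return sum(capacity for _, capacity in self.devices)
```

Because servers draw from the aggregate rather than from a named array, free space on one device can absorb growth from another — which is how pooling attacks the 15% utilisation figure IDC quotes.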
Virtualising storage means, for example, that instead of having a disk drive attached to every server in the data centre, a disk array is shared by many servers. This is important because, while cutting down the volume of data and creating a hierarchy is one part of an ILM exercise, better utilisation of storage resources is another. Done well, this can substantially reduce the storage resources required, leading to savings in both buying and maintaining the storage infrastructure, claim the vendors, who insist it delivers a payback on investment within a year.

From here on, tiering is largely about applying common sense. Data that is required on a day-to-day basis should be kept on high-performance storage media such as fibre channel disks. This tier is generally kept relatively small compared to the next tier — the low-cost storage tier — to allow faster access to information that is required regularly.

The next tier typically hosts data that is not accessed regularly. Provisions can be made in the first tier to allow access to information in the second tier should an application require it, although access times will certainly be slower. “This kind of storage would usually utilise larger disks. These are, most of the time, cheap disks,” Alojaimi claims.

The third category of storage would normally hold data that is at least three years old or has not been accessed for a year, but which regulations might require the enterprise to keep online. This can be thought of as the historical storage tier, and it will most likely be very large. Products such as Centera from EMC are targeted specifically at data such as e-mail archives, electronic documents and MRI scans that needs to be stored in large volumes and kept online.
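The three-tier rule of thumb above can be written down as a placement function. The age and idle thresholds follow the figures quoted in the article (three years old, or untouched for a year, for the historical tier); the tier numbering, the 90-day second-tier cutoff and the function name are assumptions made for this sketch.

```python
from datetime import date

def place(created: date, last_access: date, today: date) -> int:
    """Assign data to one of three storage tiers by age and idleness."""
    age_days = (today - created).days
    idle_days = (today - last_access).days
    if age_days >= 3 * 365 or idle_days >= 365:
        return 3   # historical tier: very large, rarely accessed, kept online
    if idle_days >= 90:
        return 2   # low-cost, large-capacity disk
    return 1       # high-performance disk for day-to-day data
```

A migration job would simply re-run this over the catalogue on a schedule and move anything whose computed tier no longer matches where it sits.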
However, this information is usually not accessed frequently — much of it is not accessed at all — but is still needed just in case. Tape, rather than disk, is often the storage medium of choice for this layer. Even so, “strict access control policies should be put in place for access to this data, and information in this tier has to be protected and should not be open to modification,” says Alojaimi. In fact, some of the products targeted at this layer come in ‘governance editions’ or include features that block alterations to data.

At this stage, businesses can start to define the policies by which they automate the migration of data. Most storage solutions come with embedded software that can pick out data according to the policies defined and migrate it to the assigned storage tier, depending on the value assigned.

ILM is a powerful idea, and the number of believers among the software and hardware companies that matter has been growing. Even IBM, after dismissing it as merely a small part of its ‘business on demand’ vision, has come around to accepting ILM as a significant framework for managing an enterprise’s storage needs. Meanwhile, work is under way to fix the problem that plagues any new technology: incompatibility. The Storage Networking Industry Association (SNIA) is developing standards and best practices that will aid the implementation of ILM across enterprises and different industry sectors.

Those leading the ILM charge are now extending the definition of ILM. The next iteration is likely to bring network resources within its purview. “The network side becomes important because it is part of the IT infrastructure and the data centre. The applications are accessing the network just like the applications are accessing the SAN. That makes it part of the storage infrastructure,” Achour believes.

There is little dispute that enterprises need to look at their storage seriously.
And ILM provides a compelling way for them to go about this task. Whether they will adopt it is another matter. But any IT manager who still needs to be persuaded of the merits of ILM might just want to check out that data mountain and see how much it is still growing.
