A new approach to Big Data

The amount of data we generate is growing at unprecedented levels and IT managers are struggling to find ways to structure and analyse it all

By Simon Gregory | Published July 19, 2012

The amount of data we generate is growing at unprecedented levels and IT managers are struggling to find ways to structure and analyse it all. Simon Gregory looks at what can be done to bring our data under control.

Whilst Big Data brings with it a lot of ways to create information that offers real business value, it also presents new challenges for the IT department. It appears that there simply isn’t the time, the resources or the budget to manage, protect, index and retain massive amounts of unstructured data. The negative side effects of Big Data, which include risk, complexity and cost, clearly need to be met head-on if the positive benefits are to win out. Unfortunately, legacy data management methods and tools aren’t up to the task of managing or controlling the data explosion.

Multiple products, each originally created to solve an individual challenge, have been deployed to manage backup, archive and analytics, and this has resulted in administrative complexity. It has also created information silos, and the lack of reporting across these platforms ultimately reduces data visibility across an organisation and undermines the ability to introduce effective archiving strategies.

Traditional solutions also have two stages for each protection operation – scan and collection. In order to perform backup, archive and file analytics operations, each product must scan and collect files or information from the file system. Synthetic full, de-duplication and VTL solutions may have been introduced to try to reduce repository problems, but a lack of integration capabilities causes these solutions to fall short in the longer term. Typically, incremental scan times on large file systems can also require more time than the actual data collection. Regularly scheduled full protection operations then exceed backup windows and require heavy network and server resources to manage the process. It’s a vicious circle.
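
To see why scan time dominates, consider the following minimal Python sketch (the ./data path and the three product names are purely hypothetical, not any particular vendor’s tooling). Each standalone product walks the entire file system before collecting what is, on an incremental run, usually a small change set – the walk itself is the expensive part.

```python
import os
import time

def scan(root):
    """Walk the file system and record a modification time for every file.
    Each standalone product (backup, archive, analytics) repeats this full
    traversal, even when the incremental change set is tiny."""
    metadata = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                metadata[path] = os.stat(path).st_mtime
            except OSError:
                continue  # file removed mid-scan
    return metadata

def incremental_collect(metadata, last_run):
    """Collect only the files modified since the previous run."""
    return [path for path, mtime in metadata.items() if mtime > last_run]

if __name__ == "__main__":
    root = "./data"                      # hypothetical file share
    last_run = time.time() - 24 * 3600   # last night's run

    # Three separate products each pay the full scan cost before moving
    # a (usually small) incremental change set.
    for product in ("backup", "archive", "analytics"):
        changed = incremental_collect(scan(root), last_run)
        print(f"{product}: {len(changed)} files to collect")
```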

There is an alternative approach: adopt a unified strategy that collapses data collection operations into a single solution, enabling data to be copied, indexed and stored in an intelligent, virtual repository that provides an efficient and scalable foundation for e-Discovery, data mining and retention. Such an approach also enables data analytics and reporting to be performed from the index, helping to classify data and implement archive policies that tier data to lower-cost media.
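
As a rough sketch of what single-pass collection might look like (hypothetical paths, with SQLite standing in for the intelligent index – not any specific product’s architecture), the fragment below walks the file system once, copying data and indexing it in the same pass, so that archive classification can later be driven from the index alone, with no second scan.

```python
import os
import shutil
import sqlite3
import time

# A single shared index: one pass over the data feeds backup, archive and reporting.
index = sqlite3.connect("unified_index.db")
index.execute(
    "CREATE TABLE IF NOT EXISTS files "
    "(path TEXT PRIMARY KEY, size INTEGER, mtime REAL, tier TEXT)"
)

def single_pass_collect(root, repository):
    """One walk of the file system: copy the data and index it in the same pass."""
    os.makedirs(repository, exist_ok=True)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            shutil.copy2(path, os.path.join(repository, path.replace(os.sep, "_")))
            index.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, 'primary')",
                (path, stat.st_size, stat.st_mtime),
            )
    index.commit()

def apply_archive_policy(age_days=365):
    """Classify data from the index alone -- no second scan of the file system."""
    cutoff = time.time() - age_days * 86400
    index.execute("UPDATE files SET tier = 'archive' WHERE mtime < ?", (cutoff,))
    index.commit()
    return index.execute(
        "SELECT COUNT(*) FROM files WHERE tier = 'archive'"
    ).fetchone()[0]

if __name__ == "__main__":
    single_pass_collect("./data", "./repository")     # hypothetical locations
    print(apply_archive_policy(365), "files marked for the archive tier")
```

Because the archive policy queries the index rather than the production file system, classification adds no scan time however large the estate grows.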

The advantages here are immediately clear. Built-in, intelligent data collection and classification helps to reduce scan times, which in turn allows companies to maintain incremental backup windows. Single-pass data collection for backup, archive and reporting also helps to reduce server load and operations. Integration, source-side de-duplication and synthetic full backups then further reduce the network load, whilst a single index instantly decreases the silos of information. Instead of moving the pain point, a converged solution will create a single process that has the potential to reduce the combined time typically required to back up, archive and report by more than 50% compared with traditional methods, and it will deliver the simplified management tools required to affordably protect, manage and access data on systems that have become ‘too big’.
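
The network saving from source-side de-duplication and synthetic full backups can be sketched in a few lines of Python (a toy chunk store, not a real product’s API): chunks are hashed at the source, only chunks the repository has never seen travel over the wire, and a synthetic full is reassembled from chunks already held in the repository rather than re-read from production systems.

```python
import hashlib

class DedupRepository:
    """Toy chunk store keyed by content hash (illustrative only)."""

    def __init__(self):
        self.chunks = {}    # digest -> chunk bytes already held in the repository
        self.backups = []   # each backup is recorded as an ordered list of digests

    def source_side_backup(self, data, chunk_size=4096):
        """Hash chunks at the source and send only those the repository lacks."""
        manifest, bytes_sent = [], 0
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk   # only genuinely new data crosses the network
                bytes_sent += len(chunk)
            manifest.append(digest)
        self.backups.append(manifest)
        return bytes_sent

    def synthetic_full(self):
        """Reassemble a complete copy from stored chunks, without re-reading the source."""
        return b"".join(self.chunks[d] for d in self.backups[-1])

repo = DedupRepository()
print(repo.source_side_backup(b"A" * 10_000))               # first run: new chunks travel
print(repo.source_side_backup(b"A" * 10_000 + b"B" * 100))  # re-run: only the changed final chunk travels
print(len(repo.synthetic_full()))                           # full copy rebuilt in the repository
```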

What companies should be focused on is the use of one platform that enables those working with the information to intelligently manage and protect enormous amounts of data across a number of applications, hypervisors, operating systems and infrastructures from a single console. A policy-driven approach to protecting, storing and recovering vast amounts of data, whilst automating administration, will always be the best way to maximise IT productivity and reduce overall support costs. Eliminating manual processes and seamlessly tiering data to physical, virtual and cloud storage helps to decrease administration costs whilst increasing operational efficiencies, enabling IT departments to do more with less.
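
Conceptually, such a policy-driven approach can be as simple as a declarative set of tiering rules evaluated automatically against each file’s age; the sketch below uses invented tier names and thresholds purely for illustration.

```python
from datetime import datetime, timedelta

# Illustrative policy: tier names and age thresholds are assumptions,
# not any specific product's configuration syntax.
TIERING_POLICY = [
    {"max_age_days": 30,   "tier": "primary-disk"},
    {"max_age_days": 365,  "tier": "virtual-deduplicated"},
    {"max_age_days": None, "tier": "cloud-archive"},   # everything older
]

def select_tier(last_modified, policy=TIERING_POLICY, now=None):
    """Return the storage tier a file should live on under the policy."""
    now = now or datetime.now()
    age = now - last_modified
    for rule in policy:
        if rule["max_age_days"] is None or age <= timedelta(days=rule["max_age_days"]):
            return rule["tier"]
    return policy[-1]["tier"]

print(select_tier(datetime.now() - timedelta(days=10)))    # primary-disk
print(select_tier(datetime.now() - timedelta(days=200)))   # virtual-deduplicated
print(select_tier(datetime.now() - timedelta(days=900)))   # cloud-archive
```

Evaluating rules like these automatically, rather than by hand, is what removes the manual administration described above.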

A single data store would empower businesses to streamline data preservation and eliminate data redundancy during the review process, which is now considered to be one of the major causes of skyrocketing data management costs. The ability to more easily navigate, search and mine data could fundamentally mean that Big Data is finally viewed as an asset to the business, not a hindrance.

Simon Gregory is business development director, CommVault.

