Big data infrastructure: DIY or buy?

Samina Rizwan, ECEMEA Big Data Business Development Senior Director at Oracle, discusses the best approach to big data infrastructure.

By Samina Rizwan | Published September 12, 2016

Building your own big data capabilities from scratch seems tempting to many companies, especially in countries across the Middle East and Africa that pride themselves on a technically skilled workforce capable of taking on challenging technical projects. They see it as a way to tailor the technology to their specific business needs and to give their IT department complete oversight and control of the processes and capabilities.

There is some truth in this, but one must consider long-term as well as short-term scenarios. In fact, the approach is fraught with challenges that could mean business goals take a long time to be delivered, are only partially met, or aren't achieved at all. We must keep in mind that the IT function exists to serve the business; if business goals are delayed or remain unaddressed, IT's contribution to the health of the business remains questionable.

Here we explore the challenges of taking the DIY route with big data and how buying the capabilities packaged together on a pre-built system can overcome them.

Time to value

It's the classic IT service dilemma: how can we do more with less? This is particularly true of big data, which for many companies is a new set of capabilities, one that delivers valuable insights but can be challenging to execute.

When tackling big data for the first time, most IT departments will have little experience or expertise to apply. Add the complexity of building capabilities with technology from a range of vendors, and the challenge multiplies.

It's not just the building of the infrastructure that needs to be considered, but also the need to evaluate, test, develop, integrate and tune the environment. This can take considerable time, especially given gaps in skills, knowledge and experience.

The challenge of finding a talent pool with the right skills and expertise has a longer-term impact. The IT workforce is known to be transient: people change jobs within short periods, so there is always a possibility that the people engaged in the big data environment will no longer be part of the team after a few years. Losing expertise during the big data evolution phase is a risk that must be mitigated if benefits are to be reaped in a timely and effective manner. There may also be a basic lack of resources to devote to the task, meaning the project takes longer to complete, delaying the business benefits that big data is meant to deliver (the ‘time to value’).

And if there are problems with the implementation of the new technology, further complications and delays can follow, requiring significant time and investment to correct.

For example, poor network performance can take months to resolve under the DIY approach, especially given the multiple vendors involved. With the pre-built approach, a single vendor takes responsibility for tracking down and fixing the issue much more quickly.

For many businesses, the ‘buy’ approach will be the better option. By investing in a suite of big data technologies packaged together, whether deployed on-premise or via the cloud, companies can address all of these time-to-value challenges.

With such an approach, where the solution is engineered to work together, the technology is already tested, integrated and optimised for the task. The vendor provides the expertise and support that may be lacking in the IT department, and that support evolves with the technology.

And while even well-thought-out DIY implementations take months to become production-ready, Oracle's Big Data Appliance can, in theory, be up and running on-premise in a matter of hours.

Counting the pennies

Some organisations may assess the DIY approach as a means to reduce capex. They hold the opinion that a vendor charges a premium for the expertise and effort it puts into packaging the technology as an engineered system, and that such a premium is unnecessary. By choosing the lowest upfront cost for each component of the big data stack, and by applying in-house or multi-vendor skills, they expect to ‘pay less’.

But, according to research by the Enterprise Strategy Group (ESG), commissioned by Oracle, taking the pre-built approach when ramping up your big data capabilities is likely to result in substantial capex and opex savings.

For a medium-sized Hadoop-oriented big data project, ESG found that a pre-built system, like the Oracle Big Data Appliance, could be around 45% cheaper than the DIY equivalent.

As an example of the savings a pre-built solution provides, Oracle includes the annual subscription licence for Cloudera Enterprise as part of its fully tested and integrated hardware and software solution; buying the same Cloudera licence separately would incur an additional annual fee, increasing the overall cost of ownership.
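To make the arithmetic concrete, here is a back-of-the-envelope three-year TCO comparison. Every figure below is a hypothetical placeholder (chosen so the result lands near ESG's headline 45%), not an ESG or Oracle number; the point is the structure of the comparison, not the amounts.

```python
# Illustrative three-year TCO comparison for a medium-sized Hadoop project.
# All figures are hypothetical placeholders, not ESG's or Oracle's numbers.

YEARS = 3

diy = {
    "servers_and_network": 400_000,          # hardware bought piecemeal
    "hadoop_subscription": 120_000 * YEARS,  # e.g. a Cloudera licence bought separately
    "integration_and_tuning": 250_000,       # in-house build, test and tuning effort
    "annual_ops_staff": 150_000 * YEARS,     # admins for a multi-vendor stack
}

prebuilt = {
    "appliance": 500_000,                    # engineered system, licence bundled in
    "annual_support": 40_000 * YEARS,        # single-vendor support contract
    "annual_ops_staff": 60_000 * YEARS,      # less integration work to maintain
}

diy_total = sum(diy.values())
prebuilt_total = sum(prebuilt.values())
saving = 1 - prebuilt_total / diy_total

print(f"DIY 3-year TCO:       ${diy_total:,}")
print(f"Pre-built 3-year TCO: ${prebuilt_total:,}")
print(f"Saving:               {saving:.0%}")
```

The pattern to note is that the pre-built side trades piecemeal hardware, separately purchased licences and heavy integration effort for a single appliance price and a single support contract.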

By taking the ‘buy’ approach, Belgian media group De Persgroep was able to deploy its big data project in three months. The Big Data Appliance also proved more cost-effective than an internally built Apache Hadoop cluster, which would have required multiple servers and software licences, as well as greater maintenance resources.

De Persgroep analysed customer behaviour, such as website interactions and payment patterns, enabling it to predict subscription churn for its newspaper business with 92% accuracy.
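The article doesn't describe De Persgroep's model, but a churn predictor of this general kind can be sketched in a few lines. The following is a minimal illustration using a logistic regression over synthetic behavioural features; the feature names and data are invented for the example.

```python
# Minimal sketch of a subscription-churn model over behavioural features.
# De Persgroep's actual features and algorithm are not public; the feature
# names and synthetic data here are purely illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical behavioural features: site visits per week, days since the
# last visit, and count of late subscription payments.
visits_per_week = rng.poisson(5, n)
days_since_visit = rng.exponential(7, n)
late_payments = rng.poisson(0.5, n)

X = np.column_stack([visits_per_week, days_since_visit, late_payments])

# Synthetic ground truth: churn is more likely for inactive, late-paying readers.
logits = -1.5 - 0.3 * visits_per_week + 0.15 * days_since_visit + 1.2 * late_payments
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print(f"Held-out accuracy: {accuracy_score(y_test, model.predict(X_test)):.0%}")
```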

Future proofing

Open source big data technologies develop at such speed that organisations wanting to remain at the cutting edge must continuously evaluate and integrate new open source projects while still delivering enterprise-grade platforms and services. For example, there is currently a trend towards the Apache Spark cluster computing framework, a shift that means significant migration and integration activity for Hadoop users who want to apply the relevant technology.
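For readers unfamiliar with Spark, the sketch below shows the sort of workload behind that trend: a simple clickstream aggregation expressed as a few DataFrame operations rather than hand-written MapReduce. It assumes a working pyspark installation; the file path and column names are hypothetical.

```python
# Minimal PySpark sketch of a typical workload moving from classic
# MapReduce to Spark: a simple aggregation over clickstream data.
# The HDFS path and column names are invented for illustration.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clickstream-agg").getOrCreate()

# Read a (hypothetical) clickstream export; Spark infers the schema.
clicks = spark.read.json("hdfs:///data/clickstream/2016-09/*.json")

# Page views per user per day: the sort of job that once needed
# hand-written MapReduce, expressed here as DataFrame operations.
daily_views = (
    clicks
    .withColumn("day", F.to_date("timestamp"))
    .groupBy("user_id", "day")
    .count()
)

daily_views.show(10)
spark.stop()
```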

This task is easily addressed in the Oracle Big Data stack, which is engineered to work together. Cloudera's distribution for Hadoop is part of Oracle's Big Data Appliance, so the technology can be easily and quickly updated as it evolves. Testing, integration and support efforts are part of the services that Oracle delivers. The cloud-ready nature of Oracle's capabilities also means that organisations can easily test their big data capabilities in the cloud, then migrate the services on-premise if and when they feel the time is right. In contrast, the DIY approach makes this a hugely complicated and time-consuming process.
