Prioritize predictable performance in Hadoop

Organizations running Hadoop in production can ensure that high-priority jobs complete on time, every time

The growth of Apache Hadoop over the past decade has proven that this open source technology's ability to process data at massive scale and give many users access to shared resources is more than hype. The downside, however, is that Hadoop lacks predictability: it does not allow enterprises to ensure that the most important jobs complete on time, nor does it make effective use of a cluster's full capacity.

YARN provides the ability to preempt jobs in order to make room for other jobs that are queued up and waiting to be scheduled. Both the Capacity Scheduler and the Fair Scheduler can be statically configured to kill containers belonging to jobs that are taking up cluster resources otherwise needed to schedule higher-priority jobs.
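
As a rough illustration, the configuration sketch below shows how preemption is typically switched on, either for the Capacity Scheduler via the scheduling monitor in yarn-site.xml, or for the Fair Scheduler via yarn-site.xml plus per-queue settings in the allocations file. Property names should be verified against your Hadoop release; the queue names (prod, adhoc), weights, and timeout values are illustrative only.

<!-- Option 1: Capacity Scheduler. Enable the preemption monitor in yarn-site.xml. -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>

<!-- Option 2: Fair Scheduler. Turn preemption on in yarn-site.xml ... -->
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>

<!-- ... and set per-queue preemption timeouts in fair-scheduler.xml.
     "prod" and "adhoc" are illustrative queue names; timeouts are in seconds. -->
<allocations>
  <queue name="prod">
    <weight>4.0</weight>
    <fairSharePreemptionTimeout>30</fairSharePreemptionTimeout>
  </queue>
  <queue name="adhoc">
    <weight>1.0</weight>
  </queue>
</allocations>

Only one scheduler is active in a cluster at a time, so only the corresponding block applies.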

These tools can be used when queues are getting backed up with jobs waiting for resources. Unfortunately, they do not resolve the real-time contention problems for jobs already in flight. YARN does not monitor the actual resource utilization of tasks when they are running, so if low-priority applications are monopolizing disk I/O or saturating another hardware resource, high-priority applications have to wait.

As organizations become more advanced in their Hadoop usage and begin running business-critical applications in multitenant clusters, they need to ensure that high-priority jobs do not get stomped on by low-priority jobs. This safeguard is a prerequisite for providing quality of service (QoS) for Hadoop, but has not yet been addressed by the open source project.

Let’s explore the problem by considering a simple three-node cluster, as illustrated in Figure 1. In this example there are two jobs in the queue ready to be scheduled by the YARN ResourceManager. The ResourceManager has determined that the business-critical HBase streaming job and the low-priority ETL job can run simultaneously on the cluster and has scheduled them for execution.

Figure 1: Simple three-node cluster with two jobs in the YARN ResourceManager queue.

Figure 2 illustrates a runtime situation on this cluster without QoS, where YARN has determined that the cluster has sufficient resources to run a low-priority job and a business-critical job simultaneously. In most situations there is an expectation that the business-critical job will complete within a certain period of time defined by a service-level agreement (SLA). The low-priority job, on the other hand, has no such expectation and can be delayed in favor of the higher-priority job.

Figure 2: High-priority job slowed by low-priority job due to disk I/O contention.

In this scenario the low-priority job starts accessing HDFS; soon after, the business-critical job needs access to the same data location in HDFS. The read and write requests from the two jobs are interleaved so that the business-critical job has to wait when the low-priority job has control of the disk I/O. Now, in this small-scale example, this wait time will very likely not result in a significant delay or compromise the SLA guarantee of the business-critical job. However, in a multinode Hadoop deployment, low-priority workloads could easily pile up and compete for hardware access, resulting in unacceptable delays in execution time for high-priority workloads.

There are a few solutions to this problem. One is to have separate clusters for business-critical applications and for low-priority applications. This is a commonly recommended best practice, and it would seem to be a perfectly logical solution to the problem of guaranteeing QoS. The downside of this approach is wasted capacity and additional overhead in maintaining multiple clusters. Another way to “guarantee QoS” is to keep a single cluster but manually restrict low-priority jobs to certain time periods in which the cluster operator does not schedule high-priority jobs. In practice, companies often find these approaches to be unworkable or too complex to manage.

A more effective solution to the problem of resource contention is to monitor the hardware resources of each node in the cluster -- in real time -- in order to understand which job has control over resources (in this instance, disk I/O). This real-time awareness, along with knowledge of the priority levels of each job across the cluster, can be used to force the low-priority jobs to relinquish control over hardware resources that are needed by high-priority jobs. This dynamic resource prioritization ensures that all jobs get access to cluster hardware resources in an equitable manner so that business-critical jobs can finish on time.
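
To make the idea concrete, here is a deliberately simplified, per-node sketch of what such dynamic resource prioritization could look like. This is not Pepperdata's implementation: the helpers sample_task_io and set_io_weight are hypothetical stand-ins for real mechanisms such as cgroup I/O accounting and I/O weights, and the saturation threshold is an assumed figure.

# Conceptual sketch of per-node dynamic resource prioritization.
# Not Pepperdata's implementation: sample_task_io() and set_io_weight()
# are hypothetical hooks standing in for mechanisms such as cgroup
# I/O accounting and I/O weights.

import time

DISK_SATURATION_BPS = 180 * 1024 * 1024  # assumed disk ceiling (~180 MB/s)
POLL_INTERVAL_SECS = 1                   # second-by-second monitoring

def sample_task_io():
    # Hypothetical hook: report (task_id, priority, bytes_per_sec) for
    # each task running on this node. Stubbed with example numbers here.
    return [
        ("hbase_region_scan", "high", 40 * 1024 * 1024),
        ("etl_mapper_07", "low", 90 * 1024 * 1024),
        ("etl_mapper_12", "low", 70 * 1024 * 1024),
    ]

def set_io_weight(task_id, weight):
    # Hypothetical hook: apply an I/O weight (0.0 to 1.0) to a task,
    # e.g. by adjusting its cgroup settings. Stubbed as a print here.
    print("set I/O weight of %s to %.1f" % (task_id, weight))

def rebalance_once():
    tasks = sample_task_io()
    total_bps = sum(bps for _, _, bps in tasks)
    high_present = any(priority == "high" for _, priority, _ in tasks)
    low_tasks = [task_id for task_id, priority, _ in tasks if priority == "low"]

    if high_present and total_bps > 0.9 * DISK_SATURATION_BPS:
        # Disk is near saturation while a high-priority task is running:
        # squeeze the low-priority tasks so the critical job keeps moving.
        for task_id in low_tasks:
            set_io_weight(task_id, 0.1)
    else:
        # No contention: let low-priority work use whatever capacity is free.
        for task_id in low_tasks:
            set_io_weight(task_id, 1.0)

def monitor_loop():
    while True:
        rebalance_once()
        time.sleep(POLL_INTERVAL_SECS)

if __name__ == "__main__":
    rebalance_once()  # single pass for illustration

In a real cluster this loop would run on every node and would need to be coordinated with a cluster-wide view of job priorities and resource usage, which is the harder part of the problem.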

Much of the focus and attention in the Hadoop open source community has gone toward making Hadoop easier to deploy and operate, but there are technologies available to address this real-time performance bottleneck. My company, Pepperdata, has developed a solution that provides real-time, second-by-second monitoring in the cluster for an accurate view of the hardware resources consumed by each task running on every node. Using this information, Pepperdata can algorithmically build a global, real-time view of the RAM, CPU, disk, and network utilization across the cluster and dynamically reallocate these resources as needed. In contrast to the YARN ResourceManager, which controls when and how jobs are started, Pepperdata controls hardware usage as jobs are running.

With a simple cluster configuration policies file, an administrator can specify how much of the cluster hardware to provide to a particular group, user, or job. Pepperdata senses resource contention in real time and dynamically prevents bottlenecks in busy clusters, slowing low-priority jobs so that high-priority jobs meet SLAs and allowing numerous users and jobs to run reliably on a single cluster at maximum utilization. Pepperdata looks at real-time resource allocation (of jobs currently in flight) in the context of pre-set prioritization, and determines which jobs should have access to hardware resources in real time.

Job performance is enforced based on priority and current cluster conditions, eliminating fatal contention for hardware resources and the need for workload isolation. The software collects 200 metrics related to CPU, RAM, disk I/O, and network bandwidth, precisely pinpointing where problems are occurring so that IT teams can quickly identify and fix troublesome jobs. Because Pepperdata measures actual hardware usage in a centralized Hadoop deployment, the software also enables IT to accurately track and allocate costs associated with shared cluster usage per department, user, and job. By guaranteeing stable and reliable cluster performance, Pepperdata ensures QoS for the cluster.

Sean Suchter is the CEO and co-founder of Pepperdata. He was the founding GM of Microsoft’s Silicon Valley Search Technology Center, where he led the integration of Facebook and Twitter content into Bing search. Prior to Microsoft, he managed the Yahoo Search Technology team, the first production user of Hadoop. Sean joined Yahoo through the acquisition of Inktomi, and holds a B.S. in Engineering and Applied Science from Caltech.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.
