Q&A: MapR Technologies' Tomer Shiran on Hadoop, Myriad, Apache Drill, and Data Analytics

by OStatic Staff - Feb. 20, 2015

MapR Technologies, which focuses on Hadoop, recently made several interesting announcements that we covered. We wrote about Myriad, an open source project focused on consolidating big data with other workloads in the datacenter, in this post, and we covered the latest release of the MapR Distribution including Hadoop in this post.

In addition to his role as Vice President of Product Management at MapR Technologies, Tomer Shiran is a founder and PMC member of the Apache Drill project at the Apache Software Foundation. OStatic recently interviewed Shiran about Apache Drill, and we caught up with him again to talk about MapR Technologies and its latest news. Here are his thoughts:

Q: Earlier this week, OStatic covered Myriad. What can organizations using Hadoop get out of Myriad?

A: Myriad paves the way for Hadoop jobs to coexist with non-Hadoop jobs in large-scale clusters that can span multiple data centers. With Myriad, the resources of an entire data center can be managed as a single pool, breaking down processing silos and improving resource utilization. Having fewer moving parts lets enterprises spend more time on real work and less time on troubleshooting. Google has benefited from this type of large-scale resource scheduling, and Myriad is the open source project that delivers this capability to everyone.

Q: A whole ecosystem of tools and enhancements is arising around Hadoop. There are tools like Myriad and tools that do next-generation types of batch processing. Which types look most promising, and why? 

A: Of course Myriad looks promising, but a project that will change the way data is queried is Apache Drill, an open source interactive SQL query engine for Hadoop and NoSQL. Modern big data applications such as social, mobile, web and the Internet of Things deal with large amounts of data that are often self-describing and complex (for example, JSON and Parquet). Apache Drill is built from the ground up for such data, providing low-latency queries on rapidly evolving datasets.

Drill’s unique value comes from its ability to query data without requiring predefined schemas. This allows instant querying of newly ingested data in Hadoop, and it also avoids the constant maintenance that comes with evolving schema requirements for diverse data types. No ETL process or DBA intervention is required at any stage of the data lifecycle. That said, Drill can also leverage any schema defined in the Hive metastore.
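Editor's aside: the following minimal Python sketch illustrates the general idea of schema-on-read over self-describing JSON. It is not Drill's API or implementation. The sample records, field names, and the `scan` and `select` helpers are hypothetical and exist only for illustration. Each record is parsed when it is read, and no table definition is declared up front. Fields that appear only in later records, including nested ones, can still be queried. Where a field is missing, the result is simply absent (None).

```python
import json

# Hypothetical newly ingested records. Each JSON document describes itself,
# and later records add fields the earlier ones lack (an evolving schema).
RAW = """\
{"user": "ana", "event": "login"}
{"user": "ben", "event": "purchase", "amount": 42.5}
{"user": "ana", "event": "purchase", "amount": 10.0, "device": {"os": "android"}}
"""


def scan(text):
    """Parse each non-empty line as JSON on read; no schema is declared."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


def select(records, *path, where=None):
    """Project a possibly nested field, yielding None where it is absent."""
    for rec in records:
        if where is not None and not where(rec):
            continue
        value = rec
        for key in path:
            value = value.get(key) if isinstance(value, dict) else None
        yield value


# Loosely analogous to a SQL query such as:
#   SELECT amount FROM events WHERE event = 'purchase'
amounts = list(select(scan(RAW), "amount",
                      where=lambda r: r.get("event") == "purchase"))
print(amounts)  # [42.5, 10.0]

# A nested field that only the newest record carries.
os_values = list(select(scan(RAW), "device", "os"))
print(os_values)  # [None, None, 'android']
```

The trade-off illustrated here is that structure is discovered at query time rather than enforced at load time. New fields become queryable immediately, but every query must tolerate missing values.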

Q: Do you have any metrics or even just anecdotal data about whether organizations are finding it hard to hire people with Big Data and Hadoop skills?

A: There is a shortage of trained big data technology and analytics experts. Constrained labor supply is a key inhibitor of the adoption and use of big data technologies. Current training offerings in the marketplace do not meet the cost, convenience and flexibility needs of today’s professionals. In-person training involves significant cost, travel, and a large contiguous block of committed time. This is why we launched free on-demand Hadoop training last month. MapR wants to enable individuals to learn valuable skills and to increase big data adoption in the market.

Q: MapR offers a sandbox for using Hadoop. How can people without Hadoop experience get to that and benefit from it?

A: The MapR Sandbox is perfect for people without Hadoop experience. We offer two sandboxes for developers, administrators, and business intelligence analysts. First, the MapR Sandbox for Hadoop provides tutorials, demo applications, and browser-based user interfaces to let developers and administrators get started quickly with a fully functional Hadoop cluster running in a virtual machine. You can download it here: https://www.mapr.com/products/mapr-sandbox-hadoop/download

Second, if you are a business intelligence analyst or a developer interested in self-service data exploration on Hadoop using SQL and BI tools, we also have a MapR Sandbox that includes Apache Drill. The Drill Sandbox is available here: https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill

Q: How can organizations get a real sense of ROI from the use of Hadoop and other emerging data-centric tools?

A: A third-party research firm recently surveyed more than 50 of our customers, and a majority of them reported payback in less than 12 months and returns of more than 5X on their investment in the MapR Distribution.

The reason for this is the emergence of data-centric enterprises that have realized the value of architecting IT infrastructure to collect and analyze big data in real time. This enables organizations to impact the business as it happens through automated processes that shorten data-to-action cycles.

Hadoop is an extremely effective technology for leveraging big data because it runs on lower-cost commodity hardware, benefits from continuous innovation shared through a thriving open source development community, and creates opportunities to generate more revenue and mitigate risk. All these benefits point toward a faster ROI.

Q: What can we expect from MapR going forward?

A: MapR believes open source software is extremely important, especially when coupled with our patented technology. We invest heavily in participating in and contributing to open source projects, furthering the viral adoption of Apache Hadoop. We also continue to focus on delivering the best Hadoop distribution on the market.

We’re seeing customers evolve into data-centric enterprises, where data is the primary driver of how IT infrastructure is deployed. This results in more agile and scalable applications that offer faster time-to-value. MapR lets our customers handle many data formats with multiple workload requirements, and our latest release lets them extend their reach to a global user base more effectively.

You can expect us to continue to invest heavily in our technology, including our in-Hadoop NoSQL database, MapR-DB, to support real-time operational analytics so customers can impact the business as it happens. MapR will also continue to invest engineering resources in data agility, decreasing the time to value from data, including heavy investment in open source projects such as Apache Drill and Project Myriad.

Editor's Note: This interview is the latest in a series of interviews with project leaders working on the cloud, Big Data, and the Internet of Things. The series has included talks with Rich Wolski, who founded the Eucalyptus cloud project; Ben Hindman of Mesosphere; Tomer Shiran of the Apache Drill project; Philip DesAutels, who oversees the AllSeen Alliance; and Mirantis co-founder Boris Renski.