Machine learning reviews

Review: 6 machine learning clouds

Amazon, Microsoft, Databricks, Google, HPE, and IBM machine learning toolkits run the gamut in breadth, depth, and ease

Contributor, InfoWorld

At a Glance

Amazon Machine Learning
Microsoft Azure Machine Learning
Databricks with Spark 1.6
Hewlett Packard Enterprise Haven OnDemand
IBM Watson and Predictive Analytics

What we call machine learning can take many forms. The purest form offers the analyst a set of data exploration tools, a choice of ML models, robust solution algorithms, and a way to use the solutions for predictions. The Amazon, Microsoft, Databricks, Google, and IBM clouds all offer prediction APIs that give the analyst various amounts of control. HPE Haven OnDemand offers a limited prediction API for binary classification problems.

Not every machine learning problem has to be solved from scratch, however. For some problems, a model trained on a sufficiently large sample can be widely applicable. For example, speech-to-text, text-to-speech, text analytics, and face recognition are problems for which "canned" solutions often work. Not surprisingly, a number of machine learning cloud providers offer these capabilities through an API, allowing developers to incorporate them in their applications.

Speech-to-text services, for example, will recognize spoken American English (and some other languages) and transcribe it. But how well a given service works for a given speaker depends on the speaker's dialect and accent and on the extent to which the service was trained on similar dialects and accents. Microsoft Azure, IBM, Google, and Haven OnDemand all have working speech-to-text services.

There are many kinds of machine learning problems. For example, regression problems try to predict a continuous variable (such as sales) from other observations, and classification problems attempt to predict the class into which a given set of observations will fall (say, email spam). Amazon, Microsoft, Databricks, Google, HPE, and IBM provide tools for solving a range of machine learning problems, though some toolkits are much more complete than others.

In this article, I'll briefly discuss these six commercial machine learning solutions and link to the five full hands-on reviews that I've already published. Google's announcement of cloud-based machine learning tools and applications in March was, unfortunately, well ahead of the public availability of Google Cloud Machine Learning.

A brief history of AI

Artificial intelligence (AI) has a checkered history. Early work was directed at playing games (checkers and chess) and proving theorems, then the field moved on to natural language processing, backward chaining, forward chaining, and neural networks. After the "AI winter" of the 1970s, expert systems became commercially viable in the 1980s, although the companies behind them didn't last long.

In the 1990s, the DART scheduling application deployed in the first Gulf War paid back DARPA's 30-year investment in AI, and IBM's Deep Blue defeated chess grandmaster Garry Kasparov. In the 2000s, autonomous robots became viable for remote exploration (Nomad, Spirit, and Opportunity) and household cleaning (Roomba). In the 2010s, we've seen a viable vision-based gaming system (Microsoft Kinect), self-driving cars (Google), IBM Watson defeating two past "Jeopardy" champions, and a victory over a ninth-dan Go champion (Google AlphaGo).

Natural language processing has reached the point where we take Apple Siri, Google Now, and Microsoft Cortana for granted when talking to (or typing at) our phones. Finally, years of research in computational learning theory and training algorithms for pattern recognition and optimization against historical data have paid off in the field of machine learning.

Amazon Machine Learning

Amazon has tried to put machine learning in easy reach of mere mortals. The Amazon Machine Learning service is intended to work for analysts who understand the business problem being solved, whether or not they understand data science and machine learning algorithms.

In general, you approach Amazon Machine Learning by first cleaning your data and uploading it in CSV format to S3; then creating, training, and evaluating an ML model; and finally creating batch or real-time predictions. Each step is iterative, as is the whole process. Machine learning is not a simple, static magic bullet, even with the algorithm selection left to Amazon.

Amazon Machine Learning supports three kinds of models -- binary classification, multiclass classification, and regression -- and one algorithm for each type. For optimization, Amazon Machine Learning uses Stochastic Gradient Descent (SGD), which makes multiple sequential passes over the training data and updates feature weights for each sample mini-batch to try to minimize the loss function. Loss functions reflect the difference between the actual value and the predicted value. Gradient descent optimization works well only for continuous, differentiable loss functions, such as the logistic and squared loss functions.

For binary classification, Amazon Machine Learning uses logistic regression (logistic loss function plus SGD).

For multiclass classification, Amazon Machine Learning uses multinomial logistic regression (multinomial logistic loss plus SGD).

For regression, Amazon Machine Learning uses linear regression (squared loss function plus SGD).
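The SGD description above can be made concrete with a minimal sketch of binary logistic regression trained by mini-batch SGD. This is plain Python for illustration only, not Amazon's implementation; the function names, toy data, learning rate, and batch size are all assumptions chosen for the example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_loss(weights, bias, rows, labels):
    """Mean logistic (log) loss over a data set; labels are 0 or 1."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(rows, labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(rows)


def train_sgd(rows, labels, passes=20, batch_size=4, learning_rate=0.5, seed=0):
    """Mini-batch SGD: several sequential passes, one weight update per batch."""
    rng = random.Random(seed)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(passes):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad_w = [0.0] * len(weights)
            grad_b = 0.0
            for i in batch:
                x, y = rows[i], labels[i]
                p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
                error = p - y  # gradient of the logistic loss w.r.t. the linear score
                for j, xi in enumerate(x):
                    grad_w[j] += error * xi
                grad_b += error
            # Step against the average gradient of this mini-batch.
            for j in range(len(weights)):
                weights[j] -= learning_rate * grad_w[j] / len(batch)
            bias -= learning_rate * grad_b / len(batch)
    return weights, bias


# Hypothetical toy data: the label is 1 when the single feature is positive.
rows = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

initial_loss = logistic_loss([0.0], 0.0, rows, labels)
weights, bias = train_sgd(rows, labels)
final_loss = logistic_loss(weights, bias, rows, labels)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Swapping the logistic loss for the squared loss (and dropping the sigmoid) would turn the same loop into the linear regression case.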

Amazon ML model report: After training and evaluating a binary classification model in Amazon Machine Learning, you can choose your own score threshold to achieve your desired error rates. Here we have increased the threshold value from the default of 0.5 so that we can generate a stronger set of leads for marketing and sales purposes.
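To see why raising the score threshold yields a "stronger" set of leads, here is a small sketch that counts true and false positives at two thresholds. The scores and labels are made-up illustration data, and the helper names are hypothetical rather than part of any Amazon API.

```python
def confusion_counts(scores, labels, threshold):
    """Count outcomes when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def precision_recall(scores, labels, threshold):
    """Precision and recall of the positive class at a given threshold."""
    tp, fp, _, fn = confusion_counts(scores, labels, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical model scores and true labels (1 = lead converted).
scores = [0.95, 0.90, 0.80, 0.70, 0.65, 0.60, 0.55, 0.40, 0.30, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

for threshold in (0.5, 0.75):
    precision, recall = precision_recall(scores, labels, threshold)
    print(f"threshold {threshold}: precision {precision:.2f}, recall {recall:.2f}")
```

On this toy data the higher threshold removes all false positives at the cost of missing more true leads, which is the trade-off the threshold setting exposes.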

Amazon Machine Learning determines the type of machine learning task to solve from the type of the target data. For example, prediction problems with numerical target variables imply regression; prediction problems with non-numeric target variables are binary classification if there are only two target states, and multiclass classification if there are more than two.
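The rule in the preceding paragraph can be written down as a short function. This is only a sketch of the rule as stated here, assuming the target column arrives as strings read from a CSV file; it is not Amazon's actual inference logic, and `infer_task_type` is a hypothetical name.

```python
def infer_task_type(target_values):
    """Guess the ML task from a target column given as strings (as in CSV)."""

    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    values = [v.strip() for v in target_values if v.strip()]
    if not values:
        raise ValueError("target column is empty")
    if all(is_number(v) for v in values):
        return "regression"
    distinct = len(set(values))
    if distinct < 2:
        raise ValueError("a classification target needs at least two states")
    if distinct == 2:
        return "binary classification"
    return "multiclass classification"


print(infer_task_type(["12.5", "3", "40"]))        # numeric target
print(infer_task_type(["yes", "no", "yes"]))       # two target states
print(infer_task_type(["red", "green", "blue"]))   # more than two states
```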

Choices of features in Amazon Machine Learning are held in recipes. Once the descriptive statistics have been calculated for a data source, Amazon will create a default recipe, which you can either use or override in your machine learning models on that data.

Once you have a model that meets your evaluation requirements, you can use it to set up a real-time Web service or to generate a batch of predictions. Bear in mind, however, that unlike physical constants, people’s behavior varies over time. You’ll need to check the prediction accuracy metrics coming out of your models periodically and retrain them as needed.
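A periodic accuracy check of the kind described above might look like the following sketch. The baseline accuracy, the tolerance of five percentage points, and the function names are assumptions for illustration; the right metric and threshold depend on the model and the business problem.

```python
def accuracy(predictions, actuals):
    """Fraction of predictions that match the observed outcomes."""
    correct = sum(1 for p, a in zip(predictions, actuals) if p == a)
    return correct / len(actuals)


def needs_retraining(baseline_accuracy, recent_predictions, recent_actuals,
                     tolerance=0.05):
    """Flag a model whose recent accuracy fell more than `tolerance` below baseline."""
    recent = accuracy(recent_predictions, recent_actuals)
    return recent < baseline_accuracy - tolerance, recent


# Hypothetical batch of recent predictions versus what actually happened.
recent_predictions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
recent_actuals = [1, 0, 0, 1, 0, 0, 0, 1, 1, 1]

retrain, recent = needs_retraining(0.90, recent_predictions, recent_actuals)
print(f"recent accuracy {recent:.2f}, retrain: {retrain}")
```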

Azure Machine Learning

In contrast to Amazon, Microsoft tries to provide a full assortment of algorithms and tools for experienced data scientists. Azure Machine Learning is part of the larger Microsoft Cortana Analytics Suite offering. Azure Machine Learning also features a drag-and-drop interface for constructing model training and evaluation data flows from modules.

The Azure Machine Learning Studio contains facilities for importing data sets, training and publishing experimental models, processing data in Jupyter Notebooks, and saving trained models. Machine Learning Studio contains dozens of sample data sets, five data-format conversions, several ways to read and write data, dozens of data transformations, and three options to select features. In Azure Machine Learning proper, you’ll find multiple models for anomaly detection, classification, clustering, and regression; four methods to score models; three strategies to evaluate models; and six processes to train models. You can also use a couple of OpenCV (Open Source Computer Vision) modules, statistical functions, and text analytics.

That’s a lot of stuff, theoretically enough to process any kind of data in any kind of model, as long as you understand the business, the data, and the models. When the canned Azure Machine Learning Studio modules don’t do what you want, you can develop Python or R modules.

You can develop and test Python 2 and Python 3 language modules using Jupyter Notebooks, extended with the Azure Machine Learning Python client library (to work with your data stored in Azure), scikit-learn, matplotlib, and NumPy. Azure Jupyter Notebooks will eventually support R as well. For now, you can use RStudio locally and change the input and output for Azure later if needed, or install RStudio in a Microsoft Data Science VM.

When you create a new experiment in Azure Machine Learning Studio, you can start from scratch or choose from about 70 Microsoft samples, which cover most of the common models. There is additional community content in the Cortana Gallery.

Azure ML Studio: The Azure Machine Learning Studio makes quick work of generating a Web service for publishing a trained model. This simple model comes from a five-step interactive introduction to Azure Machine Learning.

The Cortana Analytics Process (CAP) starts with some planning and setup steps, which are critical unless you are a trained data scientist who's already familiar with the business problem, the data, and Azure Machine Learning, and who has already created the necessary CAP environments for the project. Possible CAP environments include an Azure storage account, a Microsoft Data Science VM, an HDInsight (Hadoop) cluster, and a machine learning workspace with Azure Machine Learning Studio. If the choices confuse you, Microsoft documents why you’d pick each. CAP continues with five processing steps: ingestion, exploratory data analysis and pre-processing, feature creation, model creation, and model deployment and consumption.

Microsoft recently released a set of cognitive services that have "graduated" from Project Oxford to an Azure preview. These are pretrained for speech, text analytics, face recognition, emotion recognition, and similar capabilities, and they complement what you can do by training your own models.

Databricks

Databricks is a commercial cloud service based on Apache Spark, an open source cluster computing framework. The service adds a cluster manager, Jupyter-like interactive notebooks, dashboards, and scheduled jobs on top of Spark and its machine learning library. Databricks (the company) was founded by the people who created Spark, and with Databricks (the service), it's almost effortless to spin up and scale out Spark clusters.

The library, MLlib, includes a wide range of machine learning and statistical algorithms, all tailored for the distributed memory-based Spark architecture. MLlib implements, among others, summary statistics, correlations, sampling, hypothesis testing, classification and regression, collaborative filtering, cluster analysis, dimensionality reduction, feature extraction and transformation functions, and optimization algorithms. In other words, it’s a fairly complete package for experienced data scientists.

Databricks model: This live Databricks notebook, with code in Python, demonstrates one way to analyze a well-known public bike rental data set. In this section of the notebook, we are training the pipeline, using a cross validator to run many Gradient-Boosted Tree regressions.
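The notebook itself is not reproduced here, but the idea of a cross validator running many regressions can be sketched in plain Python. To keep the example self-contained, it swaps in a tiny k-nearest-neighbors regressor for the Gradient-Boosted Trees and does not use the Spark API; the data, the candidate values of k, and all function names are hypothetical.

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cross_val_rmse(data, k, n_folds=3):
    """Mean RMSE of a k-NN regressor across n_folds held-out folds."""
    fold_scores = []
    for fold in range(n_folds):
        held_out = [pair for i, pair in enumerate(data) if i % n_folds == fold]
        training = [pair for i, pair in enumerate(data) if i % n_folds != fold]
        squared = [(knn_predict(training, x, k) - y) ** 2 for x, y in held_out]
        fold_scores.append((sum(squared) / len(squared)) ** 0.5)
    return sum(fold_scores) / len(fold_scores)


def grid_search(data, candidate_ks, n_folds=3):
    """Return (best_k, scores), where scores maps each k to its CV RMSE."""
    scores = {k: cross_val_rmse(data, k, n_folds) for k in candidate_ks}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical toy data: (feature, target) pairs with a linear trend plus wiggle.
data = [(x, 10 + 3 * x + (2 if x % 2 else -2)) for x in range(12)]

best_k, scores = grid_search(data, candidate_ks=[1, 2, 4, 8])
for k, rmse in scores.items():
    print(f"k={k}: cross-validated RMSE {rmse:.2f}")
print("best k:", best_k)
```

Each candidate setting is trained and scored once per fold, which is why a cross validator ends up running "many" regressions: the count is the number of parameter combinations times the number of folds.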

Databricks is designed to be a scalable, relatively easy-to-use data science platform for people who already know statistics and can do at least a little programming. To use it effectively, you should know some SQL and at least one of Scala, R, or Python. It's even better if you're fluent in your chosen programming language, so you can concentrate on learning Spark when you get your feet wet using a sample Databricks notebook running on a free Databricks Community Edition cluster.

InfoWorld Scorecard	Variety of models (25%)	Ease of development (25%)	Integrations (15%)	Performance (15%)	Additional services (10%)	Value (10%)	Overall Score (100%)
Amazon Machine Learning	8	9	9	9	8	9	8.7
Azure Machine Learning	9	8	9	9	8	9	8.7
Databricks with Spark 1.6	10	9	9	9	8	9	9.2
HPE Haven OnDemand	7	8	8	8	7	8	7.5
IBM Watson and Predictive Analytics	10	9	9	9	9	8	9.2
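The overall score is presumably the weighted sum of the six subscores, using the percentages in the column headers. The sketch below recomputes it under that assumption, with integer arithmetic and round-half-up to one decimal place (the rounding rule is also an assumption).

```python
from decimal import Decimal, ROUND_HALF_UP

# Percent weights in scorecard column order: variety of models, ease of
# development, integrations, performance, additional services, value.
WEIGHTS = (25, 25, 15, 15, 10, 10)

SUBSCORES = {
    "Amazon Machine Learning": (8, 9, 9, 9, 8, 9),
    "Azure Machine Learning": (9, 8, 9, 9, 8, 9),
    "Databricks with Spark 1.6": (10, 9, 9, 9, 8, 9),
    "HPE Haven OnDemand": (7, 8, 8, 8, 7, 8),
    "IBM Watson and Predictive Analytics": (10, 9, 9, 9, 9, 8),
}


def overall_score(subscores, weights=WEIGHTS):
    """Weighted sum of subscores, rounded half up to one decimal place."""
    hundredths = sum(s * w for s, w in zip(subscores, weights))
    return (Decimal(hundredths) / Decimal(100)).quantize(
        Decimal("0.1"), rounding=ROUND_HALF_UP
    )


for product, subscores in SUBSCORES.items():
    print(f"{product}: {overall_score(subscores)}")
```

With the subscores as listed, this reproduces the published overall scores for four of the five products. The HPE row works out to 7.7 rather than the published 7.5, so the published figure may rest on inputs or rounding not shown in the table.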

At a Glance

Amazon Machine Learning
Pros
- Amazon Machine Learning service simplifies model selection by doing it for you
- Offers real-time and batch predictions from a model
- Service presents appropriate graphs and diagnostics for the model, where and when you need them
- Able to process training data from S3, RDS MySQL, and Redshift
- Service automatically does some textual processing
- API can be used from Linux, Windows, or Mac OS X
Cons
- Exploratory data analysis is outside the scope of the machine learning service
- The machine learning service doesn’t allow the analyst to tinker with the algorithms
- Does not import or export models
Microsoft Azure Machine Learning
Pros
- A strong selection of models, with the option of using additional models in R or Python
- Easy model design and training using a drag-and-drop interface
- Exploratory data analysis can be done using real data in the Azure cloud
- Free to get started
- Accessible from any Web browser
Cons
- Picking the appropriate features and finding the best model requires data science expertise
- Exploratory data analysis requires some Python or R programming
- Passing R results into the processing flow is awkward
Databricks with Spark 1.6
Pros
- Makes it almost effortless to spin up and scale out Spark clusters
- Provides a wide range of ML methods for data scientists
- Offers a collaborative notebook interface using R, Python, or Scala, and SQL
- Free to start and inexpensive to use
- Easy to schedule jobs for production
Cons
- Not as easy to use as a BI product, although it integrates with several BI products
- Assumes that the user is familiar with programming, statistics, and ML methods
Hewlett Packard Enterprise Haven OnDemand
Pros
- Strong document format conversions
- Strong enterprise search capabilities
- Reasonably priced
Cons
- Some services aren’t quite cooked
- Some services have limited scope, restricting their utility
IBM Watson and Predictive Analytics
Pros
- SPSS Modeler offers a wide variety of models in a point-and-click application
- The Bluemix Predictive Analytics Web service works well at a reasonable price
- Watson Bluemix services offer good, reasonably priced capabilities for developers
- IBM Watson Analytics uses natural language to make modeling easier for the relatively untrained
Cons
- SPSS Modeler is pricey by current standards
- Bluemix Predictive Analytics Web service requires SPSS models
- IBM Watson Analytics tries too hard to be easy to use


