Machine learning reviews

Review: Azure Machine Learning is for pros only

Microsoft’s machine learning cloud has the right stuff for data science experts, but not for noobs

Danqing Wang

Machine learning is an obvious complement to a cloud service that also handles big data. Often the major reason to collect massive amounts of observables is to predict other values of interest to the business. For example, one of the reasons to collect massive numbers of anonymized credit card transactions is to predict whether a new transaction is valid or fraudulent with some likelihood.

It’s no surprise then that Microsoft, with a large AI research department, would add machine learning facilities to its Azure cloud. Perhaps because the technology originated with the researchers, the commercial offering has all of the complex models and algorithms that a statistics and data weenie could want. In addition, Azure Machine Learning (a part of the Cortana Analytics Suite) has reduced model training and evaluation pipeline design to a drag-and-drop exercise, while also allowing users to add their own Python or R modules to the data pipeline.

In the array of feature selection and solution algorithms available, Azure Machine Learning is similar to Databricks and IBM SPSS Modeler in giving you every tool you could possibly want. While that’s perfect for a data scientist, it’s a recipe for confusion for a business analyst. If you’re not a data scientist, but someone who, say, simply wants to predict next month’s sales so that the business can stock the right products, the Amazon Machine Learning approach of providing only one proven algorithm per class of problem may be better.

The learning process

Microsoft has a five-step introductory interactive tour of Azure Machine Learning that it will run for you at the drop of a hat. It’s impressive how quickly Azure Machine Learning can train a machine learning model from public demographic data and generate a Web service that will turn parameters into a prediction.

There is more than a little hand-waving going on here, however. Where did the model originate? How was it chosen? What data transforms needed to be applied? What are the residuals? How does it compare to other models? They don’t say.
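To make the end product of the tour concrete, here is a minimal sketch of what “a Web service that turns parameters into a prediction” amounts to: a function that accepts a JSON request and returns a JSON response. Everything in it is an assumption made for illustration. The parameter names, the coefficients, and the request format are hypothetical, and the code does not reproduce the service that Azure Machine Learning generates.

```python
import json
import math

# Hypothetical stand-in for a trained model: logistic scoring with
# made-up coefficients. A real service would load a model fitted to data.
INTERCEPT = -2.0
COEFFICIENTS = {"age": 0.02, "hours_per_week": 0.03}


def predict_probability(features):
    """Turn a dict of numeric parameters into a probability between 0 and 1."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


def handle_request(body):
    """Accept a JSON request body (a string) and return a JSON response body."""
    try:
        features = json.loads(body)
    except json.JSONDecodeError:
        return json.dumps({"error": "request body is not valid JSON"})
    if not isinstance(features, dict):
        return json.dumps({"error": "request body must be a JSON object"})
    missing = sorted(name for name in COEFFICIENTS if name not in features)
    if missing:
        return json.dumps({"error": "missing parameters", "parameters": missing})
    non_numeric = sorted(
        name for name in COEFFICIENTS
        if not isinstance(features[name], (int, float))
    )
    if non_numeric:
        return json.dumps({"error": "non-numeric parameters", "parameters": non_numeric})
    return json.dumps({"probability": predict_probability(features)})


print(handle_request('{"age": 40, "hours_per_week": 40}'))
print(handle_request('{"age": 40}'))
```

The sketch also shows what the tour leaves out: the coefficients here are simply typed in, whereas every question in the previous paragraph is about how a real model earns them.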

In my experience, finding the cleanest data and the best model are the central issues of data analysis and data science; using machine learning to train the model to the data is the fun part. I normally start out by doing some data plotting and simple exploratory statistics, then follow the data until I find a data transformation and model that fits. None of those steps is covered by the interactive tour, and nothing in Azure Machine Learning Studio seems to be listed as supporting these functions.

Figure: The Cortana Analytics Process (CAP) includes three major stages: business and data understanding; modeling; and production. As the arrows in the diagram indicate, iteration is almost always required in order to deploy a good predictive model that meets business needs.

However, the exploratory data analysis capabilities exist within Anaconda Python, Jupyter Notebooks (formerly IPython Notebooks), and R Server, all of which have been integrated with Azure Machine Learning at some level. You may be able to do what you need within Azure Machine Learning Studio, or you may need to provision a Microsoft Data Science Virtual Machine for your own use.
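The first exploratory pass described earlier (simple statistics, then a candidate transformation) might look like the following in plain Python. This is a sketch under stated assumptions: the sales figures are made up, the helper names are hypothetical, and only the standard library is used so that it runs anywhere, whereas in practice you would more likely reach for the plotting and statistics packages that ship with Anaconda or R.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean cubed z-score. Positive means a long right tail."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return sum(((v - mean) / spread) ** 3 for v in values) / len(values)


def summarize(values):
    """Collect the handful of numbers worth a look before choosing a model."""
    return {
        "n": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.fmean(values),
        "max": max(values),
        "skew": skewness(values),
    }


# Made-up monthly sales figures with a long right tail.
sales = [12, 18, 25, 40, 55, 90, 150, 260, 480, 900]

raw = summarize(sales)
logged = summarize([math.log(v) for v in sales])

for label, summary in (("raw", raw), ("log", logged)):
    print(label, {name: round(value, 2) for name, value in summary.items()})
```

On this sample the raw values are strongly right-skewed, with the mean far above the median, while the log-transformed values are much closer to symmetric. That is the kind of hint that would suggest modeling the logarithm of sales rather than sales directly.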

The data science researchers at Microsoft understand very well that machine learning is only one piece of the data science puzzle. The Cortana Analytics Suite (new branding that tries to reflect the broader emphasis and tie in with Microsoft’s voice-oriented personal assistant) includes a number of facilities to help you do data science, and not machine learning alone, using the Cortana Analytics Process (CAP).

Azure Machine Learning Studio

Perhaps we should start with the Azure Machine Learning Studio, which contains facilities for importing data sets, training and publishing experimental models, processing data in Jupyter Notebooks, and saving trained models.

The Azure Machine Learning Studio contains dozens of sample data sets, five data format conversions, several ways to read and write data, dozens of data transformations, and three ways to select features. In Azure Machine Learning proper, you can draw on multiple models for anomaly detection, classification, clustering, and regression, four ways to score models, three ways to evaluate models, and six ways to train models. You can also use a couple of OpenCV modules, Python and R language modules, statistical functions, and text analytics.

That’s a lot of stuff, and theoretically enough to process any kind of data in any kind of model, as long as you understand the business, the data, and the models. When the canned Azure Machine Learning Studio modules don’t do what you want, you can develop Python or R modules.

Figure: The Azure Machine Learning Studio makes quick work of generating a Web service for publishing a trained model. This simple model comes from a five-step interactive introduction to Azure ML.

There’s support for developing your own modules, even if it’s not initially obvious. You can develop and test Python 2 and Python 3 language modules using Jupyter Notebooks, extended with the Azure Machine Learning Python client library (to work with your data stored in Azure), scikit-learn, matplotlib, and NumPy. Azure Jupyter Notebooks will eventually support R as well. For now, you can use RStudio locally and change the input and output for Azure later if needed, or install RStudio in a Microsoft Data Science VM.
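As a rough picture of what a custom transformation module does, think of a function that takes a table in and hands a table back. The sketch below standardizes one numeric column. It assumes rows are plain dictionaries, and the function and column names are hypothetical; it illustrates the logic you would develop and test in a notebook, not the interface that Azure Machine Learning Studio expects of a script module.

```python
import statistics


def standardize_column(rows, column):
    """Return new rows with an added '<column>_z' field holding the z-score.

    The input rows are left untouched so the step can be rerun safely.
    """
    values = [row[column] for row in rows]
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        raise ValueError(f"column {column!r} is constant and cannot be standardized")
    return [{**row, column + "_z": (row[column] - mean) / spread} for row in rows]


# Made-up input table.
rows = [
    {"store": "A", "units": 10},
    {"store": "B", "units": 20},
    {"store": "C", "units": 30},
]

scaled = standardize_column(rows, "units")
for row in scaled:
    print(row)
```

Keeping the function free of side effects makes it easy to test locally on sampled data before wiring it into an experiment.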

When you create a new experiment in Azure Machine Learning Studio, you can start from scratch or choose from about 70 Microsoft samples, which altogether cover using most of the common models. There is additional community content in the Cortana Gallery.

Project Oxford is a related endeavor that contains about 10 preview-level ML/AI APIs in the areas of vision, speech, and language. Whether any of those will help you depends, of course, on your goals and the kind of data you have.

Figure: Jupyter Notebooks (formerly IPython Notebooks) have been adapted for use with Azure Machine Learning Studio. As the screenshot shows, the documentation for Azure Machine Learning Jupyter Notebooks comes in the form of a Jupyter Notebook.

Cortana Analytics Process

Earlier I mentioned the Cortana Analytics Process. If you follow the CAP link, you’ll see the interactive documentation guide shown in the figure below. The process starts with planning and setup steps, which are critical unless you are a trained data scientist who is already familiar with the business problem, the data, and Azure ML, and who has already created the necessary CAP environments for the project. Possible CAP environments include an Azure storage account, a Microsoft Data Science VM, an HDInsight (Hadoop) cluster, and an ML workspace with Azure ML Studio. In case all the choices confuse you, Microsoft documents why you’d pick each one.

A Microsoft Data Science VM enables you to run Azure Jupyter Notebooks, RStudio, and Azure tools in a SQL Server 2012 SP2 Enterprise or Windows Server 2012 R2 image. Microsoft supplies a script to install the IDEs and tools on the base image. The point is that you can use the VM for exploratory data analysis and as a development environment for Python or R scripts that will later become modules for your ML Studio experiments, all directly against your data in the Azure cloud. Both Jupyter Notebooks and RStudio have significant support for graphing and statistics; having these environments mounted in Azure puts the analysis code “near” (in the same availability zone as) the data -- which is especially important if there’s a lot of data.

Figure: The links from Microsoft’s interactive documentation pages for the Cortana Analytics Process go to more detailed documentation pages. The information you seek is likely there, but not always easy to find.

The use of R or Python modules in your production model may or may not help. On the other hand, R or Python is pretty much essential for exploratory analysis, whether you run it in a data science VM, in ML Studio, or on your own machine with sampled data stored locally.

The actual five CAP processing steps are the following:

  1. Ingest the data
  2. Explore and preprocess the data
  3. Create features
  4. Create the model
  5. Deploy and consume the model
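The five steps above can be sketched end to end in a few lines of plain Python. This is a toy under loud assumptions: the CSV data is made up, the column names and function names are hypothetical, the “model” is a one-variable least-squares line, and “deployment” is just returning a function. None of it uses Azure services; it only shows how the stages hand data to one another.

```python
import csv
import io

# Made-up data; the third row is deliberately missing a value.
RAW_CSV = """month,ad_spend,units_sold
1,10,25
2,12,29
3,,31
4,15,35
5,18,41
6,20,45
"""


def ingest(text):
    """Step 1: read the raw data into a list of string-valued rows."""
    return list(csv.DictReader(io.StringIO(text)))


def preprocess(rows):
    """Step 2: drop rows with missing values and convert the rest to floats."""
    clean = []
    for row in rows:
        if all(value is not None and value.strip() for value in row.values()):
            clean.append({name: float(value) for name, value in row.items()})
    return clean


def create_features(rows):
    """Step 3: choose the input column and the target column."""
    return [r["ad_spend"] for r in rows], [r["units_sold"] for r in rows]


def create_model(xs, ys):
    """Step 4: fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def deploy(intercept, slope):
    """Step 5: package the fitted parameters as a callable predictor."""
    def predict(ad_spend):
        return intercept + slope * ad_spend
    return predict


clean = preprocess(ingest(RAW_CSV))
xs, ys = create_features(clean)
intercept, slope = create_model(xs, ys)
predict = deploy(intercept, slope)
print(len(clean), intercept, slope, predict(25))
```

In a real project each function would be revisited several times, for example returning to step 2 after the model in step 4 exposes a problem with the data.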

As I have already discussed, you’ll most likely need to iterate within each step and among different steps. For example, when you create a model and look at its residuals and quality of fit, you may discover that there are too many features (columns) or the residuals are badly asymmetric, which could lead to significant under- or overestimation of the value you are trying to determine. Thus, if the residuals of an inventory prediction model skew low, then the business will likely be out of stock of the inventory item at peak periods. If the residuals skew high, then the business will be carrying too much unneeded inventory and tying up too much money in stock and storage space.
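A quick check for the kind of lopsided errors described above might look like the following. It is a sketch under stated assumptions: the numbers are made up, the function name is hypothetical, and the sign convention (error = predicted minus actual, so a negative error means the forecast came in too low) is a choice made here for the example.

```python
import statistics


def forecast_error_report(actual, predicted):
    """Summarize the direction of forecast errors (predicted minus actual)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_error = statistics.fmean(errors)
    if mean_error < 0:
        verdict = "under-forecasting: risk of stockouts"
    elif mean_error > 0:
        verdict = "over-forecasting: risk of excess inventory"
    else:
        verdict = "no systematic bias in the mean"
    return {
        "mean_error": mean_error,
        "median_error": statistics.median(errors),
        "share_under": sum(1 for e in errors if e < 0) / len(errors),
        "verdict": verdict,
    }


# Made-up unit demand and forecasts for five periods.
actual = [100, 120, 90, 150, 130]
predicted = [90, 110, 92, 125, 120]

report = forecast_error_report(actual, predicted)
print(report)
```

Here four of the five forecasts fall short of actual demand, which is the pattern that would leave the business out of stock at peak periods and send you back to the feature or model steps.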

For pros only

Overall, I like the Cortana Analytics Suite a lot. There are planned future features I’d like to have now, mostly concerning better integration of the R language and Power BI, and I find the documentation rather scattered and confusing. But these are quibbles. What we have now is a good start. Part of the issue with the documentation seems to be that the Azure Machine Learning system is changing rapidly, and that’s usually more good than bad. Another issue has to do with the rebranding, but that doesn’t affect the technical content.

I was disappointed to find that several Project Oxford samples didn’t work on my data, but of course they worked fine on Microsoft’s sample data -- and the Project Oxford APIs are clearly labeled as pre-release.

As for Azure Machine Learning proper, I think it offers a strong selection of models, with the option of using additional models in R or Python. Once you have the ability to write your own models and plug them in, you really can do anything. It’s easy to drag and drop pieces into the training and prediction designs.

While the fit between exploratory data analysis and the Azure Machine Learning system isn’t immediately obvious when you start using the system, Microsoft provides plenty of papers, e-books, and samples to help you along. There is documentation for exploratory analysis, most easily found by starting with the Cortana Analytics Process materials. To do exploratory analysis effectively, it helps most to know Python or R, although Power BI can help as well.

I like the way you can do exploratory data analysis using real data in the Azure cloud and even take samples from the data for experiments by invoking one or two library functions. I really appreciate the fact that you can start experimenting with Azure ML for free and only start paying when you are ready to go into production.
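For a sense of how little code sampling takes, here is a stand-in using only the Python standard library. It is not the Azure Machine Learning client library, and the function name, the row format, and the data are hypothetical; the point is simply that drawing a reproducible random subset for an experiment is a one- or two-call affair.

```python
import random


def sample_rows(rows, fraction, seed=0):
    """Draw a reproducible random subset of rows without replacement."""
    if not rows:
        raise ValueError("rows must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    k = max(1, round(len(rows) * fraction))
    # A dedicated Random instance keeps the draw independent of global state.
    return random.Random(seed).sample(rows, k)


# Made-up table of 1,000 rows.
rows = [{"id": i, "amount": 10 * i} for i in range(1000)]

subset = sample_rows(rows, 0.05, seed=42)
print(len(subset))
```

Fixing the seed means an experiment can be rerun on exactly the same subset, which matters when you are comparing one model against another.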

However, Azure Machine Learning is really not for the faint of heart. Data scientists -- programmers who know statistics and machine learning and something about the business -- will do well with it. Business analysts without the mathematical background should probably look elsewhere.

InfoWorld Scorecard: Azure Machine Learning
  • Variety of models (25%): 9
  • Ease of development (25%): 8
  • Integrations (15%): 9
  • Performance (15%): 9
  • Additional services (10%): 8
  • Value (10%): 9
  • Overall Score (100%): 8.7
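The overall score is consistent with a weighted average of the six category scores using the percentages shown. The sketch below reproduces the arithmetic. It assumes that the overall figure is computed this way and rounded half up to one decimal place; the scorecard itself does not state the method.

```python
from decimal import Decimal, ROUND_HALF_UP

# Category: (weight in percent, score), as listed in the scorecard.
SCORECARD = {
    "Variety of models": (25, 9),
    "Ease of development": (25, 8),
    "Integrations": (15, 9),
    "Performance": (15, 9),
    "Additional services": (10, 8),
    "Value": (10, 9),
}


def overall_score(scorecard):
    """Return the exact weighted mean and its value rounded to one decimal."""
    total_weight = sum(weight for weight, _ in scorecard.values())
    weighted_sum = sum(weight * score for weight, score in scorecard.values())
    exact = Decimal(weighted_sum) / Decimal(total_weight)
    # Assumption: half-up rounding to one decimal place.
    return exact, exact.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


exact, rounded = overall_score(SCORECARD)
print(exact, rounded)  # 8.65 8.7
```

Integer weights and Decimal arithmetic are used so that the half-way value 8.65 rounds predictably rather than depending on binary floating-point representation.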
At a Glance
  • Pros

    • A strong selection of models, with the option of using additional models in R or Python
    • Easy model design and training using a drag and drop interface
    • Exploratory data analysis can be done using real data in the Azure cloud
    • Free to get started
    • Accessible from any Web browser

  • Cons

    • Picking the appropriate features and finding the best model requires data science expertise
    • Exploratory data analysis requires some Python or R programming
    • Passing R results into the processing flow is awkward

Copyright © 2016 IDG Communications, Inc.