Machine learning models need love, too

Machine learning is infusing applications with predictive power -- but unless you give those models ongoing attention, that power will fade away

Tim Hornyak

A shining city on a hill is a sight to behold. But you wouldn’t admire it so much if the city stopped maintaining its roads, the electricity grew intermittent and blackouts more frequent, and those gorgeous buildings started to fade under thick coats of grime.

Modern businesses are building their shiny new applications on a foundation of machine learning. For any organization that hopes to automate the distillation of patterns from feeds of big data, natural language, streaming media, and Internet of Things sensor data, there's no substitute for machine learning. But these data-analysis algorithms, like the glimmering city, will decay if no one is attending to their upkeep.

Machine learning algorithms don’t build themselves -- and they certainly don’t maintain themselves. Where model building is concerned, you probably have your best and brightest data scientists on the job. Therein lies a potential problem: You may have far fewer data-scientist person-hours dedicated to the unsexy task of maintaining the models you’ve put into production.

Without adequate maintenance, your machine learning models are likely to succumb to decay. This deterioration in predictive power sets in when the environmental conditions under which a model was first put into production change sufficiently. The risk grows when your data scientists haven’t monitored a machine learning algorithm’s predictive performance in days, weeks, or months.
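
What does that monitoring look like in practice? Here is a minimal sketch in Python -- my own illustration, not anything prescribed by a particular vendor or paper -- of one common approach: track a deployed model’s accuracy over a sliding window of recent predictions whose true labels have since arrived, and flag the model when that accuracy slips too far below the baseline measured at deployment. The window size and tolerance are assumed values you would tune for your own application.

```python
from collections import deque

class DecayMonitor:
    """Tracks a model's rolling accuracy against its deployment baseline.

    Flags the model when live performance drops more than `tolerance`
    below the accuracy observed when the model was validated.
    (Illustrative sketch; names and defaults are assumptions.)
    """

    def __init__(self, baseline_accuracy, window_size=1000, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window_size)  # 1 = correct, 0 = wrong

    def record(self, predicted_label, true_label):
        """Log one prediction once its ground-truth label arrives."""
        self.outcomes.append(1 if predicted_label == true_label else 0)

    @property
    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def has_decayed(self):
        """True once the window is full and accuracy has slipped too far."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        return self.rolling_accuracy < self.baseline - self.tolerance

# Example: a model validated at 92 percent accuracy, checked in production.
monitor = DecayMonitor(baseline_accuracy=0.92, window_size=1000, tolerance=0.05)
# In a real pipeline, these calls would run as labeled outcomes trickle in:
# monitor.record(prediction, actual)
# if monitor.has_decayed():
#     alert_the_data_science_team()
```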

Model decay will become a bigger problem in machine learning development organizations as their productivity grows. As your developers leverage automation tools to put more machine learning algorithms into production, you’ll need to devote more resources to monitoring, validating, and tweaking them all. And the people you dedicate to maintenance may not be the ones who built the models in the first place, a situation that fosters inefficiencies and confusion as one group of data scientists struggles to understand exactly how another group built its models. Even when the models are well documented, their growing number, variety, and complexity are likely to make maintenance more time consuming and difficult.

How can you estimate the downstream maintenance burden associated with the new models your data scientists are spinning out? In that regard, I found a recent research paper by Google’s machine learning development team fascinating.

In it, they discuss the concept of “technical debt,” which refers to the deferral of maintenance costs associated with current development efforts. They explain how certain machine learning development practices incur more technical debt -- and hence entail more future maintenance -- than others. According to the authors, the debt-risk factors specific to machine learning development are diverse. They include the myriad probabilistic variables, data dependencies, recursive feedback loops, pipeline processes, configuration settings, and other factors that exacerbate the unpredictability of machine learning algorithm performance.
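
Data dependencies in particular lend themselves to a concrete check: compare the distribution a feature had in the training data with what production traffic is feeding the model now. The sketch below computes the population stability index (PSI), a widely used drift statistic; the bin count, the small floor that guards empty bins, and the thresholds quoted in the comment are conventional rules of thumb I’m assuming here, not figures from the Google paper.

```python
import math

def population_stability_index(expected, actual, bins=10):
    """PSI between a training-time feature sample (`expected`) and a
    production sample (`actual`).

    PSI = sum_i (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the
    proportions of each sample falling in bin i.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        n = len(sample)
        # A small floor keeps empty bins from producing log(0) or zero division.
        return [max(c / n, 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Rule of thumb often quoted for PSI: below 0.1 is stable, 0.1 to 0.25
# warrants a look, above 0.25 suggests enough shift to revalidate or retrain.
training_sample = [0.1, 0.4, 0.35, 0.8, 0.55, 0.3, 0.6, 0.45, 0.2, 0.7]
live_sample = [0.6, 0.9, 0.85, 0.7, 0.95, 0.8, 0.75, 0.65, 0.9, 0.88]
print(population_stability_index(training_sample, live_sample))
```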

The more these complexities pile up, the more difficult it is to do the root-cause analyses necessary for effective maintenance. In addition, the opacity of debt-laden machine learning assets can make it very difficult to assess exactly which lines of code were responsible for any particular algorithm-driven action. This might make “algorithmic accountability” difficult -- if not impossible -- in many legal, regulatory, or compliance circumstances.

The paper’s authors don’t attempt to create a quantitative yardstick to measure the technical debt associated with machine learning development. But they provide a very useful framework for identifying which development practices you’ll want to avoid if you don’t want to be saddled later on with exorbitant maintenance costs.

You won’t be able to automate your way out of that maintenance burden. Under any scenario, tending to machine learning models demands the close scrutiny, critical thinking, and manual effort that only a highly trained data scientist can provide.

Copyright © 2016 IDG Communications, Inc.