The Machine Learning Lifecycle

By August 24, 2020

During this webinar, industry expert Diego Oppenheimer discussed the machine learning lifecycle and the 10 Measures and KPIs of ML Success. We’ve included a short transcript, beginning at 10:55 of the webinar.

Diego Oppenheimer, Algorithmia: So let's talk a little bit about the machine learning lifecycle and, more importantly, how it differs from other things that we've seen before. The one thing that is really interesting, if we think about data scientists as software engineers to a certain degree, is that they are the fastest-moving software engineers that we've ever seen in industry. And the reason why I say that is the iterative process. It is a process of constant iteration where data changes, models change, and conditions change, and those changes become extremely important. So you have to have a process in this lifecycle that actually allows for that iteration. And so we think about two big areas to that.

One is, how do we actually develop these models? The training, the data prep, the experimentation. How do we manage that? How do you actually look at that and scale it out? And then you move over to the world of production operations and deployment, which is how do you actually manage these things in production?

One of the interesting aspects of this that I think is particularly relevant is that the data analytics stack (and I talked a little bit about BI analytics) was pretty self-contained, in the sense that you prepared data, you had your data warehouses, you would create your cubes and measures, and then you would deploy your dashboards. The end result was a dashboard, and so it could be fully contained.

Now, in the world of machine learning, the final result goes into an application. So we have a whole new set of requirements now. To get an application into production, there's a whole DevOps software development lifecycle that exists. So a very important part of thinking about machine learning at scale, and how to get there, is how do we merge that machine learning lifecycle into the traditional software development lifecycle that exists in an organization?

There's a lot of overlap from a DevOps perspective, but there are also a lot of unique things. So those are some of the things I'm going to talk about today, and how to measure those risks and KPIs. In the world of model development, you're going to be doing iteration, changing of data, experimentation, hyperparameter optimization, and experiment tracking, which is the management side. You also have to have governance and security around that. Are you accessing the right data? Are you training models with the data you're supposed to be? Do you have controls around audit? Do you have controls around explainability and observability of the training part of that lifecycle?
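As a rough illustration of the experiment tracking and data governance questions above, here is a minimal Python sketch. It is not taken from the webinar: the `ExperimentTracker` class, the `fingerprint` helper, the parameter names, and the metric values are all hypothetical. The idea is only that each training run records its hyperparameters, a hash of the exact data it was trained on, and its resulting metric, so that a later audit can answer "which data produced this model?"

```python
import hashlib
import json
from dataclasses import dataclass


def fingerprint(rows):
    """Stable short hash of the training data, tying a run to the exact data used."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass(frozen=True)
class Run:
    run_id: int
    params: dict      # hyperparameters used for this run
    data_hash: str    # fingerprint of the training data
    metric: float     # evaluation score (higher is better in this sketch)


class ExperimentTracker:
    """Hypothetical in-memory tracker; a real one would persist runs."""

    def __init__(self):
        self.runs = []

    def log(self, params, rows, metric):
        run = Run(len(self.runs) + 1, dict(params), fingerprint(rows), metric)
        self.runs.append(run)
        return run

    def best(self):
        return max(self.runs, key=lambda r: r.metric)


# Illustrative data and scores only; none of these numbers come from the talk.
tracker = ExperimentTracker()
data_v1 = [{"x": 1, "y": 0}, {"x": 2, "y": 1}]
data_v2 = data_v1 + [{"x": 3, "y": 1}]
tracker.log({"learning_rate": 0.1}, data_v1, metric=0.81)
tracker.log({"learning_rate": 0.01}, data_v1, metric=0.84)
tracker.log({"learning_rate": 0.01}, data_v2, metric=0.88)
best = tracker.best()
print(best.run_id, best.params, best.data_hash)
```

Because the data hash is stored next to the parameters, two runs on the same data share a fingerprint, while a run on changed data gets a different one. That is the kind of record an audit of the training side would need.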

Once you've actually gotten to a result and a model that can go into an application, then you start thinking, okay, how am I going to connect to the data services that are providing that data for inference? How am I going to be able to register those models into a controllable, observable, manageable framework? And then how do I actually scale that? How do I make sure that we have the right backups? The right uptimes? The right SLAs? The right monitoring? The right reporting? The right infrastructure management? Who can access those models? Do you have the proper infosec compliance so that these models can actually go into production using the libraries they depend on? Is the TensorFlow library you’re using approved by your security team? Do you have the proper regulatory compliance set up so that if one day a regulator comes in for your life sciences workflows or your financial services workflows, you can actually provide them a full audit trail going backwards?
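To make the registration, library approval, and audit questions above concrete, here is a minimal Python sketch. It is an illustration under assumptions, not a description of any product mentioned in the webinar: the `ModelRegistry` class, the `APPROVED_LIBRARIES` allowlist, and the model names and versions are all hypothetical. Every registration attempt is checked against a security-approved allowlist of library versions, and every decision is written to an audit log.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical allowlist maintained by a security team:
# library name -> set of approved versions.
APPROVED_LIBRARIES = {
    "tensorflow": {"2.3.0"},
    "scikit-learn": {"0.23.2"},
}


@dataclass
class ModelRecord:
    name: str
    version: str
    owner: str
    dependencies: dict  # library name -> pinned version


@dataclass
class ModelRegistry:
    models: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def _audit(self, action, model_key, detail):
        # Every decision is recorded so it can be shown to an auditor later.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "model": model_key,
            "detail": detail,
        })

    def unapproved(self, record):
        """Return the pinned dependencies that are not on the allowlist."""
        return sorted(
            f"{lib}=={ver}"
            for lib, ver in record.dependencies.items()
            if ver not in APPROVED_LIBRARIES.get(lib, set())
        )

    def register(self, record):
        key = f"{record.name}:{record.version}"
        problems = self.unapproved(record)
        if problems:
            self._audit("rejected", key, problems)
            return False
        self.models[key] = record
        self._audit("registered", key, f"owner={record.owner}")
        return True


registry = ModelRegistry()
ok = registry.register(
    ModelRecord("churn", "1", "data-science", {"tensorflow": "2.3.0"}))
bad = registry.register(
    ModelRecord("fraud", "1", "risk", {"tensorflow": "1.15.0"}))
for entry in registry.audit_log:
    print(entry["action"], entry["model"], entry["detail"])
```

In this sketch the second model is rejected because its pinned TensorFlow version is not on the allowlist, and both the acceptance and the rejection end up in the audit log. A production registry would also need access control, persistence, and monitoring hooks, which are left out here.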

Remember that what connects to these machine learning models is applications. So these things will actually be exposed in applications that have their own life cycle, their own refresh cycle, and they have their own conditions around reporting and stuff like that.

Learn more and watch the full video on YouTube:

