Machine Learning in Production: From Research to the Customer
In this presentation, Ameen Kazerouni, the Lead Data Scientist at Zappos, walks through the entire process of developing a machine learning workflow that can be scaled and deployed to solve business problems. Ameen began his own career in biomedical informatics, but as he describes in this presentation, the skills learned in one industry transfer to others, in this case online shoe retail. That transferability is itself an example of the power of machine learning.
For Zappos, an online shoe and clothing retailer, offering personalized search results benefits both the customer experience and the company's profits. The problem is easily framed from a machine learning perspective: given a search bar with minimal input, how do you tell an algorithm which products to display?
At first this may seem simple, but when thousands of products share names and descriptions, displaying the right top results quickly becomes complicated. Even product names and descriptions that are written to help search algorithms surface them often complicate proper categorization rather than help it. This brings us to the first step to consider when implementing any machine learning algorithm: understanding and formulating the problem.
The issues in this instance include understanding the context of each incoming search query. Context Models are produced by several different algorithms, each of which works at the customer-behavior level and uses Natural Language Processing techniques to derive contextual results.
A further issue is understanding how the output of the Context Models applies to individual users, which is the job of a Customer Model. This model looks at the importance of different features and preferences for each customer. But as Ameen points out, a personalized one-to-one search is useless to the business if it sacrifices too much latency.
The machine learning models they build are only useful if requests are processed and results returned within a time frame a customer is willing to wait for. The example Ameen uses is that a result needs to be returned in 30 to 50 milliseconds, a very tight budget.
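To make the latency constraint concrete, here is a minimal Python sketch of one way to enforce such a budget: run the personalized ranker with a deadline and serve a non-personalized fallback if it does not finish in time. This is an illustration only, not the approach described in the talk; `rank_with_budget`, `personalized_ranker`, and `fallback_results` are hypothetical names, and the 50 ms default simply takes the upper end of the range quoted above.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Assumption: use the upper end of the 30-50 ms window as the budget.
LATENCY_BUDGET_S = 0.050

# A small shared pool so that a slow ranker does not block the caller.
_executor = ThreadPoolExecutor(max_workers=4)


def rank_with_budget(personalized_ranker, fallback_results, query,
                     budget_s=LATENCY_BUDGET_S):
    """Return (results, source), where source is 'personalized' or 'fallback'.

    personalized_ranker: hypothetical callable taking a query string.
    fallback_results: precomputed, non-personalized results to serve
    when the ranker misses the deadline.
    """
    future = _executor.submit(personalized_ranker, query)
    try:
        return future.result(timeout=budget_s), "personalized"
    except FutureTimeout:
        # Best effort only: cancel() cannot interrupt a call that is
        # already running, it only prevents one that has not started.
        future.cancel()
        return fallback_results, "fallback"
```

The design choice here is to degrade gracefully: a customer who gets a generic result on time is presumably better served than one who waits for a personalized result that arrives too late.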
“If you torture data enough it will tell you anything.”
This leads to a second issue that machine learning deployment faces. Machine learning algorithms are only as good as the data set built to solve the problem. Data scientists are often good at squeezing accuracy percentages out of models but less good at coming up with a feature space to train a model on in the first place, and accuracy tuning does not address that underlying issue.
Feature engineering is a meticulous process that encompasses all of the steps needed to formulate a proper feature space to build a model on. Documenting every feature that is used, along with a hypothesis for why that feature makes sense for that specific model, helps produce a data set that can yield accurate predictions once the model is trained.
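The documentation habit described above can be sketched in code. The following is a minimal, assumed design rather than anything shown in the talk: a small registry that refuses to accept a feature unless it comes with a written hypothesis. The names `FeatureSpec` and `FeatureRegistry` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    """One documented feature (hypothetical structure)."""
    name: str
    description: str
    hypothesis: str  # why this feature should help this specific model


class FeatureRegistry:
    """Keeps feature documentation next to the feature list itself."""

    def __init__(self):
        self._specs = {}

    def register(self, spec):
        # Reject undocumented features up front rather than at review time.
        if not spec.hypothesis.strip():
            raise ValueError(f"feature {spec.name!r} needs a written hypothesis")
        if spec.name in self._specs:
            raise ValueError(f"feature {spec.name!r} is already registered")
        self._specs[spec.name] = spec

    def names(self):
        """Return the registered feature names in sorted order."""
        return sorted(self._specs)

    def hypothesis_for(self, name):
        return self._specs[name].hypothesis
```

A team might call `register` once per feature when assembling a training set, so that every column in the feature space can be traced back to a stated reason for including it.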
Ameen and his team applied all of these steps to develop their machine learning model. The next step, deploying it for business use, requires the right software, which in turn requires assembling the right team. Ameen offers an interesting perspective on team building. His advice: don’t try to hire the unicorns who can do it all. Instead, hire people with specialized skills who can help deploy the product into production. This avoids language dependency within a department and allows the best tools to be used to implement a model successfully.
Ameen then discusses some of the specific tools and software that allow them to implement their one-to-one personalized search model. The end result is that they are able to expose complex algorithms over APIs that can handle high-throughput traffic while meeting customer-facing service level agreements. By navigating these issues, Zappos is able to offer personalized search results that benefit both customers and the company.