Moving Your ML Proof of Concept to Production Part 1: From Business Goals to ML Metrics

14 Aug, 2024. 5 min read

This article was first published on www.mouser.com

With the hype around machine learning (ML) and the rush to transform businesses with it, it is unsurprising that not all ML projects succeed. Often, a “solution before the problem” mentality leads to poorly defined requirements and goals for the use of ML. Failing to understand why ML should be used, and how business metrics will be affected, can lead to proof of concept (POC) work that consumes valuable time without delivering results. To avoid that, companies seeking to introduce ML into their processes or products should first clarify their overall goal in doing so and then link that goal to relevant business metrics, measured as a current baseline, so that ML performance can be assessed as the POC work progresses. Finally, connecting those indicators to appropriate ML metrics for the POC’s task or use case, and developing a short roadmap for the research and development direction, will increase the likelihood of success.

Distilling Goals into Business Metrics

Businesses usually decide to introduce ML into their processes for one of several overarching reasons: improving revenue by increasing the productivity of the team, elevating the success rate of some particular aspect of the business, improving customer outcomes by reducing the time to respond to incoming information, or reducing costs associated with errors and other waste. These reasons tie into more fine-grained business metrics that should be identified when starting an ML project. For example, if the goal is to increase revenue by enhancing the team’s or company’s capabilities, then relevant metrics for measuring the current state of productivity might be sales and marketing figures such as quota attainment rate, cost per lead, or net sales revenue. For improving customer success, relevant metrics may include churn, customer satisfaction scores, or the rate of lost deals. Most importantly, projects should start from business needs identified through the right metrics, rather than from solutions chosen without a concrete “why” behind them. Deciding to implement an ML process for triaging support tickets or summarizing long documents cannot be justified unless the impact of doing so is first measured as a baseline.
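To make the baselining step concrete, here is a minimal sketch, assuming historical records are available in a file; the file path and column names (churned, response_hours, deal_won) are hypothetical placeholders, not a prescribed implementation:

```python
# Minimal sketch: computing a business-metric baseline before any ML work.
# The CSV path and column names are hypothetical placeholders.
import pandas as pd

records = pd.read_csv("support_history.csv")

baseline = {
    # Share of customers who churned in the period.
    "churn_rate": records["churned"].mean(),
    # Average time to respond to incoming information, in hours.
    "avg_response_hours": records["response_hours"].mean(),
    # Rate of lost deals among all closed deals.
    "lost_deal_rate": 1.0 - records["deal_won"].mean(),
}

for name, value in baseline.items():
    print(f"{name}: {value:.3f}")
```

However the numbers are gathered, the point is to have them written down before any model is built, so the POC has something concrete to beat.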

Matching Business Metrics to ML Equivalents

Once appropriate business metrics have been chosen for the project based on the overall goals, they should be matched to one or more ML metrics based on the tasks identified for ML to tackle. Typically, a cluster of metrics on the ML side is needed to capture an improvement on the business side. For example, suppose the goal is to increase productivity by accelerating a team working on a particular process, perhaps quality assurance of software pre-deployment. In that case, the business metrics might be the time from start to end of testing, the number of elements tested, and the number of tests run per session. However, adjacent and equally important metrics include the number of bugs caught pre-deployment and the number of previously passing tests that now fail. When establishing an ML process to handle some of the quality assurance team’s work, the relevant metrics are not just latency, that is, how quickly the model can compare expected and actual output for discrepancies, but also model accuracy, specifically the number of false positives and false negatives. Because mistakes the model makes may need to be assessed by a human, too many flagged issues will add to the team’s workload, and too many missed issues will send work back upstream to the development team. Alongside a benchmark of the team’s current performance on the business metrics discussed, a baseline accuracy is needed to judge whether the ML process performs at least as well and can therefore be considered viable.
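As a minimal sketch of the accuracy side of that comparison, assuming a set of historical QA outcomes has been labelled (the label arrays below are illustrative placeholders, not real project data), false positives and false negatives could be counted with scikit-learn:

```python
# Minimal sketch: counting false positives/negatives for a candidate
# triage model against labelled historical outcomes.
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# 1 = real defect, 0 = no defect (ground truth from past QA sessions).
y_true = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
# The candidate model's flags on the same cases.
y_pred = [1, 0, 1, 1, 0, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"false positives (extra review work for the team): {fp}")
print(f"false negatives (issues slipping through): {fn}")
print(f"precision: {precision_score(y_true, y_pred):.2f}")
print(f"recall:    {recall_score(y_true, y_pred):.2f}")
```

Tracking both error types separately matters here, because each one maps to a different business cost: wasted review time versus bugs reaching the development team or production.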

Sometimes, depending on the use case or task and the type of output required, the ML metrics may not be as straightforward as accuracy or error rate. However, some way of measuring how well an ML process performs usually exists, as long as one can measure how well things work without it. Natural language processing (NLP) is one domain with less direct metrics. For instance, tasks like summarizing text or generating content appear difficult to assess on the surface. Still, if some time is spent building a dataset of input text paired with examples of the desired output, then metrics such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE) can be used. ROUGE and similar metrics measure the overlap of words between the desired and actual output, accounting for the fact that no single “correct” answer exists; rather, answers vary in their degree of correctness.
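For example, a minimal sketch of scoring a generated summary against a reference using the open-source rouge-score package (the texts below are illustrative placeholders) might look like this:

```python
# Minimal sketch of scoring generated text with ROUGE, using the
# rouge-score package (pip install rouge-score).
from rouge_score import rouge_scorer

reference = "The ML project missed its deadline because requirements were unclear."
candidate = "Unclear requirements caused the ML project to miss its deadline."

# ROUGE-1 measures unigram overlap; ROUGE-L measures the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, candidate)

for name, result in scores.items():
    print(f"{name}: precision={result.precision:.2f}, "
          f"recall={result.recall:.2f}, f1={result.fmeasure:.2f}")
```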

Building the ML POC Roadmap

Finally, with the ML metrics defined, the process of building a brief but detailed roadmap of the POC implementation and experimentation approaches can start. This involves dividing the whole task into smaller, simpler pieces whose success with an ML model can be validated quickly, such as triaging urgent versus non-urgent support tickets rather than a more complicated set of classifications. Additionally, a preliminary literature review or prior art search should be conducted to identify a series of approaches of increasing complexity in case the simpler options fail at the task. This strategy starts with off-the-shelf models or those available through a third-party application programming interface (if allowable in the business context), then moves to architectures that can be adapted by fine-tuning or retraining for the particular task, and finishes with those that need to be implemented from scratch, perhaps with more complicated training regimes. The cost of each approach can be approximated to provide a fair assessment of the return on investment (ROI) for the ML project based on its expected value, as in the sketch below. Doing so highlights whether some POC directions will be prohibitively expensive to try and helps set the cutoff point for abandoning the project if the best performance achieved is insufficient.
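As a rough illustration of that cost-versus-value comparison (all names and figures below are hypothetical), the roadmap can be captured as data and each approach’s ROI estimated before any experimentation starts:

```python
# Minimal sketch of a POC roadmap as data: candidate approaches ordered by
# complexity, each with a rough cost estimate and expected value.
# All names and numbers are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Approach:
    name: str
    est_cost: float        # rough engineering + compute cost
    expected_value: float  # estimated benefit if the approach works

    @property
    def roi(self) -> float:
        # Net gain relative to cost.
        return (self.expected_value - self.est_cost) / self.est_cost

roadmap = [
    Approach("off-the-shelf / third-party API", 5_000, 40_000),
    Approach("fine-tuned open-source model", 20_000, 60_000),
    Approach("custom model trained from scratch", 120_000, 80_000),
]

for step in roadmap:
    verdict = "worth trying" if step.roi > 0 else "likely cutoff point"
    print(f"{step.name}: ROI {step.roi:+.1f}x -> {verdict}")
```

Even crude numbers like these make it easier to agree in advance where the cutoff point for the POC sits.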

Conclusion

Introducing ML into a company’s processes or products can help achieve overarching business goals like improving productivity or reducing costs and errors. However, to ensure success, these ML projects need to start on the right foot. Measuring impact via the appropriate business metrics is key, followed by identifying the right ML metrics to show that the developed models work as well as required and will therefore provide the expected value. Once these details are established, building a roadmap to capture the direction of POC implementation will keep the project on track. The roadmap also ensures that ROI versus effort is tracked so that a sensible point of abandonment, one that balances the two, can be identified. These best practices are the first of several required to move from business goals to ML POC to production.

Follow this AI blog series to learn the other important stages involved in developing an ML POC in an agile but robust fashion and how to put the resulting outputs into production. Future blogs will cover identifying and establishing a dataset for the project, setting up experimentation tooling, developing resources and approaches for POC-building, including open-source models, creating guidelines and focus points when extending to a production-ready version, as well as considering what to anticipate and monitor post-deployment.