In a two-month collaboration with Neural Magic, Uhura Solutions made significant progress in optimizing its AI-driven, document-centric process automation platform.
Key improvements include:
- Reduced production infrastructure costs by cutting the CPU cores required to serve the workload
- Throughput speedup of 4-6x on CPUs
- Accuracy recovery of up to 99% compared to base models
The Client: Uhura Solutions
Uhura is an innovative AI platform designed to read and understand complex documents the way humans do. The company has developed an intuitive, low-code, document-centric process automation platform that makes AI accessible to all businesses. The platform empowers businesses to effortlessly capture both digital and scanned documents, classify them within moments, and accelerate data analysis with unparalleled precision.
The Situation
The operational capacity of Uhura Solutions is built upon tasks powered by machine learning (ML) models. This entails creating algorithms and models capable of learning from extensive datasets and making meaningful predictions. The adoption of ML is experiencing a significant uptick across industries due to its potential to improve operational efficiency, drive positive business outcomes, and enhance client satisfaction. At the same time, deploying these models in live production environments requires substantial computational resources, a need typically met with specialized hardware: graphics processing units (GPUs).
The Challenge
To attain the required level of performance, Uhura Solutions employs GPUs. As part of its commitment to client satisfaction and its constant drive to upgrade its services, the company has been exploring additional avenues for deploying its solution, recognizing that not all clients have access to GPUs.
Uhura Solutions aspired to deploy ML models on more accessible, cost-efficient hardware, namely central processing units (CPUs). Achieving comparable performance on CPUs requires the application of the right model optimization techniques in conjunction with appropriate inference runtime software.
In pursuit of this business goal, Uhura Solutions sought collaboration with Neural Magic, a company dedicated to assisting clients in delivering ML innovations without introducing unnecessary complexity or additional costs. Neural Magic addresses the challenges associated with GPU limitations through the use of a software-based AI solution, the incorporation of state-of-the-art research innovations, and the utilization of standard commodity CPU hardware.
The Solution: Neural Magic
Neural Magic facilitated Uhura in achieving their desired AI-driven business outcomes by leveraging software and readily available CPUs. After jointly establishing success metrics, both teams focused on conducting experiments and integrating Neural Magic’s software into Uhura’s future production pipeline.
Neural Magic conducted investigations into one-shot and training-aware pruning and quantization techniques to establish a streamlined training and optimization pipeline. The experimentation encompassed various models and use cases, including YOLO for computer vision, BERT for natural language processing, and Donut for end-to-end document classification.
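To make the two optimization techniques concrete, here is a minimal, self-contained sketch of one-shot magnitude pruning and symmetric int8 quantization on a flat list of weights. This illustrates the ideas only; it is not Neural Magic's actual SparseML implementation, and the weights and sparsity level are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """One-shot magnitude pruning: zero out the smallest-magnitude
    weights until the requested fraction of weights is zero."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned

def quantize_int8(weights):
    """Symmetric int8 quantization: map floats to [-127, 127]
    using a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values."""
    return [q * scale for q in quantized]

# Illustrative toy weights (not from any real model):
w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.3]
print(magnitude_prune(w, 0.5))  # → [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

In practice, training-aware variants interleave pruning and quantization with fine-tuning steps so the model can recover accuracy lost at each optimization step, which is what the "up to 99% accuracy recovery" figures above refer to.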
With the assistance of Neural Magic’s SparseML library, Uhura effectively incorporated cutting-edge pruning and quantization research into these models, resulting in notable inference speedups without compromising model accuracy. DeepSparse, Neural Magic’s inference runtime, played a pivotal role in enhancing the performance of these optimized models in production, delivering GPU-like capabilities on standard x86 CPUs.
To further support Uhura’s success, Neural Magic integrated several key product workflows into SparseML and DeepSparse, specifically designed to enhance the performance of the Donut model family. In the post-engagement phase, Neural Magic and Uhura are considering the possibility of contributing the Donut model to SparseZoo, an open-source ML model repository, for the benefit of a wider audience.
The Results
Uhura Solutions partnered with Neural Magic for several key reasons. First, they aimed to reduce the costs associated with deploying and running ML models. By utilizing CPUs for running models, Uhura achieved cost savings on hardware while maintaining comparable performance, with accuracy recovery rates reaching up to 99%.

Second, Uhura needed to optimize ML models for CPU usage to enhance the speed and efficiency of their operations. This was particularly crucial as Uhura relies on real-time or near-real-time data processing to deliver its services, seeking faster model inference times to support quicker decision-making and an improved client experience.

Lastly, Uhura sought to optimize ML models for CPU-based deployment to enhance operational flexibility and scalability. Deploying models on CPUs allowed Uhura to work with a broader range of hardware, including standard commodity hardware, and facilitated easier scaling of their operations. This, in turn, reduced the time required to introduce new products and services to the market.
“Through this highly collaborative engagement, we have enabled Uhura to achieve independence from GPU burdens yet maintain their desired AI-driven business outcomes. Our optimization expertise has enhanced the speed and efficiency of Uhura’s operations, supporting their client commitments.”
- Jeannie Finks, Head of Customer Success, Neural Magic
At the end of the engagement, Uhura had achieved the following success criteria:
- Reduced production infrastructure costs by cutting the CPU cores required to serve the workload
- Throughput speedup of 4-6x on CPUs
- Accuracy recovery of up to 99% compared to base models
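"Accuracy recovery" here means the optimized model's accuracy expressed as a percentage of the dense baseline's. A minimal sketch of the metric, with hypothetical numbers that are not Uhura's actual results:

```python
def accuracy_recovery(optimized_acc, baseline_acc):
    """Optimized-model accuracy as a percentage of the baseline's."""
    return 100.0 * optimized_acc / baseline_acc

# Illustrative: a pruned-quantized model at 91.08% accuracy vs. a
# dense baseline at 92.00% recovers 99% of baseline accuracy.
print(accuracy_recovery(91.08, 92.00))
```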
AI is advancing rapidly, and our collaborative efforts persist. With access to continuously evolving state-of-the-art ML research, techniques, and Neural Magic software, Uhura Solutions is now well-positioned to utilize the knowledge acquired during our engagement to address both current and future business objectives, including upcoming optimization research centered around large language models.
“Uhura is positioned to elevate their ML strategy both technically and fiscally; we are proud to partner with them as we all usher in the generative AI era.”
- Jeannie Finks, Head of Customer Success, Neural Magic