Artificial intelligence — generative AI, in particular — is the talk of the town. Applications like ChatGPT and LaMDA have sent shockwaves across industries, with the potential to revolutionize the way we work and interact with technology.
One fundamental characteristic that distinguishes AI from traditional software is its non-deterministic nature. Even with the same input, different runs can produce different results. While this characteristic contributes significantly to AI’s exciting technological potential, it also presents challenges, particularly in measuring the effectiveness of AI-based applications.
Below are some of the intricacies of these challenges, as well as some ways that strategic R&D management can approach solving them.
The nature of AI applications
Unlike traditional software systems where repetition and predictability are both expected and crucial to functionality, the non-deterministic nature of AI applications means that they do not produce consistent, predictable results from the same inputs. Nor should they — ChatGPT wouldn’t make such a splash if it spat out the same scripted responses over and over again instead of something new each time.
This unpredictability stems from the algorithms employed in machine learning and deep learning, which rely on statistical models and complex neural networks. These systems learn patterns from data, and generative models in particular often sample from a probability distribution when producing output, so results vary with the context, the training data and the model configuration.
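To make the idea concrete, here is a minimal sketch of one common source of variation in generative systems: drawing an output at random from a probability distribution rather than always picking the top-scoring candidate. The scores, the `temperature` parameter and the function names are illustrative assumptions, not the internals of any particular product.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Draw one index at random from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def greedy_token(logits):
    """Always pick the highest-scoring index (fully repeatable)."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical scores for three candidate outputs.
logits = [2.0, 1.5, 0.3]

rng = random.Random()  # unseeded, so results differ from run to run
sampled = [sample_token(logits, 1.0, rng) for _ in range(20)]
greedy = [greedy_token(logits) for _ in range(20)]

print("sampled:", sampled)  # a mix of indices that changes between runs
print("greedy: ", greedy)   # the same index every time
```

Fixing the random seed makes the sampled sequence repeatable, which is useful in testing, but it does not change the fact that the system's behavior is best described as a distribution of outputs rather than a single answer.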
The challenge of measuring success
With their probabilistic outcomes, algorithms programmed for uncertainty, and reliance on statistical models, AI applications make it challenging to define a clear-cut measure of success based on predetermined expectations. In other words, AI can, in essence, think, learn and create in ways akin to the human mind … but how do we know if what it thinks is right?
Another critical complication is the influence of data quality and diversity. AI models rely heavily on the quality, relevance and diversity of the data they are trained on, that is, the information they “learn” from. For these applications to succeed, they must be trained on representative data that encompasses a diverse range of scenarios, including edge cases. Assessing whether the training data is adequate and representative therefore becomes crucial to determining the overall success of an AI application. However, given the relative novelty of AI and the lack of settled standards for the quality and diversity of the data it uses, the quality of outcomes fluctuates widely across applications.
Sometimes, however, it is the influence of the human mind — more specifically, contextual interpretation and human bias — that complicates measuring success in artificial intelligence. AI tools often require this human assessment because these applications need to adapt to different situations, user biases and other subjective factors.
Accordingly, measuring success in this context becomes a complex task as it involves capturing user satisfaction, subjective evaluations, and user-specific outcomes, which may not be easily quantifiable.
Overcoming the challenges
Understanding the background behind these complications is the first step to coming up with the strategies needed to improve success evaluation and make AI tools work better. Here are three strategies that can help:
1. Define probabilistic success metrics
Given the inherent uncertainty in AI application results, those tasked with assessing their success must come up with entirely new metrics designed specifically to capture probabilistic outcomes. Success models that might have made sense for traditional software systems are simply incompatible with AI tool configurations.
Instead of focusing solely on single point estimates such as accuracy or precision, incorporating probabilistic measures like confidence intervals or probability distributions (statistical tools that describe how likely different outcomes are and how much an estimate may vary) can provide a more comprehensive picture of success.
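As one possible illustration, the sketch below reports a success rate together with a bootstrap confidence interval instead of a single number. The outcome data is invented, and the percentile bootstrap is just one of several ways to build such an interval; the function name and its parameters are assumptions made for this example.

```python
import random

def bootstrap_ci(outcomes, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample with replacement and record the success rate each time.
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical data: 41 of 50 repeated runs were judged acceptable.
outcomes = [1] * 41 + [0] * 9

rate = sum(outcomes) / len(outcomes)
low, high = bootstrap_ci(outcomes)
print(f"success rate {rate:.2f}, 95% interval [{low:.2f}, {high:.2f}]")
```

Reporting the interval alongside the rate shows how much the figure could move if the same evaluation were repeated, which a lone percentage hides.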
2. More robust validation and evaluation
Establishing rigorous validation and evaluation frameworks is essential for AI applications. This includes comprehensive testing, benchmarking against relevant sample datasets, and conducting sensitivity analyses to assess the system’s performance under varying conditions. Regularly updating and retraining models to adapt to evolving data patterns helps maintain accuracy and reliability.
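A sensitivity analysis can be sketched in a few lines: run the same evaluation while gradually perturbing the inputs and watch how the metric degrades. Everything here is a stand-in, assuming a toy threshold rule in place of a real model and Gaussian noise in place of real-world variation.

```python
import random

def toy_model(x):
    """Hypothetical stand-in for a real model: predicts 1 when x > 0.5."""
    return 1 if x > 0.5 else 0

def accuracy_under_noise(model, inputs, labels, noise_std, seed=0):
    """Accuracy after adding Gaussian noise of the given size to each input."""
    rng = random.Random(seed)
    correct = sum(
        model(x + rng.gauss(0.0, noise_std)) == y
        for x, y in zip(inputs, labels)
    )
    return correct / len(inputs)

# Synthetic evaluation set; labels are clean by construction.
data_rng = random.Random(42)
inputs = [data_rng.random() for _ in range(500)]
labels = [toy_model(x) for x in inputs]

results = {
    std: accuracy_under_noise(toy_model, inputs, labels, std)
    for std in (0.0, 0.05, 0.1, 0.2, 0.4)
}
for std, acc in results.items():
    print(f"noise std {std:<4} accuracy {acc:.3f}")
```

The shape of the resulting curve, rather than any single value on it, indicates how robust the system is to conditions it was not tuned for.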
3. User-centric evaluation
AI success does not solely exist within the confines of the algorithm. The effectiveness of the outputs from the standpoint of those who receive them is equally important.
As such, it is crucial to incorporate user feedback and subjective assessments when measuring the success of AI applications, particularly for consumer-facing tools. Gathering insights through surveys, user studies and qualitative assessments can provide valuable information about user satisfaction, trust and perceived utility. Balancing objective performance metrics with user-centric output evaluations will yield a more holistic view of success.
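One simple way to keep both views in sight is to report an objective score next to basic survey statistics rather than merging them into a single figure. The sketch below assumes a 1 to 5 rating scale, a "satisfied" cutoff of 4 and made-up numbers; all of these are illustrative choices.

```python
from statistics import mean

def summarize(objective_score, ratings, satisfied_at=4):
    """Report an objective score beside simple survey statistics (1-5 scale)."""
    share_satisfied = sum(r >= satisfied_at for r in ratings) / len(ratings)
    return {
        "objective_score": objective_score,
        "mean_rating": mean(ratings),
        "share_satisfied": share_satisfied,
    }

# Hypothetical inputs: a benchmark score and ten survey responses.
ratings = [5, 4, 2, 3, 4, 1, 2, 5, 3, 2]
report = summarize(0.91, ratings)

for name, value in report.items():
    print(f"{name}: {value}")
```

A high benchmark score paired with a low share of satisfied users, as in this invented example, is the kind of gap that objective metrics alone would not reveal.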
Assess for success
Measuring the success of any given AI tool requires a nuanced approach that acknowledges the probabilistic nature of its outputs. Those involved in creating and fine-tuning AI in any capacity, particularly from an R&D perspective, must recognize the challenges posed by this inherent uncertainty.
Only by defining appropriate probabilistic metrics, conducting rigorous validation and incorporating user-centric evaluations can the industry effectively navigate the thrilling, uncharted waters of artificial intelligence.
Dima Dobrinsky is VP R&D at Panoply by SQream.