Without data, there would be no AI. ChatGPT's training data is widely cited as 570GB of text, or more than 300 billion words collected from publicly available information on the internet. Among those data points you can find biases, inaccuracies, and plain old lies.
AI’s success in the business world depends on the reliability of the data that powers its machine learning engine. And since AI is only as good as the data feeding it, it’s important to fight “data poisoning” with human intelligence.
What is data poisoning? We define it as modifying accurate data so that it becomes malicious, or feeding an AI model inaccurate or malicious data in the first place. The term typically refers to an intentional act by hackers or other bad actors. For example, someone may deliberately label a stop sign as a yield sign, which could lead to accidents if that information feeds real-time maps or the navigation systems of self-driving cars. Or someone may supply a deliberately wrong translation, even as a joke, to taint a machine translation model.
The dangers of data poisoning
Data poisoning is harmful because it generates false or misleading information. But hackers aren't the only source of poor data in an AI model. False information fed into a large language model (LLM) can lead to AI “hallucinations”: confident, yet wrong, answers.
Poor data can be intentional or unintentional, benign or harmful. One unintentional healthcare example: an AI model built without an intensive review of its data sources learns to associate patients lying down with more severe outcomes, because sicker patients are more often examined lying down. Leaning on that shortcut instead of genuine clinical signs leads to a rise in false negatives, in which people with likely severe issues are missed.
So how can brands working with AI prevent data poisoning? It comes down to prevention before launch and, once the AI is deployed, proactive monitoring and fast reaction to any inaccuracies. In the contact center, quality assurance (QA), data annotation, and fraud detection programs are essential.
Before AI is launched, create a closed language model for AI training, one trained only on data you control. Be selective about the data that goes into the system, and keep humans in the loop with high-quality, contextual data annotation.
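One common way to keep humans in the loop is to have several annotators label the same item independently and route any item they disagree on to an expert reviewer. The sketch below illustrates that idea; it is a minimal example, not a prescribed method, and the function name, item IDs, labels, and the 0.8 agreement threshold are all hypothetical. Note that a consensus check only surfaces disagreement: it cannot catch poisoning where every annotator applies the same wrong label.

```python
from collections import Counter


def flag_low_agreement(labels_by_item, min_agreement=0.8):
    """Majority-vote each item and flag items where annotators disagree.

    labels_by_item: dict mapping a (hypothetical) item ID to the list of
    labels that independent annotators assigned to it.
    min_agreement: hypothetical threshold; items whose majority label has a
    smaller share of votes than this are flagged for human review.
    Returns (majority_label_by_item, flagged_item_ids).
    """
    majority = {}
    flagged = []
    for item_id, labels in labels_by_item.items():
        if not labels:
            # No annotations at all: send straight to human review.
            flagged.append(item_id)
            continue
        # most_common(1) returns the single most frequent label and its count.
        label, count = Counter(labels).most_common(1)[0]
        majority[item_id] = label
        # Agreement = share of annotators who chose the majority label.
        if count / len(labels) < min_agreement:
            flagged.append(item_id)
    return majority, flagged


labels = {
    "img_001": ["stop", "stop", "stop"],
    "img_002": ["stop", "yield", "stop"],  # one dissenting (possibly poisoned) label
    "img_003": ["yield", "yield", "yield"],
}
majority, flagged = flag_low_agreement(labels)
print(flagged)  # ['img_002']
```

Here only two of three annotators agree on `img_002` (about 67%, below the 0.8 threshold), so it is the one item sent for expert review.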
Data and AI ops are business necessities
There’s still a big disconnect at many brands: they know robust data pipelines and AI operations are key to realizing business value from their AI initiatives, but they lack the manpower or strategy (or both) to scale and sustain data and AI operations. That means they’re missing out on opportunities.
More than half (51%) of decision makers say data accuracy is critical to the success of AI, yet only 6% have achieved data accuracy higher than 90%, according to a 2022 report by Appen that polled more than 500 decision makers in the United States, United Kingdom, Ireland, and Germany.
The contact center houses so much data that can make AI more powerful and beneficial. It could be time to rethink your approach, starting with increasing your investment in data annotation.
In the same Appen report, 39% of respondents said they struggle with a lack of qualified people or technical resources. That's a major barrier.
The traditional way of hiring data annotators, crowdsourcing them and paying per annotation, no longer makes sense. This outdated approach only incentivizes annotators to work quickly, prompting them to rush and compromise the quality of annotations along the way.
With data annotation, quality is as important as quantity, especially when it comes to weeding out data poisoning. So focus on finding high-quality annotators who will perform the job with a high accuracy rate. Seek out annotators who possess vertical-specific expertise that’s relevant to your business. The more specialized your data is, the more nuanced expertise your annotators should have.
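To measure an annotator's accuracy rate, a common QA technique is to mix items with known, verified answers (a "gold" set) into the work queue and score each annotator only on those items. The sketch below assumes that setup; the function name, annotator IDs, ticket IDs, and intent labels are hypothetical, and how you turn scores into pay or feedback is a business decision it does not cover.

```python
from collections import defaultdict


def annotator_accuracy(annotations, gold):
    """Score each annotator only on items with a known correct ("gold") label.

    annotations: iterable of (annotator_id, item_id, label) tuples.
    gold: dict mapping item_id -> verified correct label.
    Returns dict annotator_id -> fraction of gold items labeled correctly.
    """
    correct = defaultdict(int)
    scored = defaultdict(int)
    for annotator, item_id, label in annotations:
        if item_id not in gold:
            continue  # not a gold item, so it can't be scored here
        scored[annotator] += 1
        if label == gold[item_id]:
            correct[annotator] += 1
    return {a: correct[a] / scored[a] for a in scored}


# Hypothetical contact-center tickets labeled by intent.
gold = {"t1": "refund_request", "t2": "billing_question", "t3": "cancellation"}
annotations = [
    ("ana", "t1", "refund_request"),
    ("ana", "t2", "billing_question"),
    ("ana", "t3", "cancellation"),
    ("ana", "t4", "other"),           # not a gold item: ignored for scoring
    ("ben", "t1", "refund_request"),
    ("ben", "t2", "refund_request"),  # wrong
    ("ben", "t3", "cancellation"),
]
scores = annotator_accuracy(annotations, gold)
print(scores)  # {'ana': 1.0, 'ben': 0.6666666666666666}
```

Because the score counts only correct answers on gold items, labeling faster does not raise it. That property is what makes an accuracy-based incentive different from paying per annotation.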
Getting high-quality annotators will take an investment of time and resources, but having this human component in your AI model will pay dividends far into the future.
Prioritize the integrity of your model
Being more discerning about the annotators you hire will help preserve the integrity of your model, but there are a few other things you can do to ensure AI efforts and investments pay off over the long run.
There’s nothing worse than realizing you left valuable data out of the equation at the outset and finding that you’ve got to restart your modeling from scratch. That’s why it’s critical to have your model fleshed out first; you need to know what you’re looking for and take stock of all the available data you may have – before you ever hand it over to your annotators.
Taking time to front-load the process in this way, casting a wider net from the beginning when it comes to your data, can save time-consuming and expensive headaches down the line.
The ROI of data annotation
Brands that embrace better data annotation will reap many benefits – to CX, associate experience, and their bottom lines:
- Improved data accuracy: Investing in high-quality annotators and incentivizing them based on accuracy rather than volume will improve the caliber of data informing your AI, creating better outputs and helping to inform decision making. It also helps combat data poisoning.
- Skilled experts yield better results: Finding great annotators takes an investment of time and resources. Seek out annotators who possess vertical-specific expertise that’s relevant to your business.
- Investing in training now leads to higher ROI: Strong annotators need to be subject matter experts, with deep industry experience, and it takes a comprehensive and rigorous training approach to set them up for success. The better their skillset, the more value annotators will add to your AI processes.
With AI penetrating more of the contact center space, it’s the right time to rethink how you’re approaching data.
For more on the importance of investing in data annotators, check out our Tips & Takeaways, “Achieve smarter AI with better data annotation.”