This article by Seth Earley was originally published on MDM.COM.
Machine learning and AI programs run on data. The quality and reliability of that data are critical ingredients in your formula for leveraging AI. The old “garbage in/garbage out” saying still applies, no matter how advanced the algorithm.
Many misconceptions about AI have undermined the success of these projects. For AI projects, having the correct “training data” is critical to a positive outcome. Many projects go over budget or are not completed on time due to an underestimation of the time needed to train the algorithm or an inability to access the correct data.
Here are five misconceptions about data and AI projects:
At the height of AI hype, many vendors of AI technology claimed that their algorithms could ingest incomplete or poor-quality data and were smart enough to find patterns and make predictions anyway. This is simply not the case. It is true that some algorithms can help with data quality, but those use cases are highly specific and still require the right “reference data” that the system can use to train and to find or correct issues with operational data.
Context is as important for AI as it is for people. Just as people need to orient themselves when looking for answers (you don’t look for iPhone solutions in a car repair manual), the data source for AI requires curation and context. If we are building a question-answering system for a consumer, it does not make sense to ingest complex engineering documents. When IBM trained Watson to play Jeopardy!, ingesting some data sources reduced performance. More data was not necessarily helpful. The program required carefully selected data.
There are some very limited use cases where a chatbot can be turned on out of the box. However, chatbots and intelligent virtual assistants (IVAs) need the same training that a human needs. You would never drop a new hire into a support role without training. The AI needs the same. Any meaningful functionality will be powered by your knowledge and data sources, and those sources require the correct format and structure to be retrieved by a cognitive assistant. Chatbots are a channel to knowledge, content and information.
In many projects, IT is left to address data problems that arise from business processes and business decisions. Imagine that salespeople will not enter data into a CRM system. IT cannot solve that, since it is a business process issue, and it cannot simply be outsourced to a low-cost offshore provider. Data needs to be owned by the business and support business goals. IT is the enabler but cannot own business data.
Data governance is more important than ever. What data is owned by the organization? What can be done with it? What are the data sources, and how is the data being consumed or translated by other systems and processes? How well is data being leveraged to produce value for the enterprise and the customer? How can data issues be addressed and remediated? The data infrastructure of the organization is essential. Investments need to be prioritized and results measured. Strong data governance helps get the organization’s data house in order.
The future belongs to organizations that can best merge their processes, business value and customer relationships with advanced AI capabilities. Data is critical and is in fact more important than the algorithm. Getting your data house in order needs to be a priority, with board-level attention and funding commensurate with the scale of the enterprise and its data challenges. That will be a formula for success.