This article was published online at Baseline Magazine on May 3, 2011.
IBM’s Jeopardy-playing Watson computer has been hailed as a technology triumph: it showed that computers can understand human language across broad knowledge topics – not just facts and trivia, but ambiguous language including puns, double entendres, and idioms.
The technology is impressive, and IBM has set its sights on many commercial applications in healthcare, financial services, and customer service operations. Few organizations have the resources it took to build Watson: $3 million worth of hardware (off-the-shelf servers with almost 3,000 processors and a terabyte of RAM), not to mention millions in research. Nevertheless, the question remains: does Watson embody a solution approach that enterprises can exploit or learn from? How readily can a “Watson” be applied to the knowledge and content access problems of the typical enterprise?
A few clues lie in the nature of knowledge access and in some of the challenges that Watson team members discussed in articles and interviews. First, here are some principles that Watson exploited:
- Watson used multiple algorithms to process information. These included the usual keyword-matching algorithms of run-of-the-mill search; “temporal” (time-based) reasoning that understands dates and relative time calculations; “statistical paraphrasing,” an approach to conveying the same idea using different words; “geospatial reasoning,” a way of interpreting locations and geographies; and various approaches to unstructured information processing. (A toy sketch of combining such scorers appears after this list.)
- Watson can be characterized as “semantic search” or natural language search. That is, questions are asked in plain English rather than as structured queries, and each question is parsed into its semantic and syntactic components (meaning and grammatical structure). The parsed question is then processed against the system’s knowledge base, derived from over 200 million pages of information. (A second sketch after this list illustrates the parsing step.)
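To make the first point concrete, here is a minimal sketch of how several scoring algorithms can vote on candidate evidence. The scorers, weights, and data are entirely hypothetical – Watson’s real ensemble used hundreds of far more sophisticated algorithms – but the shape of the idea is the same:

```python
import re

def keyword_score(question, passage):
    """Plain bag-of-words overlap: fraction of question terms found in the passage."""
    q_terms = set(re.findall(r"[a-z']+", question.lower()))
    p_terms = set(re.findall(r"[a-z']+", passage.lower()))
    return len(q_terms & p_terms) / max(len(q_terms), 1)

def temporal_score(question, passage):
    """Crude temporal reasoning stand-in: reward passages whose four-digit
    years also appear in the question."""
    q_years = set(re.findall(r"\b(1[0-9]{3}|20[0-9]{2})\b", question))
    p_years = set(re.findall(r"\b(1[0-9]{3}|20[0-9]{2})\b", passage))
    if not q_years:
        return 0.0
    return len(q_years & p_years) / len(q_years)

# Each (weight, scorer) pair stands in for one of Watson's many evidence scorers.
SCORERS = [(0.6, keyword_score), (0.4, temporal_score)]

def rank_passages(question, passages):
    """Combine the scorers' votes into one confidence per passage, highest first."""
    scored = [(sum(w * s(question, p) for w, s in SCORERS), p) for p in passages]
    return sorted(scored, reverse=True)

q = "Which city hosted the 1900 Summer Olympics?"
docs = ["Paris hosted the Summer Olympics in 1900.",
        "London hosted the Summer Olympics in 1908."]
for score, doc in rank_passages(q, docs):
    print(f"{score:.2f}  {doc}")
```

Run against the two sample passages, the Paris sentence wins because both the keyword and temporal scorers agree on it; the point is that no single algorithm has to be right on its own.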
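And here is a similarly toy illustration of the second point: parsing a plain-English question into an expected answer type plus content terms. Watson derived its answer types from full syntactic and semantic parses; this sketch only pattern-matches on the question word, so treat every rule and name in it as an assumption made for the example:

```python
import re

# Very rough mapping from question word to the kind of entity the answer should be.
# This is only a stand-in for real syntactic/semantic analysis.
ANSWER_TYPES = {"who": "PERSON", "where": "PLACE", "when": "DATE", "what": "THING"}

def parse_question(question):
    """Split a plain-English question into an expected answer type and focus terms."""
    words = re.findall(r"[a-zA-Z']+", question.lower())
    q_word = next((w for w in words if w in ANSWER_TYPES), None)
    stop = {"is", "the", "a", "an", "of", "in", "did", "was", "were", "do", "does"}
    focus = [w for w in words if w not in stop and w not in ANSWER_TYPES]
    return {"answer_type": ANSWER_TYPES.get(q_word, "UNKNOWN"), "focus_terms": focus}

print(parse_question("Who invented the telephone?"))
# {'answer_type': 'PERSON', 'focus_terms': ['invented', 'telephone']}
```

Even this crude decomposition is more than a keyword engine does: the system now knows it is looking for a person, not just for documents containing “telephone.”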
Unlike keyword matching, which parses terms and processes them against a dumb bag of words, more complex and powerful approaches require an underlying structure to the information. These structures take the form of taxonomies and ontologies, which tell the system how concepts relate to one another. Many organizations are beginning to build these taxonomy frameworks for e-commerce, document management, intranet, and knowledge base applications.
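As a simple illustration of what such a structure buys you, here is a hypothetical parent/child taxonomy used to expand a broad query term into all of its narrower concepts (the data and names are invented; real enterprise taxonomies also carry synonyms, related terms, and richer ontology relations):

```python
# A toy taxonomy: each concept maps to its narrower (child) concepts.
TAXONOMY = {
    "vehicle": ["car", "truck", "bicycle"],
    "car": ["sedan", "suv"],
}

def expand_term(term, taxonomy):
    """Return the term plus everything narrower than it, walking the hierarchy."""
    results, queue = [], [term]
    while queue:
        t = queue.pop()
        results.append(t)
        queue.extend(taxonomy.get(t, []))
    return results

print(expand_term("vehicle", TAXONOMY))
# ['vehicle', 'bicycle', 'truck', 'car', 'suv', 'sedan']
```

A search for “vehicle” can now match documents that only mention “SUV” – a connection a flat bag of words cannot make.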
The second point is that Watson demonstrates key elements of solutions that do not assume users know exactly how to frame questions about what they want. As much research on search shows, users more often than not ask ambiguous questions and expect precise results. Therefore we need to build solutions that help them with their queries. These are the same approaches used for structuring the information in the first place: the structures the tools require to make sense of the data are the same ones that help guide users in their choices. Think of the newer navigation/search approaches used in e-commerce sites – choosing color, size, brand, price, and so on helps users find what they need and navigate precisely to specific information.
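A hedged sketch of that faceted-navigation idea, with an invented three-product catalog standing in for an e-commerce inventory:

```python
from collections import Counter

# Hypothetical catalog; in practice these facet values come from the same
# taxonomy work described above.
PRODUCTS = [
    {"name": "Oxford shirt", "color": "blue", "brand": "Acme",   "price": 40},
    {"name": "Polo shirt",   "color": "blue", "brand": "Zenith", "price": 25},
    {"name": "Tee shirt",    "color": "red",  "brand": "Acme",   "price": 10},
]

def apply_facets(products, **filters):
    """Keep only products matching every selected facet value."""
    return [p for p in products if all(p.get(k) == v for k, v in filters.items())]

def facet_counts(products, facet):
    """Count how many remaining products carry each value of a facet --
    the numbers shown next to each checkbox on a typical e-commerce site."""
    return Counter(p[facet] for p in products)

blue_items = apply_facets(PRODUCTS, color="blue")
print(facet_counts(blue_items, "brand"))  # Counter({'Acme': 1, 'Zenith': 1})
```

The user never has to compose a precise query; picking “blue” and then a brand narrows an ambiguous need down to an exact answer, with the structured metadata doing the work.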
The bottom line is that tools like IBM’s Watson are a great leap forward in capabilities, but there is no free lunch – Watson’s power comes from organizing content. Tools for gaining insights and finding answers will keep getting better, but human judgment must still be applied to information to build a foundation of meaning and structure.