Artificial intelligence (AI) promises to deliver enterprises higher efficiency, increased accuracy and greater utilization of corporate information assets.
But these promises can only come true if the AI is built on a solid information architecture.
While many would like to believe that AI is a combination of magic and pixie dust, for human-to-machine conversations to become reliable, a significant amount of foundational effort must take place. And this process begins with the basics of knowledge management.
Developing a framework starts with the concept of a “domain model.”
A domain model represents the concepts and terminology of a particular industry or specialized area of knowledge. For example, a domain model for an insurance company might include products, services, risks, regions, topics, processes, audiences, content types, customer types, etc.
The domain model represents common terminology, structure and concepts for most business processes within the target domain. At a deeper level of detail, the domain model is converted into “schemas,” which more precisely describe the process, data and context for the area of interest. The schemas become your metadata structures.
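To make this concrete, here is a minimal sketch of what such a schema might look like in practice. Every field name and vocabulary value below is an illustrative assumption for the insurance example, not an industry standard.

```python
from dataclasses import dataclass, field

# A hypothetical metadata schema for documents in the insurance domain.
# All field names and controlled-vocabulary values are illustrative
# assumptions, not an industry standard.

@dataclass
class ProposalDocument:
    title: str
    product: str          # from the "products" vocabulary, e.g. "commercial-property"
    region: str           # from the "regions" vocabulary, e.g. "midwest-us"
    risk: str             # from the "risks" vocabulary, e.g. "flood"
    audience: str         # e.g. "underwriter", "broker"
    content_type: str     # e.g. "case-study", "bio", "cost-model"
    topics: list[str] = field(default_factory=list)

doc = ProposalDocument(
    title="Flood coverage case study",
    product="commercial-property",
    region="midwest-us",
    risk="flood",
    audience="broker",
    content_type="case-study",
    topics=["flood", "claims-history"],
)
```

Tagging every content asset against a structure like this is what turns loose documents into machine-usable metadata.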
The business world contains thousands of schemas and contexts, and you cannot resolve all of them at once. Begin the framework process by starting with one, something critical to the business. Narrowing the focus lets you measure the business impact more readily.
Writing a schema creates a structure which supports machine training. Machine learning models function more accurately when told what is important to your organization.
Search also benefits from domain models and schemas by limiting and contextualizing possible results to make them more meaningful to the user and relevant to the domain.
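As a small sketch of that idea, contextual search can be thought of as filtering a corpus on schema fields. The documents and field names below reuse the hypothetical insurance vocabulary from the schema sketch above.

```python
# Sketch: contextual search as filtering on schema metadata.
# The field names reuse the hypothetical insurance vocabulary above.

corpus = [
    {"title": "Flood case study", "risk": "flood", "region": "midwest-us"},
    {"title": "Wildfire cost model", "risk": "wildfire", "region": "west-us"},
    {"title": "Hail claims history", "risk": "hail", "region": "midwest-us"},
]

def contextual_search(docs, **context):
    """Keep only documents whose metadata matches the user's context."""
    return [d for d in docs if all(d.get(k) == v for k, v in context.items())]

# A user working on a Midwest proposal sees only Midwest content.
print(contextual_search(corpus, region="midwest-us"))
```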
Start by building a logical map of information related to a specific business process or need.
Take the proposal creation process of an enterprise as an example. Proposals in this setting contain many information elements. The challenge is finding the right mix of information that relates to the potential client's requirements and delivering the completed proposal on time.
Businesses waste hours searching for the various content components, whether product configurations, specifications, information about people such as bios and photos, or supporting components such as case studies, project plans and cost elements.
Data structures, domain models and schemas (with taxonomies and controlled vocabularies) make up the ontology: the set of described elements and all of the relationships between them.
In the insurance domain, the relationship between the vocabulary of risks and that of regions can be called “risks in a region.” These ontological relationships capture knowledge about the organization, its processes and relationships in the real world and become the framework on which knowledge is organized.
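One simple way to picture an ontology is as a set of subject-predicate-object statements. The toy triple store below is a sketch; the specific risks, regions and predicate names are assumptions made for illustration.

```python
# A toy triple store illustrating ontological relationships.
# The specific risks, regions and predicates are illustrative assumptions.

triples = {
    ("flood", "is-risk-in", "midwest-us"),
    ("hail", "is-risk-in", "midwest-us"),
    ("wildfire", "is-risk-in", "west-us"),
    ("flood", "covered-by", "commercial-property"),
}

def risks_in_region(region: str) -> set[str]:
    """Answer the 'risks in a region' question by following one predicate."""
    return {s for (s, p, o) in triples if p == "is-risk-in" and o == region}

print(risks_in_region("midwest-us"))  # {'flood', 'hail'}
```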
Machine learning algorithms can place knowledge elements and artifacts into a knowledge base, and then surface the content users need in the context of their goal.
A salesperson creating a proposal can retrieve the information needed to construct it because the process of completing a proposal is part of the framework. The schema structures the information for that process.
If this sounds like old-fashioned knowledge engineering, it is. The difference lies in how today's tools scale across content and data sources and how they interpret human intent.
Many of the elements humans need to learn how to complete a task are the same ones required by an AI-driven search application.
In fact, “training content” remains the biggest obstacle to enabling AI programs. IBM Watson's creators noted that — as one would expect — feeding the AI the wrong information sources degraded performance.
Over time, the machine learning that AI offers will move search toward a proactive experience. Machines can watch and listen to what humans are trying to do, refer to repositories of past search actions, and improve their ability to return the most relevant results.
AI-enhanced search depends on a knowledge management foundation. Even when the algorithms appear to perform without external ontologies, someone still made the underlying design decisions and embedded the architecture in the code.
Knowledge engineering will play an even more important role in the success of emergent technologies such as bots and conversational interfaces.
AI tools do not remove the need for domain models, schemas, content architecture, ontologies and other design elements. To the contrary, context must be defined and retained via knowledge engineering approaches.
Knowledge engineering means you are building mechanisms that deliver search results containing the right content to support your employees' processes.
Taking this a step further, component authoring enables content to be reused at even finer levels of granularity and in more contexts. AI interprets human intent and contextualizes and personalizes results at a scale impossible with knowledge engineering alone.
These precepts aren't new.
Google has been enhancing its contextualized search for a while, showing steady improvements over time. Traditionally, Google searches returned a long list of web pages with the answers buried inside.
Well-worded searches sometimes resulted in direct hits, where the answers were part of the URL’s page title. This “trained” the world on how to tag web content for easier retrieval (and is why marketers fear the dreaded Google algorithm!).
You may have noticed, however, that Google has started giving direct answers when it can (e.g. search “How old is Drew Barrymore?” — the answer appears in a bio box alongside the list of search results).
These responses use the Google “knowledge graph,” a structure of related concepts and attributes, which is a form of schema related by an ontology. But progress has been slow. Even while possessing the largest database of queries known to man, Google still struggles to consistently offer this level of response.
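The sketch below shows a simplified record of the kind a knowledge graph can draw on, and how a direct answer such as an age could be computed from it. It uses schema.org-style notation for illustration only; it is not Google's actual internal representation.

```python
from datetime import date

# A simplified schema.org-style entity record; a sketch, not Google's
# internal data model.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Drew Barrymore",
    "birthDate": "1975-02-22",
}

# Derive the direct answer from the structured attribute.
born = date.fromisoformat(person["birthDate"])
today = date.today()
age = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
print(f"{person['name']} is {age} years old.")
```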
On a positive note, you don’t have to solve what Google is trying to solve. Your business’s area of expertise is a tiny subset of the world’s knowledge that Google tries to master.
Unfortunately, when it comes to enterprise search, Google doesn't reach inside the firewall.
Much of corporate knowledge is stored safely within protected networks that commercial search engines cannot reach. Further, this information is often unstructured and lacks the links that facilitate ranking and retrieval algorithms.
Corporate information often exists in untagged documents and files, outside of a meaningful metadata structure. For a machine to understand this unstructured content, the content must first be classified and mapped, and placed within the context of its use in the business.
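As a sketch of what that classification step can look like, the snippet below trains a simple text classifier on a handful of hand-labeled examples. The labels reuse the hypothetical content types from the earlier insurance examples, and a real deployment would need far more gold-standard training data.

```python
# A minimal sketch of auto-classifying untagged documents into content
# types, assuming a small hand-labeled training set. Labels reuse the
# hypothetical insurance vocabulary from earlier examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "Premium schedule and cost breakdown for flood coverage",
    "Biography of our senior underwriter, 20 years in property risk",
    "Case study: hail damage claims in the Midwest",
    "Bio and headshot of the regional claims manager",
]
train_labels = ["cost-model", "bio", "case-study", "bio"]

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(train_texts, train_labels)

print(classifier.predict(["Profile of our lead actuary"]))  # likely ['bio']
```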
Many classification attempts fail because the information architecture does not account for the potential uses of content and information. Instead, businesses organize unstructured information in haphazard ways — either through an accidental architecture or by way of personal, idiosyncratic and inconsistent organizing approaches.
Haphazard approaches do not support users' needs within their business processes. AI-enabled search is contextual by nature: search is always tied to a step within a business process. Thinking about search this way changes how you approach information architecture design, basing it on the context of the business process.
For example, a pharmaceutical company is constantly in the process of complying with FDA regulations in order to advance its products to market. All of the related information that goes into this process can be architected, indexed and classified to enable much faster access when searched. Advanced knowledge management professionals are attacking this problem today, reducing labor by thousands of hours and shortening time-to-market.
If your company is venturing into AI-enabled search, don’t start with machine learning.
Machines first need to be trained on how to learn, whether through schemas or by classifying and organizing information around a set of common purposes in a standard table-driven database. Training sets need to be gold-standard content representing clean data in order to be useful.
Depending on the purpose, the training sets may require varying levels of structure, but the data must always be of superior quality. Remember what Watson's creators taught us: Putting garbage into an AI system results in garbage AI results.
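A few basic hygiene checks go a long way here. The sketch below shows the kind of de-duplication and label validation one might run before training; the checks are illustrative, not exhaustive.

```python
# Illustrative hygiene checks on a labeled training set; not exhaustive.
from collections import Counter

records = [
    {"text": "Flood coverage case study", "label": "case-study"},
    {"text": "Flood coverage case study", "label": "case-study"},  # duplicate
    {"text": "", "label": "bio"},                                   # empty text
    {"text": "Underwriter biography", "label": ""},                 # missing label
]

seen, clean = set(), []
for record in records:
    key = record["text"].strip().lower()
    if not key or not record["label"]:   # drop empty text or missing labels
        continue
    if key in seen:                      # drop exact duplicates
        continue
    seen.add(key)
    clean.append(record)

print(f"kept {len(clean)} of {len(records)} records")
print(Counter(record["label"] for record in clean))  # check class balance
```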
Consider how humans learn and apply the same approach to AI.
Humans do not learn about the entire business at once. A talented knowledge worker starts by learning about their part of the business. In the same way, a learning “AI machine” is far more successful when tasked with a focused set of business processes and clear objectives. Most of today’s successful AI solutions are narrowly focused.
Feeding the system knowledge about the user, the task and the related content is “the learning” that must occur. It provides context. This process starts one department or one function at a time. Departmental and process-specific AI search applications can then cross over within industries, then cross to adjacent industries, and finally to “business” in general.
But the industry is not there yet.
Setting the right expectations is difficult. AI is not magic, but its value is amplified when you take what the machine has learned and enable it in new environments. Each iteration results in refinement for all.
In our third and final post, we will look at the biggest challenge that faces businesses bringing AI into the workplace: people.
Editor's note: This is the second in a three-part series. Read the first article here and the third here.
This series first appeared on CMSWire.com.