Expert Insights | Earley Information Science

The Knowledge Architecture Foundation That Makes AI-Driven Search Actually Work

Written by Earley Information Science Team | Mar 7, 2017

The appeal of AI-driven search is straightforward: systems that understand what employees are trying to accomplish, surface the right information in the right context, and reduce the time and friction that knowledge work currently involves. The technology to do this exists. What determines whether it works in practice is not the sophistication of the algorithms. It is the quality of the knowledge architecture underneath them.

There is a persistent belief that AI is capable of inferring meaning and structure from raw, unorganized content. In controlled research environments with carefully curated data, this is sometimes possible. In the messy reality of enterprise information -- scattered across systems, inconsistently tagged, organized idiosyncratically if organized at all -- it is not. Human-to-machine conversations become reliable only when a substantial foundation of knowledge engineering work has been done first. That work begins with the fundamentals of how information is structured, classified, and connected.

This is the second article in a three-part series on AI-driven enterprise search. The first examined the potential of conversational AI for the workplace. The third addresses the organizational and talent challenges of implementation.

This series originally appeared on CMSWire.

Step One: Develop a Domain Model and Schema Framework

Every AI-driven search application operates within a domain -- a specific area of business knowledge with its own vocabulary, concepts, processes, and relationships. Before any machine learning can be applied usefully, that domain needs to be explicitly described.

A domain model captures the core concepts and terminology of a particular industry or business area. For an insurance organization, this might include products, services, risk categories, geographic regions, regulatory topics, customer types, content types, and business processes. The domain model establishes a shared vocabulary and conceptual structure that reflects how the business actually thinks and operates.

From the domain model, more detailed schemas are developed. Schemas describe data, process, and context with greater precision -- they become the metadata structures that allow information to be organized, retrieved, and connected in meaningful ways. When a machine learning model is provided with a well-constructed schema, it has a defined framework for understanding what matters within a particular context, which significantly improves the accuracy and relevance of its outputs.
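To make this concrete, here is a minimal sketch of what a schema might look like for the insurance domain described above. The field names, controlled vocabularies, and validation logic are illustrative assumptions, not a real industry standard -- the point is that the schema makes explicit what "valid, well-described content" means before any machine learning is applied.

```python
from dataclasses import dataclass, field

# Controlled vocabularies drawn from the domain model (illustrative values).
RISK_CATEGORIES = {"flood", "wildfire", "cyber", "liability"}
REGIONS = {"northeast", "southeast", "midwest", "west"}
CONTENT_TYPES = {"policy", "claims-guide", "underwriting-manual"}


@dataclass
class ContentItem:
    """Metadata schema that every indexed document must satisfy."""
    title: str
    content_type: str
    risk_category: str
    region: str
    tags: list = field(default_factory=list)


def validate(item: ContentItem) -> list:
    """Return a list of schema violations (an empty list means valid)."""
    errors = []
    if item.content_type not in CONTENT_TYPES:
        errors.append(f"unknown content_type: {item.content_type}")
    if item.risk_category not in RISK_CATEGORIES:
        errors.append(f"unknown risk_category: {item.risk_category}")
    if item.region not in REGIONS:
        errors.append(f"unknown region: {item.region}")
    return errors
```

A schema like this is what turns tagging from an ad hoc habit into an enforceable contract: content that fails validation is flagged before it ever reaches the index.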

The practical implication for organizations beginning this work is to start narrow. The business world contains an enormous number of schemas and contexts, and attempting to address all of them at once is a reliable path to an unfinished project. Selecting one critical business area, building a complete and accurate schema for it, and demonstrating measurable impact creates the foundation for expanding scope -- and gives the program a concrete success to point to.

Step Two: Build Ontologies That Capture Organizational Knowledge

A schema describes structure. An ontology describes relationships -- the connections between concepts that reflect how knowledge is actually organized and how one piece of information relates to another.

Consider the proposal creation process at a professional services firm. Constructing a proposal requires drawing together a wide variety of information components: product configurations, technical specifications, team member profiles, case studies, project plans, pricing elements, and regulatory or compliance considerations relevant to the client. Each of these components exists somewhere in the organization's repositories. Finding the right combination efficiently, and assembling it in a way that addresses the specific requirements of a given opportunity, is a significant knowledge challenge.

An ontology for this process maps the relationships between these components -- which case studies are relevant to which solution types, which technical specifications correspond to which product configurations, which team profiles are associated with which service areas. These relationships encode organizational knowledge in a form that AI-driven search can use to surface the right information at the right moment in the proposal process.

In an insurance context, the relationship between risk vocabulary and regional vocabulary might be described as "risks applicable in a region." These structured connections represent real-world relationships that the organization understands but that no machine can infer from untagged documents alone. Building them explicitly is what makes AI-driven search genuinely useful rather than superficially impressive.
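The "risks applicable in a region" relationship can be sketched as subject-predicate-object triples, the basic unit of most ontology representations. The predicate name and instance data below are illustrative assumptions; a production system would typically use a standard such as RDF/OWL, but the lookup logic is the same.

```python
# An ontology fragment as (subject, predicate, object) triples.
TRIPLES = [
    ("flood",    "applicable_in", "midwest"),
    ("wildfire", "applicable_in", "west"),
    ("cyber",    "applicable_in", "northeast"),
    ("cyber",    "applicable_in", "west"),
]


def objects(subject: str, predicate: str) -> set:
    """All objects related to a subject by a given predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate: str, obj: str) -> set:
    """Inverse lookup: all subjects bearing a predicate to an object."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}
```

Note that the relationship runs both ways: a search for a region can surface its applicable risks, and a search for a risk can surface the regions where it applies. That bidirectionality is exactly what untagged documents cannot provide.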

Scaling Knowledge Engineering with Machine Learning

The elements of this work -- domain models, schemas, taxonomies, controlled vocabularies, ontological relationships -- collectively represent knowledge engineering. This is not a new discipline. What is new is the ability to apply these structures at scale across large, diverse content and data sources, and to combine them with machine learning in ways that interpret human intent rather than just matching keywords.

Machine learning algorithms can categorize content, apply metadata, and surface knowledge artifacts in the context of a user's goal. A salesperson building a proposal can retrieve relevant components because the process of proposal construction has been modeled, the schema structures the required information, and the system understands the relationship between what the salesperson is trying to accomplish and the content that supports it.
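The proposal scenario can be sketched in a few lines: the process model defines which component types a proposal requires, and retrieval filters a tagged repository against both that model and the opportunity's attributes. All identifiers and data here are invented for illustration.

```python
# The proposal process, modeled as a set of required component types.
PROPOSAL_COMPONENTS = ["case-study", "team-profile", "pricing-sheet"]

# A tagged content repository (illustrative records).
REPOSITORY = [
    {"id": "cs-1", "type": "case-study",    "solution": "cloud-migration"},
    {"id": "cs-2", "type": "case-study",    "solution": "data-analytics"},
    {"id": "tp-1", "type": "team-profile",  "solution": "cloud-migration"},
    {"id": "pr-1", "type": "pricing-sheet", "solution": "cloud-migration"},
]


def assemble_proposal(solution: str) -> dict:
    """Gather the matching items for each required component type."""
    return {
        ctype: [d["id"] for d in REPOSITORY
                if d["type"] == ctype and d["solution"] == solution]
        for ctype in PROPOSAL_COMPONENTS
    }
```

Trivial as the filtering is, it only works because every record carries `type` and `solution` metadata conforming to the schema -- which is why the foundational work comes first.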

Training data quality is the most important variable in this process. IBM's experience developing Watson established a principle that has been confirmed repeatedly: feeding incorrect or poorly structured information into an AI system degrades its performance. Training sets need to represent the best available examples of the content the system will be asked to work with -- clean, well-structured, and representative of real use cases. Organizations that treat training data preparation as a secondary concern will produce AI systems that reflect that approach in their outputs.

Over time, well-designed AI-driven search systems improve through use. They observe patterns in how employees search, what they select, and what they do with the results. This feedback loop allows the system to refine its relevance model continuously -- moving from reactive retrieval toward proactive assistance that anticipates what a user needs based on the context of their current task.
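The feedback loop can be sketched as a simple click-boost model: each time a user selects a result, the association between the query terms and that document is strengthened. Real relevance models are far more sophisticated; the boost value and scoring formula here are illustrative assumptions only.

```python
from collections import defaultdict

# Learned boost per (query term, document) pair.
weights = defaultdict(float)


def record_click(query_terms, doc_id, boost=0.5):
    """Strengthen the link between each query term and the chosen doc."""
    for term in query_terms:
        weights[(term, doc_id)] += boost


def score(query_terms, doc_id, base=1.0):
    """Base relevance plus the accumulated click-derived boosts."""
    return base + sum(weights[(t, doc_id)] for t in query_terms)
```

Even this toy version shows the mechanism: documents that users repeatedly choose for a given intent rise in the ranking for that intent, without anyone re-tagging content by hand.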

Why Google Is Not the Model for Enterprise Search

Google's evolution toward direct answers and knowledge graph responses offers a useful reference point. Rather than returning a list of pages that may contain an answer somewhere within them, Google increasingly surfaces the answer itself -- a name, a date, a definition -- at the top of results. This is the product of a knowledge graph: a structured network of concepts, attributes, and relationships that allows the system to reason about queries rather than simply match terms.

The enterprise search problem shares the same structural requirements but operates under different conditions. Corporate knowledge assets are not publicly indexed. They live behind firewalls, inside document repositories, within application databases, and in the heads of employees. Commercial search engines do not reach them. And unlike the public web -- where the volume and density of links and citations create natural ranking signals -- enterprise content is often unstructured, untagged, and without the metadata that would support meaningful retrieval.

The encouraging reality is that enterprise scope is tractable in ways that Google's scope is not. An organization does not need to model all human knowledge. It needs to model the specific domain of its own business -- a finite set of concepts, processes, relationships, and vocabulary. That is a challenging but achievable engineering problem. Organizations that approach it systematically, domain by domain, build a compound knowledge asset that grows more valuable with each addition.

Connecting Information Architecture to Business Process

The most important design principle for enterprise knowledge architecture is that search does not happen in a vacuum. Every search is embedded in a task -- a step within a business process that a person is trying to complete. Designing information architecture around that reality, rather than around the structure of existing repositories, changes both what gets built and how well it serves users.

A pharmaceutical organization managing regulatory submissions is a clear example. The process of preparing and advancing a product through FDA review involves a large and interconnected set of information components -- clinical trial data, regulatory correspondence, labeling requirements, manufacturing documentation, and more. When this information is architected, indexed, and classified in alignment with the regulatory submission process, the time required to locate and assemble the necessary components can be reduced dramatically. Organizations that have done this work report significant reductions in the labor involved in compliance processes and measurable improvements in time-to-market.

This process-centered approach to information architecture applies across industries and functions. The specific schemas and ontologies differ by domain. The principle is consistent: structure information around how people actually use it, and AI-driven search will have the foundation it needs to deliver relevant, contextual results.

Training AI Systems the Right Way

Organizations approaching AI-driven search for the first time should resist the temptation to begin with machine learning. The machine cannot learn effectively until the foundation is in place -- until the schemas, ontologies, and training content have been prepared to the necessary standard.

The analogy to human learning is instructive. A new employee does not learn the entire organization on their first day. They start with their own role, their own team, and the specific processes they are responsible for. They build outward from there as they develop competence and context. AI systems follow the same learning logic. A narrowly focused implementation -- one department, one process, one well-defined domain -- produces better results than a broad deployment on inadequately prepared content. Success in a focused area establishes the pattern for expansion.

As the system matures and its scope grows -- from one department to adjacent functions, from one industry context to related ones -- the value of the knowledge architecture compounds. Each new domain that is properly modeled adds to the organization's collective AI capability. Each iteration of learning refines the system's relevance model. The investment in foundational knowledge engineering pays dividends not once but continuously, as the organization's AI-driven search capability develops into a genuine competitive asset.

This article originally appeared on CMSWire and has been revised for Earley.com. Read the other articles in this series: How AI-Driven Search Could Bring Us Closer to the Intelligent Workplace and AI-Driven Enterprise Search Is Closer Than You Think.