Earley AI Podcast - Episode 23: Why AI Is a Content Problem - Knowledge Graphs, Schema.org, and the Case for Boutique Ontologies with Andy Fitzgerald

From PhD in English to Information Architect: How Language, Structure, and Linked Data Make AI Actually Work 

Guest: Andy Fitzgerald, Information Architecture and Content Strategy Consultant 

Hosts: Seth Earley, CEO at Earley Information Science; Chris Featherstone, Sr. Director of AI/Data Product/Program Management at Salesforce

Published on: December 5, 2022


In this episode, Seth Earley speaks with Andy Fitzgerald, an information architecture and content strategy consultant based in Seattle whose path from a PhD in English literature to knowledge graphs turns out to be less surprising than it sounds - because both fields are ultimately about how humans make sense of the world through language. Andy explains how Schema.org works as a gateway drug to RDF and ontologies, why the "make it like Google" request from executives reveals a fundamental misunderstanding of what Google actually does with content, and why organizations that skip knowledge architecture end up using AI as an expensive sledgehammer to fix problems they could have avoided. He also shares UXMethods.org - his boutique knowledge graph experiment proving that smaller, standards-based ontologies deliver real value without enterprise-scale complexity.

Key Takeaways:

  • Language is never natural - it is always constructed, and information architecture is fundamentally the discipline of making those constructions legible to both humans and machines.
  • Schema.org gives algorithms the context they need to interpret web content, enabling search to understand that "gluten-free delivery near me" means food delivery from a restaurant.
  • Making internal search "like Google" requires the same content curation, semantic markup, and structural investment that organizations pour into external SEO - but almost never apply internally.
  • If your taxonomy has a large miscellaneous category, your domain model is probably wrong - the structure hasn't created enough findable places for things to go.
  • AI without a knowledge graph is just a machine that confidently operates without context - the KFC Germany promotion disaster and Microsoft's Tay chatbot both illustrate what happens when humans leave the loop.
  • A knowledge graph is data in context, and the open world assumption means it can absorb new data sources and relationships without breaking - unlike closed relational databases that return false for anything not already defined.
  • Gall's Law says every complex system that works evolved from a simple system that works - starting small with a focused, standards-based ontology is not a budget compromise, it is the strategically correct approach.

 

Insightful Quotes:

"A little semantic goes a long way. A complex system that works will invariably be found to have emerged from a simple system that works - start there." - Andy Fitzgerald (citing Jim Hendler and Gall's Law)

"If the information isn't there, you can't automatically fix the data. It takes a person to provide it. Applying NLP to chaotic content is a sledgehammer - you're making up for past sins." - Andy Fitzgerald

"In AI, until it has a body, it's missing that foundation. When we provide ontologies and schema, we give it the context that we forget we ever generated ourselves - the context that comes from being embodied in the world." - Andy Fitzgerald

Tune in to discover how Andy Fitzgerald's journey from English literature to UX to knowledge graphs reveals why every AI project is ultimately a content problem - and why a small, well-designed ontology is more valuable than a sprawling one that nobody can explain in plain language.




Podcast Transcript: Knowledge Graphs, Structured Content, and Why AI Is Always a Content Problem

Transcript introduction

This transcript captures a conversation between Seth Earley and Andy Fitzgerald about the deep connection between language, information architecture, and AI - tracing Andy's journey from English literature to UX design to knowledge graphs, exploring how Schema.org and ontologies provide the contextual scaffolding that makes machines smart, and making the case for boutique-scale knowledge graphs through Andy's UXMethods.org project. The discussion covers the KFC Germany bot disaster, Gall's Law, the linguistics of embodied cognition, and why starting small with open standards is always the right move.

Transcript

Seth Earley: Good morning, good afternoon, good evening, depending upon your time zone. Welcome to our podcast. I'm Seth Earley. Normally we have Chris Featherstone, but he was not able to make it today. Our guest is a man after my own heart, with deep expertise in knowledge engineering, knowledge graphs, and structured content design. He spends his time helping mid-sized organizations get their heads and hands around their content and content management. Joining us today from Seattle, please welcome Andy Fitzgerald.

Andy Fitzgerald: Thanks so much.

Seth Earley: Andy, your degree is a PhD in English and literature. How did you get from that to information architecture and knowledge graphs?

Andy Fitzgerald: The short answer is it was a dog-legged, circuitous path - as I think most people in user experience design and information architecture come from diverse backgrounds. The path started from graduate school in English. I decided early on that while I loved the academy and loved teaching, I didn't want to spend my whole career there. My last job as a grad student was one I had no business having - basically a webmaster for a humanities site on campus. I looked at their site and quickly realized that what they needed was not web mastering but information architecture. I basically learned how to do some of what I do now on the ground there. I found a mentor who was invaluable in making sure I didn't completely screw everything up, and then transitioned into user experience design - because around 2010 there was much more need for broad UX design than specifically for information architecture.

The really serendipitous thing is that the deeper I got into user experience design, and the more I continued to grow and move into that specialization, the more the things I was doing as an English student in grad school came back around. Because it's all about making sense of how we make sense of the world with words and with language. There's nothing natural about language in the sense that it's all construction, it's all very complex. Picking those things apart takes me right back to what I was doing in grad school. Now I like it as much as I did then.

Seth Earley: Tell us how that initial web editor role at the Simpson Center for the Humanities catalyzed your structural thinking.

Andy Fitzgerald: The Simpson Center at the University of Washington had a web editor role - basically someone to update their site. Their site had been around for a good ten years, cobbled together with all the bubble gum and string you'd expect from a humanities center. They did a good job keeping it up to date, but it was a very brittle structure. I knew going into it that if I could, I would like to turn that role into a pivot point toward information architecture. I spent the first month or so learning the job and making sure I could deliver on what they hired me to do, and then I audaciously put together a project proposal and went to the associate director and said, "Hey, this is what I think we actually need to be doing." I could see the wheels turning in her head - something like, "Wow, this guy wants to do a lot more than we hired him for, and he's not asking for more money. Let's see what happens." For me, frankly, the learning opportunity was the whole point. That was my pivot into what I wanted to do.

Seth Earley: How did you start getting into knowledge graphs specifically?

Andy Fitzgerald: That's been in the last few years - maybe the last five. It's been thinking about content as data versus content as pages. Some of that came out of learning about Schema.org when it emerged as a way to basically use algorithms to help people parse and find information on the web. As we continue to get more and more data, more pages and more content, it's increasingly necessary to have help finding things. Nobody's going to sift through pages and pages of results. We need the robots to help us, and they can do that better when we've told them, in a language they can understand, what the content is really about. Schema.org was my first look into that, and from Schema.org it was a slippery slope right down the rabbit hole into RDF, thinking about knowledge graphs and ontologies, and how those things fit together with artificial intelligence and the ability to make sense of information at scale.

Seth Earley: Tell us about Schema.org specifically and how it relates to knowledge cards and what people mean when they say "make it like Google."

Andy Fitzgerald: Schema.org is a controlled vocabulary for things on the web. It's rooted in e-commerce - it began as a collaborative project backed by Google, Microsoft, and Yahoo to standardize structured markup for search. If I sell shoes online, my shoes range in sizes from six to ten and in prices from eight to thirty-five dollars. All that is information people understand perfectly well, but a robot looking at that stuff just sees numbers and text with no context for what a shoe is, what a price is, what a size is. What Schema.org does is put that information in context. You can say that a shoe is an article of clothing that comes in sizes, and relate those things together so an algorithm knows that a product marked as a shoe with a related size and color property should be treated as connected entities.

A better example: if I do a voice search on Google for "gluten-free delivery near me," it comes back with a list of restaurants near my area that deliver gluten-free food. I haven't said "food delivery," but something like Schema.org allows an algorithm to say - if he's talking about gluten-free and delivery, he's probably talking about a restaurant. It can make those commonsense inferences.

Knowledge cards are a great example of what this enables. A lot of them come out of Wikipedia and Wikidata, and the structured content from those. There's a great quote - actually the thesis of "Thinking, Fast and Slow" by Daniel Kahneman - what you see is all there is. As humans looking at the world, if we don't see an option we assume it's not there. It's certainly the case for algorithms too. If Google can look at my website and see that I'm an information architect who's been practicing for twelve or fifteen years - that information isn't automatically there. I may have printed it on a page, but when I mark it up with Schema.org, it's tagging bits of content and saying, "this is experience." Just as I can look at someone's LinkedIn profile and see "information architecture, twelve years" and understand those two things are related - an algorithm doesn't have that context naturally. Schema.org provides it, and in an AI sense, that's what allows our machines to be smart.
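A minimal sketch of the kind of markup Andy is describing, using Schema.org's Product vocabulary in JSON-LD (the product details here are hypothetical):

```python
import json

# A hypothetical Schema.org "Product" description in JSON-LD.
# The @context tells an algorithm that "name", "color", and "offers"
# are Schema.org terms with defined meanings, not arbitrary strings.
shoe = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Trail Runner",
    "color": "brown",
    "size": "10",
    "offers": {
        "@type": "Offer",
        "price": "35.00",
        "priceCurrency": "USD",
    },
}

# Embedded in a page as a script tag of type application/ld+json,
# this is what lets a crawler connect the shoe, its size, and its price
# as related entities rather than loose text.
markup = f'<script type="application/ld+json">{json.dumps(shoe)}</script>'
print(markup)
```

The same information was always visible to a human reader of the page; the markup simply restates it in a vocabulary an algorithm can interpret.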

Seth Earley: When executives say "make it like Google" for internal search, what do they need to understand?

Andy Fitzgerald: Making it like Google means understanding a little bit about what Google actually does. They crawl the web with massive servers that hit your web pages and everything connected to them, and the way they make sense of those pages is through the semantic structure of HTML. They look at the H1, the title tags. They look at and often ignore meta descriptions because we're terrible about making those truthful. They look at Schema.org, assemble all of that into a database, and return results. All of that comes back to making information legible to algorithms. For organizations, the first step is just using semantic markup correctly - making sure your headings use actual H1 tags instead of styled spans that look like headings to humans but look like nothing to an algorithm.

There are hundreds of millions of dollars spent in the SEO marketplace optimizing external content so it can be surfaced on Google. If we put even a fraction of that energy into our internal content - curating it, making sure it's on topic, well-written, and contains the signals that internal search algorithms need - it would be like Google. But we don't, and that's the gap.

Seth Earley: There was a great observation at Taxonomy Boot Camp about the miscellaneous category problem.

Andy Fitzgerald: Helen Lippell from the taxonomists panel had a response I really loved. Someone in the audience asked how do you keep people from putting everything in the miscellaneous category - and her answer was: if you have a lot of things in the miscellaneous category, your domain model is probably wrong. You probably haven't created a structure that allows people to put things in those actual categories.

That makes me think of work I've done for city government intranets - called in to fix things because people can't find anything, and they waste hours standing up and shouting over cubicle walls: "Hey, where do I find this thing?" The bane of intranets everywhere. A lot of it is because people are allowed to self-organize and it becomes a thicket of unfindable information. There isn't a taxonomy or a model for organizing things. So when taxonomy and schema are left until later or not thought of at all, you end up applying natural language processing to fix it - which can fill some gaps, but it's a really hard and expensive way to do it.

Seth Earley: It's a sledgehammer. You're making up for past sins.

Andy Fitzgerald: Exactly. And we've seen some examples of what happens when automation runs without proper structure and human oversight. The KFC Germany incident is one. KFC ran a promotion in Germany that coincided with a solemn national commemoration. Their automated system sent a push notification suggesting people celebrate with crispy chicken. The internet went crazy. KFC immediately issued an apology - it was a bot, something was wired up and went out that never should have. Wildly embarrassing. That's the automation problem - when you've got hundreds of thousands of documents, some auto-tagging is probably a good idea, but there always has to be a human element, what you'd call "human in the loop," to verify that you don't recommend crispy chicken sandwiches for a cataclysmic historical event.

Seth Earley: So define knowledge graphs for the audience, and then connect it to how they help with content and AI.

Andy Fitzgerald: The shortest way I can put it: a knowledge graph is data in context.

If I have a data set - let's say an inventory of articles of clothing - that data on its own doesn't have the context that even something like Schema.org can provide. Structurally, your data is probably in a relational database, which is what's often called a closed world system - if something isn't in the database, it comes back as a false value. In ontologies and graph databases, there's the concept of the open world assumption: absence of evidence is not evidence of absence.

What a knowledge graph consists of is a data set like your inventory, plus a taxonomy or ontology that defines the rules - if it's a shoe, it has to have a size and a color. That lets you do two things: validate information going in (you can't enter a new product as a shoe without providing size and color), and when publishing, use those relationships to query in ways you didn't anticipate ("show me all the brown shoes in size ten in inventory").

I have a client right now building structured content solutions for point-of-care microlearning in healthcare - specific information for nurses specific to their unit. By being granular about how we describe the elements in that content and relating them together, I can now look at the data set as a whole and say: I have a new client who is a NICU Level Three. I want to query all hundred other clients who are NICU Level Three and see what content all of them have, ninety percent have, eighty percent have. Now I can go to my new client and say: we're very confident you'll want this, somewhat confident about this, you might also want this. It's a way to mine our content set as data.

The graph part comes from that open world assumption - the idea that there can always be new information. So we can layer in traffic data. Show me not only the resources that everyone has, but of the units most like this new one coming in, which resources have the highest traffic? With relational databases you could do some joins and probably make it work, but it's work. With a knowledge graph and open standards like the OWL language, if entity X in my analytics data set is the same thing as entity Q in my records database, you can create that entire combined data set with a graph query. It's a way of using data in multiple ways by describing it not as pages but as data elements with relationships.

Andy Fitzgerald: I should also address the AI implication directly. There's a set of thinking that has profoundly influenced how I practice information architecture - cognitive linguistics. The idea that our understanding of the world is based in metaphor. We understand new things based on old things we understand. Thinkers like George Lakoff and Mark Johnson have walked that all the way back to embodiment - we understand the world because we have bodies. Lakoff talks about kinesthetic image schemas as the first things we understand that everything else builds on. I have a body, so I understand inside and outside. I understand part and whole. Everything comes from those beginnings.

For AI: until it has a body, it's missing that foundation. It's missing that context. When we can provide it - when we provide ontologies, schema, knowledge graphs - we give it the context that we forget we ever generated ourselves, the context that comes from being in the world physically. That's what we're shortcutting when we build these structures well.

Seth Earley: And when organizations try to build AI applications without that knowledge architecture, what do you see?

Andy Fitzgerald: We see the Microsoft Tay chatbot - within a day on the internet it became deeply problematic because there was no structured knowledge constraining what it could learn and repeat. We wildly underestimate how complex thinking is and how complex linguistic processes are, because they're so easy to us. Of course we feel like, "I'm good at coding, I can code that." Some of the more successful AI applications are the ones that communicate a better awareness of their limitations. Siri was launched to be very personal and human-like, but when you hit just enough complexity - "remind me in an hour and a half" doing something completely unexpected - it seems stupid because it promised you it was smarter. More rudimentary-seeming tools that failed gracefully were actually better experiences, because you didn't have mismatched expectations.

AI at the end of the day is a content problem. If you've got hardcoded rules, no ontology, no NLP, just pattern matching - you're completely off the mark. You're missing an apostrophe and the whole thing breaks.

Seth Earley: Tell us about your boutique knowledge graph experiment - UXMethods.org.

Andy Fitzgerald: UXMethods.org is an experiment of mine that brings together everything we've been discussing - structured content, knowledge modeling, ontologies, knowledge graphs - at a small scale. You might wonder why small scale, since knowledge graphs and ontologies are usually applied to very large data sets. Most ontology applications I've seen examples of are in biotech - tracking proteins, tracking drug interactions, very complex huge data sets that need tons of computing power. But the open world assumption and connected content and linked data bring advantages that smaller organizations can benefit from too.

The motivation came from teaching information architecture to students in a UX design certificate course at the School of Visual Concepts in Seattle. I'd teach them card sorting, tree testing, navigation design. They got all the individual concepts, understood the techniques, knew how to use the tools. But what consistently stumped them was: how do I string these things together? How do they relate?

UXMethods.org provides an overview of the methods of user experience design - of which there is no shortage on the internet, so that's not the problem I'm solving. What the knowledge graph does is connect the different methods. The ontology is very basic - it's all about inputs and outputs. Card sorting creates insight into mental models and user preferences. Other methods like category design, taxonomy design, and wireframing use mental models as inputs. Card sorting itself takes inputs generated by user interviews, task analysis, and stakeholder research. An experienced UX practitioner knows all of this. Someone new to the field doesn't. What UXMethods does is capture each method with its steps and the data about its inputs and outputs, and then the graph mines the data - calculating which other methods feed the greatest number of inputs into card sorting, and which provide the greatest number of outputs used by downstream methods.

You could do this with front-end templates, but as you continue to add content the calculations get more complex. By building it as a knowledge graph, if I want to weight these algorithms or bring in Google Analytics data to rank methods by traffic, they can start to blend because of that open world assumption.

Seth Earley: Is the juice worth the squeeze for a boutique-scale knowledge graph?

Andy Fitzgerald: So far, yes. And part of what I was testing was: is it cost-effective to create a small model? There are a couple of nuggets of wisdom that inform the answer.

One is from Jim Hendler, who co-authored with Tim Berners-Lee the 2001 semantic web paper - he likes to say that a little semantics goes a long way. The other is from John Gall, who wrote a funny little book in the seventies called Systemantics. He articulates what we now call Gall's Law: a complex system that works will invariably be found to have evolved from a simple system that works. That simple system that works is what we should be building toward.

There's also a beautiful thing about graph databases and the open world assumption: with relational databases, if you create something small and keep adding to it, you eventually get a mess - more tables, more joins, more brittleness. With a graph database, you create something small, add to it, add to it more, and you still just have a graph database. It grows gracefully.

Starting small also means you're working with open standards - OWL, RDFS, SPARQL. UXMethods uses a tiny bit of OWL and some SKOS. If I need to grow that and bring in enterprise-grade tools like PoolParty, they use the same standards I'm using now. If I need to move it onto Amazon Neptune, as long as it's standards-based, it can grow. Starting small isn't a budget compromise - it's the strategically correct approach. You get a system that works and provides value, and you build from there.

Seth Earley: I completely agree. Think of a knowledge architecture as a focused subset of the broader information architecture, matching the use case. When people build enormous, heavyweight ontologies trying to model everything, you start seeing relationships that are irrelevant to the use case, and then you're lost in the complexity. Some simplicity is hidden complexity - don't start with too much complexity up front.

Andy Fitzgerald: There's a useful distinction between simplicity and clarity from information architect Abby Covert. The iPhone is very complex, but it's clear. Simplifying means removing functionality - removing the ability to do something. Clarifying, as Apple has done beautifully for consumer electronics, means you can have that complexity without burdening people with instructions for resetting the clock on a VCR. Clarity is the goal, not simplicity.

Seth Earley: Any final advice for people building information structures and knowledge systems?

Andy Fitzgerald: This isn't directly a plug for people like you and me who do this kind of work, but I'll pass on some wisdom I've observed from people who've tried to put together knowledge graphs, complex systems, or even just taxonomies: hire people who know what they're talking about. I've seen a lot of taxonomy projects where someone who was interested in doing it gave it a try, ended up spinning and frustrating a lot of people, and eventually someone with real expertise had to be brought in anyway.

Seth Earley: I'd add a corollary: bring someone in who knows what they're talking about and can explain it to you in plain language. I've met many people who call themselves ontologists, information architects, or taxonomists, and they speak in very abstract terms that are very difficult to understand. It sounds like they're smart. Sometimes it's not because they're smart - it's because they're confusing. The right expertise explains complexity clearly. Thank you so much for your time today, Andy. It's always a pleasure.

Andy Fitzgerald: Thank you.

 

Meet the Author
Earley Information Science Team

We're passionate about managing data, content, and organizational knowledge. For 25 years, we've supported business outcomes by making information findable, usable, and valuable.