Expert Insights | Earley Information Science

Ten Common Mistakes Companies Make When Developing a Taxonomy

Written by Seth Earley | Jul 22, 2012 4:00:00 AM

Taxonomy development might seem like a straightforward case of setting up categories for products or services, but it is more complex than it appears. Decisions about taxonomy design and development require an understanding of how the taxonomy will be applied to the user experience (UX), to content management and content classification, and to enterprise search, which relies on artificial intelligence techniques such as machine learning and natural language processing.

Most enterprises require multiple taxonomies, and when those taxonomies are mapped together the result is an ontology. Many of the subject matter experts involved in taxonomy design and development decisions understand their area of expertise but are not fluent in the methods for developing a taxonomy.

Frequently, the person tasked with taxonomy development responsibilities becomes an “accidental taxonomist” who ends up learning on the job.

How Are Taxonomies Used in the Enterprise?

Organizational taxonomies need to serve different purposes and competing interests, and sometimes the resulting taxonomy can become a compromise. Different types of taxonomies are used in different ways. For example, a knowledge sharing or content management system requires a different taxonomy structure from that used in an ecommerce application (though there will likely be shared elements). Content sources need to be analyzed and an information architecture, informed by the taxonomy, designed. If there is an existing taxonomy, the new terms will need to be integrated as appropriate. The original taxonomy used in a content environment may have been developed over a period of years and grown organically, so it may need to be reassessed.

Taxonomy creation can borrow from that existing structure, reusing common taxonomy terms if they are still relevant. Much of the work involves paring those terms down to only those needed for content tagging with metadata and other information architecture constructs. The taxonomy project should use analytics to understand which terms are used most (this can come from search analytics as well as content analytics). If one of the reasons for developing a taxonomy is to power a learning management system, educational objectives become one of the inputs. UX research is critical to any taxonomy project.
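
The analytics step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the term list, the search log, and the `term_usage` helper are hypothetical, and a real project would read queries from the search platform's own logs and would likely need better matching than a simple substring test.

```python
from collections import Counter

# Hypothetical inputs: terms in the existing taxonomy and a sample of search queries.
EXISTING_TERMS = ["Proposal", "Contract", "Press Release", "White Paper"]
SEARCH_LOG = [
    "contract template",
    "proposal",
    "Contract renewal",
    "proposal example",
    "contract",
    "pricing sheet",
]


def term_usage(terms, queries):
    """Count how many queries mention each term (case-insensitive substring match)."""
    counts = Counter({term: 0 for term in terms})
    for query in queries:
        lowered = query.lower()
        for term in terms:
            if term.lower() in lowered:
                counts[term] += 1
    return counts


usage = term_usage(EXISTING_TERMS, SEARCH_LOG)
# Terms nobody searched for are candidates for review, not automatic removal.
unused = [term for term, n in usage.items() if n == 0]
```

Queries that match no term at all (here, "pricing sheet") are just as informative, since they may point to concepts the existing taxonomy is missing.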

10 Common Mistakes Companies Make When Developing a Taxonomy

During the course of our consulting engagements over the years, we have seen businesses make all kinds of mistakes when developing taxonomies. We see these common errors whether the taxonomy is to be used for content management, document management, or search development. In some cases, taxonomies are used as master data and reference data for ERP systems, and the same gotchas apply. Here are the top 10 mistakes we see among companies that are seeking to develop taxonomies.

  1. Mistaking taxonomy for navigation.
  2. Trying to use an “out of the box” or pre-built taxonomy.
  3. Creating an overly granular taxonomy.
  4. Not maintaining the taxonomy.
  5. Not matching personnel skills to requirements.
  6. Incorrect implementation.
  7. Improper technology.
  8. Inadequate tagging of legacy content.
  9. Lack of tagging compliance for new content.
  10. Incorrect auto-tagging configuration.

1. Mistaking taxonomy for navigation.

For some time, many design firms and information architects referred to navigational structures as “taxonomy.” Navigation is indeed one possible application of taxonomy, and navigational hierarchies do exist. But taxonomy is not the same as navigation. We need to consider classification in addition to navigation. We can then create multiple navigational structures from a single classification mechanism that contains multiple “facets” (or trees). An effective enterprise taxonomy can include different taxonomies for different audiences.
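
As a minimal sketch of that idea, assume each content item is classified once against a few facets. The facet names, the sample items, and the `build_navigation` helper are all hypothetical, not taken from any particular product. Different navigational views can then be derived from the same classification by grouping on whichever facets suit a given audience.

```python
# Hypothetical items, each classified once against several facets.
ITEMS = [
    {"title": "Router setup guide", "product": "Networking",
     "content_type": "Guide", "audience": "Customer"},
    {"title": "Router sales deck", "product": "Networking",
     "content_type": "Presentation", "audience": "Sales"},
    {"title": "Laptop warranty FAQ", "product": "Computers",
     "content_type": "FAQ", "audience": "Customer"},
]


def build_navigation(items, facet_order):
    """Group item titles into a nested dict, one level per facet in facet_order."""
    tree = {}
    for item in items:
        node = tree
        # Walk (or create) one branch per facet, except the last.
        for facet in facet_order[:-1]:
            node = node.setdefault(item[facet], {})
        # The last facet holds the list of matching titles.
        node.setdefault(item[facet_order[-1]], []).append(item["title"])
    return tree


# Two navigation structures derived from the same classification.
by_product = build_navigation(ITEMS, ["product", "content_type"])
by_audience = build_navigation(ITEMS, ["audience", "product"])
```

The items are tagged only once; the navigation is a view over those tags, so adding a third menu structure means choosing a new facet order rather than reclassifying content.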

2. Trying to use an “out of the box” or pre-built taxonomy.

Standards can sometimes be leveraged for creating a taxonomy, but starting with a generic industry vocabulary or a very large term set can lead to additional work and take the project down the wrong path. Taxonomies must be appropriate to the use case, and UX research is a critical part of the process. Using another organization’s taxonomy does not provide any competitive differentiators and may not meet the needs of users or support scenarios. The best taxonomy for an organization is one that is developed in-house, with a deep understanding of what the organization is offering and what its customers want.

3. Creating an overly granular taxonomy.

Many efforts lead to taxonomies that are at too fine a level of detail to be practical. Taxonomy design can make or break a customer experience. There always has to be a reason for a term. That is where use cases come in. If you cannot identify the specific reason for the term, it should not be allowed in the taxonomy. Take SOW and Proposal, for example: are they truly different, would they need to be accessed separately, and is there a use case? If not, one of them can still live in a search thesaurus as a non-preferred term, where it helps people find content but does not crowd dropdown lists and menu choices.
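
The thesaurus approach can be illustrated with a small lookup. This sketch assumes the use-case review concluded that SOW does not need to be accessed separately from Proposal; the term lists and the `resolve_term` helper are hypothetical.

```python
# Preferred terms are the only ones shown in dropdown lists and menus.
PREFERRED_TERMS = {"Proposal", "Contract", "Invoice"}

# Hypothetical thesaurus: non-preferred (entry) terms point to a preferred term.
NON_PREFERRED = {
    "sow": "Proposal",
    "statement of work": "Proposal",
    "agreement": "Contract",
}


def resolve_term(user_term):
    """Return the preferred taxonomy term for a user's term, or None if unknown."""
    key = user_term.strip().lower()
    for term in PREFERRED_TERMS:
        if term.lower() == key:
            return term
    return NON_PREFERRED.get(key)
```

A search for "SOW" would then be resolved to "Proposal" behind the scenes, while the tagging interface continues to offer only the three preferred terms.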

4. Not maintaining the taxonomy.

A taxonomy is a living, changing entity. Some organizations spend adequate time on development but then do not have the resources and processes in place to maintain the taxonomy after it is deployed. Taxonomy maintenance requires that consistent naming conventions be applied and observed, and that there is a process for error detection and correction. Multiple taxonomies are likely to exist in an enterprise, and they may serve uses with competing interests and differing requirements. The right professionals need to be brought into the process to ensure that different perspectives are accounted for. But match the review to the change. Small changes need little review, while more impactful decisions require deeper analysis and more buy-in. Changing the product hierarchies would be an example of a major change.

5. Not matching personnel skills to requirements.

The skills of the taxonomist are specialized, and failing to address all the requirements can lead to an ineffective taxonomy. For example, some projects have employed librarians who were not well versed in the practical application of taxonomy. The end result was a taxonomy that was “correct” in theory, but not practical for the business. Taxonomies have to solve real problems based on use cases. Use cases need to be verifiable: can the task be completed, and does findability or navigation facilitate or impede the process? Some of the analytical aspects of developing taxonomies extend well beyond the activities associated with traditional librarians, or even with those trained in the classification aspect of taxonomy, such as botanists. The skills needed could include familiarity with metadata, user testing, and auto-tagging software, as well as the ability to reconcile classification issues among different stakeholders.

6. Incorrect implementation.

Some organizations have handed off the taxonomy to the IT team without a close partnership with the developer of the taxonomy. That has led to “bolting” the taxonomy on to the software application as a navigational structure as opposed to integrating it into the core architecture for classification and metadata models.

7. Improper technology.

The correct technology needs to be available to leverage taxonomies, especially the so-called “associative” relationships from a thesaurus structure. Ontology for ecommerce goes beyond taxonomy. Ontology management tools such as those from Semantic Web Company (PoolParty Semantic Suite), data.world, and Stardog, as well as open-source tools like Neo4j, can be part of the taxonomy management and application ecosystem.
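
To make the distinction concrete, here is a minimal sketch of a thesaurus-style model that stores associative links alongside hierarchical ones. The `Concept` class, the sample concepts, and the `see_also` helper are hypothetical and do not reflect the data model of any of the tools named above.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A thesaurus concept with hierarchical and associative links."""
    label: str
    broader: list = field(default_factory=list)  # hierarchical parents
    related: list = field(default_factory=list)  # associative ("see also") links


CONCEPTS = {
    "Laptops": Concept("Laptops", broader=["Computers"],
                       related=["Laptop Bags", "Docking Stations"]),
    "Laptop Bags": Concept("Laptop Bags", broader=["Accessories"],
                           related=["Laptops"]),
    "Docking Stations": Concept("Docking Stations", broader=["Accessories"],
                                related=["Laptops"]),
}


def see_also(label, concepts=CONCEPTS):
    """Return associatively related concepts for a label, or an empty list."""
    concept = concepts.get(label)
    return list(concept.related) if concept else []
```

A system that stores only the parent-child hierarchy has nowhere to record that laptops and laptop bags belong together, which is why the technology has to support the associative link explicitly.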

8. Inadequate tagging of legacy content.

It is tempting to take a “going forward” approach and apply the taxonomy only to new content. However, if the taxonomy is not applied to content that already exists in the system, the benefits of the taxonomy will not be fully realized. Tagging can be facilitated with machine learning approaches to autoclassification. Legacy content needs to be curated and cleansed as part of the optimization process.

9. Lack of tagging compliance for new content.

Especially when a taxonomy is first introduced, culture change is required. New content needs to be correctly tagged by content curators. Some workers may be resistant to adding a new task to their job, while others may not feel confident about how to do it. A quality metric should be used to ensure that tags are appropriately applied. For hints on encouraging compliance, please see: How to get people to tag documents.
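
One simple form such a quality metric could take is the share of documents whose required metadata fields all contain valid taxonomy terms. The sketch below assumes a hypothetical vocabulary and document structure; it measures completeness and validity only, so whether a valid tag is also the right tag still requires human review of a sample.

```python
# Hypothetical controlled vocabularies for two required metadata fields.
VOCABULARY = {
    "content_type": {"Proposal", "Contract", "Guide"},
    "product": {"Networking", "Computers"},
}


def tagging_compliance(documents, vocabulary):
    """Return the fraction of documents whose required fields all hold valid terms."""
    if not documents:
        return 0.0
    compliant = 0
    for doc in documents:
        tags = doc.get("tags", {})
        # A missing field yields None, which is never in the vocabulary.
        if all(tags.get(name) in terms for name, terms in vocabulary.items()):
            compliant += 1
    return compliant / len(documents)


docs = [
    {"id": 1, "tags": {"content_type": "Guide", "product": "Networking"}},
    {"id": 2, "tags": {"content_type": "Guide"}},  # product is missing
    {"id": 3, "tags": {"content_type": "Memo", "product": "Computers"}},  # invalid term
    {"id": 4, "tags": {"content_type": "Contract", "product": "Computers"}},
]
score = tagging_compliance(docs, VOCABULARY)
```

Tracking this figure over time, per team or per repository, gives an early signal of where training or a simpler tagging form might be needed.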

10. Incorrect auto-tagging configuration.

Auto-tagging can be a boon when large volumes of new information are coming in, or when repositories of legacy information are being added. However, if the auto-classifiers, whether rules-based or statistical, are incorrectly configured, they will produce poor results. Considerable thought should be put into establishing the rules. Training and testing of the system are required in order to have a successful auto-classifier.
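
The sketch below shows one way a rules-based tagger might be tested against a small hand-labeled sample. The rules, the sample texts, and the helper names are hypothetical, and commercial auto-classifiers expose their own configuration and evaluation features; the point is only that precision and recall make configuration problems visible.

```python
import re

# Hypothetical rules: a tag is applied when any of its patterns matches the text.
RULES = {
    "Proposal": [r"\bproposal\b", r"\bstatement of work\b"],
    "Invoice": [r"\binvoice\b", r"\bamount due\b"],
}


def auto_tag(text, rules=RULES):
    """Return the set of tags whose patterns match the text (case-insensitive)."""
    return {
        tag
        for tag, patterns in rules.items()
        if any(re.search(p, text, re.IGNORECASE) for p in patterns)
    }


def precision_recall(samples, rules=RULES):
    """Compare predicted tags with human-assigned tags over a labeled test set."""
    true_pos = false_pos = false_neg = 0
    for text, expected in samples:
        predicted = auto_tag(text, rules)
        true_pos += len(predicted & expected)
        false_pos += len(predicted - expected)
        false_neg += len(expected - predicted)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Small hand-labeled test set (hypothetical).
TEST_SET = [
    ("Attached is our proposal for the redesign.", {"Proposal"}),
    ("Invoice 1042: amount due in 30 days.", {"Invoice"}),
    ("Please review the invoice template in the proposal.", {"Proposal"}),
    ("Signed SOW for phase two.", {"Proposal"}),
]
precision, recall = precision_recall(TEST_SET)
```

In this sample the third text is over-tagged because it merely mentions an invoice, and the fourth is missed because the rules do not know the abbreviation "SOW". Both are the kind of configuration gap that testing is meant to expose before the tagger is run against a full repository.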

In conclusion

We minimize the risks of these ten mistakes by adhering to the correct methodology while also applying judgment and experience to the process. This means not following the process blindly, but taking exceptions into consideration and validating assumptions and choices with user experience data. An effective taxonomy program avoids these errors and validates results as it progresses. Ensuring that these issues are accounted for will enable the organization to get real value from taxonomy development projects.
