Appendix III



What is an ontology?

Christopher A. Welty


Ontology is a field of Philosophy concerned with the "nature of existence." Its practice dates unequivocally to Aristotle; the name itself, however, is only about three centuries old. Philosophers in Ontology ask and answer questions like: "What is an apple?" "How do I know it is an apple and not a peach?" "What is a hole? Is a hole made of matter?" As with any field, Ontology escapes a simple definition (such as the one above or the Computer Science one below). Rather, it comprises a set of standard problems, a set of standard analysis techniques, a jargon, a community, and so on. This is true of all disciplines. What is Physics? What is Chemistry? Is "cooking" chemistry?

In Computer Science, "an ontology" is usually defined as "a specification of a conceptualization." One might reasonably ask what that means, and I will describe it in more detail below, but I should point out that the growing popularity of the term has, of course, diluted its original intent. It is coming to mean any specification that resulted from careful analysis of a problem. People in artificial intelligence, software engineering, and databases who have been "modeling" domains for software or database systems for 10-20 years tend to take this approach, calling what they have been doing all along "ontology" without really knowing what Ontology (the field of Philosophy) actually is. This is unfortunate, because philosophers have already solved many of the problems that computer scientists are struggling with today - which was one of the main motivations for those of us who tried to get Ontology adopted in Computer Science. The attempt was neither a complete success nor a complete failure.

That said, what is a conceptualization and what is a specification?

For every term in our language, humans ascribe some meaning. We could go on forever discussing how that meaning is ascribed; it may include images, feelings, events, other words, sounds, etc. This meaning that we ascribe to a set of terms is our conceptualization. Most importantly, conceptualizations:

  1. are hidden, and
  2. differ from person to person.

Within a given community this person-to-person difference may be smaller than between people in different communities. For example, if I say "a fast horse," everyone will probably think s/he knows what I'm talking about. But an expert in horses would probably ask me to be more specific; after all, a fast thoroughbred horse (a racehorse) is different from a fast draft horse (the kind that pulls the Budweiser wagon).

This "being more specific" is a specification. An ontology, loosely defined, is anything that helps make our conceptualizations less ambiguous - that helps us close the gaps between our conceptualizations, or at least make the gaps clear. While philosophers have been talking about this for hundreds of years, computer systems have made the problem more pressing, since they encode the conceptualization of one person or group, and are typically very brittle both in making clear what conceptualization has been encoded and in tolerating other conceptualizations.

For example, in doing requirements analysis (1) for a software system to be used by a government office, one of the managers, full of rhetoric and enthusiasm, made this simple statement, "We want the system to be accessible to any person."

Now, doesn't that sound simple? Don't you think you understand what he means?

Part of our analysis technique involves jumping on statements like this. "OK," we asked, "How are these people to be identified to the system?"

"By social security number, of course."

"What if they don't have a social security number?"

"Well, I don't know." Discussion. "Then they don't pay taxes."

"So the system will not be accessible to anyone, only people with social security numbers?"

"...who pay taxes."

It got worse, but the point is that even the simplest of statements is the tip of an iceberg - a small glimpse of an underlying conceptualization. An ontology should expose more of the iceberg before you sink.

A lexicon is the simplest form of system that could be called an ontology. It is just a set of words, with no particular structure, and their meanings.

A taxonomy is a very simple ontology, one step better than a lexicon in that it provides some structure in the form of links to more specific and more general terms. For "horse" you might have, more general: ungulate; more specific: thoroughbred, draft. We are accustomed to taxonomies in biology, but the taxonomy is a generic tool: "person who pays taxes" is more specific than "person."

A thesaurus is a very specific thing in computer science - there is an ANSI standard for electronic thesauri. It is also a very simplistic form of ontology, one step better than a taxonomy in that it adds a "related-to" link. For "horse" you might have the general and specific links mentioned above, plus "related-to" links to racing and pony.
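The progression from lexicon to taxonomy to thesaurus can be sketched as plain data structures. A minimal illustration in Python, using the horse terms from the examples above (the glosses in the lexicon are paraphrases, not definitions from any standard):

```python
# A lexicon: just a set of terms and their meanings, no structure.
lexicon = {
    "horse": "a large four-legged ungulate",
    "thoroughbred": "a breed of horse used for racing",
    "draft": "a breed of horse used for pulling heavy loads",
}

# A taxonomy adds links to more general and more specific terms.
taxonomy = {
    "horse": {"more_general": ["ungulate"],
              "more_specific": ["thoroughbred", "draft"]},
}

# A thesaurus adds one more kind of link: "related-to".
thesaurus = {
    "horse": {"more_general": ["ungulate"],
              "more_specific": ["thoroughbred", "draft"],
              "related_to": ["racing", "pony"]},
}

print(thesaurus["horse"]["related_to"])  # ['racing', 'pony']
```

Each level keeps everything the previous level had and adds one more kind of structure - which is exactly the progression described above.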

Relational databases, in particular the EER (extended entity-relationship) model, provide the next level of complexity in an ontology, because you are no longer limited to only three types of links between terms. For example, I might also want to record that each horse has a mother and a father, both of which must be horses.
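A relational-database version of the horse example might look like the following sketch, using Python's built-in SQLite bindings. The mother and father columns refer back to the horse table itself, enforcing the constraint that both parents must be horses (the pedigree facts come from the Secretariat example discussed later in this appendix):

```python
import sqlite3

# In-memory database; the MOTHER and FATHER columns must themselves
# refer to rows in the horse table (self-referencing foreign keys).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE horse (
        name   TEXT PRIMARY KEY,
        mother TEXT REFERENCES horse(name),
        father TEXT REFERENCES horse(name)
    )
""")
conn.execute("INSERT INTO horse VALUES ('Somethingroyal', NULL, NULL)")
conn.execute("INSERT INTO horse VALUES ('Bold Ruler', NULL, NULL)")
conn.execute("INSERT INTO horse VALUES "
             "('Secretariat', 'Somethingroyal', 'Bold Ruler')")

row = conn.execute(
    "SELECT mother, father FROM horse WHERE name = 'Secretariat'"
).fetchone()
print(row)  # ('Somethingroyal', 'Bold Ruler')
```

The table definition plays the role of the ontology; the inserted rows play the role of the data.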

The step from thesauri to relational databases raises another important point. For the first three levels of ontology, we were talking only about terms, not about real things. One can talk about the meaning of "horse," but one cannot point to "horse." "Horse" is an abstract concept, a grouping of characteristics that we, as a result of our sensory apparatus, conveniently assign to a particular set of things in the real world. I can point to Secretariat (a famous racehorse who won the Triple Crown) and say "he is a horse," because he is real. Database systems are the most primitive modeling tools that provide for this distinction between abstract concepts and concrete things. Normally, the set of concrete things in a database ("the data") is fundamentally different from the set of descriptions of things in the database ("the model"). We do not usually think of an ontology as including the data, but this distinction is not strict. For example:

"A horse is a mammal, and has a mother and a father that are horses." (Part of the ontology.) "Secretariat is a horse; his mother was Somethingroyal and his father was Bold Ruler." (Part of the data, NOT the ontology.)

We might denote the former as:

HORSE:
    TYPE: MAMMAL
    MOTHER: HORSE
    FATHER: HORSE

And the latter as:

Secretariat:
    IS-A: HORSE
    MOTHER: Somethingroyal
    FATHER: Bold Ruler

So, again, Secretariat, Bold Ruler, and Somethingroyal are not part of the ontology. HORSE, MAMMAL, FATHER, and MOTHER are.
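The same model/data distinction shows up in object-oriented programming: classes play the role of the ontology's abstract concepts, while instances play the role of the data. A rough Python analogy:

```python
# The model (part of the ontology): abstract concepts.
class Mammal:
    pass

class Horse(Mammal):
    def __init__(self, name, mother=None, father=None):
        self.name = name
        self.mother = mother
        self.father = father

# The data (NOT part of the ontology): concrete individuals.
somethingroyal = Horse("Somethingroyal")
bold_ruler = Horse("Bold Ruler")
secretariat = Horse("Secretariat",
                    mother=somethingroyal, father=bold_ruler)

print(isinstance(secretariat, Horse))   # True
print(isinstance(secretariat, Mammal))  # True
```

The classes Horse and Mammal correspond to HORSE and MAMMAL above; the three instances correspond to Secretariat, Somethingroyal, and Bold Ruler.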

To show that what we traditionally call "data" can sometimes be required in an ontology, take this classic example:

COUNTRY:
    TYPE: GEOGRAPHIC-REGION

PERSON:
    TYPE: MAMMAL
    BORN-IN: COUNTRY

Italy:
    IS-A: COUNTRY

So COUNTRY, PERSON, GEOGRAPHIC-REGION, MAMMAL, and BORN-IN are part of the ontology (TYPE and IS-A are part of the underlying representation). Italy is data. However, suppose we want:

ITALIAN:
    TYPE: PERSON
    BORN-IN: Italy

In this case ITALIAN is clearly part of the ontology, but it requires a piece of data for its definition. This is a subtle issue; the point is that the distinction between what is and isn't in the ontology is not always sharp.
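The way a definition like ITALIAN mixes ontology and data can be sketched in code. In this Python sketch the people and their names are purely illustrative; the point is that the test for the concept ITALIAN necessarily mentions the data item Italy:

```python
# A piece of data that a concept definition depends on.
italy = "Italy"

# More data: concrete individuals (hypothetical names).
people = [
    {"name": "Sofia", "born_in": "Italy"},
    {"name": "Pierre", "born_in": "France"},
]

def is_italian(person):
    # The definition of the concept ITALIAN mixes ontology
    # (PERSON, BORN-IN) with a specific piece of data (Italy).
    return person["born_in"] == italy

print([p["name"] for p in people if is_italian(p)])  # ['Sofia']
```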

More complex ontologies are those that include more expressive ways of describing the world. If I want to say "a grandfather is the father of a parent," this is something I cannot define declaratively in a database system. It requires something like first-order logic, which has the ability to represent various forms of deduction.

Logic is one of the simplest ways to represent expressive ontologies, because in general it is very brittle and provides only very strict interpretations. I cannot say "the wife of a father is USUALLY a mother" in a purely logic-based system; for that we must turn to implemented systems that support ontologies containing "uncertain" knowledge. These systems are probably the most expressive.

From a practical perspective, the reason all implemented systems don't use the most expressive means possible is that very expressive descriptions require a lot of computation. For example, suppose I managed to get all the census data into a computer, and let's say that for each person I had also recorded their mother and father. That's 260,000,000 people. Now let's say I defined grandfather as above and asked the computer to compute all the grandfathers in the U.S.

I would not get an answer right away.

The more expressive a system is for supporting specifications, the more time it takes that system to perform operations. This is a fundamental tradeoff.




Also by Christopher A. Welty

http://www.cs.vassar.edu/faculty/welty/papers/

Guarino, Nicola and Chris Welty. "Identity, Unity, and Individuality: Towards a formal toolkit for ontological analysis." To appear in, Horn, W. ed., Proceedings of ECAI-2000: The European Conference on Artificial Intelligence. IOS Press. August, 2000.

Welty, Chris and Jessica Jenkins. "An Ontology for Subject." J. Knowledge and Data Engineering. 31(2):155-182. September, 1999. Elsevier.

Welty, Chris and Nancy Ide. "Using the right tools: enhancing retrieval from marked-up documents." J. Computers and the Humanities. 33(10):59-84. April, 1999. Kluwer.

Welty, Chris. "Toward an Ontology for Library Modalities." In S. Ali, ed., Proceedings of the AAAI-98 Workshop on Representations for Multi-modal Human-Computer Interaction.

Welty, Chris. "The Ontological Nature of Subject Taxonomies." In N. Guarino, ed., Formal Ontology in Information Systems. IOS Press Frontiers in AI Applications Series. Trento, Italy. June, 1998.




For more on ontologies:

"Formal Ontology and Conceptual Analysis: A Structured Bibliography" at http://www.ladseb.pd.cnr.it/infor/ontology/Papers/Ontobiblio/TOC.html

1. Requirements analysis is when you talk to the users and try to understand what they think they want, and what they really want, modeled on what is possible.