DON'T PANIC

Wednesday, June 7, 2017

Ontology Engineering

1. When to Use an Ontology
- Knowledge management
1) controlled vocabulary
2) making domain assumptions more explicit
3) separate the metadata structure from the data itself
4) change in metadata does not necessarily require change in the data

- Knowledge sharing
1) a clear model of your data enables other machines and people to understand it, and thus to reuse it

- Knowledge integration
1) ontologies can bridge between several data sources

- Knowledge analysis
1) using a rich data model enables more complex analysis to be made on the data

- Types of Ontologies
1) Representation ontologies
* describe low-level primitive representations
e.g. OWL, RDF, RDF Schema (language itself)

2) General or upper-level ontologies
* describe high-level, abstract, concepts
e.g. Cyc (Commonsense ontology)

3) Domain ontologies
* describe a particular domain extensively
e.g. GO (Gene Ontology), CIDOC CRM (for cultural heritage)

4) Application ontologies
* designed mainly to meet the needs of a particular application
e.g. FOAF (Friend Of A Friend)

https://en.wikipedia.org/wiki/Ontology_engineering

http://owl.cs.manchester.ac.uk/about/ontology-engineering/

2. Ontology Building Methodologies
: there is no standard methodology for ontology construction
- Ontology Development Life-Cycle
1) Specification: specifying the ontology's purpose and scope
* why are you building this ontology?
* what will this ontology be used for?
* what is the domain of interest?
* how much detail do you need?
* Competency Questions: what are some questions you need the ontology to answer?

2) Conceptualisation: identifying the concepts to include in your ontology, and how they relate to each other
* what can you reuse?
* ontologies are meant to be reusable!

3) Formalisation: moving from a list of concepts to a formal model
* define the hierarchy of concepts and relations: top-down, bottom-up, or middle-out
* class or relation?: if a would-be subclass needs no new relations (or restrictions) of its own, consider modelling the distinction as a relation (property value) instead
* class or instance?: if it can have its own instances or subclasses, it should be a class

4) Implementation: choosing a language and implementing it with an ontology editor

5) Evaluation: implementing the ontology in an ontology editor helps to get syntax correct
* check the ontology against your competency questions
* W3C RDF validator: http://www.w3.org/RDF/Validator/

6) Documentation: documenting the design and implementation rationale is crucial for future usability and understanding of the ontology

http://wiki.opensemanticframework.org/index.php/Ontology_Development_Methodologies

3. Conceptual modelling
- Conceptualisation (1): Vocabulary
1) what are the terms we would like to talk about?
2) what properties do those terms have?
3) competency questions provide a useful starting point
4) investigate homonyms and synonyms

- Conceptualisation (2): Classes
1) select the terms that represent objects having independent existence rather than terms that describe these objects
2) classes represent concepts in the domain and not the words that denote these concepts
3) typically nouns and nominal phrases, but not restricted to them

- Conceptualisation (3): Class hierarchy
1) a subclass represents a concept that is a 'kind of' the concept that the superclass represents
2) Roles are not subclasses
3) the hierarchy is often application-dependent or subjective

- Conceptualisation (4): Properties
1) for each property in the list, we must determine which class it describes
* attributes: measurable properties of a class
* relationships: inter-entity connections

- Conceptualisation (5): Domain and range
1) refine the semantics of the properties
2) when defining a domain or a range for a slot (property), find the most general class or classes that can serve as its domain or range

- Conceptualisation (6): Inverse properties
1) modelling with inverse properties is redundant, but allows acquisition of the information in either direction

- Conceptualisation (7): Instances
1) entities of a certain type

Web Ontology Language

1. Introducing Web Ontology Language (OWL)
- OWL is a more expressive formalism than RDF Schema, supporting:
1) instance classification
2) consistency checking
3) subsumption reasoning

- two versions of OWL
1) OWL 1.0: different subsets of OWL features give rise to the following sublanguages (colloquially known as species)
* OWL Lite: SHIF(D)
! less complex reasoning at the expense of a less expressive language
* OWL DL: SHOIN(D) + complete and decidable + higher worst-case complexity than OWL Lite
! supports all OWL constructs, with some restrictions
* OWL Full: no restrictions on use of language constructs
! potentially undecidable (a reasoner may still return some answers, but not for every question)
http://www.cs.man.ac.uk/~ezolin/dl/

2) OWL 2.0: more expressive than OWL 1.0, and takes advantage of developments in DL reasoning techniques in the intervening time

2. OWL 1.0 Features and Syntax
- Ontology header
<owl:Ontology rdf:about="">
<owl:versionInfo>1.4</owl:versionInfo>
<rdfs:comment>An example ontology</rdfs:comment>
</owl:Ontology>

- Versioning support
1) owl:versionInfo: a version number, etc.
2) owl:priorVersion: the specified ontology is a previous version of this one
3) owl:backwardCompatibleWith: the specified ontology is a previous version of this one, and this is compatible with it
4) owl:incompatibleWith: the specified ontology is a previous version of this one, but this is incompatible with it

- OWL class types
1) owl:Class: distinct from rdfs:Class
2) owl:Thing: the class that includes everything
3) owl:Nothing: the empty class

- OWL property types
1) owl:ObjectProperty: the class of resource-valued properties
2) owl:DatatypeProperty: the class of literal-valued properties
3) owl:AnnotationProperty: used to type properties that carry annotations (e.g. rdfs:label, rdfs:comment) rather than logical content
!! OWL versus RDF Schema
* the reflexive definitions of RDF Schema mean that some resources are treated both as classes and instances, or as instances and properties
=> ambiguous semantics for these resources
can't tell from context whether they're instances or classes
can't select the appropriate interpretation function
* the introduction of owl:Class, owl:ObjectProperty and owl:DatatypeProperty eliminates this ambiguity

- OWL local cardinality constraints
1) owl:minCardinality: property R has at least n values
2) owl:maxCardinality: property R has at most n values
3) owl:cardinality: property R has exactly n values

<owl:Restriction>
<owl:onProperty rdf:resource="#distilledBy"/>
<owl:cardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:cardinality>
</owl:Restriction>

- OWL local range constraints
1) owl:someValuesFrom: there exists a value for property R of type C
2) owl:allValuesFrom: property R has only values of type C

<owl:Class rdf:about="#Vegetarian">
<owl:equivalentClass>
<owl:Restriction>
<owl:onProperty rdf:resource="#eat"/>
<owl:allValuesFrom rdf:resource="#Plant"/>
</owl:Restriction>
</owl:equivalentClass>
</owl:Class>

- OWL local value constraints
1) owl:hasValue: property R has a value which is X
<owl:Restriction>
<owl:onProperty rdf:resource="#hasColour"/>
<owl:hasValue rdf:resource="#Green"/>
</owl:Restriction>

- OWL set constructs
1) owl:intersectionOf
2) owl:unionOf
3) owl:complementOf

- Equivalence and identity relations
1) owl:sameAs
2) owl:equivalentClass
3) owl:equivalentProperty
<owl:Thing rdf:about="#MorningStar">
<owl:sameAs rdf:resource="#EveningStar"/>
</owl:Thing>

- Non-equivalence relations
1) owl:differentFrom: can be used to specify a limited unique name assumption
2) owl:AllDifferent / owl:distinctMembers
<owl:AllDifferent>
<owl:distinctMembers rdf:parseType="Collection">
<rdf:Description rdf:about="#John"/>
<rdf:Description rdf:about="#Paul"/>
<rdf:Description rdf:about="#Ringo"/>
</owl:distinctMembers>
</owl:AllDifferent>
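3) OWL (and DLs in general) make the Open World Assumption: what is not stated is unknown, not false; OWL also makes no unique name assumption, hence owl:differentFrom and owl:AllDifferent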

- Necessary Class Definitions
1) If we know that something is a X, then it must fulfil the conditions ...
2) defined using rdfs:subClassOf

- Sufficient Class Definitions
1) if we know that something has this property, then it belongs to this class ...
2) defined using rdfs:subClassOf - in the other direction

- Necessary and sufficient Class Definitions
1) if something fulfils the conditions ..., then it is an X
2) defined using owl:equivalentClass

- Inverse: defines a property as the inverse of another property
<owl:ObjectProperty rdf:about="#hasAuthor">
<owl:inverseOf rdf:resource="#write"/>
</owl:ObjectProperty>

- Symmetric: P(x, y) iff P(y, x)
<owl:SymmetricProperty rdf:about="#hasSibling"/>

- Transitive: P(x, y) and P(y, z) implies P(x, z)
<owl:TransitiveProperty rdf:about="#hasAncestor"/>

- Functional: P(x, y) and P(x, z) implies y = z
<owl:FunctionalProperty rdf:about="#hasBirthdate"/>

- Inverse Functional: P(y, x) and P(z, x) implies y = z
<owl:InverseFunctionalProperty rdf:about="#hasFingerPrint"/>
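The property characteristics above can be illustrated with naive forward chaining: apply each rule until no new facts appear. This is only a sketch under stated assumptions (facts are string triples, the property names mirror the examples above, and only these few rules are covered); OWL DL reasoners are typically tableau-based rather than rule-based. The inverse functional case shows a consequence of OWL having no unique name assumption: two differently named individuals with the same fingerprint are inferred to be the same individual, not reported as an error.

```python
# Hypothetical declarations mirroring the examples above (names are illustrative).
SYMMETRIC = {"hasSibling"}
TRANSITIVE = {"hasAncestor"}
INVERSE_OF = {"hasAuthor": "write"}  # hasAuthor owl:inverseOf write
INVERSE_FUNCTIONAL = {"hasFingerPrint"}


def saturate(facts):
    """Naively apply the symmetric, inverse and transitive rules to a fixpoint."""
    facts = set(facts)
    while True:
        new = set()
        for (s, p, o) in facts:
            if p in SYMMETRIC:  # P(x, y) implies P(y, x)
                new.add((o, p, s))
            for prop, inverse in INVERSE_OF.items():  # P(x, y) iff Q(y, x)
                if p == prop:
                    new.add((o, inverse, s))
                elif p == inverse:
                    new.add((o, prop, s))
            if p in TRANSITIVE:  # P(x, y) and P(y, z) implies P(x, z)
                for (s2, p2, o2) in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:  # nothing new was derived: fixpoint reached
            return facts
        facts |= new


def same_as_pairs(facts):
    """Inverse functional: P(y, x) and P(z, x) implies y = z (owl:sameAs)."""
    pairs = set()
    for (y, p, x) in facts:
        if p not in INVERSE_FUNCTIONAL:
            continue
        for (z, q, x2) in facts:
            if q == p and x2 == x and z != y:
                pairs.add(tuple(sorted((y, z))))
    return pairs


facts = {
    ("ann", "hasSibling", "ben"),
    ("ann", "hasAncestor", "carl"),
    ("carl", "hasAncestor", "dora"),
    ("book1", "hasAuthor", "ann"),
    ("suspect", "hasFingerPrint", "fp42"),
    ("visitor", "hasFingerPrint", "fp42"),
}
inferred = saturate(facts)
print(("ben", "hasSibling", "ann") in inferred)    # True
print(("ann", "hasAncestor", "dora") in inferred)  # True
print(("ann", "write", "book1") in inferred)       # True
print(same_as_pairs(inferred))                     # {('suspect', 'visitor')}
```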

- Disjoint classes: members of one class cannot also be members of some specified other class
<owl:Class rdf:about="#MaleHuman">
<rdfs:subClassOf rdf:resource="#Human"/>
<owl:disjointWith rdf:resource="#FemaleHuman"/>
</owl:Class>

- Ontology modularisation
1) owl:imports mechanism for including other ontologies
2) it is also possible to use terms from other ontologies without explicitly importing them
3) importing brings in the imported ontology's axioms, and therefore their entailments; merely using its terms does not

https://en.wikipedia.org/wiki/Web_Ontology_Language

https://www.w3.org/TR/owl-guide/

3. OWL 2
- From OWL 1 to OWL 2: changes between 1 and 2 fall into the following categories
1) syntactic sugar
* owl:AllDisjointClasses: states that all classes in a set are pairwise disjoint (owl:disjointUnionOf additionally defines a class as the union of a number of other classes, all of which are pairwise disjoint)
* owl:NegativePropertyAssertion: lets us assert that an individual does not have a particular property value

2) constructs for increased expressivity
* owl:hasSelf: defines a class of individuals which are related to themselves by a given property
* qualified cardinality: lets us specify both the local range of a property and the number of values taken by the property
<owl:Restriction>
<owl:onProperty rdf:resource="#hasPart"/>
<owl:onClass rdf:resource="#Wheel"/>
<owl:qualifiedCardinality rdf:datatype="&xsd;nonNegativeInteger">4</owl:qualifiedCardinality>
</owl:Restriction>

* owl:ReflexiveProperty: lets us assert that a property is globally reflexive (relates every object to itself)
* owl:IrreflexiveProperty: lets us assert that a property relates no object to itself
* owl:AsymmetricProperty: lets us assert that a property is asymmetric: if p(x, y), then not p(y, x)
* owl:propertyDisjointWith: lets us state that two individuals cannot be related to each other by two different properties that have been declared disjoint

* property chain inclusion: defines a property as a composition of other properties
<owl:ObjectProperty rdf:about="#hasUncle">
<owl:propertyChainAxiom rdf:parseType="Collection">
<owl:ObjectProperty rdf:about="#hasParent"/>
<owl:ObjectProperty rdf:about="#hasBrother"/>
</owl:propertyChainAxiom>
</owl:ObjectProperty>

* owl:hasKey: lets us define uniquely identifying keys that comprise several properties
* owl:onDatatype: lets us define subsets of datatypes that constrain the range of values allowed by a datatype

3) datatype support

4) metamodelling
* Punning: OWL 1 required the names used to identify classes, properties, individuals and datatypes to be disjoint, but OWL 2 relaxes this (e.g. the same name can be used for both a class and an individual)
https://www.w3.org/2007/OWL/wiki/Punning

5) annotation
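A property chain axiom (the hasUncle example above) makes the composition of the listed properties a sub-property of the named one, so every chained pair is entailed to be a hasUncle pair. A sketch of that composition over hypothetical string triples; the function name and data are illustrative only.

```python
def compose(facts, first, second):
    """Pairs (x, z) such that first(x, y) and second(y, z) for some y."""
    return {
        (x, z)
        for (x, p, y) in facts if p == first
        for (y2, q, z) in facts if q == second and y2 == y
    }


# Hypothetical facts: (subject, property, object)
facts = {
    ("tom", "hasParent", "mary"),
    ("mary", "hasBrother", "john"),
    ("mary", "hasBrother", "jim"),
    ("sue", "hasParent", "bob"),  # bob has no recorded brother
}

# hasParent o hasBrother is a sub-property of hasUncle, so each composed
# pair is entailed to be a hasUncle pair.
uncles = {(x, "hasUncle", z) for (x, z) in compose(facts, "hasParent", "hasBrother")}
print(sorted(uncles))  # [('tom', 'hasUncle', 'jim'), ('tom', 'hasUncle', 'john')]
```

Under the Open World Assumption, sue simply has no inferred uncle; the data does not say that sue has no uncle.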

https://www.w3.org/TR/owl2-primer/

http://semanticweb.org/wiki/OWL_2.html

SPARQL Protocol and RDF Query Language

1. RDF Triplestores
- Specialised databases for storing RDF data
- Can be viewed as a database containing a single table
1) table contains subject/predicate/object as minimum
2) many triplestores store quads to maintain provenance
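To make the "single table" view concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema (a quads table with s/p/o/g columns) and the ex: names are hypothetical, not the layout of any particular triplestore; prefixed names are stored as plain strings for brevity.

```python
import sqlite3

# One table of quads: subject, predicate, object plus a graph column that
# records where each triple came from (a hypothetical, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quads (s TEXT, p TEXT, o TEXT, g TEXT, PRIMARY KEY (s, p, o, g))")
conn.executemany(
    "INSERT INTO quads VALUES (?, ?, ?, ?)",
    [
        ("ex:alice", "foaf:name", '"Alice"', "ex:graph1"),
        ("ex:alice", "foaf:knows", "ex:bob", "ex:graph1"),
        ("ex:bob", "foaf:name", '"Bob"', "ex:graph2"),
    ],
)

# The triple pattern (?s foaf:name ?o) is a selection on the predicate column.
names = conn.execute(
    "SELECT s, o FROM quads WHERE p = ? ORDER BY s", ("foaf:name",)
).fetchall()
print(names)  # [('ex:alice', '"Alice"'), ('ex:bob', '"Bob"')]

# The fourth column answers provenance questions: which graph said this?
source = conn.execute(
    "SELECT g FROM quads WHERE s = ? AND p = ?", ("ex:bob", "foaf:name")
).fetchone()[0]
print(source)  # ex:graph2
```

A query with several triple patterns turns into self-joins of this one table, which is one reason dedicated triplestores typically keep several indexes over different orderings of subject, predicate and object.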

https://en.wikipedia.org/wiki/Triplestore

http://www.krisalexander.com/uncategorized/2013/07/16/the-difference-between-a-triplestore-and-a-relational-database/

- Querying Triplestores
1) the ranges of the subjects, predicates and objects define a space
2) a triple is a point within this space
3) query patterns define regions within this space
* (s p ?), (s ? o), (? p o) define lines
* (s ? ?), (? ? o), (? p ?) define planes
4) answering queries = finding points in these regions
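The "query pattern = region" view can be sketched in a few lines of Python. This is an illustration under simple assumptions: triples are string tuples, None stands in for an unbound position, and matching is a linear scan rather than the index lookups a real store would use.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy data; IRIs are shortened to prefixed names for readability.
TRIPLES: List[Triple] = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", "Bob"),
]


def match(s: Optional[str], p: Optional[str], o: Optional[str],
          triples: Iterable[Triple] = TRIPLES) -> List[Triple]:
    """Return the triples (points) inside the region a pattern defines.

    None stands for '?'. Two fixed positions select a line, one selects
    a plane, and none selects the whole space.
    """
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


print(match("ex:alice", "foaf:knows", None))  # line (s p ?) -> [('ex:alice', 'foaf:knows', 'ex:bob')]
print(len(match(None, "foaf:knows", None)))   # plane (? p ?) -> 2
```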

- RDF Query Languages
1) Typical query structure: <list of variables to be bound and returned> <list of triple patterns and constraints>
2) W3C specified a single query language which combines the best features of its predecessors: SPARQL

2. SPARQL Query Language
- SPARQL
1) SQL-like syntax:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> # namespaces must be declared for the vocabulary terms used
SELECT ?name ?mbox # variables are prefixed with '?'
WHERE {
?x foaf:name ?name . # triple patterns use N3-like (Turtle-like) syntax
?x foaf:mbox ?mbox . # patterns are joined (conjunction) by default
}

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?v
WHERE {
?v ?p "cat"@en . # a literal matches only if its language tag also matches: "cat" would not match "cat"@en in the data
?x ?q 42 . # integers in SPARQL queries are implicitly typed as xsd:integer
}

2) Blank nodes
! blank node labels in results are scoped to one result set; there is no guarantee they stay the same over repeated queries
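A basic graph pattern such as the foaf:name / foaf:mbox query above can be evaluated by extending a set of variable bindings one triple pattern at a time; a variable already bound by an earlier pattern must take the same value, which is what joins the patterns. The sketch assumes a tiny hypothetical in-memory graph with terms written as strings (variables start with '?'); it ignores FILTERs, datatypes and blank-node semantics in queries.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical data mirroring the foaf:name / foaf:mbox query above.
GRAPH: List[Triple] = [
    ("_:a", "foaf:name", "Alice"),
    ("_:a", "foaf:mbox", "mailto:alice@example.org"),
    ("_:b", "foaf:name", "Bob"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern: Triple, binding: Binding) -> List[Binding]:
    """Extend one binding with every triple that matches one pattern."""
    results = []
    for triple in GRAPH:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if is_var(term):
                # A variable bound earlier must keep the same value (the join).
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            results.append(new)
    return results


def evaluate_bgp(patterns: List[Triple]) -> List[Binding]:
    """Conjunction of triple patterns: each pattern extends or drops solutions."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        solutions = [ext for b in solutions for ext in match_pattern(pattern, b)]
    return solutions


rows = evaluate_bgp([("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?mbox")])
print(rows)  # only Alice has both a name and a mailbox
```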

- SPARQL Constraints: constraints can be applied to variables
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?title
WHERE {
?x dc:title ?title .
FILTER regex(?title, "^SPARQL") # keep only titles starting with "SPARQL"
}

- Optional Graph Patterns
SELECT ?name ?mbox
WHERE {
?x foaf:name ?name .
OPTIONAL {
?x foaf:mbox ?mbox .
}
}
=> returns every ?name; ?mbox is added where a mailbox exists and left unbound otherwise (the person is still returned)
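OPTIONAL behaves like a left outer join: every solution of the required part survives, extended by compatible solutions of the optional part when there are any. A minimal sketch, assuming the two parts have already been evaluated into lists of bindings (the data is hypothetical, and the filter expression that the SPARQL algebra's LeftJoin also carries is left out):

```python
from typing import Dict, List

Binding = Dict[str, str]


def compatible(a: Binding, b: Binding) -> bool:
    """Two solutions are compatible if their shared variables have equal values."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def left_join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """OPTIONAL: keep every left solution, extended by compatible right ones."""
    out = []
    for l in left:
        matches = [{**l, **r} for r in right if compatible(l, r)]
        # No compatible right solution: keep the left one, leaving ?mbox unbound.
        out.extend(matches if matches else [l])
    return out


# Hypothetical solutions for the required and optional parts of the query above.
names = [{"?x": "_:a", "?name": "Alice"}, {"?x": "_:b", "?name": "Bob"}]
mboxes = [{"?x": "_:a", "?mbox": "mailto:alice@example.org"}]

for row in left_join(names, mboxes):
    print(row.get("?name"), row.get("?mbox"))
# Alice mailto:alice@example.org
# Bob None
```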

- Union Graph Patterns
PREFIX dc10: <http://purl.org/dc/elements/1.0/>
PREFIX dc11: <http://purl.org/dc/elements/1.1/>
SELECT ?title
WHERE {
{ ?book dc10:title ?title }
UNION
{ ?book dc11:title ?title }
}

- Default Graph / Named Graphs
SELECT ?who ?g ?mbox
FROM <http://example.org/dft.ttl> # the default graph: unless otherwise specified, triple patterns are matched against this graph
FROM NAMED <http://example.org/alice>
FROM NAMED <http://example.org/bob>
WHERE {
?g dc:publisher ?who .
GRAPH ?g { ?who foaf:mbox ?mbox }
}

- Ordering
SELECT ?name
WHERE {
?x foaf:name ?name ; :empId ?emp .
} ORDER BY ?name DESC(?emp) # order results by ?name, then by ?emp in descending order
# ASC() sorts in ascending order (the default)

- Limit and Offset
SELECT ?name
WHERE {
?x foaf:name ?name .
}
ORDER BY ?name
LIMIT 5
OFFSET 10 # returns five results, after skipping the first ten
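The ordering and slicing modifiers can be mimicked on an in-memory list of solutions. This sketch assumes plain Python strings and integers as values (hypothetical data), so it ignores SPARQL's rules for ordering mixed term types.

```python
# Hypothetical solutions binding ?name and ?emp.
rows = [
    {"name": "Carol", "emp": 3},
    {"name": "Alice", "emp": 7},
    {"name": "Alice", "emp": 9},
    {"name": "Bob", "emp": 1},
]

# ORDER BY ?name DESC(?emp). Python's sort is stable, so sorting by the
# secondary key first and the primary key second gives the combined order.
ordered = sorted(rows, key=lambda r: r["emp"], reverse=True)
ordered = sorted(ordered, key=lambda r: r["name"])


def limit_offset(solutions, limit, offset):
    """OFFSET skips solutions first; LIMIT then caps how many remain."""
    return solutions[offset:offset + limit]


print([(r["name"], r["emp"]) for r in ordered])
# [('Alice', 9), ('Alice', 7), ('Bob', 1), ('Carol', 3)]
page = limit_offset(ordered, limit=2, offset=1)
print([(r["name"], r["emp"]) for r in page])  # [('Alice', 7), ('Bob', 1)]
```

Without ORDER BY the order of solutions is not defined, so LIMIT/OFFSET paging is only predictable on an ordered sequence.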

- Distinct
SELECT DISTINCT ?name # removes duplicate results
WHERE {
?x foaf:name ?name .
}

- CONSTRUCT: returns an RDF graph, specified by a graph template
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
CONSTRUCT {
?x vcard:FN ?name .
?x vcard:EMAIL ?mail .
}
WHERE {
?x foaf:name ?name .
?x foaf:mbox ?mail .
}

- ASK: checks whether a query pattern has a solution

- DESCRIBE: returns a single RDF graph containing information about a resource or resources
1) the nature of the description is left unspecified: e.g. blank node closure / concise bounded description
2) the structure of the returned data is determined by the SPARQL query processor

https://en.wikipedia.org/wiki/SPARQL

https://www.w3.org/TR/rdf-sparql-query/

https://jena.apache.org/tutorials/sparql.html

http://www.cambridgesemantics.com/semantic-university/sparql-by-example

3. SPARQL Protocol
: defines the method of interaction with a SPARQL processor

- SPARQL over HTTP
1) HTTP GET
* the query is URL-encoded into the request URI
* URI length limits restrict the query length
GET /sparql/?query=EncodedQuery HTTP/1.1
Host: www.example.org
User-Agent: my-sparql-client/0.1
2) HTTP POST
* the query is encoded in the HTTP request body
* no URI length limit applies to the query
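A sketch of how a client might build both kinds of request with Python's standard urllib, without sending anything. The endpoint URL is the hypothetical one from the GET example; application/sparql-results+xml is the registered media type for the XML results format, and the form-encoded POST body is one of the options the SPARQL 1.1 Protocol allows (it also allows POSTing the query directly with the application/sparql-query content type).

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

ENDPOINT = "http://www.example.org/sparql/"  # hypothetical endpoint, as in the GET example
ACCEPT = {"Accept": "application/sparql-results+xml"}

query = "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\nSELECT ?name WHERE { ?x foaf:name ?name }"

# HTTP GET: the query is percent-encoded into the request URI's query string.
get_url = ENDPOINT + "?" + urlencode({"query": query})
get_req = Request(get_url, headers=ACCEPT)

# HTTP POST: the same form-encoded parameter travels in the request body instead.
post_req = Request(
    ENDPOINT,
    data=urlencode({"query": query}).encode("utf-8"),
    headers={**ACCEPT, "Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

# Decoding the GET query string gives back the original query text.
decoded = parse_qs(urlsplit(get_url).query)["query"][0]
print(get_req.get_method(), post_req.get_method())  # GET POST
print(decoded == query)  # True
# Nothing is sent here; urllib.request.urlopen(get_req) would perform the request.
```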

- SPARQL results format: the SPARQL Query Results XML Format
<?xml version="1.0" ?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head>
<variable name="book" /> 
<variable name="who" />
</head>
<results distinct="false" ordered="false"> 
<result> 
<binding name="book"> 
<uri>http://www.example.org/book/book5</uri>
</binding>
<binding name="who">
<bnode>r29392923r2922</bnode>
</binding>
</result>
...
</results>
</sparql>
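Clients can read this format with any XML parser. A minimal sketch with Python's xml.etree.ElementTree, run on a trimmed, well-formed copy of the document above; the function name and the (kind, value) tuple shape are illustrative choices, not part of any standard API. A variable left unbound in a result simply has no binding element, so it is missing from that row.

```python
import xml.etree.ElementTree as ET

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# A trimmed, well-formed copy of the results document above.
DOC = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="book"/><variable name="who"/></head>
  <results>
    <result>
      <binding name="book"><uri>http://www.example.org/book/book5</uri></binding>
      <binding name="who"><bnode>r29392923r2922</bnode></binding>
    </result>
  </results>
</sparql>"""


def parse_results(text):
    """Return (variables, rows); each row maps a variable name to (kind, value)."""
    root = ET.fromstring(text)
    variables = [v.get("name") for v in root.findall("sr:head/sr:variable", NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            term = binding[0]  # the single uri, literal or bnode child
            kind = term.tag.split("}", 1)[1]  # drop the "{namespace}" prefix
            row[binding.get("name")] = (kind, term.text)
        rows.append(row)
    return variables, rows


variables, rows = parse_results(DOC)
print(variables)       # ['book', 'who']
print(rows[0]["who"])  # ('bnode', 'r29392923r2922')
```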

https://w3c.github.io/rdf-tests/sparql11/data-sparql11/protocol/index.html

4. SPARQL 1.1
: extending SPARQL

- INSERT DATA
PREFIX dc: <http://purl.org/dc/elements/1.1/>
INSERT DATA {
<http://example.org/book1> dc:title "A new book" ;
dc:creator "A.N.Other" .
}

- DELETE DATA
PREFIX dc: <http://purl.org/dc/elements/1.1/>
DELETE DATA {
<http://example.org/book2> dc:title "David Copperfield" ;
dc:creator "Edmund Wells" .
}

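- UPDATE: DELETE/INSERT ... WHERE (DELETE DATA / INSERT DATA cannot contain variables, so a pattern-based update needs a WHERE clause)
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
DELETE {
?person foaf:givenName "Bill" .
}
INSERT {
?person foaf:givenName "William" .
}
WHERE {
?person foaf:givenName "Bill" .
}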
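The DELETE/INSERT ... WHERE form can be read as: evaluate WHERE once, instantiate both templates with every solution, remove the DELETE triples, then add the INSERT triples. A sketch of that reading over a Python set of triples; the data and the function name are hypothetical, and the pattern is hard-coded to the foaf:givenName example rather than parsed from SPARQL.

```python
# Hypothetical graph as a set of (subject, predicate, object) triples.
graph = {
    ("ex:p1", "foaf:givenName", "Bill"),
    ("ex:p2", "foaf:givenName", "Bill"),
    ("ex:p3", "foaf:givenName", "Anne"),
}


def rename_given_name(graph, old, new):
    """Mimic DELETE { ?person foaf:givenName old } INSERT { ?person foaf:givenName new }
    WHERE { ?person foaf:givenName old }.

    WHERE is evaluated once against the unchanged graph; the DELETE template is
    instantiated and removed, then the INSERT template is instantiated and added.
    """
    people = [s for (s, p, o) in graph if p == "foaf:givenName" and o == old]
    deletes = {(s, "foaf:givenName", old) for s in people}
    inserts = {(s, "foaf:givenName", new) for s in people}
    return (graph - deletes) | inserts


updated = rename_given_name(graph, "Bill", "William")
for triple in sorted(updated):
    print(triple)
# ('ex:p1', 'foaf:givenName', 'William')
# ('ex:p2', 'foaf:givenName', 'William')
# ('ex:p3', 'foaf:givenName', 'Anne')
```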

- Aggregates
PREFIX : <http://books.example.org/>
SELECT (SUM(?lprice) AS ?totalPrice)
WHERE {
?org :affiliates ?auth .
?auth :writesBook ?book .
?book :price ?lprice .
}
GROUP BY ?org
HAVING (SUM(?lprice) > 10)

* other aggregate functions: COUNT, AVG, MIN, MAX, GROUP_CONCAT, SAMPLE
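The grouping steps can be traced by hand in Python. The sketch assumes the WHERE clause above has already produced flat solution tuples (hypothetical data), and it keeps ?org in the output for readability even though the query itself only projects ?totalPrice.

```python
from collections import defaultdict

# Hypothetical solutions of the WHERE clause: (?org, ?auth, ?book, ?lprice)
solutions = [
    ("org1", "auth1", "book1", 9),
    ("org1", "auth2", "book2", 5),
    ("org2", "auth3", "book3", 7),
]

# GROUP BY ?org: partition the solutions, then SUM(?lprice) within each group.
totals = defaultdict(int)
for org, _auth, _book, price in solutions:
    totals[org] += price

# HAVING (SUM(?lprice) > 10) filters whole groups, after aggregation.
kept = {org: total for org, total in totals.items() if total > 10}
print(kept)  # {'org1': 14}
```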

- Subqueries
PREFIX : <http://people.example.org/>
SELECT ?y ?minName
WHERE {
:alice :knows ?y .
{
SELECT ?y (MIN(?name) AS ?minName)
WHERE {
?y :name ?name .
}
GROUP BY ?y
}
}

- Alternatives: sugared syntax for UNION
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?displayString
WHERE {
:book1 dc:title|rdfs:label ?displayString .
}

- Sequences: sugared syntax for a chain of triple patterns joined through a blank node
SELECT ?name
WHERE {
?x foaf:mbox <mailto:alice@example.org> .
?x foaf:knows/foaf:name ?name .
}
=> ?x foaf:knows [ foaf:name ?name ]

- Arbitrary Sequences
SELECT ?name
WHERE {
?x foaf:mbox <mailto:alice@example.org> .
?x foaf:knows+/foaf:name ?name .
}

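* repeat operators: * (zero or more occurrences) / + (one or more occurrences) / ? (zero or one occurrence)
! {n} (exactly n occurrences) and {n,m} (between n and m occurrences) appeared in SPARQL 1.1 drafts but are not part of the final Recommendation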
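Arbitrary-length paths amount to graph reachability. A sketch of + and * as breadth-first search over a hypothetical foaf:knows adjacency map; the visited set makes cycles terminate and returns each reachable node once.

```python
from collections import deque

# Hypothetical foaf:knows edges (with a cycle) and foaf:name values.
KNOWS = {"alice": ["bob"], "bob": ["carol"], "carol": ["alice", "dave"]}
NAME = {"alice": "Alice", "bob": "Bob", "carol": "Carol", "dave": "Dave"}


def one_or_more(start, edges):
    """Nodes reachable from start by one or more edges: the '+' operator.

    Breadth-first search with a visited set, so cycles terminate and each
    node is returned once.
    """
    seen = set()
    queue = deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(edges.get(node, []))
    return seen


def zero_or_more(start, edges):
    """The '*' operator also includes the zero-length path back to start."""
    return one_or_more(start, edges) | {start}


# ?x foaf:knows+/foaf:name ?name, with ?x bound to alice
print(sorted(NAME[n] for n in one_or_more("alice", KNOWS)))
# ['Alice', 'Bob', 'Carol', 'Dave']  (alice is reached again via carol)
print(sorted(one_or_more("dave", KNOWS)))   # []
print(sorted(zero_or_more("dave", KNOWS)))  # ['dave']
```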

- Inverses
SELECT ?y
WHERE {
<mailto:alice@example.org> ^foaf:mbox ?x . # same as ?x foaf:mbox <mailto:alice@example.org>
?x foaf:knows/^foaf:knows ?y . # ?y knows someone whom ?x knows
FILTER(?x != ?y)
}

https://www.w3.org/TR/sparql11-protocol/

Description Logic

1. Description Logics
: A family of knowledge representation formalisms
- a subset of first order predicate logic (FOPL)
- decidable: trade-off of expressivity against algorithmic complexity

- Description logics restrict the predicate types that can be used
1) Person(x): unary predicates denote class membership
2) hasChild(x, y): binary predicates denote relations (roles) between instances

- Defining ontologies with Description Logics
1) describe classes (concepts) in terms of their necessary and sufficient attributes (roles)
* A is a necessary attribute of C: If an object is an instance of C, then it has A
* A is a sufficient attribute of C: If an object has A, then it is an instance of C

- Description Logic Reasoning Tasks
1) satisfiability: can this class have any instances?
2) subsumption: is every instance of class A necessarily an instance of class B?
3) instance classification (realisation): which classes is this object an instance of?

https://en.wikipedia.org/wiki/Description_logic

2. Syntax
- Concept Constructors

Constructor	Syntax	Example
Boolean class constructors	¬C, C ∩ D, C ∪ D	Child ∩ Happy => the class of things which are both children and happy
Restrictions on role successors (universal)	∀R.C	∀hasPet.Cat => the class of things all of whose pets are cats, i.e. which only have pets that are cats ! includes things which have no pets at all
Restrictions on role successors (existential)	∃R.C	∃hasPet.Cat => the class of things which have some pet that is a cat ! must have at least one pet
Number restrictions (cardinality constraints) on roles	≤n R, ≥n R, =n R	≥2 originCountry => the class of things with more than one country of origin
Nominal (singleton concepts)	{x}
Universal class, top	⊤
Contradiction, bottom	⊥

- Role Constructors

Inverse roles	R⁻
Transitive roles	R+
Role composition	R ∘ S

- OWL and Description Logics
1) Not every description logic supports all constructors
2) More constructors = more expressive = higher complexity
3) OWL DL is equivalent to the logic SHOIN(D)
http://www.cs.man.ac.uk/~ezolin/dl/

- Knowledge Bases
1) TBox terminology: a set of axioms describing the structure of the domain (concepts, roles)

Concept inclusion	C ⊆ D
Concept equivalence	C ≡ D
Role inclusion	R ⊆ S
Role equivalence	R ≡ S
Role transitivity	R+ ⊆ R

2) ABox (assertions): a set of axioms describing a concrete situation (instances)

Concept instantiation	x:D
Role instantiation	<x, y>:R

3. Semantics
- Description Logics and Predicate Logic
1) Description Logics are a subset of first order Predicate Logic
2) Every DL expression can be converted into an equivalent FOPL expression

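- Every concept C is translated to a formula $\Phi_C(x)$ with one free variable; an atomic concept A becomes the unary predicate $A(x)$
1) Boolean class constructors
$\Phi_{\neg C}(x) = \neg\Phi_C(x)$
$\Phi_{C \cap D}(x) = \Phi_C(x) \land \Phi_D(x)$
$\Phi_{C \cup D}(x) = \Phi_C(x) \lor \Phi_D(x)$

2) Restrictions
$\Phi_{\forall R.C}(y) = \forall x.\,(R(y,x) \Rightarrow \Phi_C(x))$
$\Phi_{\exists R.C}(y) = \exists x.\,(R(y,x) \land \Phi_C(x))$

3) Concept inclusion
$C \subseteq D \;\mapsto\; \forall x.\,(\Phi_C(x) \Rightarrow \Phi_D(x))$

4) Concept equivalence
$C \equiv D \;\mapsto\; \forall x.\,(\Phi_C(x) \Leftrightarrow \Phi_D(x))$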
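The translation can be written as a small recursive function. The tuple encoding of concepts ('atom', 'not', 'and', 'or', 'all', 'some') is a hypothetical choice for this sketch; variable names differ from the rules above (fresh y1, y2, ... are generated so nested restrictions never capture each other's variables), and the output uses ∧ and ∨ for the logical connectives.

```python
import itertools


def phi(concept, var, fresh):
    """Translate a concept (nested tuples, hypothetical encoding) to a FOPL string."""
    kind = concept[0]
    if kind == "atom":  # atomic concept A becomes the unary predicate A(var)
        return f"{concept[1]}({var})"
    if kind == "not":
        return f"¬{phi(concept[1], var, fresh)}"
    if kind == "and":
        return f"({phi(concept[1], var, fresh)} ∧ {phi(concept[2], var, fresh)})"
    if kind == "or":
        return f"({phi(concept[1], var, fresh)} ∨ {phi(concept[2], var, fresh)})"
    if kind in ("all", "some"):  # (kind, role, filler)
        y = next(fresh)  # a fresh variable avoids clashes in nested restrictions
        body = phi(concept[2], y, fresh)
        if kind == "all":
            return f"∀{y}.({concept[1]}({var},{y}) ⇒ {body})"
        return f"∃{y}.({concept[1]}({var},{y}) ∧ {body})"
    raise ValueError(f"unknown constructor: {kind}")


def translate_inclusion(c, d):
    """C ⊆ D becomes ∀x.(ΦC(x) ⇒ ΦD(x))."""
    fresh = (f"y{i}" for i in itertools.count(1))
    return f"∀x.({phi(c, 'x', fresh)} ⇒ {phi(d, 'x', fresh)})"


cat_only_child = ("and", ("atom", "Child"), ("all", "hasPet", ("atom", "Cat")))
print(phi(cat_only_child, "x", (f"y{i}" for i in itertools.count(1))))
# (Child(x) ∧ ∀y1.(hasPet(x,y1) ⇒ Cat(y1)))
print(translate_inclusion(("atom", "Parent"), ("some", "hasChild", ("atom", "Person"))))
# ∀x.(Parent(x) ⇒ ∃y1.(hasChild(x,y1) ∧ Person(y1)))
```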

- Interpretation function: ext() maps each concept to a subset of the domain Δ and each role to a set of pairs of elements of Δ

Syntax	Semantics	Notes
ext(¬C)	Δ \ ext(C)	Complement
ext(C ∩ D)	ext(C) ∩ ext(D)	Conjunction
ext(C ∪ D)	ext(C) ∪ ext(D)	Disjunction
ext(∀R.C)	{x ∈ Δ | ∀y. <x,y> ∈ ext(R) ⇒ y ∈ ext(C)}	Universal restriction
ext(∃R.C)	{x ∈ Δ | ∃y. <x,y> ∈ ext(R) ∧ y ∈ ext(C)}	Existential restriction
ext(≤n R)	{x ∈ Δ | #{y | <x,y> ∈ ext(R)} ≤ n}	Max cardinality
ext(≥n R)	{x ∈ Δ | #{y | <x,y> ∈ ext(R)} ≥ n}	Min cardinality
ext(=n R)	{x ∈ Δ | #{y | <x,y> ∈ ext(R)} = n}	Exact cardinality
ext(⊤)	Δ	Top
ext(⊥)	∅	Bottom
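The table can be turned directly into an evaluator over one finite interpretation. This is model checking, not reasoning: it computes ext() in a single hypothetical interpretation, whereas satisfiability or subsumption must hold across all models of a knowledge base. The nested-tuple concept encoding is an illustrative assumption. The example also shows why ∀hasPet.Cat includes things with no pets: tom has no pets, so tom belongs to ∀hasPet.Cat but not to ∃hasPet.Cat.

```python
# Hypothetical finite interpretation: a domain plus concept and role extensions.
DOMAIN = {"ann", "ben", "tom", "felix", "rex"}
CONCEPTS = {"Child": {"ann", "ben"}, "Cat": {"felix"}, "Dog": {"rex"}}
ROLES = {"hasPet": {("ann", "felix"), ("ben", "felix"), ("ben", "rex")}}


def successors(role, x):
    """All y with <x, y> in ext(role)."""
    return {y for (a, y) in ROLES.get(role, set()) if a == x}


def ext(c):
    """Extension of a concept, encoded as nested tuples (hypothetical encoding)."""
    kind = c[0]
    if kind == "top":
        return set(DOMAIN)
    if kind == "bottom":
        return set()
    if kind == "atom":
        return set(CONCEPTS.get(c[1], set()))
    if kind == "not":
        return DOMAIN - ext(c[1])
    if kind == "and":
        return ext(c[1]) & ext(c[2])
    if kind == "or":
        return ext(c[1]) | ext(c[2])
    if kind == "all":  # ("all", R, C)
        filler = ext(c[2])
        return {x for x in DOMAIN if successors(c[1], x) <= filler}
    if kind == "some":  # ("some", R, C)
        filler = ext(c[2])
        return {x for x in DOMAIN if successors(c[1], x) & filler}
    if kind in ("max", "min", "exactly"):  # (kind, R, n)
        compare = {"max": lambda k, n: k <= n,
                   "min": lambda k, n: k >= n,
                   "exactly": lambda k, n: k == n}[kind]
        return {x for x in DOMAIN if compare(len(successors(c[1], x)), c[2])}
    raise ValueError(f"unknown constructor: {kind}")


only_cats = ext(("all", "hasPet", ("atom", "Cat")))
some_cat = ext(("some", "hasPet", ("atom", "Cat")))
print(sorted(only_cats))          # ['ann', 'felix', 'rex', 'tom']
print(sorted(some_cat))           # ['ann', 'ben']
print(ext(("min", "hasPet", 2)))  # {'ben'}
```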

https://arxiv.org/pdf/1201.4089.pdf

Ontologies

1. Knowledge Representation
- Knowledge representation is central to the Semantic Web
- Long-standing concern in Artificial Intelligence
- Most AI systems (and hence Semantic Web systems) consist of:
1) a knowledge base (KB), structured according to the knowledge representation approach taken
2) an inference mechanism: a set of procedures used to examine the knowledge base to answer questions, solve problems or make decisions within the domain

https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning

2. Ontologies
- An ontology is a specification of a conceptualisation
1) specification: a formal description
2) conceptualisation: the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them

- Ontology in Computer Science
1) constituted by a specific vocabulary used to describe a certain reality
2) a set of explicit assumptions regarding the intended meaning of the vocabulary
3) benefits: shared understanding / facilitate communication / inter-operability

- Ontology Structure: typically have two distinct components
1) Names for important concepts in the domain
2) Background knowledge/constraints on the domain

3. Informal Usage
: Informally, 'ontology' may also be used to describe a number of other types of conceptual specification

- Controlled Vocabularies
1) An explicitly enumerated list of terms, each with an unambiguous, non-redundant definition
2) no structure exists between terms - a flat list
e.g. Library of Congress Subject Headings (LCSH)

- Taxonomies
1) a collection of controlled vocabulary terms organised into a hierarchical structure
2) each term is in one or more parent-child relationships
e.g. library classification schemes (Library of Congress, Dewey Decimal, UDC)

- Thesauri
1) a taxonomy with additional relations showing lateral connections: Related Terms (RT) and See Also
2) parent-child relation usually described in terms of Broader Terms (BT) and Narrower Terms (NT)

- Ontology
1) an ontology further specialises types of relationships (particularly related terms)
2) typically includes class definitions and hierarchy, and relation definitions and hierarchy
3) may also include constraints, axioms and rule-based knowledge

- Summary
1) Controlled Vocabulary + Hierarchy = Taxonomy
2) Taxonomy + lateral relations = Thesaurus
3) Thesaurus + typed relations + constraints + rules + axioms = Ontology

http://www-ksl.stanford.edu/kst/what-is-an-ontology.html

http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html