Matching with Background Knowledge

MELT supports multiple external sources of background knowledge for matching:

  1. WordNet
  2. Wiktionary
  3. Wikidata
  4. DBpedia
  5. BabelNet
  6. WebIsALOD

Core Concepts

The related classes/implementations can be found in de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.external.

Any external background knowledge source implements ExternalResource and, therefore, has a name (getName()) and an associated linker (getLinker()). A LabelToConceptLinker links natural-language strings, such as “European Union”, to concepts in the background knowledge source, such as Q458. Throughout the implementation, a link (any identifier in the background knowledge source) is distinguished from a label (a natural-language string).

There are currently two relevant capabilities (interfaces): SynonymCapability for external resources that contain synonyms (or heuristics to obtain those) and HypernymCapability for external resources that contain hypernyms (broader concepts).
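To make the distinction between labels and links more concrete, the following is a minimal, self-contained sketch. It does not use the MELT classes: the nested interfaces Linker, SynonymSource, and HypernymSource, the class ToySource, and all method names are hypothetical simplifications of the concepts described above, and the in-memory data (including the identifier toy:organization) is invented for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class BackgroundKnowledgeSketch {

    /** Hypothetical stand-in for a label-to-concept linker. */
    interface Linker {
        /** Returns the link (identifier) for a label, or null if no concept matches. */
        String link(String label);
    }

    /** Hypothetical stand-in for a synonym capability. */
    interface SynonymSource {
        Set<String> getSynonyms(String link);
    }

    /** Hypothetical stand-in for a hypernym capability. */
    interface HypernymSource {
        Set<String> getHypernyms(String link);
    }

    /** Toy in-memory source; a real source would query WordNet, Wikidata, etc. */
    static class ToySource implements Linker, SynonymSource, HypernymSource {
        private final Map<String, String> labelToLink = new HashMap<>();
        private final Map<String, Set<String>> hypernyms = new HashMap<>();

        ToySource() {
            // Labels are stored normalized (trimmed, lower case).
            labelToLink.put("european union", "Q458");
            labelToLink.put("eu", "Q458");
            // Invented hypernym link, for illustration only.
            hypernyms.put("Q458", Collections.singleton("toy:organization"));
        }

        @Override
        public String link(String label) {
            if (label == null) {
                return null;
            }
            return labelToLink.get(label.trim().toLowerCase(Locale.ROOT));
        }

        @Override
        public Set<String> getSynonyms(String link) {
            // All labels that point to the same link count as synonyms here.
            Set<String> result = new TreeSet<>();
            for (Map.Entry<String, String> entry : labelToLink.entrySet()) {
                if (entry.getValue().equals(link)) {
                    result.add(entry.getKey());
                }
            }
            return result;
        }

        @Override
        public Set<String> getHypernyms(String link) {
            return hypernyms.getOrDefault(link, Collections.<String>emptySet());
        }
    }

    /** Two labels are treated as synonymous if both link to the same concept. */
    static boolean isSynonymous(Linker linker, String labelA, String labelB) {
        String linkA = linker.link(labelA);
        String linkB = linker.link(labelB);
        return linkA != null && linkA.equals(linkB);
    }

    public static void main(String[] args) {
        ToySource source = new ToySource();
        String link = source.link("European Union"); // label -> link
        System.out.println("Link: " + link);
        System.out.println("Synonyms: " + source.getSynonyms(link));
        System.out.println("Hypernyms: " + source.getHypernyms(link));
        System.out.println("Synonymous: " + isSynonymous(source, "EU", "European Union"));
    }
}
```

Note that the capability methods in this sketch operate on links rather than on raw labels, so a label has to be linked first. This mirrors the label/link distinction described above, but the actual MELT method signatures may differ.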

Matching with WordNet

WordNet is a well-known lexical resource. It is a database of English words grouped into sets called synsets, each of which represents a particular meaning; further semantic relations such as hypernymy also exist in the database. The resource is publicly available. The knowledge source can be used to obtain synonyms (SynonymCapability) and hypernyms (HypernymCapability). The core class is WordNetKnowledgeSource.

Matching with Wiktionary

Wiktionary is a collaboratively built dictionary. As there is no official API for this dataset, the DBnary graph is used. The knowledge source can be used to obtain synonyms (SynonymCapability) and hypernyms (HypernymCapability).

The core class is WiktionaryKnowledgeSource. If a TDB path is passed to the constructor, TDB is used; otherwise, a SPARQL connection to the endpoint is established.

Use Wiktionary with TDB

  • Download the core files in your desired language from the DBnary download page.
  • Unzip the bz2 file.
  • Install Apache TDB Command Line Utilities.
  • Create your TDB dataset, e.g. by running tdbloader2 --loc ./wiktionary_tdb en_dbnary_ontolex_20210301.ttl
  • Initialize WiktionaryKnowledgeSource with the path to your TDB directory (in this case <...>/wiktionary_tdb).
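The choice between TDB and SPARQL can be sketched as follows. This is not MELT code: the class BackendSelectionSketch, the enum Backend, and the method selectBackend are hypothetical, and the directory check is an added assumption (whether the MELT constructor validates the path is not stated here). The sketch only illustrates the rule that a given TDB path selects TDB, while no path selects the SPARQL endpoint.

```java
import java.io.File;

public class BackendSelectionSketch {

    /** Hypothetical enum; MELT does not necessarily expose such a type. */
    enum Backend { TDB, SPARQL }

    /**
     * Returns TDB if a usable TDB directory is given and SPARQL if no path is given.
     * Assumption of this sketch: a path that is not a non-empty directory is an error.
     */
    static Backend selectBackend(String tdbDirectoryPath) {
        if (tdbDirectoryPath == null || tdbDirectoryPath.trim().isEmpty()) {
            return Backend.SPARQL; // no TDB path: query the remote endpoint
        }
        File directory = new File(tdbDirectoryPath);
        String[] content = directory.list(); // null if the path is not a directory
        if (content == null || content.length == 0) {
            throw new IllegalArgumentException(
                    "Not a non-empty TDB directory: " + tdbDirectoryPath);
        }
        return Backend.TDB;
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : null;
        System.out.println("Selected backend: " + selectBackend(path));
    }
}
```

Failing fast on an unusable path is a design choice of this sketch: it avoids silently querying a remote endpoint when a local dataset was intended.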

Matching with Wikidata

Wikidata is a publicly built knowledge graph. The knowledge source can be used to obtain synonyms (SynonymCapability) and hypernyms (HypernymCapability). The core class is WikidataKnowledgeSource.

Matching with DBpedia

The knowledge source can be used to obtain synonyms (SynonymCapability) and hypernyms (HypernymCapability). The core class is DBpediaKnowledgeSource. If a TDB path is passed to the constructor, TDB is used; otherwise, a SPARQL connection to the endpoint is established.

Use DBpedia with TDB

Create a TDB dataset (see instructions above) which is comprised of at least the following files:

A full overview of DBpedia download links can be found on the databus Web page.

Matching with BabelNet

BabelNet is a very large multilingual knowledge graph that combines multiple other sources such as Wikipedia, WordNet, and Wiktionary. Unlike the other sources of background knowledge, BabelNet cannot be easily mass-queried. Researchers need to request the Lucene indices, which are required to run the MELT BabelNet module. The core class is BabelNetKnowledgeSource.

In order to use BabelNet, perform the following steps:

  1. Obtain the BabelNet indices (you can request them via the BabelNet Web site).
  2. Copy the config folder to your project root.
  3. Set babelnet.dir in the template_babelnet.var.properties file and rename the file to babelnet.var.properties. The babelnet.dir property needs to contain the directory where the BabelNet indices are stored.
  4. Download the WordNet database files.
  5. Set jlt.wordnetVersion and jlt.wordnetPrefix in template_jlt.var.properties and rename the file to jlt.var.properties.
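Because a mistake in these configuration steps is easy to overlook, a small sanity check can help before starting a matcher. The following sketch is not part of MELT: the class BabelNetConfigCheck and its methods are hypothetical, and it assumes that the copied folder is named config and sits directly below the given base directory. It uses only java.util.Properties and the file and property names mentioned in the steps above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class BabelNetConfigCheck {

    /** Returns a list of detected problems; an empty list means the setup looks complete. */
    static List<String> findProblems(Path baseDir) {
        List<String> problems = new ArrayList<>();
        Path configDir = baseDir.resolve("config");

        // babelnet.dir must point to the directory holding the BabelNet indices.
        Properties babelnet = load(configDir.resolve("babelnet.var.properties"), problems);
        String indexDir = babelnet.getProperty("babelnet.dir", "").trim();
        if (indexDir.isEmpty()) {
            problems.add("babelnet.dir is not set");
        } else if (!Files.isDirectory(Paths.get(indexDir))) {
            problems.add("babelnet.dir is not a directory: " + indexDir);
        }

        // Both WordNet-related properties must be present.
        Properties jlt = load(configDir.resolve("jlt.var.properties"), problems);
        for (String key : new String[] {"jlt.wordnetVersion", "jlt.wordnetPrefix"}) {
            if (jlt.getProperty(key, "").trim().isEmpty()) {
                problems.add(key + " is not set");
            }
        }
        return problems;
    }

    /** Loads a properties file; records a problem and returns empty properties on failure. */
    private static Properties load(Path file, List<String> problems) {
        Properties properties = new Properties();
        if (!Files.isRegularFile(file)) {
            problems.add("missing file: " + file);
            return properties;
        }
        try (InputStream in = Files.newInputStream(file)) {
            properties.load(in);
        } catch (IOException e) {
            problems.add("cannot read " + file + ": " + e.getMessage());
        }
        return properties;
    }

    public static void main(String[] args) {
        Path baseDir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> problems = findProblems(baseDir);
        if (problems.isEmpty()) {
            System.out.println("BabelNet configuration looks complete.");
        } else {
            problems.forEach(problem -> System.out.println("Problem: " + problem));
        }
    }
}
```

Run without arguments, the check inspects the current working directory, which is also the directory that matters when a JAR is executed.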

If you build a JAR file, make sure that the config directory exists in the same directory in which the JAR is executed.

Matching with WebIsALOD

WebIsALOD is a large RDF graph consisting of Web-crawled hypernymy relations. The graph is available in two flavors: a filtered version containing less noise (referred to in the implementation as WebIsAlodClassic) and the full version containing more noise (referred to in the implementation as WebIsAlodXL). The core classes are WebIsAlodClassicKnowledgeSource and WebIsAlodXLKnowledgeSource.