Matching with Background Knowledge
MELT supports multiple external sources of background knowledge for matching.
Core Concepts
The related classes/implementations can be found in de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.external.
Any external background knowledge source implements ExternalResource and, therefore, has a name (getName()) and an associated linker (getLinker()). A LabelToConceptLinker is responsible for linking natural language strings, such as “European Union”, to concepts in the background knowledge source, such as Q458. Throughout the implementation, there is a distinction between a link, which can be any identifier in the background knowledge source, and a label, which is a natural language string.
There are currently two relevant capabilities (interfaces): SynonymCapability for external resources that contain synonyms (or heuristics to obtain those) and HypernymCapability for external resources that contain hypernyms (broader concepts).
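The relationship between labels, links, and capabilities can be illustrated with a minimal, self-contained sketch. The interfaces below are simplified, hypothetical stand-ins that only borrow the names used above; the method link(...), the signature of getSynonyms(...), the class ToyResource, and its hand-written data are assumptions made for illustration and are not MELT's actual API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

class BackgroundKnowledgeSketch {

    /** Hypothetical stand-in: maps a label to a link (identifier) in the source. */
    interface LabelToConceptLinker {
        Optional<String> link(String label);
    }

    /** Stand-in exposing the two accessors named in the text. */
    interface ExternalResource {
        String getName();
        LabelToConceptLinker getLinker();
    }

    /** Hypothetical synonym capability; note that it operates on links, not labels. */
    interface SynonymCapability extends ExternalResource {
        Set<String> getSynonyms(String link);
    }

    /** Toy in-memory resource with hand-written data (an assumption for this sketch). */
    static class ToyResource implements SynonymCapability {
        private final Map<String, String> labelToLink = new HashMap<>();

        ToyResource() {
            labelToLink.put("european union", "Q458");
            labelToLink.put("eu", "Q458");
        }

        @Override
        public String getName() {
            return "ToyResource";
        }

        @Override
        public LabelToConceptLinker getLinker() {
            // Normalize the label before the lookup; unknown labels yield an empty Optional.
            return label -> Optional.ofNullable(
                    labelToLink.get(label.trim().toLowerCase(Locale.ROOT)));
        }

        @Override
        public Set<String> getSynonyms(String link) {
            // Collect every known label that resolves to the given link.
            Set<String> result = new TreeSet<>();
            for (Map.Entry<String, String> entry : labelToLink.entrySet()) {
                if (entry.getValue().equals(link)) {
                    result.add(entry.getKey());
                }
            }
            return result;
        }
    }

    /** Two labels refer to the same concept if both can be linked and the links are equal. */
    static boolean sameConcept(ExternalResource resource, String a, String b) {
        Optional<String> linkA = resource.getLinker().link(a);
        Optional<String> linkB = resource.getLinker().link(b);
        return linkA.isPresent() && linkA.equals(linkB);
    }

    public static void main(String[] args) {
        ToyResource resource = new ToyResource();
        System.out.println(resource.getLinker().link("European Union"));
        System.out.println(sameConcept(resource, "EU", "European Union"));
        System.out.println(resource.getSynonyms("Q458"));
    }
}
```

The point of the sketch is the two-step flow: a label is first linked, and only the resulting link is used to query a capability.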
Matching with WordNet
WordNet is a well-known lexical resource. It is a database of English words grouped into sets called synsets, each of which represents a particular meaning; further semantic relations such as hypernymy also exist in the database. The resource is publicly available. The knowledge source can be used to obtain synonyms (SynonymCapability) and hypernyms (HypernymCapability). The core class is WordNetKnowledgeSource.
Matching with Wiktionary
Wiktionary is a collaboratively built dictionary. As there is no official API for this dataset, the DBnary graph is used. The knowledge source can be used to obtain synonyms (SynonymCapability) and hypernyms (HypernymCapability).
The core class is WiktionaryKnowledgeSource. If a TDB path is passed to the constructor, TDB is used; otherwise, a SPARQL connection to the endpoint is established.
Use Wiktionary with TDB
- Download the core files in your desired language from the DBnary download page.
- Unzip the bz2 file.
- Install Apache TDB Command Line Utilities.
- Create your TDB dataset e.g. by running
tdbloader2 --loc ./wiktionary_tdb en_dbnary_ontolex_20210301.ttl
- Initialize WiktionaryKnowledgeSource with the path to your TDB directory (in this case <...>/wiktionary_tdb).
Matching with Wikidata
Wikidata is a publicly built knowledge graph. The knowledge source can be used to obtain synonyms (SynonymCapability) and hypernyms (HypernymCapability). The core class is WikidataKnowledgeSource.
Matching with DBpedia
The knowledge source can be used to obtain synonyms (SynonymCapability) and hypernyms (HypernymCapability). The core class is DBpediaKnowledgeSource. If a TDB path is passed to the constructor, TDB is used; otherwise, a SPARQL connection to the endpoint is established.
Use DBpedia with TDB
Create a TDB dataset (see the Wiktionary instructions above) that contains at least the following files:
disambiguations_lang=en.ttl
labels_lang=en.ttl
instance-types_lang=en_specific.ttl
mappingbased-literals_lang=en.ttl
A full overview of DBpedia download links can be found on the databus Web page.
Matching with BabelNet
BabelNet is a very large multilingual knowledge graph that combines multiple other sources such as Wikipedia, WordNet, and Wiktionary. Unlike the other sources of background knowledge, BabelNet cannot be easily mass-queried. Researchers need to ask for the Lucene indices. Those are required to run the MELT BabelNet module. The core class is BabelNetKnowledgeSource.
In order to use BabelNet, perform the following steps:
1) Obtain the BabelNet indices (you can request them via the BabelNet Web site).
2) Copy the config folder to your project root.
3) Set the babelnet.dir property in the template_babelnet.var.properties file and rename the file to babelnet.var.properties. The babelnet.dir property needs to contain the directory where the BabelNet indices are stored.
4) Download the WordNet database files.
5) Set the jlt.wordnetVersion and the jlt.wordnetPrefix properties in template_jlt.var.properties. Rename the file to jlt.var.properties.
If you build a JAR file, make sure that the config directory exists in the same directory in which the JAR is executed.
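Because a missing or empty property is an easy mistake in the setup above, a small sanity check can help before starting a matcher. The following is a minimal sketch that uses only java.util.Properties; the class BabelNetConfigCheck and its methods are hypothetical helpers and not part of MELT or BabelNet, and the property values shown are placeholders. Only the property keys are taken from the steps above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class BabelNetConfigCheck {

    /** Returns the required keys that are absent or blank in the given properties. */
    static List<String> missingKeys(Properties props, String... requiredKeys) {
        List<String> missing = new ArrayList<>();
        for (String key : requiredKeys) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    /** Parses properties from a string; a real check would read the files in the config folder. */
    static Properties parse(String content) {
        Properties props = new Properties();
        try (StringReader reader = new StringReader(content)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Placeholder contents standing in for babelnet.var.properties and jlt.var.properties.
        Properties babelnet = parse("babelnet.dir=/path/to/babelnet/indices\n");
        Properties jlt = parse("jlt.wordnetVersion=\n");

        System.out.println("babelnet.var.properties is missing: "
                + missingKeys(babelnet, "babelnet.dir"));
        System.out.println("jlt.var.properties is missing: "
                + missingKeys(jlt, "jlt.wordnetVersion", "jlt.wordnetPrefix"));
    }
}
```

In a real project, the strings passed to parse would be replaced by the contents of the two renamed files in the config directory.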
Matching with WebIsALOD
WebIsALOD is a large RDF graph consisting of Web-crawled hypernymy relations. The graph is available in two flavors: a filtered version containing less noise (referred to in the implementation as WebIsAlod Classic) and the full version containing more noise (referred to in the implementation as WebIsAlod XL). The core classes are WebIsAlodClassicKnowledgeSource and WebIsAlodXLKnowledgeSource.