The MELT-ML module exposes some machine learning functionality that is implemented in Python. This is achieved through the start of a python process within Java. The communication is performed through local HTTP calls. This is also shown in the following figure.
The program will use the default
python command of your system path. Note that Python 3 is required together with the dependencies listed in /matching-ml/melt-resources/requirements.txt.
If you want to use a special python environment, you can create a file named
python_command.txt in your
melt-resources directory (create if not existing) containing the path to your python executable. You can, for example, use the executable of a certain Anaconda environment.
Here, an Anaconda environment, named
matching will be used.
Machine learning requires data for training. For positive examples, a share from the reference can be sampled (e.g. using method
generateTrackWithSampledReferenceAlignment) or a high-precision matcher can be used. Multiple strategies exist to generate negative samples:
For example, negatives can be generated randomly using a absolute number of negatives to be generated (class
AddNegativesRandomlyAbsolute) or a relative share of negatives (class
AddNegativesRandomlyShare). If the gold standard is not known, is also possible to exploit the one-to-one assumption and add random correspondences involving elements that already appear in the positive set of correspondences (class
AddNegativesRandomlyOneOneAssumption). MELT contains multiple out-of-the box strategies that are already implemented as matching components which can be used within a matching pipeline. All of them implement interface
AddNegatives. Since multiple flavors can be thought of (e.g. generating type homogeneous or type heterogeneous correspondences), a negatives generator can be easily written from scratch or customized for specific purposes. MELT offers some helper classes to do so such as
RandomSampleOntModel which can be used to sample elements from ontologies.
MELT offers the usage of transformer models through a filter class
TransformersFilter. For reasons of performance, it is sensible to use a high-recall matcher first and use the transformer components for the final selection of correspondences. Any transformer model that is locally available or available via the huggingface repository can be used in MELT.
In a first step, the filter will write a (temporary) CSV file to disk with the string representations of each correspondence. The desired string representations can be defined with a
TextExtractor. The transformer python code is called via the python server (see above).
TransformerFilter is intended to be used in a matching pipeline:
The default transformer training objectives are not suitable for the task of ontology matching. Therefore, a pre-trained model needs to be fine-tuned. Once a training alignment is available, class
TransformersFineTuner can be used to train and persist a model (which then can be applied in a matching pipeline using
TransformerFilter as shown in the figure above).
Like the TransformersFilter, the TransformersFineTuner is a matching component that can be used in a matching pipeline. Note that this pipeline can only be used for training and model serialization. An example for such a pipeline is visualized below: