Class MachineLearningScikitFilter

All Implemented Interfaces:
Filter, IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge

public class MachineLearningScikitFilter extends MatcherYAAAJena implements Filter
This filter learns and applies a classifier given a training sample and an existing alignment.
  • Field Details

    • LOGGER

      private static final org.slf4j.Logger LOGGER
      Default logger.
    • trainingGenerator

      private MatcherYAAAJena trainingGenerator
      Matcher that generates the training data. Correspondences with an equivalence relation form the positive class; correspondences with any other relation form the negative class.
    • confidenceNames

      private List<String> confidenceNames
      Names of the additional confidences used to train the classifier.
    • crossValidationNumber

      private int crossValidationNumber
      Number of cross-validation folds to execute.
    • numberOfParallelJobs

      private int numberOfParallelJobs
      Number of jobs to execute in parallel.
  • Constructor Details

    • MachineLearningScikitFilter

      public MachineLearningScikitFilter()
    • MachineLearningScikitFilter

      public MachineLearningScikitFilter(Alignment trainingAlignment)
    • MachineLearningScikitFilter

      public MachineLearningScikitFilter(Alignment trainingAlignment, int crossValidationNumber, int numberOfParallelJobs)
    • MachineLearningScikitFilter

      public MachineLearningScikitFilter(MatcherYAAAJena trainingGenerator)
    • MachineLearningScikitFilter

      public MachineLearningScikitFilter(MatcherYAAAJena trainingGenerator, List<String> confidenceNames)
    • MachineLearningScikitFilter

      public MachineLearningScikitFilter(MatcherYAAAJena trainingGenerator, int crossValidationNumber, int numberOfParallelJobs)
    • MachineLearningScikitFilter

      public MachineLearningScikitFilter(MatcherYAAAJena trainingGenerator, List<String> confidenceNames, int crossValidationNumber, int numberOfParallelJobs)
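      Constructor that sets the training generator, the confidence names, and the cross-validation and parallelism settings explicitly.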
      Parameters:
      trainingGenerator - generator for training data.
      confidenceNames - confidence names to use.
      crossValidationNumber - Number of cross-validation folds to execute.
      numberOfParallelJobs - Number of jobs to execute in parallel.
  • Method Details

    • match

      public Alignment match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties properties) throws Exception
      Description copied from class: MatcherYAAAJena
      Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment. Note: This method might be called multiple times in a row when using the evaluation framework. Make sure to return a mapping which is specific to the given inputs.
      Specified by:
      match in interface IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>
      Specified by:
      match in class MatcherYAAAJena
      Parameters:
      source - This OntModel represents the source ontology.
      target - This OntModel represents the target ontology.
      inputAlignment - This mapping represents the input alignment.
      properties - Additional properties.
      Returns:
      The resulting alignment of the matching process.
      Throws:
      Exception - Any exception which occurs during matching.
    • trainAndApplyMLModel

      public static Alignment trainAndApplyMLModel(Alignment trainAlignment, Alignment predictAlignment, List<String> confidenceNames, int crossValidationNumber, int numberOfParallelJobs)
      Trains a machine learning model in Python and applies it to predictAlignment in order to filter it.
      Parameters:
      trainAlignment - Correspondences with an EQUIVALENCE relation are treated as positives. All other relations are treated as negatives.
      predictAlignment - the alignment to filter
      confidenceNames - the confidence names of the alignment to use (leave empty to use all additional confidences from trainAlignment).
      crossValidationNumber - the number of folds to use for cross-validation.
      numberOfParallelJobs - number of parallel jobs.
      Returns:
      the filtered alignment
    • trainAndStoreMLModel

      public static List<String> trainAndStoreMLModel(Alignment alignment, File modelFile, List<String> confidenceNames, int crossValidationNumber, int numberOfParallelJobs)
      Trains a machine learning model in Python based on the given alignment and then stores the best model in a file.
      Parameters:
      alignment - Correspondences with an EQUIVALENCE relation are treated as positives. All other relations are treated as negatives.
      modelFile - the file to store the best model.
      confidenceNames - the confidence names of the alignment to use (leave empty to use all additional confidences from alignment).
      crossValidationNumber - the number of folds to use for cross-validation.
      numberOfParallelJobs - number of parallel jobs.
      Returns:
      the confidence names that were used (can be passed directly as confidenceNames to applyStoredMLModel)
    • applyStoredMLModel

      public static Alignment applyStoredMLModel(File modelFile, Alignment predictAlignment, List<String> confidenceNames)
      Loads a machine learning model from a file (trained and stored with trainAndStoreMLModel) and applies it to the given alignment, which is then filtered.
      Parameters:
      modelFile - the file to load the ML model.
      predictAlignment - the alignment which should be filtered.
      confidenceNames - the confidence names of the alignment to use (must be the same names as in training, in the same order).
      Returns:
      the filtered alignment.
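      The requirement that confidenceNames match the training names in the same order can be illustrated with a small sketch. It is not the library's code: the class FeatureOrderSketch, the helper toFeatureRow, the confidence names "levenshtein" and "jaccard", and the choice of 0.0 for a missing confidence are all assumptions made for illustration. The point is only that each name becomes one positional feature column, so a different order yields a different row for the same correspondence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FeatureOrderSketch {

    // Hypothetical helper (not part of the library): builds one feature row by
    // looking up each confidence name in the order given by confidenceNames.
    // Assumption: a missing confidence is encoded as 0.0.
    public static List<Double> toFeatureRow(Map<String, Double> confidences,
                                            List<String> confidenceNames) {
        List<Double> row = new ArrayList<>();
        for (String name : confidenceNames) {
            row.add(confidences.getOrDefault(name, 0.0));
        }
        return row;
    }

    public static void main(String[] args) {
        // Additional confidences of a single, made-up correspondence.
        Map<String, Double> confidences = new HashMap<>();
        confidences.put("levenshtein", 0.9);
        confidences.put("jaccard", 0.4);

        List<String> trainingOrder = Arrays.asList("levenshtein", "jaccard");
        List<String> swappedOrder = Arrays.asList("jaccard", "levenshtein");

        // Same correspondence, different column order: the rows differ.
        System.out.println(toFeatureRow(confidences, trainingOrder)); // [0.9, 0.4]
        System.out.println(toFeatureRow(confidences, swappedOrder));  // [0.4, 0.9]
    }
}
```

      Reusing the list returned by trainAndStoreMLModel as the confidenceNames argument of applyStoredMLModel avoids this kind of mismatch.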
    • filterAlignment

      private static Alignment filterAlignment(Alignment fullAlignment, List<Correspondence> orderedAlignment, List<Integer> predictions)
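      This method has no description, so the following is only one plausible reading of its signature, not the actual implementation. The sketch assumes that predictions holds one entry per element of orderedAlignment, in the same order, and that a prediction of 1 means the correspondence is kept. The class PredictionFilterSketch and the generic helper keepPositives are hypothetical stand-ins; plain strings replace Correspondence objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PredictionFilterSketch {

    // Hypothetical helper (not part of the library).
    // Assumptions: predictions.get(i) belongs to ordered.get(i),
    // and the value 1 marks a correspondence that should be kept.
    public static <T> List<T> keepPositives(List<T> ordered, List<Integer> predictions) {
        if (ordered.size() != predictions.size()) {
            throw new IllegalArgumentException("Expected one prediction per correspondence.");
        }
        List<T> kept = new ArrayList<>();
        for (int i = 0; i < ordered.size(); i++) {
            if (predictions.get(i) == 1) {
                kept.add(ordered.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Strings stand in for Correspondence objects.
        List<String> ordered = Arrays.asList("c1", "c2", "c3");
        List<Integer> predictions = Arrays.asList(1, 0, 1);
        System.out.println(keepPositives(ordered, predictions)); // [c1, c3]
    }
}
```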
    • writeDataset

      private static void writeDataset(List<Correspondence> alignment, File file, boolean includeTarget, List<String> confidenceNames) throws IOException
      Writes the given alignment to a file.
      Parameters:
      alignment - Dataset to write. Correspondences with an EQUIVALENCE relation are treated as positives. All other relations are treated as negatives.
      file - File to write.
      includeTarget - If true, the label (0 for negatives, 1 for positives) will be persisted.
      confidenceNames - the confidence names of the alignment to use.
      Throws:
      IOException - Exception in case of problems while writing.
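      The labeling rule described above (1 for EQUIVALENCE, 0 for every other relation) and the effect of includeTarget can be sketched as follows. The on-disk format is not documented here, so the comma-separated layout, the label in the last column, and 0.0 for a missing confidence are assumptions. The class DatasetRowSketch, the enum Relation, and the helpers label and toCsvRow are hypothetical and not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DatasetRowSketch {

    // Hypothetical stand-in for the relation of a correspondence.
    public enum Relation { EQUIVALENCE, SUBSUMED, UNKNOWN }

    // Label as described for includeTarget: 1 for positives, 0 for negatives.
    public static int label(Relation relation) {
        return relation == Relation.EQUIVALENCE ? 1 : 0;
    }

    // Assumed layout: one comma-separated row per correspondence, the confidences
    // in the order of confidenceNames, and the label last if includeTarget is true.
    public static String toCsvRow(Relation relation, Map<String, Double> confidences,
                                  List<String> confidenceNames, boolean includeTarget) {
        List<String> cells = new ArrayList<>();
        for (String name : confidenceNames) {
            // Assumption: a missing confidence is written as 0.0.
            cells.add(String.valueOf(confidences.getOrDefault(name, 0.0)));
        }
        if (includeTarget) {
            cells.add(String.valueOf(label(relation)));
        }
        return String.join(",", cells);
    }

    public static void main(String[] args) {
        Map<String, Double> confidences = new HashMap<>();
        confidences.put("levenshtein", 0.8);
        confidences.put("jaccard", 0.3);
        List<String> names = Arrays.asList("levenshtein", "jaccard");

        // Training row: features plus label.
        System.out.println(toCsvRow(Relation.EQUIVALENCE, confidences, names, true));  // 0.8,0.3,1
        System.out.println(toCsvRow(Relation.SUBSUMED, confidences, names, true));     // 0.8,0.3,0
        // Prediction row: features only.
        System.out.println(toCsvRow(Relation.EQUIVALENCE, confidences, names, false)); // 0.8,0.3
    }
}
```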