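java.lang.Object
  eu.sealsproject.platform.res.tool.impl.AbstractPlugin
    de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
      de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
        de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
          de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
            de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
              de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.LLMBase
                de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.LLMBinaryFilter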

All Implemented Interfaces: Filter, IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge

public class LLMBinaryFilter extends LLMBase implements Filter

This filter asks an LLM whether a given correspondence is correct. The LLM has no information about the other correspondences: each correspondence becomes a separate prediction example. The filter adds the resulting confidence to each correspondence so that the alignment can be filtered afterwards.
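The following is a minimal, MELT-independent sketch of the idea described above, not the class's actual implementation. The `Candidate` class and the `llmYesProbability` function are hypothetical stand-ins for a MELT correspondence and for the LLM call. The sketch simply overwrites the confidence, whereas MELT may attach the LLM confidence differently. Scoring and filtering are kept as separate steps, mirroring the description that filtering happens afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical, MELT-independent sketch of a binary LLM filter. */
public class BinaryFilterSketch {

    /** Hypothetical stand-in for a correspondence: two entity texts and a confidence. */
    public static final class Candidate {
        public final String left;
        public final String right;
        public final double confidence;

        public Candidate(String left, String right, double confidence) {
            this.left = left;
            this.right = right;
            this.confidence = confidence;
        }
    }

    /**
     * Scores each candidate independently; the scorer never sees the other candidates.
     * The scorer stands in for the LLM call and returns the probability of "correct".
     */
    public static List<Candidate> score(List<Candidate> candidates,
                                        BiFunction<String, String, Double> llmYesProbability) {
        List<Candidate> scored = new ArrayList<>();
        for (Candidate c : candidates) {
            double p = llmYesProbability.apply(c.left, c.right);
            // The sketch overwrites the confidence; MELT may attach it differently.
            scored.add(new Candidate(c.left, c.right, p));
        }
        return scored;
    }

    /** Separate filtering step: keep candidates whose confidence reaches the threshold. */
    public static List<Candidate> filter(List<Candidate> scored, double threshold) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : scored) {
            if (c.confidence >= threshold) {
                kept.add(c);
            }
        }
        return kept;
    }
}
```

Because every candidate is scored on its own, the scoring loop could be batched or parallelized without changing the result; only the threshold step looks at the scores afterwards.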

Field Summary

Fields

| Modifier and Type | Field | Description |
|---|---|---|
| private static final org.slf4j.Logger | LOGGER | |
| protected Set<String> | negativeWords | Set of negative words to use. |
| private static final String | NEWLINE | |
| protected Set<String> | positiveWords | Set of positive words to use. |

Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.LLMBase
debugFile, loadingArguments, promt, wordForcer, wordStopper

Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
cudaVisibleDevices, extractor, modelName, multipleTextsToMultipleExamples, multiProcessing, trainingArguments, transformersCache, usingTensorflow

Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
FILE_PREFIX, FILE_SUFFIX
Constructor Summary

Constructors

| Constructor | Description |
|---|---|
| LLMBinaryFilter(TextExtractorMap extractor, String modelName, String promt) | Constructor with all required parameters and default values for optional parameters (can be changed by setters). |
| LLMBinaryFilter(TextExtractor extractor, String modelName, String promt) | Constructor with all required parameters and default values for optional parameters (can be changed by setters). |
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| void | addNegativeWord(String negativeWord) | |
| void | addPositiveWord(String positiveWord) | |
| Map<Correspondence,List<Integer>> | createPredictionFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment predictionAlignment, File outputFile, boolean append) | Create the prediction file, which is a CSV file with two columns. The first column is the text from the left resource and the second column is the text from the right resource. |
| Set<String> | getNegativeWords() | |
| Set<String> | getPositiveWords() | |
| protected List<Set<String>> | getWordsToDetect() | |
| Alignment | match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties properties) | Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment. |
| void | setNegativeWords(Set<String> negativeWords) | |
| void | setPositiveWords(Set<String> positiveWords) | |

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.LLMBase
addGenerationArgument, addLoadingArgument, addLoadingArguments, addTrainingArgument, getDebugFile, getGenerationArguments, getLoadingArguments, getPromt, includeMoreVariations, includeMoreVariations, isWordForcer, isWordStopper, predictConfidences, setDebugFile, setGenerationArguments, setLoadingArguments, setPromt, setTrainingArguments, setWordForcer, setWordStopper

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
getCudaVisibleDevices, getCudaVisibleDevicesButOnlyOneGPU, getExamplesForBatchSizeOptimization, getExtractor, getExtractorMap, getModelName, getMultiProcessing, getTextualRepresentation, getTrainingArguments, getTransformersCache, isMultipleTextsToMultipleExamples, isOptimizeForMixedPrecisionTraining, isUsingTensorflow, setCudaVisibleDevices, setCudaVisibleDevices, setExtractor, setExtractorMap, setModelName, setMultipleTextsToMultipleExamples, setMultiProcessing, setOptimizeForMixedPrecisionTraining, setTransformersCache, setUsingTensorflow, writeExamplesToFile

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
getModelSpec, match, readOntology

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
match

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
match

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
align, align, canExecute, getType

Methods inherited from class eu.sealsproject.platform.res.tool.impl.AbstractPlugin
getId, getVersion, setId, setVersion

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface eu.sealsproject.platform.res.tool.api.IPlugin
getId, getVersion

Field Details
- NEWLINE
  
  private static final String NEWLINE
- LOGGER
  
  private static final org.slf4j.Logger LOGGER
- positiveWords
  
  protected Set<String> positiveWords
  
  Set of positive words to use. Default is "yes" and some variations of it (generated with LLMBase.includeMoreVariations(java.util.Set)).
- negativeWords
  
  protected Set<String> negativeWords
  
  Set of negative words to use. Default is "no" and some variations of it (generated with LLMBase.includeMoreVariations(java.util.Set)).
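One plausible way to use the two word sets is to compare the probability mass the LLM assigns to positive versus negative answer words. The sketch below assumes a hypothetical map from candidate answer words to probabilities and normalizes the positive mass against the combined mass. It does not reproduce the variations added by `LLMBase.includeMoreVariations`, and MELT's exact computation may differ.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: binary confidence from answer-word probabilities. */
public class YesNoConfidenceSketch {

    /**
     * Returns P(positive) / (P(positive) + P(negative)), where each side is the summed
     * probability of its words. Words in neither set are ignored.
     * Returns 0.5 when neither positive nor negative words received any probability.
     */
    public static double confidence(Map<String, Double> wordProbabilities,
                                    Set<String> positiveWords,
                                    Set<String> negativeWords) {
        double positive = 0.0;
        double negative = 0.0;
        for (Map.Entry<String, Double> entry : wordProbabilities.entrySet()) {
            if (positiveWords.contains(entry.getKey())) {
                positive += entry.getValue();
            } else if (negativeWords.contains(entry.getKey())) {
                negative += entry.getValue();
            }
        }
        double total = positive + negative;
        return total == 0.0 ? 0.5 : positive / total;
    }
}
```

Normalizing over only the positive and negative words means that probability spent on unrelated words (such as "maybe") does not lower the confidence; the 0.5 fallback for "no evidence" is likewise a choice made for this sketch.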
Constructor Details
- LLMBinaryFilter
  
  public LLMBinaryFilter(TextExtractorMap extractor, String modelName, String promt)
  
  Constructor with all required parameters and default values for optional parameters (can be changed by setters). It uses the system's default temporary directory to store the files with texts generated from the knowledge graphs. PyTorch is used instead of TensorFlow, and all visible GPUs are used for prediction.
  
  Parameters:
  
  extractor - the extractor that selects which text should be used for each resource.
  
  modelName - the model name, which can be a model id (a model hosted on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter, pretrained_model_name_or_path, of the from_pretrained function in the Hugging Face transformers library). In case of a path, it should be absolute. The path can be generated by, e.g., FileUtil.getCanonicalPathIfPossible(java.io.File).
  
  promt - the prompt to use for the LLM. Use {left} and {right} to insert the text representations of the left and right concepts.
- LLMBinaryFilter
  
  public LLMBinaryFilter(TextExtractor extractor, String modelName, String promt)
  
  Constructor with all required parameters and default values for optional parameters (can be changed by setters). It uses the system's default temporary directory to store the files with texts generated from the knowledge graphs. PyTorch is used instead of TensorFlow, and all visible GPUs are used for prediction.
  
  Parameters:
  
  extractor - the extractor that selects which text should be used for each resource.
  
  modelName - the model name, which can be a model id (a model hosted on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter, pretrained_model_name_or_path, of the from_pretrained function in the Hugging Face transformers library). In case of a path, it should be absolute. The path can be generated by, e.g., FileUtil.getCanonicalPathIfPossible(java.io.File).
  
  promt - the prompt to use for the LLM. Use {left} and {right} to insert the text representations of the left and right concepts.
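Both constructors take a `promt` template containing `{left}` and `{right}`. A minimal sketch of such a substitution is shown below; the class name is hypothetical, and MELT may perform the replacement elsewhere (for example, on the Python side). Matching both placeholders in a single pass avoids re-substituting a `{right}` that happens to occur inside the left text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of filling a prompt template with {left} and {right}. */
public class PromptTemplateSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(left|right)\\}");

    /** Replaces every {left}/{right} placeholder in one pass over the template. */
    public static String fill(String template, String leftText, String rightText) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder result = new StringBuilder();
        while (matcher.find()) {
            String value = "left".equals(matcher.group(1)) ? leftText : rightText;
            // quoteReplacement keeps '$' and '\' in the inserted text literal.
            matcher.appendReplacement(result, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

For example, with an illustrative template, `fill("Do {left} and {right} mean the same? Answer yes or no.", "car", "automobile")` yields `Do car and automobile mean the same? Answer yes or no.`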
Method Details
- match
  
  public Alignment match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties properties) throws Exception
  
  Description copied from class: MatcherYAAAJena
  
  Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment. Note: This method might be called multiple times in a row when using the evaluation framework. Make sure to return a mapping which is specific to the given inputs.
  
  Specified by:
  
  match in interface IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>
  
  Specified by:
  
  match in class MatcherYAAAJena
  
  Parameters:
  
  source - This OntModel represents the source ontology.
  
  target - This OntModel represents the target ontology.
  
  inputAlignment - This mapping represents the input alignment.
  
  properties - Additional properties.
  
  Returns:
  
  The resulting alignment of the matching process.
  
  Throws:
  
  Exception - Any exception which occurs during matching.
- createPredictionFile
  
  public Map<Correspondence,List<Integer>> createPredictionFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment predictionAlignment, File outputFile, boolean append) throws IOException
  
  Create the prediction file, which is a CSV file with two columns. The first column is the text from the left resource and the second column is the text from the right resource.
  
  Parameters:
  
  source - the source model.
  
  target - the target model.
  
  predictionAlignment - the alignment to process. All correspondences which have enough text are used.
  
  outputFile - the CSV file to which the output should be written.
  
  append - if true, the examples are appended to the given file.
  
  Returns:
  
  the map which maps each correspondence to (possibly multiple) row numbers. If multipleTextsToMultipleExamples is set to true, multiple rows can correspond to one correspondence, because each text (e.g., label, comment, etc.) of the two resources is used as an example.
  
  Throws:
  
  IOException - in case the writing fails.
- getPositiveWords
  
  public Set<String> getPositiveWords()
- setPositiveWords
  
  public void setPositiveWords(Set<String> positiveWords)
- addPositiveWord
  
  public void addPositiveWord(String positiveWord)
- getNegativeWords
  
  public Set<String> getNegativeWords()
- setNegativeWords
  
  public void setNegativeWords(Set<String> negativeWords)
- addNegativeWord
  
  public void addNegativeWord(String negativeWord)
- getWordsToDetect
  
  protected List<Set<String>> getWordsToDetect()
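The return value of `createPredictionFile` can be illustrated with a small sketch that writes the two-column CSV to a `Writer` and records the row numbers per correspondence. The `Pair` class, the string ids, the quoting scheme, and the pairing of every left text with every right text when `multipleTextsToMultipleExamples` is enabled are assumptions for illustration. When it is disabled, the sketch joins the texts of each side into one example, which may not match MELT's behavior.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a two-column prediction CSV plus the correspondence-to-rows map. */
public class PredictionCsvSketch {

    /** Hypothetical stand-in for a correspondence with the texts extracted for both sides. */
    public static final class Pair {
        public final String id;
        public final List<String> leftTexts;
        public final List<String> rightTexts;

        public Pair(String id, List<String> leftTexts, List<String> rightTexts) {
            this.id = id;
            this.leftTexts = leftTexts;
            this.rightTexts = rightTexts;
        }
    }

    /**
     * Writes rows "left text,right text" and returns, per pair id, the 0-based row numbers
     * written for it. Pairs without text on either side are skipped.
     */
    public static Map<String, List<Integer>> write(List<Pair> pairs,
                                                   boolean multipleTextsToMultipleExamples,
                                                   Writer out) throws IOException {
        Map<String, List<Integer>> rowsPerPair = new LinkedHashMap<>();
        int row = 0;
        for (Pair pair : pairs) {
            if (pair.leftTexts.isEmpty() || pair.rightTexts.isEmpty()) {
                continue; // not enough text for an example
            }
            List<Integer> rows = new ArrayList<>();
            if (multipleTextsToMultipleExamples) {
                // Assumption: every left text is paired with every right text.
                for (String left : pair.leftTexts) {
                    for (String right : pair.rightTexts) {
                        writeRow(out, left, right);
                        rows.add(row++);
                    }
                }
            } else {
                // Assumption: all texts of one side are joined into a single example.
                writeRow(out, String.join(" ", pair.leftTexts), String.join(" ", pair.rightTexts));
                rows.add(row++);
            }
            rowsPerPair.put(pair.id, rows);
        }
        return rowsPerPair;
    }

    private static void writeRow(Writer out, String left, String right) throws IOException {
        out.write(quote(left));
        out.write(',');
        out.write(quote(right));
        out.write('\n');
    }

    /** RFC 4180-style quoting: wrap in double quotes and double any embedded quote. */
    static String quote(String field) {
        return '"' + field.replace("\"", "\"\"") + '"';
    }
}
```

The row map is what lets per-row predictions be folded back into one confidence per correspondence; how MELT aggregates several rows (for example, maximum or average) is not stated here, so the sketch stops at the mapping.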
