eu.sealsproject.platform.res.tool.impl.AbstractPlugin

de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersFilter

All Implemented Interfaces: Filter, IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge

Direct Known Subclasses: RelationTypePredictor

public class TransformersFilter extends TransformersBase implements Filter

For every correspondence in the input alignment, this filter extracts the text of both resources (using the specified, customizable extractor). The two texts are fed into the specified transformer model, and the prediction is added to the correspondence as a confidence. No filtering is applied in this class.

Field Summary

Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| protected BatchSizeOptimization | batchSizeOptimization | |
| protected boolean | changeClass | |
| private static final org.slf4j.Logger | LOGGER | |
| private static final String | NEWLINE | |

Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
cudaVisibleDevices, extractor, modelName, multipleTextsToMultipleExamples, multiProcessing, trainingArguments, transformersCache, usingTensorflow

Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
FILE_PREFIX, FILE_SUFFIX
Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| TransformersFilter(TextExtractorMap extractor, String modelName) | Constructor with all required parameters and default values for optional parameters (can be changed by setters). |
| TransformersFilter(TextExtractor extractor, String modelName) | Constructor with all required parameters and default values for optional parameters (can be changed by setters). |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| Map<Correspondence,List<Integer>> | createPredictionFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment predictionAlignment, File outputFile, boolean append) | Creates the prediction file, which is a CSV file with two columns. The first column is the text from the left resource and the second column is the text from the right resource. |
| BatchSizeOptimization | getBatchSizeOptimization() | Returns how the batch size is optimized. |
| protected int | getMaximumPerDeviceEvalBatchSize(File trainingFile) | This function tries to execute the prediction with a number of examples equal to the tested batch size. |
| boolean | isChangeClass() | Returns true if the class is changed in the classification. |
| boolean | isOptimizeAll() | Returns whether all optimization techniques are enabled or disabled. |
| boolean | isOptimizeBatchSize() | Deprecated. Use getBatchSizeOptimization instead. |
| Alignment | match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties properties) | Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment. |
| List<Double> | predictConfidences(File predictionFilePath) | Runs the Hugging Face transformers library on the given prediction file. |
| void | setBatchSizeOptimization(BatchSizeOptimization batchSizeOptimization) | Sets how the batch size is optimized. |
| void | setChangeClass(boolean changeClass) | If set to true, the class is changed in the classification. |
| void | setOptimizeAll(boolean optimize) | Enables or disables all possible optimizations to improve prediction speed. |
| void | setOptimizeBatchSize(boolean optimizeBatchSize) | Deprecated. Use setBatchSizeOptimization instead. |

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
addTrainingArgument, getCudaVisibleDevices, getCudaVisibleDevicesButOnlyOneGPU, getExamplesForBatchSizeOptimization, getExtractor, getExtractorMap, getModelName, getMultiProcessing, getTextualRepresentation, getTrainingArguments, getTransformersCache, isMultipleTextsToMultipleExamples, isOptimizeForMixedPrecisionTraining, isUsingTensorflow, setCudaVisibleDevices, setCudaVisibleDevices, setExtractor, setExtractorMap, setModelName, setMultipleTextsToMultipleExamples, setMultiProcessing, setOptimizeForMixedPrecisionTraining, setTrainingArguments, setTransformersCache, setUsingTensorflow, writeExamplesToFile

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
getModelSpec, match, readOntology

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
match

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
match

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
align, align, canExecute, getType

Methods inherited from class eu.sealsproject.platform.res.tool.impl.AbstractPlugin
getId, getVersion, setId, setVersion

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface eu.sealsproject.platform.res.tool.api.IPlugin
getId, getVersion

Field Details
- LOGGER
  
  private static final org.slf4j.Logger LOGGER
- NEWLINE
  
  private static final String NEWLINE
- changeClass
  
  protected boolean changeClass
- batchSizeOptimization
  
  protected BatchSizeOptimization batchSizeOptimization
Constructor Details
- TransformersFilter
  
  public TransformersFilter(TextExtractor extractor, String modelName)
  
  Constructor with all required parameters and default values for optional parameters (can be changed by setters). It uses the system's default temporary directory to store the files with texts generated from the knowledge graphs. PyTorch is used instead of TensorFlow, and all visible GPUs are used for prediction.
  
  Parameters:
  
  extractor - the extractor that selects which text should be used for each resource.
  
  modelName - the model name, which can be a model id (a hosted model on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter, pretrained_model_name_or_path, of the from_pretrained function in the Hugging Face transformers library). If it is a path, it should be absolute. The path can be generated by e.g. FileUtil.getCanonicalPathIfPossible(java.io.File)
- TransformersFilter
  
  public TransformersFilter(TextExtractorMap extractor, String modelName)
  
  Constructor with all required parameters and default values for optional parameters (can be changed by setters). It uses the system's default temporary directory to store the files with texts generated from the knowledge graphs. PyTorch is used instead of TensorFlow, and all visible GPUs are used for prediction.
  
  Parameters:
  
  extractor - the extractor that selects which text should be used for each resource.
  
  modelName - the model name, which can be a model id (a hosted model on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter, pretrained_model_name_or_path, of the from_pretrained function in the Hugging Face transformers library). If it is a path, it should be absolute. The path can be generated by e.g. FileUtil.getCanonicalPathIfPossible(java.io.File)
Method Details
- match
  
  public Alignment match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties properties) throws Exception
  
  Description copied from class: MatcherYAAAJena
  
  Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment. Note: This method might be called multiple times in a row when using the evaluation framework. Make sure to return a mapping which is specific to the given inputs.
  
  Specified by:
  
  match in interface IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>
  
  Specified by:
  
  match in class MatcherYAAAJena
  
  Parameters:
  
  source - This OntModel represents the source ontology.
  
  target - This OntModel represents the target ontology.
  
  inputAlignment - This mapping represents the input alignment.
  
  properties - Additional properties.
  
  Returns:
  
  The resulting alignment of the matching process.
  
  Throws:
  
  Exception - Any exception which occurs during matching.
- createPredictionFile
  
  public Map<Correspondence,List<Integer>> createPredictionFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment predictionAlignment, File outputFile, boolean append) throws IOException
  
  Creates the prediction file, which is a CSV file with two columns. The first column is the text from the left resource and the second column is the text from the right resource.
  
  Parameters:
  
  source - The source model
  
  target - The target model
  
  predictionAlignment - the alignment to process. All correspondences which have enough text are used.
  
  outputFile - the CSV file to which the output is written.
  
  append - if true, the alignment is appended to the given file.
  
  Returns:
  
  a map from each correspondence to its (possibly multiple) row numbers. If multipleTextsToMultipleExamples is set to true, multiple rows can correspond to one correspondence, because each text (e.g. label, comment, etc.) of the two resources is used as an example.
  
  Throws:
  
  IOException - in case the writing fails.
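
The relationship between correspondences and CSV rows described above can be illustrated with a small, self-contained sketch. It does not use the MELT classes: correspondences are represented by plain string keys, the output goes to an in-memory string instead of a file, and `PredictionRows`, `add`, `csv` and `rows` are hypothetical names. It also assumes that rows are numbered from 0 and that, with multiple texts per resource, every left text is paired with every right text; the actual implementation may differ on both points.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in that builds a two-column CSV and tracks rows per correspondence. */
public class PredictionRows {
    private final StringBuilder csv = new StringBuilder();
    private final Map<String, List<Integer>> rowsByCorrespondence = new LinkedHashMap<>();
    private int nextRow = 0; // assumption: row numbers start at 0

    /** Quotes a CSV field: wrap in double quotes and double any inner quote. */
    static String csvField(String text) {
        return "\"" + text.replace("\"", "\"\"") + "\"";
    }

    /**
     * Adds one row per (left text, right text) pair. A correspondence without
     * text on either side produces no rows and no map entry.
     */
    public void add(String correspondenceKey, List<String> leftTexts, List<String> rightTexts) {
        for (String left : leftTexts) {
            for (String right : rightTexts) {
                csv.append(csvField(left)).append(',').append(csvField(right)).append('\n');
                rowsByCorrespondence
                        .computeIfAbsent(correspondenceKey, k -> new ArrayList<>())
                        .add(nextRow++);
            }
        }
    }

    public String csv() {
        return csv.toString();
    }

    public Map<String, List<Integer>> rows() {
        return rowsByCorrespondence;
    }

    public static void main(String[] args) {
        PredictionRows file = new PredictionRows();
        // Two texts on the left (label and comment) and one on the right: two rows.
        file.add("c1", Arrays.asList("Person", "a human being"), Arrays.asList("Human"));
        file.add("c2", Arrays.asList("Author"), Arrays.asList("Writer"));
        System.out.print(file.csv());
        System.out.println(file.rows()); // {c1=[0, 1], c2=[2]}
    }
}
```

The returned map is what allows a per-row prediction to be traced back to the correspondence it belongs to.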
- predictConfidences
  
  public List<Double> predictConfidences(File predictionFilePath) throws Exception
  
  Runs the Hugging Face transformers library on the given prediction file.
  
  Parameters:
  
  predictionFilePath - path to a CSV file with two columns (text left and text right).
  
  Returns:
  
  a list of confidences
  
  Throws:
  
  Exception - in case something goes wrong.
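
Because createPredictionFile can map one correspondence to several rows, the confidences returned here may need to be combined per correspondence. The following sketch shows one way this could be done. It assumes that the list holds one confidence per row of the prediction file and that the maximum over all rows is taken; both are assumptions for illustration, not a statement about what match does internally. `ConfidenceAggregation` and `maxPerCorrespondence` are hypothetical names, and correspondences are represented by string keys.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that combines per-row confidences into one value per correspondence. */
public class ConfidenceAggregation {

    /**
     * For each correspondence, returns the maximum confidence over its rows.
     * Correspondences without rows are skipped.
     */
    public static Map<String, Double> maxPerCorrespondence(
            Map<String, List<Integer>> rowsByCorrespondence, List<Double> confidences) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : rowsByCorrespondence.entrySet()) {
            if (entry.getValue().isEmpty()) {
                continue;
            }
            double best = Double.NEGATIVE_INFINITY;
            for (int row : entry.getValue()) {
                best = Math.max(best, confidences.get(row));
            }
            result.put(entry.getKey(), best);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> rows = new LinkedHashMap<>();
        rows.put("c1", Arrays.asList(0, 1)); // two text pairs for the same correspondence
        rows.put("c2", Arrays.asList(2));
        List<Double> confidences = Arrays.asList(0.2, 0.9, 0.4);
        System.out.println(maxPerCorrespondence(rows, confidences)); // {c1=0.9, c2=0.4}
    }
}
```

Other aggregations (mean, minimum) would fit the same structure; the maximum is used here only because it treats a correspondence as plausible when any one of its text pairs is.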
- getMaximumPerDeviceEvalBatchSize
  
  protected int getMaximumPerDeviceEvalBatchSize(File trainingFile)
  
  This function tries to execute the prediction with a number of examples equal to the tested batch size. It starts with 2 and checks only powers of 2.
  
  Parameters:
  
  trainingFile - the training file to use
  
  Returns:
  
  the maximum per_device_eval_batch_size
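
The search strategy described above (start at 2, test only powers of 2) can be sketched as follows. This is not the MELT implementation: the trial run is replaced by a caller-supplied predicate, and the upper limit and the fallback value of 1 when even a batch size of 2 fails are assumptions made for the sketch. `BatchSizeSearch` and `findMaxBatchSize` are hypothetical names.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch of a power-of-two search for the largest workable batch size. */
public class BatchSizeSearch {

    /**
     * Tries batch sizes 2, 4, 8, ... up to upperLimit and returns the last one
     * for which the trial succeeded. Returns 1 if the first trial already fails.
     */
    public static int findMaxBatchSize(IntPredicate trialSucceeds, int upperLimit) {
        int best = 1; // assumption: fall back to 1 if a batch size of 2 does not work
        // The candidate > 0 check guards against integer overflow when doubling.
        for (int candidate = 2; candidate > 0 && candidate <= upperLimit; candidate *= 2) {
            if (!trialSucceeds.test(candidate)) {
                break;
            }
            best = candidate;
        }
        return best;
    }

    public static void main(String[] args) {
        // Stand-in for a real trial run: pretend the device runs out of memory
        // above 48 examples per batch.
        IntPredicate fitsInMemory = batchSize -> batchSize <= 48;
        System.out.println(findMaxBatchSize(fitsInMemory, 1024)); // 32
    }
}
```

Testing only powers of 2 keeps the number of trial runs logarithmic in the final batch size, at the cost of possibly stopping below the true maximum (32 instead of 48 in the example).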
- isChangeClass
  
  public boolean isChangeClass()
  
  Returns true if the class is changed in the classification. This is useful if a pretrained model predicts exactly the opposite class.
  
  Returns:
  
  true if the class is changed in the classification.
- setChangeClass
  
  public void setChangeClass(boolean changeClass)
  
  If set to true, the class is changed in the classification. This is useful if a pretrained model predicts exactly the opposite class.
  
  Parameters:
  
  changeClass - true if the class should be changed in the classification.
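
To illustrate what changing the class could mean in practice, the sketch below assumes a binary classifier that outputs two class probabilities and assumes that, by default, the probability at index 1 is read as the match confidence. With the flag set, index 0 is read instead. The index convention, `ChangeClassSketch` and `matchConfidence` are assumptions for illustration and are not taken from the MELT source.

```java
/** Hypothetical illustration of reading the confidence from the opposite class. */
public class ChangeClassSketch {

    /**
     * Returns the probability interpreted as "the two texts match".
     * Assumption: index 1 is the match class unless changeClass is true.
     */
    public static double matchConfidence(double[] classProbabilities, boolean changeClass) {
        int matchIndex = changeClass ? 0 : 1;
        return classProbabilities[matchIndex];
    }

    public static void main(String[] args) {
        double[] probabilities = {0.3, 0.7};
        System.out.println(matchConfidence(probabilities, false)); // 0.7
        System.out.println(matchConfidence(probabilities, true));  // 0.3
    }
}
```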
- isOptimizeBatchSize
  
  public boolean isOptimizeBatchSize()
  
  Deprecated.
  Use getBatchSizeOptimization instead.
  
  Return true if batch size optimization is turned on.
  
  Returns:
  
  true if batch size optimization is turned on.
- setOptimizeBatchSize
  
  public void setOptimizeBatchSize(boolean optimizeBatchSize)
  
  Deprecated.
  Use setBatchSizeOptimization instead.
  
  Sets whether the batch size should be optimized before running the prediction. This should only be set to true if the dataset is huge; otherwise, the search for the largest batch size takes too much time.
  
  Parameters:
  
  optimizeBatchSize - if true, optimize the batch size every time the match method is called.
- getBatchSizeOptimization
  
  public BatchSizeOptimization getBatchSizeOptimization()
  
  Returns how the batch size is optimized.
  
  Returns:
  
  how the batch size is optimized
- setBatchSizeOptimization
  
  public void setBatchSizeOptimization(BatchSizeOptimization batchSizeOptimization)
  
  Sets how the batch size is optimized.
  
  Parameters:
  
  batchSizeOptimization - how the batch size is optimized
- setOptimizeAll
  
  public void setOptimizeAll(boolean optimize)
  
  Enables or disables all possible optimizations to improve prediction speed. Currently this includes mixed precision training and batch size optimization.
  
  Parameters:
  
  optimize - true to enable
- isOptimizeAll
  
  public boolean isOptimizeAll()
  
  Returns whether all optimization techniques are enabled or disabled.
  
  Returns:
  
  true if enabled.
