Class TransformersFilter
java.lang.Object
eu.sealsproject.platform.res.tool.impl.AbstractPlugin
de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersFilter
- All Implemented Interfaces:
Filter, IMatcher<org.apache.jena.ontology.OntModel, Alignment, Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge
- Direct Known Subclasses:
RelationTypePredictor
For every correspondence in the input alignment, this filter extracts the text of both resources (using the specified, customizable extractor).
The texts of the two resources are fed into the specified transformer model, and the prediction is added to the correspondence as a confidence.
No filtering is applied in this class.
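Because this class only attaches confidences and removes nothing, any cut has to happen in a later step. The following is a minimal sketch of such a step in plain Java, not the library's own filter: the class name `ThresholdCutSketch`, the method `keepAtOrAbove`, and the use of plain string keys in place of correspondences are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the documented API.
final class ThresholdCutSketch {

    /** Returns a new map with the entries whose confidence is at least the threshold. */
    static Map<String, Double> keepAtOrAbove(Map<String, Double> scored, double threshold) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : scored.entrySet()) {
            if (entry.getValue() >= threshold) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept; // the input map is left unchanged
    }
}
```

The threshold itself is a free choice of the caller; nothing in this class prescribes a value.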
-
Field Summary
Modifier and Type, Field:
protected BatchSizeOptimization batchSizeOptimization
protected boolean changeClass
private static final org.slf4j.Logger LOGGER
private static final String NEWLINE
Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
cudaVisibleDevices, extractor, modelName, multipleTextsToMultipleExamples, multiProcessing, trainingArguments, transformersCache, usingTensorflow
Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
FILE_PREFIX, FILE_SUFFIX
-
Constructor Summary
Constructor, Description:
TransformersFilter(TextExtractorMap extractor, String modelName): Constructor with all required parameters and default values for optional parameters (can be changed by setters).
TransformersFilter(TextExtractor extractor, String modelName): Constructor with all required parameters and default values for optional parameters (can be changed by setters).
-
Method Summary
Modifier and Type, Method, Description:
Map<Correspondence,List<Integer>> createPredictionFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment predictionAlignment, File outputFile, boolean append): Create the prediction file, which is a CSV file with two columns. The first column is the text from the left resource and the second column is the text from the right resource.
getBatchSizeOptimization(): Returns how the batch size is optimized.
protected int getMaximumPerDeviceEvalBatchSize(File trainingFile): This function tries to execute the prediction with a number of examples equal to the tested batch size.
boolean isChangeClass(): Returns true if the class is changed in the classification.
boolean isOptimizeAll(): Returns whether all optimization techniques are enabled or disabled.
boolean isOptimizeBatchSize(): Deprecated.
Alignment match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties properties): Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment.
predictConfidences(File predictionFilePath): Run huggingface transformers library.
void setBatchSizeOptimization(BatchSizeOptimization batchSizeOptimization): Sets how the batch size is optimized.
void setChangeClass(boolean changeClass): If set to true, the class is changed in the classification.
void setOptimizeAll(boolean optimize): Enables or disables all possible optimizations to improve prediction speed.
void setOptimizeBatchSize(boolean optimizeBatchSize): Deprecated. Better use setBatchSizeOptimization.
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
addTrainingArgument, getCudaVisibleDevices, getCudaVisibleDevicesButOnlyOneGPU, getExamplesForBatchSizeOptimization, getExtractor, getExtractorMap, getModelName, getMultiProcessing, getTextualRepresentation, getTrainingArguments, getTransformersCache, isMultipleTextsToMultipleExamples, isOptimizeForMixedPrecisionTraining, isUsingTensorflow, setCudaVisibleDevices, setCudaVisibleDevices, setExtractor, setExtractorMap, setModelName, setMultipleTextsToMultipleExamples, setMultiProcessing, setOptimizeForMixedPrecisionTraining, setTrainingArguments, setTransformersCache, setUsingTensorflow, writeExamplesToFile
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
getModelSpec, match, readOntology
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
match
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
match
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
align, align, canExecute, getType
Methods inherited from class eu.sealsproject.platform.res.tool.impl.AbstractPlugin
getId, getVersion, setId, setVersion
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface eu.sealsproject.platform.res.tool.api.IPlugin
getId, getVersion
-
Field Details
-
LOGGER
private static final org.slf4j.Logger LOGGER -
NEWLINE
-
changeClass
protected boolean changeClass -
batchSizeOptimization
-
-
Constructor Details
-
TransformersFilter
Constructor with all required parameters and default values for optional parameters (can be changed by setters). It uses the system's default tmp dir to store the files with texts generated from the knowledge graphs. PyTorch is used instead of TensorFlow, and all visible GPUs are used for prediction.
- Parameters:
extractor - the extractor to select which text for each resource should be used.
modelName - the model name, which can be a model id (a hosted model on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter pretrained_model_name_or_path of the from_pretrained function in the huggingface library). In case of a path, it should be absolute. The path can be generated by e.g. FileUtil.getCanonicalPathIfPossible(java.io.File)
-
TransformersFilter
Constructor with all required parameters and default values for optional parameters (can be changed by setters). It uses the system's default tmp dir to store the files with texts generated from the knowledge graphs. PyTorch is used instead of TensorFlow, and all visible GPUs are used for prediction.
- Parameters:
extractor - the extractor to select which text for each resource should be used.
modelName - the model name, which can be a model id (a hosted model on huggingface.co) or a path to a directory containing a model and tokenizer (see the first parameter pretrained_model_name_or_path of the from_pretrained function in the huggingface library). In case of a path, it should be absolute. The path can be generated by e.g. FileUtil.getCanonicalPathIfPossible(java.io.File)
-
-
Method Details
-
match
public Alignment match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties properties) throws Exception
Description copied from class: MatcherYAAAJena
Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment. Note: This method might be called multiple times in a row when using the evaluation framework. Make sure to return a mapping which is specific to the given inputs.
- Specified by:
match in interface IMatcher<org.apache.jena.ontology.OntModel, Alignment, Properties>
- Specified by:
match in class MatcherYAAAJena
- Parameters:
source - This OntModel represents the source ontology.
target - This OntModel represents the target ontology.
inputAlignment - This mapping represents the input alignment.
properties - Additional properties.
- Returns:
The resulting alignment of the matching process.
- Throws:
Exception - Any exception which occurs during matching.
-
createPredictionFile
public Map<Correspondence,List<Integer>> createPredictionFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment predictionAlignment, File outputFile, boolean append) throws IOException
Create the prediction file, which is a CSV file with two columns. The first column is the text from the left resource and the second column is the text from the right resource.
- Parameters:
source - The source model.
target - The target model.
predictionAlignment - the alignment to process. All correspondences which have enough text are used.
outputFile - the CSV file to which the output should be written.
append - if true, the examples are appended to the given file.
- Returns:
the map which maps each correspondence to (possibly multiple) row numbers. If multipleTextsToMultipleExamples is set to true, multiple rows can correspond to one correspondence, because each text (e.g. label, comment etc.) of the two resources is used as an example.
- Throws:
IOException - in case the writing fails.
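The relation between correspondences and row numbers can be sketched in plain Java. This is an illustration under stated assumptions, not the library's implementation: the class `PredictionRowsSketch` and the method `buildRows` are hypothetical, a correspondence is stood in for by a `Map.Entry` of two resource identifiers, and pairing every left text with every right text is an assumed reading of "each text of the two resources is used as an example".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the documented API.
final class PredictionRowsSketch {

    /**
     * Appends (left text, right text) rows to rowsOut and returns, per pair,
     * the indices of the rows that were written for it.
     */
    static Map<Map.Entry<String, String>, List<Integer>> buildRows(
            List<Map.Entry<String, String>> pairs,
            Map<String, List<String>> textsByResource,
            boolean multipleExamples,
            List<String[]> rowsOut) {
        Map<Map.Entry<String, String>, List<Integer>> rowIndex = new LinkedHashMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            List<String> left = textsByResource.getOrDefault(pair.getKey(), List.of());
            List<String> right = textsByResource.getOrDefault(pair.getValue(), List.of());
            if (left.isEmpty() || right.isEmpty()) {
                continue; // not enough text: this pair produces no row
            }
            List<Integer> rows = new ArrayList<>();
            if (multipleExamples) {
                // Assumption: every left text is paired with every right text.
                for (String l : left) {
                    for (String r : right) {
                        rows.add(rowsOut.size());
                        rowsOut.add(new String[] {l, r});
                    }
                }
            } else {
                // Assumption: all texts of one side are joined into one example.
                rows.add(rowsOut.size());
                rowsOut.add(new String[] {String.join(" ", left), String.join(" ", right)});
            }
            rowIndex.put(pair, rows);
        }
        return rowIndex;
    }
}
```

Writing `rowsOut` to disk as CSV (with proper quoting) is left out here; the point is only that one correspondence may own several row numbers.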
-
predictConfidences
Run huggingface transformers library.
- Parameters:
predictionFilePath - path to the CSV file with two columns (text left and text right).
- Returns:
a list of confidences
- Throws:
Exception - in case something goes wrong.
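Since one correspondence can own several rows of the prediction file, the per-row confidences have to be folded back into one value per correspondence. The sketch below takes the maximum over the rows; that choice is an assumption made for illustration, and the class `ConfidenceAggregationSketch` and method `aggregate` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the documented API.
final class ConfidenceAggregationSketch {

    /** Maps each key to the highest confidence among the rows that belong to it. */
    static <K> Map<K, Double> aggregate(Map<K, List<Integer>> rowIndex, List<Double> rowConfidences) {
        Map<K, Double> result = new LinkedHashMap<>();
        for (Map.Entry<K, List<Integer>> entry : rowIndex.entrySet()) {
            double best = 0.0; // assumes confidences lie in [0, 1]
            for (int row : entry.getValue()) {
                best = Math.max(best, rowConfidences.get(row));
            }
            result.put(entry.getKey(), best);
        }
        return result;
    }
}
```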
-
getMaximumPerDeviceEvalBatchSize
This function tries to execute the prediction with a number of examples equal to the tested batch size. It starts with 2 and checks only powers of 2.
- Parameters:
trainingFile - the training file to use
- Returns:
the maximum per_device_eval_batch_size
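The search described above (start at 2, test only powers of 2) can be sketched as a doubling loop. This is a simplified illustration, not the library's code: `BatchSizeSearchSketch`, `findMaxBatchSize`, the `fitsInMemory` predicate that stands in for a trial prediction run, the `upperBound` cap, and the fallback value 1 are all assumptions.

```java
import java.util.function.IntPredicate;

// Hypothetical illustration only; not part of the documented API.
final class BatchSizeSearchSketch {

    /**
     * Doubles the batch size, starting at 2, until a trial fails or the cap is passed.
     * Returns the largest size that succeeded, or 1 if even size 2 failed.
     */
    static int findMaxBatchSize(IntPredicate fitsInMemory, int upperBound) {
        int best = 1;
        // "size > 0" guards against integer overflow when doubling.
        for (int size = 2; size > 0 && size <= upperBound; size *= 2) {
            if (!fitsInMemory.test(size)) {
                break; // the first failing power of 2 ends the search
            }
            best = size;
        }
        return best;
    }

    public static void main(String[] args) {
        // Simulated device that fails above 48 examples per batch.
        System.out.println(findMaxBatchSize(size -> size <= 48, 1024)); // prints 32
    }
}
```

Each trial costs a real prediction run, which is why the search tests only a handful of sizes instead of every integer.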
-
isChangeClass
public boolean isChangeClass()
Returns true if the class is changed in the classification. This is useful if a pretrained model predicts exactly the opposite class.
- Returns:
true if the class is changed in the classification.
-
setChangeClass
public void setChangeClass(boolean changeClass)
If set to true, the class is changed in the classification. This is useful if a pretrained model predicts exactly the opposite class.
- Parameters:
changeClass - true if the class should be changed in the classification.
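For a two-class model, changing the class amounts to reading the probability of the other class. A minimal sketch, assuming a binary classifier whose two class probabilities sum to 1; the class `ChangeClassSketch` and the method `confidence` are hypothetical and only illustrate the idea.

```java
// Hypothetical illustration only; not part of the documented API.
final class ChangeClassSketch {

    /**
     * Returns the confidence to attach to a correspondence.
     * With changeClass == true the probability of the opposite class is used.
     */
    static double confidence(double positiveClassProbability, boolean changeClass) {
        return changeClass ? 1.0 - positiveClassProbability : positiveClassProbability;
    }
}
```

A model trained with label 0 meaning "match" would therefore need `changeClass` set to true so that high confidences still indicate matches.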
-
isOptimizeBatchSize
public boolean isOptimizeBatchSize()
Deprecated. Better use getBatchSizeOptimization.
Returns true if batch size optimization is turned on.
- Returns:
true if batch size optimization is turned on.
-
setOptimizeBatchSize
public void setOptimizeBatchSize(boolean optimizeBatchSize)
Deprecated. Better use setBatchSizeOptimization.
Sets whether the batch size should be optimized before running the prediction. This should only be set to true if the dataset is huge. Otherwise, the search for the largest batch size takes too much time.
- Parameters:
optimizeBatchSize - if true, optimize the batch size every time the match method is called.
-
getBatchSizeOptimization
Returns how the batch size is optimized.
- Returns:
how the batch size is optimized
-
setBatchSizeOptimization
Sets how the batch size is optimized.
- Parameters:
batchSizeOptimization - how the batch size is optimized
-
setOptimizeAll
public void setOptimizeAll(boolean optimize)
Enables or disables all possible optimizations to improve prediction speed. Currently this includes mixed precision training and batch size optimization.
- Parameters:
optimize - true to enable
-
isOptimizeAll
public boolean isOptimizeAll()
Returns whether all optimization techniques are enabled or disabled.
- Returns:
true if enabled.
-