Class TransformersBaseFineTuner
java.lang.Object
eu.sealsproject.platform.res.tool.impl.AbstractPlugin
de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBaseFineTuner
- All Implemented Interfaces:
IMatcher<org.apache.jena.ontology.OntModel, Alignment, Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge
- Direct Known Subclasses:
SentenceTransformersFineTuner, TransformersFineTuner
Base class for all Transformers fine-tuners.
It only contains some shared variables together with their getters and setters.
-
Field Summary
Modifier and Type, Field, Description
protected boolean additionallySwitchSourceTarget
private static final org.slf4j.Logger LOGGER
protected static final String NEWLINE
protected File resultingModelLocation
protected File trainingFile
Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
cudaVisibleDevices, extractor, modelName, multipleTextsToMultipleExamples, multiProcessing, trainingArguments, transformersCache, usingTensorflow
Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
FILE_PREFIX, FILE_SUFFIX
-
Constructor Summary
Constructor, Description
TransformersBaseFineTuner(TextExtractorMap extractor, String initialModelName, File resultingModelLocation) - Run the training of an NLP transformer.
TransformersBaseFineTuner(TextExtractor extractor, String initialModelName, File resultingModelLocation) - Run the training of an NLP transformer.
-
Method Summary
Modifier and Type, Method, Description
void clearTrainingData() - Removes the training data.
File createTrainingFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment trainingAlignment) - Creates a new file and writes all correspondences as textual data to it.
File finetuneModel() - This method should only be used when appendOnlyToFile is set to true.
abstract File finetuneModel(File trainingFile) - Fine-tunes a given model with the provided text in the CSV file (three columns: first text, second text, label (0/1)).
File getResultingModelLocation() - Returns the final location where the fine-tuned model should be stored.
File getTrainingFile() - Returns the training file generated during multiple calls of the match method.
boolean isAdditionallySwitchSourceTarget() - Returns whether training examples are additionally added in switched order.
Alignment match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties properties) - This method only fine-tunes the model and does not match any entities.
void setAdditionallySwitchSourceTarget(boolean additionallySwitchSourceTarget) - If set to true, the training examples contain not only e.g. A,B but also B,A.
void setResultingModelLocation(File resultingModelLocation) - Sets the final location where the fine-tuned model should be stored.
protected int writeClassificationFormat(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment trainingAlignment, File trainFile, boolean append)
int writeTrainingFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment trainingAlignment, File trainFile, boolean append) - Writes the correspondences to a file (append or not can be chosen by a parameter).
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
addTrainingArgument, getCudaVisibleDevices, getCudaVisibleDevicesButOnlyOneGPU, getExamplesForBatchSizeOptimization, getExtractor, getExtractorMap, getModelName, getMultiProcessing, getTextualRepresentation, getTrainingArguments, getTransformersCache, isMultipleTextsToMultipleExamples, isOptimizeForMixedPrecisionTraining, isUsingTensorflow, setCudaVisibleDevices, setCudaVisibleDevices, setExtractor, setExtractorMap, setModelName, setMultipleTextsToMultipleExamples, setMultiProcessing, setOptimizeForMixedPrecisionTraining, setTrainingArguments, setTransformersCache, setUsingTensorflow, writeExamplesToFile
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
getModelSpec, match, readOntology
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
match
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
match
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
align, align, canExecute, getType
Methods inherited from class eu.sealsproject.platform.res.tool.impl.AbstractPlugin
getId, getVersion, setId, setVersion
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface eu.sealsproject.platform.res.tool.api.IPlugin
getId, getVersion
-
Field Details
-
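LOGGER
private static final org.slf4j.Logger LOGGER
-
NEWLINE
protected static final String NEWLINE
-
resultingModelLocation
protected File resultingModelLocation
-
trainingFile
protected File trainingFile
-
additionallySwitchSourceTarget
protected boolean additionallySwitchSourceTarget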
-
-
Constructor Details
-
TransformersBaseFineTuner
public TransformersBaseFineTuner(TextExtractorMap extractor, String initialModelName, File resultingModelLocation)
Run the training of an NLP transformer.
- Parameters:
extractor - used to extract text from a given resource. This is the text which represents a resource.
initialModelName - the initial model name for fine-tuning, which can be downloaded, or a path to a directory containing model weights (see the first parameter pretrained_model_name_or_path of the from_pretrained function in the Hugging Face library). This value can also be changed by TransformersBase.setModelName(java.lang.String).
resultingModelLocation - the final location where the fine-tuned model should be stored.
-
TransformersBaseFineTuner
public TransformersBaseFineTuner(TextExtractor extractor, String initialModelName, File resultingModelLocation)
Run the training of an NLP transformer.
- Parameters:
extractor - used to extract text from a given resource. This is the text which represents a resource.
initialModelName - the initial model name for fine-tuning, which can be downloaded, or a path to a directory containing model weights (see the first parameter pretrained_model_name_or_path of the from_pretrained function in the Hugging Face library). This value can also be changed by TransformersBase.setModelName(java.lang.String).
resultingModelLocation - the final location where the fine-tuned model should be stored.
-
-
Method Details
-
match
public Alignment match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties properties) throws Exception
This method only fine-tunes the model and does not match any entities.
- Specified by:
match in interface IMatcher<org.apache.jena.ontology.OntModel, Alignment, Properties>
- Specified by:
match in class MatcherYAAAJena
- Parameters:
source - this OntModel represents the source ontology.
target - this OntModel represents the target ontology.
inputAlignment - this mapping represents the input alignment.
properties - additional properties.
- Returns:
the resulting alignment; in this special case, the unmodified input alignment.
- Throws:
Exception - in case something goes wrong
-
createTrainingFile
public File createTrainingFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment trainingAlignment) throws IOException
Creates a new file and writes all correspondences as textual data to it. Can also be used for creating a validation file.
- Parameters:
source - the source model
target - the target model
trainingAlignment - the training alignment to be written to file.
- Returns:
the training file.
- Throws:
IOException - in case the writing fails
-
writeTrainingFile
public int writeTrainingFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment trainingAlignment, File trainFile, boolean append) throws IOException
Writes the correspondences to a file (append or not can be chosen by a parameter).
- Parameters:
source - the source model
target - the target model
trainingAlignment - the training alignment to be written to file.
trainFile - the file to which all texts are written
append - true if all content should be appended to the file.
- Returns:
how many correspondences were written to the file.
- Throws:
IOException - in case the writing fails
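The effect of the append parameter and of the returned count can be illustrated with a minimal, self-contained sketch. It does not use MELT: the class TrainingFileSketch, the method writeLines, and the pre-rendered row strings are hypothetical stand-ins, and the real method derives its rows from the ontologies and the alignment.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical stand-in, not part of MELT: shows append-vs-overwrite semantics only.
class TrainingFileSketch {

    // Writes the given rows to trainFile and returns how many rows were written.
    // append == true keeps the existing content; append == false replaces it.
    static int writeLines(List<String> rows, Path trainFile, boolean append) throws IOException {
        StandardOpenOption mode = append
                ? StandardOpenOption.APPEND
                : StandardOpenOption.TRUNCATE_EXISTING;
        Files.write(trainFile, rows, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, mode);
        return rows.size();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("train", ".csv");
        writeLines(List.of("a,b,1"), file, false); // start a fresh file
        writeLines(List.of("c,d,0"), file, true);  // add to the same file
        System.out.println(Files.readAllLines(file).size() + " rows in " + file);
    }
}
```

Appending is what allows several matching tasks to contribute to one training file before a single fine-tuning run.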
-
writeClassificationFormat
protected int writeClassificationFormat(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment trainingAlignment, File trainFile, boolean append) throws IOException
- Throws:
IOException
-
finetuneModel
public abstract File finetuneModel(File trainingFile) throws Exception
Fine-tunes a given model with the provided text in the CSV file (three columns: first text, second text, label (0/1)).
- Parameters:
trainingFile - CSV file with three columns: first text, second text, label (0/1) (can be generated with createTrainingFile(OntModel, OntModel, Alignment))
- Returns:
the final location (directory) of the fine-tuned model (which is also given in the constructor)
- Throws:
Exception - in case of any error
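The three-column layout expected in the training file can be sketched as follows. This is an illustration only and does not use MELT: the class ClassificationRowSketch and its methods are hypothetical, the RFC 4180 style quoting is an assumption (the quoting used by the library is not stated here), and reading label 1 as a positive example and 0 as a negative one is likewise an assumption.

```java
// Hypothetical helper, not part of MELT: renders one training example as a CSV row.
class ClassificationRowSketch {

    // Wraps a field in double quotes and doubles embedded quotes
    // (RFC 4180 style; assumed, not confirmed for MELT).
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Three columns: first text, second text, label (assumed: 1 = positive, 0 = negative).
    static String toRow(String firstText, String secondText, boolean isMatch) {
        return quote(firstText) + "," + quote(secondText) + "," + (isMatch ? "1" : "0");
    }

    public static void main(String[] args) {
        System.out.println(toRow("heart", "cardiac muscle, human", true));
        System.out.println(toRow("heart", "the \"lung\"", false));
    }
}
```

Quoting every text field keeps commas and quotes inside labels or comments from shifting the column boundaries.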
-
finetuneModel
public File finetuneModel() throws Exception
This method should only be used when appendOnlyToFile is set to true. It trains a transformers model on the file generated by the match method during possibly multiple calls.
- Returns:
the final location (directory) of the fine-tuned model (which is also given in the constructor)
- Throws:
Exception - in case of any error
-
clearTrainingData
public void clearTrainingData()
Removes the training data.
-
getTrainingFile
public File getTrainingFile()
Returns the training file generated during multiple calls of the match method. If the match method has not been called yet, this returns a non-existent file.
- Returns:
the training file
-
getResultingModelLocation
public File getResultingModelLocation()
Returns the final location where the fine-tuned model should be stored.
- Returns:
the location for the trained model.
-
setResultingModelLocation
public void setResultingModelLocation(File resultingModelLocation)
Sets the final location where the fine-tuned model should be stored.
- Parameters:
resultingModelLocation - the model location as a file.
-
isAdditionallySwitchSourceTarget
public boolean isAdditionallySwitchSourceTarget()
Returns whether training examples are additionally added in switched order. If true, the training examples contain not only e.g. A,B but also B,A, because positive and negative examples still hold when the order is changed. This doubles the number of training examples.
- Returns:
true if training examples are additionally added in switched order
-
setAdditionallySwitchSourceTarget
public void setAdditionallySwitchSourceTarget(boolean additionallySwitchSourceTarget)
If set to true, the training examples contain not only e.g. A,B but also B,A, because positive and negative examples still hold when the order is changed. This doubles the number of training examples. The default is false.
- Parameters:
additionallySwitchSourceTarget - true if source and target should additionally be switched.
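The doubling described above can be sketched without MELT. The class SwitchSourceTargetSketch, its nested Example type, and the method withSwitched are hypothetical names used for illustration; the order in which the library emits the switched examples is not specified here, so placing them after the originals is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of MELT.
class SwitchSourceTargetSketch {

    // One training example: two texts and a label (0 or 1).
    static final class Example {
        final String first;
        final String second;
        final int label;

        Example(String first, String second, int label) {
            this.first = first;
            this.second = second;
            this.label = label;
        }
    }

    // Returns the original examples followed by a switched copy of each one.
    // The label is kept, since a pair that matches as A,B also matches as B,A.
    static List<Example> withSwitched(List<Example> examples) {
        List<Example> result = new ArrayList<>(examples);
        for (Example e : examples) {
            result.add(new Example(e.second, e.first, e.label));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Example> doubled = withSwitched(List.of(
                new Example("A", "B", 1),
                new Example("C", "D", 0)));
        for (Example e : doubled) {
            System.out.println(e.first + "," + e.second + "," + e.label);
        }
    }
}
```

Because the number of examples doubles, training time per epoch grows accordingly, which is worth weighing before enabling the option.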
-