Class SentenceTransformersFineTuner
java.lang.Object
eu.sealsproject.platform.res.tool.impl.AbstractPlugin
de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBaseFineTuner
de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.SentenceTransformersFineTuner
- All Implemented Interfaces:
IMatcher<org.apache.jena.ontology.OntModel, Alignment, Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge
This matcher uses the Sentence Transformers library to embed each resource in a vector space, based on a textual representation of that resource.
It therefore does not filter an existing alignment; it generates matching candidates from the text.
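To make the idea of generating candidates from an embedding space concrete, here is a minimal sketch. It assumes the embeddings are already available as plain `double[]` vectors and that cosine similarity is the comparison measure; the class `EmbeddingCandidateSketch` and its methods are hypothetical and not part of MELT or Sentence Transformers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not MELT code: picks the most similar target for each source. */
class EmbeddingCandidateSketch {

    /** Cosine similarity of two vectors of equal length (assumed non-zero). */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Maps every source key to the target key with the highest cosine similarity. */
    static Map<String, String> bestCandidates(Map<String, double[]> source,
                                              Map<String, double[]> target) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> s : source.entrySet()) {
            String best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (Map.Entry<String, double[]> t : target.entrySet()) {
                double score = cosine(s.getValue(), t.getValue());
                if (score > bestScore) {
                    bestScore = score;
                    best = t.getKey();
                }
            }
            if (best != null) {
                result.put(s.getKey(), best);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy two-dimensional "embeddings"; real models produce hundreds of dimensions.
        Map<String, double[]> source = Map.of("s:Heart", new double[]{1.0, 0.0});
        Map<String, double[]> target = Map.of(
                "t:Cor", new double[]{0.9, 0.1},
                "t:Liver", new double[]{0.0, 1.0});
        System.out.println(bestCandidates(source, target));
    }
}
```

A real matcher would usually keep the top k targets per source rather than only the best one; the single-best variant is used here only to keep the sketch short.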
-
Field Summary
Fields (modifier and type, field, description):
private static final org.slf4j.Logger LOGGER
private SentenceTransformersLoss loss
private static final String NEWLINE
private int numberOfEpochs
private int testBatchSize
private float testSize - A number between zero and one which represents the proportion of the data to include in the test split.
private int trainBatchSize
Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBaseFineTuner
additionallySwitchSourceTarget, resultingModelLocation, trainingFile
Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
cudaVisibleDevices, extractor, modelName, multipleTextsToMultipleExamples, multiProcessing, trainingArguments, transformersCache, usingTensorflow
Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
FILE_PREFIX, FILE_SUFFIX
-
Constructor Summary
Constructors (constructor, description):
SentenceTransformersFineTuner(TextExtractorMap extractor, String initialModelName, File resultingModelLocation) - Run the training of an NLP sentence transformer.
SentenceTransformersFineTuner(TextExtractor extractor, String initialModelName, File resultingModelLocation) - Run the training of an NLP sentence transformer.
-
Method Summary
Methods (modifier and type, method, description):
finetuneModel(File trainingFile) - Finetune a given model with the provided text in the CSV file (three columns: first text, second text, label (0/1)).
float finetuneModel(File trainingFile, File validationFile) - Run the training on the training file, but evaluate the best model on the validationFile.
getLoss()
int getNumberOfEpochs()
int getTestBatchSize()
float getTestSize() - Returns a number between zero and one which represents the proportion of the data to include in the test split.
int getTrainBatchSize()
void setLoss
void setNumberOfEpochs(int numberOfEpochs)
void setTestBatchSize(int testBatchSize)
void setTestSize(float testSize) - Sets the number between zero and one which represents the proportion of the data to include in the test split.
void setTrainBatchSize(int trainBatchSize)
void setTrainingArguments(TransformersArguments trainingArguments) - This class does not allow setting training arguments.
void setUsingTensorflow(boolean usingTensorflow) - This class only allows setting this value to false.
private int writeOneTriplet(org.apache.jena.rdf.model.Resource anchor, org.apache.jena.rdf.model.Resource positive, org.apache.jena.rdf.model.Resource hardNegative, Map<org.apache.jena.rdf.model.Resource, Map<String, Set<String>>> cache, Writer writer)
int writeTrainingFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment trainingAlignment, File trainFile, boolean append) - Writes the correspondences to a file (append or not can be chosen by a parameter).
private int writeTripletFormat(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment trainingAlignment, File trainFile, boolean append)
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBaseFineTuner
clearTrainingData, createTrainingFile, finetuneModel, getResultingModelLocation, getTrainingFile, isAdditionallySwitchSourceTarget, match, setAdditionallySwitchSourceTarget, setResultingModelLocation, writeClassificationFormat
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_ml.python.nlptransformers.TransformersBase
addTrainingArgument, getCudaVisibleDevices, getCudaVisibleDevicesButOnlyOneGPU, getExamplesForBatchSizeOptimization, getExtractor, getExtractorMap, getModelName, getMultiProcessing, getTextualRepresentation, getTrainingArguments, getTransformersCache, isMultipleTextsToMultipleExamples, isOptimizeForMixedPrecisionTraining, isUsingTensorflow, setCudaVisibleDevices, setCudaVisibleDevices, setExtractor, setExtractorMap, setModelName, setMultipleTextsToMultipleExamples, setMultiProcessing, setOptimizeForMixedPrecisionTraining, setTransformersCache, writeExamplesToFile
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
getModelSpec, match, readOntology
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
match
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
match
Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
align, align, canExecute, getType
Methods inherited from class eu.sealsproject.platform.res.tool.impl.AbstractPlugin
getId, getVersion, setId, setVersion
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface eu.sealsproject.platform.res.tool.api.IPlugin
getId, getVersion
-
Field Details
-
LOGGER
private static final org.slf4j.Logger LOGGER -
NEWLINE
private static final String NEWLINE -
testSize
private float testSize
A number between zero and one which represents the proportion of the data to include in the test split. -
trainBatchSize
private int trainBatchSize -
testBatchSize
private int testBatchSize -
numberOfEpochs
private int numberOfEpochs -
loss
private SentenceTransformersLoss loss -
-
Constructor Details
-
SentenceTransformersFineTuner
public SentenceTransformersFineTuner(TextExtractorMap extractor, String initialModelName, File resultingModelLocation)
Run the training of an NLP sentence transformer.
- Parameters:
extractor - used to extract text from a given resource. This is the text which represents a resource.
initialModelName - the initial model name for fine-tuning, which can be downloaded, or a path to a directory containing model weights (see the first parameter pretrained_model_name_or_path of the from_pretrained function in the Hugging Face library). This value can also be changed by TransformersBase.setModelName(java.lang.String).
resultingModelLocation - the final location where the fine-tuned model should be stored.
-
SentenceTransformersFineTuner
public SentenceTransformersFineTuner(TextExtractor extractor, String initialModelName, File resultingModelLocation)
Run the training of an NLP sentence transformer.
- Parameters:
extractor - used to extract text from a given resource. This is the text which represents a resource.
initialModelName - the initial model name for fine-tuning, which can be downloaded, or a path to a directory containing model weights (see the first parameter pretrained_model_name_or_path of the from_pretrained function in the Hugging Face library). This value can also be changed by TransformersBase.setModelName(java.lang.String).
resultingModelLocation - the final location where the fine-tuned model should be stored.
-
-
Method Details
-
finetuneModel
Description copied from class: TransformersBaseFineTuner
Finetune a given model with the provided text in the CSV file (three columns: first text, second text, label (0/1)).
- Specified by:
finetuneModel in class TransformersBaseFineTuner
- Parameters:
trainingFile - CSV file with three columns: first text, second text, label (0/1) (can be generated with TransformersBaseFineTuner.createTrainingFile(OntModel, OntModel, Alignment))
- Returns:
the final location (directory) of the finetuned model (which is also given in the constructor)
- Throws:
PythonServerException
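The three-column layout of the training file (first text, second text, label 0/1) can be illustrated with a small, self-contained sketch. The names `TrainingCsvSketch`, `quote` and `toCsvRow` are hypothetical and not part of MELT, and the quoting rule (double quotes, with embedded quotes doubled) is an assumption, since this page does not say how the texts are escaped.

```java
import java.util.List;

/** Hypothetical helper, not part of MELT: formats one training example per CSV row. */
class TrainingCsvSketch {

    /** Wraps a field in double quotes and doubles embedded quotes (assumed escaping). */
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Builds a row with three columns: first text, second text, label (0/1). */
    static String toCsvRow(String firstText, String secondText, boolean isMatch) {
        return quote(firstText) + "," + quote(secondText) + "," + (isMatch ? "1" : "0");
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
                toCsvRow("heart", "cardiac muscle organ", true),   // positive pair
                toCsvRow("heart", "left \"lower\" limb", false));  // negative pair
        rows.forEach(System.out::println);
    }
}
```

In practice the file would normally be produced by `createTrainingFile` rather than by hand; the sketch is only meant to show what one row per labelled text pair looks like.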
-
finetuneModel
Run the training on the training file, but evaluate the best model on the validationFile. The model will be stored at resultingModelLocation given in the constructor.
- Parameters:
trainingFile - the training file to use (can be generated with TransformersBaseFineTuner.createTrainingFile(OntModel, OntModel, Alignment))
validationFile - the validation file to use (can be generated with TransformersBaseFineTuner.createTrainingFile(OntModel, OntModel, Alignment))
- Returns:
the best score of the validation (using the file or train test split)
- Throws:
PythonServerException - in case of some error during the learning
-
writeTrainingFile
public int writeTrainingFile(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment trainingAlignment, File trainFile, boolean append) throws IOException
Writes the correspondences to a file (append or not can be chosen by a parameter).
- Overrides:
writeTrainingFile in class TransformersBaseFineTuner
- Parameters:
source - the source model
target - the target model
trainingAlignment - the training alignment to be written to file.
trainFile - the file to write all texts
append - true if all content should be appended to the file.
- Returns:
how many correspondences were written to the file.
- Throws:
IOException - in case the writing fails
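The combination of an `append` flag and a returned count can be sketched with the standard library alone. This is a simplified illustration, not the MELT implementation: the class `AppendWriterSketch` and the method `writeLines` are hypothetical, and plain strings stand in for the correspondences.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

/** Hypothetical sketch, not MELT code: writes lines and reports how many were written. */
class AppendWriterSketch {

    /** Writes each line to the file; with append == false existing content is overwritten. */
    static int writeLines(List<String> lines, File file, boolean append) throws IOException {
        int written = 0;
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(file, append))) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("train", ".csv");
        file.deleteOnExit();
        writeLines(List.of("a,b,1"), file, false);                      // start a fresh file
        int count = writeLines(List.of("c,d,0", "e,f,1"), file, true);  // append two more rows
        int total = Files.readAllLines(file.toPath()).size();
        System.out.println(count + " rows appended, " + total + " rows in file");
    }
}
```

Appending makes it possible to collect the examples of several matching tasks in one training file, at the cost of having to clear the file explicitly before a new run.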
-
writeTripletFormat
private int writeTripletFormat(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment trainingAlignment, File trainFile, boolean append) throws IOException
- Throws:
IOException
-
writeOneTriplet
private int writeOneTriplet(org.apache.jena.rdf.model.Resource anchor, org.apache.jena.rdf.model.Resource positive, org.apache.jena.rdf.model.Resource hardNegative, Map<org.apache.jena.rdf.model.Resource, Map<String, Set<String>>> cache, Writer writer) throws IOException
- Throws:
IOException
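This page gives no description for the triplet methods, so the following is only a sketch of what a triplet row (anchor, positive, hard negative) might look like. It assumes that each resource can have several texts and that one row is written per combination of texts; both the combination rule and the names `TripletSketch` and `writeTripletRows` are assumptions, not MELT behaviour.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Set;

/** Hypothetical sketch, not MELT code: emits (anchor, positive, hard negative) rows. */
class TripletSketch {

    /** Wraps a field in double quotes and doubles embedded quotes (assumed escaping). */
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /**
     * Writes one row per combination of anchor, positive and negative text
     * (an assumed combination rule) and returns the number of rows written.
     */
    static int writeTripletRows(Set<String> anchorTexts, Set<String> positiveTexts,
                                Set<String> negativeTexts, Writer writer) throws IOException {
        int written = 0;
        for (String anchor : anchorTexts) {
            for (String positive : positiveTexts) {
                for (String negative : negativeTexts) {
                    writer.write(quote(anchor) + "," + quote(positive) + "," + quote(negative) + "\n");
                    written++;
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        int rows = writeTripletRows(
                Set.of("heart"),                 // anchor: texts of the source resource
                Set.of("cor", "cardiac organ"),  // positive: texts of the matching target
                Set.of("liver"),                 // hard negative: texts of a non-matching target
                out);
        System.out.print(out);
        System.out.println(rows + " rows");
    }
}
```

Returning the row count mirrors the `int` return type of the methods above, which lets a caller sum up how many examples ended up in the training file.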
-
setTrainingArguments
This class does not allow setting training arguments. Everything is determined by attributes.
- Overrides:
setTrainingArguments in class TransformersBase
- Parameters:
trainingArguments - training arguments
-
setUsingTensorflow
public void setUsingTensorflow(boolean usingTensorflow)
This class only allows setting this value to false.
- Overrides:
setUsingTensorflow in class TransformersBase
- Parameters:
usingTensorflow - should be set to false
-
getTestSize
public float getTestSize()
Returns a number between zero and one which represents the proportion of the data to include in the test split.
- Returns:
a number between zero and one which represents the proportion of the data to include in the test split
-
setTestSize
public void setTestSize(float testSize)
Sets the number between zero and one which represents the proportion of the data to include in the test split.
- Parameters:
testSize - number between zero and one which represents the proportion of the data to include in the test split
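What a test-split proportion means in practice can be shown with a short sketch. It is not the code MELT runs; how and where the library performs the split is not stated on this page. The class `TestSplitSketch`, the fixed seed, and the decision to reject the values 0 and 1 are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch, not MELT code: splits examples into a train and a test part. */
class TestSplitSketch {

    /**
     * Shuffles the examples with the given seed and returns [train, test].
     * testSize is the proportion of examples in the test part; this sketch
     * assumes it must lie strictly between 0 and 1.
     */
    static <T> List<List<T>> split(List<T> examples, float testSize, long seed) {
        if (!(testSize > 0f && testSize < 1f)) {
            throw new IllegalArgumentException("testSize must be between 0 and 1: " + testSize);
        }
        List<T> shuffled = new ArrayList<>(examples);
        Collections.shuffle(shuffled, new Random(seed));
        int testCount = Math.round(shuffled.size() * testSize);
        List<T> test = new ArrayList<>(shuffled.subList(0, testCount));
        List<T> train = new ArrayList<>(shuffled.subList(testCount, shuffled.size()));
        return List.of(train, test);
    }

    public static void main(String[] args) {
        List<Integer> examples = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        List<List<Integer>> parts = split(examples, 0.2f, 42L);
        System.out.println("train size: " + parts.get(0).size());
        System.out.println("test size: " + parts.get(1).size());
    }
}
```

With ten examples and a testSize of 0.2, two examples are held out and eight remain for training.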
-
getTrainBatchSize
public int getTrainBatchSize() -
setTrainBatchSize
public void setTrainBatchSize(int trainBatchSize) -
getTestBatchSize
public int getTestBatchSize() -
setTestBatchSize
public void setTestBatchSize(int testBatchSize) -
getNumberOfEpochs
public int getNumberOfEpochs() -
setNumberOfEpochs
public void setNumberOfEpochs(int numberOfEpochs) -
getLoss
-
setLoss
-