Class PythonServer
java.lang.Object
de.uni_mannheim.informatik.dws.melt.matching_ml.python.PythonServer
A client class to communicate with python libraries such as gensim.
This class follows a singleton pattern.
Communication is performed through HTTP requests.
In case you need a different python environment or python executable, create a file named python_command.txt in the directory python_server and write the absolute path of the python executable in that file.
-
Field Summary
Modifier and Type, Field
- Description
private static final int DEFAULT_PORT
- Developer note: Do not change the default port since other applications rely on it (e.g. the python tests).
private static final String DEFAULT_RESOURCES_DIRECTORY
- Default resources directory (where the python files will be copied to by default) and where the resources are read from within the JAR.
private static org.apache.http.impl.client.CloseableHttpClient httpClient
- Client to communicate with the server.
private static PythonServer instance
- Instance (singleton pattern).
private boolean isHookStarted
- Indicates whether the shutdown hook has been initialized.
private static boolean isShutDown
- Indicates whether the server has been shut down.
private boolean isVectorCaching
- Indicator whether vectors shall be cached.
private static final com.fasterxml.jackson.databind.ObjectMapper JSON_MAPPER
- ObjectMapper from jackson to generate JSON.
private static final org.slf4j.Logger LOGGER
- Default logger.
private static boolean overridePythonFiles
- If set to true, all python files (e.g. python server melt and requirements.txt file) will be overridden with every execution.
private static int port
- The port that shall be used.
private static String pythonCommandBackup
- In case someone wants to configure the python command programmatically.
private File resourcesDirectory
- The directory where the python files will be copied to.
private static Process serverProcess
- The python process.
private static String serverUrl
- The URL that shall be used to perform the requests.
vectorCache
- Local vector cache.
-
Constructor Summary
-
Method Summary
Modifier and Type, Method
- Description
private void addModelToRequest(org.apache.http.client.methods.HttpGet request, String modelOrVectorPath)
- Given a path to a model or vector file, this method determines whether it is a model or a vector file and adds the corresponding parameter to the request.
Alignment alignModel(String vectorPathSource, String vectorPathTarget, String function, Alignment alignment)
- Align two knowledge graph embeddings.
applyStoredMLModel(File modelFile, File predictFile)
- Apply a stored model to a new file (predict file).
static boolean checkRequirements()
- Checks whether all Python requirements are installed and whether the server is functional.
static double cosineSimilarity(Double[] vector1, Double[] vector2)
- Calculates the cosine similarity between two vectors.
private void exportResource(File baseDirectory, String resourceName)
- Export a resource embedded into a Jar file to the local file path.
private String getCanonicalPath(File file)
- Obtain the canonical model path.
private String getCanonicalPath(String filePath)
- Obtain the canonical model path.
private String getCanonicalPathNonExistent
- Obtain the canonical model path.
static PythonServer getInstance()
- Get the instance.
static PythonServer getInstance(File resourcesDirectory)
- Get the instance (singleton pattern).
private String getLogLevel
static int getPort()
protected String getPythonAdditionalPath(String pythonCommand)
- Returns a concatenated path of directories which can be used in the PATH variable.
protected String getPythonCommand
- Returns the python command which is extracted from the file melt-resources/python_command.txt.
getResourcesDirectory
getResourcesDirectoryPath
- Get the resource directory as String.
static String getServerUrl
double getSimilarity(String concept1, String concept2, String modelOrVectorPath)
- Get the similarity given 2 concepts and a gensim model.
Double[] getVector
- Returns the vector of a concept.
int getVocabularySize(String modelOrVectorPath)
- Returns the size of the vocabulary of the stated model/vector set.
getVocabularyTerms(String modelOrVectorPath)
- Returns the full vocabulary of the specified model as HashSet (e.g. for fast indexing).
boolean isInVocabulary(String concept, File modelOrVectorPath)
- Returns true when the concept can be found in the vocabulary of the model.
boolean isInVocabulary(String concept, String modelOrVectorPath)
- Returns true when the concept can be found in the vocabulary of the model.
boolean isVectorCaching()
- If true: enabled.
List<Integer> learnAndApplyMLModel(File trainFile, File predictFile, int cv, int jobs)
- Learn an ML model for a given training file.
private Alignment parseJSON
private void printHello(String name)
- A quick technical demo.
List<Double> queryDoc2VecModel(String modelPath, List<Correspondence> alignment)
- Method to query a doc2vec model (which has to be trained with trainDoc2VecModel) in a batch mode.
queryVectorSpaceModel(String modelPath, Alignment alignment)
- Method to query a vector space model (which has to be trained with trainVectorSpaceModel) in a batch mode.
double queryVectorSpaceModel(String modelPath, String documentIdOne, String documentIdTwo)
- Method to query a vector space model (which has to be trained with trainVectorSpaceModel).
List<Double> queryVectorSpaceModel(String modelPath, List<Correspondence> alignment)
- Method to query a vector space model (which has to be trained with trainVectorSpaceModel) in a batch mode.
runGroupShuffleSplit(List<Integer> groups, double trainSize)
void runOpenEAModel(File argumentFile, boolean save)
- Run the openEA library.
private String runRequest(org.apache.http.client.methods.HttpUriRequest request)
float sentenceTransformersFineTuning(SentenceTransformersFineTuner fineTuner, File trainingFile, File validationFile)
- Run fine tuning for sentence transformers.
Alignment sentenceTransformersPrediction(SentenceTransformersMatcher matcher, File corpusFile, File queriesFile)
- Run sentence transformers prediction.
static void setOverridePythonFiles(boolean overrideFiles)
- If set to true, all python files (e.g. python server melt and requirements.txt file) will be overridden with every execution.
static void setPort(int port)
static void setPythonCommandBackup(String pythonCommandBackup)
- Sets the python command programmatically.
void setResourcesDirectory(File resourcesDirectory)
- Set the directory where the python files will be copied to.
void setVectorCaching(boolean vectorCaching)
- If vector caching is turned on, similarities will be calculated on the Java side (rather than in Python) and vectors are held in memory.
static void shutDown()
- Shut down the service.
private boolean startServer()
- Initializes the server.
List<List<Double>> textGenerationPrediction(LLMBase filter, File predictionFilePath, List<Set<String>> wordsToDetect)
- Run a text generation model (such as a large language model, LLM) given a file with left and right values which are replaced.
void trainAndStoreMLModel(File trainFile, File modelFile, int cv, int jobs)
- Learn an ML model for a given training file and store it in the given model file.
void trainDoc2VecModel(String modelPath, String trainingFilePath, Word2VecConfiguration configuration)
- Method to train a doc2vec model.
void trainVectorSpaceModel(String modelPath, String trainingFilePath)
- Method to train a vector space model.
boolean trainWord2VecModel(String modelOrVectorPath, String trainingFilePath, Word2VecConfiguration configuration)
- Method to train a word2vec model.
private void transformersFineTunerUpdateBaseRequest(TransformersBaseFineTuner fineTuner, File trainingFile, org.apache.http.client.methods.HttpGet request)
void transformersFineTuning(TransformersFineTuner fineTuner, File trainingFile)
- Fine-tune a transformers model with the given parameters and write this model to a given folder.
void transformersFineTuningHpSearch(TransformersFineTunerHpSearch hpsearch, File trainingFile)
- Run a hyperparameter fine tuning.
List<List<Double>> transformersMultiClassPrediction(TransformersFilter filter, File predictionFilePath)
- Run a transformers model on a CSV file with two columns (text left and text right) for multi class prediction.
List<Double> transformersPrediction(TransformersFilter filter, File predictionFilePath)
- Run a transformers model on a CSV file with two columns (text left and text right) to predict if they describe the same concept.
private void transformersUpdateBaseRequest(TransformersBase base, org.apache.http.client.methods.HttpGet request)
protected void updateEnvironmentPath(Map<String, String> environment, String pythonCommand)
- Updates the environment variable PATH with additional directories needed by python, such as env/lib/bin.
void writeModelAsTextFile(String modelOrVectorPath, String fileToWrite)
- Writes the vectors to a human-readable text file.
void writeModelAsTextFile(String modelOrVectorPath, String fileToWrite, String entityFile)
- Writes the vectors to a human-readable text file.
private static <T> void writeSetToFile(File fileToWrite, Set<T> setToWrite)
- This method writes the content of a Set<String> to a file.
void writeVocabularyToFile(String modelOrVectorPath, File fileToWrite)
- Writes the vocabulary of the given gensim model to a text file (UTF-8 encoded).
void writeVocabularyToFile(String modelOrVectorPath, String fileToWritePath)
- Writes the vocabulary of the given gensim model to a text file (UTF-8 encoded).
-
Field Details
-
LOGGER
private static final org.slf4j.Logger LOGGER
Default logger.
-
DEFAULT_RESOURCES_DIRECTORY
Default resources directory (where the python files will be copied to by default) and where the resources are read from within the JAR.
-
JSON_MAPPER
private static final com.fasterxml.jackson.databind.ObjectMapper JSON_MAPPER
ObjectMapper from jackson to generate JSON.
-
serverUrl
The URL that shall be used to perform the requests.
-
isVectorCaching
private boolean isVectorCaching
Indicator whether vectors shall be cached. This means that vectors are cached locally and similarities are calculated in Java to avoid many cross-language calls. Disable in cases of infrequent calls or if memory availability is limited.
-
isShutDown
private static boolean isShutDown
Indicates whether the server has been shut down. Initial state: shutDown.
-
vectorCache
Local vector cache.
-
isHookStarted
private boolean isHookStarted
Indicates whether the shutdown hook has been initialized. This flag is required in order to have only one hook despite multiple re-initializations.
-
resourcesDirectory
The directory where the python files will be copied to.
-
DEFAULT_PORT
private static final int DEFAULT_PORT
Developer note: Do not change the default port since other applications rely on it (e.g. the python tests). Rather use setPort(int) if you need to change the port in certain cases.
-
port
private static int port
The port that shall be used.
-
pythonCommandBackup
In case someone wants to configure the python command programmatically. The external file always takes precedence.
-
overridePythonFiles
private static boolean overridePythonFiles
If set to true, all python files (e.g. python server melt and requirements.txt file) will be overridden with every execution. Set it to false for testing and debugging new features in python server.
-
instance
Instance (singleton pattern).
-
httpClient
private static org.apache.http.impl.client.CloseableHttpClient httpClient
Client to communicate with the server.
-
serverProcess
The python process.
-
-
Constructor Details
-
PythonServer
private PythonServer()
Constructor.
-
-
Method Details
-
transformersFineTuningHpSearch
public void transformersFineTuningHpSearch(TransformersFineTunerHpSearch hpsearch, File trainingFile) throws PythonServerException
Run a hyperparameter fine tuning.
- Parameters:
hpsearch
- the hyperparameter search model to use
trainingFile
- path to csv file with three columns (text left, text right, label 1/0).
- Throws:
PythonServerException
- in case something goes wrong.
-
transformersFineTuning
public void transformersFineTuning(TransformersFineTuner fineTuner, File trainingFile) throws PythonServerException
Fine-tune a transformers model with the given parameters and write this model to a given folder.
- Parameters:
fineTuner
- the fine-tuner to use
trainingFile
- path to csv file with three columns (text left, text right, label 1/0).
- Throws:
PythonServerException
- in case something goes wrong.
-
transformersPrediction
public List<Double> transformersPrediction(TransformersFilter filter, File predictionFilePath) throws PythonServerException
Run a transformers model on a CSV file with two columns (text left and text right) to predict if they describe the same concept.
- Parameters:
filter
- the filter
predictionFilePath
- path to csv file with two columns (text left and text right).
- Returns:
- a list of confidences
- Throws:
PythonServerException
- in case something goes wrong.
-
textGenerationPrediction
public List<List<Double>> textGenerationPrediction(LLMBase filter, File predictionFilePath, List<Set<String>> wordsToDetect) throws PythonServerException
Run a text generation model (such as a large language model, LLM) given a file with left and right values which are replaced. Each line needs to be completed, and the predictions for "yes" and "no" are evaluated.
- Parameters:
filter
- the filter with information about cudaVisibleDevices, transformersCache, etc.
predictionFilePath
- path to csv file with two columns (text left and text right).
wordsToDetect
- the words which should be detected
- Returns:
- a list of lists of confidences (one confidence per class); each confidence corresponds to the probability that the generated token is predicted
- Throws:
PythonServerException
- in case something goes wrong.
-
transformersMultiClassPrediction
public List<List<Double>> transformersMultiClassPrediction(TransformersFilter filter, File predictionFilePath) throws PythonServerException
Run a transformers model on a CSV file with two columns (text left and text right) for multi class prediction. The number of classes is unspecified.
- Parameters:
filter
- the filter
predictionFilePath
- path to csv file with two columns (text left and text right).
- Returns:
- a list of lists which contains the confidences for each class.
- Throws:
PythonServerException
- in case something goes wrong.
-
sentenceTransformersPrediction
public Alignment sentenceTransformersPrediction(SentenceTransformersMatcher matcher, File corpusFile, File queriesFile) throws PythonServerException
Run sentence transformers prediction.
- Parameters:
matcher
- the matcher
corpusFile
- path to csv file with two columns (url, text representation).
queriesFile
- path to csv file with two columns (url, text representation).
- Returns:
- the newly generated alignment
- Throws:
PythonServerException
- in case something goes wrong.
-
sentenceTransformersFineTuning
public float sentenceTransformersFineTuning(SentenceTransformersFineTuner fineTuner, File trainingFile, File validationFile) throws PythonServerException
Run fine tuning for sentence transformers.
- Parameters:
fineTuner
- the matcher
trainingFile
- path to csv file with three columns (text left, text right, label 1/0).
validationFile
- the path to the validation file; can also be null to use a train/test split of the training file.
- Returns:
- the best validation score (using the validation file or the train/test split).
- Throws:
PythonServerException
- in case something goes wrong.
-
transformersFineTunerUpdateBaseRequest
private void transformersFineTunerUpdateBaseRequest(TransformersBaseFineTuner fineTuner, File trainingFile, org.apache.http.client.methods.HttpGet request)
-
transformersUpdateBaseRequest
private void transformersUpdateBaseRequest(TransformersBase base, org.apache.http.client.methods.HttpGet request)
-
runOpenEAModel
Run the openEA library.
- Parameters:
argumentFile
- the argument file to use
save
- saves the embeddings to files
- Throws:
Exception
- in case something goes wrong.
-
learnAndApplyMLModel
public List<Integer> learnAndApplyMLModel(File trainFile, File predictFile, int cv, int jobs) throws Exception
Learn an ML model for a given training file. This file should be comma-separated and contain a header. The class attribute should be named "target".
- Parameters:
trainFile
- the train file
predictFile
- the file to predict
cv
- number of cross validations
jobs
- number of parallel jobs to run
- Returns:
- a list of integers representing the predicted classes
- Throws:
Exception
- throws exception in case of errors
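The expected layout of the training file can be illustrated with a small sketch. Only the comma separation, the header row, and the "target" column come from the description above; the class name TrainingFileSketch, the feature column names, and the sample values are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

/** Sketch only: writes a tiny comma-separated training file with a header. */
final class TrainingFileSketch {

    /** Returns the lines of a minimal training file; feature names and values are made up. */
    static List<String> exampleLines() {
        return Arrays.asList(
                "feature1,feature2,target", // header row; the class attribute is named "target"
                "0.91,0.80,1",
                "0.12,0.05,0");
    }

    /** Writes the example lines to the given file (UTF-8 encoded). */
    static void writeExample(File trainFile) throws IOException {
        Files.write(trainFile.toPath(), exampleLines(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        File trainFile = File.createTempFile("train", ".csv");
        writeExample(trainFile);
        System.out.println("Wrote " + trainFile.getAbsolutePath());
    }
}
```

A file written this way could then be passed as trainFile; the predict file would presumably use the same feature columns, which is an assumption not stated in this documentation.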
-
trainAndStoreMLModel
Learn an ML model for a given training file and store it in the given model file. The training file should be comma-separated and contain a header. The class attribute should be named "target".
- Parameters:
trainFile
- the train file
modelFile
- where to store the model
cv
- number of cross validations
jobs
- number of parallel jobs to run
- Throws:
Exception
- throws exception in case of errors
-
applyStoredMLModel
Apply a stored model to a new file (predict file).
- Parameters:
predictFile
- the predict file
modelFile
- the file in which the model is stored
- Returns:
- a list of integers which represents the classes
- Throws:
Exception
- throws exception in case of errors
-
alignModel
public Alignment alignModel(String vectorPathSource, String vectorPathTarget, String function, Alignment alignment) throws Exception
Align two knowledge graph embeddings.
- Parameters:
vectorPathSource
- the source path to a vector file
vectorPathTarget
- the target path to a vector file
function
- function which is used to translate the embeddings
alignment
- the alignment with initial mapping
- Returns:
- alignment
- Throws:
Exception
- in case of errors
-
parseJSON
- Throws:
Exception
-
trainVectorSpaceModel
Method to train a vector space model. The file for the training (i.e., csv file where first column is id and second column text) has to exist already.
- Parameters:
modelPath
- identifier for the model (used for querying a specific model)
trainingFilePath
- The file path to the file that shall be used for training.
-
queryVectorSpaceModel
public double queryVectorSpaceModel(String modelPath, String documentIdOne, String documentIdTwo) throws Exception
Method to query a vector space model (which has to be trained with trainVectorSpaceModel).
- Parameters:
modelPath
- identifier for the model (used for querying a specific model)
documentIdOne
- Document id for the first document
documentIdTwo
- Document id for the second document
- Returns:
- The cosine similarity in the vector space between the two documents.
- Throws:
Exception
- Thrown if there are server problems.
-
queryVectorSpaceModel
public List<Double> queryVectorSpaceModel(String modelPath, List<Correspondence> alignment) throws Exception
Method to query a vector space model (which has to be trained with trainVectorSpaceModel) in a batch mode.
- Parameters:
modelPath
- identifier for the model (used for querying a specific model)
alignment
- the alignment which contains the source and target uris
- Returns:
- The cosine similarities in the vector space between the requested documents in the same order.
- Throws:
Exception
- Thrown if there are server problems.
-
queryVectorSpaceModel
Method to query a vector space model (which has to be trained with trainVectorSpaceModel) in a batch mode.
- Parameters:
modelPath
- identifier for the model (used for querying a specific model)
alignment
- the alignment which contains the source and target uris
- Returns:
- The alignment where the confidence is updated if possible
- Throws:
Exception
- Thrown if there are server problems.
-
trainDoc2VecModel
public void trainDoc2VecModel(String modelPath, String trainingFilePath, Word2VecConfiguration configuration)
Method to train a doc2vec model. The file for the training (i.e., csv file where first column is id and second column text) has to exist already.
- Parameters:
modelPath
- identifier for the model (used for querying a specific model)
trainingFilePath
- The file path to the file that shall be used for training.
configuration
- the configuration for the doc2vec model
-
queryDoc2VecModel
public List<Double> queryDoc2VecModel(String modelPath, List<Correspondence> alignment) throws Exception
Method to query a doc2vec model (which has to be trained with trainDoc2VecModel) in a batch mode.
- Parameters:
modelPath
- identifier for the model (used for querying a specific model)
alignment
- the alignment which contains the source and target uris
- Returns:
- The cosine similarities in the doc2vec space between the requested documents in the same order.
- Throws:
Exception
- Thrown if there are server problems.
-
trainWord2VecModel
public boolean trainWord2VecModel(String modelOrVectorPath, String trainingFilePath, Word2VecConfiguration configuration)
Method to train a word2vec model. The file for the training (i.e., file with sentences, paths etc.) has to exist already.
- Parameters:
modelOrVectorPath
- If a vector file is desired, the file ending '.kv' is required.
trainingFilePath
- The file path to the file that shall be used for training or to the directory containing the files that shall be used.
configuration
- The configuration for the training operation.
- Returns:
- True if training succeeded, else false.
-
getSimilarity
Get the similarity given 2 concepts and a gensim model.
- Parameters:
concept1
- First concept.
concept2
- Second concept.
modelOrVectorPath
- The path to the model or vector file. Note that the vector file MUST end with .kv in order to be recognized as vector file.
- Returns:
- -1.0 in case of failure, else similarity.
-
getVector
Returns the vector of a concept.
- Parameters:
concept
- The concept for which the vector shall be obtained.
modelOrVectorPath
- The model path or vector file path leading to the file to be used.
- Returns:
- The vector for the specified concept.
-
isInVocabulary
Returns true when the concept can be found in the vocabulary of the model.
- Parameters:
concept
- The concept/URI that shall be looked up.
modelOrVectorPath
- The model or vector file. Note that the vector file MUST end with .kv in order to be recognized as vector file.
- Returns:
- True if exists, else false.
-
isInVocabulary
Returns true when the concept can be found in the vocabulary of the model.
- Parameters:
concept
- The concept/URI that shall be looked up.
modelOrVectorPath
- The path to the model or vector file. Note that the vector file MUST end with .kv in order to be recognized as vector file.
- Returns:
- True if exists, else false.
-
getVocabularyTerms
Returns the full vocabulary of the specified model as HashSet (e.g. for fast indexing). Be aware that this operation can be very memory-consuming for very large models.
Note: If you want to just check whether a concept exists in the vocabulary, it is better to call isInVocabulary(String, String).
Note further that you do not need to build your own cache if the PythonServer has enabled vector caching (you can check this with isVectorCaching()).
- Parameters:
modelOrVectorPath
- The path to the model or vector file. Note that the vector file MUST end with .kv in order to be recognized as vector file.
- Returns:
- All vocabulary entries without vectors in a String HashSet.
-
writeVocabularyToFile
Writes the vocabulary of the given gensim model to a text file (UTF-8 encoded).
- Parameters:
modelOrVectorPath
- The model of which the vocabulary shall be obtained.
fileToWritePath
- The file path of the file that shall be written.
-
writeVocabularyToFile
Writes the vocabulary of the given gensim model to a text file (UTF-8 encoded).
- Parameters:
modelOrVectorPath
- The model of which the vocabulary shall be obtained.
fileToWrite
- The file that shall be written.
-
writeSetToFile
This method writes the content of a Set<String> to a file. The file will be UTF-8 encoded.
- Type Parameters:
T
- Type of the Set.
- Parameters:
fileToWrite
- File which will be created and in which the data will be written.
setToWrite
- Set whose content will be written into fileToWrite.
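A minimal sketch of how such a helper could look is given below. It is not the MELT implementation: the class name WriteSetSketch is hypothetical, and writing one element per line via String.valueOf is an assumption, since the documentation only states that the file is UTF-8 encoded.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Set;

/** Sketch only: writes the elements of a set to a UTF-8 encoded text file. */
final class WriteSetSketch {

    /**
     * Writes each element on its own line (assumption) using its String representation.
     * The file is created if it does not exist and overwritten otherwise.
     */
    static <T> void writeSetToFile(File fileToWrite, Set<T> setToWrite) throws IOException {
        try (BufferedWriter writer =
                     Files.newBufferedWriter(fileToWrite.toPath(), StandardCharsets.UTF_8)) {
            for (T element : setToWrite) {
                writer.write(String.valueOf(element));
                writer.newLine();
            }
        }
    }
}
```

The try-with-resources block closes the writer even if writing fails, so no partially open file handle is left behind.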
-
addModelToRequest
private void addModelToRequest(org.apache.http.client.methods.HttpGet request, String modelOrVectorPath)
Given a path to a model or vector file, this method determines whether it is a model or a vector file and adds the corresponding parameter to the request.
- Parameters:
request
- The request to which the model/vector file shall be added.
modelOrVectorPath
- The path to the model/vector file.
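The distinction between model and vector files can be sketched as follows. The rule that a vector file ends with .kv is taken from this documentation; the class name ModelParameterSketch and the two parameter names are hypothetical placeholders, since the actual request parameters are not documented here.

```java
/** Sketch only: decides by file ending how a path would be passed to the server. */
final class ModelParameterSketch {

    /** Hypothetical request parameter names; the real names are not documented here. */
    static final String VECTOR_PARAMETER = "vector_path";
    static final String MODEL_PARAMETER = "model_path";

    /** A path is treated as a vector file only if it ends with ".kv". */
    static boolean isVectorFile(String modelOrVectorPath) {
        return modelOrVectorPath.endsWith(".kv");
    }

    /** Returns the name of the parameter under which the path would be sent. */
    static String parameterNameFor(String modelOrVectorPath) {
        return isVectorFile(modelOrVectorPath) ? VECTOR_PARAMETER : MODEL_PARAMETER;
    }

    public static void main(String[] args) {
        System.out.println(parameterNameFor("embeddings.kv"));
        System.out.println(parameterNameFor("word2vec.model"));
    }
}
```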
-
getCanonicalPath
Obtain the canonical model path.
- Parameters:
filePath
- The path to the gensim model or gensim vector file.
- Returns:
- The canonical model path as String.
-
getCanonicalPath
Obtain the canonical model path.
- Parameters:
file
- the file to get the canonical path from
- Returns:
- The canonical path as String.
-
getCanonicalPathNonExistent
Obtain the canonical model path.
- Parameters:
file
- the file to get the canonical path from
- Returns:
- The canonical path as String.
-
runGroupShuffleSplit
- Throws:
Exception
-
printHello
A quick technical demo. If the service works, it will print "Hello name".
- Parameters:
name
- The name that shall be printed.
-
runRequest
private String runRequest(org.apache.http.client.methods.HttpUriRequest request) throws PythonServerException
- Throws:
PythonServerException
-
getInstance
Get the instance.
- Returns:
- The PythonServer instance.
-
getInstance
Get the instance (singleton pattern).
- Parameters:
resourcesDirectory
- Directory where the files shall be copied to.
- Returns:
- The PythonServer instance.
-
checkRequirements
public static boolean checkRequirements()
Checks whether all Python requirements are installed and whether the server is functional.
- Returns:
- True if the server is fully functional, else false.
-
shutDown
public static void shutDown()
Shut down the service.
-
exportResource
Export a resource embedded into a Jar file to the local file path.
- Parameters:
baseDirectory
- The base directory.
resourceName
- The name of the resource, e.g. "/SmartLibrary.dll"
-
startServer
private boolean startServer()
Initializes the server.
- Returns:
- True if successful, else false.
-
getLogLevel
-
getPythonCommand
Returns the python command which is extracted from the file melt-resources/python_command.txt.
- Returns:
- The python executable path.
-
updateEnvironmentPath
Updates the environment variable PATH with additional directories needed by python, such as env/lib/bin.
- Parameters:
environment
- The environment to be changed.
pythonCommand
- The python executable path.
-
getPythonAdditionalPath
Returns a concatenated path of directories which can be used in the PATH variable. Based on the python executable path, it searches for all bin directories within the python directory.
- Parameters:
pythonCommand
- The python executable path.
- Returns:
- a concatenated path of directories which can be used in the PATH variable.
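The interplay of getPythonAdditionalPath and updateEnvironmentPath can be sketched roughly as shown below. This is an illustration under stated assumptions, not the MELT code: the class name PythonPathSketch, the candidate subdirectory names ("bin", "Scripts"), and the choice to prepend rather than append to PATH are all hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch only: derives extra PATH entries from the location of a python executable. */
final class PythonPathSketch {

    /** Hypothetical subdirectory names to look for next to the python executable. */
    private static final String[] CANDIDATE_SUBDIRECTORIES = {"bin", "Scripts"};

    /** Joins the python directory and its existing candidate subdirectories with the path separator. */
    static String additionalPath(String pythonCommand) {
        File pythonDir = new File(pythonCommand).getAbsoluteFile().getParentFile();
        if (pythonDir == null) {
            return ""; // the executable path has no parent directory
        }
        List<String> directories = new ArrayList<>();
        directories.add(pythonDir.getPath());
        for (String name : CANDIDATE_SUBDIRECTORIES) {
            File candidate = new File(pythonDir, name);
            if (candidate.isDirectory()) {
                directories.add(candidate.getPath());
            }
        }
        return String.join(File.pathSeparator, directories);
    }

    /** Prepends the additional directories to the PATH entry of the given environment map. */
    static void updateEnvironmentPath(Map<String, String> environment, String pythonCommand) {
        String additional = additionalPath(pythonCommand);
        if (additional.isEmpty()) {
            return;
        }
        String current = environment.get("PATH");
        environment.put("PATH", (current == null || current.isEmpty())
                ? additional
                : additional + File.pathSeparator + current);
    }
}
```

The environment argument could be a map such as the one returned by ProcessBuilder.environment(). Note that the key of the PATH variable can be spelled differently on some platforms, which this sketch ignores.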
-
cosineSimilarity
Calculates the cosine similarity between two vectors.
- Parameters:
vector1
- First vector.
vector2
- Second vector.
- Returns:
- Cosine similarity as double.
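For reference, the cosine similarity of two vectors is their dot product divided by the product of their Euclidean norms. The following sketch shows this computation on Double[] arrays; the class name CosineSimilaritySketch and the handling of mismatched lengths are assumptions and not taken from the MELT source.

```java
/** Sketch only: cosine similarity on boxed double arrays. */
final class CosineSimilaritySketch {

    /**
     * Computes dot(v1, v2) / (|v1| * |v2|).
     * Assumption: both vectors are non-null and have the same length.
     * For a zero vector the result is NaN because the denominator is zero.
     */
    static double cosineSimilarity(Double[] vector1, Double[] vector2) {
        if (vector1.length != vector2.length) {
            throw new IllegalArgumentException("Vectors must have the same length.");
        }
        double dot = 0.0;
        double squaredNorm1 = 0.0;
        double squaredNorm2 = 0.0;
        for (int i = 0; i < vector1.length; i++) {
            dot += vector1[i] * vector2[i];
            squaredNorm1 += vector1[i] * vector1[i];
            squaredNorm2 += vector2[i] * vector2[i];
        }
        return dot / (Math.sqrt(squaredNorm1) * Math.sqrt(squaredNorm2));
    }

    public static void main(String[] args) {
        Double[] a = {1.0, 0.0};
        Double[] b = {0.0, 1.0};
        System.out.println(cosineSimilarity(a, a)); // same direction
        System.out.println(cosineSimilarity(a, b)); // orthogonal vectors
    }
}
```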
-
writeModelAsTextFile
Writes the vectors to a human-readable text file.
- Parameters:
modelOrVectorPath
- The path to the model or vector file. Note that the vector file MUST end with .kv in order to be recognized as vector file.
fileToWrite
- The file that will be written.
-
writeModelAsTextFile
Writes the vectors to a human-readable text file.
- Parameters:
modelOrVectorPath
- The path to the model or vector file. Note that the vector file MUST end with .kv in order to be recognized as vector file.
fileToWrite
- The file that will be written.
entityFile
- The vocabulary that shall appear in the text file (can be null if all words shall be written). The file must contain one word per line. The contents must be a subset of the vocabulary.
-
getResourcesDirectory
-
setPythonCommandBackup
Sets the python command programmatically. This is used when no external file python_command.txt is found.
- Parameters:
pythonCommandBackup
- the python command.
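The precedence described here (external file first, programmatic backup second) can be sketched as follows. The class name PythonCommandSketch, the final fallback value "python", and the trimming of whitespace are assumptions made for illustration; only the precedence of the external file over the backup comes from this documentation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch only: resolves the python command from a file, a backup value, or a default. */
final class PythonCommandSketch {

    /** Hypothetical last-resort command if neither file nor backup is available. */
    static final String FALLBACK_COMMAND = "python";

    static String resolve(Path commandFile, String pythonCommandBackup) {
        // 1. The external file always takes precedence.
        if (Files.isRegularFile(commandFile)) {
            try {
                String fromFile = new String(
                        Files.readAllBytes(commandFile), StandardCharsets.UTF_8).trim();
                if (!fromFile.isEmpty()) {
                    return fromFile;
                }
            } catch (IOException e) {
                // Unreadable file: fall through to the backup.
            }
        }
        // 2. The programmatically configured backup is used if no file is found.
        if (pythonCommandBackup != null && !pythonCommandBackup.trim().isEmpty()) {
            return pythonCommandBackup.trim();
        }
        // 3. Assumed default.
        return FALLBACK_COMMAND;
    }

    public static void main(String[] args) {
        Path commandFile = Paths.get("melt-resources", "python_command.txt");
        System.out.println(resolve(commandFile, "/usr/bin/python3"));
    }
}
```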
-
setOverridePythonFiles
public static void setOverridePythonFiles(boolean overrideFiles)
If set to true, all python files (e.g. python server melt and requirements.txt file) will be overridden with every execution. If you want to make changes to the python server (e.g. to develop and test a feature) you can set it to false. Then modifications to these files will not be overwritten.
- Parameters:
overrideFiles
- if true, override the python server files.
-
getResourcesDirectoryPath
Get the resource directory as String.
- Returns:
- Directory as String.
-
setResourcesDirectory
Set the directory where the python files will be copied to.
- Parameters:
resourcesDirectory
- Must be a directory.
-
getVocabularySize
Returns the size of the vocabulary of the stated model/vector set.
- Parameters:
modelOrVectorPath
- The path to the model or vector file. Note that the vector file MUST end with .kv in order to be recognized as vector file.
- Returns:
- -1 in case of an error, else the size of the vocabulary.
-
isVectorCaching
public boolean isVectorCaching()
If true: enabled. Else: disabled.
- Returns:
- True if enabled, else false.
-
setVectorCaching
public void setVectorCaching(boolean vectorCaching)
If vector caching is turned on, similarities will be calculated on the Java side (rather than in Python) and vectors are held in memory. Turn this on if you plan to do many computations with the same set of vectors. This will increase the performance at the cost of memory.
- Parameters:
vectorCaching
- True if caching shall be enabled, else false.
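The trade-off behind vector caching can be illustrated with a small sketch: each vector is fetched once through an expensive lookup and afterwards served from a local map, while the similarity is computed in Java. Everything here is hypothetical (the class name VectorCacheSketch, the Function used as a stand-in for the call to the Python server, and keying the cache by concept only); it is not the MELT implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch only: local vector cache with similarity computed on the Java side. */
final class VectorCacheSketch {

    private final Map<String, Double[]> vectorCache = new HashMap<>();
    private final Function<String, Double[]> remoteLookup; // stand-in for the server call
    private int remoteCalls = 0;

    VectorCacheSketch(Function<String, Double[]> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    /** Returns the cached vector or fetches it once through the remote lookup. */
    Double[] getVector(String concept) {
        Double[] vector = vectorCache.get(concept);
        if (vector == null) {
            remoteCalls++;
            vector = remoteLookup.apply(concept);
            if (vector != null) {
                vectorCache.put(concept, vector);
            }
        }
        return vector;
    }

    /** Cosine similarity of two concepts; -1.0 if a vector is missing (assumed failure value). */
    double getSimilarity(String concept1, String concept2) {
        Double[] v1 = getVector(concept1);
        Double[] v2 = getVector(concept2);
        if (v1 == null || v2 == null || v1.length != v2.length) {
            return -1.0;
        }
        double dot = 0.0, squaredNorm1 = 0.0, squaredNorm2 = 0.0;
        for (int i = 0; i < v1.length; i++) {
            dot += v1[i] * v2[i];
            squaredNorm1 += v1[i] * v1[i];
            squaredNorm2 += v2[i] * v2[i];
        }
        return dot / (Math.sqrt(squaredNorm1) * Math.sqrt(squaredNorm2));
    }

    /** Number of times the remote lookup was actually invoked. */
    int getRemoteCalls() {
        return remoteCalls;
    }
}
```

With this structure, repeated similarity queries over the same concepts trigger the lookup only once per concept, at the cost of keeping every fetched vector in memory.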
-
getPort
public static int getPort()
-
setPort
public static void setPort(int port)
-
getServerUrl
-