de.uni_mannheim.informatik.dws.melt.matching_ml.python.PythonServer

public class PythonServer extends Object

A client class to communicate with python libraries such as gensim. This class follows the singleton pattern. Communication is performed through HTTP requests. If you need a different python environment or python executable, create a file named python_command.txt in the directory python_server and write the absolute path of the python executable into that file.
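The configuration step described above could be scripted roughly as follows. This is a minimal sketch, not part of the library: the class `PythonCommandFileSketch`, the helper `writePythonCommand`, and the `baseDirectory` parameter are hypothetical, and the location of the python_server directory relative to your project is an assumption. Only the directory name python_server and the file name python_command.txt come from the description.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of PythonServer.
final class PythonCommandFileSketch {

    /**
     * Writes the absolute path of a python executable to
     * baseDirectory/python_server/python_command.txt and returns that file.
     * Where python_server has to live is an assumption of this sketch.
     */
    static Path writePythonCommand(Path baseDirectory, String pythonExecutable) {
        Path executable = Paths.get(pythonExecutable);
        if (!executable.isAbsolute()) {
            throw new IllegalArgumentException("Expected an absolute path: " + pythonExecutable);
        }
        try {
            // Create the directory if it does not exist yet.
            Path directory = Files.createDirectories(baseDirectory.resolve("python_server"));
            Path commandFile = directory.resolve("python_command.txt");
            // The file contains nothing but the executable path.
            Files.write(commandFile, executable.toString().getBytes(StandardCharsets.UTF_8));
            return commandFile;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Rejecting relative paths up front mirrors the requirement that the file contains an absolute path.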

Field Summary

Fields

Modifier and Type

Field

Description

private static final int

DEFAULT_PORT

Developer note: Do not change the default port since other applications rely on it (e.g. the python tests).

private static final String

DEFAULT_RESOURCES_DIRECTORY

Default resources directory (where the python files will be copied to by default) and where the resources are read from within the JAR.

private static org.apache.http.impl.client.CloseableHttpClient

httpClient

Client to communicate with the server.

private static PythonServer

instance

Instance (singleton pattern).

private boolean

isHookStarted

Indicates whether the shutdown hook has been initialized.

private static boolean

isShutDown

Indicates whether the server has been shut down.

private boolean

isVectorCaching

Indicates whether vectors shall be cached.

private static final com.fasterxml.jackson.databind.ObjectMapper

JSON_MAPPER

ObjectMapper from jackson to generate JSON.

private static final org.slf4j.Logger

LOGGER

Default logger

private static boolean

overridePythonFiles

If set to true, all python files (e.g. python server melt and requirements.txt file) will be overridden with every execution.

private static int

port

The port that shall be used.

private static String

pythonCommandBackup

In case someone wants to configure the python command programmatically.

private File

resourcesDirectory

The directory where the python files will be copied to.

private static Process

serverProcess

The python process.

private static String

serverUrl

The URL that shall be used to perform the requests.

private HashMap<String,Double[]>

vectorCache

Local vector cache.

Constructor Summary

Constructors

Modifier

Constructor

Description

private

PythonServer()

Constructor

Method Summary

Modifier and Type

Method

Description

private void

addModelToRequest(org.apache.http.client.methods.HttpGet request, String modelOrVectorPath)

Given a path to a model or vector file, this method determines whether it is a model or a vector file and adds the corresponding parameter to the request.

Alignment

alignModel(String vectorPathSource, String vectorPathTarget, String function, Alignment alignment)

Align two knowledge graph embeddings

List<Integer>

applyStoredMLModel(File modelFile, File predictFile)

Apply a stored model to a new file (predict file).

static boolean

checkRequirements()

Checks whether all Python requirements are installed and whether the server is functional.

static double

cosineSimilarity(Double[] vector1, Double[] vector2)

Calculate the cosine similarity between two vectors.

private void

exportResource(File baseDirectory, String resourceName)

Export a resource embedded into a Jar file to the local file path.

private String

getCanonicalPath(File file)

Obtain the canonical model path.

private String

getCanonicalPath(String filePath)

Obtain the canonical model path.

private String

getCanonicalPathNonExistent(File file)

Obtain the canonical model path.

static PythonServer

getInstance()

Get the instance.

static PythonServer

getInstance(File resourcesDirectory)

Get the instance (singleton pattern).

private String

getLogLevel()

static int

getPort()

protected String

getPythonAdditionalPath(String pythonCommand)

Returns a concatenated path of directories which can be used in the PATH variable.

protected String

getPythonCommand()

Returns the python command which is extracted from file melt-resources/python_command.txt.

File

getResourcesDirectory()

String

getResourcesDirectoryPath()

Get the resource directory as String.

static String

getServerUrl()

double

getSimilarity(String concept1, String concept2, String modelOrVectorPath)

Get the similarity given two concepts and a gensim model.

Double[]

getVector(String concept, String modelOrVectorPath)

Returns the vector of a concept.

int

getVocabularySize(String modelOrVectorPath)

Returns the size of the vocabulary of the stated model/vector set.

Set<String>

getVocabularyTerms(String modelOrVectorPath)

Returns the full vocabulary of the specified model as HashSet (e.g. for fast indexing).

boolean

isInVocabulary(String concept, File modelOrVectorPath)

Returns true when the concept can be found in the vocabulary of the model.

boolean

isInVocabulary(String concept, String modelOrVectorPath)

Returns true when the concept can be found in the vocabulary of the model.

boolean

isVectorCaching()

Returns true if vector caching is enabled.

List<Integer>

learnAndApplyMLModel(File trainFile, File predictFile, int cv, int jobs)

Learn an ML model for a given training file.

private Alignment

parseJSON(String resultString)

private void

printHello(String name)

A quick technical demo.

List<Double>

queryDoc2VecModel(String modelPath, List<Correspondence> alignment)

Method to query a doc2vec model (which has to be trained with trainDoc2VecModel) in a batch mode.

Alignment

queryVectorSpaceModel(String modelPath, Alignment alignment)

Method to query a vector space model (which has to be trained with trainVectorSpaceModel) in a batch mode.

double

queryVectorSpaceModel(String modelPath, String documentIdOne, String documentIdTwo)

Method to query a vector space model (which has to be trained with trainVectorSpaceModel).

List<Double>

queryVectorSpaceModel(String modelPath, List<Correspondence> alignment)

Method to query a vector space model (which has to be trained with trainVectorSpaceModel) in a batch mode.

List<Integer>

runGroupShuffleSplit(List<Integer> groups, double trainSize)

void

runOpenEAModel(File argumentFile, boolean save)

Run the openEA library.

private String

runRequest(org.apache.http.client.methods.HttpUriRequest request)

float

sentenceTransformersFineTuning(SentenceTransformersFineTuner fineTuner, File trainingFile, File validationFile)

Run fine tuning for sentence transformers.

Alignment

sentenceTransformersPrediction(SentenceTransformersMatcher matcher, File corpusFile, File queriesFile)

Run sentence transformers prediction.

static void

setOverridePythonFiles(boolean overrideFiles)

If set to true, all python files (e.g. python server melt and requirements.txt file) will be overridden with every execution.

static void

setPort(int port)

static void

setPythonCommandBackup(String pythonCommandBackup)

Sets the python command programmatically.

void

setResourcesDirectory(File resourcesDirectory)

Set the directory where the python files will be copied to.

void

setVectorCaching(boolean vectorCaching)

If vector caching is turned on, similarities will be calculated on the Java side (rather than in Python) and vectors are held in memory.

static void

shutDown()

Shut down the service.

private boolean

startServer()

Initializes the server.

List<List<Double>>

textGenerationPrediction(LLMBase filter, File predictionFilePath, List<Set<String>> wordsToDetect)

Run a text generation model (such as a large language model, LLM) given a file with left and right values which are replaced.

void

trainAndStoreMLModel(File trainFile, File modelFile, int cv, int jobs)

Learn an ML model for a given training file and store it in the given model file.

void

trainDoc2VecModel(String modelPath, String trainingFilePath, Word2VecConfiguration configuration)

Method to train a doc2vec model.

void

trainVectorSpaceModel(String modelPath, String trainingFilePath)

Method to train a vector space model.

boolean

trainWord2VecModel(String modelOrVectorPath, String trainingFilePath, Word2VecConfiguration configuration)

Method to train a word2vec model.

private void

transformersFineTunerUpdateBaseRequest(TransformersBaseFineTuner fineTuner, File trainingFile, org.apache.http.client.methods.HttpGet request)

void

transformersFineTuning(TransformersFineTuner fineTuner, File trainingFile)

Finetune a transformers model with the given parameters and write this model to a given folder.

void

transformersFineTuningHpSearch(TransformersFineTunerHpSearch hpsearch, File trainingFile)

Run a hyperparameter fine tuning.

List<List<Double>>

transformersMultiClassPrediction(TransformersFilter filter, File predictionFilePath)

Run a transformers model on a CSV file with two columns (text left and text right) for multi class prediction.

List<Double>

transformersPrediction(TransformersFilter filter, File predictionFilePath)

Run a transformers model on a CSV file with two columns (text left and text right) to predict if they describe the same concept.

private void

transformersUpdateBaseRequest(TransformersBase base, org.apache.http.client.methods.HttpGet request)

protected void

updateEnvironmentPath(Map<String,String> environment, String pythonCommand)

Updates the environment variable PATH with additional directories needed by python, such as env/lib/bin.

void

writeModelAsTextFile(String modelOrVectorPath, String fileToWrite)

Writes the vectors to a human-readable text file.

void

writeModelAsTextFile(String modelOrVectorPath, String fileToWrite, String entityFile)

Writes the vectors to a human-readable text file.

private static <T> void

writeSetToFile(File fileToWrite, Set<T> setToWrite)

This method writes the content of a Set to a file.

void

writeVocabularyToFile(String modelOrVectorPath, File fileToWrite)

Writes the vocabulary of the given gensim model to a text file (UTF-8 encoded).

void

writeVocabularyToFile(String modelOrVectorPath, String fileToWritePath)

Writes the vocabulary of the given gensim model to a text file (UTF-8 encoded).

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- LOGGER
  
  private static final org.slf4j.Logger LOGGER
  
  Default logger
- DEFAULT_RESOURCES_DIRECTORY
  
  private static final String DEFAULT_RESOURCES_DIRECTORY
  
  Default resources directory (where the python files will be copied to by default) and where the resources are read from within the JAR.
  See Also:
  
  Constant Field Values
- JSON_MAPPER
  
  private static final com.fasterxml.jackson.databind.ObjectMapper JSON_MAPPER
  
  ObjectMapper from jackson to generate JSON.
- serverUrl
  
  private static String serverUrl
  
  The URL that shall be used to perform the requests.
- isVectorCaching
  
  private boolean isVectorCaching
  
  Indicator whether vectors shall be cached. This means that vectors are cached locally and similarities are calculated in Java to avoid many cross-language calls. Disable in cases of infrequent calls or if memory availability is limited.
- isShutDown
  
  private static boolean isShutDown
  
  Indicates whether the server has been shut down. Initial state: shutDown.
- vectorCache
  
  private HashMap<String,Double[]> vectorCache
  
  Local vector cache.
- isHookStarted
  
  private boolean isHookStarted
  
  Indicates whether the shutdown hook has been initialized. This flag is required in order to have only one hook despite multiple re-initializations.
- resourcesDirectory
  
  private File resourcesDirectory
  
  The directory where the python files will be copied to.
- DEFAULT_PORT
  
  private static final int DEFAULT_PORT
  
  Developer note: Do not change the default port since other applications rely on it (e.g. the python tests). Rather use setPort(int) if you need to change the port in certain cases.
  See Also:
  
  Constant Field Values
- port
  
  private static int port
  
  The port that shall be used.
- pythonCommandBackup
  
  private static String pythonCommandBackup
  
  In case someone wants to configure the python command programmatically. The external file always takes precedence.
- overridePythonFiles
  
  private static boolean overridePythonFiles
  
  If set to true, all python files (e.g. python server melt and requirements.txt file) will be overridden with every execution. Set it to false for testing and debugging new features in python server.
- instance
  
  private static PythonServer instance
  
  Instance (singleton pattern).
- httpClient
  
  private static org.apache.http.impl.client.CloseableHttpClient httpClient
  
  Client to communicate with the server.
- serverProcess
  
  private static Process serverProcess
  
  The python process.
Constructor Details
- PythonServer
  
  private PythonServer()
  
  Constructor
Method Details
- transformersFineTuningHpSearch
  
  public void transformersFineTuningHpSearch(TransformersFineTunerHpSearch hpsearch, File trainingFile) throws PythonServerException
  
  Run a hyperparameter fine tuning.
  
  Parameters:
  
  hpsearch - the hyper parameter search model to use
  
  trainingFile - path to csv file with three columns (text left, text right, label 1/0).
  
  Throws:
  
  PythonServerException - in case something goes wrong.
- transformersFineTuning
  
  public void transformersFineTuning(TransformersFineTuner fineTuner, File trainingFile) throws PythonServerException
  
  Finetune a transformers model with the given parameters and write this model to a given folder.
  
  Parameters:
  
  fineTuner - the finetuner to use
  
  trainingFile - path to csv file with three columns (text left, text right, label 1/0).
  
  Throws:
  
  PythonServerException - in case something goes wrong.
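The three-column training file (text left, text right, label 1/0) could be produced with a helper like the one below. This is a hedged sketch: the class `TransformersCsvSketch` and its methods are hypothetical, and both the quoting style (all text fields double-quoted, embedded quotes doubled) and the absence of a header row are assumptions, since the expected CSV dialect is not stated here.

```java
// Hypothetical helper, not part of PythonServer.
final class TransformersCsvSketch {

    /** Wraps a field in double quotes and doubles embedded quotes (assumed CSV dialect). */
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Builds one line: text left, text right, label (1 for a match, 0 otherwise). */
    static String toCsvLine(String textLeft, String textRight, boolean isMatch) {
        return quote(textLeft) + "," + quote(textRight) + "," + (isMatch ? 1 : 0);
    }

    public static void main(String[] args) {
        // Commas inside the text stay intact because the field is quoted.
        System.out.println(toCsvLine("Paper", "Article, scientific", true));
        System.out.println(toCsvLine("Paper", "Conference dinner", false));
    }
}
```

Quoting every text field keeps commas and quotation marks inside labels or comments from shifting the columns.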
- transformersPrediction
  
  public List<Double> transformersPrediction(TransformersFilter filter, File predictionFilePath) throws PythonServerException
  
  Run a transformers model on a CSV file with two columns (text left and text right) to predict if they describe the same concept.
  
  Parameters:
  
  filter - the filter
  
  predictionFilePath - path to csv file with two columns (text left and text right).
  
  Returns:
  
  a list of confidences
  
  Throws:
  
  PythonServerException - in case something goes wrong.
- textGenerationPrediction
  
  public List<List<Double>> textGenerationPrediction(LLMBase filter, File predictionFilePath, List<Set<String>> wordsToDetect) throws PythonServerException
  
  Run a text generation model (such as a large language model, LLM) given a file with left and right values which are replaced. Each line needs to be completed, and the predictions for "yes" and "no" are evaluated.
  
  Parameters:
  
  filter - the filter with information about cudaVisibleDevices, transformersCache, etc
  
  predictionFilePath - path to csv file with two columns (text left and text right).
  
  wordsToDetect - the words which should be detected
  
  Returns:
  
  a list of lists of confidences (one confidence for each class); each confidence corresponds to the probability that the generated token is predicted
  
  Throws:
  
  PythonServerException - in case something goes wrong.
- transformersMultiClassPrediction
  
  public List<List<Double>> transformersMultiClassPrediction(TransformersFilter filter, File predictionFilePath) throws PythonServerException
  
  Run a transformers model on a CSV file with two columns (text left and text right) for multi class prediction. The number of classes is not fixed.
  
  Parameters:
  
  filter - the filter
  
  predictionFilePath - path to csv file with two columns (text left and text right).
  
  Returns:
  
  a list of lists which contains confidences for each class.
  
  Throws:
  
  PythonServerException - in case something goes wrong.
- sentenceTransformersPrediction
  
  public Alignment sentenceTransformersPrediction(SentenceTransformersMatcher matcher, File corpusFile, File queriesFile) throws PythonServerException
  
  Run sentence transformers prediction.
  
  Parameters:
  
  matcher - the matcher
  
  corpusFile - path to csv file with two columns (url, text representation).
  
  queriesFile - path to csv file with two columns (url, text representation).
  
  Returns:
  
  the newly generated alignment
  
  Throws:
  
  PythonServerException - in case something goes wrong.
- sentenceTransformersFineTuning
  
  public float sentenceTransformersFineTuning(SentenceTransformersFineTuner fineTuner, File trainingFile, File validationFile) throws PythonServerException
  
  Run fine tuning for sentence transformers.
  
  Parameters:
  
  fineTuner - the fine tuner to use
  
  trainingFile - path to csv file with three columns (text left, text right, label 1/0).
  
  validationFile - the path to the validation file; can also be null to use a train test split of the training file.
  
  Returns:
  
  the best score of the validation (using the file or train test split).
  
  Throws:
  
  PythonServerException - in case something goes wrong.
- transformersFineTunerUpdateBaseRequest
  
  private void transformersFineTunerUpdateBaseRequest(TransformersBaseFineTuner fineTuner, File trainingFile, org.apache.http.client.methods.HttpGet request)
- transformersUpdateBaseRequest
  
  private void transformersUpdateBaseRequest(TransformersBase base, org.apache.http.client.methods.HttpGet request)
- runOpenEAModel
  
  public void runOpenEAModel(File argumentFile, boolean save) throws Exception
  
  Run the openEA library.
  
  Parameters:
  
  argumentFile - the argument file to use
  
  save - saves the embeddings to files
  
  Throws:
  
  Exception - in case something goes wrong.
- learnAndApplyMLModel
  
  public List<Integer> learnAndApplyMLModel(File trainFile, File predictFile, int cv, int jobs) throws Exception
  
  Learn an ML model for a given training file. This file should be comma-separated and contain a header. The class attribute should be named "target".
  
  Parameters:
  
  trainFile - the train file
  
  predictFile - the file to predict
  
  cv - number of cross validations
  
  jobs - number of parallel jobs to run
  
  Returns:
  
  a list of integers which represents the classes
  
  Throws:
  
  Exception - throws exception in case of errors
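A training file in the described shape (comma-separated, with a header, class attribute named "target") could be generated as sketched below. The class `TrainingFileSketch`, the feature names, and the choice to put "target" in the last column are assumptions for illustration; the description only fixes the separator, the header, and the attribute name.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

// Hypothetical helper, not part of PythonServer.
final class TrainingFileSketch {

    /** Builds the CSV content: a header row, then one row per training example. */
    static String toCsv(List<String> featureNames, List<double[]> rows, List<Integer> targets) {
        if (rows.size() != targets.size()) {
            throw new IllegalArgumentException("Need exactly one target per row.");
        }
        StringBuilder csv = new StringBuilder();
        // Header: feature columns followed by the class attribute "target".
        csv.append(String.join(",", featureNames)).append(",target\n");
        for (int i = 0; i < rows.size(); i++) {
            double[] row = rows.get(i);
            if (row.length != featureNames.size()) {
                throw new IllegalArgumentException("Row " + i + " does not match the header.");
            }
            for (double value : row) {
                csv.append(value).append(',');
            }
            csv.append(targets.get(i)).append('\n');
        }
        return csv.toString();
    }

    /** Writes the CSV content to the given file (UTF-8). */
    static void writeTrainingFile(File trainFile, List<String> featureNames,
                                  List<double[]> rows, List<Integer> targets) {
        try {
            Files.write(trainFile.toPath(),
                    toCsv(featureNames, rows, targets).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A file written this way could then be passed as `trainFile` to `learnAndApplyMLModel` or `trainAndStoreMLModel`.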
- trainAndStoreMLModel
  
  public void trainAndStoreMLModel(File trainFile, File modelFile, int cv, int jobs) throws Exception
  
  Learn an ML model for a given training file and store it in the given model file. The training file should be comma-separated and contain a header. The class attribute should be named "target".
  
  Parameters:
  
  trainFile - the train file
  
  modelFile - where to store the model
  
  cv - number of cross validations
  
  jobs - number of parallel jobs to run
  
  Throws:
  
  Exception - throws exception in case of errors
- applyStoredMLModel
  
  public List<Integer> applyStoredMLModel(File modelFile, File predictFile) throws Exception
  
  Apply a stored model to a new file (predict file).
  
  Parameters:
  
  modelFile - the file from which the stored model is loaded
  
  predictFile - the predict file
  
  Returns:
  
  a list of integers which represents the classes
  
  Throws:
  
  Exception - throws exception in case of errors
- alignModel
  
  public Alignment alignModel(String vectorPathSource, String vectorPathTarget, String function, Alignment alignment) throws Exception
  
  Align two knowledge graph embeddings
  
  Parameters:
  
  vectorPathSource - the source path to a vector file
  
  vectorPathTarget - the target path to a vector file
  
  function - function which is used to translate the embeddings
  
  alignment - the alignment with initial mapping
  
  Returns:
  
  alignment
  
  Throws:
  
  Exception - in case of errors
- parseJSON
  
  private Alignment parseJSON(String resultString) throws Exception
  
  Throws:
  
  Exception
- trainVectorSpaceModel
  
  public void trainVectorSpaceModel(String modelPath, String trainingFilePath)
  
  Method to train a vector space model. The file for the training (i.e., csv file where first column is id and second column text) has to exist already.
  
  Parameters:
  
  modelPath - identifier for the model (used for querying a specific model)
  
  trainingFilePath - The file path to the file that shall be used for training.
- queryVectorSpaceModel
  
  public double queryVectorSpaceModel(String modelPath, String documentIdOne, String documentIdTwo) throws Exception
  
  Method to query a vector space model (which has to be trained with trainVectorSpaceModel).
  
  Parameters:
  
  modelPath - identifier for the model (used for querying a specific model)
  
  documentIdOne - Document id for the first document
  
  documentIdTwo - Document id for the second document
  
  Returns:
  
  The cosine similarity in the vector space between the two documents.
  
  Throws:
  
  Exception - Thrown if there are server problems.
- queryVectorSpaceModel
  
  public List<Double> queryVectorSpaceModel(String modelPath, List<Correspondence> alignment) throws Exception
  
  Method to query a vector space model (which has to be trained with trainVectorSpaceModel) in a batch mode.
  
  Parameters:
  
  modelPath - identifier for the model (used for querying a specific model)
  
  alignment - the alignment which contains the source and target uris
  
  Returns:
  
  The cosine similarities in the vector space between the requested documents, in the same order.
  
  Throws:
  
  Exception - Thrown if there are server problems.
- queryVectorSpaceModel
  
  public Alignment queryVectorSpaceModel(String modelPath, Alignment alignment) throws Exception
  
  Method to query a vector space model (which has to be trained with trainVectorSpaceModel) in a batch mode.
  
  Parameters:
  
  modelPath - identifier for the model (used for querying a specific model)
  
  alignment - the alignment which contains the source and target uris
  
  Returns:
  
  The alignment where the confidence is updated if possible
  
  Throws:
  
  Exception - Thrown if there are server problems.
- trainDoc2VecModel
  
  public void trainDoc2VecModel(String modelPath, String trainingFilePath, Word2VecConfiguration configuration)
  
  Method to train a doc2vec model. The file for the training (i.e., csv file where first column is id and second column text) has to exist already.
  
  Parameters:
  
  modelPath - identifier for the model (used for querying a specific model)
  
  trainingFilePath - The file path to the file that shall be used for training.
  
  configuration - the configuration for the doc2vec model
- queryDoc2VecModel
  
  public List<Double> queryDoc2VecModel(String modelPath, List<Correspondence> alignment) throws Exception
  
  Method to query a doc2vec model (which has to be trained with trainDoc2VecModel) in a batch mode.
  
  Parameters:
  
  modelPath - identifier for the model (used for querying a specific model)
  
  alignment - the alignment which contains the source and target uris
  
  Returns:
  
  The cosine similarities in the doc2vec space between the requested documents, in the same order.
  
  Throws:
  
  Exception - Thrown if there are server problems.
- trainWord2VecModel
  
  public boolean trainWord2VecModel(String modelOrVectorPath, String trainingFilePath, Word2VecConfiguration configuration)
  
  Method to train a word2vec model. The file for the training (i.e., file with sentences, paths etc.) has to exist already.
  
  Parameters:
  
  modelOrVectorPath - If a vector file is desired, the file ending '.kv' is required.
  
  trainingFilePath - The file path to the file that shall be used for training or to the directory containing the files that shall be used.
  
  configuration - The configuration for the training operation.
  
  Returns:
  
  True if training succeeded, else false.
- getSimilarity
  
  public double getSimilarity(String concept1, String concept2, String modelOrVectorPath)
  
  Get the similarity given two concepts and a gensim model.
  
  Parameters:
  
  concept1 - First concept.
  
  concept2 - Second concept.
  
  modelOrVectorPath - The path to the model or vector file. Note that the vector file MUST end with .kv in order to be recognized as vector file.
  
  Returns:
  
  -1.0 in case of failure, else similarity.
- getVector
  
  public Double[] getVector(String concept, String modelOrVectorPath)
  
  Returns the vector of a concept.
  
  Parameters:
  
  concept - The concept for which the vector shall be obtained.
  
  modelOrVectorPath - The model path or vector file path leading to the file to be used.
  
  Returns:
  
  The vector for the specified concept.
- isInVocabulary
  
  public boolean isInVocabulary(String concept, File modelOrVectorPath)
  
  Returns true when the concept can be found in the vocabulary of the model.
  
  Parameters:
  
  concept - The concept/URI that shall be looked up.
  
  modelOrVectorPath - The model or vector file. Note that the vector file MUST end with .kv in order to be recognized as vector file.
  
  Returns:
  
  True if exists, else false.
- isInVocabulary
  
  public boolean isInVocabulary(String concept, String modelOrVectorPath)
  
  Returns true when the concept can be found in the vocabulary of the model.
  
  Parameters:
  
  concept - The concept/URI that shall be looked up.
  
  modelOrVectorPath - The path to the model or vector file. Note that the vector file MUST end with .kv in order to be recognized as vector file.
  
  Returns:
  
  True if exists, else false.
- getVocabularyTerms
  
  public Set<String> getVocabularyTerms(String modelOrVectorPath)
  
  Returns the full vocabulary of the specified model as HashSet (e.g. for fast indexing). Be aware that this operation can be very memory-consuming for very large models.
  Note: If you just want to check whether a concept exists in the vocabulary, it is better to call isInVocabulary(String, String). Note further that you do not need to build your own cache if the PythonServer has enabled vector caching (you can check this with isVectorCaching()).
  
  Parameters:
  
  modelOrVectorPath - The path to the model or vector file. Note that the vector file MUST end with .kv in order to be recognized as vector file.
  
  Returns:
  
  Returns all vocabulary entries without vectors in a String HashSet.
- writeVocabularyToFile
  
  public void writeVocabularyToFile(String modelOrVectorPath, String fileToWritePath)
  
  Writes the vocabulary of the given gensim model to a text file (UTF-8 encoded).
  
  Parameters:
  
  modelOrVectorPath - The model of which the vocabulary shall be obtained.
  
  fileToWritePath - The file path of the file that shall be written.
- writeVocabularyToFile
  
  public void writeVocabularyToFile(String modelOrVectorPath, File fileToWrite)
  
  Writes the vocabulary of the given gensim model to a text file (UTF-8 encoded).
  
  Parameters:
  
  modelOrVectorPath - The model of which the vocabulary shall be obtained.
  
  fileToWrite - The file that shall be written.
- writeSetToFile
  
  private static <T> void writeSetToFile(File fileToWrite, Set<T> setToWrite)
  
  This method writes the content of a Set to a file. The file will be UTF-8 encoded.
  
  Type Parameters:
  
  T - Type of the Set.
  
  Parameters:
  
  fileToWrite - File which will be created and in which the data will be written.
  
  setToWrite - Set whose content will be written into fileToWrite.
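How such a writer might look is sketched below. The class `SetWriterSketch` is hypothetical, and writing one element per line via `String.valueOf` is an assumption; only the signature shape and the UTF-8 encoding come from the description.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Set;

// Hypothetical re-implementation for illustration, not the library code.
final class SetWriterSketch {

    /** Writes every element of the set on its own line, UTF-8 encoded. */
    static <T> void writeSetToFile(File fileToWrite, Set<T> setToWrite) {
        try (BufferedWriter writer =
                     Files.newBufferedWriter(fileToWrite.toPath(), StandardCharsets.UTF_8)) {
            for (T element : setToWrite) {
                // Assumption: the textual form of an element is its String.valueOf(...) value.
                writer.write(String.valueOf(element));
                writer.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing the charset explicitly avoids depending on the platform default encoding.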
- addModelToRequest
  
  private void addModelToRequest(org.apache.http.client.methods.HttpGet request, String modelOrVectorPath)
  
  Given a path to a model or vector file, this method determines whether it is a model or a vector file and adds the corresponding parameter to the request.
  
  Parameters:
  
  request - The request to which the model/vector file shall be added to.
  
  modelOrVectorPath - The path to the model/vector file.
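The model-versus-vector decision can be illustrated with a small sketch. It relies on the rule stated for the public methods that a vector file must end with .kv. The class `ModelOrVectorSketch`, the use of a plain `Map` instead of an `HttpGet` request, and the parameter names `vector_path` and `model_path` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not the library code.
final class ModelOrVectorSketch {

    /** A path is treated as a vector file if and only if it ends with ".kv". */
    static boolean isVectorFile(String modelOrVectorPath) {
        return modelOrVectorPath.endsWith(".kv");
    }

    /** Adds the path under a parameter name that depends on the file type (names are assumed). */
    static void addModelToParameters(Map<String, String> parameters, String modelOrVectorPath) {
        if (isVectorFile(modelOrVectorPath)) {
            parameters.put("vector_path", modelOrVectorPath);
        } else {
            parameters.put("model_path", modelOrVectorPath);
        }
    }

    public static void main(String[] args) {
        Map<String, String> parameters = new HashMap<>();
        addModelToParameters(parameters, "/models/embeddings.kv");
        System.out.println(parameters);
    }
}
```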
- getCanonicalPath
  
  private String getCanonicalPath(String filePath)
  
  Obtain the canonical model path.
  
  Parameters:
  
  filePath - The path to the gensim model or gensim vector file.
  
  Returns:
  
  The canonical model path as String.
- getCanonicalPath
  
  private String getCanonicalPath(File file)
  
  Obtain the canonical model path.
  
  Parameters:
  
  file - the file to get the canonical path from
  
  Returns:
  
  The canonical path as String.
- getCanonicalPathNonExistent
  
  private String getCanonicalPathNonExistent(File file)
  
  Obtain the canonical model path.
  
  Parameters:
  
  file - the file to get the canonical path from
  
  Returns:
  
  The canonical path as String.
- runGroupShuffleSplit
  
  public List<Integer> runGroupShuffleSplit(List<Integer> groups, double trainSize) throws Exception
  
  Throws:
  
  Exception
- printHello
  
  private void printHello(String name)
  
  A quick technical demo. If the service works, it will print "Hello name".
  
  Parameters:
  
  name - The name that shall be printed.
- runRequest
  
  private String runRequest(org.apache.http.client.methods.HttpUriRequest request) throws PythonServerException
  
  Throws:
  
  PythonServerException
- getInstance
  
  public static PythonServer getInstance()
  
  Get the instance.
  
  Returns:
  
  The PythonServer instance.
- getInstance
  
  public static PythonServer getInstance(File resourcesDirectory)
  
  Get the instance (singleton pattern).
  
  Parameters:
  
  resourcesDirectory - Directory where the files shall be copied to.
  
  Returns:
  
  The PythonServer instance.
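The interplay of the singleton accessor with the isShutDown and isHookStarted flags can be pictured as follows. This is a simplified sketch under assumptions: the class `SingletonSketch`, the `start()` method, and the counter are hypothetical, and no python process is launched. It only shows that the instance is reused, that a shut-down server is re-initialized on the next access, and that the shutdown hook is registered once.

```java
// Hypothetical illustration of the pattern, not the library code.
final class SingletonSketch {

    private static SingletonSketch instance;
    private static boolean isShutDown = true; // initial state: shut down
    private boolean isHookStarted = false;
    private int startCount = 0;

    private SingletonSketch() {
    }

    /** Returns the single instance and (re-)starts it if it is currently shut down. */
    static synchronized SingletonSketch getInstance() {
        if (instance == null) {
            instance = new SingletonSketch();
        }
        if (isShutDown) {
            instance.start();
        }
        return instance;
    }

    private void start() {
        startCount++; // stands in for launching the server process
        isShutDown = false;
        if (!isHookStarted) {
            // Register the hook only once, even after several re-initializations.
            Runtime.getRuntime().addShutdownHook(new Thread(SingletonSketch::shutDown));
            isHookStarted = true;
        }
    }

    static synchronized void shutDown() {
        isShutDown = true;
    }

    int getStartCount() {
        return startCount;
    }
}
```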
- checkRequirements
  
  public static boolean checkRequirements()
  
  Checks whether all Python requirements are installed and whether the server is functional.
  
  Returns:
  
  True if the server is fully functional, else false.
- shutDown
  
  public static void shutDown()
  
  Shut down the service.
- exportResource
  
  private void exportResource(File baseDirectory, String resourceName)
  
  Export a resource embedded into a Jar file to the local file path.
  
  Parameters:
  
  baseDirectory - The base directory.
  
  resourceName - The name of the resource, e.g. "/SmartLibrary.dll"
- startServer
  
  private boolean startServer()
  
  Initializes the server.
  
  Returns:
  
  True if successful, else false.
- getLogLevel
  
  private String getLogLevel()
- getPythonCommand
  
  protected String getPythonCommand()
  
  Returns the python command which is extracted from file melt-resources/python_command.txt.
  
  Returns:
  
  The python executable path.
- updateEnvironmentPath
  
  protected void updateEnvironmentPath(Map<String,String> environment, String pythonCommand)
  
  Updates the environment variable PATH with additional directories needed by python, such as env/lib/bin.
  
  Parameters:
  
  environment - The environment to be changed.
  
  pythonCommand - The python executable path.
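Updating PATH in an environment map, such as the one returned by `ProcessBuilder.environment()`, might look like the sketch below. The class `EnvironmentPathSketch` and the method `prependToPath` are hypothetical. Prepending (rather than appending) the additional directories and the case-insensitive lookup of the PATH key are assumptions of this sketch.

```java
import java.io.File;
import java.util.Map;

// Hypothetical illustration, not the library code.
final class EnvironmentPathSketch {

    /** Puts additionalPath in front of the existing PATH value of the given environment. */
    static void prependToPath(Map<String, String> environment, String additionalPath) {
        if (additionalPath == null || additionalPath.isEmpty()) {
            return; // nothing to add
        }
        // The variable may be spelled differently (e.g. "Path"), so reuse an existing key.
        String pathKey = "PATH";
        for (String key : environment.keySet()) {
            if (key.equalsIgnoreCase("PATH")) {
                pathKey = key;
                break;
            }
        }
        String current = environment.get(pathKey);
        String updated = (current == null || current.isEmpty())
                ? additionalPath
                : additionalPath + File.pathSeparator + current;
        environment.put(pathKey, updated);
    }
}
```

`File.pathSeparator` is used so that the same code produces `:`-separated entries on Unix-like systems and `;`-separated entries on Windows.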
- getPythonAdditionalPath
  
  protected String getPythonAdditionalPath(String pythonCommand)
  
  Returns a concatenated path of directories which can be used in the PATH variable. Based on the python executable path, it searches for all bin directories within the python directory.
  
  Parameters:
  
  pythonCommand - The python executable path.
  
  Returns:
  
  a concatenated path of directories which can be used in the PATH variable.
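A search of this kind could be sketched as follows. The class `PythonPathSketch`, the search depth of three levels, and the case-insensitive match on the directory name `bin` are assumptions; the description only says that bin directories within the python directory are collected and concatenated.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration, not the library code.
final class PythonPathSketch {

    /** Joins all directories named "bin" below the python directory with the path separator. */
    static String additionalPath(String pythonCommand) {
        Path pythonDirectory = Paths.get(pythonCommand).toAbsolutePath().getParent();
        if (pythonDirectory == null || !Files.isDirectory(pythonDirectory)) {
            return ""; // nothing to search
        }
        // Assumption: looking three levels deep is enough for typical environments.
        try (Stream<Path> paths = Files.walk(pythonDirectory, 3)) {
            return paths
                    .filter(Files::isDirectory)
                    .filter(p -> p.getFileName() != null
                            && p.getFileName().toString().equalsIgnoreCase("bin"))
                    .map(Path::toString)
                    .sorted()
                    .collect(Collectors.joining(File.pathSeparator));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The stream returned by `Files.walk` holds open directory handles, which is why it is closed with try-with-resources.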
- cosineSimilarity
  
  public static double cosineSimilarity(Double[] vector1, Double[] vector2)
  
  Calculate the cosine similarity between two vectors.
  
  Parameters:
  
  vector1 - First vector.
  
  vector2 - Second vector.
  
  Returns:
  
  Cosine similarity as double.
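For reference, cosine similarity is the dot product of the two vectors divided by the product of their Euclidean norms. The sketch below computes it for `Double[]` inputs. The class `CosineSimilaritySketch` is hypothetical, and its handling of unequal lengths (exception) and zero vectors (result 0.0) is an assumption, not a statement about how the library behaves in those cases.

```java
// Hypothetical re-implementation for illustration, not the library code.
final class CosineSimilaritySketch {

    /** Returns dot(v1, v2) / (|v1| * |v2|). */
    static double cosineSimilarity(Double[] vector1, Double[] vector2) {
        if (vector1 == null || vector2 == null || vector1.length != vector2.length) {
            throw new IllegalArgumentException("Vectors must be non-null and of equal length.");
        }
        double dotProduct = 0.0;
        double squaredNorm1 = 0.0;
        double squaredNorm2 = 0.0;
        for (int i = 0; i < vector1.length; i++) {
            dotProduct += vector1[i] * vector2[i];
            squaredNorm1 += vector1[i] * vector1[i];
            squaredNorm2 += vector2[i] * vector2[i];
        }
        double denominator = Math.sqrt(squaredNorm1) * Math.sqrt(squaredNorm2);
        if (denominator == 0.0) {
            return 0.0; // assumption: a zero vector is similar to nothing
        }
        return dotProduct / denominator;
    }
}
```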
- writeModelAsTextFile
  
  public void writeModelAsTextFile(String modelOrVectorPath, String fileToWrite)
  
  Writes the vectors to a human-readable text file.
  
  Parameters:
  
  modelOrVectorPath - The path to the model or vector file. Note that the vector file MUST end with .kv in order to be recognized as vector file.
  
  fileToWrite - The file that will be written.
- writeModelAsTextFile
  
  public void writeModelAsTextFile(String modelOrVectorPath, String fileToWrite, String entityFile)
  
  Writes the vectors to a human-readable text file.
  
  Parameters:
  
  modelOrVectorPath - The path to the model or vector file. Note that the vector file MUST end with .kv in order to be recognized as vector file.
  
  fileToWrite - The file that will be written.
  
  entityFile - The vocabulary that shall appear in the text file (can be null if all words shall be written). The file must contain one word per line. The contents must be a subset of the vocabulary.
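Because the entity file must contain one word per line and only words from the vocabulary, it can be worth validating it before the call. The sketch below is one way to do that. The class `EntityFileSketch` and its method are hypothetical, and trimming lines and skipping blank ones are assumptions. The vocabulary set could, for instance, come from `getVocabularyTerms(String)`.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-check, not part of PythonServer.
final class EntityFileSketch {

    /** Returns the entity-file words (one per line) that are not part of the vocabulary. */
    static Set<String> missingFromVocabulary(List<String> entityLines, Set<String> vocabulary) {
        Set<String> missing = new LinkedHashSet<>();
        for (String line : entityLines) {
            String word = line.trim();
            // Assumption: blank lines are ignored rather than treated as words.
            if (!word.isEmpty() && !vocabulary.contains(word)) {
                missing.add(word);
            }
        }
        return missing;
    }
}
```

An empty result means the entity list is a subset of the vocabulary, as the parameter description requires.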
- getResourcesDirectory
  
  public File getResourcesDirectory()
- setPythonCommandBackup
  
  public static void setPythonCommandBackup(String pythonCommandBackup)
  
  Sets the python command programmatically. This is used when no external file python_command.txt is found.
  
  Parameters:
  
  pythonCommandBackup - the python command.
- setOverridePythonFiles
  
  public static void setOverridePythonFiles(boolean overrideFiles)
  
  If set to true, all python files (e.g. python server melt and requirements.txt file) will be overridden with every execution. If you want to make changes to the python server (e.g. to develop and test a feature), you can set it to false. Your modifications to these files will then not be overwritten.
  
  Parameters:
  
  overrideFiles - if true, override the python server files.
- getResourcesDirectoryPath
  
  public String getResourcesDirectoryPath()
  
  Get the resource directory as String.
  
  Returns:
  
  Directory as String.
- setResourcesDirectory
  
  public void setResourcesDirectory(File resourcesDirectory)
  
  Set the directory where the python files will be copied to.
  
  Parameters:
  
  resourcesDirectory - Must be a directory.
- getVocabularySize
  
  public int getVocabularySize(String modelOrVectorPath)
  
  Returns the size of the vocabulary of the stated model/vector set.
  
  Parameters:
  
  modelOrVectorPath - The path to the model or vector file. Note that the vector file MUST end with .kv in order to be recognized as vector file.
  
  Returns:
  
  -1 in case of an error else the size of the vocabulary.
- isVectorCaching
  
  public boolean isVectorCaching()
  
  Returns true if vector caching is enabled, else false.
  
  Returns:
  
  True if enabled, else false.
- setVectorCaching
  
  public void setVectorCaching(boolean vectorCaching)
  
  If vector caching is turned on, similarities will be calculated on the Java side (rather than in Python) and vectors are held in memory. Turn this option on if you plan to do many computations with the same set of vectors. This will increase the performance at the cost of memory.
  
  Parameters:
  
  vectorCaching - True if caching shall be enabled, else false.
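The trade-off behind vector caching can be illustrated with a small stand-alone sketch: with caching enabled, each vector is fetched once and then served from a local map; with caching disabled, every lookup goes to the remote side. The class `VectorCacheSketch`, the `remoteLookup` function that stands in for the HTTP call, and the cache key format are hypothetical.

```java
import java.util.HashMap;
import java.util.function.BiFunction;

// Hypothetical illustration of the caching idea, not the library code.
final class VectorCacheSketch {

    private final HashMap<String, Double[]> vectorCache = new HashMap<>();
    private final BiFunction<String, String, Double[]> remoteLookup;
    private boolean isVectorCaching = false;

    /** remoteLookup(concept, modelOrVectorPath) stands in for the request to the python server. */
    VectorCacheSketch(BiFunction<String, String, Double[]> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    void setVectorCaching(boolean vectorCaching) {
        this.isVectorCaching = vectorCaching;
    }

    Double[] getVector(String concept, String modelOrVectorPath) {
        if (!isVectorCaching) {
            return remoteLookup.apply(concept, modelOrVectorPath);
        }
        // Assumed key format: one entry per (model, concept) pair.
        String key = modelOrVectorPath + "|" + concept;
        Double[] vector = vectorCache.get(key);
        if (vector == null) {
            vector = remoteLookup.apply(concept, modelOrVectorPath);
            if (vector != null) {
                vectorCache.put(key, vector); // cache only successful lookups
            }
        }
        return vector;
    }
}
```

Once vectors are held locally, similarities can be computed in Java (for example with `cosineSimilarity(Double[], Double[])`) without further cross-language calls.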
- getPort
  
  public static int getPort()
- setPort
  
  public static void setPort(int port)
- getServerUrl
  
  public static String getServerUrl()
