Class Word2VecConfiguration

java.lang.Object
de.uni_mannheim.informatik.dws.melt.matching_ml.python.Word2VecConfiguration

public class Word2VecConfiguration
extends Object
The configuration for the word2vec calculation.
  • Field Details

    • LOGGER

      private static org.slf4j.Logger LOGGER
      Default logger.
    • type

      private Word2VecType type
      Model type. Default type: SG.
    • vectorDimension

      private int vectorDimension
      Size of the vector. Default: 200.
    • VECTOR_DIMENSION_DEFAULT

      public static final int VECTOR_DIMENSION_DEFAULT
      Default value for parameter vectorDimension.
      See Also:
      Constant Field Values
    • windowSize

      private int windowSize
      The size of the window during the word2vec training. Default: 5.
    • WINDOW_SIZE_DEFAULT

      public static final int WINDOW_SIZE_DEFAULT
      Default value for parameter windowSize.
      See Also:
      Constant Field Values
    • iterations

      private int iterations
      The number of iterations during the word2vec training.
    • ITERATIONS_DEFAULT

      public static final int ITERATIONS_DEFAULT
      Default value for parameter iterations.
      See Also:
      Constant Field Values
    • negatives

      private int negatives
      The number of negative samples during the word2vec training. Default: 5.
    • NEGATIVES_DEFAULT

      public static final int NEGATIVES_DEFAULT
      Default value for parameter negatives.
      See Also:
      Constant Field Values
    • minCount

      private int minCount
      The minimum count for the word2vec training: words that occur fewer times than this in the corpus are ignored.
    • MIN_COUNT_DEFAULT

      public static final int MIN_COUNT_DEFAULT
      Default value for parameter minCount.
      See Also:
      Constant Field Values
    • numberOfThreads

      private int numberOfThreads
      The number of threads to be used for the computation.
    • sample

      private double sample
      Parameter description taken from the gensim documentation: "The threshold for configuring which higher-frequency words are randomly downsampled, useful range is (0, 1e-5)."
    • SAMPLE_DEFAULT

      public static final double SAMPLE_DEFAULT
      Default value for parameter sample.
      See Also:
      Constant Field Values
    • epochs

      private int epochs
      The number of epochs to be used for the training.
    • EPOCHS_DEFAULT

      public static final int EPOCHS_DEFAULT
      Default value for parameter epochs.
      See Also:
      Constant Field Values
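The minCount and sample fields correspond to two steps of vocabulary preparation. The sketch below assumes they are passed on to gensim with gensim's min_count and sample semantics: words below minCount are dropped, and each remaining word is kept with probability (sqrt(f / t) + 1) * t / f, capped at 1, where f is the word's count and t = sample * (total number of retained tokens). The formula is our reading of gensim's vocabulary-preparation code; the class VocabularyPreparation and its methods are hypothetical and not part of MELT.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class VocabularyPreparation {

    /** Counts tokens and drops words occurring fewer than minCount times (cf. gensim's min_count). */
    static Map<String, Long> retainedCounts(List<List<String>> sentences, int minCount) {
        Map<String, Long> counts = new TreeMap<>();
        for (List<String> sentence : sentences) {
            for (String token : sentence) {
                counts.merge(token, 1L, Long::sum);
            }
        }
        counts.values().removeIf(count -> count < minCount);
        return counts;
    }

    /**
     * Probability of keeping one occurrence of a word, following gensim's downsampling rule:
     * t = sample * retainedTotal, p = (sqrt(f / t) + 1) * t / f, capped at 1.
     * A sample value of 0 disables downsampling. Values above 1, which gensim
     * interprets as an absolute count, are not handled in this sketch.
     */
    static double keepProbability(long wordCount, long retainedTotal, double sample) {
        if (sample <= 0) {
            return 1.0;
        }
        double threshold = sample * retainedTotal;
        return Math.min(1.0, (Math.sqrt(wordCount / threshold) + 1) * (threshold / wordCount));
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("person", "has", "name"),
                List.of("person", "has", "email"),
                List.of("person", "knows", "person"));
        System.out.println(retainedCounts(corpus, 2)); // {has=2, person=4}

        // Toy numbers: 1,000,000 retained tokens and sample = 1e-3 give a threshold count of 1000.
        System.out.println(keepProbability(4000, 1_000_000, 1e-3)); // 0.75
        System.out.println(keepProbability(500, 1_000_000, 1e-3));  // 1.0
    }
}
```

With these toy numbers, a word that makes up 0.4% of the corpus is kept only three times out of four, while rarer words are never dropped. Lower values of sample downsample frequent words more aggressively.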
  • Constructor Details

    • Word2VecConfiguration

      public Word2VecConfiguration()
      Default constructor. Default values are used for the parameters, for example the training type SG.
    • Word2VecConfiguration

      public Word2VecConfiguration(Word2VecType type)
      Constructor
      Parameters:
      type - Training type (SG/CBOW).
    • Word2VecConfiguration

      public Word2VecConfiguration(Word2VecType type, int vectorDimension)
      Constructor
      Parameters:
      type - Training type (SG/CBOW).
      vectorDimension - Size of the vectors (number of elements).
    • Word2VecConfiguration

      public Word2VecConfiguration(Word2VecType type, int vectorDimension, int iterations)
      Constructor
      Parameters:
      type - Training type (SG/CBOW).
      vectorDimension - Size of the vectors (number of elements).
      iterations - Number of iterations (also known as epochs).
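The overloaded constructors suggest a chain in which each shorter constructor fills in a documented default (SG for the type, 200 for the vector dimension). The sketch below models that chain with a hypothetical stand-in class, ConfigurationSketch, and a local Type enum in place of Word2VecType; it is not MELT's source, and it leaves out iterations because its default value is not documented on this page.

```java
public class ConfigurationSketch {

    /** Stand-in for MELT's Word2VecType, limited to the two documented values. */
    enum Type { SG, CBOW }

    /** Documented default for the vector dimension. */
    static final int VECTOR_DIMENSION_DEFAULT = 200;

    final Type type;
    final int vectorDimension;

    /** No-argument constructor: delegates with the documented default type SG. */
    ConfigurationSketch() {
        this(Type.SG);
    }

    /** Delegates with the documented default vector dimension. */
    ConfigurationSketch(Type type) {
        this(type, VECTOR_DIMENSION_DEFAULT);
    }

    /** Most specific constructor: stores the given values. */
    ConfigurationSketch(Type type, int vectorDimension) {
        this.type = type;
        this.vectorDimension = vectorDimension;
    }

    public static void main(String[] args) {
        ConfigurationSketch defaults = new ConfigurationSketch();
        ConfigurationSketch cbow = new ConfigurationSketch(Type.CBOW, 100);
        System.out.println(defaults.type + " " + defaults.vectorDimension); // SG 200
        System.out.println(cbow.type + " " + cbow.vectorDimension);         // CBOW 100
    }
}
```

With the real class, the corresponding calls would be new Word2VecConfiguration() and new Word2VecConfiguration(Word2VecType.CBOW, 100), assuming Word2VecType exposes constants named SG and CBOW; remaining parameters can then be adjusted through the setters listed below.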
  • Method Details

    • getNumberOfThreads

      public int getNumberOfThreads()
    • setNumberOfThreads

      public void setNumberOfThreads(int numberOfThreads)
    • getNegatives

      public int getNegatives()
    • setNegatives

      public void setNegatives(int negatives)
    • getIterations

      public int getIterations()
    • setIterations

      public void setIterations(int iterations)
    • getWindowSize

      public int getWindowSize()
    • setWindowSize

      public void setWindowSize(int windowSize)
    • getVectorDimension

      public int getVectorDimension()
    • setVectorDimension

      public void setVectorDimension(int vectorDimension)
    • getMinCount

      public int getMinCount()
    • setMinCount

      public void setMinCount(int minCount)
    • getType

      public Word2VecType getType()
    • setType

      public void setType(Word2VecType type)
    • getSample

      public double getSample()
    • setSample

      public void setSample(double sample)
    • getEpochs

      public int getEpochs()
    • setEpochs

      public void setEpochs(int epochs)