java.lang.Object

de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.external.services.stringOperations.StringOperations

public class StringOperations extends Object

A helper class for string operations.

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static enum

StringOperations.AbbreviationHandler

Enum which indicates how abbreviations in camel case are handled.
Field Summary

Fields

Modifier and Type

Field

Description

static HashSet<String>

ENGLISH_NUMBER_WORDS_SET

A set containing ordinal and cardinal numbers from 1 to 10.

private static final org.slf4j.Logger

LOGGER

private static final String

PATH_TO_STOPWORD_FILE

private static final String

PATH_TO_STOPWORD_FILE_JAR

private static final HashSet<String>

separatingWords

private static HashSet<String>

stopwords
Constructor Summary

Constructors

Constructor

Description

StringOperations()
Method Summary

Modifier and Type

Method

Description

static HashSet<String>

addAlternativeWritingsSimple(HashSet<String> set)

Generate alternative writings (particularly interesting for English and German hyphenation).

static String

addTagIfNotExists(String addTagString)

Adds tags if they are not there yet.

static String

cleanStringForDBpediaQuery(String inputString)

This method removes illegal characters of a string when used in a SPARQL query.

static String

cleanValueFromTypeAnnotation(String valueToClean)

Will clean a value from a type annotation.

static String[]

clearArrayFromNumbers(String[] array)

Given a String array, numeric tokens will be removed.

static String[]

clearArrayFromStopwords(String[] arrayWithStopwords)

Returns an array cleaned from stopwords.

static HashSet<String>

clearHashSetFromStopwords(HashSet<String> hashSetWithStopwords)

Removes the stopwords from the given HashSet.

private static String

concatArray(String[] array)

Concatenates a string array to one string separated by spaces.

static boolean

containsSplitWords(String phrase)

static boolean

containsSplitWords(String[] phraseTokens)

static String

convertToTag(String stringToConvert)

Converts a string to a tag.

static String

getCommaSeparatedString(HashSet<String> set)

Get a comma separated list of the given HashSet<String>.

static float

getLevenshteinDistanceSimilarTokensOneWay(String[] sarray1, String[] sarray2)

Returns the Levenshtein distance between two token sets.

static int

getNumberOfTokensBestGuess(String phrase)

Returns the number of tokens that were found in a phrase.

static int

getNumberOfTokensBestGuess(String phrase, StringOperations.AbbreviationHandler handler)

Returns the number of tokens that were found in a phrase.

static boolean

hasSimilarTokenWriting(String[] sarray1, String[] sarray2, float tolerance)

Checks whether two arrays have a similar writing.

static void

initStopwords()

Initialize reading stopwords.

static boolean

isCamelCase(String phrase)

Function which indicates whether a phrase is in camel case or not.

static boolean

isEnglishNumberWord(String stringToBeChecked)

Checks whether the stringToBeChecked is an ordinal or cardinal number in English in written format.

static boolean

isMeaningfulFragment(String fragment)

Checks whether a fragment is meaningful by counting the number of digits.

static boolean

isNaturalNumber(String stringToBeChecked)

Returns whether the stringToBeChecked is a number, e.g. '123' or 'XI'.

static boolean

isSameString(String s1, String s2)

This method checks whether two Strings are very similar by performing simple string operations.

static boolean

isSameStringIgnoringStopwords(String s1, String s2)

This method checks whether two Strings are very similar by performing simple string operations.

static boolean

isSameStringIgnoringStopwordsAndNumbers(String s1, String s2)

This method checks whether two Strings are very similar by performing simple string operations.

static boolean

isSameStringIgnoringStopwordsAndNumbersWithSpellingCorrection(String s1, String s2, float maxAllowedEditDistance)

static boolean

isSameStringStemming(String s1, String s2)

This method checks whether two Strings are very similar by performing simple string operations including Porter's stemmer.

static boolean

isSpaceCase(String phrase)

Function which indicates whether a phrase is space separated or not.

static boolean

isUnderscoreCase(String phrase)

Function which indicates whether a phrase is in underscore case or not.

private static void

lazyInitStopwords()

Initialize reading stopwords file if it has not been read before.

static void

printStringArray(String[] stringArray)

A method which prints the content of a string array to the command line.

static @NotNull List<String>

readListFromFile(File file)

Reads a List from the file as specified by the file.

static @NotNull List<String>

readListFromFile(String filePath)

Reads a List from the file as specified by the file path.

static @NotNull Set<String>

readSetFromFile(File file)

Reads a Set from the file as specified by the file.

static @NotNull Set<String>

readSetFromFile(String filePath)

Reads a Set from the file as specified by the file path.

static String

reduceToLettersOnly(String string)

Cleans a string from anything that is not a letter.

static String

removeEnglishGenitiveS(String string)

Removes the English genitive s.

static String[]

removeEnglishGenitiveS(String[] array)

Removes free floating "s", "S", and cuts "'s".

static HashSet<String>

removeEnglishGenitiveS(HashSet<String> set)

Remove free floating s from the given set.

static String

removeEnglishPlural(String stringToBeModified)

Remove the plural in English words.

static String

removeLanguageAnnotation(String s)

Removes the language annotation from a string.

static String

removeNonAlphanumericCharacters(String stringWithPunctuation)

Removes everything that is not a digit, letter, space, or underscore.

static HashSet<String>

removeNumbers(HashSet<String> set)

Remove numbers from a set of strings.

static String

removeTag(String tagToConvert)

Removes the angle brackets from a tag.

static String[]

splitUsingSplitWords(String[] phraseTokens)

static String

stemPorter(String word)

Wrapping of Porter's Stemming Code.

static String[]

tokenizeBestGuess(String phrase)

Given an arbitrary phrase, the method determines which casing is used and applies the suitable tokenizer.

static String[]

tokenizeBestGuess(String phrase, StringOperations.AbbreviationHandler handler)

Given an arbitrary phrase, the method determines which casing is used and applies the suitable tokenizer.

static String[]

tokenizeCamelCase(String phrase, StringOperations.AbbreviationHandler handler)

Given a camel cased String, this method will split it into multiple tokens.

static String[]

tokenizeCamelCaseAndSlash(String phrase, StringOperations.AbbreviationHandler handler)

Tokenize and use camelCase and slashes as tokenization tokens.

static String[]

tokenizeSpaceCase(String phrase)

Tokenizes a phrase using spaces.

static String[]

tokenizeUnderScoreCase(String phrase)

Tokenizes a phrase using underscores.

private static String[]

tokenizeWithoutCamelCaseRecognition(String phrase)

Split using slash, underscore and space.

static <T> void

writeSetToFile(File fileToWrite, Set<T> setToWrite)

This method writes the content of a Set to a file.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- LOGGER
  
  private static final org.slf4j.Logger LOGGER
- separatingWords
  
  private static final HashSet<String> separatingWords
- stopwords
  
  private static HashSet<String> stopwords
- PATH_TO_STOPWORD_FILE
  
  private static final String PATH_TO_STOPWORD_FILE
  See Also:
  
  Constant Field Values
- PATH_TO_STOPWORD_FILE_JAR
  
  private static final String PATH_TO_STOPWORD_FILE_JAR
  See Also:
  
  Constant Field Values
- ENGLISH_NUMBER_WORDS_SET
  
  public static HashSet<String> ENGLISH_NUMBER_WORDS_SET
  
  A set containing ordinal and cardinal numbers from 1 to 10.
Constructor Details
- StringOperations
  
  public StringOperations()
Method Details
- isCamelCase
  
  public static boolean isCamelCase(String phrase)
  
  Function which indicates whether a phrase is in camel case or not.
  
  Parameters:
  
  phrase - The phrase to be checked.
  
  Returns:
  
  true if phrase is in camel case, else false.
- isUnderscoreCase
  
  public static boolean isUnderscoreCase(String phrase)
  
  Function which indicates whether a phrase is in underscore case or not.
  
  Parameters:
  
  phrase - The phrase to be checked.
  
  Returns:
  
  True if phrase is in underscore case, else false.
- isSpaceCase
  
  public static boolean isSpaceCase(String phrase)
  
  Function which indicates whether a phrase is space separated or not.
  
  Parameters:
  
  phrase - The phrase to be checked.
  
  Returns:
  
  True if space-separated, else false.
- tokenizeCamelCase
  
  public static String[] tokenizeCamelCase(String phrase, StringOperations.AbbreviationHandler handler)
  
  Given a camel cased String, this method will split it into multiple tokens.
  
  Parameters:
  
  phrase - The phrase to be tokenized.
  
  handler - Determines how to handle abbreviations.
  
  Returns:
  
  The tokens of the phrase.
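As a rough illustration of camel-case tokenization, the sketch below splits a phrase at every lower-case to upper-case boundary. The class `CamelCaseSketch` and the method `splitCamelCase` are hypothetical names, not part of this API, and the sketch deliberately ignores the abbreviation handling that the `handler` parameter controls.

```java
import java.util.Arrays;

class CamelCaseSketch {

    // Splits at each position that is preceded by a lower-case letter
    // and followed by an upper-case letter (zero-width regex match).
    // Runs of upper-case letters (abbreviations) are NOT split here.
    static String[] splitCamelCase(String phrase) {
        return phrase.split("(?<=[a-z])(?=[A-Z])");
    }

    public static void main(String[] args) {
        // prints [has, Similar, Token, Writing]
        System.out.println(Arrays.toString(splitCamelCase("hasSimilarTokenWriting")));
    }
}
```

A phrase such as "parseXMLFile" comes out as "parse" and "XMLFile" with this sketch; deciding where the abbreviation ends is exactly the question an abbreviation handler has to answer.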
- tokenizeSpaceCase
  
  public static String[] tokenizeSpaceCase(String phrase)
  
  Tokenizes a phrase using spaces.
  
  Parameters:
  
  phrase - The phrase to be tokenized.
  
  Returns:
  
  The tokens of the phrase.
- tokenizeUnderScoreCase
  
  public static String[] tokenizeUnderScoreCase(String phrase)
  
  Tokenizes a phrase using underscores.
  
  Parameters:
  
  phrase - The phrase to be tokenized.
  
  Returns:
  
  The tokens of the phrase.
- printStringArray
  
  public static void printStringArray(String[] stringArray)
  
  A method which prints the content of a string array to the command line.
  
  Parameters:
  
  stringArray - Array to be printed.
- tokenizeBestGuess
  
  public static String[] tokenizeBestGuess(String phrase, StringOperations.AbbreviationHandler handler)
  
  Given an arbitrary phrase, the method determines which casing is used and applies the suitable tokenizer. The tokenizer is not very aggressive: a '-', for instance, will not be used as a splitter. For camel cased phrases with abbreviations, all combinations are determined if no handler is defined.
  
  Parameters:
  
  phrase - The phrase to be tokenized.
  
  handler - The handler which determines how abbreviations shall be handled.
  
  Returns:
  
  Tokens.
- tokenizeWithoutCamelCaseRecognition
  
  private static String[] tokenizeWithoutCamelCaseRecognition(String phrase)
  
  Split using slash, underscore and space.
  
  Parameters:
  
  phrase - Phrase to be split.
  
  Returns:
  
  Array of individual tokens.
- tokenizeCamelCaseAndSlash
  
  public static String[] tokenizeCamelCaseAndSlash(String phrase, StringOperations.AbbreviationHandler handler)
  
  Tokenize and use camelCase and slashes as tokenization tokens.
  
  Parameters:
  
  phrase - The phrase to be tokenized.
  
  handler - Abbreviation handler.
  
  Returns:
  
  String array of tokens.
- tokenizeBestGuess
  
  public static String[] tokenizeBestGuess(String phrase)
  
  Given an arbitrary phrase, the method determines which casing is used and applies the suitable tokenizer. For camel cased phrases with abbreviations, it is assumed that an upper case follows an abbreviation.
  
  Parameters:
  
  phrase - The phrase to be tokenized.
  
  Returns:
  
  Tokens.
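The best-guess idea (detect the casing, then apply the matching tokenizer) can be sketched as a simple dispatcher. The class `BestGuessTokenizerSketch`, the method `tokenize`, and the order of the checks are assumptions made for illustration; the library's actual detection rules and abbreviation handling may differ.

```java
import java.util.Arrays;

class BestGuessTokenizerSketch {

    // Picks a tokenizer based on which separator appears in the phrase.
    // The order of the checks (underscore, space, camel case) is an assumption.
    static String[] tokenize(String phrase) {
        String trimmed = phrase.trim();
        if (trimmed.contains("_")) {
            return trimmed.split("_+");        // underscore case
        }
        if (trimmed.contains(" ")) {
            return trimmed.split("\\s+");      // space case
        }
        // Fall back to a plain camel-case split; a hyphen is never a splitter.
        return trimmed.split("(?<=[a-z])(?=[A-Z])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("european_union")));
        System.out.println(Arrays.toString(tokenize("European Union")));
        System.out.println(Arrays.toString(tokenize("europeanUnion")));
        System.out.println(Arrays.toString(tokenize("well-known fact")));
    }
}
```

In line with the description above, "well-known fact" yields two tokens in this sketch because the hyphen is left untouched.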
- getNumberOfTokensBestGuess
  
  public static int getNumberOfTokensBestGuess(String phrase, StringOperations.AbbreviationHandler handler)
  
  Returns the number of tokens that were found in a phrase.
  
  Parameters:
  
  phrase - The phrase to be checked.
  
  handler - defines the handling of abbreviations. Note that AbbreviationHandler.CONSIDER_ALL leads to more tokens than actually exist because combinations are employed.
  
  Returns:
  
  Number of tokens.
- getNumberOfTokensBestGuess
  
  public static int getNumberOfTokensBestGuess(String phrase)
  
  Returns the number of tokens that were found in a phrase. Note that the number of tokens is obtained using AbbreviationHandler.UPPER_CASE_FOLLOWS_ABBREVIATION in the default case. Note further that stopword removal is not taken into account. Be careful when mixing with stopword removal.
  
  Parameters:
  
  phrase - The phrase that shall be checked.
  
  Returns:
  
  The number of tokens.
- containsSplitWords
  
  public static boolean containsSplitWords(String phrase)
  
  Parameters:
  
  phrase - The phrase to be checked.
  
  Returns:
  
  True if the phrase contains split words.
- containsSplitWords
  
  public static boolean containsSplitWords(String[] phraseTokens)
  
  Parameters:
  
  phraseTokens - The tokens that shall be processed.
  
  Returns:
  
  True if the tokens contain split words, else false.
- splitUsingSplitWords
  
  public static String[] splitUsingSplitWords(String[] phraseTokens)
- concatArray
  
  private static String concatArray(String[] array)
  
  Concatenates a string array to one string separated by spaces.
  
  Parameters:
  
  array - Array that shall be concatenated.
  
  Returns:
  
  Concatenated array as String.
- cleanStringForDBpediaQuery
  
  public static String cleanStringForDBpediaQuery(String inputString)
  
  This method removes illegal characters of a string when used in a SPARQL query.
  
  Parameters:
  
  inputString - Input String.
  
  Returns:
  
  Edited String.
- reduceToLettersOnly
  
  public static String reduceToLettersOnly(String string)
  
  Cleans a string from anything that is not a letter.
  
  Parameters:
  
  string - String to be cleaned.
  
  Returns:
  
  Cleaned String.
- writeSetToFile
  
  public static <T> void writeSetToFile(File fileToWrite, Set<T> setToWrite)
  
  This method writes the content of a Set to a file. The file will be UTF-8 encoded.
  
  Type Parameters:
  
  T - Type of the Set.
  
  Parameters:
  
  fileToWrite - File which will be created and in which the data will be written.
  
  setToWrite - Set whose content will be written into fileToWrite.
- readSetFromFile
  
  @NotNull public static @NotNull Set<String> readSetFromFile(String filePath)
  
  Reads a Set from the file as specified by the file path.
  
  Parameters:
  
  filePath - The path to the file that is to be read.
  
  Returns:
  
  The parsed file as Set.
- readSetFromFile
  
  @NotNull public static @NotNull Set<String> readSetFromFile(File file)
  
  Reads a Set from the file as specified by the file.
  
  Parameters:
  
  file - The file that is to be read.
  
  Returns:
  
  The parsed file as Set.
- readListFromFile
  
  @NotNull public static @NotNull List<String> readListFromFile(String filePath)
  
  Reads a List from the file as specified by the file path.
  
  Parameters:
  
  filePath - The path to the file that is to be read.
  
  Returns:
  
  The parsed file as List.
- readListFromFile
  
  @NotNull public static @NotNull List<String> readListFromFile(File file)
  
  Reads a List from the file as specified by the file.
  
  Parameters:
  
  file - The file that is to be read.
  
  Returns:
  
  The parsed file as List.
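The write and read helpers above can be pictured as a UTF-8 round trip. The sketch below uses only `java.nio.file.Files`; the class `SetFileRoundTripSketch`, its method names, and the one-element-per-line file layout are assumptions, since this page does not specify the file format.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SetFileRoundTripSketch {

    // Writes each element's string form on its own line, UTF-8 encoded.
    static <T> void writeSet(Path file, Set<T> set) {
        List<String> lines = new ArrayList<>();
        for (T element : set) {
            lines.add(String.valueOf(element));
        }
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file line by line; duplicates collapse because of the Set.
    static Set<String> readSet(Path file) {
        try {
            return new LinkedHashSet<>(Files.readAllLines(file, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience for trying it out: write to a temp file, read back, clean up.
    static Set<String> roundTrip(Set<String> input) {
        try {
            Path tmp = Files.createTempFile("set-sketch", ".txt");
            try {
                writeSet(tmp, input);
                return readSet(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading into a List instead of a Set would keep duplicates and line order, which is the practical difference between the two read variants.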
- convertToTag
  
  public static String convertToTag(String stringToConvert)
  
  Converts a string to a tag. Example: "Hagrid" will be converted to "<Hagrid>". If the string is already a tag, the string will be returned as it is.
  
  Parameters:
  
  stringToConvert - The String which shall be converted to a tag.
  
  Returns:
  
  The String as tag.
- removeTag
  
  public static String removeTag(String tagToConvert)
  
  Removes the angle brackets from a tag. Example: "<Hagrid>" will be converted to "Hagrid".
  
  Parameters:
  
  tagToConvert - The tag which shall be converted.
  
  Returns:
  
  The string as non-tag.
- addTagIfNotExists
  
  public static String addTagIfNotExists(String addTagString)
  
  Adds tags if they are not there yet. "<Hagrid>" will be converted to "<Hagrid>", "Hagrid" will be converted to "<Hagrid>", "<Hagrid" will be converted to "<Hagrid>" etc.
  
  Parameters:
  
  addTagString - String to which tags shall be added.
  
  Returns:
  
  Tagged string.
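The three tag helpers above behave as shown in their examples. A minimal sketch that reproduces those documented examples follows; the class `TagSketch` is a hypothetical stand-in, and behavior on inputs not covered by the examples (null, empty strings) is an assumption.

```java
class TagSketch {

    // Adds a leading '<' and a trailing '>' only where they are missing.
    static String addTagIfNotExists(String s) {
        StringBuilder sb = new StringBuilder(s);
        if (!s.startsWith("<")) {
            sb.insert(0, '<');
        }
        if (!s.endsWith(">")) {
            sb.append('>');
        }
        return sb.toString();
    }

    // Strips the surrounding angle brackets if both are present.
    static String removeTag(String tag) {
        if (tag.length() >= 2 && tag.startsWith("<") && tag.endsWith(">")) {
            return tag.substring(1, tag.length() - 1);
        }
        return tag;
    }

    public static void main(String[] args) {
        System.out.println(addTagIfNotExists("Hagrid"));   // prints <Hagrid>
        System.out.println(addTagIfNotExists("<Hagrid"));  // prints <Hagrid>
        System.out.println(removeTag("<Hagrid>"));         // prints Hagrid
    }
}
```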
- removeEnglishPlural
  
  public static String removeEnglishPlural(String stringToBeModified)
  
  Remove the plural in English words.
  
  Parameters:
  
  stringToBeModified - The string that shall be modified.
  
  Returns:
  
  Modified string.
- removeLanguageAnnotation
  
  public static String removeLanguageAnnotation(String s)
  
  Removes the language annotation from a string. If the string does not have a language annotation, the string will be returned unchanged. Example: "Hagrid@en" will be changed to "Hagrid".
  
  Parameters:
  
  s - String to be changed.
  
  Returns:
  
  String without language annotation.
- cleanValueFromTypeAnnotation
  
  public static String cleanValueFromTypeAnnotation(String valueToClean)
  
  Will clean a value from a type annotation. Example: "0.816318^^http://www.w3.org/2001/XMLSchema#float" will be cleaned to "0.816318".
  
  Parameters:
  
  valueToClean - The value that shall be cleaned.
  
  Returns:
  
  The cleaned value as String.
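Both annotation-stripping methods above can be approximated with plain string operations. In the sketch below, the class `AnnotationSketch` and the regular expression for language tags are assumptions; the library may accept a different set of language-tag shapes.

```java
class AnnotationSketch {

    // Drops a trailing language tag such as "@en" or "@en-GB" (assumed pattern).
    static String removeLanguageAnnotation(String s) {
        return s.replaceFirst("@[A-Za-z]{2,3}(-[A-Za-z0-9]+)*$", "");
    }

    // Cuts everything from the first "^^" onward, i.e. the datatype suffix.
    static String cleanValueFromTypeAnnotation(String value) {
        int idx = value.indexOf("^^");
        return idx < 0 ? value : value.substring(0, idx);
    }

    public static void main(String[] args) {
        System.out.println(removeLanguageAnnotation("Hagrid@en")); // prints Hagrid
        System.out.println(cleanValueFromTypeAnnotation(
                "0.816318^^http://www.w3.org/2001/XMLSchema#float")); // prints 0.816318
    }
}
```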
- isSameStringStemming
  
  public static boolean isSameStringStemming(String s1, String s2)
  
  This method checks whether two Strings are very similar by performing simple string operations including Porter's stemmer.
  
  Parameters:
  
  s1 - String 1.
  
  s2 - String 2.
  
  Returns:
  
  boolean
- isSameString
  
  public static boolean isSameString(String s1, String s2)
  
  This method checks whether two Strings are very similar by performing simple string operations. Stopwords are retained.
  
  Parameters:
  
  s1 - String 1
  
  s2 - String 2
  
  Returns:
  
  boolean
- isSameStringIgnoringStopwordsAndNumbersWithSpellingCorrection
  
  public static boolean isSameStringIgnoringStopwordsAndNumbersWithSpellingCorrection(String s1, String s2, float maxAllowedEditDistance)
- hasSimilarTokenWriting
  
  public static boolean hasSimilarTokenWriting(String[] sarray1, String[] sarray2, float tolerance)
  
  Checks whether two arrays have a similar writing. Every token is matched to its most similar token. Tokens can be used multiple times.
  
  Parameters:
  
  sarray1 - Array 1
  
  sarray2 - Array 2
  
  tolerance - The maximum distance that is allowed.
  
  Returns:
  
  True if the distance is less than or equal to the allowed tolerance.
- getLevenshteinDistanceSimilarTokensOneWay
  
  public static float getLevenshteinDistanceSimilarTokensOneWay(String[] sarray1, String[] sarray2)
  
  Returns the Levenshtein distance between two token sets. This is only a one-way test: if sarray2 contains all tokens of sarray1, then the distance will be 0 even though sarray2 might contain additional tokens that are not contained in sarray1. Tokens can be used multiple times.
  
  Parameters:
  
  sarray1 - Array 1
  
  sarray2 - Array 2
  
  Returns:
  
  Distance as float.
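The one-way property described above can be made concrete with a small sketch: each token of the first array is matched to its closest token in the second array, and tokens of the second array may be reused. The class `OneWayTokenDistanceSketch` is hypothetical, and summing the per-token minima is an assumption; this page does not state how the library aggregates or normalizes the distances.

```java
class OneWayTokenDistanceSketch {

    // Classic dynamic-programming Levenshtein distance using two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // For every token in first, add the distance to its closest token in second.
    // Extra tokens in second never increase the result (one-way test).
    static float oneWayDistance(String[] first, String[] second) {
        float total = 0f;
        for (String token : first) {
            int best = Integer.MAX_VALUE;
            for (String candidate : second) {
                best = Math.min(best, levenshtein(token, candidate));
            }
            if (best != Integer.MAX_VALUE) {
                total += best;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String[] a = {"harry", "potter"};
        String[] b = {"potter", "harry", "book"};
        System.out.println(oneWayDistance(a, b)); // a is fully covered by b
        System.out.println(oneWayDistance(b, a)); // "book" has no close match in a
    }
}
```

Swapping the arguments changes the result, which is why a symmetric comparison needs the test in both directions.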
- isSameStringIgnoringStopwords
  
  public static boolean isSameStringIgnoringStopwords(String s1, String s2)
  
  This method checks whether two Strings are very similar by performing simple string operations. Stopwords are removed.
  
  Parameters:
  
  s1 - String 1
  
  s2 - String 2
  
  Returns:
  
  boolean
- isSameStringIgnoringStopwordsAndNumbers
  
  public static boolean isSameStringIgnoringStopwordsAndNumbers(String s1, String s2)
  
  This method checks whether two Strings are very similar by performing simple string operations. Stopwords and numbers are removed.
  
  Parameters:
  
  s1 - String 1
  
  s2 - String 2
  
  Returns:
  
  boolean
- clearArrayFromStopwords
  
  public static String[] clearArrayFromStopwords(String[] arrayWithStopwords)
  
  Returns an array cleaned from stopwords. Retains the ordering.
  
  Parameters:
  
  arrayWithStopwords - Array with stopwords.
  
  Returns:
  
  Array without stopwords.
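Order-preserving stopword removal can be sketched with a stream filter. The class `StopwordFilterSketch`, the tiny stopword list, and the case-insensitive comparison are all assumptions; the library loads its own stopword file, whose contents are not shown on this page.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class StopwordFilterSketch {

    // Hypothetical mini stopword list, for illustration only.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("the", "of", "a", "an", "and"));

    // Keeps every token that is not a stopword; the original order is retained.
    static String[] clearFromStopwords(String[] tokens) {
        return Arrays.stream(tokens)
                .filter(t -> !STOPWORDS.contains(t.toLowerCase()))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] tokens = {"the", "house", "of", "cards"};
        System.out.println(Arrays.toString(clearFromStopwords(tokens))); // prints [house, cards]
    }
}
```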
- clearHashSetFromStopwords
  
  public static HashSet<String> clearHashSetFromStopwords(HashSet<String> hashSetWithStopwords)
  
  Removes the stopwords from the given HashSet.
  
  Parameters:
  
  hashSetWithStopwords - HashSet from which the stopwords shall be removed.
  
  Returns:
  
  Cleared HashSet
- removeEnglishGenitiveS
  
  public static String[] removeEnglishGenitiveS(String[] array)
  
  Removes free floating "s", "S", and cuts "'s".
  
  Parameters:
  
  array - Array to be transformed.
  
  Returns:
  
  New array.
- removeEnglishGenitiveS
  
  public static HashSet<String> removeEnglishGenitiveS(HashSet<String> set)
  
  Remove free floating s from the given set.
  
  Parameters:
  
  set - Set from which s shall be removed.
  
  Returns:
  
  Set with removed s/S.
- stemPorter
  
  public static String stemPorter(String word)
  
  Wrapping of Porter's Stemming Code.
  
  Parameters:
  
  word - Word to be stemmed.
  
  Returns:
  
  Stemmed word.
- lazyInitStopwords
  
  private static void lazyInitStopwords()
  
  Initialize reading stopwords file if it has not been read before.
- initStopwords
  
  public static void initStopwords()
  
  Initialize reading stopwords.
- isMeaningfulFragment
  
  public static boolean isMeaningfulFragment(String fragment)
  
  Checks whether a fragment is meaningful by counting the number of digits.
  
  Parameters:
  
  fragment - The fragment for which relevance shall be checked.
  
  Returns:
  
  Returns false if at least half of the fragment is composed of digits.
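The digit-counting rule stated above translates into a few lines. The class `MeaningfulFragmentSketch` is hypothetical, and treating the empty string as not meaningful is an assumption that this page does not confirm.

```java
class MeaningfulFragmentSketch {

    // Returns false if at least half of the characters are digits.
    static boolean isMeaningfulFragment(String fragment) {
        long digits = fragment.chars().filter(Character::isDigit).count();
        // digits * 2 >= length means "at least half are digits".
        // The empty string also ends up as not meaningful (assumption).
        return digits * 2 < fragment.length();
    }

    public static void main(String[] args) {
        System.out.println(isMeaningfulFragment("Hagrid")); // prints true
        System.out.println(isMeaningfulFragment("Q12345")); // prints false
        System.out.println(isMeaningfulFragment("ab12"));   // prints false
    }
}
```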
- addAlternativeWritingsSimple
  
  public static HashSet<String> addAlternativeWritingsSimple(HashSet<String> set)
  
  Generate alternative writings (particularly interesting for English and German hyphenation).
  
  Parameters:
  
  set - The set which shall be processed.
  
  Returns:
  
  The new set with alternative writings.
- removeNumbers
  
  public static HashSet<String> removeNumbers(HashSet<String> set)
  
  Remove numbers from a set of strings.
  
  Parameters:
  
  set - Set from which numbers shall be removed.
  
  Returns:
  
  A new set with no number instances.
- clearArrayFromNumbers
  
  public static String[] clearArrayFromNumbers(String[] array)
  
  Given a String array, numeric tokens will be removed.
  
  Parameters:
  
  array - The array from which numeric components shall be removed.
  
  Returns:
  
  The new array will be of smaller length while the order of tokens will be retained.
- isNaturalNumber
  
  public static boolean isNaturalNumber(String stringToBeChecked)
  
  Returns whether the stringToBeChecked is a number e.g. '123' or 'XI'. For reasons of performance, the syntax of roman numbers is not checked.
  
  Parameters:
  
  stringToBeChecked - The string that shall be checked for numeric properties.
  
  Returns:
  
  True if roman or arabic number, else false.
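A check of this kind, accepting Arabic digits or Roman numeral letters without validating Roman syntax, might look like the sketch below. The class `NaturalNumberSketch`, the exact character classes, and the restriction to upper-case Roman letters are assumptions.

```java
class NaturalNumberSketch {

    // True for strings made only of digits, or only of Roman numeral letters.
    // Roman syntax is deliberately not validated, so "IIII" also passes.
    static boolean isNaturalNumber(String s) {
        return s.matches("[0-9]+") || s.matches("[IVXLCDM]+");
    }

    public static void main(String[] args) {
        System.out.println(isNaturalNumber("123"));   // prints true
        System.out.println(isNaturalNumber("XI"));    // prints true
        System.out.println(isNaturalNumber("IIII"));  // prints true
        System.out.println(isNaturalNumber("house")); // prints false
    }
}
```

Skipping the syntax check keeps the test cheap, at the price of accepting words such as "MIX" that happen to consist of Roman numeral letters.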
- isEnglishNumberWord
  
  public static boolean isEnglishNumberWord(String stringToBeChecked)
  
  Checks whether the stringToBeChecked is an ordinal or cardinal number in English in written format. The number must be between 0 and 10 in order to be detected.
  
  Parameters:
  
  stringToBeChecked - The string that shall be checked.
  
  Returns:
  
  True if the String is an English number word (e.g. 'nine' or 'fifth'), else false.
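A lookup of this kind reduces to a set membership test. In the sketch below, the class `EnglishNumberWordSketch`, the exact word list, and the case-insensitive comparison are assumptions; the actual contents of ENGLISH_NUMBER_WORDS_SET are not listed on this page.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class EnglishNumberWordSketch {

    // Hypothetical word list covering cardinals and ordinals up to ten.
    static final Set<String> NUMBER_WORDS = new HashSet<>(Arrays.asList(
            "zero", "one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine", "ten",
            "first", "second", "third", "fourth", "fifth",
            "sixth", "seventh", "eighth", "ninth", "tenth"));

    // Case-insensitive membership test (the case handling is an assumption).
    static boolean isEnglishNumberWord(String s) {
        return NUMBER_WORDS.contains(s.toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(isEnglishNumberWord("nine"));   // prints true
        System.out.println(isEnglishNumberWord("fifth"));  // prints true
        System.out.println(isEnglishNumberWord("eleven")); // prints false
    }
}
```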
- removeNonAlphanumericCharacters
  
  public static String removeNonAlphanumericCharacters(String stringWithPunctuation)
  
  Removes everything that is not a digit, letter, space, or underscore. Note: In English, this may concatenate the genitive s with the preceding word, e.g. that's → thats. It might make sense to remove the genitive s first.
  
  Parameters:
  
  stringWithPunctuation - String with punctuation.
  
  Returns:
  
  String without punctuation.
- removeEnglishGenitiveS
  
  public static String removeEnglishGenitiveS(String string)
  
  Removes the English genitive s.
  
  Parameters:
  
  string - String that might contain genitive s.
  
  Returns:
  
  Edited String.
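The note on removeNonAlphanumericCharacters about the genitive s is easiest to see with both steps side by side. The sketch below uses the hypothetical class `GenitiveCleanupSketch`; both regular expressions are assumptions (ASCII letters only, genitive recognized as 's at the end of a word) and may not match the library's exact rules.

```java
class GenitiveCleanupSketch {

    // Removes "'s" or "'S" when it ends a word (followed by whitespace or end of input).
    static String removeEnglishGenitiveS(String s) {
        return s.replaceAll("'[sS](?=\\s|$)", "");
    }

    // Keeps only ASCII letters, digits, spaces, and underscores.
    static String removeNonAlphanumericCharacters(String s) {
        return s.replaceAll("[^A-Za-z0-9 _]", "");
    }

    public static void main(String[] args) {
        String label = "Hagrid's hut";
        // Punctuation removal alone glues the s onto the word.
        System.out.println(removeNonAlphanumericCharacters(label)); // prints Hagrids hut
        // Removing the genitive first avoids that.
        System.out.println(removeNonAlphanumericCharacters(removeEnglishGenitiveS(label))); // prints Hagrid hut
    }
}
```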
- getCommaSeparatedString
  
  public static String getCommaSeparatedString(HashSet<String> set)
  
  Get a comma separated list of the given HashSet<String>.
  
  Parameters:
  
  set - The set that shall be represented as comma separated String.
  
  Returns:
  
  The elements of the Set in a String separated by a comma.
