java.lang.Object

eu.sealsproject.platform.res.tool.impl.AbstractPlugin

de.uni_mannheim.informatik.dws.melt.matching_jena_matchers.elementlevel.StopwordExtraction

All Implemented Interfaces: IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>, eu.sealsproject.platform.res.domain.omt.IOntologyMatchingToolBridge, eu.sealsproject.platform.res.tool.api.IPlugin, eu.sealsproject.platform.res.tool.api.IToolBridge

public class StopwordExtraction extends MatcherYAAAJena

Extracts corpus-dependent stopwords from instances, classes, and properties.

Field Summary

Fields

Modifier and Type

Field

Description

private boolean

countDistinctTermsPerResource

If true, counts each token only once per resource (even if it appears multiple times in one literal or in several different literals).

private static final org.slf4j.Logger

LOGGER

private double

stopwordsPercentage

The fraction of resources in which a token must occur to count as a stopword.

private Function<String,Collection<String>>

tokenizer

Tokenizer function.

private int

topNStopwords

The number N of most frequent tokens to extract as stopwords.

private List<TextExtractor>

valueExtractors

Literal extractors to choose which literal/properties should be used.

Fields inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
FILE_PREFIX, FILE_SUFFIX

Constructor Summary

Constructors

Constructor

Description

StopwordExtraction(Function<String,Collection<String>> tokenizer, boolean countDistinctTermsPerResource, int topNStopwords, double stopwordsPercentage, TextExtractor... valueExtractors)

Extracts the stopwords based on two criteria.

StopwordExtraction(Function<String,Collection<String>> tokenizer, boolean countDistinctTermsPerResource, int topNStopwords, double stopwordsPercentage, List<TextExtractor> valueExtractors)

Extracts the stopwords based on two criteria.

StopwordExtraction(Function<String,Collection<String>> tokenizer, double stopwordsPercentage, org.apache.jena.rdf.model.Property... properties)

Extracts the stopwords based on the percentage (should be between 0 and 1).

StopwordExtraction(Function<String,Collection<String>> tokenizer, int topNStopwords, org.apache.jena.rdf.model.Property... properties)

Extracts the stopwords based on the most frequently occurring tokens.

Method Summary

Modifier and Type

Method

Description

Set<String>

extractStopwords(Iterable<? extends org.apache.jena.rdf.model.Resource> resources)

Set<String>

extractStopwords(Iterator<? extends org.apache.jena.rdf.model.Resource> resources)

Alignment

match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties properties)

Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment.

void

storeExtractedStopwords(Iterable<? extends org.apache.jena.rdf.model.Resource> resources, String key)

void

storeExtractedStopwords(Iterator<? extends org.apache.jena.rdf.model.Resource> resources, String key)

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAAJena
getModelSpec, match, readOntology

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_jena.MatcherYAAA
match

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherFile
match

Methods inherited from class de.uni_mannheim.informatik.dws.melt.matching_base.MatcherURL
align, align, canExecute, getType

Methods inherited from class eu.sealsproject.platform.res.tool.impl.AbstractPlugin
getId, getVersion, setId, setVersion

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface eu.sealsproject.platform.res.tool.api.IPlugin
getId, getVersion

Field Details
- LOGGER
  
  private static final org.slf4j.Logger LOGGER
- valueExtractors
  
  private List<TextExtractor> valueExtractors
  
  Literal extractors to choose which literal/properties should be used.
- tokenizer
  
  private Function<String,Collection<String>> tokenizer
  
  Tokenizer function.
- countDistinctTermsPerResource
  
  private boolean countDistinctTermsPerResource
  
  If true, counts each token only once per resource (even if it appears multiple times in one literal or in several different literals).
- topNStopwords
  
  private int topNStopwords
  
  The number N of most frequent tokens to extract as stopwords.
- stopwordsPercentage
  
  private double stopwordsPercentage
  
  The fraction of resources in which a token must occur to count as a stopword. The value ranges between zero and one.
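
The effect of countDistinctTermsPerResource can be illustrated with a minimal sketch that uses only the JDK. It is not the MELT implementation: the class TokenCountSketch, its count method, and the sample tokenizer are hypothetical names introduced for illustration, and each resource is represented simply as a list of its literal strings. The tokenizer has the same shape as the tokenizer field, Function<String,Collection<String>>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper for illustration only; not part of MELT.
class TokenCountSketch {

    // Sample tokenizer (an assumption): lower-cases the text and splits it
    // on every run of characters that are not ASCII letters or digits.
    static final Function<String, Collection<String>> TOKENIZER = text -> {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    };

    // Counts tokens over all resources. Each inner list holds the literals of one resource.
    static Map<String, Integer> count(List<List<String>> literalsPerResource,
                                      Function<String, Collection<String>> tokenizer,
                                      boolean countDistinctTermsPerResource) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> literals : literalsPerResource) {
            // A set collapses repeated tokens within one resource; a list keeps every occurrence.
            Collection<String> tokens = countDistinctTermsPerResource
                    ? new HashSet<>()
                    : new ArrayList<>();
            for (String literal : literals) {
                tokens.addAll(tokenizer.apply(literal));
            }
            for (String token : tokens) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> resources = Arrays.asList(
                Arrays.asList("The Paper of the Year", "the best paper"),
                Arrays.asList("Author of the paper"));
        // "the" occurs in both resources, so the distinct count is 2.
        System.out.println(count(resources, TOKENIZER, true).get("the"));
        // Counting every occurrence gives 3 (first resource) + 1 (second resource) = 4.
        System.out.println(count(resources, TOKENIZER, false).get("the"));
    }
}
```

With distinct counting, the count of a token equals the number of resources that contain it, which is the quantity a resource-based percentage threshold needs.
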
Constructor Details
- StopwordExtraction
  
  public StopwordExtraction(Function<String,Collection<String>> tokenizer, boolean countDistinctTermsPerResource, int topNStopwords, double stopwordsPercentage, List<TextExtractor> valueExtractors)
  
  Extracts the stopwords based on two criteria: (1) the most frequently occurring tokens and (2) the percentage of resources in which a token occurs. Extraction stops as soon as one of the two criteria is fulfilled.
  
  Parameters:
  
  tokenizer - tokenizer
  
  countDistinctTermsPerResource - If true, counts each token only once per resource (even if it appears multiple times in one literal or in several different literals).
  
  topNStopwords - how many stopwords to extract
  
  stopwordsPercentage - the fraction of resources (between 0 and 1) in which a token must appear to count as a stopword.
  
  valueExtractors - Literal extractors to choose which literal/properties should be used.
- StopwordExtraction
  
  public StopwordExtraction(Function<String,Collection<String>> tokenizer, boolean countDistinctTermsPerResource, int topNStopwords, double stopwordsPercentage, TextExtractor... valueExtractors)
  
  Extracts the stopwords based on two criteria: (1) the most frequently occurring tokens and (2) the percentage of resources in which a token occurs. Extraction stops as soon as one of the two criteria is fulfilled.
  
  Parameters:
  
  tokenizer - tokenizer
  
  countDistinctTermsPerResource - If true, counts each token only once per resource (even if it appears multiple times in one literal or in several different literals).
  
  topNStopwords - how many stopwords to extract
  
  stopwordsPercentage - the fraction of resources (between 0 and 1) in which a token must appear to count as a stopword.
  
  valueExtractors - Literal extractors to choose which literal/properties should be used.
- StopwordExtraction
  
  public StopwordExtraction(Function<String,Collection<String>> tokenizer, int topNStopwords, org.apache.jena.rdf.model.Property... properties)
  
  Extracts the stopwords based on the most frequently occurring tokens.
  
  Parameters:
  
  tokenizer - tokenizer
  
  topNStopwords - how many stopwords to extract
  
  properties - the properties which should be used for extracting the literals (text).
- StopwordExtraction
  
  public StopwordExtraction(Function<String,Collection<String>> tokenizer, double stopwordsPercentage, org.apache.jena.rdf.model.Property... properties)
  
  Extracts the stopwords based on the percentage, given as a fraction between 0 and 1. For example, with a value of 0.03 a token is a stopword if it occurs in more than 3 percent of all resources.
  
  Parameters:
  
  tokenizer - tokenizer
  
  stopwordsPercentage - the fraction of resources (between 0 and 1) in which a token must appear to count as a stopword.
  
  properties - the properties which should be used for extracting the literals (text).
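
The two selection criteria described for the constructors can be sketched in plain Java as follows. This is one plausible reading of "stops as soon as one of the two criteria is fulfilled", not the actual MELT code: tokens are visited from most to least frequent, and collection ends once topNStopwords tokens have been gathered or the next token no longer occurs in more than stopwordsPercentage of the resources. The class StopwordSelectionSketch, its select method, and the alphabetical tie-breaking are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of MELT.
class StopwordSelectionSketch {

    // resourceCounts maps each token to the number of resources that contain it.
    static Set<String> select(Map<String, Integer> resourceCounts, int totalResources,
                              int topNStopwords, double stopwordsPercentage) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(resourceCounts.entrySet());
        // Most frequent tokens first; ties are broken alphabetically to keep the result deterministic.
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        Set<String> stopwords = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> entry : sorted) {
            // Criterion 1: stop once N stopwords have been collected.
            if (stopwords.size() >= topNStopwords) {
                break;
            }
            // Criterion 2: stop once a token does not occur in more than the given fraction of resources.
            double fraction = (double) entry.getValue() / totalResources;
            if (fraction <= stopwordsPercentage) {
                break;
            }
            stopwords.add(entry.getKey());
        }
        return stopwords;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 97);
        counts.put("of", 40);
        counts.put("paper", 2);
        // The top-N limit is reached first: only "the" is returned.
        System.out.println(select(counts, 100, 1, 0.03));
        // The percentage threshold is reached first: "paper" (2 percent) is excluded.
        System.out.println(select(counts, 100, 10, 0.03));
    }
}
```

In this sketch, a single criterion can be used on its own by making the other one ineffective, for example a very large topNStopwords or a stopwordsPercentage of 0.0. How the single-criterion constructors of StopwordExtraction configure the unused criterion is not stated here.
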
Method Details
- match
  
  public Alignment match(org.apache.jena.ontology.OntModel source, org.apache.jena.ontology.OntModel target, Alignment inputAlignment, Properties properties) throws Exception
  
  Description copied from class: MatcherYAAAJena
  
  Aligns two ontologies specified via a Jena OntModel, with an input alignment as Alignment object, and returns the mapping of the resulting alignment. Note: This method might be called multiple times in a row when using the evaluation framework. Make sure to return a mapping which is specific to the given inputs.
  
  Specified by:
  
  match in interface IMatcher<org.apache.jena.ontology.OntModel,Alignment,Properties>
  
  Specified by:
  
  match in class MatcherYAAAJena
  
  Parameters:
  
  source - This OntModel represents the source ontology.
  
  target - This OntModel represents the target ontology.
  
  inputAlignment - This mapping represents the input alignment.
  
  properties - Additional properties.
  
  Returns:
  
  The resulting alignment of the matching process.
  
  Throws:
  
  Exception - Any exception which occurs during matching.
- storeExtractedStopwords
  
  public void storeExtractedStopwords(Iterable<? extends org.apache.jena.rdf.model.Resource> resources, String key)
- storeExtractedStopwords
  
  public void storeExtractedStopwords(Iterator<? extends org.apache.jena.rdf.model.Resource> resources, String key)
- extractStopwords
  
  public Set<String> extractStopwords(Iterable<? extends org.apache.jena.rdf.model.Resource> resources)
- extractStopwords
  
  public Set<String> extractStopwords(Iterator<? extends org.apache.jena.rdf.model.Resource> resources)
