public class ConcordanceTagger extends AbstractFileSelector implements java.lang.Runnable
This class depends on three external libraries: JWI, JSemcor, and the Stanford POS Tagger.
Use the main method of this class for its default functionality.
See Also: TaggedConcordanceIterator

| Modifier and Type | Class and Description |
|---|---|
| protected static class | ConcordanceTagger.TaggerToken: Represents a Semcor token that is not yet tagged. |

| Constructor and Description |
|---|
| ConcordanceTagger() |

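
The TaggerToken class is documented only as an untagged Semcor token, and addWords is documented as building tagger tokens from stems, words and a token number. The sketch below illustrates that idea with plain Java types. Everything in it is hypothetical: the `Token` fields, the underscore convention for multi-word expressions, the rule that all constituents share the wordform's token number, and the toy stemmer are assumptions, not the actual ConcordanceTagger or JWI code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class TaggerTokenSketch {

    /** Hypothetical stand-in for ConcordanceTagger.TaggerToken; the real fields are not documented. */
    public static final class Token {
        public final String word;        // surface form of one constituent word
        public final List<String> stems; // stems produced for that word
        public final int tokenNum;       // number of the originating token (assumed shared by all constituents)

        public Token(String word, List<String> stems, int tokenNum) {
            this.word = word;
            this.stems = stems;
            this.tokenNum = tokenNum;
        }

        @Override
        public String toString() {
            return word + stems + "#" + tokenNum;
        }
    }

    /**
     * Splits a wordform into constituent words (assumed to be joined by
     * underscores), stems each one, and appends a Token per word to result.
     */
    public static void addWords(String wordform, int tokenNum, List<Token> result,
                                Function<String, List<String>> stemmer) {
        for (String word : wordform.split("_")) {
            if (word.isEmpty()) {
                continue; // ignore empty pieces from leading or doubled underscores
            }
            result.add(new Token(word, stemmer.apply(word), tokenNum));
        }
    }

    public static void main(String[] args) {
        // Toy stemmer: lower-cases the word. A real stemmer would consult WordNet.
        Function<String, List<String>> toyStemmer =
                w -> Collections.singletonList(w.toLowerCase(Locale.ROOT));

        List<Token> tokens = new ArrayList<>();
        addWords("New_York", 0, tokens, toyStemmer);
        addWords("Grows", 1, tokens, toyStemmer);
        System.out.println(tokens);
    }
}
```

Keeping the token number on every constituent is one way to map tags on the split words back to the original token after tagging.
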

| Modifier and Type | Method and Description |
|---|---|
| protected void | addWords(edu.mit.jsemcor.element.IWordform wf, int tokenNum, java.util.List<ConcordanceTagger.TaggerToken> result, edu.mit.jwi.morph.IStemmer stemmer): Stems each of the words in the provided wordform, adding the tagger tokens created from these stems, words and token number to the given results list. |
| protected java.io.File | getLocation(java.lang.Class<?> key): Utility method for getting a location that has a default stored in the Java preferences. |
| protected edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> | getPOSTagger(): Returns a maximum entropy tagger using a Stanford NLP tagging model selected by the user. |
| protected edu.mit.jsemcor.main.IConcordanceSet | getSemcor(): Returns the Semcor concordance set, or null if the directory cannot be found. |
| protected edu.mit.jwi.morph.IStemmer | getStemmer(): Returns a stemmer that requires Wordnet, or null if the Wordnet directory cannot be found. |
| protected java.io.Writer | getWriter(): Returns a writer for the file to which the tagged concordance will be written. |
| static void | main(java.lang.String[] args): Tags the Semcor corpus. |
| protected java.util.ArrayList<edu.stanford.nlp.ling.HasWord> | makeSentence(edu.mit.jsemcor.element.ISentence s, edu.mit.jwi.morph.IStemmer stemmer): Returns a Stanford parser sentence that contains all the tokens from the specified JSemcor sentence, with multi-word expressions (MWEs) broken into their constituent tokens. |
| void | process(edu.mit.jsemcor.element.IContextID startContext, int startSent, java.lang.Iterable<? extends edu.mit.jsemcor.main.IConcordance> cs, edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger, edu.mit.jwi.morph.IStemmer stemmer, java.io.Writer writer, IProgressBar pb): Tags all the contexts provided by the concordance set, using the specified tagger, writing the data to the specified writer. |
| protected void | process(edu.mit.jsemcor.element.IContextID cid, edu.mit.jsemcor.element.ISentence s, edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger, edu.mit.jwi.morph.IStemmer stemmer, java.io.Writer writer): Tags the provided sentence, using the specified tagger, writing the data to the specified writer. |
| void | process(java.lang.Iterable<? extends edu.mit.jsemcor.main.IConcordance> cs, edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger, edu.mit.jwi.morph.IStemmer stemmer, java.io.Writer writer, IProgressBar pb): Tags all the contexts provided by the concordance set, using the specified tagger, writing the data to the specified writer. |
| void | run() |
| protected void | setLocation(java.lang.Class<?> key, java.io.File loc): Sets a default location into the Java preferences. |
| protected java.util.List<java.lang.String> | stem(java.lang.String token, edu.mit.jsemcor.element.IWordform wf, edu.mit.jwi.morph.IStemmer stemmer): Stems the given token. |

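
The process overload that takes startContext and startSent is documented as beginning at a given context and skipping sentences up to a given sentence number. The following is a minimal sketch of that selection rule using plain collections rather than the JSemcor types. It assumes that sentence numbers are 1-based, that contexts before the start context are skipped entirely, and that startSent applies only to the start context and is ignored when startContext is null. The class name, method name and context IDs are illustrative only and are not part of ConcordanceTagger.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ResumeSketch {

    /**
     * Returns "contextId:sentenceNumber" labels for every sentence that would be
     * processed. sentenceCounts maps each context ID, in corpus order, to the
     * number of sentences it contains.
     */
    public static List<String> sentencesToProcess(Map<String, Integer> sentenceCounts,
                                                  String startContext, int startSent) {
        List<String> result = new ArrayList<>();
        boolean started = (startContext == null); // null: begin with the first context
        for (Map.Entry<String, Integer> e : sentenceCounts.entrySet()) {
            int firstSent = 1;
            if (!started) {
                if (!e.getKey().equals(startContext)) {
                    continue; // still before the start context
                }
                started = true;
                if (startSent > 0) {
                    firstSent = startSent + 1; // begin past startSent, in the start context only
                }
            }
            for (int n = firstSent; n <= e.getValue(); n++) {
                result.add(e.getKey() + ":" + n);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> corpus = new LinkedHashMap<>();
        corpus.put("br-a01", 3);
        corpus.put("br-a02", 2);
        corpus.put("br-b13", 2);
        // Resume in br-a02, past sentence 1.
        System.out.println(sentencesToProcess(corpus, "br-a02", 1));
        // prints [br-a02:2, br-b13:1, br-b13:2]
    }
}
```

A resume point of this kind lets an interrupted tagging run continue without re-tagging sentences that were already written.
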
Methods inherited from class AbstractFileSelector: choose, chooseDirectory, chooseFile, chooseFileForWriting, getFileChooser

public static void main(java.lang.String[] args)
Tags the Semcor corpus. … TaggedConcordanceIterator class.
Parameters:
args - standard main method arguments; ignored

public void run()
Specified by:
run in interface java.lang.Runnable

protected edu.mit.jsemcor.main.IConcordanceSet getSemcor()
Returns the Semcor concordance set, or null if the directory cannot be found.
Returns:
the Semcor concordance set, or null if the directory cannot be found

protected edu.mit.jwi.morph.IStemmer getStemmer()
Returns a stemmer that requires Wordnet, or null if the Wordnet directory cannot be found.
Returns:
a stemmer, or null if the Wordnet directory cannot be found

protected edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> getPOSTagger()
    throws java.lang.Exception
Returns a maximum entropy tagger using a Stanford NLP tagging model selected by the user, or null if no model is selected or found.
Returns:
a MaxentTagger using a Stanford NLP tagging model selected by the user; null if no model is selected or found
Throws:
java.lang.Exception - if there is a problem instantiating the maximum entropy tagger

protected java.io.Writer getWriter()
    throws java.io.IOException
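Returns a writer for the file to which the tagged concordance will be written, or null if no output file is selected.
Returns:
a writer for the output file, or null if no output file is selected
Throws:
java.io.IOException - if an exception occurs when constructing the file writer

protected java.io.File getLocation(java.lang.Class<?> key)
Utility method for getting a location that has a default stored in the Java preferences.
getLocation in class AbstractFileSelector
Parameters:
key - the class that serves as key for this location
Returns:
the location, or null if none

protected void setLocation(java.lang.Class<?> key,
    java.io.File loc)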
Sets a default location into the Java preferences.
setLocation in class AbstractFileSelector
Parameters:
key - the class that serves as key for this location
loc - the location to be saved to the preferences

public void process(java.lang.Iterable<? extends edu.mit.jsemcor.main.IConcordance> cs,
    edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger,
    edu.mit.jwi.morph.IStemmer stemmer,
    java.io.Writer writer,
    IProgressBar pb)
    throws java.io.IOException
Tags all the contexts provided by the concordance set, using the specified tagger, writing the data to the specified writer.
Parameters:
cs - the concordance set from which contexts should be drawn; may not be null
posTagger - the part of speech tagger to be used to tag the sentences; may not be null
stemmer - a stemmer used to stem words
writer - the writer to which results should be written; may not be null
pb - the progress bar to which progress is to be reported; may be null
Throws:
java.io.IOException - if there is a problem writing to the provided writer
java.lang.NullPointerException - if any argument that may not be null is null

public void process(edu.mit.jsemcor.element.IContextID startContext,
    int startSent,
    java.lang.Iterable<? extends edu.mit.jsemcor.main.IConcordance> cs,
    edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger,
    edu.mit.jwi.morph.IStemmer stemmer,
    java.io.Writer writer,
    IProgressBar pb)
    throws java.io.IOException
Tags all the contexts provided by the concordance set, using the specified tagger, writing the data to the specified writer.
Parameters:
startContext - the context where the tagging should begin. If null, the tagging will begin with the first context.
startSent - the sentence number past which tagging should begin. If the number is non-positive, no sentences in the specified context are skipped.
cs - the concordance set from which contexts should be drawn; may not be null
posTagger - the part of speech tagger to be used to tag the sentences; may not be null
stemmer - a stemmer used to stem words
writer - the writer to which results should be written; may not be null
pb - the progress bar to which progress is to be reported; may be null
Throws:
java.io.IOException - if there is a problem writing to the provided writer
java.lang.NullPointerException - if any of the concordance set, tagger, or writer is null

protected void process(edu.mit.jsemcor.element.IContextID cid,
    edu.mit.jsemcor.element.ISentence s,
    edu.stanford.nlp.ling.SentenceProcessor<edu.stanford.nlp.ling.HasWord,? extends edu.stanford.nlp.ling.TaggedWord> posTagger,
    edu.mit.jwi.morph.IStemmer stemmer,
    java.io.Writer writer)
    throws java.io.IOException
Tags the provided sentence, using the specified tagger, writing the data to the specified writer.
Parameters:
cid - the context containing the sentence
s - the sentence being tagged
posTagger - the part of speech tagger to be used to tag the sentences; may not be null
stemmer - the stemmer used to stem the tokens; may not be null
writer - the writer to which results should be written; may not be null
Throws:
java.io.IOException - if there is a problem writing to the provided writer
java.lang.NullPointerException - if any of the sentence, tagger, or writer is null

protected java.util.ArrayList<edu.stanford.nlp.ling.HasWord> makeSentence(edu.mit.jsemcor.element.ISentence s,
    edu.mit.jwi.morph.IStemmer stemmer)
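Returns a Stanford parser sentence that contains all the tokens from the specified JSemcor sentence, with multi-word expressions (MWEs) broken into their constituent tokens. … IToken object in the original Semcor sentence.
Parameters:
s - a JSemcor ISentence object to be transformed
stemmer - the stemmer to use when making the words
Throws:
java.lang.NullPointerException - if the specified sentence is null

protected void addWords(edu.mit.jsemcor.element.IWordform wf,
    int tokenNum,
    java.util.List<ConcordanceTagger.TaggerToken> result,
    edu.mit.jwi.morph.IStemmer stemmer)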
Stems each of the words in the provided wordform, adding the tagger tokens created from these stems, words and token number to the given results list.
Parameters:
wf - the wordform whose constituent words are to be stemmed
tokenNum - the number of the token to be tagged, inside the wordform
result - the list to which the tagger tokens will be added
stemmer - the stemmer used to stem the tokens; may not be null

protected java.util.List<java.lang.String> stem(java.lang.String token,
    edu.mit.jsemcor.element.IWordform wf,
    edu.mit.jwi.morph.IStemmer stemmer)
Stems the given token.
Parameters:
token - the token to be stemmed
wf - the wordform from which the token is drawn
stemmer - the stemmer used to stem the tokens; may not be null
Returns:
… null.

Copyright © 2011 Massachusetts Institute of Technology. All Rights Reserved.