intent.corpora package

Submodules

intent.corpora.POSCorpus module

Created on Mar 6, 2014

@author: rgeorgi

class intent.corpora.POSCorpus.POSCorpus(seq=[])

Bases: list

POS Tag corpus object to attempt to unify inputs and outputs.

accuracy(other)
add(inst)
mallet(lowercase=True)
matches(other)
raw()
classmethod read_simpletagger(fp, **kwargs)

The SimpleTagger format is used by Mallet. It consists of one token per line, with the token's features listed first and its label listed last.

Parameters:
  • cls
  • fp
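To illustrate the line layout described above, here is a minimal standalone sketch of a SimpleTagger-style parser. It is not the code behind read_simpletagger: the helper name parse_simpletagger, the sample features, and the treatment of blank lines as sequence boundaries are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical helper (not part of intent.corpora): parses SimpleTagger-style
# lines of the form "feature1 feature2 ... label".
def parse_simpletagger(lines: Iterable[str]) -> List[List[Tuple[List[str], str]]]:
    sequences = []
    current = []
    for line in lines:
        fields = line.split()
        if not fields:
            # Assumption: a blank line closes the current sequence.
            if current:
                sequences.append(current)
                current = []
            continue
        # Everything before the last field is a feature; the last field is the label.
        *features, label = fields
        current.append((features, label))
    if current:
        sequences.append(current)
    return sequences


# Made-up sample data: two sequences separated by a blank line.
sample = ["john cap NN", "runs suffix-s VB", "", "dogs suffix-s NN"]
parsed = parse_simpletagger(sample)
```

Each token becomes a (features, label) pair, so a caller can rebuild whatever instance type it needs from the parsed sequences.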
classmethod read_slashtags(path, **kwargs)

Read in a corpus whose entries have the form Token/TAG. The default delimiter is “/”.

Parameters:
  • path (str) – File path to the slashtagged file to read in.
  • delimiter (str) – Delimiter between the token and its tag.
Returns:

POSCorpus
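As a rough illustration of the Token/TAG layout, the sketch below splits one line of slashtagged text into (token, tag) pairs. It is a standalone example, not the implementation of read_slashtags: the function name parse_slashtag_line is hypothetical, and splitting on the last delimiter (so that tokens may themselves contain “/”) is an assumption.

```python
from typing import List, Tuple

# Hypothetical helper (not part of intent.corpora).
def parse_slashtag_line(line: str, delimiter: str = "/") -> List[Tuple[str, str]]:
    pairs = []
    for item in line.split():
        # Assumption: the tag follows the LAST delimiter, so "1/2/CD" -> ("1/2", "CD").
        token, sep, tag = item.rpartition(delimiter)
        if not sep:
            raise ValueError(f"no {delimiter!r} delimiter in {item!r}")
        pairs.append((token, tag))
    return pairs


pairs = parse_slashtag_line("John/NN Stewart/NN")
```

Splitting from the right is one reasonable design choice for tokens such as fractions or dates; a corpus with delimiters inside tags would need a different rule.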

slashtags(delimeter='/', lowercase=True)

Return the corpus in slashtags format (e.g. John/NN Stewart/NN).

Parameters:
  • delimeter
  • lowercase
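The output format can be sketched as follows. This is an illustrative standalone function, not the method's actual code: the name to_slashtags is hypothetical, and the assumption that lowercase applies to the token only (leaving the tag untouched) is a guess, since the source does not describe that parameter.

```python
from typing import Iterable, Tuple

# Hypothetical helper (not part of intent.corpora).
def to_slashtags(pairs: Iterable[Tuple[str, str]],
                 delimiter: str = "/",
                 lowercase: bool = True) -> str:
    rendered = []
    for token, tag in pairs:
        # Assumption: only the token is lowercased; the tag keeps its case.
        if lowercase:
            token = token.lower()
        rendered.append(f"{token}{delimiter}{tag}")
    return " ".join(rendered)


line = to_slashtags([("John", "NN"), ("Stewart", "NN")], lowercase=False)
```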

split(percent=100.0)
token_handler(tokens)
tokens()
types()
write(path, format, delimeter='/', outdir='/Users/rgeorgi/Documents/code/intent-doc', lowercase=True)
writesplit(train_path, test_path, split, format, delimeter='/', outdir='/Users/rgeorgi/Documents/code/intent-doc', lowercase=True)
exception intent.corpora.POSCorpus.POSCorpusException(msg=None)

Bases: Exception

class intent.corpora.POSCorpus.POSCorpusInstance(seq=[], id_ref=None)

Bases: list

append(token)
mallet(lowercase=True)
matches(other)
raw(lowercase=True)
slashtags(delimeter='/', lowercase=True)
intent.corpora.POSCorpus.process_slashtag_file(path, token_func, delimeter='/')

A universal function to process “slashtag”-style (e.g. Fountain/NOUN) files.

Parameters:
  • path – Path to the slashtag file.
  • token_func – Function to apply to each token.
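The callback pattern described above might look like the sketch below. It is a standalone illustration rather than the body of process_slashtag_file: the name process_slashtags is hypothetical, it reads from an open text stream instead of a path, and the arguments passed to the callback (here, the token and its tag) are an assumption, since the source does not specify them.

```python
import io
from collections import Counter
from typing import Callable, TextIO

# Hypothetical helper (not part of intent.corpora).
def process_slashtags(stream: TextIO,
                      token_func: Callable[[str, str], None],
                      delimiter: str = "/") -> None:
    for line in stream:
        for item in line.split():
            token, sep, tag = item.rpartition(delimiter)
            if sep:
                # Assumption: the callback receives (token, tag).
                token_func(token, tag)


# Usage: count how often each tag occurs in a small in-memory sample.
tag_counts = Counter()

def count_tag(token: str, tag: str) -> None:
    tag_counts[tag] += 1

process_slashtags(io.StringIO("Fountain/NOUN flows/VERB\nwater/NOUN\n"), count_tag)
```

Passing a function per token keeps the file traversal in one place while letting callers collect counts, vocabularies, or instances as they see fit.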
intent.corpora.POSCorpus.process_wsj_file(fp, token_func)

A function for processing WSJ parse files.

Parameters:
  • fp (str) –
  • token_func
Returns:

Module contents