intent.interfaces package

Submodules

intent.interfaces.giza module

Created on Feb 14, 2014

class intent.interfaces.giza.A3files(prefix, name='aln')[source]

Bases: object

clean()[source]
merge(merged_path)[source]
class intent.interfaces.giza.CooccurrenceFile[source]

Bases: collections.defaultdict

An internal representation of a cooccurrence file.

dump(path=None)[source]
class intent.interfaces.giza.GizaAligner[source]

Bases: object

A class to run GIZA

force_align(e_snts, f_snts)[source]
classmethod load(prefix)[source]

Load a stored GIZA alignment model so that alignment can be resumed.

Parameters:
  • prefix (path+base) – Prefix for the non-text files
  • e (path) – Path to the “e” file
  • f (path) – Path to the “f” file
resume(prefix, new_e, new_f)[source]

“Force” align a new set of data using the old model, per the instructions at:

http://www.kyloo.net/software/doku.php/mgiza:forcealignment

temp_align(e_snts, f_snts, func)[source]
Parameters:
  • e_snts (list[list[str]]) – e sentences
  • f_snts (list[list[str]]) – f sentences
  • func (method) – The function to use on the data, either training from scratch or resuming.
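
The following is a minimal sketch of how a helper like temp_align might hand in-memory sentences to a file-based function such as train or resume. It is an assumption, not the package's implementation: temp_align_sketch, write_sentences, fake_train, and the temporary file names are all hypothetical, and the only thing taken from the documentation is that func accepts (prefix, e, f) in the same order as train and resume.

```python
import os
import tempfile


def write_sentences(snts, path):
    """Write one space-joined sentence per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fh:
        for tokens in snts:
            fh.write(" ".join(tokens) + "\n")


def temp_align_sketch(e_snts, f_snts, func):
    """Write both sides to a temporary directory, then call func(prefix, e, f)."""
    if len(e_snts) != len(f_snts):
        raise ValueError("e_snts and f_snts must contain the same number of sentences")
    with tempfile.TemporaryDirectory() as tmp:
        e_path = os.path.join(tmp, "e.txt")      # assumed file name
        f_path = os.path.join(tmp, "f.txt")      # assumed file name
        write_sentences(e_snts, e_path)
        write_sentences(f_snts, f_path)
        prefix = os.path.join(tmp, "model")      # assumed output prefix
        # func must finish reading before the directory is removed.
        return func(prefix, e_path, f_path)


def fake_train(prefix, e, f):
    """Stand-in for a training function: return the lines that were written."""
    with open(e, encoding="utf-8") as eh, open(f, encoding="utf-8") as fh:
        return eh.read().splitlines(), fh.read().splitlines()


result = temp_align_sketch([["the", "house"]], [["das", "Haus"]], fake_train)
```

Passing the function as an argument keeps the temporary-file handling in one place, so the same wrapper can serve both training from scratch and resuming.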
temp_train(e_snts, f_snts)[source]
train(prefix, e, f)[source]

Train the giza word alignments on the provided text files.

Parameters:
  • prefix (path+prefix) – Prefix for where the giza output files will be stored.
  • e (path) – Path to the “e” file
  • f (path) – Path to the “f” file
exception intent.interfaces.giza.GizaAlignmentException[source]

Bases: Exception

An exception class for Giza errors.

class intent.interfaces.giza.GizaFiles(prefix, e, f, name='aln')[source]

Bases: object

Giza produces so many files that it is easiest to initialize a single object representing all of them, based on the input F and E text files and the prefix provided for output.

a
a3
a3merged
aligned_sents()[source]

Read in the (merged) A3 file and return the AlignedSents of (src, tgt) alignments.

Return type: list[AlignedSent]
cfg
clean()[source]
d3
d4
decoder
e_vcb
ef
ef_cooc
ef_snt
f_vcb
fe
fe_cooc
fe_snt
merge_a3()[source]
n
p0
perp
t
txt_to_snt(ev=None, fv=None)[source]

Generate .snt files in the appropriate location, based on the vocabularies and text files provided.

class intent.interfaces.giza.TestTrain(methodName='runTest')[source]

Bases: unittest.case.TestCase

setUp()[source]
test_giza_train_toy()[source]
class intent.interfaces.giza.Vocab[source]

Bases: object

Internal representation for a .vcb file, so that they can be quickly rewritten.

Note that “1” is the symbol reserved for end-of-sentence, so word indices should start at “2”.

add(word, count=1)[source]

Add a word to the vocab and assign it a new id.

add_from_txt(path)[source]
dump(path=None)[source]
get_id(w, add=False)[source]

Get the ID for a word. If “add” is False, raise an exception if the word is not found in the vocab. Otherwise, add it and return the new ID.

items()[source]
classmethod load(path)[source]

Create a vocab object from a path.

Parameters: path (filepath) – Path to the .vcb file to load
string_to_ids(string, add=False)[source]

Given a string, convert it to the ids representation expected by GIZA, using the words in this vocab. If an unknown word is discovered, raise an Exception.

string_to_snt(string, add=False)[source]

Do what string_to_ids does, but return a string.
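
The behaviour described for Vocab can be illustrated with a small stand-in class. This is a sketch under stated assumptions rather than the package's code: VocabSketch is a hypothetical name, it raises KeyError where the real class presumably raises its own exception, and whitespace tokenization is assumed. It shows IDs starting at 2 (since 1 is reserved for end-of-sentence) and the add flag on get_id.

```python
class VocabSketch:
    """Hypothetical stand-in for Vocab; not the package's implementation."""

    FIRST_ID = 2  # ID 1 is reserved for end-of-sentence

    def __init__(self):
        self._ids = {}     # word -> integer ID
        self._counts = {}  # word -> frequency

    def add(self, word, count=1):
        """Add a word (or bump its count) and return its ID."""
        if word not in self._ids:
            self._ids[word] = self.FIRST_ID + len(self._ids)
            self._counts[word] = 0
        self._counts[word] += count
        return self._ids[word]

    def get_id(self, w, add=False):
        """Return the ID for w; unknown words raise unless add is True."""
        if w in self._ids:
            return self._ids[w]
        if not add:
            raise KeyError(w)  # assumption: the real class uses its own exception
        return self.add(w)

    def string_to_ids(self, string, add=False):
        """Map a whitespace-tokenized string to a list of IDs."""
        return [self.get_id(w, add=add) for w in string.split()]

    def string_to_snt(self, string, add=False):
        """Same as string_to_ids, but joined into a single string."""
        return " ".join(str(i) for i in self.string_to_ids(string, add=add))


vocab = VocabSketch()
snt = vocab.string_to_snt("a b a", add=True)  # "2 3 2"
```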

exception intent.interfaces.giza.VocabNotFoundException[source]

Bases: Exception

class intent.interfaces.giza.VocabWord(word, id)[source]

Bases: object

A simple class to contain words in the vocab and keep track of their ID, while hashing the same as the string that they represent.
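
One way to obtain the "hashes the same as the string" behaviour is to delegate both hashing and equality to the wrapped word. The sketch below is an assumption about how such a class could look; VocabWordSketch is a hypothetical name and the real class may differ in detail.

```python
class VocabWordSketch:
    """Hypothetical stand-in for VocabWord; not the package's implementation."""

    def __init__(self, word, id):
        self.word = word
        self.id = id

    def __hash__(self):
        # Hash exactly like the underlying string.
        return hash(self.word)

    def __eq__(self, other):
        # Compare equal to another wrapper or to a plain string with the same text.
        if isinstance(other, VocabWordSketch):
            return self.word == other.word
        return self.word == other

    def __repr__(self):
        return "VocabWordSketch({!r}, {!r})".format(self.word, self.id)


counts = {VocabWordSketch("cat", 2): 5}
found = counts["cat"]  # a plain string finds the wrapped key
```

Because the hash and equality match the string's, a dictionary keyed by such objects can be queried with ordinary strings while each key still carries its ID.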

intent.interfaces.mallet_crf module

Created on Apr 4, 2014

@author: rgeorgi

intent.interfaces.mallet_crf.setup()[source]
intent.interfaces.mallet_crf.test(test_path, model_path, out_f=sys.stdout, err_f=sys.stderr)[source]
intent.interfaces.mallet_crf.train(train_path, model_path, out_f=sys.stdout, err_f=sys.stderr)[source]
intent.interfaces.mallet_crf.write_and_eval(test_path, model_path, out_path, out_f=sys.stdout, err_f=sys.stderr)[source]
intent.interfaces.mallet_crf.write_out(test_path, model_path, tag_out, out_f=sys.stderr, err_f=sys.stderr)[source]

intent.interfaces.mallet_crf_constraints module

Created on Mar 6, 2014

@author: rgeorgi

intent.interfaces.mallet_crf_constraints.setup()[source]
intent.interfaces.mallet_crf_constraints.train(train_path, test_path, constraint_path, model_path, log_path=sys.stdout)[source]

intent.interfaces.mallet_maxent module

Created on Apr 4, 2014

@author: rgeorgi

exception intent.interfaces.mallet_maxent.ClassifierException[source]

Bases: Exception

exception intent.interfaces.mallet_maxent.EmptyStringException[source]

Bases: intent.interfaces.mallet_maxent.ClassifierException

class intent.interfaces.mallet_maxent.MalletMaxent(model=None)[source]

Bases: object

classify(string)[source]
classify_string(s, **kwargs)[source]

Run the classifier on a string, breaking it apart as necessary.

Parameters: s (str) – String to classify
classify_token(token, **kwargs)[source]
close()[source]
info()[source]

Print the feature statistics for the given model. (Assumes MaxEnt)

intent.interfaces.mallet_maxent.svmlight_to_vectors(txt)[source]

Convert a text file to vectors.

Parameters: txt – Path to the text file.
intent.interfaces.mallet_maxent.train_txt(txt_path, model_path)[source]

Train a classifier from an SVM-light format text file.

Parameters:
  • txt_path
  • model_path
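
For orientation, an SVM-light style line consists of a label followed by feature:value pairs, optionally ending in a # comment. The sketch below parses one such line; it is illustrative only. parse_svmlight_line is a hypothetical helper that is not part of this module, and treating feature names as arbitrary strings (rather than integer indices) is an assumption about the input files used here.

```python
def parse_svmlight_line(line):
    """Parse 'label feat:value ...' into (label, {feat: value}).

    Returns None for blank or comment-only lines.
    """
    line = line.split("#", 1)[0].strip()  # drop a trailing comment
    if not line:
        return None
    label, *pairs = line.split()
    feats = {}
    for pair in pairs:
        name, sep, value = pair.rpartition(":")
        if not sep or not name:
            raise ValueError("malformed feature: {!r}".format(pair))
        feats[name] = float(value)
    return label, feats


parsed = parse_svmlight_line("NOUN suffix_s:1 cap:0.5 # example")
```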

intent.interfaces.stanford_parser module

OO interface for communicating with the Stanford Parser

@author: rgeorgi
class intent.interfaces.stanford_parser.ParseResult[source]

Bases: object

class intent.interfaces.stanford_parser.ParseTest(methodName='runTest')[source]

Bases: unittest.case.TestCase

setUp()[source]
class intent.interfaces.stanford_parser.StanfordParser[source]

Bases: object

Instantiate an object which can be called upon to return either phrase structure parses or dependency parses.

close()[source]
parse(string, id_base=None)[source]

Parse the given string with the parser, producing both a dependency tree and a phrase structure tree.

Parameters:
  • string (str) – String to parse
  • id_base
parse_interpreter(str)[source]
intent.interfaces.stanford_parser.parse_interpreter(str, parse_queue)[source]
intent.interfaces.stanford_parser.parser_stderr_handler(msg)[source]

intent.interfaces.stanford_tagger module

Created on Oct 22, 2013

@author: rgeorgi

exception intent.interfaces.stanford_tagger.CriticalTaggerError[source]

Bases: intent.interfaces.stanford_tagger.TaggerError

class intent.interfaces.stanford_tagger.StanfordPOSTagger(model)[source]

Bases: object

Instantiate a Java VM to run the Stanford tagger.

close()[source]
tag(s, **kwargs)[source]
Return type: list[POSToken]
tag_tokenization(tokenization, **kwargs)[source]
exception intent.interfaces.stanford_tagger.TaggerError[source]

Bases: Exception

class intent.interfaces.stanford_tagger.TestPeriodTagging(methodName='runTest')[source]

Bases: unittest.case.TestCase

runTest(result=None)[source]
intent.interfaces.stanford_tagger.stanford_stderr_handler(line)[source]
intent.interfaces.stanford_tagger.stanford_stdout_handler(output, queue)[source]
intent.interfaces.stanford_tagger.tag(string, model)[source]
intent.interfaces.stanford_tagger.test_postagger(test_file, model_path, out_file, delimeter='/')[source]
Parameters:
  • test_file
  • model_path
  • out_file
  • delimeter
intent.interfaces.stanford_tagger.test_postagger_on_conll(test_file, model_path, out_file, delimeter='/')[source]
intent.interfaces.stanford_tagger.train_postagger(train_path, model_path, delimeter='/')[source]

Given the slashtag file train_path, train a tagger model from it and write the model to model_path.

Parameters:
  • train_path – Path to input slashtag file
  • model_path – Path to output the model
  • delimeter – Delimiter that separates words from tags
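
A slashtag file stores each token as word, delimiter, tag (for example The/DT). The sketch below shows one way to read such a line; it is a hypothetical helper, not part of this module. Splitting on the last occurrence of the delimiter, so that words containing the delimiter survive, is an assumption. The keyword is spelled delimeter only to mirror the parameter name used in the signatures above.

```python
def parse_slashtag_line(line, delimeter="/"):
    """Split 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # rpartition splits on the LAST delimiter, so '1/2/CD' -> ('1/2', 'CD').
        word, sep, tag = token.rpartition(delimeter)
        if not sep:
            raise ValueError("token has no tag: {!r}".format(token))
        pairs.append((word, tag))
    return pairs


tagged = parse_slashtag_line("The/DT dog/NN ate/VBD 1/2/CD")
```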
intent.interfaces.stanford_tagger.train_postagger_on_conll(train_file, model_path, delimeter='/')[source]

Module contents