intent.alignment package

Submodules

intent.alignment.Alignment module

Created on Feb 21, 2014

@author: rgeorgi

class intent.alignment.Alignment.AlignPair[source]

Bases: tuple

class intent.alignment.Alignment.AlignedCorpus(iter=None)[source]

Bases: list

read(src_path, tgt_path, aln_path, limit=None)[source]

Read in the morph:gloss:aln alignment format.

Parameters:
  • src_path – Source sentences
  • tgt_path – Target sentences
  • aln_path – Alignment filename
  • limit – Sentence limit

read_giza(src_path, tgt_path, a3, limit=None)[source]

Read a giza A3.final file into an alignment format.

Parameters:
  • src_path – Path to the source sentences
  • tgt_path – Path to the target sentences
  • a3 – Path to the giza A3.final file
  • limit –

write(src_path, tgt_path, aln_path)[source]
class intent.alignment.Alignment.AlignedSent(src_tokens, tgt_tokens, aln)[source]

Bases: object

A class that holds source and target tokens and an alignment between the two.

aligned_words()[source]
aln_with_nulls()[source]

Return alignment with explicit NULL alignments.
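
As a rough illustration of what explicit NULL alignments could look like, the sketch below pairs every unaligned position with index 0. It assumes 1-based word indices with 0 standing for NULL (as in the giza format described for from_giza); the helper name `with_nulls` and its arguments are hypothetical and are not part of this module.

```python
def with_nulls(pairs, srclen, tgtlen):
    """Hypothetical helper: add (i, 0) / (0, j) pairs for unaligned positions."""
    pairs = set(pairs)
    aligned_src = {s for s, _ in pairs}
    aligned_tgt = {t for _, t in pairs}
    # Unaligned source positions are paired with the NULL target (index 0).
    for i in range(1, srclen + 1):
        if i not in aligned_src:
            pairs.add((i, 0))
    # Unaligned target positions are paired with the NULL source (index 0).
    for j in range(1, tgtlen + 1):
        if j not in aligned_tgt:
            pairs.add((0, j))
    return pairs


result = with_nulls({(1, 2), (3, 1)}, srclen=3, tgtlen=3)
```

Here source position 2 and target position 3 have no partner, so they are the only positions that receive a NULL pair.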

flipped()[source]
classmethod from_giza_lines(tgt, aln)[source]

Return the target-to-source alignment from the target and aln lines of giza.

get_attr(key)[source]
get_src(i)[source]
get_tgt(i)[source]
pairs(src=None, tgt=None)[source]
serialize_src()[source]
serialize_src_h()[source]
set_attr(key, val)[source]
src
src_text
src_to_tgt(i)[source]
src_to_tgt_words(i)[source]
srclen
tgt
tgt_text
tgt_to_src(i)[source]
tgt_to_src_words(i)[source]
tgtlen
unaligned_src_indices()[source]
unaligned_src_words()[source]
unaligned_tgt_indices()[source]
unaligned_tgt_words()[source]
wordpair(ip)[source]

Return the word pair corresponding to an alignment pair.

Parameters:
  • ip –

wordpairs(ips=None)[source]

Return the pairs of words referred to by the indices in ips.

This is either an arbitrary pair of indices passed as an argument, or the indices contained in the alignment property.

Parameters:
  • ips –
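
The lookup from index pairs to word pairs might be sketched as follows. This is a standalone illustration, not the method's actual code: it assumes 1-based indices and plain string tokens, and `word_pairs` is a hypothetical name.

```python
def word_pairs(src_tokens, tgt_tokens, index_pairs):
    """Hypothetical helper: map (src_index, tgt_index) pairs to (src_word, tgt_word)."""
    # Assumption: indices are 1-based, so subtract 1 for list access.
    return [(src_tokens[s - 1], tgt_tokens[t - 1])
            for s, t in sorted(index_pairs)]


src = ["der", "Hund", "bellt"]
tgt = ["the", "dog", "barks"]
pairs = word_pairs(src, tgt, {(1, 1), (2, 2), (3, 3)})
```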

class intent.alignment.Alignment.AlignedSentOutputCase(methodName='runTest')[source]

Bases: unittest.case.TestCase

runTest()[source]
class intent.alignment.Alignment.Alignment(iter=[], type=None)[source]

Bases: set

Simply, a set of (src_index, tgt_index) pairs.

add(*args, **kwargs)[source]
all_src()[source]
all_tgt()[source]
contains_src(key)[source]
contains_tgt(key)[source]
copy(*args, **kwargs)[source]
Return type: Alignment
final(a)[source]
Return type: Alignment
flip()[source]

For an alignment of { (a, b) ... (c, d) } pairs, return an Alignment of { (b, a) ... (d, c) }

Return type: Alignment
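
The flip operation can be illustrated on a plain set of tuples. This is a minimal sketch; `flip_pairs` is a hypothetical stand-in that works on a built-in set rather than on the Alignment class itself.

```python
def flip_pairs(pairs):
    """Swap the two members of every (src, tgt) pair."""
    return {(tgt, src) for src, tgt in pairs}


flipped = flip_pairs({(1, 2), (3, 4)})
```

Flipping twice returns the original set of pairs.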
classmethod from_giza(giza)[source]
Given a giza style alignment string, such as:

NULL ({ 3 }) fact ({ }) 1ss ({ 1 }) refl ({ }) wash ({ 2 }) ben ({ 5 4 }) punc ({ }) ne ({ 6 }) shirt ({ 4 })

...where each integer is a 1-based index into the target line and each word is a token from the source line, return an alignment of (src, tgt) index pairs.

Parameters:
  • giza (str) – Alignment string as described above.
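
A minimal sketch of parsing such a string is shown below. It is not the module's implementation: the function name `parse_giza_alignment` is hypothetical, and it assumes that source positions are counted from the NULL token at index 0, which the description above does not state explicitly.

```python
import re


def parse_giza_alignment(giza):
    """Hypothetical parser: return a set of (src_index, tgt_index) pairs."""
    pairs = set()
    # Each entry looks like: word ({ 5 4 })
    entries = re.findall(r'(\S+)\s+\(\{([\d\s]*)\}\)', giza)
    # Assumption: the leading NULL token occupies source index 0.
    for src_index, (_word, tgt_indices) in enumerate(entries):
        for tgt_index in tgt_indices.split():
            pairs.add((src_index, int(tgt_index)))
    return pairs


line = ("NULL ({ 3 }) fact ({ }) 1ss ({ 1 }) refl ({ }) wash ({ 2 }) "
        "ben ({ 5 4 }) punc ({ }) ne ({ 6 }) shirt ({ 4 })")
aln = parse_giza_alignment(line)
```

Under that assumption, "ben" (source index 5) is linked to target positions 5 and 4, and tokens with empty braces contribute no pairs.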
grow_diag(a2)[source]
Return type: Alignment
grow_diag_final(a2)[source]
intersection(*args, **kwargs)[source]
Return type: Alignment
nonzeros()[source]
serialize_src()[source]
src_to_tgt(key)[source]
tgt_to_src(key)[source]
union(*args, **kwargs)[source]
Return type: Alignment
exception intent.alignment.Alignment.AlignmentError(value)[source]

Bases: Exception

class intent.alignment.Alignment.AlignmentTest(methodName='runTest')[source]

Bases: unittest.case.TestCase

runTest()[source]
class intent.alignment.Alignment.GizaAlignmentTest(methodName='runTest')[source]

Bases: unittest.case.TestCase

setUp()[source]
test_alignment_reading()[source]
test_alignmentsent_reading()[source]
exception intent.alignment.Alignment.HeuristicAlignmentException[source]

Bases: Exception

class intent.alignment.Alignment.MorphAlign(iter=[])[source]

Bases: intent.alignment.Alignment.Alignment

Special subclass of Alignment that holds not only src and tgt indices, but also a remapped middle index.

GlossAlign
MorphAlign
add(item)[source]
add_str(string)[source]
flip()[source]
remap(aln)[source]

Given another alignment, return a new alignment where its indices are either remapped to an entry in the remapping, or returned as-is.

Parameters:
  • aln – Alignment to remap.
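
The "remap or return as-is" behaviour might look like the sketch below. It rests on assumptions the documentation does not confirm: that the remapping can be treated as a dict from old index to new index, and that it applies to the source side of each pair. `remap_pairs` is a hypothetical name.

```python
def remap_pairs(pairs, remapping):
    """Hypothetical helper: remap source indices, leaving unmapped ones unchanged."""
    # Assumption: `remapping` maps an old source index to a new source index.
    return {(remapping.get(src, src), tgt) for src, tgt in pairs}


remapped = remap_pairs({(1, 1), (2, 3), (4, 2)}, {2: 1, 4: 3})
```

Source index 1 has no entry in the remapping, so the pair (1, 1) passes through unchanged.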

remapping
intent.alignment.Alignment.combine_corpora(a1, a2, method='intersect')[source]
intent.alignment.Alignment.combine_sents(s1, s2, method='intersect')[source]
intent.alignment.Alignment.exact_match(src, tgt)[source]
intent.alignment.Alignment.gram_match(src, tgt)[source]
intent.alignment.Alignment.heur_alignments(gloss_tokens, trans_tokens, **kwargs)[source]

Obtain heuristic alignments between gloss and translation tokens.

Parameters:
  • gloss_tokens (list[Token]) – The gloss tokens
  • trans_tokens (list[Token]) – The translation tokens
  • iteration (int) – Number of iterations looking for matches
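
To give a feel for heuristic alignment, the sketch below links gloss and translation tokens whose lowercased strings are identical. It is only an illustration of one possible heuristic: it uses plain strings instead of Token objects, assumes 1-based indices, and `exact_match_alignments` is a hypothetical name rather than a function of this module.

```python
def exact_match_alignments(gloss_tokens, trans_tokens):
    """Hypothetical helper: align tokens whose lowercased forms are equal."""
    pairs = set()
    for g_index, gloss in enumerate(gloss_tokens, start=1):
        for t_index, trans in enumerate(trans_tokens, start=1):
            if gloss.lower() == trans.lower():
                pairs.add((g_index, t_index))
    return pairs


gloss = ["1sg", "wash", "shirt"]
trans = ["I", "wash", "my", "shirt"]
matches = exact_match_alignments(gloss, trans)
```

Only "wash" and "shirt" match here; a grammatical gloss such as "1sg" would need a different comparison (for example a stem-based or gram-based one) to be linked to "I".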
intent.alignment.Alignment.heuristic_chain(gloss_tokens, trans_tokens, methods, aln=None, multiple_matches=True)[source]
intent.alignment.Alignment.heuristic_iteration(gloss_tokens, trans_tokens, aln, comparison_function, multiple_matches=True, iteration=1, report=False)[source]
Parameters:
  • gloss_tokens (list[str]) –
  • trans_tokens (list[str]) –
  • aln (Alignment) –
  • comparison_function
  • iteration
intent.alignment.Alignment.intersect(a1, a2)[source]
intent.alignment.Alignment.refined_combine(s1, s2)[source]

Implements the “refined” alignment algorithm from Och & Ney (2003).

Parameters:
  • s1 –
  • s2 –
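
The sketch below follows one reading of the refined method as described by Och & Ney: start from the intersection of the two alignments, then repeatedly add a link found in only one of them if either both of its words are still unaligned, or it is adjacent to an existing link and adding it leaves no link with both a horizontal and a vertical neighbor. It operates on plain sets of index pairs rather than AlignedSent objects, all function names are hypothetical, and it may differ from this module's implementation.

```python
def _has_neighbor(aln, i, j):
    """True if (i, j) has a horizontal or vertical neighbor in aln."""
    return any(n in aln
               for n in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)))


def _has_both_neighbor_kinds(aln):
    """True if some link in aln has both a horizontal and a vertical neighbor."""
    for i, j in aln:
        horizontal = (i - 1, j) in aln or (i + 1, j) in aln
        vertical = (i, j - 1) in aln or (i, j + 1) in aln
        if horizontal and vertical:
            return True
    return False


def refined_sketch(a1, a2):
    """Combine two sets of (src, tgt) pairs with the refined heuristic."""
    a1, a2 = set(a1), set(a2)
    aln = a1 & a2                      # reliable links found in both directions
    candidates = (a1 | a2) - aln       # links found in only one direction
    changed = True
    while changed:
        changed = False
        for i, j in sorted(candidates - aln):
            src_free = all(s != i for s, _ in aln)
            tgt_free = all(t != j for _, t in aln)
            if (src_free and tgt_free) or (
                _has_neighbor(aln, i, j)
                and not _has_both_neighbor_kinds(aln | {(i, j)})
            ):
                aln.add((i, j))
                changed = True
    return aln


combined = refined_sketch({(1, 1), (2, 2), (3, 3)}, {(1, 1), (2, 2), (2, 3)})
```

In this example (2, 3) is added because it neighbors (2, 2), while (3, 3) is then rejected because adding it would give (2, 3) both kinds of neighbor.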

intent.alignment.Alignment.stem_match(src, tgt)[source]
intent.alignment.Alignment.union(a1, a2)[source]

Module contents