intent.igt package

Submodules

intent.igt.consts module

intent.igt.grams module

Module to help deal with “grams” (sub-token level gloss-line elements)

intent.igt.grams.fix_gram(gram)[source]
intent.igt.grams.gram_matches(gram)[source]
intent.igt.grams.write_gram(token, **kwargs)[source]

intent.igt.igtutils module

Created on Mar 11, 2014

author:rgeorgi
class intent.igt.igtutils.TestLangLines(methodName='runTest')[source]

Bases: unittest.case.TestCase

keep_something_test()[source]
runTest()[source]
intent.igt.igtutils.clean_gloss_string(ret_str)[source]
intent.igt.igtutils.clean_lang_string(ret_str)[source]

Clean the language string.

Parameters:ret_str
intent.igt.igtutils.clean_lang_token(orig_str, lowercase=True)[source]
intent.igt.igtutils.clean_trans_string(ret_str)[source]
intent.igt.igtutils.collapse_spaces(ret_str)[source]
intent.igt.igtutils.concat_lines(linelist)[source]
intent.igt.igtutils.extract_judgment(line)[source]

Given a string, attempt to extract the judgment character (“*” or “?”) from it.

Parameters:line (str)
Returns:Tuple of the altered line and the judgment character.
Return type:tuple[str, str]
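
The behaviour described above can be illustrated with a minimal sketch. This is not the code in intent.igt.igtutils: the name extract_judgment_sketch is hypothetical, and the sketch assumes the judgment mark is the first non-space character of the line and that None is returned when no mark is present.

```python
import re

# Hypothetical stand-in, not the library's implementation.
# Assumption: the judgment mark ("*" or "?") is the first non-space character.
_JUDGMENT_RE = re.compile(r'^\s*([*?])\s*')


def extract_judgment_sketch(line):
    """Return (line without the judgment mark, the mark or None)."""
    match = _JUDGMENT_RE.match(line)
    if match is None:
        # Assumption: lines without a mark are returned unchanged.
        return line, None
    return line[match.end():], match.group(1)
```

Under these assumptions, extract_judgment_sketch("*John run home") yields the pair ("John run home", "*").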
intent.igt.igtutils.fix_grams(ret_str)[source]

Search for gram strings that have been split with whitespace and rejoin them.

For instance “3 SG” will become “3SG”
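
A rough sketch of the rejoining step follows. It is an illustration only: fix_grams_sketch is a hypothetical name, and the set of gram labels it recognises (a person digit followed by SG, PL, or DU) is an assumption, since the patterns the library actually matches are not listed here.

```python
import re

# Hypothetical stand-in, not the library's implementation.
# Assumption: only a person digit (1-3) followed by one of these
# number labels counts as a split gram.
_SPLIT_GRAM_RE = re.compile(r'\b([123])\s+(SG|PL|DU)\b')


def fix_grams_sketch(text):
    """Rejoin grams such as "3 SG" that were split by whitespace."""
    return _SPLIT_GRAM_RE.sub(r'\1\2', text)
```

With this pattern, "run-3 SG.PST" becomes "run-3SG.PST", while text that does not match the assumed gram inventory is left alone.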

intent.igt.igtutils.get_judgment(line)[source]
intent.igt.igtutils.grammaticality(ret_str)[source]
intent.igt.igtutils.hyphenate_infinitive(ret_str)[source]
intent.igt.igtutils.is_strict_columnar_alignment(s_a, s_b)[source]
intent.igt.igtutils.join_morphs(ret_str)[source]

Find tokens that have letters or numbers on two sides separated by a period or morph and join them.

E.g. MASC . 1SG becomes MASC.1SG
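
The period case can be sketched with a single regular expression. The function name join_morphs_sketch is hypothetical, and the sketch covers only periods; how the library treats other separators is not shown in this documentation.

```python
import re

# Hypothetical stand-in, not the library's implementation.
# Matches a period with optional surrounding whitespace, but only when a
# letter, digit, or underscore sits on both sides of it.
_LOOSE_PERIOD_RE = re.compile(r'(?<=\w)\s*\.\s*(?=\w)')


def join_morphs_sketch(text):
    """Close up whitespace around periods that separate two tokens."""
    return _LOOSE_PERIOD_RE.sub('.', text)
```

Lookarounds are used so that chained cases such as "A . B . C" are joined in one pass, because the characters on either side are not consumed by the match.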

intent.igt.igtutils.merge_lines(linelist)[source]

Given two lines, merge characters that fall into blank space on the other line.

Parameters:linelist
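
One way to picture the merge is column by column. The sketch below is an assumption-laden illustration, not the library's code: merge_lines_sketch is a hypothetical name, it takes the two lines as separate arguments rather than as a list, and it assumes the first line wins whenever both lines have a non-blank character in the same column.

```python
from itertools import zip_longest


def merge_lines_sketch(line_a, line_b):
    """Fill the blank columns of line_a with characters from line_b."""
    merged = []
    # Pad the shorter line with spaces so every column is compared.
    for char_a, char_b in zip_longest(line_a, line_b, fillvalue=' '):
        # Assumption: line_a takes priority where both columns are filled.
        merged.append(char_b if char_a.isspace() else char_a)
    return ''.join(merged)
```

For example, merging "ab  ef" with "  cd  " under these assumptions gives "abcdef".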

intent.igt.igtutils.rejoin_letter(ret_str, letter='t', direction='right')[source]

Reattach lone letters left standing on their own.

Parameters:ret_str

intent.igt.igtutils.remove_byte_char(ret_str)[source]
intent.igt.igtutils.remove_elipses(ret_str)[source]
intent.igt.igtutils.remove_external_punctuation(ret_str)[source]
intent.igt.igtutils.remove_extra_parens(ret_str)[source]
intent.igt.igtutils.remove_extra_punc(ret_str)[source]
intent.igt.igtutils.remove_final_punctuation(ret_str)[source]
intent.igt.igtutils.remove_hyphens(ret_str)[source]
intent.igt.igtutils.remove_leading_numbers(ret_str)[source]
intent.igt.igtutils.remove_leading_punctuation(ret_str)[source]
intent.igt.igtutils.remove_numbering(ret_str)[source]
intent.igt.igtutils.remove_parenthetical_numbering(ret_str)[source]
intent.igt.igtutils.remove_period_numbering(ret_str)[source]

Remove period-initial numbering such as “1.”, “a.”, or “ii.”
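
A minimal sketch of this kind of stripping is given below. The name remove_period_numbering_sketch and the exact pattern are assumptions: the sketch treats digits, lowercase or uppercase roman numerals built from i, v, and x, and single letters as numbering when they are followed by a period and whitespace at the start of the string.

```python
import re

# Hypothetical stand-in, not the library's implementation.
# Assumption: numbering is one or more leading items made of digits,
# roman numerals (i, v, x only), or a single letter, each followed by
# a period and at least one whitespace character.
_PERIOD_NUMBERING_RE = re.compile(
    r'^\s*(?:(?:\d+|[ivx]+|[a-z])\.\s+)+', re.IGNORECASE)


def remove_period_numbering_sketch(text):
    """Strip leading numbering such as "1.", "a.", or "ii."."""
    return _PERIOD_NUMBERING_RE.sub('', text)
```

Requiring whitespace after the period keeps decimals such as "3.5 liters" intact.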

intent.igt.igtutils.remove_solo_punctuation(ret_str)[source]
intent.igt.igtutils.replace_group_with_whitespace(match_obj)[source]
intent.igt.igtutils.rgencode(o)[source]
intent.igt.igtutils.rgp(o)[source]
intent.igt.igtutils.split_punctuation(ret_str)[source]
intent.igt.igtutils.strict_columnar_alignment(s_a, s_b)[source]
intent.igt.igtutils.strip_leading_whitespace(lines)[source]

Given

intent.igt.igtutils.surrounding_quotes_and_parens(ret_str)[source]

intent.igt.metadata module

Created on Apr 9, 2015

author:rgeorgi
intent.igt.metadata.add_word_level_info(obj, val)[source]
intent.igt.metadata.del_meta(obj, meta_type, metadata_type=None)[source]

Remove the specified Meta type.

Parameters:
  • obj
  • meta_type
intent.igt.metadata.del_meta_attr(obj, meta_type, attr, metadata_type=None)[source]

Remove the specified meta attribute.

Parameters:
  • obj
  • meta_type
  • attr
intent.igt.metadata.find_meta(obj, meta_type, metadata_type='intent-meta')[source]
Given an object, find the text value of a Meta item with the given type.
Parameters:
  • obj – Object to search for metadata on
  • meta_type
Return type:Meta

intent.igt.metadata.find_meta_attr(obj, meta_type, attr, metadata_type='intent-meta')[source]

Find the specific value of a metadata attribute, or None if the meta item does not exist, or does not have the specified attribute.

Parameters:
  • obj
  • meta_type
  • attr
Returns:str or None
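
The lookup rule described above (return the attribute value, or None when either the meta item or the attribute is missing) can be sketched without the xigt package. Everything in the block is a stand-in: MetaStub, MetadataStub, ObjStub, and find_meta_attr_sketch are hypothetical names, and the real xigt classes may be organised differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Hypothetical stand-ins for the xigt model classes (assumptions only).
@dataclass
class MetaStub:
    type: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class MetadataStub:
    type: str
    metas: List[MetaStub] = field(default_factory=list)


@dataclass
class ObjStub:
    metadata: List[MetadataStub] = field(default_factory=list)


def find_meta_attr_sketch(obj, meta_type, attr,
                          metadata_type='intent-meta') -> Optional[str]:
    """Return the attribute value, or None if the meta or attribute is missing."""
    for container in obj.metadata:
        if container.type != metadata_type:
            continue
        for meta in container.metas:
            if meta.type == meta_type:
                # dict.get returns None when the attribute is absent.
                return meta.attributes.get(attr)
    return None
```

Returning None in both failure cases means a caller does not need to distinguish a missing meta item from a missing attribute before testing the result.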

intent.igt.metadata.find_meta_text(obj, meta_type)[source]
intent.igt.metadata.find_metadata(obj, metadata_type)[source]
intent.igt.metadata.get_intent_method(obj)[source]

Return the intent method used to generate the given object, or None if the method is not specified.

Parameters:obj – Object on which to look for the metadata
Returns:str or None
intent.igt.metadata.get_intent_proj_aln_method(obj)[source]

Return the alignment method used to do the projection, if such metadata exists.

intent.igt.metadata.get_meta_timestamp(m)[source]
intent.igt.metadata.get_word_level_info(obj)[source]
intent.igt.metadata.is_contentful_meta(m)[source]
intent.igt.metadata.is_contentful_metadata(md)[source]
intent.igt.metadata.remove_word_level_info(obj)[source]
intent.igt.metadata.set_intent_method(obj, method)[source]

Set the data provenance attributes of the metadata to show that they were sourced from intent, using the specified method.

Parameters:
  • obj – Object to add metadata to.
  • method – Method to set as the method attribute on the meta item.
intent.igt.metadata.set_intent_proj_data(obj, source_tier, aln_type)[source]

Using source_tier, add metadata to obj describing the tier from which the projected material was created.

Parameters:
  • obj
  • source_tier
intent.igt.metadata.set_meta(obj, m, metadata_type='intent-meta', timestamp=True)[source]
intent.igt.metadata.set_meta_attr(obj, meta_type, attr, val, metadata_type='intent-meta', timestamp=True)[source]

Add an arbitrary piece of metadata to an XIGT object that accepts metadata.

Parameters:
  • obj – XIGT object to add a piece of metadata to.
  • meta_type – Type of the Meta object to add
  • val – Text value for the meta object to be added.
  • metadata_type – Type for the metadata container in which to append the item.
Type of obj:XigtContainerMixin
Raises:Exception – If obj is of a type that does not contain metadata.

intent.igt.metadata.set_meta_text(obj, meta_type, text, metadata_type='intent-meta')[source]
intent.igt.metadata.timestamp_meta(m)[source]

intent.igt.rgxigt module

Subclassing of the xigt package to add a few convenience methods.

class intent.igt.rgxigt.RGIgt(**kwargs)[source]

Bases: xigt.model.Igt

add_gloss_lang_alignments()[source]
add_normal_line(tier, tag, func)[source]
all_tags()[source]
clean_tier(merge=False, generate=True)[source]
has_corruption()[source]

Return True if instance has “CR” in it, indicating corruption.

has_double_column()[source]
heur_align(**kwargs)[source]
normal_tier(clean=True, generate=True)[source]
project_trans_to_lang(aln_method=None, tag_method=None)[source]

Project POS tags from the translation line directly to the language line. This assumes that we have a bilingual alignment between translation words and language words already.

raw_tier()[source]

intent.igt.tests module

Created on Feb 24, 2015

author:rgeorgi <rgeorgi@uw.edu>
class intent.igt.tests.GlossAlignTest(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_gloss_projection_unaligned()[source]
class intent.igt.tests.POSTestCase(methodName='runTest')[source]

Bases: unittest.case.TestCase

setUp()[source]
test_add_pos_tags()[source]
test_classify_pos_tags()[source]
test_tag_trans_line()[source]
class intent.igt.tests.TextParseTest(methodName='runTest')[source]

Bases: unittest.case.TestCase

glosses_test()[source]

Test that the glosses are rendered correctly.

line_test()[source]

Test that lines are rendered correctly.

setUp()[source]
set_bilingual_align_test()[source]

Set the bilingual alignment manually, and ensure that it is read back correctly.

word_align_test()[source]

Test that the gloss has been automatically aligned at the word level correctly.

class intent.igt.tests.XigtParseTest(methodName='runTest')[source]

Bases: unittest.case.TestCase

Testcase to make sure we can load from XIGT objects.

giza_align_test()[source]
heur_align_test()[source]
setUp()[source]
xigt_load_test()[source]

Module contents