intent.utils package

Submodules

intent.utils.ConfigFile module

Created on Oct 23, 2013

@author: rgeorgi

Basic config file. Comments and blank lines are ignored.

Variables are stored in a dictionary.

Supports ‘$’ references, as long as they are ordered correctly.

Also automatically attempts to parse lines into Python types (lists, integers).
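The behaviour described above can be illustrated with a minimal sketch. This is not the module's own code: the `key = value` line syntax, the `#` comment marker, and the helper name `parse_config` are all assumptions made for the example.

```python
import ast
import re


def parse_config(text):
    """Hypothetical parser; 'key = value' lines and '#' comments are assumed."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments and blank lines are ignored
        key, _, raw = line.partition("=")
        key, raw = key.strip(), raw.strip()
        # Expand $name references; only keys defined on earlier lines resolve,
        # which is why the ordering of the file matters.
        raw = re.sub(r"\$(\w+)", lambda m: str(values[m.group(1)]), raw)
        try:
            values[key] = ast.literal_eval(raw)  # integers, lists, ...
        except (ValueError, SyntaxError, TypeError):
            values[key] = raw  # anything else stays a plain string
    return values
```

In this sketch a `$` reference to a key that has not been defined yet raises `KeyError`, which mirrors the "ordered correctly" requirement.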

class intent.utils.ConfigFile.ConfigFile(path)[source]

Bases: intent.utils.argpasser.ArgPasser

Configuration file

getpath(key)[source]

Retrieve a path from the config file, returning a path that is relative to this config file if necessary.

Parameters:key
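One plausible reading of "relative to this config file" is sketched below. The function name `resolve_relative` and the rule that absolute paths pass through unchanged are assumptions, not documented behaviour.

```python
import os


def resolve_relative(config_path, value):
    """Hypothetical helper: anchor a relative path at the config file's directory."""
    if os.path.isabs(value):
        return value  # assumed: absolute paths are returned unchanged
    base = os.path.dirname(os.path.abspath(config_path))
    return os.path.join(base, value)
```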

set_defaults(dict)[source]
exception intent.utils.ConfigFile.ConfigFileException[source]

Bases: Exception

class intent.utils.ConfigFile.ConfigFileTests(methodName='runTest')[source]

Bases: unittest.case.TestCase

setUp()[source]
tearDown()[source]
testBool()[source]
testRefs()[source]
exception intent.utils.ConfigFile.NoOptionException[source]

Bases: intent.utils.ConfigFile.ConfigFileException

exception intent.utils.ConfigFile.SetConflict[source]

Bases: intent.utils.ConfigFile.ConfigFileException

intent.utils.TagCounter module

intent.utils.arg_consts module

intent.utils.argpasser module

Created on Jan 27, 2015

@author: rgeorgi

class intent.utils.argpasser.ArgPasser(d={})[source]

Bases: dict

ArgPasser is a drop-in replacement for a **kwargs dict that allows values which evaluate to false to be returned as stored, rather than being replaced by the default.

get(k, default=None, t=None)[source]

Using the key k, attempt to retrieve the value from the dictionary. A default can be supplied, and a “type” argument can be applied to verify that the argument is of the right type.

Parameters:
  • k – the key
  • default – what to return if k was not found in the dict
  • t (type) – the type function to apply to the retrieved argument.
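The motivation can be seen by comparing with the common `kwargs.get(k) or default` idiom, which discards stored values such as `False` or `0`. The class below is a hypothetical stand-in written for this comparison, not the ArgPasser source; in particular, applying `t` to the default as well is an assumption.

```python
class FalsyFriendlyArgs(dict):
    """Hypothetical stand-in for ArgPasser, for illustration only."""

    def get(self, k, default=None, t=None):
        # Membership test instead of truthiness, so False/0/'' survive.
        value = self[k] if k in self else default
        # Apply the type function only when there is something to convert.
        if t is not None and value is not None:
            value = t(value)
        return value


args = FalsyFriendlyArgs({"verbose": False, "n": "0"})
naive = {"verbose": False}.get("verbose") or True   # stored False is lost
kept = args.get("verbose", True)                    # stored False is kept
```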
class intent.utils.argpasser.ArgPasserTests(methodName='runTest')[source]

Bases: unittest.case.TestCase

setUp()[source]
testBool()[source]
testInt()[source]
exception intent.utils.argpasser.ArgPassingException[source]

Bases: Exception

intent.utils.argpasser.add_args_to_namespace(ap, ns, overwrite=False)[source]

Given an ArgPasser, add the key, value pairs to a given namespace.

Parameters:
  • ap (ArgPasser) – ArgPasser with the values to use.
  • ns (Namespace) – Namespace to add the ArgPasser values to.
  • overwrite (bool) – Whether or not to overwrite existing values.
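A minimal sketch of this kind of merge, using a plain dict in place of an ArgPasser. The name `add_to_namespace` is hypothetical, and treating "existing" as "the attribute is already set" is an assumption.

```python
import argparse


def add_to_namespace(mapping, ns, overwrite=False):
    """Copy key/value pairs onto ns, skipping existing attributes by default."""
    for key, value in mapping.items():
        if overwrite or not hasattr(ns, key):
            setattr(ns, key, value)
    return ns
```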

intent.utils.argpasser.argp(f)[source]

This decorator takes a function that normally has some keyword args and instead makes it require one keyword arg that is an ArgPasser. This is helpful for doing some of the default tests with ArgPasser.

Parameters:f (func) –

intent.utils.argutils module

Created on Aug 26, 2013

author:rgeorgi
exception intent.utils.argutils.CommandLineException[source]

Bases: Exception

class intent.utils.argutils.DefaultHelpParser(prog=None, usage=None, description=None, epilog=None, parents=[], formatter_class=<class 'argparse.HelpFormatter'>, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True)[source]

Bases: argparse.ArgumentParser

Make the argparser default to printing help when an error is encountered.
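The usual way to get this behaviour is to override `ArgumentParser.error`. The sketch below shows that general pattern under the hypothetical name `HelpOnErrorParser`; the exact message format and exit code used by DefaultHelpParser are not documented here and are assumed.

```python
import argparse
import sys


class HelpOnErrorParser(argparse.ArgumentParser):
    """Print the full help text, not just the usage line, on a parse error."""

    def error(self, message):
        sys.stderr.write("error: %s\n" % message)
        self.print_help(sys.stderr)
        sys.exit(2)  # assumed exit status, matching argparse's own default
```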

convert_arg_line_to_args(arg_line)[source]
error(message)[source]
exception intent.utils.argutils.PathArgException[source]

Bases: intent.utils.argutils.CommandLineException

exception intent.utils.argutils.PathInvalidException[source]

Bases: intent.utils.argutils.PathArgException

exception intent.utils.argutils.PathNotExistsException[source]

Bases: intent.utils.argutils.PathArgException

intent.utils.argutils.configfile(path)[source]
intent.utils.argutils.csv_choices(choice_list)[source]
intent.utils.argutils.exists(path)[source]

Type for passing to argparse to verify that the argument is an extant path.

intent.utils.argutils.existsdir(path, rootpath=None)[source]

Type for passing to argparse to verify that the argument both:

  • Is a directory
  • Exists on the filesystem
Parameters:
  • path – Path to check
  • rootpath – Path from which to construct relative paths.
intent.utils.argutils.existsfile(path)[source]

Type for passing to argparse to verify that the argument both:

  • Is a file
  • Exists on the filesystem
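Functions like these are passed as the `type=` argument of `add_argument`, so that argparse validates the path while parsing. A minimal sketch follows; the names `existing_file` and `existing_dir` are hypothetical, and raising `argparse.ArgumentTypeError` is an assumption (the module defines its own `PathArgException` hierarchy, which may be what it raises instead).

```python
import argparse
import os


def existing_file(path):
    """argparse 'type' callable: accept only paths that are existing files."""
    if not os.path.isfile(path):
        raise argparse.ArgumentTypeError('"%s" is not an existing file' % path)
    return path


def existing_dir(path):
    """argparse 'type' callable: accept only paths that are existing directories."""
    if not os.path.isdir(path):
        raise argparse.ArgumentTypeError('"%s" is not an existing directory' % path)
    return path
```

A parser would then use it as `parser.add_argument("corpus", type=existing_file)`.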
intent.utils.argutils.globfiles(file_arg)[source]

Given a glob pattern, find files matching it.

Parameters:file_arg – glob pattern
Returns:
Raises:argparse.ArgumentError
 
intent.utils.argutils.proportion(arg)[source]
intent.utils.argutils.require_opt(option, msg, must_exist=False, must_exist_msg='The file "%s" was not found\n')[source]
intent.utils.argutils.writedir(path)[source]
intent.utils.argutils.writefile(path, mode='w', encoding='utf-8')[source]

Ensure that this file is writable in the given path, and return it as an open file object.

Parameters:
  • path (filepath) – Path to the file to write
  • mode ([ 'w' | 'wb' ]) – Write mode
  • encoding (encoding) – File encoding

intent.utils.dicts module

Created on Aug 26, 2013

@author: rgeorgi

class intent.utils.dicts.CountDict[source]

Bases: object

add(key, value=1)[source]
distribution(use_keys=<class 'list'>, add_n=0)[source]
get(k, d=0)[source]
items()[source]
keys()[source]
largest()[source]
most_frequent(minimum=0, num=1)[source]

Return the num entries with the highest counts that also have at least minimum occurrences.

Parameters:
  • minimum (int)
  • num (int)
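The selection described above can be sketched with `collections.Counter`. This is a standalone function over a plain mapping, not the CountDict method itself, and returning only the keys (rather than key/count pairs) is an assumption.

```python
from collections import Counter


def most_frequent(counts, minimum=0, num=1):
    """Return up to num keys with the highest counts, each at least minimum."""
    ranked = [(k, c) for k, c in Counter(counts).most_common() if c >= minimum]
    return [k for k, _ in ranked[:num]]
```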

most_frequent_counts(minimum=0, num=1)[source]
total()[source]
class intent.utils.dicts.DefaultOrderedDict(default_factory=None, *a, **kw)[source]

Bases: collections.OrderedDict

copy()[source]
class intent.utils.dicts.GreedyTest(methodName='runTest')[source]

Bases: unittest.case.TestCase

runTest()[source]
class intent.utils.dicts.MatrixTest(methodName='runTest')[source]

Bases: unittest.case.TestCase

runTest()[source]
class intent.utils.dicts.POSEvalDict[source]

Bases: intent.utils.dicts.TwoLevelCountDict

This dictionary is used for evaluation. Items are stored in the dictionary as:

{real_label:{assigned_label:count}}

This also supports greedy mapping techniques for evaluation.
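To make the storage layout concrete, the sketch below builds a small `{real_label: {assigned_label: count}}` mapping from plain dicts and computes two of the quantities listed for this class. The functions are illustrative stand-ins, and returning accuracy as a fraction between 0 and 1 is an assumption.

```python
def accuracy(confusion):
    """Share of tokens whose assigned label equals the real label."""
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(row.get(real, 0) for real, row in confusion.items())
    return correct / total if total else 0.0


def col_total(confusion, assigned_tag):
    """Tokens that were assigned assigned_tag, including false positives."""
    return sum(row.get(assigned_tag, 0) for row in confusion.values())


# Rows are real (gold) labels, inner keys are the labels the system assigned.
confusion = {
    "NOUN": {"NOUN": 8, "VERB": 2},
    "VERB": {"VERB": 5, "NOUN": 1},
}
```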

accuracy()[source]
all_matches()[source]
assigned_tags()[source]
breakdown_csv()[source]
col_total(assigned_tag)[source]
Parameters:assigned_tag – The assigned tag to count
Returns:The number of tokens that have been assigned the tag assigned_tag, including false positives.
error_matrix(csv=False, ansi=False)[source]

Print an error matrix with the columns being the tags assigned by the system and the rows being the gold standard answers.

fmeasure()[source]
gold_tags()[source]
greedy_1_to_1(debug=False)[source]

Remap the tags one-to-one in such a way as to maximize matches.

This will be similar to bubble sort. Start off with 1:1. Then, go through each pair of tags and try swapping the two. If we get a net gain of matches, then keep the swap, otherwise don’t. Repeat until we get a full run of no swaps.
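The swap procedure described above can be sketched as follows, again over a plain `{real_label: {assigned_label: count}}` dict. The function names are hypothetical and this is an illustration of the idea rather than the method's source. Because a swap is kept only when it strictly increases the number of matches, the loop is guaranteed to terminate.

```python
from itertools import combinations


def count_matches(confusion, mapping):
    """Tokens whose assigned tag, after remapping, equals the gold tag."""
    return sum(count
               for gold, row in confusion.items()
               for assigned, count in row.items()
               if mapping.get(assigned) == gold)


def greedy_one_to_one(confusion):
    tags = sorted(set(confusion) | {a for row in confusion.values() for a in row})
    mapping = {t: t for t in tags}  # start off with the identity (1:1) mapping
    swapped = True
    while swapped:  # repeat until a full pass makes no swaps
        swapped = False
        for a, b in combinations(tags, 2):
            before = count_matches(confusion, mapping)
            mapping[a], mapping[b] = mapping[b], mapping[a]
            if count_matches(confusion, mapping) > before:
                swapped = True  # net gain: keep the swap
            else:
                mapping[a], mapping[b] = mapping[b], mapping[a]  # undo
    return mapping
```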

greedy_n_to_1()[source]

Remap the tags in such a way as to maximize matches. In this mapping, multiple output tags can map to the same gold tag.
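A common way to build such a many-to-one mapping is to send each output tag to the gold tag it co-occurs with most often. The sketch below assumes that strategy and the `{real_label: {assigned_label: count}}` layout; the name `greedy_many_to_one` is hypothetical.

```python
def greedy_many_to_one(confusion):
    """Map each assigned tag to the gold tag with which it co-occurs most."""
    best = {}  # assigned tag -> (gold tag, count)
    for gold, row in confusion.items():
        for assigned, count in row.items():
            if assigned not in best or count > best[assigned][1]:
                best[assigned] = (gold, count)
    return {assigned: gold for assigned, (gold, _) in best.items()}
```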

keys()[source]
map(k)[source]
matches(t)[source]
overall_breakdown(title=None)[source]
precision()[source]
recall()[source]
tag_fmeasure(tag)[source]

Calculate f-measure for a given tag

Parameters:tag
Return type:float

tag_precision(tag)[source]

Calculate the precision for a given tag

Return type:float
tag_recall(tag)[source]

Calculate recall for a given tag

Parameters:tag – Input tag
Return type:float
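The three per-tag measures follow the standard definitions, sketched here over a plain `{real_label: {assigned_label: count}}` dict. The balanced F1 form of the f-measure and the 0 to 1 scale are assumptions; the class may weight or scale its results differently.

```python
def tag_precision(confusion, tag):
    """Correct assignments of tag divided by all assignments of tag."""
    assigned = sum(row.get(tag, 0) for row in confusion.values())
    return confusion.get(tag, {}).get(tag, 0) / assigned if assigned else 0.0


def tag_recall(confusion, tag):
    """Correct assignments of tag divided by all gold occurrences of tag."""
    gold = sum(confusion.get(tag, {}).values())
    return confusion.get(tag, {}).get(tag, 0) / gold if gold else 0.0


def tag_fmeasure(confusion, tag):
    """Harmonic mean of precision and recall (balanced F1, assumed)."""
    p, r = tag_precision(confusion, tag), tag_recall(confusion, tag)
    return 2 * p * r / (p + r) if p + r else 0.0
```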

unaligned(unaligned_tag='UNK')[source]
unmap(k)[source]
class intent.utils.dicts.StatDict(type=<class 'int'>)[source]

Bases: collections.defaultdict

counts
distribution
total
class intent.utils.dicts.TwoLevelCountDict[source]

Bases: object

add(key_a, key_b, value=1)[source]
combine(other)[source]
distribution(as_string=False, as_csv=False)[source]
fulltotal()[source]
keys()[source]
most_frequent(key, num=1, key2_re='')[source]
sub_distribution(key, use_keys=<class 'list'>, add_n=0)[source]
top_n(key, n=1, min_num=1, key2_re=None)[source]
total(key)[source]
Parameters:key
Returns:Number of tokens that have the “REAL” tag key

intent.utils.env module

Created on Feb 12, 2015

@author: rgeorgi

intent.utils.env.load_posdict()[source]
intent.utils.env.set_env_lang_utf8()[source]
intent.utils.env.xigt_testfile(s)[source]

intent.utils.fileutils module

Created on Oct 24, 2013

@author: rgeorgi

intent.utils.fileutils.dir_above(path, n=1)[source]
intent.utils.fileutils.globlist(globlist)[source]
intent.utils.fileutils.lc(fname)[source]

Get the linecount for a file.

Parameters:fname

intent.utils.fileutils.makedirs(path)[source]
intent.utils.fileutils.matching_files(dirpath, pattern, recursive=False)[source]

Return the paths matching a pattern in a directory, optionally recursing into subdirectories.

Parameters:
  • dirpath – directory to scan
  • pattern – regular expression to match paths upon
  • recursive – whether or not to recurse into the directories.
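A minimal sketch of such a scan using `os.walk` and `re`. Matching the pattern against the file name (rather than the full path) with `re.search` is an assumption, as is the sorted order within each directory.

```python
import os
import re


def matching_files(dirpath, pattern, recursive=False):
    """Collect file paths under dirpath whose names match the regex pattern."""
    regex = re.compile(pattern)
    found = []
    for root, _dirs, files in os.walk(dirpath):
        found.extend(os.path.join(root, name)
                     for name in sorted(files) if regex.search(name))
        if not recursive:
            break  # os.walk yields the top directory first; stop there
    return found
```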

intent.utils.fileutils.remove_safe(path)[source]
intent.utils.fileutils.swapext(path, ext)[source]

Swap the extension on a file

Parameters:
  • path (filepath) – Path to the file
  • ext (str) – new extension (if not starting with “.” one will be added)
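The documented behaviour can be sketched with `os.path.splitext`. This is an illustrative reimplementation, not the module's source; note that only the last extension is replaced.

```python
import os


def swapext(path, ext):
    """Replace the final extension of path with ext, adding a '.' if needed."""
    if not ext.startswith("."):
        ext = "." + ext
    return os.path.splitext(path)[0] + ext
```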

intent.utils.listutils module

Created on Oct 23, 2013

@author: rgeorgi

class intent.utils.listutils.FlattenTest(methodName='runTest')[source]

Bases: unittest.case.TestCase

test_flatten()[source]
intent.utils.listutils.all_indices(item, seq)[source]
intent.utils.listutils.chunkIt(seq, num)[source]

Divide a sequence seq into num roughly equal pieces.

Parameters:
  • seq
  • num
Returns:
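One way to produce "roughly equal" pieces is to give the first few chunks one extra element each. The sketch below takes that approach under the hypothetical name `chunk`; how chunkIt itself distributes the remainder is not documented here.

```python
def chunk(seq, num):
    """Split seq into num consecutive pieces whose lengths differ by at most 1."""
    size, extra = divmod(len(seq), num)
    pieces, start = [], 0
    for i in range(num):
        stop = start + size + (1 if i < extra else 0)
        pieces.append(seq[start:stop])
        start = stop
    return pieces
```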

intent.utils.listutils.flatten_list(obj)[source]

Given a set of embedded lists, return a single, “flattened” list.

Parameters:obj
Returns:
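A recursive sketch of flattening, assuming that only lists and tuples count as nested containers (strings and other iterables are treated as single items). The name `flatten` is hypothetical.

```python
def flatten(obj):
    """Return a flat list of the non-list items nested anywhere inside obj."""
    if not isinstance(obj, (list, tuple)):
        return [obj]
    flat = []
    for item in obj:
        flat.extend(flatten(item))
    return flat
```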
intent.utils.listutils.uniqify(seq, idfun=None)[source]

Given a sequence, return a sequence that contains only the unique items (while preserving the order).

Parameters:
  • seq – Sequence to uniqify
  • idfun – Function to apply to the instances to determine whether they are “unique”
Returns:

list
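Order-preserving deduplication is commonly written with a "seen" set, as in the sketch below. It is an illustration of the documented behaviour rather than the module's source, and it assumes the values returned by `idfun` are hashable.

```python
def uniqify(seq, idfun=None):
    """Keep the first occurrence of each item, as identified by idfun."""
    if idfun is None:
        idfun = lambda item: item  # default: items identify themselves
    seen = set()
    result = []
    for item in seq:
        marker = idfun(item)
        if marker not in seen:
            seen.add(marker)
            result.append(item)
    return result
```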

intent.utils.logconsts module

intent.utils.string_utils module

author:Ryan Georgi <rgeorgi@uw.edu>
intent.utils.string_utils.lemmatize_token(st, pos=None)[source]
intent.utils.string_utils.replace_invalid_xml(s)[source]
intent.utils.string_utils.stem_token(st)[source]
intent.utils.string_utils.string_compare_with_processing(s1, s2, **kwargs)[source]

Given two strings, do all the various processing tricks to decide if they match or not.

Parameters:
  • s1 (str) – First string to compare
  • s2 (str) – Second string to compare

intent.utils.systematizing module

Created on Oct 22, 2013

@author: rgeorgi

class intent.utils.systematizing.ProcessCommunicator(cmd, stdout_func=None, stderr_func=None, shell=False, blocking=False)[source]

Bases: object

This is a class to make communicating with a command-line program easier. It makes the stdin and stdout pipes available, while allowing stderr to be handled by a custom handler.
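The general pattern, exposing stdin and stdout while a background thread feeds stderr lines to a callback, can be sketched with `subprocess` and `threading`. The function name `run_with_stderr_handler` is hypothetical and this is not the ProcessCommunicator implementation; it only illustrates the idea.

```python
import subprocess
import sys
import threading


def run_with_stderr_handler(cmd, stderr_func):
    """Start cmd with piped stdin/stdout; pass each stderr line to stderr_func."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)

    def pump():
        # Runs in a background thread so stderr never blocks the caller.
        for line in proc.stderr:
            stderr_func(line.rstrip("\n"))

    worker = threading.Thread(target=pump, daemon=True)
    worker.start()
    return proc, worker
```

The caller reads `proc.stdout` and writes to `proc.stdin` as usual, then calls `proc.wait()` and `worker.join()` once the process is finished.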

kill()[source]
poll()[source]
stderr
stdin
stdout
wait()[source]
class intent.utils.systematizing.ProcessCommunicatorTest(methodName='runTest')[source]

Bases: unittest.case.TestCase

error_test()[source]
intent.utils.systematizing.enqueue_output(out, queue)[source]
intent.utils.systematizing.handle_stderr(p, queue, func)[source]
intent.utils.systematizing.piperunner(cmd, log_name=None)[source]

Fancy way to call a blocking subprocess and log its activity, while

Parameters:
  • cmd
  • log_name
intent.utils.systematizing.thread_handler(out, func)[source]

intent.utils.token module

Created on Mar 21, 2014

@author: rgeorgi

class intent.utils.token.GoldTagPOSToken(content, **kwargs)[source]

Bases: intent.utils.token.Token

classmethod fromToken(t, taglabel=None, goldlabel=None)[source]
goldlabel
taglabel
class intent.utils.token.Morph(seq='', start=None, stop=None, parent=None)[source]

Bases: intent.utils.token.Token

This class is what makes up an IGTToken. It should be comparable to a Token.

classmethod fromToken(token, parent)[source]
class intent.utils.token.POSToken(content, **kwargs)[source]

Bases: intent.utils.token.Token

classmethod fromToken(t, **kwargs)[source]
label
class intent.utils.token.Span(tup)[source]

Bases: object

Just return a character span.

start
stop
class intent.utils.token.Token(content, **kwargs)[source]

Bases: object

attrs
lower()[source]
morphed_tokens()[source]
morphequals(o, **kwargs)[source]
morphs(**kwargs)[source]
parent
seq
value()[source]
exception intent.utils.token.TokenException[source]

Bases: Exception

class intent.utils.token.Tokenization(seq=[], original='')[source]

Bases: list

Container class for a tokenization.

slashtags(delimiter='/')[source]
text()[source]
intent.utils.token.morpheme_tokenizer(st)[source]

Tokenize a string, splitting it on typical morpheme boundaries: [ - . : = ( ) ]

Parameters:st
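Splitting on those boundary characters can be sketched with `re.split`. The function below returns plain strings under the hypothetical name `split_morphemes`; the real tokenizer presumably yields token objects with span information, which this sketch does not attempt to reproduce.

```python
import re


def split_morphemes(st):
    """Split st on the characters - . : = ( ) and drop empty pieces."""
    return [piece for piece in re.split(r"[-.:=()]+", st) if piece]
```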

intent.utils.token.sentence_tokenizer(st)[source]
intent.utils.token.tag_tokenizer(st, delimeter='/')[source]
intent.utils.token.tokenize_item(it, tokenizer=<function whitespace_tokenizer>)[source]

Return type:__generator[Token]

intent.utils.token.tokenize_string(st, tokenizer=<function whitespace_tokenizer>)[source]

Return type:Tokenization

intent.utils.token.whitespace_tokenizer(st)[source]

intent.utils.uniqify module

Created on Oct 23, 2013

@author: rgeorgi

intent.utils.uniqify.uniqify(seq, idfun=None)[source]

Module contents