util — General functions and constants

The util module defines general functions and constants used throughout the library.

pylangacq.util.clean_utterance(utterance, phon=False)

Remove the CHAT-style annotations from utterance.

Parameters:

utterance : str

The utterance to clean

phon : bool, optional

Whether the utterance comes from PhonBank data; defaults to False. If True, words like “xxx” and “yyy” are not removed.

Returns:

str

pylangacq.util.clean_word(word)

Clean the word.

Parameters:

word : str

Returns:

str

pylangacq.util.convert_date_to_tuple(date_str)

Convert date_str to (year, month, day).

Parameters:

date_str : str

Returns:

(int, int, int)

Examples

>>> convert_date_to_tuple('01-FEB-2016')
(2016, 2, 1)

pylangacq.util.find_indices(longstr, substring)

Find the indices of all non-overlapping occurrences of substring in longstr.

Parameters:

longstr : str

substring : str

Returns:

list of int

Indices in longstr at which substring occurs
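
The non-overlapping search described for find_indices can be illustrated with a minimal sketch. This is not the library's implementation: the name find_indices_sketch is hypothetical, and returning an empty list for an empty substring is an assumption made here, since this page does not say how that case is handled.

```python
def find_indices_sketch(longstr, substring):
    """Return the start indices of non-overlapping occurrences of substring.

    Hypothetical stand-in for illustration; not pylangacq's own code.
    """
    if not substring:
        # Assumption for this sketch: an empty substring yields no matches.
        return []
    indices = []
    start = longstr.find(substring)
    while start != -1:
        indices.append(start)
        # Resume after the end of the current match so matches never overlap.
        start = longstr.find(substring, start + len(substring))
    return indices


print(find_indices_sketch("abcabcabc", "abc"))  # [0, 3, 6]
print(find_indices_sketch("aaaa", "aa"))        # [0, 2]
```

The second call shows what "non-overlapping" means in this sketch: the match starting at index 1 is skipped because it overlaps the match at index 0.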

pylangacq.util.get_lemma_from_mor(mor)

Extract lemma from mor.

Parameters:

mor : tuple(str, str, str)

Returns:

str

pylangacq.util.get_participant_code(tier_marker_seq)

Return the participant code from a tier marker set.

Parameters:

tier_marker_seq : iterable of str

A collection of tier markers, e.g., {'CHI', '%mor', '%gra'}

Returns:

str or None

A participant code, e.g., 'CHI', or None if no participant code is found.
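
One plausible way to pick the participant code out of a set of tier markers is sketched below. It assumes that dependent tiers are exactly the markers beginning with '%' (as in '%mor' and '%gra' above) and that anything else is a participant code. The name participant_code_sketch is hypothetical, and the library's actual logic may differ.

```python
def participant_code_sketch(tier_markers):
    """Return the first marker that does not look like a dependent tier.

    Hypothetical stand-in for illustration; not pylangacq's own code.
    """
    for marker in tier_markers:
        # Assumption: dependent tiers such as '%mor' start with '%'.
        if not marker.startswith("%"):
            return marker
    # No participant code among the markers.
    return None


print(participant_code_sketch({"CHI", "%mor", "%gra"}))  # CHI
print(participant_code_sketch(["%mor", "%gra"]))         # None
```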

pylangacq.util.remove_extra_spaces(inputstr)

Remove extra spaces in inputstr so that there are only single spaces.

Parameters:

inputstr : str

Returns:

str
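
Collapsing runs of spaces into single spaces can be sketched with a regular expression. The function remove_extra_spaces_sketch is a hypothetical stand-in, not the library's code. It handles only the space character and leaves a single leading or trailing space in place; whether the library also strips the ends of the string is not stated on this page.

```python
import re


def remove_extra_spaces_sketch(inputstr):
    """Replace each run of two or more spaces with a single space.

    Hypothetical stand-in for illustration; not pylangacq's own code.
    """
    return re.sub(r" {2,}", " ", inputstr)


print(remove_extra_spaces_sketch("more   cookie  ."))  # more cookie .
```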