util — General functions and constants

The util module defines general functions and constants used throughout the library.

pylangacq.util.clean_utterance(utterance, phon=False)

Filter away the CHAT-style annotations in utterance.

Parameters
utterance : str

The utterance to clean.

phon : bool, optional

Whether the data comes from PhonBank; defaults to False. If True, words like “xxx” and “yyy” are not removed.

Returns
str
pylangacq.util.clean_word(word)

Clean the word.

Parameters
word : str
Returns
str
pylangacq.util.convert_date_to_tuple(date_str)

Convert date_str to (year, month, day).

Parameters
date_str : str
Returns
(int, int, int)

Examples

>>> convert_date_to_tuple('01-FEB-2016')
(2016, 2, 1)
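
As a rough illustration of the conversion shown in the example above, here is a minimal standalone sketch. It is not pylangacq's implementation: the name date_to_tuple_sketch and the MONTHS table are hypothetical, and the code assumes the input always has the DD-MMM-YYYY form with an English three-letter month abbreviation.

```python
# Hypothetical stand-in for illustration only; not the library's own code.
MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def date_to_tuple_sketch(date_str):
    """Turn a 'DD-MMM-YYYY' string into (year, month, day)."""
    # Assumption: exactly three hyphen-separated fields.
    day, month, year = date_str.split("-")
    return int(year), MONTHS[month.upper()], int(day)


print(date_to_tuple_sketch("01-FEB-2016"))  # (2016, 2, 1)
```

How the real function treats malformed or missing dates is not stated here, so the sketch simply lets a ValueError or KeyError propagate.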
pylangacq.util.find_indices(longstr, substring)

Find the start indices of all non-overlapping occurrences of substring in longstr.

Parameters
longstr : str
substring : str
Returns
list of int

Indices in longstr at which substring occurs
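
To make "non-overlapping" concrete, the following standalone sketch shows one way such a search could work. It is an assumption-laden illustration, not the library's code: find_indices_sketch is a hypothetical name, and returning an empty list for an empty substring is a choice made here, not documented behavior.

```python
def find_indices_sketch(longstr, substring):
    """Collect start indices of non-overlapping matches, left to right."""
    indices = []
    start = 0
    # Assumption: an empty substring yields no matches.
    while substring:
        idx = longstr.find(substring, start)
        if idx == -1:
            break
        indices.append(idx)
        # Resume after the end of this match so matches cannot overlap.
        start = idx + len(substring)
    return indices


print(find_indices_sketch("abababa", "aba"))  # [0, 4]
```

In "abababa" the substring "aba" also begins at index 2, but that occurrence overlaps the match at index 0 and is therefore skipped.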

pylangacq.util.get_lemma_from_mor(mor)

Extract lemma from mor.

Parameters
mor : tuple(str, str, str)
Returns
str
pylangacq.util.get_participant_code(tier_marker_seq)

Return the participant code from a collection of tier markers.

Parameters
tier_marker_seq : iterable of str

An iterable of tier markers, e.g., {'CHI', '%mor', '%gra'}

Returns
str

A participant code, e.g., 'CHI'. Returns None if no participant code is found.
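
The idea can be sketched as follows, assuming that dependent tiers are exactly those whose marker starts with '%' and that any other marker is a participant code. The name participant_code_sketch is hypothetical, and the library's actual logic may differ.

```python
def participant_code_sketch(tier_markers):
    """Return the first marker that is not a dependent ('%...') tier."""
    for marker in tier_markers:
        # Assumption: dependent tiers such as '%mor' start with '%'.
        if not marker.startswith("%"):
            return marker
    return None


print(participant_code_sketch({"CHI", "%mor", "%gra"}))  # CHI
print(participant_code_sketch(["%mor", "%gra"]))  # None
```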

pylangacq.util.get_time_marker(utterance)

Get the time marker in this utterance.

A time marker provides the start and end times (in milliseconds) for a segment in a digitized video or audio file. For example:

·0_1073·

‘·’ is ASCII CODE 21 (0x15), for NAK (Negative Acknowledgement)

Parameters
utterance : str

The raw utterance

Returns
tuple of (int, int)

The start and end times (in milliseconds) for this utterance

Notes

If the option “multiple” is selected in the @Options field, then these ‘·’ bullets may also occur within utterances. However, this function returns only one time marker. (See https://talkbank.org/manuals/CHAT.pdf)
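
A minimal sketch of extracting such a marker with a regular expression is given below. It is not the library's implementation: time_marker_sketch and TIME_MARKER are hypothetical names, the code assumes the bullet is the single character 0x15 on both sides of start_end, and returning None when no marker is present is an assumption rather than documented behavior.

```python
import re

# Assumption: a marker looks like \x15<start>_<end>\x15 with integer times.
TIME_MARKER = re.compile(r"\x15(\d+)_(\d+)\x15")


def time_marker_sketch(utterance):
    """Return (start_ms, end_ms) for the first marker found, else None."""
    match = TIME_MARKER.search(utterance)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(time_marker_sketch("more cookie . \x150_1073\x15"))  # (0, 1073)
```

Because the sketch uses search, it picks up only the first marker, which mirrors the single-marker behavior described in the note above.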

pylangacq.util.remove_extra_spaces(inputstr)

Remove extra spaces in inputstr so that there are only single spaces.

Parameters
inputstr : str
Returns
str
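
One possible way to collapse runs of spaces is sketched below. The name remove_extra_spaces_sketch is hypothetical, and the code is an illustration only: it treats just the space character (not tabs or newlines) and leaves a single leading or trailing space in place, either of which may differ from what the library does.

```python
import re


def remove_extra_spaces_sketch(inputstr):
    """Replace each run of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", inputstr)


print(remove_extra_spaces_sketch("more   cookie  ."))  # more cookie .
```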