Library reference
-
class pylangacq.Reader(*filenames, **kwargs)
A class for reading multiple CHAT files.
- Parameters
- filenames : str or iterable of str, optional
One or more filenames. A filename may match exactly one CHAT file (e.g., 'eve01.cha') or match multiple files by glob patterns (e.g., 'eve*.cha' for 'eve01.cha', 'eve02.cha', etc.). * matches any number (including zero) of characters, while ? matches exactly one character. A filename can be either an absolute or relative path. If no filenames are provided, an empty Reader instance is created.
- kwargs
Only the keyword encoding is recognized, which defaults to 'utf8'. (New in version 0.9)
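The wildcard rules for filenames can be tried out with the standard library's fnmatch module, which uses the same meaning for * and ?. This is only a sketch of the pattern semantics; it does not show how Reader resolves patterns internally, and the file names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only for illustration.
candidates = ["eve.cha", "eve01.cha", "eve02.cha", "eve1.cha", "adam01.cha"]


def match(pattern, names):
    """Return the names matched by a glob-style pattern, in input order."""
    return [name for name in names if fnmatchcase(name, pattern)]


# '*' matches any number of characters, including zero.
star_matches = match("eve*.cha", candidates)
# '?' matches exactly one character.
question_matches = match("eve?.cha", candidates)

print(star_matches)
print(question_matches)
```

Note that 'eve*.cha' also picks up 'eve.cha', because * may match zero characters, while 'eve?.cha' only accepts names with exactly one character between 'eve' and '.cha'.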
Methods
- IPSyn(self[, participant]): Return a map from a file path to the file’s IPSyn.
- MLU(self[, participant]): Return a map from a file path to the file’s MLU by morphemes.
- MLUm(self[, participant]): Return a map from a file path to the file’s MLU by morphemes.
- MLUw(self[, participant]): Return a map from a file path to the file’s MLU by words.
- TTR(self[, participant]): Return a map from a file path to the file’s TTR.
- abspath(self, basename): Return the absolute path of basename.
- add(self, *filenames): Add one or more CHAT filenames to the current reader.
- age(self[, participant, months]): Return a map from a file path to the participant’s age.
- clear(self): Clear everything and reset as an empty Reader instance.
- concordance(self, search_item[, …]): Return a list of utterances with search_item for participant.
- date_of_birth(self): Return a map from a file path to the date of birth.
- dates_of_recording(self): Return a map from a file path to the date of recording.
- filenames(self[, sorted_by_age]): Return the set of absolute-path filenames.
- from_chat_files(*filenames, **kwargs): Create a Reader object with CHAT data files.
- from_chat_str(chat_str[, encoding]): Create a Reader object with CHAT data as a string.
- headers(self): Return a dict mapping a file path to the headers of that file.
- index_to_tiers(self): Return a dict mapping a file path to the file’s index_to_tiers dict.
- languages(self): Return a map from a file path to the languages used.
- number_of_files(self): Return the number of files.
- number_of_utterances(self[, participant, …]): Return the number of utterances for participant in all files.
- part_of_speech_tags(self[, participant, …]): Return the part-of-speech tags in the data for participant.
- participant_codes(self[, by_files]): Return the participant codes (e.g., {'CHI', 'MOT'}).
- participants(self): Return a dict mapping a file path to the file’s participant info.
- remove(self, *filenames): Remove one or more CHAT filenames from the current reader.
- search(self, search_item[, participant, …]): Return a list of elements containing search_item by participant.
- sents(self[, participant, exclude, by_files]): Return a list of sents by participant in all files.
- tagged_sents(self[, participant, exclude, …]): Return a list of tagged sents by participant in all files.
- tagged_words(self[, participant, exclude, …]): Return a list of tagged words by participant in all files.
- update(self, reader): Combine the current CHAT Reader instance with reader.
- utterances(self[, participant, exclude, …]): Return a list of (participant, utterance) pairs from all files.
- word_frequency(self[, participant, exclude, …]): Return a word frequency counter for participant in all files.
- word_ngrams(self, n[, participant, exclude, …]): Return a word n-gram counter by participant in all files.
- words(self[, participant, exclude, by_files]): Return a list of words by participant in all files.
-
IPSyn(self, participant='CHI')
Return a map from a file path to the file’s IPSyn.
IPSyn = index of productive syntax.
- Parameters
- participant : str, optional
The specified participant (defaults to 'CHI').
- Returns
- dict(str: int)
-
MLU(self, participant='CHI')
Return a map from a file path to the file’s MLU by morphemes.
MLU = mean length of utterance. This method is identical to MLUm.
- Parameters
- participant : str, optional
The specified participant (defaults to 'CHI').
- Returns
- dict(str: float)
-
MLUm(self, participant='CHI')
Return a map from a file path to the file’s MLU by morphemes.
MLU = mean length of utterance. This method is identical to MLU.
- Parameters
- participant : str, optional
The specified participant (defaults to 'CHI').
- Returns
- dict(str: float)
-
MLUw(self, participant='CHI')
Return a map from a file path to the file’s MLU by words.
MLU = mean length of utterance.
- Parameters
- participant : str, optional
The specified participant (defaults to 'CHI').
- Returns
- dict(str: float)
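As a rough illustration of what "MLU by words" measures, the sketch below averages the number of words per utterance over a small hand-made sample. The helper name mean_length_of_utterance and the sample data are hypothetical, and the library may apply additional filtering that is not modeled here.

```python
def mean_length_of_utterance(utterances):
    """Average number of words per utterance; 0.0 if there are no utterances."""
    if not utterances:
        return 0.0
    return sum(len(words) for words in utterances) / len(utterances)


# Hypothetical child utterances, each given as a list of words.
sample_sents = [["more", "cookie"], ["want", "more", "juice"], ["no"]]

# (2 + 3 + 1) words over 3 utterances.
mluw = mean_length_of_utterance(sample_sents)
print(mluw)
```

MLU by morphemes follows the same idea but counts morphemes rather than words in each utterance.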
-
TTR(self, participant='CHI')
Return a map from a file path to the file’s TTR.
TTR = type-token ratio.
- Parameters
- participant : str, optional
The specified participant (defaults to 'CHI').
- Returns
- dict(str: float)
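The type-token ratio is the number of distinct word types divided by the number of word tokens. A minimal sketch of that definition follows; the helper name type_token_ratio and the sample words are hypothetical, and the library's own computation may differ in detail (for example in which tokens it counts).

```python
def type_token_ratio(words):
    """Number of distinct words divided by the total number of words."""
    if not words:
        return 0.0
    return len(set(words)) / len(words)


# Hypothetical word tokens: 3 distinct types among 4 tokens.
sample_words = ["more", "cookie", "more", "juice"]

ttr = type_token_ratio(sample_words)
print(ttr)
```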
-
abspath(self, basename)
Return the absolute path of basename.
- Parameters
- basename : str
The basename (e.g., “foobar.cha”) of the desired data file.
- Returns
- str
-
add(self, *filenames)
Add one or more CHAT filenames to the current reader.
- Parameters
- *filenames
Filenames may take glob patterns with wildcards * and ?.
-
age(self, participant='CHI', months=False)
Return a map from a file path to the participant’s age.
By default, the age is in the form of (years, months, days).
- Parameters
- participant : str, optional
The specified participant (defaults to 'CHI').
- months : bool, optional
If True, the age is given in months instead.
- Returns
- dict(str: tuple(int, int, int)) or dict(str: float)
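The two return shapes of age can be related by converting a (years, months, days) tuple into a single number of months. The sketch below assumes a 30-day month for the day component; that convention, the helper name age_in_months, and the sample age are assumptions for illustration, not a statement of how the library computes the value.

```python
def age_in_months(age, days_per_month=30):
    """Convert a (years, months, days) tuple into months as a float.

    The 30-day month is an assumed convention for this sketch.
    """
    years, months, days = age
    return years * 12 + months + days / days_per_month


# Hypothetical age of 1 year, 6 months, 15 days.
months_value = age_in_months((1, 6, 15))
print(months_value)
```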
-
concordance(self, search_item, participant=None, exclude=None, match_entire_word=True, lemma=False, by_files=False)
Return a list of utterances with search_item for participant.
All strings are aligned on search_item by space padding to create the word concordance effect.
- Parameters
- search_item : str
Word or lemma to search for.
- match_entire_word : bool, optional
If False (default: True), substring matching is performed.
- lemma : bool, optional
If True (default: False), search_item refers to the lemma (from “mor” in the tagged word) instead.
- participant : str or iterable of str, optional
Participants of interest. If unspecified or None, all participants are included.
- exclude : str or iterable of str, optional
Participants to exclude. If unspecified or None, no participants are excluded.
- by_files : bool, optional
If True, return dict(absolute-path filename: X for that file) instead of X for all files altogether.
- Returns
- list, or dict(str: list)
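The "word concordance effect" means that every returned string is padded on the left so that the search item starts in the same column. The sketch below shows one way to produce that alignment for whole-word matches; the helper name align_concordance and the sample utterances are hypothetical, and the exact padding used by the library may differ.

```python
def align_concordance(utterances, search_item):
    """Left-pad matching utterances so search_item starts in the same column."""
    hits = []
    for words in utterances:
        if search_item in words:
            idx = words.index(search_item)  # first occurrence only
            left = " ".join(words[:idx])
            rest = " ".join(words[idx:])
            hits.append((left, rest))
    # Width of the longest left context decides the amount of padding.
    width = max((len(left) for left, _ in hits), default=0)
    return [f"{left:>{width}} {rest}" for left, rest in hits]


# Hypothetical utterances, each given as a list of words.
sample = [["I", "want", "cookie"], ["cookie"], ["more", "cookie", "please"], ["no"]]

lines = align_concordance(sample, "cookie")
for line in lines:
    print(line)
```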
-
date_of_birth(self)
Return a map from a file path to the date of birth.
- Returns
- dict(str: dict(str: tuple(int, int, int)))
-
dates_of_recording(self)
Return a map from a file path to the date of recording.
The date of recording is in the form of (year, month, day).
- Returns
- dict(str: list(tuple(int, int, int)))
-
filenames(self, sorted_by_age=False)
Return the set of absolute-path filenames.
- Parameters
- sorted_by_age : bool, optional
Whether to return the filenames as a list sorted by the target child’s age.
- Returns
- set of str or list of str
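A set has no order, which is why sorted_by_age switches the return type to a list. The sketch below shows how such an ordering could be derived from a mapping of file paths to (years, months, days) ages, such as the one age returns. The paths and ages are hypothetical, and this is not necessarily how the library sorts internally.

```python
# Hypothetical mapping from absolute file paths to (years, months, days) ages.
ages = {
    "/data/eve03.cha": (1, 8, 0),
    "/data/eve01.cha": (1, 6, 0),
    "/data/eve02.cha": (1, 6, 15),
}

# Unordered view of the file names.
unsorted_paths = set(ages)

# Tuples compare element by element, so sorting by the age tuple
# orders the files from youngest to oldest.
sorted_paths = sorted(ages, key=ages.get)
print(sorted_paths)
```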
-
classmethod from_chat_files(*filenames, **kwargs)
Create a Reader object with CHAT data files.
- Parameters
- filenames : str or iterable of str, optional
One or more filenames. A filename may match exactly one CHAT file (e.g., 'eve01.cha') or match multiple files by glob patterns (e.g., 'eve*.cha' for 'eve01.cha', 'eve02.cha', etc.). * matches any number (including zero) of characters, while ? matches exactly one character. A filename can be either an absolute or relative path. If no filenames are provided, an empty Reader instance is created.
- kwargs
Only the keyword encoding is recognized, which defaults to 'utf8'. (New in version 0.9)
- Returns
- Reader
Notes
Because CHAT data most likely comes as files on disk, an equivalent library top-level function pylangacq.read_chat is defined for convenience.
-
classmethod from_chat_str(chat_str, encoding='utf8')
Create a Reader object with CHAT data as a string.
- Parameters
- chat_str : str
CHAT data as an in-memory string, i.e., what a single CHAT data file would contain.
- encoding : str, optional
Encoding of the CHAT data (defaults to 'utf8').
- Returns
- Reader
-
headers(self)
Return a dict mapping a file path to the headers of that file.
- Returns
- dict(str: dict)
-
index_to_tiers(self)
Return a dict mapping a file path to the file’s index_to_tiers dict.
- Returns
- dict(str: dict)
-
languages(self)
Return a map from a file path to the languages used.
- Returns
- dict(str: list(str))
-
number_of_utterances(self, participant=None, exclude=None, by_files=False)
Return the number of utterances for participant in all files.
- Parameters
- participant : str or iterable of str, optional
Participants of interest. If unspecified or None, all participants are included.
- exclude : str or iterable of str, optional
Participants to exclude. If unspecified or None, no participants are excluded.
- by_files : bool, optional
If True, return dict(absolute-path filename: X for that file) instead of X for all files altogether.
- Returns
- int or dict(str: int)
-
part_of_speech_tags(self[, participant, …])
Return the part-of-speech tags in the data for participant.
- Parameters
- participant : str or iterable of str, optional
Participants of interest. If unspecified or None, all participants are included.
- exclude : str or iterable of str, optional
Participants to exclude. If unspecified or None, no participants are excluded.
- by_files : bool, optional
If True, return dict(absolute-path filename: X for that file) instead of X for all files altogether.
- Returns
- set or dict(str: set)
-
participant_codes(self, by_files=False)
Return the participant codes (e.g., {'CHI', 'MOT'}).
- Parameters
- by_files : bool, optional
If True, return dict(absolute-path filename: X for that file) instead of X for all files altogether.
- Returns
- set(str) or dict(str: set(str))
-
participants(self)
Return a dict mapping a file path to the file’s participant info.
- Returns
- dict(str: dict)
-
remove(self, *filenames)
Remove one or more CHAT filenames from the current reader.
- Parameters
- *filenames
Filenames may take glob patterns with wildcards * and ?.
-
search(self, search_item, participant=None, exclude=None, match_entire_word=True, lemma=False, output_tagged=True, output_sents=True, by_files=False)
Return a list of elements containing search_item by participant.
- Parameters
- search_item : str
Word or lemma to search for.
- match_entire_word : bool, optional
Whether to match the entire word. If False, substring matching is performed.
- lemma : bool, optional
Whether search_item refers to the lemma (from “mor” in the tagged word) instead.
- output_tagged : bool, optional
Whether a word in the return object is a tagged word, i.e., a (word, pos, mor, rel) tuple; otherwise it is just a word string.
- output_sents : bool, optional
Whether each element in the return object is a list for each utterance; otherwise each element is a word (tagged or untagged) without the utterance structure.
- participant : str or iterable of str, optional
Participants of interest. If unspecified or None, all participants are included.
- exclude : str or iterable of str, optional
Participants to exclude. If unspecified or None, no participants are excluded.
- by_files : bool, optional
If True, return dict(absolute-path filename: X for that file) instead of X for all files altogether.
- Returns
- list or dict(str: list)
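The interaction between match_entire_word and output_sents can be sketched on plain (untagged) word lists. The helper name search_words and the sample data are hypothetical; tagged words and lemma lookup are deliberately left out of this simplified model.

```python
def search_words(sents, search_item, match_entire_word=True, output_sents=True):
    """Find search_item in a list of utterances (each a list of word strings)."""

    def is_hit(word):
        # Whole-word comparison, or substring matching when requested.
        return word == search_item if match_entire_word else search_item in word

    if output_sents:
        # Keep the utterance structure: return every utterance with a hit.
        return [sent for sent in sents if any(is_hit(w) for w in sent)]
    # Otherwise return only the matching words, flattened.
    return [w for sent in sents for w in sent if is_hit(w)]


# Hypothetical utterances.
sample = [["more", "cookies"], ["cookie", "please"], ["no"]]

whole_word_sents = search_words(sample, "cookie")
substring_words = search_words(
    sample, "cookie", match_entire_word=False, output_sents=False
)
print(whole_word_sents)
print(substring_words)
```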
-
sents(self, participant=None, exclude=None, by_files=False)
Return a list of sents by participant in all files.
- Parameters
- participant : str or iterable of str, optional
Participants of interest. If unspecified or None, all participants are included.
- exclude : str or iterable of str, optional
Participants to exclude. If unspecified or None, no participants are excluded.
- by_files : bool, optional
If True, return dict(absolute-path filename: X for that file) instead of X for all files altogether.
- Returns
- list(list(str)) or dict(str: list(list(str)))
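Many methods share the participant, exclude, and by_files parameters. The sketch below models their documented behavior on a small in-memory structure: participant narrows the selection, exclude removes codes, and by_files switches between one combined list and a dict keyed by file path. The function select, the data layout, and the file paths are hypothetical and only illustrate the parameter semantics.

```python
def select(utterances_by_file, participant=None, exclude=None, by_files=False):
    """Filter {path: [(code, words), ...]} by participant codes."""

    def as_set(value):
        # Accept None, a single code, or an iterable of codes.
        if value is None:
            return None
        return {value} if isinstance(value, str) else set(value)

    include = as_set(participant)    # None means "all participants"
    skip = as_set(exclude) or set()  # None means "exclude nobody"

    per_file = {
        path: [
            words
            for code, words in utterances
            if (include is None or code in include) and code not in skip
        ]
        for path, utterances in utterances_by_file.items()
    }
    if by_files:
        return per_file
    # Combine all files into one list, in sorted path order.
    return [words for path in sorted(per_file) for words in per_file[path]]


# Hypothetical data for two files.
data = {
    "/data/a.cha": [("CHI", ["more", "cookie"]), ("MOT", ["here", "you", "go"])],
    "/data/b.cha": [("CHI", ["no"]), ("FAT", ["okay"])],
}

chi_only = select(data, participant="CHI")
without_mot = select(data, exclude="MOT", by_files=True)
print(chi_only)
print(without_mot)
```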
-
tagged_sents(self, participant=None, exclude=None, by_files=False)
Return a list of tagged sents by participant in all files.
- Parameters
- participant : str or iterable of str, optional
Participants of interest. If unspecified or None, all participants are included.
- exclude : str or iterable of str, optional
Participants to exclude. If unspecified or None, no participants are excluded.
- by_files : bool, optional
If True, return dict(absolute-path filename: X for that file) instead of X for all files altogether.
- Returns
- list(list(tuple)) or dict(str: list(list(tuple)))
-
tagged_words(self, participant=None, exclude=None, by_files=False)
Return a list of tagged words by participant in all files.
- Parameters
- participant : str or iterable of str, optional
Participants of interest. If unspecified or None, all participants are included.
- exclude : str or iterable of str, optional
Participants to exclude. If unspecified or None, no participants are excluded.
- by_files : bool, optional
If True, return dict(absolute-path filename: X for that file) instead of X for all files altogether.
- Returns
- list(tuple) or dict(str: list(tuple))
-
update(self, reader)
Combine the current CHAT Reader instance with reader.
- Parameters
- reader : Reader
-
utterances(self, participant=None, exclude=None, clean=True, by_files=False)
Return a list of (participant, utterance) pairs from all files.
- Parameters
- clean : bool, optional
Whether to filter away the CHAT annotations in the utterance.
- participant : str or iterable of str, optional
Participants of interest. If unspecified or None, all participants are included.
- exclude : str or iterable of str, optional
Participants to exclude. If unspecified or None, no participants are excluded.
- by_files : bool, optional
If True, return dict(absolute-path filename: X for that file) instead of X for all files altogether.
- Returns
- list(tuple(str, str)) or dict(str: list(tuple(str, str)))
-
word_frequency(self, participant=None, exclude=None, keep_case=True, by_files=False)
Return a word frequency counter for participant in all files.
- Parameters
- participant : str or iterable of str, optional
Participants of interest. If unspecified or None, all participants are included.
- exclude : str or iterable of str, optional
Participants to exclude. If unspecified or None, no participants are excluded.
- by_files : bool, optional
If True, return dict(absolute-path filename: X for that file) instead of X for all files altogether.
- keep_case : bool, optional
If True (the default), case distinctions are kept, e.g., word tokens like “the” and “The” are treated as distinct. If False, all word tokens are forced to be in lowercase.
- Returns
- Counter, or dict(str: Counter)
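The effect of keep_case can be reproduced with collections.Counter on a plain word list. The helper name count_words and the sample words are hypothetical; this is a sketch of the documented behavior rather than the library's code.

```python
from collections import Counter


def count_words(words, keep_case=True):
    """Count word tokens, optionally folding everything to lowercase first."""
    if not keep_case:
        words = [word.lower() for word in words]
    return Counter(words)


# Hypothetical word tokens.
sample_words = ["The", "dog", "the", "cat"]

cased = count_words(sample_words)                      # "The" and "the" stay distinct
lowered = count_words(sample_words, keep_case=False)   # both count as "the"
print(cased)
print(lowered)
```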
-
word_ngrams(self, n, participant=None, exclude=None, keep_case=True, by_files=False)
Return a word n-gram counter by participant in all files.
- Parameters
- n : int
The length of each n-gram (e.g., 2 for bigrams).
- participant : str or iterable of str, optional
Participants of interest. If unspecified or None, all participants are included.
- exclude : str or iterable of str, optional
Participants to exclude. If unspecified or None, no participants are excluded.
- by_files : bool, optional
If True, return dict(absolute-path filename: X for that file) instead of X for all files altogether.
- keep_case : bool, optional
If True (the default), case distinctions are kept, e.g., word tokens like “the” and “The” are treated as distinct. If False, all word tokens are forced to be in lowercase.
- Returns
- Counter, or dict(str: Counter)
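A word n-gram counter maps each tuple of n consecutive words to its frequency. The sketch below counts n-grams within each utterance separately; treating utterance boundaries this way is an assumption of the sketch, and the helper name count_ngrams and the sample data are hypothetical.

```python
from collections import Counter


def count_ngrams(sents, n, keep_case=True):
    """Count n-grams (as tuples of words) within each utterance."""
    counts = Counter()
    for sent in sents:
        if not keep_case:
            sent = [word.lower() for word in sent]
        # zip over n shifted copies of the utterance yields its n-grams;
        # utterances shorter than n contribute nothing.
        counts.update(zip(*(sent[i:] for i in range(n))))
    return counts


# Hypothetical utterances.
sample = [["more", "cookie", "please"], ["More", "cookie"]]

bigrams = count_ngrams(sample, 2, keep_case=False)
print(bigrams)
```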
-
words(self, participant=None, exclude=None, by_files=False)
Return a list of words by participant in all files.
- Parameters
- participant : str or iterable of str, optional
Participants of interest. If unspecified or None, all participants are included.
- exclude : str or iterable of str, optional
Participants to exclude. If unspecified or None, no participants are excluded.
- by_files : bool, optional
If True, return dict(absolute-path filename: X for that file) instead of X for all files altogether.
- Returns
- list(str) or dict(str: list(str))
-
pylangacq.read_chat(*filenames, **kwargs)
Create a Reader object with CHAT data files.
- Parameters
- filenames : str or iterable of str, optional
One or more filenames. A filename may match exactly one CHAT file (e.g., 'eve01.cha') or match multiple files by glob patterns (e.g., 'eve*.cha' for 'eve01.cha', 'eve02.cha', etc.). * matches any number (including zero) of characters, while ? matches exactly one character. A filename can be either an absolute or relative path. If no filenames are provided, an empty Reader instance is created.
- kwargs
Only the keyword encoding is recognized, which defaults to 'utf8'. (New in version 0.9)
- Returns
- Reader
Modules: