Accessing metadata

A Reader object has an array of methods for accessing metadata (from the headers, i.e. the lines beginning with @) and data (the transcriptions on lines beginning with * and the dependent tiers on lines beginning with %). This page introduces the metadata methods. For data methods, see Transcriptions and annotations. For details of the Reader class, see The Reader class API.

Warning

If you are running a script on Windows, be sure to put all your code under the scope of if __name__ == '__main__':. (PyLangAcq uses the multiprocessing module to read data files.)

Metadata methods for handling information from the @ headers:

Method               Return object
participant_codes()  set of participant codes across all files; if by_files is True, then dict(filename: set of participant codes) instead
participants()       dict(filename: dict(participant code: dict of the @ID information for that participant))
age()                dict(filename: tuple of (years, months, days))
languages()          dict(filename: list of languages based on the @Languages header)
date()               dict(filename: tuple of (year, month, day))
headers()            dict(filename: dict(header name: the content of that header))

Among these methods, only participant_codes() has the optional parameter by_files.
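
The relation between the two return shapes of participant_codes() can be sketched in plain Python. This is only an illustration: the dict below is stand-in data shaped like the by_files=True result (the filenames and the second file's codes are hypothetical), and merge_codes is an illustrative helper, not part of PyLangAcq.

```python
def merge_codes(codes_by_file):
    """Union the per-file sets of participant codes into a single set."""
    merged = set()
    for codes in codes_by_file.values():
        merged |= codes
    return merged


# Stand-in for eve.participant_codes(by_files=True); the filenames and the
# codes listed for the second file are hypothetical.
codes_by_file = {
    '/data/Brown/Eve/eve01.cha': {'CHI', 'MOT', 'COL', 'RIC'},
    '/data/Brown/Eve/eve02.cha': {'CHI', 'MOT', 'FAT'},
}

all_codes = merge_codes(codes_by_file)
print(sorted(all_codes))  # sorted only to make the printed order stable
```

The per-file dict is the more general shape: the cross-file set can always be rebuilt from it, but not the other way around.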

To follow the metadata access methods, it helps to know what the headers (the lines beginning with @) look like in a CHAT transcript such as eve01.cha:

@UTF8
@PID:       11312/c-00034743-1
@Begin
@Languages: eng
@Participants:      CHI Eve Target_Child , MOT Sue Mother , COL Colin Investigator , RIC Richard Investigator
@ID:        eng|Brown|CHI|1;6.|female|||Target_Child|||
@ID:        eng|Brown|MOT|||||Mother|||
@ID:        eng|Brown|COL|||||Investigator|||
@ID:        eng|Brown|RIC|||||Investigator|||
@Date:      15-OCT-1962
@Time Duration:     11:30-12:00
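
To make the link between an @ID line and the per-participant dict returned by participants() concrete, here is a minimal sketch of splitting one @ID line into named fields. The field order is inferred from the sample headers above and the key names used by participants(); parse_id_line and ID_FIELDS are hypothetical names for illustration and do not reflect how PyLangAcq parses headers internally.

```python
# Assumed field order of an @ID line, using the key names that
# participants() reports. 'code' is the participant code (e.g. CHI).
ID_FIELDS = ('language', 'corpus', 'code', 'age', 'sex', 'group',
             'SES', 'participant_role', 'education', 'custom')


def parse_id_line(line):
    """Split one '@ID:' header line into a dict of its ten fields."""
    _, _, content = line.partition(':')
    values = content.strip().split('|')
    # The line ends with '|', so split() yields a trailing empty string;
    # zip() stops after the ten named fields and ignores it.
    return dict(zip(ID_FIELDS, values))


info = parse_id_line('@ID:        eng|Brown|CHI|1;6.|female|||Target_Child|||')
print(info['code'], info['age'], info['sex'], info['participant_role'])
```

The participant's name (Eve, Sue, and so on) does not appear in the @ID line; it comes from the @Participants header.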

Using the metadata access methods:

>>> from pprint import pprint
>>> import pylangacq as pla
>>> eve = pla.read_chat('Brown/Eve/*.cha')
>>> eve01_filename = eve.find_filename('eve01.cha')  # absolute-path filename of eve01.cha
>>> eve.participant_codes()  # across all 20 files
{'RIC', 'COL', 'URS', 'FAT', 'GLO', 'CHI', 'MOT'}
>>> eve.participant_codes(by_files=True)[eve01_filename]  # only for eve01.cha
{'COL', 'CHI', 'MOT', 'RIC'}
>>>
>>> pprint(eve.participants()[eve01_filename])
{'CHI': {'SES': '',
         'age': '1;6.',
         'corpus': 'Brown',
         'custom': '',
         'education': '',
         'group': '',
         'language': 'eng',
         'participant_name': 'Eve',
         'participant_role': 'Target_Child',
         'sex': 'female'},
 'COL': {'SES': '',
         'age': '',
         'corpus': 'Brown',
         'custom': '',
         'education': '',
         'group': '',
         'language': 'eng',
         'participant_name': 'Colin',
         'participant_role': 'Investigator',
         'sex': ''},
 'MOT': {'SES': '',
         'age': '',
         'corpus': 'Brown',
         'custom': '',
         'education': '',
         'group': '',
         'language': 'eng',
         'participant_name': 'Sue',
         'participant_role': 'Mother',
         'sex': ''},
 'RIC': {'SES': '',
         'age': '',
         'corpus': 'Brown',
         'custom': '',
         'education': '',
         'group': '',
         'language': 'eng',
         'participant_name': 'Richard',
         'participant_role': 'Investigator',
         'sex': ''}}
>>>
>>> eve.age()[eve01_filename]  # defaults to the target child's age; (years, months, days)
(1, 6, 0)
>>> eve.age(month=True)[eve01_filename]  # target child's age in months
18.0
>>> eve.age(participant='MOT')[eve01_filename]  # no age info for MOT
(0, 0, 0)
>>>
>>> eve.languages()[eve01_filename]  # a list, not a set; ordering matters in bi/multilingualism
['eng']
>>>
>>> eve.date()[eve01_filename]  # date of recording
(1962, 10, 15)
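
How an age string such as 1;6. from the @ID header corresponds to the tuple (1, 6, 0) and to 18.0 months can be sketched as follows. This is an illustration under stated assumptions, not PyLangAcq's own code: parse_age and age_in_months are hypothetical helpers, the pattern assumes the years;months.days layout, and counting a month as 30 days is an assumption made only for the conversion.

```python
import re

# Assumed CHAT age layout: years;months.days, where months and days may be empty.
AGE_PATTERN = re.compile(r'^(\d+);(\d*)\.?(\d*)$')


def parse_age(age):
    """Convert an age string such as '1;6.' into (years, months, days)."""
    match = AGE_PATTERN.match(age.strip())
    if match is None:
        return (0, 0, 0)  # same shape as the result shown for a missing age
    years, months, days = match.groups()
    return (int(years), int(months or 0), int(days or 0))


def age_in_months(age_tuple, days_per_month=30):
    """Express (years, months, days) in months; 30 days per month is an assumption."""
    years, months, days = age_tuple
    return years * 12 + months + days / days_per_month


print(parse_age('1;6.'), age_in_months(parse_age('1;6.')))
```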

If the CHAT file has headers that are not covered by the specific built-in methods illustrated above, they are always accessible with headers():

>>> pprint(eve.headers()[eve01_filename])
{'Date': '15-OCT-1962',
 'Languages': 'eng',
 'PID': '11312/c-00034743-1',
 'Participants': {'CHI': {'SES': '',
                          'age': '1;6.',
                          'corpus': 'Brown',
                          'custom': '',
                          'education': '',
                          'group': '',
                          'language': 'eng',
                          'participant_name': 'Eve',
                          'participant_role': 'Target_Child',
                          'sex': 'female'},
                  'COL': {'SES': '',
                          'age': '',
                          'corpus': 'Brown',
                          'custom': '',
                          'education': '',
                          'group': '',
                          'language': 'eng',
                          'participant_name': 'Colin',
                          'participant_role': 'Investigator',
                          'sex': ''},
                  'MOT': {'SES': '',
                          'age': '',
                          'corpus': 'Brown',
                          'custom': '',
                          'education': '',
                          'group': '',
                          'language': 'eng',
                          'participant_name': 'Sue',
                          'participant_role': 'Mother',
                          'sex': ''},
                  'RIC': {'SES': '',
                          'age': '',
                          'corpus': 'Brown',
                          'custom': '',
                          'education': '',
                          'group': '',
                          'language': 'eng',
                          'participant_name': 'Richard',
                          'participant_role': 'Investigator',
                          'sex': ''}},
 'Tape Location': '850',
 'Time Duration': '11:30-12:00',
 'UTF8': ''}
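
Because headers() returns one dict per file, collecting a header that has no dedicated method (for example Tape Location) across files is a plain dictionary lookup. The sketch below uses stand-in data shaped like the headers() result; the filenames and the contents of the second entry are hypothetical, and header_by_file is an illustrative helper, not a PyLangAcq function.

```python
def header_by_file(headers_by_file, name, default=None):
    """Map each filename to the value of one header, or default if absent."""
    return {filename: headers.get(name, default)
            for filename, headers in headers_by_file.items()}


# Stand-in for eve.headers(); the filenames and the second entry are hypothetical.
headers_by_file = {
    '/data/Brown/Eve/eve01.cha': {'Languages': 'eng',
                                  'Tape Location': '850',
                                  'Time Duration': '11:30-12:00'},
    '/data/Brown/Eve/eve02.cha': {'Languages': 'eng'},
}

tape_locations = header_by_file(headers_by_file, 'Tape Location')
for filename, location in sorted(tape_locations.items()):
    print(filename, location)
```

Using dict.get with a default avoids a KeyError for files that lack the header, since not every transcript carries every optional header.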