lingpy.data package
Subpackages
Submodules
lingpy.data.derive module
Module for the derivation of sound class models.
The module provides functions for the customized compilation of sound-class models. All models are defined in simple text files. In order to guarantee their quick access when loading the library, the models are compiled and stored in binary files.
- lingpy.data.derive.compile_dvt(path='')
Function compiles diacritics, vowels, and tones.
Notes
Diacritics, vowels, and tones are defined in the data/models/dv/ directory of the LingPy package and automatically loaded when loading the LingPy library. The values are defined as the constants rcParams['vowels'], rcParams['diacritics'], and rcParams['tones']. Their core purpose is to guide the tokenization of IPA strings (cf. ipa2tokens()). In order to change the variables, one simply has to change the text files diacritics, tones, and vowels in the data/models/dv directory. The structure of these files is fairly simple: each line contains a vowel or a diacritic character, with diacritics preceded by a dash.
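The one-character-per-line format described above can be read with a few lines of Python. The following is only a minimal sketch: the function name parse_dv_lines is hypothetical, the sample characters are made up for illustration, and skipping blank lines is an assumption that the format description does not mention. It is not the code LingPy itself uses.

```python
def parse_dv_lines(lines):
    """Collect the characters listed one per line, dropping a leading dash.

    Hypothetical helper for illustration only; not part of LingPy.
    """
    chars = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # assumption: blank lines carry no information
        if entry.startswith("-") and len(entry) > 1:
            entry = entry[1:]  # diacritics are preceded by a dash
        chars.append(entry)
    return chars


# Made-up sample content for a vowels file and a diacritics file.
vowels = parse_dv_lines(["a", "e", "i"])
diacritics = parse_dv_lines(["-ʰ", "", "-ː\n"])
print(vowels)
print(diacritics)
```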
- lingpy.data.derive.compile_model(model, path=None)
Function compiles customized sound-class models.
- Parameters
model : str
A string indicating the name of the model which shall be created.
path : str
A string indicating the path where the model folder is stored.
Notes
A model is defined by a folder placed in the data/models directory of the LingPy package. The name of the folder reflects the name of the model. It contains three files: the file converter, the file INFO, and the optional file scorer. The format requirements for these files are as follows:
INFO
The INFO file serves as a reference for a given sound-class model. It can contain arbitrary information (and also be empty). If one wants to define specific characteristics, like the source, the compiler, the date, or a description of a given model, this can be done by employing a key-value structure in which the key is preceded by an @ and followed by a colon, and the value is written right next to the key on the same line, e.g.:
@source: Dolgopolsky (1986)
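As a rough illustration of the key-value convention, the sketch below collects every line that starts with @ into a dictionary and ignores everything else. The function name parse_info and the sample text are hypothetical; how LingPy itself reads the file may differ.

```python
def parse_info(text):
    """Collect '@key: value' lines into a dictionary.

    Hypothetical helper for illustration only; not part of LingPy.
    """
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("@") and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, value = line[1:].split(":", 1)
            info[key.strip()] = value.strip()
    return info


# Made-up sample file content; free-form lines are simply skipped.
sample = """Free-form remarks are allowed and ignored by this sketch.
@source: Dolgopolsky (1986)
@date: 2012-03
"""
info = parse_info(sample)
print(info)
```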
This information will then be read from the INFO file and rendered when printing the model to screen with help of the print() function.
converter
The converter file contains all sound classes which are matched with their respective sound values. Each line is reserved for one class, preceded by the key (preferably an ASCII letter) representing the class:
B : ɸ, β, f, p͡f, p͜f, ƀ
E : ɛ, æ, ɜ, ɐ, ʌ, e, ᴇ, ə, ɘ, ɤ, è, é, ē, ě, ê, ɚ
D : θ, ð, ŧ, þ, đ
G : x, ɣ, χ
...
matrix
A scoring matrix indicating the alignment scores of all sound-class characters defined by the model. The matrix is structured as a simple tab-delimited text file. The first cell of each row contains the character name, and the following cells contain the scores in redundant form (with both triangles being filled):
B	10.0	-10.0	5.0	...
E	-10.0	5.0	-10.0	...
F	5.0	-10.0	10.0	...
...
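A tab-delimited matrix of this kind can be turned into a dictionary keyed by pairs of class characters. The sketch below assumes that the columns follow the same order as the rows, which the description above does not state explicitly, and it uses only the 3x3 corner visible in the example. The function name parse_matrix is hypothetical.

```python
def parse_matrix(lines):
    """Turn tab-delimited rows into a {(row_class, column_class): score} dict.

    Hypothetical helper for illustration only; not part of LingPy.
    Assumes the columns are ordered like the rows.
    """
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    names = [row[0] for row in rows]
    scorer = {}
    for row in rows:
        for name, cell in zip(names, row[1:]):
            scorer[row[0], name] = float(cell)
    return scorer


# The visible 3x3 corner of the example, without the elided cells.
lines = [
    "B\t10.0\t-10.0\t5.0",
    "E\t-10.0\t5.0\t-10.0",
    "F\t5.0\t-10.0\t10.0",
]
scorer = parse_matrix(lines)
print(scorer["B", "F"], scorer["F", "B"])
```

Because both triangles are filled, looking up a pair in either order gives the same score.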
scorer
The scorer file (which is optional) contains the graph of class transitions which is used for the calculation of the scoring dictionary. Each class is listed on a separate line, followed by one of the symbols v, c, or t (indicating whether the class represents vowels, consonants, or tones), and by the classes it is directly connected to. The strength of this connection is indicated by digits (the smaller the value, the shorter the path between the classes):
A : v, E:1, O:1
C : c, S:2
B : c, W:2
E : v, A:1, I:1
D : c, S:2
...
The information in such a file is automatically converted into a scoring dictionary (see List2012b for details).
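The graph format itself can be explored without LingPy. The sketch below parses a few lines of the example and computes shortest path lengths between classes with Dijkstra's algorithm. It stops there: how path lengths are turned into alignment scores is described in List2012b and is not reproduced here. The function names are hypothetical, and treating every connection as undirected is an assumption.

```python
import heapq


def parse_graph(lines):
    """Read 'KEY : kind, NEIGHBOUR:weight, ...' lines.

    Hypothetical helper for illustration only; not part of LingPy.
    Returns the kind of each listed class and a weighted adjacency dict.
    """
    kinds, edges = {}, {}
    for line in lines:
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        key = key.strip()
        fields = [f.strip() for f in rest.split(",") if f.strip()]
        kinds[key] = fields[0]  # 'v', 'c', or 't'
        edges.setdefault(key, {})
        for field in fields[1:]:
            target, weight = field.split(":")
            target = target.strip()
            edges[key][target] = int(weight)
            # Assumption: connections hold in both directions.
            edges.setdefault(target, {})[key] = int(weight)
    return kinds, edges


def path_lengths(edges, start):
    """Shortest path length from start to every reachable class."""
    dist = {start: 0}
    queue = [(0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # outdated queue entry
        for nxt, weight in edges.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return dist


lines = ["A : v, E:1, O:1", "E : v, A:1, I:1", "C : c, S:2"]
kinds, edges = parse_graph(lines)
print(path_lengths(edges, "A"))
```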
Based on the information provided by the files, a dictionary for the conversion of IPA characters to sound classes and a scoring dictionary are created and stored as a binary file. The model can be loaded with help of the Model class and used in the various classes and functions provided by the library.
lingpy.data.model module
Module for handling sequence models.
- class lingpy.data.model.Model(model, path=None)
Bases: object
Class for the handling of sound-class models.
- Parameters
model : { ‘sca’, ‘dolgo’, ‘asjp’, ‘art’, ‘_color’ }
A string indicating the name of the model which shall be loaded. Select between:
‘sca’ - the SCA sound-class model (see List2012a),
‘dolgo’ - the DOLGO sound-class model (see Dolgopolsky1986),
‘asjp’ - the ASJP sound-class model (see Brown2008 and Brown2011),
‘art’ - the sound-class model which is used for the calculation of sonority profiles and prosodic strings (see List2012), and
‘_color’ - the sound-class model which is used for the coloring of sound tokens when creating HTML output.
Notes
Models are loaded from binary files which can be found in the data/models/ folder of the LingPy package. A model has two essential attributes:
converter - a dictionary with IPA tokens as keys and sound-class characters as values, and
scorer - a scoring dictionary with tuples of sound-class characters as keys and scores (integers or floats) as values.
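How the two attributes work together can be shown with plain dictionaries. The sketch below uses a toy converter and a toy scorer with made-up values, not the contents of any LingPy model, and the helper names to_classes and score_pairs are hypothetical. It converts two token lists to sound classes and sums the scores of the aligned positions.

```python
# Toy stand-ins with made-up values; real models are loaded via Model.
converter = {"t": "T", "d": "T", "a": "A", "o": "A"}
scorer = {
    ("T", "T"): 10.0,
    ("A", "A"): 5.0,
    ("T", "A"): -10.0,
    ("A", "T"): -10.0,
}


def to_classes(tokens, converter):
    """Replace every IPA token by its sound-class character."""
    return [converter[token] for token in tokens]


def score_pairs(classes_a, classes_b, scorer):
    """Sum the scores of position-wise paired sound classes."""
    return sum(scorer[a, b] for a, b in zip(classes_a, classes_b))


first = to_classes(["t", "a"], converter)
second = to_classes(["d", "o"], converter)
total = score_pairs(first, second, scorer)
print(first, second, total)
```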
Examples
When loading LingPy, the models sca, asjp, dolgo, and art are automatically loaded, and they are accessible via the rc() function for global settings:
>>> from lingpy import *
>>> rc('asjp')
<sca-model "asjp">
Define variables for the standard models for convenience:
>>> asjp = rc('asjp')
>>> sca = rc('sca')
>>> dolgo = rc('dolgo')
>>> art = rc('art')
Check how the letter a is converted in the various models:
>>> for m in [asjp, sca, dolgo, art]:
...     print('{0} > {1} ({2})'.format('a', m.converter['a'], m.name))
...
a > a (asjp)
a > A (sca)
a > V (dolgo)
a > 7 (art)
Retrieve basic information of a given model:
>>> print(sca)
Model: sca
Info: Extended sound class model based on Dolgopolsky (1986)
Source: List (2012)
Compiler: Johann-Mattis List
Date: 2012-03
Attributes
converter : dict
A dictionary with IPA tokens as keys and sound-class characters as values.
scorer : dict
A scoring dictionary with tuples of sound-class characters as keys and similarity scores as values.
info : dict
A dictionary storing the key-value pairs defined in the INFO file.
name : str
The name of the model, which is identical with the name of the folder from which the model is loaded.
- lingpy.data.model.load_dvt(path='')
Function loads the default characters for IPA diacritics and IPA vowels of LingPy.
Module contents
LingPy comes along with many different kinds of predefined data. When loading the library, the following dictionary is automatically loaded and employed by all LingPy modules:
- rcParams : dict
As an alternative to all global variables, this dictionary contains all these variables, and additional ones. This dictionary is used for internal coding purposes and stores parameters that are globally set (if not defined otherwise by the user), such as:
- specific debugging messages (warnings, messages, errors), and
- default values, such as “gop” (gap opening penalty), “scale” (scaling factor by which extended gaps are penalized), or “figsize” (the default size of figures if data is plotted using matplotlib).
These default values can be changed with help of the rc function, which takes any keyword and any variable as input and adds or modifies the specific key of the rcParams dictionary. It also provides more complex functions that change whole sets of variables, such as the following statement:
>>> rc(schema="asjp")
which switches the variables “asjp”, “dolgo”, etc. to the ASCII-based transcription system of the ASJP project.
If you want to change the content of rcParams directly, you need to import the dictionary explicitly:
>>> from lingpy.settings import rcParams
However, changing the values in the dictionary randomly can produce unexpected behavior, and we recommend using the regular rc function for this purpose.
- lingpy.settings.rc(rval=None, rcParams_=None, **keywords)
Function changes parameters globally set for LingPy sessions.
- Parameters
rval : string (default=None)
Use this keyword to specify a return-value for the rc-function.
schema : {“ipa”, “asjp”}
Change the basic schema for sequence comparison. Switching to “asjp” means that sequences will be treated as sequences in ASJP code; otherwise, they will be treated as sequences written in basic IPA.
rcParams_ : dict (default=None)
Allow passing in a plain dict for testing.
Notes
This function is the standard way to communicate with the rcParams dictionary, which is not imported by default. If you want to see which parameters there are, you can load the rcParams dictionary directly:
>>> from lingpy.settings import rcParams
However, be careful when changing the values. They might produce some unexpected behavior.
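The get-or-set behaviour described for rc can be pictured with a toy function operating on a plain dictionary. This is only a sketch of the documented semantics: the name rc_sketch, its argument order, and the sample values are hypothetical, it does not model the more complex schema switch, and it is not LingPy's implementation.

```python
def rc_sketch(params, rval=None, **keywords):
    """Toy illustration of the get-or-set behaviour; not LingPy's rc.

    With rval, return the stored value for that key.
    Otherwise, add or overwrite the given keywords in params.
    """
    if rval is not None:
        return params[rval]
    params.update(keywords)
    return None


# Made-up settings standing in for the rcParams dictionary.
settings = {"gop": -2, "scale": 0.5}
rc_sketch(settings, gop=-1, figsize=(10, 10))  # modify one key, add another
print(rc_sketch(settings, "gop"))
```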
Examples
Import LingPy:
>>> from lingpy import *
Switch from IPA transcriptions to ASJP transcriptions:
>>> rc(schema="asjp")
You can check which “basic orthography” is currently loaded:
>>> rc('basic_orthography')
'asjp'
>>> rc(schema='ipa')
>>> rc('basic_orthography')
'fuzzy'