lingpy.convert package¶
Submodules¶
lingpy.convert.cldf module¶
Basic functions for the conversion from LingPy to CLDF and vice versa.
- lingpy.convert.cldf.to_cldf(wordlist, path='cldf', source_path=None, ref='cogid', segments='tokens', form='ipa', note='note', form_in_source='value', source=None, alignment=None)¶
Convert a wordlist in LingPy to CLDF.
- Parameters
wordlist : ~lingpy.basic.wordlist.Wordlist
A regular Wordlist object (or similar).
path : str (default=’cldf’)
The name of the directory to which the files will be written.
source_path : str (default=None)
If available, specify the path of your BibTeX file with the sources.
ref : str (default=”cogid”)
The column in which the cognate sets are stored.
segments : str (default=”tokens”)
The column in which the segmented phonetic strings are stored.
form : str (default=”ipa”)
The column in which the unsegmented phonetic strings are stored.
note : str (default=”note”)
The column in which you store your comments.
form_in_source : str (default=”value”)
The column in which you store the original form in the source.
source : str (default=None)
The column in which you store your source information.
alignment : str (default=None)
The column in which you store the alignments.
lingpy.convert.graph module¶
Conversion routines for the GML format.
- lingpy.convert.graph.gls2gml(gls, graph, tree, filename='')¶
Create GML-representation of a given gain-loss-scenario (GLS).
- Parameters
gls : list
A list of tuples, indicating the origins of characters along a tree.
graph : networkx.graph
A graph that serves as a template for the plotting of the GLS.
tree : cogent.tree.PhyloNode
A tree object.
- lingpy.convert.graph.igraph2networkx(graph)¶
- lingpy.convert.graph.networkx2igraph(graph)¶
Helper function that converts a networkx graph to an igraph graph object.
- lingpy.convert.graph.nwk2gml(treefile, filename='')¶
Function converts a tree in newick format to a network in gml-format.
- Parameters
treefile : str
Either a str defining the path to a file containing the tree in Newick-format, or the tree-string itself.
filename : str (default=’lingpy’)
The name of the output GML-file. If filename is set to None, the function returns a Graph.
- Returns
graph : networkx.Graph
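The Newick-to-GML idea can be illustrated with a small standalone sketch. This is not LingPy's implementation: the helper names `newick_to_edges` and `to_gml` are hypothetical, branch lengths are ignored, and the exact GML attributes LingPy writes are not documented here, so the output layout is an assumption.

```python
import re

def newick_to_edges(newick):
    """Return (labels, edges) for a Newick string; branch lengths are skipped."""
    tokens = re.findall(r"[(),;]|:[^(),;]+|[^(),;:\s]+", newick)
    labels = []   # node id -> label ("" for unnamed internal nodes)
    edges = []    # (parent id, child id)
    stack = []    # ids of internal nodes that are still open
    closed = None # internal node that was just closed and may receive a label
    for tok in tokens:
        if tok == "(":
            labels.append("")
            if stack:
                edges.append((stack[-1], len(labels) - 1))
            stack.append(len(labels) - 1)
            closed = None
        elif tok == ")":
            closed = stack.pop()
        elif tok == ",":
            closed = None
        elif tok == ";":
            break
        elif tok.startswith(":"):
            continue  # branch length, ignored in this sketch
        elif closed is not None:
            labels[closed] = tok  # name of the internal node just closed
            closed = None
        else:
            labels.append(tok)    # a new leaf
            if stack:
                edges.append((stack[-1], len(labels) - 1))
    return labels, edges

def to_gml(labels, edges):
    """Serialize nodes and edges as a minimal GML document."""
    lines = ["graph [", "  directed 1"]
    for node_id, label in enumerate(labels):
        lines.append('  node [ id {0} label "{1}" ]'.format(node_id, label))
    for source, target in edges:
        lines.append("  edge [ source {0} target {1} ]".format(source, target))
    lines.append("]")
    return "\n".join(lines)

labels, edges = newick_to_edges("((a,b)ab,c)root;")
gml = to_gml(labels, edges)
```

Each parenthesis opens an internal node, and a label that directly follows a closing parenthesis names that node, which is why the root ends up with id 0.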
- lingpy.convert.graph.radial_layout(treestring, change=<function <lambda>>, degree=100, filename='', start=0, root='root')¶
Function calculates a simple radial tree layout.
- Parameters
treefile : str
Either a str defining the path to a file containing the tree in Newick-format, or the tree-string itself.
filename : str (default=None)
The name of the output file (GML-format). If set to None, no output will be written to file.
change : function (default = lambda x:2 * x**2)
The function used to modify the radius in the polar projection of the tree.
- Returns
graph : networkx.Graph
A graph representation of the tree with coordinates specified in the graphics-attribute of the nodes.
Notes
This function creates a radial tree-layout from a given tree specified in Newick format.
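The role of the change function in the polar projection can be sketched as follows. This is a simplified illustration, not LingPy's algorithm: the helper `radial_positions` is hypothetical, it places leaves only, and the even spacing of leaves along the arc is an assumption.

```python
import math

def radial_positions(leaf_depths, degree=100, start=0,
                     change=lambda x: 2 * x ** 2):
    """Place leaves on an arc spanning `degree` degrees, beginning at `start`.

    leaf_depths maps a leaf name to its depth in the tree; `change`
    rescales that depth into the radius used for the polar projection.
    """
    names = list(leaf_depths)
    step = degree / max(len(names) - 1, 1)  # angular distance between leaves
    positions = {}
    for i, name in enumerate(names):
        theta = math.radians(start + i * step)
        radius = change(leaf_depths[name])
        positions[name] = (radius * math.cos(theta), radius * math.sin(theta))
    return positions

positions = radial_positions({"a": 1, "b": 1}, degree=90)
```

With the default `change`, a leaf at depth 1 is drawn at radius 2 and a leaf at depth 2 at radius 8, so deeper nodes are pushed outwards more than linearly.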
lingpy.convert.html module¶
Basic functions for HTML-plots.
- lingpy.convert.html.alm2html(infile, title='', shorttitle='', filename='', colored=False, main_template='', table_template='', dataset='', confidence=False, **keywords)¶
Convert files in alm-format into colored html-format.
- Parameters
title : str
Define the title of the output file. If no title is provided, the default title LexStat - Automatic Cognate Judgments will be used.
shorttitle : str
Define the shorttitle of the html-page. If no title is provided, the default title LexStat will be used.
Notes
The coloring of sound segments with respect to the sound class they belong to is based on the definitions given in the color Model. It can easily be changed and adapted.
- lingpy.convert.html.colorRange(number, brightness=300)¶
Function returns different colors for the given range.
Notes
Idea taken from http://stackoverflow.com/questions/876853/generating-color-ranges-in-python .
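One common way to generate such a range is to spread hues evenly around the HSV color wheel with the standard-library colorsys module. The sketch below follows that idea. It is not the body of colorRange: the name `color_range` and the `saturation` and `value` parameters are hypothetical, and the effect of the `brightness` argument is not modeled.

```python
import colorsys

def color_range(number, saturation=0.5, value=0.9):
    """Return `number` hex colors with evenly spaced hues."""
    colors = []
    for i in range(number):
        # hue runs over [0, 1); saturation and value stay fixed
        r, g, b = colorsys.hsv_to_rgb(i / number, saturation, value)
        colors.append("#{0:02x}{1:02x}{2:02x}".format(
            int(r * 255), int(g * 255), int(b * 255)))
    return colors

palette = color_range(5)
```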
- lingpy.convert.html.msa2html(msa, shorttitle='', filename='', template='', **keywords)¶
Convert files in msa-format into colored html-format.
- Parameters
msa : dict
A dictionary object that contains all the information of an MSA object.
shorttitle : str
Define the shorttitle of the html-page. If no title is provided, the default title SCA will be used.
filename : str (default=””)
Define the name of the output file. If no name is defined, the name of the input file will be taken as a default.
template : str (default=””)
The path to the template file. If no name is defined, the basic template will be used. The basic template currently used can be found under lingpy/data/templates/msa2html.html.
Notes
The coloring of sound segments with respect to the sound class they belong to is based on the definitions given in the color Model. It can easily be changed and adapted.
Examples
Load the library.
>>> from lingpy import *
Load an msq-file from the test-sets.
>>> msa = MSA('harry.msq')
Align the data progressively and carry out a check for swapped sites.
>>> msa.prog_align()
>>> msa.swap_check()
>>> print(msa)
w o l - d e m o r t
w a l - d e m a r -
v - l a d i m i r -
Save the data to the file harry.msa.
>>> msa.output('msa',filename='harry')
Save the msa-object as html.
>>> msa.output('html',filename='harry')
- lingpy.convert.html.msa2tex(infile, template='', filename='', **keywords)¶
Convert an MSA to a tabular representation which can easily be used in LaTeX documents.
- lingpy.convert.html.psa2html(infile, **kw)¶
Function converts a PSA-file into colored html-format.
- lingpy.convert.html.string2html(taxon, string, swaps=[], tax_len=None)¶
Function converts an (aligned) string into colored html-format.
@deprecated
- lingpy.convert.html.tokens2html(string, swaps=[], tax_len=None)¶
Function converts an (aligned) string into colored html-format.
Notes
This function is currently not used by any other program. So it might be useful to just deprecate it.
@deprecated
lingpy.convert.plot module¶
Module provides functions for the transformation of text data into a visually appealing format.
- lingpy.convert.plot.plot_concept_evolution(scenarios, tree, fileformat='pdf', degree=90, **keywords)¶
Plot the evolution according to the MLN method of all words for a given concept.
- Parameters
tree : str
A tree representation in Newick format.
fileformat : str (default=”pdf”)
A valid fileformat according to Matplotlib.
degree : int (default=90)
The degree by which the tree is drawn. 360 yields a circular tree, 180 yields a tree filling half of the space of a circle.
- lingpy.convert.plot.plot_gls(gls, treestring, degree=90, fileformat='pdf', **keywords)¶
Plot a gain-loss scenario for a given reference tree.
- lingpy.convert.plot.plot_heatmap(wordlist, filename='heatmap', fileformat='pdf', ref='cogid', normalized=False, refB='', **keywords)¶
Create a heatmap-representation of shared cognates for a given wordlist.
- Parameters
wordlist : lingpy.basic.wordlist.Wordlist
A Wordlist object containing cognate IDs.
filename : str (default=”heatmap”)
Name of the file to which the heatmap will be written.
fileformat : str (default=”pdf”)
A regular matplotlib-fileformat (pdf, png, pgf, svg).
ref : str (default=”cogid”)
The name of the column that contains the cognate identifiers.
normalized : {bool, str} (default=False)
If set to False, don’t normalize the data. Otherwise, select the normalization method, choosing between:
“jaccard” for the Jaccard-distance (see Bategelj1995 for details), and
“swadesh” for traditional lexicostatistical calculation of shared cognate percentages.
cmap : matplotlib.cm (default=matplotlib.cm.jet)
The color scheme to be used for the heatmap.
steps : int (default=5)
The number of steps in which names of taxa will be written to the axes.
xrotation : int (default=45)
The rotation of the taxon-names on the x-axis.
colorbar : bool (default=True)
Specify whether a colorbar should be added to the plot.
figsize : tuple (default=(10,10))
Specify the size of the figure.
tree : str (default=’’)
A tree passed for the taxa in Newick-format. If no tree is specified, the method looks for a tree object in the Wordlist.
Notes
This function plots shared cognate percentages.
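The two normalization options can be illustrated with their textbook definitions. The sketch below is not taken from LingPy: the function names and the toy data are hypothetical, and how plot_heatmap handles missing concepts or multiple cognate sets per concept is not specified here.

```python
def jaccard_distance(cogs_a, cogs_b):
    """1 - |A ∩ B| / |A ∪ B| over the cognate IDs found in two languages."""
    a, b = set(cogs_a), set(cogs_b)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)

def shared_cognate_percentage(words_a, words_b):
    """Share of concepts attested in both languages with the same cognate ID.

    words_a and words_b map a concept to a cognate ID.
    """
    common = [concept for concept in words_a if concept in words_b]
    if not common:
        return 0.0
    shared = sum(1 for concept in common if words_a[concept] == words_b[concept])
    return shared / len(common)

# toy data with made-up cognate IDs
german = {"hand": 1, "foot": 3, "water": 5}
english = {"hand": 1, "foot": 3, "water": 5, "dog": 7}

dist = jaccard_distance(german.values(), english.values())
perc = shared_cognate_percentage(german, english)
```

The two measures differ in how they treat the concept that only one language attests: it lowers the Jaccard-based similarity, but it is ignored in the lexicostatistical percentage.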
- lingpy.convert.plot.plot_tree(treestring, degree=90, fileformat='pdf', root='root', **keywords)¶
Plot a Newick tree to PDF or other graphical formats.
- Parameters
treestring : str
A string in Newick format.
degree : int
Determine the degree of the tree (this determines how “circular” the tree will be).
fileformat : str (default=”pdf”)
Select the fileformat to which the tree shall be written.
filename : str
Determine the name of the file to which the data shall be written. Defaults to a timestamp.
figsize : tuple (default=(10,10))
Determine the size of the figure.
lingpy.convert.strings module¶
Basic functions for the conversion of Python-internal data into strings.
- lingpy.convert.strings.matrix2dst(matrix, taxa=None, stamp='', filename='', taxlen=10, comment='#')¶
Convert matrix to dst-format.
- Parameters
taxa : {None, list}
List of taxon names corresponding to the distances. Make sure that you only use alphanumeric characters and the underscore for assigning the taxon names. In particular, avoid brackets, since they will confuse many phylogenetic programs.
stamp : str (default=’’)
Convenience stamp passed as a comment that can be used to indicate how the matrix was created.
filename : str
If you specify a filename, the data will be written to file.
taxlen : int (default=10)
Indicate how long the taxon names are allowed to be. The Phylip package only allows taxon names consisting of maximally 10 characters. Other packages, however, allow more. If Phylip compatibility is not important for you and you just want to allow for as long taxon names as possible, set this value to 0.
comment : str (default = ‘#’)
The comment character to be used when adding additional information in the “stamp”.
- Returns
output : {str or file}
Depending on your settings, this function returns a string in DST (=Phylip) format, or a file containing the string.
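The general shape of a Phylip-style distance file (a line with the number of taxa, followed by one row per taxon with a fixed-width name) can be sketched as below. This is an independent illustration rather than the body of matrix2dst: the helper name `matrix_to_dst`, the two-decimal number format, and the placement of the stamp at the end are assumptions.

```python
def matrix_to_dst(matrix, taxa, taxlen=10, stamp="", comment="#"):
    """Format a square distance matrix as Phylip-style text."""
    lines = [" {0}".format(len(taxa))]
    for taxon, row in zip(taxa, matrix):
        # truncate and pad names to taxlen; taxlen=0 keeps the full name
        name = taxon[:taxlen] if taxlen else taxon
        width = taxlen if taxlen else len(name)
        cells = " ".join("{0:.2f}".format(value) for value in row)
        lines.append("{0} {1}".format(name.ljust(width), cells))
    if stamp:
        lines.append("{0} {1}".format(comment, stamp))
    return "\n".join(lines)

dst = matrix_to_dst(
    [[0.0, 0.5], [0.5, 0.0]],
    ["German", "Proto-Germanic"],
    stamp="toy example",
)
```

With the default `taxlen=10`, the second name is cut to `Proto-Germ`, which shows why long taxon names may become ambiguous under the Phylip restriction.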
- lingpy.convert.strings.msa2str(msa, wordlist=False, comment='#', _arange='{stamp}{comment}\n{meta}{comment}\n{body}', merge=False)¶
Function converts an MSA object into a string.
- lingpy.convert.strings.multistate2nex(taxa, matrix, filename='', missing='?')¶
Convert the data in a given wordlist to NEXUS-format for multistate analyses in PAUP.
- Parameters
taxa : list
The list of taxa that shall be written to file.
matrix : list
The multi-state matrix with the first dimension indicating the taxa, and the second their states.
filename : str (default=””)
If not specified, the filename of the Wordlist will be taken, otherwise, it specifies the name of the file to which the data will be written.
- lingpy.convert.strings.pap2csv(taxa, paps, filename='')¶
Write paps created by the Wordlist class to a csv-file.
- lingpy.convert.strings.pap2nex(taxa, paps, missing=0, filename='', datatype='STANDARD')¶
Function converts a list of paps into nexus file format.
- Parameters
taxa : list
List of taxa.
paps : {list, dict}
A two-dimensional list with the first dimension being identical to the number of taxa and the second dimension being identical to the number of paps. If a dictionary is passed, each key represents a given pap. The following two structures will thus be treated identically:
>>> paps = [[1,0],[1,0],[1,0]] # two languages, three paps
>>> paps = {1:[1,0], 2:[1,0], 3:[1,0]} # two languages, three paps
missing : {str, int} (default=0)
Indicate how missing characters are represented in the original data.
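The equivalence of the list and dictionary forms can be made concrete with a short sketch. It follows the comment in the example above (two languages, three paps) and reads each inner list as one pap with one state per language. The helper names are hypothetical, and sorting the dictionary keys to fix the column order is an assumption.

```python
def normalize_paps(paps):
    """Return paps as a list of lists, whether a list or a dict was passed."""
    if hasattr(paps, "keys"):
        return [paps[key] for key in sorted(paps)]
    return paps

def taxon_rows(taxa, paps):
    """Build one character string per taxon, with one column per pap."""
    paps = normalize_paps(paps)
    return {
        taxon: "".join(str(pap[i]) for pap in paps)
        for i, taxon in enumerate(taxa)
    }

as_list = [[1, 0], [1, 0], [1, 0]]
as_dict = {1: [1, 0], 2: [1, 0], 3: [1, 0]}
rows = taxon_rows(["German", "English"], as_dict)
```

Here `rows` holds one string of three characters per language, which is the layout of the character matrix in a nexus file.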
- lingpy.convert.strings.scorer2str(scorer)¶
Convert a scoring function to a string.
- lingpy.convert.strings.write_nexus(wordlist, mode='mrbayes', filename='mrbayes.nex', ref='cogid', missing='?', gap='-', custom=None, custom_name='lingpy', commands=None, commands_name='mrbayes')¶
Write a nexus file for phylogenetic analyses.
- Parameters
wordlist : lingpy.basic.wordlist.Wordlist
A Wordlist object containing cognate IDs.
mode : str (default=”mrbayes”)
The name of the output nexus style. Valid values are:
- ‘MRBAYES’: a MrBayes formatted nexus file.
- ‘SPLITSTREE’: a SPLITSTREE formatted nexus file.
- ‘BEAST’: a BEAST formatted nexus file.
- ‘BEASTWORDS’: a BEAST formatted nexus for word-partitioned analyses.
- ‘TRAITLAB’: a TRAITLab formatted nexus.
filename : str (default=”mrbayes.nex”)
Name of the file to which the nexus file will be written. If set to None, this function will not write the nexus content to a file, but simply return the content as a string.
ref : str (default=”cogid”)
Column in which you store the cognate sets in your data.
gap : str (default=”-”)
The symbol for gaps (not relevant for linguistic analyses).
missing : str (default=”?”)
The symbol for missing characters.
custom : list (default=None)
This parameter allows you to add custom information to the nexus file, for example the structure of the characters, their original concept, or their type. It will be written into a custom block in the nexus file. The name of the custom block can be specified with the custom_name keyword. The content is a list of strings which will be written line by line into the custom block.
custom_name : str (default=”lingpy”)
The name of the custom block which will be written to the file.
commands : list (default=None)
If specified, an additional block containing commands for phylogenetic software will be written. The commands are passed as a list of strings. The name of the block is given by the keyword commands_name.
commands_name : str (default=”mrbayes”)
The name of the block to which the commands will be written.
- Returns
nexus : str
A string containing nexus file output
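How a list of strings turns into a named block can be sketched with the general NEXUS block syntax (`BEGIN name; ... END;`). The helper `nexus_block` is hypothetical, and the indentation and exact layout that write_nexus produces are assumptions.

```python
def nexus_block(name, lines):
    """Wrap a list of strings in a NEXUS block, one string per line."""
    body = "\n".join("    " + line for line in lines)
    return "BEGIN {0};\n{1}\nEND;".format(name, body)

# a commands block as it might be passed via commands / commands_name
commands = ["set autoclose=yes;", "mcmc ngen=1000000;"]
block = nexus_block("mrbayes", commands)
```

The same helper covers the custom block: passing the value of custom_name as the block name and the custom list as the lines yields a second block of the same shape.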
lingpy.convert.tree module¶
Functions for tree calculations and working with trees.
- lingpy.convert.tree.nwk2tree_matrix(newick)¶
Convert a newick file to a tree matrix.
Notes
This is an additional function that can be used for plots with the help of matplotlib’s functions. The tree_matrix is compatible with the matrices that scipy’s linkage functions create.
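For orientation, the layout of a scipy-style linkage matrix can be shown on a hand-built example for the tree ((a,b),c). Only the matrix format is illustrated: the helper `leaves_under` is hypothetical, and how nwk2tree_matrix fills the distance column is not stated here, so the values below are made up.

```python
def leaves_under(matrix, n_leaves, cluster):
    """Return the leaf indices below a cluster of a linkage-style matrix."""
    if cluster < n_leaves:
        return [cluster]  # indices below n_leaves are original leaves
    left, right = matrix[cluster - n_leaves][:2]
    return (leaves_under(matrix, n_leaves, int(left))
            + leaves_under(matrix, n_leaves, int(right)))

taxa = ["a", "b", "c"]
# each row: [cluster 1, cluster 2, distance, number of leaves in the new cluster]
tree_matrix = [
    [0, 1, 1.0, 2],  # row 0 creates cluster 3 = (a, b)
    [2, 3, 2.0, 3],  # row 1 creates cluster 4 = (c, (a, b))
]

root = len(taxa) + len(tree_matrix) - 1
members = [taxa[i] for i in leaves_under(tree_matrix, len(taxa), root)]
```

A matrix for n leaves has n - 1 rows, and row i defines the cluster with index n + i, which later rows may refer to.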
Module contents¶
Package provides different methods for file conversion.