lingpy.convert package

Submodules

lingpy.convert.cldf module

Basic functions for the conversion from LingPy to CLDF and vice versa.

lingpy.convert.cldf.to_cldf(wordlist, path='cldf', source_path=None, ref='cogid', segments='tokens', form='ipa', note='note', form_in_source='value', source=None, alignment=None)

Convert a wordlist in LingPy to CLDF.

Parameters

wordlist : ~lingpy.basic.wordlist.Wordlist

A regular Wordlist object (or similar).

path : str (default=’cldf’)

The name of the directory to which the files will be written.

source_path : str (default=None)

If available, the path to the BibTeX file that contains your sources.

ref : str (default=”cogid”)

The column in which the cognate sets are stored.

segments : str (default=”tokens”)

The column in which the segmented phonetic strings are stored.

form : str (default=”ipa”)

The column in which the unsegmented phonetic strings are stored.

note : str (default=”note”)

The column in which you store your comments.

form_in_source : str (default=”value”)

The column in which you store the original form in the source.

source : str (default=None)

The column in which you store your source information.

alignment : str (default=None)

The column in which you store the alignments.

lingpy.convert.graph module

Conversion routines for the GML format.

lingpy.convert.graph.gls2gml(gls, graph, tree, filename='')

Create GML-representation of a given gain-loss-scenario (GLS).

Parameters

gls : list

A list of tuples, indicating the origins of characters along a tree.

graph : networkx.Graph

A graph that serves as a template for the plotting of the GLS.

tree : cogent.tree.PhyloNode

A tree object.

lingpy.convert.graph.igraph2networkx(graph)

Helper function that converts an igraph graph object to a networkx graph.

lingpy.convert.graph.networkx2igraph(graph)

Helper function that converts a networkx graph to an igraph graph object.

lingpy.convert.graph.nwk2gml(treefile, filename='')

Function converts a tree in Newick format to a network in GML format.

Parameters

treefile : str

Either a str defining the path to a file containing the tree in Newick-format, or the tree-string itself.

filename : str (default=’’)

The name of the output GML-file. If filename is set to None, the function returns a Graph.

Returns

graph : networkx.Graph
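
To make the target format concrete, the following sketch writes a small tree as GML text using only the standard library. It is not LingPy's implementation: the function name `tree_to_gml`, the nested-tuple tree (a stand-in for a parsed Newick string), and the generated labels for internal nodes are all assumptions made for illustration.

```python
def tree_to_gml(tree, root_label="root"):
    """Serialize a nested-tuple tree as GML text (leaves are strings)."""
    nodes, edges = [], []

    def visit(node, label):
        # Register the node, then recurse into its children (if any).
        node_id = len(nodes)
        nodes.append((node_id, label))
        if isinstance(node, tuple):
            for i, child in enumerate(node):
                if isinstance(child, str):
                    child_label = child
                else:
                    # Hypothetical naming scheme for unnamed internal nodes.
                    child_label = "{0}_{1}".format(label, i)
                edges.append((node_id, visit(child, child_label)))
        return node_id

    visit(tree, root_label)
    lines = ["graph ["]
    for node_id, label in nodes:
        lines += ["  node [", "    id {0}".format(node_id),
                  '    label "{0}"'.format(label), "  ]"]
    for source, target in edges:
        lines += ["  edge [", "    source {0}".format(source),
                  "    target {0}".format(target), "  ]"]
    lines.append("]")
    return "\n".join(lines)


# Stand-in for the Newick string "((German,English),Russian);"
gml = tree_to_gml((("German", "English"), "Russian"))
print(gml)
```

The tree has five nodes (three leaves, one internal node, and the root) and four edges, so the output contains five `node [` entries and four `edge [` entries.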

lingpy.convert.graph.radial_layout(treestring, change=<function <lambda>>, degree=100, filename='', start=0, root='root')

Function calculates a simple radial tree layout.

Parameters

treestring : str

Either a str defining the path to a file containing the tree in Newick-format, or the tree-string itself.

filename : str (default=’’)

The name of the output file (GML-format). If set to None, no output will be written to file.

change : function (default = lambda x:2 * x**2)

The function used to modify the radius in the polar projection of the tree.

Returns

graph : networkx.Graph

A graph representation of the tree with coordinates specified in the graphics-attribute of the nodes.

Notes

This function creates a radial tree-layout from a given tree specified in Newick format.
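
The role of the `change` function can be illustrated with a simplified polar projection. This is only a sketch under stated assumptions, not the library's algorithm: it places leaves only, spreads them evenly across `degree` degrees beginning at `start`, and takes the radius from `change(depth)`. The helper name `radial_coordinates` and the `depths` mapping are hypothetical.

```python
import math


def radial_coordinates(depths, change=lambda x: 2 * x ** 2, degree=100, start=0):
    """Map leaf names to (x, y) positions; `depths` maps leaf name -> depth."""
    names = list(depths)
    # Angular distance between neighbouring leaves (guard against one leaf).
    step = degree / max(len(names) - 1, 1)
    coords = {}
    for i, name in enumerate(names):
        theta = math.radians(start + i * step)
        radius = change(depths[name])  # the radius is modified by `change`
        coords[name] = (radius * math.cos(theta), radius * math.sin(theta))
    return coords


coords = radial_coordinates({"A": 1, "B": 2, "C": 1}, degree=180)
for name, (x, y) in coords.items():
    print(name, round(x, 2), round(y, 2))
```

With the default `change`, a node at depth 2 is placed four times as far from the centre as a node at depth 1, which stretches the outer levels of the tree.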

lingpy.convert.html module

Basic functions for HTML-plots.

lingpy.convert.html.alm2html(infile, title='', shorttitle='', filename='', colored=False, main_template='', table_template='', dataset='', confidence=False, **keywords)

Convert files in alm-format into colored html-format.

Parameters

title : str

Define the title of the output file. If no title is provided, the default title LexStat - Automatic Cognate Judgments will be used.

shorttitle : str

Define the shorttitle of the html-page. If no title is provided, the default title LexStat will be used.

Notes

The coloring of sound segments with respect to the sound class they belong to is based on the definitions given in the color Model. It can easily be changed and adapted.

lingpy.convert.html.colorRange(number, brightness=300)

Function returns different colors for the given range.

Notes

Idea taken from http://stackoverflow.com/questions/876853/generating-color-ranges-in-python .
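
The general idea of generating a range of distinct colors can be sketched with evenly spaced hues in HSV space. This is an illustration only: the name `color_range` and the `saturation` and `value` parameters are assumptions, and the sketch does not model the `brightness` argument of `colorRange`.

```python
import colorsys


def color_range(number, saturation=0.5, value=0.9):
    """Return `number` hex colors whose hues are evenly spaced."""
    colors = []
    for i in range(number):
        # Hue runs from 0 up to (but not including) 1.
        r, g, b = colorsys.hsv_to_rgb(i / number, saturation, value)
        colors.append("#{0:02x}{1:02x}{2:02x}".format(
            int(r * 255), int(g * 255), int(b * 255)))
    return colors


palette = color_range(5)
print(palette)
```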

lingpy.convert.html.msa2html(msa, shorttitle='', filename='', template='', **keywords)

Convert files in msa-format into colored html-format.

Parameters

msa : dict

A dictionary object that contains all the information of an MSA object.

shorttitle : str

Define the shorttitle of the html-page. If no title is provided, the default title SCA will be used.

filename : str (default=””)

Define the name of the output file. If no name is defined, the name of the input file will be taken as a default.

template : str (default=””)

The path to the template file. If no name is defined, the basic template will be used. The basic template currently used can be found under lingpy/data/templates/msa2html.html.

Notes

The coloring of sound segments with respect to the sound class they belong to is based on the definitions given in the color Model. It can easily be changed and adapted.

Examples

Load the library.

>>> from lingpy import *

Load an msq-file from the test-sets.

>>> msa = MSA('harry.msq')

Align the data progressively and carry out a check for swapped sites.

>>> msa.prog_align()
>>> msa.swap_check()
>>> print(msa)
w    o    l    -    d    e    m    o    r    t
w    a    l    -    d    e    m    a    r    -
v    -    l    a    d    i    m    i    r    -

Save the data to the file harry.msa.

>>> msa.output('msa',filename='harry')

Save the msa-object as html.

>>> msa.output('html',filename='harry')

lingpy.convert.html.msa2tex(infile, template='', filename='', **keywords)

Convert an MSA to a tabular representation which can easily be used in LaTeX documents.
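
As a rough illustration of such a tabular representation, the sketch below renders an alignment as a LaTeX `tabular` environment with one row per sequence and one column per alignment site. It does not reproduce the template that `msa2tex` uses; the function name `alignment_to_tabular` and the sequence labels are hypothetical.

```python
def alignment_to_tabular(taxa, alignment):
    """Render aligned sequences as a LaTeX tabular (one row per taxon)."""
    width = len(alignment[0])
    # One left-aligned label column plus one centered column per site.
    lines = ["\\begin{tabular}{l" + "c" * width + "}"]
    for taxon, row in zip(taxa, alignment):
        lines.append(" & ".join([taxon] + list(row)) + " \\\\")
    lines.append("\\end{tabular}")
    return "\n".join(lines)


# Hypothetical labels; each character is one alignment column.
table = alignment_to_tabular(
    ["seq1", "seq2", "seq3"],
    ["wol-demort", "wal-demar-", "v-ladimir-"],
)
print(table)
```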

lingpy.convert.html.psa2html(infile, **kw)

Function converts a PSA-file into colored html-format.

lingpy.convert.html.string2html(taxon, string, swaps=[], tax_len=None)

Function converts an (aligned) string into colored html-format.

Deprecated.

lingpy.convert.html.tokens2html(string, swaps=[], tax_len=None)

Function converts an (aligned) string into colored html-format.

Notes

This function is currently not used by any other part of the library and has therefore been marked as deprecated.

Deprecated.

lingpy.convert.plot module

Module provides functions for the transformation of text data into a visually appealing format.

lingpy.convert.plot.plot_concept_evolution(scenarios, tree, fileformat='pdf', degree=90, **keywords)

Plot the evolution according to the MLN method of all words for a given concept.

Parameters

tree : str

A tree representation in Newick format.

fileformat : str (default=”pdf”)

A valid fileformat according to Matplotlib.

degree : int (default=90)

The degree by which the tree is drawn. 360 yields a circular tree, 180 yields a tree filling half of the space of a circle.

lingpy.convert.plot.plot_gls(gls, treestring, degree=90, fileformat='pdf', **keywords)

Plot a gain-loss scenario for a given reference tree.

lingpy.convert.plot.plot_heatmap(wordlist, filename='heatmap', fileformat='pdf', ref='cogid', normalized=False, refB='', **keywords)

Create a heatmap-representation of shared cognates for a given wordlist.

Parameters

wordlist : lingpy.basic.wordlist.Wordlist

A Wordlist object containing cognate IDs.

filename : str (default=”heatmap”)

Name of the file to which the heatmap will be written.

fileformat : str (default=”pdf”)

A regular matplotlib-fileformat (pdf, png, pgf, svg).

ref : str (default=”cogid”)

The name of the column that contains the cognate identifiers.

normalized : {bool, str} (default=False)

If set to False, don’t normalize the data. Otherwise, select the normalization method, choose between:

  • “jaccard” for the Jaccard-distance (see Bategelj1995 for details), and

  • “swadesh” for traditional lexicostatistical calculation of shared cognate percentages.

cmap : matplotlib.cm (default=matplotlib.cm.jet)

The color scheme to be used for the heatmap.

steps : int (default=5)

The number of steps in which names of taxa will be written to the axes.

xrotation : int (default=45)

The rotation of the taxon-names on the x-axis.

colorbar : bool (default=True)

Specify whether a colorbar should be added to the plot.

figsize : tuple (default=(10,10))

Specify the size of the figure.

tree : str (default=’’)

A tree passed for the taxa in Newick-format. If no tree is specified, the method looks for a tree object in the Wordlist.

Notes

This function plots shared cognate percentages.
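
The two normalization options can be compared on a toy example. The sketch below is not the code `plot_heatmap` runs: it assumes one cognate ID per concept and language, and the helper names and the data are made up for illustration.

```python
def jaccard_distance(cogs_a, cogs_b):
    """1 - |A & B| / |A | B| for two sets of cognate IDs."""
    union = cogs_a | cogs_b
    if not union:
        return 0.0
    return 1 - len(cogs_a & cogs_b) / len(union)


def shared_percentage(lang_a, lang_b):
    """Share of concepts attested in both languages with matching cognate IDs."""
    common = [concept for concept in lang_a if concept in lang_b]
    if not common:
        return 0.0
    return sum(lang_a[c] == lang_b[c] for c in common) / len(common)


# Hypothetical data: concept -> cognate ID
german = {"hand": 1, "foot": 2, "tree": 3}
english = {"hand": 1, "foot": 2, "tree": 4}

print(jaccard_distance(set(german.values()), set(english.values())))
print(shared_percentage(german, english))
```

The Jaccard variant works on the sets of cognate IDs as a whole, whereas the lexicostatistical variant compares the two languages concept by concept, so the two values generally differ.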

lingpy.convert.plot.plot_tree(treestring, degree=90, fileformat='pdf', root='root', **keywords)

Plot a Newick tree to PDF or other graphical formats.

Parameters

treestring : str

A string in Newick format.

degree : int

Determine the degree of the tree (this determines how “circular” the tree will be).

fileformat : str (default=”pdf”)

Select the fileformat to which the tree shall be written.

filename : str

Determine the name of the file to which the data shall be written. Defaults to a timestamp.

figsize : tuple (default=(10,10))

Determine the size of the figure.

lingpy.convert.strings module

Basic functions for the conversion of Python-internal data into strings.

lingpy.convert.strings.matrix2dst(matrix, taxa=None, stamp='', filename='', taxlen=10, comment='#')

Convert matrix to dst-format.

Parameters

taxa : {None, list}

List of taxon names corresponding to the distances. Make sure that you only use alphanumeric characters and the underscore in taxon names. In particular, avoid brackets, since they confuse many phylogenetic programs.

stamp : str (default=’’)

Convenience stamp passed as a comment that can be used to indicate how the matrix was created.

filename : str

If you specify a filename, the data will be written to file.

taxlen : int (default=10)

Indicate how long the taxon names are allowed to be. The Phylip package only allows taxon names of at most 10 characters. Other packages, however, allow more. If Phylip compatibility is not important for you and you want to allow taxon names of any length, set this value to 0.

comment : str (default = ‘#’)

The comment character to be used when adding additional information in the “stamp”.

Returns

output : {str or file}

Depending on your settings, this function returns a string in DST (=Phylip) format, or a file containing the string.
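
The general shape of a Phylip-style distance file (a first line with the number of taxa, then one line per taxon with its name and distances) can be sketched as follows. The exact spacing and number formatting used by `matrix2dst` are not specified here, so the two-decimal format and the helper name `to_dst` are assumptions.

```python
def to_dst(matrix, taxa, taxlen=10):
    """Format a square distance matrix in a Phylip-style layout."""
    lines = [" {0}".format(len(taxa))]
    for taxon, row in zip(taxa, matrix):
        # taxlen=0 means: do not truncate the taxon names.
        name = taxon[:taxlen] if taxlen else taxon
        cells = " ".join("{0:.2f}".format(value) for value in row)
        lines.append("{0:<{1}} {2}".format(name, taxlen or len(name), cells))
    return "\n".join(lines)


dst = to_dst([[0.0, 0.5], [0.5, 0.0]], ["German", "English_long_name"])
print(dst)
```

With the default `taxlen` of 10, the second name is cut to `English_lo`; passing `taxlen=0` keeps it intact.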

lingpy.convert.strings.msa2str(msa, wordlist=False, comment='#', _arange='{stamp}{comment}\n{meta}{comment}\n{body}', merge=False)

Function converts an MSA object into a string.

lingpy.convert.strings.multistate2nex(taxa, matrix, filename='', missing='?')

Convert the data in a given wordlist to NEXUS-format for multistate analyses in PAUP.

Parameters

taxa : list

The list of taxa that shall be written to file.

matrix : list

The multi-state matrix with the first dimension indicating the taxa, and the second their states.

filename : str (default=””)

The name of the file to which the data will be written. If not specified, the filename of the Wordlist is used.

lingpy.convert.strings.pap2csv(taxa, paps, filename='')

Write paps created by the Wordlist class to a csv-file.

lingpy.convert.strings.pap2nex(taxa, paps, missing=0, filename='', datatype='STANDARD')

Function converts a list of paps into nexus file format.

Parameters

taxa : list

List of taxa.

paps : {list, dict}

A two-dimensional list with the first dimension being identical to the number of paps and the second dimension being identical to the number of taxa. If a dictionary is passed, each key represents a given pap. The following two structures will thus be treated identically:

>>> paps = [[1,0],[1,0],[1,0]] # two languages, three paps
>>> paps = {1:[1,0], 2:[1,0], 3:[1,0]} # two languages, three paps

missing : {str, int} (default=0)

Indicate how missing characters are represented in the original data.
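
How pap-wise input maps to the taxon-wise rows of a NEXUS matrix can be sketched as a transposition. The sketch assumes, as in the example above, that each inner list (or dictionary value) holds one pap with one state per taxon; `paps_to_matrix` is a hypothetical helper, not part of LingPy.

```python
def paps_to_matrix(taxa, paps):
    """Return one 'taxon states' row per taxon from pap-wise input."""
    if isinstance(paps, dict):
        # Use the sorted keys so that list and dict input behave alike.
        paps = [paps[key] for key in sorted(paps)]
    width = max(len(taxon) for taxon in taxa)
    rows = []
    for i, taxon in enumerate(taxa):
        # Collect the i-th state of every pap for this taxon.
        states = "".join(str(pap[i]) for pap in paps)
        rows.append("{0:<{1}} {2}".format(taxon, width, states))
    return rows


rows = paps_to_matrix(["German", "English"], {1: [1, 0], 2: [1, 0], 3: [1, 1]})
print("\n".join(rows))
```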

lingpy.convert.strings.scorer2str(scorer)

Convert a scoring function to a string.

lingpy.convert.strings.write_nexus(wordlist, mode='mrbayes', filename='mrbayes.nex', ref='cogid', missing='?', gap='-', custom=None, custom_name='lingpy', commands=None, commands_name='mrbayes')

Write a nexus file for phylogenetic analyses.

Parameters

wordlist : lingpy.basic.wordlist.Wordlist

A Wordlist object containing cognate IDs.

mode : str (default=”mrbayes”)

The name of the output nexus style. Valid values are:

  • ‘MRBAYES’: a MrBayes formatted nexus file.

  • ‘SPLITSTREE’: a SPLITSTREE formatted nexus file.

  • ‘BEAST’: a BEAST formatted nexus file.

  • ‘BEASTWORDS’: a BEAST formatted nexus for word-partitioned analyses.

  • ‘TRAITLAB’: a TRAITLab formatted nexus.

filename : str (default=”mrbayes.nex”)

Name of the file to which the nexus file will be written. If set to None, this function will not write the nexus content to a file, but simply return the content as a string.

ref : str (default=”cogid”)

Column in which you store the cognate sets in your data.

gap : str (default=”-”)

The symbol for gaps (not relevant for linguistic analyses).

missing : str (default=”?”)

The symbol for missing characters.

custom : list (default=None)

Custom information to add to the nexus file, such as the structure of the characters, their original concept, or their type. It is written into a custom block in the nexus file, whose name can be specified with the custom_name keyword. The content is a list of strings, which are written line by line into the custom block.

custom_name : str (default=”lingpy”)

The name of the custom block which will be written to the file.

commands : list (default=None)

If specified, an additional block containing commands for phylogenetic software is written. The commands are passed as a list of strings. The name of the block is given by the keyword commands_name.

commands_name : str (default=”mrbayes”)

The name of the block to which the commands will be written.

Returns

nexus : str

A string containing the nexus file output.
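
The custom and commands parameters both take a list of strings that ends up in a named block. A minimal sketch of such a block, following the general NEXUS convention of `BEGIN name; ... END;`, is shown below. The exact layout that `write_nexus` produces may differ, and the helper name `nexus_block` and the block contents are invented for illustration.

```python
def nexus_block(name, lines):
    """Wrap a list of strings in a named NEXUS block, one string per line."""
    body = "\n".join(lines)
    return "BEGIN {0};\n{1}\nEND;".format(name, body)


# Hypothetical content for a custom block named "lingpy".
block = nexus_block("lingpy", ["1 hand", "2 foot"])
print(block)
```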

lingpy.convert.tree module

Functions for tree calculations and working with trees.

lingpy.convert.tree.nwk2tree_matrix(newick)

Convert a newick file to a tree matrix.

Notes

This is an additional function that can be used for plots with the help of matplotlib’s functions. The tree_matrix is compatible with the matrices that scipy’s linkage functions create.
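
For readers unfamiliar with the layout of scipy-style linkage matrices, the sketch below builds one from a small binary tree. Each row has the form `[cluster_a, cluster_b, distance, number_of_leaves]`, and the cluster created by row `i` receives the index `n + i`, where `n` is the number of leaves. The nested-tuple input (a stand-in for a parsed Newick string), the unit-step heights, and the name `linkage_rows` are assumptions; this is not the code of `nwk2tree_matrix`.

```python
def linkage_rows(tree):
    """Build scipy-style linkage rows from a nested binary tuple of leaf names."""
    leaves = []

    def collect(node):
        # Gather the leaf names in left-to-right order.
        if isinstance(node, tuple):
            for child in node:
                collect(child)
        else:
            leaves.append(node)

    collect(tree)
    rows = []

    def merge(node):
        # Returns (cluster index, height, number of leaves) for `node`.
        if not isinstance(node, tuple):
            return leaves.index(node), 0.0, 1
        a, height_a, size_a = merge(node[0])
        b, height_b, size_b = merge(node[1])
        height = max(height_a, height_b) + 1.0  # unit heights, no branch lengths
        rows.append([float(a), float(b), height, float(size_a + size_b)])
        return len(leaves) + len(rows) - 1, height, size_a + size_b

    merge(tree)
    return leaves, rows


# Stand-in for the Newick string "((German,English),Russian);"
leaves, rows = linkage_rows((("German", "English"), "Russian"))
print(leaves)
for row in rows:
    print(row)
```

The first row merges leaves 0 and 1 into cluster 3; the second row merges that cluster with leaf 2.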

Module contents

Package provides different methods for file conversion.