lingpy.thirdparty.cogent package

Submodules

lingpy.thirdparty.cogent.newick module

Newick format with all features as per the specs at:

http://evolution.genetics.washington.edu/phylip/newick_doc.html http://evolution.genetics.washington.edu/phylip/newicktree.html

ie:

Unquoted label underscore munging Quoted labels Inner node labels Lengths [ … ] Comments (discarded) Unlabeled tips

also:

Double quotes can be used. Spaces and quote marks are OK inside unquoted labels.

exception lingpy.thirdparty.cogent.newick.TreeParseError

Bases: ValueError

lingpy.thirdparty.cogent.newick.parse_string(text, constructor, **kw)

Parses a Newick-format string, using specified constructor for tree.

Calls constructor(children, name, attributes)

Note: underscore_unmunge, if True, replaces underscores with spaces in the data that’s read in. This is part of the Newick format, but it is often useful to suppress this behavior.

lingpy.thirdparty.cogent.tree module

Classes for storing and manipulating a phylogenetic tree.

These trees can be either strictly binary, or have polytomies (multiple children to a parent node).

Trees consist of Nodes (or branches) that connect two nodes. The Tree can be created only from a newick formatted string read either from file or from a string object. Other formats will be added as time permits.

Tree can:
  • Deal with either rooted or unrooted tree’s and can convert between these types.

  • Return a sub-tree given a list of tip-names

  • Identify an edge given two tip names. This method facilitates the statistical modelling by simplyifying the syntax for specifying sub-regions of a tree.

  • Assess whether two Tree instances represent the same topology.

Definition of relevant terms or abbreviations:
  • edge: also known as a branch on a tree.

  • node: the point at which two edges meet

  • tip: a sequence or species

  • clade: all and only the nodes (including tips) that descend from a node

  • stem: the edge immediately preceeding a clade

lingpy.thirdparty.cogent.tree.LoadTree(filename=None, treestring=None, tip_names=None, underscore_unmunge=False)

Constructor for tree.

Arguments, use only one of:
  • filename: a file containing a newick or xml formatted tree.

  • treestring: a newick or xml formatted tree string.

  • tip_names: a list of tip names.

Notes

Underscore_unmunging is turned off by default, although it is part of the Newick format. Set underscore_unmunge to True to replace underscores with spaces in all names read.

class lingpy.thirdparty.cogent.tree.PhyloNode(*args, **kwargs)

Bases: lingpy.thirdparty.cogent.tree.TreeNode

property Length
balanced()

Tree ‘rooted’ here with no neighbour having > 50% of the edges.

Notes

Using a balanced tree can substantially improve performance of the likelihood calculations. Note that the resulting tree has a different orientation with the effect that specifying clades or stems for model parameterisation should be done using the ‘outgroup_name’ argument.

bifurcating(constructor=None)
compareByPartitions(other, debug=False)
distance(other)

Returns branch length between self and other.

getDistances(endpoints=None)

The distance matrix as a dictionary.

Usage:

Grabs the branch lengths (evolutionary distances) as a complete matrix (i.e. a,b and b,a).

getNewick(with_distances=False, semicolon=True, escape_name=True)

Return the newick string for this tree.

Arguments:
  • with_distances: whether branch lengths are included.

  • semicolon: end tree string with a semicolon

  • escape_name: if any of these characters []’”(),:;_ exist in a

    nodes name, wrap the name in single quotes

NOTE: This method returns the Newick representation of this node and its descendents. This method is a modification of an implementation by Zongzhi Liu

prune()

Reconstructs correct tree after nodes have been removed.

Internal nodes with only one child will be removed and new connections and Branch lengths will be made to reflect change.

rootAtMidpoint()

return a new tree rooted at midpoint of the two tips farthest apart

this fn doesn’t preserve the internal node naming or structure, but does keep tip to tip distances correct. uses unrootedDeepcopy()

rootedAt(edge_name)

Return a new tree rooted at the provided node.

Usage:

This can be useful for drawing unrooted trees with an orientation that reflects knowledge of the true root location.

rootedWithTip(outgroup_name)

A new tree with the named tip as one of the root’s children

sameTopology(other)

Tests whether two trees have the same topology.

scaleBranchLengths(max_length=100, ultrametric=False)

Scales BranchLengths in place to integers for ascii output.

Warning: tree might not be exactly the length you specify.

Set ultrametric=True if you want all the root-tip distances to end up precisely the same.

setTipDistances()

Sets distance from each node to the most distant tip.

tipToTipDistances(endpoints=None, default_length=1)

Returns distance matrix between all pairs of tips, and a tip order.

Warning: .__start and .__stop added to self and its descendants.

tip_order contains the actual node objects, not their names (may be confusing in some cases).

totalDescendingBranchLength()

Returns total descending branch length from self

unrooted()

A tree with at least 3 children at the root.

unrootedDeepcopy(constructor=None, parent=None)
class lingpy.thirdparty.cogent.tree.TreeBuilder(mutable=False, constructor=<class 'lingpy.thirdparty.cogent.tree.PhyloNode'>)

Bases: object

createEdge(children, name, params, nameLoaded=True)

Callback for newick parser

edgeFromEdge(edge, children, params=None)

Callback for tree-to-tree transforms like getSubTree

exception lingpy.thirdparty.cogent.tree.TreeError

Bases: Exception

class lingpy.thirdparty.cogent.tree.TreeNode(Name=None, Children=None, Parent=None, Params=None, NameLoaded=True, **kwargs)

Bases: object

Store information about a tree node. Mutable.

Parameters:

Name: label for the node, assumed to be unique. Children: list of the node’s children. Params: dict containing arbitrary parameters for the node. NameLoaded: ?

property Parent

Accessor for parent.

If using an algorithm that accesses Parent a lot, it will be much faster to access self._parent directly, but don’t do it if mutating self._parent! (or, if you must, remember to clean up the refs).

ancestors()

Returns all ancestors back to the root. Dynamically calculated.

append(i)

Appends i to self.Children, in-place, cleaning up refs.

asciiArt(show_internal=True, compact=False, labels=False)

Returns a string containing an ascii drawing of the tree.

Parameters

show_internal: bool :

includes internal edge names.

compact: bool :

use exactly one line per tip.

labels: {bool, list} :

specify specific labels for all nodes in the tree.

Notes

The labels-keyword was added to the function by JML.

childGroups()

Returns list containing lists of children sharing a state.

In other words, returns runs of tip and nontip children.

compareByNames(other)

Equality test for trees by name

compareBySubsets(other, exclude_absent_taxa=False)

Returns fraction of overlapping subsets where self and other differ.

Other is expected to be a tree object compatible with PhyloNode.

Notes

Names present in only one of the two trees will count as mismatches: if you don’t want this behavior, strip out the non-matching tips first.

compareName(other)

Compares TreeNode by name

copy(memo=None, _nil=[], constructor='ignored')

Returns a copy of self using an iterative approach

copyRecursive(memo=None, _nil=[], constructor='ignored')

Returns copy of self’s structure, including shallow copy of attrs.

constructor is ignored; required to support old tree unit tests.

copyTopology(constructor=None)

Copies only the topology and labels of a tree, not any extra data.

Useful when you want another copy of the tree with the same structure and labels, but want to e.g. assign different branch lengths and environments. Does not use deepcopy from the copy module, so _much_ faster than the copy() method.

deepcopy(memo=None, _nil=[], constructor='ignored')

Returns a copy of self using an iterative approach

descendantArray(tip_list=None)

Returns numpy array with nodes in rows and descendants in columns.

A value of 1 indicates that the decendant is a descendant of that node/ A value of 0 indicates that it is not

Also returns a list of nodes in the same order as they are listed in the array.

tip_list is a list of the names of the tips that will be considered, in the order they will appear as columns in the final array. Internal nodes will appear as rows in preorder traversal order.

extend(items)

Extends self.Children by items, in-place, cleaning up refs.

getConnectingEdges(name1, name2)

returns a list of edges connecting two nodes

includes self and other in the list

getConnectingNode(name1, name2)

Finds the last common ancestor of the two named edges.

getDistances(endpoints=None)

The distance matrix as a dictionary.

Usage:

Grabs the branch lengths (evolutionary distances) as a complete matrix (i.e. a,b and b,a).

getEdgeNames(tip1name, tip2name, getclade, getstem, outgroup_name=None)

Return the list of stem and/or sub tree (clade) edge name(s). This is done by finding the common intersection, and then getting the list of names. If the clade traverses the root, then use the outgroup_name argument to ensure valid specification.

Arguments:
  • tip1/2name: edge 1/2 names

  • getstem: whether the name of the clade stem edge is returned.

  • getclade: whether the names of the edges within the clade are returned

  • outgroup_name: if provided the calculation is done on a version of the tree re-rooted relative to the provided tip.

Usage:

The returned list can be used to specify subtrees for special parameterisation. For instance, say you want to allow the primates to have a different value of a particular parameter. In this case, provide the results of this method to the parameter controller method setParamRule() along with the parameter name etc..

getEdgeVector()

Collect the list of edges in postfix order

getMaxTipTipDistance()

Returns the max tip tip distance between any pair of tips

Returns (dist, tip_names, internal_node)

getNewick(with_distances=False, semicolon=True, escape_name=True)

Return the newick string for this tree.

Arguments:
  • with_distances: whether branch lengths are included.

  • semicolon: end tree string with a semicolon

  • escape_name: if any of these characters []’”(),:;_ exist in a

    nodes name, wrap the name in single quotes

NOTE: This method returns the Newick representation of this node and its descendents. This method is a modification of an implementation by Zongzhi Liu

getNewickRecursive(with_distances=False, semicolon=True, escape_name=True)

Return the newick string for this edge.

Arguments:
  • with_distances: whether branch lengths are included.

  • semicolon: end tree string with a semicolon

  • escape_name: if any of these characters []’”(),:;_ exist in a

    nodes name, wrap the name in single quotes

getNodeMatchingName(name)
getNodeNames(includeself=True, tipsonly=False)

Return a list of edges from this edge - may or may not include self. This node (or first connection) will be the first, and then they will be listed in the natural traverse order.

getNodesDict()

Returns a dict keyed by node name, value is node

Will raise TreeError if non-unique names are encountered

getParamValue(param, edge)

returns the parameter value for named edge

getSubTree(name_list, ignore_missing=False, keep_root=False)

A new instance of a sub tree that contains all the otus that are listed in name_list.

ignore_missing: if False, getSubTree will raise a ValueError if name_list contains names that aren’t nodes in the tree

keep_root: if False, the root of the subtree will be the last common ancestor of all nodes kept in the subtree. Root to tip distance is then (possibly) different from the original tree If True, the root to tip distance remains constant, but root may only have one child node.

getTipNames(includeself=False)

return the list of the names of all tips contained by this edge

get_LCA(*nodes)

Find lowest common ancestor of a given number of nodes.

Notes

This function is supposed to yield the same output as lowestCommonAncestor does. It was added in order to overcome certain problems in the original function, resulting from attributes added to a PhyloNode-object that make the use at time unsecure. Furthermore, it works with an arbitrary list of nodes (including tips and internal nodes).

indexInParent()

Returns index of self in parent.

insert(index, i)

Inserts an item at specified position in self.Children.

isRoot()

Returns True if the current is a root, i.e. has no parent.

isTip()

Returns True if the current node is a tip, i.e. has no children.

isroot()

Returns True if root of a tree, i.e. no parent.

istip()

Returns True if is tip, i.e. no children.

iterNontips(include_self=False)

Iterates over nontips descended from self, [] if none.

include_self, if True (default is False), will return the current node as part of the list of nontips if it is a nontip.

iterTips(include_self=False)

Iterates over tips descended from self, [] if self is a tip.

lastCommonAncestor(other)

Finds last common ancestor of self and other, or None.

Always tests by identity.

lca(other)

Finds last common ancestor of self and other, or None.

Always tests by identity.

levelorder(include_self=True)

Performs levelorder iteration over tree

lowestCommonAncestor(tipnames)

Lowest common ancestor for a list of tipnames

This should be around O(H sqrt(n)), where H is height and n is the number of tips passed in.

makeTreeArray(dec_list=None)

Makes an array with nodes in rows and descendants in columns.

A value of 1 indicates that the decendant is a descendant of that node/ A value of 0 indicates that it is not

also returns a list of nodes in the same order as they are listed in the array

maxTipTipDistance()

returns the max distance between any pair of tips

Also returns the tip names that it is between as a tuple

nameUnnamedNodes()

sets the Data property of unnamed nodes to an arbitrary value

Internal nodes are often unnamed and so this function assigns a value for referencing.

nonTipChildren()

Returns direct children in self that have descendants.

nontips(include_self=False)

Returns nontips descended from self.

pop(index=- 1)

Returns and deletes child of self at index (default: -1)

postorder(include_self=True)

Performs postorder iteration over tree.

This is somewhat inelegant compared to saving the node and its index on the stack, but is 30% faster in the average case and 3x faster in the worst case (for a comb tree).

Zongzhi Liu’s slower but more compact version is:

def postorder_zongzhi(self):

stack = [[self, 0]] while stack:

curr, child_idx = stack[-1] if child_idx < len(curr.Children):

stack[-1][1] += 1 stack.append([curr.Children[child_idx], 0])

else:

yield stack.pop()[0]

pre_and_postorder(include_self=True)

Performs iteration over tree, visiting node before and after.

preorder(include_self=True)

Performs preorder iteration over tree.

prune()

Reconstructs correct topology after nodes have been removed.

Internal nodes with only one child will be removed and new connections will be made to reflect change.

reassignNames(mapping, nodes=None)

Reassigns node names based on a mapping dict

mapping : dict, old_name -> new_name nodes : specific nodes for renaming (such as just tips, etc…)

remove(target)

Removes node by name instead of identity.

Returns True if node was present, False otherwise.

removeNode(target)

Removes node by identity instead of value.

Returns True if node was present, False otherwise.

root()

Returns root of the tree self is in. Dynamically calculated.

sameShape(other)

Ignores lengths and order, so trees should be sorted first

separation(other)

Returns number of edges separating self and other.

setMaxTipTipDistance()

Propagate tip distance information up the tree

This method was originally implemented by Julia Goodrich with the intent of being able to determine max tip to tip distances between nodes on large trees efficiently. The code has been modified to track the specific tips the distance is between

setParamValue(param, edge, value)

set’s the value for param at named edge

siblings()

Returns all nodes that are children of the same parent as self.

Notes

Excludes self from the list. Dynamically calculated.

sorted(sort_order=[])

An equivalent tree sorted into a standard order. If this is not specified then alphabetical order is used. At each node starting from root, the algorithm will try to put the descendant which contains the lowest scoring tip on the left.

subset()

Returns set of names that descend from specified node

subsets()

Returns all sets of names that come from specified node and its kids

tipChildren()

Returns direct children of self that are tips.

tips(include_self=False)

Returns tips descended from self, [] if self is a tip.

traverse(self_before=True, self_after=False, include_self=True)

Returns iterator over descendants. Iterative: safe for large trees.

Notes

self_before includes each node before its descendants if True. self_after includes each node after its descendants if True. include_self includes the initial node if True.

self_before and self_after are independent. If neither is True, only terminal nodes will be returned.

Note that if self is terminal, it will only be included once even if self_before and self_after are both True.

This is a depth-first traversal. Since the trees are not binary, preorder and postorder traversals are possible, but inorder traversals would depend on the data in the tree and are not handled here.

traverse_recursive(self_before=True, self_after=False, include_self=True)

Returns iterator over descendants. IMPORTANT: read notes below.

Notes

traverse_recursive is slower than traverse, and can lead to stack errors. However, you _must_ use traverse_recursive if you plan to modify the tree topology as you walk over it (e.g. in post-order), because the iterative methods use their own stack that is not updated if you alter the tree.

self_before includes each node before its descendants if True. self_after includes each node after its descendants if True. include_self includes the initial node if True.

self_before and self_after are independent. If neither is True, only terminal nodes will be returned.

Note that if self is terminal, it will only be included once even if self_before and self_after are both True.

This is a depth-first traversal. Since the trees are not binary, preorder and postorder traversals are possible, but inorder traversals would depend on the data in the tree and are not handled here.

writeToFile(filename, with_distances=True, format=None)

Save the tree to filename

Arguments:
  • filename: self-evident

  • with_distances: whether branch lengths are included in string.

  • format: default is newick, xml is alternate. Argument overrides the filename suffix. All attributes are saved in the xml format.

lingpy.thirdparty.cogent.tree.cmp(a, b)
lingpy.thirdparty.cogent.tree.comb(items, n=None)

Yields each successive combination of n items.

items: a slicable sequence. n: number of items in each combination This version from Raymond Hettinger, 2006/03/23

Module contents

Simple py3-port of PyCogent’s (http://pycogent.sourceforge.net) Tree classes.