The PhyloPandas DataFrame
The PhyloPandas DataFrame is the core data structure in this package. It defines a set of columns (or grammar) for phylogenetic data. Defining such a grammar has two advantages: 1) we can leverage powerful, interactive visualization tools like Vega, and 2) we standardize phylogenetic data in a familiar format.
Columns of a PhyloPandas DataFrame
When reading sequence data, the following information will be stored on the dataframe.
sequence
: DNA or protein sequence.
id
: user-defined label or identifier.
description
: user-defined description.
When reading tree data, the following information will be stored on the dataframe.
type
: label describing the type of node; either “leaf” or “node”.
parent
: label of parent node.
branch_length
: distance from parent node.
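To make the tree columns concrete, here is a minimal sketch that builds such a table by hand with plain pandas for the small tree ((A:1,B:2):0.5,C:3). It does not call PhyloPandas itself. The labels in `id`, the choice to leave the root's `parent` empty, and the helper `distance_to_root` are all assumptions made for illustration.

```python
import pandas as pd

# Hand-built table for the tree ((A:1,B:2):0.5,C:3).
# Column names follow the list above; the labels are invented for this sketch.
tree = pd.DataFrame(
    {
        "id": ["root", "n1", "A", "B", "C"],
        "type": ["node", "node", "leaf", "leaf", "leaf"],
        "parent": [None, "root", "n1", "n1", "root"],  # root has no parent (assumption)
        "branch_length": [0.0, 0.5, 1.0, 2.0, 3.0],
    }
)


def distance_to_root(table, label):
    """Hypothetical helper: sum branch lengths from `label` up to the root."""
    by_id = table.set_index("id")
    total = 0.0
    # Follow the `parent` column until a missing parent marks the root.
    while pd.notna(label):
        row = by_id.loc[label]
        total += float(row["branch_length"])
        label = row["parent"]
    return total


# Keep only the leaves and compute each one's distance to the root.
leaves = tree[tree["type"] == "leaf"]
distances = {leaf: distance_to_root(tree, leaf) for leaf in leaves["id"]}
print(distances)
```

Because every row stores only its parent and the distance to it, the whole tree can be recovered by following `parent` links, which is what the helper does.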
PhyloPandas indexes each sequence using a randomly generated 10-character key.
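The text does not say how these keys are generated, so the following is only a sketch of one way to produce unique random 10-character keys with the Python standard library. The alphanumeric alphabet and the function names `random_key` and `unique_keys` are assumptions, not PhyloPandas internals.

```python
import random
import string

# Assumed alphabet: ASCII letters and digits.
ALPHABET = string.ascii_letters + string.digits


def random_key(length=10):
    """Return one random key of `length` characters drawn from ALPHABET."""
    return "".join(random.choice(ALPHABET) for _ in range(length))


def unique_keys(n, length=10):
    """Return `n` distinct random keys, redrawing on the rare collision."""
    keys = set()
    while len(keys) < n:
        keys.add(random_key(length))
    return list(keys)


keys = unique_keys(5)
print(keys)
```

Keys like these carry no meaning of their own; their only job is to identify a row consistently across tables.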
If you read tree data from a PhyloPandas DataFrame that already contains sequence data, the two DataFrames are merged on the randomly generated index (unless otherwise specified).
Likewise, if you read sequence data from a PhyloPandas DataFrame that already contains tree data, the two DataFrames are merged on the same index (unless otherwise specified).
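The merge on a shared index can be sketched with plain pandas. This example does not use the PhyloPandas readers. The index keys are made-up stand-ins for the generated ones, and the outer join is an assumption chosen so that internal nodes without a sequence are kept; the join PhyloPandas actually performs may differ.

```python
import pandas as pd

# Stand-ins for the randomly generated 10-character index keys.
leaf_keys = ["aB3dE5gH7j", "Kl9mN1pQ2r"]
root_key = "Zx8cV6bN4m"

# Sequence table: one row per sequence.
sequences = pd.DataFrame(
    {"id": ["seq_A", "seq_B"], "sequence": ["ACGT", "ACGA"]},
    index=leaf_keys,
)

# Tree table: the two leaves plus an internal node that has no sequence.
tree = pd.DataFrame(
    {
        "type": ["leaf", "leaf", "node"],
        "parent": ["root", "root", None],
        "branch_length": [1.0, 2.0, 0.0],
    },
    index=leaf_keys + [root_key],
)

# Join on the index; "outer" keeps rows that appear in only one table.
combined = sequences.join(tree, how="outer")
print(combined)
```

Rows that share a key end up with both sequence and tree columns filled in, while the internal node has missing values in the sequence columns.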