augur.utils module

exception augur.utils.InvalidTreeError

Bases: Exception

Represents an error loading a phylogenetic tree from a filename.

augur.utils.annotate_parents_for_tree(tree)

Annotate each node in the given tree with its parent.

Examples

>>> import io
>>> import Bio.Phylo
>>> from augur.utils import annotate_parents_for_tree
>>> tree = Bio.Phylo.read(io.StringIO("(A, (B, C))"), "newick")
>>> not any([hasattr(node, "parent") for node in tree.find_clades()])
True
>>> tree = annotate_parents_for_tree(tree)
>>> tree.root.parent is None
True
>>> all([hasattr(node, "parent") for node in tree.find_clades()])
True

augur.utils.augur(shell=True)

Locate how to re-invoke ourselves (_this_ specific Augur).

This function returns the appropriate command to re-invoke the current Augur installation, useful for subprocess calls that need to launch Augur commands. The format of the returned command depends on the shell parameter, which mirrors the identically named parameter to subprocess.run.

Parameters:

shell (bool) – Controls the format of the returned command. When True, returns a string suitable for shell=True in subprocess.run. When False, returns a list of arguments suitable for shell=False in subprocess.run. Default is True.

Return type:

Union[str, List[str]]
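
The difference between the two return formats can be illustrated with a minimal sketch. The helper reinvoke_command below is hypothetical and is not Augur's implementation; it assumes the program can be re-launched through the running Python interpreter (sys.executable), uses a tiny inline script as a stand-in for the real program, and assumes a POSIX shell for the quoting done by shlex.join.

```python
import shlex
import subprocess
import sys


def reinvoke_command(shell=True):
    """Hypothetical helper: return a command that re-runs this interpreter.

    Follows the convention described above: one string for
    subprocess.run(..., shell=True), a list of arguments for shell=False.
    """
    argv = [sys.executable, "-c", "print('hello')"]
    if shell:
        # Quote every argument so the string survives shell parsing (POSIX).
        return shlex.join(argv)
    return argv


# Both forms launch the same process and produce the same output.
as_list = subprocess.run(
    reinvoke_command(shell=False), capture_output=True, text=True
)
as_string = subprocess.run(
    reinvoke_command(shell=True), shell=True, capture_output=True, text=True
)
```

Returning a pre-quoted string for the shell case spares callers from joining and quoting arguments themselves, which is easy to get wrong when paths contain spaces.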

augur.utils.available_cpu_cores(fallback=1)

Returns the number of CPU cores (an int) available to this process, if that can be determined; otherwise the number of CPU cores in the computer, if that can be determined; otherwise the fallback number (which defaults to 1).

Return type:

int
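
The three-step fallback described above could be sketched as follows. This is an illustration built on the standard library and is not necessarily how Augur implements it; the name cpu_cores_sketch is hypothetical.

```python
import os


def cpu_cores_sketch(fallback=1):
    """Illustrative fallback chain for counting usable CPU cores."""
    try:
        # Cores this process is allowed to run on. This call is not
        # available on every platform, hence the AttributeError guard.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Cores in the machine; os.cpu_count() may return None when the
        # count cannot be determined, in which case the fallback is used.
        return os.cpu_count() or fallback


cores = cpu_cores_sketch()
```

The per-process count can be smaller than the machine-wide count, for example inside a container or a cluster job that restricts CPU affinity.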

augur.utils.first_line(text): Returns the first line of the given text, ignoring leading and trailing whitespace.
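
A minimal sketch of the described behavior, assuming that the whitespace is stripped before the text is split into lines. The name first_line_sketch is hypothetical.

```python
def first_line_sketch(text):
    """Return the first line of text after stripping surrounding whitespace."""
    # Stripping first means leading blank lines do not count as the first line.
    return text.strip().splitlines()[0]


example = "\n\n  Usage: augur <command>\n  More detail follows.\n"
result = first_line_sketch(example)
```

Note that this sketch raises IndexError for text that is empty or contains only whitespace.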

augur.utils.genome_features_to_auspice_annotation(features, ref_seq_name=None, assert_nuc=False)

Parameters:

features (dict) – keys: feature names, values: Bio.SeqFeature.SeqFeature objects
ref_seq_name (str (optional)) – Exported as the seqid for each feature. Note this is unused by Auspice
assert_nuc (bool (optional)) – If true, one of the feature key names must be “nuc”

Returns:

annotations – See schema-annotations.json for the schema this conforms to

Return type:

dict

augur.utils.get_augur_version(): Returns a string of the current augur version.

augur.utils.get_json_name(args, default=None)

augur.utils.get_parent_name_by_child_name_for_tree(tree): Return dictionary mapping child node names to parent node names
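
The shape of such a mapping can be shown with a small, self-contained sketch. The Node class here is a hypothetical stand-in for a tree clade (a name plus a list of child clades), and the traversal is an illustration rather than Augur's code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree clade with a name and children."""
    name: str
    clades: list = field(default_factory=list)


def parent_name_by_child_name(root):
    """Map each child node name to the name of its parent."""
    mapping = {}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.clades:
            mapping[child.name] = parent.name
            stack.append(child)
    return mapping


tree = Node("root", [Node("A"), Node("internal", [Node("B"), Node("C")])])
parents = parent_name_by_child_name(tree)
```

In this sketch the root does not appear as a key because it has no parent.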

augur.utils.json_to_tree(json_dict, root=True, parent_cumulative_branch_length=None)

Returns a Bio.Phylo tree corresponding to the given JSON dictionary exported by tree_to_json.

When called on the root of the tree, also assigns each node a link back to its parent node.

Examples

Test opening a JSON from augur export v1.

>>> import json
>>> json_fh = open("tests/data/json_tree_to_nexus/flu_h3n2_ha_3y_tree.json", "r")
>>> json_dict = json.load(json_fh)
>>> tree = json_to_tree(json_dict)
>>> tree.name
'NODE_0002020'
>>> len(tree.clades)
2
>>> tree.clades[0].name
'NODE_0001489'
>>> hasattr(tree, "attr")
True
>>> "dTiter" in tree.attr
True
>>> tree.clades[0].parent.name
'NODE_0002020'
>>> tree.clades[0].branch_length > 0
True

Test opening a JSON from augur export v2.

>>> json_fh = open("tests/data/zika.json", "r")
>>> json_dict = json.load(json_fh)
>>> tree = json_to_tree(json_dict)
>>> hasattr(tree, "name")
True
>>> len(tree.clades) > 0
True
>>> tree.clades[0].branch_length > 0
True

Branch lengths should be the length of the branch to each node and not the length from the root. The cumulative branch length from the root gets its own attribute.

>>> tip = [tip for tip in tree.find_clades(terminal=True) if tip.name == "USA/2016/FLWB042"][0]
>>> round(tip.cumulative_branch_length, 6)
0.004747
>>> round(tip.branch_length, 6)
0.000186
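
The relationship between the two attributes can be sketched on plain dictionaries. This is an illustration, not Augur's implementation: it assumes each node stores its distance from the root under the hypothetical key "div" and its children under "children", and it treats the root's own branch length as its distance from zero.

```python
def assign_branch_lengths(node, parent_cumulative=None):
    """Add 'branch_length' and 'cumulative_branch_length' to each node dict."""
    cumulative = node["div"]
    node["cumulative_branch_length"] = cumulative
    # The root has no parent, so measure its branch from zero (an assumption).
    start = 0.0 if parent_cumulative is None else parent_cumulative
    # Per-branch length is the difference between this node's distance from
    # the root and its parent's distance from the root.
    node["branch_length"] = cumulative - start
    for child in node.get("children", []):
        assign_branch_lengths(child, cumulative)
    return node


example_tree = {
    "name": "root",
    "div": 0.0,
    "children": [
        {"name": "internal", "div": 0.5, "children": [{"name": "tip", "div": 0.75}]}
    ],
}
assign_branch_lengths(example_tree)
tip = example_tree["children"][0]["children"][0]
```

Here the tip ends up with a cumulative branch length of 0.75 and a branch length of 0.25, the distance from its parent at 0.5.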

augur.utils.load_features(*args, **kwargs)

augur.utils.load_mask_sites(mask_file)

Load masking sites from either a BED file or a masking file.

Parameters:

mask_file (str) – Path to the BED or masking file

Returns:

Sorted list of unique zero-indexed sites

Return type:

list of int

augur.utils.nthreads_value(value): Argument value validation and casting function for –nthreads.

augur.utils.parse_genes_argument(input)

augur.utils.read_bed_file(bed_file)

Read a BED file and return a list of excluded sites.

This function attempts to parse the given file as a BED file, based on the specification at <https://genome.ucsc.edu/FAQ/FAQformat.html#format1>, using the following rules:

BED files may start with one or more optional header lines
Header lines must begin with one of “browser”, “chrom”, or “track”; this comparison is case-insensitive. Note that “chrom” is not recognized by the above specification or by bedtools, but is included because it has historically been supported and is frequently used in the wild
Any line starting with “#” is treated as a comment line, and skipped completely
Once data (non-header) lines appear in the file, header lines are no longer allowed
Data lines have a number of fields, but we are only interested in the first three, which are mandatory in BED files: chrom, chromStart, and chromEnd. All fields beyond these first three are ignored
The values of the chromStart and chromEnd field must be integer strings
The value in the chrom field must match for all data lines; this is an Augur-specific requirement, not something that arises from the format specification

Any failure to conform to the above rules will raise an error.

Parameters:

bed_file (str) – Path to the BED file

Returns:

Sorted list of unique zero-indexed sites

Return type:

list of int
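
The parsing rules listed above can be sketched as a small function that works on lines already read from a file. This is an illustration, not Augur's implementation. The name parse_bed_lines is hypothetical, fields are split on any whitespace (an assumption), blank lines are skipped (an assumption), and each interval is treated as half-open, with chromStart included and chromEnd excluded, as in the BED format.

```python
HEADER_PREFIXES = ("browser", "chrom", "track")


def parse_bed_lines(lines):
    """Return sorted, unique, zero-indexed sites from BED-formatted lines."""
    sites = set()
    chrom = None
    seen_data = False
    for number, raw in enumerate(lines, 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines (an assumption) and comments are skipped
        if line.lower().startswith(HEADER_PREFIXES):
            if seen_data:
                raise ValueError(f"line {number}: header line after data lines")
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"line {number}: expected at least three fields")
        name, start, end = fields[:3]  # fields beyond the third are ignored
        try:
            start, end = int(start), int(end)
        except ValueError:
            raise ValueError(
                f"line {number}: chromStart and chromEnd must be integers"
            ) from None
        if chrom is None:
            chrom = name
        elif name != chrom:
            raise ValueError(f"line {number}: chrom differs from earlier lines")
        seen_data = True
        # Half-open interval: chromStart is included, chromEnd is not.
        sites.update(range(start, end))
    return sorted(sites)


bed = ["track name=mask", "# comment", "seq1\t2\t5\textra", "seq1\t4\t7"]
sites = parse_bed_lines(bed)
```

The two overlapping intervals in the example collapse into the sites 2 through 6, each listed once.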

augur.utils.read_colors(overrides=None, use_defaults=True)

augur.utils.read_entries(*files, comment_char='#')

Reads entries (one per line) from one or more plain text files.

Entries can be commented with full-line or inline comments. For example, the following is a valid file:

# this is a comment at the top of the file
strain1  # exclude strain1 because it isn't sequenced properly
strain2
  # this is an empty line that will be ignored.

Parameters:

files (iterable of str) – one or more names of text files with one entry per line

Returns:

lines from the given input files

Return type:

set
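
The comment handling can be sketched on lines held in memory. This is an illustration rather than Augur's code: the hypothetical helper parse_entries takes an iterable of lines instead of file names and assumes that everything from the first comment character onward is discarded.

```python
def parse_entries(lines, comment_char="#"):
    """Collect non-empty entries, ignoring full-line and inline comments."""
    entries = set()
    for line in lines:
        # Drop everything from the first comment character onward.
        entry = line.split(comment_char, 1)[0].strip()
        if entry:
            entries.add(entry)
    return entries


example = [
    "# this is a comment at the top of the file",
    "strain1  # exclude strain1 because it isn't sequenced properly",
    "strain2",
    "  # this is an empty line that will be ignored.",
]
entries = parse_entries(example)
```

Because the result is a set, duplicate entries across files collapse into one and the original line order is not preserved.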

augur.utils.read_lat_longs(overrides=None, use_defaults=True)

augur.utils.read_mask_file(mask_file)

Read a masking file and return a list of excluded sites.

Masking files have a single masking site per line, either alone or as the second column of a tab-separated file. These sites are assumed to be one-indexed, NOT zero-indexed. Incorrectly formatted lines will be skipped.

Parameters:

mask_file (str) – Path to the masking file

Returns:

Sorted list of unique zero-indexed sites

Return type:

list of int
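
The conversion from one-indexed input to zero-indexed output can be sketched as follows. This is an illustration, not Augur's implementation; the hypothetical helper parse_mask_lines works on lines already read from a file.

```python
def parse_mask_lines(lines):
    """Return sorted, unique, zero-indexed sites from masking-file lines."""
    sites = set()
    for line in lines:
        fields = line.strip().split("\t")
        # A lone value is the site; otherwise the site is the second column.
        value = fields[0] if len(fields) == 1 else fields[1]
        try:
            sites.add(int(value) - 1)  # one-indexed -> zero-indexed
        except ValueError:
            continue  # incorrectly formatted lines are skipped
    return sorted(sites)


mask_lines = ["5", "seq1\t3", "not a number", "5", ""]
mask_sites = parse_mask_lines(mask_lines)
```

In the example, the one-indexed sites 5 and 3 become the zero-indexed sites 4 and 2, and the malformed and empty lines are ignored.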

augur.utils.read_node_data(fnames, tree=None, validation_mode=ValidationMode.ERROR)

augur.utils.read_tree(fname, min_terminals=3)

Safely load a tree from a given filename or raise an error if the file does not contain a valid tree.

Parameters:

fname (str) – name of a file containing a phylogenetic tree
min_terminals (int) – minimum number of terminals required for the parsed tree as a sanity check on the tree

Raises:

InvalidTreeError – If the given file exists but does not seem to contain a valid tree format.

Returns:

BioPython tree instance

Return type:

Bio.Phylo.BaseTree.Tree

augur.utils.write_augur_json(data, file)

Write data as JSON to the given file with Augur version info.

This is a simplified wrapper around write_json() that adds the Augur version as a top-level key “generated_by”.

Unlike write_json(), output is not minified by default to preserve backwards-compatible behavior for commands that produce node data JSONs. Minification can be forced via the AUGUR_MINIFY_JSON environment variable.
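
The behavior described above can be sketched with the standard json module. This is an illustration, not Augur's implementation. The helper dumps_with_version is hypothetical, the shape of the “generated_by” value is an assumption, and so is the rule that any non-empty AUGUR_MINIFY_JSON value forces minification.

```python
import json
import os


def dumps_with_version(data, version, minify=None):
    """Hypothetical helper: serialize data with a 'generated_by' key."""
    # The structure of the 'generated_by' value is assumed for illustration.
    payload = {"generated_by": {"program": "augur", "version": version}, **data}
    if minify is None:
        # Assumption: any non-empty AUGUR_MINIFY_JSON value forces minification.
        minify = bool(os.environ.get("AUGUR_MINIFY_JSON"))
    if minify:
        # Compact separators remove all optional whitespace.
        return json.dumps(payload, separators=(",", ":"))
    return json.dumps(payload, indent=2)


# "0.0.0" is a placeholder version string, not a real Augur release.
pretty = dumps_with_version({"nodes": {}}, "0.0.0", minify=False)
compact = dumps_with_version({"nodes": {}}, "0.0.0", minify=True)
```

Both strings decode to the same data; only the whitespace differs.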

augur.utils.write_json(*args, **kwargs)