4.1.5. Generating and analyzing a hopping graph — hop.graph

Interprete the high density sites as graph (‘transport graph’), with the sites as vertices and transitions (sampled by the simulation) as edges. The graph is directed.

Each edge (transition) is decorated with the dominant transition rate, the number of events seen, and an instance of fit_func, which represents the fitted function to the survival times.

Each vertex (site) is decorated with the average residency time (and stdev, N).

Typical use of the module:

TN = TransportNetwork(hoppingTrajectory,density)
hopgraph = TN.HoppingGraph()
hopgraph.save('hopgraph')

The basic object is the hop.graph.HoppingGraph; see its documentation for further analysis methods.

4.1.5.1. Classes and functions

class hop.graph.CombinedGraph(g0=None, g1=None, filename=None)

Hybrid graph between hop graphs that share common nodes.

equivalent_sites_stats(graphnumber, elabels, equivalence=True)

Print statistics about one or a list of equivalence sites for the numbered graph.

CombinedGraph.equivalent_sites_stats(graphnumber,elabels)

Arguments:

graphnumber index into CombinedGraph.graphs (typically, 0 or 1) elabels single label or list of labels of equivalence sites

(without a ‘*’ if the default identifier is used)
equivalence True: interprete elabels as equivalence labels
False: elabels are labels local to the graph (as used
in the output of this method)
export(igraph, filename=None, format='XGMML', imageformat=None, use_filtered_graph=True)

Layout the combined graph and highlight the chosen graph.

h.export(igraph=0)

Arguments:

graph 0 or 1, selects which graph is to be highlighted filename name for the output files; appropriate suffixes are added

automatically

format XGMML or dot imageformat graphics output format (png, jpg, ps, svg, … see below) use_filtered_graph

By default, the filtered graph (see the filter() method) is plotted. If set to False then the original HoppingGraph is used instead.

Common nodes are always highlighted in red and shown with the common label. Nodes and edges belonging to the selected graph are shown in black; the other graph is only shown in light gray.

The graph is only written to an image file if an image format is supplied. See https://networkx.lanl.gov/reference/pygraphviz/pygraphviz.agraph.AGraph-class.html#draw for possible output formats but png, jpg, ps are safe bets.

Format:
XGMML http://www.cs.rpi.edu/~puninj/XGMML/draft-xgmml.html#Intro and
GML http://www.infosun.fim.uni-passau.de/Graphlet/GML/

dot See http://graphviz.org/doc/info/attrs.html for attributes.

Note: On Mac OS X 10.3.9+fink the pygraphviz rendering is buggy and does not include node labels. Simply use the exported .dot file and use Mac OS X graphviz from http://www.pixelglow.com/graphviz/

export3D(**kwargs)

Export pdb and psf file for visualization in 3D.

>>> h.export3D()
Uses h.site_properties if it exists.
>>> h.export3D(density)
Uses a (hopefully matching) Density object to pull in site_properties.
Arguments:

density hop.sitemap.Density with full site_properties filename prefix for output files: <filename>.psf and <filename>.pdb use_filtered_graph

define a filtered graph with h.filter() first

The method writes a psf and a pdb file from the graph, suitable for visualization in, for instance, VMD.

Sites are represented as residues of resname ‘NOD’; each site is marked by one ‘ATOM’ (of type CA) at the center of geometry of the site. Edges are bonds between those pseudo atoms.

#Currently: B-factor 1 if common site label exist, 0 otherwis # occupancy: avg site occupancy # (but this should become customizable)

One should use a filtered graph with the bulk site removed for visualization.

Bugs: * with a filtered graph, the degree is the one of the filtered

graph and not of the real underlying graph
  • cannot yet select what to display in B-factor and occupancy field: choose from: [‘identity’,’occupancy’,’degree’,’volume’]
is_connected(igraph, n1, n2)

Return True if nodes n1 and n2 in graph igraph are connected.

load(filename=None)

Reinstantiate CombinedGraph from a pickled CombinedGraph (from save()).

plot(igraph, filename=None, format='png', use_filtered_graph=True, label_sites=None, prog='neato', cmap=None, max_node_size=500, interactive=True, **drawargs)

Plot filtered graph using matplotlib.

Arguments:

igraph number of the graph (0 or 1) filename file to write to format any format that matplotlib allows and pdf use_filtered_graph

use a previously defined filtered graph (should be True)
label_sites {‘all’:False, ‘common’:True, ‘none’:False} switches that determine
which labels to add to the nodes
prog layout program, can be any of the graphviz programs
‘dot’,’neato’,’twopi’,’circo’,’fdp’,’nop’
cmap matplotlib color map: nodes are colored by distance of the site from
the geometric center of all sites (excluding bulk)

max_node_size maximum node size (in point**2, q.v. matplotlib.scatter()) interactive True: display graph. False: only save to file (eg if no X11) **drawargs additional keyword arguments to networkx.draw() (q.v.)

eg ‘linewidths=(0.01,)’ for vanishing outlines.
plot_fits(**kwargs)

Plot survival time fit against data.

plot_fits(ncol=2)

The time values are taken to cover all measured tau.

ncol number of columns nrow number of rows per page plottype ‘linear’ or ‘log’ dt time step in ps; use value in self.trjdata[‘dt’] or 1ps use_filtered_graph

True: use the filtered graph (see filter()), False: use raw data.

directory save all pdf files under this directory format file format for plot (png,eps,pdf… depends on matplotlib) interactive False: do not display graphs on scren (default)

True: show graphs on screen, can be slow and probably requires ipython as your python shell

verbosity chattiness level

All N graphs are laid out in nrow x ncol grids on as many pages/figures as necessary.

The pages are written as eps/pdf files using a fixed filename in the given directory (‘survival_times’ by default).

site_properties

site_properties of the combined graph, indexed by node label.

stats(igraph, data=None)

Statistics for the hopping graph.

d = stats(igraph,[data=dict])

Without the data argument, the method just returns some interesting values gathered from the graph igraph and the density. If a data dictionary is given, then the raw data are loaded into the dict and can be processed further by histogramming etc.

Arguments:

igraph number of the graph data optional dictionary to hold raw data for

processing; modified by method
Returns:

d dictionary with expressive keys, holding the results

tabulate_k(**kwargs)

List of tuples (from, to, rate (in 1/ns), number of transitions).

class hop.graph.HoppingGraph(graph=None, properties=None, filename=None, trjdata=None, site_properties=None)

A directed graph that describes the average movement of molecules between different well-defined sites by treating the sites as nodes and transitions as edges.

Attributes:
graph
graph with edges; edges contain rates, fit functions, etc
properties
raw data for edges
trjdata
metadata of the original trajectory
site_properties
density-derived node properties, imported from hop.sitemap.Density
theta
dict of nodes with residence times (see compute_site_times())
occupancy_avg
average occupancy with standard deviation (see compute_site_occupancy())
occupancy_std
(numpy array)
Methods:
compute_site_occupancy()
Computes occupancies from the residency times theta and updates self.occupancy_avg and self.occupancy_std.
compute_site_times()
Computes residency time theta.
save()
save graph as a pickled file
load()
reinstantiate graph from saved file; typically just use the constructor with the filename argument
filter()
make a filtered graph for further analysis and visualization; most plot/export functions require a filtered graph
plot_fits()
plot fits of the survival time against the data
tabulate_k()
table of rate constants
export()
export graph as a dot file that can be used with graphviz
export3D()
export graph as a psf/pdb file combination for visualization in VMD

Properties for nodes are always stored as numpy arrays so that one can directly index with the node label (==site label), which is an integer from 0 to the number of nodes. Note that 0 is the interstitial (and only contains bogus data or None), and 1 is the bulk. The bulk site is often excluded from analysis because it is different in nature from the ‘real’ sites defined as high density regions.

Directed graph with edges containing the rate k_ji,number of observations and S(t) fit.

h = HoppingGraph(graph,properties) h = HoppingGraph(filename=’HoppingGraph.pickle’)
Arguments:
graph
networkx graph with nodes (i) and edges (i,j)
properties
dictionary of edges: For each edge e, properties contains a dictionary, which contains under the key ‘tau’ a list of observed waiting times tau_ji. nodes are also listed if they do not participate in a transition
trjdata

dictionary describing properties of the trajectory such as time step ‘dt’ or name of ‘dcd’ and ‘psf’.

Attributes that are in use:
dt
time between saved snapshots in ps
hoppsf
hopping trajectory psf file name
hopdcd
hopping trajectory dcd file name
density
pickle file of the density with the sites
totaltime
length of trajectory in ps[*]_
Not used:
time_unit
‘ps’
site_properties
list of site properties: hop.sitemap.Density.site_properties (add if you want graphs with mapped labels) (Really required for most things…!)

When the graph is built from edges and properties then the rate constants are calculated. For graphs with many hopping events this can take a long time (hours…).

The decorated and directed graph is accessible as HoppingGraph.graph

BUGS:
  • trjdata is required for full functionality but it is currently the user’s responsibility to fill it appropriately (although TransportNetwork.compute_residency_times() already adds some data)
  • site_properties are required and must be added with the constructor
compute_site_occupancy()

Computes occupancies from the residency times theta and updates self.occupancy.

compute_site_occupancy()

occupancy::
N_i
o[i] = 1/T Sum theta[i,k]
k=1

where T is the total trajectory time and the sum runs over all residency times that were recorded for the site i.

attributes:

self.occupancy_avg
numpy array with occupancies, site label == index
self.occupancy_std
numpy array with error estimates for occupancies (Delta = Delta(theta)/T; this is a biased estimate because Delta(theta) is calculated with N instead of N-1)
compute_site_times(verbosity=3)

Compute the ‘residency’ time of each water molecule on each site.

compute_site_times()

The ‘life time’ of a site i is computed as

theta[i] = <t[*,i]>

where t[*,i] stands for all waiting times t[j,i] for hops from site i to all other sites AND the waiting times t[i] of molecules that are not observed to leave site i.

  • The function updates self.theta[site] for each site with an

array of residency times (in ps).

  • The life times are stored in self.lifetime_avg[site] and self.lifetime_std[site]

TODO: * Maybe use the barrier time as well (or a portion thereof,

perhaps proportional to the barrier height (related to the kji) — rate theory??)
connectedness(n)

Return values that measure connectedness (can be used in occupancy field)

equivalent_sites_stats(elabels, equivalence=True)

Statistics about one or a list of equivalence sites.

g.equivalent_sites_stats(elabels,equivalence=True)

Arguments:

elabels a single label or a list of node labels equivalence True: interprete elabels as ‘equivalence labels’, i.e. the label

attached to a site common to two densities False: elabels are labels local to the graph
export(filename=None, format='XGMML', use_filtered_graph=True, use_mapped_labels=True)

Export the graph to a graph format or an image.

export(‘hopgraph’,format=’XGMML’,use_filtered_graph=True)

Arguments:
filename name for the output files; appropriate suffixes are added
automatically

format output format: graphs (XGMML or DOT) or image (png, jpg, ps, svg) use_filtered_graph

By default, the filtered graph (see the filter() method) is plotted. If set to False then the original HoppingGraph is used instead.
use_mapped_labels
If site_properties is provided then each node that has been identified to exist in a reference network is coloured black and the mapped label is printed instead of the graph label.
export3D(density=None, filename=None, use_filtered_graph=True)

Export pdb and psf file for visualization in 3D.

>>> h.export3D()
Uses h.site_properties if it exists.
>>> h.export3D(density)
Uses a (hopefully matching) Density object to pull in site_properties.
Arguments:

density hop.sitemap.Density with full site_properties filename prefix for output files: <filename>.psf and <filename>.pdb use_filtered_graph

define a filtered graph with h.filter() first

The method writes a psf and a pdb file from the graph, suitable for visualization in, for instance, VMD.

Sites are represented as residues of resname ‘NOD’; each site is marked by one ‘ATOM’ (of type CA) at the center of geometry of the site. Edges are bonds between those pseudo atoms.

#Currently: B-factor 1 if common site label exist, 0 otherwis # occupancy: avg site occupancy # (but this should become customizable)

One should use a filtered graph with the bulk site removed for visualization.

Bugs: * with a filtered graph, the degree is the one of the filtered

graph and not of the real underlying graph
  • cannot yet select what to display in B-factor and occupancy field: choose from: [‘identity’,’occupancy’,’degree’,’volume’]
filename(filename=None, ext=None, set_default=False, use_my_ext=False)

Supply a file name for the object.

fn = filename() —> <default_filename> fn = filename(‘name.ext’) —> ‘name’ fn = filename(ext=’pickle’) —> <default_filename>’.pickle’ fn = filename(‘name.inp’,’pdf’) –> ‘name.pdf’ fn = filename(‘foo.pdf’,ext=’png’,use_my_ext=True) –> ‘foo.pdf’

The returned filename is stripped of the extension (use_my_ext=False) and if provided, another extension is appended. Chooses a default if no filename is given. Raises a ValueError exception if no default file name is known.

If set_default=True then the default filename is also set.

use_my_ext=True lets the suffix of a provided filename take priority over a default ext(tension).

filter(exclude=None)

Create a filtered version of the graph.

For looking at most things: >>> h.filter(exclude={‘outliers’:True})

For looking at exchange rates and plotting: >>> h.filter(exclude={‘outliers’:True, ‘Nmin’:5, ‘unconnected’:True})

For export3D do not use the bulk site: >>> h.filter(exclude={‘outliers’:True,’bulk’:True})

This method makes a copy of the hopping graph and applies the filter rules to the copy. Other output functions use this copy if it exists.

exclude dict of components to exclude. May contain
{‘outliers’:True, ‘Nmin’:integer,
‘bulk’: True, ‘unconnected’:True}

If outliers == True then all edges from the ‘outlier’ node are deleted previous to displaying the graph. Those edges correspond to particles starting in a region not covered by the intial histogram boundaries and enter a mapped site at a later point in time.

With Nmin, any node that has fewer than Nmin transition is discarded.

unconnected == True finaly filters all nodes that have no edges left

from_site(edge)

Returns the originating site of hop.

internal_sites()

Returns list of sites that have no connection to the bulk.

is_connected(n1, n2)

True if node n1 has any connection to the site n2.

is_from_bulk(edge)

True if the edge originated in the bulk.

is_internal(n)

True if site n has no connection to the bulk.

is_isolated(n)

True if site n has no connections to other sites (ie its degree equals 0).

isolated_sites()

Returns list of sites that have no other connections.

load(filename=None)

Reinstantiate HoppingGraph from a pickled HoppingGraph (from save()).

number_of_hops(edge)

Number of transitions recorded.

plot_fits(ncol=2, nrow=3, dt=None, plottype='log', use_filtered_graph=True, directory='survival_times', format='png', interactive=False, verbosity=3)

Plot survival time fit against data.

plot_fits(ncol=2)

The time values are taken to cover all measured tau.

ncol number of columns nrow number of rows per page plottype ‘linear’ or ‘log’ dt time step in ps; use value in self.trjdata[‘dt’] or 1ps use_filtered_graph

True: use the filtered graph (see filter()), False: use raw data.

directory save all pdf files under this directory format file format for plot (png,eps,pdf… depends on matplotlib) interactive False: do not display graphs on scren (default)

True: show graphs on screen, can be slow and probably requires ipython as your python shell

verbosity chattiness level

All N graphs are laid out in nrow x ncol grids on as many pages/figures as necessary.

The pages are written as eps/pdf files using a fixed filename in the given directory (‘survival_times’ by default).

rate(edge)

Returns the fastest rate on an edge, in ns^-1

rates(n, use_filtered_graph=True)

Returns k_tot, k_in, k_out (and N_*) for site n (bulk rates omitted from k).

dictionary = rates(n,use_filtered=True)

k_in = sum_j k_nj (> 0) j<>bulk k_out = sum_j k_jn (< 0) j<>bulk k_tot = k_in + k_out

Note that k_tot should be ~0 if a bulk rate is included because the graph should obey detailed balance.

save(filename=None)

Save HoppingGraph as a pickled python object.

select_graph(use_filtered_graph)

Returns filtered graph for True argument, or the raw graph otherwise)

show_rates(filename=None)

Print the rates (in 1/ns) between sites, and the total number of observations.

show_rates(file=filename)

By default, prints to stdout but if file = filename then filename is opened and data are written to the file.

A description of the fit function used to obtain the rate is also printed in the last column.

Only the “dominant” rate is shown; see the fit_func description for cases when two rates were computed.

show_site(sites, use_filtered_graph=True)

Display data about sites (list of site labels or single site).

show_total_rates(use_filtered_graph=True)

Display total rates for all nodes (excluding bulk –> site contributions).

site_properties

Site_properties, indexed by node label. Setting this attribut also updates self.equivalent_sites_index.

stats(data=None)

Statistics for the hopping graph.

stats([data=dict]) –> dict

Without the data argument, the method just returns some interesting values gathered from the graph and the density. If a data dictionary is given, then the raw data are loaded into the dict and can be processed further by histogramming etc.

Arguments:
data

optional dictionary to hold raw data for processing; modified by method

Returns:

dictionary with expressive keys, holding the results

tabulate_k()

List of tuples (from, to, rate (in 1/ns), number of transitions).

to_site(edge)

Returns the site to which a hop is directed.

waitingtime_fit(edge)

Returns the fit function for the edge’s waiting time distribution.

write_psf(graph, props, filename=None)

Pseudo psf with nodes as atoms and edges as bonds

class hop.graph.TransportNetwork(traj, density=None, sitelabels=None)

A framework for computing graphs from hopping trajectories.

The unit of time is ps.

The TransportNetwork is an intermediate data structure that is mainly used in order to build a HoppingGraph with the TransportNetwork.HoppingGraph() method.

Setup a transport graph from a hopping trajectory instance.

::
hops = hop.trajectory.HoppingTrajectory(hopdcd=’whop.dcd’,hoppsf=’whop.psf’) tn = TransportNetwork(hops)
HoppingGraph(verbosity=3)

Compute the HoppingGraph from the data and return it.

compute_site_occupancy()

Computes occupancies from the residency times theta and updates self.occupancy.

occupancy::
N_i
o[i] = 1/T Sum theta[i,k]
k=1

where T is the total trajectory time and the sum runs over all residency times that were recorded for the site i.

Attributes:
self.occupancy
numpy array with occupancies, site label == index
self.occupancy_error
numpy array with error estimates for occupancies (Delta = Delta(theta)/T; this is a biased estimate because Delta(theta) is calculated with N instead of N-1)
compute_site_times(verbosity=3)

Compute the ‘residency’ time of each water molecule on each site.

The ‘site time’ of a site i is computed as:

theta[i] = 1/T_sim ( Sum_j tau[j,i] + Sum tau[i] )

tau[j,i] is the waiting time for hops from i to j. tau[i] is the waiting time for molecules that are not observed to leave site i.

The function updates self.theta[site] for each site with an array of residency times (in ps).

It uses the residency times and thus requires compute_residency_times() was run previously.

TODO:Maybe use the barrier time as well (or a portion thereof, perhaps proportional to the barrier height (related to the kji) — rate theory??)
export(filename=None, format='png', exclude_outliers=False)

Export the graph as dot file and as an image.

export(filename)

See https://networkx.lanl.gov/reference/pygraphviz/pygraphviz.agraph.AGraph-class.html#draw for possible output formats.

See http://graphviz.org/doc/info/attrs.html for attributes.

Note: On Mac OS X 10.3.9+fink the pygraphviz rendering is buggy and does not include node labels. Simply use the exported .dot file and use Mac OS X graphviz from http://www.pixelglow.com/graphviz/

graph_alltransitions()

Constructs the graph that contains all transitions in the trajectory.

Populates TransportGraph.graph with a graph that contains all sites and one edge for each transition that was observed in the trajectory. Useful for an initial appraisal of the complexity of the problem.

Warning

Erases any previous contents of graph.

plot_residency_times(filename, bins=None, exclude_outliers=True)

Plot histograms of all sites

plot_residency_times(‘sitetime.eps’)

pylab always writes the figure to the named file. If pylab is already running, display the graph with pylab.show().

The histograms are normalized and the time values are the left edges of the bins.

If bins=None then the number of bins is determined heuristically.

plot_site_occupancy(filename, bins=10, exclude_sites=[0, 1])

Plot site occupancy (from compute_site_occupancy().

plot_site_occupancy(filename,exclude_sites=[0,1])

filename name of output file bins bins for histogram (see numpy.histogram) exclude_site list of site labels which are NOT plotted. Typically,

exclude interstitial and bulk.
hop.graph.Unitstep(x, x0)
Heaviside step function
/ 1 if x >= x0
Unitstep(x,x0) == Theta(x - x0) = { 0.5 if x == x0
0 if x < x0

This is a numpy ufunc.

CAVEAT:If both x and x0 are arrays of length > 1 then weird things are going to happen because of broadcasting. Using nD arrays can also lead to surprising results.
See also:http://mathworld.wolfram.com/HeavisideStepFunction.html
class hop.graph.fitExp(x, y)

y = f(x) = exp(-p[0]*x)

f_factory()

Stub for fit function factory, which returns the fit function. Override for derived classes.

initial_values()

List of initital guesses for all parameters p[]

class hop.graph.fitExp2(x, y)

y = f(x) = p[0]*exp(-p[1]*x) + (1-p[0])*exp(-p[2]*x)

f_factory()

Stub for fit function factory, which returns the fit function. Override for derived classes.

initial_values()

List of initital guesses for all parameters p[]

class hop.graph.fit_func(x, y)

Fit a function f to data (x,y) using the method of least squares.

Attributes:

parameters list of parameters of the fit

f_factory()

Stub for fit function factory, which returns the fit function. Override for derived classes.

fit(x)

Applies the fit to all x values

initial_values()

List of initital guesses for all parameters p[]

class hop.graph.fitlin(x, y)

y = f(x) = p[0]*x + p[1]

f_factory()

Stub for fit function factory, which returns the fit function. Override for derived classes.

initial_values()

List of initital guesses for all parameters p[]

hop.graph.survivalfunction(waitingtimes, block_w=200, block_t=1000)

Returns the survival function S(t), defined by a list of waiting times.

survival([t0, t1, …, tN]) –> S(t)

S(t) is a function that gives the fractional number of particles that have not yet left the site after time t. It is 1 at t=0 and decays to 0.

Arguments:
waitingtimes

sequence of the waiting times from the simulations

block_w

reduce memory consumption by working on chunks of the waiting times of size <block_w>; reduce block_w if the code crashes with :Exception:`MemoryError`.

block_t

chunk input function arguments into blocks of size block_t

TODO:

Make S(t) an interpolation function: massive speedup and fewer memory problems