4.1.1. Defining solvation sites — hop.sitemap

Histogram positions of particles from a MD trajectory on a grid. Calculate the density, change units (both of the grid and of the density), save the density, export into 3D visualization formats, manipulate the density as a numpy array.

class hop.sitemap.Density(grid=None, edges=None, filename=None, dxfile=None, parameters=None, unit=None, metadata=None)

Class with an annotated density, i.e. additional information for each grid cell. Adds information about sites to the grid. A ‘site’ consists of all connected grid cells with a density >= threshold.

A site is defined as a set of at least ‘MINsite’ grid cells with density >= threshold that are located in each others’ first and second nearest neighbour shell (of 26 cells, on the cubic lattice). A site is labelled by an integer 1..N. The interstitial is labelled ‘0’. By default, a site may consist of a single grid cell (MINsite == 1) but this can be changed by setting the parameter MINsite to another number >1.

When neither grid nor edges are given then the density object can also be read from a pickled file (filename) or a OpenDX file (dxfile). In the latter case, care should be taken to properly set up the units and the isDensity parameter:

>>> g = Density(dxfile='bulk.dx',parameters={'isDensity':True,'MINsite':1},
                unit={'length':'Angstrom','density':'Angstrom^{-3}'}, ....)

Attributes:

grid density on a grid edges the lower and upper edges of the grid cells along the

three dimensions of the grid

map grid with cells labeled as sites (after label_sites()) sites list of sites: site 0 is the interstitial, then follows

the largest site, and then sites in decreasing order. Each site is a list of tuples. Each tuple is the index (i,j,k) into the map or grid.

graph NetworkX graph of the cells

unit physical units of various components P (default) values of parameters

Methods:

map_sites(threshold)
label all sites, defined by the threshold. The threshold value is stored with the object as the default. The default can be explicitly set as P[‘threshold’]

save(filename) save object.pickle load(filename) restore object.pickle (or use d=Density(filename=<filename>)) export() write density to a file for visualization export_map() write individual sites

Adds information about sites to the grid. Sites are all cells with a density >= threshold.

density = Density(kargs**)

Sets up a Grid with additional data, namely the site map The threshold is given as key-value pair in the parameters dictionary and is assumed to be in the same units as the density.

If the input grid is a histogram then it is transformed into a density.

When neither grid nor edges are given then the density object can also be read from a pickled file (filename) or a OpenDX file (dxfile). In the latter case, care should be taken to properly set up the units and the isDensity parameter if the dx file is a density:

>>> g = Density(dxfile='bulk.dx',parameters={'isDensity':True},
            unit={'length':'Angstrom','density':'Angstrom^{-3}'}, ....)
export3D(filename=None, site_labels='default')

Export pdb and psf file of site centres for interactive visualization.

>>> density.export3D()
Arguments:
filename prefix for output files:
<filename>.psf, <filename>.pdb, and <filename>.vmd

site_labels selects sites (See site_labels())

The method writes a psf and a pdb file from the site map, suitable for visualization in, for instance, VMD. In addition, a VMD tcl file is produced. When it is sourced in VMD then the psf and pdb are loaded and site labels are shown next to the sites.

Sites are represented as residues of resname ‘NOD’; each site is marked by one ‘ATOM’ (of type CA) at the center of geometry of the site.

Bulk and interstitial are always filtered from the list of sites because they do not have a well defined center.

export_map(labels='default', format='dx', directory=None, value='density', combined=False, verbosity=3)

Write sites as a density file for visualization.

export_map(**kwargs)

labels=’default’ Select the sites that should be exported. Can be
a list of numbers (site labels) or one of the keywords recognized by site_labels() (qv). The interstitial is always excluded.
combined=False True: write one file. False: write one file for each
site.

format=’dx’ Only dx format supported directory=’site_maps’

Files are created in new directory, ‘site_maps’ by default. File names are generated and indexed with the label of the site. By default, ‘site_maps’ is located in the same directory as the default filename.
value= ‘density’ Writes the actual density in the site.
‘threshold’ The densities have the threshold value wherever the
site is defined. Note that the interstitial (label = 0) is also written.

<float> Writes the value <float> into the site.

verbosity=3 Set to 0 to disable status messages.

Quick hack to write out sites. Each site can be written as a separate density file (combined=False) so that one can distinguish them easily in say VMD. Display with

vmd site_maps/*.dx
find_equivalence_sites_with(reference, fmt='%d*', update_reference=True, use_ref_equivalencesites=False, verbosity=0, equivalence_graph='equivalence_graph.png')

Find overlapping sites with a reference density and update site descriptions.

Density.find_equivalence_sites_with(ref)

Arguments:

ref a Density object defined on the same grid fmt python format string used for the equivalent_name, which should

contain %d for the reference label number (max 10 chars) (but see below for magical use of xray water names)
update_reference
True (default): Also update the site_properties in the reference so that one can make graphs that highlight the common sites. (This is recommended.) False: don’t change the reference
use_ref_equivalencesites
True: use sites + equivalence sites from the reference density False*: remove all equivalence sites als from the ref density
verbosity For verbosity >= 3 output some statistics; verbosity >=5 also
returns the equivalence graph for analysis; verbosity >= 7 displays the graph (and saves to equivalence_graph.png).

An ‘equivalence site’ is a site that contains all sites that overlap in real space with another site in the reference density. This also means that two or more sites in one density can become considered equivalent if they both overlap with a larger site in the other density, and it is also possible that one creates ‘equivalence’ chains (0,a) <-> (1,b) <-> (0,c) <-> (1,d) (although (0,a) ~<-> (1,d), and by construction (0,a) ~<-> (0,c) and (1,b) ~<-> (1,d)), leading to extensive equivalence sites.

When hopping properties are computed, an equivalence site is used instead of the individual sub sites.

The equivalence sites themselves are constructed as new sites and added to the list of sites; their site numbers are constructed by adding to the total number of existing sites. Sub-sites are marked up by an entry of the equivalence site’s site number in site_properties.equivalence_site.

The common sites are consecutively numbered, starting at 2, from the one containing most sites to the one with fewest.

The method updates Density.site_properties.equivalent_name with the new descriptor of the equivalent site. Equivalent site names are consecutively numbered, starting at 2, and can be optionally formatted with the fmt argument.

However, if the reference density was built from an X-ray density AND if each site corresponds to single X-ray water molecule then the equivalence names contain the water identifiers eg ‘W136’ or ‘W20_W34_W36’.

See the hop.sitemap.find_common_sites() function for more details.

has_bulk()

Returns True if a bulk site has been inserted and False otherwise.

map_hilo(lomin=0.0, lomax=0.5, himin=2.72)

Experimental mapping of low density sites together with high density ones.

Keywords:
lomin

low-density sites must have a density > lomin [0.0]

lomax

low-density sites must have a density < lomax [0.5]

himin

high-density sites must have a density > himin [2.72]

map_sites(threshold=None)

Find regions of connected density and label them consecutively

map_sites([threshold=<threshold>])

threshold Use the given threshold to generate the graph; the threshold
is assumed to be in the same units as the density. (This updates the Density object’s threshold value as well.)

The interstitial has label ‘0’, the largest connected subgraph has ‘1’ etc. The sites (i.e.the list of indices into map/grid) can be accesed as Density.sites[label].

masked_density(density, site_labels)

Returns only that portion of density that corresponds to sites; everything else is zeroed.

masked = masked_density(density,sites)

Arguments:

density a array commensurate with the map site_labels label or list of site labels

Results:

Returns numpy array of same shape as input with non-site cells zeroed.

remove_equivalence_sites()

Delete equivalence sites and recompute site map.

site_insert_bulk(bulkdensity, bulklabel=1, force=False)

Insert a bulk site from a different density map as bulk site into this density.

site_insert_bulk(bulkdensity)

This is a bit of a hack. The idea is that one can use a site from a different map (computed from the same trajectory with the same grid!) and insert it into the current site map to define a different functional region. Typically, the bulk site is the largest site in bulkdensity (and has site label 1) but if this is not the case manually choose the appropriate bulklabel.

The site is always inserted as the bulk site in the current density.

Example: >>> bulkdensity = hop.interactive.make_density(psf,dcd,’bulk’,delta=1.0,

atomselection=’name OH2 and not within 4.0 of protein’)
>>> bulkdensity.map_sites(threshold=0.6)
>>> density.site_insert_bulk(bulkdensity)
>>> density.save()
>>> del bulkdensity
site_insert_nobulk()

Insert an empty bulk site for cases when this is convenient.

site_labels(include='default', exclude='default')

Return a list of site labels, possibly filtered.

L = site_labels(include=<inclusions>,exclude=<exclusions>)

<inclusions> and <exclusions> consist of a list of site labels (integers) and/or keywords that describe a site selection. All entries in one list are logically ORed. All exclusions are then removed from the inclusions and the final list of site labels is returned as a numpy array. (As a special case, the argument need not be a list but can be a single keyword or site label).

For convenience, some inclusions such as ‘subsites’ and ‘equivalencesites’ automatically remove themselves from the exclusions.

For standard use the defaults should do what you expect, i.e. only see the sites that are relevant or that have been mapped in a hopping trajectory.

Set verbosity to 10 in order to see the parsed selection.

<inclusions>
‘all’ all mapped sites, including bulk and subsites of
equivalent sites (but read the NOTE below: set exclude=None)
‘default’ all mapped sites, including bulk but excluding subsites
and interstitial
‘sites’ all mapped sites, excluding bulk and interstitial
(removes ‘subsites’ and ‘equivalencesites’ from exclusions)

‘subsites’ all sites that have been renamed or aggreated into equivalence sites ‘equivalencesites’

only the equivalence sites

int, list site label(s)

<exclusions>
‘default’ equivalent to [‘interstitial’,’subsites’]; always applied unless
exludsions=None is set!

None do not apply any exclusions ‘interstitial’

exclude interstitial (almost no reason to ever include it)
‘subsites’ exclude sites that have been aggregated or
simply renamed as equivalence sites
‘equivalencesites’
exclude equivalence sites (and possibly include subsites)

‘bulk’ exclude the bulk site

Provides the ordered list L of site labels, excluding sites listed in the exclude list. Site labels are integers, starting from ‘0’ (the interstitial). These labels are the index into the site_properties[] and sites[] arrays.

NOTE that by default the standard exclusions are already being applied to any ‘include’; if one really wants all sites one has to set exclude=None.

Exclusions are applied _after_ inclusions.

‘site’ discards the bulk site, self.P[‘bulk_site’]; this parameter is automatically set when adding the bulk site with site_insert_bulk().

See find_equivalence_sites_with() for more on equivalence sites and subsites.

site_occupancy(**labelargs)

Returns the labels and the average/stdev occupancy of the labeled site(s).

labels, <N>, std(N) = site_coocupancy(include=’all’ | <int> | <list>)

Average occupancy is the average number of water molecules on the site i:

<N_i> = <n_i> * V_i

where n_i is the average density of the site and V_i its volume.

The label selection arguments are directly passed to site_labels() (see doc string).

If the interstitial is included then 0,0 is returned for the interstitial site (so ignore those numbers).

site_remove_bulk(force=False)

Cleanup bulk site.

site_volume(**labelargs)

Returns the label(s) and volume(s) of the selected sites.

labels, volumes = site_volume(‘all’)

The volume is calculated in the unit set in unit[‘length’]. The label selection arguments are directly passed to site_labels() (see doc string).

The volume of the interstitial (if included) is returned as 0 (which is not correct but for technical reasons more convenient).

stats(data=None)

Statistics for the density (excludes bulk, interstitial, subsites).

d = stats([data=dict])

subsites_of(equivsites, kind='sitelabel')

Return subsites of given equivalence sites as a dict.

dict <– subsites_of(equivsites,kind=’sitelabel’)

The dict is indexed by equivsite label. There is one list of subsites for each equivsitelabel.

kind ‘sitelabel’: equivsites are the sitelabels as uses internally; this is
the default because site_labels() returns these numbers and so one can directly use the output from site_labels() as input (see example)
‘equivlabel’: equivsites are treated as labels of equivalence sites;
these are integers N that typically start at 2
‘name’: equivsites are treated as strings that are given as names
to sites; the default settings produce something like ‘N*’

EXAMPLES:

dens.subsites_of(dens.site_labels(‘equivalencesites’)) dens.subsites_of([2,5,10], kind=’equivsites’) dens.subsites_of(‘10*’, kind=’name’)

NOTE: * equivlabel == 0 is silently filtered (it is used as a merker for NO equivalence

site)
  • empty equivalence sites show up as empty entries in the output dict; typically this means that one gave the wrong input or kind
class hop.sitemap.Grid(grid=None, edges=None, filename=None, dxfile=None, parameters=None, unit=None, metadata=None)

Class to manage a multidimensional grid object.

The grid (Grid.grid) can be manipulated as a standard numpy array. Changes can be saved to a file using the save() method. The grid can be restored using the load() method or by supplying the filename to the constructor.

The attribute Grid.metadata holds a user-defined dictionary that can be used to annotate the data. It is saved with save().

The export(format=’dx’) method always exports a 3D object, the rest should work for an array of any dimension.

Create a Grid object from data.

From a numpy.histogramdd():
g = Grid(grid,edges)
From files (created with Grid.save(<filename>):
g = Grid(filename=<filename>)
From a dx file:
g = Grid(dxfile=<dxfile>)

Arguments:

grid histogram or density and … edges list of arrays, the lower and upper bin edges along the axes

(both are output by numpy.histogramdd())
filename file name of a pickled Grid instance (created with
Grid.save(filename))

dxfile OpenDX file parameters dictionary of class parameters; saved with save()

isDensity False: grid is a histogram with counts,
True: a density. Applying Grid.make_density() sets it to True.
unit dict(length=’Angstrom’, density=None)
length: physical unit of grid edges (Angstrom or nm) density: unit of the density if isDensity == True or None
metadata a user defined dictionary of arbitrary values
associated with the density; the class does not touch metadata[] but stores it with save()

Returns: g a Grid object

If the input histogram consists of counts per cell then the make_density() method converts the grid to a physical density. For a probability density, divide it by grid.sum() or use normed=True right away in histogramdd().

If grid, edges, AND filename are given then the extension-stripped filename is stored as the default filename.

NOTE:

  • It is suggested to construct the Grid object from a histogram, to supply the appropriate length unit, and to use make_density() to obtain a density. This ensures that the length- and the density unit correspond to each other.

TODO: * arg list is still messy * probability density not supported as a unit

centers()

Returns the coordinates of the centers of all grid cells as an iterator.

convert_density(unit='Angstrom^{-3}')

Convert the density to the physical units given by unit

unit can be one of the following:

name description of the unit
Angstrom^{-3} particles/A**3
nm^{-3} particles/nm**3
SPC density of SPC water at standard conditions
TIP3P … see MDAnalysis.units.water
TIP4P … see MDAnalysis.units.water
water density of real water at standard conditions (0.997 g/cm**3)
Molar mol/l
Note: (1) This only works if the initial length unit is provided.
  1. Conversions always go back to unity so there can be rounding and floating point artifacts for multiple conversions.

There may be some undesirable cross-interactions with convert_length…

convert_length(unit='Angstrom')

Convert Grid object to the new unit:

unit Angstrom, nm

This changes the edges but will not change the density; it is the user’s responsibility to supply the appropriate unit if the Grid object is constructed from a density. It is suggested to start from a histogram and a length unit and use make_density().

export(filename=None, format='dx')

export density to file using the given format; use ‘dx’ for visualization.

export(filename=<filename>,format=<format>)

The <filename> can be omitted if a default file name already exists for the object (e.g. if it was loaded from a file or it was saved before.) Do not supply the filename extension. The correct one will be added by the method.

The default format for export() is ‘dx’.

Only implemented formats:

dx OpenDX (WRITE ONLY) python pickle (use Grid.load(filename) to restore); Grid.save()

is simpler than export(format=’python’).
importdx(dxfile)

Initializes Grid from a OpenDX file.

make_density()

Convert the grid (a histogram, counts in a cell) to a density (counts/volume).

make_density()

Note: (1) This changes the grid irrevocably.
  1. For a probability density, manually divide by grid.sum().
hop.sitemap.find_common_sites(a, b, use_equivalencesites=None)

Find sites that overlap in space in Density a and b.

m = find_common_sites(a,b)

Arguments:

a Density instance b Density instance

Returns:

array of mappings between sites in a and b that overlap m[:,0] site labels in a m[:,1] site labels in b dict(m) translates labels in a to labels in b dict(m[:,[1,0]])

translates labels in b to labels in a
hop.sitemap.find_overlap_coeff(a, b)

Find sites that overlap in space in Density a and b.

m = find_overlap_coeff(a,b)

Arguments:

a Density instance b Density instance

Returns:
array sites in a and b that overlap
and array of probability of overlap for overlapped sites

m[:,0] site labels in a m[:,1] site labels in b oc amount of overlap

hop.sitemap.remap_density(density, ref, verbosity=0)

Transform a Density object to a grid given by a reference Density.

>>> newdensity = remap_density(old,ref)

The user is repsonsible to guarantee that: * the grid spacing is the same in both densities * the grids only differ by a translation, not a rotation

Arguments:

old Density object with site map ref reference Density object that provides the new grid shape verbosity=0 increase to up to 3 for status and diagnostic messages

Returns:
newdensity Density object with old’s density and site map transformed

to ref’s coordinate system. It is now possible to manipulate newdensity’s and ref’s arrays (grid and map) together, e.g.

>>> common = (newdensity.map > 1 & ref.map > 1)
>>> pairs = newdensity.map[common], ref.map[common]

Note that this function is not well implemented at the moment and can take a considerable amount of time on bigger grids (100x100x100 take about 3 Min).

An implicit assumption is that the two coordinate systems for the two grids are parallel and are only offset by a translation. This cannot be checked based on the available data and must be guaranteed by the user. RMS-fitting the trajectories is sufficient for this to hold.

BUGS:

  • This is not a good way to do the remapping: It requires parallel coordinate systems and the exact same delta.
  • It is slow.
  • It would be much better to interpolate density on the reference grid,
hop.sitemap.unique_tuplelist(x)

Sort a list of tuples and remove all values None