4.1.2. Generating densities from trajectories — hop.density

As an input a trajectory is required that

  1. Has been centered on the protein of interest.
  2. Has all molecules made whole that have been broken across periodic boundaries.
  3. Has the solvent molecules remap so that they are closest to the solute (this is important when using funky unit cells such as dodechedra or truncated octahedra).

4.1.2.1. Classes and functions

class hop.density.BfactorDensityCreator(pdb, delta=1.0, atomselection='resname HOH and name O', metadata=None, padding=1.0, sigma=None)

Create a density grid from a pdb file using MDAnalysis.

dens = BfactorDensityCreator(psf,pdb,…).PDBDensity()

The main purpose of this function is to convert crystal waters in an X-ray structure into a density so that one can compare the experimental density with the one from molecular dynamics trajectories. Because a pdb is a single snapshot, the density is estimated by placing Gaussians of width sigma at the position of all selected atoms.

Sigma can be fixed or taken from the B-factor field, in which case sigma is taken as sqrt(3.*B/8.)/pi.

TODO:

  • Make Gaussian convolution more efficient (at least for same sigma) because right now it is VERY slow (which may be acceptable if one only runs this once)
  • Using a temporary Creator class with the PDBDensity() helper method is clumsy (but was chosen as to keep the PDBDensity class clean and __init__ compatible with Density).

Construct the density from psf and pdb and the atomselection.

pdb : str
PDB file or MDAnalysis.Universe;
atomselection : str
selection string (MDAnalysis syntax) for the species to be analyzed
delta : float
bin size for the density grid in Angstroem (same in x,y,z) [1.0]
metadata : dict
dictionary of additional data to be saved with the object
padding : float
increase histogram dimensions by padding (on top of initial box size)
sigma : float
width (in Angstrom) of the gaussians that are used to build up the density; if None (the default) then uses B-factors from pdb

For assigning X-ray waters to MD densities one might have to use a sigma of about 0.5 A to obtain a well-defined and resolved x-ray water density that can be easily matched to a broader density distribution.

The following creates the density with the B-factors from the pdb file:

DC = BfactorDensityCreator(pdb, delta=1.0, atomselection="name HOH",
                           padding=2, sigma=None)
density = DC.Density()

density_from_PDB() for a convenience function

PDBDensity(threshold=None)

Returns a PDBDensity object.

The PDBDensity is a Density with a xray2psf translation table; it has also got an empty bulk site inserted (so that any further analysis which assumes that site number 1 is the bulk) does not discard a valid site.

threshold Use the given threshold to generate the graph; the threshold
is assumed to be in the same units as the density. None: choose defaults (1.0 if bfactors were used, 1.3 otherwise)
class hop.density.DensityCollector(name, universe, **kwargs)

Collect subsequent coordinate frames to build up a Density.

class hop.density.PDBDensity(grid=None, edges=None, filename=None, dxfile=None, parameters=None, unit=None, metadata=None)

Density with additional information about original crystal structure.

This is simply the Density class (see below) enhanced by the add_xray2psf(), W(), and Wequiv() methods.

Note that later analysis often ignores the site with the bulknumber by default so one should (after computing a site map) also insert an empty bulk site:

# canonical way to build a PDBDensity # (builds the sitepa at threshold and inserts a pseudo bulk site) xray = BfactorDensityCreator(…).PDBDensity(threshold)

# rebuild site map xray.map_sites(threshold) # map sites at density cutoff threshold xray.site_insert_nobulk() # insert ‘fake’ bulk site at position SITELABEL[‘bulk’]

# find X-ray waters that correspond to a site in another density Y: # (1) build the list of equivalence sites, using the x-ray density as reference Y.find_equivalence_sites(xray) # also updates equiv-sites in xray! # (2) look at the matches in xray xray.Wequiv() TODO: not working yet

Density Class

Class with an annotated density, i.e. additional information

for each grid cell. Adds information about sites to the grid. A ‘site’ consists of all connected grid cells with a density >= threshold.

A site is defined as a set of at least ‘MINsite’ grid cells with density >= threshold that are located in each others’ first and second nearest neighbour shell (of 26 cells, on the cubic lattice). A site is labelled by an integer 1..N. The interstitial is labelled ‘0’. By default, a site may consist of a single grid cell (MINsite == 1) but this can be changed by setting the parameter MINsite to another number >1.

When neither grid nor edges are given then the density object can also be read from a pickled file (filename) or a OpenDX file (dxfile). In the latter case, care should be taken to properly set up the units and the isDensity parameter:

>>> g = Density(dxfile='bulk.dx',parameters={'isDensity':True,'MINsite':1},
                unit={'length':'Angstrom','density':'Angstrom^{-3}'}, ....)

Attributes:

grid density on a grid edges the lower and upper edges of the grid cells along the

three dimensions of the grid

map grid with cells labeled as sites (after label_sites()) sites list of sites: site 0 is the interstitial, then follows

the largest site, and then sites in decreasing order. Each site is a list of tuples. Each tuple is the index (i,j,k) into the map or grid.

graph NetworkX graph of the cells

unit physical units of various components P (default) values of parameters

Methods:

map_sites(threshold)
label all sites, defined by the threshold. The threshold value is stored with the object as the default. The default can be explicitly set as P[‘threshold’]

save(filename) save object.pickle load(filename) restore object.pickle (or use d=Density(filename=<filename>)) export() write density to a file for visualization export_map() write individual sites

Adds information about sites to the grid. Sites are all cells with a density >= threshold.

density = Density(kargs**)

Sets up a Grid with additional data, namely the site map The threshold is given as key-value pair in the parameters dictionary and is assumed to be in the same units as the density.

If the input grid is a histogram then it is transformed into a density.

When neither grid nor edges are given then the density object can also be read from a pickled file (filename) or a OpenDX file (dxfile). In the latter case, care should be taken to properly set up the units and the isDensity parameter if the dx file is a density:

>>> g = Density(dxfile='bulk.dx',parameters={'isDensity':True},
            unit={'length':'Angstrom','density':'Angstrom^{-3}'}, ....)
W(N, returntype='auto', format=False)

Returns the resid of water N.

If returntype == ‘psf’ then N is interpreted as the resid in the x-ray crystal structure (or original pdb file) and a resid N’ in the psf is returned.

If returntype == ‘xray’ then N is a resid in the psf and the corresponding crystal structure water is returned. This is useful to label water molecules by their published identifier, eg ‘W128’.

If the returntype is set to ‘auto’ and N starts with a W (eg ‘W128’) then it is assumed to be a crystal water and the returntype is automatically set to psf, otherwise it acts like ‘xray’.

Arguments:

N resid of molecule (can be an iterable) returntype ‘auto’ | ‘psf’ | ‘xray’ format False: return a integer number

True: default string (either “WN’” for x-ray or “#N’” for psf) python format string: if the string contains %(resid)d then the string

will be used as a format, otherwise the bare number is returned without raising an error
Wequiv(format=True)

Return a list of the PDB resids of the equivalent sites.

array = Wequiv(format=True)

format True: array of identifiers ‘Wnn’
False: array of integers string: python format string; %(resid)d is replaced
add_xray2psf(pdbfile, regex='\\s*W\\s*|HOH|WAT|.*TIP.*|.*SPC.*')

Add translation table between sequential psf numbering and original pdb numbering for water.

D.add_xray2psf(pdbfilename)

The original pdb is read and all water molecules are sequentially mapped to the water molecules in the psf (without any checks). The pdb is read and analyzed using Bio.PDB.

pdbfilename Original crystallographic pdb file regex extended regular expression to detect water residues

equivalence_sites(format=True)

All equivalence sites (if defined) together with crystallographic water labels.

recarray <– equivalence_sites(self,format=True)

The numpy.recarray has columns
equivalence_label the integer label of the equivalence site equivalence_name the name, a string xray the identifier of the X-ray water

equivalence_label and equivalence_name are identical between the densities from which the equivalence sites were computed. The xray identifier is specific for the structure; by default it is a string such as ‘W135’.

format True: print ‘W<N>’ identifier
False: integer <N> (see W() for more possibilities)

BUG: THIS IS NOT WORKING AS THOUGHT BECAUSE THERE IS NO 1-1 MAPPING BETWEEN WATER MOLECULES AND SITES AND BECAUSE SITES ARE NOT NUMBERED IN THE SAME ORDER AS THE WATER MOLECULES

TODO: The proper way to do this is to find all water molecules within a cutoff of each grid cell that belongs to a site and then store all the waters as the string name of the site.

site2resid(sitelabel)

Returns the resid of the particle that provided the density for the site.

site_insert_nobulk()

Insert an empty bulk site for cases when this is convenient.

hop.density.density_from_Universe(*args, **kwargs)

Create a hop.sitemap.Density from a :class:`Universe.

See also

MDAnalysis.analysis.density.density_from_Universe() for all parameters and density_from_trajectory() for a convenience wrapper.

hop.density.density_from_trajectory(*args, **kwargs)

Create a density grid from a trajectory.

density_from_trajectory(PSF, DCD, delta=1.0, atomselection=’name OH2’, …) –> density

or

density_from_trajectory(PDB, XTC, delta=1.0, atomselection=’name OH2’, …) –> density
Arguments:
psf/pdb/gro

topology file

dcd/xtc/trr/pdb

trajectory; if reading a single PDB file it is sufficient to just provide it once as a single argument

Keywords:
atomselection

selection string (MDAnalysis syntax) for the species to be analyzed [“name OH2”]

delta

approximate bin size for the density grid in Angstroem (same in x,y,z) (It is slightly adjusted when the box length is not an integer multiple of delta.) [1.0]

metadata

dictionary of additional data to be saved with the object

padding

increase histogram dimensions by padding (on top of initial box size) in Angstroem [2.0]

soluteselection

MDAnalysis selection for the solute, e.g. “protein” [None]

cutoff

With cutoff, select ‘<atomsel> NOT WITHIN <cutoff> OF <soluteselection>’ (Special routines that are faster than the standard AROUND selection) [0]

verbosity: int

level of chattiness; 0 is silent, 3 is verbose [3]

Returns:

hop.sitemap.Density

TODO:
  • Should be able to also set skip and start/stop for data collection.

Note

  • In order to calculate the bulk density, use

    atomselection=’name OH2’,soluteselection=’protein and not name H*’,cutoff=3.5

    This will select water oxygens not within 3.5 A of the protein heavy atoms. Alternatively, use the VMD-based density_from_volmap() function.

  • The histogramming grid is determined by the initial frames min and max.

  • metadata will be populated with psf, dcd, and a few other items. This allows more compact downstream processing.

See also

docs for MDAnalysis.analysis.density.density_from_Universe() (defaults for kwargs are defined there).

hop.density.print_combined_equivalence_sites(target, reference)

Tabulate equivalence sites of target against the reference.

BUG: THIS IS NOT WORKING (because the assignment sites <–> waters is broken)