Exercise #5: Obtaining H-bonding pairs and microstates data
In this exercise, we will run a custom simulation on 4LZT and process its microstates ensemble to produce hydrogen-bond networks using MCCE4.
0. Pre-requisite:
Ensure you have the conda environment for mc4 activated.
conda activate mc4
Ensure you have installed MCCE4-Tools. If not, please follow these steps.
Tool: ms_hbnets
This tool processes a file in the ms_out subfolder of a simulation (a.k.a. ‘the msout file’), for H-bonding microstates and H-bonding pairs, outputting their count and occupancies in their respective files, e.g. hb_pairs_pHeH.csv and hb_states_pHeH.csv.
-
Definitions:
- H-bonding pairs: The pairs of Hydrogen donor and acceptor conformers
- H-bonding microstates: The Monte Carlo sampling microstates that contain any of the structural Hydrogen-bonding pairs
Help on the tool: ms_hbnets -h
usage: ms_hbnets
ms_hbnets --load_states # to get hb states instead of pairs
ms_hbnets -ph 5
ms_hbnets -n_states 30000
ms_hbnets --run_checks # + -mcce_dir, -ph, -eh if needed; all other options: ignored
Gather the H-bonding conformer pairs and states occupancies
from the microstates file given a mcce dir, pH & Eh.
options:
-h, --help show this help message and exit
-mcce_dir MCCE_DIR MCCE run directory; Default: .
-ph PH Titration pH; Default: 7
-eh EH Titration Eh; Default: 0
--load_states Include to load the H-bonding states instead of the H-bonding pairs (default)
-n_states N_STATES Number of H-bonding states to return, possibly (no effect without --load_states); Default: 25000
--run_checks Perform checks on main outputs; Default: False
-v, --verbose To ouput more details; Default: False
1. Prepare the directory:
Enter the working directory for this exercise:
cd mcce_workflows
mkdir ex5; cd ex5
2. Get the pdb file & run p_info:
getpdb 4lzt
p_info 4lzt.pdb
3. Run a full simulation at pH7:
run_mcce4 4lzt.pdb -initial 7 -n 1 --ms
4. Run detect_hbonds (another tool in MCCE4-Tools):
ms_hbnets uses the output of detect_hbonds, which works on a pdb (‘step2out.pdb’ by default), and therefore, returns all _structural H-bonding donor and acceptors pairs; its main output is ‘step2_out_hah.txt’.
detect_hbonds # backbone atoms are included by default, add flag --no_bk to exclude them
5. Run ms_hbnets:
Since we are using the default options, the tool will look for the ‘step2_out_hah.txt’ file in order to process it for the current pH.
ms_hbnets
The outputs of
ms_hbnetsare pH-dependent as some conformers may not be free at all pH points. this is why the output files describe below retain the “pHeH” string found in the corresponding ‘msout file’ selected by the-phand-ehcommand line options.
Main outputs of ms_hbnets:
Six files that retain the ‘pHeH’ string of the msout file name in use to indicate they are pH-dependent.
Note: All ‘hb_’ files are sorted descendingly by count.
hah_pHeH.txt
This file is the reduced version of the output of detect_hbonds where these pairs are filtered out:
- pairs of backbone conformers
- pairs with one or more ‘always off’ conformers
- pairs with one or more conformers with 0 occupancy
expanded_hah_pHeH.csv
This file is both a reduced and an expanded version of the ‘step2_out_hah.txt’ file.
-
Reduced: The reduced file hah_pHeH.txt is the input to the function that creates the expanded file.
-
Expanded: Extra columns flag whether a conformer is free or not, which leads to a mapping of conformer indices to H-bonds matrix indices (‘Mi’, ‘Mj’). These two columns, also present in the pairs file can be used to retrieve the coordinates of the donors and acceptors atoms, for example.
hb_states_pHeH.csv
This file lists the identifier, average energy, count and occupancy of each H-bonding microstate saved according to the n_states option value (default: 25,000):
- Column names: ‘state_id’, ‘averE’, ‘count’, ‘occ’
- Column types: string, float, integer, float
- Example (shortened):
# Data for the 22,847 saved hb_states whose sum count represents 5.57% of the state space (83,550/1,500,000) state_id,averE,count,occ "(HIS+1A0015_006,THR01A0089_003),(SER01A0050_003,ASP-1A0048_010),(THRBKA0051_000,SER01A0060_005)[...]",-0.0,40,2.2e-05So, ‘state_id’ is a string of conformer identifier pairs (tuples) and indicates which H-bonding pairs define the microstate (the next hb state will have a different set of pairs).
hb_states_pairs_pHeH.csv
This file lists the effective count and occ of the pairs in all state_id of the above hb_states file.
- Column names: ‘Mi’, ‘Mj’, ‘donor’, ‘acceptor’, ‘count’, ‘occ’
- Column types: integer, integer, string, string, integer, float
- Example (shortened):
# States hb_pairs; last 2 columns: state count, occ Mi,Mj,donor,acceptor,count,occ 219,375,LYS+1A0096_002,HISBKA0015_000,94469,0.999048 384,81,ARGBKA0045_000,ASN01A0044_001,93724,0.991170 124,382,GLN01A0057_001,ALABKA0042_000,93558,0.989414 124,388,GLN01A0057_001,ASPBKA0052_000,93558,0.989414
hb_pairs_pHeH.csv
This file lists the count and occupancy of each H-bonding conformer pairs found in the entire state space.
- Columns names: ‘Mi’, ‘Mj’, ‘donor’, ‘acceptor’, ‘count’, ‘occ’
- Column types: integer, integer, string, string, integer, float
- Example (shortened):
Mi,Mj,donor,acceptor,count,occ 22,25,SER01A0060_005,THR01A0069_003,1169166,0.974305
hb_pairs_res_pHeH.csv
This file lists the count and occupancy of each H-bonding pairs found in the hb_pairs file grouped by residue pairs. As this file is possibly smaller than the conformer-based file above, it is more suitable for importing in Cytoscape, for example.
- Columns names: ‘res_d’, ‘res_a’, ‘count’, ‘occ’
- Column types: string, string, integer, float
- Example (shortened):
res_d,res_a,count,occ R_A45,N_A44,1784729.0,0.991516
Method:
Default behavior: Get the H-bonding pairs data
ms_hbnets reads each accepted state in the ‘msout file’ and increments the count of each H-bonding pairs in the reduced file,’hah_pHeH.txt’, by the microstate count, then calculates the occupancy of each pair given the entire (conformer) microstate space using the sum of all counts, which is the size of the microstate space.
Alternate behavior: Get the H-bonding states data (using --load_states)
Each accepted state in the ‘msout file’ is read for its count and energy, and all the H-bonding pairs found in a microstate constitute a H-bonding state, whose identifier/key is a string of comma-separated tuples of donor and acceptor pairs, e.g.: “(d1, a1),(d2,a2),[…]”. The microstate count and energy are accumulated on unique H-bonding states. The sum of the H-bonding states count yields the size of the H-bonding space and this value is used to calculate the occupancy and average energy of each H-bonding state.
The number of H-bonding states returned in this file is determined in accordance with the value for the -n_states command line option (default: 25,000). In the vast majority of cases, this is an upper bound: only in the rare case where all H-bonding microstates were unique would the returned number of saved H-bonding microstates match that value.
Determining whether the -n_states option value is sufficient
The hb_states_pHeH.csv file starts with a commented line, e.g.:
# Data for the 22,847 saved hb_states whose sum count represents 5.57% of the state space (83,550/1,500,000)
Naturally, one may wonder whether such a low percentage of the entire state space represented by the saved H-bonding states in this file is sufficient. One way to find out is to compare the hb_pairs_pHeH.csv file (obtained when ms_hbnets is run without the --load_states option) with the hb_states_pairs_pHeH.csv file. The number of returned states is sufficient if these two conditions are met:
- The pairs’ order is the same (the files are sorted the same way)
- The occupancies agree at least to the 1‰ level (3rd digit past the decimal point)
Example with hb_states produced using --load_states -n_states 20000:
The first 3 columns are from the hb_pairs file (absolute occupancy); the last 3 from the hb_states_pairs file. Since the two conditions are not met, the number of saved states is too low, and for a state space of 1,500,000 in this case, the default number of states to save (25,000) is adequate (not shown).
| indices | count | occupancy | indices | count | occupancy |
|---|---|---|---|---|---|
| 284,144 | 1492652 | 0.995101 | 258,247 | 65077 | 0.99__4__635 |
| 258,247 | 1491164 | 0.994109 | 300,247 | 65077 | 0.994635 |
| 300,247 | 1491164 | 0.994109 | 301,247 | 65077 | 0.994635 |
| 301,247 | 1491164 | 0.994109 | 284,144 | 65068 | 0.994498 |
| 295,68 | 1483643 | 0.989095 | 295,68 | 64747 | 0.989592 |
ms_hbnets is ‘W.I.P’ (work in progress)
The ‘Mi’,’Mj’ columns provide a ‘key’ to, for example, retrieve the coordinates of donors and acceptors, if the positions were needed in a graph (network) analysis, which is not yet included in the tool.
TODO:
- Add network analysis using hb_states file
- Add contribution from Jose to show how to obtain a Cytoscape graph using the hb_pairs_res file.