How to use Microstate Analysis Library?¶
Page last updated on: 2020-11-19
What is Microstate Analysis Library?¶
Monte Carlo sampling in MCCE evaluates protein residue side chains' choice of being at different positions and charge states. This process typically takes thousands of millions of steps and contains of a wealth of information such as:
- What microstates charge states are most likely at different energy level?
- What microstates are at the lowest energy of the sampling energy landscape?
- Do other residues respond to certain changes of a specific residue?
Microstate Analysis Library is a tool set to replay Monte Carlo sampling and reveal the above information.
If you installed MCCE with conda/miniconda, Microstate Analysis Library is already included in the site-package path and it can be imported directly.
If you installed MCCE from the source, Microstate Analysis Library is named as ms_analysis.py
in bin directory. You need to copy this program to the same directory of your scripts or Jupyter Notebook to properly import it.
Get basic Monte Carlo sampling information¶
Prerequisite:
You need to enable microstate recording in MCCE step 4 to have microstates saved.
step4.py --ms
You will have a ms_out
directory and microstate files like pH0eH0ms.txt pH10eH0ms.txt pH11eH0ms.txt pH12eH0ms.txt ...
under this directory.
Import the library
import ms_analysis as msa
Show basic information of a MC titration¶
Monte Carlo sapling steps are saved in files, one file at one totration condition. When we load the information back, one object is needed for one totration condition.
mc = msa.MSout("ms_out/pH4eH0ms.txt")
print("Monte Carlo T = %.3f" % mc.T)
print("Monte Carlo pH = %.3f" % mc.pH)
print("Monte Carlo Eh = %.3f" % mc.Eh)
print("Number of MC counts = %d" % mc.N_ms)
print("Number of microstates = %d" % mc.N_uniq)
print("The lowest E = %.3f" % mc.lowest_E)
print("The average E = %.3f" % mc.average_E)
print("The hishest E = %.3f" % mc.highest_E)
Monte Carlo T = 298.150
Monte Carlo pH = 4.000
Monte Carlo Eh = 0.000
Number of MC counts = 1500000
Number of microstates = 266328
The lowest E = -226.198
The average E = -221.267
The hishest E = -206.544
Access Conformer and Residue information¶
Conformers are not tied to Monte Carlo sampling, so it is a separate object in Microstate Analysis Library. Conformers have index of conformers, conformer ID, index of residues (not initialized), residue ID, occupancy (not initialized), and charge.
print("iconf ires confid resid occ crg")
for conf in msa.conformers:
print("%5d %4d %s %s %5.3f %6.3f" % (conf.iconf, conf.ires, conf.confid, conf.resid, conf.occ, conf.crg))
iconf ires confid resid occ crg
0 0 NTR01A0001_001 NTRA0001_ 0.000 0.000
1 0 NTR+1A0001_002 NTRA0001_ 0.000 1.000
2 0 LYS01A0001_001 LYSA0001_ 0.000 0.000
3 0 LYS+1A0001_002 LYSA0001_ 0.000 1.000
4 0 VAL01A0002_001 VALA0002_ 0.000 0.000
5 0 PHE01A0003_001 PHEA0003_ 0.000 0.000
6 0 ARG01A0005_001 ARGA0005_ 0.000 0.000
7 0 ARG02A0005_002 ARGA0005_ 0.000 0.000
8 0 ARG02A0005_003 ARGA0005_ 0.000 0.000
9 0 ARG03A0005_004 ARGA0005_ 0.000 0.000
10 0 ARG+1A0005_005 ARGA0005_ 0.000 1.000
11 0 CYD01A0006_001 CYDA0006_ 0.000 0.000
12 0 GLU01A0007_001 GLUA0007_ 0.000 0.000
13 0 GLU01A0007_002 GLUA0007_ 0.000 0.000
14 0 GLU02A0007_003 GLUA0007_ 0.000 0.000
15 0 GLU02A0007_004 GLUA0007_ 0.000 0.000
16 0 GLU-1A0007_005 GLUA0007_ 0.000 -1.000
17 0 LEU01A0008_001 LEUA0008_ 0.000 0.000
18 0 ALA01A0009_001 ALAA0009_ 0.000 0.000
19 0 ALA01A0010_001 ALAA0010_ 0.000 0.000
20 0 ALA01A0011_001 ALAA0011_ 0.000 0.000
21 0 MET01A0012_001 META0012_ 0.000 0.000
22 0 LYS01A0013_001 LYSA0013_ 0.000 0.000
23 0 LYS+1A0013_002 LYSA0013_ 0.000 1.000
24 0 ARG01A0014_001 ARGA0014_ 0.000 0.000
25 0 ARG02A0014_002 ARGA0014_ 0.000 0.000
26 0 ARG03A0014_003 ARGA0014_ 0.000 0.000
27 0 ARG+1A0014_004 ARGA0014_ 0.000 1.000
28 0 HIS01A0015_001 HISA0015_ 0.000 0.000
29 0 HIS01A0015_002 HISA0015_ 0.000 0.000
30 0 HIS02A0015_003 HISA0015_ 0.000 0.000
31 0 HIS02A0015_004 HISA0015_ 0.000 0.000
32 0 HIS+1A0015_005 HISA0015_ 0.000 1.000
33 0 HIS+1A0015_006 HISA0015_ 0.000 1.000
34 0 LEU01A0017_001 LEUA0017_ 0.000 0.000
35 0 ASP01A0018_001 ASPA0018_ 0.000 0.000
36 0 ASP01A0018_002 ASPA0018_ 0.000 0.000
37 0 ASP02A0018_003 ASPA0018_ 0.000 0.000
38 0 ASP02A0018_004 ASPA0018_ 0.000 0.000
39 0 ASP-1A0018_005 ASPA0018_ 0.000 -1.000
40 0 ASN01A0019_001 ASNA0019_ 0.000 0.000
41 0 TYR01A0020_001 TYRA0020_ 0.000 0.000
42 0 TYR01A0020_002 TYRA0020_ 0.000 0.000
43 0 TYR-1A0020_003 TYRA0020_ 0.000 -1.000
44 0 ARG01A0021_001 ARGA0021_ 0.000 0.000
45 0 ARG02A0021_002 ARGA0021_ 0.000 0.000
46 0 ARG03A0021_003 ARGA0021_ 0.000 0.000
47 0 ARG+1A0021_004 ARGA0021_ 0.000 1.000
48 0 TYR01A0023_001 TYRA0023_ 0.000 0.000
49 0 TYR-1A0023_002 TYRA0023_ 0.000 -1.000
50 0 SER01A0024_001 SERA0024_ 0.000 0.000
51 0 SER01A0024_002 SERA0024_ 0.000 0.000
52 0 SER01A0024_003 SERA0024_ 0.000 0.000
53 0 LEU01A0025_001 LEUA0025_ 0.000 0.000
54 0 ASN01A0027_001 ASNA0027_ 0.000 0.000
55 0 ASN01A0027_002 ASNA0027_ 0.000 0.000
56 0 TRP01A0028_001 TRPA0028_ 0.000 0.000
57 0 VAL01A0029_001 VALA0029_ 0.000 0.000
58 0 CYD01A0030_001 CYDA0030_ 0.000 0.000
59 0 ALA01A0031_001 ALAA0031_ 0.000 0.000
60 0 ALA01A0032_001 ALAA0032_ 0.000 0.000
61 0 LYS01A0033_001 LYSA0033_ 0.000 0.000
62 0 LYS+1A0033_002 LYSA0033_ 0.000 1.000
63 0 PHE01A0034_001 PHEA0034_ 0.000 0.000
64 0 GLU01A0035_001 GLUA0035_ 0.000 0.000
65 0 GLU01A0035_002 GLUA0035_ 0.000 0.000
66 0 GLU02A0035_003 GLUA0035_ 0.000 0.000
67 0 GLU02A0035_004 GLUA0035_ 0.000 0.000
68 0 GLU-1A0035_005 GLUA0035_ 0.000 -1.000
69 0 SER01A0036_001 SERA0036_ 0.000 0.000
70 0 ASN01A0037_001 ASNA0037_ 0.000 0.000
71 0 ASN01A0037_002 ASNA0037_ 0.000 0.000
72 0 PHE01A0038_001 PHEA0038_ 0.000 0.000
73 0 ASN01A0039_001 ASNA0039_ 0.000 0.000
74 0 ASN01A0039_002 ASNA0039_ 0.000 0.000
75 0 THR01A0040_001 THRA0040_ 0.000 0.000
76 0 THR01A0040_002 THRA0040_ 0.000 0.000
77 0 GLN01A0041_001 GLNA0041_ 0.000 0.000
78 0 GLN01A0041_002 GLNA0041_ 0.000 0.000
79 0 ALA01A0042_001 ALAA0042_ 0.000 0.000
80 0 THR01A0043_001 THRA0043_ 0.000 0.000
81 0 THR01A0043_002 THRA0043_ 0.000 0.000
82 0 ASN01A0044_001 ASNA0044_ 0.000 0.000
83 0 ASN01A0044_002 ASNA0044_ 0.000 0.000
84 0 ARG01A0045_001 ARGA0045_ 0.000 0.000
85 0 ARG02A0045_002 ARGA0045_ 0.000 0.000
86 0 ARG03A0045_003 ARGA0045_ 0.000 0.000
87 0 ARG+1A0045_004 ARGA0045_ 0.000 1.000
88 0 ASN01A0046_001 ASNA0046_ 0.000 0.000
89 0 ASN01A0046_002 ASNA0046_ 0.000 0.000
90 0 ASN01A0046_003 ASNA0046_ 0.000 0.000
91 0 ASN01A0046_004 ASNA0046_ 0.000 0.000
92 0 THR01A0047_001 THRA0047_ 0.000 0.000
93 0 ASP01A0048_001 ASPA0048_ 0.000 0.000
94 0 ASP01A0048_002 ASPA0048_ 0.000 0.000
95 0 ASP01A0048_003 ASPA0048_ 0.000 0.000
96 0 ASP01A0048_004 ASPA0048_ 0.000 0.000
97 0 ASP02A0048_005 ASPA0048_ 0.000 0.000
98 0 ASP02A0048_006 ASPA0048_ 0.000 0.000
99 0 ASP02A0048_007 ASPA0048_ 0.000 0.000
100 0 ASP02A0048_008 ASPA0048_ 0.000 0.000
101 0 ASP-1A0048_009 ASPA0048_ 0.000 -1.000
102 0 ASP-1A0048_010 ASPA0048_ 0.000 -1.000
103 0 SER01A0050_001 SERA0050_ 0.000 0.000
104 0 SER01A0050_002 SERA0050_ 0.000 0.000
105 0 SER01A0050_003 SERA0050_ 0.000 0.000
106 0 THR01A0051_001 THRA0051_ 0.000 0.000
107 0 THR01A0051_002 THRA0051_ 0.000 0.000
108 0 ASP01A0052_001 ASPA0052_ 0.000 0.000
109 0 ASP01A0052_002 ASPA0052_ 0.000 0.000
110 0 ASP01A0052_003 ASPA0052_ 0.000 0.000
111 0 ASP01A0052_004 ASPA0052_ 0.000 0.000
112 0 ASP02A0052_005 ASPA0052_ 0.000 0.000
113 0 ASP02A0052_006 ASPA0052_ 0.000 0.000
114 0 ASP02A0052_007 ASPA0052_ 0.000 0.000
115 0 ASP02A0052_008 ASPA0052_ 0.000 0.000
116 0 ASP-1A0052_009 ASPA0052_ 0.000 -1.000
117 0 ASP-1A0052_010 ASPA0052_ 0.000 -1.000
118 0 TYR01A0053_001 TYRA0053_ 0.000 0.000
119 0 TYR01A0053_002 TYRA0053_ 0.000 0.000
120 0 TYR-1A0053_003 TYRA0053_ 0.000 -1.000
121 0 ILE01A0055_001 ILEA0055_ 0.000 0.000
122 0 LEU01A0056_001 LEUA0056_ 0.000 0.000
123 0 GLN01A0057_001 GLNA0057_ 0.000 0.000
124 0 GLN01A0057_002 GLNA0057_ 0.000 0.000
125 0 ILE01A0058_001 ILEA0058_ 0.000 0.000
126 0 ASN01A0059_001 ASNA0059_ 0.000 0.000
127 0 ASN01A0059_002 ASNA0059_ 0.000 0.000
128 0 ASN01A0059_003 ASNA0059_ 0.000 0.000
129 0 ASN01A0059_004 ASNA0059_ 0.000 0.000
130 0 SER01A0060_001 SERA0060_ 0.000 0.000
131 0 SER01A0060_002 SERA0060_ 0.000 0.000
132 0 SER01A0060_003 SERA0060_ 0.000 0.000
133 0 SER01A0060_004 SERA0060_ 0.000 0.000
134 0 SER01A0060_005 SERA0060_ 0.000 0.000
135 0 ARG01A0061_001 ARGA0061_ 0.000 0.000
136 0 ARG02A0061_002 ARGA0061_ 0.000 0.000
137 0 ARG03A0061_003 ARGA0061_ 0.000 0.000
138 0 ARG03A0061_004 ARGA0061_ 0.000 0.000
139 0 ARG+1A0061_005 ARGA0061_ 0.000 1.000
140 0 TRP01A0062_001 TRPA0062_ 0.000 0.000
141 0 TRP01A0063_001 TRPA0063_ 0.000 0.000
142 0 CYD01A0064_001 CYDA0064_ 0.000 0.000
143 0 ASN01A0065_001 ASNA0065_ 0.000 0.000
144 0 ASN01A0065_002 ASNA0065_ 0.000 0.000
145 0 ASP01A0066_001 ASPA0066_ 0.000 0.000
146 0 ASP01A0066_002 ASPA0066_ 0.000 0.000
147 0 ASP01A0066_003 ASPA0066_ 0.000 0.000
148 0 ASP01A0066_004 ASPA0066_ 0.000 0.000
149 0 ASP02A0066_005 ASPA0066_ 0.000 0.000
150 0 ASP02A0066_006 ASPA0066_ 0.000 0.000
151 0 ASP02A0066_007 ASPA0066_ 0.000 0.000
152 0 ASP02A0066_008 ASPA0066_ 0.000 0.000
153 0 ASP-1A0066_009 ASPA0066_ 0.000 -1.000
154 0 ASP-1A0066_010 ASPA0066_ 0.000 -1.000
155 0 ARG01A0068_001 ARGA0068_ 0.000 0.000
156 0 ARG02A0068_002 ARGA0068_ 0.000 0.000
157 0 ARG03A0068_003 ARGA0068_ 0.000 0.000
158 0 ARG+1A0068_004 ARGA0068_ 0.000 1.000
159 0 THR01A0069_001 THRA0069_ 0.000 0.000
160 0 THR01A0069_002 THRA0069_ 0.000 0.000
161 0 THR01A0069_003 THRA0069_ 0.000 0.000
162 0 PRO01A0070_001 PROA0070_ 0.000 0.000
163 0 SER01A0072_001 SERA0072_ 0.000 0.000
164 0 ARG01A0073_001 ARGA0073_ 0.000 0.000
165 0 ARG02A0073_002 ARGA0073_ 0.000 0.000
166 0 ARG03A0073_003 ARGA0073_ 0.000 0.000
167 0 ARG+1A0073_004 ARGA0073_ 0.000 1.000
168 0 ASN01A0074_001 ASNA0074_ 0.000 0.000
169 0 ASN01A0074_002 ASNA0074_ 0.000 0.000
170 0 LEU01A0075_001 LEUA0075_ 0.000 0.000
171 0 CYD01A0076_001 CYDA0076_ 0.000 0.000
172 0 ASN01A0077_001 ASNA0077_ 0.000 0.000
173 0 ILE01A0078_001 ILEA0078_ 0.000 0.000
174 0 PRO01A0079_001 PROA0079_ 0.000 0.000
175 0 CYD01A0080_001 CYDA0080_ 0.000 0.000
176 0 SER01A0081_001 SERA0081_ 0.000 0.000
177 0 ALA01A0082_001 ALAA0082_ 0.000 0.000
178 0 LEU01A0083_001 LEUA0083_ 0.000 0.000
179 0 LEU01A0084_001 LEUA0084_ 0.000 0.000
180 0 SER01A0085_001 SERA0085_ 0.000 0.000
181 0 SER01A0085_002 SERA0085_ 0.000 0.000
182 0 SER01A0086_001 SERA0086_ 0.000 0.000
183 0 SER01A0086_002 SERA0086_ 0.000 0.000
184 0 ASP01A0087_001 ASPA0087_ 0.000 0.000
185 0 ASP01A0087_002 ASPA0087_ 0.000 0.000
186 0 ASP02A0087_003 ASPA0087_ 0.000 0.000
187 0 ASP02A0087_004 ASPA0087_ 0.000 0.000
188 0 ASP-1A0087_005 ASPA0087_ 0.000 -1.000
189 0 ILE01A0088_001 ILEA0088_ 0.000 0.000
190 0 THR01A0089_001 THRA0089_ 0.000 0.000
191 0 THR01A0089_002 THRA0089_ 0.000 0.000
192 0 THR01A0089_003 THRA0089_ 0.000 0.000
193 0 ALA01A0090_001 ALAA0090_ 0.000 0.000
194 0 SER01A0091_001 SERA0091_ 0.000 0.000
195 0 SER01A0091_002 SERA0091_ 0.000 0.000
196 0 VAL01A0092_001 VALA0092_ 0.000 0.000
197 0 ASN01A0093_001 ASNA0093_ 0.000 0.000
198 0 ASN01A0093_002 ASNA0093_ 0.000 0.000
199 0 CYD01A0094_001 CYDA0094_ 0.000 0.000
200 0 ALA01A0095_001 ALAA0095_ 0.000 0.000
201 0 LYS01A0096_001 LYSA0096_ 0.000 0.000
202 0 LYS+1A0096_002 LYSA0096_ 0.000 1.000
203 0 LYS01A0097_001 LYSA0097_ 0.000 0.000
204 0 LYS+1A0097_002 LYSA0097_ 0.000 1.000
205 0 ILE01A0098_001 ILEA0098_ 0.000 0.000
206 0 VAL01A0099_001 VALA0099_ 0.000 0.000
207 0 SER01A0100_001 SERA0100_ 0.000 0.000
208 0 SER01A0100_002 SERA0100_ 0.000 0.000
209 0 SER01A0100_003 SERA0100_ 0.000 0.000
210 0 ASP01A0101_001 ASPA0101_ 0.000 0.000
211 0 ASP01A0101_002 ASPA0101_ 0.000 0.000
212 0 ASP02A0101_003 ASPA0101_ 0.000 0.000
213 0 ASP02A0101_004 ASPA0101_ 0.000 0.000
214 0 ASP-1A0101_005 ASPA0101_ 0.000 -1.000
215 0 ASN01A0103_001 ASNA0103_ 0.000 0.000
216 0 ASN01A0103_002 ASNA0103_ 0.000 0.000
217 0 MET01A0105_001 META0105_ 0.000 0.000
218 0 ASN01A0106_001 ASNA0106_ 0.000 0.000
219 0 ASN01A0106_002 ASNA0106_ 0.000 0.000
220 0 ALA01A0107_001 ALAA0107_ 0.000 0.000
221 0 TRP01A0108_001 TRPA0108_ 0.000 0.000
222 0 VAL01A0109_001 VALA0109_ 0.000 0.000
223 0 ALA01A0110_001 ALAA0110_ 0.000 0.000
224 0 TRP01A0111_001 TRPA0111_ 0.000 0.000
225 0 ARG01A0112_001 ARGA0112_ 0.000 0.000
226 0 ARG02A0112_002 ARGA0112_ 0.000 0.000
227 0 ARG03A0112_003 ARGA0112_ 0.000 0.000
228 0 ARG03A0112_004 ARGA0112_ 0.000 0.000
229 0 ARG+1A0112_005 ARGA0112_ 0.000 1.000
230 0 ASN01A0113_001 ASNA0113_ 0.000 0.000
231 0 ASN01A0113_002 ASNA0113_ 0.000 0.000
232 0 ARG01A0114_001 ARGA0114_ 0.000 0.000
233 0 ARG02A0114_002 ARGA0114_ 0.000 0.000
234 0 ARG03A0114_003 ARGA0114_ 0.000 0.000
235 0 ARG+1A0114_004 ARGA0114_ 0.000 1.000
236 0 CYD01A0115_001 CYDA0115_ 0.000 0.000
237 0 LYS01A0116_001 LYSA0116_ 0.000 0.000
238 0 LYS+1A0116_002 LYSA0116_ 0.000 1.000
239 0 THR01A0118_001 THRA0118_ 0.000 0.000
240 0 THR01A0118_002 THRA0118_ 0.000 0.000
241 0 ASP01A0119_001 ASPA0119_ 0.000 0.000
242 0 ASP01A0119_002 ASPA0119_ 0.000 0.000
243 0 ASP02A0119_003 ASPA0119_ 0.000 0.000
244 0 ASP02A0119_004 ASPA0119_ 0.000 0.000
245 0 ASP-1A0119_005 ASPA0119_ 0.000 -1.000
246 0 VAL01A0120_001 VALA0120_ 0.000 0.000
247 0 GLN01A0121_001 GLNA0121_ 0.000 0.000
248 0 GLN01A0121_002 GLNA0121_ 0.000 0.000
249 0 ALA01A0122_001 ALAA0122_ 0.000 0.000
250 0 TRP01A0123_001 TRPA0123_ 0.000 0.000
251 0 ILE01A0124_001 ILEA0124_ 0.000 0.000
252 0 ARG01A0125_001 ARGA0125_ 0.000 0.000
253 0 ARG02A0125_002 ARGA0125_ 0.000 0.000
254 0 ARG03A0125_003 ARGA0125_ 0.000 0.000
255 0 ARG03A0125_004 ARGA0125_ 0.000 0.000
256 0 ARG+1A0125_005 ARGA0125_ 0.000 1.000
257 0 CYD01A0127_001 CYDA0127_ 0.000 0.000
258 0 ARG01A0128_001 ARGA0128_ 0.000 0.000
259 0 ARG02A0128_002 ARGA0128_ 0.000 0.000
260 0 ARG03A0128_003 ARGA0128_ 0.000 0.000
261 0 ARG+1A0128_004 ARGA0128_ 0.000 1.000
262 0 LEU01A0129_001 LEUA0129_ 0.000 0.000
263 0 CTR01A0129_001 CTRA0129_ 0.000 0.000
264 0 CTR01A0129_002 CTRA0129_ 0.000 0.000
265 0 CTR02A0129_003 CTRA0129_ 0.000 0.000
266 0 CTR02A0129_004 CTRA0129_ 0.000 0.000
267 0 CTR-1A0129_005 CTRA0129_ 0.000 -1.000
Residue information is put in Monte Carlo sampling. The reason is Monte Carlo sampling sets fixed conformers/residues aside, and only samples free residues and free conformers. In the Monte Carlo sampling output we have what conformers are fixed and what the free residues are composed.
# fixed conformer indecies
print(mc.fixed_iconfs)
[3, 4, 5, 10, 11, 17, 18, 19, 20, 21, 23, 27, 34, 40, 47, 48, 53, 56, 57, 58, 59, 60, 62, 63, 69, 72, 73, 79, 87, 92, 103, 121, 122, 125, 139, 140, 141, 142, 158, 162, 163, 167, 168, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 189, 193, 196, 199, 200, 202, 204, 205, 206, 217, 220, 221, 222, 223, 224, 229, 235, 236, 238, 246, 249, 250, 251, 256, 257, 261, 262]
# free residues expressed as index of conformers. It is a list of residues, and each residue has a list of conformers it can choose from.
for res in mc.free_residues:
print(res)
[0, 1]
[13, 14, 15, 16]
[29, 30, 32, 33]
[36, 37, 38, 39]
[41, 42]
[50, 51, 52]
[54, 55]
[64, 65, 67, 68]
[70, 71]
[75, 76]
[77, 78]
[80, 81]
[82, 83]
[88, 89, 90, 91]
[94, 97, 101, 102]
[106, 107]
[108, 109, 110, 111, 112, 113, 114, 115, 116, 117]
[118, 119]
[123, 124]
[126, 127, 128, 129]
[131, 132, 133, 134]
[143, 144]
[149, 153, 154]
[159, 160, 161]
[180, 181]
[182, 183]
[186, 188]
[190, 191, 192]
[194, 195]
[197, 198]
[207, 208, 209]
[210, 211, 212, 213, 214]
[215, 216]
[218, 219]
[230, 231]
[239, 240]
[241, 242, 243, 244, 245]
[247, 248]
[263, 264, 265, 266, 267]
To get a residue ID for free residues,we need to use conformer information because residue ID is in conformer record.
for res in mc.free_residues:
print(msa.conformers[res[0]].resid) # use resid from the first conformer in that residue
NTRA0001_
GLUA0007_
HISA0015_
ASPA0018_
TYRA0020_
SERA0024_
ASNA0027_
GLUA0035_
ASNA0037_
THRA0040_
GLNA0041_
THRA0043_
ASNA0044_
ASNA0046_
ASPA0048_
THRA0051_
ASPA0052_
TYRA0053_
GLNA0057_
ASNA0059_
SERA0060_
ASNA0065_
ASPA0066_
THRA0069_
SERA0085_
SERA0086_
ASPA0087_
THRA0089_
SERA0091_
ASNA0093_
SERA0100_
ASPA0101_
ASNA0103_
ASNA0106_
ASNA0113_
THRA0118_
ASPA0119_
GLNA0121_
CTRA0129_
Access microstates¶
Each microstate is stored as data structure:
class Microstate:
def __init__(self, state, E, count):
self.state = state
self.E = E
self.count = count
Microstates are stored as a dictionary with the string value of state as the key, and Microstate as the value. One can get the list of microstates by
microstates = list(mc.microstates.values())
# print first 10 entries
for ms in microstates[:10]:
print(ms.state, ms.E, ms.count)
[1, 15, 32, 39, 41, 51, 54, 65, 70, 76, 78, 80, 83, 88, 101, 106, 116, 119, 123, 126, 132, 143, 153, 160, 181, 183, 188, 192, 195, 197, 208, 210, 215, 219, 231, 239, 243, 247, 267] -220.605118 11
[1, 15, 32, 39, 41, 51, 54, 65, 70, 76, 78, 80, 83, 88, 101, 107, 116, 119, 123, 126, 132, 143, 153, 160, 181, 183, 188, 192, 195, 197, 208, 210, 215, 219, 231, 239, 243, 247, 267] -220.287003 3
[1, 15, 32, 39, 41, 51, 54, 65, 70, 76, 78, 80, 83, 88, 101, 107, 116, 119, 123, 126, 132, 143, 153, 160, 181, 183, 188, 192, 195, 197, 208, 210, 215, 219, 231, 239, 243, 248, 267] -220.543747 1
[1, 15, 32, 39, 41, 51, 54, 65, 70, 76, 78, 80, 83, 88, 101, 107, 116, 119, 123, 126, 132, 144, 153, 160, 181, 183, 188, 192, 195, 197, 208, 210, 215, 219, 231, 239, 243, 248, 267] -219.397995 1
[1, 15, 32, 39, 41, 51, 54, 65, 70, 76, 78, 80, 83, 88, 101, 107, 116, 119, 123, 126, 132, 144, 153, 160, 181, 182, 188, 192, 195, 197, 208, 210, 215, 219, 231, 239, 243, 248, 267] -217.926498 1
[1, 15, 32, 39, 41, 51, 54, 65, 70, 76, 78, 80, 83, 88, 101, 107, 116, 119, 123, 126, 132, 143, 153, 160, 181, 182, 188, 192, 195, 197, 208, 210, 215, 219, 231, 239, 243, 248, 267] -219.07225 4
[1, 15, 32, 39, 41, 51, 54, 65, 70, 76, 78, 80, 83, 88, 101, 107, 116, 119, 123, 126, 134, 143, 153, 160, 181, 182, 188, 192, 195, 197, 208, 210, 215, 219, 231, 239, 243, 248, 267] -218.826874 4
[1, 15, 32, 39, 41, 51, 54, 65, 70, 76, 78, 80, 83, 91, 101, 107, 116, 119, 123, 126, 134, 143, 153, 160, 181, 182, 188, 192, 195, 197, 208, 210, 215, 219, 231, 239, 243, 248, 267] -218.07251 2
[1, 15, 32, 39, 41, 51, 54, 65, 70, 76, 78, 80, 83, 91, 101, 107, 116, 119, 123, 126, 134, 143, 153, 160, 181, 183, 188, 192, 195, 197, 208, 210, 215, 219, 231, 239, 243, 248, 267] -219.544006 7
[1, 16, 32, 39, 41, 51, 54, 65, 70, 76, 78, 80, 83, 91, 101, 107, 116, 119, 123, 126, 134, 143, 153, 160, 181, 183, 188, 192, 195, 197, 208, 210, 215, 219, 231, 239, 243, 248, 267] -221.183884 2
Calculate residue charges for a group of conformers¶
free_res_crg = msa.ms_convert2sumcrg(microstates, mc.free_residues)
for ir in range(len(mc.free_residues)):
resname = msa.conformers[mc.free_residues[ir][0]].resid # use resid from the first conformer in that residue
print("%s %6.3f" % (resname, free_res_crg[ir]))
NTRA0001_ 0.992
GLUA0007_ -0.733
HISA0015_ 0.995
ASPA0018_ -0.962
TYRA0020_ 0.000
SERA0024_ 0.000
ASNA0027_ 0.000
GLUA0035_ -0.090
ASNA0037_ 0.000
THRA0040_ 0.000
GLNA0041_ 0.000
THRA0043_ 0.000
ASNA0044_ 0.000
ASNA0046_ 0.000
ASPA0048_ -0.991
THRA0051_ 0.000
ASPA0052_ -0.698
TYRA0053_ 0.000
GLNA0057_ 0.000
ASNA0059_ 0.000
SERA0060_ 0.000
ASNA0065_ 0.000
ASPA0066_ -0.987
THRA0069_ 0.000
SERA0085_ 0.000
SERA0086_ 0.000
ASPA0087_ -0.977
THRA0089_ 0.000
SERA0091_ 0.000
ASNA0093_ 0.000
SERA0100_ 0.000
ASPA0101_ -0.219
ASNA0103_ 0.000
ASNA0106_ 0.000
ASNA0113_ 0.000
THRA0118_ 0.000
ASPA0119_ -0.670
GLNA0121_ 0.000
CTRA0129_ -0.888
Group microstates by energy¶
Here is an example to divide microstates into energy bands.
Unique microstate couns and total counts over bands¶
# get the energy bin size from the range of MC energy
e_step = (mc.highest_E - mc.lowest_E)/20
ticks = [mc.lowest_E + e_step*(i) for i in range(20)]
ms_in_bands = msa.groupms_byenergy(mc.microstates.values(), ticks)
uniqms_in_bands = [len(band) for band in ms_in_bands]
print(uniqms_in_bands)
[234, 3387, 18352, 38339, 48204, 46960, 38989, 28568, 18870, 11624, 6654, 3420, 1617, 674, 243, 104, 55, 21, 11, 2]
# Now compute the counts instead of unique microstates
counts_in_bands = [sum([ms.count for ms in band]) for band in ms_in_bands]
print(counts_in_bands)
[31231, 96883, 181201, 240564, 254871, 227699, 176626, 122414, 77258, 45436, 24604, 12170, 5324, 2336, 743, 388, 165, 60, 23, 4]
# plot them
import matplotlib.pyplot as plt
fig=plt.figure(figsize=(18,6))
ax1 = fig.add_subplot(121)
ax1.title.set_text('Unique microstates') # charge over energy bands
ax1.bar([i for i in range(20)], uniqms_in_bands)
ax2 = fig.add_subplot(122)
ax2.title.set_text('Total counts')
ax2.bar([i for i in range(20)], counts_in_bands)
plt.show()
Charge over energy bands¶
crg_in_bands = []
for band in ms_in_bands:
band_total_crg = 0.0
for ms in band:
band_total_crg += msa.ms_charge(ms)*ms.count
crg_in_bands.append(band_total_crg/msa.ms_counts(band))
print(crg_in_bands)
[-6.129454708462745, -5.798344394785463, -5.6241852969906345, -5.446471625014549, -5.284802900290735, -5.118779616950448, -4.983683036472547, -4.845483359746434, -4.724416888865878, -4.590610969275464, -4.436067306129084, -4.405012325390304, -4.340721262208866, -4.333476027397261, -4.141318977119784, -4.461340206185567, -3.8424242424242423, -4.116666666666666, -3.8260869565217392, -3.0]
fig=plt.figure(figsize=(8,6))
plt.bar([i for i in range(20)], crg_in_bands)
plt.show()
Which charge is the most populated¶
counts_vs_charge = {}
for ms in mc.microstates.values():
charge = int(msa.ms_charge(ms))
if charge in counts_vs_charge:
counts_vs_charge[charge] += ms.count
else:
counts_vs_charge[charge] = ms.count
charges = list(counts_vs_charge.keys())
charges.sort()
counts_population = [counts_vs_charge[i] for i in charges]
fig=plt.figure(figsize=(8,6))
plt.bar(charges, counts_population)
plt.show()
Group microstate by conformer selection and show what changed¶
!grep A0035 head3.lst
00065 GLU01A0035_001 f 0.00 0.000 0 0.00 0 0 2.062 -6.408 3.293 -0.967 1.795 0.000 01O000M000 t
00066 GLU01A0035_002 f 0.00 0.000 0 0.00 0 0 2.062 -6.408 0.573 -1.299 1.375 0.000 01O000M000 t
00067 GLU02A0035_003 f 0.00 0.000 0 0.00 0 0 1.629 -6.500 3.293 1.389 1.854 0.000 02O000M000 t
00068 GLU02A0035_004 f 0.00 0.000 0 0.00 0 0 1.629 -6.500 0.573 -0.131 1.245 0.000 02O000M000 t
00069 GLU-1A0035_005 f 0.00 -1.000 0 4.75 0 -1 1.573 -6.495 0.573 -2.810 4.319 -0.300 -1O000M000 t
!grep A0035 fort.38
GLU01A0035_001 0.058 0.058 0.061 0.060 0.053 0.036 0.012 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
GLU01A0035_002 0.811 0.808 0.801 0.779 0.749 0.543 0.158 0.022 0.002 0.000 0.000 0.000 0.000 0.000 0.000
GLU02A0035_003 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
GLU02A0035_004 0.132 0.133 0.135 0.138 0.107 0.080 0.024 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
GLU-1A0035_005 0.000 0.000 0.004 0.023 0.090 0.341 0.807 0.978 0.998 1.000 1.000 1.000 1.000 1.000 1.000
Group microstates by conformer index numbers. If any conformer in the list appears in the microstate, this microstate will be put in the first group, otherwise in the second group.
netural, charged = msa.groupms_byiconf(mc.microstates.values(), [64, 65, 66, 67])
See what conformer has changed by calculating conformer occupancy difference.
# what conformers have changed
diff_occ = msa.whatchanged_conf(netural, charged)
for key in diff_occ.keys():
print("%3d, %s: %6.3f" % (key, msa.conformers[key].confid, diff_occ[key]))
0, NTR01A0001_001: 0.000
1, NTR+1A0001_002: -0.000
13, GLU01A0007_002: 0.001
14, GLU02A0007_003: 0.003
15, GLU02A0007_004: -0.003
16, GLU-1A0007_005: -0.001
29, HIS01A0015_002: -0.001
30, HIS02A0015_003: -0.000
32, HIS+1A0015_005: -0.003
33, HIS+1A0015_006: 0.004
36, ASP01A0018_002: 0.003
37, ASP02A0018_003: -0.001
38, ASP02A0018_004: -0.001
39, ASP-1A0018_005: -0.001
41, TYR01A0020_001: -0.006
42, TYR01A0020_002: 0.006
50, SER01A0024_001: 0.009
51, SER01A0024_002: -0.007
52, SER01A0024_003: -0.002
54, ASN01A0027_001: -0.019
55, ASN01A0027_002: 0.019
64, GLU01A0035_001: -0.058
65, GLU01A0035_002: -0.823
67, GLU02A0035_004: -0.118
68, GLU-1A0035_005: 1.000
70, ASN01A0037_001: 0.004
71, ASN01A0037_002: -0.004
75, THR01A0040_001: 0.001
76, THR01A0040_002: -0.001
77, GLN01A0041_001: -0.003
78, GLN01A0041_002: 0.003
80, THR01A0043_001: 0.005
81, THR01A0043_002: -0.005
82, ASN01A0044_001: -0.246
83, ASN01A0044_002: 0.246
88, ASN01A0046_001: -0.205
89, ASN01A0046_002: 0.147
90, ASN01A0046_003: 0.126
91, ASN01A0046_004: -0.068
94, ASP01A0048_002: -0.000
97, ASP02A0048_005: 0.001
101, ASP-1A0048_009: -0.009
102, ASP-1A0048_010: 0.008
106, THR01A0051_001: -0.005
107, THR01A0051_002: 0.005
108, ASP01A0052_001: 0.002
109, ASP01A0052_002: -0.000
110, ASP01A0052_003: 0.020
111, ASP01A0052_004: 0.004
112, ASP02A0052_005: 0.192
113, ASP02A0052_006: 0.016
114, ASP02A0052_007: 0.126
115, ASP02A0052_008: 0.008
116, ASP-1A0052_009: -0.349
117, ASP-1A0052_010: -0.020
118, TYR01A0053_001: 0.001
119, TYR01A0053_002: -0.001
123, GLN01A0057_001: 0.004
124, GLN01A0057_002: -0.004
126, ASN01A0059_001: -0.008
127, ASN01A0059_002: 0.001
128, ASN01A0059_003: 0.007
129, ASN01A0059_004: -0.000
131, SER01A0060_002: 0.002
132, SER01A0060_003: -0.028
133, SER01A0060_004: -0.002
134, SER01A0060_005: 0.028
143, ASN01A0065_001: 0.003
144, ASN01A0065_002: -0.003
149, ASP02A0066_005: 0.001
153, ASP-1A0066_009: 0.001
154, ASP-1A0066_010: -0.002
159, THR01A0069_001: 0.002
160, THR01A0069_002: -0.006
161, THR01A0069_003: 0.004
180, SER01A0085_001: 0.004
181, SER01A0085_002: -0.004
182, SER01A0086_001: -0.001
183, SER01A0086_002: 0.001
186, ASP02A0087_003: -0.005
188, ASP-1A0087_005: 0.005
190, THR01A0089_001: -0.000
191, THR01A0089_002: 0.009
192, THR01A0089_003: -0.009
194, SER01A0091_001: 0.008
195, SER01A0091_002: -0.008
197, ASN01A0093_001: -0.004
198, ASN01A0093_002: 0.004
207, SER01A0100_001: -0.012
208, SER01A0100_002: 0.003
209, SER01A0100_003: 0.009
210, ASP01A0101_001: 0.005
211, ASP01A0101_002: 0.001
212, ASP02A0101_003: -0.003
213, ASP02A0101_004: 0.013
214, ASP-1A0101_005: -0.017
215, ASN01A0103_001: -0.002
216, ASN01A0103_002: 0.002
218, ASN01A0106_001: 0.003
219, ASN01A0106_002: -0.003
230, ASN01A0113_001: 0.056
231, ASN01A0113_002: -0.056
239, THR01A0118_001: -0.000
240, THR01A0118_002: 0.000
241, ASP01A0119_001: 0.003
242, ASP01A0119_002: 0.000
243, ASP02A0119_003: 0.019
244, ASP02A0119_004: 0.000
245, ASP-1A0119_005: -0.023
247, GLN01A0121_001: -0.003
248, GLN01A0121_002: 0.003
263, CTR01A0129_001: -0.001
264, CTR01A0129_002: -0.003
265, CTR02A0129_003: -0.003
266, CTR02A0129_004: -0.003
267, CTR-1A0129_005: 0.010
See what has changed in terms of residue's conformer population change. The residue difference is measured by Bhattachayya distance of residues' conformer populations.
# What residues have changed
diff_bhd = msa.whatchanged_res(netural, charged, mc.free_residues)
resname = []
bhd = []
for ir in range(len(mc.free_residues)):
if diff_bhd[ir] < 99.0:
resname.append(msa.conformers[mc.free_residues[ir][0]].resid)
bhd.append(diff_bhd[ir])
def autolabel(rects, ax):
"""Attach a text label above each bar in *rects*, displaying its height."""
for rect in rects:
height = rect.get_height()
ax.annotate('%.3f' % height,
xy=(rect.get_x() + rect.get_width() / 2, height),
xytext=(0, 3), # 3 points vertical offset
textcoords="offset points",
ha='center', va='bottom')
import numpy as np
x = np.arange(len(resname))
width = 0.2
fig, ax = plt.subplots(figsize=(18,8))
rects = ax.bar(x, bhd, width)
ax.set_ylabel("Residue difference measured by Bhattachayya distance")
autolabel(rects, ax)
plt.xticks(x, resname, rotation='vertical')
plt.show()
Group microstate by conformer names, and chained selection¶
confnames = [conf.confid for conf in msa.conformers]
for confname in confnames:
print(confname)
NTR01A0001_001
NTR+1A0001_002
LYS01A0001_001
LYS+1A0001_002
VAL01A0002_001
PHE01A0003_001
ARG01A0005_001
ARG02A0005_002
ARG02A0005_003
ARG03A0005_004
ARG+1A0005_005
CYD01A0006_001
GLU01A0007_001
GLU01A0007_002
GLU02A0007_003
GLU02A0007_004
GLU-1A0007_005
LEU01A0008_001
ALA01A0009_001
ALA01A0010_001
ALA01A0011_001
MET01A0012_001
LYS01A0013_001
LYS+1A0013_002
ARG01A0014_001
ARG02A0014_002
ARG03A0014_003
ARG+1A0014_004
HIS01A0015_001
HIS01A0015_002
HIS02A0015_003
HIS02A0015_004
HIS+1A0015_005
HIS+1A0015_006
LEU01A0017_001
ASP01A0018_001
ASP01A0018_002
ASP02A0018_003
ASP02A0018_004
ASP-1A0018_005
ASN01A0019_001
TYR01A0020_001
TYR01A0020_002
TYR-1A0020_003
ARG01A0021_001
ARG02A0021_002
ARG03A0021_003
ARG+1A0021_004
TYR01A0023_001
TYR-1A0023_002
SER01A0024_001
SER01A0024_002
SER01A0024_003
LEU01A0025_001
ASN01A0027_001
ASN01A0027_002
TRP01A0028_001
VAL01A0029_001
CYD01A0030_001
ALA01A0031_001
ALA01A0032_001
LYS01A0033_001
LYS+1A0033_002
PHE01A0034_001
GLU01A0035_001
GLU01A0035_002
GLU02A0035_003
GLU02A0035_004
GLU-1A0035_005
SER01A0036_001
ASN01A0037_001
ASN01A0037_002
PHE01A0038_001
ASN01A0039_001
ASN01A0039_002
THR01A0040_001
THR01A0040_002
GLN01A0041_001
GLN01A0041_002
ALA01A0042_001
THR01A0043_001
THR01A0043_002
ASN01A0044_001
ASN01A0044_002
ARG01A0045_001
ARG02A0045_002
ARG03A0045_003
ARG+1A0045_004
ASN01A0046_001
ASN01A0046_002
ASN01A0046_003
ASN01A0046_004
THR01A0047_001
ASP01A0048_001
ASP01A0048_002
ASP01A0048_003
ASP01A0048_004
ASP02A0048_005
ASP02A0048_006
ASP02A0048_007
ASP02A0048_008
ASP-1A0048_009
ASP-1A0048_010
SER01A0050_001
SER01A0050_002
SER01A0050_003
THR01A0051_001
THR01A0051_002
ASP01A0052_001
ASP01A0052_002
ASP01A0052_003
ASP01A0052_004
ASP02A0052_005
ASP02A0052_006
ASP02A0052_007
ASP02A0052_008
ASP-1A0052_009
ASP-1A0052_010
TYR01A0053_001
TYR01A0053_002
TYR-1A0053_003
ILE01A0055_001
LEU01A0056_001
GLN01A0057_001
GLN01A0057_002
ILE01A0058_001
ASN01A0059_001
ASN01A0059_002
ASN01A0059_003
ASN01A0059_004
SER01A0060_001
SER01A0060_002
SER01A0060_003
SER01A0060_004
SER01A0060_005
ARG01A0061_001
ARG02A0061_002
ARG03A0061_003
ARG03A0061_004
ARG+1A0061_005
TRP01A0062_001
TRP01A0063_001
CYD01A0064_001
ASN01A0065_001
ASN01A0065_002
ASP01A0066_001
ASP01A0066_002
ASP01A0066_003
ASP01A0066_004
ASP02A0066_005
ASP02A0066_006
ASP02A0066_007
ASP02A0066_008
ASP-1A0066_009
ASP-1A0066_010
ARG01A0068_001
ARG02A0068_002
ARG03A0068_003
ARG+1A0068_004
THR01A0069_001
THR01A0069_002
THR01A0069_003
PRO01A0070_001
SER01A0072_001
ARG01A0073_001
ARG02A0073_002
ARG03A0073_003
ARG+1A0073_004
ASN01A0074_001
ASN01A0074_002
LEU01A0075_001
CYD01A0076_001
ASN01A0077_001
ILE01A0078_001
PRO01A0079_001
CYD01A0080_001
SER01A0081_001
ALA01A0082_001
LEU01A0083_001
LEU01A0084_001
SER01A0085_001
SER01A0085_002
SER01A0086_001
SER01A0086_002
ASP01A0087_001
ASP01A0087_002
ASP02A0087_003
ASP02A0087_004
ASP-1A0087_005
ILE01A0088_001
THR01A0089_001
THR01A0089_002
THR01A0089_003
ALA01A0090_001
SER01A0091_001
SER01A0091_002
VAL01A0092_001
ASN01A0093_001
ASN01A0093_002
CYD01A0094_001
ALA01A0095_001
LYS01A0096_001
LYS+1A0096_002
LYS01A0097_001
LYS+1A0097_002
ILE01A0098_001
VAL01A0099_001
SER01A0100_001
SER01A0100_002
SER01A0100_003
ASP01A0101_001
ASP01A0101_002
ASP02A0101_003
ASP02A0101_004
ASP-1A0101_005
ASN01A0103_001
ASN01A0103_002
MET01A0105_001
ASN01A0106_001
ASN01A0106_002
ALA01A0107_001
TRP01A0108_001
VAL01A0109_001
ALA01A0110_001
TRP01A0111_001
ARG01A0112_001
ARG02A0112_002
ARG03A0112_003
ARG03A0112_004
ARG+1A0112_005
ASN01A0113_001
ASN01A0113_002
ARG01A0114_001
ARG02A0114_002
ARG03A0114_003
ARG+1A0114_004
CYD01A0115_001
LYS01A0116_001
LYS+1A0116_002
THR01A0118_001
THR01A0118_002
ASP01A0119_001
ASP01A0119_002
ASP02A0119_003
ASP02A0119_004
ASP-1A0119_005
VAL01A0120_001
GLN01A0121_001
GLN01A0121_002
ALA01A0122_001
TRP01A0123_001
ILE01A0124_001
ARG01A0125_001
ARG02A0125_002
ARG03A0125_003
ARG03A0125_004
ARG+1A0125_005
CYD01A0127_001
ARG01A0128_001
ARG02A0128_002
ARG03A0128_003
ARG+1A0128_004
LEU01A0129_001
CTR01A0129_001
CTR01A0129_002
CTR02A0129_003
CTR02A0129_004
CTR-1A0129_005
You can Group conformers by the conformer IDs. IDs are in a list, and each ID is considered as a match as long as it is a substring of the conformer name. The selected microstates that contain all conformers in the ID list will be returned in the first group, and the rest are in the second group.
# Let's select charged GLU35
glu35_charged, _ = msa.groupms_byconfid(microstates, ["GLU-1A0035"])
print(len(glu35_charged))
28695
# find charged ASP48 from the above selected conformers
glu35_asp48_charged, _ = msa.groupms_byconfid(glu35_charged, ["ASP-1A0048"])
print(len(glu35_asp48_charged))
28379
# Or we can do it in one command
glu35_asp48_charged2, _ = msa.groupms_byconfid(microstates, ["GLU-1A0035", "ASP-1A0048"])
print(len(glu35_asp48_charged2))
28379