This page was generated from docs/examples/transmission_ftir/PyIRoGlass_Transmission.ipynb.

Transmission FTIR Spectra

This Jupyter notebook provides an example workflow for processing transmission FTIR spectra through PyIRoGlass.
The Jupyter notebook and data can be accessed here: https://github.com/SarahShi/PyIRoGlass/blob/main/docs/examples/transmission_ftir/.
You need to install the PyIRoGlass PyPI package on your machine once. If you have not done this, please uncomment the line in the cell below (remove the # symbol) and run it.

[1]:

#!pip install PyIRoGlass

Load Python Packages and Data

Load Python Packages

[2]:

# Import packages

import os
import sys
import glob
import numpy as np
import pandas as pd

import PyIRoGlass as pig

from IPython.display import Image

import matplotlib
from matplotlib import pyplot as plt
from matplotlib import rc, cm

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

pig.__version__

[2]:

'0.6.1'

Set paths to data

[3]:

# Change paths to direct to folder with transmission FTIR spectra

TRANS_PATHS = 'SPECTRA/'
print(TRANS_PATHS)

CHEMTHICK_PATH = 'ChemThick.csv'
print(CHEMTHICK_PATH)

SPECTRA/
ChemThick.csv

Set desired output file directory name

[4]:

# Change to be what you want the prefix of your output files to be.
OUTPUT_PATH = 'RESULTS'
print(OUTPUT_PATH)

RESULTS

Load transmission FTIR spectra and Chemistry Thickness Data

The file names of the spectra (what comes before the .CSV extension) are important when we load in melt compositions and thicknesses, because these unique identifiers are used to match each spectrum to its sample. Make sure that the ChemThick.csv file has the same sample names as the spectra you load in.

[5]:

# Load the transmission FTIR spectra, glass compositions, and thicknesses

loader = pig.SampleDataLoader(spectrum_path=TRANS_PATHS, chemistry_thickness_path=CHEMTHICK_PATH)
DFS_DICT, CHEMISTRY, THICKNESS = loader.load_all_data()

Let’s look at what a dictionary of transmission FTIR spectra looks like. Samples are identified by their file names, and the wavenumber and absorbance data are stored for each spectrum.

[6]:

DFS_DICT

[6]:

{'AC4_OL49_021920_30x30_H2O_a':             Absorbance
 Wavenumber
 1000.917      6.000000
 1002.845      6.000000
 1004.774      3.212358
 1006.702      6.000000
 1008.631      3.550053
 ...                ...
 5490.577      0.658218
 5492.505      0.657289
 5494.434      0.657169
 5496.362      0.658473
 5498.291      0.660256

 [2333 rows x 1 columns],
 'AC4_OL53_101220_256s_30x30_a':             Absorbance
 Wavenumber
 1000.916      6.000000
 1002.845      2.809911
 1004.774      2.584419
 1006.702      2.808356
 1008.631      3.712419
 ...                ...
 5490.576      0.118337
 5492.505      0.117460
 5494.433      0.117553
 5496.362      0.117506
 5498.291      0.116924

 [2333 rows x 1 columns],
 'STD_D1010_012821_256s_100x100_a':             Absorbance
 Wavenumber
 1000.916      3.844739
 1002.845      3.630789
 1004.774      6.000000
 1006.702      6.000000
 1008.631      6.000000
 ...                ...
 5490.576      0.394656
 5492.505      0.395436
 5494.433      0.396272
 5496.362      0.396495
 5498.291      0.396368

 [2333 rows x 1 columns]}
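As a quick check on loaded spectra, one might summarize each entry of the dictionary. The sketch below is only illustrative: `toy_spectra` is a hypothetical stand-in for DFS_DICT holding a few values copied from the output above, with the same layout (a DataFrame indexed by Wavenumber with one Absorbance column).

```python
import pandas as pd

# Hypothetical stand-in for DFS_DICT; only a few rows, for illustration.
toy_spectra = {
    "sample_a": pd.DataFrame(
        {"Absorbance": [6.0, 3.212358, 0.658473, 0.660256]},
        index=pd.Index([1000.917, 1004.774, 5496.362, 5498.291],
                       name="Wavenumber"),
    ),
}

# Collect the number of points, wavenumber range, and maximum absorbance.
rows = {}
for name, spectrum in toy_spectra.items():
    rows[name] = {
        "n_points": len(spectrum),
        "wn_min": spectrum.index.min(),
        "wn_max": spectrum.index.max(),
        "max_absorbance": spectrum["Absorbance"].max(),
    }

summary = pd.DataFrame.from_dict(rows, orient="index")
print(summary)
```

Replacing `toy_spectra` with DFS_DICT should give one summary row per loaded spectrum.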

Display the dataframe of glass compositions

[7]:

CHEMISTRY

[7]:

	SiO2	TiO2	Al2O3	Fe2O3	FeO	MnO	MgO	CaO	Na2O	K2O	P2O5
Sample
AC4_OL49_021920_30x30_H2O_a	52.34	1.04	17.92	1.93	7.03	0.20	3.63	7.72	4.25	0.78	0.14
AC4_OL53_101220_256s_30x30_a	47.95	1.00	18.88	2.04	7.45	0.19	4.34	9.84	3.47	0.67	0.11
STD_D1010_012821_256s_100x100_a	51.41	1.26	16.58	0.00	7.58	0.00	7.57	10.98	3.01	0.37	0.18

Display the dataframe of wafer thicknesses

[8]:

THICKNESS

[8]:

	Thickness	Sigma_Thickness
Sample
AC4_OL49_021920_30x30_H2O_a	91.25	3
AC4_OL53_101220_256s_30x30_a	39.00	3
STD_D1010_012821_256s_100x100_a	231.00	3

Note that the sample names of the spectra in the dictionary and of the glass compositions and thicknesses in the dataframes all align.
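This alignment can also be checked programmatically. The helper below, `check_sample_alignment`, is a hypothetical function (not part of PyIRoGlass) that compares the dictionary keys against the index of each dataframe; the toy inputs are made up to show a mismatch.

```python
import pandas as pd

def check_sample_alignment(spectra, chemistry, thickness):
    """Report sample names that do not match across the three inputs.

    Hypothetical helper: spectra is a dict keyed by sample name, and
    chemistry and thickness are DataFrames indexed by sample name.
    """
    names = set(spectra)
    table_names = set(chemistry.index) | set(thickness.index)
    return {
        "missing_chemistry": sorted(names - set(chemistry.index)),
        "missing_thickness": sorted(names - set(thickness.index)),
        "rows_without_spectrum": sorted(table_names - names),
    }

# Toy inputs: sample "B" has no chemistry row, and row "C" has no spectrum.
spectra = {"A": None, "B": None}
chemistry = pd.DataFrame({"SiO2": [50.0, 51.0]},
                         index=pd.Index(["A", "C"], name="Sample"))
thickness = pd.DataFrame({"Thickness": [40.0, 90.0]},
                         index=pd.Index(["A", "B"], name="Sample"))

report = check_sample_alignment(spectra, chemistry, thickness)
print(report)
```

With the objects from this notebook, `check_sample_alignment(DFS_DICT, CHEMISTRY, THICKNESS)` would be expected to return three empty lists.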

We’re ready to roll – MCMC, here we come!

We use the function calculate_baselines, which takes in two arguments:

Dictionary of spectra
Desired output directory name, or None to prevent figure generation.

Running this code takes a few minutes per spectrum, as it fits up to \(\mathrm{10^6}\) baselines and peaks to each spectrum to sample the uncertainty; sampling stops early once the chains converge, as in the output below. If any samples fail, they will be returned in the list FAILURES.

The function automatically saves the output dataframe as a CSV, so you have this information on file. We will also use this dataframe to calculate concentrations.

[9]:

DF_OUTPUT, FAILURES = pig.calculate_baselines(DFS_DICT, OUTPUT_PATH)


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
  Multi-core Markov-chain Monte Carlo (mc3).
  Version 3.1.3.
  Copyright (c) 2015-2024 Patricio Cubillos and collaborators.
  mc3 is open-source software under the MIT license (see LICENSE).
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
  Warning:
    The number of requested CPUs (4) is >= than the number of
available CPUs (2).  Enforced ncpu to 1.
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Least-squares best-fitting parameters:
  [ 1.10352342e+00 -1.06147087e+00  9.35733627e-01 -9.21115138e-02
  3.00000000e-01  1.42788938e+03  2.88207396e+01  1.09418301e-01
  1.51745559e+03  3.58474255e+01  1.07059542e-01  6.59012777e-01
  1.22070957e-01  1.53688095e-02 -3.11486494e-04  1.23428192e+00]

Yippee Ki Yay Monte Carlo!
Start MCMC chains  (Wed May 15 18:53:36 2024)

[:         ]  10.0% completed  (Wed May 15 18:53:54 2024)
Out-of-bound Trials:
[   0    6    0    0 8013    0    0   42    0  568   30    0    0    0
    0    0]
Best Parameters: (chisq=369.0163)
[ 1.10352342e+00 -1.06147087e+00  9.35733627e-01 -9.21115138e-02
  3.00000000e-01  1.42788938e+03  2.88207396e+01  1.09418301e-01
  1.51745559e+03  3.58474255e+01  1.07059542e-01  6.59012777e-01
  1.22070957e-01  1.53688095e-02 -3.11486494e-04  1.23428192e+00]

[::        ]  20.0% completed  (Wed May 15 18:54:12 2024)
Out-of-bound Trials:
[    0    26     0     6 16792     0     9    44     0  2994    33     0
     0     0     0     0]
Best Parameters: (chisq=369.0163)
[ 1.10352342e+00 -1.06147087e+00  9.35733627e-01 -9.21115138e-02
  3.00000000e-01  1.42788938e+03  2.88207396e+01  1.09418301e-01
  1.51745559e+03  3.58474255e+01  1.07059542e-01  6.59012777e-01
  1.22070957e-01  1.53688095e-02 -3.11486494e-04  1.23428192e+00]
Gelman-Rubin statistics for free parameters:
[1.10674391 1.11232564 1.01213278 1.11479217 1.0702222  1.10090109
 1.12319507 1.08980551 1.07207929 1.11107936 1.05487019 1.01538335
 1.01958656 1.09037442 1.11475635 1.11684148]

[:::       ]  30.0% completed  (Wed May 15 18:54:30 2024)
Out-of-bound Trials:
[    0    86     0    17 25840     3    29    48     0  6313    36     0
     0     0     0     0]
Best Parameters: (chisq=369.0163)
[ 1.10352342e+00 -1.06147087e+00  9.35733627e-01 -9.21115138e-02
  3.00000000e-01  1.42788938e+03  2.88207396e+01  1.09418301e-01
  1.51745559e+03  3.58474255e+01  1.07059542e-01  6.59012777e-01
  1.22070957e-01  1.53688095e-02 -3.11486494e-04  1.23428192e+00]
Gelman-Rubin statistics for free parameters:
[1.00684269 1.00639861 1.00127873 1.00501015 1.00993297 1.00299196
 1.00946854 1.00251714 1.00408626 1.00474032 1.00258649 1.0026802
 1.00203227 1.00341554 1.00658911 1.00609207]
All parameters converged to within 1% of unity.

[::::      ]  40.0% completed  (Wed May 15 18:54:48 2024)
Out-of-bound Trials:
[    0   167     0    41 37568     5    85    49     0 10207    37     0
     0     0     0     0]
Best Parameters: (chisq=369.0163)
[ 1.10352342e+00 -1.06147087e+00  9.35733627e-01 -9.21115138e-02
  3.00000000e-01  1.42788938e+03  2.88207396e+01  1.09418301e-01
  1.51745559e+03  3.58474255e+01  1.07059542e-01  6.59012777e-01
  1.22070957e-01  1.53688095e-02 -3.11486494e-04  1.23428192e+00]
Gelman-Rubin statistics for free parameters:
[1.00294441 1.0029953  1.0005859  1.00282158 1.00159926 1.0031075
 1.00643434 1.002798   1.00270522 1.00405324 1.00155813 1.00050084
 1.00098768 1.00184398 1.0031285  1.00311975]
All parameters converged to within 1% of unity.

[:::::     ]  50.0% completed  (Wed May 15 18:55:05 2024)
Out-of-bound Trials:
[    0   253     0    60 49564     5   145    52     0 14113    38     0
     0     0     0     0]
Best Parameters: (chisq=369.0163)
[ 1.10352342e+00 -1.06147087e+00  9.35733627e-01 -9.21115138e-02
  3.00000000e-01  1.42788938e+03  2.88207396e+01  1.09418301e-01
  1.51745559e+03  3.58474255e+01  1.07059542e-01  6.59012777e-01
  1.22070957e-01  1.53688095e-02 -3.11486494e-04  1.23428192e+00]
Gelman-Rubin statistics for free parameters:
[1.0008746  1.00087287 1.00096411 1.00084459 1.00074084 1.00223498
 1.00369    1.00143909 1.0020051  1.00296911 1.00099564 1.00048715
 1.00055666 1.0010247  1.00089598 1.00090896]
All parameters converged to within 1% of unity.

[::::::    ]  60.0% completed  (Wed May 15 18:55:23 2024)
Out-of-bound Trials:
[    0   361     0    77 61426     6   200    53     0 18169    38     0
     0     0     0     0]
Best Parameters: (chisq=369.0163)
[ 1.10352342e+00 -1.06147087e+00  9.35733627e-01 -9.21115138e-02
  3.00000000e-01  1.42788938e+03  2.88207396e+01  1.09418301e-01
  1.51745559e+03  3.58474255e+01  1.07059542e-01  6.59012777e-01
  1.22070957e-01  1.53688095e-02 -3.11486494e-04  1.23428192e+00]
Gelman-Rubin statistics for free parameters:
[1.00121444 1.00127185 1.00054922 1.00120909 1.00098455 1.00122816
 1.00206778 1.00141519 1.00094463 1.00162735 1.00101469 1.00043052
 1.00046465 1.00057947 1.00127999 1.00131894]
All parameters converged to within 1% of unity.

All parameters satisfy the GR convergence threshold of 1.01, stopping
the MCMC.

MCMC Summary:
-------------
  Number of evaluated samples:        601875
  Number of parallel chains:               9
  Average iterations per chain:        66875
  Burned-in iterations per chain:      20000
  Thinning factor:                         5
  MCMC sample size (thinned, burned):  84375
  Acceptance rate:   23.15%
med_central

Parameter name     best fit   median      1sigma_low   1sigma_hi        S/N
--------------- -----------  -----------------------------------  ---------
B_mean           1.1035e+00   1.1308e+00 -3.3480e-02  4.1006e-02       28.5
B_PC1           -1.0615e+00  -7.7975e-01 -3.6055e-01  4.2951e-01        2.6
B_PC2            9.3573e-01   9.3262e-01 -2.0061e-02  2.0121e-02       46.5
B_PC3           -9.2112e-02  -5.7341e-02 -5.8443e-02  6.3083e-02        1.5
B_PC4            3.0000e-01   2.7896e-01 -2.9216e-02  1.5487e-02       12.2
G1430_peak       1.4279e+03   1.4279e+03 -1.3981e+00  1.4562e+00      994.5
G1430_std        2.8821e+01   2.8770e+01 -1.1710e+00  1.2983e+00       23.6
G1430_amp        1.0942e-01   1.0858e-01 -4.0393e-03  4.0504e-03       27.0
G1515_peak       1.5175e+03   1.5176e+03 -1.5510e+00  1.5450e+00      982.4
G1515_std        3.5847e+01   3.6092e+01 -2.1055e+00  2.1574e+00       17.9
G1515_amp        1.0706e-01   1.0691e-01 -3.5939e-03  3.6792e-03       29.6
H1635_mean       6.5901e-01   6.5922e-01 -3.1228e-03  3.0796e-03      213.8
H1635_PC1        1.2207e-01   1.1995e-01 -1.4689e-02  1.4120e-02        8.5
H1635_PC2        1.5369e-02   1.5071e-02 -2.2155e-02  2.1627e-02        0.7
m               -3.1149e-04  -2.6380e-04 -6.0680e-05  7.3484e-05        4.5
b                1.2343e+00   1.2200e+00 -2.2714e-02  1.9491e-02       56.5

  Best-parameter's chi-squared:       367.5930
  Best-parameter's -2*log(posterior): 369.0163
  Bayesian Information Criterion:     469.8369
  Reduced chi-squared:                  0.6338
  Standard deviation of residuals:  0.00785345

For a detailed summary with all parameter posterior statistics see
/home/docs/checkouts/readthedocs.org/user_builds/pyiroglass/checkouts/latest/docs/examples/transmission_ftir/NPZTXTFILES/RESULTS/AC4_OL49_021920_30x30_H2O_a_statistics.txt

Output sampler files:
  /home/docs/checkouts/readthedocs.org/user_builds/pyiroglass/checkouts/latest/docs/examples/transmission_ftir/NPZTXTFILES/RESULTS/AC4_OL49_021920_30x30_H2O_a_statistics.txt
  /home/docs/checkouts/readthedocs.org/user_builds/pyiroglass/checkouts/latest/docs/examples/transmission_ftir/NPZTXTFILES/RESULTS/AC4_OL49_021920_30x30_H2O_a.npz
  /home/docs/checkouts/readthedocs.org/user_builds/pyiroglass/checkouts/latest/docs/examples/transmission_ftir/LOGFILES/RESULTS/AC4_OL49_021920_30x30_H2O_a.log

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
  Multi-core Markov-chain Monte Carlo (mc3).
  Version 3.1.3.
  Copyright (c) 2015-2024 Patricio Cubillos and collaborators.
  mc3 is open-source software under the MIT license (see LICENSE).
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
  Warning:
    The number of requested CPUs (4) is >= than the number of
available CPUs (2).  Enforced ncpu to 1.
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Least-squares best-fitting parameters:
  [ 5.11148215e-01 -2.96910168e+00  2.75566349e-01 -5.06622373e-01
  2.27844141e-01  1.42739189e+03  3.45255612e+01  5.04958690e-02
  1.51924154e+03  3.21665128e+01  5.31133458e-02  3.00627000e-01
 -3.90990559e-02  5.29060308e-02 -4.45749713e-04  7.21455940e-01]

Yippee Ki Yay Monte Carlo!
Start MCMC chains  (Wed May 15 18:55:46 2024)

[:         ]  10.0% completed  (Wed May 15 18:56:05 2024)
Out-of-bound Trials:
[   2 6210    0  257  535    0   97  239    2    0  151    1    0    0
    0    0]
Best Parameters: (chisq=49.0306)
[ 5.11148215e-01 -2.96910168e+00  2.75566349e-01 -5.06622373e-01
  2.27844141e-01  1.42739189e+03  3.45255612e+01  5.04958690e-02
  1.51924154e+03  3.21665128e+01  5.31133458e-02  3.00627000e-01
 -3.90990559e-02  5.29060308e-02 -4.45749713e-04  7.21455940e-01]

[::        ]  20.0% completed  (Wed May 15 18:56:24 2024)
Out-of-bound Trials:
[    3 11569     0   536   953    58  2279   254    15   439   161     1
     0     0     0     0]
Best Parameters: (chisq=49.0306)
[ 5.11148215e-01 -2.96910168e+00  2.75566349e-01 -5.06622373e-01
  2.27844141e-01  1.42739189e+03  3.45255612e+01  5.04958690e-02
  1.51924154e+03  3.21665128e+01  5.31133458e-02  3.00627000e-01
 -3.90990559e-02  5.29060308e-02 -4.45749713e-04  7.21455940e-01]
Gelman-Rubin statistics for free parameters:
[1.04070395 1.04391759 1.01359305 1.05452592 1.03465832 1.05513652
 1.07643753 1.01383557 1.03130795 1.08898324 1.01518967 1.01549802
 1.01878584 1.059296   1.04268791 1.04436281]

[:::       ]  30.0% completed  (Wed May 15 18:56:42 2024)
Out-of-bound Trials:
[    6 19659     0  1201  1773   241  5402   268    48  1764   171     2
     0     0     0     0]
Best Parameters: (chisq=49.0306)
[ 5.11148215e-01 -2.96910168e+00  2.75566349e-01 -5.06622373e-01
  2.27844141e-01  1.42739189e+03  3.45255612e+01  5.04958690e-02
  1.51924154e+03  3.21665128e+01  5.31133458e-02  3.00627000e-01
 -3.90990559e-02  5.29060308e-02 -4.45749713e-04  7.21455940e-01]
Gelman-Rubin statistics for free parameters:
[1.01267478 1.01193183 1.00374429 1.01092897 1.00937829 1.00655317
 1.00350387 1.00164477 1.00465704 1.00882303 1.00351224 1.00683759
 1.00268963 1.00524447 1.01238426 1.01161427]

[::::      ]  40.0% completed  (Wed May 15 18:57:00 2024)
Out-of-bound Trials:
[    9 29156     0  1906  2652   454  8951   285    81  3421   181     2
     0     0     0     0]
Best Parameters: (chisq=49.0306)
[ 5.11148215e-01 -2.96910168e+00  2.75566349e-01 -5.06622373e-01
  2.27844141e-01  1.42739189e+03  3.45255612e+01  5.04958690e-02
  1.51924154e+03  3.21665128e+01  5.31133458e-02  3.00627000e-01
 -3.90990559e-02  5.29060308e-02 -4.45749713e-04  7.21455940e-01]
Gelman-Rubin statistics for free parameters:
[1.00219844 1.00193164 1.00270254 1.00242991 1.00253898 1.00202185
 1.0023219  1.0018268  1.00197808 1.00417544 1.00260007 1.00147866
 1.00235118 1.00336759 1.00202446 1.0018298 ]
All parameters converged to within 1% of unity.

[:::::     ]  50.0% completed  (Wed May 15 18:57:17 2024)
Out-of-bound Trials:
[   15 39504     0  2690  3567   660 12659   302   134  5185   187     2
     0     0     0     0]
Best Parameters: (chisq=49.0306)
[ 5.11148215e-01 -2.96910168e+00  2.75566349e-01 -5.06622373e-01
  2.27844141e-01  1.42739189e+03  3.45255612e+01  5.04958690e-02
  1.51924154e+03  3.21665128e+01  5.31133458e-02  3.00627000e-01
 -3.90990559e-02  5.29060308e-02 -4.45749713e-04  7.21455940e-01]
Gelman-Rubin statistics for free parameters:
[1.00216381 1.00192736 1.00145218 1.00166651 1.00243878 1.00140878
 1.00177984 1.00170569 1.00118871 1.00325101 1.00162273 1.00045052
 1.00141221 1.00234885 1.0019753  1.00175179]
All parameters converged to within 1% of unity.

[::::::    ]  60.0% completed  (Wed May 15 18:57:35 2024)
Out-of-bound Trials:
[   18 50296     0  3516  4630   878 16486   316   194  7081   196     2
     0     0     0     0]
Best Parameters: (chisq=49.0306)
[ 5.11148215e-01 -2.96910168e+00  2.75566349e-01 -5.06622373e-01
  2.27844141e-01  1.42739189e+03  3.45255612e+01  5.04958690e-02
  1.51924154e+03  3.21665128e+01  5.31133458e-02  3.00627000e-01
 -3.90990559e-02  5.29060308e-02 -4.45749713e-04  7.21455940e-01]
Gelman-Rubin statistics for free parameters:
[1.00146824 1.00134519 1.00071332 1.00114479 1.00181826 1.0006573
 1.00176888 1.0006087  1.00055357 1.0012184  1.00085995 1.00050291
 1.00101082 1.00110086 1.00136138 1.0012221 ]
All parameters converged to within 1% of unity.

All parameters satisfy the GR convergence threshold of 1.01, stopping
the MCMC.

MCMC Summary:
-------------
  Number of evaluated samples:        601830
  Number of parallel chains:               9
  Average iterations per chain:        66870
  Burned-in iterations per chain:      20000
  Thinning factor:                         5
  MCMC sample size (thinned, burned):  84366
  Acceptance rate:   24.15%
med_central

Parameter name     best fit   median      1sigma_low   1sigma_hi        S/N
--------------- -----------  -----------------------------------  ---------
B_mean           5.1115e-01   5.5681e-01 -3.4433e-02  5.5146e-02       11.4
B_PC1           -2.9691e+00  -2.4953e+00 -3.5542e-01  5.6596e-01        6.4
B_PC2            2.7557e-01   2.7039e-01 -2.0701e-02  2.0729e-02       13.4
B_PC3           -5.0662e-01  -4.4657e-01 -4.9591e-02  7.3767e-02        8.0
B_PC4            2.2784e-01   1.8925e-01 -4.6668e-02  3.8958e-02        5.3
G1430_peak       1.4274e+03   1.4271e+03 -3.1174e+00  3.1980e+00      457.1
G1430_std        3.4526e+01   3.4477e+01 -2.9253e+00  2.9915e+00       12.4
G1430_amp        5.0496e-02   5.0050e-02 -4.3360e-03  4.4195e-03       11.5
G1515_peak       1.5192e+03   1.5194e+03 -2.7407e+00  2.8295e+00      535.5
G1515_std        3.2167e+01   3.2985e+01 -2.6867e+00  3.0157e+00       11.7
G1515_amp        5.3113e-02   5.3351e-02 -3.7499e-03  3.7743e-03       14.1
H1635_mean       3.0063e-01   3.0106e-01 -3.1460e-03  3.0936e-03       96.3
H1635_PC1       -3.9099e-02  -4.1653e-02 -1.5008e-02  1.4993e-02        2.6
H1635_PC2        5.2906e-02   5.1733e-02 -1.9698e-02  1.8173e-02        2.8
m               -4.4575e-04  -3.6561e-04 -6.0057e-05  9.6661e-05        5.7
b                7.2146e-01   6.9737e-01 -2.8763e-02  1.8002e-02       30.5

  Best-parameter's chi-squared:        48.0236
  Best-parameter's -2*log(posterior):  49.0306
  Bayesian Information Criterion:     150.2674
  Reduced chi-squared:                  0.0828
  Standard deviation of residuals:  0.0028386

For a detailed summary with all parameter posterior statistics see
/home/docs/checkouts/readthedocs.org/user_builds/pyiroglass/checkouts/latest/docs/examples/transmission_ftir/NPZTXTFILES/RESULTS/AC4_OL53_101220_256s_30x30_a_statistics.txt

Output sampler files:
  /home/docs/checkouts/readthedocs.org/user_builds/pyiroglass/checkouts/latest/docs/examples/transmission_ftir/NPZTXTFILES/RESULTS/AC4_OL53_101220_256s_30x30_a_statistics.txt
  /home/docs/checkouts/readthedocs.org/user_builds/pyiroglass/checkouts/latest/docs/examples/transmission_ftir/NPZTXTFILES/RESULTS/AC4_OL53_101220_256s_30x30_a.npz
  /home/docs/checkouts/readthedocs.org/user_builds/pyiroglass/checkouts/latest/docs/examples/transmission_ftir/LOGFILES/RESULTS/AC4_OL53_101220_256s_30x30_a.log

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
  Multi-core Markov-chain Monte Carlo (mc3).
  Version 3.1.3.
  Copyright (c) 2015-2024 Patricio Cubillos and collaborators.
  mc3 is open-source software under the MIT license (see LICENSE).
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
  Warning:
    The number of requested CPUs (4) is >= than the number of
available CPUs (2).  Enforced ncpu to 1.
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Least-squares best-fitting parameters:
  [ 4.00000000e+00  7.53559935e-01 -4.45578581e-01 -3.77207822e-01
  1.77129379e-01  1.43150802e+03  3.10778620e+01  6.77796735e-02
  1.52279967e+03  3.51836515e+01  7.59276773e-02  1.73134547e-01
  6.68571626e-02 -3.77537401e-02 -2.32934274e-04  2.46271033e+00]

Yippee Ki Yay Monte Carlo!
Start MCMC chains  (Wed May 15 18:57:59 2024)

[:         ]  10.0% completed  (Wed May 15 18:58:16 2024)
Out-of-bound Trials:
[19065     0     0    28   120    12   243    82    21   287   132     6
     0     0     0     0]
Best Parameters: (chisq=1775.0923)
[ 4.00000000e+00  7.53559935e-01 -4.45578581e-01 -3.77207822e-01
  1.77129379e-01  1.43150802e+03  3.10778620e+01  6.77796735e-02
  1.52279967e+03  3.51836515e+01  7.59276773e-02  1.73134547e-01
  6.68571626e-02 -3.77537401e-02 -2.32934274e-04  2.46271033e+00]

[::        ]  20.0% completed  (Wed May 15 18:58:34 2024)
Out-of-bound Trials:
[35263     0     0   106   360    63   850   111    85  2086   146     9
     0     0     0     1]
Best Parameters: (chisq=1775.0923)
[ 4.00000000e+00  7.53559935e-01 -4.45578581e-01 -3.77207822e-01
  1.77129379e-01  1.43150802e+03  3.10778620e+01  6.77796735e-02
  1.52279967e+03  3.51836515e+01  7.59276773e-02  1.73134547e-01
  6.68571626e-02 -3.77537401e-02 -2.32934274e-04  2.46271033e+00]
Gelman-Rubin statistics for free parameters:
[1.01233837 1.01184456 1.00741813 1.01639069 1.02067431 1.01462154
 1.01051526 1.01675564 1.03651138 1.04546149 1.01582452 1.01018856
 1.01004655 1.05970834 1.01461074 1.01542152]

[:::       ]  30.0% completed  (Wed May 15 18:58:51 2024)
Out-of-bound Trials:
[51457     3     0   166   532   101  1467   117   133  6249   154     9
     0     0     0     1]
Best Parameters: (chisq=1775.0923)
[ 4.00000000e+00  7.53559935e-01 -4.45578581e-01 -3.77207822e-01
  1.77129379e-01  1.43150802e+03  3.10778620e+01  6.77796735e-02
  1.52279967e+03  3.51836515e+01  7.59276773e-02  1.73134547e-01
  6.68571626e-02 -3.77537401e-02 -2.32934274e-04  2.46271033e+00]
Gelman-Rubin statistics for free parameters:
[1.004173   1.00282192 1.00093154 1.0032197  1.00057238 1.00195384
 1.00223199 1.00056874 1.00319196 1.0032321  1.00210912 1.00105151
 1.00070689 1.00482151 1.00389478 1.00283896]
All parameters converged to within 1% of unity.

[::::      ]  40.0% completed  (Wed May 15 18:59:08 2024)
Out-of-bound Trials:
[67140     4     0   219   712   152  2118   129   229 10925   162    11
     0     0     0     1]
Best Parameters: (chisq=1775.0923)
[ 4.00000000e+00  7.53559935e-01 -4.45578581e-01 -3.77207822e-01
  1.77129379e-01  1.43150802e+03  3.10778620e+01  6.77796735e-02
  1.52279967e+03  3.51836515e+01  7.59276773e-02  1.73134547e-01
  6.68571626e-02 -3.77537401e-02 -2.32934274e-04  2.46271033e+00]
Gelman-Rubin statistics for free parameters:
[1.00201451 1.00081547 1.00050274 1.00220312 1.00114344 1.00165563
 1.00113884 1.00130837 1.00148193 1.00153847 1.00033209 1.00018104
 1.00059669 1.00304991 1.00111535 1.00100912]
All parameters converged to within 1% of unity.

[:::::     ]  50.0% completed  (Wed May 15 18:59:25 2024)
Out-of-bound Trials:
[82632     6     0   286   908   194  2820   134   326 16046   167    11
     0     0     0     1]
Best Parameters: (chisq=1775.0923)
[ 4.00000000e+00  7.53559935e-01 -4.45578581e-01 -3.77207822e-01
  1.77129379e-01  1.43150802e+03  3.10778620e+01  6.77796735e-02
  1.52279967e+03  3.51836515e+01  7.59276773e-02  1.73134547e-01
  6.68571626e-02 -3.77537401e-02 -2.32934274e-04  2.46271033e+00]
Gelman-Rubin statistics for free parameters:
[1.00249236 1.00068757 1.00034393 1.00107017 1.00048702 1.0010046
 1.00080603 1.00056217 1.00138244 1.0017703  1.00021123 1.00078196
 1.00067332 1.00245056 1.00108487 1.00056292]
All parameters converged to within 1% of unity.

[::::::    ]  60.0% completed  (Wed May 15 18:59:42 2024)
Out-of-bound Trials:
[98014     6     0   335  1081   242  3506   135   403 21307   170    12
     0     0     0     1]
Best Parameters: (chisq=1775.0923)
[ 4.00000000e+00  7.53559935e-01 -4.45578581e-01 -3.77207822e-01
  1.77129379e-01  1.43150802e+03  3.10778620e+01  6.77796735e-02
  1.52279967e+03  3.51836515e+01  7.59276773e-02  1.73134547e-01
  6.68571626e-02 -3.77537401e-02 -2.32934274e-04  2.46271033e+00]
Gelman-Rubin statistics for free parameters:
[1.00146932 1.00058927 1.0005502  1.00109683 1.00036258 1.001198
 1.00043513 1.00066496 1.00044444 1.00184353 1.00017173 1.00064879
 1.00079275 1.00206012 1.00075582 1.00061728]
All parameters converged to within 1% of unity.

All parameters satisfy the GR convergence threshold of 1.01, stopping
the MCMC.

MCMC Summary:
-------------
  Number of evaluated samples:        601785
  Number of parallel chains:               9
  Average iterations per chain:        66865
  Burned-in iterations per chain:      20000
  Thinning factor:                         5
  MCMC sample size (thinned, burned):  84357
  Acceptance rate:   20.74%
med_central

Parameter name     best fit   median      1sigma_low   1sigma_hi        S/N
--------------- -----------  -----------------------------------  ---------
B_mean           4.0000e+00   3.9958e+00 -7.0525e-03  3.1259e-03      635.7
B_PC1            7.5356e-01   6.9709e-01 -1.0260e-01  9.1575e-02        7.4
B_PC2           -4.4558e-01  -4.4482e-01 -1.8960e-02  1.9145e-02       23.4
B_PC3           -3.7721e-01  -3.8551e-01 -2.7287e-02  2.6996e-02       13.8
B_PC4            1.7713e-01   1.8305e-01 -2.2940e-02  2.3830e-02        7.5
G1430_peak       1.4315e+03   1.4317e+03 -2.2220e+00  2.4052e+00      626.9
G1430_std        3.1078e+01   3.1377e+01 -2.2298e+00  2.5538e+00       12.9
G1430_amp        6.7780e-02   6.7566e-02 -4.1183e-03  4.0412e-03       16.6
G1515_peak       1.5228e+03   1.5232e+03 -2.2956e+00  2.4261e+00      639.5
G1515_std        3.5184e+01   3.5486e+01 -2.7842e+00  2.6735e+00       14.1
G1515_amp        7.5928e-02   7.5496e-02 -3.3240e-03  3.3605e-03       22.6
H1635_mean       1.7313e-01   1.7288e-01 -3.0152e-03  3.0531e-03       56.8
H1635_PC1        6.6857e-02   6.7955e-02 -1.4359e-02  1.4622e-02        4.6
H1635_PC2       -3.7754e-02  -4.1360e-02 -2.3998e-02  2.3766e-02        1.6
m               -2.3293e-04  -2.4200e-04 -1.4715e-05  1.2075e-05       15.8
b                2.4627e+00   2.4657e+00 -6.3437e-03  6.6631e-03      370.2

  Best-parameter's chi-squared:       1773.9710
  Best-parameter's -2*log(posterior): 1775.0923
  Bayesian Information Criterion:     1876.2148
  Reduced chi-squared:                   3.0586
  Standard deviation of residuals:  0.0172524

For a detailed summary with all parameter posterior statistics see
/home/docs/checkouts/readthedocs.org/user_builds/pyiroglass/checkouts/latest/docs/examples/transmission_ftir/NPZTXTFILES/RESULTS/STD_D1010_012821_256s_100x100_a_statistics.txt

Output sampler files:
  /home/docs/checkouts/readthedocs.org/user_builds/pyiroglass/checkouts/latest/docs/examples/transmission_ftir/NPZTXTFILES/RESULTS/STD_D1010_012821_256s_100x100_a_statistics.txt
  /home/docs/checkouts/readthedocs.org/user_builds/pyiroglass/checkouts/latest/docs/examples/transmission_ftir/NPZTXTFILES/RESULTS/STD_D1010_012821_256s_100x100_a.npz
  /home/docs/checkouts/readthedocs.org/user_builds/pyiroglass/checkouts/latest/docs/examples/transmission_ftir/LOGFILES/RESULTS/STD_D1010_012821_256s_100x100_a.log

It took 3 minutes to process 3 spectra on my MacBook Pro (2.6 GHz 6-core Intel Core i7). It takes about 7.5 minutes to process 3 spectra on Google Colab, which provides fewer CPUs.
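After a run, it can be worth checking the FAILURES list before moving on. The sketch below assumes that FAILURES holds sample names matching the dictionary keys, which is an assumption rather than something confirmed here; `spectra`, `df_output`, and `failures` are hypothetical stand-ins with made-up contents.

```python
import pandas as pd

# Hypothetical stand-ins for DFS_DICT, DF_OUTPUT, and FAILURES.
spectra = {"sample_a": "spectrum a", "sample_b": "spectrum b",
           "sample_c": "spectrum c"}
df_output = pd.DataFrame({"PH_3550_M": [2.17, 1.52]},
                         index=["sample_a", "sample_b"])
failures = ["sample_c"]  # assumption: a list of sample names

if failures:
    print(f"{len(failures)} sample(s) failed: {failures}")

# Collect the failed spectra so they can be inspected or rerun separately.
retry_spectra = {name: spectra[name] for name in failures if name in spectra}

# Keep only the rows for samples that were fit successfully.
fitted = df_output.loc[~df_output.index.isin(failures)]
```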

calculate_baselines returns a dataframe of outputs. Let’s look at what’s included.

[10]:

DF_OUTPUT

[10]:

	PH_3550_M	PH_3550_STD	H2Ot_3550_MAX	BL_H2Ot_3550_MAX	H2Ot_3550_SAT	PH_1635_BP	PH_1635_STD	PH_1515_BP	PH_1515_STD	P_1515_BP	...	PC4_BP	PC4_STD	m_BP	m_STD	b_BP	b_STD	PH_1635_PC1_BP	PH_1635_PC1_STD	PH_1635_PC2_BP	PH_1635_PC2_STD
AC4_OL49_021920_30x30_H2O_a	2.17225	0.002212	2.649837	0.459859	*	0.659013	0.003082	0.107060	0.003619	1517.455586	...	0.300000	0.024590	-0.000311	0.000070	1.234282	0.021843	0.122071	0.014446	0.015369	0.021786
AC4_OL53_101220_256s_30x30_a	1.523343	0.003309	1.631044	0.135541	-	0.300627	0.003121	0.053113	0.003760	1519.241535	...	0.227844	0.043025	-0.000446	0.000079	0.721456	0.023675	-0.039099	0.014984	0.052906	0.019022
STD_D1010_012821_256s_100x100_a	2.710894	0.001587	2.915508	0.206641	*	0.173135	0.003047	0.075928	0.003358	1522.799669	...	0.177129	0.023619	-0.000233	0.000015	2.462710	0.006652	0.066857	0.014413	-0.037754	0.023534

3 rows × 45 columns

Given the size of this dataframe, it helps to list all of its columns.

[11]:

DF_OUTPUT.columns

[11]:

Index(['PH_3550_M', 'PH_3550_STD', 'H2Ot_3550_MAX', 'BL_H2Ot_3550_MAX',
       'H2Ot_3550_SAT', 'PH_1635_BP', 'PH_1635_STD', 'PH_1515_BP',
       'PH_1515_STD', 'P_1515_BP', 'P_1515_STD', 'STD_1515_BP', 'STD_1515_STD',
       'PH_1430_BP', 'PH_1430_STD', 'P_1430_BP', 'P_1430_STD', 'STD_1430_BP',
       'STD_1430_STD', 'PH_5200_M', 'PH_5200_STD', 'PH_4500_M', 'PH_4500_STD',
       'STN_P5200', 'ERR_5200', 'STN_P4500', 'ERR_4500', 'AVG_BL_BP',
       'AVG_BL_STD', 'PC1_BP', 'PC1_STD', 'PC2_BP', 'PC2_STD', 'PC3_BP',
       'PC3_STD', 'PC4_BP', 'PC4_STD', 'm_BP', 'm_STD', 'b_BP', 'b_STD',
       'PH_1635_PC1_BP', 'PH_1635_PC1_STD', 'PH_1635_PC2_BP',
       'PH_1635_PC2_STD'],
      dtype='object')

All columns with the prefix PH represent a peak height. Columns with the suffix _M represent the mean value, and columns with the suffix _STD represent 1 \(\sigma\).
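These naming conventions make it easy to pull out groups of columns. The sketch below uses a short subset of the column names listed above; the variable names are illustrative.

```python
import pandas as pd

# A subset of the DF_OUTPUT column names shown above.
columns = pd.Index([
    "PH_3550_M", "PH_3550_STD", "H2Ot_3550_SAT", "PH_1635_BP",
    "PH_1635_STD", "P_1515_BP", "STD_1515_BP", "STD_1515_STD",
])

# Peak heights share the PH_ prefix.
peak_height_cols = [c for c in columns if c.startswith("PH_")]

# Mean values end in _M, and 1-sigma uncertainties end in _STD.
mean_cols = [c for c in columns if c.endswith("_M")]
sigma_cols = [c for c in columns if c.endswith("_STD")]

print(peak_height_cols)
print(mean_cols)
print(sigma_cols)
```

Using `DF_OUTPUT.columns` in place of `columns`, a selection such as `DF_OUTPUT[peak_height_cols]` would return only the peak-height columns.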

The column H2Ot_3550_SAT returns a - if the sample is not saturated, and a * if the sample is saturated. This is based on the maximum absorbance of the peak, and the warning of * indicates that the concentrations must be considered more carefully. The functions that calculate concentration handle this and will suggest the best values to use.
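To list the flagged samples, one can filter on this column. The sketch below rebuilds two columns of DF_OUTPUT from the values shown above; `toy_output` is a stand-in name.

```python
import pandas as pd

# Two columns copied from the DF_OUTPUT table above.
toy_output = pd.DataFrame(
    {
        "H2Ot_3550_MAX": [2.649837, 1.631044, 2.915508],
        "H2Ot_3550_SAT": ["*", "-", "*"],
    },
    index=[
        "AC4_OL49_021920_30x30_H2O_a",
        "AC4_OL53_101220_256s_30x30_a",
        "STD_D1010_012821_256s_100x100_a",
    ],
)

# Samples flagged with * are treated as saturated at the 3550 peak.
saturated = toy_output.index[toy_output["H2Ot_3550_SAT"] == "*"].tolist()
print(saturated)
```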

The columns STN_P5200 and STN_P4500 represent the signal-to-noise ratios for the \(\mathrm{H_2O_{m,5200}}\) and \(\mathrm{OH^-_{4500}}\) peaks. If the values are greater than 4, indicating that the signal is meaningful, the ERR_5200 and ERR_4500 columns return a - value. If the signal-to-noise ratio is too low, the warning of * is returned.
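The flagging rule can be written in a few lines. This is a minimal sketch of the rule as described, not the package's own implementation: `stn_flag` is a hypothetical helper, the signal-to-noise values are made up, and the handling of a value exactly equal to 4 is an assumption.

```python
import pandas as pd

def stn_flag(stn, threshold=4.0):
    """Return '-' when the signal-to-noise ratio exceeds the threshold, else '*'."""
    return "-" if stn > threshold else "*"

# Made-up signal-to-noise ratios for two hypothetical samples.
toy = pd.DataFrame(
    {"STN_P5200": [12.3, 2.1], "STN_P4500": [5.6, 3.9]},
    index=["sample_a", "sample_b"],
)

toy["ERR_5200"] = toy["STN_P5200"].map(stn_flag)
toy["ERR_4500"] = toy["STN_P4500"].map(stn_flag)
print(toy)
```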

The remaining columns describe the fitting parameters for generating the baseline and the \(\mathrm{H_2O_{m,1635}}\) peak, so you can generate the baseline yourself.
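The carbonate columns can be reused in a similar spirit. The sketch below assumes each carbonate peak is a Gaussian defined by a position, a standard deviation, and an amplitude, as the parameter names G1430_peak, G1430_std, and G1430_amp in the MCMC summary suggest; the exact functional form used by PyIRoGlass is not shown on this page, so treat this as an illustration. The numbers are the best-fit values printed above for AC4_OL49_021920_30x30_H2O_a.

```python
import numpy as np

def gaussian(x, center, std, amplitude):
    """Gaussian peak with the given center, standard deviation, and height."""
    return amplitude * np.exp(-0.5 * ((x - center) / std) ** 2)

# Wavenumber grid with 1 cm^-1 spacing (an arbitrary choice for plotting).
wavenumber = np.linspace(1250.0, 1700.0, 451)

# Best-fit values from the MCMC summary table for the first sample.
peak_1430 = gaussian(wavenumber, 1427.9, 28.821, 0.10942)
peak_1515 = gaussian(wavenumber, 1517.5, 35.847, 0.10706)

# The carbonate doublet is the sum of the two peaks.
doublet = peak_1430 + peak_1515
```

Plotting `doublet` against `wavenumber` gives a rough picture of the carbonate contribution, without the baseline or the 1635 peak.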

Outputs

Quite a few figures, log files, and npz files are generated by calculate_baselines, assuming you provide an export path rather than None. Let’s look at a few of them together.

PyIRoGlass creates this figure to visualize how each peak within the 1000-5500 cm\({^{-1}}\) range is fit, with the peak heights shown.

[12]:

Image("https://github.com/sarahshi/PyIRoGlass/raw/main/docs/_static/AC4_OL49_021920_30x30_H2O_a.png")

[12]:

../../_images/examples_transmission_ftir_PyIRoGlass_Transmission_27_0.png

We can visualize how well PyIRoGlass fits this transmission FTIR spectrum with the modelfit figure. This plots the fit from \(\mathrm{MC^3}\) against the transmission FTIR spectrum, along with the fit residual.

[13]:

Image("https://github.com/sarahshi/PyIRoGlass/raw/main/docs/_static/AC4_OL49_021920_30x30_H2O_a_modelfit.png")

[13]:

../../_images/examples_transmission_ftir_PyIRoGlass_Transmission_29_0.png

The histogram figure shows the posterior probability density distribution of each parameter, with the mean value displayed as the navy dashed line. The shaded region represents the 68% confidence interval around the mean.

[14]:

Image("https://github.com/sarahshi/PyIRoGlass/raw/main/docs/_static/AC4_OL49_021920_30x30_H2O_a_histogram.png")

[14]:

../../_images/examples_transmission_ftir_PyIRoGlass_Transmission_31_0.png
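To make the histogram summary concrete, here is one common way to obtain a mean and a central 68% interval from a set of posterior samples. It is a sketch under stated assumptions: the samples are synthetic, and taking the 16th and 84th percentiles is one convention that may not be exactly what the figure uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the posterior samples of one peak height.
samples = rng.normal(loc=1.50, scale=0.05, size=100_000)

mean = samples.mean()
std = samples.std(ddof=1)

# Central 68% of the samples, bounded by the 16th and 84th percentiles.
lower, upper = np.percentile(samples, [16, 84])

print(f"mean = {mean:.3f}, 1 sigma = {std:.3f}")
print(f"68% interval = [{lower:.3f}, {upper:.3f}]")
```

For a roughly Gaussian posterior, the half-width of this interval is close to one standard deviation, which is why the _STD columns can be read as 1 \(\sigma\).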

The pairwise figure plots the posterior probability density distribution for the 16 fitting parameters of Equation 10, so that covariance between the parameters can be visualized. Including this covariance is necessary to propagate uncertainty properly.

[15]:

Image("https://github.com/sarahshi/PyIRoGlass/raw/main/docs/_static/AC4_OL49_021920_30x30_H2O_a_pairwise.png")

[15]:

../../_images/examples_transmission_ftir_PyIRoGlass_Transmission_33_0.png
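A small numerical illustration of why covariance matters: when two parameters are anticorrelated, the uncertainty on their sum is smaller than the value obtained by adding their variances independently. The two-parameter posterior below is synthetic and its covariance matrix is an assumption chosen for illustration, not one taken from a PyIRoGlass fit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic two-parameter posterior with built-in negative correlation.
true_cov = np.array([[0.04, -0.018], [-0.018, 0.01]])
samples = rng.multivariate_normal(mean=[1.0, 0.5], cov=true_cov, size=50_000)

cov = np.cov(samples, rowvar=False)        # 2 x 2 sample covariance matrix
corr = np.corrcoef(samples, rowvar=False)  # 2 x 2 correlation matrix

# Variance of the sum a + b, with and without the covariance term.
var_with = cov[0, 0] + cov[1, 1] + 2 * cov[0, 1]
var_without = cov[0, 0] + cov[1, 1]

print(f"correlation      = {corr[0, 1]:.2f}")
print(f"sigma(a+b), full = {np.sqrt(var_with):.3f}")
print(f"sigma(a+b), diag = {np.sqrt(var_without):.3f}")
```

Ignoring the off-diagonal term here would overstate the uncertainty; with positively correlated parameters it would understate it.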

The trace figure shows how the parameters evolve through MCMC sampling.

[16]:

Image("https://github.com/sarahshi/PyIRoGlass/raw/main/docs/_static/AC4_OL49_021920_30x30_H2O_a_trace.png")

[16]:

../../_images/examples_transmission_ftir_PyIRoGlass_Transmission_35_0.png

LOG and NPZ

.log files record the performance of the MCMC algorithm through the samples, and the best parameters at each 10% increment. These are shown above.

.npz files store all the best-parameters, sampled parameters, etc. in a ready-to-use NumPy format.
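For readers who do want to inspect an archive, NumPy can list and load its contents directly. The sketch below writes a small stand-in file first so that it runs on its own; the file name and the array names bestp and posterior are hypothetical, so list the names stored in your own file before relying on any of them.

```python
import os
import tempfile

import numpy as np

# Write a stand-in archive; the array names used here are hypothetical.
tmp_dir = tempfile.mkdtemp()
npz_path = os.path.join(tmp_dir, "example_sample.npz")
np.savez(
    npz_path,
    bestp=np.array([1.0, 2.0, 3.0]),
    posterior=np.random.default_rng(0).normal(size=(100, 3)),
)

# Open the archive, list what it holds, and pull out one array.
with np.load(npz_path) as archive:
    names = archive.files
    for name in names:
        print(name, archive[name].shape)
    bestp = archive["bestp"]

print(bestp)
```

If loading fails because an archive holds Python objects rather than plain arrays, np.load accepts allow_pickle=True, which should only be used on files you trust.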

We won’t open these here, but these are quite useful to review!

Concentrations

We now want to convert all those peak heights (with uncertainties) to concentrations (with uncertainties) by applying the Beer-Lambert Law. We do so with the calculate_concentrations function, which draws N samples in a secondary MCMC and takes the following parameters, in the order used in the call below:

DF_OUTPUT: Output from Run_All_Spectra
CHEMISTRY: Glass composition loaded from ChemThick
THICKNESS: Wafer thickness loaded from ChemThick
OUTPUT_PATH: Output directory name, or None to prevent figure generation.

[17]:

concentrations_df = pig.calculate_concentrations(DF_OUTPUT, CHEMISTRY, THICKNESS, OUTPUT_PATH)
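To show what the conversion involves, the sketch below applies the Beer-Lambert Law to a single peak and propagates uncertainty by simple Monte Carlo sampling. It is an illustration, not the library's implementation: the function name beer_lambert_wt_percent and all input values are hypothetical, and density is held fixed for simplicity.

```python
import numpy as np


def beer_lambert_wt_percent(molar_mass, absorbance, density, thickness_um, epsilon):
    """Concentration in wt.% from the Beer-Lambert Law.

    molar_mass in g/mol, density in kg/m^3 (equivalently g/L),
    thickness in micrometers, epsilon in L/(mol cm).
    The factor 1e6 combines the um -> cm conversion (1e4) and the
    fraction -> percent conversion (1e2).
    """
    return 1e6 * molar_mass * absorbance / (density * thickness_um * epsilon)


rng = np.random.default_rng(42)
N = 100_000  # number of Monte Carlo draws

# Hypothetical inputs, each given as (mean, 1 sigma).
absorbance = rng.normal(1.00, 0.03, N)  # peak height
thickness = rng.normal(50.0, 3.0, N)    # wafer thickness, um
epsilon = rng.normal(63.0, 5.0, N)      # molar absorptivity, L/(mol cm)
density = 2700.0                        # glass density, kg/m^3 (held fixed)
M_H2O = 18.015                          # molar mass of H2O, g/mol

central = beer_lambert_wt_percent(M_H2O, 1.00, density, 50.0, 63.0)
draws = beer_lambert_wt_percent(M_H2O, absorbance, density, thickness, epsilon)

print(f"central estimate = {central:.3f} wt.%")
print(f"Monte Carlo      = {draws.mean():.3f} +/- {draws.std(ddof=1):.3f} wt.%")
```

Sampling the inputs together, rather than adding their relative errors by hand, carries the uncertainty in peak height, thickness, and molar absorptivity through to the concentration in one step.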

We’re all done now! Let’s print your results.

[18]:

concentrations_df

[18]:

	H2Ot_MEAN	H2Ot_STD	H2Ot_3550_M	H2Ot_3550_STD	H2Ot_3550_SAT	H2Om_1635_BP	H2Om_1635_STD	CO2_MEAN	CO2_STD	CO2_1515_BP	...	epsilon_H2Ot_3550	sigma_epsilon_H2Ot_3550	epsilon_H2Om_1635	sigma_epsilon_H2Om_1635	epsilon_CO2	sigma_epsilon_CO2	epsilon_H2Om_5200	sigma_epsilon_H2Om_5200	epsilon_OH_4500	sigma_epsilon_OH_4500
AC4_OL49_021920_30x30_H2O_a	2.545209	0.160672	2.4206	0.1894	*	1.301434	0.189726	753.184415	36.514063	745.9116	...	66.142594	7.503769	37.322108	8.645060	258.429949	18.362798	1.009458	0.300803	0.861196	0.279571
AC4_OL53_101220_256s_30x30_a	4.036918	0.431879	4.036918	0.431879	-	1.49134	0.258261	737.553146	62.397098	756.185931	...	64.493655	7.380834	34.452486	8.504269	293.261300	16.287120	0.901474	0.295829	0.779611	0.274924
STD_D1010_012821_256s_100x100_a	0.903953	0.096934	1.201405	0.087883	*	0.153236	0.025411	157.114814	7.287584	165.492165	...	62.751984	7.251813	31.421482	8.354348	311.700491	15.127604	0.787418	0.290510	0.693438	0.270004

3 rows × 35 columns

There are a few things to note. Each column with the suffix _MEAN represents the mean value, _BP represents the best-parameter value from MCMC, and _STD represents the standard deviation. We recommend using the ‘H2Ot_MEAN’, ‘H2Ot_STD’, ‘CO2_MEAN’, and ‘CO2_STD’ columns. The columns with the prefix STN show the signal-to-noise ratio of the NIR peaks, and the columns with the prefix ERR summarize this information, returning a ‘-’ if the peaks are meaningful and a ‘*’ if the signal is too low.

Concentrations of \(\mathrm{H_2O}\) depend on whether your sample is saturated or not. If your sample is unsaturated (marked by H2Ot_3550_SAT == ‘-’), the column ‘H2Ot_MEAN’ equals ‘H2Ot_3550_M’. If your sample is saturated (marked by H2Ot_3550_SAT == ‘*’), the column ‘H2Ot_MEAN’ equals ‘H2Om_1635_BP’ + ‘OH_4500_M’. In that case the \(\mathrm{H_2O_{t, 3550}}\) peak cannot be used, given potential nonlinearity in the Beer-Lambert Law. See the discussion of this handling of speciation in the paper.
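The selection rule above can be written as a single vectorized expression. This is a sketch with made-up values: the sample names, the numbers, and the result column H2Ot_best are hypothetical, and only the input column names and the rule itself come from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical concentrations in wt.%; real ones come from concentrations_df.
df = pd.DataFrame(
    {
        "H2Ot_3550_M": [2.40, 4.00],
        "H2Ot_3550_SAT": ["*", "-"],
        "H2Om_1635_BP": [1.30, 1.50],
        "OH_4500_M": [1.20, 2.40],
    },
    index=["saturated_sample", "unsaturated_sample"],
)

saturated = df["H2Ot_3550_SAT"] == "*"

# Saturated: sum the molecular water and hydroxyl species.
# Unsaturated: use the total water peak at 3550 directly.
df["H2Ot_best"] = np.where(
    saturated,
    df["H2Om_1635_BP"] + df["OH_4500_M"],
    df["H2Ot_3550_M"],
)

print(df[["H2Ot_3550_SAT", "H2Ot_best"]])
```

Comparing such a column against ‘H2Ot_MEAN’ in your own results is a quick check that the saturation flag was applied as expected.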

The column ‘Density’ contains the densities used for the final concentration. The values in ‘Density’ and ‘Density_Sat’ differ if the sample is saturated, showing the change in density when using variable concentrations of \(\mathrm{H_2O_m}\).

‘Tau’ and ‘Eta’ are the compositional parameters required for determining molar absorptivity. All calculated molar absorptivities (epsilon_ prefix) and their uncertainties (sigma_ prefix) from the inversion are provided in the dataframe.
