All-electron DFT calculations of very large proteins are carried out with the DFT MO calculation program ProteinDF together with a semi-automatic calculation program based on the QCLO (Quasi-Canonical Localized Orbital) or RMO (Regional Molecular Orbital) method. The QCLO and RMO programs generate the initial guess for ProteinDF. Once the QCLO or RMO program has run, ProteinDF reads the generated initial guess and performs all-electron calculations according to user-specified scenarios.
Warning
The current QCLO program is no longer being updated. Once a new fully automatic calculation program has been completed, the descriptions in this manual may become outdated.
The automatic all-electron calculation program QCLO helps the calculation converge by using the QCLO or RMO method. QCLOs and RMOs are orbitals that are localized on a particular unit, such as an amino acid residue, yet remain relatively close to the canonical orbitals. Extracting and combining these QCLOs or RMOs yields a good initial guess for the MO calculation of a peptide chain. From the result of that calculation we obtain new QCLOs or RMOs, which in turn serve as the initial guess for a calculation on a longer peptide chain. By repeating this process and gradually extending the peptide chain, we achieve an all-electron calculation of the whole protein. We call this method the “Auto-convergence method for all-electron calculation of proteins”. It makes it easy to perform accurate quantum chemical calculations on proteins with a wide variety of functions.
In general, it is difficult to obtain the electronic state of a large molecule, such as a protein or peptide chain, from a single-point calculation without any reference. We therefore first divide the protein into smaller fragments, such as amino acid residues, and start by calculating those fragments. Using their results, we then calculate progressively larger peptide chains. Fig. 1 shows an overview of the method.
Step 1 divides a protein or peptide chain into single amino acid residues and performs a calculation on each residue. Step 2 calculates units of three residues using the results of Step 1. From Step 2 onwards, residues are extracted with overlaps, as shown in Fig. 10.1. Using the results of Step 2 as the initial guess, Step 3 calculates peptide chains made of several residues. In this step, the initial guess is generated by combining the results for the first two residues of the peptide chain, the results for the middle residue of each three-residue unit, and the results for the last two residues. By repeating this procedure and gradually extending the peptide chains, we finally obtain the total energy of the protein. We call the molecules generated during this process frame molecules. Step 1 uses Harris’s guess, and Step 2 generates the initial guess by combining the electron densities obtained in Step 1. In this approach, the more frame molecules are combined, the more errors accumulate. Furthermore, relatively large errors may occur at the sites where fragments are joined, even though residues are extracted with overlaps. These errors can become a critical problem, especially in calculations on large molecules. From Step 3 onwards, therefore, the initial guess is generated by a different method based on localized orbitals.
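The Step 3 combination rule can be pictured as a mapping from each residue to the three-residue Step 2 frame whose result it contributes. The following is a minimal sketch, not the program's code. `residue_sources` is a hypothetical helper. It assumes consecutive residue numbering, Step 2 frames that are consecutive three-residue units, and frames represented as `(first, last)` tuples:

```python
def residue_sources(first, last):
    """Map every residue in [first, last] to the three-residue frame it is taken from.

    Middle residues come from the frame centred on them; the first two and last
    two residues come from the first and last frames respectively.
    """
    if last - first < 2:
        raise ValueError("the chain must contain at least three residues")
    sources = {}
    for r in range(first, last + 1):
        # centre the frame on r, then clamp it so it stays inside the chain
        start = min(max(r - 1, first), last - 2)
        sources[r] = (start, start + 2)
    return sources


print(residue_sources(18, 24))
```

For the chain 18-24 used in the QcStep example, residues 18 and 19 come from frame 18-20, and residues 23 and 24 come from frame 22-24. Each residue in between comes from the frame centred on it.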
The expression of molecular orbitals has a degree of freedom: they can be transformed into many equivalent forms by unitary transformations. Two forms are typically used, canonical orbitals and localized orbitals. Localized orbitals are obtained by choosing the transformation that concentrates each orbital as much as possible in a small region of space. Well-known localization criteria are those proposed by Edmiston and Rüdenberg and by Foster and Boys. Other methods include the Population method of Pipek and Mezey and the RMO method of Gu. In every method, the more strongly the orbitals are localized in specific regions, the greater the value of the localization index. Chemists find the Edmiston-Rüdenberg, Population, and RMO methods intuitive. The localized orbitals these methods produce place core electrons around a nucleus, bonding valence electrons around the bond, and non-bonding valence electrons in lone-pair orbitals.
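The statement that a more localized set gives a larger index can be checked directly for the Population (Pipek-Mezey) criterion. Its index is D = Σ_i Σ_A (Q_A^i)², where Q_A^i is the Mulliken gross population of orbital i on atom A. The following NumPy sketch only evaluates this index and does not perform the localization itself. The function name and its arguments are illustrative assumptions, not ProteinDF's API:

```python
import numpy as np


def pipek_mezey_index(C, S, ao_atom):
    """Population (Pipek-Mezey) localization index D = sum_i sum_A (Q_A^i)^2.

    C       : (n_ao, n_orb) MO coefficients, one orbital per column
    S       : (n_ao, n_ao) AO overlap matrix
    ao_atom : atom label of each AO (length n_ao)
    """
    ao_atom = np.asarray(ao_atom)
    # Mulliken contribution of AO mu to orbital i: C[mu, i] * (S C)[mu, i]
    contrib = C * (S @ C)
    index = 0.0
    for atom in np.unique(ao_atom):
        # Q_A^i for every orbital i at once: sum the rows belonging to atom A
        q = contrib[ao_atom == atom].sum(axis=0)
        index += float(np.sum(q ** 2))
    return index


# Two atoms with one orthonormal AO each (S = I) and two orbitals.
S = np.eye(2)
delocalized = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
localized = np.eye(2)  # a unitary rotation of the same two orbitals
print(pipek_mezey_index(delocalized, S, [0, 1]))  # about 1.0
print(pipek_mezey_index(localized, S, [0, 1]))    # 2.0
```

Both orbital sets span the same space. The one with each orbital on a single atom has the larger index, which is why a localization method maximizes it.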
ProteinDF employs the Population method and the RMO method, which are computationally faster than the Edmiston-Rüdenberg method. As mentioned above, from Step 3 onwards we use a method based on localized orbitals to generate a good guess for the peptide chain calculation. When MOs are expressed in localized form, they can be separated into individual orbitals under a chemically reasonable approximation. Once the orbitals have been localized through this somewhat involved procedure, they can be separated and recombined safely and freely, giving an initial guess of good accuracy. We named these orbitals Quasi-Canonical Localized Orbitals (QCLOs) after their properties. The RMO method, which localizes orbitals into specified regions, follows essentially the same procedure, although its underlying calculation differs. In general, the larger the molecule, the greater the speed advantage of the RMO method over the QCLO method. As shown in Fig. 2, these methods generate initial guesses in three stages:
1) dividing a peptide chain into fragments, such as amino acid residue side chains and peptide bonds;
2) obtaining, for each fragment, orbitals that spread only over that fragment and resemble the fragment’s canonical orbitals;
3) combining all the fragment orbitals to obtain an initial guess for the MO calculation of the entire peptide chain.
For orbital localization, we start from frame molecules of three or more residues. This includes the effect of the surroundings of the target molecule and describes the peptide bonds accurately. Note that when calculating frame molecules of peptide chains, the user must group them into main chains and side chains. This grouping allows the system to divide the target molecule into fragments automatically.
The procedure for generating QCLOs and RMOs is as follows:
Step 1: Molecular orbital calculations for each frame molecule
Perform the molecular orbital calculation of each frame molecule. For the structure of each frame molecule, use the corresponding part of the original peptide chain. At the split points, cap the N-terminus with H and the C-terminus with OH. The orbitals obtained here are canonical orbitals that spread over the whole frame molecule.
Step 2: Localized orbital calculations for each frame molecule
Transform the molecular orbitals obtained in Step 1 to those localized on individual chemical bonds or lone pairs. The calculation procedure varies between QCLOs and RMOs.
Step 3: QCLO calculations for each fragment
From the orbitals obtained in Step 2, select the orbitals that belong to each fragment. Using their coefficient matrix, transform the Kohn-Sham matrix of the frame molecule (the Fock matrix for ab initio HF) from the atomic orbital basis to the localized orbital basis. Solving the eigenvalue equation of the fragment’s Kohn-Sham matrix gives orbitals that are localized on the fragment yet spread over the entire fragment. Repeat Steps 1 to 3 for all frame molecules and fragments to obtain their QCLOs or RMOs. Step 4 then generates the initial guess.
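The Step 3 transformation and fragment eigenvalue problem can be written in a few lines of NumPy. This is a sketch under stated assumptions, not ProteinDF's implementation. `fragment_qclo` and its arguments are hypothetical. The localized orbitals are assumed to be orthonormal (C_lo.T @ S @ C_lo = I), which holds when they come from a unitary transformation of canonical orbitals. The fragment assignment is assumed to be given as a list of LO column indices:

```python
import numpy as np


def fragment_qclo(F_ao, C_lo, frag_cols):
    """QCLOs of one fragment (sketch).

    F_ao      : (n_ao, n_ao) Kohn-Sham (or Fock) matrix of the frame molecule, AO basis
    C_lo      : (n_ao, n_lo) localized orbitals of the frame molecule, assumed orthonormal
    frag_cols : indices of the LOs assigned to this fragment
    """
    C_frag = C_lo[:, frag_cols]            # LOs picked up for this fragment
    F_frag = C_frag.T @ F_ao @ C_frag      # fragment block of the KS matrix in the LO basis
    eps, U = np.linalg.eigh(F_frag)        # ordinary symmetric eigenproblem (LOs are orthonormal)
    return eps, C_frag @ U                 # QCLO coefficients expressed in the AO basis


# Toy data: orthonormal AO basis (S = I), random symmetric F, random orthonormal LOs.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
F_ao = (A + A.T) / 2
C_lo, _ = np.linalg.qr(rng.standard_normal((6, 6)))
eps, C_qclo = fragment_qclo(F_ao, C_lo, [0, 2, 3])
```

The resulting orbitals are confined to the span of the fragment's LOs, yet they diagonalize the KS matrix within that span. This is the sense in which they are "quasi-canonical".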
Step 4: Combining localized orbitals
The QCLOs and RMOs obtained in Step 3 are calculated for each frame molecule. It is therefore necessary to remove the orbital components of the H and OH groups added in Step 1, since these groups do not exist in the actual peptide. Then combine the QCLOs or RMOs of all fragments to generate an orbital set for the entire peptide chain, and apply Löwdin orthogonalization to this set. The resulting orbitals are almost identical to those from Step 3, because Löwdin (symmetric) orthogonalization changes the original orbitals as little as possible. Through these processes, we obtain an orthogonalized LCAO matrix over the entire peptide chain.
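Step 4 (dropping the cap components, combining, and orthogonalizing) can be sketched with NumPy as follows. The data layout is an assumption made for illustration. Each fragment supplies its coefficients in its frame-molecule AO basis together with a hypothetical `ao_map` that gives, for each frame AO, the matching AO index in the full peptide chain, or `None` for AOs on the added H/OH caps. Löwdin orthogonalization is the standard C' = C (Cᵀ S C)^(-1/2):

```python
import numpy as np


def combine_fragment_orbitals(fragments, n_ao):
    """Stack fragment orbitals into one (n_ao, n_orb) matrix for the whole chain.

    fragments: list of (ao_map, coeffs). coeffs is (n_frame_ao, n_orb_frag) in the
    frame-molecule AO basis; ao_map[k] is the peptide-chain AO index of frame AO k,
    or None for AOs on the added H/OH caps, whose coefficients are dropped.
    """
    columns = []
    for ao_map, coeffs in fragments:
        block = np.zeros((n_ao, coeffs.shape[1]))
        for k, target in enumerate(ao_map):
            if target is not None:      # cap atoms do not exist in the real peptide
                block[target] = coeffs[k]
        columns.append(block)
    return np.hstack(columns)


def lowdin_orthogonalize(C, S):
    """Symmetric (Löwdin) orthogonalization: C' = C (C^T S C)^(-1/2)."""
    M = C.T @ S @ C                     # overlap matrix of the combined orbitals
    w, V = np.linalg.eigh(M)
    return C @ (V @ np.diag(w ** -0.5) @ V.T)


# Toy example: 4 peptide AOs, two fragments, each with one cap AO that is dropped.
S = np.eye(4) + 0.1 * (np.ones((4, 4)) - np.eye(4))
frag_a = ([0, 1, None], np.array([[0.7], [0.7], [0.3]]))
frag_b = ([None, 2, 3], np.array([[0.2], [0.6], [0.8]]))
C = combine_fragment_orbitals([frag_a, frag_b], n_ao=4)
C_orth = lowdin_orthogonalize(C, S)
```

Among all orthonormal sets obtainable from C by a nonsingular transformation, the symmetric choice is the one closest to C in the least-squares sense. This is why the combined guess stays close to the fragment QCLOs.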
We call the method consisting of Steps 1 to 4 the convergence process for all-electron calculation of proteins.
This function is based on the QCLO method, which eliminates redundant calculations. It generates the initial data (LCAO coefficient matrix) for ProteinDF from the QCLO results of each fragment obtained in ProteinDF.
ProteinDF solves the Roothaan equation by transforming it into an orthogonal basis with a transformation matrix, in the following steps:
Transforming the atomic orbital (AO) basis KS matrix into the orthogonal-basis KS matrix
Applying a level shift to the KS matrix
Diagonalizing the KS matrix to obtain the coefficient matrix in the orthogonal basis
Transforming the coefficient matrix back to the AO basis
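The four steps can be sketched with NumPy as shown below. Two choices here are assumptions and may differ from ProteinDF. First, the orthogonalizing matrix is built by canonical orthogonalization, X = U s^(-1/2) from the eigendecomposition of S. Second, the level shift is the common scheme that adds a constant to the virtual space of the previous iteration. The function names are hypothetical:

```python
import numpy as np


def orthogonalizer(S, tol=1e-8):
    """Canonical orthogonalization X = U s^(-1/2), so that X.T @ S @ X = I."""
    s, U = np.linalg.eigh(S)
    keep = s > tol                      # drop near-linear dependencies
    return U[:, keep] / np.sqrt(s[keep])


def solve_roothaan(F, S, C_occ_prev=None, shift=0.0):
    """One diagonalization of F C = S C eps through an orthogonal basis."""
    X = orthogonalizer(S)
    Fp = X.T @ F @ X                                # 1) AO-basis KS matrix -> orthogonal basis
    if C_occ_prev is not None and shift != 0.0:
        # 2) level shift: raise the previous virtual space by `shift`
        Cp_occ = X.T @ S @ C_occ_prev               # previous occupied orbitals, orthogonal basis
        Fp = Fp + shift * (np.eye(Fp.shape[0]) - Cp_occ @ Cp_occ.T)
    eps, Cp = np.linalg.eigh(Fp)                    # 3) diagonalize in the orthogonal basis
    return eps, X @ Cp                              # 4) coefficients back to the AO basis


rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
S = A @ A.T + 5 * np.eye(5)          # a well-conditioned toy overlap matrix
B = rng.standard_normal((5, 5))
F = (B + B.T) / 2                    # a toy symmetric KS matrix
eps, C = solve_roothaan(F, S)
eps_shift, C_shift = solve_roothaan(F, S, C_occ_prev=C[:, :2], shift=0.5)
```

With the shift built from converged occupied orbitals, the occupied eigenvalues are unchanged and every virtual eigenvalue rises by the shift. Widening the occupied-virtual gap in this way damps orbital mixing between iterations.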
The QCLO calculation procedure is outlined below:
1st step:
Perform ordinary SCF MO calculations on all amino acid residues, generating the initial guesses from atomic electron densities.
2nd step:
Obtain the initial guess by extracting and combining the monomer electron densities obtained in the 1st step. Select the localized orbitals (LOs) and assign them to fragments. The QCLOs of each fragment are then obtained by solving the eigenvalue equation of the fragment’s Fock or Kohn-Sham matrix, expressed in the basis of its assigned LOs.
3rd step onwards:
Generate an initial guess by combining the QCLOs from the 2nd step (from the preceding step in later steps), then apply Löwdin orthogonalization to the combined QCLOs. The orthogonalized QCLOs differ only slightly from the original orbitals. Obtain the initial guess for each fragment from the orthogonalized QCLOs. The Fock or Kohn-Sham matrix of each fragment is then expressed in the basis of its orthogonalized QCLOs, and the corresponding eigenvalue equation is solved.
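The fragment equations of the 2nd and 3rd steps can be summarized as follows. This is a sketch in assumed notation, consistent with the description above but not necessarily the manual's own symbols. **F** is the frame molecule's Kohn-Sham (or Fock) matrix in the AO basis and **S** is the AO overlap matrix. **C**^(f) holds the AO coefficients of the orbitals assigned to fragment f: its LOs in the 2nd step, or its orthogonalized QCLOs from the previous step in the 3rd step onwards.

```latex
\begin{aligned}
\mathbf{C}^{(f)\mathsf{T}}\,\mathbf{S}\,\mathbf{C}^{(f)} &= \mathbf{I}
  && \text{(orthonormal fragment orbitals)}\\
\mathbf{F}^{(f)} &= \mathbf{C}^{(f)\mathsf{T}}\,\mathbf{F}\,\mathbf{C}^{(f)}
  && \text{(fragment block of the KS or Fock matrix)}\\
\mathbf{F}^{(f)}\,\mathbf{U}^{(f)} &= \mathbf{U}^{(f)}\,\boldsymbol{\varepsilon}^{(f)}
  && \text{(standard eigenvalue problem)}\\
\mathbf{C}^{(f)}_{\mathrm{QCLO}} &= \mathbf{C}^{(f)}\,\mathbf{U}^{(f)}
  && \text{(new QCLOs in the AO basis)}
\end{aligned}
```

Because **U**^(f) only mixes the columns of **C**^(f), the new QCLOs stay inside the space spanned by the orbitals of the previous step.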
Note that the solution of this equation expresses the QCLOs of the 3rd step within the space spanned by the QCLOs of the previous step. Fig. 3 shows the processing flow of the QCLO method.
Specify the input keywords for the semi-automatic calculation program in the input file QcStep. The following shows an example of the QcStep file:
// Qclo input file
>>>>CONTROL
step-selection = { 1 2 3 4 }
filename = 1hrc_amber.nowat.pdb
>>>>STEP1
execution = {creation integral guess pdf }
sequential-frames = {
18-24|1
}
>>>>STEP2
execution = {creation integral guessrho pdf lo pickup}
sequential-frames = {
18-24|3
}
>>>>STEP3
execution = {creation integral guessqclo pdf lo pickup}
sequential-frames = {
18-22
21-24
}
>>>>STEP4
execution = {creation integral guessqclo pdf }
sequential-frames = {
18-24
}
pdf-keywords = {
max-iteration = 5
}
>>>>GLOBAL
pdf-keywords = {
scf-acceleration = damping
scf-acceleration/damping/damping-factor = 0.90
}
In a QcStep file, keywords are grouped into blocks, each introduced by a line beginning with >>>>. There are three kinds of blocks: CONTROL, STEP#, and GLOBAL, where # is the step number. Any text from // to the end of the line is treated as a comment.
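The QcStep syntax rules above can be illustrated with a small reader. `parse_qcstep` is a hypothetical sketch of those rules, not the parser used by QCLO.x. The rules it follows are: >>>> block headers, // comments, `keyword = value` lines, values in braces that may span several lines, and bare flag keywords. It returns the raw value text of each keyword with the outer braces removed:

```python
def parse_qcstep(text):
    """Parse QcStep-style text into {block_name: {keyword: value}} (a sketch)."""
    blocks = {}
    block = None                    # name of the current >>>> block
    key, buf, depth = None, [], 0   # keyword whose { } value is still open
    for raw in text.splitlines():
        line = raw.split("//", 1)[0].strip()        # // starts a comment
        if key is not None:                         # inside a multi-line { } value
            buf.append(line)
            depth += line.count("{") - line.count("}")
            if depth == 0:
                blocks.setdefault(block, {})[key] = _unbrace(" ".join(buf))
                key, buf = None, []
            continue
        if not line:
            continue
        if line.startswith(">>>>"):
            block = line[4:].strip()
            blocks.setdefault(block, {})
            continue
        name, sep, value = (part.strip() for part in line.partition("="))
        if not sep:                                 # bare flag keyword
            blocks.setdefault(block, {})[name] = True
            continue
        depth = value.count("{") - value.count("}")
        if depth > 0:                               # value continues on later lines
            key, buf = name, [value]
        else:
            blocks.setdefault(block, {})[name] = _unbrace(value)
    return blocks


def _unbrace(value):
    """Drop one pair of outer braces and collapse whitespace."""
    value = value.strip()
    if value.startswith("{") and value.endswith("}"):
        value = value[1:-1]
    return " ".join(value.split())
```

Applied to the example file above, `parse_qcstep(text)["STEP3"]["sequential-frames"]` would hold the text `18-22 21-24`. Nested values such as `pdf-keywords` are kept as raw text for a second pass.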
To execute the QCLO auto-calculation method program from the command line, specify the command as follows:
% $PDF_HOME/bin/QCLO.x
The CONTROL block specifies the keywords regarding the auto-calculation method in general.
step-selection
Controls which steps of the auto-calculation method are executed. Inside the braces { }, list the numbers (#) of the STEP# blocks defined in the same file.
Default: the step numbers specified in the STEP blocks
Value: { (numbers of the steps to execute) }
Example:
step-selection = { 1 2 3 4 }
The STEP# block specifies keywords for each calculation step.
sequential-frames
Defines the peptide-chain frames on which the calculation sequence is performed, written as sequential-frames = { }. Inside the braces, each frame is written in one of the following formats:
$1-$2|$3: Divides the peptide chain consisting of residues $1 to $2 into groups of $3 residues, with adjacent groups overlapping by two residues. The last group may contain a number of residues other than $3.
$4: The amino acid residue with number $4.
$5-$6: The peptide chain made of residues $5 to $6 (equivalent to $1-$2|$3 above with $3 omitted).
Default: None
Value: Numbers of the residues that make up the peptide-chain frames
Example:
sequential-frames = {
18-24|3
}
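The three sequential-frames formats can be expanded mechanically, as in the sketch below. `expand_frames` and `expand_block` are hypothetical helpers. The manual does not fully specify how groups are placed, so the sketch assumes that adjacent groups advance by $3 - 2 residues, at least one, so that they share two residues. It also assumes that the last group is cut off at $2. Under this reading, 18-24|5 yields exactly the frames 18-22 and 21-24 written explicitly in the STEP3 example:

```python
def expand_frames(spec):
    """Expand one sequential-frames entry into (start, end) residue ranges."""
    spec = spec.strip()
    if "-" not in spec:                       # "$4": a single residue
        r = int(spec)
        return [(r, r)]
    chain, _, size = spec.partition("|")
    first, last = (int(x) for x in chain.split("-"))
    if not size:                              # "$5-$6": the whole chain as one frame
        return [(first, last)]
    n = int(size)
    step = max(1, n - 2)                      # assumption: adjacent frames share two residues
    frames, start = [], first
    while True:
        end = min(start + n - 1, last)        # the last frame may be shorter than n
        frames.append((start, end))
        if end == last:
            return frames
        start += step


def expand_block(text):
    """Expand every whitespace-separated entry of a sequential-frames block."""
    return [frame for item in text.split() for frame in expand_frames(item)]


print(expand_frames("18-24|3"))  # [(18, 20), (19, 21), (20, 22), (21, 23), (22, 24)]
```

The `|1` and `|3` cases reproduce the single-residue and three-residue units of STEP1 and STEP2. How the real program treats group sizes of 2 or a very short final group is not documented here.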
general-frames
Defines general frame molecules on which the calculation sequence is performed, in the following format:
general-frames = {
name : {
…
}
}
Any string other than a system-reserved name can be used for name. Inside the braces { }, the same values as for the sequential-frames keyword can be specified.
Default: None
Example:
general-frames = {
H2O : {
{
}
}
}
execution
Controls what is executed in the current STEP#. The following strings can be specified inside the braces { }:
creation: Creates the work environment.
integral: Computes the integrals with ProteinDF.
guess: Generates the initial guess in ProteinDF.
Generates the initial guess with π bonds.
guessrho: Generates the initial guess by combining density matrices.
guessqclo: Generates the initial guess from QCLOs (available from Step 3 onwards).
guessfile: Uses the initial data file specified with qclo-keywords.
pdf: Runs the ProteinDF calculation.
lo: Performs the orbital localization calculation.
pickup: Picks up localized orbitals and assigns them to fragments.
pdfqclo: Performs the molecular orbital calculation with the extended QCLO method.
Default:
Step1: execution = {creation integral guess pdf }
Step2: execution = {creation integral guessrho pdf lo pickup }
Step3 onwards: execution = { creation integral guessqclo pdf lo pickup }
Values: creation, integral, guess, guessrho, guessqclo, guessfile, lo, pickup, pdf, pdfqclo
Example:
execution = { creation integral guess pdf }
execution = { creation integral guessrho pdf lo pickup }
execution = { creation integral guessqclo pdf lo pickup }
The example above shows the default values for STEP1, STEP2, and STEP# (# > 2).
The GLOBAL block specifies keywords that apply to all blocks. In the current system, the following keywords can be written in it: pdf-keywords and qclo-keywords.
pdf-keywords
Specifies ProteinDF input keywords. These keywords apply only to the ProteinDF calculations within the block in which pdf-keywords is written. Use the following format:
pdf-keywords = {
(input keywords for the ProteinDF program)
}
These keywords can be written anywhere in the auto-calculation input file. Their priority, which also determines their effective range, is as follows:
Inside { } > outside { } > STEP# block > GLOBAL block > CONTROL block
Default: None
Value: ProteinDF program input keywords
Example:
pdf-keywords = {
max-iteration = 5
}
qclo-keywords
Specifies keywords for the QCLO auto-calculation program. These keywords apply only to the automatic calculations within the block in which qclo-keywords is written. Use the following format:
qclo-keywords = {
(input keywords for the auto-calculation method program)
}
These keywords can be written anywhere in the auto-calculation input file. Their priority, which also determines their effective range, is as follows:
Inside { } > outside { } > STEP# block > GLOBAL block > CONTROL block
Default: None
Value: Keywords for the auto-calculation method program
Example:
qclo-keywords = {
set_fragment_by_element
}
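Both pdf-keywords and qclo-keywords follow the same priority order, which behaves like a layered lookup in which the first layer that defines a keyword wins. The sketch below models this with Python's `collections.ChainMap`. `parse_keyword_body` and `effective_keywords` are hypothetical helpers, and the GLOBAL value `max-iteration = 100` is invented purely for illustration:

```python
from collections import ChainMap


def parse_keyword_body(body):
    """Turn the text inside pdf-keywords/qclo-keywords braces into a dict."""
    result = {}
    for raw in body.splitlines():
        line = raw.split("//", 1)[0].strip()     # // starts a comment
        if not line:
            continue
        key, sep, value = line.partition("=")
        # bare flags such as set_fragment_by_element have no "= value"
        result[key.strip()] = value.strip() if sep else True
    return result


def effective_keywords(inside, outside, step, global_, control):
    """Merge keyword layers; earlier arguments win, following the priority
    inside { } > outside { } > STEP# block > GLOBAL block > CONTROL block."""
    return dict(ChainMap(inside, outside, step, global_, control))


global_kw = parse_keyword_body("""
scf-acceleration = damping
scf-acceleration/damping/damping-factor = 0.90
max-iteration = 100
""")
step4_kw = parse_keyword_body("max-iteration = 5")
kw = effective_keywords({}, {}, step4_kw, global_kw, {})
```

Here the STEP4 setting `max-iteration = 5` overrides the GLOBAL value, while the damping settings from GLOBAL still apply.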
fragment
Specifies the fragment divisions used to generate frame-molecule QCLOs, in the following format:
fragment = {
name of fragment 1 = {
component 1 of fragment 1
...
component m of fragment 1
}
...
name of fragment n = {
...
}
}
Write this specification inside the frame molecule definitions.
Default: None
Example:
fragment = {
frag_res18 = {
18
}
frag_res19_21 = {
19-21
}
frag_res22 = {
22
}
}
add_ethyl
Adds an ethyl group to frame molecules. The side-chain coordinates of the specified residue are used for the ethyl group.
Default: None
Value: Residue number
Example:
add_ethyl = 18
set_fragment_add_ethyl
Treats the frame molecule as part of the specified amino acid residue and extracts the part corresponding to the side chain to generate an R (side-chain) fragment.
Default: None
Value: Residue number
Example:
set_fragment_add_ethyl = 18
no_add_terminal
Suppresses the addition of H or OH at the points where peptide bonds are cut.
Default: None
Value: None
Example:
no_add_terminal
set_fragment_by_element
Creates the fragments for QCLO generation in the same way as when generating an initial guess.
Default: None
Value: None
Example:
set_fragment_by_element
localize_unocc_orbital
Includes unoccupied orbitals in the localization process during QCLO generation.
Default: None
Value: None
Example:
localize_unocc_orbital
pickup_unocc_orbital
Includes unoccupied orbitals in the process of assigning LOs to fragments.
Default: None
Value: None
Example:
pickup_unocc_orbital
guessqclo_combine_unocc_orbita
Includes unoccupied orbitals when generating the initial guess from QCLOs.
Default: None
Value: None
Example:
guessqclo_combine_unocc_orbita
initial-guess-lcao-file
Specifies the initial data file containing the LCAO expansion coefficient matrix for RKS calculations.
Default: None
Value: File name
Example:
initial-guess-lcao-file = step3/13_19/result.guess.lcao.rks
initial-guess-lcao-alpha-file
Specifies the initial data file containing the LCAO expansion coefficient matrix for the α orbitals in UKS calculations.
Default: None
Value: File name
Example:
initial-guess-lcao-alpha-file = step3/13_19/result.guess.lcao.uks-alpha
initial-guess-lcao-beta-file
Specifies the initial data file containing the LCAO expansion coefficient matrix for the β orbitals in UKS calculations.
Default: None
Value: File name
Example:
initial-guess-lcao-beta-file = step3/13_19/result.guess.lcao.uks-beta
initial-guess-occ-file
Specifies the initial data file containing the occupation numbers for RKS calculations.
Default: None
Value: File name
Example:
initial-guess-occ-file = step3/13_19/result.guess.occ.rks
initial-guess-occ-alpha-file
Specifies the initial data file containing the α-electron occupation numbers for UKS calculations.
Default: None
Value: File name
Example:
initial-guess-occ-alpha-file = step3/13_19/result.guess.occ.uks-alpha
initial-guess-occ-beta-file
Specifies the initial data file containing the β-electron occupation numbers for UKS calculations.
Default: None
Value: File name
Example:
initial-guess-occ-beta-file = step3/13_19/result.guess.occ.uks-beta