Impact of Machine Learning on Raman and Raman Optical Activity (ROA) Spectroscopic Analyses of Ribonucleic Acid Structure

----------------------


INTRODUCTION
Raman and Raman optical activity (ROA) spectroscopies are highly sensitive measurement techniques based on the interaction of light with the vibrations of chemical bonds in the molecules of a material (Hobro et al., 2008; Madey & Yates Jr 2013), and they have been employed extensively in the analytical sciences (Ayres et al., 2021; Fan et al., 2011). The interaction of light with the electron density of a chemical bond excites molecular vibrations, and the resulting shift in the frequency of the scattered light constitutes the Raman effect (Ahmed & Jackson 2014). The effect is also observed when inelastically scattered light exchanges energy with other excitations of a material, for instance, lattice vibrations in a solid. Therefore, the vibrational fingerprint inherently associated with a specific molecular structure can be analyzed to identify and characterize molecules (Das & Agrawal 2011; Garcia-Rico et al. 2018).
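As a hedged illustration of the frequency shift described above, the following sketch converts an excitation wavelength and a scattered wavelength (both in nanometres) into a Raman shift in wavenumbers (cm^-1), using the standard relation shift = 10^7 (1/lambda_0 - 1/lambda_s). The function names and the 532 nm excitation wavelength in the usage note are assumptions chosen for illustration; they are not taken from the studies cited in this review.

```python
def raman_shift_cm1(excitation_nm: float, scattered_nm: float) -> float:
    """Return the Raman shift in cm^-1.

    Positive values correspond to Stokes scattering (scattered light at a
    longer wavelength), negative values to anti-Stokes scattering, and zero
    to elastic (Rayleigh) scattering.
    """
    if excitation_nm <= 0 or scattered_nm <= 0:
        raise ValueError("wavelengths must be positive")
    # 1 nm = 1e-7 cm, so 1e7 / wavelength[nm] is the wavenumber in cm^-1.
    return 1e7 * (1.0 / excitation_nm - 1.0 / scattered_nm)


def scattered_wavelength_nm(excitation_nm: float, shift_cm1: float) -> float:
    """Inverse relation: wavelength at which a given shift is observed."""
    if excitation_nm <= 0:
        raise ValueError("excitation wavelength must be positive")
    return 1.0 / (1.0 / excitation_nm - shift_cm1 * 1e-7)
```

For example, `scattered_wavelength_nm(532.0, 1000.0)` gives the wavelength at which a 1000 cm^-1 Stokes band would appear under a hypothetical 532 nm laser, and passing that wavelength back to `raman_shift_cm1` recovers the shift.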
Raman analysis has been applied in various scientific areas, including pharmaceutical science, where it is used to characterize unknown biomolecules of medicinal interest (Craig et al., 2013; Li-Chan 1996; Movasaghi et al., 2007). Despite the high potential of the Raman spectroscopy technique, processing complex, noisy spectral data and extracting valuable information from them is complicated and necessitates robust processing techniques (Gautam et al., 2015; Pelletier 2003). The key components of the Raman spectroscopy and the Raman scattering phenomenon are illustrated in Figures 1a & 1b (Orlando et al., 2021; Rostron et al., 2016). Traditional computational approaches, with their long data processing times and error-prone analytical results, are not sufficient for multidimensional Raman spectroscopy-based research on biological molecules, including structural studies of RNA (Antonio & Schultz 2014; Butler et al., 2016; Guo et al., 2021). Therefore, with the advent of artificial intelligence (AI), machine learning (ML) has recently emerged as a potential analytical tool to address such issues, owing to its capability to make automated predictions and mine complex data, including spectral data (Kusters et al., 2020; Xu et al., 2021). AI is accomplished through training on prelabeled data to provide predictions on new input data, which plays a crucial role in [...] et al., 2023; Leardi 2002; Rocha et al., 2020). The application of ML algorithms such as decision trees, support vector machines, random forests, and artificial neural networks has been described in recently published scientific reports (Bhatti et al., 2023; Charbuty & Abdulazeez 2021; Ding et al., 2011); these algorithms could analyze comprehensive Raman spectral data more effectively than traditional data processing techniques (Carey et al., 2015).
ML algorithms, in tandem with traditional processing techniques such as principal component analysis (PCA), partial least-squares regression (PLS), linear regression (LR), linear discriminant analysis (LDA), least squares (LS), and quadratic discriminant analysis (QDA), and in conjunction with spectral preprocessing techniques, have been employed for the automated classification of spectral data of biomolecules such as RNA; these algorithms have therefore become a prominent research subject in the last few years (Fan et al., 2023; Han et al., 2022; Luo et al., 2022; Zhang et al., 2020). The application of artificial intelligence potentially expedites the determination of molecular patterns and connections by analyzing a given data set and predicting valuable results. Although there are review articles on the application of ML in various scientific areas, information on the impact of ML on Raman spectroscopic analysis of RNA has not been sufficiently published. Therefore, this review aims to summarize the structural organization of ribonucleic acid and the impact of ML on Raman and ROA spectral analysis of motifs and elements in RNA structure, along with the future direction of spectral research with the application of ML. The methodological approaches for ML-assisted Raman and ROA spectroscopies are depicted in Figure 2.
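The combination of spectral preprocessing, dimensionality reduction, and automated classification described above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the workflow of any cited study: the spectra are synthetic (one Gaussian band per class plus a random offset and noise), the preprocessing is deliberately crude, the first principal component is estimated by power iteration, and a simple nearest-centroid rule stands in for classifiers such as LDA or support vector machines. All function names (`make_spectrum`, `preprocess`, `first_principal_component`, `project`, `run_demo`) are hypothetical.

```python
import math
import random


def make_spectrum(center, rng, n_channels=80):
    """Synthetic 'spectrum': one Gaussian band, a flat offset, and noise."""
    offset = rng.uniform(0.0, 0.5)
    return [math.exp(-((i - center) ** 2) / 18.0) + offset + rng.gauss(0.0, 0.02)
            for i in range(n_channels)]


def preprocess(spectrum):
    """Crude baseline removal (subtract the minimum), then L2 normalization."""
    low = min(spectrum)
    shifted = [x - low for x in spectrum]
    norm = math.sqrt(sum(x * x for x in shifted)) or 1.0
    return [x / norm for x in shifted]


def first_principal_component(rows, rng, n_iter=50):
    """Mean-center the rows and estimate PC1 by power iteration on X^T X."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start vector
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, v)) for row in centered]  # X v
        v = [sum(s * row[j] for s, row in zip(scores, centered))          # X^T (X v)
             for j in range(d)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
    return mean, v


def project(spectrum, mean, component):
    """Score of one spectrum on the principal component."""
    return sum((x - m) * c for x, m, c in zip(spectrum, mean, component))


def run_demo(seed=0):
    """Train on 40 synthetic spectra, test on 20, and return test accuracy."""
    rng = random.Random(seed)
    data = [(preprocess(make_spectrum(25, rng)), "A") for _ in range(30)]
    data += [(preprocess(make_spectrum(55, rng)), "B") for _ in range(30)]
    rng.shuffle(data)
    train, test = data[:40], data[40:]  # training and test datasets
    mean, pc1 = first_principal_component([x for x, _ in train], rng)
    # Nearest-centroid classifier on the PC1 scores of the training set.
    centroids = {}
    for label in ("A", "B"):
        scores = [project(x, mean, pc1) for x, y in train if y == label]
        centroids[label] = sum(scores) / len(scores)
    correct = 0
    for x, y in test:
        s = project(x, mean, pc1)
        predicted = min(centroids, key=lambda lab: abs(s - centroids[lab]))
        correct += predicted == y
    return correct / len(test)
```

Calling `run_demo()` returns the fraction of held-out synthetic spectra assigned to the correct class. Real Raman or ROA data would need proper baseline correction, more components, and a validated classifier; the sketch only shows how the stages fit together.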

Structural Organization of RNAs:
RNA is a negatively charged biopolymer whose 2'-hydroxyl group differentiates it from DNA thermodynamically and conformationally, which is reflected in the structure and function of RNA (Fohrer et al., 2006). A variety of RNAs exist, each with specific functional features, including catalysis, protein synthesis, and regulation of gene expression (Minchin & Lodge 2019). RNA conformation is described at three levels of structural organization: (A) primary structure, (B) secondary structure, and (C) tertiary structure. The sequence of nucleotides in RNA defines the primary organizational level, while the secondary structure involves different base-pairing modes (canonical and noncanonical) that describe the two-dimensional (2D) folding of the biopolymer. The tertiary structure involves interactions between various secondary structural motifs that lead to the overall three-dimensional arrangement of RNA (Abraham et al., 2008; Dirheimer et al., 1994; Eric Westhof & Pascal Auffinger 2000).
The primary structural organization is the sequence of the four different nucleotides of ribonucleic acid (Madison 1968). Although RNA is a single-stranded molecule, it can fold on itself to form diverse structural motifs. The pairing of adjacent nucleotides defines the secondary structure of ribonucleic acid. Unlike in proteins, the secondary structural stability of RNA is unrelated to its tertiary conformation. As a result, RNA folding involves first the formation and then the consolidation of its secondary structure (Dima et al., 2005).
RNA base pairing follows canonical (Watson-Crick) base pairing, in which guanine pairs with cytosine and adenine pairs with uracil through hydrogen bonds. Watson-Crick interaction is prevalent among RNA molecules and plays a remarkable role in the formation of RNA helices. In addition, RNA also forms noncanonical (non-Watson-Crick) base pairs, which constitute about 40% of the total base pairing in RNA molecules (Lemieux & Major 2002). This mode of pairing provides distinctive sites for interactions with proteins, ligands, and metals. Furthermore, noncanonical base pairing is critical for the existence of the A-form structure of RNA (Lemieux & Major 2002; Sharma et al., 2010). Common noncanonical pairings include the G-U wobble pair, A+:C, and G-A pairs. The G-U wobble pair is the most frequently detected noncanonical base pair in RNA. Moreover, wobble interaction produces distinctive structural, chemical, and ligand-binding capabilities. In addition, G-U base pairs are thermodynamically stable. This stability allows the wobble pair to be involved in various biological activities (Halder & Bhattacharyya 2013). On the other hand, A+:C base pairs are observed in ribozymes and some RNA loops. In this type of interaction, the addition of a proton to adenine enables a hydrogen bond to the cytosine carbonyl group. This feature provides further chemical diversity to RNA (Chen et al., 2012; Halder & Bhattacharyya 2013). Moreover, G:U base pairs are commonly found in internal loops of RNA tertiary structure; they therefore assist the folding of RNA and also enhance the ligand-binding capability of RNA molecules (Chen et al., 2005).
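The distinction between canonical and noncanonical pairing drawn above can be expressed as a small lookup. The sketch below is only an illustration: it classifies a pair from the two base letters alone, so it cannot tell whether adenine is protonated in an A+:C pair or which edges of the bases interact. The names `classify_pair` and `noncanonical_fraction` are hypothetical.

```python
RNA_BASES = set("ACGU")
CANONICAL_PAIRS = {frozenset("GC"), frozenset("AU")}  # Watson-Crick pairs
WOBBLE_PAIR = frozenset("GU")                         # G-U wobble pair


def classify_pair(base1: str, base2: str) -> str:
    """Label a base pair as 'canonical', 'wobble', or 'other noncanonical'."""
    pair = frozenset((base1.upper(), base2.upper()))
    if not pair <= RNA_BASES:
        raise ValueError(f"not RNA bases: {base1!r}, {base2!r}")
    if pair in CANONICAL_PAIRS:
        return "canonical"
    if pair == WOBBLE_PAIR:
        return "wobble"
    return "other noncanonical"  # e.g. A-C or G-A, judged from letters only


def noncanonical_fraction(pairs) -> float:
    """Fraction of pairs in a list that are not Watson-Crick pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no base pairs given")
    labels = [classify_pair(a, b) for a, b in pairs]
    return sum(label != "canonical" for label in labels) / len(labels)
```

Applied to a list of annotated base pairs, `noncanonical_fraction` gives the share of non-Watson-Crick pairs in that particular list, which can be compared with the roughly 40% figure quoted above.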

Suborganization of Secondary RNA Structures:
RNA secondary structures include (a) stems, (b) loops, and (c) pseudoknots. A stem develops when two or more adjacent complementary nucleotides are paired. On the other hand, the unpaired nucleotides in the stems are called loops (Holbrook 2005). The presence of four nucleotides in an RNA loop forms a structure called a tetraloop. According to the sequence of the residues, there are three main types of tetraloops: GNRA, UNCG, and CUUG (where N stands for A, U, G, or C, and R stands for A or G).
Although each family has a distinct nucleotide sequence, the families are structurally very similar. Functionally, various roles of different types of tetraloops have been reported. For example, the GNRA tetraloop acts as a site for protein interaction (Thapar et al., 2014). Moreover, the UUCG tetraloop serves as a site for RNA folding and prevents clustering of large molecules, whereas GAAA plays a critical role in interactions stabilizing tertiary structure (Nicolas Leulliot et al., 1999; Thapar et al., 2014).
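The three tetraloop families named above are defined by sequence patterns, so assigning a four-nucleotide loop to a family can be sketched with regular expressions, taking N as any of A, C, G, or U and R as A or G. This is an illustrative sketch only; the dictionary `TETRALOOP_PATTERNS` and the function `tetraloop_family` are hypothetical names, and sequence alone says nothing about the three-dimensional fold.

```python
import re

# N = any nucleotide (A, C, G, U); R = purine (A or G)
TETRALOOP_PATTERNS = {
    "GNRA": re.compile(r"G[ACGU][AG]A"),
    "UNCG": re.compile(r"U[ACGU]CG"),
    "CUUG": re.compile(r"CUUG"),
}


def tetraloop_family(loop: str):
    """Return the family name for a 4-nucleotide loop, or None if no match."""
    seq = loop.upper()
    if len(seq) != 4:
        raise ValueError("a tetraloop has exactly four nucleotides")
    for family, pattern in TETRALOOP_PATTERNS.items():
        if pattern.fullmatch(seq):
            return family
    return None
```

With these patterns, the GAAA and UUCG loops mentioned above fall into the GNRA and UNCG families, respectively.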
Internal loops are another type of RNA secondary motif. These loops are formed by unpaired nucleotides present between two stems (Schroeder & Turner 2000). Internal loops have two subdivisions: symmetric internal loops, which have an equal number of unpaired residues on each strand, and asymmetric internal loops, which have unequal numbers of unpaired nucleotides on the two strands. Internal loops are crucial for many biologically significant functions, one of which is to provide free energy for RNA folding (Hammond et al., 2010). Bulges are unpaired regions of nucleotides that arise on only one RNA strand. The size of a bulge varies from a single residue to numerous residues. Furthermore, bulges influence the assembly of RNA architecture (Danaee et al., 2018). The fourth kind of RNA loop is the multibranch loop (M-loop), a complex structure from which several loops exit. Numerous multibranch loops are present in rRNA, since they are critical for configuring RNA secondary structure (Diamond et al., 2001). Moreover, pseudoknots are considered one of the most prominent structures of RNA (Staple & Butcher 2005). They arise from the pairing of a hairpin loop with a complementary single-stranded sequence.
Sometimes, base pairing occurs between two or more hairpin loops. The formation of pseudoknots is more evident in catalytic RNAs than in other RNA types (Hajdin et al., 2013). In addition, RNA pseudoknots are required for many biological functions of human RNA, such as telomerase activity (Theimer et al., 2005), and the presence of pseudoknots in viral RNA is essential for replication and gene expression (Brierley et al., 2007).
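When a secondary structure is stored as a list of base-paired positions, a pseudoknot of the kind described above shows up as two base pairs that cross each other rather than nest. The sketch below uses this crossing-pairs criterion, which is a common computational convention and an assumption here, not a definition given in this review; the function name `has_pseudoknot` and the index-pair representation are likewise hypothetical.

```python
from itertools import combinations


def has_pseudoknot(pairs) -> bool:
    """Return True if any two base pairs (i, j) and (k, l) cross.

    Each pair is a tuple of two nucleotide positions. Two pairs cross when
    i < k < j < l, i.e. one pair starts inside the other and ends outside it.
    """
    normalized = sorted(tuple(sorted(p)) for p in pairs)
    for (i, j), (k, l) in combinations(normalized, 2):
        # After sorting, i <= k, so only this ordering needs checking.
        if i < k < j < l:
            return True
    return False
```

A simple hairpin such as `[(0, 9), (1, 8), (2, 7)]` contains only nested pairs and is reported as knot-free, whereas `[(0, 10), (1, 9), (5, 15), (6, 14)]`, in which loop nucleotides pair with a downstream single-stranded region, is reported as a pseudoknot.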

RNA Structural Motifs and Structural Elements:
Structural motifs and structural elements are two terms used for further understanding of the various structures of molecules (Butcher & Pyle 2011; Hendrix et al., 2005). RNA motifs are specific areas within the molecule with defined lengths and sequences of nucleotides. They usually behave as one unit and perform specific structural or biological functions (Kinjo & Nakamura 2012). Motifs in RNAs are primarily identified by a unique sequence of nucleotides in some areas of functional RNAs, such as tRNA and rRNA (Hendrix et al. 2005). Examples of these motifs, including different tetraloops, the kink-turn, the sarcin-ricin loop, and the T-loop, are tabulated in Table 1.

RNA Tertiary Structure:
The diverse types of secondary structural elements and motifs interact with one another, developing a more sophisticated structural organization that determines the overall architecture of the molecule (Abraham et al., 2008). Besides its role in shaping the overall RNA conformation, the tertiary configuration is also directly related to many biological functions performed by ribonucleic acid (E. Westhof & P. Auffinger 2000). The interconnections between the secondary structural motifs involved in RNA tertiary structure can be divided into three main interactions: (a) between two double strands, (b) between a helical strand and an unpaired region, and (c) between two unpaired regions (Abraham et al., 2008). Interaction between two double strands can be subdivided into (a) coaxial stacking and (b) the adenosine platform. Coaxial stacking occurs when two double strands are next to each other, which causes stacking of their terminal base pairs (Tyagi & Mathews 2007; Zhang et al., 2011). Interaction between a helical strand and an unpaired region, in which a double strand binds a single-stranded region, involves four types of organization: (a) triplex, (b) tetraloop, (c) metal core, and (d) ribose zipper. The triplex involves the binding of a double helical strand with one single strand.
The triplex-forming oligonucleotides (TFOs) present in the single strand bind with the double strand via non-Watson-Crick base pairing (Buske et al., 2011). Tetraloops are important structural motifs with numerous biological functions. Furthermore, tetraloops can produce another critical type of binding known as the tetraloop-tetraloop receptor interaction (Moore 1999). This type of interaction is an important RNA motif that can bring different structures into proximity.
The binding of a tetraloop with a receptor is mainly established between a GNRA tetraloop and a target receptor containing a GAAA motif in the minor groove of RNA. Moreover, hydrogen bonds are formed between the OH groups present in the receptor and in the GNRA tetraloop. In addition to the H-bonding, the adenosine platform performs a crucial role in binding with A 2 from the tetraloop, adding more stability to the motif (Westhof & Fritsch 2000).

Raman and ROA Spectroscopies for Structural Analysis of Ribonucleic Acid:
The role of vibrational spectroscopies in structural biology is paramount because of their sensitivity in revealing extensive structural information and their applicability to diverse biomolecules under various conditions (Hobro et al., 2008). Raman spectroscopy and ROA are the two principal vibrational spectroscopies (Ashton et al., 2007). ROA operates on the basis of the Raman scattering of light and measures the chirality associated with Raman transitions, as illustrated in Figure 3 (Batista Jr et al., 2015). Raman and ROA spectra can be measured simultaneously from the same specimen; because they are highly sensitive to different attributes of macromolecular structure, the data retrieved are complementary (Batista Jr et al. 2015). The combined potential of Raman and ROA has been exploited to investigate RNA structure to a greater extent (Hobro et al., 2007). Despite being complementary techniques, Raman and ROA can also be used independently for RNA structural analysis (Barron et al., 2003). Moreover, these techniques have recently been used to identify structural conformations, for instance, the GNRA tetraloop (Hernández et al. 2003; N Leulliot et al. 1999). Novel RNA structural information can be obtained by ROA (Blanch et al., 2002). Additionally, Raman and ROA spectra may be evaluated to delineate alterations in RNA sequence and structure by analyzing the specific spectral changes (Blanch et al., 2002).

Future Perspectives and Conclusions:
ML-assisted Raman and ROA spectroscopies show great potential for expediting ribonucleic acid research with high precision and considerable accuracy. Machine learning is based on the analysis of large datasets divided into training and test sets, which helps automate the whole process and provide accurate, valuable predictions. These vibrational spectroscopic techniques are among the most appropriate analytical tools for studying RNA sequences, motifs, and elements and their conformational alterations under various conditions. However, the limited size of Raman spectral datasets remains a challenge for ML-assisted Raman and ROA spectroscopies and needs to be addressed with priority. Establishing a public database with standard normalization and data processing methods for Raman spectra obtained worldwide would be a future direction for addressing the data-size problem. Furthermore, minimizing the time taken to acquire the required number of Raman spectroscopic images through enhanced spectrophotometric device efficiency would help in extracting reliable RNA structural information. Moreover, the miniaturization of spectrophotometers combined with the advancement of ML techniques would provide a powerful tool for future in-depth analysis of RNA structures along with other biomolecular structures. ML-assisted Raman spectroscopy would also guide more effective analysis of large and complex biomolecular data and could therefore revolutionize the speed, accuracy, and reliability of nucleic acid and protein research while minimizing manpower and analysis cost.

Fig. 1. Illustration of (a) the key components of the Raman spectroscopy and (b) the Raman scattering phenomenon. h: Planck's constant; υ0: frequency of the incident light; υm: frequency of molecular vibration; E0: energy at the ground level.
Duarte & Ståhl 2019; Lewis & Denning 2018). Consequently, ML could be used to analyze the spectral datasets obtained from Raman spectroscopy to unfold the structural complexity of the RNA structural organization (Qi et al., 2023), because it can analyze high-dimensional datasets efficiently and find significant connections and patterns beyond the functional-group level in a molecule (Adhikari

Fig. 2. Depiction of the machine learning approaches assisting Raman and Raman optical activity (ROA) spectroscopies for RNA structures. PCA: principal component analysis; PLS: partial least-squares regression; LR: linear regression; LDA: linear discriminant analysis; LS: least squares; QDA: quadratic discriminant analysis.

Table 1. Tabulation of structural details of RNA motifs.