# Stanford Ribonanza RNA Folding (7th place, Gold Medal)

## Summary

This repository contains my scripts, experiments, and notes documenting my progress through the Stanford Ribonanza RNA Folding competition; more information about the competition is available here. This work resulted in 7th place and a gold medal; the full solution write-up can be found here.

## Installation

I recommend using the official NVIDIA or Kaggle Docker images with the appropriate CUDA version for the best compatibility. The data can be downloaded from the official competition page here. For BPP and SS features, I recommend installing arnie.
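
As a minimal sketch (assuming arnie is installed and configured, via its ARNIEFILE, with at least one folding package such as ViennaRNA; the sequence and file name below are just examples), BPP matrices and secondary structures can be precomputed like this:

```python
# Hedged example: precompute a BPP matrix and a secondary structure with arnie.
# Assumes arnie is installed and its ARNIEFILE points to installed folding packages.
import numpy as np
from arnie.bpps import bpps   # base-pairing probability matrices
from arnie.mfe import mfe     # minimum-free-energy (dot-bracket) structure

seq = "GGGAACGACUCGAGUAGAGUCGAAAA"      # example RNA sequence

bpp = bpps(seq, package="vienna_2")     # (L, L) matrix of pairing probabilities
ss = mfe(seq, package="vienna_2")       # dot-bracket secondary structure string

np.save("example_bpp.npy", np.asarray(bpp, dtype=np.float32))
print(ss)
```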

## Experiments and Results

| exp_name | description | CV | LB |
| --- | --- | --- | --- |
| exp_00 | Initial experiment using RNA_ModelV2 on RNA_DatasetBaseline. Uses a 1D convolution after the nn.Embedding layer, followed by a transformer. Configuration: batch size 64, 12 workers, dim 192, depth 12, dim_head 32, 64 epochs, lr 5e-4, weight decay 0.05; runs on a CUDA device. | | |
| exp_01 | Baseline experiment using CustomTransformerV0 on RNA_DatasetBaseline. A simple embedding layer feeds the encoder, with rotary embeddings via the ContinuousTransformerWrapper class. Configuration matches exp_00. | | |
| exp_02 | Same as exp_01 but uses TransformerWrapper instead of ContinuousTransformerWrapper. Configuration matches exp_00. | | |
| exp_03 | Experiment using CustomTransformerV1 on RNA_DatasetBaselineSplit. This version generates a new split based on cd-hit and uses TransformerWrapper instead of ContinuousTransformerWrapper. Configuration matches exp_00; uses the custom fold split defined in fold_split.csv. | 13.92 | 0.16144 |
| exp_04 | Same as exp_00 but with a new splitting method defined in fold_split.csv. Uses RNA_ModelV2 on RNA_DatasetBaselineSplit. Configuration matches exp_00. | 0.1347 | 0.1559 |
| exp_05 | Model RNA_ModelV3 on RNA_DatasetBaselineSplitbppV0. A transformer in which every 4th layer is followed by a Graph Attention Network (GAT) that uses BPP. The BPP is masked (first 26 and last 21 positions) and further filtered to values > 0.5 to generate the edge index (see the edge-index sketch after this table). Configuration matches exp_00, with the custom fold split from fold_split.csv. | 0.1304 | 0.1514 |
| exp_06 | Model RNA_ModelV4 on RNA_DatasetBaselineSplitbppV0. A transformer that saves intermediate results every n layers; these intermediates are concatenated and passed through several layers of a Graph Convolutional Network (GCN). The edge index for the GCN is derived from BPP in the same way as in exp_05. The final GCN layer is concatenated with the final transformer layer and passed to a feed-forward network. Configuration matches exp_00, with the custom fold split from fold_split.csv. | same as exp_04 | |
| exp_07 | Model RNA_ModelV6 on RNA_DatasetBaselineSplitbppV0. Similar to exp_05 but tests regular graph convolution instead of attention. Performance was similar to exp_04. Configuration matches exp_00, with the custom fold split from fold_split.csv. | same as exp_04 | |
| exp_08 | Model RNA_ModelV3 on RNA_DatasetBaselineSplitssV0. Similar to exp_05, but instead of BPP it uses ss_roi from Vienna, i.e. secondary structure prediction without adapters. Configuration matches exp_00, with the custom fold split from fold_split.csv. | 0.131570 | 0.15175 |
| exp_09 | Model RNA_ModelV7 on RNA_DatasetBaselineSplitbppV0. Differs from exp_05: after the extractor layer, the features are fed into a 4-layer GAT with BPP > 0.5 serving as edges; these features are then concatenated with the extractor features and passed to the transformer. Configuration matches exp_00, with the custom fold split from fold_split.csv. | 0.131179 | 0.15143 |
| exp_10 | Model RNA_ModelV2SS on RNA_DatasetBaselineSplitssV1. Mirrors exp_04 in using the original transformer model, but additionally uses ss_full, which is embedded with the Extractor layer (the same layer also embeds the sequence). After embedding, the sequence and ss features are concatenated and passed to the transformer. Configuration matches the previous settings, with the custom fold split from fold_split.csv. | not FT | 0.1351 |
| exp_11 | Model RNA_ModelV8 on RNA_DatasetBaselineSplitbppV1. Similar to exp_12 but integrates a graph attention layer at the end instead of concatenating, which decreases performance and suggests a downside to adding local attention at the final stage. Uses a doubled dimension size of 192 * 2; all other configurations, including wandb logging on the CUDA device, remain consistent with the previous setup. | bad | |
| exp_12 | Model RNA_ModelV7 on RNA_DatasetBaselineSplitbppV1. An iteration of exp_09 with an increased dimension size (192 * 2). Uses the full BPP matrix, taking adapter probabilities into account; as before, BPP values greater than 0.5 are chosen as edges. Other configurations remain consistent with prior settings; logged with wandb on the CUDA device, using the custom fold split in fold_split.csv. | 0.1310 | 0.15345 |
| exp_13 | Model RNA_ModelV9 on RNA_DatasetBaselineSplitssbppV0. Combines full BPP and SS (with adapters). Two separate small GNNs operate on SS and BPP independently; their outputs are concatenated and fed to a transformer. Uses a doubled dimension size of 192 * 2. Other configurations, including wandb logging on the CUDA device, remain consistent with the previous setup. | 0.1304 | 0.15218 |
| exp_14 | Model RNA_ModelV10 on RNA_DatasetBaselineSplitssbppV0. Iterates on exp_13 with modified node features for the GAT: the node features now have positional encoding, and the last residual connection in the GAT layer has been removed. Other configurations, including wandb logging on the CUDA device, remain consistent with the previous setup. | 0.1299 | 0.15181 |
| exp_15 | Model RNA_ModelV7 on RNA_DatasetBaselineSplitbppV1. Similar to exp_13, but a true B model with a dimension size of 768 (192 * 4), a longer training run of 128 epochs, and a reduced learning rate of 1e-5. Other configurations, including wandb logging on the CUDA device, remain consistent with the previous setup. Training crashed with a segmentation fault. | n/a | |
| exp_16 | Model RNA_ModelV11 on RNA_DatasetBaselineSplitssbppV1. The Extractor embeds the sequences; its features are passed to a GAT with edges calculated from the SS. The output is then combined with the bpp (probability) matrix via bmm and applied to a gated residual unit (previously, bpp was used in the GAT only for pairs exceeding 0.5 probability). The result is concatenated with the Extractor's original features before being fed to a transformer. This model has so far shown superior performance, perhaps due to incorporating full bpp probabilities. | 0.12709 | 0.1481 |
| exp_17 | Model RNA_ModelV12 on RNA_DatasetBaselineSplitssbppV1. Akin to exp_16, but the model does not use the GAT for ss. Instead, it merges the raw full-probability bpp through bmm and a gated residual unit in the BppFeedForwardwithRes layer, after the initial 4 transformer blocks. | overfit | |
| exp_18 | Model RNA_ModelV13 on RNA_DatasetBaselineSplitssbppV1. Deploys a standard transformer with the Extractor, but during its first eight layers the BppFeedForwardwithRes mechanism is applied alternately to ss and bpp (bpp, ss, bpp, ss, and so on). The idea is to assess the potential benefit of alternating structural and base-pairing probability information within the transformer's layers. | | |
| exp_19 | Model RNA_ModelV14 on RNA_DatasetBaselineSplitssbppV1. Akin to exp_16: sequences are embedded with the Extractor, the extracted features are passed to a GAT using SS as edges, and the GAT output is combined with bpp via the BppFeedForwardwithRes mechanism, with the original Extractor features added as a residual. The combined features are then fed into the transformer. Unlike exp_16, the features are not concatenated, and each branch takes an input of size dim//2. The intent is to see how the combined GAT and transformer mechanism works without concatenating features from different branches. | 0.1275 | |
| exp_20 | Model RNA_ModelV15 on RNA_DatasetBaselineSplitssbppV1. Inspired by exp_16: sequences are embedded with the Extractor and passed to a modified GAT with graph_layers=6 and heads=8. In contrast to exp_16, which performs the bpp combination immediately after the GAT, this setup introduces the bpp combination via BppFeedForwardwithRes only after the second transformer encoder layer, letting the initial layers focus primarily on SS features before incorporating the bpp data. Results indicate that the timing of bpp injection has minimal impact; its mere presence is what significantly influences the outcome. | ~exp_16 | |
| exp_21 | Model RNA_ModelV16 on RNA_DatasetBaselineSplitssbppV1. Deviates from using a GAT: after extraction, two small transformer encoders (3 layers each) are dedicated to bpp and ss respectively, and the bpp or ss is combined using a gated residual unit through GatedResidualCombination. The features from these encoders are then concatenated and passed to a standard transformer. Initial training showed promise, but spikes appeared during the later stages and slight overfitting was detected as training progressed, suggesting the potential need for techniques like SWA. | | |
| exp_22 | Model RNA_ModelV17 on RNA_DatasetBaselineSplitssbppV1. Avoids the GAT and is somewhat aligned with exp_21: two separate transformers are used, one for ss and one for bpp, each wrapped in an EncoderResidualCombBlockV1 block. For ss, the combination with the extractor layer (via GatedResidualCombination) is done before the transformer; for bpp, the combination is performed after the transformer. The outputs of the ss and bpp transformers are then combined and fed to a standard transformer. A notable introduction is the Exponential Moving Average (EMA), which gave stable training, quicker convergence, and eliminated validation spikes; mild overfitting was still observed, suggesting adjustments like increasing dropout. | 0.1268 | 0.14621 |
| exp_23 | Model RNA_ModelV17 on RNA_DatasetBaselineSplitssbppV2. Mirrors the architecture of exp_22, with a few differences: the dropout rates (layer_dropout, attn_dropout, and drop_pat_dropout) are increased to 0.25, and the bpp is now an average of three packages, 'vienna_2', 'contrafold_2', and the organizers' data, which should give a more robust representation. The rest of the configuration is consistent with exp_22. | 0.1272 | 0.14709 |
| exp_24 | Model RNA_ModelV18FM on RNA_DatasetBaselineFM. Finetunes the publicly available RNA foundation model RNA-FM: the model is entirely unfrozen and additional output layers are appended, with a dropout rate of 0.2. No bpp is used in this setup. As a finetuning of an existing model, the question is how much RNA-FM's prior knowledge and architecture benefit training. | bad | |
| exp_25 | Uses the RNA-FM model as in exp_24, but rather than merely finetuning it (which scored poorly in exp_24), RNA-FM acts as an extractor: the embeddings from layer 12 of RNA-FM are taken and the procedure from exp_22 is followed, with a modified transformer architecture. The two primary transformers that integrate bpp and ss with the RNA-FM features have three layers each; their outputs are merged and fed to a 6-layer transformer. RNA-FM is kept frozen, so its weights remain unchanged during training. | 0.1293 | |
| exp_25_unfreeze | Following exp_25, this experiment unfreezes the backbone (RNA-FM) and fine-tunes the entire model. Although the score improved compared to exp_25 (reaching 0.1277), the model began to overfit severely after a certain point, reinforcing the trend that models with an unfrozen RNA-FM backbone are prone to overfitting. This run used the weights from exp_25 as its initial state (md_wt = 'exp_25/models/model.pth') and a modified learning rate that was an average of 5e-5 and 5e-4; total epochs were reduced to 16 given the overfitting. | 0.1277 | |
| exp_26 | Model RNA_ModelV20 on RNA_DatasetBaselineSplitssbppV3. Closely follows exp_22, but with a significant change to the input data: instead of ss, an average of extra bpp from three distinct packages (Vienna, Contrafold, and RNAformer) is used. The architecture still employs two separate transformers, one for this averaged bpp and one for the original bpp; bpp or extra bpp is combined using a gated residual unit via GatedResidualCombination, and the processed features are merged and fed to a typical transformer. The two initial transformer branches differ in when the combination takes place: the first integrates before the encoder, the second after it. | 0.1267 | 0.14671 |
| exp_27 | The RNA-FM model acts as an evolutionary module (given its training on a large RNA dataset): its features are extracted and run in parallel with a regular extractor. The extracted features are combined with bpp, and then the RNA-FM features and the combination are concatenated and passed through a standard transformer. The aim is to harness RNA-FM's evolutionary insights while keeping an independent extraction and combination path. It overfitted towards the end. | 0.1307 | |
| exp_28 | The RNA-FM model is used to predict bpp values, which are passed through a sigmoid layer. The transformed bpp values are combined with features from a regular extractor; the result is concatenated with the original extracted features and passed to a transformer modified to a depth of 9. The setup lets RNA-FM make predictions that are then combined with the traditional extraction process. A future plan is to unfreeze the concatenated model, which might improve overall performance. | 0.1333 | |
| exp_29 | Replicates the conditions of exp_28, known for its strong cross-validation results, but with a significant change to the extra_bpp feature extraction: instead of the full set of three sources (Vienna, Contrafold, and RNAformer) for secondary structure (ss) data, this iteration uses data solely from rnafm. | 0.1271 | 0.14673 |
| exp_30 | Deliberately aligned with exp_19, which previously achieved a notable score, but integrates an averaged bpp feature combining the original bpp with the one derived from rnafm. Despite this fusion, intended to capture more robust structural insight, performance declines: 0.1290 versus 0.1275 for exp_19. | 0.1290 | |
| exp_31 | Introduces a new way of handling bpp data through a CombinationTransformerEncoder block, which pairs a standard transformer with a subsequent Combination layer that multiplies the input with bpp, followed by two conv1d operations activated by ReLU (a sketch of this Combination layer follows the table). The encoder has 8 such layers, with bpp injected in one layer and extra_bpp, an averaged ensemble from rnafm, vienna_2, contrafold_2, and rnaformer, integrated in another. The architecture concludes with 4 blocks of unaltered transformers. | 0.1259 | 0.1459 |
| exp_32 | Building on exp_31, this experiment implements CombinationTransformerEncoderV1, which sequences the interactions as: transformer_encoder, integrate bpp, transformer_encoder, integrate extra_bpp, and a final transformer_encoder with the ss. Each integration in this chain is mixed via a Combination layer, and every CombinationTransformerEncoderV1 block has a residual connection. The experiment improves over its predecessor, exp_31. | 0.1247 | 0.14436 |
| exp_32-ft-ex-ft-sr | FT of exp_32 on sr and external_data. | 0.1245 | 0.145 |
| exp_32_psd | FT of exp_32_ft on final0_PLfolds_ft_tot pseudolabels. | 0.12361 | |
| exp_32_psd_v1 | FT of exp_32_psd on final0_PLfolds_ft_tot pseudolabels. | 0.1231 | |
| exp_32_psd_v3 | FT of exp_32_psd on final0_PLfolds_ft_tot pseudolabels. | 0.1227 | 0.14074 |
| exp_32_ft_after_psd | FT of exp_32_psd_v3, finetuned with sr=True. | 0.1221 | 0.14128 |
| exp_32_psd_v3_final_comb_PL_v1 | FT on final_comb_PLfoldsEXft_tot. | 0.1222 | |
| exp_32-ft-after-PLfoldsEXft | FT of exp_32_psd_v3_final_comb_PL_v1, finetuned with sr=True. | 0.121 | 0.14136 |
| exp_32-psd_v3_ex_ft | FT on the external dataset. | | |
| exp_32_ex_ft_flip_sr | FT of exp_32_ex_ft_sr on external data, with added flip augmentations. | 0.1255 | |
| exp_32_v1 | Replicates exp_32 with two key changes: 1) bpp is stored as a numpy array, simplifying loading and potentially speeding up training; 2) the rnafm feature, previously deemed noisy, is excluded. The goal is to observe any performance variation from these changes. | 0.1250 | 0.14491 |
| exp_32_v2 | Modeled after exp_32_v1, but with a deeper bpp_transfomer_depth of 6 (up from 4). The training data is sourced from train_corrected.parquet, which may be a refined or updated dataset. The goal is to see whether the deeper bpp_transfomer_depth and the new dataset improve performance. | 0.1247 | 0.1448 |
| exp_32_v2_ft_ex_sr | Same as exp_32_v2, but finetuned on external data and sr. | 0.125184 | 0.14572 |
| exp_32_v3 | Mirrors exp_32, but with bpp_transfomer_depth increased from 4 to 6, potentially enhancing the model's ability to process and integrate bpp information. Finetuned on the external dataset. | 0.1249 | 0.14463 |
| exp_32_v3_ex_ft_sr_ft | Mirrors exp_32_v3. | 0.1251 | 0.1449 |
| exp_32_v3_psd_v2 | FT of exp_32_v3 using the final_comb_PLfoldsEXft_tot pseudolabels. | 0.123 | 0.1403 |
| exp_32_psd_v3_ex_ft | FT of exp_32_v3_psd_v2 using only external data and sr=true. | 0.123 | 0.1403 |
| exp_33 | A departure from the previous transformer-based approaches, venturing into convolutional neural networks with a standard EfficientNetV2_1d (efnetv2). The model has an initial extractor layer that funnels processed features into the EfficientNet structure. This first attempt at adapting efnetv2 to sequence data like RNA proved challenging, reflected in a local CV score of 0.1623. | 0.1623 | |
| exp_34 | Builds on exp_32 with two changes: it replaces the standard rnaformer bpp with rnaformerv1, and it fixes an issue in the combination layer related to the padding mask that potentially compromised previous models' learning efficiency. Unlike its predecessors, exp_34 does not employ sampling. The score was very bad; the cause is possibly the sampler or the mask. | | |
| exp_35 | Same as exp_32, but replaces the standard rnaformer bpp with rnaformerv1. Training is working fine; it should reach roughly the same score as exp_32. | same as exp_32 | |
| exp_37 | Same as exp_35, but replaces the standard rnaformer bpp with rnaformerv1 and removes rnafm; training is working fine and should reach roughly the same score. Uses RNA_DatasetBaselineSplitssbppV6 and RNA_ModelV26; in the model, the residual is replaced with GRUresidual. | same as exp_32 | |
| exp_38 | Introduces a 'flip' data augmentation technique. In addition, rnafm is added to extra_bpp as augmentation (because it is noisy), alongside "vienna_2", "contrafold_2", and "rnaformerv1". The dataset is RNA_DatasetBaselineSplitssbppV7Flip. Uses ExtractorV3, which employs an MLP instead of a res_block, and several blocks of the newly introduced CombinationTransformerEncoderV2, each with GRUGating. Convergence is a bit slow; maybe remove the noise in rnafm. | TBD | |
| exp_39 | The architecture shifts to RnaModelConvV2, integrating 1D convolutions with CombinationTransformerEncoderV1 blocks, to explore the effectiveness of 1D convolutions in the feature extraction phase. Preliminary outcomes indicate an improvement, but they still lag behind the best-performing models like exp_32. | 0.1332 | |
| exp_40 | A variant of exp_39 that uses RnaModelConvV3, basically an EffBlock followed by CombinationTransformerEncoderV1. It was progressing well but then overfitted. | 0.12724 | |
| exp_41 | Uses ConvolutionConcatBlockV4, consisting of EffBlock and CombinationTransformerEncoderV29. The latter takes an average of all bpps and ss inputs instead of feeding them to transformers individually. There are 6 blocks of ConvolutionConcatBlockV4 followed by 10 standard transformer blocks. The objective is to see whether averaging inputs combined with convolutional blocks improves performance. Trained until epoch 28, then it started to overfit. | 0.1278 | |
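
For illustration, here is a minimal, hedged sketch of the BPP thresholding used by the GAT experiments (exp_05, exp_09): pairs with probability > 0.5 become edges for a graph attention layer. It assumes PyTorch Geometric; the function name, toy shapes, and random tensors below are my own and not the repository's actual code.

```python
# Hypothetical sketch of the BPP -> edge_index step described for exp_05 / exp_09:
# keep only base pairs with pairing probability > 0.5 as graph edges.
import torch
from torch_geometric.nn import GATConv


def bpp_to_edge_index(bpp: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """bpp: (L, L) base-pairing probability matrix for one sequence."""
    src, dst = torch.nonzero(bpp > threshold, as_tuple=True)
    return torch.stack([src, dst], dim=0)  # shape (2, num_edges)


# Toy usage with random tensors standing in for real extractor features and BPP.
L, dim = 177, 192
node_feats = torch.randn(L, dim)        # per-residue features from the extractor
bpp = torch.rand(L, L)                  # stand-in for a real BPP matrix
edge_index = bpp_to_edge_index(bpp)

gat = GATConv(in_channels=dim, out_channels=dim // 8, heads=8)  # concatenated heads -> dim
out = gat(node_feats, edge_index)       # (L, dim)
```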

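The Combination layer described for exp_31/exp_32 multiplies the token features with the BPP matrix and refines the result with two conv1d layers and ReLU. Below is a minimal sketch of that idea under my own naming; the module, shapes, and residual placement are assumptions, not the repository's exact implementation.

```python
# Hedged sketch of the "Combination" idea from exp_31 / exp_32:
# mix token features with a pairing-probability matrix via bmm,
# then refine with two conv1d + ReLU layers and add a residual.
import torch
import torch.nn as nn


class Combination(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.conv1 = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor, bpp: torch.Tensor) -> torch.Tensor:
        # x: (B, L, dim) token features; bpp: (B, L, L) pairing probabilities.
        mixed = torch.bmm(bpp, x)                      # each position aggregates its likely pairs
        h = self.act(self.conv1(mixed.transpose(1, 2)))
        h = self.act(self.conv2(h)).transpose(1, 2)
        return x + h                                   # residual connection, as in exp_32's blocks


# Toy usage.
x = torch.randn(2, 177, 192)
bpp = torch.rand(2, 177, 177)
print(Combination(192)(x, bpp).shape)                  # torch.Size([2, 177, 192])
```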