
AstroMASK: Astronomy Masked Autoencoder Self-supervised Knowledge

Applying Self-Supervised Representation Learning to the Ultraviolet Near Infrared Optical Northern Survey

NOTE: This repository is very much still a work in progress.

About

The Ultraviolet Near Infrared Optical Northern Survey (UNIONS) uses observations from three telescopes in Hawaii and aims to answer some of the most fundamental questions in astrophysics, such as the properties of dark matter and dark energy and the growth of structure in the Universe. However, effectively searching through and categorizing the data in order to extract these insights can be cumbersome. This project aims to exploit recent advances in a sub-field of Machine Learning (ML) called Self-Supervised Learning (SSL), in particular Masked Autoencoders (MAE) with Vision Transformer (ViT) backbones, to train a model that produces meaningful lower-dimensional representations of astronomy observations without the need for explicit labels. Such models have been shown to be effective at similarity searches and require far fewer labels to fine-tune for downstream tasks such as strong lens detection.
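
To make the training objective concrete, below is a minimal, hypothetical sketch (not code from this repository) of the random patch masking at the core of an MAE: a large fraction of image patches is hidden, only the visible patches are passed through the ViT encoder, and the decoder is trained to reconstruct the rest. The patch count and embedding size are illustrative only; the actual training code handles all of this inside the model.

import torch

def random_masking(patch_tokens: torch.Tensor, mask_ratio: float = 0.75):
    """patch_tokens: (batch, num_patches, embed_dim); keep a random subset of patches."""
    batch, num_patches, embed_dim = patch_tokens.shape
    num_keep = int(num_patches * (1 - mask_ratio))

    noise = torch.rand(batch, num_patches)        # one random score per patch
    ids_shuffle = torch.argsort(noise, dim=1)     # patches with the lowest scores are kept
    ids_keep = ids_shuffle[:, :num_keep]

    visible = torch.gather(
        patch_tokens, dim=1,
        index=ids_keep.unsqueeze(-1).expand(-1, -1, embed_dim),
    )

    # Binary mask (1 = hidden) used later to restrict the reconstruction loss
    # to the patches the encoder never saw.
    mask = torch.ones(batch, num_patches)
    mask.scatter_(1, ids_keep, 0)
    return visible, mask

# Illustrative numbers: a batch of 4 cutouts tokenized into 196 patches of dimension 768.
tokens = torch.randn(4, 196, 768)
visible, mask = random_masking(tokens, mask_ratio=0.75)
print(visible.shape)  # torch.Size([4, 49, 768]): only 25% of the patches reach the encoder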

Learn More

Presentations

machine learning jamboree slides from November 2023

PHYS 437A mid-term presentation slides from October 2023

UNIONS and Japanese Euclid Consortium Joint Meeting from January 2024, learn more here

PHYS 437B mid-term presentation slides from February 2024

Posters

CCUWiP poster from January 2024

Reports

PHYS 437A written report from December 2023

One known mistake in this report: when generating representations, the mask ratio was left at 0.5 when it should have been set to 0.0. Here is a high-quality version of the t-SNE and UMAP with the correct mask ratio.
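
As a hedged illustration of that correction, the snippet below assumes this fork keeps the upstream MAE interface, where forward_encoder(imgs, mask_ratio) returns (latent, mask, ids_restore); at representation time the mask ratio is set to 0.0 so the encoder sees every patch rather than the masked subset used during training.

import torch

def extract_representations(model: torch.nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Encode images with no masking so every patch contributes to the embedding."""
    model.eval()
    with torch.no_grad():
        # mask_ratio=0.0 keeps all patches visible, unlike the ratio used in training.
        latent, _, _ = model.forward_encoder(images, mask_ratio=0.0)
    return latent[:, 0]  # class token as a per-image representation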

Pre-requisites

Access to the preprocessed UNIONS cutout data on CANFAR at the following path:

/arc/projects/unions/ssl/data/processed/unions-cutouts/ugriz_lsb/10k_per_h5
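
If you want to sanity-check what these files contain before training, a short, hypothetical inspection script like the one below can help; it only assumes the files are ordinary HDF5 and prints whatever dataset names and shapes the preprocessing pipeline actually wrote.

import glob
import h5py

data_dir = "/arc/projects/unions/ssl/data/processed/unions-cutouts/ugriz_lsb/10k_per_h5"

# Peek at the first few files and list their datasets (no dataset names are assumed here).
for path in sorted(glob.glob(f"{data_dir}/*.h5"))[:3]:
    print(path)
    with h5py.File(path, "r") as f:
        f.visititems(lambda name, obj: print("  ", name, getattr(obj, "shape", "")))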

Quick Start

  1. Follow this link to the CANFAR Science Portal and log in to your CADC account
  2. Launch a notebook session with the container image "skaha/astroml-notebook:latest"
  3. Enter the following in the notebook's terminal, with your CADC username in place of YOUR_USERNAME:
cadc-get-cert -u YOUR_USERNAME
  4. Then run the following to launch a GPU session:
curl -E .ssl/cadcproxy.pem 'https://ws-uv.canfar.net/skaha/v0/session?name=notebookgpu&cores=2&ram=16&gpus=1' -d image="images.canfar.net/skaha/astroml-gpu-notebook:latest"
  5. Now, if you return to the CANFAR Science Portal, you should see a new notebook session that has access to a GPU. If it stays greyed out, this is likely because all GPUs are currently claimed.
  6. Navigate to the directory you want to save your code in and clone the following two repositories:
git clone https://github.com/astroai/dark3d.git
git clone https://github.com/ashley-ferreira/mae.git
  7. You are now ready to train the model! Navigate to the mae directory and run the training script:
cd mae
python main_pretrain.py

which saves checkpoints in

/arc/projects/unions/ssl/data/processed/unions-cutouts/ugriz_lsb/output_dir/DATETIME

where DATETIME is the time at which the code began running in UTC.

  8. To analyze the outputs of this model, specifically the image reconstructions and the representations (via UMAPs, t-SNEs, and similarity search), run the notebooks found in mae/demo.
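
For orientation, here is a rough, hypothetical sketch of the kind of analysis those notebooks perform: project the learned representations to 2-D with UMAP and run a nearest-neighbour similarity search with Faiss. The representations array below is a random placeholder standing in for whatever the notebooks load from disk.

import numpy as np
import umap    # umap-learn
import faiss

# Placeholder embeddings standing in for the MAE representations (N x D, float32).
representations = np.random.rand(10_000, 768).astype("float32")

# 2-D embedding for visualization (t-SNE could be swapped in here).
embedding_2d = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(representations)

# Similarity search: the 5 nearest neighbours of the first cutout's representation.
index = faiss.IndexFlatL2(representations.shape[1])
index.add(representations)
distances, neighbours = index.search(representations[:1], 5)
print(neighbours)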

Acknowledgements

Many others have contributed to this effort, including my supervisors Sebastien Fabbro and Mike Hudson, as well as Spencer Bialek, Nat Comeau, Nick Heesters, and Leonardo Ferreira.

This research used the facilities of the Canadian Astronomy Data Centre (CADC), operated by the National Research Council of Canada with the support of the Canadian Space Agency. Without the CADC's CANFAR platform, none of this work would have been possible: the platform was used to host and access the data as well as to perform all of the computational work.

All of the data used for this project comes from UNIONS, so the survey has been instrumental in every aspect of this work.

Built With

Python, Jupyter Notebook, PyTorch, Weights & Biases (WandB)

W&B's experiment tracking software offers free student accounts, which was tremendously helpful for debugging and keeping track of different experiments.

Finally, this project has relied heavily on open-source software. All of the programming was done in Python and made use of its many associated packages, including NumPy, Matplotlib, and two key contributions from Meta AI: PyTorch and Faiss. Additionally, this work made use of Astropy, a community-developed core Python package and an ecosystem of tools and resources for astronomy.

This code is forked from another repository; the original README information is reproduced below:

Masked Autoencoders: A PyTorch Implementation

This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners:

@Article{MaskedAutoencoders2021,
  author  = {Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Doll{\'a}r and Ross Girshick},
  journal = {arXiv:2111.06377},
  title   = {Masked Autoencoders Are Scalable Vision Learners},
  year    = {2021},
}
  • The original implementation was in TensorFlow+TPU. This re-implementation is in PyTorch+GPU.

  • This repo is a modification on the DeiT repo. Installation and preparation follow that repo.

  • This repo is based on timm==0.3.2, for which a fix is needed to work with PyTorch 1.8.1+.

Catalog

  • Visualization demo
  • Pre-trained checkpoints + fine-tuning code
  • Pre-training code

Visualization demo

Run our interactive visualization demo using a Colab notebook (no GPU needed).

Fine-tuning with pre-trained checkpoints

The following table provides the pre-trained checkpoints used in the paper, converted from TF/TPU to PT/GPU:

                          ViT-Base   ViT-Large   ViT-Huge
pre-trained checkpoint    download   download    download
md5                       8cad7c     b8b06e      9bdbb0

The fine-tuning instruction is in FINETUNE.md.

By fine-tuning these pre-trained models, we rank #1 in these classification tasks (detailed in the paper):

                                    ViT-B   ViT-L   ViT-H   ViT-H448   prev best
ImageNet-1K (no external data)      83.6    85.9    86.9    87.8       87.1

Evaluation of the same model weights (fine-tuned on original ImageNet-1K):
ImageNet-Corruption (error rate)    51.7    41.8    33.8    36.8       42.5
ImageNet-Adversarial                35.9    57.1    68.2    76.7       35.8
ImageNet-Rendition                  48.3    59.9    64.4    66.5       48.7
ImageNet-Sketch                     34.5    45.3    49.6    50.9       36.0

Transfer learning by fine-tuning the pre-trained MAE on the target dataset:
iNaturalists 2017                   70.5    75.7    79.3    83.4       75.4
iNaturalists 2018                   75.4    80.1    83.0    86.8       81.2
iNaturalists 2019                   80.5    83.4    85.7    88.3       84.1
Places205                           63.9    65.8    65.9    66.8       66.0
Places365                           57.9    59.4    59.8    60.3       58.0

Pre-training

The pre-training instruction is in PRETRAIN.md.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.
