Applying Self-Supervised Representation Learning to the Ultraviolet Near Infrared Optical Northern Survey
NOTE: This repository is very much still a work in progress.
The Ultraviolet Near Infrared Optical Northern Survey (UNIONS) uses observations from three telescopes in Hawaii and aims to answer some of the most fundamental questions in astrophysics, such as the properties of dark matter and dark energy and the growth of structure in the Universe. However, effectively searching through and categorizing the data in order to extract these insights can be cumbersome. This project exploits recent advances in Self-Supervised Learning (SSL), a sub-field of Machine Learning (ML), specifically Masked Autoencoders (MAE) with Vision Transformer (ViT) backbones, to train a model that produces meaningful lower-dimensional representations of astronomical observations without the need for explicit labels. Such models have been shown to be effective at similarity searches and require far fewer labels to fine-tune for downstream tasks such as strong lens detection.
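To make the approach concrete, here is a minimal toy sketch of the MAE idea in PyTorch: mask a large fraction of image patches, encode only the visible ones, and compute the reconstruction loss on the masked patches. The band count, cutout size, and layer sizes below are placeholders, and this stand-in network is not the ViT-based model actually used in this repository.

```python
import torch
import torch.nn as nn

# Toy illustration of the MAE objective (NOT the actual model in this repo);
# the sizes below are placeholders chosen only to keep the example small.
B, C, H, W, P, D = 2, 5, 64, 64, 16, 128   # batch, bands, height, width, patch size, latent dim
mask_ratio = 0.75

imgs = torch.randn(B, C, H, W)             # stand-in for ugriz cutouts

# patchify: (B, C, H, W) -> (B, N, C*P*P)
patches = imgs.unfold(2, P, P).unfold(3, P, P)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * P * P)
N = patches.shape[1]
n_keep = int(N * (1 - mask_ratio))

# per-image random masking: keep a random subset of patches
shuffle = torch.rand(B, N).argsort(dim=1)
keep_idx = shuffle[:, :n_keep]
batch_idx = torch.arange(B).unsqueeze(1)
visible = patches[batch_idx, keep_idx]     # (B, n_keep, C*P*P)

# stand-in encoder/decoder (the real model uses a ViT encoder and a transformer decoder)
encoder = nn.Sequential(nn.Linear(C * P * P, D), nn.GELU(), nn.Linear(D, D))
decoder = nn.Linear(D, C * P * P)
mask_token = nn.Parameter(torch.zeros(1, 1, D))

latent = encoder(visible)                  # representations of the visible patches only

# put the encoded tokens back in place and fill masked slots with the learned mask token
tokens = mask_token.expand(B, N, D).clone()
tokens[batch_idx, keep_idx] = latent
recon = decoder(tokens)                    # predict pixel values for every patch

# as in MAE, the loss is computed on the masked patches only
is_masked = torch.ones(B, N, dtype=torch.bool)
is_masked[batch_idx, keep_idx] = False
loss = ((recon - patches)[is_masked] ** 2).mean()
loss.backward()
```

Once pre-trained this way, only the encoder is needed to produce the representations used for similarity search and fine-tuning.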
- Machine learning jamboree slides from November 2023
- PHYS 437A mid-term presentation slides from October 2023
- UNIONS and Japanese Euclid Consortium Joint Meeting from January 2024 (learn more here)
- PHYS 437B mid-term presentation slides from February 2024
- CCUWiP poster from January 2024
- PHYS 437A written report from December 2023
One known mistake in this report is that the mask ratio was left at 0.5 when generating representations; it should be set to 0.0. Here is a higher-quality version of the t-SNE and UMAP produced with the correct mask ratio.
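For reference, the sketch below shows what generating representations with the correct mask ratio looks like. It follows the upstream facebookresearch/mae API (models_mae and forward_encoder); the exact model constructor, input shape, and checkpoint path used in this fork may differ, so treat it as an assumption-laden outline rather than a recipe.

```python
import torch
import models_mae  # from the (forked) facebookresearch/mae codebase

# Assumed constructor name; the fork may define its own variant for 5-band cutouts.
model = models_mae.mae_vit_base_patch16()
ckpt = torch.load("checkpoint.pth", map_location="cpu")   # placeholder checkpoint path
model.load_state_dict(ckpt["model"], strict=False)
model.eval()

imgs = torch.randn(8, 3, 224, 224)         # placeholder batch; use real cutouts here
with torch.no_grad():
    # mask_ratio=0.0 so the encoder sees every patch, unlike during pre-training
    latent, mask, ids_restore = model.forward_encoder(imgs, mask_ratio=0.0)
    reps = latent[:, 1:, :].mean(dim=1)    # mean-pool patch tokens (index 0 is the CLS token)
```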
- read access to the dark3d repo
- a Weights & Biases account
- a CADC account
- read access to the following path on CADC's CANFAR
/arc/projects/unions/ssl/data/processed/unions-cutouts/ugriz_lsb/10k_per_h5
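If you want to sanity-check that access, the short sketch below opens one of the HDF5 files under that path and lists what it contains; the file name is a placeholder and no particular dataset layout is assumed.

```python
import h5py

# Placeholder file name; list the directory on CANFAR to find the real ones.
path = "/arc/projects/unions/ssl/data/processed/unions-cutouts/ugriz_lsb/10k_per_h5/example.h5"

with h5py.File(path, "r") as f:
    print(list(f.keys()))                  # top-level group/dataset names
    for name, obj in f.items():
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
```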
- Follow this link to CANFAR Science Portal and log on to your CADC account
- Launch a notebook with container image "skaha/astroml-notebook:latest"
- Enter the following in the notebook's terminal, with your CADC username in place of YOUR_USERNAME
cadc-get-cert -u YOUR_USERNAME
- Then run the following to launch a GPU session
curl -E .ssl/cadcproxy.pem 'https://ws-uv.canfar.net/skaha/v0/session?name=notebookgpu&cores=2&ram=16&gpus=1' -d image="images.canfar.net/skaha/astroml-gpu-notebook:latest"
- Now, if you return to the CANFAR Science Portal, you should see a new notebook session that has access to a GPU. If it stays greyed out, this is likely because all GPUs are currently claimed.
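Once inside the new GPU notebook session, a quick way to confirm that PyTorch can actually see the GPU:

```python
import torch

# Should print True and the GPU name inside a working GPU session.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```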
- Navigate to the directory you want to save the code in and clone the following two repositories
git clone https://github.com/astroai/dark3d.git
git clone https://github.com/ashley-ferreira/mae.git
- You are now ready to train the model! Navigate to the mae directory and run the pre-training script
cd mae
python main_pretrain.py
which saves checkpoints in
/arc/projects/unions/ssl/data/processed/unions-cutouts/ugriz_lsb/output_dir/DATETIME
where DATETIME is the time at which the code began running in UTC.
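To sanity-check a saved checkpoint, the sketch below loads one and prints what it contains. The upstream MAE training loop saves a dictionary with entries such as "model", "optimizer", and "epoch", and this fork is assumed to do the same; the DATETIME directory and checkpoint file name are placeholders.

```python
import torch

# Placeholder path: substitute the real DATETIME directory and checkpoint file name.
ckpt_path = ("/arc/projects/unions/ssl/data/processed/unions-cutouts/"
             "ugriz_lsb/output_dir/DATETIME/checkpoint-0.pth")
ckpt = torch.load(ckpt_path, map_location="cpu")
print(list(ckpt.keys()))                   # confirm what the checkpoint actually stores
print("epoch:", ckpt.get("epoch"))
```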
- To analyze the outputs of this model, specifically the image reconstructions and the representations (through UMAPs, t-SNEs, and similarity search), run the notebooks found in mae/demo.
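As a rough illustration of what those notebooks do, the sketch below projects a set of representations with UMAP and runs a Faiss nearest-neighbour search. The reps array is a random placeholder for the real representations, and the parameters used in the actual notebooks may differ.

```python
import numpy as np
import umap    # umap-learn
import faiss

# Placeholder representations: replace with the (N, D) float32 array produced by the encoder.
reps = np.random.rand(1000, 768).astype("float32")

# 2-D projection for visualization
embedding = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(reps)

# similarity search: the 5 nearest neighbours of the first cutout
index = faiss.IndexFlatL2(reps.shape[1])
index.add(reps)
distances, neighbours = index.search(reps[:1], 5)
print(neighbours)
```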
Many others have contributed to this effort, including my supervisors Sebastien Fabbro and Mike Hudson, as well as Spencer Bialek, Nat Comeau, Nick Heesters, and Leonardo Ferreira.
This research used the facilities of the CADC, operated by the National Research Council of Canada with the support of the Canadian Space Agency. Without CADC's CANFAR platform, none of this work would have been possible: the platform was used to host and access the data as well as to perform all of the computational work.
All data used for this project comes from UNIONS, so the survey has been instrumental in every aspect of this work.
W&B provides free student accounts for its experiment tracking software, which was tremendously helpful for debugging and keeping track of different experiments.
Finally, this project has relied heavily on open-source software. All programming was done in Python and made use of many of its associated packages, including NumPy, Matplotlib, and two key contributions from Meta AI: PyTorch and Faiss. Additionally, this work made use of Astropy, a community-developed core Python package and an ecosystem of tools and resources for astronomy.
This code is also forked from another repository, for which information is available below:
This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners:
@Article{MaskedAutoencoders2021,
author = {Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Doll{\'a}r and Ross Girshick},
journal = {arXiv:2111.06377},
title = {Masked Autoencoders Are Scalable Vision Learners},
year = {2021},
}
- The original implementation was in TensorFlow+TPU. This re-implementation is in PyTorch+GPU.
- This repo is a modification on the DeiT repo. Installation and preparation follow that repo.
- This repo is based on timm==0.3.2, for which a fix is needed to work with PyTorch 1.8.1+.
- Visualization demo
- Pre-trained checkpoints + fine-tuning code
- Pre-training code
Run our interactive visualization demo using Colab notebook (no GPU needed):
The following table provides the pre-trained checkpoints used in the paper, converted from TF/TPU to PT/GPU:
| | ViT-Base | ViT-Large | ViT-Huge |
| --- | --- | --- | --- |
| pre-trained checkpoint | download | download | download |
| md5 | 8cad7c | b8b06e | 9bdbb0 |
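If you download one of these checkpoints, the md5 row above can be checked with a snippet like the following (the file name is a placeholder):

```python
import hashlib

def md5sum(path, chunk=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

# Placeholder file name for the downloaded ViT-Base checkpoint.
print(md5sum("mae_pretrain_vit_base.pth").startswith("8cad7c"))   # expect True
```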
The fine-tuning instruction is in FINETUNE.md.
By fine-tuning these pre-trained models, we rank #1 in these classification tasks (detailed in the paper):
| | ViT-B | ViT-L | ViT-H | ViT-H448 | prev best |
| --- | --- | --- | --- | --- | --- |
| ImageNet-1K (no external data) | 83.6 | 85.9 | 86.9 | 87.8 | 87.1 |
| following are evaluation of the same model weights (fine-tuned in original ImageNet-1K): | | | | | |
| ImageNet-Corruption (error rate) | 51.7 | 41.8 | 33.8 | 36.8 | 42.5 |
| ImageNet-Adversarial | 35.9 | 57.1 | 68.2 | 76.7 | 35.8 |
| ImageNet-Rendition | 48.3 | 59.9 | 64.4 | 66.5 | 48.7 |
| ImageNet-Sketch | 34.5 | 45.3 | 49.6 | 50.9 | 36.0 |
| following are transfer learning by fine-tuning the pre-trained MAE on the target dataset: | | | | | |
| iNaturalists 2017 | 70.5 | 75.7 | 79.3 | 83.4 | 75.4 |
| iNaturalists 2018 | 75.4 | 80.1 | 83.0 | 86.8 | 81.2 |
| iNaturalists 2019 | 80.5 | 83.4 | 85.7 | 88.3 | 84.1 |
| Places205 | 63.9 | 65.8 | 65.9 | 66.8 | 66.0 |
| Places365 | 57.9 | 59.4 | 59.8 | 60.3 | 58.0 |
The pre-training instruction is in PRETRAIN.md.
This project is under the CC-BY-NC 4.0 license. See LICENSE for details.