Neb-usgs

This repository is a guide for any USGS employee or contractor who needs to learn the compute cluster faster than the documentation alone allows. Let me repeat: ALONE. I still expect you to read the documentation, because you owe it to everyone who uses Nebula to know how it allocates resources and how it uses CPUs and memory to complete massive workloads. It also really helps to understand how the WLM (Workload Manager) operates.

DISCLAIMER

This repository can serve as a general reference for any Slurm Workload Manager installation. That said, the scripts in this repo are written for people familiar with ISIS3, specifically USGS internal staff.

In this document I use (USGS) in a header to mark sections that are for USGS users only.

Introduction

Nebula is a Slurm-managed cluster that USGS Astro employees use to process very large numbers of data files. Its main purpose is to finish in hours, on a very powerful machine, processing tasks that would take a normal machine weeks.

Nebula uses Slurm to manage its resources and schedule jobs. Slurm is an open-source, highly scalable cluster management and job scheduling system for large and small Linux clusters. Systems like this are used in many fields, with more uses yet to be discovered. This document can serve as a helping hand for anyone who wants to learn what a system like this can be used for and how to use it.

Prerequisites (USGS)

IT has to clear you to connect to Nebula. After you get clearance, you should be able to simply ssh in using your USGS credentials. Use the /scratch directory to process all data, because that folder has the highest priority in the workload manager configuration; make sure IT creates a folder in /scratch specifically for you to access and change as you please.
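
As a quick sanity check after logging in, something like the following can confirm that your working folder exists and is writable before you submit any jobs. This is only a sketch: the path /scratch/$USER is an assumption (use whatever folder IT actually created for you), and check_workdir is a made-up helper name.

```shell
#!/bin/sh
# Sketch: verify a working directory before submitting jobs.
# ASSUMPTION: the folder is /scratch/$USER; the real name is whatever IT set up.

# check_workdir DIR -> succeeds only if DIR exists and is writable
check_workdir() {
    dir=$1
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "ok: $dir exists and is writable"
    else
        echo "problem: $dir is missing or not writable" >&2
        return 1
    fi
}

# Only report the result here; do not abort if the folder is absent.
check_workdir "/scratch/${USER:-$(id -un)}" || true
```

If the check reports a problem, that is a question for IT rather than something to work around.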

Examples

To begin any job on a Slurm WLM you must invoke the job dispatch command, sbatch. Look at this link for some help understanding the options for Slurm jobs.

Single Command Line Task

sbatch --job-name=EchoExampleJob --wrap="echo 'Slurm Job Ran'" --output=output-job-%j.out

Important: You must use the --wrap option to tell the WLM that you are sending a command string instead of a script file. Other arguments are:

--job-name: The name of the job, so you can track its progress with the squeue command.
--output: The name of the output file. It can also be a path that specifies where to place the log file; %j is replaced with the job ID.
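
To tie these options together, here is a small sketch that submits the same job, captures its ID, and works out where the log will land. It relies on two documented Slurm behaviors: sbatch --parsable prints just the job ID (optionally followed by ;cluster), and %j in --output is replaced by that ID. The helper name expand_output_pattern is made up for illustration, and the scheduler is only contacted if sbatch is actually installed.

```shell
#!/bin/sh
# Hypothetical helper: mimic how Slurm expands %j in an --output pattern.
# expand_output_pattern PATTERN JOB_ID
expand_output_pattern() {
    pattern=$1
    job_id=$2
    printf '%s\n' "$pattern" | sed "s/%j/$job_id/g"
}

# Only talk to the scheduler when sbatch is available on this machine.
if command -v sbatch >/dev/null 2>&1; then
    if job_id=$(sbatch --parsable --job-name=EchoExampleJob \
            --output=output-job-%j.out --wrap="echo 'Slurm Job Ran'"); then
        job_id=${job_id%%;*}              # drop ";cluster" if present
        squeue --name=EchoExampleJob      # show the job while queued or running
        echo "Log file: $(expand_output_pattern 'output-job-%j.out' "$job_id")"
    fi
fi
```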

Simple Batch File

First you must prepare a script to pass to the WLM. In whatever path you wish to run the job, create a file called slurm-job.sbatch that contains these lines and comments:

#!/bin/bash
# File Desc: This is a test file for a Slurm job.

#SBATCH --partition=<SOME PARTITION> # required for most architectures but varies for every Slurm instance
#SBATCH --job-name=myjobname
#SBATCH --output=myjob.slurm.out

echo "Slurm Job run from bash script file"
echo "I can just list commands now"

Lastly, run the dispatch command with the job script as the only argument: sbatch slurm-job.sbatch
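
The <SOME PARTITION> placeholder has to be replaced with a partition that exists on your cluster. One way to find candidates, sketched here, is to ask sinfo for the partition names. Slurm marks the default partition with a trailing *, which the hypothetical helper strip_default_marker removes. The --test-only call afterwards asks sbatch to validate the script without running it.

```shell
#!/bin/sh
# Hypothetical helper: remove the trailing "*" that sinfo puts on the default partition.
strip_default_marker() {
    printf '%s\n' "${1%\*}"
}

# List partition names, one per line, if sinfo is available.
if command -v sinfo >/dev/null 2>&1; then
    sinfo --noheader --format=%P | sort -u | while read -r name; do
        strip_default_marker "$name"
    done
fi

# Validate the batch script without submitting it for execution.
if command -v sbatch >/dev/null 2>&1 && [ -f slurm-job.sbatch ]; then
    sbatch --test-only slurm-job.sbatch || echo "script did not validate" >&2
fi
```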

Multi-Processor Example

For this example, create another script file and call it slurm-threaded.sbatch.

#!/bin/bash
# File Desc: This is a test file to show how to initialize a slurm job using multiple processing cores on a single node
#SBATCH --job-name=threadedjobname
#SBATCH --time=01:00:00
#SBATCH --partition=<Some Partition>
#SBATCH --output=threaded.slurm.out

# Here is where we can initialize the processor architecture for the job
#SBATCH --nodes=1                           # Use 1 processing node
#SBATCH --ntasks-per-node=1                 # Run only 1 task on the node
#SBATCH --cpus-per-task=4                   # give that task 4 cores to use

# give each cpu 4GB of memory, +2GB over the default ( default = 2GB )
#SBATCH --mem-per-cpu=4G

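# GNU parallel needs a list of inputs; without one it reads stdin, which is empty in a batch job
parallel -j "$SLURM_CPUS_PER_TASK" "echo 'Echoed from some core on node:' {}" ::: 1 2 3 4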

Then run the dispatch command again: sbatch slurm-threaded.sbatch
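
It helps to work out what this request adds up to. With --mem-per-cpu, the memory for the job scales with the number of cores, so 1 node x 1 task per node x 4 CPUs per task x 4G per CPU comes to 16G in total. A small sketch of that arithmetic (the function name is illustrative and not part of Slurm):

```shell
#!/bin/sh
# total_mem_gb NODES TASKS_PER_NODE CPUS_PER_TASK MEM_PER_CPU_GB
# Prints the total memory, in GB, that a --mem-per-cpu request adds up to.
total_mem_gb() {
    echo $(( $1 * $2 * $3 * $4 ))
}

# Values taken from the slurm-threaded.sbatch example.
echo "Requested memory: $(total_mem_gb 1 1 4 4)G"
```

After the job finishes, a command such as sacct -j <jobid> --format=JobID,AllocCPUS,Elapsed,MaxRSS,State can show what it actually used, assuming job accounting is enabled on the cluster.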

Learn More About USGS Nebula

Reading through these sites will give you a basic understanding of the commands you will need often when testing (for example, scancel). Some of these links will also give you a much stronger understanding of what Nebula is built on and how it allocates its resources.
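
As a rough illustration of the day-to-day commands, the sketch below lists your jobs and cancels them by ID or by name. It prints the commands instead of running them unless DRY_RUN=0 is set, so it is safe to try anywhere. The run wrapper and the job ID 12345 are placeholders invented for this example.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to really execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run squeue -u "${USER:-$(id -un)}"   # list your queued and running jobs
run scancel 12345                    # cancel one job by ID (placeholder ID)
run scancel --name=EchoExampleJob    # cancel jobs by job name
```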

Nebula Internal Homepage ( USGS Internal Network * No Public Access *)

USGS Employees Only

Slurm Workload Manager

Slurm is a workload management tool that runs jobs on a cluster using nodes and tasks to manage the machine's resources, spreading processes across multiple processors so that free cores can work on different jobs at the same time.

ChaddFrasier/Neb-usgs

Folders and files

Latest commit

History

Repository files navigation

Neb-usgs

DISCLAIMER

Introduction

Prerequisites (USGS)

Examples

Single Command Line Task

Simple Batch File

Multi-Processor Example

Learn More About USGS Nebula

Nebula Internal Homepage ( USGS Internal Network * No Public Access *)

Slurm Workload Manager

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages