heliohackweek/mms_data_hunt

The MMS Event Explorer (MeMeS)

MeMeS is a Python-based collaborative project incorporating research performed during the 2020 Helio HackWeek, hosted by the NASA Center for Climate Simulation, the University of Washington eScience Institute, NVIDIA, and the University of Maryland Department of Geographical Sciences.

Contents

  1. Project Objective
  2. Development
  3. Contributing and Resources

Project Objective

The goal is to give the scientific community a research utility that identifies characteristic heliophysical events in the Magnetospheric Multiscale (MMS) mission observation record. Events identified in Scientists in the Loop (SITL) reports serve as the ground-truth set for machine learning training and event selection.


Development

Development of a project is not always linear, and contributors with different skills and knowledge may prefer to work on different parts of this effort. Below are the development paths envisioned for this 3-day programming sprint, along with out-of-scope items left for future collaboration:

  • Step 0: Process SITL Reports (currently in progress)
    • Step 0.1: Build the ground-truth database
      1. Retrieve ASCII reports from Berkeley in a programmatic way.
        • (out of scope): Create a RESTful, web-based API that allows scientists to search for events.
      2. Parse reports to obtain records of bursty bulk flow (BBF) and dipolarization front (DF) events.
        • (out of scope): Create a machine learning algorithm that catches additional events despite differences in letter case, misspellings, and typographic errors. For now, a 1-year record of exact (or minimally varied) phrase occurrences will suffice.
      3. Store results in a database for future use and organization.
        • (out of scope): Weigh the options of SQL and NoSQL storage. For now, a file-based SQLite database will suffice.
      4. (out of scope): Automate the process for future reports and process the full record of existing reports.
      5. (out of scope): Create a standardized mechanism and format for reporting.
    • Step 0.2: Frontend work
      1. (out of scope): Website with search that queries the database by event type, date range, or last N events.
      2. (out of scope): API underneath website so researchers can also grab event information from database using HTTP methods.
      3. (out of scope): Visually map events to xy and yz planes for visual representation and identification.
      4. (out of scope): Extend the mapping/visualizations to show data layers, optionally with slight transparency.
      5. (out of scope): Extend to produce a catalog of all events reported (beyond BBF and DF).
  • Step 1: Event Finder
    • Step 1.1: Build a Python package to search MMS data for events.
      1. Use pyspedas to retrieve/stream data for the specific dates and times listed in a sample CSV file taken from the SITL database of reports.
      2. Identify specific type of observation from MMS needed to identify/characterize events.
      3. Put the data in a Pandas DataFrame or another idiomatic Python structure.
      4. Manipulate data to identify event duration, magnitude, location, and observational files used.
      5. Add event information to corresponding event record within SITL database to reference data, imagery, and other useful information (metadata or resources).
    • Step 1.2: Use ML to find similar type of events for a different time period.
      1. Once a small subset of events has been confirmed from the data, a training set will be created for each event type.
      2. Use TensorFlow (specifically requested by scientists) to find similar events across the span of the observational record.
      3. Tune the algorithm to reduce resource use and I/O overhead.
    • Step 1.3: Use GPU processing to optimize ML techniques
      1. Use RAPIDS on NVIDIA resources to expedite processing of data for events.
      2. Build optimized algorithms to sift through large amounts of data for analysis.
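
As an illustration of Step 0.1 (parsing reports for BBF and DF mentions and storing them in SQLite), here is a minimal sketch using only the Python standard library. The report line layout, the sample text, the `events` table schema, and the helper names `parse_report` and `store_events` are all assumptions made for this example; the real SITL reports may be laid out differently.

```python
import re
import sqlite3

# Hypothetical report lines; the real SITL report layout may differ.
SAMPLE_REPORT = """\
2019-07-25/14:02:00 - 2019-07-25/14:09:30 : BBF with earthward flow
2019-07-25/15:10:00 - 2019-07-25/15:12:00 : magnetopause crossing
2019-07-25/16:40:10 - 2019-07-25/16:44:00 : possible DF, weak
2019-07-25/18:00:00 - 2019-07-25/18:03:00 : bbf and DF
"""

# Assumed line layout: "<start> - <end> : <free-text comment>".
LINE_RE = re.compile(r"^(\S+) - (\S+) : (.*)$")
# Whole-word, case-insensitive match for the two event labels.
EVENT_RE = re.compile(r"\b(BBF|DF)\b", re.IGNORECASE)


def parse_report(text):
    """Return (start, end, event_type, comment) tuples for BBF/DF mentions."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        start, end, comment = match.groups()
        # A comment can name both event types; store one row per type.
        for event_type in sorted({m.upper() for m in EVENT_RE.findall(comment)}):
            records.append((start, end, event_type, comment))
    return records


def store_events(conn, records):
    """Create the (hypothetical) events table if needed and insert records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "start_time TEXT, end_time TEXT, event_type TEXT, comment TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # pass a file path for a persistent database
store_events(conn, parse_report(SAMPLE_REPORT))
rows = conn.execute(
    "SELECT event_type, COUNT(*) FROM events GROUP BY event_type ORDER BY event_type"
).fetchall()
print(rows)
```

The case-insensitive, whole-word pattern covers the "exact (or minimal variations)" matching described above; misspellings would still be missed, which is the part listed as out of scope.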
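
For Step 1.1 (identifying event duration and magnitude), one possible approach is to look for contiguous runs of samples above a threshold. The sketch below uses plain Python lists so it runs without dependencies; the same logic could be applied to columns of a Pandas DataFrame. The function name `find_events`, the synthetic samples, and the threshold of 400 are illustrative assumptions, not selection criteria taken from this project.

```python
from datetime import datetime, timedelta


def find_events(times, values, threshold):
    """Return one dict per contiguous run of samples with value > threshold."""
    events = []
    run_start = None
    for i, value in enumerate(values):
        above = value > threshold
        if above and run_start is None:
            run_start = i  # a new run begins at this sample
        # Close the run when the signal drops or the series ends.
        if run_start is not None and (not above or i == len(values) - 1):
            run_end = i if above else i - 1
            events.append({
                "start": times[run_start],
                "end": times[run_end],
                "duration_s": (times[run_end] - times[run_start]).total_seconds(),
                "peak": max(values[run_start:run_end + 1]),
            })
            run_start = None
    return events


# Synthetic 10-second-cadence samples standing in for an MMS quantity
# (hypothetical values, not real observations).
t0 = datetime(2019, 7, 25, 14, 0, 0)
times = [t0 + timedelta(seconds=10 * i) for i in range(8)]
speeds = [120.0, 410.0, 520.0, 450.0, 200.0, 90.0, 430.0, 405.0]

events = find_events(times, speeds, threshold=400.0)
for event in events:
    print(event["start"], event["duration_s"], event["peak"])
```

Each returned dictionary carries the fields that would be written back to the matching SITL record: start, end, duration in seconds, and peak value.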

Contributing and Resources

Thank you for your interest in this project! We welcome pull requests from developers of all skill levels. To get started, fork the repository (master branch) on GitHub to your personal account and then clone the fork into your development environment.