mahynski/chemometric-carpentry

A Short Course in Chemometrics

Objectives

🎯 The goal of this short course is to introduce and explain elementary chemometric analysis methods; we will also touch on more advanced machine learning (ML) approaches. The course covers Python-based tools that can accelerate your workflow and improve reproducibility. We assume no prior familiarity with these methods or tools, and no particular mathematical background. We review only as much mathematics as is needed to ground an understanding of the methods discussed, since the focus of this course is application, which does not require a deep theoretical understanding.

🚀 What we hope to achieve:

  1. Give you a new set of tools to help you do your job better
  2. Create a coherent and more consistent approach to chemometric analysis by introducing you to a standard library for these tasks
  3. Improve reproducibility and transparency
  4. Create a community where ideas, needs, and methodologies can be exchanged

📚 By the end, you will be able to browse a library of standardized example notebooks, select the one you need, enter your data, and run it from start to finish. This course will also teach you to modify and extend these notebooks as needed.

Outline

  1. Introduction
    • 📓 The Jupyter Notebook
      • The Basics
      • Google Colab
      • Managing Your Session
      • Installing Python Packages
      • Saving Code
    • 🐍 The Python Language
      • Why Learn Python?
      • Before We Get Started
      • Variables
        • Built-in Data Types
        • Variable Assignment and Operators
        • Sequences: Lists, Dictionaries, and Tuples
        • Referencing
      • Logic
        • Comparison Operators
        • Logical Operators
        • If Else Statements
      • Loops
        • For Loops
        • While Loops
      • Numpy, Scipy, and Pandas
        • Numpy
        • Scipy
        • Pandas
      • Plotting with Matplotlib
      • Defining Functions
        • Documentation and Type Hints
        • Scope
        • Number and Order of Arguments
        • Default Values
      • Object Orientation and Classes
    • 🔬 Chemometrics
      • The Authentication Problem
        • Some Motivating Examples
        • Class Models
        • A Machine Learning Perspective
      • $N \ll p$
      • Regression, Classification, and Clustering
      • scikit-learn
      • PyChemAuth
    • 🔮 Statistics Background
      • $\chi^2$ statistics
      • Performance Metrics
      • Rashomon sets
      • Bias-Variance Tradeoff
  2. Techniques
    • Exploratory Data Analysis (EDA)
      • Basic Suggestions
      • Jensen-Shannon Divergence
        • What is it?
        • Developing an Intuition
        • JSD Reveals Plausible Tree Stumps
        • Identifying Clusters
        • Binary vs OvA
        • Common Pitfalls
      • See also:
    • Pipelines
    • Evaluation Metrics
    • Cross-Validation
  3. 🚦 Pre-processing
    • Scaling and Centering
    • Filtering
      • MSC
      • SNV and RNV
      • Savitzky-Golay
    • Missing Values and Imputation
      • Limits of Detection (LOD)
      • Basic Imputation
      • Predictive Imputers
    • Class Balancing
      • SMOTE
      • Edited Nearest Neighbors (ENN)
      • SMOTEENN
      • ScaledSMOTEENN
      • Imblearn pipelines
    • Feature Selection
  4. 🔳 Conventional Chemometric Models
  5. 💻 Machine Learning Models
    • 📈 Regression Models
      • Artificial Neural Networks
      • Explainable Boosting Machine
    • ✅ Classification Models
      • 🌳 Decision Trees
        • Visualizing Decision Trees
        • Visualizing Decision Boundaries
        • Pros and Cons
      • 🎼 Ensemble Methods
        • Bagging
        • Boosting
      • 🌳🌳🌳 Random Forests
      • Logistic Regression
    • Authentication Models
      • EllipticManifold
      • Out-of-Distribution / Novelty Detection
        • 🌳🙉🌳 Isolation Forest
        • Other Resources
      • Open Set Recognition
    • AutoML
      • What is it?
      • Caveats
  6. 🔍 Comparison and Inspection
    • Comparing Relative Performance of Pipelines
    • 👀 Model-agnostic Inspection Methods
      • Permutation Feature Importance (PFI)
      • SHapley Additive exPlanations (SHAP)
        • Shapley Values (Theory)
        • Computing SHAP Values (Practice)
        • Margin Space Explanation
        • Best Practices
    • Do I Need More Data?
  7. 💾 Saving and Sharing Models
  8. 📁 Case Studies

Next Steps:

  • ❓ You can ask questions, provide feedback, and find community support on the GitHub Discussions page for this course.
  • ✖️ If you find a mistake please submit a Bug Report.
  • 🔭 If you would like us to cover new area(s) or have an idea to improve this course, please submit a Feature Request!
  • 💡 If you have requests or ideas specific to PyChemAuth, you can find similar options on its Issues page.
  • 🤝 Please consider contributing to PyChemAuth examples!

Instructor(s):

Thanks to 👏

The logo was designed using Google Gemini (Imagen 3) with the prompt "Design a logo for determining geographic origin using chemistry and statistical models" on Nov. 8, 2024.

About

A course in chemometric (data) carpentry.