Mondrian Forests #10

Open: wants to merge 60 commits into base: main
Changes from 1 commit
Commits
6023dd1
Update ClassifierOutput docstring
MarcoDiFrancesco Apr 11, 2024
feba8a0
Add RegressionOutput to common
MarcoDiFrancesco Apr 11, 2024
c13d3c6
Merge branch 'online-ml:main' into main
MarcoDiFrancesco Apr 11, 2024
308a082
Add boilerplate code for mondrian forest
MarcoDiFrancesco Apr 12, 2024
3ba0e3a
Add keystroke dataset
MarcoDiFrancesco Apr 12, 2024
2f9e03d
Add all functions calls with unimplemented errors
MarcoDiFrancesco Apr 15, 2024
7b63db5
Add predict steps to be refactored
MarcoDiFrancesco Apr 15, 2024
d5bb6db
Add get features function
MarcoDiFrancesco Apr 16, 2024
b5b7ec4
Add Array library
MarcoDiFrancesco Apr 16, 2024
d613df2
Add randomization for cache tests
MarcoDiFrancesco Apr 16, 2024
2174472
Disable test github actions and enable only check
MarcoDiFrancesco Apr 17, 2024
1c91530
Remove verbose from build and test
MarcoDiFrancesco Apr 17, 2024
44cfba4
Add Stats struct and impl
MarcoDiFrancesco Apr 18, 2024
4c6ebe4
Add rust caching in actions
MarcoDiFrancesco Apr 18, 2024
1ccabc4
Split MondrianTree and MondrianForest
MarcoDiFrancesco Apr 22, 2024
ac71b06
Refactor to use Tree Vector indicies instead of pointers
MarcoDiFrancesco Apr 23, 2024
8aad4ed
Change actions cargo.lock to cargo.toml
MarcoDiFrancesco Apr 23, 2024
8c91dd8
Add print function for MondrianTree
MarcoDiFrancesco Apr 23, 2024
6b38849
Adding print functions to mondriantree and node
MarcoDiFrancesco Apr 23, 2024
107354a
Implement and test predict_proba
MarcoDiFrancesco Apr 24, 2024
4385fe8
Add unit test for predict_proba
MarcoDiFrancesco Apr 24, 2024
49d4e3e
Add final implementation of inference (predict_proba)
MarcoDiFrancesco Apr 24, 2024
a16d3e7
Add random distribution to extend mondrian block
MarcoDiFrancesco Apr 25, 2024
de5d67a
Add full extend_mondrian_block implementation
MarcoDiFrancesco Apr 25, 2024
667d35e
Add synthetic dataset and tree integrity tests
MarcoDiFrancesco Apr 25, 2024
f79864d
Fix pointer of grandpa on extend_mondrian_block
MarcoDiFrancesco Apr 26, 2024
989c176
Add recursive repr mondrian forest
MarcoDiFrancesco Apr 26, 2024
75e5feb
Add score function
MarcoDiFrancesco Apr 29, 2024
da4a00a
Remove debug statements
MarcoDiFrancesco Apr 30, 2024
717161f
Adjust code to River behaviour
MarcoDiFrancesco Apr 30, 2024
a9ca4bc
Adapt _go_downwards from River
MarcoDiFrancesco May 3, 2024
ccc9b1d
Update function names from nel215 to River
MarcoDiFrancesco May 3, 2024
30fb86b
Comment debug prints
MarcoDiFrancesco May 3, 2024
a619415
Remove unused imports
MarcoDiFrancesco May 3, 2024
da23d14
Add synthetic dataset download
MarcoDiFrancesco May 3, 2024
85030ad
Rename MondrianForest to MondrianForestClassifier
MarcoDiFrancesco May 6, 2024
c4753f1
Update readme with classification run instructions
MarcoDiFrancesco May 6, 2024
a08f922
Add update_leaf flag to create_leaf
MarcoDiFrancesco May 13, 2024
a00cfe5
Fix mondrian forest classifier test
MarcoDiFrancesco May 13, 2024
4d9ef48
Remove create_leaf flag
MarcoDiFrancesco May 20, 2024
0217db2
Add create leafs when reaching a leaf
MarcoDiFrancesco May 24, 2024
1e5a874
Add assert to check for NaN probability
MarcoDiFrancesco May 24, 2024
6971c21
Revert removal of split_time
MarcoDiFrancesco May 24, 2024
782d1f2
Add test cases
MarcoDiFrancesco May 29, 2024
a5bd895
Remove unused `child_is_on_edge_parent` test case
MarcoDiFrancesco May 29, 2024
3544c28
Add debug statement for overwriting variance aware estimation
MarcoDiFrancesco May 29, 2024
9083d8e
Add synthetic regression target boilerplate
MarcoDiFrancesco Jun 4, 2024
43cce28
Add Classification and Regression division of MF
MarcoDiFrancesco Jun 7, 2024
e58638b
Add regression task and parent_has_finite_values test
MarcoDiFrancesco Jun 11, 2024
fed6daf
Fix child_inside_parent test
MarcoDiFrancesco Jun 11, 2024
760de79
Remove prints in excess
MarcoDiFrancesco Jun 11, 2024
54bb202
Add regression metrics
MarcoDiFrancesco Jun 12, 2024
0d74d3f
Fix test keystroke dataset
MarcoDiFrancesco Jun 12, 2024
c60b381
Change description of synthetic dataset
MarcoDiFrancesco Jun 12, 2024
ec2109a
Add baseline comparison for regression
MarcoDiFrancesco Jun 24, 2024
b77ba69
Add machine degradation dataset
MarcoDiFrancesco Jul 9, 2024
a6c1b8b
Add genesis demostrator dataset
MarcoDiFrancesco Jul 10, 2024
4a4b9f5
Update machine degradation with redirect
MarcoDiFrancesco Jul 10, 2024
23c109e
Update src/datasets/synthetic_regression.rs
smastelini Jul 29, 2024
38e64ee
Update src/datasets/synthetic.rs
smastelini Jul 29, 2024
Add regression task and parent_has_finite_values test
MarcoDiFrancesco committed Jun 11, 2024
commit e58638b1a6b9fa05ba1187b6fca72eb502ac0df7
4 changes: 2 additions & 2 deletions examples/anomaly_detection/credit_card.rs
@@ -1,6 +1,6 @@
 use light_river::anomaly::half_space_tree::HalfSpaceTree;
 use light_river::common::ClassifierOutput;
-use light_river::common::ClassifierTarget;
+use light_river::common::ClfTarget;
 use light_river::datasets::credit_card::CreditCard;
 use light_river::metrics::rocauc::ROCAUC;
 use light_river::metrics::traits::ClassificationMetric;
@@ -16,7 +16,7 @@ fn main() {
 let window_size: u32 = 1000;
 let n_trees: u32 = 50;
 let height: u32 = 6;
-let pos_val_metric = ClassifierTarget::from("1".to_string());
+let pos_val_metric = ClfTarget::from("1".to_string());
 let pos_val_tree = pos_val_metric.clone();
 let mut roc_auc: ROCAUC<f32> = ROCAUC::new(Some(10), pos_val_metric.clone());
 // INITIALIZATION
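For readers following the rename, the calls in this commit (`ClfTarget::from("1".to_string())`, `ClfTarget::from(true)`, `ClfTarget::from(y)` with an index, and the `ClfTarget::String(y)` pattern) suggest an enum with `From` conversions. The sketch below is a minimal stand-in under that assumption, not the crate's actual definition: only the `String` variant is confirmed by the diff, while the `Bool` and `Int` variant names, the derive list, and the `single_class_output` helper are hypothetical. The helper mirrors the `unwrap_or(ClfTarget::from(true))` fallback visible in `src/anomaly/half_space_tree.rs`.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for light_river::common::ClfTarget.
// Only the `String` variant appears in the diff; `Bool` and `Int` are assumed.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum ClfTarget {
    Bool(bool),
    Int(usize),
    String(String),
}

impl From<bool> for ClfTarget {
    fn from(v: bool) -> Self {
        ClfTarget::Bool(v)
    }
}

impl From<usize> for ClfTarget {
    fn from(v: usize) -> Self {
        ClfTarget::Int(v)
    }
}

impl From<String> for ClfTarget {
    fn from(v: String) -> Self {
        ClfTarget::String(v)
    }
}

// Hypothetical helper: builds a one-entry probability map keyed by the
// positive class, falling back to `true` when no positive value is set.
pub fn single_class_output(pos_val: Option<ClfTarget>, score: f32) -> HashMap<ClfTarget, f32> {
    HashMap::from([(pos_val.unwrap_or(ClfTarget::from(true)), score)])
}
```

Deriving `Eq` and `Hash` is what would let the target serve as a `HashMap` key, which is how the anomaly score is returned in the half-space tree hunk of this commit.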
8 changes: 4 additions & 4 deletions examples/classification/keystroke.rs
@@ -1,4 +1,4 @@
-use light_river::common::ClassifierTarget;
+use light_river::common::ClfTarget;
 use light_river::datasets::keystroke::Keystroke;
 use light_river::mondrian_forest::mondrian_forest::{MondrianForest, MondrianForestClassifier};

@@ -24,7 +24,7 @@ fn get_labels(transactions: IterCsv<f32, File>) -> Vec<String> {
 let mut labels = vec![];
 for t in transactions {
 let data = t.unwrap();
-// TODO: use instead 'to_classifier_target' and a vector of 'ClassifierTarget'
+// TODO: use instead 'to_classifier_target' and a vector of 'ClfTarget'
 let target = data.get_y().unwrap()["subject"].to_string();
 if !labels.contains(&target) {
 labels.push(target);
@@ -59,10 +59,10 @@ fn main() {

 let x = data.get_observation();
 let y = data.to_classifier_target("subject").unwrap();
-// TODO: generalize to non-classification only by implementing 'ClassifierTarget'
+// TODO: generalize to non-classification only by implementing 'ClfTarget'
 // instead of taking directly the string.
 let y = match y {
-ClassifierTarget::String(y) => y,
+ClfTarget::String(y) => y,
 _ => unimplemented!(),
 };
 let y = labels.clone().iter().position(|l| l == &y).unwrap();
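The example above collects the distinct labels in first-seen order and then maps each label to its position in that list. A minimal standalone sketch of that pattern follows; the function names `unique_labels` and `label_index` are hypothetical and not part of the PR. It also returns an `Option` rather than unwrapping, and looks up the index without cloning the label vector, which the lookup in the example does not appear to need.

```rust
// Hypothetical helper: collect labels in first-seen order, skipping duplicates.
fn unique_labels<I: IntoIterator<Item = String>>(targets: I) -> Vec<String> {
    let mut labels: Vec<String> = Vec::new();
    for target in targets {
        if !labels.contains(&target) {
            labels.push(target);
        }
    }
    labels
}

// Hypothetical helper: index of `y` in `labels`, or None if the label is unseen.
fn label_index(labels: &[String], y: &str) -> Option<usize> {
    labels.iter().position(|l| l.as_str() == y)
}
```

The linear `contains` and `position` scans are fine for a handful of classes; with many distinct labels a `HashMap<String, usize>` would likely be the better fit.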
10 changes: 5 additions & 5 deletions examples/classification/synthetic.rs
@@ -1,6 +1,6 @@
 use light_river::mondrian_forest::mondrian_forest::MondrianForestClassifier;

-use light_river::common::{Classifier, ClassifierTarget};
+use light_river::common::{Classifier, ClfTarget};
 use light_river::datasets::synthetic::Synthetic;
 use light_river::stream::iter_csv::IterCsv;
 use ndarray::Array1;
@@ -24,7 +24,7 @@ fn get_labels(transactions: IterCsv<f32, File>) -> Vec<String> {
 let mut labels = vec![];
 for t in transactions {
 let data = t.unwrap();
-// TODO: use instead 'to_classifier_target' and a vector of 'ClassifierTarget'
+// TODO: use instead 'to_classifier_target' and a vector of 'ClfTarget'
 let target = data.get_y().unwrap()["label"].to_string();
 if !labels.contains(&target) {
 labels.push(target);
@@ -68,11 +68,11 @@ fn main() {

 let y = data.to_classifier_target("label").unwrap();
 let y = match y {
-ClassifierTarget::String(y) => y,
+ClfTarget::String(y) => y,
 _ => unimplemented!(),
 };
 let y = labels.clone().iter().position(|l| l == &y).unwrap();
-let y = ClassifierTarget::from(y);
+let y = ClfTarget::from(y);
 // println!("=M=1 x:{}, idx: {}", x, idx);

 // Skip first sample since tree has still no node
@@ -87,7 +87,7 @@ fn main() {
 );
 }

-// if idx == 527 {
+// if idx == 4 {
 // break;
 // }

15 changes: 5 additions & 10 deletions examples/regression/synthetic_regression.rs
@@ -1,6 +1,6 @@
 use light_river::mondrian_forest::mondrian_forest::MondrianForestRegressor;

-use light_river::common::{Regressor, RegressorTarget};
+use light_river::common::{RegTarget, Regressor};
 use light_river::datasets::synthetic_regression::SyntheticRegression;
 use light_river::stream::iter_csv::IterCsv;
 use ndarray::Array1;
@@ -52,18 +52,13 @@ fn main() {

 let y = data.to_regression_target("label").unwrap();

-// println!("=M=1 x:{}, idx: {}", x, idx);
+println!("=M=1 idx={idx}, x={x}, y={y}");

 // Skip first sample since tree has still no node
 if idx != 0 {
-let score = mf.predict_one(&x, &y);
-score_total += score;
-// println!(
-// "Accuracy: {} / {} = {}",
-// score_total,
-// dataset_size - 1,
-// score_total / idx.to_f32().unwrap()
-// );
+let pred = mf.predict_one(&x, &y);
+let err = (pred - y).powi(2);
+println!("pred: {pred}, y: {y}, err: {err}");
 }

 // if idx == 527 {
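The updated example prints the squared error of each prediction but no longer accumulates a running total. One way such an aggregate could be tracked is an incrementally updated mean squared error, sketched below. The `RunningMse` type and its methods are hypothetical names for illustration and are not taken from this PR or from the crate's metrics module.

```rust
// Hypothetical running mean-squared-error tracker, updated one sample at a time.
#[derive(Debug, Default)]
struct RunningMse {
    n: u64,    // number of samples seen so far
    mean: f64, // current mean of the squared errors
}

impl RunningMse {
    // Fold one (prediction, target) pair into the running mean.
    fn update(&mut self, pred: f64, y: f64) {
        let err = (pred - y).powi(2);
        self.n += 1;
        // Incremental mean: mean_n = mean_{n-1} + (err - mean_{n-1}) / n
        self.mean += (err - self.mean) / self.n as f64;
    }

    // Current MSE, or None before any sample has been seen.
    fn get(&self) -> Option<f64> {
        if self.n == 0 {
            None
        } else {
            Some(self.mean)
        }
    }
}
```

The incremental form keeps only two numbers of state, which fits the one-sample-at-a-time loop in the example better than storing every error.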
12 changes: 5 additions & 7 deletions src/anomaly/half_space_tree.rs
@@ -8,7 +8,7 @@ use std::convert::TryFrom;
 use std::mem;
 use std::ops::{AddAssign, DivAssign, MulAssign, SubAssign};

-use crate::common::{ClassifierOutput, ClassifierTarget, Observation};
+use crate::common::{ClassifierOutput, ClfTarget, Observation};

 // Return the index of a node's left child node.
 #[inline]
@@ -97,15 +97,15 @@ pub struct HalfSpaceTree<F: Float + FromPrimitive + AddAssign + SubAssign + MulA
 n_nodes: u32,
 trees: Option<Trees<F>>,
 first_learn: bool,
-pos_val: Option<ClassifierTarget>,
+pos_val: Option<ClfTarget>,
 }
 impl<F: Float + FromPrimitive + AddAssign + SubAssign + MulAssign + DivAssign> HalfSpaceTree<F> {
 pub fn new(
 window_size: u32,
 n_trees: u32,
 height: u32,
 features: Option<Vec<String>>,
-pos_val: Option<ClassifierTarget>,
+pos_val: Option<ClfTarget>,
 // rng: ThreadRng,
 ) -> Self {
 // let mut rng = rand::thread_rng();
@@ -220,9 +220,7 @@ impl<F: Float + FromPrimitive + AddAssign + SubAssign + MulAssign + DivAssign> H
 score = F::one() - (score / self.max_score());

 return Some(ClassifierOutput::Probabilities(HashMap::from([(
-ClassifierTarget::from(
-self.pos_val.clone().unwrap_or(ClassifierTarget::from(true)),
-),
+ClfTarget::from(self.pos_val.clone().unwrap_or(ClfTarget::from(true))),
 score,
 )])));
 // return Some(score);
@@ -262,7 +260,7 @@ mod tests {
 n_trees,
 height,
 None,
-Some(ClassifierTarget::from("1".to_string())),
+Some(ClfTarget::from("1".to_string())),
 );

 // LOOP