Mondrian Forests #10

Open
wants to merge 60 commits into
base: main
6023dd1
Update ClassifierOutput docstring
MarcoDiFrancesco Apr 11, 2024
feba8a0
Add RegressionOutput to common
MarcoDiFrancesco Apr 11, 2024
c13d3c6
Merge branch 'online-ml:main' into main
MarcoDiFrancesco Apr 11, 2024
308a082
Add boilerplate code for mondrian forest
MarcoDiFrancesco Apr 12, 2024
3ba0e3a
Add keystroke dataset
MarcoDiFrancesco Apr 12, 2024
2f9e03d
Add all functions calls with unimplemented errors
MarcoDiFrancesco Apr 15, 2024
7b63db5
Add predict steps to be refactored
MarcoDiFrancesco Apr 15, 2024
d5bb6db
Add get features function
MarcoDiFrancesco Apr 16, 2024
b5b7ec4
Add Array library
MarcoDiFrancesco Apr 16, 2024
d613df2
Add randomization for cache tests
MarcoDiFrancesco Apr 16, 2024
2174472
Disable test github actions and enable only check
MarcoDiFrancesco Apr 17, 2024
1c91530
Remove verbose from build and test
MarcoDiFrancesco Apr 17, 2024
44cfba4
Add Stats struct and impl
MarcoDiFrancesco Apr 18, 2024
4c6ebe4
Add rust caching in actions
MarcoDiFrancesco Apr 18, 2024
1ccabc4
Split MondrianTree and MondrianForest
MarcoDiFrancesco Apr 22, 2024
ac71b06
Refactor to use Tree Vector indices instead of pointers
MarcoDiFrancesco Apr 23, 2024
8aad4ed
Change actions cargo.lock to cargo.toml
MarcoDiFrancesco Apr 23, 2024
8c91dd8
Add print function for MondrianTree
MarcoDiFrancesco Apr 23, 2024
6b38849
Adding print functions to mondriantree and node
MarcoDiFrancesco Apr 23, 2024
107354a
Implement and test predict_proba
MarcoDiFrancesco Apr 24, 2024
4385fe8
Add unit test for predict_proba
MarcoDiFrancesco Apr 24, 2024
49d4e3e
Add final implementation of inference (predict_proba)
MarcoDiFrancesco Apr 24, 2024
a16d3e7
Add random distribution to extend mondrian block
MarcoDiFrancesco Apr 25, 2024
de5d67a
Add full extend_mondrian_block implementation
MarcoDiFrancesco Apr 25, 2024
667d35e
Add synthetic dataset and tree integrity tests
MarcoDiFrancesco Apr 25, 2024
f79864d
Fix pointer of grandpa on extend_mondrian_block
MarcoDiFrancesco Apr 26, 2024
989c176
Add recursive repr mondrian forest
MarcoDiFrancesco Apr 26, 2024
75e5feb
Add score function
MarcoDiFrancesco Apr 29, 2024
da4a00a
Remove debug statements
MarcoDiFrancesco Apr 30, 2024
717161f
Adjust code to River behaviour
MarcoDiFrancesco Apr 30, 2024
a9ca4bc
Adapt _go_downwards from River
MarcoDiFrancesco May 3, 2024
ccc9b1d
Update function names from nel215 to River
MarcoDiFrancesco May 3, 2024
30fb86b
Comment debug prints
MarcoDiFrancesco May 3, 2024
a619415
Remove unused imports
MarcoDiFrancesco May 3, 2024
da23d14
Add synthetic dataset download
MarcoDiFrancesco May 3, 2024
85030ad
Rename MondrianForest to MondrianForestClassifier
MarcoDiFrancesco May 6, 2024
c4753f1
Update readme with classification run instructions
MarcoDiFrancesco May 6, 2024
a08f922
Add update_leaf flag to create_leaf
MarcoDiFrancesco May 13, 2024
a00cfe5
Fix mondrian forest classifier test
MarcoDiFrancesco May 13, 2024
4d9ef48
Remove create_leaf flag
MarcoDiFrancesco May 20, 2024
0217db2
Add create leafs when reaching a leaf
MarcoDiFrancesco May 24, 2024
1e5a874
Add assert to check for NaN probability
MarcoDiFrancesco May 24, 2024
6971c21
Revert removal of split_time
MarcoDiFrancesco May 24, 2024
782d1f2
Add test cases
MarcoDiFrancesco May 29, 2024
a5bd895
Remove unused `child_is_on_edge_parent` test case
MarcoDiFrancesco May 29, 2024
3544c28
Add debug statement for overwriting variance aware estimation
MarcoDiFrancesco May 29, 2024
9083d8e
Add synthetic regression target boilerplate
MarcoDiFrancesco Jun 4, 2024
43cce28
Add Classification and Regression division of MF
MarcoDiFrancesco Jun 7, 2024
e58638b
Add regression task and parent_has_finite_values test
MarcoDiFrancesco Jun 11, 2024
fed6daf
Fix child_inside_parent test
MarcoDiFrancesco Jun 11, 2024
760de79
Remove prints in excess
MarcoDiFrancesco Jun 11, 2024
54bb202
Add regression metrics
MarcoDiFrancesco Jun 12, 2024
0d74d3f
Fix test keystroke dataset
MarcoDiFrancesco Jun 12, 2024
c60b381
Change description of synthetic dataset
MarcoDiFrancesco Jun 12, 2024
ec2109a
Add baseline comparison for regression
MarcoDiFrancesco Jun 24, 2024
b77ba69
Add machine degradation dataset
MarcoDiFrancesco Jul 9, 2024
a6c1b8b
Add genesis demonstrator dataset
MarcoDiFrancesco Jul 10, 2024
4a4b9f5
Update machine degradation with redirect
MarcoDiFrancesco Jul 10, 2024
23c109e
Update src/datasets/synthetic_regression.rs
smastelini Jul 29, 2024
38e64ee
Update src/datasets/synthetic.rs
smastelini Jul 29, 2024
Add random distribution to extend mondrian block
MarcoDiFrancesco committed Apr 25, 2024
commit a16d3e7829c5b55ff33e3a15cda9cf72ee7cba1c
1 change: 1 addition & 0 deletions Cargo.toml
@@ -16,6 +16,7 @@ rand = "0.8.5"
 time = "0.3.29"
 half = "2.3.1"
 ndarray = "0.15.6"
+rand_distr = "0.4.3"

 [dev-dependencies]
 criterion = { version = "0.5", features = ["html_reports"] }
4 changes: 3 additions & 1 deletion examples/classification/keystroke.rs
@@ -70,11 +70,13 @@ fn main() {
     // DEBUG: remove it
     x_ord = x_ord.slice(s![0..2]).to_owned();

+    println!("=M=1 partial_fit");
     mf.partial_fit(&x_ord, &y);

+    println!("=M=2 predict_proba");
     let score = mf.predict_proba(&x_ord);

-    println!("=== score:: {:?}", score);
+    println!("=M=3 score: {:?}", score);
     println!("");

     counter += 1;
1 change: 0 additions & 1 deletion src/classification/mondrian_node.rs
@@ -159,7 +159,6 @@ impl<F: FType> Stats<F> {
     pub fn predict_proba(&self, x: &Array1<F>) -> Array1<F> {
         let mut probs = Array1::zeros(self.num_labels);
         let mut sum_prob = F::zero();
-        println!("{self}");

         for (index, ((sum, sq_sum), &count)) in self
             .sums
27 changes: 18 additions & 9 deletions src/classification/mondrian_tree.rs
@@ -10,6 +10,7 @@ use num::pow::Pow;
 use num::traits::float;
 use num::{Float, FromPrimitive};
 use rand::prelude::*;
+use rand_distr::{Distribution, Exp};
 use std::cell::RefCell;
 use std::collections::HashMap;
 use std::convert::TryFrom;
@@ -92,8 +93,17 @@ impl<F: FType> MondrianTree<F> {
         self.predict(x, root, F::one())
     }

-    fn extend_mondrian_block(&self, node: &Node<F>, x: &Array1<F>, label: &String) {
-        println!("=== extend_mondrian_block not implemented")
+    fn extend_mondrian_block(&mut self, node_idx: usize, x: &Array1<F>, label: &String) {
+        // TODO: Check if we access the node somewhere else by reference (&Node).
+        // If so pass it by ref here instead of 'node_idx' so we don't access it twice.
+        let node = &self.nodes[node_idx];
+        let e_min = (&node.min_list - x).mapv(|v| v.max(F::zero()));
+        let e_max = (x - &node.max_list).mapv(|v| v.max(F::zero()));
+        let e_sum = &e_min + &e_max;
+        let rate = e_sum.sum() + F::epsilon();
+        let exp_dist = Exp::new(rate.to_f32().unwrap()).unwrap();
+        let E = F::from_f32(exp_dist.sample(&mut self.rng)).unwrap();
+        println!("=== rate: {}, E: {}", rate, E);
     }

     /// Note: In Nel215 codebase should work on multiple records, here it's
@@ -104,7 +114,7 @@ impl<F: FType> MondrianTree<F> {
         if self.nodes.len() == 0 {
             self.create_leaf(x, y, None);
         } else {
-            self.extend_mondrian_block(&self.nodes[0], x, y);
+            self.extend_mondrian_block(0, x, y);
         }
     }

@@ -116,8 +126,7 @@ impl<F: FType> MondrianTree<F> {
     ///
     /// Recursive function to predict probabilities.
     fn predict(&self, x: &Array1<F>, node: &Node<F>, p_not_separated_yet: F) -> Array1<F> {
-        // println!("Node: {node:?}");
-        println!("predict_proba() - MondrianTree: {}", self);
+        // println!("predict_proba() - MondrianTree: {}", self);

         // Step 1: Calculate the time delta from the parent node.
         // If node is root its time is 0
@@ -152,10 +161,10 @@ impl<F: FType> MondrianTree<F> {
         // let p_not_separated_yet = F::from_f32(0.8).unwrap();
         // let p = F::from_f32(0.9).unwrap();

-        println!(
-            "predict() - res: {:?}, p_not_separated_yet: {:?}, p: {:?}",
-            res, p_not_separated_yet, p
-        );
+        // println!(
+        // "predict() - res: {:?}, p_not_separated_yet: {:?}, p: {:?}",
+        // res, p_not_separated_yet, p
+        // );

         if node.is_leaf {
             let w = p_not_separated_yet * (F::one() - p);
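Note on the new `extend_mondrian_block` body: the commit computes, per feature, how far `x` falls outside the node's bounding box (`e_min` below the lower bound, `e_max` above the upper bound), sums those distances into `rate`, and draws `E` from an exponential distribution with that rate. The following is a minimal standalone sketch of that computation, not the PR's code. It assumes plain `f64` slices in place of `ndarray::Array1<F>`, and the names `extension_rate`, `Lcg`, and `sample_exp` are hypothetical. The hand-rolled LCG with inverse-transform sampling is a stand-in for `rand_distr::Exp`, used only so the sketch compiles without external crates.

```rust
// Sketch only: std-only stand-ins for ndarray and rand_distr::Exp.

/// Sum over dimensions of how far `x` lies outside the box [min, max].
/// Mirrors `e_min`, `e_max`, and `e_sum.sum()` in the diff (assumed equivalent).
fn extension_rate(min: &[f64], max: &[f64], x: &[f64]) -> f64 {
    min.iter()
        .zip(max)
        .zip(x)
        .map(|((&lo, &hi), &xi)| (lo - xi).max(0.0) + (xi - hi).max(0.0))
        .sum()
}

/// Hypothetical minimal PRNG (a 64-bit LCG) so the sketch needs no crates.
struct Lcg(u64);

impl Lcg {
    /// Uniform draw in [0, 1) built from the top 53 bits of the state.
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Inverse-transform sample from Exp(rate): -ln(1 - u) / rate.
fn sample_exp(rate: f64, rng: &mut Lcg) -> f64 {
    let u = rng.next_f64();
    -(1.0 - u).ln() / rate
}

fn main() {
    let min = [0.0_f64, 0.0];
    let max = [1.0_f64, 1.0];
    let x = [1.5_f64, -0.25];

    // As in the diff, add epsilon so the rate stays positive when x is inside the box.
    let rate = extension_rate(&min, &max, &x) + f64::EPSILON;
    let mut rng = Lcg(42);
    let e = sample_exp(rate, &mut rng);
    println!("rate: {rate}, E: {e}");
}
```

When `x` lies inside the box, the summed distance is zero and the rate is only epsilon, so the mean of the sampled `E` (which is `1 / rate`) becomes very large. The diff at this commit only prints `rate` and `E`; how `E` is used afterwards is not shown here.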