Description

If you pass a non-existent file via parameter `forcedsplits_filename`, `lightgbm` appears to silently ignore it.
It should raise an informative error if reading that file fails, or at least log a warning.
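Until LightGBM itself reports this, a user-side guard can catch the problem before training. The sketch below is hypothetical: `validate_forced_splits_file()` is not part of LightGBM. Its structural check is based on the forced-splits format described in the LightGBM parameters documentation (nested JSON objects with `feature` and `threshold`, plus optional `left`/`right` sub-splits), and it may be stricter than what LightGBM actually accepts.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def validate_forced_splits_file(path: str) -> dict[str, Any]:
    """Hypothetical helper (not a LightGBM API): fail loudly if a forced-splits file is unusable."""
    p = Path(path)
    # the behavior requested in this issue: a missing file is an error, not a silent no-op
    if not p.is_file():
        raise FileNotFoundError(f"forcedsplits_filename='{path}' does not exist or is not a file")
    try:
        spec = json.loads(p.read_text())
    except json.JSONDecodeError as err:
        raise ValueError(f"forcedsplits_filename='{path}' is not valid JSON: {err}") from err
    _check_node(spec, where="root")
    return spec


def _check_node(node: Any, where: str) -> None:
    # assumption: every split is an object with "feature" and "threshold";
    # "left" and "right", if present, are nested splits with the same shape
    if not isinstance(node, dict):
        raise ValueError(f"{where}: expected a JSON object, got {type(node).__name__}")
    for key in ("feature", "threshold"):
        if key not in node:
            raise ValueError(f"{where}: missing required key '{key}'")
    for child in ("left", "right"):
        if child in node:
            _check_node(node[child], where=f"{where}.{child}")
```

For example, calling `validate_forced_splits_file("does-not-exist.json")` before constructing the `LGBMRegressor` would raise `FileNotFoundError` instead of letting training proceed silently.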
Reproducible Example
Using `lightgbm==4.6.0` installed from PyPI.
```python
import json

import lightgbm as lgb
import numpy as np
from sklearn.datasets import make_regression

X, y = make_regression(
    n_samples=10_000,
    n_features=5,
    n_informative=5,
    random_state=42
)

# add a noise feature
noise_feature = np.random.random(size=(X.shape[0], 1))
X = np.concatenate((X, noise_feature), axis=1)

# force the use of that noise feature in every tree
forced_split = {
    "feature": 5,
    "threshold": np.mean(noise_feature),
}
with open("forced_splits.json", "w") as f:
    f.write(json.dumps(forced_split))

# train a model, forcing it to use those splits
model = lgb.LGBMRegressor(
    random_state=708,
    n_estimators=10,
    verbose=1,
    forcedsplits_filename="forced_splits.json",
)
model.fit(X, y)

# noise feature was used exactly once in every tree
# (because we forced LightGBM to use it)
model.feature_importances_
# array([ 0, 109, 132, 0, 49, 10], dtype=int32)

# passing a non-existent file... no warning, no error
model2 = lgb.LGBMRegressor(
    random_state=708,
    n_estimators=10,
    verbose=1,
    forcedsplits_filename="does-not-exist.json",
)
model2.fit(X, y)
```
Logs from that second `.fit()`:

```text
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000568 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1530
[LightGBM] [Info] Number of data points in the train set: 10000, number of used features: 6
[LightGBM] [Info] Start training from score -0.889445
LGBMRegressor(forcedsplits_filename='does-not-exist.json', n_estimators=10,
              random_state=708, verbose=1)
```
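Because `feature_importances_` only counts splits, a more direct way to tell whether a forced split took effect is to look at the root node of every tree. The sketch below assumes the dictionary layout returned by `Booster.dump_model()` (a `"tree_info"` list whose entries hold a `"tree_structure"`, with a `"split_feature"` key on internal nodes); the helper names `root_split_features()` and `forced_split_applied()` are hypothetical. It only reads dictionary keys, so it can be exercised without training anything.

```python
from __future__ import annotations

from typing import Any, Optional


def root_split_features(dumped_model: dict[str, Any]) -> list[Optional[int]]:
    """Return the feature index used at the root of each tree.

    Assumes the shape of lightgbm.Booster.dump_model() output. A tree that is
    a single leaf has no "split_feature" at its root, so None is returned for it.
    """
    return [tree["tree_structure"].get("split_feature") for tree in dumped_model["tree_info"]]


def forced_split_applied(dumped_model: dict[str, Any], feature: int) -> bool:
    # True only if there is at least one tree and every root split is on `feature`
    roots = root_split_features(dumped_model)
    return len(roots) > 0 and all(f == feature for f in roots)
```

With the models above, `forced_split_applied(model.booster_.dump_model(), feature=5)` would be expected to return `True`. For `model2`, if the missing file really is being ignored, it would normally return `False`, since a pure-noise column is unlikely to win the root split of every tree on its own.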
Notes
Noticed this while working on https://stackoverflow.com/a/79435055/3986677.
I strongly suspect it is not specific to the Python package, and that changes need to be made in the C++ code.