forked from dirty-cat/dirty_cat
Commit
Use handle_unknown=ignore in SuperVectorizer (skrub-data#473)
* Use handle_unknown=ignore in SuperVectorizer
  Change default `low_card_cat_transformer` in SuperVectorizer to use handle_unknown="ignore"
* Update changelog
* Change drop to None
* Fix bug for new categories for categorical columns
  Pandas `category` dtype conversion converts new categories to nans, so we now update the list of categories before converting.
* Fix test to prevent n_samples < n_components
* Update dirty_cat/_super_vectorizer.py
  Co-authored-by: Jovan Stojanovic <[email protected]>
* Convert all categorical columns to object dtype inside SuperVectorizer
  This avoids dealing with the categories attached to the dtype.
* Put back drop="if_binary"
  And use handle_unknown="error" for sklearn < 0.24.2.
* Revert "Convert all categorical columns to object dtype inside SuperVectorizer"
  This reverts commit 34ed05f.
* finish merge
* change name in CHANGES.rst
* Change min version for handle_unknown=ignore to 1.0.0 and change the warning message to be more informative.
* warning stacklevel + fix name
* replace sup_vec by table_vec

---------

Co-authored-by: Jovan Stojanovic <[email protected]>
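The "new categories" bug described in the commit message can be reproduced with pandas alone. The snippet below is a minimal sketch, not the code from this commit: the example data and the variable names (`train`, `test`, `fitted_dtype`, `extended_dtype`) are hypothetical, and the fix shown is only one plausible way to "update the list of categories before converting".

```python
import pandas as pd

# Categories seen at fit time (hypothetical example data).
train = pd.Series(["red", "green", "red"], dtype="category")
fitted_dtype = train.dtype  # categories are inferred as ['green', 'red']

# At transform time a previously unseen value, "blue", appears.
test = pd.Series(["green", "blue"])

# Naive conversion: values outside the fitted categories silently become NaN.
naive = test.astype(fitted_dtype)

# Sketch of the workaround: extend the category list before converting
# (assumed approach, written for illustration only).
new_values = [v for v in test.unique() if v not in fitted_dtype.categories]
extended_dtype = pd.CategoricalDtype(list(fitted_dtype.categories) + new_values)
fixed = test.astype(extended_dtype)

print(naive.isna().tolist())  # [False, True]
print(fixed.tolist())         # ['green', 'blue']
```

With the unseen value preserved instead of turned into NaN, a downstream encoder configured with handle_unknown="ignore" can treat it as an unknown category rather than as a missing value.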
1 parent 90ea4db, commit 9b15dd2
Showing 5 changed files with 118 additions and 27 deletions.