dev-fix/Improvements for Local Development Setup and Unit Test Structure #1168

Merged 13 commits on Jan 30, 2025
Staging release 0.13.0 (#1165) (#1166)
* refactor: Upgrade the models to use keras 3.0 (#1138)

* Replace snappy with cramjam (#1091)

* add downloads tile (#1085)

* Replace snappy with cramjam

* Delete test_no_snappy

---------



* pre-commit fix (#1122)

* Bug fix for float precision calculation using categorical data with trailing zeros. (#1125)

* Revert "Bug fix for float precision calculation using categorical data with t…" (#1133)

This reverts commit d3159bd.

* refactor: move layers outside of class

* refactor: update model to keras 3.0

* fix: manifest

* fix: bugs in compile and train

* fix: bug in load_from_library

* fix: bugs in CharCNN

* refactor: loading tf model labeler

* fix: bug in data_labeler identification

* fix: update model to use proper softmax layer names

* fix: formatting

* fix: remove unused line

* refactor: drop support for 3.8

* fix: comments

* fix: comment

---------





* Fix Tox (#1143)

* tox new

* update

* update

* update

* update

* update

* update

* update

* update tox.ini

* update

* update

* remove docs

* empty retrigger

* update (#1146)

* Add Python 3.11 to GHA (#1090)

* add downloads tile (#1085)

* Add Python 3.11 to GHA

* Replace snappy with cramjam (#1091)

* add downloads tile (#1085)

* Replace snappy with cramjam

* Delete test_no_snappy

---------



* Update dask modules

* Install dask dataframe

* Update dask modules in precommit

* Correct copy/paste error

* Try again to clear Unicode

* Rolled back pre-commit dask version

* Add py311 to tox

* Bump dask to 2024.4.1

* Bump python-snappy 0.7.1

* Rewrite labeler test

* Correct isort

* Satisfy black

* And flake8

* Synced with requirements

---------



* [Vuln Fix]: Resolve mend vulnerabilities related to requests. (#1162)

* resolved check-manifest issue

* updating keras version pin to <=3.4.0

* adding comment in requirements.txt to trigger mend check

---------



---------

Co-authored-by: JGSweets
Co-authored-by: Gábor Lipták
Co-authored-by: Taylor Turner
Co-authored-by: James Schadt
Co-authored-by: Michael Davis
6 people authored Jan 13, 2025
commit 50da93f0bca6b0add053acdafd34e2f48f61fa1b
8 changes: 4 additions & 4 deletions .pre-commit-config.yaml
@@ -48,14 +48,14 @@ repos:
# requirements.txt
h5py>=2.10.0,
wheel>=0.33.1,
-numpy>=1.22.0,
+numpy<2.0.0,
pandas>=1.1.2,
python-dateutil>=2.7.5,
pytz>=2020.1,
pyarrow>=1.0.1,
chardet>=3.0.4,
fastavro>=1.0.0.post1,
-python-snappy>=0.5.4,
+python-snappy>=0.7.1,
charset-normalizer>=1.3.6,
psutil>=4.0.0,
scipy>=1.4.1,
@@ -80,7 +80,7 @@ repos:

# requirements-ml.txt
scikit-learn>=0.23.2,
-'keras>=2.4.3,<3.0.0',
+'keras>=2.4.3,<=3.4.0',
rapidfuzz>=2.6.1,
"tensorflow>=2.6.4,<2.15.0; sys.platform != 'darwin'",
"tensorflow>=2.6.4,<2.15.0; sys_platform == 'darwin' and platform_machine != 'arm64'",
@@ -108,7 +108,7 @@ repos:
rev: "0.48"
hooks:
- id: check-manifest
-additional_dependencies: ['h5py', 'wheel', 'future', 'numpy', 'pandas',
+additional_dependencies: ['h5py', 'wheel', 'future', 'numpy<2.0.0', 'pandas',
'python-dateutil', 'pytz', 'pyarrow', 'chardet', 'fastavro',
'python-snappy', 'charset-normalizer', 'psutil', 'scipy', 'requests',
'networkx','typing-extensions', 'HLL', 'datasketches', 'boto3']
16 changes: 0 additions & 16 deletions dataprofiler/__init__.py
@@ -20,22 +20,6 @@
from .validators.base_validators import Validator
from .version import __version__

-try:
-    import snappy
-except ImportError:
-    import warnings
-
-    warnings.warn(
-        "Snappy must be installed to use parquet/avro datasets."
-        "\n\n"
-        "For macOS use Homebrew:\n"
-        "\t`brew install snappy`"
-        "\n\n"
-        "For linux use apt-get:\n`"
-        "\tsudo apt-get -y install libsnappy-dev`\n",
-        ImportWarning,
-    )
-

def set_seed(seed=None):
# also check it's an integer
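The block removed from `dataprofiler/__init__.py` in this hunk emitted an `ImportWarning` at import time when `snappy` could not be imported. As a minimal sketch of that general pattern (this is not DataProfiler code; the helper name `warn_if_missing` and the hint text are hypothetical), an optional top-level dependency can be probed with the standard library without actually importing it:

```python
import importlib.util
import warnings


def warn_if_missing(module_name: str, hint: str) -> bool:
    """Return True if a top-level module is importable.

    Otherwise emit an ImportWarning carrying an install hint and return False.
    Hypothetical helper for illustration; only top-level names are supported.
    """
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is not None:
        return True
    warnings.warn(f"{module_name} is not installed. {hint}", ImportWarning)
    return False


if __name__ == "__main__":
    # ImportWarning is ignored by the default filters, so enable it explicitly.
    warnings.simplefilter("always", ImportWarning)
    warn_if_missing("snappy", "Install it to read snappy-compressed files.")
```

Because Python's default warning filters ignore `ImportWarning`, a check like this is only visible when warnings are enabled, for example with `python -W default`.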
40 changes: 0 additions & 40 deletions dataprofiler/tests/test_data_profiler.py
@@ -56,46 +56,6 @@ def test_data_profiling(self):
self.assertIsNotNone(profile.profile)
self.assertIsNotNone(profile.report())

-    def test_no_snappy(self):
-        import importlib
-        import sys
-        import types
-
-        orig_import = __import__
-        # necessary for any wrapper around the library to test if snappy caught
-        # as an issue
-
-        def reload_data_profiler():
-            """Recursively reload modules."""
-            sys_modules = sys.modules.copy()
-            for module_name, module in sys_modules.items():
-                # Only reload top level of the dataprofiler
-                if "dataprofiler" in module_name and len(module_name.split(".")) < 3:
-                    if isinstance(module, types.ModuleType):
-                        importlib.reload(module)
-
-        def import_mock(name, *args, **kwargs):
-            if name == "snappy":
-                raise ImportError("test")
-            return orig_import(name, *args, **kwargs)
-
-        with mock.patch("builtins.__import__", side_effect=import_mock):
-            with self.assertWarns(ImportWarning) as w:
-                import dataprofiler
-
-                reload_data_profiler()
-
-        self.assertEqual(
-            str(w.warning),
-            "Snappy must be installed to use parquet/avro datasets."
-            "\n\n"
-            "For macOS use Homebrew:\n"
-            "\t`brew install snappy`"
-            "\n\n"
-            "For linux use apt-get:\n`"
-            "\tsudo apt-get -y install libsnappy-dev`\n",
-        )
-
def test_no_tensorflow(self):
import sys

2 changes: 1 addition & 1 deletion requirements-dev.txt
@@ -1,4 +1,4 @@
-check-manifest>=0.48
+check-manifest>=0.50
black>=24.3.0
isort==5.12.0
pre-commit==2.19.0
2 changes: 1 addition & 1 deletion requirements-ml.txt
@@ -1,5 +1,5 @@
scikit-learn>=0.23.2
-keras>=3.0.0
+keras<=3.4.0
rapidfuzz>=2.6.1
tensorflow>=2.16.0; sys.platform != 'darwin'
tensorflow>=2.16.0; sys_platform == 'darwin' and platform_machine != 'arm64'
7 changes: 4 additions & 3 deletions requirements.txt
@@ -1,20 +1,21 @@
h5py>=2.10.0
wheel>=0.33.1
-numpy>=1.22.0
+numpy<2.0.0
pandas>=1.1.2
python-dateutil>=2.7.5
pytz>=2020.1
pyarrow>=1.0.1
chardet>=3.0.4
fastavro>=1.1.0
-python-snappy>=0.5.4
+python-snappy>=0.7.1
charset-normalizer>=1.3.6
psutil>=4.0.0
scipy>=1.10.0
-requests>=2.28.1
+requests==2.32.*
networkx>=2.5.1
typing-extensions>=3.10.0.2
HLL>=2.0.3
datasketches>=4.1.0
packaging>=23.0
boto3>=1.28.61
+# adding comment to trigger mend check
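The pins in `requirements.txt` use three specifier forms: an upper bound (`numpy<2.0.0`), a lower bound (`python-snappy>=0.7.1`), and a wildcard match (`requests==2.32.*`). Below is a rough sketch of how such specifiers select versions, assuming purely numeric dotted versions. The helpers `parse` and `satisfies` are hypothetical, and this is a simplification of PEP 440 rather than the logic pip actually runs:

```python
import operator

# Two-character operators come first so "<=" is not mistaken for "<".
_OPS = {
    "==": operator.eq,
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}


def parse(version: str) -> tuple:
    """Turn '2.32.3' into (2, 32, 3). Numeric segments only (assumption)."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check one simplified specifier such as '<2.0.0' or '==2.32.*'."""
    for symbol, compare in _OPS.items():
        if spec.startswith(symbol):
            target = spec[len(symbol):]
            break
    else:
        raise ValueError(f"unsupported specifier: {spec!r}")

    have = parse(version)
    if symbol == "==" and target.endswith(".*"):
        # Wildcard: only the leading segments have to match.
        prefix = parse(target[:-2])
        return have[: len(prefix)] == prefix

    want = parse(target)
    # Pad with zeros so that '3.4' and '3.4.0' compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return compare(have, want)
```

For real dependency checks, `packaging.specifiers.SpecifierSet` (from the `packaging` project already listed in `requirements.txt`) handles pre-releases, post-releases, and the other cases this sketch ignores.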
2 changes: 1 addition & 1 deletion tox.ini
@@ -1,5 +1,5 @@
[tox]
-envlist = py39, py310, 311, pypi-description, manifest, precom
+envlist = py39, py310, py311, pypi-description, manifest, precom


[testenv]