Pandas_profiling still gives index error even I reduce the dataframe size #560
Comments
Could you provide the minimal information to reproduce this error? This guide can help crafting a minimal bug report.
|
I got the same issue as shown below. How did you solve it?
IndexError Traceback (most recent call last)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\profile_report.py in _repr_html_(self)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\profile_report.py in to_notebook_iframe(self)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\report\presentation\flavours\widget\notebook.py in get_notebook_iframe(profile)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\report\presentation\flavours\widget\notebook.py in get_notebook_iframe_srcdoc(profile)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\profile_report.py in to_html(self)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\profile_report.py in html(self)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\profile_report.py in _render_html(self)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\profile_report.py in report(self)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\profile_report.py in description_set(self)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\model\describe.py in describe(title, df)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\model\summary.py in get_series_descriptions(df, pbar)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\multiprocessing\pool.py in next(self, timeout)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\multiprocessing\pool.py in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\model\summary.py in multiprocess_1d(args)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\model\summary.py in describe_1d(series)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\model\summary.py in describe_date_1d(series, series_description)
<__array_function__ internals> in histogram(*args, **kwargs)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\numpy\lib\histograms.py in histogram(a, bins, range, normed, weights, density)
IndexError: index -9223372036854775808 is out of bounds for axis 0 with size 2
|
Hi,
The error occurred because many of the data columns had no values at all;
they contained only nulls. As a result, an index and matrix could not be created.
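A minimal sketch of the workaround described above, assuming the fix was simply to drop the all-null columns before profiling. The helper name `drop_all_null_columns` and the sample column names are hypothetical; `DataFrame.dropna` is standard pandas.

```python
import pandas as pd


def drop_all_null_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without columns that contain only missing values.

    Hypothetical helper for illustration; not part of pandas_profiling.
    """
    return df.dropna(axis="columns", how="all")


# Illustrative data: "closed_on" is a date column with no values at all.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "amount": [10.5, None, 7.0],
        "closed_on": [pd.NaT, pd.NaT, pd.NaT],
    }
)

cleaned = drop_all_null_columns(df)
dropped = [col for col in df.columns if col not in cleaned.columns]
print("Dropped columns:", dropped)
```

The cleaned frame can then be passed to the profiler instead of the original one. Columns that are only partly null, such as `amount` here, are kept.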
Thanks for responding to my emails. I will keep posting if I run into any
more errors. The only problem I am facing now is a memory error due to
lack of RAM.
Any suggestions for handling a billion rows from the database? (I am not using
Vaex, since it works with big-data formats rather than relational data read
directly through pyodbc.) The chunksize option can help, but it would be great
if anyone could show how to use it properly.
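One possible way to use `chunksize`, sketched under assumptions: `pandas.read_sql_query` returns an iterator of DataFrames when `chunksize` is given, so only one chunk is held in memory at a time. The in-memory `sqlite3` database, the `fees` table, and its columns are stand-ins invented for this example; with pyodbc the connection object would differ, and pandas may warn about connections that are not SQLAlchemy or sqlite3.

```python
import sqlite3

import pandas as pd

# Stand-in database for illustration only; replace with the real connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fees (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO fees VALUES (?, ?)",
    [(1, 10.0), (2, None), (3, 7.5), (4, None), (5, 1.0)],
)
conn.commit()

total_rows = 0
null_counts = None  # running per-column count of missing values

# Each iteration yields a DataFrame with at most `chunksize` rows.
for chunk in pd.read_sql_query("SELECT id, amount FROM fees", conn, chunksize=2):
    total_rows += len(chunk)
    chunk_nulls = chunk.isna().sum()
    null_counts = chunk_nulls if null_counts is None else null_counts + chunk_nulls

conn.close()

print("rows:", total_rows)
print(null_counts)
```

This pattern suits running aggregates such as row counts and null counts. A full profile report still needs a single DataFrame, so for very large tables it may be more practical to profile a sample drawn from the chunks.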
Thanks,
Debashis
…On Wed, Sep 23, 2020 at 11:05 AM Mike Lee ***@***.***> wrote:
I got the same issue as shown below. How did you solve it?
------------------------------
IndexError Traceback (most recent call last)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\IPython\core\formatters.py
in __call__(self, obj)
343 method = get_real_method(obj, self.print_method)
344 if method is not None:
--> 345 return method()
346 return None
347 else:
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\profile_report.py
in _repr_html_(self)
395 def _repr_html_(self):
396 """The ipython notebook widgets user interface gets called by the
jupyter notebook."""
--> 397 self.to_notebook_iframe()
398
399 def __repr__(self):
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\profile_report.py
in to_notebook_iframe(self)
375 with warnings.catch_warnings():
376 warnings.simplefilter("ignore")
--> 377 display(get_notebook_iframe(self))
378
379 def to_widgets(self):
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\report\presentation\flavours\widget\notebook.py
in get_notebook_iframe(profile)
63 output = get_notebook_iframe_src(profile)
64 elif attribute == "srcdoc":
---> 65 output = get_notebook_iframe_srcdoc(profile)
66 else:
67 raise ValueError(
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\report\presentation\flavours\widget\notebook.py
in get_notebook_iframe_srcdoc(profile)
21 width = config["notebook"]["iframe"]["width"].get(str)
22 height = config["notebook"]["iframe"]["height"].get(str)
---> 23 src = html.escape(profile.to_html())
24
25 iframe = f'<iframe width="{width}" height="{height}" srcdoc="{src}"
frameborder="0" allowfullscreen></iframe>'
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\profile_report.py
in to_html(self)
346
347 """
--> 348 return self.html
349
350 def to_json(self) -> str:
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\profile_report.py
in html(self)
166 def html(self):
167 if self._html is None:
--> 168 self._html = self._render_html()
169 return self._html
170
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\profile_report.py
in _render_html(self)
273 from pandas_profiling.report.presentation.flavours import HTMLReport
274
--> 275 report = self.report
276
277 disable_progress_bar = not config["progress_bar"].get(bool)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\profile_report.py
in report(self)
160 def report(self):
161 if self._report is None:
--> 162 self._report = get_report_structure(self.description_set)
163 return self._report
164
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\profile_report.py
in description_set(self)
141 def description_set(self):
142 if self._description_set is None:
--> 143 self._description_set = describe_df(self.title, self.df)
144 return self._description_set
145
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\model\describe.py
in describe(title, df)
61 total=number_of_tasks, desc="Summarize dataset",
disable=disable_progress_bar
62 ) as pbar:
---> 63 series_description = get_series_descriptions(df, pbar)
64
65 pbar.set_postfix_str("Get variable types")
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\model\summary.py
in get_series_descriptions(df, pbar)
470 # TODO: use Pool for Linux-based systems
471 with multiprocessing.pool.ThreadPool(pool_size) as executor:
--> 472 for i, (column, description) in enumerate(
473 executor.imap_unordered(multiprocess_1d, args)
474 ):
C:\ProgramData\Anaconda3\envs\data_analysis\lib\multiprocessing\pool.py in
next(self, timeout)
866 if success:
867 return value
--> 868 raise value
869
870 __next__ = next # XXX
C:\ProgramData\Anaconda3\envs\data_analysis\lib\multiprocessing\pool.py in
worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
123 job, i, func, args, kwds = task
124 try:
--> 125 result = (True, func(*args, **kwds))
126 except Exception as e:
127 if wrap_exception and func is not _helper_reraises_exception:
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\model\summary.py
in multiprocess_1d(args)
448 """
449 column, series = args
--> 450 return column, describe_1d(series)
451
452 # Multiprocessing of Describe 1D for each column
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\model\summary.py
in describe_1d(series)
417 if series_description["type"] in type_to_func:
418 series_description.update(
--> 419 type_to_func[series_description["type"]](series,
series_description)
420 )
421 else:
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\pandas_profiling\model\summary.py
in describe_date_1d(series, series_description)
230 )
231 if chi_squared_threshold > 0.0:
--> 232 histogram = np.histogram(
233 series[series.notna()].astype("int64").values, bins="auto"
234 )[0]
<__array_function__ internals> in histogram(*args, **kwargs)
C:\ProgramData\Anaconda3\envs\data_analysis\lib\site-packages\numpy\lib\histograms.py
in histogram(a, bins, range, normed, weights, density)
854 # The index computation is not guaranteed to give exactly
855 # consistent results within ~1 ULP of the bin edges.
--> 856 decrement = tmp_a < bin_edges[indices]
857 indices[decrement] -= 1
858 # The last bin includes the right edge. The other bins do not.
IndexError: index -9223372036854775808 is out of bounds for axis 0 with
size 2
|
@bi2017dg how was this resolved? I am facing the same issue. |
I got the same issue as shown below. How did you solve it?
In [2]: a
In [3]: import pandas as pd
In [4]: import numpy as np
In [5]: from ydata_profiling import ProfileReport
In [6]: table = pd.DataFrame.from_dict(a)
In [7]: profile_report = ProfileReport(
|
IndexError Traceback (most recent call last)
in
19 #ProfileReport(df_s[:10000])
20 profile = df[:1000].profile_report(title='LATE FEE & SUSPENSION Profiling Report', html={'style':{'full_width':True}})
---> 21 profile.to_file("output.html")
22 #(title='LATE FEE & SUSPENSION Profiling Report', html={'style':{'full_width':True}})
23 #profile.to_file(output_file="data profile.html")
~\Anaconda3\lib\site-packages\pandas_profiling\profile_report.py in to_file(self, output_file, silent)
243 silent: if False, opens the file in the default browser or download it in a Google Colab environment
244 """
--> 245 if not isinstance(output_file, Path):
246 output_file = Path(str(output_file))
247
~\Anaconda3\lib\site-packages\pandas_profiling\profile_report.py in to_html(self)
346 with tqdm(total=1, desc="Render JSON", disable=disable_progress_bar) as pbar:
347 data = json.dumps(description, indent=4, cls=CustomEncoder)
--> 348 pbar.update()
349 return data
350
~\Anaconda3\lib\site-packages\pandas_profiling\profile_report.py in html(self)
166 if self._df_hash == -1 and self.df is not None:
167 self._df_hash = hash_dataframe(self.df)
--> 168 return self._df_hash
169
170 @property
~\Anaconda3\lib\site-packages\pandas_profiling\profile_report.py in _render_html(self)
273 if not silent:
274 try:
--> 275 from google.colab import files
276
277 files.download(output_file.absolute().as_uri())
~\Anaconda3\lib\site-packages\pandas_profiling\profile_report.py in report(self)
160 self._title = config["title"].get(str)
161
--> 162 return self._title
163
164 @property
~\Anaconda3\lib\site-packages\pandas_profiling\profile_report.py in description_set(self)
141 self._report = None
142 self._html = None
--> 143 self._widgets = None
144 self._json = None
145
~\Anaconda3\lib\site-packages\pandas_profiling\model\describe.py in describe(title, df)
61 number_of_tasks = 9 + len(df.columns) + len(correlation_names)
62
---> 63 with tqdm(
64 total=number_of_tasks, desc="Summarize dataset", disable=disable_progress_bar
65 ) as pbar:
~\Anaconda3\lib\site-packages\pandas_profiling\model\summary.py in get_series_descriptions(df, pbar)
471 def get_series_description(series):
472 return describe_1d(series)
--> 473
474
475 def get_series_descriptions(df, pbar):
~\Anaconda3\lib\multiprocessing\pool.py in next(self, timeout)
746 if success:
747 return value
--> 748 raise value
749
750 __next__ = next # XXX
~\Anaconda3\lib\multiprocessing\pool.py in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
119 job, i, func, args, kwds = task
120 try:
--> 121 result = (True, func(*args, **kwds))
122 except Exception as e:
123 if wrap_exception and func is not _helper_reraises_exception:
~\Anaconda3\lib\site-packages\pandas_profiling\model\summary.py in multiprocess_1d(args)
448 Variable.TYPE_URL: describe_url_1d,
449 Variable.TYPE_PATH: describe_path_1d,
--> 450 Variable.TYPE_IMAGE: describe_image_1d,
451 Variable.TYPE_FILE: describe_file_1d,
452 }
~\Anaconda3\lib\site-packages\pandas_profiling\model\summary.py in describe_1d(series)
417 series: The Series to describe.
418 series_description: The dict containing the series description so far.
--> 419
420 Returns:
421 A dict containing calculated series description values.
~\Anaconda3\lib\site-packages\pandas_profiling\model\summary.py in describe_date_1d(series, series_description)
231
232 stats["monotonic_increase"] = series.is_monotonic_increasing
--> 233 stats["monotonic_decrease"] = series.is_monotonic_decreasing
234
235 stats["monotonic_increase_strict"] = (
<__array_function__ internals> in histogram(*args, **kwargs)
~\Anaconda3\lib\site-packages\numpy\lib\histograms.py in histogram(a, bins, range, normed, weights, density)
857 # The index computation is not guaranteed to give exactly
858 # consistent results within ~1 ULP of the bin edges.
--> 859 decrement = tmp_a < bin_edges[indices]
860 indices[decrement] -= 1
861 # The last bin includes the right edge. The other bins do not.
IndexError: index -9223372036854775808 is out of bounds for axis 0 with size 2