histogram puts value into wrong bin #437

Closed
rsie-dev opened this issue Jun 3, 2022 · 4 comments
@rsie-dev

rsie-dev commented Jun 3, 2022

I'm using
Package: pgfplots 2021/05/15 v1.18.1 Data Visualization (1.18.1)

According to the documentation, the bins should be half-open.
The following code creates a histogram with the bins
10-19, 19-28, 28-37, 37-46, 46-55.
If I understand correctly, a value of 28 should be put in the third bin,
but it is put in the second.
A value of 19 is correctly put in the second bin.
From my observation, all endpoints except those of the first and last bin are affected.
This example shows the problem:

\documentclass{article}
\usepackage{pgfplots}
\usepgfplotslibrary{statistics}
\pgfplotsset{compat=1.18}

\begin{document}
\begin{tikzpicture}
	\begin{axis}[
		ybar interval,
		xticklabel=\pgfmathprintnumber\tick--\pgfmathprintnumber\nexttick
		]
		\addplot+ [hist={bins=5, data min=10, data max=55}]
		table [row sep=\\,y index=0] {
			data\\
			28\\
		};
	\end{axis}
\end{tikzpicture}
\end{document}

[Image: screenshot of the resulting histogram]

Am I missing something, or is this a bug in pgfplots?

Any help would be appreciated.

Regards,
Ralf

@muzimuzhi
Member

pgfplots computes which interval a value (denoted by i below) belongs to in two steps:

  1. computes invh = 1 / ( (max - min) / bin), in \pgfplotsplothandlersurveyend@hist@
  2. computes nth_bin = floor((i - min) * invh + 1e-4), in \pgfplotsplothandlerhistgetbinfor@. Here the constant 1e-4 is stored in \pgfplotsplothandlerhisttol@parsed and seems to be a tolerance.

With bins=5, data min=10, data max=55 (bin_width = 9) and the value 28, the first step yields invh = 0.1111 and the second step yields nth_bin = floor(1.9999) = 1, whereas the expected result is 2.
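The arithmetic can be written out as a worked equation. This is only a sketch: it assumes invh is held as 0.1111 (four significant digits), as reported in this comment, that bin indices are zero-based, and that amsmath is loaded for `align*` and `\text`. The last line shows what an exact 1/9 would give, for comparison.

```latex
% Worked arithmetic for the value 28 with bins=5, data min=10, data max=55.
% Assumption: invh is held as 0.1111, as reported in the comment.
% Requires \usepackage{amsmath}.
\begin{align*}
  \mathit{invh} &= \frac{1}{(55 - 10)/5} = \frac{1}{9} \approx 0.1111 \\
  n_{\text{bin}} &= \lfloor (28 - 10) \cdot 0.1111 + 10^{-4} \rfloor
                  = \lfloor 1.9998 + 0.0001 \rfloor
                  = \lfloor 1.9999 \rfloor = 1 \\
  n_{\text{bin}}^{\text{exact}} &= \lfloor (28 - 10) \cdot \tfrac{1}{9} + 10^{-4} \rfloor
                  = \lfloor 2.0001 \rfloor = 2
\end{align*}
```

Index 1 is the second bin (19-28) and index 2 is the third bin (28-37), which matches the misplacement described in the report.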

As a workaround specifically for your example, using \pgfplotsplothandlerhistsettol{<a small sci num>} (inside tikzpicture but before the axis environment) to increase the tolerance seems to work. This tolerance is used only for histogram plots, so increasing it is probably not very dangerous. For data\\ 19\\, 2e-4 suffices, but for all the middle endpoints to work, data\\ 19\\ 28\\ 37\\ 46\\, you need 4e-4. I guess that with some special settings you'll eventually need 1e-3.
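Applied to the original example, the tolerance workaround might look like the following. This is a sketch rather than a verified fix: the value 4e-4 and the placement inside tikzpicture come from this comment, and other bin settings may need a different tolerance.

```latex
\documentclass{article}
\usepackage{pgfplots}
\usepgfplotslibrary{statistics}
\pgfplotsset{compat=1.18}

\begin{document}
\begin{tikzpicture}
  % Raise the histogram tolerance from 1e-4 to 4e-4, the value reported
  % in the comment for the endpoints 19, 28, 37 and 46.
  % Must come inside tikzpicture but before the axis environment.
  \pgfplotsplothandlerhistsettol{4e-4}
  \begin{axis}[
    ybar interval,
    xticklabel=\pgfmathprintnumber\tick--\pgfmathprintnumber\nexttick
    ]
    \addplot+ [hist={bins=5, data min=10, data max=55}]
    table [row sep=\\,y index=0] {
      data\\
      19\\
      28\\
      37\\
      46\\
    };
  \end{axis}
\end{tikzpicture}
\end{document}
```

Because the setting is made inside the tikzpicture, TeX grouping should keep it local to that picture.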


Digging deeper, your example can be reduced to an fpu example:

\documentclass{article}
\usepackage{pgf}
\usepgflibrary{fpu}
\begin{document}

\makeatletter
\pgfmathfloatdivide@{1Y1.0e0]}{1Y9.0e0]} \pgfmathresult\par % = 1Y1.111e-1
\pgfmathfloatdivide@{1Y2.0e0]}{1Y1.8e1]} \pgfmathresult\par % = 1Y1.1111e-1, more accurate
\makeatother
\end{document}

The fpu math function divide (\pgfmathfloatdivide@) computes the mantissa part of its result in two steps:

  1. extends precision of its operands by 1 digit (e.g., 1Y1.0e0 -> 1Y10.000e-1, 1Y9.0e0 -> 1Y90.00e-1),
  2. computes the division of mantissas by basic (non-fpu) divide function (e.g., \pgfmath@basic@divide@{10.000}{90.00}).

It seems the accuracy of either the fpu or the basic math function divide should be improved. Also, for the first step (computing invh), I don't know which would be more accurate: invh = 1 / ( (max - min) / bin) or invh = bin / (max - min). The answer may depend on the number of bins/intervals.

@muzimuzhi
Member

Another workaround: redefine \pgfmath@basic@divide@ to use the LaTeX3 function \fp_eval:n, which is fully expandable and far more accurate (as long as you're using the LaTeX format).

\makeatletter
\def\pgfmath@basic@divide@#1#2{\edef\pgfmathresult{\csname fp_eval:n\endcsname{#1/#2}}}
\makeatother
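For context, here is one way the redefinition could be placed in the original example. This is a sketch under two assumptions: the LaTeX kernel is recent enough to preload expl3 (so that \fp_eval:n exists without loading extra packages), and the redefinition comes after pgfplots has been loaded so that it is not overwritten.

```latex
\documentclass{article}
\usepackage{pgfplots}
\usepgfplotslibrary{statistics}
\pgfplotsset{compat=1.18}

% Replace pgf's basic division with the LaTeX3 floating point engine.
% Assumption: the kernel preloads expl3, so \fp_eval:n is available.
\makeatletter
\def\pgfmath@basic@divide@#1#2{%
  \edef\pgfmathresult{\csname fp_eval:n\endcsname{#1/#2}}%
}
\makeatother

\begin{document}
\begin{tikzpicture}
  \begin{axis}[
    ybar interval,
    xticklabel=\pgfmathprintnumber\tick--\pgfmathprintnumber\nexttick
    ]
    \addplot+ [hist={bins=5, data min=10, data max=55}]
    table [row sep=\\,y index=0] {
      data\\
      28\\
    };
  \end{axis}
\end{tikzpicture}
\end{document}
```

Note that the redefinition is global to the document, so it affects every pgfmath division that goes through the basic divide function, not only histograms.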

@rsie-dev
Author

rsie-dev commented Jun 4, 2022

Hi muzimuzhi,

Using your \fp_eval:n workaround for divisions worked for me!

Thank you so much!!!

regards,
Ralf

@rsie-dev rsie-dev closed this as completed Jun 4, 2022
@hmenke
Member

hmenke commented Jun 8, 2022

Duplicate of pgf-tikz/pgf#1148

@hmenke hmenke marked this as a duplicate of pgf-tikz/pgf#1148 Jun 8, 2022
@hmenke hmenke added the duplicate This issue or pull request already exists label Jun 8, 2022