Skip to content

Commit

Permalink
Ensure all primitives use textNumberPattern and infinfity/NaN correctly
Browse files Browse the repository at this point in the history
Currently, if the primitive type is an integer then text number parsing
disallows parsing decimal points, even if the pattern contains a decimal
point. Instead, when parsing integers, we should allow decimals as long
the fractional part is zero. And when unparsing, we should unparse a
decimal point with a zero fractional part according to the pattern.

This changes the behavior so integer parsing uses the same DecimalFormat
configuration as non-integer parsing (i.e. decimals are allowed), but we
throw a parse error if the fractional part that was parsed is non-zero.
This also means that unparsing integers now outputs decimal points
according to the pattern.

Additionally, if textNumberCheckPolicy is strict, we enable ICU
setDecimalPatternMatchRequired to true so that we allow or disallow
decimal points in the data depending on if the pattern does or does not
have a decimal point. Note that lax parsing always allows decimal points
regardless of the pattern. For this reason, we now always require the
grouping/decimal separator DFDL properties in lax mode.

One bug was discovered in ICU (ICU-22303) where if we require the
decimal point due to strict mode enabled, then ICU never parses the
infinity/NaN representation. A workaround is added to manually check for
these representations until this bug is fixed. ICU unit tests are also
added which should fail if ICU fixes this bug so we can remove this
workaround.

Make sure we always specify infinity and NaN representations from the
DFDL schema for all primitives, not just for xs:double/xs:float. There
is no way to disable infinity/NaN ICU parsing, so when if we do not
specify these values ICU just uses the locale values, which could lead
to unwanted locale specific behavior. Related, this modifies NodeInfo
types so that fromNumber fails for types that do not support
infinity/NaN (i.e. everything except Double/Float) and creates a parse
error.

Modifies virtual decimal logic to ensure we handle cases for numbers
that do not fit in a Long (should work) or contain decimal points
(should be a parse error).

Tests are updated so if they want to differentiate between int and
decimal depending on if a decimal exists in the data, then they must
specify a pattern with or without a decimal and enable strict mode--lax
mode allows a decimal regardless of type so cannot differentiate the
types.

DAFFODIL-2158
  • Loading branch information
stevedlawrence committed Mar 27, 2023
1 parent 6590127 commit f8a7867
Show file tree
Hide file tree
Showing 15 changed files with 436 additions and 123 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ import org.apache.daffodil.core.grammar.Terminal
import org.apache.daffodil.lib.api.WarnID
import org.apache.daffodil.lib.cookers.EntityReplacer
import org.apache.daffodil.lib.exceptions.Assert
import org.apache.daffodil.lib.schema.annotation.props.gen.TextNumberCheckPolicy
import org.apache.daffodil.lib.schema.annotation.props.gen.TextNumberRounding
import org.apache.daffodil.lib.util.Maybe
import org.apache.daffodil.lib.util.Maybe._
Expand Down Expand Up @@ -559,24 +560,17 @@ case class ConvertTextStandardNumberPrim(e: ElementBase)
case TextNumberRounding.Pattern => (MaybeDouble.Nope, Nope)
}

val (infRep, nanRep) = e.primType match {
case PrimType.Double | PrimType.Float =>
(One(e.textStandardInfinityRep), One(e.textStandardNaNRep))
case _ => (Nope, Nope)
}

// If the pattern contains any of these characters, we need to set both
// group and decimal separators, even if the pattern doesn't contain the
// associated character. This is because even when the pattern does not
// contain the grouping/decimal separators, ICU stills seems to take the
// separators into account. And since ICU provides default values based on
// locales, not setting them can cause subtle locale related bugs. We must
// also require the separators if the prim type is not an integer type,
// since ICU will use them even if the pattern does not specify them.
// If the pattern contains any of these characters, we need to set both group and decimal
// separators, even if the pattern doesn't contain the associated character. This is because
// even when the pattern does not contain the grouping/decimal separators, ICU stills seems
// to take the separators into account. And since ICU provides default values based on
// locales, not setting them can cause subtle locale related bugs. We also must required
// separators when parsing is lax, which parses decimals and grouping separators regardless
// of the pattern.
val requireDecGroupSeps =
patternWithoutEscapedChars.contains(",") || patternWithoutEscapedChars.contains(".") ||
patternWithoutEscapedChars.contains("E") || patternWithoutEscapedChars.contains("@") ||
!primNumeric.isInteger
(e.textNumberCheckPolicy eq TextNumberCheckPolicy.Lax)

val decSep =
if (requireDecGroupSeps) {
Expand All @@ -597,15 +591,14 @@ case class ConvertTextStandardNumberPrim(e: ElementBase)
decSep,
groupSep,
One(e.textStandardExponentRepEv),
infRep,
nanRep,
One(e.textStandardInfinityRep),
One(e.textStandardNaNRep),
e.textNumberCheckPolicy,
runtimePattern,
e.textNumberRounding,
roundingMode,
roundingIncrement,
zeroRepsRaw,
primNumeric.isInteger,
e.primType,
)
ev.compile(tunable)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -179,7 +179,6 @@ case class ConvertZonedNumberPrim(e: ElementBase)
roundingMode,
roundingIncrement,
Nil,
isInt = true,
e.primType,
)
ev.compile(tunable)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ case class ConvertTextNumberUnparser(
override def unparse(state: UState): Unit = {

val node = state.currentInfosetNode.asSimple
val value = node.dataValue.getAnyRef
val value = node.dataValue.getNumber

// The type of value should have the type of S, but type erasure makes this
// difficult to assert. Could probably check this with TypeTags or Manifest
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -64,9 +64,9 @@ case class ConvertZonedNumberUnparser(
// if we find this is not the case. Want something akin to:
// Assert.invariant(value.isInstanceOf[S])

val valueAsAnyRef = node.dataValue.getAnyRef
val number = node.dataValue.getNumber

val scaledNum = this.applyTextDecimalVirtualPointForUnparse(valueAsAnyRef)
val scaledNum = this.applyTextDecimalVirtualPointForUnparse(number)

val value = scaledNum.toString

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -704,7 +704,13 @@ object NodeInfo extends Enum {
type Kind = DecimalKind
protected override def fromString(s: String): DataValueBigDecimal = new JBigDecimal(s)
protected override def fromNumberNoCheck(n: Number): DataValueBigDecimal = asBigDecimal(n)
override def isValid(n: Number): Boolean = true
override def isValid(n: Number): Boolean = {
n match {
case d: JDouble if d.isInfinite || d.isNaN => false
case f: JFloat if f.isInfinite || f.isNaN => false
case _ => true
}
}

override val width: MaybeInt = MaybeInt.Nope

Expand All @@ -720,7 +726,13 @@ object NodeInfo extends Enum {
type Kind = IntegerKind
protected override def fromString(s: String): DataValueBigInt = new JBigInt(s)
protected override def fromNumberNoCheck(n: Number): DataValueBigInt = asBigInt(n)
override def isValid(n: Number): Boolean = true
override def isValid(n: Number): Boolean = {
n match {
case d: JDouble if d.isInfinite || d.isNaN => false
case f: JFloat if f.isInfinite || f.isNaN => false
case _ => true
}
}

override val width: MaybeInt = MaybeInt.Nope

Expand Down Expand Up @@ -795,6 +807,8 @@ object NodeInfo extends Enum {
def isValid(n: Number): Boolean = n match {
case bd: JBigDecimal => bd.signum >= 0
case bi: JBigInt => bi.signum >= 0
case d: JDouble if d.isInfinite || d.isNaN => false
case f: JFloat if f.isInfinite || f.isNaN => false
case _ => n.longValue >= 0
}

Expand All @@ -815,6 +829,8 @@ object NodeInfo extends Enum {
def isValid(n: Number): Boolean = n match {
case bd: JBigDecimal => bd.signum >= 0 && bd.compareTo(maxBD) <= 0
case bi: JBigInt => bi.signum >= 0 && bi.compareTo(max) <= 0
case d: JDouble if d.isInfinite || d.isNaN => false
case f: JFloat if f.isInfinite || f.isNaN => false
case _ => n.longValue >= 0
}
val max = new JBigInt(1, scala.Array.fill(8)(0xff.toByte))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -81,7 +81,6 @@ class TextNumberFormatEv(
roundingMode: Maybe[TextNumberRoundingMode],
roundingIncrement: MaybeDouble,
zeroRepsRaw: List[String],
isInt: Boolean,
primType: PrimType,
) extends Evaluatable[DecimalFormat](tci)
with InfosetCachedEvaluatable[DecimalFormat] {
Expand Down Expand Up @@ -161,6 +160,11 @@ class TextNumberFormatEv(
case TextNumberCheckPolicy.Lax => false
}
df.setParseStrict(cp)
// if strict mode is enabled, we also enable setDecimalPatternMatchRequired. This says that
// if a decimal point is in the pattern, then the data must contain a decimal point. It also
// says the reverse, that if a decimal point is not in the pattern, then the data cannot
// contain a decimal point.
df.setDecimalPatternMatchRequired(cp)

rounding match {
case TextNumberRounding.Pattern => {
Expand All @@ -182,13 +186,6 @@ class TextNumberFormatEv(
}
}

if (isInt) {
df.setMaximumFractionDigits(0)
df.setDecimalSeparatorAlwaysShown(false)
df.setParseIntegerOnly(true)
df.setParseBigDecimal(false)
}

df
}

Expand Down
Loading

0 comments on commit f8a7867

Please sign in to comment.