Ensure all primitives use textNumberPattern and infinfity/NaN correctly

Currently, if the primitive type is an integer then text number parsing disallows parsing decimal points, even if the pattern contains a decimal point. Instead, when parsing integers, we should allow decimals as long the fractional part is zero. And when unparsing, we should unparse a decimal point with a zero fractional part according to the pattern. This changes the behavior so integer parsing uses the same DecimalFormat configuration as non-integer parsing (i.e. decimals are allowed), but we throw a parse error if the fractional part that was parsed is non-zero. This also means that unparsing integers now outputs decimal points according to the pattern. Additionally, if textNumberCheckPolicy is strict, we enable ICU setDecimalPatternMatchRequired to true so that we allow or disallow decimal points in the data depending on if the pattern does or does not have a decimal point. Note that lax parsing always allows decimal points regardless of the pattern. For this reason, we now always require the grouping/decimal separator DFDL properties in lax mode. One bug was discovered in ICU (ICU-22303) where if we require the decimal point due to strict mode enabled, then ICU never parses the infinity/NaN representation. A workaround is added to manually check for these representations until this bug is fixed. ICU unit tests are also added which should fail if ICU fixes this bug so we can remove this workaround. Make sure we always specify infinity and NaN representations from the DFDL schema for all primitives, not just for xs:double/xs:float. There is no way to disable infinity/NaN ICU parsing, so when if we do not specify these values ICU just uses the locale values, which could lead to unwanted locale specific behavior. Related, this modifies NodeInfo types so that fromNumber fails for types that do not support infinity/NaN (i.e. everything except Double/Float) and creates a parse error. Modifies virtual decimal logic to ensure we handle cases for numbers that do not fit in a Long (should work) or contain decimal points (should be a parse error). Tests are updated so if they want to differentiate between int and decimal depending on if a decimal exists in the data, then they must specify a pattern with or without a decimal and enable strict mode--lax mode allows a decimal regardless of type so cannot differentiate the types. DAFFODIL-2158
apache · Mar 27, 2023 · f8a7867 · f8a7867
1 parent 6590127
commit f8a7867
Show file tree

Hide file tree

Showing 15 changed files with 436 additions and 123 deletions.
diff --git a/...ore/src/main/scala/org/apache/daffodil/core/grammar/primitives/PrimitivesTextNumber.scala b/...ore/src/main/scala/org/apache/daffodil/core/grammar/primitives/PrimitivesTextNumber.scala
@@ -23,6 +23,7 @@ import org.apache.daffodil.core.grammar.Terminal
 import org.apache.daffodil.lib.api.WarnID
 import org.apache.daffodil.lib.cookers.EntityReplacer
 import org.apache.daffodil.lib.exceptions.Assert
+import org.apache.daffodil.lib.schema.annotation.props.gen.TextNumberCheckPolicy
 import org.apache.daffodil.lib.schema.annotation.props.gen.TextNumberRounding
 import org.apache.daffodil.lib.util.Maybe
 import org.apache.daffodil.lib.util.Maybe._
@@ -559,24 +560,17 @@ case class ConvertTextStandardNumberPrim(e: ElementBase)
         case TextNumberRounding.Pattern => (MaybeDouble.Nope, Nope)
       }
 
-    val (infRep, nanRep) = e.primType match {
-      case PrimType.Double | PrimType.Float =>
-        (One(e.textStandardInfinityRep), One(e.textStandardNaNRep))
-      case _ => (Nope, Nope)
-    }
-
-    // If the pattern contains any of these characters, we need to set both
-    // group and decimal separators, even if the pattern doesn't contain the
-    // associated character. This is because even when the pattern does not
-    // contain the grouping/decimal separators, ICU stills seems to take the
-    // separators into account. And since ICU provides default values based on
-    // locales, not setting them can cause subtle locale related bugs. We must
-    // also require the separators if the prim type is not an integer type,
-    // since ICU will use them even if the pattern does not specify them.
+    // If the pattern contains any of these characters, we need to set both group and decimal
+    // separators, even if the pattern doesn't contain the associated character. This is because
+    // even when the pattern does not contain the grouping/decimal separators, ICU stills seems
+    // to take the separators into account. And since ICU provides default values based on
+    // locales, not setting them can cause subtle locale related bugs. We also must required
+    // separators when parsing is lax, which parses decimals and grouping separators regardless
+    // of the pattern.
     val requireDecGroupSeps =
       patternWithoutEscapedChars.contains(",") || patternWithoutEscapedChars.contains(".") ||
         patternWithoutEscapedChars.contains("E") || patternWithoutEscapedChars.contains("@") ||
-        !primNumeric.isInteger
+        (e.textNumberCheckPolicy eq TextNumberCheckPolicy.Lax)
 
     val decSep =
       if (requireDecGroupSeps) {
@@ -597,15 +591,14 @@ case class ConvertTextStandardNumberPrim(e: ElementBase)
       decSep,
       groupSep,
       One(e.textStandardExponentRepEv),
-      infRep,
-      nanRep,
+      One(e.textStandardInfinityRep),
+      One(e.textStandardNaNRep),
       e.textNumberCheckPolicy,
       runtimePattern,
       e.textNumberRounding,
       roundingMode,
       roundingIncrement,
       zeroRepsRaw,
-      primNumeric.isInteger,
       e.primType,
     )
     ev.compile(tunable)

diff --git a/...dil-core/src/main/scala/org/apache/daffodil/core/grammar/primitives/PrimitivesZoned.scala b/...dil-core/src/main/scala/org/apache/daffodil/core/grammar/primitives/PrimitivesZoned.scala
@@ -179,7 +179,6 @@ case class ConvertZonedNumberPrim(e: ElementBase)
       roundingMode,
       roundingIncrement,
       Nil,
-      isInt = true,
       e.primType,
     )
     ev.compile(tunable)

diff --git a/...main/scala/org/apache/daffodil/unparsers/runtime1/ConvertTextStandardNumberUnparser.scala b/...main/scala/org/apache/daffodil/unparsers/runtime1/ConvertTextStandardNumberUnparser.scala
@@ -59,7 +59,7 @@ case class ConvertTextNumberUnparser(
   override def unparse(state: UState): Unit = {
 
     val node = state.currentInfosetNode.asSimple
-    val value = node.dataValue.getAnyRef
+    val value = node.dataValue.getNumber
 
     // The type of value should have the type of S, but type erasure makes this
     // difficult to assert. Could probably check this with TypeTags or Manifest

diff --git a/...er/src/main/scala/org/apache/daffodil/unparsers/runtime1/ConvertZonedNumberUnparser.scala b/...er/src/main/scala/org/apache/daffodil/unparsers/runtime1/ConvertZonedNumberUnparser.scala
@@ -64,9 +64,9 @@ case class ConvertZonedNumberUnparser(
     // if we find this is not the case. Want something akin to:
     // Assert.invariant(value.isInstanceOf[S])
 
-    val valueAsAnyRef = node.dataValue.getAnyRef
+    val number = node.dataValue.getNumber
 
-    val scaledNum = this.applyTextDecimalVirtualPointForUnparse(valueAsAnyRef)
+    val scaledNum = this.applyTextDecimalVirtualPointForUnparse(number)
 
     val value = scaledNum.toString
 

diff --git a/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/dpath/NodeInfo.scala b/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/dpath/NodeInfo.scala
@@ -704,7 +704,13 @@ object NodeInfo extends Enum {
       type Kind = DecimalKind
       protected override def fromString(s: String): DataValueBigDecimal = new JBigDecimal(s)
       protected override def fromNumberNoCheck(n: Number): DataValueBigDecimal = asBigDecimal(n)
-      override def isValid(n: Number): Boolean = true
+      override def isValid(n: Number): Boolean = {
+        n match {
+          case d: JDouble if d.isInfinite || d.isNaN => false
+          case f: JFloat if f.isInfinite || f.isNaN => false
+          case _ => true
+        }
+      }
 
       override val width: MaybeInt = MaybeInt.Nope
 
@@ -720,7 +726,13 @@ object NodeInfo extends Enum {
       type Kind = IntegerKind
       protected override def fromString(s: String): DataValueBigInt = new JBigInt(s)
       protected override def fromNumberNoCheck(n: Number): DataValueBigInt = asBigInt(n)
-      override def isValid(n: Number): Boolean = true
+      override def isValid(n: Number): Boolean = {
+        n match {
+          case d: JDouble if d.isInfinite || d.isNaN => false
+          case f: JFloat if f.isInfinite || f.isNaN => false
+          case _ => true
+        }
+      }
 
       override val width: MaybeInt = MaybeInt.Nope
 
@@ -795,6 +807,8 @@ object NodeInfo extends Enum {
       def isValid(n: Number): Boolean = n match {
         case bd: JBigDecimal => bd.signum >= 0
         case bi: JBigInt => bi.signum >= 0
+        case d: JDouble if d.isInfinite || d.isNaN => false
+        case f: JFloat if f.isInfinite || f.isNaN => false
         case _ => n.longValue >= 0
       }
 
@@ -815,6 +829,8 @@ object NodeInfo extends Enum {
       def isValid(n: Number): Boolean = n match {
         case bd: JBigDecimal => bd.signum >= 0 && bd.compareTo(maxBD) <= 0
         case bi: JBigInt => bi.signum >= 0 && bi.compareTo(max) <= 0
+        case d: JDouble if d.isInfinite || d.isNaN => false
+        case f: JFloat if f.isInfinite || f.isNaN => false
         case _ => n.longValue >= 0
       }
       val max = new JBigInt(1, scala.Array.fill(8)(0xff.toByte))

diff --git a/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/EvTextNumber.scala b/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/EvTextNumber.scala
@@ -81,7 +81,6 @@ class TextNumberFormatEv(
   roundingMode: Maybe[TextNumberRoundingMode],
   roundingIncrement: MaybeDouble,
   zeroRepsRaw: List[String],
-  isInt: Boolean,
   primType: PrimType,
 ) extends Evaluatable[DecimalFormat](tci)
   with InfosetCachedEvaluatable[DecimalFormat] {
@@ -161,6 +160,11 @@ class TextNumberFormatEv(
       case TextNumberCheckPolicy.Lax => false
     }
     df.setParseStrict(cp)
+    // if strict mode is enabled, we also enable setDecimalPatternMatchRequired. This says that
+    // if a decimal point is in the pattern, then the data must contain a decimal point. It also
+    // says the reverse, that if a decimal point is not in the pattern, then the data cannot
+    // contain a decimal point.
+    df.setDecimalPatternMatchRequired(cp)
 
     rounding match {
       case TextNumberRounding.Pattern => {
@@ -182,13 +186,6 @@ class TextNumberFormatEv(
       }
     }
 
-    if (isInt) {
-      df.setMaximumFractionDigits(0)
-      df.setDecimalSeparatorAlwaysShown(false)
-      df.setParseIntegerOnly(true)
-      df.setParseBigDecimal(false)
-    }
-
     df
   }