SQL: Fix FORMAT function to better comply with Microsoft SQL Server specification (elastic#86225)
luigidellaquila authored May 18, 2022
1 parent fd99a50 commit f69c739
Showing 8 changed files with 303 additions and 41 deletions.
3 changes: 3 additions & 0 deletions .editorconfig
@@ -225,3 +225,6 @@ indent_size = 2

[*.{xsd,xml}]
indent_size = 4

[*.{csv,sql}-spec]
trim_trailing_whitespace = false
6 changes: 6 additions & 0 deletions docs/changelog/86225.yaml
@@ -0,0 +1,6 @@
pr: 86225
summary: Fix FORMAT function to comply with Microsoft SQL Server specification
area: SQL
type: bug
issues:
- 66560
10 changes: 5 additions & 5 deletions docs/reference/sql/functions/date-time.asciidoc
@@ -454,7 +454,7 @@ include-tagged::{sql-specs}/docs/docs.csv-spec[dateParse1]

[NOTE]
====
-The resulting `date` will have the time zone specified by the user through the 
+The resulting `date` will have the time zone specified by the user through the
<<sql-search-api-time-zone,`time_zone`>>/<<jdbc-cfg-timezone,`timezone`>> REST/driver parameters
with no conversion applied.
@@ -810,7 +810,7 @@ SQL Server Format Specification].

[NOTE]
If the 1st argument is of type `time`, then pattern specified by the 2nd argument cannot contain date related units
-(e.g. 'dd', 'MM', 'YYYY', etc.). If it contains such units an error is returned. +
+(e.g. 'dd', 'MM', 'yyyy', etc.). If it contains such units an error is returned. +
Format specifier `F` will be working similar to format specifier `f`.
It will return the fractional part of seconds, and the number of digits will be same as of the number of `Fs` provided as input (up to 9 digits).
Result will contain `0` appended in the end to match with number of `F` provided.
@@ -862,9 +862,9 @@ Patterns for Date/Time Formatting].
If the 1st argument is of type `time`, then the pattern specified by the 2nd argument cannot contain date related units
(e.g. 'dd', 'MM', 'YYYY', etc.). If it contains such units an error is returned. +
The result of the patterns `TZ` and `tz` (time zone abbreviations) in some cases differ from the results returned by the `TO_CHAR`
-in PostgreSQL. The reason is that the time zone abbreviations specified by the JDK are different from the ones specified by PostgreSQL. 
-This function might show an actual time zone abbreviation instead of the generic `LMT` or empty string or offset returned by the PostgreSQL 
-implementation. The summer/daylight markers might also differ between the two implementations (e.g. will show `HT` instead of `HST` 
+in PostgreSQL. The reason is that the time zone abbreviations specified by the JDK are different from the ones specified by PostgreSQL.
+This function might show an actual time zone abbreviation instead of the generic `LMT` or empty string or offset returned by the PostgreSQL
+implementation. The summer/daylight markers might also differ between the two implementations (e.g. will show `HT` instead of `HST`
for Hawaii). +
The `FX`, `TM`, `SP` pattern modifiers are not supported and will show up as `FX`, `TM`, `SP` literals in the output.

43 changes: 43 additions & 0 deletions x-pack/plugin/sql/qa/server/src/main/resources/date.csv-spec
@@ -404,3 +404,46 @@ SELECT emp_no FROM test_emp WHERE DATE_ADD('day', 1, hire_date) = '2021-02-03||-
10044
10085
;



// format

formatNormalPattern
SELECT FORMAT(birth_date, 'dd/MM/yyyy') as x FROM test_emp ORDER BY emp_no LIMIT 1;
x
----------
02/09/1953
;

formatWithDoubleQuoteEscaping
SELECT FORMAT(birth_date, '"yyyy" yyyy') as x FROM test_emp ORDER BY emp_no LIMIT 1;

x
------
yyyy 1953
;

formatSingleQuote
SELECT FORMAT(birth_date, '"''" yyyy') as x FROM test_emp ORDER BY emp_no LIMIT 1;

x
------
' 1953
;

formatQuotesAndAllowedCharacters
SELECT FORMAT(birth_date, 'abc ''yyy'' yyyy') as x FROM test_emp ORDER BY emp_no LIMIT 1;

x
------
abc yyy 1953
;

formatQuotesComplexString
SELECT FORMAT(birth_date, '\t\hi\s i\s \t\h\e \y\ear yyyy an\d \t\h\e \mon\t\h MM') as x FROM test_emp ORDER BY emp_no LIMIT 1;

x
------------------------------------
this is the year 1953 and the month 09
;
12 changes: 6 additions & 6 deletions x-pack/plugin/sql/qa/server/src/main/resources/datetime.csv-spec
@@ -1173,8 +1173,8 @@ M | 1996-11-05 00:00:00.000Z

selectFormat
schema::format_date:s|format_datetime:s|format_time:s
-SELECT FORMAT('2020-04-05T11:22:33.123Z'::date, 'dd/MM/YYYY HH:mm:ss.fff') AS format_date,
-FORMAT('2020-04-05T11:22:33.123Z'::datetime, 'dd/MM/YYYY HH:mm:ss.ff') AS format_datetime,
+SELECT FORMAT('2020-04-05T11:22:33.123Z'::date, 'dd/MM/yyyy HH:mm:ss.fff') AS format_date,
+FORMAT('2020-04-05T11:22:33.123Z'::datetime, 'dd/MM/yyyy HH:mm:ss.ff') AS format_datetime,
FORMAT('11:22:33.123456789Z'::time, 'HH:mm:ss.ff') AS format_time;

format_date | format_datetime | format_time
@@ -1184,8 +1184,8 @@ FORMAT('11:22:33.123456789Z'::time, 'HH:mm:ss.ff') AS format_time;

selectFormatWithLength
schema::format_datetime:s|length:i
-SELECT FORMAT('2020-04-05T11:22:33.123Z'::datetime, 'dd/MM/YYYY HH:mm:ss.ff') AS format_datetime,
-LENGTH(FORMAT('2020-04-05T11:22:33.123Z'::datetime, 'dd/MM/YYYY HH:mm:ss.ff')) AS length;
+SELECT FORMAT('2020-04-05T11:22:33.123Z'::datetime, 'dd/MM/yyyy HH:mm:ss.ff') AS format_datetime,
+LENGTH(FORMAT('2020-04-05T11:22:33.123Z'::datetime, 'dd/MM/yyyy HH:mm:ss.ff')) AS length;

format_datetime | length
------------------------+----------------
@@ -1194,7 +1194,7 @@ LENGTH(FORMAT('2020-04-05T11:22:33.123Z'::datetime, 'dd/MM/YYYY HH:mm:ss.ff')) A

selectFormatWithField
schema::birth_date:ts|format_birth_date1:s|format_birth_date2:s|emp_no:i
-SELECT birth_date, FORMAT(birth_date, 'MM/dd/YYYY') AS format_birth_date1, FORMAT(birth_date, concat(gender, 'M/dd')) AS format_birth_date2, emp_no
+SELECT birth_date, FORMAT(birth_date, 'MM/dd/yyyy') AS format_birth_date1, FORMAT(birth_date, concat(gender, 'M/dd')) AS format_birth_date2, emp_no
FROM test_emp WHERE gender = 'M' AND emp_no BETWEEN 10037 AND 10052 ORDER BY emp_no;

birth_date | format_birth_date1 | format_birth_date2 | emp_no
@@ -1233,7 +1233,7 @@ WHERE FORMAT(birth_date, 'MM')::integer > 10 ORDER BY emp_no LIMIT 10;

formatOrderBy
schema::birth_date:ts|format_birth_date:s
-SELECT birth_date, FORMAT(birth_date, 'MM/dd/YYYY') AS format_birth_date FROM test_emp ORDER BY 2 DESC NULLS LAST LIMIT 10;
+SELECT birth_date, FORMAT(birth_date, 'MM/dd/yyyy') AS format_birth_date FROM test_emp ORDER BY 2 DESC NULLS LAST LIMIT 10;

birth_date | format_birth_date
-------------------------+---------------
@@ -3190,7 +3190,7 @@ SELECT DATE_TRUNC('days', INTERVAL '19 15:24:19' DAY TO SECONDS) AS day;

formatDate
// tag::formatDate
-SELECT FORMAT(CAST('2020-04-05' AS DATE), 'dd/MM/YYYY') AS "date";
+SELECT FORMAT(CAST('2020-04-05' AS DATE), 'dd/MM/yyyy') AS "date";

date
------------------
@@ -3200,7 +3200,7 @@ SELECT FORMAT(CAST('2020-04-05' AS DATE), 'dd/MM/YYYY') AS "date";

formatDateTime
// tag::formatDateTime
-SELECT FORMAT(CAST('2020-04-05T11:22:33.987654' AS DATETIME), 'dd/MM/YYYY HH:mm:ss.ff') AS "datetime";
+SELECT FORMAT(CAST('2020-04-05T11:22:33.987654' AS DATETIME), 'dd/MM/yyyy HH:mm:ss.ff') AS "datetime";

datetime
------------------
@@ -20,14 +20,50 @@
 import java.time.temporal.TemporalAccessor;
 import java.util.Locale;
 import java.util.Objects;
+import java.util.Set;
 import java.util.function.Function;

 import static org.elasticsearch.xpack.sql.util.DateUtils.asTimeAtZone;

 public class DateTimeFormatProcessor extends BinaryDateTimeProcessor {

     public static final String NAME = "dtformat";
-    private static final String[][] JAVA_TIME_FORMAT_REPLACEMENTS = {
+
+    /**
+     * these characters have a meaning in MS date patterns.
+     * If a character is not in this set, then it's still allowed in MS FORMAT patterns
+     * but not in Java, so it has to be translated or quoted
+     */
+    private static final Set<Character> MS_DATETIME_PATTERN_CHARS = Set.of(
+        'd',
+        'f',
+        'F',
+        'g',
+        'h',
+        'H',
+        'K',
+        'm',
+        'M',
+        's',
+        't',
+        'y',
+        'z',
+        ':',
+        '/',
+        ' ',
+        '-'
+    );
+
+    /**
+     * characters that start a quoting block in MS patterns
+     */
+    private static final Set<Character> MS_QUOTING_CHARS = Set.of('\\', '\'', '"');
+
+    /**
+     * list of MS datetime patterns with the corresponding translation in Java DateTimeFormat
+     * (patterns that are the same in Java and in MS are not listed here)
+     */
+    private static final String[][] MS_TO_JAVA_PATTERNS = {
         { "tt", "a" },
         { "t", "a" },
         { "dddd", "eeee" },
@@ -47,10 +83,7 @@ protected Function<TemporalAccessor, String> formatterFor(String pattern) {
                 if (pattern.isEmpty()) {
                     return null;
                 }
-                for (String[] replacement : JAVA_TIME_FORMAT_REPLACEMENTS) {
-                    pattern = pattern.replace(replacement[0], replacement[1]);
-                }
-                final String javaPattern = pattern;
+                final String javaPattern = msToJavaPattern(pattern);
                 return DateTimeFormatter.ofPattern(javaPattern, Locale.ROOT)::format;
             }
         },
@@ -67,6 +100,95 @@ protected Function<TemporalAccessor, String> formatterFor(String pattern) {
             }
         };

+        protected static String msToJavaPattern(String pattern) {
+            StringBuilder result = new StringBuilder(pattern.length());
+            StringBuilder partialQuotedString = new StringBuilder();
+
+            boolean originalCharacterQuoted = false;
+            boolean lastTargetCharacterQuoted = false;
+            char quotingChar = '\\';
+
+            for (int i = 0; i < pattern.length(); i++) {
+                char c = pattern.charAt(i);
+                if (originalCharacterQuoted) {
+                    if (quotingChar == '\\') {
+                        // in the original pattern, this is a single quoted character, add it to the partial string
+                        // that will be quoted in Java
+                        originalCharacterQuoted = false;
+                        lastTargetCharacterQuoted = true;
+                        partialQuotedString.append(c);
+                    } else if (c == quotingChar) {
+                        // the original pattern is closing the quoting,
+                        // do nothing for now, next character could open a new quoting block
+                        originalCharacterQuoted = false;
+                    } else {
+                        // any character that is not a quoting char is just added to the partial quoting string
+                        // because there could be more characters to quote after that
+                        partialQuotedString.append(c);
+                    }
+                } else {
+                    boolean characterProcessed = false;
+                    // the original pattern is not quoting
+                    if (MS_QUOTING_CHARS.contains(c)) {
+                        // next character(s) is quoted, start a quoted block on the target
+                        originalCharacterQuoted = true;
+                        lastTargetCharacterQuoted = true;
+                        quotingChar = c;
+                        characterProcessed = true;
+                    } else {
+                        // manage patterns that are different from MS to Java and have to be translated
+                        for (String[] item : MS_TO_JAVA_PATTERNS) {
+                            int fragmentLength = item[0].length();
+                            if (i + fragmentLength <= pattern.length() && item[0].equals(pattern.substring(i, i + fragmentLength))) {
+                                if (lastTargetCharacterQuoted) {
+                                    // now origin is not quoting for sure and the next block is a valid datetime pattern,
+                                    // that has to be translated and written as is (not quoted).
+                                    // Before doing this, let's flush the previously quoted string
+                                    // and quote it properly with Java syntax
+                                    lastTargetCharacterQuoted = false;
+                                    quoteAndAppend(result, partialQuotedString);
+                                    partialQuotedString = new StringBuilder();
+                                }
+                                // and then translate the pattern
+                                result.append(item[1]);
+                                characterProcessed = true;
+                                i += (fragmentLength - 1); // fast-forward, because the replaced pattern could be longer than one character
+                                break;
+                            }
+                        }
+                    }
+                    if (characterProcessed == false) {
+                        if (MS_DATETIME_PATTERN_CHARS.contains(c) == false) {
+                            // this character is allowed in MS, but not in Java, so it has to be quoted in the result
+                            lastTargetCharacterQuoted = true;
+                            partialQuotedString.append(c);
+                        } else {
+                            // any other character is a valid datetime pattern in both Java and MS
+                            if (lastTargetCharacterQuoted) {
+                                // flush the quoted string first, if any
+                                lastTargetCharacterQuoted = false;
+                                quoteAndAppend(result, partialQuotedString);
+                                partialQuotedString = new StringBuilder();
+                            }
+                            // and then add the character itself, as it is
+                            result.append(c);
+                        }
+                    }
+                }
+            }
+            // if the original pattern ended with a quoted block, flush it to the result and quote it in Java
+            if (lastTargetCharacterQuoted) {
+                quoteAndAppend(result, partialQuotedString);
+            }
+            return result.toString();
+        }
+
+        private static void quoteAndAppend(StringBuilder mainBuffer, StringBuilder fragmentToQuote) {
+            mainBuffer.append("'");
+            mainBuffer.append(fragmentToQuote.toString().replaceAll("'", "''"));
+            mainBuffer.append("'");
+        }
+
         protected abstract Function<TemporalAccessor, String> formatterFor(String pattern);

         public Object format(Object timestamp, Object pattern, ZoneId zoneId) {
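
The quoting that `quoteAndAppend` emits relies on how `java.time.format.DateTimeFormatter` reads single quotes: text between single quotes is a literal, and a doubled quote (`''`) stands for one literal quote character. The sketch below only illustrates that Java-side behaviour. The class `QuotedPatternDemo`, the helper `formatWith`, and the two Java patterns are assumptions made for illustration: the patterns were written by hand to match what the `formatQuotesComplexString` and `formatSingleQuote` specs expect, and are not output captured from `msToJavaPattern`. The date 1953-09-02 is the `birth_date` implied by the `formatNormalPattern` spec.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class QuotedPatternDemo {

    // Hypothetical helper (not part of the commit): formats a date with a Java
    // pattern, using Locale.ROOT as the FORMAT formatter in the diff does.
    static String formatWith(String javaPattern, LocalDate date) {
        return DateTimeFormatter.ofPattern(javaPattern, Locale.ROOT).format(date);
    }

    public static void main(String[] args) {
        LocalDate birthDate = LocalDate.of(1953, 9, 2);

        // Quoted words are emitted as literals; the unquoted yyyy and MM are date fields.
        System.out.println(formatWith("'this' 'is' 'the' 'year' yyyy 'and' 'the' 'month' MM", birthDate));

        // '''' is an opening quote, an escaped quote ('') and a closing quote,
        // so it yields a single literal ' in the output.
        System.out.println(formatWith("'''' yyyy", birthDate));
    }
}
```

Run as is, this should print `this is the year 1953 and the month 09` followed by `' 1953`, the same strings the two specs list as expected results.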