Object validation torture (AOR proposal) #88

Open
codalogic opened this issue Aug 26, 2016 · 50 comments
Comments

@codalogic
Contributor

I've been thinking about validation of objects. If we have the following JCR:

{ "bar":string, ( "foo":integer | "baz":string ) }

should the following JSON be valid or not:

{ "bar":"thing", "foo":2, "baz": "thingy" }

My feeling is not because the author is saying they want one or the other.

What do others expect?

@johnwcowan

I agree with you. If you want coffee or tea, it doesn't mean you want one of each.

"If this is coffee, bring me tea; but if this is tea, bring me coffee."

@anewton1998
Contributor

I know this seems weird, but in the spirit of JSON (and sometimes XML depending on opinion) all members not expressly forbidden are allowed. Or put a different way, unspecified members of an object are ignored. If you want that to work, you have to explicitly "close off" or "restrict" the object.
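As a rough illustration of the "open" versus "closed off" reading described above, here is a minimal Python sketch. It is not the jcr gem's code: the function names are hypothetical, and the rule { "bar":string, ( "foo":integer | "baz":string ) } is hard-coded, with the first matching branch of the choice winning.

```python
def is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)

def consumed_members(obj):
    """Return the member names the rule uses, or None if the rule fails."""
    if not isinstance(obj.get("bar"), str):
        return None
    if "foo" in obj and is_int(obj["foo"]):
        return {"bar", "foo"}
    if "baz" in obj and isinstance(obj["baz"], str):
        return {"bar", "baz"}
    return None

def matches_open(obj):
    # Open object: members the rule did not consume are simply ignored.
    return consumed_members(obj) is not None

def matches_closed(obj):
    # Closed object: any member the rule did not consume is an error.
    used = consumed_members(obj)
    return used is not None and set(obj) == used
```

With the JSON from the question, `matches_open` accepts the object (the leftover "baz" is ignored) while `matches_closed` rejects it.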

Here's what you are doing:

→  jcr -v -R '{ "bar":string, ( "foo":integer | "baz":string ) }' -J '{ "bar":"thing", "foo":2, "baz": "thingy" }'
"Ruleset Parse Tree"
[{:object_rule=>
   [{:member_rule=>
      {:member_name=>{:q_string=>"bar"@3},
       :primitive_rule=>{:string=>"string"@8}}},
    {:sequence_combiner=>","@14,
     :group_rule=>
      [{:member_rule=>
         {:member_name=>{:q_string=>"foo"@19},
          :primitive_rule=>{:integer_v=>"integer"@24}}},
       {:choice_combiner=>"|"@32,
        :member_rule=>
         {:member_name=>{:q_string=>"baz"@35},
          :primitive_rule=>{:string=>"string"@40}}}]}]}]
"Ruleset Map"
"Evaluating Root:"
" { \"bar\" : string ,  ( \"foo\" : integer | \"baz\" : string ) }"
[ 1:[1, 4]@3 ] Evaluating object rule starting at '"bar"@3' ( line 1 column 4 ) against data: {"bar"=>"thing", "foo"=>2, "baz"=>"thingy"}
[ 1:[1, 4]@3 ] object definition:  { "bar" : string ,  ( "foo" : integer | "baz" : string ) } data: {"bar"=>"thing", "foo"=>2, "baz"=>"thingy"}
[ 1:[1, 4]@3 ] rule repetition min = 1 max = 1 repetition step =
[ 2:[1, 4]@3 ] Evaluating member rule for key 'bar' starting at '"bar"@3' ( line 1 column 4 ) against  data: "thing"
[ 2:[1, 4]@3 ] member definition: "bar" : string data: ["bar", "thing"]
[ 3:[1, 9]@8 ] Evaluating value rule starting at '"string"@8' ( line 1 column 9 )
[ 3:[1, 9]@8 ] value definition: string data: "thing"
[ 3:[1, 9]@8 ] Value evaluation is true
[ 2:[1, 4]@3 ] Member evaluation is true
[ 1:[1, 4]@3 ] Found 1 matching members repetitions in object with min 1 and max 1
[ 1:[1, 4]@3 ] rule repetition min = 1 max = 1 repetition step =
[ 2:[1, 20]@19 ] Evaluating object rule starting at '"foo"@19' ( line 1 column 20 ) against data: {"bar"=>"thing", "foo"=>2, "baz"=>"thingy"}
[ 2:[1, 20]@19 ] object definition:  { "foo" : integer | "baz" : string } data: {"bar"=>"thing", "foo"=>2, "baz"=>"thingy"}
[ 2:[1, 20]@19 ] rule repetition min = 1 max = 1 repetition step =
[ 3:[1, 20]@19 ] Evaluating member rule for key 'foo' starting at '"foo"@19' ( line 1 column 20 ) against  data: 2
[ 3:[1, 20]@19 ] member definition: "foo" : integer data: ["foo", 2]
[ 4:[1, 25]@24 ] Evaluating value rule starting at '"integer"@24' ( line 1 column 25 )
[ 4:[1, 25]@24 ] value definition: integer data: 2
[ 4:[1, 25]@24 ] Value evaluation is true
[ 3:[1, 20]@19 ] Member evaluation is true
[ 2:[1, 20]@19 ] Found 1 matching members repetitions in object with min 1 and max 1
[ 2:[1, 20]@19 ] Object evaluation is true
[ 1:[1, 4]@3 ] Object evaluation is true
Success!

But here it is with @{not} //:any @+ to close off the object:

→  jcr -v -R '{ "bar":string, ( "foo":integer | "baz":string ), @{not} //:any @+ }' -J '{ "bar":"thing", "foo":2, "baz": "thingy" }'
"Ruleset Parse Tree"
[{:object_rule=>
   [{:member_rule=>
      {:member_name=>{:q_string=>"bar"@3},
       :primitive_rule=>{:string=>"string"@8}}},
    {:sequence_combiner=>","@14,
     :group_rule=>
      [{:member_rule=>
         {:member_name=>{:q_string=>"foo"@19},
          :primitive_rule=>{:integer_v=>"integer"@24}}},
       {:choice_combiner=>"|"@32,
        :member_rule=>
         {:member_name=>{:q_string=>"baz"@35},
          :primitive_rule=>{:string=>"string"@40}}}]},
    {:sequence_combiner=>","@48,
     :member_rule=>
      [{:not_annotation=>"not"@52},
       {:member_regex=>{:regex=>[], :regex_modifiers=>[]}},
       {:primitive_rule=>{:any=>"any"@60}}],
     :one_or_more=>"+"@65}]}]
"Ruleset Map"
"Evaluating Root:"
" { \"bar\" : string ,  ( \"foo\" : integer | \"baz\" : string ) ,  @{not} /[]/ : any }"
[ 1:[1, 4]@3 ] Evaluating object rule starting at '"bar"@3' ( line 1 column 4 ) against data: {"bar"=>"thing", "foo"=>2, "baz"=>"thingy"}
[ 1:[1, 4]@3 ] object definition:  { "bar" : string ,  ( "foo" : integer | "baz" : string ) ... data: {"bar"=>"thing", "foo"=>2, "baz"=>"thingy"}
[ 1:[1, 4]@3 ] rule repetition min = 1 max = 1 repetition step =
[ 2:[1, 4]@3 ] Evaluating member rule for key 'bar' starting at '"bar"@3' ( line 1 column 4 ) against  data: "thing"
[ 2:[1, 4]@3 ] member definition: "bar" : string data: ["bar", "thing"]
[ 3:[1, 9]@8 ] Evaluating value rule starting at '"string"@8' ( line 1 column 9 )
[ 3:[1, 9]@8 ] value definition: string data: "thing"
[ 3:[1, 9]@8 ] Value evaluation is true
[ 2:[1, 4]@3 ] Member evaluation is true
[ 1:[1, 4]@3 ] Found 1 matching members repetitions in object with min 1 and max 1
[ 1:[1, 4]@3 ] rule repetition min = 1 max = 1 repetition step =
[ 2:[1, 20]@19 ] Evaluating object rule starting at '"foo"@19' ( line 1 column 20 ) against data: {"bar"=>"thing", "foo"=>2, "baz"=>"thingy"}
[ 2:[1, 20]@19 ] object definition:  { "foo" : integer | "baz" : string } data: {"bar"=>"thing", "foo"=>2, "baz"=>"thingy"}
[ 2:[1, 20]@19 ] rule repetition min = 1 max = 1 repetition step =
[ 3:[1, 20]@19 ] Evaluating member rule for key 'foo' starting at '"foo"@19' ( line 1 column 20 ) against  data: 2
[ 3:[1, 20]@19 ] member definition: "foo" : integer data: ["foo", 2]
[ 4:[1, 25]@24 ] Evaluating value rule starting at '"integer"@24' ( line 1 column 25 )
[ 4:[1, 25]@24 ] value definition: integer data: 2
[ 4:[1, 25]@24 ] Value evaluation is true
[ 3:[1, 20]@19 ] Member evaluation is true
[ 2:[1, 20]@19 ] Found 1 matching members repetitions in object with min 1 and max 1
[ 2:[1, 20]@19 ] Object evaluation is true
[ 1:[1, 4]@3 ] rule repetition min = 1 max = Infinity repetition step =
[ 1:[1, 4]@3 ] Scanning object for [].
[ 2:[1, 53]@52 ] Evaluating member rule for key 'baz' starting at '"not"@52' ( line 1 column 53 ) against  data: "thingy"
[ 2:[1, 53]@52 ] member definition:  @{not} /[]/ : any data: ["baz", "thingy"]
[ 2:[1, 53]@52 ] Noting empty regular expression.
[ 3:[1, 61]@60 ] Evaluating value rule starting at '"any"@60' ( line 1 column 61 )
[ 3:[1, 61]@60 ] value definition: any data: "thingy"
[ 3:[1, 61]@60 ] Value evaluation is true
[ 2:[1, 53]@52 ] Not annotation changing result from true to false
[ 2:[1, 53]@52 ] Member evaluation failed:
[ 2:[1, 53]@52 ] ** LIKELY ROOT CAUSE FOR FAILURE **
[ 2:[1, 53]@52 ] ***********************************
[ 2:[1, 53]@52 ] Failed rule at line,column: [1, 53] file position offset: 52
[ 2:[1, 53]@52 ] member definition:  @{not} /[]/ : any data: ["baz", "thingy"]
[ 2:[1, 53]@52 ] JSON that failed to validate: ["baz","thingy"]
[ 2:[1, 53]@52 ] ***********************************
[ 1:[1, 4]@3 ] Found 0 matching members repetitions in object with min 1 and max Infinity
[ 1:[1, 4]@3 ] Object evaluation failed: object does not contain {:sequence_combiner=>","@48, :member_rule=>[{:not_annotation=>"not"@52}, {:member_regex=>{:regex=>[], :regex_modifiers=>[]}}, {:primitive_rule=>{:any=>"any"@60}}], :one_or_more=>"+"@65} for  rule at '"bar"@3' ( line 1 column 4 ) [ [{:member_rule=>{:member_name=>{:q_string=>"bar"@3}, :primitive_rule=>{:string=>"string"@8}}}, {:sequence_combiner=>","@14, :group_rule=>[{:member_rule=>{:member_name=>{:q_string=>"foo"@19}, :primitive_rule=>{:integer_v=>"integer"@24}}}, {:choice_combiner=>"|"@32, :member_rule=>{:member_name=>{:q_string=>"baz"@35}, :primitive_rule=>{:string=>"string"@40}}}]}, {:sequence_combiner=>","@48, :member_rule=>[{:not_annotation=>"not"@52}, {:member_regex=>{:regex=>[], :regex_modifiers=>[]}}, {:primitive_rule=>{:any=>"any"@60}}], :one_or_more=>"+"@65}] ] from rule at '"bar"@3' ( line 1 column 4 )
Failure: object does not contain {:sequence_combiner=>","@48, :member_rule=>[{:not_annotation=>"not"@52}, {:member_regex=>{:regex=>[], :regex_modifiers=>[]}}, {:primitive_rule=>{:any=>"any"@60}}], :one_or_more=>"+"@65} for  rule at '"bar"@3' ( line 1 column 4 ) [ [{:member_rule=>{:member_name=>{:q_string=>"bar"@3}, :primitive_rule=>{:string=>"string"@8}}}, {:sequence_combiner=>","@14, :group_rule=>[{:member_rule=>{:member_name=>{:q_string=>"foo"@19}, :primitive_rule=>{:integer_v=>"integer"@24}}}, {:choice_combiner=>"|"@32, :member_rule=>{:member_name=>{:q_string=>"baz"@35}, :primitive_rule=>{:string=>"string"@40}}}]}, {:sequence_combiner=>","@48, :member_rule=>[{:not_annotation=>"not"@52}, {:member_regex=>{:regex=>[], :regex_modifiers=>[]}}, {:primitive_rule=>{:any=>"any"@60}}], :one_or_more=>"+"@65}] ] from rule at '"bar"@3' ( line 1 column 4 )

If you don't want to close off the object, you can explicitly change the logic from foo or baz to (foo and not baz) or (baz and not foo). Here is the failure to validate with that.

→  jcr -v -R '{ "bar":string, ( ( "foo":integer , @{not} "baz":string ) | ( "baz":string , @{not} "foo":integer ) ) }' -J '{ "bar":"thing", "foo":2, "baz": "thingy" }'
"Ruleset Parse Tree"
[{:object_rule=>
   [{:member_rule=>
      {:member_name=>{:q_string=>"bar"@3},
       :primitive_rule=>{:string=>"string"@8}}},
    {:sequence_combiner=>","@14,
     :group_rule=>
      [{:group_rule=>
         [{:member_rule=>
            {:member_name=>{:q_string=>"foo"@21},
             :primitive_rule=>{:integer_v=>"integer"@26}}},
          {:sequence_combiner=>","@34,
           :member_rule=>
            [{:not_annotation=>"not"@38},
             {:member_name=>{:q_string=>"baz"@44}},
             {:primitive_rule=>{:string=>"string"@49}}]}]},
       {:choice_combiner=>"|"@58,
        :group_rule=>
         [{:member_rule=>
            {:member_name=>{:q_string=>"baz"@63},
             :primitive_rule=>{:string=>"string"@68}}},
          {:sequence_combiner=>","@75,
           :member_rule=>
            [{:not_annotation=>"not"@79},
             {:member_name=>{:q_string=>"foo"@85}},
             {:primitive_rule=>{:integer_v=>"integer"@90}}]}]}]}]}]
"Ruleset Map"
"Evaluating Root:"
" { \"bar\" : string ,  (  ( \"foo\" : integer ,  @{not} \"baz\" : string ) |  ( \"baz\" : string ,  @{not} \"foo\" : integer ) ) }"
[ 1:[1, 4]@3 ] Evaluating object rule starting at '"bar"@3' ( line 1 column 4 ) against data: {"bar"=>"thing", "foo"=>2, "baz"=>"thingy"}
[ 1:[1, 4]@3 ] object definition:  { "bar" : string ,  (  ( "foo" : integer ,  @{not} "baz" ... data: {"bar"=>"thing", "foo"=>2, "baz"=>"thingy"}
[ 1:[1, 4]@3 ] rule repetition min = 1 max = 1 repetition step =
[ 2:[1, 4]@3 ] Evaluating member rule for key 'bar' starting at '"bar"@3' ( line 1 column 4 ) against  data: "thing"
[ 2:[1, 4]@3 ] member definition: "bar" : string data: ["bar", "thing"]
[ 3:[1, 9]@8 ] Evaluating value rule starting at '"string"@8' ( line 1 column 9 )
[ 3:[1, 9]@8 ] value definition: string data: "thing"
[ 3:[1, 9]@8 ] Value evaluation is true
[ 2:[1, 4]@3 ] Member evaluation is true
[ 1:[1, 4]@3 ] Found 1 matching members repetitions in object with min 1 and max 1
[ 1:[1, 4]@3 ] rule repetition min = 1 max = 1 repetition step =
[ 2:[1, 22]@21 ] Evaluating object rule starting at '"foo"@21' ( line 1 column 22 ) against data: {"bar"=>"thing", "foo"=>2, "baz"=>"thingy"}
[ 2:[1, 22]@21 ] object definition:  {  ( "foo" : integer ,  @{not} "baz" : string ) |  ( "ba ... data: {"bar"=>"thing", "foo"=>2, "baz"=>"thingy"}
[ 2:[1, 22]@21 ] rule repetition min = 1 max = 1 repetition step =
[ 3:[1, 22]@21 ] Evaluating object rule starting at '"foo"@21' ( line 1 column 22 ) against data: {"bar"=>"thing", "foo"=>2, "baz"=>"thingy"}
[ 3:[1, 22]@21 ] object definition:  { "foo" : integer ,  @{not} "baz" : string } data: {"bar"=>"thing", "foo"=>2, "baz"=>"thingy"}
[ 3:[1, 22]@21 ] rule repetition min = 1 max = 1 repetition step =
[ 4:[1, 22]@21 ] Evaluating member rule for key 'foo' starting at '"foo"@21' ( line 1 column 22 ) against  data: 2
[ 4:[1, 22]@21 ] member definition: "foo" : integer data: ["foo", 2]
[ 5:[1, 27]@26 ] Evaluating value rule starting at '"integer"@26' ( line 1 column 27 )
[ 5:[1, 27]@26 ] value definition: integer data: 2
[ 5:[1, 27]@26 ] Value evaluation is true
[ 4:[1, 22]@21 ] Member evaluation is true
[ 3:[1, 22]@21 ] Found 1 matching members repetitions in object with min 1 and max 1
[ 3:[1, 22]@21 ] rule repetition min = 1 max = 1 repetition step =
[ 4:[1, 39]@38 ] Evaluating member rule for key 'baz' starting at '"not"@38' ( line 1 column 39 ) against  data: "thingy"
[ 4:[1, 39]@38 ] member definition:  @{not} "baz" : string data: ["baz", "thingy"]
[ 5:[1, 50]@49 ] Evaluating value rule starting at '"string"@49' ( line 1 column 50 )
[ 5:[1, 50]@49 ] value definition: string data: "thingy"
[ 5:[1, 50]@49 ] Value evaluation is true
[ 4:[1, 39]@38 ] Not annotation changing result from true to false
[ 4:[1, 39]@38 ] Member evaluation failed:
[ 4:[1, 39]@38 ] ** LIKELY ROOT CAUSE FOR FAILURE **
[ 4:[1, 39]@38 ] ***********************************
[ 4:[1, 39]@38 ] Failed rule at line,column: [1, 39] file position offset: 38
[ 4:[1, 39]@38 ] member definition:  @{not} "baz" : string data: ["baz", "thingy"]
[ 4:[1, 39]@38 ] JSON that failed to validate: ["baz","thingy"]
[ 4:[1, 39]@38 ] ***********************************
[ 3:[1, 22]@21 ] Found 0 matching members repetitions in object with min 1 and max 1
[ 3:[1, 22]@21 ] Object evaluation failed: object does not contain {:sequence_combiner=>","@34, :member_rule=>[{:not_annotation=>"not"@38}, {:member_name=>{:q_string=>"baz"@44}}, {:primitive_rule=>{:string=>"string"@49}}]} for  rule at '"foo"@21' ( line 1 column 22 ) [ [{:member_rule=>{:member_name=>{:q_string=>"foo"@21}, :primitive_rule=>{:integer_v=>"integer"@26}}}, {:sequence_combiner=>","@34, :member_rule=>[{:not_annotation=>"not"@38}, {:member_name=>{:q_string=>"baz"@44}}, {:primitive_rule=>{:string=>"string"@49}}]}] ] from rule at '"bar"@3' ( line 1 column 4 )
[ 2:[1, 22]@21 ] rule repetition min = 1 max = 1 repetition step =
[ 3:[1, 64]@63 ] Evaluating object rule starting at '"baz"@63' ( line 1 column 64 ) against data: {"bar"=>"thing", "foo"=>2, "baz"=>"thingy"}
[ 3:[1, 64]@63 ] object definition:  { "baz" : string ,  @{not} "foo" : integer } data: {"bar"=>"thing", "foo"=>2, "baz"=>"thingy"}
[ 3:[1, 64]@63 ] rule repetition min = 1 max = 1 repetition step =
[ 4:[1, 64]@63 ] Evaluating member rule for key 'baz' starting at '"baz"@63' ( line 1 column 64 ) against  data: "thingy"
[ 4:[1, 64]@63 ] member definition: "baz" : string data: ["baz", "thingy"]
[ 5:[1, 69]@68 ] Evaluating value rule starting at '"string"@68' ( line 1 column 69 )
[ 5:[1, 69]@68 ] value definition: string data: "thingy"
[ 5:[1, 69]@68 ] Value evaluation is true
[ 4:[1, 64]@63 ] Member evaluation is true
[ 3:[1, 64]@63 ] Found 1 matching members repetitions in object with min 1 and max 1
[ 3:[1, 64]@63 ] rule repetition min = 1 max = 1 repetition step =
[ 4:[1, 80]@79 ] Evaluating member rule for key 'foo' starting at '"not"@79' ( line 1 column 80 ) against  data: 2
[ 4:[1, 80]@79 ] member definition:  @{not} "foo" : integer data: ["foo", 2]
[ 5:[1, 91]@90 ] Evaluating value rule starting at '"integer"@90' ( line 1 column 91 )
[ 5:[1, 91]@90 ] value definition: integer data: 2
[ 5:[1, 91]@90 ] Value evaluation is true
[ 4:[1, 80]@79 ] Not annotation changing result from true to false
[ 4:[1, 80]@79 ] Member evaluation failed:
[ 3:[1, 64]@63 ] Found 0 matching members repetitions in object with min 1 and max 1
[ 3:[1, 64]@63 ] Object evaluation failed: object does not contain {:sequence_combiner=>","@75, :member_rule=>[{:not_annotation=>"not"@79}, {:member_name=>{:q_string=>"foo"@85}}, {:primitive_rule=>{:integer_v=>"integer"@90}}]} for  rule at '"baz"@63' ( line 1 column 64 ) [ [{:member_rule=>{:member_name=>{:q_string=>"baz"@63}, :primitive_rule=>{:string=>"string"@68}}}, {:sequence_combiner=>","@75, :member_rule=>[{:not_annotation=>"not"@79}, {:member_name=>{:q_string=>"foo"@85}}, {:primitive_rule=>{:integer_v=>"integer"@90}}]}] ] from rule at '"bar"@3' ( line 1 column 4 )
[ 2:[1, 22]@21 ] Object evaluation failed: object does not contain group {:choice_combiner=>"|"@58, :group_rule=>[{:member_rule=>{:member_name=>{:q_string=>"baz"@63}, :primitive_rule=>{:string=>"string"@68}}}, {:sequence_combiner=>","@75, :member_rule=>[{:not_annotation=>"not"@79}, {:member_name=>{:q_string=>"foo"@85}}, {:primitive_rule=>{:integer_v=>"integer"@90}}]}]} for  rule at '"foo"@21' ( line 1 column 22 ) [ [{:group_rule=>[{:member_rule=>{:member_name=>{:q_string=>"foo"@21}, :primitive_rule=>{:integer_v=>"integer"@26}}}, {:sequence_combiner=>","@34, :member_rule=>[{:not_annotation=>"not"@38}, {:member_name=>{:q_string=>"baz"@44}}, {:primitive_rule=>{:string=>"string"@49}}]}]}, {:choice_combiner=>"|"@58, :group_rule=>[{:member_rule=>{:member_name=>{:q_string=>"baz"@63}, :primitive_rule=>{:string=>"string"@68}}}, {:sequence_combiner=>","@75, :member_rule=>[{:not_annotation=>"not"@79}, {:member_name=>{:q_string=>"foo"@85}}, {:primitive_rule=>{:integer_v=>"integer"@90}}]}]}] ] from rule at '"bar"@3' ( line 1 column 4 )
[ 1:[1, 4]@3 ] Object evaluation failed: object does not contain group {:sequence_combiner=>","@14, :group_rule=>[{:group_rule=>[{:member_rule=>{:member_name=>{:q_string=>"foo"@21}, :primitive_rule=>{:integer_v=>"integer"@26}}}, {:sequence_combiner=>","@34, :member_rule=>[{:not_annotation=>"not"@38}, {:member_name=>{:q_string=>"baz"@44}}, {:primitive_rule=>{:string=>"string"@49}}]}]}, {:choice_combiner=>"|"@58, :group_rule=>[{:member_rule=>{:member_name=>{:q_string=>"baz"@63}, :primitive_rule=>{:string=>"string"@68}}}, {:sequence_combiner=>","@75, :member_rule=>[{:not_annotation=>"not"@79}, {:member_name=>{:q_string=>"foo"@85}}, {:primitive_rule=>{:integer_v=>"integer"@90}}]}]}]} for  rule at '"bar"@3' ( line 1 column 4 ) [ [{:member_rule=>{:member_name=>{:q_string=>"bar"@3}, :primitive_rule=>{:string=>"string"@8}}}, {:sequence_combiner=>","@14, :group_rule=>[{:group_rule=>[{:member_rule=>{:member_name=>{:q_string=>"foo"@21}, :primitive_rule=>{:integer_v=>"integer"@26}}}, {:sequence_combiner=>","@34, :member_rule=>[{:not_annotation=>"not"@38}, {:member_name=>{:q_string=>"baz"@44}}, {:primitive_rule=>{:string=>"string"@49}}]}]}, {:choice_combiner=>"|"@58, :group_rule=>[{:member_rule=>{:member_name=>{:q_string=>"baz"@63}, :primitive_rule=>{:string=>"string"@68}}}, {:sequence_combiner=>","@75, :member_rule=>[{:not_annotation=>"not"@79}, {:member_name=>{:q_string=>"foo"@85}}, {:primitive_rule=>{:integer_v=>"integer"@90}}]}]}]}] ] from rule at '"bar"@3' ( line 1 column 4 )
Failure: object does not contain group {:sequence_combiner=>","@14, :group_rule=>[{:group_rule=>[{:member_rule=>{:member_name=>{:q_string=>"foo"@21}, :primitive_rule=>{:integer_v=>"integer"@26}}}, {:sequence_combiner=>","@34, :member_rule=>[{:not_annotation=>"not"@38}, {:member_name=>{:q_string=>"baz"@44}}, {:primitive_rule=>{:string=>"string"@49}}]}]}, {:choice_combiner=>"|"@58, :group_rule=>[{:member_rule=>{:member_name=>{:q_string=>"baz"@63}, :primitive_rule=>{:string=>"string"@68}}}, {:sequence_combiner=>","@75, :member_rule=>[{:not_annotation=>"not"@79}, {:member_name=>{:q_string=>"foo"@85}}, {:primitive_rule=>{:integer_v=>"integer"@90}}]}]}]} for  rule at '"bar"@3' ( line 1 column 4 ) [ [{:member_rule=>{:member_name=>{:q_string=>"bar"@3}, :primitive_rule=>{:string=>"string"@8}}}, {:sequence_combiner=>","@14, :group_rule=>[{:group_rule=>[{:member_rule=>{:member_name=>{:q_string=>"foo"@21}, :primitive_rule=>{:integer_v=>"integer"@26}}}, {:sequence_combiner=>","@34, :member_rule=>[{:not_annotation=>"not"@38}, {:member_name=>{:q_string=>"baz"@44}}, {:primitive_rule=>{:string=>"string"@49}}]}]}, {:choice_combiner=>"|"@58, :group_rule=>[{:member_rule=>{:member_name=>{:q_string=>"baz"@63}, :primitive_rule=>{:string=>"string"@68}}}, {:sequence_combiner=>","@75, :member_rule=>[{:not_annotation=>"not"@79}, {:member_name=>{:q_string=>"foo"@85}}, {:primitive_rule=>{:integer_v=>"integer"@90}}]}]}]}] ] from rule at '"bar"@3' ( line 1 column 4 )

And here is the validation working with that logic:

→  jcr -v -R '{ "bar":string, ( ( "foo":integer , @{not} "baz":string ) | ( "baz":string , @{not} "foo":integer ) ) }' -J '{ "bar":"thing", "foo":2  }'
"Ruleset Parse Tree"
[{:object_rule=>
   [{:member_rule=>
      {:member_name=>{:q_string=>"bar"@3},
       :primitive_rule=>{:string=>"string"@8}}},
    {:sequence_combiner=>","@14,
     :group_rule=>
      [{:group_rule=>
         [{:member_rule=>
            {:member_name=>{:q_string=>"foo"@21},
             :primitive_rule=>{:integer_v=>"integer"@26}}},
          {:sequence_combiner=>","@34,
           :member_rule=>
            [{:not_annotation=>"not"@38},
             {:member_name=>{:q_string=>"baz"@44}},
             {:primitive_rule=>{:string=>"string"@49}}]}]},
       {:choice_combiner=>"|"@58,
        :group_rule=>
         [{:member_rule=>
            {:member_name=>{:q_string=>"baz"@63},
             :primitive_rule=>{:string=>"string"@68}}},
          {:sequence_combiner=>","@75,
           :member_rule=>
            [{:not_annotation=>"not"@79},
             {:member_name=>{:q_string=>"foo"@85}},
             {:primitive_rule=>{:integer_v=>"integer"@90}}]}]}]}]}]
"Ruleset Map"
"Evaluating Root:"
" { \"bar\" : string ,  (  ( \"foo\" : integer ,  @{not} \"baz\" : string ) |  ( \"baz\" : string ,  @{not} \"foo\" : integer ) ) }"
[ 1:[1, 4]@3 ] Evaluating object rule starting at '"bar"@3' ( line 1 column 4 ) against data: {"bar"=>"thing", "foo"=>2}
[ 1:[1, 4]@3 ] object definition:  { "bar" : string ,  (  ( "foo" : integer ,  @{not} "baz" ... data: {"bar"=>"thing", "foo"=>2}
[ 1:[1, 4]@3 ] rule repetition min = 1 max = 1 repetition step =
[ 2:[1, 4]@3 ] Evaluating member rule for key 'bar' starting at '"bar"@3' ( line 1 column 4 ) against  data: "thing"
[ 2:[1, 4]@3 ] member definition: "bar" : string data: ["bar", "thing"]
[ 3:[1, 9]@8 ] Evaluating value rule starting at '"string"@8' ( line 1 column 9 )
[ 3:[1, 9]@8 ] value definition: string data: "thing"
[ 3:[1, 9]@8 ] Value evaluation is true
[ 2:[1, 4]@3 ] Member evaluation is true
[ 1:[1, 4]@3 ] Found 1 matching members repetitions in object with min 1 and max 1
[ 1:[1, 4]@3 ] rule repetition min = 1 max = 1 repetition step =
[ 2:[1, 22]@21 ] Evaluating object rule starting at '"foo"@21' ( line 1 column 22 ) against data: {"bar"=>"thing", "foo"=>2}
[ 2:[1, 22]@21 ] object definition:  {  ( "foo" : integer ,  @{not} "baz" : string ) |  ( "ba ... data: {"bar"=>"thing", "foo"=>2}
[ 2:[1, 22]@21 ] rule repetition min = 1 max = 1 repetition step =
[ 3:[1, 22]@21 ] Evaluating object rule starting at '"foo"@21' ( line 1 column 22 ) against data: {"bar"=>"thing", "foo"=>2}
[ 3:[1, 22]@21 ] object definition:  { "foo" : integer ,  @{not} "baz" : string } data: {"bar"=>"thing", "foo"=>2}
[ 3:[1, 22]@21 ] rule repetition min = 1 max = 1 repetition step =
[ 4:[1, 22]@21 ] Evaluating member rule for key 'foo' starting at '"foo"@21' ( line 1 column 22 ) against  data: 2
[ 4:[1, 22]@21 ] member definition: "foo" : integer data: ["foo", 2]
[ 5:[1, 27]@26 ] Evaluating value rule starting at '"integer"@26' ( line 1 column 27 )
[ 5:[1, 27]@26 ] value definition: integer data: 2
[ 5:[1, 27]@26 ] Value evaluation is true
[ 4:[1, 22]@21 ] Member evaluation is true
[ 3:[1, 22]@21 ] Found 1 matching members repetitions in object with min 1 and max 1
[ 3:[1, 22]@21 ] rule repetition min = 1 max = 1 repetition step =
[ 3:[1, 22]@21 ] No member 'baz' found in object.
[ 4:[1, 39]@38 ] Evaluating member rule for key '' starting at '"not"@38' ( line 1 column 39 ) against
[ 4:[1, 39]@38 ] member definition:  @{not} "baz" : string data: [nil, nil]
[ 4:[1, 39]@38 ] Not annotation changing result from false to true
[ 4:[1, 39]@38 ] Member evaluation is true
[ 3:[1, 22]@21 ] Found 1 matching members repetitions in object with min 1 and max 1
[ 3:[1, 22]@21 ] Object evaluation is true
[ 2:[1, 22]@21 ] Object evaluation is true
[ 1:[1, 4]@3 ] Object evaluation is true
Success!
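The rewritten logic, (foo and not baz) or (baz and not foo), can be sketched in Python as follows. This is only an illustration of the boolean structure, not how jcr evaluates rules; the helper names are hypothetical, and it assumes that a member of the wrong type counts as "not present" for the purposes of @{not}.

```python
def is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)

def has_foo(obj):
    return "foo" in obj and is_int(obj["foo"])

def has_baz(obj):
    return "baz" in obj and isinstance(obj["baz"], str)

def matches(obj):
    # "bar" is required; the object stays open, so other members are ignored.
    if not isinstance(obj.get("bar"), str):
        return False
    foo, baz = has_foo(obj), has_baz(obj)
    return (foo and not baz) or (baz and not foo)
```

The two runs above correspond to `matches({"bar": "thing", "foo": 2, "baz": "thingy"})` being false and `matches({"bar": "thing", "foo": 2})` being true.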

@codalogic
Contributor Author

While I agree with the idea that unknown members are ignored, at the moment I don't agree that known members that are superfluous are ignored. I think it comes down to whether | is a choice, in which you can pick only one of n items, or whether it's an inclusive OR. xs:choice in XML Schema would be the former. I think it becomes more of an issue when the data types are objects, such as in:

$message = { 
        "id" : int16, ( "request" : $request | "response" : $response ) }

But I'm happy to leave that aside for the time being and see how things develop.

@anewton1998
Contributor

Hmmm... interesting point.

@anewton1998
Contributor

anewton1998 commented Aug 30, 2016

I'm trying to work out whether this is also an issue for arrays, ordered or unordered, and I don't think it is. But that is more a property of arrays not allowing extra items than of the choice being an inclusive or exclusive OR.

@codalogic
Contributor Author

I suppose you could have an array spec something like:

[ integer *, (ipv4 * | ipv6 *), string * ]

You wouldn't expect to allow both ipv4 and ipv6 addresses then.

I'll confess I haven't looked at your array code yet, but my feeling at the moment is that ordered arrays need to be modelled more like regular expressions (except rather than characters you're matching types). I recall that Kernighan and Pike came up with a simple, cut-down RE engine that may be sufficient for this (I think this is it http://www.cs.princeton.edu/courses/archive/spr09/cos333/beautiful.html).

Unordered arrays are probably more like objects with members that have a null member name.
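To make the regular-expression analogy for ordered arrays concrete, here is a small Python sketch in the style of the Kernighan and Pike matcher, matching type predicates instead of characters. It is an assumption-laden illustration: the pattern representation (a list of `(predicate, repetition)` pairs) and all function names are invented here, only "exactly one" and `*` are supported, and choice groups are left out, as alternation is in the original matcher.

```python
def is_int(value):
    return isinstance(value, int) and not isinstance(value, bool)

def is_str(value):
    return isinstance(value, str)

def match_here(pattern, items):
    """True if pattern matches the whole of items (anchored at both ends)."""
    if not pattern:
        return not items
    predicate, repetition = pattern[0]
    if repetition == "*":
        return match_star(predicate, pattern[1:], items)
    if items and predicate(items[0]):
        return match_here(pattern[1:], items[1:])
    return False

def match_star(predicate, rest, items):
    """Try zero, then one, two, ... leading items accepted by predicate."""
    count = 0
    while True:
        if match_here(rest, items[count:]):
            return True
        if count < len(items) and predicate(items[count]):
            count += 1
        else:
            return False

# Roughly [ integer *, string * ]
INTS_THEN_STRINGS = [(is_int, "*"), (is_str, "*")]
```

For example, `match_here(INTS_THEN_STRINGS, [1, 2, "a"])` succeeds, while `["a", 1]` fails because an integer cannot follow a string.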

@anewton1998
Contributor

anewton1998 commented Aug 30, 2016

I don't get the reference to the regex engine, but I've got a local branch going to work on XOR. So far I've modified object evaluation without too much trouble (minus new tests to prove it out, but all the old tests pass). I suspect that both ordered and unordered arrays would be the same.

And I think you've hit the arrays from the right angle, and my gut feeling is that there won't be much change between ordered and unordered arrays. Let me play and see what I come up with.

That said, it would be great if the choice combiner signifies XOR in all cases (arrays and groups and objects), not just for objects.

@anewton1998
Contributor

After some thought and tinkering, I believe xor for objects and unordered arrays are similar. I've got it implemented for objects, and my guess is that it will take little effort for unordered arrays once I have the time to get to it.

That being said, I'm unsure any special care needs to be taken for ordered (i.e. normal) arrays. Or said another way, there is no difference between inclusive or and exclusive or with arrays. And the same is true of primitives.

$thing =: ("foo" | "bar" )

Since a primitive can be only one value, there is no opportunity to have a comparison of two values. And I think the same is true of arrays, since an ordered array is always comparing an item at a specific position. Take the following JSON array:

[ 1, "foo", "fuzz" ]

The item at position 2 is "foo", and there can only be one item at position 2.

Now here is where things get interesting. Does the following rule match against the array above?

$a1 = [ integer , ("foo" * | "bar" * ), string *]

What about this rule?

$a2 = [ integer , ("foo" + | "bar" + ), string *]

From what I believe (and what my software is telling me), $a1 fails while $a2 does not. And the reason $a1 fails is because "bar" * is true as ZERO or more "bar" are present at position 2, thus failing the XOR condition since both "foo" * and "bar"* are true.
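One way to read the reasoning above is sketched below in Python. This is a simplification for illustration, not the jcr gem's code, and `run_length`, `branch_true` and `xor_choice` are hypothetical names. Each branch of the choice is evaluated on its own, starting at position 2 of the array (index 1 in Python), and XOR requires exactly one branch to be true. Only the choice group is modelled, not the surrounding integer and string * rules.

```python
def run_length(items, start, literal):
    """Count consecutive items equal to literal, beginning at index start."""
    count = 0
    while start + count < len(items) and items[start + count] == literal:
        count += 1
    return count

def branch_true(items, start, literal, minimum):
    # "foo" * has minimum 0, "foo" + has minimum 1.
    return run_length(items, start, literal) >= minimum

def xor_choice(items, start, branches):
    # Exclusive choice: exactly one branch may evaluate to true.
    true_count = sum(
        1 for literal, minimum in branches
        if branch_true(items, start, literal, minimum)
    )
    return true_count == 1

DATA = [1, "foo", "fuzz"]
A1_CHOICE = [("foo", 0), ("bar", 0)]  # ("foo" * | "bar" * )
A2_CHOICE = [("foo", 1), ("bar", 1)]  # ("foo" + | "bar" + )
```

Under this reading, `xor_choice(DATA, 1, A1_CHOICE)` is false because both starred branches are true, while `xor_choice(DATA, 1, A2_CHOICE)` is true because only the "foo" + branch holds.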

@codalogic
Contributor Author

The idea that $a1 = [ integer , ("foo" * | "bar" * ), string *] should always fail seems counterintuitive to me, even though I'm not able to disagree with your logic. I guess what they'd really be trying to say is $a1 = [ integer , ("foo" + | "bar" + ) ?, string *]. My worry is that people will be surprised and confused when the former always fails. It would be nice if we could tweak the rules so that it didn't, but I'm not sure how to at the moment.

Maybe we need some sort of tri-state logic involving false, soft-true and hard-true. A thing that is absent, and can be absent, is a soft-true, whereas something that is present and is permitted to be present is a hard-true. Using ! to mean "only one can be true", we'd have a logic table (which could be iteratively applied to expressions like a ! b ! c ! d) like:

false ! hard-true = hard-true
hard-true ! soft-true = hard-true
hard-true ! hard-true = false (and shortcut exit)
soft-true ! soft-true = soft-true
false ! soft-true = soft-true
false ! false = false

A final result of false would mean validation failed, but a result of either soft-true or hard-true would mean validation passed.
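The table above could be sketched in Python roughly as follows. The names `Tri`, `combine`, `exclusive_choice` and `passes` are invented for this illustration. Note that the shortcut exit matters when the operator is applied iteratively: without it, hard-true ! hard-true would give false, and a later hard-true would turn that back into hard-true.

```python
from enum import Enum

class Tri(Enum):
    FALSE = 0
    SOFT = 1   # absent, and allowed to be absent
    HARD = 2   # present, and allowed to be present

def combine(a, b):
    """The pairwise '!' operator from the table (order does not matter)."""
    if a is Tri.HARD and b is Tri.HARD:
        return Tri.FALSE
    return a if a.value >= b.value else b

def exclusive_choice(states):
    """Apply '!' left to right, exiting as soon as two hard-trues meet."""
    result = Tri.FALSE
    for state in states:
        if result is Tri.HARD and state is Tri.HARD:
            return Tri.FALSE  # shortcut exit
        result = combine(result, state)
    return result

def passes(result):
    # Either soft-true or hard-true means validation passed.
    return result is not Tri.FALSE
```

With this scheme, a choice such as ("foo" * | "bar" * ) evaluated against a single "foo" would be hard-true ! soft-true, which is hard-true and therefore passes.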

Any comp sci whizzes out there that can comment?!

@johnwcowan

On Thu, Sep 1, 2016 at 8:56 AM, Andrew Newton wrote:

Take the following JSON array:

[ 1, "foo", "fuzz" ]

The item at position 2 is "foo", and there can only be one item at
position 2.

Now here is where things get interesting. Does the following rule match
against the array above?

$a1 = [ integer , ("foo" * | "bar" * ), string *]

What about this rule?

$a2 = [ integer , ("foo" + | "bar" + ), string *]

From what I believe (and what my software is telling me), $a1 fails while
$a2 does not.

I think that's a very strong counterexample to the notion that | (choice)
is XOR rather than IOR. Using the (IMO correct) IOR interpretation, $a1
will succeed, because it says "an integer, followed by a choice between
zero or more foos and zero or more bars, followed by zero or more strings."
Your example succeeds because it has zero or more foos (in fact, just one),
so the choice is satisfied. Then the final string * picks up the fuzz, and
that's that. Indeed, $a1 is equivalent to [integer, string *].

$a2 succeeds because at least one branch of the choice succeeds: after the
integer, there is one or more foo matching foo+, and then fuzz matching
string*.

But both sides can also succeed. If the test string is [13, "buzz"], then
$a1 succeeds because a choice succeeds if either branch succeeds (IOR).
That is, foo* succeeds because there are no foos, and bar* succeeds because
there are no bars. $a2 however would fail on this test string, because it
requires either a foo or a bar and gets neither.

This is how string regular expressions behave, and we should not violate
the expectation that our regular expressions applied to sequences (arrays or
parts of arrays) will behave exactly the same way.
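The claims above can be checked against an ordinary string regex engine with a small Python sketch. The encoding is an assumption made purely for illustration: each array item becomes one character ("i" for an integer, "f" for "foo", "b" for "bar", "s" for any other string), and since "foo" and "bar" are themselves strings, string * becomes `[fbs]*`. The names `encode`, `A1`, `A2`, `PLAIN` and `matches` are hypothetical.

```python
import re

def encode(items):
    """Map each array item to a single character so `re` can match it."""
    chars = []
    for item in items:
        if isinstance(item, int) and not isinstance(item, bool):
            chars.append("i")
        elif item == "foo":
            chars.append("f")
        elif item == "bar":
            chars.append("b")
        elif isinstance(item, str):
            chars.append("s")
        else:
            chars.append("?")  # anything else never matches
    return "".join(chars)

STRING = "[fbs]"
A1 = re.compile(f"i(?:f*|b*){STRING}*")   # [ integer, ("foo" * | "bar" *), string * ]
A2 = re.compile(f"i(?:f+|b+){STRING}*")   # [ integer, ("foo" + | "bar" +), string * ]
PLAIN = re.compile(f"i{STRING}*")         # [ integer, string * ]

def matches(rule, items):
    return rule.fullmatch(encode(items)) is not None
```

Under this encoding, both rules accept [1, "foo", "fuzz"], while [13, "buzz"] is accepted by `A1` and rejected by `A2`, matching the description above.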

@anewton1998
Contributor

anewton1998 commented Sep 2, 2016

I agree that regular expressions are more like IOR. See this example:
http://rubular.com/r/6tacOwEuWo

That said, I'm not sure regular expressions are a good ideal as lots of people hate them. Nor do I think following their guidance is particularly helpful here. Let me explain.

If what I mean by ("bar"*|"foo"*) is that I am expecting either ["foo","foo"] or ["bar","bar"] then neither IOR nor XOR deliver:

rule: [ "foo" * | "bar" * ]

example            Expected  IOR   XOR
["foo", "foo" ]    yes       true  false
["foo", "bar" ]    no        true  false
["bar", "bar" ]    yes       true  false

In other words, IOR yields false positives and XOR yields false negatives.

Unfortunately, that leaves us with two options: 1) delve into tri-state logic as @codalogic has hinted at, or 2) simply spell out the pitfall in the spec so people understand the issue.

I'm not a fan of option 1 at present, so I say we go with option 2. For the sake of the spec, all choice combiners would be XOR: XOR makes sense for objects and unordered arrays, and there is no good OR for ordered arrays, so the only argument there is consistency.

@anewton1998
Contributor

BTW, not that I've thought this through but instead of 3 state logic, perhaps we can add a modifier to the repetition that indicates exclusivity. As in *! or +! which would mean zero or more exclusively or one or more exclusively. Just a thought.

@johnwcowan

On Fri, Sep 2, 2016 at 11:18 AM, Andrew Newton [email protected]
wrote:

If what I mean by ("bar"*|"foo"*) is that I am expecting either
["foo","foo"] or ["bar","bar"] then neither IOR nor XOR deliver:

rule: [ "foo" * | "bar" * ]
example            Expected  IOR   XOR
["foo", "foo" ]    yes       true  true
["foo", "bar" ]    no        true  false
["bar", "bar" ]    yes       true  true

I think we must be talking past each other. The rule reads "any number of
foos, including zero, or else any number of bars, including zero" when I
read it. So ["foo","foo"] satisfies the rule, as does ["bar","bar"], as
does [] (because it is IOR, not XOR). But ["foo","bar"] cannot satisfy it,
because once you have a "foo" you are committed to all the rest (if any)
being "foo". This is exactly your "expected" column.

So I do not understand what you mean by IOR and XOR, because it does not
seem to be what I mean. I mean by IOR that either or both sides of a choice
are satisfied, and by XOR that either (but not both) sides of a choice are
satisfied. The precedents are for IOR, and I prefer IOR.

John Cowan http://www.ccil.org/~cowan [email protected]
And through this revolting graveyard of the universe the muffled,
maddening beating of drums, and thin, monotonous whine of blasphemous
flutes from inconceivable, unlighted chambers beyond Time; the
detestable pounding and piping whereunto dance slowly, awkwardly, and
absurdly the gigantic tenebrous ultimate gods --the blind, voiceless,
mindless gargoyles whose soul is Nyarlathotep. (Lovecraft)
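
A minimal Ruby sketch of the branch-level reading described in this comment, for the rule [ "foo" * | "bar" * ]. It assumes each branch is tested against the whole array, and the names FOO_STAR, BAR_STAR, ior? and xor? are invented for the illustration.

```ruby
# Each branch is a predicate over the whole array (branch-level reading).
FOO_STAR = ->(items) { items.all? { |x| x == "foo" } }  # "foo" *
BAR_STAR = ->(items) { items.all? { |x| x == "bar" } }  # "bar" *

# IOR: either or both branches are satisfied.
def ior?(items)
  FOO_STAR.call(items) || BAR_STAR.call(items)
end

# XOR: exactly one branch is satisfied.
def xor?(items)
  FOO_STAR.call(items) ^ BAR_STAR.call(items)
end

[["foo", "foo"], ["foo", "bar"], ["bar", "bar"], []].each do |items|
  puts "#{items.inspect}: IOR=#{ior?(items)} XOR=#{xor?(items)}"
end
```

With this reading the two interpretations differ only on the empty array, where both branches are satisfied at once.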

@anewton1998
Contributor

Sorry, I was fighting with the markdown syntax for table and made a mistake. I subsequently fixed it but that probably did not come through in email. Here it is with the empty case:

rule: [ "foo" * | "bar" * ]

example            Expected  IOR   XOR
["foo", "foo" ]    yes       true  false
["foo", "bar" ]    no        true  false
["bar", "bar" ]    yes       true  false
[ ]                ??        true  false

I see your point, but I need to think about this because it appears that with IOR getting around the mixed case and the empty case takes a lot more work than with XOR.

@codalogic
Contributor Author

I knocked up the following Ruby:

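# Column headings: one column per regular expression under test.
def headings
    print "i(f*|b*)s  i(?:f+|b+)s\n"
end

# Print whether each regular expression matches the input string ip.
def evaluate( ip )
    print( (/i(f*|b*)s/ =~ ip) ? '  true     ' : '  false    ' )
    print( (/i(?:f+|b+)s/ =~ ip) ? '  true     ' : '  false    ' )
    print " : #{ip}\n"
end

headings
evaluate 'is'
evaluate 'iffs'
evaluate 'ibbs'
evaluate 'ifbs'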

and got the following results:

i(f*|b*)s  i(?:f+|b+)s
  true       false     : is
  true       true      : iffs
  true       true      : ibbs
  false      false     : ifbs

Significantly (for me anyway :-) ), the input 'is' yields true for i(f*|b*)s.

(Sorry must dash - people coming!!!)

@codalogic
Contributor Author

Thinking some more over the weekend, I think we can proceed by having two flags that record the result of evaluating a branch of a choice: non-absent match and absent match. As we go from branch to branch, multiple matches by virtue of being absent are permitted, but only one non-absent match is permitted (e.g. short-circuit when a second non-absent match is found).

In spec language we could say something like:

Branches of a choice may permit the absence of content by means of their repetition property. For example [ integer * | string * ]. If content is present that matches a branch of the choice, then only one branch of the choice may match it. If content is absent, then each branch that permits absent content is permitted to match it.

Effectively, this is XOR for content that is present and IOR for content that is absent.
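
One possible reading of the two-flag idea, sketched in Ruby. The outcome labels :present, :absent and :none and the method name choice_valid? are assumptions made for this sketch; the thread does not define how a branch reports its result.

```ruby
# Outcome of evaluating one branch of a choice (hypothetical labels):
#   :present  the branch matched content that is actually there
#   :absent   the branch matched only because it permits absence
#   :none     the branch did not match
def choice_valid?(outcomes)
  present = 0
  absent  = false
  outcomes.each do |outcome|
    case outcome
    when :present
      present += 1
      return false if present > 1   # short-circuit on a second non-absent match
    when :absent
      absent = true
    end
  end
  # Valid with exactly one non-absent match, or with no non-absent match
  # and at least one branch that matched by absence.
  present == 1 || absent
end
```

For [ integer * | string * ] and the empty array, both branches would report :absent and the choice is accepted; two :present outcomes are rejected.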

@anewton1998
Contributor

Your example was an array. Does this apply to objects and unordered arrays as well?

@codalogic
Contributor Author

I think so. If we have:

{ ("foo" : string ? | "bar" : integer ?), "baz" : integer }

I think the following should be valid:

{ "baz" : 12 }
{ "baz" : 12, "foo" : "wibble" }
{ "bar" : 11, "baz" : 12 }

But not:

{ "baz" : 12, "bar" : 11, "foo" : "wibble" }
{ "baz" : 12, "foo" : 13 }      // Wrong type for foo

Does that look reasonable?
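
The expected results listed here can be written down as a hand-coded check. This Ruby sketch hard-wires the single rule { ("foo" : string ? | "bar" : integer ?), "baz" : integer } and represents JSON objects as Ruby hashes; valid? is a name invented for the example, not a general JCR validator.

```ruby
# Hand-written check for:
#   { ("foo" : string ? | "bar" : integer ?), "baz" : integer }
def valid?(obj)
  return false unless obj["baz"].is_a?(Integer)                 # "baz" is required
  return false if obj.key?("foo") && obj.key?("bar")            # at most one branch present
  return false if obj.key?("foo") && !obj["foo"].is_a?(String)  # wrong type for foo
  return false if obj.key?("bar") && !obj["bar"].is_a?(Integer) # wrong type for bar
  true
end

puts valid?({ "baz" => 12 })
puts valid?({ "baz" => 12, "bar" => 11, "foo" => "wibble" })
```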

@johnwcowan

On Mon, Sep 5, 2016 at 6:49 AM, Pete Cordell [email protected]
wrote:

Branches of a choice may permit the absence of content by means of their
repetition property. For example [ integer * | string * ]. If content is
present that matches a branch of the choice, then only one branch of the
choice may match it. If content is absent, then each branch that permits
absent content is permitted to match it.

That would mean that ["foo", "bar"] is not equivalent to ["foo", "bar" |
"foo", "bar"], as the former matches a foo followed by a bar, and the
latter matches nothing (as it must match both branches or neither).

I do not understand the point of all this special pleading. Why not simply
use the same rules as all other regular expression engines, namely that a
choice matches if either or both of its arms match? No one else (Posix,
Perl-compatible, XSD, RNG, etc.) has a problem with
this. What problem are we trying to solve by doing things differently?

John Cowan http://vrici.lojban.org/~cowan [email protected]
A witness cannot give evidence of his age unless he can remember being born.
--Judge Blagden

@codalogic
Contributor Author

I do not understand the point of all this special pleading. Why not simply use the same rules as all other regular expression engines, namely that a choice matches if either or both of its arms match? No one else (Posix, Perl-compatible, XSD, RNG, etc.) has a problem with this. What problem are we trying to solve by doing things differently?

I disagree with that. If I have the regular expression i(f*|b*)s then it does not match the string ifbs. It's that behaviour I think we should replicate, because that's what people would expect.

Part of the problem is that REs are looking for a path across a sequence of tokens. So the expression above would be represented in a state machine as:

 /---f*---\
i         s
 \---b*---/

whereas at the moment we're trying to replicate that behaviour by looking for presence or absence on a token level.

@johnwcowan

On Mon, Sep 5, 2016 at 1:28 PM, Pete Cordell [email protected]
wrote:

I do not understand the point of all this special pleading. Why not simply
use the same rules as all other regular expression engines, namely that a
choice matches if either or both of its arms match? No one else (Posix,
Perl-compatible, XSD, RNG, etc.) has a problem with this. What problem are
we trying to solve by doing things differently?

I disagree with that. If I have the regular expression i(f*|b*)s then it
does not match the string ifbs. It's that behaviour I think we should
replicate, because that's what people would expect.

But that is the behavior I describe. In the case of "ifbs", neither the
left branch, which says "any number of fs (including zero) between the i
and s", nor the right branch, which says "any number of bs (including zero)
between the i and s" is satisfied by the substring "fb". On the left
branch, we get an "f", which is fine, but then we get a bogus "b" that
forces failure. On the right branch, the "b" might be bully, but there is
a foreign "f" there barring the way.

In other words, the choice is satisfied if either the left or the right
branch or both match, but in this case, neither arm matches so the choice
fails. "And I am right and you are right and all is right as right can be!"

@codalogic
Contributor Author

OK, I now understand what you mean by 'both match'. I was less bothered about patterns such as ( ("foo", "bar") | ("foo", "bar") ), but patterns like ( ("foo", "bar"?) | ("foo", "baz"?) ) would be more problematic.

Where I think we have a problem is as follows. If we had:

{ "i":int8, ("f":int8 | "b":int8), "s":int8 }

should the following JSON be valid:

{ "b":1, "i":1, "s":1, "f":1 }

?

I believe not.

With IOR, a parser could see "f" present and "b" present and declare the choice valid. XOR was an attempt to say that if "f" was present, then "b" must not be.

I think this is a key difference between ordered and unordered constructs.

With ordered constructs (e.g. [ "i", ("f" | "b"), "s" ]) we're saying type i followed by type f or b followed by type s. It effectively describes the set of valid 'paths' through valid JSON input. This is similar to regular expressions like i(f|b)s which is saying character i followed by character f or b followed by character s. So, belatedly answering #88 (comment) , this is why I have been looking at regular expression logic for ordered constructs.

Unordered doesn't seem to map to regular expressions because there's no followed by concept. I'm also not sure that we could 'canonicalise' an arbitrary JSON input order into one true order to which we could apply a regular expression type logic. We also want to allow for unknown items. Hence we're falling back to logic like you might find in an if statement.

In the case of unordered, to get the result I want, I think when we get:

{ "i":int8, ("f":int8 | "b":int8), "s":int8 }

we have to modify it to:

{ "i":int8, ("f":int8, "b":int8 *0 | "b":int8, "f":int8 *0), "s":int8 }

Then we can use IOR logic.

I would even go further and say, at least as a reference implementation (there may be more efficient ways to do it), we flatten the entire expression into a single choice of sub-sequences. So we'd end up with:

{ "i":int8, "f":int8, "s":int8, "b":int8 *0 | "i":int8, "b":int8, "s":int8, "f":int8 *0 }

Then if any branch is true then the JSON input is considered valid.

So to compute that, we could do the following steps:

  1. Compute the union of all members, e.g. "i":int8 - "f":int8 - "b":int8 - "s":int8

  2. Multiply out the expression so we flatten it to a choice of sub-sequences (employing the idea that OR is like + in normal maths, and AND is like *). For the example this yields:

    { "i":int8, "f":int8, "s":int8 | "i":int8, "b":int8, "s":int8 }

  3. Augment each sub-sequence with the members from the union of members that are not present in the sub-sequence, (noting that their type is irrelevant if we want to prevent it), setting their permitted repetition to 0; yielding:

    { "i":int8, "f":int8, "s":int8, "b":any *0 | "i":int8, "b":int8, "s":int8, "f":any *0 }

  4. We test the occurrences and types of what appears in the JSON input with what is permitted by each sub-sequence. If any sub-sequence yields true then the JSON instance is valid.

The following would be considered valid:

{ "i":1, "f":1, "s":1 }
{ "f":1, "i":1, "s":1 }    // Order is not significant
{ "i":1, "b":1, "s":1 }
{ "i":1, "b":1, "s":1, "x":1 }   // "x" is unknown so ignored

The following would be considered invalid:

{ "b":1, "i":1, "s":1, "f":1 }   // Both "f" and "b" can't be present
{ "b":"z", "i":1, "s":1 }       // "b" has the wrong type
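
The four steps can be sketched in Ruby for this one example. The sketch skips the general multiplying-out of step 2 and writes the two sub-sequences by hand; the hash representation and the names INT8, SUB_SEQUENCES, UNION, matches? and valid? are assumptions for illustration.

```ruby
# int8 type check: an integer in the range -128..127.
INT8 = ->(v) { v.is_a?(Integer) && v.between?(-128, 127) }

# Step 2 result, written out by hand: a choice of sub-sequences.
SUB_SEQUENCES = [
  { "i" => INT8, "f" => INT8, "s" => INT8 },
  { "i" => INT8, "b" => INT8, "s" => INT8 }
]

# Step 1: union of all member names.
UNION = SUB_SEQUENCES.flat_map(&:keys).uniq

# Steps 3 and 4: a sub-sequence matches when its own members are present
# with the right type and the members it lacks (repetition *0) are absent.
def matches?(sub_sequence, obj)
  forbidden = UNION - sub_sequence.keys
  sub_sequence.all? { |name, type| obj.key?(name) && type.call(obj[name]) } &&
    forbidden.none? { |name| obj.key?(name) }
end

# The instance is valid if any sub-sequence matches.
def valid?(obj)
  SUB_SEQUENCES.any? { |sub_sequence| matches?(sub_sequence, obj) }
end
```

Members outside the union, such as "x", are never looked at, so they are ignored as the examples require.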

@codalogic
Contributor Author

This may not help a lot, but this is how XML schema describes the validation of model groups (aka sequences, choice, all). JSON's unordered objects are more like xs:all but xs:all can't accept choices as content (unless they changed it), so there's no option to cut-and-paste!

https://www.w3.org/TR/xmlschema11-1/#group-recognition

@anewton1998
Contributor

I have to say, I think I'm lost. I thought we were all good with XOR logic on objects and unordered arrays. The issue, I think, is with arrays. Do we have a basic agreement at least on this?

@codalogic
Contributor Author

XOR was looking good until John pointed out that rules like:

{ ( "a":int8, "b": int8 ?) | ("a":int8, "c":int8 ? ) }

would fail to validate the following JSON:

{ "a":12 }

because both legs of the choice would evaluate to true leading to a declared invalid result.

Granted, you could re-write the above to:

{ "a" : int8, ( "b":int8? | "c":int8? ) }

and have the input validate correctly, but it seems incorrect to have things that in Boolean logic terms are equivalent yield a different result depending on how you write it out.

Hence the proposal of Plan C, to flatten out all rules to a choice of sub-sequences. One benefit of this is there's no chance of getting different behaviour depending on how you write the Boolean expression because it's always reduced to a common form first.
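
The failure case for strict XOR can be reproduced with a small Ruby sketch. It evaluates each branch of { ( "a":int8, "b": int8 ?) | ("a":int8, "c":int8 ? ) } against the whole object and ignores members a branch does not mention; LEFT, RIGHT, optional_int8?, xor_valid? and ior_valid? are names made up for the example.

```ruby
# int8 type check: an integer in the range -128..127.
INT8 = ->(v) { v.is_a?(Integer) && v.between?(-128, 127) }

# True when the member is absent or, if present, is an int8.
def optional_int8?(obj, name)
  !obj.key?(name) || INT8.call(obj[name])
end

# ( "a":int8, "b":int8 ? )
LEFT  = ->(obj) { INT8.call(obj["a"]) && optional_int8?(obj, "b") }
# ( "a":int8, "c":int8 ? )
RIGHT = ->(obj) { INT8.call(obj["a"]) && optional_int8?(obj, "c") }

# XOR: exactly one branch may be true.
def xor_valid?(obj)
  LEFT.call(obj) ^ RIGHT.call(obj)
end

# IOR: at least one branch must be true.
def ior_valid?(obj)
  LEFT.call(obj) || RIGHT.call(obj)
end

puts xor_valid?({ "a" => 12 })
puts ior_valid?({ "a" => 12 })
```

For { "a":12 } both branches are true, so XOR rejects the object while IOR accepts it.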

@anewton1998
Contributor

anewton1998 commented Sep 6, 2016

So if I understand this, we're back where we started other than to explicitly state that the choice combiner means IOR. Am I right about this?

The problem with Plan C, IMHO, is that it is too complicated to explain to JCR readers. Somebody troubleshooting why their JCR isn't working the way they want will probably be lost in the flattening rules. It seems to me that it is easier to say: 1) read the rules from left to right, 2) if one matches then "bingo!", 3) you must take into consideration zero-length matches (which, if they are familiar with regular expressions, they probably are already doing).

@johnwcowan

On Tue, Sep 6, 2016 at 5:53 PM, Andrew Newton [email protected]
wrote:

The problem with Plan C, IMHO, is that it is too complicated to explain to
JCR readers. Somebody troubleshooting why their JCR isn't working the way
they want will probably be lost in the flattening rules.

Agreed. I haven't thought enough about objects, still less unordered
arrays, to make up my mind.

Are we trying to cater for JSON objects with non-unique keys? Original
JSON doesn't actually forbid them, but lots of JSON libraries don't handle
them well: one of the duplicate keys gets dropped. If we can ignore that
case, then I think RELAX NG's treatment of attributes will be a usable
model for objects.

Let me know.

John Cowan http://vrici.lojban.org/~cowan [email protected]
If [Tim Berners-Lee] has seen farther than others,
it is because he is standing on a stack of dwarves. --Mike Champion

@anewton1998
Contributor

On Tue, Sep 6, 2016 at 8:26 PM, johnwcowan [email protected] wrote:

Agreed. I haven't thought enough about objects, still less unordered
arrays, to make up my mind.

Are we trying to cater for JSON objects with non-unique keys? Original
JSON doesn't actually forbid them, but lots of JSON libraries don't handle
them well: one of the duplicate keys gets dropped. If we can ignore that
case, then I think RELAX NG's treatment of attributes will be a usable
model for objects.

Let me know.

As far as I can tell, the model is similar if not the same. We are not
testing for order of JSON members in objects, nor are we making allowances
for duplicate keys. Though we should make that last item explicit in the
draft.

@codalogic
Contributor Author

Yes, it would be useful to hear how RNG does it.

The only aspect of JCR that resembles non-unique keys is when a regular expression is used to specify member names (e.g. /^p\d$/ : string *). The RNG rules might need to be extended to cater for this.

@codalogic
Contributor Author

In principle the multiplying out / flattening is just the same as how you'd multiply out maths expressions. So (where $a etc are something like $a = "a":integer):

{ $a, ($b | $c), ($d | $e) }

is analogous to:

a x (b + c) x (d + e)

which becomes:

abd + abe + acd + ace

or:

{ ($a,$b,$d) | ($a,$b,$e) | ($a,$c,$d) | ($a,$c,$e) }
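
The multiplying-out step maps onto a Cartesian product. In this Ruby sketch each factor of the sequence is a list of alternatives; the rule names are plain strings and multiply_out is a hypothetical helper, so repetitions and nested groups are not handled.

```ruby
# Each factor is a list of alternatives; a plain member is a one-element list.
# { $a, ($b | $c), ($d | $e) }
FACTORS = [['$a'], ['$b', '$c'], ['$d', '$e']]

# Multiply out a sequence of choices into a choice of sub-sequences.
def multiply_out(factors)
  factors.first.product(*factors.drop(1))
end

multiply_out(FACTORS).each do |sub_sequence|
  puts "(#{sub_sequence.join(',')})"
end
```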

One problem I identified overnight is how do you multiply out groups that have their own repetition count in addition to repetition counts on members within the group, e.g.:

{ $a, ($b, $c? | $d)? }

I don't think it's insurmountable though. It might look like:

$a, ($b, $c ?) ? | $a, ($d)?

But if RNG delivers what we need I'm all for that.

@anewton1998
Contributor

Relax NG's specification for choice is here: http://www.relaxng.org/spec-20011203.html#choice-pattern
I'm not versed enough in the mathematical notation to understand it, but from what I can tell it is xor. Nor can I tell how it treats unknown attributes.

In rereading this thread, I can see that we have found cases where IOR and XOR have different utility with objects, and XOR is less useful with arrays. Here are some options:

  1. Simply use IOR all the time.
  2. Use IOR for arrays and XOR for objects and unordered arrays. I'm not particularly fond of this option as I think it will lead to confusion with the same symbol meaning different things at different times.
  3. Introduce another combiner symbol to mean XOR (such as \) or maybe the double pipe (||). This puts the decision in the hands of the user, who is closer to the problem being solved. Also since we previously noted that mixing AND and OR is a recipe for insanity, we should probably follow that rule here as well, as it just keeps things simple.
  4. Do 1 now, and 3 in a later specification.

@codalogic
Contributor Author

I think it's saying that <choice> p1 p2 </choice> can validate p1 or p2 but not (by inference of the absence of a rule) p1 p2.

Given both of you are not happy with flattening the entire expression, I propose skipping that step, but still 'augmenting' the branches of the choice so that all of them contain the union of all members mentioned in the entire choice, annotating the ones that are added by the augmentation process as 'must be absent'.

So:

{ ( "a":int8, "b": int8 ?) | ("a":int8, "c":int8 ? ) } 

is augmented and processed as:

{ ( "a":int8, "b": int8 ?, @{not} "c":any) | ("a":int8, "c":int8 ?, @{not} "b":any ) }

Then do IOR.

That allows:

{ "a":12 }
{ "a":12, "b":1 }
{ "a":12, "c":2 }
{ "a":12, "d":"blah" }

But not:

{ "a":12, "b":1, "c":2 }

Which I think is reasonable.
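
A Ruby sketch of this augment-then-IOR step for a single, flat choice. It assumes each branch is a hash from member name to a type check plus an optional flag; BRANCHES, UNION, branch_matches? and valid? are illustrative names, and nested choices are out of scope here.

```ruby
# int8 type check: an integer in the range -128..127.
INT8 = ->(v) { v.is_a?(Integer) && v.between?(-128, 127) }

# Each branch maps a member name to [type check, optional?].
# { ( "a":int8, "b":int8 ? ) | ( "a":int8, "c":int8 ? ) }
BRANCHES = [
  { "a" => [INT8, false], "b" => [INT8, true] },
  { "a" => [INT8, false], "c" => [INT8, true] }
]

# Union of every member name mentioned anywhere in the choice.
UNION = BRANCHES.flat_map(&:keys).uniq

def branch_matches?(branch, obj)
  must_be_absent = UNION - branch.keys   # the augmented @{not} members
  return false if must_be_absent.any? { |name| obj.key?(name) }
  branch.all? do |name, (type, optional)|
    obj.key?(name) ? type.call(obj[name]) : optional
  end
end

# IOR over the augmented branches.
def valid?(obj)
  BRANCHES.any? { |branch| branch_matches?(branch, obj) }
end
```

Members outside the union, such as "d", are not constrained by either branch and are therefore ignored.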

@anewton1998
Contributor

{ ( "a":int8, "b": int8 ?) | ("a":int8, "c":int8 ? ) }

It seems to me people would more instinctively write that as

{ "a":int8, ( "b": int8 || "c": int8 ) ? } ; assumes || is xor

Also when reading the first rule, are we telling people the | means IOR or XOR? If we say IOR but they get XOR behavior that would be confusing. And would the behavior be the same for arrays?

@codalogic
Contributor Author

While { "a":int8, ( "b": int8 || "c": int8 ) ? } might be the more logically reduced form, I think they should also be able to write { ( "a":int8, "b": int8 ?) | ("a":int8, "c":int8 ? ) } and get the same result. (It might better document what they are trying to do for example.)

I suggest we treat the reduced form as:

{ "a":int8, ( "b": int8, @{not} "c":any | "c": int8, @{not} "b":any ) ? }

and then do IOR of the branches.

Also when reading the first rule, are we telling people the | means IOR or XOR? If we say IOR but they get XOR behavior that would be confusing.

If we tell people that | means choice and we say we implement it by augmenting the branches with any terms that are absent, and then saying the result is valid if any branch yields true then we should be OK.

And would the behavior be the same for arrays?

IMO objects and unordered arrays should be treated similarly. They're concerned with how often a member appears in the instance, and what follows what is not important. Ordered arrays need to capture what follows what, so we're following a path through the JSON instance and the above mechanism doesn't support that.

@anewton1998
Contributor

I've been playing with your idea with examples on my whiteboard so that I can truly understand it, and going back to the examples you have given. That has been instructive for me.

I think it is interesting that it acts like IOR in some cases and XOR in other cases, but ultimately I feel this is also its chief problem with respect to readability. As a user of JCR, to truly understand how the choice acts I have to rewrite the rules, and that is not a process I think most users will find natural or easy. By contrast, I think most people will find it to be easier to apply their existing knowledge of grouping and IOR and XOR to read and understand a rule. To me, this is similar to the regular expression test you provided in this thread: my reading of the regular expression did not match some of the results and it is not obvious to me why. I had to do some reading on zero-length matching and longest matches, etc... yesterday to try to understand it.

I'm also somewhat concerned about the implementation. Supporting AND and IOR requires a simple loop through the rules with the ability to short circuit that loop. XOR requires a tad more work and requires at least two matches to short circuit, but all three can be done in the same loop (I'm sure there are other ways to implement it). To support the combined IOR/XOR logic, there needs to be a rule rewriting phase first then a run through the AND and IOR loop. This is more complex to implement and requires more runtime (I'm more concerned about the former than the latter to be honest).
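
A rough Ruby sketch of the three loops described here, with each rule modelled as a callable that returns true or false for an input. The method names are invented for the sketch and this is not how any existing JCR implementation is known to be structured.

```ruby
# AND: stop at the first rule that fails.
def and_match?(rules, input)
  rules.each { |rule| return false unless rule.call(input) }
  true
end

# IOR: stop at the first rule that matches.
def ior_match?(rules, input)
  rules.each { |rule| return true if rule.call(input) }
  false
end

# XOR: cannot stop at the first match; it short-circuits only once a
# second match is found, otherwise it runs through all the rules.
def xor_match?(rules, input)
  matches = 0
  rules.each do |rule|
    next unless rule.call(input)
    matches += 1
    return false if matches == 2
  end
  matches == 1
end
```

The combined IOR/XOR behaviour would need a rule-rewriting pass before a loop like ior_match? is run, which is the extra implementation cost raised above.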

@codalogic
Contributor Author

I think it's appropriate to differentiate between 'casual' users and implementers. The story you tell each one can be slightly different.

So for users you can say something like:


The | symbol represents a choice. If the solution spaces of the expressions in the left and right branches do not overlap (see below), then only the left or the right branch, but not both, can be true. For example, given the choice:

{ "a":int8 | "b":int8 }

then the following JSON instances are valid:

{ "a" : 1 }
{ "b" : 1 }

The following is invalid:

{ "a" : 1, "b" : 1 }

For a rule where the solution spaces of the branches do overlap, e.g.:

{ ( "a":int8, "b":int8 ?) | ("a":int8 ?, "b": int8) }

then the following JSON instances are valid:

{ "a":1 }
{ "b":1 }
{ "a":1, "b":1 }

Note that in the case of { "a":1, "b":1 } both branches are true and as such, the solution space is said to overlap. Appendix ??? describes a reference algorithm for how to implement this behavior.


Then in Appendix ??? we go into the detail of augmentation etc.

Regardless of what solution we adopt I think we should describe it as above. The casual user can then just rely on the examples, and draw on their experiences of regular expressions, XML schema, whatever. Only if they get a result they don't expect would they have to delve into the logic of the reference implementation. The less likely they are to get a surprising result, the less likely they are going to need to look into the detail.

@anewton1998
Contributor

That's very good prose for describing the basics to a user. The actual specification I think is that each branch of a choice must contain the negation of the relative complement (set theory definition) of the rules of that branch with respect to the rules of each other branch, and the rules of these relative complements must have type any.

That being said, I'm worried that we are creating a problem. Call it a gut instinct, but I worry because I cannot find any other schema language or the like that has solved this problem in such a way, and I'm not a mathematician or familiar enough with set theory to prove that it will not be a problem. But I have been tinkering, and...

Consider the following, where I'm using $ rule names even in the JSON, and that | is the CHOICE we are describing and |- is pure IOR and ! for @{not}.

{ ( $a, ( $b | $c ) ) | ( $a, ( $b | $d ) ) }

The expectation being that these JSON objects are valid:

{ $a, $b }
{ $a, $c }
{ $a, $d }

The first step is to reduce the inner most choices to IOR with relative complements:

{ ( $a, ( ( $b, !$c ) |- ( $c, !$b ) ) ) | ( $a, ( ( $b, !$d ) |- ( $d, !$b ) ) ) }

Then repeat on the outer most choice:

{ ( $a, ( ( $b, !$c ) |- ( $c, !$b ) ), !( ( $b, !$d ) |- ( $d, !$b ) ) ) |- ( $a, ( ( $b, !$d ) |- ( $d, !$b ) ), !( ( $b, !$c ) |- ( $c, !$b ) ) ) }

From what I can tell, { $a, $b } will not validate: on the left side of the IOR, ( $b, !$d ) passes, so !( ( $b, !$d ) |- ( $d, !$b ) ) is false. The right side of the IOR has a similar problem.
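
This expansion can be checked mechanically by treating each rule as a presence flag. The Ruby sketch below does that for the expression with negated sub-expressions; it considers only whether $a to $d are present (types are ignored), and validates? is a name chosen for the example.

```ruby
# present is the list of rule names found in the JSON object.
def validates?(present)
  a, b, c, d = %w[a b c d].map { |name| present.include?(name) }
  b_or_c = (b && !c) || (c && !b)    # ( $b, !$c ) |- ( $c, !$b )
  b_or_d = (b && !d) || (d && !b)    # ( $b, !$d ) |- ( $d, !$b )
  left   = a && b_or_c && !b_or_d    # left side of the outer IOR
  right  = a && b_or_d && !b_or_c    # right side of the outer IOR
  left || right
end

puts validates?(%w[a b])
puts validates?(%w[a c])
puts validates?(%w[a d])
```

Under these assumptions { $a, $c } and { $a, $d } are accepted but { $a, $b } is rejected, because $b satisfies both inner choices and each side negates the other.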

@anewton1998
Contributor

The other thing that gives me pause is the example from yesterday:

$a = { ( "a":int8, "b": int8 ?) | ("a":int8, "c":int8 ? ) }

Another way to write this is:

$b = ( { "a":int8, "b": int8 ? } | { "a":int8, "c":int8 ? } )

Given { "a":1, "b": 2, "c": 3 }, it is invalid for $a, assuming | is CHOICE, but valid for $b no matter if | is pure IOR or pure XOR.

@codalogic
Contributor Author

codalogic commented Sep 8, 2016

I was proposing only augmenting with members that hadn't already been referred to in a branch, not sub-expressions. So I agree with your first expansion. For the second expansion I get:

{ ( $a, ( ($b, !$c) | ($c, !$b) ), !$d ) | ( $a, ( ($b, !$d) | ($d, !$b) ), !$c ) }

So in that case I think all of the following are valid, as desired:

{ $a, $b }
{ $a, $c }
{ $a, $d }

But the following would be invalid:

{ $a, $b, $c, $d }
{ $a, $c, $d }
{ $a, $b, $c }
{ $a, $b, $d }

If we just did IOR, without augmentation, then the following would also be considered valid:

{ $a, $b, $c }
{ $a, $b, $d }

which just seems wrong to me :-)
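
The same presence-flag trick shows the difference between the member-only augmentation and plain IOR. This Ruby sketch again ignores types and looks only at which of $a to $d are present; flags, augmented_valid? and plain_ior_valid? are hypothetical helper names.

```ruby
# Presence flags for $a, $b, $c and $d.
def flags(present)
  %w[a b c d].map { |name| present.include?(name) }
end

# { ( $a, ( ($b, !$c) | ($c, !$b) ), !$d ) | ( $a, ( ($b, !$d) | ($d, !$b) ), !$c ) }
def augmented_valid?(present)
  a, b, c, d = flags(present)
  left  = a && ((b && !c) || (c && !b)) && !d
  right = a && ((b && !d) || (d && !b)) && !c
  left || right
end

# { ( $a, ( $b | $c ) ) | ( $a, ( $b | $d ) ) } read as plain IOR
def plain_ior_valid?(present)
  a, b, c, d = flags(present)
  (a && (b || c)) || (a && (b || d))
end

[%w[a b], %w[a c], %w[a d], %w[a b c], %w[a b d]].each do |present|
  puts "#{present.join(' ')}: augmented=#{augmented_valid?(present)} " \
       "plain=#{plain_ior_valid?(present)}"
end
```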

@codalogic
Contributor Author

Regarding #88 (comment) , IMO $a and $b are very different constructs and you wouldn't expect them to be isomorphic (or whatever the right word is).

@anewton1998
Contributor

WRT the separate constructs, I agree but it is also in the class of things that should seem possible which is why we are discussing this new OR feature (we really need a name for it, like MERGE OR or MOR). It is to highlight this question: What is the behavior of | in groups contained in objects vs arrays vs unordered arrays, vs groups containing objects, etc etc?

WRT the subexpression exercise, I am now more confused. You said

I was proposing only augmenting with members that hadn't already been referred to in a branch, not sub-expressions.

Yet $d is in a sub-expression. Actually, it's two levels down in the object expression.

Additionally, if the operation only looks at the top OR, then there is no expansion of the sub-expressions, and so the resulting final expression is:

{ ( $a, ( $b | $c ), !$d ) | ( $a, ( $b | $d ), !$c ) }

I don't think that's what we want as { $a, $b ,$c } would be valid.

On the other hand, if we allowed for distinctions between the OR types so that XOR is allowed then the original expression could be written as (assuming |+ is XOR, |- is IOR, and | is MOR):

{ ( $a, ( $b |+ $c ) ) | ( $a, ( $b |+ $d ) ) }

This would then get rewritten as

{ ( $a, ( $b |+ $c ), !$d ) |- ( $a, ( $b |+ $d ), !$c ) }

This would validate:

{ $a, $b }
{ $a, $c }
{ $a, $d }

but not

{ $a, $b, $c }
{ $a, $c, $d }
{ $a, $b, $d }

@codalogic
Contributor Author

When I look at:

{ ( $a, ( $b | $c ) ) | ( $a, ( $b | $d ) ) }

there are 3 | expressions:

( $b | $c )
( $b | $d )
( $a, ( $b | $c ) ) | ( $a, ( $b | $d ) )

Starting with the first one, the union of all named members is:

$b and $c

Any member in the union that does not appear in a given branch is inserted into that branch as an @{not} member. So for the first | we get:

( ($b, !$c) | ($c, !$b) )

Similarly for the second | we end up with:

( ($b, !$d) | ($d, !$b) )

Substituting these into the third and final | we get:

    ( $a, ( ($b, !$c) | ($c, !$b) ) ) | ( $a, ( ($b, !$d) | ($d, !$b) ) )

The union of members is:

$a + $b + $c + $d

The left branch is missing $d, the right branch is missing $c. So the two branches become:

( $a, ( ($b, !$c) | ($c, !$b) ), !$d )
( $a, ( ($b, !$d) | ($d, !$b) ), !$c )

which leads to:

{ ( $a, ( ($b, !$c) | ($c, !$b) ), !$d ) | ( $a, ( ($b, !$d) | ($d, !$b) ), !$c ) }

I'll call it Augmented OR or AOR for the time being.

But let's just go with IOR for the time being and I'll try to write some code to see how AOR works out in a bit more depth.

(We should probably change the names of choice-combiner to or-combiner and sequence-combiner to and-combiner as well.)
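
The walk-through above can be turned into a small recursive procedure. The Ruby sketch below uses a made-up tree representation (nested arrays tagged :member, :not, :seq and :choice), augments choices from the innermost outwards, and then evaluates the result as IOR. It tracks only member presence and ignores types and repetition, so it is an illustration of the idea rather than a reference implementation.

```ruby
# Expression nodes (hypothetical representation, not JCR's parse tree):
#   [:member, "b"]          a member that must be present
#   [:not, "b"]             an @{not} member: must be absent
#   [:seq, n1, n2, ...]     sequence (AND)
#   [:choice, n1, n2, ...]  choice (|)

# All member names mentioned anywhere below a node.
def names(node)
  case node[0]
  when :member, :not then [node[1]]
  else node[1..-1].flat_map { |child| names(child) }.uniq
  end
end

# Augment every choice, innermost first, with @{not} members for the
# names that other branches mention but this branch does not.
def augment(node)
  case node[0]
  when :member, :not
    node
  when :seq
    [:seq, *node[1..-1].map { |child| augment(child) }]
  when :choice
    branches = node[1..-1].map { |child| augment(child) }
    union = branches.flat_map { |branch| names(branch) }.uniq
    augmented = branches.map do |branch|
      missing = union - names(branch)
      missing.empty? ? branch : [:seq, branch, *missing.map { |n| [:not, n] }]
    end
    [:choice, *augmented]
  end
end

# Evaluate an augmented tree against the list of member names present.
def valid?(node, present)
  case node[0]
  when :member then present.include?(node[1])
  when :not    then !present.include?(node[1])
  when :seq    then node[1..-1].all? { |child| valid?(child, present) }
  when :choice then node[1..-1].any? { |child| valid?(child, present) }
  end
end

# { ( $a, ( $b | $c ) ) | ( $a, ( $b | $d ) ) }
RULE = [:choice,
        [:seq, [:member, "a"], [:choice, [:member, "b"], [:member, "c"]]],
        [:seq, [:member, "a"], [:choice, [:member, "b"], [:member, "d"]]]]
AUGMENTED = augment(RULE)

puts valid?(AUGMENTED, %w[a b])
puts valid?(AUGMENTED, %w[a b c])
```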

@anewton1998
Contributor

Oddly, I just worked that out on my whiteboard and was about to comment "nevermind". :)

I believe the same works for unordered arrays. The only difference between the two algorithms is that for objects the augmented rules are always of type 'any' whereas with unordered arrays the type doesn't change (because the type is the item sought, whereas in objects the member name is being sought though a "match" also considers the type). Correct?

But let's just go with IOR for the time being and I'll try to write some code to see how AOR works out in a bit more depth.

By this do you mean let's proceed with the current spec as is, and update it to AOR in the next revision? If so, I'm in agreement (though I would like to add a note to -07 hinting at this). And I too want to see if I can code this up now that I feel I understand the algorithm more completely.

@codalogic
Contributor Author

Oddly, I just worked that out on my whiteboard and was about to comment "nevermind". :)

Hopefully it seems a lot simpler than you initially thought :)

I believe the same works for unordered arrays. The only difference between the two algorithms is that for objects the augmented rules are always of type 'any' whereas with unordered arrays the type doesn't change (because the type is the item sought, whereas in objects the member name is being sought though a "match" also considers the type). Correct?

That's exactly my thinking too. ORDERED arrays on the other hand need something else IMO.

But let's just go with IOR for the time being and I'll try to write some code to see how AOR works out in a bit more depth.

By this do you mean let's proceed with the current spec as is, and update it to AOR in the next revision?

Yes. Hopefully by then we'll be more confident with the algorithm (or not!), and maybe more community feedback on the expected behavior.

@anewton1998
Contributor

Opening up my code editor this afternoon, some questions came to me on the AOR algorithm. I think I know the answers, but wanted to confirm them.

Question 1: a @{not} in the choice

{ $a | $b | !$c }

produces the following rewrite

{ ($a, !$b, !$c) | ($b, !$a, !$c) | (!$c, !$a, !$b) }

correct?

Question 2: like question 1 but with repeat max of 0

{ $a | $b *0 }

produces the following rewrite

{ ($a, !$b) | ($b *0, !$a ) }

correct?

question 3: I think you answered this before, but what about possible repetition

{ $a | $b ? }

produces the following rewrite

{ ($a, !$b) | ($b?, !$a) }

correct?

For 2 and 3, the algorithm is essentially the same: don't copy over the repetition into the complement set. Correct?

question 4: with this last one I need to revert to straight JCR syntax

{ "a":int8 | "a": [ int8 ] }

would NOT produce this

{ ("a":int8 , @{not} "a": any) | ( "a":[int8], @{not} "a": any) }

essentially, there would be no rewrite for this case because we determine what is in common based solely on the member name or regex. And if it is a regex, there is no regex normalization for the matching, right? In other words, there is no attempt to match /foo/ to /^foo$/.

Question 5: like question 4 but for unordered arrays

@{unordered} [ string, ( int8 | [ int8 ] ) ]

produces the rewrite

@{unordered} [ string, ( ( int8, @{not} [int8] ) | ( [int8], @{not} int8 ) ) ]

correct? More to the point for unordered arrays, the type definitions in the complement are used "as-is". In other words, there's no attempt to say [ int8 ?] and [ int8 *1..0] are the same thing or that @{unordered} [ string, int8 ] and @{unordered} [ int8, string ] are the same thing. Just curious.

@codalogic
Contributor Author

Opening up my code editor this afternoon, some questions came to me on
the AOR algorithm. I think I know the answers, but wanted to confirm them.

Question 1: a @{not} in the choice

{ $a | $b | !$c }

produces the following rewrite

{ ($a, !$b, !$c) | ($b, !$a, !$c) | (!$c, !$a, !$b) }

correct?

Question 2: like question 1 but with repeat max of 0

{ $a | $b *0 }

produces the following rewrite

{ ($a, !$b) | ($b *0, !$a) }

correct?

Both | !$c and | $b *0 seem like corner cases to me. Unless you can think of a scenario where such a construct is really helpful, I'd say we can specify whatever behavior we like: in this case, whatever makes AOR simpler, which is what you have put above.

Question 3: I think you answered this before, but what about possible
repetition

{ $a | $b ? }

produces the following rewrite

{ ($a, !$b) | ($b?, !$a) }

correct?

Agreed.

For 2 and 3, the algorithm is essentially the same: don't copy over
the repetition into complement set. correct?

Agreed.

Question 4: with this last one I need to revert to straight JCR syntax

{ "a":int8 | "a": [ int8 ] }

would NOT produce this

{ ("a":int8 , @{not} "a": any) | ( "a":[int8], @{not} "a": any) }

essentially, there would be no rewrite for this case because we
determine what is in common based solely on the member name or regex.

Agreed

And if it is a regex, there is no regex normalization for the matching,
right? In other words, there is no attempt to match /foo/ to /^foo$/.

Agreed. Treat the regex as an opaque string. A case of "user beware". So the following would not match anything, but that's the coder's fault, not ours:

{ /^p-\d+$/ : integer | /^p-[0-9]+$/ : integer }
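
To make the "user beware" point concrete, here is a minimal Python sketch of how that rule would behave once AOR has rewritten it with the regexes compared as opaque strings. It is not jcrvalidator code; `branch`, `validates`, `LEFT` and `RIGHT` are hypothetical names, and `re.ASCII` is an assumption made so that `\d` and `[0-9]` accept exactly the same names in Python. Because the two patterns differ as text, each branch forbids the other's pattern, so no object can satisfy either branch.

```python
import re

# The two member-name regexes, textually different but equivalent under re.ASCII.
LEFT = re.compile(r"^p-\d+$", re.ASCII)
RIGHT = re.compile(r"^p-[0-9]+$", re.ASCII)


def is_integer(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def branch(obj, wanted, forbidden):
    """One rewritten branch: ( wanted : integer, @{not} forbidden : any )."""
    has_wanted = any(wanted.search(k) and is_integer(v) for k, v in obj.items())
    has_forbidden = any(forbidden.search(k) for k in obj)
    return has_wanted and not has_forbidden


def validates(obj):
    # Hypothetical result of the AOR rewrite: each side negates the other.
    return branch(obj, LEFT, RIGHT) or branch(obj, RIGHT, LEFT)
```

Any member name accepted by one pattern is also accepted by the other, so the `@{not}` half of each branch always fires whenever the positive half does.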

Question 5: like question 4 but for unordered arrays

@{unordered} [ string, ( int8 | [ int8 ] ) ]

produces the rewrite

@{unordered} [ string, ( ( int8, @{not} [int8] ) | ( [int8], @{not} int8 ) ) ]

correct?

Agreed

More to the point for unordered arrays, the type definitions in
the complement are used "as-is". In other words, there's no attempt to
say [ int8 ?] and [ int8 *1..0] are the same thing

I guess you mean if you have something like:

[ [int8 ?] | [int8 *0..1] ]

I need to think some more on choices of arrays and objects. It could be as simple as saying there's an array on both sides (don't care about the content), so no rewriting is required, and if any branch is true then it's valid. It would be good to come up with some more use cases, especially for objects.

or that
@{unordered} [ string, int8 ] and @{unordered} [ int8, string ] are
the same thing. Just curious.

@anewton1998
Contributor

Ok. I'll attack objects with the above first, and then see about unordered arrays after that.

@anewton1998
Contributor

I'm trying to swap all this back in and start coding it.
In an attempt to understand things again, I've written https://github.com/arineng/jcrvalidator/blob/aor/lib/jcr/rewrite_aor.rb

But here is the relevant part. I would appreciate verification that I'm at least on the right track.

As of right now, AOR only applies to objects.
It is described in this GitHub issue: #88
See specifically

Here is a simple AOR to IOR example:

{ "a":string | "b":string }

converts to

{ ( "a":string, @{not}"b":any ) | ( @{not}"a":any, "b":string ) }

AOR acts as an Exclusive OR (XOR) in this simple case.
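
As a rough illustration of that XOR behavior, the rewritten rule can be evaluated by hand against a parsed JSON object. This is only a Python sketch under stated assumptions: `matches_rewritten` is a hypothetical helper, not part of jcrvalidator, and the object is treated as open, so members other than "a" and "b" are ignored.

```python
def is_string(value):
    return isinstance(value, str)


def matches_rewritten(obj):
    """Evaluate { ( "a":string, @{not}"b":any ) | ( @{not}"a":any, "b":string ) }."""
    # Left branch: "a" must be a string and "b" must be absent.
    left = is_string(obj.get("a")) and "b" not in obj
    # Right branch: "a" must be absent and "b" must be a string.
    right = "a" not in obj and is_string(obj.get("b"))
    return left or right
```

With this sketch, `{"a": "x"}` and `{"b": "y"}` are accepted, while `{"a": "x", "b": "y"}` and `{}` are rejected.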

Here is a more complicated example:

{ ( "a":string, "c":int8 ) | ( "b":string, "c":int8 ) }

converts to

{ ( "a":string, "c":int8, @{not}"b":any ) | ( @{not}"a":any, "b":string, "c":int8 ) }

Here the valid JSON structures are:

{ "a":"foo", "c":1 }
{ "b":"bar", "c":1 }

but this is invalid

{ "a":"foo", "b":"bar", "c":1 }

Now let's get more complicated by throwing in multiple levels of AOR:

{ ( ( "a":string, "c":int8 ) | ( "b":string, "c":int8 ) ) | "d":int8 }

converts to

{ ( ( "a":string, "c":int8, @{not}"b":any ) | ( @{not}"a":any, "b":string, "c":int8 ) ), @{not}"d":any |
  @{not}( ( "a":string, "c":int8, @{not}"b":any ) | ( @{not}"a":any, "b":string, "c":int8 ) ), "d":int8 }

where the following is valid

{ "a":"foo", "c":1 }
{ "b":"bar", "c":1 }
{ "d":2 }

but the following is not

{ "a":"foo", "b":"bar", "c":1 }
{ "a":"foo", "c":1, "d":2 }
{ "b":"bar", "c":1, "d":2 }
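
The valid and invalid lists for this two-level example can be checked with a small hand-written evaluator. This is a Python sketch only: `inner` and `validates` are hypothetical names, the kept "d" member is assumed to keep its original int8 type, and @{not} on the inner group is read as "the inner choice does not match".

```python
def is_string(value):
    return isinstance(value, str)


def is_int8(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and -128 <= value <= 127


def inner(obj):
    # ( "a":string, "c":int8, @{not}"b":any ) | ( @{not}"a":any, "b":string, "c":int8 )
    has_c = is_int8(obj.get("c"))
    left = is_string(obj.get("a")) and has_c and "b" not in obj
    right = "a" not in obj and is_string(obj.get("b")) and has_c
    return left or right


def validates(obj):
    # ( inner, @{not}"d":any ) | ( @{not}inner, "d":int8 )
    return (inner(obj) and "d" not in obj) or (not inner(obj) and is_int8(obj.get("d")))
```

Under those assumptions the three objects listed as valid are accepted and the three listed as invalid are rejected.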

Now for an even more complicated multi-level example of AORs
(don't try this at home, consult a physician before reading):

{ ( ( "a":string, "c":int8 ) | ( "b":string, "c":int8 ) ) | ( "a":string, "d":int8 ) }

converts to

{ ( ( "a":string, "c":int8, @{not}"b":any ) | ( @{not}"a":any, "b":string, "c":int8 ) ), @{not}( "a":string, "d":any ) |
  @{not}( ( "a":string, "c":int8, @{not}"b":any ) | ( @{not}"a":any, "b":string, "c":int8 ) ), ( "a":string, "d":int8 ) }

where the following is valid

{ "a":"foo", "c":1 }
{ "b":"bar", "c":1 }
{ "a":"foo", "d":2 }

but the following is not

{ "a":"foo", "b":"bar", "c":1 }
{ "a":"foo", "c":1, "d":2 }
{ "b":"bar", "c":1, "d":2 }
{ "a":"foo", "b":"bar", "d":2 }

Given that the above is true, the following algorithm should be used in the rewrite:

  1. traverse the tree for objects (will be going from top to bottom because the entry points for the tree are root rules)
  2. when an object is found, traverse to the lowest-precedence OR and rewrite the AOR as an IOR
  3. after the rewrite, go up to the next-highest OR and traverse down the other side finding ORs to rewrite
  4. once all child ORs of a higher-precedence OR have been rewritten, it can be rewritten

This is rewriting the rules from the bottom to the top.
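
The bottom-to-top order described in those four steps amounts to a post-order walk. Here is a minimal Python sketch, assuming a made-up tree encoding in which `("or", left, right)` is a choice, `("seq", child, ...)` is a sequence and anything else is a leaf member rule. The encoding and the name `rewrite_bottom_up` are illustrative assumptions, not the Parslet tree that jcrvalidator actually works on.

```python
def rewrite_bottom_up(node, rewrite):
    """Apply `rewrite` to every OR node, children before parents (post-order)."""
    if isinstance(node, tuple) and node[0] == "or":
        # Rewrite both sides first, so nested ORs are handled before this one.
        left = rewrite_bottom_up(node[1], rewrite)
        right = rewrite_bottom_up(node[2], rewrite)
        return rewrite(("or", left, right))
    if isinstance(node, tuple) and node[0] == "seq":
        # Sequences are not rewritten themselves, but may contain ORs.
        return ("seq",) + tuple(rewrite_bottom_up(child, rewrite) for child in node[1:])
    return node  # leaf member rule, returned unchanged
```

Passing a `rewrite` callback that only records its argument shows the order: for `("or", ("or", "a", "b"), "d")` the inner OR is visited before the outer one.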

The AOR to IOR rewrite is this:

  1. Find all member rules on the left side of the OR that are not on the right side of the OR and consider them set A
    1a. where "find" means match only on the member name or regex (no regex canonicalization is to be performed)
  2. Find all member rules on the right side of the OR that are not on the left side of the OR and consider them set B
    2a. see 1a
  3. Find all member rules that appear on both sides of the OR and consider them set C
    3a. see 1a
  4. Rewrite the left side of the OR:
    4a. Copy over set A and set C
    4b. For each rule in set B, copy it over and transform it in the following manner:
    4b1. if the rule does not have a @{not} annotation, do the following
    4b1a. change its repetition to 1
    4b1b. change its type to "any"
    4b1c. give it a @{not} annotation
  5. Rewrite the right side of the OR by repeating step 4, but with sets B & C copied as-is and set A transformed

Specifically unaccounted for in the original expression are member rules with @{not} and repetition max of 0.
They are copied over as-is.
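
For flat member rules, steps 1 through 5 can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Member` dataclass, `complement` and `rewrite_or` are hypothetical names, each side of the OR is modeled as a plain list of member rules, and rule references, nested groups and the repetition-max-of-0 case are not modeled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Member:
    name: str               # member name or regex text, compared as an opaque string
    type: str               # e.g. "string", "int8", "any"
    repetition: str = "1"   # e.g. "1", "?", "*0"
    negated: bool = False   # True when the rule carries @{not}


def complement(rule):
    # Step 4b1: a rule that already has @{not} is copied over unchanged.
    if rule.negated:
        return rule
    # Steps 4b1a-4b1c: repetition 1, type "any", add @{not}.
    return replace(rule, repetition="1", type="any", negated=True)


def rewrite_or(left, right):
    left_names = {rule.name for rule in left}
    right_names = {rule.name for rule in right}
    set_a = [rule for rule in left if rule.name not in right_names]   # step 1
    set_b = [rule for rule in right if rule.name not in left_names]   # step 2
    # Step 4: the left side keeps A and C as-is and gains the complement of B.
    new_left = list(left) + [complement(rule) for rule in set_b]
    # Step 5: the right side keeps B and C as-is and gains the complement of A.
    new_right = [complement(rule) for rule in set_a] + list(right)
    return new_left, new_right
```

Applied to `( "a":string, "c":int8 ) | ( "b":string, "c":int8 )`, this sketch yields the two sides shown in the second example above. When both sides name the same member, as in `"a":int8 | "a":[int8]`, sets A and B are empty and nothing is rewritten.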

@anewton1998
Contributor

So it just occurred to me that my algorithm needs to take into account de-referencing rules when rule references are encountered, with a recursive dereference for group rules.
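
A rough Python sketch of what that recursive dereference might look like, under assumptions of my own: rule references are strings starting with "$", groups are Python lists, member rules are tuples, and `rules` maps rule names to their definitions. The name `flatten` and this encoding are hypothetical, and groups containing ORs are not handled.

```python
def flatten(item, rules, seen=frozenset()):
    """Expand "$name" references and nested groups into a flat list of member rules."""
    if isinstance(item, str) and item.startswith("$"):
        name = item[1:]
        if name in seen:
            # Guard against rules that refer back to themselves.
            raise ValueError(f"circular rule reference: {name}")
        return flatten(rules[name], rules, seen | {name})
    if isinstance(item, list):
        # A group: flatten each element in order.
        flat = []
        for child in item:
            flat.extend(flatten(child, rules, seen))
        return flat
    return [item]  # a plain member rule
```

The cycle check is there because a group that references itself would otherwise recurse without end.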

@anewton1998 anewton1998 changed the title Object validation torture Object validation torture (AOR proposal) Aug 22, 2017
@anewton1998
Contributor

anewton1998 commented Aug 22, 2017

AOR is Complicated

After about 2 weeks of coding for AOR, I'm less comfortable with it than when I started. While I support the concept of AOR, the general algorithm needed to implement it is rather complex. Most of the implementation code for JCRValidator can be found here: https://github.com/arineng/jcrvalidator/blob/aor/lib/jcr/rewrite_aor.rb

Around line 122 I outline the high-level algorithm for implementing AOR. The code that implements it is considerably more complicated (though I admit about half of that complication comes from using the Parslet tree directly instead of an intermediate data model).

Here are the things that make me uneasy.

First, as mentioned in the algorithm steps, I think the need to dereference and flatten group structures can get rather complicated. I fear that we could easily have missed something in all this.

Second, in working with AORs I feel that debugging JSON objects that don't validate requires quite a bit of work to determine what went wrong, because one must work with the rewritten rules. My code can print out the rewritten rules, but reading them can be a bit trying. In addition, since the rules are rewritten there is no way to point the end user to the exact spot where validation failed in the original rule.

Third, you'll note that in the examples I put in the code comment, it is my belief that the algorithm implies nested OR groups are to be treated like member rules in the rewriting, with the @{not} annotation placed on them. But that isn't allowed according to the syntax (see #95). Setting aside the syntax issue, was my assumption right? How do nested OR groups work?

I do believe addressing the goals of AOR is beneficial, but after trying it I do not think AOR is the right approach. I do have an alternate proposal here.

Moving forward, I'd like to set this issue aside for future consideration. Perhaps we can support AOR with an @{aor} annotation in the future, much as I have proposed @{ex} for Excluding Unmatched Member Names. At present, I'd like to spend a little more time on JCR implementations (I have a Java one in the works) and getting what we have specified.
