Validation of data mappings
Data is replicated to OpenSearch from Neptune using this process:
- If a mapping for the field in question is already present in OpenSearch:
  - If the data can be safely converted to the existing mapping using data validation rules, then store the field in OpenSearch.
  - If not, drop the corresponding stream update record.
- If there is no existing mapping for the field in question, find an OpenSearch datatype corresponding to the field's datatype in Neptune.
  - If the field data can be safely converted to the OpenSearch datatype using data validation rules, then store the new mapping and field data in OpenSearch.
  - If not, drop the corresponding stream update record.
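The decision process above can be sketched in a few lines of Python. This is only an illustration of the described flow, not Neptune's actual implementation: the function name replicate_field, the in-memory mappings and index dictionaries, and the validators table are all hypothetical stand-ins.

```python
from typing import Callable, Dict, Optional

# Hypothetical validator signature: returns the converted value, or None
# when the value cannot be safely converted to the target OpenSearch type.
Validator = Callable[[object], Optional[object]]


def replicate_field(
    mappings: Dict[str, str],
    index: Dict[str, object],
    validators: Dict[str, Validator],
    field: str,
    value: object,
    inferred_type: str,
) -> bool:
    """Return True if the field was stored, False if the record was dropped."""
    existing = mappings.get(field)
    # Use the existing mapping if there is one; otherwise use the OpenSearch
    # type that corresponds to the field's Neptune datatype.
    target_type = existing if existing is not None else inferred_type
    converted = validators[target_type](value)
    if converted is None:
        return False  # drop the corresponding stream update record
    if existing is None:
        mappings[field] = target_type  # store the new mapping
    index[field] = converted  # store the field data
    return True
```

Note that in this sketch an existing mapping always takes precedence over the type inferred from Neptune, which matches the statement that values are validated against existing OpenSearch mappings.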
Values are validated against equivalent OpenSearch types or existing OpenSearch mappings rather than the Neptune types. For example, validation for the value "123" in "123"^^xsd:int is done against the long type rather than the int type.
Although Neptune attempts to replicate all data to OpenSearch, in some cases the datatypes in OpenSearch are incompatible with those in Neptune, and the affected records are skipped rather than being indexed in OpenSearch.
For example, in Neptune one property can have multiple values of different types, whereas in OpenSearch a field must have the same type across the index.
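To make the conflict concrete, here is a minimal sketch of what happens when one property carries values of different types. It assumes, as in the process described earlier, that the first value seen creates the field mapping. The helper names and the type-naming function are hypothetical, and the type check is deliberately stricter than the real validation rules, which also allow safe conversions.

```python
from typing import List, Tuple


def opensearch_type(value: object) -> str:
    """Hypothetical, simplified mapping from a Python value to a field type name."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "text"


def index_property(values: List[object]) -> Tuple[str, List[object], List[object]]:
    """Return (field type, indexed values, dropped values) for one property."""
    # The first value fixes the field type for the whole index.
    field_type = opensearch_type(values[0])
    indexed = [v for v in values if opensearch_type(v) == field_type]
    dropped = [v for v in values if opensearch_type(v) != field_type]
    return field_type, indexed, dropped
```

For a property holding 30 and true, this sketch indexes 30 under a long field and drops the boolean, which is consistent with the rule below that boolean values are invalid for a long mapping.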
By enabling debug logs, you can see which records were dropped during export from Neptune to OpenSearch. An example of a debug log entry is:
Dropping Record : Data type not a valid Gremlin type <Record>
Datatypes are validated as follows:
- text – All values in Neptune can safely be mapped to text in OpenSearch.
- long – The following rules for Neptune datatypes apply when the OpenSearch mapping type is long (in the examples below, it is assumed that "testLong" has a long mapping type):
  - boolean – Invalid, cannot be converted, and the corresponding stream update record is dropped.
    Invalid Gremlin examples are:
      "testLong" : true
      "testLong" : false
    Invalid SPARQL examples are:
      ":testLong" : "true"^^xsd:boolean
      ":testLong" : "false"^^xsd:boolean
  - datetime – Invalid, cannot be converted, and the corresponding stream update record is dropped.
    An invalid Gremlin example is:
      ":testLong" : datetime('2018-11-04T00:00:00')
    An invalid SPARQL example is:
      ":testLong" : "2016-01-01"^^xsd:date
  - float, double, or decimal – If the value in Neptune is an integer that can fit in 64 bits, it is valid and is stored in OpenSearch as a long. If it has a fractional part, is a NaN or an INF, or is larger than 9,223,372,036,854,775,807 or smaller than -9,223,372,036,854,775,808, then it is not valid and the corresponding stream update record is dropped.
    Valid Gremlin examples are:
      "testLong" : 145.0
      ":testLong" : 123
      ":testLong" : -9223372036854775807
    Valid SPARQL examples are:
      ":testLong" : "145.0"^^xsd:float
      ":testLong" : 145.0
      ":testLong" : "145.0"^^xsd:double
      ":testLong" : "145.0"^^xsd:decimal
      ":testLong" : "-9223372036854775807"
    Invalid Gremlin examples are:
      "testLong" : 123.45
      ":testLong" : 9223372036854775900
    Invalid SPARQL examples are:
      ":testLong" : 123.45
      ":testLong" : 9223372036854775900
      ":testLong" : "123.45"^^xsd:float
      ":testLong" : "123.45"^^xsd:double
      ":testLong" : "123.45"^^xsd:decimal
  - string – If the value in Neptune is a string representation of an integer that can be contained in a 64-bit integer, then it is valid and is converted to a long in OpenSearch. Any other string value is invalid for an OpenSearch long mapping, and the corresponding stream update record is dropped.
    Valid Gremlin examples are:
      "testLong" : "123"
      ":testLong" : "145.0"
      ":testLong" : "-9223372036854775807"
    Valid SPARQL examples are:
      ":testLong" : "145.0"^^xsd:string
      ":testLong" : "-9223372036854775807"^^xsd:string
    Invalid Gremlin examples are:
      "testLong" : "123.45"
      ":testLong" : "9223372036854775900"
      ":testLong" : "abc"
    Invalid SPARQL examples are:
      ":testLong" : "123.45"^^xsd:string
      ":testLong" : "abc"
      ":testLong" : "9223372036854775900"^^xsd:string
- double – If the OpenSearch mapping type is double, the following rules apply (here, the "testDouble" field is assumed to have a double mapping in OpenSearch):
  - boolean – Invalid, cannot be converted, and the corresponding stream update record is dropped.
    Invalid Gremlin examples are:
      "testDouble" : true
      "testDouble" : false
    Invalid SPARQL examples are:
      ":testDouble" : "true"^^xsd:boolean
      ":testDouble" : "false"^^xsd:boolean
  - datetime – Invalid, cannot be converted, and the corresponding stream update record is dropped.
    An invalid Gremlin example is:
      ":testDouble" : datetime('2018-11-04T00:00:00')
    An invalid SPARQL example is:
      ":testDouble" : "2016-01-01"^^xsd:date
  - Floating-point NaN or INF – If the value in SPARQL is a floating-point NaN or INF, then it is not valid and the corresponding stream update record is dropped.
    Invalid SPARQL examples are:
      ":testDouble" : "NaN"^^xsd:float
      ":testDouble" : "NaN"^^xsd:double
      ":testDouble" : "INF"^^xsd:double
      ":testDouble" : "-INF"^^xsd:double
  - number or numeric string – If the value in Neptune is any other number, or a numeric string representation of a number, that can safely be expressed as a double, then it is valid and is converted to a double in OpenSearch. Any other string value is invalid for an OpenSearch double mapping, and the corresponding stream update record is dropped.
    Valid Gremlin examples are:
      "testDouble" : 123
      ":testDouble" : "123"
      ":testDouble" : 145.67
      ":testDouble" : "145.67"
    Valid SPARQL examples are:
      ":testDouble" : 123.45
      ":testDouble" : 145.0
      ":testDouble" : "123.45"^^xsd:float
      ":testDouble" : "123.45"^^xsd:double
      ":testDouble" : "123.45"^^xsd:decimal
      ":testDouble" : "123.45"^^xsd:string
    An invalid Gremlin example is:
      ":testDouble" : "abc"
    An invalid SPARQL example is:
      ":testDouble" : "abc"
- date – If the OpenSearch mapping type is date, Neptune date and dateTime values are valid, as is any string value that can be parsed successfully to a dateTime format.
  Valid examples in either Gremlin or SPARQL are:
    Date(2016-01-01)
    "2016-01-01"
    " 2003-09-25T10:49:41"
    "2003-09-25T10:49"
    "2003-09-25T10"
    "20030925T104941-0300"
    "20030925T104941"
    "2003-Sep-25"
    " Sep-25-2003"
    "2003.Sep.25"
    "2003/09/25"
    "2003 Sep 25"
    " Wed, July 10, '96"
    "Tuesday, April 12, 1952 AD 3:30:42pm PST"
    "123"
    "-123"
    "0"
    "-0"
    "123.00"
    "-123.00"
  Invalid examples are:
    123.45
    True
    "abc"
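The long and double rules above can be approximated with a short Python sketch. It is not Neptune's validation code: the function names to_long and to_double are hypothetical, Python's Decimal parser stands in for whatever numeric parsing Neptune uses, and "can safely be expressed as a double" is interpreted here as "converts to a finite float". The date rules are omitted because the text does not say which date parser is used.

```python
import math
from decimal import Decimal, InvalidOperation
from typing import Optional

LONG_MIN = -(2**63)      # -9,223,372,036,854,775,808
LONG_MAX = 2**63 - 1     #  9,223,372,036,854,775,807


def _as_decimal(value: object) -> Optional[Decimal]:
    """Parse a number or numeric string; None for booleans, dates, NaN, INF, and text."""
    if isinstance(value, bool) or not isinstance(value, (int, float, str, Decimal)):
        return None  # booleans and datetimes are never valid
    try:
        number = Decimal(value)
    except InvalidOperation:
        return None  # non-numeric string such as "abc"
    return number if number.is_finite() else None


def to_long(value: object) -> Optional[int]:
    """Return a 64-bit integer, or None if the record would be dropped."""
    number = _as_decimal(value)
    if number is None or number != number.to_integral_value():
        return None  # not numeric, or has a fractional part
    if not LONG_MIN <= number <= LONG_MAX:
        return None  # does not fit in 64 bits
    return int(number)


def to_double(value: object) -> Optional[float]:
    """Return a finite float, or None if the record would be dropped."""
    number = _as_decimal(value)
    if number is None:
        return None
    result = float(number)
    return result if math.isfinite(result) else None
```

With these definitions, to_long(145.0) and to_long("145.0") both yield 145, while to_long(123.45), to_long("abc"), and to_long(9223372036854775900) yield None, matching the valid and invalid examples listed above.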