Using RDF data - Neptune Analytics

Using RDF data

Neptune Analytics supports importing RDF data in the n-triples format. This makes it possible to load CSV and n-triples data files into the same graph. The sections below describe how RDF values are handled, including how RDF data is interpreted as labeled property graph (LPG) concepts and how it can be queried using openCypher.

Handling of RDF values

This section describes how RDF-specific values that don't have a direct equivalent in LPG are handled.

IRIs

Values of type IRI, like <http://example.com/Alice>, are stored as such. IRIs and Strings are distinct data types.

Calling the openCypher function TOSTRING() on an IRI returns a string containing the IRI wrapped inside <>. For example, if x is the IRI <http://example.com/Alice>, then TOSTRING(x) returns "<http://example.com/Alice>". When serializing openCypher query results in JSON format, IRI values are included as strings in this same format.
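The following Python sketch illustrates the string format described above from the point of view of a client that reads JSON query results. The function names are hypothetical and are not part of any Neptune Analytics API. The sketch only models the angle-bracket wrapping. Note that a client cannot tell from the JSON string alone whether a value was an IRI or a String that happens to start with < and end with >.

```python
def iri_to_string(iri: str) -> str:
    """Model the documented TOSTRING() result for an IRI: the IRI wrapped in <>."""
    return f"<{iri}>"


def strip_iri_brackets(value: str) -> str:
    """Remove one pair of surrounding angle brackets, if present.

    Hypothetical client-side helper; values without brackets are returned unchanged.
    """
    if len(value) >= 2 and value.startswith("<") and value.endswith(">"):
        return value[1:-1]
    return value


serialized = iri_to_string("http://example.com/Alice")
print(serialized)                      # <http://example.com/Alice>
print(strip_iri_brackets(serialized))  # http://example.com/Alice
```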

Language-tagged literals

Values like "Hallo"@de are treated as follows:

  • When used as input for openCypher string functions, like trim(), a language-tagged string is treated as a simple string; so trim("Hallo"@de) is equivalent to trim("Hallo").

  • When used in comparison operations, like x = y or x <> y or x < y or ORDER BY, a language-tagged literal is “greater than” (and thus “not equal to”) the corresponding simple string: "Hallo" < "Hallo"@de.

Calling a function such as TOSTRING() on a language-tagged literal returns that literal as a string without the language tag. For example, if x is the value "Hallo"@de, then TOSTRING(x) returns "Hallo". When serializing openCypher query results in JSON format, language-tagged literals are also serialized as strings without an associated language tag.
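The rules for language-tagged literals can be modeled with a small Python sketch. This is an illustration only, not how Neptune Analytics is implemented. The Literal class and its methods are hypothetical names. The sort key covers only the documented relationship between a simple string and the language-tagged literal with the same text; the relative order of unrelated values is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Literal:
    """Hypothetical model of a string value with an optional language tag."""
    text: str
    lang: Optional[str] = None  # None models a simple string

    def to_string(self) -> str:
        # Models TOSTRING() and JSON serialization: the language tag is dropped.
        return self.text

    def sort_key(self) -> Tuple[str, bool]:
        # For equal text, a tagged literal sorts after the simple string.
        return (self.text, self.lang is not None)


plain = Literal("Hallo")
tagged = Literal("Hallo", "de")

print(plain == tagged)                        # False: the two values are not equal
print(plain.sort_key() < tagged.sort_key())   # True: "Hallo" < "Hallo"@de
print(tagged.to_string())                     # Hallo
```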

Blank nodes

Blank nodes in n-triples data files are replaced with globally unique IRIs at import time.

Loading RDF datasets that contain blank nodes is supported, but those blank nodes are represented as IRIs in the graph. When loading n-triples files, the parameter blankNodeHandling must be specified with the value convertToIri.

The generated IRI for a blank node has the format: <http://aws.amazon.com/neptune/vocab/v01/BNode/scope#id>

In these IRIs, scope is a unique identifier for the blank node scope, and id is the blank node identifier in the file. For example, for a blank node _:b123, the generated IRI could be <http://aws.amazon.com/neptune/vocab/v01/BNode/737c0b5386448f78#b123>.

The blank node scope (for example, 737c0b5386448f78) is generated by Neptune Analytics and designates one file within one load operation. This means that when two different n-triples files reference the same blank node identifier, like _:b123, two IRIs are generated, one for each file. All references to _:b123 within the first file become references to the first IRI, like <http://aws.amazon.com/neptune/vocab/v01/BNode/1001#b123>, and all references within the second file refer to another IRI, like <http://aws.amazon.com/neptune/vocab/v01/BNode/1002#b123>.
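The IRI format and the per-file scoping described above can be sketched in Python as follows. The function blank_node_iri is a hypothetical helper for illustration. The scope values 1001 and 1002 are the placeholder values used in the text; in practice the scope is generated by Neptune Analytics and cannot be chosen by the caller.

```python
BNODE_PREFIX = "http://aws.amazon.com/neptune/vocab/v01/BNode/"


def blank_node_iri(scope: str, label: str) -> str:
    """Build the IRI for a blank node label such as '_:b123' in a given scope."""
    # Drop the leading '_:' of the n-triples blank node label, if present.
    bnode_id = label[2:] if label.startswith("_:") else label
    return f"<{BNODE_PREFIX}{scope}#{bnode_id}>"


# The same label in two different files maps to two different IRIs.
first = blank_node_iri("1001", "_:b123")
second = blank_node_iri("1002", "_:b123")

print(first)            # <http://aws.amazon.com/neptune/vocab/v01/BNode/1001#b123>
print(second)           # <http://aws.amazon.com/neptune/vocab/v01/BNode/1002#b123>
print(first != second)  # True
```

Within a single file, every reference to _:b123 uses the same scope and therefore resolves to the same IRI.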