Examples of using parameters within additionalParams for tuning model-training configuration
The following examples show how to use additionalParams with property-graph and RDF data models to configure model training for Neptune ML. They cover default split rates for training, validation, and test data; node classification, node regression, edge classification, edge regression, and link prediction tasks; and feature types such as numerical bucket, text embedding, datetime, and categorical features.
Contents
- Property-graph examples using additionalParams
  - Specifying a default split rate for model-training configuration
  - Specifying a node-classification task for model-training configuration
  - Specifying a multi-class node classification task for model-training configuration
  - Specifying a node regression task for model-training configuration
  - Specifying an edge-classification task for model-training configuration
  - Specifying a multi-class edge classification task for model-training configuration
  - Specifying an edge regression task for model-training configuration
  - Specifying a link prediction task for model-training configuration
  - Specifying a numerical bucket feature
  - Specifying a Word2Vec feature
  - Specifying a FastText feature
  - Specifying a Sentence BERT feature
  - Specifying a TF-IDF feature
  - Specifying a datetime feature
  - Specifying a category feature
  - Specifying a numerical feature
  - Specifying an auto feature
- RDF examples using additionalParams
  - Specifying a default split rate for model-training configuration
  - Specifying a node-classification task for model-training configuration
  - Specifying a node regression task for model-training configuration
  - Specifying a link prediction task for particular edges
  - Specifying a link prediction task for all edges
Property-graph examples using additionalParams
Specifying a default split rate for model-training configuration
In the following example, the split_rate parameter sets the default split rate for model training. If no default split rate is specified, training uses a value of [0.9, 0.1, 0.0]. You can override the default on a per-target basis by specifying a split_rate for each target.
Here, the default split_rate field indicates that a split rate of [0.7,0.1,0.2] should be used unless overridden on a per-target basis:
"additionalParams": { "neptune_ml": { "version": "v2.0", "split_rate": [0.7,0.1,0.2], "targets": [
(...)
], "features": [(...)
] } }
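The precedence described above (a per-target split_rate first, then the top-level default, then [0.9, 0.1, 0.0]) can be sketched in Python. This is an illustration only, not Neptune ML code: the helper names are hypothetical, and the checks that a split rate has three values between 0 and 1 that sum to 1 are assumptions about what counts as a valid split.

```python
import math

# Value the text above says is used when no default split rate is specified.
FALLBACK_SPLIT_RATE = [0.9, 0.1, 0.0]


def check_split_rate(rate):
    """Return the rate unchanged if it looks like a valid three-way split."""
    if len(rate) != 3:
        raise ValueError("split_rate needs exactly three values")
    if any(r < 0 or r > 1 for r in rate):
        raise ValueError("each split_rate value must be between 0 and 1")
    # Assumption: the three fractions are expected to add up to 1.
    if not math.isclose(sum(rate), 1.0, abs_tol=1e-9):
        raise ValueError("split_rate values must sum to 1")
    return rate


def effective_split_rate(neptune_ml, target):
    """A per-target split_rate wins, then the top-level default, then the fallback."""
    rate = target.get("split_rate", neptune_ml.get("split_rate", FALLBACK_SPLIT_RATE))
    return check_split_rate(rate)


# Hypothetical configuration with a default split rate and no per-target override.
neptune_ml = {"version": "v2.0", "split_rate": [0.7, 0.1, 0.2]}
plain_target = {"node": "Movie", "property": "genre", "type": "classification"}
override_target = dict(plain_target, split_rate=[0.8, 0.1, 0.1])

print(effective_split_rate(neptune_ml, plain_target))     # uses the default
print(effective_split_rate(neptune_ml, override_target))  # uses the override
print(effective_split_rate({}, plain_target))             # uses the fallback
```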
Specifying a node-classification task for model-training configuration
To indicate which node property contains labeled examples for training purposes, add a node classification element to the targets array, using "type" : "classification". Add a split_rate field if you want to override the default split rate.
In the following example, the node target indicates that the genre property of each Movie node should be treated as a node class label. The split_rate value overrides the default split rate:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "node": "Movie", "property": "genre", "type": "classification", "split_rate": [0.7,0.1,0.2] } ], "features": [
(...)
] } }
Specifying a multi-class node classification task for model-training configuration
To indicate which node property contains multiple labels per node for training purposes, add a node classification element to the targets array, using "type" : "classification" together with a separator field that specifies the character used to split a target property value into multiple categorical values. Add a split_rate field if you want to override the default split rate.
In the following example, the node target indicates that the genre property of each Movie node should be treated as a node class label. The separator field indicates that each genre property contains multiple semicolon-separated values:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "node": "Movie", "property": "genre", "type": "classification", "separator": ";" } ], "features": [
(...)
] } }
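To make the role of separator concrete, the following Python sketch splits one property value into its individual labels. The function name is hypothetical, and trimming whitespace and dropping empty pieces are assumptions made for the illustration; the source does not describe how Neptune ML handles those cases.

```python
def split_labels(value, separator):
    """Split one property value into its individual class labels."""
    # Assumption: surrounding whitespace and empty pieces are ignored.
    return [part.strip() for part in value.split(separator) if part.strip()]


# A hypothetical genre value for one Movie node, using ";" as in the example above.
print(split_labels("Action;Comedy;Drama", ";"))
```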
Specifying a node regression task for model-training configuration
To indicate which node property contains labeled regression values for training purposes, add a node regression element to the targets array, using "type" : "regression". Add a split_rate field if you want to override the default split rate.
The following node target indicates that the rating property of each Movie node should be treated as a node regression label:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "node": "Movie", "property": "rating", "type" : "regression", "split_rate": [0.7,0.1,0.2] } ], "features": [
...
] } }
Specifying an edge-classification task for model-training configuration
To indicate which edge property contains labeled examples for training purposes, add an edge classification element to the targets array, using "type" : "classification". Add a split_rate field if you want to override the default split rate.
The following edge target indicates that the metAtLocation property of each knows edge should be treated as an edge class label:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "edge": ["Person", "knows", "Person"], "property": "metAtLocation", "type": "classification" } ], "features": [
(...)
] } }
Specifying a multi-class edge classification task for model-training configuration
To indicate which edge property contains multiple labels per edge for training purposes, add an edge element to the targets array, using "type" : "classification" and a separator field that specifies the character used to split a target property value into multiple categorical values. Add a split_rate field if you want to override the default split rate.
The following edge target indicates that the sentiment property of each repliedTo edge should be treated as an edge class label. The separator field indicates that each sentiment property contains multiple comma-separated values:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "edge": ["Person", "repliedTo", "Message"], "property": "sentiment", "type": "classification", "separator": "," } ], "features": [
(...)
] } }
Specifying an edge regression task for model-training configuration
To indicate which edge property contains labeled regression examples for training purposes, add an edge element to the targets array, using "type" : "regression". Add a split_rate field if you want to override the default split rate.
The following edge target indicates that the rating property of each reviewed edge should be treated as an edge regression label:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "edge": ["Person", "reviewed", "Movie"], "property": "rating", "type" : "regression" } ], "features": [
(...)
] } }
Specifying a link prediction task for model-training configuration
To indicate which edges should be used for link prediction training purposes, add an edge element to the targets array using "type" : "link_prediction". Add a split_rate field if you want to override the default split rate.
The following edge target indicates that cites edges should be used for link prediction:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "edge": ["Article", "cites", "Article"], "type" : "link_prediction" } ], "features": [
(...)
] } }
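The target elements in the preceding property-graph examples share a small set of fields, so they can be generated programmatically. The following Python sketch is one way to do that. The helper names are hypothetical, and reading the three-element edge array as source node label, edge label, and destination node label is an inference from the examples rather than something the text states.

```python
import json


def _with_options(target, split_rate, separator):
    """Add the optional fields only when they are supplied."""
    if split_rate is not None:
        target["split_rate"] = list(split_rate)
    if separator is not None:
        target["separator"] = separator
    return target


def node_target(node, prop, task_type, split_rate=None, separator=None):
    """Build a node classification or regression target."""
    target = {"node": node, "property": prop, "type": task_type}
    return _with_options(target, split_rate, separator)


def edge_target(src, label, dst, task_type, prop=None, split_rate=None, separator=None):
    """Build an edge target; link prediction targets carry no property."""
    target = {"edge": [src, label, dst]}
    if prop is not None:
        target["property"] = prop
    target["type"] = task_type
    return _with_options(target, split_rate, separator)


def additional_params(target):
    """Wrap a single target in the additionalParams structure used above."""
    return {"additionalParams": {"neptune_ml": {"version": "v2.0", "targets": [target]}}}


# Each target is printed in its own configuration.
for t in (
    node_target("Movie", "genre", "classification", separator=";"),
    edge_target("Person", "reviewed", "Movie", "regression", prop="rating"),
    edge_target("Article", "cites", "Article", "link_prediction"),
):
    print(json.dumps(additional_params(t), indent=2))
```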
Specifying a numerical bucket feature
You can specify a numerical bucket feature for a node property by adding a feature element with "type": "bucket_numerical" to the features array.
The following node feature indicates that the age property of each Person node should be treated as a numerical bucket feature:
"additionalParams": { "neptune_ml": { "targets": [
...
], "features": [ { "node": "Person", "property": "age", "type": "bucket_numerical", "range": [1, 100], "bucket_cnt": 5, "slide_window_size": 3, "imputer": "median" } ] } }
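As a rough intuition for what range and bucket_cnt mean in the example above, the following Python sketch assigns a value to one of bucket_cnt equal-width buckets and one-hot encodes the result. It is a simplified illustration under stated assumptions, not Neptune ML's encoder: the function names are hypothetical, out-of-range values are clamped into the first or last bucket, and the slide_window_size and imputer fields are deliberately not modeled.

```python
def bucket_index(value, value_range, bucket_cnt):
    """Return the index of the equal-width bucket that contains value."""
    low, high = value_range
    # Assumption: values outside the range fall into the first or last bucket.
    clamped = min(max(value, low), high)
    width = (high - low) / bucket_cnt
    # The upper bound itself is placed in the last bucket.
    return min(int((clamped - low) / width), bucket_cnt - 1)


def one_hot_bucket(value, value_range, bucket_cnt):
    """One-hot vector with a 1 at the position of the value's bucket."""
    idx = bucket_index(value, value_range, bucket_cnt)
    return [1 if i == idx else 0 for i in range(bucket_cnt)]


# Same range and bucket count as the age example above.
for age in (1, 25, 60, 100):
    print(age, one_hot_bucket(age, [1, 100], 5))
```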
Specifying a Word2Vec feature
You can specify a Word2Vec feature for a node property by adding "type": "text_word2vec" to the features array.
The following node feature indicates that the description property of each Movie node should be treated as a Word2Vec feature:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [
...
], "features": [ { "node": "Movie", "property": "description", "type": "text_word2vec", "language": "en_core_web_lg" } ] } }
Specifying a FastText feature
You can specify a FastText feature for a node property by adding "type": "text_fasttext" to the features array. The language field is required, and must specify one of the following language codes:
- en (English)
- zh (Chinese)
- hi (Hindi)
- es (Spanish)
- fr (French)
Note that the text_fasttext encoding cannot handle more than one language at a time in a feature.
The following node feature indicates that the French description property of each Movie node should be treated as a FastText feature:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [
...
], "features": [ { "node": "Movie", "property": "description", "type": "text_fasttext", "language": "fr", "max_length": 1024 } ] } }
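Because the language field is required and limited to the codes listed above, it can be worth checking before the configuration is submitted. The following Python sketch shows one way to do that. The helper name is hypothetical and the check runs entirely on the client side; it does not reflect how Neptune ML itself validates the field.

```python
# Language codes listed in the text above.
SUPPORTED_FASTTEXT_LANGUAGES = {
    "en": "English",
    "zh": "Chinese",
    "hi": "Hindi",
    "es": "Spanish",
    "fr": "French",
}


def fasttext_feature(node, prop, language, max_length=None):
    """Build a text_fasttext feature element, rejecting unlisted language codes."""
    if language not in SUPPORTED_FASTTEXT_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_FASTTEXT_LANGUAGES))
        raise ValueError(f"unsupported language {language!r}; expected one of: {supported}")
    feature = {"node": node, "property": prop, "type": "text_fasttext", "language": language}
    if max_length is not None:
        feature["max_length"] = max_length
    return feature


print(fasttext_feature("Movie", "description", "fr", max_length=1024))
```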
Specifying a Sentence BERT feature
You can specify a Sentence BERT feature for a node property by adding "type": "text_sbert" to the features array. You don't need to specify the language, since the method automatically encodes text features using a multilingual language model.
The following node feature indicates that the description property of each Movie node should be treated as a Sentence BERT feature:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [
...
], "features": [ { "node": "Movie", "property": "description", "type": "text_sbert128" } ] } }
Specifying a TF-IDF feature
You can specify a TF-IDF feature for a node property by adding "type": "text_tfidf" to the features array.
The following node feature indicates that the bio property of each Person node should be treated as a TF-IDF feature:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [
...
], "features": [ { "node": "Person", "property": "bio", "type": "text_tfidf", "ngram_range": [1, 2], "min_df": 5, "max_features": 1000 } ] } }
Specifying a datetime feature
The export process automatically infers datetime features for date properties. However, if you want to limit the datetime_parts used for a datetime feature, or override a feature specification so that a property that would normally be treated as an auto feature is explicitly treated as a datetime feature, you can do so by adding "type": "datetime" to the features array.
The following node feature indicates that the createdAt property of each Post node should be treated as a datetime feature:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [
...
], "features": [ { "node": "Post", "property": "createdAt", "type": "datetime", "datetime_parts": ["month", "weekday", "hour"] } ] } }
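To illustrate what limiting datetime_parts means, the following Python sketch extracts only the requested parts from a timestamp. It is an assumption-laden illustration: the function name is hypothetical, only the three parts that appear in the example above are handled, and the numeric conventions (for instance Monday as weekday 0, as in Python's datetime module) are not necessarily those Neptune ML uses when it encodes the feature.

```python
from datetime import datetime


def extract_parts(timestamp, datetime_parts):
    """Return only the requested parts of an ISO 8601 timestamp."""
    dt = datetime.fromisoformat(timestamp)
    available = {
        "month": dt.month,        # 1-12
        "weekday": dt.weekday(),  # Monday is 0 in Python's convention
        "hour": dt.hour,          # 0-23
    }
    return {part: available[part] for part in datetime_parts}


# A hypothetical createdAt value for one Post node.
print(extract_parts("2024-03-15T14:30:00", ["month", "weekday", "hour"]))
```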
Specifying a category feature
The export process automatically infers auto features for string properties and for numeric properties containing multiple values. For numeric properties containing single values, it infers numerical features. For date properties, it infers datetime features.
If you want to override a feature specification so that a property is treated as a categorical feature, add "type": "category" to the features array. If the property contains multiple values, include a separator field. For example:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [
...
], "features": [ { "node": "Post", "property": "tag", "type": "category", "separator": "|" } ] } }
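For a multi-valued category property such as the tag example above, one common way to represent the values is a multi-hot vector over the set of observed categories. The following Python sketch shows that idea. It is a generic illustration with hypothetical function names, and the source does not say that Neptune ML encodes category features this way.

```python
def build_vocabulary(values, separator):
    """Collect the distinct categories that occur across all property values."""
    categories = set()
    for value in values:
        categories.update(p.strip() for p in value.split(separator) if p.strip())
    return sorted(categories)


def multi_hot(value, separator, vocabulary):
    """Mark each vocabulary entry that is present in this property value."""
    present = {p.strip() for p in value.split(separator)}
    return [1 if category in present else 0 for category in vocabulary]


# Hypothetical tag values for three Post nodes, using "|" as the separator.
tags = ["graph|ml", "ml", "database|graph"]
vocabulary = build_vocabulary(tags, "|")
print(vocabulary)
for tag in tags:
    print(tag, multi_hot(tag, "|", vocabulary))
```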
Specifying a numerical feature
The export process automatically infers auto features for string properties and for numeric properties containing multiple values. For numeric properties containing single values, it infers numerical features. For date properties, it infers datetime features.
If you want to override a feature specification so that a property is treated as a numerical feature, add "type": "numerical" to the features array. If the property contains multiple values, include a separator field. For example:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [
...
], "features": [ { "node": "Recording", "property": "duration", "type": "numerical", "separator": "," } ] } }
Specifying an auto feature
The export process automatically infers auto features for string properties and for numeric properties containing multiple values. For numeric properties containing single values, it infers numerical features. For date properties, it infers datetime features.
If you want to override a feature specification so that a property is treated as an auto feature, add "type": "auto" to the features array. If the property contains multiple values, include a separator field. For example:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [
...
], "features": [ { "node": "User", "property": "role", "type": "auto", "separator": "," } ] } }
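The inference rules stated in the category, numerical, and auto sections can be summarized as a small decision function. The following Python sketch restates them. The function name and the data type labels "string", "numeric", and "date" are hypothetical names chosen for the illustration, not identifiers from the export process.

```python
def infer_feature_type(data_type, multi_valued=False):
    """Restate the default feature-type inference described in the text."""
    if data_type == "date":
        return "datetime"
    if data_type == "string":
        return "auto"
    if data_type == "numeric":
        # Multi-valued numeric properties are inferred as auto features.
        return "auto" if multi_valued else "numerical"
    raise ValueError(f"unknown data type: {data_type!r}")


for data_type, multi in (("string", False), ("numeric", True), ("numeric", False), ("date", False)):
    print(data_type, multi, "->", infer_feature_type(data_type, multi))
```

An explicit "type" entry in the features array, as in the examples above, replaces whatever this default inference would have chosen.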
RDF examples using additionalParams
Specifying a default split rate for model-training configuration
In the following example, the split_rate parameter sets the default split rate for model training. If no default split rate is specified, training uses a value of [0.9, 0.1, 0.0]. You can override the default on a per-target basis by specifying a split_rate for each target.
Here, the default split_rate field indicates that a split rate of [0.7,0.1,0.2] should be used unless overridden on a per-target basis:
"additionalParams": { "neptune_ml": { "version": "v2.0", "split_rate": [0.7,0.1,0.2], "targets": [
(...)
] } }
Specifying a node-classification task for model-training configuration
To indicate which node property contains labeled examples for training purposes, add a node classification element to the targets array, using "type" : "classification". Add a node field to indicate the node type of the target nodes. Add a predicate field to define which literal data is used as the target node feature. Add a split_rate field if you want to override the default split rate.
In the following example, the node target indicates that the genre property of each Movie node should be treated as a node class label. The split_rate value overrides the default split rate:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "node": "http://aws.amazon.com/neptune/csv2rdf/class/Movie", "predicate": "http://aws.amazon.com/neptune/csv2rdf/datatypeProperty/genre", "type": "classification", "split_rate": [0.7,0.1,0.2] } ] } }
Specifying a node regression task for model-training configuration
To indicate which node property contains labeled regression values for training purposes, add a node regression element to the targets array, using "type" : "regression". Add a node field to indicate the node type of the target nodes. Add a predicate field to define which literal data is used as the target node feature. Add a split_rate field if you want to override the default split rate.
The following node target indicates that the rating property of each Movie node should be treated as a node regression label:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "node": "http://aws.amazon.com/neptune/csv2rdf/class/Movie", "predicate": "http://aws.amazon.com/neptune/csv2rdf/datatypeProperty/rating", "type": "regression", "split_rate": [0.7,0.1,0.2] } ] } }
Specifying a link prediction task for particular edges
To indicate which edges should be used for link prediction training purposes, add an edge element to the targets array using "type" : "link_prediction". Add subject, predicate, and object fields to specify the edge type. Add a split_rate field if you want to override the default split rate.
The following edge target indicates that directed edges that connect Directors to Movies should be used for link prediction:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "subject": "http://aws.amazon.com/neptune/csv2rdf/class/Director", "predicate": "http://aws.amazon.com/neptune/csv2rdf/datatypeProperty/directed", "object": "http://aws.amazon.com/neptune/csv2rdf/class/Movie", "type" : "link_prediction" } ] } }
Specifying a link prediction task for all edges
To indicate that all edges should be used for link prediction training purposes, add an edge element to the targets array using "type" : "link_prediction". Do not add subject, predicate, or object fields. Add a split_rate field if you want to override the default split rate.
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "type" : "link_prediction" } ] } }