Examples of using parameters within additionalParams for tuning model-training configuration - Amazon Neptune

The following examples show how to use the additionalParams field with property-graph and RDF data models to configure model training for a Neptune ML application. They cover setting a default split rate for training, validation, and test data; defining node classification, node regression, edge classification, edge regression, and link prediction tasks; and configuring feature types such as numerical buckets, text embeddings, datetime, and categorical data.

Property-graph examples using additionalParams

Specifying a default split rate for model-training configuration

The split_rate parameter sets the default split rate for model training. If no default split rate is specified, training uses a value of [0.9, 0.1, 0.0]. You can override the default on a per-target basis by specifying a split_rate for each target.

In the following example, the default split_rate field indicates that a split rate of [0.7,0.1,0.2] should be used unless overridden on a per-target basis:

"additionalParams": { "neptune_ml": { "version": "v2.0", "split_rate": [0.7,0.1,0.2], "targets": [ (...) ], "features": [ (...) ] } }

Specifying a node-classification task for model-training configuration

To indicate which node property contains labeled examples for training purposes, add a node classification element to the targets array, using "type" : "classification". Add a split_rate field if you want to override the default split rate.

In the following example, the node target indicates that the genre property of each Movie node should be treated as a node class label. The split_rate value overrides the default split rate:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "node": "Movie", "property": "genre", "type": "classification", "split_rate": [0.7,0.1,0.2] } ], "features": [ (...) ] } }

Specifying a multi-class node classification task for model-training configuration

To indicate which node property contains multiple labeled examples for training purposes, add a node classification element to the targets array, using "type" : "classification", and a separator field to specify the character used to split a target property value into multiple categorical values. Add a split_rate field if you want to override the default split rate.

In the following example, the node target indicates that the genre property of each Movie node should be treated as a node class label. The separator field indicates that each genre property contains multiple semicolon-separated values:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "node": "Movie", "property": "genre", "type": "classification", "separator": ";" } ], "features": [ (...) ] } }

Specifying a node regression task for model-training configuration

To indicate which node property contains labeled regression values for training purposes, add a node regression element to the targets array, using "type" : "regression". Add a split_rate field if you want to override the default split rate.

The following node target indicates that the rating property of each Movie node should be treated as a node regression label:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "node": "Movie", "property": "rating", "type" : "regression", "split_rate": [0.7,0.1,0.2] } ], "features": [ ... ] } }

Specifying an edge-classification task for model-training configuration

To indicate which edge property contains labeled examples for training purposes, add an edge element to the targets array, using "type" : "classification". Add a split_rate field if you want to override the default split rate.

The following edge target indicates that the metAtLocation property of each knows edge should be treated as an edge class label:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "edge": ["Person", "knows", "Person"], "property": "metAtLocation", "type": "classification" } ], "features": [ (...) ] } }

Specifying a multi-class edge classification task for model-training configuration

To indicate which edge property contains multiple labeled examples for training purposes, add an edge element to the targets array, using "type" : "classification", and a separator field to specify a character used to split a target property value into multiple categorical values. Add a split_rate field if you want to override the default split rate.

The following edge target indicates that the sentiment property of each repliedTo edge should be treated as an edge class label. The separator field indicates that each sentiment property contains multiple comma-separated values:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "edge": ["Person", "repliedTo", "Message"], "property": "sentiment", "type": "classification", "separator": "," } ], "features": [ (...) ] } }

Specifying an edge regression task for model-training configuration

To indicate which edge property contains labeled regression examples for training purposes, add an edge element to the targets array, using "type" : "regression". Add a split_rate field if you want to override the default split rate.

The following edge target indicates that the rating property of each reviewed edge should be treated as an edge regression label:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "edge": ["Person", "reviewed", "Movie"], "property": "rating", "type" : "regression" } ], "features": [ (...) ] } }

Specifying a link prediction task for model-training configuration

To indicate which edges should be used for link prediction training purposes, add an edge element to the targets array using "type" : "link_prediction". Add a split_rate field if you want to override the default split rate.

The following edge target indicates that cites edges should be used for link prediction:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "edge": ["Article", "cites", "Article"], "type" : "link_prediction" } ], "features": [ (...) ] } }

Specifying a numerical bucket feature

You can specify a numerical bucket feature for a node property by adding "type": "bucket_numerical" to the features array.

The following node feature indicates that the age property of each Person node should be treated as a numerical bucket feature:

"additionalParams": { "neptune_ml": { "targets": [ ... ], "features": [ { "node": "Person", "property": "age", "type": "bucket_numerical", "range": [1, 100], "bucket_cnt": 5, "slide_window_size": 3, "imputer": "median" } ] } }

Specifying a Word2Vec feature

You can specify a Word2Vec feature for a node property by adding "type": "text_word2vec" to the features array.

The following node feature indicates that the description property of each Movie node should be treated as a Word2Vec feature:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ ... ], "features": [ { "node": "Movie", "property": "description", "type": "text_word2vec", "language": "en_core_web_lg" } ] } }

Specifying a FastText feature

You can specify a FastText feature for a node property by adding "type": "text_fasttext" to the features array. The language field is required, and must specify one of the following language codes:

  • en   (English)

  • zh   (Chinese)

  • hi   (Hindi)

  • es   (Spanish)

  • fr   (French)

Note that the text_fasttext encoding cannot handle more than one language at a time in a feature.

The following node feature indicates that the French description property of each Movie node should be treated as a FastText feature:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ ... ], "features": [ { "node": "Movie", "property": "description", "type": "text_fasttext", "language": "fr", "max_length": 1024 } ] } }

Specifying a Sentence BERT feature

You can specify a Sentence BERT feature for a node property by adding "type": "text_sbert" to the features array (the example below uses the text_sbert128 form of this type). You don't need to specify a language, because the method encodes text features using a multilingual language model.

The following node feature indicates that the description property of each Movie node should be treated as a Sentence BERT feature:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ ... ], "features": [ { "node": "Movie", "property": "description", "type": "text_sbert128", } ] } }

Specifying a TF-IDF feature

You can specify a TF-IDF feature for a node property by adding "type": "text_tfidf" to the features array.

The following node feature indicates that the bio property of each Person node should be treated as a TF-IDF feature:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ ... ], "features": [ { "node": "Movie", "property": "bio", "type": "text_tfidf", "ngram_range": [1, 2], "min_df": 5, "max_features": 1000 } ] } }

Specifying a datetime feature

The export process automatically infers datetime features for date properties. However, if you want to limit the datetime_parts used for a datetime feature, or override a feature specification so that a property that would normally be treated as an auto feature is explicitly treated as a datetime feature, you can do so by adding "type": "datetime" to the features array.

The following node feature indicates that the createdAt property of each Post node should be treated as a datetime feature:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ ... ], "features": [ { "node": "Post", "property": "createdAt", "type": "datetime", "datetime_parts": ["month", "weekday", "hour"] } ] } }

Specifying a category feature

The export process automatically infers auto features for string properties and for numeric properties containing multiple values. For numeric properties containing single values, it infers numerical features. For date properties, it infers datetime features.

If you want to override a feature specification so that a property is treated as a categorical feature, add "type": "category" to the features array. If the property contains multiple values, include a separator field. For example:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ ... ], "features": [ { "node": "Post", "property": "tag", "type": "category", "separator": "|" } ] } }

Specifying a numerical feature

The export process automatically infers auto features for string properties and for numeric properties containing multiple values. For numeric properties containing single values, it infers numerical features. For date properties, it infers datetime features.

If you want to override a feature specification so that a property is treated as a numerical feature, add "type": "numerical" to the features array. If the property contains multiple values, include a separator field. For example:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ ... ], "features": [ { "node": "Recording", "property": "duration", "type": "numerical", "separator": "," } ] } }

Specifying an auto feature

The export process automatically infers auto features for string properties and for numeric properties containing multiple values. For numeric properties containing single values, it infers numerical features. For date properties, it infers datetime features.

If you want to override a feature specification so that a property is treated as an auto feature, add "type": "auto" to the features array. If the property contains multiple values, include a separator field. For example:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ ... ], "features": [ { "node": "User", "property": "role", "type": "auto", "separator": "," } ] } }

RDF examples using additionalParams

Specifying a default split rate for model-training configuration

The split_rate parameter sets the default split rate for model training. If no default split rate is specified, training uses a value of [0.9, 0.1, 0.0]. You can override the default on a per-target basis by specifying a split_rate for each target.

In the following example, the default split_rate field indicates that a split rate of [0.7,0.1,0.2] should be used unless overridden on a per-target basis:

"additionalParams": { "neptune_ml": { "version": "v2.0", "split_rate": [0.7,0.1,0.2], "targets": [ (...) ] } }

Specifying a node-classification task for model-training configuration

To indicate which node property contains labeled examples for training purposes, add a node classification element to the targets array, using "type" : "classification". Add a node field to indicate the node type of the target nodes. Add a predicate field to define which literal data is used as the target value for each target node. Add a split_rate field if you want to override the default split rate.

In the following example, the node target indicates that the genre property of each Movie node should be treated as a node class label. The split_rate value overrides the default split rate:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "node": "http://aws.amazon.com/neptune/csv2rdf/class/Movie", "predicate": "http://aws.amazon.com/neptune/csv2rdf/datatypeProperty/genre", "type": "classification", "split_rate": [0.7,0.1,0.2] } ] } }

Specifying a node regression task for model-training configuration

To indicate which node property contains labeled regression values for training purposes, add a node regression element to the targets array, using "type" : "regression". Add a node field to indicate the node type of the target nodes. Add a predicate field to define which literal data is used as the target value for each target node. Add a split_rate field if you want to override the default split rate.

The following node target indicates that the rating property of each Movie node should be treated as a node regression label:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "node": "http://aws.amazon.com/neptune/csv2rdf/class/Movie", "predicate": "http://aws.amazon.com/neptune/csv2rdf/datatypeProperty/rating", "type": "regression", "split_rate": [0.7,0.1,0.2] } ] } }

Specifying a link prediction task for particular edges

To indicate which edges should be used for link prediction training purposes, add an edge element to the targets array using "type" : "link_prediction". Add subject, predicate, and object fields to specify the edge type. Add a split_rate field if you want to override the default split rate.

The following edge target indicates that directed edges that connect Directors to Movies should be used for link prediction:

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "subject": "http://aws.amazon.com/neptune/csv2rdf/class/Director", "predicate": "http://aws.amazon.com/neptune/csv2rdf/datatypeProperty/directed", "object": "http://aws.amazon.com/neptune/csv2rdf/class/Movie", "type" : "link_prediction" } ] } }

Specifying a link prediction task for all edges

To indicate that all edges should be used for link prediction training purposes, add an edge element to the targets array using "type" : "link_prediction". Do not add subject, predicate, or object fields. Add a split_rate field if you want to override the default split rate.

"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "type" : "link_prediction" } ] } }