
Collection types

Collection types organize and structure data for efficient retrieval and analysis. In ML databases, they are used to define the schema of a dataset and its elements. Amazon SageMaker Feature Store supports three collection types: list, set, and vector.

A collection is a group of elements that must all have the same feature type (String, Integral, or Fractional). For example, a collection can contain only Fractional elements, but it cannot mix Fractional elements with String elements.
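The same-type rule can be checked on the client side before data is written. The sketch below is a minimal, hypothetical helper (`element_feature_type` and `collection_feature_type` are illustrative names, not part of any AWS SDK). It assumes Python `int`, `float`, and `str` map to Integral, Fractional, and String, and it treats an empty collection as having no type.

```python
# Hypothetical client-side helpers; not part of the SageMaker SDK or boto3.
def element_feature_type(value):
    """Map a Python value to a Feature Store feature type name (assumed mapping)."""
    # bool is a subclass of int in Python, so reject it before the int check.
    if isinstance(value, bool):
        raise TypeError(f"unsupported element type: {type(value).__name__}")
    if isinstance(value, int):
        return "Integral"
    if isinstance(value, float):
        return "Fractional"
    if isinstance(value, str):
        return "String"
    raise TypeError(f"unsupported element type: {type(value).__name__}")


def collection_feature_type(elements):
    """Return the single feature type shared by every element, or raise ValueError.

    An empty collection has no element type, so this sketch rejects it too.
    """
    types = {element_feature_type(e) for e in elements}
    if len(types) != 1:
        raise ValueError(f"elements must share one feature type, found {sorted(types)}")
    return types.pop()


print(collection_feature_type([1.5, 2.0, -3.25]))  # Fractional
try:
    collection_feature_type([1.5, "a"])
except ValueError as err:
    print(err)  # elements must share one feature type, found ['Fractional', 'String']
```

Collecting the element types into a Python set makes the check a single comparison: a valid collection yields exactly one type name.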

Currently, only feature groups that use the InMemory online store tier support collection types. The following list describes each collection type.

List: An ordered collection of elements.

  • The length of the list is determined by how many elements are in the collection.

  • Example: A list such as ['a', 'b', 'a'] is valid, because a list preserves order and can contain repeated elements.
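List semantics match an ordinary Python list. This minimal sketch uses a hypothetical helper, `as_list_collection`, to model how a list-typed value keeps every element in the order it was given, including repeats; it does not call any Feature Store API.

```python
def as_list_collection(elements):
    """Model List semantics (hypothetical helper): keep order and keep repeats."""
    return list(elements)


tags = as_list_collection(["a", "b", "a"])
print(tags)       # ['a', 'b', 'a']
print(len(tags))  # 3: the length counts every element, repeats included
```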

Set: An unordered collection of unique elements.

  • The length of the set is determined by how many unique elements are in the collection.

  • Example: A set cannot contain ['a', 'b', 'a'], because 'a' repeats. Instead, the set holds only the unique elements ['a', 'b'].
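For contrast with the list case, the hypothetical helper `as_set_collection` below models set semantics with a Python `set`: writing ['a', 'b', 'a'] leaves only the unique elements, and two sets with the same elements are equal regardless of order. This is a local illustration only, not Feature Store's storage code.

```python
def as_set_collection(elements):
    """Model Set semantics (hypothetical helper): unique elements, no meaningful order."""
    return set(elements)


labels = as_set_collection(["a", "b", "a"])
print(sorted(labels))  # ['a', 'b']
print(len(labels))     # 2: the length counts unique elements only
print(as_set_collection(["a", "b"]) == as_set_collection(["b", "a"]))  # True
```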

Vector: A specialized list that represents a fixed-size array of elements. The order of the elements is significant: each position represents a specific property of the data.

  • The elements in the vector collection type must have the Fractional feature type.

  • A feature group that uses the InMemory online store tier can have only one vector collection type.

  • You choose the dimension of the vector (the number of elements it contains) in advance and specify it using VectorDimension. The maximum dimension is 8192.

  • Example: You can have a vector such as [4.2, -6.3, 4.2], where the first, second, and third elements can represent the x, y, and z positions in physical space.
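The vector rules above can be checked locally before you create a feature group. The sketch below is a hypothetical validator: `FeatureSpec`, `validate_features`, and `check_vector_value` are illustrative names, not SageMaker SDK or boto3 types, and the `vector_dimension` field stands in for the VectorDimension setting. It assumes a dimension must be a positive integer; the 8192 limit, the Fractional requirement, the one-vector limit, and the InMemory requirement come from this section.

```python
from dataclasses import dataclass
from typing import Optional

MAX_VECTOR_DIMENSION = 8192  # limit stated in this section


@dataclass
class FeatureSpec:
    # Hypothetical, simplified stand-in for a feature definition.
    name: str
    feature_type: str                        # "String", "Integral", or "Fractional"
    collection_type: Optional[str] = None    # None, "List", "Set", or "Vector"
    vector_dimension: Optional[int] = None   # stands in for VectorDimension


def validate_features(features, storage_type):
    """Raise ValueError if the features break the collection rules in this section."""
    collections = [f for f in features if f.collection_type is not None]
    if collections and storage_type != "InMemory":
        raise ValueError("collection types require the InMemory online store tier")
    vectors = [f for f in collections if f.collection_type == "Vector"]
    if len(vectors) > 1:
        raise ValueError("only one vector collection type per feature group")
    for f in vectors:
        if f.feature_type != "Fractional":
            raise ValueError(f"{f.name}: vector elements must be Fractional")
        # Assumption: the dimension must be a positive integer no larger than the limit.
        if f.vector_dimension is None or not 1 <= f.vector_dimension <= MAX_VECTOR_DIMENSION:
            raise ValueError(f"{f.name}: dimension must be between 1 and {MAX_VECTOR_DIMENSION}")


def check_vector_value(spec, values):
    """A vector is fixed-size: a value must have exactly vector_dimension elements."""
    if len(values) != spec.vector_dimension:
        raise ValueError(f"{spec.name}: expected {spec.vector_dimension} elements, got {len(values)}")


features = [
    FeatureSpec("record_id", "String"),
    FeatureSpec("position", "Fractional", "Vector", vector_dimension=3),
    FeatureSpec("tags", "String", "List"),
]
validate_features(features, storage_type="InMemory")  # passes
check_vector_value(features[1], [4.2, -6.3, 4.2])     # passes: 3 elements for x, y, z
```

Validating the schema separately from individual values mirrors the split in the text: the dimension is fixed once when you define the feature, and every vector written afterward must match it.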

Apart from the vector dimension limit, collections have no length limit, as long as the record that contains them does not exceed the maximum record size. For the maximum size of a record, see Quotas, naming rules and data types.