Using Delta Sync operations on versioned data sources in AWS AppSync
Note
We now primarily support the APPSYNC_JS runtime and its documentation. Please consider using the APPSYNC_JS runtime and its guides here.
Client applications in AWS AppSync store data by caching GraphQL responses locally to disk in a mobile/web application. Versioned data sources and Sync operations give customers the ability to perform the sync process using a single resolver. This allows clients to hydrate their local cache with results from one base query that might have a lot of records, and then receive only the data altered since their last query (the delta updates). By allowing clients to perform the base hydration of the cache with an initial request and incremental updates in another, you can move the computation from your client application to the backend. This is substantially more efficient for client applications that frequently switch between online and offline states.
To implement Delta Sync, the Sync query uses the Sync operation on a versioned data source. When an AWS AppSync mutation changes an item in a versioned data source, a record of that change will be stored in the Delta table as well. You can choose to use different Delta tables (e.g. one per type, one per domain area) for other versioned data sources or a single Delta table for your API. AWS AppSync recommends against using a single Delta table for multiple APIs to avoid the collision of primary keys.
In addition, Delta Sync clients can also receive a subscription as an argument, and then the client coordinates subscription reconnects and writes across offline-to-online transitions. Delta Sync performs this by automatically resuming subscriptions (including exponential backoff and retry with jitter through different network error scenarios) and storing events in a queue. The appropriate delta or base query is then run before merging any events from the queue, and finally subscriptions are processed as normal.
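The reconnect behavior described above (exponential backoff with jitter) can be sketched as follows. This is a minimal illustration of the general technique, not the Delta Sync client's actual implementation: the function name, the base delay, and the cap are all assumptions chosen for the example, and the "full jitter" variant shown here is only one common way to add jitter.

```python
import random


def backoff_delays(max_attempts, base_seconds=0.5, cap_seconds=30.0, rng=None):
    """Return one randomized wait time per reconnect attempt.

    base_seconds and cap_seconds are illustrative values, not
    settings taken from any AWS or Amplify client.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_attempts):
        # Exponential ceiling: base * 2^attempt, limited by the cap.
        ceiling = min(cap_seconds, base_seconds * (2 ** attempt))
        # Full jitter: wait a uniformly random time between 0 and the ceiling,
        # so that many clients reconnecting at once do not retry in lockstep.
        delays.append(rng.uniform(0.0, ceiling))
    return delays


# Seeded generator so the example is repeatable.
delays = backoff_delays(6, rng=random.Random(42))
```

Each delay stays below its exponential ceiling, and the ceiling itself stops growing once it reaches the cap.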
Documentation for client configuration options, including the Amplify DataStore, is available on the Amplify Framework website. This documentation outlines how to set up versioned DynamoDB data sources and Sync operations to work with the Delta Sync client for optimal data access.
One-Click Setup
To automatically set up the GraphQL endpoint in AWS AppSync with all the resolvers configured and the necessary AWS resources, use this AWS CloudFormation template:
This stack creates the following resources in your account:
- 2 DynamoDB tables (Base and Delta)
- 1 AWS AppSync API with API key
- 1 IAM Role with policy for DynamoDB tables
Two tables are used so that sync queries can be partitioned: the second (Delta) table acts as a journal of the events that clients missed while they were offline. To keep queries on the Delta table efficient, Amazon DynamoDB TTLs are used to automatically groom the events as necessary. The TTL time is configurable on the data source to suit your needs (for example, 1 hour or 1 day).
Schema
To demonstrate Delta Sync, the sample application creates a Posts schema backed by a Base and Delta table in DynamoDB. AWS AppSync automatically writes the mutations to both tables. The sync query pulls records from the Base or Delta table as appropriate, and a single subscription is defined to show how clients can leverage this in their reconnection logic.
input CreatePostInput {
  author: String!
  title: String!
  content: String!
  url: String
  ups: Int
  downs: Int
  _version: Int
}

interface Connection {
  nextToken: String
  startedAt: AWSTimestamp!
}

type Mutation {
  createPost(input: CreatePostInput!): Post
  updatePost(input: UpdatePostInput!): Post
  deletePost(input: DeletePostInput!): Post
}

type Post {
  id: ID!
  author: String!
  title: String!
  content: String!
  url: AWSURL
  ups: Int
  downs: Int
  _version: Int
  _deleted: Boolean
  _lastChangedAt: AWSTimestamp!
}

type PostConnection implements Connection {
  items: [Post!]!
  nextToken: String
  startedAt: AWSTimestamp!
}

type Query {
  getPost(id: ID!): Post
  syncPosts(limit: Int, nextToken: String, lastSync: AWSTimestamp): PostConnection!
}

type Subscription {
  onCreatePost: Post @aws_subscribe(mutations: ["createPost"])
  onUpdatePost: Post @aws_subscribe(mutations: ["updatePost"])
  onDeletePost: Post @aws_subscribe(mutations: ["deletePost"])
}

input DeletePostInput {
  id: ID!
  _version: Int!
}

input UpdatePostInput {
  id: ID!
  author: String
  title: String
  content: String
  url: String
  ups: Int
  downs: Int
  _version: Int!
}

schema {
  query: Query
  mutation: Mutation
  subscription: Subscription
}
The GraphQL schema is standard, but a couple of things are worth calling out before moving forward. First, every mutation automatically writes to the Base table and then to the Delta table. The Base table is the central source of truth for state, while the Delta table is your journal. If you don’t pass in lastSync: AWSTimestamp, the syncPosts query runs against the Base table. Clients use this form to hydrate their cache, and also run it periodically as a global catch-up process for edge cases where a client has been offline longer than the TTL configured on the Delta table. If you do pass in lastSync: AWSTimestamp, the syncPosts query runs against the Delta table, and clients use it to retrieve the events that changed while they were offline. Amplify clients automatically pass the lastSync: AWSTimestamp value and persist it to disk appropriately.
The _deleted field on Post is used for DELETE operations. When clients are offline and records are removed from the Base table, this attribute notifies clients performing synchronization to evict items from their local cache. In cases where clients are offline for longer periods of time and the item has been removed before the client can retrieve this value with a Delta Sync query, the global catch-up event in the base query (configurable in the client) runs and removes the item from the cache. This field is marked optional because it only returns a value when running a sync query that has deleted items present.
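To make the role of _deleted concrete, here is a minimal sketch of how a client might merge sync results into a local cache. It is an illustration only: apply_sync_items and the dictionary-based cache are hypothetical, and the rule of keeping the higher _version is an assumption drawn from the way _version increments in the examples on this page, not a description of how Amplify clients resolve conflicts.

```python
def apply_sync_items(cache, items):
    """Merge items returned by a sync query into a local cache keyed by id."""
    for item in items:
        if item.get("_deleted"):
            # The item was deleted on the backend: evict it locally.
            cache.pop(item["id"], None)
            continue
        current = cache.get(item["id"])
        # Assumption: a higher _version always means a newer write.
        if current is None or item["_version"] >= current["_version"]:
            cache[item["id"]] = item
    return cache


# Hypothetical local cache and delta results.
cache = {
    "a": {"id": "a", "title": "old title", "_version": 1},
    "b": {"id": "b", "title": "deleted while offline", "_version": 3},
}
delta_items = [
    {"id": "a", "title": "new title", "_version": 2, "_deleted": None},
    {"id": "b", "_version": 4, "_deleted": True},
]
apply_sync_items(cache, delta_items)
```

After the merge, item "a" holds its updated title and item "b" is gone from the cache.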
Mutations
For all of the mutations, AWS AppSync does a standard Create/Update/Delete operation in the Base table and also records the change in the Delta table automatically. You can reduce or extend the time to keep records by modifying the DeltaSyncTableTTL value on the data source. For organizations with a high velocity of data, it may make sense to keep this short. Alternatively, if your clients are offline for longer periods of time, it might be prudent to keep this longer.
Sync Queries
The base query is a DynamoDB Sync operation without a lastSync value specified. For many organizations, this works because the base query only runs on startup and on a periodic basis thereafter.
The delta query is a DynamoDB Sync operation with a lastSync value specified. The delta query executes whenever the client comes back online from an offline state (as long as the periodic base query isn’t due to run instead). Clients automatically track the last time they successfully ran a query to sync data.
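The choice between the base and the delta query can be sketched as a small decision function. This is a hedged illustration rather than the client's real logic: sync_variables is a hypothetical helper, and the 24-hour base interval is an assumed value standing in for whatever periodic interval the client is configured with.

```python
DAY_MS = 24 * 60 * 60 * 1000  # assumed base-query interval, for illustration only


def sync_variables(last_sync_ms, last_base_ms, now_ms,
                   base_interval_ms=DAY_MS, limit=100):
    """Build syncPosts variables; lastSync=None requests the base query."""
    never_synced = last_sync_ms is None or last_base_ms is None
    # Run the base query on first start, or when the periodic interval has elapsed.
    base_due = never_synced or (now_ms - last_base_ms) >= base_interval_ms
    return {
        "limit": limit,
        "nextToken": None,
        "lastSync": None if base_due else last_sync_ms,
    }


first_run = sync_variables(None, None, now_ms=1574469602238)
reconnect = sync_variables(1574469602238, 1574469602238,
                           now_ms=1574469602238 + 60_000)
stale = sync_variables(1574469602238, 1574469602238,
                       now_ms=1574469602238 + 2 * DAY_MS)
```

Here first_run and stale both fall back to the base query, while reconnect carries the stored timestamp and so results in a delta query.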
When a delta query is run, the query’s resolver uses the ds_pk and ds_sk keys of the Delta table to query only for the records that have changed since the last time the client performed a sync. The client stores the appropriate GraphQL response.
For more information on executing Sync Queries, see the Sync Operation documentation.
Example
Let’s start by calling a createPost mutation to create an item:
mutation create {
  createPost(input: {author: "Nadia", title: "My First Post", content: "Hello World"}) {
    id
    author
    title
    content
    _version
    _lastChangedAt
    _deleted
  }
}
The return value of this mutation will look as follows:
{
  "data": {
    "createPost": {
      "id": "81d36bbb-1579-4efe-92b8-2e3f679f628b",
      "author": "Nadia",
      "title": "My First Post",
      "content": "Hello World",
      "_version": 1,
      "_lastChangedAt": 1574469356331,
      "_deleted": null
    }
  }
}
If you examine the contents of the Base table, you will see a record that looks like:
{
  "_lastChangedAt": { "N": "1574469356331" },
  "_version": { "N": "1" },
  "author": { "S": "Nadia" },
  "content": { "S": "Hello World" },
  "id": { "S": "81d36bbb-1579-4efe-92b8-2e3f679f628b" },
  "title": { "S": "My First Post" }
}
If you examine the contents of the Delta table, you will see a record that looks like:
{
  "_lastChangedAt": { "N": "1574469356331" },
  "_ttl": { "N": "1574472956" },
  "_version": { "N": "1" },
  "author": { "S": "Nadia" },
  "content": { "S": "Hello World" },
  "ds_pk": { "S": "AppSync-delta-sync-post:2019-11-23" },
  "ds_sk": { "S": "00:35:56.331:81d36bbb-1579-4efe-92b8-2e3f679f628b:1" },
  "id": { "S": "81d36bbb-1579-4efe-92b8-2e3f679f628b" },
  "title": { "S": "My First Post" }
}
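The ds_pk, ds_sk, and _ttl values in the Delta table record above appear to be derived from _lastChangedAt. The sketch below reproduces them for this one sample. It is reverse-engineered from the record shown here and is not a documented contract: AWS AppSync writes these attributes itself, so you never compute them in your own code. The helper name delta_keys is hypothetical, the key layout is inferred from the sample, and the 60-minute TTL is an assumption that happens to match the sample's _ttl.

```python
from datetime import datetime, timezone


def delta_keys(prefix, item_id, version, last_changed_ms, ttl_minutes):
    """Rebuild the Delta table key attributes seen in the sample record."""
    # _lastChangedAt is in epoch milliseconds; split off the millisecond part.
    changed = datetime.fromtimestamp(last_changed_ms // 1000, tz=timezone.utc)
    millis = last_changed_ms % 1000
    return {
        # Partition key: a prefix plus the UTC date of the change.
        "ds_pk": f"{prefix}:{changed:%Y-%m-%d}",
        # Sort key: UTC time of day, then the item id, then the version.
        "ds_sk": f"{changed:%H:%M:%S}.{millis:03d}:{item_id}:{version}",
        # DynamoDB TTL attributes are expressed in epoch seconds.
        "_ttl": last_changed_ms // 1000 + ttl_minutes * 60,
    }


keys = delta_keys(
    prefix="AppSync-delta-sync-post",          # taken from the sample record
    item_id="81d36bbb-1579-4efe-92b8-2e3f679f628b",
    version=1,
    last_changed_ms=1574469356331,
    ttl_minutes=60,                            # assumed; matches the sample _ttl
)
```

Sorting on time of day within a date-based partition is what lets a delta query read only the changes after a given timestamp.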
Now we can simulate a Base query that a client will run to hydrate its local data store using a syncPosts query like:
query baseQuery {
  syncPosts(limit: 100, lastSync: null, nextToken: null) {
    items {
      id
      author
      title
      content
      _version
      _lastChangedAt
    }
    startedAt
    nextToken
  }
}
The return value of this Base query will look as follows:
{
  "data": {
    "syncPosts": {
      "items": [
        {
          "id": "81d36bbb-1579-4efe-92b8-2e3f679f628b",
          "author": "Nadia",
          "title": "My First Post",
          "content": "Hello World",
          "_version": 1,
          "_lastChangedAt": 1574469356331
        }
      ],
      "startedAt": 1574469602238,
      "nextToken": null
    }
  }
}
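For readers issuing these queries from their own code rather than a console, the following sketch shows one way to build the request body for syncPosts and to read startedAt out of a response like the one above. The helpers build_request_body and read_sync_result are hypothetical names, and the network call itself (endpoint, authorization) is left out because it depends on your API configuration.

```python
import json

# Parameterized form of the syncPosts query from the sample schema.
SYNC_POSTS = """
query syncPosts($limit: Int, $nextToken: String, $lastSync: AWSTimestamp) {
  syncPosts(limit: $limit, nextToken: $nextToken, lastSync: $lastSync) {
    items { id author title content _version _lastChangedAt _deleted }
    startedAt
    nextToken
  }
}
"""


def build_request_body(last_sync=None, next_token=None, limit=100):
    """Serialize a GraphQL request; last_sync=None asks for the base query."""
    variables = {"limit": limit, "nextToken": next_token, "lastSync": last_sync}
    return json.dumps({"query": SYNC_POSTS, "variables": variables})


def read_sync_result(response_text):
    """Return (items, startedAt, nextToken) from a syncPosts response."""
    result = json.loads(response_text)["data"]["syncPosts"]
    return result["items"], result["startedAt"], result["nextToken"]


# The base query response shown above, abbreviated to the fields used here.
sample_response = """
{"data": {"syncPosts": {"items": [
  {"id": "81d36bbb-1579-4efe-92b8-2e3f679f628b", "title": "My First Post", "_version": 1}
], "startedAt": 1574469602238, "nextToken": null}}}
"""

items, started_at, next_token = read_sync_result(sample_response)
# Reuse startedAt as lastSync when building the next (delta) request.
delta_body = build_request_body(last_sync=started_at)
```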
We’ll use the startedAt value later to simulate a Delta query, but first we need to make a change to our table. Let’s use the updatePost mutation to modify our existing Post:
mutation updatePost {
  updatePost(input: {id: "81d36bbb-1579-4efe-92b8-2e3f679f628b", _version: 1, title: "Actually this is my Second Post"}) {
    id
    author
    title
    content
    _version
    _lastChangedAt
    _deleted
  }
}
The return value of this mutation will look as follows:
{
  "data": {
    "updatePost": {
      "id": "81d36bbb-1579-4efe-92b8-2e3f679f628b",
      "author": "Nadia",
      "title": "Actually this is my Second Post",
      "content": "Hello World",
      "_version": 2,
      "_lastChangedAt": 1574469851417,
      "_deleted": null
    }
  }
}
If you examine the contents of the Base table now, you should see the updated item:
{
  "_lastChangedAt": { "N": "1574469851417" },
  "_version": { "N": "2" },
  "author": { "S": "Nadia" },
  "content": { "S": "Hello World" },
  "id": { "S": "81d36bbb-1579-4efe-92b8-2e3f679f628b" },
  "title": { "S": "Actually this is my Second Post" }
}
If you examine the contents of the Delta table now, you should see two records:
- A record for when the item was created
- A record for when the item was updated
The new item will look like:
{
  "_lastChangedAt": { "N": "1574469851417" },
  "_ttl": { "N": "1574473451" },
  "_version": { "N": "2" },
  "author": { "S": "Nadia" },
  "content": { "S": "Hello World" },
  "ds_pk": { "S": "AppSync-delta-sync-post:2019-11-23" },
  "ds_sk": { "S": "00:44:11.417:81d36bbb-1579-4efe-92b8-2e3f679f628b:2" },
  "id": { "S": "81d36bbb-1579-4efe-92b8-2e3f679f628b" },
  "title": { "S": "Actually this is my Second Post" }
}
Now we can simulate a Delta query to retrieve modifications that occurred when a client was offline. We will use the startedAt value returned from our Base query to make the request:
query delta {
  syncPosts(limit: 100, lastSync: 1574469602238, nextToken: null) {
    items {
      id
      author
      title
      content
      _version
    }
    startedAt
    nextToken
  }
}
The return value of this Delta query will look as follows:
{
  "data": {
    "syncPosts": {
      "items": [
        {
          "id": "81d36bbb-1579-4efe-92b8-2e3f679f628b",
          "author": "Nadia",
          "title": "Actually this is my Second Post",
          "content": "Hello World",
          "_version": 2
        }
      ],
      "startedAt": 1574470400808,
      "nextToken": null
    }
  }
}
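Both sample queries return everything in one page, but a real sync may need to follow nextToken. The sketch below shows one way a client could drain all pages and decide which startedAt to persist for its next delta query. run_sync and fetch_page are hypothetical names; fetch_page stands in for whatever function sends the syncPosts query and returns the syncPosts object. Keeping the startedAt from the first page is an assumption based on that value marking when the sync began.

```python
def run_sync(fetch_page, last_sync=None, limit=100):
    """Follow nextToken until exhausted; return (all items, startedAt to persist)."""
    items, next_token, started_at = [], None, None
    while True:
        page = fetch_page(
            {"limit": limit, "nextToken": next_token, "lastSync": last_sync}
        )
        items.extend(page["items"])
        if started_at is None:
            # Remember when this sync began; it becomes the next lastSync.
            started_at = page["startedAt"]
        next_token = page["nextToken"]
        if next_token is None:
            return items, started_at


def fake_fetch_page(variables):
    """Stand-in for a network call: serves two canned pages keyed by token."""
    pages = {
        None: {"items": [{"id": "1"}, {"id": "2"}],
               "startedAt": 1574470400808, "nextToken": "page-2"},
        "page-2": {"items": [{"id": "3"}],
                   "startedAt": 1574470400808, "nextToken": None},
    }
    return pages[variables["nextToken"]]


items, started_at = run_sync(fake_fetch_page, last_sync=1574469602238)
```

A client would store started_at on disk only after the loop completes, so that an interrupted sync is retried from the previous timestamp.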