Some AWS operations return truncated results that require subsequent requests in order to retrieve the entire result set. The subsequent requests typically require pagination tokens or markers from the previous request in order to retrieve the next set of results. Working with these tokens can be cumbersome, since you must manually keep track of them, and the API for each service you are using may differ in the names and details of the tokens.
The AWS SDK for PHP has a feature called Iterators that allows you to retrieve an entire result set without
manually handling pagination tokens or markers. The Iterators in the SDK implement PHP's Iterator interface, which
allows you to enumerate resources from a result set with foreach.
Operations that start with List or Describe, or any other operations that are designed to return multiple
records, can be used with Iterators. To use an Iterator, you must call the getIterator() method of the client and
provide the operation name. The following is an example of creating an Amazon S3 ListObjects Iterator to iterate
over objects in a bucket.
$iterator = $client->getIterator('ListObjects', array('Bucket' => 'my-bucket'));
foreach ($iterator as $object) {
    echo $object['Key'] . "\n";
}
The getIterator() method also accepts a command object as the first argument. If you have a command object already
instantiated, you can create an iterator directly from the command object.
$command = $client->getCommand('ListObjects', array('Bucket' => 'my-bucket'));
$iterator = $client->getIterator($command);
The actual object returned by getIterator() is an instance of the Aws\Common\Iterator\AwsResourceIterator class
(see the API docs for more information about its methods and properties). This class implements PHP's native
Iterator interface, which is why it works with foreach, can be used with iterator functions like
iterator_to_array, and integrates well with SPL iterators like LimitIterator.
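Because the returned object is a standard Iterator, the SPL tools named in the previous paragraph work on it unchanged. The following is a minimal sketch of that idea. So that it runs without the SDK or credentials, it uses an ArrayIterator filled with made-up keys as a stand-in for the object returned by getIterator(). That stand-in is an assumption of the example, not part of the SDK.

```php
<?php
// Stand-in for $client->getIterator('ListObjects', ...). This is an
// assumption so the sketch runs on its own; the keys are made up.
$iterator = new ArrayIterator(array(
    array('Key' => 'photos/a.jpg'),
    array('Key' => 'photos/b.jpg'),
    array('Key' => 'photos/c.jpg'),
    array('Key' => 'photos/d.jpg'),
));

// LimitIterator(iterator, offset, count): skip the first object, take two.
$window = new LimitIterator($iterator, 1, 2);

// iterator_to_array() drains the iterator into a plain array.
// Passing false discards the original keys and reindexes from 0.
$objects = iterator_to_array($window, false);

foreach ($objects as $object) {
    echo $object['Key'] . "\n";
}
```

Note that iterator_to_array() loads every remaining result into memory, so on a large result set it is usually worth bounding the iterator first.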
Iterator objects only store one "page" of results at a time and only make as many requests as they need based on the
current iteration. The S3 ListObjects operation only returns up to 1000 objects at a time. If your bucket has ~10000
objects, then the iterator would need to make 10 requests. However, it does not execute the subsequent requests until
needed. If you are iterating through the results, the first request would happen when you start iterating, and the
second request would not happen until you iterate to the 1001st object. This can help your application save memory by
only holding one page of results at a time.
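The lazy behavior described here can be illustrated with a small self-contained sketch. It is not the SDK's implementation: FakeListService and paginate() are hypothetical names introduced for illustration, the "service" is an in-memory array, and the generator syntax assumes PHP 5.5 or later. It only shows that the second request is deferred until the 1001st item is needed.

```php
<?php
// Hypothetical in-memory stand-in for a paginated service operation.
class FakeListService
{
    public $requestCount = 0;
    private $keys;

    public function __construct(array $keys)
    {
        $this->keys = $keys;
    }

    // Returns array(page, nextMarker); nextMarker is null on the last page.
    public function listPage($marker, $pageSize)
    {
        $this->requestCount++;
        $page = array_slice($this->keys, $marker, $pageSize);
        $next = $marker + count($page);
        return array($page, $next < count($this->keys) ? $next : null);
    }
}

// Yields items one at a time, holding a single page in memory and
// requesting the next page only once the current one is exhausted.
function paginate(FakeListService $service, $pageSize)
{
    $marker = 0;
    while ($marker !== null) {
        list($page, $marker) = $service->listPage($marker, $pageSize);
        foreach ($page as $key) {
            yield $key;
        }
    }
}

$service = new FakeListService(range(1, 2500)); // pretend bucket of 2500 objects
$seen = 0;
foreach (paginate($service, 1000) as $key) {
    $seen++;
    if ($seen === 1000) {
        $requestsAt1000 = $service->requestCount; // still on the first page
    }
    if ($seen === 1001) {
        $requestsAt1001 = $service->requestCount; // second page was fetched
        break;
    }
}
echo "Requests at item 1000: {$requestsAt1000}, at item 1001: {$requestsAt1001}\n";
```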
Iterators accept an extra set of parameters that are not passed into the commands. You can set a limit on the number of
results you want with the limit parameter, and you can control how many results you want to get back per request
using the page_size parameter. If no limit is specified, then all results are retrieved. If no page_size is
specified, then the Iterator will use the maximum page size allowed by the operation being executed.
The following example will make 10 Amazon S3 ListObjects requests (assuming there are more than 1000 objects in the
specified bucket) that each return up to 100 objects. The foreach loop will yield up to 999 objects.
$iterator = $client->getIterator('ListObjects', array(
    'Bucket' => 'my-bucket'
), array(
    'limit'     => 999,
    'page_size' => 100
));
foreach ($iterator as $object) {
    echo $object['Key'] . "\n";
}
There are some limitations to the limit and page_size parameters, though. Not all operations support specifying
a page size or limit, so the Iterator will do its best with what you provide. For example, if an operation always
returns 1000 results, and you specify a limit of 100, the Iterator will only yield 100 results, even though the actual
request sent to the service yielded 1000.
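The interaction between limit and the per-request result count can be worked through with a simplified model. The simulate() function is hypothetical and written only for this illustration. It assumes the service returns a fixed number of results per request and that the limit is applied on the client side; the SDK's actual iterator may size its final request differently.

```php
<?php
// Simplified model: $perRequest is how many results the (hypothetical)
// service returns per request, whether or not it honors page_size.
function simulate($totalObjects, $limit, $perRequest)
{
    $requests = 0;
    $fetched  = 0; // results returned by the service
    $yielded  = 0; // results handed to the foreach loop

    while ($yielded < $limit && $fetched < $totalObjects) {
        $requests++;
        $page = min($perRequest, $totalObjects - $fetched);
        $fetched += $page;
        // Only hand out as many results as the limit still allows.
        $yielded += min($page, $limit - $yielded);
    }

    return array('requests' => $requests, 'fetched' => $fetched, 'yielded' => $yielded);
}

// Operation honors page_size = 100; limit = 999; 5000 objects exist.
$honored = simulate(5000, 999, 100);

// Operation always returns 1000 results per request; limit = 100.
$ignored = simulate(5000, 100, 1000);

echo "{$honored['requests']} requests, {$honored['yielded']} yielded\n";
echo "{$ignored['requests']} request, {$ignored['fetched']} fetched, {$ignored['yielded']} yielded\n";
```

In the second case the model makes a single request that returns 1000 results and yields only 100 of them, which matches the limitation described in the preceding paragraph.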
Iterators emit 2 kinds of events:
- resource_iterator.before_send: Emitted right before a request is sent to retrieve results.
- resource_iterator.after_send: Emitted right after a request is sent to retrieve results.

Iterator objects extend the Guzzle\Common\AbstractHasDispatcher class, which exposes the addSubscriber() method
and the getEventDispatcher() method. To attach listeners, you can use the following example, which echoes a message
right before and after a request is executed by the iterator.
$iterator = $client->getIterator('ListObjects', array(
    'Bucket' => 'my-bucket'
));
// Get the event dispatcher and register listeners for both events
$dispatcher = $iterator->getEventDispatcher();
$dispatcher->addListener('resource_iterator.before_send', function ($event) {
    echo "Getting more results…\n";
});
$dispatcher->addListener('resource_iterator.after_send', function ($event) use ($iterator) {
    $requestCount = $iterator->getRequestCount();
    echo "Results received. {$requestCount} request(s) made so far.\n";
});
foreach ($iterator as $object) {
    echo $object['Key'] . "\n";
}