AWS Doc SDK ExamplesWord
기계 번역으로 제공되는 번역입니다. 제공된 번역과 원본 영어의 내용이 상충하는 경우에는 영어 버전이 우선합니다.
SDK for Python(Boto3)을 사용한 Firehose 예제
다음 코드 예제에서는 Firehose와 AWS SDK for Python (Boto3) 함께를 사용하여 작업을 수행하고 일반적인 시나리오를 구현하는 방법을 보여줍니다.
작업은 대규모 프로그램에서 발췌한 코드이며 컨텍스트에 맞춰 실행해야 합니다. 작업은 개별 서비스 함수를 직접적으로 호출하는 방법을 보여주며 관련 시나리오의 컨텍스트에 맞는 작업을 볼 수 있습니다.
시나리오는 동일한 서비스 내에서 또는 다른 AWS 서비스와 결합된 상태에서 여러 함수를 호출하여 특정 태스크를 수행하는 방법을 보여주는 코드 예제입니다.
각 예제에는 컨텍스트에서 코드를 설정하고 실행하는 방법에 대한 지침을 찾을 수 있는 전체 소스 코드에 대한 링크가 포함되어 있습니다.
작업
다음 코드 예시에서는 PutRecord
을 사용하는 방법을 보여 줍니다.
- Python용 SDK(Boto3)
-
참고
더 많은 on GitHub가 있습니다. AWS 코드 예시 리포지토리
에서 전체 예시를 찾고 설정 및 실행하는 방법을 배워보세요. class FirehoseClient: """ AWS Firehose client to send records and monitor metrics. Attributes: config (object): Configuration object with delivery stream name and region. delivery_stream_name (str): Name of the Firehose delivery stream. region (str): AWS region for Firehose and CloudWatch clients. firehose (boto3.client): Boto3 Firehose client. cloudwatch (boto3.client): Boto3 CloudWatch client. """ def __init__(self, config): """ Initialize the FirehoseClient. Args: config (object): Configuration object with delivery stream name and region. """ self.config = config self.delivery_stream_name = config.delivery_stream_name self.region = config.region self.firehose = boto3.client("firehose", region_name=self.region) self.cloudwatch = boto3.client("cloudwatch", region_name=self.region) @backoff.on_exception( backoff.expo, Exception, max_tries=5, jitter=backoff.full_jitter ) def put_record(self, record: dict): """ Put individual records to Firehose with backoff and retry. Args: record (dict): The data record to be sent to Firehose. This method attempts to send an individual record to the Firehose delivery stream. It retries with exponential backoff in case of exceptions. """ try: entry = self._create_record_entry(record) response = self.firehose.put_record( DeliveryStreamName=self.delivery_stream_name, Record=entry ) self._log_response(response, entry) except Exception: logger.info(f"Fail record: {record}.") raise
-
API 세부 정보는 Word for Python(Boto3) PutRecord 참조의 Word를 참조하세요. AWS SDK API
-
다음 코드 예시에서는 PutRecordBatch
을 사용하는 방법을 보여 줍니다.
- Python용 SDK(Boto3)
-
참고
더 많은 on GitHub가 있습니다. AWS 코드 예시 리포지토리
에서 전체 예시를 찾고 설정 및 실행하는 방법을 배워보세요. class FirehoseClient: """ AWS Firehose client to send records and monitor metrics. Attributes: config (object): Configuration object with delivery stream name and region. delivery_stream_name (str): Name of the Firehose delivery stream. region (str): AWS region for Firehose and CloudWatch clients. firehose (boto3.client): Boto3 Firehose client. cloudwatch (boto3.client): Boto3 CloudWatch client. """ def __init__(self, config): """ Initialize the FirehoseClient. Args: config (object): Configuration object with delivery stream name and region. """ self.config = config self.delivery_stream_name = config.delivery_stream_name self.region = config.region self.firehose = boto3.client("firehose", region_name=self.region) self.cloudwatch = boto3.client("cloudwatch", region_name=self.region) @backoff.on_exception( backoff.expo, Exception, max_tries=5, jitter=backoff.full_jitter ) def put_record_batch(self, data: list, batch_size: int = 500): """ Put records in batches to Firehose with backoff and retry. Args: data (list): List of data records to be sent to Firehose. batch_size (int): Number of records to send in each batch. Default is 500. This method attempts to send records in batches to the Firehose delivery stream. It retries with exponential backoff in case of exceptions. """ for i in range(0, len(data), batch_size): batch = data[i : i + batch_size] record_dicts = [{"Data": json.dumps(record)} for record in batch] try: response = self.firehose.put_record_batch( DeliveryStreamName=self.delivery_stream_name, Records=record_dicts ) self._log_batch_response(response, len(batch)) except Exception as e: logger.info(f"Failed to send batch of {len(batch)} records. Error: {e}")
-
API 세부 정보는 Word for Python(Boto3) PutRecordBatch 참조의 Word를 참조하세요. AWS SDK API
-
시나리오
다음 코드 예제는 Firehose를 사용하여 개별 레코드 및 배치 레코드를 처리하는 방법을 보여줍니다.
- Python용 SDK(Boto3)
-
참고
더 많은 on GitHub가 있습니다. AWS 코드 예시 리포지토리
에서 전체 예시를 찾고 설정 및 실행하는 방법을 배워보세요. 이 스크립트는 개별 레코드 및 배치 레코드를 Firehose에 넣습니다.
import json import logging import random from datetime import datetime, timedelta import backoff import boto3 from config import get_config def load_sample_data(path: str) -> dict: """ Load sample data from a JSON file. Args: path (str): The file path to the JSON file containing sample data. Returns: dict: The loaded sample data as a dictionary. """ with open(path, "r") as f: return json.load(f) # Configure logging logging.basicConfig(level=logging.INFO) logger = logging.getLogger(__name__) class FirehoseClient: """ AWS Firehose client to send records and monitor metrics. Attributes: config (object): Configuration object with delivery stream name and region. delivery_stream_name (str): Name of the Firehose delivery stream. region (str): AWS region for Firehose and CloudWatch clients. firehose (boto3.client): Boto3 Firehose client. cloudwatch (boto3.client): Boto3 CloudWatch client. """ def __init__(self, config): """ Initialize the FirehoseClient. Args: config (object): Configuration object with delivery stream name and region. """ self.config = config self.delivery_stream_name = config.delivery_stream_name self.region = config.region self.firehose = boto3.client("firehose", region_name=self.region) self.cloudwatch = boto3.client("cloudwatch", region_name=self.region) @backoff.on_exception( backoff.expo, Exception, max_tries=5, jitter=backoff.full_jitter ) def put_record(self, record: dict): """ Put individual records to Firehose with backoff and retry. Args: record (dict): The data record to be sent to Firehose. This method attempts to send an individual record to the Firehose delivery stream. It retries with exponential backoff in case of exceptions. """ try: entry = self._create_record_entry(record) response = self.firehose.put_record( DeliveryStreamName=self.delivery_stream_name, Record=entry ) self._log_response(response, entry) except Exception: logger.info(f"Fail record: {record}.") raise @backoff.on_exception( backoff.expo, Exception, max_tries=5, jitter=backoff.full_jitter ) def put_record_batch(self, data: list, batch_size: int = 500): """ Put records in batches to Firehose with backoff and retry. Args: data (list): List of data records to be sent to Firehose. batch_size (int): Number of records to send in each batch. Default is 500. This method attempts to send records in batches to the Firehose delivery stream. It retries with exponential backoff in case of exceptions. """ for i in range(0, len(data), batch_size): batch = data[i : i + batch_size] record_dicts = [{"Data": json.dumps(record)} for record in batch] try: response = self.firehose.put_record_batch( DeliveryStreamName=self.delivery_stream_name, Records=record_dicts ) self._log_batch_response(response, len(batch)) except Exception as e: logger.info(f"Failed to send batch of {len(batch)} records. Error: {e}") def get_metric_statistics( self, metric_name: str, start_time: datetime, end_time: datetime, period: int, statistics: list = ["Sum"], ) -> list: """ Retrieve metric statistics from CloudWatch. Args: metric_name (str): The name of the metric. start_time (datetime): The start time for the metric statistics. end_time (datetime): The end time for the metric statistics. period (int): The granularity, in seconds, of the returned data points. statistics (list): A list of statistics to retrieve. Default is ['Sum']. Returns: list: List of datapoints containing the metric statistics. """ response = self.cloudwatch.get_metric_statistics( Namespace="AWS/Firehose", MetricName=metric_name, Dimensions=[ {"Name": "DeliveryStreamName", "Value": self.delivery_stream_name}, ], StartTime=start_time, EndTime=end_time, Period=period, Statistics=statistics, ) return response["Datapoints"] def monitor_metrics(self): """ Monitor Firehose metrics for the last 5 minutes. This method retrieves and logs the 'IncomingBytes', 'IncomingRecords', and 'FailedPutCount' metrics from CloudWatch for the last 5 minutes. """ end_time = datetime.utcnow() start_time = end_time - timedelta(minutes=10) period = int((end_time - start_time).total_seconds()) metrics = { "IncomingBytes": self.get_metric_statistics( "IncomingBytes", start_time, end_time, period ), "IncomingRecords": self.get_metric_statistics( "IncomingRecords", start_time, end_time, period ), "FailedPutCount": self.get_metric_statistics( "FailedPutCount", start_time, end_time, period ), } for metric, datapoints in metrics.items(): if datapoints: total_sum = sum(datapoint["Sum"] for datapoint in datapoints) if metric == "IncomingBytes": logger.info( f"{metric}: {round(total_sum)} ({total_sum / (1024 * 1024):.2f} MB)" ) else: logger.info(f"{metric}: {round(total_sum)}") else: logger.info(f"No data found for {metric} over the last 5 minutes") def _create_record_entry(self, record: dict) -> dict: """ Create a record entry for Firehose. Args: record (dict): The data record to be sent. Returns: dict: The record entry formatted for Firehose. Raises: Exception: If a simulated network error occurs. """ if random.random() < 0.2: raise Exception("Simulated network error") elif random.random() < 0.1: return {"Data": '{"malformed": "data"'} else: return {"Data": json.dumps(record)} def _log_response(self, response: dict, entry: dict): """ Log the response from Firehose. Args: response (dict): The response from the Firehose put_record API call. entry (dict): The record entry that was sent. """ if response["ResponseMetadata"]["HTTPStatusCode"] == 200: logger.info(f"Sent record: {entry}") else: logger.info(f"Fail record: {entry}") def _log_batch_response(self, response: dict, batch_size: int): """ Log the batch response from Firehose. Args: response (dict): The response from the Firehose put_record_batch API call. batch_size (int): The number of records in the batch. """ if response.get("FailedPutCount", 0) > 0: logger.info( f'Failed to send {response["FailedPutCount"]} records in batch of {batch_size}' ) else: logger.info(f"Successfully sent batch of {batch_size} records") if __name__ == "__main__": config = get_config() data = load_sample_data(config.sample_data_file) client = FirehoseClient(config) # Process the first 100 sample network records for record in data[:100]: try: client.put_record(record) except Exception as e: logger.info(f"Put record failed after retries and backoff: {e}") client.monitor_metrics() # Process remaining records using the batch method try: client.put_record_batch(data[100:]) except Exception as e: logger.info(f"Put record batch failed after retries and backoff: {e}") client.monitor_metrics()
이 파일에는 위 스크립트에 대한 구성이 포함되어 있습니다.
class Config: def __init__(self): self.delivery_stream_name = "ENTER YOUR DELIVERY STREAM NAME HERE" self.region = "us-east-1" self.sample_data_file = ( "../../../../../workflows/firehose/resources/sample_records.json" ) def get_config(): return Config()
-
API 세부 정보는 AWS SDK for Python(Boto3) API 참조의 다음 주제를 참조하세요.
-