

# Source management in Security Lake
<a name="source-management"></a>

Sources are logs and events generated from a single system that match a specific event class in the [Open Cybersecurity Schema Framework (OCSF) in Security Lake](open-cybersecurity-schema-framework.md) schema. Amazon Security Lake can collect logs and events from a variety of sources, including natively supported AWS services and third-party custom sources.

Security Lake runs extract, transform, and load (ETL) jobs on raw source data, and converts the data to Apache Parquet format and the OCSF schema. After processing, Security Lake stores source data in an Amazon Simple Storage Service (Amazon S3) bucket in your AWS account in the AWS Region that the data was generated in. Security Lake creates a different Amazon S3 bucket for each Region in which you enable the service. Each source gets a separate prefix in your S3 bucket, and Security Lake organizes data from each source in a separate set of AWS Lake Formation tables.
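For example, assuming a data lake bucket named in the style that Security Lake uses (the exact name includes a Region and a unique suffix, so the name below is a placeholder), you could list the per-source prefixes with the AWS CLI:

```
$ aws s3 ls s3://aws-security-data-lake-us-east-1-lake-uid/
```

Each prefix in the listing corresponds to one source that you've enabled in that Region.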

**Topics**
+ [Collecting data from AWS services in Security Lake](internal-sources.md)
+ [Collecting data from custom sources in Security Lake](custom-sources.md)

# Collecting data from AWS services in Security Lake
<a name="internal-sources"></a>

Amazon Security Lake can collect logs and events from the following natively-supported AWS services:
+ AWS CloudTrail management and data events (S3, Lambda)
+ Amazon Elastic Kubernetes Service (Amazon EKS) Audit Logs
+ Amazon Route 53 resolver query logs
+ AWS Security Hub CSPM findings
+ Amazon Virtual Private Cloud (Amazon VPC) Flow Logs
+ AWS WAFv2 logs

Security Lake automatically transforms this data into the [Open Cybersecurity Schema Framework (OCSF) in Security Lake](open-cybersecurity-schema-framework.md) and Apache Parquet format.

**Tip**  
 To add one or more of the preceding services as a log source in Security Lake, you *don't* need to separately configure logging in these services, with the exception of CloudTrail management events. If you already have logging configured in these services, you also *don't* need to change your logging configuration to add them as log sources in Security Lake. Security Lake pulls data directly from these services through an independent and duplicated stream of events. 



## Prerequisite: Verify permissions
<a name="add-internal-sources-permissions"></a>

To add an AWS service as a source in Security Lake, you must have the necessary permissions. Verify that the AWS Identity and Access Management (IAM) policy attached to the role that you use to add a source has permission to perform the following actions:
+ `glue:CreateDatabase`
+ `glue:CreateTable`
+ `glue:GetDatabase`
+ `glue:GetTable`
+ `glue:UpdateTable`
+ `iam:CreateServiceLinkedRole`
+ `s3:GetObject`
+ `s3:PutObject`

We recommend that the role have the following condition and resource scope for the `s3:GetObject` and `s3:PutObject` permissions.

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowUpdatingSecurityLakeS3Buckets",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::aws-security-data-lake*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceAccount": "${aws:PrincipalAccount}"
                }
            }
        }
    ]
}
```

------

These actions allow you to collect logs and events from an AWS service and send them to the correct AWS Glue database and table.

If you use an AWS KMS key for server-side encryption of your data lake, you also need permission to perform the `kms:DescribeKey` action.
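Combining the requirements above, a minimal policy sketch for the remaining actions might look like the following. This is illustrative only; the KMS key ARN is a placeholder, and you should scope the Glue resources down to your own data lake databases and tables.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowGlueAndServiceLinkedRoleForSecurityLake",
            "Effect": "Allow",
            "Action": [
                "glue:CreateDatabase",
                "glue:CreateTable",
                "glue:GetDatabase",
                "glue:GetTable",
                "glue:UpdateTable",
                "iam:CreateServiceLinkedRole"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowDescribeKmsKeyForDataLakeEncryption",
            "Effect": "Allow",
            "Action": "kms:DescribeKey",
            "Resource": "arn:aws:kms:us-east-1:123456789012:key/your-key-id"
        }
    ]
}
```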

## Adding an AWS service as a source
<a name="add-internal-sources"></a>

After you add an AWS service as a source, Security Lake automatically starts collecting security logs and events from it. These instructions tell you how to add a natively-supported AWS service as a source in Security Lake. For instructions on adding a custom source, see [Collecting data from custom sources in Security Lake](custom-sources.md).

------
#### [ Console ]

**To add an AWS log source (console)**

1. Open the Security Lake console at [https://console.aws.amazon.com/securitylake/](https://console.aws.amazon.com/securitylake/).

1. Choose **Sources** from the navigation pane.

1. Select the AWS service that you want to collect data from, and choose **Configure**. 

1. In the **Source settings** section, enable the source and select the version of the data source that you want to use for data ingestion. By default, Security Lake ingests the latest version of the data source.
**Important**  
If you don't have the required role permissions to enable the new version of the AWS log source in the specified Region, contact your Security Lake administrator. For more information, see [Update role permissions](https://docs.aws.amazon.com/security-lake/latest/userguide/internal-sources.html#update-role-permissions).

   For your subscribers to ingest the selected version of the data source, you must also update your subscriber settings. For details on how to edit a subscriber, see [Subscriber management in Amazon Security Lake](https://docs.aws.amazon.com/security-lake/latest/userguide/subscriber-management.html).

   Optionally, you can choose to ingest the latest version only and disable all previous source versions used for data ingestion. 

1. In the **Regions** section, select the Regions in which you want to collect data for the source. Security Lake will collect data from the source from *all* accounts in the selected Regions.

1. Choose **Enable**.

------
#### [ API ]

**To add an AWS log source (API)**

To add an AWS service as a source programmatically, use the [CreateAwsLogSource](https://docs.aws.amazon.com/security-lake/latest/APIReference/API_CreateAwsLogSource.html) operation of the Security Lake API. If you're using the AWS Command Line Interface (AWS CLI), run the [create-aws-log-source](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/securitylake/create-aws-log-source.html) command. The `sourceName` and `regions` parameters are required. Optionally, you can limit the scope of the source to specific `accounts` or a specific `sourceVersion`.

**Important**  
When you don't provide a parameter in your command, Security Lake assumes that the missing parameter refers to the entire set. For example, if you don't provide the `accounts` parameter, the command applies to the entire set of accounts in your organization.

The following example adds VPC Flow Logs as a source in the designated accounts and Region. This example is formatted for Linux, macOS, or Unix, and it uses the backslash (\) line-continuation character to improve readability.

**Note**  
If you apply this request to a Region in which you haven't enabled Security Lake, you'll receive an error. You can resolve the error by enabling Security Lake in that Region or by using the `regions` parameter to specify only those Regions in which you've enabled Security Lake.

```
$ aws securitylake create-aws-log-source \
--sources sourceName=VPC_FLOW,accounts='["123456789012", "111122223333"]',regions='["us-east-2"]',sourceVersion="2.0"
```
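Because an omitted parameter applies to the entire set, you can enable a source across your whole organization by leaving out `accounts`. The following sketch is illustrative (the Region and source version values are examples); it enables Route 53 resolver query logs for all accounts in one Region:

```
$ aws securitylake create-aws-log-source \
--sources sourceName=ROUTE53,regions='["us-east-1"]',sourceVersion="2.0"
```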

------

## Getting the status of source collection
<a name="get-status-internal-sources"></a>

Choose your access method, and follow the steps to get a snapshot of the accounts and sources for which log collection is enabled in the current Region.

------
#### [ Console ]

**To get the status of log collection in the current Region**

1. Open the Security Lake console at [https://console.aws.amazon.com/securitylake/](https://console.aws.amazon.com/securitylake/).

1. On the navigation pane, choose **Accounts**.

1. Hover the cursor over the number in the **Sources** column to see which logs are enabled for the selected account.

------
#### [ API ]

To get the status of log collection in the current Region, use the [GetDataLakeSources](https://docs.aws.amazon.com/security-lake/latest/APIReference/API_GetDataLakeSources.html) operation of the Security Lake API. If you're using the AWS CLI, run the [get-data-lake-sources](https://docs.aws.amazon.com/cli/latest/reference/securitylake/get-data-lake-sources.html) command. For the `accounts` parameter, you can specify one or more AWS account IDs as a list. If your request succeeds, Security Lake returns a snapshot for those accounts in the current Region, including which AWS sources Security Lake is collecting data from and the status of each source. If you don't include the `accounts` parameter, the response includes the status of log collection for all accounts in which Security Lake is configured in the current Region.

For example, the following AWS CLI command retrieves log collection status for the specified accounts in the current Region. This example is formatted for Linux, macOS, or Unix, and it uses the backslash (\) line-continuation character to improve readability.

```
$ aws securitylake get-data-lake-sources \
--accounts "123456789012" "111122223333"
```
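To narrow the response, you can combine the command with the AWS CLI's built-in `--query` filtering. The following sketch is illustrative and assumes the response contains a `dataLakeSources` list, as described in the `GetDataLakeSources` API reference:

```
$ aws securitylake get-data-lake-sources \
--accounts "123456789012" \
--query 'dataLakeSources[].sourceName'
```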

------

# Updating role permissions in Security Lake
<a name="update-role-permissions"></a>

If you don't have the required role permissions or resources—new AWS Lambda function and Amazon Simple Queue Service (Amazon SQS) queue—to ingest data from a new version of the data source, you must update your `AmazonSecurityLakeMetaStoreManagerV2` role permissions and create a new set of resources to process data from your sources.

Choose your preferred method, and follow the instructions to update your role permissions and create new resources to process data from a new version of an AWS log source in a specified Region. This is a one-time action, as the permissions and resources are automatically applied to future data source releases.

------
#### [ Console ]

**To update role permissions (console)**

1. Open the Security Lake console at [https://console.aws.amazon.com/securitylake/](https://console.aws.amazon.com/securitylake/).

   Sign in with the credentials of the delegated Security Lake administrator.

1. In the navigation pane, under **Settings**, choose **General**.

1. Choose **Update role permissions**.

1. In the **Service access** section, do one of the following: 
   + **Create and use a new service role**— You can use the **AmazonSecurityLakeMetaStoreManagerV2** role created by Security Lake.
   + **Use an existing service role**— You can choose an existing service role from the **Service role name** list. 

1. Choose **Apply**.

------
#### [ API ]

**To update role permissions (API)**

To update permissions programmatically, use the [UpdateDataLake](https://docs.aws.amazon.com/security-lake/latest/APIReference/API_UpdateDataLake.html) operation of the Security Lake API. To update permissions using the AWS CLI, run the [update-data-lake](https://docs.aws.amazon.com/cli/latest/reference/securitylake/update-data-lake.html) command. 

To update your role permissions, you must attach the [AmazonSecurityLakeMetastoreManager](security-iam-awsmanpol.md#security-iam-awsmanpol-AmazonSecurityLakeMetastoreManager) policy to the role. 
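For example, if you manage the role with the AWS CLI, attaching the managed policy might look like the following. The role name is a placeholder for the role that you use; the policy ARN follows the standard `arn:aws:iam::aws:policy/` pattern for AWS managed policies.

```
$ aws iam attach-role-policy \
--role-name YourSecurityLakeMetaStoreRole \
--policy-arn arn:aws:iam::aws:policy/AmazonSecurityLakeMetastoreManager
```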

------

## Deleting the AmazonSecurityLakeMetaStoreManager role
<a name="remove-sl-metastoremanager-role"></a>

**Important**  
After you update your role permissions to `AmazonSecurityLakeMetaStoreManagerV2`, confirm that the data lake works correctly before you remove the old `AmazonSecurityLakeMetaStoreManager` role. We recommend waiting at least 4 hours before removing the role.

 If you decide to remove the role, you must first delete the `AmazonSecurityLakeMetaStoreManager` role from AWS Lake Formation. 

Follow these steps to remove the `AmazonSecurityLakeMetaStoreManager` role from the Lake Formation console.

1. Sign in to the AWS Management Console, and open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/).

1. In the Lake Formation console, from the navigation pane, choose **Administrative roles and tasks**.

1. Remove `AmazonSecurityLakeMetaStoreManager` from each Region.

# Removing an AWS service as a source from Security Lake
<a name="remove-internal-sources"></a>

Choose your access method, and follow these steps to remove a natively-supported AWS service as a Security Lake source. You can remove a source for one or more Regions. When you remove the source, Security Lake stops collecting data from that source in the specified Regions and accounts, and subscribers can no longer consume new data from the source. However, subscribers can still consume data that Security Lake collected from the source before removal. You can only use these instructions to remove a natively-supported AWS service as a source. For information about removing a custom source, see [Collecting data from custom sources in Security Lake](custom-sources.md).

------
#### [ Console ]

1. Open the Security Lake console at [https://console.aws.amazon.com/securitylake/](https://console.aws.amazon.com/securitylake/).

1. Choose **Sources** from the navigation pane.

1. Select a source, and choose **Disable**.

1. Select a Region or Regions in which you want to stop collecting data from this source. Security Lake will stop collecting data from the source from *all* accounts in the selected Regions.

------
#### [ API ]

To remove an AWS service as a source programmatically, use the [DeleteAwsLogSource](https://docs.aws.amazon.com/security-lake/latest/APIReference/API_DeleteAwsLogSource.html) operation of the Security Lake API. If you're using the AWS Command Line Interface (AWS CLI), run the [delete-aws-log-source](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/securitylake/delete-aws-log-source.html) command. The `sourceName` and `regions` parameters are required. Optionally, you can limit the scope of the removal to specific `accounts` or a specific `sourceVersion`.

**Important**  
When you don't provide a parameter in your command, Security Lake assumes that the missing parameter refers to the entire set. For example, if you don't provide the `accounts` parameter, the command applies to the entire set of accounts in your organization.

The following example removes VPC Flow Logs as a source in the designated accounts and Regions.

```
$ aws securitylake delete-aws-log-source \
--sources sourceName=VPC_FLOW,accounts='["123456789012", "111122223333"]',regions='["us-east-1", "us-east-2"]',sourceVersion="2.0"
```

The following example removes Route 53 as a source in the designated account and Regions.

```
$ aws securitylake delete-aws-log-source \
--sources sourceName=ROUTE53,accounts='["123456789012"]',regions='["us-east-1", "us-east-2"]',sourceVersion="2.0"
```

The preceding examples are formatted for Linux, macOS, or Unix, and they use the backslash (\) line-continuation character to improve readability.

------

# CloudTrail event logs in Security Lake
<a name="cloudtrail-event-logs"></a>

AWS CloudTrail provides you with a history of AWS API calls for your account, including API calls made using the AWS Management Console, the AWS SDKs, the command line tools, and certain AWS services. CloudTrail also allows you to identify which users and accounts called AWS APIs for services that support CloudTrail, the source IP address that the calls were made from, and when the calls occurred. For more information, see the [AWS CloudTrail User Guide](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/).

Security Lake can collect logs associated with CloudTrail management events and CloudTrail data events for S3 and Lambda. CloudTrail management events, S3 data events, and Lambda data events are three separate sources in Security Lake. As a result, they have different values for [sourceName](https://docs.aws.amazon.com/security-lake/latest/APIReference/API_AwsLogSourceConfiguration.html#securitylake-Type-AwsLogSourceConfiguration-sourceName) when you add one of these as an ingested log source. Management events, also known as control plane events, provide insight into management operations that are performed on resources in your AWS account. CloudTrail data events, also known as data plane operations, show the resource operations performed on or within resources in your AWS account. These operations are often high-volume activities.

To collect CloudTrail management events in Security Lake, you must have at least one CloudTrail multi-Region organization trail that collects read and write CloudTrail management events, and logging must be enabled for the trail. If you already have such a trail, you don't need to change its configuration to add CloudTrail management events as a log source. Security Lake pulls the data directly from CloudTrail through an independent and duplicated stream of events.

A multi-Region trail delivers log files from multiple Regions to a single Amazon Simple Storage Service (Amazon S3) bucket for a single AWS account. If you already have a multi-Region trail managed through the CloudTrail console or AWS Control Tower, no further action is required.
+ For information about creating and managing a trail through CloudTrail, see [Creating a trail for an organization](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/creating-trail-organization.html) in the *AWS CloudTrail User Guide*. 
+ For information about creating and managing a trail through AWS Control Tower, see [Logging AWS Control Tower actions with AWS CloudTrail](https://docs.aws.amazon.com/controltower/latest/userguide/logging-using-cloudtrail.html) in the *AWS Control Tower User Guide*.
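If you don't already have one, you can create a multi-Region organization trail with the AWS CLI, as sketched below. This assumes you run the commands from the organization's management account and that the S3 bucket name is a placeholder for a bucket with an appropriate bucket policy already in place.

```
$ aws cloudtrail create-trail \
--name security-lake-org-trail \
--s3-bucket-name amzn-s3-demo-trail-bucket \
--is-multi-region-trail \
--is-organization-trail

$ aws cloudtrail start-logging --name security-lake-org-trail
```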

When you add CloudTrail events as a source, Security Lake immediately starts collecting your CloudTrail event logs. It consumes CloudTrail management and data events directly from CloudTrail through an independent and duplicated stream of events.

Security Lake doesn't manage your CloudTrail events or affect your existing CloudTrail configurations. To manage access and retention of your CloudTrail events directly, you must use the CloudTrail service console or API. For more information, see [Viewing events with CloudTrail Event history](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/view-cloudtrail-events.html) in the *AWS CloudTrail User Guide*.

The following list provides GitHub repository links to the mapping reference for how Security Lake normalizes CloudTrail events to OCSF.

**GitHub OCSF repository for CloudTrail events**
+ Source version 1 [(v1.0.0-rc.2)](https://github.com/ocsf/examples/tree/main/mappings/markdown/AWS/v1.0.0-rc.2/CloudTrail)
+ Source version 2 [(v1.1.0)](https://github.com/ocsf/examples/tree/main/mappings/markdown/AWS/v1.1.0/CloudTrail)

# Amazon EKS Audit Logs in Security Lake
<a name="eks-audit-logs"></a>

When you add Amazon EKS Audit Logs as a source, Security Lake starts collecting in-depth information about the activities performed on the Kubernetes resources running in your Amazon Elastic Kubernetes Service (Amazon EKS) clusters. EKS Audit Logs help you detect potentially suspicious activities in your EKS clusters.

Security Lake consumes EKS Audit Log events directly from the Amazon EKS control plane logging feature through an independent and duplicated stream of audit logs. This process is designed to not require additional setup or affect existing Amazon EKS control plane logging configurations that you might have. For more information, see [Amazon EKS control plane logging](https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html) in the *Amazon EKS User Guide*.

Amazon EKS Audit Logs are supported only in OCSF v1.1.0. For information about how Security Lake normalizes EKS Audit Logs events to OCSF, see the mapping reference in the [GitHub OCSF repository for Amazon EKS Audit Logs events (v1.1.0)](https://github.com/ocsf/examples/tree/main/mappings/markdown/AWS/v1.1.0/EKS%20Audit%20Logs).

# Route 53 resolver query logs in Security Lake
<a name="route-53-logs"></a>

Route 53 resolver query logs track DNS queries made by resources within your Amazon Virtual Private Cloud (Amazon VPC). This helps you understand how your applications are operating and spot security threats.

When you add Route 53 resolver query logs as a source in Security Lake, Security Lake immediately starts collecting your resolver query logs directly from Route 53 through an independent and duplicated stream of events.

Security Lake doesn't manage your Route 53 logs or affect your existing resolver query logging configurations. To manage resolver query logs, you must use the Route 53 service console. For more information, see [Managing Resolver query logging configurations](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resolver-query-logging-configurations-managing.html) in the *Amazon Route 53 Developer Guide*.

The following list provides GitHub repository links to the mapping reference for how Security Lake normalizes Route 53 logs to OCSF.

**GitHub OCSF repository for Route 53 logs**
+ Source version 1 [(v1.0.0-rc.2)](https://github.com/ocsf/examples/tree/main/mappings/markdown/AWS/v1.0.0-rc.2/Route53)
+ Source version 2 [(v1.1.0)](https://github.com/ocsf/examples/tree/main/mappings/markdown/AWS/v1.1.0/Route53)

# Security Hub CSPM findings in Security Lake
<a name="security-hub-findings"></a>

Security Hub CSPM findings help you understand your security posture in AWS and let you check your environment against security industry standards and best practices. Security Hub CSPM collects findings from various sources, including integrations with other AWS services, third-party product integrations, and checks against Security Hub CSPM controls. Security Hub CSPM processes findings in a standard format called AWS Security Finding Format (ASFF).

When you add Security Hub CSPM findings as a source in Security Lake, Security Lake immediately starts collecting your findings directly from Security Hub CSPM through an independent and duplicated stream of events. Security Lake also transforms the findings from ASFF to the [Open Cybersecurity Schema Framework (OCSF) in Security Lake](open-cybersecurity-schema-framework.md).

Security Lake doesn't manage your Security Hub CSPM findings or affect your Security Hub CSPM settings. To manage Security Hub CSPM findings, you must use the Security Hub CSPM service console, API, or AWS CLI. For more information, see [Findings in AWS Security Hub CSPM](https://docs.aws.amazon.com/securityhub/latest/userguide/securityhub-findings.html) in the *AWS Security Hub User Guide*.

The following list provides GitHub repository links to the mapping reference for how Security Lake normalizes Security Hub CSPM findings to OCSF.

**GitHub OCSF repository for Security Hub CSPM findings**
+ Source version 1 [(v1.0.0-rc.2)](https://github.com/ocsf/examples/tree/main/mappings/markdown/AWS/v1.0.0-rc.2/Security%20Hub)
+ Source version 2 [(v1.1.0)](https://github.com/ocsf/examples/tree/main/mappings/markdown/AWS/v1.1.0/Security%20Hub)

# VPC Flow Logs in Security Lake
<a name="vpc-flow-logs"></a>

The VPC Flow Logs feature of Amazon VPC captures information about the IP traffic going to and from network interfaces within your environment. 

When you add VPC Flow Logs as a source in Security Lake, Security Lake immediately starts collecting your VPC Flow Logs. It consumes VPC Flow Logs directly from Amazon VPC through an independent and duplicated stream of Flow Logs.

Security Lake doesn't manage your VPC Flow Logs or affect your Amazon VPC configurations. To manage your Flow Logs, you must use the Amazon VPC service console. For more information, see [Work with Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/working-with-flow-logs.html) in the *Amazon VPC Developer Guide*.

The following list provides GitHub repository links to the mapping reference for how Security Lake normalizes VPC Flow Logs to OCSF.

**GitHub OCSF repository for VPC Flow Logs**
+ Source version 1 [(v1.0.0-rc.2)](https://github.com/ocsf/examples/tree/main/mappings/markdown/AWS/v1.0.0-rc.2/VPC%20Flowlogs)
+ Source version 2 [(v1.1.0)](https://github.com/ocsf/examples/tree/main/mappings/markdown/AWS/v1.1.0/VPC%20Flowlogs)

# AWS WAF logs in Security Lake
<a name="aws-waf"></a>

When you add AWS WAF as a log source in Security Lake, Security Lake immediately starts collecting the logs. AWS WAF is a web application firewall that you can use to monitor web requests that your end users send to your applications and to control access to your content. Logged information includes the time that AWS WAF received a web request from your AWS resource, detailed information about the request, and details about the rules that the request matched. 

Security Lake consumes AWS WAF logs directly from AWS WAF through an independent and duplicated stream of logs. This process is designed to not require additional setup or affect existing AWS WAF configurations. Security Lake retrieves only the log data that's permitted by the AWS WAF [web access control list (web ACL)](https://docs.aws.amazon.com/waf/latest/developerguide/web-acl.html) configuration. If [Data protection](https://docs.aws.amazon.com/waf/latest/developerguide/waf-data-protection-and-logging.html) is enabled for the web ACL in Security Lake accounts, the generated data is redacted or hashed according to your web ACL settings. For information about using AWS WAF to protect your application resources, see [How AWS WAF works](https://docs.aws.amazon.com/waf/latest/developerguide/how-aws-waf-works.html) in the *AWS WAF Developer Guide*.

**Important**  
If you are using an Amazon CloudFront distribution as the resource type in AWS WAF, you must select the US East (N. Virginia) Region to ingest the global logs in Security Lake.

AWS WAF logs are supported only in OCSF v1.1.0. For information about how Security Lake normalizes AWS WAF log events to OCSF, see the mapping reference in the [GitHub OCSF repository for AWS WAF logs (v1.1.0)](https://github.com/ocsf/examples/tree/main/mappings/markdown/AWS/v1.1.0/WAF).


# Collecting data from custom sources in Security Lake
<a name="custom-sources"></a>

Amazon Security Lake can collect logs and events from third-party custom sources. A Security Lake custom source is a third-party service that sends security logs and events to Security Lake. Before sending the data, the custom source must convert the logs and events to the Open Cybersecurity Schema Framework (OCSF) and meet the source requirements for Security Lake, including the partitioning, Apache Parquet format, and object size and rate requirements.

For each custom source, Security Lake handles the following:
+ Provides a unique prefix for the source in your Amazon S3 bucket.
+ Creates a role in AWS Identity and Access Management (IAM) that permits a custom source to write data to the data lake. The permissions boundary for this role is set by an AWS managed policy called [`AmazonSecurityLakePermissionsBoundary`](security-iam-awsmanpol.md#security-iam-awsmanpol-AmazonSecurityLakePermissionsBoundary).
+ Creates an AWS Lake Formation table to organize objects that the source writes to Security Lake.
+ Sets up an AWS Glue crawler to partition your source data. The crawler populates the AWS Glue Data Catalog with the table. It also automatically discovers new source data and extracts schema definitions.

**Note**  
You can add up to a maximum of 50 custom log sources in an account.

A custom source must meet the following requirements before you can add it to Security Lake. Failure to meet these requirements can degrade performance and affect analytics use cases such as querying.
+ **Destination** – The custom source must be able to write data to Security Lake as a set of S3 objects underneath the prefix assigned to the source. For sources that contain multiple categories of data, you should deliver each unique [Open Cybersecurity Schema Framework (OCSF) event class](https://schema.ocsf.io/classes?extensions=) as a separate source. Security Lake creates an IAM role that permits the custom source to write to the specified location in your S3 bucket.
+ **Format** – Each S3 object that's collected from the custom source should be formatted as an Apache Parquet file.
+ **Schema** – The same OCSF event class should apply to each record within a Parquet-formatted object. Security Lake supports versions 1.x and 2.x of Parquet. Data page size should be limited to 1 MB (uncompressed). Row group size should be no larger than 256 MB (compressed). For compression within the Parquet object, zstandard is preferred.
+ **Partitioning** – Objects must be partitioned by AWS Region, AWS account, and event day. Objects should use the prefix `source-location/region=region/accountId=accountID/eventDay=YYYYMMDD/`.
+ **Object size and rate** – Send files to Security Lake in increments of between 5 minutes and 1 event day. You can send files more often than every 5 minutes if the files are larger than 256 MB. This requirement optimizes Security Lake for query performance; not following it can degrade the performance of your data lake.
+ **Sorting** – Within each Parquet-formatted object, records should be ordered by time to reduce the cost of querying data.
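As a sketch of the object size and rate guidance above, a delivery pipeline might buffer records and flush them as one Parquet object when either the 5-minute window has elapsed or the buffer has reached the 256 MB ceiling. The function and constant names here are illustrative, not part of any Security Lake API:

```python
# Illustrative thresholds drawn from the requirements above.
MAX_BUFFER_BYTES = 256 * 1024 * 1024   # keep objects at or below 256 MB
MIN_FLUSH_INTERVAL_SECONDS = 5 * 60    # don't flush more often than every 5 minutes

def should_flush(buffered_bytes: int, seconds_since_last_flush: float) -> bool:
    """Decide whether a buffered batch is ready to be written as one S3 object.

    Flush when the 5-minute window has elapsed, or early if the buffer
    has already reached the 256 MB size at which more frequent delivery
    is acceptable.
    """
    if buffered_bytes >= MAX_BUFFER_BYTES:
        return True  # large objects may be delivered more often than every 5 minutes
    return seconds_since_last_flush >= MIN_FLUSH_INTERVAL_SECONDS
```

A real pipeline would also sort the buffered records by time before writing the Parquet object, per the sorting requirement above.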

**Note**  
Use the [OCSF Validation tool](https://github.com/aws-samples/amazon-security-lake-ocsf-validation) to verify that the custom source is compatible with the OCSF schema. For custom sources, Security Lake supports OCSF version 1.3 and earlier.

## Partitioning requirements for ingesting custom sources in Security Lake
<a name="custom-sources-best-practices"></a>

To facilitate efficient data processing and querying, meet the following partitioning and object size requirements when adding a custom source to Security Lake:

**Partitioning**  
Objects should be partitioned by source location, AWS Region, AWS account, and date.  
+ The partition data path is formatted as

   `/ext/custom-source-name/region=region/accountId=accountID/eventDay=YYYYMMDD`.

  A sample partition with example bucket name is `aws-security-data-lake-us-west-2-lake-uid/ext/custom-source-name/region=us-west-2/accountId=123456789012/eventDay=20230428/`.

The following list describes the parameters used in the S3 path partition:
+ *Bucket name* – The name of the Amazon S3 bucket in which Security Lake stores your custom source data.
+ `source-location` – Prefix for the custom source in your S3 bucket. Security Lake stores all S3 objects for a given source under this prefix, and the prefix is unique to the given source.
+ `region` – AWS Region to which the data is uploaded. For example, use `us-east-1` to upload data into your Security Lake bucket in the US East (N. Virginia) Region.
+ `accountId` – AWS account ID that the records in the source partition pertain to. For records that pertain to accounts outside of AWS, we recommend using a string such as `external` or `external_externalAccountId`. By adopting this naming convention, you can keep external account IDs unambiguous so that they don't conflict with AWS account IDs or with external account IDs maintained by other identity management systems.
+ `eventDay` – UTC timestamp of the record, truncated to the day and formatted as an eight-character string (`YYYYMMDD`). If records specify a different time zone in the event timestamp, you must convert the timestamp to UTC for this partition key.
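The partition path described above can be assembled programmatically. The following sketch (the function and variable names are illustrative) converts a record timestamp to UTC, truncates it to the day, and builds the full partition prefix:

```python
from datetime import datetime, timezone, timedelta

def partition_prefix(source_location: str, region: str, account_id: str,
                     event_time: datetime) -> str:
    """Build the S3 key prefix for one record, per the partition layout above.

    event_time may carry any time zone; it is converted to UTC before
    being truncated to the day for the eventDay partition key.
    """
    event_day = event_time.astimezone(timezone.utc).strftime("%Y%m%d")
    return (f"ext/{source_location}/region={region}/"
            f"accountId={account_id}/eventDay={event_day}/")

# A record stamped 01:30 on April 29 in UTC+10 falls on April 28 in UTC.
local = datetime(2023, 4, 29, 1, 30, tzinfo=timezone(timedelta(hours=10)))
prefix = partition_prefix("custom-source-name", "us-west-2", "123456789012", local)
# prefix == "ext/custom-source-name/region=us-west-2/accountId=123456789012/eventDay=20230428/"
```

Note how the UTC conversion moves the record to the previous event day, which is why records with non-UTC timestamps must be converted before partitioning.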

## Prerequisites to adding a custom source in Security Lake
<a name="iam-roles-custom-sources"></a>

When adding a custom source, Security Lake creates an IAM role that permits the source to write data to the correct location in the data lake. The name of the role follows the format `AmazonSecurityLake-Provider-{name of the custom source}-{region}`, where `region` is the AWS Region in which you're adding the custom source. Security Lake attaches a policy to the role that permits access to the data lake. If you've encrypted the data lake with a customer managed AWS KMS key, Security Lake also attaches a policy with `kms:Decrypt` and `kms:GenerateDataKey` permissions to the role. The permissions boundary for this role is set by an AWS managed policy called [`AmazonSecurityLakePermissionsBoundary`](security-iam-awsmanpol.md#security-iam-awsmanpol-AmazonSecurityLakePermissionsBoundary).

**Topics**
+ [Verify permissions](#add-custom-sources-permissions)
+ [Create IAM role to permit write access to Security Lake bucket location (API and AWS CLI-only step)](#iam-roles-glue-crawler)

### Verify permissions
<a name="add-custom-sources-permissions"></a>

Before adding a custom source, verify that you have the permissions to perform the following actions.

To verify your permissions, use IAM to review the IAM policies that are attached to your IAM identity. Then, compare the information in those policies to the following list of actions that you must be allowed to perform to add a custom source. 
+ `glue:CreateCrawler`
+ `glue:CreateDatabase`
+ `glue:CreateTable`
+ `glue:StopCrawlerSchedule`
+ `iam:GetRole`
+ `iam:PutRolePolicy`
+ `iam:DeleteRolePolicy`
+ `iam:PassRole`
+ `lakeformation:RegisterResource`
+ `lakeformation:GrantPermissions`
+ `s3:ListBucket`
+ `s3:PutObject`

These actions allow you to collect logs and events from a custom source, send them to the correct AWS Glue database and table, and store them in Amazon S3.

If you use an AWS KMS key for server-side encryption of your data lake, you also need permission for `kms:CreateGrant`, `kms:DescribeKey`, and `kms:GenerateDataKey`.

**Important**  
If you plan to use the Security Lake console to add a custom source, you can skip the next step and proceed to [Adding a custom source in Security Lake](adding-custom-sources.md). The Security Lake console offers a streamlined process for getting started, and creates all necessary IAM roles or uses existing roles on your behalf.  
If you plan to use Security Lake API or AWS CLI to add a custom source, continue with the next step to create an IAM role to permit write access to Security Lake bucket location.

### Create IAM role to permit write access to Security Lake bucket location (API and AWS CLI-only step)
<a name="iam-roles-glue-crawler"></a>

If you're using Security Lake API or AWS CLI to add a custom source, add this IAM role to grant AWS Glue permission to crawl your custom source data and identify partitions in the data. These partitions are necessary to organize your data and create and update tables in the Data Catalog.

After creating this IAM role, you will need the Amazon Resource Name (ARN) of the role in order to add a custom source.

You must attach the `arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole` AWS managed policy.

To grant the necessary permissions, you must also create and embed the following inline policy in your role. The policy permits the AWS Glue crawler to read data files from the custom source and to create and update tables in the AWS Glue Data Catalog.

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3WriteRead",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket/*"
            ]
        }
    ]
}
```

------

Attach the following trust policy to the role to permit the AWS Glue service to assume it:

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "glue.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

------

If the S3 bucket in the Region where you're adding the custom source is encrypted with a customer managed AWS KMS key, you must also attach the following policy statement to the role and add it to your KMS key policy:

```
{
    "Effect": "Allow",
    "Action": [
        "kms:GenerateDataKey",
        "kms:Decrypt"
    ],
    "Condition": {
        "StringLike": {
            "kms:EncryptionContext:aws:s3:arn": [
                "arn:aws:s3:::{{name of S3 bucket created by Security Lake}}"
            ]
        }
    },
    "Resource": [
        "{{ARN of customer managed key}}"
    ]
}
```

# Adding a custom source in Security Lake
<a name="adding-custom-sources"></a>

After creating the IAM role to invoke the AWS Glue crawler, follow these steps to add a custom source in Security Lake.

------
#### [ Console ]

1. Open the Security Lake console at [https://console.aws.amazon.com/securitylake/](https://console.aws.amazon.com/securitylake/).

1. By using the AWS Region selector in the upper-right corner of the page, select the Region where you want to create the custom source.

1. Choose **Custom sources** in the navigation pane, and then choose **Create custom source**.

1. In the **Custom source details** section, enter a globally unique name for your custom source. Then, select an OCSF event class that describes the type of data that the custom source will send to Security Lake.

1. For **AWS account with permission to write data**, enter the **AWS account ID** and **External ID** of the custom source that will write logs and events to the data lake.

1. For **Service Access**, create and use a new service role or use an existing service role that gives Security Lake permission to invoke AWS Glue.

1. Choose **Create**.

------
#### [ API ]

To add a custom source programmatically, use the [CreateCustomLogSource](https://docs.aws.amazon.com/security-lake/latest/APIReference/API_CreateCustomLogSource.html) operation of the Security Lake API. Use the operation in the AWS Region where you want to create the custom source. If you're using the AWS Command Line Interface (AWS CLI), run the [create-custom-log-source](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/securitylake/create-custom-log-source.html) command.

In your request, use the supported parameters to specify configuration settings for the custom source:
+ `sourceName` – Specify a name for the source. The name must be a Regionally unique value.
+ `eventClasses` – Specify one or more OCSF event classes to describe the type of data that the source will send to Security Lake. For a list of OCSF event classes supported as sources in Security Lake, see [Open Cybersecurity Schema Framework (OCSF)](https://schema.ocsf.io/classes?extensions).
+ `sourceVersion` – Optionally, specify a value to limit log collection to a specific version of custom source data.
+ `crawlerConfiguration` – Specify the Amazon Resource Name (ARN) of the IAM role that you created to invoke the AWS Glue crawler. For the detailed steps to create an IAM role, see [Prerequisites to adding a custom source](https://docs.aws.amazon.com/security-lake/latest/userguide/custom-sources.html#iam-roles-glue-crawler).
+ `providerIdentity` – Specify the AWS identity and external ID that the source will use to write logs and events to the data lake.

The following example adds a custom source as a log source in the designated log provider account in the designated Region. This example is formatted for Linux, macOS, or Unix, and it uses the backslash (\) line-continuation character to improve readability.

```
$ aws securitylake create-custom-log-source \
--source-name EXAMPLE_CUSTOM_SOURCE \
--event-classes '["DNS_ACTIVITY", "NETWORK_ACTIVITY"]' \
--configuration crawlerConfiguration={"roleArn=arn:aws:iam::XXX:role/service-role/RoleName"},providerIdentity={"externalId=ExternalId,principal=principal"} \
--region ap-southeast-2
```
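The same request can be expressed with the AWS SDK for Python (Boto3). The sketch below only assembles the request parameters; the role ARN, external ID, and principal values are placeholders, and the client call is left commented out so you can supply real values and credentials:

```python
# Sketch of the equivalent boto3 request; the ARN, external ID, and
# principal values below are placeholders, not working identifiers.
request = {
    "sourceName": "EXAMPLE_CUSTOM_SOURCE",
    "eventClasses": ["DNS_ACTIVITY", "NETWORK_ACTIVITY"],
    "configuration": {
        "crawlerConfiguration": {
            "roleArn": "arn:aws:iam::111122223333:role/service-role/RoleName"
        },
        "providerIdentity": {
            "externalId": "ExternalId",
            "principal": "111122223333",
        },
    },
}

# import boto3
# client = boto3.client("securitylake", region_name="ap-southeast-2")
# response = client.create_custom_log_source(**request)
```

As with the CLI, run the call in the Region where you want to create the custom source.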

------

## Keeping custom source data updated in AWS Glue
<a name="maintain-glue-schema"></a>

After you add a custom source in Security Lake, Security Lake creates an AWS Glue crawler. The crawler connects to your custom source, determines the data structures, and populates the AWS Glue Data Catalog with tables.

We recommend manually running the crawler to keep your custom source schema up to date and maintain query functionality in Athena and other querying services. Specifically, you should run the crawler if either of the following changes occur in your input data set for a custom source:
+ The data set has one or more new top-level columns.
+ The data set has one or more new fields in a column with a `struct` datatype.
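As a rough illustration of the two conditions above, the following sketch compares two schema snapshots and reports whether a crawler re-run is warranted. It is purely illustrative; Security Lake and AWS Glue don't expose schemas as plain dicts like this:

```python
def needs_crawler_run(old_schema: dict, new_schema: dict) -> bool:
    """Return True if the new data set adds top-level columns, or adds
    fields inside an existing struct-typed column, relative to the old schema.

    Schemas are modeled as dicts mapping column name to either a type
    string or, for struct columns, a nested dict of field types.
    """
    # Condition 1: one or more new top-level columns.
    if set(new_schema) - set(old_schema):
        return True
    # Condition 2: new fields inside an existing struct column.
    for column, old_type in old_schema.items():
        new_type = new_schema.get(column)
        if isinstance(old_type, dict) and isinstance(new_type, dict):
            if set(new_type) - set(old_type):
                return True
    return False
```

When a check like this reports drift in your input data set, run the crawler so that the Data Catalog table stays in sync and queries keep working.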

For instructions on running a crawler, see [Scheduling an AWS Glue crawler](https://docs.aws.amazon.com/glue/latest/dg/schedule-crawler.html) in the *AWS Glue Developer Guide*.

Security Lake can't delete or update existing crawlers in your account. If you delete a custom source, we recommend deleting the associated crawler if you plan to create a custom source with the same name in the future.

## Supported OCSF event classes
<a name="ocsf-eventclass"></a>

Open Cybersecurity Schema Framework (OCSF) event classes describe the type of data that a custom source sends to Security Lake. The supported event classes are:

```
public enum OcsfEventClass {
    ACCOUNT_CHANGE,
    API_ACTIVITY,
    APPLICATION_LIFECYCLE,
    AUTHENTICATION,
    AUTHORIZE_SESSION,
    COMPLIANCE_FINDING,
    DATASTORE_ACTIVITY,
    DEVICE_CONFIG_STATE,
    DEVICE_CONFIG_STATE_CHANGE,
    DEVICE_INVENTORY_INFO,
    DHCP_ACTIVITY,
    DNS_ACTIVITY,
    DETECTION_FINDING,
    EMAIL_ACTIVITY,
    EMAIL_FILE_ACTIVITY,
    EMAIL_URL_ACTIVITY,
    ENTITY_MANAGEMENT,
    FILE_HOSTING_ACTIVITY,
    FILE_SYSTEM_ACTIVITY,
    FTP_ACTIVITY,
    GROUP_MANAGEMENT,
    HTTP_ACTIVITY,
    INCIDENT_FINDING,
    KERNEL_ACTIVITY,
    KERNEL_EXTENSION,
    MEMORY_ACTIVITY,
    MODULE_ACTIVITY,
    NETWORK_ACTIVITY,
    NETWORK_FILE_ACTIVITY,
    NTP_ACTIVITY,
    PATCH_STATE,
    PROCESS_ACTIVITY,
    RDP_ACTIVITY,
    REGISTRY_KEY_ACTIVITY,
    REGISTRY_VALUE_ACTIVITY,
    SCHEDULED_JOB_ACTIVITY,
    SCAN_ACTIVITY,
    SECURITY_FINDING,
    SMB_ACTIVITY,
    SSH_ACTIVITY,
    USER_ACCESS,
    USER_INVENTORY,
    VULNERABILITY_FINDING,
    WEB_RESOURCE_ACCESS_ACTIVITY,
    WEB_RESOURCES_ACTIVITY,
    WINDOWS_RESOURCE_ACTIVITY,
    // 1.3 OCSF event classes
    ADMIN_GROUP_QUERY,
    DATA_SECURITY_FINDING,
    EVENT_LOG_ACTIVITY,
    FILE_QUERY,
    FILE_REMEDIATION_ACTIVITY,
    FOLDER_QUERY,
    JOB_QUERY,
    KERNEL_OBJECT_QUERY,
    MODULE_QUERY,
    NETWORK_CONNECTION_QUERY,
    NETWORK_REMEDIATION_ACTIVITY,
    NETWORKS_QUERY,
    PERIPHERAL_DEVICE_QUERY,
    PROCESS_QUERY,
    PROCESS_REMEDIATION_ACTIVITY,
    REMEDIATION_ACTIVITY,
    SERVICE_QUERY,
    SOFTWARE_INVENTORY_INFO,
    TUNNEL_ACTIVITY,
    USER_QUERY,
    USER_SESSION_QUERY,
    // 1.3 OCSF event classes (Win extension)
    PREFETCH_QUERY,
    REGISTRY_KEY_QUERY,
    REGISTRY_VALUE_QUERY,
    WINDOWS_SERVICE_ACTIVITY
}
```

# Deleting a custom source from Security Lake
<a name="delete-custom-source"></a>

Delete a custom source to stop sending data from the source to Security Lake. When you remove the source, Security Lake stops collecting data from that source in the specified Regions and accounts, and subscribers can no longer consume new data from the source. However, subscribers can still consume data that Security Lake collected from the source before removal. You can only use these instructions to remove a custom source. For information about removing a natively supported AWS service, see [Collecting data from AWS services in Security Lake](internal-sources.md).

When deleting a custom source in Security Lake, you must also disable the source integration outside of the Security Lake console. If you don't disable the integration, it might continue to send logs to Amazon S3.

------
#### [ Console ]

1. Open the Security Lake console at [https://console.aws.amazon.com/securitylake/](https://console.aws.amazon.com/securitylake/).

1. By using the AWS Region selector in the upper-right corner of the page, select the Region that you want to remove the custom source from.

1. In the navigation pane, choose **Custom sources**.

1. Select the custom source that you want to remove.

1. Choose **Deregister custom source** and then choose **Delete** to confirm the action.

------
#### [ API ]

To delete a custom source programmatically, use the [DeleteCustomLogSource](https://docs.aws.amazon.com/security-lake/latest/APIReference/API_DeleteCustomLogSource.html) operation of the Security Lake API. If you're using the AWS Command Line Interface (AWS CLI), run the [delete-custom-log-source](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/securitylake/delete-custom-log-source.html) command. Use the operation in the AWS Region where you want to delete the custom source.

In your request, use the `sourceName` parameter to specify the name of the custom source to delete. Or specify the name of the custom source and use the `sourceVersion` parameter to limit the scope of the deletion to only a specific version of data from the custom source.

The following example deletes a custom log source from Security Lake.

This example is formatted for Linux, macOS, or Unix, and it uses the backslash (\) line-continuation character to improve readability.

```
$ aws securitylake delete-custom-log-source \
--source-name EXAMPLE_CUSTOM_SOURCE
```

------