Lake Formation
Data Lakes is available for the listed account plans only.
See the available plans, or contact Support.
Lake Formation is a fully managed service built on top of the AWS Glue Data Catalog that provides one central set of tools to build and manage a Data Lake. These tools help import, catalog, transform, and deduplicate data, as well as provide strategies to optimize data storage and security. To learn more about Lake Formation features, see Amazon Web Services documentation.
The security policies in Lake Formation use two layers of permissions: each resource is protected by Lake Formation permissions (which control access to Data Catalog resources and S3 locations) and IAM permissions (which control access to Lake Formation and AWS Glue API resources). When any user or role reads or writes to a resource, that action must pass both a Lake Formation check and an IAM check. For example, a user trying to create a new table in the Data Catalog may have Lake Formation access to the Data Catalog, but if they don’t have the correct Glue API permissions, they will be unable to create the table.
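The two-layer rule can be illustrated with a toy model. This is a sketch for intuition only and does not reflect how AWS evaluates policies internally. The `Principal` class and `is_allowed` function are hypothetical names; `CREATE_TABLE` and `glue:CreateTable` stand in for a Lake Formation permission and an IAM action.

```python
from dataclasses import dataclass, field


@dataclass
class Principal:
    """A hypothetical user or role with two independent permission sets."""
    name: str
    lake_formation_permissions: set = field(default_factory=set)
    iam_actions: set = field(default_factory=set)


def is_allowed(principal, lf_permission, iam_action):
    """An action succeeds only if BOTH permission layers allow it."""
    passes_lake_formation = lf_permission in principal.lake_formation_permissions
    passes_iam = iam_action in principal.iam_actions
    return passes_lake_formation and passes_iam


# Has Lake Formation access to the Data Catalog but no Glue API permission.
analyst = Principal("analyst", {"CREATE_TABLE"}, set())
# Has both layers.
engineer = Principal("engineer", {"CREATE_TABLE"}, {"glue:CreateTable"})

print(is_allowed(analyst, "CREATE_TABLE", "glue:CreateTable"))   # False
print(is_allowed(engineer, "CREATE_TABLE", "glue:CreateTable"))  # True
```

The analyst case mirrors the example above: Lake Formation access alone is not enough to create the table.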
For more information about security practices in Lake Formation, see Amazon’s Lake Formation Permissions Reference documentation.
Configure Lake Formation
You can configure Lake Formation using the IAMAllowedPrincipals group or by using IAM policies for access control. Configuring Lake Formation using the IAMAllowedPrincipals group is an easier method, recommended for those exploring Lake Formation. Setting up Lake Formation using IAM policies for access control is a more advanced setup option, recommended for those who want additional customization options.
Permissions required to configure Data Lakes
To configure Lake Formation, you must be logged in to AWS with data lake administrator or database creator permissions.
Configure Lake Formation using the IAMAllowedPrincipals group
Existing databases
- Open the AWS Lake Formation service.
- Under Data catalog, select Settings. Ensure the checkboxes under the Default permissions for newly created databases and tables are not checked.
- Under Permissions, select the Data lake permissions section. Click Grant.
- On the Grant data permissions page, select the IAMAllowedPrincipals group in the Principals section.
- In the Database permissions section, select the checkboxes for Super database permissions and Super grantable permissions.
- Click Grant.
- On the Permissions page, verify the IAMAllowedPrincipals group has “All” permissions.
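For readers who prefer scripting, the grant in the steps above could be sketched with boto3's Lake Formation client. This is a minimal sketch under several assumptions: that the console's Super permission corresponds to the API permission `ALL`, that the IAMAllowedPrincipals group is identified as `IAM_ALLOWED_PRINCIPALS`, and that the grantable option is accepted for this group. The function names and the database name `segment_data_lake` are hypothetical.

```python
def build_super_grant(database_name, principal_id="IAM_ALLOWED_PRINCIPALS", grantable=True):
    """Build keyword arguments for the Lake Formation grant_permissions call."""
    request = {
        "Principal": {"DataLakePrincipalIdentifier": principal_id},
        "Resource": {"Database": {"Name": database_name}},
        "Permissions": ["ALL"],  # assumed to be what the console labels "Super"
    }
    if grantable:
        # Assumed equivalent of the "Super grantable permissions" checkbox.
        request["PermissionsWithGrantOption"] = ["ALL"]
    return request


def grant_super(database_name, principal_id="IAM_ALLOWED_PRINCIPALS", grantable=True):
    """Send the grant. Requires boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the payload builder works without boto3

    client = boto3.client("lakeformation")
    return client.grant_permissions(
        **build_super_grant(database_name, principal_id, grantable)
    )


request = build_super_grant("segment_data_lake")  # hypothetical database name
print(request["Principal"]["DataLakePrincipalIdentifier"])  # IAM_ALLOWED_PRINCIPALS
```

Keeping the payload builder separate from the API call makes it possible to review the request before anything is sent to AWS.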
New databases
- Open the AWS Lake Formation service.
- Under Data catalog, select Settings. Ensure the checkboxes under Default permissions for newly created databases and tables are not checked.
- Select the Databases tab and click Create database. On the Create database page:
  - Select the Database button.
  - Name your database.
  - Set the location to s3://$datalake_bucket/segment-data/.
  - Optional: Add a description to your database.
  - Select the Use only IAM access control for new tables in this database option.
  - Click Create database.
- On the Databases page, select your database. From the Actions menu, select Grant.
- On the Grant data permissions page, select the IAMAllowedPrincipals group in the Principals section.
- In the Database permissions section, select the checkboxes for Super database permissions and Super grantable permissions.
- Click Grant.
- On the Permissions page, verify the IAMAllowedPrincipals group has “All” permissions.
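The database-creation step above could also be scripted through the AWS Glue `create_database` API. The sketch below assumes that the "Use only IAM access control for new tables in this database" option corresponds to a `CreateTableDefaultPermissions` entry granting `ALL` to `IAM_ALLOWED_PRINCIPALS`. The function names, database name, and bucket name are hypothetical.

```python
def build_database_input(name, datalake_bucket, description=None):
    """Build the DatabaseInput structure for the Glue create_database call."""
    database_input = {
        "Name": name,
        # Same location pattern as in the console steps.
        "LocationUri": f"s3://{datalake_bucket}/segment-data/",
        # Assumed equivalent of the IAM-only access control option for new tables.
        "CreateTableDefaultPermissions": [
            {
                "Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
                "Permissions": ["ALL"],
            }
        ],
    }
    if description:
        database_input["Description"] = description  # optional, as in the console
    return database_input


def create_database(name, datalake_bucket, description=None):
    """Create the database. Requires boto3 and AWS credentials; not called here."""
    import boto3  # imported lazily so the builder works without boto3

    glue = boto3.client("glue")
    return glue.create_database(
        DatabaseInput=build_database_input(name, datalake_bucket, description)
    )


database_input = build_database_input("segment_data_lake", "my-datalake-bucket")
print(database_input["LocationUri"])  # s3://my-datalake-bucket/segment-data/
```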
Verify your configuration
To verify that you’ve configured Lake Formation, open the AWS Lake Formation service, select Data lake permissions, and verify the IAMAllowedPrincipals group is listed with “All” permissions.
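The same check could be done programmatically by inspecting the entries returned by the Lake Formation `list_permissions` API. This is a sketch that assumes the console's "All" corresponds to the `ALL` permission in the response. The function names and sample data are hypothetical, and `fetch_entries` reads only the first page of results.

```python
def has_all_permissions(entries, principal_id, database_name):
    """Return True if the principal holds ALL on the given database."""
    for entry in entries:
        principal = entry.get("Principal", {}).get("DataLakePrincipalIdentifier")
        database = entry.get("Resource", {}).get("Database", {}).get("Name")
        if (
            principal == principal_id
            and database == database_name
            and "ALL" in entry.get("Permissions", [])
        ):
            return True
    return False


def fetch_entries(database_name):
    """Fetch the first page of permissions. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the check above works without boto3

    client = boto3.client("lakeformation")
    response = client.list_permissions(Resource={"Database": {"Name": database_name}})
    return response.get("PrincipalResourcePermissions", [])


# Hand-written sample shaped like a list_permissions response entry.
sample_entries = [
    {
        "Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
        "Resource": {"Database": {"Name": "segment_data_lake"}},
        "Permissions": ["ALL"],
    }
]
print(has_all_permissions(sample_entries, "IAM_ALLOWED_PRINCIPALS", "segment_data_lake"))  # True
```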
Configure Lake Formation using IAM policies
Granting Super permission to IAM roles
If you manually configured your database, assign the EMR_EC2_DefaultRole Super permissions in step 8. If you configured your database using Terraform, assign the segment_emr_instance_profile Super permissions in step 8.
Existing databases
- Open the AWS Lake Formation service.
- Under Data catalog, select Settings. Ensure the checkboxes under the Default permissions for newly created databases and tables are not checked.
- On the Databases page, select your database. From the Actions menu, select Grant.
- On the Grant data permissions page, select the EMR_EC2_DefaultRole (or segment_emr_instance_profile, if you configured your data lake using Terraform) and segment-data-lake-iam-role roles in the Principals section.
- In the Database permissions section, select the checkboxes for Super database permissions and Super grantable permissions.
- Click Grant.
- On the Permissions page, verify the EMR_EC2_DefaultRole (or segment_emr_instance_profile) and segment-data-lake-iam-role roles have “All” permissions.
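When granting to IAM roles instead of the IAMAllowedPrincipals group, the principal is typically identified by the role's ARN. The sketch below builds one grant request per role, choosing the EMR role name according to whether Terraform was used. It assumes each name refers to an IAM role in your account and that Super corresponds to the API permission `ALL`. The function names, account ID, and database name are placeholders.

```python
def role_arn(account_id, role_name):
    """Format an IAM role ARN for the given account."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def build_role_grants(account_id, database_name, used_terraform=False):
    """Build one grant_permissions request per role named in the steps above."""
    emr_role = "segment_emr_instance_profile" if used_terraform else "EMR_EC2_DefaultRole"
    role_names = [emr_role, "segment-data-lake-iam-role"]
    return [
        {
            "Principal": {"DataLakePrincipalIdentifier": role_arn(account_id, name)},
            "Resource": {"Database": {"Name": database_name}},
            "Permissions": ["ALL"],                 # assumed API name for "Super"
            "PermissionsWithGrantOption": ["ALL"],  # assumed grantable equivalent
        }
        for name in role_names
    ]


def apply_grants(requests):
    """Send each grant. Requires boto3 and AWS credentials; not called here."""
    import boto3  # imported lazily so the builders work without boto3

    client = boto3.client("lakeformation")
    for request in requests:
        client.grant_permissions(**request)


grants = build_role_grants("123456789012", "segment_data_lake", used_terraform=True)
for grant in grants:
    print(grant["Principal"]["DataLakePrincipalIdentifier"])
```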
New databases
- Open the AWS Lake Formation service.
- Under Data catalog, select Settings. Ensure the checkboxes under the Default permissions for newly created databases and tables are not checked.
- Select the Databases tab and click Create database. On the Create database page:
  - Select the Database button.
  - Name your database.
  - Set the location to s3://$datalake_bucket/segment-data/.
  - Optional: Add a description to your database.
  - Click Create database.
- On the Databases page, select your database. From the Actions menu, select Grant.
- On the Grant data permissions page, select the EMR_EC2_DefaultRole (or segment_emr_instance_profile, if you configured your data lake using Terraform) and segment-data-lake-iam-role roles in the Principals section.
- In the Database permissions section, select the checkboxes for Super database permissions and Super grantable permissions.
- Click Grant.
- On the Permissions page, verify the EMR_EC2_DefaultRole (or segment_emr_instance_profile) and segment-data-lake-iam-role roles have “All” permissions.
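Each procedure on this page starts by confirming that the default permission checkboxes under Settings are cleared. A scripted version of that check might look like the sketch below. It assumes the two checkboxes correspond to the `CreateDatabaseDefaultPermissions` and `CreateTableDefaultPermissions` lists in the Lake Formation data lake settings, and that empty lists mean unchecked. The function names and sample settings are hypothetical.

```python
def has_default_permissions(settings):
    """Return True if either default-permission list still has entries."""
    return bool(settings.get("CreateDatabaseDefaultPermissions")) or bool(
        settings.get("CreateTableDefaultPermissions")
    )


def clear_default_permissions(settings):
    """Return a copy of the settings with both default lists emptied.

    Other keys (such as DataLakeAdmins) are carried over unchanged, because
    put_data_lake_settings replaces the settings it is given.
    """
    updated = dict(settings)
    updated["CreateDatabaseDefaultPermissions"] = []
    updated["CreateTableDefaultPermissions"] = []
    return updated


def ensure_defaults_cleared():
    """Apply the change if needed. Requires boto3 and AWS credentials; not called here."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("lakeformation")
    current = client.get_data_lake_settings()["DataLakeSettings"]
    if has_default_permissions(current):
        client.put_data_lake_settings(DataLakeSettings=clear_default_permissions(current))


# Hand-written sample shaped like the DataLakeSettings structure.
sample_settings = {
    "DataLakeAdmins": [{"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/admin"}],
    "CreateDatabaseDefaultPermissions": [
        {
            "Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
            "Permissions": ["ALL"],
        }
    ],
    "CreateTableDefaultPermissions": [],
}
print(has_default_permissions(sample_settings))                             # True
print(has_default_permissions(clear_default_permissions(sample_settings)))  # False
```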
This page was last modified: 03 Aug 2023