Prepare Glue Crawler

Perform these steps in the Governance account.

  1. Open the AWS Glue service in the AWS Console, in the same region as the S3 bucket that holds the aggregated CUR data, and go to the Crawlers section

  2. Click Add Crawler

  3. Specify the Crawler name and click Next

  4. In Specify crawler source type, leave the default settings. Click Next

  5. In Add a data store, select the S3 bucket with the aggregated CUR data and add the following exclusions: **.zip, **.json, **.gz, **.yml, **.sql, **.csv, **/cost_and_usage_data_status/*. Click Next

  6. In Add another data store, leave the default No. Click Next

  7. In Choose an IAM role, select Create an IAM role and provide a role name. Click Next

  8. In Create a schedule for this crawler, select Daily and specify the Hour and Minute at which the crawler should run

  9. In Configure the crawler’s output, choose the Glue Database in which the crawler should create a table, or add a new one. Select the Create a single schema for each S3 path checkbox. In Configuration options, select Add new columns only and Ignore the change and don’t update the table in the data catalog. Click Next

    Make sure the Database name doesn’t include the ‘-’ character


  10. The crawler configuration should look like the screenshot below. Click Finish

  11. Select the created crawler and click Run Crawler
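
For readers who prefer to script the setup, the console steps above can be approximated with boto3. This is a minimal sketch, not a definitive equivalent of the console wizard. The helper names `build_crawler_request` and `create_and_run` are hypothetical, as are the crawler, role, database, bucket and region values in the usage example. The mapping of the console options to `TableGroupingPolicy`, `AddOrUpdateBehavior` and `DeleteBehavior` is an assumption that should be checked against the Glue documentation. The sketch also assumes the IAM role already exists, whereas the console creates one for you in step 7.

```python
import json


def build_crawler_request(name, role, database, s3_path, hour=2, minute=0):
    """Build keyword arguments for the Glue create_crawler call (hypothetical helper)."""
    # The note under step 9: the database name must not contain '-'.
    if "-" in database:
        raise ValueError("database name must not contain '-'")

    # Exclusion patterns from step 5 (sql and csv written here as **.sql and **.csv).
    exclusions = [
        "**.zip", "**.json", "**.gz", "**.yml", "**.sql", "**.csv",
        "**/cost_and_usage_data_status/*",
    ]

    # Assumed mapping of the step 9 console options to the crawler configuration:
    #   "Create a single schema for each S3 path" -> CombineCompatibleSchemas
    #   "Add new columns only"                    -> MergeNewColumns
    configuration = {
        "Version": 1.0,
        "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"},
        "CrawlerOutput": {"Tables": {"AddOrUpdateBehavior": "MergeNewColumns"}},
    }

    return {
        "Name": name,
        "Role": role,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path, "Exclusions": exclusions}]},
        # Daily schedule (step 8); Glue cron fields are minutes, hours,
        # day-of-month, month, day-of-week, year.
        "Schedule": f"cron({minute} {hour} * * ? *)",
        # Assumed mapping of "Ignore the change and don't update the table
        # in the data catalog" for deleted objects.
        "SchemaChangePolicy": {"DeleteBehavior": "LOG"},
        "Configuration": json.dumps(configuration),
    }


def create_and_run(request, region):
    """Create the crawler and start it once (steps 10 and 11)."""
    import boto3  # third-party; imported lazily so the builder works without it

    glue = boto3.client("glue", region_name=region)
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    request = build_crawler_request(
        name="cur_crawler",
        role="ExampleGlueCrawlerRole",
        database="cur_database",
        s3_path="s3://example-aggregated-cur-bucket/",
    )
    print(json.dumps(request, indent=2))
    # create_and_run(request, region="us-east-1")  # uncomment to call AWS
```

Building the request separately from the AWS call makes it possible to review the exact settings before anything is created in the Governance account.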