Creating AWS Glue Crawlers

In this step, we will navigate to the AWS Glue Console and create Glue crawlers to explore the schema of the newly imported data in S3.

Go to the AWS Glue Console.

In the left panel, select Crawlers
Select Create crawler

Information about crawler

Crawler name: AnalyticsworkshopCrawler
Optionally add Tags, for example: workshop: AnalyticsOnAWS
Select Next

Select Add a data source, then configure the data source as follows.

Data source: S3
Leave Network connection - optional as is
Under Location of S3 data, select In this account
Include S3 path: s3://yourname-analytics-workshop-bucket/data/
Leave Subsequent crawler runs at the default, Crawl all sub-folders
Select Add an S3 data source
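
For readers who prefer to script this step, the data source settings above can be sketched as the matching portion of a boto3 Glue create_crawler request. This is a minimal sketch, not part of the workshop: the helper name build_s3_source is hypothetical, and the bucket name is the workshop placeholder that you must replace with your own.

```python
# Sketch only: the dict keys follow the boto3 Glue create_crawler parameters.
# build_s3_source is a hypothetical helper name used for illustration.

def build_s3_source(bucket: str, prefix: str = "data/") -> dict:
    """Return the data-source portion of a create_crawler request."""
    path = f"s3://{bucket}/{prefix}"
    return {
        # One S3 target in the same account as the crawler.
        "Targets": {"S3Targets": [{"Path": path}]},
        # Counterpart of "Crawl all sub-folders" for subsequent crawler runs.
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_EVERYTHING"},
    }


# Placeholder bucket name from the workshop; substitute your own.
source = build_s3_source("yourname-analytics-workshop-bucket")
print(source["Targets"]["S3Targets"][0]["Path"])
# prints: s3://yourname-analytics-workshop-bucket/data/
```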

Select the recently added S3 data source under Data Sources. Select Next

IAM role

Under Existing IAM role, select AnalyticsworkshopGlueRole
Leave everything else as it is.
Select Next

Output configuration: Select Add database to open a new window to create a database.

Database Information

Name: analyticsworkshopdb
Select Create database
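
The same database could also be created programmatically. The sketch below assumes a boto3 Glue client is passed in as glue; the function name create_workshop_database is hypothetical, and the snippet does not handle the case where the database already exists.

```python
# Sketch only: `glue` is assumed to be a boto3 Glue client, for example
# boto3.client("glue"), with credentials and a Region already configured.

def create_workshop_database(glue, name: str = "analyticsworkshopdb") -> dict:
    """Create a Glue Data Catalog database and return the request that was sent."""
    request = {"DatabaseInput": {"Name": name}}
    # create_database raises an error if a database with this name already exists.
    glue.create_database(**request)
    return request
```

With boto3 installed, calling create_workshop_database(boto3.client("glue")) should have the same effect as the Create database button in the console.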

Close the current window and return to the previous window.

Click the refresh icon to the right of Target database.

Select analyticsworkshopdb under Target database
Under Crawler schedule, set Frequency to On demand
Select Next

Review all settings under Review and create. Select Create crawler
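
As a rough scripted counterpart to the settings reviewed here, the sketch below gathers the values chosen in the console into one boto3 Glue create_crawler request. The helper names are hypothetical, glue is assumed to be a boto3 Glue client, and the bucket name is a placeholder you must supply.

```python
# Sketch only: the keys follow the boto3 Glue create_crawler parameters.
# Helper names are hypothetical; values come from the console steps.

def build_crawler_request(bucket: str) -> dict:
    """Assemble the crawler settings chosen in the console into one request."""
    return {
        "Name": "AnalyticsworkshopCrawler",
        "Role": "AnalyticsworkshopGlueRole",    # existing IAM role
        "DatabaseName": "analyticsworkshopdb",  # target database
        "Targets": {"S3Targets": [{"Path": f"s3://{bucket}/data/"}]},
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_EVERYTHING"},
        "Tags": {"workshop": "AnalyticsOnAWS"},  # optional tag
        # No "Schedule" key: without a schedule the crawler runs on demand.
    }


def create_workshop_crawler(glue, bucket: str) -> dict:
    """Send the request through a boto3 Glue client and return what was sent."""
    request = build_crawler_request(bucket)
    glue.create_crawler(**request)
    return request
```

Keeping the request builder separate from the API call makes it possible to inspect or print the settings before anything is created in the account.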

You will see this message: The following Crawler has been created: “AnalyticsworkshopCrawler”

Select Run crawler to run the crawler for the first time

Wait a few minutes for the crawler run to finish.
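
If you would rather not watch the console, running the crawler and waiting can be approximated with a polling loop. This is a sketch under assumptions: glue is a boto3 Glue client, the function name and the polling and timeout values are arbitrary illustrations, and it assumes the crawler reports a non-READY state promptly after start_crawler returns.

```python
import time

# Sketch only: `glue` is assumed to be a boto3 Glue client. The interval and
# timeout defaults are illustrative, not values taken from the workshop.

def run_crawler_and_wait(glue, name: str = "AnalyticsworkshopCrawler",
                         poll_seconds: float = 15.0,
                         timeout_seconds: float = 900.0) -> str:
    """Start the crawler, poll until it is READY again, return the last crawl status."""
    glue.start_crawler(Name=name)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        crawler = glue.get_crawler(Name=name)["Crawler"]
        # The state moves through RUNNING and STOPPING, then back to READY.
        if crawler["State"] == "READY":
            return crawler.get("LastCrawl", {}).get("Status", "UNKNOWN")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Crawler {name} did not finish within {timeout_seconds} seconds")
```

A return value of SUCCEEDED would indicate that the run completed; any other value is worth checking in the console.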
