In the previous section, you discovered how to analyze data using Amazon Athena. In this section, we will see how to perform real-time analysis of data in transit using Amazon Kinesis Data Analytics. There are two ways to do this: the older SQL applications or the newer Studio notebooks, which are the recommended option. In this practice session, we will use a Studio notebook to create a SQL-based Kinesis Data Analytics application.
In this step, we will navigate to the IAM console and create a new service role for Amazon Kinesis Data Analytics. This role allows the Amazon Kinesis Data Analytics service to access Kinesis Data Streams as well as the AWS Glue Data Catalog table.
NOTE: We use full access for practice purposes only. If you use this role in a production environment, grant only the permissions it actually needs.
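To make the role concrete, here is a minimal sketch of the two policy documents involved, written as Python dicts and printed as JSON. The trust policy lets the service principal `kinesisanalytics.amazonaws.com` assume the role. The scoped permissions policy is an illustrative least-privilege starting point for the NOTE above: its action list is an assumption, not a verified minimum for Studio notebooks, and the region, account ID and stream name are hypothetical placeholders.

```python
import json


# Trust policy: allows the Kinesis Data Analytics service to assume this role.
def kda_trust_policy() -> dict:
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "kinesisanalytics.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Illustrative scoped permissions policy (assumed action list, not a verified
# minimum; Studio notebooks may need more, e.g. for logging or deployment).
def scoped_policy(region: str, account_id: str, stream_name: str, database: str) -> dict:
    stream_arn = f"arn:aws:kinesis:{region}:{account_id}:stream/{stream_name}"
    glue_arn = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read-only access to the single input stream.
                "Effect": "Allow",
                "Action": [
                    "kinesis:DescribeStream",
                    "kinesis:DescribeStreamSummary",
                    "kinesis:GetShardIterator",
                    "kinesis:GetRecords",
                    "kinesis:ListShards",
                ],
                "Resource": stream_arn,
            },
            {
                # Read access to the catalog, the workshop database and its tables.
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartitions",
                ],
                "Resource": [
                    f"{glue_arn}:catalog",
                    f"{glue_arn}:database/{database}",
                    f"{glue_arn}:table/{database}/*",
                ],
            },
        ],
    }


if __name__ == "__main__":
    # "my-input-stream" and the account ID are hypothetical placeholders.
    print(json.dumps(kda_trust_policy(), indent=2))
    print(json.dumps(
        scoped_policy("us-east-1", "123456789012", "my-input-stream", "analyticsworkshopdb"),
        indent=2,
    ))
```

The trust policy is what IAM expects as the `AssumeRolePolicyDocument` when creating a role; the permissions policy would replace the full-access managed policies used in this exercise.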
Kinesis Data Generator (KDG) is an application that makes it simple to send test data to an Amazon Kinesis data stream or an Amazon Kinesis Data Firehose delivery stream. We will create a Kinesis data stream to receive data from the Kinesis Data Generator. Our Kinesis Analytics Studio notebook will then read the data in transit from this Kinesis data stream.
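KDG takes care of writing records to the stream for you. If you ever want to push test records yourself, the Kinesis `PutRecords` API accepts at most 500 records per request, each with a `Data` blob and a `PartitionKey`. The sketch below only prepares those request entries and splits them into batches; using `device_id` as the partition key is an assumption made for illustration, and the helper names are hypothetical.

```python
import json
from typing import Iterable, Iterator, List

MAX_RECORDS_PER_CALL = 500  # PutRecords limit on records per request


def to_put_records_entries(records: Iterable[dict], key_field: str = "device_id") -> List[dict]:
    """Serialize each record to UTF-8 JSON and derive a partition key from one field."""
    return [
        {"Data": json.dumps(r).encode("utf-8"), "PartitionKey": str(r[key_field])}
        for r in records
    ]


def batches(entries: List[dict], size: int = MAX_RECORDS_PER_CALL) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` entries."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]
```

With boto3, each batch would be sent as `boto3.client("kinesis").put_records(StreamName=stream_name, Records=batch)`. Check `FailedRecordCount` in each response, because `PutRecords` can partially fail and individual entries may need to be retried.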
Our Kinesis Analytics Studio notebook pulls data source information from AWS Glue. When you create a Studio notebook, you specify the AWS Glue database that contains your connection information. When you access your data source, you specify the AWS Glue tables contained in that database.
Select Tables in analyticsworkshopdb
Select the Add tables drop-down menu and then select Add table manually
[
{
"Name": "uuid",
"Type": "string",
"Comment": ""
},
{
"Name": "device_ts",
"Type": "timestamp",
"Comment": ""
},
{
"Name": "device_id",
"Type": "int",
"Comment": ""
},
{
"Name": "device_temp",
"Type": "int",
"Comment": ""
},
{
"Name": "track_id",
"Type": "int",
"Comment": ""
},
{
"Name": "activity_type",
"Type": "string",
"Comment": ""
}
]
Select Next
Check that all the information is correct, then select Create
Select the newly created table raw_stream
Select Actions, then select Edit table
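Before sending data, it can help to check locally that a sample record matches the `raw_stream` schema defined above. The sketch below copies the column list from the Glue schema JSON and checks field presence and basic types. It is a hypothetical sanity check, not how Glue or Flink deserialize records, and it assumes timestamps use the `YYYY-MM-DD HH:mm:ss.SSS` format produced by the generator template used later in this module.

```python
from datetime import datetime
from typing import List

# Column definitions copied from the Glue schema above (Comment fields omitted).
SCHEMA = [
    {"Name": "uuid", "Type": "string"},
    {"Name": "device_ts", "Type": "timestamp"},
    {"Name": "device_id", "Type": "int"},
    {"Name": "device_temp", "Type": "int"},
    {"Name": "track_id", "Type": "int"},
    {"Name": "activity_type", "Type": "string"},
]


def _is_timestamp(value) -> bool:
    # Assumed format: "YYYY-MM-DD HH:mm:ss.SSS" (fraction parsed by %f).
    if not isinstance(value, str):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
        return True
    except ValueError:
        return False


# Per-type checks; bool is excluded from int because bool subclasses int in Python.
CHECKS = {
    "string": lambda v: isinstance(v, str),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "timestamp": _is_timestamp,
}


def schema_errors(record: dict, schema: List[dict] = SCHEMA) -> List[str]:
    """Return a list of problems; an empty list means the record looks valid."""
    errors = []
    for col in schema:
        name, typ = col["Name"], col["Type"]
        if name not in record:
            errors.append(f"missing column {name}")
        elif not CHECKS[typ](record[name]):
            errors.append(f"{name}: expected {typ}, got {record[name]!r}")
    return errors
```

For example, a record whose `device_id` arrives as the string `"7"` would be reported as `device_id: expected int, got '7'`.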
Now let's create our Kinesis Analytics Studio notebook. This notebook processes stream data from the Kinesis data stream, and in it we can write SQL analytics queries to get real-time information such as the current activity count or device temperature.
INFO: Logging is useful for understanding errors when your application crashes. To enable it, you must add the CloudWatchFullAccess and CloudWatchLogsFullAccess permissions to the service role. We skip this part because it is not needed in this exercise.
Skip the remaining settings and scroll to the bottom
Now that we have created the Notebook, we can run it and try to execute some SQL queries.
In the Studio tab, view the list of notebooks and select our newly created notebook AnalyticsWorkshop-KDANotebook.
Select Run
Wait until the Status changes to Running (this takes about 5-7 minutes).
Select Create new note and name the note AnalyticsWorkshop-ZeppelinNote.
Paste this SQL query.
%flink.ssql(type=update)
SELECT * FROM raw_stream;
%flink.ssql(type=update)
SELECT activity_type, COUNT(*) AS activity_cnt FROM raw_stream GROUP BY activity_type;
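The `type=update` setting makes the notebook display a result table that is updated in place: each time a new record arrives for an `activity_type`, the row for that key is replaced with its new count instead of a new row being appended. The plain-Python model below only illustrates those semantics for the `GROUP BY` query; it is not how Flink executes the query, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def running_activity_counts(records: Iterable[dict]) -> Iterator[Tuple[str, int]]:
    """Emit (activity_type, activity_cnt) every time a key's count changes."""
    counts: Counter = Counter()
    for record in records:
        key = record["activity_type"]
        counts[key] += 1
        # In update mode, this result replaces the previous row for `key`.
        yield key, counts[key]


if __name__ == "__main__":
    stream = [{"activity_type": a} for a in ["Running", "Walking", "Running"]]
    for update in running_activity_counts(stream):
        print(update)
```

Collapsing the updates with `dict(...)` gives the table the notebook would show at that moment, one row per activity type holding its latest count.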
For the queries running in the Studio notebook to return results, we must send source data to the stream from our Kinesis Data Generator, using the following record template:
{
"uuid": "{{random.uuid}}",
"device_ts": "{{date.utc("YYYY-MM-DD HH:mm:ss.SSS")}}",
"device_id": {{random.number(50)}},
"device_temp": {{random.weightedArrayElement(
{"weights":[0.30, 0.30, 0.20, 0.20],"data":[32, 34, 28, 40]}
)}},
"track_id": {{random.number(30)}},
"activity_type": {{random.weightedArrayElement(
{
"weights": [0.1, 0.2, 0.2, 0.3, 0.2],
"data": ["\"Running\"", "\"Working\"", "\"Walking\"", "\"Traveling\"", "\"Sitting\""]
}
)}}
}
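For a local dry run without KDG, the sketch below produces records with the same shape and value distributions as the template above. It assumes KDG's `random.number(max)` is inclusive of `max` and uses Python's `random.choices` for the weighted picks; the function name is hypothetical. In the KDG template the activity names carry escaped quotes because `weightedArrayElement` output is inserted unquoted into the JSON, whereas here they are ordinary Python strings.

```python
import random
import uuid
from datetime import datetime, timezone

# Values and weights copied from the record template above.
TEMP_VALUES, TEMP_WEIGHTS = [32, 34, 28, 40], [0.30, 0.30, 0.20, 0.20]
ACTIVITIES = ["Running", "Working", "Walking", "Traveling", "Sitting"]
ACTIVITY_WEIGHTS = [0.1, 0.2, 0.2, 0.3, 0.2]


def make_record(rng: random.Random) -> dict:
    now = datetime.now(timezone.utc)
    return {
        "uuid": str(uuid.uuid4()),
        # Millisecond precision, matching "YYYY-MM-DD HH:mm:ss.SSS".
        "device_ts": now.strftime("%Y-%m-%d %H:%M:%S.") + f"{now.microsecond // 1000:03d}",
        # Assumes random.number(max) in KDG includes max.
        "device_id": rng.randint(0, 50),
        "device_temp": rng.choices(TEMP_VALUES, weights=TEMP_WEIGHTS)[0],
        "track_id": rng.randint(0, 30),
        "activity_type": rng.choices(ACTIVITIES, weights=ACTIVITY_WEIGHTS)[0],
    }


if __name__ == "__main__":
    print(make_record(random.Random(42)))
```

Passing a seeded `random.Random` keeps the numeric fields reproducible between runs, while `uuid` and `device_ts` still change every time.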
In this module, you created a Kinesis data stream to receive stream data from the Kinesis Data Generator, a Glue table that stores the schema and connection information for the Kinesis data stream, and a Kinesis Analytics Studio notebook application that reads and queries the stream data.
Next, we will learn how to build charts and dashboards using Amazon QuickSight.