Data's Blog

//----

EMR project uses:

EMR - used for the Spark job

S3 - storage and data lake

Glue & Athena - SQL engine

Step Functions ( link ) - data orchestration

CI/CD - GitHub Actions

Data sets:

  • rental transaction
  • vehicle
  • user
  • location

Project:

1) The EMR Spark job creates the metrics and writes the output to S3.

2) We use Glue to track the job and run queries in Athena, then move the data to Redshift.

3) S3 is used for the data bucket (data-bucket/, CSV files) as well as the code bucket (code-bucket/, .py code).

4) A Step Functions state machine creates the spark-emr-cluster, runs the job, and then terminates the spark-emr-cluster.

5) CI/CD uses GitHub Actions.
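
Step 4 can be sketched as a Step Functions state machine. The snippet below is only a minimal sketch, not the project's actual definition: it builds the Amazon States Language document as a Python dict, using the EMR service integrations (createCluster.sync, addStep.sync, terminateCluster). The release label, instance type, IAM role names, script arguments, and the Catch branch that still terminates the cluster when the job fails are all assumptions.

```python
import json


def build_state_machine(code_uri, data_uri, output_uri):
    """Return a state machine definition: create cluster, run job, terminate."""
    # EMR step that runs spark-submit on the .py file stored in the code bucket.
    spark_step = {
        "Name": "spark-job",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit", "--deploy-mode", "cluster", code_uri,
                "--data-source", data_uri, "--output-uri", output_uri,
            ],
        },
    }
    return {
        "Comment": "Create spark-emr-cluster, run the Spark job, terminate",
        "StartAt": "CreateCluster",
        "States": {
            "CreateCluster": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
                "Parameters": {
                    "Name": "spark-emr-cluster",
                    "ReleaseLabel": "emr-6.15.0",          # assumed release
                    "Applications": [{"Name": "Spark"}],
                    "ServiceRole": "EMR_DefaultRole",      # assumed role names
                    "JobFlowRole": "EMR_EC2_DefaultRole",
                    "Instances": {
                        # Keep the cluster alive until the job step is added.
                        "KeepJobFlowAliveWhenNoSteps": True,
                        "InstanceGroups": [{
                            "Name": "Primary",
                            "InstanceRole": "MASTER",
                            "InstanceType": "m5.xlarge",   # assumed size
                            "InstanceCount": 1,
                        }],
                    },
                },
                "ResultPath": "$.cluster",
                "Next": "RunSparkJob",
            },
            "RunSparkJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    "ClusterId.$": "$.cluster.ClusterId",
                    "Step": spark_step,
                },
                "ResultPath": "$.step",
                # Assumption: terminate the cluster even if the job fails.
                "Catch": [{
                    "ErrorEquals": ["States.ALL"],
                    "ResultPath": "$.error",
                    "Next": "TerminateCluster",
                }],
                "Next": "TerminateCluster",
            },
            "TerminateCluster": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster",
                "Parameters": {"ClusterId.$": "$.cluster.ClusterId"},
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    definition = build_state_machine(
        "s3://code-bucket/spark_job.py",   # hypothetical script name
        "s3://data-bucket/",
        "s3://data-bucket/output/",        # hypothetical output prefix
    )
    print(json.dumps(definition, indent=2))
```

The printed JSON could be pasted into the Step Functions console or deployed from the GitHub Actions pipeline; how the project actually deploys it is not stated here.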

EMR job:
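
A rough PySpark sketch of what such a job might look like: read a CSV from the data bucket, aggregate it, and write the output back to S3. The schema is not given in these notes, so the file name (transactions.csv), the column name (pickup_location), the metric itself, and the Parquet output format are all hypothetical.

```python
import argparse
import sys


def parse_args(argv=None):
    """Parse the S3 locations passed on the spark-submit command line."""
    parser = argparse.ArgumentParser(description="Rental metrics Spark job")
    parser.add_argument("--data-source", required=True,
                        help="S3 prefix holding the CSV files")
    parser.add_argument("--output-uri", required=True,
                        help="S3 prefix where the output is written")
    return parser.parse_args(argv)


def run_job(data_source, output_uri):
    """Compute a simple per-location rental count and write it to S3."""
    # Imported here so the module loads even where pyspark is not installed;
    # on an EMR cluster with Spark, pyspark is available.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("rental-metrics").getOrCreate()

    # Hypothetical input file and column names.
    transactions = spark.read.csv(
        f"{data_source.rstrip('/')}/transactions.csv",
        header=True,
        inferSchema=True,
    )
    metrics = (
        transactions
        .groupBy("pickup_location")
        .agg(F.count("*").alias("total_rentals"))
    )

    # Parquet is an assumption; it is convenient to query from Athena.
    metrics.write.mode("overwrite").parquet(output_uri)
    spark.stop()


def main(argv=None):
    args = parse_args(argv)
    run_job(args.data_source, args.output_uri)


# Only run when arguments are supplied, as spark-submit does.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

On the cluster this would be launched with something like `spark-submit spark_job.py --data-source s3://data-bucket --output-uri s3://data-bucket/output`, where the script name and output prefix are again placeholders.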

//----