Data's Blog

//---- Crawler :

1) Accesses your data store.

2) Extracts the structure of the data and creates table definitions in the AWS Glue Data Catalog.

3) Creates and manages the schema and the partitions of those catalog tables.
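The three steps above can be sketched with boto3. This is only a minimal sketch: the crawler name, IAM role ARN, database name, and S3 path are made-up placeholders, and the helper names `build_crawler_request` and `run_crawler` are my own, not part of AWS Glue. Only `create_crawler` and `start_crawler` are real boto3 Glue client calls.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Return keyword arguments for glue.create_crawler (all values are placeholders)."""
    return {
        "Name": name,
        "Role": role_arn,          # IAM role the crawler assumes to access the data
        "DatabaseName": database,  # Data Catalog database that receives the table definitions
        "Targets": {"S3Targets": [{"Path": s3_path}]},  # data store to crawl
    }


def run_crawler(request):
    """Create and start the crawler. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request builder works without AWS access

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical values, for illustration only.
request = build_crawler_request(
    name="sales-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    database="sales_db",
    s3_path="s3://example-bucket/sales/",
)
print(request["Targets"])
```

Calling `run_crawler(request)` would create the crawler and start a run, after which the tables and partitions appear in the chosen Data Catalog database.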

 

Sometimes the crawler needs help finding the correct format or pattern in the data, and that is the job of a classifier.

Some built-in classifiers are available, and we can create custom ones as well.

 

Classifier

A classifier reads the data in a data store. If it recognizes the format of the data, it generates a schema. The classifier also returns a certainty number to indicate how certain the format recognition was.
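To make the idea of a certainty number concrete, here is a toy sketch in plain Python. It is not how AWS Glue is implemented. The names `toy_json_classifier` and `classify` are hypothetical. The selection rule (take the first classifier that reports 1.0, otherwise the highest certainty, otherwise UNKNOWN) follows my reading of the AWS Glue documentation.

```python
import json


def toy_json_classifier(sample):
    """Toy classifier: return (schema, certainty) for a text sample."""
    try:
        record = json.loads(sample)
    except ValueError:
        return None, 0.0  # format not recognized
    if not isinstance(record, dict):
        return None, 0.0
    # Very rough schema: column name mapped to the Python type name of its value.
    schema = {key: type(value).__name__ for key, value in record.items()}
    return schema, 1.0


def classify(sample, classifiers):
    """Run classifiers in order and return (name, schema, certainty) of the winner."""
    best = ("UNKNOWN", None, 0.0)
    for name, classifier in classifiers:
        schema, certainty = classifier(sample)
        if certainty == 1.0:
            return name, schema, certainty  # fully certain: stop here
        if certainty > best[2]:
            best = (name, schema, certainty)
    return best


classifiers = [("json", toy_json_classifier)]
print(classify('{"id": 1, "name": "Ann"}', classifiers))
print(classify("plain text line", classifiers))
```

The first call recognizes the sample and returns a schema with certainty 1.0; the second falls through to UNKNOWN because no classifier reports a certainty above 0.0.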

When do I use a classifier?
 You use classifiers when you crawl a data store to define metadata tables in the AWS Glue Data Catalog.

For custom classifiers, you define the logic for creating the schema based on the type of classifier. Classifier types include defining schemas based on grok patterns, XML tags, and JSON paths.
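As a rough sketch of the three custom classifier types, the snippet below builds the request bodies for the boto3 Glue `create_classifier` call. The classifier names, the grok pattern, the row tag, and the JSON path are example values I picked, and the helper functions are hypothetical; check the boto3 documentation for the full set of fields.

```python
def grok_request(name, classification, grok_pattern):
    """Custom classifier that derives the schema from a grok pattern."""
    return {"GrokClassifier": {
        "Name": name,
        "Classification": classification,
        "GrokPattern": grok_pattern,
    }}


def xml_request(name, classification, row_tag):
    """Custom classifier that treats each <row_tag> element as one row."""
    return {"XMLClassifier": {
        "Name": name,
        "Classification": classification,
        "RowTag": row_tag,
    }}


def json_request(name, json_path):
    """Custom classifier that picks the rows out with a JSON path."""
    return {"JsonClassifier": {"Name": name, "JsonPath": json_path}}


def register(request):
    """Send one request to AWS Glue. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builders work without AWS access

    boto3.client("glue").create_classifier(**request)


# Example values, assumed for illustration only.
requests = [
    grok_request("app-logs", "applog",
                 "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}"),
    xml_request("orders-xml", "orders", "order"),
    json_request("events-json", "$[*]"),
]
for request in requests:
    print(list(request)[0], "->", list(request.values())[0]["Name"])
```

Each request would be registered with `register(request)` and the classifier name then listed on the crawler, so the crawler tries the custom classifiers before the built-in ones.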

Built-in classifiers
 
AWS Glue provides built-in classifiers for various formats, including JSON, CSV, web logs, and many database systems.

Built-in MySQL (RDS) classifier - MySQL (RDS)
Built-in Parquet classifier - Parquet
Built-in Redshift classifier - Redshift
Built-in JSON classifier - JSON
Built-in CSV classifier - comma (,) / pipe (|) / tab (\t) / semicolon (;)
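To show what checking for those delimiters could look like, here is a simplified sketch in plain Python. It is an assumption-heavy toy, not the heuristic AWS Glue actually uses: it picks the candidate delimiter that splits every sample line into the same number of fields. The function name `detect_delimiter` is hypothetical.

```python
CANDIDATE_DELIMITERS = [",", "|", "\t", ";"]


def detect_delimiter(lines):
    """Return the delimiter that gives every line the same field count (>1), or None."""
    best, best_fields = None, 1
    for delimiter in CANDIDATE_DELIMITERS:
        # Set of distinct field counts this delimiter produces across the sample.
        counts = {len(line.split(delimiter)) for line in lines}
        if len(counts) == 1:  # consistent across all lines
            fields = counts.pop()
            if fields > best_fields:
                best, best_fields = delimiter, fields
    return best


sample = ["id|name|city", "1|Ann|Pune", "2|Raj|Delhi"]
print(repr(detect_delimiter(sample)))
```

A sample with no consistent delimiter, such as free text, returns None, which is roughly the case where a custom classifier would be needed.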

 

 
//----