Object storage built to store and retrieve any amount of data from anywhere
Comprehensive security and compliance capabilities that meet even the most stringent regulatory requirements
Query-in-place functionality: run analytics directly on your data at rest in S3, without moving the data into a separate analytics system
Amazon Athena (uses Presto) is an interactive query service that makes it easy to analyze data in S3 using standard SQL
Amazon Macie uses machine learning to automatically discover, classify, and protect sensitive data in AWS
Formats: CSV, JSON, ORC, Avro, and Parquet
Security standards: PCI-DSS, HIPAA/HITECH, FedRAMP, EU Data Protection Directive, and FISMA
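The Athena query-in-place idea above can be sketched in Python. This is a minimal sketch, not a full client: it assumes boto3 is installed and AWS credentials are configured, and the table name access_logs, the database example_db, and the bucket example-bucket are hypothetical placeholders. Building the request is kept separate from sending it, so the request shape can be inspected without touching AWS.

```python
def build_athena_request(sql, database, output_location):
    """Build keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql, database, output_location):
    """Submit the query to Athena and return its execution id."""
    # Imported lazily so the builder above works without boto3 installed.
    import boto3

    client = boto3.client("athena")
    response = client.start_query_execution(
        **build_athena_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]


# Hypothetical table, database, and results bucket, for illustration only.
request = build_athena_request(
    "SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="example_db",
    output_location="s3://example-bucket/athena-results/",
)
```

Athena runs queries asynchronously, so a real caller would poll the returned execution id until the query finishes and then read the results from the output location.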
Data Formats
ORC (Optimized Row Columnar) is a self-describing, type-aware columnar file format designed for Hadoop workloads
Avro is a data serialization system which relies on schemas
Parquet is a columnar storage format available to any project in the Hadoop ecosystem
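To see why columnar formats such as ORC and Parquet suit analytics, here is a toy sketch of row layout versus column layout. It is only an illustration with made-up records: the real formats add encoding, compression, and per-column statistics that this sketch does not attempt.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"user": "ana", "country": "PT", "bytes": 120},
    {"user": "bo", "country": "US", "bytes": 300},
    {"user": "cy", "country": "US", "bytes": 80},
]


def to_columns(records):
    """Pivot a list of records into one list of values per field."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of one field are stored together.
columns = to_columns(rows)

# An aggregate over one field only has to read that field's values.
total_bytes = sum(columns["bytes"])
```

With the row layout, summing bytes means visiting every record. With the column layout, the query reads a single contiguous list and can skip the other fields entirely.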
Presto: https://prestodb.io
Presto is an open-source distributed SQL query engine optimized for low-latency, ad-hoc analysis of data
It supports the ANSI SQL standard, including complex queries, aggregations, joins, and window functions
It can process data from multiple data sources including the Hadoop Distributed File System (HDFS) and Amazon S3.
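The kind of SQL described above (aggregation combined with a window function) can be tried locally. This sketch uses Python's built-in sqlite3 module purely as a stand-in, since running Presto needs a cluster; it assumes the bundled SQLite is version 3.25 or newer (required for window functions), and the requests table is invented for illustration. A Presto query over data in S3 or HDFS would use similar syntax.

```python
import sqlite3

# In-memory database standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (region TEXT, latency_ms INTEGER)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [("eu", 120), ("eu", 80), ("us", 200), ("us", 100), ("us", 60)],
)

# Window function: attach each region's average latency to every row
# without collapsing the rows the way GROUP BY would.
query = """
SELECT region,
       latency_ms,
       AVG(latency_ms) OVER (PARTITION BY region) AS region_avg
FROM requests
ORDER BY region, latency_ms
"""
results = conn.execute(query).fetchall()
conn.close()
```

Each result row keeps its own latency alongside the regional average, which makes it easy to spot requests that are slower than is typical for their region.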
Amazon Redshift - Data warehouse
Fast, fully managed data warehouse that makes it simple and cost-effective to analyze
all your data using standard SQL and your existing Business Intelligence (BI) tools.
Load up the cluster and connect a BI tool
Amazon Redshift is based on PostgreSQL 8.0.2
Uses columnar storage on high-performance local disks and massively parallel query execution
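The idea behind massively parallel query execution can be sketched as partial aggregation: the data is split across workers, each worker computes a partial result, and a leader merges them. This is a conceptual illustration only, not how Redshift is implemented; the function names and the slice count are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_count(values):
    """Work done by one slice: a partial sum and a row count."""
    return sum(values), len(values)


def parallel_average(values, slices=4):
    """Average computed by merging per-slice partial aggregates."""
    if not values:
        raise ValueError("cannot average an empty list")
    # Distribute the rows round-robin across the slices.
    chunks = [values[i::slices] for i in range(slices)]
    with ThreadPoolExecutor(max_workers=slices) as pool:
        partials = list(pool.map(partial_sum_count, chunks))
    # Leader step: combine the partial sums and counts.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


average = parallel_average(list(range(1, 101)))
```

The average is merged from sums and counts rather than from per-slice averages, because averaging the averages would give the wrong answer whenever the slices hold different numbers of rows.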