What is AWS DataSync?
AWS DataSync is a secure, online data transfer service that automates and accelerates moving data between on-premises storage systems and AWS storage services. AWS states that it can copy data up to 10x faster than open-source tools such as rsync.
Key Concepts
1. How it works
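- DataSync Agent: You deploy a software agent as a VM in your on-premises environment (VMware ESXi, Microsoft Hyper-V, or Linux KVM).
- Protocol: The agent connects to your local storage over NFS or SMB (HDFS and self-managed object storage are also supported).
- Transfer: The agent sends data over TLS to the DataSync service, which writes it to Amazon S3, Amazon EFS, or Amazon FSx.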
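The agent, location, and task flow above can be sketched against the boto3 DataSync client. This is a minimal sketch, assuming an agent is already activated and an IAM role that lets DataSync write to the bucket already exists; every ARN, hostname, path, and the task name are hypothetical placeholders. The client is passed in so the function can be exercised with a stub.

```python
from typing import Any


def create_nfs_to_s3_task(
    client: Any,
    agent_arn: str,
    nfs_host: str,
    nfs_export: str,
    bucket_arn: str,
    bucket_role_arn: str,
) -> str:
    """Create an NFS source, an S3 destination, and a task linking them.

    `client` is expected to behave like boto3.client("datasync").
    Returns the new task's ARN.
    """
    # Source: the on-premises NFS export, reached through the deployed agent.
    source = client.create_location_nfs(
        ServerHostname=nfs_host,
        Subdirectory=nfs_export,
        OnPremConfig={"AgentArns": [agent_arn]},
    )
    # Destination: an S3 bucket; DataSync assumes this role to write objects.
    destination = client.create_location_s3(
        S3BucketArn=bucket_arn,
        S3Config={"BucketAccessRoleArn": bucket_role_arn},
    )
    # A task pairs a source with a destination; each execution runs the copy.
    task = client.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
        Name="nfs-to-s3",  # hypothetical task name
    )
    return task["TaskArn"]
```

With a real client you would pass `boto3.client("datasync")` and then start a run with `client.start_task_execution(TaskArn=task_arn)`.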
2. Validation
- Automatically performs integrity checks on data in transit and can optionally verify, at the end of the transfer, that the data at the destination matches the source.
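The in-transit checks always run; the extra check at the destination is selected with the task's `VerifyMode` option. `ONLY_FILES_TRANSFERRED` compares checksums of the files copied in that run, `POINT_IN_TIME_CONSISTENT` scans the whole source and destination at the end, and `NONE` skips the final pass. A minimal sketch that builds and validates that option fragment (the helper name is hypothetical):

```python
from typing import Dict

# VerifyMode values accepted in a DataSync task's Options structure.
VERIFY_MODES = {"POINT_IN_TIME_CONSISTENT", "ONLY_FILES_TRANSFERRED", "NONE"}


def verification_options(mode: str = "ONLY_FILES_TRANSFERRED") -> Dict[str, str]:
    """Return an Options fragment selecting the post-transfer verification."""
    if mode not in VERIFY_MODES:
        raise ValueError(f"unsupported VerifyMode: {mode!r}")
    return {"VerifyMode": mode}
```

The fragment can be passed as `Options=` to `create_task`, or as `OverrideOptions=` to `start_task_execution` for a single run.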
3. Bandwidth Throttling
- You can cap the bandwidth DataSync uses so it does not crowd out other network traffic during business hours; the cap can also be changed while a transfer is running.
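Throttling is expressed as the `BytesPerSecond` task option, where `-1` means unlimited, and a running execution can be re-capped with `update_task_execution`. A minimal sketch, assuming you think in megabits per second (decimal units) and inject a boto3-like client; the helper names are hypothetical:

```python
from typing import Any, Optional

# DataSync treats BytesPerSecond = -1 as "no bandwidth limit".
UNLIMITED = -1


def mbps_to_bytes_per_second(mbps: float) -> int:
    """Convert megabits per second to the bytes-per-second value DataSync expects."""
    if mbps <= 0:
        raise ValueError("mbps must be positive; pass None to lift the cap")
    return int(mbps * 1_000_000 / 8)


def set_bandwidth_limit(client: Any, execution_arn: str, mbps: Optional[float]) -> int:
    """Apply a new cap to a running task execution; mbps=None removes it.

    `client` is expected to behave like boto3.client("datasync").
    Returns the BytesPerSecond value that was sent.
    """
    limit = UNLIMITED if mbps is None else mbps_to_bytes_per_second(mbps)
    client.update_task_execution(
        TaskExecutionArn=execution_arn,
        Options={"BytesPerSecond": limit},
    )
    return limit
```

For example, a scheduled job of your own could cap a long migration at 100 Mbps at 09:00 and call `set_bandwidth_limit(client, arn, None)` at 18:00. The same `{"BytesPerSecond": ...}` option can also be set on `create_task` as the task's default.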
Exam Tips
[!IMPORTANT] "Transfer large data sets online": Use AWS DataSync.
[!TIP] DataSync vs. Snow Family:
- DataSync: Online transfer over your network, so speed depends on available bandwidth. Good for ongoing sync, or for migration when network capacity is sufficient.
- Snowball: Offline data transport; you physically ship a device. Good for massive datasets (petabytes) or sites with limited or no internet connectivity.
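Whether "the network is available" in a practical sense comes down to arithmetic: data size divided by usable throughput. A back-of-envelope sketch, assuming decimal units and a link utilization factor you choose yourself (0.8 here is an assumption, not a DataSync figure):

```python
def transfer_days(data_tb: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Estimate days to move data_tb terabytes over a link_gbps link.

    utilization is the assumed fraction of the link the transfer actually gets.
    Uses decimal units: 1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = data_tb * 10**12 * 8
    seconds = bits / (link_gbps * 10**9 * utilization)
    return seconds / 86_400


print(round(transfer_days(100, 1), 1))   # 100 TB over 1 Gbps
print(round(transfer_days(1000, 1), 1))  # 1 PB over 1 Gbps
```

Under these assumptions 100 TB takes about 11.6 days and 1 PB about 115.7 days, which is the kind of gap that pushes petabyte-scale moves toward Snowball. Where the crossover sits depends on your own link and deadline.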
[!NOTE] DataSync preserves metadata (permissions, timestamps) when moving files.
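Which metadata is preserved is controlled by task options, and the relevant ones differ by protocol: POSIX mode bits and numeric IDs for NFS, the security descriptor for SMB. A minimal sketch spelling them out explicitly; the helper name is hypothetical, and the pairing rule encoded in the comment (`Atime=BEST_EFFORT` requires `Mtime=PRESERVE`) is DataSync's documented constraint:

```python
from typing import Dict


def metadata_options(protocol: str) -> Dict[str, str]:
    """Return task Options that keep file metadata intact at the destination."""
    # Timestamps apply to both protocols. DataSync only accepts
    # Atime=BEST_EFFORT together with Mtime=PRESERVE.
    options = {"Mtime": "PRESERVE", "Atime": "BEST_EFFORT"}
    if protocol == "NFS":
        # POSIX permission bits plus numeric owner and group IDs.
        options.update(PosixPermissions="PRESERVE", Uid="INT_VALUE", Gid="INT_VALUE")
    elif protocol == "SMB":
        # Windows ownership and ACLs travel in the security descriptor.
        options["SecurityDescriptorCopyFlags"] = "OWNER_DACL"
    else:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return options
```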
Common Use Cases
- Migration: Active migration of datasets to the cloud with minimal downtime.
- Archiving: Moving cold data from expensive on-premises NAS (or other storage exposed over NFS/SMB) to S3 Glacier Deep Archive.
- Hybrid Cloud: Syncing on-premises production data to AWS for analytics or processing.
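The archiving and hybrid-cloud cases map onto two task settings: an S3 destination whose storage class is `DEEP_ARCHIVE`, and a scheduled task that copies only changed data. A minimal sketch with an injected boto3-like client; the helper names, task name, and the nightly 02:00 UTC cron expression are illustrative assumptions:

```python
from typing import Any


def archive_destination(client: Any, bucket_arn: str, role_arn: str) -> str:
    """Create an S3 location that writes objects directly to Glacier Deep Archive."""
    resp = client.create_location_s3(
        S3BucketArn=bucket_arn,
        S3StorageClass="DEEP_ARCHIVE",
        S3Config={"BucketAccessRoleArn": role_arn},
    )
    return resp["LocationArn"]


def nightly_sync_task(client: Any, source_arn: str, destination_arn: str) -> str:
    """Create a task that runs nightly and copies only new or changed data."""
    resp = client.create_task(
        SourceLocationArn=source_arn,
        DestinationLocationArn=destination_arn,
        Name="hybrid-nightly-sync",  # hypothetical task name
        Options={"TransferMode": "CHANGED"},
        Schedule={"ScheduleExpression": "cron(0 2 * * ? *)"},  # 02:00 UTC daily
    )
    return resp["TaskArn"]
```

`TransferMode="CHANGED"` is what keeps repeated runs incremental; each scheduled execution compares source and destination and skips unchanged files.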