Determine How to Design and Architect the Data Processing Solution
Data processing solutions typically involve one or more of the following types of workload:
- Batch processing of big data sources at rest.
- Real-time processing of big data in motion.
- Interactive exploration of big data.
- Predictive analytics and machine learning.
An architecture style is a family of architectures that share certain characteristics. For example, N-tier is a common architecture style.
N-tier Architecture
- A traditional architecture for enterprise applications.
- Dependencies are managed by dividing the application into layers that perform logical functions, such as presentation, business logic, and data access.
- A layer can only call into layers that sit below it.
- It can be hard to introduce changes in one part of the application without touching the rest.
- This makes frequent updates a challenge, limiting how quickly new features can be added.
- N-tier is a natural fit for applications that use a layered architecture, such as infrastructure as a service (IaaS) solutions or applications that use a mix of IaaS and managed services.
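The layering rule described above can be illustrated with a minimal sketch. The class names, the in-memory order data, and the `render_total` method are hypothetical and chosen only for illustration; the point is that each layer holds a reference only to the layer directly below it.

```python
class DataAccessLayer:
    """Bottom tier: owns storage and knows nothing about the layers above."""

    def __init__(self):
        # Hypothetical in-memory data standing in for a database.
        self._orders = {1: {"item": "widget", "quantity": 3, "unit_price": 2.5}}

    def get_order(self, order_id):
        return self._orders.get(order_id)


class BusinessLogicLayer:
    """Middle tier: applies business rules and calls only the data tier."""

    def __init__(self, data_access):
        self._data = data_access

    def order_total(self, order_id):
        order = self._data.get_order(order_id)
        if order is None:
            raise KeyError(f"unknown order {order_id}")
        return order["quantity"] * order["unit_price"]


class PresentationLayer:
    """Top tier: formats output and calls only the business tier."""

    def __init__(self, business_logic):
        self._logic = business_logic

    def render_total(self, order_id):
        total = self._logic.order_total(order_id)
        return f"Order {order_id} total: {total:.2f}"


# Wire the tiers top-down; calls flow strictly downward.
ui = PresentationLayer(BusinessLogicLayer(DataAccessLayer()))
message = ui.render_total(1)
print(message)
```

Because the presentation tier depends on the business tier, which in turn depends on the data tier, a change to the stored order format tends to ripple upward. That is the coupling the notes above warn about.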
Web-Queue-Worker Architecture
- It is suited to a purely PaaS (platform as a service) solution.
- The application has a web front end that handles HTTP requests.
- A back-end worker performs CPU-intensive tasks or long-running operations.
- The front end communicates with the worker through an asynchronous message queue.
- Suitable for relatively simple domains with some resource-intensive tasks.
- Use of managed services simplifies deployment and operations.
- With complex domains, however, dependencies can be hard to manage.
- It can easily grow into a large, monolithic component that is hard to maintain and update.
- This can reduce the frequency of updates and limit innovation.
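As a rough sketch of the web-queue-worker flow, the following uses Python's standard `queue` and `threading` modules in place of a managed message queue and a separate worker process. The function names, the `results` dictionary, and the sum-of-squares "expensive task" are assumptions made for illustration only.

```python
import queue
import threading

job_queue = queue.Queue()  # stands in for a managed message queue
results = {}               # stands in for a result store


def handle_request(job_id, n):
    """Web front end: enqueue the work and return immediately."""
    job_queue.put((job_id, n))
    return {"status": "accepted", "job_id": job_id}


def worker():
    """Back-end worker: pull jobs off the queue and do the expensive part."""
    while True:
        job = job_queue.get()
        if job is None:  # sentinel value: shut the worker down
            job_queue.task_done()
            break
        job_id, n = job
        results[job_id] = sum(i * i for i in range(n))  # placeholder workload
        job_queue.task_done()


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

response = handle_request("job-1", 1000)  # returns before the work is done
job_queue.join()                          # wait until the queue is drained
job_queue.put(None)                       # ask the worker to stop
worker_thread.join()
print(response, results["job-1"])
```

The front end never calls the worker directly; the queue is the only link between them, which is what lets each side scale or fail independently.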
Microservices Architecture
- It is usually suited to a more complex domain.
- It is composed of many small, independent services.
- Each service implements a single business capability.
- Services are loosely coupled, communicating through API contracts.
- Each service can be built by a small, focused development team.
- Individual services can be deployed with frequent updates.
- It is more complex to build and manage than N-tier or web-queue-worker.
- It requires a mature development and DevOps culture.
- It can lead to higher release velocity, faster innovation, and a more resilient architecture.
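The idea of loosely coupled services that communicate through API contracts can be sketched in-process as follows. This is a simplification: real microservices would normally run as separate deployments and talk over a network protocol. The service names, the `StockReply` message, and the `InventoryApi` contract are hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class StockReply:
    """Message shape agreed on by both services (the contract's payload)."""
    sku: str
    available: int


class InventoryApi(Protocol):
    """The contract: callers depend on this, not on a concrete service."""

    def check_stock(self, sku: str) -> StockReply: ...


class InventoryService:
    """Owns its own data; implements one business capability."""

    def __init__(self):
        self._stock = {"widget": 5}  # private to this service

    def check_stock(self, sku: str) -> StockReply:
        return StockReply(sku, self._stock.get(sku, 0))


class OrderService:
    """Depends only on the InventoryApi contract, never on inventory data."""

    def __init__(self, inventory: InventoryApi):
        self._inventory = inventory
        self._orders = []  # private to this service

    def place_order(self, sku: str, quantity: int) -> dict:
        reply = self._inventory.check_stock(sku)
        if reply.available < quantity:
            return {"accepted": False, "reason": "insufficient stock"}
        self._orders.append((sku, quantity))
        return {"accepted": True, "order_number": len(self._orders)}


orders = OrderService(InventoryService())
accepted = orders.place_order("widget", 2)
rejected = orders.place_order("widget", 9)
print(accepted, rejected)
```

As long as `check_stock` keeps returning a `StockReply`, the inventory service can be rewritten or redeployed without touching the order service. Stock is deliberately not decremented here to keep the sketch short.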
Event-Driven Architecture
- It uses a publish-subscribe (pub-sub) model.
- Producers publish events, and consumers subscribe to them.
- Producers are independent of the consumers.
- Consumers are independent of each other.
- Example: applications that ingest and process a large volume of data with very low latency, such as IoT solutions.
- Useful if different subsystems must perform different types of processing on the same event data.
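A minimal in-process sketch of the publish-subscribe model is shown below. The `EventBus` class, the `"temperature"` topic, and both handlers are hypothetical, and delivery here is synchronous, whereas a real event broker would typically deliver events asynchronously.

```python
from collections import defaultdict


class EventBus:
    """Toy broker: routes each published event to every subscriber of a topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The producer does not know who, if anyone, is listening.
        for handler in self._subscribers.get(topic, []):
            handler(event)


bus = EventBus()
alerts = []    # output of the alerting consumer
readings = []  # output of the archiving consumer


def alert_on_high_temperature(event):
    """Consumer 1: flags devices reporting more than 30 degrees Celsius."""
    if event["celsius"] > 30:
        alerts.append(event["device"])


def archive_reading(event):
    """Consumer 2: stores every reading, independent of consumer 1."""
    readings.append(event["celsius"])


bus.subscribe("temperature", alert_on_high_temperature)
bus.subscribe("temperature", archive_reading)

bus.publish("temperature", {"device": "sensor-1", "celsius": 21.5})
bus.publish("temperature", {"device": "sensor-2", "celsius": 35.0})
print(alerts, readings)
```

Both consumers receive the same events but process them differently, and either one could be removed without changing the producer or the other consumer.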
Big Data, Big Compute Architecture
- These are specialized architecture styles.
- Big data divides a large dataset into chunks and performs parallel processing across the entire set for analysis and reporting.
- Big compute, also called high-performance computing (HPC), makes parallel computations across a large number of cores.
- Big compute is used for simulations, modeling, and 3-D rendering.
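The chunk-then-process-in-parallel idea can be sketched on a single machine as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and a thread pool from the standard library stands in for what would be many processes or cluster nodes in a real big data or big compute system.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, chunk_size):
    """Divide the dataset into consecutive chunks of at most chunk_size items."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def summarize_chunk(chunk):
    """Per-chunk work: return a partial result (count, sum) for one chunk."""
    return len(chunk), sum(chunk)


def parallel_mean(data, chunk_size=1000, workers=4):
    """Compute the mean by combining partial results from all chunks."""
    chunks = split_into_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize_chunk, chunks))
    count = sum(n for n, _ in partials)
    total = sum(s for _, s in partials)
    if count == 0:
        raise ValueError("cannot compute the mean of an empty dataset")
    return total / count


dataset = list(range(1, 10001))  # hypothetical data: the integers 1..10000
mean = parallel_mean(dataset)
print(mean)
```

Each chunk is summarized independently, so no worker needs to see the whole dataset; only the small partial results are combined at the end.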
| Architecture style | Dependency management | Domain type |
|---|---|---|
| N-tier | Horizontal tiers divided by subnet. | Traditional business domain. Frequency of updates is low. |
| Web-Queue-Worker | Front-end and back-end jobs, decoupled by async messaging. | Relatively simple domain with some resource-intensive tasks. |
| Microservices | Vertically (functionally) decomposed services that call each other through APIs. | Complicated domain. Frequent updates. |
| Event-driven | Producer/consumer. Independent view per subsystem. | IoT and real-time systems. |
| Big data | Divide a huge dataset into small chunks. Parallel processing on local datasets. | Batch and real-time data analysis. Predictive analysis using ML. |
| Big compute | Data allocation to thousands of cores. | Compute-intensive domains such as simulation. |