Data Analysis Pipeline for Cloud Security
Architecture Overview
This document outlines the design and implementation of a data analysis pipeline for cloud security. The pipeline monitors and analyzes user access to a web application hosted in a Google Cloud Platform (GCP) Virtual Private Cloud (VPC).

Key Components:
- Firewall: Controls access to the VPC.
  - Port 80 (HTTP): Allows public users to access the web application.
  - Port 22 (SSH): Reserved for administrative access by the Cloud Engineer.
  - Firewall Logs: Enabled to capture the details of every access attempt.
- GCP VPC: Contains the subnet where the web application is hosted.
  - Subnet IP range: 10.0.0.0/24
  - Server IP: 10.0.0.4
- Logs Explorer (Cloud Logging): Serves as the primary collection point for firewall logs, where entries can be viewed and filtered.
- Log Router: Routes the collected logs to appropriate storage and processing services.
- BigQuery: Provides a platform for log analysis and user monitoring.
- Looker: Used for visualization and reporting based on BigQuery datasets.
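
The addressing in the component list can be sanity-checked with Python's standard ipaddress module. The following is a minimal sketch, assuming only the server address 10.0.0.4 and the /24 prefix length given above; the helper name subnet_of is introduced here for illustration and is not part of the pipeline.

```python
import ipaddress

def subnet_of(server_ip, prefix_len):
    """Return the network that contains server_ip for the given prefix length."""
    # ip_interface accepts a host address with a prefix length;
    # .network is the enclosing network with the host bits cleared.
    return ipaddress.ip_interface(f"{server_ip}/{prefix_len}").network

subnet = subnet_of("10.0.0.4", 24)
server = ipaddress.ip_address("10.0.0.4")

print(subnet)            # 10.0.0.0/24
print(server in subnet)  # True
```

The /24 network containing the server is 10.0.0.0/24, so the server address falls inside the subnet as expected.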
Pipeline Workflow
1. User Access Through Firewall
- Public Users connect to the web application via Port 80.
- Administrators (Cloud Engineers) connect via Port 22.
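
The two access paths above can be modeled as a small allow-list with a default deny. This is only an illustrative sketch in plain Python, not the actual firewall configuration: the rule names are hypothetical, and ADMIN_RANGE is a placeholder taken from the 203.0.113.0/24 documentation range, on the assumption that SSH access is restricted to a known administrator source range.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class IngressRule:
    name: str
    port: int
    source_range: str  # CIDR block allowed to connect on this port

# Placeholder administrator range (assumption, not from the pipeline).
ADMIN_RANGE = "203.0.113.0/24"

# Hypothetical rules mirroring the two access paths described above.
RULES = [
    IngressRule("allow-http-public", 80, "0.0.0.0/0"),
    IngressRule("allow-ssh-admin", 22, ADMIN_RANGE),
]

def evaluate(src_ip, dest_port):
    """Return 'ALLOWED' if any rule matches, otherwise 'DENIED'."""
    addr = ipaddress.ip_address(src_ip)
    for rule in RULES:
        if rule.port == dest_port and addr in ipaddress.ip_network(rule.source_range):
            return "ALLOWED"
    return "DENIED"  # nothing matched: ingress is denied by default

print(evaluate("198.51.100.7", 80))  # ALLOWED
print(evaluate("198.51.100.7", 22))  # DENIED
print(evaluate("203.0.113.10", 22))  # ALLOWED
```

Any connection that matches neither rule is denied, which is the behavior the pipeline relies on when it later distinguishes public traffic from administrative traffic.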
2. Logging User Activity
- All traffic passing through the firewall is logged.
- Logs are forwarded to Logs Explorer for collection and categorization.
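
As a rough illustration of the categorization step, the sketch below groups firewall log entries by destination port and outcome. The sample entries are hypothetical and trimmed to a few fields; the field names (jsonPayload, connection, dest_port, disposition) are assumed to follow the layout of GCP firewall log entries and should be checked against a real entry before use.

```python
from collections import Counter

# Hypothetical sample entries; real entries carry many more fields.
SAMPLE_ENTRIES = [
    {"jsonPayload": {"connection": {"src_ip": "198.51.100.7", "dest_ip": "10.0.0.4", "dest_port": 80},
                     "disposition": "ALLOWED"}},
    {"jsonPayload": {"connection": {"src_ip": "198.51.100.23", "dest_ip": "10.0.0.4", "dest_port": 80},
                     "disposition": "ALLOWED"}},
    {"jsonPayload": {"connection": {"src_ip": "203.0.113.10", "dest_ip": "10.0.0.4", "dest_port": 22},
                     "disposition": "ALLOWED"}},
    {"jsonPayload": {"connection": {"src_ip": "198.51.100.7", "dest_ip": "10.0.0.4", "dest_port": 22},
                     "disposition": "DENIED"}},
]

# Port-to-role mapping taken from the workflow: 80 is public, 22 is administrative.
ROLE_BY_PORT = {80: "public", 22: "admin"}

def categorize(entries):
    """Count entries per (role, disposition) pair."""
    counts = Counter()
    for entry in entries:
        payload = entry.get("jsonPayload", {})
        port = payload.get("connection", {}).get("dest_port")
        role = ROLE_BY_PORT.get(port, "other")
        counts[(role, payload.get("disposition", "UNKNOWN"))] += 1
    return counts

for (role, disposition), n in sorted(categorize(SAMPLE_ENTRIES).items()):
    print(role, disposition, n)
```

A denied connection on port 22 from a non-administrator address is the kind of event this grouping is meant to surface for later analysis.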