## Overview / About
ReinforceWall is a Reinforcement Learning-based network defense system that trains an intelligent agent to detect and respond to cyberattacks in real time. Instead of relying on static rules, the system uses a Deep Q-Network (DQN) to learn defensive strategies, deciding whether to block, alert, log, or ignore each incoming network request based on a 20-dimensional behavioral feature vector.
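The core mapping described above (20-dimensional feature vector in, one of four defensive actions out) can be sketched as a small PyTorch Q-network. This is an illustrative sketch, not the project's exact architecture; the layer sizes and class name are assumptions.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Hypothetical Q-network: 20 behavioral features -> 4 action values."""

    def __init__(self, state_dim: int = 20, n_actions: int = 4, hidden: int = 128):
        super().__init__()
        # Hidden width of 128 is an illustrative assumption.
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

q = QNetwork()
state = torch.randn(1, 20)            # one 20-dimensional feature vector
q_values = q(state)                   # shape (1, 4): one Q-value per action
action = int(q_values.argmax(dim=1))  # greedy choice: Block/Alert/Log/Ignore index
```

At inference time the agent simply takes the argmax over the four Q-values; exploration only matters during training.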
## Problem
Traditional intrusion detection systems depend on hand-crafted rules and known attack signatures. They struggle with novel threats, produce high false-positive rates, and require constant manual tuning. As attackers evolve their strategies, static defenses fall behind.
## Solution
Train an RL agent that learns from experience, observing patterns in network traffic and adapting its defense strategy over thousands of simulated episodes. The agent receives rewards for correctly identifying threats and penalties for false alarms, gradually converging on a balanced and effective security policy.
## Key Features
| Feature | Description |
|---|---|
| 10 Attack Types | SQL Injection, XSS, Brute Force, DDoS, Command Injection, Path Traversal, Port Scanning, CSRF, MITM, Phishing |
| Deep Q-Network | PyTorch-based DQN agent with experience replay, epsilon-greedy exploration, and target network updates |
| Custom RL Environment | Gymnasium-compatible environment with a 20-dimensional state space and a discrete four-action defensive action space |
| Curriculum Learning | Progressive difficulty levels that gradually increase attack complexity during training |
| Real-time Dashboard | Flask + WebSocket dashboard for live training monitoring, metrics visualization, and model management |
| Firewall Integration | Supports both simulation mode and real iptables integration for production deployment |
| Baseline Comparison | Rule-based attack detector included as a performance baseline |
## Tech Stack
| Layer | Technologies |
|---|---|
| Core AI/ML | Python, PyTorch, Gymnasium, NumPy |
| Environment | Custom Gym environment, attack traffic simulator, 20D state feature extraction |
| Training | DQN with experience replay, target networks, curriculum learning, epsilon decay |
| Dashboard | Flask, Flask-SocketIO, WebSocket, HTML/CSS/JS, Chart.js |
| Infrastructure | iptables integration (optional), structured logging, metrics tracking (CSV/JSON) |
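The epsilon decay listed in the Training row can be sketched as a small epsilon-greedy policy wrapper. The start/end/decay constants are illustrative assumptions, not the project's tuned values.

```python
import random

class EpsilonGreedy:
    """Sketch of epsilon-greedy selection with multiplicative decay per step."""

    def __init__(self, eps_start=1.0, eps_end=0.05, decay=0.995, n_actions=4):
        self.eps = eps_start
        self.eps_end = eps_end
        self.decay = decay
        self.n_actions = n_actions

    def select(self, q_values):
        """q_values: per-action Q estimates (e.g. from the DQN)."""
        if random.random() < self.eps:
            action = random.randrange(self.n_actions)  # explore: random action
        else:
            # Exploit: pick the action with the highest Q estimate.
            action = max(range(self.n_actions), key=lambda a: q_values[a])
        self.eps = max(self.eps_end, self.eps * self.decay)  # decay toward floor
        return action
```

Starting near 1.0 and decaying toward a small floor keeps early training exploratory while letting the learned policy dominate later episodes.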
## How It Works
1. Traffic Simulation: the AttackSimulator generates realistic network requests, mixing normal traffic with the 10 attack types at configurable probabilities
2. State Extraction: a StateExtractor converts each raw request into a 20-dimensional feature vector capturing behavioral patterns (request rate, payload entropy, suspicious headers, etc.)
3. Agent Decision: the DQN agent observes the state and selects a defensive action: Block, Alert, Log, or Ignore
4. Reward Signal: the environment scores the choice: +8 for correctly blocking an attack, −10 for ignoring an attack, −2 for blocking legitimate traffic
5. Learning: over thousands of episodes of epsilon-greedy exploration, experience replay, and periodic target network updates, the agent converges on an effective defense policy
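The experience replay mentioned in the learning step can be sketched as a bounded buffer of transitions sampled uniformly at random, which breaks the correlation between consecutive network requests. Capacity and batch sizes here are illustrative assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Sketch: store (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity: int = 10_000):
        # deque with maxlen evicts the oldest transition once full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform random sampling decorrelates consecutive requests.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Each training step would sample a batch, compute TD targets against the target network, and update the online DQN.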
## Results / Outcomes
- Successfully trains to detect and respond to all 10 attack categories
- Agent learns to balance security and availability, minimizing both false negatives (missed attacks) and false positives (blocked legitimate traffic)
- Curriculum learning enables tackling progressively harder attack mixes
- Real-time dashboard provides full visibility into training progress and model performance
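The curriculum mentioned above could be implemented as a simple episode-indexed schedule that raises the attack probability and widens the attack mix as training progresses. The thresholds and settings below are hypothetical, chosen only to show the shape of such a schedule.

```python
def curriculum_level(episode: int) -> dict:
    """Return hypothetical simulator settings for a given training episode."""
    if episode < 500:
        # Easy: sparse attacks, few attack types.
        return {"attack_prob": 0.10, "attack_types": 3}
    if episode < 2000:
        # Medium: more frequent attacks, broader mix.
        return {"attack_prob": 0.25, "attack_types": 6}
    # Hard: full mix of all 10 attack categories.
    return {"attack_prob": 0.40, "attack_types": 10}
```

The simulator would consult this schedule at the start of each episode, so the agent masters simple traffic before facing the full attack mix.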
## My Role
- Designed the complete RL pipeline: environment, state representation, reward structure, and agent architecture
- Implemented 10 realistic attack traffic generators with configurable patterns
- Built the DQN agent in PyTorch with experience replay and curriculum learning
- Created a real-time Flask + WebSocket dashboard for live training monitoring
- Developed a comprehensive metrics tracking and evaluation system
