TG vs ASG: AWS Scaling Secrets for SDETs

Why Scaling Breaks Your Tests (And How TG/ASG Fix It)

Your Selenium Grid just went down during peak load. Your Playwright CI/CD pipeline failed because the EC2 instances couldn’t handle parallel test execution. LoadRunner crashed because no instances were available for performance testing.

This isn’t bad luck. It’s scaling failure.

AWS Target Groups (TG) and Auto Scaling Groups (ASG) solve these exact problems. A TG tells the load balancer which healthy instances may receive traffic. An ASG keeps enough compute capacity running.

Ignoring AWS scaling can cost SDETs as much as 40% test flakiness, 3x longer CI/CD cycles, and ₹5L+ in annual infra waste.

Let’s fix this.

Target Groups (TG): Traffic Intelligence

Target Groups are the traffic cops of AWS. They decide which EC2 instances receive your test traffic.

How TG Works for Testing

Application Load Balancer (ALB)
        ↓
Target Group (TG) ← Health Checks
        ↓
EC2 Instances (Selenium Grid Nodes)

Real scenario: Your Selenium Grid has 10 Chrome nodes. ALB → TG → traffic reaches only the healthy nodes.

# Target Group Configuration
TargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  Properties:
    VpcId: !Ref VpcId # assumed template parameter; required for instance targets
    HealthCheckIntervalSeconds: 30
    HealthCheckPath: /wd/hub/status # Selenium health
    HealthCheckPort: 4444
    TargetType: instance
    Port: 4444
    Protocol: HTTP
    Matcher:
      HttpCode: '200-399' # Selenium OK responses

What breaks without TG:

  • Unhealthy Selenium nodes still receive tests → every test routed to them fails
  • No load distribution → single node overload
  • Manual routing → ops nightmare
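
How quickly a dead node stops receiving tests depends on the health check thresholds, which the configuration above leaves at their defaults. Here is a minimal sketch with explicit thresholds. The resource name TargetGroupChrome is borrowed from the ASG template in this post, VpcId is a hypothetical template parameter, and the threshold values are illustrative, not recommendations:

```yaml
Resources:
  TargetGroupChrome:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      VpcId: !Ref VpcId                 # hypothetical parameter
      TargetType: instance
      Protocol: HTTP
      Port: 4444
      HealthCheckPath: /wd/hub/status
      HealthCheckPort: '4444'
      HealthCheckIntervalSeconds: 30    # probe each node every 30 s
      HealthCheckTimeoutSeconds: 5      # a probe fails if there is no reply within 5 s
      HealthyThresholdCount: 2          # 2 consecutive passes -> healthy
      UnhealthyThresholdCount: 3        # 3 consecutive failures -> unhealthy
      Matcher:
        HttpCode: '200'
```

With these values, a node that stops answering drops out of rotation after roughly 3 × 30 = 90 seconds. Lower the interval or the unhealthy threshold if that is too slow for your suite.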

Auto Scaling Groups (ASG): Capacity on Demand

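ASG automatically adds/removes EC2 instances based on metrics such as CPU utilization. EC2 does not report memory usage by default, so memory-based scaling needs a custom CloudWatch metric (for example, one published by the CloudWatch agent).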

ASG Triggers for Test Infra

CPU > 70% → Add 2 Chrome nodes
Memory > 80% → Scale Playwright runners
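Selenium queue > 10 → Launch Firefox Grid

# ASG for Selenium Grid
SeleniumASG:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: '2'
    MaxSize: '10'
    DesiredCapacity: '4'
    HealthCheckGracePeriod: 300
    HealthCheckType: ELB
    VPCZoneIdentifier: !Ref SubnetIds # assumed parameter: list of subnet IDs
    LaunchTemplate:
      LaunchTemplateId: !Ref LaunchTemplate
      Version: !GetAtt LaunchTemplate.LatestVersionNumber
    TargetGroupARNs:
      - !Ref TargetGroupChrome
      - !Ref TargetGroupFirefox

# Scaling policies are separate resources attached to the ASG
SeleniumCpuPolicy:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AutoScalingGroupName: !Ref SeleniumASG
    PolicyType: TargetTrackingScaling
    TargetTrackingConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: ASGAverageCPUUtilization
      TargetValue: 70.0 # CPU threshold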
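
The "CPU > 70% → Add 2 Chrome nodes" trigger is a fixed-step rule, which target tracking does not express directly. One way to sketch it is a step scaling policy driven by a CloudWatch alarm. This assumes the SeleniumASG resource from the template above; the resource names AddTwoNodesPolicy and HighCpuAlarm and the 5-minute period are illustrative choices, not part of the original setup:

```yaml
Resources:
  AddTwoNodesPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AutoScalingGroupName: !Ref SeleniumASG
      PolicyType: StepScaling
      AdjustmentType: ChangeInCapacity
      StepAdjustments:
        - MetricIntervalLowerBound: 0   # any breach above the alarm threshold
          ScalingAdjustment: 2          # add 2 instances

  HighCpuAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      Namespace: AWS/EC2
      MetricName: CPUUtilization
      Dimensions:
        - Name: AutoScalingGroupName
          Value: !Ref SeleniumASG
      Statistic: Average
      Period: 300                       # basic EC2 monitoring reports every 5 minutes
      EvaluationPeriods: 1
      Threshold: 70
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AddTwoNodesPolicy        # Ref on a scaling policy yields its ARN
```

Treat this as an alternative to the target tracking policy rather than an addition: target tracking usually needs less tuning, while step scaling gives you the exact "add 2" behaviour.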

SDET Career Impact:

Manual scaling → ₹25L QA Engineer
ASG mastery → ₹40L+ Test Architect

TG + ASG: The Perfect Test Infra Stack

Component | Without TG/ASG | With TG/ASG
Test Startup | 15 min manual provisioning | 45 sec auto-scaling
Peak Load | 40% test failures | 99.9% pass rate
Cost | ₹2.5L/month idle | ₹80k dynamic
CI/CD Speed | 2 hr test runs | 20 min parallel

Real-World Example: E-commerce Peak Testing

Black Friday scenario: 5000 parallel Playwright tests hit product pages.

1. ASG detects CPU spike → launches 8 Chrome nodes
2. TG routes tests to healthy nodes only
3. Unhealthy node fails health check → traffic rerouted
4. Tests complete → ASG scales down to 2 nodes

Result: ₹12L saved vs static 10-node cluster.
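
When the peak is known in advance, you do not have to wait for the CPU spike in step 1. Here is a hedged sketch of scheduled scaling that raises capacity before the run and drops it afterwards. It assumes the SeleniumASG resource from earlier in the post; the cron times and resource names are hypothetical:

```yaml
Resources:
  PeakScaleOut:
    Type: AWS::AutoScaling::ScheduledAction
    Properties:
      AutoScalingGroupName: !Ref SeleniumASG
      MinSize: 8
      MaxSize: 10
      DesiredCapacity: 8
      Recurrence: '0 6 * * 5'    # hypothetical: 06:00 UTC every Friday

  PeakScaleIn:
    Type: AWS::AutoScaling::ScheduledAction
    Properties:
      AutoScalingGroupName: !Ref SeleniumASG
      MinSize: 2
      MaxSize: 10
      DesiredCapacity: 2
      Recurrence: '0 22 * * 5'   # hypothetical: 22:00 UTC the same day
```

Dynamic scaling policies keep working between the two actions, so an unexpected spike is still handled.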

Common Scaling Mistakes SDETs Make

❌ Mistake 1: Fixed Instance Count

DesiredCapacity: 10  # Always 10 nodes

Problem: Overpay during low load, under-capacity during peaks.

❌ Mistake 2: Wrong Health Checks

HealthCheckPath: /  # Returns 200 always

Problem: Routes tests to dead Selenium nodes.

❌ Mistake 3: No Warmup Period

HealthCheckGracePeriod: 0

Problem: New nodes marked unhealthy during boot.
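
Mistakes 1 and 3 can both be avoided in the ASG definition itself. Here is a minimal sketch, assuming the LaunchTemplate resource from the step-by-step section and a hypothetical SubnetIds parameter. The 300-second grace period comes from the ASG example earlier in the post; the warmup value is an assumption to tune against your real boot time:

```yaml
Resources:
  SeleniumASG:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: '2'                      # floor for quiet periods
      MaxSize: '10'                     # ceiling for peaks
      VPCZoneIdentifier: !Ref SubnetIds # hypothetical parameter
      LaunchTemplate:
        LaunchTemplateId: !Ref LaunchTemplate
        Version: !GetAtt LaunchTemplate.LatestVersionNumber
      HealthCheckType: ELB
      HealthCheckGracePeriod: 300       # ignore failed health checks for 5 min after launch
      DefaultInstanceWarmup: 300        # keep a booting node's metrics out of scaling decisions
```

Leaving DesiredCapacity out lets the scaling policies move the group between the floor and the ceiling.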

Step-by-Step: Production Test Infra

1. Launch Template (EC2 Blueprint)

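LaunchTemplate:
  Type: AWS::EC2::LaunchTemplate
  Properties:
    LaunchTemplateData:
      ImageId: ami-0c02fb55956c7d316 # Ubuntu Selenium Node
      InstanceType: t3.medium
      UserData:
        Fn::Base64: |
          #!/bin/bash
          # Assumes the AMI already has Java, Google's apt repository, and the Selenium jar
          apt-get update && apt-get install -y google-chrome-stable
          java -jar selenium-server-4.27.jar node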

2. Target Groups by Browser

ChromeTG → port 4444 → Chrome nodes
FirefoxTG → port 5555 → Firefox nodes
PlaywrightTG → port 3000 → Headless runners

3. ASG + Scaling Policies

CPU Scaling: 70% threshold
Queue Length: CloudWatch → Lambda → Scale
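
Queue-based scaling does not strictly need the Lambda hop. As a sketch of a swapped-in alternative, target tracking can follow a custom CloudWatch metric directly. The namespace SeleniumGrid and the metric QueuedSessionsPerNode are hypothetical; something on your side (a cron script, a sidecar, or a Lambda) still has to publish that metric:

```yaml
Resources:
  QueueTrackingPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AutoScalingGroupName: !Ref SeleniumASG
      PolicyType: TargetTrackingScaling
      TargetTrackingConfiguration:
        CustomizedMetricSpecification:
          Namespace: SeleniumGrid            # hypothetical custom namespace
          MetricName: QueuedSessionsPerNode  # hypothetical metric: queue length / node count
          Statistic: Average
        TargetValue: 2.0                     # aim for about 2 queued sessions per node
```

A per-node value is used because target tracking expects a metric that falls as capacity is added. A raw queue length does not do that reliably.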

4. ALB Routing

ALB → /wd/hub/chrome → ChromeTG
ALB → /wd/hub/firefox → FirefoxTG
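
In CloudFormation, these routes could be sketched as listener rules. ChromeTG and FirefoxTG are the target groups from step 2; GridListener is a hypothetical AWS::ElasticLoadBalancingV2::Listener resource, and the priorities are arbitrary:

```yaml
Resources:
  ChromeRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    Properties:
      ListenerArn: !Ref GridListener    # hypothetical listener resource
      Priority: 10
      Conditions:
        - Field: path-pattern
          PathPatternConfig:
            Values:
              - /wd/hub/chrome*
      Actions:
        - Type: forward
          TargetGroupArn: !Ref ChromeTG

  FirefoxRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    Properties:
      ListenerArn: !Ref GridListener
      Priority: 20
      Conditions:
        - Field: path-pattern
          PathPatternConfig:
            Values:
              - /wd/hub/firefox*
      Actions:
        - Type: forward
          TargetGroupArn: !Ref FirefoxTG
```

Keep in mind that a plain forward action passes the path through as received, so the Grid behind each group has to answer under that prefix.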

CI/CD Integration (GitLab/Jenkins)

# GitLab CI with ASG
test:
  stage: test
  script:
    # Scale the grid up before the run
    - aws autoscaling set-desired-capacity --auto-scaling-group-name SeleniumGridASG --desired-capacity 8
    # --workers comes from the pytest-parallel plugin
    - pytest --workers 32
  after_script:
    # Scale back down even if the tests fail
    - aws autoscaling set-desired-capacity --auto-scaling-group-name SeleniumGridASG --desired-capacity 2
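
Setting the desired capacity returns immediately, while the new instances take time to boot, so the tests may start against a half-sized grid. A hedged sketch of a wait step follows. The hidden job name .wait_for_grid, the 30 attempts, and the 20-second sleep are assumptions; the group name and target of 8 match the job above:

```yaml
# Hidden job holding a reusable wait step (hypothetical name)
.wait_for_grid:
  script:
    - |
      # Poll up to 30 times (about 10 minutes) until 8 instances are InService
      for attempt in $(seq 1 30); do
        in_service=$(aws autoscaling describe-auto-scaling-groups \
          --auto-scaling-group-names SeleniumGridASG \
          --query "length(AutoScalingGroups[0].Instances[?LifecycleState=='InService'])" \
          --output text)
        if [ "$in_service" -ge 8 ]; then break; fi
        sleep 20
      done
```

The step can be pulled into the test job between the scale-up command and pytest with `- !reference [.wait_for_grid, script]`. InService only means the instance is running, so the target group health check still decides when a node actually receives tests.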

Test velocity: 4x faster, 60% cheaper.

Cost Optimization Hacks

Strategy | Savings | Implementation
Spot Instances | 70% | ASG Spot fleet
Graceful Drain | 20% | Connection draining
Predictive Scaling | 30% | ML-based capacity
Reserved Warm Pool | 40% | Pre-warmed nodes
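
For the Spot row, one possible shape is an ASG with a mixed instances policy. This is a sketch under assumptions: SeleniumSpotASG and SubnetIds are hypothetical names, the On-Demand base of 2 mirrors the MinSize used earlier, and t3a.medium is just an example of a second instance type to widen the Spot pool:

```yaml
Resources:
  SeleniumSpotASG:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: '2'
      MaxSize: '10'
      VPCZoneIdentifier: !Ref SubnetIds            # hypothetical parameter
      MixedInstancesPolicy:
        InstancesDistribution:
          OnDemandBaseCapacity: 2                  # first 2 nodes stay On-Demand
          OnDemandPercentageAboveBaseCapacity: 0   # everything above the base is Spot
          SpotAllocationStrategy: price-capacity-optimized
        LaunchTemplate:
          LaunchTemplateSpecification:
            LaunchTemplateId: !Ref LaunchTemplate
            Version: !GetAtt LaunchTemplate.LatestVersionNumber
          Overrides:
            - InstanceType: t3.medium
            - InstanceType: t3a.medium
      TargetGroupARNs:
        - !Ref TargetGroupChrome
```

The On-Demand base keeps a minimal grid alive if Spot capacity is reclaimed in the middle of a run.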

QABash Pro Tip: Spot + Reserved Warm Pool = 85% EC2 savings without test flakiness.
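
A warm pool is declared as its own resource next to the ASG. Here is a minimal sketch, assuming the SeleniumASG resource from earlier; the sizes are illustrative:

```yaml
Resources:
  SeleniumWarmPool:
    Type: AWS::AutoScaling::WarmPool
    Properties:
      AutoScalingGroupName: !Ref SeleniumASG
      MinSize: 2                     # keep at least 2 pre-initialized instances
      PoolState: Stopped             # stopped instances cost storage, not compute
      MaxGroupPreparedCapacity: 10   # cap on warm plus in-service instances
```

AWS has documented restrictions on combining warm pools with Spot or mixed instances policies, so check the current EC2 Auto Scaling documentation before putting both on the same group.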

When NOT to Use TG/ASG

❌ < 50 parallel tests → Local Grid
❌ Simple API tests → Local pytest
❌ Proof-of-concept → Docker Compose
✅ 100+ parallel tests → TG/ASG
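
For the proof-of-concept case, a Docker Compose grid can be sketched in a few lines. The service names and the latest tags are assumptions (pin real versions in practice); the images and environment variables follow the docker-selenium project's conventions:

```yaml
# docker-compose.yml
services:
  selenium-hub:
    image: selenium/hub:latest
    ports:
      - "4442:4442"
      - "4443:4443"
      - "4444:4444"

  chrome:
    image: selenium/node-chrome:latest
    shm_size: 2gb                    # Chrome needs more shared memory than the default
    depends_on:
      - selenium-hub
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
```

Running `docker compose up --scale chrome=4` gives you four Chrome nodes on one machine, which is usually enough to validate a suite before paying for TG/ASG.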

Career Impact: From QA to Test Architect

TG/ASG mastery = SDET → Architect promotion path:

Level 1: Manual EC2 → ₹20L
Level 2: Static Grid → ₹28L
Level 3: TG/ASG → ₹40L+

Interview Question: “How would you handle 10,000 parallel Selenium tests?”
Winning Answer: “Dynamic ASG with browser-specific TGs, spot instances, predictive scaling.”

The Future: Serverless + Containers

ECS Fargate → No EC2 management
EKS → Kubernetes autoscaling
Lambda → API testing at scale

Migration path: ASG → ECS → EKS (12-month roadmap)

FAQs

Q: Do I need TG/ASG for 20 parallel tests?
A: No. Local Selenium Grid or Docker Compose works fine. TG/ASG shines at 100+ tests.

Q: How do health checks work for Playwright?
A: Custom endpoint /health returning Playwright browser status + queue length.
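
On the AWS side, that endpoint only needs to be wired into the target group. A sketch, assuming the PlaywrightTG on port 3000 from the step-by-step section and a hypothetical VpcId parameter:

```yaml
Resources:
  PlaywrightTG:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      VpcId: !Ref VpcId              # hypothetical parameter
      TargetType: instance
      Protocol: HTTP
      Port: 3000
      HealthCheckPath: /health       # custom endpoint; the runner has to implement it
      HealthCheckIntervalSeconds: 30
      Matcher:
        HttpCode: '200'
```

The load balancer looks only at the status code, not the response body, so /health should return a non-200 code when browsers are unavailable or the queue is saturated.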

Q: Spot instances reliable for CI/CD?
A: Yes with warm pools. 99.7% uptime vs 100% On-Demand.

Q: Can ASG handle mobile testing?
A: Perfect for Appium Grid. Separate TG for Android/iOS farms.

Q: Cost difference ECS vs ASG?
A: ECS 20% more expensive but zero ops. ASG cheaper with expertise.

Q: Migrate static Grid to ASG?
A: 2-day migration: Launch Template → ASG → ALB → DNS switch.

Q: Multi-region testing support?
A: Global Accelerator → Regional TGs → Cross-region ASG sync.

Q: GitLab CI integration complexity?
A: 15 lines YAML. ASG handles capacity, GitLab runs tests.

Ishan Dev Shukl
With 13+ years in SDET leadership, I drive quality and innovation through Test Strategies and Automation. I lead Testing Center of Excellence, ensuring high-quality products across Frontend, Backend, and App Testing. "Quality is in the details" defines my approach—creating seamless, impactful user experiences. I embrace challenges, learn from failure, and take risks to drive success.
