FortiGate High Availability (HA) Deployment Models

Network downtime is not an option in today’s business environment. A single point of failure in your firewall infrastructure can bring operations to a halt, affecting revenue, productivity, and customer trust. This is where FortiGate High Availability (HA) deployment models become essential. By implementing redundancy at the firewall level, you ensure continuous network protection and seamless connectivity even when hardware failures occur.

Understanding the nuances between Active-Passive and Active-Active configurations is crucial for making informed decisions about your network architecture. Each model offers distinct advantages and trade-offs that can significantly impact your organisation’s security posture, performance, and operational costs. This comprehensive guide explores both deployment models, examines their practical applications, and highlights critical pitfalls to avoid during implementation and testing.

Whether you are planning a new deployment or evaluating your existing HA setup, this analysis will help you choose the right model for your specific requirements and avoid common mistakes that could compromise your network’s resilience.

Active-Passive HA Configuration

Active-Passive HA represents the most straightforward approach to FortiGate redundancy. In this configuration, one unit serves as the primary (active) firewall handling all traffic, whilst the secondary (passive) unit remains on standby, ready to assume control if the primary unit fails.

How Active-Passive Works

The primary unit processes all network traffic, maintains the routing table, and enforces security policies. Meanwhile, the passive unit continuously synchronises with the primary, receiving configuration updates and routing information and, when session pickup is enabled, session tables. This synchronisation lets the standby unit take over without manual reconfiguration. Established connections survive the failover only for the session types that session pickup covers.

When a failure occurs on the primary unit, the passive unit detects it through missed heartbeats and takes over as primary. The new primary then sends gratuitous ARP announcements so that adjacent switches update their forwarding tables and direct traffic to it. The failover process typically completes within seconds, depending on your configuration and network topology.
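
The relationship between heartbeat settings and detection time can be sketched in a few lines of Python. This is an illustrative model, not FortiOS code: the `HeartbeatTiming` class, its field names, and the sample values (a 200 ms interval and a threshold of 6) are assumptions chosen for the arithmetic, so check the actual settings on your own cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatTiming:
    """Hypothetical container for heartbeat settings; values are illustrative."""
    interval_ms: int      # time between heartbeat packets
    lost_threshold: int   # consecutive missed heartbeats before the peer is declared failed

    def detection_time_ms(self) -> int:
        """Approximate time to declare the peer failed, ignoring processing delay."""
        return self.interval_ms * self.lost_threshold


def peer_failed(missed_heartbeats: int, timing: HeartbeatTiming) -> bool:
    """The standby treats the primary as failed once the threshold is reached."""
    return missed_heartbeats >= timing.lost_threshold


timing = HeartbeatTiming(interval_ms=200, lost_threshold=6)
print(timing.detection_time_ms())  # 1200
print(peer_failed(5, timing))      # False
print(peer_failed(6, timing))      # True
```

Detection time is only one part of the total failover time. The gratuitous ARP updates and any routing convergence that follow add to it.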

Advantages of Active-Passive

Simplicity and Reliability: Active-Passive configurations are easier to implement and troubleshoot. With only one unit actively processing traffic, there are fewer variables to consider when diagnosing network issues or planning capacity.
Predictable Performance: Since all traffic flows through a single unit, performance characteristics remain consistent. You can accurately predict throughput, latency, and resource utilisation without worrying about load distribution complexities.
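Lower Operational Cost: A simpler design means less time spent on configuration, monitoring, and troubleshooting. Do not assume a licensing saving, however: FortiGate clusters generally require matching FortiGuard and feature licences on every member regardless of HA mode, so confirm the terms with Fortinet or your reseller before budgeting.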
Guaranteed Session Persistence: All sessions are maintained on the active unit, eliminating any concerns about session distribution or asymmetric routing that can complicate Active-Active setups.

Disadvantages and Limitations

Underutilised Resources: The passive unit essentially sits idle during normal operations, representing a significant capital investment that provides no performance benefit until a failure occurs.
Limited Scalability: Your maximum throughput is constrained by the capacity of a single unit. As network demands grow, you cannot leverage the combined processing power of both units.
Single Point of Performance: If the active unit becomes overwhelmed, the passive unit cannot assist with load distribution. Performance bottlenecks must be addressed through hardware upgrades rather than load sharing.

Ideal Use Cases

Active-Passive configurations work best for organisations with:

Predictable traffic patterns that do not exceed single-unit capacity
Limited IT resources for managing complex load-balancing scenarios
Strict budget constraints that favour a simpler design with lower operational overhead
Applications that do not benefit from or cannot handle load distribution
Branch offices or smaller deployments where simplicity is paramount

Active-Active HA Configuration

Active-Active HA maximises resource utilisation by distributing traffic across both FortiGate units simultaneously. This approach provides both redundancy and enhanced performance, making it attractive for high-throughput environments.

Load Distribution Methods

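FortiGate Active-Active deployments typically employ session-based load balancing. In an FGCP cluster, the primary unit receives incoming traffic and distributes new sessions across the cluster members according to a configurable schedule, such as round-robin, least connections, or source IP hashing. Each unit maintains its own session table for the connections it handles. Which session types are load-balanced depends on the configuration, so check the defaults for your FortiOS version.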
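
To make the three distribution algorithms concrete, here is a minimal Python sketch of each. It models the selection logic only and is not how FortiOS implements its schedules. The unit names (`fgt-a`, `fgt-b`), the function names, and the choice of SHA-256 for hashing are all assumptions for illustration.

```python
import hashlib
from itertools import count


def round_robin(units: list[str]):
    """Return a picker that cycles through the units in order."""
    counter = count()

    def pick(_src_ip: str) -> str:
        return units[next(counter) % len(units)]

    return pick


def least_connections(active: dict[str, int]):
    """Return a picker that chooses the unit with the fewest active sessions.

    The dict is updated in place; a real balancer would also decrement
    the count when a session closes.
    """
    def pick(_src_ip: str) -> str:
        unit = min(active, key=active.get)
        active[unit] += 1
        return unit

    return pick


def source_ip_hash(units: list[str]):
    """Return a picker that always maps the same source IP to the same unit."""
    def pick(src_ip: str) -> str:
        digest = hashlib.sha256(src_ip.encode()).digest()
        return units[int.from_bytes(digest[:4], "big") % len(units)]

    return pick


units = ["fgt-a", "fgt-b"]
rr = round_robin(units)
print([rr("10.0.0.1") for _ in range(3)])  # ['fgt-a', 'fgt-b', 'fgt-a']

by_hash = source_ip_hash(units)
print(by_hash("10.0.0.1") == by_hash("10.0.0.1"))  # True
```

Source IP hashing keeps all sessions from one client on one unit, which helps applications that expect that affinity. Round-robin and least connections spread load more evenly.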

Session Pickup vs New Sessions: One critical distinction in Active-Active operation is how existing sessions are handled during a failure. While new sessions can be distributed to the remaining active unit, established sessions on the failed unit may be lost unless session synchronisation is enabled and properly configured.

Advantages of Active-Active

Maximum Resource Utilisation: Both units actively process traffic, increasing aggregate processing capacity compared to a single unit, although the gain is usually less than double because of load-balancing and synchronisation overhead. This makes Active-Active attractive for bandwidth-intensive environments.
Enhanced Performance: The combined throughput of both units can handle higher traffic volumes and provides better response times during peak usage periods.
Graceful Degradation: When one unit fails, the remaining unit continues operating, albeit at reduced capacity. This provides a more gradual performance decline rather than a complete failover event.
Future-Proofing: As your network grows, the Active-Active configuration can accommodate increased traffic without immediate hardware upgrades.
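
Graceful degradation only holds if the surviving unit can carry the full peak load on its own. The sketch below works through that arithmetic. The function names, the throughput figures, and the 80% utilisation ceiling are assumptions for illustration; substitute measured values from your own environment.

```python
def load_after_failure(peak_gbps: float, unit_capacity_gbps: float) -> float:
    """Fraction of one unit's capacity used if it carries all peak traffic alone."""
    if unit_capacity_gbps <= 0:
        raise ValueError("unit capacity must be positive")
    return peak_gbps / unit_capacity_gbps


def survives_single_failure(
    peak_gbps: float, unit_capacity_gbps: float, ceiling: float = 0.8
) -> bool:
    """True if the surviving unit stays at or below the utilisation ceiling."""
    return load_after_failure(peak_gbps, unit_capacity_gbps) <= ceiling


# 6 Gbps peak on 10 Gbps units: the survivor runs at 60%.
print(survives_single_failure(6.0, 10.0))   # True
# 12 Gbps peak on 10 Gbps units: the survivor would need 120%.
print(survives_single_failure(12.0, 10.0))  # False
```

In the second case the cluster handles peak traffic while both units are healthy, but a single failure leaves the remaining unit overloaded. The cluster then adds capacity without providing full redundancy.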

Disadvantages and Considerations

Increased Complexity: Managing traffic distribution, session synchronisation, and asymmetric routing scenarios requires more sophisticated configuration and monitoring.
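Higher Operational Costs: Licensing is typically the same as for Active-Passive, since cluster members generally need matching licences in either mode, but the additional design, monitoring, and troubleshooting effort raises the total cost of ownership.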
Potential Asymmetric Routing: Traffic from a connection might enter through one unit and return through another, which can complicate stateful inspection and may require additional configuration considerations.
Synchronisation Overhead: Maintaining session synchronisation between units consumes bandwidth and processing resources, potentially impacting overall performance.

Ideal Use Cases

Active-Active configurations are most suitable for:

High-throughput environments where single-unit capacity is insufficient
Organisations with budget flexibility for the additional design and operational effort
Networks with experienced administrators capable of managing complex configurations
Applications that can tolerate potential session interruptions during failover
Data centres or large campus networks with substantial traffic volumes

Critical Synchronisation Pitfalls

Successful HA deployment depends heavily on proper synchronisation between the FortiGate units. Several common pitfalls can compromise the effectiveness of your HA setup, potentially leading to service interruptions or data loss during failover events.

Configuration Drift

One of the most insidious problems in HA deployments is configuration drift, where the primary and secondary units gradually develop configuration differences. This typically occurs when a change is made on one unit and synchronisation to the other fails, is interrupted, or is never verified.

Root Causes:

Manual configuration changes made outside the HA management interface
Failed synchronisation due to network connectivity issues
Interrupted synchronisation processes during large configuration updates
Third-party management tools that do not properly handle HA synchronisation

Prevention Strategies:

Always verify synchronisation status after making configuration changes
Implement regular configuration audits to detect drift
Use automated tools to compare configurations between HA members
Establish change management procedures that mandate synchronisation verification
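
An automated comparison of two exported configurations can be sketched with Python's standard `difflib` module. This is a minimal illustration under stated assumptions: the configurations are already available as text, and the `UNIT_SPECIFIC_PREFIXES` tuple (settings expected to differ between members) is a hypothetical starting list that you would adjust for your environment.

```python
import difflib

# Settings that legitimately differ per unit. This list is an assumption
# for illustration and should be adapted to your own cluster.
UNIT_SPECIFIC_PREFIXES = ("set hostname", "set priority")


def normalise(config_text: str) -> list[str]:
    """Strip whitespace and drop blank or unit-specific lines."""
    lines = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(UNIT_SPECIFIC_PREFIXES):
            continue
        lines.append(stripped)
    return lines


def config_drift(primary: str, secondary: str) -> list[str]:
    """Return lines that differ, prefixed '- ' (primary) or '+ ' (secondary)."""
    a, b = normalise(primary), normalise(secondary)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    drift = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        drift += [f"- {line}" for line in a[i1:i2]]
        drift += [f"+ {line}" for line in b[j1:j2]]
    return drift


primary_cfg = "set hostname FGT-A\nset timezone 26\nset admintimeout 30"
secondary_cfg = "set hostname FGT-B\nset timezone 26\nset admintimeout 5"
print(config_drift(primary_cfg, secondary_cfg))
# ['- set admintimeout 30', '+ set admintimeout 5']
```

An empty result means no drift outside the excluded settings. Anything else is worth investigating before the next failover test.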

Session Table Synchronisation Issues

Session table synchronisation (session pickup) is crucial for maintaining established connections during failover, in Active-Active and Active-Passive clusters alike. However, several factors can disrupt this process, leading to dropped connections and service interruptions.

Common Session Sync Problems:

Insufficient bandwidth allocated for synchronisation traffic
Network latency affecting synchronisation timing
High connection establishment rates overwhelming the sync process
Mismatched firmware versions between HA members

Best Practices:

Dedicate sufficient bandwidth for HA synchronisation links
Monitor synchronisation lag and adjust parameters accordingly
Implement QoS policies to prioritise synchronisation traffic
Ensure firmware versions are identical across all HA members
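
A rough estimate of the bandwidth that session synchronisation needs helps when sizing HA links. The sketch below is planning arithmetic only. The size of a sync update (`bytes_per_update`) and the number of updates per session are placeholder assumptions, not published FortiOS figures, so measure real sync traffic on your cluster before relying on the result.

```python
def sync_bandwidth_mbps(
    new_sessions_per_sec: int,
    bytes_per_update: int,
    updates_per_session: int = 1,
) -> float:
    """Estimated sync traffic in Mbit/s for a given session setup rate."""
    bits_per_sec = new_sessions_per_sec * updates_per_session * bytes_per_update * 8
    return bits_per_sec / 1_000_000


def link_utilisation(required_mbps: float, link_mbps: float) -> float:
    """Fraction of the HA link consumed by the estimated sync traffic."""
    return required_mbps / link_mbps


# 20,000 new sessions per second, assuming 200 bytes per sync update.
required = sync_bandwidth_mbps(20_000, 200)
print(required)                           # 32.0
print(link_utilisation(required, 1_000))  # 0.032
```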

Heartbeat and Health Monitoring Failures

The HA heartbeat mechanism is the foundation of failure detection and failover triggering. Improperly configured or unreliable heartbeat monitoring can result in false failovers, split-brain scenarios in which both units act as primary at the same time, or failure to detect actual problems.

Common Heartbeat Issues:

Single point of failure in heartbeat communication paths
Inappropriate heartbeat timeout values
Network congestion affecting heartbeat reliability
Physical layer problems on dedicated HA links

Mitigation Approaches:

Configure multiple heartbeat paths for redundancy
Tune heartbeat intervals and timeouts for your network conditions
Use dedicated physical connections for HA communication
Implement comprehensive monitoring of HA link health
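
The value of redundant heartbeat paths can be shown with a small model: the peer is treated as failed only when every path has gone quiet, while the loss of a single path raises a warning. The `PeerState` enum, the `evaluate_peer` function, and the interface names are hypothetical. This sketch illustrates the decision logic and is not FortiOS behaviour.

```python
from enum import Enum


class PeerState(Enum):
    HEALTHY = "healthy"    # heartbeats arriving on every path
    DEGRADED = "degraded"  # at least one path lost, at least one still alive
    FAILED = "failed"      # heartbeats lost on every path


def evaluate_peer(missed_by_path: dict[str, int], lost_threshold: int) -> PeerState:
    """Classify the peer from the missed-heartbeat count on each path."""
    if not missed_by_path:
        raise ValueError("at least one heartbeat path is required")
    down = [path for path, missed in missed_by_path.items() if missed >= lost_threshold]
    if len(down) == len(missed_by_path):
        return PeerState.FAILED
    if down:
        return PeerState.DEGRADED
    return PeerState.HEALTHY


# One heartbeat path: a single cable fault looks like a peer failure.
print(evaluate_peer({"ha1": 6}, lost_threshold=6))            # PeerState.FAILED
# Two paths: the same fault is only a warning.
print(evaluate_peer({"ha1": 6, "ha2": 0}, lost_threshold=6))  # PeerState.DEGRADED
```

With a single path, a cable fault is indistinguishable from a dead peer, which is how split-brain begins. A second path turns the same fault into a condition to alert on.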

Failover Testing Pitfalls

Testing is crucial for validating HA functionality, yet many organisations fall into common traps that provide false confidence or fail to identify critical issues.

Inadequate Test Scenarios

Many administrators limit their testing to simple power-off scenarios, which do not adequately represent real-world failure conditions. Comprehensive testing should cover multiple failure types and operational scenarios.

Essential Test Scenarios:

Primary unit hardware failure (power, CPU, memory)
Network interface failures on critical links
Software crashes and kernel panics
Partial failures (some services working, others failing)
Network path failures affecting connectivity but not local interfaces
High-load conditions during failover events

Testing During Maintenance Windows Only

Restricting failover testing to scheduled maintenance windows can mask problems that only manifest under production load conditions. The behaviour of HA systems can differ significantly between idle and busy periods.

Production-Safe Testing Methods:

Implement gradual testing approaches that minimally impact users
Use monitoring and alerting systems to detect any service disruption immediately
Test during various load conditions to understand performance characteristics
Document baseline performance metrics before testing

Insufficient Monitoring During Tests

Failing to monitor all critical metrics during failover testing can result in missed issues that could cause problems during actual failures.

Key Monitoring Points:

Failover timing and duration
Session preservation rates
Performance impact during and after failover
Log entries and error messages on both units
Network convergence timing for routing protocols
Application-specific health checks and response times
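
Two of these metrics, failover duration and session preservation rate, are easy to compute from data gathered during a test. The sketch below assumes you already have timestamped probe results (for example from a continuous ping or HTTP check) and sets of session identifiers captured before and after failover. The function names and the sample data are illustrative.

```python
def longest_outage(probes: list[tuple[float, bool]]) -> float:
    """Longest run of failed probes, in seconds.

    Each probe is (timestamp_seconds, succeeded), sorted by time. An outage
    runs from the first failed probe to the next successful one, or to the
    last probe if service never recovers.
    """
    longest = 0.0
    outage_start = None
    for timestamp, ok in probes:
        if not ok and outage_start is None:
            outage_start = timestamp
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    if outage_start is not None:
        longest = max(longest, probes[-1][0] - outage_start)
    return longest


def session_preservation_rate(before: set[int], after: set[int]) -> float:
    """Fraction of pre-failover sessions still present after failover."""
    if not before:
        raise ValueError("no sessions recorded before failover")
    return len(before & after) / len(before)


probes = [(0.0, True), (0.5, True), (1.0, False), (1.5, False),
          (2.0, False), (2.5, True), (3.0, True)]
print(longest_outage(probes))                                # 1.5
print(session_preservation_rate({1, 2, 3, 4}, {1, 2, 4, 9}))  # 0.75
```

The resolution of the outage figure is limited by the probe interval, so probe more often than the failover time you are trying to verify.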

Ignoring Application-Layer Impact

Network-layer failover success does not guarantee application functionality. Many administrators focus solely on connectivity without verifying that applications continue to operate correctly after failover.

Application Testing Considerations:

Database connection handling and transaction integrity
Load balancer health check responses
SSL certificate validation and session resumption
Application session management and state preservation
User experience and response time impacts
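
Application checks can be summarised after a failover test with a few lines of Python. The sketch below assumes you have already collected pass/fail results per check and response-time samples before and after failover. The check names, the function names, and the 1.5× tolerance are hypothetical choices for illustration.

```python
from statistics import median


def failed_checks(results: dict[str, bool]) -> list[str]:
    """Names of the application checks that did not pass, sorted."""
    return sorted(name for name, ok in results.items() if not ok)


def response_time_regression(
    baseline_s: list[float], after_s: list[float], tolerance: float = 1.5
) -> bool:
    """True if the median response time after failover exceeds tolerance x baseline."""
    if not baseline_s or not after_s:
        raise ValueError("both sample lists must be non-empty")
    return median(after_s) > tolerance * median(baseline_s)


results = {"database": True, "tls_handshake": True, "login_session": False}
print(failed_checks(results))  # ['login_session']
print(response_time_regression([0.10, 0.12, 0.11], [0.30, 0.28, 0.35]))  # True
```

Connectivity probes would pass in this example, yet user logins break and responses slow down noticeably. A network-only test would miss both results.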

Implementation Best Practices

Successful HA deployment requires careful planning and attention to detail across multiple areas. These best practices will help you avoid common pitfalls and build a robust, reliable HA infrastructure.

Network Design Considerations

Dedicated HA Links: Always use dedicated physical connections for HA heartbeat and synchronisation traffic. Sharing these critical communications over production networks introduces unnecessary risk and potential performance issues.
Multiple Communication Paths: Configure at least two heartbeat interfaces, ideally on separate physical links, to prevent split-brain scenarios. Heartbeats travel over Ethernet interfaces, so a console connection cannot serve as a heartbeat path. Keep out-of-band console and management access for troubleshooting and recovery instead.
Proper VLAN and Routing Configuration: Ensure that VLAN configurations and routing tables are properly synchronised between HA members. Asymmetric routing can cause connection issues and complicate troubleshooting.

Monitoring and Alerting

Comprehensive HA Monitoring: Implement monitoring systems that track HA cluster health, synchronisation status, and individual unit performance. This should include automated alerts for any HA-related issues.
Regular Health Checks: Establish routine procedures for verifying HA functionality, including automated tests that can be run without impacting production traffic.
Documentation and Runbooks: Maintain detailed documentation of your HA configuration and create runbooks for common failure scenarios and recovery procedures.
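
Automated alerting can start from a few simple rules evaluated against a periodic snapshot of cluster state. In the sketch below, the `HaSnapshot` fields and the alert messages are hypothetical. How the snapshot is collected (SNMP, API, or log parsing) is outside the scope of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HaSnapshot:
    """Hypothetical point-in-time view of cluster health."""
    members_seen: int
    members_expected: int
    config_in_sync: bool
    heartbeat_links_up: int


def ha_alerts(snapshot: HaSnapshot) -> list[str]:
    """Return a list of alert messages; an empty list means healthy."""
    alerts = []
    if snapshot.members_seen < snapshot.members_expected:
        alerts.append("cluster member missing")
    if not snapshot.config_in_sync:
        alerts.append("configuration out of sync")
    if snapshot.heartbeat_links_up == 0:
        alerts.append("no heartbeat link up")
    elif snapshot.heartbeat_links_up == 1:
        alerts.append("heartbeat redundancy lost")
    return alerts


print(ha_alerts(HaSnapshot(2, 2, True, 2)))   # []
print(ha_alerts(HaSnapshot(2, 2, False, 1)))
# ['configuration out of sync', 'heartbeat redundancy lost']
```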

Conclusion

Choosing between Active-Passive and Active-Active FortiGate HA deployments depends on your specific requirements for performance, complexity, and cost. Active-Passive offers simplicity and cost-effectiveness for many environments, whilst Active-Active provides maximum performance and resource utilisation for high-demand scenarios.

Success in either configuration requires careful attention to synchronisation, comprehensive testing, and ongoing monitoring. By understanding these common pitfalls and implementing the recommended best practices, you can build a robust HA infrastructure that provides reliable protection for your network.

Remember that HA deployment is not a one-time activity but an ongoing process. Invest the time upfront to properly design, implement, and test your HA configuration, then keep testing, monitoring, and maintaining it. These operational practices matter as much as the initial design decisions in ensuring that your redundancy works when it is needed most and that your network can withstand hardware failures while continuing to serve your organisation’s critical needs.