
What is CheckPoint Clustering ?

The premise behind CheckPoint clustering is that having two firewalls in active/standby is a bad idea. This is true for CheckPoint because the units are so expensive that you can’t afford to keep buying new ones, so why waste half of your money on a second firewall that does nothing? Therefore, owning two units and using them in Active/Active mode is perceived as a way of saving money. To make it worse, this very idea is so ‘kleva’ that CheckPoint engineers are commonly known to suggest the practice as a ‘competitive advantage’.

There is one useful feature: you can cluster up to four units into a ‘single cluster’. However, the operational impact of this is very poor. It is not possible to determine which firewall is handling a given flow, which makes troubleshooting very hard or impossible. Anyone who thinks that the Tracker tool can be used for troubleshooting needs a good spanking; it’s a good logging tool but not a perfect troubleshooting tool.

How does CheckPoint clustering work ?

In fact, Checkpoint doesn’t do the clustering; the Nokia IPSO software does, although it seems that the manual makes no reference to this. You might want to refer to the ClusterXL Admin Guide for this. It is much improved (since 2007, when I last couldn’t get the manuals because of a paywall) but is still a mostly unhelpful piece of documentation.

It’s worth noting that CheckPoint is actually a piece of software that runs on many platforms. In the past CheckPoint was used on Solaris, Windows and BayRS routers. Today it runs on Nokia IPSO, SPLAT (custom Linux distro) and Crossbeam. As a result, the CheckPoint software isn’t tightly coupled to the networking features of the underlying platform. Perhaps this explains why the manuals miss out on the networking aspects of firewall functions.

Normal Firewall Operation

So let’s set a baseline around normal firewall operation.

In normal operation a firewall works this way:

client sends packet
firewall will receive an ARP request from the router,
and respond with a MAC address that is shared between the firewalls (and transfers between the active and standby unit on failover).
The firewall will receive the packet and forward it to the internal network. The reverse flow is identical.

All this is standards compliant, expected and operationally easy to maintain and troubleshoot.

Checkpoint Clustering Operation

Obviously, to provide clustering something unusual has to happen because either, or both, firewalls need to receive each and every packet that needs to be forwarded. The purpose of clustering is to enable two or more (up to four ??) firewalls to pass flows in a fully load balanced/shared way. Why would you do this ? My view is that CheckPoint / Nokia firewalls are ~~relatively~~ very expensive compared to Cisco/Juniper equivalents, so customers want to make the most of the “investment”. A shortcut like this looks attractive to double the throughput of the system.

From the Manual

From the manual:

ClusterXL uses unique physical IP and MAC addresses for the cluster members and virtual IP addresses to represent the cluster itself. Virtual IP addresses do not belong to an actual machine interface (except in High Availability Legacy mode, explained later). ClusterXL provides an infrastructure that ensures that data is not lost due to a failure, by ensuring that each cluster member is aware of connections passing through the other members. Passing information about connections and other Security Gateway states between the cluster members is known as State Synchronisation.

IP and MAC addresses

No, really, if you don’t understand these you should not be reading this. Return to school, do not collect $200 etc.

State Synchronisation

This is easy enough. Each flow that traverses the firewall creates an entry in a state database on the firewall. This state database must (or should, or may, depending on your choice) be replicated to the other firewalls so that if a failure occurs, the other unit knows which traffic flows were being forwarded and can keep on going.

State Synchronisation means that for every flow on one firewall, its data is replicated to the other firewalls. It’s most useful for long-held data flows such as SQL and not so much for HTTP (YMMV).
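
As a rough illustration of the idea (and not CheckPoint’s actual implementation), the Python sketch below keeps a per-firewall state table and copies each new entry to its peers. The `Firewall` class, the 5-tuple flow key and the example addresses are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical flow key: (src_ip, src_port, dst_ip, dst_port, protocol)
FlowKey = Tuple[str, int, str, int, str]


@dataclass
class Firewall:
    name: str
    state: Dict[FlowKey, str] = field(default_factory=dict)
    peers: List["Firewall"] = field(default_factory=list)

    def accept_flow(self, key: FlowKey) -> None:
        """Record a new flow locally, then replicate it to every peer."""
        self.state[key] = "ESTABLISHED"
        for peer in self.peers:
            # Stands in for a sync message sent over the sync network.
            peer.state[key] = "ESTABLISHED"

    def knows(self, key: FlowKey) -> bool:
        """True if this unit could keep forwarding the flow after a failover."""
        return key in self.state


fw_a = Firewall("fw-a")
fw_b = Firewall("fw-b")
fw_a.peers.append(fw_b)
fw_b.peers.append(fw_a)

# A long-held SQL session arrives on fw-a and is synchronised to fw-b.
sql_flow = ("192.0.2.10", 51515, "198.51.100.20", 1433, "tcp")
fw_a.accept_flow(sql_flow)
survives_failover = fw_b.knows(sql_flow)
```

If `fw-a` fails, `fw-b` already holds the entry, which is why synchronisation pays off for long-lived sessions more than for short HTTP requests.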

Cluster Control Protocol

There is no standard protocol for synchronising such devices so CheckPoint created something with an imaginative name:

The Cluster Control Protocol (CCP) is the glue that links together the machines in the Check Point Gateway Cluster. CCP traffic is distinct from ordinary network traffic and can be viewed using any network sniffer. CCP runs on UDP port 8116, and has the following roles:

It allows cluster members to report their own states and learn about the states of other members by sending keep-alive packets (this only applies to ClusterXL clusters).
State Synchronisation.
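
Since the manual says CCP can be seen with any sniffer on UDP port 8116, here is a minimal Python sketch that picks CCP packets out of raw IPv4 packets by port alone. It does not decode the CCP payload. The helper `build_udp_packet` and the example addresses are made up purely so the check can be exercised without a live capture.

```python
import struct

CCP_UDP_PORT = 8116  # port number quoted from the manual
IPPROTO_UDP = 17


def is_ccp_packet(ipv4_packet: bytes) -> bool:
    """Return True if the IPv4 packet is UDP to or from port 8116."""
    if len(ipv4_packet) < 20:
        return False
    version = ipv4_packet[0] >> 4
    header_len = (ipv4_packet[0] & 0x0F) * 4
    if version != 4 or header_len < 20:
        return False
    if ipv4_packet[9] != IPPROTO_UDP:
        return False
    if len(ipv4_packet) < header_len + 8:
        return False
    src_port, dst_port = struct.unpack(
        "!HH", ipv4_packet[header_len:header_len + 4]
    )
    return CCP_UDP_PORT in (src_port, dst_port)


def build_udp_packet(src_port: int, dst_port: int) -> bytes:
    """Hypothetical helper: minimal IPv4+UDP packet, checksums left at zero."""
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 28, 0, 0, 64, IPPROTO_UDP, 0,
        bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]),
    )
    udp_header = struct.pack("!HHHH", src_port, dst_port, 8, 0)
    return ip_header + udp_header


ccp_like = build_udp_packet(8116, 8116)
dns_like = build_udp_packet(40000, 53)
```

In practice the same filter is usually expressed as a capture filter on the UDP port in whatever sniffer you use.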

Great. Basics are done.

ClusterXL modes

ClusterXL has four working modes:

Load Sharing Multicast Mode
Load Sharing Unicast Mode
New High Availability Mode
High Availability Legacy Mode

OK, so there are four possible modes. Two of them are actually “clustering” and two are NOT: they are “High Availability” active/standby, so we will ignore those.

Checkpoint/Nokia Multicast Clustering

If you think that anything with the word ‘Multicast’ in it automatically means trouble, you would be right. Except that Checkpoint does naughty multicast: it’s not IP Multicast, it’s Ethernet Multicast. Let’s walk it through:

For CheckPoint/Nokia the packet flow works something like this:

client sends packet
router will ARP for the next hop MAC address, all firewalls will respond with a Multicast MAC address.
Router sends Ethernet frame with a Multicast MAC address which the switch must treat as a broadcast to all devices in the VLAN
The Cluster protocol will notify one of the firewalls to forward the flow, and it will reach the server.
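
What makes a MAC address “multicast” as far as a switch is concerned is a single bit: the least significant bit of the first octet (the I/G bit). The small Python check below shows this; the example addresses are illustrative and are not taken from a real cluster.

```python
def is_multicast_mac(mac: str) -> bool:
    """True if the group (I/G) bit of a colon-separated MAC address is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)


# Illustrative addresses only.
cluster_style_mac = "01:00:5e:0a:00:01"   # group bit set
ordinary_nic_mac = "00:1c:7f:00:00:01"    # group bit clear
broadcast_mac = "ff:ff:ff:ff:ff:ff"       # broadcast is the all-ones group address

results = [
    is_multicast_mac(cluster_style_mac),
    is_multicast_mac(ordinary_nic_mac),
    is_multicast_mac(broadcast_mac),
]
```

A switch never learns a group address as a source, so unless snooping or a static entry tells it otherwise, it has no port to pin such a frame to.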

Let’s consider the reverse direction:

Server sends an ARP request.
Firewalls respond with an ARP reply that carries the Multicast MAC address.
Server sends Ethernet frame with a Multicast MAC address which the switch must treat as a broadcast to all devices in the VLAN
The Cluster protocol will notify one of the firewalls to forward the flow,

and off to the client it goes.
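
Because every member receives every frame, something has to decide which single member forwards a given flow. The manual excerpt above doesn’t spell out how, so the Python sketch below shows one plausible scheme purely as an assumption: every member computes the same hash over the flow’s endpoints and only the “owner” forwards. The function names and the choice of SHA-256 are mine, not CheckPoint’s.

```python
import hashlib


def owner_index(src_ip: str, dst_ip: str, member_count: int) -> int:
    """Map a flow to one cluster member; both directions map to the same one."""
    # Sorting the endpoints makes the result independent of direction.
    endpoints = "|".join(sorted([src_ip, dst_ip]))
    digest = hashlib.sha256(endpoints.encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % member_count


def should_forward(my_index: int, src_ip: str, dst_ip: str, member_count: int) -> bool:
    """Each member runs this on every frame it receives and drops the rest."""
    return owner_index(src_ip, dst_ip, member_count) == my_index


MEMBERS = 2
client, server = "203.0.113.7", "198.51.100.20"

# Exactly one member claims the flow, in both directions.
forward_votes = [should_forward(i, client, server, MEMBERS) for i in range(MEMBERS)]
reverse_votes = [should_forward(i, server, client, MEMBERS) for i in range(MEMBERS)]
```

Note that every member still has to receive and hash every frame before dropping it, which is exactly the cost the rest of this article complains about.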

Multicast Ethernet, Undirected Broadcasts and Denial of Service

CheckPoint has now switched to using Ethernet multicast without using IP Multicast. By default, Ethernet switches are configured with IGMP snooping enabled. Since nothing ever sends an IGMP membership report for this address, once the IGMP Query times have expired (about three minutes) the port will start to block the frames and thus disable the Clustering functionality.

Checkpoint recommends three options to ‘fix’ this:

disable IGMP on the switches
configure static MAC address mappings for the multicast mac address on all ports
install an IGMP agent on the firewall

Disable IGMP on the switches

This is the primary recommendation from CheckPoint engineers and from the manual. To be fair, it’s possibly the best of three bad options although it’s most likely to cause significant problems.

When you disable IGMP on your Ethernet switches, you are effectively allowing all multicast packets to be broadcast. That is, a multicast frame becomes a broadcast frame and must be handled by every device in the VLAN: the frame is received by all devices, and the software protocol driver of each device must process it before discarding it, thus creating performance problems (bus interrupts, buffer memory, CPU, software cycles, etc.).

This is more commonly known as a Denial of Service Attack.
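
To make the flooding behaviour concrete, here is a deliberately simplified Python model of a Layer 2 switch with snooping disabled. A frame to a group address leaves on every port in the VLAN except the one it arrived on, unless a static entry pins it to specific ports. The port names and the multicast MAC are hypothetical.

```python
from typing import Dict, Optional, Set


def egress_ports(
    dst_mac: str,
    ingress_port: str,
    vlan_ports: Set[str],
    static_entries: Optional[Dict[str, Set[str]]] = None,
) -> Set[str]:
    """Ports a frame is sent out of in this simplified switch model."""
    entries = static_entries or {}
    if dst_mac in entries:
        # A static MAC entry restricts delivery to the listed ports.
        return entries[dst_mac] - {ingress_port}
    # No entry and no snooping: flood to the whole VLAN.
    return vlan_ports - {ingress_port}


VLAN_PORTS = {"router-a", "router-b", "fw-1", "fw-2", "vpn-1", "vpn-2"}
CLUSTER_MAC = "01:00:5e:0a:00:01"  # hypothetical cluster multicast MAC

flooded = egress_ports(CLUSTER_MAC, "router-a", VLAN_PORTS)
pinned = egress_ports(
    CLUSTER_MAC, "router-a", VLAN_PORTS, {CLUSTER_MAC: {"fw-1", "fw-2"}}
)
```

In the flooded case the standby router and both VPN concentrators get a copy of every frame; in the pinned case only the two firewalls do.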

Consider this scenario:

[Diagram: Checkpoint cluster 4]

Let’s assume that you have 100Mbps of inbound traffic on a fairly typical dual-router, dual-firewall cluster setup like the following diagram. In this case, with IGMP disabled, the 100Mbps of traffic will be sent to the firewalls, the standby router and all other devices on the public-facing LAN.

In this scenario, each VPN concentrator is connected to a VLAN with Public IPv4 addresses. Since this is the only VLAN with the public address, you can’t put them anywhere else.

The VPN concentrators will need to handle 100Mbps of broadcast traffic, in addition to the VPN traffic. Most likely, this will cause intermittent outages and service problems on those devices as the CPU struggles to read and discard that volume of traffic. In the worst case, the VPN concentrator may report a broadcast flood and even shut down.

Server hosting

Let’s consider the return path for traffic (because all flows have a return path). In this case, let’s have a VLAN directly connected to the CheckPoint / Nokia firewall and some servers connected to that VLAN. Typically, this would be an email server, a web server, maybe a proxy or some other gateway. Most likely it would be several servers on that VLAN.

When a server ARPs for the IP address of the firewall (most likely its default gateway), it will get a Multicast MAC address and will dispatch frames to it according to the normal process. However, EVERY OTHER server will receive every one of those packets as a Broadcast.

This will cause serious CPU impacts, and possible stability problems. You can fix this by having an L3 device on the inside of the firewalls, and limiting the impact of the broadcasts to the L2 VLAN that is directly connected to the firewalls, of course. But this limits your design choices, and isn’t helpful in an existing environment.

Configure static MAC address mappings for the multicast mac address on all ports

It’s worth noting that some cheaper Ethernet switches are unable to handle large volumes of multicast or broadcast packets in silicon. They may use the onboard CPU for frame replication, which can drive only a few megabits before becoming overloaded. (Less common today, but still applicable on some products.)

Let’s take a look at configuring static MAC addresses in your switches. That is, you create manual MAC address entries for each port that has a Checkpoint device connected. This seems like a good solution since it stops the broadcasting outlined previously and tightly controls the packet flow.

However, the firewall team and network team must be fully aware of this for it to be operationally effective. Consider what happens a year later, when someone upgrades the switches, replaces a faulty module, or performs some other minor task. It requires close supervision to keep the static database maintained over time.

This will work for some companies, but for larger companies it’s only a matter of time until an outage is caused, and it is therefore not a good design choice. For smaller companies where just a couple of people manage the firewalls AND the network, the static MAC address can work.
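
If you do go down the static MAC route, the maintenance risk described above is really a drift problem: what the switch holds today versus what the firewall team expects. A small Python sketch of such an audit follows. The data is hard-coded and hypothetical; in real life the “actual” side would have to come from whatever your switches can export, which I am not assuming anything about here.

```python
from typing import Dict, Set


def audit_static_entries(
    expected: Dict[str, Set[str]], actual: Dict[str, Set[str]]
) -> Dict[str, Dict[str, Set[str]]]:
    """Return, per MAC address, the ports that are missing or unexpected."""
    drift: Dict[str, Dict[str, Set[str]]] = {}
    for mac in expected.keys() | actual.keys():
        want = expected.get(mac, set())
        have = actual.get(mac, set())
        if want != have:
            drift[mac] = {"missing": want - have, "unexpected": have - want}
    return drift


CLUSTER_MAC = "01:00:5e:0a:00:01"  # hypothetical cluster multicast MAC

# What the firewall team documented when the cluster was built.
expected_entries = {CLUSTER_MAC: {"Gi1/0/1", "Gi1/0/2"}}
# What is left a year later, after a module was replaced.
actual_entries = {CLUSTER_MAC: {"Gi1/0/1"}}

drift_report = audit_static_entries(expected_entries, actual_entries)
```

An empty report means the static database still matches the design; anything else is an outage waiting for the next failover.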

Install an IGMP agent on the firewall

This document on ClusterXL IGMP Membership dated February 14, 2006 (!) explains how to add IGMP support to Checkpoint. However, I’m told by Checkpoint that this is not supported / not recommended (it’s hard to get a straight answer). It requires a number of CLI entries to work, plus specific configuration on the Module.

In short, this option is operationally a disaster. You may struggle to get upgrades completed properly, and the module configuration on Linux/Windows needs changing in the Config/Registry for the IGMP configuration to survive a reboot.

Using Unicast Mode instead

So the second option is to use Unicast instead of Multicast. In this case, the ClusterXL software selects a PIVOT firewall to act as the Master unit, and it will either process the packets itself, or redirect them to another member of the cluster.

The diagram shows this mode of operation using Unicast redirect. Although Unicast redirect doesn’t have the same problems as the Multicast solution discussed above, it does have a problem. The pivot firewall must reserve resources to be able to redirect all flows and ensure that it has enough CPU capacity to send sync data to all other firewalls in the cluster. The pivot firewall therefore handles much less traffic than other members of the cluster.
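
A rough way to picture pivot mode is as a weighted dispatcher: the pivot receives every new flow and either keeps it or hands it to another member, keeping a smaller share for itself. The Python sketch below models that. The 30/70 weights, member names and hashing scheme are assumptions for illustration only, not values taken from the manual.

```python
import hashlib
from collections import Counter
from typing import Dict


def pick_member(flow_id: str, weights: Dict[str, int]) -> str:
    """Deterministically assign a flow to a member in proportion to its weight."""
    total = sum(weights.values())
    digest = hashlib.sha256(flow_id.encode("ascii")).digest()
    slot = int.from_bytes(digest[:4], "big") % total
    for member, weight in weights.items():
        if slot < weight:
            return member
        slot -= weight
    raise RuntimeError("weights must be positive integers")


# Hypothetical split: the pivot keeps less because it also redirects and syncs.
WEIGHTS = {"pivot": 30, "member-2": 70}

assignments = Counter(pick_member(f"flow-{i}", WEIGHTS) for i in range(10_000))
pivot_share = assignments["pivot"] / 10_000
```

However the real split is tuned, the pivot still touches every packet at least once, so its capacity caps the whole cluster.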

Troubleshooting

One big challenge of clustering firewalls is that capturing packets and troubleshooting become relatively more difficult. I’ve had some odd problems using Wireshark and surmise that the volume of broadcast packets was overrunning the workstation network adapter I was using to capture packets. Sadly, I wasn’t able to try with another machine to verify this theory.

Firewall Deployment with Layer 2 Data Centre Interconnect

Apparently, there are a number of people who think that this clustering idea is perfect for data centres that are L2 interconnected. By now, most of you will have realised the problem that clustering will cause when an Ethernet segment is extended between two data centres, but let’s make a diagram of it anyway. Two data centres, geographically separated, but with an L2 connection between them. It doesn’t much matter how (OTV, VPLS, dark fibre, WDM; all the same for this purpose).

On the interconnect, a multicast CheckPoint ClusterXL, with an active/active firewall configuration, is going to trombone traffic between sites according to some random algorithm.

In addition, the firewall synchronisation traffic must be given high priority; if the sync data doesn’t arrive quickly enough the firewalls seem to fail quite badly (again, anecdotal evidence here, as I was not able to test in a live network).

Finally, Multicast/Broadcast must flow across the L2 link on the inside and outside interfaces for every VLAN.

Thus, for 100 Megabits of traffic across the firewall, 200 Megabits of broadcast traffic is generated (100 on the outside VLAN plus 100 on the inside VLAN), plus the Sync traffic (determined according to firewall rule but usually a lot of traffic).
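
The arithmetic is simple enough to put in a few lines of Python. The sketch assumes the flooded traffic crosses the interconnect once per flooded side (outside and inside), and treats the sync rate as an unknown you have to supply; the 50 Mbps figure below is a placeholder, not a measurement.

```python
def interconnect_load_mbps(
    firewall_mbps: float, flooded_sides: int = 2, sync_mbps: float = 0.0
) -> float:
    """Estimate traffic pushed across the L2 interconnect by the cluster."""
    flooded = firewall_mbps * flooded_sides  # outside VLAN + inside VLAN
    return flooded + sync_mbps


flood_only = interconnect_load_mbps(100.0)
with_sync = interconnect_load_mbps(100.0, sync_mbps=50.0)  # placeholder sync rate
```

Whatever the sync figure turns out to be, the flooded component alone is double the useful traffic through the firewall.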

This isn’t a good idea(tm) since you need that bandwidth for server-server communication as well. Should be obvious, I think.

The EtherealMind View

Experience suggests that the Nokia/CheckPoint clustering works, but at relatively low volumes: say up to 10 Mbps at a rough, non-educated guess, based on the fact that the customer has a number of existing clusters that do work reliably. However, as load increases on the firewall, it appears that the multicast/broadcast technique causes serious service problems to devices on the same VLAN as the firewalls themselves. Since the static MAC option requires the firewall team and network team to operate closely, it isn’t practical for very large support teams because of the level of specialisation that occurs in those teams. I have deep reservations about larger volumes of clustered traffic and have seen a number of inexplicable problems when clustering is enabled.

I wish I could have done some more testing, but project timescales are a bit tight, and there is no way to have a lab of CheckPoint firewalls because of licensing and/or cost.

On the basis of this research, and recent experiences with service difficulties, I can’t recommend CheckPoint Nokia clustering because it appears to be a technology with more drawbacks than capabilities.

Introduce myself

This is a personal blog about connectivity, for learning, fun, sharing and reference. It covers everything about IT network infrastructures and all of their related components, such as new software and hardware from vendors like Cisco Systems, Microsoft, IBM, HP, CheckPoint, Juniper and others. Some posts also contain configuration examples and articles about different network components that are useful, at least to me. I created this blog to share my knowledge with other people, and hopefully someone will share their knowledge with me. Most of the posts describe my own experiences at work.

Who am I ... My name is Huynh Phi Long and currently I work as an IT network administrator at PPF - Homecredit.

You can contact me by email: longhp@live.com