High Availability: Four Node Architecture
  

The four-node architecture comprises two nodes in the data center and two nodes in the disaster recovery site. This setup ensures that, in the event of a complete failure of the primary data center, the servers at the disaster recovery site seamlessly take over and maintain operations with minimal interruption.

Note: Currently, the four-node architecture is supported only for servers running on the Windows operating system.
Supported versions: Windows 2025, 2022, 2019. Available from ServiceDesk Plus version 14960.

 


 

 Architecture Diagram
  

 

 

 Workflow

 

  • A verification is performed during the server's initial start-up to determine whether the server is currently acting as a data center (DC) server or a disaster recovery (DR) server.

  • If the server is in the Data Center, it will adhere to the HA/FOS flow: Node 1 will begin as the primary server, while Node 2 will start in standby mode.

  • If the server is acting as a DR server, both servers will start in standby mode. When the primary server in the Data Center goes offline, Node 2 will become the primary server, while Node 3 and Node 4 in the disaster recovery site will remain in the standby state.

  • Node 3 and Node 4 periodically check the DC servers' health through a database heartbeat and HTTP/HTTPS requests, and run the file replication schedule.

  • When the data center fails completely and both Node 1 and Node 2 are shut down, Node 3 and Node 4 in the DR site detect the failure through the health monitoring mechanism and come out of the standby state.

  • Of Node 3 and Node 4, the one that detects the failure first becomes the primary server, while the other transitions to standby mode. Once the database entries are made for the takeover, the primary server fetches the last serving node's details from the database and completes the pending file replication.

  • Node 3 and Node 4, now acting as DC servers, will proceed with the HA/FOS flow.

 

Prerequisites   

 

 Server Requirements:

  • Four 64-bit server machines with high network connectivity for setting up the ServiceDesk Plus application.

  • Four 64-bit database server machines with high network connectivity.

  • The servers must have two-way read-write access to the ManageEngine folder (where ServiceDesk Plus is installed) for the file replication process.

  • A dedicated user account should be used as the log-on user for the application service in the Windows Service Manager and must be included in the sharing list of the ManageEngine folder.

  • Exclude the folder where ManageEngine ServiceDesk Plus is installed (e.g., C:\ManageEngine\ServiceDesk) from antivirus scans (see the sketch below).
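If Windows Defender is the antivirus in use, the exclusion can be added with the built-in Defender cmdlets, as sketched below. The installation path shown is an example and should be replaced with your actual ServiceDesk Plus directory.

# Exclude the ServiceDesk Plus installation folder from Windows Defender scans (path is an example)
Add-MpPreference -ExclusionPath "C:\ManageEngine\ServiceDesk"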

 

 

Note: Of the four servers, the two servers belonging to the Data Center site must be on the same network and subnet, and the two servers belonging to the Disaster Recovery site must be on the same network and subnet.

 ServiceDesk Plus Configuration Requirements:

  • A FOS license must be purchased as an add-on to your application edition.

  • The application must be started as a service in the Windows Service Manager.

  • Applications installed on the Data Center and Disaster Recovery sites must run on the same port.

  • The database must be externalized from all four servers. You can either use an MSSQL database or migrate to version 14820 and use an external PGSQL database.

  • For MSSQL, it is recommended that the database be configured in an Always On Availability Groups (AG) setup, ideally hosted in a different geographical location.

  • For PostgreSQL, EDB and Percona have been tested on our end. Steps to configure External PostgreSQL

  • The hostname of the server where ManageEngine ServiceDesk Plus is to be installed should not contain the underscore ( _ ) symbol in it. The underscore may affect the application startup and make certain services inaccessible on TCP/IP networks.

  • Please make sure that the installation directory name and its parent directory names do not contain any space characters.

  • A virtual IP / common IP address in the same network and subnet as the DC1 and DC2 servers.

  • A virtual IP / common IP address in the same network and subnet as the DR1 and DR2 servers.

    The configured virtual IP should not be bound to any server. Pinging the virtual IP address should return "Request timed out" with a loss percentage of 100%.

 

Note: All connectivity and replication between the databases must be managed by the infrastructure team to ensure seamless operation of ServiceDesk Plus. Any service impact caused by database-side connectivity or replication issues is outside the scope of ServiceDesk Plus responsibility.

Network Modifications:

Initially, the Data Center Virtual IP is mapped to the domain URL/common alias URL.

After takeover from the Data Center to the Disaster Recovery site, the application will be started automatically on the Disaster Recovery site, but you cannot access it via the same domain URL/common alias URL.

Map the Disaster Recovery Site Virtual IP to the common alias URL/domain URL to access the application.

 

Virtual IP Binding process:

The binding process assigns a specified public IP address to the active primary server's network interface, making it reachable by external clients.

During the binding process,  

  • The existence of the provided NIC address  (interface name) is checked using ipHandler.bat (uses ifcheck.exe internally).  

  • If the interface is up, the IP address, along with the netmask and interface name, will be added (a minimal sketch of the equivalent operation follows).
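The actual binding is handled by ipHandler.bat; the following is only a minimal PowerShell sketch of the equivalent operation, assuming the built-in NetTCPIP cmdlets and example values for the interface alias, IP address, and prefix length.

# Verify the interface exists and is up before binding (interface alias is an example)
$nic = Get-NetAdapter -Name "Ethernet0" -ErrorAction Stop
if ($nic.Status -eq "Up") {
    # Bind the virtual IP (example address, /24 netmask) to the interface
    New-NetIPAddress -InterfaceAlias $nic.Name -IPAddress "192.168.10.50" -PrefixLength 24
}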

 Primary server : DC1 

 Standby servers : DC2, DR1, DR2 

Heartbeat Mechanism for Failover Detection

The standby servers monitor the health of the primary through a database-based heartbeat mechanism. The primary server periodically updates a counter value in the database, which the standby servers check at scheduled intervals. This allows the standby servers to confirm that the primary server is operational and responsive.
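As an illustration only, the primary side of such a heartbeat could be modeled as below. The table and column names and the use of Invoke-Sqlcmd (SqlServer module) are assumptions for the sketch, not the actual ServiceDesk Plus schema or scripts.

# Primary side (sketch): periodically bump a heartbeat counter in the shared database
Invoke-Sqlcmd -ServerInstance "db-host" -Database "servicedesk" -Query "UPDATE HeartbeatCounter SET CounterValue = CounterValue + 1 WHERE NodeName = 'DC1'"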

Health Monitoring:

The standby servers will start monitoring the primary server using the methods mentioned below:

  • Standby servers verify that the counter value of the primary server is consistently higher than its previous value by examining the database. This check is part of the heartbeat mechanism described earlier and allows the standby servers to determine the operational status of the primary server.

  • Standby servers perform HTTP-based polling of the primary server to determine whether the primary server's URL is accessible (a combined sketch of both checks follows this list).
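A minimal sketch of how a standby node might combine both checks is shown below, reusing the hypothetical heartbeat table from the earlier sketch and an example application URL. The real detectors are built into the application and may differ.

# Standby side (sketch): compare the current heartbeat counter with the last value seen
$lastSeenCounter = 0   # previous reading; persisted between checks in a real detector
$row = Invoke-Sqlcmd -ServerInstance "db-host" -Database "servicedesk" -Query "SELECT CounterValue FROM HeartbeatCounter WHERE NodeName = 'DC1'"
$dbAlive = ($row.CounterValue -gt $lastSeenCounter)

# HTTP/HTTPS poll of the primary's application URL (example URL and port)
try {
    $resp = Invoke-WebRequest -Uri "https://dc1.example.com:8080" -UseBasicParsing -TimeoutSec 10
    $httpAlive = ($resp.StatusCode -eq 200)
} catch {
    $httpAlive = $false
}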

 

Failure Detection and Take-over:

  • The health detectors keep monitoring the primary server's health at regular intervals. Upon detecting potential issues, the system implements a retry mechanism: each detector attempts to confirm the failure by making subsequent health checks (as sketched after this list). If the failure condition persists after a specified number of retries, the detectors confirm that the primary server is down. This helps avoid false positives due to transient network issues or temporary glitches.

  • If the detectors conclude that the primary server is down, the standby server that detects the primary server's failure first initiates the takeover process. Before assuming the role of primary, the standby server pulls all pending files to ensure data consistency and minimize data loss. This step is crucial, as it prepares the standby to take over responsibilities seamlessly without affecting ongoing processes.

  • After successfully fetching the pending files, the standby server transitions into the primary role.
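The retry behaviour described above could look roughly like the following. The retry count, interval, and the Test-PrimaryAlive helper are placeholders for the sketch and are not the application's actual detector code.

# Hypothetical stand-in for the DB-counter and HTTP checks sketched earlier
function Test-PrimaryAlive { return $false }

# Confirm a suspected failure with a fixed number of retries (values are examples)
$maxRetries = 3
$retryDelaySeconds = 30
$confirmedDown = $true

for ($i = 0; $i -lt $maxRetries; $i++) {
    Start-Sleep -Seconds $retryDelaySeconds
    if (Test-PrimaryAlive) { $confirmedDown = $false; break }   # primary recovered; no takeover
}

if ($confirmedDown) {
    # Primary confirmed down: the standby that reaches this point first pulls pending files and takes over
    Write-Output "Primary confirmed down; initiating takeover"
}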

 

Ports Used / Firewall Configuration:

The following ports must be open to allow file transfer between the HA nodes, database connectivity, and application access.

  • SMB (TCP): port 445, bidirectional. Purpose: file sharing and Robocopy file transfer. Required for access to shared folders (e.g., ManageEngine\ServiceDesk).

  • Database-specific protocol (over TCP/IP): the database port configured in the application, bidirectional. Purpose: database connection. Required to establish a connection to the database server.

  • HTTP / HTTPS (over TCP/IP): the application port configured (default port 8080), bidirectional. Purpose: application accessibility. Required to access the application.
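If Windows Firewall is used on the nodes, the required openings can be created with the built-in NetSecurity cmdlets. This is a minimal sketch; the rule names are examples, and the application port shown assumes the default of 8080.

# Allow inbound SMB (file sharing / Robocopy transfer) between the HA nodes
New-NetFirewallRule -DisplayName "SDP HA - SMB" -Direction Inbound -Protocol TCP -LocalPort 445 -Action Allow

# Allow inbound access to the configured application port (example: default 8080)
New-NetFirewallRule -DisplayName "SDP HA - Application" -Direction Inbound -Protocol TCP -LocalPort 8080 -Action Allow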

 

 

File Transfer Process:

Purpose: This process syncs files from the primary to the standby servers so that both remain up to date. Specific configurations control which files/directories need to be replicated.

Tool Used: Robocopy (Robust File Copy Utility)

The following folders are replicated:

  • [application_home]\inlineimages

  • [application_home]\custom

  • [application_home]\app_relationships

  • [application_home]\snmp

  • [application_home]\zreports

  • [application_home]\backup

  • [application_home]\archive

  • [application_home]\lib

  • [application_home]\exportimport

  • [application_home]\scannedxmls

  • [application_home]\fixes

  • [application_home]\integration

  • [application_home]\LuceneIndex

  • [application_home]\ZIAdataset

  • [application_home]\fileAttachments

  • [application_home]\conf

  • [application_home]\mediaFiles

 

Process Overview:

  • PowerShell maps a network drive to establish connectivity between the servers.

  • Robocopy, based on the defined configuration (include/exclude files/directories), transfers only updated or new files from the primary to the secondary server (an illustrative invocation is shown after this list).

  • The log feature of Robocopy is enabled to monitor and display transfer statistics.
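The exact Robocopy options used by the application are not documented here; the following is only an illustrative invocation under assumed source and destination paths, showing how only new or updated files could be copied with logging enabled.

# Copy only new/updated files from a replicated folder to the mapped standby share (paths are examples)
# /E     include subdirectories (including empty ones)
# /XO    skip files that are older on the destination
# /LOG+: append transfer statistics to a log file
robocopy "C:\ManageEngine\ServiceDesk\fileAttachments" "X:\ServiceDesk\fileAttachments" /E /XO /R:2 /W:5 /LOG+:"C:\ManageEngine\ServiceDesk\logs\replication.log"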

 

Script Execution Details:

Scripts Used

MapNetworkDrive.ps1  
  • This script establishes the connection between primary and standby servers.

  • During file transfer, we use a PowerShell script and invoke it through the Windows PowerShell executable, adhering to the RemoteSigned policy.  

  • The parameters action (connect) and target SMB path (e.g., \\NodeB\SharedFolder) are passed to the script to create the connection.

  • The PowerShell script then establishes a secure SMB connection to the remote shared folder using stored or provided credentials, ensuring the environment is ready for subsequent Robocopy file synchronization (an illustrative sketch follows).
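MapNetworkDrive.ps1 itself ships with the tool; purely as an illustration, mapping a drive to the standby node's share could look like the following. The UNC path, drive letter, and credential handling are assumptions for the sketch.

# Credentials of the dedicated service account used for the share
$cred = Get-Credential

# Map the standby node's shared ManageEngine folder as a persistent drive (names are examples)
New-PSDrive -Name "X" -PSProvider FileSystem -Root "\\NodeB\ManageEngine" -Credential $cred -Persist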

IPHandler.bat   

This batch file is a Windows command-line utility designed to manage IP address bindings and network interface information.

 

 

Security Considerations:

Authentication

Sharing folders between Windows machines allows users to access files over the network through shared folders that are secured by authentication and permissions. When a folder is shared (by navigating to the Sharing tab → Advanced Sharing → Share this folder), access is controlled by both share permissions and NTFS (Security tab) permissions. Once a user is authenticated, their access level is determined by the strictest combination of the share and NTFS permissions. To connect, users can access the folder via \\ServerName\ShareName by providing their credentials. Proper permission management and the use of domain accounts ensure secure and controlled resource sharing across Windows systems.
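For reference, a share of this kind can also be created and restricted from PowerShell using the built-in SmbShare cmdlets. The folder path, share name, and account below are examples only, not values used by ServiceDesk Plus.

# Share the ManageEngine folder and grant the dedicated service account full access (names are examples)
New-SmbShare -Name "ManageEngine" -Path "C:\ManageEngine" -FullAccess "CORP\svc_sdp"

The NTFS permissions on the folder still apply; the effective access is the stricter of the share and NTFS permissions.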

Steps to configure High Availability in Four Node Architecture Model using Batch File

 

Terminologies:

DC : Data Center

DR : Disaster Recovery

DC1 : Primary server in DC site

DC2 : Secondary server in DC site

DR1 : Primary server in DR site

DR2 : Secondary server in DR site

 

Step 1: Install the application in DC and DR sites.

 

Note: If the mirroring approach is followed, kindly create a /logs folder on the DC2, DR1, and DR2 servers under the <application_home> directory once mirroring is completed.


Step 2: Share the ManageEngine folder on all the servers with the necessary user accounts. Configure the details of the servers in the {application_home}/conf/dc-dr.conf file on the DC1 server.

Note: For the password, we highly recommend configuring an encrypted password in the dc-dr.conf file for security reasons.

 

Steps to get the encrypted value for the password: 

1) Open the command prompt as administrator and navigate to the /bin directory

2) Run the below command  

encrypt.bat <password>

 Example:

If the password is 'test', then run the command as follows

encrypt.bat test

Sample Output:  

test (using AES encryption) = 0ee44c8718addb93b9373ae0ec3c90e62a14023f85a498eecad11d1056745f090e5460e4

Encrypted value for test : 0ee44c8718addb93b9373ae0ec3c90e62a14023f85a498eecad11d1056745f090e5460e4


Step 3: Download and extract this zip file (HA_4_Node_Tool_Files.zip).

HA_4_Node_Tool_Files.zip contents:

1) setup.zip

2) fjar.zip

setup.zip contents:

  • setup.bat

  • hadrsetup.jar

fjar.zip contents:

  • 4noderepl.fjar

 

a) Extract the setup.zip file and place its contents under the <application_home> directory of the DC1 server.

b) Extract the fjar.zip file and place its contents under the <application_home>/fixes directory of the DC1 server.

 

Step 4: Navigate to the {application_home of the DC1 server}/bin directory, run the below command, and configure the database details.

changeDBServer.bat

 

Step 5: Start the DC1 ManageEngine ServiceDesk Plus service from the Windows Service Manager.

 

Step 6: Once the application has started successfully, shut down the DC1 ManageEngine ServiceDesk Plus service from the Windows Service Manager.

Note: Step 7 mirrors the database configuration to all the other servers.

Step 7: Navigate to the {application_home of the DC1 server}/bin directory and run the below commands one by one:

mirrorSetup.bat <IP address of DC2 server>
mirrorSetup.bat <IP address of DR1 server>
mirrorSetup.bat <IP address of DR2 server>

Step 8: Open the command prompt as an administrator, navigate to the {application_home of the DC1 server} directory, and execute the below command:

setup.bat FOS

 

Reference:

The above command configures the High Availability related details in the application files and enables the Failover Service between DC1, DC2 and DR1, DR2.

Note: To start the application in 4-node mode, the application must be started in FOS mode at the DC site.

Step 9: Start the DC1 ManageEngine ServiceDesk Plus service from the Windows Service Manager.

Step 10: Start the DC2 ManageEngine ServiceDesk Plus service from the Windows Service Manager.

Step 11: Stop the DC2 ManageEngine ServiceDesk Plus service from the Windows Service Manager.

Step 12: Stop the DC1 ManageEngine ServiceDesk Plus service from the Windows Service Manager.

Step 13: Open the command prompt as an administrator, navigate to the {application_home of the DC1 server} directory, and execute the below command:

setup.bat HADR

 

Reference:

 

 

The above command enables Disaster Recovery between the DC and DR sites.

 

Step 14: Start the DC1 ManageEngine ServiceDesk Plus service from the Windows Service Manager.

 

Note: The steps below populate the replication-related details, such as the username and password, into the database.

Step 15: Once the application starts successfully on the DC1 server, log in to the application and invoke the below URL in a web browser.

http(s)://localhost:<webport>/servlet/DebugServlet

Example : https://localhost:8080/servlet/DebugServlet

 

Reference:

Completed state:

 

 

 Note: Step 15 is used to update the replication-related details (username, password) in the database after making changes to the conf/dc-dr.conf file.

 

Step 16: Start the DC2 ManageEngine ServiceDesk Plus service from the Windows Service Manager.

Step 17: Start the DR1 ManageEngine ServiceDesk Plus service from the Windows Service Manager.

Step 18: Start the DR2 ManageEngine ServiceDesk Plus service from the Windows Service Manager.

 

 

Steps to disable the four node startup mode:

 

Step 1: Navigate to the {application_home}/conf/HA directory of the DC1 server and move the below-mentioned file out of the application_home directory (see the sketch below):

module-startstop-processors.xml
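For example, the move could be performed from PowerShell as follows; the installation path and the holding folder are examples only.

# Create a holding folder outside application_home and move the four-node startup descriptor there (paths are examples)
New-Item -ItemType Directory -Path "C:\HA_backup" -Force | Out-Null
Move-Item -Path "C:\ManageEngine\ServiceDesk\conf\HA\module-startstop-processors.xml" -Destination "C:\HA_backup\"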

Step 2: Repeat Step 1 on the other servers (DC2, DR1, DR2).

 

Note: Performing the above steps will make the application start as a standalone service. Kindly refrain from starting the DC2, DR1, and DR2 servers after performing the above steps.