The four-node architecture comprises two nodes in the data center and two nodes in the disaster recovery site. This setup ensures that if the primary data center fails completely, the servers in the disaster recovery site take over and maintain operations with minimal interruption.
A verification is performed during the server's initial start-up to determine whether the server is currently acting as a data center (DC) server or a disaster recovery (DR) server.
If the server is in the Data Center, it will adhere to the HA/FOS flow: Node 1 will begin as the Primary server, while Node 2 will start in 'Stand-by' mode.
If the server is acting as a DR server, both servers will start in 'stand-by' mode. When the primary server in the Data Center goes offline, Node 2 will become the primary server, while Node 3 and Node 4 in the disaster recovery site will remain in the stand-by state.
Node 3 and Node 4 periodically check the DC servers' health through a database heartbeat and HTTP/HTTPS requests, and run the file replication schedule.
When the data center fails completely and both Node 1 and Node 2 are shut down, Node 3 and Node 4 in the DR site detect the failure through the health monitoring mechanism and come out of the stand-by state.
Of Node 3 and Node 4, the one that detects the failure first becomes the primary server while the other transitions to 'stand-by' mode. Once the database entries for the takeover are made, the new primary server fetches the last serving node's details from the database and completes the pending file replication.
Node 3 and Node 4, now acting as DC servers, then proceed with the HA/FOS flow.
Four 64-bit server machines with high network connectivity for setting up the ServiceDesk Plus application.
Four 64-bit database server machines with high network connectivity.
The servers must have two-way read-write access to the ManageEngine folder (where ServiceDesk Plus is installed) for the file replication process.
A dedicated user account should be used as the log-on user for the application service in the Windows Service Manager, and it must be included in the sharing list of the ManageEngine folder.
Exclude the folder where ManageEngine ServiceDesk Plus is installed from antivirus scans (e.g., C:\ManageEngine\ServiceDesk).
A FOS license must be purchased as an add-on to your application edition.
The application must be started as a service in the Windows Service Manager.
Applications installed on the Data Center and Disaster Recovery sites must run on the same port.
The database must be externalized from all four servers. You can either use an MSSQL database or migrate to version 14820 and use an external PGSQL database.
For MSSQL, it is recommended that the database be configured in an Always On Availability Groups (AG) setup, ideally hosted in a different geographical location.
For PostgreSQL, EDB and Percona have been tested on our end. See: Steps to configure External PostgreSQL.
The hostname of the server where ManageEngine ServiceDesk Plus is to be installed should not contain the underscore ( _ ) symbol in it. The underscore may affect the application startup and make certain services inaccessible on TCP/IP networks.
Please make sure that neither the installation directory name nor its parent directory names contain any space characters.
A virtual IP / common IP address in the same network and subnet as the DC1 and DC2 servers.
A virtual IP / common IP address in the same network and subnet as the DR1 and DR2 servers.
The configured virtual IPs should not be bound to any server.
Pinging a virtual IP address should result in "Request timed out" with 100% packet loss, as illustrated below.

Initially, the Data Center Virtual IP is mapped to the domain URL/common alias URL.
After takeover from the Data Center to the Disaster Recovery site, the application is started automatically on the Disaster Recovery site, but it cannot be accessed via the same domain URL/common alias URL until the mapping is updated.
Map the Disaster Recovery Site Virtual IP to the common alias URL/domain URL to access the application.
Virtual IP Binding process:
The binding process assigns a specified public IP address to the active primary server's network interface, making it reachable by external clients.
During the binding process,
The existence of the provided NIC address (interface name) is checked using ipHandler.bat (uses ifcheck.exe internally).
If the interface is up, the IP with the netmask and interface name will be added.
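The binding itself is handled internally by ipHandler.bat and ifcheck.exe. Purely as an illustration of what the operation amounts to, the following commands (run from an elevated PowerShell or command prompt) check the interface and add or remove the virtual IP; the interface name "Ethernet0" and the addresses are placeholders.

# Verify that the configured interface exists and is up.
netsh interface ipv4 show interfaces
# Bind the virtual IP, with its netmask, to that interface.
netsh interface ipv4 add address name="Ethernet0" address=192.0.2.50 mask=255.255.255.0
# Release the address again (e.g., when the node leaves the primary role).
netsh interface ipv4 delete address name="Ethernet0" address=192.0.2.50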
Primary server : DC1
Standby servers : DC2, DR1, DR2
The standby servers monitor the health of the primary through a database-based heartbeat mechanism. The primary server periodically updates a counter value in the database, which the standby servers check at scheduled intervals. This allows the standby servers to confirm that the primary server is operational and responsive.
The standby servers monitor the primary server using the following methods:
Standby servers verify, by examining the database, that the primary server's counter value is consistently higher than its previous value. This is part of the heartbeat mechanism described above, which lets the standby servers determine the operational status of the primary server.
Standby servers perform HTTP-based polling against the primary server to determine whether the primary server's URL is accessible.
The health detectors keep monitoring the primary server's health at regular intervals. Upon detecting potential issues, the system implements a retry mechanism. Each detector will attempt to confirm the failure by making subsequent health checks. If the failure condition persists after a specified number of retries, the detectors confirm that the primary server is down. This helps to avoid false positives due to transient network issues or temporary glitches.
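Purely as an illustration of the shape of this monitoring (the product's detectors are built in), the PowerShell sketch below reads a heartbeat counter from the database, polls the primary's URL over HTTP, and only treats the primary as down when both checks fail. The connection string, table and column names, URL, and interval are all assumptions.

# Hypothetical heartbeat check: the counter must keep increasing between reads.
$conn = New-Object System.Data.SqlClient.SqlConnection("Server=dbhost;Database=servicedesk;Integrated Security=True")
$conn.Open()
$cmd = $conn.CreateCommand()
$cmd.CommandText = "SELECT counter FROM ha_heartbeat WHERE node = 'DC1'"   # assumed table/column
$previous = [int]$cmd.ExecuteScalar()
Start-Sleep -Seconds 30
$current = [int]$cmd.ExecuteScalar()
$conn.Close()
$dbAlive = ($current -gt $previous)

# Hypothetical HTTP poll of the primary's application URL.
try {
    $resp = Invoke-WebRequest -Uri "http://dc1.example.com:8080" -UseBasicParsing -TimeoutSec 10
    $httpAlive = ($resp.StatusCode -eq 200)
} catch {
    $httpAlive = $false
}

# A real detector would repeat failed checks a fixed number of times
# before confirming that the primary is down and initiating the takeover.
if (-not ($dbAlive -or $httpAlive)) { Write-Output "Primary appears to be down" }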
If the detectors conclude that the primary server is down, the standby server that detects the failure first initiates a takeover process. Before assuming the primary role, it attempts to pull all pending files to ensure data consistency and minimize data loss. This step is crucial, as it prepares the standby to take over responsibilities seamlessly without affecting ongoing processes.
After successfully fetching the pending files, the secondary server transitions into the primary role.
The following ports must be open between the HA nodes to allow file transfer, database connectivity, and application access.
Protocol | Port | Direction | Purpose | Notes
SMB (TCP) | 445 | Bidirectional | File sharing and Robocopy file transfer | Required for access to shared folders (e.g., the shared ManageEngine\ServiceDesk folder).
Database-specific protocol (over TCP/IP) | Database port configured in the application | Bidirectional | Database connection | Required to establish a connection to the database server.
HTTP / HTTPS (over TCP/IP) | Application port configured (default: 8080) | Bidirectional | Application accessibility | Required to access the application.
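Connectivity on these ports can be verified between nodes with Test-NetConnection; the hostnames and the database port below are placeholders.

# SMB (file transfer between HA nodes)
Test-NetConnection -ComputerName DC2 -Port 445
# Database port (example: default SQL Server port 1433)
Test-NetConnection -ComputerName dbhost -Port 1433
# Application port (default 8080)
Test-NetConnection -ComputerName DC2 -Port 8080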
Purpose: This process syncs files from the primary to the standby servers so that all servers remain up to date. Specific configurations control which files/directories are replicated.
Tool Used: Robocopy (Robust File Copy Utility)
The following folders are replicated:
[application_home]\inlineimages
[application_home]\custom
[application_home]\app_relationships
[application_home]\snmp
[application_home]\zreports
[application_home]\backup
[application_home]\archive
[application_home]\lib
[application_home]\exportimport
[application_home]\scannedxmls
[application_home]\fixes
[application_home]\integration
[application_home]\LuceneIndex
[application_home]\ZIAdataset
[application_home]\fileAttachments
[application_home]\conf
[application_home]\mediaFiles
PowerShell maps a network drive to establish connectivity between the servers.
Robocopy, based on the defined configuration (include/exclude files/directories), transfers only updated or new files from the primary to the secondary server.
The log feature of Robocopy is enabled to monitor and display transfer statistics (a sample invocation is sketched below).
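The exact Robocopy options are determined by the product's replication configuration; as a rough sketch of the operation (copying only new or changed files over the SMB share and appending statistics to a log), a call of the following shape could be used. The paths, option set, and log location are assumptions.

# Copy one replicated folder from the primary to a standby over the SMB share.
$src = "C:\ManageEngine\ServiceDesk\custom"         # assumed local install path
$dst = "\\DC2\ManageEngine\ServiceDesk\custom"      # assumed share on the standby
robocopy $src $dst /E /XO /R:2 /W:5 /NP /LOG+:"C:\ManageEngine\ServiceDesk\logs\replication.log"
# /E    copy subdirectories, including empty ones
# /XO   skip source files older than the destination copy (only updated/new files move)
# /LOG+ append transfer statistics to the log file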
Scripts Used
This script establishes the connection between primary and standby servers.
During file transfer, we use a PowerShell script and invoke it through the Windows PowerShell executable, adhering to the RemoteSigned policy.
The parameters action (connect) and target SMB path (e.g., \\NodeB\SharedFolder) are passed to the script to create the connection.
The PowerShell script then establishes a secure SMB connection to the remote shared folder using stored or provided credentials, ensuring the environment is ready for subsequent Robocopy file synchronization.
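The product ships its own script for this; as an illustration of the equivalent operation, the PowerShell below maps the standby's shared folder with explicit credentials. The share path, account, and drive letter are placeholders.

# Supply the dedicated service account's credentials.
$cred = Get-Credential "DOMAIN\sdp_service"
# Map the remote shared folder so Robocopy can reach it.
New-PSDrive -Name "R" -PSProvider FileSystem -Root "\\NodeB\SharedFolder" -Credential $cred -Persist
# ...run the Robocopy synchronization...
Remove-PSDrive -Name "R"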
This batch file is a Windows command-line utility designed to manage IP address bindings and network interface information.
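For reference, on recent Windows versions the same kind of interface and binding information can be inspected with built-in PowerShell cmdlets; the interface alias below is a placeholder.

# List network interfaces and their state.
Get-NetAdapter
# Show the IP addresses currently bound to a given interface.
Get-NetIPAddress -InterfaceAlias "Ethernet0"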
Sharing folders between Windows machines allows users to access files over a network using shared folders that are secured by authentication and permissions. When a folder is shared (by navigating to the Sharing tab → Advanced Sharing → Share this folder), access is controlled by both share permissions and NTFS (Security tab) permissions. Once a user is authenticated, their access level is determined by the strictest combination of the share and NTFS permissions. To connect, users can access the folder via \\ServerName\ShareName, providing their credentials. Proper permission management and the use of domain accounts ensure secure and controlled resource sharing across Windows systems.
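As a minimal sketch (the share name, path, and account are assumptions), the folder could also be shared from an elevated PowerShell session:

# Create the share and grant the dedicated service account full access.
New-SmbShare -Name "ManageEngine" -Path "C:\ManageEngine" -FullAccess "DOMAIN\sdp_service"
# NTFS permissions (Security tab) must also allow the account,
# since effective access is the stricter of share and NTFS permissions.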
Terminologies:
DC : Data Center
DR : Disaster Recovery
DC1 : Primary server in DC site
DC2 : Secondary server in DC site
DR1 : Primary server in DR site
DR2 : Secondary server in DR site
Step 1: Install the application in DC and DR sites.
Step 2: Share the ManageEngine folder on all the servers with the necessary user accounts. Configure the server details in the {application_home}/conf/dc-dr.conf file on the DC1 server.
1) Open command prompt as administrator and navigate to /bin directory
2) Run the below command
Example:
If the password is 'test', then run the command as follows
Sample Output:
test (using AES encryption) = 0ee44c8718addb93b9373ae0ec3c90e62a14023f85a498eecad11d1056745f090e5460e4
Encrypted value for test : 0ee44c8718addb93b9373ae0ec3c90e62a14023f85a498eecad11d1056745f090e5460e4
Step 3: Download and extract this zip file (HA_4_Node_Tool_Files.zip).
HA_4_Node_Tool_Files.zip contents:
1)setup.zip
2) fjar.zip
setup.zip contents:
setup.bat
hadrsetup.jar
fjar.zip contents:
4noderepl.fjar
a) Extract the setup.zip file and place its contents under the /<application_home directory of DC1 server> directory
b) Extract the fjar.zip file and place its contents under the /<application_home directory of DC1 server>/fixes directory
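Assuming the DC1 application home is C:\ManageEngine\ServiceDesk, steps (a) and (b) could be performed from PowerShell as follows; adjust the paths to match your installation.

Expand-Archive -Path .\setup.zip -DestinationPath "C:\ManageEngine\ServiceDesk"
Expand-Archive -Path .\fjar.zip -DestinationPath "C:\ManageEngine\ServiceDesk\fixes"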
Step 4: Navigate to the /{application_home of the DC1 server}/bin directory and run the command below to configure the database details.
changeDBServer.bat
Step 5: Start the DC1 ManageEngine ServiceDesk Plus service from the Windows Service Manager.
Step 6: Once the application has started successfully, shut down the DC1 ManageEngine ServiceDesk Plus service from the Windows Service Manager.
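Steps 5 and 6 (and the later start/stop steps) can also be performed from an elevated PowerShell prompt instead of the Services console; the display name pattern below is an assumption and should be matched against the service name shown in your service manager.

# Start / stop the application service (display name pattern assumed).
Get-Service -DisplayName "ManageEngine ServiceDesk Plus*" | Start-Service
Get-Service -DisplayName "ManageEngine ServiceDesk Plus*" | Stop-Service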
Step 7: Navigate to /{application_home of the DC1 server}/bin directory and run the below commands one by one
Step 8: Open a command prompt as an administrator, navigate to the /{application_home of the DC1 server} directory, and execute the command below.
Reference:

The above command configures the High Availability-related details in the application files and enables the Fail Over Service (FOS) between DC1, DC2 and DR1, DR2.
Step 9: Start the DC1 ManageEngine ServiceDesk Plus service from the Windows Service Manager.
Step 10: Start the DC2 ManageEngine ServiceDesk Plus service from the Windows Service Manager.
Step 11: Stop the DC2 ManageEngine ServiceDesk Plus service from the Windows Service Manager.
Step 12: Stop the DC1 ManageEngine ServiceDesk Plus service from the Windows Service Manager.
Step 13: Open a command prompt as an administrator, navigate to the /{application_home of the DC1 server} directory, and execute the command below.
Reference:
The above command enables Disaster Recovery between the DC and DR sites.
Step 14: Start the DC1 ManageEngine ServiceDesk Plus service from the Windows Service Manager.
Step 15: Once the application starts successfully on the DC1 server, log in to the application and invoke the URL below in the web browser.
http(s)://localhost:<webport>/servlet/DebugServlet
Example : https://localhost:8080/servlet/DebugServlet
Reference :
Completed state :

Step 16: Start the DC2 ManageEngine ServiceDesk Plus service from the Windows Service Manager.
Step 17: Start the DR1 ManageEngine ServiceDesk Plus service from the Windows Service Manager.
Step 18: Start the DR2 ManageEngine ServiceDesk Plus service from the Windows Service Manager.
Steps to disable four-node startup mode:
Step 1: Navigate to the {application_home}/conf/HA directory of the DC1 server and move the file mentioned below out of the application_home directory (a sample command is shown after these steps).
module-startstop-processors.xml
Step 2: Repeat Step 1 for other servers (DC2, DR1, DR2)
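For example, assuming the application home is C:\ManageEngine\ServiceDesk and an arbitrary backup folder, the file can be moved on each server with:

# Create a backup folder outside the application home and move the file into it.
New-Item -ItemType Directory -Path "C:\HA_backup" -Force | Out-Null
Move-Item "C:\ManageEngine\ServiceDesk\conf\HA\module-startstop-processors.xml" "C:\HA_backup\"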