Additional license consumed when a client reconnects after the heartbeat interval
When a client application loses its network connection and later regains it, the original socket remains open on the license server. On reconnecting, the application consumes an additional license. When the application later exits cleanly, only one of the two licenses is checked in, so the other remains unavailable to other users.
- Set up a client/server system (on separate machines) and check out a license.
- Using lmstat on the server, confirm that a license is checked out.
- Unplug the network cable on the client.
- Wait at least one heartbeat period (normally two minutes).
- Plug the cable back in and note (from the vendor log) that a second license is checked out.
- Confirm using lmstat that two licenses are checked out.
- Exit the client app and confirm that only one license is checked in.
- Confirm using lmstat that one license is still checked out, even one hour later.
While the network is disconnected, the client's heartbeat to the server fails with a network error, so the client drops its connection to the daemon. When the network is back, the client creates a new connection and sends the checkout request for the feature again. Because this is a different connection, the server performs an additional checkout for the feature. The client does not know about this extra license and never reclaims it, so when the client exits, the license lingers indefinitely.
If the client had checked out n licenses before the network disconnect, all n licenses remain held on the server.
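The mechanism described above can be sketched with a toy model. This is not FlexNet code: the `ToyLicenseServer` class, the `simulate` function and the connection names are hypothetical, and the only assumption carried over from the description is that the server tracks checkouts per connection.

```python
# Toy model only (hypothetical names): the server tracks checkouts per connection.
class ToyLicenseServer:
    def __init__(self):
        self.checkouts = {}  # connection id -> number of licenses held

    def checkout(self, conn_id, count=1):
        self.checkouts[conn_id] = self.checkouts.get(conn_id, 0) + count

    def checkin(self, conn_id):
        # Only the licenses tied to this connection are released.
        self.checkouts.pop(conn_id, None)

    def in_use(self):
        return sum(self.checkouts.values())


def simulate(n):
    """Return (licenses in use after reconnect, licenses in use after clean exit)."""
    server = ToyLicenseServer()
    server.checkout("conn-1", n)  # initial checkout
    # Network drops: the client abandons conn-1, but the server still sees it as open.
    server.checkout("conn-2", n)  # reconnect repeats the checkout on a new connection
    after_reconnect = server.in_use()
    server.checkin("conn-2")      # clean exit releases only the current connection
    return after_reconnect, server.in_use()


print(simulate(1))
```

With one license checked out, the model shows two licenses in use after the reconnect and one left over after the client exits, which matches the lmstat observations in the steps above.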
>> The first workaround reduces the LM_A_TCP_TIMEOUT value, which is set by the client and determines how long the server waits before deciding that the client is disconnected and checking its licenses back in. We suggested this formula for calculating the timeout from the heartbeat settings:
LM_A_TCP_TIMEOUT = (LM_A_CHECK_INTERVAL x 2) + (LM_A_RETRY_COUNT x LM_A_RETRY_INTERVAL) + one-minute buffer.
For example, setting LM_A_CHECK_INTERVAL to 30 seconds, LM_A_RETRY_COUNT to 2 and LM_A_RETRY_INTERVAL to 30 seconds gives an LM_A_TCP_TIMEOUT of (30 x 2) + (2 x 30) + 60 = 180 seconds, that is, 3 minutes.
Since the default LM_A_TCP_TIMEOUT is 2 hours, this significantly reduces the probability of the license server holding on to licenses: for that to occur, the client would have to reconnect within 3 minutes.
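As a quick check of the arithmetic, the suggested formula can be written as a small helper. The function name and parameter names are illustrative only and are not part of the FlexNet API. All values are in seconds.

```python
def tcp_timeout_seconds(check_interval, retry_count, retry_interval, buffer=60):
    """Suggested LM_A_TCP_TIMEOUT in seconds, using the formula from this article."""
    return (check_interval * 2) + (retry_count * retry_interval) + buffer


# Values from the example: 30 s check interval, 2 retries, 30 s retry interval.
timeout = tcp_timeout_seconds(check_interval=30, retry_count=2, retry_interval=30)
print(timeout, "seconds =", timeout / 60, "minutes")
```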
The disadvantages of this workaround are:
1. It does not completely solve the issue (but does drastically reduce its occurrence).
2. It requires client updates.
A consequence of the workaround is that clients reconnecting after 3 minutes (using the above example) would have to check out licenses again, even if they managed to reconnect on the same socket.
>> A second workaround is to edit the server OS TCP properties:
Edit or create the KeepAliveTime, KeepAliveInterval and TcpMaxDataRetransmissions registry values under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters (refer to http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html for the equivalents on Linux).
For example, with KeepAliveTime set to 600 seconds, KeepAliveInterval set to 60 seconds and TcpMaxDataRetransmissions set to 3, the server waits 600 seconds after the connection goes idle and then sends a keep-alive probe to the client every 60 seconds, up to three times. If none of the probes is answered, the server considers the connection broken. Note that Windows stores KeepAliveTime and KeepAliveInterval in milliseconds, so these settings correspond to registry values of 600000 and 60000.
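The time the server needs to notice a dead connection under these settings can be estimated as follows. This is a rough sketch that assumes the probes behave exactly as described above. The function names are illustrative. The millisecond conversion reflects the unit Windows uses for KeepAliveTime and KeepAliveInterval in the registry.

```python
def keepalive_detection_seconds(idle, interval, probes):
    """Approximate time until an idle, dead connection is declared broken."""
    return idle + probes * interval


def to_registry_ms(seconds):
    """KeepAliveTime and KeepAliveInterval are stored in milliseconds on Windows."""
    return seconds * 1000


# Example from this article: 600 s idle time, 60 s between probes, 3 probes.
print(keepalive_detection_seconds(600, 60, 3), "seconds until the connection is dropped")
print("KeepAliveTime =", to_registry_ms(600), "KeepAliveInterval =", to_registry_ms(60))
```

With the example values, the server would give up on the connection after roughly 780 seconds (13 minutes), compared with the 2-hour default mentioned earlier.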
The disadvantage of the second workaround is that it configures TCP properties for all processes running on the server. This should be acceptable if the license server is the only production process running on the machine, for example if it is isolated by running it in a VM.
Version Fix Target
Not yet decided. (FNP-18904)