New users occasionally misconfigure the FQDN and run into the “Unable to resolve FQDN” error, and then turn to the TDengine team for help. This article explains the design and configuration of FQDN in TDengine, in the hope of helping users avoid such problems.
Many users have asked about TDengine's FQDN mechanism: why does TDengine use it, and would it not be simpler to use IP addresses directly?
In fact, versions of TDengine prior to 2.0 did use IP addresses. We introduced the FQDN mechanism in version 2.0 because IP addresses can change in many production environments.
First, to avoid confusion, we need to distinguish two concepts: TDengine’s “FQDN parameter” and FQDN as a general networking concept. As a concept, FQDN is related to domain names and hostnames, so it is worth understanding carefully.
To keep the two apart, this article uses “FQDN parameter” for the TDengine parameter and “FQDN” for the concept itself.
FQDN stands for Fully Qualified Domain Name. In contrast to a plain domain name, it is easiest to think of it as a “full domain name”.
An FQDN consists of two parts:
1. Hostname: the name of the host, which on Linux can be obtained by running the hostname command, for example TDengine1;
2. Domain: the domain name, for example taosdata.com.
So a full domain name can simply be understood as a domain name prefixed with a hostname.
Therefore, the complete FQDN in the example above is TDengine1.taosdata.com. However, to make it quick to try out, TDengine by default uses the machine's hostname as the value of the FQDN parameter after installation.
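The composition described above can be sketched in a few lines of Python. This is only an illustration: compose_fqdn is a hypothetical helper written for this article, not part of TDengine, and socket.gethostname() simply reports the local hostname, which is the kind of value TDengine picks up as the default FQDN parameter.

```python
import socket


def compose_fqdn(hostname: str, domain: str) -> str:
    """Join a hostname and a domain name into a fully qualified domain name."""
    # Strip stray dots so that "TDengine1." + ".taosdata.com" still joins cleanly.
    return f"{hostname.strip('.')}.{domain.strip('.')}"


if __name__ == "__main__":
    # Values taken from the example in the text.
    print(compose_fqdn("TDengine1", "taosdata.com"))
    # The hostname of this machine; the output depends on the local setup.
    print("local hostname:", socket.gethostname())
```

If the local hostname printed here is something generic, that generic name is what a default installation would end up using.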
With the concepts and background introduced, let’s move on to the configuration phase:
The first thing to be clear about is that we strongly recommend setting the server's FQDN parameter manually, rather than relying on the default, whenever clients need to connect to TDengine remotely. In addition, either an IP address or an FQDN can be given to the client for connecting. The configuration is as follows:
After modifying the FQDN parameter, add the new value (TD1 in this example) and the server's external IP address to the /etc/hosts file (or the DNS service).
Finally, update the FQDN information in /var/lib/taos/dnode/dnodeEps.json so that the database service can start normally. (If the database has just been installed and the service has never been started, this file has not been generated yet and you can skip this step.)
Now that we know how to configure the server correctly, it’s time to analyze client connection issues.
In fact, whether or not the value of the server's FQDN parameter is an IP address, the client can connect to the server directly by IP address.
The connection interfaces of Linux and Windows clients are shown as follows:
Therefore, the core of this mechanism is not whether the value passed to taos -h is an IP address or the FQDN parameter value. What matters is whether the FQDN parameter value retrieved from the server can be resolved to the correct IP address; that is the key to working with your data smoothly.
Now, I’m going to give you two counterexamples, two scenarios that users often encounter.
Scenario 1:
The server IP address is 192.168.56.161, and the FQDN parameter is set to TD1.
The cause of scenario 1 is that TD1 has not been written to the hosts file (or DNS service) on the client. In the figure above, the client uses taos -h 192.168.56.161 to connect to the TDengine server and receives TD1 as the communication address. When a query is executed, TDengine tries to resolve TD1 to an IP address, only to find that there is no entry for TD1. The result is “Unable to resolve FQDN”.
Scenario 2:
The server IP address is 192.168.56.161, and the FQDN parameter is set to TD1.
When querying data, TDengine resolves TD1 to an IP address through the hosts file (or DNS service), but the IP address written there is wrong. The resolved address is therefore unusable and the connection cannot be established, so the “Unable to establish connection” error appears.
To solve these two common problems, you only need to write the server’s FQDN parameter value and IP address correctly into the client’s hosts file (or DNS service).
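The two scenarios can be told apart with a small simulation. The sketch below is a simplified model, not TDengine's actual resolution code: parse_hosts and diagnose are hypothetical helpers, the hosts content is passed in as text, and a "wrong IP" is modelled simply as an address that differs from the server's real one. The address 192.168.56.116 is an invented typo used only for illustration.

```python
from typing import Dict


def parse_hosts(text: str) -> Dict[str, str]:
    """Parse hosts-file style text into a {lowercased name: ip} mapping."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first entry for a name, as a hosts lookup would.
            table.setdefault(name.lower(), ip)
    return table


def diagnose(client_hosts: str, server_fqdn: str, server_ip: str) -> str:
    """Classify the client's situation for one server (simplified model)."""
    table = parse_hosts(client_hosts)
    resolved = table.get(server_fqdn.lower())
    if resolved is None:
        return "Unable to resolve FQDN"          # scenario 1: no entry at all
    if resolved != server_ip:
        return "Unable to establish connection"  # scenario 2: wrong address
    return "ok"


if __name__ == "__main__":
    server = ("TD1", "192.168.56.161")
    print(diagnose("127.0.0.1 localhost\n", *server))
    print(diagnose("192.168.56.116 TD1\n", *server))
    print(diagnose("192.168.56.161 TD1\n", *server))
```

Running the same check against the real hosts content of a client is a quick way to see which of the two errors to expect before starting taos.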
So why did we say above that the server's FQDN parameter should be set manually whenever clients need to connect to TDengine remotely?
Here’s the thing: because TDengine uses the machine's hostname as the FQDN parameter value by default, many newly installed database services end up with FQDN parameter values such as “localhost” or “Ubuntu”. If the client's hostname also happens to be localhost or Ubuntu, the client resolves that name to 127.0.0.1 and connects to itself.
This problem is very common among new users, so it is best to set a new FQDN value yourself.
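A minimal sketch of this pitfall, assuming the client's name resolution is represented as a plain dictionary: connects_to_itself is a hypothetical helper, and the sample table (including the 127.0.1.1 entry that Debian and Ubuntu systems commonly create for their own hostname) is illustrative only.

```python
import ipaddress
from typing import Dict, Optional


def connects_to_itself(server_fqdn: str, client_hosts: Dict[str, str]) -> Optional[bool]:
    """Return True if the client would resolve server_fqdn to a loopback address.

    Returns None when the name is not in the client's table at all.
    """
    ip = client_hosts.get(server_fqdn.lower())
    if ip is None:
        return None
    # Any address in 127.0.0.0/8 (or ::1) points back at the client itself.
    return ipaddress.ip_address(ip).is_loopback


if __name__ == "__main__":
    client_hosts = {
        "localhost": "127.0.0.1",
        "ubuntu": "127.0.1.1",
        "td1": "192.168.56.161",
    }
    for name in ("localhost", "Ubuntu", "TD1", "TD2"):
        print(name, connects_to_itself(name, client_hosts))
```

A True result means the default value is unsafe for remote clients, which is the reason for choosing a distinct FQDN value.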
The above is only for single-node database connections. The situation is slightly different in clusters, but the principle is always the same.
As shown in the following figure, TDengine is deployed on machines A, B, and C to form a cluster. Each node resolves FQDNs to IP addresses through its hosts file (or DNS service), and the nodes communicate with each other over the network layer.
For example, when TD-A sends a message to TD-B, TD-A needs to find the IP address of TD-B. Therefore, we need to add node B to node A’s hosts file (or DNS service).
Similarly, when TD-B sends a message to TD-A, TD-B needs to find the IP address of TD-A locally. Therefore, we need to add node A to node B’s hosts file (or DNS service).
The same applies to TD-C.
Therefore, if “Unable to resolve FQDN” appears when nodes communicate with each other, the corresponding FQDN is missing from the hosts file (or DNS service) of one of the parties.
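The rule "every node must be able to resolve every other node" can be checked mechanically. The following is a sketch under stated assumptions: missing_entries is a hypothetical helper, each node's hosts content is represented as a dictionary, and the IP addresses are made up for the example.

```python
from typing import Dict, List


def missing_entries(
    cluster: List[str], hosts_by_node: Dict[str, Dict[str, str]]
) -> Dict[str, List[str]]:
    """For each node, list the peer FQDNs absent from that node's hosts table."""
    report: Dict[str, List[str]] = {}
    for node, hosts in hosts_by_node.items():
        missing = sorted(
            fqdn for fqdn in cluster if fqdn != node and fqdn not in hosts
        )
        if missing:
            report[node] = missing
    return report


if __name__ == "__main__":
    cluster = ["TD-A", "TD-B", "TD-C"]
    hosts_by_node = {
        "TD-A": {"TD-B": "192.168.56.162", "TD-C": "192.168.56.163"},
        "TD-B": {"TD-A": "192.168.56.161"},  # TD-C was forgotten here
        "TD-C": {"TD-A": "192.168.56.161", "TD-B": "192.168.56.162"},
    }
    print(missing_entries(cluster, hosts_by_node))
```

In this example only TD-B is reported, because it cannot resolve TD-C; that is the side on which the resolution error would occur.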
Next we add the client:
(A note on TDengine's architecture: every installation package includes a client, so a client was already involved in the situations above. The client discussed in this part is one deployed separately from the server.)
Communication between the client and the cluster is where mistakes are most often made. Because of TDengine’s point-to-point design, users easily overlook network problems with cluster servers other than the one they connect to.
The architecture of a normal remote client connection to a cluster should look like the following: TD-A, TD-B, and TD-C all need to be in the hosts file (or DNS service) on the client.
A problem on any of the FQDN-based links above will cause trouble, but such errors are usually hidden. TDengine is a distributed big data processing engine, so its data does not live on a single node, nor does it exist in only one copy. If the client does not add every node's FQDN to its hosts file (or DNS service), the following may happen:
A few days ago, you set up a cluster and ran show dnodes to confirm that all nodes were ready. You could query a few tables and write to a few tables.
But at some later point, you suddenly find that TDengine reports an error when writing to one table, while writes to other tables still succeed. How does that happen? Is it a bug?
Not really.
Databases in a cluster typically have multiple replicas, which means that a virtual data node (vnode) has several copies organized as one master and one or more slaves. TDengine queries can be served by any master or slave, but writes can only be performed on the master. So, if you write to a table whose master happens to be on a node that the client cannot reach via its FQDN, the write operation reports an error.
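The effect can be illustrated with a toy model. This is not how TDengine routes requests internally; Vnode, can_read, and can_write are hypothetical names, and the placement of masters and slaves is invented purely to show why some tables remain writable while others fail.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Vnode:
    """Placement of one vnode group: the master's node plus the slaves' nodes."""
    master: str
    slaves: List[str]


def can_read(vnode: Vnode, reachable: Set[str]) -> bool:
    # Queries may be served by the master or by any slave.
    return any(node in reachable for node in [vnode.master, *vnode.slaves])


def can_write(vnode: Vnode, reachable: Set[str]) -> bool:
    # Writes are only accepted by the master.
    return vnode.master in reachable


if __name__ == "__main__":
    # The client forgot to add TD-C to its hosts file.
    reachable = {"TD-A", "TD-B"}
    tables = {
        "t1": Vnode(master="TD-A", slaves=["TD-B"]),
        "t2": Vnode(master="TD-C", slaves=["TD-A"]),
    }
    for name, vnode in tables.items():
        print(name, "read:", can_read(vnode, reachable),
              "write:", can_write(vnode, reachable))
```

In this model t2 can still be queried through its slave on TD-A, but writes to it fail because its master lives on the node the client cannot resolve.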
In fact, the error logic for cluster connections is similar to the standalone case: “Unable to resolve FQDN” is reported if the client machine does not have the correct FQDN configured in its hosts file (or DNS service), and “Unable to establish connection” or “Database not ready” is reported if the FQDN is configured but its IP address is wrong.
Therefore, these problems are usually caused by configuration omissions. The official documentation reads as follows:
“The client also needs to be configured to ensure that it can correctly resolve the FQDN configuration for each node, either through the DNS service or the hosts file.”
Therefore, the easiest way to verify the configuration is to check the hosts file (or DNS service) of every node and confirm that the entries for the cluster nodes are the same everywhere.
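This comparison can also be sketched in code. The helper below is hypothetical and assumes the hosts content of each machine (the cluster nodes plus the client) has already been collected into dictionaries; it flags every cluster FQDN that is missing on some machine or mapped to different IP addresses on different machines. The addresses are invented for the example.

```python
from typing import Dict, List, Optional


def inconsistent_names(
    tables: Dict[str, Dict[str, str]], cluster: List[str]
) -> Dict[str, Dict[str, Optional[str]]]:
    """Return, per cluster FQDN, what each machine resolves it to when they disagree."""
    report: Dict[str, Dict[str, Optional[str]]] = {}
    for fqdn in cluster:
        # None marks a machine that has no entry for this FQDN.
        seen = {machine: table.get(fqdn) for machine, table in tables.items()}
        if len(set(seen.values())) > 1 or None in seen.values():
            report[fqdn] = seen
    return report


if __name__ == "__main__":
    tables = {
        "TD-A": {"TD-A": "192.168.56.161", "TD-B": "192.168.56.162"},
        "TD-B": {"TD-A": "192.168.56.161", "TD-B": "192.168.56.162"},
        "client": {"TD-A": "192.168.56.161", "TD-B": "192.168.56.126"},
    }
    print(inconsistent_names(tables, ["TD-A", "TD-B"]))
```

Here only TD-B is reported, and the per-machine view shows that the client is the one holding the mistyped address.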
If you have read this far and thought it through, I’m sure FQDN is no longer a mystery to you. Moreover, because FQDN problems in certain scenarios are tied to characteristic features of TDengine, working through them gives you a deeper understanding of the TDengine architecture and a firmer foundation for your future use.
Finally, here is a little secret: TDengine will further improve its error messages in the future. Taking FQDN as an example, the message will tell you which node was unable to resolve the FQDN of which other node, so that such problems become easier to handle.