Previous in this series: Everything Grows — Practice of Progressive Development of Photovoltaic Cloud System (I)

MongoDB — Practice of Progressive Development of Photovoltaic Cloud System (II)

Database selection

SQL databases are commonly used in traditional industrial monitoring systems.

Such databases typically contain two types of data tables: real-time and historical. Data collected periodically from the devices is first written to the real-time table, which stores only the latest values: each new record overwrites the previous one, so the table holds data for only a single moment, maximizing read and write speed. The history table periodically reads the data from the real-time table through dedicated stored procedures and appends a time field to turn each row into a historical record. Depending on the data volume, the history table may be split into multiple tables by year, month, or day.

In the development of the first-generation photovoltaic cloud system, we adopted an SQL database, and the table design was copied from the traditional on-site monitoring system. However, we soon ran into problems:

  1. Data enters the database at the same frequency as historical records are kept, namely every 5 minutes, so the record in the real-time table is simply the most recent record in the history table. Moreover, the analysis functions usually need data over a period of time, so although the real-time table is constantly being overwritten, it is almost useless to them.
  2. The fixed splitting of the history table by year or month does not necessarily match the needs of the functional design, and some “flexible” query functions become very cumbersome to implement.


The raw equipment data of the photovoltaic cloud system is called “core data”. The problems above made us re-examine the nature of this core data.

The traditional monitoring system is deployed on site and its main function is real-time monitoring. Since the photovoltaic cloud system adopts a Web technology stack based on the HTTP protocol and usually has to transmit data over the Internet, it is not well suited to real-time monitoring; instead, it mainly implements functions such as analysis and management. We therefore regard core data as something closer to “timed log information”:

  1. Once stored, core data is mainly used for queries; it almost never needs to be modified.
  2. As the raw material for big-data analysis, core data is usually consumed after the fact, and a single record has little influence on the results. Therefore, compared with the data of on-site systems, core data does not have strict requirements on real-time delivery or completeness.
  3. Core data varies greatly between equipment models, so there is no unified, fixed point table as in on-site systems.
  4. Because power stations scattered across different locations often have to send data over GPRS, bandwidth is limited and the collection interval cannot be too dense, generally no shorter than 5 minutes.


In short, core data emphasizes unstructured recording and querying with low transactional requirements, which makes it very similar to server log information.

Given this, the database choice follows naturally. Since transactional requirements are low and the focus is on querying large volumes of data, we chose a NoSQL database over an SQL one. Among the mainstream NoSQL databases: we do not need extreme real-time read/write performance, so we ruled out Redis; we are not deployed on Hadoop and the data is not column-oriented, so HBase was not used; for log-type data, the document-oriented MongoDB is the natural choice.

In addition, the operating language of MongoDB is JavaScript, which is consistent with the overall technology stack of our system. Document-oriented storage also lets us design the archive collections in a denormalized way, trading a slight increase in archive data volume for great convenience in querying.

MongoDB no longer has the concepts of tables, fields and records. The database organizes data by Collection, and each piece of data is called a Document. For core data, we use one collection per physical device. Compared with putting all data of devices of the same model into a single collection, the amount of data in each collection is small and easy to retrieve, and the device identifier field can be omitted from the documents.

The downside is that the number of collections in the database grows as new plants and devices are added. With the “--nssize” startup parameter, the MongoDB namespace file (the .ns file) can be up to 2 GB, which supports a maximum of about 3.4 million namespace entries. Each core data collection consumes three entries: the collection name, the “_id” index and the time index, so roughly 1 million devices can be accommodated at most. This is enough for our expected business volume.
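As a rough back-of-the-envelope check (assuming the MMAPv1 figure of 628 bytes per namespace entry, which is our assumption here, not stated in the article):

```javascript
// Rough namespace capacity estimate for MongoDB's MMAPv1 .ns file.
// Assumption: each namespace entry occupies 628 bytes; the file can
// grow to at most 2 GB via the --nssize startup parameter.
const NS_FILE_MAX_BYTES = 2 * 1024 ** 3; // 2 GB
const NS_ENTRY_BYTES = 628;              // assumed per-entry size
const ENTRIES_PER_DEVICE = 3;            // collection + _id index + time index

const maxEntries = Math.floor(NS_FILE_MAX_BYTES / NS_ENTRY_BYTES);
const maxDevices = Math.floor(maxEntries / ENTRIES_PER_DEVICE);

console.log(maxEntries); // ~3.4 million namespace entries
console.log(maxDevices); // ~1.1 million devices
```

Both figures line up with the estimates above.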

The core data set is named as follows:

data.model_xxxx.id_0001
data.model_xxxx.id_0002
data.model_yyyy.id_0003
data.model_yyyy.id_0004

Here xxxx indicates the device model and 0001 the unique device ID, so JavaScript template strings can be used to easily assemble the corresponding collection name.
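For example, a small helper (the function name is ours, for illustration only) can assemble the collection name:

```javascript
// Assemble the per-device core data collection name from the device
// model and device ID, following the "data.model_<model>.id_<id>" scheme.
// The helper name `coreCollectionName` is illustrative, not from the system.
function coreCollectionName(model, deviceId) {
  return `data.model_${model}.id_${deviceId}`;
}

console.log(coreCollectionName('xxxx', '0001')); // "data.model_xxxx.id_0001"
```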

As mentioned in the overall design of the system, apart from the core data collections, all other business information in the database is stored in archive collections. According to the current development requirements, they mainly fall into the following three types:

  1. Information class, which stores data such as user information and power station coordinates. Subcollection names start with “i_”, for example “file.i_plant_location”.
  2. Relationship class, which stores information such as the subordination between power stations and devices. Subcollection names start with “r_”, for example “file.r_plant_vdev”.
  3. Mapping class, which stores key mappings used during data model transformation. Subcollection names start with “m_”, for example “file.m_vdev_model”.


Other types of data that may come up later, such as server logs and business work orders, will be designed progressively as the need arises; they are not developed at this stage.

Normal forms

As we know, the design of SQL database tables follows the normal forms of Normalization:

  • First Normal Form (1NF): all fields must be atomic, meaning each column of a table is an indivisible atomic data item.
  • Second Normal Form (2NF): on the basis of 1NF, every non-key attribute must fully depend on the candidate keys.
  • Third Normal Form (3NF): on the basis of 2NF, no non-key attribute may depend on another non-key attribute.


Normalized table design ensures that there is no redundant information in the stored data, saving storage space. However, queries then have to join multiple tables, which complicates the query logic and degrades performance.

When using a NoSQL database such as MongoDB, Denormalization is often adopted instead: collections are not designed strictly according to the normal forms, trading an increase in data volume for convenience in querying. The archive collections store information such as relationships, which is relatively stable, small in volume and rarely grows, while queries against it are very frequent and varied. Therefore, the normalization constraints can be appropriately relaxed in the design of the archive collections. For example, the subordination between a power station and its virtual inverters can be stored in a document as follows:

{
  plantId: '0001',
  vinv: [
    {id: '00045', model: 'XXXX'},
    {id: '00046', model: 'XXXX'},
    {id: '00047', model: 'YYYY'},
    {id: '00048', model: 'YYYY'}
  ]
}

In this way, although the information is redundant, the IDs and models of all virtual inverters under a power station can be obtained in a single read, which is very convenient.
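As a sketch in plain JavaScript (field names as in the document above), a single document read is enough to list a plant's virtual inverters, with no join against a separate device table:

```javascript
// A denormalized plant document, as stored in the archive collection:
// one read returns the plant together with all of its virtual inverters.
const plantDoc = {
  plantId: '0001',
  vinv: [
    { id: '00045', model: 'XXXX' },
    { id: '00046', model: 'XXXX' },
    { id: '00047', model: 'YYYY' },
    { id: '00048', model: 'YYYY' }
  ]
};

// Collect the inverter IDs directly from the embedded array.
const inverterIds = plantDoc.vinv.map(v => v.id);
console.log(inverterIds); // ['00045', '00046', '00047', '00048']
```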

Persisting mappings and dynamic model creation

For the business functions, the core data must be converted into virtual device data. The model of virtual device data is a standard JavaScript object whose members may themselves be objects; the data is hierarchical and does not follow the first normal form. The structure of the core data mirrors the point table of the actual device and generally does conform to the first normal form. The mapping between the core data and the virtual device data model is kept in the database rather than hard-coded in the program, so that new device models can be registered and added after the system goes live.

The object that stores the mapping relationship is isomorphic with the virtual device data model object, and is indexed jointly by virtual device type and device model:

{
  vdev: 'vcom',
  model: 'XXXX',
  dataMapper: {
    pTotal: 'Ppv',
    uTotal: 'Ipv',
    strings: [
      {i: 'Istring1'},
      {i: 'Istring2'},
      {i: 'Istring3'},
      {i: 'Istring4'}
    ]
  }
}
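To illustrate how such a mapper might be applied (a minimal sketch of our own; the article does not show the actual conversion code), a recursive walk over `dataMapper` can translate a flat core data record into the hierarchical virtual device object:

```javascript
// Recursively apply a dataMapper to one flat core-data record.
// Strings in the mapper are treated as source field names; nested
// objects and arrays are walked to rebuild the hierarchical shape.
// This helper is illustrative, not the system's actual implementation.
function applyMapper(mapper, record) {
  if (typeof mapper === 'string') return record[mapper];
  if (Array.isArray(mapper)) return mapper.map(m => applyMapper(m, record));
  const out = {};
  for (const key of Object.keys(mapper)) {
    out[key] = applyMapper(mapper[key], record);
  }
  return out;
}

// A flat record as it might arrive from a device of model 'XXXX'
// (the field values are made up for the example).
const record = { Ppv: 1200, Ipv: 5.2, Istring1: 1.3, Istring2: 1.3 };

const mapper = {
  pTotal: 'Ppv',
  uTotal: 'Ipv',
  strings: [{ i: 'Istring1' }, { i: 'Istring2' }]
};

console.log(applyMapper(mapper, record));
// { pTotal: 1200, uTotal: 5.2, strings: [ { i: 1.3 }, { i: 1.3 } ] }
```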

In the back-end system, mongoose is used as the ODM framework. Mongoose describes the structure of a collection through a Schema, and uses the Schema as a template to create the model corresponding to the collection. Each type of equipment has its own core data structure, and considering the large number of equipment models and the later addition of new ones, it is not appropriate to hard-code Schemas in the program. Instead, we persist the parameter object of the constructor `new mongoose.Schema(definition)` in the database. When the core data of a certain type of equipment needs to be operated on, the corresponding definition object is looked up to dynamically create the Schema, and then `mongoose.model(modelName, schema, collectionName)` is called to dynamically create the model for that specific device's collection.
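A minimal sketch of this dynamic creation step (the caching strategy and all names are our own illustration; a stub stands in for the mongoose library so the sketch is self-contained and runs without a database):

```javascript
// Dynamically create (and cache) a model for a device's core data
// collection from a persisted schema definition.
// `mongooseLib` is expected to expose `Schema` and `model` like mongoose;
// the cache avoids recreating the same model twice.
const modelCache = new Map();

function getDeviceModel(mongooseLib, definition, deviceModel, deviceId) {
  const collectionName = `data.model_${deviceModel}.id_${deviceId}`;
  if (modelCache.has(collectionName)) return modelCache.get(collectionName);
  const schema = new mongooseLib.Schema(definition);   // new mongoose.Schema(definition)
  const model = mongooseLib.model(collectionName, schema, collectionName);
  modelCache.set(collectionName, model);
  return model;
}

// Stub standing in for the real mongoose library:
const fakeMongoose = {
  Schema: class { constructor(def) { this.def = def; } },
  model: (name, schema, coll) => ({ name, schema, coll })
};

const m = getDeviceModel(fakeMongoose, { time: Date, Ppv: Number }, 'xxxx', '0001');
console.log(m.coll); // "data.model_xxxx.id_0001"
```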

Database security

When it comes to MongoDB, many people may have security concerns. In January 2017, MongoDB instances were hit by a massive attack in which data was deleted and held for ransom in Bitcoin. However, the attack was not caused by a vulnerability in MongoDB itself, but by the fact that many deployments ran without adequate security settings. Many post-mortems have been published online; the following measures can effectively improve database security:

  1. Create an authorized user and set a password.
  2. Disable direct Internet access to the database.
  3. Change the default database port.
  4. Back up your data regularly.
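For instance, steps 1 to 3 might look like this in the mongo shell and on the command line (the user name, password and port below are placeholders, not our actual settings):

```javascript
// Run in the mongo shell.
// Step 1: create an authorized admin user with a password
// (all credentials here are placeholders).
db.getSiblingDB('admin').createUser({
  user: 'cloudAdmin',
  pwd: 'CHANGE_ME',
  roles: [{ role: 'userAdminAnyDatabase', db: 'admin' }]
});

// Steps 2 and 3 are applied when starting mongod, e.g.:
//   mongod --auth --bind_ip 127.0.0.1 --port 27117
// --auth enforces the user accounts created above, --bind_ip blocks
// direct Internet access, and --port moves the service off the
// default 27017.
```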


