Policy-based distributed performance monitoring method
Technical field
The present invention relates to the extraction, processing and storage of performance management data, and to a method for uniformly and flexibly deploying performance monitoring tasks according to policies. The method is a distributed performance monitoring method for next-generation networks. Such networks are growing in scale and heterogeneity, their services are diversifying and increasing sharply in number, and they change dynamically. The invention belongs to the field of network management technology.
Background art
With the advance of next-generation networks, network scale is growing and heterogeneity is increasing. Services are diversifying and their number is rising sharply, and networks change dynamically and may even be self-organizing. The traditional network management model cannot accommodate the distribution, scalability, dynamism and extensibility of present-day networks. Next-generation network management has the following requirements:
1) Distribution: in an open and flexible network environment, network management must itself be distributed and extensible. Management tasks must also be flexibly distributed and executed automatically.
2) Policy-based, real-time operation: traditional network management is static. Management operations usually have to be performed manually by an operator, so management is not real-time and can hardly react in time to events in a large network. Current network management products generally adopt policy-based network management, which is dynamic. Management is carried out automatically and in real time according to policies customized in advance, and the whole network is handled promptly and effectively with all relevant factors taken into account.
3) Scalable topology discovery: the topology of today's increasingly complex networks must be discovered automatically, quickly, completely and accurately, while imposing as little extra load on the network as possible. This requires making full use of the strengths of the various topology discovery methods and techniques and combining several of them. It also requires applying an effective algorithm suited to the characteristics of the network or subnet being managed.
The distributed network management architecture combines centralized and hierarchical management (as shown in Fig. 1). Multiple network management workstations are deployed in the network, forming multiple management domains, so that management tasks are dispersed. The emphasis is on the decentralization of management functions and the equality of management nodes. The distributed model suits network systems that are multi-service, carry large amounts of information, must be flexibly extensible and must handle heterogeneity. Large networks with complex environments therefore often adopt a distributed architecture, in which better-performing nodes take on more management tasks.
Performance management analyzes network performance, resource utilization and related communication activity. It helps network administrators evaluate the state and efficiency of network resources and communication. It mainly provides performance monitoring, performance analysis and performance management control functions. It generally monitors the network chiefly by querying, monitoring and analyzing performance parameters. Its main task is performance monitoring, which mainly comprises the following steps:
- collecting the variables and performance parameters of interest to the network administrator, analyzing these data, judging whether each managed device is operating at a normal level, and generating corresponding reports;
- determining a suitable performance threshold for each important variable, where exceeding the threshold indicates a service anomaly that needs attention;
- adjusting the operating parameters of the relevant network devices, server performance parameters or application service parameters according to performance statistics, so as to improve the quality of network service.
However, because the network changes dynamically, device performance parameters such as processing capacity, storage capacity, power supply and link quality change continuously. A managed device may be about to reach a bottleneck because its memory is exhausted or its power supply is running low. A management node needs to perceive such events and take appropriate measures. A network management workstation may also degrade gradually as its management load increases, causing management to fail or the workstation to withdraw from management. During such dynamic change, a managed device is likely to belong to different management domains at different times. Traditional centralized performance monitoring makes monitoring tasks fixed and singular: a monitoring task can generally be generated and executed on only one management node. It is therefore particularly important in next-generation network management to deploy uniform, flexible performance monitoring tasks for managed devices. Each monitoring task must be able to move dynamically between management workstations while remaining consistent and traceable.
Summary of the invention
Technical problem: the object of the invention is to provide a policy-based distributed performance monitoring method. With this method, distributed network management stations monitor managed devices according to the policy file of the network management server, using the monitoring task as the unit of work. Each monitoring task is uniform, flexible and consistent.
Technical scheme: the policy-based distributed performance monitoring method of the present invention comprises the following steps:
Step 1: the network management (NM) server defines a series of management policies from global information. These policies are described as XML files and are kept at a certain level of abstraction;
Step 2: the NM server distributes the policies to each management station through the data synchronization module;
Step 3: each management station analyzes the distributed policies according to the characteristics of its own management domain. It then converts them into more detailed management policies suited to that domain;
Step 4: the management station deploys and executes the corresponding monitoring tasks;
Step 5: data produced during monitoring are uploaded to the NM server through the database synchronization module;
Step 6: the NM server computes statistics on and analyzes the management data collected from the management stations, and then takes appropriate measures;
Step 7: if, for some reason, monitoring must be transferred to another management station, the detailed management policy and the management data are transferred to that station through the data synchronization module.
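Steps 5 and 6 can be illustrated with a minimal Python sketch. It assumes that each uploaded monitoring record carries the reporting station, the managed IP address, the monitored content, a timestamp and a value. The names Record and NMServer are hypothetical, and an in-memory list stands in for the database synchronization module. Records are grouped by device and content rather than by station. As a result, data reported by two stations before and after a task transfer are analyzed together.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Record:
    """One uploaded monitoring sample (hypothetical record layout)."""
    station: str   # management station that collected the sample
    ip: str        # managed device
    content: str   # monitored item, e.g. "CPU_RATE"
    ts: float      # collection time in seconds
    value: float   # measured value


class NMServer:
    """Hypothetical NM server side: receives uploads and computes statistics."""

    def __init__(self) -> None:
        self.records: list[Record] = []

    def receive(self, batch: list[Record]) -> None:
        # Stands in for the database synchronization module (step 5).
        self.records.extend(batch)

    def summarize(self) -> dict[tuple[str, str], dict[str, float]]:
        # Group by (device, content) so data from different stations merge (step 6).
        groups: dict[tuple[str, str], list[float]] = defaultdict(list)
        for r in self.records:
            groups[(r.ip, r.content)].append(r.value)
        return {key: {"n": len(vals), "mean": mean(vals), "max": max(vals)}
                for key, vals in groups.items()}

    def hotspots(self, content: str, limit: float) -> list[str]:
        # Devices whose average for `content` exceeds `limit` may need action.
        return sorted(ip for (ip, c), s in self.summarize().items()
                      if c == content and s["mean"] > limit)
```

For example, suppose samples of CPU_RATE for 10.10.136.216 arrive first from station NMS-A and, after a transfer, from NMS-B. The server then reports them as one series for that device.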
The policy description principle of step 1 is as follows. Policies are kept as abstract as possible and are defined according to the characteristics of each management domain. Further analysis and description of concrete policies are left to the management station. The policy file describes, in abstract form, only the monitoring acquisition mode, the IP address of the managed node, the monitoring content and the monitoring grade.
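An abstract policy of this kind (acquisition mode, IP address, content, grade) could be read on the management station as in the following sketch. It uses only Python's standard xml.etree.ElementTree module. It assumes the element and attribute names of the NM server example given later in this description: Policy_NMS, task, Type, IP, Content and Grade. The AbstractTask class, the default grade and the range check are illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractTask:
    """Abstract monitoring task as distributed by the NM server."""
    acquisition: str  # acquisition mode, e.g. "SNMP"
    ip: str           # IP address of the managed node
    content: str      # monitoring content, e.g. "CPU_RATE"
    grade: int        # monitoring grade, 1 (highest) to 5 (lowest)


def parse_nms_policy(xml_text: str) -> list[AbstractTask]:
    """Parse an already-decoded abstract policy file into task descriptors."""
    # Drop the XML declaration: the text is already a Python str, so its
    # encoding label (e.g. GB2312) no longer matters to the parser.
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", xml_text)
    root = ET.fromstring(body)
    tasks = []
    for el in root.iter("task"):
        grade = int(el.get("Grade", "5"))  # assumed default: lowest grade
        if not 1 <= grade <= 5:
            raise ValueError(f"monitoring grade out of range: {grade}")
        tasks.append(AbstractTask(el.get("Type", ""), el.get("IP", ""),
                                  el.get("Content", ""), grade))
    return tasks
```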
In step 3, the management station analyzes the policy distributed by the NM server as follows. The management station analyzes the NM server's policy file and combines it with the characteristics of the managed node to produce a detailed monitoring policy. The detailed policy specifically comprises: the SNMP version, OID type, IP address, monitoring base value, rising threshold, falling threshold, collection time interval, alarm grade and a description of the monitoring task.
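The refinement of step 3 can be sketched as a lookup against a local, per-domain profile. The profile contents below are hypothetical: the threshold values, the collection interval and the default SNMP version. The OID shown for CPU_RATE is hrProcessorLoad from the standard HOST-RESOURCES-MIB. The DetailedTask fields mirror the items listed in step 3.

```python
from dataclasses import dataclass

# Hypothetical local profile for one management domain: how each abstract
# content type is collected here and which thresholds apply.
DOMAIN_PROFILE = {
    "CPU_RATE": {
        "oid": "1.3.6.1.2.1.25.3.3.1.2",  # hrProcessorLoad (HOST-RESOURCES-MIB)
        "oid_type": "Integer",
        "value_base": 0.0,
        "up_threshold": 90.0,    # rising threshold, percent
        "down_threshold": 70.0,  # falling threshold, percent
        "interval_s": 60,
    },
}


@dataclass
class DetailedTask:
    """Detailed monitoring policy entry produced on the management station."""
    snmp_version: str
    oid_type: str
    ip: str
    oid: str
    value_base: float
    up_threshold: float
    down_threshold: float
    collect_interval_s: int
    alarm_grade: int
    description: str


def refine(acquisition: str, ip: str, content: str, grade: int,
           snmp_version: str = "v2c") -> DetailedTask:
    """Turn one abstract task (mode, IP, content, grade) into a detailed one."""
    if acquisition != "SNMP":
        raise NotImplementedError(f"no local collector for {acquisition}")
    p = DOMAIN_PROFILE[content]  # KeyError if this domain cannot monitor it
    return DetailedTask(
        snmp_version=snmp_version,
        oid_type=p["oid_type"],
        ip=ip,
        oid=p["oid"],
        value_base=p["value_base"],
        up_threshold=p["up_threshold"],
        down_threshold=p["down_threshold"],
        collect_interval_s=p["interval_s"],
        alarm_grade=grade,  # assumption: alarm grade follows monitoring grade
        description=f"{content} on {ip}",
    )
```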
The monitoring tasks of step 4 mainly comprise device performance monitoring and network performance monitoring. Device performance monitoring covers device process information, disk utilization, memory utilization and CPU utilization. It is mainly aimed at the important node devices that provide services in the network:
1) CPU utilization: monitor CPU usage, check whether the CPU has been running at full load for a long time, and record the CPU's operating history. With a threshold set, CPU overload is detected and an alarm is raised;
2) Memory utilization: monitor memory usage, check whether memory utilization has been high for a long time, and record memory usage history. With a threshold set, memory overload is detected and an alarm is raised;
3) Disk utilization: monitor disk usage. With a threshold set, an alarm is raised when the remaining disk space falls below the threshold, and in that case a user-supplied program or script can also be run to resolve the problem;
4) Process monitoring: monitor the running state of important processes in the system and check whether their CPU and memory usage has been high for a long time. With a threshold set, an alarm is raised when a process exceeds the threshold for a long time, so that the administrator can make adjustments;
5) Power monitoring: monitor the remaining power capacity and remaining time. An alarm is raised when they fall below the threshold, so that the management node or the node itself can take appropriate measures;
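The threshold checks in items 1) to 5) can be sketched with rising and falling thresholds, as named in step 3. A persistence count is added to capture "for a long time". This is a minimal sketch; the class name, the hysteresis behaviour and the default of three consecutive samples are assumptions. For quantities where a low value is the problem, such as free disk space or remaining power, the caller can feed the complementary quantity instead (for example, used rather than free space). The optional on_alarm hook stands in for the user-supplied program or script of item 3).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ThresholdMonitor:
    """Rising/falling threshold check with a persistence requirement."""
    rising: float                 # alarm when value stays at or above this
    falling: float                # clear when value drops to or below this
    sustain: int = 3              # consecutive samples that count as "long-term"
    on_alarm: Optional[Callable[[float], None]] = None  # e.g. run a cleanup script
    events: List[str] = field(default_factory=list)
    _above: int = 0
    _alarmed: bool = False

    def feed(self, value: float) -> Optional[str]:
        """Process one sample; return "ALARM", "CLEAR" or None."""
        if self._alarmed:
            # Hysteresis: stay alarmed until the falling threshold is crossed.
            if value <= self.falling:
                self._alarmed, self._above = False, 0
                self.events.append("CLEAR")
                return "CLEAR"
            return None
        self._above = self._above + 1 if value >= self.rising else 0
        if self._above >= self.sustain:
            self._alarmed = True
            self.events.append("ALARM")
            if self.on_alarm is not None:
                self.on_alarm(value)
            return "ALARM"
        return None
```

Separate rising and falling thresholds keep a value that hovers around a single limit from producing a stream of alternating alarms.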
Network performance monitoring reads, at regular intervals, the inbound and outbound byte counts and the inbound and outbound erroneous byte counts of each port of a network device. It processes these into raw port-traffic data. It then applies the performance monitoring formulas for availability, error rate and similar aspects to obtain network performance indices, which are the principal means of reflecting network performance. The main indices are error rate, utilization and packet loss rate.
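The index computation can be sketched from two successive counter samples. The formulas are one common choice, assumed here rather than taken from the method:
- utilization uses the larger of the inbound and outbound rates, which assumes a full-duplex link;
- the error rate is errored packets over all packets seen;
- the loss rate is discarded packets over all packets seen.

Errors are counted in packets rather than bytes, which is also an assumption. Counters are treated as 32-bit and may wrap once between samples. The counter names are hypothetical stand-ins for the corresponding interface counters.

```python
COUNTER32_MOD = 2 ** 32


def counter_delta(prev: int, curr: int, mod: int = COUNTER32_MOD) -> int:
    """Increase of a wrapping counter between two reads (at most one wrap)."""
    return (curr - prev) % mod


def port_indices(prev: dict, curr: dict, interval_s: float,
                 if_speed_bps: float) -> dict:
    """Utilization, error rate and loss rate for one port over one interval."""
    d = {k: counter_delta(prev[k], curr[k]) for k in prev}
    # Full-duplex assumption: each direction has the full link speed.
    busiest_octets = max(d["in_octets"], d["out_octets"])
    utilization = busiest_octets * 8 / (interval_s * if_speed_bps)
    packets = d["in_pkts"] + d["out_pkts"]
    errors = d["in_errors"] + d["out_errors"]
    discards = d["in_discards"] + d["out_discards"]
    seen = packets + errors + discards
    return {
        "utilization": utilization,
        "error_rate": errors / seen if seen else 0.0,
        "loss_rate": discards / seen if seen else 0.0,
    }
```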
Beneficial effects: the advantages of this method are as follows.
1) The policy-based distributed performance monitoring model decentralizes monitoring tasks. This solves a problem of traditional centralized network management, where performance drops sharply as network scale grows and a bottleneck results.
2) Distributed policy management and the policy-based performance monitoring description method keep monitoring tasks consistent and continuous across management domains. A monitoring task can therefore be continued and completed by different management nodes. The entire process can be tracked and recorded in detail, which meets the needs of dynamic network management.
3) In policy management, the NM server defines abstract monitoring policies and each management station refines them into local monitoring policies. The policies are described in XML, which gives policy distribution and storage great flexibility and platform independence.
4) The method can meet the management requirements of different network environments simply by adjusting the policy descriptions and the way the management stations implement them. It is therefore very widely applicable.
Brief description of the drawings
Fig. 1 is a schematic diagram of distributed network management.
Fig. 2 is a module structure diagram of policy-based performance monitoring as implemented in a management station.
Embodiment
The NM server generates monitoring policies according to global network management needs. A policy describes abstract monitoring tasks, including the device address, monitoring content and monitoring grade. The NM server then distributes management policies according to the characteristics of each management domain. Each management station deploys detailed monitoring tasks according to the policy description, and such policies may also be generated by a local administrator as needed. Management data and alarm information generated during monitoring are stored in the local database in file form.

A monitoring task may be transferred as network conditions change. For example, if a management station withdraws from management because of degraded performance or other reasons, its monitoring tasks must be transferred to other management nodes, which take over its management tasks. The monitoring policy file is transferred to the next management station together with the historical data and alarm data obtained so far. Management data produced by monitoring are passed to the NM server as files. The NM server then computes statistics on and analyzes the monitoring information and takes appropriate feedback measures. The method specifically comprises the following steps:
1) The NM server defines a series of management policies from global information. These policies are described as XML files and are kept at a certain level of abstraction.
2) The NM server distributes the policies to each management station through the data synchronization module.
3) Each management station analyzes the distributed policies according to the characteristics of its own management domain. It then converts them into more detailed management policies suited to that domain.
4) The management station deploys and executes the corresponding monitoring tasks.
5) Data produced during monitoring are uploaded to the NM server through the database synchronization module.
6) The NM server computes statistics on and analyzes the management data collected from the management stations, and then takes appropriate measures.
7) If, for some reason, monitoring must be transferred to another management station, the detailed management policy and the management data are transferred to that station through the data synchronization module.
The policy description principles and the monitoring content are described below.
Policy description principle of the NM server:
Policies defined in the NM server are as abstract as possible and are defined according to the respective characteristics of each management domain. Further analysis and description of the policies are left to the management station.
Policy description principle of the management station:
Traditional centralized performance monitoring, based on SNMP or other acquisition means, collects managed-node information at fixed intervals and analyzes performance on the management station. Network delay, bottlenecks at the network management workstation and similar factors give this approach many shortcomings in measuring network performance. The problem is more pronounced in self-organizing networks, where network organization takes many forms, device types are diverse, resources of all kinds are constrained and nodes change dynamically. In distributed network management, one cannot deploy monitoring tasks as in traditional performance monitoring, that is, tasks with a broad monitoring scope, fixed assignments, long monitoring periods and short collection intervals. Instead, each monitoring task must be deployed, executed and tracked effectively. Parameters such as collection time granularity, time span and thresholds must be set flexibly for each monitoring item. Each monitoring task must also be deployed in light of network management needs. The definitions are as follows:
1) Time granularity of monitoring collection: the collection interval must be set according to the characteristics of the network device. For relatively dynamic nodes, the collection interval should be shortened appropriately. Conversely, for less dynamic nodes, the collection interval can be lengthened appropriately, and for monitored quantities that change slowly it can be lengthened further.
2) Start and end of the monitoring time: in a dynamic network, a managed node may belong to different management nodes at different times. The start and end time of each monitoring task must therefore be set. This allows monitoring tasks and their data to be tracked and preserved, ensuring the integrity and continuity of monitoring.
3) Monitoring task grade: because network resources are limited, numerous monitoring tasks cannot be deployed as in traditional networks. A grade is therefore assigned to each monitoring task according to its content, and different grades are given different priorities. Network device performance can degrade for many reasons, but the symptoms are basically the same: high CPU utilization and a sharp drop in free system memory. Battery-powered devices may exhaust their power and withdraw from management, so power-related information must also be monitored. Monitoring of CPU utilization, free system memory and power information therefore needs higher priority than other monitoring tasks.
4) Controllability of monitoring: monitoring tasks can be started or stopped at any time according to management needs.
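Items 1) to 4) can be combined in a small scheduling sketch. The interval rule in adapt_interval is a hypothetical choice: it shrinks the interval by up to a factor of four for highly dynamic nodes and stretches it for slowly changing quantities. The Task and Scheduler names are also hypothetical. Tasks outside their start/end window, or disabled by the administrator, are skipped. Due tasks are returned in grade order, so that grade-1 tasks such as CPU, memory and power monitoring run first.

```python
from dataclasses import dataclass
from typing import Dict, List


def adapt_interval(base_s: float, dynamism: float, slowness: float = 1.0,
                   lo_s: float = 5.0, hi_s: float = 3600.0) -> float:
    """Hypothetical rule: dynamism in [0, 1] shortens the interval;
    slowness >= 1 lengthens it for slowly changing quantities."""
    interval = base_s * (1.0 - 0.75 * dynamism) * slowness
    return max(lo_s, min(hi_s, interval))


@dataclass
class Task:
    task_id: str
    grade: int            # 1 = highest priority
    interval_s: float     # collection time granularity
    start: float          # monitoring start time
    end: float            # monitoring end time
    enabled: bool = True  # controllability: start/stop at any time


class Scheduler:
    def __init__(self) -> None:
        self.tasks: Dict[str, Task] = {}
        self.next_due: Dict[str, float] = {}

    def add(self, task: Task) -> None:
        self.tasks[task.task_id] = task
        self.next_due[task.task_id] = task.start

    def set_enabled(self, task_id: str, enabled: bool) -> None:
        self.tasks[task_id].enabled = enabled

    def due(self, now: float) -> List[str]:
        """Tasks to collect at `now`, highest grade first; reschedules them."""
        ready = [t for t in self.tasks.values()
                 if t.enabled and t.start <= now <= t.end
                 and self.next_due[t.task_id] <= now]
        ready.sort(key=lambda t: (t.grade, t.task_id))
        for t in ready:
            self.next_due[t.task_id] = now + t.interval_s
        return [t.task_id for t in ready]
```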
Content of performance monitoring at the management station:
Performance monitoring mainly comprises two aspects: device performance monitoring and network performance monitoring. Device performance monitoring covers device process information, disk utilization, memory utilization and CPU utilization. It is mainly aimed at the important node devices that provide services in the network, for example application servers.
1) CPU utilization: monitor CPU usage, check whether the CPU has been running at full load for a long time, and record the CPU's operating history. With a threshold set, CPU overload is detected and an alarm is raised.
2) Memory utilization: monitor memory usage, check whether memory utilization has been high for a long time, and record memory usage history. With a threshold set, memory overload is detected and an alarm is raised.
3) Disk utilization: monitor disk usage. With a threshold set, an alarm is raised when the remaining disk space falls below the threshold. In that case a user-supplied program or script can also be run to resolve the problem.
4) Process monitoring: monitor the running state of important processes in the system and check whether their CPU and memory usage has been high for a long time. With a threshold set, an alarm is raised when a process exceeds the threshold for a long time, so that the administrator can make adjustments.
5) Power monitoring: monitor the remaining power capacity and remaining time. An alarm is raised when they fall below the threshold, so that the management node or the node itself can take appropriate measures.
Network performance monitoring reads, at regular intervals, the inbound and outbound byte counts and the inbound and outbound erroneous byte counts of each port of a network device. It processes these into raw port-traffic data. It then applies the performance monitoring formulas for availability, error rate and similar aspects to obtain network performance indices, which are the principal means of reflecting network performance. It is mainly aimed at devices such as routers in the network.
Fig. 2 shows how each monitoring task is described by a policy file in policy-based distributed performance monitoring. The abstract policy file arrives from the NM server through the data synchronization module. The policy management module in the management station analyzes the NM server's policy according to the characteristics of its own management domain and generates a local, detailed monitoring policy file. This policy file may also be entered through the management station's GUI.

The policy file defines all information about the monitoring task in detail. The collection and analysis setup module sets acquisition parameters, alarm thresholds and related content according to the policy description. The alarm management module generates and handles alarm information according to the threshold settings. Each monitoring task is deployed flexibly as a plug-in. Its creation, execution, stopping and deletion are all recorded and tracked in detail. Management information can be collected with whatever acquisition means suits the characteristics of each managed device.
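Plug-in deployment with a full lifecycle record could look like the following sketch. TaskRegistry, its methods and the textual event names are hypothetical. Each plug-in is modelled as a zero-argument callable that performs one collection. An injectable clock keeps the audit trail deterministic in tests.

```python
import time
from typing import Callable, Dict, List, Tuple


class TaskRegistry:
    """Hosts monitoring tasks as plug-ins and records every lifecycle event."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._plugins: Dict[str, Callable[[], float]] = {}
        self._running: Dict[str, bool] = {}
        self.audit: List[Tuple[float, str, str]] = []  # (time, task id, event)
        self._clock = clock

    def _log(self, task_id: str, event: str) -> None:
        self.audit.append((self._clock(), task_id, event))

    def create(self, task_id: str, collector: Callable[[], float]) -> None:
        if task_id in self._plugins:
            raise KeyError(f"task already exists: {task_id}")
        self._plugins[task_id] = collector
        self._running[task_id] = False
        self._log(task_id, "created")

    def start(self, task_id: str) -> None:
        self._running[task_id] = True
        self._log(task_id, "started")

    def run_once(self, task_id: str) -> float:
        if not self._running[task_id]:
            raise RuntimeError(f"task not running: {task_id}")
        value = self._plugins[task_id]()
        self._log(task_id, f"executed value={value}")
        return value

    def stop(self, task_id: str) -> None:
        self._running[task_id] = False
        self._log(task_id, "stopped")

    def delete(self, task_id: str) -> None:
        if self._running.get(task_id):
            self.stop(task_id)  # a running task is stopped before deletion
        del self._plugins[task_id], self._running[task_id]
        self._log(task_id, "deleted")

    def history(self, task_id: str) -> List[str]:
        return [event for _, tid, event in self.audit if tid == task_id]
```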
The implementation of the policy-based distributed performance monitoring method is described below with a concrete example.
The NM server defines a monitoring policy (described in XML):
<?xml version="1.0" encoding="GB2312"?>
<Policy_NMS Policy_NMSID="">
  <task Type="SNMP" IP="10.10.136.216" Content="CPU_RATE" Grade="2"/>
</Policy_NMS>
Here the NM server defines a relatively abstract monitoring policy. Policy_NMSID identifies each policy in the policy file. The data acquisition mode is SNMP, the managed device is 10.10.136.216, the monitored content is CPU utilization, and the monitoring grade is 2. There are 5 monitoring grades in total, with grade 1 the highest. The higher the grade of a monitoring task, the shorter its collection interval, the stricter its threshold settings and the higher the grade of the alarms it generates.
From the policy file received from the NM server, the management station generates a detailed local monitoring policy (described in XML):
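The grade semantics can be expressed as a mapping from grade to collection interval, threshold and alarm severity. The linear scaling and the numbers below are hypothetical: a 300 s base interval, a 95% base threshold, and a severity of 6 minus the grade. Only the ordering, with grade 1 the most demanding, comes from the text.

```python
def grade_profile(grade: int, base_interval_s: float = 300.0,
                  base_threshold: float = 95.0) -> tuple[float, float, int]:
    """Return (collection interval, threshold, alarm severity) for a grade.

    Grade 1 is the highest: shortest interval, strictest (lowest) threshold
    and most severe alarm. Severity is numbered so that larger = more severe.
    """
    if not 1 <= grade <= 5:
        raise ValueError(f"grade must be 1..5, got {grade}")
    interval_s = base_interval_s * grade / 5         # grade 1 -> 60 s, 5 -> 300 s
    threshold = base_threshold - 5.0 * (5 - grade)   # grade 1 -> 75 %, 5 -> 95 %
    severity = 6 - grade                             # grade 1 -> 5, 5 -> 1
    return interval_s, threshold, severity
```

Under these assumptions, the grade-2 CPU_RATE task of the example would be sampled every 120 s and would alarm above 80% with severity 4.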
<?xml version="1.0" encoding="GB2312"?>
<Policy PolicyID="">
  <task SnmpVersion="" ValueType="" OidType="" Type="" IP="" Oid="" ValueBase=""
        DownGrade="" UpThreshold="" SaveTime="" CommunityGet="" Name=""
        CollectInterval="" Port="" Enable="" ID="" UpGrade=""/>
</Policy>
PolicyID identifies each policy in the policy file, and each task element gives the detailed parameters of one monitoring task in the policy. For example, Enable, UpGrade and DownThreshold indicate, respectively, whether the monitoring task is executed, the upper limit of the threshold and the lower limit of the threshold.
The management station then carries out the monitoring task. If the task must be transferred between management stations during monitoring, the two policy files above and the management data obtained so far are transferred to the other management station through the data synchronization module. In this way the monitoring task retains good continuity and consistency.
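This transfer can be sketched as packaging both policy files and the accumulated management data into one bundle with an integrity check. JSON, SHA-256 and the bundle layout are assumptions chosen for illustration; they are not the format used by the data synchronization module. The function names are hypothetical. The handover field records the source and target stations, so the transfer itself remains traceable.

```python
import hashlib
import json


def pack_transfer(task_id: str, nms_policy_xml: str, local_policy_xml: str,
                  records: list, from_station: str, to_station: str) -> str:
    """Bundle everything the next station needs to continue the task."""
    payload = {
        "task_id": task_id,
        "nms_policy": nms_policy_xml,      # abstract policy from the NM server
        "local_policy": local_policy_xml,  # detailed local policy
        "records": records,                # history and alarm data so far
        "handover": {"from": from_station, "to": to_station},
    }
    body = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return json.dumps({"sha256": digest, "body": body}, ensure_ascii=False)


def unpack_transfer(bundle: str) -> dict:
    """Verify the checksum and return the payload on the receiving station."""
    outer = json.loads(bundle)
    body = outer["body"]
    if hashlib.sha256(body.encode("utf-8")).hexdigest() != outer["sha256"]:
        raise ValueError("transfer bundle corrupted")
    return json.loads(body)
```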