Wednesday, June 4

Monitoring and Maintaining CUCM Appliance Hardware

Hardware Platform Monitoring and Management The Appliance supports a variety of interfaces to enable monitoring in the following eight focus areas:
1) CPU status/utilization
2) Memory status/utilization
3) System components temperatures
4) Fan status
5) Power Supply status
6) RAID & disk status
7) Network status (incl. NIC)
8) Operational status, including instrumentation of system/kernel status and data dumps following major system issues, indicating nature/type of the operational problem and degree of severity.
This section focuses on hardware-layer monitoring for 3) thru 8). 1) and 2) are covered in the section on CUCM Application-layer and Services-layer Monitoring.
Hardware Platform Monitoring and Management The Appliance supports a variety of interfaces to enable monitoring in the following eight focus areas:
1) CPU status/utilization
2) Memory status/utilization
3) System components temperatures
4) Fan status
5) Power Supply status
6) RAID & disk status
7) Network status (incl. NIC)
8) Operational status, including instrumentation of system/kernel status and data dumps following major system issues, indicating nature/type of the operational problem and degree of severity.
This section focuses on hardware-layer monitoring for 3) thru 8). 1) and 2) are covered in the section on CUCM Application-layer and Services-layer Monitoring.
CPU Process at host.jpg
For postmortem analysis, RIS Data Collector PerfMonLog tracks processes %cpu usage as well as at system level.
RTMT monitors CPU usage. When CPU usage is above a threshold, RTMT generates CPUPegging/CallProcessNodeCPUPegging alerts. From RTM Alert Central, you can also see current status.
There are two kinds of RTMT alerts. The first set is pre-configured (also called pre-canned) , and the second set is user defined. You can customize both of them. The main difference is that you cannot delete pre-configured, whereas you can add and delete user-defined alerts. However, you can disable both pre-configured and user-defined alerts. To view the pre-configured alerts, from the RTMT client application, select the RTMT -> Tools -> Alert -> Alert Central menu option. The pre-configured alerts are enabled by default. In most cases, you do not have to change the default threshold settings configured for the pre-configured alerts. However, you have an option to change the threshold settings to meet your requirements. The notification can be an e-mail or a pager. To set up e-mail notification, you should specify the SMTP server name and port number. You can do this in the RTMT client application by selecting the Alert/Threshold -> Enable E-Mail Server menu option.
CPU2.jpg

In addition to CPUPegging / CallProcessNodeCPUPeggin, high CPU usage potentially causes other alerts to occur such as: CodeYellow CodeRed CoreDumpFileFound CriticalServiceDown LowCallManagerHeartbeatRate LowTFTPServerHeartbeakRate LowAttendantConsoleHeartRate
% Iowait Monitoring High %iowait indicates high disk I/O activities. A few things needed to be considered: High IOwait due to heavy memory swapping. Please check %CPU Time for Swap Partition to see if there is high level of memory swapping activity. One potential cause of high memory swapping is memory leak. High IOwait due to DB activity . Database accesses Active Partition. If  %CPU Time for Active Partition is high, then most likely there are a lot of DB activities. High IOwait due to Common (or Log) Partition, where trace and log files are stored. You can check the following things: 1.Check Trace Log Center to see if there is any trace collection activity going on. If call processing is impacted (ie, CodeYellow), then consider adjusting trace collection schedule. If zip option is used, please turning it off. 2.Trace setting – At Detailed level, CUCM generates a lot of trace. If high %iowait and/or CUCM is in CodeYellow state, and CUCM service trace setting is at Detailed, please chance trace setting to “Error” to reduce the trace writing. You can use RTMT to identify processes that are responsible for high %iowait: If %iowait is high enough to cause CPUPegging alert, check the alert
     message to check processes waiting for disk IO.
Go to RTMT Process page, sort by Status. Check for processes in Uninterruptible Disk Sleep state Download RIS Data Collector PerfMonLog file to examine the process status for longer period of time. Below is an example of RTMT Process page, sorted by Status. You can check for processes in Uninterruptible Disk Sleep state. In the case below, it’s sFTP process:
RTMT1.jpg
You can also use CLI to isolate which process causes high IOwait:
Syntax admin:utils fior
     utils fior status
     utils fior enable
     utils fior disable
     utils fior start
     utils fior stop
     utils fior list
     utils fior top
For example: admin:utils fior list 2007-05-31 Counters Reset
 Time         Process        PID   State       Bytes Read          Bytes Written

----------------- ----- ----- -------------------- --------------------
17:02:45 rpmq 31206 Done 14173728 0 17:04:51 java 31147 Done 310724 3582 17:04:56 snmpget 31365 Done 989543 0 17:10:22 top 12516 Done 7983360 0 17:21:17 java 31485 Done 313202 2209 17:44:34 java 1194 Done 192483 0 17:44:51 java 1231 Done 192291 0 17:45:09 cdpd 6145 Done 0 2430100 17:45:25 java 1319 Done 192291 0 17:45:31 java 1330 Done 192291 0 17:45:38 java 1346 Done 192291 0 17:45:41 rpmq 1381 Done 14172704 0 17:45:44 java 1478 Done 192291 0 17:46:05 rpmq 1540 Done 14172704 0 17:46:55 cat 1612 Done 2560 165400 17:46:56 troff 1615 Done 244103 0 18:41:52 rpmq 4541 Done 14172704 0 18:42:09 rpmq 4688 Done 14172704 0
CLI fior output sorted by top disk users admin:utils fior top Top processes for interval starting 2007-05-31 15:27:23 Sort by Bytes Written Process PID Bytes Read Read Rate Bytes Written Write Rate

----- -------------- ------------- -------------- -------------
       Linuxzip 19556       61019083      15254771       12325229        3081307
       Linuxzip 19553       58343109      11668622         9860680        1972136
       Linuxzip 19544       55679597      11135919         7390382        1478076
       installdb 28786         3764719            83660         6847693          152171
       Linuxzip 20150       18963498        6321166         6672927        2224309
       Linuxzip 20148       53597311      17865770         5943560        1981187
       Linuxzip 19968         9643296        4821648         5438963        2719482
       Linuxzip 19965       53107868      10621574         5222659        1044532
       Linuxzip 19542       53014605      13253651         4922147        1230537
             mv      5048          3458525       3458525          3454941        3454941
utils diagnose list: This command will list all available diagnostic tests. For exemple: admin: utils diagnose list Available diagnostics modules disk_space - Check available disk space as well as any unusual disk usage service_manager - Check if service manager is running tomcat - Check if Tomcat is deadlocked or not running utils diagnose test: This command will execute each diagnostic test, but will not attempt to repair anything. Example: admin: utils diagnose test
Starting diagnostic test(s)
===============
test - disk_space  : Passed test - service_manager  : Passed test - tomcat  : Passed Diagnostics Completed utils diagnose module <moduleName> This command will execute a single diagnostic test and attempt to fix the problem if possible.  You can also use the command "utils diagnose fix" to run all of the diagnostic tests at once. Example: admin: utils diagnose module tomcat Starting diagnostic test(s)
===============
test - tomcat  : Passed Diagnostics Completed utils diagnose fix: This command will execute all diagnostic tests, and if possible, attempt to repair the system. Example: admin: utils diagnose fix Starting diagnostic test(s)
===============
test - disk_space  : Passed test - service_manager  : Passed test - tomcat  : Passed
Diagnostics Completed   utils create report hardware no parameters are required Creates a system report containing disk array, remote console, diagnositic, and environmental data. Example: admin:utils create report hardware
         ***   W A R N I N G   ***
This process can take several minutes as the disk array, remote console, system diagnostics and environmental systems are probed for their current values. Continue? Press y or Y to continue, any other key to cancel request. Continuing with System Report request... Collecting Disk Array Data...SmartArray Equipped server detected...Done Collecting Remote Console Data...Done Collecting Model Specific System Diagnostic Information...Done Collecting Environmental Data...Done Collecting Remote Console System Log Data...Done Creating single compressed system report...Done System report written to SystemReport-20070730020505.tgz To retrieve diagnostics use CLI command: file get activelog platform/log/SystemReport-20070730020505.tgz
utils iostat interval optional (seconds) Interval between two iostat readings - mandatory if iterations is being used iterations optional The number of iostat iterations to be performed - mandatory if interval is being used filename optional Redirect the output to a file Help: utils iostat: This command will provide the iostat output for the given number of iterations and interval. Example: admin: utils iostat Executing command... Please be patient Tue Oct 9 12:47:09 IST 2007 Linux 2.4.21-47.ELsmp (csevdir60) 10/09/2007 Time: 12:47:09 PM avg-cpu:  %user  %nice  %sys %iowait  %idle
          3.61    0.02    3.40    0.51   92.47
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm  %util sda 3.10 19.78 0.34 7.49 27.52 218.37 13.76 109.19 31.39 0.05 5.78 0.73 0.57 sda1 0.38 4.91 0.14 0.64 4.21 44.40 2.10 22.20 62.10 0.02 26.63 1.62 0.13 sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 10.88 0.00 2.20 2.20 0.00 sda3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 5.28 0.00 1.88 1.88 0.00 sda4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.83 0.00 1.67 1.67 0.00 sda5 0.00 0.08 0.01 0.01 0.04 0.73 0.02 0.37 64.43 0.00 283.91 69.81 0.08 sda6 2.71 14.79 0.20 6.84 23.26 173.24 11.63 86.62 27.92 0.02 2.98 0.61 0.43

The following table lists some equivalent perfmon counters between CUCM 4.x and CUCM 5.x and later:
CUCM 4.x Perfmon counters CUCM 5.x appliance Perfmon counters Process % Privileged Time Process STime   % Processor Time   % CPU Time Processor % UserTime Processor User Percentage   % Privileged Time   System Percentage   % Idle Time   Nice Percentage   % Processor Time   % CPU Time
Memory Monitoring Virtual memory consists of physical memory (RAM) and swap memory (Disk). RTMT “CPU & Memory” page has system level memory usage information as the following: Total: total amount of physical memory Free: amount of free memory Shared: amount of shared memory used Buffers: amount of memory used for buffering purpose Cached: amount of cached memory Used: calculated as Total – Free – Buffers – Cached + Shared Total Swap: total amount of swap space. Used Swap: the amount of swap space in use on the system. Free Swap: the amount of free swap space available on the system
You can also query memory information through APIs: Through SOAP, you can query the following perfmon counters: Under Memory object: % Mem Used, % VM Used, Total Kbytes, Total Swap Kbytes, Total VM Kbytes, Used Kbytes, Used Swap Kbytes, Used VM KBytes Under Process object: VmSize, VmData, VmRSS, % Memory Usage Through SNMP, you can query the following perfmon counters: Host Resource MIB: hrStorageSize, hrStorageUsed, hrStorageAllocationUnits, hrStorageDescr, hrStorageType hrMemorySize You can also download some historical information by using RTMT Trace Log Central: Cisco AMC Service PerfMonLog // enabled by default. Deprecated in ccm 6.0 because Cisco RIS Data Collector PerfMonLog is introduced Cisco RIS Data Collector PerfMonLog // disabled by default in CUCM 5.x; enabled by default in CUCM 6.0 Note: Perfmon Virtual Memory refers to Total (Physical + Swap) memory whereas Host Resource MIB Virtual Memory refers to Swap memory only.
RTMT “Process” pre-can screen displays process level memory usage (VmSize, VmRSS, and VmData) information. VmSize is total virtual memory used by the process VmRSS is the Resident Set currently in physical memory used by the process including Code, Data and Stack VmData is the virtual memory usage of heap by the process Page Fault Count represents the number of major page faults that a process encountered that required the data to be loaded into physical memory You can go to RTMT “Process” pre-can screen and sort VmSize by clicking on VmSize tab. Then you can identify which process consumes more memory.
RTMT2.jpg
Hints on Memory leak From RTMT Process page, if a process’ VmSize is continuously increasing, that process causes memory leaking. When process leaks memory, the system administrator should report to Cisco with proper trace files. Ris Data Collector PerfMonLog is a good one to collect as it contains historical information on memory usage. Then the system administrator can schedule restarting the service during off hour to reclaim the memory.
Alert Central Alert Central (RTMT -> Tools -> Alert -> Alert Central) has all the Cisco predefined alerts and provides the current status of each alertable condition. One column to pay attention to is the “In Safe Range”. If it’s marked as No then the condition is still not corrected. For instance, if “In Safe Range” is “No” for CallProcessingNodeCPUPegging, then it means the CPU usage on that node is still above the threshold. In the bottom is the history information. You can take a look to see alerts generated previously. Quite often by the time you realize that service has crashed, the corresponding trace files have been overwritten. It would be hard for Cisco TAC to work on the issue without trace files. In this case, it would be useful to know that CoreDumpFileFound, CodeYellow, and CriticalServiceDown alerts have Enable Trace Download option. To enable it, open the Set Alert Properties. The last page has option to enable trace download. This can be used to make sure a trace file corresponding to a crash is created.
Caution - Enabling TCT Download may affect services on the server. Configuring a high number of downloads will adversely impact the quality of services on the server. Alerts can also send out an alarms (syslog messages) by configuring the Alarm Configuration page for the Cisco AMC Service. In addition to AlertHistory shown in RTMT Alert Central, there is AMC Alert Log which you can download with Trace & Log Central to get up to 7 days (by default) worth of Alert history.
From Alert Central, you can see the current status:
RTMT5.jpg
The following table compares the names of perfmon counters on virtual memory between CUCM 4.x and CUCM 5.x. CUCM 4.x Perfmon counters CUCM 5.x appliance Perfmon counters Process Private Bytes Process VmRSS   Virtual Bytes   VmSize
Partition (Disk) Usage monitoring There are 4 partitions in CUCM hard drive: Common partition, also referred as Log partition, where trace/log files are stored Active partition contains files (binaries, libraries and config files) of active OS and CUCM version Inactive partition contains files for alternative CUCM version (e.g., older version that was upgraded from or newer version recently upgraded to but the server has not been toggled to this version to run). Swap partition is used for Swap space. You can also get partition information through APIs: Through SOAP APIs, you can query the following perfmon counters Under Partition object: Total Mbytes, Used Mbytes, Queue Length, Write Bytes Per Sec, Read Bytes Per Sec Through SNMP MIB, you can query the following information: Host Resource MIB: hrStorageSize, hrStorageUsed hrStorageAllocationUnits, hrStorageDescr, hrStorageType You can also download historical information by using RTMT Trace and Log Central: Cisco AMC Service PerfMonLog // enabled by default. Deprecated in CUCM 6.0, because Cisco RIS Data Collector PerfMonLog is introduced. Cisco RIS Data Collector PerfMonLog // disabled by default in CUCM 5.x; enabled by default in CUCM 6.0 You can use RTMT to monitor disk Usage:
RTMT6.jpg

Partition Name mapping Perfmon Instance Names as shown in RTMT and SOAP Names shown in Host Resource hrStorage Description Active / Inactive /partB Common /common Boot /grub Swap Virtual Memory SharedMemory /dev/shm
LogPartitionLowWaterMarkExceeded alert occurs when the percentage of used disk space in the log partition has exceeded the configured low water mark. This alert should be considered as early warning for an administrator to clean up disk space. You can use RMT Trace/Log Central to collect trace/log files and then delete these trace/log files from the server. In addition to manually clean up the traces/log files, the system administrator should also adjust the number of trace files to be kept to avoid hitting low water mark again. LogPartitionHighWaterMarkExceeded alert occurs when the percentage of used disk space in the log partition has exceeded the configured high water mark. When this alert is generated, Log Partition Monitoring (LPM) utility starts to delete files in Log Partition until the Log Partition is down to the low water mark to avoid running out of disk space. Since LPM may delete some files that you want to keep, you need to act upon receiving LogPartitionLowWaterMarkExceed alert. LowActivePartitionAvailableDiskSpace alert occurs when the percentage of available disk space of the Active Partition is lower than the configured value. Please use the default threshold that Cisco recommends. At default threshold, this alert should never be generated. If this alert occurs, a system administrator can adjust the threshold as temporary workaround but Cisco TAC should look into this. One place to look is /tmp using remote access. We have seen cases where large files are left there by 3rd party software. LowInactivePartitionAvailableDiskSpace alert occurs when the percentage of available disk space of the InActive Partition is lower than the configured value. Please use the default threshold that Cisco recommends. At default threshold, this alert should never be generated. If this alert occurs, a system administrator can adjust the threshold as temporary workaround but Cisco TAC should look into this.
The following table is a comparison of partition related perfmon counters between CUCM 4.x and CUCM 5.x. CCM 4.x Perfmon counters CCM 5.x appliance Perfmon counters Logical Disk % Disk Time Partition % CPU Time   Disk Read Bytes/sec   Read Kbytes Per Sec
Disk Write Bytes/sec   Write Kbytes Per Sec   Current Disk Queue Length   Queue Length   Free Megabytes   Used Mbytes       Total Mbytes   % Free Space   % Used
Database Replication among CUCM nodes
You can use RTMT database Summary to monitor your database activities (ie. CallManager -> Service -> Database Summary):

The following CLI can be used to monitor and manage intra-cluster connections: utils dbreplication status utils dbreplication repair all/nodename utils dbreplication reset all/nodename utils dbreplication stop utils dbreplication dropadmindb utils dbreplication setrepltimeout show tech dbstateinfo show tech dbinuse show tech notify run sql <query> Cisco Unified Communications Manager Monitoring “ccm” is the process name for Cisco Unified Communications Manager service. The following table is a general guideline for ccm service CPU usage
ccm CPU usage “Process(ccm)\% CPU Time” MCS-7835 Server MCS-7845 Server < 44% - good < 22% - good 44-52 % - warning 22-36 % -warning > 60% - bad > 30% -bad
You may ask: “ Why MCS-7845 server has more processors, but it has lower threadshold for CUP usage?”
Here is why: CCM process is multithreaded application. But main router thread does the bulk of call processing. A single thread can run only on one processor at any given time even when there are multiple processors available. That means ccm main router thread can run out of cpu resource even when there are idle processors. With hyper-threading on, MCS 7845 server has 4 virtual processors. So on server where the main router thread is running at full blast to do call processing, it is possible three other processors are near idle. In this situation UC Manager can get into Code Yellow state even when total CPU usage is 25-30%. (Similarly 7835 server with two virtual processors, UC Manager could get into Code Yellow state at around 50-60% cpu usage. NOTE 1: Code Yellow state is when ccm service is so overloaded that it cannot process incoming calls anymore. In this case, ccm initiates call throttling. NOTE 2: This doesn't mean you will see one processor's cpu usage at 100% and rest 0% in RTMT. Since main thread can run on processor A for 1/10th of second and processor B next 2/10th of seconds, etc, the cpu usage shown in RTMT would be more balanced. By default RTMT shows average CPU usage for 30 second duration.
You can also use APIs to query perfmon counters. Through SOAP APIs, you can query: Perfmon counters Device information DB access CDR access Through SNMP, CISCO-CCM-MIB: ccmPhoneTable, ccmGatewayTable, etc You can also download historical information by using RTMT Trace/Log Central Cisco AMC Service PerfMonLog // enabled by default. Deprecated in CUCM 6.0 because Cisco RIS Data Collector PerfMonLog is introduced. Cisco RIS Data Collector PerfMonLog // disabled by default in CUCM 5.x; enabled by default in CUCM 6.0.
Code Yellow CodeYellow alert is generated when ccm service goes into Code Yellow state, which means ccm service is overloaded. You can configure Code Yellow alert so that once Code Yellow alert occurs, the trace files can be downloaded for troubleshooting purpose.
AverageExpectedDelay counter represents the current average expected delay for handling any incoming message. If the value is above the value specified in "Code Yellow Entry Latency" service parameter, CodeYellow alarm is generated. This counter is one of key indicator of call processing performance issue.
Sometimes, you might see CodeYellow, but total CPU usage is only 25%. This is because CUCM needs one processor for call processing, when no processor resource available, CodeYellow may occur even total CPU usage is only around 25-30% in a four virtual processor server. Similarly on a two processor server, CodeYellow is possible around 50% total CPU usage.
Other perfmon counters should be monitored are: Cisco CallManager\CallsActive, CallsAttempted, EncryptedCallsActive, AuthenticatedCallsActive, VideoCallsActive Cisco CallManager\RegisteredHardwarePhones, RegisteredMGCPGateway, Cisco CallManager\T1ChannelsActive, FXOPortsActive, MTPResourceActive, MOHMulticastResourceActive Cisco Locations\BandwidthAvailable Cisco CallManager System Performance\AverageExpectedDelay CodeYellow DBReplicationFailure LowCallManagerHeartbeat ExcessiveVoiceQualityReports MaliciousCallTrace CDRFileDeliveryFailure/CDRAgentSendFileFailed Critical Service Down CoreDumpFileFound The following is screen shot of RTMT performance page:
Counter1.jpg

Note: In general, CUCM 4.x Communications Manager perfmon counters have been preserved by using the same names and representing the same values. And also CISCO-CCM-MIB has backward compatibility.
RIS Data Collector PerfMonLog CCM 5.x, RIS Data Collector PerfMonLog file is not enabled by default. To Enable RIS Data Collector PerfMonLog, go to CUCM admin page, go to Service Parameter Page, select Cisco RIS Data Collector service and set Enable Logging to True, as the following:
RISdata1.jpg
It is recommended enable RIS Data Collector PerfMonLog which is very useful for troubleshooting since it tracks CPU, memory, disk, network, etc. If you enable RIS Data Collector PerfMonLog, then you can disable AMC PerfMonLog.
Note: RIS Data Collector PerfMonLog is introduced in CUCM 6.0 to replace AMC PerfMonLog. RIS Data Collector PerfMonLog provides a little more information than AMC PerfMonLog. For detailed information, please see CUCM Serviceability User Guide.
It is recommended turn on RIS Data Collector PerfMonLog as soon as CUCM is up and running (by default, it is turned on). When RIS Data Collector PerfMonLog is turned, the impact on CPU is so small (around 1%) that can be ignored.

RIS Data Collector PerfMonLog Use RTMT Trace & Log Center to download Cisco RIS Data Collector PerfMonLog files for a time period that you are interested in; Open the log file using Windows Perfmon Viewer (or RTMT Perfmon viewer), then add Performance counters of interest such as CPU usage -> Processor or Process % CPU Memory usage -> Memory %VM Used Disk usage -> Partition  % Used Call Processing -> Cisco CallManager CallsActive The following is a screen shot of Windows Perfmon Viewer:
Perfmonviewer1.jpg
Service Status Monitoring
RTMT Critical Service page provides current status of all critical services, as the following:
Servicestat1.jpg
CriticalServiceDown alert is generated when any of service is down.
Note 1: RTMT backend service checks for the status (by default) every 30 seconds. So it is possible if service goes down and comes back up within that period, CriticalServiceDown alert may not be generated. Note 2: CriticalServiceDown alert monitors only those services listed in RTMT Critical Services page. If you suspect (or want to double check) if service got restarted (without generating Core files), a few ways to check are: RTMT Critical Service page has elapsed time. Check RIS Troubleshooting perfmon log files and see if PID for service (process) is changed.
The following CLI can be used to check the logs of Service Manager: file get activelog platform/servm_startup.log file get activelog platform/log/servm*.log
The following CLI can be used to duplicate certain RTMT functions: utils service show perf show risdb

CoreDumpFileFound alert is generated when RTMT backend service detects new Core Dump file. Both CriticalServiceDown and CoreDumpFileFound alert can be configured to download corresponding trace files for troubleshooting purpose. This helps to preserve trace files at the time of a crash.
Syslog Messages Monitoring Syslog can be viewed using RTMT syslog viewer, please the following screen shot:
Syslog1.jpg
Sending syslog traps to remote server (CISCO-SYSLOG-MIB) If you want to send syslog messages as syslog traps, here are the steps. 1.Setup Trap (Notification) destination from Unified CM Serviceability SNMP page – http://www.cisco.com/en/US/docs/voice_ip_comm/cucm/service/5_1_3/ccmsrva/sasnmpv1.html 2.Enable trap generation in CISCO-SYSLOG-MIB 3.Set appropriate SysLog level in CISCO-SYSLOG-MIB If you feel you are missing syslog traps for some Unified Communications Manager service alarms, check RTMT syslog viewer to see if the alarms are shown there. If not, adjust alarm configuration setting to send alarms to local syslog. For information on alarm configuration, refer to the Alarm Configuration section of the Cisco Unified CallManager Serviceability Administration Guide here – http://www.cisco.com/en/US/docs/voice_ip_comm/cucm/service/5_1_3/ccmsrva/saalarm.html Also check that the SysLog level in CISCO-SYSLOG-MIB is set at the appropriate level.
Syslog generated due to hardware failures has an event severity of 4 or higher and contains one of the following patterns:
  • cma*[???]:*
  • cma*[????]:*
  • cma*[?????]:*
  • hp*[???]:*
  • hp*[????]:*
  • hp*[?????]:*
Therefore, you can do a manual search for the patterns above to find hardware failure events in syslog.
RTMT Alerts as Syslog Messages and Traps RTMT Alerts can be logged as syslog messages and send to remote syslog and syslog traps server. To send to local and remote syslog, please configure AMC alarm configuration page of CUCM Serviceability Web Page. For CUCM 5.1 and later releases, please go to Serviceability Web Page, under Alarm Configuration, check AMC service parameter “Alarm Enabled”. Go to Serviceability Web Page, under Tools -> Control Center – Network Services, restart AMC services in Serviceability Web Page.
Syslog2.jpg
Phone registration status needs to be monitored for sudden changes. If the registration status changes slightly and readjusts quickly over a short time frame, then it could be indicative of phone move, add, or change. A sudden smaller drop in phone registration counter can be indicative of a localized outage, for instance an access switch or a WAN circuit outage or malfunction. A significant drop in registered phone level needs immediate attention by the administrator. This counter especially needs to be monitored before and after the upgrades to ensure the system is restored completely.
RTMT Reports
RTMT has a number of pre-can screens for information such as Summary, Call Activity, Device Status, Server Status, Service Status, and Alert Status. RTMT “Summary” pre-can screen shows a summary view of CUC M system health. It shows CPU, Memory, Registered Phones, CallsInProgress, and ActiveGateway ports & channels. This should be one of the first thing you want to check each day to make sure CPU & memory usage are within normal range for your cluster and all phones are registered properly.
Phone Summary and Device Summary pre-can screens provide more detailed information about phone and gateway status. If there are a number of devices that fail to register, then you can use the Admin Find/List page or RTMT device search to get further information regarding the problem devices. Critical Services pre-can screen displays the current running/activation status of key services. You can access all the pre-can screens by simply clicking the corresponding icons on the left.
Serviceability Reports Archive The Cisco Serviceability Reporter service generates daily reports in Cisco Unified CallManager Serviceability Web Page. Each report provides a summary that comprises different charts that display the statistics for that particular report. Reporter generates reports once a day on the basis of logged information, such as Device Statistics Report Server Statistics Report Service Statistics Report Call Activities Report Alert Summary Report Performance Protection Report For detailed information about each report, please see the following URL:http://www.cisco.com/en/US/docs/voice_ip_comm/cucm/service/5_0_2/ccmsrvs/sssrvrep.html#wp1033420

1 comment:

Fixing Problem said...

A Windows feature called Superfetch preloads frequently used programmes into memory for quicker access. On some systems, nevertheless, it can result in performance concerns. To disable Superfetch in Windows, open Services, locate SysMain, right-click and choose Properties, change Startup type to Disabled, and then click Stop.

My CCIE#53599

My journey started in 2013 when I decided for a CCIE in voice. One never really knows what they are in for when starting down this r...