Alerts and Alarms - An Overview

The devices supported by the Questra IDM Application Suite are continually monitored to assess their operational status. Generally speaking, when something is "wrong" with a device (for example, it is low on water, running too hot or too cold, or otherwise malfunctioning), an alarm is raised and the users responsible for servicing that device are alerted that the device needs attention.

For every problematic condition that a device may experience, the administrators will have defined an alarm. Think of an alarm as a definition or description of a problematic condition. Some alarms can also describe conditions in which something fails to happen, such as when a device fails to send its readings to the Enterprise at its scheduled time. The table below describes the various default alarms.

NOTE: The term asset as used in an alarm name means the same thing as device.

Default alarms

Asset-task Watchdog

The device has failed to report when it was supposed to for a scheduled task. This alarm will be generated if a device task is used and a watchdog delay is specified for the task.

Fault generated by Asset

The device submitted a fault directly to the Enterprise. The Service Agent can be configured to perform device-local monitoring; submitting a fault is one of several actions that may be taken when a monitored condition is detected.

File Transfer Timeout

The request for a file transfer has timed out. A file transfer request that is not responded to by the Service Agent in a timely fashion will result in one of these timeouts. (For information on file transfers, see File Management.)

Monitor Timeout

The request for a monitor property reading has timed out. This occurs when a request to refresh device readings is not processed by the Service Agent in a timely fashion. (For information on device readings, see Devices.)

Site Visit Created

Indicates that a site visit has been created and is ready for downloading to a site visit user's laptop. (For information on using the SoftwareCourier feature of the Questra IDM Application Suite to create site visits, see Site Visits.)

Site Visit Overdue

Indicates that the site visit has been downloaded but not yet activated by its delivery date and is therefore overdue.

Site Visit State Change

Indicates that the state of the site visit has changed. This alarm is generated on every state change.

Site Visit Updated

Indicates that a site visit whose state is ReadyForDownload has been updated (specifically, its name, delivery-by date, or field service technician assignment has changed).

Software Update Install by Date

Indicates that one or more devices have not installed a mandatory software package by the required date. The Install-by date is specified when the package is scheduled for distribution. (For information on distributing software packages, see Questra SoftwareDirector.)

Software Update Reminder Date

Provides a reminder notification that a software package should be installed. (For information on distributing software packages, see Questra SoftwareDirector.)

Usage Meter Timeout

The request for a usage property reading has timed out. This occurs when a request to refresh device readings is not processed by the Service Agent in a timely fashion. (For information on device readings, see Devices.)

Each defined alarm describes one particular condition and, for the purposes of this discussion, is associated with one or more alerts. An alert is an instruction that describes what the Enterprise will do when the alert's associated alarm is raised. There are several types of alerts, but the type that users encounter most often is the Email alert, which instructs the Enterprise to email alert notifications to the owners of the group to which the device belongs (that is, to the users responsible for servicing the device). (Another type of alert might pertain to notifications from a CRM system.)

An alarm event is an individual instance of an alarm condition detected on a device. For example, if a property of a device, such as its temperature, is detected as being outside of normal parameters, an alarm event is generated. In fact, when a device is malfunctioning, there may be multiple alarm events generated because several properties may be outside of normal parameters at the same time. For each alarm event that is generated, the Enterprise generates an alert request.

Consider the following example: Assume that the Enterprise has a rule for evaluating a motor temperature against a threshold and that a "Motor Over Temperature" alarm has been defined. This alarm is associated with an alert of the type Email alert. When the Enterprise receives a motor temperature reading, the motor temperature rule evaluates it. If the motor temperature exceeds the threshold, the rule triggers the "Motor Over Temperature" alarm and a "Motor Over Temperature" alarm event is generated. In turn, an alert request of type Email alert is generated and its processing begins.
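The chain described in this example (reading, rule, alarm event, alert request) can be sketched in a few lines of Python. This is an illustration only, not the Enterprise's actual implementation: the class names, the function evaluate_motor_temperature, the device ID "motor-7", and the numeric values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alarm:
    """Definition of a problematic condition (hypothetical model)."""
    name: str
    alert_types: List[str]  # alerts associated with this alarm


@dataclass
class AlarmEvent:
    """One detected instance of an alarm condition on one device."""
    alarm: Alarm
    device_id: str
    reading: float


@dataclass
class AlertRequest:
    """One alert to be processed for one alarm event."""
    event: AlarmEvent
    alert_type: str
    state: str = "Pending"  # initial state of every alert request


def evaluate_motor_temperature(device_id: str, reading: float,
                               threshold: float, alarm: Alarm) -> List[AlertRequest]:
    """Apply the threshold rule to one reading.

    Returns the alert requests generated, or an empty list when the
    reading does not exceed the threshold.
    """
    if reading <= threshold:
        return []
    event = AlarmEvent(alarm, device_id, reading)
    # One alert request per alert associated with the alarm.
    return [AlertRequest(event, alert_type) for alert_type in alarm.alert_types]


# Assumed values, for illustration only.
alarm = Alarm("Motor Over Temperature", ["Email alert"])
requests = evaluate_motor_temperature("motor-7", 95.0, threshold=90.0, alarm=alarm)
```

In this sketch, a reading at or below the threshold produces no alarm event and therefore no alert request.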

The system manages alert requests from their generation to their completion. At any given time, an alert request will be in a certain state. In the user interface, you will see references to the states of alert requests, so it is important that you understand them. The following figure illustrates the states through which an alert request may pass in its processing, and the table defines these states.

Definitions of alert states

Active states

Pending

This is the alert request's initial state when it is created. The alert request is "pending" processing.

Submitted

The processing of the alert request has begun; it has been submitted to the alert processor.

Sent

The processing of the alert request is completed; it has been successfully sent to the alert processor.

Acknowledged

A user has reviewed the alert request and acknowledged that it exists.

Completed states

Closed

The alarm condition has been cleared or otherwise handled, and no more user action is required. The alert request is closed.

Duplicate

The alert request pertains to the same alarm and alert as another alert request for the same device that is currently being processed. This alert request, therefore, is a duplicate and can be ignored.

Suppressed

The alert escalator has exhausted all escalation levels for the alert request, and processing has reached the end. Therefore, all automatic processing is now suppressed.
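For readers who script against exported alert data, the states above can be modeled as a simple enumeration. The sketch below assumes the Active/Completed grouping shown in the table; the names AlertState, is_active, and initial_state are hypothetical. Assigning the Duplicate state at creation time is also an assumption made for illustration, since this overview does not say when the system makes that check.

```python
from enum import Enum


class AlertState(Enum):
    PENDING = "Pending"
    SUBMITTED = "Submitted"
    SENT = "Sent"
    ACKNOWLEDGED = "Acknowledged"
    CLOSED = "Closed"
    DUPLICATE = "Duplicate"
    SUPPRESSED = "Suppressed"


# Grouping taken from the table of alert state definitions.
ACTIVE_STATES = frozenset({
    AlertState.PENDING, AlertState.SUBMITTED,
    AlertState.SENT, AlertState.ACKNOWLEDGED,
})
COMPLETED_STATES = frozenset({
    AlertState.CLOSED, AlertState.DUPLICATE, AlertState.SUPPRESSED,
})


def is_active(state: AlertState) -> bool:
    """True while the alert request may still need processing or user action."""
    return state in ACTIVE_STATES


def initial_state(device_id: str, alarm: str, alert: str, in_progress: set) -> AlertState:
    """Pick a state for a new alert request (assumed behavior).

    in_progress holds (device_id, alarm, alert) keys of the requests that
    are currently being processed. A matching key means the new request
    is a duplicate.
    """
    if (device_id, alarm, alert) in in_progress:
        return AlertState.DUPLICATE
    return AlertState.PENDING


in_progress = {("motor-7", "Motor Over Temperature", "Email alert")}
first = initial_state("motor-8", "Motor Over Temperature", "Email alert", in_progress)
repeat = initial_state("motor-7", "Motor Over Temperature", "Email alert", in_progress)
```

Here first is Pending because no matching request exists for "motor-8", while repeat is Duplicate.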

When you receive an emailed alert notification for a device you service, you will need to sign in to the system, acknowledge the alert, take the necessary action to clear the alarm condition on the device, and finally close the alert. If you do not acknowledge the alert within a reasonable timeframe (as defined by your administrator), the alert will be escalated to the owners of the parent group to which your group of devices belongs.
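The escalation behavior can be pictured with the following sketch. It rests on several assumptions that this overview does not confirm: that the same acknowledgment window applies at every level, that escalation walks up one parent group at a time, and that the request becomes Suppressed once the topmost group's window has passed. The function name current_escalation and the group names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def current_escalation(group_chain: List[str], created_at: datetime, now: datetime,
                       ack_window: timedelta, acknowledged: bool) -> Tuple[str, Optional[str]]:
    """Work out who should be notified about an unacknowledged alert.

    group_chain lists the device's own group first, then each parent group.
    Returns ("acknowledged", None), ("notify", group_name), or
    ("suppressed", None) when every level has been exhausted.
    """
    if acknowledged:
        return ("acknowledged", None)
    # Number of whole acknowledgment windows that have elapsed.
    level = (now - created_at) // ack_window
    if level >= len(group_chain):
        return ("suppressed", None)
    return ("notify", group_chain[level])


# Assumed values, for illustration only.
chain = ["Line 3 Pumps", "Plant A", "Region East"]
created = datetime(2024, 1, 1, 12, 0)
window = timedelta(minutes=30)

early = current_escalation(chain, created, created + timedelta(minutes=10), window, False)
late = current_escalation(chain, created, created + timedelta(minutes=45), window, False)
exhausted = current_escalation(chain, created, created + timedelta(hours=2), window, False)
```

With these values, early notifies the owners of "Line 3 Pumps", late notifies the owners of "Plant A", and exhausted reports that automatic processing has ended.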

NOTE: As you use the system, you will see references to alerts and alarms on screen and in reports. Technically speaking, these are alert requests and alarm events, as described above. We've shortened these terms in the user interface for simplicity.