DBSCAN Engine

The DSCAN engine is a Cluster Engine that builds clusters using the DBSCAN algorithm.

DBSCAN

Given the graph, we extract the alarms from all the vertices and use these as points as input to the DBSCAN algorithm.

DBSCAN requires a constant $\epsilon$ and a distance function, which we define as follows:

$d(a_{1}, a_{2}) = \alpha ( ( \beta | t( a_{1} ) - t( a_{2} ) | \frac{1}{60} + ( 1 - \beta ) dg( a_{1}, a_{2} ) )$

where:

• $a_{1}$ and $a_{2}$ are the points representing the alarms

• $\alpha \in (0, \infty)$ is a scaling constant (directly related to $\epsilon$)

• $\beta \in [0,1$] is a weighting constant

• When $\beta$ is closer to 0, more weight is given to the temporal component

• When $\beta$ is closer to 1, more weight is given to the spatital component

• $t(a_{k})$ returns the time (timestamp in seconds) of the last occurence of the given alarm

• $dg(a_{i}, a_{j})$ returns the normalized distance on the shortest path between the vertices for $a_{i}$ and $a_{k}$

• If both alarms are on the same vertex, then the distance is 0

• If there is no path between both alarms, then the distance is $\infty$

In simpler terms, we can think of the distance function as taking a weighted combination of both the distance in time and in space.

Defaults

We set the constants with the following defaults:

• $\epsilon = 100$

• $minpts = 1$

• $\alpha = 144.47$

• $\beta = 0.55$

These were derived empirically during our testing.

Examples

Let’s assume that we have the following graph: a1 and a2

Let’s start determining the distance between $a_{1}$ and $a_{2}$. We can calculate the time component with:

\begin{align} | t( a_{1} ) - t( a_{2} ) | & = | 1 - 30 | \\\\ & = 29 \end{align}

And given that $a_{1}$ and $a_{2}$ are on the same vertex, the spatial component is simply zero:

$dg( a_{1}, a_{2} ) = 0$

Placing these results in the original equation gives us:

\begin{align} d( a_{1}, a_{2} ) & = \alpha ( \beta \times \frac{29}{60} ) \\\\ & = 38.40 \end{align}

and $d(a_{1}, a_{2}) < \epsilon$, so the alarms will be clustered together.

a3 and a4

Now let’s determine the distance between $a_{3}$ and $a_{4}$. We can calculate the time component with:

\begin{align} | t( a_{3} ) - t( a_{4} ) | & = | 500 - 510 | \\\\ & = 10 \end{align}

To calculate the spatial distance between $a_{3}$ and $a_{4}$, we sum up the weights on the edges between the shortest path and divide this result by the default weight (=100), so:

\begin{align} dg( a_{3}, a_{4} ) & = \frac{50 + 50}{100} \\\\ & = 1 \end{align}

Placing these results in the original equation gives us:

\begin{align} d( a_{3}, a_{4} ) & = \alpha ( \beta \times \frac{10}{60} + (1 - \beta) \times 1 ) \\\\ & = \alpha ( 0.092 + 0.45 ) \\\\ & = 78.30 \end{align}

and $d(a_{3}, a_{4}) < \epsilon$, so the alarms will be clustered together.

a2 and a3

Now let’s determine the distance between $a_{2}$ and $a_{3}$. We can calculate the time component with:

\begin{align} | t( a_{2} ) - t( a_{3} ) | & = | 30 - 500 | \\\\ & = 470 \end{align}

The value of the spatial component is:

\begin{align} dg( a_{2}, a_{3} ) & = \frac{100}{100} \\\\ & = 1 \end{align}

Placing these results in the original equation gives us:

\begin{align} d( a_{2}, a_{3} ) & = \alpha ( \beta \times \frac{470}{60} + (1 - \beta) \times 1 ) \\\\ & = \alpha ( 7.83 + 0.45 ) \\\\ & = 1196.2116 \end{align}

and $d(a_{2}, a_{3}) > \epsilon$, so the alarms will not be clustered together.

Results

Given the results of the calculations above, we the DBSCAN algorithm will output the following clusters:

$\text{clusters} = \{ \{ a_{1}, a_{2} \}, \{ a_{3}, a_{4} \} \}$

Performance

The DBSCAN algorithm performs well when there are less than 500 candidate alarms. It has a worst-case complexity of $O(n^2)$.

Note that alarms are only considered to be candidates for correlation when they have been created and/or updated in the last 2 hours (configurable). This means that the engine can still be used on systems with more than 500 active alarms, since many of these will age out over time.