LookoutEquipmentAnalysis

class src.lookoutequipment.evaluation.LookoutEquipmentAnalysis(model_name, tags_df)

A class to manage Lookout for Equipment result analysis

model_name

The name of the trained Lookout for Equipment model

Type: string

predicted_ranges

A pandas dataframe with the predicted anomaly ranges, listed in chronological order, with Start and End columns

Type: pandas.DataFrame

labelled_ranges

A pandas dataframe with the labelled anomaly ranges, listed in chronological order, with Start and End columns

Type: pandas.DataFrame

df_list

A list of dataframes, one for each time series

Type: list of pandas.DataFrame

Methods

`__init__`(model_name, tags_df)	Create a new analysis for a Lookout for Equipment model.
`compute_histograms`([index_normal, …])	This method loops through each signal and computes two distributions of the values in the time series: one for all the anomalies found in the evaluation period and another one with all the normal values found in the same period.
`get_labels`([labels_fname])	Get the labelled ranges as provided to the model before training
`get_predictions`()	Get the anomaly ranges predicted by the current model
`get_ranked_list`([max_signals])	Returns the list of signals with computed rank.
`plot_histograms`([nb_cols, max_plots])	Once the histograms are computed, we can plot the top N by decreasing ranking distance.
`plot_histograms_v2`(custom_ranking[, …])
`plot_signals`([nb_cols, max_plots])	Once the histograms are computed, we can plot the top N signals by decreasing ranking distance.
`set_time_periods`(evaluation_start, …)	Set the time period of analysis

__init__(model_name, tags_df)

Create a new analysis for a Lookout for Equipment model.

Parameters

model_name (string) – The name of the Lookout for Equipment trained model
tags_df (pandas.DataFrame) – A dataframe containing all the signals, indexed by time
region_name (string) – Name of the AWS region from where the service is called.

compute_histograms(index_normal=None, index_anomaly=None, num_bins=20)

This method loops through each signal and computes two distributions of the values in the time series: one for all the anomalies found in the evaluation period and another for all the normal values found in the same period. It then computes the Wasserstein distance between these two histograms and ranks every signal by this distance. The higher the distance, the more a signal differs between anomalous and normal periods. This can orient a subject matter expert's investigation towards the relevant sensors and their associated components.

Parameters

index_normal (pandas.DatetimeIndex) – All the indices for normal values
index_anomaly (pandas.DatetimeIndex) – All the indices for anomalies
num_bins (integer) – Number of bins to use to build the distributions (default: 20)
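
The ranking described above can be sketched as follows. This is a minimal illustration and not the library's actual implementation: the helper name rank_signals, the synthetic data, and the choice to compare histograms through their bin centers with scipy.stats.wasserstein_distance are all assumptions.

```python
import numpy as np
import pandas as pd
from scipy.stats import wasserstein_distance


def rank_signals(tags_df, index_normal, index_anomaly, num_bins=20):
    """Rank signals by the Wasserstein distance between the histograms
    of their normal and anomalous values (largest distance first).

    Hypothetical helper, written for illustration only.
    """
    distances = {}
    for tag in tags_df.columns:
        normal = tags_df.loc[index_normal, tag].dropna().to_numpy()
        anomaly = tags_df.loc[index_anomaly, tag].dropna().to_numpy()

        # Shared bin edges so both histograms cover the same value range
        edges = np.histogram_bin_edges(
            np.concatenate([normal, anomaly]), bins=num_bins
        )
        centers = (edges[:-1] + edges[1:]) / 2
        hist_normal, _ = np.histogram(normal, bins=edges)
        hist_anomaly, _ = np.histogram(anomaly, bins=edges)

        # Distance between the two binned distributions
        distances[tag] = wasserstein_distance(
            centers, centers, u_weights=hist_normal, v_weights=hist_anomaly
        )
    return pd.Series(distances).sort_values(ascending=False)


# Synthetic data: "shifted" changes level during the anomaly, "steady" does not
rng = np.random.default_rng(0)
index = pd.date_range("2021-01-01", periods=1000, freq="min")
tags_df = pd.DataFrame(
    {
        "steady": rng.normal(0.0, 1.0, 1000),
        "shifted": rng.normal(0.0, 1.0, 1000),
    },
    index=index,
)
index_normal = index[:800]
index_anomaly = index[800:]
tags_df.loc[index_anomaly, "shifted"] += 5.0

ranking = rank_signals(tags_df, index_normal, index_anomaly)
print(ranking)
```

In this sketch the signal whose level changes during the anomalous period ends up at the top of the ranking, which is the behaviour the method description suggests.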

get_labels(labels_fname=None)

Get the labelled ranges as provided to the model before training

Parameters: labels_fname (string) – Optional path to a CSV file containing the label ranges. If provided, this method loads the labels from this file; otherwise, it loads them from the trained model Describe API (default: None)
Returns: A pandas dataframe with the labelled anomaly ranges, listed in chronological order, with Start and End columns
Return type: pandas.DataFrame
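
As a rough illustration of the CSV option, the sketch below loads label ranges into a dataframe with Start and End columns. The file layout (one range per row, two timestamp columns, no header row) and the sample timestamps are assumptions, not something this page specifies.

```python
import io

import pandas as pd

# Hypothetical file content: one labelled range per row, no header row
labels_csv = io.StringIO(
    "2021-03-10T00:00:00,2021-03-12T06:00:00\n"
    "2021-01-05T08:00:00,2021-01-06T16:30:00\n"
)

labels_df = pd.read_csv(labels_csv, header=None, names=["Start", "End"])
labels_df["Start"] = pd.to_datetime(labels_df["Start"])
labels_df["End"] = pd.to_datetime(labels_df["End"])

# List the ranges in chronological order, as the returned dataframe does
labels_df = labels_df.sort_values("Start").reset_index(drop=True)
print(labels_df)
```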

get_predictions()

Get the anomaly ranges predicted by the current model

Returns: A pandas dataframe with the predicted anomaly ranges, listed in chronological order, with Start and End columns
Return type: pandas.DataFrame
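
A dataframe of ranges with Start and End columns can be turned into the kind of index_normal and index_anomaly arguments that compute_histograms accepts. The sketch below shows one way to do this. The helper split_index_by_ranges is hypothetical, and treating both bounds of a range as inclusive is an assumption.

```python
import numpy as np
import pandas as pd


def split_index_by_ranges(index, ranges_df):
    """Split a DatetimeIndex into (normal, anomaly) parts, given a dataframe
    of ranges with Start and End columns (both bounds assumed inclusive)."""
    mask = np.zeros(len(index), dtype=bool)
    for start, end in zip(ranges_df["Start"], ranges_df["End"]):
        # Flag every timestamp that falls inside the current range
        mask |= np.asarray((index >= start) & (index <= end))
    return index[~mask], index[mask]


# Illustrative data: ten daily timestamps and two anomaly ranges
index = pd.date_range("2021-01-01", periods=10, freq="D")
predicted_ranges = pd.DataFrame(
    {
        "Start": pd.to_datetime(["2021-01-03", "2021-01-08"]),
        "End": pd.to_datetime(["2021-01-04", "2021-01-08"]),
    }
)

index_normal, index_anomaly = split_index_by_ranges(index, predicted_ranges)
print(index_anomaly)
```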

get_ranked_list(max_signals=12)

Returns the list of signals with computed rank.

Parameters: max_signals (integer) – Number of signals to consider (default: 12)
Returns: A dataframe with each signal and the associated rank value
Return type: pandas.DataFrame

plot_histograms(nb_cols=3, max_plots=12)

Once the histograms are computed, we can plot the top N signals by decreasing ranking distance. By default, this plots the histograms for the top 12 signals, with 3 plots per row.

Parameters

nb_cols (integer) – Number of plots to assemble on a given row (default: 3)
max_plots (integer) – Number of signals to consider (default: 12)

Returns

tuple containing:

A matplotlib.figure.Figure where the plots are drawn
A list of matplotlib.axes.Axes, one for each plot drawn

Return type

tuple
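
The grid layout implied by nb_cols and max_plots can be sketched as below. This is not the library's plotting code: the helper plot_histogram_grid, its values_by_signal input, and the colors (borrowed from the green and red convention described for plot_signals) are assumptions made for illustration.

```python
import math

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_histogram_grid(values_by_signal, nb_cols=3, max_plots=12, num_bins=20):
    """Draw one pair of histograms per signal, nb_cols plots per row.

    values_by_signal maps a signal name to a (normal, anomaly) pair of
    arrays, already sorted by decreasing ranking distance (hypothetical input).
    """
    signals = list(values_by_signal)[:max_plots]
    nb_rows = math.ceil(len(signals) / nb_cols)
    fig, grid = plt.subplots(
        nb_rows, nb_cols, figsize=(4 * nb_cols, 3 * nb_rows), squeeze=False
    )
    axes = list(grid.flatten())

    for ax, name in zip(axes, signals):
        normal, anomaly = values_by_signal[name]
        ax.hist(normal, bins=num_bins, density=True, alpha=0.5,
                color="tab:green", label="normal")
        ax.hist(anomaly, bins=num_bins, density=True, alpha=0.5,
                color="tab:red", label="anomaly")
        ax.set_title(name)
        ax.legend()

    # Hide the unused cells of the last row
    for ax in axes[len(signals):]:
        ax.set_visible(False)

    return fig, axes[:len(signals)]


# Illustrative data: five signals, so the grid has two rows of three cells
rng = np.random.default_rng(1)
values_by_signal = {
    f"signal_{i}": (rng.normal(0, 1, 500), rng.normal(i, 1, 100))
    for i in range(5)
}
fig, axes = plot_histogram_grid(values_by_signal)
fig.savefig("histograms.png")
```

Returning the figure together with the list of axes mirrors the tuple described above and lets the caller adjust titles or save the figure afterwards.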

plot_histograms_v2(custom_ranking, nb_cols=3, max_plots=12, num_bins=20)

plot_signals(nb_cols=3, max_plots=12)

Once the histograms are computed, we can plot the top N signals by decreasing ranking distance. By default, this plots the top 12 signals, with 3 plots per row. For each signal, this method plots the normal values in green and the anomalies in red.

Parameters

nb_cols (integer) – Number of plots to assemble on a given row (default: 3)
max_plots (integer) – Number of signals to consider (default: 12)

Returns

tuple containing:

A matplotlib.figure.Figure where the plots are drawn
A list of matplotlib.axes.Axes, one for each plot drawn

Return type

tuple

set_time_periods(evaluation_start, evaluation_end, training_start, training_end)

Set the time period of analysis

Parameters

evaluation_start (datetime) – Start of the evaluation period
evaluation_end (datetime) – End of the evaluation period
training_start (datetime) – Start of the training period
training_end (datetime) – End of the training period