LookoutEquipmentAnalysis

class src.lookoutequipment.evaluation.LookoutEquipmentAnalysis(model_name, tags_df)

A class to manage Lookout for Equipment result analysis

model_name

the name of the Lookout for Equipment trained model

Type

string

predicted_ranges

A Pandas dataframe with the predicted anomaly ranges listed in chronological order, with Start and End columns

Type

pandas.DataFrame

labelled_ranges

A Pandas dataframe with the labelled anomaly ranges listed in chronological order, with Start and End columns

Type

pandas.DataFrame

df_list

A list of dataframes, one per time series

Type

list of pandas.DataFrame

Methods

__init__(model_name, tags_df)

Create a new analysis for a Lookout for Equipment model.

compute_histograms([index_normal, …])

This method loops through each signal and computes two distributions of the values in the time series: one for all the anomalies found in the evaluation period and another one with all the normal values found in the same period.

get_labels([labels_fname])

Get the labelled ranges as provided to the model before training

get_predictions()

Get the anomaly ranges predicted by the current model

get_ranked_list([max_signals])

Returns the list of signals with computed rank.

plot_histograms([nb_cols, max_plots])

Once the histograms are computed, we can plot the top N signals by decreasing ranking distance.

plot_histograms_v2(custom_ranking[, …])

plot_signals([nb_cols, max_plots])

Once the histograms are computed, we can plot the top N signals by decreasing ranking distance.

set_time_periods(evaluation_start, …)

Set the time period of analysis

__init__(model_name, tags_df)

Create a new analysis for a Lookout for Equipment model.

Parameters
  • model_name (string) – The name of the Lookout for Equipment trained model

  • tags_df (pandas.DataFrame) – A dataframe containing all the signals, indexed by time

  • region_name (string) – Name of the AWS region from where the service is called.

compute_histograms(index_normal=None, index_anomaly=None, num_bins=20)

This method loops through each signal and computes two distributions of the values in the time series: one for all the anomalies found in the evaluation period and another one for all the normal values found in the same period. It then computes the Wasserstein distance between these two histograms and ranks every signal based on this distance. The higher the distance, the more a signal differs between anomalous and normal periods. This ranking can point a subject matter expert towards the sensors and associated components worth investigating first.

Parameters
  • index_normal (pandas.DatetimeIndex) – All the normal indices

  • index_anomaly (pandas.DatetimeIndex) – All the indices for anomalies

  • num_bins (integer) – Number of bins to use to build the distributions (default: 20)
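
The ranking idea described above can be illustrated with a minimal sketch. It is not the library's implementation: the helper wasserstein_1d and the toy sensor values are hypothetical, the distance is computed on raw samples rather than on binned histograms, and only the standard library is used (scipy.stats.wasserstein_distance is the usual library function for the same quantity).

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """First Wasserstein distance between two 1-D empirical samples.

    Computed as the area between the two empirical CDFs.
    """
    u_sorted, v_sorted = sorted(u), sorted(v)
    points = sorted(set(u_sorted) | set(v_sorted))
    distance = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right)
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        distance += abs(cdf_u - cdf_v) * (right - left)
    return distance


# Hypothetical toy data: values of each signal in normal and anomalous periods
signals = {
    "sensor_a": {"normal": [1.0, 1.1, 0.9, 1.0], "anomaly": [3.0, 3.2, 2.9, 3.1]},
    "sensor_b": {"normal": [5.0, 5.1, 4.9, 5.0], "anomaly": [5.0, 5.2, 4.8, 5.1]},
}

# Rank signals by decreasing distance: the most "different" signal comes first
ranking = sorted(
    ((name, wasserstein_1d(s["normal"], s["anomaly"])) for name, s in signals.items()),
    key=lambda item: item[1],
    reverse=True,
)
```

In this toy data, sensor_a shifts clearly between normal and anomalous periods while sensor_b barely moves, so sensor_a ends up at the top of the ranking.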

get_labels(labels_fname=None)

Get the labelled ranges as provided to the model before training

Parameters

labels_fname (string) – Optional path to a CSV file containing the label ranges. If provided, this method loads the labels from that file. If not provided, it loads the labels from the trained model Describe API (default: None)

Returns

A Pandas dataframe with the labelled anomaly ranges listed in chronological order, with Start and End columns

Return type

pandas.DataFrame

get_predictions()

Get the anomaly ranges predicted by the current model

Returns

A Pandas dataframe with the predicted anomaly ranges listed in chronological order, with Start and End columns

Return type

pandas.DataFrame
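
As an illustration of how Start/End ranges relate to the normal and anomaly indices used by compute_histograms, here is a minimal standard-library sketch. It is an assumption-laden example, not the library's code: the helper split_timestamps is hypothetical, plain (start, end) tuples stand in for the Start and End columns, and range bounds are treated as inclusive.

```python
from datetime import datetime, timedelta

# Hypothetical predicted ranges, standing in for the Start and End columns
predicted_ranges = [
    (datetime(2021, 1, 1, 2, 0), datetime(2021, 1, 1, 3, 0)),
    (datetime(2021, 1, 1, 6, 0), datetime(2021, 1, 1, 6, 30)),
]


def split_timestamps(timestamps, ranges):
    """Split timestamps into (normal, anomaly) lists.

    A timestamp is anomalous if it falls inside any range
    (bounds inclusive: an assumption made for this sketch).
    """
    normal, anomaly = [], []
    for ts in timestamps:
        in_range = any(start <= ts <= end for start, end in ranges)
        (anomaly if in_range else normal).append(ts)
    return normal, anomaly


# One timestamp every 30 minutes, from 00:00 to 07:30
timestamps = [datetime(2021, 1, 1) + timedelta(minutes=30 * i) for i in range(16)]
index_normal, index_anomaly = split_timestamps(timestamps, predicted_ranges)
```

With pandas, the same split would typically be expressed with boolean masks on a DatetimeIndex.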

get_ranked_list(max_signals=12)

Returns the list of signals with computed rank.

Parameters

max_signals (integer) – Number of signals to consider (default: 12)

Returns

A dataframe with each signal and the associated rank value

Return type

pandas.DataFrame

plot_histograms(nb_cols=3, max_plots=12)

Once the histograms are computed, we can plot the top N signals by decreasing ranking distance. By default, this will plot the histograms for the top 12 signals, with 3 plots per row.

Parameters
  • nb_cols (integer) – Number of plots to assemble on a given row (default: 3)

  • max_plots (integer) – Number of signals to consider (default: 12)

Returns

tuple containing:
  • A matplotlib.figure.Figure where the plots are drawn

  • A list of matplotlib.axes.Axes, one for each plot drawn

Return type

tuple
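
The relationship between nb_cols and max_plots can be sketched as simple grid arithmetic. The helper grid_shape below is hypothetical and only assumes that the plots fill the rows left to right, with a partially filled last row when max_plots is not a multiple of nb_cols.

```python
import math


def grid_shape(nb_cols=3, max_plots=12):
    """Return (rows, cols) of a grid large enough to hold max_plots plots."""
    nb_rows = math.ceil(max_plots / nb_cols)
    return nb_rows, nb_cols


default_shape = grid_shape()  # defaults: 12 plots, 3 per row
```

With the defaults, 12 plots at 3 per row give a grid of 4 rows by 3 columns.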

plot_histograms_v2(custom_ranking, nb_cols=3, max_plots=12, num_bins=20)

plot_signals(nb_cols=3, max_plots=12)

Once the histograms are computed, we can plot the top N signals by decreasing ranking distance. By default, this will plot the top 12 signals, with 3 plots per row. For each signal, this method plots the normal values in green and the anomalies in red.

Parameters
  • nb_cols (integer) – Number of plots to assemble on a given row (default: 3)

  • max_plots (integer) – Number of signals to consider (default: 12)

Returns

tuple containing:
  • A matplotlib.figure.Figure where the plots are drawn

  • A list of matplotlib.axes.Axes, one for each plot drawn

Return type

tuple

set_time_periods(evaluation_start, evaluation_end, training_start, training_end)

Set the time period of analysis

Parameters
  • evaluation_start (datetime) – Start of the evaluation period

  • evaluation_end (datetime) – End of the evaluation period

  • training_start (datetime) – Start of the training period

  • training_end (datetime) – End of the training period