LookoutEquipmentAnalysis
- class src.lookoutequipment.evaluation.LookoutEquipmentAnalysis(model_name, tags_df)
A class to manage Lookout for Equipment result analysis
- model_name
The name of the trained Lookout for Equipment model
- Type
string
- predicted_ranges
A Pandas dataframe with the predicted anomaly ranges listed in chronological order, with Start and End columns
- Type
pandas.DataFrame
- labelled_ranges
A Pandas dataframe with the labelled anomaly ranges listed in chronological order, with Start and End columns
- Type
pandas.DataFrame
- df_list
A list of dataframes, one per time series
- Type
list of pandas.DataFrame
Methods
__init__
(model_name, tags_df)Create a new analysis for a Lookout for Equipment model.
compute_histograms
([index_normal, …])This method loops through each signal and computes two distributions of the values in the time series: one for all the anomalies found in the evaluation period and another one with all the normal values found in the same period.
get_labels
([labels_fname])Get the labelled ranges as provided to the model before training
get_predictions
()Get the anomaly ranges predicted by the current model
get_ranked_list
([max_signals])Returns the list of signals with computed rank.
plot_histograms
([nb_cols, max_plots])Once the histograms are computed, we can plot the top N by decreasing ranking distance.
plot_histograms_v2
(custom_ranking[, …])
plot_signals
([nb_cols, max_plots])Once the histograms are computed, we can plot the top N signals by decreasing ranking distance.
set_time_periods
(evaluation_start, …)Set the time period of analysis
- __init__(model_name, tags_df)
Create a new analysis for a Lookout for Equipment model.
- Parameters
model_name (string) – The name of the Lookout for Equipment trained model
tags_df (pandas.DataFrame) – A dataframe containing all the signals, indexed by time
region_name (string) – Name of the AWS region from where the service is called.
- compute_histograms(index_normal=None, index_anomaly=None, num_bins=20)
This method loops through each signal and computes two distributions of the values in the time series: one for all the anomalies found in the evaluation period and another for all the normal values found in the same period. It then computes the Wasserstein distance between these two histograms and ranks every signal by this distance. The higher the distance, the more a signal differs between anomalous and normal periods. This ranking can point a subject matter expert towards the sensors and associated components to investigate first.
- Parameters
index_normal (pandas.DatetimeIndex) – All the normal indices
index_anomaly (pandas.DatetimeIndex) – All the indices for anomalies
num_bins (integer) – Number of bins to use to build the distributions (default: 20)
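The ranking idea described above can be sketched without any dependency. This is only an illustration under stated assumptions, not the library's implementation: the helper names (histogram, wasserstein_from_histograms, rank_signals) and the sample values are hypothetical, and both histograms are assumed to share the same equal-width bins.

```python
from typing import Dict, List, Sequence, Tuple


def histogram(values: Sequence[float], lo: float, hi: float, num_bins: int) -> List[float]:
    """Return normalized bin frequencies of `values` over [lo, hi]."""
    if not values:
        raise ValueError("cannot build a histogram from an empty sequence")
    counts = [0] * num_bins
    width = (hi - lo) / num_bins
    for v in values:
        # Clamp the index so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def wasserstein_from_histograms(p: Sequence[float], q: Sequence[float], bin_width: float) -> float:
    """1-D Wasserstein-1 distance: area between the two cumulative distributions."""
    distance = 0.0
    cdf_p = cdf_q = 0.0
    for mass_p, mass_q in zip(p, q):
        cdf_p += mass_p
        cdf_q += mass_q
        distance += abs(cdf_p - cdf_q) * bin_width
    return distance


def rank_signals(
    normal: Dict[str, Sequence[float]],
    anomaly: Dict[str, Sequence[float]],
    num_bins: int = 20,
) -> List[Tuple[str, float]]:
    """Rank signals by decreasing distance between normal and anomalous values."""
    ranking = []
    for name in normal:
        values = list(normal[name]) + list(anomaly[name])
        lo, hi = min(values), max(values)
        if hi == lo:
            # A constant signal cannot separate normal from anomalous periods.
            ranking.append((name, 0.0))
            continue
        p = histogram(normal[name], lo, hi, num_bins)
        q = histogram(anomaly[name], lo, hi, num_bins)
        ranking.append((name, wasserstein_from_histograms(p, q, (hi - lo) / num_bins)))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Hypothetical sample values: temperature shifts during anomalies, pressure barely moves.
normal = {"temperature": [1.0, 1.1, 0.9, 1.0], "pressure": [5.0, 5.1, 4.9, 5.0]}
anomaly = {"temperature": [3.0, 3.1, 2.9, 3.0], "pressure": [5.0, 5.05, 4.95, 5.0]}
ranking = rank_signals(normal, anomaly, num_bins=20)
print(ranking)
```

In this sketch the temperature signal ranks first because its values shift far more between the two periods than the pressure values do.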
- get_labels(labels_fname=None)
Get the labelled ranges as provided to the model before training
- Parameters
labels_fname (string) – Optional path to a CSV file containing the label ranges. If provided, the labels are loaded from this file; otherwise they are loaded from the trained model Describe API (default: None)
- Returns
A Pandas dataframe with the labelled anomaly ranges listed in chronological order, with Start and End columns
- Return type
pandas.DataFrame
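As a rough illustration of loading label ranges from a CSV file, the sketch below parses them into chronologically ordered (start, end) pairs. The file layout is not described in this section, so a headerless two-column file of ISO 8601 timestamps is assumed here; load_label_ranges is a hypothetical helper that uses the standard library in place of pandas.

```python
import csv
import io
from datetime import datetime
from typing import List, Tuple


def load_label_ranges(csv_text: str) -> List[Tuple[datetime, datetime]]:
    """Parse (start, end) label ranges and return them in chronological order."""
    ranges = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        start = datetime.fromisoformat(row[0].strip())
        end = datetime.fromisoformat(row[1].strip())
        if end < start:
            raise ValueError(f"label range ends before it starts: {row}")
        ranges.append((start, end))
    # Tuples sort by start time first, then by end time.
    return sorted(ranges)


# Assumed layout: one range per line, start then end, no header row.
sample = """2021-03-01T00:00:00,2021-03-02T12:00:00
2021-01-15T08:00:00,2021-01-15T20:00:00
"""
labelled_ranges = load_label_ranges(sample)
for start, end in labelled_ranges:
    print(start, end)
```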
- get_predictions()
Get the anomaly ranges predicted by the current model
- Returns
A Pandas dataframe with the predicted anomaly ranges listed in chronological order, with Start and End columns
- Return type
pandas.DataFrame
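One way to turn a set of Start/End ranges into the normal and anomalous indices that compute_histograms accepts is sketched below. It is a dependency-free illustration rather than the library's own logic: split_by_ranges is a hypothetical helper, and treating both range bounds and both evaluation bounds as inclusive is an assumption.

```python
from datetime import datetime
from typing import List, Sequence, Tuple


def split_by_ranges(
    timestamps: Sequence[datetime],
    anomaly_ranges: Sequence[Tuple[datetime, datetime]],
    evaluation_start: datetime,
    evaluation_end: datetime,
) -> Tuple[List[datetime], List[datetime]]:
    """Split evaluation-period timestamps into (normal, anomalous) lists."""
    normal, anomalous = [], []
    for ts in timestamps:
        if not (evaluation_start <= ts <= evaluation_end):
            continue  # outside the evaluation period
        if any(start <= ts <= end for start, end in anomaly_ranges):
            anomalous.append(ts)
        else:
            normal.append(ts)
    return normal, anomalous


# Hypothetical data: hourly samples over one day, one anomaly from 10:00 to 12:00.
timestamps = [datetime(2021, 1, 1, hour) for hour in range(24)]
ranges = [(datetime(2021, 1, 1, 10), datetime(2021, 1, 1, 12))]
normal, anomalous = split_by_ranges(
    timestamps, ranges, datetime(2021, 1, 1, 6), datetime(2021, 1, 1, 18)
)
print(len(normal), len(anomalous))
```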
- get_ranked_list(max_signals=12)
Returns the list of signals with computed rank.
- Parameters
max_signals (integer) – Number of signals to consider (default: 12)
- Returns
A dataframe with each signal and the associated rank value
- Return type
pandas.DataFrame
- plot_histograms(nb_cols=3, max_plots=12)
Once the histograms are computed, we can plot the top N by decreasing ranking distance. By default, this will plot the histograms for the top 12 signals, with 3 plots per line.
- Parameters
nb_cols (integer) – Number of plots to assemble on a given row (default: 3)
max_plots (integer) – Number of signals to consider (default: 12)
- Returns
- tuple containing:
A matplotlib.pyplot.figure where the plots are drawn
A list of matplotlib.pyplot.Axis with each plot drawn here
- Return type
tuple
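The relationship between nb_cols and max_plots can be illustrated with a small calculation. How the library actually sizes its figure is not shown here; grid_shape is a hypothetical helper that assumes the grid holds at most max_plots plots and fills rows of nb_cols plots.

```python
import math
from typing import Tuple


def grid_shape(num_signals: int, nb_cols: int = 3, max_plots: int = 12) -> Tuple[int, int]:
    """Return (rows, columns) needed to lay out the top-ranked signals."""
    num_plots = min(num_signals, max_plots)
    nb_rows = math.ceil(num_plots / nb_cols)
    return nb_rows, nb_cols


print(grid_shape(30))  # 30 signals, capped at the default 12 plots
print(grid_shape(7))   # 7 signals, the last row is only partly filled
```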
- plot_histograms_v2(custom_ranking, nb_cols=3, max_plots=12, num_bins=20)
- plot_signals(nb_cols=3, max_plots=12)
Once the histograms are computed, we can plot the top N signals by decreasing ranking distance. By default, this will plot the top 12 signals, with 3 plots per line. For each signal, this method will plot the normal values in green and the anomalies in red.
- Parameters
nb_cols (integer) – Number of plots to assemble on a given row (default: 3)
max_plots (integer) – Number of signals to consider (default: 12)
- Returns
- tuple containing:
A matplotlib.pyplot.figure where the plots are drawn
A list of matplotlib.pyplot.Axis with each plot drawn here
- Return type
tuple
- set_time_periods(evaluation_start, evaluation_end, training_start, training_end)
Set the time period of analysis
- Parameters
evaluation_start (datetime) – Start of the evaluation period
evaluation_end (datetime) – End of the evaluation period
training_start (datetime) – Start of the training period
training_end (datetime) – End of the training period