Welcome to Lookout for Equipment SDK’s documentation!

Installation, testing and development

Dependencies

lookoutequipment requires:

  • python (>= 3.6)

  • boto3 (>= 1.17.48)

  • markdown

  • numpy

  • pandas

  • pyarrow

  • s3fs

To run the examples, Matplotlib (>= 3.0.0) is required.

User installation

If you already have a working installation of numpy, boto3, pandas… you can easily install lookoutequipment using pip:

pip install lookoutequipment

You can also get the latest version of lookoutequipment by cloning the repository:

git clone https://github.com/aws-samples/amazon-lookout-for-equipment-python-sdk.git
cd amazon-lookout-for-equipment-python-sdk
pip install .

Development

For more information about how to contribute, please refer to the contributing guidelines.

User Guide

Introduction

To build and use an anomaly detection model with Amazon Lookout for Equipment, you need to go through the following steps:

  1. Prepare your time series dataset and your historical maintenance time ranges

  2. Upload your data to Amazon S3

  3. Create a dataset and ingest the S3 data into it

  4. Train a Lookout for Equipment model

  5. Download and post-process the evaluation results

  6. Configure and start a scheduler

  7. Upload fresh data to Amazon S3

  8. Download the inference results generated by the scheduler

The Amazon Lookout for Equipment SDK will help you streamline steps 3 to 8. Some sample datasets are also provided along with some utility functions to tackle steps 1 and 2.
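Each of these steps maps to one of the SDK modules used throughout this guide; the following imports give a preview of the modules you will encounter:

from lookoutequipment import dataset     # sample data, S3 upload, dataset creation and ingestion
from lookoutequipment import model       # model training
from lookoutequipment import evaluation  # post-processing of the evaluation results
from lookoutequipment import scheduler   # inference scheduler configuration and results
from lookoutequipment import plot        # visualization utilities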

Dataset management

Creating a dataset

Let’s start by loading a sample dataset and uploading it to a location on S3:

from lookoutequipment import dataset

root_dir = 'expander-data'
bucket = '<<YOUR-BUCKET>>'     # Replace with your bucket name
prefix = '<<YOUR-PREFIX>>/'    # Don't forget the trailing slash
role_arn = '<<YOUR-ROLE-ARN>>' # A role ARN with access to your S3 data

data = dataset.load_dataset(
    dataset_name='expander',
    target_dir=root_dir
)
dataset.upload_dataset(root_dir, bucket, prefix)

From there we are going to instantiate a class that will help us manage our Lookout for Equipment dataset:

lookout_dataset = dataset.LookoutEquipmentDataset(
    dataset_name='my_dataset',
    access_role_arn=role_arn,
    component_root_dir=f's3://{bucket}/{prefix}training-data'
)

You will need to specify the ARN of a role that has access to your data on S3.

The following line creates the dataset in the Lookout for Equipment service. If you log into your AWS Console and browse to the Lookout for Equipment datasets list, you will see an empty dataset:

lookout_dataset.create()
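If you prefer checking from code rather than from the console, you can also list the datasets available in your account (a quick sketch based on the list_datasets utility documented in the API section below; the name prefix filter is optional):

from lookoutequipment import dataset

# List the Lookout for Equipment datasets whose name starts with 'my_dataset':
print(dataset.list_datasets(dataset_name_prefix='my_dataset'))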

Ingesting data into a dataset

This dataset is empty: let’s ingest our prepared data (note the trailing slash at the end of the prefix):

response = lookout_dataset.ingest_data(bucket, prefix + 'training-data/')

The ingestion process will take a few minutes. If you would like to get feedback from the ingestion process, you can enable a waiter by replacing the previous command with the following:

response = lookout_dataset.ingest_data(
    bucket,
    prefix + 'training-data/',
    wait=True,
    sleep_time=60
)

Model training

Once you have ingested some time series data in your dataset, you can train an anomaly detection model:

from lookoutequipment import model

lookout_model = model.LookoutEquipmentModel(model_name='my_model', dataset_name='my_dataset')
lookout_model.set_time_periods(data['evaluation_start'],
                               data['evaluation_end'],
                               data['training_start'],
                               data['training_end'])
lookout_model.set_label_data(bucket=bucket,
                             prefix=prefix + 'label-data/',
                             access_role_arn=role_arn)
lookout_model.set_target_sampling_rate(sampling_rate='PT5M')
response = lookout_model.train()
lookout_model.poll_model_training(sleep_time=300)

You will see a progress status update every five minutes until the training is successful. With the sample dataset used in this user guide, the training can take up to an hour.

Once your model is trained, you can either check the results over the evaluation period or configure an inference scheduler.

Evaluating a trained model

Plot detected events

Once a model is trained, the DescribeModel API from Amazon Lookout for Equipment gives you access to the metrics associated with the training.

This API returns a dictionary with two fields of interest to plot the evaluation results: labelled_ranges and predicted_ranges, which respectively contain the known and predicted anomalies in the evaluation range. Use the following SDK command to get both of these in a Pandas dataframe:

from lookoutequipment import evaluation

LookoutDiagnostics = evaluation.LookoutEquipmentAnalysis(model_name='my_model', tags_df=data['data'])
predicted_ranges = LookoutDiagnostics.get_predictions()
labeled_range = LookoutDiagnostics.get_labels()

Note: the labeled ranges from the DescribeModel API only include the labeled data falling within the evaluation range. If you want to plot or use all of them (including the labels falling within the training range), you should use the original label data by replacing the last line of the previous code snippet with the following code:

import os

labels_fname = os.path.join(root_dir, 'labels.csv')
labeled_range = LookoutDiagnostics.get_labels(labels_fname)

You can then plot one of the original time series signals and add an overlay of the labeled and predicted anomalous events by leveraging the plot utilities:

from lookoutequipment import plot

TSViz = plot.TimeSeriesVisualization(timeseries_df=data['data'], data_format='tabular')
TSViz.add_signal(['signal-001'])
TSViz.add_labels(labeled_range)
TSViz.add_predictions([predicted_ranges])
TSViz.add_train_test_split(data['evaluation_start'])
TSViz.add_rolling_average(60*24)
TSViz.legend_format = {'loc': 'upper left', 'framealpha': 0.4, 'ncol': 3}
fig, axis = TSViz.plot()

This code will generate the following plot where you can see:

  • A line plot for the signal selected: the part used for training the model appears in blue while the evaluation part is in gray.

  • The rolling average appears as a thin red line overlayed over the time series.

  • The labels are shown in a green ribbon labeled “Known anomalies” (by default).

  • The predicted events are shown in a red ribbon labeled “Detected events”.

[Image: model_evaluation.png]

Plot signal distribution

You might be curious about why Amazon Lookout for Equipment detected an anomalous event. Sometimes, looking at a few of the time series is enough. But sometimes, you need to dig deeper.

The following function aggregates the signal importance of every signal over the evaluation period and sums these contributions over time for each signal. Then, it takes the top 8 signals and plots two distributions: one with the values each signal takes during the normal periods (present in the evaluation range) and a second one with the values taken during all the anomalous events detected in the evaluation range. This will help you visualize any significant shift of values for the top contributing signals.

You can also restrict these histograms to a specific range of time by setting the start and end arguments of the following function with datetime values (see the sketch after the figure below):

from lookoutequipment import plot

TSViz = plot.TimeSeriesVisualization(timeseries_df=data['data'], data_format='tabular')
TSViz.add_predictions([predicted_ranges])
fig = TSViz.plot_histograms(freq='5min')

This code will generate the following plot where you can see a histogram for the top 8 signals contributing to the detected events present in the evaluation range of the model:

[Image: histograms.jpg]
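As mentioned above, you can restrict the histograms to a narrower time window. Here is a minimal sketch, assuming start and end are passed to plot_histograms as datetime values; the dates below are placeholders to replace with your own period of interest:

import pandas as pd

fig = TSViz.plot_histograms(
    freq='5min',
    start=pd.to_datetime('2021-01-01'),
    end=pd.to_datetime('2021-01-31')
)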

Scheduler management

Once a model is successfully trained, you can configure a scheduler that will run regular inferences based on this model:

from lookoutequipment import scheduler

lookout_scheduler = scheduler.LookoutEquipmentScheduler(
    scheduler_name='my_scheduler',
    model_name='my_model'
)

scheduler_params = {
    'input_bucket': bucket,
    'input_prefix': prefix + 'inference-data/input/',
    'output_bucket': bucket,
    'output_prefix': prefix + 'inference-data/output/',
    'role_arn': role_arn,
    'upload_frequency': 'PT5M',
    'delay_offset': None,
    'timezone_offset': '+00:00',
    'component_delimiter': '_',
    'timestamp_format': 'yyyyMMddHHmmss'
}

lookout_scheduler.set_parameters(**scheduler_params)

When the scheduler wakes up, it looks for the appropriate files in the input location configured above. It also opens each file and only keeps the relevant data based on its timestamps. Use the following command to prepare some inference data using the sample dataset we have been using throughout this user guide:

dataset.prepare_inference_data(
    root_dir='expander-data',
    sample_data_dict=data,
    bucket=bucket,
    prefix=prefix
)
response = lookout_scheduler.create()

This will create a scheduler that will process one file every 5 minutes (matching the upload frequency set when configuring the scheduler). After 15 minutes or so, you should have some results available. To get these results from the scheduler in a Pandas dataframe, you just have to run the following command:

results_df = lookout_scheduler.get_predictions()

In this dataframe, you will find one row per event (i.e. one row per scheduler execution). You can then plot the feature importance of any given event. For instance, the following code will plot the feature importance for the first inference execution result:

import pandas as pd

event_details = pd.DataFrame(results_df.iloc[0, 1:]).reset_index()
fig, ax = plot.plot_event_barh(event_details)

This is the result you should have with the sample data:

[Image: feature_importance.png]

Once you’re done, do not forget to stop the scheduler to stop incurring cost:

lookout_scheduler.stop()

You can restart your scheduler with a call to lookout_scheduler.start(), and when you no longer have any use for it, you can delete a stopped scheduler by running lookout_scheduler.delete().
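Put together, a typical lifecycle for the lookout_scheduler object created earlier in this guide looks like this (a short recap sketch):

# Stop the scheduler to stop incurring costs:
lookout_scheduler.stop()

# Restart it later when new inference data is expected:
lookout_scheduler.start()

# Once a scheduler is stopped and no longer needed, delete it:
lookout_scheduler.stop()
lookout_scheduler.delete()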

API Documentation

Full API documentation of the lookoutequipment Python package.

Schema

schema.create_data_schema_from_dir(root_dir)

Generates a data schema compatible with Lookout for Equipment from a local directory

schema.create_data_schema_from_s3_path(s3_path)

Generates a data schema compatible with Lookout for Equipment from an S3 directory

schema.create_data_schema(component_fields_map)

Generates a JSON formatted string from a dictionary
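For reference, here is a minimal sketch of building a schema from a dictionary; the component and field names below are purely illustrative:

from lookoutequipment import schema

# Hypothetical mapping from each component name to the fields contained in
# its time series files (timestamp column first):
component_fields_map = {
    'component-1': ['Timestamp', 'signal-001', 'signal-002'],
    'component-2': ['Timestamp', 'signal-003']
}

# create_data_schema() generates a JSON formatted string from this dictionary:
data_schema = schema.create_data_schema(component_fields_map)
print(data_schema)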

Datasets

dataset.list_datasets([dataset_name_prefix, …])

List all the Lookout for Equipment datasets available in this account.

dataset.load_dataset(dataset_name, target_dir)

This function can be used to download example datasets to run Amazon Lookout for Equipment on.

dataset.upload_dataset(root_dir, bucket, prefix)

Upload a local dataset to S3.

dataset.prepare_inference_data(root_dir, …)

This function prepares a sequence of data suitable as input for an inference scheduler.

dataset.generate_replay_data(dataset_name, …)

Generates inference input data from the training data to test a scheduler that would be configured for a model trained with this dataset.

dataset.LookoutEquipmentDataset(…[, …])

A class to manage Lookout for Equipment datasets

Models

model.list_models([model_name_prefix, …])

List all the models available in the current account
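For instance, a quick sketch using only the documented model_name_prefix argument:

from lookoutequipment import model

# List the models whose name starts with 'my_model':
print(model.list_models(model_name_prefix='my_model'))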

model.LookoutEquipmentModel(model_name, …)

A class to manage Lookout for Equipment models

Evaluation

evaluation.LookoutEquipmentAnalysis(…)

A class to manage Lookout for Equipment result analysis

Scheduler

Plot

plot.plot_histogram_comparison(timeseries_1, …)

Takes two time series and plots a histogram showing their respective distributions of values

plot.plot_event_barh(event_details[, …])

Plot a horizontal bar chart with the feature importance of each signal that contributes to the event passed as an argument.

plot.plot_range(range_df, range_title, …)

Plot a range with either labelled or predicted events as a filled area positioned under the time series data.

plot.TimeSeriesVisualization(timeseries_df, …)

A class to manage time series visualization along with labels and detected events

The Amazon Lookout for Equipment SDK is an open source Python package that allows data scientists and software developers to easily build and deploy time series anomaly detection models using Amazon Lookout for Equipment. This SDK enables you to do the following:

  • Build a dataset schema

  • Upload data to the necessary S3 structure

  • Train an anomaly detection model using Amazon Lookout for Equipment

  • Build beautiful visualizations for your model evaluation

  • Configure and start an inference scheduler

  • Manage schedulers (start, stop, delete) whenever necessary

  • Visualize scheduler inference results