
Data transformation

TSP supports the following kinds of data transformation (i.e. pre-processing before searching for patterns):

NarrowDataUnfolding

This transformation unfolds data from the “narrow” format (with only a key column and a value column) into the “wide” format (with a separate column for each key). If a key has no value at a given timestamp, its column is filled with that key's previous value, provided the value has not yet expired.

Configuration parameters:

| Name             | Type              | Description                                                                 |
|------------------|-------------------|-----------------------------------------------------------------------------|
| key              | String            | name of the column containing keys                                          |
| value            | String            | name of the column containing values                                        |
| fieldsTimeoutsMs | Map[String, Long] | expiration timeouts for each key, in milliseconds                           |
| defaultTimeout   | Long              | default timeout, in milliseconds, for keys not specified in fieldsTimeoutsMs |

Example

Raw stored data:

Time Key Value
2018-11-01T12:00:00Z key_1 1
2018-11-01T12:00:00Z key_3 8
2018-11-01T12:00:05Z key_2 10
2018-11-01T12:00:10Z key_1 2
2018-11-01T12:00:15Z key_3 1
2018-11-01T12:00:20Z key_4 6
2018-11-01T12:00:25Z key_2 15

Configuration:

{
  "key": "Key",
  "value": "Value",
  "fieldsTimeoutsMs": {"key_1": 15000, "key_2": 10000},
  "defaultTimeout": 5000
}

Result:

Time                  key_1  key_2  key_3  key_4
2018-11-01T12:00:00Z  1             8
2018-11-01T12:00:05Z  1      10     8
2018-11-01T12:00:10Z  2      10
2018-11-01T12:00:15Z  2      10     1
2018-11-01T12:00:20Z  2             1      6
2018-11-01T12:00:25Z  2      15            6

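(Note that a value is carried forward only until its timeout expires. In this example the boundary is inclusive: key_2 is still filled at 12:00:15, exactly 10000 milliseconds after its last value, and is empty at 12:00:20. key_3 and key_4 use the default timeout of 5000 milliseconds.)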
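The unfolding and filling rules in this example can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not TSP's actual implementation: the function names `unfold_narrow` and `parse_time` are hypothetical, and the inclusive timeout comparison is an assumption inferred from the result table.

```python
from datetime import datetime


def parse_time(ts):
    # Replace the trailing "Z" so that older Python versions can parse it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def unfold_narrow(rows, fields_timeouts_ms, default_timeout):
    """Turn (time, key, value) rows into one wide row per timestamp.

    Hypothetical helper: a key's column holds its latest value as long as
    that value is no older than the key's timeout (assumed inclusive).
    """
    rows = sorted(rows, key=lambda r: parse_time(r[0]))
    keys = sorted({key for _, key, _ in rows})
    last_seen = {}  # key -> (time of last value, last value)
    result = {}     # timestamp string -> {key: value or None}
    for ts, key, value in rows:
        now = parse_time(ts)
        last_seen[key] = (now, value)
        wide = {}
        for k in keys:
            wide[k] = None
            if k in last_seen:
                seen_at, v = last_seen[k]
                age_ms = (now - seen_at).total_seconds() * 1000
                if age_ms <= fields_timeouts_ms.get(k, default_timeout):
                    wide[k] = v
        # Rows sharing a timestamp overwrite each other, so the final
        # entry reflects every value reported at that timestamp.
        result[ts] = wide
    return result


RAW = [
    ("2018-11-01T12:00:00Z", "key_1", 1),
    ("2018-11-01T12:00:00Z", "key_3", 8),
    ("2018-11-01T12:00:05Z", "key_2", 10),
    ("2018-11-01T12:00:10Z", "key_1", 2),
    ("2018-11-01T12:00:15Z", "key_3", 1),
    ("2018-11-01T12:00:20Z", "key_4", 6),
    ("2018-11-01T12:00:25Z", "key_2", 15),
]

result = unfold_narrow(RAW, {"key_1": 15000, "key_2": 10000}, 5000)
for ts, wide in result.items():
    print(ts, wide)
```

Empty cells of the result table correspond to `None` here. Keeping only the last value and its timestamp per key is enough for this rule, so the sketch needs a single pass over the time-sorted rows.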

WideDataFilling

This transformation expects “wide” data as its input and performs only the filling, which works the same way as in NarrowDataUnfolding. It is especially useful for the InfluxDB source, which stores values in separate columns but cannot fill gaps with previous values subject to a time-based expiration.

Configuration

It accepts only the fieldsTimeoutsMs and defaultTimeout fields, with the same semantics as in NarrowDataUnfolding (see above).
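As a rough illustration of the filling step on its own, the following Python sketch fills gaps in already-wide rows. It is not TSP's implementation: the function `fill_wide`, the column names `speed` and `temp`, and the sample values are all hypothetical, and the inclusive timeout comparison is an assumption carried over from the NarrowDataUnfolding example.

```python
from datetime import datetime


def parse_time(ts):
    # Replace the trailing "Z" so that older Python versions can parse it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def fill_wide(rows, fields_timeouts_ms, default_timeout):
    """Fill missing (None) cells with the previous value of the same column.

    Hypothetical helper: rows is a time-sorted list of (time, {column: value}).
    A previous value is reused only while its age does not exceed the
    column's timeout (assumed inclusive).
    """
    last_seen = {}  # column -> (time of last value, last value)
    filled = []
    for ts, values in rows:
        now = parse_time(ts)
        out = {}
        for column, value in values.items():
            if value is not None:
                last_seen[column] = (now, value)
            elif column in last_seen:
                seen_at, previous = last_seen[column]
                age_ms = (now - seen_at).total_seconds() * 1000
                timeout = fields_timeouts_ms.get(column, default_timeout)
                value = previous if age_ms <= timeout else None
            out[column] = value
        filled.append((ts, out))
    return filled


# Mirrors a config of {"fieldsTimeoutsMs": {"speed": 10000}, "defaultTimeout": 5000}.
WIDE = [
    ("2018-11-01T12:00:00Z", {"speed": 50, "temp": 20}),
    ("2018-11-01T12:00:05Z", {"speed": None, "temp": None}),
    ("2018-11-01T12:00:10Z", {"speed": None, "temp": None}),
    ("2018-11-01T12:00:15Z", {"speed": None, "temp": 21}),
]

filled = fill_wide(WIDE, {"speed": 10000}, 5000)
for ts, row in filled:
    print(ts, row)
```

In this sample, `speed` is carried forward for 10000 milliseconds and `temp` falls back to the default of 5000 milliseconds, so the two columns expire at different rows.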