
Advanced

In the Advanced part of the Configuration Map, you can tune advanced settings that control how TAZI works and runs your model:

TAZI Hunt Settings

Reporting Interval: Number of instances between saves of the explanation model. Set this to at most the total number of instances divided by 1000.
Batch Training: Used by clustering algorithms to save the clustering model.
Train: Used by clustering algorithms to save the clustering model.
Input Shuffling: Enables shuffling of the training data source (file data sources only).
Train End Point: Used by clustering algorithms to save the clustering model.
Batch Train End Point: Used by clustering algorithms to save the clustering model.
Graph Mini-Batch Size: Mini-batch size of the internal data flow (not related to training).
Snapshot Instance: Used by clustering algorithms to save the clustering model.
Model Save Point: Number of instances between saves of the whole TAZI system; a saved system can be loaded to train another system. Set this to at most the total number of instances divided by 10. If you don't want to save any TAZI systems, set this to a very large number.
dailyModelSavePoint: The whole TAZI system is saved at this hour each day. Works like Model Save Point.
Initial Mini: TAZI Mini is an explanation model. This option is used when TAZI Mini outputs are used as predictions, and controls which explanation model TAZI starts with.
Write Original Instances:
Selected Mini Model: Controls which model is loaded as the mini model. huntcurrent: the standard explanation model; committed: the committed model (see the tazilive-model tab).
Char Reset Window: Controls how often the characterization is refreshed. Used to detect unexpected changes in the data characteristics.
Test Feedback Type:
Send Inner Feedback: Automatically feed labeled instances to the semi-supervised model (without necessarily requiring human feedback).
Number of Feedback: Number of instances sent to the semi-supervised model in each feedback when Send Inner Feedback is true.
Save Unsup Features: Save the unsup processed model's features (cluster descriptors) at each 'Snapshot Instance'.
Save Models: If this is set to false, the models are not saved at all.
Reset Confusion Matrices: Reset the values in the confusion matrices of each ML model when the Train End Point is reached or a new model is loaded.
Confusion Matrix Reset Interval: Interval at which the confusion matrices are reset, so that the latest model performance is used for classifier combination and model evaluation.
Terminate on Exception: Used for debugging. Controls whether TAZI stops running when an unhandled exception occurs.
Chosen Model for Optimization: Which model is used for hypo performance computation: combined (default), FullSupModel, NN, SemiSupModel, UnsupModel, UnsupModelProcessed. If there are multiple models of the same type, hypo uses the first one (with id 0).
Probability Threshold: If asynchtrain = true, each instance is randomly selected, with this probability, for training a classifier.
Confidence Threshold: If asynchtrain = true, instances whose prediction confidence is below this value are selected for training a classifier.
Min Generated Instance With Max Leafed Label: When training TAZI based on a committed model, we generate synthetic data from the committed model. This parameter controls how many instances are generated for each label.
Synt Accuracy Threshold: When training TAZI based on a committed model, we generate synthetic data from the committed model. This parameter is the accuracy at which we stop generating synthetic data and training on it.
Target Reached Delta: When training TAZI based on a committed model, we generate synthetic data from the committed model. This parameter controls when we consider ourselves close enough to the Synt Accuracy Threshold. For example, if we are aiming for 0.9 accuracy and Target Reached Delta is 0.1, we stop training at 0.8 accuracy.
First Difference Threshold: If asynchtrain = true and the training thread lags by this amount, a first-severity-degree elimination method is used.
Second Difference Threshold: If asynchtrain = true and the training thread lags by this amount, a second-severity-degree elimination method is used.
Third Difference Threshold: If asynchtrain = true and the training thread lags by this amount, a third-severity-degree elimination method is used.
Asynchronous Training: Asynchtrain makes TAZI faster by training on a selected subset of necessary instances instead of on all instances.
Use FullSup For Explanation: If FullSup is the only active model, use it for explanation.
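The asynchronous training entries describe two ways of picking training instances and three lag thresholds. The Python sketch below is one hypothetical reading of them, not TAZI's implementation: the helper names, representing lag as an integer, and treating each threshold as a lower bound (lag >= threshold) are assumptions. This page does not describe the elimination methods themselves, so the sketch only reports which severity degree would apply.

```python
import random


def sampled_for_training(rng: random.Random, probability_threshold: float) -> bool:
    """Probability Threshold: select the instance with this probability."""
    return rng.random() < probability_threshold


def low_confidence(confidence: float, confidence_threshold: float) -> bool:
    """Confidence Threshold: select instances predicted below this confidence."""
    return confidence < confidence_threshold


def severity_degree(lag: int, first: int, second: int, third: int) -> int:
    """Map training-thread lag to an elimination severity degree (0 = none).

    Assumes first <= second <= third; the highest threshold reached wins.
    """
    if lag >= third:
        return 3
    if lag >= second:
        return 2
    if lag >= first:
        return 1
    return 0


rng = random.Random(0)
kept = sum(sampled_for_training(rng, 0.25) for _ in range(10_000))
print(kept)  # about a quarter of the 10,000 instances
print(low_confidence(0.4, confidence_threshold=0.6))  # True
print(severity_degree(lag=150, first=100, second=500, third=1000))  # 1
```

Passing a dedicated `random.Random` instance keeps the sampling reproducible without touching Python's global random state.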
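The stopping rule described under Synt Accuracy Threshold and Target Reached Delta reduces to a single comparison: stop once accuracy reaches the threshold minus the delta. The sketch below encodes that comparison with hypothetical helper names and assumes accuracy is checked once per round of synthetic training; the per-round accuracies in the usage lines are made up for illustration.

```python
def reached_target(accuracy: float, synt_accuracy_threshold: float,
                   target_reached_delta: float) -> bool:
    """True once accuracy is within target_reached_delta of the threshold."""
    return accuracy >= synt_accuracy_threshold - target_reached_delta


def rounds_until_target(accuracies, synt_accuracy_threshold, target_reached_delta):
    """Return the 1-based round at which training on synthetic data would stop."""
    for round_number, accuracy in enumerate(accuracies, start=1):
        if reached_target(accuracy, synt_accuracy_threshold, target_reached_delta):
            return round_number
    return None  # target never reached; generation would continue


# Example from the Target Reached Delta entry: aim for 0.9 with a delta of 0.1.
print(reached_target(0.79, 0.9, 0.1))  # False
print(reached_target(0.81, 0.9, 0.1))  # True
# Made-up per-round accuracies, for illustration only.
print(rounds_until_target([0.55, 0.72, 0.83, 0.88], 0.9, 0.1))  # 3
```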

Unsup

Time Window: -
Last N: -

Outlier Detector

Logger Name: Logger name for the anomaly detection using clustering.
EPS Threshold: -
Radius Scaler: -
Enough For Boundary Decision: -

Use Batch Optimization

Active: -

Model Parameters

Boosting Type: 'gbdt', traditional Gradient Boosting Decision Tree. 'dart', Dropouts meet Multiple Additive Regression Trees. 'goss', Gradient-based One-Side Sampling. 'rf', Random Forest.
Max Levels: Maximum tree leaves for base learners
Max Depth: Maximum tree depth for base learners, <=0 means no limit.
Learning Rate: Boosting learning rate.
Number of Estimators: Number of boosted trees to fit.
Subsample For Bin: Number of samples for constructing bins.
Min Split Gain: Minimum loss reduction required to make a further partition on a leaf node of the tree.
Min Child Weight: Minimum sum of instance weight (hessian) needed in a child (leaf).
Min Child Samples: Minimum number of data needed in a child (leaf).
Subsample: Subsample ratio of the training instance.
Subsample Freq: Frequency of subsampling; <=0 disables it.
Column Sample by Tree: Subsample ratio of columns when constructing each tree.
Regularization Alpha: L1 regularization term on weights.
Regularization Lambda: L2 regularization term on weights.
Random State: Random number seed.
Number of Jobs: Number of parallel threads.
Importance Type: The type of feature importance to be filled into feature_importances_. If 'split', result contains numbers of times the feature is used in a model. If 'gain', result contains total gains of splits which use the feature.
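The names and descriptions under Model Parameters closely match the keyword arguments of LightGBM's scikit-learn interface (`LGBMClassifier` / `LGBMRegressor`). This page does not say that TAZI forwards these settings to LightGBM, so the mapping below is an assumption; the dictionary and the `to_lightgbm_kwargs` helper are hypothetical, while the right-hand names are real LightGBM parameters. Max Levels is mapped to `num_leaves` because its description is "Maximum tree leaves".

```python
# Hypothetical mapping from Configuration Map labels to LightGBM
# scikit-learn keyword arguments (assumes TAZI forwards them unchanged).
CONFIG_TO_LIGHTGBM = {
    "Boosting Type": "boosting_type",
    "Max Levels": "num_leaves",
    "Max Depth": "max_depth",
    "Learning Rate": "learning_rate",
    "Number of Estimators": "n_estimators",
    "Subsample For Bin": "subsample_for_bin",
    "Min Split Gain": "min_split_gain",
    "Min Child Weight": "min_child_weight",
    "Min Child Samples": "min_child_samples",
    "Subsample": "subsample",
    "Subsample Freq": "subsample_freq",
    "Column Sample by Tree": "colsample_bytree",
    "Regularization Alpha": "reg_alpha",
    "Regularization Lambda": "reg_lambda",
    "Random State": "random_state",
    "Number of Jobs": "n_jobs",
    "Importance Type": "importance_type",
}

VALID_BOOSTING_TYPES = {"gbdt", "dart", "goss", "rf"}


def to_lightgbm_kwargs(settings: dict) -> dict:
    """Translate Configuration Map settings into LightGBM keyword arguments."""
    unknown = set(settings) - set(CONFIG_TO_LIGHTGBM)
    if unknown:
        raise KeyError(f"unrecognized settings: {sorted(unknown)}")
    boosting = settings.get("Boosting Type", "gbdt")
    if boosting not in VALID_BOOSTING_TYPES:
        raise ValueError(f"unsupported boosting type: {boosting!r}")
    return {CONFIG_TO_LIGHTGBM[name]: value for name, value in settings.items()}


kwargs = to_lightgbm_kwargs({
    "Boosting Type": "goss",
    "Max Levels": 31,
    "Learning Rate": 0.05,
    "Subsample Freq": 0,  # <= 0 disables subsampling
})
print(kwargs)
# {'boosting_type': 'goss', 'num_leaves': 31, 'learning_rate': 0.05, 'subsample_freq': 0}
```

With LightGBM installed, the resulting dictionary could be passed directly as `lightgbm.LGBMClassifier(**kwargs)`.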

Clustering

Find Optimal Number of Clusters: -
Number of Clusters: Number of clusters to use.

Feature Statistics

Discretizer Type: -
Use Instance Count For Encoding: -
Auto Ignore: If set to true, feature selection is used. If false, all features are used.
Verbose Output: Verbose feature statistics output.
Discretizer For Relevance Calculation: Verbose characterizer output.

Common Options

Input Data

Shuffle Seed: Seed to use in random shuffling of the data source.
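A fixed shuffle seed makes the shuffled order of the data source reproducible from run to run. The minimal Python sketch below illustrates the idea on an in-memory list of instances; `shuffled` is a hypothetical helper, and how TAZI actually shuffles file data sources is not documented here.

```python
import random


def shuffled(instances: list, shuffle_seed: int) -> list:
    """Return a shuffled copy; the same seed always yields the same order."""
    rng = random.Random(shuffle_seed)  # private generator, global state untouched
    result = list(instances)           # copy so the input order is preserved
    rng.shuffle(result)
    return result


rows = list(range(10))
assert shuffled(rows, 42) == shuffled(rows, 42)  # same seed, same order
print(shuffled(rows, 42))
```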

Parser Options

Number of Threads: Number of threads used for parsing the incoming data.
Instance Timeout: If an instance has not been processed within this many seconds, TAZI throws a timeout exception.
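The two parser options can be pictured as a thread pool with a bounded wait per instance. The sketch below is a hypothetical illustration using Python's `concurrent.futures`, not TAZI's parser: `parse` and `parse_all` are made-up names, records are assumed to be comma-separated lines, and the timeout is counted from when each result is awaited, which is a simplification.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def parse(line: str) -> list:
    """Hypothetical parser: split a comma-separated record."""
    if line == "slow":
        time.sleep(0.5)  # simulate an instance that takes too long
    return line.split(",")


def parse_all(lines, number_of_threads: int, instance_timeout: float) -> list:
    """Parse lines on a pool of threads, waiting at most instance_timeout
    seconds for each result before raising a timeout error."""
    results = []
    with ThreadPoolExecutor(max_workers=number_of_threads) as pool:
        futures = [pool.submit(parse, line) for line in lines]
        for future in futures:
            results.append(future.result(timeout=instance_timeout))
    return results


print(parse_all(["a,b", "c,d"], number_of_threads=2, instance_timeout=1.0))
# [['a', 'b'], ['c', 'd']]
try:
    parse_all(["slow", "a,b"], number_of_threads=2, instance_timeout=0.05)
except FutureTimeout:
    print("instance timed out")
```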

Default Label: -

Feature Statistics Properties

Maximum Unique Elements: -
Label Feature: -
Relevance Confidence: Z-score decision point for accepting the calculated relevance; a higher value eliminates more features.
Relevance Coverage: Decrease for a simpler model. Only the features that fall within this leading portion of the scree plot of feature relevances (ordered from highest to lowest) are used.
Max Number of Quantizer Steps: Maximum number of quantizer steps.
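Relevance Coverage does not say whether "this portion" of the scree plot is a share of the features or a share of the total relevance. The Python sketch below assumes the latter: it walks the features from highest to lowest relevance and keeps them until their cumulative share of total relevance reaches the coverage value. `select_by_coverage` is a hypothetical helper and the relevance scores are made up for illustration.

```python
def select_by_coverage(relevances: dict, coverage: float) -> list:
    """Keep the most relevant features until their cumulative share of total
    relevance reaches `coverage` (a fraction between 0 and 1)."""
    # Order features from highest to lowest relevance, as on a scree plot.
    ordered = sorted(relevances.items(), key=lambda item: item[1], reverse=True)
    total = sum(value for _, value in ordered)
    selected, cumulative = [], 0.0
    for name, value in ordered:
        if total > 0 and cumulative / total >= coverage:
            break  # the selected features already cover enough relevance
        selected.append(name)
        cumulative += value
    return selected


# Made-up relevance scores, for illustration only.
scores = {"age": 5.0, "income": 3.0, "region": 1.5, "row_id": 0.5}
print(select_by_coverage(scores, 0.75))  # ['age', 'income']
print(select_by_coverage(scores, 0.9))   # ['age', 'income', 'region']
```

Lowering the coverage value keeps fewer features, which matches the advice to decrease it for a simpler model.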

Seed Random: A global seed used to initialize the random number generators, so that results are the same across different runs.