Configure Business Model

In this step, we will configure our model by covering different aspects of a typical machine learning model pipeline. First, we will be presented with a window in which we decide the key parameters of the model:

Business Model Target & Benefit Features

What do you want to predict?: Here, you choose the target feature that you want to predict. Depending on the data type of the selected target column, TAZI will recommend a particular problem type. If your target feature is continuous, TAZI will assume you want to use Regression as your configuration type. If you use a discrete-valued target column, TAZI will recommend Classification instead. If there is a mismatch between your target column and the problem type you have selected, TAZI will warn you and prompt you either to use an appropriate version of the target column (by making it discrete and String type in the Classification case) or to change your target feature to another column in your dataset.
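
The recommendation logic described above can be pictured with a small sketch. This is not TAZI's actual rule: the function names, the max_classes cutoff, and the heuristic itself are assumptions made only for illustration. It treats a numeric column with fractional values or many distinct values as continuous, and everything else as discrete.

```python
from numbers import Real


def recommend_problem_type(target_values, max_classes=20):
    """Hypothetical heuristic (not TAZI's rule): guess a problem type from a target column."""
    values = [v for v in target_values if v is not None]
    # Booleans are technically numbers in Python, so exclude them explicitly.
    is_numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in values)
    if is_numeric:
        has_fraction = any(float(v) != int(v) for v in values)
        # Fractional values or many distinct values suggest a continuous target.
        if has_fraction or len(set(values)) > max_classes:
            return "Regression"
    return "Classification"


def to_class_labels(target_values):
    """Make a numeric target discrete and String-typed for the Classification case."""
    return [None if v is None else str(v) for v in target_values]
```

For instance, a column of prices such as 1.5, 2.25, 3.0 would be treated as continuous, while a 0/1 column would be treated as discrete and could be converted to the strings "0" and "1" before training a classifier.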

What is your business KPI?: Optionally, TAZI lets you choose a feature as a business KPI that you want to maximize or minimize, so that TAZI will also try to optimize this feature while predicting the target value.

Problem Type: TAZI will let you choose between 4 problem types:

Classification

Classification refers to a predictive modeling problem where a class label is predicted for a given example of input data. For example, predicting whether an email is spam or not is a classification problem.
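
To make the idea of predicting a class label concrete, here is a minimal sketch of a nearest-centroid classifier applied to the spam example. It is not the algorithm TAZI uses. The two features (number of links and number of exclamation marks per email) and all data values are made up for illustration.

```python
def train_centroids(rows, labels):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, x in enumerate(row):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest to the given row."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Made-up training data: (number of links, number of exclamation marks).
emails = [(8, 5), (6, 7), (0, 1), (1, 0)]
labels = ["spam", "spam", "not spam", "not spam"]
centroids = train_centroids(emails, labels)
```

Calling predict(centroids, (7, 4)) returns the discrete label "spam" for this toy data, which is the defining property of a classification output.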

Regression

In a regression problem, the outcome variable is continuous. An easy way to understand a regression problem is to consider predicting house prices from given features such as house age, region, etc.

In the house price example, the feature set consists of attributes of the house, for instance the number of bedrooms, whether the house gets good sunlight during the day, etc. The target feature for this problem is the house price, which is continuous. This differs from a classification problem, in which we are dealing with a discrete variable (whether an instance is classified as class 0 or class 1).
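
As a rough illustration of a continuous target, the sketch below fits a straight line by ordinary least squares to a single feature, house age. This is a generic textbook method, not necessarily what TAZI runs, and the ages and prices are made-up numbers chosen only so the example is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: house age in years and price in thousands.
ages = [1, 5, 10, 20, 30]
prices = [300, 280, 255, 205, 155]

slope, intercept = fit_line(ages, prices)
# The prediction is a number on a continuous scale, not a class label.
predicted_price = slope * 15 + intercept
```

Unlike a classifier, the model outputs any value along a continuous range, so a 15-year-old house gets a price estimate rather than a category.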

Unsupervised Anomaly Detection

Anomaly detection methods aim to find outliers in a dataset. These anomalies might point to, for example, unusual network traffic or fraud attempts.
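
One simple way to picture outlier detection without labels is the modified z-score, which measures how far each value lies from the median in units of the median absolute deviation (MAD). The sketch below uses that technique purely as an illustration. It is not TAZI's detection method, and the traffic figures are made up.

```python
from statistics import median


def find_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # Assumption for this sketch: with no spread at all, flag nothing.
        return []
    return [v for v in values if abs(0.6745 * (v - center) / mad) > threshold]


# Made-up example: requests per minute, with one unusual spike.
traffic = [52, 48, 50, 51, 49, 50, 500]
outliers = find_outliers(traffic)
```

No target column is involved here: the spike is flagged only because it differs strongly from the rest of the data.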

Clustering

Clustering is another configuration type that was recently added to TAZI’s algorithm inventory. Clustering is an unsupervised machine learning technique, meaning that you don't need to provide any target label. It involves discovering patterns in data automatically. Unlike supervised learning (such as predictive modeling), clustering algorithms only interpret the input data and find natural groupings, or clusters, in feature space.
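
To show what finding natural groupings without a target label can look like, here is a minimal k-means sketch on a single numeric feature. K-means is used only as a familiar example and is not necessarily the algorithm behind TAZI's Clustering type. The function names and data are made up for illustration.

```python
def assign(values, centers):
    """Put each value into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, k, iterations=20):
    """Simple k-means on one feature with a deterministic start."""
    ordered = sorted(values)
    if k == 1:
        centers = [ordered[len(ordered) // 2]]
    else:
        # Spread the starting centers evenly across the sorted values.
        centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = assign(values, centers)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, assign(values, centers)


# Made-up data with two obvious groups and no labels.
data = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]
centers, clusters = kmeans_1d(data, k=2)
```

The only inputs are the data and the number of clusters. The two groups emerge from the distances between values, not from any provided label.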

Configuration Description to Recall: Optionally, you can write a description for your configuration.

After filling out the mandatory and optional fields, click Submit.

At the upper right side of the window, you can find more of TAZI's features, which we'll describe next:

Preview Data Source

Clicking the Preview Data Source button opens an additional window in which you can look at the data source that you have given as input to the Business Model: