## Machine Learning – Classification and Regression Analysis

Machine Learning is the science and art of programming computers so they can learn from data.

For example, your spam filter is a Machine Learning program that can learn to flag spam given examples of spam emails (flagged by users, detected by other methods) and examples of regular (non-spam, also called “ham”) emails.

The examples that the system uses to learn are called the training set. The new data used to evaluate the model is called the test set. The performance measure of the prediction model is called accuracy, and it is the objective of this project.
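In scikit-learn, the library used in this project, the split into training and test sets is typically done with the `train_test_split` helper. A minimal sketch with toy arrays, not the project's data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy dataset: 10 samples with 2 features each, and binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Hold out 30% of the samples as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

print(len(X_train), len(X_test))  # 7 training samples, 3 test samples
```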

The tools

https://scikit-learn.org/stable/tutorial/basic/tutorial.html

Supervised learning

In general, a learning problem considers a set of n samples of data and then tries to predict properties of unknown data. If each sample is more than a single number and, for instance, a multi-dimensional entry (aka multivariate data), it is said to have several attributes or features.

Supervised learning consists of learning the link between two datasets: the observed data X and an external variable y that we are trying to predict, usually called the "target" or "labels". Most often, y is a 1D array of length n_samples.

All supervised estimators in scikit-learn implement a fit(X, y) method to fit the model and a predict(X) method that, given unlabeled observations X, returns the predicted labels y.

If the prediction task is to classify the observations in a set of finite labels, in other words to “name” the objects observed, the task is said to be a classification task. On the other hand, if the goal is to predict a continuous target variable, it is said to be a regression task.

When doing classification in scikit-learn, `y` is a vector of integers or strings.
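Both kinds of estimators share the `fit(X, y)` / `predict(X)` interface described above. A small sketch with toy data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])

# Classification: y holds a finite set of labels (strings are fine).
clf = LogisticRegression()
clf.fit(X, np.array(["ham", "ham", "spam", "spam"]))
print(clf.predict([[2.5]]))  # one of the training labels

# Regression: y is a continuous target.
reg = LinearRegression()
reg.fit(X, np.array([0.1, 1.1, 1.9, 3.0]))
print(reg.predict([[2.5]]))  # a real number
```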

The Models

LinearRegression, in its simplest form, fits a linear model to the data set by adjusting a set of parameters in order to make the sum of the squared residuals of the model as small as possible.
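A quick way to see least squares at work is numpy's `polyfit`, which finds the line minimizing the sum of squared residuals (toy data, not the project's):

```python
import numpy as np

# Noisy points scattered around the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Degree-1 polyfit returns the slope and intercept that minimize
# the sum of the squared residuals.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
print(slope, intercept, (residuals ** 2).sum())
```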

LogisticRegression, despite its counter-intuitive name, is a classification model. It is a better choice than linear regression for classification, because linear regression gives too much weight to data far from the decision frontier; the logistic approach instead fits a sigmoid, or logistic, function.
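The sigmoid itself is a one-liner; note how it saturates for inputs far from the decision frontier, so extreme points no longer dominate the fit:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # 0.5, right on the decision frontier
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```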

The Data

The data comes in a CSV file with around 2,500 rows and 5 columns. Correct formatting and integrity of the values cannot be assured, so additional processing will be needed. The sample file is like this.

The Code

We need three main libraries to start:

• numpy, which provides an efficient N-dimensional array object, along with tools for linear algebra, Fourier transforms and random number generation. It can be used as a multi-dimensional container of generic data, where arbitrary data types can be defined.
• pandas, which provides high-performance, easy-to-use data structures and data analysis tools.
• sklearn (scikit-learn), the main machine learning library. It has capabilities for classification, regression, clustering, dimensionality reduction, model selection and data preprocessing.

A non-essential but useful library is matplotlib, used to plot the data sets.

Before the sklearn models can work with the data, it has to be encoded. As the sample data contains strings, or labels, a LabelEncoder is needed. Next, the prediction model is declared; here a LogisticRegression model is used.

The input data file path is also declared, in order to be loaded with pandas.read_csv().

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as pyplot

from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression

encoder = LabelEncoder()
model = LogisticRegression(
    solver='lbfgs', multi_class='multinomial', max_iter=5000)

# Input dataset
file = "sample_data.csv"
```
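As a quick illustration of what LabelEncoder does with string labels (toy values, not the project's data):

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
# Classes are sorted alphabetically before being numbered.
codes = encoder.fit_transform(["red", "green", "blue", "green"])
print(list(encoder.classes_))  # ['blue', 'green', 'red']
print(list(codes))             # [2, 1, 0, 1]
```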

The CSV file can be loaded into a pandas dataframe in a single line. The library also provides a convenient method to remove any rows with missing values.

```
# Use pandas to load the csv. Pandas can handle mixed data with numbers and strings
data = pd.read_csv(file)
# Remove missing values
data = data.dropna()

print("Valid data items : %s" % len(data))
```
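The effect of dropna() can be seen on a small in-memory frame (hypothetical column names and values):

```python
import numpy as np
import pandas as pd

# One of the three rows has a missing value in column "A".
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": ["x", "y", "z"]})
clean = df.dropna()
print(len(df), len(clean))  # 3 rows before, 2 after
```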

Once loaded, the data needs to be encoded so it can be fitted into the prediction model. This is handled by the previously declared LabelEncoder. Once encoded, the x and y datasets are selected. The pandas library provides a way to drop entire columns from a dataframe, which makes it easy to select the data.

```
encoded_data = data.apply(encoder.fit_transform)
x = encoded_data.drop(columns=['PREDICTION'])
y = encoded_data.drop(columns=['DRAFT', 'ACT', 'SLAST', 'FLAST'])
```

The main objective is to test against different lengths of training and test data, to find out how much data provides the best accuracy. The data length will be incremented in steps of 100 to get a broad range of results.

```
length = 100
scores = []
lengths = []
while length < len(x):
    x_train = x[:length]
    y_train = y[:length]
    # Sample the same rows from x and y so features and labels stay aligned
    test_index = x.sample(n=length).index
    x_test = x.loc[test_index]
    y_test = y.loc[test_index]
    print("Fitting model for %s training values" % length)
    model.fit(x_train, y_train.values.ravel())
    score = model.score(x_test, y_test)
    print("Score for %s training values is %0.6f" % (length, score))
    scores.append(score)
    lengths.append(length)
    length = length + 100
```

Finally, a plot is made with the accuracy scores.

```
pyplot.plot(lengths, scores)
pyplot.ylabel('accuracy')
pyplot.xlabel('values')
pyplot.show()
```

## Using Zabbix API for Custom Reports

Zabbix is an open source monitoring tool for diverse IT components, including networks, servers, virtual machines (VMs) and cloud services. It provides monitoring metrics such as network utilization, CPU load and disk space consumption. Data can be collected in an agent-less fashion using SNMP or ICMP, or with a multi-platform agent, available for most operating systems.

Even though it is considered one of the best NMS on the market, its reporting capabilities are very limited. For example, this is an availability report created with PRTG.

And this is a Zabbix report. There are no graphs, no data tables, and it is difficult to establish a defined time span for the data collection.

My client required an executive report with the following information.

• Host / Service Name
• Minimum SLA for ICMP echo request monitoring
• Achieved SLA for ICMP echo request monitoring
• Memory usage graph, if the host is being SNMP-monitored
• Main network interface graph, if the host is being SNMP-monitored
• Storage usage graph, if the host is being SNMP-monitored

Using the Zabbix API

To call the API, we need to send HTTP POST requests to the api_jsonrpc.php file located in the frontend directory. For example, if the Zabbix frontend is installed under http://company.com/zabbix, the HTTP request to call the apiinfo.version method may look like this:

```
POST http://company.com/zabbix/api_jsonrpc.php HTTP/1.1
Content-Type: application/json-rpc

{
    "jsonrpc": "2.0",
    "method": "apiinfo.version",
    "id": 1,
    "auth": null,
    "params": {}
}
```

The request must have the Content-Type header set to one of these values: application/json-rpc, application/json or application/jsonrequest.

Before accessing any data, it's necessary to log in and obtain an authentication token. The user.login method is used for this.

```
{
    "jsonrpc": "2.0",
    "method": "user.login",
    "params": {
        "user": "Admin",
        "password": "zabbix"
    },
    "id": 1,
    "auth": null
}
```

If the authentication request succeeds, the API response will look like this.

```
{
    "jsonrpc": "2.0",
    "result": "0424bd59b807674191e7d77572075f33",
    "id": 1
}
```

The result field is the authentication token, which will be sent on subsequent requests.
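Whatever client you use, the token simply travels in the top-level auth field of every later request. A Python sketch of how such a request body is built (`zabbix_payload` is a hypothetical helper; the HTTP POST itself, with the Content-Type header mentioned above, is omitted):

```python
import json

def zabbix_payload(method, params, auth=None, req_id=1):
    """Build a Zabbix JSON-RPC 2.0 request body.

    The token obtained from user.login goes into the top-level
    "auth" field of every subsequent request.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "auth": auth,
        "id": req_id,
    })

token = "0424bd59b807674191e7d77572075f33"  # from the user.login response above
body = zabbix_payload("host.get", {"output": ["hostid", "host"]}, auth=token)
print(body)
```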

Instead of reinventing the wheel, let's use an existing library to call the API.

Using jqzabbix jQuery plugin for the Zabbix API

GitHub user kodai provides a nice JavaScript client in the form of a jQuery plugin. You can get it at https://github.com/kodai/jqzabbix.

The usage is quite straightforward: first, include both jQuery and jqzabbix.js in your HTML file. I am using the Cloudflare CDN to link jQuery.

```
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script type="text/javascript" charset="utf-8" src="jqzabbix.js"></script>
```

An object has to be created to initialize the client. I prefer to set url, username, and password dynamically, with data provided by the end user, so no credentials are stored here.

```
server = new $.jqzabbix({
    url: url,           // URL of Zabbix API
    basicauth: false,   // If you use basic authentication, set this to true
    busername: '',      // User name for basic authentication
    timeout: 5000,      // Request timeout (milliseconds)
    limit: 1000,        // Max number of data entries for one request
});
```

As mentioned before, the first step is to authenticate with the API and save the authorization token. This is handled by the jqzabbix library by first making a request to get the API version, and then authenticating.

```
server.getApiVersion();
```

If the authentication procedure completes properly, the API version and authentication ID are stored as properties of the server object. The userlogin() method allows setting callbacks for both success and error.

```
var success = function() { console.log('Success!'); }
var error = function() { console.error('Error!'); }
```

Once authenticated, Zabbix API methods are called with the sendAjaxRequest method, in the following fashion:

```
server.sendAjaxRequest(method, params, success, error)
```

Retrieving Hosts

I set a global array hosts to store the host information.
Another global array called SEARCH_GROUPS defines which host groups should be considered in the API request. By setting the selectHosts parameter to true, the hosts in those host groups are retrieved in the response as well.

On success, the result is stored on the hosts array, and the get_graphs function is called. If there is an error, the default error callback is fired.

```
hosts = [];
function get_hosts() {
    // Get hosts
    server.sendAjaxRequest(
        "hostgroup.get",
        {
            "selectHosts": true,
            "filter": {
                "name": SEARCH_GROUPS
            }
        },
        function (e) {
            e.result.forEach(group => {
                group.hosts.forEach(host => {
                    hosts.push(host);
                });
            });
            get_graphs();
        },
        error,
    );
}
```

Retrieving Graphs

Previously, user-defined graphs were configured in Zabbix to match the client's requirements for specific information. The names of all graphs that should be included in the report end with the " – Report" suffix.

This function retrieves all those graphs, and by setting the selectHosts parameter, the hosts linked to each graph are retrieved too.

On success, the result is stored on the graphs array, and the render function is called. If there is an error, the default error callback is fired.

```
graphs = [];
function get_graphs() {
    server.sendAjaxRequest(
        "graph.get",
        {
            "selectHosts": "*",
            "search": {
                name: "- Report"
            }
        },
        function (e) {
            graphs = e.result;
            render();
        },
        error
    );
}
```

Retrieving Graphs Images Instead of Graph Data

By now you should have noticed that the Zabbix API allows retrieving the values behind the graphs, but not the graph images. An additional PHP file will be stored with the HTML and JS files, as a helper that calls the web interface using php_curl.

You can get it at https://zabbix.org/wiki/Get_Graph_Image_PHP. I made a couple of modifications to it in order to pass the username and password in the URL query, along with parameters for the graph ID, the timespan, and the image dimensions.
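The helper ends up being called with a URL query such as the following. A Python sketch of building that query (the file name and the values are hypothetical; the parameter names are those read by the script):

```python
from urllib.parse import urlencode

# Parameters accepted by the modified helper script.
params = {
    "id": 1234,        # graph ID (hypothetical value)
    "period": 86400,   # timespan in seconds
    "width": 600,
    "height": 200,
    "user": "report",  # hypothetical credentials passed on the query
    "pass": "secret",
}
query = urlencode(params)
print("graph_helper.php?" + query)  # hypothetical file name
```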

```
<?php
//////////
// GraphImgByID v1.1
// (c) Travis Mathis - [email protected]
// It's free use it however you want.
// ChangeLog:
// 1/23/12 - Added width and height to GetGraph Function
// 23/7/13 - Zabbix 2.0 compatibility

// ERROR REPORTING
error_reporting(E_ALL);
set_time_limit(1800);

$graph_id = filter_input(INPUT_GET, 'id');
$period = filter_input(INPUT_GET, 'period');
$width = filter_input(INPUT_GET, 'width');
$height = filter_input(INPUT_GET, 'height');
$user = filter_input(INPUT_GET, 'user');
$pass = filter_input(INPUT_GET, 'pass');

// CONFIGURATION
$z_server = 'zabbix_url'; // set your URL here
$z_user = $user;
$z_pass = $pass;
$z_img_path = "/usr/local/share/zabbix/custom_pages/tmp_images/";

// NON CONFIGURABLE
$z_url_index = $z_server . "index.php";
$z_url_graph = $z_server . "chart2.php";
$z_url_api = $z_server . "api_jsonrpc.php";
// Login form fields expected by the Zabbix 2.0 frontend
$z_login_data = array('name' => $z_user, 'password' => $z_pass, 'enter' => 'Enter');

// FUNCTION
function GraphImageById($graphid, $period, $width, $height) {
    global $z_url_index, $z_url_graph, $z_img_path, $z_login_data;

    // file names
    $image_name = $z_img_path . "zabbix_graph_" . $graphid . ".png";

    // setup curl: log in to the frontend first, keeping the session cookie
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $z_url_index);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_COOKIEFILE, ''); // enable in-memory cookie handling
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($z_login_data));
    curl_exec($ch);

    // get graph
    curl_setopt($ch, CURLOPT_URL, $z_url_graph . "?graphid=" . $graphid . "&width=" . $width . "&height=" . $height . "&period=" . $period);
    curl_setopt($ch, CURLOPT_POST, false);
    $output = curl_exec($ch);
    curl_close($ch);
    /*
    $fp = fopen($image_name, 'w');
    fwrite($fp, $output);
    fclose($fp);
    */
    return $output;
}

echo GraphImageById($graph_id, $period, $width, $height);
```

Quick and Dirty Frontend

You should be able to customize this small frontend to your needs.

```
<html>

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script src="jqzabbix.js"></script>
<style>
.host-container {
margin-bottom: 3em;
}
@media print {
.host-container {
page-break-before: auto;
page-break-after: auto;
page-break-inside: avoid;
}
img {
display: block;
page-break-before: auto;
page-break-after: auto;
page-break-inside: avoid;
}
}
</style>

<body>
<div id="container" class="container">

<div class="row" style="margin-bottom: 3em">
<div class="col">
<h2>Services and Availability Report</h2>
<table id="table" class="bg-dark">
<th>Host Name</th>
<th>Target</th>
<th class="is-text-center">Availability</th>
<th class="is-text-center">Availability Status</th>
<th class="is-text-center">Total Availability</th>
</table>
</div>
</div>

<div id="template" style="display: none">
<div class="host-container">
<div class="row bg-dark">
<div class="col-12">
<span id="host-HOST_ID-name">Service Name</span>
</div>
</div>
<div class="row bg-light">
<div class="col-3">
Status
</div>
<div class="col-3">
SLA Minimum
</div>
<div class="col-3">
SLA
</div>
</div>
<div class="row bg-primary">
<div class="col-3">
<span id="host-HOST_ID-status">OK</span>
</div>
<div class="col-3">
<span id="host-HOST_ID-sla">99.9%</span>
</div>
<div class="col-3">
<span id="host-HOST_ID-sla-value">100%</span>
</div>
</div>
<div class="row is-text-center" id="host-HOST_ID-graphs">
</div>
</div>
</div>

</div>

<script src="ui.js"></script>

</body>

</html>
```

Result

The final page is a complete report, including a briefing table that summarizes service status and SLA compliance.