Machine Learning – Weighted Train Data

Last post talked about an introduction to Machine Learning and how outcomes can be predicted using sklearn’s LogisticReggression.

Sometimes, the input data could require additional processing to prefer certain classes of information, that it considered more valuable or more representative to the outcome.

The LogisticRegression model allows to set the preference, or weight, at the time of being created, or later when being fitted.

The data used on the previous entry had four main classes: DRAFT, ACT, SLAST and FLAST. Once it is encoded and fitted, it can be selected by its index. I prefer to initialize some mnemonics selectors to ease the coding and make the entire code more human friendly.

x_columns_names = ['DRAFT', 'ACT', 'SLAST', 'FLAST']
y_columns_names = ['PREDICTION']

# Indexes for columns, used for weighting
DRAFT = 0
ACT = 1
SLAST = 2
FLAST = 3

# Weights
DRAFT_WEIGHT = 1
ACT_WEIGHT = 1
SLAST_WEIGHT = 1
FLAST_WEIGHT = 1

The model can be initialized lated using the following method, where the class_weight parameter is used referencing the previous helpers.

model = LogisticRegression(
    solver='lbfgs',
    multi_class='multinomial',
    max_iter=5000,
    class_weight={
        DRAFT: DRAFT_WEIGHT,
        ACT: ACT_WEIGHT,
        SLAST: SLAST_WEIGHT,
        FLAST: FLAST_WEIGHT,
    })