feature engineer

count features

class autox.autox_competition.feature_engineer.fe_count.FeatureCount[source]

Converts each categorical feature into the number of times each of its values occurs.

fit(df, degree=1, target=None, df_feature_type=None, silence_cols=[], select_all=True, max_num=None)[source]
Parameters
  • df – dataframe, train_test.

  • degree – int, either 1 or 2.

  • target – str, target column.

  • df_feature_type – dict, {col: type of col}.

  • silence_cols

  • select_all

  • max_num

transform(df)[source]
Parameters

df – dataframe, train_test.

Returns

dataframe, count features.
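As an illustration of count features in general, here is a minimal pandas sketch. It is not the AutoX implementation: the helper `count_encode`, the `COUNT_` column naming, and the reading of degree 2 as counting pairs of columns are all assumptions made for this example.

```python
import pandas as pd


def count_encode(df, cols):
    """Return one occurrence-count column per column (or tuple of columns).

    Hypothetical helper for illustration; not part of AutoX.
    """
    out = pd.DataFrame(index=df.index)
    for group in cols:
        group = [group] if isinstance(group, str) else list(group)
        name = "COUNT_" + "__".join(group)  # assumed naming scheme
        # transform("count") broadcasts each group's row count back to its rows
        out[name] = df.groupby(group)[group[0]].transform("count")
    return out


# Toy stand-in for the concatenated train/test dataframe
train_test = pd.DataFrame(
    {"city": ["a", "b", "a", "a"], "device": ["x", "x", "y", "x"]}
)

# A single column (assumed to match degree 1) and a pair (assumed degree 2)
counts = count_encode(train_test, ["city", ("city", "device")])
print(counts)
```

Counting over the concatenated train and test rows gives both sets the same mapping from value to count, which is presumably why `df` is described as train_test.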

cross features

class autox.autox_competition.feature_engineer.fe_cross.FeatureCross(importance_type='split')[source]

Synthetic features formed by multiplying (crossing) two features.

fit(X, y, objective, category_cols, top_k=10, used_cols=[])[source]
Parameters
  • X – {array-like, sparse matrix} of shape (n_samples, n_features). Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y – array-like of shape (n_samples,). Target vector relative to X.

  • objective – str, either ‘binary’ or ‘regression’.

  • category_cols – list, column names of categorical features.

  • top_k – int, number of most important cross features to keep, default 10.

  • used_cols – list, columns used to train the model, default [].

transform(X)[source]
Parameters

X – {array-like, sparse matrix} of shape (n_samples, n_features). Input data to transform, where n_samples is the number of samples and n_features is the number of features.

Returns

dataframe, cross features.
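The idea of crossing two categorical columns can be sketched in plain pandas as follows. This is an assumption-laden illustration, not the AutoX code: the helper `cross_features` and its column naming are hypothetical, and the model-based ranking that would keep only the `top_k` most important crosses is left out.

```python
from itertools import combinations

import pandas as pd


def cross_features(df, category_cols):
    """Build one integer-coded cross feature per pair of categorical columns.

    Hypothetical helper for illustration; not part of AutoX.
    """
    out = pd.DataFrame(index=df.index)
    for a, b in combinations(category_cols, 2):
        # Each distinct (a, b) value pair becomes its own category
        combined = df[a].astype(str) + "_" + df[b].astype(str)
        # factorize assigns integer codes in order of first appearance
        out[f"{a}__{b}"] = pd.factorize(combined)[0]
    return out


X = pd.DataFrame({"city": ["a", "b", "a", "a"], "device": ["x", "x", "y", "x"]})
crossed = cross_features(X, ["city", "device"])
print(crossed)
```

In a fuller pipeline, each candidate cross would presumably be scored by a model's feature importance before the best `top_k` are kept, as the `fit` parameters suggest.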

cumsum features

class autox.autox_competition.feature_engineer.fe_cumsum.FeatureCumsum[source]

Cumsum feature description.

denoising autoencoder features

class autox.autox_competition.feature_engineer.fe_denoising_autoencoder.FeatureDenoisingAutoencoder[source]

Denoising autoencoder feature description.

diff features

class autox.autox_competition.feature_engineer.fe_diff.FeatureDiff[source]

Diff feature description.

dimension reduction features

exp weighted mean features

gbdt features

image to vector features

nlp features

features from other table (one to many relationship)

rank features

rolling statistics features (for time-series data)

shift features

shift features (for time-series data)

statistics features

target encoding features

time features