API documentation of skgarden
Table of contents
skgarden.mondrian
- skgarden.mondrian.MondrianForestClassifier
- skgarden.mondrian.MondrianForestRegressor
- skgarden.mondrian.MondrianTreeClassifier
- skgarden.mondrian.MondrianTreeRegressor
skgarden.quantile
- skgarden.quantile.DecisionTreeQuantileRegressor
- skgarden.quantile.ExtraTreeQuantileRegressor
- skgarden.quantile.ExtraTreesQuantileRegressor
- skgarden.quantile.RandomForestQuantileRegressor
skgarden.forest
skgarden.mondrian
skgarden.mondrian.MondrianForestClassifier
A MondrianForestClassifier is an ensemble of MondrianTreeClassifiers.
The probability of each class is given by the average of the class probabilities predicted by the individual trees.
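As a rough illustration of ensemble averaging, the sketch below assumes the forest probability of a class is the plain mean of the per-tree probabilities. The helper name `average_class_probabilities` and the numbers are hypothetical and not part of skgarden.

```python
def average_class_probabilities(per_tree_proba):
    """Average per-tree class-probability vectors for a single sample."""
    n_trees = len(per_tree_proba)
    n_classes = len(per_tree_proba[0])
    return [
        sum(tree[j] for tree in per_tree_proba) / n_trees
        for j in range(n_classes)
    ]


# Hypothetical output of three trees for one sample and two classes.
per_tree = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
forest_proba = average_class_probabilities(per_tree)
```

Because every per-tree vector sums to one, the averaged vector does too.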
Parameters
-
n_estimators
(integer, optional (default=10))The number of trees in the forest.
-
max_depth
(integer, optional (default=None)) The depth to which each tree is grown. If None, the tree is either grown to full depth or is constrained by min_samples_split.
-
min_samples_split
(integer, optional (default=2)) Stop growing the tree if all the nodes have fewer than min_samples_split samples.
-
bootstrap
(boolean, optional (default=False)) If bootstrap is set to False, then all trees are trained on the entire training dataset. Otherwise, each tree is fit on n_samples drawn with replacement from the training dataset.
-
random_state
(int, RandomState instance or None, optional (default=None)) If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
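The bootstrap option can be pictured with a minimal sketch of sampling with replacement. It uses only the standard library; `bootstrap_indices` is a hypothetical helper and does not reproduce how skgarden draws its samples internally.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


rng = random.Random(0)  # plays the role of an integer random_state
n_samples = 8
indices = bootstrap_indices(n_samples, rng)

# Rows that were never drawn are "out of bag" for this tree.
out_of_bag = sorted(set(range(n_samples)) - set(indices))
```

With bootstrap set to False the equivalent index list would simply be `list(range(n_samples))`.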
Methods
MondrianForestClassifier.fit(X, y)
Builds a forest of trees from the training set (X, y).
Parameters
-
X
(array-like or sparse matrix of shape = [n_samples, n_features]) The training input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csc_matrix.
-
y
(array-like, shape = [n_samples] or [n_samples, n_outputs])The target values (class labels in classification, real numbers in regression).
-
sample_weight
(array-like, shape = [n_samples] or None)Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. In the case of classification, splits are also ignored if they would result in any single class carrying a negative weight in either child node.
Returns
-
self
(object)Returns self.
MondrianForestClassifier.partial_fit(X, y, classes=None)
Incremental building of Mondrian Forest Classifiers.
Parameters
-
X
(array_like, shape = [n_samples, n_features]) The input samples. Internally, it will be converted to dtype=np.float32.
-
y
(array_like, shape = [n_samples]) Input targets.
-
classes
(array_like, shape = [n_classes]) Ignored for a regression problem. For a classification problem, if not provided, this is inferred from y. It is taken into account only on the first call to partial_fit and ignored on subsequent calls.
Returns
-
self
(instance of MondrianForestClassifier) Returns self.
MondrianForestClassifier.weighted_decision_path(X)
Returns the weighted decision path in the forest.
Each non-zero value in the decision path determines the weight of that particular node while making predictions.
Parameters
-
X
(array-like, shape = (n_samples, n_features))Input.
Returns
-
decision_path
(sparse csr matrix, shape = (n_samples, n_total_nodes))Return a node indicator matrix where non zero elements indicate the weight of that particular node in making predictions.
-
est_inds
(array-like, shape = (n_estimators + 1,)) decision_path[:, est_inds[i]: est_inds[i + 1]] provides the weighted decision path of estimator i.
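The role of est_inds can be sketched with plain lists. The example assumes est_inds holds cumulative node counts, which matches its documented shape of n_estimators + 1; the node counts, the weights, and the helpers `estimator_offsets` and `tree_block` are hypothetical.

```python
from itertools import accumulate


def estimator_offsets(node_counts):
    """Column offsets, one more entry than there are estimators."""
    return [0] + list(accumulate(node_counts))


def tree_block(row, est_inds, i):
    """Columns of one sample's path that belong to estimator i."""
    return row[est_inds[i]:est_inds[i + 1]]


# Hypothetical forest: three trees with 3, 5 and 2 nodes.
node_counts = [3, 5, 2]
est_inds = estimator_offsets(node_counts)

# One dense row of a weighted decision path (illustrative weights).
row = [0.2, 0.0, 0.8, 0.1, 0.3, 0.0, 0.0, 0.6, 0.5, 0.5]
second_tree = tree_block(row, est_inds, 1)
```

The real return value is a sparse csr matrix, so the same slice would be taken on its columns rather than on a Python list.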
skgarden.mondrian.MondrianForestRegressor
A MondrianForestRegressor is an ensemble of MondrianTreeRegressors.
The variance in predictions is reduced by averaging the predictions from all trees.
Parameters
-
n_estimators
(integer, optional (default=10))The number of trees in the forest.
-
max_depth
(integer, optional (default=None)) The depth to which each tree is grown. If None, the tree is either grown to full depth or is constrained by min_samples_split.
-
min_samples_split
(integer, optional (default=2)) Stop growing the tree if all the nodes have fewer than min_samples_split samples.
-
bootstrap
(boolean, optional (default=False)) If bootstrap is set to False, then all trees are trained on the entire training dataset. Otherwise, each tree is fit on n_samples drawn with replacement from the training dataset.
-
random_state
(int, RandomState instance or None, optional (default=None)) If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
Methods
MondrianForestRegressor.fit(X, y)
Builds a forest of trees from the training set (X, y).
Parameters
-
X
(array-like or sparse matrix of shape = [n_samples, n_features]) The training input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csc_matrix.
-
y
(array-like, shape = [n_samples] or [n_samples, n_outputs])The target values (class labels in classification, real numbers in regression).
-
sample_weight
(array-like, shape = [n_samples] or None)Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. In the case of classification, splits are also ignored if they would result in any single class carrying a negative weight in either child node.
Returns
-
self
(object)Returns self.
MondrianForestRegressor.partial_fit(X, y)
Incremental building of Mondrian Forest Regressors.
Parameters
-
X
(array_like, shape = [n_samples, n_features]) The input samples. Internally, it will be converted to dtype=np.float32.
-
y
(array_like, shape = [n_samples]) Input targets.
Returns
-
self
(instance of MondrianForestRegressor) Returns self.
MondrianForestRegressor.predict(X, return_std=False)
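Returns the predicted mean and, optionally, the standard deviation.
The prediction is a GMM drawn from \(\sum_{i=1}^{T} w_i N(m_i, \sigma_i)\) where \(w_i = \frac{1}{T}\), \(T\) is the number of trees, and \(m_i\) and \(\sigma_i\) are the mean and standard deviation predicted by tree \(i\).
The mean \(E[Y | X]\) reduces to \(\frac{1}{T} \sum_{i=1}^{T} m_i\).
The variance is given by \(Var[Y | X] = E[Y^2 | X] - E[Y | X]^2 = \frac{1}{T} \sum_{i=1}^{T} (\sigma_i^2 + m_i^2) - E[Y | X]^2\).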
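The mean and variance of an equally weighted Gaussian mixture can be computed from the per-tree means and standard deviations. The sketch below shows that arithmetic; `combine_tree_predictions` is a hypothetical helper and the inputs are made-up values rather than skgarden output.

```python
import math


def combine_tree_predictions(means, stds):
    """Mean and std of an equally weighted mixture of Gaussians."""
    n_trees = len(means)
    mean = sum(means) / n_trees
    # E[Y^2] of the mixture is the average of (variance + mean^2) per tree.
    second_moment = sum(s * s + m * m for m, s in zip(means, stds)) / n_trees
    # Clamp at zero to guard against tiny negative rounding errors.
    variance = max(second_moment - mean * mean, 0.0)
    return mean, math.sqrt(variance)


tree_means = [1.0, 2.0, 3.0]
tree_stds = [0.5, 0.5, 0.5]
forest_mean, forest_std = combine_tree_predictions(tree_means, tree_stds)
```

The forest variance exceeds the average per-tree variance whenever the trees disagree on the mean, which is why the spread of tree means contributes to the returned std.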
Parameters
-
X
(array-like, shape = (n_samples, n_features))Input samples.
-
return_std
(boolean, (default=False)) Whether or not to return the standard deviation.
Returns
-
y
(array-like, shape = (n_samples,))Predictions at X.
-
std
(array-like, shape = (n_samples,))Standard deviation at X.
MondrianForestRegressor.weighted_decision_path(X)
Returns the weighted decision path in the forest.
Each non-zero value in the decision path determines the weight of that particular node while making predictions.
Parameters
-
X
(array-like, shape = (n_samples, n_features))Input.
Returns
-
decision_path
(sparse csr matrix, shape = (n_samples, n_total_nodes))Return a node indicator matrix where non zero elements indicate the weight of that particular node in making predictions.
-
est_inds
(array-like, shape = (n_estimators + 1,)) decision_path[:, est_inds[i]: est_inds[i + 1]] provides the weighted decision path of estimator i.
skgarden.mondrian.MondrianTreeClassifier
A Mondrian tree.
The splits in a Mondrian tree classifier differ from those of a standard decision tree classifier in the following ways.
At fit time:
- Splits are done independently of the labels.
- The candidate feature is drawn with a probability proportional to the feature range.
- The candidate threshold is drawn from a uniform distribution whose bounds equal the bounds of the candidate feature.
- The time of split is also stored; it is proportional to the inverse of the size of the bounding box.
At prediction time:
- Every node in the path from the root to the leaf is given a weight while making predictions.
- At each node, the probability of an unseen sample splitting from that node is calculated. The farther the sample is from the bounding box, the more probable it is that it will split away.
- For every node, the probability that an unseen sample has not split before reaching that node and the probability that it will split away at that particular node are multiplied to give a weight.
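The fit-time rules can be sketched for a single node as follows. The exponential waiting time with rate equal to the sum of the feature ranges is an assumption borrowed from the usual Mondrian process formulation (its mean is the inverse of the bounding-box size); `sample_mondrian_split` is a hypothetical helper, not a skgarden function.

```python
import random


def sample_mondrian_split(lower, upper, rng):
    """Sample (feature, threshold, waiting time) for one bounding box.

    The labels are never consulted, only the box [lower, upper].
    """
    ranges = [u - l for l, u in zip(lower, upper)]
    total = sum(ranges)
    # Assumed: waiting time ~ Exponential(rate=total), so mean = 1 / total.
    split_time = rng.expovariate(total)
    # Feature drawn with probability proportional to its range.
    feature = rng.choices(range(len(ranges)), weights=ranges, k=1)[0]
    # Threshold drawn uniformly within that feature's bounds.
    threshold = rng.uniform(lower[feature], upper[feature])
    return feature, threshold, split_time


rng = random.Random(0)
lower, upper = [0.0, 0.0], [1.0, 10.0]
feature, threshold, split_time = sample_mondrian_split(lower, upper, rng)
```

In this box the second feature has ten times the range of the first, so it is picked about ten times as often.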
Parameters
-
max_depth
(int or None, optional (default=None)) The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
-
min_samples_split
(int, float, optional (default=2)) The minimum number of samples required to split an internal node:
- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.
-
random_state
(int, RandomState instance or None, optional (default=None)) If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
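The int and float forms of min_samples_split can be resolved to a sample count as sketched below. `resolve_min_samples_split` is a hypothetical helper that follows the rule stated above, not skgarden's internal code.

```python
import math


def resolve_min_samples_split(min_samples_split, n_samples):
    """Turn an int or float setting into an absolute sample count."""
    if isinstance(min_samples_split, int):
        return min_samples_split
    # A float is read as a fraction of the training set size.
    return math.ceil(min_samples_split * n_samples)


as_int = resolve_min_samples_split(2, 100)
as_fraction = resolve_min_samples_split(0.05, 101)
```

Because of the ceiling, a fraction never resolves to fewer samples than the exact product.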
Methods
MondrianTreeClassifier.apply(X, check_input=True)
Returns the index of the leaf that each sample is predicted as.
.. versionadded:: 0.17
Parameters
-
X
(array_like or sparse matrix, shape = [n_samples, n_features]) The input samples. Internally, it will be converted to dtype=np.float32, and if a sparse matrix is provided, to a sparse csr_matrix.
-
check_input
(boolean, (default=True)) Allows bypassing several input checks. Do not use this parameter unless you know what you are doing.
Returns
-
X_leaves
(array_like, shape = [n_samples,]) For each datapoint x in X, return the index of the leaf x ends up in. Leaves are numbered within [0; self.tree_.node_count), possibly with gaps in the numbering.
MondrianTreeClassifier.decision_path(X, check_input=True)
Return the decision path in the tree
.. versionadded:: 0.18
Parameters
-
X
(array_like or sparse matrix, shape = [n_samples, n_features]) The input samples. Internally, it will be converted to dtype=np.float32, and if a sparse matrix is provided, to a sparse csr_matrix.
-
check_input
(boolean, (default=True)) Allows bypassing several input checks. Do not use this parameter unless you know what you are doing.
Returns
-
indicator
(sparse csr array, shape = [n_samples, n_nodes]) Return a node indicator matrix where non-zero elements indicate that the sample goes through the corresponding nodes.
MondrianTreeClassifier.fit(X, y, sample_weight=None, check_input=True, X_idx_sorted=None)
MondrianTreeClassifier.partial_fit(X, y, classes=None)
Incremental building of Mondrian Tree Classifiers.
Parameters
-
X
(array_like, shape = [n_samples, n_features]) The input samples. Internally, it will be converted to dtype=np.float32.
-
y
(array_like, shape = [n_samples]) Input targets.
-
classes
(array_like, shape = [n_classes]) Ignored for a regression problem. For a classification problem, if not provided, this is inferred from y. It is taken into account only on the first call to partial_fit and ignored on subsequent calls.
Returns
-
self
(instance of MondrianTree) Returns self.
MondrianTreeClassifier.predict(X, check_input=True, return_std=False)
Predict class or regression value for X.
For a classification model, the predicted class for each sample in X is returned. For a regression model, the predicted value based on X is returned.
Parameters
-
X
(array-like or sparse matrix of shape = [n_samples, n_features]) The input samples. Internally, it will be converted to dtype=np.float32, and if a sparse matrix is provided, to a sparse csr_matrix.
-
check_input
(boolean, (default=True)) Allows bypassing several input checks. Do not use this parameter unless you know what you are doing.
-
return_std
(boolean, (default=False)) Whether or not to return the standard deviation.
Returns
-
y
(array of shape = [n_samples] or [n_samples, n_outputs]) The predicted classes, or the predicted values.
MondrianTreeClassifier.predict_proba(X, check_input=True)
Predicts the probability of each class label given X.
Parameters
-
X
(array-like, shape = [n_samples, n_features]) The input samples. Internally, it will be converted to dtype=np.float32.
-
check_input
(boolean, (default=True)) Allows bypassing several input checks. Do not use this parameter unless you know what you are doing.
Returns
-
y_prob
(array of shape = [n_samples, n_classes]) Predicted probabilities for each class.
MondrianTreeClassifier.weighted_decision_path(X, check_input=True)
Returns the weighted decision path in the tree.
Each non-zero value in the decision path determines the weight of that particular node in making predictions.
Parameters
-
X
(array_like, shape = [n_samples, n_features]) The input samples. Internally, it will be converted to dtype=np.float32, and if a sparse matrix is provided, to a sparse csr_matrix.
-
check_input
(boolean, (default=True)) Allows bypassing several input checks. Do not use this parameter unless you know what you are doing.
Returns
-
indicator
(sparse csr array, shape = [n_samples, n_nodes]) Return a node indicator matrix where non-zero elements indicate the weight of that particular node in making predictions.
skgarden.mondrian.MondrianTreeRegressor
A Mondrian tree.
The splits in a Mondrian tree regressor differ from those of a standard regression tree in the following ways.
At fit time:
- Splits are done independently of the labels.
- The candidate feature is drawn with a probability proportional to the feature range.
- The candidate threshold is drawn from a uniform distribution whose bounds equal the bounds of the candidate feature.
- The time of split is also stored; it is proportional to the inverse of the size of the bounding box.
At prediction time:
- Every node in the path from the root to the leaf is given a weight while making predictions.
- At each node, the probability of an unseen sample splitting from that node is calculated. The farther the sample is from the bounding box, the more probable it is that it will split away.
- For every node, the probability that an unseen sample has not split before reaching that node and the probability that it will split away at that particular node are multiplied to give a weight.
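The prediction-time weighting can be sketched as below. Two things are assumptions rather than statements from this page: the separation probability takes the form 1 - exp(-delta * eta) used in the Mondrian forest literature, and the leaf receives whatever probability mass is left. Both helper names are hypothetical.

```python
import math


def separation_probability(x, lower, upper, time_delta):
    """Probability that x splits away from a node's bounding box.

    eta is the total distance by which x lies outside the box, so a
    point inside the box has probability zero.
    """
    eta = sum(
        max(xi - u, 0.0) + max(l - xi, 0.0)
        for xi, l, u in zip(x, lower, upper)
    )
    return 1.0 - math.exp(-time_delta * eta)


def path_weights(separation_probs):
    """Weights for the nodes on a root-to-leaf path, plus the leaf."""
    weights = []
    not_separated = 1.0
    for p in separation_probs:
        # P(not split before this node) * P(split away at this node)
        weights.append(not_separated * p)
        not_separated *= 1.0 - p
    weights.append(not_separated)  # assumed: leaf takes the remainder
    return weights


p_outside = separation_probability([1.5, 0.5], [0.0, 0.0], [1.0, 1.0], 2.0)
p_inside = separation_probability([0.5, 0.5], [0.0, 0.0], [1.0, 1.0], 2.0)
weights = path_weights([p_outside, 0.25])
```

Under these assumptions the weights along a path sum to one, and a sample that lies inside every bounding box puts all of its weight on the leaf.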
Parameters
-
max_depth
(int or None, optional (default=None)) The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
-
min_samples_split
(int, float, optional (default=2)) The minimum number of samples required to split an internal node:
- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.
-
random_state
(int, RandomState instance or None, optional (default=None)) If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
Methods
MondrianTreeRegressor.apply(X, check_input=True)
Returns the index of the leaf that each sample is predicted as.
.. versionadded:: 0.17
Parameters
-
X
(array_like or sparse matrix, shape = [n_samples, n_features]) The input samples. Internally, it will be converted to dtype=np.float32, and if a sparse matrix is provided, to a sparse csr_matrix.
-
check_input
(boolean, (default=True)) Allows bypassing several input checks. Do not use this parameter unless you know what you are doing.
Returns
-
X_leaves
(array_like, shape = [n_samples,]) For each datapoint x in X, return the index of the leaf x ends up in. Leaves are numbered within [0; self.tree_.node_count), possibly with gaps in the numbering.
MondrianTreeRegressor.decision_path(X, check_input=True)
Return the decision path in the tree
.. versionadded:: 0.18
Parameters
-
X
(array_like or sparse matrix, shape = [n_samples, n_features]) The input samples. Internally, it will be converted to dtype=np.float32, and if a sparse matrix is provided, to a sparse csr_matrix.
-
check_input
(boolean, (default=True)) Allows bypassing several input checks. Do not use this parameter unless you know what you are doing.
Returns
-
indicator
(sparse csr array, shape = [n_samples, n_nodes]) Return a node indicator matrix where non-zero elements indicate that the sample goes through the corresponding nodes.
MondrianTreeRegressor.fit(X, y, sample_weight=None, check_input=True, X_idx_sorted=None)
MondrianTreeRegressor.partial_fit(X, y)
Incremental building of Mondrian Tree Regressors.
Parameters
-
X
(array_like, shape = [n_samples, n_features]) The input samples. Internally, it will be converted to dtype=np.float32.
-
y
(array_like, shape = [n_samples]) Input targets.
Returns
-
self
(instance of MondrianTree) Returns self.
MondrianTreeRegressor.predict(X, check_input=True, return_std=False)
Predict class or regression value for X.
For a classification model, the predicted class for each sample in X is returned. For a regression model, the predicted value based on X is returned.
Parameters
-
X
(array-like or sparse matrix of shape = [n_samples, n_features]) The input samples. Internally, it will be converted to dtype=np.float32, and if a sparse matrix is provided, to a sparse csr_matrix.
-
check_input
(boolean, (default=True)) Allows bypassing several input checks. Do not use this parameter unless you know what you are doing.
-
return_std
(boolean, (default=False)) Whether or not to return the standard deviation.
Returns
-
y
(array of shape = [n_samples] or [n_samples, n_outputs]) The predicted classes, or the predicted values.
MondrianTreeRegressor.weighted_decision_path(X, check_input=True)
Returns the weighted decision path in the tree.
Each non-zero value in the decision path determines the weight of that particular node in making predictions.
Parameters
-
X
(array_like, shape = [n_samples, n_features]) The input samples. Internally, it will be converted to dtype=np.float32, and if a sparse matrix is provided, to a sparse csr_matrix.
-
check_input
(boolean, (default=True)) Allows bypassing several input checks. Do not use this parameter unless you know what you are doing.
Returns
-
indicator
(sparse csr array, shape = [n_samples, n_nodes]) Return a node indicator matrix where non-zero elements indicate the weight of that particular node in making predictions.
skgarden.quantile
skgarden.quantile.DecisionTreeQuantileRegressor
A decision tree regressor that provides quantile estimates.
Parameters
-
criterion
(string, optional (default="mse"))The function to measure the quality of a split. Supported criteria are "mse" for the mean squared error, which is equal to variance reduction as feature selection criterion, and "mae" for the mean absolute error. .. versionadded:: 0.18 Mean Absolute Error (MAE) criterion.
-
splitter
(string, optional (default="best"))The strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.
-
max_features
(int, float, string or None, optional (default=None)) The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
- If "auto", then max_features=n_features.
- If "sqrt", then max_features=sqrt(n_features).
- If "log2", then max_features=log2(n_features).
- If None, then max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.
-
max_depth
(int or None, optional (default=None)) The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
-
min_samples_split
(int, float, optional (default=2)) The minimum number of samples required to split an internal node:
- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.
.. versionchanged:: 0.18 Added float values for fractions.
-
min_samples_leaf
(int, float, optional (default=1)) The minimum number of samples required to be at a leaf node:
- If int, then consider min_samples_leaf as the minimum number.
- If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
.. versionchanged:: 0.18 Added float values for fractions.
-
min_weight_fraction_leaf
(float, optional (default=0.)) The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
-
max_leaf_nodes
(int or None, optional (default=None)) Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None, then the number of leaf nodes is unlimited.
-
random_state
(int, RandomState instance or None, optional (default=None)) If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
-
presort
(bool, optional (default=False))Whether to presort the data to speed up the finding of best splits in fitting. For the default settings of a decision tree on large datasets, setting this to true may slow down the training process. When using either a smaller dataset or a restricted depth, this may speed up the training.
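The accepted forms of max_features listed above can be resolved to a feature count as in the sketch below. `resolve_max_features` is a hypothetical helper; clamping the "sqrt" and "log2" results to at least 1 is an added assumption so that the result is always usable.

```python
import math


def resolve_max_features(max_features, n_features):
    """Map a max_features setting to a number of features."""
    if max_features is None or max_features == "auto":
        return n_features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))  # clamp is an assumption
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))  # clamp is an assumption
    if isinstance(max_features, int):
        return max_features
    # A float is read as a fraction of the available features.
    return int(max_features * n_features)


resolved = {
    setting: resolve_max_features(setting, 16)
    for setting in (None, "auto", "sqrt", "log2", 3, 0.5)
}
```

The resolved value corresponds to what the fitted estimator exposes as max_features_.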
Attributes
-
feature_importances_
(array of shape = [n_features]) The feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.
-
max_features_
(int) The inferred value of max_features.
-
n_features_
(int) The number of features when fit is performed.
-
n_outputs_
(int) The number of outputs when fit is performed.
-
tree_
(Tree object) The underlying Tree object.
-
y_train_
(array-like) Train target values.
-
y_train_leaves_
(array-like) Cache of the leaf nodes that each training sample falls into. y_train_leaves_[i] is the leaf that y_train_[i] ends up in.
Methods
DecisionTreeQuantileRegressor.predict(X, quantile=None, check_input=False)
Predict regression value for X.
Parameters
-
X
(array-like or sparse matrix of shape = [n_samples, n_features]) The input samples. Internally, it will be converted to dtype=np.float32, and if a sparse matrix is provided, to a sparse csr_matrix.
-
quantile
(int, optional) Value ranging from 0 to 100. By default, the mean is returned.
-
check_input
(boolean, (default=False)) Allows bypassing several input checks. Do not use this parameter unless you know what you are doing.
Returns
-
y
(array of shape = [n_samples])If quantile is set to None, then return E(Y | X). Else return y such that F(Y=y | x) = quantile.
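One way to picture the quantile estimate of a single tree is to pool the training targets that share the query's leaf and take their percentile. The sketch below does that with linear interpolation between ranks, which is an assumption and may differ from skgarden's own percentile routine; `leaf_quantile` and the data are hypothetical.

```python
def leaf_quantile(y_train, y_train_leaves, query_leaf, quantile=None):
    """Mean or percentile of the training targets in one leaf."""
    values = sorted(
        y for y, leaf in zip(y_train, y_train_leaves) if leaf == query_leaf
    )
    if quantile is None:
        return sum(values) / len(values)  # plays the role of E(Y | X)
    # Assumed: linear interpolation between the two closest ranks.
    position = (len(values) - 1) * quantile / 100.0
    low = int(position)
    high = min(low + 1, len(values) - 1)
    fraction = position - low
    return values[low] + fraction * (values[high] - values[low])


y_train = [1.0, 2.0, 3.0, 10.0, 20.0]
y_train_leaves = [4, 4, 4, 7, 7]  # leaf index of each training sample

leaf_mean = leaf_quantile(y_train, y_train_leaves, query_leaf=4)
leaf_median = leaf_quantile(y_train, y_train_leaves, 4, quantile=50)
leaf_max = leaf_quantile(y_train, y_train_leaves, 4, quantile=100)
```

Caching the leaf of every training sample at fit time, as y_train_leaves_ does, means that prediction only needs the leaf index of the query.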
skgarden.quantile.ExtraTreeQuantileRegressor
An extremely randomized tree regressor.
Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits are drawn for each of the max_features randomly selected features and the best split among those is chosen. When max_features is set to 1, this amounts to building a totally random decision tree.
Warning: Extra-trees should only be used within ensemble methods.
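The split rule described above can be sketched for one node: pick max_features features at random, draw one random threshold per feature, and keep the candidate with the lowest weighted child variance. This is an illustrative sketch with hypothetical helper names and data, not the tree-building code used by the library.

```python
import random


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def random_best_split(X, y, max_features, rng):
    """Return (score, feature, threshold) of the best random split."""
    candidates = rng.sample(range(len(X[0])), max_features)
    best = None
    for feature in candidates:
        column = [row[feature] for row in X]
        low, high = min(column), max(column)
        if low == high:
            continue  # constant feature, nothing to split on
        threshold = rng.uniform(low, high)  # one random cut per feature
        left = [t for v, t in zip(column, y) if v <= threshold]
        right = [t for v, t in zip(column, y) if v > threshold]
        if not left or not right:
            continue
        # Weighted child variance: lower means a better split.
        score = (
            len(left) * variance(left) + len(right) * variance(right)
        ) / len(y)
        if best is None or score < best[0]:
            best = (score, feature, threshold)
    return best


rng = random.Random(0)
X = [[0.0, 5.0], [1.0, 3.0], [2.0, 8.0], [3.0, 1.0]]
y = [0.0, 1.0, 2.0, 3.0]
best = random_best_split(X, y, max_features=2, rng=rng)
```

With max_features=1 only one random cut is drawn, so the split no longer depends on the targets at all.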
Parameters
-
criterion
(string, optional (default="mse"))The function to measure the quality of a split. Supported criteria are "mse" for the mean squared error, which is equal to variance reduction as feature selection criterion, and "mae" for the mean absolute error.
.. versionadded:: 0.18 Mean Absolute Error (MAE) criterion.
-
splitter
(string, optional (default="best"))The strategy used to choose the split at each node. Supported strategies are "best" to choose the best split and "random" to choose the best random split.
-
max_depth
(int or None, optional (default=None))The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
-
min_samples_split
(int, float, optional (default=2)) The minimum number of samples required to split an internal node:
- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.
.. versionchanged:: 0.18 Added float values for fractions.
-
min_samples_leaf
(int, float, optional (default=1)) The minimum number of samples required to be at a leaf node:
- If int, then consider min_samples_leaf as the minimum number.
- If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
.. versionchanged:: 0.18 Added float values for fractions.
-
min_weight_fraction_leaf
(float, optional (default=0.))The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
-
max_features
(int, float, string or None, optional (default=None)) The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
- If "auto", then max_features=n_features.
- If "sqrt", then max_features=sqrt(n_features).
- If "log2", then max_features=log2(n_features).
- If None, then max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.
-
random_state
(int, RandomState instance or None, optional (default=None)) If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
-
min_impurity_decrease
(float, optional (default=0.)) A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child. N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
.. versionadded:: 0.19
-
min_impurity_split
(float) Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.
.. deprecated:: 0.19 min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19 and will be removed in 0.21. Use min_impurity_decrease instead.
-
max_leaf_nodes
(int or None, optional (default=None)) Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None, then the number of leaf nodes is unlimited.
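The weighted impurity decrease used by min_impurity_decrease can be evaluated directly from the sample counts and impurities. The function below transcribes that formula; the name `weighted_impurity_decrease` and the numbers are hypothetical.

```python
def weighted_impurity_decrease(
    n_total, n_node, n_left, n_right, impurity, left_impurity, right_impurity
):
    """N_t / N * (impurity - N_t_R / N_t * right - N_t_L / N_t * left)."""
    return n_node / n_total * (
        impurity
        - n_right / n_node * right_impurity
        - n_left / n_node * left_impurity
    )


# Hypothetical node holding 40 of 100 samples, split 10 / 30.
decrease = weighted_impurity_decrease(
    n_total=100, n_node=40, n_left=10, n_right=30,
    impurity=0.5, left_impurity=0.2, right_impurity=0.4,
)

min_impurity_decrease = 0.05
would_split = decrease >= min_impurity_decrease
```

The leading N_t / N factor scales the decrease by the share of samples reaching the node, so the same threshold is harder to meet deep in the tree.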
See also ExtraTreeClassifier, ExtraTreesClassifier, ExtraTreesRegressor
Notes
The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
References
[1] P. Geurts, D. Ernst, and L. Wehenkel, "Extremely randomized trees", Machine Learning, 63(1), 3-42, 2006.
Methods
ExtraTreeQuantileRegressor.predict(X, quantile=None, check_input=False)
Predict regression value for X.
Parameters
-
X
(array-like or sparse matrix of shape = [n_samples, n_features]) The input samples. Internally, it will be converted to dtype=np.float32, and if a sparse matrix is provided, to a sparse csr_matrix.
-
quantile
(int, optional) Value ranging from 0 to 100. By default, the mean is returned.
-
check_input
(boolean, (default=False)) Allows bypassing several input checks. Do not use this parameter unless you know what you are doing.
Returns
-
y
(array of shape = [n_samples])If quantile is set to None, then return E(Y | X). Else return y such that F(Y=y | x) = quantile.
skgarden.quantile.ExtraTreesQuantileRegressor
An extra-trees regressor that provides quantile estimates.
This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
Parameters
-
n_estimators
(integer, optional (default=10))The number of trees in the forest.
-
criterion
(string, optional (default="mse"))The function to measure the quality of a split. Supported criteria are "mse" for the mean squared error, which is equal to variance reduction as feature selection criterion, and "mae" for the mean absolute error. .. versionadded:: 0.18 Mean Absolute Error (MAE) criterion.
-
max_features
(int, float, string or None, optional (default="auto")) The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
- If "auto", then max_features=n_features.
- If "sqrt", then max_features=sqrt(n_features).
- If "log2", then max_features=log2(n_features).
- If None, then max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.
-
max_depth
(integer or None, optional (default=None)) The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
-
min_samples_split
(int, float, optional (default=2)) The minimum number of samples required to split an internal node:
- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.
.. versionchanged:: 0.18 Added float values for fractions.
-
min_samples_leaf
(int, float, optional (default=1)) The minimum number of samples required to be at a leaf node:
- If int, then consider min_samples_leaf as the minimum number.
- If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
.. versionchanged:: 0.18 Added float values for fractions.
-
min_weight_fraction_leaf
(float, optional (default=0.)) The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
-
max_leaf_nodes
(int or None, optional (default=None)) Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None, then the number of leaf nodes is unlimited.
-
bootstrap
(boolean, optional (default=False))Whether bootstrap samples are used when building trees.
-
oob_score
(bool, optional (default=False))Whether to use out-of-bag samples to estimate the R^2 on unseen data.
-
n_jobs
(integer, optional (default=1))The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.
-
random_state
(int, RandomState instance or None, optional (default=None))If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
-
verbose
(int, optional (default=0))Controls the verbosity of the tree building process.
-
warm_start
(bool, optional (default=False))When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new forest.
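The max_features rules listed above can be sketched as a small lookup. This is an illustrative helper only, not the library's code: the name resolve_max_features, the truncation of sqrt and log2 to an integer, and the lower bound of 1 are assumptions made for the example.

```python
import math


def resolve_max_features(max_features, n_features):
    """Hypothetical helper: turn a max_features setting into a feature count."""
    if max_features is None or max_features == "auto":
        return n_features
    if max_features == "sqrt":
        # Assumption: truncate to an integer and never go below one feature.
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    if isinstance(max_features, float):
        # A float is read as a fraction of the available features.
        return int(max_features * n_features)
    raise ValueError("unsupported max_features: %r" % (max_features,))


if __name__ == "__main__":
    for setting in ("auto", "sqrt", "log2", 0.5, 3, None):
        print(setting, "->", resolve_max_features(setting, 16))
```

With 16 features, "sqrt" and "log2" both resolve to 4 candidate features per split, while "auto" and None keep all 16.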
Attributes
-
estimators_
(list of ExtraTreeQuantileRegressor)The collection of fitted sub-estimators.
-
feature_importances_
(array of shape = [n_features])The feature importances (the higher, the more important the feature).
-
n_features_
(int)The number of features when fit is performed.
-
n_outputs_
(int)The number of outputs when fit is performed.
-
oob_score_
(float)Score of the training dataset obtained using an out-of-bag estimate.
-
oob_prediction_
(array of shape = [n_samples])Prediction computed with out-of-bag estimate on the training set.
-
y_train_
(array-like, shape=(n_samples,))Cache the target values at fit time.
-
y_weights_
(array-like, shape=(n_estimators, n_samples))y_weights_[i, j] is the weight given to sample j while estimator i is fit. If bootstrap is set to False, this reduces to a 2-D array of ones.
-
y_train_leaves_
(array-like, shape=(n_estimators, n_samples))y_train_leaves_[i, j] provides the leaf node that y_train_[j] ends up in when estimator i is fit. If y_train_[j] is given a weight of zero when estimator i is fit, then the value is -1.
References
[1] Nicolai Meinshausen, Quantile Regression Forests. http://www.jmlr.org/papers/volume7/meinshausen06a/meinshausen06a.pdf
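To make the shapes of y_weights_ and y_train_leaves_ concrete, here is a minimal sketch of the bookkeeping they describe. It is not the library's implementation: the function name sample_bookkeeping, the use of raw draw counts as weights, and the precomputed leaf_ids input are all assumptions for illustration.

```python
import random
from collections import Counter


def sample_bookkeeping(leaf_ids, bootstrap=True, seed=0):
    """Hypothetical sketch of per-estimator sample weights and leaf ids.

    leaf_ids[i][j] is the leaf that training sample j reaches in estimator i.
    Returns (weights, leaves), both indexed as [estimator][sample].
    """
    rng = random.Random(seed)
    n_samples = len(leaf_ids[0])
    weights, leaves = [], []
    for tree_leaves in leaf_ids:
        if bootstrap:
            # Draw n_samples indices with replacement.
            drawn = [rng.randrange(n_samples) for _ in range(n_samples)]
        else:
            # Every sample is used exactly once.
            drawn = list(range(n_samples))
        counts = Counter(drawn)
        # Assumption: the weight is the number of times a sample was drawn.
        weights.append([counts.get(j, 0) for j in range(n_samples)])
        # Samples that were never drawn get the sentinel leaf id -1.
        leaves.append([tree_leaves[j] if counts.get(j, 0) > 0 else -1
                       for j in range(n_samples)])
    return weights, leaves


if __name__ == "__main__":
    ids = [[3, 3, 5, 5], [2, 4, 4, 4]]
    print(sample_bookkeeping(ids, bootstrap=True, seed=1))
    print(sample_bookkeeping(ids, bootstrap=False))
```

Without bootstrapping every weight is one and no leaf id is -1, which matches the description of the two attributes above.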
Methods
ExtraTreesQuantileRegressor.fit(X, y)
Build a forest from the training set (X, y).
Parameters
-
X
(array-like or sparse matrix, shape = [n_samples, n_features])The training input samples. Internally, they are converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csc_matrix.
-
y
(array-like, shape = [n_samples] or [n_samples, n_outputs])The target values (real numbers).
-
sample_weight
(array-like, shape = [n_samples] or None)Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
-
check_input
(boolean, (default=True))Allows bypassing several input checks. Don't use this parameter unless you know what you are doing.
-
X_idx_sorted
(array-like, shape = [n_samples, n_features], optional)The indexes of the sorted training input samples. If many trees are grown on the same dataset, this allows the ordering to be cached between trees. If None, the data will be sorted here. Don't use this parameter unless you know what you are doing.
Returns
-
self
(object)Returns self.
ExtraTreesQuantileRegressor.predict(X, quantile=None)
Predict regression value for X.
Parameters
-
X
(array-like or sparse matrix of shape = [n_samples, n_features])The input samples. Internally, they are converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.
-
quantile
(int, optional)Value ranging from 0 to 100. By default, the mean is returned.
-
check_input
(boolean, (default=True))Allows bypassing several input checks. Don't use this parameter unless you know what you are doing.
Returns
-
y
(array of shape = [n_samples])If quantile is set to None, then return E(Y | X). Else, return y such that F(y | X = x) = quantile / 100, where F is the conditional cumulative distribution function of Y.
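The quantile returned by predict can be pictured as a weighted percentile of the training targets, in the spirit of the Meinshausen reference. The sketch below is a plain-Python illustration under stated assumptions, not the library's code: forest_weights and weighted_quantile are hypothetical names, and the weighting rule (average over trees of one over the size of the shared leaf) is taken from the paper rather than from this page.

```python
def forest_weights(train_leaves, query_leaves):
    """Weight of each training sample for one query point.

    train_leaves[t][j]: leaf of training sample j in tree t (-1 if unused).
    query_leaves[t]: leaf reached by the query point in tree t.
    """
    n_trees = len(train_leaves)
    n_samples = len(train_leaves[0])
    weights = [0.0] * n_samples
    for t in range(n_trees):
        same = [j for j in range(n_samples)
                if train_leaves[t][j] == query_leaves[t]]
        for j in same:
            # Each tree spreads a total weight of 1 / n_trees over its leaf.
            weights[j] += 1.0 / (len(same) * n_trees)
    return weights


def weighted_quantile(values, weights, quantile):
    """Smallest value whose cumulative weight share reaches quantile / 100."""
    if not 0 <= quantile <= 100:
        raise ValueError("quantile must lie in [0, 100]")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    target = quantile / 100.0 * total
    cumulative = 0.0
    pairs = sorted(zip(values, weights))
    for value, weight in pairs:
        cumulative += weight
        # Small tolerance guards against floating-point round-off.
        if weight > 0 and cumulative >= target - 1e-12:
            return value
    return pairs[-1][0]


if __name__ == "__main__":
    y_train = [1.0, 2.0, 3.0, 4.0]
    w = forest_weights([[0, 0, 1, 1], [0, 1, 1, 1]], [1, 1])
    print(w, weighted_quantile(y_train, w, 50))
```

The same weights also give the conditional mean as sum(w[j] * y_train[j]), which corresponds to the quantile=None case.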
Properties
skgarden.quantile.RandomForestQuantileRegressor
A random forest regressor that provides quantile estimates.
A random forest is a meta estimator that fits a number of decision tree
regressors on various sub-samples of the dataset and uses averaging
to improve the predictive accuracy and control over-fitting.
The sub-sample size is always the same as the original
input sample size, but the samples are drawn with replacement if
bootstrap=True (default).
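The two ideas in this description, equal-size resampling with replacement and averaging over estimators, can be shown with a deliberately tiny stand-in. In the sketch below each "tree" is just the mean of its resampled targets; the function names and this simplification are assumptions for illustration and say nothing about how the real trees are built.

```python
import random
from statistics import mean


def fit_mean_ensemble(y, n_estimators=10, bootstrap=True, seed=0):
    """Fit one trivial estimator (a sample mean) per resampled dataset."""
    rng = random.Random(seed)
    n = len(y)
    estimators = []
    for _ in range(n_estimators):
        if bootstrap:
            # Same size as the original data, drawn with replacement.
            idx = [rng.randrange(n) for _ in range(n)]
        else:
            idx = list(range(n))
        estimators.append(mean([y[j] for j in idx]))
    return estimators


def predict_mean_ensemble(estimators):
    """Average the individual estimators, as a forest averages its trees."""
    return mean(estimators)


if __name__ == "__main__":
    targets = [1.0, 2.0, 3.0, 4.0]
    print(predict_mean_ensemble(fit_mean_ensemble(targets, seed=3)))
```

With bootstrap=False every estimator sees identical data, so the averaging step no longer reduces variance through resampling.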
Parameters
-
n_estimators
(integer, optional (default=10))The number of trees in the forest.
-
criterion
(string, optional (default="mse"))The function to measure the quality of a split. Supported criteria are "mse" for the mean squared error, which is equal to variance reduction as feature selection criterion, and "mae" for the mean absolute error. New in version 0.18: Mean Absolute Error (MAE) criterion.
-
max_features
(int, float, string or None, optional (default="auto"))The number of features to consider when looking for the best split:
  - If int, then consider max_features features at each split.
  - If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
  - If "auto", then max_features=n_features.
  - If "sqrt", then max_features=sqrt(n_features).
  - If "log2", then max_features=log2(n_features).
  - If None, then max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features.
-
max_depth
(integer or None, optional (default=None))The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
-
min_samples_split
(int, float, optional (default=2))The minimum number of samples required to split an internal node:
  - If int, then consider min_samples_split as the minimum number.
  - If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.
Changed in version 0.18: Added float values for fractions.
-
min_samples_leaf
(int, float, optional (default=1))The minimum number of samples required to be at a leaf node:
  - If int, then consider min_samples_leaf as the minimum number.
  - If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
Changed in version 0.18: Added float values for fractions.
-
min_weight_fraction_leaf
(float, optional (default=0.))The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
-
max_leaf_nodes
(int or None, optional (default=None))Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined by their relative reduction in impurity. If None, the number of leaf nodes is unlimited.
-
bootstrap
(boolean, optional (default=True))Whether bootstrap samples are used when building trees.
-
oob_score
(bool, optional (default=False))Whether to use out-of-bag samples to estimate the R^2 on unseen data.
-
n_jobs
(integer, optional (default=1))The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.
-
random_state
(int, RandomState instance or None, optional (default=None))If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
-
verbose
(int, optional (default=0))Controls the verbosity of the tree building process.
-
warm_start
(bool, optional (default=False))When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new forest.
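The int-versus-float handling of min_samples_split and min_samples_leaf described above can be sketched as follows. The helper names resolve_min_samples and split_allowed are hypothetical, and the check shown is a simplified reading of the two parameter descriptions, not the library's split logic.

```python
import math


def resolve_min_samples(value, n_samples):
    """Hypothetical helper: an int is used as is, a float is a fraction."""
    if isinstance(value, int):
        return value
    return int(math.ceil(value * n_samples))


def split_allowed(n_node, n_left, min_samples_split, min_samples_leaf,
                  n_samples):
    """Check a candidate split of n_node samples into n_left and the rest."""
    split_min = resolve_min_samples(min_samples_split, n_samples)
    leaf_min = resolve_min_samples(min_samples_leaf, n_samples)
    n_right = n_node - n_left
    # The node must be large enough to split, and so must both children.
    return (n_node >= split_min
            and n_left >= leaf_min
            and n_right >= leaf_min)


if __name__ == "__main__":
    print(resolve_min_samples(0.25, 10))        # fraction of 10 samples
    print(split_allowed(10, 1, 2, 0.25, 8))     # left child too small
```

Using a fraction ties the threshold to the size of the training set, so the same setting scales across datasets of different sizes.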
Attributes
-
estimators_
(list of DecisionTreeQuantileRegressor)The collection of fitted sub-estimators.
-
feature_importances_
(array of shape = [n_features])The feature importances (the higher, the more important the feature).
-
n_features_
(int)The number of features when fit is performed.
-
n_outputs_
(int)The number of outputs when fit is performed.
-
oob_score_
(float)Score of the training dataset obtained using an out-of-bag estimate.
-
oob_prediction_
(array of shape = [n_samples])Prediction computed with out-of-bag estimate on the training set.
-
y_train_
(array-like, shape=(n_samples,))Cache the target values at fit time.
-
y_weights_
(array-like, shape=(n_estimators, n_samples))y_weights_[i, j] is the weight given to sample j while estimator i is fit. If bootstrap is set to False, this reduces to a 2-D array of ones.
-
y_train_leaves_
(array-like, shape=(n_estimators, n_samples))y_train_leaves_[i, j] provides the leaf node that y_train_[j] ends up in when estimator i is fit. If y_train_[j] is given a weight of zero when estimator i is fit, then the value is -1.
References
[1] Nicolai Meinshausen, Quantile Regression Forests. http://www.jmlr.org/papers/volume7/meinshausen06a/meinshausen06a.pdf
Methods
RandomForestQuantileRegressor.fit(X, y)
Build a forest from the training set (X, y).
Parameters
-
X
(array-like or sparse matrix, shape = [n_samples, n_features])The training input samples. Internally, they are converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csc_matrix.
-
y
(array-like, shape = [n_samples] or [n_samples, n_outputs])The target values (real numbers).
-
sample_weight
(array-like, shape = [n_samples] or None)Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
-
check_input
(boolean, (default=True))Allows bypassing several input checks. Don't use this parameter unless you know what you are doing.
-
X_idx_sorted
(array-like, shape = [n_samples, n_features], optional)The indexes of the sorted training input samples. If many trees are grown on the same dataset, this allows the ordering to be cached between trees. If None, the data will be sorted here. Don't use this parameter unless you know what you are doing.
Returns
-
self
(object)Returns self.
RandomForestQuantileRegressor.predict(X, quantile=None)
Predict regression value for X.
Parameters
-
X
(array-like or sparse matrix of shape = [n_samples, n_features])The input samples. Internally, they are converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.
-
quantile
(int, optional)Value ranging from 0 to 100. By default, the mean is returned.
-
check_input
(boolean, (default=True))Allows bypassing several input checks. Don't use this parameter unless you know what you are doing.
Returns
-
y
(array of shape = [n_samples])If quantile is set to None, then return E(Y | X). Else, return y such that F(y | X = x) = quantile / 100, where F is the conditional cumulative distribution function of Y.
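One common use of two quantile predictions is a prediction interval: for instance, values predicted at quantile=5 and quantile=95 bracket a nominal 90% interval. The sketch below only checks how often held-out targets fall inside such an interval. The function interval_coverage and the sample numbers are hypothetical, and the lower and upper lists stand in for the output of two predict calls.

```python
def interval_coverage(y_true, lower, upper):
    """Fraction of targets that fall inside [lower, upper], element-wise."""
    if not (len(y_true) == len(lower) == len(upper)):
        raise ValueError("all inputs must have the same length")
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper)
                 if lo <= y <= hi)
    return inside / len(y_true)


if __name__ == "__main__":
    # Made-up numbers standing in for held-out targets and for the
    # predictions at a low and a high quantile.
    y_test = [1.0, 2.0, 3.0, 4.0]
    low = [0.0, 2.5, 2.0, 3.0]
    high = [2.0, 3.0, 4.0, 3.5]
    print(interval_coverage(y_test, low, high))
```

Comparing this empirical coverage with the nominal level is a quick sanity check on how well calibrated the predicted quantiles are.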
Properties
skgarden.forest
skgarden.forest.ExtraTreesRegressor
ExtraTreesRegressor that supports conditional standard deviation.
Parameters
-
n_estimators
(integer, optional (default=10))The number of trees in the forest.
-
criterion
(string, optional (default="mse"))The function to measure the quality of a split. Supported criteria are "mse" for the mean squared error, which is equal to variance reduction as feature selection criterion, and "mae" for the mean absolute error.
-
max_features
(int, float, string or None, optional (default="auto"))The number of features to consider when looking for the best split:
  - If int, then consider max_features features at each split.
  - If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
  - If "auto", then max_features=n_features.
  - If "sqrt", then max_features=sqrt(n_features).
  - If "log2", then max_features=log2(n_features).
  - If None, then max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features.
-
max_depth
(integer or None, optional (default=None))The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
-
min_samples_split
(int, float, optional (default=2))The minimum number of samples required to split an internal node:
  - If int, then consider min_samples_split as the minimum number.
  - If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.
-
min_samples_leaf
(int, float, optional (default=1))The minimum number of samples required to be at a leaf node:
  - If int, then consider min_samples_leaf as the minimum number.
  - If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
-
min_weight_fraction_leaf
(float, optional (default=0.))The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
-
max_leaf_nodes
(int or None, optional (default=None))Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined by their relative reduction in impurity. If None, the number of leaf nodes is unlimited.
-
min_impurity_decrease
(float, optional (default=0.))A node will be split if this split induces a decrease of the impurity greater than or equal to this value. The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child. N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
-
bootstrap
(boolean, optional (default=True))Whether bootstrap samples are used when building trees.
-
oob_score
(bool, optional (default=False))Whether to use out-of-bag samples to estimate the R^2 on unseen data.
-
n_jobs
(integer, optional (default=1))The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.
-
random_state
(int, RandomState instance or None, optional (default=None))If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
-
verbose
(int, optional (default=0))Controls the verbosity of the tree building process.
-
warm_start
(bool, optional (default=False))When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new forest.
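The weighted impurity decrease given under min_impurity_decrease can be evaluated by hand for a small node. The sketch below is a direct transcription of that formula using variance as the impurity (the "mse" criterion); the function names and the toy numbers are assumptions for illustration.

```python
def variance(values):
    """Population variance, used here as the "mse" impurity of a node."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def weighted_impurity_decrease(n, n_t, n_t_l, n_t_r,
                               impurity, left_impurity, right_impurity):
    """N_t / N * (impurity - N_t_R / N_t * right - N_t_L / N_t * left)."""
    return n_t / n * (impurity
                      - n_t_r / n_t * right_impurity
                      - n_t_l / n_t * left_impurity)


def should_split(decrease, min_impurity_decrease=0.0):
    """A split is kept when its decrease reaches the threshold."""
    return decrease >= min_impurity_decrease


if __name__ == "__main__":
    node, left, right = [1.0, 1.0, 3.0, 3.0], [1.0, 1.0], [3.0, 3.0]
    # The node holds 4 of the 8 training samples.
    gain = weighted_impurity_decrease(
        8, len(node), len(left), len(right),
        variance(node), variance(left), variance(right))
    print(gain, should_split(gain, min_impurity_decrease=0.6))
```

Because the decrease is scaled by N_t / N, a perfect split deep in the tree counts for less than the same split near the root.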
Attributes
-
estimators_
(list of DecisionTreeRegressor)The collection of fitted sub-estimators.
-
feature_importances_
(array of shape = [n_features])The feature importances (the higher, the more important the feature).
-
n_features_
(int)The number of features when fit is performed.
-
n_outputs_
(int)The number of outputs when fit is performed.
-
oob_score_
(float)Score of the training dataset obtained using an out-of-bag estimate.
-
oob_prediction_
(array of shape = [n_samples])Prediction computed with out-of-bag estimate on the training set.
Notes
The default values for the parameters controlling the size of the trees
(e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and
unpruned trees which can potentially be very large on some data sets. To
reduce memory consumption, the complexity and size of the trees should be
controlled by setting those parameter values.
The features are always randomly permuted at each split. Therefore,
the best found split may vary, even with the same training data,
max_features=n_features and bootstrap=False, if the improvement
of the criterion is identical for several splits enumerated during the
search of the best split. To obtain a deterministic behaviour during
fitting, random_state has to be fixed.
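The effect of permuting features when several splits tie can be reproduced in a few lines. This is a toy illustration, not the library's split search: pick_split_feature is a hypothetical function that scans features in a shuffled order and keeps the first one with the best improvement.

```python
import random


def pick_split_feature(improvements, random_state=None):
    """Scan features in a random order and keep the first best one."""
    rng = random.Random(random_state)
    order = list(range(len(improvements)))
    rng.shuffle(order)
    best_feature, best_gain = None, float("-inf")
    for feature in order:
        # Strict comparison: among tied features, the first one seen wins.
        if improvements[feature] > best_gain:
            best_feature, best_gain = feature, improvements[feature]
    return best_feature


if __name__ == "__main__":
    gains = [0.3, 0.7, 0.7, 0.1]  # features 1 and 2 are tied
    print([pick_split_feature(gains, random_state=s) for s in range(5)])
    print(pick_split_feature(gains, random_state=42))
```

With a fixed seed the same tied feature is chosen on every run, which is the determinism the note above refers to.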
References
[1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001.
Methods
ExtraTreesRegressor.predict(X, return_std=False)
Predict continuous output for X.
Parameters
-
X
(array-like of shape=(n_samples, n_features))Input data.
-
return_std
(boolean)Whether or not to return the standard deviation.
Returns
-
predictions
(array-like of shape=(n_samples,))Predicted values for X. If criterion is set to "mse", then predictions[i] ~= mean(y | X[i]).
-
std
(array-like of shape=(n_samples,))Standard deviation of y at X. Returned only if return_std is True. If criterion is set to "mse", then std[i] ~= std(y | X[i]).
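One way to obtain a conditional standard deviation from an ensemble is the law of total variance: combine each tree's leaf mean and leaf variance for the query point. The sketch below shows that calculation for a single sample. It is an assumption about the general approach, offered for intuition, and ensemble_mean_and_std is a hypothetical name rather than part of the documented API.

```python
import math


def ensemble_mean_and_std(tree_means, tree_variances):
    """Combine per-tree leaf means and variances for one query point.

    Uses Var(y) = E[var_k + mean_k ** 2] - (E[mean_k]) ** 2, where the
    expectation runs over the trees.
    """
    k = len(tree_means)
    mean = sum(tree_means) / k
    second_moment = sum(v + m * m
                        for m, v in zip(tree_means, tree_variances)) / k
    # Clamp tiny negative values caused by floating-point round-off.
    variance = max(second_moment - mean * mean, 0.0)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    # Two trees that disagree on the mean but have pure leaves.
    print(ensemble_mean_and_std([1.0, 3.0], [0.0, 0.0]))
    # Two trees that agree on the mean but have noisy leaves.
    print(ensemble_mean_and_std([2.0, 2.0], [4.0, 4.0]))
```

The result reflects both disagreement between trees and the spread of targets inside each leaf.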
Properties
skgarden.forest.RandomForestRegressor
RandomForestRegressor that supports conditional std computation.
Parameters
-
n_estimators
(integer, optional (default=10))The number of trees in the forest.
-
criterion
(string, optional (default="mse"))The function to measure the quality of a split. Supported criteria are "mse" for the mean squared error, which is equal to variance reduction as feature selection criterion, and "mae" for the mean absolute error.
-
max_features
(int, float, string or None, optional (default="auto"))The number of features to consider when looking for the best split:
  - If int, then consider max_features features at each split.
  - If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
  - If "auto", then max_features=n_features.
  - If "sqrt", then max_features=sqrt(n_features).
  - If "log2", then max_features=log2(n_features).
  - If None, then max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features.
-
max_depth
(integer or None, optional (default=None))The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
-
min_samples_split
(int, float, optional (default=2))The minimum number of samples required to split an internal node:
  - If int, then consider min_samples_split as the minimum number.
  - If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.
-
min_samples_leaf
(int, float, optional (default=1))The minimum number of samples required to be at a leaf node:
  - If int, then consider min_samples_leaf as the minimum number.
  - If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
-
min_weight_fraction_leaf
(float, optional (default=0.))The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
-
max_leaf_nodes
(int or None, optional (default=None))Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined by their relative reduction in impurity. If None, the number of leaf nodes is unlimited.
-
min_impurity_decrease
(float, optional (default=0.))A node will be split if this split induces a decrease of the impurity greater than or equal to this value. The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child. N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
-
bootstrap
(boolean, optional (default=True))Whether bootstrap samples are used when building trees.
-
oob_score
(bool, optional (default=False))Whether to use out-of-bag samples to estimate the R^2 on unseen data.
-
n_jobs
(integer, optional (default=1))The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.
-
random_state
(int, RandomState instance or None, optional (default=None))If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
-
verbose
(int, optional (default=0))Controls the verbosity of the tree building process.
-
warm_start
(bool, optional (default=False))When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new forest.
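The min_weight_fraction_leaf parameter above is expressed as a fraction of the total sample weight, so the absolute threshold depends on whether weights are supplied. A minimal sketch, assuming a hypothetical helper named min_leaf_weight:

```python
def min_leaf_weight(min_weight_fraction_leaf, n_samples, sample_weight=None):
    """Hypothetical helper: absolute weight a leaf must carry."""
    if sample_weight is None:
        # Without explicit weights every sample counts as one.
        total = float(n_samples)
    else:
        total = float(sum(sample_weight))
    return min_weight_fraction_leaf * total


if __name__ == "__main__":
    print(min_leaf_weight(0.25, 8))
    print(min_leaf_weight(0.25, 4, sample_weight=[1, 1, 2, 4]))
```

With equal weights this behaves like a fractional min_samples_leaf, whereas with uneven weights a single heavy sample can satisfy the threshold on its own.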
Attributes
-
estimators_
(list of DecisionTreeRegressor)The collection of fitted sub-estimators.
-
feature_importances_
(array of shape = [n_features])The feature importances (the higher, the more important the feature).
-
n_features_
(int)The number of features when fit is performed.
-
n_outputs_
(int)The number of outputs when fit is performed.
-
oob_score_
(float)Score of the training dataset obtained using an out-of-bag estimate.
-
oob_prediction_
(array of shape = [n_samples])Prediction computed with out-of-bag estimate on the training set.
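The relationship between oob_prediction_ and oob_score_ can be sketched as follows: each training sample is predicted only by the trees that did not see it, and the score is the R^2 between those predictions and the targets. The function names and the in_bag input are assumptions for illustration, not attributes documented on this page.

```python
def oob_predictions(tree_predictions, in_bag):
    """Average, per sample, the predictions of trees that did not train on it.

    tree_predictions[t][j]: prediction of tree t for training sample j.
    in_bag[t][j]: True if sample j was used to fit tree t.
    """
    n_trees = len(tree_predictions)
    n_samples = len(tree_predictions[0])
    result = []
    for j in range(n_samples):
        votes = [tree_predictions[t][j]
                 for t in range(n_trees) if not in_bag[t][j]]
        # A sample that every tree trained on has no out-of-bag estimate.
        result.append(sum(votes) / len(votes) if votes else float("nan"))
    return result


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    preds = [[1.0, 2.0, 3.0], [1.5, 2.5, 2.0]]
    bags = [[True, False, False], [False, True, False]]
    oob = oob_predictions(preds, bags)
    print(oob, r2_score([1.0, 2.0, 3.0], oob))
```

Out-of-bag estimates require bootstrap sampling, since without it no sample is ever left out of a tree.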
Notes
The default values for the parameters controlling the size of the trees
(e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and
unpruned trees which can potentially be very large on some data sets. To
reduce memory consumption, the complexity and size of the trees should be
controlled by setting those parameter values.
The features are always randomly permuted at each split. Therefore,
the best found split may vary, even with the same training data,
max_features=n_features and bootstrap=False, if the improvement
of the criterion is identical for several splits enumerated during the
search of the best split. To obtain a deterministic behaviour during
fitting, random_state has to be fixed.
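To get a feel for how the size-related parameters mentioned in these notes limit tree growth, the sketch below computes a simple upper bound on the number of leaves of one binary tree. The function max_leaves is hypothetical and the bound is only a back-of-the-envelope estimate; real trees are usually smaller.

```python
def max_leaves(n_samples, max_depth=None, min_samples_leaf=1,
               max_leaf_nodes=None):
    """Upper bound on the number of leaves of a single binary tree."""
    # Each leaf needs at least min_samples_leaf training samples.
    bound = n_samples // min_samples_leaf
    if max_depth is not None:
        # A binary tree of depth d has at most 2 ** d leaves.
        bound = min(bound, 2 ** max_depth)
    if max_leaf_nodes is not None:
        bound = min(bound, max_leaf_nodes)
    return max(bound, 1)


if __name__ == "__main__":
    print(max_leaves(1000))                        # defaults: fully grown
    print(max_leaves(1000, max_depth=3))
    print(max_leaves(1000, min_samples_leaf=50))
```

With the defaults the bound equals the number of training samples, which is why fully grown forests can consume a lot of memory on large datasets.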
References
[1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001.
Methods
RandomForestRegressor.predict(X, return_std=False)
Predict continuous output for X.
Parameters
-
X
(array of shape = (n_samples, n_features))Input data.
-
return_std
(boolean)Whether or not to return the standard deviation.
Returns
-
predictions
(array-like of shape = (n_samples,))Predicted values for X. If criterion is set to "mse", then predictions[i] ~= mean(y | X[i]).
-
std
(array-like of shape=(n_samples,))Standard deviation of y at X. Returned only if return_std is True. If criterion is set to "mse", then std[i] ~= std(y | X[i]).