# Sklearn

For general information about writing Scikit-learn training scripts and using Scikit-learn estimators and models with SageMaker, see Using Scikit-learn with the SageMaker Python SDK. Scikit-learn versions supported by the Amazon SageMaker Scikit-learn container: 0.20.0, 0.23-1. Scikit-learn is a popular Python library for building machine learning models; it provides classification, regression, and clustering algorithms, along with utilities for data preprocessing such as one-hot encoding. Newcomers to machine learning often have trouble installing scikit-learn correctly.

Imputation transformer for completing missing values.

Read more in the User Guide.

New in version 0.20: `SimpleImputer` replaces the previous `sklearn.preprocessing.Imputer` estimator, which is now removed.

**missing_values** int, float, str, np.nan or None, default=np.nan

The placeholder for the missing values. All occurrences of `missing_values` will be imputed. For pandas dataframes with nullable integer dtypes with missing values, `missing_values` should be set to `np.nan`, since `pd.NA` will be converted to `np.nan`.
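A minimal sketch of the nullable-dtype caveat above (the data values here are illustrative): keeping the default `missing_values=np.nan` still catches `pd.NA` entries in an `Int64` column, because they are converted to `np.nan` internally.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# A DataFrame with a nullable integer dtype: the missing entry is pd.NA
df = pd.DataFrame({"a": pd.array([1, pd.NA, 3], dtype="Int64")})

# Keep the default missing_values=np.nan; pd.NA is converted to np.nan
imp = SimpleImputer(missing_values=np.nan, strategy="mean")
out = imp.fit_transform(df)
print(out)  # the pd.NA entry is replaced by the column mean, 2.0
```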

**strategy** string, default='mean'

The imputation strategy.

- If "mean", then replace missing values using the mean along each column. Can only be used with numeric data.
- If "median", then replace missing values using the median along each column. Can only be used with numeric data.
- If "most_frequent", then replace missing values using the most frequent value along each column. Can be used with strings or numeric data. If there is more than one such value, only the smallest is returned.
- If "constant", then replace missing values with fill_value. Can be used with strings or numeric data.

New in version 0.20: strategy="constant" for fixed value imputation.

**fill_value** string or numerical value, default=None

When strategy="constant", fill_value is used to replace all occurrences of missing_values. If left to the default, fill_value will be 0 when imputing numerical data and "missing_value" for strings or object data types.
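A quick sketch of the "constant" strategy described above (the array values are illustrative):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan],
              [np.nan, 4.0]])

# Every missing entry is replaced with the same fixed fill_value
imp = SimpleImputer(strategy="constant", fill_value=0)
Xt = imp.fit_transform(X)
print(Xt)
# [[1. 0.]
#  [0. 4.]]
```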

**verbose** integer, default=0

Controls the verbosity of the imputer.

**copy** boolean, default=True

If True, a copy of X will be created. If False, imputation will be done in-place whenever possible. Note that, in the following cases, a new copy will always be made, even if `copy=False`:

- If X is not an array of floating values;
- If X is encoded as a CSR matrix;
- If add_indicator=True.

**add_indicator** boolean, default=False

If True, a `MissingIndicator` transform will stack onto the output of the imputer's transform. This allows a predictive estimator to account for missingness despite imputation. If a feature has no missing values at fit/train time, the feature won't appear on the missing indicator even if there are missing values at transform/test time.
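A small sketch of the `add_indicator` behavior above (values are illustrative): only the first feature has missing values at fit time, so exactly one indicator column is appended.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, 6.0]])

# Mean imputation plus a stacked missing-value indicator column
imp = SimpleImputer(strategy="mean", add_indicator=True)
Xt = imp.fit_transform(X)

# Two imputed feature columns + one indicator column (only the
# first feature had missing values at fit time)
print(Xt.shape)  # (3, 3)
```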

Attributes

**statistics_** array of shape (n_features,)

The imputation fill value for each feature. Computing statistics can result in `np.nan` values. During `transform`, features corresponding to `np.nan` statistics will be discarded.

**indicator_** `MissingIndicator`

Indicator used to add binary indicators for missing values. `None` if add_indicator is False.

See also

`IterativeImputer`: Multivariate imputation of missing values.

Notes

Columns which only contained missing values at `fit` are discarded upon `transform` if strategy is not "constant".

Examples
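A short usage sketch for the Examples section (the array values are illustrative): fit a mean imputer on training data, then fill missing entries in new data with the learned column means.

```python
import numpy as np
from sklearn.impute import SimpleImputer

imp_mean = SimpleImputer(missing_values=np.nan, strategy="mean")
imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])

X = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]]
Xt = imp_mean.transform(X)
print(Xt)
# [[ 7.   2.   3. ]
#  [ 4.   3.5  6. ]
#  [10.   3.5  9. ]]
```

Each missing entry is replaced by the mean of its column computed at fit time (7 for the first column, 3.5 for the second).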

Methods

| Method | Description |
| --- | --- |
| `fit(X[, y])` | Fit the imputer on X. |
| `fit_transform(X[, y])` | Fit to data, then transform it. |
| `get_params([deep])` | Get parameters for this estimator. |
| `inverse_transform(X)` | Convert the data back to the original representation. |
| `set_params(**params)` | Set the parameters of this estimator. |
| `transform(X)` | Impute all missing values in X. |

`fit(X, y=None)`

Fit the imputer on X.

**X** {array-like, sparse matrix}, shape (n_samples, n_features)

Input data, where `n_samples` is the number of samples and `n_features` is the number of features.

Returns

**self** SimpleImputer

The fitted imputer.

`fit_transform(X, y=None, **fit_params)`

Fit to data, then transform it.

Fits transformer to `X` and `y` with optional parameters `fit_params` and returns a transformed version of `X`.

**X** array-like of shape (n_samples, n_features)

Input samples.

**y** array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**\*\*fit_params** dict

Additional fit parameters.

Returns

**X_new** ndarray of shape (n_samples, n_features_new)

Transformed array.
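A minimal sketch of `fit_transform` (values are illustrative): it is the convenience equivalent of calling `fit(X)` followed by `transform(X)`.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[np.nan, 1.0],
              [2.0, np.nan],
              [4.0, 3.0]])

# Equivalent to imp.fit(X) followed by imp.transform(X)
imp = SimpleImputer(strategy="median")
Xt = imp.fit_transform(X)
print(Xt)
# [[3. 1.]
#  [2. 2.]
#  [4. 3.]]
```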

`get_params(deep=True)`

Get parameters for this estimator.

**deep** bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

**params** dict

Parameter names mapped to their values.
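A one-line sketch of `get_params` (the chosen strategy is illustrative):

```python
from sklearn.impute import SimpleImputer

imp = SimpleImputer(strategy="median")
params = imp.get_params()
print(params["strategy"])  # median
```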

`inverse_transform(X)`

Convert the data back to the original representation.

Inverts the `transform` operation performed on an array. This operation can only be performed after `SimpleImputer` is instantiated with `add_indicator=True`.

Note that `inverse_transform` can only invert the transform in features that have binary indicators for missing values. If a feature has no missing values at `fit` time, the feature won't have a binary indicator, and the imputation done at `transform` time won't be inverted.

New in version 0.24.

**X** array-like of shape (n_samples, n_features + n_features_missing_indicator)

The imputed data to be reverted to original data. It has to be an augmented array of imputed data and the missing indicator mask.

Returns

**X_original** ndarray of shape (n_samples, n_features)

The original X with missing values as it was prior to imputation.
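A sketch of the round trip described above (values are illustrative; requires scikit-learn 0.24 or later, since `inverse_transform` is new in that version):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan],
              [np.nan, 4.0],
              [5.0, 6.0]])

# inverse_transform requires add_indicator=True
imp = SimpleImputer(strategy="mean", add_indicator=True)
Xt = imp.fit_transform(X)           # imputed columns + indicator mask
X_back = imp.inverse_transform(Xt)  # np.nan restored where values were missing
print(np.isnan(X_back).sum())  # 2
```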

`set_params(**params)`

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as `Pipeline`). The latter have parameters of the form `<component>__<parameter>` so that it's possible to update each component of a nested object.

**\*\*params** dict

Estimator parameters.

Returns

**self** estimator instance

Estimator instance.
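A sketch of the nested `<component>__<parameter>` form mentioned above, using an illustrative two-step pipeline:

```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ("imputer", SimpleImputer()),        # default strategy="mean"
    ("clf", LogisticRegression()),
])

# Nested parameters use the <component>__<parameter> form
pipe.set_params(imputer__strategy="median")
print(pipe.named_steps["imputer"].strategy)  # median
```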

`transform(X)`

Impute all missing values in X.

**X**{array-like, sparse matrix}, shape (n_samples, n_features)

The input data to complete.

Identifying to which category an object belongs.

**Applications**: Spam detection, Image recognition.

**Algorithms**:

SVM, nearest neighbors, random forest, …

Predicting a continuous-valued attribute associated with an object.

**Applications**: Drug response, Stock prices.

**Algorithms**:

SVR, ridge regression, Lasso, …

Automatic grouping of similar objects into sets.

**Applications**: Customer segmentation, Grouping experiment outcomes

**Algorithms**:

k-Means, spectral clustering, mean-shift, …


Reducing the number of random variables to consider.

**Applications**: Visualization, Increased efficiency

**Algorithms**:

PCA, feature selection, non-negative matrix factorization.

Comparing, validating and choosing parameters and models.

**Goal**: Improved accuracy via parameter tuning

**Modules**:

grid search, cross validation, metrics.

Feature extraction and normalization.

**Application**: Transforming input data such as text for use with machine learning algorithms.

**Modules**:

preprocessing, feature extraction.


