Sklearn

 
  1. Sklearn Decision Tree
  2. Sklearn RandomForestClassifier
  3. Sklearn GridSearchCV
  4. Sklearn SVM
  5. Sklearn Dataset

For general information about writing Scikit-learn training scripts and using Scikit-learn estimators and models with SageMaker, see Using Scikit-learn with the SageMaker Python SDK. Scikit-learn versions supported by the Amazon SageMaker Scikit-learn container: 0.20.0, 0.23-1. Scikit-learn is a popular Python library for building machine learning models. It provides classification, regression, and clustering algorithms, and it also covers data preprocessing tasks such as one-hot encoding. I have seen that many new programmers who want to learn machine learning are unable to install scikit-learn properly.
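
As a quick sanity check, note that the package name on PyPI is scikit-learn while the import name is sklearn; a minimal sketch of verifying an installation:

    # Install from PyPI (run in a shell):
    #   pip install scikit-learn
    import sklearn
    print(sklearn.__version__)  # e.g. '0.23.1'; should be a version your environment supports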

Imputation transformer for completing missing values.

Read more in the User Guide.

New in version 0.20: SimpleImputer replaces the previous sklearn.preprocessing.Imputer estimator, which is now removed.

Parameters
missing_values : int, float, str, np.nan or None, default=np.nan

The placeholder for the missing values. All occurrences of missing_values will be imputed. For pandas’ dataframes with nullable integer dtypes with missing values, missing_values should be set to np.nan, since pd.NA will be converted to np.nan.

strategy : string, default=’mean’

The imputation strategy.

  • If “mean”, then replace missing values using the mean along each column. Can only be used with numeric data.

  • If “median”, then replace missing values using the median along each column. Can only be used with numeric data.

  • If “most_frequent”, then replace missing using the most frequent value along each column. Can be used with strings or numeric data. If there is more than one such value, only the smallest is returned.

  • If “constant”, then replace missing values with fill_value. Can be used with strings or numeric data.

New in version 0.20: strategy=”constant” for fixed value imputation.
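
A minimal sketch of the “most_frequent” and “constant” strategies on a small string column (the column name "color" and the fill value "unknown" are illustrative):

    import numpy as np
    import pandas as pd
    from sklearn.impute import SimpleImputer

    df = pd.DataFrame({"color": ["red", np.nan, "red", "blue"]})

    # "most_frequent": the missing entry becomes "red", the modal value of the column
    print(SimpleImputer(strategy="most_frequent").fit_transform(df))

    # "constant": every missing entry becomes the supplied fill_value
    print(SimpleImputer(strategy="constant", fill_value="unknown").fit_transform(df))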

fill_value : string or numerical value, default=None

When strategy == “constant”, fill_value is used to replace all occurrences of missing_values. If left to the default, fill_value will be 0 when imputing numerical data and “missing_value” for strings or object data types.

verbose : integer, default=0

Controls the verbosity of the imputer.

copy : boolean, default=True

If True, a copy of X will be created. If False, imputation will be done in-place whenever possible. Note that, in the following cases, a new copy will always be made, even if copy=False:

  • If X is not an array of floating values;

  • If X is encoded as a CSR matrix;

  • If add_indicator=True.

add_indicator : boolean, default=False

If True, a MissingIndicator transform will stack onto output of the imputer’s transform. This allows a predictive estimator to account for missingness despite imputation. If a feature has no missing values at fit/train time, the feature won’t appear on the missing indicator even if there are missing values at transform/test time.
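
A minimal sketch of the effect of add_indicator=True; both features below contain missing values at fit time, so two binary indicator columns are appended to the imputed output:

    import numpy as np
    from sklearn.impute import SimpleImputer

    X = np.array([[1.0, np.nan],
                  [2.0, 3.0],
                  [np.nan, 4.0]])

    imp = SimpleImputer(strategy="mean", add_indicator=True)
    Xt = imp.fit_transform(X)

    # Xt has 2 imputed feature columns followed by 2 binary indicator columns.
    print(Xt.shape)  # (3, 4)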

Sklearn Decision Tree

Attributes
statistics_ : array of shape (n_features,)

The imputation fill value for each feature. Computing statistics can result in np.nan values. During transform, features corresponding to np.nan statistics will be discarded.

indicator_ : MissingIndicator

Indicator used to add binary indicators for missing values. None if add_indicator is False.

See also

IterativeImputer

Multivariate imputation of missing values.

Notes

Columns which only contained missing values at fit are discarded upon transform if strategy is not “constant”.

Examples
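
A minimal sketch of the default “mean” strategy, fitting on one array and then imputing another:

    import numpy as np
    from sklearn.impute import SimpleImputer

    imp_mean = SimpleImputer(missing_values=np.nan, strategy="mean")
    imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])

    X = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]]
    print(imp_mean.transform(X))
    # The column means learned at fit time (7, 3.5, 6) replace the NaNs.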

Methods

fit(X[, y])

Fit the imputer on X.

fit_transform(X[, y])

Fit to data, then transform it.

get_params([deep])

Get parameters for this estimator.

inverse_transform(X)

Convert the data back to the original representation.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Impute all missing values in X.

fit(X, y=None)

Fit the imputer on X.

Parameters
X : {array-like, sparse matrix}, shape (n_samples, n_features)

Input data, where n_samples is the number of samples and n_features is the number of features.

Returns
self : SimpleImputer
fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns
X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : dict

Parameter names mapped to their values.

inverse_transform(X)

Convert the data back to the original representation.

Inverts the transform operation performed on an array. This operation can only be performed after SimpleImputer is instantiated with add_indicator=True.

Note that inverse_transform can only invert the transform in features that have binary indicators for missing values. If a feature has no missing values at fit time, the feature won’t have a binary indicator, and the imputation done at transform time won’t be inverted.

New in version 0.24.

Parameters
X : array-like of shape (n_samples, n_features + n_features_missing_indicator)

The imputed data to be reverted to original data. It has to be an augmented array of imputed data and the missing indicator mask.

Returns
X_original : ndarray of shape (n_samples, n_features)

The original X with missing values as it was prior to imputation.
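
A minimal sketch, assuming scikit-learn 0.24 or later and an imputer fitted with add_indicator=True:

    import numpy as np
    from sklearn.impute import SimpleImputer

    X = np.array([[1.0, np.nan],
                  [2.0, 3.0],
                  [np.nan, 4.0]])

    imp = SimpleImputer(strategy="mean", add_indicator=True)
    Xt = imp.fit_transform(X)           # imputed features followed by the indicator mask
    X_back = imp.inverse_transform(Xt)  # imputed entries are turned back into np.nan
    print(X_back)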

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params : dict

Estimator parameters.

Returns
self : estimator instance

Estimator instance.
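
A minimal sketch of the nested <component>__<parameter> form inside a Pipeline (the step names "imputer" and "clf" are illustrative):

    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    pipe = Pipeline([("imputer", SimpleImputer()),
                     ("clf", LogisticRegression())])

    # Switch the nested imputer from the default "mean" to "median".
    pipe.set_params(imputer__strategy="median")
    print(pipe.get_params()["imputer__strategy"])  # 'median'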

transform(X)

Impute all missing values in X.

Parameters
X : {array-like, sparse matrix}, shape (n_samples, n_features)

The input data to complete.

Classification: identifying to which category an object belongs.

Applications: Spam detection, Image recognition.
Algorithms: SVM, nearest neighbors, random forest, …
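
A minimal classification sketch using one of the algorithms listed above, a random forest on the bundled iris dataset:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))  # mean accuracy on the held-out split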

Regression: predicting a continuous-valued attribute associated with an object.

Applications: Drug response, Stock prices.
Algorithms: SVR, ridge regression, Lasso, …
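
A minimal regression sketch using ridge regression on the bundled diabetes dataset:

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge

    X, y = load_diabetes(return_X_y=True)
    reg = Ridge(alpha=1.0).fit(X, y)
    print(reg.score(X, y))  # R^2 on the training data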

Clustering: automatic grouping of similar objects into sets.

Applications: Customer segmentation, Grouping experiment outcomes.
Algorithms: k-Means, spectral clustering, mean-shift, …
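
A minimal clustering sketch using k-means on a tiny synthetic dataset:

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[1, 2], [1, 4], [1, 0],
                  [10, 2], [10, 4], [10, 0]])

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.labels_)           # cluster assignment for each sample
    print(km.cluster_centers_)  # the two learned centroids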

Sklearn RandomForestClassifier

Dimensionality reduction: reducing the number of random variables to consider.

Applications: Visualization, Increased efficiency.
Algorithms: PCA, feature selection, non-negative matrix factorization.
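
A minimal dimensionality reduction sketch using PCA on the bundled iris dataset:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, _ = load_iris(return_X_y=True)
    X_2d = PCA(n_components=2).fit_transform(X)
    print(X_2d.shape)  # (150, 2): four features reduced to two components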

Model selection: comparing, validating and choosing parameters and models.

Goal: Improved accuracy via parameter tuning.
Modules: grid search, cross-validation, metrics.
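
A minimal model selection sketch using GridSearchCV to tune an SVM over a small parameter grid with 5-fold cross-validation:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

    search = GridSearchCV(SVC(), param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)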

Preprocessing: feature extraction and normalization.

Application: Transforming input data such as text for use with machine learning algorithms.
Modules: preprocessing, feature extraction.
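
A minimal preprocessing sketch combining text feature extraction with numeric scaling:

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.preprocessing import StandardScaler

    # Feature extraction: turn raw text into a sparse bag-of-words matrix.
    docs = ["scikit-learn makes machine learning simple",
            "machine learning with python"]
    counts = CountVectorizer().fit_transform(docs)
    print(counts.shape)

    # Normalization: scale numeric columns to zero mean and unit variance.
    X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
    print(StandardScaler().fit_transform(X))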

News

  • On-going development: What's new (Changelog)
  • October 2017. scikit-learn 0.19.1 is available for download (Changelog).
  • July 2017. scikit-learn 0.19.0 is available for download (Changelog).
  • June 2017. scikit-learn 0.18.2 is available for download (Changelog).
  • September 2016. scikit-learn 0.18.0 is available for download (Changelog).
  • November 2015. scikit-learn 0.17.0 is available for download (Changelog).
  • March 2015. scikit-learn 0.16.0 is available for download (Changelog).

Community

  • About us: see authors and contributing
  • More machine learning: find related projects
  • Questions? See the FAQ and Stack Overflow
  • Mailing list: [email protected]
  • IRC: #scikit-learn @ freenode

Who uses scikit-learn?

Sklearn GridSearchCV

'We use scikit-learn to support leading-edge basic research [...]'

Sklearn SVM

'I think it's the most well-designed ML package I've seen so far.'

'scikit-learn's ease-of-use, performance and overall variety of algorithms implemented has proved invaluable [...].'

Sklearn Dataset

'For these tasks, we relied on the excellent scikit-learn package for Python.'

'The great benefit of scikit-learn is its fast learning curve [...]'

'It allows us to do AWesome stuff we would not otherwise accomplish'

'scikit-learn makes doing advanced analysis in Python accessible to anyone.'