Data analysis package: pandas-profiling

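Background

Whenever I get a new dataset, I end up repeating the same exploratory steps in pandas.
Today I came across a great package: pandas-profiling.
Its author felt that describe was too bare-bones, so this package produces the following initial analysis in a single call.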

Essentials: type, unique values, missing values
Quantile statistics: minimum, Q1, median, Q3, maximum, range, interquartile range
Descriptive statistics: mean, mode, sd, sum, MAD, coefficient of variation, kurtosis, skewness
Most frequent values
Histogram
Correlations heatmap (Spearman and Pearson)
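To make the list above concrete, here is a minimal sketch of how several of these figures could be computed by hand with plain pandas. It is not how pandas-profiling computes them internally; the helper name quick_profile and the toy data are made up for illustration.

```python
import pandas as pd

# Toy data, invented for this illustration only.
df = pd.DataFrame({
    "a": [1.0, 2.0, 2.0, 4.0, None],
    "b": [10, 20, 30, 40, 50],
})


def quick_profile(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per numeric column, built from plain pandas calls."""
    q1 = frame.quantile(0.25)
    q3 = frame.quantile(0.75)
    return pd.DataFrame({
        # Essentials
        "dtype": frame.dtypes.astype(str),
        "n_unique": frame.nunique(),       # NaN is not counted as a value
        "n_missing": frame.isna().sum(),
        # Quantile statistics
        "min": frame.min(),
        "q1": q1,
        "median": frame.median(),
        "q3": q3,
        "max": frame.max(),
        "range": frame.max() - frame.min(),
        "iqr": q3 - q1,
        # Descriptive statistics
        "mean": frame.mean(),
        "sd": frame.std(),
        "sum": frame.sum(),
        "skewness": frame.skew(),
        "kurtosis": frame.kurt(),
    })


summary = quick_profile(df)
print(summary.T)                  # transpose so each column of df is one column of output
print(df["a"].value_counts())     # most frequent values of a single column
```

Writing this out for every new dataset is the repetitive work the package is meant to remove.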

Main text

Installation (pick one)

pip install pandas-profiling
conda install pandas-profiling

Requirements
The current version works online only: it needs a network connection to download some Bootstrap and jQuery assets.

Prepare the data

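import pandas as pd
from sklearn.datasets import load_boston

# Note: load_boston was removed in scikit-learn 1.2, so this needs an older release.
boston = load_boston()  # load once and reuse the result
df = pd.DataFrame(data=boston["data"], columns=boston["feature_names"])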

Run the analysis

import pandas_profiling

profile = pandas_profiling.ProfileReport(df)
profile.to_file(outputfile="output.html")  # HTML output is supported

ProfileReport Attributes
df : DataFrame
　　Data to be analyzed
bins : int
　　Number of bins in histogram.
　　The default is 10.
check_correlation : boolean
　　Whether or not to check correlation.
　　It's True by default.
correlation_threshold: float
　　Threshold to determine if the variable pair is correlated.
　　The default is 0.9.
correlation_overrides : list
　　Variable names not to be rejected because they are correlated.
　　There is no variable in the list (None) by default.
check_recoded : boolean
　　Whether or not to check recoded correlation (memory heavy feature).
　　Since it's an expensive computation it can be activated for small datasets.
　　check_correlation must be true to enable this check.
　　It's False by default.
pool_size : int
　　Number of workers in thread pool
　　The default is equal to the number of CPUs.
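As a rough illustration of what correlation_threshold and correlation_overrides describe, the sketch below flags columns that are highly correlated with an earlier column, using plain pandas. It is an approximation under stated assumptions, not the library's actual implementation: the helper name highly_correlated_columns, the use of absolute Pearson correlation, and the "compare against earlier columns" rule are all choices made for this example.

```python
import pandas as pd


def highly_correlated_columns(frame, threshold=0.9, overrides=None):
    """Hypothetical helper: columns whose absolute Pearson correlation
    with any earlier column exceeds `threshold`, skipping `overrides`."""
    overrides = set(overrides or [])
    corr = frame.corr(method="pearson").abs()
    cols = list(corr.columns)
    rejected = []
    for i, col in enumerate(cols):
        if col in overrides:
            continue  # never flag a column the caller asked to keep
        for earlier in cols[:i]:
            if corr.loc[earlier, col] > threshold:
                rejected.append(col)
                break
    return rejected


# Toy data, invented for this illustration only.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "x_doubled": [2, 4, 6, 8, 10],   # perfectly correlated with x
    "noise": [5, 1, 4, 2, 3],        # weakly correlated with x
})

print(highly_correlated_columns(df))                           # ['x_doubled']
print(highly_correlated_columns(df, overrides=["x_doubled"]))  # []
```

Raising the threshold flags fewer columns; listing a column in the overrides keeps it even when it is nearly a duplicate of another.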
Methods
get_description
　　 Return the description (a raw statistical summary) of the dataset.
get_rejected_variables
　　 Return the list of rejected variables, or an empty list if there are none.
to_file
　　 Write the report to a file.
to_html
　　 Return the report as an HTML string.

Open the generated report and click into each variable to see the details.

Just sharing a good find. Really convenient, isn't it? Thanks and praise to the author!!

Reference:

Official website
