Functions for summarizing data in Python: computing totals and statistics in Python

How do data analysts use Python for data analysis? Which parts of Python are needed, and how is it actually done?

Recently, Analysis with Programming joined Planet Python. Here I share how to get started with data analysis in Python. The contents are as follows:


Data import

Importing local or web-hosted CSV files;

Data transformation;

Descriptive statistics;

Hypothesis testing

One-sample t-test;

Visualization;

Creating custom functions.

Data import

This is a crucial step: before any analysis we first need to import the data. Data usually comes in CSV format, and if it does not, it can at least be converted to CSV. In Python we do it as follows:

import pandas as pd

# Reading data locally

df = pd.read_csv('/Users/al-ahmadgaidasaad/Documents/d.csv')

# Reading data from web

data_url = ""

df = pd.read_csv(data_url)

To read a local CSV file we need the pandas data analysis library. Its read_csv function can read both local files and data on the web.
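The path and URL above are specific to the author's machine, so here is a minimal sketch of read_csv that runs anywhere by reading CSV text from an in-memory buffer. The three rows are the first three rows of the data set as printed later in this article; the names csv_text and sample are only for illustration.

```python
import io

import pandas as pd

# CSV text standing in for a file; read_csv accepts any file-like object
csv_text = """Abra,Apayao,Benguet,Ifugao,Kalinga
1243,2934,148,3300,10553
4158,9235,4287,8063,35257
1787,1922,1955,1074,4544
"""

sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)   # (rows, columns)
print(sample.dtypes)  # column types inferred by pandas
```

Passing a file path or a URL string instead of the buffer works the same way.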


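Data transformation

Now that the data is in the workspace, the next step is transformation. Statisticians and scientists usually use this step to remove data the analysis does not need. Let's first look at the data with df.head() and df.tail().

For R programmers, this is equivalent to printing the first 6 rows with print(head(df)) and the last 6 rows with print(tail(df)). Note that Python prints 5 rows by default whereas R prints 6, so the R code head(df, n = 10) becomes df.head(n = 10) in Python, and the same goes for the tail of the data.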
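A small sketch of the head and tail defaults described here, using a made-up 20-row frame (demo is a hypothetical name, not the article's data):

```python
import pandas as pd

demo = pd.DataFrame({"x": range(20)})

first_default = demo.head()    # 5 rows by default
first_ten = demo.head(n=10)    # like head(df, n = 10) in R
last_three = demo.tail(3)      # last rows work the same way

print(len(first_default), len(first_ten))
print(last_three)
```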

In R, column and row names are extracted with colnames and rownames, respectively. In Python we use the columns and index attributes instead:

# Extracting column names

print(df.columns)

# OUTPUT

Index(['Abra', 'Apayao', 'Benguet', 'Ifugao', 'Kalinga'], dtype='object')

# Extracting row names or the index

print(df.index)

# OUTPUT

Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78], dtype='int64')

To transpose the data, use the T attribute:

# Transpose data

print(df.T)

# OUTPUT

               0      1      2      3      4      5      6      7      8      9
Abra        1243   4158   1787  17152   1266   5576    927  21540   1039   5424
Apayao      2934   9235   1922  14501   2385   7452   1099  17038   1382  10588
Benguet      148   4287   1955   3536   2530    771   2796   2463   2592   1064
Ifugao      3300   8063   1074  19607   3315  13134   5134  14226   6842  13828
Kalinga    10553  35257   4544  31687   8520  28252   3106  36238   4973  40140

         ...      69     70     71     72     73     74     75     76     77
Abra     ...   12763   2470  59094   6209  13316   2505  60303   6311  13345
Apayao   ...   37625  19532  35126   6335  38613  20878  40065   6756  38902
Benguet  ...    2354   4045   5987   3530   2585   3519   7062   3561   2583
Ifugao   ...    9838  17125  18940  15560   7746  19737  19422  15910  11096
Kalinga  ...   65782  15279  52437  24385  66148  16513  61808  23349  68663

              78
Abra        2623
Apayao     18264
Benguet     3745
Ifugao     16787
Kalinga    16900

Other transformations, such as sorting, can be done with the sort_values method. Now let's extract a specific column. In Python we can use either the iloc indexer (by position) or the loc indexer (by label); the ix indexer used in older tutorials was deprecated in pandas 0.20 and removed in pandas 1.0. Assuming we want the first 5 rows of the first column of the data, we have:

print(df.iloc[:, 0].head())

# OUTPUT

0     1243
1     4158
2     1787
3    17152
4     1266
Name: Abra, dtype: int64

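By the way, Python indexing starts at 0, not 1. To take the first 3 columns of the rows with index 10 through 20 (iloc excludes the end position, hence 21), we have

print(df.iloc[10:21, 0:3])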

# OUTPUT

     Abra  Apayao  Benguet
10    981    1311     2560
11  27366   15093     3039
12   1100    1701     2382
13   7212   11001     1088
14   1048    1427     2847
15  25679   15661     2942
16   1055    2191     2119
17   5437    6461      734
18   1029    1183     2302
19  23710   12222     2598
20   1091    2343     2654

The command above is equivalent to df.loc[10:20, ['Abra', 'Apayao', 'Benguet']]; note that loc slices by label and includes the end label.
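The difference between position-based and label-based slicing is easy to get wrong, so here is a minimal sketch on a made-up frame (frame and its columns a to d are hypothetical). With the default integer index, iloc[10:21] and loc[10:20] select the same rows:

```python
import pandas as pd

frame = pd.DataFrame(
    {"a": range(30), "b": range(30, 60), "c": range(60, 90), "d": range(90, 120)}
)

by_position = frame.iloc[10:21, 0:3]           # end position 21 is excluded
by_label = frame.loc[10:20, ["a", "b", "c"]]   # end label 20 is included

print(by_position.shape)
print(by_position.equals(by_label))
```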

To drop columns from the data, here column 1 (Apayao) and column 2 (Benguet), we use the drop method:

print(df.drop(df.columns[[1, 2]], axis = 1).head())

# OUTPUT

    Abra  Ifugao  Kalinga
0   1243    3300    10553
1   4158    8063    35257
2   1787    1074     4544
3  17152   19607    31687
4   1266    3315     8520

The axis argument tells the function whether to drop columns or rows: axis = 1 drops columns, and axis = 0 drops rows.
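A short sketch of both values of axis, using the first three rows of three columns from the output above (the name small is only for illustration):

```python
import pandas as pd

small = pd.DataFrame({
    "Abra": [1243, 4158, 1787],
    "Apayao": [2934, 9235, 1922],
    "Benguet": [148, 4287, 1955],
})

without_cols = small.drop(["Apayao", "Benguet"], axis=1)  # drop by column label
without_rows = small.drop([0, 2], axis=0)                 # drop by row label

print(without_cols)
print(without_rows)
```

drop returns a new DataFrame in both cases; small itself is left unchanged.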

Descriptive statistics

The next step is to describe the statistical properties of the data with the describe method:

print(df.describe())

# OUTPUT

               Abra        Apayao      Benguet        Ifugao       Kalinga
count     79.000000     79.000000    79.000000     79.000000     79.000000
mean   12874.379747  16860.645570  3237.392405  12414.620253  30446.417722
std    16746.466945  15448.153794  1588.536429   5034.282019  22245.707692
min      927.000000    401.000000   148.000000   1074.000000   2346.000000
25%     1524.000000   3435.500000  2328.000000   8205.000000   8601.500000
50%     5790.000000  10588.000000  3202.000000  13044.000000  24494.000000
75%    13330.500000  33289.000000  3918.500000  16099.500000  52510.500000
max    60303.000000  54625.000000  8813.000000  21031.000000  68663.000000
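describe does not report totals. If a sum is what you need, a sketch along these lines may help; it uses the first three rows of two columns from the data above, and the variable names are illustrative only:

```python
import pandas as pd

yields = pd.DataFrame({
    "Abra": [1243, 4158, 1787],
    "Apayao": [2934, 9235, 1922],
})

column_totals = yields.sum()                   # one total per column
row_totals = yields.sum(axis=1)                # one total per row
summary = yields.agg(["sum", "mean", "max"])   # several statistics at once
grand_total = int(yields.to_numpy().sum())     # total over every cell

print(column_totals)
print(summary)
print(grand_total)
```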

Hypothesis testing

Python has a good package for statistical inference: stats, in scipy. Its ttest_1samp function implements the one-sample t-test. So, to test the mean rice yield of the Abra column against the null hypothesis that the population mean yield is 15000, we have:

from scipy import stats as ss

# Perform one sample t-test using 15000 as the true mean

print(ss.ttest_1samp(a = df.loc[:, 'Abra'], popmean = 15000))

# OUTPUT

(-1.1281738488299586, 0.26270472069109496)

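This returns a tuple made up of the following values:

t : float or array, the t-statistic

prob : float or array, the two-tailed p-value

From the output above, the p-value of 0.263 is well above α = 0.05, so there is not sufficient evidence to conclude that the mean rice yield differs from 15000. Applying the test to all the variables, again assuming a mean of 15000, we have:

print(ss.ttest_1samp(a = df, popmean = 15000))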

# OUTPUT

(array([ -1.12817385,   1.07053437, -65.81425599,  -4.564575  ,   6.17156198]),
 array([  2.62704721e-01,   2.87680340e-01,   4.15643528e-70,
          1.83764399e-05,   2.82461897e-08]))

The first array holds the t-statistics and the second the corresponding p-values.
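To make the test less of a black box, the sketch below computes the t-statistic and two-tailed p-value by hand and compares them with ttest_1samp. It assumes scipy is installed and uses only the first ten Abra values shown earlier, so the numbers differ from the full-data output above.

```python
import numpy as np
from scipy import stats as ss

# First ten Abra values from the transposed output earlier in the article
sample = np.array([1243, 4158, 1787, 17152, 1266, 5576, 927, 21540, 1039, 5424], dtype=float)
popmean = 15000

n = sample.size
# t = (sample mean - hypothesized mean) / (sample std / sqrt(n))
t_manual = (sample.mean() - popmean) / (sample.std(ddof=1) / np.sqrt(n))
# Two-tailed p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * ss.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = ss.ttest_1samp(sample, popmean)
print(t_manual, p_manual)
print(t_scipy, p_scipy)
```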

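Visualization

Python has many visualization modules, the most popular being the matplotlib library. The bokeh and seaborn modules are worth a mention as alternatives. In an earlier post I demonstrated the box-and-whisker plot functionality of matplotlib.

# Import the module for plotting

import matplotlib.pyplot as plt

df.plot(kind = 'box')

plt.show()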

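Now we can beautify the chart with a theme modeled on R's ggplot2. The pd.options.display.mpl_style option that older pandas versions offered for this has been removed, so the theme is set through matplotlib instead. We only need to add one line to the code above:

import matplotlib.pyplot as plt

plt.style.use('ggplot') # Sets the plotting theme to the ggplot style

df.plot(kind = 'box')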

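This gives a ggplot-styled box plot that is much cleaner than the default matplotlib.pyplot theme. In this post, however, I would rather introduce the seaborn module, a statistical data visualization library. So we have:

# Import the seaborn library

import seaborn as sns

# Do the boxplot

sns.boxplot(data = df, width = 0.5, palette = "pastel")

plt.show()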

What an attractive box plot. Let's keep going.

sns.violinplot(data = df, width = 0.5, palette = "pastel")

plt.show()

# distplot is deprecated in recent seaborn; histplot plus rugplot draws the histogram, density curve and rug

sns.histplot(df.iloc[:, 2], bins = 15, kde = True)

sns.rugplot(df.iloc[:, 2])

plt.show()

with sns.axes_style("white"):
    sns.jointplot(x = df.iloc[:, 1], y = df.iloc[:, 2], kind = "kde")
    plt.show()

sns.lmplot(x = "Benguet", y = "Ifugao", data = df)

plt.show()

Creating custom functions

In Python we use the def statement to define a custom function. For example, to define a function that adds two numbers:

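def add_2int(x, y):
    return x + y

print(add_2int(2, 2))

# OUTPUT

4

By the way, indentation matters in Python. Indentation defines the scope of a function, just as curly braces {…} do in R. Here is an example from one of our earlier posts: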

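Generate a sample of size 10 from a normal distribution with μ = 3 and σ² = 5;

Compute x̄ and the interval x̄ ∓ z(α/2)·σ/√n at the 95% confidence level;

Repeat the process 100 times; then

Compute the percentage of the confidence intervals that contain the true mean.

In Python, the program is as follows: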

import numpy as np
import scipy.stats as ss

def case(n = 10, mu = 3, sigma = np.sqrt(5), p = 0.025, rep = 100):
    # Each row of m stores [sample mean, lower bound, upper bound, hit flag]
    m = np.zeros((rep, 4))
    for i in range(rep):
        norm = np.random.normal(loc = mu, scale = sigma, size = n)
        xbar = np.mean(norm)
        low = xbar - ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n))
        up = xbar + ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n))
        # The interval is a hit when it contains the true mean
        if (mu > low) & (mu < up):
            rem = 1
        else:
            rem = 0
        m[i, :] = [xbar, low, up, rem]
    inside = np.sum(m[:, 3])
    per = 100 * inside / rep  # share of hits, expressed in percent
    desc = ("There are " + str(inside) + " confidence intervals that contain "
            "the true mean (" + str(mu) + "), that is " + str(per) + " percent of the total CIs")
    return {"Matrix": m, "Decision": desc}

The code above is easy to read, but the loop makes it slow. Below is an improved, vectorized version, thanks to a Python expert:

import numpy as np
import scipy.stats as ss

def case2(n = 10, mu = 3, sigma = np.sqrt(5), p = 0.025, rep = 100):
    # Half-width of the interval is the same for every replication
    scaled_crit = ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n))
    # Draw all rep samples at once: one row per replication
    norm = np.random.normal(loc = mu, scale = sigma, size = (rep, n))
    xbar = norm.mean(1)
    low = xbar - scaled_crit
    up = xbar + scaled_crit
    # Boolean array: True where the interval contains the true mean
    rem = (mu > low) & (mu < up)
    m = np.c_[xbar, low, up, rem]
    inside = np.sum(m[:, 3])
    per = 100 * inside / rep  # share of hits, expressed in percent
    desc = ("There are " + str(inside) + " confidence intervals that contain "
            "the true mean (" + str(mu) + "), that is " + str(per) + " percent of the total CIs")
    return {"Matrix": m, "Decision": desc}
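As a quick sanity check on the idea, the following self-contained sketch estimates the coverage rate with a fixed random seed. It is not the author's function: the name coverage, the seed parameter and the use of statistics.NormalDist for the critical value are assumptions made for illustration. With many replications the rate should land close to 95 percent.

```python
from statistics import NormalDist

import numpy as np

def coverage(n=10, mu=3.0, sigma=5 ** 0.5, alpha=0.05, rep=100, seed=0):
    """Return the fraction of z-intervals that contain the true mean."""
    rng = np.random.default_rng(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)      # critical value, about 1.96
    half_width = z * sigma / np.sqrt(n)
    # One row per replication, then one sample mean per row
    xbar = rng.normal(loc=mu, scale=sigma, size=(rep, n)).mean(axis=1)
    hits = (mu > xbar - half_width) & (mu < xbar + half_width)
    return hits.mean()

rate = coverage(rep=20000)
print(f"{100 * rate:.1f} percent of the intervals contain the true mean")
```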

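Using Python like Excel (Part 1)

When processing data, Excel is most people's first choice if the data is simple and small. But when there is a lot of data of complex types, and it needs flexible slicing, indexing and sorting, Python is more convenient. With the numpy and pandas libraries, Python can quickly handle all kinds of tasks, including creating, inspecting, cleaning, preprocessing, extracting, filtering, summarizing and computing statistics on data. The next few articles use Excel as a reference point to introduce data processing in Python.

Any mention of pandas has to cover its two key data structures, Series and DataFrame, both of which are built on numpy's array. Compared with an array, a Series is a one-dimensional data set in which every element carries an index, somewhat like a dictionary. A DataFrame adds a row index and a column index on top of an array, and behaves like a dictionary of Series, or a collection of lists.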
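A minimal sketch of the two structures, reusing the gene names and sizes that appear in this article's example table (the Doubled column is made up for illustration):

```python
import numpy as np
import pandas as pd

# A Series is one-dimensional and every element has an index label
sizes = pd.Series([411, 550, 405], index=["arx1", "arx2", "arx3"])
print(sizes["arx2"])                 # look up by label, like a dict

# A DataFrame can be built like a dict of Series sharing one row index
table = pd.DataFrame({"Size": sizes, "Doubled": sizes * 2})
print(table.loc["arx3", "Doubled"])  # row label, then column label
print(type(table["Size"]))           # each column is itself a Series
print(isinstance(sizes.to_numpy(), np.ndarray))  # backed by a numpy array
```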

So before processing data, install numpy and pandas. Now let's see how to carry out a complete set of data operations.

There are two ways to create a data table: importing data from an external source, and writing the data in directly.

In Python, xlsx files can also be imported from outside, using the read_excel() function:

import numpy as np

import pandas as pd

from pandas import DataFrame,Series

data=DataFrame(pd.read_excel('c:/python27/test.xlsx'))

print(data)

Output:

Gene Size Function

0 arx1 411 NaN

1 arx2 550 monooxygenase

2 arx3 405 aminotransferase

……

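That is, we call the read_excel function in pandas to read the file test.xlsx, convert it to a DataFrame, and assign it to the variable data. An index value is automatically assigned to each row. Besides Excel, pandas supports importing and writing files in other formats as well, such as CSV.

Writing data directly in Python can be done in many ways, though none is as convenient as Excel. A common one is to build the table from a dictionary of equal-length lists or numpy arrays:

data1 = DataFrame(
    {'Gene':['arx1','arx2','arx3'],
     'Size':[411,550,405],
     'Func':[np.nan,'monooxygenase','aminotransferase ']})

print(data1)

Output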

Func Gene Size

0 NaN arx1 411

1 monooxyg arx2 550

2 amino arx3 405

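A row index is assigned automatically, and the output is printed with the columns sorted by name.

In Python, the info() method shows detailed information about the whole data set.

print(data.info())

Output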

RangeIndex: 7 entries, 0 to 6

Data columns (total 3 columns):

Gene 7 non-null object

Size 7 non-null int64

Function 5 non-null object

dtypes: int64(1), object(2)

memory usage: 240.0+ bytes

None

In addition, the shape, columns, index, values and dtypes attributes show the dimensions, the row and column labels, all the values, and the data types:

print(data1.shape)

print(data1.index)

print(data1.columns)

print(data1.dtypes)

Output

(3, 3)

RangeIndex(start=0, stop=3, step=1)

Index(['Func', 'Gene', 'Size'], dtype='object')

Func object

Gene object

Size int64

dtype: object

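In Excel, press F5, choose "Blanks" under "Go To Special", type the replacement value once the blank cells are selected, and press Ctrl+Enter to complete the replacement.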

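In Python, the isnull and notnull functions detect missing data. isnull returns True where a value is null and False otherwise; notnull returns the opposite.

pd.isnull(data1)

pd.notnull(data1)

Output of pd.isnull(data1)

    Func   Gene   Size
0   True  False  False
1  False  False  False
2  False  False  False

The same checks are available as instance methods and can be applied to a single column:

print(data1['Func'].isnull())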

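Then use fillna to fill the null values:

data.fillna(value=0)

# Fill null values with 0

data['Size'].fillna(data1['Size'].mean())

# Fill null values with the mean of the Size column in data1

data['Func']=data['Func'].str.strip()

# Strip leading and trailing spaces in the Func column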
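One detail worth showing: fillna returns a new object rather than changing the data in place, so the result has to be assigned back. The sketch below assumes a small table modeled on data1; the missing Size value and the "unknown" placeholder are made up for illustration.

```python
import numpy as np
import pandas as pd

genes = pd.DataFrame({
    "Gene": ["arx1", "arx2", "arx3"],
    "Size": [411.0, np.nan, 405.0],                       # hypothetical missing size
    "Func": [np.nan, "monooxygenase", "aminotransferase "],
})

filled = genes.copy()
# Assign the result back; fillna alone leaves the column unchanged
filled["Size"] = filled["Size"].fillna(filled["Size"].mean())
# Fill missing text first, then strip stray whitespace with the .str accessor
filled["Func"] = filled["Func"].fillna("unknown").str.strip()

print(filled)
```

The .str accessor skips missing values, whereas mapping str.strip over a column that still contains NaN raises a TypeError.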

In Excel, pressing Ctrl+F brings up the Find and Replace dialog for replacing data.

In Python, use the replace method:

data['Func'].replace('monooxygenase', 'oxidase')

This replaces 'monooxygenase' in the Func column with 'oxidase'.

In Excel, "Data > Filter > Advanced" lets you view just the unique values in a column.

In Python, use the unique method:

print(data['Func'].unique())

Output

[nan 'monooxygenase' 'aminotransferase' 'methyltransferase']

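In Excel, the UPPER, LOWER and PROPER functions convert text to upper case, lower case and capitalized words.

Python has the corresponding string methods upper, lower and title:

data1['Gene'].str.lower()

In Excel, "Data > Remove Duplicates" removes duplicate values.

In Python, the drop_duplicates method removes duplicates:

print(data['Func'].drop_duplicates())

Output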

0 NaN

1 monooxygenase

2 aminotransferase

3 methyltransferase

Name: Func, dtype: object

You can also pass keep='last', which keeps the last occurrence of each value and drops the earlier ones:

print(data['Func'].drop_duplicates(keep='last'))

Output

2 aminotransferase

3 methyltransferase

6 monooxygenase

8 NaN

Name: Func, dtype: object
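Since test.xlsx is not available, here is a small sketch of the keep argument on made-up values (funcs is a hypothetical Series). The surviving index labels show which occurrence was kept:

```python
import pandas as pd

funcs = pd.Series(["kinase", "oxidase", "kinase", "ligase", "oxidase"])

first_kept = funcs.drop_duplicates()             # keep='first' is the default
last_kept = funcs.drop_duplicates(keep="last")   # keep the final occurrence

print(first_kept)
print(last_kept)
```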

References:

Python for Data Analysis

Bluewhale web analytics blog, by Bluewhale (Wang Yanping)

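How do I count the kinds of elements in a list, and how many of each kind, in Python?

There are two ways to count the occurrences of each element of a list in Python.

The first is to create a new dict whose keys are the list elements and whose values are the counts, and then iterate over the list.

items = ["cc","cc","ct","ct","ac"]

count = {}

for item in items:
    count[item] = count.get(item, 0) + 1

print(count)

#{'cc': 2, 'ct': 2, 'ac': 1}

A small trick is used here: when an element has not been counted yet, indexing count[item] directly raises a KeyError, whereas count.get(item, 0) returns 0 for a key that does not exist.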

The second is to use the standard library. Counting elements is such a common operation that Python's collections module already provides a Counter class that does essentially the same thing.

from collections import Counter

items = ["cc","cc","ct","ct","ac"]

count = Counter(items)

print(count)

#Counter({'cc': 2, 'ct': 2, 'ac': 1})
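The question also asks for the number of kinds, which a Counter answers directly. A short sketch of a few more things it offers, using the same list:

```python
from collections import Counter

items = ["cc", "cc", "ct", "ct", "ac"]
count = Counter(items)

kinds = len(count)               # number of distinct elements
top_two = count.most_common(2)   # most frequent elements with their counts
total = sum(count.values())      # total number of elements counted
missing = count["zz"]            # unseen keys give 0 instead of a KeyError

print(kinds, top_two, total, missing)
```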

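How do I get started with data analysis in Python?

1. Data acquisition

Python is flexible, easy to use, and convenient for reading and writing data. It can easily access databases and local data, and it is also the tool of choice for web crawling today. Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python, used to crawl web sites and extract structured data from their pages. It has a wide range of uses, including data mining, monitoring and automated testing.

2. Data wrangling

NumPy provides many advanced numerical programming tools, such as matrix data types, vector processing and sophisticated computation libraries. It was built for rigorous number crunching. It is used by many large financial companies and by core scientific computing organizations such as Lawrence Livermore and NASA for tasks that would otherwise be done in C++, Fortran or Matlab.

Pandas is a tool built on NumPy and created for data analysis tasks. It incorporates a large number of libraries and some standard data models, and provides the tools needed to work efficiently with large data sets. It offers many functions and methods for processing data quickly and conveniently, and you will soon find that it is one of the key factors that make Python a powerful and efficient data analysis environment.

3. Modeling and analysis

Scikit-learn is the package to learn for data analysis and modeling. It collects the common algorithms and problem types in the field, such as classification, regression, clustering, dimensionality reduction, model selection and feature engineering.

4. Data visualization

For visualization in Python, Matplotlib probably comes to mind first. Seaborn is a similar package aimed at statistical visualization.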


What does data analysis with Python mainly involve?

First: inspecting the data table

In Python, the shape attribute gives the dimensions of the table, that is, the number of rows and columns. The info method shows overall information about the table, and the dtypes attribute returns the data types. isnull checks for null values; it can be applied to the whole table or to a single column, and returns Boolean values: True where a value is null and False otherwise.

Second: cleaning data

Python handles null values flexibly: the dropna method deletes rows of the table that contain null values, and the fillna method fills them in. dtypes shows the data types, and its counterpart astype changes them. rename changes column names, drop_duplicates removes duplicate values, and replace substitutes values.

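Third: extracting data

Data extraction mainly uses two indexers, loc and iloc: loc extracts by label and iloc extracts by position. (The older ix indexer, which accepted both labels and positions, was removed in pandas 1.0.) Besides labels and positions, data can also be extracted by condition, for example by combining loc with isin.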
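A minimal sketch of the three styles of extraction on a made-up gene table (all names and values here are illustrative):

```python
import pandas as pd

genes = pd.DataFrame({
    "Gene": ["arx1", "arx2", "arx3", "arx4"],
    "Size": [411, 550, 405, 620],
    "Func": ["kinase", "monooxygenase", "aminotransferase", "monooxygenase"],
})

by_label = genes.loc[1, "Size"]    # row label 1, column label "Size"
by_position = genes.iloc[0, 0]     # first row, first column
# Conditional extraction: rows whose Func is in the list, two columns only
wanted = genes.loc[genes["Func"].isin(["kinase", "aminotransferase"]), ["Gene", "Size"]]

print(by_label, by_position)
print(wanted)
```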

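Fourth: filtering and summarizing data

Python can also filter data: loc combined with a filter condition does the filtering, and adding sum and count reproduces Excel's SUMIF and COUNTIF. For summarizing, the main functions are groupby and pivot_table. groupby performs grouped aggregation and is simple to use; it groups by the column names in the order they are given.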
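The sketch below shows the SUMIF and COUNTIF equivalents and the two summary functions on a made-up sales table (the column names and values are hypothetical):

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "south", "north", "south", "north"],
    "product": ["a", "a", "b", "b", "a"],
    "amount": [10, 20, 30, 40, 50],
})

is_north = sales["region"] == "north"
north_total = sales.loc[is_north, "amount"].sum()     # like SUMIF
north_count = sales.loc[is_north, "amount"].count()   # like COUNTIF

by_region = sales.groupby("region")["amount"].sum()   # one total per group
pivot = sales.pivot_table(index="region", columns="product",
                          values="amount", aggfunc="sum")

print(north_total, north_count)
print(by_region)
print(pivot)
```

groupby is the more general tool; pivot_table is convenient when the result should be laid out as a cross table, as in an Excel PivotTable.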

Current link: http://chinadenli.net/article12/dsigcgc.html
