pandas

Pandas

Pandas is the most widely used data analysis library in Python. It is built around two data structures, DataFrame and Series.

Series

One-dimensional ndarray with axis labels (including time series).

Ex) df[feature]: selecting a single column of a DataFrame returns a Series.

Attributes

Name

description

Series.index

The index (axis labels) of the Series
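A minimal sketch of how df[feature] yields a Series and what Series.index holds. It uses a small made-up DataFrame; the column names, values and row labels are hypothetical.

```python
import pandas as pd

# Hypothetical data with non-default row labels 10, 11, 12
df = pd.DataFrame(
    {"item": ["chips", "salsa", "chips"], "price": [2.39, 3.00, 2.39]},
    index=[10, 11, 12],
)

# Selecting one column (feature) returns a Series
s = df["item"]
print(type(s))

# Series.index holds the axis labels, here the DataFrame's row labels
print(s.index)
```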

Methods

name

Description

Series.tolist()

Return a list of the values

Series.items() (formerly Series.iteritems(), deprecated in pandas 1.5 and removed in 2.0)

Lazily iterate over (index, value) tuples. Iterating lazily means the (idx, val) tuples of the Series are handed out one at a time in a loop such as for. Because a Series uses its row labels as the index, for idx, val in enumerate(s): only gives positional counters, not those labels, which seems to be why this method is needed.

Series.unique()

Return unique values of Series object, in order of first appearance. For ordinary NumPy-backed dtypes the result is a NumPy array.
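A short sketch of the three Series methods above on a hypothetical Series with string labels. It uses Series.items(), the current name for iteritems(), and puts enumerate() next to it to show the label-versus-position difference.

```python
import pandas as pd

# Hypothetical Series with non-default index labels
s = pd.Series([3, 1, 3, 2], index=["a", "b", "c", "d"])

# tolist(): plain Python list of the values
values = s.tolist()  # [3, 1, 3, 2]

# items() yields (index label, value) pairs one at a time
pairs = [(label, val) for label, val in s.items()]  # labels a, b, c, d with values 3, 1, 3, 2

# enumerate() yields positional counters 0, 1, 2, 3 instead of the index labels
positions = [(i, val) for i, val in enumerate(s)]

# unique(): distinct values in order of first appearance, as a NumPy array
uniq = s.unique()  # array([3, 1, 2])
```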

DataFrame

Two-dimensional, size-mutable, potentially heterogeneous tabular data.

์ธ๋ฑ์Šค์— ์กฐ๊ฑด๋ฌธ์„ ๋„ฃ์–ด์„œ ์ธ๋ฑ์Šค ํ•  ์ˆ˜ ์žˆ์Œ Ex) results = chipo_orderid_group[chipo_orderid_group.item_price >= 10]

Getting data in/out

csv

  • Writing to a csv file

    df.to_csv('[name].csv')
  • Reading from a csv file

    import pandas as pd

    # file_path = '../file_name'
    # sep: column delimiter, ',' for csv (default) and '\t' for tsv
    df = pd.read_csv(file_path, sep=',')
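A round-trip sketch that writes and reads both a csv and a tsv file in a temporary directory (file names and data are hypothetical). Passing index=False to to_csv is a choice made here: without it the row index is written as an extra first column, which comes back as "Unnamed: 0" when the file is read.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "scores.csv")
    tsv_path = os.path.join(tmp, "scores.tsv")

    # index=False keeps the row index out of the file
    df.to_csv(csv_path, index=False)
    df.to_csv(tsv_path, sep="\t", index=False)

    # sep must match the delimiter used when the file was written
    from_csv = pd.read_csv(csv_path)             # ',' is the default
    from_tsv = pd.read_csv(tsv_path, sep="\t")
```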

Attributes

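Ref) df[feature] is equivalent to df.feature (attribute access only works when the column name is a valid Python identifier that doesn't clash with an existing DataFrame attribute or method).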

A DataFrame accepts a condition inside the indexing brackets. Ex) df[df.feature >= num] => returns a DataFrame containing only the rows that satisfy the condition.
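A sketch of the two notes above on hypothetical data: bracket versus attribute access, and what the condition inside the brackets actually is (a boolean Series used as a mask). The column names are made up; "unit price" deliberately contains a space.

```python
import pandas as pd

# Hypothetical data: 'price' is a valid identifier, 'unit price' is not
df = pd.DataFrame({"price": [5, 12, 8], "unit price": [1, 2, 3]})

# Bracket and attribute access return the same column
same = df["price"].equals(df.price)

# A name with a space can only be reached with brackets
unit = df["unit price"]

# The condition on its own is a boolean Series, one True/False per row
mask = df.price >= 8

# Indexing with the mask keeps only the rows where it is True
filtered = df[mask]
```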

Name

description

df.shape

Return a tuple representing the dimensionality of the DataFrame, i.e. (number of rows, number of columns).

df.index

Return the index (row labels) of the df. For a default integer index this is RangeIndex(start=[num], stop=[num], step=[num]).
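A quick sketch of the two attributes on a hypothetical DataFrame with 3 rows and 2 columns and the default integer index:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

rows, cols = df.shape  # (3, 2): 3 rows, 2 columns
idx = df.index         # RangeIndex(start=0, stop=3, step=1)
```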

Methods

Command

description

df.value_counts([subset, normalize, ...])

Return a Series containing counts of unique rows in the DataFrame. It tells how many times each identical row occurs, sorted by count in descending order, and the result can be indexed not only by position but also by the row's values. You can pass features as the argument (subset), or call the method with no arguments on a column selected with df[feature] (which calls Series.value_counts).

df.info()

Print a concise summary of a DataFrame.

df.head([n])

Return the first n rows (n defaults to 5)

df.groupby([by, ...])

Group DataFrame using a mapper or by a Series of columns.

df.apply()

์ด๊ฑด ๋ญ... apply ์•ˆ์—์„œ ์ ์šฉ๋˜๋Š” ํ•จ์ˆ˜๊ฐ€ ๋” ์ค‘์š”ํ•œ๋ฐ ๋”ฐ๋กœ ์จ์•ผํ•˜๋‚˜ ๊ณ ๋ฏผ์ด๋„ค ๋ฐ์ดํ„ฐ์ „์ฒ˜๋ฆฌ๋ฅผ ์œ„ํ•ด ์‚ฌ์šฉํ•จ

df.sort_values([by, ascending...])

Sort by the values along either axis.

df.drop_duplicates()

Return DataFrame with duplicate rows removed.

df.fillna()

Replaces missing values (NaN) with the value passed as the argument.

df.corr()

์ƒ๊ด€๊ด€๊ณ„ ํ•จ์ˆ˜ ์ธ์ž๋กœ method ๊ฐ€ ์žˆ๊ณ  'pearson' ์„ ๋งŽ์ด ์“ด๋‹ค.

Property

Command

description

df.iloc[]

์œ„์น˜ ์ •์ˆ˜๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์ธ๋ฑ์‹ฑํ•œ๋‹ค [] ๋Š” ์—ด(column) ์„ ์„ ํƒํ•˜์ง€๋งŒ, .loc, .iloc ์€ ํ–‰(row) ๋ฅผ ์„ ํƒํ•œ๋‹ค

DataFrame.groupby

The object created by df.groupby([by]). Rows are grouped by the given key(s); you then choose which feature of each group to compute on and which operation to apply, and the result holds one value per group.

df.groupby('grouping_key')[target_feature].aggregation_function()

Methods

name

description

count()

Counts the rows in each group regardless of duplicate values; missing values (NaN) are not counted.

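sum()

Adds up the target feature's values within each group. This gives one total per group, not a running (cumulative) sum; for that, use cumsum().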
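A sketch of the df.groupby('grouping_key')[target_feature].aggregation_function() pattern with count() and sum(), on hypothetical order data that includes one missing quantity. cumsum() is added only to contrast a per-group total with a running total.

```python
import pandas as pd

# Hypothetical order lines; order 2 has one missing quantity
chipo = pd.DataFrame({
    "order_id": [1, 1, 2, 2, 2],
    "quantity": [1, 2, 1, None, 3],
})

grouped = chipo.groupby("order_id")["quantity"]

counts = grouped.count()   # non-null values per order: 2 and 2
totals = grouped.sum()     # one total per order: 3.0 and 4.0
running = grouped.cumsum() # running total, one value per original row
```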

Numpy

Works with variables as arrays. The NumPy array is the fundamental data structure in data analysis; NumPy is a Python library built to make vector and matrix operations easy and fast.
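A minimal sketch of the vector and matrix operations NumPy is built for; the arrays are arbitrary.

```python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([4, 5, 6])

added = v + w  # element-wise, no Python loop: [5, 7, 9]
dot = v @ w    # dot product: 1*4 + 2*5 + 3*6 = 32

m = np.array([[1, 2], [3, 4]])
mm = m @ m     # matrix product: [[7, 10], [15, 22]]
```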

Matplotlib

A library for visualizing data as graphs.

matplotlib.pyplot

matplotlib.pyplot is a state-based interface to matplotlib

There are two styles, the state-based interface and the object-oriented one. The link explains the difference, but so far I only have a rough feel for it and don't fully understand it.

# So far, in the material I'm working through, I haven't used any part of matplotlib other than pyplot
import matplotlib.pyplot as plt
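A minimal sketch of the state-based and object-oriented styles side by side. The Agg backend is selected only so the example runs without a display; the data and titles are arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [2, 4, 1]

# State-based (pyplot) style: plt.* calls act on the "current" figure and axes
plt.figure()
plt.plot(x, y)
plt.title("state-based")
state_ax = plt.gca()  # the axes pyplot has been drawing on implicitly

# Object-oriented style: keep explicit Figure/Axes objects and call their methods
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("object-oriented")

plt.close("all")
```

Both produce the same kind of plot; the difference is whether the target axes are implicit (pyplot keeps track of them) or held explicitly in a variable.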
