pandas
Pandas
Pandas is the most widely used data analysis library in Python. It is built around the DataFrame and Series data structures.
Series
One-dimensional ndarray with axis labels (including time series).
Ex) df[feature] (selecting one column of a DataFrame returns a Series)
Attributes
Name
description
Series.index
The index (axis labels) of the Series
Methods
name
Description
Series.tolist()
Return a list of the values
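Series.items()
Lazily iterate over (index, value) tuples. Named Series.iteritems() in older pandas; that name was removed in pandas 2.0.
Iterating lazily means the (idx, val) tuples are produced one at a time as a loop such as a for statement asks for them, instead of all being built up front.
This seems necessary because the row labels of a Series are its index: for idx, val in enumerate(): yields positions 0, 1, 2, ... rather than the index labels.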
Series.unique()
Return unique values of Series object. The return type is a NumPy array.
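The Series attribute and methods listed above can be tried out with a minimal sketch like the following. The sample data and variable names are made up for illustration, and it uses items(), the current name of iteritems().

```python
import pandas as pd

# Hypothetical sample data: prices labelled by item name.
prices = pd.Series([3.5, 8.0, 3.5], index=["chips", "bowl", "salsa"], name="item_price")

labels = list(prices.index)      # the axis labels of the Series
values = prices.tolist()         # plain Python list of the values

# items() yields (index label, value) pairs one at a time.
pairs = [(idx, val) for idx, val in prices.items()]

# enumerate() yields positions (0, 1, 2), not the index labels.
positions = [pos for pos, _ in enumerate(prices)]

unique_values = prices.unique()  # NumPy array of the distinct values
```

Here pairs carries the labels ("chips", "bowl", "salsa"), while positions only carries 0, 1, 2, which is the difference the note on enumerate points at.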
DataFrame
Two-dimensional, size-mutable, potentially heterogeneous tabular data.
A condition can be placed inside the indexing brackets to filter rows. Ex) results = chipo_orderid_group[chipo_orderid_group.item_price >= 10]
Getting data in/out
csv
Writing to a csv file
Reading from a csv file
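A minimal sketch of the CSV round trip, assuming a small made-up DataFrame. It writes to an in-memory buffer so it runs without touching the disk; passing a file path to to_csv and read_csv works the same way.

```python
import io

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"item_name": ["chips", "bowl"], "item_price": [3.5, 8.0]})

# Writing to CSV: index=False leaves the row labels out of the output.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Reading from CSV: read_csv accepts a path or any file-like object.
loaded = pd.read_csv(io.StringIO(csv_text))
```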
Attributes
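Ref) df[feature] == df.feature (attribute access works only when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute)
A condition can be used when indexing a DataFrame. Ex) df[df.feature >= num] => returns a df containing only the rows of df that satisfy the condition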
Name
description
df.shape
Return a tuple representing the dimensionality of the DataFrame. => (number of rows, number of features)
df.index
Return the index (row labels) of the df. Ex) RangeIndex(start=[num], stop=[num], step=[num])
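The attributes and the conditional indexing described above can be sketched as follows. The column names echo the item_price example in these notes, but the data itself is invented.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "item_price": [4.5, 12.0, 9.9, 15.25],
})

n_rows, n_features = df.shape   # (number of rows, number of features)
row_index = df.index            # a RangeIndex when no index was given

# Bracket access and attribute access return the same column.
same_column = df["item_price"].equals(df.item_price)

# A boolean condition inside [] keeps only the rows where it is True.
results = df[df.item_price >= 10]
```

Note that results keeps the original row labels (1 and 3 here) rather than renumbering from 0.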
Methods
Command
description
df.value_counts([subset, normalize, ...])
Return a Series containing counts of unique rows in the DataFrame. The Series counts how many times each distinct row appears, sorted by count in descending order, and it can be indexed by the row's value as well as by position. Either pass the feature as an argument, or index with df[feature] first and call the method with no arguments.
df.info()
Print a concise summary of a DataFrame.
df.head([n])
Return the first n rows
df.groupby([by])
Group DataFrame using a mapper or by a Series of columns.
df.apply()
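Applies a function along an axis of the DataFrame; used for processing data. The function passed to apply matters more than apply itself, so I am still undecided whether to write it up separately.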
df.sort_values([by, ascending...])
Sort by the values along either axis.
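df.drop_duplicates()
Return DataFrame with duplicate rows removed.
df.fillna()
Replaces missing values with the value given as the argument.
df.corr()
Computes the pairwise correlation between columns. Takes a method argument; 'pearson' is the default and the most commonly used.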
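A short sketch that exercises several of the methods in this table on one made-up DataFrame. All data and variable names are illustrative assumptions; apply is shown on a single column with a trivial lambda only to show the call shape.

```python
import pandas as pd

# Hypothetical sample data; None becomes a missing value (NaN).
df = pd.DataFrame({
    "item_name": ["bowl", "chips", "bowl", "salsa"],
    "quantity": [1, 2, 1, 3],
    "item_price": [8.0, 3.5, 8.0, None],
})

first_two = df.head(2)                        # first n rows

# Counts per distinct value, sorted by count in descending order.
name_counts = df["item_name"].value_counts()
bowl_count = name_counts["bowl"]              # index by the value itself

# On the whole DataFrame, identical rows are counted together.
row_counts = df.value_counts()

by_quantity = df.sort_values(by="quantity", ascending=False)
deduplicated = df.drop_duplicates()           # drops the repeated bowl row
filled = df.fillna(0.0)                       # missing price becomes 0.0
doubled = df["quantity"].apply(lambda q: q * 2)
correlation = filled[["quantity", "item_price"]].corr(method="pearson")
```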
Property
Command
description
df.iloc[]
Indexes by integer position. [] with a column name selects columns, whereas .loc and .iloc select rows first (.loc by label, .iloc by position).
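The difference between [], .iloc and .loc can be seen in a small sketch. The DataFrame and its string index labels are assumptions chosen so that labels and positions are easy to tell apart.

```python
import pandas as pd

# Hypothetical sample data with string row labels.
df = pd.DataFrame(
    {"item_name": ["bowl", "chips", "salsa"], "item_price": [8.0, 3.5, 2.0]},
    index=["a", "b", "c"],
)

price_column = df["item_price"]  # [] with a column name selects a column
second_row = df.iloc[1]          # row at integer position 1
row_b = df.loc["b"]              # row with index label "b"
first_two_rows = df.iloc[0:2]    # position slice, end excluded
```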
DataFrame.groupby
The object created by df.groupby([by]). The rows are grouped by the given key; you then choose which feature to aggregate and which operation produces the value for each group.
df.groupby('grouping_key')[target_feature].aggregation_function()
Methods
name
description
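count()
Counts the non-null values in each group. Duplicates are not removed, so for a feature with no missing values this is simply the number of rows in the group.
sum()
Adds up the values of the target feature within each group.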
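A minimal sketch of the groupby pattern with count() and sum(), using invented order data. The missing price and the repeated price are there on purpose, to show what count() does and does not include.

```python
import pandas as pd

# Hypothetical sample data: order 2 has one missing and one repeated price.
df = pd.DataFrame({
    "order_id": [1, 1, 2, 2, 2],
    "item_price": [2.5, 4.0, 3.0, None, 3.0],
})

grouped = df.groupby("order_id")

# count() skips the missing value but keeps the duplicate 3.0.
counts = grouped["item_price"].count()

# sum() adds up the prices within each order.
totals = grouped["item_price"].sum()
```

Both results are Series indexed by order_id, so totals[2] looks up the total for order 2.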
Numpy
Works with data as arrays. The NumPy array is the basic data structure used in data analysis. NumPy is a Python library built to make operations on vectors, matrices, and the like easy and fast.
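As a rough illustration of vector and matrix operations on NumPy arrays, with arbitrary example values:

```python
import numpy as np

# Hypothetical example values.
v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
m = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
])

elementwise_sum = v + w   # adds matching elements, no explicit loop
dot_product = v @ w       # 1*4 + 2*5 + 3*6
matrix_vector = m @ v     # matrix times vector
```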
Matplotlib
A library that visualizes data as graphs.
matplotlib.pyplot
matplotlib.pyplot is a state-based interface to matplotlib
There are two styles, the state-based interface and the object-oriented one. The linked page explains the differences, but so far I only have a rough feel for them and do not fully understand them.
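One way to get a feel for the two styles is to draw the same line both ways. This is only a sketch with made-up data; the Agg backend is an assumption so that it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt

# Hypothetical sample data.
x = [1, 2, 3]
y = [2, 4, 8]

# State-based style: pyplot keeps track of the "current" figure and axes,
# and each plt.* call acts on whatever is current.
plt.figure()
plt.plot(x, y)
plt.title("state-based")
state_axes = plt.gca()  # the axes pyplot was implicitly drawing on

# Object-oriented style: hold the Figure and Axes objects yourself
# and call methods on them directly.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("object-oriented")
```

In the state-based half nothing is passed around, because pyplot remembers which axes is current. In the object-oriented half the target of every call is explicit, which tends to be easier to follow once a figure has several subplots.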