pandas.factorize



Encodes an object as an enumerated type or a categorical variable. See: pandas.factorize
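To make the "categorical variable" connection concrete, here is a short sketch comparing three things: the default first-appearance numbering of `pd.factorize`, its `sort=True` option, and `pd.Categorical`, whose categories are sorted by default. The input values are only illustrative. The list is wrapped in a NumPy array because recent pandas versions may warn when a plain list is passed to `pd.factorize`.

```python
import numpy as np
import pandas as pd

values = np.array(['b', 'b', 'a', 'c', 'b'], dtype=object)

# Default: codes follow the order in which each value first appears
codes, uniques = pd.factorize(values)

# sort=True: uniques come back sorted and the codes are renumbered to match
sorted_codes, sorted_uniques = pd.factorize(values, sort=True)

# The same labels as a pandas Categorical (categories sorted by default)
cat = pd.Categorical(values)

print(codes.tolist(), list(uniques))                # [0, 0, 1, 2, 0] ['b', 'a', 'c']
print(sorted_codes.tolist(), list(sorted_uniques))  # [1, 1, 0, 2, 1] ['a', 'b', 'c']
print(cat.codes.tolist(), list(cat.categories))     # [1, 1, 0, 2, 1] ['a', 'b', 'c']
```

With `sort=True`, the codes from `pd.factorize` match the codes of a default `pd.Categorical`. Without it, the numbering depends only on the order of first appearance.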

Example

>>> import numpy as np
>>> import pandas as pd
>>> codes, uniques = pd.factorize(np.array(['b', 'b', 'a', 'c', 'b'], dtype=object))
>>> codes
array([0, 0, 1, 2, 0])
>>> uniques
array(['b', 'a', 'c'], dtype=object)

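Factorizing ['b', 'b', 'a', 'c', 'b'] returns the codes [0, 0, 1, 2, 0] and the uniques ['b', 'a', 'c']. Each distinct value is numbered in order of first appearance, and codes[i] is the position of the i-th input value within uniques.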
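The numbering rule can be sketched in a few lines of plain Python. `factorize_sketch` is a hypothetical helper written only for illustration. It is not how pandas implements `factorize` internally, since pandas uses a hash-table based routine and also handles missing values. The sketch assumes hashable, non-missing inputs. The final assertions check it against `pd.factorize` and show that `uniques[codes]` rebuilds the original values.

```python
import numpy as np
import pandas as pd

def factorize_sketch(values):
    """Hypothetical helper: number each distinct value by first appearance."""
    index = {}  # value -> code; dicts keep insertion order (Python 3.7+)
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(index)  # next unused code
        codes.append(index[v])
    return np.array(codes), np.array(list(index), dtype=object)

values = ['b', 'b', 'a', 'c', 'b']
codes, uniques = factorize_sketch(values)
print(codes)    # [0 0 1 2 0]
print(uniques)  # ['b' 'a' 'c']

# Cross-check against pandas, then rebuild the input from the codes
pd_codes, pd_uniques = pd.factorize(np.array(values, dtype=object))
assert codes.tolist() == pd_codes.tolist()
assert list(uniques) == list(pd_uniques)
assert list(uniques[codes]) == values
```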

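# assumes `train` is a DataFrame that is already loaded (e.g. the Titanic training set)
# before
# |'Sex'   | 'Embarked' |
# |--------|------------|
# | male   | S          |
# | female | C          |
for f in ['Sex', 'Embarked']:
    # Series.factorize() returns (codes, uniques); keep only the integer codes
    train[f] = train[f].factorize()[0]
# after
# |'Sex'   | 'Embarked' |
# |--------|------------|
# | 0      | 0          |
# | 1      | 1          |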

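This converts the 'Sex' and 'Embarked' columns into integer-coded categorical variables. Codes are assigned in order of first appearance. By default, a missing value (NaN) is coded as -1.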
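Here is a self-contained version of the loop above. It runs on a hypothetical four-row DataFrame that stands in for `train`, and the rows are made up. The sketch also keeps the `uniques` returned by each call so the codes can be decoded later. It includes one missing `Embarked` value to show the -1 code. Decoding uses `pd.Categorical.from_codes`, which treats -1 as missing. Indexing `uniques[codes]` directly would instead map -1 to the last label.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic `train` DataFrame (made-up rows)
train = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male'],
    'Embarked': ['S', 'C', np.nan, 'S'],
})

mappings = {}  # column name -> labels returned by factorize, kept for decoding
for f in ['Sex', 'Embarked']:
    codes, uniques = train[f].factorize()
    train[f] = codes       # replace labels with integer codes
    mappings[f] = uniques  # uniques[k] is the label behind code k

print(train['Sex'].tolist())       # [0, 1, 1, 0]
print(train['Embarked'].tolist())  # [0, 1, -1, 0]  (NaN became -1)

# Decode: from_codes maps -1 back to a missing value
decoded = pd.Categorical.from_codes(train['Embarked'], categories=mappings['Embarked'])
print(list(decoded))
```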
