Titanic Problem - XGBoost


XGBoost?

  • A Gradient Boosting library
  • Handles missing values automatically (which is not always an advantage); see the short sketch below
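As a minimal sketch of the second point (toy made-up data, not the Titanic dataset), XGBClassifier can be fit directly on a feature matrix that contains NaN, with no separate imputation step:

# Toy example: the feature matrix contains NaN, but XGBoost treats NaN as "missing"
import numpy as np
import xgboost as xgb

X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 1.0],
              [4.0, 5.0]])
y = np.array([0, 1, 0, 1])

clf = xgb.XGBClassifier(n_estimators=10)
clf.fit(X, y)              # no imputation step needed
print(clf.predict(X))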


Applying it to the Titanic problem

# Same preprocessing as before
import matplotlib.pyplot as plt
import pandas as pd

train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

# Drop text-heavy columns that are not used here
train = train.drop(['Name', 'Cabin', 'Ticket'], axis=1)
test = test.drop(['Name', 'Cabin', 'Ticket'], axis=1)

# Encode categorical columns as integers (unmapped values such as NaN stay NaN)
train['Embarked'] = train['Embarked'].map({'S':0, 'C':1, 'Q':2})
test['Embarked'] = test['Embarked'].map({'S':0, 'C':1, 'Q':2})

train['Sex'] = train['Sex'].map({'male':0, 'female':1})
test['Sex'] = test['Sex'].map({'male':0, 'female':1})

# Split features and labels; Age (and one Fare value in test) still contain NaN,
# which XGBoost can handle on its own
train_x = train.drop(['PassengerId', 'Survived'], axis=1)
train_y = train['Survived']
test_x = test.drop('PassengerId', axis=1)

import xgboost as xgb

# Default XGBClassifier; missing values in the features are handled internally
model = xgb.XGBClassifier()
model.fit(train_x, train_y)

test_y = model.predict(test_x)

submission = pd.DataFrame({
    'PassengerId' : test['PassengerId'],
    'Survived' : test_y
})
submission.to_csv('xgb.csv', index=False)
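Since matplotlib is already imported above, one optional follow-up (my addition, not part of the original script) is to check which features the trained model relied on, using xgboost's built-in plot_importance:

# Plot feature importances of the trained model (uses the plt imported above)
xgb.plot_importance(model)
plt.show()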

Submission

The submission scored 0.77511.
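For reference, a quick way to estimate the score before submitting is local cross-validation. This is a sketch using scikit-learn's cross_val_score, not something the original workflow included:

# Rough local estimate of accuracy with 5-fold cross-validation
from sklearn.model_selection import cross_val_score

scores = cross_val_score(xgb.XGBClassifier(), train_x, train_y, cv=5, scoring='accuracy')
print(scores.mean())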
