Titanic Problem - XGBoost
XGBoost?
- A gradient boosting library
- Handles missing values automatically (not always a good thing; see the sketch below)
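To make the second point concrete, here is a minimal sketch (the toy DataFrame is made up for illustration and is not from the Titanic data): XGBoost learns a default split direction for missing values, so fit accepts NaN in the features directly, without an imputation step.

import numpy as np
import pandas as pd
import xgboost as xgb

# Hypothetical toy data with NaN in the features
X = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0],
                  'b': [0.0, 1.0, np.nan, 1.0]})
y = pd.Series([0, 1, 0, 1])

model = xgb.XGBClassifier()
model.fit(X, y)        # no error: each tree learns a default direction for missing values
print(model.predict(X))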
References
Applying it to the Titanic problem
# Same preprocessing as before
import matplotlib.pyplot as plt
import pandas as pd
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
# Drop columns that are not used as features here
train = train.drop(['Name', 'Cabin', 'Ticket'], axis=1)
test = test.drop(['Name', 'Cabin', 'Ticket'], axis=1)
# Encode categorical columns as integers (unmapped/missing values become NaN, which XGBoost tolerates)
train['Embarked'] = train['Embarked'].map({'S':0, 'C':1, 'Q':2})
test['Embarked'] = test['Embarked'].map({'S':0, 'C':1, 'Q':2})
train['Sex'] = train['Sex'].map({'male':0, 'female':1})
test['Sex'] = test['Sex'].map({'male':0, 'female':1})
# Separate features and target
train_x = train.drop(['PassengerId', 'Survived'], axis=1)
train_y = train['Survived']
test_x = test.drop('PassengerId', axis=1)
import xgboost as xgb
# Train XGBoost with default parameters and predict on the test set
model = xgb.XGBClassifier()
model.fit(train_x, train_y)
test_y = model.predict(test_x)
# Build the submission file (no index column, as Kaggle expects)
submission = pd.DataFrame({
    'PassengerId' : test['PassengerId'],
    'Survived' : test_y
})
submission.to_csv('xgb.csv', index=False)
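Before submitting, a rough local estimate of the accuracy can be obtained with cross-validation. This is not part of the original code, just a sketch using scikit-learn's cross_val_score with an assumed 5-fold split:

from sklearn.model_selection import cross_val_score

# Local 5-fold cross-validation accuracy; the leaderboard score may differ
scores = cross_val_score(xgb.XGBClassifier(), train_x, train_y, cv=5, scoring='accuracy')
print(scores.mean())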
Submission
The score came out to 0.77511.