Posts in the 'Projects/kaggle & Dacon' category (Page 6)

Continuing from last time: feature engineering plus building a baseline model. For merging and splitting the data, refer to the previous challenge; this time, let's look at the results first.

    from sklearn.preprocessing import OneHotEncoder

    encoder = OneHotEncoder()  # create the one-hot encoder
    all_data_encoded = encoder.fit_transform(all_data)  # apply one-hot encoding
    all_data_encoded

    X_train.shape  # (298042, 5700)

    from sklearn.metrics import roc_auc_score  # function that computes the ROC AUC score

    # ROC AUC on the validation data
    roc_auc = roc_auc_score(y_valid, y_..
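As a rough, self-contained sketch of the flow the excerpt describes (one-hot encode, fit a baseline, score with ROC AUC on validation data): the toy data, the column names, and the choice of LogisticRegression are all assumptions made for illustration, since the excerpt is cut off before it names the model or the prediction variable.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the competition's categorical features.
rng = np.random.default_rng(0)
toy_data = pd.DataFrame({
    "color": rng.choice(["red", "green", "blue"], size=200),
    "size": rng.choice(["S", "M", "L", "XL"], size=200),
})
# Target tied to one feature, with 20% of labels flipped as noise.
toy_target = ((toy_data["color"] == "red") ^ (rng.random(200) < 0.2)).astype(int)

encoder = OneHotEncoder()                    # one output column per category level
X_encoded = encoder.fit_transform(toy_data)  # returns a SciPy sparse matrix

X_train, X_valid, y_train, y_valid = train_test_split(
    X_encoded, toy_target, test_size=0.25, stratify=toy_target, random_state=0
)

# Assumed baseline model; the original excerpt does not say which one it uses.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# ROC AUC is computed from the predicted probability of the positive class,
# not from hard 0/1 labels.
y_valid_preds = model.predict_proba(X_valid)[:, 1]
roc_auc = roc_auc_score(y_valid, y_valid_preds)
print(f"Validation ROC AUC: {roc_auc:.4f}")
```

The encoder keeps its output sparse, which is what makes a shape like (298042, 5700) manageable in memory on the real data.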

Projects/kaggle & Dacon 2024. 1. 22. 20:09

Homework (2)

https://www.kaggle.com/c/cat-in-the-dat-ii (Categorical Feature Encoding Challenge II | Kaggle)

This competition is of the same kind as the binary classification competition tackled earlier. From the competition description: "This follow-up competition offers an even more challenging dataset so that you can continue to build your skills with the common machine learning task of encoding categorical variables. This challenge adds the additional complexity of feature interactions, as w..

Projects/kaggle & Dacon 2024. 1. 22. 14:25

Homework (1)

Feature summary table (applying Chapter 6)

    import pandas as pd

    def resumetable(df):
        print(f'데이터 세트 형상: {df.shape}')  # prints the dataset shape
        summary = pd.DataFrame(df.dtypes, columns=['데이터 타입'])  # data type
        summary = summary.reset_index()
        summary = summary.rename(columns={'index': '피처'})  # feature
        summary['결측값 개수'] = df.isnull().sum().values  # number of missing values
        summary['고윳값 개수'] = df.nunique().values  # number of unique values
        summary['첫 번째 값'] = df.loc[0].values  # first value
        summary['두 번째 값'] = df.loc[1].values  # second value
        summary['세 번째 값'] = df.loc[2].values  # third value
        ..
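Since the excerpt is truncated before the function ends, here is a separate, minimal sketch of the same idea run on a tiny made-up DataFrame. The function name `feature_summary`, the English column names, and the toy columns are hypothetical and are not the author's code.

```python
import pandas as pd


def feature_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per feature: dtype, missing count, unique count, first value."""
    return pd.DataFrame({
        "feature": df.columns,
        "dtype": df.dtypes.astype(str).values,
        "missing": df.isnull().sum().values,   # NaN/None count per column
        "unique": df.nunique().values,         # distinct values, NaN excluded
        "first_value": df.iloc[0].values,      # positional, so any index works
    })


# Hypothetical toy data loosely shaped like the competition's columns.
toy = pd.DataFrame({
    "bin_0": [0, 1, 0, 1],
    "nom_0": ["Red", "Blue", None, "Red"],
    "ord_1": ["Novice", "Expert", "Novice", "Master"],
})

table = feature_summary(toy)
print(table)
```

Using `iloc` rather than `loc` here is a deliberate difference from the excerpt: it reads the first row by position, so it still works if the DataFrame's index does not start at 0.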

Projects/kaggle & Dacon 2024. 1. 22. 13:40

Categorical Binary Classification Competition (5) - Performance Improvement, Part 2

This post is protected.

Protected post 2024. 1. 22. 13:25

Categorical Binary Classification Competition (4) - Performance Improvement

This post is protected.

Protected post 2024. 1. 22. 13:05

Categorical Data Binary Classification Competition (3) - Baseline Model

This post is protected.

Protected post 2024. 1. 22. 11:29

시카로의 공부방 (Sikaro's Study Room)
