Inspecting the data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_csv('advertising.csv')
data
data.info()
sns.histplot(data['Area Income'], kde=True)  # distplot is deprecated in recent seaborn; histplot replaces it
sns.histplot(data['Age'], kde=True)
# Check the text columns
data['Country'].nunique() # 237
data['City'].nunique() # 969
data['Ad Topic Line'].nunique() # 1000
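The three `nunique()` calls above can be run over every non-numeric column at once. A minimal sketch, assuming a small made-up DataFrame (`toy`) in place of advertising.csv; the values are for illustration only:

```python
import pandas as pd

# Toy stand-in for advertising.csv (made-up values, not the real data)
toy = pd.DataFrame({
    'Age': [35, 31, 26, 29],
    'Country': ['Tunisia', 'Nauru', 'Tunisia', 'Italy'],
    'City': ['Wrightburgh', 'West Jodi', 'Davidton', 'South Manuel'],
})

# Count distinct values in every non-numeric column in one call
text_cardinality = toy.select_dtypes(exclude='number').nunique()
print(text_cardinality)
```

A column with nearly as many distinct values as rows carries little signal for a simple model, which is consistent with the text columns being left out of the features later on.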
Checking and handling missing values
# Fraction of missing values in each column
data.isna().sum() / len(data)
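data.dropna()             # option 1: drop rows with missing values (returns a copy; data is unchanged)
data.drop('Age', axis=1)  # option 2: drop the Age column entirely (also returns a copy)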
round(data['Age'].mean()) # 36
data['Age'].median() # 35.0
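# Replace missing values with the rounded mean of Age
# (a single fill value is applied to NaN in every column, so this assumes Age is the only column with gaps)
data = data.fillna(round(data['Age'].mean()))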
data.isna().sum()
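Calling `fillna` on the whole DataFrame with one value writes that value into every column that has gaps. If more than one column could contain NaN, filling the `Age` column alone is safer. A minimal sketch on a made-up DataFrame (`toy`, `blanket`, and `filled` are illustrative names, not from the original notebook):

```python
import numpy as np
import pandas as pd

# Toy data with gaps in two columns (made-up values)
toy = pd.DataFrame({
    'Age': [30.0, np.nan, 40.0, 35.0],
    'Area Income': [50000.0, 61000.0, np.nan, 58000.0],
})

age_mean = round(toy['Age'].mean())  # mean of 30, 40, 35

# Whole-frame fill: the Age mean also lands in Area Income
blanket = toy.fillna(age_mean)

# Column-wise fill: only Age is touched, Area Income keeps its NaN
filled = toy.copy()
filled['Age'] = filled['Age'].fillna(age_mean)

print(blanket)
print(filled)
```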
Modeling
# Split into train / test sets
from sklearn.model_selection import train_test_split
X = data[['Daily Time Spent on Site','Age','Area Income', 'Daily Internet Usage','Male']]
y = data['Clicked on Ad']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 100)
# Build the logistic regression model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
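When features sit on very different scales, as income and age columns usually do, the default solver can be slow to converge and the coefficients become hard to compare. One common option, not used in the original notebook, is to standardize inside a pipeline. A minimal sketch on synthetic data (the columns, the label rule, and all `toy_` names are assumptions for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real features (made-up distributions)
rng = np.random.default_rng(0)
n = 200
toy_X = pd.DataFrame({
    'Daily Time Spent on Site': rng.normal(65, 15, n),
    'Area Income': rng.normal(55000, 13000, n),
})
# Made-up label rule: less time on site means more likely to click
toy_y = (toy_X['Daily Time Spent on Site'] + rng.normal(0, 5, n) < 65).astype(int)

toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=100
)

# The scaler is fitted on the training split only, then reused on the test split
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(toy_X_train, toy_y_train)
toy_accuracy = scaled_model.score(toy_X_test, toy_y_test)
print(toy_accuracy)
```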
# Check the coefficients
model.coef_
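`model.coef_` is a bare array, so it helps to pair each value with its feature name. Exponentiating a coefficient gives the odds ratio for a one-unit increase in that feature. A minimal sketch on synthetic data (the label rule and the `toy_` names are assumptions, and the numbers it prints say nothing about the real dataset):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for two of the real features (made-up distributions)
rng = np.random.default_rng(0)
n = 200
toy_X = pd.DataFrame({
    'Daily Time Spent on Site': rng.normal(65, 15, n),
    'Age': rng.normal(36, 9, n),
})
# Made-up label rule: less time on site means more likely to click
toy_y = (toy_X['Daily Time Spent on Site'] + rng.normal(0, 5, n) < 65).astype(int)

toy_model = LogisticRegression(max_iter=1000).fit(toy_X, toy_y)

# coef_ has shape (1, n_features) for a binary target, so take row 0
coef_table = pd.DataFrame({
    'feature': toy_X.columns,
    'coef': toy_model.coef_[0],
    'odds_ratio': np.exp(toy_model.coef_[0]),
})
print(coef_table)
```

A negative coefficient (odds ratio below 1) means the feature lowers the odds of a click as it grows. Coefficients are only comparable across features when the features share a scale.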
# Predict and evaluate
pred = model.predict(X_test)
y_test
from sklearn.metrics import accuracy_score, confusion_matrix
accuracy_score(y_test, pred) # 0.9
confusion_matrix(y_test, pred) # output was shown as an image in the original post
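scikit-learn lays out the confusion matrix with true labels as rows and predicted labels as columns, so for a binary target it reads `[[TN, FP], [FN, TP]]`. A minimal sketch with hand-made labels (`toy_true` and `toy_pred` are made up and are not the notebook's results) showing how precision and recall follow from the four cells:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hand-made labels for illustration only
toy_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
toy_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

cm = confusion_matrix(toy_true, toy_pred)
tn, fp, fn, tp = cm.ravel()  # row-major order: TN, FP, FN, TP

precision = tp / (tp + fp)  # of the predicted clicks, how many were real
recall = tp / (tp + fn)     # of the real clicks, how many were caught

# The library helpers should agree with the manual calculation
print(cm)
print(precision, precision_score(toy_true, toy_pred))
print(recall, recall_score(toy_true, toy_pred))
```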