案例背景
借贷业务是大多数银行重要的资金来源,通过资金的流动赚取利润,例如将吸引用户办理定期存款,将存款转化为贷款出借给需求方赚取利息作为利润。提高存贷款流动性即为银行开展业务的基本逻辑。
本案例中的银行为Thera Bank,该银行大多数业务为储蓄业务,为了将存款用户更多地转化为贷款客户,银行去年为存量客户开展了一项推广活动,有部分客户增加了贷款业务。本数据集记录的就是参与了此次活动的用户信息
该类分析能够帮助零售营销部门制定精准营销策略,以尽可能少的资源触达客户提高获客成功率。该部门希望识别出更有可能购买贷款的潜在客户,提高转化成功率,同时降低广告费用。
使用mysql读取csv数据
READ DATA LOCAL INFILE 'E:\bank_personal_loan_modelling.csv' INTO TABLE bank.loan FIELDS TERMINATED BY ',';
Q1.本次推广活动的效果如何
SELECT
personal_loan,
COUNT(personal_loan) AS Count,
CONCAT(ROUND(COUNT(personal_loan)/(SELECT COUNT(*) FROM loan)*100,1),'%') AS Percentage
FROM loan
GROUP BY personal_loan;
在对于参与本次活动的5000名用户中,共有480名用户被转化为贷款对象,占整体的9.6%,接下来的分析都围绕着这480名贷款用户展开
Q2.申请贷款的用户年龄分布是什么样的
SELECT MIN(age), MAX(age) FROM loan; # Check the upper bound and lower bound of age
ALTER TABLE loan ADD COLUMN age_group VARCHAR(255); # adding column
UPDATE loan SET age_group = CASE # binning
WHEN age >=20 AND Age <=29 THEN '20-30'
WHEN Age >=30 AND Age <=39 THEN '30-40'
WHEN Age >=40 AND Age <=49 THEN '40-50'
WHEN Age >=50 AND Age <=59 THEN '50-60'
else '60-70'
END;
SELECT
age_group AS Age_Group,
COUNT(personal_loan) AS Counts,
CONCAT(ROUND(COUNT(personal_loan)/(SELECT COUNT(*) FROM loan WHERE personal_loan = 1)*100,1),'%') AS Percentage
FROM loan
WHERE personal_loan = 1
GROUP BY age_group WITH ROLLUP;
在这里为了增强查询语句的可读性,首先通过ALTER和UPDATE语句增加一栏对年龄进行分组,而后展开分析。
不难看出30-60岁这一区间占据了贷款人数的70%以上,这部分人群收入稳定偿付能力较强
Q3. 贷款用户的收入状况
SELECT MAX(income), MIN(income) FROM loan; # Check the lower and upper bound of income
ALTER TABLE loan ADD COLUMN income_group VARCHAR(255);
UPDATE loan SET income_group = CASE
WHEN Income >=60 AND Income <100 THEN '60000-100000'
WHEN Income >=100 AND Income <140 THEN '100000-140000'
WHEN Income >=140 AND Income <180 THEN '140000-180000'
ELSE '>=180000'
END;
SELECT
income_group AS Income_group,
COUNT(personal_loan) AS Counts,
CONCAT(ROUND(COUNT(personal_loan)/(SELECT COUNT(*) FROM loan WHERE personal_loan = 1)*100,1),'%') AS Percentage
FROM loan
WHERE personal_loan = 1
GROUP BY income_group WITH ROLLUP;
贷款客户只要集中在100,000-180,000之间,收入较低的用户可能没达到授信标准,而收入较高的群体可能不需要个人贷款
Q4.贷款用户的家庭人口特征如何
SELECT MIN(family), MAX(family) FROM loan;
SELECT
family AS Family_size,
COUNT(*) AS Counts,
SUM(CASE WHEN personal_loan=1 THEN 1 ELSE 0 END) AS Personal_loan_counts,
ROUND((SUM(CASE WHEN personal_loan=1 THEN 1 ELSE 0 END)/COUNT(*)*100),1) AS Percentage
FROM loan
WHERE family = 1
UNION
SELECT
family AS Family_size,
COUNT(*) AS Counts,
SUM(CASE WHEN personal_loan=1 THEN 1 ELSE 0 END) AS Personal_loan_counts,
ROUND((SUM(CASE WHEN personal_loan=1 THEN 1 ELSE 0 END)/COUNT(*)*100),1) AS Percentage
FROM loan
WHERE family = 2
UNION
SELECT
family AS Family_size,
COUNT(*) AS Counts,
SUM(CASE WHEN personal_loan=1 THEN 1 ELSE 0 END) AS Personal_loan_counts,
ROUND((SUM(CASE WHEN personal_loan=1 THEN 1 ELSE 0 END)/COUNT(*)*100),1) AS Percentage
FROM loan
WHERE family = 3
UNION
SELECT
family AS Family_size,
COUNT(*) AS Counts,
SUM(CASE WHEN personal_loan=1 THEN 1 ELSE 0 END) AS Personal_loan_counts,
ROUND((SUM(CASE WHEN personal_loan=1 THEN 1 ELSE 0 END)/COUNT(*)*100),1) AS Percentage
FROM loan
WHERE family = 4;
这里通过UNION语句连接多个列数相同的行记录
总体来看,5000名客户中单身人士稍多一些,但组间差距不是太大。3口和4口之家的贷款人数比例较高,可能是为解决平时生活开支而申请
Q5.每月信用卡消费情况
SELECT MIN(ccavg), MAX(ccavg) FROM loan;
ALTER TABLE loan ADD COLUMN spending_group VARCHAR(255);
UPDATE loan SET spending_group = CASE
WHEN CCAvg >=0 AND CCAvg <1 THEN'0-1000'
WHEN CCAvg >=1 AND CCAvg <2 THEN'1000-2000'
WHEN CCAvg >=2 AND CCAvg <3 THEN'2000-3000'
WHEN CCAvg >=3 AND CCAvg <4 THEN'3000-4000'
WHEN CCAvg >=4 AND CCAvg <5 THEN'4000-5000'
WHEN CCAvg >=5 AND CCAvg <6 THEN'5000-6000'
WHEN CCAvg >=6 AND CCAvg <7 THEN'6000-7000'
WHEN CCAvg >=7 AND CCAvg <8 THEN'7000-8000'
WHEN CCAvg >=8 AND CCAvg <9 THEN'8000-9000'
ELSE '9000-10000'
END;
SELECT
spending_group,
COUNT(*) AS Counts,
SUM(CASE WHEN personal_loan=1 THEN 1 ELSE 0 END) AS Personal_loan_counts,
CONCAT(ROUND((SUM(CASE WHEN personal_loan=1 THEN 1 ELSE 0 END)/COUNT(*)*100),2),'%') AS Percentage
FROM loan
GROUP BY spending_group;
信用卡消费额在5000元以上的贷款意愿更为强烈
Q6. 贷款用户是否原本有房屋在抵押
SELECT
(SELECT COUNT(*) FROM loan WHERE personal_loan=1) AS Personal_loan,
SUM(CASE WHEN mortgage > 0 THEN 1 ELSE 0 END) AS Mortage_count,
CONCAT(ROUND(SUM(CASE WHEN mortgage > 0 THEN 1 ELSE 0 END)/(SELECT COUNT(*) FROM loan WHERE personal_loan=1) *100,1),'%') AS Mortgage_percentage,
MAX(mortgage)*1000 AS MAX_mortgage,
ROUND(AVG(mortgage)*1000) AS AVG_mortgage
FROM loan
WHERE personal_loan = 1 AND mortgage <> 0;
在贷款用户中,有房屋抵押的占35%,最大抵押价值617000,均值为288131
Q7. 根据房屋抵押价值对用户分类
SELECT MIN(mortgage), MAX(mortgage) FROM loan;
ALTER TABLE loan ADD COLUMN mortgage_group VARCHAR(255);
UPDATE loan SET mortgage_group = CASE
WHEN mortgage = 0 THEN 0
WHEN mortgage >=1 AND mortgage <100 THEN '1000-100000'
WHEN mortgage >=100 AND mortgage <200 THEN '100000*200000'
WHEN mortgage >=200 AND mortgage <300 THEN '200000-300000'
WHEN mortgage >=300 AND mortgage <400 THEN '300000-400000'
WHEN mortgage >= 400 AND mortgage <500 THEN '400000-500000'
WHEN mortgage >= 500 AND mortgage <600 THEN '500000-600000'
ELSE '>600000'
END;
SELECT
mortgage_group,
COUNT(*) AS Mortgage_count,
SUM(personal_loan) Personal_loan_count,
CONCAT(ROUND(SUM(personal_loan)/COUNT(*)*100,1),'%') AS percentage
FROM loan
GROUP BY mortgage_group;
在已经办理了房屋抵押的用户中,房屋抵押价值在300,000以上的办理个人贷款的意愿更高