A review of Lending Club installment loan risk performance by applicant-provided job titles
Financial services organizations generate tremendous amounts of data, from the initial application through every subsequent customer interaction. Yet decisions on new applications are often driven entirely by widely available credit bureau information. Credit bureau information is the right foundation, but incremental application questions can add valuable risk splitting and provide a proprietary edge over the competition, allowing institutions to approve deeper and offer better terms.
At one point, Lending Club applications were a bastion of alternative data collected at the time of application. Applicants could write full-paragraph descriptions of why they needed a loan for prospective funders to read. While this field was eliminated as Lending Club became less dependent on peer-to-peer funding, Lending Club still collects free-form, applicant-provided employment titles. For example, applicants can enter “Professor”, “Truck Driver”, “Teacher”, “Super Hero”, or nothing at all. Given the free-form nature of this field, it is hard to incorporate directly into a credit risk model build: 67 thousand unique employment titles were entered in 2016 Q3 and Q4. However, neural networks and natural language processing are tailor-made for this type of data and can be used to see whether there are usable insights that could be generalized and incorporated.
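The post does not describe any text preprocessing, but a quick way to see why free-form titles are hard to use directly is to normalize them and count what remains. The sketch below is a minimal illustration, assuming titles arrive as raw strings (or missing values); the regex rule and the `(no entry)` sentinel are illustrative choices, not the author's or Lending Club's. Blanks are kept as their own category rather than dropped, since no entry turned out to be a common high-risk title.

```python
import re
from collections import Counter

# Sentinel for blank or missing titles, kept as a category of its own.
NO_ENTRY = "(no entry)"

def normalize_title(raw):
    """Lowercase a free-form title and collapse punctuation/whitespace runs to one space."""
    if raw is None:
        return NO_ENTRY
    text = re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()
    return text if text else NO_ENTRY

# Illustrative inputs echoing the examples in the text, plus spelling variants.
titles = ["Professor", "  professor ", "Truck Driver", "truck-driver",
          "Teacher", "Super Hero", "", None]

counts = Counter(normalize_title(t) for t in titles)
print(len(counts))            # distinct titles after normalization
print(counts.most_common(3))  # most frequent normalized titles
```

Normalization collapses casing and punctuation variants, but related titles such as "teacher" and "professor" remain distinct strings, which is the gap pre-trained word vectors help close.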
I won’t go deeply into the technical details, but leveraging Keras (a Python deep-learning library) and pre-trained word vectors, I trained a neural network on employment titles from 2016 Q3 and Q4 Lending Club bookings, with a 1/0 target indicating whether the borrower charged off or defaulted in the first 18 months of the loan.
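The post does not show its data preparation or model code. As a sketch under stated assumptions, the two inputs such a model needs are a fixed-length vector per title and the 1/0 18-month target. Averaging (mean-pooling) word vectors is one simple way to produce the first; it is not necessarily what the author's Keras network did. `TOY_VECTORS` is a hypothetical three-dimensional stand-in for real pre-trained vectors, the loan rows are made up, and counting an event as bad only if it occurs strictly before the 18-month anniversary of the issue date is an assumed reading of "in the first 18 months".

```python
import calendar
from datetime import date

# Hypothetical 3-d stand-ins for real pre-trained word vectors (typically hundreds of dims).
TOY_VECTORS = {
    "truck": [0.9, 0.1, 0.0],
    "driver": [0.8, 0.2, 0.1],
    "teacher": [0.1, 0.9, 0.3],
    "professor": [0.2, 0.8, 0.5],
}
DIM = 3

def title_vector(title):
    """Average the vectors of known tokens; titles with no known tokens map to zeros."""
    found = [TOY_VECTORS[tok] for tok in title.lower().split() if tok in TOY_VECTORS]
    if not found:
        return [0.0] * DIM
    return [sum(col) / len(found) for col in zip(*found)]

def add_months(d, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def bad_within(issue_date, bad_date, horizon_months=18):
    """Return 1 if a charge-off/default happened before the horizon anniversary, else 0."""
    if bad_date is None:
        return 0
    return int(bad_date < add_months(issue_date, horizon_months))

# Made-up rows: (employment title, issue date, first charge-off/default date or None).
loans = [
    ("Truck Driver", date(2016, 7, 15), date(2017, 11, 2)),
    ("Professor", date(2016, 10, 1), None),
    ("Teacher", date(2016, 8, 20), date(2018, 3, 5)),
]
X = [title_vector(title) for title, _, _ in loans]  # model inputs
y = [bad_within(issue, bad) for _, issue, bad in loans]  # 1/0 targets
print(y)
```

In a real build, the averaged vectors (or an embedding layer initialized from the pre-trained vectors) would feed the network, with `y` as its training target.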
After training the neural network, I split a hold-out population into three groups: the top 10% riskiest predicted titles, the top 10% least risky predicted titles, and everyone else. To make sure the employment titles are not just capturing risk splitting already reflected in Lending Club’s loan grading system, which runs from A (least risky) to G (most risky), I reviewed loan risk performance by both loan grade and employment-title risk group. Across loan grades, high-predicted-risk employment titles display between 20% and 100% higher risk, and low-predicted-risk employment titles display between 16% and 40% lower risk (see chart below).
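The grouping and within-grade comparison can be sketched as follows, assuming each hold-out loan carries the model's predicted risk score for its title, the 10% cuts are taken over loans, and "higher" or "lower" risk is measured as the bad rate relative to the "Everyone Else" group in the same grade. The post reports only the resulting ranges, so this baseline and the function names are illustrative assumptions, and the toy data is made up.

```python
from collections import defaultdict

BASELINE = "Everyone Else"

def assign_risk_groups(scores, share=0.10):
    """Label the highest-scoring `share` of loans High, the lowest Low, the rest baseline."""
    n = len(scores)
    k = int(n * share)  # loans per tail group; ties keep their original order
    order = sorted(range(n), key=lambda i: scores[i])
    groups = [BASELINE] * n
    for i in order[:k]:
        groups[i] = "Low Risk Titles"
    for i in order[n - k:]:
        groups[i] = "High Risk Titles"
    return groups

def bad_rate_lift_by_grade(grades, groups, bads):
    """Relative bad-rate difference of each title group vs. the baseline, per grade."""
    totals = defaultdict(lambda: [0, 0])  # (grade, group) -> [bad count, loan count]
    for grade, group, bad in zip(grades, groups, bads):
        totals[(grade, group)][0] += bad
        totals[(grade, group)][1] += 1
    lift = {}
    for (grade, group), (n_bad, n) in totals.items():
        base_bad, base_n = totals.get((grade, BASELINE), (0, 0))
        if group == BASELINE or base_bad == 0:
            continue  # skip the baseline itself and grades without baseline bads
        lift[(grade, group)] = (n_bad / n) / (base_bad / base_n) - 1.0
    return lift

# Toy hold-out: 20 grade-A loans with made-up scores and outcomes.
scores = [i / 20 for i in range(20)]
grades = ["A"] * 20
bads = [0, 0] + [1] * 4 + [0] * 12 + [1, 1]
groups = assign_risk_groups(scores)
print(bad_rate_lift_by_grade(grades, groups, bads))
```

A lift of 0.20 reads as a 20% higher bad rate than the baseline loans in the same grade, and -0.16 as 16% lower. Comparing within each grade is what separates the title signal from risk the A-to-G grades already capture.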
Commonly occurring titles in the High Risk Titles group included “Driver”, “Server”, and no entry. Commonly occurring titles in the Low Risk Titles group included “Accountant”, “President”, and “Attorney”. While Lending Club cannot compliantly use specific individual employment titles as a reason to decline an applicant, a review of the common titles in each group suggests that non-salaried, hours-dependent jobs, such as ride-share drivers and servers, present incremental risk not currently captured by the Lending Club loan grading system. Adding an application question about salaried vs. non-salaried employment may be a way to capture much of the employment-title insight while meeting the compliance hurdles required for inclusion in a future loan grading model.
- Employment title, and potentially salaried vs. non-salaried employment, has potential as a credit risk splitter in consumer lending on top of bureau information
- New tools like neural networks and natural language processing, while challenging to incorporate directly into credit models today, can be used to unearth new insights; more traditional methodologies can then be leveraged to bring about in-market change and drive incremental value