Assignment 2: Predicting Healthcare Employee Attrition

Assigned: 23/11/2023 Due: 22/12/2023

Competition Overview

In the face of unprecedented challenges posed by the COVID-19 pandemic, the

healthcare industry is grappling with a surge in employee attrition, particularly among

nurses. Addressing this issue is critical for maintaining the stability and quality of

healthcare services. The Kaggle competition, "Predicting Healthcare Employee Attrition",

focuses on leveraging machine learning and analytics to develop robust models that can

accurately predict whether an employee is likely to leave their position.

Dataset Details

The dataset provided for this competition is a synthetic compilation, drawing inspiration

from the IBM Watson dataset for attrition. To align the data with the healthcare domain,

employee roles and departments have been strategically modified. Additionally, some

known outcomes for specific employees have been altered to enhance the challenge

and foster the development of advanced machine learning models.

Objective

The overarching goal of this competition is to encourage data scientists, machine

learning practitioners, and analysts to employ cutting-edge techniques in predictive

modelling. By accurately forecasting employee attrition, participants can contribute

valuable insights to healthcare organizations, enabling them to implement targeted

strategies for employee retention and ultimately enhance the stability and efficiency of

healthcare services during these challenging times.

Competition Entrance


Task 1 (60 Marks)

1. Create an account on https://www.kaggle.com/. You MUST set a TEAM NAME with

the format.

2. Create a notebook on Kaggle. Conduct exploratory data analysis and data processing,

train and validate your models, and generate your ‘submission.csv’ on test data using

the notebook on Kaggle. Add necessary comments to your notebook. Download the

notebook from Kaggle and submit it to Learning Mall, using the file name format

Name_Student ID. (30 Marks)

3. Submit your predictions (‘submission.csv’) for the test data to Kaggle. You are also

required to include your Kaggle score in your report (see Task 2 below). (30 Marks)
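
The workflow in items 2 and 3 (split the training data, fit a model, validate it, then write predictions for the test data) can be sketched roughly as follows. This is a minimal standard-library sketch under stated assumptions, not the required method: the column names `EmployeeID` and `Attrition`, the toy rows, and the majority-class "model" are all hypothetical placeholders, and the real column names and submission format are defined on the competition page. A Kaggle notebook would more typically use pandas and scikit-learn for the same steps.

```python
import csv
import io
import random
from collections import Counter, defaultdict

# Hypothetical column names; check the competition's data page for the real ones.
ID_COL, TARGET_COL = "EmployeeID", "Attrition"


def stratified_split(rows, target, valid_frac=0.2, seed=0):
    """Hold out valid_frac of each class so both splits keep the class ratio."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)
    rng = random.Random(seed)
    train, valid = [], []
    for members in by_class.values():
        members = members[:]          # copy so the caller's list is untouched
        rng.shuffle(members)
        n_valid = max(1, round(len(members) * valid_frac))
        valid.extend(members[:n_valid])
        train.extend(members[n_valid:])
    return train, valid


def fit_majority(train, target):
    """Placeholder 'model': always predicts the most common training label."""
    return Counter(row[target] for row in train).most_common(1)[0][0]


def accuracy(rows, target, prediction):
    """Fraction of rows whose true label equals the constant prediction."""
    return sum(row[target] == prediction for row in rows) / len(rows)


def write_submission(test_rows, prediction, out):
    """Write one prediction per test row in (id, target) CSV form."""
    writer = csv.writer(out)
    writer.writerow([ID_COL, TARGET_COL])
    for row in test_rows:
        writer.writerow([row[ID_COL], prediction])


# Toy data standing in for the competition files (8 stayers, 2 leavers).
train_rows = [{ID_COL: str(i), TARGET_COL: "1" if i < 2 else "0"} for i in range(10)]
test_rows = [{ID_COL: "100"}, {ID_COL: "101"}]

train, valid = stratified_split(train_rows, TARGET_COL)
label = fit_majority(train, TARGET_COL)
valid_acc = accuracy(valid, TARGET_COL, label)

# In a notebook, replace the buffer with open("submission.csv", "w", newline="").
buffer = io.StringIO()
write_submission(test_rows, label, buffer)
submission_text = buffer.getvalue()
```

The placeholder model is only a baseline: on imbalanced data it can score a reasonable accuracy without learning anything, which is one reason to validate a real model against it before submitting.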

Task 2 (40 Marks)

Write a 1-page report, which must contain 2 or 3 tables or figures.

• Name your report with Name_Student ID.

• Submit your report to Learning Mall.

The report must cover:

• Introduction: (4 Marks)

What is the background of this project? How is it related to Big Data?

• Methodology: (8 Marks)

A. Data Preprocessing

What data preprocessing steps did you explore before training? Examples include data

visualization, data cleaning and reduction, normalization and discretization,

feature selection, and handling imbalanced data. No need to cover all of them.

B. Classification Algorithm

How does it work? Explain the algorithm or framework.

• Results: (10 Marks)

Are there other competitor models for this project? How do they compare to your

approach?

• Discussion: (4 Marks)

What are the good aspects, and what are the bad aspects? Be sure to add a sentence

on “contributor thoughts”: what are your own unique thoughts on the

pros and cons of the technology? Do you envision an extension that might be

helpful?

• Conclusion: (4 Marks)

Summarize the 2 to 4 points you think are most important.
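
Two of the preprocessing steps named under Methodology, normalization and handling imbalanced data, can be illustrated with a short sketch. It is only one possible approach under stated assumptions: the feature name `monthly_income`, the toy values, and the choice of min-max scaling and inverse-frequency class weights are hypothetical examples, not requirements of the assignment.

```python
from collections import Counter


def min_max_normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy values, not taken from the competition data.
monthly_income = [2000, 4000, 6000, 10000]
attrition = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

scaled_income = min_max_normalize(monthly_income)
weights = balanced_class_weights(attrition)
```

With these weights the rarer class counts for more per example, so a classifier that accepts per-class weights is penalized more heavily for missing employees who leave.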

Concise, information-rich content. For each of the sections above, you will not simply

be graded on having content but on the quality of the content and how well it answers

the questions in concise, clear, and engaging terms.

Style. (10 Marks)

To keep your report consistent and visually appealing, and to make the

evaluation of your work fair, each page should conform to the following

specifications:

• Margins: approx. 0.5in on all 4 sides.

• Columns: 2, with approx. 0.3in margin; justified text.

• Fonts:

  ◦ Body text: Times New Roman, 11pt.

  ◦ Section headings: Calibri, 13pt, bold italic.

  ◦ Within captions, tables, figures, or images: Calibri, 9-11pt.

• Line spacing:

  ◦ Body text: single (1.0).

  ◦ Section headings: 6pt spacing above heading.

Academic Honesty. Copying chunks of code or problem-solving answers from other

students, online sources, or other resources is prohibited. You are responsible for both (1) not

copying others’ work, and (2) making sure your work is not accessible to others.

Assignments will be extensively checked for copying of others’ work. Solutions are

expected to be original, using concepts discussed in the book, class, or

supplemental materials but not copying code or answers directly. Please see the syllabus

for additional policies.