Practical Report


          Assignment 3 Written Practical Report

Modules 4–11 are particularly relevant for this assignment. Assignment 3 relates to the specific course learning objectives 1, 2, 3 and 4:


1.     apply knowledge of people, markets, finances, technology and management in a global context of business intelligence practice (data warehousing and big data architecture, data mining process, data visualization and performance management) and resulting organizational change and understand how these apply to the implementation of business intelligence in organization systems and business processes

2.     identify and solve complex organizational problems creatively and practically through the use of business intelligence and critically reflect on how evidence based decision making and sustainable business performance management can effectively address real-world problems

3.     comprehend and address complex ethical dilemmas that arise from evidence based decision making and business performance management

4.     communicate effectively in a clear and concise manner in written report style for senior management, with correct and appropriate acknowledgment of the main ideas presented and discussed.


Note that you must use RapidMiner Studio for Task 2 and Tableau Desktop for Task 3 in Assignment 3. Failure to do so may result in Task 2 and/or Task 3 not being marked and zero marks being awarded. Your Assignment 3 submission is automatically checked in Turnitin for academic integrity when you submit it via the course study Assignment 3 submission link. Note carefully the University policy on Academic Misconduct such as plagiarism, collusion and cheating. If any of these occur, they will be found and dealt with under the USQ Academic Integrity Procedures. If proven, Academic Misconduct may result in failure of an individual assessment or of the entire course, or exclusion from a University program or programs.

Assignment 3 consists of three main tasks and a number of sub tasks.

Task 1 (Worth 20 marks)


Task 1.1 Choose a large organisation located within Australia that is publicly listed on the Australian Securities Exchange (ASX) and is already actively engaged in the Information Age. Briefly describe your chosen organisation, include the URL of its corporate website, and explain why you have chosen this organisation for Task 1 (about 250 words).

Task 1.2 Conduct desktop research to analyse your chosen organisation in terms of the security and privacy policy statements available on its website. Provide the URLs of the security and privacy policy statements available online in your answer to Task 1.2, and then discuss how governance of privacy and security of data is addressed in this organisation, drawing on the nine core principles of the Australian Data Governance Draft Code of Practice:

1.             No-harm rule
2.             Honesty & transparency
3.             Fairness
4.             Choice
5.             Accuracy and access
6.             Accountability
7.             Stewardship
8.             Security
9.             Enforcement

to guide your analysis and discussion (about 1250 words)

You will find it useful for Task 1.2 to review the following National Press Club

Task 2 (Worth 35 Marks)

The goal of Task 2 is to predict whether a person has diabetes or not based on data collected on 768 female Pima Indians contained in the diabetes.csv data set provided for Assignment 3 Task 2 (see Table 2.1 for the Data Dictionary for diabetes.csv data set below). It is important you understand this data set in order to complete Task 2 and four sub tasks.

Table 2.1 Data Dictionary for diabetes.csv

Variable Name | Data Type | Description
Pregnancies | Integer | Number of times pregnant - gestational diabetes - age 25+
Glucose | Integer | Plasma glucose concentration after 2 hours in an oral glucose tolerance test; normal when less than or equal to 110 mg/dL
Blood Pressure | Integer | Diastolic blood pressure (mm Hg); 60-80 mm Hg normal
Skin Thickness | Integer | Triceps skin fold thickness (mm), used to determine body fat percent; normal 23 mm
Insulin | Integer | 2-hour serum insulin (mu U/ml); greater than 150 mu U/ml relates to insulin therapy
BMI | Real | Body mass index (weight in kg/(height in m)^2); ideal range between 18.5 and 24.9, less than 18.5 underweight, over 24.9 overweight; there is a link between obesity and diabetes
Diabetes Pedigree Function | Real | History of diabetes in family: (a) 0.5 (50%) for parent, full sibling; (b) 0.25 (25%) for half sibling, grandparent, aunt or uncle; (c) 0.125 (12.5%) for half aunt, half uncle or first cousin
Age | Integer | Age in years
Outcome | Integer | Class variable (0 or 1), classification prediction of diabetes: 0 = False, 1 = True

Task 2.1 Conduct an exploratory data analysis of the diabetes.csv data set using RapidMiner Studio data mining tool.


Provide the following for Task 2.1:


(i)                A screen capture of your final EDA process and briefly describe your final EDA process.

(ii)             Summarise the key results of your exploratory data analysis in a table named Table 2.1 Results of Exploratory Data Analysis for Diabetes.csv

(iii)           Discuss the key results of your exploratory data analysis and provide a rationale for selecting your top 5-6 variables for predicting diabetes as the outcome based on the results of your exploratory data analysis and a review of the relevant literature on key factors contributing to likelihood of developing diabetes

(About 500 words)


Table 2.1 should include the key characteristics of each variable in the diabetes.csv data set such as maximum, minimum values, average, standard deviation, most frequent values (mode), missing values and invalid values etc.

Hint: The Statistics Tab and the Chart Tab in RapidMiner provide a lot of descriptive statistical information and the ability to create useful charts such as bar charts and scatterplots for the EDA. You might also like to run some correlations and chi-square tests on the diabetes.csv data set to indicate which variables you consider to be the top 5-6 key variables that contribute most to predicting diabetes as an outcome.
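The descriptive statistics and correlations named in the hint can be sketched outside RapidMiner as a quick sanity check on what the Statistics Tab reports. The following is a minimal Python sketch only; the sample values are hypothetical (not taken from diabetes.csv), the helper names `summarise` and `pearson` are illustrative, and treating a zero as an invalid reading is an assumption that should be checked against the data dictionary. The assignment itself must still be completed in RapidMiner Studio.

```python
import statistics

# Hypothetical sample values for illustration only (not from diabetes.csv).
glucose = [148, 85, 183, 89, 137, 116, 0, 115, 197, 125]
outcome = [1, 0, 1, 0, 1, 0, 0, 0, 1, 1]


def summarise(values, zero_is_invalid=True):
    """Return the per-variable statistics asked for in the EDA summary table."""
    # Assumption: a zero in a clinical measurement marks a missing reading.
    invalid = sum(1 for v in values if zero_is_invalid and v == 0)
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "mode": statistics.mode(values),
        "invalid": invalid,
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


summary = summarise(glucose)
print(summary)
print("correlation with outcome:", round(pearson(glucose, outcome), 3))
```

A variable whose correlation with Outcome is close to zero is a weaker candidate for the reduced variable set, although correlation alone does not capture non-linear relationships.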


Task 2.2 Build a Decision Tree model for predicting diabetes based on the diabetes.csv data set using RapidMiner and an appropriate set of data mining operators and a reduced set of variables from diabetes.csv determined by your exploratory data analysis in Task 2.1.


Provide the following for Task 2.2:


(i)                (1) Final Decision Tree Model process, (2) Final Decision Tree diagram, and (3) Decision tree rules.

(ii)             Briefly explain your final Decision Tree Model Process, and discuss the results of the Final Decision Tree Model drawing on the key outputs (Decision Tree Diagram, Decision Tree Rules) for predicting diabetes. This discussion should be based on the contribution of each of the top five variables to the Final Decision Tree Model and relevant supporting literature on the interpretation of decision trees (About 250 words).

Task 2.3 Build a Logistic Regression model for predicting diabetes based on the diabetes.csv data set using RapidMiner and an appropriate set of data mining operators and a reduced set of variables determined by your exploratory data analysis in Task 2.1.

Provide the following for Task 2.3:


(i)                (1) Final Logistic Regression Model process, (2) Coefficients, and (3) Odds Ratios. Hint: you will need to install the Weka Extension in RapidMiner and use the W-Logistic Regression Operator for Task 2.3.

(ii)             Briefly explain your final Logistic Regression Model Process and discuss the results of the Final Logistic Regression Model drawing on the key outputs (Coefficients, Odds Ratios) for predicting diabetes. This discussion should be based on the contribution of each of the top five variables to the Final Logistic Regression Model and relevant supporting literature on the interpretation of logistic regression models (About 250 words).
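When interpreting the Coefficients and Odds Ratios outputs, it helps to remember that the odds ratio of a predictor is the exponential of its logistic regression coefficient. The following is a minimal Python sketch of that relationship; the coefficient values are hypothetical placeholders, not results from diabetes.csv, and the real values must be read from the RapidMiner model output.

```python
import math

# Hypothetical coefficients for illustration only; the real values come
# from the logistic regression model output in RapidMiner.
coefficients = {"Glucose": 0.035, "BMI": 0.090, "Age": 0.015}


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in the predictor: exp(coefficient)."""
    return math.exp(coefficient)


for name, beta in coefficients.items():
    print(f"{name}: coefficient={beta:.3f}, odds ratio={odds_ratio(beta):.3f}")
```

An odds ratio above 1 means a one-unit increase in the variable raises the odds of a positive outcome, holding the other variables constant; an odds ratio below 1 means it lowers them.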

Task 2.4 Conduct a comparative performance evaluation of your Final Decision Tree Model with your Final Logistic Regression Model for predicting diabetes. Note that you will need to use the Cross Validation Operator, Apply Model Operator and Performance (Binominal Classification) Operator in your final data mining process models (Decision Tree, Logistic Regression) to generate the model performance metrics (Accuracy, Misclassification Rate, True Positive Rate, False Positive Rate, Area Under the ROC Curve (AUC), Precision, Recall, Lift, Sensitivity, F Measure) required for Task 2.4.


Provide the following for Task 2.4:


(i)                A screen snapshot of the Confusion Matrix and AUC for each Final Model (Decision Tree, Logistic Regression)

(ii)             A table named Table 2.2 Results of Model Performance Evaluation (Decision Tree, Logistic Regression) that compares the key results of the performance evaluation for the Final Decision Tree Model and Final Logistic Regression Model in terms of Model Accuracy, Misclassification Rate, True Positive Rate, False Positive Rate, Precision, Recall, Lift, Sensitivity, F Measure.

(iii)           Discuss and compare the key results of your performance evaluation of two final models (Decision Tree, Logistic Regression) presented in parts i and ii of the Task 2.4, indicate which model is better and explain why (About 500 words).
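Most of the metrics listed for Task 2.4 follow directly from the four cells of the confusion matrix. The following is a minimal Python sketch of those standard textbook definitions; the counts are hypothetical, the function name `classification_metrics` is illustrative, and lift is computed here as precision divided by the base rate of positives, which is an assumption that may differ from how RapidMiner reports lift. AUC is omitted because it needs the ranked confidence scores rather than the confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive common binary classification metrics from confusion matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)          # also called sensitivity or true positive rate
    precision = tp / (tp + fp)
    base_rate = (tp + fn) / total    # proportion of actual positives
    return {
        "accuracy": accuracy,
        "misclassification_rate": 1 - accuracy,
        "true_positive_rate": recall,
        "false_positive_rate": fp / (fp + tn),
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        # Assumption: lift defined as precision relative to the base rate.
        "lift": precision / base_rate,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because Recall, Sensitivity and True Positive Rate are the same quantity, they should show identical values for each model in Table 2.2.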


All important outputs from data mining analyses conducted using RapidMiner for Task 2 should be included in your Assignment 3 report to provide support for conclusions reached regarding each analysis conducted for Task 2.1, Task 2.2, Task 2.3 and Task 2.4.

Note: export the important outputs from RapidMiner as jpg image files and include these screenshots in the relevant Task 2 sections and/or appendices of your Assignment 3 Report.

Note: you will find the Sharda et al. 2018 and North textbooks useful references for the data mining process activities conducted in Task 2 in relation to the exploratory data analysis, decision tree analysis, logistic regression analysis and evaluation of the comparative performance of the Final Decision Tree model and the Final Logistic Regression model.

Task 3 (Worth 30 marks)


The aviation-wildlife.xlsx data set lists historical data recorded for the USA aviation industry regarding wildlife strikes with aircraft for the years 2000 to 2011. See Table 3.1, which provides the data dictionary for the aviation-wildlife.csv data set. It is important that you understand the variables in this data set in order to build the required Aircraft Wildlife Strikes (AWS) dashboard with four specified Tableau views.


Table 3.1 Data dictionary for aviation-wildlife.csv Data set

No. | Variable Name | Data Type | Description
1 | Aircraft:Type | Categorical | Aircraft, Helicopter
2 | Airport:Name | Categorical | Name of Airport
3 | Altitude-Bin | Categorical | < 1000 Metres, > 1000 Metres, Unknown
4 | Aircraft:Make/Model | Categorical | Make and Model of Aircraft
5 | Wildlife: Number struck | Categorical | Range of numbers
6 | Effect: Impact to flight | Categorical | None, Aborted Take-off, Engine Shut Down, Precautionary Landing, Other
7 | Effect: Other | Categorical | Text remarks recorded for flight
8 | Location: Nearby if en route | Categorical | State Abbreviation
9 | Aircraft: Flight Number | Real |
10 | FlightDate | Date | Date of Flight
11 | Record ID | Integer | Record ID, unique integer number
12 | Effect: Indicated Damage | Categorical | No Damage, Caused Damage
13 | Location: Freeform en route | Categorical | Text remark recorded for flight
14 | Aircraft: Number of engines? | Integer | 1, 2, 3 or 4
15 | Aircraft: Airline/Operator | Categorical | Airline Operator
16 | Origin State | Categorical | Flight Origin State
17 | When: Phase of flight | Categorical | Take-off run, Approach, Climb, En-route, Landing Roll
18 | Conditions: Precipitation | Categorical | Fog, None, Rain, Snow
19 | Remains of wildlife collected? | Categorical | False, True
20 | Remains of wildlife sent to Smithsonian | Categorical | False, True
21 | Remarks | Categorical | Text remarks recorded regarding aviation-wildlife collision
22 | Reported: Date | Date | Date aircraft collision with wildlife reported
23 | Wildlife:Size | Categorical | Small, Medium, Large
24 | Conditions: Sky | Categorical | No Cloud, Overcast, Some Cloud
25 | Wildlife: Species | Categorical | Different types of wildlife, mainly birds
26 | When: Time (HHMM) | Categorical | 24 hour format
27 | When: Time of day | Categorical | Dawn, Day, Night, Dusk
28 | Pilot warned of birds or wildlife? | Categorical | Y = Yes, N = No
29 | Cost: Aircraft time out of service (hours) | Integer |
30 | Cost: Other (inflation adj) | Integer |
31 | Cost: Repair (inflation adj) | Integer |
32 | Cost: Total $ | Integer |
33 | Miles from airport | Integer |
34 | Feet above ground | Integer |
35 | Number of human fatalities | Integer |
36 | Number of people injured | Integer |
37 | Speed (IAS) in knots | Integer |

Task 3 requires you to build a Tableau dashboard which includes four different views of the aviation-wildlife.csv data set for the years 2000-2011 as specified in sub Tasks 3.1, 3.2, 3.3 and 3.4.


Task 3.1 Create a Tableau View of the impact of wildlife strikes with aircraft over time for a specific origin state. Provide a screen capture of the Tableau view you have created, describe it, and comment on the different types of impact to aircraft from wildlife strikes over time and whether this differs much between origin states (About 125 words).


Task 3.2 Create a Tableau View of flight phase by time of day which shows when wildlife strikes with aircraft occur. Provide a screen capture of the Tableau view you have created, describe it, and comment on the phase of flight and time of day at which wildlife strikes with aircraft are more likely to occur (about 125 words).
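The aggregation behind a flight phase by time of day view is a simple cross-tabulation of strike counts. The following is a minimal Python sketch of that counting logic, intended only to clarify what the Tableau view summarises; the records are hypothetical and not taken from the aviation-wildlife data set, and the view itself must still be built in Tableau Desktop.

```python
from collections import Counter

# Hypothetical (phase of flight, time of day) records for illustration only.
strikes = [
    ("Approach", "Day"),
    ("Approach", "Night"),
    ("Climb", "Day"),
    ("Approach", "Day"),
    ("Landing Roll", "Dusk"),
]

# Count strikes for each (phase, time of day) combination.
counts = Counter(strikes)

# The cell with the highest count is where strikes occur most often.
most_common_cell, strike_count = counts.most_common(1)[0]
print(most_common_cell, strike_count)
```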


Task 3.3 Create a Tableau View that compares wildlife species in order of aircraft strike frequency and the chance of damage occurring. Provide a screen capture of the Tableau view you have created and comment on which wildlife species are most frequently involved in aircraft strikes and which wildlife species are most likely to have the most impact in terms of damage (total cost) when an aircraft strike occurs (about 125 words).


Task 3.4 Create a Tableau GeoMap View of flights by origin state that displays the number of wildlife strikes and total monetary cost for each origin state for different periods of time. Provide a screen capture of the Tableau view you have created, describe it, and comment on this Tableau GeoMap View in relation to the number of wildlife strikes by origin state and total monetary cost over time. A number of origin states cannot be plotted on the GeoMap view because they are outside the USA; comment on how you can deal with this issue (About 125 words).


Note: you need to copy the four Text Table / Graph views and the dashboard you have created in Tableau using the Worksheet Menu Copy or Export Image option and include them in the Task 3 section where relevant or in Appendix 3 of the Assignment 3 report.

Task 3.5 Provide a screen snapshot of your AWS Dashboard and an accompanying rationale (drawing on the relevant literature for good dashboard design) for the graphic design and functionality provided by your AWS Dashboard for the four specified Tableau views for sub Tasks 3.1, 3.2, 3.3 and 3.4 (About 500 words).

Note: Stephen Few is considered to be the guru of good dashboard design and has written a number of books on this topic. It is worth having a look at his website https://www.perceptualedge.com/about.php and in particular his examples of poorly designed dashboard views and his suggestions for better dashboard views.

Report presentation writing style and referencing (worth 15 marks)

Presentation: Cover page, table of contents, page numbers, headings, sub headings, tables and diagrams, use of formatting, spacing, paragraphs.

Writing style: Use of English (correct use of language and grammar; is there evidence of spell-checking and proofreading?)

Quality of research evident by appropriate referencing: Appropriate level of referencing in text where required for a sub task, reference list provided, used Harvard Referencing Style correctly

Assignment 3 Report should be structured as follows:


Assignment 3 Cover page
Table of Contents
Task 1 Main Heading
Task 1 Sub Tasks – Sub headings for Tasks 1.1 and 1.2
Task 2 Main Heading
Task 2 Sub Tasks – Sub headings for Task 2.1, 2.2, 2.3 and 2.4
Task 3 Main Heading
Task 3 Sub Tasks – Sub headings for Task 3.1, 3.2, 3.3, 3.4 and 3.5
List of References
List of Appendices

You must submit two files for Assignment 3:


1.     Assignment 3 Report for Tasks 1, 2 and 3 in Word document format with extension .docx
2.     Tableau packaged workbook with the extension .twbx which must contain required four Text Table / Graph views and a dashboard which consolidates these four Tableau views for Task 3

You must use the following file naming convention:

1.     Studentno-Studentname-CIS8008Ass3.docx

2.     Studentno-Studentname-CIS8008Ass3.twbx

You must use Harvard referencing style – Harvard referencing resources:

Install a bibliography referencing tool such as EndNote, which integrates with your word processor: http://www.usq.edu.au/library/referencing/endnote-bibliographic-software. Alternatively, use an online citation tool such as Zotero or Cite This For Me.

USQ Library - how to reference correctly using Harvard referencing system https://www.usq.edu.au/library/referencing/harvard-agps-referencing-guide
