SIT718 Real World Analytics Assignment
Total Marks = 100, Weighting - 30%
Due date: 3 May 2018 by 11.30 PM
The assignment (PDF or MS Word document, plus the appropriate program files with your code) must be submitted via CloudDeakin’s Assignment Dropbox. You can submit an electronic version of your assignment (photos of written documents are not accepted). No hard copy or email submissions are accepted. You should label all figures and tables.
This assignment assesses:
ULO1: Apply the concepts of multivariate functions to summarise datasets.
ULO2: Analyse datasets by interpreting model and function parameters of important families of multivariate functions.
ULO3: Transform a real-life problem into a mathematical model.
ULO4: Apply linear programming concepts to make optimal decisions.
ULO6: Obtain optimal solutions for quantities that are either continuous or discrete.
This assignment consists of two parts: Part A and Part B. Each part is allocated 50 marks and contributes 15% to the final mark.
Part A: Analysis of Energy Efficiency Dataset for Buildings
Description:
In order to design energy-efficient buildings, the Heating Load (HL) and the Cooling Load (CL) must be computed to determine the specifications of the heating and cooling equipment needed to maintain comfortable indoor air conditions. Energy simulation tools are widely used to analyse or forecast building energy consumption. The dataset provides an energy analysis of the Heating Load (denoted as Y1) and the Cooling Load (denoted as Y2) for 768 building shapes that were simulated using a building simulator. Select one of Y1 or Y2 as your variable of interest and focus the analysis on this variable. The dataset comprises 5 features (variables), which are denoted as X1, X2, X3, X4, X5. The description of the variables is given below:
X1: Relative compactness in percentage (expressed in decimals) - A measure of building compactness. A high value means highly compact.
X2: Surface area in square metres
X3: Wall area in square metres
X4: Roof area in square metres
X5: Overall height in metres
Y1: Heating load in kWh·m⁻² per annum
Y2: Cooling load in kWh·m⁻² per annum
Tasks:
1. Understand the data [10 marks]
(i) Download the txt file (ENB18data.txt) from CloudDeakin and save it to your R working directory.
(ii) Assign the data to a matrix, e.g. using
the.data <- as.matrix(read.table("ENB18data.txt"))
(iii) Decide whether you would like to investigate Heating Load (Y1) or Cooling Load (Y2). This is your variable of interest. Generate a subset of 300 data points, e.g. using:
To investigate Heating Load Y1:
my.data <- the.data[sample(1:768,300),c(1:5,6)]
To investigate Cooling Load Y2:
my.data <- the.data[sample(1:768,300),c(1:5,7)]
(iv) Using scatterplots and histograms, report on the general relationship between each of the variables X1, X2, X3, X4 and X5 and your variable of interest Y1 (heating load) or Y2 (cooling load). Include a scatterplot for each of the variables X1, X2, X3, X4, X5 against your variable of interest Y1 or Y2. Include a histogram for each of X1, X2, ..., X5, and Y1 or Y2. Include 1 or 2 sentences about the relationships and distributions.
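Tasks 1(ii) to 1(iv) could be sketched in R roughly as follows. This is only an illustration: it assumes Y1 was chosen, that ENB18data.txt has no header row, and it uses a hypothetical seed and hypothetical column names. If the file is not found it falls back to random placeholder numbers so the code still runs; those numbers are not the real dataset.

```r
# Sketch of tasks 1(ii)-(iv); assumes ENB18data.txt is in the working directory.
set.seed(718)  # hypothetical seed, only so the random subset is reproducible

if (file.exists("ENB18data.txt")) {
  the.data <- as.matrix(read.table("ENB18data.txt"))
} else {
  # Placeholder only: random numbers with the right shape, NOT the real data
  the.data <- matrix(runif(768 * 7), nrow = 768, ncol = 7)
}

# Columns 1-5 are X1..X5; column 6 is Y1 (use column 7 instead for Y2)
my.data <- the.data[sample(1:768, 300), c(1:5, 6)]
colnames(my.data) <- c("X1", "X2", "X3", "X4", "X5", "Y1")  # hypothetical labels

# One scatterplot per input against the variable of interest
par(mfrow = c(2, 3))
for (v in c("X1", "X2", "X3", "X4", "X5")) {
  plot(my.data[, v], my.data[, "Y1"],
       xlab = v, ylab = "Y1", main = paste(v, "vs Y1"))
}

# One histogram per column (five inputs plus the variable of interest)
par(mfrow = c(2, 3))
for (v in colnames(my.data)) {
  hist(my.data[, v], xlab = v, main = paste("Histogram of", v))
}
```

Fixing the seed is optional, but it makes the 300-row subset (and therefore every later result) reproducible when the script is rerun.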
2. Transform the data [15 marks]
(i) Choose any four from the first five variables X1,X2,X3,X4,X5.
Make appropriate transformations to the variables (including Y1 or Y2) so that the values can be aggregated in order to predict the variable of interest (your selected Heating Load Y1 or Cooling Load Y2). The transformations should reflect the general relationship between each of the four variables and the variable of interest. Assign your transformed data along with your transformed variable of interest to an array (it should have 300 rows and 5 columns). Save it to a txt file titled "name-transformed.txt" using
write.table(your.data, "name-transformed.txt")
(ii) Briefly explain each transformation for your selected variables and the variable of interest Y1 or Y2 (1-2 sentences each).
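As an illustration of what such transformations might look like, the sketch below shows min-max scaling to the unit interval, a negated version for inputs that move in the opposite direction to the variable of interest, and the inverse scaling needed later to turn a model output back into original units. The function names and the toy vector are hypothetical; which transformation suits each variable depends on what your own plots show.

```r
# Toy values only, standing in for one column of my.data
x <- c(2, 4, 6, 10)

# Min-max scaling: maps the smallest value to 0 and the largest to 1
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

# Negation: for inputs where larger x tends to go with a smaller Y
negate <- function(x) 1 - minmax(x)

# Inverse of min-max scaling, given the original minimum and maximum
unscale <- function(z, lo, hi) z * (hi - lo) + lo

scaled   <- minmax(x)
flipped  <- negate(x)
restored <- unscale(scaled, min(x), max(x))
```

Keeping the original minimum and maximum of each column is worthwhile, because the same values are needed to transform a new input before prediction.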
3. Build models and investigate the importance of each variable. [15 marks]
(i) Download the AggWaFit718.R file (from CloudDeakin) to your working directory and load it into the R workspace using source("AggWaFit718.R")
(ii) Use the fitting functions to learn the parameters for
• Weighted arithmetic mean (WAM),
• Weighted power means (PM) with p = 0.5, and p = 2,
• Ordered weighted averaging function (OWA), and
• Choquet integral.
(iii) Include two tables in your report: one with the error measures (RMSE, average absolute error, Pearson correlation, Spearman correlation) and one summarising the weights/parameters that were learned for your data.
(iv) Compare and interpret the data in your tables. Be sure to comment on:
(a) How good the model is,
(b) The importance of each of the variables (the four variables that you have selected),
(c) Any interaction between any of those variables (are they complementary or redundant?),
(d) Whether better models favour higher or lower inputs.
(1-2 paragraphs for part (iv).)
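To make the model families in task 3(ii) and the error measures in task 3(iii) concrete, here is a minimal base-R sketch. It does not use or reproduce the fitting functions in the provided file; the helper names, inputs and weights are hypothetical, and the OWA is written under the assumption that weights are applied to inputs sorted in increasing order. The Choquet integral is left out because it needs a fuzzy measure defined over every subset of inputs.

```r
# Weighted arithmetic mean: weights are non-negative and sum to 1
wam <- function(x, w) sum(w * x)

# Weighted power mean with exponent p (p = 1 gives the WAM)
pm <- function(x, w, p) sum(w * x^p)^(1 / p)

# OWA: weights attach to positions after sorting, not to particular variables
# (assumption: inputs sorted in increasing order)
owa <- function(x, w) sum(w * sort(x))

x <- c(0.2, 0.4, 0.6, 0.8)   # hypothetical transformed inputs
w <- c(0.1, 0.2, 0.3, 0.4)   # hypothetical weights

wam(x, w)
pm(x, w, 0.5)
pm(x, w, 2)
owa(rev(x), w)   # same result as owa(x, w): input order does not matter

# The four error measures requested for the first table
errors <- function(obs, pred) {
  c(RMSE     = sqrt(mean((obs - pred)^2)),
    Av.abs   = mean(abs(obs - pred)),
    Pearson  = cor(obs, pred),
    Spearman = cor(obs, pred, method = "spearman"))
}
```

Under this convention, large OWA weights on the last positions mean the model leans on the highest inputs, which is one way to approach point (d).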
4. Use your model for prediction. [10 marks]
(i) Using your best fitting model, predict the Heating Load Y1 or the Cooling Load Y2 for the following input:
X1=0.82, X2=612.5, X3=318.5, X4=147, X5=7.
Give your result and comment on whether you think it is reasonable. (1-2 sentences)
(ii) Comment generally on the ideal conditions (in terms of your 4 variables) under which a low heating or cooling load will occur. (1-2 sentences)
For this part, your submission should include:
1. A report (created in any word processor), covering all of the items above. With plots and tables it should only be 2-3 pages.
2. A data file named "name-transformed.txt" (where ‘name’ is replaced with your name; you can use your surname or first name, just to help us distinguish them!).
3. An R code file (that you have written to produce your results) named "name-code.R", where ‘name’ is your name.
Part B: Optimisation
1. A food factory is making a special Juice for a customer by mixing two existing products, JA and JB. The compositions of JA and JB and their prices ($/l) are given as follows.
Amount (l) per 100 l of JA and JB
      Carrot   Orange   Apple   Cost ($/l)
JA    4        6        3       6
JB    8        3        6       5
The customer requires at least 3.5 litres of Orange concentrate and at least 4 litres of Apple concentrate per 100 litres of the Juice, but no more than 6 litres of Carrot concentrate per 100 litres of the Juice. The customer needs at least 50 litres of Juice per week.
a) Formulate a Linear Programming (LP) model for the factory that minimises the total cost of producing the Juice while satisfying all constraints.
b) Use the graphical method to find the optimal solution. Show the feasible region and the optimal solution on the graph. Annotate all lines on your graph. What is the minimal cost for the product?
[25 marks]
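The graphical method in part b) amounts to comparing the objective at the corner points of the feasible region, and that check can be sketched in base R. The formulation below is one reading of the question, not a confirmed model: it assumes the table gives litres of concentrate per 100 l of each product, that the customer limits apply per 100 l of the blend, and that the decision variables are weekly litres of JA and JB. All object names are hypothetical.

```r
# Each row is a constraint written as a1 * xA + a2 * xB >= b,
# where xA and xB are the litres of JA and JB used per week.
A <- rbind(orange  = c(6 - 3.5, 3 - 3.5),  # blend has at least 3.5 l orange per 100 l
           apple   = c(3 - 4,   6 - 4),    # blend has at least 4 l apple per 100 l
           carrot  = c(6 - 4,   6 - 8),    # blend has at most 6 l carrot per 100 l
           volume  = c(1, 1),              # at least 50 l of Juice per week
           nonnegA = c(1, 0),
           nonnegB = c(0, 1))
b <- c(0, 0, 0, 50, 0, 0)
cost <- c(6, 5)                            # $/l for JA and JB

# Intersect every pair of constraint lines and keep the feasible points
vertices <- list()
pairs <- combn(nrow(A), 2)
for (k in seq_len(ncol(pairs))) {
  idx <- pairs[, k]
  M <- A[idx, ]
  if (abs(det(M)) < 1e-9) next             # parallel lines never intersect
  p <- solve(M, b[idx])
  if (all(A %*% p >= b - 1e-9)) vertices[[length(vertices) + 1]] <- p
}

V <- unique(round(do.call(rbind, vertices), 6))
costs <- as.vector(V %*% cost)
best <- V[which.min(costs), ]              # cheapest corner point (xA, xB)
min.cost <- min(costs)
```

The feasible region here is unbounded, so checking corners is only valid because both costs are positive and the variables are non-negative, which bounds the objective from below. The result should still be confirmed on the annotated graph.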
2. A factory makes three products (fabrics), Summer, Autumn and Winter, from three materials: Cotton, Wool and Viscose. The following table gives the sales price and production cost per ton of each product, and the purchase price per ton of each material.
Product   Sales price   Production cost      Material   Purchase price
Summer    $50           $4                   Cotton     $30
Autumn    $55           $4                   Wool       $45
Winter    $60           $5                   Viscose    $40
The maximum demand (in tons) for each product and the minimum cotton and wool proportions in each product are as follows.
Product   Demand   Min cotton proportion   Min wool proportion
Summer    4500     60%                     30%
Autumn    4000     60%                     30%
Winter    3800     40%                     50%
Formulate an LP model for the factory that maximises the profit while satisfying the demand and the cotton and wool proportion constraints.
Solve the model using IBM ILOG CPLEX. What are the optimal profit and optimal values of the decision variables?
Hints:
1. Let $x_{ij} \ge 0$ be a decision variable that denotes the number of tons of product $j$, for $j \in \{1 = \text{Summer}, 2 = \text{Autumn}, 3 = \text{Winter}\}$, to be produced from material $i \in \{C = \text{Cotton}, W = \text{Wool}, V = \text{Viscose}\}$.
2. The proportion of a particular type of material in a particular type of product can be calculated from the decision variables; e.g., the proportion of Cotton in product Summer is given by $x_{C1} / (x_{C1} + x_{W1} + x_{V1})$.
[25 marks]
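Using the notation from the hints, one possible shape for the model is sketched below. The symbols $s_j$, $c_j$, $p_i$, $D_j$, $\alpha_j$ and $\beta_j$ are shorthand introduced here for the sales price, production cost, purchase price, maximum demand, and minimum cotton and wool proportions in the tables; this is an illustration of how the proportion constraints can be made linear, not a confirmed solution.

```latex
\begin{align*}
\max \quad & \sum_{j=1}^{3} (s_j - c_j) \sum_{k \in \{C,W,V\}} x_{kj}
             \;-\; \sum_{i \in \{C,W,V\}} p_i \sum_{j=1}^{3} x_{ij} \\
\text{s.t.} \quad
  & \sum_{k \in \{C,W,V\}} x_{kj} \le D_j, && j = 1,2,3 \\
  & x_{Cj} \ge \alpha_j \sum_{k \in \{C,W,V\}} x_{kj}, && j = 1,2,3 \\
  & x_{Wj} \ge \beta_j \sum_{k \in \{C,W,V\}} x_{kj}, && j = 1,2,3 \\
  & x_{ij} \ge 0, && i \in \{C,W,V\},\; j = 1,2,3
\end{align*}
```

Multiplying each proportion requirement through by the total tonnage of the product removes the ratio, so every constraint stays linear in the $x_{ij}$.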
Submission
Submit to the SIT718 CloudDeakin Dropbox.
Combine the report from Part A and the solutions from Part B in ONE pdf file. Copy and paste your CPLEX code into the solutions for Part B. Label the file name.pdf, where ‘name’ is replaced with your name (you can use your surname or first name, to help distinguish them!).
Your final submission should consist of no more than 4 files:
1. One pdf file (created in any word processor), containing the report for Part A and the solutions to the two questions of Part B, including CPLEX code, labelled with your name. This file should be no more than 5-6 pages.
2. A data file named "name-transformed.txt" (where ‘name’ is replaced with your name).
3. Your R code file, labelled name.R.
4. Your CPLEX model file, labelled name.mod; also copy the code into your solution document.