|University|Dublin Business School (DBS)|
|Subject|B9FT105: Data Analytics and Machine Learning|
Consider a real-world, relational dataset. This dataset must have at least 2 categorical and 2 continuous variables.
- Describe the dataset using appropriate plots, curves, or charts.
- For one of the continuous attributes, compute measures of central tendency and of dispersion.
- For one variable of the dataset, use Chebyshev’s rule to propose a one-sigma interval. Based on your proposed interval, identify the outliers, if any.
- Explain how the box-plot technique can be used to detect outliers. Apply this technique to one attribute of the dataset.
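The numerical parts of the tasks above can be sketched as follows. This is a minimal illustration, not a model answer: the `values` list is hypothetical sample data, and the function names are invented for this example. Chebyshev's inequality guarantees that at least 1 − 1/k² of the values lie within k standard deviations of the mean; for k = 1 that guarantee is 0, so the one-sigma interval here acts only as a flagging convention. The box-plot rule assumes the common 1.5 × IQR fences, and quartile values can differ slightly between software packages. Plotting is not covered.

```python
import statistics

# Hypothetical values of one continuous attribute (illustrative only).
values = [12.0, 15.5, 14.2, 13.8, 16.1, 15.0, 14.7, 13.3, 40.0, 14.9]


def central_and_dispersion(xs):
    """Return common measures of central tendency and dispersion."""
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "variance": statistics.variance(xs),  # sample variance, divisor n - 1
        "std": statistics.stdev(xs),          # sample standard deviation
        "range": max(xs) - min(xs),
    }


def k_sigma_outliers(xs, k=1.0):
    """Return the interval mean ± k*std and the points falling outside it."""
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    lo, hi = m - k * s, m + k * s
    return (lo, hi), [x for x in xs if x < lo or x > hi]


def boxplot_outliers(xs):
    """Return the 1.5*IQR fences and the points falling outside them."""
    # statistics.quantiles uses the 'exclusive' method by default.
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return (lo, hi), [x for x in xs if x < lo or x > hi]


summary = central_and_dispersion(values)
sigma_interval, sigma_outliers = k_sigma_outliers(values, k=1.0)
fences, box_outliers = boxplot_outliers(values)
```

Passing a larger `k` (for example `k=2.0` or `k=3.0`) gives an interval for which Chebyshev's bound is informative (at least 75% and about 89% of the values, respectively).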
- Select four variables of the dataset, and propose an appropriate probability model to quantify the uncertainty of each variable.
- For each model proposed in the previous part, estimate its parameters.
- Explain how each model can be used for predictive analytics, then find the prediction for each attribute.
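One possible way to approach the three parts above is sketched below. Everything in it is an assumption made for illustration: the four columns are hypothetical, and the pairing of each column with a distribution (Normal, Exponential, Bernoulli, categorical) is only one plausible choice that would need to be justified from the plots of a real dataset. Parameters are estimated by maximum likelihood, and the point prediction is taken as the fitted mean for continuous variables and the most probable level for categorical ones.

```python
import statistics
from collections import Counter

# Hypothetical columns (illustrative only, not from a real dataset).
height = [170.2, 165.4, 180.1, 175.3, 168.8]   # continuous, roughly symmetric
wait_time = [0.5, 1.2, 0.3, 2.4, 0.6]          # continuous, positive, skewed
is_member = [1, 0, 1, 1, 0]                    # categorical with two levels
region = ["N", "S", "N", "E", "N"]             # categorical with several levels


def fit_normal(xs):
    """MLE for a Normal model: sample mean and divisor-n standard deviation."""
    return {"mu": statistics.fmean(xs), "sigma": statistics.pstdev(xs)}


def fit_exponential(xs):
    """MLE for an Exponential model: rate = 1 / sample mean."""
    return {"rate": 1.0 / statistics.fmean(xs)}


def fit_bernoulli(xs):
    """MLE for a Bernoulli model: proportion of ones."""
    return {"p": sum(xs) / len(xs)}


def fit_categorical(xs):
    """MLE for a categorical model: relative frequency of each level."""
    counts = Counter(xs)
    return {level: c / len(xs) for level, c in counts.items()}


normal_fit = fit_normal(height)
exp_fit = fit_exponential(wait_time)
bern_fit = fit_bernoulli(is_member)
cat_fit = fit_categorical(region)

# Point predictions: expected value for continuous, mode for categorical.
predictions = {
    "height": normal_fit["mu"],
    "wait_time": 1.0 / exp_fit["rate"],
    "is_member": 1 if bern_fit["p"] >= 0.5 else 0,
    "region": max(cat_fit, key=cat_fit.get),
}
```

A fitted model can also give interval or probability statements (for example, the probability that `wait_time` exceeds a threshold), which is often more useful than a single point prediction.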
From your dataset, specify your input and output variables, then:
- Suggest an appropriate GLM to model the output as a function of the inputs.
- Split the dataset into an 80% training set and a 20% test set, then fit your proposed GLM to the training set.
- Identify the input variables that have a significant effect on the output variable at the α = 0.05 level, and state the related hypothesis tests. Estimate the parameters of your model.
- Predict the output for the test set using the trained model, and provide the functional form of the optimal predictive model.
- Propose an appropriate performance measure to evaluate the model and compute it for your fitted model.
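The workflow above can be sketched end to end for one particular case. The example assumes a binary output, so it uses a binomial GLM with a logit link (logistic regression); a different output type would call for a different family and link. The data are synthetic and generated inside the script, the helper names (`invert`, `fit_logistic`, `predict_proba`) are invented for this illustration, and the fit is written out by hand with Newton-Raphson (IRLS) only to keep the example dependency-free. In practice a statistics library would normally be used instead. Significance is assessed with Wald z-tests of H0: β_j = 0 against H1: β_j ≠ 0, and accuracy is used as the performance measure.

```python
import math
import random

rng = random.Random(0)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Synthetic data (illustrative only): x1 is continuous, x2 is a 0/1 dummy
# for a categorical input, y is a binary output.
TRUE_BETA = [-0.5, 1.5, 0.0]  # assumed intercept, x1 effect, x2 effect (none)
rows = []
for _ in range(500):
    x = [1.0, rng.gauss(0.0, 1.0), float(rng.random() < 0.5)]
    p = sigmoid(sum(b * v for b, v in zip(TRUE_BETA, x)))
    rows.append((x, 1.0 if rng.random() < p else 0.0))

# 80/20 train/test split after shuffling.
rng.shuffle(rows)
cut = int(0.8 * len(rows))
train, test = rows[:cut], rows[cut:]


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(r) + [float(i == j) for j in range(n)] for i, r in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[piv] = m[piv], m[c]
        d = m[c][c]
        m[c] = [v / d for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [r[n:] for r in m]


def fit_logistic(data, n_iter=25):
    """Fit a logit-link binomial GLM; return coefficients and covariance."""
    k = len(data[0][0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information matrix
        for x, y in data:
            p = sigmoid(sum(b * v for b, v in zip(beta, x)))
            for i in range(k):
                grad[i] += (y - p) * x[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * x[i] * x[j]
        cov = invert(info)  # at convergence, the estimated covariance of beta
        step = [sum(cov[i][j] * grad[j] for j in range(k)) for i in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
    return beta, cov


beta, cov = fit_logistic(train)

# Wald tests: z_j = beta_j / se_j, two-sided p-value from the standard normal.
se = [math.sqrt(cov[i][i]) for i in range(len(beta))]
p_values = [math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, se)]
significant = [pv < 0.05 for pv in p_values]


def predict_proba(x):
    """Functional form: P(y = 1 | x) = sigmoid(b0 + b1*x1 + b2*x2)."""
    return sigmoid(sum(b * v for b, v in zip(beta, x)))


# Predict on the held-out test set and compute accuracy.
preds = [1.0 if predict_proba(x) >= 0.5 else 0.0 for x, _ in test]
accuracy = sum(p == y for p, (_, y) in zip(preds, test)) / len(test)
```

Accuracy is only one option. For imbalanced classes, measures such as precision, recall, or AUC may be more appropriate, and for a continuous output modelled with a Gaussian GLM an error measure such as RMSE would be the natural counterpart.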