Assignment SI Dr. Javed Iqbal Fall 21 New


Assignment: Statistical Inference (with Econometrics Lab)

(Instructor: Dr. Javed Iqbal)


Due Date: 2nd last class of the semester
Max Marks: 20 Max Pages: 20
------------------------------------------------------------------------------------------------------------
Submit a hard copy of at most 20 MS Word pages using ‘Times New Roman’, font size 12, with paragraphs
justified (aligned on both right and left, as in this document). Students can work individually, in a group of
2, or in a group of 3 people. Any 4th member will get zero marks. Please do not share your work
(especially soft copy) with other groups, as both groups will be penalized. Plagiarism is not
tolerated. Some suggested assignment topics and tasks are provided that use real-world datasets.
If you want to work on any other project related to data comparison and relationship study that
is not in this list, you must get approval from me before November 10, 2021. Histograms can
also be constructed using online tools, e.g. https://www.statskingdom.com/histogram-maker.html.
Please provide appropriate information about your group (full names as per the ERP) on the title
page of the assignment. Group members may belong to different sections. Please ensure a
sufficiently large sample size for your analysis.
Project Title: ________________________________________________
Name | ERP ID | Class ERP (e.g. 5139) | Percent Contribution | Signature
1.
2.
3.
This signature means two things: (i) the assignment is not plagiarized; (ii) all the group members
have contributed towards the completion of the assignment and no one is a free rider.

1) Are different continents of the world equally good in sports? One way to answer this
question is to compare the average of the total number of (gold, silver, bronze) summer
Olympic medals won per capita by countries in each continent. You can consider the
record of last three Olympics only. Aggregate the number of total medals for each continent
(Asia, Europe, South America, North America, Africa) (ignore Australia) and investigate
this question using pairwise t-tests and ANOVA. Then consider the cross-country analysis
of y = total number of medals in the last three Olympics by each country and relate it to some
explanatory variables, e.g. per capita GDP, population size etc. of that country. Perform the
relevant analysis. Use your multiple regression model to predict the
total number of medals won by a country and compare this prediction with the actual medals
of that country. Make 5 such predictions and present the information in a table. Prepare a
report of your analysis. Data can be found on the relevant Olympics website or
Wikipedia. Select at least 15 countries from each continent.
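
The pairwise t-test and ANOVA comparisons asked for in this and several later topics can be sketched as follows. This is only an illustration using Python's standard library: the function names are hypothetical, the numbers are made-up placeholders rather than real Olympic data, and the t statistic shown is the pooled (equal-variance) version, which is an assumption. Critical values or p-values still have to be looked up from t and F tables or obtained from Excel.

```python
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances. Returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return t, df


def one_way_anova_f(groups):
    """One-way ANOVA F statistic. Returns (F, df_between, df_within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up placeholder values (e.g. medals per million people), NOT real data.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]
group_c = [3.0, 4.0, 5.0]

print(pooled_t_statistic(group_a, group_c))
print(one_way_anova_f([group_a, group_b, group_c]))
```

With only two groups the ANOVA F statistic equals the square of the pooled t statistic, which is a quick way to check the calculations.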

2) Life expectancy is an important measure of the soundness of a country's health system.
The World Bank’s site https://data.worldbank.org/indicator/SP.DYN.LE00.IN contains
life expectancy data for different countries, for both males and females, and for
recent and past years. You can compare life expectancy across the two genders and
across groups of countries, i.e. developing and developed countries, or countries in
different continents of the world, via descriptive statistics. Data on economic
wellbeing, i.e. per capita GDP, is available at
https://data.worldbank.org/indicator/NY.GDP.PCAP.CD?view=chart. You can investigate
the relationship between life expectancy and economic wellbeing and other relevant variables
through simple and multiple regression. This involves making scatter plots, conducting ANOVA, estimating the regression
relationship, measuring goodness of fit and writing a report of your results. Use your
multiple regression model to predict the life expectancy for a country and compare this
prediction with the actual life expectancy of that country. Make 5 such predictions and present
the information in a table. Select at least 15 countries from each group.
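
The "predict and compare with actual" table required here and in most other topics can be sketched as below. This is a minimal illustration, not a prescribed method: it fits a multiple regression by solving the normal equations with plain Python, the helper names (solve_linear_system, fit_ols, predict) are hypothetical, and the x and y values are made-up placeholders, not World Bank data. In practice the regression tool in Excel or any statistics package would give the same coefficients.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the system is singular
    (e.g. perfectly collinear regressors).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares fit of y on the regressors in rows plus a constant.

    Returns [b0, b1, ..., bk], where b0 is the intercept.
    """
    design = [[1.0] + list(row) for row in rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Predicted y for one observation given fitted coefficients."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


# Made-up placeholder data: two x variables per observation, NOT real data.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0), (6.0, 5.0)]
y = [3.1, 6.8, 7.2, 11.1, 10.9, 15.2]

coefs = fit_ols(rows, y)
print(f"{'case':>4} {'actual':>8} {'predicted':>10} {'error':>8}")
for i, (row, actual) in enumerate(zip(rows[:5], y[:5]), start=1):
    pred = predict(coefs, row)
    print(f"{i:>4} {actual:>8.2f} {pred:>10.2f} {actual - pred:>8.2f}")
```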

3) Analysis of textbook prices of various subjects on Amazon. Consider the Amazon Textbook
section (https://www.amazon.com/Best-Sellers-Books-Textbooks/). Compare the
average prices of new paperback books for sale across subjects (e.g. Business and
Finance, Communication and Journalism, Computer Science, Engineering, Medical and
Health Sciences, Social Sciences etc.) using pairwise t-tests and ANOVA. Then relate
y = price ($) to relevant characteristics, e.g. number of total ratings, number of stars, years
since publication etc., via simple and multiple regression. Use your multiple regression
model to predict the price of a textbook and compare this prediction with the actual
price of that book. Make 5 such predictions and present the information in a table. Perform
the analysis and write a report. Select at least 15 books from each group.

4) Fortune 500 is an annually updated list of the top 500 US companies
(https://fortune.com/fortune500/search/). For a particular year (e.g. 2021), consider the
variables Profits and Assets and construct a new variable ‘Return on Assets’ (ROA) by
dividing profits by assets and expressing the result as a percentage. The ROA is a measure of
profitability. Perform a comparative
descriptive analysis of the following sectors: (1) Technology (2) Financial (3) Food,
Beverages and Tobacco (4) Energy (5) Industrial (6) Hotels, Restaurants and Leisure.
These can be found by searching the relevant sector. The comparison is to be based on
descriptive statistics, histograms, charts etc. Then use the individual company data with
ROA as the y variable and conduct a simple and multiple regression analysis using x variables,
e.g. size of firm measured by ‘market capitalization’. Use your multiple regression model
to predict ROA for a firm and compare this prediction with the actual ROA of that firm. Make 5 such
predictions and present the information in a table. Perform the analysis and write a
report. Select at least 15 companies from each sector.
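
As a small illustration of the ROA construction described above, the sketch below computes ROA in percent and a few descriptive statistics per sector. The function name, sector labels and figures are hypothetical placeholders, not Fortune 500 data.

```python
from statistics import mean, median, stdev


def return_on_assets(profits, assets):
    """ROA in percent: profits divided by assets, multiplied by 100."""
    if assets == 0:
        raise ValueError("assets must be non-zero")
    return 100.0 * profits / assets


# Made-up (profits, assets) pairs in the same currency units, NOT real data.
sectors = {
    "Sector A": [(50.0, 1000.0), (30.0, 400.0), (12.0, 300.0)],
    "Sector B": [(5.0, 500.0), (40.0, 2000.0), (9.0, 450.0)],
}

for name, firms in sectors.items():
    roa = [return_on_assets(p, a) for p, a in firms]
    print(f"{name}: mean={mean(roa):.2f}% median={median(roa):.2f}% "
          f"st.dev={stdev(roa):.2f}")
```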

5) Car prices and their characteristics. The PakWheels website
(https://www.pakwheels.com/new-cars/pricelist) or the UAE’s website
(https://uae.dubizzle.com/motors/used-cars/) provides prices of various makes and models
of new and used cars and their characteristics, e.g. engine capacity (cc), mileage,
transmission type etc. Use the price data of groups of different makes and models of
cars and present tables of descriptive statistics and various graphs, e.g. histograms of prices,
bar charts etc. Then use the characteristics of a car to model y = price via simple and
multiple regression analysis with suitable x variables. Use your multiple regression model
to predict the price of a car and compare this prediction with the actual price of that car. Make 5
such predictions and present the information in a table. Perform the analysis and write a report.
Select at least 15 cars from each group.

6) How are world university rankings related to student/faculty ratio? This question can be
answered by looking at the latest QS TOPUNIVERSITIES
(https://www.topuniversities.com/universities) and other relevant sources. Compare the
average student/faculty ratio across countries/regions of the world or across subjects via
pairwise t-tests and ANOVA using at least 15 observations. Then for individual
universities relate y = student/faculty ratio to relevant explanatory variables, e.g. total
QS score, international students etc. Use your multiple regression model to predict the student/faculty
ratio for a university and compare this prediction with the actual ratio of that university.
Make 5 such predictions and present the information in a table. Perform the analysis and write
a report.

7) Real estate market of Karachi. The website https://www.zameen.com displays offered
prices of various plots, bungalows, apartments and other real estate for various cities of
Pakistan. Focus on a specific type of real estate, e.g. a 3-bed apartment in Karachi.
Compare randomly selected apartment prices across 5 selected areas or zones of
Karachi and observe various characteristics, e.g. square feet, storey and other relevant
quantitative and qualitative features. Compare the prices of apartments over the different
areas/zones and present tables of descriptive statistics and various graphs, e.g. histograms of
prices, bar charts etc. Then use the characteristics of the properties to model y = price via
regression analysis with suitable x variables, e.g. number of rooms, square feet etc. Use
your multiple regression model to predict the price of a property and compare this prediction
with the actual price of that property. Make 5 such predictions and present the information in
a table. Perform the analysis and write a report. Select at least 15 properties
from each area.

8) Comparing box office revenues of different types (e.g. action, crime, animated, comedy,
documentary etc.) of Hollywood movies released over the last 10 years. You can collect data
on at least 15 movies of each type and compare the average box office revenue across the
movie types via pairwise t-tests and ANOVA. Then you can relate y = box office
revenue of an individual movie to explanatory variables, e.g. number of weeks run and
movie type (coded as dummy variables). The data can be collected from relevant industry sources,
e.g. for the UK the Statistical Year Book for various years, which provides revenue data, can be accessed from
https://www.bfi.org.uk/industry-data-insights. Other relevant data can
also be accessed from relevant Wikipedia articles. Use your multiple regression model
to predict the box office revenue of a movie and compare this prediction with the actual box office
revenue of that movie. Make 5 such predictions and present the information in a table.
Perform the analysis and write a report. Select at least 15 movies from each
type.
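
Coding movie type as dummy variables, as mentioned above, can be sketched like this. The function name and the choice of baseline category are illustrative assumptions; one category is left out as the baseline so that the dummies are not perfectly collinear with the regression constant.

```python
def dummy_code(categories, baseline):
    """Turn a list of category labels into 0/1 dummy columns.

    The baseline category gets all zeros. Returns (levels, rows), where
    levels names the dummy columns and rows holds one 0/1 list per observation.
    """
    levels = sorted(set(categories) - {baseline})
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows


# Hypothetical movie types for four movies; "action" is the assumed baseline.
movie_types = ["action", "comedy", "crime", "action"]
levels, dummies = dummy_code(movie_types, baseline="action")
print(levels)
for movie_type, row in zip(movie_types, dummies):
    print(movie_type, row)
```

Each dummy coefficient in the regression is then read as the average revenue difference between that movie type and the baseline type, holding the other x variables fixed.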

9) Consider estimating risk and return and classifying the riskiness of stocks listed on the US
stock market. The website https://finance.yahoo.com/lookup/ contains the list of symbols
from which you can select company names (select at least 10 companies). Select the desired
companies and go to historical data. In the ‘Time Period’ select at least 10 years, starting
from Jan 2008 to present. In the ‘Frequency’ select monthly and click Apply. Thus your
analysis is based on monthly data of at least 10 years. Download the closing price (5th column) data in
Excel format. From prices calculate the return as the percent change. Also select the data on closing
monthly prices for the market index, i.e. the S&P 500 index (symbol: ^GSPC). For each company estimate the regression
$R_{it} = \beta_0 + \beta_1 R_{mt} + \epsilon_{it}$, where i stands for the ith company, t stands for the tth month and $R_{mt}$ is the return on the market index.
Here $\beta_1$ provides an estimate of the systematic risk of the security, where a value greater than 1
indicates an aggressive or risky security and a $\beta_1$ of less than 1 indicates a defensive
security, i.e. one less risky than the market. For all the companies compute descriptive statistics
and present them in a table. Compare the companies with respect to their financial
soundness for investors. For each stock compute beta, the R-Sq of the regression,
Sharpe Ratio = $\frac{\bar{R} - R_f}{\text{St.Dev}(R)}$ and Treynor’s Ratio = $\frac{\bar{R} - R_f}{\text{Beta}}$, where beta is the systematic risk of
the security as estimated above and $R_f$ is the risk-free rate of return, e.g. on 30-day treasury
bills. These are measures of risk-adjusted returns and thus measure the investment attractiveness
of stocks. You can work on the full sample period and some sub-periods, e.g. before 9/11, after
9/11 or pre/post the global financial crisis of 2007 etc., to investigate what changes are brought
about in the risk profile of companies.
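
The calculations in this topic (percent returns, the market-model beta and R-Sq, and the Sharpe and Treynor ratios) can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up prices, not Yahoo Finance data. It assumes that the risk-free rate is expressed in the same units as the returns (percent per month).

```python
from statistics import mean, stdev


def pct_returns(prices):
    """Percent change between consecutive closing prices."""
    return [100.0 * (p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]


def market_model(stock_ret, market_ret):
    """OLS of stock returns on market returns.

    Returns (beta0, beta1, r_squared) for R_it = beta0 + beta1 * R_mt + e_it.
    """
    mx, my = mean(market_ret), mean(stock_ret)
    sxx = sum((x - mx) ** 2 for x in market_ret)
    syy = sum((y - my) ** 2 for y in stock_ret)
    sxy = sum((x - mx) * (y - my) for x, y in zip(market_ret, stock_ret))
    beta1 = sxy / sxx
    beta0 = my - beta1 * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return beta0, beta1, r_squared


def sharpe_ratio(returns, rf):
    """(mean return - risk-free rate) / standard deviation of returns."""
    return (mean(returns) - rf) / stdev(returns)


def treynor_ratio(returns, rf, beta):
    """(mean return - risk-free rate) / beta."""
    return (mean(returns) - rf) / beta


# Made-up monthly closing prices, NOT real data.
stock_prices = [50.0, 52.0, 51.0, 55.0, 54.0, 58.0]
index_prices = [1000.0, 1020.0, 1010.0, 1050.0, 1040.0, 1080.0]
rf = 0.1  # assumed risk-free rate in percent per month

r_stock = pct_returns(stock_prices)
r_market = pct_returns(index_prices)
beta0, beta1, r_sq = market_model(r_stock, r_market)
label = "aggressive" if beta1 > 1 else "defensive"
print(f"beta={beta1:.3f} ({label}), R-Sq={r_sq:.3f}")
print(f"Sharpe={sharpe_ratio(r_stock, rf):.3f}, "
      f"Treynor={treynor_ratio(r_stock, rf, beta1):.3f}")
```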

10) Do similar work for Pakistan’s stock exchange data (https://www.psx.com.pk). Go to the
data portal. Select historical data and symbol search. The data access is not user-friendly.
However, Dr. Attaullah Shah of the Institute of Management Sciences, Peshawar maintains
historical prices and other data at www.opendoors.pk. You have to make an effort to collect
monthly closing price data. KSE100 index historical data are also given. Stock price data
can also be accessed from the Business Recorder website
(https://markets.brecorder.com/market-data/company-archives.html). Alternatively, the
DataStream database can be used to collect prices. This database is available at the IBA library.
11) Risk and Returns of Pakistani Industries: Collect the historical monthly closing stock price
data of firms belonging to different sectors of Pakistan Stock Exchange. The sector and
constituent firms can be selected as follows:
https://markets.brecorder.com/market-data/information-by-sectors.html

Automobile Assembler: Atlas Honda, Dewan Farooq Motors, Pak Suzuki Motor Company, Indus
Motor Company, Millat Tractors, Hino Pak Motors, Al-Ghazi Tractors, Sazgar
Engineering Works
Cement: Attock Cement, Bestway Cement, D.G. Khan Cement, Dadabhoy Cement, Dewan
Cement, Fauji Cement Company, Fecto Cement, Lucky Cement
Chemicals: Berger Paints Pakistan, Biafo Industries, ICI Pakistan, Pakistan PVC, Colgate
Palmolive, Engro Polymer and Chemicals, Ittehad Chemicals, Sitara Chemicals
Engineering: Ados Pakistan, Agha Steel Industries, Aisha Steel Mills, Bolan Castings, Huffaz
Seamless Pipes Industries, Pakistan Engineering Corporation, Quality Steel Works, Ittefaq
Iron Industries
Oil and Gas: Mari Petroleum Company, Oil and Gas Development Company (OGDC), Pakistan
Oilfields, Pakistan Petroleum, Attock Petroleum, Shell Pakistan, Sui Northern, Sui
Southern
Pharmaceuticals: Abbott Laboratories, Citi Pharma, Ferozsons Laboratories, GlaxoSmithKline
Pakistan, The Searle Company, Wyeth Pakistan, Macter International, Sanofi-Aventis
Pakistan
Consider monthly closing price data from Jan 2015 to the latest available month of 2021
(https://markets.brecorder.com/market-data/company-archives.html). Select the term as monthly
and choose dates from Jan 2015 to the current month. If complete data for a firm are not available,
you can select another firm from the same sector for which complete data are available. After
collecting price data, compute monthly returns (percentage change in price) for each firm. Then
form (equally weighted) portfolio returns for each sector (average of returns for firms within a
sector). You should get a time series variable of portfolio returns for each of the six sectors.
Also collect data on the closing value of the KSE-100 Index and compute returns (market returns).
Fixing a sector of your choice (e.g. Chemicals), perform t-tests of equality of mean returns between that
sector and each of the other sectors. Then conduct an ANOVA for testing the equality of all sector
mean returns (assuming constant variance of returns for each sector portfolio). Next conduct
a regression analysis of individual firm returns on market returns and assess the riskiness of
each firm (beta risk). You can also use other variables, e.g. inflation or exchange rate, in the
regression, for which you need to collect monthly data on these variables (e.g. from IFS). Write
a report of your analysis.
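
Forming an equally weighted sector portfolio, as described above, amounts to averaging the firms' returns month by month. The sketch below illustrates this. The function name and the firm labels are hypothetical, and the return figures are made-up placeholders, not PSX data.

```python
def equal_weighted_portfolio(firm_returns):
    """Average the monthly returns across the firms of one sector.

    firm_returns maps a firm label to its list of monthly returns.
    All lists must cover the same months in the same order.
    """
    series = list(firm_returns.values())
    if not series:
        raise ValueError("at least one firm is required")
    n_months = len(series[0])
    if any(len(s) != n_months for s in series):
        raise ValueError("all firms need returns for the same months")
    return [sum(s[t] for s in series) / len(series) for t in range(n_months)]


# Made-up monthly percent returns for three placeholder firms in one sector.
sector_returns = {
    "Firm A": [1.0, -2.0, 3.0, 0.5],
    "Firm B": [2.0, 0.0, 1.0, -0.5],
    "Firm C": [0.0, -1.0, 2.0, 3.0],
}
print(equal_weighted_portfolio(sector_returns))
```

Repeating this for each of the six sectors gives the six portfolio return series used in the t-tests and the ANOVA.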

12) Consider the popular macroeconomic data set ‘Penn World Tables latest version 10’
(http://www.rug.nl/ggdc/productivity/pwt/). Download the data in Excel. This database
contains macroeconomic data for about 180 countries from 1950 to 2019. Consider the
variable Human Capital index, which measures the quality of human capital based on
education and returns to education. First sort the data with respect to year from largest to
smallest so that you have the 2019 data for all countries on top. Collect data on human
capital (hc) for all countries for 2019 only. Compare and analyze human capital quality
across groups of countries, e.g. across the continents of the world or across development levels,
i.e. with respect to per capita income. (You can use the following World Bank site to classify
countries with respect to per capita income:
https://datahelpdesk.worldbank.org/knowledgebase/articles/378833-how-are-the-income-group-thresholds-determined.
Check the worksheet ‘Country Analytical History’.) Use 2019
data to classify countries as Low income (L), Lower middle income (LM), Upper middle
income (UM) and High income (H). For each group of countries collect the data on human
capital from the PWT for the year 2019 and compare ‘hc’ across the groups of countries. Next
you can link this variable with another interesting variable, e.g. size of government measured
by the proportion of government expenditure in GDP (csh_g), to examine the relationship using
simple and multiple regression analysis. Use your multiple regression model to predict hc for
a country and compare this prediction with the actual hc of that country. Make 5 such
predictions and present the information in a table. Perform the analysis and write a report.
Select at least 15 countries from each group.

13) Consider the cricket records of players and countries available at
(http://www.howstat.com/cricket/Statistics/Players/PlayerMenu.asp). This database
contains records of all three formats of cricket. Search for your desired player in the search
box. Then in the Performance Analysis go to Career Innings-Batting (Detailed). Compare the
selected players’ performance using descriptive statistics and plots, e.g. histograms, box
plots. Also measure each player’s consistency using the CV. The performance should be based on
the latest record of at least 50 innings. The comparison (via pairwise t-tests and ANOVA) should
be meaningful, e.g. compare a player’s average ODI score at a particular position, i.e. ‘opener’,
to another opener from the same or another country. Then relate the player’s score in an innings
(y) to relevant x variables (player’s age etc.) in a regression framework. Use your
multiple regression model to predict the score for a player and compare this prediction with the actual
score of that player. Make 5 such predictions and present the information in a table.
Perform the analysis and write a report.
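
The consistency measure mentioned above, the coefficient of variation (CV), is the standard deviation expressed as a percentage of the mean. A minimal sketch, using a hypothetical function name and made-up innings scores rather than real player records:

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """CV in percent: sample standard deviation divided by the mean, times 100."""
    avg = mean(scores)
    if avg == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(scores) / avg


# Made-up innings scores for two placeholder players, NOT real records.
player_a = [10, 20, 30, 40, 50]
player_b = [28, 29, 30, 31, 32]
print(f"Player A CV: {coefficient_of_variation(player_a):.1f}%")
print(f"Player B CV: {coefficient_of_variation(player_b):.1f}%")
```

A lower CV indicates a more consistent player: both placeholder players above average 30, but the second one's scores vary far less around that average.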

14) Estimate and compare the total time in hours that IBA students spent in the last week
on social media or watching electronic media, across gender and across year of study. For
this you have to design a questionnaire-based survey asking students about the relevant
aspects. Input the data in Excel and analyze it using descriptive statistics and regression analysis.
Get the survey form approved by me before conducting the survey.

15) Consider the cross-country data on the Education Index (http://hdr.undp.org/en/data#).
Download the data set in Excel. This variable measures the quality of education in a
country; a higher value indicates better education quality. The data are provided for the years
1990 through 2018. First consider the year 2018. Sort the data by geographic
region. Compare the Education Index over these regions via descriptive statistics, tables,
charts, box plots, histograms etc. Also compare the Education Index for the years 1990 and
2018. Also perform simple and multiple regression analysis to relate y = ‘education
index’ to other variables, e.g. x = HDI index ranking. Use your multiple regression model
to predict the education index for a country and compare this prediction with the actual education
index of that country. Make 5 such predictions and present the information in a table.
Perform the analysis and write a report.

16) Consider comparing and investigating relationships between price and other characteristics
within a brand or across brands of products, e.g. between mobile phones. The website
www.mega.pk contains such information. Select a product, e.g. laptops or mobiles, and a
brand, e.g. Dell (core i5 or any particular core). Note down prices (PKR) and some
important characteristics which can be related to price, e.g. processing speed (GHz), RAM,
hard disk storage. For mobiles (Samsung, Apple, Huawei, Lenovo etc.), select x variables
e.g. screen size (inches), battery life (mAh), camera quality (MP). Perform a descriptive
analysis of prices and comparisons across brands with similar characteristics. Within each
brand you can investigate the relationship of the dependent variable price with related
characteristics, e.g. price vs size, battery life, camera quality etc. Investigate which
characteristic is a good predictor of price, e.g. using the R-Sq of the regression and through scatter
plots. Then select any product, e.g. a particular mobile from this website, note down its
features (x variables), use your multiple regression model to predict its price and then
compare this prediction with the actual price given for the product. Make 5 more such
predictions and present the information in a table.

17) The Quality of Government (QoG) Institute conducts surveys on different aspects of the
quality of government institutions, e.g. transparency, level of corruption etc., for different
countries. Consider the Expert Survey data provided in the csv file on QoG Country-Level
Survey Data 2015 (https://qog.pol.gu.se/data/datadownloads/qogexpertsurveydata). The
data correspond to different questions on government quality. The questions are available
at (https://www.qogdata.pol.gu.se/data/qog_exp_15.pdf). Consider Q-9a, which is related
to the percentage of aid actually delivered to needy persons. This percentage is
provided for all the countries and can be considered an estimate of the level of transparency in
the public sector. Provide a descriptive analysis including histograms, charts, summary
statistics and box plots across the geographic regions (ht-region). Also conduct a regression
analysis relating y = ‘level of transparency’ to other variables you think might be related
to it, e.g. an x variable in Q2a (to what extent merit and skills are considered for public
sector employment). Write a report of your findings.

18) Analysis of financial ratios: Consider the SBP database: Analysis of Financial Statement
for Non-Financial Companies. The latest document (2014-2019) is available at:
(https://www.sbp.org.pk/departments/stats/FSA(Non).pdf). The document contains annual
data for each company and overall industry sector for all the non-financial firms listed on
Pakistan Stock Exchange. (There is a separate document for financial firms). The list of
sectors and constituent firms is on p-450 of the document. Consider any 6 sectors from
these sectors (1) Spinning, Weaving, Finishing, (2) Sugar (3) Chemicals &
Pharmaceuticals (4) Food Products (5) Manufacturing (6) Cement (7) Motor Vehicles,
Trailers & Auto parts (8) Fuel & Energy (9) Information, Comm. & Transport (10) Paper,
Paperboard & Products. Select any 8 firms of your choice from each of your selected
sectors. Consider the data for any latest year e.g. 2019. Consider any profitability ratio e.g.
Return on Equity or Return on Assets. You should have profitability ratio data on 48 firms
(6 sectors * 8 firms). Perform a graphical and descriptive comparative analysis of the
selected profitability ratio over the selected sectors. Compare the average
profitability ratio between pairs of sectors using t-tests, and test the equality of average profitability across all the selected sectors
using ANOVA. Then, using individual firms’ data with y = profitability ratio, conduct
a regression analysis using appropriate x variables, e.g. sales (F1), EBIT (F6), financial
leverage (P4) etc. Use your multiple regression model to make predictions for a firm that
is not in your selected sample. Predict profitability ratio for this firm and compare this
prediction with actual profitability ratio of this firm. Make 5 such predictions and present
the information in a table. Prepare a report of your analysis.
