Monday, June 17, 2013

plot lm() results with ggplot2 in R from Susan E Johnston Blog

A quick and easy function to plot lm() results with ggplot2 in R

Sometimes it's nice to quickly visualise the data that went into a simple linear regression, especially when you are performing lots of tests at once. Here is a quick and dirty solution with ggplot2 to create the following plot:

Let's try it out using the iris dataset in R:
data(iris)
head(iris)
##   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width    Species
## 1     5.1           3.5          1.4           0.2         setosa
## 2     4.9           3.0          1.4           0.2         setosa
## 3     4.7           3.2          1.3           0.2         setosa
## 4     4.6           3.1          1.5           0.2         setosa
## 5     5.0           3.6          1.4           0.2         setosa
## 6     5.4           3.9          1.7           0.4         setosa
Create fit1, a linear regression of Sepal.Length and Petal.Width. Normally we would quickly plot the data in R base graphics:
fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
summary(fit1)
## 
## Call:
## lm(formula = Sepal.Length ~ Petal.Width, data = iris)
## 
## Residuals:
##    Min       1Q      Median     3Q       Max 
## -1.3882   -0.2936   -0.0439   0.2643   1.3452 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|) 
## (Intercept)    4.7776    0.0729    65.5   <2e-16 ***
## Petal.Width    0.8886    0.0514    17.3   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 0.478 on 148 degrees of freedom
## Multiple R-squared: 0.669, Adjusted R-squared: 0.667 
## F-statistic: 299 on 1 and 148 DF, p-value: <2e-16 
## 
plot(Sepal.Length ~ Petal.Width, data = iris)
abline(fit1)

However, we can create a quick function that will pull the data out of a linear regression, and return important values (R-squares, slope, intercept and P value) at the top of a nice ggplot graph with the regression line:
ggplotRegression <- function (fit) {

require(ggplot2)

ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) + 
  geom_point() +
  stat_smooth(method = "lm", col = "red") +
  opts(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
                     "; Intercept =",signif(fit$coef[[1]],5 ),
                     "; Slope =",signif(fit$coef[[2]], 5),
                     "; P =",signif(summary(fit)$coef[2,4], 5)))
}
After specifying this function, all you would have to run is:
fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
ggplotRegression(fit1)
or even just
ggplotRegression(lm(Sepal.Length ~ Petal.Width, data = iris))

Disclaimer: This is not suitable for non-normal distributions or categorical variables!
About susanejohnston
Postdoctoral researcher at the University of Turku, interested in ecology, evolution and genetics.

Write Latex Article or Presentation www.writelatex.com

Source: https://www.writelatex.com/

35,290,776 pages compiled
  • Notice to all geeks: writelatex.com is awesome and it just saved my life. Check it out!
WriteLaTeX is free! Start writing now with one click. No sign-up required.

Getting Started

Many Eyes software for interactive data visualization

Try our featured visualizations

Most Commonly Reported Symptoms of MS
Most Commonly Reported Symptoms of MS (data from WHO, 2008),,bar-chart
Stalls at Petaling Street Flea Markets
Stalls offered at Petaling Street Flea Markets,,pie-chart
Tornadoes in Oklahoma by Month
Monthly Tornadoes for Oklahoma 1950 - 2012,weather oklahoma tornado,stack-graph
Data from WHO, 2008 Kuala Lampur 1950 - 2012
by aholik by KateKoedijk by MattSalad.com
Tourism to Australia
Tourism to Australia in March 2013,,bubble-chart
Africa Report Word Tree
Africa Report Word,Report Progress,word-tree
Box Office Gross of the Highest Grossing Films
The box office gross of the 1990-2012 highest grossing pictures.,gross movies oscars office million box billion,bar-chart
March 2013 2013 Report - Africa Progress Panel 1990 to 2012  

R Code – Graph Efficient Frontier


R Code – Graph Efficient Frontier

The following code, which is referenced here, can be cut and paste into the R console.
In order to load several of these libraries, make sure that you have installed the associated packages.  For assistance, please refer to the instructions contained here.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
# Economist at Large
# Modern Portfolio Theory
# Use solve.QP to solve for efficient frontier
# Last Edited 5/3/13
 
# This file uses the solve.QP function in the quadprog package to solve for the
# efficient frontier.
# Since the efficient frontier is a parabolic function, we can find the solution
# that minimizes portfolio variance and then vary the risk premium to find
# points along the efficient frontier. Then simply find the portfolio with the
# largest Sharpe ratio (expected return / sd) to identify the most
# efficient portfolio
 
library(stockPortfolio) # Base package for retrieving returns
library(ggplot2) # Used to graph efficient frontier
library(reshape2) # Used to melt the data
library(quadprog) #Needed for solve.QP
 
# Create the portfolio using ETFs, incl. hypothetical non-efficient allocation
stocks <- c(
 "VTSMX" = .0,
 "SPY" = .20,
 "EFA" = .10,
 "IWM" = .10,
 "VWO" = .30,
 "LQD" = .20,
 "HYG" = .10)
 
# Retrieve returns, from earliest start date possible (where all stocks have
# data) through most recent date
returns <- getReturns(names(stocks[-1]), freq="week") #Currently, drop index
 
#### Efficient Frontier function ####
eff.frontier <- function (returns, short="no", max.allocation=NULL,
 risk.premium.up=.5, risk.increment=.005){
 # return argument should be a m x n matrix with one column per security
 # short argument is whether short-selling is allowed; default is no (short
 # selling prohibited)max.allocation is the maximum % allowed for any one
 # security (reduces concentration) risk.premium.up is the upper limit of the
 # risk premium modeled (see for loop below) and risk.increment is the
 # increment (by) value used in the for loop
 
 covariance <- cov(returns)
 print(covariance)
 n <- ncol(covariance)
 
 # Create initial Amat and bvec assuming only equality constraint
 # (short-selling is allowed, no allocation constraints)
 Amat <- matrix (1, nrow=n)
 bvec <- 1
 meq <- 1
 
 # Then modify the Amat and bvec if short-selling is prohibited
 if(short=="no"){
 Amat <- cbind(1, diag(n))
 bvec <- c(bvec, rep(0, n))
 }
 
 # And modify Amat and bvec if a max allocation (concentration) is specified
 if(!is.null(max.allocation)){
 if(max.allocation > 1 | max.allocation <0){
 stop("max.allocation must be greater than 0 and less than 1")
 }
 if(max.allocation * n < 1){
 stop("Need to set max.allocation higher; not enough assets to add to 1")
 }
 Amat <- cbind(Amat, -diag(n))
 bvec <- c(bvec, rep(-max.allocation, n))
 }
 
 # Calculate the number of loops
 loops <- risk.premium.up / risk.increment + 1
 loop <- 1
 
 # Initialize a matrix to contain allocation and statistics
 # This is not necessary, but speeds up processing and uses less memory
 eff <- matrix(nrow=loops, ncol=n+3)
 # Now I need to give the matrix column names
 colnames(eff) <- c(colnames(returns), "Std.Dev", "Exp.Return", "sharpe")
 
 # Loop through the quadratic program solver
 for (i in seq(from=0, to=risk.premium.up, by=risk.increment)){
 dvec <- colMeans(returns) * i # This moves the solution along the EF
 sol <- solve.QP(covariance, dvec=dvec, Amat=Amat, bvec=bvec, meq=meq)
 eff[loop,"Std.Dev"] <- sqrt(sum(sol$solution*colSums((covariance*sol$solution))))
 eff[loop,"Exp.Return"] <- as.numeric(sol$solution %*% colMeans(returns))
 eff[loop,"sharpe"] <- eff[loop,"Exp.Return"] / eff[loop,"Std.Dev"]
 eff[loop,1:n] <- sol$solution
 loop <- loop+1
 }
 
 return(as.data.frame(eff))
}
 
# Run the eff.frontier function based on no short and 50% alloc. restrictions
eff <- eff.frontier(returns=returns$R, short="no", max.allocation=.50,
 risk.premium.up=1, risk.increment=.001)
 
# Find the optimal portfolio
eff.optimal.point <- eff[eff$sharpe==max(eff$sharpe),]
 
# graph efficient frontier
# Start with color scheme
ealred <- "#7D110C"
ealtan <- "#CDC4B6"
eallighttan <- "#F7F6F0"
ealdark <- "#423C30"
 
ggplot(eff, aes(x=Std.Dev, y=Exp.Return)) + geom_point(alpha=.1, color=ealdark) +
 geom_point(data=eff.optimal.point, aes(x=Std.Dev, y=Exp.Return, label=sharpe),
 color=ealred, size=5) +
 annotate(geom="text", x=eff.optimal.point$Std.Dev,
 y=eff.optimal.point$Exp.Return,
 label=paste("Risk: ",
 round(eff.optimal.point$Std.Dev*100, digits=3),"\nReturn: ",
 round(eff.optimal.point$Exp.Return*100, digits=4),"%\nSharpe: ",
 round(eff.optimal.point$sharpe*100, digits=2), "%", sep=""),
 hjust=0, vjust=1.2) +
 ggtitle("Efficient Frontier\nand Optimal Portfolio") +
 labs(x="Risk (standard deviation of portfolio)", y="Return") +
 theme(panel.background=element_rect(fill=eallighttan),
 text=element_text(color=ealdark),
 plot.title=element_text(size=24, color=ealred))
ggsave("Efficient Frontier.png")

Source:  (http://economistatlarge.com/portfolio-theory/r-optimized-portfolio/r-code-graph-efficient-frontier)

Create ggplot2 graphs with http://www.yeroon.net/ggplot2/

Create ggplot2 graphs with http://www.yeroon.net/ggplot2/

Statistics C183/C283: Statistical Models in Finance

Statistics C183/C283: Statistical Models in Finance

Announcements

  • First lecture is on Monday, 01 April 2013.
    Location: MWF MS 5200.
    Time: 11:00 - 11:50.
    See you then!
For the course syllabus click here.

Useful links:

Statistics Online Computational Resource (SOCR):
  • http://www.socr.ucla.edu

    It's online, therefore it exists!

    SOCR Educational Materials:
  • http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials.

    Useful link:
  • Probability and Statistics EBook.
  • Download R and packages.

    Labs

    Handouts

  • 1. Historical note (from "Against the Gods, the Remarkable Story of Risk", by Peter Berstein, Wiley 1998).
  • 2. Introduction.
  • 3. Diversification - a simple example.
  • 4. Correlation coefficient and protfolio risk.
  • 5. Basics.
  • 6. Example with two stocks (portfolio possibilities curve, efficient frontier).
    For handout #6 also download the R commands and the files with the data using the following 3 links:
    a. R commands.
    b. First data set (table1.csv).
    c. Second data set (table2.csv).
  • 7. Why diversfication works.
  • 8. Short sales.
  • 9. An Analytic Derivation of the Efficient Portfolio Frontier (JFQA, Robert Merton, 1972).
  • 10. Efficient frontier with risk free lending and borrowing.
  • 11. How to find the point of tangency (see also handout 10).
  • 12. Point of tangency - example.
  • 13. Lab - example.
  • 14. R code for point of tangency.
  • 15. Sensitivity analysis.
  • 16. Trace out the efficient frontier - example.
  • 17. Trace out the efficient frontier - R code for example of handout 16.
  • 18. Single index model - summary.
  • 19. Adjusting the betas.
  • 20. Adjusting the betas using Vasicek's technique.
  • 21. Adjusting the betas using Blume's technique.
  • 22. Betas and their regression tendencies (Blume).
  • 23. Are betas best?
  • 24. Introduction to stockPortfolio package.
  • 25. How to supply your own data to use the stockPortfolio package.
  • 26. Project (more details will be discussed in class).
  • 27. Simple criteria for optimal portfolio selection.
  • 28. Single index model steps.
  • 29. Single index model - example with short sales allowed.
  • 30. Single index model - example with short sales not allowed.
  • 31. Single index model and optimization procedure give the same answer.
  • 32. Single index model - example using R.
  • 33. Single index model - example using R (same as handout 32).
  • 34. Single index model - Kuhn Tucker conditions when short sales are not allowed.
  • 35. Constant correlation model steps.
  • 36. Constant correlation - example with short sales allowed.
  • 37. Constant correlation - example with short sales not allowed.
  • 38. Constant correlation model - R example.
  • 39. Single index and constant correlation models using the stockPortfolio package.
  • 40. Trace out the efficient frontier when risk free lending and borrowing does not exist - an example using the stockPortfolio package.
  • 41. Simple criteria for optimal portfolio selection: Tracing out the efficient frontier.
  • 42. Simple criteria for optimal portfolio selection: The multi group case.
  • 43. Multigroup model.
  • 44. Multi group model using the stockPortfolio package.
  • 45. Multi-index model.
  • 46. Practice exam.
  • 47. Practice problems.
  • 48. Modern portfolio theory, 1950 to date.
  • 49. Portfolio performance.
  • 50. Plot 1 (see handout #46).
  • 51. Plot 2 (see handout #46).
  • 52. Options basics.
  • 53. Smiley faces - call option.
  • 54. Smiley faces - put option.
  • 55. Options - some simple examples.
  • 56. Payoff and profit for writer and buyer - call option.
  • 57. Payoff and profit for writer and buyer - put option.
  • 58. Lower and upper bounds for call and put options and put call parity.
  • 59. Trading strategies using options.
  • 60. Butterfly example using R.
  • 61. Access the SOCR applet.
  • 62. Trading strategies using options - Excel file.
  • 63. Binomial option pricing model - introduction.
  • 64. Binomial option pricing model - example.
  • 65. A model for stock prices.
  • 66. Monte Carlo simulaton of a stock's path.
  • 67. Ito process and Black-Scholes model.
  • 68. Estimating volatility - Excel file.
  • 69. Options - summary.
  • 70. Implied volatility.

    Homework

  • Homework 1: Due on Friday, 12 April.
  • Homework 2: Due on Friday, 19 April.
  • Homework 3: Due on Friday, 03 May.
  • Homework 4: Due on Friday, 17 May.
  • Homework 5: Due on Friday, 31 May.
  • Homework 6: Due on Wednesday, 05 June. 
  • Source:http://www.stat.ucla.edu/~nchristo/statistics_c183_c283/
  • Portfolio Optimization using Single Index Model in R

    Portfolio Optimization using Single Index Model in R

    # Load Libraries:
    library(fEcofin)
    library(fPortfolio)
    
    # Load Data:
    data(berndtInvest)
    berndt <- as.timeSeries(berndtInvest)
    names(data)
    data <- berndt[, -c(10, 17)]
    factors <- berndt[, 10]
    attr(data, "factors") <- factors
    
    # Sharpe's Single Index Factor Model:
    sharpeFactorEstimator <- 
    function(x, spec=NULL, ...)
    {
        # Sharpe Single Index Model:
        data <- getDataPart(x)
        factors <- attr(x, "factors")
        nScenarios <- nrow(data)
        X.mat <- cbind(rep(1, times=nScenarios), factors)
        G.hat <- solve(qr(X.mat), data)
        beta.hat <- G.hat[2, ]
        eps.hat <- data - X.mat %*% G.hat
        diagD.hat <- diag(crossprod(eps.hat) / (nScenarios-2))
        mu <- G.hat[1, ] + G.hat[2, ] * colMeans(factors)  
        Sigma <- var(factors)[[1]] * (beta.hat %o% beta.hat) + diag(diagD.hat)
        
        # Return Value:
        list(mu = mu, Sigma = Sigma)
    }
    
    # Solve Long Only Markowitz Model:
    markowitz <- portfolioFrontier(data)
    
    # Solve Long Only Sharpe Single Index Model:
    spec <- portfolioSpec()
    setEstimator(spec) <- "sharpeFactorEstimator"
    sharpe <- portfolioFrontier(data, spec)
    
    # Plot:
    tailoredFrontierPlot(markowitz)
    points(frontierPoints(sharpe), col = "steelblue")
     
    FUNCTION:                     DESCRIPTION:
    sharpeFactorEstimator()       Sharpe's single index model
    macroFactorEstimator()        General macroeconomic factor model
    industryFactorEstimator()     Barra industry factor model
    statisticalFactorEstimator()  Statistical factor model
    pcaFactorEstimator()          PCA statistical factor model
    asympcaFactorEstimator()      Asymptotic PCA statistical factor model
    

     Source: (https://www.rmetrics.org/blog/FactorModeling)
    References:
    Würtz, Chalabi, Chen, Ellis, Portfolio Optimization with R/Rmetrics, Rmetrics and Finance Online Publishing, Zurich 2009
    Zivot and Wang, Modeling Financial Time Series with S-PLUS. Springer Publishing New York, 2nd edition 2006
    Any Questions?
    Do not hesitate to contact us or come to the Meielisalp 2010 Workshop to meet the authors.