Monday, June 17, 2013

plot lm() results with ggplot2 in R from Susan E Johnston Blog

← Find and Replace in R, Part 1: recode in the library car

Find and Replace in R, Part 2: How to recode many values at once. →

A quick and easy function to plot lm() results with ggplot2 in R

August 9, 2012 2 Comments

Sometimes it's nice to quickly visualise the data that went into a simple linear regression, especially when you are performing lots of tests at once. Here is a quick and dirty solution with ggplot2 to create the following plot:

Let's try it out using the iris dataset in R:

data(iris)
head(iris)

##   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width    Species
## 1     5.1           3.5          1.4           0.2         setosa
## 2     4.9           3.0          1.4           0.2         setosa
## 3     4.7           3.2          1.3           0.2         setosa
## 4     4.6           3.1          1.5           0.2         setosa
## 5     5.0           3.6          1.4           0.2         setosa
## 6     5.4           3.9          1.7           0.4         setosa

Create fit1, a linear regression of Sepal.Length and Petal.Width. Normally we would quickly plot the data in R base graphics:

fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
summary(fit1)

## 
## Call:
## lm(formula = Sepal.Length ~ Petal.Width, data = iris)
## 
## Residuals:
##    Min       1Q      Median     3Q       Max 
## -1.3882   -0.2936   -0.0439   0.2643   1.3452 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|) 
## (Intercept)    4.7776    0.0729    65.5   <2e-16 ***
## Petal.Width    0.8886    0.0514    17.3   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 0.478 on 148 degrees of freedom
## Multiple R-squared: 0.669, Adjusted R-squared: 0.667 
## F-statistic: 299 on 1 and 148 DF, p-value: <2e-16 
##

plot(Sepal.Length ~ Petal.Width, data = iris)
abline(fit1)

However, we can create a quick function that will pull the data out of a linear regression, and return important values (R-squares, slope, intercept and P value) at the top of a nice ggplot graph with the regression line:

ggplotRegression <- function (fit) {

require(ggplot2)

ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) + 
  geom_point() +
  stat_smooth(method = "lm", col = "red") +
  opts(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
                     "; Intercept =",signif(fit$coef[[1]],5 ),
                     "; Slope =",signif(fit$coef[[2]], 5),
                     "; P =",signif(summary(fit)$coef[2,4], 5)))
}

After specifying this function, all you would have to run is:

fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
ggplotRegression(fit1)

or even just

ggplotRegression(lm(Sepal.Length ~ Petal.Width, data = iris))

Disclaimer: This is not suitable for non-normal distributions or categorical variables!

Filed under ggplot2, R

About susanejohnston
Postdoctoral researcher at the University of Turku, interested in ecology, evolution and genetics.

Write Latex Article or Presentation www.writelatex.com

Source: https://www.writelatex.com/

Online collaborative LaTeX editor with integrated rapid preview.

35,290,776 pages compiled

Notice to all geeks: writelatex.com is awesome and it just saved my life. Check it out!

WriteLaTeX is free! Start writing now with one click. No sign-up required.

Getting Started

Many Eyes software for interactive data visualization

Explore

Source: http://www-958.ibm.com/software/analytics/manyeyes/
Participate
Learn more

Try our featured visualizations

Most Commonly Reported Symptoms of MS	Stalls at Petaling Street Flea Markets	Tornadoes in Oklahoma by Month
Data from WHO, 2008	Kuala Lampur	1950 - 2012
by aholik	by KateKoedijk	by MattSalad.com

Tourism to Australia	Africa Report Word Tree	Box Office Gross of the Highest Grossing Films
March 2013	2013 Report - Africa Progress Panel	1990 to 2012

R Code – Graph Efficient Frontier

The following code, which is referenced here, can be cut and paste into the R console.
In order to load several of these libraries, make sure that you have installed the associated packages. For assistance, please refer to the instructions contained here.

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

# Economist at Large

# Modern Portfolio Theory

# Use solve.QP to solve for efficient frontier

# Last Edited 5/3/13

# This file uses the solve.QP function in the quadprog package to solve for the

# efficient frontier.

# Since the efficient frontier is a parabolic function, we can find the solution

# that minimizes portfolio variance and then vary the risk premium to find

# points along the efficient frontier. Then simply find the portfolio with the

# largest Sharpe ratio (expected return / sd) to identify the most

# efficient portfolio

library(stockPortfolio) # Base package for retrieving returns

library(ggplot2) # Used to graph efficient frontier

library(reshape2) # Used to melt the data

library(quadprog) #Needed for solve.QP

# Create the portfolio using ETFs, incl. hypothetical non-efficient allocation

stocks <- c(

 "VTSMX" = .0,

 "SPY" = .20,

 "EFA" = .10,

 "IWM" = .10,

 "VWO" = .30,

 "LQD" = .20,

 "HYG" = .10)

# Retrieve returns, from earliest start date possible (where all stocks have

# data) through most recent date

returns <- getReturns(names(stocks[-1]), freq="week") #Currently, drop index

#### Efficient Frontier function ####

eff.frontier <- function (returns, short="no", max.allocation=NULL,

 risk.premium.up=.5, risk.increment=.005){

 # return argument should be a m x n matrix with one column per security

 # short argument is whether short-selling is allowed; default is no (short

 # selling prohibited)max.allocation is the maximum % allowed for any one

 # security (reduces concentration) risk.premium.up is the upper limit of the

 # risk premium modeled (see for loop below) and risk.increment is the

 # increment (by) value used in the for loop

 covariance <- cov(returns)

 print(covariance)

 n <- ncol(covariance)

 # Create initial Amat and bvec assuming only equality constraint

 # (short-selling is allowed, no allocation constraints)

 Amat <- matrix (1, nrow=n)

 bvec <- 1

 meq <- 1

 # Then modify the Amat and bvec if short-selling is prohibited

 if(short=="no"){

 Amat <- cbind(1, diag(n))

 bvec <- c(bvec, rep(0, n))

 }

 # And modify Amat and bvec if a max allocation (concentration) is specified

 if(!is.null(max.allocation)){

 if(max.allocation > 1 | max.allocation <0){

 stop("max.allocation must be greater than 0 and less than 1")

 }

 if(max.allocation * n < 1){

 stop("Need to set max.allocation higher; not enough assets to add to 1")

 }

 Amat <- cbind(Amat, -diag(n))

 bvec <- c(bvec, rep(-max.allocation, n))

 }

 # Calculate the number of loops

 loops <- risk.premium.up / risk.increment + 1

 loop <- 1

 # Initialize a matrix to contain allocation and statistics

 # This is not necessary, but speeds up processing and uses less memory

 eff <- matrix(nrow=loops, ncol=n+3)

 # Now I need to give the matrix column names

 colnames(eff) <- c(colnames(returns), "Std.Dev", "Exp.Return", "sharpe")

 # Loop through the quadratic program solver

 for (i in seq(from=0, to=risk.premium.up, by=risk.increment)){

 dvec <- colMeans(returns) * i # This moves the solution along the EF

 sol <- solve.QP(covariance, dvec=dvec, Amat=Amat, bvec=bvec, meq=meq)

 eff[loop,"Std.Dev"] <- sqrt(sum(sol$solution*colSums((covariance*sol$solution))))

 eff[loop,"Exp.Return"] <- as.numeric(sol$solution %*% colMeans(returns))

 eff[loop,"sharpe"] <- eff[loop,"Exp.Return"] / eff[loop,"Std.Dev"]

 eff[loop,1:n] <- sol$solution

 loop <- loop+1

 }

 return(as.data.frame(eff))

}

# Run the eff.frontier function based on no short and 50% alloc. restrictions

eff <- eff.frontier(returns=returns$R, short="no", max.allocation=.50,

 risk.premium.up=1, risk.increment=.001)

# Find the optimal portfolio

eff.optimal.point <- eff[eff$sharpe==max(eff$sharpe),]

# graph efficient frontier

# Start with color scheme

ealred <- "#7D110C"

ealtan <- "#CDC4B6"

eallighttan <- "#F7F6F0"

ealdark <- "#423C30"

ggplot(eff, aes(x=Std.Dev, y=Exp.Return)) + geom_point(alpha=.1, color=ealdark) +

 geom_point(data=eff.optimal.point, aes(x=Std.Dev, y=Exp.Return, label=sharpe),

 color=ealred, size=5) +

 annotate(geom="text", x=eff.optimal.point$Std.Dev,

 y=eff.optimal.point$Exp.Return,

 label=paste("Risk: ",

 round(eff.optimal.point$Std.Dev*100, digits=3),"\nReturn: ",

 round(eff.optimal.point$Exp.Return*100, digits=4),"%\nSharpe: ",

 round(eff.optimal.point$sharpe*100, digits=2), "%", sep=""),

 hjust=0, vjust=1.2) +

 ggtitle("Efficient Frontier\nand Optimal Portfolio") +

 labs(x="Risk (standard deviation of portfolio)", y="Return") +

 theme(panel.background=element_rect(fill=eallighttan),

 text=element_text(color=ealdark),

 plot.title=element_text(size=24, color=ealred))

ggsave("Efficient Frontier.png")

Source:  (http://economistatlarge.com/portfolio-theory/r-optimized-portfolio/r-code-graph-efficient-frontier)

Create ggplot2 graphs with http://www.yeroon.net/ggplot2/

Statistics C183/C283: Statistical Models in Finance

Announcements

First lecture is on Monday, 01 April 2013.
Location: MWF MS 5200.
Time: 11:00 - 11:50.
See you then!

For the course syllabus click here.

Useful links:

Statistics Online Computational Resource (SOCR):
http://www.socr.ucla.edu

It's online, therefore it exists!
SOCR Educational Materials:

http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials.

Useful link:

Probability and Statistics EBook.

Download R and packages.
Labs

Handouts

1. Historical note (from "Against the Gods, the Remarkable Story of Risk", by Peter Berstein, Wiley 1998).

2. Introduction.

3. Diversification - a simple example.

4. Correlation coefficient and protfolio risk.

5. Basics.

6. Example with two stocks (portfolio possibilities curve, efficient frontier).
For handout #6 also download the R commands and the files with the data using the following 3 links:
a. R commands.
b. First data set (table1.csv).
c. Second data set (table2.csv).

7. Why diversfication works.

8. Short sales.

9. An Analytic Derivation of the Efficient Portfolio Frontier (JFQA, Robert Merton, 1972).

10. Efficient frontier with risk free lending and borrowing.

11. How to find the point of tangency (see also handout 10).

12. Point of tangency - example.

13. Lab - example.

14. R code for point of tangency.

15. Sensitivity analysis.

16. Trace out the efficient frontier - example.

17. Trace out the efficient frontier - R code for example of handout 16.

18. Single index model - summary.

19. Adjusting the betas.

20. Adjusting the betas using Vasicek's technique.

21. Adjusting the betas using Blume's technique.

22. Betas and their regression tendencies (Blume).

23. Are betas best?

24. Introduction to stockPortfolio package.

25. How to supply your own data to use the stockPortfolio package.

26. Project (more details will be discussed in class).

27. Simple criteria for optimal portfolio selection.

28. Single index model steps.

29. Single index model - example with short sales allowed.

30. Single index model - example with short sales not allowed.

31. Single index model and optimization procedure give the same answer.

32. Single index model - example using R.

33. Single index model - example using R (same as handout 32).

34. Single index model - Kuhn Tucker conditions when short sales are not allowed.

35. Constant correlation model steps.

36. Constant correlation - example with short sales allowed.

37. Constant correlation - example with short sales not allowed.

38. Constant correlation model - R example.

39. Single index and constant correlation models using the stockPortfolio package.

40. Trace out the efficient frontier when risk free lending and borrowing does not exist - an example using the stockPortfolio package.

41. Simple criteria for optimal portfolio selection: Tracing out the efficient frontier.

42. Simple criteria for optimal portfolio selection: The multi group case.

43. Multigroup model.

44. Multi group model using the stockPortfolio package.

45. Multi-index model.

46. Practice exam.

47. Practice problems.

48. Modern portfolio theory, 1950 to date.

49. Portfolio performance.

50. Plot 1 (see handout #46).

51. Plot 2 (see handout #46).

52. Options basics.

53. Smiley faces - call option.

54. Smiley faces - put option.

55. Options - some simple examples.

56. Payoff and profit for writer and buyer - call option.

57. Payoff and profit for writer and buyer - put option.

58. Lower and upper bounds for call and put options and put call parity.

59. Trading strategies using options.

60. Butterfly example using R.

61. Access the SOCR applet.

62. Trading strategies using options - Excel file.

63. Binomial option pricing model - introduction.

64. Binomial option pricing model - example.

65. A model for stock prices.

66. Monte Carlo simulaton of a stock's path.

67. Ito process and Black-Scholes model.

68. Estimating volatility - Excel file.

69. Options - summary.

70. Implied volatility.

Homework

Homework 1: Due on Friday, 12 April.

Homework 2: Due on Friday, 19 April.

Homework 3: Due on Friday, 03 May.

Homework 4: Due on Friday, 17 May.

Homework 5: Due on Friday, 31 May.

Homework 6: Due on Wednesday, 05 June.

Source:http://www.stat.ucla.edu/~nchristo/statistics_c183_c283/

Portfolio Optimization using Single Index Model in R

# Load Libraries:
library(fEcofin)
library(fPortfolio)

# Load Data:
data(berndtInvest)
berndt <- as.timeSeries(berndtInvest)
names(data)
data <- berndt[, -c(10, 17)]
factors <- berndt[, 10]
attr(data, "factors") <- factors

# Sharpe's Single Index Factor Model:
sharpeFactorEstimator <- 
function(x, spec=NULL, ...)
{
    # Sharpe Single Index Model:
    data <- getDataPart(x)
    factors <- attr(x, "factors")
    nScenarios <- nrow(data)
    X.mat <- cbind(rep(1, times=nScenarios), factors)
    G.hat <- solve(qr(X.mat), data)
    beta.hat <- G.hat[2, ]
    eps.hat <- data - X.mat %*% G.hat
    diagD.hat <- diag(crossprod(eps.hat) / (nScenarios-2))
    mu <- G.hat[1, ] + G.hat[2, ] * colMeans(factors)  
    Sigma <- var(factors)[[1]] * (beta.hat %o% beta.hat) + diag(diagD.hat)
    
    # Return Value:
    list(mu = mu, Sigma = Sigma)
}

# Solve Long Only Markowitz Model:
markowitz <- portfolioFrontier(data)

# Solve Long Only Sharpe Single Index Model:
spec <- portfolioSpec()
setEstimator(spec) <- "sharpeFactorEstimator"
sharpe <- portfolioFrontier(data, spec)

# Plot:
tailoredFrontierPlot(markowitz)
points(frontierPoints(sharpe), col = "steelblue")

FUNCTION:                     DESCRIPTION:
sharpeFactorEstimator()       Sharpe's single index model
macroFactorEstimator()        General macroeconomic factor model
industryFactorEstimator()     Barra industry factor model
statisticalFactorEstimator()  Statistical factor model
pcaFactorEstimator()          PCA statistical factor model
asympcaFactorEstimator()      Asymptotic PCA statistical factor model

Source: (https://www.rmetrics.org/blog/FactorModeling)
References:
Würtz, Chalabi, Chen, Ellis, Portfolio Optimization with R/Rmetrics, Rmetrics and Finance Online Publishing, Zurich 2009
Zivot and Wang, Modeling Financial Time Series with S-PLUS. Springer Publishing New York, 2nd edition 2006
Any Questions?
Do not hesitate to contact us or come to the Meielisalp 2010 Workshop to meet the authors.

Monday, June 17, 2013

A quick and easy function to plot lm() results with ggplot2 in R

Share this:

Like this:

Getting Started

Explore

Source: http://www-958.ibm.com/software/analytics/manyeyes/

Participate

Learn more

Try our featured visualizations

R Code – Graph Efficient Frontier

Statistics C183/C283: Statistical Models in Finance

Announcements

Useful links:

It's online, therefore it exists!

Labs

Handouts

Homework