R与excel交互之美
R与Excel交互之美
徐静
2017年11月15日
很早就有萌生开一个微信公众号或个人网站或个人博客的打算,主要分享在数据科学中的一些小的知识点和例子, 今天是我使用jekyll打造自己博客的第一篇博文,希望大家能够喜欢。
像Python一样,R语言是典型的内存内运算工具,一般情况下我们需要所有数据一次读入内存,然后在由R去做数据清洗或特征工程,然后快速建模,用rmarkdwn或shiny生成自动化文档分析报告,很多人认为R将所有数据一次读入内存是一个很大的缺点,而事实上除了SAS以外绝大多数的统计软件都需要将数据全部读入内存,这可能是我们在业务数据分析中经常操作的步骤,而业务数据大多都是以Excel存放,R与Excel的操作就显得尤为重要,所以今天主要整理R语言与Excel的一些交互操作所用到的函数,让我们来看一下:
R中可以读取后缀为.xls及.xlsx版本的Excel,我们的演示以.xlsx格式为例。
对于RODBC包这是一个R除了DBI方式连接数据库的另一种R与数据库的链接方式,当然RODBC也可以读取本地的Excel文件,代码举例如下:
library(RODBC)
data0 <- odbcConnectExcel2007("path/file.xlsx")
data1 <- sqlFetch(data0,"SheetName")
readxl是一个非常方便的读取本地Excel的工具包,并且其读取速度比较快,代码简单
library(readxl)
read_excel(path, sheet = NULL, range = NULL, col_names = TRUE,
col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf,
guess_max = min(1000, n_max))
read_xls(path, sheet = NULL, range = NULL, col_names = TRUE,
col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf,
guess_max = min(1000, n_max))
read_xlsx(path, sheet = NULL, range = NULL, col_names = TRUE,
col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf,
guess_max = min(1000, n_max))
xlsx是另一个读取Excel的常用的R包,但是该包需要配置Java环境,依托于xlsxjars,rJava包,该包不同于前面两个包,它不但可以进行Excel的读入,还可以方便的写出Excel
#读入Excel
require(xlsx)
file <- system.file("tests", "test_import.xlsx", package = "xlsx")
res <- read.xlsx(file, 2) # read the second sheet
#写出Excel
file <- paste(tempfile(), "xlsx", sep=".")
write.xlsx(USArrests, file=file)
在写出Excel时,该包的缺点是在写出时有时会出现列数据数据格式错误的情况,所以在使用该包写出Excel时要特别小心。
下面我们介绍另一个包:openxlsx,该包是用C++写R包的速度比较快,其中有四个比较常用的函数writeData(),writeDataTable(),write.xlsx()及read.xlsx(), 我们主要介绍read.xlsx()及write.xlsx()分别用于从本地读写Excel文件
#读Excel文件
library(openxlsx)
xlsxFile <- system.file("readTest.xlsx", package = "openxlsx")
df1 <- read.xlsx(xlsxFile = xlsxFile, sheet = 1, skipEmptyRows = FALSE)
sapply(df1, class)
wb <- loadWorkbook(system.file("readTest.xlsx", package = "openxlsx"))
df2 <- read.xlsx(wb, sheet = 2, skipEmptyRows = FALSE, colNames = TRUE)
df3 <- read.xlsx(xlsxFile, sheet = 2, skipEmptyRows = FALSE, colNames = TRUE)
all.equal(df3, df4)
#写Excel文件
library(openxlsx)
options("openxlsx.borderColour" = "#4F80BD") ## set default border colour
write.xlsx(iris, file = "writeXLSX1.xlsx", colNames = TRUE, borders = "columns")
write.xlsx(iris, file = "writeXLSX2.xlsx", colNames = TRUE, borders = "surrounding")
hs <- createStyle(textDecoration = "BOLD", fontColour = "#FFFFFF", fontSize=12,
fontName="Arial Narrow", fgFill = "#4F80BD")
write.xlsx(iris, file = "writeXLSX3.xlsx", colNames = TRUE, borders = "rows", headerStyle = hs)
XLConnect包同样也是一个既可以写Excel数据同时也可以读Excel的R包,该包有丰富的学习资源,官方说明文档比较详实,我们直接给出读写Excel的示例代码
#读Excel文件
library(XLConnect)
#readWorksheetFromFile()
demoExcelFile <- system.file("demoFiles/multiregion.xlsx",
package = "XLConnect")
df.one <- readWorksheetFromFile(demoExcelFile, sheet = 1,
header = FALSE, startCol = 2,
startRow = 2, endCol = 3,
endRow = 3)
dflist <- readWorksheetFromFile(demoExcelFile,
sheet = c("FirstSheet",
"FirstSheet",
"SecondSheet"),
header = TRUE,
startRow = c(2,2,3),
startCol = c(2,5,2),
endCol = c(5,8,6),
endRow = c(9,15,153))
#readWorksheet()
demoExcelFile <- system.file("demoFiles/mtcars.xlsx", package = "XLConnect")
wb <- loadWorkbook(demoExcelFile)
data <- readWorksheet(wb, sheet = "mtcars", region = "C1:H15")
#写Excel文件
library(XLConnect)
#writeWorksheetToFile()
demoExcelFile <- system.file("demoFiles/multiregion.xlsx",
package = "XLConnect")
template <- "template-ws.xlsx"
file.copy(demoExcelFile, template)
writeWorksheetToFile(template, data=iris, sheet="FirstSheet",
startRow=9, startCol = 9)
template <- "template-multi-ws.xlsx"
writeWorksheetToFile(template, data = list(i1 = iris, i2 = iris, i3 = iris),
sheet = c("FirstSheet", "SecondSheet", "FirstSheet"),
startRow = c(1,101,201), startCol = c(1,11,21))
#writeworksheet()
wb <- loadWorkbook("writeWorksheet.xlsx", create = TRUE)
# Create a worksheet called 'CO2'
createSheet(wb, name = "CO2")
# Write built-in data set 'CO2' to the worksheet created above;
# offset from the top left corner and with default header = TRUE
writeWorksheet(wb, CO2, sheet = "CO2", startRow = 4, startCol = 2)
# Save workbook (this actually writes the file to disk)
saveWorkbook(wb)
XLConnect中还有很多比较好用的处理Excel的函数,详细内容可以参见XLConnect说明文档
WriteXLS包,该包2005年挂在CRAN上,主要用于R对象写出本地Excel文件,说实话小编并没有用过该包,其中主要有两个函数构成,我们在此主要介绍WriteXLS()函数,该函数可生成Excel2003及Excel2007的文件,其演示样例如下
#写Excel
library(WriteXLS)
## Only run the examples if Perl and all modules are present
if (testPerl(verbose = FALSE))
{
## Examples using built-in data frames
## Create XLS (Excel 2003) files
WriteXLS("iris", "iris.xls")
WriteXLS(c("iris", "infert", "esoph"), "Example.xls")
iris.split <- split(iris, iris$Species)
WriteXLS("iris.split", "irissplit.xls")
## Example using comment()
## Commented cells with have a small red triangle in the
## upper right hand corner of the cell. Click on the cell
## or place the cursor over the cell to see the pop-up
## containing the comment text.
## Create an XLSX (Excel 2007) file
## Adjust the column widths
## Bold the header row
comment(iris$Sepal.Length) <- "Length of the sepals (cm)"
comment(iris$Sepal.Width) <- "Width of the sepals (cm)"
comment(iris$Petal.Length) <- "Length of the petals (cm)"
comment(iris$Petal.Width) <- "Width of the petals (cm)"
comment(iris$Species) <- "Species of the flowers"
WriteXLS("iris", "iriscomments.xlsx",
AdjWidth = TRUE, BoldHeaderRow = TRUE)
## Add row names
WriteXLS("iris", "irisrownames.xlsx",
AdjWidth = TRUE, BoldHeaderRow = TRUE, row.names = TRUE)
## Use latin1 Encoding
WriteXLS("iris", "irisLatin1.xls", Encoding = "latin1")
## Write a 0 row data frame
## Worksheet will contain header row only
DF0 <- data.frame("A" = numeric(), "B" = numeric(), "C" = numeric())
WriteXLS("DF0", "DF0.xls", AdjWidth = TRUE, BoldHeaderRow = TRUE)
## 'x' is a single data frame object
WriteXLS(iris, "irisDF.xls")
## 'x' is a list object containing data frames
WriteXLS(iris.split, "irisList.xls")
## Clean up and delete XLS files
rm(iris.split)
rm(DF0)
unlink("iris.xls")
unlink("Example.xls")
unlink("irissplit.xls")
unlink("iriscomments.xlsx")
unlink("irisrownames.xlsx")
unlink("irisLatin1.xls")
unlink("DF0.xls")
unlink("irisDF.xls")
unlink("irisList.xls")
}
最近一个(2017年8月)关于R读Excel文件的包是writexl,该包比较简洁,其用法更像是与readxl对应,其核心函数为write_xlsx(),工作中我现在主要是用writexl包写入Excel文件,快速方便。
#写Excel
library(writexl)
tmp <- write_xlsx(iris)
readxl::read_xlsx(tmp)
很多小朋友会问,可不可以在Excel中执行R代码?答案是肯定的,RExcel包就可以实现你的梦想,不过需要先安装DCOM,就可以在Excel中书写R代码,实现R与Excel的混搭,感兴趣的同学可以去看说明文档或源码,在此不多赘述。
最后总结一下:R读取Excel的常用包有:RODBC,readxl,xlsx,openxlsx,XLConnect;R写入Excel的包有,writexl,WriteXLS,xlsx,oepnxlsx,XLConnect;Excel与R语言的混编包为RExcel。
本人接触R语言比较浅显,可能还有其他好用的R包实现R与Excel交互,最后送上林祯舜博士的名言:学R不思则罔,思R不学则怠!!!
请关注我的GitHub,并留言