R与excel交互之美

R与excel交互之美

R与Excel交互之美

很早就有萌生开一个微信公众号或个人网站或个人博客的打算,主要分享在数据科学中的一些小的知识点和例子, 今天是我使用jekyll打造自己博客的第一篇博文,希望大家能够喜欢。

像Python一样,R语言是典型的内存内运算工具,一般情况下我们需要所有数据一次读入内存,然后在由R去做数据清洗或特征工程,然后快速建模,用rmarkdwn或shiny生成自动化文档分析报告,很多人认为R将所有数据一次读入内存是一个很大的缺点,而事实上除了SAS以外绝大多数的统计软件都需要将数据全部读入内存,这可能是我们在业务数据分析中经常操作的步骤,而业务数据大多都是以Excel存放,R与Excel的操作就显得尤为重要,所以今天主要整理R语言与Excel的一些交互操作所用到的函数,让我们来看一下:

R中可以读取后缀为.xls及.xlsx版本的Excel,我们的演示以.xlsx格式为例。

对于RODBC包这是一个R除了DBI方式连接数据库的另一种R与数据库的链接方式,当然RODBC也可以读取本地的Excel文件,代码举例如下:

library(RODBC)

data0 <- odbcConnectExcel2007("path/file.xlsx")
data1 <- sqlFetch(data0,"SheetName")

readxl是一个非常方便的读取本地Excel的工具包,并且其读取速度比较快,代码简单

library(readxl)

read_excel(path, sheet = NULL, range = NULL, col_names = TRUE,
  col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf,
  guess_max = min(1000, n_max))

read_xls(path, sheet = NULL, range = NULL, col_names = TRUE,
  col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf,
  guess_max = min(1000, n_max))

read_xlsx(path, sheet = NULL, range = NULL, col_names = TRUE,
  col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf,
  guess_max = min(1000, n_max))
  

xlsx是另一个读取Excel的常用的R包,但是该包需要配置Java环境,依托于xlsxjars,rJava包,该包不同于前面两个包,它不但可以进行Excel的读入,还可以方便的写出Excel

#读入Excel
require(xlsx)

file <- system.file("tests", "test_import.xlsx", package = "xlsx")
res <- read.xlsx(file, 2)  # read the second sheet
#写出Excel

file <- paste(tempfile(), "xlsx", sep=".")
write.xlsx(USArrests, file=file)

在写出Excel时,该包的缺点是在写出时有时会出现列数据数据格式错误的情况,所以在使用该包写出Excel时要特别小心。

下面我们介绍另一个包:openxlsx,该包是用C++写R包的速度比较快,其中有四个比较常用的函数writeData(),writeDataTable(),write.xlsx()及read.xlsx(), 我们主要介绍read.xlsx()及write.xlsx()分别用于从本地读写Excel文件

#读Excel文件
library(openxlsx)

xlsxFile <- system.file("readTest.xlsx", package = "openxlsx")
df1 <- read.xlsx(xlsxFile = xlsxFile, sheet = 1, skipEmptyRows = FALSE)
sapply(df1, class)

wb <- loadWorkbook(system.file("readTest.xlsx", package = "openxlsx"))
df2 <- read.xlsx(wb, sheet = 2, skipEmptyRows = FALSE, colNames = TRUE)
df3 <- read.xlsx(xlsxFile, sheet = 2, skipEmptyRows = FALSE, colNames = TRUE)
all.equal(df3, df4)
#写Excel文件

library(openxlsx)

options("openxlsx.borderColour" = "#4F80BD") ## set default border colour
write.xlsx(iris, file = "writeXLSX1.xlsx", colNames = TRUE, borders = "columns")
write.xlsx(iris, file = "writeXLSX2.xlsx", colNames = TRUE, borders = "surrounding")


hs <- createStyle(textDecoration = "BOLD", fontColour = "#FFFFFF", fontSize=12,
                  fontName="Arial Narrow", fgFill = "#4F80BD")

write.xlsx(iris, file = "writeXLSX3.xlsx", colNames = TRUE, borders = "rows", headerStyle = hs)

XLConnect包同样也是一个既可以写Excel数据同时也可以读Excel的R包,该包有丰富的学习资源,官方说明文档比较详实,我们直接给出读写Excel的示例代码

#读Excel文件
library(XLConnect)

#readWorksheetFromFile()
demoExcelFile <- system.file("demoFiles/multiregion.xlsx", 
                             package = "XLConnect")

df.one <- readWorksheetFromFile(demoExcelFile, sheet = 1, 
                                header = FALSE, startCol = 2, 
                                startRow = 2, endCol = 3, 
                                endRow = 3)

dflist <- readWorksheetFromFile(demoExcelFile,
                                sheet = c("FirstSheet", 
                                          "FirstSheet", 
                                          "SecondSheet"),
                                header = TRUE, 
                                startRow = c(2,2,3), 
                                startCol = c(2,5,2),
                                endCol = c(5,8,6), 
                                endRow = c(9,15,153))

#readWorksheet()                               
demoExcelFile <- system.file("demoFiles/mtcars.xlsx", package = "XLConnect")

wb <- loadWorkbook(demoExcelFile)

data <- readWorksheet(wb, sheet = "mtcars", region = "C1:H15")                                
                                
#写Excel文件

library(XLConnect)

#writeWorksheetToFile()

demoExcelFile <- system.file("demoFiles/multiregion.xlsx",
                             package = "XLConnect")

template <- "template-ws.xlsx"
file.copy(demoExcelFile, template) 

writeWorksheetToFile(template, data=iris, sheet="FirstSheet",
                     startRow=9, startCol = 9)


template <- "template-multi-ws.xlsx"

writeWorksheetToFile(template, data = list(i1 = iris, i2 = iris, i3 = iris),
                     sheet = c("FirstSheet", "SecondSheet", "FirstSheet"),
                     startRow = c(1,101,201), startCol = c(1,11,21))
                     
                     
 
 
#writeworksheet()                    
wb <- loadWorkbook("writeWorksheet.xlsx", create = TRUE)

# Create a worksheet called 'CO2'
createSheet(wb, name = "CO2")

# Write built-in data set 'CO2' to the worksheet created above;
# offset from the top left corner and with default header = TRUE
writeWorksheet(wb, CO2, sheet = "CO2", startRow = 4, startCol = 2)

# Save workbook (this actually writes the file to disk)
saveWorkbook(wb)

XLConnect中还有很多比较好用的处理Excel的函数,详细内容可以参见XLConnect说明文档

WriteXLS包,该包2005年挂在CRAN上,主要用于R对象写出本地Excel文件,说实话小编并没有用过该包,其中主要有两个函数构成,我们在此主要介绍WriteXLS()函数,该函数可生成Excel2003及Excel2007的文件,其演示样例如下

#写Excel

library(WriteXLS)
  ## Only run the examples if Perl and all modules are present
  if (testPerl(verbose = FALSE))
  {
    ## Examples using built-in data frames
    ## Create XLS (Excel 2003) files 
    WriteXLS("iris", "iris.xls")

    WriteXLS(c("iris", "infert", "esoph"), "Example.xls")

    iris.split <- split(iris, iris$Species)
    WriteXLS("iris.split", "irissplit.xls")

    ## Example using comment()
    ## Commented cells with have a small red triangle in the
    ## upper right hand corner of the cell. Click on the cell
    ## or place the cursor over the cell to see the pop-up
    ## containing the comment text.
    ## Create an XLSX (Excel 2007) file
    ## Adjust the column widths
    ## Bold the header row
    comment(iris$Sepal.Length) <- "Length of the sepals (cm)"
    comment(iris$Sepal.Width) <- "Width of the sepals (cm)"
    comment(iris$Petal.Length) <- "Length of the petals (cm)"
    comment(iris$Petal.Width) <- "Width of the petals (cm)"
    comment(iris$Species) <- "Species of the flowers"
    WriteXLS("iris", "iriscomments.xlsx",
             AdjWidth = TRUE, BoldHeaderRow = TRUE)

    ## Add row names
    WriteXLS("iris", "irisrownames.xlsx",
             AdjWidth = TRUE, BoldHeaderRow = TRUE, row.names = TRUE)

    ## Use latin1 Encoding
    WriteXLS("iris", "irisLatin1.xls", Encoding = "latin1")

    ## Write a 0 row data frame
    ## Worksheet will contain header row only
    DF0 <- data.frame("A" = numeric(), "B" = numeric(), "C" = numeric())
    WriteXLS("DF0", "DF0.xls", AdjWidth = TRUE, BoldHeaderRow = TRUE)

    ## 'x' is a single data frame object
    WriteXLS(iris, "irisDF.xls")

    ## 'x' is a list object containing data frames
    WriteXLS(iris.split, "irisList.xls")

    ## Clean up and delete XLS files
    rm(iris.split)
    rm(DF0)
    unlink("iris.xls")
    unlink("Example.xls")
    unlink("irissplit.xls")
    unlink("iriscomments.xlsx")
    unlink("irisrownames.xlsx")
    unlink("irisLatin1.xls")
    unlink("DF0.xls")
    unlink("irisDF.xls")
    unlink("irisList.xls")
  }

最近一个(2017年8月)关于R读Excel文件的包是writexl,该包比较简洁,其用法更像是与readxl对应,其核心函数为write_xlsx(),工作中我现在主要是用writexl包写入Excel文件,快速方便。

#写Excel
library(writexl)

tmp <- write_xlsx(iris)
readxl::read_xlsx(tmp)

很多小朋友会问,可不可以在Excel中执行R代码?答案是肯定的,RExcel包就可以实现你的梦想,不过需要先安装DCOM,就可以在Excel中书写R代码,实现R与Excel的混搭,感兴趣的同学可以去看说明文档或源码,在此不多赘述。

最后总结一下:R读取Excel的常用包有:RODBC,readxl,xlsx,openxlsx,XLConnect;R写入Excel的包有,writexl,WriteXLS,xlsx,oepnxlsx,XLConnect;Excel与R语言的混编包为RExcel

本人接触R语言比较浅显,可能还有其他好用的R包实现R与Excel交互,最后送上林祯舜博士的名言:学R不思则罔,思R不学则怠!!!

请关注我的GitHub,并留言

Author face

徐静

数据科学从业者,算法工程师. 善于用数据科学的工具透析业务,模型的线上化部署,网络爬虫及前端可视化. 喜欢研究机器学习,深度学习及相关软件实现.目前自己还是小白一个,希望多多学习.

最近发表的文章