Unable to import text column data contained in separate lines-CodePudding

I have raw data in this txt format:

Name|Occupation|Comment
Robert|Doctor|To process, please provide:
a. Tax Returns
b. Identification
c. Statement of Approval
Sally|Accountant|Approved

Here, | is the delimiter.

For Robert, I want "To process, please provide: a. Tax Returns b. Identification c. Statement of Approval" to be reflected as one string under Comment.

However, using read.csv to import with the following arguments:

read.csv(
    "data/text_data",
    fileEncoding = "UTF-8",
    sep = "|",
    na.strings = "",
    quote = ""
  )

gives me additional rows where

Name                        Occupation    Comment
Robert                      Doctor        To process, please provide:
a. Tax Returns              NA            NA
b. Identification           NA            NA
c. Statement of Approval    NA            NA
Sally                       Accountant    Approved

Are there any R import functions or wrangling tricks to fix this? Thanks.

CodePudding user response：

An easy option splitting the data on a temporary "id" column using by, apply transformations and recombine. For convenience you may wrap this in a function.

myDataReader <- \(link) {
  r <- read.csv(link, fileEncoding="UTF-8", sep="|", na.strings="", quote="")
  r$id <- cumsum(!is.na(r$Occupation))
  do.call(what=rbind, by(r, r$id, \(x) {
    cbind(x[1, 1:2], Comment=trimws(paste(x[1, 3], toString(x[-1, 1]))))
  }))
}

myDataReader('data/text_data')
#     Name Occupation                                                                                 Comment
# 1 Robert     Doctor To process, please provide: a. Tax Returns, b. Identification, c. Statement of Approval
# 2  Sally Accountant                                                                                Approved
# 3    Tom     Lawyer                           To process, please provide: a. Tax Returns, b. Identification
# 4  Sally Accountant                                                                                Approved

Note: R>= 4.1 used.

Content of 'data/text_data':

Name|Occupation|Comment
Robert|Doctor|To process, please provide:
a. Tax Returns
b. Identification
c. Statement of Approval
Sally|Accountant|Approved
Tom|Lawyer|To process, please provide:
a. Tax Returns
b. Identification
Sally|Accountant|Approved