I have a list that looks like:
df1 <- tibble::tribble(~City, ~State, ~Year, ~Temp,
"", "", "Year", "Overall temperature, by now",
"Aberdeen", "", "2022", "18.73",
"Aberdeen", "", "2021", "17.79",
"Aberdeen", "", "2020", "-",
"Aberdeen", "", "2019", "16.43",
"Aberdeen", "", "2018", "-",
"Aberdeen", "", "2017", "17.84",
"Aberdeen", "", "2016", "17.47",
"Aberdeen", "", "2015", "25.84",
"Aberdeen", "", "2014", "26.8",
"Aberdeen", "", "2013", "22.73",
"Aberdeen", "", "2012", "23.08",
"Aalborg", "P1", "Year", "Temp, measurement",
"Aalborg", "P1", "2022", "-",
"Aalborg", "P1", "2021", "20.05")
I need to convert the last two columns to numeric. It would also be good to convert this list to a data frame (please suggest a method :)
This is a small sample of a big dataset.
Here is what doesn't work (although I used these methods for working with less complicated lists):
df1[, 3] <- as.numeric(df1[, 3]) #for sure, it is a list
# Error: 'list' object cannot be coerced to type 'double'
as.numeric(unlist(df1[[3]]))
# Error: (converted from warning) NAs introduced by coercion
df1[, 3:4] <- sapply(df1, as.numeric)
# Error in lapply(X = X, FUN = FUN, ...) : (converted from warning) NAs introduced by coercion
as.numeric(as.character(unlist(df1[[3]])))
# Error: (converted from warning) NAs introduced by coercion
df1$Year <- lapply(df1$Year, as.numeric)
# Error in lapply(df1$Year, as.numeric) : (converted from warning) NAs introduced by coercion
df1 <- as.data.frame(df1) #Working with a dataframe would be easier :)
typeof(df1)
# [1] "list"
as.numeric(df1[1, 3]) #If that would work - we could use loop to change element by element into numeric
# Error: (converted from warning) NAs introduced by coercion
df1 <- as.data.table(df1)
typeof(df1)
# [1] "list"
I don't mind losing the text values in the numeric columns after the conversion (they are useless to me).
Update: we found that the methods for working with lists are not robust: loading some newer libraries seems to break functions that the solutions rely on. I can't work out which libraries stop the solutions from working :( Can you please help?
Libraries I often use: "plyr", "dplyr", "data.table","tidyverse","magrittr", "tidyr", "reshape2", "expss", "janitor", "dplyr", "ggplot2", "purrr", "GGally", "cluster", "readxl", "writexl", "psych", "knitr", "ExPanDaR", "kableExtra", "plm", "sampleSelection", "nnet", "ggmap", "scales", "RPostgreSQL","readr","lubridate","seasonal","stargazer","merTools","RColorBrewer","colorRamps", "nycflights13", "scales", "zoo", "stringr", "maps", "mapdata", "gtrendsR", "cdlTools", "usmap", "rnaturalearth", "WDI", "tigris", "ggrepel", "rworldmap", "gapminder"
System locale: Sys.setlocale(category = 'LC_ALL','en_US.UTF-8')
CodePudding user response:
df1 is a tibble, so [ with a single column (e.g. df1[, 3]) still returns a one-column tibble: tibbles use drop = FALSE by default, unlike a base data.frame. Use $ or [[ to extract the column as a vector instead. For multiple columns, use lapply rather than sapply, because sapply can simplify the result to a matrix:
df1[3:4] <- lapply(df1[3:4], as.numeric)
-output
> str(df1)
tibble [15 × 4] (S3: tbl_df/tbl/data.frame)
$ City : chr [1:15] "" "Aberdeen" "Aberdeen" "Aberdeen" ...
$ State: chr [1:15] "" "" "" "" ...
$ Year : num [1:15] NA 2022 2021 2020 2019 ...
$ Temp : num [1:15] NA 18.7 17.8 NA 16.4 ...
> df1
# A tibble: 15 × 4
City State Year Temp
<chr> <chr> <dbl> <dbl>
1 "" "" NA NA
2 "Aberdeen" "" 2022 18.7
3 "Aberdeen" "" 2021 17.8
4 "Aberdeen" "" 2020 NA
5 "Aberdeen" "" 2019 16.4
6 "Aberdeen" "" 2018 NA
7 "Aberdeen" "" 2017 17.8
8 "Aberdeen" "" 2016 17.5
9 "Aberdeen" "" 2015 25.8
10 "Aberdeen" "" 2014 26.8
11 "Aberdeen" "" 2013 22.7
12 "Aberdeen" "" 2012 23.1
13 "Aalborg" "P1" NA NA
14 "Aalborg" "P1" 2022 NA
15 "Aalborg" "P1" 2021 20.0
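To illustrate why lapply is the safer choice here, a minimal base R sketch on a made-up two-column data frame (the object names chr, as_matrix and as_list are only for illustration). sapply simplifies the per-column results into a matrix, while lapply keeps one list element per column, which is the shape that assignment into df1[3:4] expects.

```r
# Hypothetical toy data: two character columns that hold numbers
chr <- data.frame(
  Year = c("2022", "2021"),
  Temp = c("18.73", "17.79"),
  stringsAsFactors = FALSE
)

as_matrix <- sapply(chr, as.numeric)  # simplified to a 2 x 2 numeric matrix
as_list   <- lapply(chr, as.numeric)  # a list with one numeric vector per column

print(class(as_matrix))
print(class(as_list))

# Assigning the list back keeps each column as its own numeric vector
chr[1:2] <- as_list
str(chr)
```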
For single column
> class(df1[,3])
[1] "tbl_df" "tbl" "data.frame"
> class(df1[[3]])
[1] "numeric"
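The same distinction can be reproduced in base R, which may also explain the typeof() results in the question. A small sketch on a made-up data frame (dfb is a hypothetical name): with drop = FALSE a base data.frame behaves the way a tibble does by default, and typeof() reports "list" for any data frame because that is its underlying storage, so class() or is.data.frame() is the check to use.

```r
# Hypothetical toy data frame
dfb <- data.frame(a = c("1", "2"), b = c("x", "y"), stringsAsFactors = FALSE)

dropped <- dfb[, 1]                # default drop = TRUE: a plain character vector
kept    <- dfb[, 1, drop = FALSE]  # one-column data.frame, like a tibble's default
pulled  <- dfb[[1]]                # [[ always extracts the column vector

print(class(dropped))
print(class(kept))
print(class(pulled))

# A data frame is stored as a list, so typeof() says "list";
# class() / is.data.frame() tell you it really is a data frame.
print(typeof(dfb))
print(is.data.frame(dfb))
```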
Because the input is a tibble, we can also use dplyr methods:
library(dplyr)
df1 <- df1 %>%
  mutate(across(3:4, as.numeric))
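One more thing worth checking: every error in the question reads "(converted from warning)". R adds that prefix when options(warn = 2) is in effect, so the usual "NAs introduced by coercion" warning becomes a hard error. A minimal sketch, assuming that setting is active in your session (I can't tell from the question what set it); x is made-up sample data:

```r
x <- c("Year", "2022", "-")   # made-up values: a header word, a year, a dash

# Reproduce the behaviour: with warn = 2 every warning is raised as an error
old <- options(warn = 2)
msg <- tryCatch(
  {
    as.numeric(x)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
options(old)                  # restore whatever was set before

print(msg)

# Inspect the current setting; 2 or higher means warnings are turned into errors
print(getOption("warn"))

# If the NAs are expected, silence just this one coercion warning
num <- suppressWarnings(as.numeric(x))
print(num)
```

If getOption("warn") returns 2 in your session, options(warn = 0) restores the default of collecting warnings instead of stopping on them.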
CodePudding user response:
First remove the rows that contain words in Year or Temp, then convert each column to the appropriate class: integer for Year and double for Temp.
library(dplyr)
library(readr)
df1 %>%
  dplyr::filter(!grepl("^[A-Z]", Year),
                !grepl("^[A-Z]", Temp)) %>%
  dplyr::mutate(Year = readr::parse_integer(Year),
                Temp = readr::parse_number(Temp))
# A tibble: 13 × 4
City State Year Temp
<chr> <chr> <int> <dbl>
1 Aberdeen "" 2022 18.7
2 Aberdeen "" 2021 17.8
3 Aberdeen "" 2020 NA
4 Aberdeen "" 2019 16.4
5 Aberdeen "" 2018 NA
6 Aberdeen "" 2017 17.8
7 Aberdeen "" 2016 17.5
8 Aberdeen "" 2015 25.8
9 Aberdeen "" 2014 26.8
10 Aberdeen "" 2013 22.7
11 Aberdeen "" 2012 23.1
12 Aalborg "P1" 2022 NA
13 Aalborg "P1" 2021 20.0
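For comparison, the same filter-then-convert idea can be sketched in base R. This is not the code above, just an illustration on a shortened, made-up copy of the data (raw and clean are hypothetical names). It assumes that "-" is the only placeholder for a missing value and recodes it to NA before converting, so no coercion warning is raised.

```r
# Shortened, made-up copy of the data, including the repeated header rows
raw <- data.frame(
  City  = c("", "Aberdeen", "Aberdeen", "Aalborg", "Aalborg"),
  State = c("", "", "", "P1", "P1"),
  Year  = c("Year", "2022", "2020", "Year", "2021"),
  Temp  = c("Overall temperature, by now", "18.73", "-",
            "Temp, measurement", "20.05"),
  stringsAsFactors = FALSE
)

# Keep only rows whose Year consists of digits (drops the header rows)
clean <- raw[grepl("^[0-9]+$", raw$Year), ]

# Assumption: "-" is the only marker for a missing temperature
clean$Temp[clean$Temp == "-"] <- NA

clean$Year <- as.integer(clean$Year)
clean$Temp <- as.numeric(clean$Temp)

str(clean)
```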
