Change factor to numeric in dataframe and drop missing values

Time:01-25

I have downloaded the data and would like to convert the columns usd and eur to numeric and treat the column date as a date. I would also like to remove the missing values from the data frame named result3.

library(dplyr)
library(ggplot2)
library(reshape2) 

getNBPRates <- function(year) {
  url1 <- sprintf(
    paste0("https://www.nbp.pl/kursy/Archiwum/archiwum_tab_a_", year, ".csv"), 
    year)
  url1 <- read.csv2(url1, header=TRUE, sep=";", dec=",") %>% 
    select(data, X1USD, X1EUR) %>% 
    rename(usd=X1USD, eur=X1EUR, date=data) %>%
    slice(-1)
  transform(url1, date = as.Date(as.character(date), "%Y%m%d"))
}

a <- getNBPRates(year=2015)

head(as.data.frame(a))

years <- c(2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020)

result <- lapply(years, getNBPRates)

result3 <- Reduce(rbind, result)

CodePudding user response:

library(dplyr)

getNBPRates <- function(year) {
  # Build the URL of the yearly archive file
  url1 <- paste0("https://www.nbp.pl/kursy/Archiwum/archiwum_tab_a_", year, ".csv")
  rates <- read.csv2(url1, header = TRUE, sep = ";", dec = ",", fileEncoding = "Windows-1250")
  rates |>
    select(data, X1USD, X1EUR) |>
    slice(-1) |>                        # drop the first row after the header
    filter(row_number() <= n() - 3) |>  # drop the last three rows
    mutate(
      data = as.Date(as.character(data), format = "%Y%m%d"),
      # replace the decimal comma with a dot before converting to numeric
      usd = as.numeric(gsub(",", ".", X1USD)),
      eur = as.numeric(gsub(",", ".", X1EUR))
    ) |>
    select(-c(X1USD, X1EUR))
}

years <- c(2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020)
result <- lapply(years, getNBPRates)
result3 <- Reduce(rbind, result)

What do you mean by "to get rid of the missing values in dataframe named result3"? If you mean the missing dates, you have to fill them in with some logic. If I'm not mistaken, when there is no NBP rate for a particular day, the last available one has to be used.
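If "missing values" does mean days without a published rate, one way to apply the "take the last one" rule is sketched below in base R. The small `rates` data frame and its values are made up for illustration and only stand in for result3; the column names `data` and `usd` follow the function above.

```r
# Toy data standing in for result3 (values are invented for illustration)
rates <- data.frame(
  data = as.Date(c("2015-01-02", "2015-01-05", "2015-01-07")),
  usd  = c(3.55, 3.60, 3.62)
)

# One row per calendar day between the first and last observed date
all_days <- data.frame(data = seq(min(rates$data), max(rates$data), by = "day"))
filled <- merge(all_days, rates, by = "data", all.x = TRUE)

# For every row, find the index of the most recent non-missing rate
last_seen <- cummax(ifelse(is.na(filled$usd), 0L, seq_along(filled$usd)))

# Carry that last observed rate forward into the gaps
filled$usd <- filled$usd[last_seen]
```

The same idea is available in packages such as zoo or tidyr, but the base R version avoids extra dependencies.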

CodePudding user response:

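To change a column to numeric you can use as.numeric(column_name). If the column is a factor, convert it to character first with as.numeric(as.character(column_name)), because as.numeric() applied directly to a factor returns the internal level codes rather than the values.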
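A short sketch of the factor pitfall, using invented values written with a decimal comma as in the question's read.csv2 call. The vector `x` is hypothetical and not taken from the NBP file.

```r
# Invented values with a decimal comma, stored as a factor
x <- factor(c("3,5072", "4,2615", "3,5072"))

# Direct conversion returns the level codes, not the numbers
codes <- as.numeric(x)

# Convert to character, swap the comma for a dot, then convert to numeric
values <- as.numeric(gsub(",", ".", as.character(x)))
```

Here `codes` holds 1, 2, 1, while `values` holds the actual numbers 3.5072, 4.2615, 3.5072.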

Based on the date format in the archiwum_tab_a_2015.csv file, you can change the date column with as.Date(column_name, format = "%Y%m%d")
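As a minimal illustration, assuming the dates arrive as numbers or strings in year-month-day order without separators (the two values below are made up), converting to character first keeps as.Date() from treating the input as a day count:

```r
# Hypothetical dates as they might look after import
d <- c(20150102L, 20150105L)

# Convert to character so the format string is applied
dates <- as.Date(as.character(d), format = "%Y%m%d")
```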

To remove all missing values you can use complete.cases(data):

mydata[complete.cases(mydata),]
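A small sketch of how this behaves, using a made-up `mydata` with one missing date and one missing rate:

```r
# Invented data frame with NA in different columns
mydata <- data.frame(
  date = as.Date(c("2015-01-02", NA, "2015-01-05")),
  usd  = c(3.55, 3.58, NA)
)

# Keep only the rows where every column is non-missing
clean <- mydata[complete.cases(mydata), ]

# na.omit() drops the same rows in a single call
clean2 <- na.omit(mydata)
```

Only the first row survives in both `clean` and `clean2`, because each of the other rows has an NA in at least one column.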