I can't make a new column with paste in R

I have this code, and I can't get the date into a single column (the problem is shown in the photos):

dane <- read.csv2("dane-ceny.csv")
str(dane)
head(dane)

d<-dane[dane$region=="MALOPOLSKIE",c(3,4,6)]
dim(dane)
d$data<-as.Date(paste(d$rok,d$mies,"01",sep="-"),"%Y-%b-%d")

Sorry for my bad English!

CodePudding user response：

The format= string is correct. The likely culprit is the locale, since %b is matched against the abbreviated month names of the current LC_TIME locale, so I think maydin's link to a question whose answers include Sys.setlocale(.) is going to be the best approach. It would help us reproduce your working environment if you posted what Sys.getlocale("LC_TIME") returns.
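
To see what your session currently accepts for %b, a quick check along these lines may help. It is only a sketch: the variable name abbr is mine, and the output depends entirely on your platform and locale, so I am not showing any.

```r
# Which LC_TIME locale is active in this session?
Sys.getlocale("LC_TIME")

# The twelve abbreviated month names that %b produces (and expects) in that locale
abbr <- format(ISOdate(2000, 1:12, 1), "%b")
abbr
```

If the values in d$mies do not appear in abbr, as.Date(..., "%Y-%b-%d") returns NA for them.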

Here's some sample data, and my guess at the locale:

d <- structure(list(rok = c("2017", "2018", "2018", "2019"), mies = c("sty", "lut", "wrz", "gru")), class = "data.frame", row.names = c(NA, -4L))
d
#    rok mies
# 1 2017  sty
# 2 2018  lut
# 3 2018  wrz
# 4 2019  gru

Sys.setlocale("LC_TIME", "Polish")
# [1] "Polish_Poland.1250"
as.Date(paste(d$rok, d$mies, "01", sep = "-"), format = "%Y-%b-%d")
# [1] "2017-01-01" "2018-02-01" "2018-09-01" "2019-12-01"
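
Note that Sys.setlocale(.) changes the locale for the rest of the session, and the accepted names are platform-dependent ("Polish" is what I would try on Windows; on Linux or macOS it is more likely something like "pl_PL.UTF-8", but that is an assumption about your system). If you would rather not leave the session changed, a small wrapper can restore the previous setting afterwards. The function name parse_with_locale is hypothetical, not part of base R:

```r
# Parse dates under a temporary LC_TIME locale, then restore the old one.
parse_with_locale <- function(x, format, locale) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) when the locale is unavailable
  res <- suppressWarnings(Sys.setlocale("LC_TIME", locale))
  if (identical(res, "")) stop("locale not available: ", locale)
  as.Date(x, format = format)
}

# The "C" locale exists everywhere and uses English abbreviations
parse_with_locale("2017-Jan-01", "%Y-%b-%d", "C")
# [1] "2017-01-01"

# For the question's data it might look like this (locale name is a guess):
# d$data <- parse_with_locale(paste(d$rok, d$mies, "01", sep = "-"), "%Y-%b-%d", "Polish")
```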

As a workaround, or if the locale approach is incomplete and does not work for some of the months, you can make a vector of all of your abbreviated months and then match your $mies against it. I'm going to guess the language is Polish (informed by https://web.library.yale.edu/cataloging/months), which leads me to:

month.abb.polish <- c("sty", "lut", "mar", "kwi", "maj", "cze", "lip", "sie", "wrz", "paź", "lis", "gru") # note 1

### this portion is just to prove that it works even if locale is wrong
Sys.setlocale("LC_TIME", "English_United States.1252")
# [1] "English_United States.1252"
as.Date(paste(d$rok, d$mies, "01", sep = "-"), format = "%Y-%b-%d")
# [1] NA NA NA NA

### this is the workaround
paste(d$rok, match(d$mies, month.abb.polish), "01", sep = "-")
# [1] "2017-1-01"  "2018-2-01"  "2018-9-01"  "2019-12-01"
as.Date(paste(d$rok, match(d$mies, month.abb.polish), "01", sep = "-"), format = "%Y-%m-%d")
# [1] "2017-01-01" "2018-02-01" "2018-09-01" "2019-12-01"

Note:

I do not speak Polish, so my guess at "paź" (and any of the other months) may not be quite right; in fact, I may have inferred incorrectly and this is a different language altogether ... my apologies. The point of the "workaround" part of the answer, though, is that it doesn't matter what locale is set, nor whether the abbreviations are technically correct: all it requires is that you know which abbreviations in the dataset correspond to which month numbers. This solution is a one-to-one mapping, so if the data is inconsistent and uses different abbreviations for the same month, a different approach would be necessary.

Multiple candidates could be handled with a lookup table. This version of month.abb.polish has more than one candidate for some month numbers (I arbitrarily added "jan" and "feb" as two English variants to fill it out, plus provided two abbreviations for "październik").
```
month.abb.polish <- c("sty"=1, "jan"=1, "lut"=2, "feb"=2, "mar"=3, "kwi"=4, "maj"=5, "cze"=6, "lip"=7, "sie"=8, "wrz"=9, "paz"=10, "paź"=10, "lis"=11, "gru"=12)
month.abb.polish[d$mies]
# sty lut wrz gru 
#   1   2   9  12 
paste(d$rok, month.abb.polish[d$mies], "01", sep = "-")
# [1] "2017-1-01"  "2018-2-01"  "2018-9-01"  "2019-12-01"
as.Date(paste(d$rok, month.abb.polish[d$mies], "01", sep = "-"), format = "%Y-%m-%d")
# [1] "2017-01-01" "2018-02-01" "2018-09-01" "2019-12-01"
```
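
With real data it is also worth checking which values failed to match, since a lookup silently returns NA for anything it does not know. Below is a minimal sketch with made-up sample values (mixed case, stray whitespace, and one unknown abbreviation); the names month_lookup, key, num and unmatched are mine, and I kept the keys ASCII-only to sidestep encoding issues.

```r
# Lookup table: abbreviation -> month number (ASCII-only keys for this sketch)
month_lookup <- c(sty = 1, lut = 2, mar = 3, kwi = 4, maj = 5, cze = 6,
                  lip = 7, sie = 8, wrz = 9, paz = 10, lis = 11, gru = 12)

# Made-up sample: messy case, stray whitespace, and one unknown value
rok  <- c("2017", "2018", "2018", "2019")
mies <- c("Sty", " lut", "wrz", "xyz")

key <- tolower(trimws(mies))       # normalize before looking up
num <- unname(month_lookup[key])   # NA where the key is not in the table

# Values that need attention (add them to the table or fix the data)
unmatched <- unique(mies[is.na(num)])
unmatched
# [1] "xyz"

dates <- as.Date(paste(rok, num, "01", sep = "-"), format = "%Y-%m-%d")
dates
# [1] "2017-01-01" "2018-02-01" "2018-09-01" NA
```

Anything reported in unmatched can then be added to the table as another candidate for the right month number.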