I am using levels(dataset$variable) and it is still showing a NULL output

Time:01-12

This is all using R in RStudio. I am hoping for a quick solution. I am currently working through a Udacity R course, and they ask me to run the code below to show the levels of the variable age.range in a dataset called reddit:

levels(reddit$age.range)

However, it keeps returning the same output:

NULL

In their video tutorial with the same data set it seems to be working fine, and showing a clear series of levels for this variable, so I really don't understand what the issue is. Please help.

> str(reddit)
'data.frame':   32754 obs. of  14 variables:
 $ id               : int  1 2 3 4 5 6 7 8 9 10 ...
 $ gender           : int  0 0 1 0 1 0 0 0 0 0 ...
 $ age.range        : chr  "25-34" "25-34" "18-24" "25-34" ...
 $ marital.status   : chr  NA NA NA NA ...
 $ employment.status: chr  "Employed full time" "Employed full time" "Freelance" "Freelance" ...
 $ military.service : chr  NA NA NA NA ...
 $ children         : chr  "No" "No" "No" "No" ...
 $ education        : chr  "Bachelor's degree" "Bachelor's degree" "Some college" "Bachelor's degree" ...
 $ country          : chr  "United States" "United States" "United States" "United States" ...
 $ state            : chr  "New York" "New York" "Virginia" "New York" ...
 $ income.range     : chr  "$150,000 or more" "$150,000 or more" "Under $20,000" "$150,000 or more" ...
 $ fav.reddit       : chr  "getmotivated" "gaming" "snackexchange" "spacedicks" ...
 $ dog.cat          : chr  NA NA NA NA ...
 $ cheese           : chr  NA NA NA NA ...
> table(reddit$age.range)

      18-24       25-34       35-44       45-54       55-64 65 or Above    Under 18 
      15802       11575        2257         502         140          60        2330

CodePudding user response:

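The problem is that age.range is stored as a character vector (note the chr next to it in your str() output). levels() only returns something for a factor; for a character vector it returns NULL. Converting the column to a factor should fix it: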

reddit$age.range <- as.factor(reddit$age.range)
levels(reddit$age.range)

The likely cause is that the data were imported using read.csv or read.table with stringsAsFactors = FALSE in effect, so the strings stayed as character vectors. The default value of this option changed from TRUE to FALSE in R 4.0.0. It's a good idea to always specify this option explicitly when loading data.
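To illustrate setting the option explicitly at import time, here is a minimal sketch. The inline CSV text and the name demo are made up for the example and only stand in for the course's real file, whose path and format are not shown in the question.

```r
# A tiny inline CSV standing in for the real data file (hypothetical data).
csv_text <- "id,age.range
1,25-34
2,18-24
3,25-34"

# Ask for factors explicitly instead of relying on the version-dependent default.
demo <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(demo$age.range)   # the column is now a factor
levels(demo$age.range)  # so levels() no longer returns NULL
```

With a real file, the same argument would be passed alongside the file path in the read.csv call.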

CodePudding user response:

I’m guessing the course you’re following was created before R 4.0. In R 4.0.0 the default value of the stringsAsFactors argument to data.frame() and read.table() (and therefore read.csv()) was changed from TRUE to FALSE.
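The difference between the two settings can be seen side by side in a small sketch. The vector ages and the data frames df_chr and df_fct are invented for illustration; passing the argument explicitly makes the result the same on any R version.

```r
# Made-up sample values resembling the age.range column.
ages <- c("25-34", "25-34", "18-24", "Under 18")

# Behaviour matching the R >= 4.0.0 default: strings stay character.
df_chr <- data.frame(age.range = ages, stringsAsFactors = FALSE)

# Behaviour matching the pre-4.0 default: strings become factors.
df_fct <- data.frame(age.range = ages, stringsAsFactors = TRUE)

levels(df_chr$age.range)   # NULL, as in the question
nlevels(df_fct$age.range)  # one level per distinct value
levels(df_fct$age.range)
```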
