In Stata I have split a variable where up to 20 countries were separated by a comma and now I have twenty different variables (country1 to country20), but the same country is listed in more than one variable country1 to country20.
For instance, Uganda may be in country1, country2 and in country5. Now, I want to create one variable for each country (1 if true, 0 false). So, basically Iwant one variable for each of the twenty countries. I tried this but did not work.
local N = _N
forvalues i = 1/`N' {
local s1 = Countryies1 [`i']
local s2 = Countryies2 [`i']
local s3 = Countryies3 [`i']
local s4 = Countryies4 [`i']
local s5 = Countryies5 [`i']
local s6 = Countryies6 [`i']
local s7 = Countryies7 [`i']
local s8 = Countryies8 [`i']
local s9 = Countryies9 [`i']
local s10 = Countryies10 [`i']
local s11 = Countryies11 [`i']
local s12 = Countryies12 [`i']
local s13 = Countryies13 [`i']
local s14 = Countryies14 [`i']
local s15 = Countryies15 [`i']
local s16 = Countryies16 [`i']
local s17 = Countryies17 [`i']
local s18 = Countryies18 [`i']
local s19 = Countryies19 [`i']
local s20 = Countryies20 [`i']
local intersection: list s1 & s2 & s3 & s4 & s5 & s6 & s7 & s8 & s9 & s10 & s11 & s12 & s13 & s14 & s15 & s16 & s17 & s18 & s19 & s20
replace country ="`intersection'" in `i'
}
CodePudding user response:
This seems to work -- and does not in any sense rule out other solutions.
clear
input str42 countries
"Uganda"
"Uganda, Kenya"
"Uganda, Kenya, Tanzania"
"South Africa"
end
gen id = _n
save datasofar, replace
keep id countries
split countries, parse(,)
drop countries
reshape long countries, i(id) j(which)
drop if missing(countries)
replace countries = trim(countries)
gen name = strtoname(countries)
levelsof name, local(names)
gen new_id = _n
foreach n of local names {
gen is_`n' = name == "`n'"
su new_id if is_`n', meanonly
label var is_`n' "`=countries[r(min)]'"
local vars `vars' is_`n'
}
collapse (max) `vars', by(id)
merge 1:1 id using datasofar
----------------------------------------------------------------------------------------
| id is_Kenya is_Sou~a is_Tan~a is_Uga~a countries _merge |
|----------------------------------------------------------------------------------------|
1. | 1 0 0 0 1 Uganda Matched (3) |
2. | 2 1 0 0 1 Uganda, Kenya Matched (3) |
3. | 3 1 0 1 1 Uganda, Kenya, Tanzania Matched (3) |
4. | 4 0 1 0 0 South Africa Matched (3) |
----------------------------------------------------------------------------------------
Another kind of solution is to just to loop over the names, so
foreach c in Uganda Kenya Tanzania "South Africa" {
local C = strtoname("`c'")
gen is_`C' = strpos(countries, "`c'") > 0
}
but watch out -- variations in spelling will bite you. They will bite with the earlier code too.
