An example of making formhub data more readable

formhub.R makes it easy to download and work with datasets on formhub. After downloading, formhub.R post-processes your dataset to convert each column to the correct type, which it derives from the type you specified when creating your XLSform. If you haven't read the basics document, I recommend reading that first.
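To give a feel for what that post-processing amounts to, here is a minimal sketch in base R. It is not formhub.R's actual implementation: the `declared` lookup, the `convert_column` helper and the toy values are all made up for illustration, and the type names are only assumed to resemble the ones used in an XLSform.

```r
# Toy "raw" download: everything arrives as text (made-up values).
raw <- data.frame(
  amount      = c("3", "10"),
  submit_date = c("2012-05-01", "2012-05-03"),
  rating      = c("good", "bad"),
  stringsAsFactors = FALSE
)

# Hypothetical column -> declared type lookup, as a form definition might give.
declared <- c(amount = "integer", submit_date = "date", rating = "select_one")

# Convert one column according to its declared type; unknown types pass through.
convert_column <- function(x, type) {
  switch(type,
    integer    = as.integer(x),
    decimal    = as.numeric(x),
    date       = as.Date(x),
    select_one = factor(x),
    x
  )
}

typed <- raw
for (col in names(declared)) {
  typed[[col]] <- convert_column(raw[[col]], declared[[col]])
}

# Each column now has the class implied by its declared type.
sapply(typed, class)
```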

In this example, we will go through how to make data downloaded from formhub prettier by replacing the slugs in your datasets with the text of the original questions and answers that enumerators saw in ODK or on webforms.

So let's begin with the public good_eats dataset, and look at (1) the column names of the downloaded data, and (2) the values recorded in the risk_factor column for the various good eats.

require(formhub)
# Download the dataset named good_eats in the account of mberg
good_eats <- formhubDownload("good_eats", "mberg")
names(good_eats)
##  [1] "submit_data"       "food_type"         "description"      
##  [4] "amount"            "rating"            "comments"         
##  [7] "risk_factor"       "food_photo"        "location_name"    
## [10] "location_photo"    "gps"               "X_gps_latitude"   
## [13] "X_gps_longitude"   "X_gps_altitude"    "X_gps_precision"  
## [16] "imei"              "submit_date"       "X_uuid"           
## [19] "X_submission_time"
summary(good_eats$risk_factor)
##   high_risk    low_risk medium_risk        NA's 
##          23         116          41          49

We see the “slugs” that Matt entered in the name column of his formhub form. With formhub.R's replaceHeaderNamesWithLabels function, we can easily replace these column names with the actual questions that he asked:

good_eats_readable_questions <- replaceHeaderNamesWithLabels(good_eats)
names(good_eats_readable_questions)
##  [1] "submit_data"       "Type of Eat"       "Description"      
##  [4] "Amount"            "Rating"            "Comments"         
##  [7] "Risk Factor"       "Food Pic"          "Location Name"    
## [10] "Served At"         "Location"          "X_gps_latitude"   
## [13] "X_gps_longitude"   "X_gps_altitude"    "X_gps_precision"  
## [16] "imei"              "submit_date"       "X_uuid"           
## [19] "X_submission_time"

You'll see that every column whose question actually had a label has been renamed. The effect is pretty subtle here; mostly things are just being capitalized. With this function, the answers themselves are not replaced:

# Note: because the column name includes a space, it is surrounded by backticks (` `)
summary(good_eats_readable_questions$`Risk Factor`)
##   high_risk    low_risk medium_risk        NA's 
##          23         116          41          49
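
Conceptually, header replacement is just a lookup from slug to label. The sketch below imitates it in base R on a toy data frame; the `labels` vector and the `relabel_headers` helper are hypothetical, and this is not necessarily how formhub.R does it internally.

```r
# Toy data with slug-style column names and answers (made-up values,
# not the real good_eats data).
toy <- data.frame(
  risk_factor = factor(c("low_risk", "high_risk", "low_risk")),
  imei        = c("a1", "b2", "c3"),
  stringsAsFactors = FALSE
)

# Hypothetical slug -> label lookup; only questions with a label appear here.
labels <- c(risk_factor = "Risk Factor")

# Rename only the columns that have a label; leave the rest untouched.
relabel_headers <- function(df, labels) {
  has_label <- names(df) %in% names(labels)
  names(df)[has_label] <- unname(labels[names(df)[has_label]])
  df
}

toy_readable <- relabel_headers(toy, labels)
names(toy_readable)
## [1] "Risk Factor" "imei"

# Columns whose names contain spaces can also be reached with [[ ]].
levels(toy_readable[["Risk Factor"]])
## [1] "high_risk" "low_risk"
```

As in the real output above, the headers change but the factor levels are still slugs.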

We can replace the answers as well, just as easily, using the replaceAllNamesWithLabels function:

good_eats_readable <- replaceAllNamesWithLabels(good_eats)
summary(good_eats_readable$`Risk Factor`)
## High Risk (Hope it was worth it)                         Low Risk 
##                               23                              116 
##       Medium Risk (Questionable)                             NA's 
##                               41                               49
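
The same lookup idea extends to the answers. As a rough sketch (again in base R, with a hypothetical `choice_labels` vector and `relabel_levels` helper rather than formhub.R's own code), replacing answer slugs comes down to rewriting the levels of a factor:

```r
# Toy answers stored as slugs, including a missing value (made-up data).
risk <- factor(c("low_risk", "high_risk", "medium_risk", "low_risk", NA))

# Hypothetical slug -> label lookup for the answer choices;
# the label text is taken from the output shown above.
choice_labels <- c(
  high_risk   = "High Risk (Hope it was worth it)",
  low_risk    = "Low Risk",
  medium_risk = "Medium Risk (Questionable)"
)

# Swap each level for its label; levels without a label keep their slug.
relabel_levels <- function(f, choice_labels) {
  old <- levels(f)
  new <- ifelse(old %in% names(choice_labels), choice_labels[old], old)
  levels(f) <- unname(new)
  f
}

risk_readable <- relabel_levels(risk, choice_labels)

# Counts per label; missing answers stay NA.
table(risk_readable, useNA = "ifany")
```

Because only the levels are rewritten, the underlying data and the missing values are left exactly as they were.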

And of course, even the graph comes out looking slightly better, thanks to the more readable default labels:

require(ggplot2)
qplot(data = good_eats_readable, x = submit_date, fill = `Risk Factor`)

[Figure: plot produced by the qplot call above]

For multi-lingual forms, the replaceAllNamesWithLabels function takes a language argument:

pde <- formhubDownload("points_d_eau", "Roxance")
pde_fr <- replaceAllNamesWithLabels(pde, language = "French")
qplot(data = pde_fr, x = `A-6.6 Qui gère cette source/ ce point d’eau ?`) + 
    coord_flip()

[Figure: plot produced by the qplot call above]
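
One way to picture the language argument is as a choice between several label columns. The sketch below assumes a made-up `form_labels` table with one label column per language and a hypothetical `labels_for` helper; it only illustrates the idea and does not reflect how formhub.R stores or looks up labels.

```r
# Hypothetical label table: one row per question, one label column per language.
form_labels <- data.frame(
  name    = c("village_name", "source_type"),
  English = c("Village name", "Type of source"),
  French  = c("Nom du village", "Type de source"),
  stringsAsFactors = FALSE
)

# Build a slug -> label lookup for the requested language.
labels_for <- function(form_labels, language) {
  stopifnot(language %in% names(form_labels))
  setNames(form_labels[[language]], form_labels$name)
}

# Toy dataset with slug column names (made-up values).
toy <- data.frame(
  village_name = "Dioro",
  source_type  = "puits",
  stringsAsFactors = FALSE
)

fr <- labels_for(form_labels, "French")
names(toy) <- unname(fr[names(toy)])
names(toy)
## [1] "Nom du village" "Type de source"
```

Asking for a language that has no label column stops with an error here, which seemed a safer default for a sketch than silently falling back to the slugs.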