
Use the SOCR Jobs Data to practice learning via Apriori Association Rules

  • Load the Jobs Data. Use this guide to load HTML data.

  • Focus on the Description feature. Replace all underscore characters “_” with spaces.

  • Review Chapter 7 and use the tm package to convert the text data to plain text. (Hint: you also need to apply stemDocument; Chapter 19 discusses this in more detail.)

  • Generate a “transaction” matrix by treating each job as one record and its description words as the “transaction” items. (Hint: the descriptions differ in length, so you need to fill the shorter records with missing values.)

  • Save the data using write.csv() and then use read.transactions() in the arules package to read the CSV data file. Visualize the item support using item frequency plots. Which terms are the most frequent?

  • Fit a model: myrules <- apriori(data = jobs, parameter = list(support = 0.02, confidence = 0.6, minlen = 2)). Try out several rule thresholds, trading off gain and accuracy.

  • Evaluate the rules you obtained using lift and visualize their metrics.

  • Mine medical-related rules (e.g., rules that include “treatment”, “patient”, “care”, or “diagnos”; note that these are word stems).

  • Sort both the full set of association rules and the medical-related subset.

  • Save these rules into a CSV file.
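
The steps above can be sketched end to end on toy data. This is only a minimal illustration under stated assumptions, not a reference solution: the six descriptions are made up and stand in for the Description feature of the Jobs Data, the support threshold is raised to 0.3 because the toy sample is so small, and the file names are temporary placeholders. It also writes one comma-separated basket per line with writeLines() instead of padding a matrix and calling write.csv(), which avoids the missing-value fill. stemDocument() additionally requires the SnowballC package.

```r
library(tm)      # text cleaning; stemDocument() relies on SnowballC
library(arules)  # transactions, apriori(), inspect(), write()

# Hypothetical stand-in for the Description feature of the Jobs Data
descriptions <- c(
  "Diagnose_and_treat_patients_and_provide_patient_care",
  "Provide_nursing_care_and_treatment_to_patients",
  "Design_and_develop_software_applications",
  "Develop_and_test_software_systems",
  "Assess_patients_and_plan_treatment_and_care",
  "Analyze_data_and_develop_statistical_software"
)

# Replace underscores with spaces
descriptions <- gsub("_", " ", descriptions, fixed = TRUE)

# Clean the text with tm: lower-case, drop punctuation and stop words, stem
corpus <- VCorpus(VectorSource(descriptions))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument)
corpus <- tm_map(corpus, stripWhitespace)

# One job = one transaction; its items are the distinct word stems
clean <- vapply(corpus, function(doc) paste(content(doc), collapse = " "),
                character(1))
items <- lapply(strsplit(trimws(clean), " +"), unique)

# Write one comma-separated basket per line, then read it back as transactions
basket_file <- tempfile(fileext = ".csv")
writeLines(vapply(items, paste, character(1), collapse = ","), basket_file)
jobs <- read.transactions(basket_file, format = "basket", sep = ",")

# Item support: the most frequent stems
itemFrequencyPlot(jobs, topN = 10)

# Fit the rules (support raised from 0.02 because the toy data has 6 records)
myrules <- apriori(jobs,
                   parameter = list(support = 0.3, confidence = 0.6, minlen = 2))

# Rank all rules by lift
all_sorted <- sort(myrules, by = "lift")
inspect(head(all_sorted, 5))

# Medical-related subset; %pin% does partial (regular expression) matching
# on item labels, so "care" would also match a stem such as "career"
med_rules  <- subset(myrules, items %pin% "treatment|patient|care|diagnos")
med_sorted <- sort(med_rules, by = "lift")
inspect(head(med_sorted, 5))

# Save the rules to a CSV file
rules_file <- tempfile(fileext = ".csv")
write(all_sorted, file = rules_file, sep = ",", quote = TRUE, row.names = FALSE)
```

With the real Jobs Data, the support and confidence thresholds would need tuning: lower support admits more (and rarer) rules, while higher confidence keeps only the more reliable ones.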
