
1 Working with website data

2 Network data and visualization

  • Download 03_les miserablese_GraphData.txt.

  • Visualize this undirected network.

  • Summarize the graph and explain the output.

  • Calculate the degree and the centrality of each node in the graph.

  • Identify some important characters.

  • Would the results change if we assumed the graph is directed?
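A minimal base-R sketch of the degree and centrality bookkeeping, assuming the data file is a two-column edge list (its actual layout is not described here). The edges between nodes A to D are a hypothetical toy example, not the Les Misérables data. It also shows the directed view: each node's degree splits into an out-degree and an in-degree, which add up to the undirected degree. The igraph package provides the same measures, and others, through degree(), closeness(), and betweenness().

```r
# Minimal base-R sketch: degree and normalized degree centrality from an
# edge list. The edges below are a hypothetical toy example standing in for
# the (assumed) two-column edge list in 03_les miserablese_GraphData.txt.
edges <- data.frame(
  from = c("A", "A", "B", "C", "D"),
  to   = c("B", "C", "C", "D", "A"),
  stringsAsFactors = FALSE
)
nodes <- sort(unique(c(edges$from, edges$to)))
n <- length(nodes)

# Undirected degree: every edge adds 1 to each of its two endpoints
deg_undirected <- table(factor(c(edges$from, edges$to), levels = nodes))

# Directed view: out-degree counts tails, in-degree counts heads
deg_out <- table(factor(edges$from, levels = nodes))
deg_in  <- table(factor(edges$to, levels = nodes))

# Normalized degree centrality: degree / (n - 1)
centrality <- deg_undirected / (n - 1)

# "Important" nodes under this measure: those with maximal degree
top_nodes <- names(deg_undirected)[deg_undirected == max(deg_undirected)]

print(rbind(undirected = deg_undirected, out = deg_out, "in" = deg_in))
print(round(centrality, 3))
print(top_nodes)
```

Ranking by degree is only one notion of importance; betweenness or closeness centrality can rank the same characters differently.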

3 Data conversion and parallel computing

  • Download CaseStudy12_ AdultsHeartAttack_Data.xlsx, or read it directly from its online location.

  • Load the data as a data frame.

  • Use export() or write.xlsx() to rewrite the .xlsx file.

  • Use the rio package to convert this “.xlsx” file to “.csv”.

  • Convert the data into a generalized tabular data structure (e.g., with dplyr).

  • Convert the data frame to a data.table.

  • Create a disk-based data frame and perform basic calculations on it.

  • Treat the last 5 columns as a big matrix and perform basic calculations on it.

  • Use DIAGNOSIS, SEX, DRG, CHARGES, LOS, and AGE to predict DIED with randomForest, setting ntree=20000. Note: sample without replacement to obtain the largest possible balanced dataset.

  • Run train() from caret and record the execution time.

  • Detect the number of available cores and create a cluster with an appropriate number of workers.

  • Rerun train() in parallel and compare the execution times.

  • Use foreach and doMC to build a parallelized random forest with ntree=20000 in total, and compare its execution time with that of sequential execution.
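One possible reading of the sampling note and the foreach/doMC task, sketched in R under stated assumptions. The data frame df is simulated, and its columns DIED, AGE, and LOS are hypothetical stand-ins for the real variables. "Largest possible balanced dataset" is taken to mean downsampling every class, without replacement, to the size of the smallest class. The 20000 trees are split as evenly as possible across workers and merged with randomForest::combine. The forest is grown only if randomForest, foreach, and doMC are installed; doMC relies on process forking, which is not available on Windows.

```r
# Sketch under assumptions: `df` is simulated; its columns are hypothetical
# stand-ins for the real heart-attack variables.
set.seed(1)
df <- data.frame(
  DIED = factor(sample(c(0, 1), 500, replace = TRUE, prob = c(0.9, 0.1))),
  AGE  = round(runif(500, 30, 90)),
  LOS  = rpois(500, 5)
)

# Largest balanced subset: keep n_min rows per class, where n_min is the
# size of the smallest class, drawing rows WITHOUT replacement.
balance_downsample <- function(data, label) {
  idx_by_class <- split(seq_len(nrow(data)), data[[label]])
  n_min <- min(lengths(idx_by_class))
  keep <- unlist(lapply(idx_by_class,
                        function(i) i[sample.int(length(i), n_min)]))
  data[sort(keep), , drop = FALSE]
}

# Split `total` trees into `workers` chunks whose sizes differ by at most 1
split_trees <- function(total, workers) {
  chunk <- rep(total %/% workers, workers)
  extra <- total %% workers
  if (extra > 0) chunk[seq_len(extra)] <- chunk[seq_len(extra)] + 1
  chunk
}

train_df <- balance_downsample(df, "DIED")
# Leave one core free; fall back to 1 if detectCores() returns NA
n_cores <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)
print(table(train_df$DIED))
print(split_trees(20000, n_cores))

# Grow and time the parallel forest only if the optional packages exist.
pkgs <- c("randomForest", "foreach", "doMC")
if (all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  library(foreach)
  doMC::registerDoMC(cores = n_cores)
  t_par <- system.time(
    rf <- foreach(nt = split_trees(20000, n_cores),
                  .combine = randomForest::combine,
                  .packages = "randomForest") %dopar% {
      randomForest::randomForest(DIED ~ AGE + LOS, data = train_df, ntree = nt)
    }
  )
  print(t_par)
  print(rf$ntree)
}
```

For the sequential baseline, wrap a single randomForest(..., ntree = 20000) call on the same train_df in system.time(); caret's train() can be timed the same way before and after registering a parallel backend.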

SOCR Resource, Dinov