Working with website data
Network data and visualization
- Download 03_les miserablese_GraphData.txt
- Visualize this undirected network graph
- Summarize the graph and explain the output
- Calculate the degree and the centrality of this graph
- Identify some important nodes (each node corresponds to a character in the novel)
- Will the results change if we assume the graph is directed?
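
The graph steps above can be sketched in R with the igraph package. This is a minimal illustration under stated assumptions, not a solution: the layout of the downloaded file is not assumed here, so the edge list below is a small hypothetical stand-in (the ties between characters are made up), and the object names are illustrative only.

```r
library(igraph)

# Hypothetical edge list standing in for the downloaded file;
# the ties listed here are invented for illustration.
edges <- data.frame(
  from = c("Valjean", "Valjean", "Valjean", "Javert", "Cosette", "Marius"),
  to   = c("Javert", "Cosette", "Marius", "Thenardier", "Marius", "Thenardier")
)

# Build the same edge list as an undirected and as a directed graph
g_und <- graph_from_data_frame(edges, directed = FALSE)
g_dir <- graph_from_data_frame(edges, directed = TRUE)

# Summarize and visualize the undirected graph
summary(g_und)
plot(g_und, vertex.size = 25)

# Degree and two common centrality measures for the undirected graph
deg_und <- degree(g_und)
btw_und <- betweenness(g_und)
cls_und <- closeness(g_und)

# In a directed graph, degree splits into in-degree and out-degree
deg_in  <- degree(g_dir, mode = "in")
deg_out <- degree(g_dir, mode = "out")

# Rank nodes by undirected degree to spot candidate "important" characters
sort(deg_und, decreasing = TRUE)
```

Comparing `deg_und` with `deg_in` and `deg_out` is one way to approach the last question: once edge direction is taken into account, a node's ranking can differ depending on whether incoming or outgoing ties are counted.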
Data conversion and parallel computing
- Download CaseStudy12_ AdultsHeartAttack_Data.xlsx or read it directly from its online location
- Load this data as a data frame
- Use export() or write.xlsx() to rewrite the xlsx file
- Use the rio package to convert this “.xlsx” file to “.csv”
- Generate generalized tabular data structures
- Generate a data.table
- Create disk-based data frames and perform basic calculations
- Perform basic calculations on the last 5 columns as a big matrix
- Use DIAGNOSIS, SEX, DRG, CHARGES, LOS and AGE to predict DIED with randomForest, setting ntree=20000. Note: sample without replacement to obtain a balanced dataset that is as large as possible
- Run train() in caret and measure the execution time
- Detect the number of available cores and create an appropriate number of clusters
- Rerun train() in parallel and compare the execution times
- Use foreach and doMC to design a parallelized random forest with ntree=20000 and compare the execution time against sequential execution.
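
The balanced sampling and the final foreach/doMC step can be sketched as follows. This is a rough illustration under stated assumptions rather than a reference solution: the data frame `toy` is simulated as a hypothetical stand-in for the heart attack data (only three of the predictors are mimicked), `n_tree` is kept far below 20000 so the sketch runs quickly, and the worker count of 2 is arbitrary. Note that doMC relies on process forking and is therefore not available on Windows.

```r
library(randomForest)
library(foreach)
library(doMC)

set.seed(1234)

# Hypothetical stand-in for the case-study data: an imbalanced binary outcome
n <- 2000
toy <- data.frame(
  AGE     = round(rnorm(n, mean = 65, sd = 12)),
  LOS     = rpois(n, lambda = 5),
  CHARGES = rlnorm(n, meanlog = 9, sdlog = 0.5)
)
toy$DIED <- factor(rbinom(n, size = 1, prob = plogis(-4 + 0.03 * toy$AGE)))

# Balanced dataset: keep every case of the rarer class and draw the same
# number from the other class, sampling without replacement
idx_died  <- which(toy$DIED == 1)
idx_alive <- which(toy$DIED == 0)
n_min     <- min(length(idx_died), length(idx_alive))
balanced  <- toy[c(sample(idx_died, n_min), sample(idx_alive, n_min)), ]

n_tree    <- 400   # assumption: small for speed; the exercise asks for 20000
n_workers <- 2     # assumption: adjust after checking parallel::detectCores()

# Sequential baseline
t_seq <- system.time(
  rf_seq <- randomForest(DIED ~ ., data = balanced, ntree = n_tree)
)

# Parallel version: each worker grows part of the forest and
# randomForest::combine merges the partial forests into one
registerDoMC(cores = n_workers)
t_par <- system.time(
  rf_par <- foreach(nt = rep(n_tree / n_workers, n_workers),
                    .combine = randomForest::combine,
                    .packages = "randomForest") %dopar%
    randomForest(DIED ~ ., data = balanced, ntree = nt)
)

# Compare elapsed wall-clock times (seconds)
c(sequential = t_seq[["elapsed"]], parallel = t_par[["elapsed"]])
```

With a forest this small, the overhead of starting workers may outweigh the gain; the difference is expected to become more visible as `n_tree` grows toward the value requested in the exercise. A combined forest does not carry over the out-of-bag error estimates of its parts, so accuracy should be checked on held-out data.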