A place to collect my thoughts on data analysis within Enrollment Management. Dare I call it Enrollment Science?
Below is a quick writeup on how I use R
and RNeo4j
to munge my data and throw “larger” datasets into Neo4j. In short, I am fairly capable in R, so I prefer to use it to do the heavy lifting.
All I am doing is calling the neo4j-shell
tool via ?system
command. This post runs through how I have used this approach in some of my recent projects. I used this process for a project that I am currently working on at work, where 3+ million nodes and nearly 9 million relationships.
Below is a basic project structure when I am combining R and Neo4j
neo-project/
├── R
│ ├── 1-get-data.R
│ ├── 2-clean-data.R
│ └── 3-import-data.R
├── README.md
├── cql
│ ├── constraints.cql
│ └── import-nodes.cql
└── data
├── file1.csv
└── file2.csv
3 directories, 8 files
There are many ways to organize projects, but I recommend that you stick with one that works for you.
csv
filesneo4j-shell
On the third step above, I am using a helper function, shown below.
### helper function to load a CQL file into neo4j shell
build_import = function(neo_shell = "~/neo4j-community-2.3.1/bin/neo4j-shell",
cypher_file) {
cmd = sprintf("%s -file %s", neo_shell, cypher_file)
system(cmd)
}
In my case, I am simply running a community edition, and point to the path of the neo4j-shell
tool. If you get an error, you may need to ensure that:
After that, usage is simple within your R script.
build_import(cypher_file = "../cql/import-geo.cql")
The RNeo4j
package is great, and in particular, I love the ?clear
function which helps us rapidly prototype our data imports, the data model, etc.
Below is an example session of what the R script 3-import-data.R
might look like.
###############################################################################
## Load the csvs through Cypher and the terminal
## in a non-Windows environment
###############################################################################
options(stringsAsFactors = FALSE)
## the packages
library(RNeo4j)
## connect to a running db server
graph = startGraph("http://localhost:7474/db/data/",
username = "neo4j",
password = "password")
###############################################################################
## helper function
###############################################################################
## cypher helper function to load against the shell tool
build_import = function(neo_shell = "~/neo4j-community-2.3.1/bin/neo4j-shell",
cypher_file) {
cmd = sprintf("%s -file %s", neo_shell, cypher_file)
system(cmd)
}
###############################################################################
## clear database for testing
###############################################################################
## clear the entire database
clear(graph, input = FALSE)
###############################################################################
## load daata - 1 file per import
###############################################################################
build_import(cypher_file = "../cql/constraints.cql")
build_import(cypher_file = "../cql/import-nodes.cql")