Max De Marzi

Batch Importer – Part 2 « Max De Marzi says:

February 28, 2012 at 1:45 PM

[…] you’ve been following along, we got Michael’s Batch Importer, compiled it, created some test data, ran it and saw […]

Reply

Batch Importer – Neo4j « Another Word For It says:

March 7, 2012 at 4:44 PM

[…] Batch Importer – Part 1: CSV files. […]

Reply

Batch Importer – Part 3 « Max De Marzi says:

July 2, 2012 at 3:55 PM

[…] the end of February, we took a look at Michael Hunger’s Batch Importer. It is a great tool to load millions of nodes and relationships into Neo4j quickly. The only thing […]

Reply

Graph Generator « Max De Marzi says:

July 3, 2012 at 3:12 PM

[…] recall, I’ve had three blog posts about the Batch Importer. In the first one, I showed you how to install the Batch Importer, in the second one, I showed you how to use data in your relational database to generate the csv […]

Reply

volkan Tüfekçi (@volkantufekci) says:

August 15, 2012 at 7:59 AM

Hi Max,

My thesis work requires filling a Neo4j server instance with at least 1M nodes (plus their relationships) as quickly as possible. (I am using Neo4j server instead of embedded because I need to communicate between servers running on different machines.)

I tried the REST API batch operations (via Neography), but I realised that it is not the way to go. Then I found your post, and now I am trying to use batch-importer. It works, but it takes too much time. My testbed is an AWS Large instance with 7.5 GB RAM and 2 virtual cores.

As a comparison: you wrote that “Importing 7500000 Nodes took 17 seconds”; for me the same step took 138 seconds, about 8 times longer.

The batch importer has been running for 2.5 hours and is still printing dots, but the last and only thing it printed out was “Importing 7500000 Nodes took 138 seconds”.

Do you have any idea what slows down the operation?
Could you please share your test configuration?

Thanks a lot for your great blog and for neography…

Reply

Erik Fäßler (@Khituras) says:

April 17, 2013 at 2:18 AM

Hi Volkan,

Did you ever solve this issue? I’m facing the exact same problem: I want to load a lot of data into a remote Neo4j Server instance, and I can’t shut down the DB for that or take the embedded approach. Did you have any luck in the end?

Thanks!

Erik

Reply

maxdemarzi says:

August 15, 2012 at 8:54 AM

Volkan,

2.5 hours? Something is not right. Do your nodes and relationships have a ton of properties? Can you check inside the graph.db folder being created and see the file sizes growing? Are you indexing (that’s a bit slower than creating nodes and relationships)? Post your answers on the neo4j google forum and we’ll figure this out.

Thanks,
Max

Reply

Keith Strickland says:

October 20, 2012 at 5:01 PM

Hi Max,

I was trying to install this using maven as your instructions suggest but I’m getting the following error:

C:\Users\GBS\git\batch-import>mvn clean compile assembly:single
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Simple Batch Importer 0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for org.neo4j:neo4j-kernel:jar:1.8-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for org.neo4j:neo4j-lucene-index:jar:1.8-SNAPSHOT is missing, no dependency information available
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.453s
[INFO] Finished at: Sat Oct 20 18:38:24 EDT 2012
[INFO] Final Memory: 6M/77M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project batch-import: Could not resolve dependencies for project org.neo4j:batch-import:jar:0.1-SNAPSHOT: The following artifacts could not be resolved: org.neo4j:neo4j-kernel:jar:1.8-SNAPSHOT, org.neo4j:neo4j-lucene-index:jar:1.8-SNAPSHOT: Failure to find org.neo4j:neo4j-kernel:jar:1.8-SNAPSHOT in http://m2.neo4j.org/content/repositories/snapshots was cached in the local repository, resolution will not be reattempted until the update interval of Neo4j Snapshots has elapsed or updates are forced -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
C:\Users\GBS\git\batch-import>

I’ve searched and can’t seem to find anything concerning the error above. Hopefully you can point me in the right direction.

Thanks

Reply

maxdemarzi says:

October 20, 2012 at 6:54 PM

Open up the pom.xml and change the two 1.8-SNAPSHOT to 1.8. Michael will update his repo shortly.
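For anyone who prefers the command line, here is a minimal sketch of that edit. It assumes a POSIX shell with sed; the function name fix_snapshot_versions is made up for this example, and it simply rewrites every occurrence of 1.8-SNAPSHOT in the file you pass it, so check the result before building.

```shell
# Hypothetical helper: replace the Neo4j 1.8-SNAPSHOT dependency versions
# with the released 1.8 in the given pom.xml.
# The project's own 0.1-SNAPSHOT version does not match and is left alone.
fix_snapshot_versions() {
  pom="$1"
  # -i.bak edits in place and keeps the original as pom.xml.bak
  # (this form works with both GNU and BSD sed).
  sed -i.bak 's/1\.8-SNAPSHOT/1.8/g' "$pom"
}

# Usage, from the batch-import checkout:
#   fix_snapshot_versions pom.xml
#   mvn clean compile assembly:single
```

If the build still tries the snapshot repository, the old pom.xml.bak can be compared against the new file to confirm that both dependency versions were changed.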

Reply

Keith Strickland says:

October 22, 2012 at 8:33 AM

Thanks Max! That got it :)

Enzo says:

February 25, 2013 at 4:01 AM

Hi
Tried to compile TestDataGenerator. Initially couldn’t find the file, then found it in /src/test/java/org/neo4j/batchimport/TestDataGenerator.java.
But then got compile errors:
./src/test/java/org/neo4j/batchimport/TestDataGenerator.java:3: package org.junit does not exist
import org.junit.Ignore;
^
./src/test/java/org/neo4j/batchimport/TestDataGenerator.java:14: cannot find symbol
symbol: class Ignore
@Ignore
^
Help please!!
Enzo

Reply

maxdemarzi says:

February 25, 2013 at 9:41 AM

Enzo,

Can you post your error to https://groups.google.com/forum/?fromgroups#!forum/neo4j ?
We can better help you there.

Regards,
Max

Reply

Importing data into Neo4j – the spreadsheet way « Another Word For It says:

March 6, 2013 at 6:23 PM

[…] are many technical tools out there (definitely look here, here and here), but I needed something simple. So my friend and colleague Michael Hunger came to the […]

Reply

Seb says:

March 22, 2013 at 5:21 AM

Hi,

is there a release (JAR file) available somewhere? Building it is such a pain…thanks!

Reply

maxdemarzi says:

March 22, 2013 at 7:50 AM

You can grab this one from my public dropbox => https://dl.dropbox.com/u/57740873/batch-import-jar-with-dependencies.jar

Reply

mentatseb says:

March 22, 2013 at 7:59 AM

Thanks! For those who rarely use Maven projects it’s a real help :)

Anyway, here is how to do it with NetBeans:
– clone the project
– open it in Netbeans
– right-click on the project name, select Properties, then the Actions panel
– select Build with Dependencies and add this goal to the Execute Goals settings: ‘assembly:single’
– also add the property Skip Tests
– press OK
– right-click again on the project name, select Resolve Problems on the bottom to download the dependencies.
– right-click again and select Build with Dependencies

cheers,
Seb

Permission Resolution with Neo4j – Part 2 | Max De Marzi says:

March 24, 2013 at 11:24 PM

[…] so instead of typing out a million node graph, we’ll build a graph generator and use the batch importer to load it into Neo4j. What I want to create is a set of files to feed to the batch-importer. A […]

Reply

Importing data from Excel into Neo4j | Neo4j Chinese site – the world's leading graph database says:

March 30, 2013 at 9:58 PM

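[…] There are technical tutorials that show how to do this (such as batch-import, batch importer part, import), but I needed something simple, so my friend and colleague Michael Hunger came to help and provided a way to create an Excel spreadsheet that imports data into Neo4j. […]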

Reply

Enric G. Torrents says:

April 9, 2013 at 8:30 PM

Max, thanks for all these tutorials. Have you noticed that batch-import tool does not support UTF-8 encoding? No accents, no non-English characters at all, this is a massive problem for many of us. I have already raised the issue in github, do you have any idea how to make it work?

Reply

tameem says:

May 23, 2013 at 8:25 AM

Hi,
I am trying to import 117,000,000+ nodes, along with their relationships and indices, on a server using the batch-import jar, running it from NetBeans. We did it before, but indexing wasn’t implemented correctly, so we are debugging and rerunning to find out why indexing isn’t working. We try as much as possible on a smaller example (2M nodes without relationships) and then repeat the same thing on the big files. The problem is that each run on the server with the big files takes more than 27 hours, and we end up in a loop: it doesn’t work, “oh, maybe this is why”, run again on the small example, “great, looks like we found it”, run on the big files, and 27 hours later: “oh, not working again”. My question is: is there a way to speed up the run time on the big example with the aforementioned number of nodes?

Reply

maxdemarzi says:

May 23, 2013 at 8:00 PM

There is a batch.properties file that is read when you run the batch import. The defaults are for a small graph. Tweak these to much larger values, depending on your expected graph size:

use_memory_mapped_buffers=true
neostore.nodestore.db.mapped_memory=100M
neostore.relationshipstore.db.mapped_memory=500M
neostore.propertystore.db.mapped_memory=1G
neostore.propertystore.db.strings.mapped_memory=200M
neostore.propertystore.db.arrays.mapped_memory=0M
neostore.propertystore.db.index.keys.mapped_memory=15M
neostore.propertystore.db.index.mapped_memory=15M
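As a rough sketch of that advice, the helper below writes a batch.properties with larger values. The function name write_batch_properties and every size in it are illustrative assumptions, not tuned recommendations; only the keys come from the list above. As far as I know, memory-mapped buffers live outside the Java heap, so the sum of these values plus the -Xmx heap should stay below the machine's physical RAM.

```shell
# Hypothetical helper: write a batch.properties with larger mapped-memory
# settings. All sizes are placeholders; scale them to your own store files.
write_batch_properties() {
  cat > "$1" <<'EOF'
use_memory_mapped_buffers=true
neostore.nodestore.db.mapped_memory=1G
neostore.relationshipstore.db.mapped_memory=4G
neostore.propertystore.db.mapped_memory=2G
neostore.propertystore.db.strings.mapped_memory=1G
neostore.propertystore.db.arrays.mapped_memory=0M
neostore.propertystore.db.index.keys.mapped_memory=15M
neostore.propertystore.db.index.mapped_memory=15M
EOF
}

# Usage, in the directory the importer is started from:
#   write_batch_properties batch.properties
```

A reasonable way to pick real numbers is to run a small import first, look at the sizes of the neostore.* files in the generated store directory, and scale from there.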

Reply

tameem says:

May 24, 2013 at 9:44 AM

Hello,
Thanks for your answer. I don’t know what is going wrong on my machine, though: after changing the settings to larger values, importing the nodes takes more than two and a half hours, whereas it took one hour before. It became slower.

tameem says:

June 18, 2013 at 3:07 AM

Hello, I am having problems executing queries on an existing graph that has 118 million nodes and 140 million relationships. In the beginning it was a memory problem. I then changed the initmemory and maxmemory options to proper values (on a server with 250GB of RAM), which made life much better. But now the very same queries that proved this memory change effective are throwing a heap memory exception, which is driving me crazy. I think the problem is the buffer size. The Neo4j website says that Xmx should be increased, but I don’t think anything is EXPLICITLY written about how and where to change this heap value. The last thing I tried, after some guesses based on the extremely vague information on the website, was adding wrapper.java.additional=Xmx and wrapper.java.additional=Xss, unfortunately to no avail. It even got worse as far as the Linux command “cat /proc/meminfo” is concerned, since “buffers” shows a smaller value than before. Any directions on how to effectively change the buffer size?

Reply

tameem says:

June 18, 2013 at 4:40 AM

It’s a “broken pipe” exception

Reply

Importing Data into Neo4j | Leo's Aqu-Blog-Arium says:

June 19, 2013 at 12:25 AM

[…] Maxdemarzi’s blog left us a detailed steps of importing data as a fresh start, with the tools provided by Michael Hunger. […]

Reply

Daniel says:

June 24, 2013 at 10:05 AM

Hello people, I have an issue when importing: at some point I get an error that says “The requested operation can not be performed on a file with a user mapped section open”. Could anyone help me?

Reply

Shelley says:

July 4, 2013 at 5:29 PM

I tried your command “java -server -Xmx4G -jar target/batch-import-jar-with-dependencies.jar target/db nodes.csv rels.csv” to import the nodes and relationships. I also received an error trying to create the test data, which I resolved by running TestDataGenerator.java in Eclipse (by importing the project as a Maven project). Is there a way I can do the imports in a similar manner in Eclipse?

Reply

Daniel says:

July 8, 2013 at 2:25 PM

Run this command to generate the test data: mvn clean test-compile exec:java -Dexec.mainClass=org.neo4j.batchimport.TestDataGenerator -Dexec.classpathScope=test -Dexec.args=sorted

Reply

femvestor says:

July 26, 2013 at 3:28 PM

From what I understand from the Neo4j documentation, you can either have Neo4j embedded or call it through REST. Can you create your Neo4j database in an embedded environment (Java API) and then access it through REST?

Reply

maxdemarzi says:

July 26, 2013 at 3:34 PM

Yes. You can do both.

Reply

vaibhav jain says:

September 22, 2013 at 1:15 AM

I am trying batch-import but getting this error. I am new to Neo4j. Can you help me insert bulk data into Neo4j? I actually want to do performance testing of Neo4j.

javac ./src/test/java/TestDataGenerator.java -d .
javac: file not found: ./src/test/java/TestDataGenerator.java
Usage: javac
use -help for a list of possible options

Reply

tliimfee says:

October 9, 2013 at 4:57 PM

I ran into a similar problem. That file doesn’t exist anymore in the newest version on github. I think the tutorial is out of date.

I switched to following the project’s readme file => https://github.com/jexp/batch-import/blob/master/readme.md
but it was unclear how to go from csv into neo.

Reply

Ravinder says:

November 12, 2013 at 4:45 AM

Hi,
I tried to import data from a CSV file, and it ran successfully for 100 nodes/records. But when I try to import 300 nodes/records, it imports only 100 nodes. I don’t know why this is happening. Is there any setting that limits the number of nodes to import?

Reply

neonewbie says:

November 18, 2013 at 3:15 PM

Hello I’m running the 2.0 branch. After installation I run the maven command but execution fails on the same files with
Caused by: java.lang.IllegalStateException: Index users not configured.
at org.neo4j.batchimport.Importer.importNodes(Importer.java:102)

I thought the program would set up the index automatically? I even tried creating the index manually with CREATE INDEX ON :users(name), but it still fails on that piece of code.

Any suggestions? The indexing functionality looks interesting.

Reply

neonewbie says:

November 18, 2013 at 3:40 PM

I meant sample not same files

Reply

Leapfrog Technology – Training | » Neo4j: Using the Batch Importer says:

November 22, 2013 at 3:23 PM

[…] DeMarzi did a great series of blog posts on the Neo4j batch […]

Reply

kxmehdi says:

January 19, 2014 at 1:30 AM

Hi all,
I am trying to find an importer to load an .owl file into Neo4j.
Can anyone help me with this?

Reply

fsalvador23 says:

February 20, 2014 at 8:07 PM

Just to let you know that in Neo4j 2.0 we must set “allow_store_upgrade=true” in neo4j.properties, under the conf folder. Cheers.
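A small sketch of that change as a shell function, for those who script their setup. The function name enable_store_upgrade is hypothetical and the path in the usage note is an assumption about where your Neo4j is installed. It uncomments or overwrites an existing allow_store_upgrade line and appends one if none is present.

```shell
# Hypothetical helper: make sure allow_store_upgrade=true is set in the
# given neo4j.properties file.
enable_store_upgrade() {
  conf="$1"
  if grep -q '^#\{0,1\}allow_store_upgrade=' "$conf" 2>/dev/null; then
    # An active or commented-out setting exists: rewrite it in place,
    # keeping a backup copy with the .bak suffix.
    sed -i.bak 's/^#\{0,1\}allow_store_upgrade=.*/allow_store_upgrade=true/' "$conf"
  else
    # No such line yet: append it.
    printf 'allow_store_upgrade=true\n' >> "$conf"
  fi
}

# Usage (the install path is an assumption):
#   enable_store_upgrade /opt/neo4j/conf/neo4j.properties
```

Restart the server afterwards so the setting is picked up.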

Reply

Quazi Marufur Rahman (@maruf_q) says:

May 19, 2014 at 1:19 PM

Hi,
I have used a different nodes.csv and rels.csv file.
link: https://gist.github.com/qmaruf/ed69acf8625ac577d578

Everything seems fine and after importing it shows the following message:

	maruf@leopard:~/Desktop/bi/batch-import$ java -server -Xmx4G -jar target/batch-import-jar-with-dependencies.jar target/db nodes.csv rels.csv
	Usage: Importer data/dir nodes.csv relationships.csv [node_index node-index-name fulltext|exact nodes_index.csv rel_index rel-index-name fulltext|exact rels_index.csv ….]
	Using: Importer target/db nodes.csv rels.csv

	Using Existing Configuration File

	Importing 4 Nodes took 0 seconds

	Importing 4 Relationships took 0 seconds

	Total import time: 1 seconds

But there is no data in the DB. I tried executing the Cypher query “START n=node(*) RETURN n;” and it returns 0 rows. It should show at least 4 nodes according to nodes.csv.

Am I missing something?
Eagerly waiting for help.

Thanks

Reply

maxdemarzi says:

May 19, 2014 at 1:54 PM

Did you copy the graph.db directory made by the batch importer into your neo4j/data directory and restart it?
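For anyone unsure what that step looks like, here is a hedged sketch. It assumes the importer wrote its store to target/db (as in the command above) and that the server keeps its database in data/graph.db under the Neo4j install directory; the function name install_graph_db and the paths in the usage note are made up for illustration.

```shell
# Hypothetical helper: copy the store created by the batch importer into a
# Neo4j installation as data/graph.db. Run it while the server is stopped.
install_graph_db() {
  src="$1"          # store directory passed to the importer, e.g. target/db
  neo4j_home="$2"   # Neo4j install directory (an assumption, adjust to yours)
  dest="$neo4j_home/data/graph.db"

  # Keep any existing database instead of overwriting it.
  if [ -e "$dest" ]; then
    mv "$dest" "$dest.bak.$(date +%s)"
  fi

  mkdir -p "$neo4j_home/data"
  cp -R "$src" "$dest"
}

# Usage:
#   "$NEO4J_HOME/bin/neo4j" stop
#   install_graph_db target/db "$NEO4J_HOME"
#   "$NEO4J_HOME/bin/neo4j" start
```

Copying the whole directory rather than individual files avoids mixing store files from two different databases.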

Reply

alexmaddoc says:

May 26, 2014 at 1:40 AM

I cannot compile the data generator. It gives me these errors:
root@srv:/home/alex/batch-import# javac ./src/test/java/org/neo4j/batchimport/TestDataGenerator.java -d .
./src/test/java/org/neo4j/batchimport/TestDataGenerator.java:3: error: package org.junit does not exist
import org.junit.Ignore;
^
./src/test/java/org/neo4j/batchimport/TestDataGenerator.java:14: error: cannot find symbol
@Ignore
^
symbol: class Ignore
./src/test/java/org/neo4j/batchimport/TestDataGenerator.java:29: error: cannot find symbol
System.out.println("Using: TestDataGenerator "+nodes+" "+relsPerNode+" "+ Utils.join(types, ",")+" "+(sorted?"sorted":""));
^
symbol: variable Utils
location: class TestDataGenerator
3 errors

When trying to run the included generate.sh script, I’m getting:

root@srv:/home/alex/batch-import# ./generate.sh
---------------------------------------------------
constituent[0]: file:/usr/share/maven2/lib/maven-debian-uber.jar
---------------------------------------------------
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.AbstractStringBuilder.setLength(AbstractStringBuilder.java:173)
at java.lang.StringBuffer.setLength(StringBuffer.java:170)
at org.apache.maven.cli.CLIManager.cleanArgs(CLIManager.java:271)
at org.apache.maven.cli.CLIManager.parse(CLIManager.java:224)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:119)
at org.apache.maven.cli.compat.CompatibleMain.main(CompatibleMain.java:60)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315)
at org.codehaus.classworlds.Launcher.launch(Launcher.java:255)
at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430)
at org.codehaus.classworlds.Launcher.main(Launcher.java:375)

Any help?
I am running Ubuntu Server 12.04 LTS
with the 2.0 branch of batch-import.

Reply

minh nhut says:

June 1, 2014 at 11:04 PM

Need help!
When I copy all the files inside the graph.db folder and paste them into data/graph.db of Neo4j and then start the server, it does not work. When I then create a new graph.db and start the server again, it works fine. I don’t know why.

Reply

Neo4j | Pearltrees says:

December 1, 2014 at 11:08 PM

[…] Batch Importer – Part 1. Data is everywhere… all around us, but sometimes the medium it is stored in can be a problem when analyzing it. […]

Reply

Fun with Beer - and Graphs - Neo4j Graph Database says:

July 7, 2015 at 3:59 PM

[…] use this Neo4j-Batch-Importer to import CSV files directly into the graph (including indexing), ETL-article by Max de Marzi). So now I had my Gephi project, but how to get it into Neo4j? Well, turns out there is a Gephi […]

Reply

Importing data into Neo4j - the spreadsheet way - Neo4j Graph Database says:

July 9, 2015 at 10:49 AM

[…] Neo4j – so how do I do that? There are many technical tools out there (definitely look here, here and here), but I needed something simple. So my friend and colleague Michael Hunger came to the […]

Reply

Musicbrainz in Neo4j - Part 1 - Neo4j Graph Database says:

July 17, 2015 at 11:50 AM

[…] to TSV. sql2graph was inspired by Max De Marzi blog posts on using batch-import: part 1 ( https://maxdemarzi.com/2012/02/28/batch-importer-part-1/ ) and part 2 ( https://maxdemarzi.com/2012/02/28/batch-importer-part-2/ ) It operates in […]

Reply

Max De Marzi

Graphs, Graphs, and nothing but the Graphs

Batch Importer – Part 1

46 thoughts on “Batch Importer – Part 1”
