I'm using sqlpackage to import .bacpac to Azure Databases but once the datas have been imported it starts enabling back the indexes and it takes forever. I would like to skip this part.
I though it was bound to the following parameter : /p: DisableIndexesForDataPhase=(BOOLEAN TRUE) but it's defaulted as true and it seems it does not work that way.
Is there a parameter or extention I didnt see that could achieve that ?
Related
CALL apoc.import.csv(
[{fileName: 'file:/persons.csv', labels: ['Person']}],
[{fileName: 'file:/knows.csv', type: 'KNOWS'}],
{delimiter: '|', arrayDelimiter: ',', stringIds: false}
)
For this example, internally, does the 'import' use merge or create to add nodes, relationships and properties? I tested, it seems it uses 'create' to add new rows even for a new ID record. Is there a way to control this? When to use apoc.load VS apoc.import? It seems apoc.load is a lot more flexible, where users can choose to use cypher commands specifically for purposes. Right?
From the source of CsvEntityLoader (which seems to be doing the work under the covers), nodes are blindly created rather than being merged.
While there's an ignoreDuplicateNodes configuration property you can set, it just ignores IDs duplicated within the incoming CSV (i.e. it's not de-duplicating the incoming records against your existing graph). You could protect yourself from creating duplicate nodes by creating an appropriate unique constraint on any uniquely-identifying properties, which would at least prevent you accidentally running the same import twice.
Personally I'd only use apoc.import.csv to do a one-off bulk load of data into a fresh graph (or to load a dump from another graph that was exported as a CSV by something like apoc.export.csv.*). And even then, you've got the batch import tool that'll do that job with higher performance for large datasets.
I tend to use either the built-in LOAD CSV command or apoc.load.csv for most things, as you can control exactly what you do with each record coming in from the file (such as performing a MERGE rather than a CREATE).
As indicated by by #Pablissimo's answer, the ignoreDuplicateNodes config option (when explicitly set to true) does not actually check for duplicates in the DB - it just checks within the file. A request to address this hole was brought up before, but nothing has been done yet to address it. So, if this is a concern for your use case, then you should not use apoc.import.csv.
The rest of this answer applies iff your files never specify nodes that already exist in your DB.
If your node CSV file follows the neo4j-admin import command's import file header format and has a header that specifies the :ID field for the column containing the node's unique ID, then the apoc.import.csv procedure should, by default, fail when it encounters duplicate node IDs (within the same file). That is because the procedure's ignoreDuplicateNodes config value defaults to false (you can specify true to skip duplicate IDs instead of failing).
However, since your node imports are not failing but are generating duplicate nodes, that implies your node CSV file does not specify the :ID field as appropriate. To fix this, you need to add the :ID field and call the procedure with the config option ignoreDuplicateNodes:true. Or, you can modify those CSV files somehow to remove duplicate rows.
I am trying to import multiple rdf files into neo4j as described here
My problem is that even though elements have the same rdf:ID they end up being imported as different neo4j nodes with different uris prefixed by the different file names like file:/x.xml#_00141f6c-69b1-4a1a-a83b-333d0bb9d586 and file:/y.xml#_00141f6c-69b1-4a1a-a83b-333d0bb9d586.
I have tried to use:
call semantics.addNamespacePrefix("local","file:/x.xml#")
call semantics.addNamespacePrefix("local","file:/y.xml#")
before importing but to no avail. I have additionally tried to set handleVocabUris: "MAP" as an option for the import function.
Is there an import option that I am missing which allows these nodes to be unified? Is there generally an elegant way to reunify them after importing?
My current workaround is to copy each file into a temp file before loading so that the prefixes are the same. Neo4j joins the nodes with the same uri into one, which is exactly what I need.
Still happy to hear about an elegant way to do this though..
In Dataflow 1.x versions, we could use CloudBigtableIO.writeToTable(TABLE_ID) to create, update, and delete Bigtable rows. As long as a DoFn was configured to output a Mutation object, it could output either a Put or a Delete, and CloudBigtableIO.writeToTable() successfully created, updated, or deleted a row for the given RowID.
It seems that the new Beam 2.2.0 API uses BigtableIO.write() function, which works with KV<RowID, Iterable<Mutation>>, where the Iterable contains a set of row-level operations. I have found how to use that to work on Cell-level data, so it's OK to create new rows and create/delete columns, but how do we delete rows now, given an existing RowID?
Any help appreciated!
** Some further clarification:
From this document: https://cloud.google.com/bigtable/docs/dataflow-hbase I understand that changing the dependency ArtifactID from bigtable-hbase-dataflow to bigtable-hbase-beam should be compatible with Beam version 2.2.0 and the article suggests doing Bigtble writes (and hence Deletes) in the old way by using CloudBigtableIO.writeToTable(). However that requires imports from the com.google.cloud.bigtable.dataflow family of dependencies, which the Release Notes suggest is deprecated and shouldn't be used (and indeed it seems incompatible with the new Configuration classes/etc.)
** Further Update:
It looks like my pom.xml didn't refresh properly after the change from bigtable-hbase-dataflow to bigtable-hbase-beam ArtifactID. Once the project got updated, I am able to import from the
com.google.cloud.bigtable.beam.* branch, which seems to be working at least for the minimal test.
HOWEVER: It looks like now there are two different Mutation classes:
com.google.bigtable.v2.Mutation and
org.apache.hadoop.hbase.client.Mutation ?
And in order to get everything to work together, it has to be specified properly which Mutation is used for which operation?
Is there a better way to do this?
Unfortunately, Apache Beam 2.2.0 doesn't provide a native interface for deleting an entire row (including the row key) in Bigtable. The only full solution would be to continue using the CloudBigtableIO class as you already mentioned.
A different solution would be to just delete all the cells from the row. This way, you can fully move forward with using the BigtableIO class. However, this solution does NOT delete the row key itself, so the cost of storing the row key remains. If your application requires deleting many rows, this solution may not be ideal.
import com.google.bigtable.v2.Mutation
import com.google.bigtable.v2.Mutation.DeleteFromRow
// mutation to delete all cells from a row
Mutation.newBuilder().setDeleteFromRow(DeleteFromRow.getDefaultInstance()).build()
I would suggest that you should continue using CloudBigtableIO and bigtable-hbase-beam. It shouldn't be too different from CloudBigtableIO in bigtable-hbase-dataflow.
CloudBigtableIO uses the HBase org.apache.hadoop.hbase.client.Mutation and translates them into the Bigtable equivalent values under the covers
I am currently writing some Java code extracting some data and writing them as Linked Data, using the TRIG syntax. I am now using Jena, and Fuseki to create a SPARQL endpoint to query and visualize this data.
The data is written so that each source dataset gives me a .trig file, containing one named graph. So I want to load thoses files in Fuseki. Except that it doesn't seem to understand the Trig syntax...
If I remove the named graphs, and rename the files as .ttl, everything loads perfectly in the default graphs. But if I try to import trig files :
using Fuseki's webapp uploader, it either crashes ("Can't make new graphs") or adds nothing except the prefixes, as if the graphs other than the default ones could not be added (the logs say nothing helpful except the error code and description).
using Java code, the process is too slow. I used this technique : " Loading a .trig file into TDB? " but my trig files are pretty big, so this solution is not very good for me.
So I tried to use the bulk loader, the console command 'tdbloader'. This time everything seems fine, but in the webapp, there is still no data.
You can see the process going fine here : Quads are added just fine
But the result still keeps only the default graph and its original data : Nothing is added
So, I don't know what to do. The guys behind Jena and Fuseki suggested not to use the bulk loader in the Java code (rather than the command line tool), so that's one solution I guess I'd like to avoid.
Did I miss something obvious about how to load TRIG files to Fuseki? Thanks.
UPDATE :
As it seemed to be a problem in my configuration (see the comments of this post for a link to my config file; I cannot post more than 2 links), I tried to add some kind of specification for some named graphs I would like to see added to the dataset on Fuseki.
I added code to link (with ja:namedgraph) external graphs that I added via tdbloader. This seems to work. Great!
Now another problem : there's no inference, even when my config file specifies an Inference model... I set that queries should be applied with named graphs merged as the default graph, but this does not seem to carry the OWL Inference rules...So simple queries work, but I have 1/ to specify the graph I query (with "FROM") and 2/ no inference in my data.
The two methods are to use the tdb bulkloader offline or you can POST data into the dataset directly. (i.e. HTTP POST operations to http://localhost:3030/ds).
You can test where your graph are there with a query like
SELECT (count(*) AS ?C) { GRAPH ?g { ?s ?p ?o } }
The named graphs will show up when the Fuseki server is started unless your configuration of the SPARQL services only exports one graph.
Many of the samples I have seen for angular2 have the following import statement:
import {bind} from 'angular2/di';
I am working in VS Code (with TypeScript) and it complains about not being able to find the angular2/di module.
However I do see a bind function defined in angular2/angular2.d.ts. If I change the import statement to the following, then the error goes away.
import {bind} from 'angular2/angular2';
Is the question in the title off-base and I am making some erroneous assumption?
If not, why do many samples reference one module to import the bind function from, yet I seem to be able to get it from a different module?
Most likely because you looked at versions from older alphas. Look at the angular2.ts file. Everything is exported from it. Also note that the d.ts is going to contain everything to resolve types in your IDE and at compilation time. What import does is actually importing the .js files.