In my Rails app, a user can have a directory structure with folders and files in sub-folders. What is the best way to store such data?
Also, which database offers the best way to do so?
You can store a directory tree in a single table using any SQL database, by making the table self-referential. A good example is the Windows Installer's Directory table, where you will see a structure like this:
Directory = primary key id field, typically an integer
Directory_Parent = "foreign key" id field, which points to the id of another Directory in the same table
Value = string containing the directory/folder name
Your file table would then have a foreign key referencing the Directory id. To find the full path, you must follow it up the chain and build up the path from the end (right), tacking each parent directory onto the front (left). For example, the file would point to Directory id '4' with the Value 'subfolder', then you fetch the parent's value 'folder', then that parent's value in turn, until you get to the root, creating a path like /root/folder/subfolder/filename.
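The walk up the parent chain can be sketched in plain Ruby. This is only an illustration under stated assumptions: the hash below is a hypothetical in-memory stand-in for the Directory table, and the ids, the `sample_directories` helper and the `full_path` method are invented for the example rather than taken from any real schema.

```ruby
# Hypothetical stand-in for the self-referential Directory table.
# Keys are directory ids; parent_id is nil for the root.
def sample_directories
  {
    1 => { parent_id: nil, name: "root" },
    2 => { parent_id: 1,   name: "folder" },
    4 => { parent_id: 2,   name: "subfolder" }
  }
end

# Start at the file's directory and follow parent_id up to the root,
# tacking each directory name onto the front of the path.
def full_path(directories, directory_id, filename)
  parts = [filename]
  current = directory_id
  until current.nil?
    row = directories.fetch(current)
    parts.unshift(row[:name])
    current = row[:parent_id]
  end
  "/" + parts.join("/")
end

puts full_path(sample_directories, 4, "filename")
```

With a real table, each `fetch` would be one lookup by primary key, so the number of lookups equals the depth of the folder.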
If your database supports recursive queries (either Oracle's connect by or the standard recursive common table expressions) then a self referencing table is fine (it's easy to update and query).
If your DBMS does not support hierarchical queries, then Eimantas's suggestion to use a preordered tree traversal scheme is probably the best way.
It's a simple tree stored in SQL. Either use the standard parent-child scheme or implement a preordered tree traversal scheme (left-right).
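To make the left-right idea concrete, here is a minimal Ruby sketch, assuming a small hypothetical tree held in memory. It numbers each node with a left and right value during a depth-first walk, then finds all descendants of a node by comparing those numbers, with no recursion at query time. The method names and the sample tree are invented for illustration.

```ruby
# Hypothetical tree: each key maps to the names of its children.
def sample_tree
  { "root" => ["folder", "other"], "folder" => ["subfolder"] }
end

# Depth-first walk that records [left, right] for every node.
# Returns the next unused counter value.
def assign_bounds(children, node, bounds, counter)
  left = counter
  counter += 1
  children.fetch(node, []).each do |child|
    counter = assign_bounds(children, child, bounds, counter)
  end
  bounds[node] = [left, counter]
  counter + 1
end

def nested_set(children, root)
  bounds = {}
  assign_bounds(children, root, bounds, 1)
  bounds
end

# A node is a descendant when its bounds lie strictly inside the parent's.
def descendants(bounds, node)
  left, right = bounds.fetch(node)
  bounds.select { |_, (l, r)| l > left && r < right }.keys
end

bounds = nested_set(sample_tree, "root")
puts descendants(bounds, "folder").inspect
```

In SQL the same test becomes a single range condition on the left and right columns. The trade-off is that inserting or moving a folder means renumbering part of the tree.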
CALL apoc.import.csv(
[{fileName: 'file:/persons.csv', labels: ['Person']}],
[{fileName: 'file:/knows.csv', type: 'KNOWS'}],
{delimiter: '|', arrayDelimiter: ',', stringIds: false}
)
For this example, does the import internally use MERGE or CREATE to add nodes, relationships and properties? I tested it, and it seems to use CREATE to add new rows even for a new ID record. Is there a way to control this? When should I use apoc.load vs. apoc.import? It seems apoc.load is a lot more flexible, since users can choose which Cypher commands to run for their specific purposes. Right?
From the source of CsvEntityLoader (which seems to be doing the work under the covers), nodes are blindly created rather than being merged.
While there's an ignoreDuplicateNodes configuration property you can set, it just ignores IDs duplicated within the incoming CSV (i.e. it's not de-duplicating the incoming records against your existing graph). You could protect yourself from creating duplicate nodes by creating an appropriate unique constraint on any uniquely-identifying properties, which would at least prevent you accidentally running the same import twice.
Personally I'd only use apoc.import.csv to do a one-off bulk load of data into a fresh graph (or to load a dump from another graph that was exported as a CSV by something like apoc.export.csv.*). And even then, you've got the batch import tool that'll do that job with higher performance for large datasets.
I tend to use either the built-in LOAD CSV command or apoc.load.csv for most things, as you can control exactly what you do with each record coming in from the file (such as performing a MERGE rather than a CREATE).
As indicated by @Pablissimo's answer, the ignoreDuplicateNodes config option (when explicitly set to true) does not actually check for duplicates in the DB - it just checks within the file. A request to address this hole was brought up before, but nothing has been done yet to address it. So, if this is a concern for your use case, then you should not use apoc.import.csv.
The rest of this answer applies iff your files never specify nodes that already exist in your DB.
If your node CSV file follows the neo4j-admin import command's import file header format and has a header that specifies the :ID field for the column containing the node's unique ID, then the apoc.import.csv procedure should, by default, fail when it encounters duplicate node IDs (within the same file). That is because the procedure's ignoreDuplicateNodes config value defaults to false (you can specify true to skip duplicate IDs instead of failing).
However, since your node imports are not failing but are generating duplicate nodes, that implies your node CSV file does not specify the :ID field as appropriate. To fix this, you need to add the :ID field and call the procedure with the config option ignoreDuplicateNodes:true. Or, you can modify those CSV files somehow to remove duplicate rows.
How can I prevent an existing file from being replaced by a new file with the same name when I upload a file to OneDrive?
I am using PUT /me/drive/items/{parent-id}:/{filename}:/content docs end point.
Instead, I need it to keep indexing the names (test.jpg, test (1).jpg) or, just like Google Drive does, to keep two files with the same name.
You can control this behavior using Instance Attributes, specifically the @microsoft.graph.conflictBehavior query parameter. There are three supported conflict behaviors: fail, replace (the default), and rename.
The conflict resolution behavior for actions that create a new item. You can use the values fail, replace, or rename. The default for PUT is replace. An item will never be returned with this annotation. Write-only.
In order to have it automatically rename the file, you add @microsoft.graph.conflictBehavior=rename as a query parameter to your URI.
PUT /me/drive/items/{parent-id}:/{filename}:/content?@microsoft.graph.conflictBehavior=rename
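As a rough sketch of how the request path might be assembled in Ruby: the `upload_path` helper below is hypothetical (not part of any SDK), and it only builds the path string from the answer above; it sends no request. The list of allowed values comes from the three behaviors named earlier.

```ruby
require "erb"

# Hypothetical helper: builds the upload path with the conflict behavior
# appended as a query parameter. Nothing is sent over the network here.
def upload_path(parent_id, filename, behavior = "rename")
  allowed = %w[fail replace rename]
  unless allowed.include?(behavior)
    raise ArgumentError, "unknown conflict behavior: #{behavior}"
  end

  # Percent-encode the filename so spaces and other characters are safe in a path.
  encoded = ERB::Util.url_encode(filename)
  "/me/drive/items/#{parent_id}:/#{encoded}:/content" \
    "?@microsoft.graph.conflictBehavior=#{behavior}"
end

puts upload_path("PARENT-ID", "test.jpg")
```

Rejecting unknown values up front avoids sending a request whose conflict handling is not what was intended.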
I have a data model that starts with a single record, this has a custom "recordId" that's a uuid, then it relates out to other nodes and they then in turn relate to each other. That starting node is what defines the data that "belongs" together, as in if we had separate databases inside neo4j. I need to export this data, into a backup data-set that can be re-imported into either the same or a new database with ease
After some help, I'm using APOC to do the export:
call apoc.export.cypher.query("MATCH (start:installations)
WHERE start.recordId = \"XXXXXXXX-XXX-XXX-XXXX-XXXXXXXXXXXXX\"
CALL apoc.path.subgraphAll(start, {}) YIELD nodes, relationships
RETURN nodes, relationships", "/var/lib/neo4j/data/test_export.cypher", {})
There are then 2 problems I'm having:
Problem 1 is the data that's exported has internal neo4j identifiers to generate the relationships. This is bad if we need to import into a new database and the UNIQUE IMPORT ID values already exist. I need to have this data generated with my own custom recordIds as the point of reference.
Problem 2 is that the import doesn't even work.
call apoc.cypher.runFile("/var/lib/neo4j/data/test_export.cypher") yield row, result
returns:
Failed to invoke procedure apoc.cypher.runFile: Caused by: java.lang.RuntimeException: Error accessing file /var/lib/neo4j/data/test_export.cypher
I'm hoping someone can help me figure out what may be going on, but I'm not sure what additional info is helpful. No one in the Neo4j slack channel has been able to help find a solution.
Thanks.
problem1:
The exported file does not contain any internal neo4j ids. It is not safe to use neo4j ids out of the database, since they are not globally unique. So you should not use them to transfer data from one database to another.
If you want to use globally unique ids, you can use an external plugin like the GraphAware UUID plugin. (Disclaimer: I work for GraphAware.)
problem2:
If you cannot access the file, then possible reasons:
apoc.import.file.enabled=true is not set in neo4j.conf
OS-level permission is not set
I have a series of folders that all have files I need linked to the database (via their file paths). One option is to manually insert all the file paths into my database; however, this would be painful, as the number of folders will keep increasing and manual uploading will take too much time.
Is there a way to write a ruby helper function that will search these folders and automatically add the path to the files into a column in my database?
All the file paths have a recognizable pattern, for example:
Tel/a_1/poi1/names.csv
Tel/a_2/poi1/names.csv
Tel/a_3/poi1/names.csv
I need a function that will occupy a field in my database with the path of each of these names.csv files. Very new to ruby and rails, so any help is greatly appreciated. Also, please let me know if anything is unclear.
Something like this should give you all the filenames in the folder, for you to manipulate:
Dir["Tel/**/*.csv"].each do |file|
  # update the attribute of your model with the path of the file
end
Read about the Dir object too.
Here's an example that gets all the files.
Dir["Tel/a_*/poi1/names.csv"] returns an Array with the paths of all matching files.
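A self-contained sketch of that glob, assuming the folder layout from the question: it creates a few sample files in a temporary directory so it can be run anywhere. The `names_csv_paths` helper is a hypothetical name, and the model mentioned in the comment is only a placeholder for whatever your app uses.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: returns the paths of every names.csv that matches
# the Tel/a_*/poi1/ pattern under base_dir, in sorted order.
def names_csv_paths(base_dir)
  Dir.glob(File.join(base_dir, "Tel", "a_*", "poi1", "names.csv")).sort
end

Dir.mktmpdir do |tmp|
  # Build a throwaway copy of the layout described in the question.
  %w[a_1 a_2 a_3].each do |area|
    dir = File.join(tmp, "Tel", area, "poi1")
    FileUtils.mkdir_p(dir)
    File.write(File.join(dir, "names.csv"), "name\n")
  end

  names_csv_paths(tmp).each do |path|
    # In a Rails app, this is where you would store the path, e.g. with a
    # placeholder model: DataFile.find_or_create_by(path: path)
    puts path.delete_prefix("#{tmp}/")
  end
end
```

Running this from a rake task or a scheduled job would pick up new folders as they appear, and a find-or-create style call keeps repeated runs from inserting the same path twice.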
Is there any way in ActionScript to list all the keys (not values) of a given resource bundle?
My use case is to combine the content of two different resource bundles. I want to do this by creating a new resource bundle at runtime and adding each key-value pair from the two resource bundles into it. I would appreciate it if anyone has an idea of whether it can be done a different way.
If you get the actual resource bundle (e.g. IResourceBundle), you could iterate over the keys in its content object, I would think.