I came across a scenario where each node needs to be represented by an image, so I have to store images in Neo4j. Please help me out with your views on this.
You have several options:
1. As Christophe Willemsen suggested, save the image on disk or on a web server and reference it with a URL.
2. Save it as a byte[] array property.
3. Save the image base64-encoded as a Data URI (a String).
Of course, options 2 and 3 are not advisable if the images are very large, though you could save a thumbnail in the database and keep the full image on the file system.
You should also save some metadata as properties (type, size, location, etc.) to improve searches in the database.
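If you go with option 2 for small thumbnails, here is a minimal sketch using the official Neo4j Python driver; the connection details, label, and property names are assumptions for illustration, not anything prescribed by Neo4j:

```python
# Sketch: store a small thumbnail plus metadata on an Image node.
# Assumes a local Neo4j instance and the official "neo4j" Python driver;
# the label and property names are illustrative.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def store_thumbnail(path, mime_type):
    with open(path, "rb") as f:
        thumb = f.read()  # keep this small: a thumbnail, not the original image
    with driver.session() as session:
        session.run(
            """
            CREATE (i:Image {
                name: $name,
                mime_type: $mime_type,
                size_bytes: $size,
                thumbnail: $thumb   // stored as a byte[] property
            })
            """,
            name=path, mime_type=mime_type, size=len(thumb), thumb=thumb,
        )

store_thumbnail("cat_thumb.jpg", "image/jpeg")
driver.close()
```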
According to the Neo4j community:
Don't add large blobs of data, e.g. base64, to the graph as property values. This is one of the few anti-use-cases for graph databases.
Store the binary information somewhere, e.g. AWS S3, and have a node representing that photo or video. That node contains the S3 URL as a reference and potentially some metadata as well (width, height, quality, codec).
Use relationships to amend information like tagging, comments, ownership and permissions.
Kudos to Stefan Armbruster (Neo4j Staff) for the info.
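A sketch of that pattern, run through the Neo4j Python driver; the bucket URL, labels, relationship types, and property names here are only examples:

```python
# Sketch of the recommended pattern: the binary lives in S3, the graph only
# holds a reference plus metadata and relationships. All names are illustrative.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    session.run(
        """
        MERGE (u:User {name: $user})
        CREATE (p:Photo {s3_url: $url, width: $width, height: $height, codec: $codec})
        CREATE (u)-[:OWNS]->(p)
        CREATE (u)-[:TAGGED {at: timestamp()}]->(p)
        """,
        user="alice",
        url="https://s3.amazonaws.com/my-bucket/photo-123.jpg",
        width=1920, height=1080, codec="jpeg",
    )
driver.close()
```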
I am trying to build an app where I download a list of images from different URLs into a collection view. All of these images are stored in a cache with a max age and a cache limit. I store the URL as the key for each image so that, before downloading it again, I can check whether the image already exists and, if so, check its max age. However, I have been told that storing URLs as IDs is bad practice. Any suggestions on how I could save these images and retrieve their details when needed?
So, to be clear, we're thinking a good way to store info about an image is, instead of having an object like this:
"some_url_as_the_key": { // details about the image }
we could have:
"number_of_bytes_of_the_image_as_key": { // details about the image }
I'm trying to do the same and need to store the metadata about each image in a dictionary. Which of the above would be best? I can see an issue in that two images could theoretically have the same number of bytes, though 1) this is unlikely, and 2) it would possibly make for a faster lookup when searching for the metadata about an image, i.e. a key of '12312423' (number of bytes) would, I guess, be faster than a long URL like "https://www.lindofinasdoifnosidnf.sdfsdfnsdf/dfsdf".
Thoughts?
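For what it's worth, one way to sidestep both the "URL as key" objection and the byte-count collision risk is to key the cache on a digest of the URL. A small Python sketch of the idea (the plain dict is just a stand-in for the real cache):

```python
# Sketch: derive a compact, collision-resistant cache key from the image URL.
# A dictionary hashes its keys anyway, so lookup speed is not really driven by
# key length; the digest mainly gives a short, fixed-size, unique identifier.
import hashlib

def cache_key(url: str) -> str:
    return hashlib.sha256(url.encode("utf-8")).hexdigest()

cache = {}  # stand-in for the real image cache / metadata store

url = "https://example.com/images/cat.png"
cache[cache_key(url)] = {"max_age": 3600, "size_bytes": 12312423}

# Later: check whether the image is already cached before downloading again.
print(cache_key(url) in cache)  # True
```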
I want to store lots of data in buckets or different hash maps on the edges between two nodes. Basically, there is a lot of data being generated that relates to the two nodes, and I want to keep that data on their edge in a hash. Since context matters, the edge should have different buckets/hash maps so that I can write data to, and read data back from, the edges. How can I do this in Neo4j? Any reference articles would be appreciated.
Consider that properties are already a key/value hash map. Regarding context, you can have multiple edges between two nodes, one for each context. If the contexts are limited and predefined, you can give each relationship a specific type that identifies its context; otherwise you can add a 'context' property that lets you identify it.
How much data do you plan to store? How many keys in the hash map?
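As a minimal sketch of the 'context' property variant, using the official Python driver (the Entity label, RELATED type, and the example data are assumptions for illustration):

```python
# Sketch: one relationship per context between the same two nodes, with the
# data for that context stored as properties on the relationship.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def write_edge_data(a_name, b_name, context, data):
    with driver.session() as session:
        session.run(
            """
            MERGE (a:Entity {name: $a})
            MERGE (b:Entity {name: $b})
            MERGE (a)-[r:RELATED {context: $context}]->(b)
            SET r += $data   // relationship properties behave like a key/value map
            """,
            a=a_name, b=b_name, context=context, data=data,
        )

write_edge_data("node1", "node2", "billing", {"total": 120, "currency": "EUR"})
write_edge_data("node1", "node2", "support", {"open_tickets": 3})
driver.close()
```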
I have a question regarding how a graph in Neo4j is loaded into memory from disk.
Reading the link here, I think I understand how the graph is represented on disk, and that when a new Neo4j database is created, physically separate files are created for the node, edge and property stores (mainly).
When you issue a query to Neo4j, does it:
1) Load the entire graph (nodes, edges, properties) into memory using a doubly linked list structure?
OR
2) Determine the nodes and edges required for the query and populate that structure with random accesses to the relevant stores (nodes, edges) on disk? If so, how does Neo4j minimize the number of disk accesses?
As frobberOfBits mentions, it's more like #2. Disk accesses are minimized by a two-layered cache architecture, which is best described in the reference manual.
Even if your cache is smaller than the store files, most accesses boil down to a seek (thanks to the fixed record length) followed by a read. These kinds of operations are typically fast (and even faster on appropriate hardware such as an SSD).
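To illustrate why fixed-length records make those accesses cheap, here is a toy sketch of offset-based record lookup; the record size and file name are invented for the example and do not reflect Neo4j's actual store layout:

```python
# Illustration only: fixed-length records let you jump straight to a record
# by computing its offset, instead of scanning the file.
RECORD_SIZE = 15  # hypothetical bytes per record, not Neo4j's real format

def read_record(store_file, record_id):
    store_file.seek(record_id * RECORD_SIZE)  # one seek ...
    return store_file.read(RECORD_SIZE)       # ... followed by one short read

with open("some_fixed_length_store.bin", "rb") as f:
    record = read_record(f, 42)
```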
I have a lot of documents, about 30 TB, and these docs have other attributes associated with them. I don't want to store the actual documents after indexing them with Solr, since they are stored somewhere else and I can access them later if needed. The other data attributes will also be indexed with Solr and won't be deleted.
I'm currently developing with Ruby on Rails and MySQL but would like to move to MongoDB. Is the scenario above possible?
Thanks,
Maged
You don't have to store the original content in Solr; that's the difference between stored and indexed. If you set stored to false, you will only keep the processed, tokenized version of the content needed for search. Just make sure you keep your ID stored. This is set in your field definition in schema.xml.
This does mean Solr cannot return any of the non-stored fields back to the user, so you need to match them to the original records based on IDs (just as you seem to suggest).
This also breaks partial document updates, so you will need to make sure you reindex the whole document when things change.
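As a sketch of the resulting search flow, assuming the pysolr client and some external store keyed by the same IDs (the core URL, query field, and fetch_original helper are made up for illustration):

```python
# Sketch: search Solr with only the id field stored, then fetch the full
# documents from wherever they actually live (MongoDB, S3, filesystem, ...).
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/documents", timeout=10)

def fetch_original(doc_id):
    # Placeholder for looking the document up in the external store by ID.
    return {"id": doc_id, "body": "...loaded from external storage..."}

results = solr.search("content:invoice", fl="id", rows=20)  # only ids come back
originals = [fetch_original(hit["id"]) for hit in results]
print(f"{results.hits} matches, fetched {len(originals)} originals")
```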
As I understand it, you don't want to touch the content of the documents: you index them once and keep them. It's the other data attributes that you want to index frequently. If you are not concerned about space, it's better to make your "content" field both stored and indexed. Choose the tokenizer and filters for the content wisely so that they produce fewer tokens.
For partial updates, see http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
In my Rails app, I want to store the geographical bounds of places as column fields in a database. For example, the boundary of New York is represented as a polygon: an array of arrays.
I have declared my model to serialize the polygons, but I am unsure whether I should even store them like this. The serialized polygons easily exceed 100,000 characters, while MySQL can only store about 65,535 characters in a standard TEXT field.
Now I know MySQL also has a LONGTEXT field. But I really want my app to be database-agnostic. How does Rails handle this by itself? Will it switch automatically to LONGTEXT fields? What about when I start using PostgreSQL?
At this point I suggest you ask yourself whether this data really needs to be stored in the database, and in this format.
I propose two possible solutions:
1. Store your polygons in the filesystem and reference them from the database. Such large data items are of little use in a database: it's practically pointless to query against them as text. The filesystem is good at storing files, so use it.
2. If you do need these polygons in the database, store them as normalised data: have a table called polygon and another called point, deserialize the polygons, and store them in a way that reflects how databases are intended to be used (see the sketch after this answer).
Hope this is of help.
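A minimal sketch of the normalised layout from option 2, using SQLite only to keep the example self-contained; the table and column names, and the sample coordinates, are illustrative:

```python
# Sketch of option 2: polygons normalised into two tables.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE polygons (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL              -- e.g. "New York"
    );
    CREATE TABLE points (
        id         INTEGER PRIMARY KEY,
        polygon_id INTEGER NOT NULL REFERENCES polygons(id),
        position   INTEGER NOT NULL,    -- ordering of the point within the ring
        lat        REAL NOT NULL,
        lng        REAL NOT NULL
    );
""")

boundary = [(40.4774, -74.2591), (40.9176, -74.2591), (40.9176, -73.7004), (40.4774, -73.7004)]
cur = conn.execute("INSERT INTO polygons (name) VALUES (?)", ("New York",))
conn.executemany(
    "INSERT INTO points (polygon_id, position, lat, lng) VALUES (?, ?, ?, ?)",
    [(cur.lastrowid, i, lat, lng) for i, (lat, lng) in enumerate(boundary)],
)
print(conn.execute("SELECT COUNT(*) FROM points").fetchone()[0])  # 4
```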
PostgreSQL has an extension called PostGIS, which my company uses to handle geometric locations and calculations, and which may be very helpful in this situation. I believe PostgreSQL also has data types for arrays and hashes: arrays are declared, for example, as text[] (where text could be replaced with another data type), and hashes can be defined using the hstore module.
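If you do end up on PostgreSQL, a rough sketch of what the PostGIS route might look like; the connection string, table name, and sample polygon are assumptions:

```python
# Sketch: a PostGIS geometry column instead of serialised text.
# Requires the postgis extension; connection details are assumptions.
import psycopg2

conn = psycopg2.connect("dbname=myapp user=myapp")
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS postgis;")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS places (
            id       SERIAL PRIMARY KEY,
            name     TEXT NOT NULL,
            boundary GEOMETRY(Polygon, 4326)   -- WGS 84 lng/lat polygon
        );
    """)
    cur.execute(
        "INSERT INTO places (name, boundary) VALUES (%s, ST_GeomFromText(%s, 4326))",
        ("New York", "POLYGON((-74.2591 40.4774, -74.2591 40.9176, "
                     "-73.7004 40.9176, -73.7004 40.4774, -74.2591 40.4774))"),
    )
```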
This question answers part of my question: Rails sets a default byte limit of 65535, and you can change it manually.
All in all, whether you will run into trouble after that depends on the database you're using. For MySQL, Rails will automatically switch to the appropriate *TEXT field. MySQL can store up to 1GB of text.
But like benzado and thomasfedb say, it is probably better to store the information in a file so that the database doesn't allocate a lot of memory that might not even be used.
Even though you can store this kind of stuff in the database, you should consider storing it externally, and just put a URL or some other identifier in the database.
If it's in the database, you may end up loading 64K of data into memory when you aren't going to use it, just because you access something in that table. And it's easier to scale a collection of read-only files (using something like Amazon S3) than a database table.