I'm trying to write a mapreduce query in Erlang for Riak, but I'm having trouble getting my head around it. Does anyone know where I can find an example of an Erlang mapreduce query, or can write one, that performs the SQL equivalent of a count operation? It would also be helpful if someone could explain line by line what the query actually does. I've managed to write one in JavaScript, but Erlang is pretty different. Thank you.
Riak comes with a set of predefined mapreduce functions implemented in Erlang that you can use as a guide if you are trying to write your own. One of the provided functions is reduce_count_inputs, which counts its inputs (each non-integer input counts as 1, while integer inputs are treated as partial counts from earlier reduce rounds and summed) and might be useful for your scenario.
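For example, here is a minimal sketch (untested, written for the Erlang shell; the host, port, and bucket name are placeholders) of the SQL COUNT equivalent using the riakc protocol-buffers client. No map phase is needed: the inputs fed to the reduce phase are the {Bucket, Key} pairs themselves, which are not integers and therefore each count as 1.

```erlang
%% Attach to the node (host and port are placeholders).
{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087).

%% Run a single reduce phase over every key in <<"mybucket">>.
%% reduce_count_inputs counts each non-integer input as 1 and sums
%% integer inputs, so partial counts from earlier reduce rounds
%% combine correctly.
{ok, Results} = riakc_pb_socket:mapred_bucket(
    Pid, <<"mybucket">>,
    [{reduce, {modfun, riak_kv_mapreduce, reduce_count_inputs},
      none, true}]).

%% Results is a list of {PhaseIndex, Output} pairs; the count is the
%% single integer produced by phase 0.
[{0, [Count]}] = Results.
```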
I have also created a library of map phase functions implemented in Erlang that you can look at.
Although I believe it is possible to pass Erlang functions in as part of the mapreduce job specification, in a similar way to how you send anonymous JavaScript functions, it is usually not recommended, and I have not done this myself.
I always look into the Riak sources to find good examples.
Here is the module that implements the standard mapreduce funs: riak_kv_mapreduce.
The simplest one just returns the value of the object.
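Paraphrased from that module, that simplest map fun (map_object_value) is essentially:

```erlang
%% Map phase fun: called once per object; emits a one-element list
%% containing the object's stored value.
map_object_value(RiakObject, _KeyData, _Arg) ->
    [riak_object:get_value(RiakObject)].
```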
I would like to modify the way Cypher processes queries sent to it for pattern matching. I have read about execution plans and how Cypher chooses the plan with the least number of operations, and that is all pretty good. However, I am looking into implementing a similarity-search feature that lets you specify a query graph that would be matched approximately (similar) when no exact match exists. I have seen a few examples of this in theory, and I would like to implement something of this sort for Neo4j, which I am guessing would require a change in how the query engine deals with queries sent to it. Or worse :)
Here are some links that demonstrate the idea
http://www.cs.cmu.edu/~dchau/graphite/graphite.pdf
http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper72.pdf
I am looking for ideas. Anything at all in relation to the topic would be helpful. Thanks in advance
(:I)<-[:NEEDING_HELP_FROM]-(:YOU)
From my point of view, the better option for you is to create an unmanaged extension, because it lets you add your own custom functionality to the Neo4j server.
You are not able to extend the Cypher language itself without maintaining your own fork of the source code.
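As a rough illustration, an unmanaged extension is just a JAX-RS resource that the server mounts under a URL path of your choosing. In the sketch below, the class name, mount path, and the trivial "similarity" logic are all placeholders, and the config line assumes a 3.x server; the point is only the general shape.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Transaction;
import org.neo4j.helpers.collection.Iterators;

// Mounted by adding, e.g.,
//   dbms.unmanaged_extension_classes=example.extension=/custom
// to neo4j.conf, which exposes this resource at /custom/similarity/...
@Path("/similarity")
public class SimilaritySearchResource {

    private final GraphDatabaseService db;

    // The server injects the embedded database instance.
    public SimilaritySearchResource(@Context GraphDatabaseService db) {
        this.db = db;
    }

    @GET
    @Path("/{label}")
    public Response findSimilar(@PathParam("label") String labelName) {
        // Placeholder logic: run whatever graph-matching algorithm you
        // like against the embedded API; here we just count nodes.
        try (Transaction tx = db.beginTx()) {
            long count = Iterators.count(db.findNodes(Label.label(labelName)));
            tx.success();
            return Response.ok(Long.toString(count), MediaType.TEXT_PLAIN).build();
        }
    }
}
```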
It would be appreciated if anyone could answer this: is there any way to add a LOOP function to Cypher?
I can find loops in a graph by using a traversal, but I want to know whether there is any way to pass the obtained result to a custom user-defined Cypher function.
Not yet. They're talking about UDFs (user-defined functions) in an upcoming release of Neo4j, though. You might also consider refining your use case and asking for it as a feature of Cypher itself in the GitHub issues.
Until UDFs are possible with Cypher, you might consider using unmanaged extensions.
It seems that you are asking two different questions.
About whether you can use loops in Cypher: yes you can, with FOREACH or UNWIND, depending on what you want to achieve. This is a good resource for when you don't quite know which is the right one for your case; it compares the two and tries them out on different example queries. A toy comparison is sketched below.
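In short (labels and values invented here), FOREACH runs update clauses over a list, while UNWIND turns a list into rows that the rest of the query can keep working with:

```cypher
// FOREACH: perform update operations for each element of a list.
MATCH (p:Person {name: 'Alice'})
FOREACH (year IN [2014, 2015, 2016] |
  CREATE (p)-[:ATTENDED]->(:Conference {year: year}));

// UNWIND: expand a list into rows and keep querying.
UNWIND [2014, 2015, 2016] AS year
MATCH (c:Conference {year: year})
RETURN year, count(c);
```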
As for whether you can write user-defined functions: yes, as of the Neo4j 3.x series you can (procedures arrived in 3.0, with user-defined functions following in 3.1). They are, however, written in Java.
Look into this link for more details: https://neo4j.com/developer/procedures-functions/
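To give a flavor of it, here is a minimal sketch of a user-defined function in the style of the linked guide; the package and names are placeholders, and it needs the Neo4j procedure API on the compile classpath.

```java
package example;

import java.util.List;

import org.neo4j.procedure.Name;
import org.neo4j.procedure.UserFunction;

public class Join {

    // Once the packaged jar is dropped into the server's plugins
    // directory, this is callable from Cypher as:
    //   RETURN example.join(['a', 'b', 'c'], ',')
    @UserFunction
    public String join(@Name("strings") List<String> strings,
                       @Name("delimiter") String delimiter) {
        if (strings == null || delimiter == null) {
            return null;
        }
        return String.join(delimiter, strings);
    }
}
```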
I've seen answers to this question, but I couldn't figure out which of the answers would perform the fastest. These are the answers I've seen; which is best?
Read one line at a time using each or each_line
Read one line at a time using gets
Save it all into an array of lines using readlines and then use each
Use grep (not sure what exactly to do with grep...)
Use sed (not sure what exactly to do with sed...)
Something else?
Also, would it be better to just use another language or should Ruby be fine?
EDIT:
More details: Each line contains something like "id1 attr1_1 attr2_1 id2 attr1_2 attr2_2... idn attr1_n attr2_n" (n is very big) and I need to insert those into a database. For that example line, I would need to insert n rows into the database.
Ruby will likely be using the same or very similar low-level code (written in C) to do the actual reading from disk for the first three options, so they should perform similarly. Given that, you should choose whichever is most convenient for you; the ability to do that is what makes languages like Ruby so useful! You will be reading a lot of data from disk, so I would suggest using each_line and processing each line as you read it.
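A minimal sketch of that approach, based on the format described in your edit (the file name and the insert_row helper are placeholders):

```ruby
# Hypothetical stand-in for the real database insert.
def insert_row(id, attr1, attr2)
  puts "INSERT #{id}, #{attr1}, #{attr2}"
end

# each_line keeps only one line in memory at a time.
File.open("huge_file.txt") do |file|
  file.each_line do |line|
    # Each line looks like "id1 attr1_1 attr2_1 id2 attr1_2 attr2_2 ...";
    # take the whitespace-separated fields three at a time.
    line.split.each_slice(3) do |id, attr1, attr2|
      insert_row(id, attr1, attr2)
    end
  end
end
```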
I would not recommend bringing grep, sed, or any other such external utilities into the picture unless you have a very good reason, as they will make your code less portable and expose you to failures that may be difficult to diagnose.
If you're using Ruby then there's no need to worry about performance. The language is such that it suits an iterative approach to reading a file, line by line, and works very nicely. So long as you're using the language the way it's designed you can let the interpreter people worry about performance. Job done.
If one particular readLargeFileFast method is needed, then it should be because it's really hindering the program somehow. In that case, you write a C program to do it and popen it as a separate process from within your Ruby code. You could call it read_large.c and (perhaps) use command-line arguments to tell it how to behave.
This is championing the idea that a scripting language is used for fast development rather than fast run time. A developer can be very productive by swiftly 'prototyping' a program in something like Ruby and only later rewriting the components that warrant some low-level code. Often, however, once it's working in script, it's not necessary to do anything else at all.
The Ruby docs describe launching a separate process and treating it as a file; it's easy-peasy! A good start is The Art of Unix Programming's introductory paragraph on program modularity. That book also makes a great example of using Unix's standard stream editor, sed, which you could probably use from Ruby right now.
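For instance, a sketch of driving that hypothetical read_large helper from Ruby (the program name and arguments are invented):

```ruby
# Launch the external process and treat its stdout as a file.
IO.popen(["./read_large", "huge_file.txt"]) do |pipe|
  pipe.each_line do |line|
    # Process each pre-digested line the C program emits.
    puts line
  end
end
```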
If you need to parse or edit a lot of text, note that many interpreters and editors have been written around sed's functionality. Further, it may save you a lot of effort writing something super-efficient if you don't know C. A good reference is Bruce Barnett's Introduction to sed.
I have existing Java code and need to create a design document based on it.
For starters, even getting all the functions with their input/output parameters would help the overall process.
Note: there is no commented documentation on any of the procedures, functions, or classes.
Last but not least, let me know of any good tool that will reduce the time required for this phase, as we currently write up every flow and related material by hand.
What you want is just too much. Quoting Steve McConnell: "Good code is its own best documentation." Anyway, I digress.
You might want to look into UML tools that generate class/sequence diagrams from code. There are many of them, but only a handful support reverse engineering (into and from the class diagram), and an even smaller subset supports the same to/from sequence diagrams. I only know that MagicDraw can do this, but I am biased, as I used to work for the manufacturer of this tool, so do your shopping around first.
Use Javadoc: http://www.oracle.com/technetwork/java/javase/documentation/index-137868.html
or introspection (reflection): http://docs.oracle.com/javase/tutorial/reflect/class/classMembers.html
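As a starting point for the "all functions with input/output parameters" inventory, here is a small reflection sketch; the class name passed on the command line is a placeholder.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.stream.Collectors;

public class MethodDumper {
    public static void main(String[] args) throws ClassNotFoundException {
        // e.g. java MethodDumper com.example.SomeClass
        Class<?> cls = Class.forName(args[0]);
        for (Method m : cls.getDeclaredMethods()) {
            // Print "ReturnType name(ParamType1, ParamType2, ...)".
            String params = Arrays.stream(m.getParameterTypes())
                                  .map(Class::getSimpleName)
                                  .collect(Collectors.joining(", "));
            System.out.printf("%s %s(%s)%n",
                    m.getReturnType().getSimpleName(), m.getName(), params);
        }
    }
}
```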
Out of curiosity, I wonder what people can do with parsers, how they are applied, and what people usually create with them.
I know they're widely used in the programming-language industry, but I think that is just a tiny portion of it, right?
Besides special-purpose languages, my most ambitious use of a parser generator yet (with good old yacc back in C, and again later with pyparsing in Python) was to extract, validate, and possibly alter certain meta-info from SQL queries. Parsing SQL properly is a real challenge (especially if you hope to support more than one dialect!), and a parser generator (and the lexer it sits on top of) at least removes THAT part of the job!
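To give a flavor of the pyparsing side of that, here is a toy fragment (nowhere near a real SQL grammar): define a piece of the grammar, then pull named, structured results out of a query string.

```python
from pyparsing import CaselessKeyword, Group, Word, alphanums, delimitedList

SELECT, FROM = CaselessKeyword("SELECT"), CaselessKeyword("FROM")
ident = Word(alphanums + "_")

# SELECT <columns> FROM <table>, with named results we can extract.
select_stmt = (SELECT + Group(delimitedList(ident))("columns")
               + FROM + ident("table"))

parsed = select_stmt.parseString("SELECT id, name FROM users")
print(parsed.table)          # -> users
print(list(parsed.columns))  # -> ['id', 'name']
```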
They are used to parse text...
To give a more concrete example: where I work, we use lex/yacc to parse strings coming over sockets.
Also, the name should give you an idea of what javacc is used for (Java compiler compiler!)
Generally, they're used to parse domain-specific languages or scripting languages, or to provide similar support for code snippets.
I have previously seen one used to parse the command-line output of another software tool. This way the outer tool (VPN software) could re-use the base router IPsec code without modification, since a lot of what was being parsed was IP route tables and other structured, repeated text.
Using a parser allowed simple changes when the formatting changed, instead of trying to find and tweak a hand-written parser. And the output did change a few times over the life of the product.
I used parsers to help process roughly 800 Clipper source files into similar PRGs that could be compiled with Alaska Xbase++ (32-bit).
You can use a parser to extend your favorite language: get the language definition from its repository and then add what you've always wanted to have. You can pass the regular syntax through to your application and handle the extension in your own program.