So recently I was searching for command-line tools that perform fast search, and I stumbled upon quite a few. Out of those, my understanding is that ag is reportedly faster than grep, ack, and sift, with grep being the slowest.
Now I have a file with 300,000 strings, and I am trying to find which strings contain a specific substring and return them.
time grep 'substring' file.txt
real 0m0.030s
user 0m0.009s
sys 0m0.008s
time ag 'substring' file.txt
real 0m0.083s
user 0m0.038s
sys 0m0.014s
Am I doing something wrong, or is ag not meant to be used the way I am trying to use it?
grep is really efficient. However, even if ag is faster on one system, it ultimately comes down to which package and distribution you are using.
Thus, if you are using a 64-bit build of grep (e.g. from a Cygwin package), you can utilize more memory on the system. It could be that the ag package uses fewer resources.
I would recommend using the parallel command, which allows you to specify how many processes to run per core and can speed things up.
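For example, here is a minimal sketch using GNU parallel with the file and substring from the question (the block size is an arbitrary choice to tune for your data):
# Split file.txt into chunks and run grep on each chunk in parallel.
# --block 10M is an arbitrary chunk size; adjust it and -j (jobs) for your machine.
parallel --pipepart -a file.txt --block 10M grep 'substring'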
Related
I'm building a parsing engine which will take JSON strings as input, parse them, and output the parsed result. I'd like the parsing engine to run as either a daemon or a service, so I can deploy it using Docker. It needs to be extremely high performance, because it will parse high volumes of data.
I understand I could just have a script, which launches sed as a background process. But, it seems like launching and re-launching a process will incur overhead, thus reducing performance. I'm thinking running sed as a daemon or service might allow me the convenience of using an existing, and well vetted tool while maximizing system performance.
Additionally, if awk or another existing tool would be better suited to this purpose, I am open to other options. But, I'd like it to be a well vetted Linux/Unix tool if possible, just to avoid re-inventing the wheel.
I read this SO question, and this one regarding running emacs as a daemon. But neither seems to work for sed.
I have also considered piping stdin to sed in a daemon, but not sure if that is the best approach.
UPDATE
The key thing I am trying to ask is this: How can I run either sed, awk, or jq as a daemon, so that I can pass many strings to it without incurring the overhead of launching a new process?
(this was too big for a comment)
The way I understand it, these classic Unix text-processing tools, such as sed, awk, etc., are written as filters, which process an input stream and produce an output stream. They are not built to be daemons; they terminate after processing the input stream. EOF on the input stream will eventually terminate the filter, so you'll have to keep that pipe open.
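To make the "keep that pipe open" idea concrete, here is a rough shell sketch using a named FIFO; the FIFO path and the sed expression are placeholders, and GNU sed's -u flag is used to keep output unbuffered:
# Create a FIFO that acts as the long-lived input stream for sed.
mkfifo /tmp/sed_in
# Start sed once, reading from the FIFO; it stays alive as long as a writer keeps the FIFO open.
sed -u 's/foo/bar/g' < /tmp/sed_in &
# Hold a write end open on fd 3 so sed never sees EOF.
exec 3> /tmp/sed_in
# Feed it strings one at a time without relaunching the process.
echo '{"key": "foo"}' >&3
# When finished, close fd 3; sed sees EOF and terminates.
exec 3>&-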
If you don't like the idea of wrapping the tool with a shell script, perhaps the functionality needed to keep the pipe open, turn the process into a daemon and later close the open file descriptor to gracefully terminate the process can be implemented in the constructor/destructor (init/fini) of a shared library which can be preloaded (with LD_PRELOAD) while running the tool.
If you choose to implement something like that, the daemonize project can be a good starting point.
I am training a model using LuaTorch. Lately, I have been facing an annoying problem: the program runs more and more slowly as time goes by. When I execute
sudo sysctl -w vm.drop_caches=3
then the program runs much faster. However, about a day later, it slows down again. Checking the buffers and caches with top, I find they are quite high.
The first question: does it matter if I release the buffers and caches using that command while training a model?
My initial idea is to check the time elapsed in each epoch and call the command when the elapsed time is longer than a preset value:
if time_elapse > time_out then
    os.execute('sudo sysctl -w vm.drop_caches=3')
end
However, it requires manually entering the password the first time it is called. How can I avoid the manual password input from Lua code?
To answer your question directly: permit your user to execute the sysctl -w vm.drop_caches=3 command without entering a password.
If a user named naruto is running your lua script, add the following line to /etc/sudoers (or better still, create a file in /etc/sudoers.d for it).
naruto ALL=(ALL) NOPASSWD: /usr/sbin/sysctl -w vm.drop_caches=3
This will allow naruto to execute the exact command as root without supplying a password.
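As an illustration, a sketch of setting that up through a file in /etc/sudoers.d (the file name is arbitrary; always check the syntax with visudo afterwards):
# Create a dedicated sudoers.d entry rather than editing /etc/sudoers directly.
echo 'naruto ALL=(ALL) NOPASSWD: /usr/sbin/sysctl -w vm.drop_caches=3' | sudo tee /etc/sudoers.d/drop-caches
sudo chmod 0440 /etc/sudoers.d/drop-caches
# Validate all sudoers files before logging out.
sudo visudo -c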
As to the underlying problem, there's more here to look into. You would never normally want to drop the page cache; it's just a cache, and that memory isn't actually in use. Read through the very helpful page at http://www.linuxatemyram.com/ and, in particular, consult the Warning Signs section at the end (quoted below from that page):
Warning signs of a genuine low memory situation that you may want to look into:
- available memory (or "free + buffers/cache") is close to zero
- swap used increases or fluctuates
- dmesg | grep oom-killer shows the OutOfMemory-killer at work
The sysctl -w vm.drop_caches=3 command is likely speeding up your process by forcing swapped pages back into main memory, which may mean your system is configured to swap too aggressively. You can configure swappiness by modifying vm.swappiness; many Linux distros default this value to 60, and reducing it to 10 encourages the kernel to keep your processes in memory. The swappiness Wikipedia entry has more detailed instructions on modifying it. This doesn't answer your direct question (Mike nailed that), but it may help solve your underlying problem.
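A quick sketch of checking and lowering it; the value 10 is just the suggestion above, and the sysctl.d file name is arbitrary:
# Show the current swappiness value.
cat /proc/sys/vm/swappiness
# Lower it for the running system.
sudo sysctl -w vm.swappiness=10
# Persist the change across reboots.
echo 'vm.swappiness=10' | sudo tee /etc/sysctl.d/99-swappiness.conf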
I am currently running some simple Cypher queries (counts, etc.) on a large dataset (>10G) and am having some issues with tuning Neo4j.
The machine running the queries has 4 TB of RAM and 160 cores and is running Ubuntu 14.04 with Neo4j 2.3. Originally I left all the settings at their defaults, since it is stated that free memory will be dynamically allocated as required. However, as the queries were taking several minutes to complete, I assumed this was not the case. I have therefore set various combinations of the following parameters within neo4j-wrapper.conf:
wrapper.java.initmemory=1200000
wrapper.java.maxmemory=1200000
dbms.memory.heap.initial_size=1200000
dbms.memory.heap.max_size=1200000
dbms.jvm.additional=-XX:NewRatio=1
and the following within neo4j.properties:
use_memory_mapped_buffers=true
neostore.nodestore.db.mapped_memory=50G
neostore.relationshipstore.db.mapped_memory=50G
neostore.propertystore.db.mapped_memory=50G
neostore.propertystore.db.strings.mapped_memory=50G
neostore.propertystore.db.arrays.mapped_memory=1G
following every guide/Stack Overflow post I could find on the topic, but I seem to have exhausted the available material with little effect.
I am running queries through the shell using the command neo4j-shell -c < "queries/$1.cypher", but have also tried explicitly passing the config files with -config $NEO4J_HOME/conf/neo4j-wrapper.conf (restarting the server every time I make a change).
I imagine I have missed something silly which is causing the issue, as there are many reports of Neo4j working well with data of this size, but I cannot think what it could be. Any help would be greatly appreciated.
Type :schema in the Neo4j browser to check whether you have indexes.
Share a couple of your queries.
In the neo4j.properties file, you need to set dbms.pagecache.memory to about 1.5x the size of your database files. In your example, you could set it to 15g.
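For example, in conf/neo4j.properties (Neo4j 2.3) the entry would look roughly like this, with the value sized to your own store files:
# Page cache for the store files; roughly 1.5x the size of the database files.
dbms.pagecache.memory=15g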
Some information that is important to the question, before describing the problems and issues:
Redis Lua scripting replicates the script itself instead of replicating the single commands, both to slaves and to the AOF file. This is needed as scripts are often one or two orders of magnitude faster than executing commands in the normal way, so for a slave to be able to cope with the master replication link speed and number of commands per second, this is the only solution available.
More information about this decision is in Lua scripting: determinism, replication, AOF (GitHub issue).
Question
Is there any way or workaround to replicate the single commands instead of replicating the Lua script itself?
Why?
We use Redis as a natural language processing (multinomial Naive Bayes) application server. Each time you want to learn from new text, you have to update a big list of word weights; the list contains approximately 1,000,000 words. Processing time using a Lua script is ~350 ms per run. Processing using a separate application server (hiredis based) is 37 seconds per run.
I am thinking about workarounds like these:
After the computation is done, transfer the key to another (read-only) server with MIGRATE.
From time to time, save the RDB, move it to another server, and load it by hand.
Is there any other workaround to solve this?
Yes, in the near future we're going to have just that: https://www.reddit.com/r/redis/comments/3qtvoz/new_feature_single_commands_replication_for_lua/
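For reference, a sketch of what that feature ended up looking like (effects replication, added in Redis 3.2); the key name and commands here are only illustrative:
# Calling redis.replicate_commands() before any write switches the script to
# replicating the individual commands it executes instead of the script body.
redis-cli EVAL "redis.replicate_commands(); redis.call('INCR', KEYS[1]); return redis.call('GET', KEYS[1])" 1 word:weight:example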
I need to make an application that sends a POST with data to a web service when a user taps a tag on the ACR122U NFC reader. The application needs to continue running on its own after being started and send a POST each time a tag is tapped. Control over the POST URL in the application is required for conditional logic based on the tag data. This is for a brief POC installation with no more than 500 tag swipes within a 4-hour window. There will be a reader and a screen displaying a web page with a list that accumulates as the user taps a few tags.
Since I have a MacBook Pro (OS X 10.7), I have tried several approaches with that platform. Unfortunately, without success. I would prefer an OS X solution, but am open to suggestions.
Given the following, what do I need to do? Is there a better/easier way?
tagstand_writer:
The ACR122U came with software called 'tagstand_writer_macosx_0_6_5_beta'. tagstand_writer does not seem to encompass the functionality I want. It seems only to enable simple read/write without continuous polling. Can it be used by a wrapper application that does the polling, or can it be used in a way I am not aware of to achieve the desired functionality? Anyway, I tried to write a url to my tag, per the instructions, but was unable to. I forget what the problem was, but it didn't seem worth pursuing. I was, however, able to read the tag per the instructions.
libnfc
Searching for clues, I stumbled upon libnfc. So, I took a deep breath and braved the install process. It didn't go very well. The documentation is 'ok' (not stumble-proof), and the process was challenging. I hit a few potholes in the configure/make process and it took a while. Eventually, I was able to get one of the examples running. But, I wasn't sure what to do next. It seems pretty low level. There is an example provided called 'nfc-poll', but it exits after a tag is read and I'm not sure if I can make it do what I want. I think this is the most promising of my 3 attempts, but am not sure what to do next.
tageventor
Looking for a higher-level starting point, I found tageventor. It seemed promising in that tagEventor, once started, is supposed to run and poll and call a script when a tag is read. The script, supposedly can be anything. So, I tried, but was unable to get it working. I found a more current version on github and tried that as well to no avail. I could get tagEventor to run, but when I touched a tag to the reader there was an error: "ERROR: readerCheck:: RPC transport error". I have no idea what that is, and neither does the internet apparently. Also, while trying to debug tageventor, I did notice that my console was outputting an error: "token in reader ACS ACR122U PICC Interface 00 00 cannot be used (error 229)" regardless of whether tageventor was running or not.
What's the simple/quick solution?
I suggest you use something like:
echo 1 | pcsctest >out.txt
cat out.txt | grep "Current Reader ATR Value " | tr -d " " | tr ":" " " | awk '{print $2}'
in your app to get the ATR. You can put this in a bash script and run it in daemon mode if you like.
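As a rough sketch of that bash-script-as-daemon idea, here is a polling loop that POSTs the ATR to a web service; the endpoint URL, polling interval, and duplicate-read handling are placeholders to adapt:
#!/bin/bash
# Poll the reader and POST the ATR to a web service whenever a new tag shows up.
URL="http://example.com/tag"   # hypothetical endpoint
LAST=""
while true; do
    ATR=$(echo 1 | pcsctest 2>/dev/null | grep "Current Reader ATR Value" | tr -d " " | tr ":" " " | awk '{print $2}')
    if [ -n "$ATR" ] && [ "$ATR" != "$LAST" ]; then
        curl -s -X POST -d "atr=$ATR" "$URL"
        LAST="$ATR"      # skip repeated reads while the same tag rests on the reader
    fi
    sleep 2
done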
Sadly, I did not find an OS X solution in time and used a Windows box, which was quite easy. The SDK is made for Windows: http://www.acs.com.hk/en/products/12/acr122u-nfc-contactless-smart-card-reader-software-developmnt-kit/