I get below output from the perf tool on linux.
Take page-faults for example, what does (99.99%) mean here?
Thank you.
8,650,693,659 cache-references (49.96%)
1,775,706,317 cache-misses # 20.527 % of all cache refs (50.05%)
1,738,355,037,627 cycles (50.10%)
2,031,670,462,762 instructions # 1.17 insn per cycle (50.11%)
325,348,135,322 branches (50.02%)
1,064,589 faults
12,767 migrations
3,244,576,263 branch-misses # 1.00% of all branches (49.97%)
1,064,661 page-faults (99.99%)
Related
I am using GNU parallel (version 20200522) on Ubuntu Linux 18.04.4 and running jobs on all cores of a local server minus 2 cores, that is I am using the -j-2 parameter.
find /folder/ -type f -iname "*.pdf" | parallel -j-2 --nice 2 "script.sh {1} {1/.}; mv -f -v {1} /folder2/; mv -f {1/.}.txt /folder3/" :::: -
However, the program shows
Error: Cannot run any jobs.
I tried using the -j100% parameter and I have seen that it uses just 1 core(job), and I deduce that, for GNU parallel, 100% of the available cores on this system is just one core.
If I use the -j5 parameter (which does not imply autodetection of the total number of cores), everything is alright, parallel launches 5 jobs and uses 5 cores.
The interesting part is that the file /root/.parallel/tmp/sshlogin/MACHINE_NAME/cpuspec contains the following:
1
6
6
which means, I think, that GNU parallel should see 6 available cores.
I have tried deleting the cpuspec file and running parallel again to redetect the total number of cores, but the cpuspec file and the behavior of the program remain the same.
On different systems, deleting the cpuspec file solved all issues, but on this particular system it is not working. The virtual machine is copied from another server with a different configuration, that is why I need deleting the cpuspec file.
What should I do to get GNU parallel to correctly detect the number of cores on the system, so that I can use the -j-2 parameter?
Update 21.07:
After deleting once again the folder with the cpuspec file, running the parallel --number-of-sockets/cores/threads commands and using just once the -S 6/: parameter, the problem seems to have resolved itself. Now GNU parallel correctly detects the number of cores and the -j-2 parameters works.
I am not sure what good things happened, but I am not able to reproduce the bug anymore.
Ole, thank you for your answer. If I meet the bug again or if I am able to reproduce it, I will post it here.
And here is the output to the commands:
parallel --number-of-sockets
1
parallel --number-of-cores
6
parallel --number-of-threads
6
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2620 v3 # 2.40GHz
stepping : 2
microcode : 0xffffffff
cpu MHz : 2397.218
cache size : 15360 KB
physical id : 0
siblings : 6
core id : 0
cpu cores : 6
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl cpuid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm pti fsgsbase bmi1 avx2 smep bmi2 erms xsaveopt
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 4794.43
clflush size : 64
cache_alignment : 64
address sizes : 42 bits physical, 48 bits virtual
power management:
And it repeats itself for 5 more cores.
You may have found a bug. Please post the output of:
cat /proc/cpuinfo
parallel --number-of-sockets
parallel --number-of-cores
parallel --number-of-threads
Also see if you can make an MCVE.
As a workaround you can use -S 6/: to force GNU Parallel to detect 6 cores on your system.
find /folder/ -type f -iname "*.pdf" |
parallel -S 6/: -j-2 --nice 2 "script.sh {1} {1/.}; mv -f -v {1} /folder2/; mv -f {1/.}.txt /folder3/"
(Also :::: - can be left out completely: If there is no ::: :::: then GNU Parallel reads from stdin).
I'm running dask over slurm via jobqueue and I have been getting 3 errors pretty consistently...
Basically my question is what could be causing these failures? At first glance the problem is that too many workers are writing to disk at once, or my workers are forking into many other processes, but it's pretty difficult to track that. I can ssh into the node but I'm not seeing an abnormal number of processes, and each node has a 500gb ssd, so I shouldn't be writing excessively.
Everything below this is just information about my configurations and such
My setup is as follows:
cluster = SLURMCluster(cores=1, memory=f"{args.gbmem}GB", queue='fast_q', name=args.name,
env_extra=["source ~/.zshrc"])
cluster.adapt(minimum=1, maximum=200)
client = await Client(cluster, processes=False, asynchronous=True)
I suppose i'm not even sure if processes=False should be set.
I run this starter script via sbatch under the conditions of 4gb of memory, 2 cores (-c) (even though i expect to only need 1) and 1 task (-n). And this sets off all of my jobs via the slurmcluster config from above. I dumped my slurm submission scripts to files and they look reasonable.
Each job is not complex, it is a subprocess.call( command to a compiled executable that takes 1 core and 2-4 GB of memory. I require the client call and further calls to be asynchronous because I have a lot of conditional computations. So each worker when loaded should consist of 1 python processes, 1 running executable, and 1 shell.
Imposed by the scheduler we have
>> ulimit -a
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8192
-c: core file size (blocks) 0
-m: resident set size (kbytes) unlimited
-u: processes 512
-n: file descriptors 1024
-l: locked-in-memory size (kbytes) 64
-v: address space (kbytes) unlimited
-x: file locks unlimited
-i: pending signals 1031203
-q: bytes in POSIX msg queues 819200
-e: max nice 0
-r: max rt priority 0
-N 15: unlimited
And each node has 64 cores. so I don't really think i'm hitting any limits.
i'm using the jobqueue.yaml file that looks like:
slurm:
name: dask-worker
cores: 1 # Total number of cores per job
memory: 2 # Total amount of memory per job
processes: 1 # Number of Python processes per job
local-directory: /scratch # Location of fast local storage like /scratch or $TMPDIR
queue: fast_q
walltime: '24:00:00'
log-directory: /home/dbun/slurm_logs
I would appreciate any advice at all! Full log is below.
FORK BLOCKING IO ERROR
distributed.nanny - INFO - Start Nanny at: 'tcp://172.16.131.82:13687'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/dbun/.local/share/pyenv/versions/3.7.0/lib/python3.7/multiprocessing/forkserver.py", line 250, in main
pid = os.fork()
BlockingIOError: [Errno 11] Resource temporarily unavailable
distributed.dask_worker - INFO - End worker
Aborted!
CANT START NEW THREAD ERROR
https://pastebin.com/ibYUNcqD
BLOCKING IO ERROR
https://pastebin.com/FGfxqZEk
EDIT:
Another piece of the puzzle:
It looks like dask_worker is running multiple multiprocessing.forkserver calls? does that sound reasonable?
https://pastebin.com/r2pTQUS4
This problem was caused by having ulimit -u too low.
As it turns out each worker has a few processes associated with it, and the python ones have multiple threads. In the end you end up with approximately 14 threads that contribute to your ulimit -u. Mine was set to 512, and with a 64 core system I was likely hitting ~896. It looks like the a maximum threads per a process I could have had would have been 8.
Solution:
in .zshrc (.bashrc) I added the line
ulimit -u unlimited
Haven't had any problems since.
I have one java based application which is having huge line of source code(~1m).Now I am using jenkins with sonar-runner-2.4 to run analysis with code coverage and test cases count.I have upgraded sonarqube server from 5.4 to 6.3.1.Before upgrade this job took 9hrs to complete the whole analysis (still it is very much long time but fine) but after upgrade to sonarqube-6.3.1 same job taking 13hrs to complete the same analysis.
How do I improve analysis time at least my earlier time 9hr ?
EDIT
Here is my JAVA_OPTS for sonarqube-6.3.1 instance
sonar.web.javaOpts=-Xmx6G -Xms2G -XX:MaxPermSize=1G -XX:+HeapDumpOnOutOfMemoryError -Djava.net.preferIPv4Stack=true
Available Hardware :
$lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 26
Stepping: 5
CPU MHz: 1596.000
BogoMIPS: 3999.44
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K
NUMA node0 CPU(s): 0-3
NUMA node1 CPU(s): 4-7
Available Memory :
$free -m
total used free shared buff/cache available
Mem: 128714 58945 66232 430 3535 68298
Swap: 32767 957 31810
sonar-project.properties for the long running job:
sonar-project.properties
As you haven't really given many details, I can't really give many details in the answer, but the simple answer is that you have to make the scan do less work.
Look at your codebase. Is your scan processing generated classes? Is it scanning test classes? Is it scanning classes that have little real business logic? If you answer "yes" to any of those, consider excluding those classes.
Look at the SonarQube plugins you're using. Are you running every possible plugin you can run? Are there some heuristics you don't need to run, or perhaps you could run less frequently?
How we can generate FortiFy report using command ??? on linux.
In command, how we can include only some folders or files for analyzing and how we can give the location to store the report. etc.
Please help....
Thanks,
Karthik
1. Step#1 (clean cache)
you need to plan scan structure before starting:
scanid = 9999 (can be anything you like)
ProjectRoot = /local/proj/9999/
WorkingDirectory = /local/proj/9999/working
(this dir is huge, you need to "rm -rf ./working && mkdir ./working" before every scan, or byte code piles underneath this dir and consume your harddisk fast)
log = /local/proj/9999/working/sca.log
source='/local/proj/9999/source/src/**.*'
classpath='local/proj/9999/source/WEB-INF/lib/*.jar; /local/proj/9999/source/jars/**.*; /local/proj/9999/source/classes/**.*'
./sourceanalyzer -b 9999 -Dcom.fortify.sca.ProjectRoot=/local/proj/9999/ -Dcom.fortify.WorkingDirectory=/local/proj/9999/working -logfile /local/proj/working/9999/working/sca.log -clean
It is important to specify ProjectRoot, if not overwrite this system default, it will put under your /home/user.fortify
sca.log location is very important, if fortify does not find this file, it cannot find byte code to scan.
You can alter the ProjectRoot and Working Directory once for all if your are the only user: FORTIFY_HOME/Core/config/fortify_sca.properties).
In such case, your command line would be ./sourceanalyzer -b 9999 -clean
2. Step#2 (translate source code to byte code)
nohup ./sourceanalyzer -b 9999 -verbose -64 -Xmx8000M -Xss24M -XX:MaxPermSize=128M -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+UseParallelGC -Dcom.fortify.sca.ProjectRoot=/local/proj/9999/ -Dcom.fortify.WorkingDirectory=/local/proj/9999/working -logfile /local/proj/9999/sca.log -source 1.5 -classpath '/local/proj/9999/source/WEB-INF/lib/*.jar:/local/proj/9999/source/jars/**/*.jar:/local/proj/9999/source/classes/**/*.class' -extdirs '/local/proj/9999/source/wars/*.war' '/local/proj/9999/source/src/**/*' &
always unix background job (&) in case your session to server is timeout, it will keep working.
cp : put all your known classpath here for fortify to resolve the functiodfn calls. If function not found, fortify will skip the source code translation, so this part will not be scanned later. You will get a poor scan quality but FPR looks good (low issue reported). It is important to have all dependency jars in place.
-extdir: put all directories/files you don't want to be scanned here.
the last section, files between ' ' are your source.
-64 is to use 64-bit java, if not specified, 32-bit will be used and the max heap should be <1.3 GB (-Xmx1200M is safe).
-XX: are the same meaning as in launch application server. only use these to control the class heap and garbage collection. This is to tweak performance.
-source is java version (1.5 to 1.8)
3. Step#3 (scan with rulepack, custom rules, filters, etc)
nohup ./sourceanalyzer -b 9999 -64 -Xmx8000M -Dcom.fortify.sca.ProjectRoot=/local/proj/9999 -Dcom.fortify.WorkingDirectory=/local/proj/9999/working -logfile /local/ssap/proj/9999/working/sca.log **-scan** -filter '/local/other/filter.txt' -rules '/local/other/custom/*.xml -f '/local/proj/9999.fpr' &
-filter: file name must be filter.txt, any ruleguid in this file will not be reported.
rules: this is the custom rule you wrote. the HP rulepack is in FORTIFY_HOME/Core/config/rules directory
-scan : keyword to tell fortify engine to scan existing scanid. You can skip step#2 and only do step#3 if you did notchange code, just want to play with different filter/custom rules
4. Step#4 Generate PDF from the FPR file (if required)
./ReportGenerator -format pdf -f '/local/proj/9999.pdf' -source '/local/proj/9999.fpr'
Problem:
I am trying to get sphinx running again after server reboot. There seems to be no sphinx.conf file when I try to start it running:
>searchd
Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
FATAL: no readable config file (looked in /etc/sphinxsearch/sphinx.conf, ./sphinx.conf).
I have run:
rake thinking_sphinx:configure
rake thinking_sphinx:index
rake thinking_sphinx:start
The problem is for some reason no etc/sphinxsearch/sphinx.conf file is being created... I am new to thinking_sphinx and this might not be the only problem (with the site), but it doesn't seem to be set up fully. For out put and more information read below:
Background info:
I am working on a project I didn't set up initially. We rebooted the server to see some of the changes we made in a constants file. But after the reboot the project no longer displays when you navigate to the site. When you put in the straight ip address it just says "Welcome to Nginx".
The port is open and working through our hosting server, so I was told I have to restart some services. One of the issues I came upon was with thinking_sphinx. This was the rake tasks for sphinx site I referenced. As well as common configuration issues for sphinx.
I set up the sphinx.yml development paths (we aren't using production). Then I ran
>rake thinking_sphinx:index
which seems to have worked even though it output some warnings:
Generating Configuration to /home/potato/streetpotato/config/development.sphinx.conf
(0.2ms) SELECT ##global.sql_mode, ##session.sql_mode;
Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/home/potato/streetpotato/config/development.sphinx.conf'...
indexing index 'bar_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 14080 kb
collected 249 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 249 docs, 32394 bytes
total 0.254 sec, 127298 bytes/sec, 978.49 docs/sec
indexing index 'bar_delta'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 14080 kb
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.003 sec, 0 bytes/sec, 0.00 docs/sec
skipping non-plain index 'bar'...
indexing index 'synonym_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 13568 kb
collected 3 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 3 docs, 103 bytes
total 0.003 sec, 30356 bytes/sec, 884.17 docs/sec
indexing index 'synonym_delta'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 13568 kb
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.002 sec, 0 bytes/sec, 0.00 docs/sec
skipping non-plain index 'synonym'...
indexing index 'user_core'...
WARNING: collect_hits: mem_limit=0 kb too low, increasing to 13568 kb
collected 100 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 100 docs, 3146 bytes
total 0.013 sec, 239348 bytes/sec, 7608.03 docs/sec
skipping non-plain index 'user'...
total 11 reads, 0.000 sec, 3.8 kb/call avg, 0.0 msec/call avg
total 37 writes, 0.000 sec, 2.5 kb/call avg, 0.0 msec/call avg
Then I ran
>rake thinking_sphinx:configure
Generating Configuration to /home/potato/streetpotato/config/development.sphinx.conf
(0.2ms) SELECT ##global.sql_mode, ##session.sql_mode;
Lastly running:
>rake thinking_sphinx:start
Started successfully (pid 29623).
Now even though my log says:
[Fri Nov 16 19:34:29.820 2012] [29623] accepting connections
There is still no sphinx.conf file being generated and when I try to use the searchd command it still gives me the error...
>searchd --stop
Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
FATAL: no readable config file (looked in /etc/sphinxsearch/sphinx.conf, ./sphinx.conf).
I am at a loss, I know this is super long but only because I am so lost and trying to give as much information as possible. I got further then I did yesterday with this but it still doesn't seem to be fully working. I might have to do more set up with unicorn or thin as well. I'm just trying to figure out how to get the site back up and running again... If any one has run into similar issues with their site going down after reboot and got it back up (specifically a rails project on Nginx and unicorn or thin using sphinx) any insight would be appreciated.
Thanks,
Alan
Calm down!! :-)
Firstly, you don't need a /etc/sphinxsearch/sphinx.conf file; that is just the default file that searchd tries to use when you don't specify any configuration file.
As your log output shows, your rails application is using /home/potato/streetpotato/config/development.sphinx.conf file when it starts the searchd process.
Run ps -fe | grep searchd on your dev machine; you should see something like this as the output:
501 14128 1 0 0:00.00 ttys004 0:00.00 searchd --pidfile --config /home/potato/streetpotato/config/development.sphinx.conf
501 14130 13546 0 0:00.00 ttys004 0:00.01 grep searchd
So rails app calls searchd with --config /home/potato/streetpotato/config/development.sphinx.conf argument, to specify a different conf file.
From your logs, it is clear that thinkingsphinx is running fine. You can confirm it further by logging into rails console and running a search method on one of the models which have thinking_sphinx indexes defined on them.
Eg: If your app has an Article model as shown in the above link, the following command will show all articles having National Parks in them:
$ rails console
> Article.search( "National Parks" )
=> [#<Article id: 15,... >, #<Article id: 22,...>,...]
The real problem is the application not showing after restarting the server. That has nothing to do with thinking sphinx which is running fine.
Try rolling back all the changes made in the constants file that you mention above, and make sure the application is working fine. Then start making the changes one by one and isolate the one change that breaks your application.
So yeah, this is a hole in ThinkingSphinx (IMHO) -- you can start the sphinxd server using the various rake tasks (which generate the config as needed) ... but this doesn't work in production.
On a project I worked in last year (running on a Linux server) we created an /etc/init.d script to start sphinxd -- it takes options including a path to the configuration file. We did our deploys with capistrano, and put generated code in app/shared -- a directory outside of the source tree. I believe there are some predefined capistrano tasks that will rebuild the Rails-specific config files when models change or otherwise affect what Sphinx does (same as the rake task you mention).
This was one of those cases for us where we had been putting off site search for a long time, and one of our developers got it "all set up" in an afternoon. Getting it deployed took a lot more work.
(Just saw answer from #prakash-murthy -- he provides some details of how to specify config path when you initialized sphinxd). But the trick is to have it start when the system starts and pointing to the config that ThinkingSphinx generates.)
Ok so after a day n a half I finally set it all up and got it running (it was more then just sphinx). I also had to get nginx and unicorn up and running in the background, since we didn't have scripts set up to restart them when the server was rebooted...
When rebooting the server you have to restart some services before the app will be accessible:
1) thinking_sphinx
reference sites
http://pat.github.com/ts/en/rake_tasks.html
http://www.claytonlz.com/2010/09/thinkingsphinx-conf-problems/
a)create/modify app/config/sphinx.yml
development:
morphology: stem_en
port: 9312
bin_path: "/usr/bin" # set up the path to binary for searchd
searchd_binary_name: searchd
indexer_binary_name: indexer
#mem_limit: 128M
test:
morphology: stem_en
port: 9312
mem_limit: 128M
production:
morphology: stem_en
port: 9312
mem_limit: 512M
# the searchd ip, in case it's not on localhost
# address: 10.10.0.0
# this is by default included in db/sphinx
# searchd_file_path: "/path/to/shared/folder/sphinx"
b)rake thinking_sphinx:index
c)rake thinking_sphinx:configure # creates config/development.sphinx.conf which helps define sphinx's indexing
d)# then you have to start sphinx, there are 2 ways to do this
rake thinking_sphinx:start
rake thinking_sphinx:stop
OR
searchd
searchd --stop
# only the rake commands worked for me, when I tried to run searchd
# I got an error FATAL: no readable config file (looked in /etc/sphinxsearch/sphinx.conf, ./sphinx.conf).
# for some reason we dont have a sphinx.conf file, but the rake commands work without it
e)# once you start thinking_sphinx check log/searchd.log file for the line
[Fri Nov 16 19:34:29.820 2012] [29623] accepting connections
2) nginx
reference site:
http://wiki.nginx.org/CommandLine
a) check that nginx is up and running
i) start server
# to check where nginx resides type in this into server console
which nginx
# whatever path it gives you is how you start the server this is my path
/usr/sbin/nginx
ii) stop server
/usr/sbin/nginx -s stop # use the path given by which command
3) unicorn (starting app server)
reference site:
http://codelevy.com/2010/02/09/getting-started-with-unicorn.html
a) test if unicorn will run after previous changes
unicorn_rails -p 3000
# the site should now be up and running, check that it is
# console should now log the different actions you do on the site
b) create unicorn.rb in config folder (if none is there)
# only start this step if the step above got the site running
# close the console or exit the process you started above
# contents of unicorn.rb
worker_processes 2 #(starts 2 child processes, not completely neccissary)
preload_app true
timeout 30
listen 3000
after_fork do |server, worker|
ActiveRecord::Base.establish_connection
end
c) run unicorn in the background
# make sure you exited the process above before running this
unicorn_rails -c config/unicorn.rb -D
# this was giving me an error that it said was logged by stderr
# I got the command to run by adding a command to the front
http://stackoverflow.com/questions/2325152/check-for-stdout-or-stderr
exec 2> /dev/null unicorn_rails -c config/unicorn.rb -D
d) (optional) check stats from starting unicorn
i) pgrep -lf unicorn_rails
#sample output
5374 unicorn_rails master -c config/unicorn.rb -D
5388 unicorn_rails worker[0] -c config/unicorn.rb -D # not needed currently
5391 unicorn_rails worker[1] -c config/unicorn.rb -D # not needed currently
ii) cat tmp/pids/unicorn.pid # from inside the streetpotato folder
#sample output
5374