My machine has an NVIDIA Tesla K20m GPU. I would like to know its GPU utilization, memory utilization, temperature and fan speed, so I used nvidia-smi to get the details. The nvidia-smi log is as follows:
==============NVSMI LOG==============
Timestamp : Tue Dec 10 11:06:11 2013
Driver Version : 319.49
Attached GPUs : 1
GPU 0000:84:00.0
Product Name : Tesla K20m
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Enabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 128
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0325212069909
GPU UUID : GPU-8b890015-e683-4061-6596-d27716c2900b
VBIOS Version : 80.10.11.00.0B
Inforom Version
Image Version : 2081.0208.01.07
OEM Object : 1.1
ECC Object : 3.0
Power Management Object : N/A
GPU Operation Mode
Current : Compute
Pending : Compute
PCI
Bus : 0x84
Device : 0x00
Domain : 0x0000
Device Id : 0x102810DE
Bus Id : 0000:84:00.0
Sub System Id : 0x101510DE
GPU Link Info
PCIe Generation
Max : 2
Current : 1
Link Width
Max : 16x
Current : 16x
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Unknown : Not Active
Memory Usage
Total : 4799 MB
Used : 11 MB
Free : 4788 MB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending : No
Temperature
Gpu : 29 C
Power Readings
Power Management : Supported
Power Draw : 25.44 W
Power Limit : 225.00 W
Default Power Limit : 225.00 W
Enforced Power Limit : 225.00 W
Min Power Limit : 150.00 W
Max Power Limit : 225.00 W
Clocks
Graphics : 324 MHz
SM : 324 MHz
Memory : 324 MHz
Applications Clocks
Graphics : 705 MHz
Memory : 2600 MHz
Default Applications Clocks
Graphics : 705 MHz
Memory : 2600 MHz
Max Clocks
Graphics : 758 MHz
SM : 758 MHz
Memory : 2600 MHz
Compute Processes : None
How can I find out the fan speed? Is there a plug-in for this? Can anyone help me?
Have you tried nvclock? It works for my Tesla K40M.
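Another option is nvidia-smi's own query interface, which prints just the requested fields as CSV. Below is a minimal Python sketch of that approach; the function names are made up for illustration, and it assumes the installed driver's nvidia-smi supports --query-gpu with these field names. Note that the log above already reports "Fan Speed : N/A", and as far as I know the K20m is a passively cooled board, so the fan field may simply not be readable on it; the parser therefore maps any non-numeric value to None.

```python
import subprocess

# Field names passed to nvidia-smi --query-gpu (assumed to be supported
# by the installed driver version).
FIELDS = ["utilization.gpu", "utilization.memory", "temperature.gpu", "fan.speed"]


def parse_gpu_line(line):
    """Parse one CSV line of nvidia-smi output into a dict.

    Values that are not numbers (e.g. "N/A" or "[Not Supported]")
    are stored as None.
    """
    values = [v.strip() for v in line.split(",")]
    result = {}
    for name, value in zip(FIELDS, values):
        try:
            result[name] = float(value)
        except ValueError:
            result[name] = None
    return result


def query_gpus():
    """Run nvidia-smi once and return one dict per attached GPU."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(FIELDS),
            "--format=csv,noheader,nounits",
        ],
        universal_newlines=True,
    )
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
```

On a machine with the driver installed, query_gpus() returns a list with one entry per GPU; a fan.speed of None means the driver did not report a number for it.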
I'd like to know if my PyTorch code is fully utilizing the GPU SMs. According to this question, gpu-util in nvidia-smi only shows how much of the time at least one SM was in use.
I also saw that typing nvidia-smi dmon gives the following table:
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0   132    71     -    58    18     0     0  6800  1830
One would think that sm% is the SM utilization, but I couldn't find any documentation on what sm% means. The number given is exactly the same as gpu-util in nvidia-smi.
Is there any way to check the SM utilization?
On a side note, is there any way to check memory bandwidth utilization?
We are doing some performance measurements including some memory footprint measurements. We've been doing this with GNU time.
But I cannot tell if it is measuring in kilobytes (1000 bytes) or kibibytes (1024 bytes).
The man page for my system says of the %M format key (which we are using to measure peak memory usage): "Maximum resident set size of the process during its lifetime, in Kbytes."
I assume K here means the SI "Kilo" prefix, and thus kilobytes.
But having looked at a few other memory measurements of various things through various tools, I trust that assumption like I'd trust a starved lion to watch my dogs during a week-long vacation.
I need to know, because for our tests the difference between 1000-byte and 1024-byte Kbytes adds up to nearly 8 gigabytes, and I'd like to think I can cut down the potential error in our measurements by a few billion.
Using the testing setup below, I have determined that GNU time on my system measures in kibibytes.
The program below (allocator.c) allocates data and touches it at 128-byte intervals to ensure that all of it gets paged in. Note: this test only works if the entirety of the allocated data can be resident at once; otherwise time's measurement will only be the largest resident collection of memory.
allocator.c:
#include <stdio.h>
#include <stdlib.h>

/* Distance between touched bytes; must not exceed the page size (4096). */
static const unsigned long step = 128;

int main(int argc, char** argv ){
    if( argc < 2 ){
        fprintf( stderr, "usage: %s <bytes>\n", argv[0] );
        return 1;
    }
    unsigned long k = strtoul( argv[1], NULL, 10 );
    printf( "Allocating %lu (%s) bytes\n", k, argv[1] );
    /* volatile so that -O3 cannot optimise the writes away */
    volatile char* data = malloc( k );
    if( data == NULL ){
        printf( "Bad size: %s => %lu\n", argv[1], k );
        return 1;
    }
    /* unsigned long index: the sizes tested here exceed INT_MAX */
    for( unsigned long i = 0; i < k; i += step ){
        data[i] = (char) i;
    }
    free( (void*) data );
    return 0;
}
Compile with: gcc -O3 allocator.c -o allocator
Runner Bash Script:
kibibyte=1024
kilobyte=1000
mebibyte=$(expr 1024 \* ${kibibyte})
megabyte=$(expr 1000 \* ${kilobyte})
gibibyte=$(expr 1024 \* ${mebibyte})
gigabyte=$(expr 1000 \* ${megabyte})
for mult in $(seq 1 3);
do
bytes=$(expr ${gibibyte} \* ${mult} )
echo ${mult} GiB \(${bytes} bytes\)
echo "... in kibibytes: $(expr ${bytes} / ${kibibyte})"
echo "... in kilobytes: $(expr ${bytes} / ${kilobyte})"
/usr/bin/time -v ./allocator ${bytes}
echo "===================================================="
done
For me this produces the following output:
1 GiB (1073741824 bytes)
... in kibibytes: 1048576
... in kilobytes: 1073741
Allocating 1073741824 (1073741824) bytes
Command being timed: "./a.out 1073741824"
User time (seconds): 0.12
System time (seconds): 0.52
Percent of CPU this job got: 75%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.86
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1049068
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 262309
Voluntary context switches: 7
Involuntary context switches: 2
Swaps: 0
File system inputs: 16
File system outputs: 8
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
====================================================
2 GiB (2147483648 bytes)
... in kibibytes: 2097152
... in kilobytes: 2147483
Allocating 2147483648 (2147483648) bytes
Command being timed: "./a.out 2147483648"
User time (seconds): 0.21
System time (seconds): 1.09
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.31
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2097644
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 524453
Voluntary context switches: 4
Involuntary context switches: 3
Swaps: 0
File system inputs: 0
File system outputs: 8
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
====================================================
3 GiB (3221225472 bytes)
... in kibibytes: 3145728
... in kilobytes: 3221225
Allocating 3221225472 (3221225472) bytes
Command being timed: "./a.out 3221225472"
User time (seconds): 0.38
System time (seconds): 1.60
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.98
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3146220
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 786597
Voluntary context switches: 4
Involuntary context switches: 3
Swaps: 0
File system inputs: 0
File system outputs: 8
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
====================================================
In the "Maximum resident set size" entry, I see values that are closest to the kibibyte value I expect from the raw byte count; in all three runs the reported value is exactly 492 Kbytes above it. There is some difference because it's possible that some memory is being paged out (which would make the value lower, though none of them are here) and because more memory is being consumed than what the program allocates (namely, the stack and the actual binary image itself).
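The comparison can be checked with a few lines of arithmetic. The following is a minimal Python sketch (the helper name closest_unit is made up for illustration) that uses the three byte counts and the "Maximum resident set size" values from the output above:

```python
KIB = 1024  # bytes per kibibyte
KB = 1000   # bytes per kilobyte


def closest_unit(allocated_bytes, reported):
    """Return the unit under which `reported` is nearest to the allocation."""
    err_kib = abs(reported - allocated_bytes / KIB)
    err_kb = abs(reported - allocated_bytes / KB)
    return "KiB" if err_kib < err_kb else "kB"


# allocated bytes -> "Maximum resident set size (kbytes)" reported by time
runs = {
    1073741824: 1049068,
    2147483648: 2097644,
    3221225472: 3146220,
}

for nbytes, rss in runs.items():
    overhead = rss - nbytes // KIB
    print(nbytes, closest_unit(nbytes, rss), overhead)
```

Every run is closest to the kibibyte interpretation, and the leftover is the same in each run, which is consistent with a fixed overhead (stack, binary image, libc) on top of the allocation.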
Versions on my system:
> gcc --version
gcc (GCC) 6.1.0
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> /usr/bin/time --version
GNU time 1.7
> lsb_release -a
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.10 (Final)
Release: 6.10
Codename: Final
I am using two graphics cards. On the GeForce GTX 980 (4 GB), where I compute my neural network, the utilization keeps jumping from 0% to 99% and from 99% back to 0% (repeatedly) at the last line of the pasted shell output.
After around 90 seconds it finished the first calculation. I feed my images into the neural network one after another (in a for loop). The following calculations only need 20 seconds (3 epochs), and the GPU utilization stays between 96% and 100%.
Why is it jumping at the beginning?
I use the flag:
config = tf.ConfigProto()  # TensorFlow 1.x session configuration
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
    ...  # training loop runs here
Can I be sure that it is really using no less memory than nvidia-smi -lms 50 is showing me?
2017-08-10 16:33:24.836084: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-10 16:33:24.836100: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-10 16:33:25.052501: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-08-10 16:33:25.052861: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 980
major: 5 minor: 2 memoryClockRate (GHz) 1.2155
pciBusID 0000:03:00.0
Total memory: 3.94GiB
Free memory: 3.87GiB
2017-08-10 16:33:25.187760: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x8532640 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-08-10 16:33:25.188006: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-08-10 16:33:25.188291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties:
name: GeForce GT 730
major: 3 minor: 5 memoryClockRate (GHz) 0.9015
pciBusID 0000:02:00.0
Total memory: 1.95GiB
Free memory: 1.45GiB
2017-08-10 16:33:25.188312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 0 and 1
2017-08-10 16:33:25.188319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 1 and 0
2017-08-10 16:33:25.188329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1
2017-08-10 16:33:25.188335: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y N
2017-08-10 16:33:25.188339: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1: N Y
2017-08-10 16:33:25.188348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980, pci bus id: 0000:03:00.0)
Epoche: 0001 cost= 0.620101001 time= 115.366318226
Epoche: 0004 cost= 0.335480299 time= 19.4528050423
I am trying to extract GPS metadata using exiftool:
$ exiftool input.mov (attached)
which is able to display the GPS data.
However, when I open the file with some apps (e.g. from http://www.registratorviewer.com), the GPS data is not displayed.
Perhaps there is another way. I would like to ask whether anyone knows how to extract the GPS data (if needed) in order to display the location on a map while playing the movie.
Here is the metadata:
ExifTool Version Number : 9.85
File Name : input.mov
Directory : .
File Size : 391 MB
File Modification Date/Time : 2015:02:17 21:19:42+07:00
File Access Date/Time : 2015:02:19 17:23:53+07:00
File Inode Change Date/Time : 2015:02:19 17:23:25+07:00
File Permissions : rwxrwxrwx
File Type : MOV
MIME Type : video/quicktime
Major Brand : Apple QuickTime (.MOV/QT)
Minor Version : 0.0.0
Compatible Brands : qt
Movie Data Size : 409646393
Movie Data Offset : 36
Movie Header Version : 0
Create Date : 2015:02:17 14:19:43
Modify Date : 2015:02:17 14:23:00
Time Scale : 600
Duration : 0:03:17
Preferred Rate : 1
Preferred Volume : 100.00%
Preview Time : 0 s
Preview Duration : 0 s
Poster Time : 0 s
Selection Time : 0 s
Selection Duration : 0 s
Current Time : 0 s
Next Track ID : 3
Track Header Version : 0
Track Create Date : 2015:02:17 14:19:43
Track Modify Date : 2015:02:17 14:23:00
Track ID : 1
Track Duration : 0:03:17
Track Layer : 0
Track Volume : 0.00%
Image Width : 1920
Image Height : 1080
Clean Aperture Dimensions : 1920x1080
Production Aperture Dimensions : 1920x1080
Encoded Pixels Dimensions : 1920x1080
Graphics Mode : ditherCopy
Op Color : 32768 32768 32768
Compressor ID : avc1
Source Image Width : 1920
Source Image Height : 1080
X Resolution : 72
Y Resolution : 72
Compressor Name : H.264
Bit Depth : 24
Video Frame Rate : 25.5
Matrix Structure : 1 0 0 0 1 0 0 0 1
Media Header Version : 0
Media Create Date : 2015:02:17 14:19:43
Media Modify Date : 2015:02:17 14:23:00
Media Time Scale : 44100
Media Duration : 0:03:17
Media Language Code : und
Balance : 0
Handler Class : Data Handler
Handler Vendor ID : Apple
Handler Description : Core Media Data Handler
Audio Format : mp4a
Audio Channels : 1
Audio Bits Per Sample : 16
Audio Sample Rate : 44100
Purchase File Format : mp4a
Handler Type : Metadata Tags
Make (tha-TH) : Apple
Creation Date (tha-TH) : 2015:02:17 21:19:43+07:00
GPS Coordinates (tha-TH) : 5 deg 46' 19.20" N, 101 deg 4' 19.92" E, 287 m Above Sea Level
Software (tha-TH) : 8.1.3
Model (tha-TH) : iPhone 5
Make (tha) : Apple
Software Version (tha) : 8.1.3
Content Create Date (tha) : 2015:02:17 21:19:43+07:00
GPS Coordinates (tha) : 5 deg 46' 19.20" N, 101 deg 4' 19.92" E, 287 m Above Sea Level
Model (tha) : iPhone 5
Make : Apple
Creation Date : 2015:02:17 21:19:43+07:00
GPS Coordinates : 5 deg 46' 19.20" N, 101 deg 4' 19.92" E, 287 m Above Sea Level
Software : 8.1.3
Model : iPhone 5
Software Version : 8.1.3
Content Create Date : 2015:02:17 21:19:43+07:00
Avg Bitrate : 16.6 Mbps
GPS Altitude : 287 m
GPS Altitude Ref : Above Sea Level
GPS Latitude : 5 deg 46' 19.20" N
GPS Longitude : 101 deg 4' 19.92" E
GPS Position : 5 deg 46' 19.20" N, 101 deg 4' 19.92" E
Image Size : 1920x1080
Megapixels : 2.1
Rotation : 90
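Most map APIs expect signed decimal degrees rather than the degrees/minutes/seconds strings shown above. As a rough sketch (the function name dms_to_decimal is made up, and the regular expression assumes exactly the "5 deg 46' 19.20" N" layout exiftool printed here), the conversion can be done like this in Python. exiftool can also print numeric values directly when run with its -n option, which would make the parsing step unnecessary.

```python
import re

# Matches strings such as: 5 deg 46' 19.20" N
_DMS = re.compile(r"""(\d+)\s*deg\s*(\d+)'\s*([\d.]+)"\s*([NSEW])""")


def dms_to_decimal(text):
    """Convert a degrees/minutes/seconds string to signed decimal degrees."""
    match = _DMS.search(text)
    if match is None:
        raise ValueError("not a DMS coordinate: %r" % text)
    deg, minutes, seconds, ref = match.groups()
    value = int(deg) + int(minutes) / 60.0 + float(seconds) / 3600.0
    # South and West are negative by convention.
    return -value if ref in "SW" else value


lat = dms_to_decimal("5 deg 46' 19.20\" N")
lon = dms_to_decimal("101 deg 4' 19.92\" E")
print(round(lat, 6), round(lon, 6))
```

Note that this file carries a single position for the whole clip, so it gives one point on the map rather than a track that moves as the movie plays.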
I made an entity for which Quartus successfully infers RAM and instantiates a RAM megafunction. It would be nice if I could initialize that RAM from a file. I found tutorials for making such a file (a .mif file). Now that I have created that file, I don't know how to make Quartus initialize the module from it. Any help is appreciated.
Here is my RAM entity:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity RAM is
port (
clk: in std_logic;
we: in std_logic;
data_in: in std_logic_vector (7 downto 0);
read_addr: in integer range 0 to 65535;
write_addr: in integer range 0 to 65535;
data_out: out std_logic_vector (7 downto 0)
);
end entity RAM;
architecture RAM_arch of RAM is
type memory is array (65535 downto 0) of std_logic_vector (7 downto 0);
signal content: memory;
begin
process(clk)
begin
if (RISING_EDGE(clk)) then
if (we = '1') then
content(write_addr) <= data_in;
end if;
data_out <= content(read_addr);
end if;
end process;
end architecture;
Possibly the best way to initialise the memory is to ... put an initialisation clause on the memory signal. There may be Quartus-specific ways to load .MIF files, but this is probably simpler, definitely more portable (to Xilinx, for example), and more flexible: you get to define the file format, and you don't have to generate .mif files.
Given the following code:
type memory is array (65535 downto 0) of std_logic_vector (7 downto 0);
signal content: memory;
you could simply write
type memory is array (65535 downto 0) of std_logic_vector (7 downto 0);
signal content: memory := init_my_RAM(filename => "ram_contents.txt");
Now it is possible but unlikely that Quartus doesn't support initialisation this way,
so we can test it by writing a simple init_my_ram function ignoring the actual file contents:
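-- needs "use std.textio.all;" in the context clause for the text file type
impure function init_my_ram (filename : string) return memory is
    file f : text;          -- files are declared with "file", not "variable"
    variable m : memory;
begin
    file_open(f, filename, read_mode);
    for i in memory'range loop
        m(i) := X"55";      -- file contents ignored for this test
    end loop;
    file_close(f);
    return m;
end init_my_ram;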
Because the function call is an initialiser, and called at elaboration time when the design is synthesised, this is all synthesisable.
If this compiles and Quartus generates a memory full of X"55", you are good to go, to parse whatever file format you want, in the init_my_ram function. (Binary files are harder, and the reader code may not be so portable between tools, but not impossible).
The .MIF approach has one potential advantage though: you can update just the memory contents without requiring another synthesis/place and route cycle.
One simple way to initialize a RAM area is as follows:
(tested with Quartus 15.1)
(* ram_init_file = "Bm437_IBM_VGA8.mif" *) reg [7:0] Bm437_IBM_VGA8[4096];
Best regards,
Johi.
The simplest method of initializing is to write the .mif with any simple editor such as Notepad. The .mif listing below is for a ROM used as a decoder: a 6-bit address selects a 64-bit one-hot data word, and the upper half of the 128-word ROM is filled with zeros. A .mif can contain any data word size (8, 16, 32, 64 bits, etc.) in either hex or binary. This works all the time. The file must be in the same directory as the project.
WIDTH=64;
DEPTH=128;
ADDRESS_RADIX=HEX;
DATA_RADIX=HEX;
CONTENT BEGIN
000 : 0000000000000001;-- 0
001 : 0000000000000002;-- 1
002 : 0000000000000004;-- 2
003 : 0000000000000008;-- 3
004 : 0000000000000010;-- 4
005 : 0000000000000020;-- 5
006 : 0000000000000040;-- 6
007 : 0000000000000080;-- 7
008 : 0000000000000100;-- 8
009 : 0000000000000200;-- 9
00A : 0000000000000400;-- 10
00B : 0000000000000800;-- 11
00C : 0000000000001000;-- 12
00D : 0000000000002000;-- 13
00E : 0000000000004000;-- 14
00F : 0000000000008000;-- 15
010 : 0000000000010000;-- 16
011 : 0000000000020000;-- 17
012 : 0000000000040000;-- 18
013 : 0000000000080000;-- 19
014 : 0000000000100000;-- 20
015 : 0000000000200000;-- 21
016 : 0000000000400000;-- 22
017 : 0000000000800000;-- 23
018 : 0000000001000000;-- 24
019 : 0000000002000000;-- 25
01A : 0000000004000000;-- 26
01B : 0000000008000000;-- 27
01C : 0000000010000000;-- 28
01D : 0000000020000000;-- 29
01E : 0000000040000000;-- 30
01F : 0000000080000000;-- 31
020 : 0000000100000000;-- 32
021 : 0000000200000000;-- 33
022 : 0000000400000000;-- 34
023 : 0000000800000000;-- 35
024 : 0000001000000000;-- 36
025 : 0000002000000000;-- 37
026 : 0000004000000000;-- 38
027 : 0000008000000000;-- 39
028 : 0000010000000000;-- 40
029 : 0000020000000000;-- 41
02A : 0000040000000000;-- 42
02B : 0000080000000000;-- 43
02C : 0000100000000000;-- 44
02D : 0000200000000000;-- 45
02E : 0000400000000000;-- 46
02F : 0000800000000000;-- 47
030 : 0001000000000000;-- 48
031 : 0002000000000000;-- 49
032 : 0004000000000000;-- 50
033 : 0008000000000000;-- 51
034 : 0010000000000000;-- 52
035 : 0020000000000000;-- 53
036 : 0040000000000000;-- 54
037 : 0080000000000000;-- 55
038 : 0100000000000000;-- 56
039 : 0200000000000000;-- 57
03A : 0400000000000000;-- 58
03B : 0800000000000000;-- 59
03C : 1000000000000000;-- 60
03D : 2000000000000000;-- 61
03E : 4000000000000000;-- 62
03F : 8000000000000000;-- 63
[40..7F] : 0000000000000000;
END;
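For larger tables, typing the file by hand gets tedious, so it can be generated instead. The following is a small Python sketch (the function name one_hot_mif and the output file name are made up for illustration) that produces the same listing as above: one-hot words for the first 64 addresses and a zero-filled range for the rest.

```python
def one_hot_mif(width=64, depth=128):
    """Return the text of a .mif file holding a one-hot decoder table.

    Address n (for n < width) holds a word with only bit n set;
    the remaining addresses up to depth - 1 are filled with zeros.
    """
    digits = width // 4  # hex digits per data word
    lines = [
        "WIDTH=%d;" % width,
        "DEPTH=%d;" % depth,
        "ADDRESS_RADIX=HEX;",
        "DATA_RADIX=HEX;",
        "CONTENT BEGIN",
    ]
    for addr in range(width):
        # Zero-padded hex word, followed by the decimal address as a comment.
        lines.append(f"{addr:03X} : {1 << addr:0{digits}X};-- {addr}")
    lines.append("[%X..%X] : %s;" % (width, depth - 1, "0" * digits))
    lines.append("END;")
    return "\n".join(lines) + "\n"


mif_text = one_hot_mif()
```

Writing mif_text to a file next to the project (for example with open("decoder.mif", "w")) gives a file that can be used in the same way as the hand-written one.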
As specified in this document this is the proper way to init memory from file:
signal content: memory;
attribute ram_init_file : string;
attribute ram_init_file of content: signal is "init.mif";
If you generated one of the RAM modules using the wizard but forgot to add a memory initialization file to it you can add one later by doing the following:
Tools > MegaWizard Plug-In Manager > Edit an existing custom megafunction variation > {Select your file} > Next > Mem Init > Yes, use this file for the memory content data > Browse