PC Bottlenecking, FPS Issue

I play VALORANT but only get 60 FPS in it. My PC components:
CPU: AMD Ryzen 5 3400G
Motherboard: ASUS EX-A320M-GAMING
RAM: 16 GB DDR4 Corsair Vengeance 3000 MHz
SSD: Gigabyte 256 GB PCIe M.2
HDD: Seagate 1 TB desktop drive

Related

How to check SM utilization on Nvidia GPU?

I'd like to know if my PyTorch code is fully utilizing the GPU SMs. According to this question, GPU-Util in nvidia-smi only shows the fraction of time during which at least one SM was in use.
I also saw that typing nvidia-smi dmon gives the following table:
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0   132    71     -    58    18     0     0  6800  1830
One would think that sm% is the SM utilization, but I couldn't find any documentation on what sm% actually means. The number given is exactly the same as GPU-Util in nvidia-smi.
Is there any way to check the SM utilization?
On a side note, is there any way to check memory bandwidth utilization?
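For what it's worth, the sm% column in nvidia-smi dmon appears to be the same counter as GPU-Util, i.e. the fraction of time at least one kernel was executing, not per-SM occupancy. A minimal sketch that polls those NVML counters from Python, assuming the pynvml (nvidia-ml-py) bindings are installed; the memory field reflects the fraction of time device memory was being read or written, which is only a rough proxy for bandwidth utilization:

import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)      # GPU index 0

for _ in range(10):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    # util.gpu: % of time at least one kernel was running
    # util.memory: % of time device memory was being read or written
    print(f"gpu={util.gpu}%  mem={util.memory}%")
    time.sleep(0.5)

pynvml.nvmlShutdown()

For genuine per-SM and memory-bandwidth metrics, a profiler such as Nsight Compute or the PyTorch profiler is the more reliable route than these coarse counters.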

In which cases can "MTLDevice newTextureWithDescriptor" fail on iOS?

In my iOS project Metal is used extensively.
Sometimes MTLDevice newTextureWithDescriptor fails to create a texture.
The texture format is valid; it's a small 512x512 RGBA texture and everything is set up correctly.
When I print MTLDevice.currentAllocatedSize >> 20 it is 1396 MB on an iPhone XR (A12 processor).
I've relied heavily on this thread for estimating the maximum runtime RAM: https://stackoverflow.com/a/15200855/2567725
So I believe the maximum allowed RAM for the iPhone XR is 1792 MB.
I guess the texture is not created because RAM has been exhausted.
So the questions are:
Is it true that on the A12 Metal's currentAllocatedSize correlates with CPU memory, and GPU memory is "shared" with the CPU's, so that 1792 MB = GPU + CPU for the iPhone XR?
Is there any way to get a description of the Metal error? (Right now newTextureWithDescriptor just returns nil.)
Is it possible to know how Metal's allocation strategy works, e.g. how low available RAM has to be for MTLDevice newTextureWithDescriptor to return nil?
UPDATE:
In some cases, however, MTLDevice.currentAllocatedSize >> 20 is much smaller, e.g. 14 MB, so I suspect Metal has some state corruption. But how can I check what the reason for the error is?
The debug description of a texture descriptor:
textureDescriptor.pixelFormat = 70
textureDescriptor.width = 512
textureDescriptor.height = 512
textureDescriptor.textureType = 3
textureDescriptor.usage = 23
textureDescriptor.arrayLength = 1000
textureDescriptor.sampleCount = 1
textureDescriptor.mipmapLevelCount = 1
device.currentAllocatedSize >> 20 = 14
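For scale, a rough footprint estimate for that descriptor, assuming pixelFormat 70 maps to MTLPixelFormatRGBA8Unorm (4 bytes per pixel) and textureType 3 to a 2D texture array, puts this single allocation near 1 GB, regardless of what currentAllocatedSize reported before the call:

# Back-of-the-envelope size of the texture described above (assumed enum
# mapping: pixelFormat 70 -> RGBA8Unorm, textureType 3 -> 2D array).
bytes_per_pixel = 4                  # RGBA, 8 bits per channel
width, height = 512, 512
array_length = 1000                  # textureDescriptor.arrayLength
mip_levels = 1

size_bytes = width * height * bytes_per_pixel * array_length * mip_levels
print(size_bytes >> 20, "MB")        # -> 1000 MB, before any driver padding/alignment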

Why are there different data rates for each WLAN amendment?

Why are there different data rates in each WLAN amendment?
For example: 802.11 supports 1 and 2 Mbps,
802.11a supports 6, 9, 12, 18, 24, 36, 48 and 54 Mbps,
802.11b supports 1, 2, 5.5 and 11 Mbps,
etc.
802.11 and 802.11b:
Each data bit is converted into multiple bits of information for protection against errors due to noise or interference. Each of the new coded bits is called a chip. Different data rates use different chipping schemes.
For example:
1 and 2 Mbps use the Barker code.
5.5 and 11 Mbps use Complementary Code Keying (CCK).
Both run at 11 Mchips/s.
The Barker code uses 11 chips per symbol and CCK uses 8 chips per symbol, so the symbol rate is 11,000,000 / 11 = 1 Msps for the Barker code and 11,000,000 / 8 = 1.375 Msps for CCK.
For the Barker code:
DBPSK modulates 1 bit of data per symbol => 1 bit * 1 Msps = 1 Mbps
DQPSK modulates data bits in pairs => 2 bits * 1 Msps = 2 Mbps
For CCK:
4 bits * 1.375 Msps = 5.5 Mbps
8 bits * 1.375 Msps = 11 Mbps
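A quick sketch of that arithmetic, using only the chip rate, chips per symbol and bits per symbol given above:

chip_rate = 11_000_000                    # 11 Mchips/s for both Barker and CCK

barker_symbol_rate = chip_rate / 11       # 11 chips/symbol -> 1 Msps
print(barker_symbol_rate * 1 / 1e6)       # DBPSK, 1 bit/symbol  -> 1.0 Mbps
print(barker_symbol_rate * 2 / 1e6)       # DQPSK, 2 bits/symbol -> 2.0 Mbps

cck_symbol_rate = chip_rate / 8           # 8 chips/symbol -> 1.375 Msps
print(cck_symbol_rate * 4 / 1e6)          # 4 bits/symbol -> 5.5 Mbps
print(cck_symbol_rate * 8 / 1e6)          # 8 bits/symbol -> 11.0 Mbps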
802.11g (802.11a):
This standard uses the OFDM modulation scheme. Look at the modulation types:
BPSK (1 bit per symbol) => maximum uncoded rate is 12 Mbps
QPSK (2 bits per symbol) => maximum uncoded rate is 24 Mbps
16-QAM (4 bits per symbol) => maximum uncoded rate is 48 Mbps
64-QAM (6 bits per symbol) => maximum uncoded rate is 72 Mbps
These raw rates are then multiplied by the coding rate used for error correction:
BPSK 1/2 => 6 Mbps
BPSK 3/4 => 9 Mbps
QPSK 1/2 => 12 Mbps
and so on.
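The same arithmetic for the OFDM rates; the uncoded 12 Mbps BPSK figure comes from 48 data subcarriers at 250 k symbols/s each (a 4 µs OFDM symbol), and each standard rate is that raw rate times the coding rate:

data_subcarriers = 48
subcarrier_symbol_rate = 250_000          # symbols/s per subcarrier (1 symbol per 4 us)

# (modulation, bits per subcarrier symbol, coding rate) pairs used by 802.11a/g
modes = [
    ("BPSK",   1, 1, 2), ("BPSK",   1, 3, 4),
    ("QPSK",   2, 1, 2), ("QPSK",   2, 3, 4),
    ("16-QAM", 4, 1, 2), ("16-QAM", 4, 3, 4),
    ("64-QAM", 6, 2, 3), ("64-QAM", 6, 3, 4),
]
for name, bits, num, den in modes:
    raw = data_subcarriers * subcarrier_symbol_rate * bits   # uncoded bit rate
    print(f"{name} {num}/{den}: {raw * num / den / 1e6:g} Mbps")
    # prints 6, 9, 12, 18, 24, 36, 48 and 54 Mbps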

Why is the GPU usage bouncing from 0 to 99% (Volatile GPU-Util, nvidia-smi) in TensorFlow?

I am using two graphics cards, and the GeForce GTX 980 with 4 GB, on which I compute my neural network, keeps jumping from 0% to 99% and back again (repeating) at the last line of the pasted shell output.
The first calculation takes around 90 seconds. I feed my images into the neural network one after another (in a for loop). The following calculations only need about 20 seconds (3 epochs), and the GPU stays between 96 and 100%.
Why is it jumping at the beginning?
I use the flag:
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
Can I be sure that it really isn't using less memory than nvidia-smi -lms 50 shows me? (A minimal session-setup sketch follows the log below.)
2017-08-10 16:33:24.836084: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-10 16:33:24.836100: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-10 16:33:25.052501: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-08-10 16:33:25.052861: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 980
major: 5 minor: 2 memoryClockRate (GHz) 1.2155
pciBusID 0000:03:00.0
Total memory: 3.94GiB
Free memory: 3.87GiB
2017-08-10 16:33:25.187760: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x8532640 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-08-10 16:33:25.188006: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-08-10 16:33:25.188291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties:
name: GeForce GT 730
major: 3 minor: 5 memoryClockRate (GHz) 0.9015
pciBusID 0000:02:00.0
Total memory: 1.95GiB
Free memory: 1.45GiB
2017-08-10 16:33:25.188312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 0 and 1
2017-08-10 16:33:25.188319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 1 and 0
2017-08-10 16:33:25.188329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1
2017-08-10 16:33:25.188335: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y N
2017-08-10 16:33:25.188339: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1: N Y
2017-08-10 16:33:25.188348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980, pci bus id: 0000:03:00.0)
Epoche: 0001 cost= 0.620101001 time= 115.366318226
Epoche: 0004 cost= 0.335480299 time= 19.4528050423
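A minimal sketch of the session setup, assuming TF 1.x, that lets the allocator grow on demand instead of reserving all GPU memory up front and (optionally) restricts TensorFlow to the GTX 980, which is device 0 in the log above:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True          # allocate GPU memory on demand
config.gpu_options.visible_device_list = "0"    # hypothetical choice: expose only the GTX 980

# ... build the graph here ...

with tf.Session(config=config) as sess:
    # ... run the training loop here ...
    pass

Note that with allow_growth the allocator never returns memory to the driver, so the figure nvidia-smi reports is essentially the high-water mark of what TensorFlow has reserved so far, not the instantaneous working set.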

How many address bits are required to address 32 Mbyte of byte-addressable memory?

I found this question in one of my previous exam papers and I am not really sure I got the right answer. As far as I can see, 2^15 is 32768, which is 32 MB, so the answer could be 15 bits. But I think I'm missing something here?
32768 bytes is not 32 MB.
32 MB = 32 * 1024 KB = 32 * 1024 * 1024 bytes = 2^5 * 2^10 * 2^10 = 2^25 bytes
That is, 33,554,432 bytes = 32 MB.
So you will need at least 25 bits to address a single byte in that memory scheme.
Yes, you're off by a few orders of magnitude: 32768 is not 32 MB.
1 M is 2^20 and 32 is 2^5, so you need 25 bits.
Since 1 MB = 2^20 bytes, for 32 MB we have:
32 = 2^5
1 MB = 2^20 bytes, so
32 MB = 2^5 * 2^20 = 2^25 bytes.
The question asks "How many address bits...", and because the memory is byte-addressable each address selects exactly one byte, so no factor of 8 is involved (that would only apply to bit-addressable memory).
Thus, 25 bits are needed.
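A one-line check of that arithmetic; the memory is byte-addressable, so each address selects one byte and no factor of 8 enters:

import math

mem_bytes = 32 * 1024 * 1024                    # 32 MB, byte-addressable
print(math.ceil(math.log2(mem_bytes)))          # -> 25 address bits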

Resources