EDIT: Based on additional experimentation, I'm fairly confident the slow-down occurs in response to many calls to the file open and close routines (h5open and close). I'm a bit short on time right now, but will come back and add more code/detail in the next few days.
Using the HDF5 package for Julia, I've noticed that the h5write function starts to slow down if one performs many iterations over calls to h5write and h5read. Interestingly, it appears that for the behaviour to be really obvious, one should be reading and writing to a large number of locations in a small number of files. A demonstration of the behaviour I'm talking about can be obtained by starting a Julia session and running the following subroutine (note, you will need the HDF5 package):
# Set parameters
numFile = 10;
numLocation = 10000;
writeDir = "/home/colin/Temp/";
FloatOut = 5.5;
# Import functions
using HDF5
# Loop over read/writes.
c1 = 1;
timeMat = Array(Float64, numFile * 2, 2);
for i in 1:numFile
    filePath = string(writeDir, "f", string(i), ".h5");
    for j in 1:numLocation
        location = string("G1/L", string(j));
        if j == 1 || j == numLocation; tic(); end;
        h5write(filePath, location, FloatOut);
        if j == 1 || j == numLocation; timeMat[c1, 1] = toc(); end;
        if j == 1 || j == numLocation; tic(); end;
        FloatIn = h5read(filePath, location);
        if j == 1 || j == numLocation; timeMat[c1, 2] = toc(); end;
        if j == 1 || j == numLocation; c1 = c1 + 1; end;
    end
    rm(filePath);
end
This code writes the floating-point number 5.5 (chosen for no particular reason) to 10,000 locations in each of 10 files using h5write. Immediately after each write, the number is read back in using h5read. For each file, I store in timeMat the time taken to perform the write and the read at the first and the last location (note: initially I stored the timing for every call, but that level of detail is unnecessary to demonstrate the anomaly for the purposes of this question). The times are printed below:
h5write h5read
0.0007 0.0004
0.0020 0.0004
0.0020 0.0004
0.0031 0.0004
0.0034 0.0004
0.0049 0.0004
0.0050 0.0004
0.0064 0.0005
0.0068 0.0004
0.0082 0.0004
0.0084 0.0005
0.0106 0.0005
0.0114 0.0005
0.0114 0.0005
0.0120 0.0005
0.0131 0.0005
0.0135 0.0005
0.0146 0.0005
0.0151 0.0005
0.0163 0.0005
The timings for h5read are fairly consistent across the run. However, the timings for h5write gradually get slower: by the end, a write is taking an order of magnitude longer than at the start. I understand that, within a given file, a write (and read) might get slightly slower as the number of locations grows. But in this case, the slower performance persists even after we begin a new file. Perhaps strangest of all, we can run the subroutine a second time and the time taken for a write will pick up where it left off on the previous run. The only way to get writes back to their fastest speed is to completely restart Julia.
Final disclaimer: I am brand new to both Julia and hdf5, so I may have done something stupid or be overlooking something obvious.
The slowdown is indeed curious; profiling shows that almost all the time is spent in the close function, which is basically just a ccall. This suggests it may be a problem with the HDF5 C-library itself.
I think you'll be rather happier with the performance if you don't open and close the file each time you write a variable; instead, access the file through a file object. Here's an example:
filePath = string(writeDir, "f", string(i), ".h5")
h5open(filePath, "w") do file
    global c1
    for j in 1:numLocation
        ...
        write(file, location, FloatOut)
        ...
        FloatIn = read(file, location)
        ...
    end
end
This way you're leaving the file open throughout the test. On my machine this is something like 100x faster.
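For reference, here is a minimal self-contained sketch of that open-once pattern (timing bookkeeping omitted; the directory, file names and FloatOut value simply mirror the question's snippet):
using HDF5

writeDir = "/home/colin/Temp/"
FloatOut = 5.5
for i in 1:10
    filePath = string(writeDir, "f", string(i), ".h5")
    h5open(filePath, "w") do file             # one open/close per file
        for j in 1:10000
            location = string("G1/L", string(j))
            write(file, location, FloatOut)   # write through the open file object
            FloatIn = read(file, location)    # read back through the same object
        end
    end
    rm(filePath)
end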
If you want to pursue this further, please submit an issue.
I have a problem where I need to do a linear interpolation on some data as it is acquired from a sensor (it's technically position data, but the nature of the data doesn't really matter). I'm doing this now in matlab, but since I will eventually migrate this code to other languages, I want to keep the code as simple as possible and not use any complicated matlab-specific/built-in functions.
My implementation initially seems OK, but when checking my work against matlab's built-in interp1 function, it seems my implementation isn't perfect, and I have no idea why. Below is the code I'm using on a dataset already fully collected, but as I loop through the data, I act as if I only have the current sample and the previous sample, which mirrors the problem I will eventually face.
%make some dummy data
np = 109; %number of data points for x and y
x_data = linspace(3,98,np) + (normrnd(0.4,0.2,[1,np]));
y_data = normrnd(2.5, 1.5, [1,np]);
%define the query points the data will be interpolated over
qp = [1:100];
kk = 2;       %indexes through the data
cc = 1;       %indexes through the query points
qpi = qp(cc); %qpi is the current query point in the loop
y_interp = qp*nan; %this will hold our solution
while kk<=length(x_data)
    kk = kk+1; %update the data counter
    %perform online interpolation
    if cc<length(qp)-1
        if qpi>=y_data(kk-1) %the query point, of course, has to be in-between the current value and the next value of x_data
            y_interp(cc) = myInterp(x_data(kk-1), x_data(kk), y_data(kk-1), y_data(kk), qpi);
        end
        if qpi>x_data(kk) %if the current query point is already larger than the current sample, update the sample
            kk = kk+1;
        else %otherwise, update the query point to ensure it's in between the samples for the next iteration
            cc = cc + 1;
            qpi = qp(cc);
            %It is possible that if the change in x_data is greater than the resolution of the query
            %points, an update like the above won't work. In this case, we must lag the data
            if qpi<x_data(kk)
                kk = kk-1;
            end
        end
    end
end
%get the correct interpolation
y_interp_correct = interp1(x_data, y_data, qp);
%plot both solutions to show the difference
figure;
plot(y_interp,'displayname','manual-solution'); hold on;
plot(y_interp_correct,'k--','displayname','matlab solution');
leg1 = legend('show');
set(leg1,'Location','Best');
ylabel('interpolated points');
xlabel('query points');
Note that the "myInterp" function is as follows:
function yi = myInterp(x1, x2, y1, y2, qp)
%linearly interpolate the function value y(x) over the query point qp
yi = y1 + (qp-x1) * ( (y2-y1)/(x2-x1) );
end
And here is the plot showing that my implementation isn't correct :-(
Can anyone help me find where the mistake is? And why? I suspect it has something to do with ensuring that the query point is in-between the previous and current x-samples, but I'm not sure.
The problem in your code is that you at times call myInterp with a value of qpi that is outside of the bounds x_data(kk-1) and x_data(kk). This leads to invalid extrapolation results.
Your logic of looping over kk rather than cc is very confusing to me. I would write a simple for loop over cc, which indexes the points at which you want to interpolate. For each of these points, advance kk, if necessary, such that qp(cc) is in between x_data(kk) and x_data(kk+1) (you can use kk-1 and kk instead if you prefer; just initialize kk=2 to ensure that kk-1 exists. I just find starting at kk=1 more intuitive).
To simplify the logic here, I'm limiting the values in qp to be inside the limits of x_data, so that we don't need to test to ensure that x_data(kk+1) exists, nor that x_data(1)<qp(cc). You can add those tests in if you wish.
Here's my code:
qp = [ceil(x_data(1)+0.1):floor(x_data(end)-0.1)];
y_interp = qp*nan; % this will hold our solution
kk = 1;            % indexes through the data
for cc = 1:numel(qp)
    % advance kk to where we can interpolate
    % (this loop is guaranteed to not index out of bounds because x_data(end)>qp(end),
    % but needs to be adjusted if this is not ensured prior to the loop)
    while x_data(kk+1) < qp(cc)
        kk = kk + 1;
    end
    % perform online interpolation
    y_interp(cc) = myInterp(x_data(kk), x_data(kk+1), y_data(kk), y_data(kk+1), qp(cc));
end
As you can see, the logic is a lot simpler this way. The result is identical to y_interp_correct. The inner while x_data... loop serves the same purpose as your outer while loop, and would be the place where you read your data from wherever it's coming from.
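If you do want to allow query points outside the range of x_data, a hedged sketch of the extra guards (not part of the answer above) might look like this; out-of-range points are simply left as NaN, which matches interp1's default behaviour:
kk = 1;
y_interp = qp*nan;
for cc = 1:numel(qp)
    % skip query points that fall outside the sampled range of x_data
    if qp(cc) < x_data(1) || qp(cc) > x_data(end)
        continue;  % y_interp(cc) stays NaN
    end
    while x_data(kk+1) < qp(cc)
        kk = kk + 1;
    end
    y_interp(cc) = myInterp(x_data(kk), x_data(kk+1), y_data(kk), y_data(kk+1), qp(cc));
end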
I am working on programming a Markov chain in Lua, and one element of this requires me to uniformly generate random numbers. Here is a simplified example to illustrate my question:
example = function(x)
    local r = math.random(1,10)
    print(r)
    return x[r]
end
exampleArray = {"a","b","c","d","e","f","g","h","i","j"}
print(example(exampleArray))
My issue is that when I re-run this program multiple times (mash F5), the exact same random number is generated, resulting in the example function selecting the exact same array element. However, if I include many calls to the example function within a single program by repeating the print line at the end many times, I get suitably random results.
This is not my intention as a proper Markov pseudo-random text generator should be able to run the same program with the same inputs multiple times and output different pseudo-random text every time. I have tried resetting the seed using math.randomseed(os.time()) and this makes it so the random number distribution is no longer uniform. My goal is to be able to re-run the above program and receive a randomly selected number every time.
You need to run math.randomseed() once before using math.random(), like this:
math.randomseed(os.time())
From your comment, it sounds like the first number is still the same. This is caused by the implementation of the random generator on some platforms.
The solution is to pop some random numbers before using them for real:
math.randomseed(os.time())
math.random(); math.random(); math.random()
Note that the standard C library random() is usually not that uniformly random; a better solution is to use a better random generator if your platform provides one.
Reference: Lua Math Library
The standard C random number generator used in Lua isn't guaranteed to be good for simulation. The words "Markov chain" suggest that you may need a better one. Here's a generator widely used for Monte-Carlo calculations:
local A1, A2 = 727595, 798405            -- 5^17=D20*A1+A2
local D20, D40 = 1048576, 1099511627776  -- 2^20, 2^40
local X1, X2 = 0, 1

function rand()
    local U = X2*A2
    local V = (X1*A2 + X2*A1) % D20
    V = (V*D20 + U) % D40
    X1 = math.floor(V/D20)
    X2 = V - X1*D20
    return V/D40
end
It generates a number between 0 and 1, so r = math.floor(rand()*10) + 1 would go into your example.
(That's a multiplicative random number generator with period 2^38, multiplier 5^17 and modulus 2^40; original Pascal code by http://osmf.sscc.ru/~smp/)
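For instance, a hypothetical adaptation of the question's example to this generator (not part of the original answer) could look like:
-- hypothetical adaptation: use rand() instead of math.random
example = function(x)
    local r = math.floor(rand()*#x) + 1  -- uniform index in 1..#x
    print(r)
    return x[r]
end

exampleArray = {"a","b","c","d","e","f","g","h","i","j"}
print(example(exampleArray))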
math.randomseed(os.clock()*100000000000)
for i = 1, 3 do
    math.random(10000, 65000)
end
This always results in new random numbers; changing the seed value ensures randomness. Don't rely on os.time(), because it is the epoch time and only changes once per second, whereas os.clock() won't have the same value at any two close instants.
There's also the luaossl library solution (https://github.com/wahern/luaossl):
local rand = require "openssl.rand"

local randominteger
if rand.ready() then -- rand has been properly seeded
    -- rand.uniform(n) returns a cryptographically strong uniform random integer in the interval [0, n-1]
    randominteger = rand.uniform(99) + 1 -- a random integer in the range 1 to 99
end
http://25thandclement.com/~william/projects/luaossl.pdf
This is my code:
def is_prime(i)
  j = 2
  while j < i do
    if i % j == 0
      return false
    end
    j += 1
  end
  true
end

i = (600851475143 / 2)
while i >= 0 do
  if (600851475143 % i == 0) && (is_prime(i) == true)
    largest_prime = i
    break
  end
  i -= 1
end
puts largest_prime
Why is it not returning anything? Is it too large of a calculation going through all the numbers? Is there a simple way of doing it without utilizing the Ruby prime library (which defeats the purpose)?
All the solutions I found online were too advanced for me; does anyone have a solution that a beginner would be able to understand?
"premature optimization is (the root of all) evil". :)
Here you go right away for the (1) biggest, (2) prime factor. How about finding all the factors, prime or not, and then taking the last (biggest) of them that is prime? Once we solve that, we can start optimizing it.
A factor a of a number n is such that there exists some b (we assume a <= b to avoid duplication) with a * b = n. But that means that for a <= b we also have a*a <= a*b == n.
So, for each b = n/2, n/2-1, ..., the potential corresponding factor is known automatically as a = n / b; there's no need to test a for divisibility at all ... and perhaps you can figure out which of the as don't have to be tested for primality as well.
Lastly, if p is the smallest prime factor of n, then the prime factors of n are p and all the prime factors of n / p. Right?
Now you can complete the task.
update: you can find more discussion and a pseudocode of sorts here. Also, search for "600851475143" here on Stack Overflow.
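If you get stuck, here is a minimal sketch of that last observation (repeatedly dividing out the smallest factor); this is just one possible way to finish, not necessarily the intended one:
# strip out each smallest factor; whatever remains once no factor
# at or below sqrt(n) divides it is the largest prime factor
def largest_prime_factor(n)
  f = 2
  while f * f <= n
    if n % f == 0
      n /= f   # the prime factors of n are f plus the prime factors of n / f
    else
      f += 1
    end
  end
  n
end

puts largest_prime_factor(600851475143)  # => 6857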
I'll address not so much the answer, but how YOU can pursue the answer.
The most elegant troubleshooting approach is to use a debugger to get insight as to what is actually happening: How do I debug Ruby scripts?
That said, I rarely use a debugger -- I just stick in puts here and there to see what's going on.
Start with adding puts "testing #{i}" as the first line inside the loop. While the screen I/O will be a million times slower than a silent calculation, it will at least give you confidence that it's doing what you think it's doing, and perhaps some insight into how long the whole problem will take. Or it may reveal an error, such as the counter not changing, incrementing in the wrong direction, overshooting the break conditional, etc. Basic sanity check stuff.
If that doesn't set off a lightbulb, go deeper and puts inside the if statement. No revelations yet? Next puts inside is_prime(), then inside is_prime()'s loop. You get the idea.
Also, there's no reason in the world to start with 600851475143 during development! 17, 51, 100 and 1024 will work just as well. (And don't forget edge cases like 0, 1, 2, -1 and such, just for fun.) These will all complete before your finger is off the enter key -- or demonstrate that your algorithm truly never returns and send you back to the drawing board.
Use these two approaches and I'm sure you'll find your answers in a minute or two. Good luck!
Do you know you can solve this with essentially one line of code in Ruby (after requiring the prime standard library)?
require 'prime'
Prime.prime_division(600851475143).flatten.max
=> 6857
I'm getting ready to release a tool that is only effective with regular hard drives, not SSD (solid state drive). In fact, it shouldn't be used with SSD's because it will result in a lot of read/writes with no real effectiveness.
Anyone knows of a way of detecting if a given drive is solid-state?
Finally a reliable solution! Two of them, actually!
Check /sys/block/sdX/queue/rotational, where sdX is the drive name. If it's 0, you're dealing with an SSD, and 1 means plain old HDD.
I can't put my finger on the Linux version where it was introduced, but it's present in Ubuntu's Linux 3.2 and in vanilla Linux 3.6 and not present in vanilla 2.6.38. Oracle also backported it to their Unbreakable Enterprise kernel 5.5, which is based on 2.6.32.
There's also an ioctl to check if the drive is rotational since Linux 3.3, introduced by this commit. Using sysfs is usually more convenient, though.
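For example, here is a small sketch in Python that reads that sysfs flag (it assumes Linux and a whole-disk device name such as sda, not a partition):
# read the sysfs "rotational" flag; "1" = spinning disk, "0" = SSD
def is_rotational(device="sda"):
    with open("/sys/block/{}/queue/rotational".format(device)) as f:
        return f.read().strip() == "1"

print("SSD" if not is_rotational("sda") else "rotational disk")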
You can actually fairly easily determine the rotational latency -- I did this once as part of a university project. It is described in this report. You'll want to skip to page 7, where you see some nice graphs of the latency. It goes from about 9.3 ms to 1.1 ms -- a drop of 8.2 ms. That corresponds directly to 60 s / 8.2 ms = 7317 RPM.
It was done with simple C code -- here's the part that measures the latency between positions a and b in a scratch file. We did this with larger and larger b values until we had wandered all the way around a cylinder:
/* Measure the difference in access time between a and b. The result
 * is measured in nanoseconds. */
int measure_latency(off_t a, off_t b) {
    cycles_t ta, tb;

    overflow_disk_buffer();

    lseek(work_file, a, SEEK_SET);
    read(work_file, buf, KiB/2);
    ta = get_cycles();

    lseek(work_file, b, SEEK_SET);
    read(work_file, buf, KiB/2);
    tb = get_cycles();

    int diff = (tb - ta)/cycles_per_ns;
    fprintf(stderr, "%i KiB to %i KiB: %i nsec\n", a / KiB, b / KiB, diff);
    return diff;
}
The command lsblk -d -o name,rota lists your drives and shows a 1 under ROTA if it's a rotational disk and a 0 if it's an SSD.
Example output:
NAME ROTA
sda 1
sdb 0
Detecting SSDs is not as impossible as dseifert makes out. There is already some progress in Linux's libata (http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg03625.html), though it doesn't seem user-ready yet.
And I definitely understand why this needs to be done. It's basically the difference between a linked list and an array. Defragmentation and such is usually counter-productive on an SSD.
You could get lucky by running
smartctl -i sda
from Smartmontools. Almost all SSDs have SSD in the Model field. No guarantee, though.
My two cents on answering this old but very important question... If a disk is accessed via SCSI, then you will (potentially) be able to use the SCSI INQUIRY command to request its rotational rate. The VPD (Vital Product Data) page for that is called Block Device Characteristics and has page code 0xB1. Bytes 4 and 5 of this page contain a number with the following meaning:
0000h "Medium rotation rate is not reported"
0001h "Non-rotating medium (e.g., solid state)"
0002h - 0400h "Reserved"
0401h - FFFEh "Nominal medium rotation rate in rotations per minute (i.e.,
rpm) (e.g., 7 200 rpm = 1C20h, 10 000 rpm = 2710h, and 15 000 rpm = 3A98h)"
FFFFh "Reserved"
So, an SSD must have 0001h in this field. The T10.org document about this page can be found here.
However, the implementation status of this standard is not clear to me.
I wrote the following JavaScript code. I needed to determine whether the machine was using an SSD drive and whether it was the boot drive. The solution uses the MSFT_PhysicalDisk WMI interface.
function main()
{
    var retval = false;

    // MediaType - 0 Unknown, 3 HDD, 4 SSD
    // SpindleSpeed - -1 has rotational speed, 0 has no rotational speed (SSD)
    // DeviceID - 0 boot device
    var objWMIService = GetObject("winmgmts:\\\\.\\root\\Microsoft\\Windows\\Storage");
    var colItems = objWMIService.ExecQuery("select * from MSFT_PhysicalDisk");
    var enumItems = new Enumerator(colItems);

    for (; !enumItems.atEnd(); enumItems.moveNext())
    {
        var objItem = enumItems.item();
        if (objItem.MediaType == 4 && objItem.SpindleSpeed == 0)
        {
            if (objItem.DeviceID == 0)
            {
                retval = true;
            }
        }
    }

    if (retval)
    {
        WScript.Echo("You have SSD Drive and it is your boot drive.");
    }
    else
    {
        WScript.Echo("You do not have SSD Drive");
    }
    return retval;
}

main();
SSD devices emulate a hard disk device interface, so they can just be used like hard disks. This also means that there is no general way to detect what they are.
You probably could use some characteristics of the drive (latency, speed, size), though this won't be accurate for all drives. Another possibility may be to look at the S.M.A.R.T. data and see whether you can determine the type of disk through this (by model name, certain values), however unless you keep a database of all drives out there, this is not gonna be 100% accurate either.
Write a text file, read it back, and repeat 10,000 times; then compute 10000/elapsed. For an SSD the resulting rate will be much higher. Python 3:
import time

def ssd_test():
    doc = 'ssd_test.txt'
    start = time.time()
    for i in range(10000):
        with open(doc, 'w+') as f:
            f.write('ssd test')
        with open(doc, 'r') as f:
            ret = f.read()
    stop = time.time()
    elapsed = stop - start
    ios = int(10000/elapsed)
    hd = 'HDD'
    if ios > 6000:  # ssd > 8000; hdd < 4000
        hd = 'SSD'
    print('detecting hard drive type by read/write speed')
    print('ios', ios, 'hard drive type', hd)
    return hd
I have 30,000 files to process; each file has 80,000 x 5 lines. I need to read all the files and process them, finding the average of each line. I have written the code to read and extract all the data from the files; my code is in Fortran. There is an array of size (30000 x 80000), but my program could not get beyond (3300 x 80000). I need to add the 4th column of each file in steps of 300 files, i.e. the 4th column of the 1st file with the 4th column of the 301st file, the 4th column of the 2nd file with the 4th column of the 302nd file, and so on. Do you think this is because of a limit on the size of array that Fortran can handle? If so, is there any way to increase it? What about the number of files? My code looks like this:
This program runs well.
      implicit double precision (a-h,o-z),integer(i-n)
      dimension x(78805,5),y(78805,5),den(78805,5)
      dimension b(3300,78805),bb(78805)
      character*70,fn
      nf = 3300      ! NUMBER OF FILES
      nj = 78804     ! Number of rows in file.
      ns = 300       ! No. of steps for files.
      ncores = 11    ! No of Cores
c--------------------------------------------------------------------
c--------------------------------------------------------------------
      !Initialization
      do i = 0,nf
        do j = 1, nj
          x(j,1) = 0.0
          y(j,2) = 0.0
          den(j,4) = 0.0
c         a(i,j) = 0.0
          b(i,j) = 0.0
c         aa(j) = 0.0
          bb(j) = 0.0
        end do
      end do
c-------!Body program-----------------------------------------------
      iout = 6       ! Output Files upto "ns" no.
      DO i= 1,nf     ! LOOP FOR THE NUMBER OF FILES
        write(fn,10)i
        open(1,file=fn)
        do j=1,nj    ! Loop for the no of rows in the domain
          read(1,*)x(j,1),y(j,2),den(j,4)
          if(i.le.ns) then
c           a(i,j) = prob(j,3)
            b(i,j) = den(j,4)
          else
c           a(i,j) = prob(j,3) + a(i-ns,j)
            b(i,j) = den(j,4) + b(i-ns,j)
          end if
        end do
        close(1)
c ----------------------------------------------------------
c -----Write Out put [Probability and density matrix]-------
c ----------------------------------------------------------
        if(i.ge.(nf-ns)) then
          do j = 1, nj
c           aa(j) = a(i,j)/(ncores*1.0)
            bb(j) = b(i,j)/(ncores*1.0)
            write(iout,*) int(x(j,1)),int(y(j,2)),bb(j)
          end do
          close(iout)
          iout = iout + 1
        end if
      END DO
   10 format(i0,'.txt')
      END
It's hard to say for sure because you haven't given all the details yet, but your problem is quite possibly that you are using a 32-bit compiler producing 32-bit executables, and you are simply running out of address space.
Although your operating system supports a 64-bit address space, your 32-bit process is still limited to 32-bit addresses.
You have found a limit at 3300*78805*8, which is just under 2 GB, and this supports my theory.
No matter what is the cause of your immediate problem, your fundamental problem is that you appear to be loading everything into memory at once. I've not closely studied your algorithm but on first inspection it seems likely that you could re-arrange it to avoid having everything in memory at once.
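For illustration only, here is a minimal sketch (free-form Fortran, hypothetical variable names, not the poster's code) of one such re-arrangement: since b(i,j) only ever refers back to b(i-ns,j), a rolling buffer of ns rows is enough to hold the running sums, so memory no longer grows with the total number of files:
program rolling_sums
  implicit none
  integer, parameter :: nf = 3300, nj = 78804, ns = 300
  real(8), allocatable :: bsum(:,:)   ! ns x nj instead of nf x nj
  real(8) :: xv, yv, dv
  character(len=70) :: fn
  integer :: i, j, k

  allocate(bsum(ns, nj))
  bsum = 0.0d0

  do i = 1, nf
     write(fn,'(i0,a)') i, '.txt'
     open(1, file=fn)
     k = mod(i-1, ns) + 1             ! files i, i+ns, i+2*ns, ... accumulate into slot k
     do j = 1, nj
        read(1,*) xv, yv, dv
        bsum(k, j) = bsum(k, j) + dv  ! running sum of the den column, as in the original read
     end do
     close(1)
     ! ... for i >= nf-ns, write bsum(k,:) out here, as in the original output block ...
  end do
end program rolling_sums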