Console Print Speed - dart

I’ve been looking at a few example programs in order to find better ways to code with Dart.
This example (below) is not of any particular importance in itself; it is taken from rosettacode.org, with alterations by me to (hopefully) bring it up to date.
The point of this posting concerns benchmarks, and what may hurt Dart's results in some benchmarks: the speed of printing to the console compared with other languages. I don't know how the comparison works out against other languages, but in Dart the console output (at least on Windows) appears to be quite slow, even when using a StringBuffer.
As an aside, in my test, if n1 is allowed to grow to 11 the total recursion count exceeds 238 million, and Example 1 takes about 2.9 seconds to run on my laptop.
Also of possible interest: if the String assignment is changed to an int, with no printing, no elapsed time is recorded at all (Example 2).
Typical times on my low-spec laptop (run from the console on Windows):
Elapsed Microseconds (Print) = 26002
Elapsed Microseconds (StringBuffer) = 9000
Elapsed Microseconds (no Printing) = 3000
Obviously, in this case console print time is a significant factor relative to the computation time.
So, can anyone advise how this compares to, e.g., Java times for console output? That would at least indicate whether Dart is particularly slow in this area, which may be relevant to some benchmarks. Incidentally, runs in the Dart Editor incur a negligible penalty for printing.
// Example 1. The base code for the test (Ackermann).
main() {
  for (int m1 = 0; m1 <= 3; ++m1) {
    for (int n1 = 0; n1 <= 4; ++n1) {
      print("Acker(${m1}, ${n1}) = ${fAcker(m1, n1)}");
    }
  }
}

int fAcker(int m2, int n2) => m2 == 0 ? n2 + 1
    : n2 == 0 ? fAcker(m2 - 1, 1)
    : fAcker(m2 - 1, fAcker(m2, n2 - 1));
// Example 2. The altered code for the test.
main() {
  fRunAcker(1); // print
  fRunAcker(2); // StringBuffer
  fRunAcker(3); // no printing
}

void fRunAcker(int iType) {
  String sResult;
  StringBuffer sb1;
  Stopwatch oStopwatch = new Stopwatch();
  oStopwatch.start();
  List lType = ["Print", "StringBuffer", "no Printing"];
  if (iType == 2) // Use StringBuffer
    sb1 = new StringBuffer();
  for (int m1 = 0; m1 <= 3; ++m1) {
    for (int n1 = 0; n1 <= 4; ++n1) {
      if (iType == 1) // print
        print("Acker(${m1}, ${n1}) = ${fAcker(m1, n1)}");
      if (iType == 2) // StringBuffer
        sb1.write("Acker(${m1}, ${n1}) = ${fAcker(m1, n1)}\n");
      if (iType == 3) // no printing
        sResult = "Acker(${m1}, ${n1}) = ${fAcker(m1, n1)}\n";
    }
  }
  if (iType == 2)
    print(sb1.toString());
  oStopwatch.stop();
  print("Elapsed Microseconds (${lType[iType - 1]}) = " +
      "${oStopwatch.elapsedMicroseconds}");
}

int fAcker(int m2, int n2) => m2 == 0 ? n2 + 1
    : n2 == 0 ? fAcker(m2 - 1, 1)
    : fAcker(m2 - 1, fAcker(m2, n2 - 1));
//Typical times on my low-spec laptop (run from the console).
// Elapsed Microseconds (Print) = 26002
// Elapsed Microseconds (StringBuffer) = 9000
// Elapsed Microseconds (no Printing) = 3000

I tested using Java, which was an interesting exercise.
The results from this small test indicate that Dart takes about 60% longer than Java for console output, comparing the fastest run of each. I really need to do a larger test with more terminal output, which I will do.
In terms of "computational" speed with no output, using this test with m = 3 and n = 10, the comparison is consistently around 530 milliseconds for Java compared to 580 milliseconds for Dart. That is 59.5 million calls. Java bombs out with n = 11 (238 million calls), which I presume is a stack overflow. I'm not saying this is a definitive benchmark of much, but it is an indication of something. Dart appears to be very close in computational time, which is pleasing to see. I also altered the Dart code from using the conditional ("question mark") operator to using if statements, the same as Java, and that appears to be consistently faster, by around 10% or more.
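For reference, the if-statement version of fAcker is essentially the following (a sketch; my altered code may have differed in minor details):
int fAcker(int m2, int n2) {
  if (m2 == 0) return n2 + 1;
  if (n2 == 0) return fAcker(m2 - 1, 1);
  return fAcker(m2 - 1, fAcker(m2, n2 - 1));
}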

I ran a further test for console printing, as shown below (Example 1 – Dart, Example 2 – Java).
The best times for each are as follows (100,000 iterations):
Dart 47 seconds.
Java 22 seconds.
Dart Editor 2.3 seconds.
While it is not earth-shattering, it does appear to illustrate that, for some reason, (a) Dart is slow with console output, and (b) the Dart Editor is extremely fast with console output. (c) This needs to be taken into account when evaluating any performance test that involves console output, which is what initially drew my attention to it.
Perhaps when they have time :) the Dart team could look at this if it is considered worthwhile.
Example 1 - Dart
// Dart - Test 100,000 iterations of console output //
Stopwatch oTimer = new Stopwatch();

main() {
  // "warm-up"
  for (int i1 = 0; i1 < 20000; i1++) {
    print("The quick brown fox chased ...");
  }
  oTimer.reset();
  oTimer.start();
  for (int i2 = 0; i2 < 100000; i2++) {
    print("The quick brown fox chased ....");
  }
  oTimer.stop();
  print("Elapsed time = ${oTimer.elapsedMicroseconds / 1000} milliseconds");
}
Example 2 - Java
public class console001
{
    // Java - Test 100,000 iterations of console output
    public static void main(String[] args)
    {
        // warm-up
        for (int i1 = 0; i1 < 20000; i1++)
        {
            System.out.println("The quick brown fox jumped ....");
        }
        long tmStart = System.nanoTime();
        for (int i2 = 0; i2 < 100000; i2++)
        {
            System.out.println("The quick brown fox jumped ....");
        }
        long tmEnd = System.nanoTime() - tmStart;
        System.out.println("Time elapsed in microseconds = " + (tmEnd / 1000));
    }
}
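As a further experiment (just a sketch, and not something I have timed on the same laptop), it may be worth comparing against a version that builds the whole output first and hands it to stdout from dart:io in a single write, since the per-call overhead of print() seems to be where the time goes:
// Dart - same 100,000 lines, but written to stdout in one call
import 'dart:io';

main() async {
  Stopwatch oTimer = new Stopwatch()..start();
  StringBuffer sb = new StringBuffer();
  for (int i = 0; i < 100000; i++) {
    sb.writeln("The quick brown fox chased ....");
  }
  stdout.write(sb.toString()); // one large write instead of 100,000 print() calls
  await stdout.flush();        // wait until the data has actually been handed off
  oTimer.stop();
  print("Elapsed time = ${oTimer.elapsedMicroseconds / 1000} milliseconds");
}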

Related

Bandwidth Speed Test Slower Than Expected

I'm trying to create a bandwidth test in Dart using a LibreSpeed server, but for some reason it's reporting a much slower speed than expected. When I test using the web client for LibreSpeed I get around 30mb/s download bandwidth, however with my Dart program I'm only getting around 4mb/s.
The code for the speed test is as follows:
Future<void> start() async {
  var rand = Random();
  var req = await client.getUrl(Uri.http(_serverAddress, '/garbage.php',
      {'r': rand.nextDouble().toString(), 'ckSize': '20'}));
  var resp = await req.close();
  var bytesDownloaded = 0;
  var start = DateTime.now();
  await for (var bytes in resp) {
    bytesDownloaded += bytes.length;
  }
  var timeTaken = DateTime.now().difference(start).inSeconds;
  var mbsDownloaded = bytesDownloaded / 1000000;
  print(
      '$mbsDownloaded megabytes downloaded in $timeTaken seconds at a rate of ${mbsDownloaded / timeTaken} mbs per second');
}
I think I'm probably not understanding a crucial reason as to why it appears to be so slow compared to the web client. Can anyone give me any ideas as to what the bottleneck might be?
The problem is confusion between the units used when measuring internet speed. In general terms there are two ways to measure the speed:
Mbps (megabits per second)
MB/s (megabytes per second)
To understand the problem, note that 1 byte = 8 bits, and that the unit LibreSpeed (and most internet providers) uses is Mbps.
The unit your current program is measuring in is MB/s, since you are summing the length of each list (each element of the list is 1 byte = 8 bits):
await for (var bytes in resp) {
  bytesDownloaded += bytes.length;
}
and you never multiply that number by 8 later in your code. You are then comparing this number against the number from LibreSpeed, which uses Mbps (as most internet providers do), so your number is 8 times smaller than expected (4 MB/s × 8 = 32 Mbps, which lines up with the roughly 30 Mbps the web client reports).
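A minimal sketch of the corrected calculation, keeping the variable names from the code in the question (toMbps is just an illustrative helper name, not part of any LibreSpeed client):
// Convert the byte count to megabits before dividing by the elapsed seconds.
double toMbps(int bytesDownloaded, int seconds) =>
    (bytesDownloaded * 8) / 1000000 / seconds;

// In start(), the final print could then become something like:
// print('... at a rate of ${toMbps(bytesDownloaded, timeTaken)} Mbps');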

Memory/CPU optimization?

My program uses a lot of memory and processing power; I can only search up to 6000. Is there any way to reduce the amount of memory this uses? This will really help with future programming endeavours, as it will be nice to know how to work with memory smartly.
ArrayList<Integer> factor = new ArrayList<Integer>();
ArrayList<Integer> non = new ArrayList<Integer>();
ArrayList<Integer> prime = new ArrayList<Integer>();
ArrayList<Integer> composite = new ArrayList<Integer>(); // composites found so far
Scanner sc = new Scanner(System.in);
System.out.println("Please enter how high we want to search");
long startTime = System.nanoTime();
int max = sc.nextInt();
int number = 2;
while (number < max)
{
    for (int i = 0; i < prime.size(); i++)
    {
        int value = prime.get(i);
        if (number % value == 0)
        {
            factor.add(value);
        }
        else
        {
            non.add(value);
        }
    }
    if (factor.isEmpty())
    {
        prime.add(number);
    }
    else
    {
        composite.add(number);
    }
    factor.clear();
    number++;
}
int howMany = prime.size();
System.out.printf("There are " + howMany + " prime numbers up to " + max + " and they are: " + prime);
System.out.println();
}
You do not say what language you are using, so this answer will be general.
To store primes up to 6,000 you only need about 3,000 bits, which is less than 380 bytes. The basic solution is the Sieve of Eratosthenes, together with the fact that 2 is the only even prime. You set up the sieve to handle only odd numbers, which halves the storage needed, and since the sieve only holds prime/not prime for each odd number, the storage can be reduced to a single bit per number.
Once you have set up your sieve (many sites, including this one, have instructions in different languages), you just need to retrieve the prime/not prime value from the sieve for the numbers in your range. Here is the pseudocode for checking whether a number is prime, assuming the sieve has already been set up:
boolean function isPrime(number)
    // Low numbers
    if (number < 2)
        return false
    endif
    // Even numbers
    if (number is even)
        return number == 2
    endif
    // Odd numbers >= 3
    return sieve[(number - 1) / 2] == 1
end function
Low numbers are not prime. 2 is the only even prime; all other even numbers are not prime. The prime flag for the odd number 2n+1 is stored at bit n in the sieve. This assumes that the language you are using allows bit level access, something like a BitSet in Java.
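For concreteness, here is a minimal sketch of that sieve setup. It is written in Dart purely as an illustration (the same structure maps directly onto a java.util.BitSet in Java). Bit n of the sieve represents the odd number 2n + 1, and a set bit means "prime":
import 'dart:typed_data';

// Build an odd-only Sieve of Eratosthenes for numbers up to max.
Uint32List buildOddSieve(int max) {
  final bitCount = max ~/ 2 + 1;                // one bit per odd number <= max
  final sieve = Uint32List((bitCount + 31) ~/ 32);
  sieve.fillRange(0, sieve.length, 0xFFFFFFFF); // assume prime until crossed off
  _clearBit(sieve, 0);                          // 1 is not prime
  for (var n = 3; n * n <= max; n += 2) {
    if (_getBit(sieve, n >> 1)) {
      for (var m = n * n; m <= max; m += 2 * n) { // odd multiples of n only
        _clearBit(sieve, m >> 1);
      }
    }
  }
  return sieve;
}

bool _getBit(Uint32List s, int i) => ((s[i >> 5] >> (i & 31)) & 1) == 1;

void _clearBit(Uint32List s, int i) {
  s[i >> 5] &= ~(1 << (i & 31));
}

// Lookup, mirroring the pseudocode above.
bool isPrime(Uint32List sieve, int number) {
  if (number < 2) return false;          // low numbers
  if (number.isEven) return number == 2; // 2 is the only even prime
  return _getBit(sieve, number >> 1);    // odd numbers >= 3
}
Calling isPrime(buildOddSieve(6000), k) then answers the original question for any k up to 6,000 using about 376 bytes of sieve storage, in line with the estimate above.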

Why does this Rascal pattern matching code use so much memory and time?

I'm trying to write what I would think of as an extremely simple piece of code in Rascal: Testing if list A contains list B.
Starting out with some very basic code to create a list of strings
public list[str] makeStringList(int Start, int End)
{
  return [ "some string with number <i>" | i <- [Start..End]];
}

public list[str] toTest = makeStringList(0, 200000);
My first try was 'inspired' by the sorting example in the tutor:
public void findClone(list[str] In, str S1, str S2, str S3, str S4, str S5, str S6)
{
  switch(In)
  {
    case [*str head, str i1, str i2, str i3, str i4, str i5, str i6, *str tail]:
    {
      if(S1 == i1 && S2 == i2 && S3 == i3 && S4 == i4 && S5 == i5 && S6 == i6)
      {
        println("found duplicate\n\t<i1>\n\t<i2>\n\t<i3>\n\t<i4>\n\t<i5>\n\t<i6>");
      }
      fail;
    }
    default:
      return;
  }
}
Not very pretty, but I expected it to work. Unfortunately, the code runs for about 30 seconds before crashing with an "out of memory" error.
I then tried a better looking alternative:
public void findClone2(list[str] In, list[str] whatWeSearchFor)
{
  for ([*str head, *str mid, *str end] := In)
    if (mid == whatWeSearchFor)
      println("gotcha");
}
with approximately the same result (seems to run a little longer before running out of memory)
Finally, I tried a 'good old' C-style approach with a for-loop
public void findClone3(list[str] In, list[str] whatWeSearchFor)
{
  cloneLength = size(whatWeSearchFor);
  inputLength = size(In);
  if(inputLength < cloneLength) return [];
  loopLength = inputLength - cloneLength + 1;
  for(int i <- [0..loopLength])
  {
    isAClone = true;
    for(int j <- [0..cloneLength])
    {
      if(In[i+j] != whatWeSearchFor[j])
        isAClone = false;
    }
    if(isAClone) println("Found clone <whatWeSearchFor> on lines <i> through <i+cloneLength-1>");
  }
}
To my surprise, this one works like a charm. No out of memory, and results in seconds.
I get that my first two attempts probably create a lot of temporary string objects that all have to be garbage collected, but I can't believe that the only solution that worked really is the best solution.
Any pointers would be greatly appreciated.
My relevant eclipse.ini settings are
-XX:MaxPermSize=512m
-Xms512m
-Xss64m
-Xmx1G
We'll need to look to see why this is happening. Note that, if you want to use pattern matching, this is maybe a better way to write it:
public void findClone(list[str] In, str S1, str S2, str S3, str S4, str S5, str S6) {
  switch(In) {
    case [*str head, S1, S2, S3, S4, S5, S6, *str tail]: {
      println("found duplicate\n\t<S1>\n\t<S2>\n\t<S3>\n\t<S4>\n\t<S5>\n\t<S6>");
    }
    default:
      return;
  }
}
If you do this, you are taking advantage of Rascal's matcher to actually find the matching strings directly, versus your first example in which any string would match but then you needed to use a number of separate comparisons to see if the match represented the combination you were looking for. If I run this on 110145 through 110150 it takes a while but works and it doesn't seem to grow beyond the heap space you allocated to it.
Also, is there a reason you are using fail? Is this to continue searching?
It's an algorithmic issue like Mark Hills said. In Rascal some short code can still entail a lot of nested loops, almost implicitly. Basically every * splice operator on a fresh variable that you use on the pattern side in a list generates one level of loop nesting, except for the last one which is just the rest of the list.
In your code of findClone2 you are first generating all combinations of sublists and then filtering them using the if construct. So that's a correct algorithm, but probably slow. This is your code:
void findClone2(list[str] In, list[str] whatWeSearchFor)
{
  for ([*str head, *str mid, *str end] := In)
    if (mid == whatWeSearchFor)
      println("gotcha");
}
You see how it has a nested loop over In, because there are two effective * operators in the pattern. The code therefore runs in O(n^2), where n is the length of In; i.e. it has quadratic runtime behaviour in the size of the In list. In is a big list, so this matters.
In the following new code, we filter first while generating answers, using fewer lines of code:
public void findCloneLinear(list[str] In, list[str] whatWeSearchFor)
{
  for ([*str head, *whatWeSearchFor, *str end] := In)
    println("gotcha");
}
The second * operator does not generate a new loop because it is not fresh; it just "pastes" the given list values into the pattern. So now there is actually only one effective *, which generates a loop, and it is the first one, on head. That one makes the algorithm loop over the list. The second * tests whether the elements of whatWeSearchFor are all right there in the list after head (this is linear in the size of whatWeSearchFor), and then the last *_ just completes the list, allowing for more stuff to follow.
It's also nice to know where the clone is sometimes:
public void findCloneLinear(list[str] In, list[str] whatWeSearchFor)
{
  for ([*head, *whatWeSearchFor, *_] := In)
    println("gotcha at <size(head)>");
}
Rascal does not have an optimising compiler (yet) that might internally transform your algorithms into equivalent optimised ones. So, as a Rascal programmer, you are still expected to know the effect of loops on your algorithm's complexity, and to know that * is a very short notation for a loop.

Real FFT output

I have implemented an FFT on an at32ucb-series microcontroller using the KISS FFT library and am currently struggling with the output of the FFT.
My intention is to analyse sound coming from a piezo speaker.
Currently, the frequency of the sounder is 420 Hz, which I successfully got from the FFT output (cross-checked with an oscilloscope). However, the output frequency is just half of what is expected if I feed a function generator waveform into the system.
I suspect it is the frequency bin calculation formula that I got wrong; currently I am using fft_peak_magnitude_index * sampling frequency / fft_size.
My input is real and I am doing a real FFT (output samples = N/2).
I am also doing IIR filtering and windowing before the FFT.
Any suggestion would be a great help!
// IIR filter calculation, n = 256 fft points
for (ctr = 0; ctr < n; ctr++)
{
    // filter calculation
    y[ctr] = num_coef[0]*x[ctr];
    y[ctr] += (num_coef[1]*x[ctr-1]) - (den_coef[1]*y[ctr-1]);
    y[ctr] += (num_coef[2]*x[ctr-2]) - (den_coef[2]*y[ctr-2]);
    y1[ctr] = y[ctr] - 510; // eliminate dc offset
    // hamming window
    hamming[ctr] = (0.54 - ((0.46) * cos(2*M_PI*ctr/n)));
    window[ctr] = hamming[ctr]*y1[ctr];
    fft_input[ctr].r = window[ctr];
    fft_input[ctr].i = 0;
    fft_output[ctr].r = 0;
    fft_output[ctr].i = 0;
}
kiss_fftr_cfg fftConfig = kiss_fftr_alloc(n, 0, NULL, NULL);
kiss_fftr(fftConfig, (kiss_fft_scalar *)fft_input, fft_output);
peak = 0;
freq_bin = 0;
for (ctr = 0; ctr < n1; ctr++)
{
    fft_mag[ctr] = 10*(sqrt((fft_output[ctr].r * fft_output[ctr].r) + (fft_output[ctr].i * fft_output[ctr].i)))/(0.5*n);
    if (fft_mag[ctr] > peak)
    {
        peak = fft_mag[ctr];
        freq_bin = ctr;
    }
    frequency = (freq_bin*(10989/n)); // 10989 is the sampling freq
    //************************************
    // Usart write
    char filtResult[10];
    //sprintf(filtResult, "%04d %04d %04d\n", (int)peak, (int)freq_bin, (int)frequency);
    sprintf(filtResult, "%04d %04d %04d\n", (int)x[ctr], (int)fft_mag[ctr], (int)frequency);
    char c;
    char *ptr = &filtResult[0];
    do
    {
        c = *ptr;
        ptr++;
        usart_bw_write_char(&AVR32_USART2, (int)c);
        // sendByte(c);
    } while (c != '\n');
}
The main problem is likely to be how you declared fft_input.
Based on your previous question, you are allocating fft_input as an array of kiss_fft_cpx. The function kiss_fftr, on the other hand, expects an array of scalars. By casting the input array to kiss_fft_scalar with:
kiss_fftr(fftConfig, (kiss_fft_scalar *)fft_input, fft_output);
KissFFT essentially sees an array of real-valued data which contains zeros at every second sample (what you filled in as the imaginary parts). This is effectively an upsampled version (although without interpolation) of your original signal, i.e. a signal with effectively twice the sampling rate (which is not accounted for in your freq_bin to frequency conversion). To fix this, I suggest you pack your data into a kiss_fft_scalar array:
kiss_fft_scalar fft_input[n];
...
for (ctr = 0; ctr < n; ctr++)
{
    ...
    fft_input[ctr] = window[ctr];
    ...
}
kiss_fftr_cfg fftConfig = kiss_fftr_alloc(n, 0, NULL, NULL);
kiss_fftr(fftConfig, fft_input, fft_output);
Note also that while looking for the peak magnitude, you probably are only interested in the final largest peak, instead of the running maximum. As such, you could limit the loop to only computing the peak (using freq_bin instead of ctr as an array index in the following sprintf statements if needed):
for (ctr = 0; ctr < n1; ctr++)
{
    fft_mag[ctr] = 10*(sqrt((fft_output[ctr].r * fft_output[ctr].r) + (fft_output[ctr].i * fft_output[ctr].i)))/(0.5*n);
    if (fft_mag[ctr] > peak)
    {
        peak = fft_mag[ctr];
        freq_bin = ctr;
    }
} // close the loop here before computing "frequency"
Finally, when computing the frequency associated with the bin with the largest magnitude, you need to ensure the computation is done using floating-point arithmetic. If, as I suspect, n is an integer, your formula would perform the 10989/n factor using integer arithmetic, resulting in truncation. This can be remedied simply with:
frequency = (freq_bin*(10989.0/n)); // 10989 is the sampling freq

Can iOS boot time drift?

I'm using this code to determine when my iOS device last rebooted:
int mib[MIB_SIZE];
size_t size;
struct timeval boottime;

mib[0] = CTL_KERN;
mib[1] = KERN_BOOTTIME;
size = sizeof(boottime);
if (sysctl(mib, MIB_SIZE, &boottime, &size, NULL, 0) != -1) {
    return boottime.tv_sec;
}
return 0;
I'm seeing some anomalies with this time. In particular, I save the long value, and days and weeks later I check the saved long against the value returned by the above code.
I'm not sure, but I think I'm seeing some drift. This doesn't make any sense to me. I'm not converting to NSDate, precisely to prevent drift. I would think that boot time is recorded by the kernel when it boots and isn't computed again; it is just stored. But could iOS be saving boot time as an NSDate, with whatever inherent drift problems that brings?
While the iOS Kernel is closed-source, it's reasonable to assume most of it is the same as the OSX Kernel, which is open-source.
Within osfmk/kern/clock.c there is the function:
/*
 * clock_get_boottime_nanotime:
 *
 * Return the boottime, used by sysctl.
 */
void
clock_get_boottime_nanotime(
    clock_sec_t  *secs,
    clock_nsec_t *nanosecs)
{
    spl_t s;

    s = splclock();
    clock_lock();

    *secs = (clock_sec_t)clock_boottime;
    *nanosecs = 0;

    clock_unlock();
    splx(s);
}
and clock_boottime is declared as:
static uint64_t clock_boottime; /* Seconds boottime epoch */
and finally the comment to this function shows that it can, indeed, change:
/*
 * clock_set_calendar_microtime:
 *
 * Sets the current calendar value by
 * recalculating the epoch and offset
 * from the system clock.
 *
 * Also adjusts the boottime to keep the
 * value consistent, writes the new
 * calendar value to the platform clock,
 * and sends calendar change notifications.
 */
void
clock_set_calendar_microtime(
    clock_sec_t  secs,
    clock_usec_t microsecs)
{
    ...
Update to answer query from OP
I am not certain how often clock_set_calendar_microtime() is called, as I am not familiar with the inner workings of the kernel; however, it adjusts the clock_boottime value, and clock_boottime is initialized in clock_initialize_calendar(), so I would say it can be called more than once. I have been unable to find any call to it using:
$ find . -type f -exec grep -l clock_set_calendar_microtime {} \;
RE my comment above...
"To my understanding, when the user goes into settings and changes the time manually, the boot time is changed by the delta to the new time, to keep the interval between boot time and system time equal. But it does not 'drift', as it is a timestamp; only the system clock itself drifts."
I'm running NTP on my iOS app, and speak with Google's time servers.
I feed NTP the uptime since boot (which doesn't pause and is correctly adjusted if some nefarious user starts messing with system time... which is the whole point of this in the first place), and then add the offset between uptime since boot and epoch time to my uptime.
inline static struct timeval uptime(void) {
    struct timeval before_now, now, after_now;
    after_now = since_boot();
    do {
        before_now = after_now;
        gettimeofday(&now, NULL);
        after_now = since_boot();
    } while (after_now.tv_sec != before_now.tv_sec && after_now.tv_usec != before_now.tv_usec);

    struct timeval systemUptime;
    systemUptime.tv_sec = now.tv_sec - before_now.tv_sec;
    systemUptime.tv_usec = now.tv_usec - before_now.tv_usec;
    return systemUptime;
}
I sync with the time servers once every 15 minutes and calculate the offset drift (i.e. the system clock drift) every time.
static void calculateOffsetDrift(void) {
    static dispatch_queue_t offsetDriftQueue = dispatch_queue_create("", DISPATCH_QUEUE_CONCURRENT);
    static double lastOffset;
    dispatch_barrier_sync(offsetDriftQueue, ^{
        double newOffset = networkOffset();
        if (lastOffset != 0.0f) printf("offset difference = %f \n", lastOffset - newOffset);
        lastOffset = newOffset;
    });
}
On my iPhone XS Max the system clock usually runs around 30 ms behind over 15 minutes.
Here are some figures from a test I just ran using LTE in NYC:
+47.381592 ms
+43.325684 ms
-67.654541 ms
+24.860107 ms
+5.940674 ms
+25.395264 ms
-34.969971 ms
