Memory/CPU optimization?

My program uses a lot of memory and processing power; I can only search up to 6000. Is there any way to reduce the amount of memory this uses? It would also really help with future programming endeavours to know how to work with memory smartly.
ArrayList<Integer> factor = new ArrayList<Integer>();
ArrayList<Integer> non = new ArrayList<Integer>();
ArrayList<Integer> prime = new ArrayList<Integer>();
ArrayList<Integer> composite = new ArrayList<Integer>(); // was missing: used below but never declared
Scanner sc = new Scanner(System.in);
System.out.println("Please enter how high we want to search");
long startTime = System.nanoTime();
int max = sc.nextInt();
int number = 2;
while (number < max)
{
    for (int i = 0; i < prime.size(); i++)
    {
        int value = prime.get(i);
        if (number % value == 0)
        {
            factor.add(value);
        }
        else
        {
            non.add(value);
        }
    }
    if (factor.isEmpty())
    {
        prime.add(number);
    }
    else
    {
        composite.add(number);
    }
    factor.clear();
    number++;
}
int howMany = prime.size();
System.out.println("There are " + howMany + " prime numbers up to " + max + " and they are: " + prime);
System.out.println();

You do not say what language you are using, so this answer will be general.
To store the primes up to 6,000 you only need about 3,000 bits, which is less than 380 bytes. Your basic tools are the Sieve of Eratosthenes and the fact that 2 is the only even prime. Set up the sieve to handle only odd numbers, which halves the storage needed. Since the sieve only holds prime or not prime for each odd number, the storage can be reduced to a single bit per number.
Once you have set up your sieve (many sites, including this one, have instructions in different languages), you just need to retrieve the prime/not-prime value from the sieve for the numbers in your range. Here is the pseudocode for checking whether a number is prime, assuming the sieve has already been set up:
boolean function isPrime(number)
    // Low numbers
    if (number < 2)
        return false
    endif
    // Even numbers
    if (number is even)
        return number == 2
    endif
    // Odd numbers >= 3
    return sieve[(number - 1) / 2] == 1
end function
Low numbers are not prime. 2 is the only even prime; all other even numbers are not prime. The prime flag for the odd number 2n+1 is stored at bit n in the sieve. This assumes that the language you are using allows bit level access, something like a BitSet in Java.
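For concreteness, here is a minimal sketch of that approach in Java (the question's language), using a BitSet with bit n representing the odd number 2n + 1; the class and method names are mine, purely for illustration.

import java.util.BitSet;

public class OddSieve {
    // Build a sieve for odd numbers only: bit n represents the odd number 2n + 1.
    static BitSet buildSieve(int max) {
        int size = (max + 1) / 2;
        BitSet sieve = new BitSet(size);
        if (size > 1) sieve.set(1, size);        // assume every odd number >= 3 is prime; bit 0 is the number 1
        for (int i = 1; (2 * i + 1) * (2 * i + 1) <= max; i++) {
            if (sieve.get(i)) {
                int p = 2 * i + 1;
                for (int j = (p * p - 1) / 2; j < size; j += p) {
                    sieve.clear(j);              // cross off the odd multiples of p, starting at p * p
                }
            }
        }
        return sieve;
    }

    static boolean isPrime(BitSet sieve, int number) {
        if (number < 2) return false;            // low numbers
        if (number % 2 == 0) return number == 2; // 2 is the only even prime
        return sieve.get((number - 1) / 2);      // odd numbers >= 3
    }

    public static void main(String[] args) {
        int max = 6000;
        BitSet sieve = buildSieve(max);
        int count = 0;
        for (int n = 2; n <= max; n++) {
            if (isPrime(sieve, n)) count++;
        }
        System.out.println("There are " + count + " primes up to " + max);
    }
}

The sieve itself stays around max / 16 bytes; only the primes you explicitly collect into a list cost extra memory.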

Related

writing to flash memory dspic33e

I have some questions regarding the flash memory on a dsPIC33EP512MU810.
I'm aware of how it should be done: set all the registers for address, latches, etc., then run the sequence that starts the write procedure, or call the builtin functions.
But I find there are some small differences between what I'm experiencing and what is in the documentation.
1) Writing the flash in WORD mode. In the documentation it is pretty straightforward; the following is the example code from the documentation:
int varWord1L = 0xXXXX;
int varWord1H = 0x00XX;
int varWord2L = 0xXXXX;
int varWord2H = 0x00XX;
int TargetWriteAddressL; // bits<15:0>
int TargetWriteAddressH; // bits<22:16>
NVMCON = 0x4001; // Set WREN and word program mode
TBLPAG = 0xFA; // write latch upper address
NVMADR = TargetWriteAddressL; // set target write address
NVMADRU = TargetWriteAddressH;
__builtin_tblwtl(0,varWord1L); // load write latches
__builtin_tblwth(0,varWord1H);
__builtin_tblwtl(0x2,varWord2L);
__builtin_tblwth(0x2,varWord2H);
__builtin_disi(5); // Disable interrupts for NVM unlock sequence
__builtin_write_NVM(); // initiate write
while(NVMCONbits.WR == 1);
But that code doesn't work for every address I want to write to. I found a fix that lets me write one WORD, but I can't write 2 WORDs wherever I want. I store everything in the auxiliary memory, so the upper address (NVMADRU) is always 0x7F for me; NVMADR is the address I change. What I'm seeing is that if the target address modulo 4 is not 0, I have to put my value in the last two latches, otherwise I have to put it in the first latches.
If the address modulo 4 is not zero, it doesn't work like the documentation code above; the value that ends up at the address is whatever is in the second set of latches.
I fixed it for writing only one word at a time like this:
if (Address % 4)
{
    __builtin_tblwtl(0, 0xFFFF);
    __builtin_tblwth(0, 0x00FF);
    __builtin_tblwtl(2, ValueL);
    __builtin_tblwth(2, ValueH);
}
else
{
    __builtin_tblwtl(0, ValueL);
    __builtin_tblwth(0, ValueH);
    __builtin_tblwtl(2, 0xFFFF);
    __builtin_tblwth(2, 0x00FF);
}
Why am I seeing this behavior?
2) I also want to write a full row. That also doesn't seem to work for me, and I don't know why, because I'm doing what is in the documentation. I tried a simple row-write, and at the end I just read back the first 3 or 4 elements that I wrote, to see whether it worked:
NVMCON = 0x4002; //set for row programming
TBLPAG = 0x00FA; //set address for the write latches
NVMADRU = 0x007F; //upper address of the aux memory
NVMADR = 0xE7FA;
int latchoffset;
latchoffset = 0;
__builtin_tblwtl(latchoffset, 0);
__builtin_tblwth(latchoffset, 0); //current = 0, available = 1
latchoffset+=2;
__builtin_tblwtl(latchoffset, 1);
__builtin_tblwth(latchoffset, 1); //current = 0, available = 1
latchoffset+=2;
// ... and so on, all the way to 127 (I know I could have done it in a loop)
__builtin_tblwtl(latchoffset, 127);
__builtin_tblwth(latchoffset, 127);
INTCON2bits.GIE = 0; //stop interrupt
__builtin_write_NVM();
while(NVMCONbits.WR == 1);
INTCON2bits.GIE = 1; //start interrupt
int testaddress;
testaddress = 0xE7FA;
status = NVMemReadIntH(testaddress);
status = NVMemReadIntL(testaddress);
testaddress += 2;
status = NVMemReadIntH(testaddress);
status = NVMemReadIntL(testaddress);
testaddress += 2;
status = NVMemReadIntH(testaddress);
status = NVMemReadIntL(testaddress);
testaddress += 2;
status = NVMemReadIntH(testaddress);
status = NVMemReadIntL(testaddress);
What I see is that the value stored at address 0xE7FA is 125, at 0xE7FC it is 126, and at 0xE7FE it is 127; the rest are all 0xFFFF.
Why is it taking only the last 3 latches and writing them to the first 3 addresses?
Thanks in advance for your help.
The dsPIC33 program memory space is treated as 24 bits wide; it is more appropriate to think of each address of the program memory as a lower and upper word, with the upper byte of the upper word being unimplemented. (dsPIC33EPXXX datasheet)
There is a phantom byte every two program words.
Your code:
if (Address % 4)
{
    __builtin_tblwtl(0, 0xFFFF);
    __builtin_tblwth(0, 0x00FF);
    __builtin_tblwtl(2, ValueL);
    __builtin_tblwth(2, ValueH);
}
else
{
    __builtin_tblwtl(0, ValueL);
    __builtin_tblwth(0, ValueH);
    __builtin_tblwtl(2, 0xFFFF);
    __builtin_tblwth(2, 0x00FF);
}
...will be fine for writing a bootloader that generates its values from a valid Intel HEX file, but it doesn't make storing data structures simple, because the phantom byte is not taken into account.
If you create a uint32_t variable and look at the compiled HEX file, you'll notice that it actually occupies the least significant words of two 24-bit program words. That is, the 32-bit value is placed into a 64-bit range, but only 48 of those 64 bits are programmable; the rest are phantom bytes (or zeros). That leaves three bytes per program word (per pair of addresses) that are actually programmable.
What I tend to do when writing data is to keep everything 32-bit aligned and do the same as the compiler does.
Writing:
UINT32 value = ....;
:
__builtin_tblwtl(0, value.word.word_L); // least significant word of 32-bit value placed here
__builtin_tblwth(0, 0x00); // phantom byte + unused byte
__builtin_tblwtl(2, value.word.word_H); // most significant word of 32-bit value placed here
__builtin_tblwth(2, 0x00); // phantom byte + unused byte
Reading:
UINT32 *value;
:
value->word.word_L = __builtin_tblrdl(offset);
value->word.word_H = __builtin_tblrdl(offset+2);
UINT32 structure:
typedef union _UINT32 {
    uint32_t val32;
    struct {
        uint16_t word_L;
        uint16_t word_H;
    } word;
    uint8_t bytes[4];
} UINT32;
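To make the layout concrete, here is a small standalone sketch (plain C, not dsPIC-specific; the printf is only there for illustration) showing how the union splits a 32-bit value into the two 16-bit words that get loaded into the lower halves of the write latches:

#include <stdint.h>
#include <stdio.h>

typedef union _UINT32 {
    uint32_t val32;
    struct {
        uint16_t word_L;
        uint16_t word_H;
    } word;
    uint8_t bytes[4];
} UINT32;

int main(void)
{
    UINT32 value;
    value.val32 = 0x12345678UL;

    /* On a little-endian target such as the dsPIC33, word_L holds 0x5678 and
       word_H holds 0x1234; each goes into the lower 16 bits of one 24-bit
       program word, with the upper byte written as 0x00. */
    printf("word_L = 0x%04X, word_H = 0x%04X\n", value.word.word_L, value.word.word_H);
    return 0;
}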

Getting a specific number from a bigger number?

a = (any random number)
Is there a way I could get the third digit (for example, from 12345) without converting it into a string?
If not, what would be a good way to get it by converting it into a string?
You can use the sub function:
function getDigit(value, digitPlace)
    return tonumber(tostring(value):sub(digitPlace, digitPlace))
end
This will get the third digit of a as a number:
a = 12345
print(getDigit(a,3))
You can get that using simple maths.
function getDigitInt(value, digit)
    -- get rid of the sign
    value = math.abs(value)
    -- how many digits does the number have?
    local numDigits = math.floor(math.log(value, 10)) + 1
    -- does the requested digit exist?
    if digit > numDigits or digit < 1 then
        print("digit does not exist")
        return
    end
    -- return the requested digit
    return math.floor(value / 10^(numDigits - digit)) % 10
end
-- test
for i = 0, 8 do print(getDigitInt(1234567, i)) end
Add more error handling as needed. Also this can only handle integers of course. But I'm sure you will find out how to apply this idea to decimals as well.
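For instance, one way to extend the same idea to the fractional part (a sketch: it assumes digitPlace counts positions after the decimal point, and it ignores floating-point rounding issues):

function getFractionDigit(value, digitPlace)
    value = math.abs(value)
    -- shift the wanted fractional digit into the ones place, then extract it
    return math.floor(value * 10^digitPlace) % 10
end

print(getFractionDigit(3.14159, 1)) --> 1
print(getFractionDigit(3.14159, 4)) --> 5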
You can convert the number into an array and find any place easily, like below:
public int GetDigitsPlace(int number, int digitPlace) {
    string t = number.ToString();
    int[] nArr = new int[t.Length];
    for (int i = 0; i < nArr.Length; i++) {
        nArr[i] = int.Parse(t[i].ToString()); // t[i] is a char, so convert it to a string before parsing
    }
    return nArr[digitPlace]; // note: digitPlace is zero-based here
}
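For example (usage sketch): with this zero-based index, GetDigitsPlace(12345, 2) returns 3.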

Dart: How to iterate digits of integer?

How can I iterate over the digits of an integer? For example, the sum-of-digits function below works, but is there a more idiomatic way?
int sumOfDigits(int num) {
  int sum = 0;
  String numtostr = num.toString();
  for (var i = 0; i < numtostr.length; i++) {
    sum = sum + int.parse(numtostr[i]);
  }
  return sum;
}
If you're looking for a shorter way to do this, you can combine toString, split, map and reduce:
int sum = num.toString().split('').map((e) => int.parse(e)).reduce((t, e) => t + e);
You can even do this:
int sum = num.toString().split('').map(int.parse).reduce((t, e) => t + e);
Thank you #julemand101
It's fairly inefficient to create a string, then split the string, and parse the individual digits back to integers.
How about something like:
Iterable<int> digitsOf(int number) sync* {
  do {
    yield number.remainder(10); // least significant digit first
    number ~/= 10;
  } while (number != 0);
}
This iterates the digits of the (non-negative) number in base 10, from least significant to most significant, without allocating any strings along the way.
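For example (a small usage sketch, not part of the original answer), the sum-of-digits function from the question can then be written without any string handling:

int sumOfDigits(int number) => digitsOf(number).reduce((a, b) => a + b);

void main() {
  print(sumOfDigits(1234)); // 10
}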
If you want the digits in the reverse order, you can either create a list from the iterable above and reverse it, or use a different approach:
Iterable<int> digitsHighToLow(int number) sync* {
  // find the place value of the most significant digit
  var base = 1;
  while (base * 10 <= number) {
    base = base * 10;
  }
  // peel off one digit at a time, most significant first
  do {
    var digit = number ~/ base;
    yield digit;
    number = number.remainder(base);
    base = base ~/ 10;
  } while (base > 0);
}
(Again, this only works on non-negative numbers; you'll have to decide what you want for negative numbers: throw, negate the number first (it's the same digits, after all), or something else.)

Real FFT output

I have implemented an FFT on an AT32UCB-series microcontroller using the KissFFT library and am currently struggling with the output of the FFT.
My intention is to analyse the sound coming from a piezo speaker.
Currently, the frequency of the sounder is 420 Hz, which I successfully get from the FFT output (cross-checked with an oscilloscope). However, the output frequency is only half the expected value when I put a function generator waveform into the system.
I suspect it is the frequency-bin calculation formula that I got wrong; I am currently using fft_peak_magnitude_index * sampling_frequency / fft_size.
My input is real and I am doing a real FFT (output samples = N/2).
I am also doing IIR filtering and windowing before the FFT.
Any suggestion would be a great help!
// IIR filter calculation, n = 256 fft points
for (ctr = 0; ctr < n; ctr++)
{
    // filter calculation
    y[ctr] = num_coef[0]*x[ctr];
    y[ctr] += (num_coef[1]*x[ctr-1]) - (den_coef[1]*y[ctr-1]);
    y[ctr] += (num_coef[2]*x[ctr-2]) - (den_coef[2]*y[ctr-2]);
    y1[ctr] = y[ctr] - 510; // eliminate dc offset
    // hamming window
    hamming[ctr] = (0.54 - ((0.46) * cos(2*M_PI*ctr/n)));
    window[ctr] = hamming[ctr]*y1[ctr];
    fft_input[ctr].r = window[ctr];
    fft_input[ctr].i = 0;
    fft_output[ctr].r = 0;
    fft_output[ctr].i = 0;
}
kiss_fftr_cfg fftConfig = kiss_fftr_alloc(n,0,NULL,NULL);
kiss_fftr(fftConfig, (kiss_fft_scalar * )fft_input, fft_output);
peak = 0;
freq_bin = 0;
for (ctr = 0; ctr < n1; ctr++)
{
    fft_mag[ctr] = 10*(sqrt((fft_output[ctr].r * fft_output[ctr].r) + (fft_output[ctr].i * fft_output[ctr].i)))/(0.5*n);
    if (fft_mag[ctr] > peak)
    {
        peak = fft_mag[ctr];
        freq_bin = ctr;
    }
    frequency = (freq_bin*(10989/n)); // 10989 is the sampling freq
    //************************************
    // Usart write
    char filtResult[10];
    //sprintf(filtResult, "%04d %04d %04d\n", (int)peak, (int)freq_bin, (int)frequency);
    sprintf(filtResult, "%04d %04d %04d\n", (int)x[ctr], (int)fft_mag[ctr], (int)frequency);
    char c;
    char *ptr = &filtResult[0];
    do
    {
        c = *ptr;
        ptr++;
        usart_bw_write_char(&AVR32_USART2, (int)c);
        // sendByte(c);
    } while (c != '\n');
}
The main problem is likely how you declared fft_input.
Based on your previous question, you are allocating fft_input as an array of kiss_fft_cpx. The function kiss_fftr, on the other hand, expects an array of scalars. By casting the input array to kiss_fft_scalar with:
kiss_fftr(fftConfig, (kiss_fft_scalar * )fft_input, fft_output);
KissFFT essentially sees an array of real-valued data which contains zeros at every second sample (what you filled in as imaginary parts). This is effectively an upsampled version (although without interpolation) of your original signal, i.e. a signal with effectively twice the sampling rate (which is not accounted for in your freq_bin-to-frequency conversion). To fix this, I suggest you pack your data into a kiss_fft_scalar array:
kiss_fft_scalar fft_input[n];
...
for (ctr=0; ctr<n; ctr++)
{
...
fft_input[ctr] = window[ctr];
...
}
kiss_fftr_cfg fftConfig = kiss_fftr_alloc(n,0,NULL,NULL);
kiss_fftr(fftConfig, fft_input, fft_output);
Note also that while looking for the peak magnitude, you are probably only interested in the final largest peak rather than the running maximum. As such, you could limit the loop to computing only the peak (using freq_bin instead of ctr as the array index in the subsequent sprintf statements, if needed):
for (ctr = 0; ctr < n1; ctr++)
{
    fft_mag[ctr] = 10*(sqrt((fft_output[ctr].r * fft_output[ctr].r) + (fft_output[ctr].i * fft_output[ctr].i)))/(0.5*n);
    if (fft_mag[ctr] > peak)
    {
        peak = fft_mag[ctr];
        freq_bin = ctr;
    }
} // close the loop here before computing "frequency"
Finally, when computing the frequency associated with the bin that has the largest magnitude, you need to ensure the computation is done using floating-point arithmetic. If, as I suspect, n is an integer, your formula performs the 10989/n division using integer arithmetic, resulting in truncation. This can be remedied with:
frequency = (freq_bin*(10989.0/n)); // 10989 is the sampling freq
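For example (illustrative arithmetic only): with n = 256 and a 10989 Hz sampling rate, integer arithmetic gives 10989/256 = 42, so every reported frequency would be a multiple of 42 Hz, whereas the true bin width is 10989.0/256, about 42.93 Hz.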

Console Print Speed

I’ve been looking at a few example programs in order to find better ways to code with Dart.
Not that this example (below) is of any particular importance; it is taken from rosettacode.org, with alterations by me to (hopefully) bring it up to date.
The point of this posting concerns benchmarks, and the possibility that the speed of printing to the console, compared to other languages, may hurt Dart's results in some benchmarks. I don't know how it compares to other languages, but in Dart the console output (at least on Windows) appears to be quite slow, even when using a StringBuffer.
As an aside, in my test, if n1 is allowed to grow to 11, the total recursion count exceeds 238 million, and Example 1 takes about 2.9 seconds to run on my laptop.
In addition, and of possible interest: if the String assignment is changed to an int, with no printing, no elapsed time is recorded at all (Example 2).
Typical times on my low-spec laptop (run from the Console - Windows).
Elapsed Microseconds (Print) = 26002
Elapsed Microseconds (StringBuffer) = 9000
Elapsed Microseconds (no Printing) = 3000
Obviously in this case, console print times are a significant factor relative to the computation time.
So, can anyone advise how this compares to, for example, Java console output times? That would at least indicate whether Dart is particularly slow in this area, which may be relevant to some benchmarks. Incidentally, printing incurs a negligible penalty when running in the Dart Editor.
// Example 1. The base code for the test (Ackermann).
main() {
  for (int m1 = 0; m1 <= 3; ++m1) {
    for (int n1 = 0; n1 <= 4; ++n1) {
      print("Acker(${m1}, ${n1}) = ${fAcker(m1, n1)}");
    }
  }
}

int fAcker(int m2, int n2) => m2 == 0 ? n2 + 1 : n2 == 0 ?
    fAcker(m2 - 1, 1) : fAcker(m2 - 1, fAcker(m2, n2 - 1));
The altered code for the test.
// Example 2 //
main() {
  fRunAcker(1); // print
  fRunAcker(2); // StringBuffer
  fRunAcker(3); // no printing
}

void fRunAcker(int iType) {
  String sResult;
  StringBuffer sb1;
  Stopwatch oStopwatch = new Stopwatch();
  oStopwatch.start();
  List lType = ["Print", "StringBuffer", "no Printing"];
  if (iType == 2) // Use StringBuffer
    sb1 = new StringBuffer();
  for (int m1 = 0; m1 <= 3; ++m1) {
    for (int n1 = 0; n1 <= 4; ++n1) {
      if (iType == 1) // print
        print("Acker(${m1}, ${n1}) = ${fAcker(m1, n1)}");
      if (iType == 2) // StringBuffer
        sb1.write("Acker(${m1}, ${n1}) = ${fAcker(m1, n1)}\n");
      if (iType == 3) // no printing
        sResult = "Acker(${m1}, ${n1}) = ${fAcker(m1, n1)}\n";
    }
  }
  if (iType == 2)
    print(sb1.toString());
  oStopwatch.stop();
  print("Elapsed Microseconds (${lType[iType-1]}) = " +
      "${oStopwatch.elapsedMicroseconds}");
}

int fAcker(int m2, int n2) => m2 == 0 ? n2 + 1 : n2 == 0 ?
    fAcker(m2 - 1, 1) : fAcker(m2 - 1, fAcker(m2, n2 - 1));
//Typical times on my low-spec laptop (run from the console).
// Elapsed Microseconds (Print) = 26002
// Elapsed Microseconds (StringBuffer) = 9000
// Elapsed Microseconds (no Printing) = 3000
I tested using Java, which was an interesting exercise.
The results from this small test indicate that Dart takes about 60% longer for console output than Java, comparing the fastest run of each. I really need to do a larger test with more terminal output, which I will do.
In terms of "computational" speed with no output, using this test with m = 3 and n = 10, the comparison is consistently around 530 milliseconds for Java compared to 580 milliseconds for Dart; that is 59.5 million calls. Java bombs with n = 11 (238 million calls), which I presume is a stack overflow. I'm not saying that is a definitive benchmark of much, but it is an indication of something. Dart appears to be very close in computational time, which is pleasing to see. I also altered the Dart code from using the conditional ?: operator to using "if" statements, the same as Java, and that appears to be consistently a bit faster, by about 10% or more.
I ran a further test for console printing, as shown below (Example 1: Dart; Example 2: Java).
The best times for each are as follows (100,000 iterations):
Dart 47 seconds.
Java 22 seconds.
Dart Editor 2.3 seconds.
While it is not earth-shattering, it does appear to illustrate that (a) for some reason Dart is slow with console output, (b) the Dart Editor is extremely fast with console output, and (c) this needs to be taken into account when evaluating any performance test that involves console output, which is what initially drew my attention to it.
Perhaps when they have time :) the Dart team could look at this, if it is considered worthwhile.
Example 1 - Dart
// Dart - Test 100,000 iterations of console output //
Stopwatch oTimer = new Stopwatch();

main() {
  // "warm-up"
  for (int i1 = 0; i1 < 20000; i1++) {
    print("The quick brown fox chased ...");
  }
  oTimer.reset();
  oTimer.start();
  for (int i2 = 0; i2 < 100000; i2++) {
    print("The quick brown fox chased ....");
  }
  oTimer.stop();
  print("Elapsed time = ${oTimer.elapsedMicroseconds/1000} milliseconds");
}
Example 2 - Java
public class console001
{
    // Java - Test 100,000 iterations of console output
    public static void main(String[] args)
    {
        // warm-up
        for (int i1 = 0; i1 < 20000; i1++)
        {
            System.out.println("The quick brown fox jumped ....");
        }
        long tmStart = System.nanoTime();
        for (int i2 = 0; i2 < 100000; i2++)
        {
            System.out.println("The quick brown fox jumped ....");
        }
        long tmEnd = System.nanoTime() - tmStart;
        System.out.println("Time elapsed in microseconds = " + (tmEnd / 1000));
    }
}
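As a follow-up idea on the Dart side (a sketch only, not benchmarked here): part of the cost is one synchronous print() call per line, so it may be worth measuring a variant that collects the lines in a StringBuffer and hands them to stdout from dart:io in a single write:

// Dart - variant: buffer the 100,000 lines and write them to stdout in one call
import 'dart:io';

main() {
  Stopwatch oTimer = new Stopwatch();
  StringBuffer sb = new StringBuffer();
  oTimer.start();
  for (int i = 0; i < 100000; i++) {
    sb.write("The quick brown fox chased ....\n");
  }
  stdout.write(sb.toString());
  oTimer.stop();
  print("Elapsed time = ${oTimer.elapsedMicroseconds/1000} milliseconds");
}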
