Separate printing for parallel processes - printing

I am using pathos.multiprocessing to run a function in parallel processes and with different input arguments per process. Here is a minimum working example:
import pathos.multiprocessing as mp
from time import sleep
def my_func(x, y):
for i in range(x):
print(y+i)
sleep(.2)
return i + y
seq = [(100, 4), (100, 5)]
processes = 2
print ("Multiprocessing...")
pool = mp.Pool(processes)
resultsObj = pool.starmap_async(my_func, seq )
pool.close()
results = resultsObj.get()
As expected, the results are printed mixed up from the 2 processes, like so:
Multiprocessing...
4
5
5
6
7
6
7
8
8
9
10
9
10
11
Is there a way to drive the results to 2 different terminal to watch the progress? Or any other way to have the results printed in a "per process" fashion?

I'm the pathos and multiprocess author. One way to split the output is to compose a nested map. Essentially, if you have a function that pipes the input to a terminal, then use a pathos.pools.ThreadPool to map the printing to two different terminals in thread parallel. As you are only working with two processes, then you could just include the function to split the output to the different terminals inside the mapped function you are using above. If you just want to know what process each output is coming from, you can alternately use multiprocess.process.current_process() or os.getpid().

Related

Loop to select which variable to omit from analysis

I have datasets with a large number of variables and I need to run PCA over these datasets with one variable removed each time. Below are 20 variables for an example dataset. I would like to run PCA with one variable removed from each PCA solution. For example, the first PCA solution will include all variables excluding Var_1_GroupA, the second will include all variables excluding Var_2_GroupA, etc. I am familiar with using macros to write loops but unsure how to complete the following task using macros or code in python.
Var_1_GroupA
Var_2_GroupA
Var_1_GroupB
Var_2_GroupB
Var_3_GroupB
Var_1_GroupC
Var_2_GroupC
Var_3_GroupC
Var_4_GroupC
Var_5_GroupC
Var_1_GroupD
Var_1_GroupE
new_Var_1_GroupA
new_Var_1_GroupB
new_Var_1_GroupC
new_Var_2_GroupC
Var_1_GroupF
Var_1_GroupG
Var_1_GroupH
Var_2_GroupH
In the example below I create 10 variables, and then run a simple means command with a different set of variables each time - excluding one of the variables at a time. You can edit the code to match your variables and your analysis code.
data list list/var1 to var10 (10F1).
begin data
1 2 3 4 5 6 7 8 9 9
5 4 3 6 3 8 1 2 5 8
0 8 6 4 2 1 3 5 7 9
end data.
dataset name wrk.
define !loopit (!pos=!cmdend)
!do !a !in(!1)
means
!do !b !in(!1) !if (!b<>!a) !then !b !ifend !doend
.
!doend
!enddefine.
!loopit var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 .
note you vave to list the variable names in the macro call, can't use var1 to var10.
If you run into trouble while adapting this to your exact needs, these are very helpful in debugging macros:
set mexpand=on.
set mprint=on.

Memory Blow Up when using Dask compute or persist with Dask Delayed

I am trying to process the data of several subjects all in one dataframe. There are >30 subjects and 14 computations per subject it is a large data set but any more than 5 blows up the memory on the scheduler node with out running any workers on the same node as the scheduler it has 128gb of memory? Any ideas how I can get around this or if im doing something wrong? code bellow.
def channel_select(chn,sub):
subject = pd.DataFrame(df.loc[df['sub'] == sub])
subject['s0'] = subject[chn]
val = []
for x in range(13):
for i in range(len(subject)):
val.append(subject['s0'].values[i-x])
name = 's' + str(x+1)
subject[name] = val
val = []
return subject
subs = df['sub'].unique()
subs = np.delete(subs, [34,33])
for s in subs:
for c in chn:
chn_del.append(delayed(channel_select)(c,subs[s]))
results = e.persist(pred)
I have the code shown to run all the subjects but anymore than 5 at a time and I run out of memory
You're telling the computer to keep almost 1,000 GB of memory.
But you knew that already (:
As Mary stated above, every call to channel_select created and stores the dataframe on the schedulers memory, with 30 subjects calling 14 time each and a 2gb dataframe...yeah you can do the math of how much memory that was trying grab.

Generating means of a variable using dummy variables & foreach in Stata

My dataset includes TWO main variables X and Y.
Variable X represents distinct codes (e.g. 001X01, 001X02, etc) for multiple computer items with different brands.
Variable Y represents the tax charged for each code of variable X (e.g. 15 = 15% for 001X01) at a store.
I've created categories for these computer items using dummy variables (e.g. HD dummy variable for Hard-Drives, takes value of 1 when variable X represents a HD, etc). I have a list of over 40 variables (two of them representing X and Y, and the rest is a bunch of dummy variables for the different categories I've created for computer items).
I would like to display the averages of all these categories using a loop in Stata, but I'm not sure how to do this.
For example the code:
mean Y if HD == 1
Mean estimation Number of obs = 5
--------------------------------------------------------------
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
Tax | 7.1 2.537716 1.154172 15.24583
gives me the mean Tax for the category representing Hard Drives. How can I use a loop in Stata to automatically display all the mean Taxes charged for each category? I would do it by hand without a problem, but I want to repeat this process for multiple years, so I would like to use a loop for each year in order to come up with this output.
My goal is to create a separate Excel file with each of the computer categories I've created (38 total) and the average tax for each category by year.
Why bother with the loop and creating the indicator variables? If I understand correctly, your initial dataset allows the use of a simple collapse:
clear all
set more off
input ///
code tax str10 categ
1 0.15 "hd"
2 0.25 "pend"
3 0.23 "mouse"
4 0.29 "pend"
5 0.16 "pend"
6 0.50 "hd"
7 0.54 "monitor"
8 0.22 "monitor"
9 0.21 "mouse"
10 0.76 "mouse"
end
list
collapse (mean) tax, by(categ)
list
To take to Excel you can try export excel or put excel.
Run help collapse and help export for details.
Edit
Because you insist, below is an example that gives the same result using loops.
I assume the same data input as before. Some testing using this example database
with expand 1000000, shows that speed is virtually the same. But almost surely,
you (including your future you) and your readers will prefer collapse.
It is much clearer, cleaner and concise. It is even prettier.
levelsof categ, local(parts)
gen mtax = .
quietly {
foreach part of local parts {
summarize tax if categ == "`part'", meanonly
replace mtax = r(mean) if categ == "`part'"
}
}
bysort categ: keep if _n == 1
keep categ mtax
Stata has features that make it quite different from other languages. Once you
start getting a hold of it, you will find that many things done with loops elsewhere,
can be made loop-less in Stata. In many cases, the latter style will be preferred.
See corresponding help files using help <command> and if you are not familiarized with saved results (e.g. r(mean)), type help return.
A supplement to Roberto's excellent answer: After collapse, you will need a loop to export the results to excel.
levelsof categ, local(levels)
foreach x of local levels {
export excel `x', replace
}
I prefer to use numerical codes for variables such as your category variable. I then assign them value labels. Here's a version of Roberto's code which does this and which, for closer correspondence to your problem, adds a "year" variable
input code tax categ year
1 0.15 1 1999
2 0.25 2 2000
3 0.23 3 2013
4 0.29 1 2010
5 0.16 2 2000
6 0.50 1 2011
7 0.54 4 2000
8 0.22 4 2003
9 0.21 3 2004
10 0.76 3 2005
end
#delim ;
label define catl
1 hd
2 pend
3 mouse
4 monitor
;
#delim cr
label values categ catl
collapse (mean) tax, by(categ year)
levelsof categ, local(levels)
foreach x of local levels {
export excel `:label (categ) `x'', replace
}
The #delim ; command makes it possible to easily list each code on a separate line. The"label" function in the export statement is an extended macro function to insert a value label into the file name.

Interpreter with a one-register VM - possible to evaluate all math. expressions?

I'm writing an interpreter. I've done that before but never tried one which can work with expressions like 3 + 4 * 2 / ( 1 − 5 ) ^ 2 ^ 3.
I'm not having a problem with the parsing process, actually it is about my VM which then executes the code.
My goal was a fast interpreter and so I decided not to use a stack-based VM where you would need more than one instruction for a multiplication, for example (push, push, mul)
The "assembly" code for the VM generated by the parser looks as following:
3 + 4 * 2 / ( 1 − 5 ) ^ 2 ^ 3
becomes
sub 1 5
pow result 2
pow result 3
div 2 result
mul 4 result
add 3 result
(The result is correct)
As you can see: Every instruction takes no, one or two arguments. There is the result register which holds the result of the last instruction. And that's it.
Can a VM with a language of this structure and only one register calculate every mathematical expression for example Python or PHP can?
If it is not possible without a stack I'll start over right now!
What do you do about (1 + 2) * (3 + 4), or any other that would require you to calculate more than one intermediate result?

How do I format a PRINT or WRITE statement to overwrite the current line on the console screen?

I want to display the progress of a calculation done with a DO-loop, on the console screen. I can print out the progress variable to the terminal like this:
PROGRAM TextOverWrite_WithLoop
IMPLICIT NONE
INTEGER :: Number, Maximum = 10
DO Number = 1, MAXIMUM
WRITE(*, 100, ADVANCE='NO') REAL(Number)/REAL(Maximum)*100
100 FORMAT(TL10, F10.2)
! Calcultations on Number
END DO
END PROGRAM TextOverWrite_WithLoop
The output of the above code on the console screen is:
10.00 20.00 30.00 40.00 50.00 60.00 70.00 80.00
90.00 100.00
All on the same line, wrapped only by the console window.
The ADVANCE='No' argument and the TL10 (tab left so many spaces) edit descriptor works well to overwrite text on the same line, e.g. the output of the following code:
WRITE(*, 100, ADVANCE='NO') 100, 500
100 FORMAT(I3, 1X, TL4, I3)
Is:
500
Instead of:
100 500
Because of the TL4 edit descriptor.
From these two instances one can conclude that the WRITE statement cannot overwrite what has been written by another WRITE statement or by a previous execution of the same WRITE satement (as in a DO-loop).
Can this be overcome somehow?
I am using the FTN95 compiler on Windows 7 RC1. (The setup program of the G95 compiler bluescreens Windows 7 RC1, even thought it works fine on Vista.)
I know about the question Supressing line breaks in Fortran 95 write statements, but it does not work for me, because the answer to that question means new ouput is added to the previous output on the same line; instead of new output overwriting the previous output.
Thanks in advance.
The following should be portable across systems by use of ACHAR(13) to encode the carriage return.
character*1 creturn
! CODE::
creturn = achar(13) ! generate carriage return
! other code ...
WRITE( * , 101 , ADVANCE='NO' ) creturn , i , npoint
101 FORMAT( a , 'Point number : ',i7,' out of a total of ',i7)
There is no solution to this question within the scope of the Fortran standards. However, if your compiler understand backslash in Fortran strings (GNU Fortran does if you use the option -fbackslash), you can write
write (*,"(A)",advance="no") "foo"
call sleep(1)
write (*,"(A)",advance="no") "\b\b\bbar"
call sleep(1)
write (*,"(A)",advance="no") "\b\b\bgee"
call sleep(1)
write (*,*)
end
This uses the backslash character (\b) to erase previously written characters on that line.
NB: if your compiler does not understand advance="no", you can use related non-standard tricks, such as using the $ specifier in the format string.
The following worked perfectly using g95 fortran:
NF = NF + 1
IF(MOD(NF,5).EQ.0) WRITE(6,42,ADVANCE='NO') NF, ' PDFs'//CHAR(13)
42 FORMAT(I6,A)
gave:
5 PDFs
leaving the cursor at the #1 position on the same line. On the next update,
the 5 turned into a 10. ASCII 13 (decimal) is a carriage return.
OPEN(6,CARRIAGECONTROL ='FORTRAN')
DO I=1,5
WRITE(6,'(1H+" ",I)') I
ENDDO

Resources