Can we create targets at run time using Informatica PowerCenter when we do not know the number of targets in advance?
Suppose we have the following source:
Employee:
Dept_ID EmpName Sal
10 A 200
11 B 100
10 C 200
10 D 400
12 E 500
12 F 400
...
It can have any number of distinct Dept_ID.
I want to load all EmpName and Sal values for a particular Dept_ID into a separate target table (i.e. the target name should be Tar_10 or Tar_11, where 10 and 11 are Dept_IDs).
You can achieve this by the following method:
While creating the target, enable the option to include the file name (FileName) port.
Use an Expression transformation to build the value for the file name port; something like 'Tar_' || Dept_ID should do.
Use a Sorter transformation to sort your input by Dept_ID.
Use a Transaction Control transformation with a condition that issues TC_COMMIT_AFTER whenever Dept_ID differs from the previous Dept_ID; this keeps changing the file name as the input changes.
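As a rough sketch, the Expression and Transaction Control logic could be wired like this (port names such as v_prev_dept, v_changed and o_FileName are illustrative, not taken from the question):

-- Expression transformation (variable ports are evaluated in order, so v_changed
-- still sees the previous row's Dept_ID in v_prev_dept)
v_changed   = IIF(Dept_ID != v_prev_dept, 1, 0)
v_prev_dept = Dept_ID
o_FileName  = 'Tar_' || TO_CHAR(Dept_ID)   -- connect to the target's FileName port
o_changed   = v_changed                    -- connect to the Transaction Control transformation

-- Transaction Control condition; the step above uses TC_COMMIT_AFTER, while
-- TC_COMMIT_BEFORE places the boundary so the row with the new Dept_ID opens the new file
IIF(o_changed = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)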
Your output will look like this:
TAR_10
10 A 200
10 C 200
10 D 400
TAR_11
11 B 100
TAR_12
12 E 500
12 F 400
Yes, Sumit is right. You can achieve this by creating the file name port and using transaction control. Also, if your target is a file, you can write the whole record into one port, so there is no need to worry about the target structure either.
Chapter 3 of Starting FORTH says,
Now that you've made a block "current", you can list it by simply typing the word L. Unlike LIST, L does not want to be preceded by a block number; instead it lists the current block.
When I run 180 LIST, I get
Screen 180 not modified
0
...
15
ok
But when I run L, I get an error
:30: Undefined word
>>>L<<<
Backtrace:
$7F0876E99A68 throw
$7F0876EAFDE0 no.extensions
$7F0876E99D28 interpreter-notfound1
What am I doing wrong?
Yes, gForth supports an internal (block) editor. Start gforth, then:
type: use blocked.fb (a demo page)
type: 1 load
type editor
words will show the editor words,
s b n bx nx qx dl il f y r d i t 'par 'line 'rest c a m ok
type 0 l to list screen 0 which describes the editor,
Screen 0 not modified
0 \ some comments on this simple editor 29aug95py
1 m marks current position a goes to marked position
2 c moves cursor by n chars t goes to line n and inserts
3 i inserts d deletes marked area
4 r replaces marked area f search and mark
5 il insert a line dl delete a line
6 qx gives a quick index nx gives next index
7 bx gives previous index
8 n goes to next screen b goes to previous screen
9 l goes to screen n v goes to current screen
10 s searches until screen n y yank deleted string
11
12 Syntax and implementation style a la PolyFORTH
13 If you don't like it, write a block editor mode for Emacs!
14
15
ok
Creating your own block file
To create your own new block file myblocks.fb
type: use blocked.fb
type: 1 load
type editor
Then
type: use myblocks.fb
1 load will show BLOCK #1 (lines 0 to 15; 16 lines of 64 characters each)
1 t will highlight line 1
Type i this is text to [i]nsert into line 1
After the current BLOCK is edited, type flush to write BLOCK #1 to the file myblocks.fb.
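Putting the steps together, a minimal session might look like this (just the commands from the steps above strung together; the inserted text is arbitrary):

use blocked.fb        \ open the demo block file
1 load                \ load block 1, which sets up the editor
editor                \ make the editor words available
use myblocks.fb       \ switch to (and create) your own block file
1 load                \ show block 1
1 t                   \ go to line 1
i this is text        \ insert the text into line 1
flush                 \ write the modified block back to myblocks.fb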
For more information, see gForth Blocks.
It turns out these are "editor commands". The book says:
For Those Whose EDITOR Doesn't Follow These Rules
The FORTH-79 Standard does not specify editor commands. Your system may use a different editor; if so, check your system's documentation.
I don't believe gforth supports an internal editor at all. So L, T, I, P, F, E, D, R are all presumably unsupported.
gforth is well integrated with emacs. In my xemacs here, by default any file called *.fs is considered FORTH source. "C-h m", as usual, gives the available commands.
No, GNU Forth doesn't have an internal editor; I use Vim :)
I have a file with a different number of rows for every "unit", and I'd like all the units to have the same number of rows, by adding the right number of empty rows per unit in the data.
For example:
data list list/ unit serial someData.
begin data.
1 1 54
2 1 57
2 2 87
2 3 91
3 1 17
3 2 43
end data.
What I'd like to get to is this:
1 1 54
1 2 .
1 3 .
2 1 57
2 2 87
2 3 91
3 1 17
3 2 43
3 3 .
I've worked with simple workarounds, for example casestovars => varstocases (keeping nulls), or preparing a base file with all the lines with unit names and serials, and then matching it with the data file so I end up with all the lines and all the data.
Could anyone suggest a more direct (elegant/efficient/simple) approach?
Thanks!
A Cartesian product is what you require here.
Using your example data and the STATS CARTPROD custom extension command (which has to be downloaded), you can solve it as below:
data list list/ unit serial someData.
begin data.
1 1 54
2 1 57
2 2 87
2 3 91
3 1 17
3 2 43
end data.
DATASET NAME ds0.
DATASET ACTIVATE ds0.
STATS CARTPROD VAR1=unit VAR2=serial /SAVE OUTFILE="C:\Temp\dsCart".
SORT CASES BY unit serial.
MATCH FILES FILE=* /BY unit serial /FIRST=Primary.
SELECT IF Primary.
MATCH FILES FILE=* /FILE=ds0 /BY unit serial /DROP=Primary.
EXE.
I'm not sure how efficient this custom extension command is, so you may want to experiment with different flavours of using STATS CARTPROD. An alternative approach would be to create two datasets (left and right) with your unique unit and serial values and then process these through the STATS CARTPROD command.
You already mentioned it: creating a base file with all the lines with unit names and serials, and then matching it with the data file would be a simple approach. I'd like to outline this one here for other readers.
So for the question's example, you would create the base data set like this:
INPUT PROGRAM.
LOOP #i = 1 to 3. /* 3 = maximum value of unit.
LOOP #j = 1 to 3. /* 3 = maximum value of serial.
COMPUTE unit = #i.
COMPUTE serial = #j.
END CASE.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME base.
EXECUTE.
The data set will look like this.
unit serial
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
The following MATCH FILES command will produce the wanted result (this assumes the original data set is named data1):
MATCH FILES
/FILE base
/FILE data1
/BY unit serial.
If you want the code to be more flexible regarding the maximum values of "unit" and "serial", you can make use of the Python extension:
BEGIN PROGRAM.
import spss, spssdata
# list of variable names
variables = ["unit", "serial"]
#fetch variable data
data = spssdata.Spssdata(variables).fetchall()
# get maximum of 'unit' and 'serial'
maxunit = max([int(i[0]) for i in data])
maxserial = max([int(i[1]) for i in data])
# create base data set
spss.Submit('''
INPUT PROGRAM.
LOOP #i = 1 to {maxu}.
LOOP #j = 1 to {maxs}.
COMPUTE unit = #i.
COMPUTE serial = #j.
END CASE.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME base.
EXECUTE.
'''.format(maxu=maxunit, maxs=maxserial))
END PROGRAM.
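Once this program has run, the generated base data set can be combined with the original data using the same MATCH FILES step shown above.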
I am trying to create a file descriptor using the command:
$ MAHOUT_HOME/core/target/mahout-core--job.jar org.apache.mahout.classifier.df.tools.Describe -p testdata/KDDTrain+.arff -f testdata/KDDTrain+.info -d N 3 C 2 N C 4 N C 8 N 2 C 19 N L
from the link:
https://mahout.apache.org/users/classification/partial-implementation.html on my data file, but no matter which file I take or how I change the attributes string N 3 C 2 N C 4 N C 8 N 2 C 19 N L,
I get the following exception:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong number of attributes in the string
Please help!
There are a couple of reasons why you might get an error like that:
Wrong descriptor: Including this for the sake of completeness; you have probably already checked it. You may simply have given a wrong descriptor for the data. Re-check the number and types of the columns and then specify them correctly in the descriptor.
Bad separator: Re-check the delimiter used in the data; that can also cause trouble. Maybe the data has a wrongly placed delimiter in some records. Make sure of that.
Special characters: In my experiments, I have noticed Mahout does not cope well with certain special characters, or with data containing characters from languages other than English (unless, of course, you tweak the code). So make sure you have a way of handling them, and you should be good to go.
Anyway, all this effort is just to create a descriptor for the data. All the best.
Old question, but I found a more specific answer after landing here with the same problem.
In this particular case, the problem was that the format of the data file (from http://nsl.cs.unb.ca/NSL-KDD/) seems to have changed from the example listed on the Mahout Random Forest example page.
The example lists a line format with the specifier
N 3 C 2 N C 4 N C 8 N 2 C 19 N L
but there's an extra element at the end of the lines; for example:
13,tcp,telnet,SF,118,2425,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,26,10,0.38,0.12,0.04,0.00,0.00,0.00,0.12,0.30,guess_passwd,2
which has one more field. This is fixed by adding another numeric field (N) to the end of the specifier:
N 3 C 2 N C 4 N C 8 N 2 C 19 N L N
I also had luck using just the plain .txt file format instead of the .arff file format.
My dataset includes TWO main variables X and Y.
Variable X represents distinct codes (e.g. 001X01, 001X02, etc) for multiple computer items with different brands.
Variable Y represents the tax charged for each code of variable X (e.g. 15 = 15% for 001X01) at a store.
I've created categories for these computer items using dummy variables (e.g. an HD dummy variable for hard drives, which takes the value 1 when variable X represents an HD, and so on). I have a list of over 40 variables (two of them representing X and Y, and the rest are dummy variables for the different categories I've created for computer items).
I would like to display the averages of all these categories using a loop in Stata, but I'm not sure how to do this.
For example the code:
mean Y if HD == 1
Mean estimation Number of obs = 5
--------------------------------------------------------------
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
Tax | 7.1 2.537716 1.154172 15.24583
gives me the mean Tax for the category representing Hard Drives. How can I use a loop in Stata to automatically display all the mean Taxes charged for each category? I would do it by hand without a problem, but I want to repeat this process for multiple years, so I would like to use a loop for each year in order to come up with this output.
My goal is to create a separate Excel file with each of the computer categories I've created (38 total) and the average tax for each category by year.
Why bother with the loop and creating the indicator variables? If I understand correctly, your initial dataset allows the use of a simple collapse:
clear all
set more off
input ///
code tax str10 categ
1 0.15 "hd"
2 0.25 "pend"
3 0.23 "mouse"
4 0.29 "pend"
5 0.16 "pend"
6 0.50 "hd"
7 0.54 "monitor"
8 0.22 "monitor"
9 0.21 "mouse"
10 0.76 "mouse"
end
list
collapse (mean) tax, by(categ)
list
To take this to Excel you can try export excel or putexcel.
Run help collapse and help export for details.
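For instance, a minimal export of the collapsed result to a single workbook might look like this (the file name is only an illustration):

export excel using "mean_tax_by_categ.xlsx", firstrow(variables) replace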
Edit
Because you insist, below is an example that gives the same result using loops.
I assume the same data input as before. Some testing with this example database after expand 1000000 shows that speed is virtually the same. But almost surely, you (including your future self) and your readers will prefer collapse.
It is much clearer, cleaner and more concise. It is even prettier.
levelsof categ, local(parts)
gen mtax = .
quietly {
foreach part of local parts {
summarize tax if categ == "`part'", meanonly
replace mtax = r(mean) if categ == "`part'"
}
}
bysort categ: keep if _n == 1
keep categ mtax
Stata has features that make it quite different from other languages. Once you get the hang of it, you will find that many things done with loops elsewhere can be done without loops in Stata. In many cases, the loop-less style is preferred.
See the corresponding help files using help <command>, and if you are not familiar with saved results (e.g. r(mean)), type help return.
A supplement to Roberto's excellent answer: After collapse, you will need a loop to export the results to excel.
levelsof categ, local(levels)
foreach x of local levels {
export excel using "`x'.xlsx" if categ == "`x'", firstrow(variables) replace
}
I prefer to use numerical codes for variables such as your category variable. I then assign them value labels. Here's a version of Roberto's code which does this and which, for closer correspondence to your problem, adds a "year" variable
input code tax categ year
1 0.15 1 1999
2 0.25 2 2000
3 0.23 3 2013
4 0.29 1 2010
5 0.16 2 2000
6 0.50 1 2011
7 0.54 4 2000
8 0.22 4 2003
9 0.21 3 2004
10 0.76 3 2005
end
#delim ;
label define catl
1 hd
2 pend
3 mouse
4 monitor
;
#delim cr
label values categ catl
collapse (mean) tax, by(categ year)
levelsof categ, local(levels)
foreach x of local levels {
export excel using "`:label (categ) `x''.xlsx" if categ == `x', firstrow(variables) replace
}
The #delim ; command makes it possible to list each code on a separate line. The "label" function in the export statement is an extended macro function that inserts a value label into the file name.
I have the data in a sav file
CODE | QUANTITY
------|----------
A | 1
B | 4
C | 1
F | 3
B | 3
D | 12
D | 5
I need to obtain the number of codes that have a quantity <= 3, express it as a percentage of the total number of codes, and present a result like this:
<= 3 | PERCENTAGE
------|----------
4 | 57 %
All of this using SPSS syntax.
I would first convert the quantity value to a 0-1 variable and then aggregate by code, taking the mean. This produces a nice second dataset from which to make a table. Example below.
data list free / Code (A1) Quantity (F2.0).
begin data
A 1
B 4
C 1
F 3
B 3
D 12
D 5
end data.
*convert to 0-1.
compute QuantityB3 = (Quantity LE 3).
*Aggregate.
DATASET DECLARE AggQuant.
AGGREGATE
/OUTFILE='AggQuant'
/BREAK=Code
/QuantityB3 = MEAN(QuantityB3).
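To have a look at the aggregated dataset you could then run (a small illustrative follow-up, not part of the original answer):

DATASET ACTIVATE AggQuant.
LIST.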
I don't know how your question got migrated here, and I don't have enough reputation here to add screenshots that would help a lot. Anyhow, the procedure for your desired output is given below.
Go to Transform -> Count Values within Cases; a dialog box opens. Write the name of the new variable, say "New", in Target Variable, then go to Define Values; in the new dialog box check the radio button "Range, LOWEST through value", put 3 in the box below it, press Add, then Continue and OK. A new variable named "New" is created.
Now go to Analyze -> Descriptive Statistics -> Frequencies; in the dialog box move "New" into Variable(s), press Statistics, check Percentile(s), write 100 in the box, press Add, then Continue and OK. You get the desired results.
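If you prefer syntax over the menus, a minimal sketch of the same steps could be (using the variable name "New" from the description above):

COUNT New = Quantity (LOWEST THRU 3).
FREQUENCIES VARIABLES=New.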