Data Element Type (DET) in Function Point Analysis?

I am studying function point analysis from Alvin Alexander's website.
http://alvinalexander.com/FunctionPoints/
In his example, he calculates DETs from a GUI screen, but I cannot understand how he is counting. For example, according to him at
http://alvinalexander.com/FunctionPoints/node26.shtml (end of page), the DET count for Create Project is five, while there are only three input fields. The same goes for the other screens. Can anyone help me? I'm stuck here.

A DET (Data Element Type) isn't just an input field: it's any piece of information recognizable by the user that crosses the application boundary. Usually, every input field on the screen is indeed a DET, but not always. I'm not going to get into that now, though, since in this particular case all the input fields are indeed DETs. Let's just talk about those 2 DETs that seem unaccounted for.
You should count 3 DETs for the 3 input fields (Project Name, Project Type and Project Description), and also 1 DET for the act of clicking on the Save button. Note that even if there were multiple ways to save the project (clicking on the Save button, pressing Enter etc) you would still count only 1 DET.
As for the fifth DET, I'm assuming the author is counting 1 DET for any messages the application is capable of showing in the process of creating a new project (confirmation message, any error messages, warnings etc). Again, you should only count 1 DET no matter how many possible messages there are. And I said I'm assuming because, while it is correct to count 1 DET for the capability of showing messages (it is, after all, information recognizable by the user that crosses the application boundary), he should have explicitly mentioned at least one message, especially since it's a teaching example.

A DET is basically a count of the controls/fields, error messages and buttons/hrefs on a UI screen for transaction functions.
- 1 DET for each control/field.
- 1 DET for all error messages.
- 1 DET for all buttons/hrefs.
e.g.
1 Text field = 1 DET
1 Label = 1 DET
1 Radio button group = 1 DET
2 Buttons (Submit & Cancel) = 1 DET
Total: 4 DETs.
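The counting rules above can be sketched in a few lines of Python. This is only an illustration of the simplified rules given in these answers (one DET per field, one for the action, one for all messages), not an official IFPUG procedure, and the function name count_dets is made up for this example.

```python
def count_dets(fields, has_action=True, has_messages=True):
    """Count DETs for one transaction screen under the simplified rules above."""
    dets = len(set(fields))            # one DET per distinct user-recognizable field
    dets += 1 if has_action else 0     # all buttons/links that trigger the action: 1 DET
    dets += 1 if has_messages else 0   # all error/confirmation messages together: 1 DET
    return dets


# The Create Project screen from the question: three input fields,
# a Save action, and the ability to show messages.
create_project = ["Project Name", "Project Type", "Project Description"]
print(count_dets(create_project))  # 5
```

Passing has_messages=False with three fields reproduces the "Total: 4 DETs" example from the second answer.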


Deterministic Finite Automata divisibility problem

Design a DFA that accepts the language L = { w : the number of 'a's in w is divisible by 3 and the number of 'b's in w is divisible by 2 } over the alphabet {a,b}.
Realize that we should have 3 * 2 = 6 states in the DFA. Why? Because one has 3 choices for the number of a's (0 or 1 or 2) [think in terms of remainders] and 2 choices for number of b's (0 or 1 similarly).
Let us name the states axby which means I have found x number of a's and y number of b's till now. For example, if we are in a2b0 and we encounter an a, then we go to a0b0 (hope you see why?). Similarly a1b1 ---b---> a1b0 and a1b1 ---a---> a2b1.
Needless to say a0b0 is the accepting state.
Now, all you have to do is draw the states and keep joining them. I have drawn them on a paper here.
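If it helps to check the construction, here is a minimal Python sketch that simulates the six states described above by tracking the pair (number of a's mod 3, number of b's mod 2). The function name accepts is just a label chosen for this example.

```python
def accepts(word):
    """Simulate the 6-state DFA; state (x, y) = (#a mod 3, #b mod 2)."""
    a_count, b_count = 0, 0                  # start state a0b0
    for symbol in word:
        if symbol == "a":
            a_count = (a_count + 1) % 3      # e.g. a2b0 --a--> a0b0
        elif symbol == "b":
            b_count = (b_count + 1) % 2      # e.g. a1b1 --b--> a1b0
        else:
            raise ValueError(f"symbol {symbol!r} is not in the alphabet {{a, b}}")
    return (a_count, b_count) == (0, 0)      # a0b0 is the only accepting state


print(accepts("aaabb"))  # True
print(accepts("aab"))    # False
```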

Generating means of a variable using dummy variables & foreach in Stata

My dataset includes TWO main variables X and Y.
Variable X represents distinct codes (e.g. 001X01, 001X02, etc) for multiple computer items with different brands.
Variable Y represents the tax charged for each code of variable X (e.g. 15 = 15% for 001X01) at a store.
I've created categories for these computer items using dummy variables (e.g. HD dummy variable for Hard-Drives, takes value of 1 when variable X represents a HD, etc). I have a list of over 40 variables (two of them representing X and Y, and the rest is a bunch of dummy variables for the different categories I've created for computer items).
I would like to display the averages of all these categories using a loop in Stata, but I'm not sure how to do this.
For example the code:
mean Y if HD == 1

Mean estimation                   Number of obs   =          5
--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         Tax |        7.1   2.537716      1.154172    15.24583
gives me the mean Tax for the category representing Hard Drives. How can I use a loop in Stata to automatically display all the mean Taxes charged for each category? I would do it by hand without a problem, but I want to repeat this process for multiple years, so I would like to use a loop for each year in order to come up with this output.
My goal is to create a separate Excel file with each of the computer categories I've created (38 total) and the average tax for each category by year.
Why bother with the loop and creating the indicator variables? If I understand correctly, your initial dataset allows the use of a simple collapse:
clear all
set more off
input ///
code tax str10 categ
1 0.15 "hd"
2 0.25 "pend"
3 0.23 "mouse"
4 0.29 "pend"
5 0.16 "pend"
6 0.50 "hd"
7 0.54 "monitor"
8 0.22 "monitor"
9 0.21 "mouse"
10 0.76 "mouse"
end
list
collapse (mean) tax, by(categ)
list
To take the result to Excel you can try export excel or putexcel.
Run help collapse and help export for details.
Edit
Because you insist, below is an example that gives the same result using loops.
I assume the same data input as before. Some testing using this example database
with expand 1000000, shows that speed is virtually the same. But almost surely,
you (including your future you) and your readers will prefer collapse.
It is much clearer, cleaner and concise. It is even prettier.
levelsof categ, local(parts)
gen mtax = .
quietly {
    foreach part of local parts {
        summarize tax if categ == "`part'", meanonly
        replace mtax = r(mean) if categ == "`part'"
    }
}
bysort categ: keep if _n == 1
keep categ mtax
Stata has features that make it quite different from other languages. Once you
start getting a hold of it, you will find that many things done with loops elsewhere,
can be made loop-less in Stata. In many cases, the latter style will be preferred.
See the corresponding help files using help <command>, and if you are not familiar with saved results (e.g. r(mean)), type help return.
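A supplement to Roberto's excellent answer: after collapse, you will need a loop if you want to export each category's results to its own Excel file.
levelsof categ, local(levels)
foreach x of local levels {
    * write only the rows for this category to a workbook named after it
    export excel using "`x'.xlsx" if categ == "`x'", firstrow(variables) replace
}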
I prefer to use numerical codes for variables such as your category variable. I then assign them value labels. Here's a version of Roberto's code which does this and which, for closer correspondence to your problem, adds a "year" variable:
input code tax categ year
1 0.15 1 1999
2 0.25 2 2000
3 0.23 3 2013
4 0.29 1 2010
5 0.16 2 2000
6 0.50 1 2011
7 0.54 4 2000
8 0.22 4 2003
9 0.21 3 2004
10 0.76 3 2005
end
#delim ;
label define catl
1 hd
2 pend
3 mouse
4 monitor
;
#delim cr
label values categ catl
collapse (mean) tax, by(categ year)
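levelsof categ, local(levels)
foreach x of local levels {
    * file is named after the value label; only this category's rows are written
    export excel using "`:label (categ) `x''.xlsx" if categ == `x', firstrow(variables) replace
}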
The #delim ; command makes it possible to list each code on a separate line. The "label" function in the export statement is an extended macro function that inserts a value label into the file name.
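For readers who want to cross-check outside Stata what collapse (mean) tax, by(categ) computes, here is a minimal Python sketch using the toy data from the first answer. The function name mean_tax_by_category is invented for this example, and the sketch only groups by category (adding the year to the grouping key would work the same way).

```python
from collections import defaultdict


def mean_tax_by_category(rows):
    """Return {category: mean tax} for an iterable of (code, tax, category) rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _code, tax, categ in rows:
        totals[categ] += tax
        counts[categ] += 1
    return {categ: totals[categ] / counts[categ] for categ in totals}


# Same toy data as in the Stata input block above.
rows = [
    (1, 0.15, "hd"), (2, 0.25, "pend"), (3, 0.23, "mouse"), (4, 0.29, "pend"),
    (5, 0.16, "pend"), (6, 0.50, "hd"), (7, 0.54, "monitor"), (8, 0.22, "monitor"),
    (9, 0.21, "mouse"), (10, 0.76, "mouse"),
]

for categ, mean in sorted(mean_tax_by_category(rows).items()):
    print(f"{categ}: {mean:.4f}")
```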

Obtaining the quantity and proportion in SPSS 21

I have the data in a sav file
CODE | QUANTITY
------|----------
A | 1
B | 4
C | 1
F | 3
B | 3
D | 12
D | 5
I need to obtain the quantity of codes which have a quantity <= 3 and to obtain the proportion in a percentage with respect to the total number and present a result like this
<= 3 | PERCENTAGE
------|----------
4 | 57 %
All of this using SPSS syntax.
I would first convert the quantity value to a 0-1 variable, and then aggregate by code to the mean. This produces a nice second dataset to make a table. Example below.
data list free / Code (A1) Quantity (F2.0).
begin data
A 1
B 4
C 1
F 3
B 3
D 12
D 5
end data.
*convert to 0-1.
compute QuantityB3 = (Quantity LE 3).
*Aggregate.
DATASET DECLARE AggQuant.
AGGREGATE
/OUTFILE='AggQuant'
/BREAK=Code
/QuantityB3 = MEAN(QuantityB3).
I don't know how your question was migrated here, and I don't have enough reputation here to add screenshots, which would help a lot. Anyhow, the procedure for your desired output is given below.
Go to Transform -> Count Values within Cases. In the dialog box that opens, write the name of the new variable, say "New", in Target Variable, move Quantity into the variables box, and click Define Values. In the new dialog box, select the radio button "Range, LOWEST through value", enter 3 in the box below it, press Add, then Continue, then OK. A new variable named "New" is created. Now go to Analyze -> Descriptive Statistics -> Frequencies. In the dialog box that opens, move the "New" variable into Variable(s) and press Statistics. In the new dialog box check Percentile(s), write 100 in the box, press Add, then Continue and OK. You get the desired results.
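As a sanity check on the expected numbers (4 rows and 57 %), the same count and proportion can be computed in a few lines of Python. This is not SPSS syntax, just a sketch of the arithmetic, and share_at_most is a name made up for the example.

```python
def share_at_most(quantities, threshold=3):
    """Return (count, percentage) of values that are <= threshold."""
    hits = sum(1 for q in quantities if q <= threshold)
    return hits, 100.0 * hits / len(quantities)


# QUANTITY column from the question.
quantities = [1, 4, 1, 3, 3, 12, 5]
count, pct = share_at_most(quantities)
print(count, f"{pct:.0f} %")  # 4 57 %
```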

Determine consecutive video clips

I have a long video stream, but unfortunately, it's in the form of 1000 15-second long randomly-named clips. I'd like to reconstruct the original video based on some measure of "similarity" of two such 15s clips, something answering the question of "the activity in clip 2 seems like an extension of clip 1". There are small gaps between clips, a few hundred milliseconds or so each. I can also manually fix up the results if they're sufficiently good, so results needn't be perfect.
A very simplistic approach can be:
(a) Create an automated process to extract the first and last frame of each video-clip in a known image format (e.g. JPG) and name them according to video-clip names, e.g. if you have the video clips:
clipA.avi, clipB.avi, clipC.avi
you may create the following frame-images:
clipA_first.jpg, clipA_last.jpg, clipB_first.jpg, clipB_last.jpg, clipC_first.jpg, clipC_last.jpg
(b) The sorting "algorithm":
1. Create a 'Clips' list of Clip-Records, each containing:
    (a) clip-name (string)
    (b) prev-clip-name (string)
    (c) prev-clip-diff (float)
    (d) next-clip-name (string)
    (e) next-clip-diff (float)
2. Apply the following processing:
    for each ClipX having ClipX.next-clip-name == "" do:
    {
        ClipX.next-clip-diff = <a big enough number>;
        // a clip must never be matched with itself
        for each ClipY != ClipX having ClipY.prev-clip-name == "" do:
        {
            float diff = ImageDif(ClipX.last-frame.jpg, ClipY.first-frame.jpg);
            if (diff < ClipX.next-clip-diff)
            {
                ClipX.next-clip-name = ClipY.clip-name;
                ClipX.next-clip-diff = diff;
            }
        }
        // only link back if a successor was actually found
        if (ClipX.next-clip-name != "")
        {
            Clips[ClipX.next-clip-name].prev-clip-name = ClipX.clip-name;
            Clips[ClipX.next-clip-name].prev-clip-diff = ClipX.next-clip-diff;
        }
    }
3. Scan the Clips list to find the record(s) with no <prev-clip-name> or
   (if all records have a <prev-clip-name>) find the record with the max <prev-clip-diff>.
   These are good candidates to be the first clip in a sequence.
4. Begin from the clip(s) found in step (3) and rename the clip files by adding
   a 5-digit number (00001, 00002, etc.) at the beginning of each filename, going
   from aClip to aClip.next-clip-name and removing each clip from the list.
5. Repeat steps 3 and 4 until there are no clips in the list.
6. Voila! You have your sorted clips list in the form of sorted video filenames!
   ...or you may end up with more than one sorted list (if you have enough
   'time-gap' between your video clips).
Very simplistic... but I think it can be effective...
PS1: Regarding the ImageDif() function: you can create a new DifImage, which is the pixel-wise absolute difference of the images ClipX.last-frame.jpg and ClipY.first-frame.jpg, and then sum all pixels of DifImage into a single floating-point ImageDif value. You can also optimize the process by aborting the difference (or the summing) as soon as the sum exceeds some limit, since you are only interested in small differences. An ImageDif value larger than an (experimentally chosen) limit means that the two images differ so much that the two clips cannot be adjacent.
PS2: As written, the sorting algorithm is O(n^2), because each clip is compared with every clip that does not yet have a predecessor. For 1000 video clips that is on the order of 500,000 image comparisons (or a little more if you allow it to not find a match for some clips), which is why the early abort described in PS1 helps.
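Here is a minimal Python sketch of the greedy matching described in this answer, under some loud assumptions: frames are represented as plain lists of pixel values instead of JPG files, the clip names are invented, and the functions image_diff, chain_clips and ordered_sequences are hypothetical helpers written for this example. Real code would load the frames with an image library. Note that without a max_diff limit the greedy step can link the true last clip back to an earlier one, so the sketch uses the limit from PS1.

```python
def image_diff(frame_a, frame_b):
    """Sum of absolute pixel differences between two equally sized frames."""
    return sum(abs(p - q) for p, q in zip(frame_a, frame_b))


def chain_clips(clips, max_diff):
    """Greedily link each clip to the unclaimed clip whose first frame best
    matches its last frame. clips maps name -> (first_frame, last_frame).
    Returns {name: next_name}; clips with no match below max_diff are omitted."""
    next_of = {}
    claimed = set()  # clips that already have a predecessor
    for name_x, (_first_x, last_x) in clips.items():
        best_name, best_diff = None, max_diff
        for name_y, (first_y, _last_y) in clips.items():
            if name_y == name_x or name_y in claimed:
                continue
            diff = image_diff(last_x, first_y)
            if diff < best_diff:
                best_name, best_diff = name_y, diff
        if best_name is not None:
            next_of[name_x] = best_name
            claimed.add(best_name)
    return next_of


def ordered_sequences(clips, next_of):
    """Walk the links from every clip that has no predecessor."""
    has_prev = set(next_of.values())
    sequences = []
    for start in clips:
        if start in has_prev:
            continue
        seq, current = [start], start
        while current in next_of:
            current = next_of[current]
            seq.append(current)
        sequences.append(seq)
    return sequences


# Toy data: three "clips" with two-pixel frames, listed in scrambled order.
clips = {
    "xq7": ([11, 10], [20, 20]),
    "b2k": ([20, 21], [30, 30]),
    "m9z": ([0, 0], [10, 10]),
}
links = chain_clips(clips, max_diff=5)
print(ordered_sequences(clips, links))  # [['m9z', 'xq7', 'b2k']]
```

If the gaps between some clips are too large, ordered_sequences returns several shorter lists, which matches the "more than one sorted list" case in step 6.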

splitting space delimited entries into new columns in R

I am coding a survey that outputs a .csv file. Within this csv I have some entries that are space delimited, which represent multi-select questions (e.g. questions with more than one response). In the end I want to parse these space delimited entries into their own columns and create headers for them so I know where they came from.
For example I may start with this (note that the multiselect columns have an _M after them):
Q1, Q2_M, Q3, Q4_M
6, 1 2 88, 3, 3 5 99
6, , 3, 1 2
and I want to go to this:
Q1, Q2_M_1, Q2_M_2, Q2_M_88, Q3, Q4_M_1, Q4_M_2, Q4_M_3, Q4_M_5, Q4_M_99
6, 1, 1, 1, 3, 0, 0, 1, 1, 1
6,,,,3,1,1,0,0,0
I imagine this is a relatively common issue to deal with but I have not been able to find it in the R section. Any ideas how to do this in R after importing the .csv ? My general thoughts (which often lead to inefficient programs) are that I can:
(1) pull column numbers that have the special suffix with grep()
(2) loop through (or use an apply) each of the entries in these columns and determine the levels of responses and then create columns accordingly
(3) loop through (or use an apply) and place indicators in appropriate columns to indicate presence of selection
I appreciate any help and please let me know if this is not clear.
I agree with ran2 and aL3Xa that you probably want to change the format of your data to have a different column for each possible response. However, if munging your dataset into a better format proves problematic, it is possible to do what you asked.
process_multichoice <- function(x) lapply(strsplit(x, " "), as.numeric)
q2 <- c("1 2 3 NA 4", "2 5")
processed_q2 <- process_multichoice(q2)
[[1]]
[1] 1 2 3 NA 4
[[2]]
[1] 2 5
The reason different columns for different responses are suggested is because it is still quite unpleasant trying to retrieve any statistics from the data in this form. Although you can do things like
# Number of responses given
sapply(processed_q2, length)
#Frequency of each response
table(unlist(processed_q2), useNA = "ifany")
EDIT: One more piece of advice. Keep the code that processes your data separate from the code that analyses it. If you create any graphs, keep the code for creating them separate again. I've been down the road of mixing things together, and it isn't pretty. (Especially when you come back to the code six months later.)
I am not entirely sure what you are trying to do, or what your reasons are for coding it like this. Thus my advice is more general, so feel free to clarify and I will try to give a more concrete response.
1) I see that you are coding the survey on your own, which is great because it means you have influence over your .csv file. I would NEVER use different kinds of separators in the same .csv file. Just do the naming from the very beginning, as you suggested in the second block.
Otherwise you might get into trouble with checkboxes, for example. Let's say someone checks 3 out of 5 possible answers and the next person checks only 1 (e.g. "don't know"). Now it will be much harder to create a spreadsheet (data.frame) type of results view, as opposed to having an empty field (which turns out to be an NA in R) that only needs to be recoded.
2) Another important question is whether you intend to do a panel survey (i.e. a longitudinal study asking the same participants over and over again). That (among many others) would be a good reason to think about saving your data to a MySQL database instead of a .csv. RMySQL can connect directly to the database and access its tables and, more importantly, its VIEWS.
Views really help with survey data since you can rearrange the data in different views, conditional on many different needs.
3) Besides all the personal opinion and experience, here's some (less biased) literature to get started:
Complex Surveys: A Guide to Analysis Using R (Wiley Series in Survey Methodology)
The book is comparatively simple and leaves out panel surveys but gives a lot of R Code and examples which should be a practical start.
To avoid re-inventing the wheel you might want to check LimeSurvey, a pretty decent (not speaking of the templates :) ) tool for survey conductors. Besides, the TYPO3 CMS extensions pbsurvey and ke_questionnaire (should) work well too (only tested pbsurvey).
Multiple choice items should always be coded as separate variables. That is, if you have 5 alternatives and multiple choice, you should code them as i1, i2, i3, i4, i5, i.e. each one is a binary variable (0-1). I see that you have values 3 5 99 for Q4_M variable in the first example. Does that mean that you have 99 alternatives in an item? Ouch...
First you should go on and create separate variables for each alternative in a multiple choice item. That is, do:
# note that I follow your example with Q4_M variable
dtf_ins <- as.data.frame(matrix(0, nrow = nrow(<initial dataframe>), ncol = 99))
# name vars appropriately
names(dtf_ins) <- paste("Q4_M_", 1:99, sep = "")
Now you have a data.frame of 0s, so what you need to do is put 1s in the appropriate positions (this is a bit cumbersome); a function will do the job...
# x is one space-delimited response, e.g. x <- "3 5 99"
# first you've got to change spaces to commas and convert the character variable to a numeric one
y <- paste("c(", gsub(" ", ", ", x), ")", sep = "")
z <- eval(parse(text = y))
# now you assign 1 according to the indexes in z
dtf_ins[1, z] <- 1
And that's pretty much it... basically, you would like to reconsider creating a data.frame with _M variables, so you can write a function that does this insertion automatically. Avoid for loops!
Or, even better, create a matrix with logicals, and just do dtf[m] <- 1, where dtf is your multiple-choice data.frame, and m is matrix with logicals.
I would like to help you more on this one, but I'm recuperating after a looong night! =) Hope that I've helped a bit! =)
Thanks for all the responses. I agree with most of you that this format is kind of silly but it is what I have to work with (survey is coded and going into use next week). This is what I came up with from all the responses. I am sure this is not the most elegant or efficient way to do it but I think it should work.
colnums <- grep("_M", colnames(dat))
responses <- nrow(dat)
for (i in colnums) {
  vec <- as.character(dat[, i])                                 # turn into character vector
  b <- lapply(strsplit(vec, " "), as.numeric)                   # split up and turn into numeric
  vals <- sort(unique(unlist(b)))                               # which values were used
  newcolnames <- paste(colnames(dat)[i], "_", vals, sep = "")   # column names
  e <- matrix(nrow = responses, ncol = length(vals))            # create new matrix for indicators
  colnames(e) <- newcolnames
  # next loop looks for responses and puts indicators in the correct places
  # (r, not i, so the outer loop variable is not reused)
  for (r in 1:responses) {
    e[r, ] <- ifelse(vals %in% b[[r]], 1, 0)
  }
  dat <- cbind(dat, e)
}
Suggestions for improvement are welcome.
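For comparison, here is the same expansion as a small language-neutral sketch in Python rather than R. It assumes the csv has already been read into a header list and rows of strings; expand_multiselect is a name invented for this example, and the choice to leave the indicators blank when a multi-select question was not answered follows the desired output shown in the question.

```python
def expand_multiselect(header, rows, suffix="_M"):
    """Replace each space-delimited multi-select column with 0/1 indicator columns."""
    # Collect the codes actually used in each multi-select column.
    codes = {}
    for j, name in enumerate(header):
        if name.endswith(suffix):
            used = {int(tok) for row in rows for tok in row[j].split()}
            codes[j] = sorted(used)

    # Build the new header: Q2_M becomes Q2_M_1, Q2_M_2, ...
    new_header = []
    for j, name in enumerate(header):
        if j in codes:
            new_header += [f"{name}_{code}" for code in codes[j]]
        else:
            new_header.append(name)

    # Build the new rows.
    new_rows = []
    for row in rows:
        out = []
        for j, cell in enumerate(row):
            if j not in codes:
                out.append(cell)
                continue
            chosen = {int(tok) for tok in cell.split()}
            if not chosen:
                out += [""] * len(codes[j])   # question left blank
            else:
                out += [1 if code in chosen else 0 for code in codes[j]]
        new_rows.append(out)
    return new_header, new_rows


header = ["Q1", "Q2_M", "Q3", "Q4_M"]
rows = [["6", "1 2 88", "3", "3 5 99"],
        ["6", "", "3", "1 2"]]
new_header, new_rows = expand_multiselect(header, rows)
print(new_header)
for row in new_rows:
    print(row)
```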
