Foreach loop and hash table with two values

Example 01 below is a simplified version of code I am using in a more complicated script. This code works.
THE PROBLEM is that I want to control the order in which the pairs of values (website name and URL) are processed by the foreach loop.
I UNDERSTAND that hashtables do not return their entries in the order they were defined; the order appears to be random.
WHAT HAVE I TRIED:
I've done some research on this using a book, Google, and Stack Exchange.
I have tried using the GetEnumerator() method and piping it to Sort Name, as shown in Example 02 below. That works in this context: it lists the sites in alphabetical order of site name. However, I do not want alphabetical order but a stipulated order. If the ordering needs to be done manually, that is fine, but I don't see how to achieve it. Furthermore, when I use this method in my actual script, the script runs, but this section displays "System.Collections.DictionaryEntry" instead of the expected output. Maybe I am not using it correctly.
WHAT ELSE HAVE I TRIED:
I also tried a hashtable that pairs each site name with a letter of the alphabet. This listed the sites in the order I had stipulated with the letters, BUT my actual script needs the two values (website name and URL) in the foreach loop that drives the rest of the script.
I also tried to store three values per entry in the hashtable, because I thought I had seen a note that this was possible. If it is possible, I could not find the syntax to make it work.
FINAL THOUGHT:
I was wondering whether two hashtables could solve this: one to set up the order in which the foreach loop processes the values (as in Example 03 below) and the other to provide the actual values used in the loop (as in Example 01).
Is that a possible solution, or is there an alternative?
Is there any other way to achieve this in a reasonable fashion?
Thanks in advance.
Example 01
$Sites = @{
'DuckDuckGo' = 'https://duckduckgo.com'
'Google' = 'https://www.google.com'
'Ixquick' = 'https://www.ixquick.com'
'Yahoo ' = 'https://search.yahoo.com'
'Dogpile' = 'http://www.dogpile.com'
'Yippee ' = 'http://www.yippy.com'
}
clear
write-host "`n`n`n"
foreach ($name in $Sites.Keys) {
write-host
"`t`t {0} `t`t`t {1}" -f $name , $Sites[$name]
write-host
}
Example 02
$Sites = @{
'DuckDuckGo' = 'https://duckduckgo.com'
'Google' = 'https://www.google.com'
'Ixquick' = 'https://www.ixquick.com'
'Yahoo ' = 'https://search.yahoo.com'
'Dogpile' = 'http://www.dogpile.com'
'Yippee ' = 'http://www.yippy.com'
}
clear
write-host "`n`n`n"
foreach($name in $Sites.GetEnumerator() | Sort Name)
{
$name
}
write-host "`n`n`n`n"
Example 03
$Sites = @{
'DuckDuckGo' = 'a'
'Google' = 'f'
'Ixquick' = 'c'
'Yahoo ' = 'z'
'Dogpile' = 'e'
'Yippee ' = 'b'
}
$Sites.GetEnumerator() | Sort Value

You're overcomplicating this. Just sort the keys of the hashtable, like this:
$Sites.Keys | Sort-Object | ForEach-Object {
"{0}`t{1}" -f $_, $Sites[$_]
}
or like this:
foreach ($name in ($Sites.Keys | Sort-Object)) {
"{0}`t{1}" -f $name, $Sites[$name]
}
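Sorting the keys gives alphabetical order. For a stipulated order, the two-table idea from the question does work. One map holds the URLs, a second map holds a sort key per name, and the names are sorted by that key. The following is a minimal sketch of that idea in Python, used here purely as an illustration. The `rank` map is hypothetical and reuses the letters from Example 03. The trailing spaces in 'Yahoo ' and 'Yippee ' are dropped.

```python
# Illustrative data: site name -> URL, as in the question's $Sites.
sites = {
    "DuckDuckGo": "https://duckduckgo.com",
    "Google": "https://www.google.com",
    "Ixquick": "https://www.ixquick.com",
    "Yahoo": "https://search.yahoo.com",
    "Dogpile": "http://www.dogpile.com",
    "Yippee": "http://www.yippy.com",
}

# Hypothetical second map: site name -> sort key, like the letters in Example 03.
rank = {
    "DuckDuckGo": "a",
    "Yippee": "b",
    "Ixquick": "c",
    "Dogpile": "e",
    "Google": "f",
    "Yahoo": "z",
}


def ordered_sites(urls, order):
    """Return (name, url) pairs sorted by each name's entry in `order`."""
    return [(name, urls[name]) for name in sorted(urls, key=lambda n: order[n])]


for name, url in ordered_sites(sites, rank):
    print(f"{name}\t{url}")
```

In PowerShell, the same idea is to sort the keys with a script block, for example $Sites.Keys | Sort-Object { $Rank[$_] }. Here $Rank is a hypothetical second hashtable like the one in Example 03.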
If you want the elements of the hashtable to be in a particular order from the start, create an ordered hashtable (the [ordered] attribute requires PowerShell 3.0 or later):
$Sites = [ordered]@{
'DuckDuckGo' = 'https://duckduckgo.com'
'Google' = 'https://www.google.com'
'Ixquick' = 'https://www.ixquick.com'
'Yahoo ' = 'https://search.yahoo.com'
'Dogpile' = 'http://www.dogpile.com'
'Yippee ' = 'http://www.yippy.com'
}
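For comparison, an ordered hashtable follows the rule "definition order is iteration order." Python's built-in dict has followed the same rule since Python 3.7. The short Python illustration below shows this; the subset of sites is arbitrary.

```python
# Since Python 3.7 a plain dict preserves insertion order, which is the
# behaviour [ordered] gives a PowerShell hashtable.
sites = {
    "Google": "https://www.google.com",
    "DuckDuckGo": "https://duckduckgo.com",
    "Dogpile": "http://www.dogpile.com",
}

# A key added later goes to the end of the iteration order.
sites["Yippee"] = "http://www.yippy.com"

for name, url in sites.items():
    print(f"{name}\t{url}")
```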
As a side note: don't mix Write-Host and PowerShell default output (echoing of bare strings). They do different things. The latter writes to the success output stream, whereas the former writes to the host console. The order in which host and stream output are displayed may be different from what you expect.

Related

foreach loops: wildcard asterisks (*), nested macros, string variables - resolving errors when using *

In Stata, in a foreach loop, I am searching for values within string variables, using strmatch() and asterisk (*) wildcards. I need the asterisks because I'm searching for words that fall into any part of the string.
These string variables are nested in local macros. However, using * in the foreach loop does not work in Stata IF it is part of a nested/descendant macro. Is this because:
A) wildcards within strings can never be used in foreach in Stata when using nested macros, or
B) it isn't the wildcard itself, but the * (asterisk) that is producing the error in foreach?
If B), is it possible to define a different character to mean "wildcard" instead of *, so that I can still use nested macros to organize my concepts before running foreach?
Note: I'm working with a large dataset, so using strmatch() without the foreach loop is not a solution, unless there is an alternative to foreach.
Here's an example, for drug class Q (parent/ancestor macro), with individual drug lists (descendant macro):
*chem term list
local drug_list1 " "A*B" "B*A" "A" "
local drug_list2 " "C*D" "D" "
*search term list
local drugclassQ " "drug_list1" "drug_list2" "
*check macro data successfully stored
di `drugclassQ'
(successfully stored)
*Search all drug terms in descriptions
foreach searchterm in "drugclassQ" {
gen byte `searchterm' = 0
di "Making column called `searchterm'"
foreach chemterm in ``searchterm'' {
di "Checking individual search terms in: `chemterm'"
foreach indiv in ``chemterm'' {
di "Searching all columns for *`indiv'*"
foreach codeterm in lower(variable) {
di "`searchterm': Checking `codeterm' column for *`indiv'*"
replace `searchterm' = 1 if strmatch(`codeterm', "*`indiv'*")
}
}
}
}
gen keep_term = .
replace keep_term=1 if drugclassQ==1
keep if keep_term==1
Here's an example of what I want the foreach loop to find when searching within the string variable chemical.
For example, searching for "A*B" within the parent macro drugclassQ should find drugs whose values of chemical look like the following:
Amg / Fmg /B
A/B
A/ B/R
Amg/dose / Emg/dose / Bmg/dose
(Note: mg = milligrams. This illustrates why the variable has to be a string: the drugs are entered into the database in different ways.)
Example output identifying strings with A and B anywhere within the values of 'Chemical':
Obs   Chemical (string variable)        drugclassQ
1     Amg / Fmg /B                      1
2     A/B                               1
3     A/ B/R                            1
4     Amg/dose / Emg/dose / Bmg/dose    1
5     A                                 0
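To make the intended matching rule concrete, here is a Python sketch of the same logic; it is not Stata code. A class expands to its lists, each list expands to wildcard terms, and a row is flagged when any term matches anywhere in the value. The dictionaries are hypothetical stand-ins for the local macros. The sketch reproduces the A*B column of the table above.

```python
from fnmatch import fnmatchcase

# Hypothetical nesting that mirrors the question's macros:
# drug class -> list names -> individual wildcard terms.
drug_lists = {
    "drug_list1": ["A*B", "B*A", "A"],
    "drug_list2": ["C*D", "D"],
}
drug_classes = {"drugclassQ": ["drug_list1", "drug_list2"]}


def expand_class(cls):
    """Flatten a class into its individual search terms (the nested loops)."""
    return [term for lst in drug_classes[cls] for term in drug_lists[lst]]


def flag_rows(values, terms):
    """Return 1 if a value contains any term anywhere, else 0.

    Each term is wrapped as *term*, like strmatch(x, "*`indiv'*").
    fnmatchcase is case-sensitive; it also treats [...] as a character
    class, which goes beyond the * and ? wildcards strmatch() documents.
    """
    return [int(any(fnmatchcase(v, f"*{t}*") for t in terms)) for v in values]


chemical = ["Amg / Fmg /B", "A/B", "A/ B/R", "Amg/dose / Emg/dose / Bmg/dose", "A"]
print(flag_rows(chemical, ["A*B"]))  # prints [1, 1, 1, 1, 0]
```

Passing expand_class("drugclassQ") instead of ["A*B"] also flags row 5, because the plain term "A" is in drug_list1.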
My code works when I don't use asterisks, but that defeats the purpose of how I'm using the foreach loop, i.e. using wildcards stored in nested macros.
Any solutions?

How to identify text within an object's string

I want to find specified text within a string in a given column, and count how many times that string is repeated throughout the entire column.
For example, find "XX" within a string in a column and print the number of times that text was found to a dialogue box.
Module m = current
Object o
string s
string x
int offset = null
int len = null
int c
for o in m do
{
string s = probeAttr_(o, "AttributeA")
x = o."Object Text" ""
if(findPlainText(s, "XX", offset, len, false)){
print "Success "
} else {
print "Failed to match"
}
}
I have tried to use the findPlainText command, but every object is inadvertently evaluated as a match.
I also made the code print 'Success' or 'Failed to match' so that I can at least count what is being matched. Unfortunately, it seems that everything is!
My understanding is that probeAttr_(o, "AttributeA") lets me specify which column (attribute) to search, and that o."Object Text" "" lets me look within any object and search for any text it contains. I also realize that variable x is not used, but I assume it could somehow be used to solve this issue.
I only use DOORS at a surface level, but having this ability will save other staff a lot of time. I realize this may be possible with DOORS' advanced filtering, but with code I could combine it with other simple commands to save time.
Thank you in advance for your help!!
If you want to count every occurrence of a specified string in an attribute's text across all objects, I think Mike's proposal is the correct answer. If you only want to know whether the specified string occurs at least once in each object's attribute, I suggest using Regexp. I find it very fast and quite powerful, yet still easy to use, e.g.:
Module m = current
Object o
Regexp reSearch = regexp2 "XX"
int iCounter = 0
string strOT = ""
for o in m do {
strOT = o."Object Text" ""
if (reSearch strOT) {
iCounter++
}
}
print "Counted: '" iCounter "'\n"
Most of this has been answered in (DXL/Doors) How to find and count a text in a string?
You can easily exchange the "print" with a counter.
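The two counts discussed above are different numbers: the total number of occurrences, and the number of objects that contain at least one occurrence. Below is a small Python sketch, not DXL, that computes both. A plain list of strings is a hypothetical stand-in for each object's "Object Text".

```python
import re


def count_matches(texts, pattern):
    """Return (total matches, number of texts with at least one match).

    `texts` stands in for each object's "Object Text"; `pattern` is a
    regular expression, like the argument to DXL's regexp2.
    re.findall counts non-overlapping matches, so "XXX" holds one "XX".
    """
    total = sum(len(re.findall(pattern, t)) for t in texts)
    containing = sum(1 for t in texts if re.search(pattern, t))
    return total, containing


object_texts = ["XX and XX", "no match here", "aXXb", "XXX"]
print(count_matches(object_texts, "XX"))  # prints (4, 3)
```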

How to get dxl scripts to run faster

I have created a DXL script that goes through every row of a couple of modules and prints out certain rows and their information. It uses a for loop over the rows. When it hits a row I am interested in, it saves the values in that row's columns to string variables and prints those variables. The script does not take long if a module has few rows of interest. However, if I run it on multiple modules at once, or a module has many rows of interest, it can take hours. I can show more of my code if this is not enough to suggest solutions. Any help would be appreciated!
I have tried storing the print statements in a skip list and then iterating over the skip list to print each value, but that did not make the script any faster.
string sep=","
for o in m do
{
string ver1= o."column1"
if (checkIf(o) && (!(isDeleted(o))))
{
string ver2= o."column2"
string onum=number(o) ""
string otext = o."Object Text"
print ver1 sep ver2 sep onum
}
}
Initial optimization:
for o in m do
{
if (checkIf(o) && (!(isDeleted(o)))) {
//This doesn't appear to be used?
//string otext = o."Object Text"
print o."column1" "," o."column2" "," number(o) "\n"
}
}
Reasoning: DOORS has a system called the string table that holds declared strings in memory, and it doesn't necessarily clear them out when appropriate. By constantly declaring strings inside your loop, you may be running into the memory limits of that system.
The problem with this is that the results all end up in the small 'DXL editor' output window and have to be copied and pasted somewhere else to be used.
Secondary optimization:
// Turn off runlimit for timing
pragma runLim , 0
// Set file location - CHANGE FOR YOUR COMPUTER
string csv_location = "C:/Users/Username/Desktop/Info_Collection.csv"
// Open stream
Stream out = append csv_location
// Set headers
out << "Module,Column 1,Column 2,Object Number" "\n"
// Define your loop constraints
Module m = current
Object o
// Run your loop
for o in m do
{
if (checkIf(o) && (!(isDeleted(o)))) {
//This doesn't appear to be used?
//string otext = o."Object Text"
out << fullName(m)","o."column1" "," o."column2" "," number(o) "\n"
}
}
close out
This will let you run the same script in different modules, all outputting to the same CSV file, which you can then load into Excel or your data manipulation engine of choice.
This keeps the data collection happening outside of DOORS, so if something goes awry, you can track down where it occurred.
My third optimization would be to use a list of modules (in Excel, say) as the input to this analysis, but that might be going too far.
If this doesn't help, then we can start examining other issues.
Note: I would still like to know what 'checkIf' is/does.
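The secondary optimization opens the stream in append mode and writes the header line on every run. Running it once per module therefore repeats the header in the combined CSV. Values containing commas would also shift columns, because the row is built by plain string concatenation. Below is a Python sketch, not DXL, of the same append-per-module idea that avoids both problems. The module names and file name are hypothetical.

```python
import csv
import os
import tempfile

HEADER = ["Module", "Column 1", "Column 2", "Object Number"]


def append_rows(path, module_name, rows):
    """Append (column1, column2, object_number) rows for one module.

    The header is written only when the file is new or empty, so calling
    this once per module does not repeat it. csv.writer quotes values that
    contain commas, which plain string concatenation would not.
    """
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(HEADER)
        for col1, col2, number in rows:
            writer.writerow([module_name, col1, col2, number])


# Usage: two hypothetical modules appended to the same file.
demo_path = os.path.join(tempfile.mkdtemp(), "Info_Collection.csv")
append_rows(demo_path, "/Project/Module A", [("v1", "x", "1"), ("v2, beta", "y", "2")])
append_rows(demo_path, "/Project/Module B", [("v3", "z", "3")])
with open(demo_path, newline="") as f:
    demo_rows = list(csv.reader(f))
```

In DXL, the equivalent change would be to write the header only when the output file does not yet exist.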
If your objective is to speed up the script, and most objects are of no interest to you, the most effective way I know of is to filter out most of the uninteresting objects first. For example, a filter of obj."Object Text" != "" removes headings. If you are only interested in requirements, use obj."Object Text" contains "[Ss]hall", and so on. Save the filter as a view for later use.
The loop "for o in m do {" respects the display set, so if it doesn't have to touch most of the objects, the script will run a lot faster!
Hope this helps.
Don

Find variables in a module

I'm new to DXL and I want to extract the variables containing _I_, _O_ and _IO_ from a module and export them to a CSV file. Please help me with this.
E.g.:
ADCD_WE_I_DFGJDGFJ_12_QWE and CVFDFH_AWEQ_I_EHH[X] are set to some value.
This question has two parts:
1. You want to find variables that contain those parts in their names.
2. You want to export them to a .csv file.
Someone else may be able to suggest a better way, but the only approach that comes to mind right now for part 1 is this:
Loop over the attributes in the module (for ad in m do {}) and get the string of the attribute names.
I am assuming that your attributes have values of _I_, _O_ or _OI_, like alpha = "_I_"? Are these enumerated values?
If they are, then you only need to check the value of each object's attribute. If the attribute value is one of those, add the attribute to a structure such as a skip list. A counter for each value (countI, countO, countOI) is useful here, because you can use the counters as the keys in the put() calls for the skip lists.
Once you have found all the attributes, you can move on to writing the CSV:
Stream out = write("filepathname/filename.csv") // to declare the stream
out << "_I_, _O_, _OI_\n"
Then you can loop over your skip lists at the same time:
AttrDef at
int ijk = 0; bool finito = false
while(!finito) do {
// find(skip, key, value) fills 'at' and returns true if the key exists
if(ijk<countI && find(skipListI, ijk, at)) out << at.name ","
else out << ","
if(ijk<countO && find(skipListO, ijk, at)) out << at.name ","
else out << ","
if(ijk<countOI && find(skipListOI, ijk, at)) out << at.name "\n"
else out << "\n"
ijk++
// check if the next iteration would be outside of bounds on all lists
if(ijk >= countI && ijk >= countO && ijk >= countOI) finito = true
}
close out
Instead of at.name, you could print whatever part of the attribute you want: the name, the value, "name:value", and so on.
I have not run this, so you will be left to do any troubleshooting.
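The loop above walks three lists in step and writes an empty cell once a list runs out. Below is a Python sketch, not DXL, of the same column layout. It assumes the tokens are searched for in attribute names, which the question leaves open, and it uses _IO_ as in the question. Apart from the two names quoted in the question, the attribute names are made up. itertools.zip_longest with fillvalue="" does the padding that the while loop does by hand.

```python
import csv
import io
from itertools import zip_longest

TOKENS = ("_I_", "_O_", "_IO_")


def group_by_token(names):
    """Map each token to the names that contain it, keeping input order."""
    return {tok: [n for n in names if tok in n] for tok in TOKENS}


def write_columns(groups, out):
    """Write one CSV column per token, padding shorter columns with ''."""
    writer = csv.writer(out)
    writer.writerow(TOKENS)
    for row in zip_longest(*(groups[t] for t in TOKENS), fillvalue=""):
        writer.writerow(row)


# Usage with example attribute names (the first two are from the question).
names = ["ADCD_WE_I_DFGJDGFJ_12_QWE", "CVFDFH_AWEQ_I_EHH[X]",
         "PUMP_O_SPEED", "BUS_IO_STATE", "PLAIN_NAME"]
buf = io.StringIO()
write_columns(group_by_token(names), buf)
print(buf.getvalue())
```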
--
I hope this idea gets you started. Write out what you want on paper first and then follow that plan. The key tools I have mentioned that would be useful here are skip lists and Stream write (or append, if you want to keep adding to the file).
In the future, please consider making your question clearer. Are you looking for those search terms in the name of the attribute or in its value? Do you want to print the names, the values, or something else? What format will the .csv file have? Any information will help your question get answered.

How can I filter a SET by its concatenated values according to another SET in Redis?

I have a filter optimization problem in Redis.
I have a Redis SET which holds the doc.pos pairs of a type in a corpus.
Example:
smembers type_in_docs.1
result: doc.pos pairs
array (size=216627)
0 => string '2805.2339' (length=9)
1 => string '2410.14208' (length=10)
2 => string '3516.1810' (length=9)
...
I create another Redis set on the fly according to user choices.
It contains the selected docs.
smembers filteredDocs
I want to filter the doc.pos pairs in the "type_in_docs" set according to the user's doc ID choices.
If I didn't store concatenated values in the set, this would be easy with SINTER.
So I implemented a PHP filter, shown below.
It works but needs optimization.
With a large doc.pos set it takes far too long (beyond roughly 150,000 members!).
$concordance= $this->redis->smembers('types_in_docs.'.$typeID);
$filteredDocs= $this->redis->smembers('filteredDocs');
$filtered = array_filter($concordance, function($pairs) use ($filteredDocs) {
if( in_array(substr($pairs, 0, strpos($pairs, '.')), $filteredDocs) ) return true;
});
I tried a sorted set with the doc ID as the score, but couldn't find an intersect or filter option for score values.
I am looking for a Redis-based solution using supporting keys, sets or a Lua script to reduce the time, but I haven't found anything.
How can I filter Redis sets with concatenated values?
Thanks for your help.
Your code is slow primarily because you're moving a lot of data from Redis to your PHP filter. The general goal here should be to perform as much filtering as possible on the server. To do that, you'd need to pay some price in CPU and RAM.
There are many ways to do this; here's one:
Ensure you're using Redis v2.8.9 or above.
To allow efficient lookups by doc only, keep your doc.pos pairs as they are but store them in a Sorted Set with score = 0, e.g.:
ZADD type_in_docs.1 0 2805.2339 0 2410.14208 0 3516.1810
This lets you mimic SISMEMBER for a doc in the set with the following command (the trailing "." keeps doc 2805 from also matching doc 28050):
ZRANGEBYLEX type_in_docs.1 "[<docID>." "(<docID>.\xff"
You can now just run SMEMBERS on the (usually) smaller filteredDocs set and then call ZRANGEBYLEX for each member for immediate gains.
If you want to do better: in extreme cases (i.e. a large filteredDocs set and a small type_in_docs set), you should do the reverse.
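Before running anything against Redis, a local model of the lookup can help check the range bounds. This Python sketch is not Redis client code. It treats the score-0 sorted set as a sorted list and uses bisect to emulate the "[start (stop" range of ZRANGEBYLEX. Except for those quoted in the question, the member values are made up. The "." separator in the bounds matters: with a bare doc-ID prefix, doc 2805 would also pick up members of doc 28050.

```python
from bisect import bisect_left

# Sorted-set members that all have score 0 are ordered lexicographically;
# a sorted Python list models that ordering (doc.pos values are ASCII).
members = sorted(["2805.2339", "2410.14208", "3516.1810", "2805.17", "28050.4"])


def lex_range(items, start, stop):
    """Items m with start <= m < stop, like ZRANGEBYLEX key [start (stop."""
    return items[bisect_left(items, start):bisect_left(items, stop)]


def pairs_for_doc(items, doc_id):
    """All doc.pos members of one doc; the '.' keeps 2805 from matching 28050."""
    prefix = doc_id + "."
    return lex_range(items, prefix, prefix + "\xff")


def filter_pairs(items, filtered_docs):
    """Loop over the (usually smaller) filter set, one range lookup per doc."""
    result = []
    for doc_id in filtered_docs:
        result.extend(pairs_for_doc(items, doc_id))
    return result


print(filter_pairs(members, ["3516", "2805"]))
```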
If you want to do even better, use Lua to wrap up the filtering logic - something like:
-- #usage: redis-cli --eval filter_doc_pos.lua <filter set keyname> <type pairs keyname>
-- #returns: list of matching doc.pos pairs
local r = {}
for _, fv in pairs(redis.call("SMEMBERS", KEYS[1])) do
-- "." ends the doc ID prefix; "\255" is byte 0xFF (Redis uses Lua 5.1, which has no \x escapes)
local t = redis.call("ZRANGEBYLEX", KEYS[2], "[" .. fv .. ".", "(" .. fv .. ".\255")
for _, tv in pairs(t) do
r[#r+1] = tv
end
end
return r
