HDFql Get Size of Group - hdf5

I am wondering how to get the number of datasets within a group using C++ and HDFql. Currently I have tried something like this (inspired by the HDFql manual):
char script[1024];
uint64_t group_size = 0;
sprintf(script, "SHOW my_group SIZE INTO MEMORY %d", HDFql::variableTransientRegister(&group_size));
HDFql::execute(script);
But unfortunately this doesn't work at all.
Many thanks!

One possible way to solve your issue is to retrieve all the datasets stored in, e.g., group my_group like this:
HDFql::execute("SHOW DATASET my_group/");
Then get the number of datasets found using the HDFql function cursorGetCount, which returns the number of elements in the cursor. Example:
std::cout << "Number of datasets: " << HDFql::cursorGetCount();
As a side note, if you wish to retrieve all the datasets stored in group my_group and in sub-groups do the following (the LIKE option activates recursive search in HDFql):
HDFql::execute("SHOW DATASET my_group/ LIKE **");
For more information, please refer to HDFql reference manual and quick start.
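Putting the two steps together, a minimal C++ sketch might look like the following. This assumes the HDFql C++ bindings are installed and that the file name (my_file.h5) is purely illustrative; the only HDFql calls used are execute and cursorGetCount, as described above:

```cpp
#include <iostream>
#include "HDFql.hpp"  // HDFql C++ header (assumed to be available)

int main()
{
    // open the HDF5 file that contains my_group (file name is illustrative)
    HDFql::execute("USE FILE my_file.h5");

    // populate the cursor with all datasets directly inside my_group
    HDFql::execute("SHOW DATASET my_group/");

    // the cursor now holds one element per dataset found
    std::cout << "Number of datasets: " << HDFql::cursorGetCount() << std::endl;

    return 0;
}
```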


How to read out a list of cases in one variable in SPSS and use that to add data?

To explain my problem I use this example data set:
SampleID   Date          Project   Problem
03D00173   03-Dec-2010             1,00
03D00173   03-Dec-2010             1,00
03D00173   28-Sep-2009   YNTRAD
03D00173   28-Sep-2009   YNTRAD
Now, the problem is that I need to replace the text "YNTRAD" with "YNTRAD_PILOT" but only for the cases with Date = 28-Sep-2009.
This example is part of a much larger database, with many more cases having Project=YNTRAD and Date=28-Sep-2009, so I cannot simply select first all cases with 28-Sep-2009, then check which of these cases have Project=YNTRAD, and then replace. Instead, what I need to do is:
1. Look at each case that has a 1,00 in Problem (these are problem cases).
2. Then find the SampleID that corresponds with that sample.
3. Then find all other cases with the same SampleID but with Date=28-Sep-2009 (this is needed because only those samples are part of a pilot study) and replace YNTRAD in Project with YNTRAD_PILOT.
I read a lot about:
- LOOP
- DO REPEAT
- DO IF
but I don't know how to use these in solving this problem.
I first tried making a list containing only the sample ID's that need eventually to be changed (again, this is part of a much larger database).
STRING SampleID2 (A20).
IF (Problem=1) SampleID2=SampleID.
EXECUTE.
AGGREGATE
/OUTFILE=*
/BREAK=SampleID2
/n_SampleID2=N.
This gives a dataset with only the SampleIDs for which a change should be made. However, I don't know how to read this dataset case by case, look up each SampleID in the overall file with all the data, and then change only those cases where Date = 28-Sep-2009.
It sounds like once we can identify the IDs that need to be changed we've done the tricky part here. We can use AGGREGATE with MODE=ADDVARIABLES to add a problem Id counter variable to our dataset. From there, it's as you'd expect.
* Add var IdProblemCnt to your database . Stores # of times a given Id had a record with Problem = 1.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=SampleId
/IdProblemCnt=CIN(Problem, 1, 1) .
EXE .
* Once we've identified the "problem" Ids we can RECODE the Project var .
DO IF (IdProblemCnt > 0 AND Date = DATE.MDY(9,28,2009)) .
RECODE Project ('YNTRAD' = 'YNTRAD_PILOT') .
END IF .
EXE .
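The same two-pass logic (first flag the IDs that ever had a problem record, then recode only the rows matching the pilot-study date) can be sketched in plain Python as a sanity check; the field names mirror the SPSS variables from the example:

```python
# rows mirror the SPSS dataset: SampleID, Date, Project, Problem
rows = [
    {"SampleID": "03D00173", "Date": "03-Dec-2010", "Project": "",       "Problem": 1},
    {"SampleID": "03D00173", "Date": "03-Dec-2010", "Project": "",       "Problem": 1},
    {"SampleID": "03D00173", "Date": "28-Sep-2009", "Project": "YNTRAD", "Problem": None},
    {"SampleID": "03D00173", "Date": "28-Sep-2009", "Project": "YNTRAD", "Problem": None},
]

# pass 1: like AGGREGATE MODE=ADDVARIABLES, collect IDs that ever had Problem = 1
problem_ids = {r["SampleID"] for r in rows if r["Problem"] == 1}

# pass 2: like the DO IF / RECODE block, rename only on the pilot-study date
for r in rows:
    if (r["SampleID"] in problem_ids
            and r["Date"] == "28-Sep-2009"
            and r["Project"] == "YNTRAD"):
        r["Project"] = "YNTRAD_PILOT"
```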

Esper very simple context and aggregation

I have a quite simple problem to model and I don't have experience in Esper, so I may be headed the wrong way; I'd like some insight.
Here's the scenario: I have one stream of events "ParkingEvent", with two types of events "SpotTaken" and "SpotFree". So I have an Esper context both partitioned by id and bordered by a starting event of type "SpotTaken" and an end event of type "SpotFree". The idea is to monitor a parking spot with a sensor and then aggregate data to count the number of times the spot has been taken and also the time occupation.
That's it, no time window or anything, so it seems quite simple, but I struggle to aggregate the data. Here's the code I have so far:
create context ParkingSpotOccupation
    context PartitionBySource
        partition by source from SmartParkingEvent,
    context ContextBorders
        initiated by SmartParkingEvent(type = "SpotTaken") as startEvent
        terminated by SmartParkingEvent(type = "SpotFree") as endEvent;

@Name("measurement_occupation")
context ParkingSpotOccupation
insert into CreateMeasurement
select
    e.source as source,
    "ParkingSpotOccupation" as type,
    {
        "startDate", min(e.time),
        "endDate", max(e.time),
        "duration", dateDifferenceInSec(max(e.time), min(e.time))
    } as fragments
from
    SmartParkingEvent e
output snapshot when terminated;
I get the same data for min and max, so I'm guessing I'm doing something wrong.
When I'm using context.ContextBorders.startEvent.time and context.ContextBorders.endEvent.time instead of min and max, the measurement_occupation statement is not triggered.
Given that measurements have already been computed by the EPL that you provided, this counts the number of times the spot has been taken (and freed) and totals up the duration:
select source, count(*), sum(duration) from CreateMeasurement group by source
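As a plain-Python sanity check of what that grouped query computes (the measurement records are assumed to carry a source and a duration in seconds, as in the CreateMeasurement stream; the sample values are illustrative):

```python
from collections import defaultdict

# each CreateMeasurement event carries its source and occupation duration (seconds)
measurements = [
    {"source": "spot-1", "duration": 120},
    {"source": "spot-1", "duration": 300},
    {"source": "spot-2", "duration": 45},
]

# equivalent of: select source, count(*), sum(duration) ... group by source
stats = defaultdict(lambda: {"count": 0, "total_duration": 0})
for m in measurements:
    s = stats[m["source"]]
    s["count"] += 1
    s["total_duration"] += m["duration"]
```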

SSRS: Adding a filter that returns information from entire group

I am trying to create a report in SSRS. Below is a small example of what my dataset looks like.
Example Data Set
So, there are three different stores (A,B,C) and each has a landlord (a,b,c). Landlords can pay via three different methods (1,2,3) and the amounts paid per method are shown.
Right now, I have two filters set up. The first is by Store and the second is by Landlord.
What I am having trouble with is:
How can I set up a filter by the Amount that will return information from an entire Store/Landlord?
So for example, if I wanted to filter Amount by 150, I would like to return all the "payment" information for the store(s) that have a payment of 150. Such as the following:
Desired Result
Is it possible to add a filter to return information from the entire group? (Store and Landlord are the group in this case)
I am new to SSRS so any help/insight would be greatly appreciated!
You can use LOOKUPSET to locate the matching groups, JOIN to put the results in a string, and the INSTR function to filter your results.
=IIF(ISNOTHING(Parameters!AMOUNT.Value) OR INSTR(
    JOIN(LOOKUPSET(Fields!Amount.Value, Fields!Amount.Value, Fields!Store.Value, "DataSet1"), ", "),
    Fields!Store.Value
) > 0, 1, 0)
This translates to:
If the Store value is found (INSTR > 0) in the list (JOIN) of Stores where the Amount is the current Amount (Lookupset).
In your filter, put the above expression in the Expression, change the type to INTEGER and the Value to 1.
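The effect of that filter can be sketched in plain Python: find the stores that have at least one payment equal to the filter amount, then keep every row belonging to those stores (column names mirror the example dataset; the sample rows are illustrative):

```python
rows = [
    {"Store": "A", "Landlord": "a", "Method": 1, "Amount": 150},
    {"Store": "A", "Landlord": "a", "Method": 2, "Amount": 200},
    {"Store": "B", "Landlord": "b", "Method": 1, "Amount": 100},
]

amount_filter = 150  # plays the role of Parameters!AMOUNT.Value

# LOOKUPSET/JOIN step: stores that have a payment of the filter amount
matching_stores = {r["Store"] for r in rows if r["Amount"] == amount_filter}

# INSTR step: keep every row whose store is in that set
filtered = [r for r in rows if r["Store"] in matching_stores]
```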

Categorizing Hastags based on similarities

I have different documents with a list of hashtags in each. I would like to group them under the most relevant hashtag (which would be present in the document itself).
E.g.: if there are #Eco, #Ecofriendly, #GoingGreen - I would like to group all these under the most relevant and representative hashtag (say #Eco). How should I be approaching this, and what techniques and algorithms should I be looking at?
I would create a bipartite graph of documents-hashtags and use clustering on a bipartite graph:
http://www.cs.utexas.edu/users/inderjit/public_papers/kdd_bipartite.pdf
This way I am not using the content of the document, but just clustering the hashtags, which is what you wanted.
Your question is not very precise, and as such may have multiple answers. However, if we assume that you literally want to "group all these under the most common hashtag", then simply loop through all hashtags, compute how often they come up, and then for each document select the one with the highest number of occurrences.
Something like
N = {}
for D in documents:
    for h in D.hashtags:
        if h not in N:
            N[h] = 0
        N[h] += 1

for D in documents:
    best = None
    for h in D.hashtags:
        if best is None or N[best] < N[h]:
            best = h
    print('Document', D, 'should be tagged with', best)

R package Twitter to analyze tweets text

I'm using the twitteR package (specifically, the searchTwitter function) to export in CSV format all the tweets containing a specific hashtag.
I would like to analyze their text and discover how many of them contain a specific list of words that I have just saved in a file called importantwords.txt.
How can I create a function that could return me a score of how many tweets contain the words that I have written in my file importantwords.txt?
Pseudocode:
for (every word in importantwords.txt):
    i = 0
    for (every line in tweets.csv):
        if (line contains word):
            i = i + 1
    print(word: i)
Is that along the lines of what you wanted?
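A runnable version of that counting logic, sketched in Python (the word list and tweets stand in for the contents of importantwords.txt and tweets.csv):

```python
# stand-ins for the contents of importantwords.txt and tweets.csv
important_words = ["great", "bad"]
tweets = ["what a great day", "great stuff", "terrible"]

# for each important word, count how many tweets contain it
counts = {w: sum(1 for t in tweets if w in t) for w in important_words}
```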
I think your best bet is to use the tm package.
http://cran.r-project.org/web/packages/tm/index.html
This fella uses it to create Word Clouds with the information. Looking through his code will probably help you out too.
http://davetang.org/muse/2013/04/06/using-the-r_twitter-package/
If your important words are just things to avoid like "the", "a", and so on, this will work fine. If it's for something in particular, you'll need to loop over the corpus with your list of words, retrieving the counts.
Hope it helps
Nathan
