Applying GroupBy and then Count in Google Dataflow

I have the following data in Google Cloud Storage:
Advertiser | Event
__________________
100 | Click
101 | Impression
100 | Impression
100 | Impression
101 | Impression
My output of the pipeline should be something like
Advertiser | Clicks | Impressions
100 | 1 | 2
101 | 0 | 2
First I used groupByKey, the output is like
100 Click, Impression, Impression
101 Impression, Impression
Now, is it possible to count the values in each KV?
Currently I just compare strings to count the clicks and impressions.
Is it possible to use the Count transforms here?
Or is there another transform that should be used here?
Or is the way I did it the only way?
Thanks,
Sam.

I'm assuming your input is available as a PCollection<KV<Long, EventType>> input where the Long is the advertiser ID and EventType is an enum { CLICK, IMPRESSION, possibly something else }.
I'm also assuming you want the output to be a PCollection<KV<Long, AdvertiserStats>> where AdvertiserStats is a class with fields "numClicks", "numImpressions".
In that case one way to achieve what you want is to use Combine: input.apply(Combine.<Long, EventType, AdvertiserStats>perKey(new ComputeAdvertiserStatsFn())), where ComputeAdvertiserStatsFn is defined something like this:
public class ComputeAdvertiserStatsFn
        extends CombineFn<EventType, AdvertiserStats, AdvertiserStats> {
    public AdvertiserStats createAccumulator() { return new AdvertiserStats(); }
    // addInput must return the (updated) accumulator.
    public AdvertiserStats addInput(AdvertiserStats stats, EventType input) {
        switch (input) {
            case CLICK: stats.numClicks++; break;
            case IMPRESSION: stats.numImpressions++; break;
            default: break; // other event types: handling depends on your application
        }
        return stats;
    }
    public AdvertiserStats mergeAccumulators(Iterable<AdvertiserStats> stats) {
        AdvertiserStats merged = createAccumulator();
        for (AdvertiserStats item : stats) {
            merged.numClicks += item.numClicks;
            merged.numImpressions += item.numImpressions;
        }
        return merged;
    }
    public AdvertiserStats extractOutput(AdvertiserStats stats) { return stats; }
}
This should perform very well because most of the grouping and counting will happen locally.
Currently, AFAIK, there is no PTransform that would do the work of ComputeAdvertiserStatsFn for you. I think the ideal interface would look something like input.apply(Combine.perKey(Count.perElement())), but it wouldn't work with the way these are currently defined.
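To make the "grouping and counting will happen locally" point concrete, here is a minimal plain-Java sketch of the accumulate-then-merge pattern. It does not use any Dataflow classes; CombineSketch, combineLocally and advertiser100 are hypothetical names, and the split of events across two "workers" is simulated by hand.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; no Dataflow classes are used here.
final class CombineSketch {
    enum EventType { CLICK, IMPRESSION }

    // Hypothetical accumulator mirroring the AdvertiserStats class assumed above.
    static final class AdvertiserStats {
        long numClicks;
        long numImpressions;
    }

    static AdvertiserStats createAccumulator() {
        return new AdvertiserStats();
    }

    static AdvertiserStats addInput(AdvertiserStats stats, EventType input) {
        if (input == EventType.CLICK) {
            stats.numClicks++;
        } else if (input == EventType.IMPRESSION) {
            stats.numImpressions++;
        }
        return stats;
    }

    static AdvertiserStats mergeAccumulators(Iterable<AdvertiserStats> parts) {
        AdvertiserStats merged = createAccumulator();
        for (AdvertiserStats part : parts) {
            merged.numClicks += part.numClicks;
            merged.numImpressions += part.numImpressions;
        }
        return merged;
    }

    // Folds one worker's share of the events into a partial accumulator.
    static AdvertiserStats combineLocally(List<EventType> events) {
        AdvertiserStats stats = createAccumulator();
        for (EventType event : events) {
            stats = addInput(stats, event);
        }
        return stats;
    }

    // Events for advertiser 100 from the question, split across two simulated workers.
    static AdvertiserStats advertiser100() {
        AdvertiserStats a = combineLocally(Arrays.asList(EventType.CLICK, EventType.IMPRESSION));
        AdvertiserStats b = combineLocally(Arrays.asList(EventType.IMPRESSION));
        return mergeAccumulators(Arrays.asList(a, b));
    }
}
```

Only the two small partial accumulators need to be brought together for the merge, not the individual events, which is where the efficiency of this approach comes from.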


Vaadin7 table sort label column numerically

I have a Vaadin table with multiple columns. One of the columns is of class Label:
container.addContainerProperty(ID_COLUMN, Label.class, "");
and I fill it up like
referenceItem.getItemProperty(ID_COLUMN).setValue(new Label(new Integer(reference.getId()).toString()));
When I sort the table by clicking on this column, it sorts the data like
1
10
100
2
200
7
So, I tried to change the class to Integer, then it works fine, but I get the numbers with comma like
1
2
...
..
7,456
8,455
How can I have the data sorted numerically and displayed without commas?
I was able to figure it out. I used Integer as the class for my column and overrode formatPropertyValue as follows:
referenceTable = new Table()
{
    @Override
    protected String formatPropertyValue(final Object a_row_id, final Object a_col_id, final Property<?> a_property)
    {
        // Render Integer cells without a grouping separator; sorting still uses the Integer value.
        if (a_property.getType() == Integer.class && null != a_property.getValue())
        {
            DecimalFormat df = (DecimalFormat) DecimalFormat.getInstance(getLocale());
            df.applyLocalizedPattern("#0");
            return df.format(a_property.getValue());
        }
        return super.formatPropertyValue(a_row_id, a_col_id, a_property);
    }
};
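The two effects in play here can be reproduced outside Vaadin. The following is a small plain-Java sketch (SortAndFormatSketch and its method names are made up for illustration): text sorts character by character, numbers sort by value, and a DecimalFormat with the pattern "#0" prints no grouping separator.

```java
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class SortAndFormatSketch {
    // Sorting the text form compares character by character, so "10" precedes "2".
    static List<String> sortAsText(List<Integer> ids) {
        List<String> labels = new ArrayList<>();
        for (Integer id : ids) {
            labels.add(id.toString());
        }
        Collections.sort(labels);
        return labels;
    }

    // Sorting the Integer values compares them numerically.
    static List<Integer> sortAsNumbers(List<Integer> ids) {
        List<Integer> sorted = new ArrayList<>(ids);
        Collections.sort(sorted);
        return sorted;
    }

    // Same formatting idea as in formatPropertyValue above: pattern "#0" has no grouping.
    static String formatWithoutGrouping(int value, Locale locale) {
        DecimalFormat df = (DecimalFormat) NumberFormat.getInstance(locale);
        df.applyLocalizedPattern("#0");
        return df.format(value);
    }
}
```

This is why keeping Integer as the column type and changing only the display format addresses both symptoms at once.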
It has been a while since I last had fun with the Vaadin Table.
There are property formatters, generators and so on, but in this case it might be easiest just to:
container.addContainerProperty(ID_COLUMN, String.class, "");
referenceItem.getItemProperty(ID_COLUMN).setValue("" + reference.getId());

Criteria API Average of hasmany Relationship

We have a Shop with ShopArticles and want to add a rating system.
Our ShopArticle looks like this:
class ShopArticle {
String contUnit = 'STK', orderUnit = 'PCK'
Double value, tax = 0.19
String name, description, keyword
String group1, group2, group3, articleNumber
String producer
Boolean unlocked
static hasMany = [ratings: ShopArticleRating]
}
And the Rating looks like this:
class ShopArticleRating {
String comment
int rating
ShopArticle shopArticle
User user
static belongsTo = ShopArticle
}
Now we want to filter by the average rating of an article, so we made this:
def shopArticleList = ShopArticleRating.createCriteria().listDistinct {
projections {
groupProperty("shopArticle")
}
}
def ids = []
shopArticleList.each { shopArticle ->
def sum = 0
shopArticle.ratings.each {
sum += it.rating
}
if ((sum / shopArticle.ratings.size()) >= filter.rating) {
ids.add(shopArticle.id)
}
}
List<ShopArticle> list = ShopArticle.createCriteria().list {
if (ids.size() > 0) {
'in'("id", ids)
}
}
Is there a better way to filter for the average Rating?
Maybe like this:
List<ShopArticle> list = ShopArticle.createCriteria().list {
createAlias('ratings','r')
projections {
groupProperty('r.rating')
}
gt("r.rating",filter.rating)
}
Were it me, I'd add an averageRating attribute to the ShopArticle class itself.
Do the math to compute the average when a rating is added/deleted/changed for that ShopArticle, and distribute the 'cost' of doing the math across each rating entry/change. Ten people add ratings, you do the math 10 times. A thousand people do queries, you don't do the math 1000 times.
Showing the average rating becomes nothing more than showing another attribute on the screen, filtering is trivial -- no extra work when querying data (and I think it is safe to bet that there will be more queries than ratings added).
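A minimal sketch of that bookkeeping in plain Java, assuming the article keeps a running sum and count next to the stored average (ArticleRatingSketch and its field names are hypothetical, not part of the domain classes above):

```java
final class ArticleRatingSketch {
    private long ratingSum;
    private int ratingCount;
    private double averageRating; // the stored attribute the answer suggests

    // Called whenever a rating is added for this article.
    void addRating(int rating) {
        ratingSum += rating;
        ratingCount++;
        recompute();
    }

    // Called whenever a rating is deleted for this article.
    void removeRating(int rating) {
        ratingSum -= rating;
        ratingCount--;
        recompute();
    }

    // The only place where the average is calculated.
    private void recompute() {
        averageRating = ratingCount == 0 ? 0.0 : (double) ratingSum / ratingCount;
    }

    double getAverageRating() {
        return averageRating;
    }
}
```

A changed rating can be handled as a remove followed by an add. Queries then read averageRating like any other column.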
Try this:
ShopArticleRating.withCriteria {
projections {
avg("rating")
groupProperty("shopArticle")
}
}
This should give you the article object together with the average of ratings.
I could not test the other way around but it should be like this:
ShopArticle.withCriteria {
projections {
ratings {
avg("rating")
}
property("id")
}
}
which should give you the article id together with the average of the corresponding ratings.
Another way would be to write the query yourself, which gives you the freedom to return anything you want. In that case look for "grails HQL". Sometimes I hit the limits of criteria when I need to do very complex queries, but in your case you should be fine.
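For readers unsure what the avg/groupProperty projection computes, here is a rough in-memory equivalent in plain Java. It only illustrates the grouping and averaging, it is not how GORM executes the query, and the Rating class and method names are assumptions made for the sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class AverageByArticleSketch {
    // Hypothetical flat rating row: which article was rated, and the score.
    static final class Rating {
        final long articleId;
        final int rating;

        Rating(long articleId, int rating) {
            this.articleId = articleId;
            this.rating = rating;
        }
    }

    // Group the ratings by article and average the score within each group.
    static Map<Long, Double> averageByArticle(List<Rating> ratings) {
        return ratings.stream().collect(Collectors.groupingBy(
                (Rating r) -> r.articleId,
                TreeMap::new,
                Collectors.averagingInt((Rating r) -> r.rating)));
    }

    // Keep only the ids of articles whose average reaches the filter value.
    static List<Long> idsWithAverageAtLeast(List<Rating> ratings, double minimum) {
        return averageByArticle(ratings).entrySet().stream()
                .filter(e -> e.getValue() >= minimum)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

In a real application the grouping and filtering would preferably stay in the database, which is what the criteria projection is for.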

grails: converting SQL into domain classes

I am developing a Grails application (I'm new to Grails and inherited the project from a previous developer). I'm slowly getting a grasp of how Grails operates and the use of domain classes, Hibernate etc. The MySQL db is hosted on Amazon and we're using ElastiCache.
Do any of you more knowledgeable folks know how I can go about converting the following SQL statement into domain classes and query criteria?
if(params?.searchterm) {
def searchTerms = params.searchterm.trim().split( ',' )
def resultLimit = params.resultlimit?: 1000
def addDomain = ''
if (params?.domainname){
addDomain = " and url like '%${params.domainname}%' "
}
def theSearchTermsSQL = ""
/*
* create c.name rlike condition for each search term
*
*/
searchTerms.each{
aSearchTerm ->
if( theSearchTermsSQL != '' ){
theSearchTermsSQL += ' or '
}
theSearchTermsSQL += "cname rlike '[[:<:]]" + aSearchTerm.trim() + "[[:>:]]'"
}
/*
* build query
*
*/
def getUrlsQuery = "select
u.url as url,
c.name as cname,
t.weight as tweight
from
(category c, target t, url_meta_data u )
where
(" + theSearchTermsSQL + ")
and
t.category_id = c.id
and t.url_meta_data_id = u.id
and u.ugc_flag != 1 " + addDomain + "
order by tweight desc
limit " + resultLimit.toLong()
/*
* run query
*
*/
Sql sqlInstance = new Sql( dataSource )
def resultsList = sqlInstance.rows( getUrlsQuery )
}
The tables are as follows (dummy data):
[Category]
id | name
-----------
1 | small car
2 | bike
3 | truck
4 | train
5 | plane
6 | large car
7 | caravan
[Target]
id | cid | weight | url_meta_data_id
----------------------------------------
1 | 1 | 56 | 1
2 | 1 | 76 | 2
3 | 3 | 34 | 3
4 | 2 | 98 | 4
5 | 1 | 11 | 5
6 | 3 | 31 | 7
7 | 5 | 12 | 8
8 | 4 | 82 | 6
[url_meta_data]
id | url | ugc_flag
---------------------------------------------
1 | http://www.example.com/foo/1 | 0
2 | http://www.example.com/foo/2 | 0
3 | http://www.example.com/foo/3 | 1
4 | http://www.example.com/foo/4 | 0
5 | http://www.example.com/foo/5 | 1
6 | http://www.example.com/foo/6 | 1
7 | http://www.example.com/foo/7 | 1
8 | http://www.example.com/foo/8 | 0
domain classes
class Category {
static hasMany = [targets: Target]
static mapping = {
cache true
cache usage: 'read-only'
targetConditions cache : true
}
String name
String source
}
class Target {
static belongsTo = [urlMetaData: UrlMetaData, category: Category]
static mapping = {
cache true
cache usage: 'read-only'
}
int weight
}
class UrlMetaData {
String url
String ugcFlag
static hasMany = [targets: Target ]
static mapping = {
cache true
cache usage: 'read-only'
}
static transients = ['domainName']
String getDomainName() {
return HostnameHelper.getBaseDomain(url)
}
}
Basically, a url from url_meta_data can be associated to many categories. So in essence what I'm trying to achieve should be a relatively basic operation...to return all the urls for the search-term 'car', their weight(i.e importance) and where the ugc_flag is not 1(i.e the url is not user-generated content). There are 100K + of records in the db and these are imported from a third-party provider. Note that all the URLs do belong to my client - not doing anything dodgy here.
Note the rlike I've used in the query. I was originally using ilike %searchterm%, but that would find categories where searchterm is part of a larger word (for example 'caravan'). Unfortunately, though, the rlike is not going to return anything if the user requests 'cars'.
I edited the code - as Igor pointed out the strange inclusion originally of 'domainName'. This is an optional parameter passed that allows the user to filter for urls of only a certain domain (e.g. 'example.com')
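The whole-word behaviour described above, and the 'cars' problem, can be illustrated with a short Java regex sketch. This uses Java's \b word boundary rather than MySQL's [[:<:]] and [[:>:]] markers, and the "strip a trailing s, then allow an optional s" rule is a deliberately naive assumption, not a real stemmer. WholeWordMatchSketch is a hypothetical name.

```java
import java.util.regex.Pattern;

final class WholeWordMatchSketch {
    // Builds a case-insensitive whole-word pattern. The optional trailing "s" is a
    // naive plural rule and an assumption of this sketch.
    static Pattern wholeWord(String term) {
        String singular = term.endsWith("s") ? term.substring(0, term.length() - 1) : term;
        return Pattern.compile("\\b" + Pattern.quote(singular) + "s?\\b", Pattern.CASE_INSENSITIVE);
    }

    // True if the category name contains the term as a whole word.
    static boolean matches(String term, String categoryName) {
        return wholeWord(term.trim()).matcher(categoryName).find();
    }
}
```

With this rule, 'car' and 'cars' both match "small car", while neither matches "caravan".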
I'd create an empty list of given domain objects,
loop over the resultsList, construct a domain object from each row and add it to a list of those objects. Then return that list from controller to view. Is that what you're looking for?
1) If it's a Grails application developed from scratch (rather than based on a legacy database structure) then you probably should already have domain classes Category, Target, UrlMetaData (otherwise you'll have to create them manually or with the db-reverse-engineer plugin)
2) I assume Target has a field Category category and Category has a field UrlMetaData urlMetaData
3) The way to go is probably http://grails.org/doc/2.1.0/ref/Domain%20Classes/createCriteria.html and I'll try to outline the basics for your particular case
4) Not sure what theDomain means - might be a code smell, as well as accepting rlike arguments from the client side
5) The following code hasn't been tested at all. In particular, I'm not sure whether a disjunction inside a nested criteria works. But this might be a suitable starting point; logging SQL queries should help with making it work (see "How to log SQL statements in Grails")
def c = Target.createCriteria() // create criteria on Target
def resultsList = c.list(max: resultLimit.toLong()) { // list all matched entities up to resultLimit results
    category { // nested criteria for category
        // the following 'if' statement and its body are plain Groovy code rather than part of the DSL that translates to Hibernate Criteria
        if (searchTerms) { // do the following only if the searchTerms list is not empty
            or { // one of several conditions
                for (st in searchTerms) { // not a part of the DSL - plain Groovy loop
                    rlike('name', st.trim()) // add a disjunction element
                }
            }
        }
        urlMetaData { // nested criteria for metadata
            ne('ugcFlag', 1) // ugcFlag not equal to 1
        }
    }
    order('weight', 'desc') // order by weight
}
Possibly the or restriction works better when written explicitly
if (searchTerms) {
def r = Restrictions.disjunction()
for (st in searchTerms) {
r.add(new LikeExpression('name', st.trim()))
}
instance.add(r) //'instance' is an injected property
}
Cheers,
Igor Sinev
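On the concern in point 4 about passing client-supplied rlike arguments into the query: if the raw SQL route is kept, one common mitigation is to emit placeholders and bind the search terms as parameters instead of concatenating them into the statement. The sketch below only builds the clause and the parameter list; SearchClauseSketch is a hypothetical helper, and the column name c.name is taken from the query in the question.

```java
import java.util.ArrayList;
import java.util.List;

final class SearchClauseSketch {
    final String sql;          // e.g. "(c.name rlike ? or c.name rlike ?)"
    final List<String> params; // values to bind, in placeholder order

    private SearchClauseSketch(String sql, List<String> params) {
        this.sql = sql;
        this.params = params;
    }

    // One "c.name rlike ?" condition per non-empty search term.
    // Note: with no usable terms the clause is "()", which the caller must handle.
    static SearchClauseSketch forTerms(String commaSeparatedTerms) {
        List<String> conditions = new ArrayList<>();
        List<String> params = new ArrayList<>();
        for (String term : commaSeparatedTerms.split(",")) {
            String trimmed = term.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            conditions.add("c.name rlike ?");
            params.add("[[:<:]]" + trimmed + "[[:>:]]");
        }
        return new SearchClauseSketch("(" + String.join(" or ", conditions) + ")", params);
    }
}
```

The terms can still contain regular-expression metacharacters, so this protects the SQL statement itself but does not restrict what pattern the user can search for.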

add two fields and insert into third in domain class in grails

I am working on a Grails project in which I have a domain class with 3 fields. My requirement is to input only 2 of the fields and have the 3rd field populated with the sum of the other two. Can anyone show me the code? Thanks.
See Derived properties
Example from above link
class Product {
Float price
Float taxRate
Float tax
static mapping = {
tax formula: 'PRICE * TAX_RATE'
}
}
Here are two complete ways of doing it, depending on your needs.
If you don't need to store the third field, meaning it's only used for display, you can do this:
class MyDomain {
    int field1
    int field2
    static transients = ['field3']
    // Computed on every access, never persisted.
    int getField3() {
        field1 + field2
    }
}
This will allow you to access the sum as myDomain.field3.
If you need to store it, say because it's heavily used in calculations, you can use events to automatically calculate and store the sum, like so:
class MyDomain {
int field1
int field2
int field3
def beforeInsert() {
field3 = field1 + field2
}
def beforeUpdate() {
field3 = field1 + field2
}
}
The benefit of doing it this way is that the third field is populated no matter where it's created or updated.
Two notes:
If you only want to compute field3 when the object is created, and not on updates, then remove beforeUpdate.
If you are doing more complex calculations than that simple sum, put them in another method (like updateField3), and call that from both events instead of repeating the calculation.
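Stripped of the Grails specifics, the two options differ only in when the sum is calculated. A plain-Java sketch of the distinction follows (DerivedFieldSketch, computedSum and beforeSave are hypothetical names; beforeSave stands in for the beforeInsert/beforeUpdate events):

```java
final class DerivedFieldSketch {
    int field1;
    int field2;
    int storedSum; // plays the role of the persisted field3

    // Option 1: compute on every read, nothing stored.
    int computedSum() {
        return field1 + field2;
    }

    // Option 2: refresh the stored copy from a single place, to be called
    // before each save.
    void beforeSave() {
        storedSum = computedSum();
    }
}
```

The stored copy is only as fresh as the last save, which is the trade-off for being able to query and sort on it.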

GORM createCriteria and list do not return the same results : what can I do?

I am using Nimble and Shiro for my security frameworks and I've just come across a GORM bug. Indeed:
User.createCriteria().list {
maxResults 10
}
returns 10 users whereas User.list(max: 10) returns 9 users !
After further investigations, I found out that createCriteria returns twice the same user (admin) because admin has 2 roles!!! (I am not joking).
It appears that any user with more than 1 role will be returned twice in the createCriteria call and User.list will return max-1 instances (i.e 9 users instead of 10 users)
What workaround can I use in order to have 10 unique users returned ?
This is very annoying because I have no way to use pagination correctly.
My domain classes are:
class UserBase {
String username
static belongsTo = [Role, Group]
static hasMany = [roles: Role, groups: Group]
static fetchMode = [roles: 'eager', groups: 'eager']
static mapping = {
roles cache: true,
cascade: 'none',
cache usage: 'read-write', include: 'all'
}
}
class User extends UserBase {
static mapping = {cache: 'read-write'}
}
class Role {
static hasMany = [users: UserBase, groups: Group]
static belongsTo = [Group]
static mapping = { cache usage: 'read-write', include: 'all'
users cache: true
groups cache: true
}
}
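One plausible reading of the numbers above, sketched in plain Java: with eagerly fetched roles, the underlying rows are one per (user, role) pair, the limit is applied to those rows, and duplicates are collapsed (or not) only afterwards. This is an assumption about the mechanism, not something confirmed in the question, and JoinLimitSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class JoinLimitSketch {
    // One row per (user, role) pair, which is what an eager join produces.
    static List<String> joinedRows(List<String> users, String userWithTwoRoles) {
        List<String> rows = new ArrayList<>();
        for (String user : users) {
            rows.add(user);
            if (user.equals(userWithTwoRoles)) {
                rows.add(user); // second role, same user
            }
        }
        return rows;
    }

    // Limit applied to the joined rows first, duplicates collapsed afterwards.
    static List<String> limitThenDistinct(List<String> rows, int max) {
        List<String> limited = rows.subList(0, Math.min(max, rows.size()));
        return new ArrayList<>(new LinkedHashSet<>(limited));
    }
}
```

With twelve users, one of whom has two roles, the first ten joined rows contain only nine distinct users, which matches the 10-with-a-duplicate versus 9 observation.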
Less concise and clear, but using an HQL query seems a way to solve this problem. As described in the Grails documentation (executeQuery section) the paginate parameters can be added as extra parameters to executeQuery.
User.executeQuery("select distinct user from User user", [max: 2, offset: 2])
This way you can still use criteria and pass in list/pagination parameters:
User.createCriteria().listDistinct {
maxResults(params.max as int)
firstResult(params.offset as int)
order(params.order, "asc")
}
EDIT: Found a way to get both! Totally going to use it now
http://www.intelligrape.com/blog/tag/pagedresultlist/
If you call createCriteria().list() like this
def result=SampleDomain.createCriteria().list(max:params.max, offset:params.offset){
// multiple/complex restrictions
maxResults(params.max)
firstResult(params.offset)
} // Return type is PagedResultList
println result
println result.totalCount
You will have all the information you need in a nice PagedResultList format!
/EDIT
Unfortunately I do not know how to get a combination of full results AND max/offset pagination subset in the same call. (Anyone who can enlighten on that?)
I can, however, speak to one way I've used with success to get pagination working in general in grails.
def numResults = YourDomain.withCriteria() {
like(searchField, searchValue)
order(sort, order)
projections {
rowCount()
}
}
def resultList = YourDomain.withCriteria() {
like(searchField, searchValue)
order(sort, order)
maxResults max as int
firstResult offset as int
}
That's an example of something I'm using to get pagination up and running. As KoK said above, I'm still at a loss for a single atomic statement that gives both results. I realize that my answer is more or less the same as KoK now, sorry, but I think it's worth pointing out that rowCount() in projections is slightly more clear to read, and I don't have comment privileges yet :/
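What the two queries deliver together is a page of items plus the total number of matches. A minimal in-memory sketch of that pairing in plain Java (PageSketch is a hypothetical class, and slicing a full list stands in for the max/offset query):

```java
import java.util.ArrayList;
import java.util.List;

final class PageSketch<T> {
    final List<T> items;  // the rows for the requested page
    final int totalCount; // number of matches across all pages

    private PageSketch(List<T> items, int totalCount) {
        this.items = items;
        this.totalCount = totalCount;
    }

    // Cuts the requested window out of all matches, clamping it to the list bounds.
    static <T> PageSketch<T> of(List<T> allMatches, int offset, int max) {
        int from = Math.min(Math.max(offset, 0), allMatches.size());
        int to = Math.min(from + Math.max(max, 0), allMatches.size());
        return new PageSketch<>(new ArrayList<>(allMatches.subList(from, to)), allMatches.size());
    }
}
```

The total count is what lets a view work out how many page links to render, which is why a plain list of up to max items is not enough on its own.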
Lastly: This is the holy grail (no pun intended) of grails hibernate criteria usage references; bookmark it ;)
http://www.grails.org/doc/1.3.x/ref/Domain%20Classes/createCriteria.html
Both solutions offered here by Ruben and Aaron still don't "fully" work for pagination, because the returned object (from executeQuery() and listDistinct) is an ArrayList (with up to max objects in it), and not a PagedResultList with the totalCount property populated, as I would expect in order to "fully" support pagination.
Let's say the example is a little more complicated in that :
a. assume Role has an additional rolename attribute AND
b. we only want to return distinct User objects with Role.rolename containing a string "a"
(keeping in mind that a User might have multiple Roles with rolename containing a string "a")
To get this done with 2 queries I would have to do something like this :
// First get the *unique* ids of Users (as list returns duplicates by
// default) matching the Role.rolename containing a string "a" criteria
def idList = User.createCriteria().list {
roles {
ilike( "rolename", "%a%" )
}
projections {
distinct ( "id" )
}
}
if( idList ){
// Then get the PagedResultList for all of those unique ids
PagedResultList resultList =
User.createCriteria().list( offset:"5", max:"5" ){
or {
idList.each {
idEq( it )
}
}
order ("username", "asc")
}
}
This seems grossly inefficient.
Question : is there a way to accomplish both of the above with one GORM/HQL statement ?
You can use
User.createCriteria().listDistinct {
maxResults 10
}
Thanks for sharing your issue and Kok for answering it. I didn't have a chance to rewrite it to HQL. Here is my solution (workaround): http://ondrej-kvasnovsky.blogspot.com/2012/01/grails-listdistinct-and-pagination.html
Please tell me if that is useful (at least for someone).
