Apache PIG - Create a new column based on another value - foreach

I have a column (named Product) defined as chararray that takes three values: OT, AT and HP. I want to create a new column that maps these values to integers:
OT = 1
AT = 2
HP = 3
For that I created a foreach statement:
REGISTER '/usr/lib/pig/piggybank.jar';
File = load '/user/cloudera/file.csv'
USING org.apache.pig.piggybank.storage.CSVExcelStorage(',')
as (ID:Long,
Chain:Int,
Dept:Int,
Product_Measure:Chararray,
Price:Double);
Values = FOREACH File Generate
ID,
Chain,
Dept,
((Chararray)Product_Measure=='OT'?'1':(Chararray)Product_Measure=='AT'?'2':(Chararray)Product_Measure=='HP'?'3':'0') as Product_Measure,
(Price<0.1?0:Price) as Price;
Filter_Values = FILTER Values BY Price > 0;
DUMP Filter_Values;
If I remove the third line it works fine, so I think the problem is in the part where I try to convert the chararray to an int.
Can anyone help me?
Thanks!

Values = FOREACH Source Generate
ID,
Date,
((Chararray)Product_Measure == 'OT' ? (int)1 : (Chararray)Product_Measure == 'AT' ? (int)2 : (Chararray)Product_Measure == 'HP' ? (int)3 : 0) as Product_Value,
(Quantity<0?0:Quantity) as Quantity,
(Price<0.1?0:Price) as Price;
Or, if you want the string 'NULL' as the fallback, return a chararray from every branch:
Values = FOREACH Source Generate
ID,
Date,
((Chararray)Product_Measure == 'OT' ? '1' : (Chararray)Product_Measure == 'AT' ? '2' : (Chararray)Product_Measure == 'HP' ? '3' : 'NULL') as Product_Value,
(Quantity<0?0:Quantity) as Quantity,
(Price<0.1?0:Price) as Price;
You need to make two changes to your Pig script:
1. Use == instead of = for comparisons.
2. Keep every branch of the conditional the same type: if you want 'NULL' as the fallback value, make all replacement values chararray; otherwise make them all int.
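The intent of the FOREACH and FILTER steps can be sketched outside Pig to check the expected output. This is only an illustration in Python, not Pig code: the sample rows are made up, and the names `PRODUCT_CODES` and `transform` are hypothetical. A lookup table stands in for the nested conditional, and every branch yields an int, which is the point of the second change above.

```python
# Hypothetical sample rows standing in for file.csv:
# (ID, Chain, Dept, Product_Measure, Price)
rows = [
    (1, 10, 5, "OT", 2.50),
    (2, 10, 5, "AT", 0.05),
    (3, 11, 7, "HP", 1.20),
    (4, 11, 7, "XX", 3.00),
]

# Lookup table replacing the nested conditional; unknown codes map to 0.
PRODUCT_CODES = {"OT": 1, "AT": 2, "HP": 3}


def transform(row):
    """Mirror the FOREACH: map the code to an int and zero out tiny prices."""
    id_, chain, dept, measure, price = row
    product_value = PRODUCT_CODES.get(measure, 0)  # always an int
    price = 0 if price < 0.1 else price
    return (id_, chain, dept, product_value, price)


values = [transform(r) for r in rows]

# Mirror the FILTER: keep only rows whose price is still positive.
filter_values = [v for v in values if v[4] > 0]
print(filter_values)
```

Row 2 is dropped because its price is below 0.1 and is therefore set to 0 before the filter runs.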

Related

Create a trigger to generate a random alphanumeric string in Informix

I want to create a before-insert trigger in an Informix database.
When a record is inserted into the table, the trigger should put a random alphanumeric string into one of the fields. Are there any built-in functions for this?
The table consists of the following fields:
empid serial NOT NULL
age int
empcode varchar(10)
and I am running
insert into employee(age) values(10);
The expected output should be something as below:
id age empcode
1, 10, asf123*
Any help is appreciated.
As already commented, there is no existing function to create a random string; however, it is possible to generate random numbers and then convert these to characters. To create the random numbers you can either create a UDR wrapper to a C function such as random(), or register the excompat datablade and use the dbms_random_random() function.
Here is an example of a user-defined function that uses the dbms_random_random() function to generate a string of ASCII alphanumeric characters:
create function random_string()
returning varchar(10)
define s varchar(10);
define i, n int;
let s = "";
for i = 1 to 10
let n = mod(abs(dbms_random_random()), 62);
if (n < 10)
then
let n = n + 48;
elif (n < 36)
then
let n = n + 55;
else
let n = n + 61;
end if
let s = s || chr(n);
end for
return s;
end function;
This function can then be called from an insert trigger to populate the empcode column of your table.
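The offsets 48, 55 and 61 in the function map the numbers 0 to 61 onto the ASCII ranges for digits, uppercase letters and lowercase letters. The following Python sketch (not Informix code) reproduces only that arithmetic so it can be checked. The names `bucket_to_char` and `random_string` are hypothetical, and `random.randrange(62)` is assumed as a stand-in for `mod(abs(dbms_random_random()), 62)`.

```python
import random
import string


def bucket_to_char(n):
    """Map n in 0..61 to one alphanumeric ASCII character."""
    if n < 10:
        return chr(n + 48)   # 0..9   -> '0'..'9' (ASCII 48..57)
    elif n < 36:
        return chr(n + 55)   # 10..35 -> 'A'..'Z' (ASCII 65..90)
    else:
        return chr(n + 61)   # 36..61 -> 'a'..'z' (ASCII 97..122)


# The full alphabet the function can produce, in bucket order.
alphabet = "".join(bucket_to_char(n) for n in range(62))


def random_string(length=10):
    """Build a random alphanumeric string, one bucket per character."""
    return "".join(bucket_to_char(random.randrange(62)) for _ in range(length))


print(alphabet)
print(random_string())
```

Because the three branches cover 10 + 26 + 26 = 62 values with no gaps, every result is a letter or digit; a character such as the `*` in the sample output of the question would never be produced.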

How do I convert string to integer inside select new statement in LINQ?

Here is my query...
var salesSelection = (from a in crmDBContext.Quotes
where a.quoteDate >= startDate && a.quoteDate <= endDate && a.region == regionselect
group a by new { a.customerNumber } into queryOut
select new { queryOut.Key.customerNumber,
totalQuantity = queryOut.Sum(q => int.Parse(q.itemQuantity)),
totalPrice = queryOut.Sum(s => s.price) }).OrderByDescending(i => i.totalPrice);
I get an error that linq doesn't recognize the method.
I have also tried
totalQuantity = queryOut.Sum(q => Convert.ToInt32(q.itemQuantity)).
I am not allowed to change the actual column in the table in the database to reflect the quantity as an integer instead of a string as the table updates daily from our mainframe.
Is there a way to do this conversion inside of the linq query, and if not, is there a way to retrieve all of the quantities and sum them later without having to requery?

Failing to query from database

I'm trying to get a count of occurrences of a value from my database. It's failing.
My effort is
var dc = new Dal.Entities();
var query = (from d in dc.Instruments
where d.Src.ToLower().Contains("other.png")
select new
{
count = d.Src.Count(),
key = d.Src
}
);
This keeps throwing the following exception
"DbExpressionBinding requires an input expression with a collection ResultType.\r\nParameter name: input"
If I change select new... to select d then it works fine so I know that part of the query is OK.
I don't understand why I can't get a Count of each string it finds. What did I do wrong?
edit
If my database is
Src (column title)
my value
my other value
my value
I'm hoping to get the result of
my value, 2
my other value, 1
You need to group by Src and then count each group:
var query = from d in dc.Instruments
where d.Src.ToLower().Contains("other.png")
group d by d.Src into g
select new
{
count = g.Count(),
key = g.Key
};
If you only need the total number of matching rows rather than a count per value, call Count() on the filtered query:
var items = dc.Instruments
.Where(p => p.Src.ToLower().Contains("other.png"))
.Count();
or
var items = (from item in dc.Instruments
where item.Src.ToLower().Contains("other.png")
select item).Count();
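The difference between the grouped count and the plain Count() can be seen on the sample data from the question. This is a small Python sketch rather than LINQ; the list `src_values` is just the three example rows, and the "other.png" filter is left out because the sample values do not contain it.

```python
from collections import Counter

# The three example rows of the Src column from the question.
src_values = ["my value", "my other value", "my value"]

# Equivalent of "group d by d.Src into g ... g.Count()": one count per value.
per_value = Counter(src_values)

# Equivalent of ".Count()" on the whole query: a single total.
total = len(src_values)

for key, count in per_value.items():
    print(f"{key}, {count}")
print("total:", total)
```

Only the grouped form yields the per-value result the question asks for; the plain count collapses everything into one number.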

How to cast or convert string data type to integer in order to run sum function in linq usage

var logs = (from m in db.AccountFileOperationLogList
where m.OperationId == OperationId
where m.JobStatus == "OK"
select m.JobFileSize).Sum();
m.JobFileSize is an nvarchar database column that contains no nulls or non-numeric strings.
You can try this:
var logs = (from m in db.AccountFileOperationLogList
where m.OperationId == OperationId
where m.JobStatus == "OK"
select m.JobFileSize).ToList(); // pull the matching values from the database into local memory
var total = logs.Sum(s => int.Parse(s)); // then parse and sum them in memory with int.Parse
Try this:
var logs = (from m in db.AccountFileOperationLogList
where m.OperationId == OperationId
where m.JobStatus == "OK"
select m.JobFileSize).Sum(s => int.Parse(s));

Can I Compare the day name with the date Entity Framework 6

I am working on a news feature. The News model contains a publishing date.
Is there a way to filter records from the database based on the day name of the publishing date, for example in a controller action:
var data1 = db.News.Where(x => x.PublishingDate >= DateTime.Now
&& x.PublishingDate.Day == (int)DayOfWeek.Sunday);
ViewBag.SundayNews = data1;
Or is there another approach, or any reference?
Try this solution: http://c-sharp-snippets.blogspot.ru/2011/12/getting-dayofweek-in-linq-to-entities.html
var firstSunday = new DateTime(1753, 1, 7);
var filtered = from e in dbContext.Entities
where EntityFunctions.DiffDays(firstSunday, e.SomeDate) % 7 == (int)DayOfWeek.Monday
select e;
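firstSunday is the first Sunday on or after 1 January 1753, the minimum value of the MS SQL DATETIME type. The day difference modulo 7 is therefore 0 for Sunday, 1 for Monday and so on, which matches the DayOfWeek enum; compare against DayOfWeek.Sunday to select Sundays. In Entity Framework 6, DbFunctions.DiffDays replaces the obsolete EntityFunctions.DiffDays.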
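The arithmetic behind this trick can be checked independently of Entity Framework. The Python sketch below is only a verification aid: `FIRST_SUNDAY` and `day_of_week` are hypothetical names, and subtracting two dates stands in for DiffDays.

```python
from datetime import date, timedelta

# Same anchor date as in the query above.
FIRST_SUNDAY = date(1753, 1, 7)


def day_of_week(d):
    """Return 0 for Sunday through 6 for Saturday, like .NET's DayOfWeek."""
    return (d - FIRST_SUNDAY).days % 7


# Cross-check two weeks of dates against Python's own weekday(),
# which numbers Monday as 0 and Sunday as 6.
start = date(2024, 1, 1)
for offset in range(14):
    d = start + timedelta(days=offset)
    assert day_of_week(d) == (d.weekday() + 1) % 7

print(day_of_week(date(2024, 1, 7)))  # a Sunday
print(day_of_week(date(2024, 1, 8)))  # a Monday
```

Any Sunday would work as the anchor; choosing one at the very start of the DATETIME range simply guarantees the difference is never negative for stored values.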

Resources