Watson Studio: change multiple column types at the same time in Refine - watson-studio

I am loading a file with 152 columns into Watson Studio, and by default every column is given the string type.
Is there any way to change several columns at the same time?
I know I can do it column by column, but 150 columns is too many.
I tried "mutate_all(~ ifelse(is.na(as.double(.x)),.x,as.double(.x)))"
It works in the preview but fails when I launch the flow with the following error:
19 Feb 2019-20:15:25+0100: Job execution started
19 Feb 2019-20:15:32+0100: Error in ifelse(is.na(as.double(.x)), .x, as.double(.x)): object 'COLUMN1' not found
19 Feb 2019-20:15:32+0100: Job execution ended

If you need to convert all the string columns, use mutate_if() instead of mutate_all():
mutate_if(is.character, as.double)
That should change all the string columns to double.
If you want to leave a specific column unconverted, use mutate_at() with -matches(): it selects every column other than the one named and applies the double conversion only to those.
mutate_at(vars(-matches("columnname")), funs(as.double(.)))
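Since the flow error only surfaces at run time, it can help to see the conversion rule in isolation. Below is a plain-Python sketch of the same idea ("cast a column to double only where the values actually parse as numbers"); the data and function name are hypothetical, and this is not Watson Studio's API, just the logic:

```python
# Hypothetical rows, as if read from a file where everything arrived as strings.
rows = [
    {"id": "a1", "score": "3.5", "count": "10"},
    {"id": "a2", "score": "4.0", "count": "12"},
]

def convert_numeric_columns(rows):
    """Cast a column to float only if every value in it parses; otherwise keep it as-is.
    Mirrors the ifelse(is.na(as.double(.x)), .x, as.double(.x)) idea."""
    columns = list(rows[0].keys())
    converted = {}
    for col in columns:
        try:
            converted[col] = [float(r[col]) for r in rows]
        except ValueError:
            # At least one value is not numeric: leave the whole column untouched.
            converted[col] = [r[col] for r in rows]
    return [dict(zip(columns, vals)) for vals in zip(*converted.values())]

result = convert_numeric_columns(rows)
```

The per-column try/except is the key point: a single non-numeric value keeps that column as strings, which is what prevents silently corrupting identifier columns.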

Related

Trying to make a generated column in my postgres database using these attributes from my rails models

I have a model ServiceAgreement with :starts_at(Datetime), :ends_at(Datetime), and :service_interval(integer that corresponds to an enum)
{"weekly"=>0, "every_month"=>1, "biannually"=>2, "annually"=>3}
I need this virtual column to be an array of the proposed service dates, based on the distance between starts_at and ends_at. For example, a two-year agreement with a "biannually" service interval should have 4 dates: 6, 12, 18, and 24 months from the start.
I can do it in ruby, but I'm having a hard time with the SQL for it.
Side note: How would I even write the add_column line in my migration?
Thanks in advance.
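As a sketch of the date arithmetic the question describes (the part the asker says they can already do in Ruby), here is the interval logic in plain Python. The enum-to-months mapping and function names are hypothetical, and a real generated column would still need this expressed in SQL:

```python
from datetime import date

# Hypothetical mapping of the service_interval enum to a step in months.
# ("weekly" would need a day-based step and is out of scope here.)
INTERVAL_MONTHS = {"every_month": 1, "biannually": 6, "annually": 12}

def add_months(d, months):
    """Advance a date by whole months (day clamped to 28 to dodge month-length edge cases)."""
    total = d.month - 1 + months
    return date(d.year + total // 12, total % 12 + 1, min(d.day, 28))

def proposed_service_dates(starts_at, ends_at, interval):
    """All service dates strictly after starts_at, up to and including ends_at."""
    step = INTERVAL_MONTHS[interval]
    dates, current = [], add_months(starts_at, step)
    while current <= ends_at:
        dates.append(current)
        current = add_months(current, step)
    return dates

# A two-year agreement serviced biannually should yield 4 dates.
dates = proposed_service_dates(date(2023, 1, 15), date(2025, 1, 15), "biannually")
```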

Changing timezone for a default timestamp column in Snowflake

I have a table defined in Snowflake as follows:
CREATE OR REPLACE TABLE DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(
PREDICTED_PROBABILITY FLOAT,
TIME_PREDICTED TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
As expected, the default timezone is America/Los_Angeles. I am trying to change it to UTC, but only for this column, not at the account/session level. Here is the code I wrote:
ALTER TABLE DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES ALTER TIME_PREDICTED SET timezone = 'UTC';
But it gives an error:
SQL compilation error: syntax error line 1 at position 86 unexpected 'timezone'.
Can I please get help on how to make the change at the column level? Thanks in advance.
I recommend reviewing this documentation regarding timestamp data types:
https://docs.snowflake.com/en/sql-reference/data-types-datetime.html#timestamp
However, assuming that your TIMESTAMP_TYPE_MAPPING is still the default of TIMESTAMP_NTZ, you now have a bunch of data that was written as America/Los_Angeles wall-clock time with no timezone offset in the values. If you're not going to change any of your account timezone settings, you have two options: leave the data as it is and change the timezone as you select it using the CONVERT_TIMEZONE function, or change your table definition to a data type that includes a timezone offset. You could also update the column in place by converting it with the same CONVERT_TIMEZONE function, but future data would still be inserted using the America/Los_Angeles timezone.
My recommendation is to use TIMESTAMP_TZ as your column type and modify the current data accordingly.
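To make the CONVERT_TIMEZONE suggestion concrete, here is a small Python sketch of what that conversion does conceptually: interpret the stored offset-less (TIMESTAMP_NTZ-style) value as Los Angeles wall-clock time, then render it in UTC. The function name is illustrative, not a Snowflake API:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def ntz_to_utc(naive_ts, stored_tz="America/Los_Angeles"):
    """Interpret a naive timestamp as stored_tz wall-clock time and return it in UTC,
    mirroring CONVERT_TIMEZONE('America/Los_Angeles', 'UTC', col) on an NTZ value."""
    return naive_ts.replace(tzinfo=ZoneInfo(stored_tz)).astimezone(ZoneInfo("UTC"))

# Noon in Los Angeles on a winter date (UTC-8) becomes 20:00 UTC.
utc = ntz_to_utc(datetime(2019, 2, 19, 12, 0, 0))
```

The important detail is that the naive value is *reinterpreted* (replace), not shifted, before the timezone change, which is exactly the subtlety NTZ columns force on you.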

Set new year for date column in rails query

I have the contract start of a number of companies, and I want to report on each contract year by creating a column with the contract start updated to a select year. There are a number of solutions in SQL involving functions like DATE_ADD or DATEFROMPARTS, but I'm having trouble adapting it to rails (if those functions are available at all).
The closest I've gotten is: Company.select("contract_start + '1 YEAR'::INTERVAL as new_contract_start"). This adds 1 year to each contract start but doesn't take into account contracts older than a year (or started the same year). I've also tried the following but again run into syntax errors:
new_year = 2020
Company.select("contract_start + '#{new_year} - EXTRACT (YEAR from contract_start) YEAR'::INTERVAL")
I'm looking for a solution that can either:
Directly set the year to what I want
Add a variable amount of years based on its distance from the desired year
I'm on Ruby 2.3.3
I think the key here was finding functions compatible with PostgreSQL, which my database is built on. Once I started searching for the PostgreSQL equivalents of the functions I thought would help, I found more compatible solutions, such as the PostgreSQL equivalent of NUMTODSINTERVAL.
I ended up with:
contract_start_year = 2020
Company.select("contract_start + make_interval(years => CAST(#{contract_start_year} - EXTRACT(YEAR FROM contract_start) AS INT)) AS new_contract_start")
I've also made it a bit smarter by adding the number of years required to get the latest contract date without going over the report date. This would be problematic if the report start date was "2020-01-01" but the contract start was "2017-06-01". Setting the contract date to "2020-06-01" would overshoot the intentions of the report.
report_start = "'2020-07-01'"
Company.select("contract_start + make_interval(years => CAST(EXTRACT(YEAR FROM AGE(CAST(#{report_start} AS DATE), contract_start)) AS INT)) AS new_contract_year")
Note the additional single quotes in report_start, since the SQL needs a string literal that it can convert to a date.
There might be other methods that can "build" the date directly, but this methods works well enough for now.
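The "add whole years without overshooting the report date" behavior that AGE provides can be sketched in plain Python (function name and sample dates are illustrative):

```python
from datetime import date

def advance_to_report_year(contract_start, report_start):
    """Add whole years to contract_start without passing report_start,
    mirroring make_interval(years => EXTRACT(YEAR FROM AGE(report, contract)))."""
    years = report_start.year - contract_start.year
    candidate = contract_start.replace(year=contract_start.year + years)
    if candidate > report_start:
        # Jumping straight to the report year would overshoot; back off one year.
        candidate = candidate.replace(year=candidate.year - 1)
    return candidate

# Contract started 2017-06-01; report starts 2020-01-01.
# 2020-06-01 would overshoot the report start, so we land on 2019-06-01 instead.
latest = advance_to_report_year(date(2017, 6, 1), date(2020, 1, 1))
```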

Converting DT_WSTR to DT_DATE

I'm pretty new to this stuff but working through a few issues. I am pulling source files from a Flat File Source, but the dates in all my source files are formatted as YYYYMMDD, so I inserted a Derived Column task with an expression that reformats the date columns from YYYYMMDD to YYYY-MM-DD, which looks like this:
LEFT(ISSUE_DT,4) + "-" + SUBSTRING(ISSUE_DT,5,2) + "-" + RIGHT(ISSUE_DT,2)
All is good with that, except the result has the data type DT_WSTR, so I dropped in a Columns Conversion task to convert DT_WSTR to DT_DATE, but I keep getting the following error:
[Columns Conversion [1]] Error: Data conversion failed while converting
column "ISSUE_DT Formatted" (258) to column "Copy of ISSUE_DT Formatted"
(16). The conversion returned status value 2 and status text "The value
could not be converted because of a potential loss of data.".
I have tried opening the advanced editor, navigating to the Data Conversion Output Columns, and changing the DataType under Data Type Properties to DT_DATE, but I still get the same error.
What am I missing or doing wrong?
(Screenshots: Data Flow, Formatted Dates, Column Conversion, Column Conversion Advanced Editor)
You have some dates that are not in the format that your SSIS package is expecting. Perhaps single-digit months or days do not have a leading 0. That scenario would definitely cause this specific error.
Take today, for example: if single-digit months or days did not have leading zeroes, you would have 2018918 instead of 20180918. Without seeing the data I can't guarantee this is the exact issue, but it is something like this. The error occurs when converting the string to the date: continuing with the example above, after your Derived Column, ISSUE_DT Formatted would have a value of '2018-91-18', which of course is not a valid date and causes the error.
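A quick Python sketch shows why a missing leading zero breaks the fixed-position expression. The slicing mirrors the LEFT/SUBSTRING/RIGHT logic, and the sample values are the ones from the explanation above:

```python
from datetime import datetime

def format_yyyymmdd(raw):
    """Mimic the Derived Column expression: fixed-position slices into YYYY-MM-DD."""
    return raw[:4] + "-" + raw[4:6] + "-" + raw[-2:]

def is_valid_date(formatted):
    """True if the formatted string parses as a real YYYY-MM-DD date."""
    try:
        datetime.strptime(formatted, "%Y-%m-%d")
        return True
    except ValueError:
        return False

ok = format_yyyymmdd("20180918")   # well-formed input -> "2018-09-18"
bad = format_yyyymmdd("2018918")   # missing leading zero -> "2018-91-18", month 91
```

The fixed positions are only valid when the input is exactly 8 characters, which is why the short value produces an impossible month and the downstream DT_DATE conversion fails.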

Monthly data Extraction for 2 different tables with a common field Using Joins

I have two different tables with a common field, and I want to extract monthly totals from these tables, year by year.
For example, table 1 has the following records:
date items
01/20/2008 20
02/15/2008 10
01/23/2009 23
02/25/2009 12
03/15/2010 05
table 2
date items
01/12/2008 02
02/09/2008 10
01/02/2009 03
02/10/2009 07
03/19/2010 12
And I need the output as follows:
date items
jan-2008 22
feb-2008 20
jan-2009 26
feb-2009 19
mar-2010 17
With the help of joins
You don't necessarily need a JOIN to make this work. This is a MSSQL implementation that should give you the results in your output.
SELECT [date], SUM(items) AS items
FROM (
    SELECT LOWER(LEFT(DATENAME(MONTH, [date]), 3)) + '-' + CONVERT(VARCHAR(4), DATEPART(yyyy, [date])) AS [date], items
    FROM table1
    UNION ALL
    SELECT LOWER(LEFT(DATENAME(MONTH, [date]), 3)) + '-' + CONVERT(VARCHAR(4), DATEPART(yyyy, [date])), items
    FROM table2
) a
GROUP BY [date]
The answer depends on your version of SQL Server. Please run SELECT @@VERSION and update your question with the version you are using. This is important because many date functions were added in SQL Server 2012+. For instance, DATEFROMPARTS makes this kind of problem much easier to solve but will not work on 2008 R2 or below. On older versions you can still combine DAY, MONTH and YEAR to build an easy join.
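The UNION ALL / GROUP BY logic above can be sketched in plain Python using the sample data from the question, to confirm the expected totals:

```python
from collections import defaultdict
from datetime import datetime

table1 = [("01/20/2008", 20), ("02/15/2008", 10), ("01/23/2009", 23),
          ("02/25/2009", 12), ("03/15/2010", 5)]
table2 = [("01/12/2008", 2), ("02/09/2008", 10), ("01/02/2009", 3),
          ("02/10/2009", 7), ("03/19/2010", 12)]

def monthly_totals(*tables):
    """Concatenate the tables (UNION ALL), key each row by 'mon-YYYY',
    and sum the items per key (GROUP BY ... SUM)."""
    totals = defaultdict(int)
    for table in tables:
        for raw_date, items in table:
            d = datetime.strptime(raw_date, "%m/%d/%Y")
            totals[d.strftime("%b-%Y").lower()] += items
    return dict(totals)

result = monthly_totals(table1, table2)
```

Note that the grouping key is built from the date before summing, which is exactly why the SQL version wraps both SELECTs in a derived table and groups on the formatted column.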
