How can I use a date variable while building the model? - machine-learning

Please find the GitHub link for the data set and its info.
I'm working on a bank data set where I need to predict which customers are willing to take a loan (classification). Can I drop the date column, or do I need to consider it?

I'm not sure what type of date you are referring to. For example, the date column would be worth keeping if it is the customer's birth date, since taking a loan can depend on the person's age, which the birth date conveys. So simply asking "should I drop date columns?" does not make sense. In ML it is always important to consider the context of each column.
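
If the column does turn out to be a birth date, one common way to use it is to convert it into an age feature before training. The sketch below is only an illustration: the meaning of the date column in the asker's data set is not known, and `age_in_years` is a hypothetical helper name.

```python
from datetime import date


def age_in_years(birth_date: date, as_of: date) -> int:
    """Whole years elapsed between birth_date and as_of."""
    years = as_of.year - birth_date.year
    # Subtract one if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


# Hypothetical example row: a customer born on 15 June 1990.
customer_age = age_in_years(date(1990, 6, 15), date(2020, 6, 14))
```

The resulting integer can be fed to a classifier directly, which a raw date string cannot.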

Related

How do I get the nearest past date in a range for each entry in another list?

I collect customer feedback for my education business and add it to a Google Sheet. The feedback data has a submission date (A2:A) and some satisfaction metrics, which I visualize in a Google Data Studio dashboard.
The problem is that I want the feedback per cohort, but not everyone fills in the feedback form on the same day. I have a list of all courses with their respective dates (Cohorts!A2:A), and I want to assign each feedback submission to their respective cohort in a new column. It would be nice to also match it to the specific course type and country, but for now matching the cohort date would suffice.
I've tried using VLOOKUP and ARRAYFORMULA to go through the feedback dates and get the nearest past date to take it as the "course date" for that student. All the solutions I've tried either only take a single date or TODAY as a reference, but I have a whole list I'd like to fill in.
From my understanding, you are trying to round the timestamp, then match it to your course table?
To round a timestamp to a date:
=INT($A2)
When doing lookups like the one you're describing, I frequently end up calculating the start of the week as well. This formula returns the Sunday that starts the week containing the date. Figured it might be helpful.
=text($A2+CHOOSE(WEEKDAY($A2),0,-1,-2,-3,-4,-5,-6),"m/d/yyyy")
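
Outside the spreadsheet, the "nearest past date" lookup the question asks for can be sketched with a binary search over the sorted cohort dates. This is a minimal illustration, not a Sheets formula; `nearest_past_date` and the sample dates are made up for the example.

```python
from bisect import bisect_right
from datetime import date


def nearest_past_date(cohort_dates, submitted):
    """Return the latest cohort date on or before `submitted`, or None."""
    ordered = sorted(cohort_dates)
    # Index of the first cohort date strictly after the submission date.
    idx = bisect_right(ordered, submitted)
    return ordered[idx - 1] if idx else None


# Hypothetical cohort start dates and one feedback submission date.
cohorts = [date(2021, 1, 4), date(2021, 2, 1), date(2021, 3, 1)]
matched = nearest_past_date(cohorts, date(2021, 2, 10))
```

A submission dated before the first cohort gets `None`, which makes unmatched feedback easy to spot.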

Better design for a fact table where each row has a Start & End Date

My fact table contains details for clients who attend a course.
To ensure I can get a list of clients registered on any particular day, I have not related the date dimension to the fact table.
Instead I created a measure that does basic between logic (where startDate <= selectedDate && endDate >= selectedDate).
This allows me to find all clients registered on one single selected day.
There are a few drawbacks to this, however:
- I have to ensure the report user only selects a single day, i.e. they cannot select a date range.
- I can't easily do counts for samePeriodLastMonth or Year.
Is there a better design I should consider that will still allow me to see counts of registered clients on any given day, while also allowing me to use SamePeriodLastMonth/Year functionality?
Would you mind uploading the structure of your fact and dim tables?
Just a thought bubble: if you would like to measure counts for a program over calendar years, I believe you would definitely need to create a Date dimension. Also depending on your reporting needs you might want to consider whether you need an Accumulating Snapshot Fact table.
Please find further details on this:
http://www.kimballgroup.com/2012/05/design-tip-145-time-stamping-accumulating-snapshot-fact-tables/
Cheers
Nithin
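
For reference, the between logic from the question, and the per-day counts that a date dimension would make available for period comparisons, can be sketched in plain Python. This is not the measure language of the reporting tool; `registered_on`, `daily_counts`, and the sample rows are hypothetical.

```python
from datetime import date, timedelta


def registered_on(registrations, day):
    """Count registrations whose [start, end] interval contains `day`."""
    return sum(1 for start, end in registrations if start <= day <= end)


def daily_counts(registrations, first_day, last_day):
    """Map each day in [first_day, last_day] to its registered-client count."""
    counts = {}
    day = first_day
    while day <= last_day:
        counts[day] = registered_on(registrations, day)
        day += timedelta(days=1)
    return counts


# Hypothetical fact rows: (start date, end date) per client.
rows = [
    (date(2020, 1, 1), date(2020, 1, 10)),
    (date(2020, 1, 5), date(2020, 1, 7)),
]
counts = daily_counts(rows, date(2020, 1, 4), date(2020, 1, 5))
```

Once counts exist per calendar day, a date range or a same-period-last-month comparison is just a lookup over a different set of days.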

How to filter result based on birthdate and min/max age values in Parse?

I'm using the Parse iOS SDK. I want to filter users based on their specified age ranges.
I have two tables:
1st, tableUser which has a field titled birthdate with a String data type.
2nd, tableSettings which has two fields minAge and maxAge, both of which are Number types
I want to fetch users from the tableUser class whose age, calculated from the birthdate field, falls within the age range specified in the tableSettings class. For example, if the minAge value is 20 and the maxAge value is 25, then I only want to retrieve users with an age in this range.
Is this possible? How would I make such a query?
Your requirement sounds non-trivial with that suboptimal data structure. I'd probably go for cloud code to hide the required logic from the app. This logic would be to query the tableSettings and calculate the date range that applies.
Now that you have this range, it's still hard to use because your other table uses a string representation of the date rather than a true Date type. This really sucks. If you can you should change the date to the correct type, or at least add another column with a correct representation of the date (but then you have to keep them in sync).
Working with dates you can add specific range criteria to your query and life is easy.
Working with strings is compounded in difficulty because you have the day first, so you can't even use BEGINSWITH to filter the query on year and then process the content. It really is a terrible data model for the problem. So this basically leaves you paging through everything doing an explicit conversion of the string to a date and then comparing that to the range.
If you at all can, change the data model. Even if you create a new class (table) specifically for this data and use an afterSave hook to keep them in sync.
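
The "calculate the date range that applies" step can be sketched as follows. This is plain Python rather than Parse cloud code, it assumes birthdates are available as real dates, and `years_before` and `birthdate_bounds` are hypothetical names.

```python
from datetime import date


def years_before(day: date, years: int) -> date:
    """Return the date `years` years before `day` (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def birthdate_bounds(min_age: int, max_age: int, today: date):
    """Return (earliest_exclusive, latest_inclusive) birthdates for the age range."""
    # Born on `latest` means turning min_age exactly today.
    latest = years_before(today, min_age)
    # Born on `earliest_exclusive` means turning max_age + 1 today: too old.
    earliest_exclusive = years_before(today, max_age + 1)
    return earliest_exclusive, latest


earliest, latest = birthdate_bounds(20, 25, date(2024, 6, 15))
```

With a true Date column, the query then only needs "birthdate greater than `earliest`" and "birthdate less than or equal to `latest`".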

Joining Two Data-sets in PowerPivot by Month

I've got 2 different data sets, revenue and contracts sold, that I need to join based off of year and month in PowerPivot so when I use my slicers, they'll filter accordingly. I know part of this will involve coming up with some temp tables for year and month but I can't get those to work. In the contracts sold table, there is an actual date column which I'm then using to format the year/month in "MM-MMM" format:
However, the revenue comes in only as a YYYYMM format:
So the solution would have to take this aspect into account as well. It's been a while since I've dealt with PowerPivot, and I recall the PowerPivotPro or Kasper de Jonge's site containing something about linking tables based on a common month, but I can't find those pages anymore. If anyone could point me in the right direction or give me some insight, it'd be greatly appreciated.
I'm using Excel 2010 with PowerPivot version 11.0.3000.0.
Thanks,
Joshua
Joshua, I think the solution can be quite simple:
In the contracts sold table, create a new calculated column (a new column within a powerpivot window) that would give you the same date format as is in the revenue table (YYYYMM).
Use Create Time Dimension app in Excel 2013 -- this app creates a date-table with unique dates which makes everything much easier. As with the other table, create a new calculated column with the same format (YYYYMM).
Make a relationship between those tables -- the date table will be linked to revenue as well as contracts.
Create the required measures (like sums of revenue, number of contracts etc.).
Place a new pivot table - rows will probably be date-based (YYYYMM), with measures coming from both tables it should be easy to create a report that you need.
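
The key idea in the steps above is a shared YYYYMM key on both tables. A minimal Python sketch of that key follows; it is not a PowerPivot formula, it assumes the revenue key is an integer such as 201303, and the function names and sample dates are made up.

```python
from collections import Counter
from datetime import date


def yyyymm_key(day: date) -> int:
    """Collapse a full date to an integer month key, e.g. 2013-03-09 -> 201303."""
    return day.year * 100 + day.month


def contracts_per_month(contract_dates):
    """Count contracts per YYYYMM key so they line up with monthly revenue rows."""
    return Counter(yyyymm_key(day) for day in contract_dates)


# Hypothetical contract dates from the contracts-sold table.
sold = [date(2013, 3, 9), date(2013, 3, 28), date(2013, 4, 2)]
per_month = contracts_per_month(sold)
```

Because both tables now carry the same key, a date table keyed the same way can filter them together.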

How would I store a date that can be partial (i.e. just the year, maybe the month too) and output it later with the same specificity?

I want to let users specify a date that may or may not include a day and month (but will have at least the year.) The problem is when it is stored as a datetime in the DB; the missing day/month will be saved as default values and I'll lose the original format and meaning of the date.
My idea was to store the real format in a column as a string in addition to the datetime column. Then I could use the string column whenever I have to display the date and the datetime for everything else. The downside is an extra column for every date column in the table I want to display, and printing localized dates won't be as easy since I can't rely on the datetime value... I'll probably have to parse the string.
I'm hoping I've overlooked something and there might be an easier way.
(Note I'm using Rails if it matters for a solution.)
As proposed by Jhenzie, create a bitmask to show which parts of the date have been specified. 1 = Year, 2 = Month, 4 = Day, 8 = Hour (if you decide to get more specific) and then store that into another field.
The only way that I could think of doing it without requiring extra columns in your table would be to use jhenzie's method of using a bitmask, and then store that bitmask into the seconds part of your datetime column.
In your model, only pay attention to the parts you care about. So you can store the entire date in your db, but you coalesce it before displaying it to the user.
The additional column could simply be used for specifying which parts of the date time have been specified:
1 = day
2 = month
4 = year
So 3 is day and month, 6 is month and year, and 7 is all three. It's a simple int at that point.
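
A minimal sketch of this bitmask idea, using the 1 = day, 2 = month, 4 = year values from this answer. It is written in Python rather than Rails for illustration, and `DateParts` and `format_partial` are hypothetical names.

```python
from datetime import datetime
from enum import IntFlag


class DateParts(IntFlag):
    """Which parts of the stored datetime the user actually entered."""
    DAY = 1
    MONTH = 2
    YEAR = 4


def format_partial(value: datetime, parts: DateParts) -> str:
    """Render only the parts flagged in `parts`, most significant first."""
    pieces = []
    if parts & DateParts.YEAR:
        pieces.append(f"{value.year:04d}")
    if parts & DateParts.MONTH:
        pieces.append(f"{value.month:02d}")
    if parts & DateParts.DAY:
        pieces.append(f"{value.day:02d}")
    return "-".join(pieces)


# The datetime column holds a full value; the int column holds the mask.
stored = datetime(2008, 8, 1)
shown = format_partial(stored, DateParts(6))  # 6 = month and year
```

The mask is stored as a plain integer, so the database schema only needs one extra int column per date.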
If you store a string, don't partially reinvent ISO 8601 standard which covers the case you describe and more:
http://en.wikipedia.org/wiki/ISO_8601
Is it really necessary to store it as a datetime at all? If not, store it as a string (2008 or 2008-8 or 2008-8-1), then split the string on hyphens when you pull it out and you're able to establish how specific the original input was.
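
Splitting on hyphens might look like the sketch below. It is an illustration in Python, not Rails code, and `parse_partial` is a hypothetical helper that does no range validation.

```python
def parse_partial(text: str):
    """Split 'YYYY', 'YYYY-M' or 'YYYY-M-D' into (year, month, day).

    Parts that were not supplied come back as None.
    """
    fields = [int(part) for part in text.split("-")]
    if len(fields) > 3:
        raise ValueError(f"too many date parts in {text!r}")
    # Pad the missing trailing parts with None.
    year, month, day = fields + [None] * (3 - len(fields))
    return year, month, day


year_only = parse_partial("2008")
year_month = parse_partial("2008-8")
```

The number of non-None values tells you how specific the original input was.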
I'd probably store the datetime and an additional "precision" column to determine how to output it. For output, the precision column can map to a column that contains the corresponding formatting string ("YYYY-mm", etc) or it can contain the formatting string itself.
I don't know a lot about DB design, but I think a clean way to do it would be with boolean columns indicating if the user has input month and day (one column for each). Then, to save the given date, you would:
Store the date that the user input in a datetime column;
Set the boolean month column if the user has picked a month;
Set the boolean day column if the user has picked a day.
This way you know which parts of the datetime you can trust (i.e. what was input by the user).
Edit: it also would be much easier to understand than having an int field with cryptic values!
The Informix database has this facility. When you define a date field, you also specify a mask of the desired time and date attributes. Only these fields count when doing comparisons.
With varying levels of specificity, your best bet is to store them as simple nullable ints. Year, Month, Day. You can encapsulate the display logic in your presentation model or a Value Object in your domain.
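
A value object over nullable ints could be sketched like this. It is a Python illustration rather than a Rails model, `PartialDate` is a hypothetical name, and the year is assumed to be required, as stated in the question.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartialDate:
    """Year is required; month and day are None when the user left them out."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def __post_init__(self):
        # A day is meaningless without the month it belongs to.
        if self.day is not None and self.month is None:
            raise ValueError("a day requires a month")

    def __str__(self) -> str:
        text = f"{self.year:04d}"
        if self.month is not None:
            text += f"-{self.month:02d}"
        if self.day is not None:
            text += f"-{self.day:02d}"
        return text


label = str(PartialDate(2008, 8))
```

Keeping the display logic in `__str__` means the rest of the application never has to inspect the null columns directly.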
Built-in time types represent an instant in time. You can use the built in types and create a column for precision (Year, Month, Day, Hour, Etc.) or you can create your own date structure and use nulls (or another invalid value) for empty portions.
For ruby at least - you could use this gem - partial-date
https://github.com/58bits/partial-date
