xls2csv: date cells are parsed incorrectly

I format dates with the command-line option -f %Y-%m-%d (or even %d-%b-%y),
but each date comes out four years and one day later than the date I entered.
For example, the date 01.06.2012, parsed without the -f option, comes out as 2016-06-02.
Toying with -f gives the same result.
What is the reason? Are there any workarounds
other than hardcoding a subtraction of these 4 years and 1 day?
I am using xls2csv (by V. B. Wagner; it comes with the catdoc package in Debian),
and switching to another parser would be very expensive.

The xls2csv tool is a Perl application that uses the Spreadsheet::ParseExcel library.
According to that library's documentation, one known problem is:
If Excel has date fields where the specified format is equal to the system-default for the short-date locale, Excel does not store the format, but defaults to an internal format which is system dependent. In these cases ParseExcel uses the date format 'yyyy-mm-dd'.
So you are probably working with an Excel file that does not contain date formatting, because of the issue described above.

That's a known bug. A patch is available at
https://www.wagner.pp.ru/cgi-bin/cvstrac/catdoc/tktview?tn=14,4
It works.
By the way, there are two programs called xls2csv. We're talking about the one from the catdoc package, not the Perl program.
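For anyone who cannot apply the patch, the shift reported in the question is at least regular: 2012-06-01 to 2016-06-02 is exactly 1462 days, which is also the gap between Excel's 1900 and 1904 date systems. Whether that is the root cause here is an assumption; the thread does not say. A minimal Python sketch of the post-processing fallback, where `fix_shifted_date` is a hypothetical helper name and the input is assumed to be printed as YYYY-MM-DD:

```python
from datetime import date, timedelta

# Observed shift in the question: 2012-06-01 was emitted as 2016-06-02.
# 1462 days is also the distance between Excel's 1900 and 1904 date systems
# (assumed, not confirmed, to be the cause here).
SHIFT = timedelta(days=1462)


def fix_shifted_date(text: str) -> str:
    """Undo the shift for a date printed as YYYY-MM-DD (hypothetical helper)."""
    shifted = date.fromisoformat(text)
    return (shifted - SHIFT).isoformat()


print(fix_shifted_date("2016-06-02"))
```

This only makes sense as a stopgap on the CSV output; the patch fixes the problem at the source.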

Related

How many obscure subclasses of `NSFormatter` are there? Know of any others?

If you've used Cocoa for a while you're probably familiar with NSDateFormatter and NSNumberFormatter. They're handy for creating formatted display strings from dates and numbers, or for converting date or number strings into numeric values, while supporting different languages and locales.
A few weeks ago I stumbled on NSDateComponentsFormatter, which lets you create formatted time intervals like "4 hours, 37 minutes and 17 seconds." Pretty cool.
There's also the related NSDateIntervalFormatter, which creates strings by comparing 2 dates.
Then there are some REALLY obscure NSFormatter subclasses:
NSMassFormatter
NSByteCountFormatter
NSLengthFormatter
NSEnergyFormatter
NSPersonNameComponentsFormatter
EDIT:
From the comments, I've added NSPersonNameComponentsFormatter.
Searching on "NS*Formatter" in the Xcode help system reveals most of these, but not all. (It looks like the help text has to be indexed correctly in order for searching to work, which is annoying.)
That brings the total I have been able to find to
NSDateIntervalFormatter - Difference between 2 dates
NSDateComponentsFormatter - NSDateComponents to/from string
NSDateFormatter - Formats NSDates as strings
NSNumberFormatter - Formats numbers as strings
NSMassFormatter - Formats mass quantities as strings
NSByteCountFormatter - Formats byte counts in KB, MB, GB, etc.
NSLengthFormatter - Formats length values
NSEnergyFormatter - Displays energy quantities in joules or calories
NSPersonNameComponentsFormatter - Displays localized formatted names
Annoyingly, it looks like many of these formatters don't have a locale property, so it's not very easy to use them to create formatted strings in languages/locales other than the system's default locale. If I'm missing something, please tell me.
Does anybody else know of other formatters I'm missing? These are pretty obscure, but could save you a lot of time if you were to need them.
EDIT #2:
Question part 2: Is there a way to get output from the formatters that lack a locale property in locales other than the system default locale? It seems silly that they don't ALL have and honor a locale property. It's pretty common to need to generate output for languages/locales other than the current locale.
There's no need to search. The NSFormatter documentation lists all of its subclasses. Look at the top of the page, in the "inherits from" block.
Note that this information is not available in the Xcode 7.3 doc reader. It's only available in the online docs (or by using the excellent Dash reader).
I went into
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/System/Library/Frameworks
and then did
for header in **/*.h; do ack -o 'NSFormatter' "$header"; done
which gave me some interesting ones:
NSPersonNameComponentsFormatter
CNContactFormatter
CNPostalAddressFormatter
DRMSFFormatter
MKDistanceFormatter
Doing the same for iPhoneOS.sdk didn't turn up any new NSFormatter subclasses.

tFuzzyMatch apparently not working on Arabic text strings

I have created a job in Talend Open Studio for Data Integration v5.5.1.
I am trying to find matches between two customer name columns: one is a lookup and the other contains dirty data.
The job runs as expected when the customer names are in English. However, for Arabic names, only exact matches are found, regardless of the underlying match algorithm I used (Levenshtein, Metaphone, Double Metaphone), even with loose bounds for the Levenshtein algorithm (min 1, max 50).
I suspect this has to do with character encoding. How should I proceed? Is there any way I can operate on the Unicode or even UTF-8 interpretation in Talend?
I am using Excel data sources through tFileInputExcel.
I got it resolved by moving the data to MySQL with a UTF-8 collation. Somehow the Excel input wasn't preserving the encoding.
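To illustrate why encoding matters for edit-distance matching, here is a small Python sketch. It does not reproduce Talend's tFuzzyMatch implementation; `levenshtein` is a plain textbook version written for this example, and the two names are made-up sample data. With correctly decoded text the two names differ by one letter, while the same bytes mis-read as Latin-1 give a different distance.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance counted in Unicode code points, not bytes."""
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


clean = "محمد"   # sample lookup value
dirty = "محمود"  # sample dirty value, one extra letter

# Properly decoded text: one inserted letter.
print(levenshtein(clean, dirty))

# Same UTF-8 bytes mis-decoded as Latin-1: each Arabic letter becomes two characters.
garbled_clean = clean.encode("utf-8").decode("latin-1")
garbled_dirty = dirty.encode("utf-8").decode("latin-1")
print(levenshtein(garbled_clean, garbled_dirty))
```

If distances look off for non-Latin text, checking how the source was decoded before it reaches the matcher is a reasonable first step.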

Best way to present the ambiguous extra hour when winter time begins

What is the standard way to present the ambiguous extra hour when winter time begins?
So far I have used localized time formats to display and parse dates and times, e.g. 1. January 2014, 15:27.
I'm using location-based time zones like "Europe/Berlin".
I can't just switch to plain GMT offsets, because I need to perform calculations on the dates. Otherwise I would get the wrong absolute time when moving across DST change dates.
All this works fine except for the one hour at the end of DST (e.g. October 26th 2014, 2am-3am), which occurs twice. I need to present it in a way that I can later parse again.
Is there a standardized format? Do I just add a custom symbol? Do I use the GMT offset in addition to the geographic time zone? And how do I know when to use this special format? I don't want to print it all the time, since it's redundant for most of the year.
The answer by Matt Johnson is correct and insightful.
Java 8 & java.time.*
Let me add another to his list, from the new java.time.* classes bundled with Java 8 and defined by JSR 310. These new classes are inspired by Joda-Time but are re-architected.
The default used by java.time.ZonedDateTime is one concatenated string using square brackets around the time zone name and no spaces.
2014-10-26T13:49:48.278+01:00[Europe/Berlin]
@MattJohnson Feel free to merge this answer's content with yours if you wish.
There isn't a standard that combines everything. The closest you can get is with two fields. One which would be an ISO8601/RFC3339 Date+Time+Offset, and another which would be the IANA/Olson time zone.
Depending on your platform, you may have a single object that represents both, such as a DateTime in Joda-Time or a ZonedDateTime in Noda Time. But there is no standardized representation of this as a string.
Here are some that I have seen though:
Two completely separate strings:
"2014-10-26T02:00:00+01:00"
"Europe/Berlin"
One concatenated string, space separated:
"2014-10-26T02:00:00+01:00 Europe/Berlin"
One concatenated string with a space and using parentheses:
"2014-10-26T02:00:00+01:00 (Europe/Berlin)"
One concatenated string without any space, but with square brackets: (thanks Basil)
"2014-10-26T02:00:00+01:00[Europe/Berlin]"
As JSON, with some predetermined field names:
{
"value": "2014-10-26T02:00:00+01:00",
"zone": "Europe/Berlin"
}
As XML, with some predetermined attribute names:
<TimeStamp Value="2014-10-26T02:00:00+01:00" Zone="Europe/Berlin" />
As XML, with some predetermined element names:
<TimeStamp>
<Value>2014-10-26T02:00:00+01:00</Value>
<Zone>Europe/Berlin</Zone>
</TimeStamp>
Any of these would be acceptable. Pick the one that fits your situation, or adapt to something similar.
Regarding your question:
... how do i know when to use this special format ...
When you're recording an event that has already passed and cannot be changed, then you do not need to store the time zone. The date+time+offset value alone is sufficient. Otherwise, you need both.
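As for printing the offset only when it is needed, one possible approach is to check whether a local wall-clock time maps to two different UTC offsets. The Python sketch below uses the standard `zoneinfo` module and the `fold` attribute from PEP 495; the helper names `is_ambiguous` and `format_local` are made up for this example, and the bracketed output style is just one of the formats listed above, not a standard.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def is_ambiguous(naive: datetime, zone: ZoneInfo) -> bool:
    """True if this wall-clock time occurs twice in the zone (end of DST)."""
    first = naive.replace(tzinfo=zone, fold=0).utcoffset()
    second = naive.replace(tzinfo=zone, fold=1).utcoffset()
    # In a repeated hour the first occurrence has the larger offset;
    # in a skipped hour (start of DST) the relation is reversed.
    return first > second


def format_local(naive: datetime, zone_name: str) -> str:
    """Append the UTC offset only when the local time would be ambiguous."""
    zone = ZoneInfo(zone_name)
    if is_ambiguous(naive, zone):
        # naive.fold selects the first (0) or second (1) occurrence.
        stamp = naive.replace(tzinfo=zone).isoformat()
    else:
        stamp = naive.isoformat()
    return f"{stamp}[{zone_name}]"


print(format_local(datetime(2014, 10, 26, 2, 30), "Europe/Berlin"))
print(format_local(datetime(2014, 10, 26, 2, 30, fold=1), "Europe/Berlin"))
print(format_local(datetime(2014, 10, 26, 13, 49), "Europe/Berlin"))
```

A parser for this format would need the reverse step: when an offset is present, use it to pick between the two occurrences; when it is absent, the zone name alone is enough.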

Converting Date Format in Advantage SQL

I have a simple problem in Advantage Database SQL.
I have dates in the format M/D/YYYY and want to convert them to MM/DD/YYYY. Normally in SQL Server I would just use convert(varchar(20), field, 101), but this does not work in Advantage.
What is the format for doing so?
I don't believe there is a simple conversion function like that available. To convert it directly in SQL would probably turn into a fairly messy statement (I think it would require a combination of CONVERT, YEAR, DAY, and MONTH scalars).
If the goal, though, is to force the display of date values in a specific format in the client application, then one possibility might be to specify the date format at connection time. How you do that depends on the client being used. If, for example, you are using a connection string, then you may be able to specify the date format as follows.
Data Source=\\server\share\yourdatapath;...;DateFormat=MM/DD/YYYY;
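If the reformatting can happen in the client rather than in SQL, the zero-padding itself is simple. A minimal Python sketch, assuming the values arrive as M/D/YYYY strings; `pad_us_date` is a hypothetical helper and has nothing to do with the Advantage client API:

```python
from datetime import datetime


def pad_us_date(text: str) -> str:
    """Convert M/D/YYYY to MM/DD/YYYY (hypothetical client-side helper)."""
    # strptime accepts one- or two-digit month and day values for %m and %d.
    parsed = datetime.strptime(text, "%m/%d/%Y")
    return parsed.strftime("%m/%d/%Y")


print(pad_us_date("7/3/2011"))
```

Parsing and re-formatting also rejects invalid dates such as 13/40/2011, which plain string padding would let through.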

What formats can I use for an xforms:input bound to a node of type xs:date?

I am looking at the <xforms:input> formatting documentation and am curious whether it is at all possible to display the date as "3 Jul 2011". This formatting should be very simple given the use of Java's SimpleDateFormat with the mask [d] [MMM] [yyyy]. The <xforms:input> documentation makes it seem possible to change the canonical format, but it only references regular expressions.
Or am I restricted to the masks [M], [D] and [Y]?
You can choose pretty much any format you want when displaying a date or time with <xforms:output>. However, when capturing a date or time with <xforms:input>, Orbeon Forms limits you to just a few formats, as documented.
The reason for this is somewhat technical: for inputs, Orbeon Forms needs to be able both to format the date/time in the format you specify and to parse it. The parsing is implemented to accept as many reasonable date or time formats entered by the user as possible. For instance, if you choose a [M]/[D]/[Y] format (typical in the US), you can enter 12/2/2011, but also 12/2 (skipping the year), or even 2 (skipping both the year and the month), or today, as well as several other formats.
The bottom line is that because of this "smart parsing", the <xforms:input> can only support a number of predefined formats. Additional formats can be added, but this requires Orbeon Forms itself to be changed to support those additional formats.
