Charset for exporting from SQL Developer

When exporting from Oracle SQL Developer I've mostly been using 'MacRoman', but this causes problems when I later want to manipulate the data with Python, or when I import the CSV files into Excel. Since the data contains Norwegian characters and special signs, I often have to fix these with find-and-replace or other tedious methods. I have tried choosing both 'Unicode' and 'UTF-8', which I normally prefer, but without much success. In Python I end up with sequences such as '\xc3\xb8r', and in Excel with characters like '¢'.
Is there a good rule of thumb to get this right? What encoding is the most ubiquitous, and what works best with SQL Developer?

The problem was not so much in SQL Developer as in Excel. I ended up using SQL Developer to export the dataset as UTF-8, and then transforming it into an Excel file using Google Refine.
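For the Python side, '\xc3\xb8r' is simply the UTF-8 byte sequence for 'ør' printed undecoded. A minimal sketch (Python 3; the file name is made up for illustration) of reading the UTF-8 export so the Norwegian characters survive:

import csv

# Open the SQL Developer export with an explicit encoding so Norwegian
# characters arrive as text rather than raw byte sequences.
with open("export.csv", encoding="utf-8") as f:
    for row in csv.reader(f):
        print(row)

# If you are handed raw bytes such as b'\xc3\xb8r', decode explicitly:
print(b"\xc3\xb8r".decode("utf-8"))  # -> 'ør'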

Related

Dealing with invalid characters from web scraping

I've written a web scraper to extract a large amount of information from a website using Nokogiri and Mechanize, which outputs a database seed file. Unfortunately, I've discovered there are a lot of invalid characters in the text on the source website, things like keppnisÃ¦find, ScÃ©mario and KlÃ¤tiring, which is preventing the seed file from running. The seed file is too large to go through with search and replace, so how can I go about dealing with this issue?
I think those are HTML characters; all you need to do is write functions that will clean them up. The details depend on the programming platform.
Those are almost certainly UTF-8 characters; the words should look like keppnisæfind, Scémario and Klätiring. The web sites in question might be sending UTF-8 but not declaring that as their encoding, in which case you will have to force Mechanize to use UTF-8 for sites with no declared encoding. However, that might complicate matters if you encounter other web sites without a declared encoding and they send something besides UTF-8.
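To illustrate the diagnosis (this is not the Mechanize configuration itself): the text is valid UTF-8 that was decoded as Latin-1, and that damage is reversible. A minimal Python sketch:

# Mojibake repair: re-encoding the mangled string as Latin-1 recovers
# the original UTF-8 bytes, which then decode cleanly.
broken = "keppnisÃ¦find"                 # what ends up in the seed file
fixed = broken.encode("latin-1").decode("utf-8")
print(fixed)                             # -> 'keppnisæfind'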

Flat file in Delphi

In my application I want to use files for storing data. I don't want to use a database or a plain text file; the goal is to save double and integer values along with a string, just to identify the name of the record. I simply need to save data on disk for generating reports. The file can grow to a gigabyte or more. What format would you suggest? Binary? If so, which VCL component/library do you know that is good to use? My goal is to create an application which creates and updates the files, while another tool will "eat" those files, producing nice PDF reports for the user on demand. What do you think? Any ideas or suggestions?
Thanks in advance.
If you don't want to reinvent the wheel, you may find all the Open Source tools needed for your task on our side:
Synopse Big Table to store huge amounts of data - see in particular the TSynBigTableRecord class to store an unlimited number of records with fields, including indexes if needed - it will definitely be faster and use less disk space than any regular SQL DB
Synopse SQLite3 Framework if you would rather use a standard SQLite engine for the storage - it comes with a full Client/Server ORM
Reporting from code, including pdf file generation
With full Source code, working from Delphi 6 up to XE.
I've just updated the documentation of the framework. More than 600 pages, with details of every class method, and a new enhanced general introduction. See the SAD document.
Update: If you plan to use SQLite, you should first work out how the data will be stored, which indexes are to be created, and how a SQL query may speed up your requests. It's a bad idea to read all the file content for every request: you should structure your data so that a single SQL query returns the expected results. Sometimes, adding derived values (like temporary sums or means) to the data is a good idea. Also consider using the RTree virtual table of SQLite3, which is dedicated to speeding up access to double min/max multi-dimensional data: it can speed up your requests a lot.
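To make the RTree point concrete, here is a minimal sketch in Python rather than Delphi (the pattern is the same through any SQLite wrapper; the table and column names are invented, and it requires a SQLite build compiled with SQLITE_ENABLE_RTREE):

import sqlite3

con = sqlite3.connect(":memory:")
# An R-tree virtual table stores an id plus min/max pairs per dimension.
con.execute("CREATE VIRTUAL TABLE samples_idx USING rtree(id, t0, t1)")
con.execute("INSERT INTO samples_idx VALUES (1, 10.0, 20.0)")
# Range queries against the min/max columns are answered from the index:
print(con.execute(
    "SELECT id FROM samples_idx WHERE t0 <= 15 AND t1 >= 15").fetchall())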
You don't want to use a full SQL database, and you think that a plain text file is too simple.
Points in between those include:
Something that isn't a full SQL database but more of a key-value store, such as BSDDB. Technically not a flat file, but it does provide a single key+value list that is quickly searchable on a single primary key. It has the letters D and B in the name; does that make it a database, in your view? Because it's not a relational database, and doesn't do SQL. It's just a binary key-value (hashtable) blob-storage mechanism, using a well-understood binary file format. Personally, I wouldn't start a new project using anything in this category.
Recommended: Something that uses SQL but isn't as large as a standalone SQL database server. For example, you could use SQLite and a Delphi wrapper (see the sketch at the end of this answer). It is well tested, used in lots of C/C++ and Delphi applications, and can be trusted more than anything you could roll yourself. It is a very light embedded database, and is trusted by many.
Roll your own ISAM, or VLIR, which will eventually morph over time into your own in-house DBMS. There are multiple files involved, and there are indexes, so you can look up data fast without loading everything into memory. Not recommended.
The flattest of flat binary fixed-record-length files. You originally mentioned PowerBASIC in your question, which has something called Random Access files, and then deleted that from your question. This is probably what you are looking for, especially with append-only writing as the primary operation. Roll your own Turbo Pascal-era "file of record". If you use the FILE OF RECORD type, you hit the 2 GB limit, and there are problems with Unicode, so use TStream instead. Binary file formats have a lot of strikes against them, above all that it is difficult to grow and extend a binary file format over time without breaking your ability to read old files. This is a key reason why I would recommend you start out with what might at first seem like overkill (SQLite) instead of rolling your own binary solution.
(Update 2: After the question was updated to mention PDFs and what sounds like a reporting-system requirement, I think you really should be using a real database, but perhaps a small and easy-to-use one, like Firebird or InterBase.)
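Here is the promised sketch of the recommended SQLite route, written in Python for brevity; the same schema and queries work from Delphi through any SQLite wrapper, and the table and column names are invented for illustration:

import sqlite3

con = sqlite3.connect("measurements.db")
con.execute("""CREATE TABLE IF NOT EXISTS readings (
                   id     INTEGER PRIMARY KEY,
                   name   TEXT NOT NULL,   -- identifies the record
                   ivalue INTEGER,
                   dvalue REAL)""")
con.execute("CREATE INDEX IF NOT EXISTS idx_name ON readings(name)")

# Writer side: append-only inserts as the data is produced.
con.execute("INSERT INTO readings (name, ivalue, dvalue) VALUES (?, ?, ?)",
            ("sensor-1", 42, 3.14))
con.commit()

# Reporting side: one query instead of scanning a gigabyte file.
for row in con.execute("SELECT name, AVG(dvalue) FROM readings GROUP BY name"):
    print(row)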
I would suggest using TClientDataSet, and using its SaveToFile() / SaveToStream() methods in the generating program, and its LoadFromFile() / LoadFromStream() methods in the program that will "consume" the data. That way, you can still have indexed records without connecting to any external database, all while keeping the interchange data in a single file.
Define an API to work with your flat file, so that the API can be implemented by a separate data layer in many ways (a minimal sketch follows).
Implement the API using a standard embedded SQL database (e.g. SQLite or Firebird).
Only if there is something wrong with the standard solution should you think of rolling your own.
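A minimal sketch of that layering idea, in Python for brevity (all names are invented): the writer and the report generator both code against the API, not against the file format, so the backing store can be swapped later.

from abc import ABC, abstractmethod

class RecordStore(ABC):
    """API both the writer and the report tool depend on."""
    @abstractmethod
    def append(self, name: str, ivalue: int, dvalue: float) -> None: ...
    @abstractmethod
    def records_for(self, name: str) -> list[tuple[int, float]]: ...

# One implementation can wrap SQLite or Firebird today; a later one
# could wrap a custom binary file without touching the callers.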
I use kbmMemTable - see http://www.components4developers.com/ - fast, reliable, been around a long time - supports binary and CSV streaming in and out of files, as well as indexing, filters, and lots of other goodies - TClientDataSet will not do well with large datasets.

What is the best way to set a collation in Informix IDS?

I'm administering an Informix IDS DBMS in Argentina. We speak Spanish, and the traditional ASCII character set of Informix doesn't fit our needs.
I've been fooling around and made it work with the DB_LOCALE variable, but I've seen two others called CLIENT_LOCALE and SERVER_LOCALE. Should I use them? Is DB_LOCALE enough?
Thanks.
You mainly need to set CLIENT_LOCALE and DB_LOCALE - to es_es.8859-1 or something similar (maybe es_ar.8859-1, but you would probably need to get the ILS International Language Supplement to get that, assuming it is available at all).
The server locale controls the language used when the server reports errors. Some of the messages in the server log files would be given in Spanish rather than English.
The DB_LOCALE controls how the data is sorted in the database's indexes. It is most critical when the database is created; if it is not set, the database will be assumed to be in US English (American). You should normally set DB_LOCALE when accessing the database too, though it isn't quite as critical. The CLIENT_LOCALE should be set too. Usually, these values are the same. Sometimes, though, you have a Windows client running with a Microsoft code page for Spanish (CP 1252, I think) and a Unix server using 8859-1 or perhaps 8859-15. In those cases, the GLS (Global Language Support) library will automatically take care of codeset conversion for you.
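As a small illustration of where these settings live: they are plain environment variables that must be in place before the client library initializes. A Python sketch (the locale values are taken from the answer above; the actual connection call depends on whichever Informix driver you use):

import os

# Client-side setting: how your session talks to the server.
os.environ["CLIENT_LOCALE"] = "es_es.8859-1"
# Must match the locale the database was created with.
os.environ["DB_LOCALE"] = "es_es.8859-1"
# Optional: mainly affects the language of server messages.
os.environ["SERVER_LOCALE"] = "es_es.8859-1"

# ...now open the connection with your Informix driver of choice.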

How does COBOL store and retrieve data? [closed]

I'm starting to learn about COBOL. I have some experience writing programs that deal with SQL databases, and I guess I'm confused about how COBOL stores and retrieves data that is stored on a mainframe, for example. I know that it's not like relational databases, but every example program I've seen takes data straight from the command line, and I know that's not how real-world COBOL programs process data. Can someone explain, or point me to a good resource that can explain it?
COBOL is just another third-generation computer language. It is just a little older than most, which doesn't mean it is somehow incomplete (actually it comes with quite a bit of baggage, but that is another story).
As with any other third generation language, COBOL manipulates data files in pretty much the same way that you would in a C program. Nothing odd, mysterious or magical about it. Files are opened, read, written and closed using the file I/O features of the language.
Various mechanisms are used to form a link between an actual file and the program. The details here are often specific to the operating system you are working under. Generally, COBOL implementations try to isolate themselves from the operating environment through a logical file name as opposed to an actual name. This added indirection is important when you are writing programs that will be ported to different platforms (e.g. write and test within an IDE on a Windows platform, and then run on a mainframe).
The following examples relate to an IBM Mainframe environment.
Within the IBM mainframe world, you will find that programs are run as either batch or on-line (e.g. CICS). I will not describe how to set up for file I/O under CICS (that's a long story). Programs that are used to manipulate files are usually batch. Here is a rough illustration of how a batch program works:
Batch programs are run via JCL. JCL is used to identify the program to run ('EXEC' statement) and to identify which files your program will reference using 'DD' statements. The function of a DD Statement is to form a logical connection between an actual file and a name your COBOL program will reference when it wants to refer to the file (this is the isolation mechanism mentioned earlier). For example,
//JCLDDNAM DD DSN=HLQ.MY.FILE...
would associate the DD name 'JCLDDNAM' with the file named 'HLQ.MY.FILE'. This part is platform dependent, so the details are specific to the operating environment.
In the 'FILE-CONTROL' section of your COBOL program, you connect the 'DD NAME' defined in your JCL with the name you will use on each I/O statement to reference that file. This connection is defined using the 'SELECT' statement.
For example,
SELECT MYFILE
ASSIGN TO JCLDDNAM
remainder of select
makes a connection between whatever file you associated with 'JCLDDNAM' in your JCL and the name 'MYFILE' that you will later reference in COBOL I/O statements. The SELECT statement itself is part of the ISO COBOL standard. However, many COBOL implementations define non-standard extensions to accommodate various quirks of their file subsystems.
Open, read, write and close files within the 'PROCEDURE DIVISION' of your program using the name 'MYFILE', as in:
OPEN INPUT MYFILE
READ MYFILE
CLOSE MYFILE
The above is highly simplified, and there are a multitude of ways to do this within COBOL. Understanding the complete picture will take some real effort, time and practice. The I/O statements illustrated above are part of the COBOL standard, but every vendor will have its own extensions.
IBM COBOL supports a wide range of file organizations and access methods. You can review the IBM Enterprise COBOL Language Reference manual to get the syntax and rules for file manipulation. The User Guide provides a lot of good examples of reading and writing files (you will have to dig a bit, but it is all laid out for you).
The setup to reference an SQL database from a COBOL program is somewhat different, but involves setting up a connection between your program and the database subsystem. Within the IBM world this is done through JCL; other environments will use different mechanisms.
IBM COBOL uses a pre-processor or co-processor to integrate database access and data exchange. For example, the following code would retrieve some data from a DB2 database:
MOVE 1234 TO PERSON-ID
EXEC SQL
SELECT FIRST_NAME, LAST_NAME
INTO :FIRST-NAME, :LAST-NAME
FROM PERSON
WHERE PERSON_ID = :PERSON-ID
END-EXEC
DISPLAY PERSON-ID FIRST-NAME LAST-NAME
The stuff between EXEC SQL and END-EXEC is a pretty simple SQL select statement. The names preceded by colons are COBOL host variables used to pass data to DB2 or receive it back. If you have ever coded database access routines before, this should look very familiar to you. This link provides a simple introduction to incorporating SQL statements into an IBM Enterprise COBOL program.
By the way, IBM Enterprise COBOL is capable of working with XML documents too. Sorry for the heavy IBM slant, but that is the environment I am most familiar with.
Hope this gets you started in the right direction.
Who said you cannot use SQL to retrieve data from a COBOL application, perhaps without spending money?
A company I used to work for did just that - with SQLite. This little gem of a public domain library compiles SQL statements to bytecode, then executes them.
By replacing the "backend" level of SQLite with a custom interface to the C library that deals with COBOL files, it was possible to query the COBOL data from other languages, Python in that case. It worked -- within the limits of SQLite of course, but it was stable, it seemed relational enough, and it didn't even require a DB server :-)
Traditional COBOL batch environments use the data division of the COBOL program to directly declare database connections, which are in turn set up in JCL. Since COBOL predates SQL, those would have tended to be various other types of databases, but it's likely that IBM made SQL work with DB2. I imagine you'll get another answer from someone closer to this stuff. If you look at the SQL preprocessors available for use with other languages you'll get the idea: a cursor becomes a native datatype and delivers query results to native variables.
Mainframe COBOL uses embedded SQL (kinda like SQLJ), e.g.:
Procedure Division.
Exec SQL
Select col1, col2
into :ws-col1, :ws-col2
from myTable
where col0 = :ws-col0
End-Exec
In this case, the host variables ws-col0, ws-col1 and ws-col2 are defined in the working-storage section. The database interface manages getting that data in the right place.
Very easy compared to the distributed stuff actually.
All the IBM mainframe shops I've worked in have used COBOL that talked to a relational database, generally IBM's DB2. Please note that DB2 is a relational database that runs on mainframes. It can also be run on Windows or Linux.
Twenty years ago the predominant way to enter data into a DB2 mainframe database was to use CICS. CICS is "presentation level" software that communicates with character-based data entry screens. Consider CICS the functional equivalent of PHP or ASP.NET.
Today there are many more options for getting data into DB2. CICS is still an option, but your "presentation layer" could be PHP, ASP.NET, WinForms, Java JSF, or PowerBuilder. The key thing is that your development platform needs to be able to work with a DB2 database driver. The platform could be Windows, Linux, and possibly others.
My point is that data can get into the mainframe DB2 database in many ways and from many platforms. The COBOL language might be involved in data entry, reporting, altering data, etc. But it might be only one part of a multi-tier application that could be part Windows, part web, and part mainframe. I could give specific examples if you have more information about the application you'll be working with at your internship.

Reading Excel spreadsheets with Delphi

I need to read from and write to Excel spreadsheets using Delphi 2010. Nothing fancy. Just reading and writing values from specific cells and ranges on different sheets. Needs to work without having Excel installed and support Excel 2007.
Some things I've looked at:
I've tried using ADO, which works OK for selecting everything in an entire sheet, but I haven't had much luck reading specific cells or ranges.
NativeExcel looked promising, but it doesn't seem to be in active development, and they don't respond to e-mails.
Axolot has a couple of products. The main product seems to be very functional, but is pricey. They have a lite version, but it doesn't support Delphi 2010.
Any recommendations? Free would be great, but I'm open to a commercial solution as long as it's reliable and well supported.
TMS Flexcel - I know it looks like a reporting component for Excel (which it does very well and is a very handy tool to have in your toolkit) but it also includes components for reading and displaying Excel files. I've been very impressed with how well Adrian Gallero seems to know the Excel API, including Excel 2007.
http://www.tmssoftware.com/site/flexcel.asp
Not free of course, but at 75 Euros I think it's good value.
I've had very good luck with ADO, provided the Excel sheet has a fairly straightforward row/column layout.
The key with using ADO, I've found, is treating the Excel sheet like a database. If your Excel sheets are primarily straight row and column layouts, just treat the rows as database records and the columns as fields. Navigate to the desired row first by searching for a particular column (field) value (preferably something unique), and then read the desired cell in that row by referencing the field that is the column name.
If your Excel sheets are more free-form, then it will be more difficult.
Don't write off NativeExcel. I have used it for a couple of years now with excellent results. It's fast and versatile. I use it to produce a nicely formatted multi-page spreadsheet with frozen panes, formulas in cells, and data from a client's database for them to use for input and then send back to me. My clients were really thrilled when they got the first spreadsheet from me because it reduced their workload tremendously and it was fairly intuitive for them to use.
I don't know why they have not responded to you, because I have updated their package at least a couple of times over the last two years. When my license expires, I definitely intend to renew.
I'd recommend SMImport / SMExport from http://www.scalabium.com
Mike has always been really helpful and quick to respond.
What really helps is if you have some kind of control over the layout of the Excel file.
I have built a whole unit and acceptance testing framework where the data and the test controls are all contained within an Excel spreadsheet.
I did everything through ADO. You can restrict your ADO SQL query to a whole sheet, a named range, or any range for that matter. In my opinion and experience, this method is very powerful.
Two things that did cause me some problems:
1. Depending on how your sheets are named, ADO might or might not see them (again, if you have control over the layout, great!).
2. Be careful about the data type ADO returns when you read data, i.e. it might show numbers as strings. This is because ADO tries, IIRC, to guess the data type based on the first few rows.
Disclaimer: I have never used any of the tools mentioned above. ADO did the trick for me, and I feel more in control since I wrote the code for my framework (save the ADO part obviously...).
Bruce, I've used the Axolot XLSReadWriteII component for going on 10 years now. It's been very good, and their support forums (while light on content) seem to be monitored pretty well. The XLSReadWriteII2 version is blindingly fast, and supports all sorts of things like charts and graphics, named ranges, adding formulas on the fly, and cell formatting (including borders and shading, merging cells, vertical and horizontal alignment, auto-width column sizing, and so forth).
I haven't upgraded to the latest version (we're still using XLSReadWriteII2) because we can still use the Excel XP format files, and I haven't used the XLSMini at all. I can say really good things about the full product, though; in fact, I just used it for a couple of database export things this past week.
If you decide to go that route, I have a bunch of notes about how to do different things that might be useful; if you want them, drop me a note. I also have a Delphi 2007 app that just shows how to do different formatting and alignments; I actually reproduced an existing, fairly complex report in Excel complete with all of the formats, borders, etc. that I'd be glad to let you have as well.
DISCLAIMER: I have no connection with Axolot or any of their employees. I'm just a very happy customer who learned of the product at a previous job, and was impressed enough to buy it when I started my current one.
Don't bother with Axolot's XLSMini (lite) version. I haven't purchased either of them yet, but I asked about Excel 2007 support in early 2008 and Lars told me XLSMini was based on the XLSReadWriteII and that both would be updated with Excel 2007 support at the same time. XLSReadWriteII has had Excel 2007 support since April 2008; XLSMini still doesn't have it.
With great luck, I've used Axolot for some years now. The support forum isn't exactly brimming with messages, but maybe that's because it works so well?
You can use an ADO connection string like the ones at
http://www.connectionstrings.com/excel
then include the options (in the third tab of the ADO connection string editor):
Extended Properties='Excel 8.0;HDR=Yes;IMEX=0'
For security purposes, Microsoft prevents modifications when IMEX=1:
http://support.microsoft.com/kb/904953/en
Sample SQL (don't forget the brackets):
SELECT * FROM [Sheet1$]
The only thing you can't do is deletion:
http://support.microsoft.com/kb/257819/en
So to delete a row make it empty!
You can also use SQL via ADO to export:
YourADOConnection.Execute('SELECT * INTO aSheet IN "'+ExtractFilePath(ParamStr(0))+'Exported.xls" "Excel 8.0;" FROM YourTable');
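Tying the pieces above together, here is a minimal sketch of the same connection string and bracketed SELECT driven from Python via COM (Windows-only; it assumes pywin32 and the Jet OLE DB provider are installed, and the file path and sheet name are made up):

import win32com.client

conn = win32com.client.Dispatch("ADODB.Connection")
conn.Open("Provider=Microsoft.Jet.OLEDB.4.0;"
          r"Data Source=C:\data\book.xls;"
          "Extended Properties='Excel 8.0;HDR=Yes;IMEX=0'")

rs = win32com.client.Dispatch("ADODB.Recordset")
rs.Open("SELECT * FROM [Sheet1$]", conn)   # note the brackets and $
while not rs.EOF:
    print([field.Value for field in rs.Fields])
    rs.MoveNext()
rs.Close()
conn.Close()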
I would advise going for an option where you don't need Excel installed on the machine. I once used a component that could easily fill in some data in one sheet without needing Excel installed. I would also do most of the Excel work in the Excel sheet itself, and just use the components to fill in some data on the sheet.
My 2cts.
