DBF Large Char Field - oledb

I have a database file that I believe was created with Clipper, but I can't say for sure (I have .ntx index files, which I understand is what Clipper uses). I am trying to create a C# application that reads this database using the System.Data.OleDb namespace.
For the most part I can successfully read the contents of the tables, but there is one field that I cannot: a field called CTRLNUMS that is defined as CHAR(750). Various articles I found through Google searches suggest that fields larger than 255 characters have to be read through a different process than the normal assignment to a string variable, but so far none of the approaches I have found has worked.
The following code snippet shows how I read the table, including the two options I tried for reading the CTRLNUMS field. Both options return only 238 characters, even though there are 750 characters stored in the field.
Here is my connection string:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\datadir;Extended Properties=DBASE IV;
Can anyone tell me the secret to reading larger fields from a DBF file?
using (OleDbConnection conn = new OleDbConnection(connectionString))
{
    conn.Open();
    using (OleDbCommand cmd = new OleDbCommand())
    {
        cmd.Connection = conn;
        cmd.CommandType = CommandType.Text;
        cmd.CommandText = string.Format("SELECT ITEM, CTRLNUMS FROM STUFF WHERE ITEM = '{0}'", stuffId);
        using (OleDbDataReader dr = cmd.ExecuteReader())
        {
            if (dr.Read())
            {
                stuff.StuffId = dr["ITEM"].ToString();

                // OPTION 1: plain string assignment
                string ctrlNums = dr["CTRLNUMS"].ToString();

                // OPTION 2: chunked read via GetChars
                char[] buffer = new char[750];
                int index = 0;
                int readSize = 5;
                while (index < 750)
                {
                    long charsRead = dr.GetChars(dr.GetOrdinal("CTRLNUMS"), index, buffer, index, readSize);
                    index += (int)charsRead;
                    if (charsRead < readSize)
                    {
                        break;
                    }
                }
            }
        }
    }
}

You can find a description of the DBF structure here: http://www.dbf2002.com/dbf-file-format.html
What I think Clipper did was modify the field structure so that, in character fields, the decimal-places byte held the high-order byte of the size, so character field sizes were really 256 * Decimals + Size.
I may have a C# class that reads DBFs natively (not through ADO/DAO); it could be modified to handle this case. Let me know if you're interested.
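To make the idea concrete, here is a minimal sketch (an illustration, not the class mentioned above) that walks the raw field descriptors in a DBF header and applies the 256 * decimals + size rule to character fields. The file path is a placeholder, and the offsets follow the standard dBASE III/Clipper field-descriptor layout described at the link above.
using System;
using System.IO;
using System.Text;

class DbfFieldDump
{
    static void Main()
    {
        // Placeholder path; point this at the real DBF file.
        using (var fs = File.OpenRead(@"c:\datadir\STUFF.DBF"))
        {
            fs.Seek(32, SeekOrigin.Begin);       // skip the 32-byte file header
            var desc = new byte[32];             // each field descriptor is 32 bytes
            while (fs.Read(desc, 0, 32) == 32 && desc[0] != 0x0D)  // 0x0D terminates the descriptors
            {
                string name = Encoding.ASCII.GetString(desc, 0, 11).TrimEnd('\0');
                char type   = (char)desc[11];    // 'C' = character, 'N' = numeric, ...
                int size    = desc[16];          // low-order byte of the field length
                int dec     = desc[17];          // decimal places (high-order byte for 'C')

                // Clipper's extended-length trick: for character fields the
                // real length is 256 * decimals + size.
                int realSize = (type == 'C') ? 256 * dec + size : size;
                Console.WriteLine("{0,-11} {1}  {2}", name, type, realSize);
            }
        }
    }
}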

Are you still looking for an answer? Is this a one-off job or something that needs doing regularly?
I have a Python module that is primarily intended to extract data from all kinds of DBF files ... it doesn't yet handle the length_high_byte = decimal_places hack, but it's a trivial change. I'd be quite happy to (a) share this with you and/or (b) get a copy of such a DBF file for testing.
Added later: Extended-length feature added, and tested against files I've created myself. Offer to share code with anyone who would like to test it still stands. Still interested in getting some "real" files myself for testing.

Three suggestions that might be worth a shot...
1 - use Access to create a linked table to the DBF file, then use .NET to hit the table in the Access database instead of going directly to the DBF.
2 - try the Visual FoxPro OLE DB provider (a sketch follows this list).
3 - parse the DBF file by hand. Example is here.
My guess is that #1 will be the easiest, and #3 will give you the opportunity to fine-tune your cussing skills. :)
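On suggestion 2, a minimal sketch of what the switch might look like, assuming the Visual FoxPro OLE DB provider (VFPOLEDB) is installed; the table and field names are taken from the question.
using System;
using System.Data.OleDb;

class VfpReadSketch
{
    static void Main()
    {
        // Assumes the Visual FoxPro OLE DB provider is installed.
        // It is 32-bit only, so the app must run as x86 for it to resolve.
        var connectionString = @"Provider=VFPOLEDB.1;Data Source=c:\datadir;";
        using (var conn = new OleDbConnection(connectionString))
        using (var cmd = new OleDbCommand("SELECT ITEM, CTRLNUMS FROM STUFF", conn))
        {
            conn.Open();
            using (var dr = cmd.ExecuteReader())
            {
                while (dr.Read())
                {
                    // If the provider reads the field correctly, this should
                    // report the full 750 characters.
                    Console.WriteLine("{0}: {1} chars",
                        dr["ITEM"], dr["CTRLNUMS"].ToString().Length);
                }
            }
        }
    }
}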


ImageJ/Fiji - Save CSV using macro

I am not a coder, but I am trying to turn ThunderSTORM's batch process into an automated one with a single input folder and a single output folder.
input_directory = newArray("C:\\Users\\me\\Desktop\\Images");
output_directory = "C:\\Users\\me\\Desktop\\Results";
for (i = 0; i < input_directory.length; i++) {
    open(input_directory[i]);
    originalName = getTitle();
    originalNameWithoutExt = replace(originalName, ".tif", "");
    fileName = originalNameWithoutExt;
    run("Run analysis", "filter=[Wavelet filter (B-Spline)] scale=2.0 order=3 detector "+
        "detector=[Local maximum] connectivity=8-neighbourhood threshold=std(Wave.F1) "+
        "estimator=[PSF: Integrated Gaussian] sigma=1.6 method=[Weighted Least squares] fitradius=3 mfaenabled=false "+
        "renderer=[Averaged shifted histograms] magnification=5.0 colorizez=true shifts=2 "+
        "repaint=50 threed=false");
    saveAs(fileName + "_Results", output_directory);
}
This probably looks like a huge mess, but the original batch file used arrays and I can't figure out what that is; taking it out breaks it, so I left it in. The main issue revolves around the saveAs part not working.
Using run("Export Results") works, but I have to manually pick a location and file name. I tried to set this up to take the file name and rename it to the generic image name so it can save a CSV under that name.
Any help pointing out why I'm a moron? I would also love to open only one file at a time (this opens them all) and close it when the analysis is complete, but I will settle for that happening on a different day if I can just manage to save the damn CSV automatically.
For the most part, I broke the code a whole bunch of times, but it's in a working condition like this.
I appreciate any and all help. Thank you!

Uploading a File: MemoryStream vs. File System

In a business app I am creating, we allow our administrators to upload a CSV file with certain data that gets parsed and entered into our databases (all appropriate error handling is occurring, etc.).
As part of an upgrade to .NET 4.5, I had to update a few aspects of this code, and while I was doing so I ran across this answer from someone who uses a MemoryStream to handle uploaded files instead of temporarily saving them to the file system. There's no real reason for me to change (and maybe it's even bad to), but I wanted to give it a shot to learn a bit. So I quickly swapped out this code (taken from a strongly-typed model, due to the upload of other metadata):
HttpPostedFileBase file = model.File;
var fileName = Path.GetFileName(file.FileName);
var path = Path.Combine(Server.MapPath("~/App_Data/Uploads"), fileName);
file.SaveAs(path);
CsvParser csvParser = new CsvParser();
Product product = csvParser.Parse(path);
this.repository.Insert(product);
this.repository.Save();
return View("Details", product);
to this:
using (MemoryStream memoryStream = new MemoryStream())
{
    model.File.InputStream.CopyTo(memoryStream);
    CsvParser csvParser = new CsvParser();
    Product product = csvParser.Parse(memoryStream);
    this.repository.Insert(product);
    this.repository.Save();
    return View("Details", product);
}
Unfortunately, things break when I do this: all my data comes out with null values, and it seems as though there is nothing actually in the MemoryStream (though I'm not positive about this). I know this may be a long shot, but is there anything obvious I'm missing, or something I can do to better debug this?
You need to add the following:
model.File.InputStream.CopyTo(memoryStream);
memoryStream.Position = 0;
...
Product product = csvParser.Parse(memoryStream);
When you copy the file into the MemoryStream, the position is moved to the end of the stream, so when you then try to read it you're already at end-of-stream and get no data. You just need to reset the position to the start, i.e. 0.
The problem, I believe, is that your memoryStream has its position set to the end, and your CsvParser only processes from that point onwards, where there is no data.
To fix it, simply set the memoryStream position to 0 before you parse it with your csvParser.
memoryStream.Position = 0;
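Putting both answers together, the corrected version of the snippet from the question (same CsvParser and repository types as above) differs by a single line:
using (MemoryStream memoryStream = new MemoryStream())
{
    model.File.InputStream.CopyTo(memoryStream);

    // CopyTo leaves the stream positioned at the end;
    // rewind so the parser reads from the beginning.
    memoryStream.Position = 0;

    CsvParser csvParser = new CsvParser();
    Product product = csvParser.Parse(memoryStream);
    this.repository.Insert(product);
    this.repository.Save();
    return View("Details", product);
}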

Corrupting Access databases when inserting values

Recently, a program that creates an Access db (a requirement of our downstream partner), adds a table with all memo columns, and then inserts a bunch of records stopped working. Oddly, there were no changes in the environment that I could see, and nothing in any diffs that could have affected it. Furthermore, this repros on any machine I've tried, whether it has Office or not and, if it has Office, whether it's 32- or 64-bit.
The problem is that when you open the db after the program runs, the destination table is empty and instead there's an MSysCompactError table with a bunch of rows.
Here's the distilled code:
var connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=corrupt.mdb;Jet OLEDB:Engine Type=5";
// create the db and make a table
var cat = new ADOX.Catalog();
try
{
    cat.Create(connectionString);
    var tbl = new ADOX.Table();
    try
    {
        tbl.Name = "tbl";
        tbl.Columns.Append("a", ADOX.DataTypeEnum.adLongVarWChar);
        cat.Tables.Append(tbl);
    }
    finally
    {
        Marshal.ReleaseComObject(tbl);
    }
}
finally
{
    cat.ActiveConnection.Close();
    Marshal.ReleaseComObject(cat);
}
using (var connection = new OleDbConnection(connectionString))
{
    connection.Open();
    // insert a value
    using (var cmd = new OleDbCommand("INSERT INTO [tbl] VALUES ( 'x' )", connection))
        cmd.ExecuteNonQuery();
}
Here are a couple of workarounds I've stumbled into:
If you insert a breakpoint between creating the table and inserting the value (line 28 above), and you open the mdb with Access and close it again, then when the app continues it will not corrupt the database.
Changing the engine type from 5 to 4 (line 1) will create an uncorrupted mdb. You end up with an obsolete mdb version, but the table has values and there's no MSysCompactError. Note that I've tried creating a database this way and then upgrading it to engine type 5 programmatically at the end, with no luck; I end up with a corrupt db in the newest version.
If you change from memo to text fields by changing the adLongVarWChar on line 13 to adVarWChar, the database isn't corrupted. You end up with text fields in the db instead of memo fields, though.
A final note: in my travels, I've seen that MSysCompactError is related to compacting the database, but I'm not doing anything explicit to make the db compact.
Any ideas?
As I replied to HasUp:
According to MS support, creating Jet databases programmatically is deprecated. I ended up checking in an empty model database and then copying it whenever I needed a new one, as in the sketch below. See http://support.microsoft.com/kb/318559 for more info.
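A minimal sketch of that workaround, assuming an empty model database (with the table already defined) ships alongside the application; model.mdb, the output file name, and the table name are placeholders:
using System.Data.OleDb;
using System.IO;

class ModelDbCopy
{
    static void Main()
    {
        // Copy the checked-in, empty model database instead of
        // creating a fresh mdb through ADOX.
        File.Copy("model.mdb", "corrupt.mdb", overwrite: true);

        var connectionString =
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=corrupt.mdb";
        using (var connection = new OleDbConnection(connectionString))
        {
            connection.Open();
            using (var cmd = new OleDbCommand(
                "INSERT INTO [tbl] VALUES ( 'x' )", connection))
            {
                cmd.ExecuteNonQuery();
            }
        }
    }
}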

Sending javaMail attachment of any type from database

I have a domain class with a property that represents files uploaded from my GSP. I've defined the file as a byte array (byte[] file). When a specific action happens, I send mail with attachments. This is part of my SendMail service:
int i = 1;
[requestInstance.picture1, requestInstance.picture2, requestInstance.picture3].each {
    if (it.length != 0) {
        DataSource image = new ByteArrayDataSource(it, "image/jpeg");
        helper.addAttachment("image" + i + ".jpg", image);
        i++;
    }
}
This works fine with image files, but now I want it to work with all file types, and I'm wondering how to implement that. I also want to save the real file name in the database. All help is welcomed.
You can see where the file name and MIME type are specified in your code. It should be straightforward to save and restore that information from your database along with the attachment data.
If you're trying to figure out from the byte array of data what the MIME type is and what a good filename would be, that's a harder problem. Try this.

When parsing XML, the character é is missing

I have an XML document as input to a Java function that parses it and produces an output. Somewhere in the XML there is the word "stratégie". The output is "stratgie". How should I parse the XML so as to get the "é" character as well?
The XML is not produced by me; I get it as a response from a web service, and I am positive that "stratégie" is included in it as "strat&eacute;gie".
In the parser, I have:
public List<Item> GetItems(InputStream stream) {
    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(stream);
        doc.getDocumentElement().normalize();
        NodeList nodeLst = doc.getElementsByTagName("item");
        List<Item> items = new ArrayList<Item>();
        Item currentItem = new Item();
        Node node = nodeLst.item(0);
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element item = (Element) node;
            if (node.getChildNodes().getLength() == 0) {
                return null;
            }
            NodeList title = item.getElementsByTagName("title");
            Element titleElmnt = (Element) title.item(0);
            if (null != titleElmnt)
                currentItem.setTitle(titleElmnt.getChildNodes().item(0).getNodeValue());
            ....
Using the debugger, I can see that titleElmnt.getChildNodes().item(0).getNodeValue() is "stratgie" (without the é).
Thank you for your help.
I strongly suspect that either you're parsing it incorrectly or (rather more likely) it's just not being displayed properly. You haven't really told us anything about the code or how you're using the result, which makes it hard to give very concrete advice.
As ever with encoding issues, the first thing to do is work out exactly where data is getting lost. Lots of logging tends to be the way forward: create a small test case that demonstrates the problem (as small as you can get away with) and log everything about the data. Don't just try to log it as raw text: log the Unicode value of each character. That way your log will have all the information even if there are problems with the font or encoding you use to view the log.
The answer was here: http://www.yagudaev.com/programming/java/7-jsp-escaping-html
You can either use UTF-8 and put the literal 'é' character in your document instead of the &eacute; entity, or you need a parser that understands this entity, which exists in HTML and XHTML (and maybe other XML dialects) but not in pure XML: pure XML predefines only &amp;, &lt;, &gt;, &quot;, and &apos;.
You may need to declare such special-character entities in your DTD (XML Schema cannot declare entities) and tell your parser about them.
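For illustration, a minimal document that declares the entity in an internal DTD subset, so a plain XML parser can resolve &eacute; (the element names are placeholders based on the question):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE items [
  <!ENTITY eacute "&#233;">  <!-- é as a numeric character reference -->
]>
<items>
  <item><title>strat&eacute;gie</title></item>
</items>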
