I'm consuming a remote API which returns an io.ReadCloser (so no way of getting the length without a read) and need to add a header to the response that includes the data length from the original io.Reader, then write the resulting io.Reader to an io.Writer.
Except for the omission of error handling, the following functions are functionally correct. However, the addHeaderBytesBuffer function results in 7 allocations and the addHeaderTeeReader function in 8:
func addHeaderBytesBuffer(in io.Reader) (out io.Reader) {
    buf := new(bytes.Buffer)
    io.Copy(buf, in)
    header := createHeader(buf.Len())
    return io.MultiReader(header, buf)
}

func addHeaderTeeReader(in io.Reader) (out io.Reader) {
    buf := new(bytes.Buffer)
    n, _ := io.Copy(ioutil.Discard, io.TeeReader(in, buf))
    return io.MultiReader(createHeader(int(n)), buf)
}

func createHeader(length int) (out io.Reader) {
    return strings.NewReader(fmt.Sprintf("HEADER:%d", length))
}
I'm maintaining a sync.Pool of bytes.Buffer instances to re-use to reduce GC pressure but is there a more efficient way to do this?
I am working on a command line tool in Go called redis-mass that converts a bunch of redis commands into redis protocol format.
The first step was to port the node.js version, almost literally to Go. I used ioutil.ReadFile(inputFileName) to get a string version of the file and then returned an encoded string as output.
When I ran this on a file with 2,000,000 redis commands, it took about 8 seconds, compared to about 16 seconds with the node version. I guessed that the reason it was only twice as fast was that it was reading the whole file into memory first, so I changed my encoding function to accept a pair (raw io.Reader, enc io.Writer). It looks like this:
func EncodeStream(raw io.Reader, enc io.Writer) {
    var args []string
    var length int
    scanner := bufio.NewScanner(raw)
    for scanner.Scan() {
        command := strings.TrimSpace(scanner.Text())
        args = parse(command)
        length = len(args)
        if length > 0 {
            io.WriteString(enc, fmt.Sprintf("*%d\r\n", length))
            for _, arg := range args {
                io.WriteString(enc, fmt.Sprintf("$%d\r\n%s\r\n", len(arg), arg))
            }
        }
    }
}
However, this took 12 seconds on the 2 million line file, so I used github.com/pkg/profile to see how it was using memory, and it looks like the number of memory allocations is huge:
# Alloc = 3162912
# TotalAlloc = 1248612816
# Mallocs = 46001048
# HeapAlloc = 3162912
Can I constrain the io.Writer to use a fixed sized buffer and avoid all those allocations?
More generally, how can I avoid excessive allocations in this method? Here's the full source for more context
Reduce allocations by working with []byte instead of strings, and use fmt.Fprintf to write directly to the output instead of fmt.Sprintf followed by io.WriteString.
func EncodeStream(raw io.Reader, enc io.Writer) {
    scanner := bufio.NewScanner(raw)
    for scanner.Scan() {
        command := bytes.TrimSpace(scanner.Bytes())
        args := parse(command) // parse must be updated to accept a []byte
        if len(args) > 0 {
            fmt.Fprintf(enc, "*%d\r\n", len(args))
            for _, arg := range args {
                fmt.Fprintf(enc, "$%d\r\n%s\r\n", len(arg), arg)
            }
        }
    }
}
Result example:
{collisions=0, rx_bytes=258, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=3, tx_bytes=648, tx_dropped=0, tx_errors=0, tx_packets=8}
This format is like JSON, but not JSON.
Is there an easy way to parse this into map[string]int? Like json.Unmarshal(data, &value).
If that transport format is not recursively defined, i.e. a key cannot start a sub-structure, then its language is regular. As such, you can soundly parse it with Go's standard regexp package:
Playground link.
package main

import (
    "fmt"
    "regexp"
    "strconv"
)

const data = `{collisions=0, rx_bytes=258, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=3, tx_bytes=648, tx_dropped=0, tx_errors=0, tx_packets=8}`

const regex = `([a-z_]+)=([0-9]+)`

func main() {
    ms := regexp.MustCompile(regex).FindAllStringSubmatch(data, -1)
    vs := make(map[string]int)
    for _, m := range ms {
        v, _ := strconv.Atoi(m[2])
        vs[m[1]] = v
    }
    fmt.Printf("%#v\n", vs)
}
The regexp answer by @thwd is an elegant solution.
You can get a more efficient solution by using strings.Split() to split by comma-space (", ") to get the pairs, and then split again by the equal sign ("=") to get the key-value pairs. After that you just need to put these into a map:
func Parse(s string) (m map[string]int, err error) {
    if len(s) < 2 || s[0] != '{' || s[len(s)-1] != '}' {
        return nil, fmt.Errorf("Invalid input, no wrapping brackets!")
    }
    m = make(map[string]int)
    for _, v := range strings.Split(s[1:len(s)-1], ", ") {
        parts := strings.Split(v, "=")
        if len(parts) != 2 {
            return nil, fmt.Errorf("Equal sign not found in: %s", v)
        }
        if m[parts[0]], err = strconv.Atoi(parts[1]); err != nil {
            return nil, err
        }
    }
    return
}
Using it:
s := "{collisions=0, rx_bytes=258, ...}"
fmt.Println(Parse(s))
Try it on the Go Playground.
Note: If performance is important, this can be improved by not using strings.Split() in the outer loop, but instead searching for the comma "manually" and maintaining indices, and only "take out" substrings that represent actual keys and values (but this solution would be more complex).
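As an illustration of that index-based approach, here is a sketch (the ParseFast name is illustrative, and it has not been benchmarked here):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ParseFast scans the input once, slicing out keys and values by index
// instead of materialising intermediate slices with strings.Split for
// every pair.
func ParseFast(s string) (map[string]int, error) {
	if len(s) < 2 || s[0] != '{' || s[len(s)-1] != '}' {
		return nil, fmt.Errorf("invalid input, no wrapping brackets")
	}
	m := make(map[string]int)
	body := s[1 : len(s)-1]
	for len(body) > 0 {
		// Isolate the current "key=value" pair.
		pair := body
		if end := strings.Index(body, ", "); end >= 0 {
			pair = body[:end]
			body = body[end+2:]
		} else {
			body = ""
		}
		eq := strings.IndexByte(pair, '=')
		if eq < 0 {
			return nil, fmt.Errorf("equal sign not found in: %s", pair)
		}
		v, err := strconv.Atoi(pair[eq+1:])
		if err != nil {
			return nil, err
		}
		m[pair[:eq]] = v
	}
	return m, nil
}

func main() {
	m, err := ParseFast("{collisions=0, rx_bytes=258, tx_packets=8}")
	fmt.Println(m, err)
}
```

The substring slices share the backing array of the input string, so the only allocations are the map entries themselves.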
Another solution...
...but this option is much slower so it is only viable if performance is not a key requirement: you can turn your input string into a valid JSON format and after that you can use json.Unmarshal(). Error checks omitted:
s := "{collisions=0, rx_bytes=258, ...}"
// Turn into valid JSON:
s = strings.Replace(s, `=`, `":`, -1)
s = strings.Replace(s, `, `, `, "`, -1)
s = strings.Replace(s, `{`, `{"`, -1)
// And now simply unmarshal:
m := make(map[string]int)
json.Unmarshal([]byte(s), &m)
fmt.Println(m)
An advantage of this solution is that it also works if the destination value you unmarshal into is a struct:
// Unmarshal into a struct (you don't have to care about all fields)
st := struct {
    Collisions int `json:"collisions"`
    Rx_bytes   int `json:"rx_bytes"`
}{}
json.Unmarshal([]byte(s), &st)
fmt.Printf("%+v\n", st)
Try these on the Go Playground.
I'm having some problems addressing logical drives. For clarity, my definition of 'Physical Disk' (PD) is the raw disk regardless of partitioning. 'Logical Drive' (LD) refers to a volume such as Drive E:, Drive F: etc.
Using the examples from RRUZ (my hero SO member) and implementing the WMI class, I have created a FreePascal program for reading disks. I address PDs as \\.\PhysicalDriveX, and that works fine per the examples created by RRUZ (here). I can read all the bytes, no problem, for PDs.
I use the same handle technique for logical volumes, which are \\?\E: or \\?\F: etc. I then use IOCTL_DISK_GET_LENGTH_INFO to get the length of the PD or LV, and then read the byte range until ReadBytes = TotalLength. I read on the MSDN website that it will automatically retrieve the size of whatever device handle it is passed, PD or LD alike. And indeed I have checked the size values returned by my program against WinHex, FTK Imager, HxD and several other low-level disk tools. With the exception of 1-byte variances caused by zero or 1 starting positions, they match.
However, for some reason, my program is failing to acquire the final 32 KB on Windows 7 Pro 64-bit, despite running the program as administrator. It reads the whole disk and then, on the final buffer read (which is done as 64 KB buffers), BytesRead returns -1. Using the debugger I worked out the following values:
493,846,527 exact LV size of Drive F:
493,813,760 total bytes read at time of failure
32,767 bytes missing
The result of the following
BytesRead := FileRead(hDiskHandle, Buffer, (DiskSize - TotalBytesRead));
is -1 on the final buffer read. This is the line that tests for the end of the disk by saying "if the amount left to read is less than the buffer size, only try to read what is left". So the number of bytes FileRead is asked to store at the end is 32,767 (because DiskSize - TotalBytesRead at that point is 32,767, i.e. that is how many bytes of the disk are left to read). The designated buffer size is 64 KB. My understanding is that you can put less in a buffer than it is capable of holding, but not more (FileRead states: "Buffer must be at least Count bytes long. No checking on this is performed"). Is that correct? If not, then this may be (and probably is) the issue.
I don't know if it's due to IOCTL_DISK_GET_LENGTH_INFO, the buffer storage or something else. Hoping someone can help? I have also posted this, along with some screenshots, at the Lazarus FreePascal forum. Here are my relevant code sections:
The handle:
// Create handle to source disk. Abort if fails
hSelectedDisk := CreateFileW(PWideChar(SourceDevice), FILE_READ_DATA,
FILE_SHARE_READ, nil, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, 0);
if hSelectedDisk = INVALID_HANDLE_VALUE then
begin
RaiseLastOSError;
end
Compute the size in bytes of the given device:
ExactDiskSize := GetDiskLengthInBytes(hSelectedDisk);
Now read the device and store the input as a flat file
ImageResult := WindowsImageDisk(hSelectedDisk, ExactDiskSize, HashChoice, hImageName);
Functions for the above:
function GetDiskLengthInBytes(hSelectedDisk : THandle) : Int64;
const
// These are defined at the MSDN.Microsoft.com website for DeviceIOControl
// and https://forum.tuts4you.com/topic/22361-deviceiocontrol-ioctl-codes/
{
IOCTL_DISK_GET_DRIVE_GEOMETRY = $0070000
IOCTL_DISK_GET_PARTITION_INFO = $0074004
IOCTL_DISK_SET_PARTITION_INFO = $007C008
IOCTL_DISK_GET_DRIVE_LAYOUT = $007400C
IOCTL_DISK_SET_DRIVE_LAYOUT = $007C010
IOCTL_DISK_VERIFY = $0070014
IOCTL_DISK_FORMAT_TRACKS = $007C018
IOCTL_DISK_REASSIGN_BLOCKS = $007C01C
IOCTL_DISK_PERFORMANCE = $0070020
IOCTL_DISK_IS_WRITABLE = $0070024
IOCTL_DISK_LOGGING = $0070028
IOCTL_DISK_FORMAT_TRACKS_EX = $007C02C
IOCTL_DISK_HISTOGRAM_STRUCTURE = $0070030
IOCTL_DISK_HISTOGRAM_DATA = $0070034
IOCTL_DISK_HISTOGRAM_RESET = $0070038
IOCTL_DISK_REQUEST_STRUCTURE = $007003C
IOCTL_DISK_REQUEST_DATA = $0070040
IOCTL_DISK_CONTROLLER_NUMBER = $0070044
IOCTL_DISK_GET_PARTITION_INFO_EX = $0070048
IOCTL_DISK_SET_PARTITION_INFO_EX = $007C04C
IOCTL_DISK_GET_DRIVE_LAYOUT_EX = $0070050
IOCTL_DISK_SET_DRIVE_LAYOUT_EX = $007C054
IOCTL_DISK_CREATE_DISK = $007C058
IOCTL_DISK_GET_LENGTH_INFO = $007405C // Our constant...
SMART_GET_VERSION = $0074080
SMART_SEND_DRIVE_COMMAND = $007C084
SMART_RCV_DRIVE_DATA = $007C088
IOCTL_DISK_GET_DRIVE_GEOMETRY_EX = $00700A0
IOCTL_DISK_UPDATE_DRIVE_SIZE = $007C0C8
IOCTL_DISK_GROW_PARTITION = $007C0D0
IOCTL_DISK_GET_CACHE_INFORMATION = $00740D4
IOCTL_DISK_SET_CACHE_INFORMATION = $007C0D8
IOCTL_DISK_GET_WRITE_CACHE_STATE = $00740DC
IOCTL_DISK_DELETE_DRIVE_LAYOUT = $007C100
IOCTL_DISK_UPDATE_PROPERTIES = $0070140
IOCTL_DISK_FORMAT_DRIVE = $007C3CC
IOCTL_DISK_SENSE_DEVICE = $00703E0
IOCTL_DISK_INTERNAL_SET_VERIFY = $0070403
IOCTL_DISK_INTERNAL_CLEAR_VERIFY = $0070407
IOCTL_DISK_INTERNAL_SET_NOTIFY = $0070408
IOCTL_DISK_CHECK_VERIFY = $0074800
IOCTL_DISK_MEDIA_REMOVAL = $0074804
IOCTL_DISK_EJECT_MEDIA = $0074808
IOCTL_DISK_LOAD_MEDIA = $007480C
IOCTL_DISK_RESERVE = $0074810
IOCTL_DISK_RELEASE = $0074814
IOCTL_DISK_FIND_NEW_DEVICES = $0074818
IOCTL_DISK_GET_MEDIA_TYPES = $0070C00
}
IOCTL_DISK_GET_LENGTH_INFO = $0007405C;
type
    TDiskLength = packed record
        Length : Int64;
    end;
var
    BytesReturned: DWORD;
    DLength: TDiskLength;
    ByteSize: int64;
begin
    BytesReturned := 0;
    // Get the length, in bytes, of the physical disk
    if not DeviceIOControl(hSelectedDisk, IOCTL_DISK_GET_LENGTH_INFO, nil, 0,
        @DLength, SizeOf(TDiskLength), BytesReturned, nil) then
        raise Exception.Create('Unable to determine byte capacity of disk.');
    ByteSize := DLength.Length;
    ShowMessage(IntToStr(ByteSize));
    result := ByteSize;
end;
The disk reader function
function WindowsImageDisk(hDiskHandle : THandle; DiskSize : Int64; HashChoice : Integer; hImageName : THandle) : Int64;
var
    Buffer : array [0..65535] of Byte; // 1048576 (1 MB), 262144 (256 KB), 131072 (128 KB) or 65536 (64 KB) buffer
    BytesRead : integer;
    NewPos, SectorCount,
    TotalBytesRead, BytesWritten, TotalBytesWritten : Int64;
...
    // Now seek to the start of the device
    FileSeek(hDiskHandle, 0, 0);
    repeat
        // Read the device in buffered segments. Hash the disk and image portions as we go
        if (DiskSize - TotalBytesRead) < SizeOf(Buffer) then
        begin
            // Read whatever remains (fewer than 65536 bytes)
            BytesRead := FileRead(hDiskHandle, Buffer, (DiskSize - TotalBytesRead));
            BytesWritten := FileWrite(hImageName, Buffer, BytesRead);
        end
        else
        begin
            // Read 65536 bytes (64 KB) at a time
            BytesRead := FileRead(hDiskHandle, Buffer, SizeOf(Buffer));
            BytesWritten := FileWrite(hImageName, Buffer, BytesRead);
        end;
        if BytesRead = -1 then
        begin
            ShowMessage('There was a read error encountered. Aborting');
            // ERROR IS THROWN AT THIS POINT ONLY WITH LDs - not PDs
            exit;
        end
        else
        begin
            inc(TotalBytesRead, BytesRead);
            inc(TotalBytesWritten, BytesWritten);
            NewPos := NewPos + BytesRead;
            ...
    until (TotalBytesRead = DiskSize);
Probably, it is a boundary check error. Quote from MSDN (CreateFile, note on opening physical drives and volumes, which you call logical drives):
To read or write to the last few sectors of the volume, you must call DeviceIoControl and specify FSCTL_ALLOW_EXTENDED_DASD_IO
I suspect that the problem stems from the use of 64-bit integers and arithmetic to calculate a value passed as a 32-bit Integer:
FileRead(hDiskHandle, Buffer, (DiskSize - TotalBytesRead));
I cannot explain with certainty why this might affect only LD's and not PD's, except to speculate that there may be some difference in the reported DiskSize that somehow avoids the Int64 arithmetic/32-bit problem in that case.
e.g. If the 32-bit truncated result of the Int64 arithmetic is NEGATIVE (which requires only that the high bit be set, i.e. 1 not 0) then FileRead() will return -1 since a negative value for "bytes to read" is invalid.
But if the high-bit in the result is NOT set, resulting in a positive value, then even if that value is significantly greater than 64KB this will not cause an error since this invocation is called only when you have already determined that there are fewer than 64K bytes to be read. The 32-bit truncated Int64 arithmetic may result in a request to read 2 BILLION bytes but FileRead() is only going to read the actual 32K bytes that remain anyway.
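The truncation effect described above can be demonstrated in a few lines (shown in Go for brevity; the two's-complement arithmetic is identical when an Int64 result is squeezed into a 32-bit Count parameter in Pascal):

```go
package main

import "fmt"

func main() {
	// A 64-bit "bytes remaining" value whose low 32 bits happen to
	// have the high bit set (value is illustrative).
	var remaining int64 = 0x1_8000_0000 // 6,442,450,944

	// What a 32-bit parameter actually receives: the low 32 bits,
	// reinterpreted as a signed value.
	truncated := int32(remaining)

	fmt.Println(truncated) // negative, so a read request is rejected
}
```

With the high bit of the low 32 bits set, the truncated value is negative and the read fails with -1, exactly as described.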
However, this very fact points to a solution (assuming that this diagnosis is correct).
As noted, FileRead() (which is just a wrapper around ReadFile(), on Windows) will read either the number of bytes specified or as many bytes remain to be read, whichever is lower.
So if you specify 64KB but only 32KB remain, then only 32KB will be read.
You can replace all of this code:
if (DiskSize - TotalBytesRead) < SizeOf(Buffer) then
begin
    // Read 65535 or less bytes
    BytesRead := FileRead(hDiskHandle, Buffer, (DiskSize - TotalBytesRead));
    BytesWritten := FileWrite(hImageName, Buffer, BytesRead);
end
else
begin
    // Read 65536 (64kb) at a time
    BytesRead := FileRead(hDiskHandle, Buffer, SizeOf(Buffer));
    BytesWritten := FileWrite(hImageName, Buffer, BytesRead);
end;

With simply:

BytesRead := FileRead(hDiskHandle, Buffer, SizeOf(Buffer));
BytesWritten := FileWrite(hImageName, Buffer, BytesRead);
This eliminates the 64-bit arithmetic and any possibility for errors resulting from 64-bit results being passed in 32-bit values. If the final segment contains only 32KB, then only 32KB will be read.
You can also simplify your loop termination (removing the 64-bit arithmetic, unless you need the accumulated values for other purposes). Instead of accumulating the total bytes read, you can terminate your loop simply when your FileRead() reads less than the number of bytes specified. i.e. BytesRead < 64KB:
Even if your disk is an exact multiple of 64KB blocks, your penultimate FileRead() will return a full buffer of 64KB, and the very next FileRead() will read 0 bytes, which is < 64KB, terminating the loop. :)
32,767 is an odd number and not a multiple of the sector size. That means that the final part sector is simply not readable.
OK, I've done it.
Credit to user2024154, whose answer identified the key issue, so I have accepted the answer there.
However, what was not clear was how to properly define the constant's value. After many hours of Googling I stumbled across this. It was the only Delphi example I could find that actually shows FSCTL_ALLOW_EXTENDED_DASD_IO defined, albeit one had to look through much of it to pull the values together.
For the benefit of anyone else, the values that I needed, which now work, is :
const
    FILE_DEVICE_FILE_SYSTEM = $00000009;
    FILE_ANY_ACCESS         = 0;
    METHOD_NEITHER          = 3;
    FSCTL_ALLOW_EXTENDED_DASD_IO = ((FILE_DEVICE_FILE_SYSTEM shl 16)
        or (FILE_ANY_ACCESS shl 14)
        or (32 shl 2) or METHOD_NEITHER);
I then call DeviceIOControl with FSCTL_ALLOW_EXTENDED_DASD_IO immediately after creating the handle:
if not DeviceIOControl(hSelectedDisk, FSCTL_ALLOW_EXTENDED_DASD_IO, nil, 0,
    nil, 0, BytesReturned, nil) then
    raise Exception.Create('Unable to initiate FSCTL_ALLOW_EXTENDED_DASD_IO disk access.');
This works with freepascal and with some minor adjustment should work with Delphi.
Thanks to you all for your continued help, especially user2024154 and, as always, David, for his continued assistance.
UPDATE: Except now physical disk access doesn't work at all! But I'll work something out.
I am new to Go, and I wrote a function which returns a map, but I do not know whether it will cause a memory leak. The code is below:
func ParseParams(data string) map[string]string {
    params := strings.Split(data, "&")
    m := make(map[string]string)
    for idx := range params {
        vals := strings.Split(params[idx], "=")
        m[vals[0]] = vals[1]
    }
    return m
}
So, I would like to know whether it is necessary to release or free the map, or to do something else to avoid a memory leak. Thanks!
Go is garbage-collected, so there is no need to free the map yourself: once it becomes unreachable, the runtime reclaims it. There is no possibility of a memory leak here.
I'm using bufio.Scanner, and I'm not sure if I should be giving it a reader wrapped by bufio.Reader.
I.e., where f is an *os.File, should I:
scanner := bufio.NewScanner(f)
or
scanner := bufio.NewScanner(bufio.NewReader(f))
From the scan.go source it doesn't look like you need to pass it a *bufio.Reader: it has its own buffer, defaulting to 4K as bufio.Reader's buffers do.
// NewScanner returns a new Scanner to read from r.
// The split function defaults to ScanLines.
func NewScanner(r io.Reader) *Scanner {
    return &Scanner{
        r:            r,
        split:        ScanLines,
        maxTokenSize: MaxScanTokenSize,
        buf:          make([]byte, 4096), // Plausible starting size; needn't be large.
    }
}