Cannot free dynamic memory in async Rust task

Our Rust application appeared to have a memory leak, and I've distilled the issue down to the code example below. I still can't see where the problem is.
My expectation is that on the (500,000 + 1)'th message the memory of the application would return to low levels. Instead I observe the following:
before sending 500,000 messages the memory usage is 124 KB
after sending 500,000 messages the memory usage climbs to 27 MB
after sending the 500,000 + 1'th message the memory usage drops to 15.5 MB
After trying many things, I cannot find where the 15.5 MB is hiding. The only way to free the memory is to kill the application. Valgrind did not detect any memory leaks. A workaround, solution, or pointer in the right direction would all be much appreciated.
A demo project with the code below can be found here: https://github.com/loriopatrick/mem-help
Notes
If I remove self.items.push(data);, memory usage does not increase, so I don't think it's an issue with Sender/Receiver.
Wrapping items: Vec<String> in an Arc<Mutex<..>> made no observable memory difference.
The task where the memory should be managed
struct Processor {
    items: Vec<String>,
}

impl Processor {
    pub fn new() -> Self {
        Processor {
            items: Vec::new(),
        }
    }

    pub async fn task(mut self, mut receiver: Receiver<String>) {
        while let Some(data) = receiver.next().await {
            self.items.push(data);
            if self.items.len() > 500000 {
                {
                    std::mem::replace(&mut self.items, Vec::new());
                }
                println!("Emptied items array");
            }
        }
        println!("Processor task closing in 5 seconds");
        tokio::time::delay_for(Duration::from_secs(5)).await;
    }
}
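For what it's worth, std::mem::replace does drop the old vector (and every String in it) right at that call, so from Rust's point of view the items really are freed there; the question is where the resident memory goes afterwards. A minimal standalone sketch (not from the project above) showing that drop behaviour:

// Standalone sketch: a Drop impl makes it visible that
// `std::mem::replace(&mut items, Vec::new())` drops the old Vec and all of
// its contents at the point of the call.
struct Noisy(usize);

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("dropping item {}", self.0);
    }
}

fn main() {
    let mut items: Vec<Noisy> = (0..3).map(Noisy).collect();
    // The old Vec (and its heap buffer) is dropped at the end of this statement.
    let _ = std::mem::replace(&mut items, Vec::new());
    println!("after replace, items.len() = {}", items.len());
}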
Full runnable example
use std::time::Duration;
use tokio::stream::StreamExt;
use tokio::runtime::Runtime;
use tokio::sync::mpsc::{channel, Receiver, Sender};

struct Processor {
    items: Vec<String>,
}

impl Processor {
    pub fn new() -> Self {
        Processor {
            items: Vec::new(),
        }
    }

    pub async fn task(mut self, mut receiver: Receiver<String>) {
        while let Some(data) = receiver.next().await {
            self.items.push(data);
            if self.items.len() > 500000 {
                {
                    std::mem::replace(&mut self.items, Vec::new());
                }
                println!("Emptied items array");
            }
        }
        println!("Processor task closing in 5 seconds");
        tokio::time::delay_for(Duration::from_secs(5)).await;
    }
}

pub fn main() {
    {
        let mut runtime: Runtime = tokio::runtime::Builder::new()
            .threaded_scheduler()
            .core_threads(1)
            .enable_all()
            .build()
            .expect("Failed to build runtime");

        let (mut sender, receiver) = channel(1024);
        let p = Processor::new();

        runtime.spawn(async move {
            println!("Before send, waiting 5 seconds");
            tokio::time::delay_for(Duration::from_secs(5)).await;
            for i in 0..500000 {
                sender.send("Hello".to_string()).await;
            }
            println!("Sent 500,000 items, waiting 5 seconds");
            tokio::time::delay_for(Duration::from_secs(5)).await;
            sender.send("Hello".to_string()).await;
            println!("Send message to clear items");
            tokio::time::delay_for(Duration::from_secs(3)).await;
            println!("Closing sender in 5 seconds");
            tokio::time::delay_for(Duration::from_secs(5)).await;
        });

        runtime.block_on(async move {
            {
                p.task(receiver).await;
            }
            println!("Task is done, waiting 5 seconds");
            tokio::time::delay_for(Duration::from_secs(5)).await;
        });
    }
    println!("Runtime closed, waiting 5 seconds");
    std::thread::sleep(Duration::from_secs(5));
}
Cargo.toml
[package]
name = "mem-help"
version = "0.1.0"
authors = ["Patrick Lorio <dev#plorio.com>"]
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
futures = "0.3.1"
tokio = { version = "0.2.6", features = ["full"] }

Related

Why does this Rust code allocate buffers in the same memory region?

I don't understand the behaviour of this piece of code... I'm writing an RTOS and this issue is halting me. I really don't get why the code acts this way.
Here is some code I tested on the playground that shows the issue.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=cc6cc0ec8bfe76f65e1baaa67caaf9e6
use core::fmt;
use core::fmt::Display;

struct StackPointer(*const usize);

impl Display for StackPointer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0 as usize)
    }
}

struct Stack<const WORDS: usize> {
    pub sp: StackPointer,
    pub mem: [usize; WORDS],
}

impl<const WORDS: usize> Stack<WORDS> {
    pub fn new() -> Self {
        let mem = [0; WORDS];
        let sp = StackPointer(mem.as_ptr() as *const usize);
        Self {
            mem,
            sp,
        }
    }
}

struct PCB<const WORDS: usize> {
    pub stack: Stack<WORDS>,
}

impl<const WORDS: usize> PCB<WORDS> {
    pub fn new() -> Self {
        Self {
            stack: Stack::new(),
        }
    }
}

fn main() {
    let pcb1 = PCB::<128>::new();
    let pcb2 = PCB::<128>::new();
    let pcb3 = PCB::<128>::new();
    println!("sp1: {}, sp2: {}, sp3: {}", pcb1.stack.sp, pcb2.stack.sp, pcb3.stack.sp);
}
I don't understand the behaviour of this piece of code... I'm writing an RTOS and this issue is halting me. I really don't get why the code acts this way.
Because you're writing broken code.
let mem = [0; WORDS];
This reserves WORDS words on the stack (incidentally, why is it usize?).
let sp = StackPointer(mem.as_ptr() as *const usize);
This takes a pointer to a location in the current stackframe, where you've put your array.
Self {
    mem,
    sp,
}
This then blissfully copies the data out of the current stackframe and into the parent stackframe, while keeping a pointer to the now-popped stackframe.
So on each call to PCB::<128>::new() you're going to create a stackframe, allocate an array into that stackframe, take a pointer to that array (in the stackframe), then pop the stackframe.
Since all the stackframes end up in the same location (on top of main's stackframe), they're at roughly the same offset, hence the array is at the same offset in every call, and all your nonsensical StackPointers point to the same location, which will be filled with nonsense as soon as you call another function.
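A minimal sketch (mine, not from the answer above) of one way around this: leave the pointer unset in new() and only derive it once the value is sitting where it will actually live, re-deriving it after any further move. A stable address across moves would need a heap allocation (Box) or pinning instead.

// Sketch: derive the stack pointer only after the value has reached its
// final location, instead of inside `new()` where it points into a frame
// that is about to be popped.
struct Stack<const WORDS: usize> {
    sp: *const usize,
    mem: [usize; WORDS],
}

impl<const WORDS: usize> Stack<WORDS> {
    fn new() -> Self {
        Self {
            sp: std::ptr::null(),
            mem: [0; WORDS],
        }
    }

    // Re-derive the pointer from wherever `self` currently lives.
    // Must be called again if the value is moved afterwards.
    fn init_sp(&mut self) {
        self.sp = self.mem.as_ptr();
    }
}

fn main() {
    let mut s1 = Stack::<128>::new();
    let mut s2 = Stack::<128>::new();
    s1.init_sp();
    s2.init_sp();
    println!("sp1: {:p}, sp2: {:p}", s1.sp, s2.sp); // two distinct addresses now
}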

How can I check if std::io::Cursor has unconsumed data?

I am writing a low-level network app that deals with TCP sockets, where I often need to process binary data streams. When some data is available, I read it into a u8 array, then wrap it in std::io::Cursor<&[u8]> and pass it to handlers. In a handler, I often need to know whether there is more data in the Cursor or not.
Imagine that the handle function receives data and then processes it in chunks using the handle_chunk function. For simplicity, assume that chunk size is fixed at 10 bytes; if the data size is not divisible by 10, it's an error. This simple logic can be implemented in the following way:
fn handle(mut data: Cursor<&[u8]>) {
    while !data.empty() {
        if let Err(err) = handle_chunk(&mut data) {
            eprintln!("Error while handling data: {}", err);
        }
    }
}

fn handle_chunk(data: &mut Cursor<&[u8]>) -> Result<(), String> {
    // Returns Err("unexpected EOF".to_string()) if chunk is incomplete
    // ...
}
However, Cursor does not have an empty() method or any other method capable of telling if there is more data to process. The working solution that I could come up with is:
fn handle(data: Cursor<&[u8]>) {
    let data = data.into_inner();
    let len = data.len();
    let mut data = Cursor::new(data);
    while (data.position() as usize) < len - 1 {
        if let Err(err) = handle_chunk(&mut data) {
            eprintln!("Error while handling data: {}", err);
        }
    }
}
This looks hacky and inelegant though. Is there a better solution? Maybe there is a different tool in the Rust standard library that fits here better than Cursor?
Your code can be simplified by using Cursor::get_ref to avoid breaking up the input and putting it back together:
fn handle(mut data: Cursor<&[u8]>) {
    let len = data.get_ref().len();
    while (data.position() as usize) < len - 1 {
        if let Err(err) = handle_chunk(&mut data) {
            eprintln!("Error while handling data: {}", err);
        }
    }
}
Now, you haven't shown any code that requires a Cursor. Many times, people think it's needed to convert a &[u8] to something that implements Read, but it's not. Read is implemented for &'a [u8]:
use std::io::Read;

fn handle(mut data: &[u8]) {
    while !data.is_empty() {
        if let Err(err) = handle_chunk(&mut data) {
            eprintln!("Error while handling data: {}", err);
        }
    }
}

fn handle_chunk<R: Read>(mut data: R) -> Result<(), String> {
    let mut b = [0; 10];
    data.read_exact(&mut b).unwrap();
    println!("Chunk: {:?}", b);
    Ok(())
}

fn main() {
    let d: Vec<u8> = (0..20).collect();
    handle(&d)
}
By having mut data: &[u8] and using &mut data, the code will update the slice variable in place to advance it forward. We can't easily go backward though.
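One thing the plain-slice version gives up, as noted, is the ability to move backward; if rewinding matters, that is where Cursor earns its keep, since it also implements Seek. A small sketch (mine, not from the answer above):

use std::io::{Cursor, Read, Seek, SeekFrom};

fn main() -> std::io::Result<()> {
    let data: Vec<u8> = (0..20).collect();
    let mut cur = Cursor::new(&data[..]);

    let mut chunk = [0; 10];
    cur.read_exact(&mut chunk)?;
    println!("first read: {:?}", chunk);

    // A Cursor can seek backward, which a bare `&[u8]` reader cannot.
    cur.seek(SeekFrom::Start(0))?;
    cur.read_exact(&mut chunk)?;
    println!("after rewind: {:?}", chunk);
    Ok(())
}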
an empty() method
Rust style indicates that an empty method would be a verb — this would remove data (if it were possible). The method you want should be called is_empty, as seen on slices.

"error: closure may outlive the current function" but it will not outlive it

When I try to compile the following code:
fn main() {
    (...)
    let mut should_end = false;
    let mut input = Input::new(ctx);
    input.add_handler(Box::new(|evt| {
        match evt {
            &Event::Quit{..} => {
                should_end = true;
            }
            _ => {}
        }
    }));
    while !should_end {
        input.handle();
    }
}

pub struct Input {
    handlers: Vec<Box<FnMut(i32)>>,
}

impl Input {
    pub fn new() -> Self {
        Input { handlers: Vec::new() }
    }

    pub fn handle(&mut self) {
        for a in vec![21, 0, 3, 12, 1] {
            for handler in &mut self.handlers {
                handler(a);
            }
        }
    }

    pub fn add_handler(&mut self, handler: Box<FnMut(i32)>) {
        self.handlers.push(handler);
    }
}
I get this error:
error: closure may outlive the current function, but it borrows `should_end`, which is owned by the current function
I can't simply add move to the closure, because I need to use should_end later in the main loop. I mean, I can, but since bool is Copy, it will only affect the should_end inside the closure, and thus the program loops forever.
As far as I understand, since input is created in the main function, and the closure is stored in input, it couldn't possibly outlive the current function. Is there a way to express to Rust that the closure won't outlive main? Or is there a possibility that I can't see that the closure will outlive main? In the latter case, is there a way to force it to live only as long as main?
Do I need to refactor the way I'm handling input, or is there some way I can make this work? If I need to refactor, where can I look to see a good example of this in Rust?
Here's a playpen of a simplified version. It is possible I made a mistake in it that could crash your browser. It happened to me once, so beware.
In case it is needed, the rest of my code is available. All the relevant info should be in either main.rs or input.rs.
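To make the Copy behaviour described above concrete, here is a small standalone sketch (not from the question's project): with a move closure, the bool is copied into the closure, so setting it there never changes the variable the outer loop reads.

// Standalone sketch: `move` captures a copy of the Copy-type bool, so the
// closure's writes never reach the variable the outer loop checks.
fn main() {
    let mut should_end = false;

    let mut handler = move || {
        should_end = true; // sets the closure's private copy
        println!("inside closure: {}", should_end);
    };
    handler();

    println!("outside closure: {}", should_end); // still false, so the loop would never exit
}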
The problem is not your closure, but the add_handler method. Fully expanded it would look like this:
fn add_handler<'a>(&'a mut self, handler: Box<FnMut(i32) + 'static>)
As you can see, there's an implicit 'static bound on the trait object. Obviously we don't want that, so we introduce a second lifetime 'b:
fn add_handler<'a, 'b: 'a>(&'a mut self, handler: Box<FnMut(i32) + 'b>)
Since you are adding the handler object to the Input::handlers field, that field cannot outlive the scope of the handler object. Thus we also need to limit its lifetime:
pub struct Input<'a> {
    handlers: Vec<Box<FnMut(i32) + 'a>>,
}
This again requires the impl to have a lifetime, which we can use in the add_handler method.
impl<'a> Input<'a> {
    ...
    pub fn add_handler(&mut self, handler: Box<FnMut(i32) + 'a>) {
        self.handlers.push(handler);
    }
}
Now all that's left is using a Cell to control access to your should_end flag.
Here is an example of the fixed code:
use std::cell::Cell;

fn main() {
    let should_end = Cell::new(false);
    let mut input = Input::new();
    input.add_handler(Box::new(|a| {
        match a {
            1 => {
                should_end.set(true);
            }
            _ => {
                println!("{} {}", a, should_end.get())
            }
        }
    }));

    let mut fail_safe = 0;
    while !should_end.get() {
        if fail_safe > 20 { break; }
        input.handle();
        fail_safe += 1;
    }
}

pub struct Input<'a> {
    handlers: Vec<Box<FnMut(i32) + 'a>>,
}

impl<'a> Input<'a> {
    pub fn new() -> Self {
        Input { handlers: Vec::new() }
    }

    pub fn handle(&mut self) {
        for a in vec![21, 0, 3, 12, 1, 2] { // it will print the 2, but it won't loop again
            for handler in &mut self.handlers {
                handler(a);
            }
        }
    }

    pub fn add_handler(&mut self, handler: Box<FnMut(i32) + 'a>) {
        self.handlers.push(handler);
    }
}

objc_sync_enter / objc_sync_exit not working with DISPATCH_QUEUE_PRIORITY_LOW

I need a read/write lock for my application. I've read https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock
and wrote my own class, because there is no read/write lock in Swift:
class ReadWriteLock {
    var logging = true
    var b = 0
    let r = "vdsbsdbs" // string1 for locking
    let g = "VSDBVSDBSDBNSDN" // string2 for locking

    func waitAndStartWriting() {
        log("wait Writing")
        objc_sync_enter(g)
        log("enter writing")
    }

    func finishWriting() {
        objc_sync_exit(g)
        log("exit writing")
    }

    // waits until all reading has finished before starting to read
    // and acquiring the mutex
    func waitAndStartReading() {
        log("wait reading")
        objc_sync_enter(r)
        log("enter reading")
        b++
        if b == 1 {
            objc_sync_enter(g)
            log("read lock writing")
        }
        print("b = \(b)")
        objc_sync_exit(r)
    }

    func finishReading() {
        objc_sync_enter(r)
        b--
        if b == 0 {
            objc_sync_exit(g)
            log("read unlock writing")
        }
        print("b = \(b)")
        objc_sync_exit(r)
    }

    private func log(s: String) {
        if logging {
            print(s)
        }
    }
}
It works well until I try to use it from GCD threads:
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_LOW, 0)
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)
When I try to use this class from different async blocks, at some point it allows writing while writing is already locked.
Here is a sample log:
wait reading
enter reading
read lock writing
b = 1
wait reading
enter reading
b = 2
wait reading
enter reading
b = 3
wait reading
enter reading
b = 4
wait reading
enter reading
b = 5
wait reading
enter reading
b = 6
wait reading
enter reading
b = 7
wait reading
enter reading
b = 8
wait reading
enter reading
b = 9
b = 8
b = 7
b = 6
b = 5
wait Writing
enter writing
exit writing
wait Writing
enter writing
So, as you can see, g was locked, but objc_sync_enter(g) allowed execution to continue.
Why could this happen?
BTW, I checked how many times ReadWriteLock is constructed, and it's 1.
Why is objc_sync_exit not working, allowing objc_sync_enter(g) to proceed when the lock has not been released?
PS: ReadWriteLock is defined as
class UserData {
    static let lock = ReadWriteLock()
Thanks.
objc_sync_enter is an extremely low-level primitive and isn't intended to be used directly. It's an implementation detail of the old @synchronized system in ObjC. Even that is extremely outdated and should generally be avoided.
Synchronized access in Cocoa is best achieved with GCD queues. For example, this is a common approach that achieves a reader/writer lock (concurrent reading, exclusive writing).
public class UserData {
    private let myPropertyQueue = dispatch_queue_create("com.example.mygreatapp.property", DISPATCH_QUEUE_CONCURRENT)
    private var _myProperty = "" // Backing storage

    public var myProperty: String {
        get {
            var result = ""
            dispatch_sync(myPropertyQueue) {
                result = self._myProperty
            }
            return result
        }
        set {
            dispatch_barrier_async(myPropertyQueue) {
                self._myProperty = newValue
            }
        }
    }
}
All your concurrent properties can share a single queue, or you can give each property its own queue. It depends on how much contention you expect (a writer will lock the entire queue).
The "barrier" in "dispatch_barrier_async" means that it is the only thing allowed to run on the queue at that time, so all previous reads will have completed, and all future reads will be prevented until it completes. This scheme means that you can have as many concurrent readers as you want without starving writers (since writers will always be serviced), and writes are never blocking. On reads are blocking, and only if there is actual contention. In the normal, uncontested case, this is extremely very fast.
Are you 100% sure your blocks are actually executing on different threads?
objc_sync_enter() / objc_sync_exit() only guard against an object being accessed from different threads. They use a recursive mutex under the hood, so they will neither deadlock nor prevent you from repeatedly accessing the object from the same thread.
So if you lock in one async block and unlock in another one, a third block executed in between can have access to the guarded object.
This is one of those very subtle nuances that is easy to miss.
Locks in Swift
You have to be really careful about what you use as a lock. In Swift, String is a struct, meaning it's pass-by-value.
Whenever you call objc_sync_enter(g), you are not giving it g, but a copy of g. So each thread is essentially creating its own lock, which in effect is like having no locking at all.
Use NSObject
Instead of using a String or Int, use a plain NSObject.
let lock = NSObject()

func waitAndStartWriting() {
    log("wait Writing")
    objc_sync_enter(lock)
    log("enter writing")
}

func finishWriting() {
    objc_sync_exit(lock)
    log("exit writing")
}
That should take care of it!
In addition to @rob-napier's solution, I've updated this to Swift 5.1, added generic typing, and a couple of convenient append methods. Note that only methods that access resultArray via get/set or append are thread safe, so I also added a concurrent append for my practical use case, where the result data is updated over many result calls from instances of Operation.
public class ConcurrentResultData<E> {
    private let resultPropertyQueue = DispatchQueue(label: UUID().uuidString, attributes: .concurrent)
    private var _resultArray = [E]() // Backing storage

    public var resultArray: [E] {
        get {
            var result = [E]()
            resultPropertyQueue.sync {
                result = self._resultArray
            }
            return result
        }
        set {
            resultPropertyQueue.async(group: nil, qos: .default, flags: .barrier) {
                self._resultArray = newValue
            }
        }
    }

    public func append(element: E) {
        resultPropertyQueue.async(group: nil, qos: .default, flags: .barrier) {
            self._resultArray.append(element)
        }
    }

    public func appendAll(array: [E]) {
        resultPropertyQueue.async(group: nil, qos: .default, flags: .barrier) {
            self._resultArray.append(contentsOf: array)
        }
    }
}
For an example running in a playground add this
//MARK:- helpers
var count: Int = 0
let numberOfOperations = 50

func operationCompleted(d: ConcurrentResultData<Dictionary<AnyHashable, AnyObject>>) {
    if count + 1 < numberOfOperations {
        count += 1
    }
    else {
        print("All operations complete \(d.resultArray.count)")
        print(d.resultArray)
    }
}

func runOperationAndAddResult(queue: OperationQueue, result: ConcurrentResultData<Dictionary<AnyHashable, AnyObject>>) {
    queue.addOperation {
        let id = UUID().uuidString
        print("\(id) running")
        let delay: Int = Int(arc4random_uniform(2) + 1)
        for _ in 0..<delay {
            sleep(1)
        }
        let dict: [Dictionary<AnyHashable, AnyObject>] = [[ "uuid": NSString(string: id), "delay": NSString(string: "\(delay)") ]]
        result.appendAll(array: dict)
        DispatchQueue.main.async {
            print("\(id) complete")
            operationCompleted(d: result)
        }
    }
}

let q = OperationQueue()
let d = ConcurrentResultData<Dictionary<AnyHashable, AnyObject>>()
for _ in 0..<10 {
    runOperationAndAddResult(queue: q, result: d)
}
I had the same problem using queues in the background. Synchronization does not always work in queues with "background" (low) priority.
One fix I found was to use semaphores instead of objc_sync:
static private var syncSemaphores: [String: DispatchSemaphore] = [:]

static func synced(_ lock: String, closure: () -> ()) {
    // get the semaphore or create it
    var semaphore = syncSemaphores[lock]
    if semaphore == nil {
        semaphore = DispatchSemaphore(value: 1)
        syncSemaphores[lock] = semaphore
    }
    // lock semaphore
    semaphore!.wait()
    // execute closure
    closure()
    // unlock semaphore
    semaphore!.signal()
}
The function idea comes from What is the Swift equivalent to Objective-C's "@synchronized"?, an answer by @bryan-mclemore.

Create a moving average (and other FIR filters) using ReactiveCocoa

I'm still getting started with ReactiveCocoa and functional reactive programming concepts, so maybe this is a dumb question.
ReactiveCocoa seems naturally designed to react to streams of live data, touch events, accelerometer sensor input, etc.
Is it possible to apply finite impulse response filters in ReactiveCocoa in an easy, reactive fashion? Or if not, what would be the least-ugly hacky way of doing this? How would one go about implementing something like a simple moving average?
Ideally I'm looking for a Swift 2 + RAC 4 solution, but I'm also interested in whether this is possible at all in Objective-C and RAC 2/RAC 3.
What you actually need is some sort of period buffer, which keeps a period of values buffered and only starts sending them out once the buffer has reached capacity (the code below is heavily inspired by the takeLast operator):
extension SignalType {
    func periodBuffer(period: Int) -> Signal<[Value], Error> {
        return Signal { observer in
            var buffer: [Value] = []
            buffer.reserveCapacity(period)

            return self.observe { event in
                switch event {
                case let .Next(value):
                    // To avoid exceeding the reserved capacity of the buffer, we remove then add.
                    // Remove elements until we have room to add one more.
                    while (buffer.count + 1) > period {
                        buffer.removeAtIndex(0)
                    }
                    buffer.append(value)
                    if buffer.count == period {
                        observer.sendNext(buffer)
                    }
                case let .Failed(error):
                    observer.sendFailed(error)
                case .Completed:
                    observer.sendCompleted()
                case .Interrupted:
                    observer.sendInterrupted()
                }
            }
        }
    }
}
Based on that, you can map it to any algorithm you want:
let pipe = Signal<Int, NoError>.pipe()

pipe.0
    .periodBuffer(3)
    .map { Double($0.reduce(0, combine: +)) / Double($0.count) } // simple moving average
    .observeNext { print($0) }

pipe.1.sendNext(10) // does nothing
pipe.1.sendNext(11) // does nothing
pipe.1.sendNext(15) // prints 12
pipe.1.sendNext(7)  // prints 11
pipe.1.sendNext(9)  // prints 10.3333
pipe.1.sendNext(6)  // prints 7.3333
Probably the scan signal operator is what you're looking for. Inspired by Andy Jacobs' answer, I came up with something like this (a simple moving average implementation):
let (signal, observer) = Signal<Int, NoError>.pipe()
let maxSamples = 3
let movingAverage = signal.scan([Int]()) { (previousSamples, nextValue) in
        let samples: [Int] = previousSamples.count < maxSamples ? previousSamples : Array(previousSamples.dropFirst())
        return samples + [nextValue]
    }
    .filter { $0.count >= maxSamples }
    .map { $0.average }

movingAverage.observeNext { (next) -> () in
    print("Next: \(next)")
}

observer.sendNext(1)
observer.sendNext(2)
observer.sendNext(3)
observer.sendNext(4)
observer.sendNext(42)
Note: I had to move the average method into a protocol extension, otherwise the compiler would complain that the expression was too complex. I used a nice solution from this answer:
extension Array where Element: IntegerType {
    var total: Element {
        guard !isEmpty else { return 0 }
        return reduce(0) { $0 + $1 }
    }
    var average: Double {
        guard let total = total as? Int where !isEmpty else { return 0 }
        return Double(total) / Double(count)
    }
}
