
C++ is a common choice when the performance of the designed system is crucial. Unfortunately, its use alone does not guarantee that the assumed processing speed will be achieved. This is why more and more emphasis is placed on the optimization of the source code and its execution time. One aspect that affects performance is the amount of data copied.

In this article, I focus on the number of object copies made by the very popular Qt library (version 6.2.4), specifically by its “Signal & Slot” mechanism. I also compare it with TBC (Template Based Communication), a library I created.

Qt library

The “Signal & Slot” mechanism in Qt is used for synchronous or asynchronous communication between objects. To use this mechanism, objects must inherit from the QObject class and include the “Q_OBJECT” macro in the declaration. The object that sends the message declares a signal with arguments that must be copyable:

#include <QObject>

class Sender : public QObject {
    Q_OBJECT

signals:
    void customSignal(int value);
};

The receiving object declares the corresponding slot:

class Receiver : public QObject {
    Q_OBJECT

public slots:
    void customSlot(int value);
};

Then, the objects should be connected by executing the static method QObject::connect:

QObject::connect(&senderObject,
                 &Sender::customSignal,
                 &receiverObject,
                 &Receiver::customSlot);

This method also accepts an optional parameter defining the connection type. I will focus on the two most important ones:

  • Qt::DirectConnection – the slot is executed synchronously in the same thread, as if it had been called directly,
  • Qt::QueuedConnection – the slot is executed asynchronously:
    • in the same thread, once control returns to the main Qt event loop and the call reaches the front of the queue,
    • in another thread, if the receiving object has been moved there with the QObject::moveToThread method.

Details of this mechanism are included in the Qt documentation.

The mechanism described above seems very useful and easy to use. In the rest of the article, I focus only on the number of copies created in a queued connection. I have written a simple object transfer test that counts copies by incrementing a static variable in the copy constructor. The definition of this class is as follows:

namespace
{
    static int copyCounter = 0;
}

class Msg {
public:
    Msg() = default;

    Msg(const Msg&) {
        ++::copyCounter;
    }

    Msg(Msg&&) = default;

    int copyCounter() const {
        return ::copyCounter;
    }
};
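
As a sanity check of the counting technique itself, here is a minimal sketch that is independent of Qt. The helper names (byConstRef, byValue, copiesDuring and the three copiesFor... functions) are hypothetical and exist only for this illustration. It shows the baseline cost of an ordinary function call: no copy for a const reference, one copy for an lvalue passed by value, and no copy when the argument is moved.

```cpp
#include <utility>

namespace {
int copyCounter = 0;  // incremented by every copy construction of Msg
}

class Msg {
public:
    Msg() = default;
    Msg(const Msg&) { ++copyCounter; }
    Msg(Msg&&) = default;  // moves are deliberately not counted
};

// Hypothetical receivers, used only for this illustration.
inline void byConstRef(const Msg&) {}
inline void byValue(Msg) {}

// Runs `call` and returns how many copies of Msg it caused.
template <typename Call>
int copiesDuring(Call call) {
    const int before = copyCounter;
    call();
    return copyCounter - before;
}

inline int copiesForConstRef() {
    Msg m;
    return copiesDuring([&] { byConstRef(m); });
}

inline int copiesForLvalueByValue() {
    Msg m;
    return copiesDuring([&] { byValue(m); });
}

inline int copiesForMovedByValue() {
    Msg m;
    return copiesDuring([&] { byValue(std::move(m)); });
}
```

Any copies counted on top of this baseline in the queued tests therefore come from the delivery mechanism, not from the parameter passing itself.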

After the parameter has been delivered to the slot, the number of copies is printed to the console. The results of this test are presented in the table below:

Parameter type in signal | Argument type in slot | Number of copies made
const Msg&               | const Msg&            | 1
const Msg&               | Msg                   | 2
Msg&&                    | const Msg&            | 1
Msg&&                    | Msg                   | 2
Tab. 1 Test results (Qt, queued connection)

As can be seen, in a queued connection the number of copies depends on both the parameter type in the signal and the argument type in the slot. Note that in the last two cases the argument is passed as an rvalue, yet a copy is still made. I will present the performance overhead introduced by these copies in the Benchmarks section.
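
To see where counts of 1 and 2 can come from, consider the simplified model below. It is not Qt's implementation, only a sketch under the assumption that a queued call stores its own copy of the argument and later hands the stored object to the slot as an lvalue. The names CopyingQueue, post and deliver are hypothetical.

```cpp
#include <queue>

namespace {
int copies = 0;  // number of Msg copy constructions
}

struct Msg {
    Msg() = default;
    Msg(const Msg&) { ++copies; }
    Msg(Msg&&) = default;
};

// Hypothetical model of a queue that copies the argument on emission.
class CopyingQueue {
    std::queue<Msg> pending_;

public:
    // The "signal": the queue stores its own copy (copy no. 1).
    void post(const Msg& arg) { pending_.push(arg); }

    // The "event loop": the stored object is passed to the slot as an lvalue.
    // A slot taking const Msg& adds no copy; a slot taking Msg adds copy no. 2.
    template <typename Slot>
    void deliver(Slot slot) {
        while (!pending_.empty()) {
            slot(pending_.front());
            pending_.pop();
        }
    }
};

inline int copiesWithConstRefSlot() {
    copies = 0;
    CopyingQueue queue;
    Msg m;
    queue.post(m);
    queue.deliver([](const Msg&) {});
    return copies;
}

inline int copiesWithValueSlot() {
    copies = 0;
    CopyingQueue queue;
    Msg m;
    queue.post(m);
    queue.deliver([](Msg) {});
    return copies;
}
```

In this model the signal always takes a const reference, so an rvalue argument is copied just like an lvalue, which is consistent with the last two rows of the table.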

TBC library

The Qt documentation mentions that it must make a copy to store the object “behind the scenes”, but it does not specify how many copies are made for different argument types. I aim to demonstrate that the number of copies for a queued connection can be minimized by using move semantics or by ensuring that the referenced object is still valid when the slot is called. For this purpose, I wrote my TBC (Template Based Communication) library. Instead of creating my own metalanguage, I used the template mechanism to specify the arguments to be sent. Here’s how the library works:

  • The sending object must inherit from the TBC::Sender<T> class.
  • The receiving object must inherit from the TBC::Receiver<T> class, where ‘T’ is the type of the argument being sent.

The signal can be sent in one of two ways:

  • valueSignal(T) – the signal takes its argument by value, which allows move semantics to be used,
  • constRefSignal(const T&) – the signal takes a constant reference to an object. The object is copied only if the slot takes its argument by value. The sender must ensure that the object is not destroyed before the connected slot is invoked.

Similarly, arguments are received by valueSlot(T) and constRefSlot(const T&). An example of usage is presented below:

class Sender : public TBC::Sender<LargeObj> {
public:
    void sendValue(LargeObj value) {
        valueSignal(std::move(value));
    }

    void sendconstRef(const LargeObj& ref) {
        constRefSignal(ref);
    }
};

class Receiver : public TBC::Receiver<LargeObj> {
public:
    void valueSlot(LargeObj value) override {}

    void constRefSlot(const LargeObj& ref) override {}
};

To connect objects, you can use the static TBC::connect method:

TBC::connect(&sender, &receiver);

For more details, please refer to my repository, where you can find the implementation of the TBC library, a class diagram and functional and performance tests. All comments and remarks are welcome 🙂

As in the case of Qt, analogous tests were performed to examine the number of copies made for the queued connection. The test results are in the table below:

Parameter type in signal | Argument type in slot | Number of copies made
const Msg&               | const Msg&            | 0
const Msg&               | Msg                   | 1
Msg&&                    | const Msg&            | 0
Msg&&                    | Msg                   | 0
Tab. 2 Test results (TBC, queued connection)

As Tab. 2 shows, the TBC library makes the minimum number of copies required for a queued connection: a copy is made only when a constant reference is sent to a slot that takes its argument by value.
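
The zero-copy rows for a moved argument can be reproduced with a small model. The sketch below is not the TBC implementation; it only assumes that the argument is taken by value, moved into the queue, and moved out again on delivery. The names MovingQueue, post and deliver are hypothetical.

```cpp
#include <queue>
#include <utility>

namespace {
int copies = 0;  // number of Msg copy constructions
}

struct Msg {
    Msg() = default;
    Msg(const Msg&) { ++copies; }
    Msg(Msg&&) = default;
};

// Hypothetical model of a queue that moves the argument instead of copying it.
class MovingQueue {
    std::queue<Msg> pending_;

public:
    // The "signal" takes the argument by value and moves it into the queue.
    void post(Msg arg) { pending_.push(std::move(arg)); }

    // The "event loop" hands the stored object to the slot as an rvalue,
    // so a by-value slot is move-constructed rather than copy-constructed.
    template <typename Slot>
    void deliver(Slot slot) {
        while (!pending_.empty()) {
            slot(std::move(pending_.front()));
            pending_.pop();
        }
    }
};

inline int copiesMovedToValueSlot() {
    copies = 0;
    MovingQueue queue;
    Msg m;
    queue.post(std::move(m));
    queue.deliver([](Msg) {});
    return copies;
}

inline int copiesMovedToConstRefSlot() {
    copies = 0;
    MovingQueue queue;
    Msg m;
    queue.post(std::move(m));
    queue.deliver([](const Msg&) {});
    return copies;
}
```

The cost of this design is that the caller gives up the object: after std::move, the sender must not rely on the contents of the original.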

Benchmarks

The performance tests conducted measure the time that passes from the signal broadcast to the slot call running in a separate thread for various sizes of the transmitted parameter. Each benchmark was executed for 50 iterations. The tests were run twice, resulting in a total of 100 measurements, from which the average operation time was calculated.
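
The measurement idea can be sketched without either library. The example below is an assumption-laden illustration, not the benchmark code from the repository: it uses a plain std::queue guarded by a mutex as the transport, std::chrono::steady_clock instead of high_resolution_clock, and the hypothetical names measureLatenciesUs and averageUs.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <queue>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Sends `iterations` time stamps to a receiver thread and returns the
// delivery delays, in microseconds, measured on the receiving side.
inline std::vector<long long> measureLatenciesUs(std::size_t iterations) {
    std::mutex mutex;
    std::condition_variable ready;
    std::queue<Clock::time_point> pending;
    std::vector<long long> latencies;  // written only by the receiver thread

    std::thread receiver([&] {
        for (std::size_t received = 0; received < iterations; ++received) {
            std::unique_lock<std::mutex> lock(mutex);
            ready.wait(lock, [&] { return !pending.empty(); });
            const Clock::time_point sent = pending.front();
            pending.pop();
            lock.unlock();
            const auto delay = Clock::now() - sent;
            latencies.push_back(
                std::chrono::duration_cast<std::chrono::microseconds>(delay).count());
        }
    });

    for (std::size_t i = 0; i < iterations; ++i) {
        {
            std::lock_guard<std::mutex> lock(mutex);
            pending.push(Clock::now());  // time stamp taken just before sending
        }
        ready.notify_one();
    }

    receiver.join();  // after join, reading `latencies` is safe
    return latencies;
}

// Arithmetic mean of the samples; 0 for an empty set.
inline double averageUs(const std::vector<long long>& samples) {
    if (samples.empty()) return 0.0;
    return std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
}
```

Calling measureLatenciesUs(50) twice and averaging all 100 samples mirrors the procedure described above, although the absolute numbers from this sketch are not comparable with the Qt or TBC results.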

Here is the configuration of the system used for the tests:

  • Operating system: Ubuntu 22.04.2 LTS
  • Compiler: gcc 11.3.0
  • Qt version: 6.2.4

The first benchmark measures the transmission of std::chrono::high_resolution_clock::time_point set just before the signal is emitted and examines the delay in the recipient slot. To illustrate, here is a piece of code using Qt:

QObject::connect(&sender, &QTSender::send, &receiver, &QTReceiver::valueSlot, Qt::QueuedConnection);
newThread.start();

std::cout << "QT latency [µs]: ";
for (size_t i = 0; i < iterations; ++i) {
    sender.send(std::chrono::high_resolution_clock::now());
    std::this_thread::sleep_for(std::chrono::seconds{1});
}
std::cout << std::endl;

// Receiving side (QTReceiver):
public slots:
    void valueSlot(std::chrono::high_resolution_clock::time_point sendTimePoint) {
        auto endTimePoint = std::chrono::high_resolution_clock::now();
        std::cout << std::chrono::duration_cast<std::chrono::microseconds> (endTimePoint - sendTimePoint).count() << ",";
    }

and TBC:

TBC::connect(&sender, &receiver);
receiver.runInNewThread();

std::cout << "TBC latency [µs]: ";
for (size_t i = 0; i < iterations; ++i) {
    sender.valueSignal(std::chrono::high_resolution_clock::now());
    std::this_thread::sleep_for(std::chrono::seconds{1});
}
std::cout << std::endl;

// Receiving side:
public:
    void valueSlot(std::chrono::high_resolution_clock::time_point sendTimePoint) override {
        auto endTimePoint = std::chrono::high_resolution_clock::now();
        std::cout << std::chrono::duration_cast<std::chrono::microseconds> (endTimePoint - sendTimePoint).count() << ",";
    }

The test results are illustrated below:

Fig. 1 Average message delivery delay

Although the TBC library shows approximately 10% less delay than the Qt library, this is not a significant difference: the delay would accumulate to 1 second only after about 100 thousand operations.

Subsequent benchmarks

Subsequent benchmarks measure the time needed to transmit a Msg object containing an array of bytes. The size of the array increases fourfold in each step, starting from 1 kB and going up to 256 MB. The source code of the Msg class is presented below:

class Msg {
    std::vector<uint8_t> _data;
    std::chrono::high_resolution_clock::time_point _msgCreationTimePoint;

public:
    Msg() = default;

    Msg(size_t msgByteSizeInKb) :
        _data(msgByteSizeInKb * 1024),
        _msgCreationTimePoint{std::chrono::high_resolution_clock::now()}
    {}

    Msg(const Msg& other) = default;

    Msg(Msg&& other) = default;

    void resetCreationTimePoint() {
        _msgCreationTimePoint = std::chrono::high_resolution_clock::now();
    }

    const std::chrono::high_resolution_clock::time_point& sendTimePoint() const {
        return _msgCreationTimePoint;
    }

    size_t dataSizeInKb() const {
        return _data.size() / 1024;
    }

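    // power() is a constexpr helper from the test code that is not shown here;
    // 8^6 kB = 262144 kB = 256 MB.
    static constexpr int maxMsgSizeInKb = power(8, 6);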
};
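
The size progression (1 kB growing fourfold up to 256 MB) can be checked with a short sketch. The power() function below is a hypothetical stand-in for the helper used in the class above, whose definition is not shown, and benchmarkSizesInKb is likewise an assumed name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the power() helper, which the article does not show.
constexpr std::size_t power(std::size_t base, unsigned exponent) {
    return exponent == 0 ? 1 : base * power(base, exponent - 1);
}

constexpr std::size_t maxMsgSizeInKb = power(8, 6);  // 262144 kB = 256 MB

// Message sizes in kB: 1, 4, 16, ... up to and including maxMsgSizeInKb.
inline std::vector<std::size_t> benchmarkSizesInKb() {
    std::vector<std::size_t> sizes;
    for (std::size_t kb = 1; kb <= maxMsgSizeInKb; kb *= 4) {
        sizes.push_back(kb);
    }
    return sizes;
}
```

Since 8^6 equals 4^9, the loop produces ten sizes, and the largest is exactly the declared maximum.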

Tests were conducted for all combinations of signal parameter and slot argument types listed in the tables in the previous sections. The charts below present the results:

Fig. 2 Test results

The graphs show that the operation time of Qt depends linearly on the size of the parameter in every case, and that the times correlate with the numbers of copies reported in Tab. 1. In contrast, in the three cases where an object copy can be avoided, TBC shows constant complexity: the data transfer time is independent of the data size.
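
The difference between linear and constant cost comes from how std::vector, which holds the payload in Msg, is copied and moved. The sketch below, with the hypothetical helpers moveKeepsBuffer and copyKeepsBuffer, shows that a move hands over the existing buffer while a copy has to allocate and fill a new one.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns true if the target ends up owning the very buffer the source had.
// Moving a std::vector transfers ownership of the buffer in constant time.
inline bool moveKeepsBuffer(std::size_t bytes) {
    std::vector<std::uint8_t> source(bytes);
    const std::uint8_t* before = source.data();
    std::vector<std::uint8_t> target = std::move(source);
    return target.data() == before;
}

// Copying allocates a new buffer and copies every byte, which is linear
// in the size of the payload; the two vectors never share storage.
inline bool copyKeepsBuffer(std::size_t bytes) {
    std::vector<std::uint8_t> source(bytes);
    const std::uint8_t* before = source.data();
    std::vector<std::uint8_t> target = source;
    return target.data() == before;
}
```

For a non-empty payload, moveKeepsBuffer returns true and copyKeepsBuffer returns false, regardless of the size passed in.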

Summary

The performance tests confirmed that the Qt library (version 6.2.4) does not manage memory optimally in the tests performed. Unfortunately, the Qt documentation does not explicitly mention this performance limitation. The TBC library I created shows that it is possible to achieve the optimal number of object copies while maintaining a similar, user-friendly interface. A future version of Qt may add support for move semantics.

In the meantime, when using the “Signal & Slot” mechanism from Qt:

  • Please remember that the object must be copyable, and it will be copied at least once when the signal is emitted in the queued connection.
  • Prefer to take the parameter in the slot as a constant reference, which will avoid making one copy.
  • If possible, wrap the parameter in std::shared_ptr so that a copy of the object is not made.

When using std::shared_ptr, remember that the object it manages is destroyed when the last copy of the std::shared_ptr is destroyed, unless a custom deleter is set. It is safest to create the object with std::make_shared. This avoids constructing two independent std::shared_ptr instances from the same raw pointer, which would call the destructor of the stored object and free its memory a second time.
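
The std::shared_ptr recommendation can be illustrated with a sketch that does not depend on Qt. The names slotTakingShared, SharedResult and sendViaSharedPtr are hypothetical, and the local variable queued only stands in for the copy that a queued connection would keep.

```cpp
#include <memory>

namespace {
int copies = 0;  // number of Msg copy constructions
}

struct Msg {
    Msg() = default;
    Msg(const Msg&) { ++copies; }
    Msg(Msg&&) = default;
};

// Hypothetical slot that receives the message through a shared pointer.
inline void slotTakingShared(std::shared_ptr<const Msg> msg) { (void)msg; }

struct SharedResult {
    int copiesMade;          // Msg copies caused by the whole transfer
    long ownersWhileQueued;  // shared_ptr owners while the call is "queued"
};

inline SharedResult sendViaSharedPtr() {
    copies = 0;
    // make_shared creates the object and its control block in one step.
    std::shared_ptr<const Msg> msg = std::make_shared<Msg>();
    // Stand-in for the copy stored by a queued connection: only the pointer
    // is copied and the reference count is incremented.
    std::shared_ptr<const Msg> queued = msg;
    const long owners = msg.use_count();
    slotTakingShared(queued);
    return {copies, owners};
}
```

Pointing to const Msg is a design choice in this sketch: the sender and the receiver share one object, possibly across threads, so making it read-only avoids unsynchronized modification.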

Author
Karol Sierociński

A C/C++ Developer with 6 years of experience, an enthusiast of clean code and software architecture design. Outside of work, passionate about playing ping-pong and speed skating
