Key-value storage for the poor

Despite the hype about NoSQL databases, sometimes it’s nice to have an embedded key-value storage available in your app. For example, I maintain a cache of image metadata in my cross-platform desktop app Xpiks, and I have to look the metadata up by file path anyway. Besides that, I have a few more requirements specific to my project:

  • the database should be embeddable directly into the app
  • the license should allow linking with proprietary code (so it should be either permissive, like MIT or BSD, or LGPL)
  • it should be cross-platform, meaning Windows, macOS and Linux
  • it should have a history of use across various projects (roughly 5+ years old)
  • it should be affordable, since I’m not a big company or any sort of company

So when I started looking for this sort of database, the first thing I found was Berkeley DB. This is a very well-tested, stable key-value storage that programmers have used for decades. It is very well documented, with extensive examples and APIs for C, C++ and Java. It is also available for many more platforms than just Windows, macOS and Linux. I even wrote a small wrapper for Qt to use this library. But it turns out to be dual-licensed: AGPL 3.0+ and a commercial license. AGPL is nowhere near permissive, and the commercial license costs literally thousands of dollars, so this is not an option for me. Early versions of Berkeley DB (before it was bought by Oracle) were released under the Sleepycat Software License, which is a BSD-style license with a copyleft amendment (software that uses the library has to make its source code available), so they do not help either. Overall verdict: unfortunately NO (but if the license allowed it, it would be better than any other option in this post).

There are many nice modern LSM-like key-value databases with permissive licenses, but they are either not cross-platform, like Google’s LevelDB, or memory-only, like LMDB (“memory-only” is speculative on my part, but there were some issues with hard drives and cross-platform use of this DB, such as missing memory-mapping implementations for Windows).

The next embedded database I looked at was RocksDB. This is a relatively young database from Facebook, but it is very well tested under Facebook’s load. It is advertised as cross-platform (though building it is not very straightforward) and is available under the Apache License 2.0. It really sounds like what I need. The only thing that looked suspicious was that each release adds about as many features as it fixes bugs. Of course, I’m not running a stock exchange, but I’m striving to create very stable software and therefore I have to use very stable third-party software too. So the overall verdict: almost YES, maybe later.

Of course, when it comes to an embedded database with a permissive license, you cannot omit SQLite. This is a very old, well-tested project, released into the public domain and used on an enormous number of platforms (most extensively right now on Android). It is easy to use and well documented, but it has one “minor” problem: it is not a key-value storage. But can we use it as one? Probably. All we need to do is create a table with two columns, “key” and “value”, and use the former as the PRIMARY KEY. Performance? It is probably not great with this approach, but for me it is not that critical, so I wrote a wrapper around it and it works well enough for my project. Overall verdict: YES, for now.

The bottom line is:

In this pursuit of a key-value storage for my cross-platform desktop app, I have found two very good-looking options: RocksDB and SQLite (with modifications). Time will tell whether SQLite was a wise choice, but I can always switch to the other one later. And if Oracle ever changed the Berkeley DB license, that would definitely be my choice.

Dependency-driven development: forced OSS contributions

It is such a relief when your app just works, even more so when it is open source. My pet project Xpiks is not only open source itself, it also uses a lot of other open-source technologies inside: the Qt framework, zlib and hunspell, to name just a few. It is a big deal to make them work together, and a much bigger deal to make them work together across different platforms (Xpiks is announced as cross-platform for Windows, OS X and Linux). The least of the problems you can expect is a tricky build process or somebody’s typo in a Makefile that breaks the build on another system.

More often, what you’ll encounter is people building a huge pile of code that works only for their needs. Only for their server. Only for their version of libcurl. Only for an x86 operating system. And then they open-source it on GitHub, which is much like a cemetery for projects with 1 star and 0 forks, decaying there until forgotten forever.

This is how the initial joy of finding the open-source technology you needed gives way to constant frustration: you don’t just need to slightly tweak some header file or Makefile, you have to go into the sources, read them, understand everything inside and fix it. This is what I have encountered many times, and what I ended up doing as well.

(more…)

BackToWork – smarter Alt+Tab for Windows

Frequently there’s a need to quickly switch to a specific window or two out of a dozen. What I usually do is hit Alt+Tab and cycle through the windows to find the right one. Today I decided that enough is enough and wrote a simple productivity tool that switches to the needed windows with a hotkey. It reads patterns for finding the needed windows from a config file, and once you hit the hotkey, it brings those windows to the front. It is especially useful when you have had those “5 minutes of procrastination” and want to switch back to the development routine, but need to find your IDE among the windows you opened before.

BackToWork is on GitHub: just download the binaries (built for x86), edit the config and you’re done.

Enjoy!

Unicode support for avformat_open_input in Windows

Anyone who has ever written a cross-platform application knows there are always enough quirks and quests to deal with. A typical one is correctly handling multibyte/Unicode file paths on Windows. And though Qt handles them pretty well, when you write your own library you have to do it yourself.

Another level of quests is using third-party libraries which were not designed for cross-platform usage. For example, if you want to use the ffmpeg / libav libraries on Windows, you have to deal with the lack of std::wstring parameters in the API. One way to deal with it is to arrange custom IO through AVFormatContext and handle the file paths yourself. I found a wonderful article and code example showing how to do it on Marika Wei’s blog. Slightly adapted, the solution handles all Windows paths:


struct IOContext { // the name is illustrative, the struct was left unnamed here
#ifdef _WIN32
    std::wstring m_FilePath;
#else
    std::string m_FilePath;
#endif
    AVIOContext *m_IOCtx;
    uint8_t *m_Buffer; // internal buffer for ffmpeg
    int m_BufferSize;
    FILE *m_File;
};

// the rest runs inside a member function of the struct above
#ifdef _WIN32
    m_File = _wfopen(m_FilePath.c_str(), L"rb");
#else
    m_File = fopen(m_FilePath.c_str(), "rb");
#endif

m_IOCtx = avio_alloc_context(
    m_Buffer, m_BufferSize, // internal buffer and its size
    0, // write flag (1=true, 0=false)
    (void*)this, // user data, will be passed to our callback functions
    IOReadFunc,
    0, // no writing
    IOSeekFunc
);
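
The snippet above passes IOReadFunc and IOSeekFunc without showing them. Here is a minimal sketch of what such callbacks could look like on top of a plain FILE*. The FileSource struct and the two constants are stand-ins I made up so the sketch compiles without the ffmpeg headers; in real code the callbacks would take the struct from above, and you would use AVSEEK_SIZE and AVERROR_EOF from ffmpeg itself.

```cpp
#include <cstdint>
#include <cstdio>

// Stand-ins so this sketch compiles without the ffmpeg headers. In real code
// use AVSEEK_SIZE (libavformat/avio.h) and AVERROR_EOF (libavutil/error.h).
const int kSeekSizeFlag = 0x10000; // same value as AVSEEK_SIZE
const int kEndOfFile = -1;         // placeholder only, NOT the value of AVERROR_EOF

// Hypothetical, trimmed-down version of the struct above: just the file handle.
struct FileSource {
    FILE *m_File = nullptr;
};

// Read callback: fill buf with up to buf_size bytes from the file.
int IOReadFunc(void *opaque, uint8_t *buf, int buf_size) {
    FileSource *source = static_cast<FileSource *>(opaque);
    size_t bytesRead = fread(buf, 1, static_cast<size_t>(buf_size), source->m_File);
    if (bytesRead == 0) { return kEndOfFile; } // nothing left (or a read error)
    return static_cast<int>(bytesRead);
}

// Seek callback: reposition the file, or report its size when asked to.
int64_t IOSeekFunc(void *opaque, int64_t offset, int whence) {
    FileSource *source = static_cast<FileSource *>(opaque);
    if (whence == kSeekSizeFlag) {
        // ffmpeg asks for the total size: measure it and restore the position
        long current = ftell(source->m_File);
        fseek(source->m_File, 0, SEEK_END);
        long size = ftell(source->m_File);
        fseek(source->m_File, current, SEEK_SET);
        return size;
    }
    if (fseek(source->m_File, static_cast<long>(offset), whence) != 0) { return -1; }
    return ftell(source->m_File);
}
```

Note that long is 32 bits wide on Windows, so for files larger than 2 GB this sketch would need _fseeki64 and _ftelli64 there instead of fseek and ftell.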

Check out the full code at GitHub.

How to pass Amazon SDE interview

Amazon is considered to be one of the most wanted employers among software engineers who don’t work for any of the tech giants. Standing in line with Google, Microsoft, Facebook and maybe some smaller companies like Twitter, Uber and Dropbox, it gets an unstoppable flow of CVs from people passionate about working at big scale.

But is it really that cool, demanding and, in the end, rewarding? A lot of people would disagree, others would be neutral, and only a few would agree. For example, the typical everyday job of a server-side SDE II responsible for the customer experience of purchasing goods may consist only of sending requests to and receiving responses from internal web services, validating input data and fixing small bugs, and that’s all. Oh no, there’s one more thing: on-call rotations. So one week every few months (it depends on the team, but just to give you an idea) that employee, despite his “interesting and challenging” duties, will be responsible for fixing bugs in production ASAP, which literally means ASAP: during the weekend, in the evening, at night, it doesn’t matter.

That is why Amazon looks for people who won’t whine about such a lifestyle. Amazon has a dozen so-called “principles” (read: “search criteria for new employees”), some of which contradict others. For example, they need employees who have a “bias for action” but “insist on the highest standards”, or who are “frugal” but “think big”, and so on. Interviewers will ask how you match these principles, and what they’re really interested in is whether you have experience working overtime, on weekends, under pressure, overnight, in order to deliver results on short notice and fix bugs. They clearly tell you about it: if you’re weak in programming or algorithms, it does not matter as long as you’re used to working overtime just to deliver results.

So how do you pass the Amazon interview? They will ask about your experience and will definitely ask for an example where you had tight deadlines and a half-finished task. They want to hear how you worked overnight and did not complain about it. If they hear that, you’ve passed, even if your solution to their O(N^2) dynamic programming puzzle is NP-complete.

Replacing QNetworkAccessManager for the greater good

Everybody who uses Qt networking for small tasks will sooner or later face the oddities of QNetworkAccessManager. This class aims to be useful and convenient, but it has a few quite noticeable drawbacks. The first one, of course, is that it cannot be used in a blocking way. What you have to do instead is create an instance of QEventLoop and connect the reply’s finished() signal to the loop’s quit() slot:

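QNetworkAccessManager networkManager;
QEventLoop loop;
// resource is the QNetworkRequest to fetch
QNetworkReply *netReply = networkManager.get(resource);
// leave the local event loop once the reply has finished
QObject::connect(netReply, SIGNAL(finished()), &loop, SLOT(quit()));
loop.exec(); // blocks here until quit() is called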

This is overkill and overengineering, of course. The same inconvenience strikes when you try to use it from a background thread to download something: QNetworkAccessManager needs an event loop, and it will launch one more thread of its own to do all the required operations.

It also has a lot of data, methods and abilities that are not needed for “everyday simple network operations” like querying some API or downloading files. I don’t know anybody who hasn’t looked for a substitute for it at least once. Fortunately, a solution exists.

(more…)

Resources to learn and understand parallel programming. The hard way

There’s no way other than the hard way. (c)

Parallel programming is considered a difficult or even advanced topic by many programmers. It’s the starting point for even more advanced stuff like distributed computations, reliability, the CAP theorem, consensus problems and much more. Besides, a deep understanding of how the CPU and the operating system work can help you write less buggy software, and parallel programming can help you with that too.

In this post I will focus on books describing parallel programming on one computer with one CPU using classical approaches. They contain neither guides to SSE instructions nor materials on CUDA or OpenCL. Similarly, you will find no resources about Hadoop and/or MapReduce, and nothing about technologies that support parallel programming out of the box, like Go or Erlang.

So I will now go through all the resources that I find more or less useful. I’m not going to stick to any particular technology; the point is to understand the topic from different perspectives. The materials I’m referring to should generally not be considered entry-level: they require a fair amount of knowledge. Nevertheless, the list is sorted starting from the “easier” things.

(more…)

Implementing spellchecking in a desktop application in C++

When the user is supposed to enter a significant amount of text in your application, it’s better to help them check its spelling. Basically, to check spelling you need a dictionary of words and an algorithm to order these words. It might also be useful to offer the user possible corrections for any spelling error. This is where Hunspell comes in handy. It’s an open-source library built on top of the MySpell library and used in a significant number of projects, ranging from open-source ones like Firefox to proprietary ones like OS X. It has bindings for a number of platforms (.NET, Ruby etc.) and should be fairly easy to integrate into your project. In this post I’ll discuss how to integrate it into a C++/Qt project.
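
To make the “dictionary plus ordering” idea concrete, here is a toy sketch that checks a word against a word list and ranks possible corrections by edit distance. This is not how Hunspell works internally (its suggestion logic is far more sophisticated), and all the names here are my own; it only illustrates the two jobs a spellchecker has to do.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Classic Levenshtein distance (insertions, deletions, substitutions).
std::size_t editDistance(const std::string &a, const std::string &b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) { prev[j] = j; }
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, substitution});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// A word is "correct" if the dictionary contains it (linear scan for brevity).
bool isSpelledCorrectly(const std::vector<std::string> &dictionary, const std::string &word) {
    return std::find(dictionary.begin(), dictionary.end(), word) != dictionary.end();
}

// Returns dictionary words within maxDistance edits of word, closest first.
std::vector<std::string> suggest(const std::vector<std::string> &dictionary,
                                 const std::string &word, std::size_t maxDistance = 2) {
    std::vector<std::pair<std::size_t, std::string>> ranked;
    for (const std::string &candidate : dictionary) {
        std::size_t distance = editDistance(word, candidate);
        if (distance <= maxDistance) { ranked.emplace_back(distance, candidate); }
    }
    std::sort(ranked.begin(), ranked.end()); // by distance, then alphabetically
    std::vector<std::string> result;
    for (const auto &entry : ranked) { result.push_back(entry.second); }
    return result;
}
```

With a dictionary of “hello”, “help”, “world” and “word”, the misspelled “helo” is rejected by isSpelledCorrectly() and suggest() offers “hello” and “help”, both one edit away.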

(more…)

Classic Producer-Consumer in Qt/C++

Producer-Consumer is a classic pattern of interaction between two or more threads that share a common task queue: producers add tasks to the queue and workers process them. When I first ran into a similar task, I googled for standard approaches to this problem in Qt, but they were based on signals/slots plus synchronization primitives, while I wanted a simple and clear solution. Of course, in the end I reinvented the wheel, and I invite you to take a look at it.

For synchronization in Producer-Consumer you need a mutex and some kind of waiting event that blocks a thread until there is work to do. In Qt you have QMutex and QWaitCondition, which are all we need.

Let’s suppose we have the following data structures:

        QWaitCondition m_WaitAnyItem;
        QMutex m_QueueMutex;
        QVector<T*> m_Queue;

where T is the type of the messages we’re producing/consuming. So we have a queue of elements to be processed, a mutex to protect access to the queue, and a wait condition to wait on while the queue is empty.

For Producer-Consumer we usually need the methods produce() and consume(). Let’s see how we can implement them.
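
As a rough illustration of the idea in standard C++ (this is an analogue I put together, not the Qt code from the full post), the same three members map to std::condition_variable, std::mutex and std::deque. The TaskQueue class name is made up, and the queue stores values rather than T* pointers to keep the sketch short.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

template <typename T>
class TaskQueue {
public:
    // Producer side: add an item and wake up one waiting consumer.
    void produce(T item) {
        {
            std::lock_guard<std::mutex> lock(m_QueueMutex);
            m_Queue.push_back(std::move(item));
        }
        m_WaitAnyItem.notify_one();
    }

    // Consumer side: block until the queue is non-empty, then take the oldest item.
    T consume() {
        std::unique_lock<std::mutex> lock(m_QueueMutex);
        // wait() releases the mutex while sleeping; the predicate guards
        // against spurious wakeups
        m_WaitAnyItem.wait(lock, [this] { return !m_Queue.empty(); });
        T item = std::move(m_Queue.front());
        m_Queue.pop_front();
        return item;
    }

private:
    std::condition_variable m_WaitAnyItem; // plays the role of QWaitCondition
    std::mutex m_QueueMutex;               // plays the role of QMutex
    std::deque<T> m_Queue;
};
```

A producer thread would call produce() for every new task, while each worker thread loops on consume(). Stopping the workers (for example with a special “stop” item) is left out of this sketch.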

(more…)

Implementing autocomplete for English in C++

When it comes to implementing autocompletion for some kind of input field in C++, the question is which algorithm to choose and where to get the source data for completion. In this post I’ll try to answer both questions.

As for the algorithm, Stack Overflow gives us hints about tries, segment trees and others. You can find a good article about them. Its author has implemented some of them in a repository called FACE (fastest auto-complete in the east), which you can easily find on GitHub. This solution is used for autocompletion in the DuckDuckGo search engine, which should tell you how good it is. Unfortunately, it depends on libuv and joyent’s http-parser, which is not good if you just need to integrate autocompletion into your C++ application rather than build an auto-complete server and send queries to it. Another drawback: libuv and cpp-libface itself fail to compile on Windows, which is bad if you’re building a cross-platform solution.
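
For readers who have not met a trie before, here is a minimal sketch of prefix completion with one. It is a generic textbook structure, not the algorithm FACE uses, and the Trie class and its method names are my own. It also ignores ranking completions by popularity, which a real autocomplete needs.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

class Trie {
public:
    // Adds a word, creating one node per character along the way.
    void insert(const std::string &word) {
        Node *node = &m_Root;
        for (char c : word) {
            std::unique_ptr<Node> &child = node->children[c];
            if (!child) { child.reset(new Node()); }
            node = child.get();
        }
        node->isWord = true;
    }

    // Returns up to limit stored words starting with prefix, in byte-wise sorted order.
    std::vector<std::string> complete(const std::string &prefix, std::size_t limit = 10) const {
        std::vector<std::string> results;
        const Node *node = &m_Root;
        for (char c : prefix) {
            auto it = node->children.find(c);
            if (it == node->children.end()) { return results; } // no such prefix
            node = it->second.get();
        }
        std::string current = prefix;
        collect(node, current, limit, results);
        return results;
    }

private:
    struct Node {
        bool isWord = false;
        std::map<char, std::unique_ptr<Node>> children;
    };

    // Depth-first walk below node, appending complete words until limit is reached.
    static void collect(const Node *node, std::string &current, std::size_t limit,
                        std::vector<std::string> &results) {
        if (results.size() >= limit) { return; }
        if (node->isWord) { results.push_back(current); }
        for (const auto &entry : node->children) {
            current.push_back(entry.first);
            collect(entry.second.get(), current, limit, results);
            current.pop_back();
            if (results.size() >= limit) { return; }
        }
    }

    Node m_Root;
};
```

After inserting “car”, “card”, “care”, “cat” and “dog”, complete("car") returns “car”, “card” and “care”, and complete("x") returns an empty list.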

You can find out how to build FACE into your cross-platform C++ application below.

(more…)