I'm surprised that many comments here seem to have missed this bit of context:
> One thing you need to know about me is that despite working on SumatraPDF C++ code base for 16 years, I don’t know 80% of C++.
I'm pretty sure that most "why don't you just use x…" questions are implicitly answered by it, with the answer being "because using x correctly requires learning about all of its intricacies and edge-cases, which in turn requires understanding related features q, r, s… all the way to z, because C++ edge-case complexity doesn't exist in a vacuum".
throwaway2037 14 hours ago [-]
I agree: This quote is the star of the show. I'll troll a little bit here: I thought to myself: "Finally, a humble C++ programmer. Really, they do exist... well, at least one."
uncircle 8 hours ago [-]
There's two:
> Even I can’t answer every question about C++ without reference to supporting material (e.g. my own books, online documentation, or the standard). I’m sure that if I tried to keep all of that information in my head, I’d become a worse programmer.
-- Bjarne Stroustrup, creator of C++
vanderZwan 6 hours ago [-]
Bjarne has his moments - I like his saying that somewhere buried underneath all of C++'s complexity there's an elegant language struggling to get out, and I'm sympathetic to his frustrations and believe he does have good intentions there.
But he can also contradict himself sometimes in this regard, because he also often uses a variation of calling C++ a language for "people who know what they are doing" as a sort of catch-all dismissal of critiques of its footguns.
The whole problem is that very few people can claim to truly "know what they are doing" when it comes to all of C++'s features and how they interconnect, and dismissing that by (implicitly) telling people to just "git gud" is missing the point a bit.
But again, he's only human and I do get the urge to get a bit defensive of your baby.
nxobject 6 hours ago [-]
There’s another slogan that also acts as catch-all dismissal - “easy things should be easy; hard things should be possible”. Yes, but the bar for “hard things” just happens to be frustratingly low compared to other languages - ie library programming that has enough genericity and robustness. To wit, this example.
throwaway2037 5 hours ago [-]
> easy things should be easy; hard things should be possible
From many years ago, this was a Perl motto from Larry Wall. Is he the original pontificator... or was it someone before him?
jahnu 4 hours ago [-]
30 year C++ veteran here. Also don't know 80%. I used to know more, but a combination of way better tooling, "modern" C++, and (most importantly) realising that thinking about things other than language details leads to better software means I have forgotten stuff I used to know.
skrebbel 12 hours ago [-]
This seems very similar to Java's oldschool single-interface callback mechanism. Originally, Java didn't have lambdas or closures or anything of the sort, so instead they'd litter the standard library with single-method interfaces with names like ActionListener, MouseListener, ListItemSelectedListener, etc. You'd make a class that implements that interface, manually adding whatever data you need in the callback (just like here), and implement the callback method itself of course.
I think that has the same benefit as this, that the callbacks are all very clearly named and therefore easy to pick out of a stack trace.
(In fact, it seems like a missed opportunity that modern Java lambdas, which are simply syntactical sugar around the same single-method interface, do not seem to use the interface name in the autogenerated class)
spullara 10 hours ago [-]
They don't autogenerate classes anymore, just private static methods though I agree that it would be nice to have more of the metadata in the name of the generated method.
skrebbel 10 hours ago [-]
Oh really? Cool, I did not know that.
How does that work with variables in the closure then? I could see that work with the autogenerated class: Just make a class field for every variable referenced inside the lambda function body, and assign those in constructor. Pretty similar to this here article. But it's not immediately obvious to me how private static methods can be used to do the same, except for callbacks that do not form a closure (eg filter predicates and sort compare functions and the likes that only use the function parameters).
spullara 9 hours ago [-]
Ah there is some nuance. For capturing lambdas they do generate a class at runtime to capture the variables but it still then just calls the generated private method with the simplistic naming scheme. Also, apparently the simple naming scheme was chosen so as to not go down the C++ mangled name path and just depend on the debugging information.
tlb 9 hours ago [-]
I don't have this problem with backtraces in Clang. The 'anonymous' lambdas have debugging symbols named after the function it lexically appears in, something like parent_function::$_0::invoke. $_0 is the first lambda in that function, then $_1, etc. So it's easy enough to look up.
lenkite 7 hours ago [-]
This. I was confused when I read that - I guess MSVC doesn't generate such conventional lambda names ?
badmintonbaseba 7 hours ago [-]
It's up to the demangler, the info must be there in the decorated/mangled name. Demanglers sometimes choke on these complex symbols.
AFAIK MSVC also changed their lambda ABI once, including mangling. As I recall at one point it even produced some hash in the decorated/mangled name, with no way to revert it, but that was before /Zc:lambda (enabled by default from C++20).
JeanMarcS 1 days ago [-]
Don't know about the code subtilities, but SumatraPDF is a gift for viewing PDF on MS Windows.
So big thanks to the author !
Arainach 19 hours ago [-]
Out of curiosity, what's your use case for it? Years ago I preferred Sumatra/Foxit to Adobe, but every major browser has supported rendering PDFs for at least a decade and I haven't needed or wanted a dedicated PDF reader in all that time.
sameerds 13 hours ago [-]
Opening a pdf inside a browser feels to me like an application inside an application. My brain can't handle that load. I would rather use the browser to browse the internet and a pdf reader to display pdfs. If I clicked on a link to a pdf, it is _not_ part of the web, and I want the browser to stay out of it. Same goes for Office 365 wanting to open documents inside my browser. I don't want it to do that. I have the necessary apps installed for it.
4gotunameagain 8 hours ago [-]
I really would not like to ruin the entire internet for you, but isn't the vast majority of websites these days fully fledged applications, hence applications inside an application in the sense you mentioned?
I would argue that a pdf reader is much simpler than multiple very popular webpages nowadays.
drewbitt 17 hours ago [-]
Not only is it faster to open than a browser, with a separation of concerns (documents get their own app, which I can leave open with its tabs), it also opens epub, .cbz, and other formats, so I have it installed on all my Windows machines. I eventually open a book.
mjmas 18 hours ago [-]
Part of why I use SumatraPDF is that it automatically reloads its view when the files change (at least for PDFs, I haven't tested on the other file types it supports).
KaushikR2 14 hours ago [-]
That's not always desirable though. I'd rather have control over that
eviks 12 hours ago [-]
You have, via a config. You have no control in a browser.
jasonfarnon 13 hours ago [-]
same here--I can't imagine using latex without this feature. To me it's a beautiful piece of software, the only thing I keep pinned to the taskbar other than wsl shell.
vachina 14 hours ago [-]
> use case
Sumatra excels at read-only. Usually anything to do with PDF is synonymous with slow, bloated, buggy, but Sumatra, at just 10 MB, manages to feel snappy and fast, like a native win32 UI.
lenkite 7 hours ago [-]
> I haven't had needed or wanted a dedicated PDF reader in all that time.
OK. Now load 100 PDFs. You will need a dedicated PDF reader unless you don't mind wasting a truckload of RAM. Also, browser PDF readers are generally slower and are not optimal at search/bookmarks/navigation/etc.
CamouflagedKiwi 3 hours ago [-]
I've never needed to load 100 PDFs at once, and honestly I don't imagine I ever will. I guess it might happen for some people, so a dedicated app would be useful for them.
For me, having a separate dedicated app isn't worth it for the benefits you mention, which to me are minor compared to having to install and manage another thing (which, to be fair, I imagine Sumatra to be a very pleasant citizen at compared to Acrobat).
Cadwhisker 18 hours ago [-]
It's smaller, lighter and much faster than launching a web browser to view a PDF. I can configure it to open a new instance for each PDF which is nice if you need to have several docs open at once. Again, nothing that you can't do with a browser and dragging tabs, but I prefer this.
df0b9f169d54 11 hours ago [-]
As I recall, it's possible to configure an external editor so that when you click anywhere in the SumatraPDF viewer it opens the source file at the position corresponding to the click. This is extremely helpful when working with LaTeX documents.
agent327 12 hours ago [-]
Sumatra will reload any PDF that changes while you are viewing it (Adobe locks the file, so you can't change it to begin with). This is incredibly useful when you are writing documentation using a document generating system (like docbook).
graemep 15 hours ago [-]
Large PDFs are very slow in browsers. I believe they all use pdf.js (or similar).
cAtte_ 14 hours ago [-]
firefox uses pdf.js, but chromium uses pdfium and safari uses pdfkit
graemep 10 hours ago [-]
PDFkit seems similar in that it is JS. Is it also slow?
I just tried a few PDFs in Chromium and PDFium seems to be much better than pdf.js - faster and handles forms more smoothly.
fredoralive 10 hours ago [-]
PDFkit seems to be a name for a few different things in stuff like node and ruby. I think the Apple PDFkit is probably just wrapping Apple's in-house PDF tech that Preview uses?
graemep 8 hours ago [-]
It does not seem to have any requirements other than JS - browser or Node. There is an online demo that works with Firefox on Linux so it's not wrapping anything else.
cAtte_ 3 hours ago [-]
the pdfkit from the first google result doesn't seem to be related to apple's. what happened here is that "pdfkit" is a very generic name (that will tend to show up because people love writing pdf-related software) that also happens to coincide with apple's convention of naming their frameworks something-kit (uikit, appkit, avkit, ...)
eviks 12 hours ago [-]
How do you alt tab to a browser tab with a PDF? How do you change navigation shortcuts when browsers are notoriously bad at such customizations?
dolmen 11 hours ago [-]
It is not sandboxed.
So one can expect zero days exist and are exploited.
That may not be a feature for you, but it is for attackers.
jasonjayr 9 hours ago [-]
Does it implement any of the dynamic features in PDF that are vectors for easy attacks like that?
PDF was originally a display-only format.
kccqzy 4 hours ago [-]
You don't need any dynamic features in PDF to attack. One of the most famous exploits used a bug in the JBIG2 format to build the attacker's own dynamic feature (basically a virtual machine built from logic operations) to launch an exploit. https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-i...
In fact you have gotten it backwards. The obviously dynamic features in PDF like JavaScript are designed to be dynamic so they receive so much more attention in security. So smart attackers attack the not-obviously-dynamic features in PDF.
jasonjayr 3 hours ago [-]
Ah, very good point.
shakna 8 hours ago [-]
Sumatra has more security features than most other readers?
For example, it doesn't support JavaScript. And it doesn't support GoToE.
The text features, both strings and fonts, get sent through HarfBuzz for sanitisation.
How is it not sandboxed?
vgb2k18 10 hours ago [-]
If you hate it when pdfs won't print because of restrictive permissions... Sumatra.
wavemode 7 hours ago [-]
opening (and browsing/searching through) a very large PDF is a nightmare in most browsers
NooneAtAll3 11 hours ago [-]
in my experience, browser pdf viewers take a loooot more RAM than Sumatra
ternaryoperator 17 hours ago [-]
Not the OP, but my use case is epub books, which it handles flawlessly.
comex 15 hours ago [-]
Note that some CFI (control flow integrity) implementations will get upset if you call a function pointer with the wrong argument types: https://gcc.godbolt.org/z/EaPqKfvne
You could get around this by using a wrapper function, at the cost of a slightly different interface.
(This approach also requires explicitly writing the argument type. It's possible to remove the need for this, but not without the kind of complexity you're trying to avoid.)
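Roughly the kind of wrapper I mean (a sketch with invented names, not the article's actual code): instead of casting a typed function pointer to void(*)(void*), instantiate a thunk that already has that signature and only casts the data pointer back, so the indirect call always goes through a correctly typed function pointer.
    struct Func0 {                      // hypothetical stand-in for the post's two-word struct
        void (*fn)(void*) = nullptr;
        void* userData = nullptr;
    };

    template <typename T, void (*Fn)(T*)>
    void Thunk(void* userData) {
        Fn(static_cast<T*>(userData));  // only the data pointer gets cast; the call itself is fully typed
    }

    template <typename T, void (*Fn)(T*)>
    Func0 MkFunc0(T* data) {            // e.g. MkFunc0<MyWindow, &OnButtonClicked>(win)
        return Func0{&Thunk<T, Fn>, data};
    }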
_randyr 23 hours ago [-]
I'm not a C++ programmer, but I was under the impression that closures in c++ were just classes that overload the function call operator `operator()`. So each closure could also be implemented as a named class. Something like:
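(A rough sketch of what I mean; the class and field names are invented, not from the article:)
    // a "named closure": captured state becomes explicit fields, the body becomes operator()
    struct OnListItemSelected {
        MyWindow* window;   // captured state, spelled out
        int itemIndex;

        void operator()() const {
            // do whatever the callback needs, using window and itemIndex
        }
    };

    // usage: OnListItemSelected cb{win, idx}; cb();
    // the type name shows up in symbols and stack traces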
Perhaps I'm mistaken in what the author is trying to accomplish though?
OskarS 23 hours ago [-]
Indeed, that is exactly the case, lambdas are essentially syntax sugar for doing this.
The one thing the author's solution does which this solution (and lambdas) does not is type erasure: if you want to pass that closure around, you have to use templates, and you can't store different lambdas in the same data structure even if they have the same signature.
You could solve that in your case by making `void operator()` virtual and inheriting (though that means you have to heap-allocate all your lambdas), or use `std::function<>`, which is a generic solution to this problem (which may or may not allocate, if the lambda is small enough, it's usually optimized to be stored inline).
I get where the author is coming from, but this seems very much like an inferior solution to just using `std::function<>`.
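For example (a small illustration, not from the article): two lambdas with different closure types but the same call signature can live in the same container once they're erased into std::function.
    #include <cstdio>
    #include <functional>
    #include <vector>

    int main() {
        std::vector<std::function<void(int)>> callbacks;
        callbacks.push_back([](int x) { std::printf("%d\n", x); });            // captureless lambda
        int base = 10;
        callbacks.push_back([base](int x) { std::printf("%d\n", base + x); }); // capturing lambda, different type
        for (auto& cb : callbacks) cb(1);                                       // prints 1 and 11
    }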
pwagland 9 hours ago [-]
The author of the article freely admits that `std::function<>` is more flexible. He still prefers this solution, as it is easier for him to reason about. This is covered in the "Fringe Benefits" part of the document.
usefulcat 19 hours ago [-]
> though that means you have to heap-allocate all your lambdas
I think whether or not you have to allocate from the heap depends on the lifetime of the lambda. Virtual methods also work just fine on stack-allocated objects.
OskarS 13 hours ago [-]
Fair point, but generally speaking, callbacks tend to escape the scopes they are in (if you have a callback for ”user clicked mouse”, it’s likely not going to be triggered in your current scope), so stack-allocation isn’t really an option.
But yes, fair point: they can be stack or statically allocated as well.
spacechild1 23 hours ago [-]
Exactly! And if you need type erasure, you can just store it in a std::function.
> OnListItemSelectedData data;
In this case you can just store the data as member variables. No need for defining an extra class just for the data.
As I've written elsewhere, you can also just use a lambda and forward the captures and arguments to a (member) function. Or if you're old-school, use std::bind.
InfiniteRand 19 hours ago [-]
Main issue author had with lambdas is autogenerated names in crash reports
_randyr 13 hours ago [-]
Yes, but that's exactly why I mention this. By explicitly creating a class (that behaves the same as a lambda) the author might get better names in crash reports.
b0a04gl 45 minutes ago [-]
the approach works, but there's hidden cost in how it shapes the compiled output. every callback adds a layer the compiler has to guess around. curious if anyone checked what this does to inlining and branch prediction across builds. does the extra indirection prevent useful optimisations? or does the compiler end up being too aggressive and misoptimise when the struct layout changes later? would be useful to diff the assembly across releases and compilers
akdev1l 21 hours ago [-]
I don’t really understand what problem this is trying to solve and how the solution is better than std::function. (I understand the issue with the crash reports and lambdas being anonymous classes but not sure how the solution improved on this or how std::function has this problem?)
I haven’t used windows in a long time but back in the day I remember installing SumatraPDF to my Pentium 3 system running windows XP and that shit rocked
benreesman 11 minutes ago [-]
It's a daily thing we all do: decide if this problem is better solved by a big chunk of code that is probably well tested but probably satisfies a bunch of requirements and other constraints, or a smaller chunk of code that I can write or vendor in and has other advantages, or maybe I just prefer how it's spelled. Sometimes there's a "right" answer, e.g. you should generally link in your TLS implementation unless you're a professional TLS person, but usually it's a judgement call, and the aggregate of all those micro-decisions is a component of the intangible ideal of "good taste" (also somewhat subjective, but most agree on the concept of an ideal).
In this instance the maintainer of a useful piece of software has made a choice that's a little less common in C++ (totally standard practice in C) and it seems fine, it's on the bubble, I probably default the other way, but std::function is complex and there are platforms where that kind of machine economy is a real consideration, so why not?
In a zillion contributor project I'd be a little more skeptical of the call, but even on massive projects like the Linux kernel they make decisions about the house style that seem unorthodox to outsiders and they have their reasons for doing so. I misplaced the link but a kernel maintainer raised grep-friendliness as a reason he didn't want a patch. At first I was like, nah you're not saying the real reason, but I looked a little and indeed, the new stuff would be harder to navigate without a super well-configured LSP.
Longtime maintainers have reasons they do things a certain way, and the real test is the result. In this instance (and in most) I think the maintainer seems to know what's best for their project.
kjksf 19 hours ago [-]
How is Func0 / Func1<T> better than std::function?
Smaller size at runtime (uses less memory).
Smaller generated code.
Faster at runtime.
Faster compilation times.
Smaller implementation.
Implementation that you can understand.
How is it worse?
std::function + lambda with variable capture has better ergonomics i.e. less typing.
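For reference, the shape is roughly this (typed from memory, so field names may differ slightly from the repo):
    struct Func0 {
        void* fn = nullptr;        // cast to void (*)(void*) before calling
        void* userData = nullptr;  // 2 machine words total, vs 64 bytes for MSVC's std::function
    };

    template <typename T>
    struct Func1 {
        void (*fn)(void*, T) = nullptr;
        void* userData = nullptr;
    };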
akdev1l 19 hours ago [-]
I think none of these points are demonstrated in the post hence I fail to visualize it
Also I copy pasted the code from the post and I got this:
test.cpp:70:14: error: assigning to 'void *' from 'func0Ptr' (aka 'void (*)(void *)') converts between void pointer and function pointer
70 | res.fn = (func0Ptr)fn;
kjksf 9 hours ago [-]
Thanks, fixed.
It works in msvc but as someone pointed out, it was a typo and was meant to be (void*) cast.
cryptonector 18 hours ago [-]
> test.cpp:70:14: error: assigning to 'void *' from 'func0Ptr' (aka 'void (*)(void *)') converts between void pointer and function pointer 70 | res.fn = (func0Ptr)fn;
This warning is stupid. It's part of the "we reserve the right to change the size of function pointers some day so that we can haz closures, so you can't assume that function pointers and data pointers are the same size m'kay?" silliness. And it is silly: because the C and C++ committees will never be able to change the size of function pointers, not backwards-compatibly. It's not that I don't wish they could. It's that they can't.
akdev1l 18 hours ago [-]
It’s not a warning, it’s a compile time error and I am not even using -Wall -Werror.
I also believe there are platforms where a function pointer and a data pointer are not the same but idk about such esoteric platforms first hand (seems Itanium had that: https://stackoverflow.com/questions/36645660/why-cant-i-cast...)
Though my point was only that this code will not compile as is with whatever clang Apple ships.
I am not really sure how to get it to compile tbqh.
Some further research (https://www.kdab.com/how-to-cast-a-function-pointer-to-a-voi...) suggests it should be done like so:
> auto fptr = &f; void* a = reinterpret_cast<void*&>(fptr);
edit: I tried with GCC 15 and that compiled successfully
comex 15 hours ago [-]
It should just be
res.fn = (void *)fn;
`res.fn` is of type `void *`, so that's what the code should be casting to. Casting to `func0Ptr` there seems to just be a mistake. Some compilers may allow the resulting function pointer to then implicitly convert to `void *`, but it's not valid in standard C++, hence the error.
Separately from that, if you enable -Wpedantic, you can get a warning for conversions between function and data pointers even if they do use an explicit cast, but that's not the default.
gpderetta 13 hours ago [-]
FWIW, POSIX practically requires void ptr and function ptr inter-convertibility, hence the support from GCC.
wahern 11 hours ago [-]
Not practically, but literally:
Note that conversion from a void * pointer to a function pointer as in:
    fptr = (int (*)(int))dlsym(handle, "my_function");
is not defined by the ISO C standard. This standard requires this conversion to work correctly on conforming implementations.
gpderetta 9 hours ago [-]
I remembered the dlsym requirement, I wasn't sure it was de-jure.
spacechild1 16 hours ago [-]
You can't just keep claiming these things without providing evidence. How much faster? How much smaller? These claims are meaningless without numbers to back it up.
oezi 14 hours ago [-]
I think the one key downside for std::function+lambda which resonated with me was bad ergonomics during debugging.
My unanswered question on this from 8 years ago: https://stackoverflow.com/questions/41385439/named-c-lambdas...
If there was a way to name lambdas for debug purposes then all other downsides would be irrelevant (for most usual use cases of using callbacks).
Sadly that'll only work for captureless lambdas, however.
zack-alex 8 hours ago [-]
[dead]
m-schuetz 10 hours ago [-]
None of the arguments on this list seem convincing. The only one that makes sense was the argument that it helps identify the source of a crash.
How much smaller is it? Does it reduce the binary size and RAM usage by just 100 bytes?
Is it actually faster?
How much faster does it compile? 2ms faster?
kjksf 10 hours ago [-]
I didn't write that article to convince anybody.
I wrote it to share my implementation and my experience with it.
SumatraPDF compiles fast (relative to other C++ software) and is smaller, faster and uses less resources than other software.
Is it because I wrote Func0 and Func1 to replace std::function? No.
Is it because I made hundreds of decisions like that? Yes.
You're not wrong that performance wins are miniscule.
What you don't understand is that eternal vigilance is the price of liberty. And small, fast software.
pwagland 9 hours ago [-]
This is a valid point missed by many today. The mantra of "don't optimise early" is often used as an excuse to not optimise at all, and so you end up with a lot of minor choices scattered throughout the code which all suck a tiny bit of performance out of the system. Fixing any of these is also considered to be worthless, as the improvement from any one change is miniscule. But added up, they become noticeable.
SuperV1234 7 hours ago [-]
> Is it because I made hundreds of decisions like that? Yes.
Proof needed. Perhaps your overall program is designed to be fast and avoid silly bottlenecks, and these "hundred decisions" didn't really matter at all.
fsloth 4 hours ago [-]
In my experience performance comes from constant vigilance and using every opportunity to choose the performant way of implementing something.
Silly bottlenecks are half of the perf story in my experience. The other half are a billion tiny details.
badmintonbaseba 10 hours ago [-]
> Smaller size at runtime (uses less memory).
Yours is smaller (in terms of sizeof), because std::function employs small-buffer optimization (SBO). That is if the user data fits into a specific size, then it's stored inline the std::function, instead of getting heap allocated. Yours need heap allocation for the ones that take data.
Whether yours wins or loses on using less memory heavily depends on your typical closure sizes.
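A rough illustration (the inline-buffer size is implementation-defined, so treat the exact threshold as an assumption):
    #include <array>
    #include <functional>

    void example(void* somePointer) {
        // small capture: usually fits std::function's inline buffer, no heap allocation
        std::function<void()> f1 = [somePointer] { (void)somePointer; };

        // large capture: exceeds the inline buffer, so the state gets heap-allocated
        std::function<void()> f2 = [big = std::array<char, 256>{}] { (void)big; };
    }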
> Faster at runtime
Benchmark, please.
almostgotcaught 19 hours ago [-]
Your Func thing is better than std::function the same way a hammer is better than a drill press... ie it's not better because it's not the same thing at all. Yes the hammer can do some of the same things, at a lower complexity, but it can't do all the same things.
What I'm trying to say is being better than x means you can do all the same things as x better. Your thing is not better, it is just different.
spacechild1 23 hours ago [-]
> I’ve used std::function<> and I’ve used lambdas and what pushed me away from them were crash reports.
In danger of pointing out the obvious: std::function does not require lambdas. In fact, it has existed long before lambdas were introduced. If you want to avoid lambdas, just use std::bind to bind arguments to regular member functions or free functions. Or pass a lambda that just forwards the captures and arguments to the actual (member) function. There is no reason for regressing to C-style callback functions with user data.
kjksf 20 hours ago [-]
I did use bind earlier in SumatraPDF.
There are 2 aspects to this: programmer ergonomics and other (size of code, speed of code, compilation speed, understandability).
Lambdas with variable capture converted to std::function have best ergonomics but at the cost of unnamed, compiler-generated functions that make crash reports hard to read.
My Func0 and Func1<T> approach has similar ergonomics to std::bind. Neither has the problem of potentially crashing in an unnamed function but Func0/Func1<T> are better at other (smaller code, faster code, faster compilation).
It's about tradeoffs. I loved the ergonomics of callbacks in C# but working within the limitations of C++ I'm trying to find solutions with attributes important to me.
spacechild1 20 hours ago [-]
> but Func0/Func1<T> are better at other (smaller code, faster code, faster compilation).
I would really question your assumptions about code size, memory usage and runtime performance. See my other comments.
Kranar 16 hours ago [-]
> In fact, it has existed long before lambdas where introduced.
Both std::function<> and lambdas were introduced in C++11.
Furthermore absolutely no one should use std::bind, it's an absolute abomination.
spacechild1 6 hours ago [-]
You are absolutely right of course! No idea, why I thought std::function existed before C++11. Mea culpa!
> Furthermore absolutely no one should use std::bind, it's an absolute abomination.
Agree 100%! I almost always use a wrapper lambda.
However, it's worth pointing out that C++20 gave us std::bind_front(), which is really useful if you want to just bind the first N arguments:
struct Foo {
void bar(int a, int b, int c);
};
Foo foo;
using Callback = std::function<void(int, int, int)>;
// with std::bind (ugh):
using namespace std::placeholders;
Callback cb1(std::bind(&Foo::bar, &foo, _1, _2, _3));
// lambda (without perfect forwarding):
Callback cb2([&foo](auto&&... args) { foo.bar(args...); });
// lambda (with perfect forwarding):
Callback cb3([&foo](auto&&... args) { foo.bar(std::forward<decltype(args)>(args)...); });
// std::bind_front
Callback cb4(std::bind_front(&Foo::bar, &foo));
I think std::bind_front() is the clear winner here.
mandarax8 23 hours ago [-]
std::bind is bad for him for the same reasons std::function is bad though
spacechild1 22 hours ago [-]
Why? If the bound (member) function crashes, you should get a perfectly useable crash report. AFAIU his problem was that lambdas are anonymous function objects. This is not the case here, because the actual code resides in a regular (member) function.
dustbunny 22 hours ago [-]
Does a stack trace from a crash in a bound function show the line number of where the bind() took place?
spacechild1 22 hours ago [-]
No, but neither does the author's solution.
delusional 22 hours ago [-]
Assuming the stack trace is generated by walking up the stack at the time when the crash happened, nothing that works like a C function pointer would ever do that. Assigning a pointer to a memory location doesn't generate a stack frame, so there's no residual left in the stack that could be walked back.
A simple example. If you were to bind a function pointer in one stack frame, and then immediately return it to the parent stack frame which then invokes that bound pointer, the stack frame that bound the now-called function would literally not exist anymore.
AlienRobot 24 minutes ago [-]
>The implementation cleverness: use a special, impossible value of a pointer (-1) to indicate a function without arguments.
From what I know about C this code probably breaks on platforms that nobody uses.
Thanks for Sumatra, by the way :D Very useful software!
mwkaufma 21 hours ago [-]
The lengths some go to avoid just using a bog-standard virtual function.
kjksf 21 hours ago [-]
I actually used the "virtual function" approach earlier in SumatraPDF.
The problem with that is that for every type of callback you need to create a base class and then create a derived class for every unique use.
That's a lot of classes to write.
Consider this (from memory so please ignore syntax errors, if any):
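(roughly this shape; the names are illustrative, not the exact code:)
    // one interface per type of callback...
    struct ThreadCompletedCallback {
        virtual void OnThreadCompleted() = 0;
        virtual ~ThreadCompletedCallback() = default;
    };

    // ...and one derived class (or one more base on an existing type) per use
    struct MyWindow : ThreadCompletedCallback {
        void OnThreadCompleted() override {
            // handle completion, with access to MyWindow's fields
        }
    };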
I would have to create a base class for every unique type of the callback and then for every caller possibly a new class deriving.
This is replaced by Func0 or Func1<T>. No new classes, much less typing. And less typing is better programming ergonomics.
std::function arguably has slightly better ergonomics but higher cost on 3 dimensions (runtime, compilation time, understandability).
In retrospect Func0 and Func1 seem trivial but it took me years of trying other approaches to arrive at insight needed to create them.
gpderetta 9 hours ago [-]
You could do:
template<class R, class... Args>
struct FnBase {
    virtual R operator()(Args...) = 0;
};
class MyThread : public FnBase<void> { ... };
mwkaufma 30 minutes ago [-]
This is pithy and clever, but in exchange makes the code less obviously self-documenting (implementation details creeping into type declarations) and complicates implementing multiple interfaces on the same receiver.
mwkaufma 19 hours ago [-]
>> I would have to create a base class for every unique type of the callback and then for every caller possibly a new class deriving.
An interface declaration is, like, two lines. And a single receiver can implement multiple interfaces. In exchange, the debugger gets a lot more useful. Plus it ensures the lifetime of the "callback" and the "context" are tightly-coupled, so you don't have to worry about intersecting use-after-frees.
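For instance (illustrative names only):
    struct IClicked { virtual void OnClicked() = 0; };
    struct IClosed  { virtual void OnClosed()  = 0; };

    // one receiver implements both; callback and context share one lifetime
    struct SettingsDialog : IClicked, IClosed {
        void OnClicked() override { /* ... */ }
        void OnClosed()  override { /* ... */ }
    };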
bmn__ 8 hours ago [-]
> [Lambdas] get non-descriptive, auto-generated names. When I look at call stack of a crash I can’t map the auto-generated closure name to a function in my code.
Well, HN lazyweb, how do you override the stupid name in C++? In other languages this is possible:
Better solution would be to map to a source location (filename and line) instead.
eddd-ddde 6 hours ago [-]
Not necessary tho, the function already exists in the binary and already has a symbol. All you need is some compiler/language feature to change that symbol.
tempodox 5 hours ago [-]
Slightly off-topic: Thanks to the author for SumatraPDF! It's an excellent Windows app that saves me (and many others, I'm sure) from having to use that horrible shit show that is Acrobat Reader.
waynecochran 1 days ago [-]
A small kitten dies every time C++ is used like it's 1995.
void (*fn)(void*, T) = nullptr;
tom_ 22 hours ago [-]
And another one dies every time you need to step through a call to std::function. Whatever you do, the kittens are never going to escape.
plq 24 hours ago [-]
Unless you mutter the magic incantation "C compatibility" while doing it
zabzonk 24 hours ago [-]
did nullptr exist in c++ back in 1995 - i can't remember
trealira 24 hours ago [-]
Nope, it was introduced in C++11, along with the type std::nullptr_t. Before that, you either used 0 or NULL, which was a macro constant defined to be 0.
petters 12 hours ago [-]
> In fact, I don’t think anyone understands std::function<> including the 3 people who implemented it.
"I don't understand it, so surely it must be very difficult and probably nobody understands it"
at this stage? This implementation has a bunch of performance and ergonomics issues due to things like not using perfect forwarding for the Func1::Call(T) method, so for anything requiring copying or allocating it'll be a decent bit slower and you'll also be unable to pass anything that's noncopyable like an std::unique_ptr.
kjksf 24 hours ago [-]
I don't know fancy C++ so I don't understand your point about perfect forwarding.
But I do know the code I write and you're wrong about performance of Func0 and Func1. Those are 2 machine words and all it takes to construct them or copy them is to set those 2 fields.
There's just no way to make it faster than that, both at runtime or at compile time.
The whole point of this implementation was giving up fancy features of std::function in exchange for code that is small, fast (both runtime and at compilation time) and one that I 100% understand in a way I'll never understand std::function.
Say you pass something like an std::vector<double> of size 1 million into Call. It'll first copy the std::vector<double> at the point you invoke Call, even if you never call fn. Then, if fn is not nullptr, you'll then copy the same vector once more to invoke fn. If you change Call instead to
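(something along these lines; I'm guessing at the exact member names:)
    template <typename U>
    void Call(U&& arg) {
        if (fn) {
            fn(userData, std::forward<U>(arg));  // lvalues are copied once, rvalues are moved
        }
    }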
the copy will not happen at the point Call is invoked. Additionally, if arg is an rvalue, fn will be called by moving instead of copying. Makes a big difference for something like
std::vector<double> foo();

void bar(Func1<std::vector<double>> f) {
    auto v = foo();
    f.Call(std::move(v));
}
OskarS 23 hours ago [-]
> But I do know the code I write and you're wrong about performance of Func0 and Func1. Those are 2 machine words and all it takes to construct them or copy them is to set those 2 fields.
You also have to heap allocate your userData, which is something std::function<> avoids (in all standard implementations) if it’s small enough (this is why the sizeof() of std::function is larger than 16 bytes, so that it can optionally store the data inline, similar to the small string optimization). The cost of that heap allocation is not insignificant.
If I were doing this, I might just go the full C route and just use function pointers and an extra ”userData” argument. This seems like an awkward ”middle ground” between C and C++.
badmintonbaseba 10 hours ago [-]
Just use std::function, you don't have to pass a lambda. Any callable is fine.
CraftingLinks 14 hours ago [-]
"Templated code is a highway to bloat." Should be on a t-shirt.
mandarax8 23 hours ago [-]
What he shows here is 75% of c++26's std::function_ref. It's mainly missing variadic arguments and doesn't support all types of function objects.
Do you understand how your compiler works? Shouldn't you be writing assembly instead? You can't understand all internals and that's perfectly fine.
Why do you even care how std::function is implemented? (Unless you are working in very performance critical or otherwise restricted environments.)
kjksf 22 hours ago [-]
I've listed several reasons why I decided to write and use this implementation:
- better call stacks in crash reports
- smaller and faster at runtime
- faster compilation because less complicated, less templated code
- I understand it
So there's more to it that just that one point.
Did I lose useful attributes? Yes. There's no free lunch.
Am I going too far to achieve small, fast code that compiles quickly? Maybe I do.
My code, my rules, my joy.
But philosophically, if you ever wonder why most software today can't start up instantly and ships 100 MB of stuff to show a window: it's because most programmers don't put any thought or effort into keeping things small and fast.
spacechild1 22 hours ago [-]
Oh, I definitely agree with some of your other points, just not the one I argued against.
BTW, I would also contest that your version is faster at runtime. Your data is always allocated on the heap. Depending on the size of the data, std::function can utilize small function optimization and store everything in place. This means there is no allocation when setting the callback and also better cache locality when calling it. Don't make performance claims without benchmarking!
Similarly, the smaller memory footprint is not as clear cut: with small function optimization there might be hardly a difference. In some cases, std::function might even be smaller. (Don't forget about memory allocation overhead!)
The only point I will absolutely give you is compilation times. But even there I'm not sure if std::function is your bottleneck. Have you actually measured?
kjksf 20 hours ago [-]
That's a fair point. I just looked and out of 35 uses of MkFunc0 only about 3 (related to running a thread) allocate the args.
All others use a pointer to an object that exists anyway. For example, I have a class MyWindow with a button. A click callback would have MyWindow* as an argument because that's the data needed to perform that action. That's the case for all UI widgets and they are majority uses of callbacks.
I could try to get cheeky and implement a similar optimization as a Func0Fat where I would have an inline buffer of N bytes and use it as a backing storage for the struct. But see above for why it's not needed.
As to benchmarking: while I don't disagree that benchmarking is useful, it's not the ace card argument you think it is.
I didn't do any benchmarks and I do no plan to.
Because benchmarking takes time, which I could use writing features.
And because I know things.
I know things because I've been programming, learning, benchmarking for 30 years.
I know that using 16 bytes instead of 64 bytes is faster. And I know that likely it won't be captured by a microbenchmark.
And even if it was, the difference would be miniscule.
So you would say "pfft, I told you it was not worth it for a few nanoseconds".
But I know that if I do many optimizations like that, it'll add up even if each individual optimization seems not worth it.
And that's why SumatraPDF can do PDF, ePub, mobi, cbz/cbr and uses less resources than Windows' start menu.
spacechild1 19 hours ago [-]
First, thanks for providing SumataraPDF as free software! I don't want to disparage your software in any way. I don't really care how it's written as long as it works well - and it does! This is really just about your blog post.
> I just looked and out of 35 uses of MkFunc0 only about 3 (related to running a thread) allocate the args.
In that case, std::function wouldn't allocate either.
> All others use a pointer to an object that exists anyway. For example, I have a class MyWindow with a button. A click callback would have MyWindow* as an argument because that's the data needed to perform that action. That's the case for all UI widgets and they are majority uses of callbacks.
That's what I would have guessed. Either way, I would just use std::bind or a little lambda:
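Something like this (a sketch; Button and onClick are placeholders for whatever the real widget API is):
    struct MyWindow {
        Button* button = nullptr;   // assumed: Button::onClick accepts a std::function<void()>
        void onButtonClicked();     // the real code lives in this named member function

        void setup() {
            // the lambda is only glue, so nothing interesting can crash inside it
            button->onClick = [this] { onButtonClicked(); };
        }
    };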
If your app crashes in MyWindow::onButtonClicked, that method would be on the top of the stack trace. IIUC this was your original concern. Most of your other points are just speculation. (The compile time argument technically holds, but I'm not sure to what extent it really shows in practice. Again, I would need some numbers.)
> I know things because I've been programming, learning, benchmarking for 30 years.
Thinking that one "knows things" is dangerous. Things change and what we once learned might have become outdated or even wrong.
> I know that using 16 bytes instead of 64 bytes is faster. And I know that likely it won't be captured by a microbenchmark.
Well, not necessarily. If you don't allocate any capture data, then your solution will win. Otherwise it might actually perform worse. In your blog post, you just claimed that your solution is faster overall, without providing any evidence.
Side note: I'm a bit surprised that std::function takes up 64 bytes in 64-bit MSVC, but I can confirm that it's true! With 64-bit GCC and Clang it's 32 bytes, which I find more reasonable.
> And even if it was, the difference would be miniscule.
That's what I would think as well. Personally, I wouldn't even bother with the performance of a callback function wrapper in a UI application. It just won't make a difference.
> But I know that if I do many optimizations like that, it'll add up even if each individual optimization seems not worth it.
Amdahl's law still holds. You need to optimize the parts that actually matter. It doesn't mean you should be careless, but we need to keep things in perspective. (I would care if this was called hundreds or thousands of times within a few milliseconds, like in a realtime audio application, but this is not the case here.)
To be fair, in your blog post you do concede that std::function has overall better ergonomics, but I still think you are vastly overselling the upsides of your solution.
maleldil 20 hours ago [-]
> You can't understand all internals, and that's perfectly fine.
C++ takes this to another level, though. I'm not an expert Go or Rust programmer, but it's much easier to understand the code in their standard libraries than C++.
spacechild1 18 hours ago [-]
Fair enough :) Unfortunately, this is just something one has to accept as a C++ programmer. Should we roll our own std::vector because we can't understand the standard library implemention? The answer is, of course, a firm "no" (unless you have very special requirements).
Somehow my blog server got overwhelmed and requests started taking tens of seconds. Which is strange because typically it's under 100ms (it's just executing a Go template).
It's not a CPU issue so there must be a locking issue I don't understand.
noomen 23 hours ago [-]
I just want to thank SumatraPDF's creator, he literally saved my sanity from the evil that Adobe Acrobat Reader is. He probably saved millions of people thousands of hours of frustration using Acrobat Reader.
_ZeD_ 11 hours ago [-]
> In programming language lingo, code + data combo is called a closure.
in my day code + data was called a class :)
(yeah, yeah, I know closure and class may be viewed as the same thing, and I know the Qc Na koan)
vanschelven 10 hours ago [-]
For those not in the know:
The venerable master Qc Na was walking with his student, Anton. Hoping to prompt the master into a discussion, Anton said "Master, I have heard that objects are a very good thing - is this true?" Qc Na looked pityingly at his student and replied, "Foolish pupil - objects are merely a poor man's closures."
Chastised, Anton took his leave from his master and returned to his cell, intent on studying closures. He carefully read the entire "Lambda: The Ultimate..." series of papers and its cousins, and implemented a small Scheme interpreter with a closure-based object system. He learned much, and looked forward to informing his master of his progress.
On his next walk with Qc Na, Anton attempted to impress his master by saying "Master, I have diligently studied the matter, and now understand that objects are truly a poor man's closures." Qc Na responded by hitting Anton with his stick, saying "When will you learn? Closures are a poor man's object." At that moment, Anton became enlightened.
Class is a set of functions. Closure is one function.
In old Java, it really was a class. In new Java, I'm not 100% sure anymore, but with verbose syntax it'll be a class. I made it as verbose as possible:
Two, my main objective is extreme simplicity and understandability of the code.
I explicitly gave up features of std::function for smaller code that I actually understand.
fu2 seems to be "std::function but more features".
pif 10 hours ago [-]
> I don’t understand std::function<> implementation.
This is the kind of (maybe brilliant, maybe great, maybe both, surely more than myself) developers I don't like to work with.
You are not required to understand the implementation: you are only required to fully understand the contract. I hate those colleagues who waste my time during reviews because they need to delve deeply into properly-named functions before coming back to the subject at hand.
Implementations are organized at different logical levels for a reason. If you are not able to reason at a fixed level, I don't like to work with you (and I understand you will not like to work with me).
skrebbel 10 hours ago [-]
I'd be more sympathetic to your argument if this was about Python or Java web backends or something like that. But in C++, especially for a program like SumatraPDF with millions of installations on end-user computers where crashes can occur far away from a debugger, it's often borderline impossible to analyse problems without at least somewhat understanding the internals of every library feature you use.
I think avoiding features you don't understand the implementation of makes a lot of sense in those kinds of situations.
The hidden assumption in your comment is that the contract is implemented perfectly and that the abstraction isn't leaky. This isn't always the case. The author explained a concrete way in which the std::function abstraction leaks:
> They get non-descriptive, auto-generated names. When I look at call stack of a crash I can’t map the auto-generated closure name to a function in my code. It makes it harder to read crash reports.
spacechild1 7 hours ago [-]
> The author explained a concrete way in which the std::function abstraction leaks:
But that's not an issue with std::function at all! His comment is really about lambdas and I don't understand why he conflates these two. Just don't put the actual code in a lambda:
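(a minimal sketch; Loader, onDone and MainWindow are made-up names:)
    // the named function is what shows up in the crash report
    void OnFileLoaded(MainWindow* win) { /* actual work */ }

    void setup(Loader& loader, MainWindow* win) {
        // the lambda is only glue around the named function
        loader.onDone = [win] { OnFileLoaded(win); };
    }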
Should have just implemented his own std::function with the simplicity and performance trade-off he wanted.
account42 7 hours ago [-]
"Surely no one will ever need to pass -1 as user data"
Why place undefined behavior traps like that in your code.
commandersaki 22 hours ago [-]
I always love this author's writing style, his articles are pure bliss to read.
agent327 12 hours ago [-]
It's unfortunate that the author hasn't spent some time figuring out how to get a stack trace, that would have saved him from reinventing std::function badly.
Also, sad to see people still using new. C++11 was 14 years ago, for crying out loud...
pjmlp 23 hours ago [-]
Another example of NIH, better served by using the standard library.
kjksf 19 hours ago [-]
Programming requires making fine-grained implementation decisions.
There are numerous differences between my Func0 and Func1<T> and std::function<>.
Runtime size, runtime performance, compilation speed, understandability of the code, size of the source code, size of the generated code, ergonomics of use.
My solution wins on everything except ergonomics of use.
LLVM has a small vector class.
When asked for comment, pjmlp said: "Another example of NIH, better served by using the standard library".
pjmlp 14 hours ago [-]
First of all there is the whole question of how much performance that actually gains in practice, when everyone and their dog are shipping Electron apps.
Secondly, the maintainability of duplicating standard library code, without having the same resources as the compiler vendor.
It is your product, naturally you don't have to listen to folks like myself.
AlexandrB 3 hours ago [-]
I think context is important. In a large corporate context, NIH kills you because everything you implement needs to be documented, debugged, and understood by 10s or 100s of other people. In a small or one-man project, a lot of the NIH downsides go away and it makes (some) sense to reinvent the wheel if there are performance or simplicity benefits to be had. Consider Roller Coaster Tycoon as an example of the latter - where the author wrote everything in asm out of personal preference and for performance reason instead of using C and its libraries.
I'm surprised by how many people on HN are yelling at the author to code as if he's working at a company like Adobe, when objectively Adobe's PDF reader is dogshit (especially performance wise) for most people and is probably built on best practices like using standard libraries.
mdaniel 1 days ago [-]
SumatraPDF is outstanding software. But I'm actually surprised to hear that it seems to be written in C++ ... I dunno, kind of like "by default?" And a blog post hand rolling callback functions using structs and a bunch of pointers seems to double down on: are you sure this language is getting you where you want to go?
kjksf 1 days ago [-]
As opposed to?
Today, if I was starting from scratch, I would try zig or odin or maybe even Go.
But SumatraPDF started 15 years ago. There was nothing but C++. And a much different C++ than the C++ of today.
Plus, while my own code is over 100k lines, external C / C++ libraries are multiple of that so (easy) integration with C / C++ code is a must.
mdaniel 24 hours ago [-]
I didn't know how to correctly package my comment as not criticizing, and that's half of why I opened with "is outstanding software." I genuinely believe that, I'm deeply grateful for you releasing SumatraPDF into the world, and it makes my life better. Truly, I am thankful
I hear you about "back in my day," but as best I can tell it's just your project (that is, not a whole team of 50 engineers who have to collaborate on the codebase), so you are the audience being done a disservice by continuing to battle a language that hates you
As for the interop, yes, since the 70s any language that can't call into a C library is probably DoA but that list isn't the empty set, as you pointed out with the ones you've actually considered. I'd even suspect if you tried Golang it may even bring SumatraPDF to other platforms which would be another huge benefit to your users
kjksf 24 hours ago [-]
Don't worry about being nice, 15 years doing open source develops a thick skin.
But you didn't respond: which language should I use?
Go didn't exist when I started SumatraPDF.
And while I write pretty much everything else in Go and love the productivity, it wouldn't be a good fit.
A big reason people like Sumatra is that it's fast and small. 10 MB (of which majority are fonts embedded in the binary) and not 100MB+ of other apps.
Go's "hello world" is 10 MB.
Plus abysmal (on Windows) interop with C / C++ code.
And the reason SumatraPDF is unportable to mac / linux is not the language but the fact that I use all the Windows API I can for the UI.
Any cross-platform UI solution pretty much requires using tens of megabytes of someone else's reimplementation of all the UI widgets (Qt, GTK, Flutter) or re-implementing a smaller subset of UI using less code.
sitzkrieg 24 hours ago [-]
sumatrapdf not being cross platform is a great feature, maximizing the use of intended platform. win32api is great. thank you for that
nashashmi 14 hours ago [-]
Yes, I now seem to enjoy win32api as it is very snappy, compared to the hell that other cross platform solutions are introducing.
nashashmi 14 hours ago [-]
I would have recommended Go but if it increases file size, please don't. Anyways, it is very fast and quick so keep it the way it is. I can't find a reason to change it.
It can even do comments. But I would like to see more comment tools, especially measurement tools.
And since you are using Windows, do you think it would be worthwhile to add Windows OCR?
mhd 24 hours ago [-]
> I'd even suspect if you tried Golang it may even bring SumatraPDF to other platforms which would be another huge benefit to your users
Probably by using a cross-platform toolkit written in C++.
CyberDildonics 5 hours ago [-]
being done a disservice by continuing to battle a language that hates you
I think you should learn basic modern C++ before making judgements like this.
While you are upset about it the rest of the world is just using basic data structures, looping through them, never dealing with garbage collection and enjoying fast and light software that can run on any hardware.
A lot of people could get by with $100 computers if all software was written like sumatraPDF.
> One thing you need to know about me is that despite working on SumatraPDF C++ code base for 16 years, I don’t know 80% of C++.
I'm pretty sure that most "why don't you just use x…" questions are implicitly answered by it, with the answer being "because using x correctly requires learning about all of it's intricacies and edge-cases, which in turn requires understanding related features q, r, s… all the way to z, because C++ edge-case complexity doesn't exist in a vacuum".
> Even I can’t answer every question about C++ without reference to supporting material (e.g. my own books, online documentation, or the standard). I’m sure that if I tried to keep all of that information in my head, I’d become a worse programmer.
-- Bjarne Stroustrup, creator of C++
But he can also contradict himself sometimes in this regard, because he also often uses a variation of calling C++ a language for "people who know what they are doing" as a sort of catch-all dismissal of critiques of its footguns.
The whole problem is that very few people can claim to truly "know what they are doing" when it comes to all of C++' features and how they interconnect, dismissing that by (implicitly) telling people to just "git gud" is missing the point a bit.
But again, he's only human and I do get the urge to get a bit defensive of your baby.
I think that has the same benefit as this, that the callbacks are all very clearly named and therefore easy to pick out of a stack trace.
(In fact, it seems like a missed opportunity that modern Java lambdas, which are simply syntactical sugar around the same single-method interface, do not seem to use the interface name in the autogenerated class)
How does that work with variables in the closure then? I could see that work with the autogenerated class: Just make a class field for every variable referenced inside the lambda function body, and assign those in constructor. Pretty similar to this here article. But it's not immediately obvious to me how private static methods can be used to do the same, except for callbacks that do not form a closure (eg filter predicates and sort compare functions and the likes that only use the function parameters).
AFAIK MSVC also changed their lambda ABI once, including mangling. As I recall at one point it even produced some hash in the decorated/mangled name, with no way to revert it, but that was before /Zc:lambda (enabled by default from C++20).
I would argue that a pdf reader is much simpler than multiple very popular webpages nowadays.
Sumatra excels at read-only. Usually anything to do with PDF is synonymous with slow, bloat, buggy, but Sumatra at just 10Mbytes, managed to feel snappy, fast like a win32 native UI.
OK. Now load 100 PDF's. You will need a dedicated PDF reader unless you don't mind wasting a truckload of RAM. Also, browser PDF readers are generally slower and are not optimal at search/bookmarks/navigation/etc.
For me, having a separate dedicated app isn't worth it for the benefits you mention, which to me are minor compared to having to install and manage another thing (which, to be fair, I imagine Sumatra to be a very pleasant citizen at compared to Acrobat).
I just tried a few PDFs in Chromium and PDFium seems to be much better than pdf.js - faster and handles forms more smoothly.
So one can expect zero day exists and are exploited.
That may not be a feature for you, but it is for attackers.
PDF was originally a display-only format.
In fact you have gotten it backwards. The obviously dynamic features in PDF like JavaScript are designed to be dynamic so they receive so much more attention in security. So smart attackers attack the not-obviously-dynamic features in PDF.
For example, it doesn't support JavaScript. And it doesn't support GoToE.
The text features, both strings and fonts, get sent through HarfBuzz for sanitisation.
How is it not sandboxed?
https://gcc.godbolt.org/z/EaPqKfvne
You could get around this by using a wrapper function, at the cost of a slightly different interface:
(This approach also requires explicitly writing the argument type. It's possible to remove the need for this, but not without the kind of complexity you're trying to avoid.)The one thing the author's solution does which this solution (and lambdas) does not is type erasure: if you want to pass that closure around, you have to use templates, and you can't store different lambdas in the same data structure even if they have the same signature.
You could solve that in your case by making `void operator()` virtual and inheriting (though that means you have to heap-allocate all your lambdas), or use `std::function<>`, which is a generic solution to this problem (which may or may not allocate, if the lambda is small enough, it's usually optimized to be stored inline).
I get where the author is coming from, but this seems very much like an inferior solution to just using `std::function<>`.
I think whether or not you have to allocate from the heap depends on the lifetime of the lambda. Virtual methods also work just fine on stack-allocated objects.
But yes, fair point: they can be stack or statically allocated as well.
> OnListItemSelectedData data;
In this case you can just store the data as member variables. No need for defining an extra class just for the data.
As I've written elsewhere, you can also just use a lambda and forward the captures and arguments to a (member) function. Or if you're old-school, use std::bind.
I haven’t used windows in a long time but back in the day I remember installing SumatraPDF to my Pentium 3 system running windows XP and that shit rocked
In this instance the maintainer of a useful piece of software has made a choice that's a little less common in C++ (totally standard practice in C) and it seems fine. It's on the bubble; I'd probably default the other way. But std::function is complex and there are platforms where that kind of machine economy is a real consideration, so why not?
In a zillion contributor project I'd be a little more skeptical of the call, but even on massive projects like the Linux kernel they make decisions about the house style that seem unorthodox to outsiders and they have their reasons for doing so. I misplaced the link but a kernel maintainer raised grep-friendliness as a reason he didn't want a patch. At first I was like, nah you're not saying the real reason, but I looked a little and indeed, the new stuff would be harder to navigate without a super well-configured LSP.
Longtime maintainers have reasons they do things a certain way, and the real test is the result. In this instance (and in most) I think the maintainer seems to know what's best for their project.
Smaller size at runtime (uses less memory).
Smaller generated code.
Faster at runtime.
Faster compilation times.
Smaller implementation.
Implementation that you can understand.
How is it worse?
std::function + lambda with variable capture has better ergonomics, i.e. less typing.
Also I copy pasted the code from the post and I got this:
test.cpp:70:14: error: assigning to 'void *' from 'func0Ptr' (aka 'void (*)(void *)') converts between void pointer and function pointer
   70 |     res.fn = (func0Ptr)fn;
It works in MSVC, but as someone pointed out, it was a typo and it was meant to be a (void*) cast.
This warning is stupid. It's part of the "we reserve the right to change the size of function pointers some day so that we can haz closures, so you can't assume that function pointers and data pointers are the same size m'kay?" silliness. And it is silly: because the C and C++ committees will never be able to change the size of function pointers, not backwards-compatibly. It's not that I don't wish they could. It's that they can't.
I also believe there are platforms where a function pointer and a data pointer are not the same size, but I don't know such esoteric platforms first-hand (it seems Itanium had that: https://stackoverflow.com/questions/36645660/why-cant-i-cast...)
Though my point was only that this code will not compile as is with whatever clang Apple ships*
I am not really sure how to get it to compile tbqh
Some further research (https://www.kdab.com/how-to-cast-a-function-pointer-to-a-voi...) suggests it should be done like so:
> auto fptr = &f; void* a = reinterpret_cast<void*&>(fptr);
edit: I tried with GCC 15 and that compiled successfully
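For reference, a small self-contained version of that approach (my own reconstruction of the KDAB suggestion, so treat it as a sketch rather than blessed code):

    #include <cstdio>

    static void f(int x) { std::printf("%d\n", x); }

    int main() {
        // A plain cast from a function pointer to void* is only
        // conditionally supported; casting through a reference to the
        // pointer object is the workaround the KDAB article describes.
        void (*fptr)(int) = &f;
        void* a = reinterpret_cast<void*&>(fptr);

        // Round-trip back to a function pointer before calling.
        auto back = reinterpret_cast<void (*&)(int)>(a);
        back(42);
    }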
Separately from that, if you enable -Wpedantic, you can get a warning for conversions between function and data pointers even if they do use an explicit cast, but that's not the default.
My unanswered question on this from 8 years ago:
https://stackoverflow.com/questions/41385439/named-c-lambdas...
If there were a way to name lambdas for debug purposes, then all the other downsides would be irrelevant (for most typical callback use cases).
Instead of fully avoiding lambdas, you can use inheritance to give them a name: https://godbolt.org/z/YTMo6ed8T
Sadly that'll only work for captureless lambdas, however.
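In case the link rots, here is my guess at the technique (a hedged reconstruction, relying on C++20's ability to write a lambda in an unevaluated context; it may not be exactly what the godbolt shows):

    #include <cstdio>

    // Deriving from a lambda type gives the closure a real name that shows
    // up in symbols and diagnostics. Default-constructing the derived type
    // only works because captureless lambdas are default-constructible in
    // C++20, hence the captureless-only limitation.
    struct OnButtonClicked : decltype([](int x) { std::printf("clicked %d\n", x); }) {};

    int main() {
        OnButtonClicked cb;
        cb(3); // prints "clicked 3"
    }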
How much smaller is it? Does it reduce the binary size and RAM usage by just 100 bytes?
Is it actually faster?
How much faster does it compile? 2ms faster?
I wrote it to share my implementation and my experience with it.
SumatraPDF compiles fast (relative to other C++ software) and is smaller, faster, and uses fewer resources than other software.
Is it because I wrote Func0 and Func1 to replace std::function? No.
Is it because I made hundreds of decisions like that? Yes.
You're not wrong that the performance wins are minuscule.
What you don't understand is that eternal vigilance is the price of liberty. And small, fast software.
Proof needed. Perhaps your overall program is designed to be fast and avoid silly bottlenecks, and these "hundred decisions" didn't really matter at all.
Silly bottlenecks are half of the perf story in my experience. The other half are a billion tiny details.
Yours is smaller (in terms of sizeof) because std::function employs a small-buffer optimization (SBO): if the user data fits into a specific size, it's stored inline in the std::function instead of getting heap-allocated. Yours needs a heap allocation for the ones that take data.
Whether yours wins or loses on memory usage depends heavily on your typical closure sizes.
> Faster at runtime
Benchmark, please.
What I'm trying to say is being better than x means you can do all the same things as x better. Your thing is not better, it is just different.
In danger of pointing out the obvious: std::function does not require lambdas. In fact, it has existed since long before lambdas were introduced. If you want to avoid lambdas, just use std::bind to bind arguments to regular member functions or free functions. Or pass a lambda that just forwards the captures and arguments to the actual (member) function. There is no reason for regressing to C-style callback functions with user data.
There are 2 aspects to this: programmer ergonomics and everything else (size of code, speed of code, compilation speed, understandability).
Lambdas with variable capture converted to std::function have best ergonomics but at the cost of unnamed, compiler-generated functions that make crash reports hard to read.
My Func0 and Func1<T> approach has similar ergonomics to std::bind. Neither has the problem of potentially crashing in unnamed function but Func0/Func1<T> are better at other (smaller code, faster code, faster compilation).
It's about tradeoffs. I loved the ergonomics of callbacks in C#, but working within the limitations of C++, I'm trying to find solutions with the attributes important to me.
I would really question your assumptions about code size, memory usage and runtime performance. See my other comments.
Both std::function<> and lambdas were introduced in C++11.
Furthermore absolutely no one should use std::bind, it's an absolute abomination.
> Furthermore absolutely no one should use std::bind, it's an absolute abomination.
Agree 100%! I almost always use a wrapper lambda.
However, it's worth pointing out that C++20 gave us std::bind_front(), which is really useful if you want to just bind the first N arguments:
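Roughly like this (the Button-less example and names are mine, purely for illustration):

    #include <cstdio>
    #include <functional>

    struct MyWindow {
        void onListItemSelected(int idx) { std::printf("selected %d\n", idx); }
    };

    int main() {
        MyWindow w;
        // bind_front fixes the leading arguments (here the object pointer);
        // the remaining ones are supplied at call time.
        auto cb = std::bind_front(&MyWindow::onListItemSelected, &w);
        cb(7); // prints "selected 7"
    }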
I think std::bind_front() is the clear winner here.

A simple example: if you were to bind a function pointer in one stack frame and then immediately return it to the parent stack frame, which then invokes that bound pointer, the stack frame that did the binding would literally not exist anymore.
From what I know about C this code probably breaks on platforms that nobody uses.
Thanks for Sumatra, by the way :D Very useful software!
The problem with that is that for every type of callback you need to create a base class, and then a derived class for every unique use.
That's a lot of classes to write.
Consider this (from memory so please ignore syntax errors, if any):
compared to:

I would have to create a base class for every unique type of callback and then, for every caller, possibly a new derived class. This is replaced by Func0 or Func1<T>: no new classes, much less typing. And less typing is better programming ergonomics.
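Since the code examples didn't survive the formatting here, a rough reconstruction of the contrast (my own guess at the shapes involved, not the actual SumatraPDF definitions):

    #include <cstdio>

    // Interface style: one abstract base class per callback type, plus a
    // derived class per use site.
    struct OnListItemSelectedHandler {
        virtual void onListItemSelected(int idx) = 0;
        virtual ~OnListItemSelectedHandler() = default;
    };

    struct MyWindowHandler : OnListItemSelectedHandler {
        void onListItemSelected(int idx) override { std::printf("selected %d\n", idx); }
    };

    // Func1-style: a single generic struct holding a named function pointer
    // plus a data pointer, reused for every callback.
    template <typename T>
    struct Func1 {
        void (*fn)(T*) = nullptr;
        T* data = nullptr;
        void Call() const { fn(data); }
    };

    struct MyWindow { int selected = -1; };
    static void OnListItemSelected(MyWindow* w) { w->selected = 0; }

    int main() {
        MyWindowHandler h;
        h.onListItemSelected(2);

        MyWindow w;
        Func1<MyWindow> cb{&OnListItemSelected, &w};
        cb.Call();
    }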
std::function arguably has slightly better ergonomics but a higher cost on the other 3 dimensions (runtime, compilation time, understandability).
In retrospect Func0 and Func1 seem trivial, but it took me years of trying other approaches to arrive at the insight needed to create them.
An interface declaration is, like, two lines. And a single receiver can implement multiple interfaces. In exchange, the debugger gets a lot more useful. Plus it ensures the lifetime of the "callback" and the "context" are tightly-coupled, so you don't have to worry about intersecting use-after-frees.
Well, HN lazyweb, how do you override the stupid name in C++? In other languages this is possible:
"I don't understand it, so surely it must be very difficult and probably nobody understands it"
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-...
I'm with OP here.
But I do know the code I write and you're wrong about performance of Func0 and Func1. Those are 2 machine words and all it takes to construct them or copy them is to set those 2 fields.
There's just no way to make it faster than that, both at runtime or at compile time.
The whole point of this implementation was giving up fancy features of std::function in exchange for code that is small, fast (both runtime and at compilation time) and one that I 100% understand in a way I'll never understand std::function.
You also have to heap allocate your userData, which is something std::function<> avoids (in all standard implementations) if it’s small enough (this is why the sizeof() of std::function is larger than 16 bytes, so that it can optionally store the data inline, similar to the small string optimization). The cost of that heap allocation is not insignificant.
If I were doing this, I might just go the full C route and use function pointers with an extra "userData" argument. This seems like an awkward "middle ground" between C and C++.
https://github.com/TartanLlama/function_ref/blob/master/incl...
I can't even read it.
That's the fundamental problem with C++: I've understood pretty much all Go code I ever looked at.
Code like the above is so obtuse that 0.001% of C++ programmers are capable of writing it and 0.01% are capable of understanding it.
Sure, I can treat it as magic but I would rather not.
The main things you would need to understand are specialization (think pattern matching, but at compile time) and pack expansion (the three dots).
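A tiny illustration of those two features in the same spirit as a function_ref (this is my own stripped-down sketch, not the implementation linked above): a primary template, a partial specialization that "pattern matches" a function signature, and a parameter pack expanded with "...".

    #include <cstdio>
    #include <utility>

    template <typename Signature>
    struct FuncRef; // primary template, never defined

    template <typename R, typename... Args>
    struct FuncRef<R(Args...)> { // specialization matches "R(Args...)"
        void* obj;
        R (*call)(void*, Args...);

        template <typename F>
        FuncRef(F& f)
            : obj(&f),
              // captureless lambda converts to a plain function pointer
              call([](void* o, Args... args) -> R {
                  return (*static_cast<F*>(o))(std::forward<Args>(args)...);
              }) {}

        R operator()(Args... args) const {
            return call(obj, std::forward<Args>(args)...);
        }
    };

    int main() {
        auto add = [](int a, int b) { return a + b; };
        FuncRef<int(int, int)> fr(add);
        std::printf("%d\n", fr(2, 3)); // prints 5
    }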
https://llvm.org/doxygen/STLFunctionalExtras_8h_source.html
Why do you even care how std::function is implemented? (Unless you are working in very performance critical or otherwise restricted environments.)
Did I lose useful attributes? Yes. There's no free lunch.
Am I going too far to achieve small, fast code that compiles quickly? Maybe I am.
My code, my rules, my joy.
But philosophically, if you ever wonder why most software today can't start up instantly and ships 100 MB of stuff to show a window: it's because most programmers don't put any thought or effort into keeping things small and fast.
BTW, I would also contest that your version is faster at runtime. Your data is always allocated on the heap. Depending on the size of the data, std::function can utilize small-function optimization and store everything in place. This means there is no allocation when setting the callback and also better cache locality when calling it. Don't make performance claims without benchmarking!
Similarly, the smaller memory footprint is not as clear cut: with small function optimization there might be hardly a difference. In some cases, std::function might even be smaller. (Don't forget about memory allocation overhead!)
The only point I will absolutely give you is compilation times. But even there I'm not sure if std::function is your bottleneck. Have you actually measured?
All others use a pointer to an object that exists anyway. For example, I have a class MyWindow with a button. A click callback would have MyWindow* as an argument because that's the data needed to perform that action. That's the case for all UI widgets and they are majority uses of callbacks.
I could try to get cheeky and implement a similar optimization as a Func0Fat, where I would have an inline buffer of N bytes and use it as the backing storage for the struct. But see above for why it's not needed.
As to benchmarking: while I don't disagree that benchmarking is useful, it's not the ace card argument you think it is.
I didn't do any benchmarks and I do no plan to.
Because benchmarking takes time, which I could spend writing features.
And because I know things.
I know things because I've been programming, learning, benchmarking for 30 years.
I know that using 16 bytes instead of 64 bytes is faster. And I know that likely it won't be captured by a microbenchmark.
And even if it were, the difference would be minuscule.
So you would say "pfft, I told you it was not worth it for a few nanoseconds".
But I know that if I do many optimizations like that, it'll add up even if each individual optimization seems not worth it.
And that's why SumatraPDF can do PDF, ePub, mobi, cbz/cbr and uses fewer resources than Windows' Start menu.
> I just looked and out of 35 uses of MkFunc0 only about 3 (related to running a thread) allocate the args.
In that case, std::function wouldn't allocate either.
> All others use a pointer to an object that exists anyway. For example, I have a class MyWindow with a button. A click callback would have MyWindow* as an argument because that's the data needed to perform that action. That's the case for all UI widgets and they are majority uses of callbacks.
That's what I would have guessed. Either way, I would just use std::bind or a little lambda:
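Something along these lines (illustrative only; Button and MyWindow are stand-ins, not the actual SumatraPDF types):

    #include <cstdio>
    #include <functional>

    struct Button {
        std::function<void()> onClick;
    };

    struct MyWindow {
        // The real work lives in a named member function, so a crash here
        // shows MyWindow::onButtonClicked in the stack trace.
        void onButtonClicked() { std::printf("clicked\n"); }

        void wire(Button& b) {
            // The lambda only forwards to the named method...
            b.onClick = [this] { onButtonClicked(); };
            // ...or, old-school:
            // b.onClick = std::bind(&MyWindow::onButtonClicked, this);
        }
    };

    int main() {
        Button b;
        MyWindow w;
        w.wire(b);
        b.onClick();
    }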
If your app crashes in MyWindow::onButtonClicked, that method would be at the top of the stack trace. IIUC this was your original concern. Most of your other points are just speculation. (The compile-time argument technically holds, but I'm not sure to what extent it really shows in practice. Again, I would need some numbers.)

> I know things because I've been programming, learning, benchmarking for 30 years.
Thinking that one "knows things" is dangerous. Things change and what we once learned might have become outdated or even wrong.
> I know that using 16 bytes instead of 64 bytes is faster. And I know that likely it won't be captured by a microbenchmark.
Well, not necessarily. If you don't allocate any capture data, then your solution will win. Otherwise it might actually perform worse. In your blog post, you just claimed that your solution is faster overall, without providing any evidence.
Side note: I'm a bit surprised that std::function takes up 64 bytes in 64-bit MSVC, but I can confirm that it's true! With 64-bit GCC and Clang it's 32 bytes, which I find more reasonable.
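It's easy to check on any given toolchain; the exact numbers depend on the standard library, as noted above. A quick snippet:

    #include <cstdio>
    #include <functional>

    int main() {
        // MSVC reports 64 here, libstdc++/libc++ report 32; the space
        // beyond two raw pointers is the small buffer used to store small
        // callables inline.
        std::printf("sizeof(std::function<void()>) = %zu\n",
                    sizeof(std::function<void()>));
        std::printf("two raw pointers = %zu\n",
                    sizeof(void (*)(void*)) + sizeof(void*));
    }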
> And even if it was, the difference would be miniscule.
That's what I would think as well. Personally, I wouldn't even bother with the performance of a callback function wrapper in a UI application. It just won't make a difference.
> But I know that if I do many optimizations like that, it'll add up even if each individual optimization seems not worth it.
Amdahl's law still holds. You need to optimize the parts that actually matter. It doesn't mean you should be careless, but we need to keep things in perspective. (I would care if this was called hundreds or thousands of times within a few milliseconds, like in a realtime audio application, but this is not the case here.)
To be fair, in your blog post you do concede that std::function has overall better ergonomics, but I still think you are vastly overselling the upsides of your solution.
C++ takes this to another level, though. I'm not an expert Go or Rust programmer, but it's much easier to understand the code in their standard libraries than C++.
Somehow my blog server got overwhelmed and requests started taking tens of seconds. Which is strange because typically it's under 100ms (it's just executing a Go template).
It's not a CPU issue, so there must be a locking issue I don't understand.
in my day code + data was called a class :)
(yeah, yeah, I know closure and class may be viewed as the same thing, and I know the Qc Na koan)
The venerable master Qc Na was walking with his student, Anton. Hoping to prompt the master into a discussion, Anton said "Master, I have heard that objects are a very good thing - is this true?" Qc Na looked pityingly at his student and replied, "Foolish pupil - objects are merely a poor man's closures."
Chastised, Anton took his leave from his master and returned to his cell, intent on studying closures. He carefully read the entire "Lambda: The Ultimate..." series of papers and its cousins, and implemented a small Scheme interpreter with a closure-based object system. He learned much, and looked forward to informing his master of his progress.
On his next walk with Qc Na, Anton attempted to impress his master by saying "Master, I have diligently studied the matter, and now understand that objects are truly a poor man's closures." Qc Na responded by hitting Anton with his stick, saying "When will you learn? Closures are a poor man's object." At that moment, Anton became enlightened.
https://people.csail.mit.edu/gregs/ll1-discuss-archive-html/...
In old Java, it really was a class. In new Java, I'm not 100% sure anymore, but with verbose syntax it'll be a class. I made it as verbose as possible:
and a bit less verbose with modern Java:

So a closure could definitely be considered a very simple class.

Two, my main objective is extreme simplicity and understandability of the code.
I explicitly gave up features of std::function for smaller code that I actually understand.
fu2 seems to be "std::function but more features".
This is the kind of developer (maybe brilliant, maybe great, maybe both, surely more so than myself) I don't like to work with.
You are not required to understand the implementation: you are only required to fully understand the contract. I hate those colleagues who waste my time during reviews because they need to delve deeply into properly-named functions before coming back to the subject at hand.
Implementations are organized at different logical levels for a reason. If you are not able to reason at a fixed level, I don't like to work with you (and I understand you will not like to work with me).
I think avoiding features you don't understand the implementation of makes a lot of sense in those kinds of situations.
The hidden assumption in your comment is that the contract is implemented perfectly and that the abstraction isn't leaky. This isn't always the case. The author explained a concrete way in which the std::function abstraction leaks:
> They get non-descriptive, auto-generated names. When I look at call stack of a crash I can’t map the auto-generated closure name to a function in my code. It makes it harder to read crash reports.
But that's not an issue with std::function at all! His comment is really about lambdas and I don't understand why he conflates these two. Just don't put the actual code in a lambda:
Here the code is in a method and the lambda is only used to bind an object to the method. If you are old-school, you can also do this with std::bind:

Why place undefined behavior traps like that in your code?
Also, sad to see people still using new. C++11 was 14 years ago, for crying out loud...
There are numerous differences between my Func0 and Func1<T> and std::function<>.
Runtime size, runtime performance, compilation speed, understandability of the code, size of the source code, size of the generated code, ergonomics of use.
My solution wins on everything except ergonomics of use.
LLVM has a small vector class.
When asked for comment, pjmlp said: "Another example of NIH, better served by using the standard library".
Secondly, there's the maintainability cost of duplicating standard library code without having the same resources as the compiler vendor.
It is your product, naturally you don't have to listen to folks like myself.
I'm surprised by how many people on HN are yelling at the author to code as if he's working at a company like Adobe, when objectively Adobe's PDF reader is dogshit (especially performance wise) for most people and is probably built on best practices like using standard libraries.
Today, if I was starting from scratch, I would try zig or odin or maybe even Go.
But SumatraPDF started 15 years ago. There was nothing but C++. And a much different C++ than the C++ of today.
Plus, while my own code is over 100k lines, external C / C++ libraries are multiple of that so (easy) integration with C / C++ code is a must.
I hear you about "back in my day," but since, as best I can tell, it's just your project (that is, not a whole team of 50 engineers who have to collaborate on the codebase), you are the audience being done a disservice by continuing to battle a language that hates you.
As for the interop, yes, since the 70s any language that can't call into a C library is probably DoA, but that list isn't the empty set, as you pointed out with the ones you've actually considered. I'd even suspect that if you tried Golang it might bring SumatraPDF to other platforms, which would be another huge benefit to your users.
But you didn't respond: which language should I use?
Go didn't exist when I started SumatraPDF.
And while I write pretty much everything else in Go and love the productivity, it wouldn't be a good fit.
A big reason people like Sumatra is that it's fast and small: 10 MB (of which the majority is fonts embedded in the binary) and not 100 MB+ like other apps.
Go's "hello world" is 10 MB.
Plus abysmal (on Windows) interop with C / C++ code.
And the reason SumatraPDF is unportable to mac / linux is not the language but the fact that I use all the Windows API I can for the UI.
Any cross-platform UI solution pretty much requires using tens of megabytes of someone else's reimplementation of all the UI widgets (Qt, GTK, Flutter) or re-implementing a smaller subset of the UI with less code.
It can even do comments. But I would like to see more comment tools, especially measurement tools.
And since you are using Windows, do you think it would be worthwhile to add Windows OCR?
Probably by using a cross-platform toolkit written in C++.
I think you should learn basic modern C++ before making judgements like this.
While you are upset about it, the rest of the world is just using basic data structures, looping through them, never dealing with garbage collection, and enjoying fast and light software that can run on any hardware.
A lot of people could get by with $100 computers if all software was written like SumatraPDF.