Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arrow.h brings lots of C++ to the c_api.h header #6839

Open
Ten0 opened this issue Feb 20, 2025 · 3 comments
Open

arrow.h brings lots of C++ to the c_api.h header #6839

Ten0 opened this issue Feb 20, 2025 · 3 comments
Assignees
Labels

Comments

@Ten0
Copy link
Contributor

Ten0 commented Feb 20, 2025

Description

I'm trying to update lightgbm following the vulnerability report.
It turns out that this is made very hard by the fact that c_api.h now includes tons of C++ includes.
For context, I'm using bindgen to bind to lightgbm from Rust code, an up-to-date fork of https://crates.io/crates/lightgbm.

Since Arrow support was added to the C package, tons of C++ gets brought in transitively by c_api.h, whereas before it did only contain actual C stuff:
3405ee8#diff-0b41a7775380c6d2b3321f189fe3fd3412c6621c4075ce00067b74f9312f38efR16

Brings in:

#include <algorithm>
#include <cstdint>
#include <functional>
#include <iterator>
#include <limits>
#include <memory>
#include <utility>
#include <vector>
#include <stdexcept>

Which unlike in c_api.cpp, isn't guarded by #ifdef __cplusplus:

#ifdef __cplusplus
#include <cstdint>
#include <cstdio>
#include <cstring>
#else
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#endif

This makes tons of C++ leak to the include, unlike what c_api.h managed to abstract before, actually providing a pure C API only to lightgbm.

This seems to make bindgen fail miserably at generating the bindings, because it's trying to generate bindings to what is largely C++ code, instead of just lightgbm. (Also that makes the generated file 15k lines instead of 2.7k lines, besides the fact that said file doesn't compile.)

I imagine it should be possible to properly isolate the arrow types that are relevant in C in arrow.h without leaking a large part of the C++ standard library into lightgbm's c_api.h, as c_api.h used to be able to abstract lightgbm itself (which is also written in C++) behind a such clean interface.

Environment info

LightGBM version or commit hash: 4.6.0

@jameslamb jameslamb added the bug label Feb 20, 2025
Ten0 added a commit to Ten0/lightgbm-rs that referenced this issue Feb 20, 2025
@jameslamb
Copy link
Collaborator

Thanks very much for the excellent report, and very sorry for the disruption! We really do want it to be possible to link against LightGBM using only a C compiler and C data types.

We unfortunately don't test that (as you've noticed 😬 ), so issues like what you've reported can slip through. There's this old tracking issue describing adding such testing: #4609

We just never got around to implementing it.

I'd really like to return to that and help with this. I can set up the testing piece, but before that... would you be willing to submit a patch here that adds the include guards you're talking about?

@Ten0
Copy link
Contributor Author

Ten0 commented Feb 20, 2025

Thanks for your very quick answer, and happy to hear that this is something that is meant to be fixed 🚀

As it turns out I managed to find a hack to make the Rust library compile, so I'd like to correct the statement I made before that it was "very hard" to bind to, at least from a Rust standpoint: Ten0/lightgbm-rs@243322e didn't take hours to find. (Of course I'd rather this was fixed and I could use the actual c_api.h instead 😅)

would you be willing to submit a patch here that adds the include guards you're talking about?

I'm not super clear on how these should be built: the arrow.h file seems to contain class definitions explicitly unconditionally on whether we're in C++ mode as well, and as I'm not a C/C++ expert, re-evaluating what should be done here with includes at the boundary of C and C++ seems to be beyond the scope of what I would be able to do efficiently. 🫤

Ten0 added a commit to Ten0/lightgbm-rs that referenced this issue Feb 20, 2025
@jameslamb
Copy link
Collaborator

No problem at all! Thanks for sharing the link to that patch, I'm glad you have a workaround for the latest LightGBM release.

I'll try to make some progress on this, at least setting up the type of test that would have caught this, and maybe others who contribute here will be able to help.

When we have something that we feel is working, I'll @ you here asking if you'd test it in lightgbm-rs.

@jameslamb jameslamb self-assigned this Feb 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants