BB_C@programming.devto
Rust@programming.dev•Please share your opinions on my learning code parquet2csv
7·4 days ago
Let’s do this incrementally, shall we?
First, let’s make `get_files_in_dir()` idiomatic. We will get back to errors later.

```rust
fn get_files_in_dir(dir: &str) -> Option<Vec<PathBuf>> {
    fs::read_dir(dir)
        .ok()?
        .map(|res| res.map(|e| e.path()))
        .collect::<Result<Vec<_>, _>>()
        .ok()
}
```
Now, in `read_parquet_dir()`, if the unwraps stem from confidence that we will never get errors, then we can confidently ignore them (we will get back to the errors later).

```rust
fn read_parquet_dir(entries: &Vec<String>) -> impl Iterator<Item = record::Row> {
    // ignore all errors
    entries.iter()
        .cloned()
        .filter_map(|p| SerializedFileReader::try_from(p).ok())
        .flat_map(|r| r.into_iter())
        .filter_map(|r| r.ok())
}
```
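To see the "ignore all errors" pattern in isolation, here is a minimal std-only sketch. The function name `parse_ignoring_errors` and the use of integer parsing as a stand-in for opening and reading parquet files are assumptions made purely for illustration.

```rust
// Hypothetical stand-in: parsing integers plays the role of a fallible read.
// `filter_map` + `.ok()` silently drops every `Err` and keeps the `Ok` values.
fn parse_ignoring_errors<'a>(inputs: &'a [&'a str]) -> impl Iterator<Item = i32> + 'a {
    inputs.iter().filter_map(|s| s.parse::<i32>().ok())
}
```

Calling `parse_ignoring_errors(&["1", "x", "3"])` yields only the values that parsed. The failed item disappears without a trace, which is exactly why this is only appropriate when you are confident errors cannot happen (or genuinely don't matter).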
Now, let’s go back to `get_files_in_dir()`, and not ignore errors.

```rust
fn get_files_in_dir(dir: &str) -> Result<Vec<PathBuf>, io::Error> {
    fs::read_dir(dir)?
        .map(|res| res.map(|e| e.path()))
        .collect::<Result<Vec<_>, _>>()
}
```

```diff
 fn main() -> Result<(), io::Error> {
     let args = Args::parse();
-    let entries = match get_files_in_dir(&args.dir)
-    {
-        Some(entries) => entries,
-        None => return Ok(())
-    };
-
+    let entries = get_files_in_dir(&args.dir)?;
     let mut wtr = WriterBuilder::new().from_writer(io::stdout());
     for (idx, row) in read_parquet_dir(&entries.iter().map(|p| p.display().to_string()).collect()).enumerate() {
```
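The two ingredients here are collecting an iterator of `Result`s into a single `Result<Vec<_>, _>` (which stops at the first error) and propagating with `?`. A small std-only sketch of both, where `parse_all` and `sum_all` are hypothetical names and integer parsing stands in for directory reads:

```rust
use std::num::ParseIntError;

// Collecting an iterator of `Result<T, E>` into `Result<Vec<T>, E>`
// returns the first `Err` encountered, or all the `Ok` values.
fn parse_all(inputs: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    inputs.iter().map(|s| s.parse::<i32>()).collect()
}

// `?` hands the error to the caller instead of unwrapping or matching by hand.
fn sum_all(inputs: &[&str]) -> Result<i32, ParseIntError> {
    let values = parse_all(inputs)?;
    Ok(values.iter().sum())
}
```

This mirrors the shape of the `main()` change: the caller decides what to do with the error, and the happy path stays on one line.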
Now, `SerializedFileReader::try_from()` is implemented for `&Path`, and `PathBuf` derefs to `Path`. So your dance of converting to display then to string (which is lossy btw) is not needed.
While we’re at it, let’s use a slice instead of `&Vec<_>` in the signature (clippy would tell you about this if you have it set up with rust-analyzer).

```rust
fn read_parquet_dir(entries: &[PathBuf]) -> impl Iterator<Item = record::Row> {
    // ignore all errors
    entries.iter()
        .filter_map(|p| SerializedFileReader::try_from(&**p).ok())
        .flat_map(|r| r.into_iter())
        .filter_map(|r| r.ok())
}
```

```diff
     let entries = get_files_in_dir(&args.dir)?;
     let mut wtr = WriterBuilder::new().from_writer(io::stdout());
-    for (idx, row) in read_parquet_dir(&entries.iter().map(|p| p.display().to_string()).collect()).enumerate() {
+    for (idx, row) in read_parquet_dir(&entries).enumerate() {
         let values: Vec<String> = row.get_column_iter().map(|(_column, value)| value.to_string()).collect();
         if idx == 0 {
             wtr.serialize(row.get_column_iter().map(|(column, _value)| column.to_string()).collect::<Vec<String>>())?;
```
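On the slice point, a quick sketch of why `&[PathBuf]` is the more flexible parameter type. `count_parquet` is a hypothetical helper, not something from the original code; it only exists to have a function that takes a slice.

```rust
use std::path::PathBuf;

// Hypothetical helper: counts entries whose extension is "parquet".
// Taking `&[PathBuf]` accepts a `&Vec<PathBuf>`, an array, or a sub-slice.
fn count_parquet(entries: &[PathBuf]) -> usize {
    entries
        .iter()
        .filter(|p| p.extension().is_some_and(|ext| ext == "parquet"))
        .count()
}
```

A `&Vec<PathBuf>` coerces to `&[PathBuf]` automatically at the call site, so callers holding a `Vec` lose nothing, while callers holding an array or a sub-slice no longer need to allocate a `Vec` just to satisfy the signature.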
Now let’s see what we can do about not ignoring errors in `read_parquet_dir()`.
Approach 1: Save intermediate reader results
This opens every reader up front, before any rows are yielded. So, it’s a behavioral change. The signature may also scare some people 😉

```rust
fn read_parquet_dir(entries: &[PathBuf]) -> Result<impl Iterator<Item = Result<record::Row, ParquetError>>, ParquetError> {
    Ok(entries
        .iter()
        .map(|p| SerializedFileReader::try_from(&**p))
        .collect::<Result<Vec<_>, _>>()?
        .into_iter()
        .flat_map(|r| r.into_iter()))
}
```
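The same shape can be tried without the parquet crate. In this sketch everything is a stand-in chosen for illustration: `open` is a hypothetical replacement for `SerializedFileReader::try_from`, a comma-separated string plays the role of a file, and `String` is used as the error type.

```rust
type Item = Result<i32, String>;

// Hypothetical stand-in for opening a reader: an empty source fails to "open",
// otherwise each comma-separated field becomes one fallible item.
fn open(src: &str) -> Result<Vec<Item>, String> {
    if src.is_empty() {
        return Err("cannot open empty source".to_string());
    }
    Ok(src
        .split(',')
        .map(|s| s.trim().parse::<i32>().map_err(|e| e.to_string()))
        .collect())
}

// Outer `Result`: any source failing to open aborts everything up front.
// Inner `Result`s: per-item errors are still surfaced lazily to the caller.
fn read_all_eager(sources: &[&str]) -> Result<impl Iterator<Item = Item>, String> {
    Ok(sources
        .iter()
        .map(|s| open(s))
        .collect::<Result<Vec<_>, _>>()?
        .into_iter()
        .flatten())
}
```

The behavioral change is visible here: with sources `["1", ""]` the caller gets an `Err` immediately and never sees the `1`, because every source is opened before the first item is handed out.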
Approach 2: Wrapper iterator type
How can we combine errors from readers with flat record results?
This is how.
```rust
enum ErrorOrRows {
    Error(Option<ParquetError>),
    Rows(record::reader::RowIter<'static>)
}

impl Iterator for ErrorOrRows {
    type Item = Result<record::Row, ParquetError>;

    fn next(&mut self) -> Option<Self::Item> {
        match self {
            Self::Error(e_opt) => e_opt.take().map(Err),
            Self::Rows(row_iter) => row_iter.next(),
        }
    }
}

fn read_parquet_dir(entries: &[PathBuf]) -> impl Iterator<Item = Result<record::Row, ParquetError>> {
    entries
        .iter()
        .flat_map(|p| match SerializedFileReader::try_from(&**p) {
            Err(e) => ErrorOrRows::Error(Some(e)),
            Ok(sr) => ErrorOrRows::Rows(sr.into_iter()),
        })
}
```

```diff
     let mut wtr = WriterBuilder::new().from_writer(io::stdout());
     for (idx, row) in read_parquet_dir(&entries).enumerate() {
+        let row = row?;
         let values: Vec<String> = row.get_column_iter().map(|(_column, value)| value.to_string()).collect();
         if idx == 0 {
             wtr.serialize(row.get_column_iter().map(|(column, _value)| column.to_string()).collect::<Vec<String>>())?;
```
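For anyone who wants to poke at the wrapper-iterator idea without pulling in parquet, here is a std-only sketch of the same pattern. `open`, `ErrorOrItems`, and `read_all` are hypothetical names, and comma-separated strings stand in for files; none of this is the parquet API.

```rust
type Item = Result<i32, String>;

// Hypothetical stand-in for opening a reader over one source.
fn open(src: &str) -> Result<std::vec::IntoIter<Item>, String> {
    if src.is_empty() {
        return Err("cannot open empty source".to_string());
    }
    let items: Vec<Item> = src
        .split(',')
        .map(|s| s.trim().parse::<i32>().map_err(|e| e.to_string()))
        .collect();
    Ok(items.into_iter())
}

// Either a single pending "open" error, or the items of an opened source.
enum ErrorOrItems {
    Error(Option<String>),
    Items(std::vec::IntoIter<Item>),
}

impl Iterator for ErrorOrItems {
    type Item = Item;

    fn next(&mut self) -> Option<Item> {
        match self {
            // Yield the error exactly once, then end (take() leaves None behind).
            Self::Error(e) => e.take().map(Err),
            Self::Items(it) => it.next(),
        }
    }
}

// Lazy: each source is opened only when flat_map reaches it.
fn read_all<'a>(sources: &'a [&'a str]) -> impl Iterator<Item = Item> + 'a {
    sources.iter().flat_map(|s| match open(s) {
        Err(e) => ErrorOrItems::Error(Some(e)),
        Ok(it) => ErrorOrItems::Items(it),
    })
}
```

Unlike Approach 1, a source that fails to open shows up as one `Err` item in the middle of the stream, and iteration continues with the next source unless the caller stops at the first error (which is what `let row = row?;` does).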
Approach 3 (bonus): Using unstable `#![feature(gen_blocks)]`

```rust
fn read_parquet_dir(entries: &[PathBuf]) -> impl Iterator<Item = Result<record::Row, ParquetError>> {
    gen move {
        for p in entries {
            match SerializedFileReader::try_from(&**p) {
                Err(e) => yield Err(e),
                Ok(sr) => for row_res in sr { yield row_res; }
            }
        }
    }
}
```
BB_C@programming.devto
Rust@programming.dev•Next-generation Proton Mail mobile apps: more than meets the eye | Proton
22·5 days ago
NCDC (No Code, Don’t Care)
BB_C@programming.devto
Rust@programming.dev•Rust at Scale: An Added Layer of Security for WhatsApp
4·10 days ago
As with all ads, especially M$ ones…
No Code, Don’t Care
At least if the code was available, I would find out what they mean by “spoofed Mime” and how that attack vector works (is the actual file “magic” header spoofed, while the file still manages to get parsed in its actual, non-“spoofed” format nonetheless? How?).
Also, I would have figured out if this is a new use of “at scale” applied to purely client code, or if a service is actually involved.
BB_C@programming.devto
Rust@programming.dev•Is the async_trait still necessary for dync Trait with async methods?
4·16 days ago
`dyn` compatibility of the trait itself is another matter. In this case, an async method makes a trait not dyn-compatible because of the implicit `-> impl Future` opaque return type, as documented here.
But OP didn’t mention whether `dyn` is actually needed or not. For me, `dyn` is almost always a crutch (exceptions exist).
BB_C@programming.devto
Rust@programming.dev•Is the async_trait still necessary for dync Trait with async methods?
6·16 days ago
If I understand what you’re asking…
This leaves some details/specifics out to simplify. But basically:

```rust
async fn foo() {}

// ^ this roughly desugars to

fn foo() -> impl Future<Output = ()> { async {} }
```

This meant that you couldn’t just have (stable) async methods in traits, not because of async itself, but because you couldn’t use `impl Trait` in return positions in trait methods, in general.
`Box<dyn Future>` was a non-ideal workaround (not zero-cost, and other `dyn` drawbacks). `async_trait` was a proc macro solution that generated code with that workaround. So `Box<dyn Future>` was never a desugaring done by the language/compiler.
Now that we have (stable) `impl Trait` in return positions in trait methods, all this dance is not strictly needed anymore, and hasn’t been needed for a while.
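A rough side-by-side of the two styles, as a sketch only. `Greet`, `GreetBoxed`, `En`, and the toy `block_on` are hypothetical names made up for this example, and the boxed trait is hand-written to resemble the workaround, not the exact code `async_trait` generates. It assumes a toolchain with stable async fn in traits (Rust 1.75 or later).

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Toy executor so the sketch runs without an async runtime. It busy-polls,
// which is only acceptable here because the futures are immediately ready.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

// Native async method: static dispatch, no allocation, but not dyn-compatible.
trait Greet {
    async fn greet(&self) -> String;
}

// The boxed workaround, written by hand: dyn-compatible, one allocation per call.
trait GreetBoxed {
    fn greet_boxed(&self) -> Pin<Box<dyn Future<Output = String> + '_>>;
}

struct En;

impl Greet for En {
    async fn greet(&self) -> String {
        "hello".to_string()
    }
}

impl GreetBoxed for En {
    fn greet_boxed(&self) -> Pin<Box<dyn Future<Output = String> + '_>> {
        Box::pin(self.greet())
    }
}
```

With this, `block_on(En.greet())` works through static dispatch, and a `Vec<Box<dyn GreetBoxed>>` works through dynamic dispatch, whereas `Box<dyn Greet>` would be rejected by the compiler. So the boxing is only needed when `dyn` itself is needed.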
I was just referring to the fact that they are macros.
`printf` uses macros in its implementation.

```c
int
__printf (const char *format, ...)
{
  va_list arg;
  int done;

  va_start (arg, format);
  done = __vfprintf_internal (stdout, format, arg, 0);
  va_end (arg);

  return done;
}
```

^ This is from glibc. Do you know what `va_start` and `va_end` are?
to get features that I normally achieve through regular code in other languages.
Derives expand to “regular code”. You can run `cargo expand` to see it. And I’m not sure how that’s an indication of “bare bone”-ness in any case.
Such derives are actually using a cool trick, which is the fact that proc macros and traits have separate namespaces. So `#[derive(Debug)]` is using the proc macro named `Debug` which happens to generate “regular code” that implements the `Debug` trait. The proc macro named `Debug` and the implemented trait `Debug` don’t point to the same thing, and don’t have to match name-wise.
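The namespace point can be seen in a single file with a declarative macro standing in for the proc macro (a real derive needs its own proc-macro crate, so this is an analogy, not how `#[derive(Debug)]` is implemented). `Describe` and `Widget` are hypothetical names.

```rust
// A trait named `Describe` (lives in the type namespace).
trait Describe {
    fn describe(&self) -> String;
}

// A macro also named `Describe` (lives in the macro namespace).
// The two names do not clash, and the macro just emits ordinary code
// that implements the trait.
macro_rules! Describe {
    ($ty:ident) => {
        impl Describe for $ty {
            fn describe(&self) -> String {
                format!("a value of type {}", stringify!($ty))
            }
        }
    };
}

struct Widget;

// Expands to a plain `impl Describe for Widget { ... }`.
Describe!(Widget);
```

After expansion there is nothing macro-specific left: `Widget.describe()` is a regular trait method call on regular generated code.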
Not sure if you’re talking about the language, or the core/alloc/std libraries, or both/something in-between?
Can you provide specific examples, and which specific languages are you comparing against?
BB_C@programming.devto
Linux@programming.dev•Two Linux Distributions I'm Watching Closely in 2026
3·1 month ago
(didn’t read OP, didn’t keep up with chimera recently)
Off the top of my head:
- The init system.
- Usable FreeBSD utils instead of busybox overridable by gnu utils (which you will have to do because the former are bare-bones).
- Everything is built with LLVM (not gcc).
- Extra hardening (utilizing LLVM).
- And it doesn’t perform like shit in some multi-threaded allocator-heavy loads because they patch musl directly with mimalloc.
- It also doesn’t pretend to have a stable/release channel (only rolling).

So, the use of `apk` is not that relevant. “no GNU” is not really the case with Alpine. They do indeed have “musl” in common, but Chimera “fixes” one of the most relevant practical shortcomings of using it. And finally, I don’t think Chimera really targets fake “lightweight”-ness just for the sake of it.
BB_C@programming.devto
Rust@programming.dev•`ffmpReg`: a complete rewrite of ffmpeg in pure Rust
1·1 month ago
How is this literal joke still getting so much engagement?
nice `CLAUDE.md`. got it on the contributor list too.
BB_C@programming.devto
Rust@programming.dev•`ffmpReg`: a complete rewrite of ffmpeg in pure Rust
23·1 month ago
An actually serious project that is not at the “joke” stage. Zero LLM use too:
https://nihav.org/
For audio at least, people should be aware of:
https://github.com/pdeljanov/Symphonia
BB_C@programming.devto
Rust@programming.dev•My wife gave me the best Christmas gift: a handmade Ferris!
15·1 month ago
software-rendered implemented-in-C++ terminal
you fail the cult test 😉
The whole premise is wrong, since it’s based on the presumption of C++ and Rust being effectively generational siblings, with the C++ “designers” (charitable) having the option to take the Rust route (in the superficial narrow aspects covered), but choosing not to do so. When the reality is that C++ was the intellectual pollution product of “next C” and OOP overhype from that era (late ’80s/early ’90s), resulting in the “C with classes” moniker.
The lack of both history (and/or evolution) and paradigm talk is telling.
BB_C@programming.devto
Linux@programming.dev•Chimera Linux Releases New Images With Kernel 6.18
4·2 months ago
No. This one is actually cool, useful, and innovative. And it tries to do some things differently than everyone else.
/me putting my Rust (post-v1.0 era) historian hat on.
The list of (language-level) reasons why people liked Rust was already largely covered by the bullet points in the real original Rust website homepage, before some “community” people decided to nuke that website because they didn’t like the person who wrote these points (or rather, what that person was “becoming”). They tasked some faultless volunteers who didn’t even know much Rust to develop a new website, and then rushed it out. It was ugly. It lacked supposedly important components like internationalization, which the original site had. But what was important to those “community people” (not to be confused with the larger body of people who develop Rust and/or with Rust) is that the very much technically relevant bullet points were gone. And it was then, and only then, that useless meaningless “empowerment” speak came into the picture.
less likely to be insecure
Evidenced by?
requires reviewing all source code
This is exactly the la-la-land view of what distributors do I was dispelling with facts and reality checks. No one is reviewing all source code of anything, except for cases where a distro developer and an upstream member are the same person. And even then, this may not be the case depending on the upstream project, its size, and the distro developer’s role within that project.
to make sure it meets interoperability
Doesn’t mean anything other than “it builds” and “API is not broken” (e.g. within the same `.so` version), and “seems to work”.
These considerations happen to hardly exist with the good tooling provided by cargo.
and open-source standards.
Doesn’t mean anything outside of licensing (for code and assets), and “seems to work”.
Your argument that crates.io is a known organization therefore we should trust the packages distributed is undermined by your acknowledgement that crates.io does not produce any code. Instead we are relying on the individual crate developers, who can be as anonymous as they want.
Largely correct. But that was me comparing middle-man vs. middle-man. That is if `crates.io` operators can be described as middle-men, since their responsibilities (and consequently, attack vectors) are much smaller.
Barring organizational attacks from within, with `crates.io`, you have one presumably competent/knowledgeable, possibly anonymous, source, and operators that don’t do much. With a binary distro, you have that, AND another “middle-man” source, possibly anonymous, and with competence and applicable knowledge <= upstream (charitable), yet put in a position to decide what to do with what upstream provides, or rather, provided… X years ago, if we are talking about the claimed “stable” release channel.
The middle man pulls sources from places like `crates.io` anyway. So applying trivial “logic”/“maths”, it can’t be “better”, in the context being discussed.
Software doesn’t get depended on out of thin air. You are either first in line directly depending on a library, and thus you would naturally at least make the minimum effort to make sure it’s minimally “fit for purpose”. Or you are an indirect dependant, and thus looking at your direct dependencies, and maybe “trusting” them with the “trickle down”.
More processes, especially automated ones, are always welcome to help catch “stuff” early. But it is no surprise that the “success stories” concern crates with fat ZERO dependants.
Processes that help dependants share their knowledge about their dependencies (a la `cargo vet`) are unquestionably good additions. They sure trump the dogmatic blind faith in distros doing something they simply don’t have the knowledge or resources to do, or the slightly less dogmatic faith in some library being “trustable” if packaged by X or XX distros, assuming at least someone knowledgeable/competent must have given a thorough look (this has a rough equivalent in the number of dependants anyway).
This is all obvious, and doesn’t take much thought from anyone active from the inside (upstreams or distros), instead of the surface “knowledge” that leaks, and possibly gets manipulated, en route to the outside.
While it may never be “enough” depending on your requirements (which you didn’t specifically and coherently define), the amount of “review”, and having the required know-how to do it competently, is much bigger/higher from your crate dependants, than from your distro packages.
It’s not rare for a distro packager to not know much about the programming language (let alone the specific code) of some packages they package. It’s very rare for a packager to know much about the specific code of what they package (they may or may not have some level of familiarity with a handful of codebases).
So what you get is someone who pulls source packages (from the interwebs), possibly patching them (and possibly breaking them), compiling them, and giving you the binaries (libs/execs). With source distros, you don’t have the compiling and binary package part. With `crates.io`, you don’t have the middle man at all. Which is why the comparison was never right from the start. That’s the pondering I left you to do on your own two comments ago.
Almost all sufficiently complex user-space software in your system right now has a lot of dependencies (vendored or packaged), you just don’t think of them because they are not in your face, and/or because you are ambivalent to the realities of how distros work, and what distro developers/packagers actually do (described above). You can see for yourself with whatever the Debian equivalent is to `pactree` (from pacman).
At least with cargo, you can have all your dependencies in their source form one command away from you (`cargo vendor`), so you can trivially inspect as much as you like/require. The only part that adds unknowns/complexities is crates that use `build.rs`. But just like `unsafe {}`, this factor is actually useful, because it tells you where you should look first with the biggest magnifying glass. And just like cargo itself, the streamlining of the process means there aren’t thousands of ways/places in the build process to do something.



No. It’s how you (explicitly) go from ref to deref.
Here:
- `p` is `&PathBuf`
- `*p` is `PathBuf`
- `**p` is `Path` (Deref)
- `&**p` is `&Path`.

Since what you started with is a reference to a non-Copy value, you can’t do anything that would use/move `*p` or `**p`. Furthermore, `Path` is an unsized type (just like `str` and `[T]`), so you need to reference it (or Box it) in any case.
Another way to do this is:

```rust
let p: &Path = p.as_ref();
```

Some APIs use `AsRef` in signatures to allow passing references of different types directly (e.g. File::open()), but that doesn’t apply here.
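The different spellings can be compared side by side in a std-only sketch. `component_count` and `component_counts` are hypothetical helpers that exist only so there is something taking a `&Path`.

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper that only accepts `&Path`.
fn component_count(p: &Path) -> usize {
    p.components().count()
}

fn component_counts(entries: &[PathBuf]) -> Vec<usize> {
    entries
        .iter()
        .map(|p| {
            // `p` is `&PathBuf` here.
            let explicit: &Path = &**p; // deref twice, then re-borrow
            let via_as_ref: &Path = p.as_ref(); // AsRef<Path> for PathBuf
            let via_as_path: &Path = p.as_path(); // inherent PathBuf method
            let coerced: &Path = p; // deref coercion at a typed binding
            debug_assert!(explicit == via_as_ref);
            debug_assert!(via_as_path == coerced);
            component_count(explicit)
        })
        .collect()
}
```

All four bindings refer to the same `Path`. The implicit coercion in the last one needs a concrete expected type such as `&Path`. As far as I can tell, a generic call like `try_from(p)` gives the compiler no such target (it would look for an impl on `&PathBuf` instead), which is why the explicit `&**p` form shows up there.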