Czech the Data

I'm Mike Czech, a software engineer and data scientist living in Hamburg, Germany. I work on autonomous driving safety at MOIA and write about data, machine learning, software engineering, travel, and other things I’m learning.

note Jul 19, 2026

One of the things I enjoy most about AI coding agents is how much easier they’ve made it to show work that would have stayed hidden before. Quick visualizations, one-off tools, and small interactive demos are suddenly worth building because they only take minutes. As a result, presentations have become much more lively. We end up discussing something tangible instead of imagining it!

Generally, I’m a big fan of a show-and-tell culture instead of endless discussions about abstract ideas. People should feel comfortable showing incomplete work, know it’s okay not to be great at presenting, and be proud of small achievements. Coding agents make this easier than ever before.

note Jul 13, 2026

Sunk Cost Fallacy as a Feature

“Sunk cost fallacy can be a feature: if you have spent a lot of blood, sweat, and tears on a project, you are more likely to push it through adversity and the doldrums that inevitably one will encounter. If all it took was one of those momentarily brilliant ideas and a prompt on Claude to produce something, there is no attachment whatsoever to it.”
sph on Hacker News

note Jul 07, 2026

Recognition in the Age of AI

I’ve been working in an environment where heavy AI usage has been the default for a few months now, and I’ve noticed an interesting shift in the economics of recognition.

If exceptional work is increasingly assumed to be AI-generated, the incentive to produce exceptional work may gradually weaken. Recognition has always been part of what motivates people to push beyond “good enough”.

The challenge is that AI makes authorship harder to infer from the output. Even when you solve a problem where AI failed, others may still assume AI did most of the work. As recognition becomes less connected to visible skill, the incentives around excellence begin to change.

note Jul 02, 2026

Fast Inference Might Matter More Than Smarter Models Now

I have a feeling that the next breakthroughs in AI won’t necessarily come from more capable models, but rather from much faster inference. In the past two weeks, we’ve seen three interesting developments in that direction.

First, OpenAI announced GPT-5.6 Sol:

“We’re also launching GPT‑5.6 Sol on Cerebras at up to 750 tokens per second in July”

Second, Google released Nano Banana 2 Lite, bringing image generation down to just a few seconds. Third, Google also published Diffusion Gemma, which reaches over 1,000 tokens per second on a single NVIDIA H100 GPU.

I suspect that even if model capabilities stayed around the level of Claude Opus 4.6+, this kind of inference speed would enable a lot of new use cases, much like video streaming only became practical once internet connections became fast enough.

note Jun 21, 2026

I’ve noticed that with AI-assisted coding, it’s becoming even more common to end up with large PRs. This is problematic for several reasons:

bugs are harder to spot
merge conflicts increase
reviews take longer

The interesting thing is that AI coding agents are also really good at splitting large PRs into smaller, more manageable pieces! I’ll often ask an agent to identify independent changes, separate refactors from functional changes, and suggest a sequence of smaller PRs.

To me, this shows that AI-assisted coding isn’t just about producing more code in less time. It can also help reinforce good engineering practices, which are necessary if you want to scale development over time.

note Jun 19, 2026

One of my favourite new tricks is to be a little more verbose in Slack and then use Claude and Slack MCP to generate a pull request from the discussion. That way, ideas from our Slack discussions make their way into the product almost immediately!

It’s a small example of how software engineering is becoming less about producing code and more about collaboratively shaping the product.

note May 23, 2026

Recently, I’ve been working more with dbt again and came across a useful way to handle questionable rows: configure a test to warn and store its failures.

{{ config(
    severity = 'warn',
    store_failures = true
) }}

select *
from {{ ref('some_model') }}
where ...

When you run dbt test, the warn severity keeps a failing test from stopping the whole pipeline, and store_failures writes the failing rows to a table. That makes it a handy way to flag invalid or suspicious records and keep them available for investigation.

article Mar 28, 2026

Deep Water Soloing at Tonsai Bay

I’ve been climbing for more than 10 years now, and recently I finally got the chance to try something that’s been on my mind for a long time: Deep Water Soloing (DWS)! For those who don’t already know, it’s climbing without ropes, relying only on the water below for protection.

[... 612 words]

article Sep 06, 2025

Lightweight Re-Ranking in Qdrant Using Logistic Regression

I’ve worked a little more with Qdrant’s hybrid queries and noticed that they are useful beyond what I described in my last article. When building a recommendation system, we usually start with vector search to retrieve promising candidates — for example, the top-M videos for a user based on their past behavior in a shared embedding space.

[... 409 words]
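The two-stage idea behind this article (retrieve the top-M candidates by vector similarity, then re-rank them with a cheap model) can be illustrated in plain Python. This is a hedged sketch, not the article’s implementation: a brute-force cosine search stands in for the Qdrant query, and the features `popularity` and `freshness`, as well as the logistic-regression weights, are hypothetical placeholders for a model you would train offline.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Candidate:
    item_id: str
    embedding: list[float]
    popularity: float  # hypothetical business feature in [0, 1]
    freshness: float   # hypothetical business feature in [0, 1]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_m(user_vec: list[float], items: list[Candidate], m: int) -> list[Candidate]:
    """Stage 1: brute-force stand-in for a Qdrant nearest-neighbour query."""
    return sorted(items, key=lambda c: cosine(user_vec, c.embedding), reverse=True)[:m]


# Hypothetical coefficients of a logistic regression trained offline,
# e.g. on click / no-click labels. Not values from the article.
WEIGHTS = {"similarity": 3.0, "popularity": 1.5, "freshness": 0.5}
BIAS = -2.0


def click_probability(user_vec: list[float], c: Candidate) -> float:
    z = (BIAS
         + WEIGHTS["similarity"] * cosine(user_vec, c.embedding)
         + WEIGHTS["popularity"] * c.popularity
         + WEIGHTS["freshness"] * c.freshness)
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function


def rerank(user_vec: list[float], candidates: list[Candidate]) -> list[tuple[str, float]]:
    """Stage 2: sort the retrieved candidates by predicted probability."""
    scored = [(c.item_id, click_probability(user_vec, c)) for c in candidates]
    return sorted(scored, key=lambda t: t[1], reverse=True)


items = [
    Candidate("a", [1.0, 0.0], popularity=0.1, freshness=0.0),
    Candidate("b", [0.9, 0.1], popularity=0.9, freshness=1.0),
    Candidate("c", [0.0, 1.0], popularity=1.0, freshness=1.0),
]
user = [1.0, 0.0]
top = retrieve_top_m(user, items, m=2)    # "a" and "b" are closest to the user
print([c.item_id for c in top])           # ['a', 'b']
print([i for i, _ in rerank(user, top)])  # ['b', 'a']
```

Item "c" is popular but dissimilar, so it never reaches the re-ranker, while "b" overtakes "a" once the extra features are taken into account. Keeping the re-ranker a simple logistic model means each of the M candidates costs only a handful of multiplications, so it adds little latency on top of the vector search.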

article Jun 11, 2025

Local LLMs on the Rise

After tinkering with Qwen3-30b and Gemma3-12b via Ollama for a week, I have the feeling that we’re at a turning point with local LLMs.

[... 269 words]

article May 23, 2025

Combining Vector Search with Business Logic in Qdrant

I’ve been using Qdrant for vector similarity search in a recommendation system, and one challenge that keeps coming up is how to incorporate additional information beyond just the vectors themselves.

[... 353 words]

article Apr 11, 2025

A First Glance at Python UDTFs in Snowflake

Recently, I’ve been working more with dbt / Snowflake and needed to utilize a multi-output regression model from SQL. The model was implemented in Python using scikit-learn and LightGBM.

[... 663 words]
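Why a table function? A scalar UDF returns a single value per row, whereas a UDTF can emit several output columns (and rows) per input row, which fits a multi-output regression model. Snowflake’s Python UDTF handlers are classes with a `process` method whose yielded tuples become output rows. The sketch below mimics that shape without Snowflake; the class `MultiOutputPredictor` and the stand-in model `_HypotheticalModel` are illustrative assumptions, and the article’s actual handler (which uses scikit-learn and LightGBM) may look quite different.

```python
from typing import Iterator, List, Tuple


class _HypotheticalModel:
    """Stand-in for a fitted multi-output regressor; a fixed linear map here."""

    def predict(self, rows: List[List[float]]) -> List[List[float]]:
        return [[2.0 * x + y, x - y] for x, y in rows]


class MultiOutputPredictor:
    """Shaped like a Snowflake Python UDTF handler: process() is called once
    per input row, and every yielded tuple becomes an output row."""

    def __init__(self) -> None:
        # A real handler would typically load a serialized model here.
        self.model = _HypotheticalModel()

    def process(self, x: float, y: float) -> Iterator[Tuple[float, float]]:
        # One input row -> one output row with two prediction columns.
        pred_a, pred_b = self.model.predict([[x, y]])[0]
        yield (pred_a, pred_b)


# Local smoke test: drive the handler row by row.
handler = MultiOutputPredictor()
rows = [(1.0, 2.0), (3.0, 0.5)]
results = [out for x, y in rows for out in handler.process(x, y)]
print(results)  # [(4.0, -1.0), (6.5, 2.5)]
```

In Snowflake, a class like this would be registered with `CREATE FUNCTION ... RETURNS TABLE (...)`, `LANGUAGE PYTHON`, and the class name as its `HANDLER`, so that SQL can select both prediction columns at once.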

article Mar 23, 2025

Using PDB in Metaflow

For some projects, I use Netflix’s Metaflow to build machine learning pipelines. I generally enjoy using it because it allows me to run pipelines both locally and remotely on AWS Batch, depending on the resource requirements of a pipeline.

[... 138 words]

article Mar 16, 2025

Finding the Maximal Rectangle in Augmented Reality

Raycast view next to maximal rectangle matrix

A few weeks ago, I had the task of determining a reasonable location for placing virtual objects on detected walls in Apple’s ARKit. This was challenging because real objects like doors, shelves, and so on can interfere with the detected wall.

[... 272 words]
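The maximal rectangle matrix in the caption points to a classic formulation: mark each wall cell as free (1) or blocked (0) and find the largest all-free rectangle. One standard solution, not necessarily the one used in the article, treats each row as a histogram of consecutive free cells ending at that row and finds the largest rectangle under that histogram with a stack. The grid below is a made-up example, and turning ARKit raycasts into such a grid is assumed to have happened already.

```python
from typing import List, Sequence, Tuple


def largest_in_histogram(heights: Sequence[int]) -> Tuple[int, int, int, int]:
    """Largest rectangle under a histogram as (area, left, width, height)."""
    best = (0, 0, 0, 0)
    stack: List[Tuple[int, int]] = []  # (start index, height), heights increasing
    for i, h in enumerate(list(heights) + [0]):  # trailing 0 flushes the stack
        start = i
        while stack and stack[-1][1] >= h:
            s, sh = stack.pop()
            if sh * (i - s) > best[0]:
                best = (sh * (i - s), s, i - s, sh)
            start = s  # the current bar can extend left to where the popped bar began
        stack.append((start, h))
    return best


def maximal_rectangle(grid: List[List[int]]) -> Tuple[int, int, int, int, int]:
    """Largest all-ones rectangle as (area, top, left, height, width)."""
    best = (0, 0, 0, 0, 0)
    heights = [0] * (len(grid[0]) if grid else 0)
    for r, row in enumerate(grid):
        # heights[c] = number of consecutive free cells ending at row r in column c
        heights = [h + 1 if cell else 0 for h, cell in zip(heights, row)]
        area, left, width, height = largest_in_histogram(heights)
        if area > best[0]:
            best = (area, r - height + 1, left, height, width)
    return best


# 1 = free wall, 0 = blocked (e.g. by a door or shelf); a made-up example.
wall = [
    [1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 1, 0],
]
print(maximal_rectangle(wall))  # (9, 0, 2, 3, 3): rows 0-2, columns 2-4
```

Each row update is O(cols), and each histogram pass is amortized O(cols) because every index is pushed and popped at most once, so the whole search runs in O(rows × cols).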

article Mar 13, 2025

How Boosted Decision Trees Can Benefit from Language Models

There has been a lot of progress in natural language understanding in recent years, be it the rise of AI assistants like ChatGPT or, most recently, foundation models like ModernBERT.

[... 981 words]