For quite some time I’ve been trying, and until just now failing, to
get Mastodon to display preview images for my blog posts. It seemed
to me that all I had to do was add the proper
OpenGraph og:type, og:title, and og:image
tags, right?
Right?
And then if I check it with one of the
numerous preview checkers, and I see the image, it
ought to work, right? Right?
No! It doesn’t work that way. This page is a test to see if I was
missing something in particular. Maybe it was the lack of og:url
(which is required by the spec)? Maybe og:description? Maybe
og:image:alt?
And the winner is: og:url, which isn’t that surprising since, as the
spec says:
What do you need in order to run them on your computer?
A low-profile graphics card with a low-profile bracket
With reasonable power consumption (single-slot under 75W TDP)
Supported by current drivers and compute libraries
Can you do this? Why yes, you can!
All of the information that follows assumes that you are running
Debian 13, but should apply to various Ubuntus. I do not know what a
“NixOS” or an “Arch” or an “Omarchy” is so please don’t ask.
Cheapo GPUs as of April 2026
As of this writing, approximate prices (via eBay) and specifications
(via TechPowerUp) of GPUs
that meet these criteria, ordered by price in loonies and
toonies:
The sweet spot for performance and compatibility (since
unfortunately, CUDA Rules Everything Around
Me) is the GTX1650,
though its fans are quite loud. Be careful to get a low-profile
version and not simply a small form factor one: it also comes in an
“Aero-ITX” form factor which is unlikely to fit in a low-profile case.
The GT1030 is still the best fanless GPU after all these years, and
can actually be coaxed into running some fairly useful models, but
that’s about all it has going for it. Also, its heatsink is
massive so it won’t fit in really tiny PCs. So, if you don’t mind
a bit of noise, the P620 and P1000 are a great deal.
In theory the Arc A310 should be an excellent choice, but software
support for it is quite uncertain. And while the RX640 is a GCN4
architecture card
and is thus in theory supported by
llama.cpp, it is extremely
unclear exactly which version of ROCm you need to install to make this
work. But ROCm is open source, so in theory… you could do this (I
haven’t yet, stay tuned).
In case you’re worried about the absolutely horrible FP16 performance
of some cards in the table… don’t worry about it unless you want to
do some form of fine-tuning, which you probably don’t have enough VRAM
to do anyway. You’re going to have to use quantized models, so the
actual computation will get done in single precision.
Software setup
Our test case here is simple: extract a table as HTML (no, not
Markdown, because shut up, clanker, that’s why) using
dots.mocr. We’ll use
the weakest card in the bunch above, the GT1030, just because we can.
First we’ll install a few gigabytes of CUDA nonsense. This will
require you to add the evil incantation contrib non-free to the end
of all the deb lines in /etc/apt/sources.list. Now you can:
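Something along these lines should do it (package names here are the stock Debian ones, nvidia-driver and nvidia-cuda-toolkit; adjust to whatever your setup actually needs):

# Pull in the driver and the CUDA toolkit from contrib/non-free
sudo apt update
sudo apt install nvidia-driver nvidia-cuda-toolkit
# Reboot (or reload the nvidia modules), then confirm the card shows up
nvidia-smi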
Success! Next, we will build a copy of
llama.cpp (maybe a
pre-compiled one will work, but it probably requires Just Exactly The
Right Version Of Ten Thousand Libraries, so we won’t even try).
In the past, you had to convince llama.cpp that you really wanted to
use some ancient GPU architecture from 2016, but recent versions of
llama.cpp are able to detect your GPU capability, so
-DCMAKE_CUDA_ARCHITECTURES is no longer necessary if you build on the
same machine you’ll be running on. You can simply:
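Roughly, following the standard llama.cpp CMake build (nothing exotic here):

# Fetch llama.cpp and build it with CUDA support
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j$(nproc)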
This might take a while if your computer is old like mine.
Note that on the GTX 1650, you may want to follow the helpful
suggestions that llama.cpp prints when you run it, and instead
configure your build with:
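I won’t reproduce the exact message here, but it amounts to re-running CMake with an extra flag, something like this (GGML_CUDA_FORCE_MMQ is just an example of the kind of option it suggests for cards without tensor cores; use whatever your build actually prints):

# Reconfigure with the suggested option and rebuild
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_FORCE_MMQ=ON
cmake --build build --config Release -j$(nproc)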
Since HuggingFace “acquired” (meaning “acquihired”, I guess)
llama.cpp, it now works super well with their model repositories.
Let’s try it out! First, we’ll get ourselves a table of some sort.
Because I’ll be using it in another blog post shortly, we can try the
permitting statistics for the last year of my term as a town
councillor, which I obtained with a freedom of information
request
(and you can too!).
While llama.cpp now has a fancy web interface that can use
pdf.js to convert PDFs to images,
we’ll do this the old-fashioned way from the command line. First,
convert the pages to PNGs:
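The usual tool is pdftoppm from poppler-utils (assuming the PDF is named analyse_permis.pdf, to match the page images used below):

# Render each page of the PDF as a 150 DPI PNG (analyse_permis-1.png, -2.png, ...)
pdftoppm -png -r 150 analyse_permis.pdf analyse_permis

Now, can we just throw the model and a page image at llama-cli and let it rip?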
No! We can’t do that. The reason is that by default llama.cpp will
try to put not only the model, but also the input and output data
buffers, on the GPU. So even if the model fits just fine, it will
most likely segfault, because llama.cpp is written in C++, so
segmentation faults are just its normal form of error handling.
This is very annoying! Luckily there’s a way around it, which is to
use the top-secret GGML_CUDA_ENABLE_UNIFIED_MEMORY environment
variable, which basically tells llama.cpp to load and unload things
automatically between CPU and GPU memory:
export GGML_CUDA_ENABLE_UNIFIED_MEMORY=1
You may think that this is going to be slower than explicitly loading
everything on the GPU but in reality, it isn’t, unless your model
doesn’t fit on the GPU. You’ll know, if that happens 😉
Wow! Look at it go:
$ ./build/bin/llama-cli -hf lodrick-the-lafted/dots.mocr-gguf:Q8_0 \
--image analyse_permis-1.png -st -p "Convert table to HTML"
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 1992 MiB):
Device 0: NVIDIA GeForce GT 1030, compute capability 6.1, VMM: yes, VRAM: 1992 MiB
Loading model...
▄▄ ▄▄
██ ██
██ ██ ▀▀█▄ ███▄███▄ ▀▀█▄ ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██ ██ ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
██ ██
▀▀ ▀▀
build : b1-98d2d28
model : lodrick-the-lafted/dots.mocr-gguf:Q8_0
modalities : text, vision
available commands:
/exit or Ctrl+C stop or exit
/regen regenerate the last response
/clear clear the chat history
/read <file> add a text file
/glob <pattern> add text files using globbing pattern
/image <file> add an image file
Loaded media from 'analyse_permis-1.png'
> Convert table to HTML
<html><body><table><thead><tr><td rowspan="2"></td><td colspan="2">Logements ...
[ Prompt: 39,3 t/s | Generation: 18,1 t/s ]
But is the output any good? Amazingly enough, llama-cli cannot
simply write the model’s output to a file or standard output without
printing its stupid logo everywhere (yes, really), so again, if you
don’t want to chat with your imaginary machine god but actually want
to write code that gets the computer to do useful work, you have to
either:
copy and paste from the terminal
run llama-server instead and talk to it over HTTP
use the llama.cpp Python wrapper (which is not super well maintained)
You should know that if you use llama-server, it has various default
settings which are exceedingly suboptimal for small GPUs, and will
probably cause it to segfault or loop endlessly after a couple of
prompts. Probably Ollama works better, if you
like installing software with curl | sudo bash (do not do this).
Basically you just need to disable any kind of multi-user or caching
capability, which you can do with the options --parallel 0 --cache-ram 0 --no-cache-prompt.
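As a rough sketch (not a battle-tested recipe), using llama-server’s OpenAI-compatible endpoint and assuming you have jq around to pull the text out of the JSON response:

# Start the server with multi-user and caching features disabled
./build/bin/llama-server -hf lodrick-the-lafted/dots.mocr-gguf:Q8_0 \
    --parallel 0 --cache-ram 0 --no-cache-prompt --port 8080 &

# Once the model has finished loading, POST a page image as a base64
# data URL and keep only the generated text
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": [
        {"type": "text", "text": "Convert table to HTML"},
        {"type": "image_url", "image_url":
         {"url": "data:image/png;base64,'"$(base64 -w0 analyse_permis-1.png)"'"}}
      ]}]}' | jq -r '.choices[0].message.content' > table.html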
A full example of how to use it will show up in the near future on
this blog, but in French so, I guess,
you might want to take a language course.
What about the NVidia Tesla P4?
But you say, what about Tesla P4s, which are spectacularly cheap and
plentiful? Yes, I tried this, so now you don’t have to. There are a
few important things to know about this card:
It is “fanless” but relies on active cooling from outside. It will
quickly heat up to 91 degrees Celsius and either:
throttle its performance down to nothing
burn a hole in your computer
all of the above
People have tried
various things
to cool it with varying degrees of success. Some of these things
are even for sale on eBay! Many sellers will even sell you a P4
with a fan installed! Crucially, however, all these cooling fans
take up extra space in your case, so the card may no longer fit in your
average $30 small form factor corporate surplus PC. Make sure to measure!
It requires certain PCIe and/or BIOS features that older computers
(1st generation Intel Core processors) don’t support. If your
computer doesn’t allow you to use the onboard video when a discrete
GPU is plugged in, it likely won’t be recognized at all and there
is nothing you can do about this. Notably it does not work
on a Compaq Elite 8100, but it does work in a Lenovo
ThinkCentre M82, and you can also fit a cooler in there if you
remove the hard drive cage.
NVidia (and thus the Debian nvidia-detect utility) claims it is
only supported by the 470 series drivers, but this is a lie (well, a
mistake in the documentation), as it works absolutely fine
with the 550 series on Debian 13. It’s a Pascal-architecture GPU
with compute capability 6.1, just like the P1000, but twice as fast
and with twice as much memory (if you can cool it).
So, while it may seem too good to be true, it may actually work well
for you with some fiddling. What I found in practice is that, if you
can keep it under 80°C, the P4 will run quite a bit faster than the
GTX1650 (75 tokens per second with dots.mocr), so it’s definitely
worth the trouble.
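To see whether your cooling contraption is holding up, nvidia-smi can log the temperature (and the clocks, which is where throttling shows up) while the model runs:

# Print temperature, power draw and SM clock every 2 seconds
nvidia-smi --query-gpu=temperature.gpu,power.draw,clocks.sm --format=csv -l 2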
The same caveats for the P4 apply also to the A2 and T4, which are
similarly passive-cooled cards, but also exceedingly expensive, so, in
summary, life is a symphony of contrasts. Note that the same cooling
fans that work for the P4 generally also work for the T4. But you
should just buy an Intel Arc Pro B50 instead if you’re going to spend
that much.
If you need to delve into the murky depths of a PDF to return with
spices and silk, or at least metadata, images, and yes, even text, I
have some excellent Free Software for you:
PLAYA-PDF and
PAVÉS. If you’d like to know how
this came to be, then continue reading. And if you need a consultant
for document intelligence tasks, large and small, I’m currently
available for contracts of all sorts!
“You’re nothing but a pack of indirect objects!”
As you may or may not know, I’m a computational
linguist by
trade and
training. In
2021, something else happened: I was elected to the town council in a
municipality of the greater St-Jérôme area, which shall remain
unnamed. Shortly thereafter, I quit my job as Principal Research
Scientist at a company (which shall also remain unnamed, and is now a
division of Microsoft) because it was clear that I couldn’t continue
to work full-time in Montréal while being a responsive and effective
public servant. I also found municipal politics to be a lot more
interesting and relevant than the slow, incremental improvement of
machine learning models for natural language understanding which I was
working on at the time.
And in the meantime, well, some other things happened…
One of the unintended consequences of this possibly ill-advised career
move was that I ended up becoming an expert of sorts on parsing and
manipulating PDF files. Of course, like any programmer, I did this in
the usual way, by starting a Free Software project.
Why PDF?
Once you get into the details of document management and archiving in
a municipality or other similar organization, it quickly becomes
obvious that, despite all the best efforts of decades of work on ODP,
OOXML, HTML, and various other purportedly universal document formats,
at the end of the day, the only thing you can count on is that a
document will always be available as a PDF. This is the unfortunate
result of Microsoft’s domination of the office software market: not
only is Office gratuitously incompatible in subtle ways with every
other alternative (free and
proprietary), but Office
isn’t even compatible with itself a lot of the time.
Why another PDF library?
Free Software, fundamentally, is about choice, and I had certain
criteria for the tool that I wanted to use, which were not fulfilled
by the available choices:
1. Permissive open-source license (BSD or MIT).
2. Written in Python and portable to various platforms.
3. Programmer-friendly interface.
4. Direct access to internal PDF data structures: not just text
extraction but also access to graphics state, images and metadata.
5. Fast and efficient, to the extent possible given #2.
The closest thing that I found at the time was
pdfplumber, which is still a
very nice library and definitely satisfies 1, 2 and 3 above! I even
contributed support for logical structure trees to it at some point.
Unfortunately, pdfplumber, like its underlying library
pdfminer.six and other
popular projects, is not very efficient, in particular because it
needs to parse the entirety of each page and construct all the data
structures before returning any useful information.
Enter PLAYA: LazY and Parallel Analyzer
This is the main reason for PLAYA-PDF’s existence: it is designed from
the ground up to be
“lazy”,
and only processes the bare minimum of data needed to get the
information you want. On the other hand, if you are lazy, it also
has an “eager” interface which can convert PDF metadata to
JSON quite efficiently:
import json, playa

with playa.open(path) as pdf:
    json.dumps(playa.asobj(pdf))
The other important aspect of PLAYA-PDF is built-in support for
parallel processing of PDFs, with a simple and easy-to-use interface:
with playa.open(path, max_workers=4) as pdf:
    texts = list(pdf.pages.map(playa.Page.extract_text))
Par-dessus la PLAYA, les PAVÉS!
Since the guiding principles of PLAYA-PDF are efficiency and the
absence of external dependencies, it doesn’t do any high-level
extraction tasks which require image processing, heuristics or machine
learning models.
For this reason I’ve also created
PAVÉS which will gradually support
more and more ways to do:
Structural and textual analysis of PDFs, such as table detection and
extraction, as well as extraction of rich text and logical structure.
Visualisation of objects in a PDF, as well as rendering of pages to images.
This second library is under construction but is already good enough
to do the analysis and extraction used in my projects like
ZONALDA and
SÈRAFIM.
Conclusion
If you are one of the tiny minority of people whom this might possibly
interest, then by all means feel free to give it a try! Take a look
at the documentation and the sample Jupyter notebooks to
get an idea of what it can do.
Of course you can also contribute to its development on
GitHub! (I may soon move
development to Codeberg or another independent
provider outside the USA, but the GitHub mirror will always remain).