This is a piece of cake using GitHub’s excellent permission system.
(I’m joking, of course. Service accounts are nowhere to be seen. OAuth can’t even scope to an organization, let alone a repository. And this whole github.dev thing illustrates that you don’t even need to explicitly grant permission to issue broadly scoped tokens.)
Also, forking is pretty heavyweight just to launch something that, for all anyone knows before starting actual work, is being used as a read only viewer.
I think it's ok to be signed-in when opening your own repositories, but definitely not when opening repositories from other accounts. And also the webview keyboard shortcut thing needs to be fixed to only allow harmless keybinds and NOT propagate to any keydown handler. Also on desktop it should be removed in favor of Electron intercepting directly. And on web it should probably disabled by the default.
it absolutely is advertising, you can even call it a growth hack if you want to feel good about it
co-authorship implies ability to hold author rights, which afaik an algorithm can't do.
are folks adding speakeasy/stainless co-authorship lines to their commits? should i add alembic as a co-author after making some changes to the database schema?
That's exactly what you're supposed to do - if a tool generated the code in a commit, you should be using a commit trailer for that. Whether that's uniffi, an rpc preprocessor, dependabot or renovate, or some AI tool.
1. count for claude as co-author: 25M
2. count for speakeasy as co-author: 917
3. count for stainless as co-author: 6.2k
4. count for alembic as co-author: *Your search did not match any commits*
As the sibling notes, that would usually be marked as Generated-By or Generator or similar tags. Claude is only using "Co-Authored-By" for the same reason that Anthropic is calling claude "he", not "it": to anthropomorphize the machine in the public's perception.
assume no deep learning, of any kind is involved: you write a program, you are the author, right? you compile the code, are you still the author? do you have to attribute co-authorship to gcc/llvm/oracle?
i think not, you are still the author, same as when anyone else uses an llm to write code.
I agree. It's both an ad and a useful signal of where the code came from or how it was created.
Just like the default iPhone email signature, it's an ad and a hint that the author was typing with their thumbs, so it's probably a brief auto-corrected message for that reason.
The expected purpose of websites is to spread information, so whether users get it by making a request to your website or to Google is irrelevant. In fact, if they get it from Google it's better because it reduces website load.
If instead the purpose of your website is to manipulate users for financial gain (for instance by showing media attempting to manipulate their purchasing decisions, after receiving a bribe from a vendor), and the information is just a way to lure users, then maybe this malicious business model will finally be no longer possible.
On the one hand, this is an interesting observation. The internet as it exists today is filled with product placement slop and real information is a rare commodity. The loss of these kinds of sites is a blessing.
On the other hand, Google played a big role in creating this problem in the first place. The search results have trended downwards towards this kind of SEO slop for the last 10 years and Google has been unable (or unwilling) to fix it. Plus, the AI results Google shows are not free from commercial influence and will probably only get worse in this regard. Except now this money will flow to Google instead of $random_internet_spammer. I don't know if that's any better.
The idea that Google won't eventually "manipulate users for financial gain" with Gemini is comically naive. That's how they're going to make money from this thing.
If they want a dependently typed language, why not use one? Lean is good, and I don't think it has any significant downside wrt Haskell other than more limited library ecosystem (but I guess AI can translate Haskell libraries to Lean very effectively).
Seems bullshit, apparently it only works with TPM-only mode, which is obviously insecure (it relies on neither the OS nor the hardware being exploitable, on a random Windows PC...), and not worth building a backdoor for.
The way one would backdoor something like Bitlocker is to encrypt the disk encryption key with a (post-quantum) public key for which only the backdoor owner has the private key for, and then put it on a place on disk that is unused by the filesystem.
The escape algorithm here is very simple, you remove special tokens from the runtime tokenizer's vocabulary so that it's forced to encode them as multiple non-special tokens. (That doesn't actually mean the LLM won't treat them as special tokens though, so this isn't sufficient on it's own.)
Cool technique, but I'm not sure I'd call it simple.
Doing this means that you can't just tokenize the string output of the chat template as one big string. You might need to tokenize things separately, and combine them after.
If you want the token sequence, you ought to avoid discarding it when you produce the string output. This is because, even ignoring special tokens, different token sequences map to the same strings.
From a space perspective, this is actually better because tokenization tends to compress text quite well. For example, common tokens in English text take up ~4 characters on average (expands to 32 bits), but only take up a fraction of that to store (15-18 bits/token depending on vocabulary size)
In fact it appears that designing the tokens as a text compression encoding is a decent approach, since it's roughly what some LLMs do. For example, early GPT tokenizers followed byte pair encoding to create the vocabulary, which is a text compression algorithm from the 90s.
Good catch. We'd have to integrate with jinja2 (or similar) and tokenzize as we format the context, so that we know which spans are instructions and which spans are data. Which makes it more complex but still very achievable.
You're right, there must be a good and simple way to do it.
Obviously the prefix-with-backslash convention won't do it.
The escaping system could be something like inserting a character on the second position in the text repr, and reversing that on output too if it matches an escaped known special token.
Changing the vocab on the fly requires tokenizing things separately, breaking the chat template.
Anecdotally, even claude code has an anneurism sometimes when listing special tokens. Idk exactly what claude's <eos> token is, but I'm fairly sure I've seen it stop generation when it tried to generate it before.
I should also say that I've (clearly) not thought about this deeply. There should be a simpler way to do it.
That is only the case if people enter exclusive relationships. But if someone has access to a dating app or system that works really well, there is little reason to do that.
reply