More

lifis · 2026-06-03T08:21:06 1780474866

You can just fork the repository, give it access to the fork and then merge what you want

amluto · 2026-06-03T13:59:29 1780495169

This is a piece of cake using GitHub’s excellent permission system.

(I’m joking, of course. Service accounts are nowhere to be seen. OAuth can’t even scope to an organization, let alone a repository. And this whole github.dev thing illustrates that you don’t even need to explicitly grant permission to issue broadly scoped tokens.)

Also, forking is pretty heavyweight just to launch something that, for all anyone knows before starting actual work, is being used as a read only viewer.

lifis · 2026-06-03T08:19:34 1780474774

I think it's ok to be signed-in when opening your own repositories, but definitely not when opening repositories from other accounts. And also the webview keyboard shortcut thing needs to be fixed to only allow harmless keybinds and NOT propagate to any keydown handler. Also on desktop it should be removed in favor of Electron intercepting directly. And on web it should probably disabled by the default.

lifis · 2026-05-26T19:12:50 1779822770

Huh? It's not advertising, it's disclosure that the code was not fully (or at all) written by you.

0123456789ABCDE · 2026-05-26T19:32:51 1779823971

it absolutely is advertising, you can even call it a growth hack if you want to feel good about it

co-authorship implies ability to hold author rights, which afaik an algorithm can't do.

are folks adding speakeasy/stainless co-authorship lines to their commits? should i add alembic as a co-author after making some changes to the database schema?

  Co-authored-by: buf generate <noreply@github.com>

kuschku · 2026-05-26T20:09:23 1779826163

That's exactly what you're supposed to do - if a tool generated the code in a commit, you should be using a commit trailer for that. Whether that's uniffi, an rpc preprocessor, dependabot or renovate, or some AI tool.

0123456789ABCDE · 2026-05-26T20:56:46 1779829006

> That's exactly what you're supposed to do

this does not seem to match reality, see:

  1. count for claude as co-author: 25M
  2. count for speakeasy as co-author: 917
  3. count for stainless as co-author: 6.2k
  4. count for alembic as co-author: *Your search did not match any commits*

ex. search: https://github.com/search?q=%22Co-Authored-By%3A+Claude%22+%...

kuschku · 2026-05-27T01:32:19 1779845539

As the sibling notes, that would usually be marked as Generated-By or Generator or similar tags. Claude is only using "Co-Authored-By" for the same reason that Anthropic is calling claude "he", not "it": to anthropomorphize the machine in the public's perception.

Conan_Kudo · 2026-05-26T20:28:54 1779827334

There's a Generated-by trailer for that sort of thing.

cobbal · 2026-05-26T19:54:44 1779825284

If co-authorship implies holding rights, then what gives the "primary author" who just prompted for the code the right to add their own name?

0123456789ABCDE · 2026-05-26T20:40:12 1779828012

the fact they authored the code?

assume no deep learning, of any kind is involved: you write a program, you are the author, right? you compile the code, are you still the author? do you have to attribute co-authorship to gcc/llvm/oracle?

i think not, you are still the author, same as when anyone else uses an llm to write code.

ianal

ellyagg · 2026-05-26T19:13:41 1779822821

It’s both

mbreese · 2026-05-26T19:17:38 1779823058

> Sent from my iPhone.

I agree. It's both an ad and a useful signal of where the code came from or how it was created.

Just like the default iPhone email signature, it's an ad and a hint that the author was typing with their thumbs, so it's probably a brief auto-corrected message for that reason.

amarant · 2026-05-26T19:26:25 1779823585

The iPhone analogy is very apt and accurate: it's ~95% advertising and ~5% useful signal.

Which for my repositories means I want ~95% less of it in my commit history. I'm prepared to round up for simplicity. But to each their own.

layer8 · 2026-05-26T19:57:37 1779825457

You could do that without naming the AI product.

lifis · 2026-05-20T02:51:21 1779245481

The expected purpose of websites is to spread information, so whether users get it by making a request to your website or to Google is irrelevant. In fact, if they get it from Google it's better because it reduces website load.

If instead the purpose of your website is to manipulate users for financial gain (for instance by showing media attempting to manipulate their purchasing decisions, after receiving a bribe from a vendor), and the information is just a way to lure users, then maybe this malicious business model will finally be no longer possible.

AlexandrB · 2026-05-20T17:38:50 1779298730

On the one hand, this is an interesting observation. The internet as it exists today is filled with product placement slop and real information is a rare commodity. The loss of these kinds of sites is a blessing.

On the other hand, Google played a big role in creating this problem in the first place. The search results have trended downwards towards this kind of SEO slop for the last 10 years and Google has been unable (or unwilling) to fix it. Plus, the AI results Google shows are not free from commercial influence and will probably only get worse in this regard. Except now this money will flow to Google instead of $random_internet_spammer. I don't know if that's any better.

The idea that Google won't eventually "manipulate users for financial gain" with Gemini is comically naive. That's how they're going to make money from this thing.

lifis · 2026-05-18T13:37:57 1779111477

If they want a dependently typed language, why not use one? Lean is good, and I don't think it has any significant downside wrt Haskell other than more limited library ecosystem (but I guess AI can translate Haskell libraries to Lean very effectively).

lifis · 2026-05-18T11:14:52 1779102892

AI is not yet at the superhuman stage where you can tell it "clone Photoshop" and get a perfect result within a day for almost free

lifis · 2026-05-17T17:34:24 1779039264

Seems bullshit, apparently it only works with TPM-only mode, which is obviously insecure (it relies on neither the OS nor the hardware being exploitable, on a random Windows PC...), and not worth building a backdoor for.

The way one would backdoor something like Bitlocker is to encrypt the disk encryption key with a (post-quantum) public key for which only the backdoor owner has the private key for, and then put it on a place on disk that is unused by the filesystem.

lifis · 2026-05-14T22:13:36 1778796816

Surely one can just escape the input, no? Seems astonishing if someone isn't doing that

maxbond · 2026-05-14T22:21:16 1778797276

The escape algorithm here is very simple, you remove special tokens from the runtime tokenizer's vocabulary so that it's forced to encode them as multiple non-special tokens. (That doesn't actually mean the LLM won't treat them as special tokens though, so this isn't sufficient on it's own.)

bashbjorn · 2026-05-14T23:47:08 1778802428

Cool technique, but I'm not sure I'd call it simple.

Doing this means that you can't just tokenize the string output of the chat template as one big string. You might need to tokenize things separately, and combine them after.

sebastianmestre · 2026-05-15T15:26:56 1778858816

If you want the token sequence, you ought to avoid discarding it when you produce the string output. This is because, even ignoring special tokens, different token sequences map to the same strings.

From a space perspective, this is actually better because tokenization tends to compress text quite well. For example, common tokens in English text take up ~4 characters on average (expands to 32 bits), but only take up a fraction of that to store (15-18 bits/token depending on vocabulary size)

In fact it appears that designing the tokens as a text compression encoding is a decent approach, since it's roughly what some LLMs do. For example, early GPT tokenizers followed byte pair encoding to create the vocabulary, which is a text compression algorithm from the 90s.

maxbond · 2026-05-15T21:27:13 1778880433

Good catch. We'd have to integrate with jinja2 (or similar) and tokenzize as we format the context, so that we know which spans are instructions and which spans are data. Which makes it more complex but still very achievable.

bashbjorn · 2026-05-14T23:45:10 1778802310

You're right, there must be a good and simple way to do it.

Obviously the prefix-with-backslash convention won't do it. The escaping system could be something like inserting a character on the second position in the text repr, and reversing that on output too if it matches an escaped known special token.

Changing the vocab on the fly requires tokenizing things separately, breaking the chat template.

Anecdotally, even claude code has an anneurism sometimes when listing special tokens. Idk exactly what claude's <eos> token is, but I'm fairly sure I've seen it stop generation when it tried to generate it before.

I should also say that I've (clearly) not thought about this deeply. There should be a simpler way to do it.

lifis · 2026-05-14T12:47:34 1778762854

That is only the case if people enter exclusive relationships. But if someone has access to a dating app or system that works really well, there is little reason to do that.

lifis · 2026-05-13T10:16:31 1778667391

Not clear what it actually does, but seems equivalent to a global right click menu with "Chat with AI about this"