Show HN: Sourcebot, an open-source Sourcegraph alternative

github.com

189 points by bshzzle 17 hours ago

Hi HN,

We’re Brendan and Michael, the creators of Sourcebot (https://github.com/sourcebot-dev/sourcebot). Sourcebot is an open-source code search tool that allows you to quickly search across many large codebases. Check out our demo video here: https://youtu.be/mrIFYSB_1F4, or try it for yourself on our demo site here: https://demo.sourcebot.dev

In prior roles, we both felt the pain of searching across hundreds of multi-million-line codebases. Local tools like grep were ill-suited, since you often only had a handful of codebases checked out at a time. Sourcegraph (https://sourcegraph.com/) solves this issue by indexing a collection of codebases in the background and exposing a web-based search interface. It is the de facto search solution for medium to large orgs, but is often cited as expensive ($49 per user / month) and recently went closed source (https://news.ycombinator.com/item?id=41296481). That’s why we built Sourcebot.

We designed Sourcebot to be:

- Easily deployed: we provide a single, self-contained Docker image (https://github.com/sourcebot-dev/sourcebot/pkgs/container/so...).

- Fast & scalable: designed to minimize search times (current average is ~73ms) across many large repositories.

- Cross code-host support: we currently support syncing public & private repositories in GitHub and GitLab.

- Quality UI: we like to think that a good looking dev-tool is more pleasant to use.

- Open source: Sourcebot is free to use by anyone.

Under the hood, we use Zoekt (https://github.com/sourcegraph/zoekt) as our code search engine, which was originally authored by Han-Wen Nienhuys and is now maintained by Sourcegraph (https://sourcegraph.com/blog/sourcegraph-accepting-zoekt-mai...). Zoekt works by building a trigram index from the source code, which enables extremely fast regular expression matching. Russ Cox has a great article on how trigram indexes work if you’re interested: https://swtch.com/~rsc/regexp/regexp4.html
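
For readers who want a feel for the trigram idea, here is a minimal sketch in Python. It is not Zoekt's implementation: the `TrigramIndex` class and its method names are hypothetical, it only handles literal substring queries, and it leaves out the regex-to-trigram query translation that Russ Cox's article describes. The index narrows the search to documents containing every trigram of the query, and a real scan then removes false positives.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy inverted index: trigram -> names of documents containing it.

    Hypothetical illustration only; not how Zoekt stores its index.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> set of document names
        self.docs = {}                    # document name -> full text

    def add(self, name, text):
        """Store the document and record each of its trigrams."""
        self.docs[name] = text
        for gram in trigrams(text):
            self.postings[gram].add(name)

    def candidates(self, literal):
        """Documents containing every trigram of literal.

        This can include false positives: the trigrams may all be
        present without appearing next to each other.
        """
        grams = trigrams(literal)
        if not grams:
            # Fewer than 3 characters: the index cannot narrow anything down.
            return set(self.docs)
        posting_sets = [self.postings.get(g, set()) for g in grams]
        return set.intersection(*posting_sets)

    def search(self, literal):
        """Confirm candidates with a real scan so only true matches remain."""
        pattern = re.compile(re.escape(literal))
        return sorted(
            name for name in self.candidates(literal)
            if pattern.search(self.docs[name])
        )
```

After adding a few documents with `index.add(name, text)`, calling `index.search("handle_req")` only scans the documents that survive the trigram intersection, rather than every document in the index. That pruning step is what keeps lookups fast as the corpus grows.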

In the shorter term, there are several improvements we want to make, like:

- Improving how we communicate indexing progress (this is currently non-existent so it’s not obvious how long things will take)

- UX improvements like search history, query syntax highlighting & suggestions, etc.

- Small QOL improvements like bookmarking code snippets.

- Support for more code hosts (e.g., BitBucket, SourceForge, ADO, etc.)

In the longer term, we want to investigate how we could go beyond just traditional code search by leveraging machine learning to enable experiences like semantic code search (“where is system X located?”) and code explanations (“how does system X interact with system Y?”). You could think of this as a copilot being embedded into Sourcebot. Our hunch is that this will be useful to devs, especially when packaged with the traditional code search, but let us know what you think.

Give it a try: https://github.com/sourcebot-dev/sourcebot. Cheers!

peterldowns 7 hours ago

Re-asking [0] as a top-level question, since it has gone unanswered: do you intend to make a business out of this project in some way, or is it a "real" open source project?

I know that intentions can change, but I'm curious how you see it. Sourcegraph was pretty clearly always going to be a business-type-of-project, and like most business projects, relicensed everything to their custom enterprise license. Originally it was Apache 2 [1].

I love open source and I write a lot of it myself [2]. I use the MIT license, just like you've done here, and I admire that. I don't think you owe me or anyone else anything, and the MIT license makes that clear.

I am very interested in this project and I'd love to extend and contribute to it, but only if it's an actual open source project. Seems like every devtools-focused startup these days calls themselves "open source" but fails to actually build a community, because in reality it's just a marketing gimmick. Because the project is actually a company, the people involved never try very hard to build a community of contributors. When the company invariably cannot make money with an open source product, the code gets relicensed to be closed-source. The few people who had contributed end up getting played. That's what happened to Sourcegraph!

So: open source, or open source "for now"?

[0]: https://news.ycombinator.com/item?id=41715776

[1]: https://github.com/sourcegraph/sourcegraph-public-snapshot/c...

[2]: https://github.com/peterldowns

hanwenn 2 hours ago

Hi!

Sorry for not responding to your email, I was swamped.

I looked through the source code, but I can only find UI (i.e. browser) code. Does this do anything beyond delivering a more functional and prettier UI on top of an existing Zoekt deployment? If not, everybody would be better served if you tried to improve the UI inside Zoekt, which currently is a live demonstration of (my lack of) web app programming skills.

Have you thought about how you will achieve your further goals (e.g. semantic search)? That will require server-side changes, but you currently have no Go code at all.

morgante 14 hours ago

Awesome to see another open source player in the space, especially after Sourcegraph went closed source.

It looks like you're working on this full-time (and it's a lot of work to build great code search, as I know from working on my own product).

What are your plans for monetizing / building a sustainable business without inevitably going closed source like Sourcegraph?

  • bshzzle 12 hours ago

    Currently, we don't have any plans of monetizing - the main focus for us right now is building something that people want to use :)

    • peterldowns 10 hours ago

      Do you plan on eventually attempting to monetize in some way, or is this open source as in free software as in you legitimately are just creating a new open source project?

      I understand intentions can change, but there's a difference, and I'm curious to know the answer.

threecheese 15 hours ago

Regarding your response to “why not use an IDE?”: are there any other product-like use cases that interest you? The one you mention - search across many repositories - makes a lot of sense for organizations that have (for example) a GitHub Enterprise installation and want to investigate or make changes across multiple components. This is definitely relevant to me, so I wonder what other cool things I can do with it.

  • bshzzle 15 hours ago

    I think in the immediate term, we would like to talk to as many people as we can that have this "search across many repos" problem such that we can dial in the core search experience.

    Looking beyond the immediate, I think there is a lot of fertile ground with respect to making engineering teams more efficient beyond just regular code search. Semantic code search, for example, is one of those features that I really wish I had when I was at my last job - it would have made onboarding onto new codebases much easier.

    Would love to hear more about your use cases: brendan@sourcebot.dev

maxloh 4 hours ago

Why not just fork Sourcegraph, instead of building the product from the ground up?

zdw 11 hours ago

Does this make a copy of each repo on ingest?

Can it work against in-place repos, for example if hosted on the same server as a code forge installation?

  • bshzzle 10 hours ago

    Yea exactly - on ingest it clones the repos and will periodically fetch new revisions.

    Currently we don't support in-place repos, but feel free to file an issue and we'd be happy to take a look.

schreiaj 6 hours ago

Can you add repos after starting the container? What about persisting indexes across restarts?

Still, neat. Glad to have an easy to deploy open source tool like this.

jmakov 16 hours ago

Can somebody share the use case of this? Why not just use your IDE?

  • bshzzle 16 hours ago

    yea it's a fair question - an IDE is often more convenient when you have the code checked out locally. This becomes a pain when you work in an organization with potentially hundreds of repositories that you need to search across (e.g., an org stores its 100+ microservices in separate repos, and you need to find all places where they make a request to your service).

    • Hackbraten 3 hours ago

      I use ghorg in tandem with ripgrep to address that problem. The former is for checking out the main branches of all repositories, the latter to perform the actual search.

  • eptcyka 14 hours ago

    I cannot run Xcode on Linux, I cannot run Visual Studio on Linux, I might not have an IDE set up for the language that I want to inspect. Many reasons. Also, some languages practically require arbitrary code execution to make a build, which I'd much prefer to shove into an isolated VM.

  • metadaemon 15 hours ago

    Finding examples of how others implement similar logic is my biggest use case for code searching, but since GitHub copied Sourcegraph, I don't have much of a need for these self-hosted solutions.

planb 15 hours ago

Great work! Any plans to add Gitea/Forgejo (self-hosted) support?

ergocoder 8 hours ago

What a milestone. Sourcegraph is big enough to have its own open source clone.

cprogrammer 10 hours ago

Does it support Perforce? I couldn't find it in the schema in the repo.

  • bshzzle 8 hours ago

    No, just GitLab and GitHub atm - but please feel free to file an issue for Perforce support.

TavsiE9s 14 hours ago

Any plans for non-GitHub/GitLab integrations? Gitea/Gogs/etc. maybe?

  • bshzzle 14 hours ago

    yes definitely - mind opening an issue so we can track it?

ashobeiri 17 hours ago

This is really exciting. Happy to see someone building an open source solution in this space

j4coh 17 hours ago

Cool to see someone carrying on the dream after Sourcegraph lost their way.

  • bastawhiz 13 hours ago

    I haven't followed SG closely. Other than licensing, what have they done to fall out of favor?

    • Starlevel004 11 hours ago

      They started aggressively pushing their (bad) copilot competitor.

      • Squarex 29 minutes ago

        What's wrong with Cody? I find it better than Copilot.

mattfat5 16 hours ago

This is well done, thanks for sharing.

IshKebab 14 hours ago

Nice! Still not quite as good as grep.app from an interface point of view. They have instant search-as-you-type results over all of GitHub.

It's not open source but I use it all the time. Far superior to GitHub's search.

  • richardw 12 hours ago

    Anyone know how companies like this keep tabs on so many GitHub repos? I assume very distributed crawling/cloning.

asdev 13 hours ago

Sourcegraph is dead with the advent of LLMs and AI coding tools, right? GitHub cross-repo search is also not bad anymore.

  • esafak 12 hours ago

    Wrong. Unless you want to feed the LLM your entire codebase, which is usually infeasible, you need to be able to retrieve relevant context, which relies on understanding the codebase, as Sourcegraph does. Sourcegraph has a product that does precisely this, called Cody.