Create search engine mappings and indexing data for images
Closed, Resolved (Public)

Description

Create the index fields required for storing image metadata, along with the code that indexes the relevant data.

Event Timeline

Change 310472 had a related patch set uploaded (by Smalyshev):
[WIP] Create fields & data for image data

https://gerrit.wikimedia.org/r/310472

@dcausse I wonder whether we should put a limit on file text, or at least set some special options on the file text field. It can be huge, e.g. for large PDFs. Is it OK to keep all of it, and do we need any special options?

I think we already index PDFs, no?
Basically, the only limit we have is the size of the request body sent to Elastic, which is limited to 100 MB by default; the HTTP request will fail if we try to send 100 MB of JSON to Elastic.
I don't know if we need to worry here. I've seen Lucene index PDFs with more than 700 pages of text...
Maybe we already have a limit on the MySQL blob?

I'm not too worried about MySQL here. If Elastic is fine with it, then it's fine.
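
For illustration only, here is a minimal Python sketch of how the request body limit discussed above could be checked before a document is sent. It is not taken from the patch. The 100 MB constant mirrors the default of Elasticsearch's `http.max_content_length` setting; the field name `file_text`, the helper names, and the truncate-to-fit strategy are assumptions made for this example.

```python
import json

# Default of Elasticsearch's http.max_content_length setting (100 MB).
MAX_REQUEST_BYTES = 100 * 1024 * 1024


def request_body_size(doc: dict) -> int:
    """Return the size in bytes of the UTF-8 JSON encoding of doc."""
    return len(json.dumps(doc, ensure_ascii=False).encode("utf-8"))


def fits_in_request(doc: dict, limit: int = MAX_REQUEST_BYTES) -> bool:
    """True if the serialized document is no larger than limit bytes."""
    return request_body_size(doc) <= limit


def truncate_file_text(doc: dict, field: str = "file_text",
                       limit: int = MAX_REQUEST_BYTES) -> dict:
    """Return a copy of doc with `field` cut to the longest prefix that fits.

    `field` is a hypothetical name for the extracted file text. If the
    document is too large even with an empty text field, the copy is
    returned with an empty field and will still exceed the limit.
    """
    if fits_in_request(doc, limit):
        return dict(doc)
    text = doc.get(field, "")
    lo, hi = 0, len(text)
    # Binary search over the prefix length: the serialized size grows
    # monotonically with the number of characters kept.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if fits_in_request({**doc, field: text[:mid]}, limit):
            lo = mid
        else:
            hi = mid - 1
    return {**doc, field: text[:lo]}
```

A caller could run `truncate_file_text(doc)` just before building the indexing request. Serializing the document on every search step is wasteful for very large texts, so a real implementation would more likely apply a fixed cap on the extracted text instead.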

Change 310472 merged by jenkins-bot:
Create fields & data for image/file data indexing

https://gerrit.wikimedia.org/r/310472