pages-server/server/gitea/cache.go

157 lines
4.2 KiB
Go
Raw Normal View History

package gitea
import (
"bytes"
"fmt"
"io"
"net/http"
"strconv"
"strings"
"time"
"github.com/rs/zerolog/log"
"codeberg.org/codeberg/pages/server/cache"
)
const (
// defaultBranchCacheTimeout specifies the timeout for the default branch cache. It can be quite long.
defaultBranchCacheTimeout = 15 * time.Minute
// branchExistenceCacheTimeout specifies the timeout for the branch timestamp & existence cache. It should be shorter
// than fileCacheTimeout, as that gets invalidated if the branch timestamp has changed. That way, repo changes will be
// picked up faster, while still allowing the content to be cached longer if nothing changes.
branchExistenceCacheTimeout = 5 * time.Minute
// fileCacheTimeout specifies the timeout for the file content cache - you might want to make this quite long, depending
// on your available memory.
// TODO: move as option into cache interface
fileCacheTimeout = 5 * time.Minute
// ownerExistenceCacheTimeout specifies the timeout for the existence of a repo/org
ownerExistenceCacheTimeout = 5 * time.Minute
// fileCacheSizeLimit limits the maximum file size that will be cached, and is set to 1 MB by default.
fileCacheSizeLimit = int64(1000 * 1000)
)
type FileResponse struct {
Exists bool
IsSymlink bool
ETag string
MimeType string
Body []byte
}
func FileResponseFromMetadataString(metadataString string) FileResponse {
parts := strings.Split(metadataString, "\n")
res := FileResponse{
Exists: parts[0] == "true",
IsSymlink: parts[1] == "true",
ETag: parts[2],
MimeType: parts[3],
}
return res
}
func (f FileResponse) MetadataAsString() string {
return strconv.FormatBool(f.Exists) + "\n" +
strconv.FormatBool(f.IsSymlink) + "\n" +
f.ETag + "\n" +
f.MimeType + "\n"
}
func (f FileResponse) IsEmpty() bool {
FIX blank internal pages (#164) (#292) Hello 👋 since it affected my deployment of the pages server I started to look into the problem of the blank pages and think I found a solution for it: 1. There is no check if the file response is empty, neither in cache retrieval nor in writing of a cache. Also the provided method for checking for empty responses had a bug. 2. I identified the redirect response to be the issue here. There is a cache write with the full cache key (e. g. rawContent/user/repo|branch|route/index.html) happening in the handling of the redirect response. But the written body here is empty. In the triggered request from the redirect response the server then finds a cache item to the key and serves the empty body. A quick fix is the check for empty file responses mentioned in 1. 3. The decision to redirect the user comes quite far down in the upstream function. Before that happens a lot of stuff that may not be important since after the redirect response comes a new request anyway. Also, I suspect that this causes the caching problem because there is a request to the forge server and its error handling with some recursions happening before. I propose to move two of the redirects before "Preparing" 4. The recursion in the upstream function makes it difficult to understand what is actually happening. I added some more logging to have an easier time with that. 5. I changed the default behaviour to append a trailing slash to the path to true. In my tested scenarios it happened anyway. This way there is no recursion happening before the redirect. I am not developing in go frequently and rarely contribute to open source -> so feedback of all kind is appreciated closes #164 Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/292 Reviewed-by: 6543 <6543@obermui.de> Reviewed-by: crapStone <codeberg@crapstone.dev> Co-authored-by: Hoernschen <julian.hoernschemeyer@mailbox.org> Co-committed-by: Hoernschen <julian.hoernschemeyer@mailbox.org>
2024-02-26 23:21:42 +01:00
return len(f.Body) == 0
}
func (f FileResponse) createHttpResponse(cacheKey string) (header http.Header, statusCode int) {
header = make(http.Header)
if f.Exists {
statusCode = http.StatusOK
} else {
statusCode = http.StatusNotFound
}
if f.IsSymlink {
header.Set(giteaObjectTypeHeader, objTypeSymlink)
}
header.Set(ETagHeader, f.ETag)
header.Set(ContentTypeHeader, f.MimeType)
header.Set(ContentLengthHeader, fmt.Sprintf("%d", len(f.Body)))
header.Set(PagesCacheIndicatorHeader, "true")
log.Trace().Msgf("fileCache for %q used", cacheKey)
return header, statusCode
}
type BranchTimestamp struct {
Branch string
Timestamp time.Time
}
type writeCacheReader struct {
originalReader io.ReadCloser
buffer *bytes.Buffer
FIX blank internal pages (#164) (#292) Hello 👋 since it affected my deployment of the pages server I started to look into the problem of the blank pages and think I found a solution for it: 1. There is no check if the file response is empty, neither in cache retrieval nor in writing of a cache. Also the provided method for checking for empty responses had a bug. 2. I identified the redirect response to be the issue here. There is a cache write with the full cache key (e. g. rawContent/user/repo|branch|route/index.html) happening in the handling of the redirect response. But the written body here is empty. In the triggered request from the redirect response the server then finds a cache item to the key and serves the empty body. A quick fix is the check for empty file responses mentioned in 1. 3. The decision to redirect the user comes quite far down in the upstream function. Before that happens a lot of stuff that may not be important since after the redirect response comes a new request anyway. Also, I suspect that this causes the caching problem because there is a request to the forge server and its error handling with some recursions happening before. I propose to move two of the redirects before "Preparing" 4. The recursion in the upstream function makes it difficult to understand what is actually happening. I added some more logging to have an easier time with that. 5. I changed the default behaviour to append a trailing slash to the path to true. In my tested scenarios it happened anyway. This way there is no recursion happening before the redirect. I am not developing in go frequently and rarely contribute to open source -> so feedback of all kind is appreciated closes #164 Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/292 Reviewed-by: 6543 <6543@obermui.de> Reviewed-by: crapStone <codeberg@crapstone.dev> Co-authored-by: Hoernschen <julian.hoernschemeyer@mailbox.org> Co-committed-by: Hoernschen <julian.hoernschemeyer@mailbox.org>
2024-02-26 23:21:42 +01:00
fileResponse *FileResponse
cacheKey string
cache cache.ICache
hasError bool
doNotCache bool
2024-04-16 22:38:48 +02:00
complete bool
}
func (t *writeCacheReader) Read(p []byte) (n int, err error) {
FIX blank internal pages (#164) (#292) Hello 👋 since it affected my deployment of the pages server I started to look into the problem of the blank pages and think I found a solution for it: 1. There is no check if the file response is empty, neither in cache retrieval nor in writing of a cache. Also the provided method for checking for empty responses had a bug. 2. I identified the redirect response to be the issue here. There is a cache write with the full cache key (e. g. rawContent/user/repo|branch|route/index.html) happening in the handling of the redirect response. But the written body here is empty. In the triggered request from the redirect response the server then finds a cache item to the key and serves the empty body. A quick fix is the check for empty file responses mentioned in 1. 3. The decision to redirect the user comes quite far down in the upstream function. Before that happens a lot of stuff that may not be important since after the redirect response comes a new request anyway. Also, I suspect that this causes the caching problem because there is a request to the forge server and its error handling with some recursions happening before. I propose to move two of the redirects before "Preparing" 4. The recursion in the upstream function makes it difficult to understand what is actually happening. I added some more logging to have an easier time with that. 5. I changed the default behaviour to append a trailing slash to the path to true. In my tested scenarios it happened anyway. This way there is no recursion happening before the redirect. I am not developing in go frequently and rarely contribute to open source -> so feedback of all kind is appreciated closes #164 Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/292 Reviewed-by: 6543 <6543@obermui.de> Reviewed-by: crapStone <codeberg@crapstone.dev> Co-authored-by: Hoernschen <julian.hoernschemeyer@mailbox.org> Co-committed-by: Hoernschen <julian.hoernschemeyer@mailbox.org>
2024-02-26 23:21:42 +01:00
log.Trace().Msgf("[cache] read %q", t.cacheKey)
n, err = t.originalReader.Read(p)
2024-04-16 22:38:48 +02:00
if err == io.EOF {
t.complete = true
}
if err != nil && err != io.EOF {
log.Trace().Err(err).Msgf("[cache] original reader for %q has returned an error", t.cacheKey)
t.hasError = true
} else if n > 0 {
if t.buffer.Len()+n > int(fileCacheSizeLimit) {
t.doNotCache = true
t.buffer.Reset()
} else {
_, _ = t.buffer.Write(p[:n])
}
}
return
}
func (t *writeCacheReader) Close() error {
2024-04-16 22:38:48 +02:00
doWrite := !t.hasError && !t.doNotCache && t.complete
FIX blank internal pages (#164) (#292) Hello 👋 since it affected my deployment of the pages server I started to look into the problem of the blank pages and think I found a solution for it: 1. There is no check if the file response is empty, neither in cache retrieval nor in writing of a cache. Also the provided method for checking for empty responses had a bug. 2. I identified the redirect response to be the issue here. There is a cache write with the full cache key (e. g. rawContent/user/repo|branch|route/index.html) happening in the handling of the redirect response. But the written body here is empty. In the triggered request from the redirect response the server then finds a cache item to the key and serves the empty body. A quick fix is the check for empty file responses mentioned in 1. 3. The decision to redirect the user comes quite far down in the upstream function. Before that happens a lot of stuff that may not be important since after the redirect response comes a new request anyway. Also, I suspect that this causes the caching problem because there is a request to the forge server and its error handling with some recursions happening before. I propose to move two of the redirects before "Preparing" 4. The recursion in the upstream function makes it difficult to understand what is actually happening. I added some more logging to have an easier time with that. 5. I changed the default behaviour to append a trailing slash to the path to true. In my tested scenarios it happened anyway. This way there is no recursion happening before the redirect. I am not developing in go frequently and rarely contribute to open source -> so feedback of all kind is appreciated closes #164 Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/292 Reviewed-by: 6543 <6543@obermui.de> Reviewed-by: crapStone <codeberg@crapstone.dev> Co-authored-by: Hoernschen <julian.hoernschemeyer@mailbox.org> Co-committed-by: Hoernschen <julian.hoernschemeyer@mailbox.org>
2024-02-26 23:21:42 +01:00
fc := *t.fileResponse
fc.Body = t.buffer.Bytes()
if doWrite {
err := t.cache.Set(t.cacheKey+"|Metadata", []byte(fc.MetadataAsString()), fileCacheTimeout)
if err != nil {
log.Trace().Err(err).Msgf("[cache] writer for %q has returned an error", t.cacheKey+"|Metadata")
}
err = t.cache.Set(t.cacheKey+"|Body", fc.Body, fileCacheTimeout)
FIX blank internal pages (#164) (#292) Hello 👋 since it affected my deployment of the pages server I started to look into the problem of the blank pages and think I found a solution for it: 1. There is no check if the file response is empty, neither in cache retrieval nor in writing of a cache. Also the provided method for checking for empty responses had a bug. 2. I identified the redirect response to be the issue here. There is a cache write with the full cache key (e. g. rawContent/user/repo|branch|route/index.html) happening in the handling of the redirect response. But the written body here is empty. In the triggered request from the redirect response the server then finds a cache item to the key and serves the empty body. A quick fix is the check for empty file responses mentioned in 1. 3. The decision to redirect the user comes quite far down in the upstream function. Before that happens a lot of stuff that may not be important since after the redirect response comes a new request anyway. Also, I suspect that this causes the caching problem because there is a request to the forge server and its error handling with some recursions happening before. I propose to move two of the redirects before "Preparing" 4. The recursion in the upstream function makes it difficult to understand what is actually happening. I added some more logging to have an easier time with that. 5. I changed the default behaviour to append a trailing slash to the path to true. In my tested scenarios it happened anyway. This way there is no recursion happening before the redirect. I am not developing in go frequently and rarely contribute to open source -> so feedback of all kind is appreciated closes #164 Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/292 Reviewed-by: 6543 <6543@obermui.de> Reviewed-by: crapStone <codeberg@crapstone.dev> Co-authored-by: Hoernschen <julian.hoernschemeyer@mailbox.org> Co-committed-by: Hoernschen <julian.hoernschemeyer@mailbox.org>
2024-02-26 23:21:42 +01:00
if err != nil {
log.Trace().Err(err).Msgf("[cache] writer for %q has returned an error", t.cacheKey+"|Body")
FIX blank internal pages (#164) (#292) Hello 👋 since it affected my deployment of the pages server I started to look into the problem of the blank pages and think I found a solution for it: 1. There is no check if the file response is empty, neither in cache retrieval nor in writing of a cache. Also the provided method for checking for empty responses had a bug. 2. I identified the redirect response to be the issue here. There is a cache write with the full cache key (e. g. rawContent/user/repo|branch|route/index.html) happening in the handling of the redirect response. But the written body here is empty. In the triggered request from the redirect response the server then finds a cache item to the key and serves the empty body. A quick fix is the check for empty file responses mentioned in 1. 3. The decision to redirect the user comes quite far down in the upstream function. Before that happens a lot of stuff that may not be important since after the redirect response comes a new request anyway. Also, I suspect that this causes the caching problem because there is a request to the forge server and its error handling with some recursions happening before. I propose to move two of the redirects before "Preparing" 4. The recursion in the upstream function makes it difficult to understand what is actually happening. I added some more logging to have an easier time with that. 5. I changed the default behaviour to append a trailing slash to the path to true. In my tested scenarios it happened anyway. This way there is no recursion happening before the redirect. I am not developing in go frequently and rarely contribute to open source -> so feedback of all kind is appreciated closes #164 Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/292 Reviewed-by: 6543 <6543@obermui.de> Reviewed-by: crapStone <codeberg@crapstone.dev> Co-authored-by: Hoernschen <julian.hoernschemeyer@mailbox.org> Co-committed-by: Hoernschen <julian.hoernschemeyer@mailbox.org>
2024-02-26 23:21:42 +01:00
}
}
log.Trace().Msgf("cacheReader for %q saved=%t closed", t.cacheKey, doWrite)
return t.originalReader.Close()
}
func (f FileResponse) CreateCacheReader(r io.ReadCloser, cache cache.ICache, cacheKey string) io.ReadCloser {
if r == nil || cache == nil || cacheKey == "" {
log.Error().Msg("could not create CacheReader")
return nil
}
return &writeCacheReader{
originalReader: r,
buffer: bytes.NewBuffer(make([]byte, 0)),
FIX blank internal pages (#164) (#292) Hello 👋 since it affected my deployment of the pages server I started to look into the problem of the blank pages and think I found a solution for it: 1. There is no check if the file response is empty, neither in cache retrieval nor in writing of a cache. Also the provided method for checking for empty responses had a bug. 2. I identified the redirect response to be the issue here. There is a cache write with the full cache key (e. g. rawContent/user/repo|branch|route/index.html) happening in the handling of the redirect response. But the written body here is empty. In the triggered request from the redirect response the server then finds a cache item to the key and serves the empty body. A quick fix is the check for empty file responses mentioned in 1. 3. The decision to redirect the user comes quite far down in the upstream function. Before that happens a lot of stuff that may not be important since after the redirect response comes a new request anyway. Also, I suspect that this causes the caching problem because there is a request to the forge server and its error handling with some recursions happening before. I propose to move two of the redirects before "Preparing" 4. The recursion in the upstream function makes it difficult to understand what is actually happening. I added some more logging to have an easier time with that. 5. I changed the default behaviour to append a trailing slash to the path to true. In my tested scenarios it happened anyway. This way there is no recursion happening before the redirect. I am not developing in go frequently and rarely contribute to open source -> so feedback of all kind is appreciated closes #164 Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/292 Reviewed-by: 6543 <6543@obermui.de> Reviewed-by: crapStone <codeberg@crapstone.dev> Co-authored-by: Hoernschen <julian.hoernschemeyer@mailbox.org> Co-committed-by: Hoernschen <julian.hoernschemeyer@mailbox.org>
2024-02-26 23:21:42 +01:00
fileResponse: &f,
cache: cache,
cacheKey: cacheKey,
}
}