R: Performing Gradient Descent Directly on Functions Instead of Data Points from the Function

I am using the R programming language. Using the gradientR function from a linked website, I tried to fit points generated from a function:

x = seq(1,1000, by=1)
y = (x^3) - (2*x) -5

gdec.eta1 = gradientR(y = y, X = x, eta = 100, iters = 1000)

However, I got the following error:

Error in if (sqrt(sum(grad^2)) <= epsilon) { : 
  missing value where TRUE/FALSE needed

Can someone please show me what I am doing wrong? Why is this error being produced?

And does anyone know if some other gradient descent functions in R would allow you to "directly" optimize this function instead of generating points from this function?

Something like this:

func2 <- function(x) {
  x^3 - 2* x - 5
}

gradientR(func2, eta = 100, iters = 1000)

Does anyone know if this is possible?

Thanks!

Note: This works for the following example (from the website linked above):

> y = rnorm(n = 10000, mean = 0, sd = 1)
> x1 = rnorm(n = 10000, mean = 0, sd = 1)
> x2 = rnorm(n = 10000, mean = 0, sd = 1)
> x3 = rnorm(n = 10000, mean = 0, sd = 1)
> x4 = rnorm(n = 10000, mean = 0, sd = 1)
> x5 = rnorm(n = 10000, mean = 0, sd = 1)
> 
> ptm <- proc.time()
> gdec.eta1 = gradientR(y = y, X = data.frame(x1,x2,x3, x4,x5), eta = 100, iters = 1000)
[1] "Initialize parameters..."
[1] "Algorithm converged"
[1] "Final gradient norm is 9.80308529574335e-05"

I just don't know why it doesn't work for my example.

CodePudding user response:

We assume that gradientR solves the problem you have and the question is getting it to work with your input. There are several problems here:

1. A function cannot be passed to gradientR; y and X must be vectors or matrices.
2. Many optimization problems run into scaling issues when the inputs differ greatly in magnitude, and this one is no exception. Use x/1000 instead of x.
3. To be clear about the underlying problem being solved: it is to find a coefficient vector b such that the sum of squares of the residual vector y - cbind(1, x) %*% b is minimized, where y and x are known. Several commenters interpreted the problem differently, but if their interpretation is what you want then gradientR is not applicable and, in any case, the problem of maximizing or minimizing x^3-2x-5 has no finite solution.
4. If you want to pass func instead of y, just write a simple wrapper, grad2, as shown below.

To fix the scaling issue, call gradientR with x/1000 and the corresponding y, as shown:

x = seq(1, 1000, by = 1) / 1000
y = (x^3) - (2*x) -5

gdec.eta1 = gradientR(y = y, X = x, eta = 100, iters = 1000)
str(gdec.eta1)
## List of 2
##  $ coef  : num [1:2, 1] -5.2 -1.1
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "rep.1..length.y.." "X"
##   .. ..$ : NULL
##  $ l2loss: num [1:260] 101.44 50.34 25.5 13.83 8.86 ...

# check that lm gives the same coefficients
coef(lm(y ~ x))
## (Intercept)           x 
##   -5.200701   -1.098500
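
To see why the unscaled input fails, here is a minimal sketch of least-squares gradient descent. gd_ls is a hypothetical helper written for illustration, not the gradientR from the question, and its step size and stopping rule are assumptions. With x running up to 1000 the updates overflow to non-finite values, which is presumably how the gradient norm in the original call became a missing value. With x/1000 the same loop settles on the lm coefficients.

```r
# Hypothetical helper for illustration only (not the gradientR from the question):
# gradient descent on the mean squared residual of y ~ b0 + b1 * x.
gd_ls <- function(y, x, eta = 0.5, iters = 5000, epsilon = 1e-6) {
  X <- cbind(1, x)                      # design matrix with an intercept column
  b <- c(0, 0)                          # start from zero coefficients
  for (i in seq_len(iters)) {
    grad <- -2 * crossprod(X, y - X %*% b) / length(y)
    if (!all(is.finite(grad))) return(c(NA_real_, NA_real_))  # diverged
    if (sqrt(sum(grad^2)) <= epsilon) break                   # converged
    b <- b - eta * grad
  }
  unname(drop(b))
}

x_raw <- seq(1, 1000, by = 1)
func  <- function(x) (x^3) - (2*x) - 5

b_raw    <- gd_ls(func(x_raw), x_raw)                # unscaled input: diverges, returns NA
b_scaled <- gd_ls(func(x_raw / 1000), x_raw / 1000)  # scaled input: converges
```

Comparing b_scaled with coef(lm(y ~ x)) on the scaled data is a quick sanity check that the loop reached the least-squares solution.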

Now define a function that takes func rather than y. func must be such that func(x) is y. We test it at the end, and it gives the same coefficients.

# func must be such that func(X) gives Y
grad2 <- function(func, X, ...) gradientR(func(X), X, ...)

# test
x = seq(1, 1000, by = 1) / 1000
func <- function(x) (x^3) - (2*x) -5

# uses grad2 as defined above
gdec.etal2 <- grad2(func, x, eta = 100, iters = 1000)
str(gdec.etal2)
## List of 2
##  $ coef  : num [1:2, 1] -5.2 -1.1
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "rep.1..length.y.." "X"
##   .. ..$ : NULL
##  $ l2loss: num [1:238] 141.66 69.93 34.71 17.59 9.57 ...
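
If what you want is the other interpretation mentioned above, minimizing func2 itself over x, the following is a rough sketch under stated assumptions. Because x^3 - 2x - 5 is unbounded below, only a local minimum exists, at x = sqrt(2/3), where the derivative 3x^2 - 2 is zero. gd_fun and func2_grad are hypothetical names introduced here, and the starting point, step size, and search interval are assumptions. optimize is the one-dimensional minimizer in base R's stats package.

```r
func2      <- function(x) x^3 - 2 * x - 5
func2_grad <- function(x) 3 * x^2 - 2     # analytic derivative of func2

# Hypothetical helper: one-dimensional gradient descent on a function,
# given its derivative and a starting point.
gd_fun <- function(grad, x0, eta = 0.1, iters = 1000, epsilon = 1e-8) {
  x <- x0
  for (i in seq_len(iters)) {
    g <- grad(x)
    if (abs(g) <= epsilon) break          # derivative is close to zero: stop
    x <- x - eta * g
  }
  x
}

x_gd  <- gd_fun(func2_grad, x0 = 1)                   # local minimum near sqrt(2/3)
x_opt <- optimize(func2, interval = c(0, 2))$minimum  # base R, bounded search
```

Starting points left of -sqrt(2/3) send this descent toward minus infinity, which is the "no finite solution" case noted in the answer.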